Encyclopedia of Mathematical Physics 1

EDITORS

Jean-Pierre Francoise Gregory L. Naber Tsou Sheung Tsun


Universite P.-M. Curie, Paris VI Drexel University University of Oxford
Paris, France Philadelphia, PA, USA Oxford, UK
EDITORIAL ADVISORY BOARD

Sergio Albeverio Lisa Jeffrey


Rheinische Friedrich-Wilhelms-Universitat Bonn University of Toronto
Bonn, Germany Toronto, Canada

Huzihiro Araki T.W.B. Kibble


Kyoto University Imperial College of Science, Technology and Medicine
Kyoto, Japan London, UK

Abhay Ashtekar Antti Kupiainen


Pennsylvania State University University of Helsinki
University Park, PA, USA Helsinki, Finland

Andrea Braides Shahn Majid


Universita di Roma Tor Vergata Queen Mary, University of London
Roma, Italy London, UK

Francesco Calogero Barry M. McCoy


Universita di Roma La Sapienza State University of New York Stony Brook
Roma, Italy Stony Brook, NY, USA

Cecile DeWitt-Morette Hirosi Ooguri


The University of Texas at Austin California Institute of Technology
Austin, TX, USA Pasadena, CA, USA

Artur Ekert Roger Penrose


University of Cambridge University of Oxford
Cambridge, UK Oxford, UK

Giovanni Gallavotti Pierre Ramond


Universita di Roma La Sapienza University of Florida
Roma, Italy Gainesville, FL, USA

Simon Gindikin Tudor Ratiu


Rutgers University Ecole Polytechnique Federale de Lausanne
Piscataway, NJ, USA Lausanne, Switzerland

Gennadi Henkin Rudolf Schmid


Universite P.-M. Curie, Paris VI Emory University
Paris, France Atlanta, GA, USA

Allen C. Hirshfeld Albert Schwarz


Universitat Dortmund University of California
Dortmund, Germany Davis, CA, USA
Yakov Sinai Vladimir Turaev
Princeton University Institut de Recherche Mathematique Avancee,
Princeton, NJ, USA Strasbourg, France

Herbert Spohn Gabriele Veneziano
Technische Universitat Munchen CERN
Munchen, Germany Geneve, Switzerland

Stephen J. Summers Reinhard F. Werner
University of Florida Technische Universitat Braunschweig
Gainesville, FL, USA Braunschweig, Germany

Roger Temam C.N. Yang
Indiana University Tsinghua University
Bloomington, IN, USA Beijing, China

Craig A. Tracy Eberhard Zeidler
University of California Max-Planck Institut fur Mathematik in den Naturwissenschaften
Davis, CA, USA Leipzig, Germany

Andrzej Trautman Steve Zelditch


Warsaw University Johns Hopkins University
Warsaw, Poland Baltimore, MD, USA
FOREWORD

In bygone centuries, our physical world appeared to be filled to the brim with mysteries. Divine powers
could provide for genuine miracles; water and sunlight could turn arid land into fertile pastures, but the
same powers could lead to miseries and disasters. The force of life, the vis vitalis, was assumed to be the
special agent responsible for all living things. The heavens, whatever they were for, contained stars and other
heavenly bodies that were the exclusive domain of the Gods.
Mathematics did exist, of course. Indeed, there was one aspect of our physical world that was recognised to
be controlled by precise, mathematical logic: the geometric structure of space, elaborated to become a genuine
form of art by the ancient Greeks. From my perspective, the Greeks were the first practitioners of mathematical
physics, when they discovered that all geometric features of space could be reduced to a small number of
axioms. Today, these would be called fundamental laws of physics. The fact that the flow of time could be
addressed with similar exactitude, and that it could be handled geometrically together with space, was only
recognised much later. And, yes, there were a few crazy people who were interested in the magic of numbers,
but the real world around us seemed to contain so much more that was way beyond our capacities of analysis.
Gradually, all this changed. The Moon and the planets appeared to follow geometrical laws. Galilei and
Newton managed to identify their logical rules of motion, and by noting that the concept of mass could be
applied to things in the sky just like apples and cannon balls on Earth, they made the sky a little bit more
accessible to us. Electricity, magnetism, light and sound were also found to behave in complete accordance
with mathematical equations.
Yet all of this was just a beginning. The real changes came with the twentieth century. A completely new
way of thinking, by emphasizing mathematical, logical analysis rather than empirical evidence, was pioneered
by Albert Einstein. Applying advanced mathematical concepts, only known to a few pure mathematicians, to
notions as mundane as space and time, was new to the physicists of his time. Einstein himself had a hard
time struggling through the logic of connections and curvatures, notions that were totally new to him, but are
only too familiar to students of mathematical physics today. Indeed, there is no better testimony of Einstein's
deep insights at that time, than the fact that we now teach these things regularly in our university classrooms.
Special and general relativity are only small corners of the realm of modern physics that is presently being
studied using advanced mathematical methods. We have notoriously complex subjects such as phase transitions in
condensed matter physics, superconductivity, Bose-Einstein condensation, the quantum Hall effect, particularly
the fractional quantum Hall effect, and numerous topics from elementary particle physics, ranging from fibre
bundles and renormalization groups to supergravity, algebraic topology, superstring theory, Calabi-Yau spaces
and what not, all of which require the utmost of our mental skills to comprehend them.
The most bewildering observation that we make today is that it seems that our entire physical world
appears to be controlled by mathematical equations, and these are not just sloppy and debatable models, but
precisely documented properties of materials, of systems, and of phenomena in all echelons of our universe.
Does this really apply to our entire world, or only to parts of it? Do features, notions, entities exist that are
emphatically not mathematical? What about intuition, or dreams, and what about consciousness? What
about religion? Here, most of us would say, one should not even try to apply mathematical analysis, although
even here, some brave social scientists are making attempts at coordinating rational approaches.
No, there are clear and important differences between the physical world and the mathematical world.
Where the physical world stands out is the fact that it refers to reality, whatever reality is. Mathematics is
the world of pure logic and pure reasoning. In physics, it is the experimental evidence that ultimately decides
whether a theory is acceptable or not. Also, the methodology in physics is different.
A beautiful example is the serendipitous discovery of superconductivity. In 1911, the Dutch physicist Heike
Kamerlingh Onnes was the first to achieve the liquefaction of helium, for which a temperature below 4.25 K
had to be realized. Heike decided to measure the specific conductivity of mercury, a metal that is frozen solid
at such low temperatures. But something appeared to go wrong during the measurements, since the
voltmeter did not show any voltage at all. All experienced physicists in the team assumed that they were dealing
with a malfunction. It would not have been the first time for a short circuit to occur in the electrical
equipment, but, this time, in spite of several efforts, they failed to locate it. One of the assistants was
responsible for keeping the temperature of the sample well within that of liquid helium, a dull job, requiring
nothing else than continuously watching some dials. During one of the many tests, however, he dozed off.
The temperature rose, and suddenly the measurements showed the normal values again. It then occurred to
the investigators that the effect and its temperature dependence were completely reproducible. Below 4.19
degrees Kelvin the conductivity of mercury appeared to be strictly infinite. Above that temperature, it is
finite, and the transition is a very sudden one. Superconductivity was discovered (D. van Delft, Heike
Kamerlingh Onnes, Uitgeverij Bert Bakker, Amsterdam, 2005 (in Dutch)).
This is not the way mathematical discoveries are made. Theorems are not produced by assistants falling
asleep, even if examples do exist of incidents involving some miraculous fortune.
The hybrid science of mathematical physics is a very curious one. Some of the topics in this Encyclopedia
are undoubtedly physical. High Tc superconductivity, breaking water waves, and magneto-hydrodynamics,
are definitely topics of physics where experimental data are considered more decisive than any high-brow
theory. Cohomology theory, Donaldson-Witten theory, and AdS/CFT correspondence, however, are examples
of purely mathematical exercises, even if these subjects, like all of the others in this compilation, are strongly
inspired by, and related to, questions posed in physics.
It is inevitable, in a compilation of a large number of short articles with many different authors, to see quite a
bit of variation in style and level. In this Encyclopedia, theoretical physicists as well as mathematicians together
made a huge effort to present in a concise and understandable manner their vision on numerous important
issues in advanced mathematical physics. All include references for further reading. We hope and expect that
these efforts will serve a good purpose.

Gerard 't Hooft,
Spinoza Institute,
Utrecht University,
The Netherlands.
PREFACE

Mathematical Physics as a distinct discipline is relatively new. The International Association of
Mathematical Physics was founded only in 1976. The interaction between physics and mathematics
has, of course, existed since ancient times, but the recent decades, perhaps partly because we are living
through them, appear to have witnessed tremendous progress, yielding new results and insights at a dizzying
pace, so much so that an encyclopedia seems now needed to collate the gathered knowledge.
Mathematical Physics brings together the two great disciplines of Mathematics and Physics to the benefit of
both, the relationship between them being symbiotic. On the one hand, it uses mathematics as a tool to
organize physical ideas of increasing precision and complexity, and on the other it draws on the questions
that physicists pose as a source of inspiration to mathematicians. A classical example of this relationship
exists in Einstein's theory of relativity, where differential geometry played an essential role in the formulation
of the physical theory while the problems raised by the ensuing physics have in turn boosted the development
of differential geometry. It is indeed a happy coincidence that we are writing now a preface to an
encyclopedia of mathematical physics in the centenary of Einstein's annus mirabilis.
The project of putting together an encyclopedia of mathematical physics looked, and still looks, to us a
formidable enterprise. We would never have had the courage to undertake such a task if we did not believe,
first, that it is worthwhile and of benefit to the community, and second, that we would get the much-needed
support from our colleagues. And this support we did get, in the form of advice, encouragement, and
practical help too, from members of our Editorial Advisory Board, from our authors, and from others as well,
who have given unstintingly so much of their time to help us shape this Encyclopedia.
Mathematical Physics being a relatively new subject, it is not yet clearly delineated and could mean
different things to different people. In our choice of topics, we were guided in part by the programs of recent
International Congresses on Mathematical Physics, but mainly by the advice from our Editorial Advisory
Board and from our authors. The limitations of space and time, as well as our own limitations, necessitated
the omission of certain topics, but we have tried to include all that we believe to be core subjects and to cover
as much as possible the most active areas.
Our subject being interdisciplinary, we think it appropriate that the Encyclopedia should have certain
special features. Applications of the same mathematical theory, for instance, to different problems in physics
will have different emphasis and treatment. By the same token, the same problem in physics can draw upon
resources from different mathematical fields. This is why we divide the Encyclopedia into two broad sections:
physics subjects and related mathematical subjects. Articles in either section are deliberately allowed a fair
amount of overlap with one another and many articles will appear under more than one heading, but all are
linked together by elaborate cross referencing. We think this gives a better picture of the subject as a whole
and will serve better a community of researchers from widely scattered yet related fields.
The Encyclopedia is intended primarily for experienced researchers but should be of use also to beginning
graduate students. For the latter category of readers, we have included eight elementary introductory articles for easy
reference, with those on mathematics aimed at physics graduates and those on physics aimed at mathematics
graduates, so that these articles can serve as their first port of call to enable them to embark on any of the main
articles without the need to consult other material beforehand. In fact, we think these articles may even form the
foundation of advanced undergraduate courses, as we know that some authors have already made such use of them.
In addition to the printed version, an on-line version of the Encyclopedia is planned, which will allow both
the contents and the articles themselves to be updated if and when the occasion arises. This is probably a
necessary provision in such a rapidly advancing field.
This project was some four years in the making. Our foremost thanks at its completion go to the members
of our Editorial Advisory Board, who have advised, helped and encouraged us all along, and to all our
authors who have so generously devoted so much of their time to writing these articles and given us much
useful advice as well. We ourselves have learnt a lot from these colleagues, and made some wonderful
contacts with some among them. Special thanks are due also to Arthur Greenspoon whose technical expertise
was indispensable.
The project was started with Academic Press, which was later taken over by Elsevier. We thank warmly
members of their staff who have made this transition admirably seamless and gone on to assist us greatly in
our task: both Carey Chapman and Anne Guillaume, who were in charge of the whole project and have been
with us since the beginning, and Edward Taylor responsible for the copy-editing. And Martin Ruck, who
manages to keep an overwhelming amount of details constantly at his fingertips, and who is never known to
have lost a single email, deserves a very special mention.
As a postscript, we would like to express our gratitude to the very large number of authors who generously
agreed to donate their honorariums to support the Committee for Developing Countries of the European
Mathematical Society in their work to help our less fortunate colleagues in the developing world.

Jean-Pierre Francoise
Gregory L. Naber
Tsou Sheung Tsun
PERMISSION ACKNOWLEDGMENTS
The following material is reproduced with kind permission of Nature Publishing Group
Figures 11 and 12 of Point-vortex Dynamics
http://www.nature.com/nature
The following material is reproduced with kind permission of Oxford University Press
Figure 1 of Random Walks in Random Environments
http://www.oup.co.uk
GUIDE TO USE OF THE ENCYCLOPEDIA

Structure of the Encyclopedia


The material in this Encyclopedia is organised into two sections. At the start of Volume 1 are eight Introductory Articles.
The introductory articles on mathematics are aimed at physics graduates; those on physics are aimed at mathematics
graduates. It is intended that these articles should serve as the first port of call for graduate students, to enable them to
embark on any of the main entries without the need to consult other material beforehand.
Following the Introductory Articles, the main body of the Encyclopedia is arranged as a series of entries in alphabetical
order. These entries fill the remainder of Volume 1 and all of the subsequent volumes (2-5).
To help you realize the full potential of the material in the Encyclopedia we have provided four features to help you find
the topic of your choice: a contents list by subject, an alphabetical contents list, cross-references, and a full subject index.

1. Contents List by Subject


Your first point of reference will probably be the contents list by subject. This list appears at the front of each volume,
and groups the entries under subject headings describing the broad themes of mathematical physics. This will enable the
reader to make quick connections between entries and to locate the entry of interest. The contents list by subject is divided
into two main sections: Physics Subjects and Related Mathematics Subjects. Under each main section heading, you will
find several subject areas (such as GENERAL RELATIVITY in Physics Subjects or NONCOMMUTATIVE GEOMETRY
in Related Mathematics Subjects). Under each subject area is a list of those entries that cover aspects of that subject,
together with the volume and page numbers on which these entries may be found.
Because mathematical physics is so highly interconnected, individual entries may appear under more than one subject
area. For example, the entry GAUGE THEORY: MATHEMATICAL APPLICATIONS is listed under the Physics Subject
GAUGE THEORY as well as in a broad range of Related Mathematics Subjects.

2. Alphabetical Contents List


The alphabetical contents list, which also appears at the front of each volume, lists the entries in the order in which they
appear in the Encyclopedia. This list provides both the volume number and the page number of the entry.
You will find dummy entries where obvious synonyms exist for entries or where we have grouped together related
topics. Dummy entries appear in both the contents list and the body of the text.

Example
If you were attempting to locate material on path integral methods via the alphabetical contents list:

PATH INTEGRAL METHODS see Functional Integration in Quantum Physics; Feynman Path Integrals

The dummy entry directs you to two other entries in which path integral methods are covered. At the appropriate
locations in the contents list, the volume and page numbers for these entries are given.
If you were trying to locate the material by browsing through the text and you had looked up Path Integral Methods,
then the following information would be provided in the dummy entry:

Path Integral Methods see Functional Integration in Quantum Physics; Feynman Path Integrals

3. Cross-References
All of the articles in the Encyclopedia have been extensively cross-referenced. The cross-references, which appear at the
end of an entry, serve three different functions:

i. To indicate if a topic is discussed in greater detail elsewhere.

ii. To draw the reader's attention to parallel discussions in other entries.

iii. To indicate material that broadens the discussion.

Example
The following list of cross-references appears at the end of the entry STOCHASTIC HYDRODYNAMICS

See also: Cauchy Problem for Burgers-Type Equations; Hamiltonian
Fluid Dynamics; Incompressible Euler Equations: Mathematical Theory;
Malliavin Calculus; Non-Newtonian Fluids; Partial Differential Equations:
Some Examples; Stochastic Differential Equations; Turbulence Theories;
Viscous Incompressible Fluids: Mathematical Theory; Vortex Dynamics

Here you will find examples of all three functions of the cross-reference list: a topic discussed in greater detail elsewhere
(e.g. Incompressible Euler Equations: Mathematical Theory), parallel discussion in other entries (e.g. Stochastic
Differential Equations) and reference to entries that broaden the discussion (e.g. Turbulence Theories).
The eight Introductory Articles are not cross-referenced from any of the main entries, as it is expected that introductory
articles will be of general interest. As mentioned above, the Introductory Articles may be found at the start of Volume 1.

4. Index
The index will provide you with the volume and page number where the material is located. The index entries
differentiate between material that is a whole entry, is part of an entry, or is data presented in a figure or table. Detailed
notes are provided on the opening page of the index.

5. Contributors
A full list of contributors appears at the beginning of each volume.
CONTRIBUTORS

A Abbondandolo L Andersson
Universita di Pisa University of Miami
Pisa, Italy Coral Gables, FL, USA and Albert Einstein Institute
Potsdam, Germany
M J Ablowitz
University of Colorado B Andreas
Boulder, CO, USA Humboldt-Universitat zu Berlin
Berlin, Germany

S L Adler
Institute for Advanced Study V Araujo
Princeton, NJ, USA Universidade do Porto
Porto, Portugal

H Airault
Universite de Picardie A Ashtekar
Amiens, France Pennsylvania State University
University Park, PA, USA

G Alberti W Van Assche
Universita di Pisa Katholieke Universiteit Leuven
Pisa, Italy Leuven, Belgium

S Albeverio G Aubert
Rheinische Friedrich-Wilhelms-Universitat Bonn Universite de Nice Sophia Antipolis
Bonn, Germany Nice, France

S T Ali H Au-Yang
Concordia University Oklahoma State University
Montreal, QC, Canada Stillwater, OK, USA

R Alicki M A Aziz-Alaoui
University of Gdansk Universite du Havre
Gdansk, Poland Le Havre, France

G Altarelli V Bach
CERN Johannes Gutenberg-Universitat
Geneva, Switzerland Mainz, Germany

C Amrouche C Bachas
Universite de Pau et des Pays de l'Adour Ecole Normale Superieure
Pau, France Paris, France

M Anderson V Baladi
State University of New York at Stony Brook Institut Mathematique de Jussieu
Stony Brook, NY, USA Paris, France

D Bambusi M Blasone
Universita di Milano Universita degli Studi di Salerno
Milan, Italy Baronissi (SA), Italy

C Bardos M Blau
Universite de Paris 7 Universite de Neuchatel
Paris, France Neuchatel, Switzerland

D Bar-Natan S Boatto
University of Toronto IMPA
Toronto, ON, Canada Rio de Janeiro, Brazil

E L Basor L V Bogachev
California Polytechnic State University University of Leeds
San Luis Obispo, CA, USA Leeds, UK

L Boi
M T Batchelor EHESS and LUTH
Australian National University Paris, France
Canberra, ACT, Australia
M Bojowald
S Bauer The Pennsylvania State University
Universitat Bielefeld University Park, PA, USA
Bielefeld, Germany
C Bonatti
V Beffara Universite de Bourgogne
Ecole Normale Superieure de Lyon Dijon, France
Lyon, France
P Bonckaert
R Beig Universiteit Hasselt
Universitat Wien Diepenbeek, Belgium
Vienna, Austria
F Bonetto
M I Belishev Georgia Institute of Technology
Petersburg Department of Steklov Institute Atlanta, GA, USA
of Mathematics
St. Petersburg, Russia G Bouchitte
Universite de Toulon et du Var
La Garde, France
P Bernard A Bovier
Universite de Paris Dauphine Weierstrass Institute for Applied Analysis and Stochastics
Paris, France Berlin, Germany
D Birmingham
University of the Pacific H W Braden
Stockton, CA, USA University of Edinburgh
Edinburgh, UK
Jiri Bicak
Charles University, Prague, Czech Republic H Bray
and Albert Einstein Institute Duke University
Potsdam, Germany Durham, NC, USA

C Blanchet Y Brenier
Universite de Bretagne-Sud Universite de Nice Sophia Antipolis
Vannes, France Nice, France

J Bros J Cardy
CEA/DSM/SPhT, CEA/Saclay Rudolf Peierls Centre for Theoretical Physics
Gif-sur-Yvette, France Oxford, UK

R Brunetti R Caseiro
Universitat Hamburg Universidade de Coimbra
Hamburg, Germany Coimbra, Portugal

M Bruschi A S Cattaneo
Universita di Roma La Sapienza Universitat Zurich
Rome, Italy Zurich, Switzerland

T Brzezinski A Celletti
University of Wales Swansea Universita di Roma Tor Vergata
Swansea, UK Rome, Italy

D Buchholz D Chae
Universitat Gottingen Sungkyunkwan University
Gottingen, Germany Suwon, South Korea

N Burq G-Q Chen


Universite Paris-Sud Northwestern University
Orsay, France Evanston, IL, USA

F H Busse L Chierchia
Universitat Bayreuth Universita degli Studi Roma Tre
Bayreuth, Germany Rome, Italy

G Buttazzo S Chmutov
Universita di Pisa Petersburg Department of Steklov
Pisa, Italy Institute of Mathematics
St. Petersburg, Russia
P Butta
Universita di Roma La Sapienza M W Choptuik
Rome, Italy University of British Columbia
Vancouver, Canada
S L Cacciatori
Universita di Milano Y Choquet-Bruhat
Milan, Italy Universite P.-M. Curie, Paris VI
Paris, France
P T Callaghan
Victoria University of Wellington P T Chrusciel
Wellington, New Zealand Universite de Tours
Tours, France
Francesco Calogero
University of Rome, Rome, Italy and Institute Chong-Sun Chu
Nazionale di Fisica Nucleare University of Durham
Rome, Italy Durham, UK

A Carati F Cipriani
Universita di Milano Politecnico di Milano
Milan, Italy Milan, Italy

R L Cohen G W Delius
Stanford University University of York
Stanford, CA, USA York, UK

T H Colding G F dell'Antonio
University of New York Universita di Roma La Sapienza
New York, NY, USA Rome, Italy

J C Collins C DeWitt-Morette
Penn State University The University of Texas at Austin
University Park, PA, USA Austin, TX, USA

G Comte
Universite de Nice Sophia Antipolis L Diosi
Nice, France Research Institute for Particle and Nuclear Physics
Budapest, Hungary
A Constantin
Trinity College A Doliwa
Dublin, Republic of Ireland University of Warmia and Mazury in Olsztyn
Olsztyn, Poland
D Crowdy
Imperial College G Dolzmann
London, UK University of Maryland
College Park, MD, USA
A B Cruzeiro S K Donaldson
University of Lisbon Imperial College
Lisbon, Portugal London, UK
G Dal Maso
SISSA
Trieste, Italy T C Dorlas
Dublin Institute for Advanced Studies
F Dalfovo Dublin, Republic of Ireland
Universita di Trento
Povo, Italy M R Douglas
Rutgers, The State University of New Jersey
A S Dancer Piscataway, NJ, USA
University of Oxford
Oxford, UK M Dutsch
Universitat Zurich
P D'Ancona Zurich, Switzerland
Universita di Roma La Sapienza
Rome, Italy
B Dubrovin
SISSA-ISAS
S R Das
Trieste, Italy
University of Kentucky
Lexington, KY, USA
J J Duistermaat
E Date Universiteit Utrecht
Osaka University Utrecht, The Netherlands
Osaka, Japan
S Duzhin
N Datta Petersburg Department of Steklov Institute of
University of Cambridge Mathematics
Cambridge, UK St. Petersburg, Russia

G Ecker B Ferrario
Universitat Wien Universita di Pavia
Vienna, Austria Pavia, Italy

M Efendiev R Finn
Universitat Stuttgart Stanford University
Stuttgart, Germany Stanford, CA, USA

T Eguchi D Fiorenza
University of Tokyo Universita di Roma La Sapienza
Tokyo, Japan Rome, Italy

J Ehlers A E Fischer
Max Planck Institut fur Gravitationsphysik University of California
(Albert-Einstein Institut) Santa Cruz, CA, USA
Golm, Germany
A S Fokas
P E Ehrlich University of Cambridge
University of Florida Cambridge, UK
Gainesville, FL, USA
J-P Francoise
D Einzel Universite P.-M. Curie, Paris VI
Bayerische Akademie der Wissenschaften Paris, France
Garching, Germany

S Franz
G A Elliott The Abdus Salam ICTP
University of Toronto Trieste, Italy
Toronto, Canada

L Frappat
G F R Ellis Universite de Savoie
University of Cape Town Chambery-Annecy, France
Cape Town, South Africa

J Frauendiener
C L Epstein Universitat Tubingen
University of Pennsylvania Tubingen, Germany
Philadelphia, PA, USA

K Fredenhagen
J Escher Universitat Hamburg
Universitat Hannover Hamburg, Germany
Hannover, Germany

J B Etnyre S Friedlander
University of Pennsylvania University of Illinois-Chicago
Philadelphia, PA, USA Chicago, IL, USA

G Falkovich M R Gaberdiel
Weizmann Institute of Science ETH Zurich
Rehovot, Israel Zurich, Switzerland

M Farge G Gaeta
Ecole Normale Superieure Universita di Milano
Paris, France Milan, Italy

L Galgani H Gottschalk
Universita di Milano Rheinische Friedrich-Wilhelms-Universitat Bonn
Milan, Italy Bonn, Germany

G Gallavotti O Goubet
Universita di Roma La Sapienza Universite de Picardie Jules Verne
Rome, Italy Amiens, France

R Gambini T R Govindarajan
Universidad de la Republica The Institute of Mathematical Sciences
Montevideo, Uruguay Chennai, India

G Gentile A Grassi
Universita degli Studi Roma Tre University of Pennsylvania
Rome, Italy Philadelphia, PA, USA

A Di Giacomo P G Grinevich
Universita di Pisa L D Landau Institute for Theoretical Physics
Pisa, Italy Moscow, Russia

P B Gilkey Ch Gruber
University of Oregon Ecole Polytechnique Federale de Lausanne
Eugene, OR, USA Lausanne, Switzerland

R Gilmore J-L Guermond


Drexel University Universite de Paris Sud
Philadelphia, PA, USA Orsay, France

S Gindikin F Guerra
Rutgers University Universita di Roma La Sapienza
Piscataway, NJ, USA Rome, Italy

A Giorgilli T Guhr
Universita di Milano Lunds Universitet
Milan, Italy Lund, Sweden

G A Goldin C Guillope
Rutgers University Universite Paris XII Val de Marne
Piscataway, NJ, USA Creteil, France

G Gonzalez C Gundlach
Louisiana State University University of Southampton
Baton Rouge, LA, USA Southampton, UK

R Gopakumar S Gutt
Harish-Chandra Research Institute Universite Libre de Bruxelles
Allahabad, India Brussels, Belgium

D Gottesman K Hannabuss
Perimeter Institute University of Oxford
Waterloo, ON, Canada Oxford, UK

M Haragus D D Holm
Universite de Franche-Comte Imperial College
Besancon, France London, UK

S G Harris J-W van Holten


St. Louis University NIKHEF
St. Louis, MO, USA Amsterdam, The Netherlands

B Hasselblatt A Huckleberry
Tufts University Ruhr-Universitat Bochum
Medford, MA, USA Bochum, Germany

P Hayden K Hulek
McGill University Universitat Hannover
Montreal, QC, Canada Hannover, Germany

D C Heggie D Iagolnitzer
The University of Edinburgh CEA/DSM/SPhT, CEA/Saclay
Edinburgh, UK Gif-sur-Yvette, France

B Helffer R Illge
Universite Paris-Sud Friedrich-Schiller-Universitat Jena
Orsay, France Jena, Germany

G M Henkin P Imkeller
Universite P.-M. Curie, Paris VI Humboldt Universitat zu Berlin
Paris, France Berlin, Germany

M Henneaux G Iooss
Universite Libre de Bruxelles Institut Non Lineaire de Nice
Bruxelles, Belgium Valbonne, France

S Herrmann M Irigoyen
Universite Henri Poincare, Nancy 1 Universite P.-M. Curie, Paris VI
Vandoeuvre-les-Nancy, France Paris, France

C P Herzog J Isenberg
University of California at Santa Barbara University of Oregon
Santa Barbara, CA, USA Eugene, OR, USA

J G Heywood R Ivanova
University of British Columbia University of Hawaii Hilo
Vancouver, BC, Canada Hilo, HI, USA

A C Hirshfeld E M Izhikevich
Universitat Dortmund The Neurosciences Institute
Dortmund, Germany San Diego, CA, USA

A S Holevo R W Jackiw
Steklov Mathematical Institute Massachusetts Institute of Technology
Moscow, Russia Cambridge, MA, USA

T J Hollowood J K Jain
University of Wales Swansea The Pennsylvania State University
Swansea, UK University Park, PA, USA

M Jardim L H Kauffman
IMECC-UNICAMP University of Illinois at Chicago
Campinas, Brazil Chicago, IL, USA

L C Jeffrey R K Kaul
University of Toronto The Institute of Mathematical Sciences
Toronto, ON, Canada Chennai, India

J Jimenez Y Kawahigashi
Universidad Politecnica de Madrid University of Tokyo
Madrid, Spain Tokyo, Japan

S Jitomirskaya B S Kay
University of California at Irvine University of York
Irvine, CA, USA York, UK

P Jizba R Kenyon
Czech Technical University University of British Columbia
Prague, Czech Republic Vancouver, BC, Canada

A Joets M Keyl
Universite Paris-Sud Universita di Pavia
Orsay, France Pavia, Italy

K Johansson T W B Kibble
Kungl Tekniska Hogskolan Imperial College
Stockholm, Sweden London, UK

G Jona-Lasinio S Kichenassamy
Universita di Roma La Sapienza Universite de Reims Champagne-Ardenne
Rome, Italy Reims, France

V F R Jones J Kim
University of California at Berkeley University of California at Irvine
Berkeley, CA, USA Irvine, USA

N Joshi S B Kim
University of Sydney Chonnam National University
Sydney, NSW, Australia Gwangju, South Korea

D D Joyce A Kirillov
University of Oxford University of Pennsylvania
Oxford, UK Philadelphia, PA, USA

C D Jakel A Kirillov, Jr.
Ludwig-Maximilians-Universitat Munchen Stony Brook University
Munchen, Germany Stony Brook, NY, USA

G Kasperski K Kirsten
Universite Paris-Sud XI Baylor University
Orsay, France Waco, TX, USA

F Kirwan M Krbec
University of Oxford Academy of Sciences
Oxford, UK Prague, Czech Republic

S Klainerman D Kreimer
Princeton University IHES
Princeton, NJ, USA Bures-sur-Yvette, France

I R Klebanov A Kresch
Princeton University University of Warwick
Princeton, NJ, USA Coventry, UK

Y Kondratiev D Kretschmann
Universitat Bielefeld Technische Universitat Braunschweig
Bielefeld, Germany Braunschweig, Germany

A Konechny P B Kronheimer
Rutgers, The State University of New Jersey Harvard University
Piscataway, NJ, USA Cambridge, MA, USA

K Konishi B Kuckert
Universita di Pisa Universitat Hamburg
Pisa, Italy Hamburg, Germany

T H Koornwinder Y Kuramoto
University of Amsterdam Hokkaido University
Amsterdam, The Netherlands Sapporo, Japan

P Kornprobst J M F Labastida
INRIA CSIC
Sophia Antipolis, France Madrid, Spain

V P Kostov G Labrosse
Universite de Nice Sophia Antipolis Universite Paris-Sud XI
Nice, France Orsay, France

R Kotecky C Landim
Charles University IMPA, Rio de Janeiro, Brazil and UMR 6085
Prague, Czech Republic and the and Universite de Rouen
University of Warwick, UK France

Y Kozitsky E Langmann
Uniwersytet Marii Curie-Sklodowskiej KTH Physics
Lublin, Poland Stockholm, Sweden

P Kramer S Laporta
Universitat Tubingen Universita di Parma
Tubingen, Germany Parma, Italy

C Krattenthaler O D Lavrentovich
Universitat Wien Kent State University
Vienna, Austria Kent, OH, USA

G F Lawler
Cornell University
Ithaca, NY, USA

M Lyubich
University of Toronto
Toronto, ON, Canada and Stony Brook University
NY, USA

C Le Bris
CERMICS ENPC
Champs Sur Marne, France

R Leandre
Universite de Bourgogne
Dijon, France

A Lesne
Universite P.-M. Curie, Paris VI
Paris, France

P Levay
Budapest University of Technology and Economics
Budapest, Hungary

D Levi
Universita Roma Tre
Rome, Italy

R Maartens
Portsmouth University
Portsmouth, UK

J Lewandowski
Uniwersytet Warszawski
Warsaw, Poland

N MacKay
University of York
York, UK

R G Littlejohn
University of California at Berkeley
Berkeley, CA, USA

J Magnen
Ecole Polytechnique
France

R Livi
Universita di Firenze
Sesto Fiorentino, Italy

F Magri
Universita di Milano Bicocca
Milan, Italy

R Longoni
Universita di Roma La Sapienza
Rome, Italy

J Maharana
Institute of Physics
Bhubaneswar, India

J Lowengrub S Majid
University of California at Irvine Queen Mary, University of London
Irvine, USA London, UK

C Lozano C Marchioro
INTA Universita di Roma La Sapienza
Torrejon de Ardoz, Spain Rome, Italy

T T Q Le K Marciniak
Georgia Institute of Technology Linkoping University
Atlanta, GA, USA Norrkoping, Sweden

B Lucquin-Desreux M Marcolli
Universite P.-M. Curie, Paris VI Max-Planck-Institut fur Mathematik
Paris, France Bonn, Germany

V Lyubashenko M Marino
Institute of Mathematics CERN
Kyiv, Ukraine Geneva, Switzerland

J Marklof P K Mitter
University of Bristol Universite de Montpellier 2
Bristol, UK Montpellier, France

C-M Marle V Moncrief
Universite P.-M. Curie, Paris VI Yale University
Paris, France New Haven, CT, USA

L Mason
University of Oxford
Oxford, UK

Y Morita
Ryukoku University
Otsu, Japan

V Mastropietro
Universita di Roma Tor Vergata
Rome, Italy

P J Morrison
University of Texas at Austin
Austin, TX, USA

V Mathai
University of Adelaide
Adelaide, SA, Australia

J Mund
Universidade de Sao Paulo
Sao Paulo, Brazil

J Mawhin
Universite Catholique de Louvain
Louvain-la-Neuve, Belgium

F Musso
Universita Roma Tre
Rome, Italy

S Mazzucchi
Universita di Trento
Povo, Italy

G L Naber
Drexel University
Philadelphia, PA, USA

B M McCoy
State University of New York at Stony Brook
Stony Brook, NY, USA

B Nachtergaele
University of California at Davis
Davis, CA, USA

E Meinrenken
University of Toronto
Toronto, ON, Canada

C Nash
National University of Ireland
Maynooth, Ireland

I Melbourne
University of Surrey
Guildford, UK

S Necasova
Academy of Sciences
Prague, Czech Republic

J Mickelsson
KTH Physics
Stockholm, Sweden

A I Neishtadt
Russian Academy of Sciences
Moscow, Russia

W P Minicozzi II
University of New York
New York, NY, USA

N Neumaier
Albert-Ludwigs-University in Freiburg
Freiburg, Germany

S Miracle-Sole
Centre de Physique Theorique, CNRS
Marseille, France

A Miranville S E Newhouse
Universite de Poitiers Michigan State University
Chasseneuil, France E. Lansing, MI, USA

C M Newman P E Parker
New York University Wichita State University
New York, NY, USA Wichita KS, USA

S Paycha
Universite Blaise Pascal
Aubiere, France

S Nikcevic
SANU
Belgrade, Serbia and Montenegro

P A Pearce
University of Melbourne
Parkville VIC, Australia

M Nitsche
University of New Mexico
Albuquerque, NM, USA

P Pearle
Hamilton College
Clinton, NY, USA

R G Novikov
Universite de Nantes
Nantes, France

M Pedroni
Universita di Bergamo
Dalmine (BG), Italy

J M Nunes da Costa
Universidade de Coimbra
Coimbra, Portugal

B Pelloni
University of Reading
UK

S O'Brien
Tyndall National Institute
Cork, Republic of Ireland

R Penrose
University of Oxford
Oxford, UK

A Okounkov
Princeton University
Princeton, NJ, USA

A Perez
Penn State University
University Park, PA, USA

A Onuki
Kyoto University
Kyoto, Japan

J H H Perk
Oklahoma State University
Stillwater, OK, USA

J-P Ortega
Universite de Franche-Comte
Besancon, France

T Peternell
Universitat Bayreuth
Bayreuth, Germany

H Osborn
University of Cambridge
Cambridge, UK

D Petz
Budapest University of Technology and Economics
Budapest, Hungary

Maciej P Wojtkowski
University of Arizona
Tucson, AZ, USA and Institute of Mathematics PAN
Warsaw, Poland

M J Pflaum
Johann Wolfgang Goethe-Universitat
Frankfurt, Germany

J Palmer B Piccoli
University of Arizona Istituto per le Applicazioni del Calcolo
Tucson, AZ, USA Rome, Italy

J H Park C Piquet
Sungkyunkwan University Universite P.-M. Curie, Paris VI
Suwon, South Korea Paris, France

L P Pitaevskii K-H Rehren
Universita di Trento Universitat Gottingen
Povo, Italy Gottingen, Germany

S Pokorski E Remiddi
Warsaw University Universita di Bologna
Warsaw, Poland Bologna, Italy

E Presutti
Universita di Roma Tor Vergata
Rome, Italy

J E Roberts
Universita di Roma Tor Vergata
Rome, Italy

E Previato
Boston University
Boston, MA, USA

L Rey-Bellet
University of Massachusetts
Amherst, MA, USA

B Prinari
Universita degli Studi di Lecce
Lecce, Italy

R Robert
Universite Joseph Fourier
Saint Martin d'Heres, France

J Pullin
Louisiana State University
Baton Rouge, LA, USA

F A Rogers
King's College London
London, UK

M Pulvirenti
Universita di Roma La Sapienza
Rome, Italy

R M S Rosa
Universidade Federal do Rio de Janeiro
Rio de Janeiro, Brazil

O Ragnisco
Universita Roma Tre
Rome, Italy

C Rovelli
Universite de la Mediterranee et Centre de Physique Theorique
Marseilles, France

P Ramadevi
Indian Institute of Technology Bombay
Mumbai, India

S N M Ruijsenaars
Centre for Mathematics and Computer Science
Amsterdam, The Netherlands

S A Ramakrishna
Indian Institute of Technology
Kanpur, India

F Russo
Universite Paris 13
Villetaneuse, France

J Rasmussen
Princeton University
Princeton, NJ, USA

L H Ryder
University of Kent
Canterbury, UK

L Rastelli
Princeton University
Princeton, NJ, USA

T S Ratiu S Sachdev
Ecole Polytechnique Federale de Lausanne Yale University
Lausanne, Switzerland New Haven, CT, USA

S Rauch-Wojciechowski H Sahlmann
Linkoping University Universiteit Utrecht
Linkoping, Sweden Utrecht, The Netherlands

M Salmhofer
Universitat Leipzig
Leipzig, Germany

M A Semenov-Tian-Shansky
Steklov Institute of Mathematics
St. Petersburg, Russia and Universite de Bourgogne
Dijon, France

P M Santini
Universita di Roma La Sapienza
Rome, Italy

A N Sengupta
Louisiana State University
Baton Rouge LA, USA

A Sarmiento
Universidade Federal de Minas Gerais
Belo Horizonte, Brazil

S Serfaty
New York University
New York, NY, USA

R Sasaki
Kyoto University
Kyoto, Japan

E R Sharpe
University of Utah
Salt Lake City, UT, USA

A Savage
University of Toronto
Toronto, ON, Canada

D Shepelsky
Institute for Low Temperature Physics and Engineering
Kharkov, Ukraine

M Schechter
University of California at Irvine
Irvine, CA, USA

S Shlosman
Universite de Marseille
Marseille, France

D-M Schlingemann
Technical University of Braunschweig
Braunschweig, Germany

A Siconolfi
Universita di Roma La Sapienza
Rome, Italy

R Schmid
Emory University
Atlanta, GA, USA

V Sidoravicius
IMPA
Rio de Janeiro, Brazil

G Schneider
Universitat Karlsruhe
Karlsruhe, Germany

J A Smoller
University of Michigan
Ann Arbor MI, USA

K Schneider
Universite de Provence
Marseille, France

M Socolovsky
Universidad Nacional Autonoma de Mexico
Mexico DF, Mexico

B Schroer
Freie Universitat Berlin
Berlin, Germany

J P Solovej
University of Copenhagen
Copenhagen, Denmark

T Schucker
Universite de Marseille
Marseille, France

S Scott A Soshnikov
King's College London University of California at Davis
London, UK Davis, CA, USA

P Selick J M Speight
University of Toronto University of Leeds
Toronto, ON, Canada Leeds, UK

H Spohn B Temple
Technische Universitat Munchen University of California at Davis
Garching, Germany Davis, CA, USA

J Stasheff
Lansdale, PA, USA

R P Thomas
Imperial College
London, UK

D L Stein
University of Arizona
Tucson, AZ, USA

U Tillmann
University of Oxford
Oxford, UK

K S Stelle
Imperial College
London, UK

K P Tod
University of Oxford
Oxford, UK

G Sterman
Stony Brook University
Stony Brook, NY, USA

J A Toth
McGill University
Montreal, QC, Canada

S Stringari
Universita di Trento
Povo, Italy

C A Tracy
University of California at Davis
Davis, CA, USA

S J Summers
University of Florida
Gainesville, FL, USA

A Trautman
Warsaw University
Warsaw, Poland

V S Sunder
The Institute of Mathematical Sciences
Chennai, India

D Treschev
Moscow State University
Moscow, Russia

Y B Suris L Triolo
Technische Universitat Munchen Universita di Roma Tor Vergata
Munchen, Germany Rome, Italy

R J Szabo J Troost
Heriot-Watt University Ecole Normale Superieure
Edinburgh, UK Paris, France

S Tabachnikov Tsou Sheung Tsun
Pennsylvania State University University of Oxford
University Park, PA, USA Oxford, UK

H Tasaki V Turaev
Gakushuin University IRMA
Tokyo, Japan Strasbourg, France

M E Taylor D Ueltschi
University of North Carolina University of Arizona
Chapel Hill, NC, USA Tucson, AZ, USA

R Temam A M Uranga
Indiana University Consejo Superior de Investigaciones Cientificas
Bloomington, IN, USA Madrid, Spain

A Valentini R F Werner
Perimeter Institute for Theoretical Physics Technische Universitat Braunschweig
Waterloo, ON, Canada Braunschweig, Germany

M Vaugon H Widom
Universite P.-M. Curie, Paris VI University of California at Santa Cruz
Paris, France Santa Cruz, CA, USA

P Di Vecchia C M Will
Nordita Washington University
Copenhagen, Denmark St. Louis, MO, USA

A F Verbeure N M J Woodhouse
Institute for Theoretical Physics University of Oxford
KU Leuven, Belgium Oxford, UK

Y Colin de Verdiere Siye Wu
Universite de Grenoble 1 University of Colorado
Saint-Martin d'Heres, France Boulder, CO, USA

M Viana V Wunsch
IMPA Friedrich-Schiller-Universitat Jena
Rio de Janeiro, Brazil Jena, Germany

G Vitiello D R Yafaev
Universita degli Studi di Salerno Universite de Rennes
Baronissi (SA), Italy Rennes, France

D-V Voiculescu M Yamada
University of California at Berkeley Kyoto University
Berkeley, CA, USA Kyoto, Japan

S Waldmann M Yuri
Albert-Ludwigs-Universitat Freiburg Hokkaido University
Freiburg, Germany Sapporo, Japan

J Wambsganss D Zubrinic
Universitat Heidelberg University of Zagreb
Heidelberg, Germany Zagreb, Croatia

R S Ward V Zupanovic
University of Durham University of Zagreb
Durham, UK Zagreb, Croatia

E Wayne R Zecchina
Boston University International Centre for Theoretical Physics (ICTP)
Boston, MA, USA Trieste, Italy

F W Wehrli S Zelditch
University of Pennsylvania Johns Hopkins University
Philadelphia, PA, USA Baltimore, MD, USA

S Zelik M R Zirnbauer
Universitat Stuttgart Universitat Koln
Stuttgart, Germany Koln, Germany

S-C Zhang A Zumpano
Stanford University Universidade Federal de Minas Gerais
Stanford, CA, USA Belo Horizonte, Brazil

M B Ziane
University of Southern California
Los Angeles, CA, USA
CONTENTS LIST BY SUBJECT
Location references refer to the volume number and page number (separated by a colon).

INTRODUCTORY ARTICLES Donaldson-Witten Theory 2:110


Duality in Topological Quantum Field
Classical Mechanics 1:1 Theory 2:118
Differential Geometry 1:33 Finite-Type Invariants 2:340
Electromagnetism 1:40 Four-Manifold Invariants and Physics 2:386
Equilibrium Statistical Mechanics 1:51 Gauge Theoretic Invariants of 4-Manifolds 2:457
Functional Analysis 1:88 h-Pseudodifferential Operators and
Minkowski Spacetime and Special Relativity 1:96 Applications 2:701
Quantum Mechanics 1:109 The Jones Polynomial 3:179
Topology 1:131 Knot Theory and Physics 3:220
Kontsevich Integral 3:231
Large-N and Topological Strings 3:263
PHYSICS SUBJECTS Mathai-Quillen Formalism 3:390
Mathematical Knot Theory 3:399
Classical Mechanics Operator Product Expansion in Quantum Field
Boundary Control Method and Inverse Problems of Theory 3:616
Wave Propagation 1:340 Schwarz-Type Topological Quantum Field
Constrained Systems 1:611 Theory 4:494
Cotangent Bundle Reduction 1:658 Solitons and Other Extended Field
Gravitational N-body Problem (Classical) 2:575 Configurations 4:602
Hamiltonian Fluid Dynamics 2:593 Topological Defects and Their Homotopy
Hamiltonian Systems: Obstructions to Classification 5:257
Integrability 2:624 Topological Gravity, Two-Dimensional 5:264
Infinite-Dimensional Hamiltonian Systems 3:37 Topological Knot Theory and Macroscopic
Inverse Problem in Classical Mechanics 3:156 Physics 5:271
KAM Theory and Celestial Mechanics 3:189 Topological Sigma Models 5:290
Peakons 4:12 Two-Dimensional Conformal Field Theory and
Poisson Reduction 4:79 Vertex Operator Algebras 5:317
Stability Problems in Celestial Mechanics 5:20 WDVV Equations and Frobenius
Symmetry and Symplectic Reduction 5:190 Manifolds 5:438

Classical, Conformal and Topological Condensed Matter and Optics


Field Theory Bose-Einstein Condensates 1:312
Topological Quantum Field Theory: Falicov-Kimball Model 2:283
Overview 5:278 Fractional Quantum Hall Effect 2:402
AdS/CFT Correspondence 1:174 High Tc Superconductor Theory 2:645
Axiomatic Approach to Topological Quantum Field Hubbard Model 2:712
Theory 1:232 Liquid Crystals 3:320
BF Theories 1:257 Negative Refraction and Subdiffraction
Boundary Conformal Field Theory 1:333 Imaging 3:483
Chern-Simons Models: Rigorous Results 1:496

Optical Caustics 3:620 KAM Theory and Celestial Mechanics 3:189


Quantum Phase Transitions 4:289 Lyapunov Exponents and Strange Attractors 3:349
Quasiperiodic Systems 4:308 Multiscale Approaches 3:465
Renormalization: Statistical Mechanics and Normal Forms and Semiclassical
Condensed Matter 4:407 Approximation 3:578
Short-Range Spin Glasses: The Metastate Point-Vortex Dynamics 4:66
Approach 4:570 Poisson Reduction 4:79
Topological Defects and Their Homotopy Polygonal Billiards 4:84
Classification 5:257 Quasiperiodic Systems 4:308
Random Dynamical Systems 4:330
Regularization For Dynamical ζ-Functions 4:386
Resonances 4:415
Riemann-Hilbert Problem 4:436
Semiclassical Spectra and Closed Orbits 4:512
Separatrix Splitting 4:535
Stability Problems in Celestial Mechanics 5:20
Stability Theory and KAM 5:26
Symmetry and Symmetry Breaking in Dynamical Systems 5:184
Symmetry and Symplectic Reduction 5:190
Synchronization of Chaos 5:213
Universality and Renormalization 5:343
Weakly Coupled Oscillators 5:448

Disordered Systems
Cellular Automata 1:455
Lagrangian Dispersion (Passive Scalar) 3:255
Mean Field Spin Glasses and Neural Networks 3:407
Percolation Theory 4:21
Random Matrix Theory in Physics 4:338
Random Walks in Random Environments 4:353
Short-Range Spin Glasses: The Metastate Approach 4:570
Spin Glasses 4:655
Stochastic Loewner Evolutions 5:80
Two-Dimensional Ising Model 5:322
Wulff Droplets 5:462

Equilibrium Statistical Mechanics
Bethe Ansatz 1:253
Cluster Expansion 1:531
Dynamical Systems Dimer Problems 2:61
Averaging Methods 1:226 Eight Vertex and Hard Hexagon Models 2:155
Bifurcations of Periodic Orbits 1:285 Falicov-Kimball Model 2:283
Billiards in Bounded Convex Domains 1:296 Fermionic Systems 2:300
Central Manifolds, Normal Forms 1:467 Finitely Correlated States 2:334
Cellular Automata 1:455 Holonomic Quantum Fields 2:660
Chaos and Attractors 1:477 Hubbard Model 2:712
Cotangent Bundle Reduction 1:658 Large Deviations in Equilibrium Statistical
Diagrammatic Techniques in Perturbation Mechanics 3:261
Theory 2:54 Metastable States 3:417
Dissipative Dynamical Systems of Infinite Phase Transitions in Continuous Systems 4:53
Dimension 2:101 Pirogov-Sinai Theory 4:60
Dynamical Systems and Thermodynamics 2:125 Quantum Central-Limit Theorems 4:130
Dynamical Systems in Mathematical Physics: Quantum Phase Transitions 4:289
An Illustration from Water Waves 2:133 Quantum Spin Systems 4:295
Entropy and Quantitative Transversality 2:237 Quantum Statistical Mechanics: Overview 4:302
Ergodic Theory 2:250 Reflection Positivity and Phase Transitions 4:376
Fractal Dimensions in Dynamics 2:394 Short-Range Spin Glasses: The Metastate
Generic Properties of Dynamical Systems 2:494 Approach 4:570
Gravitational N-Body Problem (Classical) 2:575 Statistical Mechanics and Combinatorial
Hamiltonian Fluid Dynamics 2:593 Problems 5:50
Hamiltonian Systems: Stability and Instability Statistical Mechanics of Interfaces 5:55
Theory 2:631 Superfluids 5:115
Holomorphic Dynamics 2:652 Toeplitz Determinants and Statistical
Homeomorphisms and Diffeomorphisms of the Mechanics 5:244
Circle 2:665 Two-Dimensional Ising Model 5:322
Homoclinic Phenomena 2:672 Wulff Droplets 5:462
h-Pseudodifferential Operators and
Applications 2:701
Hyperbolic Billiards 2:716 Fluid Dynamics
Hyperbolic Dynamical Systems 2:721 Bifurcations in Fluid Dynamics 1:281
Isomonodromic Deformations 3:173 Breaking Water Waves 1:383

Capillary Surfaces 1:431 Renormalization: General Theory 4:399


Cauchy Problem for Burgers-Type Equations 1:446 Seiberg-Witten Theory 4:503
Compressible Flows: Mathematical Theory 1:595 Standard Model of Particle Physics 5:32
Fluid Mechanics: Numerical Methods 2:365 Supergravity 5:122
Geophysical Dynamics 2:534 Supersymmetric Particle Models 5:140
Hamiltonian Fluid Dynamics 2:593 Symmetry Breaking in Field Theory 5:198
Incompressible Euler Equations: Mathematical Twistor Theory: Some Applications 5:303
Theory 3:10 Two-Dimensional Models 5:328
Interfaces and Multicomponent Fluids 3:135
Intermittency in Turbulence 3:144
Inviscid Flows 3:160 General Relativity
Korteweg-de Vries Equation and Other Modulation General Relativity: Overview 2:487
Equations 3:239 Asymptotic Structure and Conformal
Lagrangian Dispersion (Passive Scalar) 3:255 Infinity 1:221
Magnetohydrodynamics 3:375 Black Hole Mechanics 1:300
Newtonian Fluids and Thermohydraulics 3:492 Boundaries for Spacetimes 1:326
Non-Newtonian Fluids 3:560 Brane Worlds 1:367
Partial Differential Equations: Some Examples 4:6 Canonical General Relativity 1:412
Peakons 4:12 Critical Phenomena in Gravitational
Stability of Flows 5:1 Collapse 1:668
Superfluids 5:115 Computational Methods in General Relativity:
Turbulence Theories 5:295 The Theory 1:604
Variational Methods in Turbulence 5:351 Cosmology: Mathematical Aspects 1:653
Viscous Incompressible Fluids: Mathematical Dirac Fields in Gravitation and Nonabelian Gauge
Theory 5:369 Theory 2:67
Vortex Dynamics 5:390 Einstein-Cartan Theory 2:189
Wavelets: Application to Turbulence 5:408 Einstein's Equations with Matter 2:195
Einstein Equations: Exact Solutions 2:165
Einstein Equations: Initial Value
Gauge Theory Formulation 2:173
Abelian and Nonabelian Gauge Theories Using General Relativity: Experimental Tests 2:481
Differential Forms 1:141 Geometric Analysis and General Relativity 2:502
Abelian Higgs Vortices 1:151 Geometric Flows and the Penrose
AdS/CFT Correspondence 1:174 Inequality 2:510
Aharonov-Bohm Effect 1:191 Inequality 2:510
Anomalies 1:205 Gravitational Waves 2:582
BRST Quantization 1:386 Hamiltonian Reduction of Einstein's
Chern-Simons Models: Rigorous Results 1:496 Equations 2:607
Dirac Fields in Gravitation and Nonabelian Gauge Minimal Submanifolds 3:420
Theory 2:67 Newtonian Limit of General Relativity 3:503
Donaldson-Witten Theory 2:110 Quantum Field Theory in Curved
Effective Field Theories 2:139 Spacetime 4:202
Electric-Magnetic Duality 2:201 Relativistic Wave Equations Including Higher Spin
Electroweak Theory 2:209 Fields 4:391
Exact Renormalization Group 2:272 Shock Wave Refinement of the Friedman-
Gauge Theories from Strings 2:463 Robertson-Walker Metric 4:559
Gauge Theory: Mathematical Applications 2:468 Spacetime Topology, Causal Structure and
Instantons: Topological Aspects 3:44 Singularities 4:617
Large-N and Topological Strings 3:263 Spinors and Spin Coefficients 4:667
Lattice Gauge Theory 3:275 Stability of Minkowski Space 5:14
Measure on Loop Spaces 3:413 Stationary Black Holes 5:38
Noncommutative Geometry and the Standard Twistors 5:311
Model 3:509
Nonperturbative and Topological Aspects of Gauge
Theory 3:568
Perturbative Renormalization Theory and Integrable Systems
BRST 4:41 Integrable Systems: Overview 3:106
Quantum Chromodynamics 4:144 Abelian Higgs Vortices 1:151
Quantum Electrodynamics and Its Precision Affine Quantum Groups 1:183
Tests 4:168 Backlund Transformations 1:241

Bethe Ansatz 1:253 Phase Transition Dynamics 4:47


Bi-Hamiltonian Methods in Soliton Theory 1:290 Stochastic Resonance 5:86
Boundary-Value Problems For Integrable
Equations 1:346
Calogero-Moser-Sutherland Systems of Quantum Field Theory
Nonrelativistic and Relativistic Type 1:403 Quantum Field Theory: A Brief
∂̄-Approach to Integrable Systems 2:34 Introduction 4:212
Eigenfunctions of Quantum Completely Integrable AdS/CFT Correspondence 1:174
Systems 2:148 Algebraic Approach to Quantum Field
Functional Equations and Integrable Systems 2:425 Theory 1:198
Holonomic Quantum Fields 2:660 Anomalies 1:205
Instantons: Topological Aspects 3:44 Axiomatic Quantum Field Theory 1:234
Integrability and Quantum Field Theory 3:50 Batalin-Vilkovisky Quantization 1:247
Integrable Discrete Systems 3:59 Bosons and Fermions in External Fields 1:318
Integrable Systems and Algebraic Geometry 3:65 BRST Quantization 1:386
Integrable Systems and Discrete Geometry 3:78 Constrained Systems 1:611
Integrable Systems and Recursion Operators on Constructive Quantum Field Theory 1:617
Symplectic and Jacobi Manifolds 3:87 Current Algebra 1:674
Integrable Systems and the Inverse Scattering Dirac Operator and Dirac Field 2:74
Method 3:93 Dispersion Relations 2:87
Integrable Systems in Random Matrix Effective Field Theories 2:139
Theory 3:102 Electroweak Theory 2:209
Isochronous Systems 3:166 Euclidean Field Theory 2:256
Nonlinear Schrodinger Equations 3:552 Exact Renormalization Group 2:272
Painleve Equations 4:1 Gerbes in Quantum Field Theory 2:539
Peakons 4:12 Holonomic Quantum Fields 2:660
Quantum Calogero-Moser Systems 4:123 Hopf Algebra Structure of Renormalizable
Riemann-Hilbert Methods in Integrable Quantum Field Theory 2:678
Systems 4:429 Indefinite Metric 3:17
Sine-Gordon Equation 4:576 Integrability and Quantum Field Theory 3:50
Solitons and Kac-Moody Lie Algebras 4:594 Large-N and Topological Strings 3:263
Toda Lattices 5:235 Nonperturbative and Topological Aspects of Gauge
Twistor Theory: Some Applications 5:303 Theory 3:568
Yang-Baxter Equations 5:465 Operator Product Expansion in Quantum Field
Theory 3:616
Quantum Fields with Indefinite Metric: Non-Trivial
M-Theory see String Theory and Models 4:216
Perturbation Theory and Its Techniques 4:28
M-Theory Perturbative Renormalization Theory and
BRST 4:41
Nonequilibrium Statistical Mechanics Quantum Electrodynamics and Its Precision
Nonequilibrium Statistical Mechanics (Stationary): Tests 4:168
Overview 3:530 Quantum Fields with Topological Defects 4:221
Adiabatic Piston 1:160 Quantum Field Theory in Curved
Boltzmann Equation (Classical and Spacetime 4:202
Quantum) 1:306 Quantum Phase Transitions 4:289
Glassy Disordered Systems: Dynamical Renormalization: General Theory 4:399
Evolution 2:553 Renormalization: Statistical Mechanics and
Fourier Law 2:374 Condensed Matter 4:407
Interacting Particle Systems and Hydrodynamic Scattering, Asymptotic Completeness and Bound
Equations 3:123 States 4:475
Interacting Stochastic Particle Systems 3:130 Scattering in Relativistic Quantum Field Theory:
Kinetic Equations 3:200 Fundamental Concepts and Tools 4:456
Macroscopic Fluctuations and Thermodynamic Scattering in Relativistic Quantum Field Theory:
Functionals 3:357 The Analytic Program 4:465
Nonequilibrium Statistical Mechanics: Dynamical Seiberg-Witten Theory 4:503
Systems Approach 3:540 Standard Model of Particle Physics 5:32
Nonequilibrium Statistical Mechanics: Interaction Supergravity 5:122
between Theory and Numerical Supersymmetric Particle Models 5:140
Simulations 3:544 Symmetries and Conservation Laws 5:166

Symmetries in Quantum Field Theory: Algebraic Quantum Mechanics: Weak Measurements 4:276
Aspects 5:179 Quantum n-Body Problem 4:283
Symmetries in Quantum Field Theory of Lower Quantum Spin Systems 4:295
Spacetime Dimensions 5:172 Quasiperiodic Systems 4:308
Symmetry Breaking in Field Theory 5:198 Schrodinger Operators 4:487
Two-Dimensional Models 5:328 Stability of Matter 5:8
Thermal Quantum Field Theory 5:227 Stationary Phase Approximation 5:44
Tomita-Takesaki Modular Theory 5:251 Topological Defects and Their Homotopy
Topological Defects and Their Homotopy Topological Defects and Their Homotopy
Classification 5:257 Classification 5:257
Twistor Theory: Some Applications 5:303

Quantum Gravity
Knot Invariants and Quantum Gravity 3:215
Knot Theory and Physics 3:220
Loop Quantum Gravity 3:339
Quantum Cosmology 4:153
Quantum Dynamics in Loop Quantum Gravity 4:165
Quantum Field Theory in Curved Spacetime 4:202
Quantum Geometry and Its Applications 4:230
Spin Foams 4:645
Wheeler-De Witt Theory 5:453

Quantum Information and Computation
Capacities Enhanced By Entanglement 1:418
Capacity for Quantum Information 1:424
Channels in Quantum Information Theory 1:472
Entanglement 2:228
Entanglement Measures 2:233
Finite Weyl Systems 2:328
Optimal Cloning of Quantum States 3:628
Quantum Channels: Classical Capacity 4:142
Quantum Entropy 4:177
Quantum Error Correction and Fault Tolerance 4:196
Source Coding in Quantum Information Theory 4:609

String Theory and M-Theory
AdS/CFT Correspondence 1:174
Brane Construction of Gauge Theories 1:360
Branes and Black Hole Statistical Mechanics 1:373
Brane Worlds 1:367
Calibrated Geometry and Special Lagrangian Submanifolds 1:398
Compactification of Superstring Theory 1:586
Derived Categories 2:41
Fourier-Mukai Transform in String Theory 2:379
Gauge Theories from Strings 2:463
Large-N and Topological Strings 3:263
Large-N Dualities 3:269
Mirror Symmetry: A Geometric Survey 3:439
Noncommutative Geometry from Strings 3:515
Random Algebraic Geometry, Attractors and Flux Vacua 4:323
Riemannian Holonomy Groups and Exceptional Holonomy 4:441
String Field Theory 5:94
String Theory: Phenomenology 5:103
String Topology: Homotopy and Geometric Perspectives 5:111
Superstring Theories 5:133
Twistor Theory: Some Applications 5:303
Two-Dimensional Conformal Field Theory and Vertex Operator Algebras 5:317

Quantum Mechanics Yang-Mills Theory see Gauge Theory


Aharonov-Bohm Effect 1:191
Arithmetic Quantum Chaos 1:212
Coherent States 1:537
RELATED MATHEMATICS
Geometric Phases 2:528 SUBJECTS
h-Pseudodifferential Operators and
Applications 2:701 Algebraic Techniques
N-particle Quantum Scattering 3:585 Affine Quantum Groups 1:183
Normal Forms and Semiclassical Braided and Modular Tensor Categories 1:351
Approximation 3:578 Clifford Algebras and Their
Quantum Entropy 4:177 Representations 1:518
Quantum Ergodicity and Mixing of Derived Categories 2:41
Eigenfunctions 4:183 Finite-Dimensional Algebras and Quivers 2:313
Quantum Mechanical Scattering Finite Group Symmetry Breaking 2:322
Theory 4:251 Hopf Algebras and Q-Deformation Quantum
Quantum Mechanics: Foundations 4:260 Groups 2:687
Quantum Mechanics: Generalizations 4:265 Operads 3:609

Algebraic Topology Discrete Mathematics


Characteristic Classes 1:488 Arithmetic Quantum Chaos 1:212
Cohomology Theories 1:545 Combinatorics: Overview 1:553
Derived Categories 2:41 Number Theory in Physics 3:600
Equivariant Cohomology and the Cartan Quasiperiodic Systems 4:308
Model 2:242
Fourier-Mukai Transform in String
Theory 2:379
Functional Analysis and Operator
Index Theorems 3:23 Algebras
Intersection Theory 3:151 Backlund Transformations 1:241
K-theory 3:246 C*-Algebras and their Classification 1:393
Mathai-Quillen Formalism 3:390 Coherent States 1:537
Operads 3:609 Free Probability Theory 2:417
Spectral Sequences 4:623 Functional Integration in Quantum Physics 2:434
String Topology: Homotopy and Geometric Gauge Theory: Mathematical Applications 2:468
Perspectives 5:111 h-Pseudodifferential Operators and
Applications 2:701
The Jones Polynomial 3:179
K-Theory 3:246
Leray-Schauder Theory and Mapping Degree 3:281
Ljusternik-Schnirelman Theory 3:328
Ordinary Special Functions 3:637
Positive Maps on C*-Algebras 4:88
Quantum Dynamical Semigroups 4:159
Saddle Point Problems 4:447
Spectral Theory of Linear Operators 4:633
Tomita-Takesaki Modular Theory 5:251
von Neumann Algebras: Introduction, Modular Theory, and Classification Theory 5:379
von Neumann Algebras: Subfactor Theory 5:385
Wavelets: Applications 5:420
Wavelets: Mathematical Theory 5:426

Complex Geometry
Derived Categories 2:41
Gauge Theory: Mathematical Applications 2:468
Fourier-Mukai Transform in String Theory 2:379
Knot Homologies 3:208
Mirror Symmetry: A Geometric Survey 3:439
Moduli Spaces: An Introduction 3:449
Quillen Determinant 4:315
Riemann Surfaces 4:419
Riemann-Hilbert Problem 4:436
Several Complex Variables: Basic Geometric Theory 4:540
Several Complex Variables: Compact Manifolds 4:551
Twistor Theory: Some Applications 5:303

Lie Groups and Lie Algebras


Differential Geometry Classical Groups and Homogeneous Spaces 1:500
Calibrated Geometry and Special Lagrangian Compact Groups and Their
Submanifolds 1:398 Representations 1:576
Capillary Surfaces 1:431 Finite-Dimensional Algebras and Quivers 2:313
Characteristic Classes 1:488 Lie Groups: General Theory 3:286
Derived Categories 2:41 Lie Superalgebras and Their
Einstein Manifolds 2:182 Representations 3:305
Fourier-Mukai Transform in String Theory 2:379 Lie, Symplectic, and Poisson Groupoids and Their
Gauge Theory: Mathematical Applications 2:468 Lie Algebroids 3:312
Index Theorems 3:23 Pseudo-Riemannian Nilpotent Lie Groups 4:94
Intersection Theory 3:151 Riemann-Hilbert Problem 4:436
K-Theory 3:246 Solitons and Kac-Moody Lie Algebras 4:594
Lorentzian Geometry 3:343
Mathai-Quillen Formalism 3:390
Moduli Spaces: An Introduction 3:449 Low Dimensional Geometry
Quillen Determinant 4:315 Finite-type Invariants of 3-Manifolds 2:348
Pseudo-Riemannian Nilpotent Lie Groups 4:94 Floer Homology 2:356
Riemann-Hilbert Problem 4:436 Four-manifold Invariants and Physics 2:386
Riemannian Holonomy Groups and Exceptional Gauge Theoretic Invariants of 4-Manifolds 2:457
Holonomy 4:441 Gauge Theory: Mathematical Applications 2:468
Singularity and Bifurcation Theory 4:588 The Jones Polynomial 3:179
Supermanifolds 5:128 Knot Invariants and Quantum Gravity 3:215
Twistor Theory: Some Applications 5:303 Large-N and Topological Strings 3:263
Quantum 3-Manifold Invariants 4:117
Singularities of the Ricci Flow 4:584
Twistor Theory: Some Applications 5:303

Noncommutative Geometry
Hopf Algebra Structure of Renormalizable Quantum Field Theory 2:678
Noncommutative Geometry and the Standard Model 3:509
Noncommutative Geometry from Strings 3:515
Noncommutative Tori, Yang-Mills, and String Theory 3:524
Path Integrals in Noncommutative Geometry 4:8
Quantum Group Differentials, Bundles and Gauge Theory 4:236
Quantum Hall Effect 4:244
Riemann-Hilbert Problem 4:436

Ordinary and Partial Differential Equations
Bifurcation Theory 1:275
Boltzmann Equation (Classical and Quantum) 1:306
Boundary Control Method and Inverse Problems of Wave Propagation 1:340
Capillary Surfaces 1:431
Cauchy Problem for Burgers-Type Equations 1:446
Elliptic Differential Equations: Linear Theory 2:216
Evolution Equations: Linear and Nonlinear 2:265
Fluid Mechanics: Numerical Methods 2:365
Ginzburg-Landau Equation 2:547
Image Processing: Mathematics 3:1
Inequalities in Sobolev Spaces 3:32
Isomonodromic Deformations 3:173
Kinetic Equations 3:200
Localization For Quasiperiodic Potentials 3:333
Magnetic Resonance Imaging 3:367
Minimal Submanifolds 3:420
Painleve Equations 4:1
Partial Differential Equations: Some Examples 4:6
Relativistic Wave Equations Including Higher Spin Fields 4:391
Riemann-Hilbert Problem 4:436
Semilinear Wave Equations 4:518
Separation of Variables for Differential Equations 4:526
Stationary Phase Approximation 5:44
Symmetric Hyperbolic Systems and Shock Waves 5:160
Wave Equations and Diffraction 5:401

Quantization Methods and Path Integration
Coherent States 1:537
Deformation Quantization 2:1
Deformation Quantization and Representation Theory 2:9
Deformation Theory 2:16
Deformations of the Poisson Bracket on a Symplectic Manifold 2:24
Fedosov Quantization 2:291
Feynman Path Integrals 2:307
Functional Integration in Quantum Physics 2:434
Path Integrals in Noncommutative Geometry 4:8
Regularization for Dynamical ζ-Functions 4:386

Quantum Groups
Affine Quantum Groups 1:183
Bicrossproduct Hopf Algebras and Noncommutative Spacetime 1:265
Braided and Modular Tensor Categories 1:351
Classical r-Matrices, Lie Bialgebras, and Poisson Lie Groups 1:511
Hopf Algebras and q-Deformation Quantum Groups 2:687
Hopf Algebra Structure of Renormalizable Quantum Field Theory 2:678
q-Special Functions 4:105
Quantum Group Differentials, Bundles and Gauge Theory 4:236
Yang-Baxter Equations 5:465

Stochastic Methods
Determinantal Random Fields 2:47
Free Probability Theory 2:417
Growth Processes in Random Matrix Theory 2:586
Integrable Systems in Random Matrix Theory 3:102
Lagrangian Dispersion (Passive Scalar) 3:255
Malliavin Calculus 3:383
Measure on Loop Spaces 3:413
Random Matrix Theory in Physics 4:338
Random Partitions 4:347
Random Walks in Random Environments 4:353
Stochastic Differential Equations 5:63
Stochastic Hydrodynamics 5:71
Stochastic Loewner Evolutions 5:80
Supersymmetry Methods in Random Matrix Theory 5:151
Symmetry Classes in Random Matrix Theory 5:204

Symplectic Geometry and Topology
Classical r-Matrices, Lie Bialgebras, and Poisson Lie Groups 1:511
Contact Manifolds 1:631
Deformations of the Poisson Bracket on a Symplectic Manifold 2:24
Fedosov Quantization 2:291
Floer Homology 2:356
Graded Poisson Algebras 2:560
Hamiltonian Group Actions 2:600
Mirror Symmetry: A Geometric Survey 3:439
Multi-Hamiltonian Systems 3:459
Recursion Operators in Classical Mechanics 4:371
Singularity and Bifurcation Theory 4:588
Stationary Phase Approximation 5:44

Variational Techniques
Capillary Surfaces 1:431
Control Problems in Mathematical Physics 1:636
Convex Analysis and Duality Methods 1:642
Free Interfaces and Free Discontinuities: Variational Problems 2:411
Γ-Convergence and Homogenization 2:449
Gauge Theory: Mathematical Applications 2:468
Geometric Measure Theory 2:520
Hamilton-Jacobi Equations and Dynamical Systems: Variational Aspects 2:636
Minimax Principle in the Calculus of Variations 3:432
Optimal Transportation 3:632
Variational Techniques for Ginzburg-Landau Energies 5:355
Variational Techniques for Microstructures 5:363
CONTENTS

VOLUME 1

Introductory Article: Classical Mechanics G Gallavotti 1


Introductory Article: Differential Geometry S Paycha 33
Introductory Article: Electromagnetism N M J Woodhouse 40
Introductory Article: Equilibrium Statistical Mechanics G Gallavotti 51
Introductory Article: Functional Analysis S Paycha 88
Introductory Article: Minkowski Spacetime and Special Relativity G L Naber 96
Introductory Article: Quantum Mechanics G F dell'Antonio 109
Introductory Article: Topology Tsou Sheung Tsun 131

A
Abelian and Nonabelian Gauge Theories Using Differential Forms A C Hirshfeld 141
Abelian Higgs Vortices J M Speight 151
Adiabatic Piston Ch Gruber and A Lesne 160
AdS/CFT Correspondence C P Herzog and I R Klebanov 174
Affine Quantum Groups G W Delius and N MacKay 183
Aharonov-Bohm Effect M Socolovsky 191
Algebraic Approach to Quantum Field Theory R Brunetti and K Fredenhagen 198
Anderson Localization see Localization for Quasiperiodic Potentials
Anomalies S L Adler 205
Arithmetic Quantum Chaos J Marklof 212
Asymptotic Structure and Conformal Infinity J Frauendiener 221
Averaging Methods A I Neishtadt 226
Axiomatic Approach to Topological Quantum Field Theory C Blanchet and V Turaev 232
Axiomatic Quantum Field Theory B Kuckert 234

B
Backlund Transformations D Levi 241
Batalin-Vilkovisky Quantization A C Hirshfeld 247
Bethe Ansatz M T Batchelor 253
BF Theories M Blau 257
Bicrossproduct Hopf Algebras and Noncommutative Spacetime S Majid 265
Bifurcation Theory M Haragus and G Iooss 275
Bifurcations in Fluid Dynamics G Schneider 281
Bifurcations of Periodic Orbits J-P Francoise 285
Bi-Hamiltonian Methods in Soliton Theory M Pedroni 290
Billiards in Bounded Convex Domains S Tabachnikov 296
Black Hole Mechanics A Ashtekar 300
Boltzmann Equation (Classical and Quantum) M Pulvirenti 306
Bose-Einstein Condensates F Dalfovo, L P Pitaevskii and S Stringari 312
Bosons and Fermions in External Fields E Langmann 318
Boundaries for Spacetimes S G Harris 326
Boundary Conformal Field Theory J Cardy 333
Boundary Control Method and Inverse Problems of Wave Propagation M I Belishev 340
Boundary-Value Problems for Integrable Equations B Pelloni 346
Braided and Modular Tensor Categories V Lyubashenko 351
Brane Construction of Gauge Theories S L Cacciatori 360
Brane Worlds R Maartens 367
Branes and Black Hole Statistical Mechanics S R Das 373
Breaking Water Waves A Constantin 383
BRST Quantization M Henneaux 386

C
C*-Algebras and their Classification G A Elliott 393
Calibrated Geometry and Special Lagrangian Submanifolds D D Joyce 398
Calogero-Moser-Sutherland Systems of Nonrelativistic and Relativistic Type S N M Ruijsenaars 403
Canonical General Relativity C Rovelli 412
Capacities Enhanced by Entanglement P Hayden 418
Capacity for Quantum Information D Kretschmann 424
Capillary Surfaces R Finn 431
Cartan Model see Equivariant Cohomology and the Cartan Model
Cauchy Problem for Burgers-Type Equations G M Henkin 446
Cellular Automata M Bruschi and F Musso 455
Central Manifolds, Normal Forms P Bonckaert 467
Channels in Quantum Information Theory M Keyl 472
Chaos and Attractors R Gilmore 477
Characteristic Classes P B Gilkey, R Ivanova and S Nikcevic 488
Chern-Simons Models: Rigorous Results A N Sengupta 496
Classical Groups and Homogeneous Spaces S Gindikin 500
Classical r-Matrices, Lie Bialgebras, and Poisson Lie Groups M A Semenov-Tian-Shansky 511
Clifford Algebras and Their Representations A Trautman 518
Cluster Expansion R Kotecky 531
Coherent States S T Ali 537
Cohomology Theories U Tillmann 545
Combinatorics: Overview C Krattenthaler 553
Compact Groups and Their Representations A Kirillov and A Kirillov, Jr. 576
Compactification of Superstring Theory M R Douglas 586
Compressible Flows: Mathematical Theory G-Q Chen 595
Computational Methods in General Relativity: The Theory M W Choptuik 604
Confinement see Quantum Chromodynamics
Conformal Geometry see Two-dimensional Conformal Field Theory and Vertex Operator Algebras
Conservation Laws see Symmetries and Conservation Laws
Constrained Systems M Henneaux 611
Constructive Quantum Field Theory G Gallavotti 617
Contact Manifolds J B Etnyre 631
Control Problems in Mathematical Physics B Piccoli 636
Convex Analysis and Duality Methods G Bouchitte 642
Cosmic Censorship see Spacetime Topology, Causal Structure and Singularities
Cosmology: Mathematical Aspects G F R Ellis 653
Cotangent Bundle Reduction J-P Ortega and T S Ratiu 658
Critical Phenomena in Gravitational Collapse C Gundlach 668
Current Algebra G A Goldin 674

VOLUME 2
D
Deformation Quantization A C Hirshfeld 1
Deformation Quantization and Representation Theory S Waldmann 9
Deformation Theory M J Pflaum 16
Deformations of the Poisson Bracket on a Symplectic Manifold S Gutt and S Waldmann 24

∂̄-Approach to Integrable Systems P G Grinevich 34
Derived Categories E R Sharpe 41
Determinantal Random Fields A Soshnikov 47
Diagrammatic Techniques in Perturbation Theory G Gentile 54
Dimer Problems R Kenyon 61
Dirac Fields in Gravitation and Nonabelian Gauge Theory J A Smoller 67
Dirac Operator and Dirac Field S N M Ruijsenaars 74
Dispersion Relations J Bros 87
Dissipative Dynamical Systems of Infinite Dimension M Efendiev, S Zelik and A Miranville 101
Donaldson Invariants see Gauge Theoretic Invariants of 4-Manifolds
Donaldson-Witten Theory M Marino 110
Duality in Topological Quantum Field Theory C Lozano and J M F Labastida 118
Dynamical Systems and Thermodynamics A Carati, L Galgani and A Giorgilli 125
Dynamical Systems in Mathematical Physics: An Illustration from Water Waves O Goubet 133

E
Effective Field Theories G Ecker 139
Eigenfunctions of Quantum Completely Integrable Systems J A Toth 148
Eight Vertex and Hard Hexagon Models P A Pearce 155
Einstein Equations: Exact Solutions Jiří Bičák 165
Einstein Equations: Initial Value Formulation J Isenberg 173
Einstein Manifolds A S Dancer 182
Einstein-Cartan Theory A Trautman 189
Einstein's Equations with Matter Y Choquet-Bruhat 195
Electric-Magnetic Duality Tsou Sheung Tsun 201
Electroweak Theory K Konishi 209
Elliptic Differential Equations: Linear Theory C Amrouche, M Krbec, S Necasova and B Lucquin-Desreux 216
Entanglement R F Werner 228
Entanglement Measures R F Werner 233
Entropy and Quantitative Transversality G Comte 237
Equivariant Cohomology and the Cartan Model E Meinrenken 242
Ergodic Theory M Yuri 250
Euclidean Field Theory F Guerra 256
Evolution Equations: Linear and Nonlinear J Escher 265
Exact Renormalization Group P K Mitter 272

F
Falicov-Kimball Model Ch Gruber and D Ueltschi 283
Fedosov Quantization N Neumaier 291
Feigenbaum Phenomenon see Universality and Renormalization
Fermionic Systems V Mastropietro 300
Feynman Path Integrals S Mazzucchi 307
Finite-Dimensional Algebras and Quivers A Savage 313
Finite Group Symmetry Breaking G Gaeta 322
Finite Weyl Systems D-M Schlingemann 328
Finitely Correlated States R F Werner 334
Finite-Type Invariants D Bar-Natan 340
Finite-Type Invariants of 3-Manifolds T T Q Le 348
Floer Homology P B Kronheimer 356
Fluid Mechanics: Numerical Methods J-L Guermond 365
Fourier Law F Bonetto and L Rey-Bellet 374
Fourier-Mukai Transform in String Theory B Andreas 379
Four-Manifold Invariants and Physics C Nash 386
Fractal Dimensions in Dynamics V Zupanovic and D Zubrinic 394
Fractional Quantum Hall Effect J K Jain 402
Free Interfaces and Free Discontinuities: Variational Problems G Buttazzo 411
Free Probability Theory D-V Voiculescu 417
Frobenius Manifolds see WDVV Equations and Frobenius Manifolds
Functional Equations and Integrable Systems H W Braden 425
Functional Integration in Quantum Physics C DeWitt-Morette 434

G
Γ-Convergence and Homogenization G Dal Maso 449
Gauge Theoretic Invariants of 4-Manifolds S Bauer 457
Gauge Theories from Strings P Di Vecchia 463
Gauge Theory: Mathematical Applications S K Donaldson 468
General Relativity: Experimental Tests C M Will 481
General Relativity: Overview R Penrose 487
Generic Properties of Dynamical Systems C Bonatti 494
Geometric Analysis and General Relativity L Andersson 502
Geometric Flows and the Penrose Inequality H Bray 510
Geometric Measure Theory G Alberti 520
Geometric Phases P Levay 528
Geophysical Dynamics M B Ziane 534
Gerbes in Quantum Field Theory J Mickelsson 539
Ginzburg-Landau Equation Y Morita 547
Glassy Disordered Systems: Dynamical Evolution S Franz 553
Graded Poisson Algebras A S Cattaneo, D Fiorenza and R Longoni 560
Gravitational Lensing J Wambsganss 567
Gravitational N-Body Problem (Classical) D C Heggie 575
Gravitational Waves G Gonzalez and J Pullin 582
Growth Processes in Random Matrix Theory K Johansson 586

H
Hamiltonian Fluid Dynamics P J Morrison 593
Hamiltonian Group Actions L C Jeffrey 600
Hamiltonian Reduction of Einstein's Equations A E Fischer and V Moncrief 607
Hamiltonian Systems: Obstructions to Integrability M Irigoyen 624
Hamiltonian Systems: Stability and Instability Theory P Bernard 631
Hamilton-Jacobi Equations and Dynamical Systems: Variational Aspects A Siconolfi 636
Hard Hexagon Model see Eight Vertex and Hard Hexagon Models
High Tc Superconductor Theory S-C Zhang 645
Holomorphic Dynamics M Lyubich 652
Holonomic Quantum Fields J Palmer 660
Homeomorphisms and Diffeomorphisms of the Circle A Zumpano and A Sarmiento 665
Homoclinic Phenomena S E Newhouse 672
Hopf Algebra Structure of Renormalizable Quantum Field Theory D Kreimer 678
Hopf Algebras and q-Deformation Quantum Groups S Majid 687
h-Pseudodifferential Operators and Applications B Helffer 701
Hubbard Model H Tasaki 712
Hydrodynamic Equations see Interacting Particle Systems and Hydrodynamic Equations
Hyperbolic Billiards M P Wojtkowski 716
Hyperbolic Dynamical Systems B Hasselblatt 721

VOLUME 3
I
Image Processing: Mathematics G Aubert and P Kornprobst 1
Incompressible Euler Equations: Mathematical Theory D Chae 10
Indefinite Metric H Gottschalk 17
Index Theorems P B Gilkey, K Kirsten, R Ivanova and J H Park 23
Inequalities in Sobolev Spaces M Vaugon 32
Infinite-Dimensional Hamiltonian Systems R Schmid 37
Instantons: Topological Aspects M Jardim 44
Integrability and Quantum Field Theory T J Hollowood 50
Integrable Discrete Systems O Ragnisco 59
Integrable Systems and Algebraic Geometry E Previato 65
Integrable Systems and Discrete Geometry A Doliwa and P M Santini 78
Integrable Systems and Recursion Operators on Symplectic and Jacobi Manifolds R Caseiro and J M Nunes da Costa 87
Integrable Systems and the Inverse Scattering Method A S Fokas 93
Integrable Systems in Random Matrix Theory C A Tracy and H Widom 102
Integrable Systems: Overview Francesco Calogero 106
Interacting Particle Systems and Hydrodynamic Equations C Landim 123
Interacting Stochastic Particle Systems H Spohn 130
Interfaces and Multicomponent Fluids J Kim and J Lowengrub 135
Intermittency in Turbulence J Jimenez 144
Intersection Theory A Kresch 151
Inverse Problem in Classical Mechanics R G Novikov 156
Inverse Problems in Wave Propagation see Boundary Control Method and Inverse Problems of Wave Propagation
Inviscid Flows R Robert 160
Ising Model see Two-Dimensional Ising Model
Isochronous Systems Francesco Calogero 166
Isomonodromic Deformations V P Kostov 173

J
The Jones Polynomial V F R Jones 179

K
Kac-Moody Lie Algebras see Solitons and Kac-Moody Lie Algebras
KAM Theory and Celestial Mechanics L Chierchia 189
Kinetic Equations C Bardos 200
Knot Homologies J Rasmussen 208
Knot Invariants and Quantum Gravity R Gambini and J Pullin 215
Knot Theory and Physics L H Kauffman 220
Kontsevich Integral S Chmutov and S Duzhin 231
Korteweg-de Vries Equation and Other Modulation Equations G Schneider and E Wayne 239
K-Theory V Mathai 246

L
Lagrangian Dispersion (Passive Scalar) G Falkovich 255
Large Deviations in Equilibrium Statistical Mechanics S Shlosman 261
Large-N and Topological Strings R Gopakumar 263
Large-N Dualities A Grassi 269
Lattice Gauge Theory A Di Giacomo 275
Leray-Schauder Theory and Mapping Degree J Mawhin 281
Lie Bialgebras see Classical r-Matrices, Lie Bialgebras, and Poisson Lie Groups
Lie Groups: General Theory R Gilmore 286
Lie Superalgebras and Their Representations L Frappat 305
Lie, Symplectic, and Poisson Groupoids and Their Lie Algebroids C-M Marle 312
Liquid Crystals O D Lavrentovich 320
Ljusternik-Schnirelman Theory J Mawhin 328
Localization for Quasiperiodic Potentials S Jitomirskaya 333
Loop Quantum Gravity C Rovelli 339
Lorentzian Geometry P E Ehrlich and S B Kim 343
Lyapunov Exponents and Strange Attractors M Viana 349

M
Macroscopic Fluctuations and Thermodynamic Functionals G Jona-Lasinio 357
Magnetic Resonance Imaging C L Epstein and F W Wehrli 367
Magnetohydrodynamics C Le Bris 375
Malliavin Calculus A B Cruzeiro 383
Marsden-Weinstein Reduction see Cotangent Bundle Reduction; Poisson Reduction; Symmetry and Symplectic Reduction
Maslov Index see Optical Caustics; Semiclassical Spectra and Closed Orbits; Stationary Phase Approximation
Mathai-Quillen Formalism S Wu 390
Mathematical Knot Theory L Boi 399
Matrix Product States see Finitely Correlated States
Mean Curvature Flow see Geometric Flows and the Penrose Inequality
Mean Field Spin Glasses and Neural Networks A Bovier 407
Measure on Loop Spaces H Airault 413
Metastable States S Shlosman 417
Minimal Submanifolds T H Colding and W P Minicozzi II 420
Minimax Principle in the Calculus of Variations A Abbondandolo 432
Mirror Symmetry: A Geometric Survey R P Thomas 439
Modular Tensor Categories see Braided and Modular Tensor Categories
Moduli Spaces: An Introduction F Kirwan 449
Multicomponent Fluids see Interfaces and Multicomponent Fluids
Multi-Hamiltonian Systems F Magri and M Pedroni 459
Multiscale Approaches A Lesne 465

N
Negative Refraction and Subdiffraction Imaging S O'Brien and S A Ramakrishna 483
Newtonian Fluids and Thermohydraulics G Labrosse and G Kasperski 492
Newtonian Limit of General Relativity J Ehlers 503
Noncommutative Geometry and the Standard Model T Schucker 509
Noncommutative Geometry from Strings Chong-Sun Chu 515
Noncommutative Tori, Yang-Mills, and String Theory A Konechny 524
Nonequilibrium Statistical Mechanics (Stationary): Overview G Gallavotti 530
Nonequilibrium Statistical Mechanics: Dynamical Systems Approach P Butta and C Marchioro 540
Nonequilibrium Statistical Mechanics: Interaction between Theory and Numerical Simulations R Livi 544
Nonlinear Schrodinger Equations M J Ablowitz and B Prinari 552
Non-Newtonian Fluids C Guillope 560
Nonperturbative and Topological Aspects of Gauge Theory R W Jackiw 568
Normal Forms and Semiclassical Approximation D Bambusi 578
N-Particle Quantum Scattering D R Yafaev 585
Nuclear Magnetic Resonance P T Callaghan 592
Number Theory in Physics M Marcolli 600

O
Operads J Stasheff 609
Operator Product Expansion in Quantum Field Theory H Osborn 616
Optical Caustics A Joets 620
Optimal Cloning of Quantum States M Keyl 628
Optimal Transportation Y Brenier 632
Ordinary Special Functions W Van Assche 637
VOLUME 4
P
Painleve Equations N Joshi 1
Partial Differential Equations: Some Examples R Temam 6
Path Integral Methods see Functional Integration in Quantum Physics; Feynman Path Integrals
Path Integrals in Noncommutative Geometry R Leandre 8
Peakons D D Holm 12
Penrose Inequality see Geometric Flows and the Penrose Inequality
Percolation Theory V Beffara and V Sidoravicius 21
Perturbation Theory and Its Techniques R J Szabo 28
Perturbative Renormalization Theory and BRST K Fredenhagen and M Dutsch 41
Phase Transition Dynamics A Onuki 47
Phase Transitions in Continuous Systems E Presutti 53
Pirogov-Sinai Theory R Kotecky 60
Point-Vortex Dynamics S Boatto and D Crowdy 66
Poisson Lie Groups see Classical r-Matrices, Lie Bialgebras, and Poisson Lie Groups
Poisson Reduction J-P Ortega and T S Ratiu 79
Polygonal Billiards S Tabachnikov 84
Positive Maps on C*-Algebras F Cipriani 88
Pseudo-Riemannian Nilpotent Lie Groups P E Parker 94

Q
q-Special Functions T H Koornwinder 105
Quantum 3-Manifold Invariants C Blanchet and V Turaev 117
Quantum Calogero-Moser Systems R Sasaki 123
Quantum Central-Limit Theorems A F Verbeure 130
Quantum Channels: Classical Capacity A S Holevo 142
Quantum Chromodynamics G Sterman 144
Quantum Cosmology M Bojowald 153
Quantum Dynamical Semigroups R Alicki 159
Quantum Dynamics in Loop Quantum Gravity H Sahlmann 165
Quantum Electrodynamics and Its Precision Tests S Laporta and E Remiddi 168
Quantum Entropy D Petz 177
Quantum Ergodicity and Mixing of Eigenfunctions S Zelditch 183
Quantum Error Correction and Fault Tolerance D Gottesman 196
Quantum Field Theory in Curved Spacetime B S Kay 202
Quantum Field Theory: A Brief Introduction L H Ryder 212
Quantum Fields with Indefinite Metric: Non-Trivial Models S Albeverio and H Gottschalk 216
Quantum Fields with Topological Defects M Blasone, G Vitiello and P Jizba 221
Quantum Geometry and Its Applications A Ashtekar and J Lewandowski 230
Quantum Group Differentials, Bundles and Gauge Theory T Brzezinski 236
Quantum Hall Effect K Hannabuss 244
Quantum Mechanical Scattering Theory D R Yafaev 251
Quantum Mechanics: Foundations R Penrose 260
Quantum Mechanics: Generalizations P Pearle and A Valentini 265
Quantum Mechanics: Weak Measurements L Diosi 276
Quantum n-Body Problem R G Littlejohn 283
Quantum Phase Transitions S Sachdev 289
Quantum Spin Systems B Nachtergaele 295
Quantum Statistical Mechanics: Overview L Triolo 302
Quasiperiodic Systems P Kramer 308
Quillen Determinant S Scott 315
Quivers see Finite-Dimensional Algebras and Quivers

R
Random Algebraic Geometry, Attractors and Flux Vacua M R Douglas 323
Random Dynamical Systems V Araujo 330
Random Matrix Theory in Physics T Guhr 338
Random Partitions A Okounkov 347
Random Walks in Random Environments L V Bogachev 353
Recursion Operators in Classical Mechanics F Magri and M Pedroni 371
Reflection Positivity and Phase Transitions Y Kondratiev and Y Kozitsky 376
Regularization for Dynamical ζ-Functions V Baladi 386
Relativistic Wave Equations Including Higher Spin Fields R Illge and V Wunsch 391
Renormalization: General Theory J C Collins 399
Renormalization: Statistical Mechanics and Condensed Matter M Salmhofer 407
Resonances N Burq 415
Ricci Flow see Singularities of the Ricci Flow
Riemann Surfaces K Hulek 419
Riemann-Hilbert Methods in Integrable Systems D Shepelsky 429
Riemann-Hilbert Problem V P Kostov 436
Riemannian Holonomy Groups and Exceptional Holonomy D D Joyce 441

S
Saddle Point Problems M Schechter 447
Scattering in Relativistic Quantum Field Theory: Fundamental Concepts and Tools D Buchholz and S J Summers 456
Scattering in Relativistic Quantum Field Theory: The Analytic Program J Bros 465
Scattering, Asymptotic Completeness and Bound States D Iagolnitzer and J Magnen 475
Schrodinger Operators V Bach 487
Schwarz-Type Topological Quantum Field Theory R K Kaul, T R Govindarajan and P Ramadevi 494
Seiberg-Witten Theory Siye Wu 503
Semiclassical Approximation see Stationary Phase Approximation; Normal Forms and Semiclassical Approximation
Semiclassical Spectra and Closed Orbits Y Colin de Verdiere 512
Semilinear Wave Equations P D'Ancona 518
Separation of Variables for Differential Equations S Rauch-Wojciechowski and K Marciniak 526
Separatrix Splitting D Treschev 535
Several Complex Variables: Basic Geometric Theory A Huckleberry and T Peternell 540
Several Complex Variables: Compact Manifolds A Huckleberry and T Peternell 551
Shock Wave Refinement of the Friedman-Robertson-Walker Metric B Temple and J Smoller 559
Shock Waves see Symmetric Hyperbolic Systems and Shock Waves
Short-Range Spin Glasses: The Metastate Approach C M Newman and D L Stein 570
Sine-Gordon Equation S N M Ruijsenaars 576
Singularities of the Ricci Flow M Anderson 584
Singularity and Bifurcation Theory J-P Francoise and C Piquet 588
Sobolev Spaces see Inequalities in Sobolev Spaces
Solitons and Kac-Moody Lie Algebras E Date 594
Solitons and Other Extended Field Configurations R S Ward 602
Source Coding in Quantum Information Theory N Datta and T C Dorlas 609
Spacetime Topology, Causal Structure and Singularities R Penrose 617
Special Lagrangian Submanifolds see Calibrated Geometry and Special Lagrangian Submanifolds
Spectral Sequences P Selick 623
Spectral Theory of Linear Operators M Schechter 633
Spin Foams A Perez 645
Spin Glasses F Guerra 655
Spinors and Spin Coefficients K P Tod 667

VOLUME 5

Stability of Flows S Friedlander 1


Stability of Matter J P Solovej 8
Stability of Minkowski Space S Klainerman 14
Stability Problems in Celestial Mechanics A Celletti 20
Stability Theory and KAM G Gentile 26
Standard Model of Particle Physics G Altarelli 32
Stationary Black Holes R Beig and P T Chrusciel 38
Stationary Phase Approximation J J Duistermaat 44
Statistical Mechanics and Combinatorial Problems R Zecchina 50
Statistical Mechanics of Interfaces S Miracle-Sole 55
Stochastic Differential Equations F Russo 63
Stochastic Hydrodynamics B Ferrario 71
Stochastic Loewner Evolutions G F Lawler 80
Stochastic Resonance S Herrmann and P Imkeller 86
Strange Attractors see Lyapunov Exponents and Strange Attractors
String Field Theory L Rastelli 94
String Theory: Phenomenology A M Uranga 103
String Topology: Homotopy and Geometric Perspectives R L Cohen 111
Superfluids D Einzel 115
Supergravity K S Stelle 122
Supermanifolds F A Rogers 128
Superstring Theories C Bachas and J Troost 133
Supersymmetric Particle Models S Pokorski 140
Supersymmetric Quantum Mechanics J-W van Holten 145
Supersymmetry Methods in Random Matrix Theory M R Zirnbauer 151
Symmetric Hyperbolic Systems and Shock Waves S Kichenassamy 160
Symmetries and Conservation Laws L H Ryder 166
Symmetries in Quantum Field Theory of Lower Spacetime Dimensions J Mund and K-H Rehren 172
Symmetries in Quantum Field Theory: Algebraic Aspects J E Roberts 179
Symmetry and Symmetry Breaking in Dynamical Systems I Melbourne 184
Symmetry and Symplectic Reduction J-P Ortega and T S Ratiu 190
Symmetry Breaking in Field Theory T W B Kibble 198
Symmetry Classes in Random Matrix Theory M R Zirnbauer 204
Synchronization of Chaos M A Aziz-Alaoui 213
T
't Hooft-Polyakov Monopoles see Solitons and Other Extended Field Configurations
Thermal Quantum Field Theory C D Jakel 227
Thermohydraulics see Newtonian Fluids and Thermohydraulics
Toda Lattices Y B Suris 235
Toeplitz Determinants and Statistical Mechanics E L Basor 244
Tomita-Takesaki Modular Theory S J Summers 251
Topological Defects and Their Homotopy Classification T W B Kibble 257
Topological Gravity, Two-Dimensional T Eguchi 264
Topological Knot Theory and Macroscopic Physics L Boi 271
Topological Quantum Field Theory: Overview J M F Labastida and C Lozano 278
Topological Sigma Models D Birmingham 290
Turbulence Theories R M S Rosa 295
Twistor Theory: Some Applications L Mason 303
Twistors K P Tod 311
Two-Dimensional Conformal Field Theory and Vertex Operator Algebras M R Gaberdiel 317
Two-Dimensional Ising Model B M McCoy 322
Two-Dimensional Models B Schroer 328

U
Universality and Renormalization M Lyubich 343

V
Variational Methods in Turbulence F H Busse 351
Variational Techniques for Ginzburg-Landau Energies S Serfaty 355
Variational Techniques for Microstructures G Dolzmann 363
Vertex Operator Algebras see Two-Dimensional Conformal Field Theory and Vertex Operator Algebras
Viscous Incompressible Fluids: Mathematical Theory J G Heywood 369
von Neumann Algebras: Introduction, Modular Theory, and Classification Theory V S Sunder 379
von Neumann Algebras: Subfactor Theory Y Kawahigashi 385
Vortex Dynamics M Nitsche 390
Vortices see Abelian Higgs Vortices; Point-Vortex Dynamics

W
Wave Equations and Diffraction M E Taylor 401
Wavelets: Application to Turbulence M Farge and K Schneider 408
Wavelets: Applications M Yamada 420
Wavelets: Mathematical Theory K Schneider and M Farge 426
WDVV Equations and Frobenius Manifolds B Dubrovin 438
Weakly Coupled Oscillators E M Izhikevich and Y Kuramoto 448
Wheeler-De Witt Theory J Maharana 453
Wightman Axioms see Axiomatic Quantum Field Theory
Wulff Droplets S Shlosman 462

Y
Yang-Baxter Equations J H H Perk and H Au-Yang 465

INDEX 475
Introductory Articles
Introductory Article: Classical Mechanics
G Gallavotti, Universita di Roma La Sapienza, forces not corresponding to a potential are certain
Rome, Italy velocity-dependent forces like the Coriolis force
2006 G Gallavotti. Published by Elsevier Ltd. (which, however, appears only in noninertial frames
All rights reserved. of reference) and the closely related Lorentz force
(in electromagnetism): they could be easily accom-
modated in the Hamiltonian formulation of
mechanics; see Appendix 2.
General Principles The action principle states that an equivalent
Classical mechanics is a theory of motions of point formulation of the eqns [1] is that a motion
particles. If X = (x1 , . . . , xn ) are the particle positions t ! X 0 (t) satisfying [1] during a time interval
in a Cartesian inertial system of coordinates, the [t1 , t2 ] and leading from X 1 = X 0 (t1 ) to X 2 = X 0 (t2 ),
equations of motion are determined by their masses renders stationary the action
(m1 , . . . , mn ), mj > 0, and by the potential energy of Z t2 X !
n
interaction, V(x1 , . . . , xn ), as 1 _ 2
AfXg mi X i t  VXt dt 2
t1 i1
2
i @xi Vx1 ; . . . ; xn ;
mi x i 1; . . . ; n 1
within the class Mt1 , t2 (X 1 , X 2 ) of smooth (i.e.,
here $x_i = (x_{i1}, \ldots, x_{id})$ are coordinates of the $i$th particle and $\partial_{x_i}$ is the gradient $(\partial_{x_{i1}}, \ldots, \partial_{x_{id}})$; $d$ is the space dimension (i.e., $d = 3$, usually). The potential energy function will be supposed smooth, that is, analytic except, possibly, when two positions coincide. The latter exception is necessary to include the important cases of gravitational attraction or, when dealing with electrically charged particles, of Coulomb interaction. A basic result is that if $V$ is bounded below, eqn [1] admits, given initial data $X_0 = X(0)$, $\dot X_0 = \dot X(0)$, a unique global solution $t \to X(t)$, $t \in (-\infty, \infty)$; otherwise a solution can fail to be global if and only if, in a finite time, it reaches infinity or a singularity point (i.e., a configuration in which two or more particles occupy the same point: an event called a collision).

In eqn [1], $-\partial_{x_i} V(x_1, \ldots, x_n)$ is the force acting on the points. More general forces are often admitted. For instance, velocity-dependent friction forces: they are not considered here because of their phenomenological nature as models for microscopic phenomena which should also, in principle, be explained in terms of conservative forces (furthermore, even from a macroscopic viewpoint, they are rather incomplete models, as they should be considered together with the important heat generation phenomena that accompany them). Another interesting example of

analytic) motions $t \to X(t)$ defined for $t \in [t_1, t_2]$ and leading from $X_1$ to $X_2$. The function

\[ L(Y, X) \stackrel{\rm def}{=} \frac{1}{2} \sum_{i=1}^{n} m_i y_i^2 - V(X) = K(Y) - V(X), \qquad Y = (y_1, \ldots, y_n) \]

is called the Lagrangian function and the action can be written as

\[ \int_{t_1}^{t_2} L(\dot X(t), X(t))\, dt \]

The quantity $K(\dot X(t))$ is called kinetic energy and motions satisfying [1] conserve energy as time $t$ varies, that is,

\[ K(\dot X(t)) + V(X(t)) = E = \text{const.} \qquad [3] \]

Hence the action principle can be intuitively thought of as saying that motions proceed by keeping constant the energy, sum of the kinetic and potential energies, while trying to share as evenly as possible their (average over time) contribution to the energy.

In the special case in which $V$ is translation invariant, motions conserve linear momentum $Q \stackrel{\rm def}{=} \sum_i m_i \dot x_i$; if $V$
is rotation invariant around the origin $O$, motions conserve angular momentum $M \stackrel{\rm def}{=} \sum_i m_i x_i \wedge \dot x_i$, where $\wedge$ denotes the vector product in $R^d$, that is, it is the tensor $(a \wedge b)_{ij} = a_i b_j - b_i a_j$, $i, j = 1, \ldots, d$: if the dimension is $d = 3$, then $a \wedge b$ will be naturally regarded as a vector. More generally, to any continuous symmetry group of the Lagrangian correspond conserved quantities: this is formalized in the Noether theorem.

It is convenient to think that the scalar product in $R^{dn}$ is defined in terms of the ordinary scalar product in $R^d$, $a \cdot b = \sum_{j=1}^{d} a_j b_j$, by $(v, w) = \sum_{i=1}^{n} m_i v_i \cdot w_i$: so that kinetic energy and line element $ds$ can be written as $K(\dot X) = \frac{1}{2}(\dot X, \dot X)$ and $ds^2 = \sum_{i=1}^{n} m_i\, dx_i^2$, respectively. Therefore, the metric generated by the latter scalar product can be called kinetic energy metric.

The interest of the kinetic metric appears from the Maupertuis principle (equivalent to [1]): the principle allows us to identify the trajectory traced in $R^{nd}$ by a motion that leads from $X_1$ to $X_2$ moving with energy $E$. Parametrizing such trajectories as $\tau \to X(\tau)$ by a parameter $\tau$ varying in $[0, 1]$ so that the line element is $ds^2 = (\partial_\tau X, \partial_\tau X)\, d\tau^2$, the principle states that the trajectory of a motion with energy $E$ which leads from $X_1$ to $X_2$ makes stationary, among the analytic curves $x \in M_{0,1}(X_1, X_2)$, the function

\[ L(x) = \int_x \sqrt{E - V(x(s))}\; ds \qquad [4] \]

so that the possible trajectories traced by the solutions of [1] in $R^{nd}$ and with energy $E$ can be identified with the geodesics of the metric $dm^2 \stackrel{\rm def}{=} (E - V(X)) \cdot ds^2$.

For more details, the reader is referred to Landau and Lifshitz (1976) and Gallavotti (1983).

Constraints

Often particles are subject to constraints which force the motion to take place on a surface $M \subset R^{nd}$, i.e., $X(t)$ is forced to be a point on the manifold $M$. A typical example is provided by rigid systems in which motions are subject to forces which keep the mutual distances of the particles constant: $|x_i - x_j| = \rho_{ij}$, with $\rho_{ij}$ time-independent positive quantities. In essentially all cases, the forces that imply constraints, called constraint reactions, are velocity dependent and, therefore, are not in the class of conservative forces considered here, cf. [1]. Hence, from a fundamental viewpoint admitting only conservative forces, constrained systems should be regarded as idealizations of systems subject to conservative forces which approximately imply the constraints.

In general, the $\ell$-dimensional manifold $M$ will not admit a global system of coordinates: however, it will be possible to describe points in the vicinity of any $X_0 \in M$ by using $N = nd$ coordinates $q = (q_1, \ldots, q_\ell, q_{\ell+1}, \ldots, q_N)$ varying in an open ball $B_{X_0}$: $X = X(q_1, \ldots, q_\ell, q_{\ell+1}, \ldots, q_N)$.

The $q$-coordinates can be chosen well adapted to the surface $M$ and to the kinetic metric, i.e., so that the points of $M$ are identified by $q_{\ell+1} = \cdots = q_N = 0$ (which is the meaning of adapted); furthermore, infinitesimal displacements $(0, \ldots, 0, d\varepsilon_{\ell+1}, \ldots, d\varepsilon_N)$ out of a point $X_0 \in M$ are orthogonal to $M$ (in the kinetic metric) and have a length independent of the position of $X_0$ on $M$ (which is the meaning of well adapted to the kinetic metric).

Motions constrained on $M$ arise when the potential $V$ has the form

\[ V(X) = V_a(X) + \lambda W(X) \qquad [5] \]

where $W$ is a smooth function which reaches its minimum value, say equal to 0, precisely on the manifold $M$ while $V_a$ is another smooth potential. The factor $\lambda > 0$ is a parameter called the rigidity of the constraint.

A particularly interesting case arises when the level surfaces of $W$ also have the geometric property of being parallel to the surface $M$: in the precise sense that the matrix $\partial^2_{q_i q_j} W(X)$, $i, j > \ell$, is positive definite and $X$-independent, for all $X \in M$, in a system of coordinates well adapted to the kinetic metric.

A potential $W$ with the latter properties can be called an approximately ideal constraint reaction. In fact, it can be proved that, given an initial datum $X_0 \in M$ with velocity $\dot X_0$ tangent to $M$, i.e., given an initial datum whose coordinates in a local system of coordinates are $(q_0, 0)$ and $(\dot q_0, 0)$ with $q_0 = (q_{01}, \ldots, q_{0\ell})$ and $\dot q_0 = (\dot q_{01}, \ldots, \dot q_{0\ell})$, the motion generated by [1] with $V$ given by [5] is a motion $t \to X_\lambda(t)$ which

1. as $\lambda \to \infty$ tends to a motion $t \to X_\infty(t)$;
2. as long as $X_\infty(t)$ stays in the vicinity of the initial data, say for $0 \le t \le t_1$, so that it can be described in the above local adapted coordinates, its coordinates have the form $t \to (q(t), 0) = (q_1(t), \ldots, q_\ell(t), 0, \ldots, 0)$: that is, it is a motion developing on the constraint surface $M$; and
3. the curve $t \to X_\infty(t)$, $t \in [0, t_1]$, as an element of the space $M_{0,t_1}(X_0, X_\infty(t_1))$ of analytic curves on $M$ connecting $X_0$ to $X_\infty(t_1)$, renders the action

\[ A(X) = \int_0^{t_1} \big( K(\dot X(t)) - V_a(X(t)) \big)\, dt \qquad [6] \]

stationary.
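The limit of infinite rigidity described by [5] and [6] can be explored numerically. The following is only a minimal sketch under stated assumptions, not a construction taken from the article: a unit mass in the plane, $V_a = 0$, and the illustrative choice $W(x) = (|x| - 1)^2$, whose minimum set $M$ is the unit circle. The function name `stiff_constraint_run`, the step size, and the two rigidities are hypothetical choices. Starting on $M$ with tangential velocity, the largest distance from $M$ should shrink as $\lambda$ grows, and the motion should approach uniform motion along the circle.

```python
import math

def stiff_constraint_run(lam, t_end=2.0, dt=1e-4):
    """Integrate a unit mass in the plane under V = lam * W, W(x) = (|x| - 1)^2.

    Illustrative assumptions: Va = 0, initial datum on the unit circle M with
    unit tangential velocity. Returns (largest distance from M along the
    motion, final polar angle).
    """
    x, y, vx, vy = 1.0, 0.0, 0.0, 1.0

    def force(x, y):
        r = math.hypot(x, y)
        k = -2.0 * lam * (r - 1.0) / r  # -lam * W'(r), applied along (x, y) / r
        return k * x, k * y

    fx, fy = force(x, y)
    max_dev = 0.0
    for _ in range(int(round(t_end / dt))):
        # one velocity-Verlet step (second order, symplectic)
        vx += 0.5 * dt * fx
        vy += 0.5 * dt * fy
        x += dt * vx
        y += dt * vy
        fx, fy = force(x, y)
        vx += 0.5 * dt * fx
        vy += 0.5 * dt * fy
        max_dev = max(max_dev, abs(math.hypot(x, y) - 1.0))
    return max_dev, math.atan2(y, x)

dev_soft, _ = stiff_constraint_run(1.0e2)       # moderate rigidity
dev_stiff, angle = stiff_constraint_run(1.0e4)  # large rigidity
```

In this sketch the deviation from $M$ scales roughly like $1/\lambda$, and at large rigidity the final angle is close to the value $t_{\rm end}$ expected for unit-speed motion on the unit circle.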
The latter property can be formulated intrinsically, that is, referring only to $M$ as a surface, via the restriction of the metric $ds^2$ to line elements $ds = (dq_1, \ldots, dq_\ell, 0, \ldots, 0)$ tangent to $M$ at the point $X = (q_0, 0, \ldots, 0) \in M$; we write $ds^2 = \sum_{i,j=1}^{\ell} g_{ij}(q)\, dq_i\, dq_j$. The $\ell \times \ell$ symmetric positive-definite matrix $g$ can be called the metric on $M$ induced by the kinetic energy. Then the action in [6] can be written as

\[ A(q) = \int_0^{t_1} \Big( \frac{1}{2} \sum_{i,j=1}^{\ell} g_{ij}(q(t))\, \dot q_i(t)\, \dot q_j(t) - \overline{V}_a(q(t)) \Big)\, dt \qquad [7] \]

where $\overline{V}_a(q) \stackrel{\rm def}{=} V_a(X(q_1, \ldots, q_\ell, 0, \ldots, 0))$: the function

\[ L(\eta, q) \stackrel{\rm def}{=} \frac{1}{2} \sum_{i,j=1}^{\ell} g_{ij}(q)\, \eta_i \eta_j - \overline{V}_a(q) \equiv \frac{1}{2}\, g(q)\eta \cdot \eta - \overline{V}_a(q) \qquad [8] \]

is called the constrained Lagrangian of the system. An important property is that the constrained motions conserve the energy defined as $E = \frac{1}{2}(g(q)\dot q, \dot q) + \overline{V}_a(q)$; see next section.

The constrained motion $X_\infty(t)$ of energy $E$ satisfies the Maupertuis principle in the sense that the curve on $M$ on which the motion develops renders

\[ L(x) = \int_x \sqrt{E - V_a(x(s))}\; ds \qquad [9] \]

stationary among the (smooth) curves that develop on $M$ connecting two fixed values $X_1$ and $X_2$. In the particular case in which $\ell = nd$ this is again the Maupertuis principle for unconstrained motions under the potential $V(X)$. In general, $\ell$ is called the number of degrees of freedom because a complete description of the initial data requires $2\ell$ coordinates $q(0), \dot q(0)$.

If $W$ is minimal on $M$ but the condition on $W$ of having level surfaces parallel to $M$ is not satisfied, i.e., if $W$ is not an approximate ideal constraint reaction, it still remains true that the limit motion $X_\infty(t)$ takes place on $M$. However, in general, it will not satisfy the above variational principles. For this reason, motions arising as limits (as $\lambda \to \infty$) of motions developing under the potential [5] with $W$ having minimum on $M$ and level curves parallel (in the above sense) to $M$ are called ideally constrained motions or motions subject by ideal constraints to the surface $M$.

As an example, suppose that $W$ has the form $W(X) = \sum_{i,j \in P} w_{ij}(|x_i - x_j|)$ with $w_{ij}(|x|) \ge 0$ an analytic function vanishing only when $|x| = \rho_{ij}$ for $i, j$ in some set of pairs $P$ and for some given distances $\rho_{ij}$ (e.g., $w_{ij}(x) = (x^2 - \rho_{ij}^2)^2\, \gamma$, $\gamma > 0$). Then $W$ can be shown to satisfy the mentioned conditions and therefore, the so constrained motions $X_\infty(t)$ of the body satisfy the variational principles mentioned in connection with [7] and [9]: in other words, the above natural way of realizing a rather general rigidity constraint is ideal.

The modern viewpoint on the physical meaning of the constraint reactions is as follows: looking at motions in an inertial Cartesian system, it will appear that the system is subject to the applied forces with potential $V_a(X)$ and to constraint forces which are defined as the differences $R_i = m_i \ddot x_i + \partial_{x_i} V_a(X)$. The latter reflect the action of the forces with potential $\lambda W(X)$ in the limit of infinite rigidity ($\lambda \to \infty$).

In applications, sometimes the action of a constraint can be regarded as ideal: the motion will then verify the variational principles mentioned and $R$ can be computed as the differences between the $m_i \ddot x_i$ and the active forces $-\partial_{x_i} V_a(X)$. In dynamics problems it is, however, a very difficult and important matter, particularly in engineering, to judge whether a system of particles can be considered as subject to ideal constraints: this leads to important decisions in the construction of machines. It simplifies the calculations of the reactions and fatigue of the materials but a misjudgment can have serious consequences about stability and safety. For statics problems, the difficulty is of lower order: usually assuming that the constraint reaction is ideal leads to an overestimate of the requirements for stability of equilibria. Hence, employing the action principle to statics problems, where it constitutes the principle of virtual work, generally leads to economic problems rather than to safety issues. Its discovery even predates Newtonian mechanics.

We refer the reader to Arnold (1989) and Gallavotti (1983) for more details.

Lagrange and Hamilton Forms of the Equations of Motion

The stationarity condition for the action $A(q)$, cf. [7], [8], is formulated in terms of the Lagrangian $L(\eta, \xi)$, see [8], by

\[ \frac{d}{dt}\, \partial_{\eta_i} L(\dot q(t), q(t)) = \partial_{\xi_i} L(\dot q(t), q(t)), \qquad i = 1, \ldots, \ell \qquad [10] \]

which is a second-order differential equation called the Lagrangian equation of motion. It can be cast in normal form: for this purpose, adopting the convention of summation over repeated indices, introduce the generalized momenta

\[ p_i \stackrel{\rm def}{=} g(q)_{ij}\, \dot q_j, \qquad i = 1, \ldots, \ell \qquad [11] \]
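As an illustration of the induced metric $g$ and of the momenta [11], the following sketch computes $g_{ij}(q) = \sum_k m_k\, \partial_{q_i} X_k\, \partial_{q_j} X_k$ by finite differences and then forms $p = g(q)\dot q$. It is a hedged example rather than the article's procedure: the helper names `induced_metric`, `momenta`, and `polar`, the step `h`, and the numerical values are all hypothetical. The test case is a single point of mass $m$ in the plane described by polar coordinates, for which one expects $g = {\rm diag}(m, m\rho^2)$.

```python
import math

def induced_metric(embed, q, masses, h=1e-6):
    """Kinetic-energy metric g_ij(q) = sum_k m_k dX_k/dq_i dX_k/dq_j.

    embed  : function q -> list of Cartesian components X_k(q)
    masses : mass attached to each Cartesian component X_k
    Derivatives are approximated by central differences with step h.
    """
    n = len(q)
    cols = []
    for i in range(n):
        qp, qm = list(q), list(q)
        qp[i] += h
        qm[i] -= h
        cols.append([(a - b) / (2 * h) for a, b in zip(embed(qp), embed(qm))])
    return [[sum(m * ci * cj for m, ci, cj in zip(masses, cols[i], cols[j]))
             for j in range(n)] for i in range(n)]

def momenta(g, qdot):
    """Generalized momenta p_i = g_ij qdot_j, as in [11]."""
    n = len(qdot)
    return [sum(g[i][j] * qdot[j] for j in range(n)) for i in range(n)]

def polar(q):
    """Cartesian position of a planar point with polar coordinates q = (rho, theta)."""
    rho, theta = q
    return [rho * math.cos(theta), rho * math.sin(theta)]

m = 2.0
g = induced_metric(polar, [1.5, 0.7], [m, m])  # expected close to diag(m, m * rho^2)
p = momenta(g, [0.3, -0.4])                    # (p_rho, p_theta) = (m*rho_dot, m*rho^2*theta_dot)
```

The second momentum is the angular momentum $m\rho^2\dot\theta$, which reappears as the constant of motion $G$ in the treatment of central motion.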
Since $g(q) > 0$, the motions $t \to q(t)$ and the corresponding velocities $t \to \dot q(t)$ can be described equivalently by $t \to (q(t), p(t))$: and the equations of motion [10] become the first-order equations

\[ \dot q_i = \partial_{p_i} H(p, q), \qquad \dot p_i = -\partial_{q_i} H(p, q) \qquad [12] \]

where the function $H$, called the Hamiltonian of the system, is defined by

\[ H(p, q) \stackrel{\rm def}{=} \frac{1}{2} (g(q)^{-1} p, p) + \overline{V}_a(q) \qquad [13] \]

Equations [12], regarded as equations of motion for phase space points $(p, q)$, are called Hamilton equations. In general, $q$ are local coordinates on $M$ and motions are specified by giving $q, \dot q$ or $p, q$.

Looking for a coordinate-free representation of motions, consider the pairs $X, Y$ with $X \in M$ and $Y$ a vector $Y \in T_X$ tangent to $M$ at the point $X$. The collection of pairs $(Y, X)$ is denoted $T(M) = \cup_{X \in M} (T_X \times \{X\})$ and a motion $t \to (\dot X(t), X(t)) \in T(M)$ in local coordinates is represented by $(\dot q(t), q(t))$. The space $T(M)$ can be called the space of initial data for Lagrange's equations of motion: it has $2\ell$ dimensions (also known as the tangent bundle of $M$).

Likewise, the space of initial data for the Hamilton equations will be denoted $T^*(M)$ and it consists of pairs $X, P$ with $X \in M$ and $P = g(X)Y$ with $Y$ a vector tangent to $M$ at $X$. The space $T^*(M)$ is called the phase space of the system: it has $2\ell$ dimensions (and it is occasionally called the cotangent bundle of $M$).

An immediate consequence of [12] is

\[ \frac{d}{dt} H(p(t), q(t)) \equiv 0 \]

and it means that $H(p(t), q(t))$ is constant along the solutions of [12]. Noting that $H(p, q) = \frac{1}{2}(g(q)\dot q, \dot q) + \overline{V}_a(q)$ is the sum of the kinetic and potential energies, it follows that the conservation of $H$ along solutions means energy conservation in presence of ideal constraints.

Let $S_t$ be the flow generated on the phase space variables $(p, q)$ by the solutions of the equations of motion [12], that is, let $t \to S_t(p, q) \equiv (p(t), q(t))$ denote a solution of [12] with initial data $(p, q)$. Then a (measurable) set $\Delta$ in phase space evolves in time $t$ into a new set $S_t \Delta$ with the same volume: this is obvious because the Hamilton equations [12] have manifestly zero divergence (Liouville's theorem).

The Hamilton equations also satisfy a variational principle, called the Hamilton action principle: that is, if $M_{t_1,t_2}((p_1, q_1), (p_2, q_2); M)$ denotes the space of the analytic functions $\varphi : t \to (\pi(t), \kappa(t))$ which in the time interval $[t_1, t_2]$ lead from $(p_1, q_1)$ to $(p_2, q_2)$, then the condition that $\varphi_0(t) = (p(t), q(t))$ satisfies [12] can be equivalently formulated by requiring that the function

\[ A_H(\varphi) \stackrel{\rm def}{=} \int_{t_1}^{t_2} \big( \pi(t) \cdot \dot\kappa(t) - H(\pi(t), \kappa(t)) \big)\, dt \qquad [14] \]

be stationary for $\varphi = \varphi_0$: in fact, eqns [12] are the stationarity conditions for the Hamilton action [14] on $M_{t_1,t_2}((p_1, q_1), (p_2, q_2); M)$. And, since the derivatives of $\pi(t)$ do not appear in [14], stationarity is even achieved in the larger space $M_{t_1,t_2}(q_1, q_2; M)$ of the motions $\varphi : t \to (\pi(t), \kappa(t))$ leading from $q_1$ to $q_2$ without any restriction on the initial and final momenta $p_1, p_2$ (which, therefore, cannot be prescribed a priori independently of $q_1, q_2$). If the prescribed data $p_1, q_1, p_2, q_2$ are not compatible with the equations of motion (e.g., $H(p_1, q_1) \ne H(p_2, q_2)$), then the action functional has no stationary trajectory in $M_{t_1,t_2}((p_1, q_1), (p_2, q_2); M)$.

For more details, the reader is referred to Landau and Lifshitz (1976), Arnold (1989), and Gallavotti (1983).

Canonical Transformations of Phase Space Coordinates

The Hamiltonian form, [12], of the equations of motion turns out to be quite useful in several problems. It is, therefore, important to remark that it is invariant under a special class of transformations of coordinates, called canonical transformations.

Consider a local change of coordinates on phase space, i.e., a smooth, smoothly invertible map $C(\pi, \kappa) = (\pi', \kappa')$ between an open set $U$ in the phase space of a Hamiltonian system with $\ell$ degrees of freedom, into an open set $U'$ in a $2\ell$-dimensional space. The change of coordinates is said to be canonical if for any solution $t \to (\pi(t), \kappa(t))$ of equations like [12], for any Hamiltonian $H(\pi, \kappa)$ defined on $U$, the $C$-image $t \to (\pi'(t), \kappa'(t)) = C(\pi(t), \kappa(t))$ is a solution of [12] with the same Hamiltonian, that is, with Hamiltonian $H'(\pi', \kappa') \stackrel{\rm def}{=} H(C^{-1}(\pi', \kappa'))$.

The condition that a transformation of coordinates is canonical is obtained by using the arbitrariness of the function $H$ and is simply expressed as a necessary and sufficient property of the Jacobian $L$,

\[ L = \begin{pmatrix} A & B \\ C & D \end{pmatrix}, \qquad A_{ij} = \partial_{\pi_j} \pi'_i, \quad B_{ij} = \partial_{\kappa_j} \pi'_i, \quad C_{ij} = \partial_{\pi_j} \kappa'_i, \quad D_{ij} = \partial_{\kappa_j} \kappa'_i \qquad [15] \]
where $i, j = 1, \ldots, \ell$. Let

\[ E \stackrel{\rm def}{=} \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \]

denote the $2\ell \times 2\ell$ matrix formed by four $\ell \times \ell$ blocks, equal to the 0 matrix or, as indicated, to the $\pm$(identity matrix); then, if a superscript $T$ denotes matrix transposition, the condition that the map be canonical is that

\[ L^{-1} = E L^T E^T \quad \text{or} \quad L^{-1} = \begin{pmatrix} D^T & -B^T \\ -C^T & A^T \end{pmatrix} \qquad [16] \]

which immediately implies that $\det L = \pm 1$. In fact, it is possible to show that [16] implies $\det L = 1$. Equation [16] is equivalent to the four relations $AD^T - BC^T = 1$, $-AB^T + BA^T = 0$, $CD^T - DC^T = 0$, and $-CB^T + DA^T = 1$. More explicitly, since the first and the fourth relations coincide, these can be expressed as

\[ \{\pi'_i, \kappa'_j\} = \delta_{ij}, \qquad \{\pi'_i, \pi'_j\} = 0, \qquad \{\kappa'_i, \kappa'_j\} = 0 \qquad [17] \]

where, for any two functions $F(\pi, \kappa)$, $G(\pi, \kappa)$, the Poisson bracket is

\[ \{F, G\}(\pi, \kappa) \stackrel{\rm def}{=} \sum_{k=1}^{\ell} \big( \partial_{\pi_k} F(\pi, \kappa)\, \partial_{\kappa_k} G(\pi, \kappa) - \partial_{\kappa_k} F(\pi, \kappa)\, \partial_{\pi_k} G(\pi, \kappa) \big) \qquad [18] \]

The latter satisfies Jacobi's identity: $\{\{F, G\}, Q\} + \{\{G, Q\}, F\} + \{\{Q, F\}, G\} = 0$, for any three functions $F, G, Q$ on the phase space. It is quite useful to remark that if $t \to (p(t), q(t)) = S_t(p, q)$ is a solution to Hamilton equations with Hamiltonian $H$ then, given any observable $F(p, q)$, it evolves as $F(t) \stackrel{\rm def}{=} F(p(t), q(t))$ satisfying

\[ \partial_t F(p(t), q(t)) = \{H, F\}(p(t), q(t)) \]

Requiring the latter identity to hold for all observables $F$ is equivalent to requiring that the $t \to (p(t), q(t))$ be a solution of Hamilton's equations for $H$.

Let $C : U \to U'$ be a smooth, smoothly invertible transformation between two open $2\ell$-dimensional sets: $C(\pi, \kappa) = (\pi', \kappa')$. Suppose that there is a function $\Phi(\pi', \kappa)$ defined on a suitable domain $W$ such that

\[ C(\pi, \kappa) = (\pi', \kappa') \;\Rightarrow\; \begin{cases} \pi = \partial_\kappa \Phi(\pi', \kappa) \\ \kappa' = \partial_{\pi'} \Phi(\pi', \kappa) \end{cases} \qquad [19] \]

then $C$ is canonical. This is because [19] implies that if $\kappa, \pi'$ are varied and if $\pi, \kappa', \pi', \kappa$ are related by $C(\pi, \kappa) = (\pi', \kappa')$, then $\pi \cdot d\kappa + \kappa' \cdot d\pi' = d\Phi(\pi', \kappa)$, which implies that

\[ \pi \cdot d\kappa - H(\pi, \kappa)\, dt \equiv \pi' \cdot d\kappa' - H(C^{-1}(\pi', \kappa'))\, dt + d\Phi(\pi', \kappa) - d(\pi' \cdot \kappa') \qquad [20] \]

It means that the Hamiltonians $H(p, q)$ and $H'(p', q') = H(C^{-1}(p', q'))$ have Hamilton actions $A_H$ and $A_{H'}$ differing by a constant, if evaluated on corresponding motions $(p(t), q(t))$ and $(p'(t), q'(t)) = C(p(t), q(t))$.

The constant depends only on the initial and final values $(p(t_1), q(t_1))$ and $(p(t_2), q(t_2))$ and, respectively, $(p'(t_1), q'(t_1))$ and $(p'(t_2), q'(t_2))$ so that if $(p(t), q(t))$ makes $A_H$ extreme, then $(p'(t), q'(t)) = C(p(t), q(t))$ also makes $A_{H'}$ extreme.

Hence, if $t \to (p(t), q(t))$ solves the Hamilton equations with Hamiltonian $H(p, q)$ then the motion $t \to (p'(t), q'(t)) = C(p(t), q(t))$ solves the Hamilton equations with Hamiltonian $H'(p', q') = H(C^{-1}(p', q'))$ no matter which it is: therefore, the transformation is canonical. The function $\Phi$ is called its generating function.

Equation [19] provides a way to construct canonical maps. Suppose that a function $\Phi(\pi', \kappa)$ is given and defined on some domain $W$; then setting

\[ \begin{cases} \pi = \partial_\kappa \Phi(\pi', \kappa) \\ \kappa' = \partial_{\pi'} \Phi(\pi', \kappa) \end{cases} \]

and inverting the first equation in the form $\pi' = \Xi(\pi, \kappa)$ and substituting the value for $\pi'$ thus obtained, in the second equation, a map $C(\pi, \kappa) = (\pi', \kappa')$ is defined on some domain (where the mentioned operations can be performed) and if such domain is open and not empty then $C$ is a canonical map.

For similar reasons, if $\Gamma(\kappa, \kappa')$ is a function defined on some domain then setting $\pi = \partial_\kappa \Gamma(\kappa, \kappa')$, $\pi' = -\partial_{\kappa'} \Gamma(\kappa, \kappa')$ and solving the first relation to express $\kappa' = \Delta(\pi, \kappa)$ and substituting in the second relation a map $(\pi', \kappa') = C(\pi, \kappa)$ is defined on some domain (where the mentioned operations can be performed) and if such domain is open and not empty then $C$ is a canonical map.

Likewise, canonical transformations can be constructed starting from a priori given functions $F(\pi, \kappa')$ or $G(\pi, \pi')$. And the most general canonical map can be generated locally (i.e., near a given point in phase space) by a single one of the above four ways, possibly composed with a few trivial canonical maps in which one pair of coordinates $(\pi_i, \kappa_i)$ is transformed into $(-\kappa_i, \pi_i)$. The necessity of also including the trivial maps can be traced to the existence of homogeneous canonical maps, that is, maps such that $\pi \cdot d\kappa = \pi' \cdot d\kappa'$ (e.g., the identity map, see below or [49] for nontrivial examples) which are action preserving hence canonical, but which evidently cannot be generated by a function $\Gamma(\kappa, \kappa')$ although they can be generated by a function depending on $\pi', \kappa$.
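The recipe based on [19] can be tried on a small example. The sketch below is an illustration under assumptions, not an example from the article: it takes one degree of freedom and the hypothetical generating function $\Phi(\pi', \kappa) = \pi'\kappa + \varepsilon\, \pi'^2 \kappa$, inverts $\pi = \partial_\kappa \Phi$ for $\pi'$ on the branch that reduces to the identity as $\varepsilon \to 0$, and then checks the first relation in [17] numerically with the bracket [18]. The names `canonical_map`, `poisson_bracket`, and `EPS` are hypothetical.

```python
import math

EPS = 0.1  # strength of the illustrative nonlinearity (hypothetical value)

def canonical_map(p, q):
    """Map (p, q) -> (p', q') generated by Phi(p', q) = p'*q + EPS * p'**2 * q.

    p  = dPhi/dq  = p' + EPS * p'**2      (solved for p', branch p' -> p as EPS -> 0)
    q' = dPhi/dp' = q * (1 + 2 * EPS * p')
    Defined for p > -1 / (4 * EPS).
    """
    p_new = (-1.0 + math.sqrt(1.0 + 4.0 * EPS * p)) / (2.0 * EPS)
    q_new = q * (1.0 + 2.0 * EPS * p_new)
    return p_new, q_new

def poisson_bracket(F, G, p, q, h=1e-5):
    """{F, G} = dF/dp * dG/dq - dF/dq * dG/dp, via central differences."""
    dFdp = (F(p + h, q) - F(p - h, q)) / (2 * h)
    dFdq = (F(p, q + h) - F(p, q - h)) / (2 * h)
    dGdp = (G(p + h, q) - G(p - h, q)) / (2 * h)
    dGdq = (G(p, q + h) - G(p, q - h)) / (2 * h)
    return dFdp * dGdq - dFdq * dGdp

def new_p(p, q):
    return canonical_map(p, q)[0]

def new_q(p, q):
    return canonical_map(p, q)[1]

# For a canonical map with one degree of freedom, {p', q'} should equal 1.
bracket = poisson_bracket(new_p, new_q, 0.5, 1.2)
```

With one degree of freedom the conditions [17] reduce to the single relation $\{\pi', \kappa'\} = 1$, which is the statement that the map preserves area.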
Simple examples of homogeneous canonical maps are maps in which the coordinates $q$ are changed into $q' = R(q)$ and, correspondingly, the $p$'s are transformed as $p' = \big( (\partial_q R(q))^{-1} \big)^T p$, linearly: indeed, this map is generated by the function $F(p', q) \stackrel{\rm def}{=} p' \cdot R(q)$.

For instance, consider the map "Cartesian-polar" coordinates $(q_1, q_2) \leftrightarrow (\rho, \theta)$ with $(\rho, \theta)$ the polar coordinates of $q$ (namely $\rho = \sqrt{q_1^2 + q_2^2}$, $\theta = \arctan(q_2/q_1)$) and let $n \stackrel{\rm def}{=} q/|q| = (n_1, n_2)$ and $t \stackrel{\rm def}{=} (-n_2, n_1)$. Setting $p_\rho \stackrel{\rm def}{=} p \cdot n$, $p_\theta \stackrel{\rm def}{=} \rho\, p \cdot t$, the map $(p_1, p_2, q_1, q_2) \leftrightarrow (p_\rho, p_\theta, \rho, \theta)$ is homogeneous canonical (because $p \cdot dq = p \cdot n\, d\rho + p \cdot t\, \rho\, d\theta = p_\rho\, d\rho + p_\theta\, d\theta$).

As a further example, any area-preserving map $(p, q) \leftrightarrow (p', q')$ defined on an open region of the plane $R^2$ is canonical: because in this case the matrices $A, B, C, D$ are just numbers, which satisfy $AD - BC = 1$ and, therefore, [16] holds.

For more details, the reader is referred to Landau and Lifshitz (1976) and Gallavotti (1983).

Quadratures

The simplest mechanical systems are integrable by quadratures. For instance, the Hamiltonian on $R^2$,

\[ H(p, q) = \frac{1}{2m} p^2 + V(q) \qquad [21] \]

generates a motion $t \to q(t)$ with initial data $q_0, \dot q_0$ such that $H(p_0, q_0) = E$, i.e., $\frac{1}{2} m \dot q_0^2 + V(q_0) = E$, satisfying

\[ \dot q(t) = \pm \sqrt{\frac{2}{m} \big( E - V(q(t)) \big)} \]

If the equation $E = V(q)$ has only two solutions $q_-(E) < q_+(E)$ and $|\partial_q V(q_\pm(E))| > 0$, the motion is periodic with period

\[ T(E) = 2 \int_{q_-(E)}^{q_+(E)} \frac{dx}{\sqrt{(2/m)(E - V(x))}} \qquad [22] \]

The special solution with initial data $q_0 = q_-(E)$, $\dot q_0 = 0$ will be denoted $Q(t)$, and it is an analytic function (by the general regularity theorem on ordinary differential equations). For $0 \le t \le T/2$ or for $T/2 \le t \le T$ it is given, respectively, by

\[ t = \int_{q_-(E)}^{Q(t)} \frac{dx}{\sqrt{(2/m)(E - V(x))}} \qquad [23a] \]

or

\[ t = \frac{T}{2} - \int_{q_+(E)}^{Q(t)} \frac{dx}{\sqrt{(2/m)(E - V(x))}} \qquad [23b] \]

The most general solution with energy $E$ has the form $q(t) = Q(t_0 + t)$, where $t_0$ is defined by $q_0 = Q(t_0)$, $\dot q_0 = \dot Q(t_0)$, i.e., it is the time needed for the "standard solution" $Q(t)$ to reach the initial data for the new motion.

If the derivative of $V$ vanishes in one of the extremes or if at least one of the two solutions $q_\pm(E)$ does not exist, the motion is not periodic and it may be unbounded: nevertheless, it is still expressible via integrals of the type [22]. If the potential $V$ is periodic in $q$ and the variable $q$ is considered to be varying on a circle then essentially all solutions are periodic: exceptions can occur if the energy $E$ has a value such that $V(q) = E$ admits a solution where $V$ has zero derivative.

Typical examples are the harmonic oscillator, the pendulum, and the Kepler oscillator: whose Hamiltonians, if $m, \omega, g, h, G, k$ are positive constants, are, respectively,

\[ \frac{p^2}{2m} + \frac{1}{2} m \omega^2 q^2, \qquad \frac{p^2}{2m} + mg \Big( 1 - \cos\frac{q}{h} \Big), \qquad \frac{p^2}{2m} - \frac{mk}{|q|} + \frac{m G^2}{2 q^2} \qquad [24] \]

the Kepler oscillator Hamiltonian has a potential which is singular at $q = 0$ but if $G \ne 0$ the energy conservation forbids too close an approach to $q = 0$ and the singularity becomes irrelevant.

The integral in [23] is called a quadrature and the systems in [21] are therefore integrable by quadratures. Such systems, at least when the motion is periodic, are best described in new coordinates in which periodicity is more manifest. Namely when $V(q) = E$ has only two roots $q_\pm(E)$ and $\pm V'(q_\pm(E)) > 0$ the energy-time coordinates can be used by replacing $q, \dot q$ or $p, q$ by $E, \tau$, where $\tau$ is the time needed for the standard solution $t \to Q(t)$ to reach the given data, that is, $Q(\tau) = q$, $\dot Q(\tau) = \dot q$. In such coordinates, the motion is simply $(E, \tau) \to (E, \tau + t)$ and, of course, the variable $\tau$ has to be regarded as varying on a circle of radius $T/2\pi$. The $E, \tau$ variables are a kind of polar coordinates, as can be checked by drawing the curves of constant $E$, "energy levels," in the plane $p, q$ in the cases in [24]; see Figure 1.

In the harmonic oscillator case, all trajectories are periodic. In the pendulum case, all motions are periodic except the ones which separate the oscillatory motions (the closed curves in the second drawing) from the rotatory motions (the apparently open curves) which, in fact, are on closed curves as well if the $q$ coordinate, that is, the vertical
coordinate in Figure 1, is regarded as "periodic" with period $2\pi h$. In the Kepler case, only the negative-energy trajectories are periodic and a few of them are drawn in Figure 1. The single dots represent the equilibrium points in phase space.

Figure 1 The energy levels of the harmonic oscillator, the pendulum, and the Kepler motion.

The region of phase space where motions are periodic is a set of points $(p, q)$ with the topological structure of $\cup_{u \in U} (\{u\} \times C_u)$, where $u$ is a coordinate varying in an open interval $U$ (e.g., the set of values of the energy), and $C_u$ is a closed curve whose points $(p, q)$ are identified by a coordinate (e.g., by the time necessary for an arbitrarily fixed datum with the same energy to evolve into $(p, q)$).

In the above cases, [24], if the "radial" coordinate is chosen to be the energy the set $U$ is the interval $(0, +\infty)$ for the harmonic oscillator, $(0, 2mg)$ or $(2mg, +\infty)$ for the pendulum, and $(-\frac{1}{2} m k^2/G^2, 0)$ in the Kepler case. The fixed datum for the reference motion can be taken, in all cases, to be of the form $(0, q_0)$ with the time coordinate $t_0$ given by [23].

It is remarkable that the energy-time coordinates are canonical coordinates: for instance, in the vicinity of $(p_0, q_0)$ and if $p_0 > 0$, this can be seen by setting

\[ S(q, E) = \int_{q_0}^{q} \sqrt{2m (E - V(x))}\; dx \qquad [25] \]

and checking that $p = \partial_q S(q, E)$, $t = \partial_E S(q, E)$ are identities if $(p, q)$ and $(E, t)$ are coordinates for the same point so that the criterion expressed by [20] applies.

It is convenient to standardize the coordinates by replacing the time variable by an angle $\alpha = (2\pi/T(E))\, t$; and instead of the energy any invertible function of it can be used.

It is natural to look for a coordinate $A = A(E)$ such that the map $(p, q) \leftrightarrow (A, \alpha)$ is a canonical map: this is easily done as the function

\[ \hat S(q, A) = \int_{q_0}^{q} \sqrt{2m (E(A) - V(x))}\; dx \qquad [26] \]

generates (locally) the correspondence between $p = \sqrt{2m (E(A) - V(q))}$ and

\[ \alpha = E'(A) \int_{q_0}^{q} \frac{dx}{\sqrt{2 m^{-1} (E(A) - V(x))}} \]

Therefore, by the criterion [20], if

\[ E'(A) = \frac{2\pi}{T(E(A))} \]

i.e., if $A'(E) = T(E)/2\pi$, the coordinates $(A, \alpha)$ will be canonical coordinates. Hence, by [22], $A(E)$ can be taken as

\[ A = \frac{1}{2\pi}\, 2 \int_{q_-(E)}^{q_+(E)} \sqrt{2m (E - V(q))}\; dq \equiv \frac{1}{2\pi} \oint p\, dq \qquad [27] \]

where the last integral is extended to the closed curve of energy $E$; see Figure 1. The action-angle coordinates $(A, \alpha)$ are defined in open regions of phase space covered by periodic motions: in action-angle coordinates such regions have the form $W = J \times \mathbb{T}$ of a product of an open interval $J$ and a one-dimensional "torus" $\mathbb{T} = [0, 2\pi]$ (i.e., a unit circle).

For details, the reader is again referred to Landau and Lifshitz (1976), Arnold (1989), and Gallavotti (1983).

Quasiperiodicity and Integrability

A Hamiltonian is called integrable in an open region $W \subset T^*(M)$ of phase space if

1. there is an analytic and nonsingular (i.e., with nonzero Jacobian) change of coordinates $(p, q) \leftrightarrow (I, \varphi)$ mapping $W$ into a set of the form $\mathcal{I} \times \mathbb{T}^\ell$ with $\mathcal{I} \subset R^\ell$ (open); and furthermore
2. the flow $t \to S_t(p, q)$ on phase space is transformed into $(I, \varphi) \to (I, \varphi + \omega(I) t)$ where $\omega(I)$ is a smooth function on $\mathcal{I}$.

This means that, in suitable coordinates, which can be called "integrating coordinates," the system appears as a set of $\ell$ points with coordinates $\varphi = (\varphi_1, \ldots, \varphi_\ell)$ moving on a unit circle at angular velocities $\omega(I) = (\omega_1(I), \ldots, \omega_\ell(I))$ depending on the actions of the initial data.

A system integrable in a region $W$ which, in integrating coordinates $I, \varphi$, has the form $\mathcal{I} \times \mathbb{T}^\ell$ is said to be anisochronous if $\det \partial_I \omega(I) \ne 0$. It is said to be isochronous if $\omega(I) \equiv \omega$ is independent of $I$. The motions of integrable systems are called quasiperiodic with frequency spectrum $\omega(I)$, or with frequencies $\omega(I)/2\pi$, in the coordinates $(I, \varphi)$.

Clearly, an integrable system admits $\ell$ independent constants of motion, the $I = (I_1, \ldots, I_\ell)$, and, for each
choice of $I$, the other coordinates vary on a "standard" $\ell$-dimensional torus $\mathbb{T}^\ell$: hence, it is possible to say that a phase space region of integrability is foliated into $\ell$-dimensional invariant tori $\mathcal{T}(I)$ parametrized by the values of the constants of motion $I \in \mathcal{I}$.

If an integrable system is anisochronous then it is canonically integrable: that is, it is possible to define on $W$ a canonical change of coordinates $(p, q) = C(A, \alpha)$ mapping $W$ onto $J \times \mathbb{T}^\ell$ and such that $H(C(A, \alpha)) = h(A)$ for a suitable $h$. Then, if $\omega(A) \stackrel{\rm def}{=} \partial_A h(A)$, the equations of motion become

\[ \dot A = 0, \qquad \dot\alpha = \omega(A) \qquad [28] \]

Given a system $(I, \varphi)$ of coordinates integrating an anisochronous system the construction of action-angle coordinates can be performed, in principle, via a classical procedure (under a few extra assumptions).

Let $\gamma_1, \ldots, \gamma_\ell$ be $\ell$ topologically independent circles on $\mathbb{T}^\ell$, for definiteness let $\gamma_i(I) = \{\varphi \mid \varphi_1 = \varphi_2 = \cdots = \varphi_{i-1} = \varphi_{i+1} = \cdots = 0,\ \varphi_i \in [0, 2\pi]\}$, and set

\[ A_i(I) = \frac{1}{2\pi} \oint_{\gamma_i(I)} p \cdot dq \qquad [29] \]

If the map $I \leftrightarrow A(I)$ is analytically invertible as $I = I(A)$, the function

\[ S(A, \varphi) = (\lambda) \int_0^{\varphi} p \cdot dq \qquad [30] \]

is well defined if the integral is over any path $\lambda$ joining the points $(p(I(A), 0), q(I(A), 0))$ and $(p(I(A), \varphi), q(I(A), \varphi))$ and lying on the torus parametrized by $I(A)$.

The key remark in the proof that [30] really defines a function of the only variables $A, \varphi$ is that anisochrony implies the vanishing of the Poisson brackets (cf. [18]): $\{I_i, I_j\} = 0$ (hence also $\{A_i, A_j\} \equiv \sum_{h,k} \partial_{I_k} A_i\, \partial_{I_h} A_j\, \{I_k, I_h\} = 0$). And the property $\{I_i, I_j\} = 0$ can be checked to be precisely the integrability condition for the differential form $p \cdot dq$ restricted to the surface obtained by varying $q$ while $p$ is constrained so that $(p, q)$ stays on the surface $I = $ constant, i.e., on the invariant torus of the points with fixed $I$.

The latter property is necessary and sufficient in order that the function $S(A, \varphi)$ be well defined (i.e., be independent of the integration path $\lambda$) up to an additive quantity of the form $\sum_i 2\pi n_i A_i$ with $n = (n_1, \ldots, n_\ell)$ integers.

Then the action-angle variables are defined by the canonical change of coordinates with $S(A, \varphi)$ as generating function, i.e., by setting

\[ \alpha_i = \partial_{A_i} S(A, \varphi), \qquad I_i = \partial_{\varphi_i} S(A, \varphi) \qquad [31] \]

and, since the computation of $S(A, \varphi)$ is "reduced to integrations" which can be regarded as a natural extension of the quadratures discussed in the one-dimensional cases, such systems are also called integrable by quadratures. The just-described construction is a version of the more general Arnold-Liouville theorem.

In practice, however, the actual evaluation of the integrals in [29], [30] can be difficult: its analysis in various cases (even as "elementary" as the pendulum) has in fact led to key progress in various domains, for example, in the theory of special functions and in group theory.

In general, any surface on phase space on which the restriction of the differential form $p \cdot dq$ is locally integrable is called a Lagrangian manifold: hence the invariant tori of an anisochronous integrable system are Lagrangian manifolds.

If an integrable system is anisochronous, it cannot admit more than $\ell$ independent constants of motion; furthermore, it does not admit invariant tori of dimension $> \ell$. Hence $\ell$-dimensional invariant tori are called maximal.

Of course, invariant tori of dimension $< \ell$ can also exist: this happens when the variables $I$ are such that the frequencies $\omega(I)$ admit nontrivial rational relations; i.e., there is an integer components vector $\nu \in Z^\ell$, $\nu = (\nu_1, \ldots, \nu_\ell) \ne 0$ such that

\[ \omega(I) \cdot \nu = \sum_i \omega_i(I)\, \nu_i = 0 \qquad [32] \]

in this case, the invariant torus $\mathcal{T}(I)$ is called resonant. If the system is anisochronous then $\det \partial_I \omega(I) \ne 0$ and, therefore, the resonant tori are associated with values of the constants of motion $I$ which form a set of measure zero in the space $\mathcal{I}$ but which is not empty and dense.

Examples of isochronous systems are the systems of harmonic oscillators, i.e., systems with Hamiltonian

\[ \sum_{i=1}^{\ell} \frac{1}{2 m_i} p_i^2 + \frac{1}{2} \sum_{i,j=1}^{\ell} c_{ij}\, q_i q_j \]

where the matrix $c$ is a positive-definite matrix. This is an isochronous system with frequencies $\omega = (\omega_1, \ldots, \omega_\ell)$ whose squares are the eigenvalues of the matrix $m_i^{-1/2} c_{ij} m_j^{-1/2}$. It is integrable in the region $W$ of the data $x = (p, q) \in R^{2\ell}$ such that, setting

\[ A_\beta = \frac{1}{2\omega_\beta} \left( \Big( \sum_{i=1}^{\ell} \frac{v_{\beta,i}\, p_i}{\sqrt{m_i}} \Big)^2 + \omega_\beta^2 \Big( \sum_{i=1}^{\ell} v_{\beta,i} \sqrt{m_i}\, q_i \Big)^2 \right) \]

for all eigenvectors $v_\beta$, $\beta = 1, \ldots, \ell$, of the above matrix, the vector $A = (A_1, \ldots, A_\ell)$ has all components $> 0$.
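The frequencies and the quantities $A_\beta$ above can be computed explicitly in a small case. The sketch below is a hedged illustration for two coupled oscillators only, with the $2 \times 2$ eigenproblem solved in closed form; the function names and the numerical data are hypothetical, and nonzero coupling $c_{12}$ is assumed so that the eigenvector formula applies. As a consistency check it uses the standard normal-mode identity $H = \sum_\beta \omega_\beta A_\beta$.

```python
import math

def normal_modes_2x2(m, c):
    """Frequencies and unit eigenvectors of M_ij = c_ij / sqrt(m_i m_j) (2x2 case).

    Assumes c symmetric positive definite with nonzero off-diagonal coupling.
    Returns a list of (omega_beta, v_beta).
    """
    a = c[0][0] / m[0]
    d = c[1][1] / m[1]
    b = c[0][1] / math.sqrt(m[0] * m[1])
    if b == 0.0:
        raise ValueError("this sketch assumes nonzero coupling c[0][1]")
    mean, half = 0.5 * (a + d), 0.5 * (a - d)
    r = math.hypot(half, b)
    modes = []
    for lam in (mean - r, mean + r):   # eigenvalues of [[a, b], [b, d]]
        v = (b, lam - a)               # solves (a - lam) v0 + b v1 = 0
        norm = math.hypot(v[0], v[1])
        modes.append((math.sqrt(lam), (v[0] / norm, v[1] / norm)))
    return modes

def actions(m, c, p, q):
    """A_beta = (P_beta^2 + omega_beta^2 Q_beta^2) / (2 omega_beta) for each mode."""
    result = []
    for omega, v in normal_modes_2x2(m, c):
        P = sum(v[i] * p[i] / math.sqrt(m[i]) for i in range(2))
        Q = sum(v[i] * math.sqrt(m[i]) * q[i] for i in range(2))
        result.append((omega, (P * P + omega ** 2 * Q * Q) / (2.0 * omega)))
    return result

def hamiltonian(m, c, p, q):
    """Sum of p_i^2 / (2 m_i) plus (1/2) sum c_ij q_i q_j."""
    kinetic = sum(p[i] ** 2 / (2.0 * m[i]) for i in range(2))
    potential = 0.5 * sum(c[i][j] * q[i] * q[j] for i in range(2) for j in range(2))
    return kinetic + potential

m = (1.0, 2.0)
c = ((3.0, 1.0), (1.0, 4.0))
p, q = (0.3, -0.5), (0.2, 0.7)
modes = actions(m, c, p, q)
energy_from_actions = sum(omega * A for omega, A in modes)
energy = hamiltonian(m, c, p, q)
```

For more than two oscillators the same computation would rely on a general symmetric eigenvalue routine instead of the closed-form $2 \times 2$ solution used here.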
Even though this system is isochronous, it nevertheless admits a system of canonical action-angle coordinates in which the Hamiltonian takes the simplest form

\[ h(A) = \sum_{\beta=1}^{\ell} \omega_\beta A_\beta \equiv \omega \cdot A \qquad [33] \]

with

\[ \alpha_\beta = -\arctan \left( \frac{\sum_{i=1}^{\ell} v_{\beta,i}\, p_i / \sqrt{m_i}}{\sum_{i=1}^{\ell} \sqrt{m_i}\, \omega_\beta\, v_{\beta,i}\, q_i} \right) \]

as conjugate angles.

An example of anisochronous system is the free rotators or free wheels: i.e., $\ell$ noninteracting points on a circle of radius $R$ or $\ell$ noninteracting homogeneous coaxial wheels of radius $R$. If $J_i = m_i R^2$ or, respectively, $J_i = (1/2) m_i R^2$ are the inertia moments and if the positions are determined by angles $\alpha = (\alpha_1, \ldots, \alpha_\ell)$, the angular velocities are constants related to the angular momenta $A = (A_1, \ldots, A_\ell)$ by $\omega_i = A_i/J_i$. The Hamiltonian and the spectrum are

\[ h(A) = \sum_{i=1}^{\ell} \frac{1}{2 J_i} A_i^2, \qquad \omega(A) = \Big( \frac{1}{J_i} A_i \Big)_{i=1,\ldots,\ell} \qquad [34] \]

For further details see Landau and Lifshitz (1976), Gallavotti (1983), Arnold (1989), and Fasso (1998).

Multidimensional Quadratures: Central Motion

Several important mechanical systems with more than one degree of freedom are integrable by canonical quadratures in vast regions of phase space. This is checked by showing that there is a foliation into invariant tori $\mathcal{T}(I)$ of dimension equal to the number of degrees of freedom ($\ell$) parametrized by $\ell$ constants of motion $I$ in involution, i.e., such that $\{I_i, I_j\} = 0$. One then performs, if possible, the construction of the action-angle variables by the quadratures discussed in the previous section.

The above procedure is well illustrated by the theory of the planar motion of a point of mass $m$ attracted by a coplanar center of force: the Lagrangian is, in polar coordinates $(\rho, \theta)$,

\[ L = \frac{m}{2} \big( \dot\rho^2 + \rho^2 \dot\theta^2 \big) - V(\rho) \]

The planarity of the motion is not a strong restriction as central motion always takes place on a plane.

Hence, the equations of motion are

\[ \frac{d}{dt} \big( m \rho^2 \dot\theta \big) = 0 \]

i.e., $m \rho^2 \dot\theta = G$ is a constant of motion (it is the angular momentum), and

\[ m \ddot\rho = -\partial_\rho V(\rho) + \partial_\rho \frac{m}{2} \rho^2 \dot\theta^2 = -\partial_\rho V(\rho) + \frac{G^2}{m \rho^3} \stackrel{\rm def}{=} -\partial_\rho V_G(\rho) \]

Then the energy conservation yields a second constant of motion $E$,

\[ \frac{m}{2} \dot\rho^2 + \frac{1}{2} \frac{G^2}{m \rho^2} + V(\rho) = E = \frac{1}{2m} p_\rho^2 + \frac{1}{2m} \frac{p_\theta^2}{\rho^2} + V(\rho) \qquad [35] \]

The right-hand side (rhs) is the Hamiltonian for the system, derived from $L$, if $p_\rho, p_\theta$ denote conjugate momenta of $\rho, \theta$: $p_\rho = m \dot\rho$ and $p_\theta = m \rho^2 \dot\theta$ (note that $p_\theta = G$).

Suppose $\rho^2 V(\rho) \to 0$ as $\rho \to 0$: then the singularity at the origin cannot be reached by any motion starting with $\rho > 0$ if $G > 0$. Assume also that the function

\[ V_G(\rho) \stackrel{\rm def}{=} \frac{1}{2} \frac{G^2}{m \rho^2} + V(\rho) \]

has only one minimum $E_0(G)$, no maximum and no horizontal inflection, and tends to a limit $E_\infty(G) \le \infty$ when $\rho \to \infty$. Then the system is integrable in the domain $W = \{(p, q) \mid E_0(G) < E < E_\infty(G),\ G \ne 0\}$.

This is checked by introducing a "standard" periodic solution $t \to R(t)$ of $m \ddot\rho = -\partial_\rho V_G(\rho)$ with energy $E_0(G) < E < E_\infty(G)$ and initial data $\rho = \rho_{E,-}(G)$, $\dot\rho = 0$ at time $t = 0$, where $\rho_{E,\pm}(G)$ are the two solutions of $V_G(\rho) = E$, see the section "Quadratures": this is a periodic analytic function of $t$ with period

\[ T(E, G) = 2 \int_{\rho_{E,-}(G)}^{\rho_{E,+}(G)} \frac{dx}{\sqrt{(2/m)(E - V_G(x))}} \]

The function $R(t)$ is given, for $0 \le t \le \frac{1}{2} T(E, G)$ or for $\frac{1}{2} T(E, G) \le t \le T(E, G)$, by the quadratures

\[ t = \int_{\rho_{E,-}(G)}^{R(t)} \frac{dx}{\sqrt{(2/m)(E - V_G(x))}} \qquad [36a] \]

or

\[ t = \frac{T(E, G)}{2} - \int_{\rho_{E,+}(G)}^{R(t)} \frac{dx}{\sqrt{(2/m)(E - V_G(x))}} \qquad [36b] \]
respectively. The analytic regularity of $R(t)$ follows from the general existence, uniqueness, and regularity theorems applied to the differential equation for $\ddot\rho$.

Given an initial datum $\dot\rho_0, \rho_0, \dot\theta_0, \theta_0$ with energy $E$ and angular momentum $G$, define $t_0$ to be the time such that $R(t_0) = \rho_0$, $\dot R(t_0) = \dot\rho_0$: then $\rho(t) \equiv R(t + t_0)$ and $\theta(t)$ can be computed as

$$\theta(t) = \theta_0 + \int_0^t \frac{G}{m R(t' + t_0)^2}\, dt'$$

a second quadrature. Therefore, we can use as coordinates for the motion $E$, $G$, $t_0$, which determine $\dot\rho_0, \rho_0, \dot\theta_0$, and a fourth coordinate that determines $\theta_0$, which could be $\theta_0$ itself but which is conveniently determined, via the second quadrature, as follows.

The function $G m^{-1} R(t)^{-2}$ is periodic with period $T(E, G)$; hence it can be expressed in a Fourier series

$$\chi_0(E, G) + \sum_{k \ne 0} \chi_k(E, G) \exp\Big( \frac{2\pi}{T(E, G)}\, i t k \Big)$$

so that the quadrature for $\theta(t)$ can be performed by integrating the series terms. Setting

$$\bar\theta(t_0) \overset{\mathrm{def}}{=} \frac{T(E, G)}{2\pi} \sum_{k \ne 0} \frac{\chi_k(E, G)}{i k} \exp\Big( \frac{2\pi}{T(E, G)}\, i t_0 k \Big)$$

and $\varphi_1(0) = \theta_0 - \bar\theta(t_0)$, the expression

$$\theta(t) = \theta_0 + \int_0^t \frac{G}{m R(t' + t_0)^2}\, dt'$$

becomes

$$\varphi_1(t) = \varphi_1(0) + \chi_0(E, G)\, t \tag{37}$$

Hence the system is integrable and the spectrum is $\boldsymbol{\omega}(E, G) = (\omega_0(E, G), \omega_1(E, G)) \equiv (\omega_0, \omega_1)$ with

$$\omega_0 \overset{\mathrm{def}}{=} \frac{2\pi}{T(E, G)} \quad \text{and} \quad \omega_1 \overset{\mathrm{def}}{=} \chi_0(E, G)$$

while $\mathbf{I} = (E, G)$ are constants of motion and the angles $\boldsymbol{\varphi} = (\varphi_0, \varphi_1)$ can be taken as

$$\varphi_0 \overset{\mathrm{def}}{=} \omega_0 t_0, \qquad \varphi_1 \overset{\mathrm{def}}{=} \theta_0 - \bar\theta(t_0)$$

At $E$, $G$ fixed, the motion takes place on a two-dimensional torus $\mathcal{T}(E, G)$ with $\varphi_0$, $\varphi_1$ as angles.

In the anisochronous cases, i.e., when $\det \partial_{E,G}\, \boldsymbol{\omega}(E, G) \ne 0$, canonical action-angle variables conjugated to $(p_\rho, \rho, p_\theta, \theta)$ can be constructed via [29], [30] by using two cycles $\gamma_1$, $\gamma_2$ on the torus $\mathcal{T}(E, G)$. It is convenient to choose

1. $\gamma_1$ as the cycle consisting of the points $\rho = x$, $\theta = 0$ whose first half (where $p_\rho \ge 0$) consists in the set $\rho_{E,-}(G) \le x \le \rho_{E,+}(G)$, $p_\rho = \sqrt{2m(E - V_G(x))}$ and $d\theta = 0$; and
2. $\gamma_2$ as the cycle $\rho = \text{const}$, $\theta \in [0, 2\pi]$ on which $d\rho = 0$ and $p_\theta = G$ obtaining

$$A_1 = \frac{2}{2\pi} \int_{\rho_{E,-}(G)}^{\rho_{E,+}(G)} \sqrt{2m(E - V_G(x))}\, dx, \qquad A_2 = G \tag{38}$$

According to the general theory (cf. the previous section) a generating function for the canonical change of coordinates from $(p_\rho, \rho, p_\theta, \theta)$ to action-angle variables is (if, to fix ideas, $p_\rho > 0$)

$$S(A_1, A_2, \rho, \theta) = G \theta + \int_{\rho_{E,-}}^{\rho} \sqrt{2m(E - V_G(x))}\, dx \tag{39}$$

In terms of the above $\omega_0$, $\chi_0$ the Jacobian matrix $\partial(E, G) / \partial(A_1, A_2)$ is computed from [38], [39] to be $\begin{pmatrix} \omega_0 & \chi_0 \\ 0 & 1 \end{pmatrix}$. It follows that $\partial_E S = t$, $\partial_G S = \theta - \bar\theta(t) - \chi_0 t$ so that, see [31],

$$\alpha_1 \overset{\mathrm{def}}{=} \partial_{A_1} S = \omega_0 t, \qquad \alpha_2 \overset{\mathrm{def}}{=} \partial_{A_2} S = \theta - \bar\theta(t) \tag{40}$$

and $(A_1, \alpha_1)$, $(A_2, \alpha_2)$ are the action-angle pairs.

For more details, see Landau and Lifshitz (1976) and Gallavotti (1983).

Newtonian Potential and Kepler's Laws

The anisochrony property, that is, $\det \partial(\omega_0, \chi_0) / \partial(A_1, A_2) \ne 0$ or, equivalently, $\det \partial(\omega_0, \chi_0) / \partial(E, G) \ne 0$, is not satisfied in the important cases of the harmonic potential and the Newtonian potential. Anisochrony being only a sufficient condition for canonical integrability, it is still possible (and true) that, nevertheless, in both cases the canonical transformation generated by [39] integrates the system. This is expected since the two potentials are limiting cases of anisochronous ones (e.g., $|\mathbf{q}|^{2+\varepsilon}$ and $|\mathbf{q}|^{-1-\varepsilon}$ with $\varepsilon \to 0$).

The Newtonian potential

$$H(\mathbf{p}, \mathbf{q}) = \frac{1}{2m} \mathbf{p}^2 - \frac{k m}{|\mathbf{q}|}$$

is integrable in the region $G \ne 0$, $E_0(G) = -k^2 m^3 / 2G^2 < E < 0$, $|G| < \sqrt{k^2 m^3 / (-2E)}$. Proceeding as in the last section, one finds integrating coordinates and that the integrable motions develop on ellipses with one focus on the center of attraction $S$, so that motions are periodic, hence not anisochronous: nevertheless, the construction of the canonical coordinates via [29]-[31] (hence [39]) works and leads to canonical coordinates $(L', \lambda', G', \gamma')$. To obtain action-angle variables with a simple
interpretation, it is convenient to perform on the variables $(L', \lambda', G', \gamma')$ (constructed by following the procedure just indicated) a further trivial canonical transformation by setting $L = L' + G'$, $G = G'$, $\lambda = \lambda'$, $\gamma = \gamma' - \lambda'$; then

1. $\lambda$ (average anomaly) is the time necessary for the point $P$ to move from the pericenter to its actual position, in units of the period, times $2\pi$;
2. $L$ (action) is essentially the energy $E = -k^2 m^3 / 2L^2$;
3. $G$ (angular momentum);
4. $\gamma$ (axis longitude), is the angle between a fixed axis and the major axis of the ellipse oriented from the center of the ellipse $O$ to the center of attraction $S$.

The eccentricity of the ellipse is $e$ such that $G = \pm L \sqrt{1 - e^2}$. The ellipse equation is $\rho = a(1 - e \cos \xi)$, where $\xi$ is the eccentric anomaly (see Figure 2), $a = L^2 / k m^2$ is the major semiaxis, and $\rho$ is the distance to the center of attraction $S$.

[Figure 2: three panels showing the points $O$, $S$, $P$, $c$ and the circles $D$, $E$, for $e = 0.75$, $e = 0.75$, and $e = 0.3$.]

Figure 2 Eccentric and true anomalies of $P$, which moves on a small circle $E$ centered at a point $c$ moving on the circle $D$ located half-way between the two concentric circles containing the Keplerian ellipse: the anomaly of $c$ with respect to the axis $OS$ is $\xi$. The circle $D$ is eccentric with respect to $S$ and therefore $\xi$ is, even today, called eccentric anomaly, whereas the circle $D$ is, in ancient terminology, the deferent circle (eccentric circles were introduced in astronomy by Ptolemy). The small circle $E$ on which the point $P$ moves is, in ancient terminology, an epicycle. The deferent and the epicyclical motions are synchronous (i.e., they have the same period); Kepler discovered that his key a priori hypothesis of inverse proportionality between angular velocity on the deferent and distance between $P$ and $S$ (i.e., $\rho \dot\xi = \text{constant}$) implied both synchrony and elliptical shape of the orbit, with focus in $S$. The latter law is equivalent to $\rho^2 \dot\theta = \text{constant}$ (because of the identity $a \dot\xi = \rho \dot\theta$). Small eccentricity ellipses can hardly be distinguished from circles.

Finally, the relations between eccentric anomaly $\xi$, average anomaly $\lambda$, true anomaly $\theta$ (the latter is the polar angle), and $SP$ distance $\rho$ are given by the Kepler equations

$$\begin{aligned} \lambda &= \xi - e \sin \xi \\ (1 - e \cos \xi)(1 + e \cos \theta) &= 1 - e^2 \\ \lambda &= (1 - e^2)^{3/2} \int_0^{\theta} \frac{d\theta'}{(1 + e \cos \theta')^2} \\ \frac{\rho}{a} &= \frac{1 - e^2}{1 + e \cos \theta} \end{aligned} \tag{41}$$

and the relation between true anomaly and average anomaly can be inverted in the form

$$\xi = \lambda + g_\lambda, \qquad \theta = \lambda + f_\lambda \ \Rightarrow\ \frac{\rho}{a} = \frac{1 - e^2}{1 + e \cos(\lambda + f_\lambda)} \tag{42}$$

where $g_\lambda = g(e \sin \lambda, e \cos \lambda)$, $f_\lambda = f(e \sin \lambda, e \cos \lambda)$, and $g(x, y)$, $f(x, y)$ are suitable functions analytic for $|x|, |y| < 1$. Furthermore, $g(x, y) = x(1 + y + \cdots)$, $f(x, y) = 2x(1 + \frac{5}{4} y + \cdots)$ and the ellipses denote terms of degree 2 or higher in $x$, $y$, containing only even powers of $x$.

For more details, the reader is referred to Landau and Lifshitz (1976) and Gallavotti (1983).

Rigid Body

Another fundamental integrable system is the rigid body in the absence of gravity and with a fixed point $O$. It can be naturally described in terms of the Euler angles $\theta_0, \varphi_0, \psi_0$ (see Figure 3) and their derivatives $\dot\theta_0, \dot\varphi_0, \dot\psi_0$.

Figure 3 The Euler angles of the comoving frame $\mathbf{i}_1, \mathbf{i}_2, \mathbf{i}_3$ with respect to a fixed frame $x, y, z$. The direction $\mathbf{n}$ is the node line, intersection between the planes $x, y$ and $\mathbf{i}_1, \mathbf{i}_2$.

Let $I_1, I_2, I_3$ be the three principal inertia moments of the body along the three principal axes with unit vectors $\mathbf{i}_1, \mathbf{i}_2, \mathbf{i}_3$. The inertia moments and the principal axes are the eigenvalues and the associated unit eigenvectors of the $3 \times 3$ inertia matrix $\mathcal{I}$, which is defined by $\mathcal{I}_{hk} = \sum_{i=1}^{n} m_i (\mathbf{x}_i)_h (\mathbf{x}_i)_k$, where $h, k = 1, 2, 3$ and $\mathbf{x}_i$ is the position of the $i$th particle in a reference frame with origin at $O$ and in which
all particles are at rest: this comoving frame exists as a consequence of the rigidity constraint. The principal axes form a coordinate system which is comoving as well: that is, in the frame $(O; \mathbf{i}_1, \mathbf{i}_2, \mathbf{i}_3)$ as well, the particles are at rest.

The Lagrangian is simply the kinetic energy: we imagine the rigidity constraint to be ideal (e.g., as realized by internal central forces in the limit of infinite rigidity, as mentioned in the section "Lagrange and Hamilton forms of equations of motion"). The angular velocity of the rigid motion is defined by

$$\boldsymbol{\omega} = \dot\theta_0\, \mathbf{n} + \dot\varphi_0\, \mathbf{z} + \dot\psi_0\, \mathbf{i}_3 \tag{43}$$

expressing that a generic infinitesimal motion must consist of a variation of the three Euler angles and, therefore, it has to be a rotation of speeds $\dot\theta_0, \dot\varphi_0, \dot\psi_0$ around the axes $\mathbf{n}$, $\mathbf{z}$, $\mathbf{i}_3$ as shown in Figure 3.

Let $(\omega_1, \omega_2, \omega_3)$ be the components of $\boldsymbol{\omega}$ along the principal axes $\mathbf{i}_1, \mathbf{i}_2, \mathbf{i}_3$: for brevity, the latter axes will often be called 1, 2, 3. Then the angular momentum $\mathbf{M}$, with respect to the pivot point $O$, and the kinetic energy $K$ can be checked to be

$$\mathbf{M} = I_1 \omega_1 \mathbf{i}_1 + I_2 \omega_2 \mathbf{i}_2 + I_3 \omega_3 \mathbf{i}_3, \qquad K = \frac{1}{2} \big( I_1 \omega_1^2 + I_2 \omega_2^2 + I_3 \omega_3^2 \big) \tag{44}$$

and are constants of motion. From Figure 3 it follows that $\omega_1 = \dot\theta_0 \cos \psi_0 + \dot\varphi_0 \sin \theta_0 \sin \psi_0$, $\omega_2 = -\dot\theta_0 \sin \psi_0 + \dot\varphi_0 \sin \theta_0 \cos \psi_0$ and $\omega_3 = \dot\varphi_0 \cos \theta_0 + \dot\psi_0$, so that the Lagrangian, uninspiring at first, is

$$\mathcal{L} \overset{\mathrm{def}}{=} \frac{1}{2} I_1 \big( \dot\theta_0 \cos \psi_0 + \dot\varphi_0 \sin \theta_0 \sin \psi_0 \big)^2 + \frac{1}{2} I_2 \big( -\dot\theta_0 \sin \psi_0 + \dot\varphi_0 \sin \theta_0 \cos \psi_0 \big)^2 + \frac{1}{2} I_3 \big( \dot\varphi_0 \cos \theta_0 + \dot\psi_0 \big)^2 \tag{45}$$

Angular momentum conservation does not imply that the components $\omega_j$ are constants because $\mathbf{i}_1, \mathbf{i}_2, \mathbf{i}_3$ also change with time according to

$$\frac{d}{dt} \mathbf{i}_j = \boldsymbol{\omega} \wedge \mathbf{i}_j, \qquad j = 1, 2, 3$$

Hence, $\dot{\mathbf{M}} = 0$ becomes, by the first of [44] and denoting $I\boldsymbol{\omega} = (I_1 \omega_1, I_2 \omega_2, I_3 \omega_3)$, the Euler equations $I \dot{\boldsymbol{\omega}} + \boldsymbol{\omega} \wedge I \boldsymbol{\omega} = 0$, or

$$\begin{aligned} I_1 \dot\omega_1 &= (I_2 - I_3)\, \omega_2 \omega_3 \\ I_2 \dot\omega_2 &= (I_3 - I_1)\, \omega_3 \omega_1 \\ I_3 \dot\omega_3 &= (I_1 - I_2)\, \omega_1 \omega_2 \end{aligned} \tag{46}$$

which can be considered together with the conserved quantities [44].

Since angular momentum is conserved, it is convenient to introduce the laboratory frame $(O; x_0, y_0, z_0)$ with fixed axes $x_0, y_0, z_0$ and (see Figure 4):

1. $(O; x, y, z)$, the momentum frame with fixed axes, but with $z$-axis oriented as $\mathbf{M}$, and $x$-axis coinciding with the node (i.e., the intersection) of the $x_0 y_0$ plane and the $xy$ plane (orthogonal to $\mathbf{M}$). Therefore, $x, y, z$ is determined by the two Euler angles $\zeta, \gamma$ of $(O; x, y, z)$ in $(O; x_0, y_0, z_0)$;
2. $(O; 1, 2, 3)$, the comoving frame, that is, the frame fixed with the body, and with unit vectors $\mathbf{i}_1, \mathbf{i}_2, \mathbf{i}_3$ parallel to the principal axes of the body. The frame is determined by three Euler angles $\theta_0, \varphi_0, \psi_0$;
3. the Euler angles of $(O; 1, 2, 3)$ with respect to $(O; x, y, z)$, which are denoted $\theta, \varphi, \psi$;
4. $G$, the total angular momentum: $G^2 = \sum_j I_j^2 \omega_j^2$;
5. $M_3$, the angular momentum along the $z_0$ axis; $M_3 = G \cos \zeta$; and
6. $L$, the projection of $\mathbf{M}$ on the axis 3, $L = G \cos \theta$.

Figure 4 The laboratory frame, the angular momentum frame, and the comoving frame (and the Deprit angles).

The quantities $G, M_3, L, \varphi, \gamma, \psi$ determine $\theta_0, \varphi_0, \psi_0$ and $\dot\theta_0, \dot\varphi_0, \dot\psi_0$, or the $p_{\theta_0}, p_{\varphi_0}, p_{\psi_0}$ variables conjugated to $\theta_0, \varphi_0, \psi_0$ as shown by the following comment.

Considering Figure 4, the angles $\zeta, \gamma$ determine the location, in the fixed frame $(O; x_0, y_0, z_0)$, of the direction of $\mathbf{M}$ and the node line $\mathbf{m}$, which are, respectively, the $z$-axis and the $x$-axis of the fixed frame associated with the angular momentum; the angles $\theta, \varphi, \psi$ then determine the position of the comoving frame with respect to the fixed frame $(O; x, y, z)$, hence its position with respect to $(O; x_0, y_0, z_0)$, that is, $(\theta_0, \varphi_0, \psi_0)$. From this and $G$, it is possible to determine $\boldsymbol{\omega}$ because

$$\cos \theta = \frac{I_3 \omega_3}{G}, \qquad \tan \psi = \frac{I_1 \omega_1}{I_2 \omega_2}, \qquad \omega_2^2 = I_2^{-2} \big( G^2 - I_1^2 \omega_1^2 - I_3^2 \omega_3^2 \big) \tag{47}$$

(the ratio in the second relation is the one consistent with [50] below, where $I_1 \omega_1 = G \sin \theta \sin \psi$ and $I_2 \omega_2 = G \sin \theta \cos \psi$) and, from [43], $\dot\theta_0, \dot\varphi_0, \dot\psi_0$ are determined.
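As a quick numerical sanity check of the Euler equations [46] and of the conserved quantities [44], one can integrate [46] and watch the kinetic energy $K$ and the squared angular momentum $G^2 = \sum_j I_j^2 \omega_j^2$ stay constant. The sketch below is an illustration added here, not part of the original article: the inertia moments and the initial angular velocity are arbitrary sample values, the integrator is a generic classical fourth-order Runge-Kutta step chosen for simplicity, and all function names are hypothetical.

```python
def euler_rhs(w, inertia):
    """Right-hand side of the Euler equations, solved for the derivatives of w."""
    w1, w2, w3 = w
    i1, i2, i3 = inertia
    return ((i2 - i3) * w2 * w3 / i1,
            (i3 - i1) * w3 * w1 / i2,
            (i1 - i2) * w1 * w2 / i3)


def rk4_step(w, inertia, dt):
    """One classical Runge-Kutta step of size dt."""
    def shifted(k, s):
        return tuple(wi + s * ki for wi, ki in zip(w, k))

    k1 = euler_rhs(w, inertia)
    k2 = euler_rhs(shifted(k1, dt / 2), inertia)
    k3 = euler_rhs(shifted(k2, dt / 2), inertia)
    k4 = euler_rhs(shifted(k3, dt), inertia)
    return tuple(wi + dt / 6 * (a + 2 * b + 2 * c + d)
                 for wi, a, b, c, d in zip(w, k1, k2, k3, k4))


def integrate(w0, inertia, dt, steps):
    """Angular velocity components after the given number of steps."""
    w = tuple(w0)
    for _ in range(steps):
        w = rk4_step(w, inertia, dt)
    return w


def kinetic_energy(w, inertia):
    return 0.5 * sum(ij * wj * wj for ij, wj in zip(inertia, w))


def momentum_squared(w, inertia):
    return sum((ij * wj) ** 2 for ij, wj in zip(inertia, w))


if __name__ == "__main__":
    inertia = (1.0, 2.0, 3.0)   # sample principal moments (assumed values)
    w0 = (1.0, 0.1, 0.5)        # sample initial angular velocity (assumed values)
    w = integrate(w0, inertia, 1e-3, 2000)
    print(kinetic_energy(w0, inertia), kinetic_energy(w, inertia))
    print(momentum_squared(w0, inertia), momentum_squared(w, inertia))
```

The components $\omega_j$ themselves change along the motion, while the two printed pairs should agree up to the integration error, in line with the remark that conservation of $\mathbf{M}$ does not make the $\omega_j$ constant.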
The Lagrangian [45] gives immediately (after expressing $\boldsymbol{\omega}$, i.e., $\mathbf{n}$, $\mathbf{z}$, $\mathbf{i}_3$, in terms of the Euler angles $\theta_0, \varphi_0, \psi_0$) an expression for the variables $p_{\theta_0}, p_{\varphi_0}, p_{\psi_0}$ conjugated to $\theta_0, \varphi_0, \psi_0$:

$$p_{\theta_0} = \mathbf{M} \cdot \mathbf{n}_0, \qquad p_{\varphi_0} = \mathbf{M} \cdot \mathbf{z}_0, \qquad p_{\psi_0} = \mathbf{M} \cdot \mathbf{i}_3 \tag{48}$$

and, in principle, we could proceed to compute the Hamiltonian.

However, the computation can be avoided because of the very remarkable property (DEPRIT), which can be checked with some patience, making use of [48] and of elementary spherical trigonometry identities,

$$M_3\, d\gamma + L\, d\psi + G\, d\varphi = p_{\varphi_0}\, d\varphi_0 + p_{\psi_0}\, d\psi_0 + p_{\theta_0}\, d\theta_0 \tag{49}$$

which means that the map $((M_3, \gamma), (L, \psi), (G, \varphi)) \to ((p_{\theta_0}, \theta_0), (p_{\varphi_0}, \varphi_0), (p_{\psi_0}, \psi_0))$ is a canonical map. And in the new coordinates, the kinetic energy, hence the Hamiltonian, takes the form

$$K = \frac{1}{2} \left[ \frac{L^2}{I_3} + (G^2 - L^2) \Big( \frac{\sin^2 \psi}{I_1} + \frac{\cos^2 \psi}{I_2} \Big) \right] \tag{50}$$

This again shows that $G$, $M_3$ are constants of motion, and the $L$, $\psi$ variables are determined by a quadrature, because the Hamilton equation for $\psi$ combined with the energy conservation yields

$$\dot\psi = \pm \Big( \frac{1}{I_3} - \frac{\sin^2 \psi}{I_1} - \frac{\cos^2 \psi}{I_2} \Big) \sqrt{ \frac{ 2E - G^2 \Big( \dfrac{\sin^2 \psi}{I_1} + \dfrac{\cos^2 \psi}{I_2} \Big) }{ \dfrac{1}{I_3} - \dfrac{\sin^2 \psi}{I_1} - \dfrac{\cos^2 \psi}{I_2} } } \tag{51}$$

In the integrability region, this motion is periodic with some period $T_L(E, G)$. Once $\psi(t)$ is determined, the Hamilton equation for $\varphi$ leads to the further quadrature

$$\dot\varphi = \Big( \frac{\sin^2 \psi(t)}{I_1} + \frac{\cos^2 \psi(t)}{I_2} \Big) G \tag{52}$$

which determines a second periodic motion with period $T_G(E, G)$. The $\gamma$, $M_3$ are constants and, therefore, the motion takes place on three-dimensional invariant tori $\mathcal{T}_{E, G, M_3}$ in phase space, each of which is "always" foliated into two-dimensional invariant tori parametrized by the angle $\gamma$ which is constant (by [50], because $K$ is $M_3$-independent): the latter are in turn foliated by one-dimensional invariant tori, that is, by periodic orbits, with $E$, $G$ such that the value of $T_L(E, G) / T_G(E, G)$ is rational.

Note that if $I_1 = I_2 = I$, the above analysis is extremely simplified. Furthermore, if gravity $g$ acts on the system the Hamiltonian will simply change by the addition of a potential $-mgz$ if $z$ is the height of the center of mass. Then (see Figure 4), if the center of mass of the body is on the axis $\mathbf{i}_3$ and $z = h \cos \theta_0$, and $h$ is the distance of the center of mass from $O$, since $\cos \theta_0 = \cos \theta \cos \zeta - \sin \theta \sin \zeta \cos \varphi$, the Hamiltonian will become $H = K - mgh \cos \theta_0$ or, using [50] with $I_1 = I_2 = I$,

$$H = \frac{L^2}{2 I_3} + \frac{G^2 - L^2}{2 I} - mgh \left( \frac{M_3 L}{G^2} - \Big( 1 - \frac{M_3^2}{G^2} \Big)^{1/2} \Big( 1 - \frac{L^2}{G^2} \Big)^{1/2} \cos \varphi \right) \tag{53}$$

so that, again, the system is integrable by quadratures (with the roles of $\psi$ and $\varphi$ "interchanged" with respect to the previous case) in suitable regions of phase space. This is called Lagrange's gyroscope.

A less elementary integrable case is when the inertia moments are related as $I_1 = I_2 = 2 I_3$ and the center of mass is in the $\mathbf{i}_1$-$\mathbf{i}_2$ plane (rather than on the $\mathbf{i}_3$-axis) and only gravity acts, besides the constraint force on the pivot point $O$; this is called Kowalevskaia's gyroscope.

For more details, see Gallavotti (1983).

Other Quadratures

An interesting classical integrable motion is that of a point mass attracted by two equal-mass centers of gravitational attraction, or a point ideally constrained to move on the surface of a general ellipsoid.

New integrable systems have been discovered quite recently and have generated a wealth of new developments ranging from group theory (as integrable systems are closely related to symmetries) to partial differential equations.

It is convenient to extend the notion of integrability by stating that a system is integrable in a region $W$ of phase space if

1. there is a change of coordinates $(\mathbf{p}, \mathbf{q}) \in W \to \{\mathbf{A}, \boldsymbol{\alpha}, \mathbf{Y}, \mathbf{y}\} \in (U \times \mathbb{T}^{\ell}) \times (V \times \mathbb{R}^m)$ where $U \subset \mathbb{R}^{\ell}$, $V \subset \mathbb{R}^m$, with $m \ge 1$, are open sets; and
2. the $\mathbf{A}$, $\mathbf{Y}$ are constants of motion while the other coordinates vary "linearly":

$$(\boldsymbol{\alpha}, \mathbf{y}) \to (\boldsymbol{\alpha} + \boldsymbol{\omega}(\mathbf{A}, \mathbf{Y})\, t,\ \mathbf{y} + \mathbf{v}(\mathbf{A}, \mathbf{Y})\, t) \tag{54}$$

where $\boldsymbol{\omega}(\mathbf{A}, \mathbf{Y})$, $\mathbf{v}(\mathbf{A}, \mathbf{Y})$ are smooth functions.

In the new sense, the systems studied in the previous sections are integrable in much wider regions (essentially on the entire phase space with the exception of a set of data which lie on lower-dimensional surfaces
forming sets of zero volume). The notion is convenient also because it allows us to say that even the systems of free particles are integrable.

Two very remarkable systems integrable in the new sense are the Hamiltonian systems, respectively called Toda lattice (KRUSKAL, ZABUSKY), and Calogero lattice (CALOGERO, MOSER); if $(p_i, q_i) \in \mathbb{R}^2$, they are

$$\begin{aligned} H_T(\mathbf{p}, \mathbf{q}) &= \frac{1}{2m} \sum_{i=1}^{n} p_i^2 + \sum_{i=1}^{n-1} g\, e^{-\kappa (q_{i+1} - q_i)} \\ H_C(\mathbf{p}, \mathbf{q}) &= \frac{1}{2m} \sum_{i=1}^{n} p_i^2 + \sum_{i<j}^{n} \frac{g}{(q_i - q_j)^2} + \frac{1}{2} \sum_{i=1}^{n} m \omega^2 q_i^2 \end{aligned} \tag{55}$$

where $m > 0$ and $\kappa, \omega, g \ge 0$. They describe the motion of $n$ interacting particles on a line.

The integration method for the above systems is again to find first the constants of motion and later to look for quadratures, when appropriate. The constants of motion can be found with the method of the Lax pairs. One shows that there is a pair of self-adjoint $n \times n$ matrices $M(\mathbf{p}, \mathbf{q})$, $N(\mathbf{p}, \mathbf{q})$ such that the equations of motion become

$$\frac{d}{dt} M(\mathbf{p}, \mathbf{q}) = i \big[ M(\mathbf{p}, \mathbf{q}), N(\mathbf{p}, \mathbf{q}) \big], \qquad i = \sqrt{-1} \tag{56}$$

which imply that $M(t) = U(t) M(0) U(t)^{-1}$, with $U(t)$ a unitary matrix. When the equations can be written in the above form, it is clear that the $n$ eigenvalues of the matrix $M(0) = M(\mathbf{p}_0, \mathbf{q}_0)$ are constants of motion. When appropriate (e.g., in the Calogero lattice case with $\omega > 0$), it is possible to proceed to find canonical action-angle coordinates: a task that is quite difficult due to the arbitrariness of $n$, but which is possible.

The Lax pairs for the Calogero lattice (with $\omega = 0$, $g = m = 1$) are, with the sign conventions fixed by [56],

$$M_{hh} = p_h, \quad N_{hh} = \sum_{j \ne h} \frac{1}{(q_h - q_j)^2}, \qquad M_{hk} = \frac{i}{q_h - q_k}, \quad N_{hk} = -\frac{1}{(q_h - q_k)^2}, \quad h \ne k \tag{57}$$

(for $n = 2$ the diagonal of $N$ is a multiple of the identity and can be replaced by $N_{hh} = 0$), while for the Toda lattice (with $m = g = \frac{1}{2} \kappa = 1$) the nonzero matrix elements of $M$, $N$ are

$$M_{hh} = p_h, \quad M_{h,h+1} = M_{h+1,h} = e^{q_h - q_{h+1}}, \qquad N_{h,h+1} = -N_{h+1,h} = -i\, e^{q_h - q_{h+1}} \tag{58}$$

which are checked by first trying the case $n = 2$.

Another integrable system (SUTHERLAND) is

$$H_S(\mathbf{p}, \mathbf{q}) = \frac{1}{2m} \sum_{k=1}^{n} p_k^2 + \sum_{h<k}^{n} \frac{g}{\sinh^2 (q_h - q_k)} \tag{59}$$

whose Lax pair is related to that of the Calogero lattice.

By taking suitable limits as $n \to \infty$ and as the other parameters tend to $0$ or $\infty$ at suitable rates, integrability of a few differential equations, among which the Korteweg-de Vries equation or the nonlinear Schrödinger equation, can be derived.

As mentioned in the introductory section, symmetry properties under continuous groups imply existence of constants of motion. Hence, it is natural to think that integrability of a mechanical system reflects enough symmetry to imply the existence of as many constants of motion, independent and in involution, as the number of degrees of freedom, $n$.

This is in fact always true, and in some respects it is a tautological statement in the anisochronous cases. Integrability in a region $W$ implies existence of canonical action-angle coordinates $(\mathbf{A}, \boldsymbol{\alpha})$ (see the section "Quasiperiodicity and integrability") and the Hamiltonian depends solely on the $\mathbf{A}$'s: therefore, its restriction to $W$ is invariant with respect to the action of the continuous commutative group $\mathbb{T}^n$ of the translations of the angle variables. The actions can be seen as constants of motion whose existence follows from Noether's theorem, at least in the anisochronous cases in which the Hamiltonian formulation is equivalent to a Lagrangian one.

What is nontrivial is to recognize, prior to realizing integrability, that a system admits this kind of symmetry: in most of the interesting cases, the systems either do not exhibit obvious symmetries or they exhibit symmetries apparently unrelated to the group $\mathbb{T}^n$, which nevertheless imply existence of sufficiently many independent constants of motion as required for integrability. Hence, nontrivial integrable systems possess a "hidden" symmetry under $\mathbb{T}^n$: the rigid body is an example.

However, very often the symmetries of a Hamiltonian $H$ which imply integrability also imply partial isochrony, that is, they imply that the number of independent frequencies is smaller than $n$ (see the section "Quasiperiodicity and integrability"). Even in such cases, often a map exists from the original coordinates $(\mathbf{p}, \mathbf{q})$ to the integrating variables $(\mathbf{A}, \boldsymbol{\alpha})$ in which $\mathbf{A}$ are constants of motion and the $\boldsymbol{\alpha}$ are uniformly rotating angles (some of which are also constant) with spectrum $\boldsymbol{\omega}(\mathbf{A})$, which is the gradient $\partial_{\mathbf{A}} h(\mathbf{A})$ for some function $h(\mathbf{A})$ depending only on a few of the $\mathbf{A}$ coordinates. However, the map might fail to be canonical. The system is then said to be bi-Hamiltonian: in the sense that one can represent motions in two systems of canonical coordinates, not related by a canonical transformation, and by two Hamiltonian functions $H$ and $H' \equiv h$ which generate the same motions in the respective
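coordinates (the latter changes of variables are sometimes called "canonical with respect to the pair $H$, $H'$" while the transformations considered in the section "Canonical transformations of phase space coordinates" are called completely canonical).

For more details, we refer the reader to Calogero and Degasperis (1982).

Generic Nonintegrability

It is natural to try to prove that a system "close" to an integrable one has motions with properties very close to quasiperiodic. This is indeed the case, but in a rather subtle way. That there is a problem is easily seen in the case of a perturbation of an anisochronous integrable system.

Assume that a system is integrable in a region $W$ of phase space which, in the integrating action-angle variables $(\mathbf{A}, \boldsymbol{\alpha})$, has the standard form $U \times \mathbb{T}^{\ell}$ with a Hamiltonian $h(\mathbf{A})$ with gradient $\boldsymbol{\omega}(\mathbf{A}) = \partial_{\mathbf{A}} h(\mathbf{A})$. If the forces are perturbed by a potential which is smooth then the new system will be described, in the same coordinates, by a Hamiltonian like

$$H_\varepsilon(\mathbf{A}, \boldsymbol{\alpha}) = h(\mathbf{A}) + \varepsilon f(\mathbf{A}, \boldsymbol{\alpha}) \tag{60}$$

with $h$, $f$ analytic in the variables $\mathbf{A}$, $\boldsymbol{\alpha}$.

If the system really behaved like the unperturbed one, it ought to have $\ell$ constants of motion of the form $F_\varepsilon(\mathbf{A}, \boldsymbol{\alpha})$ analytic in $\varepsilon$ near $\varepsilon = 0$ and uniform, that is, single valued (which is the same as periodic) in the variables $\boldsymbol{\alpha}$. However, the following theorem (POINCARÉ) shows that this is a somewhat unlikely possibility.

Theorem 1 If the matrix $\partial^2_{\mathbf{A}\mathbf{A}} h(\mathbf{A})$ has rank $\ge 2$, the Hamiltonian [60] "generically" (an intuitive notion made precise below) cannot be integrated by a canonical transformation $\mathcal{C}_\varepsilon(\mathbf{A}, \boldsymbol{\alpha})$ which

(i) reduces to the identity as $\varepsilon \to 0$; and
(ii) is analytic in $\varepsilon$ near $\varepsilon = 0$ and in $(\mathbf{A}, \boldsymbol{\alpha}) \in U' \times \mathbb{T}^{\ell}$, with $U' \subset U$ open.

Furthermore, no uniform constants of motion $F_\varepsilon(\mathbf{A}, \boldsymbol{\alpha})$, defined for $\varepsilon$ near $0$ and $(\mathbf{A}, \boldsymbol{\alpha})$ in an open domain $U' \times \mathbb{T}^{\ell}$, exist other than the functions of $H_\varepsilon$ itself.

Integrability in the sense (i), (ii) can be called analytic integrability and it is the strongest (and most naive) sense that can be given to the attribute.

The first part of the theorem, that is, (i), (ii), holds simply because, if integrability was assumed, a generating function of the integrating map would have the form $\mathbf{A}' \cdot \boldsymbol{\alpha} + \Phi_\varepsilon(\mathbf{A}', \boldsymbol{\alpha})$ with $\Phi$ admitting a power series expansion in $\varepsilon$ as $\Phi_\varepsilon = \varepsilon \Phi_1 + \varepsilon^2 \Phi_2 + \cdots$. Hence, $\Phi_1$ would have to satisfy

$$\boldsymbol{\omega}(\mathbf{A}') \cdot \partial_{\boldsymbol{\alpha}} \Phi_1(\mathbf{A}', \boldsymbol{\alpha}) + f(\mathbf{A}', \boldsymbol{\alpha}) = \bar f(\mathbf{A}') \tag{61}$$

where $\bar f(\mathbf{A}')$ depends only on $\mathbf{A}'$ (hence integrating both sides with respect to $\boldsymbol{\alpha}$, it appears that $\bar f(\mathbf{A}')$ must coincide with the average of $f(\mathbf{A}', \boldsymbol{\alpha})$ over $\boldsymbol{\alpha}$).

This implies that the Fourier transform $f_{\boldsymbol{\nu}}(\mathbf{A})$, $\boldsymbol{\nu} \in \mathbb{Z}^{\ell}$, should satisfy

$$f_{\boldsymbol{\nu}}(\mathbf{A}') = 0 \quad \text{if} \quad \boldsymbol{\omega}(\mathbf{A}') \cdot \boldsymbol{\nu} = 0, \quad \boldsymbol{\nu} \ne \mathbf{0} \tag{62}$$

which is equivalent to the existence of $\tilde f_{\boldsymbol{\nu}}(\mathbf{A}')$ such that $f_{\boldsymbol{\nu}}(\mathbf{A}') = \boldsymbol{\omega}(\mathbf{A}') \cdot \boldsymbol{\nu}\, \tilde f_{\boldsymbol{\nu}}(\mathbf{A}')$ for $\boldsymbol{\nu} \ne \mathbf{0}$. But since there is no relation between $\boldsymbol{\omega}(\mathbf{A})$ and $f(\mathbf{A}, \boldsymbol{\alpha})$, this property "generically" will not hold in the sense that as close as wished to an $f$ which satisfies the property [62] there will be another $f$ which does not satisfy it, essentially no matter how "closeness" is defined (e.g., with respect to the metric $\| f - g \| = \sum_{\boldsymbol{\nu}} | f_{\boldsymbol{\nu}}(\mathbf{A}) - g_{\boldsymbol{\nu}}(\mathbf{A}) |$). This is so because the rank of $\partial^2_{\mathbf{A}\mathbf{A}} h(\mathbf{A})$ is higher than 1 and $\boldsymbol{\omega}(\mathbf{A})$ varies at least on a two-dimensional surface, so that $\boldsymbol{\omega} \cdot \boldsymbol{\nu} = 0$ becomes certainly possible for some $\boldsymbol{\nu} \ne \mathbf{0}$ while $f_{\boldsymbol{\nu}}(\mathbf{A})$ in general will not vanish, so that $\Phi_1$, hence $\Phi_\varepsilon$, does not exist.

This means that close to a function $f$ there is a function $f'$ which violates [62] for some $\boldsymbol{\nu}$. Of course, this depends on what is meant by "close": however, here essentially any topology introduced on the space of the functions $f$ will make the statement correct. For instance, the distance between two functions can be defined by $\sum_{\boldsymbol{\nu}} \sup_{\mathbf{A} \in U} | f_{\boldsymbol{\nu}}(\mathbf{A}) - g_{\boldsymbol{\nu}}(\mathbf{A}) |$ or by $\sup_{\mathbf{A}, \boldsymbol{\alpha}} | f(\mathbf{A}, \boldsymbol{\alpha}) - g(\mathbf{A}, \boldsymbol{\alpha}) |$.

The idea behind the last statement of the theorem is in essence the same: consider, for simplicity, the anisochronous case in which the matrix $\partial^2_{\mathbf{A}\mathbf{A}} h(\mathbf{A})$ has maximal rank $\ell$, that is, the determinant $\det \partial^2_{\mathbf{A}\mathbf{A}} h(\mathbf{A})$ does not vanish. Anisochrony implies that $\boldsymbol{\omega}(\mathbf{A}) \cdot \boldsymbol{\nu} \ne 0$ for all $\boldsymbol{\nu} \ne \mathbf{0}$ and $\mathbf{A}$ on a dense set, and this property will be used repeatedly in the following analysis.

Let $B(\varepsilon, \mathbf{A}, \boldsymbol{\alpha})$ be a "uniform" constant of motion, meaning that it is single valued and analytic in the non-simply-connected region $U \times \mathbb{T}^{\ell}$ and, for $\varepsilon$ small,

$$B(\varepsilon, \mathbf{A}, \boldsymbol{\alpha}) = B_0(\mathbf{A}, \boldsymbol{\alpha}) + \varepsilon B_1(\mathbf{A}, \boldsymbol{\alpha}) + \varepsilon^2 B_2(\mathbf{A}, \boldsymbol{\alpha}) + \cdots \tag{63}$$

The condition that $B$ is a constant of motion can be written order by order in its expansion in $\varepsilon$: the first two orders are

$$\begin{aligned} &\boldsymbol{\omega}(\mathbf{A}) \cdot \partial_{\boldsymbol{\alpha}} B_0(\mathbf{A}, \boldsymbol{\alpha}) = 0 \\ &\partial_{\mathbf{A}} f(\mathbf{A}, \boldsymbol{\alpha}) \cdot \partial_{\boldsymbol{\alpha}} B_0(\mathbf{A}, \boldsymbol{\alpha}) - \partial_{\boldsymbol{\alpha}} f(\mathbf{A}, \boldsymbol{\alpha}) \cdot \partial_{\mathbf{A}} B_0(\mathbf{A}, \boldsymbol{\alpha}) + \boldsymbol{\omega}(\mathbf{A}) \cdot \partial_{\boldsymbol{\alpha}} B_1(\mathbf{A}, \boldsymbol{\alpha}) = 0 \end{aligned} \tag{64}$$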
Then the above two relations and anisochrony imply (1) that $B_0$ must be a function of $\mathbf{A}$ only and (2) that $\boldsymbol{\omega}(\mathbf{A}) \cdot \boldsymbol{\nu}$ and $\partial_{\mathbf{A}} B_0(\mathbf{A}) \cdot \boldsymbol{\nu}$ vanish simultaneously for all $\boldsymbol{\nu}$. Hence, the gradient of $B_0$ must be proportional to $\boldsymbol{\omega}(\mathbf{A})$, that is, to the gradient of $h(\mathbf{A})$: $\partial_{\mathbf{A}} B_0(\mathbf{A}) = \lambda(\mathbf{A})\, \partial_{\mathbf{A}} h(\mathbf{A})$. Therefore, generically (because of the anisochrony) it must be that $B_0$ depends on $\mathbf{A}$ through $h(\mathbf{A})$: $B_0(\mathbf{A}) = F(h(\mathbf{A}))$ for some $F$.

Looking again, with the new information, at the second of [64] it follows that at fixed $\mathbf{A}$ the $\boldsymbol{\alpha}$-derivative in the direction $\boldsymbol{\omega}(\mathbf{A})$ of $B_1$ equals $F'(h(\mathbf{A}))$ times the $\boldsymbol{\alpha}$-derivative of $f$, that is, $B_1(\mathbf{A}, \boldsymbol{\alpha}) = f(\mathbf{A}, \boldsymbol{\alpha}) F'(h(\mathbf{A})) + C_1(\mathbf{A})$.

Summarizing: the constant of motion $B$ has been written as $B(\mathbf{A}, \boldsymbol{\alpha}) = F(h(\mathbf{A})) + \varepsilon F'(h(\mathbf{A})) f(\mathbf{A}, \boldsymbol{\alpha}) + \varepsilon C_1(\mathbf{A}) + \varepsilon^2 B_2 + \cdots$, which is equivalent to $B(\mathbf{A}, \boldsymbol{\alpha}) = F(H_\varepsilon) + \varepsilon (B'_0 + \varepsilon B'_1 + \cdots)$ and therefore $B'_0 + \varepsilon B'_1 + \cdots$ is another analytic constant of motion. Repeating the argument, also $B'_0 + \varepsilon B'_1 + \cdots$ must have the form $F_1(H_\varepsilon) + \varepsilon (B''_0 + \varepsilon B''_1 + \cdots)$; conclusion

$$B = F(H_\varepsilon) + \varepsilon F_1(H_\varepsilon) + \varepsilon^2 F_2(H_\varepsilon) + \cdots + \varepsilon^n F_n(H_\varepsilon) + O(\varepsilon^{n+1}) \tag{65}$$

By analyticity, $B = F_\varepsilon(H_\varepsilon(\mathbf{A}, \boldsymbol{\alpha}))$ for some $F_\varepsilon$: hence generically all constants of motion are trivial.

Therefore, a system close to integrable cannot behave as it would naively be expected. The problem, however, was not manifest until POINCARÉ's proof of the above results: because in most applications the function $f$ has only finitely many Fourier components, or at least is replaced by an approximation with this property, so that at least [62] and even a few of the higher-order constraints like [64] become possible in open regions of action space. In fact, it may happen that the values of $\mathbf{A}$ of interest are restricted so that $\boldsymbol{\omega}(\mathbf{A}) \cdot \boldsymbol{\nu} = 0$ only for "large" values of $\boldsymbol{\nu}$ for which $f_{\boldsymbol{\nu}} = 0$. Nevertheless, the property that $f_{\boldsymbol{\nu}}(\mathbf{A}) = (\boldsymbol{\omega}(\mathbf{A}) \cdot \boldsymbol{\nu})\, \tilde f_{\boldsymbol{\nu}}(\mathbf{A})$ (or the analogous higher-order conditions, e.g., [64]), which we have seen to be necessary for analytic integrability of the perturbed system, can be checked to fail in important problems, if no approximation is made on $f$. Hence a conceptual problem arises.

For more details see Poincaré (1987).

Perturbing Functions

To check, in a given problem, the nonexistence of nontrivial constants of motion along the lines indicated in the previous section, it is necessary to express the potential, usually given in Cartesian coordinates as $\varepsilon V(\mathbf{x})$, in terms of the action-angle variables of the unperturbed, integrable, system.

In particular, the problem arises when trying to check nonexistence of nontrivial constants of motion when the anisochrony assumption (cf. the previous section) is not satisfied. Usually it becomes satisfied "to second order" (or higher): but to show this, more detailed information on the structure of the perturbing function expressed in action-angle variables is needed. For instance, this is often necessary even when the perturbation is approximated by a trigonometric polynomial, as is essentially always the case in celestial mechanics.

Finding explicit expressions for the action-angle variables is in itself a rather nontrivial task which leads to many problems of intrinsic interest even in seemingly simple cases. For instance, in the case of the planar gravitational central motion, the Kepler equation $\lambda = \xi - \varepsilon \sin \xi$ (see the first of [41]) must be solved expressing $\xi$ in terms of $\lambda$ (see the first of [42]). It is obvious that for small $\varepsilon$, the variable $\xi$ can be expressed as an analytic function of $\varepsilon$: nevertheless, the actual construction of this expression leads to several problems. For small $\varepsilon$, an interesting algorithm is the following.

Let $h(\lambda) = \xi - \lambda$, so that the equation to solve (i.e., the first of [41]) is

$$h(\lambda) = \varepsilon \sin(\lambda + h(\lambda)) \equiv -\varepsilon\, \frac{\partial c}{\partial \lambda} (\lambda + h(\lambda)) \tag{66}$$

where $c(\lambda) = \cos \lambda$; the function $\lambda \to h(\lambda)$ should be periodic in $\lambda$, with period $2\pi$, and analytic in $\varepsilon$, $\lambda$ for $\varepsilon$ small and $\lambda$ real. If $h(\lambda) = \varepsilon h^{(1)} + \varepsilon^2 h^{(2)} + \cdots$, the Fourier transform of $h^{(k)}(\lambda)$ satisfies the recursion relation

$$h^{(k)}_{\nu} = -\sum_{p=1}^{\infty} \frac{1}{p!} \sum_{\substack{k_1 + \cdots + k_p = k - 1 \\ \nu_0 + \nu_1 + \cdots + \nu_p = \nu}} (i \nu_0)\, c_{\nu_0}\, (i \nu_0)^p \prod_{j=1}^{p} h^{(k_j)}_{\nu_j}, \qquad k > 1 \tag{67}$$

with $c_\nu$ the Fourier transform of the cosine ($c_{\pm 1} = \frac{1}{2}$, $c_\nu = 0$ if $\nu \ne \pm 1$), and (of course) $h^{(1)}_\nu = -i \nu c_\nu$.

Equation [67] is obtained by expanding the RHS of [66] in powers of $h$ and then taking the Fourier transform of both sides retaining only terms of order $k$ in $\varepsilon$.

Iterating the above relation, imagine drawing all trees $\theta$ with $k$ "branches," or "lines," distinguished by a label taking $k$ values, and $k$ nodes and attach to each node $v$ a harmonic label $\nu_v = \pm 1$ as in Figure 5. The trees will be assumed to start with a root line $vr$ linking a point $r$ and the first node $v$ (see Figure 5)
Introductory Article: Classical Mechanics 17

and then bifurcate arbitrarily (such trees are sometimes called "rooted trees").

Figure 5 An example of a tree graph and its labels. It contains only one simple node (3). Harmonics are indicated next to their nodes. Labels distinguishing lines are not marked.

Imagine the tree oriented from the endpoints towards the root $r$ (not to be considered a node) and, given a node $v$, call $v'$ the node immediately following it. If $v$ is the first node before the root $r$, let $v' = r$ and $\nu_{v'} = 1$. For each such decorated tree $\theta$ define its numerical value

$$ \mathrm{Val}(\theta) = \frac{-i}{k!} \Big( \prod_{\text{lines } l = v'v} \nu_{v'} \nu_v \Big) \Big( \prod_{\text{nodes } v} c_{\nu_v} \Big) \tag{68} $$

and define a current $\nu(l)$ on a line $l = v'v$ to be the sum of the harmonics of the nodes preceding $v'$: $\nu(l) = \sum_{w \le v} \nu_w$. Call $\nu(\theta)$ the current flowing in the root branch and call order of $\theta$ the number of nodes (or branches). Then

$$ h^{(k)}_\nu = \sum_{\theta,\ \nu(\theta) = \nu,\ \mathrm{order}(\theta) = k} \mathrm{Val}(\theta) \tag{69} $$

provided trees are considered identical if they can be overlapped (labels included) after suitably scaling the lengths of their branches and pivoting them around the nodes out of which they emerge (the root is always imagined to be fixed at the origin).

If the trees are stripped of the harmonic labels, their number is finite and it can be estimated to be $\le k!\,4^k$ (because the labels which distinguish the lines can be attached to an unlabeled tree in many ways). The harmonic labels (i.e., $\nu_v = \pm 1$) can be laid down in $2^k$ ways, and the value of each tree can be bounded by $(1/k!)\,2^{-k}$ (because $c_{\pm 1} = \frac{1}{2}$). Hence $\sum_\nu |h^{(k)}_\nu| \le 4^k$, which gives a (rough) estimate of the radius of convergence of the expansion of $h$ in powers of $\varepsilon$: namely 0.25 (easily improvable to 0.3678 if $4^k k!$ is replaced by $k^{k-1}$ using Cayley's formula for the enumeration of rooted trees). A simple expression for $h^{(k)}(\psi)$ (LAGRANGE) is

$$ h^{(k)}(\psi) = \frac{1}{k!}\, \partial_\psi^{\,k-1} \sin^k \psi $$

(also readable from the tree representation): the actual radius of convergence, first determined by Laplace, of the series for $h$ can also be determined from the latter expression for $h$ (ROUCHÉ) or directly from the tree representation: it is 0.6627.

One can find better estimates or at least more efficient methods for evaluating the sums in [69]: in fact, in performing the sum in [69] important cancellations occur. For instance, the harmonic labels can be subject to the further strong constraint that no line carries zero current because the sum of the values of the trees of fixed order and with at least one line carrying zero current vanishes.

The above expansion can also be simplified by "partial resummations." For the purpose of an example, let the nodes with one entering and one exiting line (see Figure 5) be called "simple" nodes. Then all tree graphs which, on any line between two nonsimple nodes, contain any number of simple nodes can be eliminated. This is done by replacing, in evaluating the (remaining) tree values, the factors $\nu_{v'} \nu_v$ in [68] by $\nu_{v'} \nu_v / (1 - \varepsilon \cos\psi)$: then the value of $\theta$ (denoted $\overline{\mathrm{Val}}(\theta)$) for a tree $\theta$ becomes a function of $\psi$ and $\varepsilon$ and [69] is replaced by

$$ h(\psi) = \sum_{k=1}^{\infty} \ \sideset{}{^*}\sum_{\theta,\ \mathrm{order}(\theta) = k} \varepsilon^k e^{i \nu(\theta) \psi}\, \overline{\mathrm{Val}}(\theta) \tag{70} $$

where the $*$ means that the trees are subject to the further restriction of not containing any simple node. It should be noted that the above graphical representation of the solution of the Kepler equation is strongly reminiscent of the representations of quantities in terms of graphs that occur often in quantum field theory. Here the trees correspond to "Feynman graphs," the factors associated with the nodes are the "couplings," the factors associated with the lines are the "propagators," and the resummations are analogous to the "self-energy resummations," while the cancellations mentioned above can be related to the class of identities called "Ward identities." Not only can the analogy be shown not to be superficial, but it also turns out to be very helpful in key mechanical problems: see Appendix 1.

The existence of a vast number of identities relating the tree values is shown already by the simple form of the Lagrange series and by the even more remarkable resummation (LEVI-CIVITA) leading to

$$ h(\psi) = \sum_{k=1}^{\infty} \frac{(\varepsilon \sin\psi)^k}{k!} \Big( \frac{1}{1 - \varepsilon \cos\psi}\, \partial_\psi \Big)^k \psi \tag{71} $$
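The Lagrange expression $h^{(k)}(\psi) = \frac{1}{k!}\partial_\psi^{k-1}\sin^k\psi$ can be checked numerically. The following is a minimal Python sketch, not the article's own procedure: it assumes the Kepler equation in the form $h = \varepsilon \sin(\psi + h)$, expands $\sin^k\psi$ as a finite Fourier sum so that the derivative is explicit, and compares the truncated series with a plain fixed-point solution. The function names are hypothetical.

```python
import math


def lagrange_coefficient(k, psi):
    """h^(k)(psi) = (1/k!) d^(k-1)/dpsi^(k-1) sin^k(psi).

    Uses sin^k(psi) = (2i)^(-k) * sum_j (-1)^j C(k,j) e^{i(k-2j)psi};
    differentiating k-1 times and collecting terms leaves a real sine sum.
    """
    total = 0.0
    for j in range(k + 1):
        m = k - 2 * j  # harmonic of this Fourier term
        total += (-1) ** j * math.comb(k, j) * m ** (k - 1) * math.sin(m * psi)
    return total / (2 ** k * math.factorial(k))


def lagrange_series(eps, psi, order):
    """Partial sum of h = sum_k eps^k h^(k)(psi) up to the given order."""
    return sum(eps ** k * lagrange_coefficient(k, psi) for k in range(1, order + 1))


def solve_kepler(eps, psi, iterations=200):
    """Fixed-point iteration for h = eps*sin(psi + h) (assumed form of the equation).

    The map is a contraction for |eps| < 1, so the iteration converges.
    """
    h = 0.0
    for _ in range(iterations):
        h = eps * math.sin(psi + h)
    return h


if __name__ == "__main__":
    eps, psi = 0.2, 1.0
    print(lagrange_series(eps, psi, 30), solve_kepler(eps, psi))
```

For $\varepsilon$ well inside the Laplace radius 0.6627 the two values agree to rounding error; for real $\psi$ and $\varepsilon$ above that radius the partial sums of the power series are no longer expected to converge uniformly in $\psi$, although the fixed-point solution still exists for $\varepsilon < 1$.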

It is even possible to further collect the series terms to express it as a series with much better convergence properties; for instance, its terms can be reorganized and collected (resummed) so that $h$ is expressed as a power series in the parameter

$$ \eta = \frac{\varepsilon\, e^{\sqrt{1 - \varepsilon^2}}}{1 + \sqrt{1 - \varepsilon^2}} \tag{72} $$

with radius of convergence 1, which corresponds to $\varepsilon = 1$ (via a simple argument by Levi-Civita). The analyticity domain for the Lagrange series is $|\eta| < 1$. This also determines the value of Laplace radius, which is the point closest to the origin of the complex curve $|\eta(\varepsilon)| = 1$: it is imaginary so that it is the root of the equation

$$ \frac{\varepsilon\, e^{\sqrt{1 + \varepsilon^2}}}{1 + \sqrt{1 + \varepsilon^2}} = 1 $$

The analysis provides an example, in a simple case of great interest in applications, of the kind of computations actually necessary to represent the perturbing function in terms of action-angle variables. The property that the function $c$ in [66] is the cosine has been used only to limit the range of the label $\nu$ to be $\pm 1$; hence the same method, with similar results, can be applied to study the inversion of the relation between the average anomaly and the true anomaly and to efficiently obtain, for instance, the properties of $f$, $g$ in [42].

For more details, the reader is referred to Levi-Civita (1956).

Lindstedt and Birkhoff Series: Divergences

Nonexistence of constants of motion, rather than being the end of the attempts to study motions close to integrable ones by perturbation methods, marks the beginning of renewed efforts to understand their nature.

Let $(A, \alpha) \in U \times T^\ell$ be action-angle variables defined in the integrability region for an analytic Hamiltonian and let $h(A)$ be its value in the action-angle coordinates. Suppose that $h(A)$ is anisochronous and let $f(A, \alpha)$ be an analytic perturbing function. Consider, for $\varepsilon$ small, the Hamiltonian $H_\varepsilon(A, \alpha) = H_0(A) + \varepsilon f(A, \alpha)$.

Let $\omega_0 = \omega(A_0) \equiv \partial_A H_0(A_0)$ be the frequency spectrum (see the section "Quasiperiodicity and integrability") of one of the invariant tori of the unperturbed system corresponding to an action $A_0$. Short of integrability, the question to ask at this point is whether the perturbed system admits an analytic invariant torus on which the motion is quasiperiodic and

1. has the same spectrum $\omega_0$,
2. depends analytically on $\varepsilon$ at least for $\varepsilon$ small,
3. reduces to the unperturbed torus $\{A_0\} \times T^\ell$ as $\varepsilon \to 0$.

More concretely, the question is:

Are there functions $H_\varepsilon(\psi)$, $h_\varepsilon(\psi)$ analytic in $\psi \in T^\ell$ and in $\varepsilon$ near 0, vanishing as $\varepsilon \to 0$ and such that the torus with parametric equations

$$ A = A_0 + H_\varepsilon(\psi), \qquad \alpha = \psi + h_\varepsilon(\psi), \qquad \psi \in T^\ell \tag{73} $$

is invariant and, if $\omega_0 \stackrel{\mathrm{def}}{=} \omega(A_0)$, the motion on it is simply $\psi \to \psi + \omega_0 t$, i.e., it is quasiperiodic with spectrum $\omega_0$?

In this context, Poincaré's theorem (in the section "Generic nonintegrability") had followed another key result, earlier developed in particular cases and completed by him, which provides a partial answer to the question.

Suppose that $\omega_0 = \omega(A_0) \in R^\ell$ satisfies a Diophantine property, namely suppose that there exist constants $C, \tau > 0$ such that

$$ |\omega_0 \cdot \nu| \ge \frac{1}{C |\nu|^\tau}, \qquad \text{for all } 0 \ne \nu \in Z^\ell \tag{74} $$

which, for each $\tau > \ell - 1$ fixed, is a property enjoyed by all $\omega \in R^\ell$ but for a set of zero measure. Then the motions on the unperturbed torus run over trajectories that fill the torus densely because of the "irrationality" of $\omega_0$ implied by [74]. Writing Hamilton's equations,

$$ \dot\alpha = \partial_A H_0(A) + \varepsilon\, \partial_A f(A, \alpha), \qquad \dot A = -\varepsilon\, \partial_\alpha f(A, \alpha) $$

with $A$, $\alpha$ given by [73] with $\psi$ replaced by $\psi + \omega_0 t$, and using the density of the unperturbed trajectories implied by [74], the condition that [73] are equations for an invariant torus on which the motion is $\psi \to \psi + \omega_0 t$ are

$$ \begin{aligned} \omega_0 + (\omega_0 \cdot \partial_\psi) h_\varepsilon(\psi) &= \partial_A H_0(A_0 + H_\varepsilon(\psi)) + \varepsilon\, \partial_A f(A_0 + H_\varepsilon(\psi), \psi + h_\varepsilon(\psi)) \\ (\omega_0 \cdot \partial_\psi) H_\varepsilon(\psi) &= -\varepsilon\, \partial_\alpha f(A_0 + H_\varepsilon(\psi), \psi + h_\varepsilon(\psi)) \end{aligned} \tag{75} $$

The theorem referred to above (POINCARÉ) is that

Theorem 2 If the unperturbed system is anisochronous and $\omega_0 = \omega(A_0)$ satisfies [74] for some $C, \tau > 0$ there exist two well defined power series $h_\varepsilon(\psi) = \sum_{k=1}^{\infty} \varepsilon^k h^{(k)}(\psi)$ and $H_\varepsilon(\psi) = \sum_{k=1}^{\infty} \varepsilon^k H^{(k)}(\psi)$ which

solve [75] to all orders in $\varepsilon$. The series for $H_\varepsilon$ is uniquely determined, and such is also the series for $h_\varepsilon$ up to the addition of an arbitrary constant at each order, so that it is unique if $h_\varepsilon$ is required, as henceforth done with no loss of generality, to have zero average over $\psi$.

The algorithm for the construction is illustrated in a simple case in the next section (see eqns [83], [84]). Convergence of the above series, called Lindstedt series, even for small $\varepsilon$ has been a problem for rather a long time. Poincaré proved the existence of the formal solution; but his other result, discussed in the section "Generic nonintegrability," casts doubts on convergence although it does not exclude it, as was immediately stressed by several authors (including Poincaré himself). The result in that section shows the impossibility of solving [75] for all $\omega_0$'s near a given spectrum, analytically and uniformly, but it does not exclude the possibility of solving it for a single $\omega_0$.

The theorem admits several extensions or analogs: an interesting one is to the case of isochronous unperturbed systems:

Given the Hamiltonian $H_\varepsilon(A, \alpha) = \omega_0 \cdot A + \varepsilon f(A, \alpha)$, with $\omega_0$ satisfying [74] and $f$ analytic, there exist power series $C_\varepsilon(A', \alpha')$, $u_\varepsilon(A')$ such that $H_\varepsilon(C_\varepsilon(A', \alpha')) = \omega_0 \cdot A' + u_\varepsilon(A')$ holds as an equality between formal power series (i.e., order by order in $\varepsilon$) and at the same time the $C_\varepsilon$, regarded as a map, satisfies order by order the condition (i.e., (4.3)) that it is a canonical map.

This means that there is a generating function $A' \cdot \alpha + F_\varepsilon(A', \alpha)$ also defined by a formal power series $F_\varepsilon(A', \alpha) = \sum_{k=1}^{\infty} \varepsilon^k F^{(k)}(A', \alpha)$, that is, such that if $C_\varepsilon(A', \alpha') = (A, \alpha)$ then it is true, order by order in powers of $\varepsilon$, that $A = A' + \partial_\alpha F_\varepsilon(A', \alpha)$ and $\alpha' = \alpha + \partial_{A'} F_\varepsilon(A', \alpha)$. The series for $F_\varepsilon$, $u_\varepsilon$ are called Birkhoff series.

In this isochronous case, if Birkhoff series were convergent for small $\varepsilon$ and $(A', \alpha)$ in a region of the form $U \times T^\ell$, with $U \subset R^\ell$ open and bounded, it would follow that, for small $\varepsilon$, $H_\varepsilon$ would be integrable in a large region of phase space (i.e., where the generating function can be used to build a canonical map: this would essentially be $U \times T^\ell$ deprived of a small layer of points near the boundary of $U$). However, convergence for small $\varepsilon$ is false (in general), as shown by the simple two-dimensional example

$$ H_\varepsilon(A, \alpha) = \omega_0 \cdot A + \varepsilon\, (A_2 + f(\alpha)), \qquad (A, \alpha) \in R^2 \times T^2 \tag{76} $$

with $f(\alpha)$ an arbitrary analytic function with all Fourier coefficients $f_\nu$ positive for $\nu \ne 0$ and $f_0 = 0$. In the latter case, the solution is

$$ u_\varepsilon(A') = \varepsilon A'_2, \qquad F_\varepsilon(A', \alpha) = -\sum_{k=0}^{\infty} \varepsilon^{k+1} \sum_{0 \ne \nu \in Z^2} f_\nu\, e^{i \alpha \cdot \nu}\, \frac{(-i \nu_2)^k}{(i(\omega_{01} \nu_1 + \omega_{02} \nu_2))^{k+1}} \tag{77} $$

The series does not converge: in fact, its convergence would imply integrability and, consequently, bounded trajectories in phase space: however, the equations of motion for [76] can be easily solved explicitly and in any open region near given initial data there are other data which have unbounded trajectories if $\omega_{01} / (\omega_{02} + \varepsilon)$ is rational.

Nevertheless, even in this elementary case a formal sum of the series yields

$$ u_\varepsilon(A') = \varepsilon A'_2, \qquad F_\varepsilon(A', \alpha) = -\varepsilon \sum_{0 \ne \nu \in Z^2} \frac{f_\nu\, e^{i \alpha \cdot \nu}}{i(\omega_{01} \nu_1 + (\omega_{02} + \varepsilon) \nu_2)} \tag{78} $$

and the series in [78] (no longer a power series in $\varepsilon$) is really convergent if $\omega = (\omega_{01}, \omega_{02} + \varepsilon)$ is a Diophantine vector (by [74], because analyticity implies exponential decay of $|f_\nu|$). Remarkably, for such values of $\varepsilon$ the Hamiltonian $H_\varepsilon$ is integrable and it is integrated by the canonical map generated by [78], in spite of the fact that [78] is obtained, from [77], via the nonrigorous sum rule

$$ \sum_{k=0}^{\infty} z^k = \frac{1}{1 - z} \qquad \text{for } z \ne 1 \tag{79} $$

(applied to cases with $|z| \ge 1$, which are certainly realized for a dense set of $\varepsilon$'s even if $\omega$ is Diophantine because the $z$'s have values $z = -\varepsilon \nu_2 / (\omega_0 \cdot \nu)$). In other words, the integration of the equations is elementary and once performed it becomes apparent that, if $\omega$ is Diophantine, the solutions can be rigorously found from [78]. Note that, for instance, this means that relations like $\sum_{k=0}^{\infty} 2^k = -1$ are really used to obtain [78] from [77].

Another extension of Lindstedt series arises in a perturbation of an anisochronous system when asking the question as to what happens to the unperturbed invariant tori $T_{\omega_0}$ on which the spectrum is resonant, that is, $\omega_0 \cdot \nu = 0$ for some $\nu \ne 0$, $\nu \in Z^\ell$. The result is that even in such a case there is a formal power series solution showing that at least a few of the (infinitely many) invariant tori into which $T_{\omega_0}$ is in turn foliated in the unperturbed case can be formally continued at $\varepsilon \ne 0$ (see the section "Resonances and their stability").

For more details, we refer the reader to Poincaré (1987).
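The role of the sum rule [79] in passing from the power series [77] to the closed form [78] can be illustrated on a single Fourier mode. The sketch below is only an illustration under stated assumptions: the unperturbed frequencies `OMEGA0`, the harmonic and the coefficient value are hypothetical choices, and the geometric ratio is taken as $z = -\varepsilon \nu_2 / (\omega_0 \cdot \nu)$, which is what expanding the denominator of [78] in powers of $\varepsilon$ gives.

```python
import math

# Hypothetical unperturbed frequencies (omega_01, omega_02) for the example [76].
OMEGA0 = (1.0, math.sqrt(2.0))


def closed_form_coefficient(eps, nu, f_nu):
    """Fourier coefficient of F_eps at harmonic nu, as in the closed form [78]."""
    divisor = OMEGA0[0] * nu[0] + (OMEGA0[1] + eps) * nu[1]
    return -eps * f_nu / (1j * divisor)


def geometric_ratio(eps, nu):
    """Ratio z of the geometric series obtained by expanding [78] in powers of eps."""
    return -eps * nu[1] / (OMEGA0[0] * nu[0] + OMEGA0[1] * nu[1])


def partial_power_series(eps, nu, f_nu, order):
    """Partial sum (first `order` terms) of the power series in eps for the same coefficient."""
    d0 = OMEGA0[0] * nu[0] + OMEGA0[1] * nu[1]
    z = geometric_ratio(eps, nu)
    return -eps * f_nu / (1j * d0) * sum(z ** k for k in range(order))


if __name__ == "__main__":
    nu, f_nu = (3, -2), 1.0  # a harmonic with a small divisor omega0.nu ~ 0.17
    for eps in (0.05, 0.2):
        print(eps, geometric_ratio(eps, nu),
              abs(partial_power_series(eps, nu, f_nu, 40)),
              abs(closed_form_coefficient(eps, nu, f_nu)))
```

For the smaller $\varepsilon$ the ratio satisfies $|z| < 1$ and the partial sums approach the closed form; for the larger one $|z| > 1$, the partial sums blow up, and only the closed form, i.e., the rule [79], gives a finite value.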

Quasiperiodicity and KAM Stability

To discuss more advanced results, it is convenient to restrict attention to a special (nontrivial) paradigmatic case

$$ H_\varepsilon(A, \alpha) = \tfrac{1}{2} A^2 + \varepsilon f(\alpha) \tag{80} $$

In this simple case (called Thirring model: representing particles on a circle interacting via a potential $\varepsilon f(\alpha)$) the equations for the maximal tori [75] reduce to equations for the only functions $h_\varepsilon$:

$$ (\omega_0 \cdot \partial_\psi)^2 h_\varepsilon(\psi) = -\varepsilon\, \partial_\alpha f(\psi + h_\varepsilon(\psi)), \qquad \psi \in T^\ell \tag{81} $$

as the second of [75] simply becomes the definition of $H_\varepsilon$ because the RHS does not involve $H_\varepsilon$.

The real problem is therefore whether the formal series considered in the last section converge at least for small $\varepsilon$: and the example [76] on the Birkhoff series shows that sometimes sum rules might be needed in order to give a meaning to the series. In fact, whenever a problem (of physical interest) admits a formal power series solution which is not convergent, or which is such that it is not known whether it is convergent, then one should look for sum rules for it.

The modern theory of perturbations starts with the proof of the convergence for $\varepsilon$ small enough of the Lindstedt series (KOLMOGOROV). The general "KAM" result is:

Theorem 3 (KAM) Consider the Hamiltonian $H_\varepsilon(A, \alpha) = h(A) + \varepsilon f(A, \alpha)$, defined in $U = V \times T^\ell$ with $V \subset R^\ell$ open and bounded and with $f(A, \alpha)$, $h(A)$ analytic in the closure $\overline{V} \times T^\ell$ where $h(A)$ is also anisochronous; let $\omega_0 \stackrel{\mathrm{def}}{=} \omega(A_0) = \partial_A h(A_0)$ and assume that $\omega_0$ satisfies [74]. Then

(i) there is $\varepsilon_{C,\tau} > 0$ such that the Lindstedt series converges for $|\varepsilon| < \varepsilon_{C,\tau}$;
(ii) its sum yields two functions $H_\varepsilon(\psi)$, $h_\varepsilon(\psi)$ on $T^\ell$ which parametrize an invariant torus $T_{C,\tau}(A_0, \varepsilon)$;
(iii) on $T_{C,\tau}(A_0, \varepsilon)$ the motion is $\psi \to \psi + \omega_0 t$, see [73]; and
(iv) the set of data in $U$ which belong to invariant tori $T_{C,\tau}(A_0, \varepsilon)$ with $\omega(A_0)$ satisfying [74] with prefixed $C, \tau$ has complement with volume $< \mathrm{const}\ C^{-a}$ for a suitable $a > 0$ and with area also $< \mathrm{const}\ C^{-a}$ on each nontrivial surface of constant energy $H_\varepsilon = E$.

In other words, for small $\varepsilon$ the spectra of most unperturbed quasiperiodic motions can still be found as spectra of perturbed quasiperiodic motions developing on tori which are close to the corresponding unperturbed ones (i.e., with the same spectrum).

This is a stability result: for instance, in systems with two degrees of freedom the invariant tori of dimension two which lie on a given three-dimensional energy surface will separate the points on the energy surface into the set which is "inside" the torus and the set which is "outside." Hence, an initial datum starting (say) inside cannot reach the outside. Likewise, a point starting between two tori has to stay in between forever. Further, if the two tori are close, this means that motion will stay very localized in action space, with a trajectory accessing only points close to the tori and coming close to all such points, within a distance of the order of the distance between the confining tori. The case of three or more degrees of freedom is quite different (see sections "Diffusion in phase space" and "The three-body problem").

In the simple case of the rotators system [80] the equations for the parametric representation of the tori are given by [81]. The latter bear some analogy with the easier problem in [66]: but [81] are $\ell$ equations instead of one and they are differential equations rather than ordinary equations. Furthermore, the function $f(\alpha)$ which plays here the role of $c$ in [66] has Fourier coefficients $f_\nu$ with no restrictions on $\nu$, while the Fourier coefficients $c_\nu$ for $c$ in [66] do not vanish only for $\nu = \pm 1$.

The above differences are, to some extent, "minor" and the power series solution to [81] can be constructed by the same algorithm as used in the case of [66]: namely one forms trees as in Figure 5 with the harmonic labels $\nu_v \in Z$ replaced by $\nu_v \in Z^\ell$ (still to be thought of as possible harmonic indices in the Fourier expansion of the perturbing function $f$). All other labels affixed to the trees in the section "Generic nonintegrability" will be the same. In particular, the current flowing on a branch $l = v'v$ will be defined as the sum of the harmonics of the nodes $w \le v$ preceding $v$:

$$ \nu(l) \stackrel{\mathrm{def}}{=} \sum_{w \le v} \nu_w \tag{82} $$

and we call $\nu(\theta)$ the current flowing in the root branch.

Here the value $\mathrm{Val}(\theta)$ of a tree has to be defined differently because the equation to be solved ([81]) contains the differential operator $(\omega_0 \cdot \partial_\psi)^2$ which, when Fourier transformed, becomes multiplication of the Fourier component with harmonic $\nu$ by $(i \omega_0 \cdot \nu)^2$.

The variation due to the presence of the operator $(\omega_0 \cdot \partial_\psi)^2$ and the necessity of its inversion in the evaluation of $u \cdot h^{(k)}_\nu$, that is, of the component of $h^{(k)}_\nu$ along an arbitrary unit vector $u$, is nevertheless quite simple: the value of a tree graph $\theta$ of order $k$

(i.e., with $k$ nodes and $k$ branches) has to be defined by (cf. [68])

$$ \mathrm{Val}(\theta) \stackrel{\mathrm{def}}{=} \frac{-i\,(-1)^k}{k!} \Big( \prod_{\text{lines } l = v'v} \frac{\nu_{v'} \cdot \nu_v}{(\omega_0 \cdot \nu(l))^2} \Big) \Big( \prod_{\text{nodes } v} f_{\nu_v} \Big) \tag{83} $$

where the $\nu_{v'}$ appearing in the factor relative to the root line $rv$ from the first node $v$ to the root $r$ (see Figure 5) is interpreted as a unit vector $u$ (it was interpreted as 1 in the one-dimensional case [66]). Equation [83] makes sense only for trees in which no line carries zero current. Then the component along $u$ (the harmonic label attached to the root of a tree) of $h^{(k)}$ is given (see also [69]) by

$$ u \cdot h^{(k)}_\nu = \sideset{}{^*}\sum_{\theta,\ \nu(\theta) = \nu,\ \mathrm{order}(\theta) = k} \mathrm{Val}(\theta) \tag{84} $$

where the $*$ means that the sum is only over trees in which a nonzero current $\nu(l)$ flows on the lines $l \in \theta$. The quantity $u \cdot h^{(k)}_0$ will be defined to be 0 (see the previous section).

In the case of [66] zero-current lines could appear: but the contributions from tree graphs containing at least one zero current line would cancel. In the present case, the statement that the above algorithm actually gives $h^{(k)}_\nu$ by simply ignoring trees with lines with zero current is nontrivial. It was Poincaré's contribution to the theory of Lindstedt series to show that even in the general case (cf. [75]) the equations for the invariant tori can be solved by a formal power series. Equation [84] is proved by induction on $k$ after checking it for the first few orders.

The algorithm just described leading to [83] can be extended to the case of the general Hamiltonian considered in the KAM theorem.

The convergence proof is more delicate than the (elementary) one for eqn [66]. In fact, the values of trees of order $k$ can give large contributions to $h^{(k)}_\nu$: because the "new" factors $(\omega_0 \cdot \nu(l))^2$, although not zero, can be quite small and their small size can overwhelm the smallness of the factors $f_\nu$ and $\varepsilon$. In fact, even if $f$ is a trigonometric polynomial (so that $f_\nu$ vanishes identically for $|\nu|$ large enough) the currents flowing in the branches can be very large, of the order of the number $k$ of nodes in the tree; see [82].

This is called the small-divisors problem. The key to its solution goes back to a related work (SIEGEL) which shows that

Theorem 4 Consider the contribution to the sum in [84] from graphs $\theta$ in which no pairs of lines which lie on the same path to the root carry the same current and, furthermore, the node harmonics are bounded by $|\nu| \le N$ for some $N$. Then the number of lines $l$ in $\theta$ with divisor $\omega_0 \cdot \nu(l)$ satisfying $2^{-n} < C |\omega_0 \cdot \nu(l)| \le 2^{-n+1}$ does not exceed $4 N k\, 2^{-n/\tau}$.

Hence, setting

$$ F \stackrel{\mathrm{def}}{=} C^2 \max_{|\nu| \le N} |f_\nu| $$

the corresponding $\mathrm{Val}(\theta)$ can be bounded by

$$ \frac{1}{k!}\, F^k N^{2k} \prod_{n=0}^{\infty} 2^{2n (4 N k\, 2^{-n/\tau})} \stackrel{\mathrm{def}}{=} \frac{1}{k!}\, B^k, \qquad B = F N^2\, 2^{\sum_n 8 n N\, 2^{-n/\tau}} \tag{85} $$

since the product is convergent. In the case in which $f$ is a trigonometric polynomial of degree $N$, the above restricted contributions to $u \cdot h^{(k)}_\nu$ would generate a convergent series for $\varepsilon$ small enough. In fact, the number of trees is bounded (as in the section "Perturbing functions") by $k!\,4^k (2N + 1)^{\ell k}$ so that the series $\sum_{k, \nu} |\varepsilon|^k\, |u \cdot h^{(k)}_\nu|$ would converge for small $\varepsilon$ (i.e., $|\varepsilon| < (B \cdot 4 (2N + 1)^\ell)^{-1}$).

Given this comment, the analysis of the "remaining contributions" becomes the real problem, and it requires new ideas because among the excluded trees there are some simple $k$th order trees whose value alone, if considered separately from the other contributions, would generate a factorially divergent power series in $\varepsilon$.

However, the contributions of all large-valued trees of order $k$ can be shown to cancel: although not exactly (unlike the case of the elementary problem in the section "Perturbing functions," where the cancellation is not necessary for the proof, in spite of its exact occurrence), but enough so that in spite of the existence of exceedingly large values of individual tree graphs their total sum can still be bounded by a constant to the power $k$ so that the power series actually converges for $\varepsilon$ small enough. The idea is discussed in Appendix 1.

For more details, the reader is referred to Poincaré (1987), Kolmogorov (1954), Moser (1962), and Arnold (1989).

Resonances and their Stability

A quasiperiodic motion with $r$ rationally independent frequencies is called resonant if $r$ is strictly less than the number of degrees of freedom, $\ell$. The difference $s = \ell - r$ is the degree of the resonance.

Of particular interest are the cases of a perturbation of an integrable system in which resonant motions take place.

A typical example is the $n$-body problem which studies the mutual perturbations of the motions of $n - 1$ particles gravitating around a more massive particle. If the particle masses can be considered to be negligible, the system will consist of $n - 1$ central Keplerian motions: it will therefore have $\ell = 3(n - 1)$ degrees of freedom. In general, only one frequency per body occurs in the absence of the perturbations (the period of the Keplerian orbit). Hence, $r \le n - 1$ and $s \ge 2(n - 1)$ (or in the planar case $s \ge (n - 1)$) with equality holding when the periods are rationally independent.

Another example is the rigid body with a fixed point perturbed by a conservative force: in this case, the unperturbed system has three degrees of freedom but, in general, only two frequencies (see the discussion following [52]).

Furthermore, in the above examples there is the possibility that the independent frequencies assume, for special initial data, values which are rationally related, giving rise to resonances of even higher order (i.e., with smaller values of $r$).

In an integrable anisochronous system, resonant motions will be dense in phase space because the frequencies $\omega(A)$ will vary as much as the actions and therefore resonances of any order (i.e., any $r < \ell$) will be dense in phase space: in particular, the periodic motions (i.e., the highest-order resonances) will be dense.

Resonances, in integrable systems, can arise in a priori stable integrable systems and in a priori unstable systems: the former are systems whose Hamiltonian admits canonical action-angle coordinates $(A, \alpha) \in U \times T^\ell$ with $U \subset R^\ell$ open, while the latter are systems whose Hamiltonian has, in suitable local canonical coordinates, the form

$$ H_0(A) + \sum_{i=1}^{s_1} \tfrac{1}{2} (p_i^2 - \lambda_i^2 q_i^2) + \sum_{j=1}^{s_2} \tfrac{1}{2} (\pi_j^2 + \mu_j^2 \kappa_j^2), \qquad \lambda_i, \mu_j > 0 \tag{86} $$

where $(A, \alpha) \in U \times T^r$, $U \subset R^r$, $(p, q) \in V \subset R^{2 s_1}$, $(\pi, \kappa) \in V' \subset R^{2 s_2}$ with $V$, $V'$ neighborhoods of the origin and $\ell = r + s_1 + s_2$, $s_i \ge 0$, $s_1 + s_2 > 0$ and $\lambda_j$, $\mu_j$ are called Lyapunov coefficients of the resonance. The perturbations considered are supposed to have the form $\varepsilon f(A, \alpha, p, q, \pi, \kappa)$. The denomination of a priori stable or unstable refers to the properties of the "a priori given unperturbed Hamiltonian." The label "a priori unstable" is certainly appropriate if $s_1 > 0$: here also $s_1 = 0$ is allowed for notational convenience implying that the Lyapunov coefficients in a priori unstable cases are all of order 1 (whether real $\lambda_j$ or imaginary $i \mu_j$). In other words, the a priori stable case, $s_1 = s_2 = 0$ in [86], is the only excluded case. Of course, the stability properties of the motions when a perturbation acts will depend on the perturbation in both cases.

The a priori stable systems usually have a great variety of resonances (e.g., in the anisochronous case, resonances of any dimension are dense). The a priori unstable systems have (among possible other resonances) some very special $r$-dimensional resonances occurring when the unstable coordinates $(p, q)$ and $(\pi, \kappa)$ are zero and the frequencies of the $r$ action-angle coordinates are rationally independent.

In the first case (a priori stable), the general question is whether the resonant motions, which form invariant tori of dimension $r$ arranged into families that fill $\ell$-dimensional invariant tori, continue to exist, in presence of small enough perturbations $\varepsilon f(A, \alpha)$, on slightly deformed invariant tori. Similar questions can be asked in the a priori unstable cases. To examine the matter more closely consider the formulation of the simplest problems.

A priori stable resonances: more precisely, suppose $H_0 = \frac{1}{2} A^2$ and let $\{A_0\} \times T^\ell$ be the unperturbed invariant torus $T_{A_0}$ with spectrum $\omega_0 = \omega(A_0) = \partial_A H_0(A_0)$ with only $r$ rationally independent components. For simplicity, suppose that $\omega_0 = (\omega_1, \ldots, \omega_r, 0, \ldots, 0) \stackrel{\mathrm{def}}{=} (\omega, 0)$ with $\omega \in R^r$. The more general case in which $\omega_0$ has only $r$ rationally independent components can be reduced to the special case above by a canonical linear change of coordinates at the price of changing the $H_0$ to a new one, still quadratic in the actions but containing mixed products $A_i B_j$: the proofs of the results that are discussed here would not be really affected by such more general form of $H$.

It is convenient to distinguish between the "fast" angles $\alpha_1, \ldots, \alpha_r$ and the "resonant" angles $\alpha_{r+1}, \ldots, \alpha_\ell$ (also called "slow" or "secular") and call $\alpha = (\alpha', \beta)$ with $\alpha' \in T^r$ and $\beta \in T^s$. Likewise, we distinguish the fast actions $A' = (A_1, \ldots, A_r)$ and the resonant ones $A_{r+1}, \ldots, A_\ell$ and set $A = (A', B)$ with $A' \in R^r$ and $B \in R^s$.

Therefore, the torus $T_{A_0}$, $A_0 = (A'_0, B_0)$, is in turn a continuum of invariant tori $T_{A_0, \beta}$ with trivial parametric equations: $\beta$ fixed, $\alpha' = \psi$, $\psi \in T^r$, and $A' = A'_0$, $B = B_0$. On each of them the motion is: $A'$, $B$, $\beta$ constant and $\alpha' \to \alpha' + \omega t$, with rationally independent $\omega \in R^r$.

Then the natural question is whether there exist functions $h_\varepsilon$, $k_\varepsilon$, $H_\varepsilon$, $K_\varepsilon$ smooth in $\varepsilon$ near $\varepsilon = 0$ and in $\psi \in T^r$, vanishing for $\varepsilon = 0$, and such that the torus $T_{A_0, \beta_0, \varepsilon}$ with parametric equations

$$ \begin{aligned} A' &= A'_0 + H_\varepsilon(\psi), & \alpha' &= \psi + h_\varepsilon(\psi), \\ B &= B_0 + K_\varepsilon(\psi), & \beta &= \beta_0 + k_\varepsilon(\psi), \end{aligned} \qquad \psi \in T^r \tag{87} $$

is invariant for the motions with Hamiltonian

$$ H_\varepsilon(A, \alpha) = \tfrac{1}{2} A'^2 + \tfrac{1}{2} B^2 + \varepsilon f(\alpha', \beta) $$

and the motions on it are $\psi \to \psi + \omega t$. The above property, when satisfied, is summarized by saying that the unperturbed resonant motions $A = (A'_0, B_0)$, $\alpha = (\alpha'_0 + \omega t, \beta_0)$ can be continued in presence of perturbation $\varepsilon f$, for small $\varepsilon$, to quasiperiodic motions with the same spectrum and on a slightly deformed torus $T_{A'_0, \beta_0, \varepsilon}$.

A priori unstable resonances: here the question is whether the special invariant tori continue to exist in presence of small enough perturbations, of course slightly deformed. This means asking whether, given $A_0$ such that $\omega(A_0) = \partial_A H_0(A_0)$ has rationally independent components, there are functions $(H_\varepsilon(\psi), h_\varepsilon(\psi))$, $(P_\varepsilon(\psi), Q_\varepsilon(\psi))$ and $(\Pi_\varepsilon(\psi), K_\varepsilon(\psi))$ smooth in $\varepsilon$ near $\varepsilon = 0$, vanishing for $\varepsilon = 0$, analytic in $\psi \in T^r$ and such that the $r$-dimensional surface

$$ \begin{aligned} A &= A_0 + H_\varepsilon(\psi), & \alpha &= \psi + h_\varepsilon(\psi), \\ p &= P_\varepsilon(\psi), & q &= Q_\varepsilon(\psi), \\ \pi &= \Pi_\varepsilon(\psi), & \kappa &= K_\varepsilon(\psi), \end{aligned} \qquad \psi \in T^r \tag{88} $$

is an invariant torus $T_{A_0, \varepsilon}$ on which the motion is $\psi \to \psi + \omega(A_0) t$. Again, the above property is summarized by saying that the unperturbed special resonant motions can be continued in presence of perturbation $\varepsilon f$ for small $\varepsilon$ to quasiperiodic motions with the same spectrum and on a slightly deformed torus $T_{A_0, \varepsilon}$.

Some answers to the above questions are presented in the following section. For more details, the reader is referred to Gallavotti et al. (2004).

Resonances and Lindstedt Series

We discuss eqns [87] in the paradigmatic case in which the Hamiltonian $H_0(A)$ is $\frac{1}{2} A^2$ (cf. [80]). It will be $\omega(A') \equiv A'$ so that $A'_0 = \omega$, $B_0 = 0$ and the perturbation $f(\alpha)$ can be considered as a function of $\alpha = (\alpha', \beta)$: let $\bar f(\beta)$ be defined as its average over $\alpha'$. The determination of the invariant torus of dimension $r$ which can be continued in the sense discussed in the last section is easily understood in this case.

A resonant invariant torus which, among the tori $T_{A_0, \beta}$, has parametric equations that can be continued as a formal power series in $\varepsilon$ is the torus $T_{A_0, \beta_0}$ with $\beta_0$ a stationarity point for $\bar f(\beta)$, that is, an equilibrium point for the average perturbation: $\partial_\beta \bar f(\beta_0) = 0$. In fact, the following theorem holds:

Theorem 5 If $\omega \in R^r$ satisfies a Diophantine property and if $\beta_0$ is a nondegenerate stationarity point for the "fast angle average" $\bar f(\beta)$ (i.e., such that $\det \partial^2_{\beta\beta} \bar f(\beta_0) \ne 0$), then the following equations for the functions $h_\varepsilon$, $k_\varepsilon$,

$$ \begin{aligned} (\omega \cdot \partial_\psi)^2 h_\varepsilon(\psi) &= -\varepsilon\, \partial_{\alpha'} f(\psi + h_\varepsilon(\psi), \beta_0 + k_\varepsilon(\psi)) \\ (\omega \cdot \partial_\psi)^2 k_\varepsilon(\psi) &= -\varepsilon\, \partial_\beta f(\psi + h_\varepsilon(\psi), \beta_0 + k_\varepsilon(\psi)) \end{aligned} \tag{89} $$

can be formally solved in powers of $\varepsilon$.

Given the simplicity of the Hamiltonian [80] that we are considering, it is not necessary to discuss the functions $H_\varepsilon$, $K_\varepsilon$ because the equations that they should obey reduce to their definitions as in the section "Quasiperiodicity and KAM stability," and for the same reason.

In other words, also the resonant tori admit a Lindstedt series representation. It is however very unlikely that the series are, in general, convergent.

Physically, this new aspect is due to the fact that the linearization of the motion near the torus $T_{A_0, \beta_0}$ introduces oscillatory motions around $T_{A'_0, \beta_0}$ with frequencies proportional to the square roots of the positive eigenvalues of the matrix $\varepsilon\, \partial^2_{\beta\beta} \bar f(\beta_0)$: therefore, it is naively expected that it has to be necessary that a Diophantine property be required on the vector $(\omega, \sqrt{\varepsilon \mu_1}, \ldots)$, where $\varepsilon \mu_j$ are the positive eigenvalues. Hence, some values of $\varepsilon$, namely those for which $(\omega, \sqrt{\varepsilon \mu_1}, \ldots)$ is not a Diophantine vector or is too close to a non-Diophantine vector, should be excluded or at least should be expected to generate difficulties. Note that the problem arises irrespective of the assumptions about the nondegenerate matrix $\partial^2_{\beta\beta} \bar f(\beta_0)$ (since $\varepsilon$ can have either sign), and no matter how small $|\varepsilon|$ is supposed to be. But we can expect that if the matrix $\partial^2_{\beta\beta} \bar f(\beta_0)$ is (say) positive definite (i.e., $\beta_0$ is a minimum point for $\bar f(\beta)$) then the problem should be easier for $\varepsilon < 0$ and vice versa, if $\beta_0$ is a maximum, it should be easier for $\varepsilon > 0$ (i.e., in the cases in which the eigenvalues of $\varepsilon\, \partial^2_{\beta\beta} \bar f(\beta_0)$ are negative and their roots do not have the interpretation of frequencies).

Technically, the sums of the formal series can be given (so far) a meaning only via summation rules involving divergent series: typically, one has to identify in the formal expressions (denumerably many) geometric series which, although divergent, can be given a meaning by applying the rule [79]. Since the rule can only be applied if $z \ne 1$, this leads to conditions on the parameter $\varepsilon$, in order to exclude that the various $z$ that have to be considered are very close to 1. Hence, this stability result turns out to be rather different from the KAM result for the maximal tori. Namely the series can be given a

meaning via summation rules provided f and b 0 The case of a priori unstable systems has also
satisfy certain additional conditions and provided certain values of $\varepsilon$ are excluded. An example of a theorem is the following:

Theorem 6 Given the Hamiltonian [80] and a resonant torus $\mathcal{T}_{A'_0,\boldsymbol{\beta}_0}$ with $\boldsymbol{\omega} = A'_0 \in \mathbb{R}^r$ satisfying a Diophantine property, let $\boldsymbol{\beta}_0$ be a nondegenerate maximum point for the average potential $\bar f(\boldsymbol{\beta}) \stackrel{\text{def}}{=} (2\pi)^{-r}\int_{\mathbb{T}^r} f(\boldsymbol{\alpha}',\boldsymbol{\beta})\,d^r\boldsymbol{\alpha}'$. Consider the Lindstedt series solution for eqns [89] of the perturbed resonant torus with spectrum $(\boldsymbol{\omega}, 0)$. It is possible to express the single $n$th-order term of the series as a sum of many terms and then rearrange the series thus obtained so that the resummed series converges for $\varepsilon$ in a domain $\mathcal{E}$ which contains a segment $[0, \varepsilon_0]$ and also a subset of $[-\varepsilon_0, 0]$ which, although with open dense complement, is so large that it has $0$ as a Lebesgue density point. Furthermore, the resummed series for $\mathbf{h}_\varepsilon, \mathbf{k}_\varepsilon$ define an invariant $r$-dimensional analytic torus with spectrum $\boldsymbol{\omega}$.

More generally, if $\boldsymbol{\beta}_0$ is only a nondegenerate stationarity point for $\bar f(\boldsymbol{\beta})$, the domain of definition of the resummed series is a set $\mathcal{E} \subset [-\varepsilon_0, \varepsilon_0]$ which on both sides of the origin has an open dense complement although it has $0$ as a Lebesgue density point.

Theorem 6 can be naturally extended to the general case in which the Hamiltonian is the most general perturbation of an anisochronous integrable system $H_\varepsilon(A, \boldsymbol{\alpha}) = h(A) + \varepsilon f(A, \boldsymbol{\alpha})$ if $\partial^2_{AA} h$ is a nonsingular matrix and the resonance arises from a spectrum $\boldsymbol{\omega}(A_0)$ which has $r$ independent components (while the remaining are not necessarily zero).

We see that the convergence is a delicate problem for the Lindstedt series for nearly integrable resonant motions. They might even be divergent (mathematically, a proof of divergence is an open problem but it is a very reasonable conjecture in view of the above physical interpretation); nevertheless, Theorem 6 shows that sum rules can be given that sometimes (i.e., for $\varepsilon$ in a large set near $\varepsilon = 0$) yield a true solution to the problem.

This is reminiscent of the phenomenon met in discussing perturbations of isochronous systems in [76], but it is a much more complex situation. It leaves many open problems: foremost among them is the question of uniqueness. The sum rules of divergent series always contain some arbitrary choices, which lead to doubts about the uniqueness of the functions parametrizing the invariant tori constructed in this way. It might even be that the convergence set $\mathcal{E}$ may depend upon the arbitrary choices, and that considering several of them no $\varepsilon$ with $|\varepsilon| < \varepsilon_0$ is left out.

been widely studied. In this case too resonances with Diophantine $r$-dimensional spectrum $\boldsymbol{\omega}$ are considered. However, in the case $s_2 = 0$ (called a priori unstable hyperbolic resonance) the Lindstedt series can be shown to be convergent, while in the case $s_1 = 0$ (called a priori unstable elliptic resonance) or in the mixed cases $s_1, s_2 > 0$ extra conditions are needed. They involve $\boldsymbol{\omega}$ and $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_{s_2})$ (cf. [86]) and properties of the perturbations as well. It is also possible to study a slightly different problem: namely to look for conditions on $\boldsymbol{\omega}, \boldsymbol{\mu}, f$ which imply that, for small $\varepsilon$, invariant tori with spectrum $\varepsilon$-dependent but close, in a suitable sense, to $\boldsymbol{\omega}$ exist.

The literature is vast, but it seems fair to say that, given the above comments, particularly those concerning uniqueness and analyticity, the situation is still quite unsatisfactory. We refer the reader to Gallavotti et al. (2004) for more details.

Diffusion in Phase Space

The KAM theorem implies that a perturbation of an analytic anisochronous integrable system, i.e., with an analytic Hamiltonian $H_\varepsilon(A, \boldsymbol{\alpha}) = H_0(A) + \varepsilon f(A, \boldsymbol{\alpha})$ and nondegenerate Hessian matrix $\partial^2_{AA} h(A)$, generates large families of maximal invariant tori. Such tori lie on the energy surfaces but do not have codimension 1 on them, i.e., they do not split the $(2\ell - 1)$-dimensional energy surfaces into disconnected regions except, of course, in the case of systems with two degrees of freedom (see the section "Quasiperiodicity and KAM stability").

Therefore, there might exist trajectories with initial data close to $A^i$ in action space which reach phase space points close to $A^f \neq A^i$ in action space for $\varepsilon \neq 0$, no matter how small. Such diffusion phenomenon would occur in spite of the fact that the corresponding trajectory has to move in a space in which very close to each $\{A\} \times \mathbb{T}^\ell$ there is an invariant surface on which points move keeping $A$ constant within $O(\varepsilon)$, which for $\varepsilon$ small can be $\ll |A^f - A^i|$.

In a priori unstable systems (cf. the section "Resonances and their stability") with $s_1 = 1$, $s_2 = 0$, it is not difficult to see that the corresponding phenomenon can actually occur: the paradigmatic example (ARNOLD) is the a priori unstable system

$$H_\varepsilon = \frac{A_1^2}{2} + A_2 + \frac{p^2}{2} + g(\cos q - 1) + \varepsilon(\cos\alpha_1 + \sin\alpha_2)(\cos q - 1) \tag{90}$$
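The Hamiltonian [90] is simple enough to be explored numerically. The following is only a minimal sketch, not a method taken from the article: it writes Hamilton's equations for [90] and integrates them with a classical fourth-order Runge-Kutta step. The function names (`hamiltonian`, `vector_field`, `rk4_step`, `integrate`), the ordering of the state variables, and the default values `g = 1.0`, `eps = 0.01` are all assumptions made for illustration.

```python
import math


def hamiltonian(state, g=1.0, eps=0.01):
    """Value of H in [90]; state = (A1, A2, p, alpha1, alpha2, q) (assumed ordering)."""
    A1, A2, p, a1, a2, q = state
    return (0.5 * A1**2 + A2 + 0.5 * p**2 + g * (math.cos(q) - 1.0)
            + eps * (math.cos(a1) + math.sin(a2)) * (math.cos(q) - 1.0))


def vector_field(state, g=1.0, eps=0.01):
    """Hamilton's equations for [90]: dA/dt = -dH/dalpha, dalpha/dt = dH/dA, etc."""
    A1, A2, p, a1, a2, q = state
    c = math.cos(q) - 1.0
    coupling = math.cos(a1) + math.sin(a2)
    return (
        eps * math.sin(a1) * c,              # dA1/dt = -dH/dalpha1
        -eps * math.cos(a2) * c,             # dA2/dt = -dH/dalpha2
        (g + eps * coupling) * math.sin(q),  # dp/dt  = -dH/dq
        A1,                                  # dalpha1/dt = dH/dA1 (wheel)
        1.0,                                 # dalpha2/dt = dH/dA2 (clock)
        p,                                   # dq/dt = dH/dp (pendulum)
    )


def rk4_step(state, dt, g=1.0, eps=0.01):
    """One classical Runge-Kutta step (an assumed, non-symplectic integrator)."""
    def shifted(base, k, h):
        return tuple(x + h * dx for x, dx in zip(base, k))

    k1 = vector_field(state, g, eps)
    k2 = vector_field(shifted(state, k1, 0.5 * dt), g, eps)
    k3 = vector_field(shifted(state, k2, 0.5 * dt), g, eps)
    k4 = vector_field(shifted(state, k3, dt), g, eps)
    return tuple(x + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for x, a, b, c, d in zip(state, k1, k2, k3, k4))


def integrate(state, dt, steps, g=1.0, eps=0.01):
    """Advance the state by `steps` Runge-Kutta steps of size dt."""
    for _ in range(steps):
        state = rk4_step(state, dt, g, eps)
    return state
```

Such a sketch is only useful for short-time checks, for instance that the energy is conserved along a computed orbit or that data with $p = 0$, $q = 0$ stay at rest in the pendulum variables for $\varepsilon \neq 0$. Observing the slow drift of $A_1$ over times of order $\varepsilon^{-1}\log\varepsilon^{-1}$ would presumably require carefully tuned initial data and a symplectic integrator.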
This is a system describing a motion of a "pendulum" ($(p, q)$ coordinates) interacting with a "rotating wheel" ($(A_1, \alpha_1)$ coordinates) and a "clock" ($(A_2, \alpha_2)$ coordinates) a priori unstable near the points $p = 0$, $q = 0, 2\pi$ ($s_1 = 1$, $s_2 = 0$, $\lambda_1 = \sqrt{g}$, cf. [86]). It can be proved that on the energy surface of energy $E$ and for each $\varepsilon \neq 0$ small enough (no matter how small) there are initial data with action coordinates close to $A^i = (A^i_1, A^i_2)$ with $\tfrac12 (A^i_1)^2 + A^i_2$ close to $E$ eventually evolving to a datum $A' = (A'_1, A'_2)$ with $A'_1$ at a distance from $A^f_1$ smaller than an arbitrarily prefixed distance (of course with energy $E$). Furthermore, during the whole process the pendulum energy stays close to zero within $o(\varepsilon)$ (i.e., the pendulum swings following closely the unperturbed separatrices).

In other words, [90] describes a machine (the pendulum) which, working approximately in a cycle, extracts energy from a reservoir (the clock) to transfer it to a mechanical device (the wheel). The statement that diffusion is possible means that the machine can work as soon as $\varepsilon \neq 0$, if the initial actions and the initial phases (i.e., $\alpha_1, \alpha_2, p, q$) are suitably tuned (as functions of $\varepsilon$).

The peculiarity of the system [90] is that the fixed points $P_\pm$ of the unperturbed pendulum (i.e., the equilibria $p = 0$, $q = 0, 2\pi$) remain unstable equilibria even when $\varepsilon \neq 0$ and this is an important simplifying feature.

It is a peculiarity that permits bypassing the obstacle, arising in the analysis of more general cases, represented by the resonance surfaces consisting of the $A$'s with $A_1\nu_1 + \nu_2 = 0$: the latter correspond to harmonics $(\nu_1, \nu_2)$ present in the perturbing function, i.e., the harmonics which would lead to division by zero in an attempt to construct (as necessary in studying [90] by Arnold's method) the parametric equations of the perturbed invariant tori with action close to such $A$'s. In the case of [90] the problem arises only on the resonance marked in Figure 6 by a heavy line, i.e., $A_1 = 0$, corresponding to $\cos\alpha_1$ in [90].

Figure 6 (a) The $\varepsilon = 0$ geometry: the "partial energy" lines are parabolas, $\tfrac12 A_1^2 + A_2 = \text{const}$. The vertical lines are the resonances $A_1 =$ rational (i.e., $\nu_1 A_1 + \nu_2 = 0$). The disks are neighborhoods of the points $A^i$ and $A^f$ (the dots at their centers). (b) $\varepsilon \neq 0$; an artist's rendering of a trajectory in $A$ space, driven by the pendulum swings to accelerate the wheel from $A^i_1$ to $A^f_1$ at the expenses of the clock energy, sneaking through invariant tori not represented and (approximately) located "away" from the intersections between resonances and partial energy lines (a dense set, however). The pendulum coordinates are not shown: its energy stays close to zero, within a power of $\varepsilon$. Hence the pendulum swings, staying close to the separatrix. The oscillations symbolize the wiggly behavior of the partial energy $\tfrac12 A_1^2 + A_2$ in the process of sneaking between invariant tori which, because of their invariance, would be impossible without the pendulum. The energy $\tfrac12 A_1^2$ of the wheel increases slightly at each pendulum swing: accurate estimates yield an increase of the wheel speed $A_1$ of the order of $\varepsilon/(\log\varepsilon^{-1})$ at each swing of the pendulum implying a transition time of the order of $g^{-1/2}\varepsilon^{-1}\log\varepsilon^{-1}$.

If $\varepsilon = 0$, the point $P_-$ with $p = 0$, $q = 0$ and the point $P_+$ with $p = 0$, $q = 2\pi$ are both unstable equilibria (and they are, of course, the same point, if $q$ is an angular variable). The unstable manifold (it is a curve) of $P_+$ coincides with the stable manifold of $P_-$ and vice versa. So that the unperturbed system admits nontrivial motions leading from $P_+$ to $P_-$ and from $P_-$ to $P_+$, both in a bi-infinite time interval $(-\infty, \infty)$: the $p, q$ variables describe a pendulum and $P_\pm$ are its unstable equilibria which are connected by the separatrices (which constitute the zero-energy surfaces for the pendulum).

The latter property remains true for more general a priori unstable Hamiltonians

$$H_\varepsilon = H_0(A) + H_u(p, q) + \varepsilon f(A, \boldsymbol{\alpha}, p, q) \quad \text{in } U \times \mathbb{T}^\ell \times \mathbb{R}^2 \tag{91}$$

where $H_u$ is a one-dimensional Hamiltonian which has two unstable equilibrium points $P_+$ and $P_-$ linearly repulsive in one direction and linearly attractive in another which are connected by two heteroclinic trajectories which, as time tends to $\pm\infty$, approach $P_-$ and $P_+$ and vice versa.

Actually, the points need not be different but, if coinciding, the trajectories linking them must be nontrivial: in the case [90] the variable $q$ can be considered an angle and then $P_+$ and $P_-$ would coincide (but are connected by nontrivial trajectories, i.e., by trajectories that also visit points different from $P_\pm$). Such trajectories are called heteroclinic if $P_+ \neq P_-$ and homoclinic if $P_+ = P_-$.

In the general case, besides the homoclinicity (or heteroclinicity) condition, certain weak genericity conditions, automatically satisfied in the example [90], have to be imposed in order to show that, given $A^i$ and $A^f$ with the same unperturbed energy $E$, one can find, for all $\varepsilon$ small enough but not equal to zero, initial data ($\varepsilon$-dependent) with actions arbitrarily close to $A^i$ which evolve to data with actions arbitrarily close to $A^f$. This is a phenomenon
called the Arnold diffusion. Simple sufficient conditions for a transition from near $A^i$ to near $A^f$ are expressed by the following result:

Theorem 7 Given the Hamiltonian [91] with $H_u$ admitting two hyperbolic fixed points $P_\pm$ with heteroclinic connections, $t \to (p_a(t), q_a(t))$, $a = 1, 2$, suppose that:

(i) On the unperturbed energy surface of energy $E = H(A^i) + H_u(P_\pm)$ there is a regular curve $\gamma : s \to A(s)$ joining $A^i$ to $A^f$ such that the unperturbed tori $\{A(s)\} \times \mathbb{T}^\ell$ can be continued at $\varepsilon \neq 0$ into invariant tori $\mathcal{T}_{A(s),\varepsilon}$ for a set of values of $s$ which fills the curve $\gamma$ leaving only gaps of size of order $o(\varepsilon)$.

(ii) The $\ell \times \ell$ matrix $D_{ij}$ of the second derivatives of the integral of $f$ over the heteroclinic motions is not degenerate, that is,

$$|\det D| \equiv \left| \det\left( \int_{-\infty}^{\infty} dt\, \partial_{\alpha_i \alpha_j} f\big(A, \boldsymbol{\alpha} + \boldsymbol{\omega}(A)t, p_a(t), q_a(t)\big) \right) \right| > c > 0 \tag{92}$$

for all $A$'s on the curve $\gamma$ and all $\boldsymbol{\alpha} \in \mathbb{T}^2$.

Given arbitrary $\delta > 0$, for $\varepsilon \neq 0$ small enough there are initial data with action and energy closer than $\delta$ to $A^i$ and $E$, respectively, which after a long enough time acquire an action closer than $\delta$ to $A^f$ (keeping the initial energy).

The above two conditions can be shown to hold generically for many pairs $A^i \neq A^f$ (and many choices of the curves $\gamma$ connecting them) if the number of degrees of freedom is 3. Thus, the result, obtained by a simple extension of the argument originally outlined by Arnold to discuss the paradigmatic example [90], proves the existence of diffusion in a priori unstable systems. The integral in [92] is called Melnikov integral.

The real difficulty is to estimate the time needed for the transition: it is a time that obviously has to diverge as $\varepsilon \to 0$. Assuming $g$ fixed (i.e., $\varepsilon$ independent) a naive approach easily leads to estimates which can even be worse than $O(\exp(a\varepsilon^{-b}))$ with some $a, b > 0$. It has finally been shown that in such cases the minimum time can be, for rather general perturbations $\varepsilon f(\boldsymbol{\alpha}, q)$, estimated above by $O(\varepsilon^{-1}\log\varepsilon^{-1})$, which is the best that can be hoped for under generic assumptions.

The reader is referred to Arnold (1989) and Chierchia and Valdinoci (2000) for more details.

Long-Time Stability of Quasiperiodic Motions

A more difficult problem is whether the same phenomenon of migration in action space occurs in a priori stable systems. The root of the difficulty is a remarkable stability property of quasiperiodic motions. Consider Hamiltonians $H_\varepsilon(A, \boldsymbol{\alpha}) = h(A) + \varepsilon f(A, \boldsymbol{\alpha})$ with $H_0(A) = h(A)$ strictly convex, analytic, and anisochronous on the closure $\bar U$ of an open bounded region $U \subset \mathbb{R}^\ell$, and a perturbation $\varepsilon f(A, \boldsymbol{\alpha})$ analytic in $\bar U \times \mathbb{T}^\ell$.

Then a priori bounds are available on how long it can possibly take to migrate from an action close to $A_1$ to one close to $A_2$: and the bound is of "exponential type" as $\varepsilon \to 0$ (i.e., it admits a lower bound which behaves as the exponential of an inverse power of $\varepsilon$). The simplest theorem is (NEKHOROSSEV):

Theorem 7 There are constants $0 < a, b, d, g, \tau$ such that any initial datum $(A, \boldsymbol{\alpha})$ evolves so that $A$ will not change by more than $a\varepsilon^g$ before a long time bounded below by $\tau\exp(b\varepsilon^{-d})$.

Thus, this puts an exponential bound, i.e., a bound exponential in an inverse power of $\varepsilon$, to the diffusion time: before a time $\tau\exp(b\varepsilon^{-d})$ actions can only change by $O(\varepsilon^g)$ so that their variation cannot be large no matter how small $\varepsilon \neq 0$ is chosen. This places a (long) lower bound to the time of diffusion in a priori stable systems.

The proof of the theorem provides, actually, an interesting and detailed picture of the variations in actions showing that some actions may vary more slowly than others.

The theorem is constructive, i.e., all constants $0 < a, b, d, \tau$ can be explicitly chosen and depend on $\ell, H_0, f$ although some of them can be fixed to depend only on $\ell$ and on the minimum curvature of the convex graph of $H_0$. Its proof can be adapted to cover many cases which do not fall in the class of systems with strictly convex unperturbed Hamiltonian, and even to cases with a resonant unperturbed Hamiltonian.

However, in important problems (e.g., in the three-body problems met in celestial mechanics) there is empirical evidence that diffusion takes place at a fast pace (i.e., not exponentially slow in the above sense) while the above results would forbid a rapid migration in phase space if they applied: however, in such problems the assumptions of the theorem are not satisfied, because the unperturbed system is strongly resonant (as in the celestial mechanics problems, where the number of independent frequencies is a fraction of the number
of degrees of freedom and $h(A)$ is far from strictly convex), leaving wide open the possibility of observing rapid diffusion.

Further, changing the assumptions can dramatically change the results. For instance, rapid diffusion can sometimes be proved even though it might be feared that it should require exponentially long times: an example that has been proposed is the case of a three-timescales system, with Hamiltonian

$$\omega_1 A_1 + \omega_2 A_2 + \frac{p^2}{2} + g(1 + \cos q) + \varepsilon f(\alpha_1, \alpha_2, p, q) \tag{93}$$

with $\boldsymbol{\omega}_\varepsilon \stackrel{\text{def}}{=} (\omega_1, \omega_2)$, where $\omega_1 = \varepsilon^{-1/2}\bar\omega$, $\omega_2 = \varepsilon^{1/2}\tilde\omega$ and $\bar\omega, \tilde\omega > 0$ constants. The three scales are $\omega_1^{-1}$, $\sqrt{g^{-1}}$, $\omega_2^{-1}$. In this case, there are many (although by no means all) pairs $A_1, A_2$ which can be connected within a time that can be estimated to be of order $O(\varepsilon^{-1}\log\varepsilon^{-1})$.

This is a rapid-diffusion case in an a priori unstable system in which condition [92] is not satisfied: because the $\varepsilon$-dependence of $\boldsymbol{\omega}(A)$ implies that the lower bound $c$ in [92] must depend on $\varepsilon$ (and be exponentially small with an inverse power of $\varepsilon$ as $\varepsilon \to 0$).

The unperturbed system in [93] is nonresonant in the $H_0$ part for $\varepsilon > 0$ outside a set of zero measure (i.e., where the vector $\boldsymbol{\omega}_\varepsilon$ satisfies a suitable Diophantine property) and, furthermore, it is a priori unstable: cases met in applications can be a priori stable and resonant (and often not anisochronous) in the $H_0$ part. In such a system, not only the speed of diffusion is not understood but proposals to prove its existence, if present (as expected), have so far not given really satisfactory results.

For more details, the reader is referred to Nekhorossev (1977).

The Three-Body Problem

Mechanics and the three-body problem can be almost identified with each other, in the sense that the motion of three gravitating masses has long been a key astronomical problem and at the same time the source of inspiration for many techniques: foremost among them the theory of perturbations.

As an introduction, consider a special case. Let three masses $m_S = m_0$, $m_J = m_1$, $m_M = m_2$ interact via gravity, that is, with interaction potential $-k m_i m_j |x_i - x_j|^{-1}$: the simplest problem arises when the third body has a negligible mass compared to the two others and the latter are supposed to be on a circular orbit; furthermore, the mass $m_J$ is $\varepsilon m_S$ with $\varepsilon$ small and the mass $m_M$ moves in the plane of the circular orbit. This will be called the "circular restricted three-body problem".

In a reference system with center $S$ and rotating at the angular speed of $J$ around $S$ inertial forces (centrifugal and Coriolis) act. Supposing that the body $J$ is located on the axis with unit vector $\mathbf{i}$ at distance $R$ from the origin $S$, the acceleration of the point $M$ is

$$\ddot{\boldsymbol{\varrho}} = \mathbf{F} + \omega_0^2\left(\boldsymbol{\varrho} - \frac{\varepsilon R}{1+\varepsilon}\,\mathbf{i}\right) - 2\boldsymbol{\omega}_0 \wedge \dot{\boldsymbol{\varrho}}$$

if $\mathbf{F}$ is the force of attraction and $\boldsymbol{\omega}_0 \wedge \dot{\boldsymbol{\varrho}} \equiv \omega_0 \dot{\boldsymbol{\varrho}}^\perp$ where $\boldsymbol{\omega}_0$ is a vector with $|\boldsymbol{\omega}_0| = \omega_0$ and perpendicular to the orbital plane and $\boldsymbol{\varrho}^\perp \stackrel{\text{def}}{=} (-\rho_2, \rho_1)$ if $\boldsymbol{\varrho} = (\rho_1, \rho_2)$. Here, taking into account that the origin $S$ rotates around the fixed center of mass, $\omega_0^2(\boldsymbol{\varrho} - \varepsilon R/(1+\varepsilon)\,\mathbf{i})$ is the centrifugal force while $-2\boldsymbol{\omega}_0 \wedge \dot{\boldsymbol{\varrho}}$ is the Coriolis force. The equations of motion can therefore be derived from a Lagrangian

$$\mathcal{L} = \frac12 \dot{\boldsymbol{\varrho}}^2 - W + \omega_0\, \boldsymbol{\varrho}^\perp \cdot \dot{\boldsymbol{\varrho}} + \frac12 \omega_0^2 \boldsymbol{\varrho}^2 - \omega_0^2 \frac{\varepsilon R}{1+\varepsilon}\, \boldsymbol{\varrho}\cdot\mathbf{i} \tag{94}$$

with

$$\omega_0^2 R^3 = k m_S (1+\varepsilon) \stackrel{\text{def}}{=} g_0, \qquad W = -\frac{k m_S}{|\boldsymbol{\varrho}|} - \frac{k m_S\, \varepsilon}{|\boldsymbol{\varrho} - R\,\mathbf{i}|}$$

where $k$ is the gravitational constant, $R$ the distance between $S$ and $J$, and finally the last three terms in [94] come from the Coriolis force (the first) and from the centripetal force (the other two, taking into account that the origin $S$ rotates around the fixed center of mass).

Setting $g = g_0/(1+\varepsilon) \equiv k m_S$, the Hamiltonian of the system is

$$H = \frac12 (\mathbf{p} - \omega_0 \boldsymbol{\varrho}^\perp)^2 - \frac{g}{|\boldsymbol{\varrho}|} - \frac12 \omega_0^2 \boldsymbol{\varrho}^2 - \varepsilon \frac{g}{R}\left( \left|\frac{\boldsymbol{\varrho}}{R} - \mathbf{i}\right|^{-1} - \frac{\boldsymbol{\varrho}}{R}\cdot\mathbf{i} \right) \tag{95}$$

The first part can be expressed immediately in the action-angle coordinates for the two-body problem (cf. the section "Newtonian potential and Kepler's laws"). Calling such coordinates $(L_0, \lambda_0, G_0, \gamma_0)$ and $\theta_0$ the polar angle of $M$ with respect to the major axis of the ellipse and $\lambda_0$ the mean anomaly of $M$ on its ellipse, the Hamiltonian becomes, taking into account that for $\varepsilon = 0$ the ellipse axis rotates at speed $\omega_0$,

$$H = -\frac{g^2}{2L_0^2} - \omega_0 G_0 - \varepsilon \frac{g}{R}\left( \left|\frac{\boldsymbol{\varrho}}{R} - \mathbf{i}\right|^{-1} - \frac{\boldsymbol{\varrho}}{R}\cdot\mathbf{i} \right) \tag{96}$$
which is convenient if we study the interior problem, i.e., $|\boldsymbol{\varrho}| < R$. This can be expressed in the action-angle coordinates via [41], [42]:

$$\theta_0 = \lambda_0 + f_{\lambda_0}, \qquad \theta_0 + \gamma_0 = \lambda_0 + \gamma_0 + f_{\lambda_0},$$
$$e = \left(1 - \frac{G_0^2}{L_0^2}\right)^{1/2}, \qquad \frac{|\boldsymbol{\varrho}|}{R} = \frac{G_0^2}{gR}\, \frac{1}{1 + e\cos(\lambda_0 + f_{\lambda_0})} \tag{97}$$

where (see [42]), $f_\lambda = f(e\sin\lambda, e\cos\lambda)$ and

$$f(x, y) = 2x\left(1 + \frac54 y + \cdots\right)$$

with the ellipsis denoting higher orders in $x, y$ even in $x$. The Hamiltonian takes the form, if $\omega^2 = gR^{-3}$,

$$H_\varepsilon = -\frac{g^2}{2L_0^2} - \omega G_0 + \varepsilon \frac{g}{R} F(G_0, L_0, \lambda_0, \lambda_0 + \gamma_0) \tag{98}$$

where the only important feature (for our purposes) is that $F(L, G, \alpha, \beta)$ is an analytic function of $L, G, \alpha, \beta$ near a datum with $|G| < L$ (i.e., $e > 0$) and $|\boldsymbol{\varrho}| < R$. However, the domain of analyticity in $G$ is rather small as it is constrained by $|G| < L$ excluding in particular the circular orbit case $G = L$.

Note that apparently the KAM theorem fails to be applicable to [98] because the matrix of the second derivatives of $H_0(L, G)$ has vanishing determinant. Nevertheless, the proof of the theorem also goes through in this case, with minor changes. This can be checked by studying the proof or, following a remark by Poincaré, by simply noting that the "squared" Hamiltonian $H'_\varepsilon \stackrel{\text{def}}{=} (H_\varepsilon)^2$ has the form

$$H'_\varepsilon = \left(-\frac{g^2}{2L_0^2} - \omega G_0\right)^2 + \varepsilon F'(G_0, L_0, \lambda_0, \lambda_0 + \gamma_0) \tag{99}$$

with $F'$ still analytic. But this time

$$\det \frac{\partial^2 H'_0}{\partial(G_0, L_0)} = -6 g^2 L_0^{-4} \omega^2 h \neq 0 \quad \text{if } h = -g^2 L_0^{-2} - 2\omega G_0 \neq 0$$

Therefore, the KAM theorem applies to $H'_\varepsilon$ and the key observation is that the orbits generated by the Hamiltonian $(H_\varepsilon)^2$ are geometrically the same as those generated by the Hamiltonian $H_\varepsilon$: they are only run at a different speed because of the need of a time rescaling by the constant factor $2H_\varepsilon$.

This shows that, given an unperturbed ellipse of parameters $(L_0, G_0)$ such that $\boldsymbol{\omega} = (g^2/L_0^3, -\omega)$, $G_0 > 0$, with $\omega_1/\omega_2$ Diophantine, then the perturbed system admits a motion which is quasiperiodic with spectrum proportional to $\boldsymbol{\omega}$ and takes place on an orbit which wraps around a torus remaining forever close to the unperturbed torus (which can be visualized as described by a point moving, according to the area law on an ellipse rotating at a rate $\omega_0$) with actions $(L_0, G_0)$, provided $\varepsilon$ is small enough. Hence,

The KAM theorem answers, at least conceptually, the classical question: can a solution of the three-body problem remain close to an unperturbed one forever? That is, is it possible that a solar system is stable forever?

Assuming $e, |\boldsymbol{\varrho}|/R \ll 1$ and retaining only the lowest orders in $e$ and $|\boldsymbol{\varrho}|/R \ll 1$ the Hamiltonian [98] simplifies into

$$H = -\frac{g^2}{2L_0^2} - \omega G_0 + \delta_\varepsilon(G_0) - \frac{\varepsilon g}{2R}\frac{G_0^4}{g^2R^2}\left( 3\cos 2(\lambda_0 + \gamma_0) - e\cos\lambda_0 - \frac92 e\cos(\lambda_0 + 2\gamma_0) + \frac32 e\cos(3\lambda_0 + 2\gamma_0) \right) \tag{100}$$

where

$$\delta_\varepsilon(G_0) = -\big((1+\varepsilon)^{1/2} - 1\big)\omega G_0 - \frac{\varepsilon g}{2R}\frac{G_0^4}{g^2R^2}, \qquad e = \left(1 - \frac{G_0^2}{L_0^2}\right)^{1/2}$$

It is an interesting exercise to estimate, assuming as model the Hamiltonian [100] and following the proof of the KAM theorem, how small $\varepsilon$ has to be if a planet with the data of Mercury can be stable forever on a (slowly precessing) orbit with actions close to the present-day values under the influence of a mass $\varepsilon$ times the solar mass orbiting on a circle, at a distance from the Sun equal to that of Jupiter. It is possible to follow either the above reduction to the ordinary KAM theorem or to apply directly to [100] the Lindstedt series expansion, proceeding along the lines of the section "Quasiperiodicity and KAM stability". The first approach is easy but the second is more efficient: in both cases, unless the estimates are done in a particularly careful manner, the value found for $\varepsilon m_S$ is not interesting from the viewpoint of astronomy.

The reader is referred to Arnold (1989) for more details.

Rationalization and Regularization of Singularities

Often integrable systems have interesting data which lie on the boundary of the integrability domain. For instance, the central motion when $L = G$ (circular orbits) or the rigid body in a rotation around one of the principal axes or the two-body problem when $G = 0$ (collisional data). In such cases, perturbation
theory cannot be applied as discussed above. Typically, the perturbation depends on quantities like $\sqrt{L - G}$ and is not analytic at $L = G$. Nevertheless, it is sometimes possible to enlarge phase space and introduce new coordinates in the vicinity of the data which in the initial phase space are singular.

A notable example is the failure of the analysis of the circular restricted three-body problem: it apparently fails when the orbit that we want to perturb is circular.

It is convenient to introduce the canonical coordinates $L, \lambda$ and $G, \gamma$:

$$L = L_0, \qquad G = L_0 - G_0, \qquad \lambda = \lambda_0 + \gamma_0, \qquad \gamma = -\gamma_0 \tag{101}$$

so that $e = \sqrt{2GL^{-1}}\sqrt{1 - G(2L)^{-1}}$ and $\lambda_0 = \lambda + \gamma$ and $\theta_0 = \lambda_0 + f_{\lambda_0}$, where $f_{\lambda_0}$ is defined in [42] (see also [97]). Hence,

$$\theta_0 = \lambda + \gamma + f_{\lambda+\gamma}, \qquad \theta_0 + \gamma_0 = \lambda + f_{\lambda+\gamma},$$
$$e = \sqrt{2G}\,\frac{1}{\sqrt{L}}\sqrt{1 - \frac{G}{2L}}, \qquad \frac{|\boldsymbol{\varrho}|}{R} = \frac{L^2(1 - e^2)}{gR}\, \frac{1}{1 + e\cos(\lambda + \gamma + f_{\lambda+\gamma})} \tag{102}$$

and the Hamiltonian [100] takes the form

$$H_\varepsilon = -\frac{g^2}{2L^2} - \omega L + \omega G + \varepsilon\frac{g}{R} F(L - G, L, \lambda + \gamma, \lambda) \tag{103}$$

In the coordinates $L, G$ of [101] the unperturbed circular case corresponds to $G = 0$ and [96], once expressed in the action-angle variables $G, L, \gamma, \lambda$, is analytic in a domain whose size is controlled by $\sqrt{G}$. Nevertheless, very often problems of perturbation theory can be "regularized".

This is done by "enlarging the integrability" domain by adding to it points (one or more) around the singularity (a boundary point of the domain of the coordinates) and introducing new coordinates to describe simultaneously the data close to the singularity and the newly added points: in many interesting cases, the equations of motion are no longer singular (i.e., become analytic) in the new coordinates and are therefore apt to describe the motions that reach the singularity in a finite time. One can say that the singularity was only apparent.

Perhaps this is best illustrated precisely in the above circular restricted three-body problem, with the singularity occurring where $G = 0$, that is, at a circular unperturbed orbit. If we describe the points with $G$ small in a new system of coordinates obtained from the one in [101] by letting alone $L, \lambda$ and setting

$$p = \sqrt{2G}\cos\gamma, \qquad q = \sqrt{2G}\sin\gamma \tag{104}$$

then $p, q$ vary in a neighborhood of the origin with the origin itself excluded.

Adding the origin of the $p$-$q$ plane then in a full neighborhood of the origin, the Hamiltonian [96] is analytic in $L, \lambda, p, q$. This is because it is analytic (cf. [96], [97]) as a function of $L, \lambda$ and $e\cos\theta_0$ and of $\cos(\theta_0 + \gamma_0)$. Since $\theta_0 = \lambda + \gamma + f_{\lambda+\gamma}$ and $\theta_0 + \gamma_0 = \lambda + f_{\lambda+\gamma}$ by [97], the Hamiltonian [96] is analytic in $L, \lambda, e\cos(\lambda + \gamma + f_{\lambda+\gamma}), \cos(\lambda + f_{\lambda+\gamma})$ for $e$ small (i.e., for $G$ small) and, by [42], $f_{\lambda+\gamma}$ is analytic in $e\sin(\lambda + \gamma)$ and $e\cos(\lambda + \gamma)$. Hence the trigonometric identities

$$e\sin(\lambda + \gamma) = \frac{p\sin\lambda + q\cos\lambda}{\sqrt{L}}\sqrt{1 - \frac{G}{2L}}, \qquad e\cos(\lambda + \gamma) = \frac{p\cos\lambda - q\sin\lambda}{\sqrt{L}}\sqrt{1 - \frac{G}{2L}} \tag{105}$$

together with $G = \tfrac12(p^2 + q^2)$ imply that [103] is analytic near $p = q = 0$ and $L > 0$, $\lambda \in [0, 2\pi]$. The Hamiltonian becomes analytic and the new coordinates are suitable to describe motions crossing the origin: for example, by setting

$$C \stackrel{\text{def}}{=} \frac12 \left(1 - \frac{p^2 + q^2}{4L}\right)^{1/2} L^{-1/2}$$

[100] becomes

$$H = -\frac{g^2}{2L^2} - \omega L + \omega\,\tfrac12(p^2 + q^2) + \delta_\varepsilon\big(\tfrac12(p^2 + q^2)\big) - \frac{\varepsilon g}{2R}\frac{\big(L - \tfrac12(p^2 + q^2)\big)^4}{g^2R^2}\Big( 3\cos 2\lambda - \big((11\cos\lambda - 3\cos 3\lambda)p + (7\sin\lambda + 3\sin 3\lambda)q\big)C \Big) \tag{106}$$

The KAM theorem does not apply in the form discussed above to "Cartesian coordinates", that is, when, as in [106], the unperturbed system is not assigned in action-angle variables. However, there are versions of the theorem (actually its corollaries) which do apply and therefore it becomes possible to obtain some results even for the perturbations of circular motions by the techniques that have been illustrated here.

Likewise, the Hamiltonian of the rigid body with a fixed point $O$ and subject to analytic external forces becomes singular, if expressed in the action-angle coordinates of Deprit, when the body motion nears a rotation around a principal axis or, more generally, nears a configuration in which any two of
the axes $\mathbf{i}_3$, $\mathbf{z}$, or $\mathbf{z}_0$ coincide (i.e., any two among the principal axis, the angular momentum axis and the inertial $z$-axis coincide; see the section "Rigid body"). Nevertheless, by imitating the procedure just described in the simpler cases of the circular three-body problem, it is possible to enlarge the phase space so that in the new coordinates the Hamiltonian is analytic near the singular configurations.

A regularization also arises when considering collisional orbits in the unrestricted planar three-body problem. In this respect, a very remarkable result is the regularization of collisional orbits in the planar three-body problem. After proving that if the total angular momentum does not vanish, simultaneous collisions of the three masses cannot occur within any finite time interval, the question is reduced to the regularization of two-body collisions, under the assumption that the total angular momentum does not vanish.

The local change of coordinates, which changes the relative position coordinates $(x, y)$ of two colliding bodies as $(x, y) \to (\xi, \eta)$, with $x + iy = (\xi + i\eta)^2$, is not one to one, hence it has to be regarded as an enlargement of the positions space, if points with different $(\xi, \eta)$ are considered different. However, the equations of motion written in the variables $\xi, \eta$ have no singularity at $\xi, \eta = 0$ (LEVI-CIVITA).

Another celebrated regularization is the regularization of the Schwarzschild metric, i.e., of the general relativity version of the two-body problem: it is, however, somewhat out of the scope of this review (SYNGE, KRUSKAL).

For more details, the reader is referred to Levi-Civita (1956).

Appendix 1: KAM Resummation Scheme

The idea to control the remaining contributions is to reduce the problem to the case in which there are no pairs of lines that follow each other in the tree order and which have the same current. Mark by a scale label "0" the lines, see [74], [83], of a tree whose divisors $C|\boldsymbol{\omega}_0\cdot\boldsymbol{\nu}(l)|$ are $> 1$: these are lines which give no problems in the estimates. Then mark by a scale label "$\geq 1$" the lines with current $\boldsymbol{\nu}(l)$ such that $|\boldsymbol{\omega}_0\cdot\boldsymbol{\nu}(l)| \leq 2^{-n+1}$ for $n = 1$ (i.e., the remaining lines).

The lines labeled 0 are said to be on scale 0, while those labeled $\geq 1$ are said to be on scale $\geq 1$. A cluster of scale 0 will be a maximal collection of lines of scale 0 forming a connected subgraph of a tree $\theta$.

Consider only trees $\theta_0 \in \Theta_0$ of the family $\Theta_0$ of trees containing no clusters of lines with scale label 0 which have only one line entering the cluster and one exiting it with equal current.

It is useful to introduce the notion of a line $\ell_1$ situated "between" two lines $\ell, \ell'$ with $\ell' > \ell$: this will mean that $\ell_1$ precedes $\ell'$ but not $\ell$.

All trees $\theta$ in which there are some pairs $l' > l$ of consecutive lines of scale label $\geq 1$ which have equal current and such that all lines between them bear scale label 0 are obtained by "inserting" on the lines of trees in $\Theta_0$ with label $\geq 1$ any number of clusters of lines and nodes, with lines of scale 0 and with the property that the sum of the harmonics of the nodes inserted vanishes.

Consider a line $l_0 \in \theta_0 \in \Theta_0$ linking nodes $v_1 < v_2$ and labeled $\geq 1$ and imagine inserting on it a cluster $\gamma$ of lines of scale 0 with sum of the node harmonics vanishing and out of which emerges one line connecting a node $v_{\text{out}}$ in $\gamma$ to $v_2$ and into which enters one line linking $v_1$ to a node $v_{\text{in}} \in \gamma$. The insertion of a $k$-lines, $|\gamma| = (k+1)$-nodes, cluster changes the tree value by replacing the line factor, that will be briefly called "value of the cluster $\gamma$", as

$$\frac{\boldsymbol{\nu}_{v_1}\cdot\boldsymbol{\nu}_{v_2}}{(\boldsymbol{\omega}_0\cdot\boldsymbol{\nu}(l_0))^2} \;\to\; \frac{\boldsymbol{\nu}_{v_1}\cdot M(\gamma; \boldsymbol{\nu}(l_0))\,\boldsymbol{\nu}_{v_2}}{(\boldsymbol{\omega}_0\cdot\boldsymbol{\nu}(l_0))^2}\, \frac{1}{(\boldsymbol{\omega}_0\cdot\boldsymbol{\nu}(l_0))^2} \tag{107}$$

where $M$ is an $\ell \times \ell$ matrix

$$M_{rs}(\gamma; \boldsymbol{\nu}(l_0)) = \frac{\varepsilon^{|\gamma|}}{k!}\, \nu_{\text{out},r}\, \nu_{\text{in},s} \prod_{v\in\gamma} f_{\boldsymbol{\nu}_v} \prod_{l\in\gamma} \frac{\boldsymbol{\nu}_v\cdot\boldsymbol{\nu}_{v'}}{(\boldsymbol{\omega}_0\cdot\boldsymbol{\nu}(l))^2}$$

if $\ell = v'v$ denotes a line linking $v'$ and $v$. Therefore, if all possible connected clusters are inserted and the resulting values are added up, the result can be taken into account by attributing to the original line $l_0$ a factor like [107] with $M^{(0)}(\boldsymbol{\nu}(l_0)) \stackrel{\text{def}}{=} \sum_\gamma M(\gamma; \boldsymbol{\nu}(l_0))$ replacing $M(\gamma; \boldsymbol{\nu}(l_0))$.

If several connected clusters $\gamma$ are inserted on the same line and their values are summed, the result is a modification of the factor associated with the line $l_0$ into

$$\sum_{k=0}^{\infty} \boldsymbol{\nu}_{v_1}\cdot\left(\frac{M^{(0)}(\boldsymbol{\nu}(l_0))}{(\boldsymbol{\omega}_0\cdot\boldsymbol{\nu}(l_0))^2}\right)^{k} \frac{1}{(\boldsymbol{\omega}_0\cdot\boldsymbol{\nu}(l_0))^2}\,\boldsymbol{\nu}_{v_2} = \boldsymbol{\nu}_{v_1}\cdot\left(\frac{1}{(\boldsymbol{\omega}_0\cdot\boldsymbol{\nu}(l_0))^2 - M^{(0)}(\boldsymbol{\nu}(l_0))}\right)\boldsymbol{\nu}_{v_2} \tag{108}$$

The series defining $M^{(0)}$ involves, by construction, only trees with lines of scale 0, hence with large divisors, so that it converges to a matrix of small size of order $\varepsilon$ (actually $\varepsilon^2$, more precisely) if $\varepsilon$ is small enough.

Convergence can be established by simply remarking that the series defining $M^{(1)}$ is built with lines with values $> \tfrac12$ of the propagator, so that it certainly converges for $\varepsilon$ small enough (by the estimates in the section "Perturbing functions", where the propagators were identically 1) and the
sum is of order $\varepsilon$ (actually $\varepsilon^2$), hence $< 1$. However, such an argument cannot be repeated when dealing with lines with smaller propagators (which still have to be discussed). Therefore, a method not relying on so trivial a remark on the size of the propagators has eventually to be used when considering lines of scale higher than 1, as it will soon become necessary.

The advantage of the collection of terms achieved with [108] is that we can represent $\mathbf{h}$ as a sum of values of trees which are simpler because they contain no pair of lines of scale $\geq 1$ with in between lines of scale 0 with total sum of the node harmonics vanishing. The price is that the divisors are now more involved and we even have a problem due to the fact that we have not proved that the series in [108] converges. In fact, it is a geometric series whose value is the RHS of [108] obtained by the sum rule [79] unless we can prove that the ratio of the geometric series is $< 1$. This is trivial in this case by the previous remark: but it is better to note that there is another reason for convergence, whose use is not really necessary here but will become essential later.

The property that the ratio of the geometric series is $< 1$ can be regarded as due to the consequence of the cancellation mentioned in the section "Quasiperiodicity and KAM stability" which can be shown to imply that the ratio is $< 1$ because $M^{(0)}(\boldsymbol{\nu}) = \varepsilon^2(\boldsymbol{\omega}_0\cdot\boldsymbol{\nu})^2 m^{(0)}(\boldsymbol{\nu})$ with $C|m^{(0)}(\boldsymbol{\nu})| < D_0$ for some $D_0 > 0$ and for all $|\varepsilon| < \varepsilon_0$ for some $\varepsilon_0$. Then for small $\varepsilon$ the divisor in [108] is essentially still what it was before starting the resummation.

At this point, an induction can be started. Consider trees evaluated with the new rule and place a scale level "$\geq 2$" on the lines with $C|\boldsymbol{\omega}_0\cdot\boldsymbol{\nu}(l)| \leq 2^{-n+1}$ for $n = 2$: leave the label "0" on the lines already marked so and label by "1" the other lines. The lines of scale "1" will satisfy $2^{-n} < |\boldsymbol{\omega}_0\cdot\boldsymbol{\nu}(l)| \leq 2^{-n+1}$ for $n = 1$. The graphs will now possibly contain lines of scale 0, 1 or $\geq 2$ while lines with label "$\geq 1$" no longer can appear, by construction.

A cluster of scale 1 will be a maximal collection of lines of scales 0, 1 forming a connected subgraph of a tree $\theta$ and containing at least one line of scale 1.

The construction carried out by considering clusters of scale 0 can be repeated by considering trees $\theta_1 \in \Theta_1$, with $\Theta_1$ the collection of trees with lines marked 0, 1, or $\geq 2$ and in which no pairs of lines with equal momentum appear to follow each other if between them there are only lines marked 0 or 1.

Insertion of connected clusters $\gamma$ of such lines on a line $l_0$ of $\theta_1$ leads to define a matrix $M^{(1)}$ formed by summing tree values of clusters $\gamma$ with lines of scales 0 or 1 evaluated with the line factors defined in [107] and with the restriction that in $\gamma$ there are no

follow each other while any line between them has lower scale (i.e., 0), here "between" means "preceding $l'$ but not preceding $l$", as above.

Therefore, a scale-independent method has to be devised to check the convergence for $M^{(1)}$ and for the matrices to be introduced later to deal with even smaller propagators. This is achieved by the following extension of Siegel's theorem mentioned in the section "Quasiperiodicity and KAM stability":

Theorem 8 Let $\boldsymbol{\omega}_0$ satisfy [74] and set $\boldsymbol{\omega} = C\boldsymbol{\omega}_0$. Consider the contribution to the sum in [82] from graphs $\theta$ in which

(i) no pairs $\ell' > \ell$ of lines which lie on the same path to the root carry the same current $\boldsymbol{\nu}$ if all lines $\ell_1$ between them have current $\boldsymbol{\nu}(\ell_1)$ such that $|\boldsymbol{\omega}\cdot\boldsymbol{\nu}(\ell_1)| > 2|\boldsymbol{\omega}\cdot\boldsymbol{\nu}|$;

(ii) the node harmonics are bounded by $|\boldsymbol{\nu}| \leq N$ for some $N$.

Then the number of lines $\ell$ in $\theta$ with divisor $\boldsymbol{\omega}\cdot\boldsymbol{\nu}_\ell$ satisfying $2^{-n} < |\boldsymbol{\omega}\cdot\boldsymbol{\nu}_\ell| \leq 2^{-n+1}$ does not exceed $4Nk2^{-n/\tau}$, $n = 1, 2, \ldots$.

This implies, by the same estimates in [85], that the series defining $M^{(1)}$ converges. Again, it must be checked that there are cancellations implying that $M^{(1)}(\boldsymbol{\nu}) = \varepsilon^2(\boldsymbol{\omega}_0\cdot\boldsymbol{\nu})^2 m^{(1)}(\boldsymbol{\nu})$ with $|m^{(1)}(\boldsymbol{\nu})| < D_0$ for the same $D_0 > 0$ and the same $\varepsilon_0$.

At this point, one deals with trees containing only lines carrying labels 0, 1, $\geq 2$, and the line factors for the lines $\ell = v'v$ of scale 0 are $\boldsymbol{\nu}_{v'}\cdot\boldsymbol{\nu}_v/(\boldsymbol{\omega}_0\cdot\boldsymbol{\nu}(\ell))^2$, those of the lines $\ell = v'v$ of scale 1 have line factors $\boldsymbol{\nu}_{v'}\cdot\big((\boldsymbol{\omega}_0\cdot\boldsymbol{\nu}(\ell))^2 - M^{(0)}(\boldsymbol{\nu}(\ell))\big)^{-1}\boldsymbol{\nu}_v$, and those of the lines $\ell = v'v$ of scale $\geq 2$ have line factors

$$\boldsymbol{\nu}_{v'}\cdot\big((\boldsymbol{\omega}_0\cdot\boldsymbol{\nu}(\ell))^2 - M^{(1)}(\boldsymbol{\nu}(\ell))\big)^{-1}\boldsymbol{\nu}_v$$

Furthermore, no pair of lines of scale "1" or of scale "$\geq 2$" with the same momentum and with only lines of lower scale (i.e., of scale "0" in the first case or of scale "0", "1" in the second) between them can follow each other.

This procedure can be iterated until, after infinitely many steps, the problem is reduced to the evaluation of tree values in which each line carries a scale label $n$ and there are no pairs of lines which follow each other and which have only lines of lower scale in between. Then the Siegel argument applies once more and the series so resummed is an absolutely convergent series of functions analytic in $\varepsilon$: hence the original series is convergent.

Although at each step there is a lower bound on the denominators, it would not be possible to avoid using Siegel's theorem. In fact, the lower bound would become
pairs of lines < 0 with the same current and which worse and worse as the scale increases. In order to check
the estimates of the constants $D_0$, $\varepsilon_0$ which control the scale independence of the convergence of the various series, it is necessary to take advantage of the theorem, and of the absence (at each step) of the necessity of considering trees with pairs of consecutive lines with equal momentum and intermediate lines of higher scale.

One could also perform the analysis by bounding $h^{(k)}$ order by order with no resummations (i.e., without changing the line factors) and exhibiting the necessary cancellations. Alternatively, the paths that Kolmogorov, Arnold and Moser used to prove the first three (somewhat different) versions of the theorem, by successive approximations of the equations for the tori, can be followed.

The invariant tori are Lagrangian manifolds just as the unperturbed ones (cf. comments after [31]) and, in the case of the Hamiltonian [80], the generating function $A\cdot\psi + \Phi(A,\psi)$ can be expressed in terms of their parametric equations

$$
\begin{aligned}
\Phi(A,\psi) &= G(\psi) + a\cdot\psi + h(\psi)\cdot\big(A - \omega - \Delta h(\psi)\big)\\
\partial_\psi G(\psi) &\overset{\mathrm{def}}{=} -\Delta h(\psi) + \tilde h(\psi)\,\partial_\psi \Delta h(\psi) - a\\
a &\overset{\mathrm{def}}{=} \int \big(-\Delta h(\psi) + \tilde h(\psi)\,\partial_\psi \Delta h(\psi)\big)\,\frac{d\psi}{(2\pi)^{\ell}} = \int \tilde h(\psi)\,\partial_\psi \Delta h(\psi)\,\frac{d\psi}{(2\pi)^{\ell}}
\end{aligned}
\qquad [109]
$$

where $\Delta = (\omega\cdot\partial_\psi)$ and the invariant torus corresponds to $A' = \omega$ in the map $\alpha = \psi + \partial_A \Phi(A,\psi)$ and $A' = A + \partial_\psi \Phi(A,\psi)$. In fact, by [109] the latter becomes $A' = A - \Delta h$ and, from the second of [75] written for $f$ depending only on the angles $\alpha$, it is $A = \omega + \Delta h$ when $A$, $\alpha$ are on the invariant torus.

Note that if $a$ exists it is necessarily determined by the third relation in [109], but the check that the second equation in [109] is soluble (i.e., that the RHS is an exact gradient up to a constant) is nontrivial. The canonical map generated by $A\cdot\psi + \Phi(A,\psi)$ is also defined for $A'$ close to $\omega$ and foliates the neighborhood of the invariant torus with other tori: of course, for $A' \neq \omega$ the tori defined in this way are, in general, not invariant.

The reader is referred to Gallavotti et al. (2004) for more details.

Appendix 2: Coriolis and Lorentz Forces – Larmor Precession

Larmor precession refers to the motion of an electrically charged particle in a magnetic field $H$ (in an inertial frame of reference). It is due to the Lorentz force which, on a unit mass with unit charge, produces an acceleration $\ddot R = v \wedge H$ if the speed of light is $c = 1$.

Therefore, if $H = H\mathbf{k}$ is directed along the $\mathbf{k}$-axis, the acceleration it produces is the same as the one that the Coriolis force would impress on a unit mass located in a reference frame which rotates with angular velocity $\omega_0\mathbf{k}$ around the $\mathbf{k}$-axis if $H = 2\omega_0\mathbf{k}$.

The above remarks imply that a homogeneous sphere electrically charged uniformly with a unit charge and freely pivoting about its center in a constant magnetic field $H$ directed along the $\mathbf{k}$-axis undergoes the same motion as it would follow if not subject to the magnetic field but seen in a noninertial reference frame rotating at constant angular velocity $\omega_0$ around the $\mathbf{k}$-axis, if $H$ and $\omega_0$ are related by $H = 2\omega_0$: in this frame, the Coriolis force is interpreted as a magnetic field.

This holds, however, only if the centrifugal force has zero moment with respect to the center: true in the spherical symmetry case only. In spherically nonsymmetric cases, the centrifugal forces have in general nonzero moment, so the equivalence between the Coriolis force and the Lorentz force is only approximate.

The Larmor theorem makes this more precise. It gives a quantitative estimate of the difference between the motion of a general system of particles of mass $m$ in a magnetic field and the motion of the same particles in a rotating frame of reference but in the absence of a magnetic field. The approximation is estimated in terms of the size of the Larmor frequency $eH/2mc$, which should be small compared to the other characteristic frequencies of the motion of the system: the physical meaning is that the centrifugal force should be small compared to the other forces.

The vector potential $A$ for a constant magnetic field in the $\mathbf{k}$-direction, $H = 2\omega_0\mathbf{k}$, is $A = 2\omega_0\,\mathbf{k}\wedge R \equiv 2\omega_0 R^{\perp}$. Therefore, from the treatment of the Coriolis force in the section "Three-body problem" (see [95]), the motion of a charge $e$ with mass $m$ in a magnetic field $H$ with vector potential $A$ and subject to other forces with potential $W$ can be described, in an inertial frame and in generic units, in which the speed of light is $c$, by a Hamiltonian

$$H = \frac{1}{2m}\Big(p - \frac{e}{c}A\Big)^2 + W(R) \qquad [110]$$

where $p = m\dot R + (e/c)A$ and $R$ are canonically conjugate variables.

Further Reading

Arnold VI (1989) Mathematical Methods of Classical Mechanics. Berlin: Springer.
Calogero F and Degasperis A (1982) Spectral Transform and Solitons. Amsterdam: North-Holland.
Chierchia L and Valdinoci E (2000) A note on the construction of Hamiltonian trajectories along heteroclinic chains. Forum Mathematicum 12: 247–255.
Fassò F (1998) Quasi-periodicity of motions and complete integrability of Hamiltonian systems. Ergodic Theory and Dynamical Systems 18: 1349–1362.
Gallavotti G (1983) The Elements of Mechanics. New York: Springer.
Gallavotti G, Bonetto F, and Gentile G (2004) Aspects of the Ergodic, Qualitative and Statistical Properties of Motion. Berlin: Springer.
Kolmogorov N (1954) On the preservation of conditionally periodic motions. Doklady Akademia Nauk SSSR 96: 527–530.
Landau LD and Lifshitz EM (1976) Mechanics. New York: Pergamon Press.
Levi-Civita T (1956) Opere Matematiche. Accademia Nazionale dei Lincei. Bologna: Zanichelli.
Moser J (1962) On invariant curves of an area preserving mapping of the annulus. Nachrichten Akademie Wissenschaften Göttingen 11: 1–20.
Nekhorossev V (1977) An exponential estimate of the time of stability of nearly integrable Hamiltonian systems. Russian Mathematical Surveys 32(6): 1–65.
Poincaré H (1987) Méthodes nouvelles de la mécanique céleste, vol. I. Paris: Gauthier-Villars. (reprinted by Gabay, Paris, 1987).

Introductory Article: Differential Geometry


S Paycha, Université Blaise Pascal, Aubière, France
© 2006 Elsevier Ltd. All rights reserved.

Differential geometry is the study of differential properties of geometric objects such as curves, surfaces and higher-dimensional manifolds endowed with additional structures such as metrics and connections. One of the main ideas of differential geometry is to apply the tools of analysis to investigate geometric problems; in particular, it studies their "infinitesimal parts," thereby linearizing the problem. However, historically, geometric concepts often anticipated the analytic tools required to define them from a differential geometric point of view; the notion of tangent to a curve, for example, arose well before the notion of derivative.

In its barely more than two centuries of existence, differential geometry has always had strong (often two-way) interactions with physics. Just to name a few examples, the theory of curves is used in kinematics, symplectic manifolds arise in Hamiltonian mechanics, pseudo-Riemannian manifolds in general relativity, spinors in quantum mechanics, Lie groups and principal bundles in gauge theory, and infinite-dimensional manifolds in the path-integral approach to quantum field theory.

Curves and Surfaces

The study of differential properties of curves and surfaces resulted from a combination of the coordinate method (or analytic geometry) developed by Descartes and Fermat during the first half of the seventeenth century and infinitesimal calculus developed by Leibniz and Newton during the second half of the seventeenth and beginning of the eighteenth century.

Differential geometry appeared later in the eighteenth century with the works of Euler, "Recherches sur la courbure des surfaces" (1760) (Investigations on the curvature of surfaces), and Monge, "Une application de l'analyse à la géométrie" (1795) (An application of analysis to geometry). Until Gauss' fundamental article "Disquisitiones generales circa superficies curvas" (General investigations of curved surfaces), published in Latin in 1827 (of which one can find a partial translation to English in Spivak (1979)), surfaces embedded in $\mathbb{R}^3$ were either described by an equation, $W(x, y, z) = 0$, or by expressing one variable in terms of the others. Although Euler had already noticed that the coordinates of a point on a surface could be expressed as functions of two independent variables, it was Gauss who first made a systematic use of such a parametric representation, thereby initiating the concept of "local chart" which underlies differential geometry.

Differentiable Manifolds

The actual notion of $n$-manifold independent of a particular embedding in a Euclidean space goes back to a lecture, "Über die Hypothesen, welche der Geometrie zu Grunde liegen" (On the hypotheses which lie at the foundations of geometry) (of which one can find a translation to English and comments in Spivak (1979)), delivered by Riemann at Göttingen University in 1854, in which he makes clear the fact that $n$-manifolds are locally like $n$-dimensional Euclidean space. In his work, Riemann mentions the existence of infinite-dimensional manifolds, such as function spaces, which today play an important role since they naturally arise as configuration spaces in quantum field theories.

In modern language a differentiable manifold modeled on a topological space $V$ (which can be
finite dimensional, Fréchet, Banach, or Hilbert for example) is a topological space $M$ equipped with a family of local coordinate charts $(U_i, \phi_i)_{i\in I}$ such that the open subsets $U_i \subset M$ cover $M$ and where $\phi_i : U_i \to V$, $i \in I$, are homeomorphisms which give rise to smooth transition maps $\phi_i \circ \phi_j^{-1} : \phi_j(U_i \cap U_j) \to \phi_i(U_i \cap U_j)$. An $n$-dimensional differentiable manifold is a differentiable manifold modeled on $\mathbb{R}^n$. The sphere $S^{n-1} := \{(x_1, \ldots, x_n) \in \mathbb{R}^n,\ \sum_{i=1}^n x_i^2 = 1\}$ is a differentiable manifold of dimension $n - 1$.

Simple differentiable curves in $\mathbb{R}^n$ are one-dimensional differentiable manifolds locally specified by coordinates $x(t) = (x_1(t), \ldots, x_n(t)) \in \mathbb{R}^n$, where $t \mapsto x_j(t)$ is of class $C^k$. The tangent at point $x(t_0)$ to such a curve, which is a straight line passing through this point with direction given by the vector $x'(t_0)$, generalizes to the concept of tangent space $T_m M$ at point $m \in M$ of a smooth manifold $M$ modeled on $V$, which is a vector space isomorphic to $V$ spanned by tangent vectors at point $m$ to curves $\gamma(t)$ of class $C^1$ on $M$ such that $\gamma(t_0) = m$.

In order to make this more precise, one needs the notion of differentiable mapping. Given two differentiable manifolds $M$ and $N$, a mapping $f : M \to N$ is differentiable at point $m$ if, for every chart $(U, \phi)$ of $M$ containing $m$ and every chart $(V, \psi)$ of $N$ such that $f(U) \subset V$, the mapping $\psi \circ f \circ \phi^{-1} : \phi(U) \to \psi(V)$ is differentiable at point $\phi(m)$. In particular, differentiable mappings $f : M \to \mathbb{R}$ form the algebra $C^\infty(M, \mathbb{R})$ of smooth real-valued functions on $M$. Differentiable mappings $\gamma : [a, b] \to M$ from an interval $[a, b] \subset \mathbb{R}$ to a differentiable manifold $M$ are called differentiable curves on $M$. A differentiable mapping $f : M \to N$ which is invertible and with differentiable inverse $f^{-1} : N \to M$ is called a diffeomorphism.

The derivative of a function $f \in C^\infty(M, \mathbb{R})$ along a curve $\gamma : [a, b] \to M$ at point $\gamma(t_0) \in M$ with $t_0 \in [a, b]$ is given by

$$Xf := \frac{d}{dt}\Big|_{t=t_0} f\circ\gamma(t)$$

and the map $f \mapsto Xf$ is called the tangent vector to the curve $\gamma$ at point $\gamma(t_0)$. Tangent vectors to some curve $\gamma : [a, b] \to M$ at a given point $m \in \gamma([a, b])$ form a vector space $T_m M$ called the tangent space to $M$ at point $m$.

A (smooth) map which, to a point $m \in M$, assigns a tangent vector $X \in T_m M$ is called a (smooth) vector field. It can also be seen as a derivation $\tilde X : f \mapsto Xf$ on $C^\infty(M, \mathbb{R})$ defined by $(\tilde X f)(m) := X(m)f$ for any $m \in M$, and the bracket of vector fields is thereby defined from the operator bracket $\widetilde{[X, Y]} := \tilde X \circ \tilde Y - \tilde Y \circ \tilde X$. The linear operations on tangent vectors carry over to vector fields, $(X + Y)(m) := X(m) + Y(m)$, $(\lambda X)(m) := \lambda X(m)$ for any $m \in M$ and for any $X, Y \in T_m M$, $\lambda \in \mathbb{R}$, so that vector fields on $M$ form a linear space.

One can generate tangent vectors to $M$ via local one-parameter groups of differentiable transformations of $M$, that is, mappings $(t, m) \mapsto \phi_t(m)$ from $]-\epsilon, \epsilon[ \times U$ to $U$ (with $\epsilon > 0$ and $U \subset M$ an open subset of $M$) such that $\phi_0 = \mathrm{Id}$, $\phi_{t+s} = \phi_t \circ \phi_s$ $\forall s, t \in ]-\epsilon, \epsilon[$ with $t + s \in ]-\epsilon, \epsilon[$, and $m \mapsto \phi_t(m)$ is a diffeomorphism of $U$ onto an open subset $\phi_t(U)$. The tangent vector at $t = 0$ to the curve $\gamma(t) = \phi_t(m)$ yields a tangent vector to $M$ at point $m = \gamma(0)$. Conversely, when $M$ is finite dimensional, the fundamental theorem for systems of ordinary differential equations yields, for any vector field $X$ on $M$, the existence (around any point $m \in M$) of a local one-parameter group of local transformations $\phi : ]-\epsilon, \epsilon[ \times U \to M$ (with $U$ an open subset containing $m$) which induces the tangent vector $X(m) \in T_m M$.

A differentiable mapping $\phi : M \to N$ induces a map $\phi_*(m) : T_m M \to T_{\phi(m)} N$ defined by $\phi_* X f = X(f \circ \phi)$. An immersion of a manifold $M$ in a manifold $N$ is a differentiable mapping $\phi : M \to N$ such that the maps $\phi_*(m)$ are injective at any point $m \in M$. Such a map is an embedding if it is moreover injective, in which case $\phi(M) \subset N$ is a submanifold of $N$. The unit sphere $S^n$ is a submanifold of $\mathbb{R}^{n+1}$. Whitney showed that every smooth real $n$-dimensional manifold can be embedded in $\mathbb{R}^{2n+1}$.

A differentiable manifold whose coordinate charts take values in a complex vector space $V$ and whose transition maps are holomorphic is called a complex manifold, which is complex $n$-dimensional if $V = \mathbb{C}^n$. The complex projective space $\mathbb{C}P^n$, the set of complex straight lines through 0 in $\mathbb{C}^{n+1}$, is a compact complex manifold of dimension $n$. Similarly to the notion of differentiable mapping between differentiable manifolds, we have the notion of holomorphic mapping between complex manifolds.

A smooth family $m \mapsto J_m$ of endomorphisms of the tangent spaces $T_m M$ to a differentiable manifold $M$ such that $J_m^2 = -\mathrm{Id}$ gives rise to an almost-complex manifold. The prototype is the almost-complex structure on $\mathbb{C}^n$ defined by $J(\partial_{x_i}) = \partial_{y_i}$; $J(\partial_{y_i}) = -\partial_{x_i}$ with $z = (x_1 + iy_1, \ldots, x_n + iy_n) \in \mathbb{C}^n$, which can be transferred to a complex manifold $M$ by means of local charts. An almost-complex structure $J$ on a manifold $M$ is called complex if $M$ is the underlying differentiable manifold of a complex manifold which induces $J$ in this way.

Studying smooth functions on a differentiable manifold can provide information on the topology of the manifold: for example, the behavior of a smooth function on a compact manifold at its critical points is strongly restricted by the topological properties of the manifold. This leads to the Morse
critical point theory which extends to infinite-dimensional manifolds and, among other consequences, leads to conclusions on extremals or closed extremals of variational problems. Rather than privileging points on a manifold, one can study instead the geometry of manifolds from the point of view of spaces of functions, which leads to an algebraic approach to differential geometry. The initial concept there is a commutative ring (which becomes a possibly noncommutative algebra in the framework of noncommutative geometry), namely the ring of smooth functions on the manifold, while the manifold itself is defined in terms of the ring as the space of maximal ideals. In particular, this point of view proves to be fruitful to understand supermanifolds, a generalization of manifolds which is important for supersymmetric field theories.

One can further consider the sheaf of smooth functions on an open subset of the manifold; this point of view leads to sheaf theory, which provides a unified approach to establishing connections between local and global properties of topological spaces.

Metric Properties

Riemann focused on the metric properties of manifolds but the first clear formulation of the concept of a manifold equipped with a metric was given by Weyl in "Die Idee der Riemannschen Fläche". A Riemannian metric on a differentiable manifold $M$ is a positive-definite scalar product $g_m$ on $T_m M$ for every point $m \in M$ depending smoothly on the point $m$. A manifold equipped with a Riemannian metric is called a Riemannian manifold. A Weyl transformation, which is multiplying the metric by a smooth positive function, yields a new Riemannian metric with the same angle measurement as the original one, and hence leaves the "conformal" structure on $M$ unchanged.

Riemann also suggested considering metrics on the tangent spaces that are not induced from scalar products; metrics on the manifold built this way were first systematically investigated by Finsler and are therefore called Finsler metrics. Geodesics on a Riemannian manifold $M$, which correspond to smooth curves $\gamma : [a, b] \to M$ that minimize the length functional

$$L(\gamma) := \frac{1}{2}\int_a^b \sqrt{g_{\gamma(t)}\Big(\frac{d\gamma}{dt}, \frac{d\gamma}{dt}\Big)}\; dt$$

then generalize to curves which realize the shortest distance between two points chosen sufficiently close.

Euclid's axioms, which naturally lead to Riemannian geometry, are also satisfied up to the axiom of parallelism by a geometry developed by Lobachevsky in 1829 and Bolyai in 1832. Non-Euclidean geometries actually played a major role in the development of differential geometry, and Lobachevsky's work inspired Riemann and later Klein.

Dropping the positivity assumption for the bilinear forms $g_m$ on $T_m M$ leads to Lorentzian manifolds, which are $(n + 1)$-dimensional smooth manifolds equipped with bilinear forms on the tangent spaces with signature $(1, n)$. These occur in general relativity, and tangent vectors with negative, positive, or vanishing squared length are called timelike, spacelike, and lightlike, respectively.

Just as complex vector spaces can be equipped with positive-definite Hermitian products, a complex manifold $M$ can come equipped with a Hermitian metric, namely a positive-definite Hermitian product $h_m$ on $T_m M$ for every point $m \in M$ depending smoothly on the point $m$; every Hermitian metric induces a Riemannian one given by its real part. The complex projective space $\mathbb{C}P^n$ comes naturally equipped with the Fubini–Study Hermitian metric.

Transformation Groups

Metric properties can be seen from the point of view of transformation groups. Poncelet in his "Traité projectif des figures" (1822) had investigated classical Euclidean geometry from a projective geometric point of view, but it was not until Cayley (1858) that metric properties were interpreted as those stable under any "projective" transformation which leaves "cyclic points" (points at infinity on the imaginary axis of the complex plane) invariant. Transformation groups were further investigated by Lie, leading to the modern concept of Lie group, a smooth manifold endowed with a group structure such that the group operations are smooth.

A vector field $X$ on a Lie group $G$ is called left- (resp. right-) invariant if it is invariant under left translations $L_g : h \mapsto gh$ (resp. right translations $R_g : h \mapsto hg$) for every $g \in G$, that is, if $(L_g)_* X(h) = X(gh)$ $\forall (g, h) \in G^2$ (resp. $(R_g)_* X(h) = X(hg)$ $\forall (g, h) \in G^2$). The set of all left-invariant vector fields equipped with the sum, scalar multiplication, and the bracket operation on vector fields forms an algebra called the Lie algebra of $G$.

The group $Gl_n(\mathbb{R})$ (resp. $Gl_n(\mathbb{C})$) of all real (resp. complex) invertible $n \times n$ matrices is a Lie group whose Lie algebra is the algebra $gl_n(\mathbb{R})$ (resp. $gl_n(\mathbb{C})$) of all real (resp. complex) $n \times n$ matrices, and the bracket operation reads $[A, B] = AB - BA$.

The orthogonal (resp. unitary) group $O_n(\mathbb{R}) := \{A \in Gl_n(\mathbb{R}),\ A^t A = 1\}$, where $A^t$ denotes the transposed matrix (resp. $U_n(\mathbb{C}) := \{A \in Gl_n(\mathbb{C}),\ A^* A = 1\}$, where $A^* = \bar A^t$), is a compact Lie group with Lie
algebra $o_n(\mathbb{R}) := \{A \in gl_n(\mathbb{R}),\ A^t = -A\}$ (resp. $u_n(\mathbb{C}) := \{A \in gl_n(\mathbb{C}),\ A^* = -A\}$).

A left-invariant vector field $X$ on a finite-dimensional Lie group $G$ (or equivalently an element $X$ of the Lie algebra of $G$) generates a global one-parameter group of transformations $\phi_X(t)$, $t \in \mathbb{R}$. The mapping from the Lie algebra of $G$ into $G$ defined by $\exp(X) := \phi_X(1)$ is called the exponential mapping. The exponential mapping on $Gl_n(\mathbb{R})$ (resp. $Gl_n(\mathbb{C})$) is given by the series $\exp(A) = \sum_{i=0}^{\infty} A^i/i!$.

As symmetry groups of physical systems, Lie groups play an important role in physics, in particular in quantum mechanics and Yang–Mills theory. Infinite-dimensional Lie groups arise as symmetry groups, such as the group of diffeomorphisms of a manifold in general relativity, the group of gauge transformations in Yang–Mills theory, and the group of Weyl transformations of metrics on a surface in string theory. The principle "the physics should not depend on how it is described" translates to an invariance under the action of the (possibly infinite-dimensional) group of symmetries of the theory. Anomalies arise when such an invariance holds for the classical action of a physical theory but "breaks" at the quantized level.

In his Erlangen program (1872), Klein puts the concept of transformation group in the foreground, introducing a novel idea by which one should consider a space endowed with some properties as a set of objects invariant under a given group of transformations. One thereby reaches a classification of geometric results according to which group is relevant in a particular problem as, for example, the projective linear group for projective geometry, the orthogonal group for Riemannian geometry, or the symplectic group for symplectic geometry.

Fiber Bundles

Transformation groups give rise to principal fiber bundles, which play a major role in Yang–Mills theory. The notion of fiber bundle first arose out of questions posed in the 1930s on the topology and the geometry of manifolds, and by 1950 the definition of fiber bundle had been clearly formulated by Steenrod.

A smooth fiber bundle with typical fiber a manifold $F$ is a triple $(E, \pi, B)$, where $E$ and $B$ are smooth manifolds called the total space and the base space, and $\pi : E \to B$ is a smooth surjective map called the projection of the bundle, such that the preimage $\pi^{-1}(b)$ of a point $b \in B$, called the fiber of the bundle over $b$, is isomorphic to $F$ and any base point $b$ has a neighborhood $U \subset B$ with preimage $\pi^{-1}(U)$ diffeomorphic to $U \times F$, where the diffeomorphisms commute with the projection on the base space. Smooth sections of $E$ are maps $\sigma : B \to E$ such that $\pi \circ \sigma = I_B$.

When $F$ is a vector space and when, given open subsets $U_i \subset B$ that cover $B$ with corresponding coordinate charts $(U_i, \phi_i)_{i\in I}$, the local diffeomorphisms $\Phi_i : \pi^{-1}(U_i) \simeq \phi_i(U_i) \times F$ give rise to transition maps $\Phi_i \circ \Phi_j^{-1} : \phi_j(U_i \cap U_j) \times F \to \phi_i(U_i \cap U_j) \times F$ that are linear in the fiber, the bundle is called a vector bundle. The tangent bundle $TM = \bigcup_{m\in M} T_m M$ to a differentiable manifold $M$ modeled on a vector space $V$ is a vector bundle with typical fiber $V$ and transition maps $\tau_{ij} = (\phi_i \circ \phi_j^{-1},\ d(\phi_i \circ \phi_j^{-1}))$ expressed in terms of the differentials of the transition maps on the manifold $M$. So are the cotangent bundle, the dual of the tangent bundle, and tensor products of the tangent and cotangent vector bundles, with typical fiber the dual $V^*$ and tensor products of $V$ and $V^*$. Vector fields defined previously are sections of the tangent bundle, 1-forms on $M$ are sections of the cotangent bundle, and contravariant tensors, resp. covariant tensors, are sections of tensor products of the tangent, resp. cotangent, bundles. A differentiable mapping $\phi : M \to N$ takes covariant $p$-tensor fields on $N$ to their pullbacks by $\phi$, covariant $p$-tensors on $M$ given by

$$(\phi^* T)(X_1, \ldots, X_p) := T(\phi_* X_1, \ldots, \phi_* X_p)$$

for any vector fields $X_1, \ldots, X_p$ on $M$.

Differentiating a smooth function $f$ on $M$ gives rise to a 1-form $df$ on $M$. More generally, exterior $p$-forms are antisymmetric smooth covariant $p$-tensors, so that $\omega(X_{\sigma(1)}, \ldots, X_{\sigma(p)}) = \epsilon(\sigma)\,\omega(X_1, \ldots, X_p)$ for any vector fields $X_1, \ldots, X_p$ on $M$ and any permutation $\sigma \in \Sigma_p$ with signature $\epsilon(\sigma)$.

Riemannian metrics are covariant 2-tensors and the space of Riemannian metrics on a manifold $M$ is an infinite-dimensional manifold which arises as a configuration space in string theory and general relativity.

A principal bundle is a fiber bundle $(P, \pi, B)$ with typical fiber a Lie group $G$ acting freely and properly on the total space $P$ via a right action $(p, g) \in P \times G \mapsto pg = R_g(p) \in P$ and such that the local diffeomorphisms $\pi^{-1}(U) \simeq U \times G$ are $G$-equivariant. Given a principal fiber bundle $(P, \pi, B)$ with structure group a finite-dimensional Lie group $G$, the action of $G$ on $P$ induces a homomorphism which to an element $X$ of the Lie algebra of $G$ assigns a vector field $X^*$ on $P$ called the fundamental vector field generated by $X$. It is defined at $p \in P$ by

$$X^*(p) := \frac{d}{dt}\Big|_{t=0} R_{\exp(tX)}(p)$$

where exp is the exponential map on $G$.
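The series for the exponential mapping on $Gl_n(\mathbb{R})$ and the matrix bracket $[A, B] = AB - BA$ quoted in this section can be checked numerically. The following is a minimal sketch in plain Python, not a production routine: the helper names `mat_mul`, `mat_exp` and `bracket`, the truncation at 30 terms, and the test angle `theta` are illustrative assumptions. It exponentiates an antisymmetric $2 \times 2$ matrix, which should land in the orthogonal group as a rotation, and evaluates the bracket of two antisymmetric $3 \times 3$ matrices.

```python
import math

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(a, terms=30):
    """Truncated series exp(A) = sum_{i >= 0} A^i / i! (assumed cutoff: `terms`)."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]          # holds A^i / i!, starting at i = 0
    for i in range(1, terms):
        term = [[x / i for x in row] for row in mat_mul(term, a)]
        result = [[result[r][c] + term[r][c] for c in range(n)]
                  for r in range(n)]
    return result

def bracket(a, b):
    """Matrix bracket [A, B] = AB - BA."""
    ab, ba = mat_mul(a, b), mat_mul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]

# An antisymmetric matrix (A^t = -A), i.e. an element of o_2(R).
theta = 0.7
A = [[0.0, -theta], [theta, 0.0]]
R = mat_exp(A)                                  # expected: rotation by theta
Rt = [list(col) for col in zip(*R)]             # transpose of R
RtR = mat_mul(Rt, R)                            # should be the identity

# Three antisymmetric 3x3 matrices, elements of o_3(R).
L1 = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
L2 = [[0, 0, 1], [0, 0, 0], [-1, 0, 0]]
L3 = [[0, -1, 0], [1, 0, 0], [0, 0, 0]]
L12 = bracket(L1, L2)                           # closes on L3
```

Here `R` agrees with the rotation matrix of angle `theta` to rounding error and `RtR` is the identity, illustrating that the exponential mapping sends $o_2(\mathbb{R})$ into $O_2(\mathbb{R})$; `L12` equals `L3`, so the bracket of two antisymmetric matrices is again antisymmetric.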
Given an action of $G$ on a vector space $V$, one builds from a principal bundle with typical fiber $G$ an associated vector bundle with typical fiber $V$. Principal bundles are essential in gauge theory; $U(1)$-principal bundles arise in electromagnetism and nonabelian structure groups arise in Yang–Mills theory. There the fields are connections on the principal bundle, and the action of gauge transformations on (irreducible) connections gives rise to an infinite-dimensional principal bundle over the moduli space with structure group given by gauge transformations. Infinite-dimensional bundles arise in other field theories such as string theory, where the moduli space corresponds to inequivalent complex structures on a Riemann surface and the infinite-dimensional structure group is built up from Weyl transformations of the metric and diffeomorphisms of the surface.

Connections

On a manifold there is no canonical method to identify tangent spaces at different points. Such an identification, which is needed in order to differentiate vector fields, can be achieved on a Riemannian manifold via parallel transport of the vector fields. The basic concepts of the theory of covariant differentiation on a Riemannian manifold were given at the end of the nineteenth century by Ricci and, in a more complete form, in 1901 in collaboration with Levi-Civita in "Méthodes de calcul différentiel absolu et leurs applications"; on a Riemannian manifold, it is possible to define in a canonical manner a parallel displacement of tangent vectors and thereby to differentiate vector fields covariantly, using what has since been called the Levi-Civita connection.

More generally, a (linear) connection (or equivalently a covariant derivation) on a vector bundle $E$ over a manifold $M$ provides a way to identify fibers of the vector bundle at different points; it is a map $\nabla$ taking sections $\sigma$ of $E$ to $E$-valued 1-forms on $M$ which satisfies a Leibniz rule, $\nabla(f\sigma) = df \otimes \sigma + f\,\nabla\sigma$, for any smooth function $f$ on $M$. When $E$ is the tangent bundle over $M$, curves $\gamma$ on the manifold with covariantly constant velocity, $\nabla\dot\gamma(t) = 0$, give rise to geodesics. Given an initial velocity $\dot\gamma(0) = X \in T_m M$ and provided $X$ has small enough norm, $\gamma_X(1)$ defines a point on the corresponding geodesic, and the map $\exp : X \mapsto \gamma_X(1)$ is a diffeomorphism from a neighborhood of 0 in $T_m M$ to a neighborhood of $m \in M$, called the exponential map of $\nabla$.

The concept of connection extends to principal bundles, where it was developed by Ehresmann building on the work of Cartan. A connection on a principal bundle $(P, \pi, B)$ with structure group $G$, which is a smooth equivariant (under the action of the group $G$) decomposition of the tangent space $T_p P = H_p P \oplus V_p P$ at each point $p$ into a horizontal space $H_p P$ and the vertical space $V_p P = \mathrm{Ker}\, d\pi_p$, gives rise to a linear connection on the associated vector bundle.

A connection on $P$ gives rise to a 1-form $\omega$ on $P$ with values in the Lie algebra of the structure group $G$, called the connection 1-form and defined as follows. For each $X \in T_p P$, $\omega(X)$ is the unique element $U$ of the Lie algebra of $G$ such that the corresponding fundamental vector field $U^*(p)$ at point $p$ coincides with the vertical component of $X$. In particular, $\omega(U^*) = U$ for any element $U$ of the Lie algebra of $G$.

The space of connections, which is an infinite-dimensional manifold, arises as a configuration space in Yang–Mills theory and also comes into play in the Seiberg–Witten theory.

Geometric Differential Operators

From connections one defines a number of differential operators on a Riemannian manifold, among them second-order Laplacians. In particular, the Laplace–Beltrami operator $f \mapsto -\mathrm{tr}(\nabla^{T^*M} df)$ on smooth functions, where $\nabla^{T^*M}$ is the connection on the cotangent bundle induced by the Levi-Civita connection on $M$, generalizes the ordinary Laplace operator on Euclidean space. This in turn generalizes to second-order operators $\Delta^E := -\mathrm{tr}(\nabla^{T^*M\otimes E}\,\nabla^E)$ acting on smooth sections of a vector bundle $E$ over a Riemannian manifold $M$, where $\nabla^E$ is a connection on $E$ and $\nabla^{T^*M\otimes E}$ the connection on $T^*M \otimes E$ induced by $\nabla^E$ and the Levi-Civita connection on $M$.

The Dirac operator on a spin Riemannian manifold, a first-order differential operator whose square coincides with the Laplace–Beltrami operator up to zeroth-order terms, can be best understood going back to the initial idea of Dirac. A first-order differential operator with constant matrix coefficients $\sum_{i=1}^n \gamma_i\,(\partial/\partial x_i)$ has square given by the Laplace operator $-\sum_{i=1}^n \partial^2/\partial x_i^2$ on $\mathbb{R}^n$ if and only if its coefficients satisfy the Clifford relations

$$\gamma_i^2 = -1 \quad \forall\, i = 1, \ldots, n$$
$$\gamma_i\gamma_j + \gamma_j\gamma_i = 0 \quad \forall\, i \neq j$$

The resulting Clifford algebra, once complexified, is isomorphic in even dimensions $n = 2k$ to the space $\mathrm{End}(S_n)$ (and $\mathrm{End}(S_n) \oplus \mathrm{End}(S_n)$ in odd dimensions $n = 2k + 1$) of endomorphisms of the space $S_n = \mathbb{C}^{2^k}$ of complex $n$-spinors. When instead of the canonical metric on $\mathbb{R}^n$ one starts from the metric on the tangent bundle $TM$ induced by the Riemannian
metric on $M$ and provided the corresponding spinor spaces patch up to a spinor bundle over $M$, $M$ is called a spin manifold. The Dirac operator on a spin Riemannian manifold $M$ is a first-order differential operator acting on spinors given by $D_g = \sum_{i=1}^n \gamma_i \nabla_{e_i}$, where $\nabla$ is the connection on spinors (sections of the spinor bundle $S$) induced by the Levi-Civita connection and $e_1, \ldots, e_n$ is an orthonormal frame of the tangent bundle $TM$. This is a particular case of more general twisted Dirac operators $D_g^W$ on a twisted spinor bundle $S \otimes W$ equipped with the connection $\nabla^{S \otimes W}$ which combines the connection $\nabla$ with a connection $\nabla^W$ on an auxiliary vector bundle $W$. Their square $(D_g^W)^2$ relates to the Laplacian $\Delta^{S \otimes W}$ built from this twisted connection via the Lichnerowicz formula, which is useful for estimates on the spectrum of the Dirac operator in terms of the underlying geometric data.

When there is no spin structure on $M$, one can still hope for a $\mathrm{Spin}^c$ structure and a Dirac operator $D^c$ associated with a connection compatible with that structure. In particular, every compact orientable 4-manifold can be equipped with a $\mathrm{Spin}^c$ structure and one can build invariants of the differentiable manifold called Seiberg–Witten invariants from solutions of a system of two partial differential equations, one of which is the Dirac equation $D^c \psi = 0$ associated with a connection compatible with the $\mathrm{Spin}^c$ structure and the other a nonlinear equation involving the curvature.

Curvature

The concept of curvature, which is now understood in terms of connections (the curvature of a connection $\nabla$ is defined by $\Omega = \nabla^2$), historically arose prior to that of connection. In its modern form, the concept of curvature dates back to Gauss. Using a spherical representation of surfaces (the Gauss map $\nu$, which sends a point $m$ of an oriented surface $\Sigma \subset \mathbb{R}^3$ to the outward pointing unit normal vector $\nu_m$), Gauss defined what has since been called the Gaussian curvature $K_m$ at a point $m \in U \subset \Sigma$ as the limit, when the area of $U$ tends to zero, of the ratio $\mathrm{area}(\nu(U))/\mathrm{area}(U)$. It measures the obstruction to finding a distance-preserving map from a piece of the surface around $m$ to a region in the standard plane. Gauss' Theorema Egregium says that the Gaussian curvature of a smooth surface in $\mathbb{R}^3$ is defined in terms of the metric on the surface so that it agrees for two isometric surfaces.

From the curvature $\Omega$ of a connection on a Riemannian manifold $(M, g)$, one builds the Riemannian curvature tensor, a 4-tensor which in local coordinates reads

$$R_{ijkl} := g\left(\Omega\left(\frac{\partial}{\partial_i}, \frac{\partial}{\partial_j}\right)\frac{\partial}{\partial_k}, \frac{\partial}{\partial_l}\right);$$

further taking a partial trace leads to the Ricci curvature given by the 2-tensor $\mathrm{Ric}_{ij} = \sum_k R_{ikjk}$, the trace of which gives in turn the scalar curvature $R = \sum_i \mathrm{Ric}_{ii}$. Sectional curvature at a point $m$ in the direction of a two-dimensional plane spanned by two vectors $U$ and $V$ corresponds to $K(U, V) = g(\Omega(U, V)V, U)$. A manifold has constant sectional curvature whenever $K(U, V)/\|U \wedge V\|^2$ is a constant $K$ for all linearly independent vectors $U$, $V$. A Riemannian manifold with constant sectional curvature is said to be of spherical, flat, or hyperbolic type depending on whether $K > 0$, $K = 0$, or $K < 0$, respectively. One owes to Cartan the discovery of an important class of Riemannian manifolds, symmetric spaces, which contains the spheres, the Euclidean spaces, the hyperbolic spaces, and compact Lie groups. A connected Riemannian manifold $M$ equipped at every point $m$ with an isometry $\sigma_m$ such that $\sigma_m(m) = m$ and the tangent map $T_m \sigma_m$ equals $-\mathrm{Id}$ on the tangent space (it therefore reverses the geodesics through $m$) is called symmetric. $\mathbb{CP}^n$ equipped with the Fubini–Study metric is a symmetric space with the isometry given by the reflection with respect to a line in $\mathbb{C}^{n+1}$. A compact symmetric space has non-negative sectional curvature $K$.

Constraints on the curvature can have topological consequences. Spheres are the only simply connected manifolds with constant positive sectional curvature; if a simply connected complete Riemannian manifold of dimension $>1$ has non-positive sectional curvature along every plane, then it is homeomorphic to the Euclidean space.

A manifold with Ricci curvature tensor proportional to the metric tensor is called an Einstein manifold. Since Einstein, curvature is a cornerstone of general relativity with gravitational force being interpreted in terms of curvature. For example, the vacuum Einstein equation reads $\mathrm{Ric}_g = (1/2) R_g\, g$ with $\mathrm{Ric}_g$ the Ricci curvature of a metric $g$ and $R_g$ its scalar curvature. In addition, Kaluza–Klein supergravity is a unified theory modeled on a direct product of the Minkowski four-dimensional space and an Einstein manifold with positive scalar curvature.

The Ricci flow $dg(t)/dt = -2\,\mathrm{Ric}_{g(t)}$, which is related to the Einstein equation in general relativity, was only fairly recently introduced in the mathematical literature. Hopes are strong to get a classification of closed 3-manifolds using the Ricci flow as an essential ingredient.
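
Gauss's definition of curvature as a limit of area ratios, recalled at the start of this section, can be checked numerically. The following is a minimal sketch under stated assumptions, not part of the original article: the surface is taken to be the graph $z = (a x^2 + b y^2)/2$, whose Gaussian curvature at the origin is known to be $ab$, the upward unit normal is used as the Gauss map, and the helper names (`unit_normal`, `area_ratio`) are hypothetical.

```python
import math

def unit_normal(a, b, x, y):
    """Upward unit normal to the graph z = (a*x**2 + b*y**2) / 2 at (x, y)."""
    nx, ny, nz = -a * x, -b * y, 1.0
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / norm, ny / norm, nz / norm)

def sub(u, v):
    return tuple(ui - vi for ui, vi in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def area_ratio(a, b, h, n=40):
    """Signed area of the Gauss-map image of the patch over [-h, h]^2,
    divided by the area of the patch itself."""
    step = 2.0 * h / n
    patch_area = 0.0
    image_area = 0.0
    for i in range(n):
        for j in range(n):
            x0 = -h + i * step
            y0 = -h + j * step
            xc, yc = x0 + step / 2.0, y0 + step / 2.0
            # Surface area element of a graph: sqrt(1 + z_x^2 + z_y^2) dx dy.
            patch_area += math.sqrt(1.0 + (a * xc) ** 2 + (b * yc) ** 2) * step * step
            # Image of the cell on the unit sphere, approximated by the
            # parallelogram spanned by differences of neighbouring normals.
            n00 = unit_normal(a, b, x0, y0)
            du = sub(unit_normal(a, b, x0 + step, y0), n00)
            dv = sub(unit_normal(a, b, x0, y0 + step), n00)
            image_area += dot(unit_normal(a, b, xc, yc), cross(du, dv))
    return image_area / patch_area

print(area_ratio(2.0, 3.0, 0.01))   # elliptic point: close to a*b = 6
print(area_ratio(1.0, -1.0, 0.01))  # saddle point: close to a*b = -1
```

Using the signed area of the image, rather than its absolute value, is a design choice of this sketch: it lets the ratio come out negative at a saddle point, where the Gauss map reverses orientation.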

Cohomology

Differentiation of functions $f \mapsto df$ on a differentiable manifold $M$ generalizes to exterior differentiation $\alpha \mapsto d\alpha$ of differential forms. A form $\alpha$ is closed whenever it is in the kernel of $d$ and it is exact whenever it lies in the range of $d$. Since $d^2 = 0$, exact forms are closed.

Cartan's structure equations $d\omega = -(1/2)[\omega, \omega] + \Omega$ relate the exterior differential of the connection 1-form $\omega$ on a principal bundle to its curvature $\Omega$ given by the exterior covariant derivative $D\omega := d\omega \circ h$, where $h : T_p P \to H_p P$ is the projection onto the horizontal space.

On a complex manifold, forms split into sums of $(p, q)$-forms, those with $p$-holomorphic and $q$-antiholomorphic components, and exterior differentiation splits as $d = \partial + \bar\partial$ into holomorphic and antiholomorphic derivatives, with $\partial^2 = \bar\partial^2 = 0$.

Geometric data are often expressed in terms of closedness conditions on certain differential forms. For example, a symplectic manifold is a manifold $M$ equipped with a closed nondegenerate differential 2-form called the symplectic form. The theory of $J$-holomorphic curves on a manifold equipped with an almost-complex structure $J$ has proved fruitful in building invariants on symplectic manifolds. A Kähler manifold is a complex manifold equipped with a Hermitian metric $h$ whose imaginary part $\mathrm{Im}\, h$ yields a closed $(1, 1)$-form. The complex projective space $\mathbb{CP}^n$ is Kähler.

The exterior differentiation $d$ gives rise to de Rham cohomology as $\mathrm{Ker}\, d/\mathrm{Im}\, d$, and de Rham's theorem establishes an isomorphism between de Rham cohomology and the real singular cohomology of a manifold. Chern (or characteristic) classes are topological invariants associated to fiber bundles and play a crucial role in index theory. Chern–Weil theory builds representatives of these de Rham cohomology classes from a connection $\nabla$ of the form $\mathrm{tr}(f(\nabla^2))$, where $f$ is some analytic function.

When the manifold is Riemannian, the Laplace–Beltrami operator on functions generalizes to differential forms in two different ways, namely to the Bochner Laplacian $\Delta^{T^*M}$ on forms (i.e., sections of $T^*M$), where the cotangent bundle $T^*M$ is equipped with a connection induced by the Levi-Civita connection, and to the Laplace–Beltrami operator on forms $(d + d^*)^2 = d^* d + d\, d^*$, where $d^*$ is the (formal) adjoint of the exterior differential $d$. These are related via Weitzenböck's formula which in the particular case of 1-forms states that the difference of those two operators is measured by the Ricci curvature.

When the manifold is compact, Hodge's theorem asserts that the de Rham cohomology groups are isomorphic to the space of harmonic (i.e., annihilated by the Laplace–Beltrami operator) differential forms. Thus, the dimension of the set of harmonic $k$-forms equals the $k$th Betti number, from which one can define the Euler characteristic $\chi(M)$ of the manifold $M$ by taking their alternating sum. Hodge theory plays an important role in mirror symmetry which posits a duality between different manifolds on the geometric side and between different field theories via their correlation functions on the physics side. Calabi–Yau manifolds, which are Ricci-flat Kähler manifolds, are studied extensively in the context of duality.

Index Theory

While the Gaussian curvature is the solution to a local problem, it has strong influence on the global topology of a surface. The Gauss–Bonnet formula (1850) relates the Euler characteristic on a closed surface to the Gaussian curvature by

$$\chi(M) = \frac{1}{2\pi} \int_M K_m \, dA_m$$

where $dA_m$ is the volume element on $M$. This is the first result relating curvature to global properties and can be seen as one of the starting points for index theory. It generalizes to the Chern–Gauss–Bonnet theorem (1944) on an even-dimensional closed manifold and can be interpreted as an example of the Atiyah–Singer index theorem (1963)

$$\mathrm{ind}(D_g^W) = \int_M \hat{A}(\Omega_g) \wedge e^{\mathrm{tr}\,\Omega^W}$$

where $g$ denotes a Riemannian metric on a spin manifold $M$, $D_g^W$ a Dirac operator acting on sections of some twisted bundle $S \otimes W$ with $S$ the spinor bundle on $M$ and $W$ an auxiliary vector bundle over $M$, $\mathrm{ind}(D_g^W)$ the index of the Dirac operator, $\Omega_g$ and $\Omega^W$ respectively the curvatures of the Levi-Civita connection and a connection on $W$, and $\hat{A}(\Omega_g)$ a particular Chern form called the $\hat{A}$-genus. Index theorems are useful to compute anomalies in gauge theories arising from functional quantisation of classical actions.

Given an even-dimensional closed spin manifold $(M, g)$ and a Hermitian vector bundle $W$ over $M$, the index of the associated Dirac operator $D_g^W$ yields the so-called Atiyah map $K^0(M) \to \mathbb{Z}$ defined by $W \mapsto \mathrm{ind}(D_g^W)$, where $K^0(M)$ is the group of formal differences of stable homotopy classes of smooth vector bundles over $M$. This is the starting point for the noncommutative geometry approach to index theory, in which the space of smooth functions on a

manifold which arises here in a disguised form since $K^0(M) \simeq K_0(C^\infty(M))$ (which consists of formal differences of smooth homotopy classes of idempotents in the inductive limit of spaces of matrices $gl_n(C^\infty(M))$) is generalized to any noncommutative smooth algebra.
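
The Gauss–Bonnet formula stated in the Index Theory section can be illustrated numerically. The sketch below is an illustration under stated assumptions rather than anything taken from the article: it uses the standard closed-form curvatures and area elements of a round sphere and of a torus of revolution, a plain midpoint rule, and hypothetical names (`euler_characteristic`, `chi_sphere`, `chi_torus`).

```python
import math

def euler_characteristic(curvature, area_density, u_range, v_range, n=200):
    """Midpoint-rule estimate of (1 / 2 pi) * integral of K dA over a
    surface parametrized by (u, v) in u_range x v_range."""
    (u0, u1), (v0, v1) = u_range, v_range
    du = (u1 - u0) / n
    dv = (v1 - v0) / n
    total = 0.0
    for i in range(n):
        u = u0 + (i + 0.5) * du
        for j in range(n):
            v = v0 + (j + 0.5) * dv
            total += curvature(u, v) * area_density(u, v) * du * dv
    return total / (2.0 * math.pi)

# Sphere of radius a (polar angle u, azimuth v): K = 1/a^2, dA = a^2 sin(u) du dv.
a = 1.7
chi_sphere = euler_characteristic(
    lambda u, v: 1.0 / a ** 2,
    lambda u, v: a ** 2 * math.sin(u),
    (0.0, math.pi), (0.0, 2.0 * math.pi))

# Torus with radii R > r (tube angle u, azimuth v):
# K = cos(u) / (r (R + r cos(u))), dA = r (R + r cos(u)) du dv.
R, r = 3.0, 1.0
chi_torus = euler_characteristic(
    lambda u, v: math.cos(u) / (r * (R + r * math.cos(u))),
    lambda u, v: r * (R + r * math.cos(u)),
    (0.0, 2.0 * math.pi), (0.0, 2.0 * math.pi))

print(chi_sphere)  # close to 2
print(chi_torus)   # close to 0
```

The sphere result does not depend on the radius, and on the torus the positive curvature of the outer half cancels the negative curvature of the inner half, which is the global statement the formula encodes.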

Introductory Article: Electromagnetism


N M J Woodhouse, University of Oxford, Oxford, UK
© 2006 Springer-Verlag. Published by Elsevier Ltd. All rights reserved.

This article is adapted from Chapters 2 and 3 of Special Relativity, N M J Woodhouse, Springer-Verlag, 2002, by kind permission of the publisher.

Introduction

The modern theory of electromagnetism is built on the foundations of Maxwell's equations:

$$\mathrm{div}\, \mathbf{E} = \frac{\rho}{\epsilon_0} \qquad [1]$$
$$\mathrm{div}\, \mathbf{B} = 0 \qquad [2]$$
$$\mathrm{curl}\, \mathbf{B} - \frac{1}{c^2} \frac{\partial \mathbf{E}}{\partial t} = \mu_0 \mathbf{J} \qquad [3]$$
$$\mathrm{curl}\, \mathbf{E} + \frac{\partial \mathbf{B}}{\partial t} = 0 \qquad [4]$$

On the left-hand side are the electric and magnetic fields, $\mathbf{E}$ and $\mathbf{B}$, which are vector-valued functions of position and time. On the right are the sources, the charge density $\rho$, which is a scalar function of position and time, and the current density $\mathbf{J}$. The source terms encode the distribution and velocities of charges, and the equations, together with boundary conditions at infinity, determine the fields that they generate. From these equations, one can derive the familiar predictions of electrostatics and magnetostatics, as well as the dynamical behavior of fields and charges, in particular, the generation and propagation of electromagnetic waves (light waves).

Maxwell would not have recognized the equations in this compact vector notation, still less in the tensorial form that they take in special relativity. It is notable that although his contribution is universally acknowledged in the naming of the equations, it is rare to see references to "Maxwell's theory." This is for a good reason. In his early studies of electromagnetism, Maxwell worked with elaborate mechanical models, which he saw as analogies rather than as literal descriptions of the underlying physical reality. In his later work, the mechanical models, in particular the mechanical properties of the "luminiferous ether" through which light waves propagate, were put forward more literally as the foundations of his electromagnetic theory. The equations survive in the modern theory, but the mechanical models with which Maxwell, Faraday, and others wrestled live on only in the survival of archaic terminology, such as "lines of force" and "magnetic flux." The luminiferous ether evaporated with the advent of special relativity.

Maxwell's legacy is not his "theory," but his equations: a consistent system of partial differential equations that describe the whole range of known interactions of electric and magnetic fields with

moving charges. They unify the treatment of electricity and magnetism by revealing for the first time the full duality between the electric and magnetic fields. They have been verified over an almost unimaginable variety of physical processes, from the propagation of light over cosmological distances, through the behavior of the magnetic fields of stars and the everyday applications in electrical engineering and laboratory experiments, down (in their quantum version) to the exchange of photons between individual electrons.

The history of Maxwell's equations is convoluted, with many false turns. Maxwell himself wrote down an inconsistent form of the equations, with a different sign for $\rho$ in the first equation, in his 1865 work "A dynamical theory of the electromagnetic field." The consistent form appeared later in his Treatise on Electricity and Magnetism (1873); see Chalmers (1975).

In this article, we shall not follow the historical route to the equations. Some of the complex story of the development hinted at in the remarks above can be found in the articles by Chalmers (1975), Siegel (1985), and Roche (1998). Neither shall we follow the traditional pedagogic route of many textbooks in building up to the full dynamical equations through the study of basic electrical and magnetic phenomena. Instead, we shall follow a path to Maxwell's equations that is informed by knowledge of their most critical feature, invariance under Lorentz transformations. Maxwell, of course, knew nothing of this.

We shall start with a summary of basic facts about the behavior of charges in electric and magnetic fields, and then establish the full dynamical framework by considering this behavior as seen from moving frames of reference. It is impossible, of course, to do this consistently within the framework of classical ideas of space and time since Maxwell's equations are inconsistent with Galilean relativity. But it is at least possible to understand some of the key features of the equations, in particular the need for the term involving the time derivative of $\mathbf{E}$, the so-called "displacement current," in the third of Maxwell's equations.

We shall begin with some remarks concerning the role of relativity in classical dynamics.

Relativity in Newtonian Dynamics

Newton's laws hold in all inertial frames. The formalism of classical mechanics is invariant under Galilean transformations and it is impossible to tell by observing the dynamical behavior of particles and other bodies whether a frame of reference is at rest or in uniform motion. In the world of classical mechanics, therefore:

Principle of Relativity. There is no absolute standard of rest; only relative motion is observable.

In his Dialogue concerning the two chief world systems, Galileo illustrated the principle by arguing that the uniform motion of a ship on a calm sea does not affect the behavior of fish, butterflies, and other moving objects, as observed in a cabin below deck.

Relativity theory takes the principle as fundamental, as a statement about the nature of space and time as much as about the properties of the Newtonian equations of motion. But if it is to be given such universal significance, then it must apply to all of physics, and not just to Newtonian dynamics. At first this seems unproblematic: it is hard to imagine that it holds at such a basic level, but not for more complex physical interactions. Nonetheless, deep problems emerge when we try to extend it to electromagnetism since Galilean invariance conflicts with Maxwell's equations.

All appears straightforward for systems involving slow-moving charges and slowly varying electric and magnetic fields. These are governed by laws that appear to be invariant under transformations between uniformly moving frames of reference. One can imagine a modern version of Galileo's ship also carrying some magnets, batteries, semiconductors, and other electrical components. Salviati's argument for relativity would seem just as compelling.

The problem arises when we include rapidly varying fields, in particular, when we consider the propagation of light. As Einstein (1905) put it, "Maxwell's electrodynamics . . . , when applied to moving bodies, leads to asymmetries which do not appear to be inherent in the phenomena." The central difficulty is that Maxwell's equations give light, along with other electromagnetic waves, a definite velocity: in empty space, it travels with the same speed in every direction, independently of the motion of the source, a fact that is incompatible with Galilean invariance. Light traveling with speed $c$ in one frame should have speed $c + u$ in a frame moving towards the source of the light with speed $u$. Thus, it should be possible for light to travel with any speed. Light that travels with speed $c$ in a frame in which its source is at rest should have some other speed in a moving frame; so Galilean invariance would imply dependence of the velocity of light on the motion of the source.

A full resolution of the conflict can only be achieved within the special theory of relativity: here, remarkably, Maxwell's equations retain exactly

their classical form, but the transformations between the space and time coordinates of frames of reference in relative motion do not. The difference appears when the velocities involved are not insignificant when compared with the velocity of light. So long as one can ignore terms of order $u^2/c^2$, Maxwell's equations are compatible with the Galilean principle of relativity.

Charges, Fields, and the Lorentz-Force Law

The basic objects in the modern form of electromagnetic theory are

• charged particles; and
• the electric and magnetic fields $\mathbf{E}$ and $\mathbf{B}$, which are vector quantities that depend on position and time.

The charge $e$ of a particle, which can be positive or negative, is an intrinsic quantity analogous to gravitational mass. It determines the strength of the particle's interaction with the electric and magnetic fields, as its mass determines the strength of its interaction with gravitational fields.

The interaction is in two directions. First, electric and magnetic fields exert a force on a charged particle which depends on the value of the charge, the particle's velocity, and the values of $\mathbf{E}$ and $\mathbf{B}$ at the location of the particle. The force is given by the Lorentz-force law

$$\mathbf{f} = e(\mathbf{E} + \mathbf{u} \wedge \mathbf{B}) \qquad [5]$$

in which $e$ is the charge and $\mathbf{u}$ is the velocity. It is analogous to the gravitational force

$$\mathbf{f} = m\mathbf{g} \qquad [6]$$

on a particle of mass $m$ in a gravitational field $\mathbf{g}$. It is through the force law that an observer can, in principle, measure the electric and magnetic fields at a point, by measuring the force on a standard charge moving with known velocity.

Second, moving charges generate electric and magnetic fields. We shall not yet consider in detail the way in which they do this, beyond stating the following basic principles.

EM1. The fields depend linearly on the charges.

This means that if we superimpose two distributions of charge, then the resultant $\mathbf{E}$ and $\mathbf{B}$ fields are the sums of the respective fields that the two distributions generate separately.

EM2. A stationary point charge $e$ generates an electric field, but no magnetic field. The electric field is given by

$$\mathbf{E} = \frac{k e \mathbf{r}}{r^3} \qquad [7]$$

where $\mathbf{r}$ is the position vector from the charge, $r = |\mathbf{r}|$, and $k$ is a positive constant, analogous to the gravitational constant.

By combining [7] and [5], we obtain an inverse-square law electrostatic force

$$\frac{k e e'}{r^2} \qquad [8]$$

between two stationary charges; unlike gravity, it is repulsive when the charges have the same sign.

EM3. A point charge moving with velocity $\mathbf{v}$ generates a magnetic field

$$\mathbf{B} = \frac{k' e \mathbf{v} \wedge \mathbf{r}}{r^3} \qquad [9]$$

where $k'$ is a second positive constant.

This is extrapolated from measurements of the magnetic field generated by currents flowing in electrical circuits.

The constants $k$ and $k'$ in EM2 and EM3 determine the strengths of electric and magnetic interactions. They are usually denoted by

$$k = \frac{1}{4\pi\epsilon_0}, \qquad k' = \frac{\mu_0}{4\pi} \qquad [10]$$

Charge $e$ is measured in coulombs, $|\mathbf{B}|$ in teslas, and $|\mathbf{E}|$ in volts per meter. With other quantities in SI units,

$$\epsilon_0 = 8.9 \times 10^{-12}, \qquad \mu_0 = 1.3 \times 10^{-6} \qquad [11]$$

The charge of an electron is $-1.6 \times 10^{-19}$ C; the current through an electric fire is a flow of 5–10 C s$^{-1}$. The earth's magnetic field is about $4 \times 10^{-5}$ T; a bar magnet's is about 1 T; there is a field of about 50 T on the second floor of the Clarendon Laboratory in Oxford; and the magnetic field on the surface of a neutron star is about $10^8$ T.

Although we are more aware of gravity in everyday life, it is very much weaker than the electrostatic force: the electrostatic repulsion between two protons is a factor of $1.2 \times 10^{36}$ greater than their gravitational attraction (at any separation, both forces obey the inverse-square law).

Our aim is to pass from EM1–EM3 to Maxwell's equations, by replacing [7] and [9] by partial differential equations that relate the field strengths to the charge and current densities $\rho$ and $\mathbf{J}$ of a

continuous distribution of charge. The densities are defined as the limits

$$\rho = \lim_{V \to 0} \left(\frac{\sum e}{V}\right), \qquad \mathbf{J} = \lim_{V \to 0} \left(\frac{\sum e \mathbf{v}}{V}\right) \qquad [12]$$

where $V$ is a small volume containing the point, $e$ is a charge within the volume, and $\mathbf{v}$ is its velocity; the sums are over the charges in $V$ and the limits are taken as the volume is shrunk (although we shall not worry too much about the precise details of the limiting process).

Stationary Distributions of Charge

We begin the task of converting the basic principles into partial differential equations by looking at the electric field of a stationary distribution of charge, where the passage to the continuous limit is made by using the Gauss theorem to restate the inverse-square law.

The Gauss theorem relates the integral of the electric field over a closed surface to the total charge contained within it. For a point charge, the electric field is given by EM2:

$$\mathbf{E} = \frac{e \mathbf{r}}{4\pi\epsilon_0 r^3}$$

Since $\mathrm{div}\, \mathbf{r} = 3$ and $\mathrm{grad}\, r = \mathbf{r}/r$, we have

$$\mathrm{div}\, \mathbf{E} = \mathrm{div}\left(\frac{e \mathbf{r}}{4\pi\epsilon_0 r^3}\right) = \frac{e}{4\pi\epsilon_0}\left(\frac{3}{r^3} - \frac{3\, \mathbf{r} \cdot \mathbf{r}}{r^5}\right) = 0$$

everywhere except at $r = 0$. Therefore, by the divergence theorem,

$$\int_{\partial V} \mathbf{E} \cdot d\mathbf{S} = 0 \qquad [13]$$

for any closed surface $\partial V$ bounding a volume $V$ that does not contain the charge.

What if the volume does contain the charge? Consider the region bounded by the sphere $S_R$ of radius $R$ centered on the charge; $S_R$ has outward unit normal $\mathbf{r}/r$. Therefore,

$$\int_{S_R} \mathbf{E} \cdot d\mathbf{S} = \frac{e}{4\pi R^2 \epsilon_0} \int_{S_R} dS = \frac{e}{\epsilon_0}$$

In particular, the value of the surface integral on the left-hand side does not depend on $R$.

Now consider an arbitrary finite volume bounded by a closed surface $S$. If the charge is not inside the volume, then the integral of $\mathbf{E}$ over $S$ vanishes by [13]. If it is, then we can apply [13] to the volume $V$ between $S$ and a small sphere $S_R$ to deduce that

$$\int_S \mathbf{E} \cdot d\mathbf{S} - \int_{S_R} \mathbf{E} \cdot d\mathbf{S} = \int_{\partial V} \mathbf{E} \cdot d\mathbf{S} = 0$$

and that the integrals of $\mathbf{E}$ over $S$ and $S_R$ are the same. Therefore,

$$\int_S \mathbf{E} \cdot d\mathbf{S} = \begin{cases} e/\epsilon_0 & \text{if the charge is in the volume bounded by } S \\ 0 & \text{otherwise} \end{cases}$$

When we sum over a distribution of charges, the integral on the left picks out the total charge within $S$. Therefore, we have the Gauss theorem.

The Gauss theorem. For any closed surface $\partial V$ bounding a volume $V$,

$$\int_{\partial V} \mathbf{E} \cdot d\mathbf{S} = Q/\epsilon_0$$

where $\mathbf{E}$ is the total electric field and $Q$ is the total charge within $V$.

Now we can pass to the continuous limit. Suppose that $\mathbf{E}$ is generated by a distribution of charges with density $\rho$ (charge per unit volume). Then by the Gauss theorem,

$$\int_{\partial V} \mathbf{E} \cdot d\mathbf{S} = \frac{1}{\epsilon_0} \int_V \rho \, dV$$

for any volume $V$. But then, by the divergence theorem,

$$\int_V (\mathrm{div}\, \mathbf{E} - \rho/\epsilon_0) \, dV = 0$$

Since this holds for any volume $V$, it follows that

$$\mathrm{div}\, \mathbf{E} = \rho/\epsilon_0 \qquad [14]$$

By an argument in a similar spirit, we can also show that the electric field of a stationary distribution of charge is conservative in the sense that the total work done by the field when a charge is moved around a closed loop vanishes; that is,

$$\oint \mathbf{E} \cdot d\mathbf{s} = 0$$

for any closed path. This is equivalent to

$$\mathrm{curl}\, \mathbf{E} = 0 \qquad [15]$$

since, by Stokes' theorem,

$$\oint \mathbf{E} \cdot d\mathbf{s} = \int_S \mathrm{curl}\, \mathbf{E} \cdot d\mathbf{S}$$

where $S$ is any surface spanning the path. This vanishes for every path and for every $S$ if and only if [15] holds.

The field of a single stationary charge is conservative since

$$\mathbf{E} = -\mathrm{grad}\, \phi, \qquad \phi = \frac{e}{4\pi\epsilon_0 r}$$

and therefore $\mathrm{curl}\, \mathbf{E} = 0$ since the curl of a gradient vanishes identically. For a continuous distribution, $\mathbf{E} = -\mathrm{grad}\, \phi$, where

$$\phi(\mathbf{r}) = \frac{1}{4\pi\epsilon_0} \int_{\mathbf{r}' \in V} \frac{\rho(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|} \, dV' \qquad [16]$$

In the integral, $\mathbf{r}$ (the position of the point at which $\phi$ is evaluated) is fixed, and the integration is over the positions $\mathbf{r}'$ of the individual charges. In spite of the singularity at $\mathbf{r} = \mathbf{r}'$, the integral is well defined. So, [15] also holds for a continuous distribution of stationary charge.

The Divergence of the Magnetic Field

We can apply the same argument that established the Gauss theorem to the magnetic field of a slow-moving charge. Here,

$$\mathbf{B} = \frac{\mu_0 e \mathbf{v} \wedge \mathbf{r}}{4\pi r^3}$$

where $\mathbf{r}$ is the vector from the charge to the point at which the field is measured. Since $\mathbf{r}/r^3 = -\mathrm{grad}(1/r)$, we have

$$\mathrm{div}\left(\mathbf{v} \wedge \frac{\mathbf{r}}{r^3}\right) = \mathbf{v} \cdot \mathrm{curl}\left(\mathrm{grad}\, \frac{1}{r}\right) = 0$$

Therefore, $\mathrm{div}\, \mathbf{B} = 0$ except at $r = 0$, as in the case of the electric field. However, in the magnetic case, the integral of the field over a surface surrounding the charge also vanishes, since if $S_R$ is a sphere of radius $R$ centered on the charge, then

$$\int_{S_R} \mathbf{B} \cdot d\mathbf{S} = \frac{\mu_0 e}{4\pi} \int_{S_R} \frac{\mathbf{v} \wedge \mathbf{r}}{r^3} \cdot \frac{\mathbf{r}}{r} \, dS = 0$$

By the divergence theorem, the same is true for any surface surrounding the charge. We deduce that if magnetic fields are generated only by moving charges, then

$$\int_{\partial V} \mathbf{B} \cdot d\mathbf{S} = 0$$

for any volume $V$, and hence that

$$\mathrm{div}\, \mathbf{B} = 0 \qquad [17]$$

Of course, if there were free magnetic poles generating magnetic fields in the same way that charges generate electric fields, then this would not hold; there would be a "magnetic pole density" on the right-hand side, by analogy with the charge density in [14].

Inconsistency with Galilean Relativity

Our central concern is the compatibility of the laws of electromagnetism with the principle of relativity. As Einstein observed, simple electromagnetic interactions do indeed depend only on relative motion; the current induced in a conductor moving through the field of a magnet is the same as that generated in a stationary conductor when a magnet is moved past it with the same relative velocity (Einstein 1905). Unfortunately, this symmetry is not reflected in our basic principles. We very quickly come up against contradictions if we assume that they hold in every inertial frame of reference.

One emerges as follows. An observer $O$ can measure the values of $\mathbf{B}$ and $\mathbf{E}$ at a point by measuring the force on a particle of standard charge, which is related to the velocity $\mathbf{v}$ of the charge by the Lorentz-force law,

$$\mathbf{f} = e(\mathbf{E} + \mathbf{v} \wedge \mathbf{B})$$

A second observer $O'$ moving relative to the first with velocity $\mathbf{v}$ will see the same force, but now acting on a particle at rest. He will therefore measure the electric field to be $\mathbf{E}' = \mathbf{f}/e$. We conclude that an observer moving with velocity $\mathbf{v}$ through a magnetic field $\mathbf{B}$ and an electric field $\mathbf{E}$ should see an electric field

$$\mathbf{E}' = \mathbf{E} + \mathbf{v} \wedge \mathbf{B} \qquad [18]$$

By interchanging the roles of the two observers, we should also have

$$\mathbf{E} = \mathbf{E}' - \mathbf{v} \wedge \mathbf{B}' \qquad [19]$$

where $\mathbf{B}'$ is the magnetic field measured by the second observer. If both are to hold, then $\mathbf{B} - \mathbf{B}'$ must be a scalar multiple of $\mathbf{v}$.

But this is incompatible with EM3; if the fields are those of a point charge at rest relative to the first observer, then $\mathbf{E}$ is given by [7], and

$$\mathbf{B} = 0$$

On the other hand, the second observer sees the field of a point charge moving with velocity $-\mathbf{v}$. Therefore,

$$\mathbf{B}' = -\frac{\mu_0 e \mathbf{v} \wedge \mathbf{r}}{4\pi r^3}$$

So $\mathbf{B} - \mathbf{B}'$ is orthogonal to $\mathbf{v}$, not parallel to it.

This conspicuous paradox is resolved, in part, by the realization that EM3 is not exact; it holds only when the velocities are small enough for the magnetic force between two particles to be negligible in comparison with the electrostatic force. If $v$ is a typical velocity, then the condition is that $v^2 \mu_0$

should be much less than $1/\epsilon_0$. That is, the velocities involved should be much less than

$$c = \frac{1}{\sqrt{\epsilon_0 \mu_0}} = 3 \times 10^8 \ \mathrm{m\ s^{-1}}$$

This, of course, is the velocity of light.

The Limits of Galilean Invariance

Our basic principles EM1–EM3 must now be seen to be approximations: they describe the interactions of particles and fields when the particles are moving relative to each other at speeds much less than that of light. To emphasize that we cannot expect, in particular, EM3 to hold for particles moving at speeds comparable with $c$, we must replace it by

EM3′. A charge moving with velocity $\mathbf{v}$, where $v \ll c$, generates a magnetic field

$$\mathbf{B} = \frac{\mu_0 e \mathbf{v} \wedge \mathbf{r}}{4\pi r^3} + O(v^2/c^2) \qquad [20]$$

The magnetic field of a system of charges in general motion satisfies

$$\mathrm{div}\, \mathbf{B} = 0 \qquad [21]$$

In the second part, we have retained [21] as a differential form of the statement that there are no free magnetic poles; the magnetic field is generated only by the motion of the charges. With this change, the theory is consistent with the principle of relativity, provided that we ignore terms of order $v^2/c^2$. The substitution of EM3′ for EM3 resolves the conspicuous paradox; the symmetry noted by Einstein between the current generated by the motion of the conductor in a magnetic field and by the motion of a magnet past a conductor is explained, provided that the velocities are much less than that of light.

The central problem remains, however; the equations of electromagnetism are not invariant under a Galilean transformation with velocity comparable to $c$. The paradox is still there, but it is more subtle than it appeared to be at first. There are three possible ways out: (1) the noninvariance is real and has observable effects (necessarily of order $v^2/c^2$ or smaller); (2) Maxwell's theory is wrong; or (3) the Galilean transformation is wrong. Disconcertingly, it is the last path that physics has taken. But that is to jump ahead in the story. Our task is to complete the derivation of Maxwell's equations.

Faraday's Law of Induction

The magnetic field of a slow-moving charge will always be small in relation to its electric field (even when we replace $\mathbf{B}$ by $c\mathbf{B}$ to put it into the same units as $\mathbf{E}$). The magnetic fields generated by currents in electrical circuits are not, however, dominated by large electric fields. This is because the currents are created by the flow, at slow velocity, of electrons, while overall the matter in the wire is roughly electrically neutral, with the electric fields of the positively charged nuclei and negatively charged electrons canceling.

This is the physical context to keep in mind in the following deduction of Faraday's law of induction from Galilean invariance for velocities much less than $c$. The law relates the electromotive force or "voltage" around an electrical circuit to the rate of change of the magnetic field $\mathbf{B}$ over a surface spanning the circuit. In its differential form, the law becomes one of Maxwell's equations.

Suppose first that the fields are generated by charges all moving relative to a given inertial frame of reference $R$ with the same velocity $\mathbf{v}$. Then in a second frame $R'$ moving relative to $R$ with velocity $\mathbf{v}$, there is a stationary distribution of charge. If the velocity is much less than that of light, then the electric field $\mathbf{E}'$ measured in $R'$ is related to the electric and magnetic fields $\mathbf{E}$ and $\mathbf{B}$ measured in $R$ by

$$\mathbf{E}' = \mathbf{E} + \mathbf{v} \wedge \mathbf{B}$$

Since the field measured in $R'$ is that of a stationary distribution of charge, we have

$$\mathrm{curl}\, \mathbf{E}' = 0$$

In $R$, the charges are all moving with velocity $\mathbf{v}$, so their configuration looks exactly the same from the point $\mathbf{r}$ at time $t$ as it does from the point $\mathbf{r} + \mathbf{v}\tau$ at time $t + \tau$. Therefore,

$$\mathbf{B}(\mathbf{r} + \mathbf{v}\tau, t + \tau) = \mathbf{B}(\mathbf{r}, t)$$
$$\mathbf{E}(\mathbf{r} + \mathbf{v}\tau, t + \tau) = \mathbf{E}(\mathbf{r}, t)$$

and hence by taking derivatives with respect to $\tau$ at $\tau = 0$,

$$\mathbf{v} \cdot \mathrm{grad}\, \mathbf{B} + \frac{\partial \mathbf{B}}{\partial t} = 0$$
$$\mathbf{v} \cdot \mathrm{grad}\, \mathbf{E} + \frac{\partial \mathbf{E}}{\partial t} = 0 \qquad [22]$$

So we must have

$$0 = \mathrm{curl}\, \mathbf{E}'$$
$$\phantom{0} = \mathrm{curl}\, \mathbf{E} + \mathrm{curl}(\mathbf{v} \wedge \mathbf{B})$$
$$\phantom{0} = \mathrm{curl}\, \mathbf{E} + \mathbf{v}\, \mathrm{div}\, \mathbf{B} - \mathbf{v} \cdot \mathrm{grad}\, \mathbf{B}$$
$$\phantom{0} = \mathrm{curl}\, \mathbf{E} + \frac{\partial \mathbf{B}}{\partial t} \qquad [23]$$
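since $\mathrm{div}\,\mathbf{B} = 0$. It follows that

$$\mathrm{curl}\,\mathbf{E} + \frac{\partial\mathbf{B}}{\partial t} = 0 \qquad [24]$$

Equation [24] is linear in $\mathbf{B}$ and $\mathbf{E}$; so by adding the magnetic and electric fields of different streams of charges moving relative to R with different velocities, we deduce that it holds generally for the electric and magnetic fields generated by moving charges.

Equation [24] encodes Faraday's law of electromagnetic induction, which describes how changing magnetic fields can generate currents. In the static case

$$\frac{\partial\mathbf{B}}{\partial t} = 0$$

and the equation reduces to $\mathrm{curl}\,\mathbf{E} = 0$, the condition that the electrostatic field should be conservative; that is, it should do no net work when a charge is moved around a closed loop. More generally, consider a wire loop in the shape of a closed curve $\gamma$. Let $S$ be a fixed surface spanning $\gamma$. Then we can deduce from eqn [24] that

$$\oint_\gamma \mathbf{E}\cdot d\mathbf{s} = \int_S \mathrm{curl}\,\mathbf{E}\cdot d\mathbf{S} = -\int_S \frac{\partial\mathbf{B}}{\partial t}\cdot d\mathbf{S} = -\frac{d}{dt}\int_S \mathbf{B}\cdot d\mathbf{S} \qquad [25]$$

If the magnetic field is varying, so that the integral of $\mathbf{B}$ over $S$ is not constant, then the integral of $\mathbf{E}$ around the loop will not be zero. There will be a nonzero electric field along the wire, which will exert a force on the electrons in the wire and cause a current to flow.

The quantity

$$\oint_\gamma \mathbf{E}\cdot d\mathbf{s}$$

which is measured in volts, is the work done by the electric field when a unit charge makes one circuit of the wire. It is called the electromotive force around the circuit. The integral of $\mathbf{B}$ over $S$ is the magnetic flux linking the circuit. The relationship [25] between electromotive force and rate of change of magnetic flux is Faraday's law.

The Field of Charges in Uniform Motion

We can extract another of Maxwell's equations from this argument. By EM3′, a single charge $e$ with velocity $\mathbf{v}$ generates an electric field $\mathbf{E}$ and a magnetic field

$$\mathbf{B} = \frac{\mu_0 e\,\mathbf{v}\wedge\mathbf{r}}{4\pi r^3} + O(v^2/c^2)$$

where $\mathbf{r}$ is the vector from the charge to the point at which the field is measured. In the frame of reference R′ in which the charge is at rest, its electric field is

$$\mathbf{E}' = \frac{e\,\mathbf{r}}{4\pi\epsilon_0 r^3}$$

In the frame in which it is moving with velocity $\mathbf{v}$, $\mathbf{E} = \mathbf{E}' + O(v/c)$. Therefore,

$$c\mathbf{B} = \frac{\mathbf{v}\wedge\mathbf{E}'}{c} + O\!\left(\frac{v^2}{c^2}\right) = \frac{\mathbf{v}\wedge\mathbf{E}}{c} + O\!\left(\frac{v^2}{c^2}\right)$$

By taking the curl of both sides, and dropping terms of order $v^2/c^2$,

$$\mathrm{curl}(c\mathbf{B}) = \mathrm{curl}\!\left(\frac{\mathbf{v}\wedge\mathbf{E}}{c}\right) = \frac{1}{c}\left(\mathbf{v}\,\mathrm{div}\,\mathbf{E} - \mathbf{v}\cdot\mathrm{grad}\,\mathbf{E}\right)$$

But

$$\mathrm{div}\,\mathbf{E} = \rho/\epsilon_0, \qquad \mathbf{v}\cdot\mathrm{grad}\,\mathbf{E} = -\frac{\partial\mathbf{E}}{\partial t}$$

by [22]. Therefore,

$$\mathrm{curl}(c\mathbf{B}) - \frac{1}{c}\frac{\partial\mathbf{E}}{\partial t} = \frac{1}{c\epsilon_0}\mathbf{J} = c\mu_0\mathbf{J}$$

where $\mathbf{J} = \rho\mathbf{v}$. By summing over the separate particle velocities, we conclude that

$$\mathrm{curl}\,\mathbf{B} - \frac{1}{c^2}\frac{\partial\mathbf{E}}{\partial t} = \mu_0\mathbf{J}$$

holds for an arbitrary distribution of charges, provided that their velocities are much less than that of light.

Maxwell's Equations

The basic principles, together with the assumption of Galilean invariance for velocities much less than that of light, have allowed us to deduce that the electric and magnetic fields generated by a continuous distribution of moving charges in otherwise empty space satisfy

$$\mathrm{div}\,\mathbf{E} = \frac{\rho}{\epsilon_0} \qquad [26]$$

$$\mathrm{div}\,\mathbf{B} = 0 \qquad [27]$$

$$\mathrm{curl}\,\mathbf{B} - \frac{1}{c^2}\frac{\partial\mathbf{E}}{\partial t} = \mu_0\mathbf{J} \qquad [28]$$

$$\mathrm{curl}\,\mathbf{E} + \frac{\partial\mathbf{B}}{\partial t} = 0 \qquad [29]$$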
where $\rho$ is the charge density, $\mathbf{J}$ is the current density, and $c^2 = 1/\epsilon_0\mu_0$. These are Maxwell's equations, the basis of modern electrodynamics. Together with the Lorentz-force law, they describe the dynamics of charges and electromagnetic fields. We have arrived at them by considering how basic electromagnetic processes appear in moving frames of reference, an unsatisfactory route because we have seen on the way that the principles on which we based the derivation are incompatible with Galilean invariance for velocities comparable with that of light. Maxwell derived them by analyzing an elaborate mechanical model of electric and magnetic fields as displacements in the luminiferous ether. That is also unsatisfactory because the model has long been abandoned. The reason that they are accepted today as the basis of theoretical and practical applications of electromagnetism has little to do with either argument. It is first that they are self-consistent, and second that they describe the behavior of real fields with unreasonable accuracy.

The Continuity Equation

It is not immediately obvious that the equations are self-consistent. Given $\rho$ and $\mathbf{J}$ as functions of the coordinates and time, Maxwell's equations are two scalar and two vector equations in the unknown components of $\mathbf{E}$ and $\mathbf{B}$. That is, a total of eight equations for six unknowns: more equations than unknowns. Therefore, it is possible that they are in fact inconsistent.

If we take the divergence of eqn [29], then we obtain

$$\frac{\partial}{\partial t}(\mathrm{div}\,\mathbf{B}) = 0$$

which is consistent with eqn [27]; so no problem arises here. However, by taking the divergence of eqn [28] and substituting from eqn [26], we get

$$0 = \mathrm{div}\,\mathrm{curl}\,\mathbf{B} = \frac{1}{c^2}\frac{\partial}{\partial t}(\mathrm{div}\,\mathbf{E}) + \mu_0\,\mathrm{div}\,\mathbf{J} = \mu_0\left(\frac{\partial\rho}{\partial t} + \mathrm{div}\,\mathbf{J}\right)$$

This gives a contradiction unless

$$\frac{\partial\rho}{\partial t} + \mathrm{div}\,\mathbf{J} = 0 \qquad [30]$$

So the choice of $\rho$ and $\mathbf{J}$ is not unconstrained; they must be related by the continuity equation [30]. This holds for physically reasonable distributions of charge; it is a differential form of the statement that charges are neither created nor destroyed.

Conservation of Charge

To see the connection between the continuity equation and charge conservation, let us look at the total charge within a fixed volume $V$ bounded by a surface $S$. If charge is conserved, then any increase or decrease in a short period of time must be exactly balanced by an inflow or outflow of charge across $S$.

Consider a small element $d\mathbf{S}$ of $S$ with outward unit normal $\mathbf{n}$ and consider all the particles that have a particular charge $e$ and a particular velocity $\mathbf{v}$ at time $t$. Suppose that there are $\sigma$ of these per unit volume ($\sigma$ is a function of position). Those that cross the surface element between $t$ and $t + \delta t$ are those that at time $t$ lie in the region of volume

$$|\mathbf{v}\cdot\mathbf{n}\,dS\,\delta t|$$

shown in Figure 1. They contribute $e\sigma\,\mathbf{v}\cdot d\mathbf{S}\,\delta t$ to the outflow of charge through the surface element. But the value of $\mathbf{J}$ at the surface element is the sum of $e\sigma\mathbf{v}$ over all possible values of $\mathbf{v}$ and $e$. By summing over $\mathbf{v}$, $e$, and the elements of the surface, therefore, and by passing to the limit of a continuous distribution, the total rate of outflow is

$$\int_S \mathbf{J}\cdot d\mathbf{S}$$

Charge conservation implies that the rate of outflow should be equal to the rate of decrease in the total charge within $V$. That is,

$$\frac{d}{dt}\int_V \rho\,dV + \int_S \mathbf{J}\cdot d\mathbf{S} = 0 \qquad [31]$$

By differentiating the first term under the integral sign and by applying the divergence theorem to the second integral,

$$\int_V \left(\frac{\partial\rho}{\partial t} + \mathrm{div}\,\mathbf{J}\right) dV = 0 \qquad [32]$$

If this is to hold for any choice of $V$, then $\rho$ and $\mathbf{J}$ must satisfy the continuity equation. Conversely, the continuity equation implies charge conservation.

Figure 1 The outflow through a surface element.
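The Displacement Current

The third of Maxwell's equations can be written as

$$\mathrm{curl}\,\mathbf{B} = \mu_0\left(\mathbf{J} + \epsilon_0\frac{\partial\mathbf{E}}{\partial t}\right) \qquad [33]$$

in which form it can be read as an equation for an unknown magnetic field $\mathbf{B}$ in terms of a known current distribution $\mathbf{J}$ and electric field $\mathbf{E}$. When $\mathbf{E}$ and $\mathbf{J}$ are independent of $t$, it reduces to

$$\mathrm{curl}\,\mathbf{B} = \mu_0\mathbf{J}$$

which determines the magnetic field of a steady current, in a way that was already familiar to Maxwell's contemporaries. But his second term on the right-hand side of [33] was new; it adds to $\mathbf{J}$ the so-called vacuum displacement current

$$\epsilon_0\frac{\partial\mathbf{E}}{\partial t}$$

The name comes from an analogy with the behavior of charges in an insulating material. Here no steady current can flow, but the distribution of charges within the material is distorted by an external electric field. When the field changes, the distortion also changes, and the result appears as a current, the displacement current, which flows during the period of change. Maxwell's central insight was that the same term should be present even in empty space. The consequence was profound; it allowed him to explain the propagation of light as an electromagnetic phenomenon.

The Source-Free Equations

In a region of empty space, away from the charges generating the electric and magnetic fields, we have $\rho = 0 = \mathbf{J}$, and Maxwell's equations reduce to

$$\mathrm{div}\,\mathbf{E} = 0 \qquad [34]$$

$$\mathrm{div}\,\mathbf{B} = 0 \qquad [35]$$

$$\mathrm{curl}\,\mathbf{B} - \frac{1}{c^2}\frac{\partial\mathbf{E}}{\partial t} = 0 \qquad [36]$$

$$\mathrm{curl}\,\mathbf{E} + \frac{\partial\mathbf{B}}{\partial t} = 0 \qquad [37]$$

where $c = 1/\sqrt{\epsilon_0\mu_0}$. By taking the curl of eqn [36] and by substituting from eqns [35] and [37], we obtain

$$0 = \mathrm{grad}(\mathrm{div}\,\mathbf{B}) - \nabla^2\mathbf{B} - \frac{1}{c^2}\mathrm{curl}\!\left(\frac{\partial\mathbf{E}}{\partial t}\right) = -\nabla^2\mathbf{B} - \frac{1}{c^2}\frac{\partial}{\partial t}(\mathrm{curl}\,\mathbf{E}) = -\nabla^2\mathbf{B} + \frac{1}{c^2}\frac{\partial^2\mathbf{B}}{\partial t^2} \qquad [38]$$

Therefore, the three components of $\mathbf{B}$ in empty space satisfy the (scalar) wave equation

$$\Box u = 0$$

Here $\Box$ is the d'Alembertian operator, defined by

$$\Box = \frac{1}{c^2}\frac{\partial^2}{\partial t^2} - \nabla^2 = \frac{1}{c^2}\frac{\partial^2}{\partial t^2} - \frac{\partial^2}{\partial x^2} - \frac{\partial^2}{\partial y^2} - \frac{\partial^2}{\partial z^2}$$

By taking the curl of eqn [37], we also obtain $\Box\mathbf{E} = 0$.

Monochromatic Plane Waves

The fact that $\mathbf{E}$ and $\mathbf{B}$ are vector-valued solutions of the wave equation in empty space suggests that we look for "plane wave" solutions of Maxwell's equations in which

$$\mathbf{E} = \mathbf{a}\cos\Omega + \mathbf{b}\sin\Omega \qquad [39]$$

where $\mathbf{a}$, $\mathbf{b}$ are constant vectors and

$$\Omega = \frac{\omega}{c}(ct - \mathbf{r}\cdot\mathbf{e}), \qquad \mathbf{e}\cdot\mathbf{e} = 1 \qquad [40]$$

with $\omega > 0$ and $\mathbf{e}$ constant; $\omega$ is the frequency and $\mathbf{e}$ is a unit vector that gives the direction of propagation (adding $\tau$ to $t$ and $c\tau\mathbf{e}$ to $\mathbf{r}$ leaves $\mathbf{E}$ unchanged). This satisfies the wave equation, but for a general choice of the constants, it will not be possible to find $\mathbf{B}$ such that eqns [34]–[37] also hold. By taking the divergence of eqn [39], we obtain

$$\mathrm{div}\,\mathbf{E} = \frac{\omega}{c}\left(\mathbf{e}\cdot\mathbf{a}\sin\Omega - \mathbf{e}\cdot\mathbf{b}\cos\Omega\right) \qquad [41]$$

For eqn [34] to hold, therefore, we must choose $\mathbf{a}$ and $\mathbf{b}$ orthogonal to $\mathbf{e}$. For eqn [37] to hold, we must find $\mathbf{B}$ such that

$$\mathrm{curl}\,\mathbf{E} = \frac{\omega}{c}\left(\mathbf{e}\wedge\mathbf{a}\sin\Omega - \mathbf{e}\wedge\mathbf{b}\cos\Omega\right) = -\frac{\partial\mathbf{B}}{\partial t} \qquad [42]$$

A possible choice is

$$\mathbf{B} = \frac{\mathbf{e}\wedge\mathbf{E}}{c} = \frac{1}{c}\left(\mathbf{e}\wedge\mathbf{a}\cos\Omega + \mathbf{e}\wedge\mathbf{b}\sin\Omega\right) \qquad [43]$$

and it is not hard to see that $\mathbf{E}$ and $\mathbf{B}$ then satisfy [35] and [36] as well.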
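The claim that the plane wave [39] with the magnetic field [43] satisfies all four source-free equations can be checked numerically. The sketch below is an illustration only: the wave speed, frequency, propagation direction, the amplitude vectors and the helper names are assumptions chosen for the example, with the amplitudes taken orthogonal to the propagation direction as the text requires. Derivatives are approximated by central differences.

```python
import math

C = 2.0                    # wave speed in arbitrary units (illustrative, not the SI value)
OMEGA = 3.0                # frequency omega
E_DIR = (0.0, 0.0, 1.0)    # unit vector e, the direction of propagation
A_VEC = (1.0, 0.5, 0.0)    # constant vector a, orthogonal to e
B_VEC = (-0.3, 2.0, 0.0)   # constant vector b, orthogonal to e

def cross(u, w):
    return (u[1] * w[2] - u[2] * w[1], u[2] * w[0] - u[0] * w[2], u[0] * w[1] - u[1] * w[0])

def phase(t, r):
    """Omega = (omega / c) (c t - r . e), as in eqn [40]."""
    return OMEGA / C * (C * t - sum(ri * ei for ri, ei in zip(r, E_DIR)))

def electric(t, r):
    """E = a cos(Omega) + b sin(Omega), eqn [39]."""
    p = phase(t, r)
    return tuple(a * math.cos(p) + b * math.sin(p) for a, b in zip(A_VEC, B_VEC))

def magnetic(t, r):
    """B = e ^ E / c, eqn [43]."""
    return tuple(x / C for x in cross(E_DIR, electric(t, r)))

def d_dt(field, t, r, h=1e-5):
    return tuple((p - m) / (2 * h) for p, m in zip(field(t + h, r), field(t - h, r)))

def d_dx(field, t, r, i, h=1e-5):
    hi, lo = list(r), list(r)
    hi[i] += h
    lo[i] -= h
    return tuple((p - m) / (2 * h) for p, m in zip(field(t, tuple(hi)), field(t, tuple(lo))))

def div(field, t, r):
    return sum(d_dx(field, t, r, i)[i] for i in range(3))

def curl(field, t, r):
    dx, dy, dz = (d_dx(field, t, r, i) for i in range(3))
    return (dy[2] - dz[1], dz[0] - dx[2], dx[1] - dy[0])

def maxwell_residuals(t, r):
    """Largest absolute residual of eqns [34]-[37] at the point (t, r)."""
    faraday = [c + d for c, d in zip(curl(electric, t, r), d_dt(magnetic, t, r))]
    ampere = [c - d / C**2 for c, d in zip(curl(magnetic, t, r), d_dt(electric, t, r))]
    return max(abs(x) for x in [div(electric, t, r), div(magnetic, t, r)] + faraday + ampere)

if __name__ == "__main__":
    print(maxwell_residuals(0.3, (0.1, -0.4, 0.7)))
```

Replacing `A_VEC` by a vector with a component along `E_DIR` makes the divergence residual of order ω/c, which is the failure of eqn [34] described in the text.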
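The solutions obtained in this way are called monochromatic electromagnetic plane waves. Note that such waves are transverse in the sense that $\mathbf{E}$ and $\mathbf{B}$ are orthogonal to the direction of propagation. The definition of $\mathbf{E}$ can be written more concisely in the form

$$\mathbf{E} = \mathrm{Re}\left((\mathbf{a} + i\mathbf{b})e^{-i\Omega}\right) \qquad [44]$$

It is an exercise in Fourier analysis to show that every solution in empty space is a combination of monochromatic plane waves. A plane wave has plane or linear polarization if $\mathbf{a}$ and $\mathbf{b}$ are proportional. It has circular polarization if $\mathbf{a}\cdot\mathbf{a} = \mathbf{b}\cdot\mathbf{b}$, $\mathbf{a}\cdot\mathbf{b} = 0$.

At the heart of Maxwell's theory was the idea that a light wave with definite frequency or color is represented by a monochromatic plane solution of his equations.

Potentials

For every solution of Maxwell's equations in vacuo, the components of $\mathbf{E}$ and $\mathbf{B}$ satisfy the three-dimensional wave equation; but the converse is not true. That is, it is not true in general that if

$$\Box\mathbf{B} = 0, \qquad \Box\mathbf{E} = 0$$

then $\mathbf{E}$ and $\mathbf{B}$ satisfy Maxwell's equations. For this to happen, the divergence of both fields must vanish, and they must be related by [36] and [37]. These additional constraints are somewhat simpler to handle if we work not with the fields themselves, but with auxiliary quantities called potentials.

The definition of the potentials depends on standard integrability conditions from vector calculus. Suppose that $\mathbf{v}$ is a vector field, which may depend on time. If $\mathrm{curl}\,\mathbf{v} = 0$, then there exists a function $\phi$ such that

$$\mathbf{v} = \mathrm{grad}\,\phi \qquad [45]$$

If $\mathrm{div}\,\mathbf{v} = 0$, then there exists a second vector field $\mathbf{a}$ such that

$$\mathbf{v} = \mathrm{curl}\,\mathbf{a} \qquad [46]$$

Neither $\phi$ nor $\mathbf{a}$ is uniquely determined by $\mathbf{v}$. In the first case, if [45] holds, then it also holds when $\phi$ is replaced by $\phi' = \phi + f$, where $f$ is a function of time alone; in the second, if [46] holds, then it also holds when $\mathbf{a}$ is replaced by

$$\mathbf{a}' = \mathbf{a} + \mathrm{grad}\,u$$

for any scalar function $u$ of position and time. It should be kept in mind that the existence statements are local. If $\mathbf{v}$ is defined on a region $U$ with nontrivial topology, then it may not be possible to find a suitable $\phi$ or $\mathbf{a}$ throughout the whole of $U$.

Suppose now that we are given fields $\mathbf{E}$ and $\mathbf{B}$ satisfying Maxwell's equations [26]–[29] with sources represented by the charge density $\rho$ and the current density $\mathbf{J}$. Since $\mathrm{div}\,\mathbf{B} = 0$, there exists a time-dependent vector field $\mathbf{A}(t, x, y, z)$ such that

$$\mathbf{B} = \mathrm{curl}\,\mathbf{A}$$

If we substitute $\mathbf{B} = \mathrm{curl}\,\mathbf{A}$ into [29] and interchange curl with the time derivative, then we obtain

$$\mathrm{curl}\left(\mathbf{E} + \frac{\partial\mathbf{A}}{\partial t}\right) = 0$$

It follows that there exists a scalar $\phi(t, x, y, z)$ such that

$$\mathbf{E} = -\mathrm{grad}\,\phi - \frac{\partial\mathbf{A}}{\partial t} \qquad [47]$$

Such a vector field $\mathbf{A}$ is called a magnetic vector potential; a function $\phi$ such that eqn [47] holds is called an electric scalar potential.

Conversely, given scalar and vector functions $\phi$ and $\mathbf{A}$ of $t, x, y, z$, we can define $\mathbf{B}$ and $\mathbf{E}$ by

$$\mathbf{B} = \mathrm{curl}\,\mathbf{A}, \qquad \mathbf{E} = -\mathrm{grad}\,\phi - \frac{\partial\mathbf{A}}{\partial t} \qquad [48]$$

Then two of Maxwell's equations hold automatically, since

$$\mathrm{div}\,\mathbf{B} = 0, \qquad \mathrm{curl}\,\mathbf{E} + \frac{\partial\mathbf{B}}{\partial t} = 0$$

The remaining pair translate into conditions on $\mathbf{A}$ and $\phi$. Equation [26] becomes

$$\mathrm{div}\,\mathbf{E} = -\nabla^2\phi - \frac{\partial}{\partial t}(\mathrm{div}\,\mathbf{A}) = \frac{\rho}{\epsilon_0}$$

and eqn [28] becomes

$$\mathrm{curl}\,\mathbf{B} - \frac{1}{c^2}\frac{\partial\mathbf{E}}{\partial t} = -\nabla^2\mathbf{A} + \mathrm{grad}\,\mathrm{div}\,\mathbf{A} + \frac{1}{c^2}\frac{\partial}{\partial t}\left(\mathrm{grad}\,\phi + \frac{\partial\mathbf{A}}{\partial t}\right) = \mu_0\mathbf{J}$$

If we put

$$\delta = \frac{1}{c^2}\frac{\partial\phi}{\partial t} + \mathrm{div}\,\mathbf{A}$$

then we can rewrite the equations for $\mathbf{A}$ and $\phi$ more simply as

$$\Box\phi - \frac{\partial\delta}{\partial t} = \frac{\rho}{\epsilon_0}, \qquad \Box\mathbf{A} + \mathrm{grad}\,\delta = \mu_0\mathbf{J}$$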
Here we have four equations (one scalar, one vector) in four unknowns ($\phi$ and the components of $\mathbf{A}$). Any set of solutions $\phi$, $\mathbf{A}$ determines a solution of Maxwell's equations via [48].

Gauge Transformations

Given solutions $\mathbf{E}$ and $\mathbf{B}$ of Maxwell's equations, what freedom is there in the choice of $\mathbf{A}$ and $\phi$? First, $\mathbf{A}$ is determined by $\mathrm{curl}\,\mathbf{A} = \mathbf{B}$ up to the replacement of $\mathbf{A}$ by

$$\mathbf{A}' = \mathbf{A} + \mathrm{grad}\,u$$

for some function $u$ of position and time. The scalar potential $\phi'$ corresponding to $\mathbf{A}'$ must be chosen so that

$$-\mathrm{grad}\,\phi' = \mathbf{E} + \frac{\partial\mathbf{A}'}{\partial t} = \mathbf{E} + \frac{\partial\mathbf{A}}{\partial t} + \mathrm{grad}\left(\frac{\partial u}{\partial t}\right) = -\mathrm{grad}\left(\phi - \frac{\partial u}{\partial t}\right)$$

That is, $\phi' = \phi - \partial u/\partial t + f(t)$, where $f$ is a function of $t$ alone. We can absorb $f$ into $u$ by subtracting

$$\int f\,dt$$

(this does not alter $\mathbf{A}'$). So the freedom in the choice of $\mathbf{A}$ and $\phi$ is to make the transformation

$$\mathbf{A} \mapsto \mathbf{A}' = \mathbf{A} + \mathrm{grad}\,u, \qquad \phi \mapsto \phi' = \phi - \frac{\partial u}{\partial t} \qquad [49]$$

for any $u = u(t, x, y, z)$. The transformation [49] is called a gauge transformation.

Under [49],

$$\delta \mapsto \delta' = \frac{1}{c^2}\frac{\partial\phi'}{\partial t} + \mathrm{div}(\mathbf{A}') = \delta - \Box u$$

It is possible to show, under certain very mild conditions on $\delta$, that the inhomogeneous wave equation

$$\Box u = \delta \qquad [50]$$

has a solution $u = u(t, x, y, z)$. If we choose $u$ so that [50] holds, then the transformed potentials $\mathbf{A}'$ and $\phi'$ satisfy

$$\mathrm{div}(\mathbf{A}') + \frac{1}{c^2}\frac{\partial\phi'}{\partial t} = 0$$

This is the Lorenz gauge condition, named after L Lorenz (not the H A Lorentz of the "Lorentz contraction").

If we impose the Lorenz condition, then the only remaining freedom in the choice of $\mathbf{A}$ and $\phi$ is to make gauge transformations [49] in which $u$ is a solution of the wave equation $\Box u = 0$. Under the Lorenz condition, Maxwell's equations take the form

$$\Box\phi = \rho/\epsilon_0, \qquad \Box\mathbf{A} = \mu_0\mathbf{J} \qquad [51]$$

Consistency with the Lorenz condition follows from the continuity equation on $\rho$ and $\mathbf{J}$.

In the absence of sources, therefore, Maxwell's equations for the potential in the Lorenz gauge reduce to

$$\Box\phi = 0, \qquad \Box\mathbf{A} = 0 \qquad [52]$$

together with the constraint

$$\mathrm{div}\,\mathbf{A} + \frac{1}{c^2}\frac{\partial\phi}{\partial t} = 0$$

We can, for example, choose three arbitrary solutions of the scalar wave equation for the components of the vector potential, and then define $\phi$ by

$$\phi = -c^2\int \mathrm{div}\,\mathbf{A}\,dt$$

Whatever choice we make, we shall get a solution of Maxwell's equations, and every solution of Maxwell's equations (without sources) will arise from some such choice.

Historical Note

At the end of the eighteenth century, four types of electromagnetic phenomena were known, but not the connections between them.

• Magnetism, the word derives from the Greek for "stone from Magnesia".
• Static electricity, produced by rubbing amber with fur; the word "electricity" derives from the Greek for "amber".
• Light.
• Galvanism or "animal electricity": the electricity produced by batteries, discovered by Luigi Galvani.

The construction of a unified theory was a slow and painful business. It was hindered by attempts, which seem bizarre in retrospect, to understand electromagnetism in terms of underlying mechanical models involving such inventions as "electric fluids" and "magnetic vortices". We can see the legacy of this period, which ended with Einstein's work in 1905, in the misleading and archaic terms that still survive in modern terminology: "magnetic flux", "lines of force", "electric displacement", and so on.
Maxwell's contribution was decisive, although much of what we now call "Maxwell's theory" is due to his successors (Lorentz, Hertz, Einstein, and so on); and, as we shall see, a key element in Maxwell's own description of electromagnetism, the electromagnetic ether, an all-pervasive medium which was supposed to transmit electromagnetic waves, was thrown out by Einstein.

A rough chronology is as follows.

• 1800 Volta demonstrated the connection between galvanism and static electricity.
• 1820 Oersted showed that the current from a battery generates a force on a magnet.
• 1822 Ampère suggested that light was a wave motion in a luminiferous ether made up of two types of electric fluid. In the same year, Galileo's "Dialogue concerning the two chief world systems" was removed from the index of prohibited books.
• 1831 Faraday showed that moving magnets can induce currents.
• 1846 Faraday suggested that light is a vibration in magnetic lines of force.
• 1863 Maxwell published the equations that describe the dynamics of electric and magnetic fields.
• 1905 Einstein's paper "On the electrodynamics of moving bodies".

Further Reading

Chalmers AF (1975) Maxwell and the displacement current. Physics Education January 1975: 45–49.
Einstein A (1905) On the Electrodynamics of Moving Bodies. A translation of the paper can be found in The Principle of Relativity by Lorentz HA, Einstein A, Minkowski H, and Weyl H, with notes by Sommerfeld A. New York: Dover, 1952.
Roche J (1998) The present status of Maxwell's displacement current. European Journal of Physics 19: 155–166.
Siegel DM (1985) Mechanical image and reality in Maxwell's electromagnetic theory. In: Harman PM (ed.) Wranglers and Physicists. Manchester: Manchester University Press.

Introductory Article: Equilibrium Statistical Mechanics


G Gallavotti, Università di Roma La Sapienza, Rome, Italy
© 2006 G Gallavotti. Published by Elsevier Ltd. All rights reserved.

Foundations: Atoms and Molecules

Classical statistical mechanics studies properties of macroscopic aggregates of particles, atoms, and molecules, based on the assumption that they are point masses subject to the laws of classical mechanics. Distinction between macroscopic and microscopic systems is evanescent and in fact the foundations of statistical mechanics have been laid on properties, proved or assumed, of few-particle systems.

Macroscopic systems are often considered in stationary states, which means that their microscopic configurations follow each other as time evolves while looking the same macroscopically. Observing time evolution is the same as sampling (not too closely time-wise) independent copies of the system prepared in the same way.

A basic distinction is necessary: a stationary state may or may not be in equilibrium. The first case arises when the particles are enclosed in a container $\Lambda$ and are subject only to their mutual conservative interactions and, possibly, to external conservative forces: a typical example is a gas in a container subject to forces due to the walls of $\Lambda$ and gravity, besides the internal interactions. This is a very restricted class of systems and states.

A more general case is when the system is in a stationary state but it is also subject to nonconservative forces: a typical example is a gas or fluid in which a wheel rotates, as in the Joule experiment, with some device acting to keep the temperature constant. The device is called a thermostat and in statistical mechanics it has to be modeled by forces, including nonconservative ones, which prevent an indefinite energy transfer from the external forcing to the system: such a transfer would impede the occurrence of stationary states. For instance, the thermostat could simply be a constant friction force (as in stirred incompressible liquids or as in electric wires in which current circulates because of an electromotive force).

A more fundamental approach would be to imagine that the thermostat device is not a phenomenologically introduced nonconservative force (e.g., a friction force) but is due to the interaction with an external infinite system which is in equilibrium at infinity.

In any event nonequilibrium stationary states are intrinsically more complex than equilibrium states. Here attention will be confined to equilibrium
statistical mechanics of systems of $N$ identical point particles $Q = (q_1, \ldots, q_N)$ enclosed in a cubic box $\Lambda$, with volume $V$ and side $L$, normally assumed to have perfectly reflecting walls.

Particles of mass $m$ located at $q, q'$ will be supposed to interact via a pair potential $\varphi(q - q')$. The microscopic motion follows the equations

$$m\ddot{q}_i = -\sum_{j=1}^{N}\partial_{q_i}\varphi(q_i - q_j) - \sum_i \partial_{q_i}W_{\mathrm{wall}}(q_i) \stackrel{\mathrm{def}}{=} -\partial_{q_i}\Phi(Q) \qquad [1]$$

where the potential $\varphi$ is assumed to be smooth except, possibly, for $|q - q'| \le r_0$ where it could be $+\infty$, that is, the particles cannot come closer than $r_0$, and at $r_0$ [1] is interpreted by imagining that they undergo elastic collisions; the potential $W_{\mathrm{wall}}$ models the container and it will be replaced, unless explicitly stated, by an elastic collision rule.

The time evolution $(Q, \dot{Q}) \to S_t(Q, \dot{Q})$ will, therefore, be described on the position–velocity space, $\widehat{\mathcal{F}}(N)$, of the $N$ particles or, more conveniently, on the phase space, i.e., by a time evolution $S_t$ on the momentum–position ($P, Q$, with $P = m\dot{Q}$) space, $\mathcal{F}(N)$. The motion being conservative, the energy

$$U \stackrel{\mathrm{def}}{=} \sum_i \frac{1}{2m}p_i^2 + \sum_{i<j}\varphi(q_i - q_j) + \sum_i W_{\mathrm{wall}}(q_i) \stackrel{\mathrm{def}}{=} K(P) + \Phi(Q)$$

will be a constant of motion; the last term in $\Phi$ is missing if walls are perfect. This makes it convenient to regard the dynamics as associated with two dynamical systems: $(\mathcal{F}(N), S_t)$ on the $6N$-dimensional phase space, and $(\mathcal{F}_U(N), S_t)$ on the $(6N - 1)$-dimensional surface of energy $U$. Since the dynamics [1] is Hamiltonian on phase space, with Hamiltonian

$$H(P, Q) \stackrel{\mathrm{def}}{=} \sum_i \frac{1}{2m}p_i^2 + \Phi(Q) \stackrel{\mathrm{def}}{=} K + \Phi$$

it follows that the volume $d^{3N}P\,d^{3N}Q$ is conserved (i.e., a region $E$ has the same volume as $S_tE$) and also the area $\delta(H(P, Q) - U)\,d^{3N}P\,d^{3N}Q$ is conserved.

The above dynamical systems are well defined, i.e., $S_t$ is a map on phase space globally defined for all $t \in (-\infty, \infty)$, when the interaction potential is bounded below: this is implied by the a priori bounds due to energy conservation. For gravitational or Coulomb interactions, much more has to be said, assumed, and done in order to even define the key quantities needed for a statistical theory of motion.

Although our world is three dimensional (or at least was so believed to be until recent revolutionary theories), it will be useful to consider also systems of particles in dimension $d \ne 3$: in this case the above $6N$ and $3N$ become, respectively, $2dN$ and $dN$. Systems with dimension $d = 1, 2$ are in fact sometimes very good models for thin filaments or thin films. For the same reason, it is often useful to imagine that space is discrete and particles can only be located on a lattice, for example, on $\mathbb{Z}^d$ (see the section "Lattice models").

The reader is referred to Gallavotti (1999) for more details.

Pressure, Temperature, and Kinetic Energy

The beginning was BERNOULLI's derivation of the perfect gas law via the identification of the pressure at numerical density $\rho$ with the average momentum transferred per unit time to a surface element of area $dS$ on the walls: that is, the average of the observable $2mv\,\rho v\,dS$, with $v$ the normal component of the velocity of the particles that undergo collisions with $dS$. If $f(v)\,dv$ is the distribution of the normal component of velocity and $f(\mathbf{v})\,d^3\mathbf{v} \equiv \prod_i f(v_i)\,d^3\mathbf{v}$, $\mathbf{v} = (v_1, v_2, v_3)$, is the total velocity distribution, the average of the momentum transferred is $p\,dS$ given by

$$dS\int_{v>0} 2mv^2\rho f(v)\,dv = dS\int mv^2\rho f(v)\,dv = \rho\,dS\,\frac{2}{3}\int\frac{m}{2}\mathbf{v}^2 f(\mathbf{v})\,d^3\mathbf{v} = \rho\,dS\,\frac{2}{3}\left\langle\frac{K}{N}\right\rangle \qquad [2]$$

Furthermore $(2/3)\langle K/N\rangle$ was identified as proportional to the absolute temperature, $\langle K/N\rangle \stackrel{\mathrm{def}}{=} \mathrm{const}\,(3/2)T$, which, with present-day notations, is written as $(2/3)\langle K/N\rangle = k_BT$. The constant $k_B$ was (later) called Boltzmann's constant and it is the same for at least all perfect gases. Its independence of the particular nature of the gas is a consequence of Avogadro's law stating that equal volumes of gases at the same conditions of temperature and pressure contain equal numbers of molecules.

Proportionality between average kinetic energy and temperature via the universal constant $k_B$ became in fact a fundamental assumption extending to all aggregates of particles, gaseous or not, never challenged in all later works (until quantum mechanics, where this is no longer true; see the section "Quantum statistics").

For more details, we refer the reader to Gallavotti (1999).
Heat and Entropy

After Clausius' discovery of entropy, BOLTZMANN, in order to explain it mechanically, introduced the heat theorem, which he developed to full generality between 1866 and 1884. Together with the mentioned identification of absolute temperature with average kinetic energy, the heat theorem can also be considered a founding element of statistical mechanics.

The theorem makes precise the notion of time average and then states in great generality that given any mechanical system one can associate with its dynamics four quantities $U, V, p, T$, defined as time averages of suitable mechanical observables (i.e., functions on phase space), so that when the external conditions are infinitesimally varied and the quantities $U, V$ change by $dU, dV$, respectively, the ratio $(dU + p\,dV)/T$ is exact, i.e., there is a function $S(U, V)$ whose corresponding variation equals the ratio. It will be better, for the purpose of considering very large boxes ($V \to \infty$), to write this relation in terms of intensive quantities $u \stackrel{\mathrm{def}}{=} U/N$ and $v = V/N$ as

$$\frac{du + p\,dv}{T} \quad \text{is exact} \qquad [3]$$

i.e., the ratio equals the variation $ds$ of $s(U/N, V/N) \equiv (1/N)S(U, V)$.

The proof originally dealt with monocyclic systems, i.e., systems in which all motions are periodic. The assumption is clearly much too restrictive and justification for it developed from the early "nonperiodic motions can be regarded as periodic with infinite period" (1866), to the later ergodic hypothesis and finally to the realization that, after all, the heat theorem does not really depend on the ergodic hypothesis (1884).

Although for a one-dimensional system the proof of the heat theorem is a simple check, it was a real breakthrough because it led to an answer to the general question as to under which conditions one could define mechanical quantities whose variations were constrained to satisfy [3] and therefore could be interpreted as a mechanical model of Clausius' macroscopic thermodynamics. It is reproduced in the following.

Consider a one-dimensional system subject to forces with a confining potential $\varphi(x)$ such that $|\varphi'(x)| > 0$ for $|x| > 0$, $\varphi''(0) > 0$ and $\varphi(x) \to +\infty$ as $x \to \infty$. All motions are periodic, so that the system is monocyclic. Suppose that the potential $\varphi(x)$ depends on a parameter $V$ and define a state to be a motion with given energy $U$ and given $V$; let

$U$ = total energy of the system $\equiv K + \varphi$
$T$ = time average of the kinetic energy $K \equiv \langle K\rangle$
$V$ = the parameter on which $\varphi$ is supposed to depend [4]
$p$ = $-$ time average of $\partial_V\varphi$, $-\langle\partial_V\varphi\rangle$

A state is thus parametrized by $U, V$. If such parameters change by $dU, dV$, respectively, and if $dL \stackrel{\mathrm{def}}{=} -p\,dV$, $dQ \stackrel{\mathrm{def}}{=} dU + p\,dV$, then [3] holds. In fact, let $x_\pm(U, V)$ be the extremes of the oscillations of the motion with given $U, V$ and define $S$ as

$$S = 2\log\int_{x_-(U,V)}^{x_+(U,V)}\sqrt{U - \varphi(x)}\,dx \;\Rightarrow\; dS = \frac{\int\left(dU - \partial_V\varphi(x)\,dV\right)dx/\sqrt{K}}{\int K\,dx/\sqrt{K}} \qquad [5]$$

Noting that $dx/\sqrt{K} = \sqrt{2/m}\,dt$, [3] follows because time averages are given by integrating with respect to $dx/\sqrt{K}$ and dividing by the integral of $1/\sqrt{K}$.

For more details, the reader is referred to Boltzmann (1968b) and Gallavotti (1999).

Heat Theorem and Ergodic Hypothesis

Boltzmann tried to extend the result beyond the one-dimensional systems (e.g., to Keplerian motions, which are not monocyclic unless only motions with a fixed eccentricity are considered). However, the early statement that "aperiodic motions can be regarded as periodic with infinite period" is really the heart of the application of the heat theorem for monocyclic systems to the far more complex gas in a box.

Imagine that the gas container $\Lambda$ is closed by a piston of section $A$ located to the right of the origin at distance $L$ and acting as a lid, so that the volume is $V = AL$. The microscopic model for the piston will be a potential $\bar\varphi(L - \xi)$ if $x = (\xi, \eta, \zeta)$ are the coordinates of a particle. The function $\bar\varphi(r)$ will vanish for $r > r_0$, for some $r_0 \ll L$, and diverge to $+\infty$ at $r = 0$. Thus, $r_0$ is the width of the layer near the piston where the force of the wall is felt by the particles that happen to be roaming there.

The contribution to the total potential energy $\Phi$ due to the walls is $W_{\mathrm{wall}} = \sum_j\bar\varphi(L - \xi_j)$ and $\partial_V\bar\varphi = A^{-1}\partial_L\bar\varphi$; assuming monocyclicity, it is necessary to evaluate the time average of $\partial_L\Phi(x) = \partial_LW_{\mathrm{wall}} \equiv \sum_j\bar\varphi'(L - \xi_j)$. As time evolves, the particles $x_j$ with $\xi_j$ in the layer within $r_0$ of the wall will feel the force exercised by the wall and
54 Introductory Article: Equilibrium Statistical Mechanics

bounce back. One particle in the layer will con- and (up to a proportionality factor) absolute
tribute to the average of $-\partial_L\varphi(x)$ the amount

\[ \frac{1}{\text{total time}}\; 2\int_{t_0}^{t_1} -\varphi'(L-\xi_j)\,dt \tag{6} \]

if $t_0$ is the first instant when the point $j$ enters the layer and $t_1$ is the instant when the $\xi$-component of the velocity vanishes "against the wall." Since $-\varphi'(L-\xi_j)$ is the $\xi$-component of the force, the integral is $2m|\dot\xi_j|$ (by Newton's law), provided, of course, $\dot\xi_j > 0$.

Suppose that no collisions between particles occur while the particles travel within the range of the potential of the wall, i.e., the mean free path is much greater than the range of the potential defining the wall. The contribution of collisions to the average momentum transfer to the wall per unit time is therefore given by, see [2],

\[ \int_{v>0} 2mv\, f(v)\,\rho_{wall}\, A\, v\, dv \]

if $\rho_{wall}$, $f(v)$ are the average density near the wall and, respectively, the average fraction of particles with a velocity component normal to the wall between $v$ and $v + dv$. Here $\rho_{wall}$, $f$ are supposed to be independent of the point on the wall: this should be true up to corrections of size $o(A)$.

Thus, writing the average kinetic energy per particle and per velocity component, $\int (m/2)v^2 f(v)\,dv$, as $(1/2)\beta^{-1}$ (cf. [2]) it follows that

\[ p \overset{def}{=} -\langle\partial_V\varphi\rangle = \rho_{wall}\,\beta^{-1} \tag{7} \]

has the physical interpretation of pressure. $(1/2)\beta^{-1}$ is the average kinetic energy per degree of freedom: hence, it is proportional to the absolute temperature $T$ (cf. the section "Pressure, temperature, and kinetic energy").

On the other hand, if motion on the energy surface takes place on a single periodic orbit, the quantity $p$ in [7] is the right quantity that would make the heat theorem work; see [4]. Hence, regarding the trajectory on each energy surface as periodic (i.e., the system as monocyclic) leads to the heat theorem with $p$, $U$, $V$, $T$ having the right physical interpretation corresponding to their appellations. This shows that monocyclic systems provide natural models of thermodynamic behavior.

Assuming that a chaotic system like a gas in a container of volume $V$ will satisfy, for practical purposes, the above property, a quantity $p$ can be defined such that $dU + p\,dV$ admits the inverse of the average kinetic energy $\langle K\rangle$ as an integrating factor and, furthermore, $p$, $U$, $V$, $\langle K\rangle$ have the physical interpretations of pressure, energy, volume, temperature, respectively.

Boltzmann's conception of space (and time) as discrete allowed him to conceive the property that the energy surface is constituted by points all of which belong to a single trajectory: a property that would be impossible if the phase space was really a continuum. Regarding phase space as consisting of a finite number of "cells" of finite volume $h^{dN}$, for some $h > 0$ (rather than of a continuum of points), allowed him to think, without logical contradiction, that the energy surface consisted of a single trajectory and, hence, that motion was a cyclic permutation of its points (actually cells).

Furthermore, it implied that the time average of an observable $F(P, Q)$ had to be identified with its average on the energy surface computed via the Liouville distribution

\[ C^{-1}\int F(P,Q)\,\delta(H(P,Q) - U)\,dP\,dQ \]

with

\[ C = \int \delta(H(P,Q) - U)\,dP\,dQ \]

(the appropriate normalization factor): a property that was written symbolically

\[ \frac{dt}{T} = \frac{dP\,dQ}{\int dP\,dQ} \]

or

\[ \lim_{T\to\infty}\frac{1}{T}\int_0^T F(S_t(P,Q))\,dt = \frac{\int F(P',Q')\,\delta(H(P',Q') - U)\,dP'\,dQ'}{\int \delta(H(P',Q') - U)\,dP'\,dQ'} \tag{8} \]

The validity of [8] for all (piecewise smooth) observables $F$ and for all points of the energy surface, with the exception of a set of zero area, is called the ergodic hypothesis.

For more details, the reader is referred to Boltzmann (1968) and Gallavotti (1999).

Ensembles

Eventually Boltzmann in 1884 realized that the validity of the heat theorem for averages computed via the right-hand side (rhs) of [8] held independently of the ergodic hypothesis, that is, [8] was not necessary because the heat theorem (i.e., [3]) could also be derived under the only assumption that the averages involved in its formulation were computed
as averages over phase space with respect to the probability distribution on the rhs of [8].

Furthermore, if $T$ was identified with the average kinetic energy, $U$ with the average energy, and $p$ with the average force per unit surface on the walls of the container $\Lambda$ with volume $V$, the relation [3] held for a variety of families of probability distributions on phase space, besides [8]. Among these are:

1. The "microcanonical ensemble," which is the collection of probability distributions on the rhs of [8] parametrized by $u = U/N$, $v = V/N$ (energy and volume per particle),

\[ \mu^{mc}_{u,v}(dP\,dQ) = \frac{1}{Z_{mc}(U,N,V)}\,\delta(H(P,Q) - U)\,\frac{dP\,dQ}{N!\,h^{dN}} \tag{9} \]

where $h$ is a constant with the dimensions of an action which, in the discrete representation of phase space mentioned in the previous section, can be taken such that $h^{dN}$ equals the volume of the cells and, therefore, the integrals with respect to [9] can be interpreted as an (approximate) sum over the cells conceived as microscopic configurations of $N$ indistinguishable particles (whence the $N!$).

2. The "canonical ensemble," which is the collection of probability distributions parametrized by $\beta$, $v = V/N$,

\[ \mu^{c}_{\beta,v}(dP\,dQ) = \frac{1}{Z_{c}(\beta,N,V)}\,e^{-\beta H(P,Q)}\,\frac{dP\,dQ}{N!\,h^{dN}} \tag{10} \]

to which more ensembles can be added, such as the grand canonical ensemble (Gibbs).

3. The "grand canonical ensemble," which is the collection of probability distributions parametrized by $\beta$, $\lambda$ and defined over the space $\mathcal{F}_{gc} = \cup_{N=0}^{\infty}\mathcal{F}(N)$,

\[ \mu^{gc}_{\beta,\lambda}(dP\,dQ) = \frac{1}{Z_{gc}(\beta,\lambda,V)}\,e^{\beta\lambda N - \beta H(P,Q)}\,\frac{dP\,dQ}{N!\,h^{dN}} \tag{11} \]

Hence, there are several different models of thermodynamics. The key tests for accepting them as real microscopic descriptions of macroscopic thermodynamics are as follows.

1. A correspondence between the macroscopic states of thermodynamic equilibrium and the elements of a collection of probability distributions on phase space can be established by identifying, on the one hand, macroscopic thermodynamic states with given values of the thermodynamic functions and, on the other, probability distributions attributing the same average values to the corresponding microscopic observables (i.e., whose averages have the interpretation of thermodynamic functions).

2. Once the correct correspondence between the elements of the different ensembles is established, that is, once the pairs $(u, v)$, $(\beta, v)$, $(\beta, \lambda)$ are so related to produce the same values for the averages $U$, $V$, $k_B T \overset{def}{=} \beta^{-1}$, $p\,|\partial\Lambda|$ of

\[ H(P,Q), \quad V, \quad \frac{2K(P)}{3N}, \quad \int \delta_{\partial\Lambda}(q_1)\,2m\,(v_1\cdot n)^2\,dq_1 \tag{12} \]

(where $\delta_{\partial\Lambda}(q_1)$ is a delta-function pinning $q_1$ to the surface $\partial\Lambda$), then the averages of all physically interesting observables should coincide at least in the thermodynamic limit, $\Lambda \to \infty$. In this way, the elements $\mu$ of the considered collection of probability distributions can be identified with the states of macroscopic equilibrium of the system. The $\mu$'s depend on parameters and therefore they form an ensemble: each of them corresponds to a macroscopic equilibrium state whose thermodynamic functions are appropriate averages of microscopic observables and therefore are functions of the parameters identifying $\mu$.

Remark The word "ensemble" is often used to indicate the individual probability distributions of what has been called here an ensemble. The meaning used here seems closer to the original sense in the 1884 paper of Boltzmann (in other words, often by "ensemble" one means that collection of the phase space points on which a given probability distribution is considered, and this does not seem to be the original sense).

For instance, in the case of the microcanonical distributions this means interpreting energy, volume, temperature, and pressure of the equilibrium state with specific energy $u$ and specific volume $v$ as proportional, through appropriate universal proportionality constants, to the integrals with respect to $\mu^{mc}_{u,v}(dP\,dQ)$ of the mechanical quantities in [12]. The averages of other thermodynamic observables in the state with specific energy $u$ and specific volume $v$ should be given by their integrals with respect to $\mu^{mc}_{u,v}$.

Likewise, one can interpret energy, volume, temperature, and pressure of the equilibrium state with specific energy $u$ and specific volume $v$ as the averages of the mechanical quantities [12] with respect to the canonical distribution $\mu^{c}_{\beta,v}(dP\,dQ)$ which has average specific energy precisely $u$. The averages of other thermodynamic observables in the state with specific energy and volume $u$ and $v$ are
given by their integrals with respect to $\mu^{c}_{\beta,v}$. A similar definition can be given for the description of thermodynamic equilibria via the grand canonical distributions.

For more details, see Gibbs (1981) and Gallavotti (1999).

Equivalence of Ensembles

Boltzmann proved that, computing averages via the microcanonical or canonical distributions, the essential property [3] was satisfied when changes in their parameters (i.e., $u, v$ or $\beta, v$, respectively) induced changes $du$ and $dv$ on energy and volume, respectively. He also proved that the function $s$, whose existence is implied by [3], was the same function once expressed as a function of $u, v$ (or of any pair of thermodynamic parameters, e.g., of $T, v$ or $p, u$).

A close examination of Boltzmann's proof shows that [3] holds exactly in the canonical ensemble and up to corrections tending to 0 as $\Lambda \to \infty$ in the microcanonical ensemble. Identity of thermodynamic functions evaluated in the two ensembles holds, as a consequence, up to corrections of this order. In addition, Gibbs added that the same held for the grand canonical ensemble.

Of course, not every collection of stationary probability distributions on phase space would provide a model for thermodynamics: Boltzmann called "orthodic" the collections of stationary distributions which generated models of thermodynamics through the above-mentioned identification of its elements with macroscopic equilibrium states. The microcanonical, canonical, and the later grand canonical ensembles are the chief examples of orthodic ensembles. Boltzmann and Gibbs proved these ensembles to be not only orthodic but to generate the same thermodynamic functions, that is, to generate the same thermodynamics.

This meant freedom from the analysis of the truth of the doubtful ergodic hypothesis (still unproved in any generality) or of the monocyclicity (manifestly false if understood literally rather than regarding the phase space as consisting of finitely many small, discrete cells), and allowed Gibbs to formulate the problem of statistical mechanics of equilibrium as follows.

Problem Study the properties of the collection of probability distributions constituting (any) one of the above ensembles.

However, the three ensembles just introduced by no means exhaust the class of orthodic ensembles producing the same models of thermodynamics in the limit of infinitely large systems. The wealth of ensembles with the orthodicity property, hence leading to equivalent mechanical models of thermodynamics, can be naturally interpreted in connection with the phenomenon of phase transition (see the section "Phase transitions and boundary conditions").

Clearly, the quoted results do not "prove" that thermodynamic equilibria are described by the microcanonical, canonical, or grand canonical ensembles. However, they certainly show that, for most systems, independently of the number of degrees of freedom, one can define quite unambiguously a mechanical model of thermodynamics establishing parameter-free, system-independent, physically important relations between thermodynamic quantities (e.g., $\partial_u (p(u,v)/T(u,v)) \equiv \partial_v (1/T(u,v))$, from [3]).

The ergodic hypothesis which was at the root of the mechanical theorems on heat and entropy cannot be taken as a justification of their validity. Naively one would expect that the time scale necessary to see an equilibrium attained, called recurrence time scale, would have to be at least the time that a phase space point takes to visit all possible microscopic states of given energy: hence, an explanation of why the necessarily enormous size of the recurrence time is not a problem becomes necessary.

In fact, the recurrence time can be estimated once the phase space is regarded as discrete: for the purpose of countering mounting criticism, Boltzmann assumed that momentum was discretized in units of $(2mk_BT)^{1/2}$ (i.e., the average momentum size) and space was discretized in units of $\rho^{-1/3}$ (i.e., the average spacing), implying a volume of cells $h^{3N}$ with $h \overset{def}{=} \rho^{-1/3}(2mk_BT)^{1/2}$; then he calculated that, even with such a gross discretization, a cell representing a microscopic state of 1 cm$^3$ of hydrogen at normal condition would require a time (called "recurrence time") of the order of $\sim 10^{10^{19}}$ times the age of the Universe (!) to visit the entire energy surface. In fact, the phase space volume is $\Gamma = (\rho^{-1}N(2mk_BT)^{3/2})^N \equiv N^N h^{3N}$ and the number of cells of volume $h^{3N}$ is $\Gamma/(N!\,h^{3N}) \simeq e^{N}$; and the time to visit all will be $e^{N}\tau_0$, with $\tau_0$ a typical atomic unit, e.g., $10^{-12}$ s, but $N = 10^{19}$. In this sense, the statement boldly made by young Boltzmann that "aperiodic motions can be regarded as periodic with infinite period" was even made quantitative.

The recurrence time is clearly so long as to be irrelevant for all purposes: nevertheless, the correctness of the microscopic theory of thermodynamics can still rely on the microscopic dynamics once it is understood (as stressed by Boltzmann) that the reason why we observe approach to equilibrium, and equilibrium itself, over "human" timescales
(which are far shorter than the recurrence times) is due to the property that on most of the energy surface the (very few) observables whose averages yield macroscopic thermodynamic functions (namely pressure, temperature, energy, ...) assume the same value even if $N$ is only very moderately large (of the order of $10^3$ rather than $10^{19}$). This implies that this value coincides with the average and therefore satisfies the heat theorem without any contradiction with the length of the recurrence time. The latter rather concerns the time needed for the generic observable to thermalize, that is, to reach its time average: the generic observable will indeed take a very long time to "thermalize" but no one will ever notice, because the generic observable (e.g., the position of a pre-identified particle) is not relevant for thermodynamics.

The word "proof" is not used in the mathematical sense so far in this article: the relevance of a mathematically rigorous analysis was widely realized only around the 1960s at the same time when the first numerical studies of the thermodynamic functions became possible and rigorous results were needed to check the correctness of various numerical simulations.

For more details, the reader is referred to Boltzmann (1968a, b) and Gallavotti (1999).

Thermodynamic Limit

Adopting Gibbs' axiomatic point of view, it is interesting to see the path to be followed to achieve an equivalence proof of three ensembles introduced in the section "Heat theorem and ergodic hypothesis."

A preliminary step is to consider, given a cubic box $\Lambda$ of volume $V = L^d$, the normalization factors $Z_{gc}(\beta, \lambda, V)$, $Z_c(\beta, N, V)$, and $Z_{mc}(U, N, V)$ in [11], [10], and [9], respectively, and to check that the following thermodynamic limits exist:

\[ \begin{aligned}
\beta p_{gc}(\beta,\lambda) &\overset{def}{=} \lim_{V\to\infty}\frac{1}{V}\log Z_{gc}(\beta,\lambda,V) \\
-\beta f_c(\beta,\rho) &\overset{def}{=} \lim_{V\to\infty,\ N/V=\rho}\frac{1}{N}\log Z_c(\beta,N,V) \\
k_B^{-1}s_{mc}(u,\rho) &\overset{def}{=} \lim_{V\to\infty,\ N/V=\rho,\ U/N=u}\frac{1}{N}\log Z_{mc}(U,N,V)
\end{aligned} \tag{13} \]

where the density $\rho \overset{def}{=} v^{-1} \equiv N/V$ is used, instead of $v$, for later reference. The normalization factors play an important role because they have simple thermodynamic interpretation (see the next section): they are called grand canonical, canonical, and microcanonical partition functions, respectively.

Not surprisingly, assumptions on the interparticle potential $\varphi(q - q')$ are necessary to achieve an existence proof of the limits in [13]. The assumptions on $\varphi$ are not only quite general but also have a clear physical meaning. They are

1. stability: that is, existence of a constant $B \ge 0$ such that $\sum_{i<j}^{N}\varphi(q_i - q_j) \ge -BN$ for all $N \ge 0$, $q_1, \dots, q_N \in \mathbb{R}^d$, and
2. temperedness: that is, existence of constants $\varepsilon_0, R > 0$ such that $|\varphi(q - q')| < B|q - q'|^{-d-\varepsilon_0}$ for $|q - q'| > R$.

The assumptions are satisfied by essentially all microscopic interactions with the notable exceptions of the gravitational and Coulombic interactions, which require a separate treatment (and lead to somewhat different results on the thermodynamic behavior).

For instance, assumptions (1), (2) are satisfied if $\varphi(q)$ is $+\infty$ for $|q| < r_0$ and smooth for $|q| > r_0$, for some $r_0 \ge 0$, and furthermore $\varphi(q) > B_0|q|^{-(d+\varepsilon_0)}$ if $r_0 < |q| \le R$, while for $|q| > R$ it is $|\varphi(q)| < B_1|q|^{-(d+\varepsilon_0)}$, for some $B_0, B_1, \varepsilon_0 > 0$, $R > r_0$. Briefly, $\varphi$ is fast diverging at contact and fast approaching 0 at large distance. This is called a (generalized) Lennard-Jones potential. If $r_0 > 0$, $\varphi$ is called a hard-core potential. If $B_1 = 0$, the potential is said to have finite range. (See Appendix 1 for physical implications of violations of the above stability and temperedness properties.) However, in the following, it will be necessary, both for simplicity and to contain the length of the exposition, to restrict consideration to the case $B_1 = 0$, i.e., to

\[ \varphi(q) > B_0|q|^{-(d+\varepsilon_0)},\quad r_0 < |q| \le R; \qquad |\varphi(q)| \equiv 0,\quad |q| > R \tag{14} \]

unless explicitly stated.

Assuming stability and temperedness, the existence of the limits in [13] can be mathematically proved: in Appendix 2, the proof of the first is analyzed to provide the simplest example of the technique. A remarkable property of the functions $\beta p_{gc}(\beta, \lambda)$, $-\beta\rho f_c(\beta, \rho)$, and $\rho\, s_{mc}(u, \rho)$ is that they are convex functions: hence, they are continuous in the interior of their domains of definition and, at one variable fixed, are differentiable with respect to the other with at most countably many exceptions.

In the case of a potential without hard core ($\rho_{max} = \infty$), $-\beta\rho f_c(\beta, \rho)$ can be checked to tend to 0 slower than $\rho$ as $\rho \to 0$, and to $-\infty$ faster than $-\rho$ as $\rho \to \infty$ (essentially proportionally to $-\rho\log\rho$ in both cases). Likewise, in the same case, $s_{mc}(u, \rho)$ can be shown to tend to 0 slower than $u - u_{min}$ as $u \to u_{min}$, and to $-\infty$ faster than $-u$ as $u \to \infty$. The latter
asymptotic properties can be exploited to derive, from the relations between the partition functions in [13],

\[ \begin{aligned}
Z_{gc}(\beta,\lambda,V) &= \sum_{N=0}^{\infty} e^{\beta\lambda N} Z_c(\beta,N,V) \\
Z_c(\beta,N,V) &= \int_{-BN}^{\infty} e^{-\beta U} Z_{mc}(U,N,V)\,dU
\end{aligned} \tag{15} \]

and, from the above-mentioned convexity, the consequences

\[ \begin{aligned}
\beta p_{gc}(\beta,\lambda) &= \max_{v}\left(\beta\lambda v^{-1} - \beta v^{-1} f_c(\beta,v^{-1})\right) \\
-\beta f_c(\beta,v^{-1}) &= \max_{u}\left(-\beta u + k_B^{-1}s_{mc}(u,v^{-1})\right)
\end{aligned} \tag{16} \]

and that the maxima are attained in points, or intervals, internal to the intervals of definition. Let $v_{gc}$, $u_c$ be points where the maxima are, respectively, attained in [16].

Note that the quantity $e^{\beta\lambda N} Z_c(\beta, N, V)/Z_{gc}(\beta, \lambda, V)$ has the interpretation of probability of a density $v^{-1} = N/V$ evaluated in the grand canonical distribution. It follows that, if the maximum in the first of [16] is strict, that is, it is reached at a single point, the values of $v^{-1}$ in closed intervals not containing the maximum point $v_{gc}^{-1}$ have a probability behaving as $< e^{-cV}$, $c > 0$, as $V \to \infty$, compared to the probability of $v^{-1}$'s in any interval containing $v_{gc}^{-1}$. Hence, $v_{gc}$ has the interpretation of average value of $v$ in the grand canonical distribution, in the limit $V \to \infty$.

Likewise, the interpretation of

\[ e^{-\beta uN} Z_{mc}(uN,N,V)/Z_c(\beta,N,V) \]

as probability in the canonical distribution of an energy density $u$ shows that, if the maximum in the second of [16] is strict, the values of $u$ in closed intervals not containing the maximum point $u_c$ have a probability behaving as $< e^{-cV}$, $c > 0$, as $V \to \infty$, compared to the probability of $u$'s in any interval containing $u_c$. Hence, in the limit $\Lambda \to \infty$, the average value of $u$ in the canonical distribution is $u_c$.

If the maxima are strict, [16] also establishes a relation between the grand canonical density, the canonical free energy and the grand canonical parameter $\lambda$, or between the canonical energy, the microcanonical entropy, and the canonical parameter $\beta$:

\[ \lambda = \partial_{v^{-1}}\left(v_{gc}^{-1} f_c(\beta,v_{gc}^{-1})\right), \qquad k_B\beta = \partial_u s_{mc}(u_c,v^{-1}) \tag{17} \]

where convexity and strictness of the maxima imply the existence of the derivatives.

Remark Therefore, in the equivalence between canonical and microcanonical ensembles, the canonical distribution with parameters $(\beta, v)$ should correspond with the microcanonical with parameters $(u_c, v)$. The grand canonical distribution with parameters $(\beta, \lambda)$ should correspond with the canonical with parameters $(\beta, v_{gc})$.

For more details, the reader is referred to Ruelle (1969) and Gallavotti (1999).

Physical Interpretation of Thermodynamic Functions

The existence of the limits [13] implies several properties of interest. The first is the possibility of finding the physical meaning of the functions $p_{gc}$, $f_c$, $s_{mc}$ and of the parameters $\lambda$, $\beta$.

Note first that, for all $V$, the grand canonical average $\langle K\rangle_{\beta,\lambda}$ is $(d/2)\beta^{-1}\langle N\rangle_{\beta,\lambda}$, so that $\beta^{-1}$ is proportional to the temperature $T_{gc} = T(\beta, \lambda)$ in the grand canonical distribution: $\beta^{-1} = k_B T(\beta, \lambda)$. Proceeding heuristically, the physical meaning of $p(\beta, \lambda)$ and $\lambda$ can be found through the following remarks.

Consider the microcanonical distribution $\mu^{mc}_{u,v}$ and denote by $\int^{*}$ the integral over $(P, Q)$ extended to the domain of the $(P, Q)$ such that $H(P, Q) = U$ and, at the same time, $q_1 \in dV$, where $dV$ is an infinitesimal volume surrounding the region $\Lambda$. Then, by the microscopic definition of the pressure $p$ (see the introductory section), it is

\[ p\,dV = \frac{N}{Z(U,N,V)}\int^{*}\delta\,\frac{2}{3}\,\frac{p_1^2}{2m}\,\frac{dP\,dQ}{N!\,h^{dN}} \equiv \frac{2}{3Z(U,N,V)}\int^{*}\delta\,K(P)\,\frac{dP\,dQ}{N!\,h^{dN}} \tag{18} \]

where $\delta \equiv \delta(H(P, Q) - U)$. The RHS of [18] can be compared with

\[ \frac{\partial_V Z(U,N,V)\,dV}{Z(U,N,V)} = \frac{N}{Z(U,N,V)}\int^{*}\delta\,\frac{dP\,dQ}{N!\,h^{dN}} \]

to give

\[ \frac{\partial_V Z\,dV}{Z} = N\,\frac{p\,dV}{(2/3)\langle K\rangle^{*}} = \beta p\,dV \]

because $\langle K\rangle^{*}$, which denotes the average $\int^{*}K / \int^{*}1$, should be essentially the same as the microcanonical average $\langle K\rangle_{mc}$ (i.e., insensitive to the fact that one particle is constrained to the volume $dV$) if $N$ is large. In the limit $V \to \infty$, $V/N = v$, the latter remark together with the second of [17] yields

\[ k_B^{-1}\partial_v s_{mc}(u,v) = \beta p(u,v), \qquad k_B^{-1}\partial_u s_{mc}(u,v) = \beta \tag{19} \]

respectively. Note that $p \ge 0$ and it is not increasing in $v$ because $s_{mc}(\cdot)$ is concave as a function of $v = \rho^{-1}$ (in fact, by the remark following [14], $\rho\, s_{mc}(u, \rho)$ is convex in $\rho$ and, in general, if $\rho\, g(\rho)$ is convex in $\rho$ then $g(v^{-1})$ is always concave in $v = \rho^{-1}$).
Hence, $ds_{mc}(u, v) = (du + p\,dv)/T$, so that taking into account the physical meaning of $p$, $T$ (as pressure and temperature, see the section "Pressure, temperature, and kinetic energy"), $s_{mc}$ is, in thermodynamics, the entropy. Therefore (see the second of [16]), $-\beta f_c(\beta, \rho) = -\beta u_c + k_B^{-1}s_{mc}(u_c, \rho)$ becomes

\[ f_c(\beta,\rho) = u_c - T_c\, s_{mc}(u_c,\rho), \qquad df_c = -p\,dv - s_{mc}\,dT \tag{20} \]

and since $u_c$ has the interpretation (as mentioned in the last section) of average energy in the canonical distribution $\mu^{c}_{\beta,v}$ it follows that $f_c$ has the thermodynamic interpretation of free energy (once compared with the definition of free energy, $F = U - TS$, in thermodynamics).

By [17] and [20],

\[ \lambda \equiv \partial_{v^{-1}}\left(v_{gc}^{-1} f_c(\beta,v_{gc}^{-1})\right) \equiv u_c - T_c\, s_{mc} + p\,v_{gc} \]

and $v_{gc}$ has the meaning of specific volume $v$. Hence, after comparison with the definition of chemical potential, $\lambda N = U - TS + pV$, in thermodynamics, it follows that the thermodynamic interpretation of $\lambda$ is the chemical potential and (see [16], [17]), the grand canonical relation

\[ \beta p_{gc}(\beta,\lambda) = \beta\lambda v_{gc}^{-1} - v_{gc}^{-1}\left(\beta u_c - k_B^{-1}s_{mc}(u_c,v^{-1})\right) \]

shows that $p_{gc}(\beta, \lambda) \equiv p$, implying that $p_{gc}(\beta, \lambda)$ is the pressure expressed, however, as a function of temperature and chemical potential.

To go beyond the heuristic derivations above, it should be remarked that convexity and the property that the maxima in [16], [17] are reached in the interior of the intervals of variability of $v$ or $u$ are sufficient to turn the above arguments into rigorous mathematical deductions: this means that given [19] as definitions of $p(u, v)$, $\beta(u, v)$, the second of [20] follows as well as $p_{gc}(\beta, \lambda) \equiv p(u_c, v_{gc})$. But the values $v_{gc}$ and $u_c$ in [16] are not necessarily unique: convex functions can contain horizontal segments and therefore the general conclusion is that the maxima may possibly be attained in intervals. Hence, instead of a single $v_{gc}$, there might be a whole interval $[v_-, v_+]$, where the rhs of [16] reaches the maximum and, instead of a single $u_c$, there might be a whole interval $[u_-, u_+]$ where the rhs of [17] reaches the maximum.

Convexity implies that the values of $\lambda$ or $\beta$ for which the maxima in [16] or [17] are attained in intervals rather than in single points are rare (i.e., at most denumerably many): the interpretation is, in such cases, that the thermodynamic functions show discontinuities, and the corresponding phenomena are called phase transitions (see the next section).

For more details the reader is referred to Ruelle (1969) and Gallavotti (1999).

Phase Transitions and Boundary Conditions

The analysis in the last two sections of the relations between elements of ensembles of distributions describing macroscopic equilibrium states not only allows us to obtain mechanical models of thermodynamics but also shows that the models, for a given system, coincide at least as $\Lambda \to \infty$. Furthermore, the equivalence between the thermodynamic functions computed via corresponding distributions in different ensembles can be extended to a full equivalence of the distributions.

If the maxima in [16] are attained at single points $v_{gc}$ or $u_c$ the equivalence should take place in the sense that a correspondence between $\mu^{gc}_{\beta,\lambda}$, $\mu^{c}_{\beta,v}$, $\mu^{mc}_{u,v}$ can be established so that any local observable $F(P, Q)$, defined as an observable depending on $(P, Q)$ only through the $p_i$, $q_i$ with $q_i \in \Lambda_0$, where $\Lambda_0 \subset \Lambda$ is a finite region, has the same average with respect to corresponding distributions in the limit $\Lambda \to \infty$.

The correspondence is established by considering $(\beta, \lambda) \leftrightarrow (\beta, v_{gc}) \leftrightarrow (u_{mc}, v)$, where $v_{gc}$ is where the maximum in [16] is attained, $u_{mc} \equiv u_c$ is where the maximum in [17] is attained and $v_{gc} \equiv v$ (cf. also [19], [20]). This means that the limits

\[ \lim_{V\to\infty}\int F(P,Q)\,\mu^{a}(dP\,dQ) \overset{def}{=} \langle F\rangle_a \quad (a\text{-independent}), \qquad a = gc, c, mc \tag{21} \]

coincide if the averages are evaluated by the distributions $\mu^{gc}_{\beta,\lambda}$, $\mu^{c}_{\beta,v_c}$, $\mu^{mc}_{u_{mc},v_{mc}}$.

Exceptions to [21] are possible: and are certainly likely to occur at values of $u$, $v$ where the maxima in [16] or [17] are attained in intervals rather than in isolated points; but this does not exhaust, in general, the cases in which [21] may not hold.

However, no case in which [21] fails has to be regarded as an exception. It rather signals that an interesting and important phenomenon occurs. To understand it properly, it is necessary to realize that the grand canonical, canonical, and microcanonical families of probability distributions are by far not the only ensembles of probability distributions whose elements can be considered to generate models of thermodynamics, that is, which are orthodic in the sense of the discussion in the section "Equivalence of ensembles." More general families of orthodic statistical ensembles of probability
distributions can be very easily conceived. In particular:

Definition Consider the grand canonical, canonical, and microcanonical distributions associated with an energy function in which the potential energy contains, besides the interaction $\Phi$ between particles located inside the container, also the interaction energy $\Phi_{in,out}$ between particles inside the container and external particles, identical to the ones in the container but not allowed to move and fixed in positions such that in every unit cube $\Delta$ external to $\Lambda$ there is a finite number of them bounded independently of $\Delta$. Such configurations of external particles will be called "boundary conditions of fixed external particles."

The thermodynamic limit with such boundary conditions is obtained by considering the grand canonical, canonical, and microcanonical distributions constructed with potential energy function $\Phi + \Phi_{in,out}$ in containers $\Lambda$ of increasing size taking care that, while the size increases, the fixed particles that would become internal to $\Lambda$ are eliminated. The argument used in the section "Thermodynamic limit" to show that the three models of thermodynamics, considered there, did define the same thermodynamic functions can be repeated to reach the conclusion that also the (infinitely many) "new" models of thermodynamics in fact give rise to the same thermodynamic functions and averages of local observables. Furthermore, the values of the limits corresponding to [13] can be computed using the new partition functions and coincide with the ones in [13] (i.e., they are independent of the boundary conditions).

However, it may happen, and in general it is the case, for many models and for particular values of the state parameters, that the limits in [21] do not coincide with the analogous limits computed in the new ensembles, that is, the averages of some local observables are unstable with respect to changes of boundary conditions with fixed particles.

There is a very natural interpretation of such apparent ambiguity of the various models of thermodynamics: namely, at the values of the parameters that are selected to describe the macroscopic states under consideration, there may correspond different equilibrium states with the same parameters. When the maximum in [16] is reached on an interval of densities, one should not think of any failure of the microscopic models for thermodynamics: rather one has to think that there are several states possible with the same $\beta$, $\lambda$ and that they can be identified with the probability distributions obtained by forming the grand canonical, canonical, or microcanonical distributions with different kinds of boundary conditions.

For instance, a boundary condition with high density may produce an equilibrium state with parameters $\beta$, $\lambda$ which also has high density, i.e., the density $v^{-1}$ at the right extreme of the interval in which the maximum in [16] is attained, while using a low-density boundary condition the limit in [21] may describe the averages taken in a state with density $v^{-1}$ at the left extreme of the interval or, perhaps, with a density intermediate between the two extremes. Therefore, the following definition emerges.

Definition If the grand canonical distributions with parameters $(\beta, \lambda)$ and different choices of fixed external particles boundary conditions generate for some local observable $F$ average values which are different by more than a quantity $\delta > 0$ for all large enough volumes $\Lambda$ then one says that the system has a phase transition at $(\beta, \lambda)$. This implies that the limits in [21], when existing, will depend on the boundary condition and their values will represent averages of the observables in "different phases." A corresponding definition is given in the case of the canonical and microcanonical distributions when, given $(\beta, v)$ or $(u, v)$, the limit in [21] depends on the boundary conditions for some $F$.

Remarks

1. The idea is that by fixing one of the thermodynamic ensembles and by varying the boundary conditions one can realize all possible states of equilibrium of the system that can exist with the given values of the parameters determining the state in the chosen ensemble (i.e., $(\beta, \lambda)$, $(\beta, v)$, or $(u, v)$ in the grand canonical, canonical, or microcanonical cases, respectively).
2. The impression that in order to define a phase transition the thermodynamic limit is necessary is incorrect: the definition does not require considering the limit $\Lambda \to \infty$. The phenomenon that occurs is that by changing boundary conditions the average of a local observable can change at least by amounts independent of the system size. Hence, occurrence of a phase transition is perfectly observable in finite volume: it suffices to check that by changing boundary conditions the average of some observable changes by an amount whose minimal size is volume independent. It is a manifestation of an instability of the averages with respect to changes in boundary conditions: an instability which does not fade away when the boundary recedes to infinity, i.e., boundary perturbations produce
bulk effects and at a phase transition the averages of the local observable, if existing at all, will exhibit a nontrivial dependence on the boundary conditions. This is also called "long range order."
3. It is possible to show that when this happens then some thermodynamic function whose value is independent of the boundary condition (e.g., the free energy in the canonical distributions) has discontinuous derivatives in terms of the parameters of the ensemble. This is in fact one of the frequently-used alternative definitions of phase transitions: the latter two natural definitions of first-order phase transition are equivalent. However, it is very difficult to prove that a given system shows a phase transition. For instance, existence of a liquid-gas phase transition is still an open problem in systems of the type considered until the section "Lattice models" below.
4. A remarkable unification of the theory of the equilibrium ensembles emerges: all distributions of any ensemble describe equilibrium states. If a boundary condition is fixed once and for all, then some equilibrium states might fail to be described by an element of an ensemble. However, if all boundary conditions are allowed then all equilibrium states should be realizable in a given ensemble by varying the boundary conditions.
5. The analysis leads us to consider as completely equivalent without exceptions grand canonical, canonical, or microcanonical ensembles enlarged by adding to them the distributions with potential energy augmented by the interaction with fixed external particles.
6. The above picture is really proved only for special classes of models (typically in models in which particles are constrained to occupy points of a lattice and in systems with hard core interactions, $r_0 > 0$ in [14]) but it is believed to be correct in general. At least it is consistent with all that is known so far in classical statistical mechanics. The difficulty is that, conceivably, one might even need boundary conditions more complicated than the fixed particles boundary conditions (e.g., putting different particles outside, interacting with

an idealization void of physical reality, it is nevertheless useful to define such states because certain notions (e.g., that of pure state) can be sharply defined, with few words and avoiding wide circumvolutions, in terms of them. Therefore, let:

Definition An infinite-volume state with parameters $(\beta, v)$, $(u, v)$ or $(\beta, \lambda)$ is a collection of average values $F \to \langle F\rangle$ obtained, respectively, as limits of finite-volume averages $\langle F\rangle_{\Lambda_n}$ defined from canonical, microcanonical, or grand canonical distributions in $\Lambda_n$ with fixed parameters $(\beta, v)$, $(u, v)$ or $(\beta, \lambda)$ and with general boundary condition of fixed external particles, on sequences $\Lambda_n \to \infty$ for which such limits exist simultaneously for all local observables $F$.

Having set the definition of infinite-volume state consider a local observable $G(X)$ and let $\tau_\xi G(X) = G(X + \xi)$, $\xi \in \mathbb{R}^d$, with $X + \xi$ denoting the configuration $X$ in which all particles are translated by $\xi$: then an infinite-volume state is called a pure state if for any pair of local observables $F$, $G$ it is

\[ \langle F\,\tau_\xi G\rangle - \langle F\rangle\langle\tau_\xi G\rangle \xrightarrow[\xi\to\infty]{} 0 \tag{22} \]

which is called a cluster property of the pair $F$, $G$.

The result alluded to in remark (6) is that at least in the case of hard-core systems (or of the simple lattice systems discussed in the section "Lattice models") the infinite-volume equilibrium states in the above sense exhaust at least the totality of the infinite-volume pure states. Furthermore, the other states that can be obtained in the same way are convex combinations of the pure states, i.e., they are "statistical mixtures" of pure phases. Note that $\langle\tau_\xi G\rangle$ cannot be replaced, in general, by $\langle G\rangle$ because not all infinite-volume states are necessarily translation invariant and in simple cases (e.g., crystals) it is even possible that no translation-invariant state is a pure state.

Remarks

1. This means that, in the latter models, generalizing the boundary conditions, for example considering external particles to be not identical to the ones inside the system, using periodic or
partially periodic boundary conditions, or the
the system with an arbitrary potential, rather
widely used alternative of introducing a small
than via ).
auxiliary potential and first taking the infinite-
The discussion of the equivalence of the ensembles volume states in presence of it and then letting
and the question of the importance of boundary the potential vanish, does not enlarge further the
conditions has already imposed the consideration set of states (but may sometimes be useful: an
of several limits as  ! 1. Occasionally, it will example of a study of a phase transition by using
again come up. For conciseness, it is useful to set up the latter method of small fields will be given in
a formal definition of equilibrium states of an the section Continuous symmetries: no d = 2
infinite-volume system: although infinite volume is crystal theorem).
2. If χ is the indicator function of a local event, it will make sense to consider the probability of occurrence of the event in an infinite-volume state defining it as ⟨χ⟩. In particular, the probability density for finding p particles at x_1, x_2, ..., x_p, called the p-point correlation function, will thus be defined in an infinite-volume state. For instance, if the state is obtained as a limit of canonical states ⟨·⟩_{Λ_n} with parameters β, ρ, ρ = N_n/V_n, in a sequence of containers Λ_n, then

\rho(x) = \lim_{n} \Big\langle \sum_{j=1}^{N_n} \delta(x - q_j) \Big\rangle_{\Lambda_n}

\rho(x_1, x_2, \ldots, x_p) = \lim_{n} \Big\langle \sum_{i_1, \ldots, i_p} \prod_{j=1}^{p} \delta(x_j - q_{i_j}) \Big\rangle_{\Lambda_n}

where the sum is over the ordered p-ples (i_1, ..., i_p). Thus, the pair correlation ρ(q, q′) and its possible cluster property are

\rho(q, q') \overset{def}{=} \lim_{n} \frac{\int_{\Lambda_n} \exp\big(-\beta U(q, q', q_1, \ldots, q_{N_n - 2})\big)\, dq_1 \cdots dq_{N_n - 2}}{(N_n - 2)!\; Z_c^0(\beta, \rho, V_n)}

\rho(q, q' + x) - \rho(q)\, \rho(q' + x) \xrightarrow[x \to \infty]{} 0   [23]

where

Z_c^0 \overset{def}{=} \int e^{-\beta U(Q)}\, dQ

is the configurational partition function.
The reader is referred to Ruelle (1969), Dobrushin (1968), Lanford and Ruelle (1969), and Gallavotti (1999).

Virial Theorem and Atomic Dimensions

For a long time it has been doubted that just changing boundary conditions could produce such dramatic changes as macroscopically different states (i.e., phase transitions in the sense of the definition in the last section). The first evidence that by taking the thermodynamic limit very regular analytic functions like N^{-1} log Z_c(β, N, V) (as a function of β, v = V/N) could develop, in the limit Λ → ∞, singularities like discontinuous derivatives (corresponding to the maximum in [16] being reached on a plateau and to a consequent existence of several pure phases) arose in the van der Waals theory of liquid-gas transition.
Consider a real gas with N identical particles with mass m in a container Λ with volume V. Let the force acting on the ith particle be f_i; multiplying both sides of the equations of motion, m \ddot q_i = f_i, by −(1/2) q_i and summing over i, it follows that

-\frac{1}{2} \sum_{i=1}^{N} m\, q_i \cdot \ddot q_i = -\frac{1}{2} \sum_{i=1}^{N} q_i \cdot f_i \overset{def}{=} \frac{1}{2} C(q)

and the quantity C(q) defines the virial of the forces in the configuration q. Note that C(q) is not translation invariant because of the presence of the forces due to the walls.
Writing the force f_i as a sum of the internal and the external forces (due to the walls) the virial C can be expressed naturally as sum of the virial C_int of the internal forces (translation invariant) and of the virial C_ext of the external forces.
By dividing both sides of the definition of the virial by τ and integrating over the time interval [0, τ], one finds in the limit τ → +∞, that is, up to quantities relatively infinitesimal as τ → ∞, that

\langle K \rangle = \frac{1}{2} \langle C \rangle \quad \text{and} \quad \langle C_{ext} \rangle = 3 p V

where p is the pressure and V the volume. Hence

\langle K \rangle = \frac{3}{2} p V + \frac{1}{2} \langle C_{int} \rangle

or

\frac{1}{\beta} = p v + \frac{\langle C_{int} \rangle}{3N}   [24]

Equation [24] is Clausius' virial theorem: in the case of no internal forces, it yields βpv = 1, the ideal-gas equation.
The internal virial C_int can be written, if f_{j→i} = −∂_{q_i} φ(q_i − q_j), as

C_{int} = -\sum_{i=1}^{N} \sum_{j \ne i} f_{j \to i} \cdot q_i \equiv \sum_{i<j} \partial_{q_i} \varphi(q_i - q_j) \cdot (q_i - q_j)

which shows that the contribution to the virial by the internal repulsive forces is negative while that of the attractive forces is positive. The average of C_int can be computed by the canonical distribution, which is convenient for the purpose. van der Waals first used the virial theorem to perform an actual computation of the corrections to the perfect-gas laws. Simply neglect the third-order term in the density and use the approximation ρ(q_1, q_2) = ρ² e^{−βφ(q_1 − q_2)} for the pair correlation function, [23], then

\frac{1}{2} \langle C_{int} \rangle = V\, \frac{3}{2\beta}\, \rho^2 I(\beta) + V\, O(\rho^3)   [25]
where

I(\beta) = \frac{1}{2} \int \big( e^{-\beta \varphi(q)} - 1 \big)\, d^3 q

and the equation of state [24] becomes

p v + \frac{I(\beta)}{\beta v} + O(v^{-2}) = \beta^{-1}

For the purpose of illustration, the calculation of I can be performed approximately at high temperature (β small) in the case

\varphi(r) = 4\varepsilon \Big( \Big(\frac{r_0}{r}\Big)^{12} - \Big(\frac{r_0}{r}\Big)^{6} \Big)

(the classical Lennard-Jones potential), ε, r_0 > 0. The result is

I \cong -(b - \beta a), \qquad b = 4 v_0, \quad a = \frac{32}{3}\, \varepsilon v_0, \quad v_0 = \frac{4\pi}{3} \Big(\frac{r_0}{2}\Big)^{3}

Hence,

p v + \frac{a}{v} - \frac{b}{\beta v} = \frac{1}{\beta} + O\Big(\frac{1}{\beta v^2}\Big)

\Big(p + \frac{a}{v^2}\Big) v = \Big(1 + \frac{b}{v}\Big) \frac{1}{\beta} = \frac{1}{1 - b/v}\, \frac{1}{\beta} + O\Big(\frac{1}{\beta v^2}\Big)

or

\Big(p + \frac{a}{v^2}\Big) (v - b)\, \beta = 1 + O(v^{-2})   [26]

which gives the equation of state for βε ≪ 1. Equation [26] can be compared with the well-known empirical van der Waals equation of state:

\beta \Big(p + \frac{a}{v^2}\Big) (v - b) = 1

or

(p + A n^2 / V^2)(V - nB) = nRT   [27]

where, if N_A is Avogadro's number, A = a N_A², B = b N_A, R = k_B N_A, n = N/N_A. It shows the possibility of accessing the microscopic parameters ε and r_0 of the potential φ via measurements detecting deviations from the Boyle-Mariotte law, βpv = 1, of the rarefied gases: ε = 3a/8b = 3A/8BN_A, r_0 = (3b/2π)^{1/3} = (3B/2πN_A)^{1/3}.
As a final comment, it is worth stressing that the virial theorem gives in principle the exact corrections to the equation of state, in a rather direct and simple form, as time averages of the virial of the internal forces. Since the virial of the internal forces is easy to calculate from the positions of the particles as a function of time, the theorem provides a method for computing the equation of state in numerical simulations. In fact, this idea has been exploited in many numerical experiments, in which [24] plays a key role.
For more details, the reader is referred to Gallavotti (1999).

van der Waals Theory

Equation [27] is empirically used beyond its validity region (small density and small β) by regarding A, B as phenomenological parameters to be experimentally determined by measuring them near generic values of p, V, T. The measured values of A, B do not usually vary too much as functions of v, T and, apart from this small variability, the predictions of [27] have reasonably agreed with experience until, as experimental precision increased over the years, serious inadequacies eventually emerged.
Certain consequences of [27] are appealing: for example, Figure 1 shows that it does not give a p monotonic nonincreasing in v if the temperature is small enough. A critical temperature can be defined as the largest value, T_c, of the temperature below which the graph of p as a function of v is not monotonic decreasing; the critical volume V_c is the value of v at the horizontal inflection point occurring for T = T_c.

Figure 1 The van der Waals equation of state at a temperature T < T_c where the pressure is not monotonic. The horizontal line illustrates the Maxwell rule. (Axes: p against v, with v_l and v_g marked on the v axis.)

For T < T_c the van der Waals interpretation of the equation of state is that the function p(v) may describe metastable states while the actual equilibrium states would follow an equation with a monotonic dependence on v and p(v) becoming horizontal in the coexistence region of specific volumes. The precise value of p where to draw the plateau (see Figure 1) would then be fixed by experiment or theoretically predicted via the simple rule that the plateau associated with the represented isotherm is drawn at a height such that the areas of the two cycles in the resulting loop are equal.
This is Maxwell's rule: obtained by assuming that the isotherm curve joining the extreme points of the plateau and the plateau itself define a cycle
(see Figure 1) representing a sequence of possible macroscopic equilibrium states (the ones corresponding to the plateau) or states with extremely long time of stability (metastable) represented by the curved part. This would be an isothermal Carnot cycle which, therefore, could not produce work: since the work produced in the cycle (i.e., ∮ p dv) is the signed area enclosed by the cycle the rule just means that the area is zero. The argument is doubtful at least because it is not clear that the intermediate states with p increasing with v could be realized experimentally or could even be theoretically possible.
A striking prediction of [27], taken literally, is that the gas undergoes a gas-liquid phase transition with a critical point at a temperature T_c, volume v_c, and pressure p_c that can be computed via [27] and are given by RT_c = 8A/27B, V_c = 3B (n = 1).
At the same time, the above prediction is interesting as it shows that there are simple relations between the critical parameters and the microscopic interaction constants, i.e., ε ≃ k_B T_c and r_0 ≃ (V_c/N_A)^{1/3}: or more precisely ε = 81 k_B T_c/64, r_0 = (V_c/2πN_A)^{1/3} if a classical Lennard-Jones potential (i.e., φ = 4ε((r_0/|q|)^{12} − (r_0/|q|)^{6}); see the last section) is used for the interaction potential φ.
However, [27] cannot be accepted uncritically not only because of the approximations (essentially the neglecting of O(v^{-1}) in the equation of state), but mainly because, as remarked above, for T < T_c the function p is no longer monotonic in v as it must be; see comment following [19].
The van der Waals equation, refined and complemented by Maxwell's rule, predicts the following behavior:

(p - p_c) \propto (v - v_c)^{\delta}, \quad \delta = 3, \quad T = T_c
(v_g - v_l) \propto (T_c - T)^{\beta}, \quad \beta = 1/2, \quad \text{for } T \to T_c^-   [28]

which are in sharp contrast with the experimental data gathered in the twentieth century. For the simplest substances, one finds instead δ ≅ 5, β ≅ 1/3.
Finally, blind faith in the equation of state [27] is untenable, last but not least, also because nothing in the analysis would change if the space dimension was d = 2 or d = 1: but for d = 1, it is easily proved that the system, if the interaction decays rapidly at infinity, does not undergo phase transitions (see next section).
In fact, it is now understood that van der Waals' equation represents rigorously only a limiting situation, in which particles have a hard-core interaction (or a strongly repulsive one at close distance) and a further smooth interaction φ with very long range. More precisely, suppose that the part of the potential outside a hard-core radius r_0 > 0 is attractive (i.e., non-positive) and has the form γ^d φ_1(γ|q|) ≤ 0 and call P_0(v) the (β-independent) product of β times the pressure of the hard-core system without any attractive tail (P_0(v) is not explicitly known except if d = 1, in which case it is P_0(v)(v − b) = 1, b = r_0), and let

a = \frac{1}{2} \int_{|q| > r_0} |\varphi_1(q)|\, dq

If p(β, v; γ) is the pressure when γ > 0 then it can be proved that

\beta p(\beta, v) \overset{def}{=} \lim_{\gamma \to 0} \beta p(\beta, v; \gamma) = \Big[ -\frac{\beta a}{v^2} + P_0(v) \Big]_{\text{Maxwell's rule}}   [29]

where the subscript means that the graph of p(β, v) as a function of v is obtained from the function in the square bracket by applying to it Maxwell's rule, described above in the case of the van der Waals equation. Equation [29] reduces exactly to the van der Waals equation for d = 1, and for d > 1 it leads to an equation with identical critical behavior (even though P_0(v) cannot be explicitly computed).
The reader is referred to Lebowitz and Penrose (1979) and Gallavotti (1999) for more details.

Absence of Phase Transitions: d = 1

One of the most quoted no-go theorems in statistical mechanics is that one-dimensional systems of particles interacting via short-range forces do not exhibit phase transitions (cf. the next section) unless the somewhat unphysical situation of having zero absolute temperature is considered. This is particularly easy to check in the case of nearest-neighbor hard-core interactions. Let the hard-core size be r_0, so that the interaction potential φ(r) = +∞ if r ≤ r_0, and suppose also that φ(r) ≡ 0 if r ≥ 2r_0. In this case, the thermodynamic functions can be exactly computed and checked to be analytic: hence the equation of state cannot have any phase transition plateau. This is a special case of van Hove's theorem establishing smoothness of the equation of state for interactions extending beyond the nearest neighbor and rapidly decreasing at infinity.
If the definition of phase transition based on the sensitivity of the thermodynamic limit to variations of boundary conditions is adopted then a more general, conceptually simple, argument can be given to show that in one-dimensional systems there cannot be any phase transition if the potential energy of mutual interaction between a
configuration Q of particles to the left of a reference particle (located at the origin O, say) and a configuration Q′ to the right of the particle (with Q ∪ O ∪ Q′ compatible with the hard cores) is uniformly bounded below. Then a mathematical proof can be devised showing that the influence of boundary conditions disappears as the boundaries recede to infinity. One also says that no long-range order can be established in a one-dimensional case, in the sense that one loses any trace of the boundary conditions imposed.
The analysis fails if the space dimension is ≥ 2: in this case, even if the interaction is short-ranged, the energy of interaction between two regions of space separated by a boundary is of the order of the boundary area. Hence, one cannot bound above and below the probability of any two configurations in two half-spaces by the product of the probabilities of the two configurations, each computed as if the other was not there. This is because such a bound would be proportional to the exponential of the surface of separation, which tends to ∞ when the surface grows large. This means that we cannot consider, at least not in general, the configurations in the two half-spaces as independently distributed.
Analytically, a condition on the potential sufficient to imply that the energy between a configuration to the left and one to the right of the origin is bounded below, if d = 1, is simply expressed by

\int_{r'}^{\infty} r\, |\varphi(r)|\, dr < +\infty \quad \text{for } r' > r_0

Therefore, in order to have phase transitions in d = 1, a potential is needed that is so long range that it has a divergent first moment. It can be shown by counterexamples that if the latter condition fails there can be phase transitions even in d = 1 systems.
The results just quoted also apply to discrete models like lattice gases or lattice spin models that will be considered later in the article.
For more details, we refer the reader to Landau and Lifschitz (1967), Dyson (1969), Gallavotti (1999), and Gallavotti et al. (2004).

Continuous Symmetries: No d = 2 Crystal Theorem

A second case in which it is possible to rule out existence of phase transitions or at least of certain kinds of transitions arises when the system under analysis enjoys large symmetry. By symmetry is meant a group of transformations acting on the configurations and transforming each of them into a configuration which, at least for one boundary condition (e.g., periodic or open), has the same energy.
A symmetry is said to be continuous if the group of transformations is a continuous group. For instance, continuous systems have translational symmetry if considered in a container Λ with periodic boundary conditions. Systems with too much symmetry sometimes cannot show phase transitions. For instance, the continuous translation symmetry of a gas in a container Λ with periodic boundary conditions is sufficient to exclude the possibility of crystallization in dimension d = 2.
To discuss this, which is a prototype of a proof which can be used to infer absence of many transitions in systems with continuous symmetries, consider the translational symmetry and a potential satisfying, besides the usual [14] and with the symbols used in [14], the further property that |q|² |∂²_{ij} φ(q)| < B |q|^{−(d+ε_0)}, with ε_0 > 0, for some B holds for r_0 < |q| ≤ R. This is a very mild extra requirement (and it allows for a hard-core interaction).
Consider an ideal crystal on a square lattice (for simplicity) of spacing a, exactly fitting in its container Λ of side L assumed with periodic boundary conditions: so that N = (L/a)^d is the number of particles and a^{−d} is the density, which is supposed to be smaller than the close packing density if the interaction φ has a hard core. The probability distribution of the particles is rather trivial:

\bar\mu = \sum_{p} \prod_{n} \delta(q_{p(n)} - a\, n)\, \frac{dQ}{N!}

the sum running over the permutations m → p(m) of the sites m ∈ Λ, m ∈ Z^d, 0 < m_i ≤ L a^{−1}. The density at q is

\hat\rho(q) = \sum_{n} \delta(q - a\, n) \equiv \Big\langle \sum_{j=1}^{N} \delta(q - q_j) \Big\rangle

and its Fourier transform is proportional to

\rho(k) \overset{def}{=} \frac{1}{N} \Big\langle \sum_{j} e^{i k \cdot q_j} \Big\rangle, \qquad k = \frac{2\pi}{L}\, n, \quad n \in Z^d

ρ(k) has value 1 for all k of the form K = (2π/a)n and (1/N) O(max_{c=1,2} |e^{i k_c a} − 1|^{−2}) otherwise. In presence of interaction, it has to be expected that, in a crystal state, ρ(k) has peaks near the values K: but the value of ρ(k) can depend on the boundary conditions.
Since the system is translation invariant a crystal state defined as a state with a distribution close to \bar\mu,
i.e., with \hat\rho(q) with peaks at the ideal lattice points q = na, cannot be realized under periodic boundary conditions, even when the system state is crystalline. To realize such a state, a symmetry-breaking term is needed in the interaction.
This can be done in several ways, for example, by changing the boundary condition. Such a choice implies a discussion of how much the boundary conditions influence the positions of the peaks of ρ(k): for instance, it is not obvious that a boundary condition will not generate a state with a period different from the one that a priori has been selected for disproval (a possibility which would imply a reciprocal lattice of K's different from the one considered to begin with). Therefore, here the choice will be to imagine that an external weak force with potential εW(q) acts forcing a symmetry breaking that favors the occupation of regions around the points of the ideal lattice (which would mark the average positions of the particles in the crystal state that is being sought). The proof (Mermin's theorem) that no equilibrium state exists with particle distribution close to \bar\mu, i.e., with peaks in place of the delta functions (see below), is essentially reproduced below.
Take W(q) = Σ_{na ∈ Λ} χ(q − na), where χ(q) ≤ 0 is smooth and zero everywhere except in a small vicinity of the lattice points around which it decreases to some negative minimum keeping a rotation symmetry around them. The potential W is invariant under translations by the lattice steps. By the choice of the boundary condition and εW, the density \tilde\rho_ε(q) will be periodic with period a so that ρ_ε(k) will, possibly, not have a vanishing limit as N → ∞ only if k is a reciprocal vector K = (2π/a)n.
If the potential is φ + εW and if there exists a crystal state in which particles have higher probability of being near the lattice points na, it should be expected that for small ε > 0 the system will be found in a state with Fourier transform of the density, ρ_ε(k), satisfying, for some vector K ≠ 0 in the reciprocal lattice,

\lim_{\varepsilon \to 0} \lim_{N \to \infty} |\rho_\varepsilon(K)| = r > 0   [30]

that is, the requirement is that uniformly in ε → 0 the Fourier transform of the density has a peak at some K ≠ 0. Note that if k is not in the reciprocal lattice ρ_ε(k) → 0 as N → ∞, being bounded above by

\frac{1}{N}\, O\Big( \max_{j=1,2} |e^{i k_j a} - 1|^{-2} \Big)

because (1/N) \tilde\rho_ε is periodic and its integral over q is equal to 1. Hence, excluding the existence of a crystal will be identified with the impossibility of [30]. Other criteria can be imagined, for example, considering crystals with a lattice different from simple cubic, which lead to the same result by following the same technique. Nevertheless, it is not mathematically excluded (but unlikely) that, with some weaker existence definition, a crystal state could be possible even in two dimensions.
The following inequalities hold under the present assumptions on the potential and in the canonical distribution with periodic boundary conditions and parameters (β, ρ), ρ = a^{−d}, in a box Λ with side multiple of a (so that N = (L a^{−1})^d) and potential of interaction φ + εW. The further assumption that the lattice na is not a close-packed lattice is (of course) necessary when the interaction potential has a hard core. Then, for suitable B_0, B, B_1, B_2 > 0, independent of N and ε, and for |k| < π/a (if K ≠ 0)

\frac{1}{N} \Big\langle \Big| \sum_{j=1}^{N} e^{i (k + K) \cdot q_j} \Big|^2 \Big\rangle \ge B\, \frac{\big( \rho_\varepsilon(K) + \rho_\varepsilon(K + 2k) \big)^2}{B_1 k^2 + \varepsilon B_2}

\frac{1}{N} \sum_{k} \gamma(k)\, \frac{1}{N} \Big\langle \Big| \sum_{j=1}^{N} e^{i (k + K) \cdot q_j} \Big|^2 \Big\rangle \le B_0 < \infty   [31]

where the averages are in the canonical distribution (β, ρ) with periodic boundary conditions and a symmetry-breaking potential εW(q); γ(k) ≥ 0 is an (arbitrary) smooth function vanishing for 2|k| ≥ δ with δ < 2π/a and B_0 depends on γ. See Appendix 3 for a derivation of [31].
Multiplying both sides of the first equation in [31] by N^{−1} γ(k) and summing over k, the crystallinity condition in the form [30] implies

B_0 \ge B\, r^2 a^d \int_{|k| < \delta} \frac{\gamma(k)\, dk}{k^2 B_1 + \varepsilon B_2}

For d = 1, 2 the integral diverges, as ε^{−1/2} or log ε^{−1}, respectively, implying |ρ_ε(K)| → r = 0 as ε → 0: the criterion of crystallinity, [30], cannot be satisfied if d = 1, 2.
The above inequality is an example of a general class of inequalities called infrared inequalities stemming from another inequality called Bogoliubov's inequality (see Appendix 3), which lead to the proof that certain kinds of ordered phases cannot exist if the dimension of the ambient space is d = 2 when a finite volume, under suitable boundary conditions (e.g., periodic), shows a continuous symmetry. The excluded phenomenon is, more precisely, the existence of equilibrium states exhibiting, in the thermodynamic limit, a symmetry lower than the continuous symmetry holding in a finite volume.
In general, existence of thermodynamic equilibrium states with symmetry lower than the
symmetry enjoyed by the system in finite volume and under suitable boundary conditions is called a spontaneous symmetry breaking. It is yet another manifestation of instability with respect to changes in boundary conditions, hence its occurrence reveals a phase transition. There is a large class of systems for which an infrared inequality implies absence of spontaneous symmetry breaking: in most of the one- or two-dimensional systems a continuous symmetry cannot be spontaneously broken.
The limitation to dimension d ≤ 2 is a strong limitation to the generality of the applicability of infrared theorems to exclude phase transitions. More precisely, systems can be divided into classes each of which has a critical dimension below which too much symmetry implies absence of phase transitions (or of certain kinds of phase transitions).
It should be stressed that, at the critical dimension, the symmetry breaking is usually so weakly forbidden that one might need astronomically large containers to destroy small effects (due to boundary conditions or to very small fields) which break the symmetry. For example, in the crystallization just discussed, the Fourier transform peaks are only bounded by O(1/\sqrt{\log ε^{−1}}). Hence, from a practical point of view, it might still be possible to have some kind of order even in large containers.
The reader is referred to Mermin (1968), Hohenberg (1969), and Ruelle (1969).

High Temperature and Small Density

There is another class of systems in which no phase transitions take place. These are the systems with stable and tempered interactions φ (e.g., those satisfying [14]) in the high-temperature and low-density region. The property is obtained by showing that the equation of state is analytic in the variables (β, ρ) near the origin (0, 0).
A simple algorithm (Mayer's series) yields the coefficients of the virial series

\beta p(\beta, \rho) = \rho + \sum_{k=2}^{\infty} c_k(\beta)\, \rho^k

It has the drawback that the kth order coefficient c_k(β) is expressed as a sum of many terms (a number growing more than exponentially fast in the order k) and it is not so easy (but possible) to show combinatorially that their sum is bounded exponentially in k if β is small enough. A more efficient approach leads quickly to the desired solution.
Denoting Φ(q_1, ..., q_n) =def Σ_{i<j} φ(q_i − q_j), consider the (spatial or configurational) correlation functions defined, in the grand canonical distribution with parameters β, λ (and empty boundary conditions), by

\rho_\Lambda(q_1, \ldots, q_n) \overset{def}{=} \frac{1}{Z_{gc}(\beta, \lambda, V)} \sum_{m=0}^{\infty} z^{n+m} \int_{\Lambda} e^{-\beta \Phi(q_1, \ldots, q_n, y_1, \ldots, y_m)}\, \frac{dy_1 \cdots dy_m}{m!}   [32]

This is the probability density for finding particles with any momentum in the volume element dq_1 ... dq_n (irrespective of where other particles are), and z = e^{βλ} (\sqrt{2πmβ^{−1}h^{−2}})^d accounts for the integration over the momenta variables and is called the activity: it has the dimension of a density (cf. [23]).
Assuming that the potential has a hard core (for simplicity) of radius R, the interaction energy Φ_{q_1}(q_2, ..., q_n) of a particle at q_1 with any number of other particles at q_2, ..., q_n with |q_i − q_j| > R is bounded below by −B for some B ≥ 0 (related but not equal to the B in [14]). The functions ρ_Λ will be regarded as a sequence of functions of one, two, ... particle positions: ρ_Λ = {ρ_Λ(q_1, ..., q_n)}_{n=1}^{∞} vanishing for q_j ∉ Λ. Then, one checks that

\rho_\Lambda(q_1, \ldots, q_n) = z\, \delta_{n,1}\, \chi_\Lambda(q_1) + z\, K \rho_\Lambda(q_1, \ldots, q_n)   [33a]

with

K \rho_\Lambda(q_1, \ldots, q_n) \overset{def}{=} e^{-\beta \Phi_{q_1}(q_2, \ldots, q_n)} \Big( \rho_\Lambda(q_2, \ldots, q_n)\, \delta_{n>1} + \sum_{s=1}^{\infty} \int_{\Lambda} \frac{dy_1 \cdots dy_s}{s!} \prod_{k=1}^{s} \big( e^{-\beta \varphi(q_1 - y_k)} - 1 \big)\, \rho_\Lambda(q_2, \ldots, q_n, y_1, \ldots, y_s) \Big)   [33b]

where δ_{n,1}, δ_{n>1} are Kronecker deltas and χ_Λ(q) is the indicator function of Λ. Equation [33] is called the Kirkwood-Salsburg equation for the family of correlation functions in Λ. The kernel K of the equations is independent of Λ, but the domain of integration is Λ.
Calling α_Λ the sequence of functions α_Λ(q_1, ..., q_n) ≡ 0 if n ≠ 1 and α_Λ(q) = χ_Λ(q), a recursive expansion arises, namely

\rho_\Lambda = z\, \alpha_\Lambda + z^2 K \alpha_\Lambda + z^3 K^2 \alpha_\Lambda + z^4 K^3 \alpha_\Lambda + \cdots   [34]

It gives the correlation functions, provided the series converges. The inequality

|K^p \alpha_\Lambda(q_1, \ldots, q_n)| \le e^{(2\beta B + 1) p} \Big( \int |e^{-\beta \varphi(q)} - 1|\, dq \Big)^{p} \overset{def}{=} e^{(2\beta B + 1) p}\, r(\beta)^{3p}   [35]

shows that the series [34], called Mayer's series, converges if |z| < e^{−(2βB+1)} r(β)^{−3}. Convergence is uniform (as Λ → ∞) and (K^p α_Λ)(q_1, ..., q_n) tends to a limit as V → ∞ at fixed q_1, ..., q_n and the limit is simply (K^p α)(q_1, ..., q_n), if α(q_1, ..., q_n) ≡ 0 for n ≠ 1, and α(q_1) ≡ 1. This is because the kernel K contains
the factors (e(q1 y)  1) which decay rapidly or, if therefore their configurations do not contain
has finite range, will eventually even vanish. It momentum variables.
is also clear that (Kp )(q1 , ... , qn ) is translation The interaction energy is just the potential
invariant. energy, and ensembles are defined as collections of
Hence, if jzje2B1 r()3 < 1, the limits, as  ! 1, probability distributions on the position coordinates
of the correlation functions exist and can be of the particle configurations. Usually, the potential
computed by a convergent power series in z; the is a pair potential decaying fast at 1 and, often,
correlation functions will be translation invariant (in with a hard-core forbidding double or higher
the thermodynamic limit). occupancy of the same lattice site. For instance,
In particular, the one-point correlation function the lattice gas with potential , in a cubic box 
 = (q) is  = z(1 O(zr()3 )), which, to lowest order with jj = V = Ld sites of a square lattice with mesh
in z, just shows that activity and density essentially a > 0, is defined by the potential energy attributed
coincide when they are small enough. Furthermore, to the configuration X of occupied distinct sites,
p = (1=V) log Zgc (, , V) is such that i.e., subsets X :
Z X
1
z@z p  q dq HX  x  y 37
V x;y2X

(from the definition of  in [32]). Therefore,


where the sum is over pairs of distinct points in X.
1 The canonical ensemble and the grand canonical
p; z lim log Zgc ; ; V
V!1 V ensemble are the collections of distributions, para-
Z z 0
dz metrized by (, ), ( = N=V), or, respectively, by
0
; z0 36 (, ), attributing to X the probability
0 z

and, since the density $\rho$ is analytic in $z$ as well and $\rho \simeq z$ for $z$ small, the grand canonical pressure is analytic in the density and $\beta p = \rho\,(1 + O(\rho^2))$, at small density. In other words, the equation of state is, to lowest order, essentially the equation of a perfect gas. All quantities that are conceivably of some interest turn out to be analytic functions of temperature and density. The system is essentially a free gas and it has no phase transitions in the sense of a discontinuity or of a singularity in the dependence of a thermodynamic function in terms of others. Furthermore, the system cannot show phase transitions in the sense of sensitive dependence on boundary conditions of fixed external particles. This also follows, with some extra work, from the Kirkwood-Salzburg equations.

The reader is referred to Ruelle (1969) and Gallavotti (1969) for more details.

Lattice Models

The problem of proving the existence of phase transitions in models of homogeneous gases with pair interactions is still open. Therefore, it makes sense to study the problem of phase transitions in simpler models, tractable to some extent but nontrivial, and which are of practical interest in their own right.

The simplest models are the so-called lattice models in which particles are constrained to points of a lattice: they cannot move in the ordinary sense of the word (but, of course, they could jump) and

\[ p_{\beta,\rho}(X) = \frac{e^{-\beta H(X)}}{Z^{c}_{p}(\beta, N, \Lambda)}\,\delta_{|X|,N} \tag{38a} \]

or

\[ p_{\beta,\lambda}(X) = \frac{e^{\beta\lambda |X|}\, e^{-\beta H(X)}}{Z^{gc}_{p}(\beta, \lambda, \Lambda)} \tag{38b} \]

where the denominators are normalization factors that can, respectively, be called, in analogy with the theory of continuous systems, canonical and grand canonical partition functions; the subscript $p$ stands for particles.

A lattice gas in which in each site there can be at most one particle can be regarded as a model for the distribution of a family of spins on a lattice. Such models are quite common and useful (e.g., they arise in studying systems with magnetic properties). Simply identify an occupied site with a spin up or $+$ and an empty site with a spin down or $-$ (say). If $\sigma = \{\sigma_x\}_{x\in\Lambda}$ is a spin configuration, the energy of the configuration for potential $\varphi$ and magnetic field $h$ will be

\[ H(\sigma) = -\sum_{(x,y)\in\Lambda} \varphi(x-y)\,\sigma_x\sigma_y - h\sum_{x}\sigma_x \tag{39} \]

with the sum running over pairs $(x, y) \in \Lambda$ of distinct sites. If $\varphi(x-y) \equiv J_{xy} \ge 0$, the model is called a ferromagnetic Ising model. As in the case of continuous systems, it will be assumed to have a finite range for $\varphi$: that is, $\varphi(x) = 0$ for $|x| > R$, for some $R$, unless explicitly stated otherwise.
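The energy [39] can be evaluated directly on a small box. The following is a minimal sketch, assuming a nearest-neighbor potential, empty boundary conditions, and a pair sum over unordered pairs of distinct sites; the function names (`nearest_neighbor_potential`, `spins_from_occupation`, `ising_energy`) are illustrative and not part of the article.

```python
import itertools


def nearest_neighbor_potential(J):
    """Pair potential phi(x - y): J for nearest neighbors, 0 otherwise (range R = 1)."""
    def phi(displacement):
        return J if sum(abs(c) for c in displacement) == 1 else 0.0
    return phi


def spins_from_occupation(occupied, box):
    """Identify an occupied site with spin +1 and an empty site with spin -1."""
    return {x: (1 if x in occupied else -1) for x in box}


def ising_energy(sigma, phi, h=0.0):
    """Energy [39]; the pair sum runs over unordered pairs of distinct sites."""
    pair_sum = 0.0
    for x, y in itertools.combinations(sorted(sigma), 2):
        displacement = tuple(a - b for a, b in zip(x, y))
        pair_sum += phi(displacement) * sigma[x] * sigma[y]
    return -pair_sum - h * sum(sigma.values())


# A 2 x 2 box with empty boundary conditions has 4 nearest-neighbor bonds.
box = [(i, j) for i in range(2) for j in range(2)]
phi = nearest_neighbor_potential(J=1.0)
all_up = spins_from_occupation(set(box), box)
checkerboard = spins_from_occupation({(0, 0), (1, 1)}, box)
energy_all_up = ising_energy(all_up, phi, h=0.5)
energy_checkerboard = ising_energy(checkerboard, phi, h=0.5)
```

With a ferromagnetic coupling the fully aligned configuration has the lowest energy, while the checkerboard configuration, in which every bond joins opposite spins, has the highest pair energy.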

The canonical and grand canonical ensembles in the box $\Lambda$ with respective parameters $(\beta, m)$ or $(\beta, h)$ will be defined as the probability distributions on the spin configurations $\sigma = \{\sigma_x\}_{x\in\Lambda}$ with $\sum_{x\in\Lambda}\sigma_x = M = mV$ or without constraint on $M$, respectively; hence,

\[ p_{\beta,m}(\sigma) = \frac{\exp\big(\beta\sum_{(x,y)}\varphi(x-y)\,\sigma_x\sigma_y\big)}{Z^{c}_{s}(\beta, M, \Lambda)}, \qquad p_{\beta,h}(\sigma) = \frac{\exp\big(\beta h\sum_{x}\sigma_x + \beta\sum_{(x,y)}\varphi(x-y)\,\sigma_x\sigma_y\big)}{Z^{gc}_{s}(\beta, h, \Lambda)} \tag{40} \]

where the denominators are normalization factors again called, respectively, the canonical and grand canonical partition functions. As in the study of the previous continuous systems, canonical and grand canonical ensembles with "external fixed particle configurations" can be defined together with the corresponding ensembles with "external fixed spin configurations"; the subscript $s$ stands for spins.

For each configuration $X \subset \Lambda$ of a lattice gas, let $\{n_x\}$ be $n_x = 1$ if $x \in X$ and $n_x = 0$ if $x \notin X$. Then the transformation $\sigma_x = 2n_x - 1$ establishes a correspondence between lattice gas and spin distributions. In the correspondence, the potential $\varphi(x-y)$ of the lattice gas generates a potential $(1/4)\varphi(x-y)$ for the corresponding spin system and the chemical potential $\lambda$ for the lattice gas is associated with a magnetic field $h$ for the spin system with $h = (1/2)\big(\lambda + \sum_{x\neq 0}\varphi(x)\big)$.

The correspondence between boundary conditions is natural: for instance, a boundary condition for the lattice gas in which all external sites are occupied becomes a boundary condition in which external sites contain a spin $+$. The close relation between lattice gas and spin systems permits switching from one to the other with little discussion.

In the case of spin systems, empty boundary conditions are often considered (no spins outside $\Lambda$). In lattice gases and spin systems (as well as in continuum systems), often periodic and semiperiodic boundary conditions are considered (i.e., periodic in one or more directions and with empty or fixed external particles or spins in the others).

Thermodynamic limits for the partition functions

\[ \begin{aligned}
-\beta f(\beta, v) &= \lim_{\Lambda\to\infty,\ V/N = v} \frac{1}{N}\log Z^{c}_{p}(\beta, N, \Lambda) \\
\beta p(\beta, \lambda) &= \lim_{\Lambda\to\infty} \frac{1}{V}\log Z^{gc}_{p}(\beta, \lambda, \Lambda) \\
-\beta g(\beta, m) &= \lim_{\Lambda\to\infty,\ M/V\to m} \frac{1}{V}\log Z^{c}_{s}(\beta, M, \Lambda) \\
\beta f(\beta, h) &= \lim_{\Lambda\to\infty} \frac{1}{V}\log Z^{gc}_{s}(\beta, h, \Lambda)
\end{aligned} \tag{41} \]

can be shown to exist by a method similar to the one discussed in Appendix 2. They have convexity and continuity properties as in the cases of the continuum systems. In the case of a lattice gas, the $f$, $p$ functions are still interpreted as free energy and pressure, respectively. In the case of spin, $f(\beta, h)$ has the interpretation of magnetic free energy, while $g(\beta, m)$ does not have a special name in the thermodynamics of magnetic systems. As in the continuum systems, it is occasionally useful to define infinite-volume equilibrium states:

Definition An infinite-volume state with parameters $(\beta, h)$ or $(\beta, m)$ is a collection of average values $F \to \langle F\rangle$ obtained, respectively, as limits of finite-volume averages $\langle F\rangle_{\Lambda_n}$ defined from canonical or grand canonical distributions in $\Lambda_n$ with fixed parameters $(\beta, h)$ or $(\beta, m)$, or $(u, v)$ and with general boundary condition of fixed external spins or empty sites, on sequences $\Lambda_n \to \infty$ for which such limits exist simultaneously for all local observables $F$.

This is taken verbatim from the definition in the section "Phase transitions and boundary conditions." In this way, it makes sense to define the spin correlation functions for $X = (x_1, \ldots, x_n)$ as $\langle\sigma_X\rangle$ if $\sigma_X = \prod_j \sigma_{x_j}$. For instance, we shall call $\rho(x_1, x_2) \stackrel{\rm def}{=} \langle\sigma_{x_1}\sigma_{x_2}\rangle$ and a pure phase can be defined as an infinite-volume state such that

\[ \langle\sigma_X\sigma_{Y+x}\rangle - \langle\sigma_X\rangle\langle\sigma_{Y+x}\rangle \xrightarrow[x\to\infty]{} 0 \tag{42} \]

Again, for more details, we refer the reader to Ruelle (1969) and Gallavotti (1969).

Thermodynamic Limits and Inequalities

An interesting property of lattice systems is that it is possible to study delicate questions like the existence of infinite-volume states in some (moderate) generality. A typical tool is the use of inequalities. As the simplest example of a vast class of inequalities, consider the ferromagnetic Ising model with some finite (but arbitrary) range interaction $J_{xy} \ge 0$ in a field $h_x \ge 0$: $J$, $h$ may even be not translationally invariant. Then the average of $\sigma_X \stackrel{\rm def}{=} \sigma_{x_1}\sigma_{x_2}\cdots\sigma_{x_n}$, $X = (x_1, \ldots, x_n)$, in a state with "empty boundary conditions" (i.e., no external spins) satisfies the inequalities

\[ \langle\sigma_X\rangle,\ \partial_{h_x}\langle\sigma_X\rangle,\ \partial_{J_{xy}}\langle\sigma_X\rangle \ \ge\ 0, \qquad X = (x_1, \ldots, x_n) \]

More generally, let $H(\sigma)$ in [39] be replaced by $H(\sigma) = -\sum_X J_X\sigma_X$ with $J_X \ge 0$ and $X$ can be any finite set; then, if $Y = (y_1, \ldots, y_n)$, $X = (x_1, \ldots, x_n)$, the following Griffiths inequalities hold:

\[ \langle\sigma_X\rangle \ge 0, \qquad \partial_{J_Y}\langle\sigma_X\rangle \equiv \langle\sigma_X\sigma_Y\rangle - \langle\sigma_X\rangle\langle\sigma_Y\rangle \ge 0 \tag{43} \]

The inequalities can be used to check, in ferromagnetic Ising models, [39], existence of infinite-volume states (cf. the sections "Phase transitions and boundary conditions" and "Lattice models") obtained by fixing the boundary condition $B$ to be either "all external spins $+$" or "all external sites empty." If $\langle F\rangle_{B,\Lambda}$ denotes the grand canonical average with boundary condition $B$ and any fixed $\beta, h > 0$, this means that for all local observables $F(\sigma_\Lambda)$ (i.e., for all $F$ depending on the spin configuration in any fixed region $\Lambda$) all the following limits exist:

\[ \lim_{\Lambda\to\infty} \langle F\rangle_{B,\Lambda} = \langle F\rangle_{B} \tag{44} \]

The reason is that the inequalities [43] imply that all averages $\langle\sigma_X\rangle_{B,\Lambda}$ are monotonic in $\Lambda$ for all fixed $X \subset \Lambda$: so the limit [44] exists for $F(\sigma) = \sigma_X$. Hence, it exists for all $F$'s depending only on finitely many spins, because any local function $F$ "measurable in $\Lambda$" can be expressed (uniquely) as a linear combination of functions $\sigma_X$ with $X \subseteq \Lambda$.

Monotonicity with empty boundary conditions is seen by considering the sites outside $\Lambda$ and in a region $\Lambda'$ with side one unit larger than that of $\Lambda$ and imagining that the couplings $J_X$ with $X \subset \Lambda'$ but $X \not\subset \Lambda$ vanish. Then, $\langle\sigma_X\rangle_{\Lambda'} \ge \langle\sigma_X\rangle_{\Lambda}$, because $\langle\sigma_X\rangle_{\Lambda'}$ is an average computed with a distribution corresponding to an energy with the couplings $J_X$ with $X \not\subset \Lambda$, but $X \subset \Lambda'$, changed from $0$ to $J_X \ge 0$.

Likewise, if the boundary condition is $+$, then enlarging the box from $\Lambda$ to $\Lambda'$ corresponds to decreasing an external field $h$ acting on the external spins from $+\infty$ (which would force all external spins to be $+$) to a finite value $h \ge 0$: so, increasing the box $\Lambda$ causes $\langle\sigma_X\rangle_{+,\Lambda}$ to decrease. Therefore, as $\Lambda$ increases, Ising ferromagnets' spin correlations increase if the boundary condition is empty and decrease if it is $+$.

The inequalities can be used in similar ways to prove that the infinite-volume states obtained from $+$ or empty boundary conditions are translation invariant; and that in zero external field, $h = 0$, the $+$ and $-$ boundary conditions generate pure states if the interaction potential is only a pair ferromagnetic interaction.

There are many other important inequalities which can be used to prove several existence theorems along very simple paths. Unfortunately, their use is mostly restricted to lattice systems and requires very special assumptions on the energy (e.g., ferromagnetic interactions in the above example). The quoted examples were among the first discovered and provide a way to exhibit nontrivial thermodynamic limits and pure states.

For more details, see Ruelle (1969), Lebowitz (1974), Gallavotti (1999), Lieb and Thirring (2001), and Lieb (2002).

Symmetry-Breaking Phase Transitions

The simplest phase transitions (see the section "Phase transitions and boundary conditions") are symmetry-breaking transitions in lattice systems: they take place when the energy of the system in a container $\Lambda$ and with some special boundary condition (e.g., periodic, antiperiodic, or empty) is invariant with respect to the action of a group $G$ on phase space. This means that on the points $x$ of phase space acts a group of transformations $G$ so that with each $\gamma \in G$ is associated a map $x \to x\gamma$ which transforms $x$ into $x\gamma$ respecting the composition law in $G$, that is, $(x\gamma)\gamma' \equiv x(\gamma\gamma')$. If $F$ is an observable, the action of the group on phase space induces an action on the observable $F$ changing $F(x)$ into $F_\gamma(x) \stackrel{\rm def}{=} F(x\gamma^{-1})$.

A symmetry-breaking transition occurs when, by fixing suitable boundary conditions and taking the thermodynamic limit, a state $F \to \langle F\rangle$ is obtained in which some local observable shows a nonsymmetric average $\langle F\rangle \neq \langle F_\gamma\rangle$ for some $\gamma$.

An example is provided by the nearest-neighbor ferromagnetic Ising model on a $d$-dimensional lattice with energy function given by [39] with $h = 0$ and $\varphi(x-y) \equiv 0$ unless $|x-y| = 1$, i.e., unless $x, y$ are nearest neighbors, in which case $\varphi(x-y) = J > 0$. With periodic or empty boundary conditions, it exhibits a discrete "up-down" symmetry $\sigma \to -\sigma$.

Figure 2 The dashed line is the boundary of $\Lambda$; the outer spins correspond to the $\pm$ boundary condition. The points A, B are points where an open line ends. (Diagram not reproduced; it marks the points A, O, B.)

Instability with respect to boundary conditions can be revealed by considering the two boundary conditions, denoted $+$ or $-$, in which the lattice sites outside the container $\Lambda$ are either occupied by spins $+$ or by spins $-$. Consider also, for later reference, (1) the boundary conditions in which the boundary spins in the upper half of the boundary are $+$ and the ones in the lower part are $-$: call this the $\pm$-boundary condition (see Figure 2); or (2) the boundary conditions in

which some of the opposite sides of $\Lambda$ are identified while $+$ or $-$ conditions are assigned on the remaining sides: call these "cylindrical or semiperiodic boundary conditions."

A new description of the spin configurations is useful: given $\sigma$, draw a unit segment perpendicular to the center of each bond $b$ having opposite spins at its extremes. An example of this construction is provided by Figure 2 for the boundary condition $\pm$. The set of segments can be grouped into lines separating regions where the spins are positive from regions where they are negative. If the boundary condition is $+$ or $-$, the lines form "closed polygons," whereas, if the condition is $\pm$, there is also a single polygon $\lambda_1$ which is not closed (as in Figure 2). If the boundary condition is periodic or cylindrical, all polygons are closed but some may "go around" $\Lambda$. The polygons are also called contours and the length of a polygon $\gamma$ will be denoted $|\gamma|$.

The correspondence $(\gamma_1, \gamma_2, \ldots, \gamma_n, \lambda_1) \leftrightarrow \sigma$, for the boundary condition $\pm$ or, for the boundary condition $+$ (or $-$), $\sigma \leftrightarrow (\gamma_1, \ldots, \gamma_n)$ is one-to-one and, if $h = 0$, the energy $H_\Lambda(\sigma)$ of a configuration is higher than $-J\cdot(\text{number of bonds in } \Lambda)$ by an amount $2J\big(|\lambda_1| + \sum_i |\gamma_i|\big)$ or, respectively, $2J\sum_i |\gamma_i|$. The grand canonical probability of each spin configuration is therefore proportional, if $h = 0$, respectively, to

\[ e^{-2\beta J\left(|\lambda_1| + \sum_i |\gamma_i|\right)} \quad\text{or}\quad e^{-2\beta J\sum_i |\gamma_i|} \tag{45} \]

and the "up-down" symmetry is clearly reflected by [45].

The average $\langle\sigma_x\rangle_{+,\Lambda}$ of $\sigma_x$ with $+$ boundary conditions is given by $\langle\sigma_x\rangle_{+,\Lambda} = 1 - 2P_{+,\Lambda}(-)$, where $P_{+,\Lambda}(-)$ is the probability that the spin $\sigma_x$ is $-1$. If the site $x$ is occupied by a negative spin then the point $x$ is inside some contour $\gamma$ associated with the spin configuration $\sigma$ under consideration. Hence, if $\rho(\gamma)$ is the probability that a given contour belongs to the set of contours describing a configuration $\sigma$, it is $P_{+,\Lambda}(-) \le \sum_{\gamma\, o\, x}\rho(\gamma)$ where $\gamma\, o\, x$ means that $\gamma$ "surrounds" $x$.

If $\Gamma = (\gamma_1, \ldots, \gamma_n)$ is a spin configuration and if the symbol $\Gamma\ \mathrm{comp}\ \gamma$ means that the contour $\gamma$ is "disjoint" from $\gamma_1, \ldots, \gamma_n$ (i.e., $\{\gamma \cup \Gamma\}$ is a new spin configuration), then

\[ \rho(\gamma) = \frac{\sum_{\Gamma\ni\gamma} e^{-2\beta J\sum_{\gamma'\in\Gamma}|\gamma'|}}{\sum_{\Gamma} e^{-2\beta J\sum_{\gamma'\in\Gamma}|\gamma'|}} \equiv e^{-2\beta J|\gamma|}\ \frac{\sum_{\Gamma\ \mathrm{comp}\ \gamma} e^{-2\beta J\sum_{\gamma'\in\Gamma}|\gamma'|}}{\sum_{\Gamma} e^{-2\beta J\sum_{\gamma'\in\Gamma}|\gamma'|}} \le e^{-2\beta J|\gamma|} \tag{46} \]

because the last ratio in [46] does not exceed 1. Note that there are at most $3^p$ different shapes of $\gamma$ with perimeter $p$ and at most $p^2$ congruent $\gamma$'s containing $x$; therefore, the probability that the spin at $x$ is $-$ when the boundary condition is $+$ satisfies the inequality

\[ P_{+,\Lambda}(-) \le \sum_{p=4}^{\infty} p^2\, 3^p\, e^{-2\beta J p} \xrightarrow[\beta\to\infty]{} 0 \]

This probability can be made arbitrarily small so that $\langle\sigma_x\rangle_{+,\Lambda}$ is estimated by a quantity which is as close to 1 as desired provided $\beta$ is large enough and the closeness of $\langle\sigma_x\rangle_{+,\Lambda}$ to 1 is estimated by a quantity which is both $x$ and $\Lambda$ independent.

A similar argument for the $(-)$-boundary condition, or the remark that for $h = 0$ it is $\langle\sigma_x\rangle_{-,\Lambda} = -\langle\sigma_x\rangle_{+,\Lambda}$, leads to conclude that, at large $\beta$, $\langle\sigma_x\rangle_{-,\Lambda} \neq \langle\sigma_x\rangle_{+,\Lambda}$ and the difference between the two quantities is positive uniformly in $\Lambda$. This is the proof (Peierls' theorem) of the fact that there is, if $\beta$ is large, a strong instability of the magnetization with respect to the boundary conditions, i.e., the nearest-neighbor Ising model in dimension 2 (or greater, by an identical argument) has a phase transition. If the dimension is 1, the argument clearly fails and no phase transition occurs (see the section "Absence of phase transitions: d = 1").

For more details, see Gallavotti (1999).

Finite-Volume Effects

The description in the last section of the phase transition in the nearest-neighbor Ising model can be made more precise both from physical and mathematical points of view giving insights into the nature of the phase transitions. Assume that the boundary condition is the $(+)$-boundary condition and describe a spin configuration $\sigma$ by means of the associated closed disjoint polygons $(\gamma_1, \ldots, \gamma_n)$. Attribute to $\sigma = (\gamma_1, \ldots, \gamma_n)$ a probability proportional to [45]. Then the following Minlos-Sinai's theorem holds:

Theorem If $\beta$ is large enough there exist $C > 0$, $\rho(\gamma) > 0$ with $\rho(\gamma) \le e^{-2\beta J|\gamma|}$ and such that a spin configuration $\sigma$ randomly chosen out of the grand canonical distribution with $+$ boundary conditions and $h = 0$ will contain, with probability approaching 1 as $\Lambda \to \infty$, a number $K_{(\gamma)}(\sigma)$ of contours congruent to $\gamma$ such that

\[ \big|K_{(\gamma)}(\sigma) - \rho(\gamma)\,|\Lambda|\big| \le C\sqrt{|\Lambda|}\; e^{-\beta J|\gamma|} \tag{47} \]

and this relation holds simultaneously for all $\gamma$'s.
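The Peierls estimate $P_{+,\Lambda}(-) \le \sum_{p\ge 4} p^2\,3^p\,e^{-2\beta J p}$ is easy to evaluate numerically. The sketch below, with the hypothetical helper names `peierls_bound` and `magnetization_lower_bound`, truncates the series at a large cutoff `p_max` (an assumption made for illustration; the series converges only when $3e^{-2\beta J} < 1$).

```python
import math


def peierls_bound(beta_J, p_max=2000):
    """Partial sum of sum_{p>=4} p^2 3^p exp(-2 beta J p); infinite if the series diverges."""
    log_ratio = math.log(3.0) - 2.0 * beta_J  # log of the geometric factor 3 * exp(-2 beta J)
    if log_ratio >= 0.0:
        return math.inf
    return sum(p * p * math.exp(p * log_ratio) for p in range(4, p_max + 1))


def magnetization_lower_bound(beta_J):
    """Lower bound on <sigma_x> with + boundary conditions: 1 - 2 * (Peierls bound)."""
    return 1.0 - 2.0 * peierls_bound(beta_J)


bound_cold = peierls_bound(2.0)      # low temperature: the bound is small
bound_colder = peierls_bound(3.0)    # lower temperature: smaller still
bound_hot = peierls_bound(0.5)       # 3 * exp(-1) > 1: the series diverges, no information
m_lower = magnetization_lower_bound(2.0)
```

The bound depends neither on the site $x$ nor on the box $\Lambda$, which is the uniformity used in the argument; it says nothing at temperatures where the series diverges or exceeds $1/2$.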

Thus, there are very few contours (and the larger they are the smaller is, in absolute and relative value, their number): a typical spin configuration in the grand canonical ensemble with $(+)$-boundary conditions is such that the large majority of the spins is "positive" and, in the "sea" of positive spins, there are a few negative spins distributed in small and rare regions (their number, however, is still of order of $|\Lambda|$).

Another consequence of the analysis in the last section concerns the approximate equation of state near the phase transition region at low temperatures and finite $\Lambda$. If $\Lambda$ is finite, the graph of $h$ versus $m_\Lambda(\beta, h)$ will have a rather different behavior depending on the possible boundary conditions. For example, if the boundary condition is $(+)$ or $(-)$, one gets, respectively, the results depicted in Figure 3a and 3b, where $m^*(\beta)$ denotes the spontaneous magnetization (i.e., $m^*(\beta) \stackrel{\rm def}{=} \lim_{h\to 0^+}\lim_{\Lambda\to\infty} m_\Lambda(\beta, h)$).

With periodic or empty boundary conditions, the diagram changes as in Figure 4. The thermodynamic limit $m(\beta, h) = \lim_{\Lambda\to\infty} m_\Lambda(\beta, h)$ exists for all $h \neq 0$ and the resulting graph is in Figure 4b, which shows that at $h = 0$ the limit is discontinuous. It can be proved, if $\beta$ is large enough, that $\infty > \lim_{h\to 0^+}\partial_h m(\beta, h) = \chi(\beta) > 0$ (i.e., the angle between the vertical part of the graph and the rest is sharp).

Figure 3 The $h$ vs $m_\Lambda(\beta, h)$ graphs for $\Lambda$ finite and (a) $+$ and (b) $-$ conditions. (Graphs not reproduced; the marked values are $1$, $\pm m^*(\beta)$, and a transition region of width $O(|\Lambda|^{-1/2})$ in $h$.)

Figure 4 (a) The $h$ vs $m_\Lambda(\beta, h)$ graph for periodic or empty boundary conditions. (b) The discontinuity (at $h = 0$) of the thermodynamic limit. (Graphs not reproduced; the marked values are $1$, $\pm m^*(\beta)$, and $O(|\Lambda|^{-1/2})$.)

Furthermore, it can be proved that $m(\beta, h)$ is analytic in $h$ for $h \neq 0$. If $\beta$ is small enough, analyticity holds at all $h$. For $\beta$ large, the function $f(\beta, h)$ has an essential singularity at $h = 0$: a result that can be interpreted as excluding a naive theory of metastability as a description of states governed by an equation of state obtained from an analytic continuation to negative values of $h$ of $f(\beta, h)$.

The above considerations and results further clarify the meaning of a phase transition for a finite system. For more details, we refer the reader to Gallavotti (1999) and Friedli and Pfister (2004).

Beyond Low Temperatures (Ferromagnetic Ising Model)

A limitation of the results discussed above is the condition of low temperature ("$\beta$ large enough"). A natural problem is to go beyond the low-temperature region and to describe fully the phenomena in the region where boundary condition instability takes place and first develops. A number of interesting partial results are known, which considerably improve the picture emerging from the previous analysis. A striking list, but far from exhaustive, of such results follows and focuses on the properties of ferromagnetic Ising spin systems. The reason for restricting to such cases is that they are simple enough to allow a rather fine analysis, which sheds considerable light on the structure of statistical mechanics suggesting precise formulation

of the problems that it would be desirable to understand in more general systems.

1. Let $z \stackrel{\rm def}{=} e^{\beta h}$ and consider the product of $z^V$ ($V$ is the number of sites $|\Lambda|$ of $\Lambda$) times the partition function with periodic or perfect-wall boundary conditions and with finite-range ferromagnetic interaction, not necessarily nearest-neighbor; a polynomial in $z$ (of degree $2V$) is thus obtained. Its zeros lie on the unit circle $|z| = 1$: this is Lee-Yang's theorem. It implies that the only singularities of $f(\beta, h)$ in the region $0 < \beta < \infty$, $-\infty < h < +\infty$ can be found at $h = 0$.

A singularity can appear only if the point $z = 1$ is an accumulation point of the limiting distribution (as $\Lambda \to \infty$) of the zeros on the unit circle: if the zeros are $z_1, \ldots, z_{2V}$ then

\[ \frac{1}{V}\log z^V Z(\beta, h, \Lambda, \text{periodic}) = 2\beta J + \beta h + \frac{1}{V}\sum_{i=1}^{2V}\log(z - z_i) \]

and if

\[ V^{-1}\cdot\big(\text{number of zeros of the form } z_j = e^{i\theta_j},\ \theta \le \theta_j \le \theta + d\theta\big) \xrightarrow[\Lambda\to\infty]{} \frac{d\rho_\beta(\theta)}{2\pi} \]

it is

\[ \beta f(\beta, h) = 2\beta J + \frac{1}{2\pi}\int_{-\pi}^{\pi}\log\big(z - e^{i\theta}\big)\, d\rho_\beta(\theta) \tag{48} \]

The existence of the measure $d\rho_\beta(\theta)$ follows from the existence of the thermodynamic limit: but $d\rho_\beta(\theta)$ is not necessarily $d\theta$-continuous, i.e., not necessarily proportional to $d\theta$.

2. It can be shown that, with not necessarily a nearest-neighbor interaction, the zeros of the partition function do not move too much under small perturbations of the potential even if one perturbs the energy (at perfect-wall or periodic boundary conditions) into

\[ H'_\Lambda(\sigma) = H_\Lambda(\sigma) + (\delta H_\Lambda)(\sigma), \qquad (\delta H_\Lambda)(\sigma) = \sum_{X\subset\Lambda} J'(X)\,\sigma_X \tag{49} \]

where $J'(X)$ is very general and defined on subsets $X = (x_1, \ldots, x_k) \subset \Lambda$ such that the quantity $\|J'\| = \sup_{y\in\mathbb{Z}^d}\sum_{X\ni y}|J'(X)|$ is small enough. More precisely, with a ferromagnetic pair potential $J$ fixed, suppose that one knows that, when $J' = 0$, the partition function zeros in the variable $z = e^{\beta h}$ lie in a certain closed set $N$ (of the unit circle) in the $z$-plane. Then, if $J' \neq 0$, they lie in a closed set $N_1$, $\Lambda$-independent and contained in a neighborhood of $N$ of width shrinking to 0 when $\|J'\| \to 0$. This allows one to establish various relations between analyticity properties and boundary condition instability as described in (3) below.

3. In the ferromagnetic Ising model, with not necessarily a nearest-neighbor interaction, one says that there is a gap around 0 if $d\rho_\beta(\theta) = 0$ near $\theta = 0$. It can be shown that if $\beta$ is small enough there is a gap for all $h$ of width uniform in $h$.

4. Another question is whether the boundary condition instability is always revealed by the one-spin correlation function (i.e., by the magnetization) or whether it might be shown only by some correlation functions of higher order. It can be proved that no boundary condition instability occurs for $h \neq 0$; at $h = 0$ it is possible only if

\[ \lim_{h\to 0^-} m(\beta, h) \neq \lim_{h\to 0^+} m(\beta, h) \tag{50} \]

5. A consequence of the Griffiths inequalities (cf. the section "Thermodynamic limits and inequalities") is that if [50] is true for a given $\beta_0$ then it is true for all $\beta > \beta_0$. Therefore, item (4) leads to a natural definition of the critical temperature $T_c$ as the least upper bound of the $T$'s such that [50] holds ($k_B T = \beta^{-1}$).

6. If $d = 2$ the free energy of the nearest-neighbor ferromagnetic Ising model has a singularity at $\beta_c$ and the value of $\beta_c$ is known exactly from the exact solutions of the model: $m(\beta, 0^+) \stackrel{\rm def}{=} m^*(\beta) \equiv (1 - \sinh^{-4} 2\beta J)^{1/8}$. The location and nature of the singularities of $f(\beta, 0)$ as a function of $\beta$ remains an open question for $d = 3$. In particular, the question whether there is a singularity of $f(\beta, 0)$ at $\beta = \beta_c$ is open.

7. For $\beta > \beta_c$ there is instability with respect to boundary conditions (see (6) above) and a natural question is: how many "pure" phases can exist in the ferromagnetic Ising model? (cf. the section "Phase transitions and boundary conditions," eqn [22]). Intuition suggests that there should be only two phases: the positively magnetized and the negatively magnetized ones.

One has to distinguish between translation-invariant pure phases and non-translation-invariant ones. It can be proved that, in the case of the two-dimensional nearest-neighbor ferromagnetic Ising models, all infinite-volume states (cf. the section "Lattice models") are translationally invariant. Furthermore, they can be obtained by

considering just the two boundary conditions $+$ and $-$: the latter states are also pure states for models with non-nearest-neighbor ferromagnetic interaction. The solution of this problem has led to the introduction of many new ideas and techniques in statistical mechanics and probability theory.

8. In any dimension $d \ge 2$, for $\beta$ large enough, it can be proved that the nearest-neighbor Ising model has only two translation-invariant phases. If the dimension is $\ge 3$ and $\beta$ is large, the $+$ and $-$ phases exhaust the set of translation-invariant pure phases but there exist non-translation-invariant phases. For $\beta$ close to $\beta_c$, however, the question is much more difficult.

For more details, see Onsager (1944), Lee and Yang (1952), Ruelle (1971), Sinai (1991), Gallavotti (1999), Aizenman (1980), Higuchi (1981), and Friedli and Pfister (2004).

Geometry of Phase Coexistence

Intuition about the phenomena connected with the classical phase transitions is usually based on the properties of the liquid-gas phase transition; this transition is usually experimentally investigated in situations in which the total number of particles is fixed (canonical ensemble) and in presence of an external field (gravity).

The importance of such experimental conditions is obvious; the external field produces a nontranslationally invariant situation and the corresponding separation of the two phases. The fact that the number of particles is fixed determines, on the other hand, the fraction of volume occupied by each of the two phases.

Once more, consider the nearest-neighbor ferromagnetic Ising model: the results available for it can be used to obtain a clear picture of the solution to problems that one would like to solve but which in most other models are intractable with present-day techniques.

It will be convenient to discuss phase coexistence in the canonical ensemble distributions on configurations of fixed total magnetization $M = mV$ (see the section "Lattice models"; [40]). Let $\beta$ be large enough to be in the two-phase region and, for a fixed $\alpha \in (0, 1)$, let

\[ m = \alpha\, m^*(\beta) + (1 - \alpha)\,(-m^*(\beta)) = (2\alpha - 1)\, m^*(\beta) \tag{51} \]

that is, $m$ is in the vertical part of the diagram $m = m(\beta, h)$ at $\beta$ fixed (see Figure 4).

Fixing $m$ as in [51] does not yet determine the separation of the phases in two different regions; for this effect, it will be necessary to introduce some external cause favoring the occupation of a part of the volume by a single phase. Such an asymmetry can be obtained in at least two ways: through a weak uniform external field (in complete analogy with the gravitational field in the liquid-vapor transition) or through an asymmetric field acting only on boundary spins. The latter should have the same qualitative effect as the former, because in a phase transition region a boundary perturbation produces volume effects (see sections "Phase transitions and inequalities" and "Symmetry-breaking phase transitions"). From a mathematical point of view, it is simpler to use a boundary asymmetry to produce phase separations and the simplest geometry is obtained by considering $\pm$-cylindrical or $++$-cylindrical boundary conditions: this means $++$ or $\pm$ boundary conditions periodic in one direction (e.g., in Figure 2 imagine the right and left boundary identified after removing the boundary spins on them).

Spins adjacent to the bases of $\Lambda$ act as symmetry-breaking external fields. The $++$-cylindrical boundary condition should favor the formation inside $\Lambda$ of the positively magnetized phase; therefore, it will be natural to consider, in the canonical distribution, this boundary condition only when the total magnetization is fixed to be the spontaneous magnetization $m^*(\beta)$.

On the other hand, the $\pm$-boundary condition favors the separation of phases (positively magnetized phase near the top of $\Lambda$ and negatively magnetized phase near the bottom). Therefore, it will be natural to consider the latter boundary condition in the case of a canonical distribution with magnetization $m = (2\alpha - 1)\,m^*(\beta)$ with $0 < \alpha < 1$ ([51]). In the latter case, the positive phase can be expected to adhere to the top of $\Lambda$ and to extend, in some sense to be discussed, up to a distance $O(L)$ from it; and then to change into the negatively magnetized pure phase.

To make the phenomenological description precise, consider the spin configurations $\sigma$ through the associated sets of disjoint polygons (cf. the section "Symmetry-breaking phase transitions"). Fix the boundary conditions to be $++$ or $\pm$-cylindrical boundary conditions and note that polygons associated with a spin configuration $\sigma$ are all closed and of two types: the ones of the first type, denoted $\gamma_1, \ldots, \gamma_n$, are polygons which do not encircle $\Lambda$; the second type of polygons, denoted by the symbols $\lambda_\alpha$, are the ones which wind up, at least once, around $\Lambda$.

So, a spin configuration $\sigma$ will be described by a set of polygons; the statistical weight of a configuration $\sigma = (\gamma_1, \ldots, \gamma_n, \lambda_1, \ldots, \lambda_h)$ is (cf. [45]):

\[ e^{-2\beta J\left(\sum_i |\gamma_i| + \sum_j |\lambda_j|\right)} \tag{52} \]
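The weights [45] and [52] both rest on the same bookkeeping: every nearest-neighbor bond joining opposite spins contributes one unit of contour length and raises the energy by $2J$ above the value $-J\cdot(\text{number of bonds})$. The sketch below checks this identity exhaustively for a $3\times 3$ box. It assumes the simpler $+$ boundary condition (all external spins equal to $+1$) rather than the cylindrical ones, and the helper names are illustrative only.

```python
import itertools


def inside(x, L):
    """True if the site x belongs to the L x L box with coordinates 0..L-1."""
    return 0 <= x[0] < L and 0 <= x[1] < L


def bonds(L):
    """Nearest-neighbor bonds with at least one end inside the box (bonds to external spins included)."""
    result = []
    for i in range(-1, L + 1):
        for j in range(-1, L + 1):
            for neighbor in ((i + 1, j), (i, j + 1)):
                if inside((i, j), L) or inside(neighbor, L):
                    result.append(((i, j), neighbor))
    return result


def spin(sigma, x, L):
    """Spin at x: the configuration inside the box, +1 outside (+ boundary condition)."""
    return sigma[x] if inside(x, L) else 1


def energy(sigma, L, J):
    """Nearest-neighbor energy at h = 0, including the interaction with the external + spins."""
    return -J * sum(spin(sigma, x, L) * spin(sigma, y, L) for x, y in bonds(L))


def contour_length(sigma, L):
    """Total contour length: number of bonds whose two end spins are opposite."""
    return sum(1 for x, y in bonds(L) if spin(sigma, x, L) != spin(sigma, y, L))


L, J = 3, 1
sites = [(i, j) for i in range(L) for j in range(L)]
n_bonds = len(bonds(L))

# Check H = -J * (number of bonds) + 2 J * (contour length) for all 2^9 configurations.
identity_holds = all(
    energy(dict(zip(sites, values)), L, J)
    == -J * n_bonds + 2 * J * contour_length(dict(zip(sites, values)), L)
    for values in itertools.product((-1, 1), repeat=len(sites))
)

# A single negative spin at the center is surrounded by one contour of length 4.
single_flip = {x: 1 for x in sites}
single_flip[(1, 1)] = -1
single_flip_length = contour_length(single_flip, L)
```

Since the constant $-J\cdot(\text{number of bonds})$ cancels in the normalization, the probability of a configuration depends only on its total contour length, which is what the exponent of [52] expresses.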

The reason why the contours $\lambda$ that go around the cylinder $\Lambda$ are denoted by $\lambda$ (rather than by $\gamma$) is that they "look like" open contours (see the section "Symmetry-breaking phase transitions") if one forgets that the opposite sides of $\Lambda$ have to be identified. In the case of the $\pm$-boundary conditions then the number of polygons of $\lambda$-type must be odd (hence $\neq 0$), while for the $++$-boundary condition the number of $\lambda$-type polygons must be even (hence it could be 0).

For more details, the reader is referred to Sinai (1991) and Gallavotti (1999).

Separation and Coexistence of Phases

In the context of the geometric description of the spin configuration in the last section, consider the canonical distributions with $++$-cylindrical or the $\pm$-cylindrical boundary conditions and zero field: they will be denoted briefly as $\mu^{++}_{\Lambda,\beta}$, $\mu^{\pm}_{\Lambda,\beta}$, respectively. The following theorem (Minlos-Sinai's theorem) provided the foundations of the microscopic theory of coexistence: it is formulated in dimension $d = 2$ but, modulo obvious changes, it holds for $d \ge 2$.

Theorem For $0 < \alpha < 1$ fixed, let $m = (2\alpha - 1)\,m^*(\beta)$; then for $\beta$ large enough a spin configuration $\sigma = (\gamma_1, \ldots, \gamma_n, \lambda_1, \ldots, \lambda_{2h+1})$ randomly chosen with the distribution $\mu^{\pm}_{\Lambda,\beta}$ enjoys the properties (i)-(iv) below with a $\mu^{\pm}_{\Lambda,\beta}$-probability approaching 1 as $\Lambda \to \infty$:

(i) $\sigma$ contains only one contour of $\lambda$-type and

\[ \big|\,|\lambda| - (1 + \varepsilon(\beta))L\,\big| < o(L) \tag{53} \]

where $\varepsilon(\beta) > 0$ is a suitable ($\alpha$-independent) function of $\beta$ tending to zero exponentially fast as $\beta \to \infty$.

(ii) If $\Lambda^\lambda_+$, $\Lambda^\lambda_-$ denote, respectively, the regions above and below $\lambda$, and $|\Lambda| \equiv V$, $|\Lambda^\lambda_+|$, $|\Lambda^\lambda_-|$ are, respectively, the volumes of $\Lambda$, $\Lambda^\lambda_+$, $\Lambda^\lambda_-$ then

\[ \big|\,|\Lambda^\lambda_+| - \alpha V\,\big| < \kappa(\beta)\,V^{3/4}, \qquad \big|\,|\Lambda^\lambda_-| - (1-\alpha) V\,\big| < \kappa(\beta)\,V^{3/4} \tag{54} \]

where $\kappa(\beta) \to 0$ exponentially fast as $\beta \to \infty$; the exponent 3/4, here and below, is not optimal.

(iii) If $M^\lambda_+ = \sum_{x\in\Lambda^\lambda_+}\sigma_x$ and $M^\lambda_- = \sum_{x\in\Lambda^\lambda_-}\sigma_x$, then

\[ \big|M^\lambda_+ - \alpha\, m^*(\beta)\,V\big| < \kappa(\beta)\,V^{3/4}, \qquad \big|M^\lambda_- + (1-\alpha)\, m^*(\beta)\,V\big| < \kappa(\beta)\,V^{3/4} \tag{55} \]

(iv) If $K^\lambda_\gamma(\sigma)$ denotes the number of contours congruent to a given $\gamma$ and lying in $\Lambda^\lambda_+$ then, simultaneously for all the shapes of $\gamma$:

\[ \big|K^\lambda_\gamma(\sigma) - \rho(\gamma)\,\alpha V\big| \le C\,e^{-\beta J|\gamma|}\,V^{1/2}, \qquad C > 0 \tag{56} \]

where $\rho(\gamma) \le e^{-2\beta J|\gamma|}$ is the same quantity as already mentioned in the text of the theorem of "Finite-volume effects." A similar result holds for the contours below $\lambda$ (cf. the comments on [47]).

The above theorem not only provides a detailed and rather satisfactory description of the phase separation phenomenon, but it also furnishes a precise microscopic definition of the line of separation between the two phases, which should be naturally identified with the (random) line $\lambda$.

A similar result holds in the canonical distribution $\mu^{++}_{\Lambda,\beta,m^*(\beta)}$ where (i) is replaced by: no $\lambda$-type polygon is present, while (ii), (iii) become superfluous, and (iv) is modified in the obvious way. In other words, a typical configuration for the distribution $\mu^{++}_{\Lambda,\beta,m^*(\beta)}$ has the same appearance as a typical configuration of the corresponding grand canonical ensemble with $(+)$-boundary condition (whose properties are described by the theorem given in the section "Beyond low temperatures (ferromagnetic Ising model)").

For more details, see Sinai (1991) and Gallavotti (1999).

Phase Separation Line and Surface Tension

Continuing to refer to the nearest-neighbor Ising ferromagnet, the theorem of the last section means that, if $\beta$ is large enough, then the microscopic line $\lambda$, separating the two phases, is almost straight (since $\varepsilon(\beta)$ is small). The deviations of $\lambda$ from a straight line are more conveniently studied in the grand canonical distributions $\mu^{0}_{\Lambda}$ with boundary condition set to $+1$ in the upper half of $\partial\Lambda$, vertical sites included, and to $-1$ in the lower half: this is illustrated in Figure 2 (see the section "Symmetry-breaking phase transitions"). The results can be converted into very similar results for grand canonical distributions with $\pm$-cylindrical boundary conditions of the last section.

Define $\lambda$ to be rigid if the probability that $\lambda$ passes through the center of the box $\Lambda$ (i.e., 0) does not tend to 0 as $\Lambda \to \infty$; otherwise, it is not rigid.

The notion of rigidity distinguishes between the possibilities for the line $\lambda$ to be "straight." The "excess" length $\varepsilon(\beta)L$ (see [53]) can be obtained in two ways: either the line $\lambda$ is essentially straight (in the geometric sense) with a few "bumps" distributed with a density of order $\varepsilon(\beta)$ or, otherwise, it is only locally straight and with an important part of the excess length being gained through a small bending on a large length scale. In three dimensions a similar phenomenon is possible. Rigidity of $\lambda$, or its failure, can in principle be investigated by optical means;

there can be interference of coherent light scattered by macroscopically separated surface elements of $\lambda$ only if $\lambda$ is rigid in the above sense.

It has been rigorously proved that the line $\lambda$ is not rigid in dimension 2. And, at least at low temperature, the fluctuation of the middle point is of the order $O(\sqrt{L})$. In dimension 3, however, it has been shown that the surface $\lambda$ is rigid at low enough temperature.

A deeper analysis is needed to study the shape of the separation surface under other conditions, for example, with $+$ boundary conditions in a canonical distribution with magnetization intermediate between $\pm m^*(\beta)$. It involves, as a prerequisite, the definition and many properties of the surface tension between the two phases. Here only the definition of surface tension in the case of $\pm$-boundary conditions in the two-dimensional case will be mentioned. If $Z^{++}(\Omega, m^*(\beta))$ and $Z^{+-}(\Omega, m)$ are, respectively, the canonical partition functions for the $++$- and $\pm$-cylindrical boundary conditions, the tension $\tau(\beta)$ is defined as

$$\beta\tau(\beta) = -\lim_{\Omega\to\infty} \frac{1}{L}\,\log\frac{Z^{+-}(\Omega, m)}{Z^{++}(\Omega, m^*(\beta))}$$

The limit can be shown to be $\alpha$-independent for $\beta$ large enough: the definition and its justification are based on the microscopic geometric description in the section "Geometry of phase co-existence". The definition can be naturally extended to higher dimension (and to more general non-nearest-neighbor models). If $d = 2$, the tension $\tau$ can be exactly computed at all temperatures below criticality and is $\beta\tau(\beta) = 2\beta J + \log\tanh\beta J$.

More remarkably, the definition can be extended to define the surface tension $\tau(\beta, n)$ in the direction $n$, that is, when the boundary conditions are such that the line of separation is in the average orthogonal to the unit vector $n$. In this way, if $d = 2$ and $\alpha \in (0, 1)$ is fixed, it can be proved that at low enough temperature the canonical distribution with $+$ boundary conditions and intermediate magnetization $m = (1 - 2\alpha)\,m^*(\beta)$ has typical configurations containing a spin $-$ region of area $\simeq \alpha V$; furthermore, if the container is rescaled to size $L = 1$, the region will have a limiting shape filling an area $\alpha$ bounded by a smooth curve whose form is determined by the classical macroscopic Wulff's theory of the shape of crystals in terms of the surface tension $\tau(n)$.

An interesting question remains open in the three-dimensional case: it is conceivable that the surface, although rigid at low temperature, might become "loose" at a temperature $\tilde T_c$ smaller than the critical temperature $T_c$ (the latter being defined as the highest temperature below which there are at least two pure phases). The temperature $\tilde T_c$, whose existence is rather well established in numerical experiments, would be called the "roughening transition" temperature. The rigidity of $\lambda$ is connected with the existence of translationally noninvariant equilibrium states. The latter exist in dimension $d = 3$, but not in dimension $d = 2$, where the discussed nonrigidity of $\lambda$, established all the way to $T_c$, provides the intuitive reason for the absence of non-translation-invariant states. It has been shown that in $d = 3$ the roughening temperature $\tilde T_c$ necessarily cannot be smaller than the critical temperature of the two-dimensional Ising model with the same coupling.

Note that existence of translationally noninvariant equilibrium states is not necessary for the description of coexistence phenomena. The theory of the nearest-neighbor two-dimensional Ising model is a clear proof of this statement.

The reader is referred to Onsager (1944), van Beyeren (1975), Sinai (1991), Miracle-Solé (1995), Pfister and Velenik (1999), and Gallavotti (1999) for more details.

Critical Points

Correlation functions for a system with short-range interactions and in an equilibrium state (which is a pure phase) have cluster properties (see [22]): their physical meaning is that in a pure phase there is independence between fluctuations occurring in widely separated regions. The simplest cluster property concerns the "pair correlation function", that is, the probability density $\rho(q_1, q_2)$ of finding particles at points $q_1, q_2$ independently of where the other particles may happen to be (see [23]).

In the case of spin systems, the pair correlation $\rho(q_1, q_2) = \langle\sigma_{q_1}\sigma_{q_2}\rangle$ will be considered. The pair correlation of a translation-invariant equilibrium state has a cluster property ([22], [42]), if

$$|\rho(q_1, q_2) - \rho^2| \;\xrightarrow[|q_1 - q_2|\to\infty]{}\; 0 \tag{57}$$

where $\rho$ is the probability density for finding a particle at $q$ (i.e., the physical density of the state) or $\rho = \langle\sigma_q\rangle$ is the average of the value of the spin at $q$ (i.e., the magnetization of the state).

A general definition of critical point is a point $c$ in the space of the parameters characterizing equilibrium states, for example, $\beta, \lambda$ in grand canonical distributions, $\beta, v$ in canonical distributions, or $\beta, h$ in the case of lattice spin systems in a grand canonical
distribution. In systems with short-range interaction (i.e., with $\varphi(r)$ vanishing for $|r|$ large enough) the point $c$ is a critical point if the pair correlation tends to 0 (see [57]) slower than exponentially (e.g., as a power of the distance $|r| = |q_1 - q_2|$).

A typical example is the two-dimensional Ising model on a square lattice and with nearest-neighbor ferromagnetic interaction of size $J$. It has a single critical point at $\beta = \beta_c$, $h = 0$ with $\sinh 2\beta_c J = 1$. The cluster property is that $\langle\sigma_x\sigma_y\rangle - \langle\sigma_x\rangle\langle\sigma_y\rangle \to 0$ for $|x - y|\to\infty$ as

$$A_+(\beta)\,\frac{e^{-\kappa(\beta)|x-y|}}{\sqrt{|x-y|}},\qquad A_-(\beta)\,\frac{e^{-\kappa(\beta)|x-y|}}{|x-y|^{2}},\qquad A_c\,\frac{1}{|x-y|^{1/4}} \tag{58}$$

for $\beta < \beta_c$, $\beta > \beta_c$, or $\beta = \beta_c$, respectively, where $A_\pm(\beta), A_c, \kappa(\beta) > 0$. The properties [58] stem from the exact solution of the model.

At the critical point, several interesting phenomena occur: the lack of exponential decay indicates lack of a length scale over which really distinct phenomena can take place, and properties of the system observed at different length scales are likely to be simply related by suitable scaling transformations. Many efforts have been dedicated to finding ways of understanding quantitatively the scaling properties pertaining to different observables. The result has been the development of the renormalization group approach to critical phenomena (cf. the section "Renormalization group"). The picture that emerges is that the closer the critical point is, the larger becomes the maximal scale of length below which scaling properties are observed. For instance, in a lattice spin system in zero field the magnetization $M_\Lambda |\Lambda|^{-a}$ in a box $\Lambda \subset \Omega$ should have essentially the same distribution for all $\Lambda$'s with side $< l_0(\beta)$ and $l_0(\beta) \to \infty$ as $\beta \to \beta_c$, provided $a$ is suitably chosen. The number $a$ is called a critical exponent.

There are several other critical exponents that can be defined near a critical point. They can be associated with singularities of the thermodynamic function or with the behavior of the correlation functions involving joint densities at two or more than two points. As an example, consider a lattice spin system: then the "$2n$-spins correlation" $\langle\sigma_0\sigma_{\xi_1}\cdots\sigma_{\xi_{2n-1}}\rangle_c$ could behave proportionally to $\chi_{2n}(0, \xi_1, \ldots, \xi_{2n-1})$, $n = 1, 2, 3, \ldots$, for a suitable family of homogeneous functions $\chi_{2n}$, of some degree $\omega_{2n}$, of the coordinates $(\xi_1, \ldots, \xi_{2n-1})$, at least when the reciprocal distances are large but $< l_0(\beta)$ and

$$l_0(\beta) = \mathrm{const}\,(\beta - \beta_c)^{-\nu} \;\xrightarrow[\beta - \beta_c \to 0]{}\; \infty$$

This means that if $\xi_i$ are regarded as points in $\mathbb{R}^d$ there are functions $\chi_{2n}$ such that

$$\chi_{2n}\Big(0, \frac{\xi_1}{\lambda}, \ldots, \frac{\xi_{2n-1}}{\lambda}\Big) = \lambda^{\omega_{2n}}\,\chi_{2n}(0, \xi_1, \ldots, \xi_{2n-1}),\qquad 0 < \lambda \in \mathbb{R} \tag{59}$$

and $\langle\sigma_0\sigma_{\xi_1}\cdots\sigma_{\xi_{2n-1}}\rangle \propto \chi_{2n}(0, \xi_1, \ldots, \xi_{2n-1})$ if $1 \ll |\xi_i - \xi_j| \ll l_0(\beta)$. The numbers $\omega_{2n}$ define a sequence of critical exponents.

Other critical exponents can be associated with approaching the critical point along other directions (e.g., along $h \to 0$ at $\beta = \beta_c$). In this case, the length up to which there are scaling phenomena is $l_0(h) = \ell_o\, h^{-\bar\nu}$. Further, the magnetization $m(h)$ tends to 0 as $h \to 0$ at fixed $\beta = \beta_c$ as $m(h) = m_0\, h^{1/\delta}$ for $\delta > 0$.

None of the features of critical exponents is known rigorously, including their existence. An exception is the case of the two-dimensional nearest-neighbor Ising ferromagnet where some exponents are known exactly (e.g., $\omega_2 = 1/4$, $\omega_{2n} = n\omega_2$, or $\nu = 1$, while $\delta$, $\bar\nu$ are not rigorously known). Nevertheless, for Ising ferromagnets (not even nearest-neighbor but, as always here, finite-range) in all dimensions, all of the exponents mentioned are conjectured to be the same as those of the nearest-neighbor Ising ferromagnet. A further exception is the derivation of rigorous relations between critical exponents and, in some cases, even their values under the assumption that they exist.

Remark Naively it could be expected that in a pure state in zero field with $\langle\sigma_x\rangle = 0$ the quantity $s = |\Lambda|^{-1/2}\sum_{x\in\Lambda}\sigma_x$, if $\Lambda$ is a cubic box of side $\ell$, should have a probability distribution which is Gaussian, with dispersion $\lim_{\Lambda\to\infty}\langle s^2\rangle$. This is usually true, but not always. Properties [58] show that in the $d = 2$ ferromagnetic nearest-neighbor Ising model at the critical point, $\langle s^2\rangle$ diverges proportionally to $\ell^{\,2-\frac14}$ so that the variable $s$ cannot have the above Gaussian distribution. Since $\ell^{\,2-\frac14} = |\Lambda|^{7/8}$, the variable $S = |\Lambda|^{-15/16}\sum_{x\in\Lambda}\sigma_x$ will have a finite dispersion: however, there is no reason that it should be Gaussian. This makes clear the great interest of a fluctuation theory and its relevance for the critical point studies (see the next two sections).

For more details, the reader is referred to Onsager (1944), Domb and Green (1972), McCoy and Wu (1973), and Aizenman (1982).

Fluctuations

As it appears from the discussion in the last section, fluctuations of observables around their averages have interesting properties particularly at critical points. Of particular interest are observables that
are averages, over large volumes $\Lambda$, of local functions $F(x)$ on phase space: this is so because macroscopic observables often have this form. For instance, given a region $\Lambda$ inside the system container $\Omega$, $\Lambda \subset \Omega$, consider a configuration $x = (P, Q)$ and the number of particles $N_\Lambda = \sum_{q\in\Lambda} 1$ in $\Lambda$, or the potential energy $\Phi_\Lambda = \sum_{(q,q')\in\Lambda^2}\varphi(q - q')$ or the kinetic energy $K_\Lambda = \sum_{q\in\Lambda}(1/2m)\,p^2$. In the case of lattice spin systems, consider a configuration $\sigma$ and, for instance, the magnetization $M_\Lambda = \sum_{i\in\Lambda}\sigma_i$ in $\Lambda$. Label the above four examples by $\alpha = 1, \ldots, 4$.

Let $\mu_\alpha$ be the probability distribution describing the equilibrium state in which the quantities $X_\alpha$ are considered; let $x_\alpha \stackrel{\rm def}{=} \langle X_\alpha/|\Lambda|\rangle_{\mu_\alpha}$ and $p \stackrel{\rm def}{=} (X_\alpha - x_\alpha|\Lambda|)/|\Lambda|$. Then typical properties of fluctuations that should be investigated are ($\alpha = 1, \ldots, 4$):

1. for all $\delta > 0$ it is $\lim_{\Lambda\to\infty}\mu_\alpha(|p| > \delta) = 0$ (law of large numbers);
2. there is $D_\alpha > 0$ such that
   $$\mu_\alpha\big(p\sqrt{|\Lambda|}\in[a, b]\big) \;\xrightarrow[\Lambda\to\infty]{}\; \int_a^b \frac{dz}{\sqrt{2\pi D_\alpha}}\; e^{-z^2/2D_\alpha}$$
   (central limit law); and
3. there is an interval $I_\alpha = (p^*_{\alpha,-}, p^*_{\alpha,+})$ and a concave function $F_\alpha(p)$, $p \in I_\alpha$, such that if $[a, b] \subset I_\alpha$ then
   $$\frac{1}{|\Lambda|}\log\mu_\alpha\big(p\in[a, b]\big) \;\xrightarrow[\Lambda\to\infty]{}\; \max_{p\in[a,b]} F_\alpha(p)$$
   (large deviations law).

The law of large numbers provides the certainty of the macroscopic values; the central limit law controls the small fluctuations (of order $\sqrt{|\Lambda|}$) of $X_\alpha$ around its average; and the large deviations law concerns the fluctuations of order $|\Lambda|$.

The relations (1)-(3) above are not always true: they can be proved under further general assumptions if the potential $\varphi$ satisfies [14] in the case of particle systems or if $\sum_q |\varphi(q)| < \infty$ in the case of lattice spin systems. The function $F_\alpha(p)$ is defined in terms of the thermodynamic limits of suitable thermodynamic functions associated with the equilibrium state $\mu_\alpha$. The further assumption is, essentially in all cases, that a suitable thermodynamic function in terms of which $F_\alpha(p)$ will be expressed is smooth and has a nonvanishing second derivative.

For the purpose of a simple concrete example, consider a lattice spin system of Ising type with energy $-\sum_{x,y\in\Omega}\varphi(x - y)\,\sigma_x\sigma_y - \sum_x h\,\sigma_x$ and the fluctuations of the magnetization $M_\Lambda = \sum_{x\in\Lambda}\sigma_x$, $\Lambda \subset \Omega$, in the grand canonical equilibrium states $\mu_{h,\beta}$. Let the free energy be $f(\beta, h)$ (see [41]), let $m = m(h) \stackrel{\rm def}{=} \langle M_\Lambda/|\Lambda|\rangle$ and let $h(m)$ be the inverse function of $m(h)$. If $p = M_\Lambda/|\Lambda|$ the function $F(p)$ is given by

$$F(p) = \beta\big(f(\beta, h(p)) - f(\beta, h) - \partial_h f(\beta, h)\,(h(p) - h)\big) \tag{60}$$

then a quite general result is:

Theorem The relations (1)-(3) hold if the potential satisfies $\sum_x |\varphi(x)| < \infty$ and if $F(p)$ [60] is smooth and $F''(p) \neq 0$ in open intervals around those in which $p$ is considered, that is, around $p = 0$ for the law of large numbers and for the central limit law or in an open interval containing $a, b$ for the case of the large deviations law.

In the cases envisaged, the theory of equivalence of ensembles implies that the function $F$ can also be computed via thermodynamic functions naturally associated with other equilibrium ensembles. For instance, instead of the grand canonical $f(\beta, h)$, one could consider the canonical $g(\beta, m)$ (see [41]), then

$$F(p) = -\beta\big(g(\beta, p) - g(\beta, m) - \partial_m g(\beta, m)\,(p - m)\big) \tag{61}$$

It has to be remarked that there should be a strong relation between the central limit law and the law of large deviations. Setting aside stating the conditions for a precise mathematical theorem, the statement can be efficiently illustrated in the case of a ferromagnetic lattice spin system and with $\Lambda \equiv \Omega$, by showing that the law of large deviations in small intervals, around the average $m(h_0)$, at a value $h_0$ of the external field, is implied by the validity of the central limit law for all values of $h$ near $h_0$ and vice versa (here $\beta$ is fixed). Taking $h_0 = 0$ (for simplicity), the heuristic reasons are the following. Let $\mu_{h,\Omega}$ be the grand canonical distribution in external field $h$. Then:

1. The probability $\mu_{h,\Omega}(p\in dp)$ is proportional, by definition, to $\mu_{0,\Omega}(p\in dp)\,e^{\beta h p|\Omega|}$. Hence, if the central limit law holds for all $h$ near $h_0 = 0$, there will exist two functions $m(h)$ and $D(h) > 0$, defined for $h$ near $h_0 = 0$, with $m(0) = 0$ and
   $$\mu_{0,\Omega}(p\in dp)\,e^{\beta h p|\Omega|} = \mathrm{const}\,\exp\Big(-|\Omega|\,\Big(\frac{(p - m(h))^2}{2D(h)} + o\big((p - m(h))^2\big)\Big)\Big)\,dp \tag{62}$$
2. There is a function $\zeta(m)$ such that $\partial_m\zeta(m(h)) = \beta h$ and $\partial_m^2\zeta(m(h)) = D(h)^{-1}$. (This is obtained by noting that, given $D(h)$, the differential equation $\partial_m h = \beta^{-1}D(h)^{-1}$ with the initial value $h(0) = 0$ determines the function $h(m)$; therefore, $\zeta(m)$ is determined by a second integration, from $\partial_m\zeta(m) = \beta h(m)$.)
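
As a hedged numerical illustration of the two heuristic steps above, the sketch below takes the simplest possible case, which is an assumption made here and not a model treated in the text: independent spins $\sigma_x = \pm 1$ with no interaction, for which $m(h) = \tanh\beta h$ and $D = 1 - m^2$. It integrates $\partial_m^2\zeta = D^{-1}$ twice starting from $\zeta(0) = \partial_m\zeta(0) = 0$, and compares the result with the closed form $\zeta(p) = \frac12[(1+p)\log(1+p) + (1-p)\log(1-p)]$ and with the exact binomial probability of the magnetization in zero field. All function names are illustrative.

```python
import math

def dispersion(m):
    # Assumption: free +/-1 spins, whose variance at mean magnetization m is 1 - m^2.
    return 1.0 - m * m

def zeta_by_integration(p, steps=2000):
    """Integrate zeta''(m) = 1/D(m) twice on [0, p], with zeta(0) = zeta'(0) = 0."""
    dm = p / steps
    slope = 0.0   # zeta'(m), i.e. beta * h(m)
    value = 0.0   # zeta(m)
    for i in range(steps):
        m_mid = (i + 0.5) * dm
        new_slope = slope + dm / dispersion(m_mid)   # midpoint rule for zeta'
        value += 0.5 * (slope + new_slope) * dm      # trapezoid rule for zeta
        slope = new_slope
    return value

def zeta_closed_form(p):
    """Closed form for free spins: zeta'(p) = artanh(p) and zeta(0) = 0."""
    return 0.5 * ((1 + p) * math.log(1 + p) + (1 - p) * math.log(1 - p))

def exact_rate(p, n_spins):
    """-(1/N) log Prob(M = p N) for N independent, unbiased +/-1 spins."""
    n_up = round(n_spins * (1 + p) / 2)
    log_prob = (math.lgamma(n_spins + 1) - math.lgamma(n_up + 1)
                - math.lgamma(n_spins - n_up + 1) - n_spins * math.log(2))
    return -log_prob / n_spins

if __name__ == "__main__":
    for p in (0.1, 0.4, 0.8):
        print(p, zeta_by_integration(p), zeta_closed_form(p), exact_rate(p, 10000))
```

In this free-spin case the three columns should agree up to corrections of order $(\log N)/N$, which is the statement that the zero-field probability of $p$ behaves as $\mathrm{const}\,e^{-\zeta(p)N}$.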
It then follows, heuristically, that the probability of $p$ in zero field has the form $\mathrm{const}\; e^{-\zeta(p)|\Omega|}\,dp$ so that the probability that $p \in [a, b]$ will be $\mathrm{const}\,\exp\big(|\Omega|\max_{p\in[a,b]}(-\zeta(p))\big)$.

Conversely, the large deviations law for $p$ at $h = 0$ implies the validity of the central limit law for the fluctuations of $p$ in all small enough fields $h$: this simply arises from the function $F(p)$ having a negative second derivative.

This means that there is a "duality" between central limit law and large deviation law or that the law of large deviations is a "global version" of the central limit law, in the sense that:

1. if the central limit law holds for $h$ in an interval around $h_0$ then the fluctuations of the magnetization at field $h_0$ satisfy a large deviation law in a small enough interval $J$ around $m(h_0)$; and
2. if a large deviation law is satisfied in an interval around $h_0$ then the central limit law holds for the fluctuations of magnetization around its average in all fields $h$ with $h - h_0$ small enough.

Going beyond the heuristic level in establishing the duality amounts to giving a precise meaning to "small enough" and to discussing which properties of $m(h)$ and $D(h)$, or $F(p)$, are needed to derive properties (1), (2).

For purposes of illustration consider the Ising model with ferromagnetic short-range interaction $\varphi$: then the central limit law holds for all $h$ if $\beta$ is small enough and, under the same condition on $\beta$, the large deviations law holds for all $h$ and all intervals $[a, b] \subset (-1, 1)$. If $\beta$ is not small then the condition $h \neq 0$ has to be added. Hence, the conditions are fairly weak and the apparent exceptions concern the value $h = 0$ and $\beta$ not small, where the statements may become invalid because of possible phase transitions.

In presence of phase transitions, the law of large numbers, the central limit law, and law of large deviations should be reformulated. Basically, one has to add the requirement that fluctuations are considered in pure phases and change, in a natural way, the formulation of the laws. For instance, the large fluctuations of magnetization in a pure phase of the Ising model in zero field and large $\beta$ (i.e., in a state obtained as limit of finite-volume states with $+$ or $-$ boundary conditions) in intervals $[a, b]$ which do not contain the average magnetization $m^*$ are not necessarily exponentially small with the size of $|\Lambda|$: if $[a, b] \subset [-m^*, m^*]$ they are exponentially small but only with the size of the surface of $\Lambda$ (i.e., with $|\Lambda|^{(d-1)/d}$) while they are exponentially small with the volume if $[a, b] \cap [-m^*, m^*] = \emptyset$.

The discussion of the last section shows that at the critical point the nature of the large fluctuations is also expected to change: no central limit law is expected to hold in general because of the example of [58] with the divergence of the average of the normal second moment of the magnetization in a box as the side tends to $\infty$.

For more details the reader is referred to Olla (1987).

Renormalization Group

The theory of fluctuations just discussed concerns only fluctuations of a single quantity. The problem of joint fluctuations of several quantities is also interesting and in fact led to really new developments in the 1970s. It is necessary to restrict attention to rather special cases in order to illustrate some ideas and the philosophy behind the approach.

Consider, therefore, the equilibrium distribution $\mu_0$ associated with one of the classical equilibrium ensembles. To fix the ideas we consider the equilibrium distribution of an Ising energy function $\beta H_0$, having included the temperature factor in the energy: the inclusion is done because the discussion will deal with the properties of $\mu_0$ as a function of $\beta$. It will also be assumed that the average of each spin is zero (no magnetic field, see [39] with $h = 0$). Keeping in mind a concrete case, imagine that $H_0$ is the energy function of the nearest-neighbor Ising ferromagnet in zero field.

Imagine that the volume $\Omega$ of the container has periodic boundary conditions and is very large, ideally infinite. Define the family of blocks $kx$, parametrized by $x \in \mathbb{Z}^d$ and with $k$ an integer, consisting of the lattice sites $\xi$ with $k x_i \le \xi_i < k(x_i + 1)$. This is a lattice of cubic blocks with side size $k$ that will be called the "$k$-rescaled lattice".

Given $\alpha$, the quantities $m_x = k^{-\alpha d}\sum_{\xi\in kx}\sigma_\xi$ are called the block spins and define the map $R^*_{\alpha,k}\,\mu_0 = \mu_k$ transforming the initial distribution on the original spins into the distribution of the block spins. Note that if the initial spins have only two values $\sigma_\xi = \pm 1$, the block spins take values between $-k^d/k^{\alpha d}$ and $k^d/k^{\alpha d}$ at steps of size $2/k^{\alpha d}$. Furthermore, the map $R^*_{\alpha,k}$ makes sense independently of how many values the initial spins can assume, and even if they assume a continuum of values $S_x \in \mathbb{R}$.

Taking $\alpha = 1$ means, for $k$ large, looking at the probability distribution of the joint large fluctuations in the blocks $kx$. Taking $\alpha = 1/2$ corresponds to studying a joint central limit property for the block variables.

Considering a one-parameter family of initial distributions $\mu_0$ parametrized by a parameter $\beta$
(that will be identified with the inverse temperature), typically there will be a unique value $\alpha(\beta)$ of $\alpha$ such that the joint fluctuations of the block variables admit a limiting distribution,

$$\mathrm{prob}_k\big(m_x\in[a_x, b_x],\ x\in\Lambda\big) \;\xrightarrow[k\to\infty]{}\; \int_{\{a_x\}}^{\{b_x\}} g_\Lambda\big((S_x)_{x\in\Lambda}\big)\prod_{x\in\Lambda} dS_x \tag{63}$$

for some distribution $g_\Lambda(z)$ on $\mathbb{R}^\Lambda$.

If $\alpha > \alpha(\beta)$, the limit will then be $\prod_{x\in\Lambda}\delta(S_x)\,dS_x$, or if $\alpha < \alpha(\beta)$ the limit will not exist (because the block variables will be too large, with a dispersion diverging as $k \to \infty$).

It is convenient to choose as sequence of $k \to \infty$ the sequence $k = 2^n$ with $n = 0, 1, \ldots$ because in this way it is $R^*_{\alpha,k} \equiv R^{*n}_{\alpha,1}$ and the limits $k \to \infty$ along the sequence $k = 2^n$ can be regarded as limits on a sequence of iterations of a map $R^*_{\alpha,1}$ acting on the probability distributions of generic spins $S_x$ on the lattice $\mathbb{Z}^d$ (the sequence $3^n$ would be equally suited).

It is even more convenient to consider probability distributions that are expressed in terms of energy functions $H$ which generate, in the thermodynamic limit, a distribution $\mu$: then $R^*_{\alpha,1}$ defines an action $R_\alpha$ on the energy functions so that $R_\alpha H = H'$ if $H$ generates $\mu$, $H'$ generates $\mu'$ and $R^*_{\alpha,1}\,\mu = \mu'$. Of course, the energy function will be more general than [39] and at least a form like $U$ in [49] has to be admitted.

In other words, $R_\alpha$ gives the result of the action of $R^*_{\alpha,1}$ expressed as a map acting on the energy functions. Its iterates also define a semigroup which is called the block spin renormalization group.

While the map $R^*_{\alpha,1}$ is certainly well defined as a map of probability distributions into probability distributions, it is by no means clear that $R_\alpha$ is well defined as a map on the energy functions, because, if $\mu$ is given by an energy function, it is not clear that $R^*_{\alpha,1}\,\mu$ is such.

A remarkable theorem can be (easily) proved when $R^*_{\alpha,1}$ and its iterates act on initial $\mu_0$'s which are equilibrium states of a spin system with short-range interactions and at high temperature ($\beta$ small). In this case, if $\alpha = 1/2$, the sequence of distributions $R^{*n}_{1/2,1}\,\mu_0(\beta)$ admits a limit which is given by a product of independent Gaussians:

$$\mathrm{prob}_k\big(m_x\in[a_x, b_x],\ x\in\Lambda\big) \;\xrightarrow[k\to\infty]{}\; \int_{\{a_x\}}^{\{b_x\}} \prod_{x\in\Lambda}\exp\Big(-\frac{1}{2D}S_x^2\Big)\prod_{x\in\Lambda}\frac{dS_x}{\sqrt{2\pi D}} \tag{64}$$

Note that this theorem is stated without even mentioning the renormalization maps $R^n_{1/2}$: it can nevertheless be interpreted as stating that

$$R^n_{1/2}\,H_0(\beta) \;\xrightarrow[n\to\infty]{}\; \sum_{x\in\mathbb{Z}^d}\frac{1}{2D}\,S_x^2 \tag{65}$$

but the interpretation is not rigorous because [64] does not require that $R^n_{1/2}\,H_0(\beta)$ makes sense for $n \ge 1$. It states that at high temperature block spins have normal independent fluctuations: it is therefore an extension of the central limit law.

There are a few cases in which the map $R_\alpha$ can be rigorously shown to be well defined at least when acting on special equilibrium states like the high-temperature lattice spin systems: but these are exceptional cases of relatively little interest.

Nevertheless, there is a vast literature dealing with approximate representations of the map $R_\alpha$. The reason is that, assuming not only its existence but also that it has the properties that one would normally expect to hold for a map acting on a finite-dimensional space, it follows that a number of consequences can be drawn; quite nontrivial ones as they led to the first theory of the critical point that goes beyond the van der Waals theory described in the section "van der Waals theory".

The argument proceeds essentially as follows. At the critical point, the fluctuations are expected to be anomalous (cf. the last remark in the section "Critical points") in the sense that $\big\langle\big(\sum_{x\in\Lambda}\sigma_x/\sqrt{|\Lambda|}\big)^2\big\rangle$ will tend to $\infty$, because $\alpha = 1/2$ does not correspond to the right fluctuation scale of $\sum_{x\in\Lambda}\sigma_x$, signaling that $R^{*n}_{1/2,1}\,\mu_0(\beta_c)$ will not have a limit but, possibly, there is $\alpha_c > 1/2$ such that $R^{*n}_{\alpha_c,1}\,\mu_0(\beta_c)$ converges to a limit in the sense of [63]. In the case of the critical nearest-neighbor Ising ferromagnet in $d = 2$ it is $\alpha_c = 15/16$, because by [58] the variance of $\sum_{x\in\Lambda}\sigma_x$ grows as $|\Lambda|^{15/8}$ (see the ending remark in the section "Critical points"). Therefore, if the map $R^*_{\alpha_c,1}$ is considered as acting on $\mu_0(\beta)$, it will happen that for all $\beta < \beta_c$, $R^{*n}_{\alpha_c,1}\,\mu_0(\beta)$ will converge to a trivial limit $\prod_{x\in\Lambda}\delta(S_x)\,dS_x$ because the value $\alpha_c$ is greater than 1/2 while normal fluctuations are expected.

If the map $R_{\alpha_c}$ can be considered as a map on the energy functions, this says that $\prod_{x\in\Lambda}\delta(S_x)\,dS_x$ is a "(trivial) fixed point of the renormalization group" which "attracts" the energy functions $\beta H_0$ corresponding to the high-temperature phases.

The existence of the critical $\beta_c$ can be associated with the existence of a nontrivial fixed point $H^*$ for $R_{\alpha_c}$ which is hyperbolic with just one Lyapunov exponent $\lambda > 1$; hence, it has a stable manifold of codimension 1. Call $\mu^*$ the probability distribution corresponding to $H^*$.

The migration towards the trivial fixed point for $\beta < \beta_c$ can be explained simply by the fact that for
such values of $\beta$ the initial energy function $\beta H_0$ is outside the stable manifold of the nontrivial fixed point and under application of the renormalization transformation $R^n_{\alpha_c}$, $\beta H_0$ migrates toward the trivial fixed point, which is attractive in all directions.

By increasing $\beta$, it may happen that, for $\beta = \beta_c$, $\beta H_0$ crosses the stable manifold of the nontrivial fixed point $H^*$ for $R_{\alpha_c}$. Then $R^n_{\alpha_c}\,\beta_c H_0$ will no longer tend to the trivial fixed point but it will tend to $H^*$: this means that the block spin variables will exhibit a completely different fluctuation behavior. If $\beta$ is close to $\beta_c$, the iterations of $R_{\alpha_c}$ will bring $R^n_{\alpha_c}\,\beta H_0$ close to $H^*$, only to be eventually repelled along the unstable direction reaching a distance from it increasing as $\lambda^n|\beta - \beta_c|$.

This means that up to a scale length $O(2^{n(\beta)})$ lattice units with $\lambda^{n(\beta)}|\beta - \beta_c| = 1$ (i.e., up to a scale $O(|\beta - \beta_c|^{-\log_\lambda 2})$), the fluctuations will be close to those of the fixed point distribution $\mu^*$, but beyond that scale they will come close to those of the trivial fixed point: to see them the block spins would have to be normalized with index $\alpha = 1/2$ and they would appear as uncorrelated Gaussian fluctuations (cf. [64], [65]).

The next question concerns finding the nontrivial fixed points, which means finding the energy functions $H^*$ and the corresponding $\alpha_c$ which are fixed points of $R_{\alpha_c}$. If the above picture is correct, the distributions $\mu^*$ corresponding to the $H^*$ would describe the critical fluctuations and, if there was only one choice, or a limited number of choices, of $\alpha_c$ and $H^*$ this would open the way to a universality theory of the critical point hinted already by the "primitive" results of van der Waals' theory.

The initial hope was, perhaps, that there would be a very small number of critical values $\alpha_c$ and $H^*$ possible: but it rapidly faded away leaving, however, the possibility that the critical fluctuations could be classified into universality classes. Each class would contain many energy functions which, upon iterated actions of $R_{\alpha_c}$, would evolve under the control of the trivial fixed point (always existing) for $\beta$ small while, for $\beta = \beta_c$, they would be controlled, instead, by a nontrivial fixed point $H^*$ for $R_{\alpha_c}$ with the same $\alpha_c$ and the same $H^*$. For $\beta < \beta_c$, a "resolution" of the approach to the trivial fixed point would be seen by considering the map $R_{1/2}$ rather than $R_{\alpha_c}$ whose iterates would, however, lead to a Gaussian distribution like [64] (and to a limit energy function like [65]).

The picture is highly hypothetical: but it is the first suggestion of a mechanism leading to critical points with the character of universality and with exponents different from those of the van der Waals theory or, for ferromagnets on a lattice, from those of its lattice version (the Curie-Weiss theory). Furthermore, accepting the approximations (e.g., the Wilson-Fisher $\varepsilon$-expansion) that allow one to pass from the well-defined $R^*_{\alpha,1}$ to the action of $R_\alpha$ on the energy functions, it is possible to obtain quite unambiguously values for $\alpha_c$ and expressions for $H^*$ which are associated with the action of $R_{\alpha_c}$ on various classes of models.

For instance, it can lead to the conclusion that all ferromagnetic finite-range lattice spin systems (with energy functions given by [39]) have critical points controlled by the same $\alpha_c$ and the same nontrivial fixed point: this property is far from being mathematically proved, but it is considered a major success of the theory. One has to compare it with van der Waals' critical point theory: for the first time, an approximation scheme has led, even though under approximations not fully controllable, to computable critical exponents which are not equal to those of the van der Waals theory.

The renormalization group approach to critical phenomena has many variants, depending on which kind of fluctuations are considered and on the models to which it is applied. In statistical mechanics, there are a few mathematically complete applications: certain results in higher dimensions, theory of dipole gas in $d = 2$, hierarchical models, some problems in condensed matter and in statistical mechanics of lattice spins, and a few others. Its main mathematical successes have occurred in various related fields where not only can the philosophy described above be applied but it also leads to renormalization transformations that can be defined precisely and studied in detail: for example, constructive field theory, KAM theory of quasiperiodic motions, and various problems in dynamical systems.

However, the applications always concern special cases and in each of them the general picture of the trivial-nontrivial fixed point dichotomy appears realized but without being accompanied, except in rare cases (like the hierarchical models or the universality theory of maps of the interval), by the full description of stable manifold, unstable direction, and action of the renormalization transformation on objects other than the one of immediate interest (a generality which often looks an intractable problem, but which also turns out not to be necessary).

In the renormalization group context, mathematical physics has played an important role also by providing clear evidence that universality classes could not be too few: this was shown by the numerous exact solutions after Onsager's solution of the nearest-neighbor Ising ferromagnet: there are in fact several lattice models in $d = 2$ that exhibit critical points with some critical exponents exactly computable and that depend continuously on the models' parameters.
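
The block-spin map on probability distributions described in this section can be sketched numerically. The example below is a minimal illustration under assumptions that are not in the text: the lattice is two-dimensional, and the initial spins are independent $\pm 1$ variables (infinite temperature), so no critical behavior is present. It checks the semigroup property (two steps with block side 2 coincide with one step with block side 4) and that, with $\alpha = 1/2$, the block spins have a dispersion close to 1, as in the high-temperature central limit statement. All function names are hypothetical.

```python
import random

DIMENSION = 2  # assumption: square lattice, so each block holds k * k sites

def block_spin(spins, k, alpha):
    """Return block spins m = k**(-alpha*d) * (sum of the spins in each k x k block)."""
    size = len(spins) // k
    scale = k ** (-alpha * DIMENSION)
    return [[scale * sum(spins[k * bx + i][k * by + j]
                         for i in range(k) for j in range(k))
             for by in range(size)]
            for bx in range(size)]

def sample_free_spins(side, seed=0):
    # Assumption: independent +/-1 spins, not a sample of an interacting Ising model.
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(side)] for _ in range(side)]

def variance(block):
    """Sample variance of all block spins in the array."""
    values = [v for row in block for v in row]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

if __name__ == "__main__":
    spins = sample_free_spins(128)
    twice = block_spin(block_spin(spins, 2, 0.5), 2, 0.5)   # two iterations, k = 2
    direct = block_spin(spins, 4, 0.5)                      # one step, k = 4
    gap = max(abs(a - b) for ra, rb in zip(twice, direct) for a, b in zip(ra, rb))
    print("semigroup mismatch:", gap)
    print("variance with alpha = 1/2:", variance(direct))
```

Choosing `alpha = 1.0` instead turns the block spins into block averages confined to $[-1, 1]$, which is the normalization relevant for joint large deviations.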
For more details, we refer the reader to McCoy and Wu (1973), Baxter (1982), Bleher and Sinai (1975), Wilson and Fisher (1972), Gawedzky and Kupiainen (1983, 1985), Benfatto and Gallavotti (1995), and Mastropietro (2004).

Quantum Statistics

Statistical mechanics is extended to assemblies of quantum particles rather straightforwardly. In the case of $N$ identical particles, the observables are operators $O$ on the Hilbert space

$$\mathcal{H}_N = L_2(\Omega)^N_\alpha \quad\text{or}\quad \mathcal{H}_N = \big(L_2(\Omega)\otimes\mathbb{C}^2\big)^N_\alpha$$

where $\alpha = +, -$, of the symmetric ($\alpha = +$, bosonic particles) or antisymmetric ($\alpha = -$, fermionic particles) functions $\psi(Q)$, $Q = (q_1, \ldots, q_N)$, of the position coordinates of the particles or of the position and spin coordinates $\psi(Q, \sigma)$, $\sigma = (\sigma_1, \ldots, \sigma_N)$, normalized so that

$$\int |\psi(Q)|^2\,dQ = 1 \quad\text{or}\quad \sum_{\sigma}\int |\psi(Q, \sigma)|^2\,dQ = 1$$

here only $\sigma_j = \pm 1$ is considered. As in classical mechanics, a state is defined by the average values $\langle O\rangle$ that it attributes to the observables.

Microcanonical, canonical, and grand canonical ensembles can be defined quite easily. For instance, consider a system described by the Hamiltonian ($\hbar$ = Planck's constant)

$$H_N = -\frac{\hbar^2}{2m}\sum_{j=1}^N \Delta_{q_j} + \sum_{j<j'}\varphi(q_j - q_{j'}) + \sum_j w(q_j) \;\stackrel{\rm def}{=}\; K + \Phi \tag{66}$$

where periodic boundary conditions are imagined on $\Omega$ and $w(q)$ is a periodic, smooth potential (the side of $\Omega$ is supposed to be a multiple of the periodic potential period if $w \neq 0$). Then a canonical equilibrium state with inverse temperature $\beta$ and specific volume $v = V/N$ attributes to the observable $O$ the average value

$$\langle O\rangle \;\stackrel{\rm def}{=}\; \frac{\mathrm{tr}\; e^{-\beta H_N}\,O}{\mathrm{tr}\; e^{-\beta H_N}} \tag{67}$$

Similar definitions can be given for the grand canonical equilibrium states.

Remarkably, the ensembles are orthodic and a "heat theorem" (see the section "Heat theorem and ergodic hypothesis") can be proved. However, "equipartition" does not hold: that is, $\langle K\rangle \neq (d/2)\,N\beta^{-1}$, although $\beta^{-1}$ is still the integrating factor of $dU + p\,dV$ in the heat theorem; hence, $\beta^{-1}$ continues to be proportional to temperature.

Lack of equipartition is important, as it solves paradoxes that arise in classical statistical mechanics applied to systems with infinitely many degrees of freedom, like crystals (modeled by lattices of coupled oscillators) or fields (e.g., the electromagnetic field important in the study of black body radiation). However, although this has been the first surprise of quantum statistics (and in fact responsible for the very discovery of quanta), it is by no means the last.

At low temperatures, new unexpected (i.e., with no analogs in classical statistical mechanics) phenomena occur: Bose-Einstein condensation (superfluidity), Fermi surface instability (superconductivity), and appearance of off-diagonal long-range order (ODLRO) will be selected to illustrate the deeply different kinds of problems of quantum statistical mechanics. Largely not yet understood, such phenomena pose very interesting problems not only from the physical point of view but also from the mathematical point of view and may pose challenges even at the level of a definition. However, it should be kept in mind that in the interesting cases (i.e., three-dimensional systems and even most two- and one-dimensional systems) there is no proof that the objects defined below really exist for the systems like [66] (see, however, the final comment for an important exception).

Bose-Einstein Condensation

In a canonical state with parameters $\beta, v$, a definition of the occurrence of Bose condensation is in terms of the eigenvalues $\nu_j(\Omega, N)$ of the kernel $\rho(q, q')$ on $L_2(\Omega)$, called the one-particle reduced density matrix, defined by

$$N\sum_{n=1}^{\infty}\frac{e^{-\beta E_n(\Omega, N)}}{\mathrm{tr}\; e^{-\beta H_N}}\int \psi_n(q, q_1, \ldots, q_{N-1})\;\overline{\psi_n(q', q_1, \ldots, q_{N-1})}\;dq_1\ldots dq_{N-1} \tag{68}$$

where $E_n(\Omega, N)$ are the eigenvalues of $H_N$ and $\psi_n(q_1, \ldots, q_N)$ are the corresponding eigenfunctions. If the $\nu_j$ are ordered by decreasing value, the state with parameters $\beta, v$ is said to contain a Bose-Einstein condensate if $\nu_1(\Omega, N) \ge bN > 0$ for all large $\Omega$ at $v = V/N$, $\beta$ fixed. This receives the interpretation that there are more than $bN$ particles with equal momentum. The free Bose gas exhibits a Bose condensation phenomenon at fixed density and small temperature.

Fermi Surface

The wave functions $\psi_n(q_1, \sigma_1, \ldots, q_N, \sigma_N) \equiv \psi_n(Q, \sigma)$ are now antisymmetric in the permutations of the pairs $(q_i, \sigma_i)$. Let $\psi(Q, \sigma; N, n)$ denote the $n$th

eigenfunction of the $N$-particle energy $H_N$ in [66] with eigenvalue $E(N, n)$ (labeled by $n = 0, 1, \ldots$ and nondecreasingly ordered). Setting $Q'' = (q''_1,\ldots,q''_{N-p})$, $\boldsymbol\sigma'' = (\sigma''_1,\ldots,\sigma''_{N-p})$, introduce the kernels $\rho_p(Q,\boldsymbol\sigma; Q',\boldsymbol\sigma')$ by

$$\rho_p(Q,\boldsymbol\sigma;Q',\boldsymbol\sigma') \;\overset{\text{def}}{=}\; p!\binom{N}{p}\int\sum_{\boldsymbol\sigma''} d^{N-p}Q''\,\sum_{n=0}^{\infty}\frac{e^{-\beta E(N,n)}}{\operatorname{tr} e^{-\beta H_N}}\;\psi(Q,\boldsymbol\sigma,Q'',\boldsymbol\sigma'';N,n)\,\overline{\psi(Q',\boldsymbol\sigma',Q'',\boldsymbol\sigma'';N,n)} \tag{69}$$

which are called $p$-particle reduced density matrices (extending the corresponding one-particle reduced density matrix [68]). Denote $\rho(q_1-q_2) \overset{\text{def}}{=} \sum_\sigma \rho_1(q_1,\sigma,q_2,\sigma)$. It is also useful to consider spinless fermionic systems: the corresponding definitions are obtained simply by suppressing the spin labels and will not be repeated.

Let $r_1(k)$ be the Fourier transform of $\rho_1(q-q')$: the Fermi surface can be defined as the locus of the $k$'s in the neighborhood of which $\partial_k r_1(k)$ is unbounded as $\Omega\to\infty$, $\beta\to\infty$. The limit as $\beta\to\infty$ is important because the notion of a Fermi surface is, possibly, precise only at zero temperature, that is at $\beta = \infty$.

So far, existence of a Fermi surface (i.e., the smoothness of $r_1(k)$ except on a smooth surface in $k$-space) has been proved in free Fermi systems ($\varphi = 0$) and

1. certain exactly soluble one-dimensional spinless systems and
2. in rather general one-dimensional spinless systems or systems with spin and repulsive pair interaction, possibly in an external periodic potential.

The spinning case in a periodic potential and dimension $d \ge 2$ is the most interesting case to study for its relevance in the theory of conduction in crystals. Essentially no mathematical results are available as the above-mentioned ones do not concern any case in dimension $> 1$: this is a rather disappointing aspect of the theory and a challenge.

In dimension 2 or higher, for fermionic systems with Hamiltonian [66], not only are there no results available, even without spin, but it is not even clear that a Fermi surface can exist in the presence of interesting interactions.

Cooper Pairs

The superconductivity theory has been phenomenologically related to the existence of Cooper pairs. Consider the Hamiltonian [66] and define (cf. [69])

$$\rho(x-y,\sigma;\,x'-y',\sigma';\,x-x') \;\overset{\text{def}}{=}\; \rho_2(x,\sigma,y,-\sigma;\;x',\sigma',y',-\sigma')$$

The system is said to contain Cooper pairs with spins $\sigma, -\sigma$ ($\sigma = +$ or $\sigma = -$) if there exist functions $g^\alpha(q,\sigma) \neq 0$ with

$$\int g^{\alpha}(q,\sigma)\,\overline{g^{\alpha'}(q,\sigma)}\,dq = 0 \quad\text{if } \alpha \neq \alpha'$$

such that

$$\lim_{V\to\infty}\rho(x-y,\sigma;\,x'-y',\sigma';\,x-x') \;\xrightarrow[\;x-x'\to\infty\;]{}\; \sum_{\alpha} g^{\alpha}(x-y,\sigma)\,\overline{g^{\alpha}(x'-y',\sigma')} \tag{70}$$

In this case, $g^\alpha(x-y,\sigma)$ with largest $L_2$ norm can be called, after normalization, the wave function of the paired state of lowest energy: this is the analog of the plane wave for a free particle (and, like it, it is manifestly not normalizable, i.e., it is not square integrable as a function of $x, y$). If the system contains Cooper pairs and the nonleading terms in the limit [70] vanish quickly enough, the two-particle reduced density matrix [70] regarded as a kernel operator has an eigenvalue of order $V$ as $V\to\infty$: that is, the state of lowest energy is macroscopically occupied, quite like the free Bose condensation in the ground state.

Cooper pairs instability might destroy the Fermi surface in the sense that $r_1(k)$ becomes analytic in $k$; but it is also possible that, even in the presence of them, there remains a surface which is the locus of the singularities of the function $r_1(k)$. In the first case, there should remain a trace of it as a very steep gradient of $r_1(k)$ of the order of an exponential in the inverse of the coupling strength; this is what happens in the BCS model for superconductivity. The model is, however, a mean-field model and this particular regularity aspect might be one of its peculiarities. In any event, a smooth singularity surface is very likely to exist for some interesting density matrix (e.g., in the BCS model with "gap parameter $\gamma$" the wave function

$$g(x-y,\sigma) \equiv \frac{1}{(2\pi)^d}\int_{\varepsilon(k)>0} e^{i k\cdot(x-y)}\,\frac{\gamma}{\sqrt{\varepsilon(k)^2+\gamma^2}}\;dk$$

of the lowest energy level of the Cooper pairs is singular on a surface coinciding with the Fermi surface of the free system).

ODLRO

Consider the $k$-fermion reduced density matrix $\rho_k(Q,\boldsymbol\sigma; Q',\boldsymbol\sigma')$ as kernel operators $O_k$ on $L_2((\Omega\times C^2)^k)$. Suppose $k$ is even, then if $O_k$ has a (generalized) eigenvalue of order $N^{k/2}$ as $N\to\infty$, $N/V = \rho$, the system is said to exhibit off-diagonal long-range order of order $k$. For $k$ odd, ODLRO is defined to exist if $O_k$ has an eigenvalue of order $N^{(k-1)/2}$ and $k \ge 3$ (if $k = 1$ the largest eigenvalue of $O_1$ is necessarily $\le 1$).
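The bound on the largest eigenvalue of $O_1$ for fermions, and its contrast with the order-$N$ eigenvalue that signals Bose condensation, can be checked in a toy setting. The sketch below is only an illustration under stated assumptions: particles live on a ring of `num_sites` lattice points, the one-particle orbitals are plane waves, and the state is specified by occupation numbers $n_m$ of those orbitals (a filled Fermi sea, or all bosons in the zero-momentum orbital). The helper names (`reduced_density_matrix`, `largest_eigenvalue`, `fermi_sea`, `condensate`) are hypothetical and do not come from the article.

```python
import cmath


def reduced_density_matrix(num_sites, occupations):
    """rho_1(x, y) = (1/M) * sum_m n_m * exp(2*pi*i*m*(x - y)/M).

    `occupations` maps a plane-wave index m to its occupation number n_m
    on a ring of M = num_sites lattice points (illustrative lattice model).
    """
    M = num_sites
    return [[sum(n * cmath.exp(2j * cmath.pi * m * (x - y) / M)
                 for m, n in occupations.items()) / M
             for y in range(M)]
            for x in range(M)]


def largest_eigenvalue(matrix, iterations=50):
    """Power iteration for a Hermitian matrix with nonnegative eigenvalues."""
    size = len(matrix)
    # A delta function at site 0 overlaps every plane-wave orbital.
    vec = [1.0 + 0j] + [0j] * (size - 1)
    value = 0.0
    for _ in range(iterations):
        new = [sum(matrix[x][y] * vec[y] for y in range(size))
               for x in range(size)]
        # Rayleigh quotient <vec, A vec> for the current normalized vector.
        value = sum((vec[x].conjugate() * new[x]).real for x in range(size))
        norm = sum(abs(c) ** 2 for c in new) ** 0.5
        vec = [c / norm for c in new]
    return value


def fermi_sea(num_particles):
    """Odd N fermions: orbitals -(N-1)/2, ..., (N-1)/2 each occupied once."""
    half = (num_particles - 1) // 2
    return {m: 1 for m in range(-half, half + 1)}


def condensate(num_particles):
    """N bosons, all placed in the zero-momentum orbital."""
    return {0: num_particles}


if __name__ == "__main__":
    for n in (3, 5, 9):
        sites = 4 * n  # fixed density N / M = 1/4
        fermi = largest_eigenvalue(reduced_density_matrix(sites, fermi_sea(n)))
        bose = largest_eigenvalue(reduced_density_matrix(sites, condensate(n)))
        print(n, round(fermi, 6), round(bose, 6))
```

In this model the eigenvalues of the matrix are exactly the occupation numbers $n_m$, so the fermionic value stays at 1 while the bosonic one grows like $N$ at fixed density. Interacting systems are of course not covered by such a plane-wave sketch.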

For bosons, consider the reduced density matrix $\rho_k(Q; Q')$ regarding it as a kernel operator $O_k$ on $L_2(\Omega)^k$ and define ODLRO of order $k$ to be present if $O_k$ has a (generalized) eigenvalue of order $N^k$ as $N\to\infty$, $N/V = \rho$.

ODLRO can be regarded as a unification of the notions of Bose condensation and of the existence of Cooper pairs, because Bose condensation could be said to correspond to the kernel operator $\rho_1(q_1-q_2)$ in [68] having a (generalized) eigenvalue of order $N$, and to be a case of ODLRO of order 1. If the state is pure in the sense that it has a cluster property (see the sections "Phase transitions and boundary conditions" and "Lattice models"), then the existence of ODLRO, Bose condensation, and Cooper pairs implies that the system shows a spontaneously broken symmetry: conservation of particle number and clustering imply that the off-diagonal elements of (all) reduced density matrices vanish at infinite separation in states obtained as limits of states with periodic boundary conditions and Hamiltonian [66], and this is incompatible with ODLRO.

The free Fermi gas has no ODLRO, the BCS model of superconductivity has Cooper pairs and ODLRO with $k = 2$, but no Fermi surface in the above sense (possibly too strict). Fermionic systems cannot have ODLRO of order 1 (because the reduced density matrix of order 1 is bounded by 1).

The contribution of mathematical physics has been particularly effective in providing exactly soluble models: however, the soluble models deal with one-dimensional systems and it can be shown that in dimensions 1, 2 no ODLRO can take place. A major advance is the recent proof of ODLRO and Bose condensation in the case of a lattice version of [66] at a special density value (and $d \ge 3$).

In no case, for the Hamiltonian [66] with $\varphi \neq 0$, has the existence of Cooper pairs been proved, nor the existence of a Fermi surface for $d > 1$. Nevertheless, both Bose condensation and Cooper pairs formation can be proved to occur rigorously in certain limiting situations. There are also a variety of phenomena (e.g., simple spectral properties of the Hamiltonians) which are believed to occur once some of the above-mentioned ones do occur and several of them can be proved to exist in concrete models.

If $d = 1, 2$, ODLRO can be proved to be impossible at $T > 0$ through the use of Bogoliubov's inequality (used in the "no $d = 2$ crystal theorem", see the section "Continuous symmetries: 'no $d = 2$ crystal' theorem").

For more details, the reader is referred to Penrose and Onsager (1956), Yang (1962), Ruelle (1969), Hohenberg (1967), Gallavotti (1999), and Aizenman et al. (2004).

Appendix 1: The Physical Meaning of the Stability Conditions

It is useful to see what would happen if the conditions of stability and temperedness (see [14]) are violated. The analysis also illustrates some of the typical methods of statistical mechanics.

Coalescence Catastrophe due to Short-Distance Attraction

The simplest violation of the first condition in [14] occurs when the potential $\varphi$ is smooth and negative at the origin.

Let $\delta > 0$ be so small that the potential at distances $\le 2\delta$ is $\le -b < 0$. Consider the canonical distribution with parameters $\beta, N$ in a (cubic) box $\Omega$ of volume $V$. The probability $P_{\text{collapse}}$ that all the $N$ particles are located in a little sphere of radius $\delta$ around the center of the box (or around any prefixed point of the box) is estimated from below by remarking that

$$\Phi \le -b\binom{N}{2} \sim -\frac{b}{2}N^2$$

so that

$$P_{\text{collapse}} = \frac{\displaystyle\int_{\mathcal C}\frac{dp\,dq}{h^{3N}N!}\,e^{-\beta(K(p)+\Phi(q))}}{\displaystyle\int\frac{dp\,dq}{h^{3N}N!}\,e^{-\beta(K(p)+\Phi(q))}} \;\ge\; \frac{\left(\dfrac{4\pi}{3h^3}\,\delta^3\right)^{N}\dfrac{1}{N!}\left(\sqrt{2\pi m\beta^{-1}}\right)^{3N} e^{\beta b\frac12 N(N-1)}}{\displaystyle\int\frac{dp\,dq}{h^{3N}N!}\,e^{-\beta(K(p)+\Phi(q))}} \tag{71}$$

The phase space is extremely small: nevertheless, such configurations are far more probable than the configurations which look macroscopically correct, that is, configurations with particles more or less spaced by the average particle distance expected in a macroscopically homogeneous configuration, namely $(N/V)^{-1/3} = \rho^{-1/3}$. Their energy $\Phi(q)$ is of the order of $uN$ for some $u$, so that their probability will be bounded above by

$$P_{\text{regular}} \;\le\; \frac{\displaystyle\int\frac{dp\,dq}{h^{3N}N!}\,e^{-\beta(K(p)+uN)}}{\displaystyle\int\frac{dp\,dq}{h^{3N}N!}\,e^{-\beta(K(p)+\Phi(q))}} = \frac{\dfrac{V^N\left(\sqrt{2\pi m\beta^{-1}}\right)^{3N}}{h^{3N}N!}\,e^{-\beta uN}}{\displaystyle\int\frac{dp\,dq}{h^{3N}N!}\,e^{-\beta(K(p)+\Phi(q))}} \tag{72}$$
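The two estimates [71] and [72] share the same denominator, so their quotient gives $P_{\text{regular}}/P_{\text{collapse}} \le \left(3V/4\pi\delta^3\right)^N e^{-\beta uN-\beta b N(N-1)/2}$, with the momentum integrals cancelling. A minimal numerical sketch of the logarithm of this bound at fixed specific volume follows; the function name and all parameter values (`specific_volume`, `delta`, `beta`, `b`, `u`) are hypothetical choices made only to show the trend, not values taken from the article.

```python
import math


def log_ratio_bound(num_particles, specific_volume, delta, beta, b, u):
    """Logarithm of the upper bound on P_regular / P_collapse.

    Obtained by dividing the upper bound [72] by the lower bound [71]:
        N * log(V / (4*pi*delta**3 / 3)) - beta*u*N - beta*b*N*(N-1)/2
    with V = specific_volume * N.
    """
    volume = specific_volume * num_particles
    sphere = 4.0 * math.pi * delta ** 3 / 3.0
    return (num_particles * math.log(volume / sphere)
            - beta * u * num_particles
            - beta * b * num_particles * (num_particles - 1) / 2.0)


if __name__ == "__main__":
    # Illustrative parameters only (arbitrary units).
    params = dict(specific_volume=1.0, delta=0.01, beta=1.0, b=0.1, u=-1.0)
    for n in (10, 100, 1000, 10000):
        print(n, log_ratio_bound(n, **params))
```

For small $N$ the entropic factor $N\log V$ can still win, but the quadratic term $-\beta b N^2/2$ takes over as $N$ grows, however small $\delta$ or $b$ is chosen.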

However, no matter how small $\delta$ is, the ratio $P_{\text{regular}}/P_{\text{collapse}}$ will approach 0 as $V\to\infty$, $N/V \to v^{-1}$; this occurs extremely rapidly because $e^{\beta b N^2/2}$ eventually dominates over $V^N \sim e^{N\log N}$.

Thus, it is far more probable to find the system in a microscopic volume of size $\delta$ rather than in a configuration in which the energy has some macroscopic value proportional to $N$. This catastrophe can be called an ultraviolet catastrophe (as it is due to the behavior at very short distances) and it causes the collapse of the particles into configurations concentrated in regions as small as we please as $V\to\infty$.

Coalescence Catastrophe due to Long-Range Attraction

It occurs when the potential is too attractive near $\infty$. For simplicity, suppose that the potential has a hard core, i.e., it is $+\infty$ for $r < r_0$, so that the above-discussed coalescence cannot occur and the system density is bounded above by a certain quantity $\rho_{cp} < \infty$ (close-packing density).

The catastrophe occurs if $\varphi(q) \sim -g|q|^{-3+\varepsilon}$, $g, \varepsilon > 0$, for $|q|$ large. For instance, this is the case for matter interacting gravitationally; if $k$ is the gravitational constant, $m$ is the particle mass, then $g = km^2$ and $\varepsilon = 2$.

The probability $P_{\text{regular}}$ of regular configurations, where particles are at distances of order $\rho^{-1/3}$ from their close neighbors, is compared with the probability $P_{\text{collapse}}$ of catastrophic configurations, with the particles at distances $r_0$ from their close neighbors to form a configuration of density $\rho_{cp}/(1+\delta)^3$ almost in close packing (so that $r_0$ is equal to the hard-core radius times $1+\delta$). In the latter case, the system does not fill the available volume and leaves empty a region whose volume is a fraction $\sim ((\rho_{cp}-\rho)/\rho_{cp})V$ of $V$. Further, it can be checked that the ratio $P_{\text{regular}}/P_{\text{collapse}}$ tends to 0 at a rate $O(\exp(-g\,\tfrac12 N(\rho_{cp}(1+\delta)^{-3}-\rho)))$ if $\delta$ is small enough (and $\rho < \rho_{cp}$).

A system which is too attractive at infinity will not occupy the available volume but will stay confined in a close-packed configuration even in empty space.

This is important in the theory of stars: stars cannot be expected to obey regular thermodynamics and in particular will not evaporate because their particles interact via the gravitational force at large distances. Stars do not occupy the whole volume given to them (i.e., the universe); they do not collapse to a point only because the interaction has a strongly repulsive core (even when they are burnt out and the radiation pressure is no longer able to keep them at a reasonable size).

Evaporation Catastrophe

This is another infrared catastrophe, that is, a catastrophe due to the long-range structure of the interactions in the above subsection; it occurs when the potential is too repulsive at $\infty$, that is,

$$\varphi(q) \sim +g|q|^{-3+\varepsilon} \quad\text{as } q\to\infty$$

so that the temperedness condition is again violated.

In addition, in this case, the system does not occupy the whole volume: it will generate a layer of particles sticking, in close-packed configuration, to the walls of the container. Therefore, if the density is lower than the close-packing density, $\rho < \rho_{cp}$, the system will leave a region around the center of the container $\Omega$ empty; and the volume of the empty region will still be of the order of the total volume of the box (i.e., its diameter will be a fraction of the box side $L$). The proof is completely analogous to the one of the previous case; except that now the configuration with lowest energy will be the one sticking to the wall and close packed there, rather than the one close packed at the center.

Also this catastrophe is important as it is realized in systems of charged particles bearing the same charge: the charges adhere to the boundary in close-packing configuration, and dispose themselves so that the electrostatic potential energy is minimal. Therefore, charges deposited on a metal will not occupy the whole volume: they will rather form a surface layer minimizing the potential energy (i.e., so that the Coulomb potential in the interior is constant). In general, charges in excess of neutrality do not behave thermodynamically: for instance, besides not occupying the whole volume given to them, they will not contribute normally to the specific heat.

Neutral systems of charges behave thermodynamically if they have hard cores, so that the ultraviolet catastrophe cannot occur or if they obey quantum-mechanical laws and consist of fermionic particles (plus possibly bosonic particles with charges of only one sign).

For more details, we refer the reader to Lieb and Lebowitz (1972) and Lieb and Thirring (2001).

Appendix 2: The Subadditivity Method

A simple consequence of the assumptions is that the exponential in (5.2) can be bounded above by $e^{\beta BN}\exp\!\big(-\frac{\beta}{2m}\sum_{i=1}^{N}p_i^2\big)$ so that

$$1 \le Z_{gc}(\beta,\lambda,V) \le \exp\!\Big(V e^{\beta\lambda}e^{\beta B}\big(\sqrt{2\pi m\beta^{-1}}\big)^{d}\Big) \;\Rightarrow\; 0 \le \frac{1}{V}\log Z_{gc}(\beta,\lambda,V) \le e^{\beta\lambda}e^{\beta B}\big(\sqrt{2\pi m\beta^{-1}}\big)^{d} \tag{73}$$

Consider, for simplicity, the case of a hard-core interaction with finite range (cf. [14]). Consider a

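sequence of boxes $\Omega_n$ with sides $2^n L_0$, where $L_0 > 0$ is arbitrarily fixed to be $> 2R$. The partition function $Z_{gc}(\beta, z)$ relative to the volume $\Omega_n$ is

$$Z_n = \sum_{N=0}^{\infty}\frac{z^N}{N!}\int_{\Omega_n} dQ\; e^{-\beta\Phi(Q)}$$

because the integral over the $P$ variables can be explicitly performed and included in $z^N$ if $z$ is defined as $z = e^{\beta\lambda}(2\pi m\beta^{-1})^{d/2}$.

Then the box $\Omega_n$ contains $2^d$ boxes $\Omega_{n-1}$ for $n \ge 1$ and

$$1 \le Z_n \le Z_{n-1}^{2^d}\,\exp\!\big(\beta B\,2d\,(L_{n-1}/R)^{d-1}\,2^{2d}\big) \tag{74}$$

because the corridor of width $2R$ around the boundaries of the $2^d$ cubes $\Omega_{n-1}$ filling $\Omega_n$ has volume $2R L_{n-1} 2^d$ and contains at most $(L_{n-1}/R)^{d-1}2^d$ particles, each of which interacts with at most $2^d$ other particles. Therefore,

$$\beta p_n \;\overset{\text{def}}{=}\; L_n^{-d}\log Z_n \;\le\; L_{n-1}^{-d}\log Z_{n-1} + \beta B\,\Gamma_d\,2^{-n}L_0^{-1}/R^{d-1}$$

for some $\Gamma_d > 0$. Hence, $0 \le \beta p_n \le \beta p_{n-1} + \Gamma'_d\,2^{-n}$ for some $\Gamma'_d > 0$ and $p_n$ is bounded above and below uniformly in $n$. So, the limit [13] exists on the sequence $L_n = L_0 2^n$ and defines a function $\beta p_\infty(\beta,\lambda)$.

A box of arbitrary size $L$ can be filled with about $(L/L_{\bar n})^d$ boxes of side $L_{\bar n}$ with $\bar n$ so large that, prefixed $\delta > 0$, $|p_\infty - p_n| < \delta$ for all $n \ge \bar n$. Likewise, a box of size $L_n$ can be filled by about $(L_n/L)^d$ boxes of size $L$ if $n$ is large. The latter remarks lead us to conclude, by standard inequalities, that the limit in [13] exists and coincides with $p_\infty$.

The subadditivity method just demonstrated for finite-range potentials with hard core can be extended to the potentials satisfying just stability and temperedness (cf. the section "Thermodynamic limit").

For more details, the reader is referred to Ruelle (1969) and Gallavotti (1999).

Appendix 3: An Infrared Inequality

The infrared inequalities stem from Bogoliubov's inequality. Consider as an example the problem of crystallization discussed in the section "Continuous symmetries: 'no $d = 2$ crystal' theorem". Let $\langle\cdot\rangle$ denote the average over a canonical equilibrium state with Hamiltonian

$$H = \sum_{j=1}^{N}\frac{p_j^2}{2} + U(Q) + \varepsilon W(Q)$$

with given temperature and density parameters $\beta, \rho$, $\rho = a^{-3}$. Let $\{X, Y\} = \sum_j(\partial_{p_j}X\,\partial_{q_j}Y - \partial_{q_j}X\,\partial_{p_j}Y)$ be the Poisson bracket. Integration by parts, with periodic boundary conditions, yields

$$\langle A^*\{C,H\}\rangle \equiv -\frac{\int A^*\{C, e^{-\beta H}\}\,dP\,dQ}{\beta Z_c(\beta,\rho,N)} \equiv -\beta^{-1}\langle\{A^*, C\}\rangle \tag{75}$$

as a general identity. The latter identity implies, for $A = \{C, H\}$, that

$$\langle\{H,C\}^*\{H,C\}\rangle = -\beta^{-1}\langle\{C,\{H,C^*\}\}\rangle \tag{76}$$

Hence, the Schwarz inequality $\langle A^*A\rangle\langle\{H,C\}^*\{H,C\}\rangle \ge |\langle A^*\{H,C\}\rangle|^2$ combined with the two relations in [75], [76] yields Bogoliubov's inequality:

$$\langle A^*A\rangle \ge \beta^{-1}\frac{|\langle\{A^*,C\}\rangle|^2}{\langle\{C,\{C^*,H\}\}\rangle} \tag{77}$$

Let $g, h$ be arbitrary complex (differentiable) functions and $\partial_j = \partial_{q_j}$

$$A(Q) \overset{\text{def}}{=} \sum_{j=1}^{N} g(q_j), \qquad C(P,Q) \overset{\text{def}}{=} \sum_{j=1}^{N} p_j\,h(q_j) \tag{78}$$

Then $H = \frac12\sum p_j^2 + \Phi(q_1,\ldots,q_N)$, if

$$\Phi(q_1,\ldots,q_N) = \frac12\sum_{j\neq j'}\varphi(|q_j-q_{j'}|) + \varepsilon\sum_j W(q_j)$$

so that, via algebra,

$$\{C,H\} \equiv \sum_j\big(h_j\,\partial_j\Phi - p_j\,(p_j\cdot\partial_j)h_j\big)$$

with $h_j \overset{\text{def}}{=} h(q_j)$. If $h$ is real valued, $\langle\{C,\{C^*,H\}\}\rangle$ becomes, again via algebra,

$$\Big\langle\sum_{jj'} h_j h_{j'}\,\partial_j\cdot\partial_{j'}\Phi(Q)\Big\rangle + \Big\langle\varepsilon\sum_j h_j^2\,\Delta W(q_j) + \frac{4}{\beta}\sum_j(\partial_j h_j)^2\Big\rangle$$

(integrals on $p_j$ just replace $p_j^2$ by $2\beta^{-1}$ and $\langle(p_j)_i(p_j)_{i'}\rangle = \beta^{-1}\delta_{i,i'}$). Therefore, the average $\langle\{C,\{C^*,H\}\}\rangle$ becomes

$$\Big\langle\frac12\sum_{jj'}(h_j-h_{j'})^2\,\Delta\varphi(|q_j-q_{j'}|) + \varepsilon\sum_j h_j^2\,\Delta W(q_j) + 4\beta^{-1}\sum_j(\partial_j h_j)^2\Big\rangle \tag{79}$$

Choose $g(q) \equiv e^{-i(\kappa+K)\cdot q}$, $h(q) = \cos q\cdot\kappa$ and bound $(h_j-h_{j'})^2$ by $\kappa^2(q_j-q_{j'})^2$, $(\partial_j h_j)^2$ by $\kappa^2$ and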

$h_j^2$ by 1. Hence [79] is bounded above by $N D(\kappa)$ with

$$D(\kappa) \overset{\text{def}}{=} \Big\langle\kappa^2\Big(4\beta^{-1} + \frac{1}{2N}\sum_{j\neq j'}(q_j-q_{j'})^2\,|\Delta\varphi(q_j-q_{j'})|\Big) + \varepsilon\frac{1}{N}\sum_j|\Delta W(q_j)|\Big\rangle \tag{80}$$

This can be used to estimate the denominator in [77]. For the LHS remark that

$$\langle A^*A\rangle = \Big\langle\Big|\sum_{j=1}^{N} e^{-i q_j\cdot(\kappa+K)}\Big|^2\Big\rangle$$

and

$$|\langle\{A^*,C\}\rangle|^2 = \Big|\Big\langle\sum_j h_j\,\partial g_j\Big\rangle\Big|^2 = |K+\kappa|^2 N^2\big(\rho_\varepsilon(K)+\rho_\varepsilon(K+2\kappa)\big)^2$$

hence [77] becomes, after multiplying both sides by the auxiliary function $\gamma(\kappa)$ (assumed even and vanishing for $|\kappa| > \pi/a$) and summing over $\kappa$,

$$D_1 \overset{\text{def}}{=} \frac1N\sum_\kappa\gamma(\kappa)\Big\langle\frac1N\Big|\sum_{j=1}^{N}e^{-i(K+\kappa)\cdot q_j}\Big|^2\Big\rangle \;\ge\; \frac1N\sum_\kappa\gamma(\kappa)\,\frac{|K|^2\big(\rho_\varepsilon(K)+\rho_\varepsilon(K+2\kappa)\big)^2}{4\beta D(\kappa)} \tag{81}$$

To apply [77] the averages in [80], [81] have to be bounded above: this is a technical point that is discussed here, as it illustrates a general method of using the results on the thermodynamic limits and their convexity properties to obtain estimates.

Note that $\big\langle\frac1N\sum_\kappa\gamma(\kappa)\,\frac1N\big|\sum_{j=1}^{N}e^{i\kappa\cdot q_j}\big|^2\big\rangle$ is identically $\tilde\gamma(0) + \frac2N\big\langle\sum_{j<j'}\tilde\gamma(q_j-q_{j'})\big\rangle$ with $\tilde\gamma(q) \overset{\text{def}}{=} \frac1N\sum_\kappa\gamma(\kappa)e^{i\kappa\cdot q}$.

Let $\varphi_{\lambda,\zeta}(q) \overset{\text{def}}{=} \varphi(q) + \lambda q^2|\Delta\varphi(q)| + \zeta\tilde\gamma(q)$ and let $F_V(\lambda,\eta,\zeta) \overset{\text{def}}{=} \frac1N\log Z^c(\lambda,\eta,\zeta)$ with $Z^c$ the partition function in the volume $\Omega$ computed with energy $U' = \sum_{jj'}\varphi_{\lambda,\zeta}(q_j-q_{j'}) + \varepsilon\sum_j W(q_j) + \eta\varepsilon\sum_j|\Delta W(q_j)|$. Then $F_V(\lambda,\eta,\zeta)$ is convex in $\lambda, \eta$ and it is uniformly bounded above and below if $|\eta|, |\varepsilon|, |\zeta| \le 1$ (say) and $|\lambda| \le \lambda_0$: here $\lambda_0 > 0$ exists if $r^2|\Delta\varphi(r)|$ satisfies the assumption set at the beginning of the section "Continuous symmetries: 'no $d = 2$ crystal' theorem" and the density is smaller than a close packing (this is because the potential $U'$ will still satisfy conditions similar to [14] uniformly in $|\varepsilon|, |\eta| < 1$ and $|\lambda|$ small enough).

Convexity and boundedness above and below in an interval imply bounds on the derivatives in the interior points, in this case on the derivatives of $F_V$ with respect to $\lambda, \eta, \zeta$ at 0. The latter are identical to the averages in [80], [81]. In this way, the constants $B_1, B_2, B_0$ such that $D(\kappa) \le \kappa^2 B_1 + \varepsilon B_2$ and $B_0 > D_1$ are found.

For more details, the reader is referred to Mermin (1968).

Further Reading

Aizenman M (1980) Translation invariance and instability of phase coexistence in the two dimensional Ising system. Communications in Mathematical Physics 73: 83–94.
Aizenman M (1982) Geometric analysis of $\varphi^4$ fields and Ising models. Communications in Mathematical Physics 86: 1–48.
Aizenman M, Lieb EH, Seiringer R, Solovej JP, and Yngvason J (2004) Bose–Einstein condensation as a quantum phase transition in an optical lattice. Physical Review A 70: 023612.
Baxter R (1982) Exactly Solved Models. London: Academic Press.
Benfatto G and Gallavotti G (1995) Renormalization Group. Princeton: Princeton University Press.
Bleher P and Sinai Y (1975) Critical indices for Dyson's asymptotically hierarchical models. Communications in Mathematical Physics 45: 247–278.
Boltzmann L (1968a) Über die mechanische Bedeutung des zweiten Hauptsatzes der Wärmetheorie. In: Hasenöhrl F (ed.) Wissenschaftliche Abhandlungen, vol. I, pp. 9–33. New York: Chelsea.
Boltzmann L (1968b) Über die Eigenschaften monozyklischer und anderer damit verwandter Systeme. In: Hasenöhrl FP (ed.) Wissenschaftliche Abhandlungen, vol. III, pp. 122–152. New York: Chelsea.
Dobrushin RL (1968) Gibbsian random fields for lattice systems with pairwise interactions. Functional Analysis and Applications 2: 31–43.
Domb C and Green MS (1972) Phase Transitions and Critical Points. New York: Wiley.
Dyson F (1969) Existence of a phase transition in a one-dimensional Ising ferromagnet. Communications in Mathematical Physics 12: 91–107.
Dyson F and Lenard A (1967, 1968) Stability of matter. Journal of Mathematical Physics 8: 423–434, 9: 698–711.
Friedli S and Pfister C (2004) On the singularity of the free energy at a first order phase transition. Communications in Mathematical Physics 245: 69–103.
Gallavotti G (1999) Statistical Mechanics. Berlin: Springer.
Gallavotti G, Bonetto F and Gentile G (2004) Aspects of the Ergodic, Qualitative and Statistical Properties of Motion. Berlin: Springer.
Gawedzky K and Kupiainen A (1983) Block spin renormalization group for dipole gas and $(\partial\varphi)^4$. Annals of Physics 147: 198–243.
Gawedzky K and Kupiainen A (1985) Massless lattice $\varphi^4_4$ theory: rigorous control of a renormalizable asymptotically free model. Communications in Mathematical Physics 99: 197–252.
Gibbs JW (1981) Elementary Principles in Statistical Mechanics. Woodbridge (Connecticut): Ox Bow Press (reprint of the 1902 edition).
Higuchi Y (1981) On the absence of non translationally invariant Gibbs states for the two dimensional Ising system. In: Fritz J, Lebowitz JL, and Szaz D (eds.) Random Fields. Amsterdam: North-Holland.
Hohenberg PC (1967) Existence of long range order in one and two dimensions. Physical Review 158: 383–386.

Landau L and Lifschitz LE (1967) Physique Statistique. Moscow: MIR.
Lanford O and Ruelle D (1969) Observables at infinity and states with short range correlations in statistical mechanics. Communications in Mathematical Physics 13: 194–215.
Lebowitz JL (1974) GHS and other inequalities. Communications in Mathematical Physics 28: 313–321.
Lebowitz JL and Penrose O (1979) Towards a rigorous molecular theory of metastability. In: Montroll EW and Lebowitz JL (eds.) Fluctuation Phenomena. Amsterdam: North-Holland.
Lee TD and Yang CN (1952) Statistical theory of equations of state and phase transitions, II. Lattice gas and Ising model. Physical Review 87: 410–419.
Lieb EH (2002) Inequalities. Berlin: Springer.
Lieb EH and Lebowitz JL (1972) Lectures on the Thermodynamic Limit for Coulomb Systems. In: Lenard A (ed.) Springer Lecture Notes in Physics, vol. 20, pp. 135–161. Berlin: Springer.
Lieb EH and Thirring WE (2001) Stability of Matter from Atoms to Stars. Berlin: Springer.
Mastropietro V (2004) Ising models with four spin interaction at criticality. Communications in Mathematical Physics 244: 595–642.
McCoy BM and Wu TT (1973) The Two Dimensional Ising Model. Cambridge: Harvard University Press.
Mermin ND (1968) Crystalline order in two dimensions. Physical Review 176: 250–254.
Miracle-Sole S (1995) Surface tension, step free energy and facets in the equilibrium crystal shape. Journal Statistical Physics 79: 183–214.
Olla S (1987) Large deviations for Gibbs random fields. Probability Theory and Related Fields 77: 343–357.
Onsager L (1944) Crystal statistics. I. A two dimensional Ising model with an order–disorder transition. Physical Review 65: 117–149.
Penrose O and Onsager L (1956) Bose–Einstein condensation and liquid helium. Physical Review 104: 576–584.
Pfister C and Velenik Y (1999) Interface, surface tension and reentrant pinning transition in the 2D Ising model. Communications in Mathematical Physics 204: 269–312.
Ruelle D (1969) Statistical Mechanics. New York: Benjamin.
Ruelle D (1971) Extension of the Lee–Yang circle theorem. Physical Review Letters 26: 303–304.
Sinai Ya G (1991) Mathematical Problems of Statistical Mechanics. Singapore: World Scientific.
van Beyeren H (1975) Interphase sharpness in the Ising model. Communications in Mathematical Physics 40: 1–6.
Wilson KG and Fisher ME (1972) Critical exponents in 3.99 dimensions. Physical Review Letters 28: 240–243.
Yang CN (1962) Concept of off-diagonal long-range order and the quantum phases of liquid He and of superconductors. Reviews of Modern Physics 34: 694–704.

Introductory Article: Functional Analysis


S Paycha, Université Blaise Pascal, Aubière, France
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Functional analysis is concerned with the study of functions and function spaces, combining techniques borrowed from classical analysis with algebraic techniques. Modern functional analysis developed around the problem of solving equations with solutions given by functions. After the differential and partial differential equations, which were studied in the eighteenth century, came the integral equations and other types of functional equations investigated in the nineteenth century, at the end of which arose the need to develop a new analysis, with functions of an infinite number of variables instead of the usual functions. In 1887, Volterra, inspired by the calculus of variations, suggested a new infinitesimal calculus where usual functions are replaced by functionals, that is, by maps from a function space to R or C, but he and his followers were still missing some algebraic and topological tools to be developed later. Modern analysis was born with the development of an "algebra of the infinite" closely related to classical linear algebra which by 1890 had (up to the concept of duality, which was developed later) settled on firm ground. Strongly inspired by algebraic methods, Fredholm's work at the turn of the nineteenth century, in which emerged the concept of kernel of an operator, became a founding stone for the modern theory of integral equations. Hilbert developed further Fredholm's methods for symmetric kernels, exploiting analogies with the theory of real quadratic forms and thereby making clear the importance of the notion of square-integrable functions. With Hilbert's Grundzüge einer allgemeinen Theorie der Integralgleichung, a further step was made from the "algebra of the infinite" to the "geometry of the infinite." The contribution of Fréchet, who introduced the abstract notion of a space endowed with a distance, made it possible to transfer Euclidean geometry to the framework of what have since then been called Hilbert spaces, a basic concept in mathematics and quantum physics.

The usefulness of functional analysis in the study of quantum systems became clear in the 1950s when Kato proved the self-adjointness of atomic Hamiltonians, and Gårding and Wightman formulated axioms for quantum field theory. Ever since, functional analysis has been at the very heart of many approaches to quantum field theory. Applications of functional analysis stretch out to many branches of mathematics, among which are numerical

analysis, global analysis, the theory of pseudodifferential operators, differential geometry, operator algebras, noncommutative geometry, etc.

Topological Vector Spaces

Most topological spaces one comes across in practice are metric spaces. A metric on a topological space $E$ is a map $d : E\times E\to[0,\infty[$ which is symmetric, such that $d(u,v) = 0 \Leftrightarrow u = v$ and which verifies the triangle inequality $d(u,w)\le d(u,v)+d(v,w)$ for all vectors $u, v, w$. A topological space $E$ is metrizable if there is a metric $d$ on $E$ compatible with the topology on $E$, in which case the balls with radius $1/n$ centered at any point $x\in E$ form a local base at $x$, that is, a collection of neighborhoods of $x$ such that every neighborhood of $x$ contains a member of this collection. A sequence $(u_n)$ in $E$ then converges to $u\in E$ if and only if $d(u_n,u)$ converges to 0.

The Banach fixed-point theorem on a complete metric space $(E,d)$ is a useful tool in nonlinear functional analysis: it states that a (strict) contraction on $E$, that is, a map $T : E\to E$ such that $d(Tu,Tv)\le k\,d(u,v)$ for all $u\neq v\in E$ and fixed $0<k<1$, has a unique fixed point $Tu_0 = u_0$. In particular, it provides local existence and uniqueness of solutions of differential equations $dy/dt = F(y,t)$ with initial condition $y(0) = y_0$, where $F$ is Lipschitz continuous.

Linear functional analysis starts from topological vector spaces, that is, vector spaces equipped with a topology for which the operations are continuous. A topological vector space equipped with a local base whose members are convex is said to be locally convex. Examples of locally convex spaces are normed linear spaces, namely vector spaces equipped with a norm, a concept that first arose in the work of Fréchet. A seminorm on a vector space $V$ is a map $\rho : V\to[0,\infty[$ which obeys the triangle inequality $\rho(u+v)\le\rho(u)+\rho(v)$ for any vectors $u, v$ and such that $\rho(\lambda u) = |\lambda|\rho(u)$ for any scalar $\lambda$ and any vector $u$; if $\rho(u) = 0 \Rightarrow u = 0$, it is a norm, often denoted by $\|\cdot\|$. A norm on a vector space $E$ gives rise to a translation-invariant distance function $d(u,v) = \|u-v\|$ making it a metric space.

Historically, one of the first examples of normed spaces is the space $C([0,1])$, investigated by Riesz, of (real- or complex-valued) continuous functions on the interval $[0,1]$ equipped with the supremum norm $\|f\|_\infty := \sup_{x\in[0,1]}|f(x)|$. In the 1920s, the general definition of Banach space arose in connection with the works of Hahn and Banach. A normed linear space is a Banach space if it is complete as a metric space for the induced metric, $C([0,1])$ being a prototype of a Banach space. More generally, for any non-negative integer $k$, the space $C^k([0,1])$ of functions on $[0,1]$ of class $C^k$ equipped with the norm $\|f\|_k = \sum_{i=0}^{k}\|f^{(i)}\|_\infty$ expressed in terms of a finite number of seminorms $\|f^{(i)}\|_\infty = \sup_{x\in[0,1]}|f^{(i)}(x)|$, $i = 0,\ldots,k$, is also a Banach space.

The space $C^\infty([0,1])$ of smooth functions on the interval $[0,1]$ is no longer a Banach space since its topology is described by a countable family of seminorms $\|f\|_k$ with $k$ varying in the positive integers. The metric

$$d(f,g) = \sum_{k=1}^{\infty}2^{-k}\,\frac{\|f-g\|_k}{1+\|f-g\|_k}$$

turns it into a Fréchet space, that is, a locally convex complete metric space. The space $S(R^n)$ of rapidly decreasing functions, which are smooth functions $f$ on $R^n$ for which

$$\|f\|_{\alpha,\beta} := \sup_{x\in R^n}|x^\alpha D_x^\beta f(x)|$$

is finite for any multiindices $\alpha$ and $\beta$, is also a Fréchet space with the topology given by the seminorms $\|\cdot\|_{\alpha,\beta}$. Further examples of Fréchet spaces are the space $C_0^\infty(K)$ of smooth functions with support in a fixed compact subset $K\subset R^n$ equipped with the countable family of seminorms

$$\|D^\alpha f\|_{\infty,K} = \sup_{x\in K}|D_x^\alpha f(x)|,\quad \alpha\in N_0^n$$

and the space $C^\infty(M,E)$ of smooth sections of a vector bundle $E$ over a closed manifold $M$ equipped with a similar countable family of seminorms. Given an open subset $\Omega = \cup_{p\in N}K_p$ with $K_p$, $p\in N$, compact subsets of $R^n$, the space $D(\Omega) = \cup_{p\in N}C_0^\infty(K_p)$ equipped with the inductive limit topology (for which a sequence $(f_n)$ in $D(\Omega)$ converges to $f\in D(\Omega)$ if each $f_n$ has support in some fixed compact subset $K$ and $(D^\alpha f_n)$ converges uniformly to $D^\alpha f$ on $K$ for each multiindex $\alpha$) is a locally convex space.

Among Banach spaces are Hilbert spaces which have properties very similar to those of finite-dimensional spaces and are historically the first type of infinite-dimensional space to appear with the works of Hilbert at the beginning of the twentieth century. A Hilbert space is a Banach space equipped with a norm $\|\cdot\|$ that derives from an inner product, that is, $\|u\|^2 = \langle u,u\rangle$ with $\langle\cdot,\cdot\rangle$ a positive-definite bilinear (or sesquilinear according to whether the base space is real or complex) form. Hilbert spaces are fundamental building blocks in quantum mechanics; using (closed) tensor products, from a Hilbert space $H$ one builds the Fock space $F(H) = \sum_{k=0}^{\infty}\otimes^k H$ and from there the bosonic Fock space $F(H) = \sum_{k=0}^{\infty}\otimes_s^k H$ (where $\otimes_s$ stands for the (closed) symmetrized tensor product) as well

as the fermionic Fock space $\mathcal{F}_a(H) = \bigoplus_{k=0}^{\infty} \Lambda^k H$ (where $\Lambda^k$ stands for the antisymmetrized (closed) tensor product).

A prototype of Hilbert space is the space $l^2(\mathbb{Z})$ of complex-valued sequences $(u_n)_{n \in \mathbb{Z}}$ such that $\sum_{n \in \mathbb{Z}} |u_n|^2$ is finite, which is already implicit in Hilbert's Grundzügen. Shortly afterwards, Riesz and Fischer, with the help of the integration tool introduced by Lebesgue, showed that the space $L^2(]0, 1[)$ (first introduced by Riesz) of square-summable functions on the interval $]0, 1[$, that is, functions $f$ such that

$$\|f\|_{L^2} = \left( \int_0^1 |f(x)|^2 \, dx \right)^{1/2}$$

is finite, provides an example of Hilbert space. These were then further generalized to spaces $L^p(]0, 1[)$ of $p$-summable ($1 \leq p < \infty$) functions on $]0, 1[$ (i.e., functions $f$ such that

$$\|f\|_{L^p} = \left( \int_0^1 |f(x)|^p \, dx \right)^{1/p}$$

is finite), which are not Hilbert unless $p = 2$ but which provide further examples of Banach spaces, the space $L^\infty(]0, 1[)$ of functions on $]0, 1[$ bounded almost everywhere with respect to the Lebesgue measure offering yet another example of Banach space.

In 1936, Sobolev gave a generalization of the notion of function and their derivatives through integration by parts, which led to the so-called Sobolev spaces $W^{k,p}(]0, 1[)$ of functions $f \in L^p(]0, 1[)$ with derivatives up to order $k$ lying in $L^p(]0, 1[)$, obtained as the closure of $C^\infty(]0, 1[)$ for the norm

$$f \mapsto \|f\|_{W^{k,p}} = \left( \sum_{j=0}^{k} \|\partial^j f\|_{L^p}^p \right)^{1/p}$$

(for $p = 2$, $W^{k,p}(]0, 1[)$ is a Hilbert space often denoted by $H^k(]0, 1[)$). They differ from the Sobolev spaces $W_0^{k,p}(]0, 1[)$, which correspond to the closure of the set $\mathcal{D}(]0, 1[)$ for the norm $f \mapsto \|f\|_{W^{k,p}}$; for example, an element $u \in W^{1,p}(]0, 1[)$ lies in $W_0^{1,p}(]0, 1[)$ if and only if it vanishes at 0 and 1, that is, if and only if it satisfies Dirichlet-type boundary conditions on the boundary of the interval. Similarly, one defines Sobolev spaces $W_0^{k,p}(\mathbb{R}) = W^{k,p}(\mathbb{R})$ on $\mathbb{R}$, Sobolev spaces $W^{k,p}(\Omega)$ and $W_0^{k,p}(\Omega)$ on open subsets $\Omega \subset \mathbb{R}^n$ and, using a partition of unity on a closed manifold $M$, Sobolev spaces $H^k(M, E) = W^{k,2}(M, E)$ of sections of vector bundles $E$ over $M$. Using the Fourier transform (discussed later), one can drop the assumption that $k$ be an integer and extend the notion of Sobolev space to define $W^{s,p}(\Omega)$ and $H^s(M, E)$ with $s$ any real number.

Sobolev spaces arise in many areas of mathematics; one central example in probability theory is the Cameron-Martin space $H^1([0, t])$ embedded in the Wiener space $C([0, t])$. This embedding is a particular case of more general Sobolev embedding theorems, which embed (possibly continuously, sometimes even compactly; the notion of compact operator is discussed in a later section) $W^{k,p}$-Sobolev spaces in $L^q$-spaces with $q > p$, such as the continuous inclusion $W^{k,p}(\mathbb{R}^n) \subset L^q(\mathbb{R}^n)$ with $1/q = 1/p - k/n$, or in $C^l$-spaces with $l \leq k$ such as, for a bounded open and regular enough subset $\Omega$ of $\mathbb{R}^n$ and for any $s \geq l + n/p$ with $p > n$, the continuous inclusion $W^{s,p}(\Omega) \subset C^l(\bar{\Omega})$ (the set of functions in $C^l(\Omega)$ such that $D^\alpha u$ can be continuously extended to the closure $\bar{\Omega}$ for all $|\alpha| \leq l$). Sobolev embeddings have important applications for the regularity of solutions of partial differential equations, when showing that weak solutions one constructs are in fact smooth. In particular, on an $n$-dimensional closed manifold $M$ for $s > l + n/2$, the Sobolev space $H^s(M, E)$ can be continuously embedded in the space $C^l(M, E)$ of sections of $E$ of class $C^l$, which in particular implies that the solutions of a hypoelliptic partial differential equation $Au = v$ with $v \in L^2(M, E)$ are smooth, as for example in the case of solutions of the Seiberg-Witten equations.

Duality

The concept of duality (in a topological sense) was initiated at the beginning of the twentieth century by Hadamard, who was looking for continuous linear functionals on the Banach space $C(I)$ of continuous functions on a compact interval $I$ equipped with a uniform topology. It is implicit in Hilbert's theory and plays a central part in the work of Riesz, who managed to express such continuous functionals as Stieltjes integrals, one of the starting points for the modern theory of integration.

The topological dual of a topological vector space $E$ is the space $E^*$ of continuous linear forms on $E$ which, when $E$ is a normed space, can be equipped with the dual norm $\|L\|_{E^*} = \sup_{u \in E,\, \|u\| \leq 1} |L(u)|$. Dual spaces often provide a receptacle for singular objects; any of the functions $f \in L^p(\mathbb{R}^n)$ ($p \geq 1$) and the delta-function at a point $x \in \mathbb{R}^n$, $\delta_x : f \mapsto f(x)$, all lie in the space $\mathcal{S}'(\mathbb{R}^n)$ of tempered distributions on $\mathbb{R}^n$, dual to $\mathcal{S}(\mathbb{R}^n)$, which is itself contained in the space $\mathcal{D}'(\mathbb{R}^n)$ of distributions dual to $\mathcal{D}(\mathbb{R}^n)$. Furthermore, the topological dual $E^*$ of a nuclear space $E$ contains the support of a probability

measure with characteristic function (see the next section) given by a continuous positive-definite function on $E$. Among nuclear spaces are projective limits $E = \bigcap_{p \in \mathbb{N}} H_p$ (a sequence $(u_n) \in E$ converges to $u \in E$ whenever it converges to $u$ in each $H_p$) of countably many nested Hilbert spaces $\cdots \subset H_p \subset H_{p-1} \subset \cdots \subset H_0$ such that the embedding $H_p \subset H_{p-1}$ is a trace-class operator (see the section "Operator Algebras"). If $H_p$ is the closure of $E$ for the norm $\|\cdot\|_p$, the topological dual $E'$ of $E$ for the norm $\|\cdot\|_0$ is an inductive limit $E' = \bigcup_{p \in \mathbb{N}_0} H_{-p}$, where $H_{-p}$ are the dual (with respect to $\|\cdot\|_0$) Hilbert spaces with norm $\|\cdot\|_{-p}$ (a sequence $(u_n) \in E'$ converges to $u \in E'$ whenever it lies in some $H_{-p}$ and converges to $u$ for the topology of $H_{-p}$) and we have

$$E \subset \cdots \subset H_p \subset H_{p-1} \subset \cdots \subset H_0 = H_0' \subset H_{-1} \subset \cdots \subset H_{-p} \subset \cdots \subset E'$$

As a result of the theory of elliptic operators on a closed manifold, the Fréchet space $C^\infty(M, E)$ of smooth sections of a vector bundle over a closed manifold $M$ is nuclear as the projective limit of countably many Sobolev spaces $H^p(M, E)$, with $L^2$-dual given by the inductive limit of countably many Sobolev spaces $H^{-p}(M, E)$.

The existence of nontrivial continuous linear forms on a normed linear space $E$ is ensured by the Hahn-Banach theorem, which asserts that for any proper closed linear subspace $F$ of $E$, there is a nonvanishing continuous linear form that vanishes on $F$. When the space is a Hilbert space $(H, \langle\cdot,\cdot\rangle_H)$, it follows from the Riesz-Fréchet theorem that any continuous linear form $L$ on $H$ is represented in a unique way by a vector $v \in H$ such that $L(u) = \langle v, u\rangle_H$ for all $u \in H$, thus relating the dual pairing on the left with the Hilbert inner product on the right and identifying the topological dual $H^*$ with $H$.

The strong topology induced by the norm $\|\cdot\|$ on a normed vector space $E$ (that is, the topology in which a sequence $(u_n)$ converges to $u$ whenever $\|u_n - u\| \to 0$) is too fine to have enough compact sets when $E$ is infinite dimensional, since the compactness of the unit ball in $E$ for the strong topology characterizes finite-dimensional spaces. Since compact sets are useful for existence theorems, one is inclined to weaken the topology: the weak topology on $E$ (which coincides with the strong topology when $E$ is finite dimensional, and for which a sequence $(u_n)$ converges to $u$ if and only if $L(u_n) \to L(u)$ for all $L \in E^*$) has compact unit ball if and only if $E$ is reflexive or, in other words, if $E$ can be canonically identified with its double dual $(E^*)^*$. For $1 < p < \infty$, given an open subset $\Omega \subset \mathbb{R}^n$, the topological dual of $L^p(\Omega)$ can be identified via the Riesz representation with $L^{p^*}(\Omega)$ with $p^*$ conjugate to $p$, that is, $1/p + 1/p^* = 1$, and $L^p(\Omega)$ is reflexive, whereas the topological duals of $W^{s,p}(\Omega)$ and $W_0^{s,p}(\Omega)$ both coincide with $W_0^{-s,p^*}(\Omega)$ so that only $W_0^{s,p}(\Omega)$ is reflexive. Neither $L^1(\Omega)$ nor its topological dual $L^\infty(\Omega)$ is reflexive, since $L^1(\Omega)$ is strictly contained in the topological dual of $L^\infty(\Omega)$: there are continuous linear forms $L$ on $L^\infty(\Omega)$ that are not of the form

$$L(u) = \int_\Omega u\,v \quad \forall u \in L^\infty(\Omega), \quad \text{with } v \in L^1(\Omega)$$

Similarly, the topological dual $E^*$ of a normed linear space $E$ can be equipped with the topology induced by the dual norm $\|\cdot\|_{E^*}$ and with the weak-$*$ topology, namely the weakest one for which the maps $L \mapsto L(u)$, $u \in E$, are continuous, and the unit ball in $E^*$ is indeed compact for this topology (Banach-Alaoglu theorem).

Duality does not always preserve separability (a topological vector space is separable if it has a countable dense subset), since $L^\infty(\Omega)$, which is not separable, is the topological dual of $L^1(\Omega)$, which is separable. However, as a consequence of the Hahn-Banach theorem, if the topological dual of a Banach space is separable then so is the original space, and one has equivalence when adding the reflexivity assumption; a Banach space is reflexive and separable whenever its topological dual is. For $1 \leq p < \infty$, $L^p(\Omega)$ and $W_0^{s,p}(\Omega)$ are separable and moreover reflexive if $p \neq 1$.

Fourier Transform

In the middle of the eighteenth century, oscillations of a vibrating string were interpreted by Bernoulli as a limit case for the oscillation of $n$ point masses when $n$ tends to infinity, and Bernoulli introduced the novel idea of the superposition principle, by which the general oscillation of the string should decompose into a superposition of "proper oscillations." This point of view triggered off a discussion as to whether or not an arbitrary function can be expanded as a trigonometric series. Other examples of expansions in orthogonal functions (this terminology actually only appears with Hilbert) had been found in the meantime in relation to oscillation problems and investigations on heat theory, but it was only in the nineteenth century, with the works of Fourier and Dirichlet, that the superposition problem was solved.

Separable Hilbert spaces can be equipped with a countable orthonormal system $\{e_n\}_{n \in \mathbb{Z}}$ ($\langle e_n, e_m\rangle_H = \delta_{mn}$ with $\langle\cdot,\cdot\rangle_H$ the scalar product on $H$) which is

complete, that is, any vector $u \in H$ can be expanded in this system in a unique way $u = \sum_{n \in \mathbb{Z}} \hat{u}_n e_n$ with Fourier coefficients $\hat{u}_n = \langle u, e_n\rangle$. The latter obey Parseval's relation $\sum_{n \in \mathbb{Z}} |\hat{u}_n|^2 = \|u\|^2$ (where $\|\cdot\|$ is the norm associated with $\langle\cdot,\cdot\rangle$), and the Fourier transform $u \mapsto (\hat{u}(n))_{n \in \mathbb{Z}}$ gives rise to an isometric isomorphism between the separable Hilbert space $H$ and the Hilbert space $l^2(\mathbb{Z})$ of square-summable sequences of complex numbers. In particular, the space $L^2(S^1)$ of $L^2$-functions on the unit circle $S^1 = \mathbb{R}/\mathbb{Z}$ with its usual Haar measure $dt$ is separable with complete orthonormal system $t \mapsto e_n(t) = e^{2i\pi nt}$, $n \in \mathbb{Z}$, and the Fourier transform

$$u \mapsto \left( \hat{u}_n = \int_0^1 e^{-2i\pi nt}\, u(t) \, dt \right)_{n \in \mathbb{Z}}$$

identifies it with the space $l^2(\mathbb{Z})$. Under this identification, the Hilbert subspace $l^2(\mathbb{N})$ obtained as the range in $l^2(\mathbb{Z})$ of the projection $p : (u_n)_{n \in \mathbb{Z}} \mapsto (u_n)_{n \in \mathbb{N}}$ corresponds to the Hardy space $H^2(S^1)$.

The Fourier transform extends to the space $\mathcal{S}(\mathbb{R}^n)$, sending a function $f \in \mathcal{S}(\mathbb{R}^n)$ to the map

$$\xi \mapsto \hat{f}(\xi) = \frac{1}{(\sqrt{2\pi})^n} \int_{\mathbb{R}^n} e^{-i x \cdot \xi} f(x) \, dx$$

and maps $\mathcal{S}(\mathbb{R}^n)$ onto itself linearly and continuously with continuous inverse $f \mapsto \hat{f}(-\xi)$. When $n = 1$, the Poisson formula relates $f \in \mathcal{S}(\mathbb{R})$ with its Fourier transform $\hat{f}$ by $\sqrt{2\pi} \sum_{n=-\infty}^{\infty} f(2\pi n) = \sum_{n=-\infty}^{\infty} \hat{f}(n)$.

Since Fourier transformation turns (up to a constant multiplicative factor) differentiation $D^\alpha$ for a multi-index $\alpha = (\alpha_1, \ldots, \alpha_n)$ into multiplication by $\xi^\alpha = \xi_1^{\alpha_1} \cdots \xi_n^{\alpha_n}$, it can be used to define $W^{s,p}$-Sobolev spaces with $s$ a real number as the space of $L^p$-functions with finite Sobolev norms $\|u\|_{W^{s,p}} = \left( \int |(1 + |\xi|)^s\, \hat{u}(\xi)|^p \right)^{1/p}$ (which coincide with the ones defined previously when $s = k$ is a non-negative integer).

Fourier transforms are also used to describe a linear pseudodifferential operator $A$ (see the next two sections, where the notions of bounded and unbounded linear operator are discussed) of order $a$ acting on smooth functions on an open subset $U$ of $\mathbb{R}^n$ in terms of its symbol $\sigma_A$ (a smooth map on $U \times \mathbb{R}^n$ with compact support in $x$ such that for any multi-indices $\alpha, \beta \in \mathbb{N}_0^n$, there is a constant $C_{\alpha,\beta}$ with

$$|D_x^\alpha D_\xi^\beta \sigma_A(x, \xi)| \leq C_{\alpha,\beta}\, (1 + |\xi|)^{a - |\beta|}$$

for any $\xi \in \mathbb{R}^n$) by

$$(Af)(x) = \frac{1}{(\sqrt{2\pi})^n} \int_{\mathbb{R}^n} e^{i x \cdot \xi}\, \sigma_A(x, \xi)\, \hat{f}(\xi) \, d\xi$$

The Fourier transform maps a Gaussian function $x \mapsto e^{-(1/2)\lambda |x|^2}$ on $\mathbb{R}^n$, where $\lambda$ is a nonzero scalar, to another Gaussian function $\xi \mapsto e^{-(1/2)\lambda^{-1} |\xi|^2}$ (up to a nonzero multiplicative factor), a starting point for T-duality in string theory. More generally, the characteristic function

$$\hat{\mu}(\xi) := \int_H e^{i\langle x, \xi\rangle_H} \, \mu(dx)$$

of a Gaussian probability measure $\mu$ with covariance $C$ on a Hilbert space $H$ is the function $\xi \mapsto e^{-(1/2)\langle \xi, C\xi\rangle_H}$. Such probability measures typically arise in Euclidean quantum field theory; in axiomatic quantum field theory, the analyticity properties of $n$-point functions can be derived from the Wightman axioms using Fourier transforms. Thus, Fourier transformation underlies many different aspects of quantum field theory.

Fredholm Operators

A complex-valued continuous function $K$ on $[0, 1] \times [0, 1]$ gives rise to an integral operator

$$A : f \mapsto \int_0^1 K(x, y)\, f(y) \, dy$$

on complex-valued continuous functions on $[0, 1]$ (equipped with the supremum norm $\|\cdot\|_\infty$) with the following upper bound property:

$$\|Af\|_\infty \leq \sup_{[0,1] \times [0,1]} |K(x, y)| \; \|f\|_\infty$$

In other words, $A$ is a bounded linear operator with norm bounded from above by $\sup_{[0,1] \times [0,1]} |K(x, y)|$; a linear operator $A : E \to F$ from a normed linear space $(E, \|\cdot\|_E)$ to a normed linear space $(F, \|\cdot\|_F)$ is bounded (or continuous) if and only if its (operator) norm $|\|A\|| := \sup_{\|u\|_E \leq 1} \|Au\|_F$ is finite.

An integral operator

$$A : f \mapsto \int_0^1 K(x, y)\, f(y) \, dy$$

defined by a continuous kernel $K$ is, moreover, compact; a compact operator is a bounded operator on normed spaces that maps bounded sets to precompact sets, that is, to sets whose closure is compact. Other examples of compact operators on normed spaces are finite-rank operators, that is, operators with finite-dimensional range. In fact, any compact operator on a separable Hilbert space can be approximated in the topology induced by the operator norm $|\|\cdot\||$ by a sequence of finite-rank operators.

Inspired by the work of Volterra, who, in the case of the integral operator defined above, produced

continuous solutions $\varphi = (I - A)^{-1} f$ of the equation $f = (I - A)\varphi$ for $f \in C([0, 1])$, Fredholm in 1900 (Sur une classe d'équations fonctionnelles) studied the equation $f = (I - \lambda A)\varphi$, introducing a complex parameter $\lambda$. He proved what is since then called the Fredholm alternative, which states that either the equation $f = (I - \lambda A)\varphi$ has a unique solution for every $f \in C([0, 1])$ or the corresponding homogeneous equation $(I - \lambda A)\varphi = 0$ has nontrivial solutions. In modern language, it means that for a compact linear operator $A$ and $\mu \neq 0$, the operator $A - \mu I$, whose inverse is the resolvent $R(A, \mu) = (A - \mu I)^{-1}$, is surjective if and only if it is injective. The Fredholm alternative is a powerful tool to solve partial differential equations, among which the Dirichlet problem, the solutions of which are harmonic functions $u$ (i.e., $\Delta u = 0$, where $\Delta u = -\sum_{i=1}^{n} \partial^2 u / \partial x_i^2$) on some domain $\Omega \subset \mathbb{R}^n$ with Dirichlet boundary conditions $u|_{\partial\Omega} = f$, where $f$ is a continuous function on the boundary $\partial\Omega$. The Dirichlet problem has geometric applications, in particular to the nonlinear Plateau problem, which minimizes the area of a surface in $\mathbb{R}^d$ with given boundary curves and which reduces to a (linear) Dirichlet problem.

The operator $B = I - A$ built from the compact operator $A$ is a particular Fredholm operator, namely a bounded linear operator $B : E \to F$ which is invertible "up to compact operators," that is, such that there is a bounded linear operator $C : F \to E$ with both $BC - I_F$ and $CB - I_E$ compact. A Fredholm operator $B$ has a finite-dimensional kernel $\operatorname{Ker} B$ and, when $(E, \langle\cdot,\cdot\rangle_E)$ and $(F, \langle\cdot,\cdot\rangle_F)$ are Hilbert spaces, its cokernel $\operatorname{Ker} B^*$, where $B^*$ is the adjoint of $B$ defined by

$$\langle Bu, v\rangle_F = \langle u, B^* v\rangle_E \quad \forall u \in E, \ \forall v \in F$$

is also finite dimensional, so that it has a well-defined index $\operatorname{ind}(B) = \dim(\operatorname{Ker} B) - \dim(\operatorname{Ker} B^*)$, a starting point for index theory. Toeplitz operators $T_\varphi$, where $\varphi$ is a continuous function on the unit circle $S^1$, provide first examples of Fredholm operators; they act on the Hardy space $H^2(S^1)$ by

$$T_{e_n}\left( \sum_{m \geq 0} a_m e_m \right) = \sum_{m \geq 0} a_{m-n}\, e_m$$

(with $a_k = 0$ for $k < 0$) under the identification $H^2(S^1) \simeq l^2(\mathbb{N}) \subset l^2(\mathbb{Z})$, with $l^2(\mathbb{Z})$ equipped with the canonical complete orthonormal basis $(e_n, n \in \mathbb{Z})$. The Fredholm index $\operatorname{ind}(T_{e_n})$ is exactly the integer $-n$, so that the index of its adjoint is $n$, as a consequence of which the index map from Fredholm operators to integers is onto.

One-Parameter (Semi)groups

Unlike in the finite-dimensional situation, a linear operator $A : E \to F$ between two normed linear spaces $(E, \|\cdot\|_E)$ and $(F, \|\cdot\|_F)$ is not expected to be bounded. Unbounded operators arise in partial differential equations that involve differential operators such as the Laplacian $\Delta$ on an open subset $\Omega \subset \mathbb{R}^n$. The following equations provide fundamental examples of partial differential equations which arose over time from the study of various problems in mathematical physics with the works of Poisson, Fourier, and Cauchy:

$$\Delta u = 0 \quad \text{(Laplace equation)}$$

$$\frac{\partial^2 u}{\partial t^2} + \Delta u = 0 \quad \text{(wave equation)}$$

$$\frac{\partial u}{\partial t} + \Delta u = 0 \quad \text{(heat equation)}$$

and later the Schrödinger equation in quantum mechanics:

$$i\,\frac{\partial u}{\partial t} = \Delta u$$

where $t$ is a time parameter.

An unbounded linear operator on an infinite-dimensional normed space is usually defined on a domain $D(A)$ which is strictly contained in $E$. The Laplacian $\Delta$ is defined on the dense domain $D(\Delta) = H^2(\mathbb{R}^n)$ in $L^2(\mathbb{R}^n)$; it defines a bounded operator from $H^2(\mathbb{R}^n)$ to $L^2(\mathbb{R}^n)$ but does not extend to a bounded operator on $L^2(\mathbb{R}^n)$. Like this operator, most unbounded operators $A : E \to F$ one comes across have dense domain $D(A)$ in $E$ and are closed, that is, their graph $\{(u, Au),\ u \in D(A)\}$ is closed as a subset of the normed linear space $E \times F$. When not actually closed, they can be closable, that is, they can have a closed extension called the closure of the operator. By the closed-graph theorem, when $E$ and $F$ are Banach spaces, a linear operator $A : E \to F$ is continuous whenever its graph is closed, as a consequence of which a closed linear operator $A : E \to F$ defined on a dense domain is bounded provided its domain coincides with the whole space.

For a closed operator $A : E \to F$ with dense domain $D(A)$, when $E$ and $F$ are Hilbert spaces equipped with inner products $\langle\cdot,\cdot\rangle_E$ and $\langle\cdot,\cdot\rangle_F$, the adjoint $A^*$ of $A$ is defined on its domain $D(A^*)$ by

$$\langle Au, v\rangle_F = \langle u, A^* v\rangle_E \quad \forall (u, v) \in D(A) \times D(A^*)$$

A self-adjoint operator $A$ with domain $D(A)$ is one for which $D(A) = D(A^*)$ and $A = A^*$; the Laplacian $\Delta$ on $\mathbb{R}^n$ is self-adjoint on the Sobolev space $H^2(\mathbb{R}^n)$ but it is only essentially self-adjoint on the dense domain $\mathcal{D}(\mathbb{R}^n)$, the latter meaning that its closure is self-adjoint.

Unbounded self-adjoint operators can arise as generators of one-parameter semigroups of bounded

operators. A one-parameter family of bounded operators $T_t$, $t \geq 0$ ($T_t$, $t \in \mathbb{R}$) on a Hilbert space $H$ is a semigroup (resp. group) if $T_s T_t = T_{t+s}$ $\forall t, s \geq 0$ (resp. $\forall t, s \in \mathbb{R}$) and it is strongly continuous (or simply continuous) if $\lim_{t \to t_0} T_t u = T_{t_0} u$ at any $t_0 \geq 0$ (resp. $t_0 \in \mathbb{R}$) and for any $u \in H$.

Stone's theorem sets up a one-to-one correspondence between continuous one-parameter unitary ($U_t^* U_t = U_t U_t^* = I$) groups $U_t$, $t \in \mathbb{R}$, on a Hilbert space such that $U_0 = \mathrm{Id}$ and self-adjoint operators $A$ obtained as infinitesimal generators, that is, as the strong limit

$$Au = \lim_{t \to 0} \frac{U_t u - u}{it}, \quad u \in D(A)$$

of $U_t$, $t \in \mathbb{R}$, which in a compact form reads $U_t = e^{itA}$. An important example in quantum mechanics is $u_t = e^{itH} u_0$, $t \in \mathbb{R}$, with $H$ a self-adjoint Hamiltonian, which solves the Schrödinger equation $\frac{d}{dt} u = iHu$. The Lie-Trotter formula, which has important applications for Feynman path integrals, expresses the unitary group generated by $A + B$, where $A$, $B$, and $A + B$ are self-adjoint on their respective domains, as a strong limit

$$e^{it(A+B)} = \lim_{n \to \infty} \left( e^{itA/n}\, e^{itB/n} \right)^n$$

On the other hand, positive operators on a Hilbert space $(H, \langle\cdot,\cdot\rangle_H)$ (that is, $A$ self-adjoint and such that $\langle Au, u\rangle_H \geq 0$ $\forall u \in D(A)$) generate one-parameter semigroups $T_t = e^{-tA}$, $t \geq 0$. Hille and Yosida proved that on a Hilbert space, strongly continuous contraction (i.e., $|\|T_t\|| \leq 1$ $\forall t > 0$) semigroups such that $T_0 = \mathrm{Id}$ are in one-to-one correspondence with densely defined positive operators $A : D(A) \subset H \to H$ that are maximal (i.e., $I + A$ is onto), obtained as (minus the) infinitesimal generators

$$Au = -\lim_{t \to 0} \frac{T_t u - u}{t}, \quad u \in D(A)$$

of the corresponding semigroups. Similarly, a positive densely defined self-adjoint operator $A$ on a Hilbert space $H$ gives rise to a densely defined closed symmetric sesquilinear form $(u, v) \mapsto \langle \sqrt{A}\,u, \sqrt{A}\,v\rangle_H$ (see the next section for a definition of $\sqrt{A}$; $\langle\cdot,\cdot\rangle_H$ is the scalar product on $H$) and this map yields a one-to-one correspondence between operators and sesquilinear forms on $H$ with the aforementioned properties, one of the starting points for the theory of Dirichlet forms. To a probability measure $\mu$ on a separable Banach space $E$, one can associate a densely defined closed symmetric sesquilinear form (it is in fact a Dirichlet form) on a Hilbert space $H$ such that $E^* \subset H = H^* \subset E$, which in the particular case of the standard Wiener measure $\mu$ on the Wiener space $E = C([0, t])$ and with Hilbert space given by the Cameron-Martin space $H = H^1([0, t])$, is the bilinear form

$$(u, v) \mapsto \int_E \langle \nabla u, \nabla v\rangle_H \, d\mu$$

with $\nabla$ the (closed) gradient of Malliavin calculus.

The operator $-\Delta$, where $\Delta$ is the Laplacian on $\mathbb{R}^n$, generates the heat-operator semigroup $e^{-t\Delta}$, $t \geq 0$. It has a smooth kernel $K_t \in C^\infty(\mathbb{R}^n \times \mathbb{R}^n)$ defined by

$$(e^{-t\Delta} f)(x) = \int_{\mathbb{R}^n} K_t(x, y)\, f(y) \, dy \quad \forall f \in C_0^\infty(\mathbb{R}^n)$$

and defines a smoothing operator, that is, an operator that maps Sobolev functions to smooth functions. In general, a pseudodifferential operator $A$ on an open subset $U$ of $\mathbb{R}^n$ with symbol $\sigma_A$ only has a distribution kernel

$$K_A(x, y) = \frac{1}{(2\pi)^n} \int_{\mathbb{R}^n} e^{i\langle x - y, \xi\rangle}\, \sigma_A(x, \xi) \, d\xi$$

The kernel of the inverse Laplacian $(\Delta + m^2)^{-1}$ on $\mathbb{R}^n$ (the non-negative real number $m^2$ stands for the mass), called Green's function on $\mathbb{R}^n$, plays an essential role in the theory of Feynman graphs.

Spectral Theory

Spectral theory is the study of the distribution of the values of the complex parameter $\lambda$ for which, given a linear operator $A$ on a normed space $E$, the operator $A - \lambda I$ has an inverse, and of the properties of this inverse when it exists, the resolvent $R(A, \lambda) = (A - \lambda I)^{-1}$ of $A$. The resolvent set $\rho(A)$ of $A$ is the set of complex numbers $\lambda$ for which $A - \lambda I$ is invertible with densely defined bounded inverse. The spectrum $\mathrm{Sp}(A)$ of $A$ is the complement in $\mathbb{C}$ of the resolvent set; it consists of a union of three disjoint sets: the set of all complex numbers $\lambda$ for which $A - \lambda I$ is not injective, called the point spectrum (such a $\lambda$ is an eigenvalue of $A$ with associated eigenfunction any nonzero $u \in D(A)$ such that $Au = \lambda u$); the set of points $\lambda$ for which $A - \lambda I$ has a densely defined unbounded inverse $R(A, \lambda)$, called the continuous spectrum; and the set of points $\lambda$ for which $A - \lambda I$ has a well-defined but not densely defined inverse $R(A, \lambda)$, called the residual spectrum.

A bounded operator has bounded spectrum, and a self-adjoint operator $A$ acting on a Hilbert space has real spectrum and no residual spectrum, since the range of $A - \lambda I$ is dense whenever $A - \lambda I$ is injective. As a consequence of the

Fredholm alternative, the nonzero part of the spectrum of a compact operator consists only of point spectrum; the spectrum is countable with 0 as its only possible accumulation point. A Hamiltonian of a quantum mechanical system can have both point and continuous spectra, but its point spectrum is of special interest because the corresponding eigenfunctions are stationary states of the system. As was first pointed out by Kac (Can you hear the shape of a drum?), the spectrum of an operator acting on functions can reflect the geometry of the space these functions are defined on, a starting point for many interesting and far-reaching questions in differential geometry.

A self-adjoint linear operator on a Hilbert space can be described in terms of a family of projections $E_\lambda$, $\lambda \in \mathbb{R}$, via the spectral representation

$$A = \int_{\mathrm{Sp}(A)} \lambda \, dE_\lambda$$

Given a Borel real-valued function $f$ on $\mathbb{R}$, the operator

$$f(A) = \int_{\mathrm{Sp}(A)} f(\lambda) \, dE_\lambda$$

yields another self-adjoint operator. A positive operator $A$ on a dense domain $D(A)$ of some Hilbert space $(H, \langle\cdot,\cdot\rangle_H)$ has non-negative spectrum and for any positive real number $t$, the map $\lambda \mapsto e^{-t\lambda}$ gives the associated bounded heat-operator

$$e^{-tA} = \int_{\mathrm{Sp}(A)} e^{-t\lambda} \, dE_\lambda$$

while the map $\lambda \mapsto \sqrt{\lambda}$ gives rise to a positive operator $\sqrt{A}$ such that $(\sqrt{A})^2 = A$.

The resolvent can also be used to define new operators

$$f(A) = \frac{1}{2i\pi} \int_C f(\lambda)\, R(A, \lambda) \, d\lambda$$

from a linear operator via a Cauchy-type integral along a contour $C$ around the spectrum; this way one defines complex powers $A^{-z}$ of (essentially self-adjoint) positive elliptic pseudodifferential operators, which enter the definition of the zeta-function, $z \mapsto \zeta(A, z)$, of the operator $A$. The $\zeta$-function is a useful tool to extend the ordinary determinant to $\zeta$-determinants of self-adjoint elliptic operators, thereby providing an ansatz to give a meaning to partition functions in the path integral approach to quantum field theory.

Operator Algebras

Bounded linear operators on a Hilbert space $H$ form an algebra $\mathcal{L}(H)$ closed for the operator norm with involution given by the adjoint operation $A \mapsto A^*$; it is a $C^*$-algebra, that is, an algebra $\mathcal{A}$ over $\mathbb{C}$ with a norm $\|\cdot\|$ and an involution $*$ such that $\mathcal{A}$ is closed for this norm and such that $\|ab\| \leq \|a\|\,\|b\|$ and $\|a^* a\| = \|a\|^2$ for all $a, b \in \mathcal{A}$, and by the Gelfand-Naimark theorem, every $C^*$-algebra is isomorphic to a sub-$C^*$-algebra of some $\mathcal{L}(H)$. The notion of spectrum extends from bounded operators to $C^*$-algebras; the spectrum $\mathrm{sp}(a)$ of an element $a$ in a $C^*$-algebra $\mathcal{A}$ is the (compact) set of complex numbers $\lambda$ such that $a - \lambda \cdot 1$ is not invertible. The notion of self-adjointness also extends ($a = a^*$), and just as a self-adjoint operator $B \in \mathcal{L}(H)$ is non-negative (in which case its spectrum lies in $\mathbb{R}^+$) if and only if $B = A^* A$ for some bounded operator $A$, an element $b \in \mathcal{A}$ is said to be non-negative if and only if $b = a^* a$ for some $a \in \mathcal{A}$, in which case $\mathrm{sp}(b) \subset \mathbb{R}_{\geq 0}$.

The algebra $C(X)$ of continuous functions $f : X \to \mathbb{C}$ vanishing at infinity on some locally compact Hausdorff space $X$, equipped with the supremum norm and the conjugation $f \mapsto \bar{f}$, is also a $C^*$-algebra and a prototype for abelian $C^*$-algebras, since Gelfand showed that every abelian $C^*$-algebra is isometrically isomorphic to $C(X)$, with $X$ compact if the algebra is unital. To a $C^*$-algebra $\mathcal{A}$, one can associate an abelian group $K_0(\mathcal{A})$ which is dual to the Grothendieck group $K^0(X)$ of isomorphism classes of vector bundles over a compact Hausdorff space $X$.

Compact operators on a Hilbert space $H$ form the only nonzero proper two-sided ideal $\mathcal{K}(H)$ of the $C^*$-algebra $\mathcal{L}(H)$ which is closed for the operator norm topology on $\mathcal{L}(H)$. The quotient $\mathcal{L}(H)/\mathcal{K}(H)$ is called the Calkin algebra, after Calkin, who classified all two-sided ideals in $\mathcal{L}(H)$ for a separable Hilbert space $H$; one can set up a one-to-one correspondence between such ideals and certain sequence spaces. Corresponding to the Banach space $l^1(\mathbb{Z})$ of complex-valued sequences $(u_n)$ such that $\sum_{n \in \mathbb{Z}} |u_n| < \infty$ is the $*$-ideal $\mathcal{I}_1(H)$ of trace-class operators. The trace $\mathrm{tr}(A) = \sum_{n \in \mathbb{Z}} \langle A e_n, e_n\rangle_H$ of a non-negative operator $A \in \mathcal{L}(H)$ lies in $[0, +\infty]$ and is independent of the choice of the complete orthonormal basis $\{e_n, n \in \mathbb{Z}\}$ of $H$ equipped with the inner product $\langle\cdot,\cdot\rangle_H$. $\mathcal{I}_1(H)$ is the Banach space of bounded linear operators on $H$ such that $\|A\|_1 = \mathrm{tr}(|A|)$ is finite. Given an (essentially self-adjoint) positive differential operator $D$ of order $d$ acting on smooth functions on a closed $n$-dimensional Riemannian manifold $M$, its complex power $D^{-z}$ is trace class on the space of $L^2$-functions on $M$ provided $\mathrm{Re}(z) > n/d$, and the corresponding trace $\mathrm{tr}(D^{-z})$ extends to a meromorphic function on the whole plane, the $\zeta$-function $\zeta(D, z)$, which is holomorphic at 0.
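The basis independence of the trace can be checked by hand in a finite-dimensional toy case. The sketch below is only an illustration under stated assumptions: the $2 \times 2$ matrix `A`, the rotation angle `theta`, and the helper names `matvec`, `inner` and `trace_in_basis` are all hypothetical choices made here, and a real inner product on $\mathbb{R}^2$ stands in for $\langle\cdot,\cdot\rangle_H$.

```python
import math

def matvec(a, v):
    """Apply the square matrix a (list of rows) to the vector v."""
    return [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))]

def inner(u, v):
    """Real inner product on R^n, standing in for <.,.>_H."""
    return sum(x * y for x, y in zip(u, v))

def trace_in_basis(a, basis):
    """Sum of <A e_n, e_n> over the vectors e_n of an orthonormal basis."""
    return sum(inner(matvec(a, e), e) for e in basis)

# A symmetric, non-negative matrix (hypothetical example values).
A = [[2.0, 1.0],
     [1.0, 3.0]]

# Two orthonormal bases of R^2: the standard one and a rotated one.
standard = [[1.0, 0.0], [0.0, 1.0]]
theta = 0.7
rotated = [[math.cos(theta), math.sin(theta)],
           [-math.sin(theta), math.cos(theta)]]

# Both sums agree up to floating-point rounding.
print(trace_in_basis(A, standard))
print(trace_in_basis(A, rotated))
```

In infinite dimensions the same sum may diverge, which is why the text allows the value $+\infty$ for non-negative operators and restricts to trace-class operators otherwise.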

More generally, Banach spaces $l^p(\mathbb{Z})$, $1 \leq p < \infty$, of complex-valued sequences $(u_n)_{n \in \mathbb{Z}}$ such that $\sum_{n \in \mathbb{Z}} |u_n|^p < \infty$ relate to Schatten ideals $\mathcal{I}_p(H)$, $1 \leq p < \infty$, where $\mathcal{I}_p(H)$ is the Banach space of bounded linear operators on $H$ such that $\|A\|_p = (\mathrm{tr}(|A|^p))^{1/p}$ is finite. Just as all $l^p$-sequences converge to 0, the Schatten ideals $\mathcal{I}_p(H)$ all lie in $\mathcal{K}(H)$ and we have $\cdots \subset \mathcal{I}_{p-1}(H) \subset \mathcal{I}_p(H) \subset \cdots \subset \mathcal{K}(H)$.

Compact operators and Schatten ideals are useful to extend index theory to a noncommutative context; a Fredholm module $(H, F)$ over an involutive algebra $\mathcal{A}$ is given by an involutive representation $\pi$ of $\mathcal{A}$ in a Hilbert space $H$ and a self-adjoint bounded linear operator $F$ on $H$ such that $F^2 = \mathrm{Id}_H$ and the operator brackets $[F, \pi(a)]$ are compact for all $a \in \mathcal{A}$. To a $p$-summable Fredholm module $(H, F)$, that is, $[F, \pi(a)] \in \mathcal{I}_p(H)$ for all $a \in \mathcal{A}$, one associates a representative of the Chern character $\mathrm{ch}^*(H, F)$ given by a cyclic cocycle on $\mathcal{A}$, which pairs up with K-theory to build an integer-valued index map on K-theory.

Schatten ideals are also useful to investigate the geometry of infinite-dimensional spaces such as loop groups, for which the Hilbert-Schmidt operators (operators in $\mathcal{I}_2(H)$ are also called Hilbert-Schmidt operators) are particularly useful. A Hölder-type inequality shows that the product of two Hilbert-Schmidt operators is trace-class. Moreover, for any two Hilbert-Schmidt operators $A$ and $B$, the cyclicity property $\mathrm{tr}(AB) = \mathrm{tr}(BA)$ holds, and the sesquilinear form $(A, B) \mapsto \mathrm{tr}(AB^*)$ makes $\mathcal{I}_2(H)$ a Hilbert space.

Further Reading

Adams R (1975) Sobolev Spaces. London: Academic Press.
Dunford N and Schwartz J (1971) Linear Operators. Part I. General Theory. Part II. Spectral Theory. Part III. Spectral Operators. New York: Wiley.
Hille E (1972) Methods in Classical and Functional Analysis. London: Academic Press and Addison-Wesley.
Kato T (1982) A Short Introduction to Perturbation Theory for Linear Operators. New York-Berlin: Springer.
Reed M and Simon B (1980) Methods of Modern Mathematical Physics, vols. I-IV, 2nd edn. New York: Academic Press.
Riesz F and Sz.-Nagy B (1968) Leçons d'analyse fonctionnelle. Paris: Gauthier-Villars; Budapest: Akadémiai Kiadó.
Rudin W (1994) Functional Analysis, 2nd edn. New York: International Series in Pure and Applied Mathematics.
Yosida K (1980) Functional Analysis, 6th edn. Die Grundlehren der Mathematischen Wissenschaften in Einzeldarstellungen, Band 132. Berlin-New York: Springer.

Introductory Article: Minkowski Spacetime and Special Relativity


G L Naber, Drexel University, Philadelphia, PA, USA
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Minkowski spacetime is generally regarded as the appropriate mathematical context within which to formulate those laws of physics that do not refer specifically to gravitational phenomena. Here we shall describe this context in rigorous terms, postulate what experience has shown to be its correct physical interpretation, and illustrate by means of examples its appropriateness for the formulation of physical laws.

Minkowski Spacetime and the Lorentz Group

Minkowski spacetime $\mathcal{M}$ is a four-dimensional real vector space on which is defined a bilinear form $g : \mathcal{M} \times \mathcal{M} \to \mathbb{R}$ that is symmetric ($g(v, w) = g(w, v)$ for all $v, w \in \mathcal{M}$) and nondegenerate ($g(v, w) = 0$ for all $w \in \mathcal{M}$ implies $v = 0$). Further, $g$ has index 1, that is, there exists a basis $\{e_1, e_2, e_3, e_4\}$ for $\mathcal{M}$ with

$$g(e_a, e_b) = \eta_{ab} = \begin{cases} 1 & \text{if } a = b = 1, 2, 3 \\ -1 & \text{if } a = b = 4 \\ 0 & \text{if } a \neq b \end{cases}$$

$g$ is called a Lorentz inner product for $\mathcal{M}$ and any basis of the type just described is an orthonormal basis for $\mathcal{M}$. We shall often write $v \cdot w$ for the value $g(v, w)$ of $g$ on $(v, w) \in \mathcal{M} \times \mathcal{M}$. A vector $v \in \mathcal{M}$ is said to be spacelike, timelike, or null if $v \cdot v$ is positive, negative, or zero, respectively, and the set $C_N$ of all null vectors is called the null cone in $\mathcal{M}$. If $\{e_1, e_2, e_3, e_4\}$ is an orthonormal basis and if we write $v = v^1 e_1 + v^2 e_2 + v^3 e_3 + v^4 e_4 = v^a e_a$ (using the Einstein summation convention, according to which a repeated index, one subscript and one superscript, is summed over its possible values) and $w = w^b e_b$, then

$$v \cdot w = v^1 w^1 + v^2 w^2 + v^3 w^3 - v^4 w^4 = \eta_{ab}\, v^a w^b$$
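The component formula for $v \cdot w$ and the spacelike/timelike/null trichotomy can be sketched in a few lines. This is a minimal illustration, assuming components are given relative to an orthonormal basis with the timelike vector last; the names `ETA`, `lorentz_dot` and `classify`, and the tolerance `tol`, are hypothetical and not part of the article.

```python
# Diagonal entries of eta_ab in an orthonormal basis {e1, e2, e3, e4}:
# +1 for the three spacelike basis vectors, -1 for the timelike one.
ETA = (1.0, 1.0, 1.0, -1.0)

def lorentz_dot(v, w):
    """Return v . w = eta_ab v^a w^b for 4-component sequences v and w."""
    if len(v) != 4 or len(w) != 4:
        raise ValueError("expected 4-component vectors")
    return sum(eta * va * wa for eta, va, wa in zip(ETA, v, w))

def classify(v, tol=1e-12):
    """Classify v by the sign of v . v (tol guards against rounding)."""
    q = lorentz_dot(v, v)
    if abs(q) <= tol:
        return "null"
    return "spacelike" if q > 0 else "timelike"

print(classify((1, 0, 0, 0)))  # spacelike
print(classify((0, 0, 0, 1)))  # timelike
print(classify((1, 0, 0, 1)))  # null
```

Note that, following the definition in the text, the zero vector is classified as null since $v \cdot v = 0$.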

In particular, $v$ is null if and only if
\[ (v^4)^2 = (v^1)^2 + (v^2)^2 + (v^3)^2 \]
(hence the name null cone for $C_N$). Timelike vectors are inside the null cone and spacelike vectors are outside (see Figure 1).

Figure 1 Spacelike, timelike and null vectors.

We select some orientation for the vector space $\mathcal{M}$ and will henceforth consider only oriented, orthonormal bases for $\mathcal{M}$. From the Schwarz inequality for $\mathbb{R}^3$, one can show (Naber 1992, theorem 1.3.1) that, if $v$ is timelike and $w$ is either timelike or null and nonzero, then $v \cdot w < 0$ if and only if $v^4 w^4 > 0$ in any orthonormal basis. In particular, one can define an equivalence relation on the set of all timelike vectors by decreeing that two such, $v$ and $w$, are equivalent if and only if $v \cdot w < 0$. For reasons that will emerge shortly we then say that $v$ and $w$ have the same time orientation. There are precisely two equivalence classes, one of which we select and designate future directed. Timelike vectors in the other class are then called past directed. One can show (Naber 1992, section 1.3 and corollary 1.4.5) that this classification can be extended to nonzero null vectors as well (but not to spacelike vectors). We will call an oriented, orthonormal basis time oriented if its timelike vector $e_4$ is future directed and will consider only these in what follows. An oriented, time-oriented, orthonormal basis for $\mathcal{M}$ will be called an admissible basis. If $\{e_1, e_2, e_3, e_4\}$ and $\{\hat{e}_1, \hat{e}_2, \hat{e}_3, \hat{e}_4\}$ are two such bases and if we write
\[ e_b = \Lambda^1{}_b \hat{e}_1 + \Lambda^2{}_b \hat{e}_2 + \Lambda^3{}_b \hat{e}_3 + \Lambda^4{}_b \hat{e}_4 = \Lambda^a{}_b \hat{e}_a, \quad b = 1, 2, 3, 4 \qquad [1] \]
then the matrix $\Lambda = (\Lambda^a{}_b)$ ($a$ = row index, $b$ = column index) can be shown to satisfy the following three conditions (Naber 1992, section 1.3):

1. (orthogonality) $\Lambda^T \eta \Lambda = \eta$, where $T$ means transpose and
\[ \eta = (\eta_{ab}) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix} \]
2. (orientability) $\det \Lambda = 1$, and
3. (time orientability) $\Lambda^4{}_4 \geq 1$.

We shall refer to any $4 \times 4$ matrix $\Lambda = (\Lambda^a{}_b)$ satisfying these three conditions as a Lorentz transformation (although one often sees the adjectives proper and orthochronous appended to emphasize conditions (2) and (3), respectively). The set $\mathcal{L}$ of all such matrices forms a group under matrix multiplication that we call simply the Lorentz group. It is a simple matter to show (Naber 1992, lemma 1.3.4) from the orthogonality condition (1) that, if $\Lambda^4{}_4 = 1$, then $\Lambda$ must be of the form
\[ \Lambda = \begin{pmatrix} & & & 0 \\ & (R^i{}_j) & & 0 \\ & & & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \]
where $(R^i{}_j)$ is an element of $SO(3)$, that is, a $3 \times 3$ orthogonal matrix with determinant 1. The set $\mathcal{R}$ of all matrices of this form is a subgroup of $\mathcal{L}$ called the rotation subgroup. Although it will play no role in what we do here, it should be pointed out that in many applications (e.g., in particle physics) it is necessary to consider the larger group of transformations of $\mathcal{M}$ generated by the Lorentz group and spacetime translations ($x^a \to x^a + \Lambda^a$, for some constants $\Lambda^a$, $a = 1, 2, 3, 4$). This is called the inhomogeneous Lorentz group, or Poincaré group.

Physical Interpretation

For the purpose of describing how one is to think of Minkowski spacetime and the Lorentz group physically it will be convenient to distinguish (intuitively and terminologically, if not mathematically) between a vector in $\mathcal{M}$ and a point in $\mathcal{M}$ (the tip of a vector). The points in $\mathcal{M}$ are called events and are to be thought of as actual physical occurrences, albeit idealized as point events which have no spatial extension and no duration. One might picture, for example, an instantaneous collision, or explosion, or an instant in the history of some point material particle or photon (particle of light).

Events are observed and identified by the assignment of coordinates. We will be interested in coordinates assigned in a very particular way by a
very particular type of observer. Specifically, our admissible observers preside over three-dimensional, right-handed, Cartesian spatial coordinate systems, relative to which photons always move along straight lines in any direction. With a single clock located at the origin, such an observer can determine the speed, $c$, of light in vacuo by the so-called Fizeau procedure (emit a photon from the origin when the clock there reads $t_1$, bounce it back from a mirror located at $(x^1, x^2, x^3)$, receive the photon at the origin again when the clock there reads $t_2$ and set $c = 2\sqrt{(x^1)^2 + (x^2)^2 + (x^3)^2}/(t_2 - t_1)$). Now place an identical clock at each spatial point and synchronize them by emitting from the origin a spherical electromagnetic wave (photons in all directions) and setting the clock whose location is $(x^1, x^2, x^3)$ to read $\sqrt{(x^1)^2 + (x^2)^2 + (x^3)^2}/c$ at the instant the wave arrives. An observer now assigns to an event the three spatial coordinates of the location at which it occurred in his coordinate system as well as the time reading on the clock at that location at the instant the event occurred. We shall assume also that our admissible observers are inertial in the sense of Newtonian mechanics (the trajectory of a particle on which no forces act, when described in terms of the coordinates just introduced, is a point or a straight line traversed at constant speed). It is an experimental fact (and quite a remarkable one) that all of these admissible observers (whether or not they are in relative motion) agree on the numerical value of the speed of light in vacuo ($c \approx 3.00 \times 10^{10}\ \mathrm{cm\,s^{-1}}$). We shall exploit this fact at the outset to have all of our admissible observers measure time in units of distance by simply multiplying their time coordinates $t$ by $c$. The resulting time coordinate is denoted $x^4 = ct$. In these units all speeds are dimensionless and the speed of light in vacuo is 1.

In our mathematical model $\mathcal{M}$ of the world of events, this very subtle and complex notion of an admissible observer is fully identified with the conceptually very simple notion of an admissible basis $\{e_1, e_2, e_3, e_4\}$. If $x \in \mathcal{M}$ is an event and if we write $x = x^a e_a$, then $(x^1, x^2, x^3)$ are the spatial and $x^4$ is the time coordinate supplied for $x$ by the corresponding observer. If $\{\hat{e}_1, \hat{e}_2, \hat{e}_3, \hat{e}_4\}$ is another basis/observer related to $\{e_1, e_2, e_3, e_4\}$ by [1] and if we write $x = \hat{x}^a \hat{e}_a$, then
\[ \hat{x}^a = \Lambda^a{}_b x^b, \quad a = 1, 2, 3, 4 \qquad [2] \]
Thus, Lorentz transformations relate the space and time coordinates supplied for any given event by two admissible observers. If $(\Lambda^a{}_b) \in \mathcal{R}$, then the two observers differ only in the orientation of their spatial coordinate axes. On the other hand, for any real number $\theta$ one can define an element $L(\theta)$ of $\mathcal{L}$ by
\[ L(\theta) = \begin{pmatrix} \cosh\theta & 0 & 0 & -\sinh\theta \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -\sinh\theta & 0 & 0 & \cosh\theta \end{pmatrix} \qquad [3] \]
and, if two admissible bases are related by this Lorentz transformation, then the coordinate transformation [2] becomes
\[ \begin{aligned} \hat{x}^1 &= (\cosh\theta)\, x^1 - (\sinh\theta)\, x^4 \\ \hat{x}^2 &= x^2 \\ \hat{x}^3 &= x^3 \\ \hat{x}^4 &= -(\sinh\theta)\, x^1 + (\cosh\theta)\, x^4 \end{aligned} \qquad [4] \]
Letting $\beta = \tanh\theta$ (so that $-1 < \beta < 1$) and suppressing $\hat{x}^2 = x^2$ and $\hat{x}^3 = x^3$, one obtains
\[ \begin{aligned} \hat{x}^1 &= \frac{1}{\sqrt{1 - \beta^2}}\, x^1 - \frac{\beta}{\sqrt{1 - \beta^2}}\, x^4 \\ \hat{x}^4 &= -\frac{\beta}{\sqrt{1 - \beta^2}}\, x^1 + \frac{1}{\sqrt{1 - \beta^2}}\, x^4 \end{aligned} \qquad [5] \]
This corresponds to two observers whose spatial axes are oriented as shown in Figure 2 with the hatted coordinate system moving along the common $x^1$-, $\hat{x}^1$-axis with speed $|\beta|$, to the right if $\beta > 0$ and to the left if $\beta < 0$.

Figure 2 Observers in standard configuration (drawn for $\beta > 0$).

We remark that, reverting to traditional time units, $\beta = v/c$, where $|v|$ is the relative speed of the two coordinate systems, and [5] becomes what is generally referred to as a Lorentz transformation in elementary expositions of special relativity, that is,
\[ \begin{aligned} \hat{x}^1 &= \frac{x^1 - vt}{\sqrt{1 - v^2/c^2}} \\ \hat{t} &= \frac{t - (v/c^2)\, x^1}{\sqrt{1 - v^2/c^2}} \end{aligned} \qquad [6] \]
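The relation between the hyperbolic form [3]/[4] and the $\beta$ form [5] can be checked numerically. The sketch below is only an illustration under stated assumptions: matrices are plain nested lists with rows and columns ordered 1, 2, 3, 4, and all function names (and the symbol `gamma` for $1/\sqrt{1 - \beta^2}$) are hypothetical helpers rather than anything defined in the article.

```python
import math

# Matrix of the Lorentz inner product in an orthonormal basis: diag(1, 1, 1, -1).
ETA = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]


def special_lorentz(theta):
    """Matrix L(theta) of equation [3]."""
    c, s = math.cosh(theta), math.sinh(theta)
    return [[c, 0, 0, -s], [0, 1, 0, 0], [0, 0, 1, 0], [-s, 0, 0, c]]


def matmul(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def transform(lam, x):
    """Equation [2]: hatted coordinates obtained from unhatted ones."""
    return [sum(lam[a][b] * x[b] for b in range(4)) for a in range(4)]


def transform_beta(beta, x):
    """Equation [5], with x^2 and x^3 carried along unchanged."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return [gamma * (x[0] - beta * x[3]), x[1], x[2], gamma * (x[3] - beta * x[0])]


if __name__ == "__main__":
    theta = 0.5
    lam = special_lorentz(theta)
    # Orthogonality condition (1): Lambda^T eta Lambda should reproduce eta.
    print(matmul(matmul(transpose(lam), ETA), lam))
    x = [1.0, 2.0, 3.0, 4.0]
    print(transform(lam, x))
    print(transform_beta(math.tanh(theta), x))
```

Up to rounding, the two coordinate lists printed at the end should agree, since $\cosh\theta = 1/\sqrt{1 - \beta^2}$ and $\sinh\theta = \beta/\sqrt{1 - \beta^2}$ when $\beta = \tanh\theta$.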
There is a sense in which, to understand the kinematic effects of special relativity, it is enough to restrict one's attention to the so-called special Lorentz transformations $L(\theta)$. Specifically, one can show (Naber 1992, theorem 1.3.5) that if $\Lambda \in \mathcal{L}$ is any Lorentz transformation, then there exists a real number $\theta$ and two rotations $R_1, R_2 \in \mathcal{R}$ such that $\Lambda = R_1 L(\theta) R_2$. Since $R_1$ and $R_2$ involve no relative motion, all of the kinematics is contained in $L(\theta)$. We shall explore these kinematic effects in more detail shortly.

Now suppose that $x$ and $x_0$ are two distinct events in $\mathcal{M}$ and consider the displacement vector $x - x_0$ from $x_0$ to $x$. If $\{e_1, e_2, e_3, e_4\}$ is an admissible basis and if we write $x = x^a e_a$ and $x_0 = x_0^a e_a$, then $x - x_0 = (x^a - x_0^a) e_a = \Delta x^a e_a$. If $x - x_0$ is null, then
\[ (\Delta x^1)^2 + (\Delta x^2)^2 + (\Delta x^3)^2 = (\Delta x^4)^2 \]
so the spatial separation of the two events is equal to the distance light would travel during the time lapse between the events. The same must be true in any other admissible basis since Lorentz transformations are the matrices of linear maps that preserve the Lorentz inner product. Consequently, all admissible observers agree that $x_0$ and $x$ are connectible by a photon. They even agree as to which of the two events is to be regarded as the emission of the photon and which is to be regarded as its reception since one can show (Naber 1992, theorem 1.3.3) that, when a vector is either timelike or null and nonzero, the sign of its fourth coordinate is the same in every admissible basis (because $\Lambda^4{}_4 \geq 1$). Thus, $x^4 - x_0^4$ is either positive for all admissible observers ($x_0$ occurred before $x$) or negative for all admissible observers ($x_0$ occurred after $x$). Since photons move along straight lines in admissible coordinate systems we adopt the following terminology. If $x_0, x \in \mathcal{M}$ are such that $x - x_0$ is null, then the straight line in $\mathcal{M}$ containing $x_0$ and $x$ is called the world line of a photon in $\mathcal{M}$ and is to be thought of as the set of all events in the history of some particle of light that experiences both $x_0$ and $x$.

Let us now suppose instead that $x - x_0$ is timelike. Then, in any admissible basis,
\[ (\Delta x^1)^2 + (\Delta x^2)^2 + (\Delta x^3)^2 < (\Delta x^4)^2 \]
so the spatial separation of $x_0$ and $x$ is less than the distance light would travel during the time lapse between the events. In this case, one can prove (Naber 1992, section 1.4) that there exists an admissible basis $\{\hat{e}_1, \hat{e}_2, \hat{e}_3, \hat{e}_4\}$ in which $\Delta\hat{x}^1 = \Delta\hat{x}^2 = \Delta\hat{x}^3 = 0$, that is, there is an admissible observer for whom the two events occur at the same spatial location, one after the other. Thinking of this location as occupied by some material object (e.g., the observer's clock situated at that point) we find that the events $x_0$ and $x$ are both experienced by this material particle and that, moreover, $\sqrt{|g(x - x_0, x - x_0)|}$ is just the time lapse between the events recorded by a clock carried along by this material particle. To any other admissible observer this material particle appears free (not subject to forces) because it moves on a straight line with constant speed. This leads us to the following definitions. If $x_0, x \in \mathcal{M}$ are such that $x - x_0$ is timelike, then the straight line in $\mathcal{M}$ containing $x_0$ and $x$ is called the world line of a free material particle in $\mathcal{M}$ and $\sqrt{|g(x - x_0, x - x_0)|}$, usually written $\tau(x - x_0)$, or simply $\Delta\tau$, is the proper time separation of $x_0$ and $x$.

One can think of $\tau(x - x_0)$ as a sort of length for $x - x_0$ measured, however, by a clock carried along by a free material particle that experiences both $x_0$ and $x$. It is an odd sort of length, however, since it satisfies not the usual triangle inequality, but the following reversed version.

Reversed triangle inequality (Naber 1992, theorem 1.4.2) Let $x_0$, $x$ and $y$ be events in $\mathcal{M}$ for which $y - x$ and $x - x_0$ are timelike with the same time orientation. Then $y - x_0 = (y - x) + (x - x_0)$ is timelike and
\[ \tau(y - x_0) \geq \tau(y - x) + \tau(x - x_0) \qquad [7] \]
with equality holding if and only if $y - x$ and $x - x_0$ are linearly dependent.

The sense of the inequality in [7] has interesting consequences about which we will have more to say shortly.

Finally, let us suppose that $x - x_0$ is spacelike. Then, in any admissible basis
\[ (\Delta x^1)^2 + (\Delta x^2)^2 + (\Delta x^3)^2 > (\Delta x^4)^2 \]
so the spatial separation of $x_0$ and $x$ is greater than the distance light could travel during the time lapse that separates them. There is clearly no admissible observer for whom the events occur at the same location. No free material particle (or even photon) can experience both $x_0$ and $x$. However, one can show (Naber 1992, section 1.5) that, given any real number $T$ (positive, negative, or zero), one can find an admissible basis $\{\hat{e}_1, \hat{e}_2, \hat{e}_3, \hat{e}_4\}$ in which $\Delta\hat{x}^4 = T$. Some admissible observers will judge the events simultaneous, some will assert that $x_0$ occurred before $x$, and others will reverse the order. Temporal order, cause and effect, have no meaning for such pairs of events. For those admissible observers for whom the events are simultaneous ($\Delta\hat{x}^4 = 0$), the quantity $\sqrt{g(x - x_0, x - x_0)}$ is the distance between them and for this reason this quantity is called the proper spatial separation of $x_0$ and $x$ (whenever $x - x_0$ is spacelike).
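The three cases (null, timelike, spacelike) and the reversed triangle inequality [7] can be illustrated with a few lines of code. This is a minimal sketch, assuming events are given as coordinate 4-tuples $(x^1, x^2, x^3, x^4)$ in one admissible basis with time measured in units of distance; the function names are illustrative, and the exact comparison with zero is only reliable for exactly representable inputs such as integers.

```python
import math


def interval(x0, x):
    """g(x - x0, x - x0) computed from coordinates in an admissible basis."""
    d = [a - b for a, b in zip(x, x0)]
    return d[0] ** 2 + d[1] ** 2 + d[2] ** 2 - d[3] ** 2


def separation(x0, x):
    """Return the type of x - x0 together with the proper time separation
    (timelike case) or the proper spatial separation (spacelike case)."""
    s = interval(x0, x)
    if s < 0:
        return ("timelike", math.sqrt(-s))
    if s > 0:
        return ("spacelike", math.sqrt(s))
    return ("null", 0.0)


if __name__ == "__main__":
    origin = (0, 0, 0, 0)
    print(separation(origin, (3, 0, 0, 5)))
    print(separation(origin, (5, 0, 0, 3)))
    print(separation(origin, (1, 0, 0, 1)))
    # Reversed triangle inequality [7] for x0 -> x -> y, both legs future directed.
    x, y = (1, 0, 0, 2), (0, 0, 0, 4)
    direct = separation(origin, y)[1]
    detour = separation(origin, x)[1] + separation(x, y)[1]
    print(direct >= detour)
```

In the last example the straight world line from the origin to $y$ records more proper time than the broken path through $x$, which is the sense of the inequality in [7].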
For any two events $x_0, x \in \mathcal{M}$, $g(x - x_0, x - x_0)$ is given in any admissible basis by $(\Delta x^1)^2 + (\Delta x^2)^2 + (\Delta x^3)^2 - (\Delta x^4)^2$ and is called the interval separating $x_0$ and $x$. It is the closest analog in Minkowskian geometry to the (squared) length in Euclidean geometry. It can, however, assume any real value depending on the physical relationship between the events $x_0$ and $x$. Historically, of course, it was the various physical interpretations of this interval that we have just described which led Minkowski (Einstein et al. 1958) to the introduction of the structure that bears his name.

Kinematic Effects

All of the well-known kinematic effects of special relativity (the addition of velocities formula, the relativity of simultaneity, time dilation, and length contraction) follow easily from what we have done. Because it eases visualization and because, as we mentioned earlier, it suffices to do so, we will limit our discussion to the special Lorentz transformations.

Let $\theta_1$ and $\theta_2$ be two real numbers and consider the corresponding elements $L(\theta_1)$ and $L(\theta_2)$ of $\mathcal{L}$ defined by [3]. Sum formulas for $\sinh\theta$ and $\cosh\theta$ imply that $L(\theta_1) L(\theta_2) = L(\theta_1 + \theta_2)$. Defining $\beta_i = \tanh\theta_i$, $i = 1, 2$, and $\beta = \tanh(\theta_1 + \theta_2)$, the sum formula for $\tanh\theta$ then gives
\[ \beta = \frac{\beta_1 + \beta_2}{1 + \beta_1 \beta_2} \qquad [8] \]
The physical interpretation is simple. One has three admissible observers whose spatial axes are related in the manner shown in Figure 2. If the speed of the second relative to the first is $\beta_1$ and the speed of the third relative to the second is $\beta_2$, then the speed of the third relative to the first is not $\beta_1 + \beta_2$ as a Newtonian predisposition would lead one to expect, but rather $\beta$, given by [8]. This is the relativistic addition of velocities formula.

We have seen already that, when the interval between $x_0$ and $x$ is spacelike, the events will be judged simultaneous by some admissible observers, but not by others. Indeed, if $\Delta x^4 = 0$ and the observers are related by [5], then $\Delta\hat{x}^4 = (-\beta/\sqrt{1 - \beta^2})\,\Delta x^1 = -\beta\,\Delta\hat{x}^1$, which will not be zero unless $\beta = 0$ and so there is no relative motion ($\Delta\hat{x}^1$ cannot be zero since then $\Delta\hat{x}^a = 0$ for $a = 1, 2, 3, 4$ and $x = x_0$). This phenomenon is called the relativity of simultaneity and we now construct a simple geometrical representation of it.

Select two perpendicular lines in the plane to represent the $x^1$- and $x^4$-axes (the Euclidean orthogonality of the lines has no physical significance and is unnecessary, but makes the pictures easier to draw). The $\hat{x}^1$-axis will be represented by the straight line $\hat{x}^4 = 0$ which, from [5], is given by $x^4 = \beta x^1$ (in Figure 3 we have assumed that $\beta > 0$). Similarly, the $\hat{x}^4$-axis is identified with the line $x^4 = (1/\beta) x^1$. Since Lorentz transformations leave the Lorentz inner product invariant, the hyperbolas $(x^1)^2 - (x^4)^2 = k$ coincide with $(\hat{x}^1)^2 - (\hat{x}^4)^2 = k$ and we calibrate the axes accordingly, for example, the branch of $(x^1)^2 - (x^4)^2 = 1$ with $x^1 > 0$ intersects the $x^1$-axis at the point $(x^1, x^4) = (1, 0)$ and intersects the $\hat{x}^1$-axis at the point $(\hat{x}^1, \hat{x}^4) = (1, 0)$. This necessitates a different scale on the hatted and unhatted axes, but one can show (Naber 1992, section 1.3) that, with this calibration, all coordinates can be obtained geometrically by projecting parallel to the opposite axis (e.g., the $x^4$- and $\hat{x}^4$-coordinates of an event result from projecting parallel to the $x^1$- and $\hat{x}^1$-axes, respectively).

Thus, a line of simultaneity in the hatted (respectively, unhatted) coordinates is parallel to the $\hat{x}^1$- (respectively, $x^1$-) axis so that, in general, a pair of events lying on one will not lie on the other (note, however, that these lines are really three-dimensional hyperplanes so what appears to be a point of intersection is actually a two-dimensional plane of agreement, any two events in which are judged simultaneous by both observers).

Figure 3 Relativity of simultaneity (showing the hatted and unhatted lines of simultaneity and the calibrating hyperbola $(x^1)^2 - (x^4)^2 = 1$).

For any two events whatsoever the relationship between the time lapse $\Delta\hat{x}^4$ in the hatted coordinates and the time lapse $\Delta x^4$ in the unhatted coordinates is, from [5],
\[ \Delta\hat{x}^4 = -\frac{\beta}{\sqrt{1 - \beta^2}}\, \Delta x^1 + \frac{1}{\sqrt{1 - \beta^2}}\, \Delta x^4 \]
so the two are generally not equal. Consider, in particular, two events on the world line of a point at rest in the unhatted coordinate system, for
example, two readings on the clock at rest at the origin in this system. Then $\Delta x^1 = 0$ so
\[ \Delta\hat{x}^4 = \frac{1}{\sqrt{1 - \beta^2}}\, \Delta x^4 > \Delta x^4 \]
This effect is entirely symmetrical since, if $\Delta\hat{x}^1 = 0$, then [5] implies
\[ \Delta x^4 = \frac{1}{\sqrt{1 - \beta^2}}\, \Delta\hat{x}^4 > \Delta\hat{x}^4 \]
Each observer judges the other's clocks to be running slow. This phenomenon is called time dilation and is clearly visible in the spacetime diagram in Figure 4 (e.g., both observers agree on the time reading 0 for the clock at the origin of the unhatted system, but the line $\hat{x}^4 = 1$ intersects the world line of the clock, i.e., the $x^4$-axis, at a point below $(x^1, x^4) = (0, 1)$).

Figure 4 Time dilation.

We should emphasize that this phenomenon is quite real in the physical sense. For example, certain types of elementary particles (mesons) found in cosmic radiation are so short-lived (at rest) that, even if they could travel at the speed of light, the time required to traverse our atmosphere would be some ten times their normal life span. They should not be able to reach the earth, but they do. Time dilation keeps them young in the sense that what seems a normal life time to the meson appears much longer to us.

Finally, since admissible observers generally disagree on which events are simultaneous and since the only way to measure the length of a moving object (say, a measuring rod) is to locate its end points simultaneously, it should come as no surprise that length, like simultaneity, and time, depends on the admissible observer measuring it. Specifically, let us consider a measuring rod lying at rest along the $\hat{x}^1$-axis of the hatted coordinate system. Its length in this coordinate system is $\Delta\hat{x}^1$. The world lines of its end points are two straight lines parallel to the $\hat{x}^4$-axis. If the unhatted observer locates two events on these world lines simultaneously their coordinates will satisfy $\Delta x^4 = 0$ and, by [5], $\Delta\hat{x}^1 = (1/\sqrt{1 - \beta^2})\,\Delta x^1$ so
\[ \Delta x^1 = \sqrt{1 - \beta^2}\; \Delta\hat{x}^1 < \Delta\hat{x}^1 \]
and the moving measuring rod appears contracted in its direction of motion by a factor of $\sqrt{1 - \beta^2}$. As for time dilation, this phenomenon, known as length contraction, is entirely symmetrical, quite real, and clearly visible in a spacetime diagram (Figure 5).

Figure 5 Length contraction.

The Relativity Principle

We have found that admissible observers can disagree about some rather startling things (whether or not two events are simultaneous, the time lapse between two events even when no one thinks they are simultaneous, and the length of a measuring rod). This would be a matter of no concern at all, of course, if one could determine, in any given situation, who was really right. Surely, two events are either simultaneous or they are not and we need only sort out which admissible observer has the correct view of the situation? Unfortunately (or fortunately, depending on one's point of view) this distinction between the judgments made by different admissible observers is precisely what physics forbids.

The relativity principle (Einstein et al. 1958). All admissible observers are completely equivalent for the formulation of the laws of physics.

We must be clear that this is not a mathematical statement. It is rather a statement about the physical world around us and how it should be described, gleaned from observations, some of which are
complex and subtle and some of which are commonplace (a passenger in a smooth, quiet airplane traveling at constant groundspeed cannot feel his motion relative to the earth). It is a powerful guide for constructing the laws of relativistic physics, but even more fundamentally it prohibits us from regarding any particular admissible observer as having a privileged view of the universe. In particular, we are forbidden from attaching any objective significance to such questions as, "Were the two supernovae simultaneous?", "How long did the meson survive?", and "What is the distance between the Crab Nebula and Alpha Centauri?" This is severe, but one must deal with it.

Particles and 4-Momentum

If $I \subseteq \mathbb{R}$ is an interval, then a map $\alpha : I \to \mathcal{M}$ is a curve in $\mathcal{M}$. Relative to any admissible basis we can write
\[ \alpha(\xi) = x^a(\xi)\, e_a \]
for each $\xi \in I$. We shall assume that $\alpha$ is smooth in the sense that each $x^a(\xi)$, $a = 1, 2, 3, 4$, is infinitely differentiable ($C^\infty$) on $I$ and the velocity vector
\[ \alpha'(\xi) = \frac{dx^a}{d\xi}\, e_a \]
is nonzero for every $\xi \in I$ (we adopt the usual custom, in a vector space, of identifying the tangent space at each point with the vector space itself). This definition of smoothness clearly does not depend on the choice of admissible basis for $\mathcal{M}$. The curve $\alpha$ is said to be spacelike, timelike, or null if
\[ \alpha'(\xi) \cdot \alpha'(\xi) = \eta_{ab} \frac{dx^a}{d\xi} \frac{dx^b}{d\xi} \]
is positive, negative, or zero, respectively, for each $\xi \in I$. A timelike curve $\alpha$ for which $\alpha'(\xi)$ is future directed for each $\xi \in I$ is called a timelike world line and its image is identified with the set of all events in the history of some (not necessarily free) point material particle. If $I = [\xi_0, \xi_1]$ and $\alpha : [\xi_0, \xi_1] \to \mathcal{M}$ is a timelike world line, then the proper time length of $\alpha$ is defined by
\[ L(\alpha) = \int_{\xi_0}^{\xi_1} \sqrt{|g(\alpha'(\xi), \alpha'(\xi))|}\; d\xi = \int_{\xi_0}^{\xi_1} \sqrt{-\eta_{ab} \frac{dx^a}{d\xi} \frac{dx^b}{d\xi}}\; d\xi \]
and interpreted as the time lapse between the events $\alpha(\xi_0)$ and $\alpha(\xi_1)$ as recorded by a clock carried along by the particle whose world line is $\alpha$. This interpretation is easily motivated by writing out a Riemann sum approximation to the integral and appealing to our interpretation of the proper time separation $\Delta\tau = \sqrt{-\eta_{ab}\, \Delta x^a \Delta x^b}$. There are subtleties, however, both mathematical and physical (Naber 1992, section 1.4). The mathematical ones are addressed by the following result (which combines theorems 1.4.6 and 1.4.8 of Naber (1992)).

Theorem Let $x_0$ and $x$ be two events in $\mathcal{M}$. Then $x - x_0$ is timelike and future directed if and only if there exists a timelike world line $\alpha : [\xi_0, \xi_1] \to \mathcal{M}$ in $\mathcal{M}$ with $\alpha(\xi_0) = x_0$ and $\alpha(\xi_1) = x$ and, in this case,
\[ L(\alpha) \leq \tau(x - x_0) \qquad [9] \]
with equality holding if and only if $\alpha$ is a parametrization of a timelike straight line.

The inequality [9] asserts that if two material particles experience both $x_0$ and $x$, then the one that is free (and so can be regarded as at rest in some admissible coordinate system) has longer to wait for the occurrence of the second event (moving clocks run slow). For many years this basically obvious fact was christened "The Twin Paradox".

Just as a smooth curve in Euclidean space has an arc length parametrization, so a timelike world line has a proper time parametrization defined as follows. For each $\xi$ in $[\xi_0, \xi_1]$ let
\[ \tau = \tau(\xi) = \int_{\xi_0}^{\xi} \sqrt{|g(\alpha'(s), \alpha'(s))|}\; ds \]
(the proper time length of $\alpha$ from $\alpha(\xi_0)$ to $\alpha(\xi)$). Then $\tau = \tau(\xi)$ has a smooth inverse $\xi = \xi(\tau)$ so $\alpha$ can be reparametrized by $\tau$. We will abuse our notation slightly and write
\[ \alpha(\tau) = x^a(\tau)\, e_a \]
The velocity vector with this parametrization is denoted
\[ U = U(\tau) = \frac{dx^a}{d\tau}\, e_a \]
and called the 4-velocity of the world line. It is the unit tangent vector field to $\alpha$, that is,
\[ U \cdot U = -1 \qquad [10] \]
for each $\tau$. An admissible observer is, of course, more likely to parametrize a world line by his own time coordinate $x^4$. Then
\[ \alpha'(x^4) = \frac{dx^1}{dx^4}\, e_1 + \frac{dx^2}{dx^4}\, e_2 + \frac{dx^3}{dx^4}\, e_3 + e_4 \]
so
\[ g(\alpha'(x^4), \alpha'(x^4)) = -\left(1 - \|\vec{V}\|^2\right) \]
where
\[ \|\vec{V}\| = \sqrt{\left(\frac{dx^1}{dx^4}\right)^2 + \left(\frac{dx^2}{dx^4}\right)^2 + \left(\frac{dx^3}{dx^4}\right)^2} \]
is the usual magnitude of the particle's velocity vector
\[ \vec{V} = \vec{V}(x^4) = \frac{dx^1}{dx^4}\, e_1 + \frac{dx^2}{dx^4}\, e_2 + \frac{dx^3}{dx^4}\, e_3 = V^i e_i \]
in the given admissible coordinate system. One finds then that
\[ U = \left(1 - \|\vec{V}\|^2\right)^{-1/2} \left(\vec{V} + e_4\right) \qquad [11] \]
We shall identify a material particle in $\mathcal{M}$ with a pair $(\alpha, m)$, where $\alpha$ is a timelike world line and $m$ is a positive constant called the particle's proper mass (or rest mass). If each $dx^a/d\tau$, $a = 1, 2, 3, 4$, is constant, then $(\alpha, m)$ is a free material particle with proper mass $m$. The 4-momentum of $(\alpha, m)$ is defined by $P = mU$. Thus,
\[ P \cdot P = -m^2 \qquad [12] \]
In any admissible basis we write
\[ P = P^a e_a = m U^a e_a = m \frac{dx^a}{d\tau}\, e_a = m \left(1 - \|\vec{V}\|^2\right)^{-1/2} \left(\vec{V} + e_4\right) \qquad [13] \]
The spatial part of $P$ in these coordinates is
\[ \vec{P} = \frac{m}{\sqrt{1 - \|\vec{V}\|^2}}\, \vec{V} \]
which, for $\|\vec{V}\| \ll 1$, is approximately $m\vec{V}$. Identifying $m$ with the inertial mass of Newtonian mechanics (measured by an observer for whom the particle's speed is small), this is simply the classical momentum of the particle. Somewhat more explicitly, if one expands $1/\sqrt{1 - \|\vec{V}\|^2}$ by the Binomial Theorem one finds that
\[ P^i = \frac{m}{\sqrt{1 - \|\vec{V}\|^2}}\, V^i = m V^i + \frac{1}{2} m V^i \|\vec{V}\|^2 + \cdots, \quad i = 1, 2, 3 \qquad [14] \]
which gives the components of the classical momentum plus relativistic corrections. In order to preserve a formal similarity with Newtonian mechanics one often sees $m/\sqrt{1 - \|\vec{V}\|^2}$ referred to as the relativistic mass of the particle, but we shall avoid this terminology. The fourth component of $P$ is given by
\[ P^4 = -P \cdot e_4 = \frac{m}{\sqrt{1 - \|\vec{V}\|^2}} = m + \frac{1}{2} m \|\vec{V}\|^2 + \cdots \qquad [15] \]
The appearance of the term $(1/2) m \|\vec{V}\|^2$ corresponding to the Newtonian kinetic energy suggests that $P^4$ be denoted $E$ and called the total relativistic energy measured by the given admissible observer for the particle:
\[ E = -P \cdot e_4 \qquad [16] \]
Now, one must understand that the concept of energy in physics is a subtle one and simply giving $-P \cdot e_4$ this name does not ensure that there is any physical content. Whether or not the name is appropriate can only be determined experimentally. In particular, one should ask if the appearance of the term $m$ in [15] is consistent with the view that $P^4$ represents the energy of the particle. Observe that if $\|\vec{V}\| = 0$ (i.e., if the particle is at rest relative to the given observer), then [15] gives
\[ E = m \quad (= mc^2, \text{ in standard units}) \qquad [17] \]
which we interpret as saying that, even when the particle is at rest, it still has energy. If this is really energy in the physical sense, then it should be possible to liberate and use it. That this is, indeed, possible has, of course, been rather convincingly demonstrated.

Next we observe that not only material particles, but also photons possess momentum and energy and therefore should have 4-momentum (witness, e.g., the photoelectric effect in which photons collide with and eject electrons from their orbits in an atom). Unlike a material particle, however, a photon's characteristic feature is not proper mass, but frequency $\nu$, or wavelength $\lambda = 1/\nu$, related to its energy $E$ by $E = h\nu$ ($h$ being Planck's constant) and these are highly observer dependent (Doppler effect). There is, moreover, no "proper frequency" analogous to "proper mass" since there is no admissible observer for whom the photon is at rest. In an attempt to model these features we consider a point $x_0 \in \mathcal{M}$, a future directed null vector $N$ and an interval $I \subseteq \mathbb{R}$. The curve $\alpha : I \to \mathcal{M}$ defined by
\[ \alpha(\xi) = x_0 + \xi N \qquad [18] \]
is a parametrization of the world line of a photon through $x_0$. Being null, $N$ can be written in any admissible basis as
\[ N = -(N \cdot e_4)\left(\vec{d} + e_4\right) \qquad [19] \]
where
\[ \vec{d} = \left[(N \cdot e_1)^2 + (N \cdot e_2)^2 + (N \cdot e_3)^2\right]^{-1/2} \left[(N \cdot e_1)\, e_1 + (N \cdot e_2)\, e_2 + (N \cdot e_3)\, e_3\right] \qquad [20] \]
is the direction vector of the world line in the corresponding spatial coordinate system. Now, by analogy with [16], we define a photon in $\mathcal{M}$ to be a curve in $\mathcal{M}$ of the form [18], take $N$ to be its 4-momentum and define the energy $E$ of the photon in the admissible basis $\{e_1, e_2, e_3, e_4\}$ by
\[ E = -N \cdot e_4 \qquad [21] \]
Then, by [19],
\[ N = E\left(\vec{d} + e_4\right) \qquad [22] \]
The corresponding frequency $\nu$ and wavelength $\lambda$ are then defined by $\nu = E/h$ and $\lambda = 1/\nu$. In another admissible basis, one has $N = \hat{E}(\hat{\vec{d}} + \hat{e}_4)$, where $\hat{\vec{d}}$ and $\hat{E}$ are defined by the hatted versions of [20] and [21]. One can then show (Naber 1992, section 1.8) that
\[ \frac{\hat{E}}{E} = \frac{\hat{\nu}}{\nu} = \frac{1 - \beta\cos\theta}{\sqrt{1 - \beta^2}} = (1 - \beta\cos\theta) + \frac{1}{2}\beta^2 (1 - \beta\cos\theta) + \cdots \qquad [23] \]
where $\beta$ is the relative speed of the two spatial coordinate systems and $\theta$ is the angle (in the unhatted spatial coordinate system) between the direction $\vec{d}$ of the photon and the direction of motion of the hatted spatial coordinate system. Equation [23] is the formula for the relativistic Doppler effect with the first term in the series being the classical formula.

We conclude this section by examining a few simple interactions between particles of the sort modeled by our definitions, assuming only that 4-momentum is conserved in the interaction. For convenience, we will use the term free particle to refer to either a free material particle or a photon. If $\mathcal{A}$ is a finite set of free particles, then each element of $\mathcal{A}$ has a unique 4-momentum which is a future-directed timelike or null vector. The sum of any such collection of vectors is timelike and future directed, except when all of the vectors are null and parallel, in which case the sum is null and future directed (Naber 1992, lemma 1.4.3). We call this sum the total 4-momentum of $\mathcal{A}$. Now we formulate a definition which is intended to model a finite set of free particles colliding at some event with a (perhaps new) set of free particles emerging from the collision (e.g., an electron and proton collide, with a neutron and neutrino emerging from the collision).

A contact interaction in $\mathcal{M}$ is a triple $(\mathcal{A}, x, \tilde{\mathcal{A}})$, where $\mathcal{A}$ and $\tilde{\mathcal{A}}$ are two finite sets of free particles, neither of which contains a pair of particles with linearly dependent 4-momenta (which would presumably be physically indistinguishable) and $x \in \mathcal{M}$ is an event such that

1. $x$ is the terminal point of all of the particles in $\mathcal{A}$ (i.e., for each world line $\alpha : [\xi_0, \xi_1] \to \mathcal{M}$ of a particle in $\mathcal{A}$, $\alpha(\xi_1) = x$);
2. $x$ is the initial point of all the particles in $\tilde{\mathcal{A}}$, and
3. the total 4-momentum of $\mathcal{A}$ equals the total 4-momentum of $\tilde{\mathcal{A}}$.

Property (3) is called the conservation of 4-momentum. If $\mathcal{A}$ consists of a single free particle, then $(\mathcal{A}, x, \tilde{\mathcal{A}})$ is called a decay (e.g., a neutron decays into a proton, an electron and an antineutrino).

Consider, for example, an interaction $(\mathcal{A}, x, \tilde{\mathcal{A}})$ for which $\tilde{\mathcal{A}}$ consists of a single photon. The total 4-momentum of $\tilde{\mathcal{A}}$ is null so the same must be true of $\mathcal{A}$. Since the 4-momenta of the individual particles in $\mathcal{A}$ are timelike or null and future directed their sum can be null only if they are, in fact, all null and parallel. Since $\mathcal{A}$ cannot contain distinct photons with parallel 4-momenta, it must consist of a single photon which, by (3), must have the same 4-momentum as the photon in $\tilde{\mathcal{A}}$. In essence, nothing happened at $x$. We conclude that no nontrivial interaction of the type modeled by our definition can result in a single photon and nothing else. Reversing the roles of $\mathcal{A}$ and $\tilde{\mathcal{A}}$ shows that, if 4-momentum is to be conserved, a photon cannot decay.

Next let us consider the decay of a single material particle into two material particles, for example, the spontaneous disintegration of an atom through $\alpha$-emission. Thus, we consider a contact interaction $(\mathcal{A}, x, \tilde{\mathcal{A}})$ in which $\mathcal{A}$ consists of a single free material particle of proper mass $m_0$ and $\tilde{\mathcal{A}}$ consists of two free material particles with proper masses $m_1$ and $m_2$. Let $P_0$, $P_1$, and $P_2$ be the 4-momenta of the particles of proper mass $m_0$, $m_1$, and $m_2$, respectively. Then $P_0 = P_1 + P_2$. Appealing to the reversed triangle inequality, the fact that $P_1$ and $P_2$ are linearly independent and future directed, and [12] we conclude that
\[ m_0 > m_1 + m_2 \qquad [23] \]
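The mass inequality can be illustrated numerically by building 4-momenta from [13] and reading off the proper mass of their sum from [12]. This is a minimal sketch under stated assumptions: units with $c = 1$, components ordered $(P^1, P^2, P^3, P^4)$, velocities with $\|\vec{V}\| < 1$, and illustrative function names (`gamma` stands for $(1 - \|\vec{V}\|^2)^{-1/2}$) that do not come from the article.

```python
import math


def four_momentum(m, velocity):
    """P = m (1 - |V|^2)^(-1/2) (V + e4), as in [13]; velocity is (V1, V2, V3)."""
    gamma = 1.0 / math.sqrt(1.0 - sum(c * c for c in velocity))
    return [m * gamma * velocity[0], m * gamma * velocity[1],
            m * gamma * velocity[2], m * gamma]


def proper_mass(p):
    """Recover m from P . P = -m^2, as in [12], for a timelike P."""
    return math.sqrt(-(p[0] ** 2 + p[1] ** 2 + p[2] ** 2 - p[3] ** 2))


if __name__ == "__main__":
    # Two decay products moving in opposite directions along the x^1-axis.
    p1 = four_momentum(1.0, (0.6, 0.0, 0.0))
    p2 = four_momentum(2.0, (-0.6, 0.0, 0.0))
    p0 = [a + b for a, b in zip(p1, p2)]  # conservation of 4-momentum
    m0 = proper_mass(p0)
    print(m0, m0 > 1.0 + 2.0)
```

Here the sum of the proper masses of the products is 3, while the proper mass of the decaying particle comes out larger, as the reversed triangle inequality requires for linearly independent $P_1$ and $P_2$.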
The excess mass $m_0 - (m_1 + m_2)$ of the initial particle is regarded, via [17], as a measure of the amount of energy required to split $m_0$ into two pieces. Stated somewhat differently, when the two particles in $\tilde{\mathcal{A}}$ were held together to form the single particle in $\mathcal{A}$, the binding energy contributed to the mass of this latter particle.

Reversing the roles of $\mathcal{A}$ and $\tilde{\mathcal{A}}$ in the last example gives a contact interaction modelling an inelastic collision (two free material particles with masses $m_1$ and $m_2$ collide and coalesce to form a third of mass $m_0$). The inequality [23] remains true, of course, and a somewhat more detailed analysis (Naber 1992, section 1.8) yields an approximate formula for $m_0 - (m_1 + m_2)$ which can be compared (favorably) with the Newtonian formula for the loss in kinetic energy that results from the collision (energy which, classically, is viewed as taking the form of heat in the combined particle). An analysis of the interaction in which both $\mathcal{A}$ and $\tilde{\mathcal{A}}$ consist of an electron and a photon yields (Naber 1992, section 1.8) a formula for the so-called Compton effect. Many more such examples of this sort are treated in great detail in Synge (1972, chapter VI, 14).

$(\alpha, m, q)$ is a test charge). Let us write [24] more simply as
\[ \tilde{F} U = \frac{m}{q} \frac{dU}{d\tau} \qquad [25] \]
Dotting both sides of [25] with $U$ gives
\[ \tilde{F} U \cdot U = \frac{m}{q} \frac{dU}{d\tau} \cdot U = \frac{m}{2q} \frac{d}{d\tau}(U \cdot U) = \frac{m}{2q} \frac{d}{d\tau}(-1) = 0 \]
Since any future-directed timelike unit vector $u$ is the 4-velocity of some charged particle, we find that $\tilde{F}(u) \cdot u = 0$ for any such vector. Linearity then implies $\tilde{F}(v) \cdot v = 0$ for any timelike vector. Now, if $u$ and $v$ are timelike and future directed, then $u + v$ is timelike so $0 = \tilde{F}(u + v) \cdot (u + v) = \tilde{F}(u) \cdot v + u \cdot \tilde{F}(v)$ and therefore $\tilde{F}(u) \cdot v = -u \cdot \tilde{F}(v)$. But $\mathcal{M}$ has a basis of future-directed timelike vectors so
\[ \tilde{F} x \cdot y = -x \cdot \tilde{F} y \qquad [26] \]
for all $x, y \in \mathcal{M}$. Thus, at each point, the linear transformation $\tilde{F}$ must be skew-symmetric with respect to the Lorentz inner product. One could therefore model an electromagnetic field on $\mathcal{M}$ by
Charged Particles and Electromagnetic an assignment to each point of a skew-symmetric
Fields linear transformation whose job it is to assign to the
4-velocity of a charged particle whose world line
A charged particle in M is a triple (, m, q), where passes through that point the change in 4-momen-
(, m) is a material particle and q is a nonzero real tum that the particle should expect to experience
number called the charge of the particle. Charged because of the presence of the field. However, a
particles do two things of interest to us. By their slightly different perspective has proved more con-
very presence they create electromagnetic fields and venient. Notice that a skew-symmetric linear trans-
they also respond to the electromagnetic fields formation F ~ : M ! M and the Lorentz inner
created by other charges. product together determine a bilinear form F : M 
Charged particles respond to an electromag- M ! R given by
netic field by experiencing changes in 4-momentum.
The quantitative nature of this response, that is, the ~
Fx; y Fx y
equation of motion, is generally taken to be the
so-called Lorentz 4-force law which expresses ~  x=
which is also skew-symmetric (F(y, x) = F(y)
the proper time rate of change of the particles F(x, y)) and that, conversely, a skew-symmetric
4-momentum at each point of the world line as a bilinear form uniquely determines a skew-symmetric
linear function of the 4-velocity. Thus, at each point linear transformation. Now, an assignment of a
() of the world line skew-symmetric bilinear form to each point of M is
nothing other than a 2-form on M and it is in the
dP language of forms that we choose to phrase classical
~ U
qF 24
d electromagnetic theory (a concise introduction to
this language is available, for example, in Spivak
where F ~( ) :M ! M is a linear transformation (1965, chapter 4).
determined, in each admissible coordinate system, Nature imposes a certain restriction on which
by the classical electric E and magnetic B fields (here 2-forms can reasonably represent an electromagnetic
we are assuming that the contribution of q to the field on M (Maxwells equations). To formulate
ambient electromagnetic field is negligible, that is, these we introduce a source 1-form J as follows: If
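$x^1, x^2, x^3, x^4$ is any admissible coordinate system on $\mathcal{M}$, then

$$J = J_1\,dx^1 + J_2\,dx^2 + J_3\,dx^3 - \rho\,dx^4 \qquad [27]$$

where $\rho : \mathcal{M} \to \mathbb{R}$ is a charge density function and $\mathbf{J} = J^1 e_1 + J^2 e_2 + J^3 e_3$ is a current density vector field (these are to be regarded as the usual "smoothed out," pointwise versions of "charge per unit volume" and "charge flow per unit area per unit time" as measured by the corresponding admissible observer). Now, our formal definition is as follows: The electromagnetic field on $\mathcal{M}$ determined by the source 1-form $J$ on $\mathcal{M}$ is a 2-form $F$ on $\mathcal{M}$ that satisfies Maxwell's equations

$$dF = 0 \qquad [28]$$

and

$${*}d{*}F = J \qquad [29]$$

A few comments are in order here. We have chosen units in which not only the speed of light, but also various other constants that one often finds in Maxwell's equations (the dielectric constant $\varepsilon_0$ and magnetic permeability $\mu_0$) are 1 and a factor of $4\pi$ in [29] is "normalized out." The ${*}$ in [29] is the Hodge star operator determined by the Lorentz inner product and the chosen orientation of $\mathcal{M}$. This is a natural isomorphism

$${*} : \Omega^p(\mathcal{M}) \to \Omega^{4-p}(\mathcal{M}), \quad p = 0, 1, 2, 3, 4$$

of the $p$-forms on $\mathcal{M}$ to the $(4 - p)$-forms on $\mathcal{M}$ and is most simply defined as follows: let $x^1, x^2, x^3, x^4$ be any admissible coordinate system on $\mathcal{M}$. If $1 \in \Omega^0(\mathcal{M})$ is the constant function (0-form) on $\mathcal{M}$ whose value is $1 \in \mathbb{R}$, then

$${*}1 = dx^1 \wedge dx^2 \wedge dx^3 \wedge dx^4$$

is the volume form on $\mathcal{M}$. If $1 \le i_1 < \cdots < i_k \le 4$, then ${*}(dx^{i_1} \wedge \cdots \wedge dx^{i_k})$ is uniquely determined by

$$(dx^{i_1} \wedge \cdots \wedge dx^{i_k}) \wedge {*}(dx^{i_1} \wedge \cdots \wedge dx^{i_k}) = \pm\, dx^1 \wedge dx^2 \wedge dx^3 \wedge dx^4$$

(with the minus sign exactly when $i_k = 4$). Thus, for example, ${*}dx^2 = -dx^1 \wedge dx^3 \wedge dx^4$, ${*}(dx^1 \wedge dx^2) = dx^3 \wedge dx^4$, ${*}(dx^1 \wedge dx^2 \wedge dx^3 \wedge dx^4) = -1$, etc. It follows that, if $\mu$ is a $p$-form on $\mathcal{M}$, then

$${*}{*}\mu = (-1)^{p+1}\mu \qquad [30]$$

(a more thorough discussion is available in Choquet-Bruhat et al. (1977, chapter V A3)). In particular, [29] is equivalent to

$$d{*}F = {*}J \qquad [31]$$

On regions in which there are no charges, so that $J = 0$, [28] and [31] become the source free Maxwell equations

$$dF = 0 \qquad [32]$$

and

$$d{*}F = 0 \qquad [33]$$

that is, both $F$ and ${*}F$ are closed 2-forms.

Any 2-form $F$ on $\mathcal{M}$ can be written in any admissible coordinate system as $F = \tfrac{1}{2}F_{ab}\,dx^a \wedge dx^b$ (summation convention!), where $(F_{ab})$ is the skew-symmetric matrix of components of $F$. In order to make contact with the notation generally employed in physics, we introduce the following names for these components:

$$(F_{ab}) = \begin{pmatrix} 0 & B^3 & -B^2 & E^1 \\ -B^3 & 0 & B^1 & E^2 \\ B^2 & -B^1 & 0 & E^3 \\ -E^1 & -E^2 & -E^3 & 0 \end{pmatrix} \qquad [34]$$

Thus,

$$F = E^1\,dx^1 \wedge dx^4 + E^2\,dx^2 \wedge dx^4 + E^3\,dx^3 \wedge dx^4 + B^3\,dx^1 \wedge dx^2 + B^2\,dx^3 \wedge dx^1 + B^1\,dx^2 \wedge dx^3 \qquad [35]$$

Computing ${*}F$, $dF$, $d{*}F$ and ${*}d{*}F$ and writing $\mathbf{E} = E^1 e_1 + E^2 e_2 + E^3 e_3$ and $\mathbf{B} = B^1 e_1 + B^2 e_2 + B^3 e_3$ one finds that $dF = 0$ is equivalent to

$$\operatorname{div} \mathbf{B} = 0 \qquad [36]$$

and

$$\operatorname{curl} \mathbf{E} + \frac{\partial \mathbf{B}}{\partial t} = 0 \qquad [37]$$

while ${*}d{*}F = J$ is equivalent to

$$\operatorname{div} \mathbf{E} = \rho \qquad [38]$$

and

$$\operatorname{curl} \mathbf{B} - \frac{\partial \mathbf{E}}{\partial t} = \mathbf{J} \qquad [39]$$

Equations [36]–[39] are the more traditional renderings of Maxwell's equations.

In another admissible coordinate system $\hat{x}^1, \hat{x}^2, \hat{x}^3, \hat{x}^4$ on $\mathcal{M}$ (related to the first by [2]) the 2-form $F$ would be written $F = \tfrac{1}{2}\hat{F}_{ab}\,d\hat{x}^a \wedge d\hat{x}^b$. Setting $\hat{x}^a = \Lambda^a{}_\alpha x^\alpha$ and $\hat{x}^b = \Lambda^b{}_\beta x^\beta$ gives $F = \tfrac{1}{2}(\Lambda^a{}_\alpha \Lambda^b{}_\beta \hat{F}_{ab})\,dx^\alpha \wedge dx^\beta$, so

$$F_{\alpha\beta} = \Lambda^a{}_\alpha \Lambda^b{}_\beta \hat{F}_{ab}, \quad \alpha, \beta = 1, 2, 3, 4 \qquad [40]$$

Now, suppose that we wish to describe the electromagnetic field of a uniformly moving charge. According to the relativity principle, it does not matter at all whether we view the charge as moving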
relative to a "fixed" admissible observer, or the observer as moving relative to a "stationary" charge. Thus, we shall write out the field due to a charge fixed at the origin of the hatted coordinate system ("Coulomb's law") and transform, by [40], to an unhatted coordinate system moving relative to it. Relative to $\hat{x}^1, \hat{x}^2, \hat{x}^3, \hat{x}^4$, the familiar inverse square law for a fixed point charge $q$ located at the spatial origin gives $\hat{\mathbf{B}} = 0$ and $\hat{\mathbf{E}} = (q/\hat{r}^3)\,\hat{\mathbf{r}}$, where $\hat{\mathbf{r}} = \hat{x}^1\hat{e}_1 + \hat{x}^2\hat{e}_2 + \hat{x}^3\hat{e}_3$ and $\hat{r} = ((\hat{x}^1)^2 + (\hat{x}^2)^2 + (\hat{x}^3)^2)^{1/2}$ (note that $\hat{\mathbf{E}}$ is defined only on $\mathcal{M} - \operatorname{Span}\{\hat{e}_4\}$). Thus,

$$(\hat{F}_{ab}) = \frac{q}{\hat{r}^3}\begin{pmatrix} 0 & 0 & 0 & \hat{x}^1 \\ 0 & 0 & 0 & \hat{x}^2 \\ 0 & 0 & 0 & \hat{x}^3 \\ -\hat{x}^1 & -\hat{x}^2 & -\hat{x}^3 & 0 \end{pmatrix} \qquad [41]$$

It is a simple matter to verify that, on its domain, $(\hat{F}_{ab})$ satisfies the source free Maxwell equations. Taking $\Lambda$ to be the special Lorentz transformation corresponding to [5] and writing out [40] with $(\hat{F}_{ab})$ given by [41] yields

$$\begin{aligned}
E^1 &= q\,\frac{\hat{x}^1}{\hat{r}^3}, &
E^2 &= \frac{q}{\sqrt{1-\beta^2}}\,\frac{\hat{x}^2}{\hat{r}^3}, &
E^3 &= \frac{q}{\sqrt{1-\beta^2}}\,\frac{\hat{x}^3}{\hat{r}^3}, \\
B^1 &= 0, &
B^2 &= -\frac{q\beta}{\sqrt{1-\beta^2}}\,\frac{\hat{x}^3}{\hat{r}^3}, &
B^3 &= \frac{q\beta}{\sqrt{1-\beta^2}}\,\frac{\hat{x}^2}{\hat{r}^3}
\end{aligned} \qquad [42]$$

We wish to express these in terms of measurements made by the unhatted observer at the instant the charge passes through his spatial origin. Setting $x^4 = 0$ in [5] gives

$$\hat{x}^1 = \frac{1}{\sqrt{1-\beta^2}}\,x^1, \quad \hat{x}^2 = x^2, \quad \hat{x}^3 = x^3$$

and so

$$\hat{r}^2 = \frac{1}{1-\beta^2}\,(x^1)^2 + (x^2)^2 + (x^3)^2$$

which, for convenience, we write $r^2$. Making these substitutions in [42] gives

$$\mathbf{E} = \frac{q}{\sqrt{1-\beta^2}}\,\frac{1}{r^3}\,(x^1 e_1 + x^2 e_2 + x^3 e_3) = \frac{q}{\sqrt{1-\beta^2}}\,\frac{\mathbf{r}}{r^3} \qquad [43]$$

and

$$\mathbf{B} = \frac{q\beta}{\sqrt{1-\beta^2}}\,\frac{1}{r^3}\,(0\,e_1 - x^3 e_2 + x^2 e_3) = \frac{q\beta}{\sqrt{1-\beta^2}}\,\frac{1}{r^3}\,(e_1 \times \mathbf{r}) \qquad [44]$$

for the field of a charge moving uniformly with velocity $\beta e_1$ at the instant the charge passes through the origin. Observe that when $\beta \ll 1$, $r \approx \|\mathbf{r}\|$, so [43] says that the electric field of a slowly moving charge is approximately the Coulomb field. When $\beta \ll 1$, [44] reduces to the Biot–Savart law.

Let us consider one other simple application, that is, the response of a charged particle $(\alpha, m, q)$ to an electromagnetic field which, for some admissible observer, is constant and purely magnetic. For simplicity, we assume that, for this observer $\mathbf{E} = 0$ and $\mathbf{B} = b\,e_3$, where $b$ is a nonzero constant. The corresponding 2-form $F$ has components

$$(F_{ab}) = \begin{pmatrix} 0 & b & 0 & 0 \\ -b & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

(from [34]). The corresponding linear transformation $\tilde{F}$ has the same matrix relative to this basis so, with $\alpha(\tau) = x^a(\tau)e_a$ and $U(\tau) = U^a(\tau)e_a$, the Lorentz 4-force law [25] reduces to the system of linear differential equations

$$\frac{dU^1}{d\tau} = \frac{bq}{m}\,U^2, \quad \frac{dU^2}{d\tau} = -\frac{bq}{m}\,U^1, \quad \frac{dU^3}{d\tau} = 0, \quad \frac{dU^4}{d\tau} = 0$$

The system is easily solved and the results easily integrated to give

$$\alpha(\tau) = x_0 + a\sin\!\left(\frac{bq\tau}{m} + \phi\right)e_1 + a\cos\!\left(\frac{bq\tau}{m} + \phi\right)e_2 + c\tau\,e_3 + \left(1 + \frac{a^2b^2q^2}{m^2} + c^2\right)^{1/2}\tau\,e_4 \qquad [45]$$

where $x_0 = x_0^a e_a \in \mathcal{M}$ is constant and $a$, $\phi$, and $c$ are real constants with $a > 0$ (we have used $U \cdot U = -1$ to eliminate one other arbitrary real constant). Note that, at each point on $\alpha$, $(x^1 - x_0^1)^2 + (x^2 - x_0^2)^2 = a^2$. Thus, if $c \ne 0$ the spatial trajectory in this coordinate system is a helix along the $e_3$-direction (i.e., along the magnetic field lines). If $c = 0$, the trajectory is a circle in the $x^1$–$x^2$ plane. This case is of some practical significance since one can
introduce constant magnetic fields in a bubble chamber so as to induce a particle of interest to follow a circular path. We show now how to measure the charge-to-mass ratio for such a particle. Taking $c = 0$ in [45] and computing $U(\tau)$, then using [11] to solve for the coordinate velocity vector $\mathbf{V}$ of the particle gives

$$\mathbf{V} = \frac{abq}{m}\,\sqrt{1 - \|\mathbf{V}\|^2}\,\left[\cos\!\left(\frac{bq\tau}{m} + \phi\right)e_1 - \sin\!\left(\frac{bq\tau}{m} + \phi\right)e_2\right]$$

From this one computes

$$\|\mathbf{V}\|^2 = \left(1 + \frac{m^2}{a^2b^2q^2}\right)^{-1}$$

(note that this is a constant). Solving this last equation for $q/m$ (and assuming $q > 0$ for convenience) one arrives at

$$\frac{q}{m} = \frac{1}{a|b|}\,\frac{\|\mathbf{V}\|}{\sqrt{1 - \|\mathbf{V}\|^2}}$$

Since $a$, $b$, and $\|\mathbf{V}\|$ are measurable, one obtains the desired charge-to-mass ratio.

To conclude we wish to briefly consider the existence and use of "potentials" for electromagnetic fields. Suppose $F$ is an electromagnetic field defined on some connected, open region $X$ in $\mathcal{M}$. Then $F$ is a 2-form on $X$ which, by [28], is closed. Suppose also that the second de Rham cohomology $H^2(X; \mathbb{R})$ of $X$ is trivial (since $\mathcal{M}$ is topologically $\mathbb{R}^4$ this will be the case, for example, when $X$ is all of $\mathcal{M}$, or an open ball in $\mathcal{M}$, or, more generally, an open "star-shaped" region in $\mathcal{M}$). Then, by definition, every closed 2-form on $X$ is exact so, in particular, there exists a 1-form $A$ on $X$ satisfying

$$F = dA \qquad [46]$$

In particular, such a 1-form $A$ always exists locally on a neighborhood of any point in $X$ for any $F$. Such an $A$ is not uniquely determined, however, because, if $A$ satisfies [46], then so does $A + df$ for any smooth real-valued function (0-form) $f$ on $X$ ($d^2 = 0$ implies $d(A + df) = dA + d^2f = dA = F$). Any 1-form $A$ satisfying [46] is called a (gauge) potential for $F$. The replacement $A \to A + df$ for some $f$ is called a gauge transformation of the potential and the freedom to make such a replacement without altering [46] is called gauge freedom.

One can show that, given $F$, it is always possible to locally solve $dA = F$ for $A$ subject to an arbitrary specification of the 0-form ${*}d{*}A$. More precisely, if $F$ is any 2-form satisfying $dF = 0$ and $g$ is an arbitrary 0-form, then locally, on a neighborhood of any point, there exists a 1-form $A$ satisfying

$$dA = F \quad \text{and} \quad {*}d{*}A = g \qquad [47]$$

(a more general result is proved in Parrott (1987, appendix 2) and a still more general one in section 2.9 of this same source). The usefulness of the second condition in [47] can be illustrated as follows. Suppose we are given some (physical) configuration of charges and currents (i.e., some source 1-form $J$) and we wish to find the corresponding electromagnetic field $F$. We must solve Maxwell's equations $dF = 0$ and ${*}d{*}F = J$ (subject to whatever boundary conditions are appropriate). Locally, at least, we may seek instead a corresponding potential $A$ (so that $F = dA$). Then the first of Maxwell's equations is automatically satisfied ($dF = d(dA) = 0$) and we need only solve ${*}d{*}(dA) = J$. To simplify the notation let us temporarily write $\delta = {*}d{*}$ and consider the operator $\Delta = d \circ \delta + \delta \circ d$ on forms (variously called the Laplace–Beltrami operator, Laplace–de Rham operator, or Hodge Laplacian on Minkowski spacetime). Then

$$\Delta A = d(\delta A) + \delta(dA) = d({*}d{*}A) + {*}d{*}(dA) \qquad [48]$$

According to the result quoted above, we may narrow down our search by imposing the condition ${*}d{*}A = 0$, that is

$$\delta A = 0 \qquad [49]$$

(this is generally referred to as imposing the Lorentz gauge). With this, [48] becomes $\Delta A = {*}d{*}(dA)$ and to satisfy the second Maxwell equation we must solve

$$\Delta A = J \qquad [50]$$

Thus, we see that the problem of (locally) solving Maxwell's equations for a given source $J$ reduces to that of solving [49] and [50] for the potential $A$. To understand how this simplifies the problem, we note that a calculation in admissible coordinates shows that the operator $\Delta$ reduces to the componentwise d'Alembertian $\Box$, defined on real-valued functions by

$$\Box = -\frac{\partial^2}{\partial (x^1)^2} - \frac{\partial^2}{\partial (x^2)^2} - \frac{\partial^2}{\partial (x^3)^2} + \frac{\partial^2}{\partial (x^4)^2}$$

Thus, eqn [50] decouples into four scalar equations

$$\Box A_a = J_a, \quad a = 1, 2, 3, 4 \qquad [51]$$

each of which is the well-studied inhomogeneous wave equation.
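
As a small numerical illustration of [46] and [51] (a sketch under stated assumptions, not part of the derivation above), consider the static Coulomb field of [41]. We assume the potential $A = -(q/r)\,dx^4$, a choice made here only for the example; it is static, so the gauge condition [49] holds trivially. The Python sketch below checks by finite differences that $F = dA$ reproduces $E^i = q\,x^i/r^3$ and that the component $A_4$ satisfies the source free wave equation away from the charge. The names `Q`, `A4`, `box`, and `electric_field` are hypothetical, and the d'Alembertian is taken as $\partial^2/\partial(x^4)^2$ minus the spatial Laplacian; because the source vanishes at the test point, the check does not depend on that overall sign.

```python
import math

Q = 1.0  # charge q in the article's units; the value is an assumption for illustration


def shifted(x, a, h):
    """Return a copy of the point x with coordinate index a moved by h."""
    y = list(x)
    y[a] += h
    return y


def A4(x):
    """Component A_4 = -q/r of the assumed static potential A = -(q/r) dx^4."""
    r = math.sqrt(x[0] ** 2 + x[1] ** 2 + x[2] ** 2)
    return -Q / r


def second_derivative(f, x, a, h=1e-3):
    """Central-difference estimate of the second partial derivative of f along axis a."""
    return (f(shifted(x, a, h)) - 2.0 * f(x) + f(shifted(x, a, -h))) / h ** 2


def box(f, x, h=1e-3):
    """d'Alembertian, sign choice: d^2/d(x^4)^2 minus the spatial Laplacian."""
    spatial = sum(second_derivative(f, x, a, h) for a in range(3))
    return second_derivative(f, x, 3, h) - spatial


def electric_field(x, h=1e-5):
    """E^i = F_{i4} = d_i A_4 - d_4 A_i, with A_1 = A_2 = A_3 = 0."""
    return [(A4(shifted(x, i, h)) - A4(shifted(x, i, -h))) / (2.0 * h) for i in range(3)]


point = [1.0, 2.0, 2.0, 0.0]   # (x^1, x^2, x^3, x^4) with r = 3, away from the charge
residual = box(A4, point)      # should vanish up to discretization error
E = electric_field(point)
coulomb = [Q * point[i] / 3.0 ** 3 for i in range(3)]   # (q / r^3) x^i

print("box A_4    =", residual)
print("E from dA  =", E)
print("E Coulomb  =", coulomb)
```

The three printed quantities should show a residual close to zero and two matching field vectors. The step sizes are a pragmatic choice; smaller values trade truncation error for round-off error.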
Introductory Article: Quantum Mechanics 109

Further Reading

Choquet-Bruhat Y, De Witt-Morette C, and Dillard-Bleick M (1977) Analysis, Manifolds and Physics. Amsterdam: North-Holland.
Einstein A et al. (1958) The Principle of Relativity. New York: Dover.
Naber GL (1992) The Geometry of Minkowski Spacetime. Berlin: Springer.
Parrott S (1987) Relativistic Electrodynamics and Differential Geometry. Berlin: Springer.
Spivak M (1965) Calculus on Manifolds. New York: W A Benjamin.
Synge JL (1972) Relativity: The Special Theory. Amsterdam: North-Holland.

Introductory Article: Quantum Mechanics


G F dell'Antonio, Università di Roma "La Sapienza," Rome, Italy
© 2006 Elsevier Ltd. All rights reserved.

Historical Background

In this section we shall briefly recall the basic empirical facts and the first theoretical attempts from which the theory and the formalism of present-day quantum mechanics (QM) have grown. In the next sections we shall give the mathematical and computational structure of QM, mention the physical problems that QM has solved with much success, and describe the serious conceptual consistency problems which are posed by QM (and which remain unsolved up to now).

Empirical rules of discretization were observed already, starting from the 1850s, in the absorption and in the emission of light. Fraunhofer noticed that the dark lines in the absorption spectrum of the light of the sun coincide with the bright lines in the emission lines of all elements. G Kirchhoff and R Bunsen reached the conclusion that the relative intensities of the emission and absorption of light implied that the ratio between energy emitted and absorbed is independent of the atom considered. This was the starting point of the analysis by Planck.

On the other hand, by the early years of the twentieth century, the spatial structure of the atom had been investigated; the most successful model was that of Rutherford, in which the atom appeared as a small nucleus of charge $Z$ surrounded by $Z$ electrons attracted by the nucleus according to Coulomb's law. This model represents, for distances of the order of the size of an atom, a complete departure from Newton's laws combined with the laws of classical electrodynamics; indeed, according to these laws, the atom would be unstable against collapse, and would certainly not exhibit a discrete energy spectrum. We must conclude that the classical laws are inadequate for the description of emission and absorption of light, in which the internal structure of the atom plays a major role.

The birth of the old quantum theory is placed traditionally at the date of M Planck's discussion of the blackbody radiation in 1900.

Planck put forward the postulate that light is emitted and absorbed by matter in discrete energy quanta through "resonators" that have an energy proportional to their frequency. This assumption led, through the use of Gibbs' rules of Statistical Mechanics applied to a gas of resonators, to a law (Planck's law) which reproduces the empirical findings on the radiation from a blackbody. It led Einstein to ascribe to light (which had, since the times of Maxwell, a successful description in terms of waves) a discrete, particle-like nature. Nine years later A Einstein gave further support to Planck's postulate by showing that it can reproduce correctly the energy fluctuations in blackbody radiation and even clarifies the properties of specific heat. Soon afterwards, Einstein (1924, 1925) proved that the putative particle of light satisfied the relativistic laws (relation between energy and momentum) of a particle with zero mass.

This dual nature of light received further support from the experiments on the Compton effect and from the description, by Einstein, of the photoelectric effect (Einstein 1905). It should be emphasized that while Planck considered light in interaction with matter as composed of bits of energy $h\nu$ ($h \simeq 6.6 \times 10^{-27}$ erg s), Einstein's analysis went much further in assigning to the quantum of light properties of a particle-like (localized) object. This marks a complete departure from the laws of classical electromagnetism. Therefore, quoting Einstein,

It is conceivable that the wave theory of light, which retains its effectiveness for the representation of purely optical phenomena and is based on continuous functions over space, will lead to contradiction with the experiments when applied to phenomena in which there is creation or conversion of light; indeed these phenomena can be better
described on the assumption that light is distributed discontinuously in space and described by a finite number of quanta which move without being divided and which must be absorbed or emitted as a whole.

Notice that, for a wavelength of $8 \times 10^3$ Å, a 30 W lamp emits roughly $10^{20}$ photons s$^{-1}$; for macroscopic objects the discrete nature of light has no appreciable consequence.

Planck's postulate and energy conservation imply that in emitting and absorbing light the atoms of the various elements can lose or gain energy only by discrete amounts. Therefore, atoms as producers or absorbers of radiation are better described by a theory that assigns to each atom a (possibly infinite) discrete set of states which have a definite energy.

The old quantum theory of matter addresses precisely this question. Its main proponent is N Bohr (Bohr 1913, 1918). The new theory is entirely phenomenological (as is Planck's theory) and based on Rutherford's model and on three more postulates (Born 1924):

(i) The states of the atom are stable periodic orbits, as given by Newton's laws, of energy $E_n$, $n \in \mathbb{Z}^+$, given by $E_n = h\nu_n f(n)$, where $h$ is Planck's constant, $\nu_n$ is the frequency of the electron on that orbit, and $f(n)$ is for each atom a function approximately linear in $Z$ at least for small values of $Z$.
(ii) When radiation is emitted or absorbed, the atom makes a transition to a different state. The frequency of the radiation emitted or absorbed when making a transition is $\nu_{n,m} = h^{-1}|E_n - E_m|$.
(iii) For large values of $n$ and $m$ and small values of $(n - m)/(n + m)$ the predictions of the theory should agree with those of the classical theory of the interaction of matter with radiation.

Later, A Sommerfeld gave a different version of the first postulate, by requiring that the allowed orbits be those for which the classical action is an integer multiple of Planck's constant.

The old quantum theory met success when applied to simple systems (atoms with $Z < 5$) but it soon became evident that a new, radically different point of view and a fresh start were needed; the new theory was to contain few free parameters, and the role of postulate (iii) was now to fix the value of these parameters.

There were two (successful) attempts to construct a consistent theory; both required a more sharply defined mathematical formalism. The first one was sparked by W Heisenberg, and further important ideas and mathematical support came from M Born, P Jordan, W Pauli, P Dirac and, on the mathematical side, also from J von Neumann and H Weyl. This formulation maintains that one should only consider relations between observable quantities, described by elements that depend only on the initial and final states of the system; each state has an internal energy. By energy conservation, the difference between the energies must be proportional (with a universal constant) to the frequency of the radiation absorbed or emitted. This is enough to define the energy of the state of a single atom modulo an additive constant. The theory must also take into account the probability of transitions under the influence of an external electromagnetic field. We shall give some details later on, which will help to follow the basis of this approach.

The other attempt was originated by L de Broglie following early remarks by W H Bragg and M Brillouin. Instead of emphasizing the discrete nature of light, he stressed the possible wave nature of particles, using as a guide the Hamilton–Jacobi formulation of classical mechanics. This attempt was soon supported by the experiments of Davisson and Germer (1927) on the scattering of a beam of electrons from a crystal. These experiments showed that, while electrons are recorded as "point particles," their distribution follows the law of the intensity for the diffraction of a (dispersive) wave. Moreover, the relation between momentum and frequency was, within experimental errors, the same as that obtained by Einstein for photons.

The theory started by de Broglie was soon placed in almost definitive form by E Schrödinger. In this approach one is naturally led to formulate and solve partial differential equations and the full development of the theory requires regularity results from the theory of functions.

Schrödinger soon realized that the relations which were found in the approach of Heisenberg could be easily (modulo technical details which we shall discuss later) obtained within the formalism he was advocating and indeed he gave a proof that the two formalisms were equivalent. This proof was later refined, from the mathematical point of view, by J von Neumann and G Mackey.

In fact, Schrödinger's approach has proved much more useful in the solution of most physical problems in the nonrelativistic domain, because it can rely on the developments and practical use of the theory of functions and of partial differential equations. Heisenberg's algebraic approach has therefore a lesser role in solving concrete problems in (nonrelativistic) QM.

If one considers processes in which the number of particles may change in time, one is forced to
introduce a Hilbert space that accommodates states with an arbitrarily large number of particles, as is the case in the theory of relativistic quantized fields or in quantum statistical mechanics; it is then more difficult to follow the line of Schrödinger, due to difficulties in handling spaces of functions of infinitely many variables. The approach of Heisenberg, based on the algebra of matrices, has a rather natural extension to suitable algebras of operators; the approach of Schrödinger, based on the description of a state as a (wave) function, encounters more difficulties since one must introduce functionals over spaces of functions and the description of dynamics does not have a simple form.

From this point of view, the generalization of Heisenberg's approach has led to much progress in the understanding of the structure of the resulting theory. Still some relevant results have been obtained in a Schrödinger representation. We shall not elaborate further on this point.

We shall end this introductory section with a short description of the emergence of the structure of QM in Heisenberg's and Schrödinger's approaches; this will provide a motivation for the axioms of QM which we shall introduce in the following section. For an extended analysis, see, for example, Jammer (1979).

The specific form that was postulated by de Broglie (1923) for the wave nature of a particle relies on the relation of geometrical optics with wave propagation and on the formulation of Hamiltonian mechanics as a sort of "wave front propagation" through the solution of the Hamilton–Jacobi equation and the introduction of group velocity.

By analogy with electromagnetic waves, it is natural to associate with a free nonrelativistic particle of momentum $p$ and mass $m$ the plane wave

$$\psi_p(x, t) = e^{i(p \cdot x - Et)/\hbar}, \quad \hbar \equiv \frac{h}{2\pi}, \quad E = \frac{p^2}{2m}$$

Schrödinger obtained the equation for a quantum particle in a field of conservative forces with potential $V(x)$ by considering an analogy with the propagation of an electromagnetic wave in a medium with refraction index $n(x, \omega)$ that varies slowly on the scale of the wavelength. Indeed, in this case the wave follows the laws of geometrical optics, and has therefore a particle-like behavior. If one denotes by $\hat{u}(x, \omega)$ the Fourier transform (with respect to time) of a generic component of the electric field and one assumes that the field is essentially monochromatic (so that the support of $\hat{u}(x, \omega)$ as a function of $\omega$ is in a very small neighborhood of $\omega_0$), one finds that $\hat{u}(x, \omega)$ is an approximate solution of the equation

$$-\Delta \hat{u}(x, \omega) = \frac{\omega_0^2}{c^2}\,n^2(x, \omega)\,\hat{u}(x, \omega) \qquad [1]$$

Writing $\hat{u}(x, \omega) = A(x, \omega)\,e^{i(\omega/c)W(x, \omega)}$ the phase $W(x, \omega)$ satisfies, in the high-frequency limit, the eikonal equation $|\nabla W(x, \omega)|^2 = n^2(x, \omega)$. One can define for the solution a phase velocity $v_f$ and it turns out that $v_f = c/|\nabla W(x, \omega)|$.

On the other hand, classical mechanics can also be described by propagation of surfaces of constant value for the solution $W(x, t)$ of the Hamilton–Jacobi equation $H(x, \nabla W) = E$, with $H = p^2/2m + V(x)$. Recall that high frequency (the realm of geometric optics) corresponds to small distances. This analogy led Schrödinger (1926) to postulate that the dynamics satisfied by the waves associated with the particles was given by the (Schrödinger) equation

$$i\hbar\,\frac{\partial \psi(x, t)}{\partial t} = -\frac{\hbar^2}{2m}\,\Delta_x \psi(x, t) + V(x)\,\psi(x, t) \qquad [2]$$

This wave was to describe the particle and its motion, but, being complex valued, it could not represent any measurable property. It is a mathematical property of the solutions of [2] that the quantity $\int |\psi(x, t)|^2\,d^3x$ is preserved in time. Furthermore, if one sets

$$\rho(x, t) \equiv |\psi(x, t)|^2, \quad j(x, t) \equiv -i\,\frac{\hbar}{2m}\left[\bar{\psi}(x, t)\,\nabla\psi(x, t) - \psi(x, t)\,\nabla\bar{\psi}(x, t)\right] \qquad [3]$$

one easily verifies the local conservation law

$$\frac{\partial \rho}{\partial t} + \operatorname{div} j(x, t) = 0 \qquad [4]$$

These mathematical properties led to the statistical interpretation given by Max Born: in those experiments in which the position of the particles is measured, the integral of $|\psi(x, t)|^2$ over a region $\Omega$ of space gives the probability that at time $t$ the particle is localized in the region $\Omega$. Moreover, the current associated with a charged particle is given locally by $j(x, t)$ defined above.

Let us now briefly review Heisenberg's approach. At the heart of this approach are: empirical formulas for the intensities of emission and absorption of radiation (dispersion relations), Sommerfeld's quantum condition for the action and the vague statement "the analogue of the derivative for the discrete action variable is the corresponding finite difference quotient." And, most important, the remark that the correct description of atomic physics was through quantities associated with pairs of states, that is, (infinite) matrices and the
empirical fact that the frequency (or rather the wave number) $\omega_{k,j}$ of the radiation (emitted or absorbed) in the transition between the atomic levels $k$ and $j$ ($k \ne j$) satisfies the Ritz combination principle $\omega_{m,j} + \omega_{j,k} = \omega_{m,k}$. It is easy to see that any doubly indexed family satisfying this relation must have the form $\omega_{m,k} = E_m - E_k$ for suitable constants $E_j$.

It was empirically verified by Kramers that the dipole moment of an atom in an external monochromatic field with frequency $\nu$ was proportional to the field with a coefficient (of polarization)

$$P = \frac{e^2}{4\pi m}\sum_i \left(\frac{f_i}{\nu_i^2 - \nu^2} - \frac{F_i}{\nu_i^2 - \nu^2}\right) \qquad [5]$$

where $e$, $m$ are the charge and the mass of the electron and $f_i$, $F_i$ are the probabilities that the frequency $\nu_i$ is emitted or absorbed.

A detailed analysis of the phenomenon of polarization in classical mechanics, with the clearly stated aim "of presenting the results in a way that may give hints for the construction of a New Mechanics" was made by Max Born (1924). He makes use of action-angle variables $\{J_i, \theta_i\}$ assuming that the atom can be considered as a collection of harmonic oscillators with frequency $\nu_i$ coupled linearly to the electric field of frequency $\nu$.

In the dipole approximation one obtains the following result for the polarization $P$ (linear response in energy to the electric field):

$$P = \sum_{m>0} 2\,(m \cdot \nabla_J)\,\frac{|A(J)|^2\,\nu \cdot m}{(m \cdot \nu)^2 - \nu^2} \qquad [6]$$

where $\nu_k = \partial H/\partial J_k$ ($H$ is the interaction Hamiltonian), and $A(J)$ is a suitable matrix. In order to derive the new dynamics, having as a guide the correspondence principle, one has to compare this result with the Kramers dispersion relation, which we write (to make the comparison easier) in the form

$$P = \frac{e^2}{4\pi m}\sum_{n,m}\left(\frac{f_{m,n}}{\nu_{n,m}^2 - \nu^2} - \frac{f_{n,m}}{\nu_{n,m}^2 - \nu^2}\right), \quad E_m > E_n \qquad [7]$$

Bohr's rule implies that $\nu(n + \tau, n) = (E(n + \tau) - E(n))/h$.

Born and Heisenberg noticed that, for $n$ sufficiently large and $k$ small, one can approximate the differential operator in [6] with the corresponding difference operator, with an error of the order of $k/n$. Therefore, [6] could be substituted by

$$P = h^{-1}\sum_{m_k>0}\left[\frac{|A_{n+m,n}|^2}{\nu(n + m)^2 - \nu^2} - \frac{|A_{n-m,n}|^2}{\nu(n - m)^2 - \nu^2}\right] \qquad [8]$$

The conclusion Born and Heisenberg drew is that the matrix $A$ that takes the place of the momentum in the classical theory must be such that $|A_{n+m,n}|^2 = e^2 h\,m^{-1} f(n + m, n)$. In the same vein, considering the polarization in a static electric field, it is possible to find an expression for the matrix that takes the place of the coordinate $x$ in classical Hamiltonian theory.

In general, the new approach (matrix mechanics) associates matrices with some relevant classical observables (such as functions of position or momentum) with a time dependence that is derived from the empirical dispersion relations of Kramers, the correspondence principle, Bohr's rule, Sommerfeld's action principle and first- (and second-) order perturbation theory for the interaction of an atom with an external electromagnetic field. It was soon clear to Born and Jordan (1925) that this dynamics took the form $i\hbar\dot{A} = AH - HA$ for a matrix $H$ that for the case of the hydrogen atom is obtained from the classical Hamiltonian with the prescription given for the coordinates $x$ and $p$. The relation $[\hat{x}_h, \hat{p}_k] = i\hbar\,\delta_{h,k}\,I$ among the matrices $\hat{x}_h$ and $\hat{p}_k$ corresponding to position and momentum was also seen as plausible. One year later P Dirac (1926) pointed out the structural identity of this relation with the Poisson bracket of Hamiltonian dynamics, developed a "quantum algebra" and a "quantum differentiation" and proved that any ${*}$-derivation $\delta$ (derivation which preserves the adjoint) of the algebra $B_N$ of $N \times N$ matrices is inner, that is, is given by $\delta(a) = i[a, h]$ for a Hermitian matrix $h$. Much later this theorem was extended (with some assumptions) to the algebra of all bounded operators on a separable Hilbert space. Since the derivations are generators of a one-parameter continuous group of automorphisms, that is, of a dynamics, this result lent further strength to the ideas of Born and Heisenberg.

The algebraic structure introduced by Born, Jordan, and Heisenberg (1926) was used by Pauli (1927) to give a purely group-theoretical derivation of the spectrum of the hydrogen atom, following the lines of the derivation in symplectic mechanics of the SO(4) symmetry of the Coulomb system. This remarkable success gave much strength to the Heisenberg formulation of QM, which was soon recognized as an efficient instrument in the study of the atomic world.

The algebraic formulation was also instrumental in the description given by Pauli (1928) of the "spin" (a property of electrons empirically postulated by Goudsmit and Uhlenbeck to account for a hyperfine splitting of some emission lines) as an internal degree of freedom without reference to spatial coordinates and still connected with the
properties of the system under the group of spatial rotations. This description through matrices has a major role also in the formulation by Pauli of the exclusion principle (and its relation with Fermi-Dirac statistics), which gave further credit to Heisenberg's theory by helping in reproducing correctly the classification of the atoms.

These features may explain why the standard formulation of the axioms of QM given in the next section shows the influence of Heisenberg's approach. On the other hand, comparison with experiments is usually set in the framework of Schrödinger's approach. Posing the problems in terms of properties of the solution of the Schrödinger equation, one is led to a pragmatic use of the formalism, leaving aside difficulties of interpretation. This separation of the axioms from the practical use may be one of the reasons why a serious analysis of the axioms and of the problems that arise from them is apparently not a concern for most of the research in QM, even from the point of view of mathematical physics.

One should stress that both the approach of Born and Heisenberg and that of de Broglie and Schrödinger are rooted in a mixture of attention to the experimental data, deep understanding of the previous theory, bold analogies and approximations, and deep concern for the consistency of the new mechanics.

There is an essential difference between the starting points of the two approaches. In Heisenberg's approach, the atom has a priori no spatial structure; the description is entirely in terms of its properties under emission and absorption of light, and therefore its observable quantities are represented by matrices. Dynamics enters through the study of the interaction with the electromagnetic field, and some analogies with the classical theory of electrodynamics in an asymptotic regime (correspondence principle). In this way, as we have briefly indicated, one is led to the special role of some matrices, which have a mutual relation similar to the relation of position and momentum in Hamiltonian theory. Following this analogy, it is possible to extend the theory beyond its original scope and consider phenomena in which the electrons are not bound to an atom.

In the approach of Schrödinger, on the other hand, particles and collections of particles are represented by spatial structures (waves). Spatial coordinates are therefore introduced a priori, and the position of a particle is related to the intensity of the corresponding wave (this was stressed by Born). Position and momentum are both basic measurable quantities as in classical mechanics. Physical interpretation forces the particle wave to be square integrable, and mathematics provides a limitation on the simultaneous localization in momentum and position leading to Heisenberg's uncertainty principle. Dynamics is obtained from a particle-wave duality and an analogy with the relativistic wave equation in the low-energy regime. The presence of bound states with quantized energies is seen as a consequence of the well-known fact that waves confined to a bounded spatial region have their wave number (and therefore energy) quantized.

Formal Structure

In this section we describe the formal mathematical structure that is commonly associated with QM. It constitutes a coherent mathematical theory, but the interpretation axiom it contains leads to conceptual difficulties.

We state the axioms in the form in which they were codified by J von Neumann (1966); they constitute a mathematically precise rendering of the formalism of Born, Heisenberg, and Jordan. The formalism of Schrödinger per se does not require general statements about the category of observables.

Axiom I

(i) Observables are represented by self-adjoint operators in a complex separable Hilbert space $\mathcal{H}$.
(ii) Every such operator represents an observable.

Remark. Axiom I (ii) is introduced only for mathematical simplicity. There is no physical justification for part (ii). In principle, an observable must be connected to a procedure of measurement (observation), and for most of the self-adjoint operators on $\mathcal{H}$ (e.g., in the Schrödinger representation, for $i x_k (\partial/\partial x_h) x_k$) such a procedure has not yet been given.

Axiom II

(i) Pure states of the system are represented by normalized vectors in $\mathcal{H}$.
(ii) If a measurement of the observable A is made on a system in the state represented by the element $\phi \in \mathcal{H}$, the average of the numerical values one obtains is $\langle \phi, A\phi \rangle$, a real number because A is self-adjoint (we have denoted by $\langle \phi, \psi \rangle$ the scalar product in $\mathcal{H}$).

Remark. Notice that Axiom II makes no statement about the outcome of a single measurement.

Using the natural complex structure of $B(\mathcal{H})$, pure states can be extended as linear real functionals on $B(\mathcal{H})$.
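
As a small numerical illustration of Axiom II (ii), the following sketch evaluates the average $\langle \phi, A\phi \rangle$ for a 2 x 2 self-adjoint matrix and checks that it is real and unchanged by a global phase. The choice of the Pauli matrix sigma_y as observable, the test vectors, and the helper names `inner`, `apply_matrix`, and `expectation` are assumptions made for this example only; they do not appear in the text.

```python
import math

def inner(u, v):
    """Scalar product <u, v> on C^n, antilinear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def apply_matrix(A, v):
    """Matrix-vector product A v for a square matrix given as nested lists."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def expectation(A, phi):
    """Average <phi, A phi> of the observable A in the pure state phi."""
    return inner(phi, apply_matrix(A, phi))

# Illustrative observable (an assumption): the Pauli matrix sigma_y,
# which is self-adjoint.
sigma_y = [[0, -1j], [1j, 0]]

s = 1 / math.sqrt(2)
phi_eigen = [s, 1j * s]   # normalized eigenvector of sigma_y, eigenvalue +1
phi_other = [1, 0]        # normalized vector that is not an eigenvector

mean_eigen = expectation(sigma_y, phi_eigen)
mean_other = expectation(sigma_y, phi_other)

# Vectors differing by a phase represent the same state: same average.
phase = complex(math.cos(0.7), math.sin(0.7))
mean_phased = expectation(sigma_y, [phase * c for c in phi_eigen])

print(mean_eigen, mean_other, mean_phased)
```

The imaginary parts vanish up to rounding, as expected for a self-adjoint matrix; for a non-Hermitian matrix the same function would in general return a complex number.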
One defines a state as any linear real positive functional on $B(\mathcal{H})$ (all bounded operators on the separable Hilbert space $\mathcal{H}$) and says that a state is normal if it is continuous in the strong topology. It can be proved that a normal state can be decomposed into a convex combination of at most a denumerable set of pure states. With these definitions a state is pure iff it has no nontrivial decomposition. It is worth stressing that this statement is true only if the operators that correspond to observable quantities generate all of $B(\mathcal{H})$; one refers to this condition by stating that there are no superselection rules.

By general results in the theory of the algebra $B(\mathcal{H})$, a normal state $\rho$ is represented by a positive operator of trace class $\sigma$ through the formula $\rho(A) = \mathrm{Tr}(\sigma A)$. Since a positive trace-class operator (usually referred to as density matrix in analogy with its classical counterpart) has eigenvalues $\lambda_k$ that are positive and sum up to 1, the decomposition of the normal state $\rho$ takes the form $\sigma = \sum_k \lambda_k \Pi_k$, where $\Pi_k$ is the projection operator onto the kth eigenstate (counting multiplicity).

It is also convenient to know that if a sequence of normal states $\sigma_k$ on $B(\mathcal{H})$ converges weakly (i.e., for each $A \in B(\mathcal{H})$ the sequence $\sigma_k(A)$ converges) then the limit state is normal. This useful result is false in general for closed subalgebras of $B(\mathcal{H})$, for example, for algebras that contain no minimal projections.

Note that no pure state is dispersion free with respect to all the observables (contrary to what happens in classical mechanics). Recall that the dispersion of the state $\sigma$ with respect to the observable A is defined as $\Delta_\sigma(A) \equiv \sigma(A^2) - (\sigma(A))^2$.

The connection of the state with the outcome of a single measurement of an observable associated with an operator A is given by the following axiom, which we shall formulate only for the case when the self-adjoint operator A has only discrete spectrum. The generalization to the other case is straightforward but requires the use of the spectral projections of A.

Axiom III

(i) If A has only discrete spectrum, the possible outcomes of a measurement of A are its eigenvalues $\{a_k\}$.
(ii) If the state of the system immediately before the measurement is represented by the vector $\phi \in \mathcal{H}$, the probability that the outcome be $a_k$ is $\sum_h |\langle \phi, \psi_h^{A;k} \rangle|^2$, where the $\psi_h^{A;k}$ are a complete orthonormal set in the Hilbert space spanned by the eigenvectors of A to the eigenvalue $a_k$.
(iii) If a system is in the pure state $\phi$ and one performs a measurement of the observable A with outcome $a_j \in (b - \epsilon, b + \epsilon)$ for some $b, \epsilon \in R$, then immediately after the measurement the system can be in any (not necessarily pure) state which lies in the convex hull of the pure states which are in the spectral subspace of the operator A in the interval $\Delta_{b;\epsilon} \equiv (b - \epsilon, b + \epsilon)$.

Note. Statements (ii) and (iii) can be extended without modification to the case in which the initial state is not a pure state, and is represented by a density matrix $\sigma$.

Remark 1. Axiom III makes sure that if one performs, immediately after the first, a further measurement of the same observable A, the outcome will still lie in the interval $\Delta_{b;\epsilon}$. This is needed to give some objectivity to the statement made about the outcome; notice that one must place the condition "immediately after" because the evolution may not leave invariant the spectral subspaces of A.

If the operator A has, in the interval $\Delta_{b;\epsilon}$, only discrete (pure point) spectrum, one can express Axiom III in the following way: the outcome can be any state that can be represented by a convex affine superposition of the eigenstates of A with eigenvalues contained in $\Delta_{b;\epsilon}$.

In the very special case when A has only one eigenvalue in $\Delta_{b;\epsilon}$ and this eigenvalue is not degenerate, one can state Axiom III in the following form (commonly referred to as reduction of the wave packet): the system after the measurement is pure and is represented by an eigenstate of the operator A.

Remark 2. Notice that the third axiom makes a statement about the state of the system after the measurement is completed.

It follows from Axiom III that one can measure simultaneously only observables which are represented by self-adjoint operators that commute with each other (i.e., their spectral projections mutually commute). It follows from the spectral representation of the self-adjoint operators that a family $\{A_k\}$ of commuting operators can be considered as functions over a common measure space (i.e., there is a representation in which they are such functions).

Axioms I-III give a mathematically consistent formulation of QM and allow a statistical description (and statistical prediction) of the outcome of the measurement of any observable. It is worth remarking that while the predictions will have only a statistical nature, the dynamical evolution of the observables (and by duality of the states) will be described by deterministic laws. The intrinsically statistical aspect of the predictions comes only from
the third postulate, which connects the mathematical content of the theory with the measurement process.

The third axiom, while crucial for the connection of the mathematical formalism with the experimental data, contains the seed of the conceptual difficulties which plague QM and have not been cured so far.

Indeed, the third axiom indicates that the process of measurement is described by laws that are intrinsically different from the laws that rule the evolution without measurement. This privileged role of the change of state brought about by a measurement leads to serious conceptual difficulties, since the change is independent of whether or not the result is recorded by some observer; one should therefore have a way to distinguish between measurements and generic interactions with the environment.

A related problem that originates from Axiom III is that the formulation of this axiom refers implicitly to the presence of a classical observer that certifies the outcomes of measurements and is allowed to make use of classical probability theory. This observer is therefore not subjected to the laws of QM.

These two aspects of the conceptual difficulties have their common origin in the separation of the measuring device and of the measured systems into disjoint entities satisfying different laws. The difficulties in the theory of measurement have not yet received a satisfactory answer, but various attempts have been made, with varying degrees of success, and some of them are described briefly in the section "Interpretation problems." It appears therefore that QM in its present formulation is a refined and successful instrument for the description of the nonrelativistic phenomena at the Planck scale, but its internal consistency is still standing on shaky ground.

Returning to the axioms, it is worth remarking explicitly that according to Axiom II a state is a linear functional over the observables, but it is represented by a sesquilinear function on the complex Hilbert space $\mathcal{H}$. Since Axiom II states that any normalized element of $\mathcal{H}$ represents a state (and elements that differ only by a phase represent the same state), together with $\phi$ and $\psi$ also $\xi \equiv a\phi + b\psi$, $|a|^2 + |b|^2 = 1$, represents a state, the superposition of $\phi$ and $\psi$ (superposition principle).

But for an observable A, one has in general $\rho_\xi(A) \neq |a|^2 \rho_\phi(A) + |b|^2 \rho_\psi(A)$, due to the cross-terms in the scalar product. The superposition principle is one of the characteristic features of QM. The superposition of the two pure states $\phi$ and $\psi$ has properties completely different from those of a statistical mixture of the same two states, defined by the density matrix $\sigma = |a|^2 \Pi_\phi + |b|^2 \Pi_\psi$, where we have denoted by $\Pi_\phi$ the orthogonal projection onto the normalized vector $\phi$. Therefore, the search for these interference terms is one of the means to verify the predictions of QM, and their smallness under given conditions is a sign of quasiclassical behavior of the system under study.

Strictly connected to superposition are entanglement and the partial trace operation. Suppose that one has two systems which, when considered separately, are described by vectors in two Hilbert spaces $\mathcal{H}_i$, i = 1, 2, and which have observables $A_i \in B(\mathcal{H}_i)$. When we want to study their mutual interaction, it is natural to describe both of them in the Hilbert space $\mathcal{H}_1 \otimes \mathcal{H}_2$ and to consider the observables $A_1 \otimes I$ and $I \otimes A_2$.

When the systems interact, the interaction will not in general commute with the projection operator $\Pi_1$ onto $\mathcal{H}_1$. Therefore, even if the initial state is of the form $\phi_1 \otimes \phi_2$, $\phi_i \in \mathcal{H}_i$, the final state (after the interaction) is a vector $\xi \in \mathcal{H}_1 \otimes \mathcal{H}_2$ which cannot be written as $\xi = \zeta_1 \otimes \zeta_2$ with $\zeta_i \in \mathcal{H}_i$. It can be shown, however, that there always exist two orthonormal families of vectors $\phi_n \in \mathcal{H}_1$ and $\psi_n \in \mathcal{H}_2$ such that $\xi = \sum_n c_n\, \phi_n \otimes \psi_n$ for suitable $c_n \in C$, $\sum_n |c_n|^2 = 1$ (this decomposition is not unique in general).

Recalling that $\rho_{\phi \otimes \psi}(A_1 \otimes I) = \rho_\phi(A_1)$, one can write

$$\rho_\xi(A_1 \otimes I) = \sum_n |c_n|^2\, \rho_{\phi_n}(A_1) \equiv \rho_{\sigma_1}(A_1), \qquad \sigma_1 \equiv \sum_n |c_n|^2\, \Pi_{\phi_n}$$

The map $\Pi_2 : \rho_\xi \to \rho_{\sigma_1}$ is called reduction (or also conditioning) with respect to $\mathcal{H}_2$; it is also called partial trace with respect to $\mathcal{H}_2$. The first notation reflects the analogy with conditioning in classical probability theory.

The map $\Pi_2$ can be extended by linearity to a map from normal states (density matrices) on $B(\mathcal{H}_1 \otimes \mathcal{H}_2)$ to normal states on $B(\mathcal{H}_1)$ and gives rise to a positivity-preserving and trace-preserving map.

One can in fact prove (Takesaki 1971) that any conditioning for normal states of a von Neumann algebra $\mathcal{M}$ is completely positive in the sense that it remains positive after tensorization of $\mathcal{M}$ with $B(\mathcal{K})$, where $\mathcal{K}$ is an arbitrary Hilbert space.

It can also be proved that a partial converse is true, that is, that every completely positive trace-preserving map $\Phi$ on normal states of a von Neumann algebra $\mathcal{A} \subset B(\mathcal{H})$ can be written, for a suitable choice of a larger Hilbert space $\mathcal{K}$ and partial isometries $V_k$, in the form (Kraus form) $\Phi(a) = \sum_k V_k^* a V_k$.
But it must be remarked that, if U(t) is a one-parameter group of unitary operators on $\mathcal{H}_1 \otimes \mathcal{H}_2$ and $\sigma$ is a density matrix, the one-parameter family of maps $\Gamma(t) : \sigma \to \Pi_2(U(t)\,\sigma\,U^*(t))$ does not, in general, have the semigroup property $\Gamma(t + s) = \Gamma(t) \cdot \Gamma(s)$, $s, t > 0$, and therefore there is in general no generator (of a reduced dynamics) associated with it. Only in special cases and under very strong hypotheses and approximations is there a reduced dynamics given by a semigroup (Markov property).

Since entanglement and (nontrivial) conditioning are marks of QM, and on the other side the Markov property described above is typical of conditioning in classical mechanics, it is natural to search for conditions and approximations under which the Markov property is recovered, and more generally under which the coherence properties characteristic of QM are suppressed (decoherence). We shall discuss briefly this problem in the section "Interpretation problems," devoted to the attempts to overcome the serious conceptual difficulties that descend from Axiom III.

It is seen from the remarks and definitions above that normal states (density matrices) play the role that in classical mechanics is attributed to measures over phase space, with the exception that pure states in QM do not correspond to Dirac measures (later on we shall discuss the possibility of describing a quantum-mechanical state with a function (Wigner function) on phase space).

In this correspondence, evaluation of an observable (a measurable function over phase space) over a state (a normalized, positive measure) is related to finding the (Hilbert space) trace of the product of an operator in $B(\mathcal{H})$ with a density matrix. Notice that the trace operation shares some of the properties of the integral, in particular tr AB = tr BA if A is in trace class and $B \in B(\mathcal{H})$ (cf. $g \in L^1$ and $f \in L^\infty$) and tr AB $\geq$ 0 if A is a density matrix and B is a positive operator. This suggests defining functions over the density matrices that correspond to quantities which are important in the theory of dynamical systems, in particular the entropy.

This is readily done if the Hilbert space is finite dimensional, and in the infinite-dimensional case if one takes as observables all Hermitian bounded operators. In quantum statistical mechanics one is led to consider an infinite collection of subsystems, each one described with a Hilbert space (finite or infinite dimensional) $\mathcal{H}_i$, i = 1, 2, . . . ; the space of representation is a subspace $\mathcal{K}$ of $\mathcal{H}_1 \otimes \mathcal{H}_2 \otimes \cdots$, and the observables are a (weakly closed) subalgebra $\mathcal{A}$ of $B(\mathcal{K})$ (typically constructed as an inductive limit of elements of the form $I \otimes I \otimes \cdots \otimes A_k \otimes I \otimes \cdots$). In this context one also considers normal states on $\mathcal{A}$ and defines a trace operation, with the properties described above for a trace. Most of the definitions (e.g., of entropy) can be given in this enlarged context, but differences may occur, since in general $\mathcal{A}$ does not contain finite-dimensional projections, and therefore the trace function is not the trace commonly defined in a Hilbert space. We shall not describe further this very interesting and much developed theory, of major relevance in quantum statistical mechanics. For a thorough presentation see Ohya and Petz (1993).

The simplest and most-studied example is the case when each Hilbert space $\mathcal{H}_i$ is a complex two-dimensional space. The resulting system is constructed in analogy with the Ising model of classical statistical mechanics, but in contrast to that system it possesses, for each value of the index i, infinitely many pure states. The corresponding algebra of observables is a closed subalgebra of $(C^2 \otimes C^2)^{\otimes Z}$ and generically does not contain any finite-dimensional projection.

This model, restricted to the case $(C^2 \otimes C^2)^K$, K a finite integer, has become popular in the study of quantum information and quantum computation, in which case a normalized element of $\mathcal{H}_i$ is called a q-bit (in analogy with the bits of information in classical information theory). It is clear that the unit sphere in $(C^2 \otimes C^2)$ contains many more than four points, and this gives much more freedom for operations on the system. This is the basis of quantum computation and quantum information, a very interesting field which has received much attention in recent years.

Quantization and Dynamics

The evolution in nonrelativistic QM is described by the Schrödinger equation in the representation in which for an N-particle system the Hilbert space is $L^2(R^{3N}) \otimes C^k$, where $C^k$ is a finite-dimensional space which accounts for the fact that some of the particles may have a spin content.

Apart from (often) inessential parameters, the Schrödinger equation for spin-0 particles can be written typically as

$$i\hbar\,\frac{\partial \phi}{\partial t} = H\phi, \qquad H = \sum_{k=1}^{N} m_k\,(i\hbar\nabla_k + A_k)^2 + \sum_{k=1}^{N} V_k(x_k) + \sum_{i \neq k,\,1}^{N} V_{i,k}(x_i - x_k) \tag{9}$$

where $\hbar$ is Planck's constant, $A_k$ are vector-valued functions (vector potentials), and $V_k$ and $V_{i,k}$ are scalar-valued functions (scalar potentials) on $R^3$.
If some particles have spin 1/2, the corresponding kinetic energy term should read $(i\hbar\,\sigma \cdot \nabla)^2$, where $\sigma_k$, k = 1, 2, 3, are the Pauli matrices, and one must add a term W(x) which is a matrix field with values in $C^k \otimes C^k$ and takes into account the coupling between the spin degrees of freedom. Notice that the local operator $i\sigma \cdot \nabla$ is a square root of the Laplacian $-\Delta$.

A relativistic extension of the Schrödinger equation for a free particle of mass $m \geq 0$ in dimension 3 was obtained by Dirac in a space of spinor-valued functions $\psi_k(x, t)$, k = 0, 1, 2, 3, which carries an irreducible representation of the Lorentz group. In analogy with the electromagnetic field, for which a linear partial differential equation (PDE) can be written using a four-dimensional representation of the Lorentz group, the relativistic Dirac equation is the linear PDE

$$i \sum_{k=0}^{3} \gamma_k\, \frac{\partial \psi}{\partial x_k} = m\psi, \qquad x_0 \equiv ct$$

where the $\gamma_k$ generate the algebra of a representation of the Lorentz group. The operator $\sum_k (\partial/\partial x_k)\,\gamma_k$ is a local square root of the relativistically invariant d'Alembert operator $\partial^2/\partial x_0^2 - \Delta + m \cdot I$.

When one tries to introduce (relativistically invariant) local interactions, one faces the same problem as in classical mechanics, namely one must introduce relativistically covariant fields (e.g., the electromagnetic field), that is, systems with an infinite number of degrees of freedom. If this field is considered as external, one faces technical problems, which can be overcome in favorable cases. But if one tries to obtain a fully quantized theory (by also quantizing the field) the obstacles become insurmountable, due also to the nonuniqueness of the representation of the canonical commutation relations if these are taken as the basis of quantization, as in the finite-dimensional case.

In a favorable case (e.g., the interaction of a quantum particle with the quantized electromagnetic field) one can set up a perturbation scheme in a parameter $\alpha$ (the physical value of $\alpha$ in natural units is roughly 1/137). We shall come back later to perturbation schemes in the context of the Schrödinger operator; in the present case one has been able to find procedures (renormalization) by which the series in $\alpha$ that describe relevant physical quantities are well defined term by term. But even in this favorable case, where the sum of the first few terms of the series is in excellent agreement with the experimental data, one has reasons to believe that the series is not convergent, and one does not even know whether the series is asymptotic.

One is led to wonder whether the structure of fields (operator-valued elements in the dual of compactly supported smooth functions on classical spacetime), taken over in a simple way from the field structure of classical electromagnetism, is a valid instrument in the description of phenomena that take place at a scale incomparably smaller than the scale (atomic scale) at which we have reasons to believe that the formalisms of Schrödinger and Heisenberg provide a suitable model for the description of natural phenomena.

The phenomena which are related to the interaction of a quantum nonrelativistic particle with the quantized electromagnetic field take place at the atomic scale. These phenomena have been the subject of very intense research in theoretical physics, mostly within perturbation theory, and the analysis to the first few orders has led to very spectacular results (although there is at present no proof that the perturbation series are at least asymptotic).

In this field rigorous results are scarce, but recently some progress has been made, establishing, among other things, the existence of the ground state (a nontrivial result, because there is no gap separating the ground-state energy from the continuous part of the spectrum) and paving the way for the description of scattering phenomena; the latter result is again nontrivial because the photon field may lead to an anomalous infrared (long-range) behavior, much in the same way that the long-range Coulomb interaction requires a special treatment in nonrelativistic scattering theory.

This contribution to the Encyclopedia is meant to be an introduction to QM and therefore we shall limit ourselves to the basic structure of nonrelativistic theory, which deals with systems of a finite number of particles interacting among themselves and with external (classical) potential fields, leaving for more specialized contributions a discussion of more advanced items in QM and of the successes and failures of a relativistically invariant theory of interaction between quantum particles and quantized fields.

We shall return therefore to basics.

One may begin a section on dynamics in QM by discussing some properties of the solutions of the Schrödinger equation, in particular dispersive effects and the related scattering theory, the problem of bound states and resonances, the case of time-dependent perturbation and the ionization effect, the binding of atoms and molecules, the Rayleigh scattering, the Hall effect and other effects in nanophysics, the various multiscale and adiabatic limits, and in general all the physical problems that
have been successfully solved by Schrödinger's QM (as well as the very many interesting and unsolved problems).

We will consider briefly these issues and the approximation schemes that have been developed in order to derive explicit estimates for quantities of physical interest. Since there are very many excellent reviews of present-day research in QM (e.g., Araki and Ezawa (2004), Blanchard and Dell'Antonio (2004), Cycon et al. (1986), Hislop and Sigal (1996), Lieb (1990), Le Bris (2005), Simon (2002), and Schlag (2004)) we refer the reader to the more specialized contributions to this Encyclopedia for a detailed analysis and precise statements about the results.

We prefer to come back first to the foundations of the theory; we shall take the point of view of Heisenberg and start discussing the mapping properties of the algebra of observables and of the states. Since transition probabilities play an important role, we consider only transformations $\alpha$ which are such that, for any pair of pure states $\phi_1$ and $\phi_2$, one has $|\langle \alpha(\phi_1), \alpha(\phi_2) \rangle| = |\langle \phi_1, \phi_2 \rangle|$. We call these maps Wigner automorphisms.

A result of Wigner (see Weyl (1931)) states that if $\alpha$ is a Wigner automorphism then there exists an operator $U_\alpha$, unique up to a phase and either unitary or antiunitary, such that $\alpha(P) = U_\alpha^* P U_\alpha$ for all projection operators. If there is a one-parameter group of such automorphisms, the corresponding operators are all unitary (but they need not form a group).

A generalization of this result is due to Kadison. Denoting by $I_{1,+}$ the set of density matrices, a Kadison automorphism $\beta$ is, by definition, such that for all $\sigma_1, \sigma_2 \in I_{1,+}$ and all 0 < s < 1 one has $\beta(s\sigma_1 + (1 - s)\sigma_2) = s\beta(\sigma_1) + (1 - s)\beta(\sigma_2)$. For Kadison automorphisms the same result holds as for Wigner's.

A similar result holds for automorphisms of the observables. Notice that the product of two Hermitian operators is not Hermitian in general, but Hermiticity is preserved under Jordan's product defined as $A \circ B \equiv (1/2)[AB + BA]$.

A Segal automorphism is, by definition, an automorphism of the Hermitian operators that preserves the Jordan product structure. A theorem of Segal states that $\gamma$ is a Segal automorphism if and only if there exist an orthogonal projector E, a unitary operator U in $E\mathcal{H}$, and an antiunitary operator V in $(I - E)\mathcal{H}$ such that $\gamma(A) = W A W^*$, where $W \equiv U \oplus V$.

We can study now in more detail the description of the dynamics in terms of automorphisms of Wigner or Kadison type when it refers to states and of Segal type when it refers to observables. We require that the evolution be continuous in suitable topologies. The strongest result refers to Wigner's case. One can prove that if a one-parameter group of Wigner automorphisms $\alpha_t$ is measurable in the weak topology (i.e., $(\alpha_t \sigma)(A)$ is measurable in t for every choice of A and $\sigma$) then it is possible to choose the U(t) provided by Wigner's theorem in such a way that they form a group which is continuous in the strong topology. Similar results are obtained for the cases of Kadison and Segal automorphisms, but in both cases one has to assume continuity of $\alpha_t$ in a stronger topology (the strong operator topology in the Segal case, the norm topology in Kadison's). Weak continuity is sufficient if the operator product is preserved (in this case one speaks of automorphisms of the algebra of bounded operators). The existence of the continuous group U(t) defines a Hamiltonian evolution. One has indeed:

Theorem 1 (Stone). The map $t \to U(t)$, $t \in R$, is a weakly continuous representation of R in the set of unitary operators in a Hilbert space $\mathcal{H}$ if and only if there exists a self-adjoint operator H on (a dense set of) $\mathcal{H}$ such that $U(t) = e^{-itH}$ and therefore

$$\phi \in D(H) \;\to\; i\,\frac{dU(t)}{dt}\,\phi = H\,U(t)\,\phi \tag{10}$$

The operator H is called generator of the dynamics described by U(t).

Note. In Schrödinger's approach the operator described in Stone's theorem is called Hamiltonian, in analogy with the classical case. In the case of one particle of mass m in $R^3$ subject to a conservative force with potential energy V(x) it has the following form, in units in which $\hbar = 1$:

$$H = -\frac{1}{2m} \sum_k \frac{\partial^2}{\partial x_k^2} + V(x) \tag{11}$$

If the potential V depends on time, Stone's theorem is not directly applicable, but still the spectral properties of the self-adjoint operators $H_t$ and of the kernel of the group $\tau \to e^{-iH_t \tau}$ are essential to solve the (time-dependent) Schrödinger equation.

The semigroup $t \to e^{-tH_0}$ is usually a positivity-preserving semigroup of contractions and defines a Markov process; in favorable cases, the same is true of $t \to e^{-tH}$ (Feynman-Kac formula).

There is an analogous situation in the general theory of dynamical systems on a von Neumann algebra; in analogy with the case of elliptic operators, one defines as dissipation a map $\mathcal{D}$ on a von Neumann algebra $\mathcal{M}$ which satisfies $\mathcal{D}(a^* a) \geq a^* \mathcal{D}(a) + \mathcal{D}(a^*) a$ for all $a \in \mathcal{M}$. The positive dissipation $\mathcal{D}$ is called completely positive if it remains positive after tensorization with $B(\mathcal{K})$ for any
Hilbert space $K$. Notice that according to this definition every $*$-derivation is a completely positive dissipation. For dissipations there is an analog of the theorem of Stinespring, and often a bounded dissipation can be written as

\[ \delta(a) = i[h, a] + \sum_k \left( V_k^{*} a V_k - \tfrac{1}{2}\{V_k^{*} V_k, a\} \right) \quad \text{for } a \in M \]

(the symbols $\{\cdot\,,\cdot\}$ denote the anticommutator).

In general terms, by quantization is meant the construction of a theory by deforming a commutative algebra of functions on a classical phase space $X$ in such a way that the dynamics of the quantum system can be derived from the prescription of deformation, usually by deforming the Poisson brackets if $X$ is a cotangent bundle $T^{*}M$ (Halbut 2002, Landsman 2002). We shall discuss only the Weyl quantization (Weyl 1931), which has its roots in Heisenberg's formulation of QM and refers to the case in which the configuration space is $R^N$ or, with some variant (Floquet–Zak), the $N$-dimensional torus. We shall add a few remarks on the Wick (anti-Weyl) quantization. More general formulations are needed when one tries to quantize a classical system defined on the cotangent bundle of a generic variety, and even more so if it is defined on a generic symplectic manifold.

The Weyl quantization is a mathematically accurate rendering of the essential content of the procedure adopted by Born and Heisenberg to construct dynamics by finding operators which play the role of symplectic coordinates.

Consider a system with one degree of freedom. The first naive attempt would be to find operators $\hat q$, $\hat p$ that satisfy the relation

\[ [\hat q, \hat p] \subset iI \qquad [12] \]

and to construct the Hamiltonian in analogy with the classical case. To play a similar role, the operators $\hat q$ and $\hat p$ must be self-adjoint and satisfy [12] at least in a weak sense. If both are bounded, [12] implies $e^{ib\hat p}\, \hat q\, e^{-ib\hat p} = \hat q + bI$ (the exponential is defined through a convergent series) and therefore the spectrum of $\hat q$ is the entire real line, a contradiction. Therefore, the inclusion sign in [12] is strict and we face domain problems, and as a consequence [12] has many inequivalent solutions (equivalence here means unitary equivalence).

Apart from pathological ones, defined on $L^2$-spaces over multiple coverings of $R$, there are inequivalent solutions of [12] which are effectively used in QM.

The most common solution is on the Hilbert space $L^2(R)$ (with Lebesgue measure), with $\hat x$ defined as the essentially self-adjoint operator that acts on the smooth functions with compact support as multiplication by the coordinate $x$, and $\hat p$ defined similarly in Fourier space. This representation can be trivially generalized to construct operators $\hat q_k$ and $\hat p_k$ in $L^2(R^N)$.

Another frequently used representation of [12] is on $L^2(S^1)$ (and, when generalized to $N$ degrees of freedom, on $T^N$). In this representation, the operator $\hat p$ is defined by $c_k \to k c_k$ on functions $f(\theta) = \sum_{k=-M}^{N} c_k e^{ik\theta}$, $0 \le M, N < \infty$. In this case the operator $\hat q$ is defined as multiplication by the angle coordinate $\theta$. It is easy to check that this representation is inequivalent to the previous one and that [12] is satisfied (as an identity) on the (dense) set of vectors which are in the domain both of $\hat p \hat q$ and of $\hat q \hat p$. But notice that the domain of essential self-adjointness of $\hat p$ is not left invariant by the action of $\hat q$ ($\theta f(\theta)$ is a function on $S^1$ only if $f(2\pi) = 0$).

We shall denote $\hat p$ in this representation by the symbol $\partial/\partial\theta_{\mathrm{per}}$ and refer to it as the Bloch representation. It can be modified by setting the action of $\hat p$ as $c_n \to (n + \alpha) c_n$, $0 < \alpha < 2\pi$, and this gives rise to the various Bloch–Zak and magnetic representations.

The Bloch representation can be extended to periodic functions on $R^1$ noticing that $L^2(R) = L^2(S^1) \otimes l^2(N)$; similarly, the Bloch–Zak and the magnetic representation can be extended to $L^2(R^N)$.

The difference between the representations can be seen more clearly if one considers the one-parameter groups of unitary operators generated by the canonical operators $\hat q$ and $\hat p$. In the Schrödinger representation on $L^2(R)$, these groups satisfy

\[ U(a)V(b) = e^{-iab}\, V(b)U(a), \qquad U(a) \equiv e^{ia\hat q}, \quad V(b) \equiv e^{ib\hat p} \]

and therefore, setting $z = a + ib$ and $W(z) \equiv e^{-iab/2}\, V(b)U(a)$, one has

\[ W(z)W(z') = e^{-i\omega(z,z')/2}\, W(z + z'), \qquad z \in C, \quad \omega(z, z') \equiv \mathrm{Im}\,\langle z, z' \rangle \qquad [13] \]

The unitary operators $W(z)$ are therefore projective representations of the additive group $C$. This generalizes immediately to the case of $N$ degrees of freedom; the representation is now of the additive group $C^N$ and $\omega$ is the standard symplectic form on $C^N$.

In the Bloch representation, the unitaries $U(a)V(b)U^{*}(a)V^{*}(b)$ are not multiples of the identity, and have no particularly simple form. The map $C^N \ni z \to W(z)$ with the structure [13] is called a Weyl system; it plays a major role in QM. The following
theorem has therefore a major importance in the mathematical theory of QM.

Theorem 2 (von Neumann 1965). There exists only one, modulo unitary equivalence, irreducible representation of the Weyl system.

The proof of this theorem follows a general pattern in the theory of group representations. One introduces an algebra $W^{(N)}$ of operators

\[ W_f \equiv \int f(z)\, W(z)\, dz, \qquad f \in L^1(C^N) \]

called the Weyl algebra.

It is easy to see that $\|W_f\| = \|f\|_1$ and that $f \to W_f$ is a linear isomorphism of algebras if one considers $W^{(N)}$ with its natural product structure and $L^1$ as a noncommutative algebra with product structure

\[ f * g \equiv \int dz'\, f(z - z')\, g(z') \exp\left( -\frac{i}{2}\, \omega(z, z') \right) \qquad [14] \]

So far the algebra $W^{(N)}$ is a concrete algebra of bounded operators on $L^2(R^2)$. But it can also be considered an abstract $C^{*}$-algebra, which we still denote by $W^{(N)}$.

It is easy to see that, according to [14], if $f_0$ is chosen to be a suitable Gaussian, then $W_{f_0}$ is a projection operator which commutes with all the $W_f$'s. Moreover, $W_f W_g = \varphi_{f,g} W_{f*g}$ for a suitable phase factor $\varphi$. Considering the Gelfand–Neumark–Segal construction for the $C^{*}$-algebra $W^{(N)}$, one finds that these properties lead to a decomposition of any representation in cyclic irreducible equivalent ones, completing the proof of the theorem.

The Weyl system has a representation (equivalent to the Schrödinger one) in the space $L^2(R^N, g)$, where $g$ is Gauss's measure. This allows an extension in which $C^N$ is replaced by an infinite-dimensional Banach space equipped with a Gauss measure (weak distribution (Segal 1965, Gross 1972, Wiener 1938)). Uniqueness fails in this more general setting (uniqueness is strictly connected with the compactness of the unit ball in $C^N$). Notice that in the Schrödinger representation (and, therefore, in any other representation) the Hamiltonian for the harmonic oscillator defines a positive self-adjoint operator

\[ N \equiv \sum_{k=1}^{N} N_k, \qquad N_k \equiv \frac{1}{2}\left( -\frac{\partial^2}{\partial x_k^2} + x_k^2 - 1 \right) \]

The spectrum of each of the commuting operators $N_k$ consists of the non-negative integers, and $N_k$ is therefore called the number operator for the $k$th degree of freedom. The operator $N_k$ can be written as $N_k = a_k^{*} a_k$, where $a_k = (1/\sqrt{2})(x_k + \partial/\partial x_k)$ and $a_k^{*}$ is the formal adjoint of $a_k$ in $L^2(R)$. One has $\|a_k (N_k + 1)^{-1/2}\| < 1$. In the domain of $N$ these operators satisfy the following relations (canonical commutation relations)

\[ [a_k, a_h^{*}] = \delta_{k,h}, \qquad [a_h, a_k] = 0 \]
\[ [N_k, a_h] = -a_h\, \delta_{h,k}, \qquad [N_h, a_k^{*}] = a_k^{*}\, \delta_{h,k} \qquad [15] \]

In view of the last two relations, the operator $a_k$ is called the annihilation operator (relative to the $k$th degree of freedom) and its formal adjoint is called the creation operator. The operators $a_k$ have as spectrum the entire complex plane, the operators $a_k^{*}$ have empty spectrum; the eigenvectors of $N_k$ are the Hermite polynomials in the variable $x_k$. The eigenvectors of $a_k$ (i.e., the solutions in $L^2(R)$ of the equation $a_k \phi_\lambda = \lambda \phi_\lambda$, $\lambda \in C$) are called coherent states; they have a major role in the Bargmann–Fock–Segal quantization and in general in the semiclassical limit.

The operators $\{N_k\}$ generate a maximal abelian system and therefore the space $L^2(R^N)$ has a natural representation as the symmetrized subspace of $\bigoplus_k (C^N)^{\otimes k}$ (Fock representation). In this representation, a natural basis is given by the common eigenvectors $\phi_{\{n_k\}}$, $k = 1, \ldots, N$, of the operators $N_k$. A generic vector can be written as

\[ \phi = \sum_{\{n_k\}} c_{\{n_k\}}\, \phi_{\{n_k\}}, \qquad \sum_{\{n_k\}} |c_{\{n_k\}}|^2 < \infty \]

and therefore can be represented by the sequence $c_{\{n_k\}}$. Notice that the creation operators do not create particles in $R^N$ but rather act as a shift in the basis of the Hermite polynomials.

It is traditional to denote by $\Gamma(L^2(R^N))$ the Fock representation (also called second quantization because for each degree of freedom the wave function is written in the quantized basis of the harmonic oscillator) and to denote by $\Gamma(A)$ the lift of a matrix $A \in B(C^N)$. These notations are especially used if $C^N$ is substituted with a Banach space $X$. This terminology was introduced by Segal in his work on quantization of the wave equation; it has been used ever since, mostly in a perturbative context.

In the theory of quantized fields, the space $C^N$ is substituted with a Banach space, $X$, of functions. In this setting, second quantization (Segal 1965, Nelson 1974) considers the state $\phi_{\{n_k\}}$ as representing a configuration of the system in which there are precisely $n_k$ particles in the $k$th physical state (this presupposes having chosen a basis in the space of distributions on $R^3$). There is no problem in doing this (Gross 1972) and one can choose for $X$ a suitable Sobolev space (which one depends on the Gaussian measure given in $X$) if one wants that the
generalization of the commutation relations [15] be of the form $[a^{*}(f), a(g)] = \langle f, g \rangle$ with a suitable scalar product $\langle \cdot\,, \cdot \rangle$ in $X$. The problem with quantization of relativistic fields is that, in order to ensure locality, one is forced to use a Sobolev space of negative index (depending on the dimension of physical space), and this gives rise to difficulties in the definition of the dynamics for nonlinear vector fields.

One should notice that in the work of Segal (1965), and then in Constructive field theory (Nelson 1974), the Fock representation is placed in a Schrödinger context exhibiting the relevant operators as acting on a space $L^2(X, g)$, where $X$ is a subspace of the space of Schwartz distributions on the physical space of the particles one wants to describe and $g$ is a suitably defined Gauss measure on $X$.

The Fock representation is related to the Bargmann–Fock–Segal representation (Bargmann 1967), a representation in a space of holomorphic functions on $C^N$ square integrable with respect to a Gaussian measure. For its development, this representation relies on the properties of Toeplitz operators and on Tauberian estimates. It is much used in the study of the semiclassical limit and in the formulation of QM in systems for which the classical version has, for phase space, a manifold which is not a cotangent bundle (e.g., the 2-sphere).

Remark The Fock representation associated with the Weyl system in the infinite-dimensional context can describe only particles obeying Bose–Einstein statistics; indeed, the states are qualified by their particle content for each element of the basis chosen and there is no possibility of identifying each particle in an $N$-particle state. This is obvious in the finite-dimensional case: the Hermite polynomial of order 2 cannot be seen as composed of two polynomials of order 1.

In the infinite-dimensional context, if one wants to treat particles which obey Fermi–Dirac statistics, one must rely on the Pauli exclusion principle (Pauli 1928), which states that two such particles cannot be in the same configuration; to ensure this, the wave function must be antisymmetric under permutation of the particle symbols. It is a matter of fact (and a theorem in relativistic quantum field theory, which follows in that theory from covariance, locality and positivity of the energy (Streater and Wightman 1964)) that particles with half-integer spin obey the Fermi–Dirac statistics. Therefore, to quantize such systems, one must introduce (commutation) relations different from those of Weyl. Since it must now be that $(a^{*})^2 = 0$, due to antisymmetry, it is reasonable to introduce the following relations (canonical anticommutation relations):

\[ \{a_k, a_h^{*}\} = \delta_{k,h}, \qquad \{a_h, a_k\} = 0 \]
\[ [N_k, a_h] = -a_h\, \delta_{h,k}, \qquad \{A, B\} \equiv AB + BA \qquad [16] \]

The Hilbert space is now $\bigotimes^N H_2$, where $H_2$ is a two-dimensional complex Hilbert space. Notice that $H_2$ carries an irreducible two-dimensional representation of $su(2) \cong o(3)$ (spin representation) so that this quantization associates spin 1/2 and antisymmetry.

The operators in [16] are all bounded (in fact bounded by 1 in norm). The Fock representation is constructed as in the case of Weyl (see Araki (1988)), with $n_k$ equal to 0 or 1 for each index $k$. The infinite-dimensional case is defined in the same way, and leads to inequivalent irreducible representations (Araki 1988); only in one of them is the number operator defined and bounded below. Some of these representations can be given a Schrödinger-like form, with the introduction of a gauge and an integration formalism based on a trace (Gross 1972). This system is much used in quantum statistical mechanics because it deals with bounded operators and can take advantage of strong results in the theory of $C^{*}$-algebras. In the finite-dimensional case (and occasionally also in the general case) it is used in quantum information (the space $H_2$ is the space of a quantum bit).

Returning to the Weyl system, we now introduce the strictly related Wigner function, which plays an important role in the analysis of the semiclassical limit and in the discussion of some scaling limits, in particular the hydrodynamical limit and the Bose–Einstein condensation when $N \to \infty$.

The Wigner function $W_\psi$ for a pure state $\psi$ is a real-valued function on the phase space of the classical system which represents the state faithfully. It is defined as

\[ W_\psi(x, \xi) = (2\pi)^{-n} \int_{R^n} e^{-i(\xi, y)}\, \psi\left(x + \frac{y}{2}\right) \bar\psi\left(x - \frac{y}{2}\right) dy \]

The Wigner function is not positive in general (the only exceptions are those Gaussian states that satisfy $\Delta(x) \cdot \Delta(p) \ge \hbar$). But it has the interesting property that its marginals reproduce correctly the Born rule. In fact, one has $\int W_\psi(x, \xi)\, dx = |\hat\psi(\xi)|^2$. If the function $\psi(t, x)$, $x \in R^n$, is a solution of the free Schrödinger equation $i\hbar\, \partial\psi/\partial t = -\hbar^2 \Delta\psi$, then its Wigner function satisfies the Liouville (transport) equation $\partial W_\psi/\partial t + \xi \cdot \nabla W_\psi = 0$.

The Wigner function is strictly linked with the Weyl quantization. This quantization associates with every function $\sigma(p, x)$ in a given regularity
class an operator $\sigma(D, x)$ (the Weyl symbol of the function $\sigma$) defined by

\[ (\sigma(D, x) f, g) \equiv \int \sigma(\xi, x)\, W_{f,g}(\xi, x)\, d\xi\, dx \]
\[ W_{f,g}(\xi, x) \equiv \int e^{-i(\xi, p)}\, f\left(x + \frac{p}{2}\right) \bar g\left(x - \frac{p}{2}\right) dp \]

It can be verified that the action of $F$ preserves the Schwartz classes $S$ and $S'$ and is unitary in $L^2(R^{2N})$. Moreover, one has $\bar\sigma(D, x) = \sigma(D, x)^{*}$.

The relation between Weyl's quantization and Wigner functions can be readily seen from the natural duality between bounded operators and pure states:

\[ \mathrm{tr}(\hat A \hat\rho) \equiv \int a(p, q)\, \rho(p, q)\, dp\, dq \]
\[ \rho(p, q) \equiv \int e^{i(p, q')}\, \rho(q'; q)\, dq' \]

We give now a brief discussion of the general structure of a quantization, and apply it to the Weyl quantization. By quantization of a Hamiltonian system we mean a correspondence, parametrized by a small parameter $\hbar$, between classical observables (real functions on a phase space $\mathcal{F}$) and quantum observables (self-adjoint operators on a Hilbert space $\mathcal{H}$) with the property that the corresponding structures coincide in the limit $\hbar \to 0$ and the difference for $\hbar \ne 0$ can be estimated in a suitable topology.

This last requirement is important for the applications and, from this point of view, Weyl's quantization gives stronger results than the other formalisms of quantization.

We limit our analysis to the case $\mathcal{F} \equiv T^{*}X$, with $X \equiv R^N$, and we make use of the realization of $\mathcal{H}$ as $L^2(R^N)$.

Let $\{x_i\}$ be Cartesian coordinates in $R^N$ and consider a correspondence $A \to \hat A$ that satisfies the following requirements:

1. $A \leftrightarrow \hat A$ is linear;
2. $x_k \leftrightarrow \hat x_k$, where $\hat x_k$ is multiplication by $x_k$;
3. $p_k \leftrightarrow -i\hbar\, \partial/\partial x_k$;
4. if $f$ is a continuous function in $R^N$, one has $f(x) \leftrightarrow f(\hat x)$ and $\hat f(p) = (Ff)(\hat x)$, where $F$ denotes a Fourier transform;
5. $L_\zeta \leftrightarrow \hat L_\zeta$, $\zeta \equiv (\alpha, \beta)$, $\alpha, \beta \in R^N$, where $L_\zeta$ is the generator of the translations in phase space in the direction $\zeta$ and $\hat L_\zeta$ is the generator of the one-parameter group $t \to W(t\zeta)$ associated with $\zeta$ by the Weyl system.

Note that (1) and (4) imply (2) and (3) through a limit procedure.

Under the correspondence $A \leftrightarrow \hat A$, linear symplectic maps correspond to unitary transformations. This is not in general the case for nonlinear maps.

One can prove that conditions (1)–(5) give a complete characterization of the map $A \leftrightarrow \hat A$. Moreover, the correspondence cannot be extended to other functions in phase space. Indeed, one has:

Theorem 3 (van Hove). Let $G$ be the class of functions $C^\infty$ on $R^{2N}$ which are generators of global symplectic flows. For $g \in G$ let $\Phi_g(t)$ be the corresponding group. There cannot exist for every $g$ a correspondence $g \leftrightarrow \hat g$, with $\hat g$ self-adjoint, such that $\hat g(x, p) = g(\hat x, \hat p)$.

We described the Weyl quantization as a correspondence between functions in the Schwartz class $S$ and a class of bounded operators. Weyl's quantization can be extended to a much wider class of functions. Operators that can be so constructed are called Fourier integral operators. One uses the notation $\sigma(D, x)$.

We have the following useful theorems (Robert 1987):

Theorem 4 Let $l_1, \ldots, l_K$ be linear functions on $R^N$ such that $\{l_i, l_k\} = 0$. Let $P$ be a polynomial and let $\sigma(\xi, x) \equiv P[l_1(\xi, x), \ldots, l_K(\xi, x)]$. Then
(i) $\sigma(D, x)$ maps $S$ into $L^2(R^N)$ and is self-adjoint;
(ii) if $g$ is continuous, then $(g(\sigma))(D, x) = g(\sigma(D, x))$.

One proves that $\sigma(D, x)$ extends to a continuous map $S'(X) \to S'(X)$ and, moreover,

Theorem 5 (Calderón–Vaillancourt). If $\sigma_0 \equiv \sum_{|\alpha| + |\beta| \le 2N+1} |D_\xi^{\alpha} D_x^{\beta} \sigma| < \infty$, the norm of the operator $\sigma(D, x)$ is bounded by $\sigma_0$.

Any operator obtained from a suitable class of functions through Weyl's quantization is called a pseudodifferential operator. If $\sigma(q, p) = P(p)$, where $P$ is a polynomial, $\sigma(\hat p, \hat q)$ is a differential operator. Moreover, if $\sigma(p, x) \in L^2$ then $\sigma(D, x)$ is a Hilbert–Schmidt operator and

\[ \|\sigma(D, x)\|_{HS} = (2\pi\hbar)^{-n/2} \left( \int |\sigma(z)|^2\, dz \right)^{1/2} \]

Pseudodifferential operators turn out to be very important, in particular in the quantum theory of molecules (Le Bris 2003), where adiabatic analysis and Peierls substitution rules force their use.

The next important problem in the theory of quantization is related to dynamics.

Let $\sigma$ be a quantization procedure and let $H(p, q)$ be a classical Hamiltonian on phase space. Let $A_t$ be
the evolution of a classical observable $A$ under the flow defined by $H$ and assume that $\sigma(A_t)$ is well defined for all $t$.

Is there a self-adjoint operator $\hat H$ such that $\sigma(A_t) = e^{it\hat H} \sigma(A)\, e^{-it\hat H}$? If so, can one estimate $|\hat H - \sigma(H)|$? Conversely, if the generator of the quantized flow is, by definition, $\hat H$ (as is usually assumed), is it possible to give an estimate of the difference $|(\sigma(A_t) - (\sigma(A))_t)\phi|$ for a dense set of $\phi \in \mathcal{H}$, where $\hat A_t \equiv e^{it\hat H} \hat A\, e^{-it\hat H}$, or to estimate $|\tilde A_t - A_t|_\infty$, where $\tilde A_t$ is defined by $\sigma(\tilde A_t) = (\sigma(A))_t$? Is it possible to write an asymptotic series in $\hbar$ for the differences?

For the Weyl quantization some quantitative results have been obtained if one makes use of the semiclassical observables (Robert 1987). We shall not elaborate further on this point.

For completeness, we briefly mention another quantization procedure which is often used in mathematical physics.

Wick Quantization

This quantization assigns positive operators to positive functions, but does not preserve polynomial relations. It is strictly related to the Bargmann–Fock–Segal representation.

Call coherent state centered in the point $(y, \eta)$ of phase space the normalized solution of $(i\hat p + \hat x - i\eta - y)\, \phi_{y,\eta}(x) = 0$.

Wick's quantization of the classical observable $A$ is by definition the map $A \to Op^W(A)$, where

\[ Op^W_\hbar(A) \equiv (2\pi\hbar)^{-n} \int A(y, \eta)\, (\,\cdot\,, \phi_{y,\eta})\, \phi_{y,\eta}\, dy\, d\eta \]

One can prove, either directly or going through Weyl's representation, that

1. if $A \ge 0$ then $Op^W_\hbar(A) \ge 0$;
2. the Weyl symbol of the operator $Op^W_\hbar(A)$ is
\[ (\pi\hbar)^{-n} \int\!\!\int A(y, \eta)\, e^{-\frac{1}{\hbar}\left[(x - y)^2 + (\xi - \eta)^2\right]}\, dy\, d\eta \]
3. for every $A \in O(0)$ one has $\|Op^W_\hbar(A) - \hat A\| = O(\hbar)$.

Wick's quantization associates with every vector $\phi \in \mathcal{H}$ a positive Radon measure $\mu_\phi$ in phase space, called Husimi measure. It is defined by $\int A\, d\mu_\phi = (Op^W_\hbar(A)\phi, \phi)$, $A \in S(z)$. Wick's quantization is less adapted to the treatment of nonrelativistic particles; in particular, Ehrenfest's rule does not apply, and the semiclassical propagation theorem has a more complicated formulation. It is very much used for the analysis in Fock space in the theory of quantized relativistic fields, where a special role is assigned to Wick ordering, according to which the polynomials in $\hat x_\hbar$ and $\hat p_\hbar$ are reordered in terms of creation and annihilation operators by placing all creation operators to the left.

We now come back to Schrödinger's equation and notice that it can be derived within Heisenberg's formalism and Weyl's quantization scheme from the Hamiltonian of an $N$-particle system in Hamiltonian mechanics (at least if one neglects spin, which has no classical analog).

Apart from (often) inessential parameters, the Schrödinger equation for $N$ scalar particles in $R^3$ can be written as

\[ i\hbar \frac{\partial \psi}{\partial t} = \sum_{k=1}^{N} (i\hbar \nabla_k + A_k)^2 \psi + V\psi \equiv H\psi, \qquad \psi \in L^2(R^{3N}) \qquad [17] \]

where $A_k$ are vector-valued functions (vector potentials) and $V = \sum_k V_k(x_k) + \sum_{i,k} V_{i,k}(x_i - x_k)$ are scalar-valued functions (scalar potentials) on $R^3$.

Typical problems in Schrödinger's quantum mechanics are:

1. Self-adjointness of $H$, existence of bound states (discrete spectrum of the operator), their number and distribution, and, in general, the properties of the spectrum.
2. Existence, completeness, and continuity properties of the wave operators
\[ W_\pm \equiv s\text{-}\lim_{t \to \pm\infty} e^{itH_0} e^{-itH} \qquad [18] \]
and the ensuing existence and properties of the S-matrix and of the scattering cross sections. In [18] $H_0$ is a suitable reference operator, usually $-\Delta$ (with periodic boundary conditions if the potentials are periodic in space), for which Schrödinger's equation can be somewhat analytically controlled.
3. Existence and property of a semiclassical limit.

In [17] and [18] we have implicitly assumed that $H$ is time independent; very interesting problems arise when $H$ depends on time, in particular if it is periodic or quasiperiodic in time, giving rise to ionization phenomena. In the periodic case, one is helped by Floquet's theory, but even in this case many interesting problems are still unsolved.

If the potentials are sufficiently regular, the spectrum of $H$ consists of an absolutely continuous part (made up of several bands in the space-periodic case) and a discrete part, with few accumulation points.

On the contrary, if $V(x, \omega)$ is a measurable function on some probability space $\Omega$, with a suitable distribution (e.g., Gaussian), the spectrum may have totally different properties almost surely.
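The contrast between a regular and a random potential can be illustrated numerically. The sketch below is not taken from the text: it assumes a one-dimensional lattice discretization of the eigenvalue equation, $\psi_{n+1} = (V_n - E)\psi_n - \psi_{n-1}$, and estimates the exponential growth rate of its solutions. The function name `lyapunov_exponent`, the uniform distribution of the potential, and all numerical parameters are illustrative assumptions.

```python
import math
import random


def lyapunov_exponent(potential, energy):
    """Estimate the exponential growth rate of solutions of
    psi[n+1] = (V[n] - E) * psi[n] - psi[n-1]
    (a lattice analogue of the Schroedinger eigenvalue equation)."""
    current, previous = 1.0, 0.0      # (psi[n], psi[n-1]), arbitrary start
    log_growth = 0.0
    for v in potential:
        # One step of the recursion (a transfer matrix of determinant 1).
        current, previous = (v - energy) * current - previous, current
        # Renormalize to avoid overflow and accumulate the log of the norm.
        norm = math.hypot(current, previous)
        log_growth += math.log(norm)
        current, previous = current / norm, previous / norm
    return log_growth / len(potential)


if __name__ == "__main__":
    sites = 20000
    energy = 0.5                      # inside the band [-2, 2] of the free model
    rng = random.Random(0)            # fixed seed, for reproducibility only

    free = [0.0] * sites
    disordered = [rng.uniform(-2.0, 2.0) for _ in range(sites)]

    print("growth rate, zero potential:  ", lyapunov_exponent(free, energy))
    print("growth rate, random potential:", lyapunov_exponent(disordered, energy))
```

A rate close to zero is compatible with extended solutions of band type, whereas a strictly positive rate means that solutions of the eigenvalue equation grow or decay exponentially along the lattice.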
For example, in the case $N = 1$ (so that the terms $V_{i,j}$ are absent) in one and two spatial dimensions the spectrum is pure point and dense, with eigenfunctions which decrease at infinity exponentially fast (although not uniformly); as a consequence, the evolution group does not give rise to a dispersive motion. The same is true in three dimensions if the potential is sufficiently strong and the kinetic energy content of the initial state is sufficiently limited. This very interesting behavior is due roughly to the randomness of the barriers generated by the potential and is also present, to a large extent, for potentials quasiperiodic in space (Pastur and Figotin 1992).

In these as well as in most problems related to Schrödinger's equation, a crucial role is taken by the resolvent operator $(H - \lambda I)^{-1}$, where $\lambda$ is any complex number outside the spectrum of $H$; many of the results are obtained when the difference $(H - \lambda I)^{-1} - (H_0 - \lambda I)^{-1}$ is a compact operator.

Problems of type (1) and (2) are of great physical interest, and are of course common with theoretical physics and quantum chemistry (Le Bris 2003), although the instruments of investigation are somewhat different in mathematical physics. The semiclassical limit is often more of theoretical interest, but its analysis has relevance in quantum chemistry and its methods are very useful whenever it is convenient to use multiscale methods, as in the study of molecular spectra.

We start with a brief description of point (3); it provides a valid instrument in the description of quantum-mechanical systems at a scale where it is convenient to use units in which the physical constant $\hbar$ has a very small value ($\hbar \simeq 10^{-27}$ in CGS units). From Heisenberg's commutation relations, $[\hat x, \hat p] \subset i\hbar I$, it follows that the product of the dispersion (uncertainty) of the position and momentum variables is proportional to $\hbar$ and therefore at least one of these two quantities must have very large values (compared to $\hbar$). One considers usually the case in which these dispersions have comparable values, which are therefore very small, of the order of magnitude $\hbar^{1/2}$ (but very large as compared with $\hbar$). In order to make connection with the Hamilton–Jacobi formalism of classical mechanics one can also consider the case in which the dispersion in momentum is of the order $\hbar$ (the WKB method).

The semiclassical limit takes advantage mathematically of the fact that the parameter $\hbar$ is very small in natural units, and performs an asymptotic analysis, in which the terms of lowest order are exactly described and the difference is estimated.

The problem one faces is that the Schrödinger equation becomes, in the mathematical limit $\hbar \to 0$, a very singular PDE (the coefficients of the differential terms go to zero in this limit).

Dividing each term of the equation by $\hbar$ (because we do not want to change the scale of time) leads, in the case of one quantum particle in $R^3$ in a potential field $V(x)$ (we treat, for simplicity, only this case), to the equation

\[ i \frac{\partial \psi(x, t)}{\partial t} = -\hbar\, \Delta\psi(x, t) + \hbar^{-1} V(x)\, \psi(x, t) \qquad [19] \]

It is convenient therefore to rescale the spatial variables by a factor $\hbar^{1/2}$ (i.e., choose different units), setting $x = \sqrt{\hbar}\, X$, and look for solutions of [19] which remain regular in the limit $\hbar \to 0$ as functions of the rescaled variable $X$. One searches therefore for solutions that on the physical scale have support that becomes vanishingly small in the limit. It is therefore not surprising that, in the limit, these solutions may describe point particles; the main result of semiclassical analysis is that the coordinates of these particles obey Hamilton's laws of classical mechanics.

This can be roughly seen as follows (accurate estimates are needed to make this empirical analysis precise). Using multiscale analysis, one may write the solution in the form $\psi(X, x, t)$ and seek solutions which are smooth in $X$ and $x$. Both terms on the right-hand side of [19] contain contributions of order $-2$ and $-1$ in $\sqrt{\hbar}$ and in order to have regular solutions one must have cancellations between equally singular contributions. For this, one must perform an expansion to the second order of the potential (assumed at least twice differentiable) around a suitable trajectory $q(t)$, $q \in R^3$, and choose this trajectory in such a way that the cancellations take place.

A formal analysis shows that this is achieved only if the trajectory chosen is precisely a solution of the classical Lagrange equations. Of course, a more refined analysis and good estimates are needed to make this argument precise, and to estimate the error that is made when one neglects in the resulting equation terms of order $\sqrt{\hbar}$; in favorable cases, for each chosen $T$ the error in the solution for most initial conditions of the type described is of order $\sqrt{\hbar}$ for $|t| < T$.

This semiclassical result is most easily visualized using the formalism of Wigner functions (the technical details, needed to make the formal arguments into a proof, take advantage of regularity estimates in the theory of functions).

In natural units, one defines

\[ W_{\hbar,\psi}(x, \xi; t) \equiv \left( \frac{i}{2\pi} \right)^{N} W_\psi\left( x, \frac{\xi}{\hbar}; t \right) \]
In terms of the Wigner function $W_{\hbar,\psi}$ the Schrödinger equation [19] takes the form

\[ \frac{\partial f^\hbar}{\partial t} + \xi \cdot \nabla_x f^\hbar + K_\hbar * f^\hbar = 0, \qquad f^\hbar(t = 0) = f_0^\hbar \qquad [20] \]

where

\[ K_\hbar = \frac{i}{(2\pi)^N}\, e^{-i(\xi, y)}\, \hbar^{-1} \left[ V\left( x + \frac{\hbar y}{2} \right) - V\left( x - \frac{\hbar y}{2} \right) \right] \]

It can be proved (Robert 1987) that if the potential is sufficiently regular and if the initial datum converges in a suitable topology to a positive measure $f_0$, then, for all times, $W_{\hbar,\psi}(x, \xi, t)$ converges to a (weak) solution of the Liouville equation

\[ \frac{\partial f}{\partial t} + \xi \cdot \nabla_x f - \nabla V(x) \cdot \nabla_\xi f = 0 \]

This leads to the semiclassical limit if, for example, one considers a sequence of initial data $\psi_n$, where $\psi_n$ is a sequence of functions centered at $x_0$ with Fourier transform centered at $p_0$ and dispersion of order $\hbar^{1/2}$ both in position and in momentum. In this case, the limit measure is a Dirac measure centered on the classical paths.

In the course of the proof of the semiclassical limit theorem, one becomes aware of the special status of the Hamiltonians that are at most quadratic in $\hat x$ and $\hat p$. Indeed, it is easy to verify that for these Hamiltonians the expectation values of $\hat x$ and $\hat p$ obey the classical equation of motion (P Ehrenfest rule).

From the point of view of Heisenberg, this can be understood as a consequence of the fact that operators at most bilinear in $a$ and $a^{*}$ form an algebra $D$ under commutation and, moreover, the homogeneous part of order 2 is a closed subalgebra such that its action on $D$ (by commutation) has the same structure as the algebra of generators of the Hamiltonian flow and its tangent flow. Apart from (important) technicalities, the proof of the semiclassical limit theorem reduces to the proof that one can estimate the contribution of the terms of order higher than 2 in the expansion of the quantum Hamiltonian at the classical trajectory as being of order $\hbar^{1/2}$ in a suitable topology (Hepp 1974).

We end this overview by giving a brief analysis of problems (1) and (2), which refer to the description of phenomena that are directly accessible to comparison with experimental data, and therefore have been extensively studied in theoretical physics and quantum chemistry (Mc Weeny 1992); some of them have been analyzed with the instruments of mathematical physics, often with considerable success. We give here a very naive introduction to these problems and refer the reader to the more specialized contributions to this Encyclopedia for a rigorous analysis and exact statements.

Of course, most of the problems of physical interest are not exactly solvable, in the sense that rarely the final result is given explicitly in terms of simple functions. As a consequence, exact numerical results, to be compared with experimental data, are rarely obtained in physically relevant problems, and most often one has to rely on approximation schemes with (in favorable cases) precise estimates on the error.

Formal perturbation theory is the easiest of such schemes, but it seldom gives reliable results to physically interesting problems. One writes

\[ H_\epsilon \equiv H_0 + \epsilon V \qquad [21] \]

where $\epsilon$ is a small real parameter, and sets a formal scheme in case (1) by writing

\[ H_\epsilon \phi_\epsilon = E_\epsilon \phi_\epsilon, \qquad E_\epsilon \equiv \sum_{k=0}^{\infty} \epsilon^k E_k, \qquad \phi_\epsilon \equiv \sum_{k=0}^{\infty} \epsilon^k \phi_k \]

and, in case (2), iterating Duhamel's formula

\[ e^{-itH} = e^{-itH_0} - i \int_0^t e^{-i(t-s)H}\, V\, e^{-isH_0}\, ds \qquad [22] \]

Very seldom the perturbation series converges, and one has to resort to more refined procedures.

In some cases, it turns out to be convenient to consider the formal primitive $\tilde E_\epsilon$ of $E_\epsilon$ (as a differentiable function of $\epsilon$) and prove that it is differentiable in $\epsilon$ for $0 < \epsilon < \epsilon_0$ (but not for $\epsilon = 0$). In favorable cases, this procedure may lead to

\[ E_\epsilon = \sum_{k=0}^{N} \epsilon^k E_k + R_N(\epsilon), \qquad \lim_{N \to \infty} |R_N(\epsilon)| = \infty \]

with explicit estimates of $|R_N(\epsilon)|$ for $0 \le \epsilon < \epsilon_0$. Re-summation techniques of the formal power series may be of help in some cases.

The estimate of the lowest eigenvalues of an operator bounded below is often done by variational analysis, making use of min–max techniques applied to the quadratic form $Q(\phi) \equiv (\phi, H\phi)$.

Semiclassical analysis can be useful to search for the distribution of eigenvalues and in the study of the dynamics of states whose dispersions both in position and in momentum are very large in units in which $\hbar = 1$.

A case of particular interest in molecular and atomic physics occurs when the physical parameters which appear in $H$ (typically the masses of the particles involved in the process) are such that one
can a priori guess the presence of coordinates which have a rapid dependence on time (fast variables) and a complementary set of coordinates whose dependence on time is slow. This suggests that one can try an asymptotic analysis, often in connection with adiabatic techniques. One seldom deals with cases in which the hypotheses of elementary adiabatic theorems are satisfied, and one has to refine the analysis, mostly through subtle estimates which ensure the existence of quasi-invariant subspaces.

Asymptotic techniques and refined estimates are also needed to study the effective description of a system of N interacting identical particles when N becomes very large; for example, in statistical mechanics, one searches for results which are valid when $N \to \infty$.

The most spectacular results in this direction are the proof of stability of matter by E Lieb and collaborators, and the study of the phenomenon of Bose-Einstein condensation and the related Gross-Pitaevskii (nonlinear Schrödinger) equation. The experimental discovery of the state of matter corresponding to a Bose-Einstein condensate is clear evidence of the nonclassical behavior of matter even at a comparatively macroscopic size. From the point of view of mathematical physics, the ongoing research in this direction is very challenging.

One should also recognize the increasing role that research in QM is taking in applications, also in connection with the increasing success of nanotechnology. In this respect, from the point of view of mathematical physics, the study of nanostructures (quantum-mechanical systems constrained to very small regions of space or to lower-dimensional manifolds, such as sheets or graphs) is still in its infancy and will require refined mathematical techniques and most likely entirely new ideas.

Finally, one should stress the important role played by numerical analysis (Le Bris 2003) and especially computer simulations. In problems involving very many particles, present-day analytical techniques provide at most qualitative estimates and in favorable cases bounds on the value of the quantities of interest. Approximation schemes are not always applicable and often are not reliable. Hints for progress in the mathematical treatment of some relevant physical phenomena of interest in QM (mostly in condensed matter physics) may come from the ab initio analysis made by simulations on large computers; this may provide a qualitative and, to a certain extent, quantitative behavior of the solutions of Schrödinger's equation corresponding to typical initial conditions. In recent times the availability of more efficient computing tools has made computer simulation more reliable and more apt to concur with mathematical investigation to a fuller comprehension of QM.

Interpretation Problems

In this section we describe some of the conceptual problems that plague present-day QM and some of the attempts that have been made to cure these problems, either within its formalism or with an altogether different approach.

Approaches within the QM Formalism

We begin with the approaches from within. We have pointed out that the main obstacle in the measurement problem is the description of what occurs during an act of measurement. Axiom III claims that it must be seen as a destruction act, and the outcome is to some extent random. The final state of the system is one of the eigenstates of the observable, and the dependence on the initial state is only through an a priori probability assignment; the act of measurement is therefore not a causal one, contrary to the (continuous) causal reversible description of the interaction with the environment. One should be able to distinguish a priori the acts of measurement from a generic interaction.

There is a further difficulty. Due to the superposition principle, if a system S on which we want to make a measurement of the property associated with the operator A interacts with an instrument I described by the operator S, the final state of the combined system will be a coherent superposition of tensor products of (normalized) eigenstates of the two systems

$$\Psi = \sum_{n,m} c_{n,m}\, \phi^A_n \otimes \phi^S_m, \qquad \sum_{n,m} |c_{n,m}|^2 = 1 \tag{23}$$

Measurement as described by Axiom III of QM claims that once the measurement is over, the measured system is, with probability $\sum_m |c_{n,m}|^2$, in the state $\phi^A_n$ and the instrument is in a state which carries the information about the final state of the system (after all, what one reads at the end is an indicator of the final state of the instrument).

It is therefore convenient to write $\Psi$ in the form

$$\Psi = \sum_n d_n\, \phi^A_n \otimes \zeta_n, \qquad \sum_n |d_n|^2 = 1 \tag{24}$$

(this defines $\zeta_n$ if the spectrum of A is pure point and nondegenerate). It is seen from [24] that, due to the reduction postulate, we know that the measured system is in the state $\phi^A_{n_0}$ if a measurement of an observable T with nondegenerate spectrum,
eigenvectors $\{\zeta_n\}$, and eigenvalues $\{z_n\}$ gives the result $z_{n_0}$.

Along these lines, one does not solve the measurement problem (the outcome is still probabilistic) but at least one can find the reason why the measuring apparatus may be considered classical.

It is more convenient to go back to [23] and to assume that one is able to construct the measuring apparatus in such a way that one divides (roughly) its pure (microscopic) states into sets $\Sigma_n$ (each corresponding to a macroscopic state) which are (roughly) in one-to-one correspondence with the eigenstates of A. The sets $\Sigma_n$ contain a very large number, $N_n$, of elements, so that the sets $\Sigma_n$ need not be given with extreme precision. And the sets $\Sigma_n$ must be in a sense stable under small external perturbations.

It is clear from this rough description that the apparatus should contain a large number of small components and still its interaction with the small system A should lead to a more or less sudden change of the sets $\Sigma_n$.

A concrete model of this mechanism has been proposed by K Hepp (1972) for the case when A is a $2 \times 2$ matrix, and the measuring apparatus is made of a chain of N spins, $N \to \infty$; the analysis was recently completed by Sewell (2005) with an estimate on the error which is made if N is finite but large. This is a dynamical model, in which the observable A (a spin) interacts with a chain of spins (moves over the spins) leaving the trace of its passage. It is this trace (final macroscopic state of the apparatus) which is measured and associated with the final state of A. The interaction is not instantaneous but may require a very short time, depending on the parameters used to describe the apparatus and the interaction.

We call decoherence the weakening of the superposition principle due to the interaction with the environment.

Two different models of decoherence have been analyzed in some detail; we shall call them the thermal-bath model and the scattering model; both are dynamical models and both point to a solution, to various extents, of the problem of the reduction to a final density matrix which commutes with the operator A (and therefore to the suppression of the interference terms).

The thermal-bath model makes use of the Heisenberg representation and relies on results of the theory of $C^*$-algebras. This approach is closely linked with (quantum) statistical mechanics; its aim is to prove, after conditioning with respect to the degrees of freedom of the bath, that a special role emerges for a commuting set of operators of the measured system, and these are the observables that specify the outcome of the measurement in probabilistic terms.

The scattering approach relies on the Schrödinger approach to QM, and on results from the theory of scattering. This approach describes the interaction of the system S (typically a heavy particle) with an environment made of a large number of light particles and seeks to describe the state of S after the interaction when one does not have any information on the final state of the light particles. One seeks to prove that the reduced density matrix is (almost) diagonal in a given representation (typically the one given by the spatial coordinates). This defines the observable (typically, position) that can be measured and the probability of each outcome.

Both approaches rely on the loss of information in the process to cancel the effect of the superposition principle and to bring the measurement problem within the realm of classical probability theory. Neither of them provides a causal dependence of the result of the measurement on the initial state of the system.

We describe these attempts only very briefly.

In its more basic form, the scattering approach has as starting point the Schrödinger equation for a system of two particles, one of which has mass very much smaller than the other one. The heavy particle may be seen as representing the system on which a measurement is being made. The outline of the method of analysis (which in favorable cases can be made rigorous) (Joos and Zeh 1985, Tegmark 1993) is the following. One chooses units in which the mass of the heavy particle is 1, and one denotes by $\varepsilon$ the mass of the light particle. If x is the coordinate of the heavy particle and y that of the light one, and if the initial state of the system is denoted by $\Psi_0(x, y)$, the solution of the equation for the system is (apart from inessential factors)

$$\Psi_t = \exp\{-i\,[-\Delta_x - \varepsilon^{-1}\Delta_y + W(x) + V(x - y)]\,t\}\,\Psi_0$$

Making use of center-of-mass and relative coordinates, one sees that when $\varepsilon$ is very small one should be able to describe the system on two timescales, one fast (for the light particle) and one slow (for the heavy one) and, therefore, place oneself in a setting which may allow the use of adiabatic techniques. In this setting, for the measurement of the heavy particle (e.g., its position) one may be allowed to consider the light particle in a scattering regime, and use the wave operator corresponding to a potential $V_x(y) \equiv V(y - x)$.

Taking the partial trace with respect to the degrees of freedom of the light particle (this
corresponds to no information on its final state) one finds, at least heuristically, that the state of the heavy particle is now described (due to the trace operation) by a density matrix $\rho$ for which in the coordinate representation the off-diagonal terms $\rho_{x,x'}$ are slightly suppressed by a factor $\Lambda_{x,x'} = 1 - (W_x \psi, W_{x'} \psi)$, where $\psi$ represents the initial state of the light particle and $W_x$ is the wave operator for the motion of the light particle in the potential $V_x$. One must assume that the function $\phi$ which represents the initial state of the heavy particle is sufficiently localized so that $\Lambda_{x,x'} < 1$ for every $x' \neq x$ in its support.

If the environment is made of very many particles (their number $N(\varepsilon)$ must be such that $\lim_{\varepsilon \to 0} N(\varepsilon) = \infty$) and the heavy particle can be supposed to have separate interactions with all of them, the off-diagonal elements of the density matrix tend to 0 as $\varepsilon \to 0$ and the resulting density matrix tends to have the form $\rho(x, x') = \delta(x - x')\,\rho(x)$, $\rho(x) \geq 0$, $\int \rho(x)\,dx = 1$. If it can be supposed that all interactions take place within a time $T(\varepsilon) \simeq \varepsilon^{\alpha}$, $\alpha > 0$, one has $\rho(x) = |\phi(x)|^2$.

If the interactions are not independent, the analysis becomes much more involved since it has to be treated by many-body scattering theory; this suggests that the scattering approach can hardly be used in the context of the thermal-bath model. In any case, the selection of a preferred basis (the coordinate representation) depends on the fact that one is dealing with a scattering phenomenon. A few steps have been made toward a rigorous analysis (Teta 2004) but we are very far from a mathematically satisfactory answer.

The thermal-bath approach has been studied within the algebraic formulation of QM and stands on good mathematical ground (Alicki 2002, Blanchard et al. 2003, Sewell 2005). Its drawback is that it is difficult to associate the formal scheme with actual physical situations and it is difficult to give a realistic estimate of the decoherence time.

The thermal-bath approach attributes the decoherence effect to the practical impossibility of distinguishing between a vast majority of the pure states of the systems and the corresponding statistical mixtures. In this approach, the observables are represented by self-adjoint elements of a weakly closed subalgebra M of all bounded operators B(H) on a Hilbert space H. This subalgebra may depend on the measuring apparatus (i.e., not all the apparatuses are fit to measure a set of observables).

A classical observable by definition commutes with all other observables and therefore must belong to the center of A which is isomorphic to a collection of functions on a probability space M.

So the appearance of classical properties of a quantum system corresponds to the emergence of an algebra with nontrivial center. Since automorphic evolutions of an algebra preserve its center, this program can be achieved only if we admit the loss of quantum coherence, and this requires that the quantum systems we describe are open and interact with the environment, and moreover that the commutative algebra which emerges be stable for time evolution.

It may be shown that one must consider a quantum environment in the thermodynamic limit, that is, consider the interaction of the system to be measured with a thermal bath. A discussion of the possible emergence of classical observables and of the corresponding dynamics is given by Gell-Mann (1993). In all these approaches, the commutative subalgebra is selected by the specific form of the interaction; therefore, the measuring apparatus determines the algebra of classical observables.

On the experimental side, a number of very interesting results have been obtained, using very refined techniques; these experiments usually also determine the decoherence time. The experiments, both for the collision model (Hornberger et al. 2003) and for the thermal-bath model (Hackermueller et al. 2004), are done mostly with fullerene (a molecule which is heavy enough and is not deflected too much after a collision with a particle of the gas). They show a reasonable agreement with the (rough) theoretical conclusions.

The most refined experiments about decoherence are those connected with quantum optics (circularly polarized atoms in superconducting cavities). These are not related to the wave nature of the particles but in a sense to the wave nature of a photon as a single unit. The electromagnetic field is now regarded as an incoherent superposition of states with an arbitrarily large number of photons. Polarized photons can be produced one by one, and they retain their individuality and their polarization until each of them interacts with the environment (e.g., the boundary of the cavity or a particle of the gas). In a sense, these experimental results refer to a decoherence by collision theory. The experiments by Haroche (2003) prove that coherence may persist for a measurable interval of time and are the most controlled experiments on coherence so far.

Other Approaches

We end this section with a brief discussion of the problem of hidden variables and a presentation of an entirely different approach to QM, originated by
D Bohm (1952) and put recently on firm mathematical grounds by Duerr et al. (1999). The approach is radically different from the traditional one and it is not clear at present whether it can give a solution to the measurement problem and a description of all the phenomena which traditional QM accounts for. But it is very interesting from the point of view of the mathematics involved.

We have remarked that the formulation of QM that is summarized in the three axioms given earlier has many unsatisfactory aspects, mainly connected with the superposition principle (described in its extremal form by the Schrödinger's cat paradox) and with the problem of measurement which reveals, for example, through the Einstein-Podolsky-Rosen paradox, an intrinsic nonlocality if one maintains that objective properties can be attributed to systems which are far apart. From the very beginning of QM, attempts have been made to attribute these features to the presence of hidden variables; the statistical nature of the predictions of QM is, from this point of view, due to the incompleteness of the parameters used to describe the systems. The impossibility of matching the statistical prediction of QM (confirmed by experimental findings) with a local theory based on hidden variables and classical probability theory has been known for some time (Kochen and Specker 1967), also through the use of Bell inequalities (Bell 1964) among correlations of outcomes of separate measurements performed on entangled systems (mainly two photons or two spin-1/2 particles created in a suitable entangled state).

A proof of the intrinsic nonlocality of QM (in the above sense) was given by L Hardy (see Haroche (2003)).

While experimental results prove that one cannot substitute QM with a naive theory of hidden variables, more refined attempts may have success. We shall only discuss the approach of Bohm (following a previous attempt by de Broglie) as presented in Duerr et al. (1999). It is a dynamical theory in which representative points follow classical paths and their motion is governed by a time-dependent vector velocity field (in this sense, it is not Newtonian). In a sense, Bohmian mechanics is a minimal completion of QM if one wants to keep the position as primitive observable. To these primitive objects, Bohm's theory adds a complex-valued function $\psi$ (the guiding wave in Bohm's terminology) defined on the configuration space Q of the particles. In the case of particles with spin, the function $\psi$ is spinor-valued. Dynamics is given by two equations: one for the coordinates of the particles and one for the guiding wave. If $x \equiv (x_1, \ldots, x_N)$ describes the configuration of the points, the dynamics in a potential field V(x) is described in the following way: for the wave $\psi$ by a nonrelativistic Schrödinger equation with potential V and for the coordinates by the ordinary differential equation (ODE)

$$\dot{x}_k = \frac{\hbar}{m_k}\, \mathrm{Im}\!\left(\frac{\psi^* \nabla_k \psi}{\psi^* \psi}\right)(x), \qquad x_k \in \mathbb{R}^3$$

where $m_k$ is the mass of the kth particle.

Notice that the vector field is singular at the zeros of the wave function, therefore global existence and uniqueness must be proved. To see why Bohmian mechanics is empirically equivalent to QM, at least for measurement of position, notice that the equation for the points coincides with the continuity equation in QM. It follows that if one has at time zero a collection of points distributed with density $|\psi_0|^2$, the density at time t will be $|\psi(t)|^2$ where $\psi(t)$ is the solution of the Schrödinger equation with initial datum $\psi_0$.

Bohm (1952) formulated the theory as a modification of Newton's laws (and in this form it has been widely used) through the introduction of a quantum potential $V_Q$. This was achieved by writing the wave function in its polar form $\psi = R\,e^{iS/\hbar}$ and writing the continuity equation as a modified Hamilton-Jacobi equation. The version of Bohm's theory discussed in Duerr et al. (1999) introduces only the guiding wave function and the coordinates of the points, and puts the theory on firm mathematical grounds. Through an impressive series of mathematical results, these authors and their collaborators deal with the completeness of the velocity vector field, the asymptotic behavior of the points' trajectories (both for the scattering regime and for the trapped trajectories, which are shown to correspond to bound states in QM), with a rigorous analysis of the theorem on the flux across a surface (a cornerstone in scattering theory) and the detailed analysis of the two-slit experiment through a study of the interaction with the measuring apparatus. The theory is completely causal, both for the trajectories of the points and for the time development of the pilot wave, and can also accommodate points with spin. It leads to a mathematically precise formulation of the semiclassical limit, and it may also resolve the measurement problem by relating the pilot wave of the entire system to its approximate decomposition into an incoherent superposition of pilot waves associated with the particle and with the measuring apparatus (this would be the way to see the collapse of the wave function in QM). A weak point of this approach is the relation of the representative points with observable quantities.
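The link between the guiding equation, the continuity equation, and the quantum potential can be made explicit in a few lines. The following is only a sketch under assumptions not stated in the text: the Schrödinger equation is taken with the standard normalization $i\hbar\,\partial_t\psi = -\sum_k (\hbar^2/2m_k)\,\Delta_k\psi + V\psi$, and $R$ and $S$ are assumed real and smooth, with $R > 0$ away from the zeros of $\psi$.

```latex
% Polar form of the guiding wave (R, S real; R > 0 is an assumption).
\psi = R\,e^{iS/\hbar}
\quad\Longrightarrow\quad
\frac{\nabla_k \psi}{\psi} = \frac{\nabla_k R}{R} + \frac{i}{\hbar}\,\nabla_k S .

% Hence the velocity field of the guiding equation is a gradient field:
\dot{x}_k
  = \frac{\hbar}{m_k}\,\mathrm{Im}\,\frac{\psi^{*}\nabla_k\psi}{\psi^{*}\psi}
  = \frac{\nabla_k S}{m_k} .

% Imaginary part of the Schroedinger equation: continuity equation for
% the density R^2 = |\psi|^2 transported by that velocity field.
\partial_t R^{2} + \sum_k \nabla_k\!\cdot\!\Big(R^{2}\,\frac{\nabla_k S}{m_k}\Big) = 0 .

% Real part: Hamilton-Jacobi equation modified by the quantum potential.
\partial_t S + \sum_k \frac{|\nabla_k S|^{2}}{2m_k} + V + V_Q = 0 ,
\qquad
V_Q = -\sum_k \frac{\hbar^{2}}{2m_k}\,\frac{\Delta_k R}{R} .
```

The third relation says that a cloud of points with density $|\psi_0|^2$ at time zero, moved along the flow of the second relation, has density $|\psi(t)|^2$ at time t. The fourth reduces to the classical Hamilton-Jacobi equation when $V_Q$ is dropped.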
Further Reading

Alicki R (2004) Pure decoherence in quantum systems. Open Syst. Inf. Dyn. 11: 53-61.
Araki H (1988) In: Jorgensen P and Muhly P (eds.) Operator Algebras and Mathematical Physics, Contemporary Mathematics 62. Providence, RI: American Mathematical Society.
Araki H and Ezawa H (eds.) (2004) Topics in the Theory of Schroedinger Operators. River Edge, NJ: World Scientific.
Bach V, Froehlich J, and Sigal IM (1998) Quantum electrodynamics of confined non-relativistic particles. Advances in Mathematics 137: 299-395.
Bargmann V (1967) On a Hilbert space of analytic functions and an associated integral transform. Communications on Pure and Applied Mathematics 20: 1-101.
Bell J (1966) On the problem of hidden variables in quantum mechanics. Reviews of Modern Physics 38: 4247-4280.
Blanchard P and Dell'Antonio GF (eds.) (2004) Multiscale Methods in Quantum Mechanics, Theory and Experiments. Trends in Mathematics. Boston: Birkhäuser.
Blanchard P and Olkiewicz R (2003) Decoherence in the Heisenberg representation. International Journal of Physics B 18: 501-507.
Bohm D (1952) A suggested interpretation of quantum theory in terms of hidden variables I, II. Physical Review 85: 161-179, 180-193.
Bohr N (1913) On the constitution of atoms and molecules. Philosophical Magazine 26: 1-25, 476-502, 857-875.
Bohr N (1918) On the quantum theory of line spectra. Kongelige Danske Videnskabernes Selskabs Skrifter Series 8, IV, 1, 1-118.
Born M (1924) Über Quantenmechanik. Zeitschrift für Physik 32: 379-395.
Born M and Jordan P (1925) Zur Quantenmechanik. Zeitschrift für Physik 34: 858-888.
Born M, Jordan P, and Heisenberg W (1926) Zur Quantenmechanik II. Zeitschrift für Physik 35: 587-615.
Cycon HL, Froese RG, Kirsch W, and Simon B (1987) Schroedinger Operators with Application to Quantum Mechanics and Geometry. Texts and Monographs in Physics. Berlin: Springer Verlag.
de Broglie L (1923) Ondes et quanta. Comptes Rendus 177: 507-510.
Dell'Antonio GF (2004) On decoherence. Journal of Mathematical Physics 44: 4939-4955.
Dirac PAM (1925) The fundamental equations of quantum mechanics. Proceedings of the Royal Society of London A 109: 642-653.
Dirac PAM (1926) The quantum algebra. Proceedings of the Cambridge Philosophical Society 23: 412-428.
Dirac PAM (1928) The quantum theory of the electron. Proc. Royal Soc. London A 117: 610-624, 118: 351-361.
Duerr D, Goldstein S, and Zanghì N (1996) Bohmian mechanics as the foundation of quantum mechanics. Boston Studies Philosophical Society 184: 21-44. Dordrecht: Kluwer Academic.
Einstein A (1905) The Collected Papers of Albert Einstein, vol. 2, pp. 347-377, 564-585. Princeton, NJ: Princeton University Press.
Einstein A (1924-1925) Quantentheorie des einatomigen idealen Gases. Berliner Berichte (1924) 261-267, (1925) 3-14.
Gell-Mann M and Hartle JB (1997) Strong decoherence. Quantum-Classical Correspondence, pp. 3-35. Cambridge, MA: International Press.
Gross L (1972) Existence and uniqueness of physical ground state. Journal of Functional Analysis 19: 52-109.
Haroche S (2003) Quantum information in cavity quantum electrodynamics. Royal Society of London Philosophical Transactions Series A: Mathematical, Physical and Engineering Sciences 361: 1339-1347.
Heisenberg W (1925) Über quantentheoretische Umdeutung kinematischer und mechanischer Beziehungen. Zeitschrift für Physik 33: 879-893.
Heisenberg W (1926) Über quantentheoretische Kinematik und Mechanik. Mathematische Annalen 95: 694-705.
Hepp K (1974) The classical limit of quantum correlation functions. Communications in Mathematical Physics 35: 265-277.
Hepp K (1975) Results and problems in the irreversible statistical mechanics of open systems. Lecture Notes in Physics 39. Berlin: Springer Verlag.
Hislop P and Sigal IM (1996) Introduction to Spectral Theory with Application to Schroedinger Operators. Applied Mathematical Sciences 113. New York: Springer Verlag.
Hornberger K and Sipe JE (2003) Collisional decoherence reexamined. Physical Review A 68: 012105, 1-16.
Jammer M (1989) The Conceptual Development of Quantum Mechanics, 2nd edn. Tomash Publishers, American Institute of Physics.
Joos E et al. (eds.) (2003) Quantum Theory and the Appearance of a Classical World, 2nd edn. Berlin: Springer Verlag.
Kochen S and Specker EP (1967) The problem of hidden variables in quantum mechanics. Journal of Mathematics and Mechanics 17: 59-87.
Le Bris C (2002) Problématiques numériques pour la simulation moléculaire. ESAIM Proceedings of the 11th Society on Mathematics and Applied Industries, pp. 127-190. Paris.
Le Bris C and Lions PL (2005) From atoms to crystals: a mathematical journey. Bulletin of the American Mathematical Society (NS) 42: 291-363.
Lieb E (1990) From atoms to stars. Bulletin of the American Mathematical Society 22: 1-49.
Mackey GW (1963) Mathematical Foundations of Quantum Mechanics. New York-Amsterdam: Benjamin.
Mc Weeny E (1992) An overview of molecular quantum mechanics. Methods of Computational Molecular Physics. New York: Plenum Press.
Nelson E (1973) Construction of quantum fields from Markoff fields. Journal of Functional Analysis 12: 97-112.
von Neumann J (1996) Mathematical Foundations of Quantum Mechanics. Princeton Landmarks in Mathematics. Princeton, NJ: Princeton University Press.
Nielsen M and Chuang I (2000) Quantum Computation and Quantum Information. Cambridge, MA: Cambridge University Press.
Ohya M and Petz D (1993) Quantum Entropy and Its Use. Texts and Monographs in Physics. Berlin: Springer Verlag.
Pauli W (1927) Zur Quantenmechanik des magnetischen Elektrons. Zeitschrift für Physik 43: 601-623.
Pauli W (1928) Collected Scientific Papers, vol. 2. 151-160, 198-213, 1073-1096.
Robert D (1987) Autour de l'approximation semi-classique. Progress in Mathematics 68. Boston: Birkhäuser.
Schroedinger E (1926) Quantisierung als Eigenwertproblem. Annalen der Physik 79: 361-376, 489-527, 80: 437-490, 81: 109-139.
Segal I (1996) Quantization, Nonlinear PDE and Operator Algebra, pp. 175-202. Proceedings of the Symposium on Pure Mathematics 59. Providence, RI: American Mathematical Society.
Sewell J (2004) Interplay between classical and quantum structure in algebraic quantum theory. Rend. Circ. Mat. Palermo Suppl. 73: 127-136.
Simon B (2000) Schrödinger operators in the twentieth century. Journal of Mathematical Physics 41: 3523-3555.
Streater RF and Wightman AS (1964) PCT, Spin and Statistics and All That. New York-Amsterdam: Benjamin.
Takesaki M (1971) One parameter automorphism groups and states of operator algebras. Actes du Congrès International des Mathématiciens Nice, 1970, Tome 2, pp. 427-432. Paris: Gauthier-Villars.
Teta A (2004) On a rigorous proof of the Joos-Zeh formula for decoherence in a two-body problem. Multiscale Methods in Quantum Mechanics, pp. 197-205. Trends in Mathematics. Boston: Birkhäuser.
Weyl H (1931) The Theory of Groups and Quantum Mechanics. New York: Dover.
Wiener N (1938) The homogeneous chaos. American Journal of Mathematics 60: 897-936.
Wigner EP (1952) Die Messung quantenmechanischer Operatoren. Zeitschrift für Physik 133: 101-108.
Yafaev DR (1992) Mathematical Scattering Theory. Translations of Mathematical Monographs. Providence, RI: American Mathematical Society.
Zeh HD (1970) On the interpretation of measurement in quantum theory. Foundations of Physics 1: 69-76.
Zurek WH (1982) Environment induced superselection rules. Physical Review D 26(3): 1862-1880.

Introductory Article: Topology


Tsou Sheung Tsun, University of Oxford, Oxford, UK
© 2006 Elsevier Ltd. All rights reserved.

Introduction

This will be an elementary introduction to general topology. We shall not even touch upon algebraic topology, which will be dealt with in Cohomology Theories, although in some mathematics departments it is introduced in an advanced undergraduate course.

We believe such an elementary article is useful for the encyclopaedia, purely for quick reference. Most of the concepts will be familiar to physicists, but usually in a general rather vague sense. This article will provide the rigorous definitions and results whenever they are needed when consulting other articles in the work. To make sure that this is the case, we have in fact experimentally tested the article on physicists for usefulness.

Topology is very often described as "rubber-sheet geometry", that is, one is allowed to deform objects without actually breaking them. This is the all-important concept of continuity, which underlies most of what we shall study here.

We shall give full definitions, state theorems rigorously, but shall not give any detailed proofs. On the other hand, we shall cite many examples, with a view to applications to mathematical physics, taking for granted that familiar more advanced concepts there need not be defined. By the same token, the choice of topics will also be so dictated.

Essential Concepts

Definition 1 Let X be a set. A collection $\mathcal{T}$ of subsets of X is called a topology if the following are satisfied:
(i) $\emptyset, X \in \mathcal{T}$.
(ii) Let I be an index set. Then $A_\alpha \in \mathcal{T},\ \alpha \in I \ \Rightarrow\ \bigcup_{\alpha \in I} A_\alpha \in \mathcal{T}$.
(iii) $A_i \in \mathcal{T},\ i = 1, \ldots, n \ \Rightarrow\ \bigcap_{i=1}^{n} A_i \in \mathcal{T}$.

Definition 2 A member of the topology $\mathcal{T}$ is called an open set (of X with topology $\mathcal{T}$).

Remark The last two properties are more easily put as "arbitrary unions of open sets are open, and finite intersections of open sets are open". One can easily see the significance of this: if we take the usual topology (which will be defined in due course) of the real line, then the intersection of all open intervals $(-1/n, 1/n)$, n a positive integer, is just the single point {0}, which is manifestly not open in the usual sense.

Example If we postulate that $\emptyset$, and the entire set X, are the only open subsets, we get what is called the indiscrete or coarsest topology. At the other extreme, if we postulate that all subsets are open, then we get the discrete or finest topology. Both seem quite unnatural if we think in terms of the real line or plane, but in fact it would be more unnatural to explicitly exclude them from the definition. They prove to be quite useful in certain respects.

Definition 3 A subset of X is closed if its complement in X is open.

Remarks
(i) One could easily build a topology using closed sets instead of open sets, because of the simple relation that the complement of a union is the intersection of the complements.
(ii) From the definitions, there is nothing to prevent a set being both open and closed, or neither.
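On a finite set, Definition 1 and the remarks on open and closed sets can be checked mechanically. The sketch below is illustrative only: the function names `is_topology` and `closed_sets` and the sample collections are hypothetical choices, not part of the article. It relies on the fact that, for a finite collection, closure under pairwise unions and intersections already gives closure under all unions and all finite intersections.

```python
from itertools import combinations


def is_topology(X, T):
    """Check the three axioms of Definition 1 for a finite set X
    and a finite collection T of subsets of X."""
    X = frozenset(X)
    T = {frozenset(A) for A in T}
    # Every member of T must be a subset of X.
    if any(not A <= X for A in T):
        return False
    # (i) the empty set and X itself belong to T.
    if frozenset() not in T or X not in T:
        return False
    # (ii), (iii): T is finite, so pairwise closure is enough.
    for A, B in combinations(T, 2):
        if A | B not in T or A & B not in T:
            return False
    return True


def closed_sets(X, T):
    """Closed sets are the complements in X of the open sets (Definition 3)."""
    X = frozenset(X)
    return {X - frozenset(A) for A in T}


X = {1, 2, 3}
indiscrete = [set(), X]                       # coarsest topology
discrete = [set(s) for r in range(len(X) + 1)
            for s in combinations(sorted(X), r)]  # finest topology
not_a_topology = [set(), {1}, {2}, X]         # the union {1, 2} is missing
three_open_sets = [set(), {1}, X]             # an intermediate example

print(is_topology(X, indiscrete))
print(is_topology(X, discrete))
print(is_topology(X, not_a_topology))
print(closed_sets(X, three_open_sets))
```

In the last example the open sets are the empty set, {1} and X, so the closed sets are X, {2, 3} and the empty set. The empty set and X are both open and closed, {1} is open but not closed, and {2} is neither, which is the situation described in Remark (ii).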
Definition 4 A set equipped with a topology is called a topological space (with respect to the given topology). Elements of a topological space are sometimes called points.

Definition 5 Let $x \in X$. A neighborhood of x is a subset of X containing an open set which contains x.

Remark This seems a clumsy definition, but turns out to be more useful in the general case than restricting to open neighborhoods, which is often done.

Definition 6 A subcollection of open sets $\mathcal{B} \subset \mathcal{T}$ is called a basis for the topology $\mathcal{T}$ if every open set is a union of sets of $\mathcal{B}$.

Definition 7 A subcollection of open sets $\mathcal{S} \subset \mathcal{T}$ is called a sub-basis for the topology $\mathcal{T}$ if every open set is a union of finite intersections of sets of $\mathcal{S}$.

Definition 8 The closure $\bar{A}$ of a subset A of X is the smallest closed set containing A.

Definition 9 The interior $\mathring{A}$ of a subset A of X is the largest open set contained in A.

Remark It is sometimes useful to define the boundary of A as the set $\bar{A} \setminus \mathring{A} = \{x \in \bar{A},\ x \notin \mathring{A}\}$.

Definition 10 Let A be a subset of a topological space X. A point $x \in X$ is called a limit point of A if every open set containing x contains some point of A other than x.

Definition 11 A subset A of X is said to be dense in X if $\bar{A} = X$.

Definition 12 A topological space X is called a Hausdorff space if for any two distinct points $x, y \in X$, there exist an open neighborhood A of x and an open neighborhood B of y such that A and B are disjoint (that is, $A \cap B = \emptyset$).

Remark and Examples
(i) This is looking more like what we expect. However, certain mildly non-Hausdorff spaces turn out to be quite useful, for example, in twistor theory. A "pocket" furnishes such an example. Explicitly, consider X to be the subset of the real plane consisting of the interval $[-1, 1]$ on the x-axis, together with the interval $[0, 1]$ on the line $y = 1$, where the following pairs of points are identified: $(x, 0) \sim (x, 1)$, $0 < x \leq 1$. Then the two points (0, 0) and (0, 1) do not have any disjoint neighborhoods. Strictly speaking, one needs the

This space is neither Hausdorff nor compact (see later for definition of compactness).

Definition 13 Let X and Y be two topological spaces and let $f: X \to Y$ be a map from X to Y. We say that f is continuous if $f^{-1}(A)$ is open (in X) whenever A is open (in Y).

Remark Continuity is the single most important concept here. In this general setting, it looks a little different from the "$\epsilon$-$\delta$" definition, but this latter works only for metric spaces, which we shall come to shortly.

Definition 14 A map $f: X \to Y$ is a homeomorphism if it is a continuous bijective map such that its inverse $f^{-1}$ is also continuous.

Remark Homeomorphisms are the natural maps for topological spaces, in the sense that two homeomorphic spaces are indistinguishable from the point of view of topology. Topological invariants are properties of topological spaces which are preserved under homeomorphisms.

Definition 15 Let $B \subset A$. Then one can define the relative topology of B by saying that a subset $C \subset B$ is open if and only if there exists an open set D of A such that $C = D \cap B$.

Definition 16 A subset $B \subset A$ equipped with the relative topology is called a subspace of the topological space A.

Remark Thus, if for subsets of the real line we consider A = [0, 3], B = [0, 2], then C = (1, 2] is open in B, in the relative topology induced by the usual topology of $\mathbb{R}$.

Definition 17 Given two topological spaces X and Y, we can define a product topological space $Z = X \times Y$, where the set is the Cartesian product of the two sets X and Y, and sets of the form $A \times B$, where A is open in X and B is open in Y, form a basis for the topology.

Remark Note that the open sets of $X \times Y$ are not always of this product form ($A \times B$).

Definition 18 Suppose there is a partition of X into disjoint subsets $A_\alpha$, $\alpha \in I$, for some index set I, or equivalently, there is defined on X an equivalence relation $\sim$. Then one can define the quotient topology on the set of equivalence classes $\{A_\alpha, \alpha \in I\}$, usually denoted as the quotient space $X/\!\sim\ = Y$, as follows. Consider the map $\pi: X \to Y$, called the canonical projection, which maps the element $x \in X$
notion of a quotient topology, introduced below.
to its equivalence class [x]. Then a subset U  Y is
(ii) For a more truly non-Hausdorff topology,
open if and only if 1 (U) is open.
consider the space of positive integers N =
{1, 2, 3, . . . }, and take as open sets the following: Proposition 1 Let T be the quotient topology on
;, N, and the sets {1, 2, . . . , n} for each n 2 N. the quotient space Y. Suppose T 0 is another
topology on $Y$ such that the canonical projection is continuous, then $\mathcal{T}' \subset \mathcal{T}$.

Definition 19 An (open) cover $\{U_\alpha : \alpha \in I\}$ for $X$ is a collection of open sets $U_\alpha \subset X$ such that their union equals $X$. A subcover of this cover is then a subset of the collection which is itself a cover for $X$.

Definition 20 A topological space $X$ is said to be compact if every cover contains a finite subcover.

Remark So for a compact space, however one chooses to cover it, it is always sufficient to use a finite number of open subsets. This is one of the essential differences between an open interval (not compact) and a closed interval (compact). The former is in fact homeomorphic to the entire real line.

Definition 21 A topological space $X$ is said to be connected if it cannot be written as the union of two nonempty disjoint open sets.

Remark A useful equivalent definition is that any continuous map from $X$ to the two-point set $\{0, 1\}$, equipped with the discrete topology, cannot be surjective.

Definition 22 Given two points $x, y$ in a topological space $X$, a path from $x$ to $y$ is a continuous map $f : [0, 1] \to X$ such that $f(0) = x$, $f(1) = y$. We also say that such a path joins $x$ and $y$.

Definition 23 A topological space $X$ is path-connected if every two points in $X$ can be joined by a path lying entirely in $X$.

Proposition 2 A path-connected space is connected.

Proposition 3 A connected open subspace of $\mathbb{R}^n$ is path-connected.

Definition 24 Given a topological space $X$, define an equivalence relation by saying that $x \sim y$ if and only if $x$ and $y$ belong to the same connected subspace of $X$. Then the equivalence classes are called (connected) components of $X$.

Examples

(i) The Lie group $O(3)$ of $3 \times 3$ orthogonal matrices has two connected components. The identity connected component is $SO(3)$ and is a subgroup.
(ii) The proper orthochronous Lorentz transformations of Minkowski space form the identity component of the group of Lorentz transformations.

Metric Spaces

A special class of topological spaces plays an important role: metric spaces.

Definition 25 A metric space is a set $X$ together with a function $d : X \times X \to \mathbb{R}$ satisfying

(i) $d(x, y) \ge 0$,
(ii) $d(x, y) = 0 \Leftrightarrow x = y$,
(iii) $d(x, z) \le d(x, y) + d(y, z)$ (triangle inequality).

Remarks

(i) The function $d$ is called the metric, or distance function, between the two points.
(ii) This concept of metric is what is generally known as Euclidean metric in mathematical physics. The distinguishing feature is the positive definiteness (and the triangle inequality). One can, and does, introduce indefinite metrics (for example, the Minkowski metric) with various signatures. But these metrics are not usually used to induce topologies in the spaces concerned.

Definition 26 Given a metric space $X$ and a point $x \in X$, we define the open ball centred at $x$ with radius $r$ (a positive real number) as

\[ B_r(x) = \{ y \in X : d(x, y) < r \} \]

Given a metric space $X$, we can immediately define a topology on it by taking all the open balls in $X$ as a basis. We say that this is the topology induced by the given metric. Then we can recover our usual $\epsilon$–$\delta$ definition of continuity.

Proposition 4 Let $f : X \to Y$ be a map from the metric space $X$ to the metric space $Y$. Then $f$ is continuous (with respect to the corresponding induced topologies) at $x \in X$ if and only if given any $\epsilon > 0$, $\exists\, \delta > 0$ such that $d(x, x') < \delta$ implies $d(f(x), f(x')) < \epsilon$.

Note that we do not bother to give two different symbols to the two metrics, as it is clear which spaces are involved. The proof is easily seen by taking the relevant balls as neighborhoods. Equally easy is the following:

Proposition 5 A metric space is Hausdorff.

Definition 27 A map $f : X \to Y$ of metric spaces is uniformly continuous if given any $\epsilon > 0$ there exists $\delta > 0$ such that for any $x_1, x_2 \in X$, $d(x_1, x_2) < \delta$ implies $d(f(x_1), f(x_2)) < \epsilon$.

Remark Note the difference between continuity and uniform continuity: the latter is stronger and requires the same $\delta$ for the whole space.

Definition 28 Two metrics $d_1$ and $d_2$ defined on $X$ are equivalent if there exist positive constants $a$ and $b$ such that for any two points $x, y \in X$ we have

\[ a\, d_1(x, y) \le d_2(x, y) \le b\, d_1(x, y) \]
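The two-sided bound in Definition 28 can be probed numerically. The following Python sketch is only an illustration under stated assumptions: it compares the taxicab metric (sum of coordinate differences) with the maximum metric (largest coordinate difference) on $\mathbb{R}^n$, for which the constants $a = 1$ and $b = n$ are known to work. The function names `d_taxi`, `d_max`, `bounds_hold` and `sample_points` are hypothetical, and a check on sampled points is evidence, not a proof.

```python
import random

def d_taxi(x, y):
    """Taxicab metric: sum of coordinate-wise absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def d_max(x, y):
    """Maximum metric: largest coordinate-wise absolute difference."""
    return max(abs(a - b) for a, b in zip(x, y))

def bounds_hold(d1, d2, a, b, points, tol=1e-9):
    """Check a*d1(x, y) <= d2(x, y) <= b*d1(x, y) on every sampled pair."""
    return all(
        a * d1(x, y) - tol <= d2(x, y) <= b * d1(x, y) + tol
        for x in points
        for y in points
    )

def sample_points(n, count, seed=0):
    """Draw `count` pseudo-random points of R^n (illustrative sampling only)."""
    rng = random.Random(seed)
    return [tuple(rng.uniform(-10.0, 10.0) for _ in range(n)) for _ in range(count)]

if __name__ == "__main__":
    pts = sample_points(3, 50)
    # In R^3: d_max <= d_taxi <= 3 * d_max
    print(bounds_hold(d_max, d_taxi, 1.0, 3.0, pts))
```

The constant $b$ cannot be chosen smaller than $n$ here: for the pair $(0,0,0)$, $(1,1,1)$ the taxicab distance is exactly three times the maximum distance.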
Remark This is clearly an equivalence relation. Two equivalent metrics induce the same topology.

Examples

(i) Given a set $X$, we can define the discrete metric as follows: $d_0(x, y) = 1$ whenever $x \ne y$. This induces the discrete topology on $X$. This is quite a convenient way of describing the discrete topology.
(ii) In $\mathbb{R}$, the usual metric is $d(x, y) = |x - y|$, and the usual topology is the one induced by this.
(iii) More generally, in $\mathbb{R}^n$, we can define a metric for every $p \ge 1$ by

\[ d_p(x, y) = \left( \sum_{k=1}^{n} |x_k - y_k|^p \right)^{1/p} \]

where $x = (x_1, x_2, \ldots, x_n)$, $y = (y_1, y_2, \ldots, y_n)$. In particular, for $p = 2$ we have the usual Euclidean metric, but the other cases are also useful. To continue the series, one can define

\[ d_\infty = \max_{1 \le k \le n} \{ |x_k - y_k| \} \]

All these metrics induce the same topology on $\mathbb{R}^n$.
(iv) In a vector space $V$, say over the real or the complex field, a function $\| \cdot \| : V \to \mathbb{R}$ is called a norm if it satisfies the following axioms:
(a) $\|x\| = 0$ if and only if $x = 0$,
(b) $\|\alpha x\| = |\alpha| \|x\|$, and
(c) $\|x + y\| \le \|x\| + \|y\|$.

Then it is easy to see that a metric can be defined using the norm

\[ d(x, y) = \|x - y\| \]

In many cases, for example, the metrics defined in example (iii) above, one can define the norm of a vector as just the distance of it from the origin. One obvious exception is the discrete metric.

A slightly more general concept is found to be useful for spaces of functions and operators: that of seminorms. A seminorm is one which satisfies the last two of the conditions, but not necessarily the first, for a norm, as listed above.

Definition 29 Given a metric space $X$, a sequence of points $\{x_1, x_2, \ldots\}$ is called a Cauchy sequence if, given any $\epsilon > 0$, there exists a positive integer $N$ such that for any $k, \ell > N$ we have $d(x_k, x_\ell) < \epsilon$.

Definition 30 Given a sequence of points $\{x_1, x_2, \ldots\}$ in a metric space $X$, a point $x \in X$ is called a limit of the sequence if given any $\epsilon > 0$, there exists a positive integer $N$ such that for any $n > N$ we have $d(x, x_n) < \epsilon$. We say that the sequence converges to $x$.

Definition 31 A metric space $X$ is complete if every Cauchy sequence in $X$ converges to a limit in it.

Examples

(i) The closed interval $[0, 1]$ on the real line is complete, whereas the open interval $(0, 1)$ is not. For example, the Cauchy sequence $\{1/n,\ n = 2, 3, \ldots\}$ has no limit in this open interval. (Considered as a sequence on the real line, it has of course the limit point 0.)
(ii) The spaces $\mathbb{R}^n$ are complete.
(iii) The Hilbert space $\ell^2$ consisting of all sequences of real numbers $\{x_1, x_2, \ldots\}$ such that $\sum_{k=1}^{\infty} x_k^2$ converges is complete with respect to the obvious metric which is a generalization to infinite dimension of $d_2$ above. For arbitrary $p \ge 1$, one can similarly define $\ell^p$, which are also complete and are hence Banach spaces.

Remarks Completeness is not a topological invariant. For example, the open interval $(-1, 1)$ and the whole real line are homeomorphic (with respect to the usual topologies) but the former is not complete while the latter is. The homeomorphism can conveniently be given in terms of the trigonometric function tangent.

Definition 32 A subset $B$ of the metric space $X$ is bounded if there exists a ball of radius $R$ ($R > 0$) which contains it entirely.

Theorem 1 (Heine–Borel) Any closed bounded subset of $\mathbb{R}^n$ is compact.

Remark The converse is also true. We have thus a nice characterization of compact subsets of $\mathbb{R}^n$ as being closed and bounded.

Proposition 6 Any bounded sequence in $\mathbb{R}^n$ has a convergent subsequence.

Definition 33 Consider a sequence $\{f_n\}$ of real-valued functions on a subset $A$ (usually an interval) of $\mathbb{R}$. We say that $\{f_n\}$ converges pointwise in $A$ if the sequence of real numbers $\{f_n(x)\}$ converges for every $x \in A$. We can then define a function $f : A \to \mathbb{R}$ by $f(x) = \lim_{n \to \infty} f_n(x)$, and write $f_n \to f$.

Definition 34 A sequence of functions $f_n : A \to \mathbb{R}$, $A \subset \mathbb{R}$ is said to converge uniformly to a function $f : A \to \mathbb{R}$ if given any $\epsilon > 0$, there exists a positive integer $N$ such that, for all $x$, $|f_n(x) - f(x)| < \epsilon$ whenever $n > N$.

Theorem 2 Let $f_n : (a, b) \to \mathbb{R}$ be a sequence of functions continuous at the point $c \in (a, b)$, and suppose $f_n$ converges uniformly to $f$ on $(a, b)$. Then $f$ is continuous at $c$.
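The difference between Definitions 33 and 34 can be made concrete by estimating $\sup_x |f_n(x) - f(x)|$ on a grid. The Python sketch below is a hedged illustration, not part of the article: the two sample sequences, $\sin(nx)/n$ and $n x e^{-nx}$ on $[0, 1]$, and the helper names `sup_distance`, `smooth` and `bump` are assumptions chosen for the example. Both sequences converge pointwise to zero, but only the first does so uniformly, because the second always attains the value $1/e$ at $x = 1/n$.

```python
import math

def sup_distance(f, g, a, b, samples=100001):
    """Estimate sup |f(x) - g(x)| on [a, b] from an evenly spaced grid."""
    step = (b - a) / (samples - 1)
    return max(abs(f(a + k * step) - g(a + k * step)) for k in range(samples))

def zero(x):
    """The common pointwise limit of both example sequences."""
    return 0.0

def smooth(n):
    """f_n(x) = sin(n x) / n, bounded by 1/n, hence uniformly convergent to 0."""
    return lambda x: math.sin(n * x) / n

def bump(n):
    """g_n(x) = n x exp(-n x): tends to 0 at each fixed x, but peaks at 1/e."""
    return lambda x: n * x * math.exp(-n * x)

if __name__ == "__main__":
    for n in (5, 50, 500):
        print(n, sup_distance(smooth(n), zero, 0.0, 1.0),
              sup_distance(bump(n), zero, 0.0, 1.0))
```

A grid estimate can only underestimate the true supremum, so it is suitable for showing that a sequence fails to converge uniformly, while the bound $1/n$ for the first sequence follows from $|\sin| \le 1$ rather than from the grid.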
Remark and Example The pointwise limit of continuous functions need not be continuous, as can be shown by the following example: $f_n(x) = x^n$, $x \in [0, 1]$. We see that the limit function $f$ is not continuous:

\[ f(x) = \begin{cases} 0 & x \ne 1 \\ 1 & x = 1 \end{cases} \]

Definition 35 Let $X$ be a metric space. A map $f : X \to X$ is a contraction if there exists $c < 1$ such that $d(f(x), f(y)) \le c\, d(x, y)$ for all $x, y \in X$.

Theorem 3 (Banach) If $X$ is a complete metric space and $f$ is a contraction in $X$, then $f$ has a unique fixed point $x \in X$, that is, $f(x) = x$.

Some Function and Operator Spaces

The spaces of functions and operators can be equipped with different topologies, given by various concepts of convergence and of norms (or sometimes seminorms), very often with different such concepts for the same space. As we saw earlier, a norm in a vector space gives rise to a metric, and hence to a topology. Similarly with the concept of convergence for sequences of functions and operators, as one then knows what the limit points, and hence closed sets, are.

But before we do that, let us introduce, in a slightly different context, a topology which is in some sense the natural one for the space of continuous maps from one space to another.

Definition 36 Consider a family $F$ of maps from a topological space $X$ to a topological space $Y$, and define $W(K, U) = \{f : f \in F,\ f(K) \subset U\}$. Then the family of all sets of the form $W(K, U)$ with $K$ compact (in $X$) and $U$ open (in $Y$) form a sub-basis for the compact open topology for $F$.

Consider a topological space $X$ and sequences of functions $(f_n)$ on it. Let $D \subset X$. We can then define pointwise convergence and uniform convergence exactly as for functions on subsets of the real line.

Definition 37 Let $X$, $D$ and $(f_n)$ be as above.

(i) The functions $f_n$ converge pointwise on $D$ to a function $f$ if the sequence of numbers $f_n(x) \to f(x)$, $\forall x \in D$.
(ii) The functions $f_n$ converge uniformly on $D$ to a function $f$ if given $\epsilon > 0$, there exists $N$ such that for all $n > N$ we have $|f_n(x) - f(x)| < \epsilon$, $\forall x \in D$.

Next we consider the Lebesgue spaces $L^p$, that is, functions $f$ defined on subsets of $\mathbb{R}^n$, such that $|f(x)|^p$ is Lebesgue integrable, for real numbers $p \ge 1$. To define these spaces, we tacitly take equivalence classes of functions which are equal almost everywhere (that is, up to a null set), but very often we can take representatives of these classes and just deal with genuine functions instead. Note that of all $L^p$, only $L^2$ is a Hilbert space.

Definition 38 In the space $L^p$, we define its norm by

\[ \|f\| = \left( \int |f(x)|^p \, dx \right)^{1/p} \]

Now we turn to general normed spaces, and operators on them.

Definition 39 Convergence in the norm is also called strong convergence. In other words, a sequence $(x_n)$ in a normed space $X$ is said to converge strongly to $x$ if

\[ \lim_{n \to \infty} \|x_n - x\| = 0 \]

Definition 40 A sequence $(x_n)$ in a normed space $X$ is said to converge weakly to $x$ if

\[ \lim_{n \to \infty} f(x_n) = f(x) \]

for all bounded linear functionals $f$.

Consider the space $B(X, Y)$ of bounded linear operators $T$ from $X$ to $Y$. We can make this into a normed space by defining the following norm:

\[ \|T\| = \sup_{x \in X,\ \|x\| = 1} \|Tx\| \]

Then we can define three different concepts of convergence on $B(X, Y)$. There are in fact more in current use in functional analysis.

Definition 41 Let $X$ and $Y$ be normed spaces and let $(T_n)$ be a sequence of operators $T_n \in B(X, Y)$.

(i) $(T_n)$ is uniformly convergent if it converges in the norm.
(ii) $(T_n)$ is strongly convergent if $(T_n x)$ converges strongly for every $x \in X$.
(iii) $(T_n)$ is weakly convergent if $(T_n x)$ converges weakly for every $x \in X$.

Remark Clearly we have: uniform convergence $\Rightarrow$ strong convergence $\Rightarrow$ weak convergence, and the limits are the same in all three cases. However, the converses are in general not true.

Homotopy Groups

The most elementary and obvious property of a topological space $X$ is the number of connected components it has. The next such property, in a certain sense, is the number of holes $X$ has. There
are higher analogues of these, called the homotopy groups, which are topological invariants, that is, they are invariant under homeomorphisms. They play important roles in many topological considerations in field theory and other topics of mathematical physics. The articles Topological Defects and Their Homotopy Classification and Electric–Magnetic Duality contain some examples.

Definition 42 Given a topological space $X$, the zeroth homotopy set, denoted $\pi_0(X)$, is the set of connected components of $X$. One sometimes writes $\pi_0(X) = 0$ if $X$ is connected.

To define the fundamental group of $X$, or $\pi_1(X)$, we shall need the concept of closed loops, which we shall find useful in other ways too. For simplicity, we shall consider based loops (that is, loops passing through a fixed point in $X$). It seems that in most applications, these are the relevant ones. One could consider loops of various smoothness (when $X$ is a manifold), but in view of applications to quantum field theory, we shall consider continuous loops, which are also the ones relevant for topology.

Definition 43 Given a topological space $X$ and a point $x_0 \in X$, a (closed) (based) loop is a continuous function of the parametrized circle to $X$:

\[ \xi : [0, 2\pi] \to X \]

satisfying $\xi(0) = \xi(2\pi) = x_0$.

Definition 44 Given a connected topological space $X$ and a point $x_0 \in X$, the space of all closed based loops is called the (parametrized based) loop space of $X$, denoted $\Omega X$.

Remarks

(i) The loop space $\Omega X$ inherits the relative compact open topology from the space of continuous maps from the closed interval $[0, 2\pi]$ to $X$. It also has a natural base point: the constant function mapping all of $[0, 2\pi]$ to $x_0$. Hence it is easy to iterate the construction and define $\Omega^k X$, $k \ge 1$.
(ii) Here we have chosen to parametrize the circle by $[0, 2\pi]$, as is more natural if we think in terms of the phase angle. We could easily have chosen the unit interval $[0, 1]$ instead. This would perhaps harmonize better with our previous definition of paths and the definitions of homotopies below.

Proposition 7 The fundamental group of a topological space $X$, denoted $\pi_1(X)$, consists of classes of closed loops in $X$ which cannot be continuously deformed into one another while preserving the base point.

Definition 45 A space $X$ is called simply connected if $\pi_1(X)$ is trivial.

To define the higher homotopy groups, let us go into a little detail about homotopy.

Definition 46 Given two topological spaces $X$ and $Y$, and maps

\[ p, q : X \to Y \]

we say that $h$ is a homotopy between the maps $p, q$ if

\[ h : X \times I \to Y \]

is a continuous map such that $h(x, 0) = p(x)$, $h(x, 1) = q(x)$, where $I$ is the unit interval $[0, 1]$. In this case, we write $p \simeq q$.

Definition 47 A map $f : X \to Y$ is a homotopy equivalence if there exists a map $g : Y \to X$ such that $g \circ f \simeq \mathrm{id}_X$ and $f \circ g \simeq \mathrm{id}_Y$.

Remark This is an equivalence relation.

Definition 48 For a topological space $X$ with base point $x_0$, we define $\pi_n(X)$, $n \ge 0$ as the set of homotopy equivalence classes of based maps from the $n$-sphere $S^n$ to $X$.

Remark This coincides with the previous definitions for $\pi_0$ and $\pi_1$.

There is a very nice relation between homotopy classes and loop spaces.

Proposition 8 $\pi_n(X) = \pi_{n-1}(\Omega X) = \cdots = \pi_0(\Omega^n X)$.

Remarks

(i) When we consider the gauge group $G$ in a Yang–Mills theory, its fundamental group classifies the monopoles that can occur in the theory.
(ii) For $n \ge 1$, $\pi_n(X)$ is a group, the group action coming from the joining of two loops together to form a new loop. On the other hand, $\pi_0(X)$ in general is not a group. However, when $X$ is a Lie group, then $\pi_0(X)$ inherits a group structure from $X$, because it can be identified with the quotient group of $X$ by its identity-connected component. For example, the two components of $O(3)$ can be identified with the two elements of the group $\mathbb{Z}_2$, the component where the determinant equals $1$ corresponding to $0$ in $\mathbb{Z}_2$ and the component where the determinant equals $-1$ corresponding to $1$ in $\mathbb{Z}_2$.
(iii) For $n \ge 2$, the group $\pi_n(X)$ is always abelian.
(iv) Examples of nonabelian $\pi_1$ are the fundamental groups of some Riemann surfaces.
(v) Since $\pi_1$ is not necessarily abelian, much of the direct-sum notation we use for the homotopy
groups should more correctly be written multiplicatively. However, in most literature in mathematical physics, the additive notation seems to be preferred.

Examples

(i) $\pi_n(X \times Y) = \pi_n(X) + \pi_n(Y)$, $n \ge 1$.
(ii) For the spheres, we have the following results:

\[ \pi_i(S^n) = \begin{cases} 0 & \text{if } i < n \\ \mathbb{Z} & \text{if } i = n \end{cases} \]
\[ \pi_i(S^1) = 0 \quad \text{if } i > 1 \]
\[ \pi_{n+1}(S^n) = \mathbb{Z}_2 \quad \text{if } n \ge 3 \]
\[ \pi_{n+2}(S^n) = \mathbb{Z}_2 \quad \text{if } n \ge 2 \]
\[ \pi_6(S^3) = \mathbb{Z}_{12} \]

(iii) From the theory of sphere bundles, we can deduce:

\[ \pi_i(S^2) = \pi_{i-1}(S^1) + \pi_i(S^3) \quad \text{if } i \ge 2 \]
\[ \pi_i(S^4) = \pi_{i-1}(S^3) + \pi_i(S^7) \quad \text{if } i \ge 2 \]
\[ \pi_i(S^8) = \pi_{i-1}(S^7) + \pi_i(S^{15}) \quad \text{if } i \ge 2 \]

and the first of these relations gives the following more succinct result:

\[ \pi_i(S^3) = \pi_i(S^2) \quad \text{if } i \ge 3 \]

(iv) A result of Serre says that all the homotopy groups of spheres are in fact finite except $\pi_n(S^n)$ and $\pi_{4n-1}(S^{2n})$, $n \ge 1$.

Definition 49 Given a connected space $X$, a map $\pi : B \to X$ is called a covering if (i) $\pi(B) = X$, and (ii) for each $x \in X$, there exists an open connected neighborhood $V$ of $x$ such that each component of $\pi^{-1}(V)$ is open in $B$, and $\pi$ restricted to each component is a homeomorphism. The space $B$ is called a covering space.

Examples

(i) The real line $\mathbb{R}$ is a covering of the group $U(1)$.
(ii) The group $SU(2)$ is a double cover of the group $SO(3)$.
(iii) The group $SL(2, \mathbb{C})$ is a double cover of the Lorentz group $SO(1, 3)$.
(iv) The group $SU(2, 2)$ is a 4-fold cover of the conformal group in four dimensions. This local isomorphism is of great importance in twistor theory.

Remarks

(i) By considering closed loops in $X$ and their coverings in $B$ it is easily seen that the fundamental group $\pi_1(X)$ acts on the coverings of $X$. If we further assume that the action is transitive, then we have the following nice result: coverings of $X$ are in 1–1 correspondence with normal subgroups of $\pi_1(X)$.
(ii) Given a connected space $X$, there always exists a unique connected simply connected covering space $\tilde{X}$, called the universal covering space. Furthermore, $\tilde{X}$ covers all the other covering spaces of $X$. For the higher homotopy groups, one has

\[ \pi_n(X) = \pi_n(\tilde{X}), \quad n \ge 2 \]

One very important class of homotopy groups are those of Lie groups. To simplify matters, we shall consider only connected groups, that is, $\pi_0(G) = 0$. Also we shall deal mainly with the classical groups, and in particular, the orthogonal and unitary groups.

Proposition 9 Suppose that $G$ is a connected Lie group.

(i) If $G$ is compact and semi-simple, then $\pi_1(G)$ is finite. This implies that $\tilde{G}$ is still compact.
(ii) $\pi_2(G) = 0$.
(iii) For $G$ compact, simple, and nonabelian, $\pi_3(G) = \mathbb{Z}$.
(iv) For $G$ compact, simply connected, and simple, $\pi_4(G) = 0$ or $\mathbb{Z}_2$.

Examples

(i) $\pi_1(SU(n)) = 0$.
(ii) $\pi_1(SO(n)) = \mathbb{Z}_2$.
(iii) Since the unitary groups $U(n)$ are topologically the product of $SU(n)$ with a circle $S^1$, their homotopy groups are easily computed using the product formula. We remind ourselves that $U(1)$ is topologically a circle and $SU(2)$ topologically $S^3$.
(iv) For $i \ge 2$, we have:

\[ \pi_i(SO(3)) = \pi_i(SU(2)) \]
\[ \pi_i(SO(5)) = \pi_i(Sp(2)) \]
\[ \pi_i(SO(6)) = \pi_i(SU(4)) \]

Just for interest, and to show the richness of the subject, some isomorphisms for homotopy groups are shown in Table 1 and some homotopy groups for low $SU(n)$ and $SO(n)$ are listed in Table 2.

Table 1 Some isomorphisms for homotopy groups

Isomorphism                        Range
pi_i(SO(n)) ≅ pi_i(SO(m))          n, m ≥ i + 2
pi_i(SU(n)) ≅ pi_i(SU(m))          n, m ≥ (i + 1)/2
pi_i(Sp(n)) ≅ pi_i(Sp(m))          n, m ≥ (i − 1)/4
pi_i(G_2)   ≅ pi_i(SO(7))          2 ≤ i ≤ 5
pi_i(F_4)   ≅ pi_i(SO(9))          2 ≤ i ≤ 6
pi_i(SO(9)) ≅ pi_i(SO(7))          i ≤ 13
Table 2 Some homotopy groups for low SU(n) and SO(n)

          pi_4   pi_5   pi_6   pi_7    pi_8            pi_9            pi_10
SU(2)     Z_2    Z_2    Z_12   Z_2     Z_2             Z_3             Z_15
SU(3)     0      Z      Z_6    0       Z_12            Z_3             Z_30
SU(4)     0      Z      0      Z       Z_24            Z_2             Z_120 + Z_2
SU(5)     0      Z      0      Z       0               Z               Z_120
SU(6)     0      Z      0      Z       0               Z               Z_3
SO(5)     Z_2    Z_2    0      Z       0               0               Z_120
SO(6)     0      Z      0      Z       Z_24            Z_2             Z_120 + Z_2
SO(7)     0      0      0      Z       Z_2 + Z_2       Z_2 + Z_2       Z_24
SO(8)     0      0      0      Z + Z   Z_2 + Z_2 + Z_2 Z_2 + Z_2 + Z_2 Z_24 + Z_24
SO(9)     0      0      0      Z       Z_2 + Z_2       Z_2 + Z_2       Z_24
SO(10)    0      0      0      Z       Z_2             Z + Z_2         Z_12

Appendix: A Mathematician's Basic Toolkit

The following is a drastically condensed list, most of which is what a mathematics undergraduate learns in the first few weeks. The rest is included for easy reference. These notations and concepts are used universally in mathematical writing. We have not endeavored to arrange the material in a logical order. Furthermore, given structures such as sets, groups, etc., one can usually define substructures such as subsets, subgroups, etc., in a straightforward manner. We shall therefore not spell this out.

Sets

$A \cup B = \{x : x \in A \text{ or } x \in B\}$    union
$A \cap B = \{x : x \in A \text{ and } x \in B\}$    intersection
$A \setminus B = \{x : x \in A \text{ and } x \notin B\}$    complement
$A \times B = \{(x, y) : x \in A,\ y \in B\}$    Cartesian product

Maps

1. A map or mapping $f : A \to B$ is an assignment of an element $f(x)$ of $B$ for every $x \in A$.
2. A map $f : A \to B$ is injective if $f(x) = f(y) \Rightarrow x = y$. This is sometimes called a 1–1 map, a term to be avoided.
3. A map $f : A \to B$ is surjective if for every $y \in B$ there exists an $x \in A$ such that $y = f(x)$. This is sometimes called an onto map.
4. A map $f : A \to B$ is bijective if it is both surjective and injective. This is also sometimes called a 1–1 map, a term to be equally avoided.
5. For any map $f : A \to B$ and any subset $C \subset B$, the inverse image $f^{-1}(C) = \{x : f(x) \in C\} \subset A$ is always defined, although, of course, it can be empty. On the other hand, the map $f^{-1}$ is defined if and only if $f$ is bijective.
6. A map from a set to either the real or complex numbers is usually called a function.
7. A map between vector spaces, and more particularly normed spaces (including Hilbert spaces), is called an operator. Most often, one considers linear operators.
8. An operator from a vector space to its field of scalars is called a functional. Again, one considers almost exclusively linear functionals.

Relations

1. A relation $\sim$ on a set $A$ is a subset $R \subset A \times A$. We say that $x \sim y$ if $(x, y) \in R$.
2. We shall only be interested in equivalence relations. An equivalence relation $\sim$ is one satisfying, for all $x, y, z \in A$:
(a) $x \sim x$ (reflexive),
(b) $x \sim y \Rightarrow y \sim x$ (symmetric),
(c) $x \sim y,\ y \sim z \Rightarrow x \sim z$ (transitive).
3. If $\sim$ is an equivalence relation in $A$, then for each $x \in A$, we can define its equivalence class:

\[ [x] = \{ y \in A : y \sim x \} \]

It can be shown that equivalence classes are nonempty, any two equivalence classes are either equal or disjoint, and they together partition the set $A$. Subgroup equivalence classes are called cosets.
4. An element of an equivalence class is called a representative.

Groups

A group is a set $G$ with a map, called multiplication or group law

\[ G \times G \to G, \qquad (x, y) \mapsto xy \]

satisfying
1. $(xy)z = x(yz)$, $\forall x, y, z \in G$ (associative);
2. there exists a neutral element (or identity) $1$ such that $1x = x1 = x$, $\forall x \in G$; and
3. every element $x \in G$ has an inverse $x^{-1}$, that is, $x x^{-1} = x^{-1} x = 1$.

A map such as the multiplication in the definition is an example of a binary operation. Note that we have denoted the group law as multiplication here. It is usual to denote it additively if the group is abelian, that is, if $xy = yx$, $\forall x, y \in G$. In this case, we may write the condition as $x + y = y + x$, and call the identity element $0$.

Rings

A ring is a set $R$ equipped with two binary operations, $x + y$ called addition, and $xy$ called multiplication, such that

1. $R$ is an abelian group under addition;
2. the multiplication is associative; and
3. $(x + y)z = xz + yz$, $x(y + z) = xy + xz$, $\forall x, y, z \in R$ (distributive).

If the multiplication is commutative ($xy = yx$) then the ring is said to be commutative. A ring may contain a multiplicative identity, in which case it is called a ring with unit element.

An ideal $I$ of $R$ is a subring of $R$, satisfying in addition

\[ r \in R,\ a \in I \ \Rightarrow\ ra \in I,\ ar \in I \]

One can define in an obvious fashion a left-ideal and a right-ideal. The above definition will then be for a two-sided ideal.

Modules

Given a ring $R$, an $R$-module is an abelian group $M$, together with an operation, $M \times R \to M$, denoted multiplicatively, satisfying, for $x, y \in M$, $r, s \in R$,

1. $(x + y)r = xr + yr$,
2. $x(r + s) = xr + xs$,
3. $x(rs) = (xr)s$, and
4. $x1 = x$.

The term right $R$-module is sometimes used, to distinguish it from obviously defined left $R$-modules.

Fields

A field $F$ is a commutative ring in which every nonzero element is invertible.

The additive identity $0$ is never invertible, unless $0 = 1$, so it is usual to assume that a field has at least two elements, $0$ and $1$.

The most common fields we come across are, of course, the number fields: the rationals, the reals, and the complex numbers.

Vector Spaces

A vector space, or sometimes linear space, $V$, over a field $F$, is an abelian group, written additively, with a map $F \times V \to V$ such that, for $x, y \in V$, $\alpha, \beta \in F$,

1. $\alpha(x + y) = \alpha x + \alpha y$ (linearity),
2. $(\alpha + \beta)x = \alpha x + \beta x$,
3. $(\alpha\beta)x = \alpha(\beta x)$, and
4. $1x = x$.

A vector space is then a right (or left) $F$-module. The elements of $V$ are called vectors, and those of $F$ scalars.

Algebras

An algebra $A$ over a field $F$ is a ring which is a vector space over $F$, such that

\[ \alpha(ab) = (\alpha a)b = a(\alpha b), \qquad \alpha \in F;\ a, b \in A \]

Note that in some older literature, particularly the Russian school, an algebra of operators is called a ring of operators.

Further Reading

Borel A (1955) Topology of Lie groups and characteristic classes. Bulletin American Mathematical Society 61: 397–432.
Kelley JL (1955) General Topology. New York: Van Nostrand Reinhold.
Kreyszig E (1978) Introductory Functional Analysis with Applications. New York: Wiley.
McCarty G (1967) Topology: An Introduction with Application to Topological Groups. New York: McGraw-Hill.
Simmons GF (1963) Introduction to Topology and Modern Analysis. New York: McGraw-Hill.
A
Abelian and Nonabelian Gauge Theories Using Differential Forms
A C Hirshfeld, Universität Dortmund, Dortmund, Germany
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Quantum electrodynamics is the theory of the electromagnetic interactions of photons and electrons. When attempting to generalize this theory to other interactions it turns out to be necessary to identify its essential components. The essential properties of electrodynamics are contained in its formulation as an abelian gauge theory. The generalization to include other interactions is then reduced to incorporating the structure of nonabelian groups. This becomes particularly clear when we formulate the theory in the language of differential forms.

Here we first present the formulation of electrodynamics using differential forms. The electromagnetic fields are introduced via the Lorentz force equation. They are recognized as the components of a differential 2-form. This form fulfills two differential conditions, which are equivalent to Maxwell's equations. These are expressed with the help of a differential operator and its Hermitian conjugate, the codifferential operator. We consider the effects of charge conservation and introduce electromagnetic potentials, which are defined up to gauge transformations. We finally consider Weyl's argument for the existence of the electromagnetic interaction as a consequence of the local phase invariance of the electron wave function.

We then go on to present the nonabelian generalization. The gauge bosons appear in a theory with fermions by requiring invariance of the theory with respect to local gauge transformations. When the fermions group into symmetry multiplets this gives rise to a gauge group $SU(N)$ involving $N^2 - 1$ gauge bosons mediating the interaction, where $N$ is the dimension of the defining representation (the size of the multiplet) and $N^2 - 1$ is the dimension of the Lie algebra. The interaction arises through the necessity of replacing the usual derivatives by covariant derivatives, which transform in a natural way in order to preserve the gauge invariance. The covariant derivatives involve the gauge potentials, whose transformation properties are dictated by those of the covariant derivative. Whereas for an abelian gauge theory such as electromagnetism scalar-valued $p$-forms are sufficient (actually only $p = 1, 2$), a nonabelian gauge theory involves the use of Lie-algebra-valued $p$-forms. These are introduced and used to construct the Yang–Mills action, which involves the field strength tensor which is determined from the gauge potentials. This action leads to the Yang–Mills equations for the gauge potentials, which are the nonabelian generalizations of the Maxwell equations.

Relativistic Kinematics

The trajectory of a mass point is described as $x^\mu(\tau)$, where $\tau$ is the invariant proper time interval:

\[ d\tau^2 = dt^2 - d\mathbf{x} \cdot d\mathbf{x} = dt^2 (1 - v^2) \tag{1} \]

with $\mathbf{v} = d\mathbf{x}/dt$. With the abbreviation $\gamma = (1 - v^2)^{-1/2}$ this yields $d\tau = (1/\gamma)\,dt$.

The 4-velocity of a point is defined as $u^\mu = dx^\mu/d\tau = \gamma\,(dx^\mu/dt)$. The quantity

\[ u^2 = g_{\mu\nu} u^\mu u^\nu = \frac{dx^\mu\, dx_\mu}{d\tau^2} = 1 \tag{2} \]

is a relativistic invariant. Here

\[ g_{\mu\nu} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix} \tag{3} \]

is the metric of Minkowski space.

The 4-momentum of a particle is $p^\mu = m_0 u^\mu = (m_0\gamma,\ m_0\gamma\mathbf{v})$, and $p_\mu p^\mu = m_0^2$. The 4-force is

\[ f^\mu = \frac{dp^\mu}{d\tau} = \gamma \frac{dp^\mu}{dt} = \gamma \left( \frac{dp^0}{dt},\ \mathbf{f} \right) \tag{4} \]

with the 3-force

\[ \mathbf{f} = \frac{d(m_0 \gamma \mathbf{v})}{dt} \tag{5} \]
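The invariants of the Relativistic Kinematics section, $u^2 = 1$ in eqn [2] and $p_\mu p^\mu = m_0^2$, can be checked numerically. The Python sketch below is a minimal illustration assuming units with $c = 1$ and the signature $(+,-,-,-)$ of eqn [3]; the function names `gamma`, `four_velocity`, `four_momentum` and `minkowski_square` are hypothetical helpers, not notation from the article.

```python
import math

# Diagonal of the Minkowski metric g_{mu nu}, signature (+, -, -, -), c = 1.
METRIC = (1.0, -1.0, -1.0, -1.0)

def gamma(v):
    """Lorentz factor (1 - v^2)^(-1/2) for a 3-velocity v with |v| < 1."""
    v2 = sum(c * c for c in v)
    if v2 >= 1.0:
        raise ValueError("speed must be below 1 in units with c = 1")
    return 1.0 / math.sqrt(1.0 - v2)

def four_velocity(v):
    """u^mu = gamma * (1, v)."""
    g = gamma(v)
    return (g,) + tuple(g * c for c in v)

def four_momentum(m0, v):
    """p^mu = m0 * u^mu = (m0*gamma, m0*gamma*v)."""
    return tuple(m0 * c for c in four_velocity(v))

def minkowski_square(a):
    """g_{mu nu} a^mu a^nu for a 4-vector a with upper indices."""
    return sum(m * c * c for m, c in zip(METRIC, a))

if __name__ == "__main__":
    v = (0.3, -0.4, 0.5)
    print(minkowski_square(four_velocity(v)))       # close to 1
    print(minkowski_square(four_momentum(2.0, v)))  # close to m0^2 = 4
```

The results agree with the invariants only up to floating-point rounding, which is why a comparison with a small tolerance is the appropriate check.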
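Differentiate $p^2 = m_0^2$ with respect to $\tau$; this yields

$$ 2p_\mu f^\mu = 2m_0\gamma^2\left(\frac{dp^0}{dt} - \mathbf{f}\cdot\mathbf{v}\right) = 0 \tag{6} $$

or

$$ \frac{dp^0}{dt} = \mathbf{f}\cdot\mathbf{v} = \mathbf{f}\cdot\frac{d\mathbf{x}}{dt} \tag{7} $$

This says that

$$ dp^0 = \mathbf{f}\cdot d\mathbf{x} = dW \tag{8} $$

where W is the work done and $p^0$ is the energy.

For a charged particle, the Lorentz force is

$$ \mathbf{f} = q(\mathbf{E} + \mathbf{v}\times\mathbf{B}) \tag{9} $$

where q is the charge of the particle, E is the electric, and B the magnetic field strength. Since $\mathbf{f}\cdot\mathbf{v} = q\mathbf{E}\cdot\mathbf{v}$, we have the four-dimensional form of the Lorentz force:

$$ f^\mu = \gamma q\,(\mathbf{E}\cdot\mathbf{v},\ \mathbf{E} + \mathbf{v}\times\mathbf{B}) \tag{10} $$

The Lorentz Force Equation with Differential Forms

We write the Lorentz force equation as an equation for a differential form $f = f_\mu dx^\mu$, with $f_\mu = g_{\mu\nu}f^\nu$. The velocity-dependent Lorentz force is

$$ f = -q\,i_uF \tag{11} $$

with

$$ u = \gamma\left(\frac{\partial}{\partial t} + v^x\frac{\partial}{\partial x} + v^y\frac{\partial}{\partial y} + v^z\frac{\partial}{\partial z}\right) \tag{12} $$

the 4-velocity and F the electromagnetic field strength:

$$ F = E\wedge dt + B \tag{13} $$

where E is a 1-form in three dimensions,

$$ E = E_x\,dx + E_y\,dy + E_z\,dz \tag{14} $$

and B is a 2-form in three dimensions,

$$ B = B_x\,dy\wedge dz + B_y\,dz\wedge dx + B_z\,dx\wedge dy \tag{15} $$

The symbol $i_u$ indicates a contraction of a 2-form with a vector, which is defined as

$$ (i_uF)(v) = F(u, v) \tag{16} $$

for an arbitrary vector v. The contraction of a 2-form with a vector yields a 1-form.

It is easily seen that a 2-form can be expressed in terms of a polar vector and an axial vector: if it is to be invariant with respect to parity transformations

$$ t \to t,\quad x \to -x,\quad y \to -y,\quad z \to -z \tag{17} $$

the fields in eqn [13] must transform as

$$ \mathbf{E} \to -\mathbf{E},\quad \mathbf{B} \to \mathbf{B} \tag{18} $$

Now we check the validity of eqn [11]. We have

$$ f = -q\,i_uF = -q\gamma(\mathbf{v}\cdot\mathbf{E})\,dt + q\gamma\{(E_x + (\mathbf{v}\times\mathbf{B})_x)\,dx + (E_y + (\mathbf{v}\times\mathbf{B})_y)\,dy + (E_z + (\mathbf{v}\times\mathbf{B})_z)\,dz\} \tag{19} $$

in agreement with eqn [10]. We remember to change the signs in $E_x = -E^x$, $B_x = -B^x$, etc.

The Codifferential Operator

The space of p-forms on an n-dimensional manifold is an

$$ \binom{n}{p} = \binom{n}{n-p} = \frac{n!}{(n-p)!\,p!} \tag{20} $$

dimensional vector space. The space of p-forms is thus isomorphic to the space of $(n-p)$-forms. The Hodge dual operator maps the p-forms into the $(n-p)$-forms, and is defined by

$$ \alpha\wedge{*\beta} = \langle\alpha,\beta\rangle\,dx^1\wedge\cdots\wedge dx^n \tag{21} $$

Here $\langle\alpha,\beta\rangle$ is the scalar product of two p-forms:

$$ \langle\alpha,\beta\rangle = \alpha_{i_1\cdots i_p}\,\beta^{i_1\cdots i_p} \tag{22} $$

where $\alpha_{i_1\cdots i_p}$ are the coefficients of the form $\alpha$,

$$ \alpha = \alpha_{i_1\cdots i_p}\,dx^{i_1}\wedge\cdots\wedge dx^{i_p} \tag{23} $$

$\beta_{j_1\cdots j_p}$ are the coefficients of the form $\beta$,

$$ \beta = \beta_{j_1\cdots j_p}\,dx^{j_1}\wedge\cdots\wedge dx^{j_p} \tag{24} $$

and

$$ \beta^{i_1\cdots i_p} = g^{i_1j_1}\cdots g^{i_pj_p}\,\beta_{j_1\cdots j_p} \tag{25} $$

The indices satisfy $i_1 < \cdots < i_p$ and $j_1 < \cdots < j_p$. The basis elements are orthogonal with respect to this scalar product, and

$$ \langle dx^{i_1}\wedge\cdots\wedge dx^{i_p},\ dx^{i_1}\wedge\cdots\wedge dx^{i_p}\rangle = g^{i_1i_1}\cdots g^{i_pi_p} \tag{26} $$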
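The agreement between eqn [11] and eqn [10] can also be checked numerically. The following is a minimal sketch, assuming the signature (+, −, −, −) of eqn [3], units with c = 1, and the component reading $F_{i0} = E_i = -E^i$, $F_{23} = B_x = -B^x$ (and cyclic) of eqns [13] to [15]; the function names are hypothetical and chosen only for illustration.

```python
import math


def cross(a, b):
    """Ordinary 3-vector cross product."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def force_from_contraction(q, v, E, B):
    """Components f_mu of f = -q i_u F, i.e. f_mu = -q u^nu F_{nu mu}."""
    gamma = 1.0 / math.sqrt(1.0 - sum(c * c for c in v))
    u = [gamma] + [gamma * c for c in v]          # 4-velocity u^mu
    F = [[0.0] * 4 for _ in range(4)]             # lower-index F_{mu nu}
    for i in range(3):
        F[0][i + 1] = E[i]                        # F_{0i} = E^i
        F[i + 1][0] = -E[i]                       # F_{i0} = E_i = -E^i
    F[1][2], F[2][1] = -B[2], B[2]                # F_{12} = B_z = -B^z
    F[2][3], F[3][2] = -B[0], B[0]                # F_{23} = B_x = -B^x
    F[3][1], F[1][3] = -B[1], B[1]                # F_{31} = B_y = -B^y
    return [-q * sum(u[n] * F[n][m] for n in range(4)) for m in range(4)]


def force_from_eqn10(q, v, E, B):
    """Lower the index of f^mu = gamma q (E.v, E + v x B) with the metric."""
    gamma = 1.0 / math.sqrt(1.0 - sum(c * c for c in v))
    vxB = cross(v, B)
    f_up = [gamma * q * sum(E[i] * v[i] for i in range(3))]
    f_up += [gamma * q * (E[i] + vxB[i]) for i in range(3)]
    metric = [1.0, -1.0, -1.0, -1.0]
    return [metric[m] * f_up[m] for m in range(4)]
```

Both functions take the 3-vectors with their ordinary (upper-index) components, so the sign changes mentioned after eqn [19] are handled inside `force_from_contraction`.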

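The Hodge dual has the property that

$$ *\left(dx^{\pi(1)}\wedge\cdots\wedge dx^{\pi(p)}\right) = g^{\pi(1)\pi(1)}\cdots g^{\pi(p)\pi(p)}\,\operatorname{sign}(\pi)\,\left(dx^{\pi(p+1)}\wedge\cdots\wedge dx^{\pi(n)}\right) \tag{27} $$

where $\pi$ is a permutation of the indices $(1,\dots,n)$, $\pi(1) < \cdots < \pi(p)$, and $\pi(p+1) < \cdots < \pi(n)$. We also have

$$ *\left(dx^{\pi(p+1)}\wedge\cdots\wedge dx^{\pi(n)}\right) = g^{\pi(p+1)\pi(p+1)}\cdots g^{\pi(n)\pi(n)}\,(-1)^{p(n-p)}\,\operatorname{sign}(\pi)\,\left(dx^{\pi(1)}\wedge\cdots\wedge dx^{\pi(p)}\right) \tag{28} $$

We therefore find that the application of the Hodge dual to a p-form twice yields

$$ {*}{*}\left(dx^{\pi(1)}\wedge\cdots\wedge dx^{\pi(p)}\right) = g^{\pi(1)\pi(1)}\cdots g^{\pi(p)\pi(p)}\,\operatorname{sign}(\pi)\,{*}\left(dx^{\pi(p+1)}\wedge\cdots\wedge dx^{\pi(n)}\right) = g^{11}\cdots g^{nn}\,(-1)^{p(n-p)}\,dx^{\pi(1)}\wedge\cdots\wedge dx^{\pi(p)} \tag{29} $$

or

$$ {*}{*} = (-1)^{p(n-p)}\,(-1)^{\operatorname{Ind} g}\,\mathrm{Id} \tag{30} $$

where Ind g is the number of times $(-1)$ occurs along the diagonal of g.

Now let $\alpha$ be a $(p-1)$-form, and $\beta$ a p-form. Then $d{*\beta}$ is an $(n-p+1)$-form, and

$$ \begin{aligned} d(\alpha\wedge{*\beta}) &= d\alpha\wedge{*\beta} + (-1)^{p-1}\,\alpha\wedge d{*\beta} \\ &= d\alpha\wedge{*\beta} + (-1)^{p-1}(-1)^{(n-p+1)(p-1)}(-1)^{\operatorname{Ind} g}\,\alpha\wedge{*}{*}\,d{*\beta} \\ &= d\alpha\wedge{*\beta} + (-1)^{n(p+1)}(-1)^{\operatorname{Ind} g}\,\alpha\wedge{*}\left({*}\,d{*\beta}\right) \end{aligned} \tag{31} $$

We then have

$$ (d\alpha,\beta) - (\alpha, d^\dagger\beta) = \int_M d(\alpha\wedge{*\beta}) \tag{32} $$

with

$$ d^\dagger = (-1)^{n(p+1)+1}\,(-1)^{\operatorname{Ind} g}\,{*}\,d\,{*} \tag{33} $$

We are here using the scalar product of two p-forms

$$ (\alpha,\beta) := \int_M \alpha\wedge{*\beta} \tag{34} $$

With the help of Stokes' theorem the last integral in eqn [32] may be turned into a surface term at infinity, which vanishes for $\alpha$ and $\beta$ with compact support. $d^\dagger$ is the adjoint operator to d with respect to the scalar product ( , ). Whereas the differential operator d maps p-forms into $(p+1)$-forms, the codifferential operator $d^\dagger$ maps p-forms into $(p-1)$-forms.

The relation $d^2 = 0$ leads to

$$ (d^\dagger)^2 \propto {*}\,d\,{*}{*}\,d\,{*} \propto {*}\,d^2\,{*} = 0 \tag{35} $$

This fact plays an essential role in connection with the conservation laws.

Finally, we want to obtain a coordinate expression for $d^\dagger\beta$. Indeed $d^\dagger\beta = -\operatorname{Div}\beta$ for

$$ (\operatorname{Div}\beta)^K = \frac{\partial\beta^{jK}}{\partial x^j} \tag{36} $$

where K is the multi-index of the coefficients in $\beta = \beta_K\,dx^K$, ordered so that $K = (k_1,\dots,k_p)$ with $k_1 < \cdots < k_p$. We will show that $(\alpha, d^\dagger\beta) = (\alpha, -\operatorname{Div}\beta)$ for an arbitrary $(p-1)$-form $\alpha$. It is a fact that

$$ (\alpha, d^\dagger\beta) = (d\alpha,\beta) = \int (d\alpha)_I\,\beta^I\ {*1} \tag{37} $$

Now we have the coordinate expressions

$$ d\alpha = d\alpha_L\wedge dx^L \tag{38} $$

and $(dx^L)_K = \delta^L_K$. It follows that

$$ (d\alpha)_I = (d\alpha_L\wedge dx^L)_I = \delta^{jK}_I\,\frac{\partial\alpha_L}{\partial x^j}\,\delta^L_K \tag{39} $$

or

$$ (d\alpha)_I = \delta^{jK}_I\,\frac{\partial\alpha_K}{\partial x^j} \tag{40} $$

Here we use

$$ (\alpha\wedge\beta)_I = \delta^{KL}_I\,\alpha_K\,\beta_L \tag{41} $$

where

$$ \delta^{KL}_I = \begin{cases} +1 & \text{if } (KL) \text{ is an even permutation of } I \\ -1 & \text{if } (KL) \text{ is an odd permutation of } I \\ 0 & \text{otherwise} \end{cases} \tag{42} $$

Use of the Leibniz rule yields

$$ \int (d\alpha)_I\,\beta^I\ {*1} = \int \delta^{jK}_I\,\frac{\partial\alpha_K}{\partial x^j}\,\beta^I\ {*1} = \int \frac{\partial\left(\delta^{jK}_I\,\alpha_K\,\beta^I\right)}{\partial x^j}\ {*1} - \int \delta^{jK}_I\,\alpha_K\,\frac{\partial\beta^I}{\partial x^j}\ {*1} \tag{43} $$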
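The sign rule of eqn [30] lends itself to a brute-force check on basis forms. The sketch below assumes a diagonal metric with entries ±1 (so that $g^{ii} = g_{ii}$), labels coordinates 0, ..., n−1 instead of 1, ..., n, and applies eqn [27] literally; the helper names are illustrative, not part of the article.

```python
from itertools import combinations


def perm_sign(seq):
    """Sign of the permutation that sorts seq, found by counting inversions."""
    sign = 1
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            if seq[i] > seq[j]:
                sign = -sign
    return sign


def hodge_basis(indices, g_diag):
    """Apply eqn [27] to dx^{i1} ^ ... ^ dx^{ip} with i1 < ... < ip.

    Returns (coefficient, complementary index tuple).
    """
    n = len(g_diag)
    comp = tuple(k for k in range(n) if k not in indices)
    coeff = perm_sign(tuple(indices) + comp)
    for i in indices:
        coeff *= g_diag[i]        # g^{ii} = g_{ii} for entries +1 or -1
    return coeff, comp


def double_hodge_sign(indices, g_diag):
    """Coefficient produced by applying the Hodge dual twice to a basis form."""
    c1, comp = hodge_basis(indices, g_diag)
    c2, back = hodge_basis(comp, g_diag)
    assert back == tuple(indices)
    return c1 * c2


def expected_sign(p, g_diag):
    """Right-hand side of eqn [30]: (-1)^{p(n-p)} (-1)^{Ind g}."""
    n = len(g_diag)
    ind_g = sum(1 for entry in g_diag if entry < 0)
    return (-1) ** (p * (n - p)) * (-1) ** ind_g
```

For Minkowski space, `g_diag = (1, -1, -1, -1)`, looping over `combinations(range(4), p)` for each p compares `double_hodge_sign` with `expected_sign` on every basis form.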

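The first term corresponds to a surface integration and we can neglect it. We then have $\delta^{jK}_I\,\beta^I = \beta^{jK}$ from the antisymmetry of $\beta$, so that

$$ (\alpha, d^\dagger\beta) = -\int \alpha_K\,\frac{\partial\beta^{jK}}{\partial x^j}\ {*1} = (\alpha, -\operatorname{Div}\beta) \tag{44} $$

The Maxwell Equations

The Maxwell equations become remarkably concise when expressed in terms of differential forms, namely

$$ dF = 0,\qquad d^\dagger F = -j \tag{45} $$

where F is the field strength and j is the current density. We wish to demonstrate this. We use a (3+1)-separation of the exterior derivative into a timelike and a spacelike part (with $\mathbf{d}$ the exterior derivative in the three space dimensions):

$$ d = \mathbf{d} + dt\wedge\frac{\partial}{\partial t} \tag{46} $$

We then get

$$ dF = \mathbf{d}E\wedge dt + \frac{\partial B}{\partial t}\wedge dt + \mathbf{d}B = 0 \tag{47} $$

By comparing coefficients, we arrive at

$$ \mathbf{d}E = -\frac{\partial B}{\partial t},\qquad \mathbf{d}B = 0 \tag{48} $$

In vector notation

$$ \operatorname{curl}\mathbf{E} = -\frac{\partial\mathbf{B}}{\partial t},\qquad \operatorname{div}\mathbf{B} = 0 \tag{49} $$

the usual form of the homogeneous Maxwell equations.

By direct application of the formula [27], one finds

$$ {*F} = -{\star B}\wedge dt + {\star E} \tag{50} $$

where $\star$ means the Hodge dual in three space dimensions. One finds

$$ d{*F} = \mathbf{d}{\star E} - \mathbf{d}{\star B}\wedge dt + dt\wedge\frac{\partial{\star E}}{\partial t} \tag{51} $$

Therefore, expressed through the vector components $E^x = -E_x$, $B^x = -B_x$, etc.,

$$ d{*F} = -\operatorname{div}\mathbf{E}\;dx\wedge dy\wedge dz + \left(\operatorname{curl}\mathbf{B}^x - \frac{\partial E^x}{\partial t}\right)dy\wedge dz\wedge dt + \left(\operatorname{curl}\mathbf{B}^y - \frac{\partial E^y}{\partial t}\right)dz\wedge dx\wedge dt + \left(\operatorname{curl}\mathbf{B}^z - \frac{\partial E^z}{\partial t}\right)dx\wedge dy\wedge dt \tag{52} $$

We apply again the Hodge dual:

$$ {*}\,d{*F} = -\operatorname{div}\mathbf{E}\;dt + \left(\operatorname{curl}\mathbf{B}^x - \frac{\partial E^x}{\partial t}\right)dx + \left(\operatorname{curl}\mathbf{B}^y - \frac{\partial E^y}{\partial t}\right)dy + \left(\operatorname{curl}\mathbf{B}^z - \frac{\partial E^z}{\partial t}\right)dz \tag{53} $$

In Minkowski space the expression ${*}\,d\,{*}$ equals the codifferential. Therefore, the equation $d^\dagger F = {*}\,d{*F} = -j$ holds, with j given by $j^\mu = (\rho, \mathbf{J})$, which is equivalent to

$$ \operatorname{div}\mathbf{E} = \rho,\qquad \operatorname{curl}\mathbf{B} - \frac{\partial\mathbf{E}}{\partial t} = \mathbf{J} \tag{54} $$

the inhomogeneous Maxwell equations.

Current Conservation

The electromagnetic 4-current is

$$ j^\mu = \rho_0 u^\mu = (\rho_0\gamma,\ \rho_0\gamma\mathbf{v}) = (\rho, \mathbf{J}) \tag{55} $$

where $\rho$ is the charge density and $\mathbf{J}$ the current density. This corresponds to a 1-form

$$ j = \rho\,dt - J^x\,dx - J^y\,dy - J^z\,dz \tag{56} $$

The Hodge dual is ${*j} = \rho_3 - j_2\wedge dt$, with the 3-form $\rho_3 = \rho\,dx\wedge dy\wedge dz$, and the 2-form

$$ j_2 = J^x\,dy\wedge dz + J^y\,dz\wedge dx + J^z\,dx\wedge dy \tag{57} $$

From the Maxwell equation $d^\dagger F = -j$, it follows that

$$ (d^\dagger)^2F = -d^\dagger j = 0 \tag{58} $$

that is

$$ d{*j} = d(\rho_3 - j_2\wedge dt) = d\rho_3 - dj_2\wedge dt = \left(\frac{\partial\rho}{\partial t} + \operatorname{div}\mathbf{J}\right)dt\wedge dx\wedge dy\wedge dz,\qquad \frac{\partial\rho}{\partial t} + \operatorname{div}\mathbf{J} = 0 \tag{59} $$

This is the continuity equation. The total charge inside a volume V is $Q = \int_V \rho\,dV$, therefore

$$ \frac{dQ}{dt} = \frac{d}{dt}\int_V \rho\,dV = -\oint_{\partial V}\mathbf{J}\cdot\mathbf{n}\,dS \tag{60} $$

where $\partial V$ is the surface which encloses the volume V, dS is the surface element, and $\mathbf{n}$ is the normal vector to this surface. This is current conservation.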

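The Gauge Potential

The Poincaré lemma tells us that $dF = 0$ implies $F = dA$, with the 4-potential A:

$$ A = \phi\,dt + \mathbf{A} \tag{61} $$

and the vector potential $\mathbf{A} = A_x\,dx + A_y\,dy + A_z\,dz$. From

$$ F = E\wedge dt + B = \left(\mathbf{d} + dt\wedge\frac{\partial}{\partial t}\right)(\phi\,dt + \mathbf{A}) = \mathbf{d}\phi\wedge dt + \mathbf{d}\mathbf{A} + dt\wedge\frac{\partial\mathbf{A}}{\partial t} \tag{62} $$

it follows by comparing coefficients that

$$ E = \mathbf{d}\phi - \frac{\partial\mathbf{A}}{\partial t},\qquad B = \mathbf{d}\mathbf{A} \tag{63} $$

In vector notation (again changing the signs of the spatial components, $E_x = -E^x$, $A_x = -A^x$) this is

$$ \mathbf{E} = -\operatorname{grad}\phi - \frac{\partial\mathbf{A}}{\partial t},\qquad \mathbf{B} = \operatorname{curl}\mathbf{A} \tag{64} $$

The 4-potential is determined up to a gauge function $\chi$:

$$ A' = A + d\chi \tag{65} $$

This gauge freedom has no influence on the observable quantities E and B:

$$ F' = dA' = dA + d^2\chi = dA = F \tag{66} $$

The Laplace operator is $\Delta = (d + d^\dagger)^2 = dd^\dagger + d^\dagger d$, so when the 4-potential A fulfills the condition $d^\dagger A = 0$, we have

$$ \Delta A = d^\dagger dA = d^\dagger F = -j \tag{67} $$

the classical wave equation. The condition $d^\dagger A = 0$ is called the Lorentz gauge condition. This condition can always be fulfilled by using the gauge freedom: $d^\dagger(A + d\chi) = 0$ is fulfilled when $d^\dagger d\chi = \Delta\chi = -d^\dagger A$, where we have used the fact that $d^\dagger\chi = 0$ for functions. That is to say, $d^\dagger A = 0$ is fulfilled when $\chi$ is a solution of the inhomogeneous wave equation.

Gauge Invariance

In quantum mechanics, the electron is described by a wave function which is determined up to a free phase. Indeed, at every point in space this phase can be chosen arbitrarily:

$$ \psi(x) \to \psi'(x) = \exp\{i\chi(x)\}\,\psi(x),\qquad \bar\psi(x) \to \bar\psi'(x) = \bar\psi(x)\,\exp\{-i\chi(x)\} \tag{68} $$

with the only condition being that $\chi(x)$ is a continuous function. The gauge transformation is of the form $g = \exp\{i\chi(x)\}$, with g an element of the abelian gauge group G = U(1). The free action is

$$ S_0 = \int \mathcal{L}_0\,d^4x \tag{69} $$

with

$$ \mathcal{L}_0 = \bar\psi\,(i\gamma^\mu\partial_\mu - m)\,\psi \tag{70} $$

the Lagrange density. This action is not invariant under gauge transformations:

$$ \mathcal{L}_0 \to \mathcal{L}_0' = \bar\psi\,\left(i\gamma^\mu\partial_\mu - m - \gamma^\mu(\partial_\mu\chi)\right)\psi \tag{71} $$

The undesired term can be compensated by the introduction of a gauge potential $\omega$ in a covariant derivative of $\psi$,

$$ D\psi = (d + \omega)\,\psi \tag{72} $$

which has the desired transformation property $D\psi \to \exp\{i\chi\}\,D\psi$ when besides the transformation $\psi(x) \to \exp\{i\chi(x)\}\,\psi(x)$ of the matter field the gauge potential simultaneously transforms according to the gauge transformation $\omega \to \omega - i\,d\chi$. The new Lagrange density is

$$ \mathcal{L} = \bar\psi\,(i\gamma^\mu D_\mu - m)\,\psi = \mathcal{L}_0 + i\,\omega_\mu(x)\,\bar\psi(x)\gamma^\mu\psi(x) \tag{73} $$

The substitution $\partial_\mu \to D_\mu$ is known to physicists; with $\omega_\mu = -iqA_\mu$ it is the ansatz of minimal coupling for taking into account electromagnetic effects: $\partial_\mu \to \partial_\mu - iqA_\mu$. The Lagrange density becomes in this notation $\mathcal{L} = \mathcal{L}_0 - A_\mu J^\mu$, where $J^\mu = -q\,\bar\psi\gamma^\mu\psi$.

The Lagrange density must now be completed by a kinetic term for the gauge potential and we get the complete electromagnetic Lagrange density

$$ \mathcal{L} = \mathcal{L}_0 - A_\mu J^\mu - \tfrac14 F_{\mu\nu}F^{\mu\nu} \tag{74} $$

with $F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$. In the action this corresponds to

$$ S = S_0 - \int_M A_\mu J^\mu\,\mathrm{vol}^4 - \frac14\int_M F_{\mu\nu}F^{\mu\nu}\,\mathrm{vol}^4 \tag{75} $$

We get the field equations for the potential A by demanding that the variation of the action vanishes:

$$ \delta S[A] = -\delta\int_M A_\mu J^\mu\,\mathrm{vol}^4 - \frac14\,\delta\int_M F_{\mu\nu}F^{\mu\nu}\,\mathrm{vol}^4 = 0 \tag{76} $$

We write now

$$ \int_M A_\mu J^\mu\,\mathrm{vol}^4 = (A, j) \tag{77} $$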
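The transformation property stated for the covariant derivative in eqn [72] can be written out in three lines. This is a sketch of the intermediate step only, using nothing beyond $D = d + \omega$, $\psi \to e^{i\chi}\psi$ and $\omega \to \omega - i\,d\chi$ as given in the text:

```latex
\begin{aligned}
D'\psi' &= (d + \omega - i\,d\chi)\,\bigl(e^{i\chi}\psi\bigr) \\
        &= e^{i\chi}\,\bigl(d\psi + i\,d\chi\,\psi\bigr) + e^{i\chi}\,\omega\,\psi - i\,d\chi\;e^{i\chi}\psi \\
        &= e^{i\chi}\,(d + \omega)\,\psi \;=\; e^{i\chi}\,D\psi .
\end{aligned}
```

The term $i\,d\chi\,\psi$ produced by differentiating the phase is exactly the one cancelled by the shift of $\omega$, which is why the inhomogeneous transformation law of the potential is needed.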

and

$$ \frac14\,\delta\int_M F_{\mu\nu}F^{\mu\nu}\,\mathrm{vol}^4 = \frac12\,\delta\int_M F\wedge{*F} = \frac12\,\delta(F, F) = (\delta dA, F) = (d\,\delta A, F) = (\delta A, d^\dagger F) \tag{78} $$

where we have exchanged the action of $\delta$ and d. Since this holds for arbitrary variations $\delta A$ we find

$$ d^\dagger F = -j \tag{79} $$

the inhomogeneous Maxwell equation.

Nonabelian Gauge Theories

In SU(N) gauge theory the elementary particles are taken to be members of symmetry multiplets. For example, in electroweak theory the left-handed electron and the neutrino are members of an SU(2) doublet:

$$ \psi = \begin{pmatrix} e \\ \nu \end{pmatrix} \tag{80} $$

A gauge transformation is conventionally written

$$ \psi'(x) = g^{-1}(x)\,\psi(x),\qquad \bar\psi'(x) = \bar\psi(x)\,g(x) \tag{81} $$

with

$$ g(x) = \exp\{\chi(x)\} \tag{82} $$

where g(x) is an element of the Lie group SU(2) and $\chi$ is an element of the Lie algebra su(2). The Lie algebra is a vector space, and its elements may be expanded in terms of a basis:

$$ \chi(x) = \chi^a(x)\,T_a \tag{83} $$

For su(2) the basis elements are traceless and anti-Hermitian (see below); they are conventionally expressed in terms of the Pauli matrices,

$$ T_a = \frac{\sigma_a}{2i} \tag{84} $$

with

$$ \sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},\quad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix},\quad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \tag{85} $$

They are conventionally normalized according to

$$ \operatorname{tr}(T_aT_b) = -\tfrac12\,\delta_{ab} \tag{86} $$

The Dirac Lagrangian is not invariant with respect to local gauge transformations:

$$ \mathcal{L}_0 = \bar\psi\,(i\gamma^\mu\partial_\mu - m)\,\psi \to \mathcal{L}_0' = \mathcal{L}_0 + i\,\bar\psi\gamma^\mu\left(g\,\partial_\mu g^{-1}\right)\psi \tag{87} $$

We introduce the gauge potential

$$ \omega_\mu(x) = \omega_\mu^a(x)\,T_a \tag{88} $$

with a gauge transformation

$$ \omega_\mu \to \omega_\mu' = g^{-1}\omega_\mu\,g + g^{-1}\partial_\mu g \tag{89} $$

The Lagrange density is modified through a covariant derivative:

$$ \partial_\mu \to D_\mu = \partial_\mu + \omega_\mu \tag{90} $$

The covariant derivative $D_\mu$ transforms according to

$$ D_\mu \to D_\mu' = g^{-1}D_\mu\,g \tag{91} $$

and thus the modified Lagrange density

$$ \mathcal{L} = \bar\psi\,(i\gamma^\mu D_\mu - m)\,\psi = \mathcal{L}_0 + i\,\bar\psi\gamma^\mu\omega_\mu\psi \tag{92} $$

is invariant with respect to local gauge transformations.

The extra term in the Lagrange density is

$$ J^\mu_a\,A^a_\mu \tag{93} $$

with

$$ A^a_\mu = iq\,\omega^a_\mu \tag{94} $$

and

$$ J^\mu_a = \bar\psi\gamma^\mu T_a\psi \tag{95} $$

In mathematical terminology $\omega$ is called a connection. The quantity A is the physicist's gauge potential. The connection is anti-Hermitian and the gauge potential Hermitian. The gauge potential also includes the coupling constant q. We will refer to both $\omega$ and A as the gauge potential, where the relation between them is given by eqn [94].

We can write the gauge potential as $A = A^a_\mu\,dx^\mu\,T_a$ or, in the SU(2) case, as

$$ A = A^1T_1 + A^2T_2 + A^3T_3 \tag{96} $$

where we see explicitly that it involves three vector fields, which couple to the electroweak currents [95] with the single coupling constant q, and which will become after symmetry breaking the three vector bosons $W^+$, $W^-$, $Z^0$ of the electroweak gauge theory. Actually, a mix of the neutral gauge boson and the photon will combine to yield the $Z^0$ boson, while the orthogonal mixture gives rise to the electromagnetic interaction, in an SU(2) × U(1) theory. At this stage,
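The algebraic properties quoted for the su(2) generators of eqns [84] to [86] are easy to confirm by direct matrix arithmetic. The sketch below uses plain nested lists for 2 × 2 complex matrices; the helper names are hypothetical, and the commutation relation it exposes, $[T_1, T_2] = T_3$ and cyclic, is the standard one implied by the Pauli matrices rather than something stated explicitly in the text.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def trace(a):
    return a[0][0] + a[1][1]


def dagger(a):
    """Hermitian conjugate of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]


# Pauli matrices of eqn [85]
SIGMA = [
    [[0j, 1 + 0j], [1 + 0j, 0j]],
    [[0j, -1j], [1j, 0j]],
    [[1 + 0j, 0j], [0j, -1 + 0j]],
]

# Generators T_a = sigma_a / (2i) of eqn [84]
T = [[[entry / 2j for entry in row] for row in s] for s in SIGMA]


def gram_matrix():
    """Matrix of traces tr(T_a T_b); eqn [86] predicts -1/2 times the identity."""
    return [[trace(matmul(T[a], T[b])) for b in range(3)] for a in range(3)]
```

Calling `gram_matrix()` returns the 3 × 3 array of traces, and `commutator(T[0], T[1])` can be compared entry by entry with `T[2]`.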

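the gauge bosons are all massless; their masses are generated by the Higgs mechanism.

Lie-Algebra-Valued p-Forms

To describe nonabelian fields, we need Lie-algebra-valued p-forms:

$$ \varphi = T_a\,\varphi^a \tag{97} $$

where $T_a$ is a generator of the Lie algebra, the index a runs over the number of generators of the Lie algebra, and the $\varphi^a$ are the usual scalar-valued p-forms. The composition in a Lie algebra is a Lie bracket, which is defined for two Lie-algebra-valued p-forms by

$$ [\varphi, \psi] := [T_a, T_b]\,\varphi^a\wedge\psi^b \tag{98} $$

The Lie bracket in the algebra is

$$ [T_a, T_b] = f^c_{ab}\,T_c \tag{99} $$

where $f^a_{bc}$ are the structure constants. It follows from this that

$$ [\varphi, \psi] = [T_a, T_b]\,\varphi^a\wedge\psi^b = -[T_b, T_a]\,\varphi^a\wedge\psi^b \tag{100} $$

or

$$ [\varphi, \psi] = (-1)^{pq+1}\,[\psi, \varphi] \tag{101} $$

when $\varphi$ is a p-form and $\psi$ is a q-form. In the special case that $T_a$ is a matrix, also the product $T_aT_b$ is defined, and from this the product of two Lie-algebra-valued p-forms

$$ \varphi\wedge\psi = T_a\varphi^a\wedge T_b\psi^b = T_aT_b\,\varphi^a\wedge\psi^b \tag{102} $$

Now the Lie bracket is a commutator:

$$ [T_a, T_b] = T_aT_b - T_bT_a \tag{103} $$

and

$$ [\varphi, \psi] = [T_a, T_b]\,\varphi^a\wedge\psi^b = T_a\varphi^a\wedge T_b\psi^b - (-1)^{pq}\,T_b\psi^b\wedge T_a\varphi^a = \varphi\wedge\psi - (-1)^{pq}\,\psi\wedge\varphi \tag{104} $$

From this relation it follows that for $\varphi$ and $\psi$ odd p-forms

$$ [\varphi, \psi] = \varphi\wedge\psi + \psi\wedge\varphi \tag{105} $$

For an odd p-form

$$ [\varphi, \varphi] = \varphi\wedge\varphi + \varphi\wedge\varphi = 2\,\varphi\wedge\varphi \tag{106} $$

The Gauge Potential and the Field Strength

The generalization of the abelian relationship between the gauge potential and the field strength, $F = dA$, is

$$ \Omega = d\omega + \tfrac12[\omega, \omega] = d\omega + \omega\wedge\omega \tag{107} $$

where because $\omega$ is a 1-form we can use eqn [106]. The mathematician refers to $\Omega$ as the curvature. The physicist writes, in analogy to eqn [94],

$$ F = iq\,\Omega = \tfrac12\,F^a_{\mu\nu}\,dx^\mu\wedge dx^\nu\,T_a \tag{108} $$

One obtains for the components

$$ F^a_{\mu\nu} = \partial_\mu A^a_\nu - \partial_\nu A^a_\mu - iq\,f^a_{bc}\,A^b_\mu A^c_\nu \tag{109} $$

A generalization of the gauge transformation of A, that is, $A' = A + d\chi$, is eqn [89]:

$$ \omega' = g^{-1}\omega\,g + g^{-1}dg \tag{110} $$

A quantity with the transformation property

$$ \varphi' = g^{-1}\varphi\,g \tag{111} $$

is called a tensorial quantity. The gauge potential $\omega$ is according to this definition nontensorial. Nevertheless the field strength is tensorial. Indeed

$$ \begin{aligned} \Omega' &= d(g^{-1}\omega g) + dg^{-1}\wedge dg + \tfrac12\left[g^{-1}\omega g + g^{-1}dg,\ g^{-1}\omega g + g^{-1}dg\right] \\ &= dg^{-1}\wedge\omega g + g^{-1}d\omega\,g - g^{-1}\omega\wedge dg + dg^{-1}\wedge dg \\ &\quad + \tfrac12\,g^{-1}[\omega, \omega]\,g + \tfrac12\left[g^{-1}\omega g,\ g^{-1}dg\right] + \tfrac12\left[g^{-1}dg,\ g^{-1}\omega g\right] + \tfrac12\left[g^{-1}dg,\ g^{-1}dg\right] \\ &= g^{-1}\Omega\,g + dg^{-1}\wedge\omega g - g^{-1}\omega\wedge dg + dg^{-1}\wedge dg \\ &\quad + g^{-1}\omega\wedge dg + g^{-1}dg\wedge g^{-1}\omega g + g^{-1}dg\wedge g^{-1}dg \\ &= g^{-1}\Omega\,g \end{aligned} \tag{112} $$

where we have used the derivation of the relation $g^{-1}g = \mathrm{Id}$ to get

$$ dg^{-1} = -g^{-1}\,dg\,g^{-1} \tag{113} $$

In the abelian case, we had $dF = 0$. The nonabelian analog is

$$ d\Omega = d\omega\wedge\omega - \omega\wedge d\omega = (\Omega - \omega\wedge\omega)\wedge\omega - \omega\wedge(\Omega - \omega\wedge\omega) = \Omega\wedge\omega - \omega\wedge\Omega \tag{114} $$

or

$$ d\Omega + \omega\wedge\Omega - \Omega\wedge\omega = 0 \tag{115} $$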
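For orientation, the curvature of eqn [107] can be written out in components. This is a sketch of the intermediate step, assuming only $\omega = \omega_\mu\,dx^\mu$ with matrix-valued coefficients and the antisymmetry of the wedge product:

```latex
\begin{aligned}
d\omega &= \partial_\mu\omega_\nu\,dx^\mu\wedge dx^\nu
         = \tfrac12\,(\partial_\mu\omega_\nu - \partial_\nu\omega_\mu)\,dx^\mu\wedge dx^\nu , \\
\omega\wedge\omega &= \omega_\mu\omega_\nu\,dx^\mu\wedge dx^\nu
         = \tfrac12\,[\omega_\mu, \omega_\nu]\,dx^\mu\wedge dx^\nu , \\
\Omega &= \tfrac12\,\Omega_{\mu\nu}\,dx^\mu\wedge dx^\nu ,
\qquad
\Omega_{\mu\nu} = \partial_\mu\omega_\nu - \partial_\nu\omega_\mu + [\omega_\mu, \omega_\nu] .
\end{aligned}
```

In the abelian case the commutator vanishes and the expression reduces to the familiar antisymmetrized derivative of the potential.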

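the Bianchi identity. It can also be written as

$$ d\Omega + \omega\wedge\Omega - \Omega\wedge\omega = d\Omega + [\omega, \Omega] = 0 \tag{116} $$

because from eqn [104]

$$ \omega\wedge\Omega - (-1)^{2\cdot1}\,\Omega\wedge\omega = [\omega, \Omega] \tag{117} $$

The covariant derivative $D\varphi$ is defined as

$$ D\varphi := d\varphi + [\omega, \varphi] \tag{118} $$

for $\varphi$ a tensorial quantity. The covariant derivative takes tensorial p-forms into tensorial $(p+1)$-forms:

$$ \begin{aligned} D'\varphi' &= d(g^{-1}\varphi g) + \left[g^{-1}\omega g + g^{-1}dg,\ g^{-1}\varphi g\right] \\ &= dg^{-1}\wedge\varphi g + g^{-1}d\varphi\,g + (-1)^p\,g^{-1}\varphi\wedge dg + \left[g^{-1}\omega g,\ g^{-1}\varphi g\right] + \left[g^{-1}dg,\ g^{-1}\varphi g\right] \\ &= g^{-1}(D\varphi)\,g + dg^{-1}\wedge\varphi g + (-1)^p\,g^{-1}\varphi\wedge dg + g^{-1}dg\,g^{-1}\wedge\varphi g - (-1)^p\,g^{-1}\varphi\wedge dg \\ &= g^{-1}(D\varphi)\,g \end{aligned} \tag{119} $$

We have thereby verified the transformation property of eqn [91].

The Gauge Group

From the gauge transformation $\psi' = g\psi$ the requirement $|\psi'|^2 = |\psi|^2$ leads to $g^\dagger g = 1$. That means that g belongs to the unitary Lie group G = U(n), whose elements fulfill $g^\dagger = \bar g^T = g^{-1}$. For elements of the Lie algebra $\mathfrak{g}$ = u(n) this implies

$$ \left(e^X\right)^\dagger = e^{\bar X^T} = e^{-X} \tag{120} $$

or

$$ X^\dagger = \bar X^T = -X \tag{121} $$

where $\bar X$ is complex conjugation and $X^T$ means transposition.

For elements of the Lie algebra we can define a scalar product (the Killing metric)

$$ \langle X, Y\rangle := -\operatorname{tr}(XY) = -X_{\alpha\beta}\,Y_{\beta\alpha} \tag{122} $$

The scalar product is real:

$$ \overline{\langle X, Y\rangle} = -\bar X_{\alpha\beta}\,\bar Y_{\beta\alpha} = -X_{\beta\alpha}\,Y_{\alpha\beta} = \langle X, Y\rangle \tag{123} $$

symmetric:

$$ \langle X, Y\rangle = -\operatorname{tr}(XY) = -\operatorname{tr}(YX) = \langle Y, X\rangle \tag{124} $$

and positive definite:

$$ \langle X, X\rangle = -X_{\alpha\beta}\,X_{\beta\alpha} = \bar X_{\beta\alpha}\,X_{\beta\alpha} = |X_{\beta\alpha}|^2 \tag{125} $$

The scalar product is invariant under the action of G on $\mathfrak{g}$: for $g \in G$

$$ \langle gXg^{-1},\ gYg^{-1}\rangle = -\operatorname{tr}\left(gXYg^{-1}\right) = -\operatorname{tr}(XY) = \langle X, Y\rangle \tag{126} $$

or for $X, Y, Z \in \mathfrak{g}$

$$ \langle e^{tX}Y e^{-tX},\ e^{tX}Z e^{-tX}\rangle = \langle Y, Z\rangle \tag{127} $$

We take the derivative of this equation with respect to t at the value t = 0 and get:

$$ \langle [X, Y], Z\rangle + \langle Y, [X, Z]\rangle = 0 \tag{128} $$

We define an action of the algebra $\mathfrak{g}$ on itself, $\operatorname{ad}(X): \mathfrak{g} \to \mathfrak{g}$:

$$ \operatorname{ad}(X)\,Y = [X, Y] \tag{129} $$

We can then formulate our conclusion as follows: the action of $\mathfrak{g}$ on itself is anti-Hermitian:

$$ \langle \operatorname{ad}(X)\,Y, Z\rangle = -\langle Y, \operatorname{ad}(X)\,Z\rangle \tag{130} $$

or

$$ \operatorname{ad}(X)^\dagger = -\operatorname{ad}(X) \tag{131} $$

From $g^\dagger g = 1$ we have $|\det(g)|^2 = 1$. For the gauge group G = SU(N) we require in addition $\det(g) = 1$. Since

$$ \det(g) = \det(\exp X) = \exp(\operatorname{tr} X) \tag{132} $$

the elements $X \in$ su(N) must be traceless. A basis of the vector space of traceless, anti-Hermitian (2 × 2) matrices is given by the matrices $T_a = \sigma_a/2i$ built from the Pauli matrices, eqns [84] and [85].

The Yang-Mills Action

The SU(2) Yang-Mills action is, in analogy to the abelian case,

$$ S = -\frac{1}{4q^2}\int_M F^a_{\mu\nu}F_a^{\mu\nu}\,\mathrm{vol}^4 = \frac{1}{2q^2}\int_M \operatorname{tr}\left(F_{\mu\nu}F^{\mu\nu}\right)\mathrm{vol}^4 = \frac{1}{2q^2}\int_M \operatorname{tr}\left(F\wedge{*F}\right) \tag{133} $$

We have included the trace in our definition of the scalar product:

$$ (\varphi, \psi) := -\int_M \operatorname{tr}\left(\varphi_I\,\psi^I\right)\mathrm{vol}^n = -\int_M \operatorname{tr}\left(\varphi\wedge{*\psi}\right) \tag{134} $$

We then write eqn [133] as

$$ S[\omega] = \tfrac12\,(\Omega, \Omega) \tag{135} $$

taking into account the relation between $\Omega$ and the field strength F, and indicating the dependence on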
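The invariance property [128] and the positivity [125] of the scalar product can be spot-checked on su(2). The following is a minimal sketch, assuming the generators $T_a = \sigma_a/2i$ of eqn [84] and real expansion coefficients; all function names are illustrative.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]


def su2_element(x1, x2, x3):
    """X = x^a T_a with T_a = sigma_a / (2i), written out explicitly."""
    return [[-0.5j * x3, -0.5j * x1 - 0.5 * x2],
            [-0.5j * x1 + 0.5 * x2, 0.5j * x3]]


def killing(x, y):
    """Scalar product <X, Y> = -tr(XY) of eqn [122]; real for anti-Hermitian X, Y."""
    xy = matmul(x, y)
    return (-(xy[0][0] + xy[1][1])).real


def ad_invariance_defect(x, y, z):
    """Left-hand side of eqn [128]; it should vanish up to rounding."""
    return killing(commutator(x, y), z) + killing(y, commutator(x, z))
```

With the normalization [86], `killing(X, X)` equals one half of the sum of the squared coefficients, which makes the positivity explicit.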

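the gauge potential. Since $\Omega$ is tensorial the action is invariant.

Now we calculate the variation of $S[\omega]$ with respect to a variation of the gauge potential:

$$ \begin{aligned} \delta S[\omega] &= \frac{d}{dt}\,S[\omega(t)]\Big|_{t=0} = \tfrac12\,\delta(\Omega, \Omega) = (\delta\Omega, \Omega) \\ &= \left(\delta\left(d\omega + \tfrac12[\omega, \omega]\right),\ \Omega\right) \\ &= \left(d\,\delta\omega + \tfrac12[\delta\omega, \omega] + \tfrac12[\omega, \delta\omega],\ \Omega\right) \\ &= \left(d\,\delta\omega + [\omega, \delta\omega],\ \Omega\right) \end{aligned} \tag{136} $$

where we have exchanged the order of $\delta$ and d. We remark that although $\omega$ is not a tensorial section, $\delta\omega$ is: for $\omega_1' = g^{-1}\omega_1 g + g^{-1}dg$ and $\omega_2' = g^{-1}\omega_2 g + g^{-1}dg$ we have

$$ \delta\omega' = \omega_1' - \omega_2' = g^{-1}(\omega_1 - \omega_2)\,g \tag{137} $$

The quantity $\Omega$ is in any case tensorial. Therefore, the covariant derivative is defined, and we have

$$ D\,\delta\omega = d\,\delta\omega + [\omega, \delta\omega] \tag{138} $$

and

$$ D\Omega = d\Omega + [\omega, \Omega] \tag{139} $$

In general, the action of the covariant derivative on tensorial quantities can be written as $D = d + \operatorname{ad}(\omega)$, where ad(X) is the representation of the Lie algebra on itself introduced in the previous section. We now have

$$ \delta S[\omega] = (D\,\delta\omega, \Omega) = (\delta\omega, D^\dagger\Omega) = 0 \tag{140} $$

for an arbitrary variation $\delta\omega$. Therefore, $D^\dagger\Omega = 0$. We have obtained

$$ D^\dagger\Omega = 0 \tag{141} $$

the Yang-Mills equations, and

$$ D\Omega = 0 \tag{142} $$

the Bianchi identities. These are the generalizations of the Maxwell equations $d^\dagger F = 0$ and $dF = 0$ in the absence of external sources. For the general case of interacting fermions, we write out the full action, in analogy to eqn [74], and obtain, in analogy to eqns [79] and [58],

$$ D^\dagger\Omega = J,\qquad D^\dagger J = 0 \tag{143} $$

We shall now derive, again for the pure gauge sector, coordinate expressions for the Yang-Mills equations. Consider the expression

$$ \delta S[\omega] = (D\,\delta\omega, \Omega) = (\delta\omega, D^\dagger\Omega) = \left(d\,\delta\omega + [\omega, \delta\omega],\ \Omega\right) \tag{144} $$

The first term in the last expression is

$$ (d\,\delta\omega, \Omega) = (\delta\omega, d^\dagger\Omega) = -\int_M \operatorname{tr}\left(\delta\omega_\nu\,\{d^\dagger\Omega\}^\nu\right)\mathrm{vol}^4 \tag{145} $$

The second term can be computed using

$$ [\omega, \delta\omega]_{\mu\nu} = \{\omega\wedge\delta\omega + \delta\omega\wedge\omega\}(\partial_\mu, \partial_\nu) = \omega_\mu\,\delta\omega_\nu - \omega_\nu\,\delta\omega_\mu + \delta\omega_\mu\,\omega_\nu - \delta\omega_\nu\,\omega_\mu \tag{146} $$

and hence

$$ [\omega, \delta\omega]_{\mu\nu}\,\Omega^{\mu\nu} = 2\,[\omega_\mu, \delta\omega_\nu]\,\Omega^{\mu\nu} \tag{147} $$

because $\Omega$ is antisymmetric, $\Omega^{\mu\nu} = -\Omega^{\nu\mu}$. Thus,

$$ \begin{aligned} ([\omega, \delta\omega], \Omega) &= -\int_M \operatorname{tr}\left([\omega, \delta\omega]\wedge{*\Omega}\right) = -\frac12\int_M \operatorname{tr}\left([\omega, \delta\omega]_{\mu\nu}\,\Omega^{\mu\nu}\right)\mathrm{vol}^4 \\ &= -\int_M \operatorname{tr}\left([\omega_\mu, \delta\omega_\nu]\,\Omega^{\mu\nu}\right)\mathrm{vol}^4 = \int_M \langle[\omega_\mu, \delta\omega_\nu],\ \Omega^{\mu\nu}\rangle\,\mathrm{vol}^4 \end{aligned} \tag{148} $$

where $\langle\ ,\ \rangle$ is the scalar product in $\mathfrak{g}$. From eqn [128] this equals

$$ -\int_M \langle\delta\omega_\nu,\ [\omega_\mu, \Omega^{\mu\nu}]\rangle\,\mathrm{vol}^4 = \int_M \operatorname{tr}\left(\delta\omega_\nu\,[\omega_\mu, \Omega^{\mu\nu}]\right)\mathrm{vol}^4 \tag{149} $$

Combining this with eqn [144] gives

$$ (\delta\omega, D^\dagger\Omega) = -\int_M \operatorname{tr}\left(\delta\omega_\nu\left\{\{d^\dagger\Omega\}^\nu - [\omega_\mu, \Omega^{\mu\nu}]\right\}\right)\mathrm{vol}^4 = \left(\delta\omega,\ \left\{d^\dagger\Omega - [\omega_\mu, \Omega^{\mu\nu}]\right\}\right) \tag{150} $$

We can now insert the coordinate expression for $d^\dagger\Omega$:

$$ \{d^\dagger\Omega\}^\nu = -\partial_\mu\Omega^{\mu\nu} \tag{151} $$

Finally, the coordinate expressions of the Yang-Mills equations $D^\dagger\Omega = 0$ are

$$ \{D^\dagger\Omega\}^\nu = -\left\{\partial_\mu\Omega^{\mu\nu} + [\omega_\mu, \Omega^{\mu\nu}]\right\} = 0 \tag{152} $$

The Analogy with Electromagnetism

The Yang-Mills equation and the Bianchi identity in the absence of external sources are

$$ \partial_\mu F^{\mu\nu} - iq\,[A_\mu, F^{\mu\nu}] = 0 \tag{153} $$

and

$$ \partial_\mu F_{\nu\rho} + \partial_\nu F_{\rho\mu} + \partial_\rho F_{\mu\nu} - iq\left\{[A_\mu, F_{\nu\rho}] + [A_\nu, F_{\rho\mu}] + [A_\rho, F_{\mu\nu}]\right\} = 0 \tag{154} $$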

We shall write these equations in terms of the fields

$$ F^{i0} = E^i,\qquad i = 1, 2, 3 \tag{155} $$

$$ F^{12} = -B^3,\qquad F^{31} = -B^2,\qquad F^{23} = -B^1 \tag{156} $$

where the E and B vectors may be thought of as electric and magnetic fields, even though they have Lie-algebra indices, $F^{i0} = (F^a)^{i0}\,T_a$, etc. In the context of the SU(3) theory, they are referred to as the chromoelectric and chromomagnetic fields, respectively.

The Yang-Mills equations with $\nu = 0$ are

$$ \partial_i F^{i0} - iq\,[A_i, F^{i0}] = 0 \tag{157} $$

with i = 1, 2, 3 a spatial index. In vector notation this is

$$ \operatorname{div}\mathbf{E} = iq\,(\mathbf{A}\cdot\mathbf{E} - \mathbf{E}\cdot\mathbf{A}) \tag{158} $$

This is the analog of Gauss's equation. Even though we started out without external sources, $iq\,(\mathbf{A}\cdot\mathbf{E} - \mathbf{E}\cdot\mathbf{A})$ plays the role of a charge density. The Yang-Mills field E and the potential A combine to act as a source for the Yang-Mills field. This is an essential feature of nonabelian gauge theories in which they differ from the abelian case, due to the fact that the commutator [A, E] is nonvanishing.

Now consider the Yang-Mills equations with a spatial index $\nu = i$:

$$ \partial_0 F^{i0} + \partial_j F^{ij} - iq\,[A_0, F^{i0}] - iq\,[A_j, F^{ij}] = 0 \tag{159} $$

In vector notation this is

$$ \operatorname{curl}\mathbf{B} - \frac{\partial\mathbf{E}}{\partial t} = -iq\,(A_0\mathbf{E} - \mathbf{E}A_0) + iq\,(\mathbf{A}\times\mathbf{B} + \mathbf{B}\times\mathbf{A}) \tag{160} $$

replacing the Ampère-Maxwell law. Note that there are two extra contributions to the current other than the displacement current.

The analogs of the laws of Faraday and of the absence of magnetic monopoles are derived similarly from the Bianchi identities. The results are

$$ \operatorname{curl}\mathbf{E} + \frac{\partial\mathbf{B}}{\partial t} = iq\,\{\mathbf{A}\times\mathbf{E} + \mathbf{E}\times\mathbf{A} + A_0\mathbf{B} - \mathbf{B}A_0\} \tag{161} $$

and

$$ \operatorname{div}\mathbf{B} = iq\,(\mathbf{A}\cdot\mathbf{B} - \mathbf{B}\cdot\mathbf{A}) \tag{162} $$

Further Remarks

The foundations of the mathematics of differential forms were laid down by Poincaré (1953). They were applied to the description of electrodynamics already by Cartan (1923). A modern presentation of differential forms and the manifolds on which they are defined is given in Abraham et al. (1983). A recent treatment of electrodynamics in this approach is Hehl and Obukhov (2003). Weyl's argument is in his paper of 1929.

Nonabelian gauge theories today explain the electromagnetic, the strong and weak nuclear interactions. The original paper is that of Yang and Mills (1954). Glashow, Salam, and Weinberg (1980) saw the way to apply it to the weak interactions by using spontaneous symmetry breaking to generate the masses through the use of the Higgs (1964) mechanism. 't Hooft and Veltman (1972) showed that the resulting quantum field theory was renormalizable. The strong interactions were recognized as the nonabelian gauge theory with gauge group SU(3) by Gell-Mann (1972). For a modern treatment which puts nonabelian gauge theories in the context of differential geometry, see Frankel (1987).

See also: Dirac Fields in Gravitation and Nonabelian Gauge Theory; Electroweak Theory; Measure on Loop Spaces; Nonperturbative and Topological Aspects of Gauge Theory; Quantum Electrodynamics and its Precision Tests.

Further Reading

Abraham R, Marsden J, and Ratiu T (1983) Manifolds, Tensor Analysis, and Applications. Reading, MA: Addison-Wesley.
Cartan E (1923) On Manifolds with an Affine Connection and the Theory of General Relativity. English translation of the French original 1923/1924 (Bibliopolis, Napoli 1986).
Frankel T (1987) The Geometry of Physics, An Introduction. Cambridge University Press.
Gell-Mann M (1972) Quarks: developments in the quark theory of hadrons. Acta Physica Austriaca Suppl. IV: 733.
Glashow SL (1980) Towards a unified theory: threads in a tapestry. Reviews of Modern Physics 52: 539.
Hehl FW and Obukhov YN (2003) Foundations of Classical Electrodynamics. Boston: Birkhäuser.
Higgs PW (1964) Broken symmetries and the masses of gauge bosons. Physical Review Letters 13: 508.
't Hooft G and Veltman M (1972) Regularization and renormalization of gauge fields. Nuclear Physics B 44: 189.
Poincaré H (1953) Oeuvres. Paris: Gauthier-Villars.
Salam A (1980) Gauge unification of fundamental forces. Reviews of Modern Physics 52: 525.
Weinberg S (1980) Conceptual foundations of the unified theory of weak and electromagnetic interactions. Reviews of Modern Physics 52: 515.
Weyl H (1929) Elektron und Gravitation. Zeitschrift für Physik 56: 330.
Yang CN and Mills RL (1954) Conservation of isotopic spin and isotopic gauge invariance. Physical Review 96: 191.

Abelian Higgs Vortices


J M Speight, University of Leeds, Leeds, UK
© 2006 Elsevier Ltd. All rights reserved.

Introduction

For the purpose of this article, vortices are topological solitons arising in field theories in (2+1)-dimensional spacetime when a complex-valued field $\phi$ is allowed to acquire winding at infinity, meaning that the phase of $\phi(t, \mathbf{x})$, as $\mathbf{x}$ traverses a large circle in the spatial plane, changes by $2\pi n$, where n is a nonzero integer. Such winding cannot be removed by any continuous deformation of $\phi$ (hence "topological") and traps a considerable amount of energy which tends to coalesce into smooth, stable lumps with highly particle-like characteristics (hence "solitons"). Clearly, the universe is (3+1) dimensional. Nonetheless, planar field theories are of physical interest for two main reasons. First, the theory may arise by dimensional reduction of a (3+1)-dimensional model under the assumption of translation invariance in one direction. Vortices are then transverse slices through straight tube-like objects variously interpreted as magnetic flux tubes in a superconductor or cosmic strings. Second, a crucial ingredient of the standard model of particle physics is spontaneous breaking of gauge symmetry by a Higgs field. As well as endowing the fundamental gauge bosons and chiral fermions with mass, this mechanism can potentially generate various types of topological solitons (monopoles, strings, and domain walls) whose structure and interactions one would like to understand. Vortices in (2+1) dimensions are interesting in this regard because they arise in the simplest field theory exhibiting the Higgs mechanism, the abelian Higgs model (AHM). They are thus a useful theoretical laboratory in which to test ideas which may ultimately find application in more realistic theories.

This article describes the properties of abelian Higgs vortices and explains how, using a mixture of numerical and analytical techniques, a good understanding of their dynamical interactions has been obtained.

The Abelian Higgs Model

Throughout this article spacetime will be $\mathbb{R}^{2+1}$ endowed with the Minkowski metric with signature (+, −, −), and Cartesian coordinates $x^\mu$, $\mu = 0, 1, 2$, with $x^0 = t$ (the speed of light c = 1). A spacetime point will be denoted x, its spatial part by $\mathbf{x} = (x^1, x^2)$. Latin indices j, k, ... range over 1, 2, and repeated indices (Latin or Greek) are summed over. We sometimes use polar coordinates in the spatial plane, $\mathbf{x} = r(\cos\theta, \sin\theta)$, and sometimes a complex coordinate $z = x^1 + ix^2 = re^{i\theta}$. Occasionally, it is convenient to think of $\mathbb{R}^{2+1}$ as a subspace of $\mathbb{R}^{3+1}$ and denote by $\mathbf{k}$ the unit vector in the (fictitious) third spatial direction. The complex scalar Higgs field is denoted $\phi$, and the electromagnetic gauge potential $A_\mu$, best thought of as the components of a 1-form $A = A_\mu\,dx^\mu$. $F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu$ is the field strength tensor which, in $\mathbb{R}^{2+1}$, has only three independent components, identified with the magnetic field $B = F_{12}$ and electric field $(E_1, E_2) = (F_{01}, F_{02})$. The gauge-covariant derivative is $D_\mu\phi = \partial_\mu\phi - ieA_\mu\phi$, e being the electric charge of the Higgs. Under a U(1) gauge transformation,

$$ \phi \mapsto e^{i\chi}\phi,\qquad A_\mu \mapsto A_\mu + e^{-1}\partial_\mu\chi \tag{1} $$

$\chi: \mathbb{R}^{2+1} \to \mathbb{R}$ being any smooth function, $F_{\mu\nu}$ and $|\phi|$ remain invariant, while $D_\mu\phi \mapsto e^{i\chi}D_\mu\phi$. Only gauge-invariant quantities are physically observable (classically).

With these conventions, the AHM has Lagrangian density

$$ \mathcal{L} = -\frac14\,F_{\mu\nu}F^{\mu\nu} + \frac{\gamma}{2}\,D_\mu\phi\,\overline{D^\mu\phi} - \frac{\lambda}{8}\left(\mu^2 - |\phi|^2\right)^2 \tag{2} $$

which is manifestly gauge invariant. By rescaling $\phi$, $A_\mu$, x and the unit of action, we can (and henceforth will) assume that $e = \gamma = \mu = 1$. The only parameter which cannot be scaled away is $\lambda > 0$. Its value greatly influences the model's behavior.

The field equations, obtained by demanding that $\phi(x)$, $A_\mu(x)$ be a local extremal of the action $S = \int\mathcal{L}\,d^3x$, are

$$ D_\mu D^\mu\phi - \frac{\lambda}{2}\left(1 - |\phi|^2\right)\phi = 0,\qquad \partial^\mu F_{\mu\nu} + \frac{i}{2}\left(\bar\phi\,D_\nu\phi - \phi\,\overline{D_\nu\phi}\right) = 0 \tag{3} $$

This is a coupled set of nonlinear second-order PDEs. Of particular interest are solutions which have finite total energy. Energy is not a Lorentz-invariant quantity. To define it we must choose an inertial frame and, having broken Lorentz invariance, it is convenient to work in a temporal gauge, for which $A_0 \equiv 0$ (which may be obtained by a gauge transformation with $\chi(t, \mathbf{x}) = -\int_0^t A_0(t', \mathbf{x})\,dt'$, after which only time-independent gauge transformations are permitted). The potential energy of a field is then

$$ E = \frac12\int\left(B^2 + D_i\phi\,\overline{D_i\phi} + \frac{\lambda}{4}\left(1 - |\phi|^2\right)^2\right)dx^1\,dx^2 = E_{\text{mag}} + E_{\text{grad}} + E_{\text{self}} \tag{4} $$

while its kinetic energy is

$$ E_{\text{kin}} = \frac12\int\left(|\partial_0\mathbf{A}|^2 + \partial_0\phi\,\overline{\partial_0\phi}\right)dx^1\,dx^2 \tag{5} $$

If $\phi$, A satisfy the field equations then the total energy $E_{\text{tot}} = E_{\text{kin}} + E$ is independent of t. By Derrick's theorem, static solutions have $E_{\text{mag}} = E_{\text{self}}$ (Manton and Sutcliffe 2004, pp. 82-87).

Configurations with finite energy have quantized total magnetic flux. To see this, note that E finite implies $|\phi| \to 1$ as $r \to \infty$, so $\phi \sim e^{i\psi(r, \theta)}$ at large r for some real (in general, multivalued) function $\psi$. The winding number of $\phi$ is its winding around a circle of large radius R, that is, the integer $n = (\psi(R, 2\pi) - \psi(R, 0))/2\pi$. Although the phase of $\phi$ is clearly gauge dependent, n is not, because to change this, a gauge transformation $e^{i\chi}: \mathbb{R}^2 \to U(1)$ would itself need nonzero winding around the circle, contradicting smoothness of $e^{i\chi}$. The model is invariant under spatial reflexions, under which $n \mapsto -n$, so we will assume (unless noted otherwise) that $n \ge 0$. Finiteness of E also implies that $D\phi = d\phi - iA\phi \to 0$, so $A \sim -i\,d\phi/\phi \sim d\psi$ as $r \to \infty$ (note $\phi \ne 0$ for large r). Hence, the total magnetic flux is

$$ \int_{\mathbb{R}^2} B\,d^2x = \lim_{R\to\infty}\oint_{S_R} A = \lim_{R\to\infty}\int_0^{2\pi}\partial_\theta\psi\,d\theta = 2\pi n \tag{6} $$

where $S_R = \{\mathbf{x} : |\mathbf{x}| = R\}$ and we have used Stokes's theorem. The above argument uses only generic properties of E, namely that finite $E_{\text{self}}$ requires $|\phi|$ to assume a nonzero constant value as $r \to \infty$. So flux quantization is a robust feature of this type of model. As presented, the argument is somewhat formal, but it can be made mathematically rigorous at the cost of gauge-fixing technicalities (Manton and Sutcliffe 2004, pp. 164-166). Note that if $n \ne 0$ then, by continuity, $\phi(\mathbf{x})$ must vanish at some $\mathbf{x} \in \mathbb{R}^2$, and one expects a lump of energy density to be associated with each such $\mathbf{x}$ since $\phi = 0$ maximizes the integrand of $E_{\text{self}}$.

Radially Symmetric Vortices

The model supports static solutions within the radially symmetric ansatz $\phi = \sigma(r)\,e^{in\theta}$, $A = a(r)\,d\theta$, which reduces the field equations to a coupled pair of nonlinear ODEs:

$$ \begin{aligned} \frac{d^2\sigma}{dr^2} + \frac1r\frac{d\sigma}{dr} - \frac{1}{r^2}(n - a)^2\sigma + \frac{\lambda}{2}\left(1 - \sigma^2\right)\sigma &= 0 \\ \frac{d^2a}{dr^2} - \frac1r\frac{da}{dr} + (n - a)\,\sigma^2 &= 0 \end{aligned} \tag{7} $$

Finite energy requires $\lim_{r\to\infty}\sigma(r) = 1$, $\lim_{r\to\infty}a(r) = n$ while smoothness requires $\sigma(r) \sim \text{const}_1\,r^n$, $a(r) \sim \text{const}_2\,r^2$ as $r \to 0$. It is known that solutions to this system, which we shall call n-vortices, exist for all n, $\lambda$, though no explicit formulas for them are known. They may be found numerically, and are depicted in Figure 1. Note that $\sigma$ and a always rise monotonically to their vacuum values, and B always falls monotonically to 0, as r increases. These solutions have their magnetic flux concentrated in a single, symmetric lump, a flux tube in the $\mathbb{R}^{3+1}$ picture. In contrast, the total energy density (integrand of E in [4]) is nonmonotonic for $n \ge 2$, being peaked on a ring whose radius grows with n. This is a common feature of planar solitons.

The large r asymptotics of n-vortices are well understood. For $\lambda \le 4$ one may linearize [7] about $\sigma = 1$, $a = n$, yielding

$$ \sigma(r) \sim 1 - \frac{q_n}{2\pi}\,K_0\left(\sqrt{\lambda}\,r\right) \tag{8} $$

$$ a(r) \sim n - \frac{m_n}{2\pi}\,r\,K_1(r) \tag{9} $$

where $q_n$, $m_n$ are unknown constants and $K_\nu$ denotes the modified Bessel function. For $\lambda > 4$ linearization is no longer well justified, and the asymptotic behaviour of $\sigma$ (though not a) is quite different (Manton and Sutcliffe 2004, pp. 174-175). We shall not consider this rather extreme regime further. Note that

$$ K_\nu(r) \sim \sqrt{\frac{\pi}{2r}}\;e^{-r}\quad\text{as } r \to \infty \tag{10} $$

for all $\nu$, so both $\sigma$ and a approach their vacuum values exponentially fast, but with different decay lengths: $1/\sqrt{\lambda}$ for $\sigma$, 1 for a. This can be seen in Figure 1a. The constants $q_n$ and $m_n$ depend on $\lambda$ and must be inferred by comparing the numerical solutions with [8], [9]; $q = q_1$ and $m = m_1$ will receive a physical interpretation shortly.

The 1-vortex (henceforth just "vortex") is stable for all $\lambda$, but n-vortices with $n \ge 2$ are unstable to break up into n separate vortices if $\lambda > 1$. We shall say that the AHM is type I if $\lambda < 1$, type II if $\lambda > 1$, and critically coupled if $\lambda = 1$, based on this distinction. Let $E_n$ denote the energy of an n-vortex. Figure 2 shows the energy per vortex $E_n/n$ plotted against n for $\lambda$ = 0.5, 1, and 2. It decreases with n for $\lambda$ = 0.5, indicating that it is energetically favorable for isolated vortices to coalesce into higher winding lumps. For $\lambda$ = 2, by contrast, $E_n/n$ increases with n indicating that it is energetically favorable for n-vortices to fission into their constituent vortex parts. The case $\lambda$ = 1 balances between these behaviors: $E_n/n$ is independent of n. In fact, the energy of a collection of vortices is independent of their positions in this case.

[Figure 1: three panels plotted against r from 0 to 8. Panel (a): $\sigma$, a; panel (b): B, with curves labelled n = 1 and n = 5; panel (c): energy density, with curves labelled n = 1 and n = 5.]

Figure 1 Static, radially symmetric n-vortices: (a) the 1-vortex profile functions $\sigma(r)$ (solid curve) and a(r) (dashed curve) for $\lambda$ = 2, 1, and 1/2, left to right; (b) the magnetic field B; and (c) the energy density of n-vortices, n = 1 to 5, left to right, for $\lambda$ = 1.

[Figure 2: $E_n/n$ plotted against n = 1, ..., 4, with curves labelled $\lambda$ = 2, $\lambda$ = 1, and $\lambda$ = 1/2.]

Figure 2 The energy per unit winding $E_n/n$ of radially symmetric n-vortices for $\lambda$ = 1/2, 1, and 2.

Interaction Energy

A precise understanding of the type I/II dichotomy can be obtained using the 2-vortex interaction energy $E_{\text{int}}(s)$ introduced by Jacobs and Rebbi. This is defined to be the minimum of E over all n = 2 configurations for which $\phi(\mathbf{x}) = 0$ at some pair of points $\mathbf{x}_1$, $\mathbf{x}_2$ distance s apart. One interprets $\mathbf{x}_1$, $\mathbf{x}_2$ as the vortex positions. $E_{\text{int}}$ can only depend on their separation $s = |\mathbf{x}_1 - \mathbf{x}_2|$, by translation and rotation invariance. Figure 3 presents graphs of $E_{\text{int}}(s)$ generated by a lattice minimization algorithm. For $\lambda < 1$, vortices uniformly attract one another, so a vortex pair has least energy when coincident. For $\lambda > 1$, vortices uniformly repel, always lowering their energy by moving further apart. The graph for $\lambda = 1$ would be a horizontal line, $E_{\text{int}}(s) = 2\pi$.
Figure 3 The 2-vortex interaction energy E_int(s) as a function of vortex separation (solid curve), in comparison with its asymptotic form E_int^∞(s) (dashed curve) for (a) λ = 1/2 and (b) λ = 2. (Plots not reproduced: E_int against s from 0 to 10 in both panels.)

The large s behavior of E_int(s) is known, and can be understood in two ways (Manton and Sutcliffe 2004, pp. 177–181). Speight, adapting ideas of Manton on asymptotic monopole interactions, observed that, in the real φ gauge (φ ↦ e^{−iχ}φ, A ↦ A − dχ), the difference between the vortex and the vacuum φ = 1, A = 0 at large r,

\[ \psi := \phi - 1 \sim \frac{q}{2\pi} K_0(\sqrt{\lambda}\, r) \]   [11]
\[ (A_0, \mathbf{A}) \sim -\frac{m}{2\pi}\left(0, \mathbf{k} \times \nabla K_0(r)\right) \]   [12]

is identical to the solution of a linear Klein–Gordon–Proca theory,

\[ (\partial_\mu \partial^\mu + \lambda)\psi = \kappa, \qquad (\partial_\mu \partial^\mu + 1) A_\nu = j_\nu \]   [13]

in the presence of a composite point source,

\[ \kappa = q\,\delta(\mathbf{x}), \qquad (j_0, \mathbf{j}) = -m\left(0, \mathbf{k} \times \nabla \delta(\mathbf{x})\right) \]   [14]

located at the vortex position. Viewed from afar, therefore, a vortex looks like a point particle carrying both a scalar monopole charge q and a magnetic dipole moment m, a point vortex, inducing a real scalar field of mass √λ (the Higgs particle) and a vector boson field of mass 1 (the photon). If physics is to be model independent, therefore, the interaction energy of a pair of well-separated vortices should approach that of the corresponding pair of point vortices as the separation grows. Computing the latter is an easy exercise in classical linear field theory, yielding

\[ E_{int}(s) \sim E^{\infty}_{int}(s) = 2E_1 - \frac{q^2}{2\pi} K_0(\sqrt{\lambda}\, s) + \frac{m^2}{2\pi} K_0(s) \]   [15]

Bettencourt and Rivers obtained the same formula by a more direct superposition ansatz approach, though they did not give the constants q, m a physical interpretation.

The force between a well-separated vortex pair, −(E_int^∞)'(s), consists of the mutual attraction of identical scalar monopoles, of range 1/√λ, and the mutual repulsion of identical magnetic dipoles, of range 1. If λ < 1, scalar attraction dominates at large s so vortices attract. If λ > 1, magnetic repulsion dominates and they repel. If λ = 1 then q ≡ m, as we shall see, so the forces cancel exactly. Figure 3 shows both E_int and E_int^∞ for λ = 0.5, 2. The agreement is good for s large, but breaks down for s < 4, as one expects. Vortices are not point particles, as in the linear model, and when they lie close together the overlap of their cores produces significant effects.

The same method predicts the interaction energy between an n_1-vortex and an n_2-vortex at large separation. We just replace 2E_1 by E_{n_1} + E_{n_2}, q² by q_{n_1} q_{n_2}, and m² by m_{n_1} m_{n_2}. In particular, an antivortex ((−1)-vortex) has E_{−1} = E_1, q_{−1} = q_1 = q, and m_{−1} = −m_1 = −m, so the interaction energy for a vortex–antivortex pair is

\[ E^{v\bar{v}}_{int}(s) \sim 2E_1 - \frac{q^2}{2\pi} K_0(\sqrt{\lambda}\, s) - \frac{m^2}{2\pi} K_0(s) \]   [16]
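The sign structure of the point-vortex formula [15] is easy to explore numerically. The sketch below evaluates E_int^∞(s) using only the standard library, computing K_0 from its integral representation K_0(x) = ∫_0^∞ exp(−x cosh t) dt by the trapezoid rule. The function names are hypothetical, and the values q = m = 1 and E_1 = π in the usage lines are placeholders chosen for illustration: the article does not quote numerical values of q, m, or E_1 for general λ.

```python
import math

def bessel_k0(x, steps=4000, t_max=12.0):
    """K0(x) for x > 0 from K0(x) = int_0^inf exp(-x cosh t) dt (trapezoid rule)."""
    h = t_max / steps
    total = 0.5 * (math.exp(-x) + math.exp(-x * math.cosh(t_max)))
    for i in range(1, steps):
        total += math.exp(-x * math.cosh(i * h))
    return total * h

def e_int_asymptotic(s, lam, q, m, e1):
    """Asymptotic 2-vortex interaction energy, formula [15]."""
    scalar_attraction = q * q / (2.0 * math.pi) * bessel_k0(math.sqrt(lam) * s)
    magnetic_repulsion = m * m / (2.0 * math.pi) * bessel_k0(s)
    return 2.0 * e1 - scalar_attraction + magnetic_repulsion

# Placeholder charges (NOT values from the article): q = m = 1, E_1 = pi.
for lam in (0.5, 1.0, 2.0):
    shift = e_int_asymptotic(5.0, lam, 1.0, 1.0, math.pi) - 2.0 * math.pi
    print("lambda =", lam, " E_int - 2 E_1 at s = 5:", shift)
```

With equal placeholder charges, the shift from 2E_1 is negative for λ < 1 (attraction), positive for λ > 1 (repulsion), and vanishes for λ = 1, mirroring the discussion of the two competing forces.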
The vortex–antivortex interaction energy [16] is uniformly attractive. It would be pleasing if q_n, m_n could be deduced easily from q, m. One might guess q_n = |n|q, m_n = nm, in analogy with monopoles. Unfortunately, this is false: q_n, m_n grow approximately exponentially with |n|.

Vortex Scattering

The AHM being Lorentz invariant, one can obtain time-dependent solutions wherein a single n-vortex travels at constant velocity, with speed 0 < v < 1 and E_tot = (1 − v²)^{−1/2} E_n, by Lorentz boosting the static solutions described above. Of more dynamical interest are solutions in which two or more vortices undergo relative motion. The simplest problem is vortex scattering. Two vortices, initially well separated, are propelled towards one another. In the center-of-mass (COM) frame they have, as t → −∞, equal speed v, and approach one another along parallel lines distance b (the impact parameter) apart, see Figure 4. If b = 0, they approach head-on. Assuming they do not capture one another, they interact and, as t → ∞, recede along parallel straight lines having been deflected through an angle Θ (the scattering angle). If scattering is elastic, the exit lines also lie b apart and each vortex travels at speed v as t → ∞. The dependence of Θ on v, b, and λ has been studied through lattice simulations by several authors, perhaps most comprehensively by Myers, Rebbi, and Strilka (1992). We shall now describe their results.

Figure 4 The geometry of vortex scattering.

Note first that vortex scattering is actually inelastic: vortices recede with speed < v because some of their initial kinetic energy is dispersed by the collision as small-amplitude traveling waves (radiation). This energy loss can be as high as 80% in very fast collisions at small b. At small v the energy loss is tiny, but can still have important consequences for type I vortices: if v is very small, they start with only just enough energy to escape their mutual attraction. In undergoing a small b collision they can lose enough of this energy to become trapped in an oscillating bound state. In this case they do not truly scatter and Θ is ill-defined. Myers et al. find that v ≥ 0.2 suffices to avoid capture when λ = 1/2. Since type I vortices attract, one might expect Θ to be always negative, indicating that the vortices deflect towards one another. In fact, as Figure 5a shows, this happens only for small v and large b. Another naive expectation is that Θ = 0° or Θ = 180° when b = 0 (either vortices pass through one another or ricochet backwards in a head-on collision). In fact Θ = 90°, the only other possibility allowed by reflexion symmetry of the initial data. Figure 6 depicts snapshots of such a scattering process at modest v. The vortices deform each other as they get close until, at the moment of coincidence, they are close to the static 2-vortex ring. They then break apart along a line perpendicular to their line of approach. One may consider them to have exchanged half-vortices, so that each emergent vortex is a mixture of the incoming vortices. This rather surprising phenomenon was actually predicted by Ruback in advance of any numerical simulations and turns out to be a generic feature of planar topological solitons.

Consider now the type II case (λ = 2, Figure 5b). Here, Θ > 0 for all v, b as one expects of particles that repel each other. Head-on scattering is more interesting now since two regimes emerge: for v > v_crit ≈ 0.3, one has the surprising 90° scattering already described, while for v < v_crit the vortices bounce backwards, Θ = 180°. This is easily explained. In order to undergo 90° head-on scattering, the vortices must become coincident (otherwise reflexion symmetry is violated), hence must have initial energy at least E_2. For v < v_crit, where

\[ \frac{2E_1}{\sqrt{1 - v_{crit}^2}} = E_2 \]   [17]

they have too little energy, so come to a halt before coincidence, then recede from one another. The solution v_crit of [17] depends on λ and is plotted in Figure 7. For v slightly above v_crit, we see that, in contrast to the type I case, Θ(b) is not monotonic: maximum deflection occurs at nonzero b.

The point vortex formalism yields a simple model of type II vortex scattering which is remarkably successful at small v. One writes down the Lagrangian for two identical (nonrelativistic) point particles of mass E_1 moving along trajectories x_1(t), x_2(t) under the influence of the repulsive potential E_int^∞,

\[ L = \tfrac{1}{2} E_1 \left( |\dot{\mathbf{x}}_1|^2 + |\dot{\mathbf{x}}_2|^2 \right) - E^{\infty}_{int}(|\mathbf{x}_1 - \mathbf{x}_2|) \]   [18]

Energy and angular momentum conservation reduce Θ(v, b) to an integral over one variable (s = |x_1 − x_2|) which is easily computed numerically. To illustrate, Figure 5b shows the result for λ = 2, v = 0.1 in comparison with the lattice simulations of Myers et al.
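The reduction of Θ(v, b) to a one-variable integral for the Lagrangian [18] can be sketched as follows. For two particles of mass E_1 with speed v each in the COM frame, the relative motion has reduced mass E_1/2 and kinetic energy E = E_1 v², and the standard central-force result is Θ = π − 2b ∫_0^{u_0} du / √(1 − b²u² − U(1/u)/E), where u = 1/s, U(s) is the potential measured from its value at infinity (for vortices, E_int^∞(s) − 2E_1), and u_0 is the turning point. The code assumes a repulsive potential and b > 0. The function name and the substitution u = u_0(1 − t²), used to remove the inverse square-root singularity at the turning point, are choices made for this illustration. It is checked against the closed-form Rutherford angle for a 1/s potential rather than against vortex data.

```python
import math

def scattering_angle(potential, e1, v, b, n_quad=2000):
    """Deflection angle (radians) of two identical mass-e1 particles, repulsive U, b > 0."""
    energy = e1 * v * v                    # kinetic energy of the relative motion

    def radicand(u):                       # 1 - b^2 u^2 - U(1/u)/E, valid for u > 0
        return 1.0 - (b * u) ** 2 - potential(1.0 / u) / energy

    # Bisection for the turning point u0 = 1/s_min, which lies in (0, 1/b] when U >= 0.
    lo, hi = 0.0, 1.0 / b
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if radicand(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    u0 = lo                                # radicand stays positive for u < u0

    # Midpoint rule in t, with u = u0 (1 - t^2) so that the integrand is regular.
    h = 1.0 / n_quad
    total = 0.0
    for i in range(n_quad):
        t = (i + 0.5) * h
        u = u0 * (1.0 - t * t)
        total += 2.0 * u0 * t / math.sqrt(radicand(u)) * h
    return math.pi - 2.0 * b * total

# Check against Rutherford scattering, U(s) = k/s, where tan(Theta/2) = k / (2 E b).
k, e1, v, b = 0.1, 1.0, 0.5, 1.0
theta = scattering_angle(lambda s: k / s, e1, v, b)
print("numerical:", theta, " Rutherford:", 2.0 * math.atan(k / (2.0 * e1 * v * v * b)))
```

To apply this to the point vortex model one would pass the shifted potential E_int^∞(s) − 2E_1 from [15], with values of q, m, and E_1 obtained from the static vortex profiles.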
Figure 5 The 2-vortex scattering angle Θ as a function of impact parameter b for v = 0.1, 0.2, 0.3, 0.4, 0.5, and 0.9 (each speed plotted with its own marker), as computed by Myers et al. (1992): (a) λ = 1/2; (b) λ = 2; (c) λ = 1. The dotted curves are merely guides to the eye. The solid curves in (b), (c) were computed using the point vortex model. Note that Myers et al. use different normalizations, so b = √2 b_MRS and λ = λ_MRS/2. (Plots not reproduced: Θ in degrees against b from 0 to 6 in all three panels.)

The agreement between the point vortex model and the lattice simulations is almost perfect. For large v the approximation breaks down not only because relativistic corrections become significant, but also because small b collisions then probe the small |x_1 − x_2| region where vortex core overlap effects become important. For the same reason, the point vortex model is less useful for type I scattering. Here there is no repulsion to keep the vortices well separated, so its validity is restricted to the small v, large b regime.

Critical coupling is theoretically the most interesting regime, where most analytic progress has been made. Since E_int ≡ E_int^∞ is constant at critical coupling, so that static vortices exert no force on one another, one might expect vortex scattering to be trivial (Θ(v, b) ≡ 0), but this is quite wrong, as shown in Figure 5c. In particular, Θ(v, 0) = 90° for all v, just as in the large v type I and type II cases. The point is that scalar attraction and magnetic repulsion of vortices are mediated by fields with different Lorentz transformation properties. While they cancel for static vortices, there is no reason to expect them to cancel for vortices in relative motion.

Critical Coupling

The AHM with λ = 1 has many remarkable properties, at which we have so far only hinted. These all stem from Bogomol'nyi's crucial observation (Manton and Sutcliffe 2004, pp. 197–202) that the potential energy in this case can be rewritten as
\[ E = \frac{1}{2}\int_{\mathbb{R}^2} \left\{ \left(B - \tfrac{1}{2}(1 - |\phi|^2)\right)^2 + |D_1\phi + iD_2\phi|^2 + B \right\} d^2x \; - \; \frac{i}{2}\int_{\mathbb{R}^2} d(\bar{\phi}\, D\phi) \]   [19]

The last integral vanishes by Stokes's theorem, so E ≥ πn by flux quantization [6], and E = πn if and only if

\[ D_1\phi + iD_2\phi = 0 \]   [20]
\[ \tfrac{1}{2}(1 - |\phi|^2) = B \]   [21]

Note that system [20], [21] is first order, in contrast to the second-order field equations [3]. No explicit solutions of [20], [21] are known. However, Taubes has proved that for each unordered list [z_1, z_2, ..., z_n] of n points in ℂ, not necessarily distinct, there exists a solution of [20], [21], unique up to gauge transformations, with φ(z_1) = φ(z_2) = ··· = φ(z_n) = 0 and φ nonvanishing elsewhere, the zero at z_r having the same multiplicity as z_r has in the list. Note that the list is unordered: a solution is uniquely determined by the positions and multiplicities of the zeroes of φ, but the order in which we label these is irrelevant. The solution minimizes E within the class C_n of winding n configurations, so is automatically a stable static solution of the model.

Figure 6 Snapshots of the energy density during a head-on collision of vortices. This 90° scattering phenomenon is a generic feature of planar topological soliton dynamics.

Figure 7 The critical velocity for 90° head-on scattering of type II vortices v_crit as a function of λ, as predicted by equation [17] (solid curve), in comparison with the results of Myers et al. (1992) (crosses). (Plot not reproduced: v_crit from 0 to 0.5 against λ from 1 to 8.)

Equation [20] applied to the symmetric n-vortex, φ = σ(r)e^{inθ}, A = a(r) dθ, implies a(r) = n − rσ'(r)/σ(r). Comparing with [8], [9], it follows that q_n = m_n when λ = 1 as previously claimed, since K_1 = −K_0'. Tong has conjectured, based on a string duality argument, that q_1 = −2π 8^{1/4}. This is consistent with current numerics but has no direct derivation so far.
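Although no explicit solutions of [20], [21] are known, the radially symmetric case is simple to integrate numerically. With the ansatz φ = σ(r)e^{inθ}, A = a(r) dθ, and assuming B = (1/r) da/dr, the first-order system becomes σ' = (n − a)σ/r and a' = (r/2)(1 − σ²). The following sketch shoots outward from small r with σ ≈ c rⁿ, a ≈ r²/4, and bisects on the unknown slope constant c: if c is too large σ crosses 1, and if it is too small a crosses n. The function names, step size, and cutoff are illustrative assumptions, and only the case n = 1 is exercised here.

```python
import math

def shoot(c, n=1, r0=1e-3, r_max=10.0, h=0.005):
    """RK4 integration of the radial Bogomol'nyi system; returns (verdict, sigma, a).

    verdict = +1 if sigma reaches 1 (c too large), -1 if a reaches n (c too small),
    0 if neither happens before r_max.
    """
    def rhs(r, s, a):
        return (n - a) * s / r, 0.5 * r * (1.0 - s * s)

    r, s, a = r0, c * r0 ** n, 0.25 * r0 * r0
    while r < r_max:
        k1s, k1a = rhs(r, s, a)
        k2s, k2a = rhs(r + 0.5 * h, s + 0.5 * h * k1s, a + 0.5 * h * k1a)
        k3s, k3a = rhs(r + 0.5 * h, s + 0.5 * h * k2s, a + 0.5 * h * k2a)
        k4s, k4a = rhs(r + h, s + h * k3s, a + h * k3a)
        s += h * (k1s + 2.0 * k2s + 2.0 * k3s + k4s) / 6.0
        a += h * (k1a + 2.0 * k2a + 2.0 * k3a + k4a) / 6.0
        r += h
        if s >= 1.0:
            return 1, s, a
        if a >= n:
            return -1, s, a
    return 0, s, a

def critical_vortex(n=1, c_lo=0.0, c_hi=2.0):
    """Bisect on the slope constant c until the profile survives out to r_max."""
    c, s_end, a_end = c_hi, 0.0, 0.0
    for _ in range(80):
        c = 0.5 * (c_lo + c_hi)
        verdict, s_end, a_end = shoot(c, n)
        if verdict == 0:
            break
        if verdict > 0:
            c_hi = c
        else:
            c_lo = c
    return c, s_end, a_end

c, s_end, a_end = critical_vortex(n=1)
print("slope constant c:", c)
print("sigma(r_max):", s_end, " a(r_max):", a_end)
```

The first equation of the system is just the relation a(r) = n − rσ'(r)/σ(r) quoted above, so the computed profiles satisfy it by construction; the far-field values σ → 1 and a → n then confirm finite energy and flux 2πn.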
Taubes's theorem shows that this n-vortex is just one point, corresponding to the list [0, 0, ..., 0], in a 2n-dimensional space of static multivortex solutions called the moduli space M_n. This space may be visualized as the flat, finite-dimensional valley bottom in C_n on which E attains its minimum value, πn. Points in M_n are in one-to-one correspondence with distinct unordered lists [z_1, z_2, ..., z_n], which are themselves in one-to-one correspondence with points in ℂ^n, as follows. To each list, we assign the unique monic polynomial whose roots are z_r,

\[ p(z) = (z - z_1)(z - z_2) \cdots (z - z_n) = a_0 + a_1 z + \cdots + a_{n-1} z^{n-1} + z^n \]   [22]

This polynomial is uniquely determined by its coefficients (a_0, a_1, ..., a_{n−1}) ∈ ℂ^n, which give good global coordinates on M_n ≅ ℂ^n. The zeros z_r of φ may be used as local coordinates on M_n, away from Δ, the subset of M_n on which two or more of the zeros z_r coincide, but are not good global coordinates.

Let (φ, A)_a denote the static solution corresponding to a ∈ ℂ^n. If the zeros z_r are all at least s apart, Taubes showed the solution is just a linear superposition of 1-vortices located at z_r, up to corrections exponentially small in s. Imagine these constituent vortices are pushed with small initial velocities. Then (φ(t), A(t)) must remain close to the valley bottom M_n, since departing from it costs kinetic energy, of which there is little. Manton has suggested, therefore, that the dynamics is well approximated by the constrained variational problem wherein (φ(t), A(t)) = (φ, A)_{a(t)} ∈ M_n for all t. Since the action S = ∫ ℒ d³x = ∫ (E_kin − E) dt, and E = πn, constant, on M_n, this constrained problem amounts to Lagrangian mechanics on configuration space M_n with Lagrangian L = E_kin|_{M_n}. Now E_kin is real, positive, and quadratic in time derivatives of φ, A, so

\[ L = \tfrac{1}{2}\, \gamma_{rs}(a)\, \dot{a}_r \dot{\bar{a}}_s \]   [23]

γ_rs forming the entries of a positive-definite n × n Hermitian matrix (γ̄_sr ≡ γ_rs). Since (φ, A)_a is not known explicitly, neither are γ_rs(a). Observe, however, that L is the Lagrangian for geodesic motion in M_n with respect to the Riemannian metric

\[ \gamma = \gamma_{rs}(a)\, da_r\, d\bar{a}_s \]   [24]

Manton originally proposed this geodesic approximation for monopoles, but it is now standard for all topological solitons of Bogomol'nyi type (where one has a moduli space of static multisolitons saturating a topological lower bound on E). Note that geodesics are independent of initial speed, which agrees with Myers et al.: Figure 5c shows that Θ(v, b) is approximately independent of v for v ≤ 0.5. Further, Stuart (1994) has proved that, for initial speeds of order ε, small, the fields stay (pointwise) ε² close to their geodesic approximant for times of order ε^{−1}.

On symmetry grounds, two vortex dynamics in the COM frame reduces to geodesic motion in M_2^0 ≅ ℂ, the subspace of centered 2-vortices (a_1 = 0, so z_1 = −z_2), with induced metric

\[ \gamma^0 = G(|a_0|)\, da_0\, d\bar{a}_0 \]   [25]

G being some positive function. Note that a_0 = z_1 z_2, so the intervortex distance |z_1 − z_2| = 2|z_1| = 2|a_0|^{1/2}. The line a_0 = α ∈ ℝ, traversed with α increasing, say, is geodesic in M_2^0. The vortex positions (roots of z² + a_0) are ±√|α| for α ≤ 0 and ±i√α for α > 0. This describes perfectly the 90° scattering phenomenon: two vortices approach head-on along the x_1 axis, coincide to form a 2-vortex ring, then break apart along the x_2 axis, as in Figure 6. This behavior occurs because a_0 = z_1 z_2, rather than z_1 − z_2, is the correct global coordinate on M_2^0, since vortices are classically indistinguishable.

Samols found a useful formula (Manton and Sutcliffe 2004, pp. 205–215) for γ in terms of the behavior of |φ_a| close to its zeros, using which he devised an efficient numerical scheme to evaluate G(|a_0|), and computed Θ(b) in detail, finding excellent agreement with lattice simulations at low speeds. He also studied the quantum scattering of vortices, approximating the quantum state by a wave function Ψ on M_n evolving according to the natural Schrödinger equation for quantum geodesic motion,

\[ i\hbar \frac{\partial \Psi}{\partial t} = -\tfrac{1}{2}\hbar^2 \Delta \Psi \]   [26]

where Δ is the Laplace–Beltrami operator on (M_n, γ). This technique, introduced for monopoles by Gibbons and Manton, is now standard for solitons of Bogomol'nyi type.

By analyzing the forces between moving point vortices at λ = 1, Manton and Speight (2003) showed that, as the vortex separations become uniformly large, the metric on M_n approaches

\[ \gamma^{\infty} = \pi \sum_r dz_r\, d\bar{z}_r - \frac{q^2}{4\pi} \sum_r \sum_{s \neq r} K_0(|z_r - z_s|)\,(dz_r - dz_s)(d\bar{z}_r - d\bar{z}_s) \]   [27]

This formula can also be obtained by a method of matched asymptotic expansions. We can use [27] to study 2-vortex scattering for large b, when the vortices remain well separated.
The vortices must remain well separated here because γ^∞ is not positive definite if any |z_r − z_s| becomes too small. The results are good, provided v ≤ 0.5 and b ≥ 3 (see Figure 5c).

Other Developments

The (critically coupled) AHM on a compact physical space Σ is of considerable theoretical and physical interest. Bradlow showed that M_n(Σ) is empty unless V = Area(Σ) ≥ 4πn, so there is a limit to how many vortices a space of finite area can accommodate (Manton and Sutcliffe 2004, pp. 227–230). Manton has analyzed the thermodynamics of a gas of vortices by studying the statistical mechanics of geodesic flow on M_n(Σ). In this context, spatial compactness is a technical device to allow nonzero vortex density n/V for finite n, without confining the fields to a finite box, which would destroy the Bogomol'nyi properties. In the limit of interest, n, V → ∞ with n/V fixed, the thermodynamical properties turn out to depend on Σ only through V, so Σ = S² and Σ = T² give equivalent results, for example. The equation of state of the gas is (P = pressure, T = temperature)

\[ P = \frac{nT}{V - 4\pi n} \]   [28]

which is similar, at low density n/V, to that of a gas of hard disks of area 2π. The crucial step in deriving [28] is to find the volume of M_n(Σ) which, despite there being no formula for γ, may be computed exactly by remarkable indirect arguments (Manton and Sutcliffe 2004, pp. 231–234).

The static AHM coincides with the Ginzburg–Landau model of superconductivity, which has precisely the same type I/II classification. Here the Higgs field represents the wave function of a condensate of Cooper pairs, usually (but not always) electrons. There has been a parallel development of the static model by condensed matter theorists, therefore; see Fossheim and Sudbo (2004), for example. In fact the vortex was actually first discovered by Abrikosov in the condensed matter context. One important difference is that type I superconductors do not support vortex solutions in an external magnetic field B_ext because the critical |B_ext| required to create a single vortex is greater than the critical |B_ext| required to destroy the condensate completely (φ ≡ 0). Type II superconductors do support vortices, and there are such superconductors with λ ≈ 1, but the vortex dynamics we have described is not relevant to these systems. In this context there is an obvious preferred reference frame (the rest frame of the superconductor) so it is unsurprising that the Lorentz-invariant AHM is inappropriate. Insofar as vortices move at all, they seem to obey a first-order (in time) dynamical system, in contrast to the second-order AHM. Manton has devised a first-order system which may have relevance to superconductivity, by replacing E_kin with a Chern–Simons–Schrödinger functional (Manton and Sutcliffe 2004, pp. 193–197). Rather than attracting or repelling, vortices now tend to orbit one another at constant separation. There is again a moduli space approximation to slow vortex dynamics for λ ≈ 1, but it has a Hamiltonian-mechanical rather than Riemannian-geometric flavor.

Finally, an interesting simplification of the AHM, which arises, for example, as a phenomenological model of liquid helium-4, is obtained if we discard the gauge field A_μ, or equivalently set the electric charge of φ to e = 0. There is now no type I/II classification, since λ may be absorbed by rescaling. The resulting model, which has only global U(1) phase symmetry, supports n-vortices φ = σ(r)e^{inθ} for all n, but these are not exponentially spatially localized,

\[ \sigma(r) = 1 - \frac{n^2}{\lambda r^2} - \frac{n^2(8 + n^2)}{2\lambda^2 r^4} + O(r^{-6}) \]   [29]

and cannot have finite E by Derrick's theorem. They are unstable for |n| > 1, and 1-vortices uniformly repel one another. They can be given an interesting first-order dynamics (the Gross–Pitaevski equation).

Abbreviations

A      electromagnetic gauge potential
b      impact parameter
D      gauge-covariant derivative
E      potential energy
E_kin  kinetic energy
F      electromagnetic field strength tensor
L      Lagrangian
ℒ      Lagrangian density
S      action
φ      Higgs field
Θ      scattering angle

See also: Fractional Quantum Hall Effect; Ginzburg–Landau Equation; High Tc Superconductor Theory; Integrable Systems: Overview; Nonperturbative and Topological Aspects of Gauge Theory; Quantum Fields with Topological Defects; Solitons and Other Extended Field Configurations; Symmetry Breaking in Field Theory; Topological Defects and Their Homotopy Classification; Variational Techniques for Ginzburg–Landau Energies.
Further Reading

Atiyah M and Hitchin N (1988) The Geometry and Dynamics of Magnetic Monopoles. Princeton: Princeton University Press.
Fossheim K and Sudbo A (2004) Superconductivity: Physics and Applications. Hoboken NJ: Wiley.
Jaffe A and Taubes C (1980) Vortices and Monopoles: Structure of Static Gauge Theories. Boston: Birkhäuser.
Nakahara M (1990) Geometry, Topology and Physics. Bristol: Adam-Hilger.
Manton NS and Speight JM (2003) Asymptotic interactions of critically coupled vortices. Communications in Mathematical Physics 236: 535–555.
Manton NS and Sutcliffe PM (2004) Topological Solitons. Cambridge: Cambridge University Press.
Myers E, Rebbi C, and Strilka R (1992) Study of the interaction and scattering of vortices in the abelian Higgs (or Ginzburg-Landau) model. Physical Review 45: 1355–1364.
Rajaraman R (1989) Solitons and Instantons. Amsterdam: North-Holland.
Stuart D (1994) Dynamics of abelian Higgs vortices in the near Bogomolny regime. Communications in Mathematical Physics 159: 51–91.
Vilenkin A and Shellard EPS (1994) Cosmic Strings and Other Topological Defects. Cambridge: Cambridge University Press.

Adiabatic Piston
Ch Gruber, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
A Lesne, Université P.-M. Curie, Paris VI, Paris, France
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Macroscopic Problem

The adiabatic piston is an old problem of thermodynamics which has had a long and controversial history. It is the simplest example concerning the time evolution of an adiabatic wall, that is, a wall which does not conduct heat. The system consists of a gas in a cylinder divided by an adiabatic wall (the piston). Initially, the piston is held fixed by a clamp and the two gases are in thermal equilibrium characterized by (p^±, T^±, N^±), where the index −/+ refers to the gas on the left/right side of the piston and (p, T, N) denote the pressure, the temperature, and the number of particles (Figure 1). Since the piston is adiabatic, the whole system remains in equilibrium even if T^− ≠ T^+. At time t = 0, the clamp is removed and the piston is let free to move without any friction in the cylinder. The question is to find the final state, that is, the final position X_f of the piston and the parameters (p_f^±, T_f^±) of the gases.

Figure 1 The adiabatic piston problem. (Schematic not reproduced: a cylinder of cross-section A extending from x = 0 to x = L, with the piston at X separating the gas (N^−, p^−, T^−) on the left from the gas (N^+, p^+, T^+) on the right.)

In the late 1950s, using the two laws of equilibrium thermodynamics (i.e., thermostatics), Landau and Lifshitz concluded that the adiabatic piston will evolve toward a final state where p^−/T^− = p^+/T^+. Later, Callen (1963) and others realized that the maximum entropy condition implies that the system will reach mechanical equilibrium where the pressures are equal, p_f^− = p_f^+; however, nothing could be said concerning the final position X_f or the final temperatures T_f^±, which should depend explicitly on the viscosity of the fluids. It thus became a controversial problem since one was forced to accept that the two laws of thermostatics are not sufficient to predict the final state as soon as adiabatic movable walls are involved (see early references in Gruber (1999)).

Experimentally, the adiabatic piston was used already before 1924 to measure the ratio c_p/c_v of the specific heats of gases. In 2000, new measurements showed that one has to distinguish between two regimes, corresponding to weak damping or strong damping, with very different properties: for example, for weak damping the frequency of oscillations corresponds to adiabatic oscillations, whereas for strong damping it corresponds to isothermal oscillations.

Microscopic Problem

The adiabatic piston was first considered from a microscopic point of view by Lebowitz who introduced in 1959 a simple model to study heat conduction. In this model, the gas consists of point particles of mass m making purely elastic collisions on the wall of the cylinder and on the piston. Furthermore, the gas is very dilute so that the
equation of state p = n k_B T is satisfied at equilibrium, where n is the density of particles in the gas and k_B the Boltzmann constant. The adiabatic piston is taken as a heavy particle of mass M ≫ m without any internal degree of freedom. Using this same model Feynman (1965) gave a qualitative analysis in Lectures in Physics. He argued intuitively but correctly that the system should converge first toward a state of mechanical equilibrium where p^− = p^+ and then very slowly toward thermal equilibrium. This approach toward thermal equilibrium is associated with the wiggles of the piston induced by the random collisions with the atoms of the gas. Of course, this stochastic behavior is not part of thermodynamics and the evolution beyond the mechanical equilibrium cannot appear in the macroscopical framework assuming that the piston does not conduct heat.

From a microscopical point of view, one is confronted with two different problems: the approach toward mechanical equilibrium in the absence of any a priori friction (where the entropy of both gases should increase) and, on a different timescale, the approach toward thermal equilibrium (where the entropy of one gas should decrease but the total entropy increase).

The conceptual difficulties of the problem beyond mechanical equilibrium come from the following intuitive reasoning. When the piston moves toward the hotter gas, the atoms of the hotter gas gain energy, whereas those of the cooler gas lose energy. When the piston moves toward the cooler side, it is the opposite. Since on an average the hotter side should cool down and the cold side should warm up, we are led to conclude that on an average the piston should move toward the colder side. On the other hand, from p = n k_B T, the piston should move toward the warmer side to maintain pressure balance.

In 1996, Crosignani, Di Porto, and Segev introduced a kinetic model to obtain equations describing the adiabatic approach toward mechanical equilibrium. Starting with the microscopical model introduced by Lebowitz, Gruber, Piasecki, and Frachebourg, later joined by Lesne and Pache, initiated in 1998 a systematic investigation of the adiabatic piston within the framework of statistical mechanics, together with a large number of numerical simulations. This analysis was based on the fact that m/M is a very small parameter to investigate expansions in powers of m/M (see Gruber and Piasecki (1999) and Gruber et al. (2003) and reference therein). An approach using dynamical system methods was then developed by Lebowitz et al. (2000) and Chernov et al. (2002). An extension to hard-disk particles was analyzed at the same time by Kestemont et al. (2000). Recently, several other authors have contributed to this subject.

The general picture which emerges from all the investigations is the following. For an infinite cylinder, starting with mechanical equilibrium p^− = p^+ = p, the piston evolves to a stationary stochastic state with nonzero velocity toward the warmer side

\[ \langle V \rangle = \frac{m}{M} \sqrt{\frac{\pi k_B}{8m}} \left( \sqrt{T^+} - \sqrt{T^-} \right) + o\!\left(\frac{m}{M}\right) \]   [1]

with relaxation time

\[ \tau = \frac{M}{A} \sqrt{\frac{\pi k_B}{8m}}\, \frac{1}{p} \left( \frac{1}{\sqrt{T^-}} + \frac{1}{\sqrt{T^+}} \right)^{-1} \]   [2]

where M/A is the mass per unit area of the piston. In this state the piston has a temperature T_P = √(T^+ T^−) and there is a heat flux

\[ j_Q = \left( \sqrt{T^-} - \sqrt{T^+} \right) \frac{m}{M} \sqrt{\frac{8 k_B}{\pi m}}\; p + o\!\left(\frac{m}{M}\right) \]   [3]

For a finite cylinder and p^− ≠ p^+, the evolution proceeds in four different stages. The first two are deterministic and adiabatic. They correspond to the thermodynamic evolution of the (macroscopic) adiabatic piston. The last two stages, which go beyond thermodynamics, are stochastic with heat transfer across the piston. More precisely:

1. In the first stage whose duration is the time needed for the shock wave to bounce back on the piston, the evolution corresponds to the case of the infinite cylinder (with p^− ≠ p^+). If R = Nm/M > 10, the piston will be able to reach and maintain a constant velocity

\[ \bar{V} = \sqrt{\frac{\pi k_B}{8m}}\; \frac{(p^- - p^+) \sqrt{T^- T^+}}{p^- \sqrt{T^+} + p^+ \sqrt{T^-}} + O\!\left(\frac{m}{M}\right) \quad \text{for } |p^- - p^+| \ll 1 \]   [4]

2. In the second stage the evolution toward mechanical equilibrium is either weakly or strongly damped depending on R. If R < 1, the evolution is very weakly damped, the dynamics takes place on a timescale t' = √R t, and the effect of the collisions on the piston is to introduce an external potential Φ(X) = c_1/X² + c_2/(L − X)². On the other hand, if R > 4, the evolution is strongly damped (with two oscillations only) and depends neither on M nor on R.
3. After mechanical equilibrium has been reached, the third stage is a stochastic approach toward thermal equilibrium associated with heat transfer across the piston. This evolution is very slow and exhibits a scaling property with respect to t' = mt/M.

4. After thermal equilibrium has been reached (T^− = T^+, p^− = p^+), in a fourth stage the gas will evolve very slowly toward a state with Maxwellian distribution of velocities, induced by the collision with the stochastic piston.

The general conclusion is thus that a wall which is adiabatic when fixed will become a heat conductor under a stochastic motion. However, it should be stressed that the time required to reach thermal equilibrium will be several orders of magnitude larger than the age of the universe for a macroscopical piston and such a wall could not reasonably be called a heat conductor. However, for mesoscopic systems, the effect of stochasticity may lead to very interesting properties, as shown by Van den Broeck et al. (2004) in their investigations of Brownian (or biological) motors.

Microscopical Model

The system consists of two fluids separated by an adiabatic piston inside a cylinder with x-axis, length L, and area A. The fluids are made of N^± identical light particles of mass m. The piston is a heavy flat disk, without any internal degree of freedom, of mass M ≫ m, orthogonal to the x-axis, and velocity parallel to this x-axis. If the piston is fixed at some position X_0, and if the two fluids are in thermal equilibrium characterized by (p_0^±, T_0^±, N^±), then they will remain in equilibrium forever even if T_0^− ≠ T_0^+: it is thus an adiabatic piston in the sense of thermodynamics. At a certain time t = 0, the piston is let free to move and the problem is to study the time evolution. To define the dynamics, we consider that the system is purely Hamiltonian, that is, the particles and the piston move without any friction according to the laws of mechanics. In particular, the collisions between the particles and the walls of the cylinder, or the piston, are purely elastic and the total energy of the system is conserved. In most studies, one considers that the particles are point particles making purely elastic collisions. Since the piston is bound to move only in the x-direction, the velocity components of the particles in the transverse directions play no role in this problem. Moreover, since there is no coupling between the components in the x- and transverse directions, one can simplify the model further by assuming that all probability distributions are independent of the transverse coordinates. We are thus led to a formally one-dimensional problem (except for normalizations). Therefore, in this review, we consider that the particles are noninteracting and all velocities are parallel to the x-axis. From the collision law, if v and V denote the velocities of a particle and the piston before a collision, then under the collision on the piston:

\[ v \to v' = 2V - v + \alpha (v - V), \qquad V \to V' = V + \alpha (v - V) \]   [5]

where

\[ \alpha = \frac{2m}{M + m} \]   [6]

Similarly, under a collision of a particle with the boundary at x = 0 or x = L:

\[ v \to v' = -v \]   [7]

Let us mention that more general models have also been considered, for example, the case where the two fluids are made of point particles with different masses m^±, or two-dimensional models where the particles are hard disks. However, no significant differences appear in these more general models and we restrict this article to the simplest case.

One can study different situations: L = ∞, L finite, and L → ∞. Furthermore, taking first M and A finite, one can investigate several limits.

1. Thermodynamic limit for the piston only. In this limit, L is fixed (finite or infinite) and A → ∞, M → ∞, keeping constant the initial densities n^± of the fluid and the parameter

\[ \gamma = \frac{2mA}{M + m} = \alpha A \cong 2m \frac{A}{M} \]   [8]

If L is finite, this means that N^± → ∞ while keeping constant the parameters

\[ R^{\pm} = \frac{m N^{\pm}}{M} = \frac{M^{\pm}_{gas}}{M} \]   [9]

2. Thermodynamic limit for the whole system, where L → ∞ and A ∼ L², N^± ∼ L³. In this limit, space and time variables are rescaled according to x' = x/L and t' = t/L. This limit can be considered as a limiting case of (1) where R^± ∼ √A → ∞ (and time is scaled).

3. Continuum limit where L and M are fixed and N^± → ∞, m → 0 keeping M^±_gas constant, that is, R^± = const.

The case L infinite and the limit (1) have been investigated using statistical mechanics (Liouville or Boltzmann equations).
Boltzmann's equations). On the other hand, the limit (2) has been studied using dynamical system methods, reducing first the system to a billiard in an $(N^+ + N^- + 1)$-dimensional polyhedron. The limit (3) has been introduced to derive hydrodynamical equations for the fluids.

In this article, we present the approach based on statistical mechanics. Although not as rigorous as (2) on a mathematical level, it yields more information on the approach toward mechanical and thermal equilibrium. Moreover, it indicates which open problems should be mathematically solved. In all investigations, advantage is taken of the fact that m/M is very small and one introduces the small parameter

$$\epsilon = \sqrt{m/M} \ll 1 \qquad [10]$$

Let us note that $\epsilon$ measures the ratio of thermal velocities for the piston and a fluid particle, whereas $\alpha \simeq 2\epsilon^2$ measures the ratio of velocity changes during a collision.

Starting Point: Exact Equations

Using the statistical point of view, the time evolution is given by Liouville's equation for the probability distribution on the whole phase space for $(N^+ + N^- + 1)$ particles, with L, A, $N^{\pm}$, and M finite. Initially ($t \le 0$), the piston is fixed at $(X_0, V_0 = 0)$ and the fluids are in thermal equilibrium with homogeneous densities $n_0^{\pm}$, velocity distributions $\varphi_0^{\pm}(v) = \varphi_0^{\pm}(-v)$, and temperatures

$$k_B T_0^{\pm} = m \int_{-\infty}^{\infty} dv\, \varphi_0^{\pm}(v)\, v^2 \qquad [11]$$

Integrating out the irrelevant degrees of freedom, Liouville's equation yields the equations for the distribution $\rho^{\pm}(x, v; t)$ of the right and left particles:

$$\partial_t \rho^{\pm}(x, v; t) + v\, \partial_x \rho^{\pm}(x, v; t) = I^{\pm}(x, v; t) \qquad [12]$$

The collision term $I^{\pm}(x, v; t)$ is a functional of $\rho_{\pm,P}(X, v; X, V; t)$, the two-point correlation function for a right (resp. left) particle at (x = X, v) and the piston at (X, V). Similarly, one obtains for the velocity distribution of the piston:

$$\partial_t \Phi(V; t) = A \int_{-\infty}^{\infty} dv\, (V - v) \left[ \theta(V - v)\, \rho^{-}_{\rm surf}(v'; V'; t) + \theta(v - V)\, \rho^{-}_{\rm surf}(v; V; t) \right] - A \int_{-\infty}^{\infty} dv\, (V - v) \left[ \theta(v - V)\, \rho^{+}_{\rm surf}(v'; V'; t) + \theta(V - v)\, \rho^{+}_{\rm surf}(v; V; t) \right] \qquad [13]$$

where $(v', V')$ are given by eqn [5] and

$$\rho^{\pm}_{\rm surf}(v; V; t) = \int_{-\infty}^{\infty} dX\, \rho_{\pm,P}(X, v; X, V; t) \qquad [14]$$

We thus have to solve eqns [12]–[13] with initial conditions

$$\rho^{-}(x, v; t = 0) = n_0^{-} \varphi_0^{-}(v)\, \theta(x)\, \theta(X_0 - x)$$
$$\rho^{+}(x, v; t = 0) = n_0^{+} \varphi_0^{+}(v)\, \theta(L - x)\, \theta(x - X_0) \qquad [15]$$
$$\Phi(V; t = 0) = \delta(V)$$

Using the fact that $\alpha = 2m/(M + m) \ll 1$, we can rewrite eqn [13] as a formal series in powers of $\alpha$:

$$\partial_t \Phi(V; t) = \gamma \sum_{k=1}^{\infty} \frac{(-1)^k \alpha^{k-1}}{k!} \left( \frac{\partial}{\partial V} \right)^{k} \tilde F_{k+1}(V; t) \qquad [16]$$

$$\tilde F_k(V; t) = \int_{V}^{\infty} (v - V)^k \rho^{-}_{\rm surf}(v; V; t)\, dv - \int_{-\infty}^{V} (v - V)^k \rho^{+}_{\rm surf}(v; V; t)\, dv \qquad [17]$$

from which one obtains the equations for the moments of the piston velocity:

$$\frac{1}{\gamma} \frac{d \langle V^n \rangle}{dt} = \sum_{k=1}^{n} \frac{n!}{k!\,(n-k)!}\, \alpha^{k-1} \int_{-\infty}^{\infty} dV\, V^{n-k}\, \tilde F_{k+1}(V; t) \qquad [18]$$

However, we do not know the two-point correlation functions.

If the length of the cylinder is infinite, the condition $M \gg m$ implies that the probability for a particle to make more than one collision on the piston is negligible. Alternatively, one could choose initial distributions $\varphi_0^{\pm}(v)$ which are zero for $|v| < v_{\min}$, where $v_{\min}$ is taken such that the probability of a recollision is strictly zero. Therefore, if $L = \infty$, one can consider that before a collision on the piston the particles are distributed with $\rho_0^{\pm}(v)$ for all t, and the two-point correlation functions factorize, that is,

$$\rho^{-}_{\rm surf}(v; V; t) = \rho^{-}_{\rm surf}(v; t)\, \Phi(V; t), \quad \text{if } v > V$$
$$\rho^{+}_{\rm surf}(v; V; t) = \rho^{+}_{\rm surf}(v; t)\, \Phi(V; t), \quad \text{if } v < V \qquad [19]$$

where for $L = \infty$, $\rho^{\pm}_{\rm surf}(v; t) = n_0^{\pm} \varphi_0^{\pm}(v)$ and thus the conditions to obtain eqn [18] are satisfied.

If L is finite, one can show that the factorization property (eqn [19]) is an exact relation in the thermodynamic limit for the piston ($A \to \infty$, M/A = const). For finite L and finite A, we introduce
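Assumption 1 (Factorization condition). Before a collision the two-point correlation functions have the factorization property (eqn [19]) to first order in $\alpha$.

Under the factorization condition, we have

$$\tilde F_k(V; t) = F_k(V; t)\, \Phi(V; t) \qquad [20]$$

with

$$F_k(V; t) = \int_{V}^{\infty} dv\, (v - V)^k \rho^{-}_{\rm surf}(v; t) - \int_{-\infty}^{V} dv\, (v - V)^k \rho^{+}_{\rm surf}(v; t) = F_k^{-}(V; t) - F_k^{+}(V; t) \qquad [21]$$

and from eqn [18]

$$\frac{M}{A} \frac{d}{dt} \langle V \rangle = M \alpha \langle F_2(V; t) \rangle \qquad [22]$$

$$\frac{M}{A} \frac{d}{dt} \langle V^2 \rangle = M \alpha \left[ 2 \langle V F_2(V; t) \rangle + \alpha \langle F_3(V; t) \rangle \right] \qquad [23]$$

Introducing $\bar V = \langle V \rangle$, then from eqns [12] and [20], it follows that the (kinetic) energies satisfy

$$\frac{d}{dt} \frac{\langle E^{\pm} \rangle}{A} = \pm M \alpha \left[ \langle F_2^{\pm}(V; t) \rangle \bar V + \langle (V - \bar V) F_2^{\pm}(V; t) \rangle + \frac{\alpha}{2} \langle F_3^{\pm}(V; t) \rangle \right] \qquad [24]$$

which implies conservation of energy.

From the first law of thermodynamics,

$$\frac{d}{dt} \frac{\langle E^{\pm} \rangle}{A} = \frac{1}{A} \left[ P_W^{P \to \pm} + P_Q^{P \to \pm} \right] \qquad [25]$$

where $P_W^{P \to \pm}$ and $P_Q^{P \to \pm}$ denote the work- and heat-power transmitted by the piston to the fluid, we conclude from eqns [22] and [25] that the heat flux is

$$\frac{1}{A} P_Q^{P \to \pm} = \pm M \alpha \left[ \langle (V - \bar V) F_2^{\pm}(V; t) \rangle + \frac{\alpha}{2} \langle F_3^{\pm}(V; t) \rangle \right] \qquad [26]$$

Since $\alpha \ll 1$, it is interesting to introduce the irreducible moments

$$\Delta_r = \langle (V - \bar V)^r \rangle \qquad [27]$$

and the expansion around $\bar V = \langle V \rangle_t$,

$$F_n(V; t) = \sum_{r=0}^{\infty} \frac{1}{r!} F_n^{(r)}(\bar V; t)\, (V - \bar V)^r \qquad [28]$$

from which one obtains equations for $d\Delta_r/dt$. In particular, using the identities

$$F_3^{(r+1),\pm} = -3 F_2^{(r),\pm}, \qquad F_2^{(r+2),\pm} = 2 F_0^{(r),\pm} \qquad [29]$$

in [22] and [24], we have

$$\langle F_2^{\pm}(V; t) \rangle = F_2^{\pm}(\bar V; t) + \sum_{r=0}^{\infty} \frac{2}{(2 + r)!} F_0^{(r),\pm}(\bar V; t)\, \Delta_{2+r} \qquad [30]$$

$$\frac{d}{dt} \frac{\langle E^{\pm} \rangle}{A} = \pm M \alpha \left[ \langle F_2^{\pm}(V; t) \rangle \bar V + \frac{\alpha}{2} F_3^{\pm}(\bar V; t) + \frac{1}{2} \sum_{r=2}^{\infty} \frac{2r - 3\alpha}{r!} F_2^{(r-1),\pm}(\bar V; t)\, \Delta_r \right] \qquad [31]$$

Depending on the questions or approximations one wants to study, either the distribution $\Phi(V; t)$ or the moments $\langle V^n \rangle_t$ will be the interesting objects.

Finally, with the condition [19], one can take eqn [12] for $x \neq X_t$ and impose the boundary conditions at $x = X_t$:

$$\rho^{-}(X_t, v; t) = \rho^{-}(X_t, v'; t), \quad \text{if } v < V_t$$
$$\rho^{+}(X_t, v; t) = \rho^{+}(X_t, v'; t), \quad \text{if } v > V_t \qquad [32]$$

and similarly for x = 0 and x = L with $v' = -v$.

Let us note that this factorization condition is of the same nature as the molecular chaos assumption introduced in kinetic theory, and with this condition eqn [13] yields the Boltzmann equation for this model.

In the following, to obtain explicit results as a function of the initial temperatures $T_0^{\pm}$, we take Maxwellian distributions $\varphi_0^{\pm}(v)$ and initial conditions $(p_0^{\pm}, T_0^{\pm}, n_0^{\pm})$ such that the velocity of the piston remains small (i.e., $|\langle V \rangle_t| \ll |\langle v^{\pm} \rangle_0|$).

Distribution $\Phi(V; t)$ for the Infinite Cylinder ($L = \infty$)

To lowest order in $\epsilon = \sqrt{m/M}$, and assuming $|1 - p^+/p^-|$ is of order $\epsilon$, one obtains from eqn [16] the usual Fokker–Planck equation whose solution gives

$$\Phi_0(V; t) = \frac{1}{\sqrt{2\pi}\, \Delta(t)} \exp\left[ -\frac{(V - \bar V(t))^2}{2 \Delta^2(t)} \right] \qquad [33]$$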
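with

$$\bar V(t) = \sqrt{\frac{\pi k_B}{8m}}\; \frac{p^- - p^+}{p^-/\sqrt{T^-} + p^+/\sqrt{T^+}}\, \left( 1 - e^{-\lambda t} \right)$$
$$\lambda = \frac{A}{M} \sqrt{\frac{8m}{\pi k_B}} \left( \frac{p^-}{\sqrt{T^-}} + \frac{p^+}{\sqrt{T^+}} \right) \qquad [34]$$
$$\Delta^2(t) = \frac{k_B}{M} \sqrt{T^- T^+}\; \frac{p^- \sqrt{T^-} + p^+ \sqrt{T^+}}{p^- \sqrt{T^+} + p^+ \sqrt{T^-}}\, \left( 1 - e^{-2\lambda t} \right)$$

where we have dropped the index zero on the variables $T^{\pm}$, $n^{\pm}$ and used the equation of state $p^{\pm} = n^{\pm} k_B T^{\pm}$.

In conclusion, in the thermodynamic limit for the piston ($M \to \infty$, M/A fixed), eqn [33] shows that the evolution is deterministic, that is, $\Phi(V; t) = \delta(V - \bar V(t))$, where the velocity $\bar V(t)$ of the piston tends exponentially fast toward the stationary value $V_{\rm stat} = \bar V(\infty)$ with relaxation time $\tau = \lambda^{-1}$.

Let us note that for $p^+ = p^-$, we have $\bar V(t) \equiv 0$ and the evolution [33] is identical to the Ornstein–Uhlenbeck process of thermalization of the Brownian particle starting with zero velocity and friction coefficient $\lambda$. The analysis of [16] to first order in $\epsilon$ yields then

$$\Phi(V; t) = \left[ 1 + \epsilon \sum_{k=0}^{3} a_k(t)\, (V - \bar V(t))^k \right] \Phi_0(V; t) \qquad [35]$$

where $a_k(t)$ can be explicitly calculated and $a_0(t) = -\Delta^2(t)\, a_2(t)$ because of the normalization condition. Moreover, $a_2(t) \sim (p^- - p^+)$, that is, $a_2(t) = 0$ if $p^+ = p^-$. From [35], one obtains

$$\langle V \rangle_t = \sqrt{\frac{\pi k_B}{8m}}\; \frac{\sqrt{T^- T^+}}{p^- \sqrt{T^+} + p^+ \sqrt{T^-}} \left\{ (p^- - p^+) \left( 1 - e^{-\lambda t} \right) + \frac{\pi}{8}\, (p^- - p^+)^2\, \frac{p^- T^+ - p^+ T^-}{\left( p^- \sqrt{T^+} + p^+ \sqrt{T^-} \right)^2} \left( 1 - 2 \lambda t\, e^{-\lambda t} - e^{-2\lambda t} \right) + \frac{m}{M} \frac{1}{\sqrt{T^- T^+}} \left( p^- T^+ - p^+ T^- \right) \frac{p^- \sqrt{T^-} + p^+ \sqrt{T^+}}{p^- \sqrt{T^+} + p^+ \sqrt{T^-}} \left( 1 - e^{-\lambda t} \right)^2 \right\} \qquad [36]$$

and

$$\langle V^2 \rangle_t - \langle V \rangle_t^2 = \Delta^2(t) \left[ 1 + 2 \sqrt{\frac{m}{M}}\, \Delta^2(t)\, a_2(t) \right] \qquad [37]$$

From eqn [36], we now conclude that for equal pressures $p^+ = p^-$, the piston will evolve stochastically to a stationary state with nonzero velocity toward the warmer side

$$\langle V \rangle_{\rm stat} = \frac{m}{M} \sqrt{\frac{\pi k_B}{8m}} \left( \sqrt{T^+} - \sqrt{T^-} \right), \qquad \langle V^2 \rangle_{\rm stat} - \langle V \rangle_{\rm stat}^2 = \frac{k_B}{M} \sqrt{T^+ T^-}, \qquad \text{if } p^+ = p^- \qquad [38]$$

Let us remark that we have established eqn [35] under the condition that $|1 - p^+/p^-| = O(\epsilon)$, but as we see in the next section, the stationary value $V_{\rm stat}$ obtained from eqn [36] remains valid whenever $|(1 - p^+/p^-)(1 - T^+/T^-)| \ll 1$.

Moments $\langle V^n \rangle_t$: Thermodynamic Limit for the Piston

General Equations: Adiabatic Evolution

In the thermodynamic limit $M \to \infty$, $\alpha \to 0$, $\gamma = \alpha A$ is fixed and eqn [16] reduces to

$$\partial_t \Phi(V; t) = -\gamma \frac{\partial}{\partial V} \tilde F_2(V; t) \qquad [39]$$

Integrating [39] with initial condition $\Phi(V; t = 0) = \delta(V)$ yields

$$\Phi(V; t) = \delta(V - V(t)), \quad \text{that is,} \quad \langle V^n \rangle_t = \langle V \rangle_t^n \qquad [40]$$

where

$$\frac{d}{dt} V(t) = \gamma F_2(V(t); t), \qquad V(t = 0) = 0 \qquad [41]$$

Moreover,

$$\tilde F_2(V; t) = F_2(V; t)\, \Phi(V; t) \qquad [42]$$

and

$$\rho_{\pm,P}(x, v; X, V; t) = \rho^{\pm}(x, v; t)\, \delta(X - X(t))\, \delta(V - V(t)) \qquad [43]$$

where $dX(t)/dt = V(t)$, $X(t = 0) = X_0$.

In conclusion, as already mentioned, in this limit the factorization condition (eqn [19]) is an exact relation. Let us note that $\rho^{\pm}_{\rm surf}(v; t) = \rho^{\pm}_{\rm surf}(2V - v; t)$ if $v > V(t)$ (on the right) or $v < V(t)$ (on the left). Let us also remark that $2m F_2^{\pm}(V(t); t)$ represents the effective pressure from the right/left exerted on the piston. Moreover, since for any distribution $\rho^{\pm}_{\rm surf}(v; t)$, the functions $F_2^{-}(V; t)$ and $-F_2^{+}(V; t)$ are monotonically decreasing, we can introduce the decomposition

$$p^{\pm}_{\rm surf} \equiv 2m F_2^{\pm}(V; t) = \hat p^{\pm} \pm \frac{M}{A} \lambda^{\pm}(V; t)\, V \qquad [44]$$

where the static pressure at the surface is $\hat p^{\pm}(t) = p^{\pm}_{\rm surf}(V = 0; t)$ and the friction coefficients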
$\lambda^{\pm}(V; t)$ are strictly positive. The evolution [41] is thus of the form

$$\frac{d}{dt} V(t) = \frac{A}{M} \left( \hat p^- - \hat p^+ \right) - \lambda(V)\, V \qquad [45]$$

It involves the difference of static pressure and the friction coefficient $\lambda(V) = \lambda^-(V) + \lambda^+(V)$. Finally, from eqn [12], we obtain the evolution of the (kinetic) energy per unit area for the fluids in the left and right compartments:

$$\frac{d}{dt} \frac{\langle E^{\pm} \rangle}{A} = \pm 2m F_2^{\pm}(V; t)\, V \qquad [46]$$

Therefore, from [40] and [46], and the first law of thermodynamics, we recover the conclusions obtained in the previous section, that is, in the thermodynamic limit for the piston, the evolution (eqns [41], [12], and [35]) is deterministic and adiabatic (i.e., in [46] only work and no heat is involved).

Infinite Cylinder ($L = \infty$, $M = \infty$)

As already discussed, for $L = \infty$ we can neglect the recollisions. Therefore, in $F_2$ the distribution $\rho^{\pm}_{\rm surf}(v; t)$ can be replaced by $n_0^{\pm} \varphi_0^{\pm}(v)$ and $F_2(V)$ is independent of t. In this case, the evolution of the piston is simply given by the ordinary differential equation

$$\frac{d}{dt} V(t) = \frac{A}{M}\, 2m F_2(V), \qquad V(t = 0) = 0 \qquad [47]$$

where $F_2(V)$ is a strictly decreasing function of V. If $p_0^+ = p_0^-$, then V(t) = 0, that is, the piston remains at rest and the two fluids remain in their original thermal equilibrium. If $p_0^+ \neq p_0^-$, that is, $n_0^+ k_B T_0^+ \neq n_0^- k_B T_0^-$, the piston will evolve monotonically to a stationary state with constant velocity $V_{\rm stat}$ solution of $F_2(V_{\rm stat}) = 0$. From [34], it follows that $V_{\rm stat}$ is a function of $n_0^+/n_0^-$, $T_0^+$, $T_0^-$ but does not depend on the value M/A. Moreover, the approach to this stationary state is exponentially fast with relaxation time $\tau_0 = 1/\lambda(V = 0)$. For Maxwellian distributions $\varphi_0^{\pm}(v)$, $V_{\rm stat}$ is a solution of

$$k_B \left( n_0^- T_0^- - n_0^+ T_0^+ \right) - V_{\rm stat} \sqrt{\frac{8 k_B m}{\pi}} \left( n_0^- \sqrt{T_0^-} + n_0^+ \sqrt{T_0^+} \right) + V_{\rm stat}^2\, m \left( n_0^- - n_0^+ \right) + O(V_{\rm stat}^3) = 0 \qquad [48]$$

Moreover,

$$\tau_0^{-1} = \frac{A}{M} \sqrt{\frac{8 k_B m}{\pi}} \left( n_0^- \sqrt{T_0^-} + n_0^+ \sqrt{T_0^+} \right) \qquad [49]$$

which implies that the relaxation time will be very small either if $M/A \ll 1$, or if $n_0^{\pm} = \delta\, \tilde n_0^{\pm}$ with $\delta \gg 1$. In this case, the piston acquires almost immediately its final velocity $V_{\rm stat}$ and one can solve eqn [12] to obtain the evolution of the fluids.

Finite Cylinder ($L < \infty$, $M = \infty$)

For finite L, introducing the average temperature in the fluids

$$T^{\pm}_{\rm av} = \frac{2 \langle E^{\pm} \rangle_t}{k_B N^{\pm}} \qquad [50]$$

we have to solve [41] and [46], that is,

$$\frac{d}{dt} V(t) = \frac{A}{M}\, 2m \left[ F_2^-(V; t) - F_2^+(V; t) \right]$$
$$\frac{d}{dt} \left( k_B T^{\pm}_{\rm av} \right) = \pm 4m\, \frac{A}{N^{\pm}}\, F_2^{\pm}(V; t)\, V \qquad [51]$$

where $F_2^{\pm}(V; t)$ is a functional of $\rho^{\pm}_{\rm surf}(v; t)$ which we decompose as

$$2m F_2^{\pm}(V; t) = \hat n^{\pm}(t)\, k_B \hat T^{\pm}(t) \pm \frac{M}{A} \lambda^{\pm}(V; t)\, V \qquad [52]$$

with

$$\hat n^{-}(t) = 2 \int_{0}^{\infty} dv\, \rho^{-}_{\rm surf}(v; t), \qquad \hat n^{+}(t) = 2 \int_{-\infty}^{0} dv\, \rho^{+}_{\rm surf}(v; t) \qquad [53]$$

and

$$\hat n^{\pm} k_B \hat T^{\pm} = \hat p^{\pm} \qquad [54]$$

For a time interval $\tau_1 = L \sqrt{m / k_B T}$ which is the time for the shock wave to bounce back, the piston will evolve as already discussed. In particular, if $R^{\pm}$ is sufficiently large, then after a time $\tau_0 = O((R^{\pm})^{-1})$ the piston will reach the velocity $\bar V$ given by $F_2(\bar V, t) = 0$ (eqn [47]). For $t > \tau_1$, $F_2(V; t)$ depends explicitly on time. For $R^{\pm}$ sufficiently large, we can expect that for all t the velocity V(t) will be a functional of $\rho^{\pm}_{\rm surf}(v; t)$ given by $F_2[V(t); \rho_{\rm surf}(\cdot\,; t)] = 0$, and thus the problem is to solve eqn [12] with the boundary condition (eqn [32]). Since V(t) so defined is independent of M/A, the evolution will be independent of M/A if $R^{\pm}$ is sufficiently large. This conclusion, which we cannot prove rigorously, will be confirmed by numerical simulations.

To give a qualitative discussion of the evolution for arbitrary values of $R^{\pm}$, we shall use the following assumption already introduced in the experimental measurement of $c_p/c_v$.

Assumption 2 (Average assumption). The surface coefficients $\hat n^{\pm}(t)$ and $\hat T^{\pm}(t)$ (eqns [52]–[53]) coincide to order 1 in $\alpha$ with the average value of the density and temperature in the fluids, that is,
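$$\hat n^{-} = \frac{N^-}{A\, X(t)}, \qquad \hat n^{+} = \frac{N^+}{A\, (L - X(t))}, \qquad \hat T^{\pm} = T^{\pm}_{\rm av}(t) \qquad [55]$$

We still need an expression for the friction coefficients. From

$$2m F_2^{\pm}(V; t) = \hat p^{\pm}(t) - 4m V F_1^{\pm}(V = 0; t) + m V^2 \hat n^{\pm}(t) + O(V^3) \qquad [56]$$

then, assuming that to first order in $\alpha$, $F_1(V = 0; t)$ is the same function of $\hat T^{\pm}(t)$ as for Maxwellian distributions, we have

$$\lambda^{\pm}(V) = \frac{A}{M}\, m\, \hat n^{\pm} \left[ \sqrt{\frac{8 k_B \hat T^{\pm}}{\pi m}} \pm V \right] + O(V^2) \qquad [57]$$

Therefore, choosing initial condition such that V(t) is small for all time, eqn [51] yields

$$\sqrt{\hat T^-}\, X - \sqrt{\hat T^+}\, (L - X) = C \equiv \sqrt{T_0^-}\, X_0 - \sqrt{T_0^+}\, (L - X_0) \qquad [58]$$

We thus obtain the equilibrium point for the adiabatic evolution ($M = \infty$):

$$\frac{N^-}{A} T_f^- = \frac{2 E_0}{A k_B} \frac{X_f}{L} \qquad [59]$$

$$\frac{N^+}{A} T_f^+ = \frac{2 E_0}{A k_B} \left( 1 - \frac{X_f}{L} \right) \qquad [60]$$

where

$$\frac{2 E_0}{A k_B} = \frac{N^-}{A} T_0^- + \frac{N^+}{A} T_0^+ \qquad [61]$$

and

$$\sqrt{\frac{A}{N^-}}\, \sqrt{X_f^3} - \sqrt{\frac{A}{N^+}}\, \sqrt{(L - X_f)^3} = C \sqrt{\frac{k_B A L}{2 E_0}} \qquad [62]$$

Solving [58]–[62] gives the equilibrium state $(X_f, T_f^{\pm})$, which is a state of mechanical equilibrium $p_f^- = p_f^+$, but not thermal equilibrium $T_f^- \neq T_f^+$. Moreover, this equilibrium state does not depend on M. Having obtained the equilibrium point, we can then investigate the evolution close to the equilibrium point. Linearizing eqn [51] around $(X_f, T_f^{\pm})$ yields:

$$\frac{d}{dt} V = k_B \left[ \frac{N^- T_f^-}{M} \frac{X_f^2}{X^3} - \frac{N^+ T_f^+}{M} \frac{(L - X_f)^2}{(L - X)^3} \right] - \lambda(V = 0)\, V \qquad [63]$$

In other words, the effect of collisions on the piston is to induce an external potential of the form $[c_1 |X|^{-2} + c_2 (L - X)^{-2}]$ and a friction force. It is a damped harmonic oscillator with

$$\omega_0^2 = 6\, \frac{E_0}{M}\, \frac{1}{X_f (L - X_f)}, \qquad \lambda = 4 \sqrt{\frac{1}{\pi}} \sqrt{\frac{E_0}{M L}} \left[ \sqrt{\frac{R^-}{X_f}} + \sqrt{\frac{R^+}{L - X_f}} \right] \qquad [64]$$

(recall that $R^{\pm} = m N^{\pm}/M$). For the case $N^+ = N^-$ to be considered in the simulations, eqn [64] implies that the motion is weakly damped if

$$R < R_{\max} = \frac{3\pi}{2} \left[ \sqrt{\frac{X_f}{L}} + \sqrt{1 - \frac{X_f}{L}} \right]^{-2} \qquad [65]$$

with period

$$\tau = \frac{2\pi}{\omega_0}\, \frac{1}{\sqrt{1 - R/R_{\max}}} \qquad [66]$$

and strongly damped if $R > R_{\max}$, in agreement with experimental observations.

Moments $\langle V^n \rangle_t$: Piston with Finite Mass

Equation to First Order in $\alpha = 2m/(M + m)$

If the mass of the piston is finite with $M \gg m$, then the irreducible moments $\Delta_r$ are of the order $\alpha^{[(r+1)/2]}$ where $[(r+1)/2]$ is the integral part of $(r+1)/2$. If the factorization condition [19] is satisfied, to first order in $\alpha$ we have

$$\langle V^n \rangle_t = \bar V^n(t) + \frac{n(n-1)}{2}\, \bar V^{n-2}(t)\, \Delta^2(t) \qquad [67]$$

and where $\bar V(t) = \langle V \rangle_t$ and $\Delta^2(t) = \langle V^2 \rangle_t - \langle V \rangle_t^2$ are solutions of

$$\frac{1}{\gamma} \frac{d}{dt} \bar V(t) = F_2 + \Delta^2 F_0$$
$$\frac{1}{\gamma} \frac{d}{dt} \Delta^2(t) = -4 \Delta^2 F_1 + \alpha F_3 \qquad [68]$$
$$\frac{1}{\gamma} \frac{d}{dt} \langle E^{\pm} \rangle_t = \pm \left[ M \left( F_2^{\pm} + \Delta^2 F_0^{\pm} \right) \bar V - \frac{M}{2} \left( 4 \Delta^2 F_1^{\pm} - \alpha F_3^{\pm} \right) \right]$$

and $\Delta^2 \equiv k_B T_P / M$ defines the temperature of the piston.

Infinite Cylinder: Heat Transfer

For the infinite cylinder, the factorization assumption is an exact relation and in this case the functions $F_k(V; t)$ are independent of t. The solution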
of the autonomous system [68] with $F_k = F_k(\bar V)$ shows that the piston evolves to a stationary state with velocity $\bar V$ given by

$$F_2(\bar V) = -\alpha\, \frac{F_3(\bar V)\, F_0(\bar V)}{4 F_1(\bar V)} \qquad [69]$$

The temperature of the piston is

$$\Delta^2 \equiv \frac{k_B T_P}{M} = \alpha\, \frac{F_3(\bar V)}{4 F_1(\bar V)} \qquad [70]$$

and the heat flux from the piston to the fluid is

$$\frac{1}{A} P_Q^{P \to -} = \frac{2 m^2}{M}\, \frac{F_3^- F_1^+ - F_3^+ F_1^-}{F_1^- - F_1^+} \qquad [71]$$

If we choose initial conditions such that $|\bar V(t)| \ll 1$ for all t, and Maxwellian distributions $\varphi^{\pm}(v)$, the solutions $\bar V(t)$, $\Delta^2(t)$ coincide with the solutions previously obtained (eqns [36] and [37]) and

$$\frac{1}{A} P_Q^{P \to -} = \frac{m}{M} \sqrt{\frac{8 k_B}{m \pi}}\, (T^+ - T^-)\, \frac{p^- p^+}{p^- \sqrt{T^+} + p^+ \sqrt{T^-}} \qquad [72]$$

In conclusion, to first order in m/M, there is a heat flux from the warm side to the cold one proportional to $(T^+ - T^-)$, induced by the stochastic motion of the piston.

Finite Cylinder ($L < \infty$, $M < \infty$)

Singular character of the perturbation approach. Whereas the leading order is actually the thermodynamic behavior $M = \infty$ in the first two stages of the evolution (fast relaxation toward mechanical equilibrium), the fluctuations of order $O(\alpha)$ rule the slow relaxation toward thermal equilibrium. It is thus obvious that a naive perturbation approach cannot give access to both regimes. This difficulty is reminiscent of the boundary-layer problems encountered in hydrodynamics, and the perturbation method to be used here is the exact temporal analog of the matched perturbative expansion method developed for these boundary layers. The idea is to implement two different perturbation approaches:

1. one at short times, with time variable t describing the fast dynamics ruling the fast relaxation toward mechanical equilibrium; and
2. one for longer times, with a rescaled time variable $\tau = \alpha t$.

The second perturbation approach above is supplemented with a slaving principle, expressing that at each time of the slow evolution, that is, at fixed $\tau$, the still present fast dynamics has reached a local asymptotic state, slaved to the values of the slow observables. The initial conditions are set on the first-stage solution. The initial conditions of the second regime match the asymptotic behavior of the first-stage solution (matching condition).

The slaving principle is implemented by interpreting an evolution equation of the form

$$\frac{da}{dt} = \alpha \frac{da}{d\tau} = \mathcal{A}(\tau, a), \qquad \mathcal{A} = O(1) \qquad [73]$$

as follows: it indicates that a is in fact a fast quantity relaxing at short times toward a stationary state $a_{\rm eq}(\tau)$ slaved to the slow evolution and determined by the condition

$$\mathcal{A}[\tau, a_{\rm eq}(\tau)] = 0 \qquad [74]$$

(at lowest order in $\alpha$, actually $\mathcal{A}[\tau, a_{\rm eq}(\tau)] = O(\alpha)$ which prescribes the leading order of $a_{\rm eq}(\tau)$); the following-order terms can be arbitrarily fixed as long as only the first order of perturbation is implemented. Physically, such a condition arises to express that an instantaneous mechanical equilibrium takes place at each time $\tau$ of the slow relaxation to thermal equilibrium.

Equations for the fluctuation-induced evolution of the system. Following this procedure, we arrive at explicit expressions for the rescaled quantities (of order O(1)) $\tilde V = V/\alpha$, $\tilde\Delta^2 = \Delta^2/\alpha$, and $\tilde\Pi = (p^- - p^+)/\alpha$, written with $F_k = F_k^- - F_k^+$ as in eqn [21]:

$$\tilde V = m\, \frac{A L}{3 E_0}\, \frac{F_3^- F_1^+ - F_3^+ F_1^-}{F_1} + O(\alpha)$$
$$\frac{\tilde\Pi}{2m} = 2m\, \frac{A L}{3 E_0} \left( F_3^- F_1^+ - F_3^+ F_1^- \right) - \frac{F_3 F_0}{4 F_1} + O(\alpha) \qquad [75]$$
$$\tilde\Delta^2 = \frac{F_3}{4 F_1} + O(\alpha)$$

We then introduce a (dimensionless) rescaled position for the piston

$$\xi = \frac{1}{2} - \frac{X}{L} \in \left] -\frac{1}{2}, \frac{1}{2} \right[ \qquad [76]$$

which satisfies

$$\frac{d\xi}{d\tau} = -\frac{2A}{3 E_0}\, \frac{F_1^- F_1^+}{F_1}\, k_B \left( T^- - T^+ \right) \qquad [77]$$

To discuss eqn [77], a third assumption has to be introduced.

Assumption 3 (Maxwellian Identities). In the regime when $V = O(\alpha)$, the relations between the functionals $F_1^{\pm}$, $F_2^{\pm}$, and $F_3^{\pm}$ are the same at lowest order in $\alpha$ as if the distributions $\rho^{\pm}_{\rm surf}(v; V; t)$ were Maxwellian in v:
$$F_1^{\pm}(V) = \mp\, n^{\pm} \sqrt{\frac{k_B T^{\pm}}{2 \pi m}}$$
$$F_3^{\pm}(V) = \frac{2 k_B T^{\pm}}{m} F_1^{\pm}(V) - V F_2^{\pm}(V) \qquad [78]$$

Using these identities and the (dimensionless) rescaled time

$$s = \frac{2}{3L} \sqrt{\frac{k_B}{m \pi}} \sqrt{\frac{2 \left( N^- T_0^- + N^+ T_0^+ \right)}{N}}\; \tau \qquad [79]$$

where $N = N^+ + N^-$, we obtain a deterministic equation describing the piston motion (Gruber et al. 2003):

$$\frac{d\xi}{ds} = \sqrt{\frac{N}{2 N^-} (1 - 2\xi)} - \sqrt{\frac{N}{2 N^+} (1 + 2\xi)}, \qquad \xi(0) = \frac{1}{2} - \frac{X_{\rm ad}}{L} \qquad [80]$$

where $X_{\rm ad}$ is the piston position at the end of the adiabatic regime (i.e., $X_f$, eqn [62]). The meaningful observables straightforwardly follow from the solution $\xi(s)$:

$$X(s) = L \left( \frac{1}{2} - \xi(s) \right)$$
$$T^{\pm}(s) = \frac{N^- T_0^- + N^+ T_0^+}{2 N^{\pm}} \left( 1 \pm 2 \xi(s) \right) \qquad [81]$$

The first-order perturbation analysis using a single rescaled time $t_1 = \alpha t_0$ is valid in the regime when $V = O(\alpha)$ and it gives access to the relaxation toward thermal equilibrium up to a temperature difference $T^+ - T^- = O(\alpha)$. For the sake of technical completeness (rather than physical relevance, since the above first-order analysis is enough to get the observable, meaningful behavior), let us mention that the perturbation analysis can be carried over at higher orders; using further rescaled times $t_2 = \alpha^2 t_0, \ldots, t_n = \alpha^n t_0$, it would allow us to control the evolution up to a temperature difference $|T^+ - T^-| = O(\alpha^n)$; however, one could expect that the factorization condition does not hold at higher orders.

Numerical Simulations

As we have seen, the results were established under the condition that m/M is a small parameter. Moreover for finite systems ($L < \infty$, $M < \infty$), it was assumed that before collisions and to first order in m/M, the factorization and the average assumptions are satisfied. The numerical simulations are thus essential to check the validity of these assumptions, to determine the range of acceptable values m/M for the perturbation expansion, to investigate the thermodynamic limit, and to guide the intuition.

In all simulations, we have taken $k_B = 1$, m = 1, $T^- = 1$ and usually $T^+ = 10$. For L finite, we have taken L = 60, $X_0 = 10$, $A = 10^5$, and $N^+ = N^- = N/2$, that is, $p^- = R\,(M/A)(1/10)$ and $p^+ = 2 p^-$. The number of particles N was varied from a few hundreds to one or several millions; the mass M of the piston from 1 to $10^5$. We give below some of the results which have been obtained for $L = \infty$ (Figures 2 and 3)
[Figure 2, panel (a): piston position X(t) versus t, with curves labelled M = 5, 10, 15, 25, 50, 100; panel (b): stationary velocity V_stat versus M.]

Figure 2 Evolution of the piston for $L = \infty$ and $p^+ = p^- = 1$ as observed in simulations (stochastic line in (a), dots in (b)) compared with prediction: (a) position X(t) for $T^+ = 10$; and (b) stationary velocity for $T^+ = 10$ (continuous line) and $T^+ = 100$ (dotted line), as a function of M.
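The prediction shown in panel (b) can be evaluated in a few lines. The sketch below assumes that eqn [38] reads $\langle V \rangle_{\rm stat} = (m/M) \sqrt{\pi k_B/(8m)}\, (\sqrt{T^+} - \sqrt{T^-})$ for equal pressures and uses the simulation units $k_B = m = 1$, $T^- = 1$. The function name `stationary_velocity` and the list of masses are illustrative choices and do not come from the article.

```python
import math

def stationary_velocity(M, T_plus, T_minus=1.0, m=1.0, k_B=1.0):
    """Drift velocity of the piston for equal pressures, to first order in m/M.

    Assumed form of eqn [38]: (m/M) * sqrt(pi*k_B/(8*m)) * (sqrt(T+) - sqrt(T-)).
    """
    prefactor = (m / M) * math.sqrt(math.pi * k_B / (8.0 * m))
    return prefactor * (math.sqrt(T_plus) - math.sqrt(T_minus))

# Piston masses labelled in panel (a); T+ = 10 and T+ = 100 as in panel (b).
for M in (5, 10, 15, 25, 50, 100):
    print(M, round(stationary_velocity(M, 10.0), 4), round(stationary_velocity(M, 100.0), 4))
```

The velocity is positive when $T^+ > T^-$, that is, directed toward the warmer side, and it scales as 1/M, so it vanishes in the thermodynamic limit for the piston.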
[Figure 3, panels (a) and (b): piston position X(t) versus t.]

Figure 3 Evolution of the piston for $L = \infty$, $M = 10^4$, and $p^+ \neq p^-$ as observed in simulations (continuous line) compared with predictions (dotted line): (a) $p^- = 1$, $p^+ = p^- + \Delta p$, from top to bottom $\Delta p/p^- = 0.05, 0.1, 0.2, 1, 2, 3$; and (b) $p^- = \delta$, $p^+ = 2\delta$, $\Delta p/p^- = 1$; $X' = \delta X$, $t' = \delta t$, $\delta = 10^{-3}, 10^{-2}, 10^{-1}, 1, 10, 10^2, 10^3, 10^4$.
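For unequal pressures and $L = \infty$, the predicted stationary velocity is the root of $F_2^-(V) = F_2^+(V)$. The sketch below is a minimal illustration under stated assumptions: both gases are Maxwellian, $F_2^-(V) = n^- \int_V^{\infty} (v - V)^2 \varphi^-(v)\, dv$ and $F_2^+(V) = n^+ \int_{-\infty}^{V} (v - V)^2 \varphi^+(v)\, dv$, the units are $k_B = m = 1$, and the temperatures are taken as $T^- = 1$, $T^+ = 10$ (the caption does not state them). The names `half_moment2` and `v_stat` are hypothetical.

```python
import math

def half_moment2(V, n, T, side, m=1.0, k_B=1.0):
    """One-sided second moment F2 of a Maxwellian gas of density n, temperature T.

    side < 0: integral over v > V (left gas); side > 0: integral over v < V (right gas).
    """
    s = math.sqrt(k_B * T / m)                       # thermal velocity
    u = V / s
    pdf = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    tail = 0.5 * math.erfc(u / math.sqrt(2.0))       # probability that v > V
    if side < 0:
        return n * ((s * s + V * V) * tail - V * s * pdf)
    return n * ((s * s + V * V) * (1.0 - tail) + V * s * pdf)

def v_stat(n_minus, T_minus, n_plus, T_plus, lo=-5.0, hi=5.0):
    """Bisection on F2^-(V) - F2^+(V), which is strictly decreasing in V."""
    def f(V):
        return half_moment2(V, n_minus, T_minus, -1) - half_moment2(V, n_plus, T_plus, +1)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# First curve of panel (a): p- = 1, p+ = 1.05, hence n- = 1 and n+ = p+/T+ = 0.105.
print(v_stat(1.0, 1.0, 0.105, 10.0))
```

The root is negative when $p^+ > p^-$, so the piston drifts toward the low-pressure side. For small pressure differences it stays close to the linear estimate $\sqrt{\pi k_B/(8m)}\, (p^- - p^+)/(p^-/\sqrt{T^-} + p^+/\sqrt{T^+})$.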

[Figure 4, three rows of panels: (a) piston position X(t) versus t, with the level X_ad marked; (b) piston velocity V(t) versus t, with the level V marked.]

Figure 4 Deterministic evolution toward mechanical equilibrium for $L < \infty$, $M = 10^5$: (a) position X(t); one finds $X_{\rm ad}^{\rm sim} = 8.3$ whereas $X_{\rm ad}^{\rm th} = 8.42$ and (b) velocity V(t); one finds $\bar V^{\rm sim} = -0.343$ whereas $\bar V^{\rm th} = -0.3433$. From top to bottom: R = 12: strong damping, independent of R and M for R > 4 and $M > 10^3$. R = 2: critical damping. R = 0.1: weak damping; damping coefficient increases with R and $\omega_0 \sim \sqrt{R}$ for R < 1 but is independent of M for $M > 10^3$.
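The three damping regimes of Figure 4 can be compared with the linearized theory. The sketch below assumes that eqns [64]–[66] read $\omega_0^2 = 6E_0/[M X_f (L - X_f)]$, $R_{\max} = (3\pi/2)[\sqrt{X_f/L} + \sqrt{1 - X_f/L}]^{-2}$ and $\tau = 2\pi/(\omega_0 \sqrt{1 - R/R_{\max}})$ for $N^+ = N^-$, so that the damping ratio $\lambda/(2\omega_0)$ equals $\sqrt{R/R_{\max}}$. It uses $k_B = m = 1$, L = 60, $T_0^- = 1$, $T_0^+ = 10$ and the predicted $X_{\rm ad} = 8.42$ from the caption. All function names are illustrative.

```python
import math

L, X_f = 60.0, 8.42               # cylinder length; predicted X_ad quoted in the caption
T0_minus, T0_plus = 1.0, 10.0     # initial temperatures, units with k_B = m = 1

def r_max(x_f=X_f, length=L):
    """Threshold between weak and strong damping (assumed form of eqn [65])."""
    s = math.sqrt(x_f / length) + math.sqrt(1.0 - x_f / length)
    return 1.5 * math.pi / (s * s)

def omega0(R, x_f=X_f, length=L):
    """Undamped frequency (assumed form of eqn [64]).

    With N+ = N- = R*M/m one has E0/M = R*(T0- + T0+)/2, so omega0 does not depend on M.
    """
    e0_over_M = 0.5 * R * (T0_minus + T0_plus)
    return math.sqrt(6.0 * e0_over_M / (x_f * (length - x_f)))

def damping_ratio(R):
    """lambda / (2*omega0), which reduces to sqrt(R / R_max) when N+ = N-."""
    return math.sqrt(R / r_max())

def period(R):
    """Oscillation period (assumed form of eqn [66]); None in the strongly damped regime."""
    ratio = R / r_max()
    if ratio >= 1.0:
        return None
    return 2.0 * math.pi / (omega0(R) * math.sqrt(1.0 - ratio))

for R in (12.0, 2.0, 0.1):
    print(R, round(damping_ratio(R), 3), period(R))
```

With these inputs the threshold is close to $R_{\max} \approx 2.8$: R = 12 is overdamped, R = 2 sits just below the threshold, and R = 0.1 is weakly damped. This classification only concerns the predicted regimes, not the measured damping coefficient.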
[Figure 5, panel (a): average pressure p_av and average temperatures T_av versus t; panel (b): surface pressure p_surf and surface temperatures T_surf versus t.]

Figure 5 Same conditions as Figure 4, R = 12: (a) average pressure and temperature in the fluid: $p^{\pm}_{\rm av}(t) = 2 E^{\pm} n^{\pm}/N^{\pm}$, $T^{\pm}_{\rm av} = 2 E^{\pm}/(N^{\pm} k_B)$ and (b) pressure and temperature at the surface of the piston. Prediction: $T^-_{\rm ad} = 1.54$, $T^+_{\rm ad} = 9.46$, $p^-_{\rm ad} = p^+_{\rm ad} = 2.2$. Simulations: $T^-_{\rm ad} = 1.52$, $T^+_{\rm ad} = 9.48$, $p^-_{\rm ad} = p^+_{\rm ad} = 2.2$.
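The predicted adiabatic state can be recomputed from the equilibrium conditions. The sketch below assumes that eqn [58] is the invariant $\sqrt{T^-}\, X - \sqrt{T^+}\, (L - X) = C$, that eqns [59]–[61] express energy conservation and equal pressures, and that the simulation parameters are $k_B = m = 1$, L = 60, $X_0 = 10$, $A = M = 10^5$, R = 12, $T_0^- = 1$, $T_0^+ = 10$. The variable names are illustrative.

```python
import math

k_B, m = 1.0, 1.0
L, X0 = 60.0, 10.0
T0_minus, T0_plus = 1.0, 10.0
R, M, A = 12.0, 1.0e5, 1.0e5
N_minus = N_plus = R * M / m

two_E0_over_kB = N_minus * T0_minus + N_plus * T0_plus           # from eqn [61]
C = math.sqrt(T0_minus) * X0 - math.sqrt(T0_plus) * (L - X0)      # invariant of eqn [58]

def temperatures(X):
    """Temperatures at mechanical equilibrium for piston position X (eqns [59]-[60])."""
    T_minus = two_E0_over_kB * X / (N_minus * L)
    T_plus = two_E0_over_kB * (L - X) / (N_plus * L)
    return T_minus, T_plus

def residual(X):
    """Left-hand side of eqn [58] minus C; increasing in X on (0, L)."""
    T_minus, T_plus = temperatures(X)
    return math.sqrt(T_minus) * X - math.sqrt(T_plus) * (L - X) - C

lo, hi = 1e-9, L - 1e-9
for _ in range(200):                                              # plain bisection
    mid = 0.5 * (lo + hi)
    if residual(mid) < 0.0:
        lo = mid
    else:
        hi = mid

X_ad = 0.5 * (lo + hi)
T_ad_minus, T_ad_plus = temperatures(X_ad)
p_ad = N_minus * k_B * T_ad_minus / (A * X_ad)
print(round(X_ad, 2), round(T_ad_minus, 2), round(T_ad_plus, 2), round(p_ad, 2))
```

Under these assumptions the result is $X_{\rm ad} \approx 8.43$, $T^-_{\rm ad} \approx 1.54$, $T^+_{\rm ad} \approx 9.46$ and $p_{\rm ad} = 2.2$, in line with the predicted values quoted in Figures 4 and 5.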

and for $L < \infty$ approach to mechanical equilibrium (Figures 4–6) and to thermal equilibrium (Figures 7 and 8).

Conclusions and Open Problems

In this article, the adiabatic piston has been investigated to first order in the small parameter m/M, but no attempt has been made to control the remainder terms. For an infinite cylinder, no other assumptions were necessary and the numerical simulations (Figures 2 and 3) are in perfect agreement with the theoretical prediction, in particular for the stationary velocity $V_{\rm stat}$, the friction coefficient $\lambda(V)$, and the relaxation time $\tau$.

For a finite cylinder ($L < \infty$) and in the thermodynamic limit ($M = \infty$), we were forced to introduce the average assumption to obtain a set of autonomous equations. As we have seen when initially $p^+ \neq p^-$, this limiting case also describes the evolution to lowest order during the first two stages characterized by a time of the order $t_1 = L \sqrt{m / k_B T}$, where the evolution is adiabatic and deterministic. In the first stage, that is, before the shock wave bounces back on the piston, the simulations confirm the theoretical predictions. In particular, they show that if R > 4, the piston will be able to reach and maintain for some time the velocity $V_{\rm stat}$, whereas this will not be the case for R < 1 (Figure 4b). In the second stage of the evolution, the simulations (Figure 4) exhibit damped oscillations toward mechanical equilibrium which are in very good agreement with the predictions for the final state $(X_{\rm ad}, T^{\pm}_{\rm ad})$, the frequency of oscillations and the existence of weak and strong damping depending on R < 1 or R > 4. Moreover, the general behavior of the evolution observed in the simulations as a function of the parameters was as predicted. However, the damping coefficient of these oscillations is wrong by one or several orders of magnitude. To understand this discrepancy, we note that using the average assumption we have related the damping to the friction coefficient. However, the simulations clearly show that those two dissipative effects have totally different origins. Indeed, as one can see with $L = \infty$, friction is associated with the fact that the density of the gas in front and in the back of the piston is not the same as in the bulk, and this generates a shock wave that propagates in the fluid. For finite L, when R > 4, the stationary velocity $V_{\rm stat}$ is reached and the effect of friction is
[Figure 6, panels (a) and (b): velocity distribution in the left compartment.]

Figure 6 Velocity distribution in the left compartment. Same conditions as Figure 4, R = 12. Dotted line corresponds to Maxwellian with $T^- = 1.52$: (a) t = 12, 24, 36, 48, 60, 92, 144, 240 from top to bottom and (b) t = 276–460.

to transfer in this first stage more and more energy to the fluid on one side and vice versa on the other side. However, to stop the piston and reverse its motion, only a certain amount of the transferred energy is necessary and the rest remains as dissipated energy in the fluid leading to a strong damping. On the other hand, for R < 1, the value $V_{\rm stat}$ is never reached and all the energy transferred is necessary to revert the motion. In this case very little dissipation is involved and the damping will be very small. This indicates that the mechanism responsible for damping is associated with shock waves bouncing back and forth and the average assumption, which corresponds to a homogeneity condition throughout the gas, cannot describe the situation. In fact, the simulations (Figure 5b) indicate that the average assumption does

[Figure 7, panel (a): piston position X versus $\tau = \alpha t$; panel (b): temperatures T versus $\tau = \alpha t$.]

Figure 7 Approach to thermal equilibrium, $N^{\pm} = 3 \times 10^4$. The smooth curves correspond to the predictions, the stochastic curves to simulations: (a) position $X(\tau)$, $\tau = \alpha t$, no visible difference for M = 100, 200, 1000 and (b) average temperatures $T^{\pm}(\tau)$, $\tau = \alpha t$, M = 200.
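The smooth prediction curves of Figure 7 follow from a single ordinary differential equation. The sketch below assumes that eqns [79]–[81] read $d\xi/ds = \sqrt{(N/2N^-)(1 - 2\xi)} - \sqrt{(N/2N^+)(1 + 2\xi)}$ with $\xi = 1/2 - X/L$, $T^{\pm} = [(N^- T_0^- + N^+ T_0^+)/(2N^{\pm})](1 \pm 2\xi)$ and $s = \tau\, (2/3L) \sqrt{k_B/(m\pi)} \sqrt{2(N^- T_0^- + N^+ T_0^+)/N}$. It uses $N^+ = N^-$, L = 60, $T_0^- = 1$, $T_0^+ = 10$, $k_B = m = 1$, and starts from the predicted $X_{\rm ad} = 8.42$. The function names and the choice of a fixed-step Runge–Kutta scheme are illustrative.

```python
import math

L, X_ad = 60.0, 8.42                 # predicted adiabatic position used as starting point
T0_minus, T0_plus = 1.0, 10.0
frac_minus = frac_plus = 0.5         # N^-/N and N^+/N
mean_T = frac_minus * T0_minus + frac_plus * T0_plus   # (N^- T0^- + N^+ T0^+)/N

def rhs(xi):
    """Right-hand side of the assumed eqn [80]."""
    return (math.sqrt((1.0 - 2.0 * xi) / (2.0 * frac_minus))
            - math.sqrt((1.0 + 2.0 * xi) / (2.0 * frac_plus)))

def integrate(xi0, s_end, steps=6000):
    """Classical fourth-order Runge-Kutta integration in the rescaled time s."""
    h, xi = s_end / steps, xi0
    for _ in range(steps):
        k1 = rhs(xi)
        k2 = rhs(xi + 0.5 * h * k1)
        k3 = rhs(xi + 0.5 * h * k2)
        k4 = rhs(xi + h * k3)
        xi += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return xi

def observables(xi):
    """Piston position and temperatures from the assumed eqn [81]."""
    X = L * (0.5 - xi)
    T_minus = mean_T * (1.0 - 2.0 * xi) / (2.0 * frac_minus)
    T_plus = mean_T * (1.0 + 2.0 * xi) / (2.0 * frac_plus)
    return X, T_minus, T_plus

def rescaled_time(tau, m=1.0, k_B=1.0):
    """Conversion tau -> s from the assumed eqn [79]."""
    return tau * (2.0 / (3.0 * L)) * math.sqrt(k_B / (math.pi * m)) * math.sqrt(2.0 * mean_T)

xi0 = 0.5 - X_ad / L
for tau in (0.0, 25.0, 50.0, 100.0, 300.0):
    X, T_minus, T_plus = observables(integrate(xi0, rescaled_time(tau)))
    print(tau, round(X, 2), round(T_minus, 2), round(T_plus, 2))
```

In this sketch the piston moves from $X_{\rm ad}$ toward L/2 = 30 while both temperatures approach the common value 5.5, which is the final temperature quoted in Figure 8.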
[Figure 8, panels (a) and (b): velocity distribution function in the left compartment, normalized by the density.]

Figure 8 Approach to thermal equilibrium from $T^-_{\rm ad} = 1.54$ (dotted line in (a)) to $T_f = 5.5$ (heavy line in (b)). Velocity distribution function on the left for M = 200, $N^{\pm} = 5 \times 10^4$. (a) $\tau = \alpha t = 2, 4, 14, 48, 92, 144$ and (b) approach to Maxwellian distribution for $\tau > 445$.

not hold in this second stage. In conclusion, one is forced to admit that to describe correctly the adiabatic evolution, it is necessary to study the coupling between the motion of the piston and the hydrodynamic equations of the gas. Preliminary investigations have been initiated, but this is still one of the major open problems. Another problem would be to study the evolution in the case of interacting particles. However, investigations with hard disks suggest that no new effects should appear. To investigate adiabatic evolution, a simpler version of the adiabatic piston problem, without any controversy, has been introduced: this is the model of a standard piston with a constant force acting on it.

In the third stage, that is, the very slow approach to thermal equilibrium, another assumption was necessary, namely the factorization condition. The simulations (Figure 7) show a very good agreement with the prediction, and in particular the scaling property with $t' = t/M$ is perfectly verified. It appears that the small discrepancy between simulations and theoretical predictions could be due to the fact that, to compute explicitly the coefficients in the equations of motion, we have taken Maxwellian relations for the velocities of the gas particles, which is clearly not the case (Figure 8a).

The fourth stage of the evolution, that is, the approach to Maxwellian distributions (Figure 8b), is still another major open problem. Some preliminary studies have been conducted, where one investigates the stability and the evolution of the system when initially the two gases are in the same equilibrium state, but characterized by a distribution function which is not Maxwellian.

Finally, let us mention that the relation between the piston problem and the second law of thermodynamics is one more major problem. The question of entropy production out of equilibrium, and the validity of the second law, are still highly controversial. Again, preliminary results can be found in the literature. Among other things, this question has led to a model of heat conductivity in gases, which reproduces the correct behavior (Gruber and Lesne 2005).

See also: Billiards in Bounded Convex Domains; Boltzmann Equation (Classical and Quantum); Hamiltonian Fluid Dynamics; Multiscale Approaches; Nonequilibrium Statistical Mechanics (Stationary): Overview; Nonequilibrium Statistical Mechanics: Dynamical Systems Approach.

Further Reading

Callen HB (1963) Thermodynamics. New York: Wiley. (Appendix C. See also Callen HB (1985) Thermodynamics and Thermostatics, 2nd edn., pp. 51 and 53. New York: Wiley.)
Chernov N, Sinai YaG, and Lebowitz JL (2002) Scaling dynamic of a massive piston in a cube filled with ideal gas: exact results. Journal of Statistical Physics 109: 529–548.
Feynman RP (1965) Lectures in Physics I. New York: Addison-Wesley.
Gruber Ch (1999) Thermodynamics of systems with internal adiabatic constraints: time evolution of the adiabatic piston. European Journal of Physics 20: 259–266.
Gruber Ch and Lesne A (2005) Hamiltonian model of heat conductivity and Fourier law. Physica A 351: 358.
Gruber Ch, Pache S, and Lesne A (2003) Two-time-scale relaxation towards thermal equilibrium of the enigmatic piston. Journal of Statistical Physics 112: 1199–1228.
Gruber Ch and Piasecki J (1999) Stationary motion of the adiabatic piston. Physica A 268: 412–442.
Kestemont E, Van den Broeck C, and Malek Mansour M (2000) The adiabatic piston: and yet it moves. Europhysics Letters 49: 143.
Lebowitz JL, Piasecki J, and Sinai YaG (2000) Scaling dynamics of a massive piston in an ideal gas. In: Szasz D (ed.) Hard Ball Systems and the Lorentz Gas, Encyclopedia of Mathematical Sciences Series, vol. 101, pp. 217–227. Berlin: Springer.
Van den Broeck C, Meurs P, and Kawai R (2004) From Maxwell demon to Brownian motor. New Journal of Physics 7: 10.

AdS/CFT Correspondence
C P Herzog, University of California at Santa Barbara, Santa Barbara, CA, USA
I R Klebanov, Princeton University, Princeton, NJ, USA
© 2006 Elsevier Ltd. All rights reserved.

Introduction

The anti-de Sitter/conformal field theory (AdS/CFT) correspondence is a conjectured equivalence between a quantum field theory in $d$ spacetime dimensions with conformal scaling symmetry and a quantum theory of gravity in $(d+1)$-dimensional anti-de Sitter space. The most promising approaches to quantizing gravity involve superstring theories, which are most easily defined in 10 spacetime dimensions, or M-theory, which is defined in 11 spacetime dimensions. Hence, the AdS/CFT correspondences based on superstrings typically involve backgrounds of the form $AdS_{d+1} \times Y_{9-d}$, while those based on M-theory involve backgrounds of the form $AdS_{d+1} \times Y_{10-d}$, where $Y$ are compact spaces.

The examples of the AdS/CFT correspondence discussed in this article are dualities between (super)conformal nonabelian gauge theories and superstrings on $AdS_5 \times Y_5$, where $Y_5$ is a five-dimensional Einstein space (i.e., a space whose Ricci tensor is proportional to the metric, $R_{ij} = 4 g_{ij}$). In particular, the most basic (and maximally supersymmetric) such duality relates $\mathcal{N} = 4$ $SU(N)$ super Yang–Mills (SYM) and type IIB superstring in the curved background $AdS_5 \times S^5$.

There exist special limits where this duality is more tractable than in the general case. If we take the large-$N$ limit while keeping the 't Hooft coupling $\lambda = g_{YM}^2 N$ fixed ($g_{YM}$ is the Yang–Mills coupling strength), then each Feynman graph of the gauge theory carries a topological factor $N^{\chi}$, where $\chi$ is the Euler characteristic of the graph. The graphs of spherical topology (often called "planar"), to be identified with string tree diagrams, are weighted by $N^2$; the graphs of toroidal topology, to be identified with string one-loop diagrams, by $N^0$, etc. This counting corresponds to the closed-string coupling constant of order $N^{-1}$. Thus, in the large-$N$ limit the gauge theory becomes "planar," and the dual string theory becomes classical. For small $g_{YM}^2 N$, the gauge theory can be studied perturbatively; in this regime the dual string theory has not been very useful because the background becomes highly curved. The real power of the AdS/CFT duality, which already has made it a very useful tool, lies in the fact that, when the gauge theory becomes strongly coupled, the curvature in the dual description becomes small; therefore, classical supergravity provides a systematic starting point for approximating the string theory.

There is a strong motivation for an improved understanding of dualities of this type. In one direction, generalizations of this duality provide the tantalizing hope of a better understanding of quantum chromodynamics (QCD); QCD is a nonabelian gauge theory that describes the strong interactions of mesons, baryons, and glueballs, and has a conformal symmetry which is broken by quantum effects. In the other direction, AdS/CFT suggests that quantum gravity may be understandable as a gauge theory. Understanding the confinement of quarks and gluons that takes place in low-energy QCD and quantizing gravity are well acknowledged to be two of the most important outstanding problems of theoretical physics.

Some Geometrical Preliminaries

The $d$-dimensional sphere of radius $L$, $S^d$, may be defined by a constraint

$$\sum_{i=1}^{d+1} (X^i)^2 = L^2 \qquad [1]$$

on $d+1$ real coordinates $X^i$. It is a positively curved maximally symmetric space with symmetry group $SO(d+1)$. We will denote the round metric on $S^d$ of unit radius by $d\Omega_d^2$.

The $d$-dimensional anti-de Sitter space, $AdS_d$, may be defined by a constraint

$$(X^0)^2 + (X^d)^2 - \sum_{i=1}^{d-1} (X^i)^2 = L^2 \qquad [2]$$

This constraint shows that the symmetry group of $AdS_d$ is $SO(2, d-1)$. $AdS_d$ is a negatively curved maximally symmetric space, that is, its curvature tensor is related to the metric by

$$R_{abcd} = -\frac{1}{L^2}\left[ g_{ac} g_{bd} - g_{ad} g_{bc} \right] \qquad [3]$$

Its metric may be written as

$$ds^2_{AdS} = L^2 \left( -(y^2+1)\,dt^2 + \frac{dy^2}{y^2+1} + y^2\, d\Omega^2_{d-2} \right) \qquad [4]$$

where the radial coordinate $y \in [0, \infty)$, and $t$ is defined on a circle of length $2\pi$. This space has closed timelike curves; to eliminate them, we will work with the universal covering space where $t \in (-\infty, \infty)$. The boundary of $AdS_d$, which plays an important role in the AdS/CFT correspondence, is located at infinite $y$. There exists a subspace of $AdS_d$ called the Poincaré wedge, with the metric

$$ds^2 = \frac{L^2}{z^2}\left( dz^2 - (dx^0)^2 + \sum_{i=1}^{d-2} (dx^i)^2 \right) \qquad [5]$$

where $z \in [0, \infty)$.

A Euclidean continuation of $AdS_d$ is the Lobachevsky space (hyperboloid), $L_d$. It is obtained by reversing the sign of $(X^d)^2$, $dt^2$, and $(dx^0)^2$ in [2], [4], and [5], respectively. After this Euclidean continuation, the metrics [4] and [5] become equivalent; both of them cover the entire $L_d$. Another equivalent way of writing the metric is

$$ds^2_L = L^2 \left( d\rho^2 + \sinh^2\!\rho \; d\Omega^2_{d-1} \right) \qquad [6]$$

which shows that the boundary at infinite $\rho$ has the topology of $S^{d-1}$. In terms of the Euclideanized metric [5], the boundary consists of the $\mathbb{R}^{d-1}$ at $z = 0$, and a single point at $z = \infty$.

The Geometry of Dirichlet Branes

Our path toward formulating the $AdS_5/CFT_4$ correspondence requires introduction of Dirichlet branes, or D-branes for short. They are soliton-like "membranes" of various internal dimensionalities contained in type II superstring theories. A Dirichlet $p$-brane (or D$p$ brane) is a $(p+1)$-dimensional hyperplane in $(9+1)$-dimensional spacetime where strings are allowed to end. A D-brane is much like a topological defect: upon touching a D-brane, a closed string can open up and turn into an open string whose ends are free to move along the D-brane. For the endpoints of such a string the $p+1$ longitudinal coordinates satisfy the conventional free (Neumann) boundary conditions, while the $9-p$ coordinates transverse to the D$p$ brane have the fixed (Dirichlet) boundary conditions, hence the origin of the term "Dirichlet brane." The D$p$ brane preserves half of the bulk supersymmetries and carries an elementary unit of charge with respect to the $(p+1)$-form gauge potential from the Ramond–Ramond (RR) sector of type II superstring.

For this article, the most important property of D-branes is that they realize gauge theories on their world volume. The massless spectrum of open strings living on a D$p$ brane is that of a maximally supersymmetric $U(1)$ gauge theory in $p+1$ dimensions. The $9-p$ massless scalar fields present in this supermultiplet are the expected Goldstone modes associated with the transverse oscillations of the D$p$ brane, while the photons and fermions provide the unique supersymmetric completion. If we consider $N$ parallel D-branes, then there are $N^2$ different species of open strings because they can begin and end on any of the D-branes. $N^2$ is the dimension of the adjoint representation of $U(N)$, and indeed we find the maximally supersymmetric $U(N)$ gauge theory in this setting.

The relative separations of the D$p$ branes in the $9-p$ transverse dimensions are determined by the expectation values of the scalar fields. We will be interested in the case where all scalar expectation values vanish, so that the $N$ D$p$ branes are stacked on top of each other. If $N$ is large, then this stack is a heavy object embedded into a theory of closed strings which contains gravity. Naturally, this macroscopic object will curve space: it may be described by some classical metric and other background fields including the RR $(p+2)$-form field strength. Thus, we have two very different descriptions of the stack of D$p$ branes: one in terms of the $U(N)$ supersymmetric gauge theory on its world volume, and the other in terms of the classical RR charged $p$-brane background of the type II closed superstring theory. The relation between these two descriptions is at the heart of the connections between gauge fields and strings that are the subject of this article.

Coincident D3 Branes

Gauge theories in $3+1$ dimensions play an important role in physics, and as explained above, parallel D3 branes realize a $(3+1)$-dimensional $U(N)$ SYM
theory. Let us compare a stack of D3 branes with the RR-charged black 3-brane classical solution where the metric assumes the form

$$ds^2 = H^{-1/2}(r)\left[ -f(r)\,(dx^0)^2 + (dx^i)^2 \right] + H^{1/2}(r)\left[ f^{-1}(r)\,dr^2 + r^2\, d\Omega_5^2 \right] \qquad [7]$$

where $i = 1, 2, 3$ and

$$H(r) = 1 + \frac{L^4}{r^4}, \qquad f(r) = 1 - \frac{r_0^4}{r^4}$$

The solution also contains an RR self-dual 5-form field strength

$$F = dx^0 \wedge dx^1 \wedge dx^2 \wedge dx^3 \wedge d(H^{-1}) + 4L^4\, \mathrm{vol}(S^5) \qquad [8]$$

so that the Einstein equation of type IIB supergravity, $R_{\mu\nu} = F_{\mu\alpha\beta\gamma\delta} F_{\nu}{}^{\alpha\beta\gamma\delta}/96$, is satisfied.

In the extremal limit $r_0 \to 0$, the 3-brane metric becomes

$$ds^2 = \left(1 + \frac{L^4}{r^4}\right)^{-1/2}\left( -(dx^0)^2 + (dx^i)^2 \right) + \left(1 + \frac{L^4}{r^4}\right)^{1/2}\left( dr^2 + r^2\, d\Omega_5^2 \right) \qquad [9]$$

Just like the stack of parallel, ground-state D3 branes, the extremal solution preserves 16 of the 32 supersymmetries present in the type IIB theory. Introducing $z = L^2/r$, one notes that the limiting form of [9] as $r \to 0$ factorizes into the direct product of two smooth spaces, the Poincaré wedge [5] of $AdS_5$, and $S^5$, with equal radii of curvature $L$. The 3-brane geometry may thus be viewed as a semi-infinite throat of radius $L$ which, for $r \gg L$, opens up into flat $(9+1)$-dimensional space. Thus, for $L$ much larger than the string length scale, $\sqrt{\alpha'}$, the entire 3-brane geometry has small curvatures everywhere and is appropriately described by the supergravity approximation to type IIB string theory.

The relation between $L$ and $\sqrt{\alpha'}$ may be found by equating the gravitational tension of the extremal 3-brane classical solution to $N$ times the tension of a single D3 brane:

$$\frac{2}{\kappa^2}\, L^4\, \mathrm{vol}(S^5) = \frac{\sqrt{\pi}}{\kappa}\, N \qquad [10]$$

where $\mathrm{vol}(S^5) = \pi^3$ is the volume of a unit 5-sphere, and $\kappa = \sqrt{8\pi G}$ is the ten-dimensional gravitational constant. It follows that

$$L^4 = \frac{\kappa}{2\pi^{5/2}}\, N = g_{YM}^2 N \alpha'^2 \qquad [11]$$

where we used the standard relations $\kappa = 8\pi^{7/2} g_{st} \alpha'^2$ and $g_{YM}^2 = 4\pi g_{st}$ [10]. Thus, the size of the throat in string units is $\lambda^{1/4}$. This remarkable emergence of the 't Hooft coupling from gravitational considerations is at the heart of the success of the AdS/CFT correspondence. Moreover, the requirement $L \gg \sqrt{\alpha'}$ translates into $\lambda \gg 1$: the gravitational approach is valid when the 't Hooft coupling is very strong and the perturbative field-theoretic methods are not applicable.

Example: Thermal Gauge Theory from Near-Extremal D3 Branes

An important black hole observable is the Bekenstein–Hawking (BH) entropy, which is proportional to the area of the event horizon. For the 3-brane solution [7], the horizon is located at $r = r_0$. For $r_0 > 0$ the 3-brane carries some excess energy $E$ above its extremal value, and the BH entropy is also nonvanishing. The Hawking temperature is then defined by $T^{-1} = \partial S_{BH}/\partial E$.

Setting $r_0 \ll L$ in [7], we obtain a near-extremal 3-brane geometry, whose Hawking temperature is found to be $T = r_0/(\pi L^2)$. The eight-dimensional "area" of the horizon is

$$A_h = (r_0/L)^3\, V_3\, L^5\, \mathrm{vol}(S^5) = \pi^6 L^8 T^3 V_3 \qquad [12]$$

where $V_3$ is the spatial volume of the D3 brane (i.e., the volume of the $x^1, x^2, x^3$ coordinates). Therefore, the BH entropy is

$$S_{BH} = \frac{2\pi A_h}{\kappa^2} = \frac{\pi^2}{2}\, N^2 V_3 T^3 \qquad [13]$$

This gravitational entropy of a near-extremal 3-brane of Hawking temperature $T$ is to be identified with the entropy of $\mathcal{N} = 4$ supersymmetric $U(N)$ gauge theory (which lives on $N$ coincident D3 branes) heated up to the same temperature.

The entropy of a free $U(N)$ $\mathcal{N} = 4$ supermultiplet which consists of the gauge field, $6N^2$ massless scalars, and $4N^2$ Weyl fermions can be calculated using the standard statistical mechanics of a massless gas (the blackbody problem), and the answer is

$$S_0 = \frac{2\pi^2}{3}\, N^2 V_3 T^3 \qquad [14]$$

It is remarkable that the 3-brane geometry captures the $T^3$ scaling characteristic of a conformal field theory (CFT) (in a CFT this scaling is guaranteed by the extensivity of the entropy and the absence of dimensionful parameters). Also, the $N^2$ scaling indicates the presence of $O(N^2)$ unconfined degrees
of freedom, which is exactly what we expect in the $\mathcal{N} = 4$ supersymmetric $U(N)$ gauge theory. But what is the explanation of the relative factor of 3/4 between $S_{BH}$ and $S_0$? In fact, this factor is not a contradiction but rather a prediction about the strongly coupled $\mathcal{N} = 4$ SYM theory at finite temperature. As we argued above, the supergravity calculation of the BH entropy, [13], is relevant to the $\lambda \to \infty$ limit of the $\mathcal{N} = 4$ $SU(N)$ gauge theory, while the free-field calculation, [14], applies to the $\lambda \to 0$ limit. Thus, the relative factor of 3/4 is not a discrepancy: it relates two different limits of the theory. Indeed, on general field-theoretic grounds, we expect that in the 't Hooft large-$N$ limit, the entropy is given by

$$S = \frac{2\pi^2}{3}\, N^2 f(\lambda)\, V_3 T^3 \qquad [15]$$

The function $f$ is certainly not constant: perturbative calculations valid for small $\lambda = g_{YM}^2 N$ give

$$f(\lambda) = 1 - \frac{3}{2\pi^2}\,\lambda + \frac{3 + \sqrt{2}}{\pi^3}\,\lambda^{3/2} + \cdots \qquad [16]$$

Thus, the BH entropy in supergravity, [13], is translated into the prediction that

$$\lim_{\lambda \to \infty} f(\lambda) = \frac{3}{4} \qquad [17]$$

The Essentials of the AdS/CFT Correspondence

The AdS/CFT correspondence asserts a detailed map between the physics of type IIB string theory in the throat of the classical 3-brane geometry, that is, the region $r \ll L$, and the gauge theory living on a stack of D3 branes. As already noted, in this limit $r \ll L$, the extremal D3 brane geometry factors into a direct product of $AdS_5 \times S^5$. Moreover, the gauge theory on this stack of D3 branes is the maximally supersymmetric $\mathcal{N} = 4$ SYM.

Since the horizon of the near-extremal 3-brane lies in the region $r \ll L$, the entropy calculation could have been carried out directly in the throat limit, where $H(r)$ is replaced by $L^4/r^4$. Another way to motivate the identification of the gauge theory with the throat is to think about the absorption of massless particles. In the D-brane description, a particle incident from asymptotic infinity is converted into an excitation of the stack of D-branes, that is, into an excitation of the gauge theory on the world volume. In the supergravity description, a particle incident from the asymptotic (large $r$) region tunnels into the $r \ll L$ region and produces an excitation of the throat. The fact that the two different descriptions of the absorption process give identical cross sections supports the identification of excitations of $AdS_5 \times S^5$ with the excited states of the $\mathcal{N} = 4$ SYM theory.

Maldacena (1998) motivated this correspondence by thinking about the low-energy ($\alpha' \to 0$) limit of the string theory. On the D3 brane side, in this low-energy limit, the interaction between the D3 branes and the closed strings propagating in the bulk vanishes, leaving a pure $\mathcal{N} = 4$ SYM theory on the D3 branes decoupled from type IIB superstrings in flat space. Around the classical 3-brane solutions, there are two types of low-energy excitations. The first type propagate in the bulk region, $r \gg L$, and have a cross section for absorption by the throat which vanishes as the cube of their energy. The second type are localized in the throat, $r \ll L$, and find it harder to tunnel into the asymptotically flat region as their energy is taken smaller. Thus, both the D3 branes and the classical 3-brane solution have two decoupled components in the low-energy limit, and in both cases, one of these components is type IIB superstrings in flat space. Maldacena conjectured an equivalence between the other two components.

Immediate support for this identification comes from symmetry considerations. The isometry group of $AdS_5$ is $SO(2, 4)$, and this is also the conformal group in $3+1$ dimensions. In addition, we have the isometries of $S^5$ which form $SU(4) \sim SO(6)$. This group is identical to the R-symmetry of the $\mathcal{N} = 4$ SYM theory. After including the fermionic generators required by supersymmetry, the full isometry supergroup of the $AdS_5 \times S^5$ background is $SU(2, 2|4)$, which is identical to the $\mathcal{N} = 4$ superconformal symmetry. We will see that, in theories with reduced supersymmetry, the $S^5$ factor is replaced by other compact Einstein spaces $Y_5$, but $AdS_5$ is the "universal" factor present in the dual description of any large-$N$ CFT and makes the $SO(2, 4)$ conformal symmetry a geometric one.

The correspondence extends beyond the supergravity limit, and we must think of $AdS_5 \times Y_5$ as a background of string theory. Indeed, type IIB strings are dual to the electric flux lines in the gauge theory, providing a string-theoretic setup for calculating correlation functions of Wilson loops. Furthermore, if $N \to \infty$ while $g_{YM}^2 N$ is held fixed and finite, then there are string scale corrections to the supergravity limit (Maldacena 1998, Gubser et al. 1998, Witten 1998) which proceed in powers of $\alpha'/L^2 = (g_{YM}^2 N)^{-1/2}$. For finite $N$, there are also
string loop corrections in powers of $\kappa^2/L^8 \sim N^{-2}$. As expected, with $N \to \infty$ we can take the classical limit of the string theory on $AdS_5 \times Y_5$. However, in order to understand the large-$N$ gauge theory with finite 't Hooft coupling, we should think of $AdS_5 \times Y_5$ as the target space of a two-dimensional sigma model describing the classical string physics.

Correlation Functions and the Bulk/Boundary Correspondence

A basic premise of the AdS/CFT correspondence is the existence of a one-to-one map between gauge-invariant operators in the CFT and fields (or extended objects) in AdS. Gubser et al. (1998) and Witten (1998) formulated precise methods for calculating correlation functions of various operators in a CFT using its dual formulation. A physical motivation for these methods comes from earlier calculations of absorption by 3-branes. When a wave is absorbed, it tunnels from asymptotic infinity into the throat region, and then continues to propagate toward smaller $r$. Let us separate the 3-brane geometry into two regions: $r \gtrsim L$ and $r \lesssim L$. For $r \lesssim L$ the metric is approximately that of $AdS_5 \times S^5$, while for $r \gtrsim L$ it becomes very different and eventually approaches the flat metric. Signals coming in from large $r$ (small $z = L^2/r$) may be considered as disturbing the "boundary" of $AdS_5$ at $r \sim L$, and then propagating into the bulk of $AdS_5$. Discarding the $r \gtrsim L$ part of the 3-brane metric, the gauge theory correlation functions are related to the response of the string theory to boundary conditions at $r \sim L$. It is therefore natural to identify the generating functional of correlation functions in the gauge theory with the string theory path integral subject to the boundary conditions that $\phi(x, z) = \phi_0(x)$ at $z = L$ (at $z = \infty$ all fluctuations are required to vanish). In calculating correlation functions in a CFT, we will carry out the standard Euclidean continuation; then on the string theory side, we will work with $L_5$, which is the Euclidean version of $AdS_5$.

More explicitly, we identify a gauge theory quantity $W$ with a string-theory quantity $Z_{string}$:

$$W[\phi_0(x)] = Z_{string}[\phi_0(x)] \qquad [18]$$

$W$ generates the connected Euclidean Green's functions of a gauge-theory operator $\mathcal{O}$,

$$W[\phi_0(x)] = \left\langle \exp \int d^4x\, \phi_0\, \mathcal{O} \right\rangle \qquad [19]$$

$Z_{string}$ is the string theory path integral calculated as a functional of $\phi_0$, the boundary condition on the field $\phi$ related to $\mathcal{O}$ by the AdS/CFT duality. In the large-$N$ limit, the string theory becomes classical which implies

$$Z_{string} \sim e^{-I[\phi_0(x)]} \qquad [20]$$

where $I[\phi_0(x)]$ is the extremum of the classical string action calculated as a functional of $\phi_0$. If we are further interested in correlation functions at very large 't Hooft coupling, then the problem of extremizing the classical string action reduces to solving the equations of motion in type IIB supergravity whose form is known explicitly. A simple example of such a calculation is presented in the next subsection.

Our reasoning suggests that from the point of view of the metric [5], the boundary conditions are imposed not quite at $z = 0$, which is the true boundary of $L_5$, but at some finite value $z = \epsilon$. It does not matter which value it is since the metric [5] is unchanged by an overall rescaling of the coordinates $(z, x)$; thus, such a rescaling can take $z = L$ into $z = \epsilon$ for any $\epsilon$. The physical meaning of this cutoff is that it acts as a UV regulator in the gauge theory. Indeed, the radial coordinate $z$ is to be considered as the effective energy scale of the gauge theory, and decreasing $z$ corresponds to increasing the energy. A safe method for performing calculations of correlation functions, therefore, is to keep the cutoff on the $z$-coordinate at intermediate stages and remove it only at the end.

Two-Point Functions and Operator Dimensions

In the following, we present a brief discussion of two-point functions of scalar operators in $CFT_d$. The corresponding field in $L_{d+1}$ is a scalar field of mass $m$ whose Euclidean action is proportional to

$$\frac{1}{2} \int d^dx\, dz\; z^{-d+1} \left[ (\partial_z \phi)^2 + \sum_{a=1}^{d} (\partial_a \phi)^2 + \frac{m^2 L^2}{z^2}\, \phi^2 \right] \qquad [21]$$

In calculating correlation functions of vertex operators from the AdS/CFT correspondence, the first problem is to reconstruct an on-shell field in $L_{d+1}$ from its boundary behavior. The near-boundary, that is, small $z$, behavior of the classical solution is

$$\phi(z, x) \to z^{d-\Delta}\left[ \phi_0(x) + O(z^2) \right] + z^{\Delta}\left[ A(x) + O(z^2) \right] \qquad [22]$$

where $\Delta$ is one of the roots of

$$\Delta(\Delta - d) = m^2 L^2 \qquad [23]$$

$\phi_0(x)$ is regarded as a "source" in [19] that couples to the dual gauge-invariant operator $\mathcal{O}$ of dimension $\Delta$, while $A(x)$ is related to the expectation value,

$$A(x) = \frac{1}{2\Delta - d}\, \langle \mathcal{O}(x) \rangle \qquad [24]$$

It is possible to regularize the Euclidean action to obtain the following value as a functional of the source:

$$I[\phi_0(x)] = -\left(\Delta - \frac{d}{2}\right) \pi^{-d/2}\, \frac{\Gamma(\Delta)}{\Gamma(\Delta - d/2)} \int d^dx \int d^dx'\; \frac{\phi_0(x)\, \phi_0(x')}{|x - x'|^{2\Delta}} \qquad [25]$$

Varying twice with respect to $\phi_0$, we find that the two-point function of the corresponding operator is

$$\langle \mathcal{O}(x)\, \mathcal{O}(x') \rangle = \frac{(2\Delta - d)\, \Gamma(\Delta)}{\pi^{d/2}\, \Gamma(\Delta - d/2)}\; \frac{1}{|x - x'|^{2\Delta}} \qquad [26]$$

Which of the two roots, $\Delta_+$ or $\Delta_-$, of [23]

$$\Delta_{\pm} = \frac{d}{2} \pm \sqrt{\frac{d^2}{4} + m^2 L^2} \qquad [27]$$

should we choose for the operator dimension? For positive $m^2$, $\Delta_+$ is certainly the right choice: here the other root, $\Delta_-$, is negative. However, it turns out that for

$$-\frac{d^2}{4} < m^2 L^2 < -\frac{d^2}{4} + 1 \qquad [28]$$

both roots of [23] may be chosen. Thus, there are two possible CFTs corresponding to the same classical AdS action: in one of them the corresponding operator has dimension $\Delta_+$, while in the other the dimension is $\Delta_-$. We note that $\Delta_-$ is bounded from below by $(d-2)/2$, which is precisely the unitarity bound on dimensions of scalar operators in $d$-dimensional field theory! Thus, the ability to choose dimension $\Delta_-$ is crucial for consistency of the AdS/CFT duality.

Whether string theory on $AdS_5 \times Y_5$ contains fields with $m^2$ in the range [28] depends on $Y_5$. The example discussed in the next section, $Y_5 = T^{1,1}$, turns out to contain such fields, and the possibility of having dimension $\Delta_-$, [27], is crucial for consistency of the AdS/CFT duality in that case. However, for $Y_5 = S^5$, which is dual to the $\mathcal{N} = 4$ large-$N$ SYM theory, there are no such fields and all scalar dimensions are given by $\Delta_+$ [27].

The operators in the $\mathcal{N} = 4$ large-$N$ SYM theory naturally break up into two classes: those that correspond to the Kaluza–Klein states of supergravity and those that correspond to massive string states. Since the radius of the $S^5$ is $L$, the masses of the Kaluza–Klein states are proportional to $1/L$. Thus, the dimensions of the corresponding operators are independent of $L$ and therefore also of $\lambda$. On the gauge-theory side, this independence is explained by the fact that the supersymmetry protects the dimensions of certain operators from being renormalized: they are completely determined by the representation under the superconformal symmetry. All families of the Kaluza–Klein states, which correspond to such protected operators, were classified long ago. Correlation functions of such operators in the strong 't Hooft coupling limit may be obtained from the dependence of the supergravity action on the boundary values of corresponding Kaluza–Klein fields, as in [19]. A variety of explicit calculations have been performed for two-, three-, and even four-point functions. The four-point functions are particularly interesting because their dependence on operator positions is not determined by the conformal invariance.

On the other hand, the masses of string excitations are $m^2 = 4n/\alpha'$, where $n$ is an integer. For the corresponding operators the formula [27] predicts that the dimensions do depend on the 't Hooft coupling and, in fact, blow up for large $\lambda = g_{YM}^2 N$ as $2\lambda^{1/4}\sqrt{n}$.

Calculation of Wilson Loops

The Wilson loop operator of a nonabelian gauge theory

$$W(C) = \mathrm{tr}\left[ P \exp\left( i \oint_C A \right) \right] \qquad [29]$$

involves the path-ordered integral of the gauge connection $A$ along a contour $C$. For $\mathcal{N} = 4$ SYM, one typically uses a generalization of this loop operator which incorporates other fields in the $\mathcal{N} = 4$ multiplet, the adjoint scalars and fermions. Using a rectangular contour, we can calculate the quark–antiquark potential from the expectation value $\langle W(C) \rangle$. One thinks of the quarks located a distance $L$ apart for a time $T$, yielding

$$\langle W \rangle \sim e^{-T V(L)} \qquad [30]$$

where $V(L)$ is the potential.

According to Maldacena, and Rey and Yee, the AdS/CFT correspondence relates the Wilson loop expectation value to a sum over string world sheets ending on the boundary of $L_5$ ($z = 0$) along the contour $C$:

$$\langle W \rangle \sim \int e^{-S} \qquad [31]$$
where $S$ is the action functional of the string world sheet. In the large 't Hooft coupling limit $\lambda \to \infty$, this path integral may be evaluated using a saddle-point approximation. The leading answer is $e^{-S_0}$, where $S_0$ is the action for the classical solution, which is proportional to the minimal area of the string world sheet in $L_5$ subject to the boundary conditions. The area as currently defined is actually divergent, and to regularize it one must position the contour at $z = \epsilon$ (this is the same type of regulator as used in the definition of correlation functions).

Consider a circular Wilson loop of radius $a$. The action of the corresponding classical string world sheet is

$$S_0 = \sqrt{\lambda}\left( \frac{a}{\epsilon} - 1 \right) \qquad [32]$$

Subtracting the linearly divergent term, which is proportional to the length of the contour, one finds

$$\ln \langle W \rangle = \sqrt{\lambda} + O(\ln \lambda) \qquad [33]$$

a result which has been duplicated in field theory by summing certain classes of rainbow Feynman diagrams in $\mathcal{N} = 4$ SYM. From these sums, one finds

$$\langle W \rangle_{rainbow} = \frac{2}{\sqrt{\lambda}}\, I_1\!\left(\sqrt{\lambda}\right) \qquad [34]$$

where $I_1$ is a Bessel function. This formula is one of the few available proposals for extrapolation of an observable from small to large coupling. At large $\lambda$,

$$\langle W \rangle_{rainbow} \sim \sqrt{\frac{2}{\pi}}\; \frac{e^{\sqrt{\lambda}}}{\lambda^{3/4}} \qquad [35]$$

in agreement with the geometric prediction.

The quark–antiquark potential is extracted from a rectangular Wilson loop of width $L$ and length $T$. After regularizing the divergent contribution to the energy, one finds the attractive potential

$$V(L) = -\frac{4\pi^2 \sqrt{\lambda}}{\Gamma(1/4)^4\, L} \qquad [36]$$

The Coulombic $1/L$ dependence is required by the conformal invariance of the theory. The fact that the potential scales as the square root of the 't Hooft coupling indicates some screening of the charges at large coupling.

Conformal Field Theories and Einstein Manifolds

Interesting generalizations of the duality between $AdS_5 \times S^5$ and $\mathcal{N} = 4$ SYM with less supersymmetry and more complicated gauge groups can be produced by placing D3 branes at the tip of a Ricci-flat six-dimensional cone $X$ (see Figure 1). The cone metric may be cast in the form

$$ds_X^2 = dr^2 + r^2\, ds_Y^2 \qquad [37]$$

where $Y$ is the level surface of $X$. In particular, $Y$ is a positively curved Einstein manifold, that is, one for which $R_{ij} = 4 g_{ij}$. In order to preserve the $\mathcal{N} = 1$ supersymmetry, $X$ must be a Calabi–Yau space; then $Y$ is defined to be Sasaki–Einstein.

Figure 1 D3 branes placed at the tip of a Ricci-flat cone X.

The D3 branes appear as a point in $X$ and span the transverse Minkowski space $\mathbb{R}^{3,1}$. The ten-dimensional metric they produce assumes the form [9], but with the sphere metric $d\Omega_5^2$ replaced by the metric on $Y$, $ds_Y^2$. The equality of tensions [10] now requires that

$$L^4 = \frac{\sqrt{\pi}\, \kappa N}{2\, \mathrm{vol}(Y)} = 4\pi g_s N \alpha'^2\, \frac{\pi^3}{\mathrm{vol}(Y)} \qquad [38]$$

In the near-horizon limit, $r \to 0$, the geometry factors into $AdS_5 \times Y$. Because the D3 branes are located at a singularity, the gauge theory becomes much more complicated, typically involving a product of several $SU(N)$ factors coupled to matter in bifundamental representations, often described using a quiver diagram (see Figure 2 for an example).

Figure 2 The quiver for $Y^{4,3}$. Each node corresponds to an $SU(N)$ gauge group and each arrow to a bifundamental chiral superfield.

The simplest examples of $X$ are orbifolds $\mathbb{C}^3/\Gamma$, where $\Gamma$ is a discrete subgroup of $SO(6)$. Indeed, if $\Gamma \subset SU(3)$, then $\mathcal{N} = 1$ supersymmetry is preserved. The level surface of such an $X$ is $Y = S^5/\Gamma$. In this case, the product structure of the gauge theory can be motivated by thinking about image stacks of D3 branes from the action of $\Gamma$.

The next simplest example of a Calabi–Yau cone $X$ is the conifold which may be described by the following equation in four complex variables:

$$\sum_{a=1}^{4} z_a^2 = 0 \qquad [39]$$

Since this equation is symmetric under an overall rescaling of the coordinates, this space is a cone. The level surface $Y$ of the conifold is a coset manifold $T^{1,1} = (SU(2) \times SU(2))/U(1)$. This space has the $SO(4) \sim SU(2) \times SU(2)$ symmetry which rotates the $z$'s, and also the $U(1)$ R-symmetry under $z_a \to e^{i\alpha} z_a$. The metric on $T^{1,1}$ is known explicitly; it assumes the form of an $S^1$ bundle over $S^2 \times S^2$.

The supersymmetric field theory on the D3 branes probing the conifold singularity is $SU(N) \times SU(N)$ gauge theory coupled to two chiral superfields, $A_i$, in the $(\mathbf{N}, \overline{\mathbf{N}})$ representation and two chiral superfields, $B_j$, in the $(\overline{\mathbf{N}}, \mathbf{N})$ representation. The $A$'s transform as a doublet under one of the global $SU(2)$'s, while the $B$'s transform as a doublet under the other $SU(2)$. Cancelation of the anomaly in the $U(1)$ R-symmetry requires that the $A$'s and the $B$'s each have R-charge 1/2. For consistency of the duality, it is necessary that we add an exactly marginal superpotential which preserves the $SU(2) \times SU(2) \times U(1)_R$ symmetry of the theory. Since a marginal superpotential has R-charge equal to 2 it must be quartic, and the symmetries fix it uniquely up to overall normalization:

$$W = \epsilon^{ij} \epsilon^{kl}\, \mathrm{tr}\, A_i B_k A_j B_l \qquad [40]$$

There are in fact infinite families of Calabi–Yau cones $X$, but there are two problems one faces in studying these generalized AdS/CFT correspondences. The first is geometric: the cones $X$ are not all well understood and only for relatively few do we have explicit metrics. However, it is often possible to calculate important quantities such as $\mathrm{vol}(Y)$ without knowing the metric. The second problem is gauge theoretic: although many techniques exist, there is no completely general procedure for constructing the gauge theory on a stack of D-branes at an arbitrary singularity.

Let us mention two important classes of Calabi–Yau cones $X$. The first class consists of cones over the so-called $Y^{p,q}$ Sasaki–Einstein spaces. Here, $p$ and $q$ are integers with $p \geq q$. Gauntlett et al. (2004) discovered metrics on all the $Y^{p,q}$, and the quiver gauge theories that live on the D-branes probing the singularity are now known. Making contact with the simpler examples discussed above, the $Y^{p,0}$ are orbifolds of $T^{1,1}$ while the $Y^{p,p}$ are orbifolds of $S^5$.

In the second class of cones $X$, a del Pezzo surface shrinks to zero size at the tip of the cone. A del Pezzo surface is an algebraic surface of complex dimension 2 with positive first Chern class. One simple del Pezzo surface is a complex projective space of dimension 2, $\mathbb{P}^2$, which gives rise to the $\mathcal{N} = 1$ preserving $S^5/\mathbb{Z}_3$ orbifold. Another simple case is $\mathbb{P}^1 \times \mathbb{P}^1$, which leads to $T^{1,1}/\mathbb{Z}_2$. The remaining del Pezzo surfaces $B_k$ are $\mathbb{P}^2$ blown up at $k$ points, $1 \leq k \leq 8$. The cone where $B_1$ shrinks to zero size has level surface $Y^{2,1}$. Gauge theories for all the del Pezzos have been constructed. Except for the three del Pezzos just discussed, and possibly also for $B_6$, metrics on the cones over these del Pezzos are not known. Nevertheless, it is known that for $3 \leq k \leq 8$, the volume of the Sasaki–Einstein manifold $Y$ associated with $B_k$ is $\pi^3 (9 - k)/27$.

The Central Charge

The central charge provides one of the most amazing ways to check the generalized AdS/CFT correspondences. The central charge $c$ and conformal anomaly $a$ can be defined as coefficients of certain curvature invariants in the trace of the stress energy tensor of the conformal gauge theory:

$$\langle T^{\alpha}{}_{\alpha} \rangle = -a E_4 - c I_4 \qquad [41]$$

(The curvature invariants $E_4$ and $I_4$ are quadratic in the Riemann tensor and vanish for Minkowski space.) As discussed above, correlators such as $\langle T^{\alpha}{}_{\alpha} \rangle$ can be calculated from supergravity, and one finds

$$a = c = \frac{\pi^3 N^2}{4\, \mathrm{vol}(Y)} \qquad [42]$$

On the gauge-theory side of the correspondence, anomalies completely determine $a$ and $c$:

$$a = \frac{3}{32}\left( 3\, \mathrm{tr}\, R^3 - \mathrm{tr}\, R \right), \qquad c = \frac{1}{32}\left( 9\, \mathrm{tr}\, R^3 - 5\, \mathrm{tr}\, R \right) \qquad [43]$$

The trace notation implies a sum over the R-charges of all of the fermions in the gauge theory. (From the geometric knowledge that $a = c$, we can conclude that $\mathrm{tr}\, R = 0$.)

The R-charges can be determined using the principle of $a$-maximization. For a superconformal gauge theory, the R-charges of the fermions maximize $a$ subject to the constraints that the

Novikov–Shifman–Vainshtein–Zakharov (NSVZ) beta function of each gauge group vanishes and the R-charge of each superpotential term is 2.

For the $Y^{p,q}$ spaces mentioned above, one finds that

$$\mathrm{vol}(Y^{p,q}) = \frac{q^2 \left[ 2p + \sqrt{4p^2 - 3q^2} \right]}{3p^2 \left[ 3q^2 - 2p^2 + p \sqrt{4p^2 - 3q^2} \right]}\, \pi^3 \qquad [44]$$

The gauge theory consists of $p - q$ fields $Z$, $p + q$ fields $Y$, $2p$ fields $U$, and $2q$ fields $V$. These fields all transform in the bifundamental representation of a pair of $SU(N)$ gauge groups (the quiver diagram for $Y^{4,3}$ is given in Figure 2). The NSVZ beta function and superpotential constraints determine the R-charges up to two free parameters $x$ and $y$. Let $x$ be the R-charge of $Z$ and $y$ the R-charge of $Y$. Then the $U$ have R-charge $1 - (1/2)(x + y)$ and the $V$ have R-charge $1 + (1/2)(x - y)$.

The technique of $a$-maximization leads to the result

$$x = \frac{1}{3q^2} \left[ -4p^2 - 2pq + 3q^2 + (2p + q) \sqrt{4p^2 - 3q^2} \right]$$

$$y = \frac{1}{3q^2} \left[ -4p^2 + 2pq + 3q^2 + (2p - q) \sqrt{4p^2 - 3q^2} \right]$$

Thus, as calculated by Benvenuti et al. (2004) and Bertolini et al. (2004)

$$a(Y^{p,q}) = \frac{\pi^3 N^2}{4\, \mathrm{vol}(Y^{p,q})} \qquad [45]$$

in remarkable agreement with the prediction [42] of the AdS/CFT duality.

A Path to a Confining Theory

There exists an interesting way of breaking the conformal invariance for spaces $Y$ whose topology includes an $S^2$ factor (examples of such spaces include $T^{1,1}$ and $Y^{p,q}$, which are topologically $S^2 \times S^3$). At the tip of the cone over $Y$, one may add $M$ wrapped D5 branes to the $N$ D3 branes. The gauge theory on such a combined stack is no longer conformal; it exhibits a novel pattern of quasiperiodic renormalization group flow, called a duality cascade.

To date, the most extensive study of a theory of this type has been carried out for the conifold, where one finds an $\mathcal{N} = 1$ supersymmetric $SU(N) \times SU(N + M)$ theory coupled to chiral superfields $A_1$, $A_2$ in the $(N, \overline{N + M})$ representation, and $B_1$, $B_2$ in the $(\bar{N}, N + M)$ representation. D5 branes source RR 3-form flux; hence, the supergravity dual of this theory has to include $M$ units of this flux. Klebanov and Strassler (2000) found an exact nonsingular supergravity solution incorporating the 3-form and the 5-form RR field strengths, and their back-reaction on the geometry. This back-reaction creates a geometric transition to the deformed conifold

$$\sum_{a=1}^{4} z_a^2 = \epsilon^2 \qquad [46]$$

and introduces a warp factor so that the full ten-dimensional geometry has the form

$$ds_{10}^2 = h^{-1/2}(\tau) \left( -(dx^0)^2 + (dx^i)^2 \right) + h^{1/2}(\tau)\, d\tilde{s}_6^2 \qquad [47]$$

where $d\tilde{s}_6^2$ is the Calabi–Yau metric of the deformed conifold, which is known explicitly.

The field-theoretic interpretation of this solution is unconventional. After a finite amount of RG flow, the $SU(N + M)$ group undergoes a Seiberg duality transformation. After this transformation, and an interchange of the two gauge groups, the new gauge theory is $SU(\tilde{N}) \times SU(\tilde{N} + M)$ with the same matter and superpotential, and with $\tilde{N} = N - M$. The self-similar structure of the gauge theory under the Seiberg duality is the crucial fact that allows this pattern to repeat many times. If $N = (k + 1)M$, where $k$ is an integer, then the duality cascade stops after $k$ steps, and we find $SU(M) \times SU(2M)$ gauge theory. This IR gauge theory exhibits a multitude of interesting effects visible in the dual supergravity background. One of them is confinement, which follows from the fact that the warp factor $h$ is finite and nonvanishing at the smallest radial coordinate, $\tau = 0$. The methods presented in the section "Calculation of Wilson loops" then imply that the quark–antiquark potential grows linearly at large distances. Other notable IR effects are chiral symmetry breaking and the Goldstone mechanism. Particularly interesting is the appearance of an entire baryonic branch of the moduli space in the gauge theory, whose existence has been demonstrated also in the dual supergravity language.

Conclusions

This article tries to present a logical path from studying gravitational properties of D-branes to the formulation of an exact duality between conformal field theories and string theory in anti-de Sitter backgrounds, and also sketches some methods for breaking the conformal symmetry. Due to space limitations, many aspects and applications of the AdS/CFT correspondence have been omitted. At the moment, practical applications of this duality are limited mainly to very strongly coupled, large-$N$ gauge theories, where the dual string description is well approximated by classical supergravity. To understand the implications of the duality for more general parameters, it is necessary to find better

methods for attacking the world sheet approach to string theories in anti-de Sitter backgrounds with RR background fields turned on. When such methods are found, it is likely that the material presented here will have turned out to be just a tiny tip of a monumental iceberg of dualities between fields and strings.

Acknowledgments

The authors are very grateful to all their collaborators on gauge/string duality for their valuable input over many years. The research of I R Klebanov is supported in part by the National Science Foundation (NSF) grant no. PHY-0243680. The research of C P Herzog is supported in part by the NSF under grant no. PHY99-07949. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.

See also: Brane Construction of Gauge Theories; Branes and Black Hole Statistical Mechanics; Einstein Equations: Exact Solutions; Gauge Theories from Strings; Large-N and Topological Strings; Large-N Dualities; Mirror Symmetry: A Geometric Survey; Quantum Chromodynamics; Quantum Field Theory in Curved Spacetime; Superstring Theories.

Further Reading

Aharony O, Gubser SS, Maldacena JM, Ooguri H, and Oz Y (2000) Large N field theories, string theory and gravity. Physics Reports 323: 183 (arXiv:hep-th/9905111).
Benvenuti S, Franco S, Hanany A, Martelli D, and Sparks J (2005) An infinite family of superconformal quiver gauge theories with Sasaki–Einstein duals. JHEP 0506: 064 (arXiv:hep-th/0411264).
Bertolini M, Bigazzi F, and Cotrone AL (2004) New checks and subtleties for AdS/CFT and a-maximization. JHEP 0412: 024 (arXiv:hep-th/0411249).
Bigazzi F, Cotrone AL, Petrini M, and Zaffaroni A (2002) Supergravity duals of supersymmetric four dimensional gauge theories. Rivista del Nuovo Cimento 25N12: 1 (arXiv:hep-th/0303191).
D'Hoker E and Freedman DZ (2002) Supersymmetric gauge theories and the AdS/CFT correspondence, arXiv:hep-th/0201253.
Gauntlett J, Martelli D, Sparks J, and Waldram D (2004) Sasaki–Einstein metrics on $S^2 \times S^3$. Advances in Theoretical and Mathematical Physics 8: 711 (arXiv:hep-th/0403002).
Gubser SS, Klebanov IR, and Polyakov AM (1998) Gauge theory correlators from noncritical string theory. Physics Letters B 428: 105 (arXiv:hep-th/9802109).
Herzog CP, Klebanov IR, and Ouyang P (2002) D-branes on the conifold and N = 1 gauge/gravity dualities, arXiv:hep-th/0205100.
Klebanov IR (2000) TASI lectures: introduction to the AdS/CFT correspondence, arXiv:hep-th/0009139.
Klebanov IR and Strassler MJ (2000) Supergravity and a confining gauge theory: Duality cascades and χSB-resolution of naked singularities. JHEP 0008: 052 (arXiv:hep-th/0007191).
Maldacena J (1998) The large N limit of superconformal field theories and supergravity. Advances in Theoretical and Mathematical Physics 2: 231 (arXiv:hep-th/9711200).
Maldacena JM (1998) Wilson loops in large N field theories. Physical Review Letters 80: 4859 (arXiv:hep-th/9803002).
Polchinski J (1998) String Theory. Cambridge: Cambridge University Press.
Polyakov AM (1999) The wall of the cave. International Journal of Modern Physics A 14: 645.
Rey SJ and Yee JT (2001) Macroscopic strings as heavy quarks in large N gauge theory and anti-de Sitter supergravity. European Physical Journal C 22: 379 (arXiv:hep-th/9803001).
Semenoff GW and Zarembo K (2002) Wilson loops in SYM theory: from weak to strong coupling. Nuclear Physics Proceedings Supplements 108: 106 (arXiv:hep-th/0202156).
Strassler MJ The duality cascade, TASI 2003 lectures, arXiv:hep-th/0505153.
Witten E (1998) Anti-de Sitter space and holography. Advances in Theoretical and Mathematical Physics 2: 253 (arXiv:hep-th/9802150).

Affine Quantum Groups


G W Delius and N MacKay, University of York, York, UK

© 2006 G W Delius. Published by Elsevier Ltd. All rights reserved.

Affine quantum groups are certain pseudoquasitriangular Hopf algebras that arise in mathematical physics in the context of integrable quantum field theory, integrable quantum spin chains, and solvable lattice models. They provide the algebraic framework behind the spectral parameter dependent Yang–Baxter equation

$$R_{12}(u)\, R_{13}(u + v)\, R_{23}(v) = R_{23}(v)\, R_{13}(u + v)\, R_{12}(u) \qquad [1]$$

One can distinguish three classes of affine quantum groups, each leading to a different dependence of the R-matrices on the spectral parameter $u$: Yangians lead to rational R-matrices, quantum affine algebras lead to trigonometric R-matrices, and elliptic quantum groups lead to elliptic R-matrices. We will mostly concentrate on the quantum affine algebras but many results hold similarly for the other classes.

After giving mathematical details about quantum affine algebras and Yangians in the first two sections, we describe how these algebras arise in different areas of mathematical physics in the three following sections. We end with a description of boundary quantum groups which extend the formalism to the boundary Yang–Baxter (reflection) equation.
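The Yang–Baxter equation [1] can be checked directly in a small case. The following is a minimal sketch and not part of the original article: it assumes the rational R-matrix $R(u) = 1 - P/u$ on $\mathbb{C}^2 \otimes \mathbb{C}^2$ (the form quoted in the Yangians section), and the helper names `rational_r` and `ybe_holds` are hypothetical. Exact rational arithmetic is used so that the two sides of [1] can be compared entry by entry.

```python
from fractions import Fraction


def eye(n):
    """n x n identity matrix with exact rational entries."""
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def kron(a, b):
    """Kronecker (tensor) product; the index of the first factor varies slowest."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


I2, I4 = eye(2), eye(4)

# Transposition operator P on C^2 (x) C^2: P(x (x) y) = y (x) x.
P = [[Fraction(x) for x in row]
     for row in ([1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1])]


def rational_r(u):
    """Assumed rational R-matrix R(u) = 1 - P/u (u must be nonzero)."""
    return [[I4[i][j] - P[i][j] / u for j in range(4)] for i in range(4)]


def ybe_holds(r, u, v):
    """Test R12(u) R13(u+v) R23(v) == R23(v) R13(u+v) R12(u) on three copies of C^2."""
    p23 = kron(I2, P)  # swaps tensor factors 2 and 3

    def r12(x):
        return kron(r(x), I2)

    def r23(x):
        return kron(I2, r(x))

    def r13(x):
        # Conjugating R12 by P23 moves its second leg from factor 2 to factor 3.
        return matmul(p23, matmul(r12(x), p23))

    lhs = matmul(r12(u), matmul(r13(u + v), r23(v)))
    rhs = matmul(r23(v), matmul(r13(u + v), r12(u)))
    return lhs == rhs
```

Calling `ybe_holds(rational_r, Fraction(3, 7), Fraction(5, 2))` compares the two 8 x 8 products exactly. Passing a different function of $u$ in place of `rational_r` is a quick way to see how restrictive the equation is.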

Quantum Affine Algebras

Definition

A quantum affine algebra $U_q(\hat{g})$ is a quantization of the enveloping algebra $U(\hat{g})$ of an affine Lie algebra (Kac–Moody algebra) $\hat{g}$. So we start by introducing affine Lie algebras and their enveloping algebras before proceeding to give their quantizations.

Let $g$ be a semisimple finite-dimensional Lie algebra over $\mathbb{C}$ of rank $r$ with Cartan matrix $(a_{ij})_{i,j=1,\ldots,r}$, symmetrizable via positive integers $d_i$, so that $d_i a_{ij}$ is symmetric. In terms of the simple roots $\alpha_i$, we have

$$a_{ij} = \frac{2\, \alpha_i \cdot \alpha_j}{|\alpha_i|^2} \quad \text{and} \quad d_i = \frac{|\alpha_i|^2}{2}$$

We can introduce an $\alpha_0 = -\sum_{i=1}^{r} n_i \alpha_i$ in such a way that the extended Cartan matrix $(a_{ij})_{i,j=0,\ldots,r}$ is of affine type, that is, it is positive semidefinite of rank $r$. The integers $n_i$ are referred to as Kac indices. Choosing $-\alpha_0$ to be the highest root of $g$ leads to an untwisted affine Kac–Moody algebra while choosing $-\alpha_0$ to be the highest short root of $g$ leads to a twisted affine Kac–Moody algebra.

One defines the affine Lie algebra $\hat{g}$ corresponding to this affine Cartan matrix as the Lie algebra (over $\mathbb{C}$) with generators $H_i$, $E_i^\pm$ for $i = 0, 1, \ldots, r$ and $D$ with relations

$$[H_i, E_j^\pm] = \pm a_{ij} E_j^\pm, \qquad [H_i, H_j] = 0$$
$$[E_i^+, E_j^-] = \delta_{ij} H_i \qquad [2]$$
$$[D, H_i] = 0, \qquad [D, E_i^\pm] = \pm \delta_{i,0} E_i^\pm$$
$$\sum_{k=0}^{1 - a_{ij}} (-1)^k \binom{1 - a_{ij}}{k} (E_i^\pm)^k E_j^\pm (E_i^\pm)^{1 - a_{ij} - k} = 0, \qquad i \neq j$$

The $E_i^\pm$ are referred to as Chevalley generators and the last set of relations are known as Serre relations. The generator $D$ is known as the canonical derivation. We will denote the algebra obtained by dropping the generator $D$ by $\hat{g}'$.

In applications to physics, the affine Lie algebra $\hat{g}$ often occurs in an isomorphic form as the loop Lie algebra $g[z, z^{-1}] \oplus \mathbb{C} \cdot c$ with Lie product (for untwisted $\hat{g}$)

$$[X z^k, Y z^l] = [X, Y] z^{k+l} + k\, \delta_{k,-l} (X, Y)\, c, \qquad \text{for } X, Y \in g,\ k, l \in \mathbb{Z} \qquad [3]$$

and $c$ being the central element.

The universal enveloping algebra $U(\hat{g})$ of $\hat{g}$ is the unital algebra over $\mathbb{C}$ with generators $H_i$, $E_i^\pm$ for $i = 0, 1, \ldots, r$ and $D$ and with relations given by [2] where now $[\ ,\ ]$ stands for the commutator instead of the Lie product.

To define the quantization of $U(\hat{g})$, one can either define $U_h(\hat{g})$ (Drinfeld 1985) as an algebra over the ring $\mathbb{C}[[h]]$ of formal power series over an indeterminate $h$ or one can define $U_q(\hat{g})$ (Jimbo 1985) as an algebra over the field $\mathbb{Q}(q)$ of rational functions of $q$ with coefficients in $\mathbb{Q}$. We will present $U_h(\hat{g})$ first.

The quantum affine algebra $U_h(\hat{g})$ is the unital algebra over $\mathbb{C}[[h]]$ topologically generated by $H_i$, $E_i^\pm$ for $i = 0, 1, \ldots, r$ and $D$ with relations

$$[H_i, E_j^\pm] = \pm a_{ij} E_j^\pm, \qquad [H_i, H_j] = 0$$
$$[E_i^+, E_j^-] = \delta_{ij} \frac{q_i^{H_i} - q_i^{-H_i}}{q_i - q_i^{-1}} \qquad [4]$$
$$[D, H_i] = 0, \qquad [D, E_i^\pm] = \pm \delta_{i,0} E_i^\pm$$
$$\sum_{k=0}^{1 - a_{ij}} (-1)^k \begin{bmatrix} 1 - a_{ij} \\ k \end{bmatrix}_{q_i} (E_i^\pm)^k E_j^\pm (E_i^\pm)^{1 - a_{ij} - k} = 0, \qquad i \neq j$$

where $q_i = q^{d_i}$ and $q = e^h$. The q-binomial coefficients are defined by

$$[n]_q = \frac{q^n - q^{-n}}{q - q^{-1}} \qquad [5]$$
$$[n]_q! = [n]_q\, [n-1]_q \cdots [2]_q\, [1]_q \qquad [6]$$
$$\begin{bmatrix} m \\ n \end{bmatrix}_q = \frac{[m]_q!}{[n]_q!\, [m-n]_q!} \qquad [7]$$

The quantum affine algebra $U_h(\hat{g})$ is a Hopf algebra with coproduct

$$\Delta(D) = D \otimes 1 + 1 \otimes D$$
$$\Delta(H_i) = H_i \otimes 1 + 1 \otimes H_i \qquad [8]$$
$$\Delta(E_i^\pm) = E_i^\pm \otimes q_i^{-H_i/2} + q_i^{H_i/2} \otimes E_i^\pm$$

antipode

$$S(D) = -D, \qquad S(H_i) = -H_i, \qquad S(E_i^\pm) = -q_i^{\mp 1} E_i^\pm \qquad [9]$$

and co-unit

$$\epsilon(D) = \epsilon(H_i) = \epsilon(E_i^\pm) = 0 \qquad [10]$$

It is easy to see that the classical enveloping algebra $U(\hat{g})$ can be obtained from the above by setting $h = 0$, or more formally,

$$U_h(\hat{g}) / h U_h(\hat{g}) = U(\hat{g})$$

We can also define the quantum affine algebra $U_q(\hat{g})$ as the algebra over $\mathbb{Q}(q)$ with generators $K_i$, $E_i^\pm$, $D$ for $i = 0, 1, \ldots, r$ and relations that are

obtained from the ones given above for $U_h(\hat{g})$ by setting

$$q_i^{H_i/2} = K_i, \qquad i = 0, \ldots, r \qquad [11]$$

One can go further to an algebraic formulation over $\mathbb{C}$ in which $q$ is a complex number (with some points including $q = 0$ not allowed). This has the advantage that it becomes possible to specialize, for example, to $q$ a root of unity, where special phenomena occur.

Representations

For applications in physics, the finite-dimensional representations of $U_h(\hat{g}')$ are the most interesting. As will be explained in later sections, these occur, for example, as particle multiplets in 2D quantum field theory or as spin Hilbert spaces in quantum spin chains. In the next subsection, we will use them to derive matrix solutions to the Yang–Baxter equation.

While for a nonaffine quantum algebra $U_h(g)$ the ring of representations is isomorphic to that of the classical enveloping algebra $U(g)$ (because in fact the algebras are isomorphic, as Drinfeld has pointed out), the corresponding fact is no longer true for affine quantum groups, except in the case $\hat{g} = a_n^{(1)} = \widehat{sl}_{n+1}$.

For the classical enveloping algebras $U(\hat{g}')$, any finite-dimensional representation of $U(g)$ also carries a finite-dimensional representation of $U(\hat{g}')$. In the quantum case, however, in general, an irreducible representation of $U_h(\hat{g}')$ reduces to a sum of representations of $U_h(g)$.

To classify the finite-dimensional representations of $U_h(\hat{g}')$, it is necessary to use a different realization of $U_h(\hat{g}')$ that looks more like a quantization of the loop algebra realization [3] than the realization in terms of Chevalley generators. In terms of the generators in this alternative realization, which we do not give here because of its complexity, the finite-dimensional representations can be viewed as pseudo-highest-weight representations. There is a set of $r$ fundamental representations $V^a$, $a = 1, \ldots, r$, each containing the corresponding $U_h(g)$ fundamental representation as a component, from the tensor products of which all the other finite-dimensional representations may be constructed. The details can be found in Chari and Pressley (1994).

Given some representation $\pi : U_h(\hat{g}') \to \mathrm{End}(V)$, we can introduce a parameter $\lambda$ with the help of the automorphism $\tau_\lambda$ of $U_h(\hat{g}')$ generated by $D$ and given by

$$\tau_\lambda(E_i^\pm) = \lambda^{\pm s_i} E_i^\pm, \qquad \tau_\lambda(H_i) = H_i, \qquad i = 0, \ldots, r \qquad [12]$$

Different choices for the $s_i$ correspond to different gradations. Commonly used are the homogeneous gradation, $s_0 = 1$, $s_1 = \cdots = s_r = 0$, and the principal gradation, $s_0 = s_1 = \cdots = s_r = 1$. We shall also need the spin gradation $s_i = d_i^{-1}$. The representations

$$\pi_\lambda = \pi \circ \tau_\lambda$$

play an important role in applications to integrable models where $\lambda$ is referred to as the (multiplicative) spectral parameter. In applications to particle scattering introduced in a later section, it is related to the rapidity of the particle. The generator $D$ can be realized as an infinitesimal scaling operator on $\lambda$ and thus plays the role of the Lorentz boost generator.

The tensor product representations $\pi^a_\lambda \otimes \pi^b_\mu$ are irreducible generically but become reducible for certain values of $\lambda/\mu$, a fact which again is important in applications (fusion procedure, particle-bound states).

R-Matrices

A Hopf algebra $A$ is said to be almost cocommutative if there exists an invertible element $\mathcal{R} \in A \otimes A$ such that

$$\mathcal{R}\, \Delta(x) = (\sigma \circ \Delta(x))\, \mathcal{R}, \qquad \text{for all } x \in A \qquad [13]$$

where $\sigma : x \otimes y \mapsto y \otimes x$ exchanges the two factors in the coproduct. In a quasitriangular Hopf algebra, this element $\mathcal{R}$ satisfies

$$(\Delta \otimes \mathrm{id})(\mathcal{R}) = \mathcal{R}_{13} \mathcal{R}_{23}, \qquad (\mathrm{id} \otimes \Delta)(\mathcal{R}) = \mathcal{R}_{13} \mathcal{R}_{12} \qquad [14]$$

and is known as the universal R-matrix (see Hopf Algebras and q-Deformation Quantum Groups). As a consequence of [13] and [14], it automatically satisfies the Yang–Baxter equation

$$\mathcal{R}_{12} \mathcal{R}_{13} \mathcal{R}_{23} = \mathcal{R}_{23} \mathcal{R}_{13} \mathcal{R}_{12} \qquad [15]$$

For technical reasons, to do with the infinite number of root vectors of $\hat{g}$, the quantum affine algebra $U_h(\hat{g})$ does not possess a universal R-matrix that is an element of $U_h(\hat{g}) \otimes U_h(\hat{g})$. However, as pointed out by Drinfeld (1985), it possesses a pseudouniversal R-matrix $\mathcal{R}(\lambda) \in (U_h(\hat{g}') \otimes U_h(\hat{g}'))((\lambda))$. The $\lambda$ is related to the automorphism $\tau_\lambda$ defined in [12]. When using the homogeneous gradation, $\mathcal{R}(\lambda)$ is a formal power series in $\lambda$.

When the pseudouniversal R-matrix is evaluated in the tensor product of any two indecomposable finite-dimensional representations $\pi^1$ and $\pi^2$, one obtains a numerical R-matrix

$$R^{12}(\lambda) = (\pi^1 \otimes \pi^2)\, \mathcal{R}(\lambda) \qquad [16]$$
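For the finite-dimensional algebra $U_q(sl_2)$ the properties [13] and [15], and the fact that the coproduct respects relation [4], can be verified by machine. The sketch below is an illustration and not taken from the article. It assumes the two-dimensional representation with $H = \mathrm{diag}(1, -1)$, the coproduct convention $\Delta(E^\pm) = E^\pm \otimes q^{-H/2} + q^{H/2} \otimes E^\pm$, an arbitrary rational test value $q^{1/2} = 3/2$, and a constant matrix `R` written down by hand for this convention. All Python names are hypothetical.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def kron(a, b):
    """Kronecker (tensor) product; the index of the first factor varies slowest."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def combine(a, b, ca=1, cb=1):
    """Entrywise linear combination ca*a + cb*b."""
    return [[ca * x + cb * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


S = Fraction(3, 2)   # assumed test value of q^(1/2)
Q = S * S            # q

I2 = [[1, 0], [0, 1]]
E_PLUS = [[0, 1], [0, 0]]
E_MINUS = [[0, 0], [1, 0]]
Q_HALF_H = [[S, 0], [0, 1 / S]]         # q^{H/2}
Q_MINUS_HALF_H = [[1 / S, 0], [0, S]]   # q^{-H/2}
Q_H = [[Q, 0], [0, 1 / Q]]              # q^{H}
Q_MINUS_H = [[1 / Q, 0], [0, Q]]        # q^{-H}


def coproduct(e):
    """Delta(E) = E (x) q^{-H/2} + q^{H/2} (x) E in the 2 (x) 2 representation."""
    return combine(kron(e, Q_MINUS_HALF_H), kron(Q_HALF_H, e))


def opposite_coproduct(e):
    """The same coproduct with the two tensor factors exchanged."""
    return combine(kron(Q_MINUS_HALF_H, e), kron(e, Q_HALF_H))


def coproduct_is_homomorphism():
    """[Delta(E+), Delta(E-)] == (Delta(q^H) - Delta(q^-H)) / (q - 1/q)."""
    dp, dm = coproduct(E_PLUS), coproduct(E_MINUS)
    lhs = combine(matmul(dp, dm), matmul(dm, dp), 1, -1)
    c = 1 / (Q - 1 / Q)
    rhs = combine(kron(Q_H, Q_H), kron(Q_MINUS_H, Q_MINUS_H), c, -c)
    return lhs == rhs


# Assumed constant R-matrix in the basis ordered as ++, +-, -+, --.
R = [[Q, 0, 0, 0],
     [0, 1, 0, 0],
     [0, Q - 1 / Q, 1, 0],
     [0, 0, 0, Q]]


def r_intertwines():
    """R Delta(x) == Delta_op(x) R for x = E+, E-, and R commutes with Delta(q^H)."""
    ok = all(matmul(R, coproduct(x)) == matmul(opposite_coproduct(x), R)
             for x in (E_PLUS, E_MINUS))
    dk = kron(Q_H, Q_H)
    return ok and matmul(R, dk) == matmul(dk, R)


def constant_ybe_holds():
    """R12 R13 R23 == R23 R13 R12 on three copies of C^2."""
    swap = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
    p23 = kron(I2, swap)
    r12, r23 = kron(R, I2), kron(I2, R)
    r13 = matmul(p23, matmul(r12, p23))
    return matmul(r12, matmul(r13, r23)) == matmul(r23, matmul(r13, r12))
```

The three functions take no arguments and each returns a boolean. Changing `S` to another nonzero rational with `S * S != 1` repeats the check at a different value of $q$.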

The entries of these numerical R-matrices are rational functions of the multiplicative spectral parameter $\lambda$ but when written in terms of the additive spectral parameter $u = \log(\lambda)$ they are trigonometric functions of $u$ and satisfy the Yang–Baxter equation in the form given in [1]. The matrix

$$\check{R}^{12}(\lambda) = \sigma \circ R^{12}(\lambda)$$

satisfies the intertwining relation

$$\check{R}^{12}(\lambda/\mu)\, (\pi^1_\lambda \otimes \pi^2_\mu)(\Delta(x)) = (\pi^2_\mu \otimes \pi^1_\lambda)(\Delta(x))\, \check{R}^{12}(\lambda/\mu) \qquad [17]$$

for any $x \in U_h(\hat{g}')$. It follows from the irreducibility of the tensor product representations that these R-matrices satisfy the Yang–Baxter equations

$$(\mathrm{id} \otimes \check{R}^{23}(\mu/\nu)) (\check{R}^{13}(\lambda/\nu) \otimes \mathrm{id}) (\mathrm{id} \otimes \check{R}^{12}(\lambda/\mu)) = (\check{R}^{12}(\lambda/\mu) \otimes \mathrm{id}) (\mathrm{id} \otimes \check{R}^{13}(\lambda/\nu)) (\check{R}^{23}(\mu/\nu) \otimes \mathrm{id}) \qquad [18]$$

or, graphically,

[Figure: the two sides of [18] drawn as braid-like diagrams, each taking $V^1 \otimes V^2 \otimes V^3$ at the bottom to $V^3 \otimes V^2 \otimes V^1$ at the top.]

Explicit formulas for the pseudouniversal R-matrices were found by Khoroshkin and Tolstoy. However, these are difficult to evaluate explicitly in specific representations so that in practice it is easiest to find the numerical R-matrices $\check{R}^{ab}(\lambda)$ by solving the intertwining relation [17]. It should be stressed that solving the intertwining relation, which is a linear equation for the R-matrix, is much easier than directly solving the Yang–Baxter equation, a cubic equation.

Yangians

As remarked by Drinfeld (1986), for untwisted $\hat{g}$ the quantum affine algebra $U_h(\hat{g}')$ degenerates as $h \to 0$ into another pseudoquasitriangular Hopf algebra, the Yangian $Y(g)$ (Drinfeld 1985). It is associated with R-matrices which are rational functions of the additive spectral parameter $u$. Its representation ring coincides with that of $U_h(\hat{g}')$.

Consider a general presentation of a Lie algebra $g$, with generators $I_a$ and structure constants $f_{abc}$, so that

$$[I_a, I_b] = f_{abc} I_c, \qquad \Delta(I_a) = I_a \otimes 1 + 1 \otimes I_a$$

(with summation over repeated indices). The Yangian $Y(g)$ is the algebra generated by these and a second set of generators $J_a$ satisfying

$$[I_a, J_b] = f_{abc} J_c, \qquad \Delta(J_a) = J_a \otimes 1 + 1 \otimes J_a + \tfrac{1}{2} f_{abc}\, I_c \otimes I_b$$

The requirement that $\Delta$ be a homomorphism imposes further relations:

$$[J_a, [J_b, I_c]] - [I_a, [J_b, J_c]] = \alpha_{abcdeg} \{ I_d, I_e, I_g \}$$

and

$$[[J_a, J_b], [I_l, J_m]] + [[J_l, J_m], [I_a, J_b]] = (\alpha_{abcdeg} f_{lmc} + \alpha_{lmcdeg} f_{abc}) \{ I_d, I_e, J_g \}$$

where

$$\alpha_{abcdeg} = \frac{1}{24} f_{adi} f_{bej} f_{cgk} f_{ijk}, \qquad \{ x_1, x_2, x_3 \} = \sum_{i \neq j \neq k} x_i x_j x_k$$

When $g = sl_2$ the first of these is trivial, while for $g \neq sl_2$ the first implies the second. The co-unit is $\epsilon(I_a) = \epsilon(J_a) = 0$; the antipode is $s(I_a) = -I_a$, $s(J_a) = -J_a + (1/2) f_{abc} I_c I_b$. The Yangian may be obtained from $U_h(\hat{g}')$ by expanding in powers of $h$. For the precise relationship, see Drinfeld (1985) and MacKay (2005). In the spin gradation, the automorphism [12] generated by $D$ descends to $Y(g)$ as $I_a \mapsto I_a$, $J_a \mapsto J_a + u I_a$.

There are two other realizations of $Y(g)$. The first (see, for example, Molev 2003) defines $Y(gl_n)$ directly from

$$R(u - v)\, T_1(u)\, T_2(v) = T_2(v)\, T_1(u)\, R(u - v)$$

where $T_1(u) = T(u) \otimes \mathrm{id}$, $T_2(v) = \mathrm{id} \otimes T(v)$, and

$$T(u) = \sum_{i,j=1}^{n} t_{ij}(u) \otimes e_{ij}, \qquad t_{ij}(u) = \delta_{ij} + I_{ij} u^{-1} + J_{ij} u^{-2} + \cdots$$

where $e_{ij}$ are the standard matrix units for $gl_n$. The rational R-matrix for the $n$-dimensional representation of $gl_n$ is

$$R(u - v) = 1 - \frac{P}{u - v}, \qquad \text{where } P = \sum_{i,j=1}^{n} e_{ij} \otimes e_{ji}$$

is the transposition operator. $Y(gl_n)$ is then defined to be the algebra generated by $I_{ij}$, $J_{ij}$, and must be quotiented by the quantum determinant at its center to define $Y(sl_n)$. The coproduct takes a particularly simple form,

$$\Delta(t_{ij}(u)) = \sum_{k=1}^{n} t_{ik}(u) \otimes t_{kj}(u)$$
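The defining relation $R(u - v) T_1(u) T_2(v) = T_2(v) T_1(u) R(u - v)$ and the matrix-product form of the coproduct can be illustrated numerically. The sketch below is not from the article. It assumes $n = 2$ and a one-site representation in which $T(u)$ acts as $1 - P/u$ between the auxiliary space and a single quantum site (an illustrative choice), and it reads the coproduct as a product of such one-site factors in the auxiliary space. Tensor factors 0 and 1 are the two auxiliary spaces, and the function names are hypothetical.

```python
from fractions import Fraction


def identity(dim):
    """dim x dim identity matrix with exact rational entries."""
    return [[Fraction(int(i == j)) for j in range(dim)] for i in range(dim)]


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def swap_operator(i, j, n):
    """Permutation P_ij exchanging tensor factors i and j of n copies of C^2."""
    dim = 2 ** n
    op = [[Fraction(0)] * dim for _ in range(dim)]
    for idx in range(dim):
        # Read the basis index as n bits, one bit per tensor factor.
        bits = [(idx >> (n - 1 - k)) & 1 for k in range(n)]
        bits[i], bits[j] = bits[j], bits[i]
        out = int("".join(map(str, bits)), 2)
        op[out][idx] = Fraction(1)
    return op


def r_matrix(i, j, u, n):
    """R_ij(u) = 1 - P_ij / u acting on factors i and j of n copies of C^2."""
    dim = 2 ** n
    one, p = identity(dim), swap_operator(i, j, n)
    return [[one[a][b] - p[a][b] / u for b in range(dim)] for a in range(dim)]


def monodromy(aux, u, n_sites):
    """T_aux(u) as a product of one-site factors; sites are tensor factors 2, 3, ..."""
    n = 2 + n_sites
    t = identity(2 ** n)
    for site in range(2, n):
        t = matmul(t, r_matrix(aux, site, u, n))
    return t


def rtt_holds(u, v, n_sites, w=None):
    """Test R_01(w) T_0(u) T_1(v) == T_1(v) T_0(u) R_01(w), with w = u - v by default."""
    w = u - v if w is None else w
    r = r_matrix(0, 1, w, 2 + n_sites)
    t0, t1 = monodromy(0, u, n_sites), monodromy(1, v, n_sites)
    return matmul(r, matmul(t0, t1)) == matmul(matmul(t1, t0), r)
```

With `n_sites=1` the check reduces to the Yang–Baxter equation for the rational R-matrix. With `n_sites=2` it tests that the product of two one-site factors, which is how the coproduct acts on $T(u)$ in this sketch, again satisfies the same relation.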

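Here we do not give explicitly the third realization, namely Drinfeld's new realization of $Y(g)$ (Drinfeld 1988), but we remark that it was in this presentation that Drinfeld found a correspondence between certain sets of polynomials and finite-dimensional irreducible representations of $Y(g)$, thus classifying these (although not thereby deducing their dimension or constructing the action of $Y(g)$). As remarked earlier, the structure is as in the earlier section: $Y(g)$ representations are in general $g$-reducible, and there is a set of $r$ fundamental $Y(g)$-representations, containing the fundamental $g$-representations as components, from which all other representations can be constructed.

Origins in the Quantum Inverse-Scattering Method

Quantum affine algebras for general $\hat{g}$ first appear in Drinfeld (1985, 1986) and Jimbo (1985, 1986), but they have their origin in the quantum inverse-scattering method (QISM) of the St. Petersburg school, and the essential features of $U_h(\widehat{sl}_2)$ first appear in Kulish and Reshetikhin (1983). In this section, we explain how the quantization of the Lax-pair description of affine Toda theory led to the discovery of the $U_h(\hat{g})$ coproduct, commutation relations, and R-matrix. We use the normalizations of Jimbo (1986), in which the $H_i$ are rescaled so that the Cartan matrix $a_{ij} = \alpha_i . \alpha_j$ is symmetric.

We begin with the affine Toda field equations

$$\partial^\mu \partial_\mu \phi_i = -\frac{m^2}{\beta} \sum_{j=1}^{r} \left( e^{\beta a_{ij} \phi_j} - n_i\, e^{\beta \alpha_0 . \alpha_j \phi_j} \right)$$

an integrable model in $\mathbb{R}^{1+1}$ of $r$ real scalar fields $\phi_i(x, t)$ with a mass parameter $m$ and coupling constant $\beta$. Equivalently, we may write $[\partial_x + L_x, \partial_t + L_t] = 0$ for the Lax pair

$$L_x(x, t) = \frac{\beta}{2} \sum_{i=1}^{r} H_i\, \partial_t \phi_i + \frac{m}{2} \sum_{i,j=1}^{r} e^{(\beta/2) a_{ij} \phi_j} \left( E_i^+ + E_i^- \right) + \frac{m}{2} \sum_{j=1}^{r} e^{(\beta/2) a_{0j} \phi_j} \left( \lambda E_0^+ + \frac{1}{\lambda} E_0^- \right)$$

$$L_t(x, t) = \frac{\beta}{2} \sum_{i=1}^{r} H_i\, \partial_x \phi_i + \frac{m}{2} \sum_{i,j=1}^{r} e^{(\beta/2) a_{ij} \phi_j} \left( E_i^+ - E_i^- \right) + \frac{m}{2} \sum_{j=1}^{r} e^{(\beta/2) a_{0j} \phi_j} \left( \lambda E_0^+ - \frac{1}{\lambda} E_0^- \right)$$

with arbitrary $\lambda \in \mathbb{C}$. The classical integrability of the system is seen in the existence of $r(\lambda, \lambda')$ such that

$$\{ T(\lambda) \overset{\otimes}{,} T(\lambda') \} = [\, r(\lambda, \lambda'),\ T(\lambda) \otimes T(\lambda')\, ]$$

where $T(\lambda) = T(-\infty, \infty; \lambda)$ and $T(x, y; \lambda) = P \exp\left( \int_x^y L(\xi; \lambda)\, d\xi \right)$. Taking the trace of this relation gives an infinity of charges in involution.

Quantization is problematic, owing to divergences in $T$. The QISM regularizes these by putting the model on a lattice of spacing $\Delta$, defining the lattice Lax operator to be

$$L_n(\lambda) = T\left( (n - 1/2)\Delta, (n + 1/2)\Delta; \lambda \right) = P \exp\left( \int_{(n - 1/2)\Delta}^{(n + 1/2)\Delta} L(\xi; \lambda)\, d\xi \right)$$

The lattice monodromy matrix is then $T(\lambda) = \lim_{l \to -\infty,\, m \to \infty} T_l^m$ where $T_l^m = L_m L_{m-1} \cdots L_{l+1}$, and its trace again yields an infinity of commuting charges, provided that there exists a quantum R-matrix $R(\lambda_1, \lambda_2)$ such that

$$R(\lambda_1, \lambda_2)\, L_n^1(\lambda_1)\, L_n^2(\lambda_2) = L_n^2(\lambda_2)\, L_n^1(\lambda_1)\, R(\lambda_1, \lambda_2) \qquad [19]$$

where $L_n^1(\lambda_1) = L_n(\lambda_1) \otimes \mathrm{id}$, $L_n^2(\lambda_2) = \mathrm{id} \otimes L_n(\lambda_2)$. That $R$ solves the Yang–Baxter equation follows from the equivalence of the two ways of intertwining $L_n(\lambda_1) \otimes L_n(\lambda_2) \otimes L_n(\lambda_3)$ with $L_n(\lambda_3) \otimes L_n(\lambda_2) \otimes L_n(\lambda_1)$.

To compute $L_n(\lambda)$, one uses the canonical, equal-time commutation relations for the $\phi_i$ and $\dot{\phi}_i$. In terms of the lattice fields

$$p_{i,n} = \int_{(n - 1/2)\Delta}^{(n + 1/2)\Delta} \dot{\phi}_i(x)\, dx, \qquad q_{i,n} = \int_{(n - 1/2)\Delta}^{(n + 1/2)\Delta} e^{(\beta/2) \sum_j a_{ij} \phi_j(x)}\, dx$$

the only nontrivial relation is $[p_{i,n}, q_{j,n}] = (i\hbar\beta/2)\, \delta_{ij}\, q_{j,n}$, and one finds

$$L_n(\lambda) = \exp\left( \frac{\beta}{2} \sum_i H_i p_{i,n} \right) + \exp\left( \frac{\beta}{4} \sum_j H_j p_{j,n} \right) \left[ \frac{m}{2} \sum_i q_{i,n} \left( E_i^+ + E_i^- \right) + \frac{m}{2} \prod_i q_{i,n}^{-n_i} \left( \lambda E_0^+ + \frac{1}{\lambda} E_0^- \right) \right] \exp\left( \frac{\beta}{4} \sum_j H_j p_{j,n} \right) + O(\Delta^2)$$

the expression used by the St Petersburg school and by Jimbo. We now make the replacement $E_i^\pm \mapsto q^{H_i/4} E_i^\pm q^{H_i/4}$, where $q = \exp(i\hbar\beta^2/2)$, and compute the $O(\Delta)$ terms in [19], which reduce to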

$$R(z) \left( H_i \otimes 1 + 1 \otimes H_i \right) = \left( H_i \otimes 1 + 1 \otimes H_i \right) R(z)$$

$$R(z) \left( E_i^\pm \otimes q^{-H_i/2} + q^{H_i/2} \otimes E_i^\pm \right) = \left( q^{-H_i/2} \otimes E_i^\pm + E_i^\pm \otimes q^{H_i/2} \right) R(z)$$

$$R(z) \left( z^{\pm 1} E_0^\pm \otimes q^{-H_0/2} + q^{H_0/2} \otimes E_0^\pm \right) = \left( q^{-H_0/2} \otimes E_0^\pm + z^{\pm 1} E_0^\pm \otimes q^{H_0/2} \right) R(z)$$

where $z = \lambda_1/\lambda_2$. We recognize in these the $U_h(\hat{g})$ coproduct and thus the intertwining relations, in the homogeneous gradation. These equations were solved for $R$ in defining representations of nonexceptional $g$ by Jimbo (1986).

For $\hat{g} = \widehat{sl}_2$, it was Kulish and Reshetikhin (1983) who first discovered that the requirement that the coproduct must be an algebra homomorphism forces the replacement of the commutation relations of $U(\widehat{sl}_2)$ by those of $U_h(\widehat{sl}_2)$; more generally it requires the replacement of $U(\hat{g})$ by $U_h(\hat{g})$.

Affine Quantum Group Symmetry and the Exact S-Matrix

In the last section, we saw the origins of $U_h(\hat{g})$ in the auxiliary algebra introduced in the Lax pair. However, the quantum affine algebras also play a second role, as a symmetry algebra. An imaginary-coupled affine Toda field theory based on the affine algebra $\hat{g}^\vee$ possesses the quantum affine algebra $U_h(\hat{g})$ as a symmetry algebra, where $\hat{g}^\vee$ is the Langlands dual to $\hat{g}$ (the algebra obtained by replacing roots by coroots).

The solitonic particle states in affine Toda theories form multiplets which transform in the fundamental representations of the quantum affine algebra. Multiparticle states transform in tensor product representations $V^a \otimes V^b$. The scattering of two solitons of type $a$ and $b$ with relative rapidity $\theta$ is described by the S-matrix $S^{ab}(\theta) : V^a \otimes V^b \to V^b \otimes V^a$, graphically represented in Figure 1a. It then follows from the symmetry that the two-particle scattering matrix (S-matrix) for solitons must be proportional to the intertwiner for these tensor product representations, the R matrix:

$$S^{ab}(\theta) = f^{ab}(\theta)\, \check{R}^{ab}(\theta)$$

with $\theta$ proportional to $u$, the additive spectral parameter. The scalar prefactor $f^{ab}(\theta)$ is not determined by the symmetry but is fixed by other requirements like unitarity, crossing symmetry, and the bootstrap principle.

Figure 1 (a) Graphical representation of a two-particle scattering process described by the S-matrix $S^{ab}(\theta)$. (b) At special values $\theta^c_{ab}$ of the relative spectral parameter, the two particles of types $a$ and $b$ form a bound state of type $c$.

It turns out that the axiomatic properties of the R-matrices are in perfect agreement with the axiomatic properties of the analytic S-matrix. For example, crossing symmetry of the S-matrix, graphically represented by

[Figure: equation [20], three equal two-particle scattering diagrams with external legs $a$ and $b$, related by crossing.]

is a consequence of the property of the universal R-matrix with respect to the action of the antipode $S$,

$$(S \otimes 1) \mathcal{R} = \mathcal{R}^{-1}$$

An S-matrix will have poles at certain imaginary rapidities $\theta^c_{ab}$ corresponding to the formation of virtual bound states. This is graphically represented in Figure 1b. The location of the pole is determined by the masses of the three particles involved,

$$m_c^2 = m_a^2 + m_b^2 + 2 m_a m_b \cos(i \theta^c_{ab})$$

At the bound state pole, the S-matrix will project onto the multiplet $V^c$. Thus, the $\check{R}$ matrix has to have this projection property as well and indeed, this turns out to be the case. The bootstrap principle, whereby the S-matrix for a bound state is obtained from the S-matrices of the constituent particles,

[Figure: equation [21], the scattering of a particle $d$ off the bound state $c$ equated with its successive scattering off the constituents $a$ and $b$.]

is a consequence of the property [14] of the universal R-matrix with respect to the coproduct.

There is a famous no-go theorem due to Coleman and Mandula which states the impossibility of combining space-time and internal symmetries in any but a trivial way. Affine quantum group symmetry circumvents this no-go theorem. In fact, the derivation $D$ is the infinitesimal two-dimensional Lorentz boost generator and the other symmetry

charges transform nontrivially under these Lorentz transformations, see [2].

The noncocommutative coproduct [8] means that a $U_h(\hat{g})$ symmetry generator, when acting on a 2-soliton state, acts differently on the left soliton than on the right soliton. This is only possible because the generator is a nonlocal symmetry charge, that is, a charge which is obtained as the space integral of the time component of a current which itself is a nonlocal expression in terms of the fields of the theory.

Similarly, many nonlinear sigma models possess nonlocal charges which form $Y(g)$, and the construction proceeds similarly, now utilizing rational R-matrices, and with particle multiplets forming fundamental representations of $Y(g)$. In each case, the three-point couplings corresponding to the formation of bound states, and thus the analogs for $U_h(\hat{g})$ and $Y(g)$ of the Clebsch–Gordan couplings, obey a rather beautiful geometric rule originally deduced in simpler, purely elastic scattering models (Chari and Pressley 1996).

More details about this topic can be found in Delius (1995) and MacKay (2005).

Integrable Quantum Spin Chains

Affine quantum groups provide an unlimited supply of integrable quantum spin chains. From any R-matrix $R(\theta)$ for any tensor product of finite-dimensional representations $W \otimes V$, one can produce an integrable quantum system on the Hilbert space $V^{\otimes n}$. This Hilbert space can then be interpreted as the space of $n$ interacting spins. The space $W$ is an auxiliary space required in the construction but not playing a role in the physics.

Given an arbitrary R-matrix $R(\theta)$, one defines the monodromy matrix $T(\theta) \in \mathrm{End}(W \otimes V^{\otimes n})$ by

$$T(\theta) = R_{01}(\theta - \theta_1)\, R_{02}(\theta - \theta_2) \cdots R_{0n}(\theta - \theta_n)$$

where, as usual, $R_{ij}$ is the R-matrix acting on the $i$th and $j$th component of the tensor product space. The $\theta_i$ can be chosen arbitrarily for convenience. Graphically the monodromy matrix can be represented as

[Figure: a horizontal line for the auxiliary space $W$ crossing the vertical lines of $V_1, V_2, V_3, \ldots, V_{n-1}, V_n$.]

As a consequence of the Yang–Baxter equation satisfied by the R-matrices the monodromy matrix satisfies

$$R\, T\, T = T\, T\, R \qquad [22]$$

or, graphically,

[Figure: two auxiliary lines $W$ crossing each other and the lines $V_1, V_2, \ldots, V_n$, with the crossing of the auxiliary lines on either side of the chain.]

One defines the transfer matrix

$$\tau(\theta) = \mathrm{tr}_W\, T(\theta)$$

which is now an operator on $V^{\otimes n}$, the Hilbert space of the quantum spin chain. Due to [22], two transfer matrices commute,

$$[\tau(\theta), \tau(\theta')] = 0$$

and thus the $\tau(\theta)$ can be seen as a generating function of an infinite number of commuting charges, one of which will be chosen as the Hamiltonian. This Hamiltonian can then be diagonalized using the algebraic Bethe ansatz.

One is usually interested in the thermodynamic limit where the number of spins goes to infinity. In this limit, it has been conjectured, the Hilbert space of the spin chain carries a certain infinite-dimensional representation of the quantum affine algebra and this has been used to solve the model algebraically, using vertex operators (Jimbo and Miwa 1995).

Boundary Quantum Groups

In applications to physical systems that have a boundary, the Yang–Baxter equation [1] appears in conjunction with the boundary Yang–Baxter equation, also known as the reflection equation,

$$R_{12}(u - v)\, K_1(u)\, R_{21}(u + v)\, K_2(v) = K_2(v)\, R_{12}(u + v)\, K_1(u)\, R_{21}(u - v) \qquad [23]$$

The matrices $K$ are known as reflection matrices. This equation was originally introduced by Cherednik to describe the reflection of particles from a boundary in an integrable scattering theory and was used by Sklyanin to construct integrable spin chains and quantum field theories with boundaries.

Boundary quantum groups are certain co-ideal subalgebras of affine quantum groups. They provide the algebraic structures underlying the solutions of the boundary Yang–Baxter equation in the same way in which affine quantum groups underlie the solutions of the ordinary Yang–Baxter equation. Both allow one to find solutions of the respective Yang–Baxter equation by solving a linear intertwining relation. In the case without spectral parameters these algebras appear in the theory of braided groups (see Hopf Algebras and q-Deformation Quantum Groups and Braided and Modular Tensor Categories).

For example, the subalgebra $B_\epsilon(\hat{g})$ of $U_h(\hat{g}')$ generated by

$$Q_i = q_i^{H_i/2}\,(E_i^+ + E_i^-) + \epsilon_i\,(q_i^{H_i} - 1), \qquad i = 0, \ldots, r \qquad [24]$$

is a boundary quantum group for certain choices of the parameters $\epsilon_i \in \mathbb{C}[[h]]$. It is a left co-ideal subalgebra of $U_h(\hat{g}')$ because

$$\Delta(Q_i) = Q_i \otimes 1 + q_i^{H_i} \otimes Q_i \in U_h(\hat{g}') \otimes B_\epsilon(\hat{g}) \qquad [25]$$

Intertwiners $K(\lambda) : V_{\eta\lambda} \to V_{\eta/\lambda}$ for some constant $\eta$ satisfying

$$K(\lambda)\, Q = Q\, K(\lambda), \qquad \text{for all } Q \in B_\epsilon(\hat{g}) \qquad [26]$$

provide solutions of the reflection equation in the form

$$(\mathrm{id} \otimes K_2(\mu))\, \check{R}_{12}(\lambda\mu)\, (\mathrm{id} \otimes K_1(\lambda))\, \check{R}_{21}(\lambda/\mu) = \check{R}_{12}(\lambda/\mu)\, (\mathrm{id} \otimes K_1(\lambda))\, \check{R}_{21}(\lambda\mu)\, (\mathrm{id} \otimes K_2(\mu)) \qquad [27]$$

This can be extended to the case where the boundary itself carries a representation $W$ of $B_\epsilon(\hat{g})$. The boundary Yang–Baxter equation can be represented graphically as

[Figure: graphical form of the boundary Yang–Baxter equation, with lines labeled by the representation spaces $V$ reflecting from the boundary line $W$]

Another example is provided by twisted Yangians where, when the $I_a$ and $J_a$ are constructed as nonlocal charges in sigma models, it is found that a boundary condition which preserves integrability leaves only the subset

$$I_i \quad \text{and} \quad \tilde{J}_p = J_p + \tfrac{1}{4} f_{piq}\,(I_i I_q + I_q I_i)$$

conserved, where $i$ labels the $h$-indices and $p, q$ the $k$-indices of a symmetric splitting $g = h \oplus k$. The algebra $Y(g, h)$ generated by the $I_i$, $\tilde{J}_p$ is, like $B_\epsilon(\hat{g})$, a co-ideal subalgebra, $\Delta(Y(g, h)) \subset Y(g) \otimes Y(g, h)$, and again yields an intertwining relation for K-matrices. For $g = sl_n$ and $h = so_n$ or $sp_{2n}$, $Y(g, h)$ is the twisted Yangian described in Molev (2003).

All the constructions in earlier sections of this review have analogs in the boundary setting. For more details see Delius and MacKay (2003) and MacKay (2005).

See also: Bethe Ansatz; Boundary Conformal Field Theory; Classical r-Matrices, Lie Bialgebras, and Poisson Lie Groups; Hopf Algebras and q-Deformation Quantum Groups; Riemann–Hilbert Problem; Solitons and Kac–Moody Lie Algebras; Yang–Baxter Equations.

Further Reading

Chari V and Pressley AN (1994) Quantum Groups. Cambridge: Cambridge University Press.
Chari V and Pressley AN (1996) Yangians, integrable quantum systems and Dorey's rule. Communications in Mathematical Physics 181: 265–302.
Delius GW (1995) Exact S-matrices with affine quantum group symmetry. Nuclear Physics B 451: 445–465.
Delius GW and MacKay NJ (2003) Quantum group symmetry in sine-Gordon and affine Toda field theories on the half-line. Communications in Mathematical Physics 233: 173–190.
Drinfeld V (1985) Hopf algebras and the quantum Yang–Baxter equation. Soviet Mathematics Doklady 32: 254–258.
Drinfeld V (1986) Quantum Groups, Proc. Int. Cong. Math. (Berkeley), pp. 798–820.
Drinfeld V (1988) A new realization of Yangians and quantized affine algebras. Soviet Mathematics Doklady 36: 212–216.
Jimbo M (1985) A q-difference analogue of U(g) and the Yang–Baxter equation. Letters in Mathematical Physics 10: 63–69.
Jimbo M (1986) Quantum R-matrix for the generalized Toda system. Communications in Mathematical Physics 102: 537–547.
Jimbo M and Miwa T (1995) Algebraic Analysis of Solvable Lattice Models. Providence, RI: American Mathematical Society.
Kulish PP and Reshetikhin NY (1983) Quantum linear problem for the sine-Gordon equation and higher representations. Journal of Soviet Mathematics 23: 2435.
MacKay NJ (2005) Introduction to Yangian symmetry in integrable field theory. International Journal of Modern Physics (to appear).
Molev A (2003) Yangians and their applications. In: Hazewinkel M (ed.) Handbook of Algebra, vol. 3, pp. 907–959. Elsevier.

Aharonov–Bohm Effect

M Socolovsky, Universidad Nacional Autónoma de México, México DF, México

© 2006 Elsevier Ltd. All rights reserved.

Introduction

In classical electrodynamics, the interaction of charged particles with the electromagnetic field is local, through the pointlike coupling of the electric charge of the particles with the electric and magnetic fields, $\mathbf{E}$ and $\mathbf{B}$, respectively. This is mathematically expressed by the Lorentz-force law. The scalar and vector potentials, $\varphi$ and $\mathbf{A}$, which are the time and space components of the relativistic 4-potential $A_\mu$, are considered auxiliary quantities in terms of which the field strengths $\mathbf{E}$ and $\mathbf{B}$, the observables, are expressed in a gauge-invariant manner. The homogeneous or first pair of Maxwell equations are a direct consequence of the definition of the field strengths in terms of $A_\mu$. The inhomogeneous or second pair of Maxwell equations, which involve the charges and currents present in the problem, are also usually written in terms of $\mathbf{E}$ and $\mathbf{B}$; however, when writing them in terms of $A_\mu$, the number of degrees of freedom of the electromagnetic field is explicitly reduced from six to four; and finally, with two additional gauge transformations, one ends with the two physical degrees of freedom of the electromagnetic field.

In quantum mechanics, however, both the Schrödinger equation and the path-integral approaches for scalar and unpolarized charged particles in the presence of electromagnetic fields are written in terms of the potential and not of the field strengths. Even in the case of the Schrödinger–Pauli equation for spin 1/2 electrons with magnetic moment $\mathbf{m}$ interacting with a magnetic field $\mathbf{B}$, one knows that the coupling $\mathbf{m} \cdot \mathbf{B}$ is the nonrelativistic limit of the Dirac equation, which depends on $A_\mu$ but not on $\mathbf{E}$ and $\mathbf{B}$. Since gauge invariance also holds in the quantum domain, it was thought that $\mathbf{A}$ and $\varphi$ were mere auxiliary quantities, as in the classical case.

Aharonov and Bohm, in 1959, predicted a quantum interference effect due to the motion of charged particles in regions where $\mathbf{B}$ ($\mathbf{E}$) vanishes, but not $\mathbf{A}$ ($\varphi$), leading to a nonlocal gauge-invariant effect depending on the flux of the magnetic field in the inaccessible region, in the magnetic case, and on the difference of the integrals over time of time-varying potentials, in the electric case. (The magnetic effect was already noticed 10 years before by Ehrenberg and Siday in a paper on the refractive index of electrons.)

In the context of the Schrödinger equation, one can show that due to gauge invariance, if $\psi_0$ is a solution to the equation in the absence of an electromagnetic potential, then the product of $\psi_0(x)$ and the phase factor whose exponent is proportional to the integral of $A_\mu$ over a path joining an arbitrary reference point $x_0$ to $x$ is also a solution, if the integral is path independent. However, it is the path integral of Feynman which, in the formulas for propagators of charged particles in the presence of electromagnetic fields, clearly shows that the action of these fields on charged particles is nonlocal, and it is given by the celebrated nonintegrable (path-dependent) phase factor of Wu and Yang (1975). Moreover, this fact provides an additional proof of the nonlocal character of quantum mechanics: to surround fluxes, or to develop a potential difference, the particle has to travel simultaneously at least through two paths.

Thus, the fact that the Aharonov–Bohm (A–B) effect was verified experimentally, by Chambers and others, demonstrates the necessity of introducing the (gauge-dependent) potential $A_\mu$ in describing the electromagnetic interactions of the quantum particle. This is widely regarded as the single most important piece of evidence for electromagnetism being a gauge theory. Moreover, it shows, to paraphrase Yang, that the field underdescribes the physical theory, while the potential overdescribes it, and it is the phase factor which describes it exactly.

The content of this article is essentially twofold. The first four sections are mainly physical, where we describe the magnetic A–B effect using the Schrödinger equation and the Feynman path integral. The fifth section is geometrical and is the longest of the article. We describe the effect in the context of fiber bundles and connections, namely as a result of the coupling of the wave function (section of an associated bundle) to a nontrivial flat connection (non-pure gauge vector potential with zero magnetic field) in a trivial bundle (the A–B bundle) with topologically nontrivial (non-simply-connected) base space. We discuss the moduli space of flat connections and the holonomy groups giving the phase shifts of the interference patterns. Finally, in the last section, we briefly comment on the nonabelian A–B effect.

Electromagnetic Fields in Classical Physics

In classical physics, the motion of charged particles in the presence of electromagnetic fields is governed by the equation

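$$\frac{d}{dt}\mathbf{p} = q\left(\mathbf{E} + \frac{\mathbf{v}}{c} \times \mathbf{B}\right) \qquad [1]$$

where

$$\mathbf{p} = \frac{m\mathbf{v}}{\sqrt{1 - v^2/c^2}}$$

is the mechanical momentum of the particle with electric charge $q$, mass $m$, and velocity $\mathbf{v} = \dot{\mathbf{x}}$ ($c$ is the velocity of light in vacuum, and for $|\mathbf{v}| \ll c$ the left-hand side (LHS) of [1] is approximately $m\dot{\mathbf{v}}$); the right-hand side (RHS) is the Lorentz force, where $\mathbf{E}$ and $\mathbf{B}$ are, respectively, the electric and magnetic fields at the spacetime point $(t, \mathbf{x})$ where the particle is located. Equation [1] is easily derived from the Euler–Lagrange equation

$$\frac{d}{dt}\left(\frac{\partial L}{\partial \mathbf{v}}\right) - \frac{\partial L}{\partial \mathbf{x}} = 0 \qquad [2]$$

with the Lagrangian $L$ given by the sum of the free Lagrangian for the particle,

$$L_0 = -mc^2\sqrt{1 - \frac{v^2}{c^2}} \qquad [3]$$

and the Lagrangian describing the particle–field interaction,

$$L_{int} = \frac{q}{c}\,\mathbf{A} \cdot \mathbf{v} - q\varphi \qquad [4]$$

In [4], $\mathbf{A}$ and $\varphi$ are, respectively, the vector potential and the scalar potential, which together form the 4-potential $A^\mu = (A^0, \mathbf{A}) = (\varphi, A^i)$, $i = 1, 2, 3$, in terms of which the electric and magnetic field strengths are given by

$$\mathbf{E} = -\frac{1}{c}\frac{\partial}{\partial t}\mathbf{A} - \nabla\varphi \qquad [5a]$$

$$\mathbf{B} = \nabla \times \mathbf{A} \qquad [5b]$$

The classical action corresponding to a given path of the particle is

$$S = \int_{t_1}^{t_2} dt\, L = \int_{t_1}^{t_2} dt\,(L_0 + L_{int}) = \int_{t_1}^{t_2} dt\, L_0 + \int_{t_1}^{t_2} dt\, L_{int} \equiv S_0 + S_{int} \qquad [6]$$

$\mathbf{E}$, $\mathbf{B}$, and $S$ are invariant under the gauge transformation

$$\mathbf{A} \to \mathbf{A}' = \mathbf{A} - \nabla\Lambda \qquad [7a]$$

$$\varphi \to \varphi' = \varphi + \frac{1}{c}\frac{\partial}{\partial t}\Lambda \qquad [7b]$$

where $\Lambda$ is a real-valued differentiable scalar function (at least of class $C^2$) on spacetime. That is, if $\mathbf{E}'$, $\mathbf{B}'$, and $S'_{int}$ are defined in terms of $\mathbf{A}'$ and $\varphi'$ as $\mathbf{E}$, $\mathbf{B}$, and $S_{int}$ are defined in terms of $\mathbf{A}$ and $\varphi$, then $\mathbf{E}' = \mathbf{E}$, $\mathbf{B}' = \mathbf{B}$, and $S'_{int} = S_{int}$ (the last equality holding up to a term that depends only on the values of $\Lambda$ at the endpoints of the path, which does not affect the equations of motion). This fact leads to the concept that, classically, the observables $\mathbf{E}$ and $\mathbf{B}$ are the physical quantities, while $A_\mu$ is only an auxiliary quantity. Also, and most important in the present context, eqn [1] states that the motion of the particles is determined by the values or state of the field strengths in an infinitesimal neighborhood of the particles, that is, classically, $\mathbf{E}$ and $\mathbf{B}$ act locally. If one defines the differential 1-form $A \equiv A_\mu\, dx^\mu$ (with $dx^0 = c\, dt$, and with the index lowered so that $A_\mu = (\varphi, -\mathbf{A})$), then the components of the differential 2-form $F = dA = \tfrac{1}{2}(\partial_\mu A_\nu - \partial_\nu A_\mu)\, dx^\mu \wedge dx^\nu \equiv \tfrac{1}{2} F_{\mu\nu}\, dx^\mu \wedge dx^\nu$ are precisely the electric and magnetic fields:

$$F_{\mu\nu} = \begin{pmatrix} 0 & E_1 & E_2 & E_3 \\ -E_1 & 0 & -B_3 & B_2 \\ -E_2 & B_3 & 0 & -B_1 \\ -E_3 & -B_2 & B_1 & 0 \end{pmatrix} \qquad [8]$$

At the level of $A$,

$$dF = d^2 A = 0 \qquad [9]$$

is an identity, but at the level of $\mathbf{E}$ and $\mathbf{B}$, [9] amounts to the homogeneous (or first pair of) Maxwell equations obeyed by the field strengths:

$$\nabla \cdot \mathbf{B} = 0 \qquad [10a]$$

$$\nabla \times \mathbf{E} + \frac{1}{c}\frac{\partial}{\partial t}\mathbf{B} = 0 \qquad [10b]$$

Therefore, these equations have a geometrical origin. The second pair of Maxwell equations is dynamical, and is obtained from the field action (in the Heaviside system of units)

$$S_{field} = -\frac{1}{4c}\int d^4x\, F_{\mu\nu} F^{\mu\nu} \qquad [11]$$

which leads to

$$\nabla \cdot \mathbf{E} = 4\pi\rho \qquad [12a]$$

$$\nabla \times \mathbf{B} - \frac{1}{c}\frac{\partial}{\partial t}\mathbf{E} = \frac{4\pi}{c}\,\mathbf{j} \qquad [12b]$$

where $(\rho, \mathbf{j}) = (j^0, \mathbf{j})$ is the 4-current satisfying, as a consequence of [12a] and [12b], the conservation law

$$\partial_\mu j^\mu = 0 \qquad [13]$$

For a pointlike particle, $\rho(t, \mathbf{x}) = q\,\delta^3(\mathbf{x} - \mathbf{x}(t))$ and $\mathbf{j} = \rho\mathbf{v}$.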
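The invariance of the magnetic field under the transformation [7a] can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the uniform-field potential `A`, the gauge function `Lam`, the field value `B0`, and the sample point are hypothetical choices, not taken from the article, and the curl is approximated by central finite differences.

```python
import math

def curl(F, p, h=1e-5):
    """Central-difference curl of a vector field F at the point p = (x, y, z)."""
    def d(i, k):
        # partial derivative of component i with respect to coordinate k
        plus, minus = list(p), list(p)
        plus[k] += h
        minus[k] -= h
        return (F(*plus)[i] - F(*minus)[i]) / (2 * h)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))

B0 = 2.0  # hypothetical uniform field strength along z

def A(x, y, z):
    # symmetric-gauge potential of a uniform field B0 along z (illustrative choice)
    return (-0.5 * B0 * y, 0.5 * B0 * x, 0.0)

def grad_Lam(x, y, z):
    # analytic gradient of the hypothetical gauge function
    # Lam(x, y, z) = sin(x) * y + 0.3 * x * z**2
    return (math.cos(x) * y + 0.3 * z * z, math.sin(x), 0.6 * x * z)

def A_prime(x, y, z):
    # gauge-transformed potential, eqn [7a]: A' = A - grad(Lam)
    a, g = A(x, y, z), grad_Lam(x, y, z)
    return tuple(ai - gi for ai, gi in zip(a, g))

point = (0.7, -0.4, 1.1)
B = curl(A, point)
B_prime = curl(A_prime, point)
print(B, B_prime)
```

Both curls agree to within the finite-difference error, since the curl of a gradient vanishes identically; the same kind of check applies to $\mathbf{E}$ once [7b] is included.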

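Electromagnetic Fields in Quantum Physics

In quantum physics, the motion of charged particles in external electromagnetic fields is governed by the Schrödinger equation or, equivalently, by the Feynman path integral. In both cases, however, it is the 4-potential $A_\mu$ which appears in the equations, and not the field strengths. For simplicity, we consider here scalar (spinless) charged particles or unpolarized electrons (spin-(1/2) particles), both of which, in the nonrelativistic approximation, can be described quantum mechanically by a complex wave function $\psi(t, \mathbf{x})$.

To derive the Schrödinger equation, one starts from the classical Hamiltonian

$$H = \mathbf{P} \cdot \mathbf{v} - L - mc^2 = \frac{1}{2m}\left(\mathbf{P} - \frac{q}{c}\mathbf{A}\right)^2 + q\varphi \qquad [14]$$

where

$$\mathbf{P} = \frac{\partial}{\partial \mathbf{v}} L = \mathbf{p} + \frac{q}{c}\mathbf{A}$$

is the canonical momentum of the particle, and we have subtracted its rest energy. The replacements $\mathbf{P} \to -i\hbar\nabla$ and $H \to i\hbar\,\partial/\partial t$ lead to

$$i\hbar\frac{\partial}{\partial t}\psi = \left(\frac{1}{2m}\left(-i\hbar\nabla - \frac{q}{c}\mathbf{A}\right)^2 + q\varphi\right)\psi = \left(-\frac{\hbar^2}{2m}\nabla^2 + \frac{q^2}{2mc^2}\mathbf{A}^2 + \frac{i\hbar q}{2mc}\,\nabla \cdot \mathbf{A} + \frac{i\hbar q}{mc}\,\mathbf{A} \cdot \nabla + q\varphi\right)\psi \qquad [15]$$

The gauge transformation [7a] and [7b] is a symmetry of this equation, if simultaneously to the change of the 4-potential, the wave function transforms as follows:

$$\psi(t, \mathbf{x}) \to \psi'(t, \mathbf{x}) = e^{-(iq/\hbar c)\Lambda(t, \mathbf{x})}\,\psi(t, \mathbf{x}) \qquad [7c]$$

So, $A'_\mu$ and $\psi'$ obey [15]. At each $(t, \mathbf{x})$, $e^{-(iq/\hbar c)\Lambda}$ belongs to U(1), the unit circle in the complex plane.

In the path-integral approach, the kernel $K(t', \mathbf{x}'; t, \mathbf{x})$, which gives the probability amplitude for the propagation of the particle from the spacetime point $(t, \mathbf{x})$ to the spacetime point $(t', \mathbf{x}')$ ($t < t'$), is given by

$$\begin{aligned}
K(t', \mathbf{x}'; t, \mathbf{x}) &= \int_{\mathbf{x}(t) = \mathbf{x}}^{\mathbf{x}(t') = \mathbf{x}'} \mathcal{D}\mathbf{x}(\tau)\, \exp\left(\frac{i}{\hbar}(S_0 + S_{int})\right) \\
&= \int_{\mathbf{x}(t) = \mathbf{x}}^{\mathbf{x}(t') = \mathbf{x}'} \mathcal{D}\mathbf{x}(\tau)\, \exp\left(\frac{i}{\hbar}\int_t^{t'} d\tau \left(\frac{1}{2} m\dot{\mathbf{x}}^2 + \frac{q}{c}\mathbf{A} \cdot \mathbf{v} - q\varphi\right)\right) \\
&= \int_{\mathbf{x}(t) = \mathbf{x}}^{\mathbf{x}(t') = \mathbf{x}'} \mathcal{D}\mathbf{x}(\tau)\, \exp\left(\frac{i}{\hbar}\int_t^{t'} d\tau\, \frac{1}{2} m\dot{\mathbf{x}}^2\right) \exp\left(\frac{iq}{\hbar c}\int_t^{t'} (\mathbf{A} \cdot d\mathbf{x} - \varphi\, dx^0)\right) \\
&= \int_{\mathbf{x}(t) = \mathbf{x}}^{\mathbf{x}(t') = \mathbf{x}'} \mathcal{D}\mathbf{x}(\tau)\, \exp\left(\frac{i}{\hbar}\int_t^{t'} d\tau\, \frac{1}{2} m\dot{\mathbf{x}}^2\right) \exp\left(-\frac{iq}{\hbar c}\int_t^{t'} dx^\mu A_\mu\right)
\end{aligned} \qquad [16]$$

where the integral $\int \mathcal{D}\mathbf{x}(\tau) \ldots$ is over all continuous spacetime paths $(\tau, \mathbf{x}(\tau))$ which join $(t, \mathbf{x})$ with $(t', \mathbf{x}')$. If one knows the wave function at $(t, \mathbf{x})$, then the wave function at $(t', \mathbf{x}')$ is given by

$$\psi(t', \mathbf{x}') = \int d^3x\, K(t', \mathbf{x}'; t, \mathbf{x})\, \psi(t, \mathbf{x}) \qquad [17]$$

An important point is the natural appearance in the integrand of the functional integral of the factor

$$e^{-(iq/\hbar c)\int_\gamma A}$$

for each path $\gamma$ joining $(t, \mathbf{x})$ with $(t', \mathbf{x}')$.

A Solution to the Schrödinger Equation

In what follows, we shall restrict ourselves to static magnetic fields; then in the previous formulas, we set $\varphi = 0$ and $\mathbf{A}(t, \mathbf{x}) = \mathbf{A}(\mathbf{x})$. It is then easy to show that if $\mathbf{x}_0$ is an arbitrary reference point and the integral $\int_{\mathbf{x}_0}^{\mathbf{x}} \mathbf{A}(\mathbf{x}') \cdot d\mathbf{x}'$ is independent of the integration path from $\mathbf{x}_0$ to $\mathbf{x}$, that is, it is a well-defined function $f$ of $\mathbf{x}$, and if $\psi_0$ is a solution of the free Schrödinger equation, that is,

$$i\hbar\frac{\partial}{\partial t}\psi_0 = -\frac{\hbar^2}{2m}\nabla^2\psi_0 \qquad [18]$$

then

$$\psi(t, \mathbf{x}) = \exp\left(\frac{iq}{\hbar c}\int_{\mathbf{x}_0}^{\mathbf{x}} \mathbf{A}(\mathbf{x}') \cdot d\mathbf{x}'\right)\psi_0(t, \mathbf{x}) \qquad [19]$$

is a solution of [15]. In fact, replacing [19] in [15], the LHS gives

$$\exp\left(\frac{iq}{\hbar c} f(\mathbf{x})\right) i\hbar\frac{\partial}{\partial t}\psi_0$$

while for the RHS one has

$$\exp\left(\frac{iq}{\hbar c} f(\mathbf{x})\right)\left(-\frac{\hbar^2}{2m}\nabla^2\psi_0\right)$$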
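The substitution of [19] into [15] can be checked numerically in a one-dimensional setting, where every potential is a pure gauge and the integral of $A$ is automatically path independent. The sketch below is a hedged illustration: the units $\hbar = c = 1$, the charge `Q`, the wave number `K`, and the choice $f(x) = \sin x$ (so that $A = \cos x$) are assumptions made only for this example. It applies the operator $-i\hbar\, d/dx - (q/c)A$ twice to the ansatz, using central differences, and compares the result with the phase factor times $(\hbar K)^2 \psi_0$.

```python
import cmath
import math

HBAR = 1.0  # natural units, assumed for this sketch
C = 1.0
Q = -1.0    # hypothetical charge
K = 1.3     # hypothetical wave number of the free solution

def f(x):
    # f(x) = integral of A from the reference point x0 = 0 to x
    return math.sin(x)

def A(x):
    # A = df/dx, a pure-gauge potential in one dimension
    return math.cos(x)

def psi0(x):
    # spatial part of a free plane-wave solution
    return cmath.exp(1j * K * x)

def psi(x):
    # ansatz [19]: phase factor times the free solution
    return cmath.exp(1j * Q * f(x) / (HBAR * C)) * psi0(x)

def kinetic_momentum(g, h=1e-4):
    """Return the function (-i hbar d/dx - (q/c) A) g, via a central difference."""
    def out(x):
        dg = (g(x + h) - g(x - h)) / (2 * h)
        return -1j * HBAR * dg - (Q / C) * A(x) * g(x)
    return out

x = 0.37
lhs = kinetic_momentum(kinetic_momentum(psi))(x)  # (P - (q/c) A)^2 acting on psi
rhs = cmath.exp(1j * Q * f(x) / (HBAR * C)) * (HBAR * K) ** 2 * psi0(x)
residual = abs(lhs - rhs)
print(residual)
```

The residual is at the level of the finite-difference error, which reflects the fact that the phase factor passes through the gauge-covariant derivative and leaves the free kinetic term acting on $\psi_0$.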

The cancelation of the exponential factors shows that, under the condition of path independence, there is no effect of the potential on the charged particles. Another way to see this is by making a gauge transformation [7a]–[7c] with $\Lambda(\mathbf{x}) = f(\mathbf{x})$, which changes $\psi \to \psi_0$ and $\mathbf{A} \to \mathbf{A}' = \mathbf{A} - \nabla\int_{\mathbf{x}_0}^{\mathbf{x}} \mathbf{A}(\mathbf{x}') \cdot d\mathbf{x}' = \mathbf{A} - \mathbf{A} = 0$.

The condition of path independence amounts, however, to the condition that no magnetic field is present since, if $\int_\gamma A$ depends on $\gamma$, then for some pair of paths $\gamma$ and $\gamma'$ from $(t, \mathbf{x})$ to $(t', \mathbf{x}')$,

$$0 \neq \int_\gamma A - \int_{\gamma'} A = \int_\gamma A + \int_{-\gamma'} A = \oint_{\gamma \cup (-\gamma')} A = \int_\Sigma d\mathbf{s} \cdot (\nabla \times \mathbf{A}),$$

where in the last equality we applied Stokes theorem ($\Sigma$ is any surface with boundary $\gamma \cup (-\gamma')$), which shows that $\mathbf{B} = \nabla \times \mathbf{A}$ must not vanish everywhere and has a nonzero flux $\Phi$ through $\Sigma$ given by

$$\Phi = \int_\Sigma d\mathbf{s} \cdot \mathbf{B} \qquad [20]$$

The conclusion of this section is that the ansatz [19] for solving [15] can only be applied in simply connected regions with no magnetic field strength present.

Aharonov–Bohm Proposal

In 1959, Aharonov and Bohm proposed an experiment to test, in quantum mechanics, the coupling of electric charges to electromagnetic field strengths through a local interaction with the electromagnetic potential $A_\mu$, but not with the field strengths themselves. However, as we saw before, no physical effect exists, that is, $A_\mu$ can be gauged away, unless magnetic and/or electric fields exist somewhere, although not necessarily overlapping the wave function of the particles.

Consider the usual two-slit experiment as depicted in Figure 1, with the additional presence, behind the slits, of a long and narrow solenoid enclosing a nonvanishing magnetic flux $\Phi$ due to a constant and homogeneous magnetic field $\mathbf{B}$ normal to the plane of the figure (in direction $z$); outside of the solenoid, the magnetic field is zero. If the radius of the solenoid is $R$, a vector potential $\mathbf{A}$ that produces such field strength is given by

$$\mathbf{A}(\mathbf{x}) = \begin{cases} (|\mathbf{B}|\, r/2)\,\hat{\boldsymbol{\varphi}}, & r \leq R \\ (\Phi/2\pi r)\,\hat{\boldsymbol{\varphi}}, & r > R \end{cases} \qquad [21]$$

where $\Phi = \pi R^2 |\mathbf{B}|$ and $\hat{\boldsymbol{\varphi}}$ is a unit vector in the azimuthal direction. In fact,

$$\mathbf{B} = \nabla \times \mathbf{A}(\mathbf{x}) = \begin{cases} |\mathbf{B}|\,\hat{\mathbf{z}}, & r < R \\ 0, & r > R \end{cases} \qquad [22]$$

Notice that at $r = R$, $\mathbf{A}$ is continuous but not continuously differentiable. Also, the ideal limit of an infinitely long solenoid makes the problem two-dimensional, that is, in the $xy$ plane.

Figure 1 Magnetic Aharonov–Bohm effect. (Schematic: source S, slits 1 and 2, solenoid of radius R behind the slits, paths I and II ending at the point P on the screen; axes x, y, z.)

The probability amplitude for an electron emitted at the source S to arrive at the point P on the screen is given by the sum of two probability amplitudes, namely those corresponding to passing through the slits 1 and 2. The solenoid is assumed to be impenetrable to the electrons; mathematically, this corresponds to a motion in a non-simply-connected region. In the approximation for the path integral [16], in which one considers the contribution of only two classes of paths, that is, the class $\{\gamma\}$ represented by path I, and the class $\{\gamma'\}$ represented by path II, if the wave function at the source is $\psi_S$, then the wave function at P is given by

$$\begin{aligned}
\psi_P &= \int_{\{\gamma\}} e^{(i/\hbar)S_0(\gamma)}\, e^{(i|e|/\hbar c)\int_\gamma A}\, \psi_S + \int_{\{\gamma'\}} e^{(i/\hbar)S_0(\gamma')}\, e^{(i|e|/\hbar c)\int_{\gamma'} A}\, \psi_S \\
&= e^{(i|e|/\hbar c)\int_{I} A} \int_{\{\gamma\}} e^{(i/\hbar)S_0(\gamma)}\, \psi_S + e^{(i|e|/\hbar c)\int_{II} A} \int_{\{\gamma'\}} e^{(i/\hbar)S_0(\gamma')}\, \psi_S \\
&= e^{(i|e|/\hbar c)\int_{I} A}\left(\psi_P^0(I) + e^{(i|e|/\hbar c)\oint_{II \cup (-I)} A}\, \psi_P^0(II)\right) \\
&= e^{(i|e|/\hbar c)\int_{I} A}\left(\psi_P^0(I) + e^{2\pi i\Phi/\Phi_0}\, \psi_P^0(II)\right)
\end{aligned} \qquad [23]$$

where, in the second line, we used the path independence of the integral of $A$ within each class of paths;

$$\psi_P^0(I) = \int_{\{\gamma\}} e^{(i/\hbar)S_0(\gamma)}\, \psi_S$$

and

$$\psi_P^0(II) = \int_{\{\gamma'\}} e^{(i/\hbar)S_0(\gamma')}\, \psi_S$$

and, in the last equality, we applied the extended version of Stokes theorem (by Craven), to allow for noncontinuously differentiable vector potentials; and the quantum of magnetic flux associated with the charge $|e|$ is defined by

$$\Phi_0 = 2\pi\frac{\hbar c}{|e|} \cong 4.135 \times 10^{-7}\ \mathrm{G\,cm^2} \qquad [24]$$

($\Phi_0 = 2\pi/|e| = \sqrt{\pi/\alpha} \cong \sqrt{137\pi}$ in the natural system of units (n.s.u.) $\hbar = c = 1$; $\alpha$ is the fine structure constant). Then the probability of finding the electron at P is proportional to

$$|\psi_P|^2 = |\psi_P^0(I)|^2 + |\psi_P^0(II)|^2 + 2\,\mathrm{Re}\left(e^{2\pi i\Phi/\Phi_0}\, \psi_P^0(I)^*\, \psi_P^0(II)\right) \qquad [25]$$

which exhibits an interference pattern shifted with respect to that without the magnetic field: as $\mathbf{B}$ and therefore $\Phi$ change, dark and bright interference fringes alternate periodically at the screen, with period $\Phi_0$. This is the magnetic A–B effect, which has been quantitatively verified in many experiments, the first one in 1960 by Chambers. The effect is:

1. gauge invariant, since $\mathbf{B}$ and therefore $\Phi$ are gauge invariant;
2. nonlocal, since it depends on the magnetic field inside the solenoid, where the electrons never enter;
3. quantum mechanical, since classically the charges do not feel any force and therefore no effect would be expected in this limit; and
4. topological, since the electrons necessarily move in a non-simply-connected space.

But perhaps the most important implication of the A–B effect is a dramatic additional confirmation of the nonlocal character of quantum mechanics: the electron has to travel along the two paths (I and II) simultaneously; otherwise, no flux would be surrounded and then no shift of the (then nonexistent) interference fringes would be observed at the screen.

Calculations in the path-integral approach including the whole set of homotopy classes of paths around the solenoid, indexed by an integer $m$, have been performed by several authors, leading to a formula of the type

$$\psi_P = \sum_{m = -\infty}^{\infty} e^{im\theta}\, \psi_P^0(m) \qquad [26]$$

with

$$\theta = 2\pi\frac{\Phi}{\Phi_0} \qquad [27]$$

(Schulman 1971, Kobe 1979). As in [23],

$$\psi_P(\Phi + k\Phi_0) = \psi_P(\Phi), \qquad k \in \mathbb{Z} \qquad [28]$$

There is a close relation between the A–B effect and the Dirac quantization condition (DQC) in the presence of electric and magnetic charges: according to [25] (or [26]) the A–B effect disappears when the flux $\Phi$ equals $n\Phi_0 = 2\pi n(\hbar c/|e|)$, $n \in \mathbb{Z}$, that is, when the condition

$$|e|\Phi = nhc \qquad [29]$$

holds. But this is the DQC (Dirac 1931) when $\Phi$ is the flux associated with a magnetic charge $g$: $\Phi(g) = (g/4\pi r^2) \times 4\pi r^2 = g$, leading to $|e|g = nhc$ ($2\pi n$ in the n.s.u.). This is precisely the condition for the Dirac string to be unobservable in quantum mechanics: to give no A–B effect.

Geometry of the A–B Effect

In this section we study the space of gauge classes of flat potentials outside the solenoid, which determine the A–B effect; the topological structure of the A–B bundle; and the holonomy groups of the connections, which precisely give the phase shifts of the wave functions. We use the n.s.u. system; in particular, if $[L]$ is the unit of length, then $[A_\mu] = [L]^{-1}$, $[|e|] = [L]^0$, and $\Phi_0 = 2\pi/|e| = \sqrt{\pi/\alpha} \cong \sqrt{137\pi}$, where $\alpha$ is the fine structure constant.

To synthesize, one can say that the abelian A–B effect is a nonlocal gauge-invariant quantum effect due to the coupling of the wave function (section of an associated bundle) to a nontrivial (non-exact) flat (closed) connection in a trivial principal bundle with a non-simply-connected base space. In the following subsections, we will give a detailed explanation of these statements.

The A–B Bundle

The gauge group of electromagnetism is the abelian Lie group U(1) with Lie algebra (the tangent space at the identity) $u(1) = i\mathbb{R}$. In the limit of an infinitely long and infinitesimally thin solenoid carrying the magnetic flux $\Phi$, the space available to the electrons is the plane minus a point, that is, $\mathbb{R}^{2*}$, which is of the same homotopy type as the circle $S^1$. Then the set of isomorphism classes of U(1) bundles over $\mathbb{R}^{2*}$ is in one-to-one correspondence with the set of homotopy classes of maps from $S^0$ to $S^1$ (Steenrod 1951), which consists of only one point: if $f, g :$

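$S^0 \to S^1$ are given by $f(1) = e^{i\varphi_1}$, $f(-1) = e^{i\varphi_2}$, $g(1) = e^{i\theta_1}$, and $g(-1) = e^{i\theta_2}$, then $H : S^0 \times [0, 1] \to S^1$ given by $H(1, t) = e^{i((1-t)\varphi_1 + t\theta_1)}$ and $H(-1, t) = e^{i((1-t)\varphi_2 + t\theta_2)}$ is a homotopy between $f$ and $g$. Then, up to equivalence, the relevant bundle for the A–B effect is the product bundle

$$\xi_{A\text{–}B} : U(1) \to \mathbb{R}^{2*} \times U(1) \to \mathbb{R}^{2*} \qquad [30a]$$

Since $\mathbb{R}^{2*}$ is homeomorphic to an open disk minus a point $(D_0^2)^*$, then the total space of the bundle is homeomorphic to an open solid 2-torus minus a circle, since $(T_0^2)^* = (D_0^2)^* \times S^1$. Then the A–B bundle has the topological structure

$$\xi_{A\text{–}B} : S^1 \to (T_0^2)^* \to (D_0^2)^* \qquad [30b]$$

The Gauge Group and the Moduli Space of Flat Connections

The gauge group of the bundle $\xi_{A\text{–}B}$ is the set of smooth functions from the base space to the structure group, that is, $\mathcal{G} = C^\infty(\mathbb{R}^{2*}, U(1))$. Since $\mathcal{G} \subset C^0(\mathbb{R}^{2*}, U(1)) = \{\text{continuous functions } \mathbb{R}^{2*} \to U(1)\}$ and $[\mathbb{R}^{2*}, U(1)] = \{\text{homotopy classes of continuous functions } \mathbb{R}^{2*} \to U(1)\} \cong [S^1, S^1] \cong \pi_1(S^1) \cong \mathbb{Z}$, given $f \in \mathcal{G}$ there exists a unique $n \in \mathbb{Z}$ such that $f$ is homotopic to $f_n$ ($f \sim f_n$), where $f_n : \mathbb{R}^{2*} \to U(1)$ is given by $f_n(re^{i\phi}) = e^{in\phi}$, $\phi \in [0, 2\pi)$.

$\mathcal{G}$ acts on the space of flat connections on $\xi_{A\text{–}B}$ given by the closed $u(1)$-valued differential 1-forms on $\mathbb{R}^{2*}$:

$$\mathcal{C}_0 = \{A \in \Omega^1(\mathbb{R}^{2*}; u(1)),\ dA = 0\} \qquad [31]$$

through

$$\mathcal{C}_0 \times \mathcal{G} \to \mathcal{C}_0, \qquad (A, f) \mapsto A + f^{-1}df \qquad [32]$$

where $f^{-1}(x, y) = (f(x, y))^{-1}$. The moduli space

$$\mathcal{M}_0 = \frac{\mathcal{C}_0}{\mathcal{G}} = \{\text{gauge equivalence classes of flat connections on } \xi_{A\text{–}B}\} = \{[A] = \{A + f^{-1}df,\ f \in \mathcal{G}\},\ A \in \mathcal{C}_0\} \qquad [33]$$

is isomorphic to the circle $S^1$ with length 1. This can be seen as follows: the de Rham cohomology of $\mathbb{R}^{2*}$ with coefficients in $i\mathbb{R}$ in dimension 1 is

$$H^1_{DR}(\mathbb{R}^{2*}; i\mathbb{R}) = \{\lambda[A_0]_{DR},\ \lambda \in \mathbb{R}\} \cong H^1_{DR}(S^1; i\mathbb{R}) \cong \mathbb{R} \qquad [34]$$

where

$$A_0 = i\,\frac{x\, dy - y\, dx}{x^2 + y^2} \in \mathcal{C}_0 \qquad [35]$$

is the connection that, once multiplied by $|e|^{-1}$ (see below), generates the flux $\Phi_0$ and therefore no A–B effect: $A_0$ is closed ($dA_0 = 0$) but not exact ($(x\, dy - y\, dx)/(x^2 + y^2) = d\phi$ only for $\phi \in (0, 2\pi)$, $\phi = 0$ is excluded); $[A_0]_{DR} = \{A_0 + d\beta\}$ with $\beta \in \Omega^0(\mathbb{R}^{2*}; i\mathbb{R})$. $\beta$ gives an element of $\mathcal{G}$ through the composite $\exp \circ \beta : \mathbb{R}^{2*} \to U(1)$, $(x, y) \mapsto e^{\beta(x, y)}$. The A–B effect with flux $\Phi = \lambda\Phi_0$ is produced by the connection $A = \lambda A_0$. To determine $\mathcal{M}_0$, one finds the smallest $\sigma \in \mathbb{R}$ such that $(\lambda + \sigma)A_0 \sim \lambda A_0$, that is, $(\lambda + \sigma)A_0 \in [\lambda A_0]$, which means, from [33], that $(\lambda + \sigma)A_0 = \lambda A_0 + f^{-1}df$ or $\sigma A_0 = f^{-1}df$. For $\phi \neq 0$, $A_0 = i\, d\phi$ and $f_1^{-1}df_1 = i\, d\phi$, then $\sigma = 1$, and therefore $(\lambda + 1)A_0 \sim \lambda A_0$, in particular $A_0 \sim 0$.

A remark concerning the gauge group $\mathcal{G}$ is the following. In classical electrodynamics, according to [7a] and [7b], the symmetry group could be taken to be the additive group $(\mathbb{R}, +)$ instead of the multiplicative group U(1). Since $\mathbb{R}$ is contractible, then the gauge group would be $\mathcal{G}_{cl} = C^\infty(\mathbb{R}^{2*}, \mathbb{R})$ with $[\mathbb{R}^{2*}, \mathbb{R}] \cong 0$, so that the homomorphism $\iota : \mathcal{G}_{cl} \to \mathcal{G}$, $\iota(f)(x) = e^{if(x)}$ would not exhaust $\mathcal{G}$ since $\iota(f) \in [1]$ for any $f \in \mathcal{G}_{cl}$: in fact, $H : \mathbb{R}^{2*} \times [0, 1] \to U(1)$ given by $H(x, t) = e^{i(1-t)f(x)}$ is a homotopy between $\iota(f)$ and 1. However, the quantization of electric charges implies that in fact the gauge group is U(1) and not $\mathbb{R}$. This is equivalent mathematically to the possible existence of magnetic monopoles which require nontrivial bundles for their description.

Covariant Derivative, Parallel Transport, and Holonomy

Let $G$ be a matrix Lie group with Lie algebra $g$, $B$ a differentiable manifold, $\xi : G \to P \xrightarrow{\pi} B$ a principal bundle, $V$ a vector space, $G \times V \to V$ an action, and $\xi_V : V \to P \times_G V \xrightarrow{\pi_V} B$ the corresponding associated vector bundle ($\xi_V$ is trivial if $\xi$ is trivial). Call $\Gamma(\xi_V)$ the sections of $\xi_V$, $\Gamma(TB)$ ($\Gamma(TP)$) the sections of the tangent bundle of $B$ ($P$), and $\Gamma_{eq}(P, V)$ the set of functions $\gamma : P \to V$ satisfying $\gamma(pg) = g^{-1}\gamma(p)$ (equivariant functions from $P$ to $V$). $s \in \Gamma(\xi_V)$ induces $\gamma_s \in \Gamma_{eq}(P, V)$ with $\gamma_s(p) = v$, where $s(\pi(p)) = [p, v]$, and $\gamma \in \Gamma_{eq}(P, V)$ induces $s_\gamma \in \Gamma(\xi_V)$ with $s_\gamma(b) = [p, \gamma(p)]$, where $p \in \pi^{-1}(\{b\})$. If $H$ is a connection on $\xi$, that is, a smooth assignment of a (horizontal) vector subspace $H_p$ of $T_pP$ at each $p$ of $P$, algebraically determined by a smooth $g$-valued 1-form $\omega$ on $P$ through $H_p = \ker(\omega_p)$, $s \in \Gamma(\xi_V)$, $X \in \Gamma(TB)$, and $X^\uparrow \in \Gamma(TP)$ the horizontal lifting of $X$ by $\omega$, then $X^\uparrow(\gamma_s) \in \Gamma_{eq}(P, V)$, and covariant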

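derivative of $s$ with respect to $\omega$ in the direction of $X$ is defined by

$$\nabla^\omega_X s := s_{X^\uparrow(\gamma_s)} \qquad [36a]$$

If $\Phi : \pi^{-1}(U) \to U \times G$ is a local trivialization of $\xi$, $x^\mu$, $\mu = 1, \ldots, \dim B$ are local coordinates on $U$, and $e_i$, $i = 1, \ldots, \dim V$ is a basis of the local sections in $\Gamma(\xi_V|_U)$, then the local expression of [36a] is

$$\nabla^{\omega_U}_{X^\mu \partial/\partial x^\mu}(s^i e_i) = X^\mu\left(\delta^j_i\frac{\partial}{\partial x^\mu} + A^j_{\mu i}\right)s^i e_j \qquad [36b]$$

where

$$(A_U)^j_i = A^j_{\mu i}\, dx^\mu = (\sigma^*\omega_U)^j_i \qquad [36c]$$

is the geometrical gauge potential in $U$, given by the pullback of $\omega_U$, the restriction of $\omega$ to $\pi^{-1}(U)$, by the local section $\sigma : U \to \pi^{-1}(U)$, $\sigma(b) = \Phi^{-1}(b, 1)$. ($A^j_{\mu i}$ is defined through $\nabla^{\omega_U}_{\partial/\partial x^\mu} e_i = A^j_{\mu i} e_j$.) The operator

$$D^j_{\mu i} = \delta^j_i\frac{\partial}{\partial x^\mu} + A^j_{\mu i} \qquad [36d]$$

is the usual local covariant derivative. In an overlapping trivialization, [36b] is replaced by

$$\nabla^{\omega_{U'}}_{X^\mu \partial/\partial x^\mu}(s'^i e'_i) = X^\mu\left(\delta^j_i\frac{\partial}{\partial x^\mu} + A'^j_{\mu i}\right)s'^i e'_j$$

with $e'_j = g_{kj}e_k$ and $s'^i = g^{-1}_{il}s^l$ on $U \cap U'$, then the local potential transforms as

$$A'_{\mu jl} = g_{jk}\, A_{\mu ik}\, g^{-1}_{il} + (\partial_\mu g_{jk})\, g^{-1}_{kl} \qquad [36e]$$

which for $G$ abelian has the form [32].

For each smooth path $c : [0, 1] \to B$ joining the points $b$ and $b'$, and each $p \in P_b = \pi^{-1}(\{b\})$, there exists a unique path $c^\uparrow$ in $P$ through $p$ with $\dot{c}^\uparrow(t) \in H_{c^\uparrow(t)}$ for all $t \in [0, 1]$. $c^\uparrow$ is the horizontal lifting of $c$ by $\omega$ through $p$. Thus, for each connection and path there exists a diffeomorphism $P^\omega_c : P_b \to P_{b'}$ called parallel transport. If $c$ is a loop at $b$, then $P^\omega_c \in \mathrm{Diff}(P_b)$ is called the holonomy of $\omega$ at $b$ along $c$. To the loop space of $B$ at $b$, $\Omega(B; b)$, corresponds a subgroup $\mathrm{Hol}^\omega_b$ of $\mathrm{Diff}(P_b)$ called the holonomy of $\omega$ at $b$. If $c \in \Omega(B; b)$ and $\gamma$ is a lifting of $c$ through $q \in P_b$, then there exists a unique path $g : [0, 1] \to G$ such that $c^\uparrow(t) = \gamma(t)g(t)$ with $c^\uparrow(0) = qg(0) = p$; $g$ satisfies the differential equation

$$\frac{d}{dt}g(t) + \omega_{\gamma(t)}(\dot{\gamma}(t))\, g(t) = 0 \qquad [37]$$

whose solution is the time-ordered exponential

$$g(t)g(0)^{-1} = T\exp\left(-\int_0^t d\tau\, \omega_{\gamma(\tau)}(\dot{\gamma}(\tau))\right) = 1 + \sum_{m=1}^{\infty}(-1)^m\int_0^t d\tau_1\, \omega_{\gamma(\tau_1)}(\dot{\gamma}(\tau_1))\int_0^{\tau_1} d\tau_2\, \omega_{\gamma(\tau_2)}(\dot{\gamma}(\tau_2)) \cdots \int_0^{\tau_{m-1}} d\tau_m\, \omega_{\gamma(\tau_m)}(\dot{\gamma}(\tau_m)) \qquad [38]$$

If $q = p$ then $g(0) = 1$. For each $p \in P$, the set of elements $g' \in G$ such that $c^\uparrow(1) = pg'$ for $c \in \Omega(B; \pi(p))$ is a subgroup of $G$, $\mathrm{Hol}^\omega_p$, called the holonomy of $\omega$ at $p$. (For each $p$, there exists a group isomorphism $\mathrm{Hol}^\omega_p \to \mathrm{Hol}^\omega_{\pi(p)}$, and if $p$ and $p'$ are connected by a horizontal curve, then $\mathrm{Hol}^\omega_p = \mathrm{Hol}^\omega_{p'}$; if all $p$'s in $P$ are horizontally connected, then $\mathrm{Hol}^\omega_p = G$ for all $p \in P$.) If $(U, \Phi)$ is a local trivialization of $\xi$, $c \subset U$, and $\gamma(t) = \sigma(c(t))$, then one has the local formula

$$c^\uparrow(t) = \Phi^{-1}\left(c(t), \left(T\exp\left(-\int_{c(0)}^{c(t)} A_U\right)\right)g(0)\right) \qquad [39]$$

In particular, if $\xi$ is a product bundle, then $\Phi$ is the identity, and choosing $g(0) = 1$ gives

$$c^\uparrow(t) = \left(c(t), T\exp\left(-\int_{c(0)}^{c(t)} A_U\right)\right) \qquad [40]$$

In our case, $V = \mathbb{C}$, $\xi$ is a product bundle, $s = \psi$, the wave function, is a global section of the associated bundle

$$\xi_{\mathbb{C}} : \mathbb{C} \to \mathbb{R}^{2*} \times \mathbb{C} \xrightarrow{\pi_{\mathbb{C}}} \mathbb{R}^{2*} \qquad [41]$$

$G = U(1)$ with $g = i\mathbb{R}$ and an action $U(1) \times \mathbb{C} \to \mathbb{C}$, $(e^{i\varphi}, z) \mapsto e^{i\varphi}z$; therefore, $A^j_{\mu i} = A_\mu = ia_\mu$ with $a_\mu$ real valued, and the covariant derivative is

$$D_\mu = \frac{\partial}{\partial x^\mu} + ia_\mu \qquad [36f]$$

If $\psi$ carries the electric charge $q$, we define the physical gauge potential $A_\mu$ through

$$a_\mu = qA_\mu \qquad [42]$$

and, for the covariant derivative, after multiplying by $i$, we obtain the operator appearing in eqn [15], $iD_\mu = (i(\partial/\partial x^\mu) - qA_\mu)$: in fact, for the spatial part the coupling is $(i\nabla + q\mathbf{A})\psi$, and for the temporal part one has $(i\,\partial/\partial t - q\varphi)\psi$. For the electron, $q = -|e|$ and $a_\mu = -|e|A_\mu = -(2\pi/\Phi_0)A_\mu$.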
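Equation [40] for the product U(1) bundle can be illustrated numerically with the flat connection $A = \lambda A_0$ built from [35]. The sketch below is a hedged example rather than part of the article: the helper names `angle_integral`, `loop`, and `holonomy`, the circular loops, and the sample values of $\lambda$ are assumptions chosen for illustration. It integrates $(x\,dy - y\,dx)/(x^2 + y^2)$ along a closed polygon and forms the phase $\exp(-\oint \lambda A_0)$, following the sign convention of [40].

```python
import cmath
import math

def angle_integral(path):
    """Integrate (x dy - y dx)/(x^2 + y^2) along a closed polygon (midpoint rule)."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(path, path[1:] + path[:1]):
        xm, ym = 0.5 * (x0 + x1), 0.5 * (y0 + y1)
        total += (xm * (y1 - y0) - ym * (x1 - x0)) / (xm * xm + ym * ym)
    return total

def loop(n, center=(0.0, 0.0), radius=1.0, steps=20000):
    """Closed polygon approximating a circle traversed n times counterclockwise."""
    cx, cy = center
    return [(cx + radius * math.cos(2 * math.pi * n * k / steps),
             cy + radius * math.sin(2 * math.pi * n * k / steps))
            for k in range(steps)]

def holonomy(lam, path):
    """Phase exp(-loop integral of lam * A0), with A0 = i (x dy - y dx)/(x^2 + y^2)."""
    return cmath.exp(-1j * lam * angle_integral(path))

winding_three = angle_integral(loop(3))              # close to 3 * 2 * pi
not_enclosing = angle_integral(loop(1, (3.0, 0.0)))  # close to 0
quarter_flux = holonomy(0.25, loop(1))               # close to exp(-i * pi / 2)
print(winding_three, not_enclosing, quarter_flux)
```

A loop that does not enclose the origin gives a trivial phase, while a loop winding $n$ times gives a phase depending only on $n\lambda$ modulo 1, which is the sense in which the flat connection is detected by topology alone.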

For $c \in \Omega(\mathbb{R}^{2*}; (x_0, y_0))$, which turns $n$ times around the solenoid at (0, 0), eqn [40] gives

$$c^\uparrow(1) = \left((x_0, y_0), e^{-n\oint_c A}\right) = \left((x_0, y_0), e^{-in\oint_c a}\right) = \left((x_0, y_0), e^{i|e|n\oint_c \mathbf{A} \cdot d\mathbf{x}}\right) = \left((x_0, y_0), e^{2\pi in\Phi/\Phi_0}\right)$$

and therefore, for $\Phi/\Phi_0 = \lambda \in [0, 1)$ we have the holonomy groups

$$\mathrm{Hol}^{\omega(\lambda)}_{((x_0, y_0), 1)} = \{e^{2\pi in\lambda}\}_{n \in \mathbb{Z}} \cong \begin{cases} \mathbb{Z}_q, & \lambda = p/q,\ p, q \in \mathbb{Z},\ (p, q) = 1 \\ \mathbb{Z}, & \lambda \notin \mathbb{Q} \end{cases} \qquad [43]$$

In the second case, $\mathrm{Hol}^{\omega(\lambda)}_{((x_0, y_0), 1)}$ is dense in U(1): in fact, suppose that for $n_1, n_2 \in \mathbb{Z}$, $n_1 \neq n_2$, $e^{2\pi in_1\lambda} = e^{2\pi in_2\lambda}$, then $e^{2\pi i(n_1 - n_2)\lambda} = 1$ and so $(n_1 - n_2)\lambda = m$ for some $m \in \mathbb{Z}$; therefore, $\lambda \in \mathbb{Q}$, which is a contradiction.

Finally, we should mention that the A–B effect can be understood as a geometric phase à la Berry, though not necessarily through an adiabatic change of the parameters on which the Hamiltonian depends. The Berry potential $\mathbf{a}_B$ turns out to be proportional to the real magnetic vector potential $\mathbf{A}$: in the n.s.u., and for electrons,

$$\mathbf{a}_B = -|e|\mathbf{A} \qquad [44]$$

Nonabelian and Gravitational A–B Effects

Since the fundamental group $\pi_1(\mathbb{R}^{2*}, (x_0, y_0)) \cong \mathbb{Z}$, eqn [43] shows that there is a homomorphism $\varphi(\omega) : \pi_1(\mathbb{R}^{2*}, (x_0, y_0)) \to U(1)$, $\varphi(\omega)(n) = e^{2\pi in\lambda}$, with $\varphi(\omega)(\pi_1(\mathbb{R}^{2*})) = \mathrm{Hol}^{\omega(\lambda)}_{((x_0, y_0), 1)}$, which characterizes the A–B effect in that case. In general, an A–B effect in a $G$-bundle with a connection $\omega$ is characterized by a group homomorphism from the fundamental group of the base space $B$ onto the holonomy group of the connection, which is a subgroup of the structure group. The A–B effect is nonabelian if the holonomy group is nonabelian, which requires both $G$ and $\pi_1(B, x)$ to be nonabelian. Examples with Yang–Mills and gravitational fields are considered in the literature.

Acknowledgment

The author thanks the University of Valencia, Spain, where part of this work was done.

See also: Deformation Quantization and Representation Theory; Fractional Quantum Hall Effect; Geometric Phases; Moduli Spaces: An Introduction; Quantum Chromodynamics; Variational Techniques for Ginzburg–Landau Energies.

Further Reading

Aguilar MA and Socolovsky M (2002) Aharonov–Bohm effect, flat connections, and Green's theorem. International Journal of Theoretical Physics 41: 839–860.
Aharonov Y and Bohm D (1959) Significance of electromagnetic potentials in the quantum theory. Physical Review 115: 485–491.
Berry MV (1984) Quantal phase factors accompanying adiabatic changes. Proceedings of the Royal Society of London A 392: 45–57.
Chambers RG (1960) Shift of an electron interference pattern by enclosed magnetic flux. Physical Review Letters 5: 3–5.
Corichi A and Pierri M (1995) Gravity and geometric phases. Physical Review D 51: 5870–5875.
Dirac PAM (1931) Quantised singularities in the electromagnetic field. Proceedings of the Royal Society of London A 133: 60–72.
Kobe DH (1979) Aharonov–Bohm effect revisited. Annals of Physics 123: 381–410.
Peshkin M and Tonomura A (1989) The Aharonov–Bohm Effect. Berlin: Springer.
Schulman LS (1971) Approximate topologies. Journal of Mathematical Physics 12: 304–308.
Steenrod N (1951) The Topology of Fibre Bundles. Princeton, NJ: Princeton University Press.
Sundrum R and Tassie LJ (1986) Non-abelian Aharonov–Bohm effects, Feynman paths, and topology. Journal of Mathematical Physics 27: 1566–1570.
Wu TT and Yang CN (1975) Concept of nonintegrable phase factors and global formulation of gauge fields. Physical Review D 12: 3845–3857.

Algebraic Approach to Quantum Field Theory


R Brunetti and K Fredenhagen, Universität Hamburg, Hamburg, Germany
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Quantum field theory may be understood as the incorporation of the principle of locality, which is at the basis of classical field theory, into quantum physics. There are, however, severe obstacles against a straightforward translation of concepts of classical field theory into quantum theory, among them the notorious divergences of quantum field theory and the intrinsic nonlocality of quantum physics. Therefore, the concept of locality is somewhat obscured in the formalism of quantum field theory as it is typically exposed in textbooks. Nonlocal concepts such as the vacuum, the notion of particles or the S-matrix play a fundamental role, and neither the
relation to classical field theory nor the influence of background fields can be properly treated.

Algebraic quantum field theory (AQFT; synonymously, local quantum physics), on the contrary, aims at emphasizing the concept of locality at every instance. As the nonlocal features of quantum physics occur at the level of states (entanglement), not at the level of observables, it is better to base the theory not on the Hilbert space of states but on the algebra of observables. Subsystems of a given system then simply correspond to subalgebras of a given algebra. The locality concept is abstractly encoded in a notion of independence of subsystems; two subsystems are independent if the algebra of observables which they generate is isomorphic to the tensor product of the algebras of the subsystems.

Spacetime can then, in the spirit of Leibniz, be considered as an ordering device for systems. So, one associates with regions of spacetime the algebras of observables which can be measured in the pertinent region, with the condition that the algebras of subregions of a given region can be identified with subalgebras of the algebra of the region.

Problems arise if one aims at a generally covariant approach in the spirit of general relativity. Then, in order to avoid pitfalls such as the hole problem, systems corresponding to isometric regions must be isomorphic. Since isomorphic regions may be embedded into different spacetimes, this amounts to a simultaneous treatment of all spacetimes of a suitable class. We will see that category theory furnishes such a description, where the objects are the systems and the morphisms the embeddings of a system as a subsystem of other systems.

States arise as secondary objects via Hilbert space representations, or directly as linear functionals on the algebras of observables which can be interpreted as expectation values and are, therefore, positive and normalized. It is crucial that inequivalent representations (sectors) can occur, and the analysis of the structure of the sectors is one of the big successes of AQFT. One can also study the particle interpretation of certain states as well as (equilibrium and nonequilibrium) thermodynamical properties.

The mathematical methods in AQFT are mainly taken from the theory of operator algebras, a field of mathematics which developed in close contact with mathematical physics, in particular with AQFT. Unfortunately, the most important field theories from the point of view of elementary particle physics, such as quantum electrodynamics or the standard model, could not yet be constructed beyond formal perturbation theory, with the annoying consequence that it seemed that the concepts of AQFT could not be applied to them. However, it has recently been shown that formal perturbation theory can be reshaped in the spirit of AQFT such that the algebras of observables of these models can be constructed as algebras of formal power series of Hilbert space operators. The price to pay is that the deep mathematics of operator algebras cannot be applied, but the crucial features of the algebraic approach can be used.

AQFT was originally proposed by Haag as a concept by which scattering of particles can be understood as a consequence of the principle of locality. It was then put into a mathematically precise form by Araki, Haag, and Kastler. After the analysis of particle scattering by Haag and Ruelle and the clarification of the relation to the Lehmann–Symanzik–Zimmermann (LSZ) formalism by Hepp, the structure of superselection sectors was studied first by Borchers and then in a fundamental series of papers by Doplicher, Haag, and Roberts (DHR) (see, e.g., Doplicher et al. (1971, 1974)) (soon after, Buchholz and Fredenhagen established the relation to particles), and finally Doplicher and Roberts uncovered the structure of superselection sectors as the dual of a compact group, thereby generalizing the Tannaka–Krein theorem on the characterization of group duals.

With the advent of two-dimensional conformal field theory, new models were constructed and it was shown that the DHR analysis can be generalized to these models. Directly related to conformal theories is the algebraic approach to holography in anti-de Sitter (AdS) spacetime by Rehren.

The general framework of AQFT may be described as a covariant functor between two categories. The first one contains the information on local relations and is crucial for the interpretation. Its objects are topological spaces with additional structures (typically globally hyperbolic Lorentzian spaces, possibly spin bundles with connections, etc.), its morphisms being the structure-preserving embeddings. In the case of globally hyperbolic Lorentzian spacetimes, one requires that the embeddings are isometric and preserve the causal structure. The second category describes the algebraic structure of observables. In quantum physics the standard assumption is that one deals with the category of C*-algebras where the morphisms are unital embeddings. In classical physics, one looks instead at Poisson algebras, and in perturbative quantum field theory one admits algebras which possess nontrivial representations as formal power series of Hilbert space operators. It is the leading principle of AQFT that the functor $\mathscr{A}$ contains all physical information. In particular, two theories are equivalent if the corresponding functors are naturally equivalent.
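
As a hedged illustration of the closing statement, the following sketch spells out what natural equivalence of two theories amounts to. The symbols $\mathscr{A}_1$, $\mathscr{A}_2$, $\beta_M$ and $\psi$ are notation chosen here for illustration; they are not fixed by the text above.

```latex
% Two theories, viewed as covariant functors from the locality category
% to the category of observable algebras (names assumed for illustration):
%   \mathscr{A}_1, \mathscr{A}_2 .
% A natural equivalence is a family of isomorphisms, one per object M,
\[
  \beta_M \colon \mathscr{A}_1(M) \longrightarrow \mathscr{A}_2(M),
\]
% that intertwines the action of the two functors on every morphism
% \psi \colon M_1 \to M_2 of the locality category:
\[
  \beta_{M_2} \circ \mathscr{A}_1(\psi) \;=\; \mathscr{A}_2(\psi) \circ \beta_{M_1}.
\]
```

In words, identifying the observables of the two theories spacetime by spacetime is compatible with every embedding, so no local measurement can distinguish them.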
In the analysis of the functor $\mathscr{A}$, a crucial role is played by natural transformations from other functors on the locality category. For instance, a field $A$ may be defined as a natural transformation from the category of test function spaces to the category of observable algebras via their functors related to the locality category.

Quantum Field Theories as Covariant Functors

The rigorous implementation of the generally covariant locality principle uses the language of category theory. The following two categories are used:

Loc: The class of objects obj(Loc) is formed by all (smooth) $d$-dimensional ($d \ge 2$ is held fixed), globally hyperbolic Lorentzian spacetimes $M$ which are oriented and time oriented. Given any two such objects $M_1$ and $M_2$, the morphisms $\psi \in \mathrm{hom}_{\mathrm{Loc}}(M_1, M_2)$ are taken to be the isometric embeddings $\psi\colon M_1 \to M_2$ of $M_1$ into $M_2$ but with the following constraints:
(i) if $\gamma\colon [a, b] \to M_2$ is any causal curve and $\gamma(a), \gamma(b) \in \psi(M_1)$, then the whole curve must be in the image $\psi(M_1)$, that is, $\gamma(t) \in \psi(M_1)$ for all $t \in [a, b]$;
(ii) any morphism preserves orientation and time orientation of the embedded spacetime.
The composition is defined as the composition of maps; the unit element in $\mathrm{hom}_{\mathrm{Loc}}(M, M)$ is given by the identical embedding $\mathrm{id}_M\colon M \to M$ for any $M \in \mathrm{obj}(\mathrm{Loc})$.

Obs: The class of objects obj(Obs) is formed by all C*-algebras possessing unit elements, and the morphisms are faithful (injective) unit-preserving *-homomorphisms. The composition is again defined as the composition of maps; the unit element in $\mathrm{hom}_{\mathrm{Obs}}(\mathcal{A}, \mathcal{A})$ is, for any $\mathcal{A} \in \mathrm{obj}(\mathrm{Obs})$, given by the identical map $\mathrm{id}_{\mathcal{A}}\colon A \mapsto A$, $A \in \mathcal{A}$.

The categories are chosen for definiteness. One may envisage changes according to particular needs, as, for instance, in perturbation theory, where instead of C*-algebras general topological *-algebras are better suited. Or one may use von Neumann algebras, in case particular states are selected. On the other hand, one might consider for Loc bundles over spacetimes, or (in conformally invariant theories) admit conformal embeddings as morphisms. In case one is interested in spacetimes which are not globally hyperbolic, one could look at the globally hyperbolic subregions (where one needs to be careful about the causal convexity condition (i) above).

The concept of locally covariant quantum field theory is defined as follows.

Definition 1
(i) A locally covariant quantum field theory is a covariant functor $\mathscr{A}$ from Loc to Obs, with (writing $\alpha_\psi$ for $\mathscr{A}(\psi)$) the covariance properties
\[ \alpha_{\psi'} \circ \alpha_\psi = \alpha_{\psi' \circ \psi}, \qquad \alpha_{\mathrm{id}_M} = \mathrm{id}_{\mathscr{A}(M)} \]
for all morphisms $\psi \in \mathrm{hom}_{\mathrm{Loc}}(M_1, M_2)$, all morphisms $\psi' \in \mathrm{hom}_{\mathrm{Loc}}(M_2, M_3)$, and all $M \in \mathrm{obj}(\mathrm{Loc})$.
(ii) A locally covariant quantum field theory described by a covariant functor $\mathscr{A}$ is called causal if the following holds: whenever there are morphisms $\psi_j \in \mathrm{hom}_{\mathrm{Loc}}(M_j, M)$, $j = 1, 2$, so that the sets $\psi_1(M_1)$ and $\psi_2(M_2)$ are causally separated in $M$, then one has
\[ [\alpha_{\psi_1}(\mathscr{A}(M_1)), \alpha_{\psi_2}(\mathscr{A}(M_2))] = \{0\} \]
where the element-wise commutation makes sense in $\mathscr{A}(M)$.
(iii) One says that a locally covariant quantum field theory given by the functor $\mathscr{A}$ obeys the time-slice axiom if
\[ \alpha_\psi(\mathscr{A}(M)) = \mathscr{A}(M') \]
holds for all $\psi \in \mathrm{hom}_{\mathrm{Loc}}(M, M')$ such that $\psi(M)$ contains a Cauchy surface for $M'$.

Thus, a quantum field theory is an assignment of C*-algebras to (all) globally hyperbolic spacetimes so that the algebras are identifiable when the spacetimes are isometric, in the indicated way. This is a precise description of the generally covariant locality principle.

The Traditional Approach

The traditional framework of AQFT, in the Araki–Haag–Kastler sense, on a fixed globally hyperbolic spacetime can be recovered from a locally covariant quantum field theory, that is, from a covariant functor $\mathscr{A}$ with the properties listed above.

Indeed, let $M$ be an object in obj(Loc). $K(M)$ denotes the set of all open subsets in $M$ which are relatively compact and also contain, with each pair of points $x$ and $y$, all $g$-causal curves in $M$ connecting $x$ and $y$ (cf. condition (i) in the definition of Loc). $O \in K(M)$, endowed with the metric of $M$ restricted to $O$ and with the induced orientation and time orientation, is a member of obj(Loc), and the injection map $\iota_{M,O}\colon O \to M$, that is, the identical map restricted to $O$, is an element in $\mathrm{hom}_{\mathrm{Loc}}(O, M)$.
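
A brief sketch of how functoriality acts on these inclusion maps, which is the step behind the isotony property of the local algebras. It assumes nested regions $O_1 \subset O_2$ in $K(M)$, writes $\iota_{O_2,O_1}$ for the inclusion of $O_1$ into $O_2$ (a symbol introduced here for illustration), and takes for granted that this inclusion is itself a morphism of Loc, which holds because $O_1$ is causally convex in $M$ and hence in $O_2$.

```latex
% Inclusions of nested regions compose:
\[
  \iota_{M,O_1} \;=\; \iota_{M,O_2} \circ \iota_{O_2,O_1}.
\]
% Applying the functor and using the covariance property of Definition 1(i),
% with the shorthand \alpha_{M,O} := \mathscr{A}(\iota_{M,O}):
\[
  \alpha_{M,O_1} \;=\; \alpha_{M,O_2} \circ \alpha_{O_2,O_1}.
\]
% Since \alpha_{O_2,O_1} maps \mathscr{A}(O_1) into \mathscr{A}(O_2), it follows that
\[
  \alpha_{M,O_1}\bigl(\mathscr{A}(O_1)\bigr)
  \;\subset\;
  \alpha_{M,O_2}\bigl(\mathscr{A}(O_2)\bigr)
  \;\subset\;
  \mathscr{A}(M).
\]
```

The images of the algebras of subregions therefore form an increasing family of subalgebras of $\mathscr{A}(M)$, as expected of a net of local algebras.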
With this notation, it is easy to prove the following assertion:

Theorem 1 Let $\mathscr{A}$ be a covariant functor with the above-stated properties, and define a map $K(M) \ni O \mapsto \mathcal{A}(O) \subset \mathscr{A}(M)$ by setting
\[ \mathcal{A}(O) := \alpha_{M,O}(\mathscr{A}(O)) \]
Then the following statements hold:
(i) The map fulfills isotony, that is,
\[ O_1 \subset O_2 \Rightarrow \mathcal{A}(O_1) \subset \mathcal{A}(O_2) \quad \text{for all } O_1, O_2 \in K(M) \]
(ii) If there exists a group $G$ of isometric diffeomorphisms $\kappa\colon M \to M$ (so that $\kappa_* g = g$) preserving orientation and time orientation, then there is a representation $G \ni \kappa \mapsto \tilde\alpha_\kappa$ of $G$ by C*-algebra automorphisms $\tilde\alpha_\kappa\colon \mathscr{A}(M) \to \mathscr{A}(M)$ such that
\[ \tilde\alpha_\kappa(\mathcal{A}(O)) = \mathcal{A}(\kappa(O)), \quad O \in K(M) \]
(iii) If the theory given by $\mathscr{A}$ is additionally causal, then it holds that
\[ [\mathcal{A}(O_1), \mathcal{A}(O_2)] = \{0\} \]
for all $O_1, O_2 \in K(M)$ with $O_1$ causally separated from $O_2$.

These properties are just the basic assumptions of the Araki–Haag–Kastler framework.

The Achievements of the Traditional Approach

In the Araki–Haag–Kastler approach in Minkowski spacetime $M$, many results have been obtained in the last 40 years, some of them also becoming a source of inspiration to mathematics. A description of the achievements can be organized in terms of a length-scale basis, from the small to the large. We assume in this section that the algebra $\mathscr{A}(M)$ is faithfully and irreducibly represented on a Hilbert space $\mathcal{H}$, that the Poincaré transformations are unitarily implemented with positive energy, and that the subspace of Poincaré invariant vectors is one dimensional (uniqueness of the vacuum). Moreover, algebras corresponding to regions which are spacelike to a nonempty open region are assumed to be weakly closed (i.e., von Neumann algebras on $\mathcal{H}$), and the condition of weak additivity is fulfilled, that is, for all $O \in K(M)$ the algebra generated from the algebras $\mathcal{A}(O + x)$, $x \in M$, is weakly dense in $\mathscr{A}(M)$.

Ultraviolet Structure and Idealized Localizations

This section deals with the problem of inspecting the theory at very small scales. In the limiting case, one is interested in idealized localizations, eventually the points of spacetimes. But the observable algebras are trivial at any point $x \in M$, namely
\[ \bigcap_{O \ni x} \mathcal{A}(O) = \mathbb{C}\mathbf{1}, \quad O \in K(M) \]
Hence, pointlike localized observables are necessarily singular. Actually, the Wightman formulation of quantum field theory is based on the use of distributions on spacetime with values in the algebra of observables (as a topological *-algebra). In spite of technical complications whose physical significance is unclear, this formalism is well suited for a discussion of the connection with the Euclidean theory, which allows, in fortunate cases, a treatment by path integrals; it is more directly related to models and admits, via the operator-product expansion, a study of the short-distance behavior. It is, therefore, an important question how the algebraic approach is related to the Wightman formalism. The reader is referred to the literature for exploring the results on this relation.

Whereas these results point to an essential equivalence of both formalisms, one needs in addition a criterion for the existence of sufficiently many Wightman fields associated with a given local net. Such a criterion can be given in terms of a compactness condition to be discussed in the next subsection. As a benefit, one derives an operator-product expansion which has to be assumed in the Wightman approach.

In the purely algebraic approach, the ultraviolet structure has been investigated by Buchholz and Verch. Small-scale properties of theories are studied with the help of the so-called scaling algebras whose elements can be described as orbits of observables under all possible renormalization group motions. There results a classification of theories in the scaling limit which can be grouped into three broad classes: theories for which the scaling limit is purely classical (commutative algebras), those for which the limit is essentially unique (stable ultraviolet fixed point) and not classical, and those for which this is not the case (unstable ultraviolet fixed point). This classification does not rely on perturbation expansions. It allows an intrinsic definition of confinement in terms of the so-called ultraparticles, that is, particles which are visible only in the scaling limit.

Phase-Space Analysis

As far as finite distances are concerned, there are two apparently competing principles, those of
nuclearity and modularity. The first one suggests that locally, after a cutoff in energy, one has a situation similar to that of old quantum mechanics, namely a finite number of states in a finite volume of phase space. Aiming at a precise formulation, Haag and Swieca introduced their notion of compactness, which Buchholz and Wichmann sharpened into that of nuclearity. The latter authors proposed that the set generated from the vacuum vector $\Omega$,
\[ \{ e^{-\beta H} A \Omega \mid A \in \mathcal{A}(O), \ \|A\| < 1 \} \]
$H$ denoting the generator of time translations (Hamiltonian), is nuclear for any $\beta > 0$, roughly stating that it is contained in the image of the unit ball under a trace class operator. The nuclear size $Z(\beta, O)$ of the set plays the role of the partition function of the model and has to satisfy certain bounds in the parameter $\beta$. The consequence of this constraint is the existence of product states, namely those normal states for which observables localized in two given spacelike separated regions are uncorrelated. A further consequence is the existence of thermal equilibrium states (KMS states) for all $\beta > 0$.

The second principle concerns the fact that, even locally, quantum field theory has infinitely many degrees of freedom. This becomes visible in the Reeh–Schlieder theorem, which states that every vector $\Phi$ which is in the range of $e^{-\beta H}$ for some $\beta > 0$ (in particular, the vacuum $\Omega$) is cyclic and separating for the algebras $\mathcal{A}(O)$, $O \in K(M)$, that is, $\mathcal{A}(O)\Phi$ is dense in $\mathcal{H}$ ($\Phi$ is cyclic) and $A\Phi = 0$, $A \in \mathcal{A}(O)$, implies $A = 0$ ($\Phi$ is separating). The pair $(\mathcal{A}(O), \Omega)$ is then a von Neumann algebra in the so-called standard form. On such a pair, the Tomita–Takesaki theory can be applied, namely the densely defined operator
\[ S A \Omega = A^* \Omega, \quad A \in \mathcal{A}(O) \]
is closable, and the polar decomposition of its closure $\bar{S} = J \Delta^{1/2}$ delivers an antiunitary involution $J$ (the modular conjugation) and a positive self-adjoint operator $\Delta$ (the modular operator) associated with the standard pair $(\mathcal{A}(O), \Omega)$. These operators have the properties
\[ J \mathcal{A}(O) J = \mathcal{A}(O)' \]
where the prime denotes the commutant, and
\[ \Delta^{it} \mathcal{A}(O) \Delta^{-it} = \mathcal{A}(O), \quad t \in \mathbb{R} \]

The importance of this structure is based on the fact disclosed by Bisognano and Wichmann using Poincaré-covariant Wightman fields and local algebras generated by them, that for specific regions in Minkowski spacetime the modular operators have a geometrical meaning. Indeed, these authors showed for the pair $(\mathcal{A}(W), \Omega)$, where $W$ denotes the wedge region $W = \{ x \in M \mid |x^0| < x^1 \}$, that the associated modular unitary $\Delta^{it}$ is the Lorentz boost with velocity $\tanh(2\pi t)$ in the direction 1 and that the modular conjugation $J$ is the $CP_1T$ symmetry operator with parity $P_1$ the reflection with respect to the $x^1 = 0$ plane. Later, Borchers discovered that already on the purely algebraic level a corresponding structure exists. He proved that, given any standard pair $(\mathcal{A}, \Omega)$ and a one-parameter group of unitaries $\tau \mapsto U(\tau)$ acting on the Hilbert space $\mathcal{H}$ with a positive generator and such that $\Omega$ is invariant and $U(\tau) \mathcal{A} U(-\tau) \subset \mathcal{A}$, $\tau > 0$, then the associated modular operators $\Delta$ and $J$ fulfill the commutation relations
\[ \Delta^{it} U(\tau) \Delta^{-it} = U(e^{-2\pi t} \tau) \]
\[ J U(\tau) J = U(-\tau) \]
which are just the commutation relations between boosts and lightlike translations.

Surprisingly, there is a direct connection between the two concepts of nuclearity and modularity. Indeed, in the nuclearity condition, it is possible to replace the Hamiltonian operator by a specific function of the modular operator associated with a slightly larger region. Furthermore, under mild conditions, nuclearity and modularity together determine the structure of local algebras completely; they are isomorphic to the unique hyperfinite type III$_1$ von Neumann algebra.

Sectors, Symmetries, Statistics, and Particles

Large scales are appropriate for discussing global issues like superselection sectors, statistics and symmetries as far as large spacelike distances are concerned, and scattering theory, with the resulting notions of particles and infraparticles, as far as large timelike distances are concerned.

In purely massive theories, where the vacuum sector has a mass gap and the mass shells of the particles are isolated, a very satisfactory description of the multiparticle structure at large times can be given. Using the concept of almost local particle generators,
\[ \Psi = A(t) \Omega \]
where $\Psi$ is a single-particle state (i.e., an eigenstate of the mass operator), $A(t)$ is a family of almost local operators essentially localized in the kinematical region accessible from a given point by a motion with the velocities contained in the spectrum of $\Psi$, one obtains the multiparticle states as limits of products $A_1(t) \cdots A_n(t) \Omega$ for disjoint velocity supports. The corresponding closed subspaces are
invariant under Poincaré transformations and are unitarily equivalent to the Fock spaces of noninteracting particles.

For massless particles, no almost-local particle generators can be expected to exist. In even dimensions, however, one can exploit Huygens' principle to construct asymptotic particle generators which are in the commutant of the algebra of the forward or backward lightcone, respectively. Again, their products can be determined and multiparticle states obtained.

Much less well understood is the case of massive particles in a theory which also possesses massless particles. Here, in general, the corresponding states are not eigenstates of the mass operator. Since quantum electrodynamics (QED) as well as the standard model of elementary particles have this problem, the correct treatment of scattering in these models is still under discussion. One attempt at a correct treatment is based on the concept of the so-called particle weights, that is, unbounded positive functionals on a suitable algebra. This algebra is generated by positive almost-local operators annihilating the vacuum and interpreted as counters.

The structure at large spacelike scales may be analyzed by the theory of superselection sectors. The best-understood case is that of locally generated sectors which are the objects of the DHR theory. Starting from a distinguished representation $\pi_0$ (vacuum representation) which is assumed to fulfill the Haag duality,
\[ \pi_0(\mathcal{A}(O)) = \pi_0(\mathcal{A}(O'))' \]
for all double cones $O$, one may look at all representations which are equivalent to the vacuum representation if restricted to the observables localized in double cones in the spacelike complement of a given double cone. Such representations give rise to endomorphisms of the algebra of observables, and the product of endomorphisms can be interpreted as a product of sectors (fusion). In general, these representations violate the Haag duality, but there is a subclass of the so-called finite statistics sectors where the violation of Haag duality is small, in the sense that the nontrivial inclusion
\[ \pi(\mathcal{A}(O)) \subset \pi(\mathcal{A}(O'))' \]
has a finite Jones index. These sectors form (in at least three spacetime dimensions) a symmetric tensor category with some further properties which can be identified, in a generalization of the Tannaka–Krein theorem, as the dual of a unique compact group. This group plays the role of a global gauge group. The symmetry of the category is expressed in terms of a representation of the symmetric group. One may then enlarge the algebra of observables and obtain an algebra of operators which transform covariantly under the global gauge group and satisfy Bose or Fermi commutation relations for spacelike separation.

In two spacetime dimensions, one obtains instead braided tensor categories. They have been classified under additional conditions (conformal symmetry, central charge $c < 1$) in a remarkable work by Kawahigashi and Longo. Moreover, in their paper, one finds that by using completely new methods (Q-systems) a new model is unveiled, apparently inaccessible by methods used by others. To some extent, these categories can be interpreted as duals of generalized quantum groups.

The question arises whether all representations describing elementary particles are, in the massive case, DHR representations. One can show that in the case of a representation with an isolated mass shell there is an associated vacuum representation which becomes equivalent to the particle representation after restriction to observables localized spacelike to a given infinitely extended spacelike cone. This property is weaker than the DHR condition but allows, in four spacetime dimensions, the same construction of a global gauge group and of covariant fields with Bose and Fermi commutation relations, respectively, as the DHR condition. In three spacetime dimensions, however, one finds a braided tensor category, which has properties similar to those known from topological field theories in three dimensions.

The sector structure in massless theories is not well understood, due to the infrared problem. This is in particular true for QED.

Fields as Natural Transformations

In order to be able to interpret the theory in terms of measurements, one has to be able to compare observables associated with different regions of spacetime, or even different spacetimes. In the absence of nontrivial isometries, such a comparison can be made in terms of locally covariant fields. By definition, these are natural transformations from the functor of quantum field theory to another functor on the category of spacetimes Loc.

The standard case is the functor which associates with every spacetime $M$ its space $\mathcal{D}(M)$ of smooth compactly supported test functions. There, the morphisms are the pushforwards $\mathscr{D}(\psi) \equiv \psi_*$.

Definition 2 A locally covariant quantum field $\Phi$ is a natural transformation between the functors $\mathscr{D}$ and $\mathscr{A}$, that is, for any object $M$ in obj(Loc) there exists a morphism $\Phi_M\colon \mathcal{D}(M) \to \mathscr{A}(M)$ such that for
any pair of objects $M_1$ and $M_2$ and any morphism $\psi$ between them, the following diagram commutes:
\[
\begin{array}{ccc}
\mathcal{D}(M_1) & \xrightarrow{\ \Phi_{M_1}\ } & \mathscr{A}(M_1) \\
\psi_* \downarrow & & \downarrow \alpha_\psi \\
\mathcal{D}(M_2) & \xrightarrow{\ \Phi_{M_2}\ } & \mathscr{A}(M_2)
\end{array}
\]
The commutativity of the diagram means, explicitly, that
\[ \alpha_\psi \circ \Phi_{M_1} = \Phi_{M_2} \circ \psi_* \]
which is the requirement sought for the covariance of fields. It contains, in particular, the standard covariance condition for spacetime isometries.

Fields in the above sense are not necessarily linear. Examples for fields which are also linear are the scalar massive free Klein–Gordon fields on all globally hyperbolic spacetimes and their locally covariant Wick polynomials. In particular, the energy–momentum tensors can be constructed as locally covariant fields, and they provide a crucial tool for discussing the back-reaction problem for matter fields.

An example for the more general notion of a field are the local S-matrices in the Stückelberg–Bogolubov–Epstein–Glaser sense. These are unitaries $S_M(\lambda)$ with $M \in \mathrm{obj}(\mathrm{Loc})$ and $\lambda \in \mathcal{D}(M)$ which satisfy the conditions
\[ S_M(0) = 1 \]
\[ S_M(\lambda + \mu + \nu) = S_M(\lambda + \mu)\, S_M(\mu)^{-1}\, S_M(\mu + \nu) \]
for $\lambda, \mu, \nu \in \mathcal{D}(M)$ such that the supports of $\lambda$ and $\nu$ can be separated by a Cauchy surface of $M$ with $\mathrm{supp}\,\lambda$ in the future of the surface.

The importance of these S-matrices relies on the fact that they can be used to define a new quantum field theory. The new theory is locally covariant if the original theory is and if the local S-matrices satisfy the condition of the locally covariant field above. A perturbative construction of interacting quantum field theory on globally hyperbolic spacetimes was completed in this way by Hollands and Wald, based on previous work by Brunetti and Fredenhagen.

See also: Axiomatic Quantum Field Theory; Constructive Quantum Field Theory; Current Algebra; Deformation Quantization and Representation Theory; Dispersion Relations; Indefinite Metric; Integrability and Quantum Field Theory; Operads; Perturbative Renormalization Theory and BRST; Quantum Central Limit Theorems; Quantum Field Theory: A Brief Introduction; Quantum Field Theory in Curved Spacetime; Quantum Fields with Indefinite Metric: Non-Trivial Models; Quantum Fields with Topological Defects; Quantum Geometry and its Applications; Scattering in Relativistic Quantum Field Theory: Fundamental Concepts and Tools; Scattering in Relativistic Quantum Field Theory: The Analytic Program; Spin Foams; Symmetries in Quantum Field Theory: Algebraic Aspects; Symmetries in Quantum Field Theory of Lower Spacetime Dimensions; Tomita–Takesaki Modular Theory; Two-Dimensional Models; von Neumann Algebras: Introduction, Modular Theory and Classification Theory; von Neumann Algebras: Subfactor Theory.

Further Reading

Araki H (1999) Mathematical Theory of Quantum Fields. Oxford: Oxford University Press.
Baumgärtel H and Wollenberg M (1992) Causal Nets of Operator Algebras. Berlin: Akademie Verlag.
Borchers HJ (1996) Translation Group and Particle Representation in Quantum Field Theory, Lecture Notes in Physics. New Series m: Monographs, 40. Berlin: Springer.
Borchers HJ (2000) On revolutionizing quantum field theory with Tomita's modular theory. Journal of Mathematical Physics 41: 3604–3673.
Bratteli O and Robinson DW (1987) Operator Algebras and Quantum Statistical Mechanics, vol. 1. Berlin: Springer.
Brunetti R and Fredenhagen K (2000) Microlocal analysis and interacting quantum field theories: renormalization on physical backgrounds. Communications in Mathematical Physics 208: 623–661.
Brunetti R, Fredenhagen K, and Verch R (2003) The generally covariant locality principle – a new paradigm for local quantum field theory. Communications in Mathematical Physics 237: 31–68.
Buchholz D and Haag R (2000) The quest for understanding in relativistic quantum physics. Journal of Mathematical Physics 41: 3674–3697.
Dixmier J (1964) Les C*-algèbres et leurs représentations. Paris: Gauthier-Villars.
Doplicher S, Haag R, and Roberts JE (1971) Local observables and particle statistics I. Communications in Mathematical Physics 23: 199–230.
Doplicher S, Haag R, and Roberts JE (1974) Local observables and particle statistics II. Communications in Mathematical Physics 35: 49–85.
Evans DE and Kawahigashi Y (1998) Quantum Symmetries on Operator Algebras. New York: Clarendon Press.
Haag R (1996) Local Quantum Physics, 2nd edn. Berlin: Springer.
Haag R and Kastler D (1964) An algebraic approach to quantum field theory. Journal of Mathematical Physics 5: 848–861.
Hollands S and Wald RM (2001) Local Wick polynomials and time ordered products of quantum fields in curved spacetime. Communications in Mathematical Physics 223: 289–326.
Hollands S and Wald RM (2002) Existence of local covariant time ordered products of quantum field in curved spacetime. Communications in Mathematical Physics 231: 309–345.
Kastler D (ed.) (1990) The Algebraic Theory of Superselection Sectors. Introductions and Recent Results. Singapore: World Scientific.
Kawahigashi Y and Longo R (2004) Classification of local conformal nets. Case c < 1. Annals of Mathematics 160: 1–30.
Takesaki M (2003) Theory of Operator Algebras I, II, III, Encyclopedia of Mathematical Sciences, vols. 124, 125, 127. Berlin: Springer.
Wald RM (1994) Quantum Field Theory in Curved Spacetime and Black Hole Thermodynamics. Chicago: University of Chicago Press.

Anderson Localization see Localization for Quasi-Periodic Potentials

Anomalies
S L Adler, Institute for Advanced Study, Princeton, NJ, USA
© 2006 Elsevier Ltd. All rights reserved.

Synopsis

Anomalies are the breaking of classical symmetries by quantum mechanical radiative corrections, which arise when the regularizations needed to evaluate small fermion loop Feynman diagrams conflict with a classical symmetry of the theory. They have important implications for a wide range of issues in quantum field theory, mathematical physics, and string theory.

Figure 1 The AVV triangle diagram responsible for the abelian chiral anomaly.

Chiral Anomalies, Abelian and Nonabelian

Consider quantum electrodynamics, with the fermionic Lagrangian density
\[ \mathcal{L} = \bar\psi \,( i \gamma^\mu \partial_\mu - e_0 \gamma^\mu B_\mu - m_0 )\, \psi \qquad [1a] \]
where $\bar\psi = \psi^\dagger \gamma^0$, $e_0$ and $m_0$ are the bare charge and mass, and $B_\mu$ is the electromagnetic gauge potential. (We reserve the notation $A$ for axial-vector quantities.) Under a chiral transformation
\[ \psi \to e^{i \alpha \gamma_5} \psi \qquad [1b] \]
with constant $\alpha$, the kinetic term in eqn [1a] is invariant (because $\gamma_5$ commutes with $\gamma^0 \gamma^\mu$), whereas the mass term is not invariant. Therefore, naive application of Noether's theorem would lead one to expect that the axial-vector current
\[ j_5^\mu = \bar\psi \gamma^\mu \gamma_5 \psi \qquad [1c] \]
obtained from the Lagrangian density by applying a chiral transformation with spatially varying $\alpha$, should have a divergence given by the change under chiral transformation of the mass term in eqn [1a]. Up to tree approximation, this is indeed true, but when one computes the AVV Feynman diagram with one axial-vector and two vector vertices (see Figure 1), and insists on conservation of the vector current $j^\mu = \bar\psi \gamma^\mu \psi$, one finds that to order $e_0^2$, the classical Noether theorem is modified to read
\[ \partial_\mu j_5^\mu(x) = 2 i m_0 j_5(x) + \frac{e_0^2}{16 \pi^2} \epsilon^{\xi\sigma\tau\rho} F_{\xi\sigma}(x) F_{\tau\rho}(x) \qquad [2] \]
with $F^{\xi\sigma}(x) = \partial^\sigma B^\xi(x) - \partial^\xi B^\sigma(x)$ the electromagnetic field strength tensor. The second term in eqn [2], which would be unexpected from the application of the classical Noether theorem, is the abelian axial-vector anomaly (often called the Adler–Bell–Jackiw (or ABJ) anomaly after the seminal papers on the subject). Since vector current conservation, together with the axial-vector current anomaly, implies that the left- and right-handed chiral currents $j^\mu \mp j_5^\mu$ are also anomalous, the axial-vector anomaly is frequently called the chiral anomaly, and we shall use the terms interchangeably in this article.

There are a number of different ways to understand why the extra term in eqn [2] appears. (1) Working through the formal Feynman diagrammatic Ward identity proof of the Noether theorem, one finds that there is a step where the closed fermion loop contributions are eliminated by a shift of the loop-integration variable. For Feynman diagrams that are convergent, this is not a problem, but the AVV diagram is linearly divergent. The linear divergence vanishes under symmetric integration, but the shift then produces a finite residue, which gives the anomaly. (2) If one defines the AVV diagram by Pauli–Villars regularization with regulator mass $M_0$ that is allowed to approach infinity at the end of the calculation, one finds a classical Noether theorem in the regulated theory,
\[ \partial_\mu j_5^\mu |_{m_0} - \partial_\mu j_5^\mu |_{M_0} = 2 i m_0\, j_5 |_{m_0} - 2 i M_0\, j_5 |_{M_0} \qquad [3a] \]
with the subscripts $m_0$ and $M_0$ indicating that fermion loops are to be calculated with fermion mass $m_0$ and $M_0$, respectively. Taking the vacuum to two-photon matrix element of eqn [3a], one finds that the matrix element $\langle 0 | j_5 |_{M_0} | \gamma\gamma \rangle$, which is unambiguously computable after imposing vector-current conservation, falls off only as $M_0^{-1}$ as the regulator mass approaches infinity. Thus, the product of $2 i M_0$ with this matrix element has a finite limit, which gives the anomaly. (3) If the
gauge-invariant axial-vector current is defined by point-splitting

$$j^5_\mu(x) = \bar\psi(x+\epsilon/2)\,\gamma_\mu\gamma_5\,\psi(x-\epsilon/2)\,e^{-ie_0\epsilon^\nu B_\nu(x)} \tag{3b}$$

with $\epsilon \to 0$ at the end of the calculation, one observes that the divergence of eqn [3b] contains an extra term with a factor of $\epsilon$. On careful evaluation, one finds that the coefficient of this factor is an expression that behaves as $\epsilon^{-1}$, which gives the anomaly in the limit of vanishing $\epsilon$. (4) Finally, if the field theory is defined by a functional integral over the classical action, the standard Noether analysis shows that the classical action is invariant under the chiral transformation of eqn [1b], apart from the contribution of the mass term, which gives the naive axial-vector divergence. However, as pointed out by Fujikawa, the chiral transformation must also be applied to the functional integration measure, and since the measure is an infinite product, it must be regularized to be well defined. Careful calculation shows that the regularized measure is not chiral invariant, but contributes an extra term to the axial-vector Ward identity that is precisely the chiral anomaly.

A key feature of the anomaly is that it is irreducible: a local polynomial counter term cannot be added to the AVV diagram that preserves vector-current conservation and eliminates the anomaly. More generally, one can show that there is no way of modifying quantum electrodynamics so as to eliminate the chiral anomaly, without spoiling either vector-current conservation (i.e., electromagnetic gauge invariance), renormalizability, or unitarity. Thus, the chiral anomaly is a new physical effect in renormalizable quantum field theory, which is not present in the prequantization classical theory.

The abelian chiral anomaly is the simplest case of the anomaly phenomenon. It was extended to nonabelian gauge theories by Bardeen using a point-splitting method to compute the divergence, followed by adding polynomial counter terms to remove as many of the residual terms as possible. The resulting irreducible divergence is the nonabelian chiral anomaly, which in terms of Yang–Mills field strengths for vector and axial-vector gauge potentials $V^\mu$ and $A^\mu$,

$$\begin{aligned}
F_V^{\mu\nu}(x) &= \partial^\mu V^\nu(x) - \partial^\nu V^\mu(x) - i[V^\mu(x), V^\nu(x)] - i[A^\mu(x), A^\nu(x)] \\
F_A^{\mu\nu}(x) &= \partial^\mu A^\nu(x) - \partial^\nu A^\mu(x) - i[V^\mu(x), A^\nu(x)] - i[A^\mu(x), V^\nu(x)]
\end{aligned} \tag{4a}$$

is given by

$$\begin{aligned}
\partial^\mu j^a_{5\mu}(x) = {}& \text{normal divergence term} \\
& + \frac{1}{4\pi^2}\,\epsilon_{\mu\nu\sigma\tau}\,\mathrm{tr}\,\lambda_A^a\Big[\tfrac14 F_V^{\mu\nu}(x)F_V^{\sigma\tau}(x) + \tfrac1{12}F_A^{\mu\nu}(x)F_A^{\sigma\tau}(x) \\
& \quad + \tfrac23 i\,A^\mu(x)A^\nu(x)F_V^{\sigma\tau}(x) + \tfrac23 i\,F_V^{\mu\nu}(x)A^\sigma(x)A^\tau(x) \\
& \quad + \tfrac23 i\,A^\mu(x)F_V^{\nu\sigma}(x)A^\tau(x) - \tfrac83 A^\mu(x)A^\nu(x)A^\sigma(x)A^\tau(x)\Big]
\end{aligned} \tag{4b}$$

In eqn [4b], tr denotes a trace over internal degrees of freedom, and $\lambda_A^a$ is the internal symmetry matrix associated with the axial-vector external field. In the abelian case, where there is no internal symmetry structure, the terms involving two or four factors of $A^\mu, A^\nu, \ldots$ vanish by antisymmetry of $\epsilon_{\mu\nu\sigma\tau}$, and one recovers the AVV triangle anomaly, as well as a kinematically related anomaly in the AAA triangle diagram. In the nonabelian case, with nontrivial internal symmetry structure, there are also box- and pentagon-diagram anomalies.

In addition to coupling to spin-1 gauge fields, fermions can also couple to spin-2 gauge fields, associated with the graviton. When the coupling of fermions to gravitation is taken into account, the axial-vector current $\bar\psi T\gamma_\mu\gamma_5\psi$, with T an internal symmetry matrix, has an additional anomalous contribution to its divergence proportional to

$$\mathrm{tr}\,T\;\epsilon^{\mu\nu\rho\sigma}R_{\mu\nu\alpha\beta}R_{\rho\sigma}{}^{\alpha\beta} \tag{4c}$$

where $R_{\mu\nu\alpha\beta}$ is the Riemann curvature tensor of the gravitational field.

Chiral Anomaly Nonrenormalization

A salient feature of the chiral anomaly is the fact that it is not renormalized by higher-order radiative corrections. In other words, the one-loop expressions of eqns [2] and [4b] give the exact anomaly coefficient without modification in higher orders of perturbation theory. In gauge theories such as quantum electrodynamics and quantum chromodynamics, this result (the Adler–Bardeen theorem) can be understood heuristically as follows. Write down a modified Lagrangian, in which regulators are included for all gauge-boson fields. Since the gauge-boson regulators do not influence the chiral-symmetry properties of the theory, the divergences of the chiral currents are not affected by their inclusion, and so the only sources of anomalies in the regularized theory are small single-fermion loops, giving the anomaly expressions of eqns [2] and [4b]. Since the renormalized theory is obtained as the limit of
the regularized theory as the regulator masses approach infinity, this result applies to the renormalized theory as well.

The above argument can be made precise, and extends to nongauge theories such as the $\sigma$-model as well. For both gauge theories and the $\sigma$-model, cancellation of radiative corrections to the anomaly coefficient has been explicitly demonstrated in fourth-order calculations. Nonperturbative demonstrations of anomaly nonrenormalization have also been given using the Callan–Symanzik equations. For example, in quantum electrodynamics, Zee, and Lowenstein and Schroer, showed that a factor f that gives the ratio of the true anomaly to its one-loop value obeys the differential equation

$$\left(m\frac{\partial}{\partial m} + \alpha\beta(\alpha)\frac{\partial}{\partial\alpha}\right) f = 0 \tag{5}$$

Since f is dimensionless, it can have no dependence on the mass m, and since $\beta(\alpha)$ is nonzero this implies $\partial f/\partial\alpha = 0$. Thus, f has no dependence on $\alpha$, and so f = 1.

Applications of Chiral Anomalies

Chiral anomalies have numerous applications in the standard model of particle physics and its extensions, and we describe here a few of the most important ones.

Neutral Pion Decay $\pi^0 \to \gamma\gamma$

As a result of the abelian chiral anomaly, the partially conserved axial-vector current (PCAC) equation relevant to neutral pion decay is modified to read

$$\partial^\mu \mathcal{F}^5_{3\mu}(x) = \left(f_\pi\mu_\pi^2/\sqrt2\right)\phi_{\pi^0}(x) + S\,\frac{\alpha}{4\pi}\,F^{\xi\sigma}(x)F^{\tau\rho}(x)\epsilon_{\xi\sigma\tau\rho} \tag{6a}$$

with $\mu_\pi$ the pion mass, $f_\pi \simeq 131$ MeV the charged-pion decay constant, and S a constant determined by the constituent fermion charges and axial-vector couplings. Taking the matrix element of eqn [6a] between the vacuum state and a two-photon state, and using the fact that the left-hand side has a kinematic zero (the Sutherland–Veltman theorem), one sees that the $\pi^0 \to \gamma\gamma$ amplitude F is completely determined by the anomaly term, giving the formula

$$F = -(\alpha/\pi)\,2S\sqrt2/f_\pi \tag{6b}$$

For a single set of fractionally charged quarks, the amplitude F is a factor of three too small to agree with experiment; for three fractionally charged quarks (or an equivalent Han–Nambu triplet), eqn [6b] gives the correct neutral pion decay rate. This calculation was one of the first pieces of evidence for the color degree of freedom of quarks.

Anomaly Cancellation in Gauge Theories

In quantum electrodynamics, the gauge particle (the photon) couples to the vector current, and so the anomalous conservation properties of the axial-vector current have no effect. The same statement holds for the gauge gluons in quantum chromodynamics, when treated in isolation from the other interactions. However, in the electroweak theory that embeds quantum electrodynamics in a theory of the weak force, the gauge particles (the $W^\pm$ and Z intermediate bosons) couple to chiral currents, which are left- or right-handed linear combinations of the vector and axial-vector currents. In this case, the chiral anomaly leads to problems with the renormalizability of the theory, unless the anomalies cancel between different fermion species. Writing all fermions as left-handed, the condition for anomaly cancellation is

$$\mathrm{tr}\,\{T_\alpha, T_\beta\}T_\gamma = \mathrm{tr}\,(T_\alpha T_\beta + T_\beta T_\alpha)T_\gamma = 0 \quad \text{for all } \alpha, \beta, \gamma \tag{7}$$

with $T_\alpha$ the coupling matrices of gauge bosons to left-handed fermions. These conditions are obeyed in the standard model, by virtue of three nontrivial sum rules on the fermion gauge couplings being satisfied (four sum rules, if one includes the gravitational contribution to the chiral anomaly given in eqn [4c], which also cancels in the standard model). Note that anomaly cancellation in the locally gauged currents of the standard model does not imply anomaly cancellation in global-flavor currents. Thus, the flavor axial-vector current anomaly that gives the $\pi^0 \to \gamma\gamma$ matrix element remains anomalous in the full electroweak theory. Anomaly cancellation imposes important constraints on the construction of grand unified models that combine the electroweak theory with quantum chromodynamics. For instance, in SU(5) the fermions are put into a $\bar 5$ and 10 representation, which together, but not individually, are anomaly free. The larger unification groups SO(10) and $E_6$ satisfy eqn [7] for all representations, and so are automatically anomaly free.

Instanton Physics and the Theta Vacuum

The theory of anomalies is intimately tied to the physics associated with instanton classical Yang–Mills theory solutions. Since the instanton field
strength is self-dual, the nonvanishing instanton Euclidean action

$$S_E = \int d^4x\,\tfrac14 F_{\mu\nu}F_{\mu\nu} = 8\pi^2 \tag{8a}$$

implies that the integral of the pseudoscalar density $F_{\mu\nu}F_{\lambda\sigma}\epsilon_{\mu\nu\lambda\sigma}$ over the instanton is also nonzero,

$$\int d^4x\,F_{\mu\nu}F_{\lambda\sigma}\epsilon_{\mu\nu\lambda\sigma} = 64\pi^2 \tag{8b}$$

Referring back to eqn [4b], this means that the integral of the nonabelian chiral anomaly for fermions in the background field of an instanton is an integer, which in the Minkowski space continuation has the interpretation of a topological winding number change produced by the instanton tunneling solution. This fact has a number of profound consequences. Since a vacuum with a definite winding number $|\nu\rangle$ is unstable under instanton tunneling, careful analysis shows that the nonabelian vacuum that has correct clustering properties is a Fourier superposition

$$|\theta\rangle = \sum_\nu e^{i\theta\nu}|\nu\rangle \tag{8c}$$

giving rise to the $\theta$-vacuum of quantum chromodynamics, and a host of issues associated with (the lack of) strong CP violation, the Peccei–Quinn mechanism, and axion physics. Also, the fact that the integral of eqn [8b] is nonzero means that the U(1) chiral symmetry of quantum chromodynamics is broken by instantons, which as shown by 't Hooft resolves the longstanding "U(1) problem" of strong interactions, that of explaining why the flavor singlet pseudoscalar meson $\eta'$ is not light, unlike its flavor octet partners.

Anomaly Matching Conditions

The anomaly structure of a theory, as shown by 't Hooft, leads to important constraints on the formation of massless composite bound states. Consider a theory with a set of left-handed fermions $\psi_{if}$, with i a color index acted on by a nonabelian gauge force, and f an ungauged family or flavor index. Suppose that the family multiplet structure is such that the global chiral symmetries associated with the flavor index have nonvanishing anomalies $\mathrm{tr}\,\{T_\alpha, T_\beta\}T_\gamma$. Then the 't Hooft condition asserts that if the color forces result in the formation of composite massless bound states of the original completely confined fermions, and if there is no spontaneous breaking of the original global flavor symmetries, then these bound states must contain left-handed spin-1/2 composites with a representation structure S that has the same anomaly coefficient as that in the underlying theory. In other words, we must have

$$\mathrm{tr}\,\{S_\alpha, S_\beta\}S_\gamma = \mathrm{tr}\,\{T_\alpha, T_\beta\}T_\gamma \tag{9}$$

To prove this, one adjoins to the theory a set of right-handed spectator fermions $\psi_f$ with the same flavor structure as the original set, but which are not acted on by the color force. These right-handed fermions cancel the original anomaly, making the underlying theory anomaly free at zero color coupling; since dynamics cannot spontaneously generate anomalies, the theory, when the color dynamics is turned on, must also have no global chiral anomalies. This implies that the bound-state spectrum must conspire to cancel the anomalies associated with the right-handed spectators; in other words, the bound-state anomaly structure must match that of the original fermions. This anomaly matching condition has found applications in the study of the possible compositeness of quarks and leptons. It has also been applied to the derivation of nonperturbative dynamical results in whole classes of supersymmetric theories, where the combined tools of holomorphicity, instanton physics, and anomaly matching have given incisive results.

Global Structure of Anomalies

We noted earlier that chiral anomalies are irreducible, in that they cannot be eliminated by adding a local polynomial counter-term to the action. However, anomalies can be described by a nonlocal effective action, obtained by integrating out the fermion field dynamics, and this point of view proves very useful in the nonabelian case. Starting with the abelian case for orientation, we note that if $A^\mu$ is an external axial-vector field, and we write an effective action $\Gamma[A]$, then the axial-vector current $j^5_\mu$ associated with $A^\mu$ is given (up to an overall constant) by the variational derivative expression

$$j^5_\mu(x) = \frac{\delta\Gamma[A]}{\delta A^\mu(x)} \tag{10a}$$

and the abelian anomaly appears as the fact that the expression

$$\partial^\mu j^5_\mu = X\,\Gamma[A] = G \neq 0, \qquad X = \partial^\mu\frac{\delta}{\delta A^\mu(x)} \tag{10b}$$

is nonvanishing even when the theory is classically chiral invariant. Turning now to the nonabelian case, the variational derivative appearing in eqns [10a] and [10b] must be replaced by an appropriate
covariant derivative. In terms of the internal-symmetry component fields $A^\mu_a$ and $V^\mu_a$ of the Yang–Mills potentials of eqn [4a], one introduces operators

$$\begin{aligned}
X_a(x) &= \partial^\mu\frac{\delta}{\delta A^\mu_a(x)} + f_{abc}V^\mu_b\frac{\delta}{\delta A^\mu_c(x)} + f_{abc}A^\mu_b\frac{\delta}{\delta V^\mu_c(x)} \\
Y_a(x) &= \partial^\mu\frac{\delta}{\delta V^\mu_a(x)} + f_{abc}V^\mu_b\frac{\delta}{\delta V^\mu_c(x)} + f_{abc}A^\mu_b\frac{\delta}{\delta A^\mu_c(x)}
\end{aligned} \tag{11a}$$

with $f_{abc}$ the antisymmetric nonabelian group structure constants. The operators $X_a$ and $Y_a$ are easily seen to obey the commutation relations

$$\begin{aligned}
{}[X_a(x), X_b(y)] &= f_{abc}\,\delta(x-y)\,Y_c(x) \\
[X_a(x), Y_b(y)] &= f_{abc}\,\delta(x-y)\,X_c(x) \\
[Y_a(x), Y_b(y)] &= f_{abc}\,\delta(x-y)\,Y_c(x)
\end{aligned} \tag{11b}$$

Let $\Gamma[V, A]$ be the effective action as a functional of the fields $V^\mu$, $A^\mu$, constructed so that the vector currents are covariantly conserved, as expressed formally by

$$Y_a\,\Gamma[V, A] = 0 \tag{12a}$$

Then the nonabelian axial-vector current anomaly is given by

$$X_a\,\Gamma[V, A] = G_a \tag{12b}$$

From eqns [12a] and [12b] and the first line of eqn [11b], we have

$$X_bG_a - X_aG_b = (X_bX_a - X_aX_b)\,\Gamma[V, A] \propto f_{abc}Y_c\,\Gamma[V, A] = 0 \tag{12c}$$

which is the Wess–Zumino consistency condition on the structure of the anomaly $G_a$. It can be shown that this condition uniquely fixes the form of the nonabelian anomaly to be that of eqn [4b], up to an overall constant, which can be determined by comparison with the simplest anomalous AVV triangle graph. A physical consequence of the consistency condition is that the $\pi^0 \to \gamma\gamma$ decay amplitude determines uniquely certain other anomalous amplitudes, such as $2\gamma \to 3\pi$, $\gamma \to 3\pi$, and a five pseudoscalar vertex.

Although the action $\Gamma[V, A]$ is necessarily nonlocal, Wess and Zumino were able to write down a local action, involving an auxiliary pseudoscalar field, that obeys the anomalous Ward identities and the consistency conditions. Subsequently, Witten gave a new construction of this local action, in terms of the integral of a fifth-rank antisymmetric tensor over a five-dimensional disk which has a four-dimensional space as its boundary. He also showed that requiring $e^{i\Gamma}$ to be independent of the choice of the spanning disk requires, in analogy with Dirac's quantization condition for monopole charge, the condition that the overall coefficient in the nonabelian anomaly be quantized in integer multiples. Comparison with the lowest-order triangle diagram shows that in the case of SU($N_c$) gauge theory, this integer is just the number of colors $N_c$. Thus, global considerations tightly constrain the nonabelian chiral anomaly structure, and dictate that up to an integer-proportionality constant, it must have the form given in eqns [4a] and [4b].

Trace Anomalies

The discovery of chiral anomalies inspired the search for other examples of anomalous behavior. First indications of a perturbative trace anomaly obtained in a study of broken scale invariance by Coleman and Jackiw were shown by Crewther, and by Chanowitz and Ellis, to correspond to an anomaly in the three-point function $\theta_{\mu\nu}V_\alpha V_\beta$, where $\theta_{\mu\nu}$ is the energy–momentum tensor. Letting $\Delta_{\mu\nu\alpha\beta}(p)$ be the momentum space expression for this three-point function, and $\Pi_{\alpha\beta}$ the corresponding $V_\alpha V_\beta$ two-point function, the trace anomaly equation in quantum electrodynamics reads

$$\Delta^\mu{}_{\mu\alpha\beta}(p) = \left(2 - p_\sigma\frac{\partial}{\partial p_\sigma}\right)\Pi_{\alpha\beta}(p) - \frac{R}{6\pi^2}\left(p_\alpha p_\beta - \eta_{\alpha\beta}p^2\right) \tag{13a}$$

with the first term on the right-hand side the naive divergence, and the second term the trace anomaly, with anomaly coefficient R given by

$$R = \sum_{i,\ \mathrm{spin}\ 1/2} Q_i^2 + \frac14\sum_{i,\ \mathrm{spin}\ 0} Q_i^2 \tag{13b}$$

The fact that there should be a trace anomaly can readily be inferred from a trace analog of the Pauli–Villars regulator argument for the chiral anomaly given in eqn [3a]. Letting $j = \bar\psi\psi$ be the scalar current in abelian electrodynamics, one has

$$\theta^\mu{}_\mu\big|_{m_0} - \theta^\mu{}_\mu\big|_{M_0} = m_0\,j\big|_{m_0} - M_0\,j\big|_{M_0} \tag{13c}$$

Taking the vacuum to two-photon matrix element of this equation, and imposing vector-current conservation, one finds that the matrix element $\langle 0|\,j|_{M_0}\,|\gamma\gamma\rangle$ is proportional to $M_0^{-1}\langle 0|F_{\lambda\sigma}F^{\lambda\sigma}|\gamma\gamma\rangle$ for a large regulator mass, and so makes a
nonvanishing contribution to the right-hand side of eqn [13c], giving the lowest-order trace anomaly.

Unlike the chiral anomaly, the trace anomaly is renormalized in higher orders of perturbation theory; heuristically, the reason is that whereas boson field regulators do not affect the chiral symmetry properties of a gauge theory (which are determined just by the fermionic terms in the Lagrangian), they do alter the energy–momentum tensor, since gravitation couples to all fields, including regulator fields. An analysis using the Callan–Symanzik equations shows, however, that the trace anomaly is computable to all orders in terms of various renormalization group functions of the coupling. For example, in abelian electrodynamics, defining $\beta(\alpha)$ and $\delta(\alpha)$ by $\beta(\alpha) = (m/\alpha)\,\partial\alpha/\partial m$ and $1 + \delta(\alpha) = (m/m_0)\,\partial m_0/\partial m$, the trace of the energy–momentum tensor is given to all orders by

$$\theta^\mu{}_\mu = [1 + \delta(\alpha)]\,m_0\bar\psi\psi + \tfrac14\beta(\alpha)\,N[F_{\lambda\sigma}F^{\lambda\sigma}] + \cdots \tag{14}$$

with N[ ] specifying conditions that make the division into two terms in eqn [14] unique, and with the ellipsis $\cdots$ indicating terms that vanish by the equations of motion. A similar relation holds in the nonabelian case, again with the $\beta$ function appearing as the coefficient of the anomalous $\mathrm{tr}\,N[F_{\lambda\sigma}F^{\lambda\sigma}]$ term.

Just as in the chiral anomaly case, when spin-0, spin-1/2, or spin-1 fields propagate on a background spacetime, there are curvature-dependent contributions to the trace anomaly, in other words, gravitational anomalies. These typically take the form of complicated linear combinations of terms of the form $R^2$, $R_{\mu\nu}R^{\mu\nu}$, $R_{\mu\nu\lambda\sigma}R^{\mu\nu\lambda\sigma}$, $R_{,\mu}{}^{;\mu}$, with coefficients depending on the matter fields involved.

In supersymmetric theories, the axial-vector current and the energy–momentum tensor are both components of the supercurrent, and so their anomalies imply the existence of corresponding supercurrent anomalies. The issue of how the nonrenormalization of chiral anomalies (which have a supercurrent generalization given by the Konishi anomaly), and the renormalization of trace anomalies, can coexist in supersymmetric theories originally engendered considerable confusion. This apparent puzzle is now understood in the context of a perturbatively exact expression for the $\beta$ function in supersymmetric field theories (the so-called NSVZ, for Novikov, Shifman, Vainshtein, and Zakharov, $\beta$ function). Supersymmetry anomalies can be used to infer the structure of effective actions in supersymmetric theories, and these in turn have important implications for possibilities for dynamical supersymmetry breaking. Anomalies may also play a role, through "anomaly mediation", in communicating supersymmetry breaking in "hidden sectors" of a theory, which do not contain the physical fields that we directly observe, to the physical sector containing the observed fields.

Further Anomaly Topics

The above discussion has focused on some of the principal features and applications of anomalies. There are further topics of interest in the physics and mathematics of anomalies that are discussed in detail in the references cited in the "Further reading" section. We briefly describe a few of them here.

Anomalies in Other Spacetime Dimensions and in String Theory

The focus above has been on anomalies in four-dimensional spacetime, but anomalies of various types occur both in lower-dimensional quantum field theories (such as theories in two- and three-dimensional spacetimes) and in quantum field theories in higher-dimensional spacetimes (such as N = 1 supergravity in ten-dimensional spacetime). Anomalies also play an important role in the formulation and consistency of string theory. The bosonic string is consistent only in 26-dimensional spacetime, and the analogous supersymmetric string only in ten-dimensional spacetime, because in other dimensions both these theories violate Lorentz invariance after quantization. In the Polyakov path-integral formulation of these string theories, these special dimensions are associated with the cancellation of the Weyl anomaly, which is the relevant form of the trace anomaly discussed above. Yang–Mills, gravitational, and mixed Yang–Mills gravitational anomalies make an appearance both in N = 1 ten-dimensional supergravity and in superstring theory, and again special dimensions play a role. In these theories, only when the associated internal symmetry groups are either SO(32) or $E_8 \times E_8$ is elimination of all anomalies possible, by cancellation of hexagon-diagram anomalies with anomalous tree diagrams involving exchange of a massless antisymmetric two-form field. This mechanism, due to Green and Schwarz, requires the factorization of a sixth-order trace invariant that appears in the hexagon anomaly in terms of lower-order invariants, as well as two numerical conditions on the adjoint representation generator structure, restricting the allowed gauge groups to the two noted above.

Covariant versus Consistent Anomalies; Descent Equations

The nonabelian anomaly of eqns [4a] and [4b] is called the consistent anomaly, because it obeys the
Wess–Zumino consistency conditions of eqn [12c]. This anomaly, however, is not gauge covariant, as can be seen from the fact that it involves not only the Yang–Mills field strengths $F^{\mu\nu}_{V,A}$, but the potentials $V^\mu$, $A^\mu$ as well. It turns out to be possible, by adding appropriate polynomials to the currents, to transform the consistent anomaly to a form, called the covariant anomaly, which is gauge covariant under gauge transformations of the potentials $V^\mu$, $A^\mu$. This anomaly, however, does not obey the Wess–Zumino consistency conditions, and cannot be obtained from variation of an effective action functional.

The consistent anomalies (but not the covariant anomalies) obey a remarkable set of relations, called the Stora–Zumino descent equations, which relate the abelian anomaly in 2n + 2 spacetime dimensions to the nonabelian anomaly in 2n spacetime dimensions. This set of equations has been interpreted physically by Callan and Harvey as reflecting the fact that the Dirac equation has chiral zero modes in the presence of strings in 2n + 2 dimensions and of domain walls in 2n + 1 dimensions.

Anomalies and Fermion Doubling in Lattice Gauge Theories

A longstanding problem in lattice formulations of gauge field theories is that when fermions are introduced on the lattice, the process of discretization introduces an undesirable doubling of the fermion particle modes. In particular, when an attempt is made to put chiral gauge theories, such as the electroweak theory, on the lattice, one finds that the doublers eliminate the chiral anomalies, by cancellation between modes with positive and negative axial-vector charge. Thus, for a long time, it appeared doubtful whether chiral gauge theories could be simulated on the lattice. However, recent work has led to formulations of lattice fermions that use a mathematical analog of a domain wall to successfully incorporate chiral fermions and the chiral anomaly into lattice gauge theory calculations.

Relation of Anomalies to the Atiyah–Singer Index Theorem

The singlet ($\lambda_A^a = 1$) anomaly of eqn [4b] is closely related to the Atiyah–Singer index theorem. Specifically, the Euclidean spacetime integral of the singlet anomaly constructed from a gauge field can be shown to give the index of the related Dirac operator for a fermion moving in that background gauge field, where the index is defined as the difference between the numbers of right- and left-handed zero-eigenvalue normalizable solutions of the Dirac equation. Since the index is a topological invariant, this again implies that the Euclidean spacetime integral of the anomaly is a topological invariant, as noted above in our discussion of instanton-related applications of anomalies.

Retrospect

The wide range of implications of anomalies has surprised, even astonished, the founders of the subject. New anomaly applications have appeared within the last few years, and very likely the future will see continued growth of the area of quantum field theory concerned with the physics and mathematics of anomalies.

Acknowledgment

This work is supported, in part, by the Department of Energy under grant #DE-FG02-90ER40542.

See also: Bosons and Fermions in External Fields; BRST Quantization; Effective Field Theories; Gauge Theories from Strings; Gerbes in Quantum Field Theory; Index Theorems; Lagrangian Dispersion (Passive Scalar); Lattice Gauge Theory; Nonperturbative and Topological Aspects of Gauge Theory; Quantum Electrodynamics and Its Precision Tests; Quillen Determinant; Renormalization: General Theory; Seiberg–Witten Theory.

Further Reading

Adler SL (1969) Axial-vector vertex in spinor electrodynamics. Physical Review 177: 2426–2438.
Adler SL (1970) Perturbation theory anomalies. In: Deser S, Grisaru M, and Pendleton H (eds.) Lectures on Elementary Particles and Quantum Field Theory, vol. 1, pp. 3–164. Cambridge, MA: MIT Press.
Adler SL (2005) Anomalies to all orders. In: 't Hooft G (ed.) Fifty Years of Yang–Mills Theory, pp. 187–228. Singapore: World Scientific.
Adler SL and Bardeen WA (1969) Absence of higher order corrections in the anomalous axial-vector divergence equation. Physical Review 182: 1517–1536.
Bardeen W (1969) Anomalous Ward identities in spinor field theories. Physical Review 184: 1848–1859.
Bell JS and Jackiw R (1969) A PCAC puzzle: $\pi^0 \to \gamma\gamma$ in the $\sigma$-model. Nuovo Cimento A 60: 47–61.
Bertlmann RA (1996) Anomalies in Quantum Field Theory. Oxford: Clarendon.
De Azcarraga JA and Izquierdo JM (1995) Lie Groups, Lie Algebras, Cohomology and Some Applications in Physics, ch. 10. Cambridge: Cambridge University Press.
Fujikawa K and Suzuki H (2004) Path Integrals and Quantum Anomalies. Oxford: Oxford University Press.
Golterman M (2001) Lattice chiral gauge theories. Nuclear Physics Proceeding Supplements 94: 189–203.
Green MB, Schwarz JH, and Witten E (1987) Superstring Theory, vol. 2, sects. 13.3–13.5. Cambridge: Cambridge University Press.
Hasenfratz P (2005) Chiral symmetry on the lattice. In: 't Hooft G (ed.) Fifty Years of Yang–Mills Theory, pp. 377–398. Singapore: World Scientific.
Jackiw R (1985) Field theoretic investigations in current algebra and topological investigations in quantum gauge theories. In: Treiman S, Jackiw R, Zumino B, and Witten E (eds.) Current Algebra and Anomalies. Singapore: World Scientific and Princeton: Princeton University Press.
Jackiw R (2005) Fifty years of Yang–Mills theory and our moments of triumph. In: 't Hooft G (ed.) Fifty Years of Yang–Mills Theory, pp. 229–251. Singapore: World Scientific.
Makeenko Y (2002) Methods of Contemporary Gauge Theory, ch. 3. Cambridge: Cambridge University Press.
Neuberger H (2000) Chiral fermions on the lattice. Nuclear Physics Proceeding Supplements 83: 67–76.
Polchinski J (1999) String Theory, vol. 1, sect. 3.4; vol. 2, sect. 12.2. Cambridge: Cambridge University Press.
Shifman M (1997) Non-perturbative dynamics in supersymmetric gauge theories. Progress in Particle and Nuclear Physics 39: 1–116.
van Nieuwenhuizen P (1988) Anomalies in Quantum Field Theory: Cancellation of Anomalies in d = 10 Supergravity. Leuven: Leuven University Press.
Volovik GE (2003) The Universe in a Helium Droplet, ch. 18. Oxford: Clarendon.
Weinberg S (1996) The Quantum Theory of Fields, Vol. II Modern Applications, ch. 22. Cambridge: Cambridge University Press.
Zee A (2003) Quantum Field Theory in a Nutshell, sect. IV.7. Princeton: Princeton University Press.
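
As a concrete check of the anomaly cancellation condition of eqn [7] in the Anomalies article above, the following minimal Python sketch evaluates the hypercharge sum rules for one standard-model generation written entirely in terms of left-handed fields. The hypercharge assignments (in the assumed convention Q = T3 + Y), the field labels, and the function name `anomaly_sums` are introduced here for illustration only; the article does not list them. Exact rational arithmetic is used so that the cancellations are tested exactly rather than up to rounding error.

```python
from fractions import Fraction as F

# One generation of left-handed Weyl fermions:
# (label, SU(3) dimension, SU(2) dimension, hypercharge Y).
# Assumed convention Q = T3 + Y; the conjugate fields (u^c, d^c, e^c) stand in
# for the right-handed particles. These assignments are not from the article.
GENERATION = [
    ("Q_L", 3, 2, F(1, 6)),
    ("u^c", 3, 1, F(-2, 3)),
    ("d^c", 3, 1, F(1, 3)),
    ("L_L", 1, 2, F(-1, 2)),
    ("e^c", 1, 1, F(1, 1)),
]


def anomaly_sums(fermions):
    """Return the four hypercharge sum rules as exact fractions."""
    return {
        # U(1)^3: sum of Y^3 over every component field
        "U(1)^3": sum(c * w * y**3 for _, c, w, y in fermions),
        # SU(2)^2 U(1): sum of Y over weak doublets, counting colors
        "SU(2)^2 U(1)": sum(c * y for _, c, w, y in fermions if w == 2),
        # SU(3)^2 U(1): sum of Y over color (anti)triplets, counting isospin
        "SU(3)^2 U(1)": sum(w * y for _, c, w, y in fermions if c == 3),
        # gravitational^2 U(1): sum of Y over every component field
        "grav^2 U(1)": sum(c * w * y for _, c, w, y in fermions),
    }


if __name__ == "__main__":
    for name, value in anomaly_sums(GENERATION).items():
        print(f"{name}: {value}")
    quarks_only = [field for field in GENERATION if field[1] == 3]
    print("quarks only, U(1)^3:", anomaly_sums(quarks_only)["U(1)^3"])
```

With these assumed assignments all four sums vanish for the full generation, while the cubic sum restricted to quarks alone does not, which illustrates that the cancellation holds only between quarks and leptons together. The purely nonabelian SU(2) and SU(3) cubic traces are not covered by this sketch.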

Arithmetic Quantum Chaos


J Marklof, University of Bristol, Bristol, UK
© 2006 Elsevier Ltd. All rights reserved.

Introduction

The central objective in the study of quantum chaos is to characterize universal properties of quantum systems that reflect the regular or chaotic features of the underlying classical dynamics. Most developments of the past 25 years have been influenced by the pioneering models on statistical properties of eigenstates (Berry 1977) and energy levels (Berry and Tabor 1977, Bohigas et al. 1984). Arithmetic quantum chaos (AQC) refers to the investigation of quantum systems with additional arithmetic structures that allow a significantly more extensive analysis than is generally possible. On the other hand, the special number-theoretic features also render these systems nongeneric, and thus some of the expected universal phenomena fail to emerge. Important examples of such systems include the modular surface and linear automorphisms of tori (cat maps) which will be described below.

The geodesic motion of a point particle on a compact Riemannian surface M of constant negative curvature is the prime example of an Anosov flow, one of the strongest characterizations of dynamical chaos. The corresponding quantum eigenstates $\varphi_j$ and energy levels $\lambda_j$ are given by the solution of the eigenvalue problem for the Laplace–Beltrami operator $\Delta$ (or Laplacian for short)

$$(\Delta + \lambda)\varphi = 0, \qquad \|\varphi\|_{L^2(M)} = 1 \tag{1}$$

where the eigenvalues

$$0 = \lambda_0 < \lambda_1 \le \lambda_2 \le \cdots \to \infty \tag{2}$$

form a discrete spectrum with an asymptotic density governed by Weyl's law

$$\#\{j : \lambda_j \le \lambda\} \sim \frac{\mathrm{Area}(M)}{4\pi}\,\lambda, \qquad \lambda \to \infty \tag{3}$$

We rescale the sequence by setting

$$X_j = \frac{\mathrm{Area}(M)}{4\pi}\,\lambda_j \tag{4}$$

which yields a sequence of asymptotic density 1. One of the central conjectures in AQC says that, if M is an arithmetic hyperbolic surface (see the next section for examples of this very special class of surfaces of constant negative curvature), the eigenvalues of the Laplacian have the same local statistical properties as independent random variables from a Poisson process (see, e.g., the surveys by Sarnak (1995) and Bogomolny et al. (1997)). This means that the probability of finding k eigenvalues $X_j$ in a randomly shifted interval [X, X + L] of fixed length L is distributed according to the Poisson law $L^k e^{-L}/k!$. The gaps between eigenvalues have an exponential distribution,

$$\frac1N\,\#\{j \le N : X_{j+1} - X_j \in [a, b]\} \to \int_a^b e^{-s}\,ds \tag{5}$$

as $N \to \infty$, and thus eigenvalues are likely to appear in clusters. This is in contrast to the general expectation that the energy level statistics of generic chaotic systems follow the distributions of random matrix ensembles; Poisson statistics are usually associated with quantized integrable systems. Although we are at present far from a proof of [5], the deviation from random matrix theory is well understood (see the section "Eigenvalue statistics and Selberg trace formula").

Highly excited quantum eigenstates $\varphi_j$ ($j \to \infty$) (cf. Figure 1) of chaotic systems are conjectured to behave locally like random wave solutions of [1],
Arithmetic Quantum Chaos 213

where boundary conditions are ignored. This hypothesis was put forward by Berry in 1977 and tested numerically, for example, in the case of certain arithmetic and nonarithmetic surfaces of constant negative curvature (Hejhal and Rackner 1992, Aurich and Steiner 1993). One of the implications is that eigenstates should have uniform mass on the surface $M$, that is, for any bounded continuous function $g : M \to \mathbb{R}$

$$\int_M |\varphi_j|^2\, g\, dA \to \int_M g\, dA, \qquad j \to \infty \tag{6}$$

where $dA$ is the Riemannian area element on $M$. This phenomenon, referred to as quantum unique ergodicity (QUE), is expected to hold for general surfaces of negative curvature, according to a conjecture by Rudnick and Sarnak (1994). In the case of arithmetic hyperbolic surfaces, there has been substantial progress on this conjecture in the works of Lindenstrauss, Watson, and Luo–Sarnak (discussed later in this article; see also the review by Sarnak (2003)). For general manifolds with ergodic geodesic flow, the convergence in [6] is so far established only for subsequences of eigenfunctions of density 1 (Schnirelman–Zelditch–Colin de Verdière theorem, see Quantum Ergodicity and Mixing of Eigenfunctions), and it cannot be ruled out that exceptional subsequences of eigenfunctions have singular limit, for example, localized on closed geodesics. Such "scarring" of eigenfunctions, at least in some weak form, has been suggested by numerical experiments in Euclidean domains, and the existence of singular quantum limits is a matter of controversy in the current physics and mathematics literature. A first rigorous proof of the existence of scarred eigenstates has recently been established in the case of quantized toral automorphisms. Remarkably, these quantum cat maps may also exhibit QUE. A more detailed account of results for these maps is given in the section "Quantum eigenstates of cat maps"; see also Rudnick (2001) and De Bièvre (to appear).

There have been a number of other fruitful interactions between quantum chaos and number theory, in particular the connections of spectral statistics of integrable quantum systems with the value distribution properties of quadratic forms, and analogies in the statistical behavior of energy levels of chaotic systems and the zeros of the Riemann zeta function. We refer the reader to Marklof (2006) and Berry and Keating (1999), respectively, for information on these topics.

Figure 1 Image of the absolute-value-squared of an eigenfunction $\varphi_j(z)$ for a nonarithmetic surface of genus 2. The surface is obtained by identifying opposite sides of the fundamental region. Reproduced from Aurich and Steiner (1993) Statistical properties of highly excited quantum eigenstates of a strongly chaotic system. Physica D 64(1–3): 185–214, with permission from R Aurich.

Hyperbolic Surfaces

Let us begin with some basic notions of hyperbolic geometry. The hyperbolic plane $\mathbb{H}$ may be abstractly defined as the simply connected two-dimensional Riemannian manifold with Gaussian curvature $-1$. A convenient parametrization of $\mathbb{H}$ is provided by the complex upper-half plane, $\mathbb{H} = \{x + iy : x \in \mathbb{R},\ y > 0\}$, with Riemannian line and volume elements

$$ds^2 = \frac{dx^2 + dy^2}{y^2}, \qquad dA = \frac{dx\, dy}{y^2} \tag{7}$$

respectively. The group of orientation-preserving isometries of $\mathbb{H}$ is given by fractional linear transformations

$$\mathbb{H} \to \mathbb{H}, \qquad z \mapsto \frac{az + b}{cz + d}, \qquad \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \mathrm{SL}(2, \mathbb{R}) \tag{8}$$

where $\mathrm{SL}(2, \mathbb{R})$ is the group of $2 \times 2$ matrices with unit determinant. Since the matrices $1$ and $-1$ represent the same transformation, the group of orientation-preserving isometries can be identified with $\mathrm{PSL}(2, \mathbb{R}) := \mathrm{SL}(2, \mathbb{R})/\{\pm 1\}$. A finite-volume hyperbolic surface may now be represented as the quotient $\Gamma \backslash \mathbb{H}$, where $\Gamma \subset \mathrm{PSL}(2, \mathbb{R})$ is a Fuchsian group of the first kind. An arithmetic hyperbolic surface (such as the modular surface) is obtained, if $\Gamma$ has, loosely speaking, some representation in $n \times n$ matrices with integer coefficients, for some suitable $n$.
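As a numerical illustration of the isometry property of the fractional linear transformations [8], the following minimal sketch applies a matrix of SL(2, R) to two points of the upper half-plane and compares hyperbolic distances before and after. The function names `mobius` and `hyperbolic_distance` are illustrative choices, and the closed form cosh d(z, w) = 1 + |z - w|^2 / (2 Im z Im w) is the standard distance formula for the metric [7]; it is an addition here and is not stated in the text above.

```python
import math


def mobius(g, z):
    """Apply z -> (a z + b) / (c z + d) for g = (a, b, c, d) with ad - bc = 1."""
    a, b, c, d = g
    return (a * z + b) / (c * z + d)


def hyperbolic_distance(z, w):
    """Distance on the upper half-plane for ds^2 = (dx^2 + dy^2) / y^2.

    Uses cosh d = 1 + |z - w|^2 / (2 Im z Im w) (assumed standard formula).
    """
    return math.acosh(1.0 + abs(z - w) ** 2 / (2.0 * z.imag * w.imag))


if __name__ == "__main__":
    g = (2.0, 1.0, 3.0, 2.0)          # determinant 2*2 - 1*3 = 1
    z, w = complex(0.3, 1.2), complex(-1.0, 0.5)
    gz, gw = mobius(g, z), mobius(g, w)
    print("images stay in the upper half-plane:", gz.imag > 0 and gw.imag > 0)
    print("distance before:", hyperbolic_distance(z, w))
    print("distance after: ", hyperbolic_distance(gz, gw))
```

Replacing `g` by its negative leaves `mobius(g, z)` unchanged, which is the reason the isometry group is PSL(2, R) rather than SL(2, R).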

This is evident in the case of the modular surface, where the fundamental group is the modular group

$$\Gamma = \mathrm{PSL}(2, \mathbb{Z}) = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \mathrm{SL}(2, \mathbb{R}) : a, b, c, d \in \mathbb{Z} \right\} \Big/ \{\pm 1\}$$

A fundamental domain for the action of the modular group $\mathrm{PSL}(2, \mathbb{Z})$ on $\mathbb{H}$ is the set

$$F_{\mathrm{PSL}(2,\mathbb{Z})} = \left\{ z \in \mathbb{H} : |z| > 1,\ -\tfrac{1}{2} < \operatorname{Re} z < \tfrac{1}{2} \right\} \tag{9}$$

(see Figure 2). The modular group is generated by the translation

$$\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} : z \mapsto z + 1$$

and the inversion

$$\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} : z \mapsto -1/z$$

These generators identify sections of the boundary of $F_{\mathrm{PSL}(2,\mathbb{Z})}$. By gluing the fundamental domain along identified edges, we obtain a realization of the modular surface, a noncompact surface with one cusp at $z \to \infty$, and two conic singularities at $z = i$ and $z = 1/2 + i\sqrt{3}/2$.

Figure 2 Fundamental domain of the modular group PSL(2, Z) in the complex upper-half plane.

An interesting example of a compact arithmetic surface is the "regular octagon", a hyperbolic surface of genus 2. Its fundamental domain is shown in Figure 3 as a subset of the Poincaré disc $\mathbb{D} = \{z \in \mathbb{C} : |z| < 1\}$, which yields an alternative parametrization of the hyperbolic plane $\mathbb{H}$. In these coordinates, the Riemannian line and volume element read

$$ds^2 = \frac{4(dx^2 + dy^2)}{(1 - x^2 - y^2)^2}, \qquad dA = \frac{4\, dx\, dy}{(1 - x^2 - y^2)^2} \tag{10}$$

Figure 3 Fundamental domain of the regular octagon in the Poincaré disk.

The group of orientation-preserving isometries is now represented by $\mathrm{PSU}(1,1) = \mathrm{SU}(1,1)/\{\pm 1\}$, where

$$\mathrm{SU}(1,1) = \left\{ \begin{pmatrix} \alpha & \beta \\ \bar\beta & \bar\alpha \end{pmatrix} : \alpha, \beta \in \mathbb{C},\ |\alpha|^2 - |\beta|^2 = 1 \right\} \tag{11}$$

acting on $\mathbb{D}$ as above via fractional linear transformations. The fundamental group of the regular octagon surface is the subgroup of all elements in $\mathrm{PSU}(1,1)$ with coefficients of the form

$$\alpha = k + l\sqrt{2}, \qquad \beta = (m + n\sqrt{2})\sqrt{1 + \sqrt{2}} \tag{12}$$

where $k, l, m, n \in \mathbb{Z}[i]$, that is, Gaussian integers of the form $k_1 + i k_2$, $k_1, k_2 \in \mathbb{Z}$. Note that not all choices of $k, l, m, n \in \mathbb{Z}[i]$ satisfy the condition $|\alpha|^2 - |\beta|^2 = 1$. Since all elements $\gamma \ne 1$ of $\Gamma$ act fix-point free on $\mathbb{H}$, the surface $\Gamma \backslash \mathbb{H}$ is smooth without conic singularities.

In the following, we will restrict our attention to a representative case, the modular surface with $\Gamma = \mathrm{PSL}(2, \mathbb{Z})$.

Eigenvalue Statistics and Selberg Trace Formula

The statistical properties of the rescaled eigenvalues $X_j$ (cf. [4]) of the Laplacian can be characterized by their distribution in small intervals

$$N(x, L) := \#\{\, j : x \le X_j \le x + L \,\} \tag{13}$$

where $x$ is uniformly distributed, say, in the interval $[X, 2X]$, $X$ large. Numerical experiments by Bogomolny, Georgeot, Giannoni, and Schmit, as well as Bolte, Steil, and Steiner (see references in

Bogomolny et al. (1997)) suggest that the $X_j$ are asymptotically Poisson distributed:

Conjecture 1 For any bounded function $g : \mathbb{Z}_{\ge 0} \to \mathbb{C}$ we have

$$\frac{1}{X} \int_X^{2X} g(N(x, L))\, dx \to \sum_{k=0}^{\infty} g(k)\, \frac{L^k e^{-L}}{k!} \tag{14}$$

as $X \to \infty$.

One may also consider larger intervals, where $L \to \infty$ as $X \to \infty$. In this case, the assumption on the independence of the $X_j$ predicts a central-limit theorem. Weyl's law [3] implies that the expectation value is asymptotically, for $X \to \infty$,

$$\frac{1}{X} \int_X^{2X} N(x, L)\, dx \sim L \tag{15}$$

This asymptotics holds for any sequence of $L$ bounded away from zero (e.g., $L$ constant, or $L \to \infty$).

Define the variance by

$$\Sigma^2(X, L) = \frac{1}{X} \int_X^{2X} \big( N(x, L) - L \big)^2\, dx \tag{16}$$

In view of the above conjecture, one expects $\Sigma^2(X, L) \sim L$ in the limit $X \to \infty$, $L/\sqrt{X} \to 0$ (the variance exhibits a less universal behavior in the range $L \gg \sqrt{X}$ (the notation $A \ll B$ means there is a constant $c > 0$ such that $A \le cB$), cf. Sarnak (1995)), and a central-limit theorem for the fluctuations around the mean:

Conjecture 2 For any bounded function $g : \mathbb{R} \to \mathbb{C}$ we have

$$\frac{1}{X} \int_X^{2X} g\!\left( \frac{N(x, L) - L}{\sqrt{\Sigma^2(X, L)}} \right) dx \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(t)\, e^{-t^2/2}\, dt \tag{17}$$

as $X, L \to \infty$, $L \ll \sqrt{X}$.

The main tool in the attempts to prove the above conjectures has been the Selberg trace formula. It relates sums over eigenvalues of the Laplacians to sums over lengths of closed geodesics on the hyperbolic surface. The trace formula is in its simplest form in the case of compact hyperbolic surfaces; we have

$$\sum_{j=0}^{\infty} h(\rho_j) = \frac{\operatorname{Area}(M)}{4\pi} \int_{-\infty}^{\infty} h(\rho) \tanh(\pi\rho)\, \rho\, d\rho + \sum_{\gamma \in H_*} \sum_{n=1}^{\infty} \frac{\ell_\gamma\, g(n \ell_\gamma)}{2 \sinh(n \ell_\gamma / 2)} \tag{18}$$

where $H_*$ is the set of all primitive oriented closed geodesics $\gamma$, and $\ell_\gamma$ their lengths. The quantity $\rho_j$ is related to the eigenvalue $\lambda_j$ by the equation $\lambda_j = \rho_j^2 + 1/4$. The trace formula [18] holds for a large class of even test functions $h$. For example, it is sufficient to assume that $h$ is infinitely differentiable, and that the Fourier transform of $h$,

$$g(t) = \frac{1}{2\pi} \int_{\mathbb{R}} h(\rho)\, e^{-i\rho t}\, d\rho \tag{19}$$

has compact support. The trace formula for noncompact surfaces has additional terms from the parabolic elements in the corresponding group, and includes also sums over the resonances of the continuous part of the spectrum. The noncompact modular surface behaves in many ways like a compact surface. In particular, Selberg showed that the number of eigenvalues embedded in the continuous spectrum satisfies the same Weyl law as in the compact case (Sarnak 2003).

Setting

$$h(\rho) = \chi_{[X, X+L]}\!\left( \frac{\operatorname{Area}(M)}{4\pi} \Big( \rho^2 + \frac{1}{4} \Big) \right) \tag{20}$$

where $\chi_{[X, X+L]}$ is the characteristic function of the interval $[X, X+L]$, we may thus view $N(X, L)$ as the left-hand side of the trace formula. The above test function $h$ is, however, not admissible, and requires appropriate smoothing. Luo and Sarnak (cf. Sarnak (2003)) developed an argument of this type to obtain a lower bound on the average number variance,

$$\frac{1}{L} \int_0^L \Sigma^2(X, L')\, dL' \gg \frac{\sqrt{X}}{(\log X)^2} \tag{21}$$

in the regime $\sqrt{X}/\log X \ll L \ll \sqrt{X}$, which is consistent with the Poisson conjecture $\Sigma^2(X, L) \sim L$. Bogomolny, Leyvraz, and Schmit suggested a remarkable limiting formula for the two-point correlation function for the modular surface (cf. Bogomolny et al. (1997) and Bogomolny (2006)), based on an analysis of the correlations between multiplicities of lengths of closed geodesics. A rigorous analysis of the fluctuations of multiplicities is given by Peter (cf. Bogomolny (2006)). Rudnick (2005) has recently established a smoothed version of Conjecture 2 in the regime

$$\frac{\sqrt{X}}{L} \to \infty, \qquad \frac{\sqrt{X}}{L \log X} \to 0 \tag{22}$$

where the characteristic function in [20] is replaced by a certain class of smooth test functions.

All of the above approaches use the Selberg trace formula, exploiting the particular properties of the

distribution of lengths of closed geodesics in arithmetic hyperbolic surfaces. These will be discussed in more detail in the next section, following the work of Bogomolny, Georgeot, Giannoni and Schmit, Bolte, and Luo and Sarnak (see Bogomolny et al. (1997) and Sarnak (1995) for references).

Distribution of Lengths of Closed Geodesics

The classical prime geodesic theorem asserts that the number $N(\ell)$ of primitive closed geodesics of length less than $\ell$ is asymptotically

$$N(\ell) \sim \frac{e^{\ell}}{\ell} \tag{23}$$

One of the significant geometrical characteristics of arithmetic hyperbolic surfaces is that the number of closed geodesics with the same length $\ell$ grows exponentially with $\ell$. This phenomenon is most easily explained in the case of the modular surface, where the set of lengths $\ell$ appearing in the length spectrum is characterized by the condition

$$2 \cosh(\ell/2) = |\operatorname{tr} \gamma| \tag{24}$$

where $\gamma$ runs over all elements in $\mathrm{SL}(2, \mathbb{Z})$ with $|\operatorname{tr} \gamma| > 2$. It is not hard to see that any integer $n > 2$ appears in the set $\{|\operatorname{tr} \gamma| : \gamma \in \mathrm{SL}(2, \mathbb{Z})\}$, and hence the set of distinct lengths of closed geodesics is

$$\mathcal{L} = \{\, 2 \operatorname{arcosh}(n/2) : n = 3, 4, 5, \ldots \,\} \tag{25}$$

Therefore, the number of distinct lengths less than $\ell$ is asymptotically (for large $\ell$)

$$N'(\ell) = \#(\mathcal{L} \cap [0, \ell]) \sim e^{\ell/2} \tag{26}$$

Equations [26] and [23] say that on average the number of geodesics with the same lengths is at least $\asymp e^{\ell/2}/\ell$.

The prime geodesic theorem [23] holds equally for all hyperbolic surfaces with finite area, while [26] is specific to the modular surface. For general arithmetic surfaces, we have the upper bound

$$N'(\ell) \le c\, e^{\ell/2} \tag{27}$$

for some constant $c > 0$ that may depend on the surface. Although one expects $N'(\ell)$ to be asymptotic to $(1/2) N(\ell)$ for generic surfaces (since most geodesics have a time-reversal partner which thus has the same length, and otherwise all lengths are distinct), there are examples of nonarithmetic Hecke triangles where numerical and heuristic arguments suggest $N'(\ell) \sim c_1 e^{c_2 \ell}/\ell$ for suitable constants $c_1 > 0$ and $0 < c_2 < 1/2$ (cf. Bogomolny (2006)). Hence exponential degeneracy in the length spectrum seems to occur in a weaker form also for nonarithmetic surfaces.

A further useful property of the length spectrum of arithmetic surfaces is the bounded clustering property: there is a constant $C$ (again surface dependent) such that

$$\#(\mathcal{L} \cap [\ell, \ell + 1]) \le C \tag{28}$$

for all $\ell$. This fact is evident in the case of the modular surface; the general case is proved by Luo and Sarnak (cf. Sarnak (1995)).

Quantum Unique Ergodicity

The unit tangent bundle of a hyperbolic surface $\Gamma \backslash \mathbb{H}$ describes the physical phase space on which the classical dynamics takes place. A convenient parametrization of the unit tangent bundle is given by the quotient $\Gamma \backslash \mathrm{PSL}(2, \mathbb{R})$; this may be seen by means of the Iwasawa decomposition for an element $g \in \mathrm{PSL}(2, \mathbb{R})$,

$$g = \begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix} \begin{pmatrix} y^{1/2} & 0 \\ 0 & y^{-1/2} \end{pmatrix} \begin{pmatrix} \cos \theta/2 & \sin \theta/2 \\ -\sin \theta/2 & \cos \theta/2 \end{pmatrix} \tag{29}$$

where $x + iy \in \mathbb{H}$ represents the position of the particle in $\Gamma \backslash \mathbb{H}$ in half-plane coordinates, and $\theta \in [0, 2\pi)$ the direction of its velocity. Multiplying the matrix [29] from the left by $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ and writing the result again in the Iwasawa form [29], one obtains the action

$$(z, \theta) \mapsto \left( \frac{az + b}{cz + d},\ \theta - 2 \arg(cz + d) \right) \tag{30}$$

which represents precisely the geometric action of isometries on the unit tangent bundle.

The geodesic flow $\Phi^t$ on $\Gamma \backslash \mathrm{PSL}(2, \mathbb{R})$ is represented by the right translation

$$\Phi^t : \Gamma g \mapsto \Gamma g \begin{pmatrix} e^{t/2} & 0 \\ 0 & e^{-t/2} \end{pmatrix} \tag{31}$$

The Haar measure $\mu$ on $\mathrm{PSL}(2, \mathbb{R})$ is thus trivially invariant under the geodesic flow. It is well known that $\mu$ is not the only invariant measure, that is, $\Phi^t$ is not uniquely ergodic, and that there is in fact an abundance of invariant measures. The simplest examples are those with uniform mass on one, or a countable collection of, closed geodesics.

To test the distribution of an eigenfunction $\varphi_j$ in phase space, one associates with a function

$a \in C^\infty(\Gamma \backslash \mathrm{PSL}(2, \mathbb{R}))$ the quantum observable $\mathrm{Op}(a)$, a zeroth order pseudodifferential operator with principal symbol $a$. Using semiclassical techniques based on Friedrichs symmetrization, one can show that the matrix element

$$\mu_j(a) = \langle \mathrm{Op}(a) \varphi_j, \varphi_j \rangle \tag{32}$$

is asymptotic (as $j \to \infty$) to a positive functional that defines a probability measure on $\Gamma \backslash \mathrm{PSL}(2, \mathbb{R})$. Therefore, if $M$ is compact, any weak limit of $\mu_j$ represents a probability measure on $\Gamma \backslash \mathrm{PSL}(2, \mathbb{R})$. Egorov's theorem (see Quantum Ergodicity and Mixing of Eigenfunctions) in turn implies that any such limit must be invariant under the geodesic flow, and the main challenge in proving QUE is to rule out all invariant measures apart from Haar.

Conjecture 3 (Rudnick and Sarnak (1994); see Sarnak (1995, 2003)). For every compact hyperbolic surface $\Gamma \backslash \mathbb{H}$, the sequence $\mu_j$ converges weakly to $\mu$.

Lindenstrauss has proved this conjecture for compact arithmetic hyperbolic surfaces of congruence type (such as the second example in the section "Hyperbolic surfaces") for special bases of eigenfunctions, using ergodic-theoretic methods. These will be discussed in more detail in the next section. His results extend to the noncompact case, that is, to the modular surface where $\Gamma = \mathrm{PSL}(2, \mathbb{Z})$. Here he shows that any weak limit of subsequences of $\mu_j$ is of the form $c\mu$, where $c$ is a constant with values in $[0, 1]$. One believes that $c = 1$, but with present techniques it cannot be ruled out that a proportion of the mass of the eigenfunction escapes into the noncompact cusp of the surface. For the modular surface, $c = 1$ can be proved under the assumption of the generalized Riemann hypothesis (see the section "Eigenfunctions and L-functions" and Sarnak (2003)). QUE also holds for the continuous part of the spectrum, which is furnished by the Eisenstein series $E(z, s)$, where $s = 1/2 + ir$ is the spectral parameter. Note that the measures associated with the matrix elements

$$\mu_r(a) = \langle \mathrm{Op}(a) E(\cdot, 1/2 + ir), E(\cdot, 1/2 + ir) \rangle \tag{33}$$

are not probability measures but only Radon measures, since $E(z, s)$ is not square-integrable. Luo and Sarnak, and Jakobson have shown that

$$\lim_{r \to \infty} \frac{\mu_r(a)}{\mu_r(b)} = \frac{\mu(a)}{\mu(b)} \tag{34}$$

for suitable test functions $a, b \in C^\infty(\Gamma \backslash \mathrm{PSL}(2, \mathbb{R}))$ (cf. Sarnak (2003)).

Hecke Operators, Entropy and Measure Rigidity

For compact surfaces, the sequence of probability measures approaching the matrix elements $\mu_j$ is relatively compact. That is, every infinite sequence contains a convergent subsequence. Lindenstrauss' central idea in the proof of QUE is to exploit the presence of Hecke operators to understand the invariance properties of possible quantum limits. We will sketch his argument in the case of the modular surface (ignoring issues related to the noncompactness of the surface), where it is most transparent.

For every positive integer $n$, the Hecke operator $T_n$ acting on continuous functions on $\Gamma \backslash \mathbb{H}$ with $\Gamma = \mathrm{SL}(2, \mathbb{Z})$ is defined by

$$T_n f(z) = \frac{1}{\sqrt{n}} \sum_{\substack{a, d = 1 \\ ad = n}}^{n} \sum_{b=0}^{d-1} f\!\left( \frac{az + b}{d} \right) \tag{35}$$

The set $M_n$ of matrices with integer coefficients and determinant $n$ can be expressed as the disjoint union

$$M_n = \bigcup_{\substack{a, d = 1 \\ ad = n}}^{n} \bigcup_{b=0}^{d-1} \Gamma \begin{pmatrix} a & b \\ 0 & d \end{pmatrix} \tag{36}$$

and hence the sum in [35] can be viewed as a sum over the cosets in this decomposition. We note the product formula

$$T_m T_n = \sum_{d \mid \gcd(m, n)} T_{mn/d^2} \tag{37}$$

The Hecke operators are normal, form a commuting family, and in addition they commute with the Laplacian $\Delta$. In the following, we consider an orthonormal basis of eigenfunctions $\varphi_j$ of $\Delta$ that are simultaneously eigenfunctions of all Hecke operators. We will refer to such eigenfunctions as Hecke eigenfunctions. The above assumption is automatically satisfied, if the spectrum of $\Delta$ is simple (i.e., no eigenvalues coincide), a property conjectured by Cartier and supported by numerical computations. Lindenstrauss' work is based on the following two observations. Firstly, all quantum limits of Hecke eigenfunctions are geodesic-flow invariant measures of positive entropy, and secondly, the only such measure of positive entropy that is recurrent under Hecke correspondences is the Lebesgue measure.

The first property is proved by Bourgain and Lindenstrauss (2003) and refines arguments of Rudnick and Sarnak (1994) and Wolpert (2001) on the distribution of Hecke points (see Sarnak (2003) for

references to these papers). For a given point $z \in \mathbb{H}$ the set of Hecke points is defined as

$$T_n(z) := M_n z \tag{38}$$

For most primes, the set $T_{p^k}(z)$ comprises $(p + 1) p^{k-1}$ distinct points on $\Gamma \backslash \mathbb{H}$. For each $z$, the Hecke operator $T_n$ may now be interpreted as the adjacency matrix for a finite graph embedded in $\Gamma \backslash \mathbb{H}$, whose vertices are the Hecke points $T_n(z)$. Hecke eigenfunctions $\varphi_j$ with

$$T_n \varphi_j = \lambda_j(n)\, \varphi_j \tag{39}$$

give rise to eigenfunctions of the adjacency matrix. Exploiting this fact, Bourgain and Lindenstrauss show that for a large set of integers $n$

$$|\varphi_j(z)|^2 \ll \sum_{w \in T_n(z)} |\varphi_j(w)|^2 \tag{40}$$

that is, pointwise values of $|\varphi_j|^2$ cannot be substantially larger than its sum over Hecke points. This, and the observation that Hecke points for a large set of integers $n$ are sufficiently uniformly distributed on $\Gamma \backslash \mathbb{H}$ as $n \to \infty$, yields the estimate of positive entropy with a quantitative lower bound.

Lindenstrauss' proof of the second property, which shows that Lebesgue measure is the only quantum limit of Hecke eigenfunctions, is a result of a currently very active branch of ergodic theory: measure rigidity. Invariance under the geodesic flow alone is not sufficient to rule out other possible limit measures. In fact, there are uncountably many measures with this property. As limits of Hecke eigenfunctions, all quantum limits possess an additional property, namely recurrence under Hecke correspondences. Since the explanation of these is rather involved, let us recall an analogous result in a simpler setup. The map $\times 2 : x \mapsto 2x \bmod 1$ defines a hyperbolic dynamical system on the unit circle with a wealth of invariant measures, similar to the case of the geodesic flow on a surface of negative curvature. Furstenberg conjectured that, up to trivial invariant measures that are localized on finitely many rational points, Lebesgue measure is the only $\times 2$-invariant measure that is also invariant under action of $\times 3 : x \mapsto 3x \bmod 1$. This fundamental problem is still unsolved and one of the central conjectures in measure rigidity. Rudolph, however, showed that Furstenberg's conjecture is true if one restricts the statement to $\times 2$-invariant measures of positive entropy (cf. Lindenstrauss (to appear)). In Lindenstrauss' work, $\times 2$ plays the role of the geodesic flow, and $\times 3$ the role of the Hecke correspondences. Although here it might also be interesting to ask whether an analog of Furstenberg's conjecture holds, it is inessential for the proof of QUE due to the positive entropy of quantum limits discussed in the previous paragraph.

Eigenfunctions and L-Functions

An even eigenfunction $\varphi_j(z)$ for $\Gamma = \mathrm{SL}(2, \mathbb{Z})$ has the Fourier expansion

$$\varphi_j(z) = \sum_{n=1}^{\infty} a_j(n)\, y^{1/2} K_{i\rho_j}(2\pi n y) \cos(2\pi n x) \tag{41}$$

We associate with $\varphi_j(z)$ the Dirichlet series

$$L(s, \varphi_j) = \sum_{n=1}^{\infty} a_j(n)\, n^{-s} \tag{42}$$

which converges for $\operatorname{Re} s$ large enough. These series have an analytic continuation to the entire complex plane $\mathbb{C}$ and satisfy a functional equation,

$$\Lambda(s, \varphi_j) = \Lambda(1 - s, \varphi_j) \tag{43}$$

where

$$\Lambda(s, \varphi_j) = \pi^{-s}\, \Gamma\!\left( \frac{s + i\rho_j}{2} \right) \Gamma\!\left( \frac{s - i\rho_j}{2} \right) L(s, \varphi_j) \tag{44}$$

If $\varphi_j(z)$ is in addition an eigenfunction of all Hecke operators, then the Fourier coefficients in fact coincide (up to a normalization constant) with the eigenvalues of the Hecke operators

$$a_j(m) = \lambda_j(m)\, a_j(1) \tag{45}$$

If we normalize $a_j(1) = 1$, the Hecke relations [37] result in an Euler product formula for the L-function,

$$L(s, \varphi_j) = \prod_{p\ \mathrm{prime}} \left( 1 - a_j(p)\, p^{-s} + p^{-2s} \right)^{-1} \tag{46}$$

These L-functions behave in many other ways like the Riemann zeta or classical Dirichlet L-functions. In particular, they are expected to satisfy a Riemann hypothesis, that is, all nontrivial zeros are constrained to the critical line $\operatorname{Re} s = 1/2$.

Questions on the distribution of Hecke eigenfunctions, such as QUE or value distribution properties, can now be translated to analytic properties of L-functions. We will discuss two examples.

The asymptotics in [6] can be established by proving [6] for the choices $g = \varphi_k$, $k = 1, 2, \ldots$, that is,

$$\int_M |\varphi_j|^2\, \varphi_k\, dA \to 0 \tag{47}$$
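Returning briefly to the Euler product [46]: a short numerical check shows how it comes out of the Hecke relations [37]. For m = p^k and n = p, with the normalization a_j(1) = 1, relation [37] gives a_j(p) a_j(p^k) = a_j(p^{k+1}) + a_j(p^{k-1}), so the generating series of the a_j(p^k) in X = p^{-s} sums to 1/(1 - a_j(p) X + X^2). The sketch below compares a truncated series with this closed form. The coefficient value 1.3, the truncation length, and the function names are hypothetical choices made only for illustration; they are not data for any actual Hecke eigenfunction.

```python
def prime_power_coeffs(a_p, K):
    """Return [a(1), a(p), ..., a(p^K)] for K >= 1.

    Uses a(p^{k+1}) = a(p) a(p^k) - a(p^{k-1}) with a(1) = 1,
    the prime-power case of the Hecke relations.
    """
    coeffs = [1.0, a_p]
    for _ in range(K - 1):
        coeffs.append(a_p * coeffs[-1] - coeffs[-2])
    return coeffs


def local_series(a_p, p, s, K=80):
    """Truncated local Dirichlet series: sum over k <= K of a(p^k) p^{-ks}."""
    x = p ** (-s)
    return sum(c * x ** k for k, c in enumerate(prime_power_coeffs(a_p, K)))


def local_euler_factor(a_p, p, s):
    """Closed form 1 / (1 - a(p) p^{-s} + p^{-2s}) of the local factor."""
    x = p ** (-s)
    return 1.0 / (1.0 - a_p * x + x * x)


if __name__ == "__main__":
    a_p, p, s = 1.3, 2, 2.0   # hypothetical coefficient, prime, real s
    print(local_series(a_p, p, s), local_euler_factor(a_p, p, s))
```

Multiplying such local factors over all primes, together with the multiplicativity of the coefficients for coprime arguments (also a consequence of [37]), gives the full Euler product.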

Watson discovered the remarkable relation (Sarnak 2003)

$$\left| \int_M \varphi_{j_1} \varphi_{j_2} \varphi_{j_3}\, dA \right|^2 = \frac{\pi^4\, \Lambda\!\left( \tfrac{1}{2}, \varphi_{j_1} \times \varphi_{j_2} \times \varphi_{j_3} \right)}{\Lambda(1, \mathrm{sym}^2 \varphi_{j_1})\, \Lambda(1, \mathrm{sym}^2 \varphi_{j_2})\, \Lambda(1, \mathrm{sym}^2 \varphi_{j_3})} \tag{48}$$

The L-functions $\Lambda(s, g)$ in Watson's formula are more advanced cousins of those introduced earlier (see Sarnak (2003) for details). The Riemann hypothesis for such L-functions then implies, via [48], a precise rate of convergence to QUE for the modular surface,

$$\int_M |\varphi_j|^2\, g\, dA = \int_M g\, dA + O\!\left( \lambda_j^{-1/4 + \epsilon} \right) \tag{49}$$

for any $\epsilon > 0$, where the implied constant depends on $\epsilon$ and $g$.

A second example on the connection between statistical properties of the matrix elements $\mu_j(a) = \langle \mathrm{Op}(a) \varphi_j, \varphi_j \rangle$ (for fixed $a$ and random $j$) and values of L-functions has appeared in the work of Luo and Sarnak (cf. Sarnak (2003)). Define the variance

$$V_\lambda(a) = \frac{1}{N(\lambda)} \sum_{\lambda_j \le \lambda} |\mu_j(a) - \mu(a)|^2 \tag{50}$$

with $N(\lambda) = \#\{j : \lambda_j \le \lambda\}$; cf. [3]. Following a conjecture by Feingold–Peres and Eckhardt et al. (see Sarnak (2003) for references) for generic quantum chaotic systems, one expects a central-limit theorem for the statistical fluctuations of the $\mu_j(a)$, where the normalized variance $N(\lambda)^{1/2} V_\lambda(a)$ is asymptotic to the classical autocorrelation function $C(a)$, see eqn [54].

Conjecture 4 For any bounded function $g : \mathbb{R} \to \mathbb{C}$ we have

$$\frac{1}{N(\lambda)} \sum_{\lambda_j \le \lambda} g\!\left( \frac{\mu_j(a) - \mu(a)}{\sqrt{V_\lambda(a)}} \right) \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(t)\, e^{-t^2/2}\, dt \tag{51}$$

as $\lambda \to \infty$.

Luo and Sarnak prove that in the case of the modular surface the variance has the asymptotics

$$\lim_{\lambda \to \infty} N(\lambda)^{1/2}\, V_\lambda(a) = \langle Ba, a \rangle \tag{52}$$

where $B$ is a non-negative self-adjoint operator which commutes with the Laplacian $\Delta$ and all Hecke operators $T_n$. In particular, we have

$$B \varphi_j = \tfrac{1}{2} L\!\left( \tfrac{1}{2}, \varphi_j \right) C(\varphi_j)\, \varphi_j \tag{53}$$

where

$$C(a) := \int_{\mathbb{R}} \int_{\Gamma \backslash \mathrm{PSL}(2, \mathbb{R})} a(\Phi^t(g))\, a(g)\, d\mu(g)\, dt \tag{54}$$

is the classical autocorrelation function for the geodesic flow with respect to the observable $a$ (Sarnak 2003). Up to the arithmetic factor $(1/2) L(1/2, \varphi_j)$, eqn [53] is consistent with the Feingold–Peres prediction for the variance of generic chaotic systems. Furthermore, recent estimates of moments by Rudnick and Soundararajan (2005) indicate that Conjecture 4 is not valid in the case of the modular surface.

Quantum Eigenstates of Cat Maps

Cat maps are probably the simplest area-preserving maps on a compact surface that are highly chaotic. They are defined as linear automorphisms on the torus $\mathbb{T}^2 = \mathbb{R}^2/\mathbb{Z}^2$,

$$\Phi_A : \mathbb{T}^2 \to \mathbb{T}^2 \tag{55}$$

where a point $\xi \in \mathbb{R}^2$ (mod $\mathbb{Z}^2$) is mapped to $A\xi$ (mod $\mathbb{Z}^2$); $A$ is a fixed matrix in $\mathrm{GL}(2, \mathbb{Z})$ with eigenvalues off the unit circle (this guarantees hyperbolicity). We view the torus $\mathbb{T}^2$ as a symplectic manifold, the phase space of the dynamical system. Since $\mathbb{T}^2$ is compact, the Hilbert space of quantum states is an $N$-dimensional vector space $\mathcal{H}_N$, $N$ integer. The semiclassical limit, or limit of small wavelengths, corresponds here to $N \to \infty$.

It is convenient to identify $\mathcal{H}_N$ with $L^2(\mathbb{Z}/N\mathbb{Z})$, with the inner product

$$\langle \psi_1, \psi_2 \rangle = \frac{1}{N} \sum_{Q \bmod N} \psi_1(Q)\, \overline{\psi_2(Q)} \tag{56}$$

For any smooth function $f \in C^\infty(\mathbb{T}^2)$, define a quantum observable

$$\mathrm{Op}_N(f) = \sum_{n \in \mathbb{Z}^2} \hat{f}(n)\, T_N(n)$$

where $\hat{f}(n)$ are the Fourier coefficients of $f$, and $T_N(n)$ are translation operators

$$T_N(n) = e^{\pi i n_1 n_2 / N}\, t_2^{n_2} t_1^{n_1} \tag{57}$$

$$[t_1 \psi](Q) = \psi(Q + 1), \qquad [t_2 \psi](Q) = e^{2\pi i Q/N}\, \psi(Q) \tag{58}$$

The operators $\mathrm{Op}_N(a)$ are the analogs of the pseudodifferential operators discussed in the section "Quantum unique ergodicity".
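The translation operators [57] and [58] are easy to experiment with on a computer. The following minimal sketch represents a state in L^2(Z/NZ) as a Python list of N complex numbers and checks two consequences of the definitions: the commutation relation t1 t2 = e^{2 pi i/N} t2 t1, and the composition rule T_N(m) T_N(n) = e^{pi i (m1 n2 - m2 n1)/N} T_N(m + n). Both relations follow by direct calculation from [57] and [58] but are not stated in the text above; the function names, the restriction to non-negative n1 and n2, and the test vector are illustrative assumptions.

```python
import cmath


def t1(psi):
    """[t1 psi](Q) = psi(Q + 1), with Q taken mod N."""
    N = len(psi)
    return [psi[(Q + 1) % N] for Q in range(N)]


def t2(psi):
    """[t2 psi](Q) = exp(2 pi i Q / N) psi(Q)."""
    N = len(psi)
    return [cmath.exp(2j * cmath.pi * Q / N) * psi[Q] for Q in range(N)]


def T(n, psi):
    """T_N(n) psi = exp(pi i n1 n2 / N) t2^{n2} t1^{n1} psi, for n1, n2 >= 0."""
    n1, n2 = n
    N = len(psi)
    out = list(psi)
    for _ in range(n1):          # t1^{n1} acts first
        out = t1(out)
    for _ in range(n2):          # then t2^{n2}
        out = t2(out)
    phase = cmath.exp(1j * cmath.pi * n1 * n2 / N)
    return [phase * v for v in out]


def composition_phase(m, n, N):
    """Phase exp(pi i (m1 n2 - m2 n1) / N) in T(m) T(n) = phase * T(m + n)."""
    return cmath.exp(1j * cmath.pi * (m[0] * n[1] - m[1] * n[0]) / N)


def max_diff(u, v):
    """Largest componentwise distance between two states."""
    return max(abs(x - y) for x, y in zip(u, v))


if __name__ == "__main__":
    psi = [1.0, 2j, -1.0, 0.5, 3.0]   # arbitrary test state, N = 5
    m, n = (1, 2), (3, 1)
    lhs = T(m, T(n, psi))
    rhs = [composition_phase(m, n, len(psi)) * v
           for v in T((m[0] + n[0], m[1] + n[1]), psi)]
    print("composition rule error:", max_diff(lhs, rhs))
```

Working with plain lists keeps the sketch free of external dependencies; for large N a matrix or FFT-based representation would be the more practical choice.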

A quantization of $\Phi_A$ is a unitary operator $U_N(A)$ on $L^2(\mathbb{Z}/N\mathbb{Z})$ satisfying the equation

$$U_N(A)^{-1}\, \mathrm{Op}_N(f)\, U_N(A) = \mathrm{Op}_N(f \circ \Phi_A) \tag{59}$$

for all $f \in C^\infty(\mathbb{T}^2)$. There are explicit formulas for $U_N(A)$ when $A$ is in the group

$$\Gamma = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \mathrm{SL}(2, \mathbb{Z}) : ab \equiv cd \equiv 0 \bmod 2 \right\} \tag{60}$$

These may be viewed as analogs of the Shale–Weil or metaplectic representation for SL(2). For example, the quantization of

$$A = \begin{pmatrix} 2 & 1 \\ 3 & 2 \end{pmatrix} \tag{61}$$

yields

$$U_N(A) \psi(Q) = N^{-1/2} \sum_{Q' \bmod N} \exp\!\left[ \frac{2\pi i}{N} \left( Q^2 - Q Q' + Q'^2 \right) \right] \psi(Q') \tag{62}$$

In analogy with [1], we are interested in the statistical features of the eigenvalues and eigenfunctions of $U_N(A)$, that is, the solutions to

$$U_N(A) \varphi = \lambda \varphi, \qquad \|\varphi\|_{L^2(\mathbb{Z}/N\mathbb{Z})} = 1 \tag{63}$$

Unlike typical quantum-chaotic maps, the statistics of the $N$ eigenvalues

$$\lambda_{N1}, \lambda_{N2}, \ldots, \lambda_{NN} \in S^1 \tag{64}$$

do not follow the distributions of unitary random matrices in the limit $N \to \infty$, but are rather singular (Keating 1991). In analogy with the Selberg trace formula for hyperbolic surfaces [18], there is an exact trace formula relating sums over eigenvalues of $U_N(A)$ with sums over fixed points of the classical map (Keating 1991).

As in the case of arithmetic surfaces, the eigenfunctions of cat maps appear to behave more generically. The analog of the Schnirelman–Zelditch–Colin de Verdière theorem states that, for any orthonormal basis of eigenfunctions $\{\varphi_{Nj}\}_{j=1}^{N}$ we have, for all $f \in C^\infty(\mathbb{T}^2)$,

$$\langle \mathrm{Op}_N(f) \varphi_{Nj}, \varphi_{Nj} \rangle \to \int_{\mathbb{T}^2} f(\xi)\, d\xi \tag{65}$$

as $N \to \infty$, for all $j$ in an index set $J_N$ of full density, that is, $\# J_N \sim N$. Kurlberg and Rudnick (see Rudnick (2001)) have characterized special bases of eigenfunctions $\{\varphi_{Nj}\}_{j=1}^{N}$ (termed Hecke eigenbases, in analogy with arithmetic surfaces) for which QUE holds, generalizing earlier work of Degli Esposti, Graffi, and Isola (1995). That is, [65] holds for all $j = 1, \ldots, N$. Rudnick and Kurlberg, and more recently Gurevich and Hadani, have established results on the rate of convergence analogous to [49]. These results are unconditional. Gurevich and Hadani use methods from algebraic geometry based on those developed by Deligne in his proof of the Weil conjectures (an analog of the Riemann hypothesis for finite fields).

In the case of quantum cat maps, there are values of $N$ for which the number of coinciding eigenvalues can be large, a major difference to what is expected for the modular surface. Linear combinations of eigenstates with the same eigenvalue are as well eigenstates, and may lead to different quantum limits. Indeed, Faure, Nonnenmacher, and De Bièvre (see De Bièvre (to appear)) have shown that there are subsequences of values of $N$, so that, for all $f \in C^\infty(\mathbb{T}^2)$,

$$\langle \mathrm{Op}_N(f) \varphi_{Nj}, \varphi_{Nj} \rangle \to \frac{1}{2} \int_{\mathbb{T}^2} f(\xi)\, d\xi + \frac{1}{2} f(0) \tag{66}$$

that is, half of the mass of the quantum limit localizes on the hyperbolic fixed point of the map. This is the first, and to date the only, rigorous result concerning the existence of scarred eigenfunctions in systems with chaotic classical limit.

Acknowledgment

The author is supported by an EPSRC Advanced Research Fellowship.

See also: Quantum Ergodicity and Mixing of Eigenfunctions; Random Matrix Theory in Physics.

Further Reading

Aurich R and Steiner F (1993) Statistical properties of highly excited quantum eigenstates of a strongly chaotic system. Physica D 64(1–3): 185–214.
Berry MV and Keating JP (1999) The Riemann zeros and eigenvalue asymptotics. SIAM Review 41(2): 236–266.
Bogomolny EB (2006) Quantum and arithmetical chaos. In: Cartier PE, Julia B, Moussa P, and Vanhove P (eds.) Frontiers in Number Theory, Physics and Geometry on Random Matrices, Zeta Functions, and Dynamical Systems, Springer Lecture Notes. Les Houches.
Bogomolny EB, Georgeot B, Giannoni M-J, and Schmit C (1997) Arithmetical chaos. Physics Reports 291(5–6): 219–324.
De Bièvre S Recent Results on Quantum Map Eigenstates, Proceedings of QMATH9, Giens 2004 (to appear).
Hejhal DA and Rackner BN (1992) On the topography of Maass waveforms for PSL(2, Z). Experiment. Math. 1(4): 275–305.
Keating JP (1991) The cat maps: quantum mechanics and classical motion. Nonlinearity 4(2): 309–341.

Lindenstrauss E Rigidity of multi-parameter actions. Israel Journal of Mathematics (Furstenberg Special Volume) (to appear).
Marklof J (2006) Energy level statistics, lattice point problems and almost modular functions. In: Cartier PE, Julia B, Moussa P, and Vanhove P (eds.) Frontiers in Number Theory, Physics and Geometry on Random Matrices, Zeta Functions, and Dynamical Systems, Springer Lecture Notes. Les Houches.
Rudnick Z (2001) On quantum unique ergodicity for linear maps of the torus. In: European Congress of Mathematics, (Barcelona, 2000), Progr. Math., vol. 202, pp. 429–437. Basel: Birkhäuser.
Rudnick Z (2005) A central limit theorem for the spectrum of the modular group, Park city lectures. Annales Henri Poincaré 6: 863–883.
Sarnak P Arithmetic quantum chaos. The Schur lectures (1992) (Tel Aviv), Israel Math. Conf. Proc., 8, pp. 183–236. Bar-Ilan Univ., Ramat Gan, 1995.
Sarnak P (2003) Spectra of hyperbolic surfaces. Bulletin of the American Mathematical Society (N.S.) 40(4): 441–478.

Asymptotic Structure and Conformal Infinity


J Frauendiener, Universität Tübingen, Tübingen, Germany
© 2006 Elsevier Ltd. All rights reserved.

Introduction

A major motivation for studying the asymptotic structure of spacetimes has been the need for a rigorous description of what should be understood by an "isolated system" in Einstein's theory of gravity. As an example, consider a gravitating system somewhere in our universe (e.g., a galaxy, a cluster of galaxies, a binary system, or a star) evolving according to its own gravitational interaction, and possibly reacting to gravitational radiation impinging on it from the outside. Thereby it will emit gravitational radiation. We are interested in describing these waves because they provide us with important information about the physics governing the system.

To adequately describe this situation, we need to idealize the real situation in an appropriate way, since it is hopeless to try to analyze the behavior of the system in its interaction with the rest of the universe. We are mainly interested in the behavior of the system, and not so much in other processes taking place at large distances from the system. Since we would like to ignore those regions, we need a way to isolate the system from their influence.

The notion of an isolated system allows us to select individual subsystems of the universe and describe their properties regardless of the rest of the universe so that we can assign to each subsystem such physical attributes as its energy–momentum, angular momentum, or its emitted radiation field. Without this notion, we would always have to take into account the interaction of the system with its environment in full detail.

In general relativity (GR) it turns out to be a rather difficult task to describe an isolated system, and the reason is, as always in Einstein's theory, the fact that the metric acts both as the physical field and as the background. In other theories, like electrodynamics, the physical field, such as the Maxwell field, is very different from the background field, the flat metric of Minkowski space. The fact that the metric in GR plays a dual role makes it difficult to extract physical meaning from the metric because there is no nondynamical reference point.

Imagine a system alone in the universe. As we recede from the system we would expect its influence to decrease. So we expect that the spacetime which models this situation mathematically will resemble the flat Minkowski spacetime and it will approximate it even better the farther away we go. This implies that one needs to impose fall-off conditions for the curvature and that the manifold will be asymptotically flat in an appropriate sense. However, there is the problem that fall-off conditions necessarily imply the use of coordinates and it is awkward to decide which coordinates should be "good ones". Thus, it is not clear whether the notion of an asymptotically flat spacetime is an invariant concept.

What is needed, therefore, is an invariant definition of asymptotically flat spacetimes. The key observation in this context is that "infinity" is far away with respect to the spacetime metric. This means that geodesics heading away from the system should be able to run forever, that is, be defined for arbitrary values of their affine parameter $s$. Infinity will be reached for $s \to \infty$. However, suppose we do not use the spacetime metric $g$ but a metric $\hat g$ which is scaled down with respect to $g$, that is, in such a way that $\hat g = \Omega^2 g$ for some function $\Omega$. Then it might be possible to arrange $\Omega$ in such a way that geodesics for the metric $\hat g$ cover the same events (strictly speaking, this holds only for null geodesics, but this is irrelevant for the present plausibility argument) as those for the metric $g$ yet that their affine parameter $\hat s$ (which is also scaled down with respect to $s$) approaches a finite value $\hat s_0$ for $s \to \infty$. Then we could attach a boundary to the spacetime manifold consisting of all the limit points corresponding to the events with $\hat s = \hat s_0$ on the $\hat g$-geodesics.
This boundary would have to be interpreted as infinity for the spacetime because it takes infinitely long for the $g$-geodesics to get there.

We arrived at this idea of attaching a boundary by considering the metric structure only up to arbitrary scaling, that is, by looking at metrics which differ only by a factor. This is the conformal structure of the spacetime manifold in question. By considering the spacetime only from the point of view of its conformal structure we obtain a picture of the spacetime which is essentially finite but which leaves its causal properties unchanged, and hence in particular the properties of wave propagation. This is exactly what is needed for a rigorous treatment of radiation emitted by the system.

Infinity for Minkowski Spacetime

The above discussion suggests that we should consider the spacetime metric only up to scale, that is, to focus on the conformal structure of the spacetime in question. Since we are interested in systems which approach Minkowski spacetime at large distances from the source, it is illuminating to study Minkowski spacetime as a preliminary example. So consider the manifold $\mathbb{M} = \mathbb{R}^4$ equipped with the flat metric

$$g = dt^2 - dr^2 - r^2\,d\sigma^2 \tag{1}$$

where $r$ is the standard radial coordinate defined by $r^2 = x^2 + y^2 + z^2$ and

$$d\sigma^2 = d\theta^2 + \sin^2\theta\,d\phi^2$$

is the standard metric on the unit sphere $S^2$. We now introduce retarded and advanced time coordinates, which are adapted to the null cone and hence to the conformal structure of $g$, by the definition

$$u = t - r, \qquad v = t + r$$

and obtain the metric in the form

$$g = du\,dv - \tfrac{1}{4}(v - u)^2\,d\sigma^2$$

The coordinates $u$ and $v$ both take arbitrary real values but they are restricted by the relation $v - u = 2r \ge 0$. In order to see what happens "at infinity", we introduce the coordinates $U$ and $V$ by the relations

$$u = \tan U, \qquad v = \tan V$$

Then $U$ and $V$ both take values in the open interval $(-\pi/2, \pi/2)$ with $V - U \ge 0$ and the metric is transformed to

$$g = \frac{1}{4\cos^2 U \cos^2 V}\left[4\,dU\,dV - \sin^2(V - U)\,d\sigma^2\right] \tag{2}$$

Clearly, the metric is undefined at events with $\cos U = 0$ or $\cos V = 0$. These would correspond to events with $u = \pm\infty$ or $v = \pm\infty$ which do not lie in $\mathbb{M}$. However, by defining the function

$$\Omega = 2\cos U \cos V$$

we find that the metric $\hat g = \Omega^2 g$ with

$$\hat g = 4\,dU\,dV - \sin^2(V - U)\,d\sigma^2 \tag{3}$$

is conformally equivalent to $g$ and is regular for all values of $U$ and $V$ (keeping $V \ge U$). In fact, by defining the coordinates

$$T = V + U, \qquad R = V - U$$

this metric takes the form

$$\hat g = dT^2 - dR^2 - \sin^2 R\,d\sigma^2 \tag{4}$$

the metric of the static Einstein universe $\mathcal{E}$. Thus, we may regard the Minkowski spacetime as the part of the Einstein cylinder defined by restricting the coordinates $T$ and $R$ to the region $|T| + R < \pi$ as illustrated in Figure 1.

Figure 1 The embedding of Minkowski spacetime into the Einstein cylinder. (Labels in the figure: $i^+$, $\mathscr{I}^+$, $i^0$, $\mathscr{I}^-$, $i^-$.)

Although $\mathbb{M}$ can be considered as being diffeomorphic to the shaded part in Figure 1, these two manifolds are not isometric. This is obvious from considering the properties of the events lying on
the boundary $\partial\mathbb{M}$ of $\mathbb{M}$ in $\mathcal{E}$. Fix a point $P$ inside $\mathbb{M}$ and follow a null geodesic with respect to the metric $\hat g$ from $P$ toward the future. It will intersect $\partial\mathbb{M}$ after a finite amount of its affine parameter has elapsed. When we follow a null geodesic with respect to $g$ from $P$ in the same direction, we find that it does not reach $\partial\mathbb{M}$ for any value of its affine parameter. Thus, the boundary is at infinity for the metric $g$ but at a finite location with respect to the metric $\hat g$. When we consider all possible kinds of geodesics for the metric $g$ we find that $\partial\mathbb{M}$ consists of five qualitatively different pieces. The future-pointing timelike geodesics all approach the point $i^+$ given by $(T, R) = (\pi, 0)$, while the past-pointing geodesics approach $i^-$ with coordinates $(-\pi, 0)$. All spacelike geodesics come arbitrarily close to a point $i^0$ with coordinates $(0, \pi)$ (located on the front of the cylinder in Figure 1). Null geodesics, however, are different. For any point $(T, \pi - |T|)$ with $T \ne 0, \pm\pi$ on $\partial\mathbb{M}$ there are $g$-null-geodesics which come arbitrarily close.

In this sense, we may regard $\partial\mathbb{M}$ as consisting of limit points obtained by tracing $g$-geodesics for infinite values of their affine parameters. According to the causal character of the geodesics the set of their respective limit points is called future/past timelike infinity $i^\pm$, spacelike infinity $i^0$ or future/past null-infinity, denoted by $\mathscr{I}^\pm$. These two parts of null-infinity are three-dimensional regular submanifolds of the embedding manifold $\mathcal{E}$, while the points $i^\pm$, $i^0$ are regular points in $\mathcal{E}$ in the sense that the metric $\hat g$ is regular there. This is not automatic, considering the fact that infinitely many geodesics converge to a single point. However, the flatness of Minkowski spacetime guarantees that the geodesics approach at just the appropriate rate for the limit points to be regular.

This example shows that the structure of the boundary is determined entirely by the metric $g$ of Minkowski spacetime. If we had chosen a different function $\Omega' = \omega\Omega$ with $\omega > 0$ then we would not have obtained the Einstein cylinder but some different Lorentzian manifold $(\mathcal{M}', g')$. Yet, the boundary of $\mathbb{M}$ in $\mathcal{M}'$ would have had the same properties.

Asymptotically Flat Spacetimes

The physical idea of an isolated system is captured mathematically by an asymptotically flat spacetime. Since such a spacetime $\mathcal{M}$ is expected to approach Minkowski spacetime asymptotically, the asymptotic structure of $\mathcal{M}$ is also expected to be similar to that of $\mathbb{M}$. This expectation is expressed in the following definition.

Definition 1 A spacetime $(\mathcal{M}, g_{ab})$ is called asymptotically simple if there exists a manifold-with-boundary $\widehat{\mathcal{M}}$ with metric $\hat g_{ab}$ and scalar field $\Omega$ on $\widehat{\mathcal{M}}$ and boundary $\mathscr{I} = \partial\widehat{\mathcal{M}}$ such that the following conditions hold:
1. $\mathcal{M}$ is the interior of $\widehat{\mathcal{M}}$: $\mathcal{M} = \operatorname{int}\widehat{\mathcal{M}}$;
2. $\hat g_{ab} = \Omega^2 g_{ab}$ on $\mathcal{M}$;
3. $\Omega$ and $\hat g_{ab}$ are smooth on all of $\widehat{\mathcal{M}}$;
4. $\Omega > 0$ on $\mathcal{M}$; $\Omega = 0$, $\nabla_a\Omega \ne 0$ on $\mathscr{I}$; and
5. each null geodesic acquires both future and past endpoints on $\mathscr{I}$.

This definition formalizes the construction which was explicitly performed above, by which one attaches a regular (nonempty) boundary to a spacetime after suitably rescaling its metric. Asymptotically simple spacetimes are exactly those for which this process of conformal compactification is possible. The purpose of condition 5 is to exclude pathological cases. There are spacetimes which do not satisfy this condition (e.g., the Schwarzschild spacetime, where some of the null geodesics enter the event horizon and cannot escape to infinity). Yet, one would like to include them as being asymptotically simple in a sense, because they clearly describe isolated systems. For these cases, there exists the notion of weakly asymptotically simple spacetimes.

In order to arrive at asymptotically flat spacetimes, one needs to make certain assumptions about the behavior of the curvature near the boundary, thus:

Definition 2 An asymptotically simple spacetime is called asymptotically flat if its Ricci tensor $\mathrm{Ric}[g]$ vanishes in a neighborhood of $\mathscr{I}$.

Note that this definition imposes a rather strong restriction on the Ricci curvature; less restrictive assumptions are possible. This condition applies only near $\mathscr{I}$. Thus, it is possible to consider spacetimes which contain matter fields as long as these fields do not extend to infinity.

Other asymptotically simple spacetimes which are not asymptotically flat are the de Sitter and anti-de Sitter spacetimes, which are solutions of the Einstein equations with nonvanishing cosmological constant $\lambda$.

It is a simple consequence of the definition that the boundary $\mathscr{I}$ is a regular three-dimensional hypersurface of the embedding spacetime $\widehat{\mathcal{M}}$ which is timelike, spacelike, or null depending on the sign of $\lambda$. In particular, for the Minkowski spacetime ($\lambda = 0$) the boundary is necessarily a null hypersurface, as noted above.

The requirement that the vacuum Einstein equations hold near $\mathscr{I}$ has several important
consequences. First, $\mathscr{I}$ is a null hypersurface with the special property of being shear-free. This means that any cross section of a bundle of its null generators does not suffer any distortions when moved along the generators. Only expansion or contraction can occur. The global structure of $\mathscr{I}$ is the same as the one from the example above. Null infinity consists of two connected components, $\mathscr{I}^\pm$, each of which is diffeomorphic to $S^2 \times \mathbb{R}$. Thus, topologically, $\mathscr{I}^\pm$ are cylinders. The cone-like appearance as seen in Figure 1 is artificial. It depends on the particular conformal factor $\Omega$ chosen for the conformal compactification. Furthermore, it is only in very exceptional cases that the metric $\hat g$ is regular at $i^0$ or $i^\pm$.

The most important consequence, however, concerns the conformal Weyl tensor $C^a{}_{bcd}$. This is the part of the full Riemann curvature tensor $R^a{}_{bcd}$ which is trace-free. It is invariant under conformal rescalings of the metric. Thus, on $\mathcal{M}$, $C^a{}_{bcd} = \hat C^a{}_{bcd}$. When the vanishing of the Ricci tensor near $\mathscr{I}$ is assumed then it turns out that the Weyl tensor necessarily vanishes on $\mathscr{I}$. This is the ultimate justification for calling such manifolds asymptotically flat because the entire curvature vanishes on $\mathscr{I}$.

Some Consequences

There are several consequences of the existence of the conformal boundary $\mathscr{I}$. They all can be traced back to the fact that this boundary can be used to separate the geometric fields into a universal background field and dynamical fields which propagate on it. The background is given by the boundary points attached to an asymptotically flat spacetime which always form a three-dimensional null hypersurface $\mathscr{I}$ with two connected components (in the sequel, we restrict our attention to $\mathscr{I}^+$ only; $\mathscr{I}^-$ is treated similarly), each with the topology of a cylinder. And in each case, $\mathscr{I}$ is shear-free.

The BMS Group

Since the structure of null-infinity is universal over all asymptotically flat spacetimes, it is obvious that its symmetry group should also possess a universal meaning. This group, the so-called Bondi–Metzner–Sachs (BMS) group, is in many respects similar to the Poincaré group, the symmetry group of $\mathbb{M}$. It is the semidirect product of the Lorentz group with an abelian group which, however, is not the four-dimensional translation group but an infinite-dimensional group of supertranslations. This group is a normal subgroup, so the factor group is isomorphic to the Lorentz group.

In physical terms, the supertranslations arise because there are infinitely many directions from which observers at infinity (whose world lines coincide with the null generators of $\mathscr{I}$ in a certain limit) can observe the system and because each observer is free to choose its own origin of proper time $u$. The observers surrounding the system are not synchronized, because under the assumptions made there is no natural way to fix a unique common origin. Hence, a supertranslation is a shift of the parameter along each null generator of $\mathscr{I}$ corresponding to a change of origin for each individual observer. It can be given as a map $S^2 \to \mathbb{R}$. A choice of origin on each null generator of $\mathscr{I}$ is referred to as a cut of $\mathscr{I}$. It is a two-dimensional surface of spherical topology which intersects each null generator exactly once. It is an open question whether one can always synchronize the observers by imposing canonical conditions at $i^0$ or $i^\pm$, thereby reducing the BMS group to the smaller Poincaré group.

The supertranslations contain a unique four-dimensional normal subgroup. In $\mathbb{M}$ these special supertranslations are the ones which are induced by the translations of Minkowski spacetime in the following way. Take the future light cone of some event $P$ and follow it out to $\mathscr{I}$, where its intersection defines an origin for each observer located there. Now consider the light cone of another event $Q$ obtained from $P$ by a translation in a spatial direction. Then the light emitted from $Q$ will arrive at $\mathscr{I}$ earlier than that from $P$ for observers in the direction of the translation, while it will be delayed for observers in the opposite direction. This change in arrival time defines a specific supertranslation. Similarly, for a translation in a temporal direction, the light from $Q$ will arrive later than that from $P$ for all observers. Thus, every translation in $\mathbb{M}$ defines a particular supertranslation on $\mathscr{I}$. These can be characterized in a different way, which is intrinsic to $\mathscr{I}$ and which can be used in the general case even though there will be no Killing vectors present in a general asymptotically flat spacetime. In an appropriate coordinate system, the asymptotic translations are given as linear combinations of the first four spherical harmonics $Y_{00}$, $Y_{10}$, $Y_{1\pm1}$. The space of asymptotic translations $\mathcal{T}$ is in a natural way isometric to $\mathbb{M}$.

The Peeling Property

Now consider the Weyl tensor $C^a{}_{bcd}$ on $\widehat{\mathcal{M}}$. Since it vanishes on $\mathscr{I}$ where $\Omega = 0$ we may form the quotient

$$K^a{}_{bcd} = \Omega^{-1} C^a{}_{bcd}$$
which can be shown to be smooth on $\mathscr{I}$. The physical interpretation of this tensor field is based on the following properties. In source-free regions the field satisfies the spin-2 zero-rest-mass equation

$$\hat\nabla_a K^a{}_{bcd} = 0$$

which is very similar to the Maxwell equations for the electromagnetic (spin-1) Faraday tensor. Thus, $K^a{}_{bcd}$ is interpreted as the gravitational field, which describes the gravitational waves contained inside the system. The zero-rest-mass equation for $K^a{}_{bcd}$ and the fact that the field is smooth on $\mathscr{I}$ imply that the Weyl tensor satisfies the peeling property. This is a characteristic conspiracy between the fall-off behavior of certain components of the Weyl tensor along outgoing $g$-null-geodesics approaching $\mathscr{I}$ in $\mathcal{M}$ with respect to an affine parameter $s$ for $s \to \infty$ and their algebraic type. Symbolically, the Weyl tensor has the following behavior as $s \to \infty$ along the null geodesic:

$$C = \frac{[4]}{s} + \frac{[31]}{s^2} + \frac{[211]}{s^3} + \frac{[1111]}{s^4} + O(s^{-5}) \tag{5}$$

where the numerator of each component indicates its Petrov type. The repeated principal null direction (PND) in the first three components and one of the PNDs in the fourth component are aligned with the tangent vector of the geodesic. This implies that the farthest reaching component of the Weyl tensor, which is $O(1/s)$, has the Petrov type of a radiation field. It is customary to combine the components which are $O(1/s^i)$ into one complex function and denote it by $\psi_{5-i}$. When expressed in terms of the field $K^a{}_{bcd}$ on $\widehat{\mathcal{M}}$, this fall-off behavior implies that of all components of $K^a{}_{bcd}$ only $\psi_4$ does not necessarily vanish on $\mathscr{I}$.

In special cases like the Minkowski, Schwarzschild, Kerr, and more generally in all asymptotically flat stationary spacetimes, even $\psi_4$ vanishes on $\mathscr{I}$. For these reasons, $\psi_4$ is called the radiation field of the system, that is, that part of the gravitational field which can be registered by the observers at infinity. It describes the outgoing radiation which is being emitted by the system during its evolution.

The Bondi–Sachs Mass-Loss Formula

Gravitational waves carry away energy from the system. This is a consequence of the Bondi–Sachs mass-loss formula. The Bondi–Sachs energy–momentum is related to a weighted integral over a cut $C$,

$$P_C[W] = -\frac{1}{4\pi G}\int_C W\left[\psi_2 + \sigma\dot{\bar\sigma}\right] d^2S \tag{6}$$

The quantity in brackets, the mass aspect, is a combination of the scalar $\psi_2$, which in a sense measures the strength of the Coulomb-like part of the gravitational field on $\mathscr{I}$, and the complex quantity $\sigma$. In a so-called Bondi coordinate system, this quantity is related to the radiation field $\psi_4$ by the relation

$$\ddot{\bar\sigma} = -\psi_4$$

the dot indicating differentiation with respect to the affine parameter along the null generators. Thus, $\sigma$ is essentially the second time integral of the radiation field. The mass aspect is integrated against a function $W$ which is an asymptotic translation, that is, a linear combination of the first four spherical harmonics. Thus, one can view the expression [6] as defining a linear map $\mathcal{T} \to \mathbb{R}$. Since $\mathcal{T}$ and $\mathbb{M}$ are isometric this defines a covector $P_a$ on $\mathbb{M}$, which can always be shown to be timelike, $P_a P^a \ge 0$. This positivity property together with the fact that in the special cases of Schwarzschild and Kerr spacetimes the integral yields the mass parameters when evaluated for a time translation ($W = 1$) motivates the interpretation of $P_C$ as the energy–momentum 4-vector of the spacetime at the instant defined by the cut $C$. In particular, for $W = 1$ the integral gives the time component of $P_C$, the Bondi–Sachs energy $E$.

The interpretation of [6] as energy–momentum is strengthened by the fact that $P_C$ arises as "dual" to the translations, which is familiar from Lagrangian field theories where energy and momentum appear as generators for time and space translations. In fact, one can set up a Hamiltonian framework where the role of the Bondi–Sachs energy–momentum as generator of asymptotic translations is made explicit.

This point of view suggests that one should also be able to define a notion of angular momentum for asymptotically flat spacetimes because angular momentum arises as the generator of rotations, which can also be defined asymptotically. However, while there is a unique notion of translation on $\mathscr{I}$, this is not the case for rotations (and boosts). The reason is hidden in the structure of the BMS group where the Lorentz group appears naturally as a factor group but not as a unique subgroup. In physical terms, the angular momentum depends on an origin but there is no natural way to choose an origin on $\mathscr{I}$. This ambiguity in the choice of origin leads to several nonequivalent expressions for angular momentum in the literature.

Consider now two cuts $C$ and $C'$, with $C'$ later than $C$. Then we may compute the difference $\Delta E = E - E'$ of the Bondi–Sachs energies with respect to the two
cuts. It turns out that this difference can be expressed as an integral over the (three-dimensional) piece $\Sigma$ of $\mathscr{I}$ which is bounded by the two cuts (i.e., $\partial\Sigma = C' - C$):

$$E - E' = \frac{1}{4\pi G}\int_\Sigma \dot\sigma\,\dot{\bar\sigma}\; d^3V \tag{7}$$

This result means that the Bondi–Sachs energy of the system decreases, since $E' < E$, and the rate of decrease is given by the (positive-definite) amount of gravitational radiation which leaves the system during the period defined by the two cuts.

It is necessary to point out that in this article the structure of null infinity has been postulated based on physical reasoning. The Einstein equations have been used only in a very weak sense, namely only in a neighborhood of $\mathscr{I}$. It is an entirely different question whether the field equations are compatible with this postulated structure. To answer it, one needs to show that there are global solutions of the Einstein equations which exhibit the postulated behavior in the asymptotic region. This question has been settled recently in the affirmative: there are many global spacetimes which are asymptotically flat in the sense described here.

This article has discussed the notion of null infinity, that is, of spacetimes which are asymptotically flat in lightlike directions. Spacetimes which are asymptotically flat in spacelike directions have not been covered. The latter is a notion which has been developed largely independently of null infinity since it is essentially a property of an initial data set and not of the entire four-dimensional spacetime. Ultimately, these two notions should coincide, in the sense that if one has an initial data set which is asymptotically flat in spatial directions in an appropriate sense then its Cauchy development will be an asymptotically flat spacetime. However, as of yet, it is not clear what the appropriate conditions should be because the structure of the gravitational field in the neighborhood of spacelike infinity $i^0$ is not sufficiently well understood so far.

See also: Black Hole Mechanics; Boundaries for Spacetimes; Canonical General Relativity; Einstein Equations: Exact Solutions; Einstein Equations: Initial Value Formulation; General Relativity: Overview; Gravitational Waves; Quantum Entropy; Spacetime Topology, Causal Structure and Singularities; Stability of Minkowski Space; Stationary Black Holes.

Further Reading

Ashtekar A (1987) Asymptotic Quantization. Naples: Bibliopolis.
Bondi H, van der Burg MGJ, and Metzner AWK (1962) Gravitational waves in general relativity VII. Waves from axi-symmetric isolated systems. Proceedings of the Royal Society of London, Series A 269: 21–52.
Frauendiener J (2004) Conformal infinity. Living Reviews in Relativity, vol. 3. http://relativity.livingreviews.org/Articles/lrr-2004-1/index.html.
Friedrich H (1992) Asymptotic structure of space-time. In: Janis AI and Porter JR (eds.) Recent Advances in General Relativity. Boston: Birkhäuser.
Friedrich H (1998a) Einstein's equation and conformal structure. In: Huggett SA, Mason LJ, Tod KP, Tsou SS, and Woodhouse NMJ (eds.) The Geometric Universe: Science, Geometry and the Work of Roger Penrose. Oxford: Oxford University Press.
Friedrich H (1998b) Gravitational fields near space-like and null infinity. Journal of Geometry and Physics 24: 83–163.
Geroch R (1977) Asymptotic structure of space-time. In: Esposito FP and Witten L (eds.) Asymptotic Structure of Space-Time. New York: Plenum.
Hawking S and Ellis GFR (1973) The Large Scale Structure of Space-Time. Cambridge: Cambridge University Press.
Penrose R (1965) Zero rest-mass fields including gravitation: asymptotic behaviour. Proceedings of the Royal Society of London, Series A 284: 159–203.
Penrose R (1968) Structure of space-time. In: DeWitt CM and Wheeler JA (eds.) Battelle Rencontres. New York: W. A. Benjamin.
Penrose R and Rindler W (1984, 1986) Spinors and Space-Time. Cambridge: Cambridge University Press.
Sachs RK (1962) Gravitational waves in general relativity VIII. Waves in asymptotically flat space-time. Proceedings of the Royal Society of London, Series A 270: 103–127.
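The conformal compactification of Minkowski spacetime described in this article (the coordinates $u = t - r$, $v = t + r$, $u = \tan U$, $v = \tan V$, $T = V + U$, $R = V - U$ and the factor $\Omega = 2\cos U\cos V$) can be checked numerically. The following is a minimal sketch under those definitions; the function name `compactify` and the sample points are illustrative choices, not part of the article. It checks that every event lands in the region $|T| + R < \pi$ and that $\Omega r = \sin R$, which is the statement that $\Omega^2 r^2\,d\sigma^2$ becomes the angular part $\sin^2 R\,d\sigma^2$ of the Einstein-universe metric.

```python
import math


def compactify(t, r):
    """Map Minkowski coordinates (t, r), r >= 0, to (T, R, Omega)."""
    u, v = t - r, t + r                   # retarded and advanced time
    U, V = math.atan(u), math.atan(v)     # both lie in (-pi/2, pi/2)
    T, R = V + U, V - U                   # Einstein-cylinder coordinates
    omega = 2.0 * math.cos(U) * math.cos(V)
    return T, R, omega


# Sample events: the origin, generic events, and events far out in
# spacelike (large r), timelike (large t) and null (t = r) directions.
samples = [(0.0, 0.0), (5.0, 2.0), (0.0, 1e6), (1e6, 0.0), (1e6, 1e6)]
for t, r in samples:
    T, R, omega = compactify(t, r)
    inside = abs(T) + R < math.pi
    print(f"t={t:g} r={r:g} -> T={T:.6f} R={R:.6f} inside={inside} "
          f"Omega*r={omega * r:.6f} sin(R)={math.sin(R):.6f}")
```

For large $r$ at fixed $t$ the image approaches $(T, R) = (0, \pi)$, and for large $t$ at $r = 0$ it approaches $(\pi, 0)$, matching the locations of $i^0$ and $i^+$ given in the text.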

Averaging Methods
A I Neishtadt, Russian Academy of Sciences, Moscow, Russia
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Averaging methods are the methods of perturbation theory that are based on the averaging principle and the idea of dividing the dynamics into slow drift and fast oscillations. The most common field of applications of averaging methods is the analysis of the behavior of dynamical systems that differ from integrable systems by small perturbations.

Averaging Principle

Equations of motion of a system that differ from an integrable system by small perturbations often can be written in the form
$$\dot I = \varepsilon g(I, \varphi, \varepsilon), \qquad \dot\varphi = \omega(I) + \varepsilon f(I, \varphi, \varepsilon)$$
$$I = (I_1, \ldots, I_n) \in \mathbb{R}^n \tag{1}$$
$$\varphi = (\varphi_1, \ldots, \varphi_m) \in T^m \ (\mathrm{mod}\ 2\pi), \qquad 0 < \varepsilon \ll 1$$

The small parameter $\varepsilon$ characterizes the amplitude of the perturbation. For $\varepsilon = 0$ one gets the unperturbed system. The equation $I = \text{const.}$ singles out an invariant $m$-dimensional torus of the unperturbed system. The motion on this torus is quasiperiodic with frequency vector $\omega(I)$; components of vector $I$ are called "slow variables" whereas components of vector $\varphi$ are called "fast variables" or "phases". The right-hand sides of system [1] are $2\pi$-periodic with respect to all $\varphi_j$. It is assumed that they are smooth enough functions of all arguments. It is also assumed that components of the frequency vector are not linearly dependent over the ring of integer numbers identically with respect to $I$. System [1] is called a "system with rotating phases".

In applications, one is often interested mainly in the behavior of slow variables. The averaging principle (or method) consists in replacing the system of perturbed equations [1] by the averaged system

$$\dot J = \varepsilon G(J), \qquad G(J) = (2\pi)^{-m}\oint_{T^m} g(J, \varphi, 0)\,d\varphi \tag{2}$$

for the purpose of providing an approximate description of the evolution of the slow variables over time intervals of order $1/\varepsilon$ or longer. Here, $d\varphi = d\varphi_1 \cdots d\varphi_m$. System [2] contains only slow variables and, therefore, is much simpler for investigation than system [1]. When passing from system [1] to system [2], one ignores the terms $g(I, \varphi, 0) - G(I)$ on the right-hand side of [1]. The averaging principle is based on the idea that these terms oscillate and lead only to small oscillations which are superimposed on the drift described by the averaged system. To justify the averaging principle, one should establish a relation between the behavior of the solutions of systems [1] and [2]. This problem is still far from being completely solved.

Another version of the averaging principle is used in the case when frequencies are approximately in resonance. This means that one or several relations of the form $(k, \omega) = 0$ are approximately valid with irreducible integer coefficient vectors $k \ne 0$; here, $(k, \omega)$ is the standard scalar product in $\mathbb{R}^m$. Let $K$ be a sublattice of the integer lattice $\mathbb{Z}^m$ generated by these vectors. Let $r = \operatorname{rank} K$ and $k^{(1)}, k^{(2)}, \ldots, k^{(m)}$ be a basis in $\mathbb{Z}^m$, the first $r$ vectors of which belong to $K$. Instead of $\varphi$, one can introduce new variables:

$$\vartheta = (\vartheta_1, \ldots, \vartheta_r) \in T^r \ (\mathrm{mod}\ 2\pi)$$
$$\chi = (\chi_1, \ldots, \chi_{m-r}) \in T^{m-r} \ (\mathrm{mod}\ 2\pi)$$
$$\vartheta_i = (k^{(i)}, \varphi), \qquad \chi_j = (k^{(r+j)}, \varphi)$$

Let $R$ be an $r \times m$ matrix whose rows are vectors $k^{(i)}$, $1 \le i \le r$. For an approximate description of the behavior of variables $I$, $\vartheta$, the averaging principle prescribes replacing system [1] by the system

$$\dot J = \varepsilon G_K(J, \gamma), \qquad \dot\gamma = R\,\omega(J) + \varepsilon R\,F_K(J, \gamma)$$
$$G_K(J, \vartheta) = (2\pi)^{-(m-r)}\oint_{T^{m-r}} g(J, \varphi, 0)\,d\chi \tag{3}$$
$$F_K(J, \vartheta) = (2\pi)^{-(m-r)}\oint_{T^{m-r}} f(J, \varphi, 0)\,d\chi$$

(one should express $g$, $f$ through $\vartheta$, $\chi$ and then integrate over $\chi$, $d\chi = d\chi_1 \cdots d\chi_{m-r}$). System [3] is called the "partially averaged system for resonances in $K$". Functions $G_K$, $F_K$ can be obtained from Fourier series expansions of functions $g$, $f$ for $\varepsilon = 0$ by throwing away harmonics $\exp(i(k, \varphi))$, $k \notin K$ (nonresonant harmonics). Passing from system [1] to system [3] is based on the idea that the ignored nonresonant harmonics oscillate fast and do not affect essentially the evolution of the slow variables.

Now let system [1] be a Hamiltonian system close to an integrable one. The Hamiltonian function has the form

$$H = H_0(p) + \varepsilon H_1(p, \varphi, y, x, \varepsilon)$$

where $\varphi$, $x$ are coordinates and $p$, $y$ are conjugated to them. The equations of motion have the same form as [1], with $I = (p, y, x)$:

$$\dot p = -\varepsilon\frac{\partial H_1}{\partial\varphi}, \qquad \dot y = -\varepsilon\frac{\partial H_1}{\partial x}, \qquad \dot x = \varepsilon\frac{\partial H_1}{\partial y}, \qquad \dot\varphi = \frac{\partial H_0}{\partial p} + \varepsilon\frac{\partial H_1}{\partial p} \tag{4}$$

The averaging principle in the case when there are no resonant relations leads to the system

$$\dot p = 0, \qquad \dot y = -\varepsilon\frac{\partial\mathcal{H}_1}{\partial x}, \qquad \dot x = \varepsilon\frac{\partial\mathcal{H}_1}{\partial y}$$
$$\mathcal{H}_1 = (2\pi)^{-m}\oint_{T^m} H_1(p, \varphi, y, x, 0)\,d\varphi \tag{5}$$

Therefore, in this case there is no drift in $p$, and the behavior of $y$, $x$ is described by the Hamiltonian system, which contains $p$ as a parameter. Equations of motion of planets around the Sun can be reduced to the form [4]. The issue of the absence of the evolution of momenta $p$ is known in this problem as
the Lagrange–Laplace theorem about the absence of the evolution of semimajor axes of planetary orbits.

Elimination of Fast Variables, Decoupling of Slow and Fast Motions

The basic role in the averaging method is played by the idea that the exact system can be transformed, in the principal approximation, into the averaged system by means of a transformation of variables close to the identity. The extension of this idea is that a similar transformation of variables allows one to eliminate, up to an arbitrary degree of accuracy, the fast phases from the right-hand sides of the equations of perturbed motion and in this way decouple the slow motion from the fast one.

For system [1], provided there are no resonant relations between frequencies, the elimination of fast variables is performed as follows. The desired transformation of variables $(I, \varphi) \mapsto (J, \psi)$ is sought as a formal series

$$\begin{aligned} I &= J + \varepsilon u_1(J, \psi) + \varepsilon^2 u_2(J, \psi) + \cdots \\ \varphi &= \psi + \varepsilon v_1(J, \psi) + \varepsilon^2 v_2(J, \psi) + \cdots \end{aligned} \tag{6}$$

where functions $u_j$, $v_j$ are $2\pi$-periodic in $\psi$. The transformation [6] should be chosen in such a way that in the new variables the right-hand sides of equations of motion do not contain fast variables, that is, the equations of motion should have the form

$$\begin{aligned} \dot J &= \varepsilon G_0(J) + \varepsilon^2 G_1(J) + \cdots \\ \dot\psi &= \omega(J) + \varepsilon F_0(J) + \varepsilon^2 F_1(J) + \cdots \end{aligned} \tag{7}$$

Substituting [6] into [7], taking into account [1], and equating the terms of the same order in $\varepsilon$, we obtain the following set of relations:

$$\begin{aligned} G_0(J) &= g(J, \psi, 0) - \frac{\partial u_1}{\partial\psi}\,\omega \\ F_0(J) &= f(J, \psi, 0) + \frac{\partial\omega}{\partial J}\,u_1 - \frac{\partial v_1}{\partial\psi}\,\omega \\ G_i(J) &= X_i(J, \psi) - \frac{\partial u_{i+1}}{\partial\psi}\,\omega \\ F_i(J) &= Y_i(J, \psi) + \frac{\partial\omega}{\partial J}\,u_{i+1} - \frac{\partial v_{i+1}}{\partial\psi}\,\omega, \qquad i \ge 1 \end{aligned} \tag{8}$$

The functions $X_i$, $Y_i$ are uniquely determined by the terms $u_1, v_1, \ldots, u_i, v_i$ in expansion [6]. The first equation in [8] implies that

$$\begin{aligned} G_0(J) &= g_0(J) = G(J) \\ u_1(J, \psi) &= \sum_{k \ne 0} \frac{g_k}{i(k, \omega)}\exp(i(k, \psi)) + u_1^0(J) \end{aligned} \tag{9}$$

where $g_k$, $k \in \mathbb{Z}^m$, are Fourier coefficients of function $g$ at $\varepsilon = 0$, and $u_1^0$ is an arbitrary function of $J$. It is assumed that the denominators in [9] do not vanish, and that the series in [9] converges and determines a smooth function. In the same way, from the other equations in [8] one can sequentially determine $F_0, v_1, \ldots, G_i, u_{i+1}, F_i, v_{i+1}$, $i \ge 1$.

On truncating the series in [6] and [7] at the terms of order $\varepsilon^l$, we obtain a truncated system of the $l$th approximation. The equation for $J$ is decoupled from the other equations and can be solved separately. Then the behavior of $\psi$ is determined by means of quadrature. The behavior of the original variable $I$ in this approximation is a slow drift (described by the equation for $J$), on which small oscillations (described by the transformation of variables) are superimposed. The behavior of $\varphi$ can be represented as a rotation with slowly varying frequency, on which oscillations are also superimposed. For $l = 1$, the truncated system coincides with the averaged system [2].

If the sublattice $K \subset \mathbb{Z}^m$ specifying possible resonant relations is given, then in an analogous manner one can construct a formal transformation of variables $(I, \varphi) \mapsto (J, \psi)$ such that, in the new variables, the fast phase will appear on the right-hand sides of the differential equations for the new variables only in combinations $(k, \psi)$, with $k \in K$ (see, e.g., Arnold et al. (1988)). Again, on truncating the series on the right-hand sides of the differential equations for the new variables at the terms of order $\varepsilon^l$, we obtain a truncated system of the $l$th approximation. At $l = 1$, this truncated system coincides with the partially averaged system [3] (for some special choice of arbitrary functions that are contained in the formulas for transformation of variables). If the original system is a Hamiltonian system of the form [4], then the transformation of variables eliminating the fast phases from the right-hand sides of the differential equations can be chosen to be symplectic. The corresponding procedures are called the Lindstedt method and the Newcomb method (nonresonant case for $n = m$), the Delaunay method (resonant case for $n = m$), and the von Zeipel method (resonant case for $n \ge m$) (see Poincaré (1957) and Arnold et al. (1988)).

The calculation of high-order terms in the procedures of elimination of fast variables is rather cumbersome. There are versions of these procedures which are convenient for symbolic processors (especially for Hamiltonian systems, e.g., the Deprit–Hori method; Giacaglia 1972).

The averaging method consists in using the averaged system for the description of motion in the first approximation and the truncated systems
obtained by means of the procedures of elimination of fast variables in the higher approximations, together with the corresponding transformations of variables.

Justification of the Averaging Method

To justify the averaging method, one should establish conditions under which the deviation of the slow variables along the solutions of the exact system from the solutions of the averaged system with appropriate initial data on time intervals of order $1/\varepsilon$ or longer tends to 0 as $\varepsilon \to 0$. It is desirable to have estimates from above for these deviations. The estimates of deviations of the solutions of the exact system from the solutions of the truncated systems obtained by means of the procedure of elimination of fast phases are important as well. It can happen that there are "bad" initial data for which the slow component of the solution of the exact system deviates from the solution of the averaged system by a value of order 1 over time of order $1/\varepsilon$. In this case, one should have estimates from above for the measure of the set of such "bad" initial data; on the complementary set of initial data, one should have estimates from above for the deviation of slow variables along the solutions of the exact system from the solution of the averaged system. These problems are currently far from being completely solved. Some general results are described in the following.

Let functions $\omega$, $f$, $g$ on the right-hand side of system [1] be defined and bounded together with a sufficient number of derivatives in the domain $D\{I\} \times T^m\{\varphi\} \times [0, \varepsilon_0]$. Let $J(t)$ be the solution of the averaged system [2] with initial condition $I_0 \in D$. Let $(I(t), \varphi(t))$ be the solution of the exact system [1] with initial conditions $(I_0, \varphi_0)$. So, $I(0) = J(0)$. It is assumed that the solution $J(t)$ is defined and stays at a positive distance from the boundary of $D$ on the time interval $0 \le t \le K/\varepsilon$, $K = \mathrm{const} > 0$.

If system [1] is a one-frequency system ($m = 1$), and the frequency $\omega$ does not vanish in $D$, then for $0 \le t \le K/\varepsilon$ the solution $(I(t), \varphi(t))$ is well defined, and $|I(t) - J(t)| < C\varepsilon$, $C = \mathrm{const} > 0$. For $\omega = 1$, this assertion was proved by P Fatou (1928) and, by a different method, by L I Mandelshtam and L D Papaleksi (1934). This was historically the first result on the justification of the averaging method (Mitropolskii 1971). There is a proof based on the elimination of fast variables (see, e.g., Arnold (1983)). For a one-frequency system, higher approximations of the procedure of elimination of fast variables allow the description of the dynamics with an accuracy of the order of any power in $\varepsilon$ on time intervals of order $1/\varepsilon$ (Bogolyubov and Mitropolskii 1961).

If system [1] is a multifrequency system ($m \ge 2$), but the vector of frequencies is constant and nonresonant, then for any $\rho > 0$ and small enough $\varepsilon < \varepsilon_0(\rho)$ it holds that $|I(t) - J(t)| < \rho$ for $0 \le t \le K/\varepsilon$ (Bogolyubov 1945, Bogolyubov and Mitropolskii 1961). If, in addition, the frequencies satisfy the Diophantine condition $|(k, \omega)| > \mathrm{const}\,|k|^{-\nu}$ for all $k \in Z^m \setminus \{0\}$ and some $\nu > 0$, then one can choose $\rho = O(\varepsilon)$. In this case, higher approximations of the procedure of elimination of fast variables allow one to describe the dynamics with an accuracy of the order of any power in $\varepsilon$ on time intervals of order $1/\varepsilon$ (see, e.g., Arnold et al. (1988)).

If the system is a multifrequency system, and frequencies are not constant (but depend on the slow variables $I$), then due to the evolution of slow variables the frequencies themselves are evolving slowly. At certain time moments, they can satisfy certain resonant relations. One of the phenomena that can take place here is a capture into a resonance; this capture leads to a large deviation of the solutions of the exact and averaged systems. However, the general Anosov averaging theorem (Anosov 1960) implies that if the frequencies $\omega$ are nonresonant for almost all $I$, then for any $\rho > 0$, the inequality $|I(t) - J(t)| < \rho$ is satisfied for $0 \le t \le K/\varepsilon$ for all initial data outside a set $E(\rho, \varepsilon)$ whose measure tends to 0 as $\varepsilon \to 0$. In many cases, it turns out that $\mathrm{mes}\,E(\rho, \varepsilon) = O(\sqrt{\varepsilon}/\rho)$ (in particular, a sufficient condition for the last estimate is that $\mathrm{rank}(\partial\omega/\partial I) = m$) (Arnold et al. (1988)).

The knowledge about averaging in two-frequency systems ($m = 2$) on time intervals of order $1/\varepsilon$ is relatively more complete (see Arnold (1983), Arnold et al. (1988), and Lochak and Meunier (1988)). For Hamiltonian and reversible systems, the justification of the averaging method is a by-product of Kolmogorov-Arnold-Moser (KAM) theory. The KAM theory provides estimates of the difference between the solutions of the exact and averaged systems for the majority of initial data on the infinite time interval $-\infty < t < \infty$. For the remaining data this difference can grow because of Arnold diffusion, but, in general, very slowly. According to the Nekhoroshev theorem, this difference is small on time intervals whose length grows exponentially when the perturbation decays linearly (for an analytic Hamiltonian if the unperturbed Hamiltonian is a generic function, the so-called steep function).

Another aspect of justification of the averaging method is establishing relations between invariant manifolds of the exact and averaged systems. Consider, in particular, the case of a one-frequency
system and a multifrequency system with constant Diophantine frequencies. Suppose that the averaged system has an equilibrium such that real parts of all its eigenvalues are different from 0, or a limit cycle such that the absolute values of all but one of its multipliers are different from 1. Then the exact system has an invariant torus, respectively, $m$- or $(m + 1)$-dimensional, whose projection onto the space of the slow variables is $O(\varepsilon)$-close to the equilibrium (cycle) of the averaged system. This torus is stable or unstable together with the equilibrium (cycle) of the averaged system. For Hamiltonian and reversible systems, the problem of invariant manifolds is considered in the framework of the KAM theory.

Averaging in Bogolyubov's Systems

Systems in the standard form of Bogolyubov (1945) are of the form

$$
\dot x = \varepsilon X(t, x, \varepsilon), \quad x \in R^p, \quad 0 < \varepsilon \ll 1 \tag{10}
$$

It is assumed that the function $X$, besides the usual smoothness conditions, satisfies the condition of uniform average: the limit (time average)

$$
X_0(x) = \lim_{T \to \infty} \frac{1}{T} \int_0^T X(t, x, 0)\,dt \tag{11}
$$

exists uniformly in $x$. The averaging principle of Bogolyubov consists of the replacement of the original system in standard form by the averaged system

$$
\dot\xi = \varepsilon X_0(\xi) \tag{12}
$$

with a goal to provide an approximate description of the behavior of $x$. This approach generalizes the approach of the section "Averaging principle" for the case of constant frequencies ($\omega = \mathrm{const}$). Upon introducing in the given system with constant frequencies the deviation from uniform rotation $\theta = \varphi - \omega t$ and denoting $x = (I, \theta)$, we obtain a system in the standard form [10]. Here the condition of uniform average is fulfilled because $X(t, x, 0)$ is a quasiperiodic function of time $t$. The averaged system [12] for nonresonant frequencies coincides with the averaged system [2]; for resonant frequencies, it coincides with the partially averaged system [3] (one should only supply systems [2] and [3] with equations for some components of the vector $\varphi - \omega t$ that do not enter into the right-hand side of the averaged system).

The averaging principle of Bogolyubov is justified by three Bogolyubov theorems. According to the first theorem, if $\xi(t)$, $0 \le t \le K/\varepsilon$, is a solution of the averaged system, and $x(t)$ is a solution of the exact system with initial condition $x(0) = \xi(0)$, then for any $\rho > 0$ there exists $\varepsilon_0(\rho) > 0$ such that $|x(t) - \xi(t)| < \rho$ for $0 \le t \le K/\varepsilon$ and $0 < \varepsilon < \varepsilon_0(\rho)$.

The second and the third Bogolyubov theorems describe the motion in the neighborhoods of equilibria and limit cycles of the averaged system. In particular, if for an equilibrium the real parts of all its eigenvalues are different from 0, or, for a limit cycle, the absolute values of all but one of its multipliers are different from 1, then the exact system has a solution which eternally stays near this equilibrium (cycle). The stability properties of this solution are the same as the stability properties of the corresponding equilibrium (cycle) of the averaged system.

For systems of the form [10] a procedure exists that, similarly to the procedure in the section "Elimination of fast variables, decoupling of slow and fast motions," allows us to eliminate time $t$ from the right-hand side of the system with an accuracy of the order of any power in $\varepsilon$ by means of a transformation of variables. (To perform this procedure, one should assume that the conditions of uniform average are satisfied for functions that arise in the process of constructing higher approximations in this procedure (Bogolyubov and Mitropolskii 1961).) In the first approximation, such a transformation of variables transforms the original system into the averaged one.

The condition of uniform average is very important for the theory. If the limit in [11] exists, but convergence is nonuniform in $x$, then the time average $X_0$ could be, for example, a discontinuous function of $x$, and the averaged system would not be well defined.

Averaging in Slow-Fast Systems

Systems of the form [1] are particular cases of the systems of the form

$$
\dot x = f(x, y, \varepsilon), \quad \dot y = \varepsilon g(x, y, \varepsilon) \tag{13}
$$

which are called slow-fast systems (or systems with slow and fast motions, with slow and fast variables). The generalization of the approach of the section "Averaging principle" for these systems is the following averaging principle of Anosov (1960). In the system [13], let $x \in M$, $y \in R^n$, where $M$ is a smooth compact $m$-dimensional manifold. At $\varepsilon = 0$, the system for fast variables $x$ contains slow variables $y$ as parameters. Assume that this system (which is called the fast system) has a finite smooth
invariant measure $\mu_y$ and is ergodic for almost all values of $y$. Introduce the averaged system

$$
\dot Y = \varepsilon G(Y), \quad G(Y) = \frac{1}{\mu_Y(M)} \int_M g(x, Y, 0)\,d\mu_Y
$$

According to the averaging principle, one should use the solution $Y(t)$ of the averaged system with initial condition $Y(0) = y(0)$ for approximate description of slow motion $y(t)$ in the original system. This averaging principle is justified by the following Anosov theorem (Anosov 1960): for any positive $\rho$ the measure of the set $E(\rho, \varepsilon)$ of initial data (from a compact in the phase space) such that

$$
\max_{0 \le t \le 1/\varepsilon} |y(t) - Y(t)| > \rho
$$

tends to 0 as $\varepsilon \to 0$.

The particular case when the original system is a Hamiltonian system depending on a slowly varying parameter $\lambda = \varepsilon t$, and for almost all values of $\lambda$ the motion of the system with $\lambda = \mathrm{const}$ is ergodic on almost all energy levels, is considered in Kasuga (1961).

For the case when the fast system has strong mixing properties, see Bakhtin (2004) and Kifer (2004).

For slow-fast systems, there is also a generalization of the approach of the previous section that uses time averaging and the condition of uniform average (Volosov 1962).

Applications of the Averaging Method

The averaging method is one of the most productive methods of perturbation theory, and its applications are immense. It is widely used in celestial mechanics and space flight dynamics for the description of the evolution of motions of celestial bodies, in plasma physics and theory of accelerators for description of motion of charged particles, and in radio engineering for the description of nonlinear oscillatory regimes. There are also applications in hydrodynamics, physics of lasers, optics, acoustics, etc. (see Arnold et al. (1988), Bogolyubov and Mitropolskii (1961), Lochak and Meunier (1988), Mitropolskii (1971), and Volosov (1962)).

See also: Central Manifolds, Normal Forms; Diagrammatic Techniques in Perturbation Theory; Hamiltonian Systems: Stability and Instability Theory; KAM Theory and Celestial Mechanics; Multiscale Approaches; Random Walks in Random Environments; Separatrix Splitting; Stability Problems in Celestial Mechanics; Stability Theory and KAM.

Further Reading

Anosov DV (1960) Averaging in systems of ordinary differential equations with rapidly oscillating solutions. Izvestiya Akademii Nauk SSSR, Ser. Mat. 24(5): 721-742 (Russian).
Arnold VI (1983) Geometrical Methods in the Theory of Ordinary Differential Equations. New York-Berlin: Springer.
Arnold VI, Kozlov VV, and Neishtadt AI (1988) Mathematical Aspects of Classical and Celestial Mechanics, Encyclopaedia of Mathematical Sciences, vol. 3. Berlin: Springer.
Bakhtin VI (2004) Cramer asymptotics in the averaging method for systems with fast hyperbolic motions. Proceedings of the Steklov Institute of Mathematics 244(1): 79.
Bogolyubov NN (1945) On some statistical methods in mathematical physics. Akad. Nauk USSR. Lvov (Russian).
Bogolyubov NN and Mitropolskii YuA (1961) Asymptotic Methods in the Theory of Nonlinear Oscillations. New York: Gordon and Breach.
Giacaglia GEO (1972) Perturbation Methods in Nonlinear Systems, Applied Mathematical Science, vol. 8. Berlin: Springer.
Kasuga T (1961) On the adiabatic theorem for the Hamiltonian system of differential equations in the classical mechanics I, II, III. Proceedings of the Japan Academy 37(7): 366-382.
Kevorkian J and Cole JD (1996) Multiple Scale and Singular Perturbations Methods, Applied Mathematical Sciences, vol. 114. New York: Springer.
Kifer Y (2004) Some recent advances in averaging. In: Modern Dynamical Systems and Applications, 403. Cambridge: Cambridge University Press.
Lochak P and Meunier P (1988) Multiphase Averaging for Classical Systems, Applied Mathematical Sciences, vol. 72. New York: Springer.
Mitropolskii YuA (1971) Averaging Method in Nonlinear Mechanics. Kiev: Naukova Dumka (Russian).
Poincare H (1957) Les Methodes Nouvelles de la Mecanique Celeste, vols. 1-3. New York: Dover.
Sanders JA and Verhulst F (1985) Averaging Methods in Nonlinear Dynamical Systems, Applied Mathematical Sciences, vol. 59. New York: Springer.
Volosov VM (1962) Averaging in systems of ordinary differential equations. Russian Mathematical Surveys 17(6): 1-126.
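
The one-frequency estimate quoted in this article (deviation of the slow variable of order epsilon on times of order 1/epsilon) can be checked numerically on a toy problem. The sketch below is only an illustration under stated assumptions: the system dI/dt = -2*eps*I*sin(phi)^2, dphi/dt = 1 is a hypothetical example chosen here, not one taken from the article, and the names `rk4_step` and `max_deviation` are likewise illustrative. Averaging sin^2 over the phase gives 1/2, so the averaged system is dJ/dt = -eps*J.

```python
import math


def rk4_step(f, t, y, h):
    """One classical Runge-Kutta step for y' = f(t, y), with y a list of floats."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def max_deviation(eps, I0=1.0, K=1.0, h=0.01):
    """Largest |I(t) - J(t)| on 0 <= t <= K/eps for the toy system (an assumption).

    Exact system:    dI/dt = -2*eps*I*sin(phi)**2, dphi/dt = 1, phi(0) = 0.
    Averaged system: dJ/dt = -eps*J, with J(0) = I(0) = I0.
    """
    exact = lambda t, y: [-2.0 * eps * y[0] * math.sin(y[1]) ** 2, 1.0]
    averaged = lambda t, y: [-eps * y[0]]
    n_steps = int(round(K / eps / h))
    y, z, t, worst = [I0, 0.0], [I0], 0.0, 0.0
    for _ in range(n_steps):
        y = rk4_step(exact, t, y, h)
        z = rk4_step(averaged, t, z, h)
        t += h
        worst = max(worst, abs(y[0] - z[0]))
    return worst


if __name__ == "__main__":
    for eps in (0.1, 0.05, 0.01):
        d = max_deviation(eps)
        print(f"eps={eps:<5} max|I-J|={d:.6f} ratio={d / eps:.3f}")
```

For this toy system the exact solution is I(t) = I0*exp(-eps*(t - sin(2t)/2)), so the deviation from J(t) = I0*exp(-eps*t) is bounded by I0*(exp(eps/2) - 1); the printed ratio of deviation to eps should therefore stay bounded as eps decreases.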
Axiomatic Approach to Topological Quantum Field Theory


C Blanchet, Universite de Bretagne-Sud, Vannes, France
V Turaev, IRMA, Strasbourg, France
2006 Elsevier Ltd. All rights reserved.

Introduction

The idea of topological invariants defined via path integrals was introduced by A S Schwartz (1977) in a special case and by E Witten (1988) in its full power. To formalize this idea, Witten (1988) introduced a notion of a topological quantum field theory (TQFT). Such theories, independent of Riemannian metrics, are rather rare in quantum physics. On the other hand, they admit a simple axiomatic description first suggested by M Atiyah (1989). This description was inspired by G Segal's (1988) axioms for a two-dimensional conformal field theory. The axiomatic formulation of TQFTs makes them suitable for a purely mathematical research combining methods of topology, algebra, and mathematical physics. Several authors explored axiomatic foundations of TQFTs (see Quinn (1995) and Turaev (1994)).

Axioms of a TQFT

An $(n + 1)$-dimensional TQFT $(V, \tau)$ over a scalar field $k$ assigns to every closed oriented $n$-dimensional manifold $X$ a finite-dimensional vector space $V(X)$ over $k$ and assigns to every cobordism $(M, X, Y)$ a $k$-linear map

$$
\tau(M) = \tau(M, X, Y) : V(X) \to V(Y)
$$

Here a cobordism $(M, X, Y)$ between $X$ and $Y$ is a compact oriented $(n + 1)$-dimensional manifold $M$ endowed with a diffeomorphism $\partial M \cong \overline{X} \sqcup Y$ (the overline indicates the orientation reversal). All manifolds and cobordisms are supposed to be smooth. A TQFT must satisfy the following axioms.

1. Naturality Any orientation-preserving diffeomorphism of closed oriented $n$-dimensional manifolds $f : X \to X'$ induces an isomorphism $f_\sharp : V(X) \to V(X')$. For a diffeomorphism $g$ between the cobordisms $(M, X, Y)$ and $(M', X', Y')$, the following diagram is commutative:

$$
\begin{array}{ccc}
V(X) & \xrightarrow{(g|_X)_\sharp} & V(X') \\
\tau(M) \downarrow & & \downarrow \tau(M') \\
V(Y) & \xrightarrow{(g|_Y)_\sharp} & V(Y')
\end{array}
$$

2. Functoriality If a cobordism $(W, X, Z)$ is obtained by gluing two cobordisms $(M, X, Y)$ and $(M', Y', Z)$ along a diffeomorphism $f : Y \to Y'$, then the following diagram is commutative:

$$
\begin{array}{ccc}
V(X) & \xrightarrow{\tau(W)} & V(Z) \\
\tau(M) \downarrow & & \uparrow \tau(M') \\
V(Y) & \xrightarrow{f_\sharp} & V(Y')
\end{array}
$$

3. Normalization For any $n$-dimensional manifold $X$, the linear map

$$
\tau([0, 1] \times X) : V(X) \to V(X)
$$

is identity.

4. Multiplicativity There are functorial isomorphisms

$$
V(X \sqcup Y) \cong V(X) \otimes V(Y), \qquad V(\emptyset) \cong k
$$

such that the following diagrams are commutative:

$$
\begin{array}{ccc}
V((X \sqcup Y) \sqcup Z) & \cong & (V(X) \otimes V(Y)) \otimes V(Z) \\
\downarrow & & \downarrow \\
V(X \sqcup (Y \sqcup Z)) & \cong & V(X) \otimes (V(Y) \otimes V(Z))
\end{array}
$$

$$
\begin{array}{ccc}
V(X \sqcup \emptyset) & \cong & V(X) \otimes k \\
\downarrow & & \downarrow \\
V(X) & = & V(X)
\end{array}
$$

Here $\otimes = \otimes_k$ is the tensor product over $k$. The vertical maps are respectively the ones induced by the obvious diffeomorphisms, and the standard isomorphisms of vector spaces.

5. Symmetry The isomorphism

$$
V(X \sqcup Y) \cong V(Y \sqcup X)
$$

induced by the obvious diffeomorphism corresponds to the standard isomorphism of vector spaces

$$
V(X) \otimes V(Y) \cong V(Y) \otimes V(X)
$$

Given a TQFT $(V, \tau)$, we obtain an action of the group of diffeomorphisms of a closed oriented $n$-dimensional manifold $X$ on the vector space $V(X)$. This action can be used to study this group.

An important feature of a TQFT $(V, \tau)$ is that it provides numerical invariants of compact oriented $(n + 1)$-dimensional manifolds without boundary. Indeed, such a manifold $M$ can be considered as a cobordism between two copies of $\emptyset$ so that $\tau(M) \in \mathrm{Hom}_k(k, k) = k$. Any compact oriented $(n + 1)$-dimensional manifold $M$ can be considered as a
cobordism between $\emptyset$ and $\partial M$; the TQFT assigns to this cobordism a vector $\tau(M)$ in $\mathrm{Hom}_k(k, V(\partial M)) = V(\partial M)$ called the vacuum vector.

The manifold $[0, 1] \times X$, considered as a cobordism from $\overline{X} \sqcup X$ to $\emptyset$, induces a nonsingular pairing

$$
V(\overline{X}) \otimes V(X) \to k
$$

We obtain a functorial isomorphism $V(\overline{X}) = V(X)^* = \mathrm{Hom}_k(V(X), k)$.

We now outline definitions of several important classes of TQFTs.

If the scalar field $k$ has a conjugation and all the vector spaces $V(X)$ are equipped with natural nondegenerate Hermitian forms, then the TQFT $(V, \tau)$ is Hermitian. If $k = C$ is the field of complex numbers and the Hermitian forms are positive definite, then the TQFT is unitary.

A TQFT $(V, \tau)$ is nondegenerate or cobordism generated if for any closed oriented $n$-dimensional manifold $X$, the vector space $V(X)$ is generated by the vacuum vectors derived as above from the manifolds bounded by $X$.

Fix a Dedekind domain $D \subset C$. A TQFT $(V, \tau)$ over $C$ is almost $D$-integral if it is nondegenerate and there is $d \in C$ such that $d\,\tau(M) \in D$ for all $M$ with $\partial M = \emptyset$. Given an almost integral TQFT $(V, \tau)$ and a closed oriented $n$-dimensional manifold $X$, we define $S(X)$ to be the $D$-submodule of $V(X)$ generated by all the vacuum vectors. This module is preserved under the action of self-diffeomorphisms of $X$ and yields a finer "arithmetic" version of $V(X)$.

The notion of an $(n + 1)$-dimensional TQFT over $k$ can be reformulated in the categorical language as a symmetric monoidal functor from the category of $n$-manifolds and $(n + 1)$-cobordisms to the category of finite-dimensional vector spaces over $k$. The source category is called the $(n + 1)$-dimensional cobordism category. Its objects are closed oriented $n$-dimensional manifolds. Its morphisms are cobordisms considered up to the following equivalence: cobordisms $(M, X, Y)$ and $(M', X, Y)$ are equivalent if there is a diffeomorphism $M \to M'$ compatible with the diffeomorphisms $\partial M \cong \overline{X} \sqcup Y \cong \partial M'$.

TQFTs in Low Dimensions

TQFTs in dimension $0 + 1 = 1$ are in one-to-one correspondence with finite-dimensional vector spaces. The correspondence goes by associating with a one-dimensional TQFT $(V, \tau)$ the vector space $V(\mathrm{pt})$ where pt is a point with positive orientation.

Let $(V, \tau)$ be a two-dimensional TQFT. The linear map $\tau$ associated with a pair of pants (a 2-disk with two holes considered as a cobordism between two circles $S^1 \sqcup S^1$ and one circle $S^1$) defines a commutative multiplication on the vector space $A = V(S^1)$. The 2-disk, considered as a cobordism between $S^1$ and $\emptyset$, induces a nondegenerate trace on the algebra $A$. This makes $A$ into a commutative Frobenius algebra (also called a symmetric algebra). This algebra completely determines the TQFT $(V, \tau)$. Moreover, this construction defines a one-to-one correspondence between equivalence classes of two-dimensional TQFTs and isomorphism classes of finite-dimensional commutative Frobenius algebras (Kock 2003).

The formalism of TQFTs was to a great extent motivated by the three-dimensional case, specifically, Witten's Chern-Simons TQFTs. A mathematical definition of these TQFTs was first given by Reshetikhin and Turaev using the theory of quantum groups. The Witten-Reshetikhin-Turaev three-dimensional TQFTs do not satisfy exactly the definition above: the naturality and the functoriality axioms only hold up to invertible scalar factors called framing anomalies. Such TQFTs are said to be projective. In order to get rid of the framing anomalies, one has to add extra structures on the three-dimensional cobordism category. Usually one endows surfaces $X$ with Lagrangians (maximal isotropic subspaces in $H_1(X; R)$). For 3-cobordisms, several competing but essentially equivalent additional structures are considered in the literature: 2-framings (Atiyah 1989), $p_1$-structures (Blanchet et al. 1995), numerical weights (K Walker, V Turaev).

Large families of three-dimensional TQFTs are obtained from the so-called modular categories. The latter are constructed from quantum groups at roots of unity or from the skein theory of links. See Quantum 3-Manifold Invariants.

Additional Structures

The axiomatic definition of a TQFT extends in various directions. In dimension 2 it is interesting to consider the so-called open-closed theories involving 1-manifolds formed by circles and intervals and two-dimensional cobordisms with boundary (G Moore, G Segal). In dimension 3 one often considers cobordisms including framed links and graphs whose components (resp. edges) are labeled with objects of a certain fixed category $C$. In such a theory, surfaces are endowed with finite sets of points labeled with objects of $C$ and enriched with tangent directions. In all dimensions one can study manifolds and cobordisms endowed with homotopy classes of mappings to a fixed space (homotopy quantum field theory, in the sense of Turaev). Additional structures on the tangent bundles, spin
structures, framings, etc., may also be considered provided the gluing is well defined.

See also: Braided and Modular Tensor Categories; Hopf Algebras and q-Deformation Quantum Groups; Indefinite Metric; Quantum 3-Manifold Invariants; Topological Gravity, Two-Dimensional; Topological Quantum Field Theory: Overview.

Further Reading

Atiyah M (1989) Topological Quantum Field Theories. Publications Mathematiques de l'IHES 68: 175-186.
Bakalov B and Kirillov A Jr. (2001) Lectures on Tensor Categories and Modular Functors. University Lecture Series vol. 21. Providence, RI: American Mathematical Society.
Blanchet C, Habegger N, Masbaum G, and Vogel P (1995) Topological quantum field theories derived from the Kauffman bracket. Topology 34: 883-927.
Kock J (2003) Frobenius Algebras and 2D Topological Quantum Field Theories. LMS Student Texts, vol. 59. Cambridge: Cambridge University Press.
Quinn F (1995) Lectures on axiomatic topological quantum field theory. In: Freed DS and Uhlenbeck KK (eds.) Geometry and Quantum Field Theory, pp. 325-453. IAS/Park City Mathematical Series, University of Texas, Austin: American Mathematical Society.
Segal G (1988) Two-dimensional conformal field theories and modular functors. In: Simon B, Truman A, and Davies IM (eds.) IXth International Congress on Mathematical Physics, pp. 22-37. Bristol: Adam Hilger Ltd.
Turaev V (1994) Quantum Invariants of Knots and 3-Manifolds. de Gruyter Studies in Mathematics, vol. 18. Berlin: Walter de Gruyter.
Witten E (1988) Topological quantum field theory. Communications in Mathematical Physics 117(3): 353-386.
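
The correspondence between two-dimensional TQFTs and commutative Frobenius algebras described in this article can be made concrete with a small computation. The sketch below is illustrative only and rests on assumptions not stated in the article: it uses the standard fact that the invariant of a closed surface of genus g is the trace applied to the g-th power of the "handle element" (the product applied to the copairing), and the function names (`multiply`, `invert`, `surface_invariant`), the structure-constant encoding, and the two sample algebras are all hypothetical choices made here.

```python
from fractions import Fraction


def multiply(c, a, b):
    """Product of a and b in the algebra with structure constants c[i][j][k]."""
    n = len(a)
    out = [Fraction(0)] * n
    for i in range(n):
        for j in range(n):
            for k in range(n):
                out[k] += a[i] * b[j] * c[i][j][k]
    return out


def invert(m):
    """Inverse of a square matrix over the rationals (Gauss-Jordan elimination)."""
    n = len(m)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Raises StopIteration if the matrix (the trace pairing) is degenerate.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def surface_invariant(c, unit, trace, genus):
    """Invariant of the closed genus-g surface: trace(handle**g)."""
    n = len(unit)
    basis = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    apply_trace = lambda v: sum(t * x for t, x in zip(trace, v))
    # Pairing eta[i][j] = trace(e_i * e_j); it must be nondegenerate.
    eta = [[apply_trace(multiply(c, basis[i], basis[j])) for j in range(n)]
           for i in range(n)]
    eta_inv = invert(eta)
    # Handle element: sum over i, j of eta_inv[i][j] * e_i * e_j.
    handle = [Fraction(0)] * n
    for i in range(n):
        for j in range(n):
            prod = multiply(c, basis[i], basis[j])
            handle = [h + eta_inv[i][j] * p for h, p in zip(handle, prod)]
    power = list(unit)
    for _ in range(genus):
        power = multiply(c, power, handle)
    return apply_trace(power)


# Sample algebra 1 (assumption): group algebra of Z/3, trace = coefficient of identity.
N = 3
c_z3 = [[[int((i + j) % N == k) for k in range(N)] for j in range(N)] for i in range(N)]
unit_z3, trace_z3 = [1, 0, 0], [1, 0, 0]

# Sample algebra 2 (assumption): k[x]/(x^2) with basis (1, x), trace(1) = 0, trace(x) = 1.
c_dual = [[[1, 0], [0, 1]], [[0, 1], [0, 0]]]
unit_dual, trace_dual = [1, 0], [0, 1]

if __name__ == "__main__":
    print([surface_invariant(c_z3, unit_z3, trace_z3, g) for g in range(3)])
    print([surface_invariant(c_dual, unit_dual, trace_dual, g) for g in range(3)])
```

In both samples the torus invariant equals the dimension of the algebra, as expected for the vector space assigned to the circle; exact rational arithmetic is used so that the results can be compared with integers without rounding concerns.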

Axiomatic Quantum Field Theory


B Kuckert, Universitat Hamburg, Hamburg, Germany
2006 Elsevier Ltd. All rights reserved.

Introduction

The term "axiomatic quantum field theory" subsumes a collection of research branches of quantum field theory analyzing the general principles of relativistic quantum physics. The content of the results typically is structural and retrospective rather than quantitative and predictive.

The first axiomatic activities in quantum field theory date back to the 1950s, when several groups started investigating the notion of scattering and S-matrix in detail: Lehmann, Symanzik, and Zimmermann 1955 (LSZ-approach); Bogoliubov and Parasiuk 1957, Hepp and Zimmermann (BPHZ-approach); Haag 1957-59 and Ruelle 1962 (Haag-Ruelle theory) (see Scattering, Asymptotic Completeness and Bound States and Scattering in Relativistic Quantum Field Theory: Fundamental Concepts and Tools).

Wightman (1956) analyzed the properties of the vacuum expectation values used in these approaches and formulated a system of axioms that the vacuum expectation values ought to satisfy in general. Together with Garding (1965), he later formulated a system of axioms in order to characterize general quantum fields in terms of operator-valued functionals, and the two systems have been found to be equivalent.

A couple of spectacular theorems such as the PCT theorem and the spin-statistics theorem have been obtained in this setting, but no interacting quantum fields satisfying the axioms have been found so far (in 1 + 3 spacetime dimensions). So, the development of alternatives and modifications of the setting got into the focus of the theory, and the axioms themselves became the objects of research. Their role as axioms understood in the common sense turned into the role of mere properties of quantum fields. Today, the term "axiomatic quantum field theory" is widely avoided for this reason.

In a long list of publications spread over the 1960s, Araki, Borchers, Haag, Kastler, and others worked out an algebraic approach to quantum field theory in the spirit of Segal's "Postulates for general quantum mechanics" (1947) (see Algebraic Approach to Quantum Field Theory).

The Wightman setting was the basis of a framework into which the causal construction of the S-matrix developed by Stuckelberg (1951) and Bogoliubov and Shirkov (1959) has been fitted by Epstein and Glaser (1973). The causality principle fixes the time-ordered products up to a finite number of parameters at each order, which are to be put in as the renormalization constants.

Already in 1949, Dyson had seen that problems in the formulation of quantum electrodynamics (QED) could be avoided by "just" multiplying the time variable and, correspondingly, the energy variable by the imaginary unit constant (Wick rotation). Schwinger then investigated time-ordered Green functions of QED in this Euclidean setting. This approach was formulated in terms of axioms by Osterwalder and Schrader (1973, 1975) (see Euclidean Field Theory).

Other extensions of the aforementioned settings are objects of current research (see Indefinite Metric,
Quantum Field Theory in Curved Spacetime, Continuity as a distribution For all ,  2 D, the
13
Symmetries in Quantum Field Theory of Lower linear functionals T, ,  on C1
0 (R ) defined by
Spacetime Dimensions, and Thermal Quantum Field
a
Theory). T;; : h; Fa i

are distributions. They can be extended to tempered


distributions.
Quantum Fields
The Fourier transform of a tempered distribution
Garding and Wightman characterized operator- is well defined as a tempered distribution. It is
valued quantum fields on the Minkowski spacetime mainly due to the importance of Fourier transforma-
R13 by a couple of axioms. Given additional tions that the preceding assumption is convenient.
assumptions concerning the high-energy behavior, Bogoliubov et al. (1975) remark that the assumption
the GardingWightman fields are in oneone corre- is not a mere technicality, since it rules out
spondence with algebraic field theories. nonrenormalizable quantum fields.
Without specifying or presupposing these addi-
tional assumptions, the axioms will now be for- Microcausality (BoseFermi alternative) If and
mulated and discussed in detail and compared to the are test functions with spacelike separated support,
corresponding conditions in the algebraic setting. then
Adjoint operators are marked by an asterisk, and
Einsteins summation convention is used. Fa Fb jD  Fb Fa jD :
Operator-valued functionals The components of a
The sign depends on the statistics of the fields, it
field F are an n-tuple F1    Fn of linear maps that
13 is  if and only if both F a and Fb are fermion
assign to each test function 2 C1 0 (R ) linear
fields.
operators F1 ()    Fn () in a Hilbert space H with
Microcausality is closely related to Einstein
domains of definition $D(F_1(\varphi)), \ldots, D(F_n(\varphi))$. There exists a dense subspace $D$ of $\mathcal{H}$ with $D \subset D(F_\nu(\varphi)) \cap D(F_\nu(\varphi)^*)$ and $F_\nu(\varphi)D \cup F_\nu(\varphi)^* D \subset D$ for all indices $\nu$. Consider $m$ such fields $F^1, \ldots, F^m$ with components $F^a_\nu$, $1 \le a \le m$, $1 \le \nu \le n_a$. Assume there to be an involution $* : (1, \ldots, m) \to (1, \ldots, m)$ such that $F^{a^*}_\nu(\varphi) = F^a_\nu(\bar\varphi)^*$, where $\bar\varphi(x) := \overline{\varphi(x)}$.

Quantum fields cannot be operator-valued functions on $\mathbb{R}^{1+3}$ if one wants them to exhibit (part of) the properties to follow. But point fields can be quadratic forms; typically this is the case for fields in a Fock space.

For each component $F^a_\nu$ and each open region $O \subset \mathbb{R}^{1+3}$, the field operators $F^a_\nu(\varphi)$ with $\mathrm{supp}\,\varphi \subset O$ generate a $*$-algebra $\mathcal{F}^a(O)$ of operators defined on $D$. These operators typically are unbounded, which is one of the differences with the traditional setting of the algebraic approach. There a $C^*$-algebra $\mathcal{A}(O)$ is assigned to each open region $O$ in such a way that $O \subset P$ implies $\mathcal{A}(O) \subset \mathcal{A}(P)$. Each $C^*$-algebra is a $*$-algebra, but in contrast to a $C^*$-algebra, a $*$-algebra does not need to be endowed with a norm. The fundamental observables in quantum theory are bounded positive operators (typically, but not always, projections), and these generate a $C^*$-algebra.

There is no fundamental physical motivation for confining the setting to fields with a finite number of components, except that it includes most of the fields known from daily life.

causality. Einstein causality requires that any two observables located in spacelike separated regions commute in the strong sense, that is, their spectral measures commute. But fields with Fermi–Dirac statistics are not observables, and not even for Bose–Einstein fields with self-adjoint field operators does the above condition imply that the spectral projections commute, which is the criterion for commensurability. The sign on the right-hand side does, however, specify the statistics of the field.

This is a crucial difference with the algebraic approach. If $O$ and $P$ are spacelike separated open regions and if $A \in \mathcal{A}(O)$ and $B \in \mathcal{A}(P)$, then one assumes, like in the above case, that $AB = BA$ (locality). But being elements of $C^*$-algebras, $A$ and $B$ are bounded operators (or can be represented accordingly), so if $A$ and $B$ are self-adjoint, they are, indeed, commensurable.

Doplicher, Haag, and Roberts (1974) and Buchholz and Fredenhagen (1984) have derived from this input of observables a field structure of localized particle states, and they showed that the statistics of these fields is Bose–Einstein, Fermi–Dirac, or some corresponding parastatistics (which is, a priori, forbidden if one assumes microcausality).

Recall that the unimodular group $SL(2, \mathbb{C})$ is isomorphic to the universal covering group of the restricted Lorentz group $L_+^\uparrow$ (the connected component containing the unit element). Denote by $\Lambda : SL(2, \mathbb{C}) \to L_+^\uparrow$ a covering map.
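The covering map can be made concrete with a small numerical sketch. It assumes the usual realization, which the text above does not spell out: a vector $x$ is identified with the Hermitian matrix $X = x_0 1 + x_1\sigma_1 + x_2\sigma_2 + x_3\sigma_3$, and $g \in SL(2,\mathbb{C})$ acts by $X \mapsto gXg^*$. The function names (`covering_map`, `preserves_minkowski`) are illustrative only.

```python
import math

# Pauli basis sigma_0..sigma_3 (sigma_0 is the identity matrix).
SIGMA = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]
ETA = [1.0, -1.0, -1.0, -1.0]  # Minkowski metric, signature (+,-,-,-)


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]


def to_matrix(x):
    """Hermitian matrix X = sum_mu x_mu sigma_mu of a 4-vector x."""
    return [[sum(x[m] * SIGMA[m][i][j] for m in range(4)) for j in range(2)]
            for i in range(2)]


def to_vector(X):
    """Inverse of to_matrix: x_mu = (1/2) tr(X sigma_mu)."""
    return [0.5 * complex(sum(X[i][k] * SIGMA[m][k][i]
                              for i in range(2) for k in range(2))).real
            for m in range(4)]


def covering_map(g):
    """4x4 matrix Lambda(g) of the linear map X -> g X g^* (assumed realization)."""
    g_dag = dagger(g)
    images = []
    for nu in range(4):
        basis = [1.0 if m == nu else 0.0 for m in range(4)]
        images.append(to_vector(matmul(matmul(g, to_matrix(basis)), g_dag)))
    # images[nu] is the nu-th column of Lambda(g).
    return [[images[nu][mu] for nu in range(4)] for mu in range(4)]


def preserves_minkowski(lam, tol=1e-12):
    """Check Lambda^T eta Lambda = eta entry by entry."""
    for a in range(4):
        for b in range(4):
            s = sum(ETA[m] * lam[m][a] * lam[m][b] for m in range(4))
            if abs(s - (ETA[a] if a == b else 0.0)) > tol:
                return False
    return True


# Example element of SL(2,C): a boost along x_3 times a rotation about x_2.
r, t = 0.7, 0.4
boost = [[math.exp(r / 2), 0.0], [0.0, math.exp(-r / 2)]]
rotation = [[math.cos(t / 2), -math.sin(t / 2)],
            [math.sin(t / 2), math.cos(t / 2)]]
g = matmul(boost, rotation)
minus_g = [[-entry for entry in row] for row in g]

lam = covering_map(g)
lam_minus = covering_map(minus_g)
lam_boost = covering_map(boost)
```

In this realization $g$ and $-g$ give the same Lorentz transformation, which is why the map is a two-fold covering rather than an isomorphism.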
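Covariance. There exist strongly continuous unitary representations $U$ and $T$ of $SL(2, \mathbb{C})$ and $(\mathbb{R}^{1+3}, +)$, respectively, and representations $D^1, \ldots, D^m$ of $SL(2, \mathbb{C})$ in $\mathbb{C}^{n_1}, \ldots, \mathbb{C}^{n_m}$, respectively, such that
\[ U(g) F^a_\nu(\varphi) U(g)^* = \sum_\mu D^a(g^{-1})_\nu{}^\mu \, F^a_\mu(\varphi(\Lambda(g)^{-1}\,\cdot\,)) \]
and
\[ T(y) F^a_\nu(\varphi) T(y)^* = F^a_\nu(\varphi(\,\cdot - y)), \]
where $D^a(g^{-1})_\nu{}^\mu$ are the elements of the matrix $D^a(g^{-1})$. Dropping coordinate indices, this reads
\[ U(g) F^a(\varphi) U(g)^* = D^a(g^{-1}) F^a(\varphi(\Lambda(g)^{-1}\,\cdot\,)) \]
and
\[ T(y) F^a(\varphi) T(y)^* = F^a(\varphi(\,\cdot - y)). \]
The representations $U$ and $T$ generate a representation of the universal covering of the restricted Poincaré group.

As it stands, this assumption is a very strong one, since it manifestly fixes the action of the representation on the field operators. In the algebraic approach, the covariance assumption is more modestly formulated. Namely, it is assumed that $U(g)\mathcal{A}(O)U(g)^* = \mathcal{A}(\Lambda(g)O)$ and $T(y)\mathcal{A}(O)T(y)^* = \mathcal{A}(O + y)$, leaving open how the representation acts on the single local observables.

Vacuum vector. There exists a unique (up to a multiple) vector $\Omega \in D$ that is invariant under the representations $U$ and $T$ and cyclic with respect to the algebra $\mathcal{F}(\mathbb{R}^{1+3})$ generated by all field operators $F^a_\nu(\varphi)$, that is, $\overline{\mathcal{F}(\mathbb{R}^{1+3})\Omega} = \mathcal{H}$.

Spectrum condition. The joint spectrum of the components of the 4-momentum, i.e., of the generators of the spacetime translations, has support in the closed forward light cone $\overline{V}_+$, that is, the set $\{k^2 \ge 0,\ k_0 \ge 0\}$.

The existence of an invariant ground state called the vacuum is standard in algebraic quantum field theory as well.

N-Point Functions

Consider the above fields $F^1, \ldots, F^m$. For each $N \in \mathbb{N}$ and each $N$-tuple $(a_1, \ldots, a_N)$ of natural numbers $\le m$ (labeling fields), define families $F^{a_1 \ldots a_N} := (F^{a_1 \ldots a_N}_{\nu_1 \ldots \nu_N})_{\nu_i \le n_{a_i}}$ and $w^{a_1 \ldots a_N} := (w^{a_1 \ldots a_N}_{\nu_1 \ldots \nu_N})_{\nu_i \le n_{a_i}}$ of distributions on $(\mathbb{R}^{1+3})^N$ by
\[ F^{a_1 \ldots a_N}_{\nu_1 \ldots \nu_N}(\varphi_1 \otimes \cdots \otimes \varphi_N) := F^{a_1}_{\nu_1}(\varphi_1) \cdots F^{a_N}_{\nu_N}(\varphi_N) \]
(using the nuclear theorem) and
\[ w^{a_1 \ldots a_N}_{\nu_1 \ldots \nu_N}(\psi) := \langle \Omega, F^{a_1 \ldots a_N}_{\nu_1 \ldots \nu_N}(\psi)\, \Omega \rangle. \qquad [1] \]
These distributions are called the N-point functions of the fields $F^1, \ldots, F^m$ and yield the vacuum expectation values of the theory. It is straightforward to deduce the following properties from the Gårding–Wightman axioms.

Microcausality (Bose–Fermi alternative). If $\varphi_i$ and $\varphi_{i+1}$ have spacelike separated supports, then
\[ w^{a_1 \ldots a_i a_{i+1} \ldots a_N}_{\nu_1 \ldots \nu_i \nu_{i+1} \ldots \nu_N}(\varphi_1 \otimes \cdots \otimes \varphi_i \otimes \varphi_{i+1} \otimes \cdots \otimes \varphi_N) = \pm\, w^{a_1 \ldots a_{i+1} a_i \ldots a_N}_{\nu_1 \ldots \nu_{i+1} \nu_i \ldots \nu_N}(\varphi_1 \otimes \cdots \otimes \varphi_{i+1} \otimes \varphi_i \otimes \cdots \otimes \varphi_N), \]
or dropping coordinate indices,
\[ w^{a_1 \ldots a_i a_{i+1} \ldots a_N}(\varphi_1 \otimes \cdots \otimes \varphi_i \otimes \varphi_{i+1} \otimes \cdots \otimes \varphi_N) = \pm\, w^{a_1 \ldots a_{i+1} a_i \ldots a_N}(\varphi_1 \otimes \cdots \otimes \varphi_{i+1} \otimes \varphi_i \otimes \cdots \otimes \varphi_N). \]

Invariance. For all $g \in SL(2, \mathbb{C})$ and $y \in \mathbb{R}^{1+3}$, one has
\[ w^{a_1 \ldots a_N}_{\nu_1 \ldots \nu_N}(\varphi_1 \otimes \cdots \otimes \varphi_N) = \sum_{\mu_1 \ldots \mu_N} D^{a_1}(g^{-1})_{\nu_1}{}^{\mu_1} \cdots D^{a_N}(g^{-1})_{\nu_N}{}^{\mu_N}\, w^{a_1 \ldots a_N}_{\mu_1 \ldots \mu_N}(\varphi_1(\Lambda(g)^{-1}\,\cdot\,) \otimes \cdots \otimes \varphi_N(\Lambda(g)^{-1}\,\cdot\,)) = w^{a_1 \ldots a_N}_{\nu_1 \ldots \nu_N}(\varphi_1(\,\cdot - y) \otimes \cdots \otimes \varphi_N(\,\cdot - y)), \]
or dropping coordinate indices,
\[ w^{a_1 \ldots a_N}(\varphi_1 \otimes \cdots \otimes \varphi_N) = \left( D^{a_1}(g^{-1}) \otimes \cdots \otimes D^{a_N}(g^{-1}) \right) w^{a_1 \ldots a_N}(\varphi_1(\Lambda(g)^{-1}\,\cdot\,) \otimes \cdots \otimes \varphi_N(\Lambda(g)^{-1}\,\cdot\,)) = w^{a_1 \ldots a_N}(\varphi_1(\,\cdot - y) \otimes \cdots \otimes \varphi_N(\,\cdot - y)). \]

By translation invariance, the N-point functions $w^{a_1 \ldots a_N}_{\nu_1 \ldots \nu_N}(x_1, \ldots, x_N)$ only depend on the $N - 1$ relative-position vectors $\xi_1 := x_1 - x_2$, $\xi_2 := x_2 - x_3$, ..., $\xi_{N-1} := x_{N-1} - x_N$. This means that there are distributions $W^{a_1 \ldots a_N}_{\nu_1 \ldots \nu_N}$ on $(\mathbb{R}^{1+3})^{N-1}$ related to the N-point functions by the symbolic condition
\[ w^{a_1 \ldots a_N}_{\nu_1 \ldots \nu_N}(x_1, \ldots, x_N) = W^{a_1 \ldots a_N}_{\nu_1 \ldots \nu_N}(\xi_1, \ldots, \xi_{N-1}). \]
In precise notation, this reads
\[ w^{a_1 \ldots a_N}_{\nu_1 \ldots \nu_N}(\psi) = \int_{\mathbb{R}^{1+3}} W^{a_1 \ldots a_N}_{\nu_1 \ldots \nu_N}(\psi_x)\, dx, \]
where
\[ \psi_x(\xi_1, \ldots, \xi_{N-1}) := \psi(x,\ x - \xi_1,\ x - \xi_1 - \xi_2,\ \ldots,\ x - \xi_1 - \cdots - \xi_{N-1}). \]
The functions $W^{a_1 \ldots a_N}_{\nu_1 \ldots \nu_N}$ are called the Wightman functions, and they have the following property because of the spectrum condition of the field.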
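Spectrum condition. The support of the Fourier transform of each $W^{a_1 \ldots a_N}_{\nu_1 \ldots \nu_N}$ is contained in $(\overline{V}_+)^{N-1}$.

The uniqueness of the vacuum vector (up to a phase) is equivalent to the following condition.

Cluster property. For $N \ge 2$, let $x$ be a spacelike vector in $\mathbb{R}^{1+3}$, let $L$ be a natural number $< N$, and let $\varphi$ and $\psi$ be tempered test functions on $(\mathbb{R}^{1+3})^L$ and $(\mathbb{R}^{1+3})^{N-L}$, respectively. Then
\[ \lim_{0 < \lambda \to \infty} w^{a_1 \ldots a_N}_{\nu_1 \ldots \nu_N}(\varphi \otimes \psi(\,\cdot - \lambda x)) = w^{a_1 \ldots a_L}_{\nu_1 \ldots \nu_L}(\varphi)\; w^{a_{L+1} \ldots a_N}_{\nu_{L+1} \ldots \nu_N}(\psi). \]

On the one hand, these properties have been deduced from the Gårding–Wightman axioms via eqn [1]. Conversely, a family of distributions labeled in the above fashion and satisfying the above properties may be used to construct a Gårding–Wightman field theory provided that two more conditions which hold for all systems of N-point functions are satisfied. This requires some elementary notation.

Define the index sets
\[ I^N := \left\{ \begin{pmatrix} a_1 \cdots a_N \\ \nu_1 \cdots \nu_N \end{pmatrix} : 1 \le a_i \le m,\ 1 \le \nu_i \le n_{a_i} \text{ for all } 1 \le i \le N \right\}, \quad N \in \mathbb{N}, \]
$I^0 := \{\emptyset\}$, and $I := \bigcup_{N \in \mathbb{N}_0} I^N$. On $I$ a concatenation is defined by
\[ \begin{pmatrix} a_1 \cdots a_N \\ \nu_1 \cdots \nu_N \end{pmatrix} \begin{pmatrix} b_1 \cdots b_M \\ \mu_1 \cdots \mu_M \end{pmatrix} := \begin{pmatrix} a_1 \cdots a_N\, b_1 \cdots b_M \\ \nu_1 \cdots \nu_N\, \mu_1 \cdots \mu_M \end{pmatrix} \]
and
\[ \emptyset\sigma := \sigma\emptyset := \sigma, \]
and an involution $*$ by
\[ \begin{pmatrix} a_1 \cdots a_N \\ \nu_1 \cdots \nu_N \end{pmatrix}^* := \begin{pmatrix} a_N^* \cdots a_1^* \\ \nu_N \cdots \nu_1 \end{pmatrix} \quad \text{and} \quad \emptyset^* := \emptyset. \]

Define an antilinear involution $*$ on $\mathcal{S}^N := \mathcal{S}((\mathbb{R}^{1+3})^N)$ by
\[ \varphi^*(x_1, \ldots, x_N) := \overline{\varphi(x_N, \ldots, x_1)} \]
for each $N \in \mathbb{N}$. Put $\mathcal{S}^0 := \mathbb{C}$ and $z^* := \bar z$ for all $z \in \mathbb{C}$.

Define $\mathcal{S}_{I^N} := \mathcal{S}^N \times I^N$, and $\mathcal{S}_I := \bigcup_N \mathcal{S}_{I^N}$. For each $\sigma \in I^N$, the set $\mathcal{S}_\sigma := \mathcal{S}((\mathbb{R}^{1+3})^N) \times \{\sigma\}$ is a linear space. On the direct sum $B_I := \bigoplus_{\sigma \in I} \mathcal{S}_\sigma$ define an associative product by
\[ (\varphi, \sigma)(\psi, \tau) := (\varphi \otimes \psi, \sigma\tau) \]
and an antilinear involution $*$ by $(\varphi, \sigma)^* := (\varphi^*, \sigma^*)$. This endows $B_I$ with the structure of a nonabelian $*$-algebra with unit element $1 = (1, \emptyset)$ (Borchers algebra).

If one defines $F_\emptyset(z) := z1$, then $w_\emptyset(z) = z$, and the Wightman functions induce a $\mathbb{C}$-linear functional $\omega$ on $B_I$ by
\[ \omega(\varphi, \sigma) := w_\sigma(\varphi). \qquad [2] \]
$\omega$ exhibits the following two properties, which are the announced additional conditions required for reconstructing the fields from the N-point functions.

Hermiticity. $\omega(\beta^*) = \overline{\omega(\beta)}$.

Positivity. $\omega(\beta^*\beta) \ge 0$.

To see Hermiticity, compute
\[ \omega((\varphi, \sigma)^*) = \langle \Omega, F_\sigma(\varphi)^* \Omega \rangle = \langle F_\sigma(\varphi)\Omega, \Omega \rangle = \overline{\omega(\varphi, \sigma)} \]
and use $\mathbb{C}$-linearity to prove the statement for arbitrary $\beta \in B_I$. For positivity, write any $\beta$ as a finite sum $\beta = (\varphi_1, \sigma_1) + \cdots + (\varphi_M, \sigma_M)$, and compute
\begin{align*}
\omega(\beta^*\beta) &= \omega\Big( \sum_{i,j=1}^{M} (\varphi_i, \sigma_i)^* (\varphi_j, \sigma_j) \Big) = \omega\Big( \sum_{i,j} (\varphi_i^* \otimes \varphi_j,\ \sigma_i^*\sigma_j) \Big) \\
&= \sum_{i,j} w_{\sigma_i^*\sigma_j}(\varphi_i^* \otimes \varphi_j) = \sum_{i,j} \langle \Omega, F_{\sigma_i^*\sigma_j}(\varphi_i^* \otimes \varphi_j)\, \Omega \rangle \\
&= \sum_{i,j} \langle \Omega, F_{\sigma_i}(\varphi_i)^* F_{\sigma_j}(\varphi_j)\, \Omega \rangle = \sum_{i,j} \langle F_{\sigma_i}(\varphi_i)\Omega, F_{\sigma_j}(\varphi_j)\Omega \rangle \\
&= \Big\| \sum_i F_{\sigma_i}(\varphi_i)\Omega \Big\|^2 \ge 0.
\end{align*}

Theorem 1 (Wightman's reconstruction theorem). Let $m$ and $n_1, \ldots, n_m$ be natural numbers, let $I^0, I^1, I^2, \ldots$, and $I$ be the above index sets, and let $B_I$ be the above Borchers algebra. Let $D^1, \ldots, D^m$ be matrix representations of $SL(2, \mathbb{C})$ in $\mathbb{C}^{n_1}, \ldots, \mathbb{C}^{n_m}$, respectively.
For each natural number $N$, let $(w_\sigma)_{\sigma \in I^N}$ be a family of distributions on $(\mathbb{R}^{1+3})^N$. Suppose the family $(w_\sigma)_{\sigma \in I}$ defined this way satisfies microcausality, covariance, spectrum condition, and the cluster property. If the linear functional $\omega$ defined on $B_I$ by eqn [2] is Hermitian and positive, then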
there is (up to unitary equivalence) a unique family $F^1, \ldots, F^m$ of Gårding–Wightman fields with $n_1, \ldots, n_m$ components such that eqn [1] holds.

The proof uses the GNS construction known from the theory of operator algebras. The Borchers algebra plays several roles. On the one hand, it is a linear space with an inner product. The Hilbert space $\mathcal{H}$ and the invariant space $D$ of the field theory are constructed from this structure. On the other hand, the Borchers algebra acts on itself as an algebra of linear operators by its own algebra multiplication. This is the structure the $*$-algebra of field operators is constructed from.

Results

The mathematical and structural analysis of quantum fields has improved the understanding of scattering theory in the different approaches mentioned above; see Bogoliubov et al. (1975) and the relevant articles in this encyclopedia. Apart from this, the following results deserve to be mentioned. Evidently, many others have to be omitted for practical reasons.

PCT Symmetry

An early famous result was Lüders's proof (1957) that all fields in the above setting exhibit PCT symmetry, that is, the symmetry under reflections in all space and time variables combined with a charge conjugation. This symmetry is exhibited by all particle reactions observed so far. The proof, like several of the main results, made extensive use of the fact that the N-point functions are boundary values of analytic functions due to the spectrum condition, and that a fundamental theorem by Bargmann, Hall, and Wightman (1957) yields invariant analytic extensions.

Reeh–Schlieder Theorem

For each field $F^a$ and each bounded open region $O \subset \mathbb{R}^{1+3}$, the vacuum vector is cyclic with respect to $\mathcal{F}^a(O)$ (Reeh and Schlieder 1961). So excitations of the vacuum vector by field operators located in $O$ are not to be considered as state vectors of a particle localized in $O$, since they are not perpendicular to the excitations by field operators located outside $O$.

Unruh Effect and Modular $P_1CT$ Symmetry

In the 1970s, Bisognano and Wichmann (1975, 1976) discovered a surprising link of symmetries to the intrinsic algebraic structure of quantum fields, which is established by the Tomita–Takesaki modular theory (see Tomita–Takesaki Modular Theory). Namely, the unitary operators implementing the Lorentz boosts on the fields are elements of modular groups. This means that a uniformly accelerated observer perceives the vacuum as a thermal state with a temperature proportional to its acceleration, corresponding to the famous Unruh effect.

In addition, it was shown that $P_1CT$ symmetries (i.e., PCT combined with rotations by the angle $\pi$) are implemented by modular conjugations (modular $P_1CT$ symmetry). Modular $P_1CT$ symmetry is a consequence of the Unruh effect (Guido and Longo 1995).

Spin and Statistics

Immediately following Lüders's PCT theorem, the spin–statistics theorem was proved for the N-point functions of the Wightman setting (Lüders and Zumino 1958, Burgoyne 1958, Dell'Antonio 1961). This was a remarkable and widely acknowledged progress. But as remarked earlier, the confinement to finite-component fields, which is used in the proof, cannot be motivated by physical first principles (i.e., in a truly axiomatic fashion). The representation $D$ of $SL(2, \mathbb{C})$ acting on the components, however, is forced to be finite dimensional by this assumption, and since the representations $D^a$ are objects of investigation, a considerable part of the result is assumed this way from the outset. Even more so, there are examples of fields with a wrong spin–statistics connection and infinitely many components.

This was one reason to continue working on the subject. At the beginning of the 1990s, it was found that the spin–statistics theorem can be derived from the symmetries discovered by Bisognano and Wichmann, and Unruh. Two approaches not referring to the number of internal degrees of freedom have been worked out: one assumes the Unruh effect (Guido and Longo 1995), the other modular $P_1CT$ symmetry (Kuckert 1995, 2005, Kuckert and Lorenzen 2005). The first approach has been generalized to conformal fields, the second to the case that the symmetry group's homogeneous part is not $SL(2, \mathbb{C})$, but only $SU(2)$.

Both approaches can be applied to infinite-component fields. They yield existence theorems; a distinguished representation is constructed from the modular symmetries, and this representation exhibits Pauli's spin–statistics connection. As mentioned before, nothing more can be expected at this level of generality. The line of argument works in both the algebraic and the Wightman setting.

A Dynamical Property of the Vacuum

One can derive the spectrum condition, the Bisognano–Wichmann symmetries/the Unruh effect, and
covariance from the condition that no (inertial or) uniformly accelerated observer can extract mechanical energy from the field in vacuo by means of a cyclic process (Kuckert 2002).

Interacting Fields

The examples of interacting quantum fields that fit into the above settings live in one or two spatial dimensions only, and their relevance for physics mainly consists in being such examples. This has contributed to some frustration and to doubts on whether one is not, in fact, proving theorems on pretty empty sets, or in other words, working on the most sophisticated theory of the free field.

The computations in quantum field theory are, like most of the computations in physics, perturbative. In order to be successful, they need to yield good agreement with experiment with reasonable computational efforts, that is, by evaluation up to the second or third order. This asymptotic convergence is more important than convergence of the series as a whole. There are low-dimensional examples of interacting Wightman fields (e.g., $(\phi^4)_2$; cf. the monograph by Glimm and Jaffe (1987)), and time will tell whether four-dimensional interacting Wightman fields exist. But there is no reason to expect convergence for general interacting fields; for example, QED does not fit into the Wightman framework.

The appropriate extension of the Wightman setting has been formulated by Epstein and Glaser (1973). It defines the S-matrix rather than the field itself as a (in general divergent) formal power series of operator-valued distributions.

The above results apply to this somewhat more modest setting as well, so the axiomatic approaches do help in understanding the known high-energy physics interactions. This even includes gauge theories (see Perturbative Renormalization Theory and BRST). The high-precision results of QED can be reproduced within this setting, and there occur no UV singularities: renormalization amounts to the need to extend distributions by fixing some parameters, that is, the renormalization constants. The infrared problem is circumvented by considering the S-matrix as a (position-dependent) distribution taking values in the unitary formal power series of distributions rather than as a single (global) unitary operator (or unitary power series).

Quantum Energy Inequalities

Energy densities of Wightman fields admit negative expectation values (Epstein, Glaser, and Jaffe 1965). This is in contrast to the positivity conditions that the energy–momentum tensors of classical general (and, hence, also special) relativity have to satisfy to ensure causality. But the conflict can be solved by smearing the densities out in space or time, as was first realized by Ford (1991). The extent to which the energy density can become negative depends on the extent to which it is smeared out: more smearing means less violation of positivity, so the classical positivity conditions are restored at medium and large scales. There are many ways to make this principle concrete. Quantum energy inequalities hold for thermodynamically well-behaved quantum fields on causally well-behaved classical spacetime backgrounds.

Bibliographic Notes

Important monographs on axiomatic quantum field theory are those by Streater and Wightman (1964), Jost (1965), Bogoliubov et al. (1975), and Bogoliubov et al. (1990). Note that the books of Bogoliubov et al. differ in setup fundamentally and that neither replaces the other. For a lecture notes volume, see also Völkel (1977), and for a review article, see Streater (1975). A valuable discussion of the Wightman axioms can also be found in the second volume of the series by Reed and Simon (1970).

The first monograph on the algebraic approach to quantum field theory is due to Haag (1992); a more recent one has been written by Araki (1999).

Concerning the sufficient conditions for switching between the Gårding–Wightman and the algebraic approach, see Wollenberg (1988) and the Ph.D. thesis of Bostelmann (2000) and references given there. Dynamical and thermodynamical foundations of standard axioms, the Bisognano–Wichmann symmetries (Unruh effect), and the spin–statistics theorem have been investigated by Kuckert (2002, 2005); see also the references given there for related work.

In different formulations and at differing degrees of mathematical sophistication, the causal approach to perturbation theory can be found in the monographs by Bogoliubov and Shirkov (1959), Scharf (1989, 2001), and Steinmann (2000). Two modern review articles have been written by Brunetti and Fredenhagen (2000) and by Dütsch and Fredenhagen (2004).

The original reference articles on the Euclidean axioms are those of Osterwalder and Schrader (1973, 1975). Note that the first one contains an error (cf. also Zinoviev (1995)). A monograph on Euclidean field theory and its relations to the other axiomatic settings of quantum field theory and to statistical mechanics is that by Glimm and Jaffe (1987).

A recent review on quantum energy inequalities is due to Fewster (2003).
Acknowledgments

The author is a fellow of the Emmy-Noether Programme (DFG). Thanks for discussions are due to Professor D Arlt.

See also: Algebraic Approach to Quantum Field Theory; C*-Algebras and Their Classification; Constructive Quantum Field Theory; Dispersion Relations; Euclidean Field Theory; Indefinite Metric; Perturbative Renormalization Theory and BRST; Quantum Field Theory: A Brief Introduction; Quantum Field Theory in Curved Spacetime; Scattering, Asymptotic Completeness and Bound States; Scattering in Relativistic Quantum Field Theory: Fundamental Concepts and Tools; Scattering in Relativistic Quantum Field Theory: The Analytic Program; Symmetries in Quantum Field Theory: Algebraic Aspects; Symmetries in Quantum Field Theory of Lower Spacetime Dimensions; Thermal Quantum Field Theory; Tomita–Takesaki Modular Theory; Two-Dimensional Models.

Further Reading

Araki H (1999) Mathematical Theory of Quantum Fields. Oxford: Oxford University Press.
Bogoliubov NN, Logunov AA, and Todorov IT (1975) Introduction to Axiomatic Quantum Field Theory (Russian original edition: Nauka (Moscow) 1969). New York: Benjamin.
Bogoliubov NN, Logunov AA, Oksak AI, and Todorov IT (1990) General Principles of Quantum Field Theory (Russian original edition: Nauka (Moscow) 1987). Dordrecht–Boston–London: Kluwer.
Bostelmann H (2000) Lokale Algebren und Operatorprodukte am Punkt (in German). Ph.D. thesis, Göttingen.
Brunetti R and Fredenhagen K (2000) Microlocal Analysis and Interacting Quantum Field Theories: Renormalization on Physical Backgrounds. Communications in Mathematical Physics 208: 623.
Dütsch M and Fredenhagen K (2004) Causal Perturbation Theory in terms of retarded products, and a proof of the Action Ward Identity, to appear in Reviews in Mathematical Physics.
Fewster CJ (2003) Energy Inequalities in Quantum Field Theory. Proceedings of the International Conference on Mathematical Physics (revised version under math-ph/0501073).
Glimm J and Jaffe A (1987) Quantum Physics: A Functional Integral Point of View, 2nd edn. Berlin–Heidelberg–New York: Springer.
Guido D and Longo R (1995) An algebraic spin and statistics theorem. Communications in Mathematical Physics 172: 517.
Haag R (1992) Local Quantum Physics. Berlin–Heidelberg–New York: Springer.
Jost R (1965) The General Theory of Quantized Fields. American Mathematical Society.
Kuckert B (2002) Covariant thermodynamics of quantum systems: passivity, semipassivity, and the Unruh effect. Annals of Physics 295: 216.
Kuckert B (2005) Spin, statistics, and reflections, I. Annales Henri Poincaré 6: 849.
Kuckert B and Lorenzen R (2005) Spin, statistics, and reflections, II. Preprint (math-ph/0512068).
Osterwalder K and Schrader R (1973) Axioms for Euclidean Green's functions. Communications in Mathematical Physics 31: 83.
Osterwalder K and Schrader R (1975) Axioms for Euclidean Green's functions. 2. Communications in Mathematical Physics 42: 281.
Reed M and Simon B (1970) Methods of Modern Mathematical Physics (4 volumes). London: Academic Press.
Scharf G (1989) Finite Quantum Electrodynamics. Berlin–Heidelberg–New York: Springer.
Scharf G (2001) Quantum Gauge Theories: A True Ghost Story. Weinheim: Wiley.
Streater RF (1975) Outline of axiomatic quantum field theory. Reports on Progress in Physics 38: 771.
Streater RF and Wightman AS (1964) PCT, Spin & Statistics, and All That. New York: Benjamin.
Völkel AH (1977) Fields, Particles, and Currents. Lecture Notes in Physics, vol. 66. Berlin–Heidelberg–New York: Springer.
Wollenberg M (1988) The existence of quantum fields for local nets of observables. Journal of Mathematical Physics 29: 2106.
Zinoviev YM (1995) Equivalence of Euclidean and Wightman field theories. Communications in Mathematical Physics 174: 1.
B
Bäcklund Transformations
D Levi, Università Roma Tre, Rome, Italy
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Bäcklund transformations appeared for the first time in the work of the geometers of the end of the nineteenth century, for instance, Bianchi, Lie, Bäcklund, and Darboux, when studying surfaces of constant curvature. If on a surface in three-dimensional Euclidean space, the asymptotic directions are taken as coordinate directions, then the surface metric may be written as
\[ ds^2 = dx^2 + 2\cos w \, dx\, dy + dy^2 \qquad [1] \]
where $w(x, y)$ is a function of the surface coordinates $x, y$. A necessary and sufficient condition for the surface to be of constant curvature is that $w$ satisfies the nonlinear partial differential equation
\[ w_{,xy} = \sin w \qquad [2] \]
where the subscript denotes partial derivative. Equation [2] is nowadays called the sine Gordon (sG) equation. Bianchi (1879), Lie (1888, 1890, 1893), and Bäcklund (1874) introduced a transformation which allows one to pass from a solution of eqn [2] to a new solution, that is, from a surface of constant curvature to a new one. Starting from the work of Clairin (1903), this transformation has been referred to as Bäcklund transformation (BT). The BT for eqn [2] reads
\[ \tilde w_{,x} = w_{,x} + 2a \sin\left( \frac{\tilde w + w}{2} \right) \qquad [3a] \]
\[ \tilde w_{,y} = -w_{,y} + \frac{2}{a} \sin\left( \frac{\tilde w - w}{2} \right) \qquad [3b] \]
where $a$ is a nonzero constant parameter and $\tilde w$ is a different solution of eqn [2]. It is immediate to prove by appropriate differentiation of eqns [3] with respect to $y$ and $x$ that both $w$ and $\tilde w$ must satisfy eqn [2]. The BT [3] provides a denumerable set of exact solutions once a solution $w$ is known. Bianchi showed that four such solutions can be related in an algebraic way:
\[ \tan\left( \frac{\tilde w' - w}{4} \right) = \frac{a_1 + a_2}{a_1 - a_2} \tan\left( \frac{w' - \tilde w}{4} \right) \qquad [4] \]
Equation [4] is derived using the permutability theorem proved by Bianchi in his Ph.D. thesis in 1879:
\[ \begin{array}{ccc} & w & \\ {}^{a_1}\swarrow & & \searrow{}^{a_2} \\ w' & & \tilde w \\ {}_{a_2}\searrow & & \swarrow{}_{a_1} \\ & \tilde w' & \end{array} \qquad [5] \]
where by the diagram
\[ w \xrightarrow{\ a\ } w' \]
we mean a BT from $w$ to $w'$ with parameter $a$.

For sG equation [2] a trivial solution is given, for example, by $w(x, y) = -\pi$. Then, from eqn [3a] we get
\[ \tilde w(x, y) = 2 \arcsin\left( \frac{1 - e^{2(ax + \phi(y))}}{1 + e^{2(ax + \phi(y))}} \right) \]
Introducing this result in eqn [3b], we get $\phi_{,y} = -1/a$. So, the application of the BT [3] to sG equation gives the nontrivial solution
\[ \tilde w = 4 \arctan\left( \frac{1 - e^{[ax - y/a]}}{1 + e^{[ax - y/a]}} \right) \qquad [6] \]
Clairin (1903) extended the results of Bäcklund to the case of a generic partial differential equation of second order,
\[ F(x, y, w, w_{,x}, w_{,y}, w_{,xx}, w_{,xy}, w_{,yy}) = 0 \qquad [7] \]
by assuming that
\[ w_{,x} = f(w, \tilde w, \tilde w_{,x}, \tilde w_{,y}), \qquad w_{,y} = g(w, \tilde w, \tilde w_{,x}, \tilde w_{,y}) \qquad [8] \]
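The one-soliton solution [6] and the BT [3] can be checked numerically. The sketch below is only an illustration under stated assumptions: the signs that were lost in the printed formulas are taken as $\tilde w_{,x} = w_{,x} + 2a\sin((\tilde w + w)/2)$, $\tilde w_{,y} = -w_{,y} + (2/a)\sin((\tilde w - w)/2)$, the seed is the constant solution $w = -\pi$, and $\tilde w = 4\arctan[(1 - e^{z})/(1 + e^{z})]$ with $z = ax - y/a$. Derivatives are approximated by central finite differences, so the residuals are small but not exactly zero.

```python
import math


def seed(x, y):
    """Trivial seed solution of w_xy = sin w (assumed here to be w = -pi)."""
    return -math.pi


def transformed(x, y, a):
    """Candidate one-soliton solution [6], with z = a*x - y/a (assumed signs)."""
    z = a * x - y / a
    return 4.0 * math.atan((1.0 - math.exp(z)) / (1.0 + math.exp(z)))


def d_dx(f, x, y, h=1e-5):
    """Central finite-difference approximation of the x-derivative."""
    return (f(x + h, y) - f(x - h, y)) / (2.0 * h)


def d_dy(f, x, y, h=1e-5):
    """Central finite-difference approximation of the y-derivative."""
    return (f(x, y + h) - f(x, y - h)) / (2.0 * h)


def bt_residuals(x, y, a):
    """Left-hand side minus right-hand side of eqns [3a] and [3b]."""
    w = seed

    def wt(s, t):
        return transformed(s, t, a)

    res_a = d_dx(wt, x, y) - (
        d_dx(w, x, y) + 2.0 * a * math.sin(0.5 * (wt(x, y) + w(x, y))))
    res_b = d_dy(wt, x, y) - (
        -d_dy(w, x, y) + (2.0 / a) * math.sin(0.5 * (wt(x, y) - w(x, y))))
    return res_a, res_b


def sine_gordon_residual(f, x, y, h=1e-3):
    """w_xy - sin(w), with the mixed derivative from a four-point stencil."""
    mixed = (f(x + h, y + h) - f(x + h, y - h)
             - f(x - h, y + h) + f(x - h, y - h)) / (4.0 * h * h)
    return mixed - math.sin(f(x, y))


a = 1.3
x0, y0 = 0.3, -0.7
res_a, res_b = bt_residuals(x0, y0, a)
res_sg = sine_gordon_residual(lambda s, t: transformed(s, t, a), x0, y0)
```

With a different sign convention in [3] the same check applies after flipping the corresponding signs in `bt_residuals` and in the exponent `z`.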
If the compatibility of eqns [8]
\[ f_{,y} - g_{,x} = 0 \qquad [9] \]
is identically satisfied by eqn [7] for the variable $\tilde w(x, y)$, then we say that eqns [8] are an auto-Bäcklund transformation for eqn [7]. In this case, eqns [8] transform a solution of eqn [7] into a new solution of the same equation. Thus, eqns [8] simplify the problem of finding solutions of eqn [7]. Given one solution $w(x, y)$ of eqn [7], the existence of a BT reduces the problem of integrating eqn [7] into that of solving two first-order ordinary differential equations. From this point of view, the Cauchy–Riemann relations
\[ w_{,x} = \tilde w_{,y}, \qquad w_{,y} = -\tilde w_{,x} \qquad [10] \]
for the Laplace equation
\[ w_{,xx} + w_{,yy} = 0 \qquad [11] \]
are a BT ante litteram (however, without a free parameter).

Consider the case when $\tilde w(x, y)$ satisfies a different partial differential equation,
\[ G(x, y, \tilde w, \tilde w_{,x}, \tilde w_{,y}, \tilde w_{,xx}, \tilde w_{,xy}, \tilde w_{,yy}) = 0 \qquad [12] \]
In this case, one still has a BT, but not an auto-BT. The best-known cases are when $F_1 = w_{,y} + w_{,xxx} + w w_{,x}$ and $G_1 = \tilde w_{,y} + \tilde w_{,xxx} + \tilde w^2 \tilde w_{,x}$, and $F_2 = w_{,xy} - e^w$ and $G_2 = \tilde w_{,xy}$ (Lamb 1976). In the first case, the BT relates the Korteweg–de Vries (KdV) equation to the modified KdV equation and this transformation paved the way to the discovery of the complete integrability of the KdV equation by Gardner et al. (1967). In the second case, the BT relates the Liouville equation to the wave equation, and can be used to solve it completely. Due to the first example, often a non-auto-BT is denoted as Miura transformation.

One can now state an operative definition of BT, extending the results of Bäcklund and Clairin to more general equations.

Definition 1. Consider two partial differential equations of order $m_1$ and $m_2$:
\[ F_1(x, u, u_{(1)}, u_{(2)}, \ldots, u_{(m_1)}) = 0 \qquad [13a] \]
\[ F_2(x, \tilde u, \tilde u_{(1)}, \tilde u_{(2)}, \ldots, \tilde u_{(m_2)}) = 0 \qquad [13b] \]
where $x \in \mathbb{R}^n$ and $(u, \tilde u) \in \mathbb{C}^p$, and $u_{(k)}$ is the set of $k$th-order derivatives of $u$. The set of $n$ equations
\[ G_j(x, u, u_{(1)}, \ldots, u_{(s_1)}, \tilde u, \tilde u_{(1)}, \ldots, \tilde u_{(s_2)}) = 0, \qquad j = 1, 2, \ldots, n \qquad [14] \]
with $s_1 < m_1$ and $s_2 < m_2$, represents the BT of eqns [13] iff the compatibility of eqns [14] is identically satisfied on the solutions of eqns [13] and $G_j$ depends on a set of essential arbitrary constant parameters.

The Clairin formulation [8] and the classical BT for the sG [3] are clearly special subcases of this definition. When a solution of $F_1 = 0$ is known, a solution of $F_2 = 0$ is obtained by solving a set of lower-order partial differential equations. By a proper choice of the BT parameters, once a new solution is obtained by solving the BT [14], one can use the obtained solution as a starting point to construct another one, and so on. In this way, one can construct a whole ladder of solutions, a priori a denumerable set of solutions. This same construction has been applied also to the case of functional equations. In particular, it has been considered for the case of differential–difference and difference–difference equations both for finite (dynamical systems (Wojciechowski 1982)) and infinite lattices (Toda 1989).

In the case when $F_1$ and $F_2$ represent the same equation, $s_1 = s_2 = 1$ and the BTs $G_j = 0$ are linear in $u_{(1)}$, then Definition 1 is strictly related to the notion of nonclassical symmetry or conditional symmetry (Levi and Winternitz 1989, Olver 1993), an extension of the concept of Lie symmetry used to reduce and integrate a differential equation. In the case of the nonclassical symmetries, the known solution $\tilde u$ is included in the arbitrary $x$-dependent coefficients of the transformation. In this case, the BT is just a way to construct an explicit solution of the differential equation [7].

Definition 1 is often too general to be able to get explicit results. It is constructive for any partial differential equation, linear or nonlinear, but if one is not able to get a nontrivial BT this does not mean that a BT does not exist. As noted later, the existence of an auto-BT is associated to the existence of an infinity of symmetries, and this is a condition for the exact integrability of eqn [13] (Fokas 1980, Ibragimov and Shabat 1980). So, the existence of a BT is closely related to the integrability of eqn [13].

Bäcklund via Integrability

One can derive the BT from the integrability properties of eqn [13a]. Equation [13a] is said to be integrable if it can be written as the compatibility condition of an overdetermined system of linear partial differential equations for an auxiliary function depending on a free parameter belonging to the
complex $\mathbb{C}$ plane. The prototype of such a situation is given by the Lax pair for the KdV equation
\[ u_{,t} + u_{,xxx} - 6uu_{,x} = 0 \qquad [15] \]
introduced by Lax (1968):
\[ L\psi = k^2 \psi; \qquad L = -\partial_x^2 + u(x, t) \qquad [16a] \]
\[ \psi_{,t} = -M\psi; \qquad M = 4\partial_{xxx} - 3(u\partial_x + \partial_x u) \qquad [16b] \]
where $k$ is a free parameter and $\psi = \psi(x, t; k)$. As eqn [16a] is nothing else but the stationary Schrödinger equation, the function $\psi$ can be interpreted as a wave function, and $k^2$ is the spectral parameter corresponding to the potential $u(x, t)$. The condition for the existence of a solution of the overdetermined system of eqns [16] is given by the operator equation
\[ L_{,t} = [L, M] \qquad [17] \]
the so-called Lax equation. In the case of asymptotically bounded potentials, eqn [16a] defines the spectrum uniquely. Introducing the following asymptotic boundary conditions for the wave function $\psi$,
\[ \psi(x, t; k) \to T(k, t)\, e^{-ikx} \quad (x \to -\infty), \qquad \psi(x, t; k) \to e^{-ikx} + R(k, t)\, e^{ikx} \quad (x \to +\infty) \qquad [18] \]
where $R(k, t)$ and $T(k, t)$ are, respectively, the reflection and the transmission coefficient, the spectrum is defined in the complex plane of the variable $k$ by
\[ S[u] = \{ R(k, t),\ -\infty < k < \infty;\ p_n,\ c_n(t),\ n = 1, 2, \ldots, N \} \qquad [19] \]
where $p_n$ are the bound state parameters corresponding to isolated singularities of the reflection coefficients on the imaginary positive $k$-axis corresponding to a solution $\psi_n(x, t; p_n)$ of the spectral problem vanishing for $x \to \pm\infty$ and such that
\[ \lim_{x \to +\infty} e^{p_n x}\, \psi_n(x, t; p_n) = 1 \qquad [20] \]
and $c_n$ are some functions of $t$ related to the residues of $R(k, t)$ at the poles $p_n$. There is a one-to-one correspondence between the evolution of the potential $u(x, t)$ in eqn [15] and that of the spectrum $S[u]$ of the Schrödinger spectral problem [16a]. In particular, for the KdV, taking into account eqn [16b], the evolution of the reflection coefficient $R(k, t)$ is given by
\[ \frac{dR(k, t)}{dt} = 8ik^3 R(k, t) \qquad [21] \]
In eqn [21] and henceforth, $d/dt$ denotes the total derivative with respect to $t$.

In the following, for the sake of the simplicity of exposition and for the concreteness of the presentation, all the results presented on the BT will be derived for the KdV equation. Similar results can be obtained and have been obtained in the literature for many classes of integrable partial differential equations in two and three dimensions and for differential–difference and difference–difference equations. For a partial review of the available recent literature on the subject, see Rogers and Shadwick (1982) and Coley et al. (2001).

A more general form of introducing the nonlinear partial differential equation as a compatibility of an overdetermined system of linear equations has been provided by Zakharov and Shabat (1979) with the dressing method (DM). In the DM, the differential equations [16] are substituted by a matrix system of linear equations
\[ \Psi_{,x} = U(u(x, t); k)\, \Psi \qquad [22a] \]
\[ \Psi_{,t} = V(u(x, t); k)\, \Psi \qquad [22b] \]
where $\Psi = \Psi(x, t; k)$ and $U$ and $V$ are matrix functions. The existence of a nonsingular solution of the system of linear equations [22] requires that the matrix functions $U$ and $V$ satisfy the equation
\[ U_{,t} - V_{,x} + [U, V] = 0 \qquad [23] \]
often called zero-curvature condition. The KdV equation [15] in the DM is obtained by choosing
\[ U(u(x, t); k) = \begin{pmatrix} ik & u(x, t) \\ 1 & -ik \end{pmatrix}, \qquad V(u(x, t); k) = \begin{pmatrix} u_x + 2iku + 4ik^3 & 2u(u + 2k^2) - 2iku_x - u_{xx} \\ 2u + 4k^2 & -u_x - 2iku - 4ik^3 \end{pmatrix} \qquad [24] \]
The existence of an auto-BT implies the existence of a differential equation (see Definition 1) which relates two solutions of the same nonlinear equation. The new solution $\tilde u(x, t)$ of eqn [15] will be associated to a different Lax operator and a different spectral problem (but of the same operational form)
\[ \tilde L = -\partial_{xx} + \tilde u(x, t) \qquad [25a] \]
\[ \tilde L \tilde\psi = k^2 \tilde\psi \qquad [25b] \]
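As a worked check that the Lax equation [17] reproduces the KdV equation [15], the commutator can be expanded term by term. The derivation below is a sketch that assumes the sign conventions $L = -\partial_x^2 + u$, $M = 4\partial_x^3 - 3(u\partial_x + \partial_x u)$ and $\psi_{,t} = -M\psi$; the signs are not legible in the printed formulas, and these are the ones for which the result agrees with eqn [15].

```latex
% Assumed conventions (signs reconstructed, not taken verbatim from the source):
%   L = -\partial_x^2 + u,   M = 4\partial_x^3 - 6u\partial_x - 3u_x,   \psi_t = -M\psi .
% Here 3(u\partial_x + \partial_x u) = 6u\partial_x + 3u_x was used to rewrite M.
\begin{align*}
% Differentiating L\psi = k^2\psi in t and inserting \psi_t = -M\psi:
L_t\psi - LM\psi &= -k^2 M\psi = -ML\psi
  \quad\Longrightarrow\quad L_t = [L, M], \\[4pt]
% Elementary commutators, acting on an arbitrary smooth function:
[\partial_x^2,\, u\,\partial_x] &= 2u_x\partial_x^2 + u_{xx}\partial_x, \\
[\partial_x^2,\, u_x] &= 2u_{xx}\partial_x + u_{xxx}, \\
[\partial_x^3,\, u] &= 3u_x\partial_x^2 + 3u_{xx}\partial_x + u_{xxx}, \\
[u,\, u\,\partial_x] &= -u\,u_x, \\[4pt]
% Assembling [L, M] from these pieces:
[L, M] &= 6\,[\partial_x^2, u\,\partial_x] + 3\,[\partial_x^2, u_x]
          - 4\,[\partial_x^3, u] - 6\,[u, u\,\partial_x] \\
       &= (12 - 12)\,u_x\partial_x^2 + (6 + 6 - 12)\,u_{xx}\partial_x
          + (3 - 4)\,u_{xxx} + 6u\,u_x \\
       &= -u_{xxx} + 6u\,u_x .
\end{align*}
% Since L_t = u_t, the Lax equation L_t = [L, M] is the KdV equation:
\[
u_t + u_{xxx} - 6u\,u_x = 0 .
\]
```

All differential-operator terms cancel, so the operator identity [17] reduces to a scalar equation for $u$ alone, which is what makes the pair $(L, M)$ a Lax pair.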
The existence of a relation between the potentials $u(x, t)$ and $\tilde u(x, t)$ thus implies that there must be a $(u, \tilde u; k)$-dependent operator $D$ such that

$$ \tilde\psi = D\psi \tag{26} $$

The compatibility of eqns [16a], [25b], and [26] implies that $\tilde L D\psi = k^2 D\psi$, that is,

$$ \tilde L D = D L \tag{27} $$

Equation [27] is the auto-BT in the Lax formalism. If $L$ and $\tilde L$ are two different spectral problems related to two different nonlinear partial differential equations, then eqn [27] will provide a Miura transformation. In the DM, the requirement of the existence of a BT is given again by eqn [26] with $\tilde\psi$ and $\psi$ substituted by $\tilde\Psi$ and $\Psi$ and the operator $D$ substituted by a matrix function $D$. The BT in the DM is given by

$$ D_{,x} = U(\tilde u(x, t); k)\, D - D\, U(u(x, t); k) \tag{28a} $$

$$ D_{,t} = V(\tilde u(x, t); k)\, D - D\, V(u(x, t); k) \tag{28b} $$

In the particular case of the Hilbert–Riemann problem with zeros, providing the soliton solutions, the matrix $D$ can be expressed as a function of $\Psi$. In this way, one derives the Moutard or Darboux transformation (DT) (Moutard 1878, Levi et al. 1984), the most efficient way to get soliton solutions of the nonlinear partial differential equation.

Given a linear ordinary differential equation for the unknown $\psi$, depending on a set of arbitrary functions $u(x)$ and parameters $k$, the DT provides a discrete transformation which leaves the equation invariant. In the particular case of the KdV equation associated with the stationary Schrödinger spectral problem [16a], we have

$$ \tilde u(x, t) = u(x, t) - 2\,[\log F(x, t)]_{,xx} \tag{29a} $$

$$ \tilde\psi(x, t; k) = -\frac{i}{k + ip}\left[\psi_{,x}(x, t; k) - \frac{F_{,x}(x, t)}{F(x, t)}\,\psi(x, t; k)\right] \tag{29b} $$

where the intermediate wave function

$$ F(x, t) = \psi(x, t; k = ip) + a\,\psi(x, t; k = -ip) $$

is a linear combination of the Jost solutions of the Schrödinger spectral problem with $p$ a real parameter and $a$ an arbitrary constant. If one looks for an equation involving only the potentials $u$ and $\tilde u$, from eqns [29], one gets the BT for the KdV equation. Given a trivial solution of the KdV equation, together with the corresponding solution of the spectral problem, eqn [29a] provides a new solution of the KdV, while eqn [29b] gives a new solution of the spectral problem. This procedure can be carried out recursively and gives a ladder of explicit solutions for the KdV equation.

The DM is a particularly simple setting in which one can derive DTs. In fact, expressing the matrix $D$ in terms of $\Psi$, eqn [28a] gives a relation between the potentials of the type given by eqn [29a], while eqn [26] gives eqn [29b]. Depending on the form of the matrix $D$ in terms of $k$, one can introduce more parameters in the DT. The classical DT [29] depends on just one parameter; however, in the case of the Schrödinger spectral problem [16a], one can also have DTs depending on two parameters, a TDT.

A more general DT, which can provide solutions even when the initial solution is not bounded asymptotically, can be obtained for many equations and, in particular, also for the KdV equation. This is obtained in a particular limit of the TDT when the parameters coincide (Levi 1988) and it is often referred to as binary DT (Matveev and Salle 1991). The binary DT for the KdV is given by

$$ \tilde u(x, t) = u(x, t) - 2\,[\log F(x, t)]_{,xx} \tag{30a} $$

$$ \tilde\psi(x, t; k) = \frac{1}{k^2 - \kappa^2}\left\{\left[k^2 - \kappa^2 - \frac{F_{,xx}(x, t)}{2F(x, t)}\right]\psi(x, t; k) + \frac{F_{,x}(x, t)}{F(x, t)}\,\psi_{,x}(x, t; k)\right\} \tag{30b} $$

where $\kappa$ is a value of $k$ for which the function $\psi(x, t; k)$ is asymptotically bounded at $\infty$ and the function $F(x, t)$ is given by

$$ F(x, t) = 1 + \beta \int_x^\infty \psi(y, t; \kappa)^2\, dy \tag{31} $$

with $\beta$ an arbitrary constant. The corresponding BT obtained eliminating the function $F$ from eqns [30] reads

$$ \tilde q_{,xx} - q_{,xx} + \frac{1}{8}(\tilde q - q)^3 + \left(\tilde q_{,x} + q_{,x} - 2g(x) + 2\kappa^2\right)(\tilde q - q) - \frac{1}{2}\,\frac{(\tilde q_{,x} - q_{,x})^2}{\tilde q - q} = 0 \tag{32} $$

where $q = \int_x^\infty u_0(y, t)\, dy$ with $u_0(x, t) = u(x, t) - g(x)$, the asymptotically bounded part of $u(x, t)$, and $g(x)$ its asymptotic behavior, and $\tilde q = \int_x^\infty \tilde u_0(y, t)\, dy$ with $\tilde u_0(x, t) = \tilde u(x, t) - g(x)$.

Once the Lax operator $L$ is given, we can obtain in a constructive way the operators $M$ which give the admissible nonlinear partial differential

equations and the operators D which give the Backlund and Symmetries
admissible BT. A technique to do so is provided by
A symmetry of the nonlinear equation [15] is given
the so-called Lax technique introduced by Bruschi
by a flow commuting with it, that is, by an
and Ragnisco (1980ac). Using the Lax technique,
equation
we can easily obtain the nonlinear partial differ-
ential equations and BT associated with the Lax u; f u; ux ; ut ; . . . 37
operator [16a] both in the isospectral and non-
isospectral case (when k,t = 0 and when k,t 6 0) where  is the group parameter, u = u(x, t; ), and the
and the corresponding evolution of the spectrum.  derivative of [15] is zero on its set of solutions.
We have A group transformation is obtained by integrating it.
Usually this is possible only when eqn [37] is a
u;t f L; tux gL; txux 2u 33a quasilinear partial differential equation of the first
order. Taking into account the evolution of the
k;t kg4k2 ; t spectrum of the KdV equation [15], it is easy to
dRk; t 33b prove that its symmetries are given by
2ikf 4k2 ; tRk; t ( )
dt X
1 X
1
n n
u; n L  3 n tL u;x
F~
u  u G 1 0 33c n0 n0
( )
X
1
2 2 n Ln xu;x 2u 38
~ t F4k  2ikG4k Rk; t
Rk; 33d n0
F4k2 2ikG4k2
where n and n are a set of constant parameters.
where the functions f, g, F, and G are entire For each choice of the parameters n and n ,
functions of their first argument and the recursive one gets a symmetry of the KdV equation [15].
operators L and  are given by With eqn [38] one can associate the following
evolution of the reflection coefficient R(k, t; ):
Lf x f;xx x  4ux; tf x (
Z 1 X
1
dR
2u;x x; t f y dy 34a 2ik n 4k2 n
x d n0
)
f x f;xx x  2~ux; t ux; tf x X 1
2 n1
Z 1 3 n t4k R 39
n0
 f y dy 34b
x
and of the spectral parameter k
u;x x; t u;x x; tf x ~
f x ~ ux; t  ux; t X
1
Z 1 k; n 4k2 n k 40
 uy; t  uy; tf y dy
~ 34c n0
x
As (1/2)L 1 = xu,x 2u, one can add to the
In the limit when u ! u the operator  ! L. A BT symmetries [38] the exceptional one (which has no
is obtained by choosing the functions F and G in spectral counterpart as u is not bounded
eqn [33c]. The simplest BT is obtained by setting asymptotically):
F =  and G = 1:
  u; 1 6tu;x 41
~ v  v   12~
v;x v;x ~ v  v 0 35
By a proper natural choice of the constant para-
with u(x, t) = v,x (x, t) and  is the Backlund meters n and n , one can define two infinite series
parameter. By combining together BT of the form of symmetries. The first one is obtained by choosing
[35] with different parameters as in eqn [5], we get n = 0 and n =
n, m with m = 1, 2, . . . , 1 and can
the permutability theorem for the KdV BTs: be denoted as the isospectral series as k, = 0. This is
formed by commuting symmetries. The second one
1 2 v0  v
~ is given by n = 0 and n =
n, m with m = 1, 2, . . . , 1
v0 v 
~ 36 and can be denoted as the nonisospectral series as
1  2 1/2v0  ~v
k, 6 0. The nonisospectral symmetries have a
Its proof is immediate from the point of view of the nonzero commutation relation among themselves
spectrum. and with the isospectral ones.
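The remark that the permutability theorem is immediate "from the point of view of the spectrum" can be made concrete with a few lines of arithmetic. The sketch below assumes that the simplest BT with parameter $\alpha$ multiplies the reflection coefficient by a Möbius factor of the form $(\alpha - 2ik)/(\alpha + 2ik)$; the exact sign convention of this factor is an assumption here, and the function names and sample values are hypothetical. Since each BT acts on $R(k)$ by multiplication, two BTs commute whatever the convention.

```python
def bt_factor(alpha, k):
    """Assumed spectral multiplier of the simplest BT with parameter alpha."""
    return (alpha - 2j * k) / (alpha + 2j * k)


def apply_bt(reflection, alpha, k):
    """Reflection coefficient of the transformed potential (assumed form)."""
    return bt_factor(alpha, k) * reflection


k = 0.8                # sample real wave number
R0 = 0.3 - 0.4j        # sample value of R(k) for the starting potential
a1, a2 = 1.5, 2.5      # two Baecklund parameters

R12 = apply_bt(apply_bt(R0, a1, k), a2, k)  # BT with a1, then a2
R21 = apply_bt(apply_bt(R0, a2, k), a1, k)  # BT with a2, then a1

print(abs(R12 - R21))            # the two orders agree
print(abs(bt_factor(a1, k)))     # unit modulus for real alpha and k
```

For real $\alpha$ and $k$ the factor has unit modulus, so in this sketch the BT changes only the phase of the reflection coefficient.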

Except for a few Lie point symmetries (given by which is an integrable differentialdifference
eqn [41] and by choosing inside the series [38] those approximation to the KdV equation or
with different from zero only 0 or 0 or 1 ) they
are all generalized symmetries (Olver 1993). By wn 1; t;t wn; t;t


analyzing their spectrum, it is easy to prove that the wn 1; t wn; t
2a sin 46
choice [38] is such that they are all independent. For 2
the isospectral class, the evolution of the spectrum is a discrete integrable differentialdifference approxima-
simple and can be integrated to provide the group tion to the sG equation (Hirota 1977, Orfanidis 1978).
transformation of the spectrum As the nonlinear superposition formulas are
Rk; t;  Rk; t purely algebraic relations involving potentials asso-
" ( )# ciated with integrable nonlinear partial differential
X
1
equations, one can interpret them as difference
2 n
 exp 2ik n 4k  42
n0
difference equations. In the case of the sG equation
from eqn [7], we have
Let us now consider the simplest BT obtained by
wn1;m1  wn;m
choosing, in eqn [33c], F() =  and G() = 1, where  
 is an arbitrary parameter. In the spectral space, this a1 a2 wn;m1  wn1;m
4 arctan1 tan 47
corresponds to the following change of the spectrum: a1  a2 4
where w(x, t) = wn, m , w(x, t) = wn1, m , w0 (x, t) =
~ t   2ik Rk; t
Rk; 43
 2ik wn, m1 , and w0 (x, t) = wn1, m1 . In a similar manner,
from [36], one gets
Defining R(k, t) = R(k, t; ), eqn [42] is equal to
eqn [43] iff 1 2 vn1;m  vn;m1 
vn1;m1 vn;m  48
1  2 12 vn1;m  vn;m1 
2
n  ; n 0; 1; . . . ; 1 44
2n1 2n 1 The continuous limit of eqn [47], obtained by setting
x = 1 n and y = 2 m and choosing
So we need an infinite number of symmetries to
be able to reconstruct the change of the spectrum a1 1 2

given by the BT. This shows that the existence of a BT a2 4
is strictly connected to the existence of an infinity of gives back eqn [2] (Rogers and Schief 1997). It is
symmetries which is a condition for the exact worth mentioning that one can also use known
integrability of the nonlinear partial differential nonlinear lattice equations to construct BT for
equation (Fokas 1980, Ibragimov and Shabat 1980). nonlinear partial differential equations (Levi 1981).
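The nonlinear superposition formula [47] for the sG equation lends itself to a quick numerical experiment. The sketch below is a hedged illustration, not the article's own construction: it assumes the light-cone form $w_{xt} = \sin w$ of the sG equation, uses the standard one-soliton solutions $4\arctan\exp(a x + t/a)$ obtained from the vacuum $w = 0$, combines two of them algebraically, and checks by finite differences that the result again solves the equation. All function names and parameter values are hypothetical.

```python
import math


def one_soliton(a, x, t):
    """One-soliton of w_xt = sin(w) with Baecklund parameter a (assumed form)."""
    return 4.0 * math.atan(math.exp(a * x + t / a))


def superpose(w0, w1, w2, a1, a2):
    """Algebraic superposition: w12 = w0 + 4 arctan[(a1+a2)/(a1-a2) tan((w1-w2)/4)]."""
    ratio = (a1 + a2) / (a1 - a2)
    return w0 + 4.0 * math.atan(ratio * math.tan((w1 - w2) / 4.0))


def two_soliton(x, t, a1=1.0, a2=2.0):
    """Two-soliton solution built from the vacuum and two one-solitons."""
    return superpose(0.0, one_soliton(a1, x, t), one_soliton(a2, x, t), a1, a2)


def residual(x, t, h=1e-3):
    """Central-difference estimate of w_xt - sin(w) for the superposed solution."""
    w = two_soliton
    w_xt = (w(x + h, t + h) - w(x + h, t - h)
            - w(x - h, t + h) + w(x - h, t - h)) / (4.0 * h * h)
    return w_xt - math.sin(w(x, t))


max_residual = max(abs(residual(0.5 * i, 0.5 * j))
                   for i in range(-4, 5) for j in range(-4, 5))
print(max_residual)
```

No derivative of the new solution is integrated anywhere: the two-soliton is produced by arithmetic alone, which is why the same relation can be read as a difference equation on a lattice.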

See also: Integrable Systems and Discrete Geometry;


Discretization via Backlund Integrable Systems: Overview; Painleve Equations;
BTs, apart from providing classes of exact solutions Solitons and KacMoody Lie Algebras; Toda Lattices.
to nonlinear equations, play a very important role in
the discretization of partial differential equations. As
noted earlier, an auto-BT is a differential relation Further Reading
between two different solutions of the same non-
linear partial differential equation. If it is assumed Backlund AV (1874) Einiges uber Curven und Flachentransfor-
that the new solution u is just the old solution u mationen. Lund Universitets Arsskrift 10: 112.
Bianchi L (1879) Ricerche sulle superficie a curvatura costante e sulle
computed in a different point of a lattice, then the elicoidi. Annali della R. Scuola normale superiore di Pisa 2: 285.
BT becomes just a differentialdifference equation Bruschi M and Ragnisco O (1980a) Existence of a Lax pair for
(Chiu and Ladik 1977, Levi and Benguria 1980). any member of the class of nonlinear evolution equations
This can be carried out also at the level of the associated to the matrix Schrodinger spectral problem. Lettere
associated compatibility condition and in such a al Nuovo Cimento 29: 321326.
Bruschi M and Ragnisco O (1980b) Extension of the Lax method
way one is able to also obtain its Lax pair. This to solve a class of nonlinear evolution equations with
demonstrates the integrability of the differential x-dependent coeffcients associated to the matrix Schrodinger
difference equation spectral problem. Lettere al Nuovo Cimento 29: 327330.
Bruschi M and Ragnisco O (1980c) Backlund transformations
vn 1; t;t vn; t;t vn 1; t  vn; t and Lax technique. Lettere al Nuovo Cimento 29: 331334.
 Chiu S-C and Ladik JF (1977) Generating exactly soluble
   12vn 1; t  vn; t 0 45 nonlinear discrete evolution equations by a generalized

Wronskian technique. Journal of Mathematical Physics 18: 690–700.
Clairin J (1903) Sur quelques équations aux dérivées partielles du second ordre. Annales de la Faculté des Sciences de Toulouse pour les Sciences Mathématiques et les Sciences Physiques. Série 2 5: 437–458.
Coley A, Levi D, Milson R, Rogers C, and Winternitz P (eds.) (2001) Bäcklund and Darboux Transformations. The Geometry of Solitons. Proceedings of the AARMS-CRM Workshop, Halifax, NS, June 4–9, 1999. CRM Proceedings and Lecture Notes, vol. 29. Providence, RI: American Mathematical Society.
Faddeev LD and Takhtajan LA (1987) Hamiltonian Methods in the Theory of Solitons. Berlin: Springer.
Fokas AS (1980) A symmetry approach to exactly solvable evolution equations. Journal of Mathematical Physics 21: 1318–1325.
Gardner CS, Greene JM, Kruskal MD, and Miura RM (1967) Method for solving the Korteweg–de Vries equation. Physical Review Letters 19: 1095–1097.
Hirota R (1977) Nonlinear partial difference equations. III. Discrete sine-Gordon equation. Journal of the Physical Society of Japan 43: 2079–2086.
Ibragimov NH and Shabat AB (1980) Infinite Lie–Bäcklund algebras (in Russian). Funktsional. Anal. i Prilozhen 14: 79–80.
Lamb GL (1976) Bäcklund transformations at the turn of the century. In: Miura RM (ed.) Bäcklund Transformations, pp. 69–79. Berlin: Springer.
Lax PD (1968) Integrals of nonlinear equations of evolution and solitary waves. Communications in Pure and Applied Mathematics 21: 467–490.
Levi D (1981) Nonlinear differential difference equations as Bäcklund transformations. Journal of Physics A: Mathematical and General 14: 1083–1098.
Levi D (1988) On a new Darboux transformation for the construction of exact solutions of the Schroedinger equation. Inverse Problems 4: 165–172.
Levi D and Benguria R (1980) Bäcklund transformations and nonlinear differential difference equations. Proceedings of the National Academy of Sciences USA 77: 5025–5027.
Levi D and Winternitz P (1989) Non-classical symmetry reduction: example of the Boussinesq equation. Journal of Physics A: Mathematical and General 22: 2915–2924.
Levi D, Ragnisco O, and Sym A (1984) Dressing method vs. classical Darboux transformation. Il Nuovo Cimento 83B: 34–42.
Lie S (1888, 1890, 1893) Theorie der Transformationsgruppen. Leipzig: B.G. Teubner.
Matveev VB and Salle LA (1991) Darboux Transformations and Solitons. Berlin: Springer.
Moutard Th-F (1878) Sur la construction des équations de la forme (1/z)(d²z/dx dy) = λ(x, y), qui admettent une intégrale générale explicite. Journal de l'École Polytechnique, Paris 28: 1–11.
Olver PJ (1993) Applications of Lie Groups to Differential Equations. New York: Springer.
Orfanidis SJ (1978) Discrete sine-Gordon equations. Physical Review D 18: 3822–3827.
Rogers C and Schief WK (1997) The classical Bäcklund transformation and integrable discretization of characteristic equations. Physics Letters A 232: 217–223.
Rogers C and Shadwick WF (1982) Bäcklund Transformations and Their Applications. New York: Academic Press.
Toda M (1989) Theory of Nonlinear Lattices. Berlin: Springer.
Wojciechowski S (1982) The analogue of the Bäcklund transformation for integrable many-body systems. Journal of Physics A: Mathematical and General 15: L653–L657.
Zakharov VE and Shabat AB (1979) Integration of the nonlinear equations of mathematical physics by the method of the inverse scattering problem. II (Russian). Funktsional Analiz i ego Prilozheniya 13: 13–22. (English translation: Functional Analysis and Applications 13: 166–173 (1980)).

Batalin–Vilkovisky Quantization
A C Hirshfeld, Universität Dortmund, Dortmund, Germany
© 2006 Elsevier Ltd. All rights reserved.

Introduction

The Batalin–Vilkovisky formalism for quantizing gauge theories has a long history of development. It begins with the Faddeev–Popov procedure for quantizing Yang–Mills theory, involving the Faddeev–Popov ghost fields (Faddeev and Popov 1967). It continued with the discovery of BRST symmetry by Becchi et al. (1976). Then Zinn-Justin (1975) introduced sources for these transformations, and a symmetric structure in the space of fields and sources in his study of renormalizability of these theories. Finally, Batalin and Vilkovisky (1981) systematized and generalized these developments. A more detailed account of this history can be found in Gomis et al. (1994), where many worked examples of the Batalin–Vilkovisky formalism are given. At the present time, it is the most general treatment available. Alexandrov, Kontsevich, Schwarz, and Zaboronsky (AKSZ 1997) have presented a geometric interpretation for the case in which the action is topologically invariant.

Structure of the Set of Gauge Transformations

Consider a system whose dynamics is governed by a classical action $S[\phi^i]$ which depends on the fields $\phi^i(x)$, $i = 1, \ldots, n$. We employ a compact notation in which the multi-index $i$ may denote the various fields involved, the discrete indices on which they depend, and the dependence on the spacetime variables as well. The generalized summation convention then means that a repeated index may denote not only a sum over discrete variables, but also integration over the spacetime variables. $\epsilon_i = \epsilon(\phi^i)$ denotes the

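Grassmann parity of the fields. Fields with $\epsilon_i = 0$ are called bosonic, with $\epsilon_i = 1$ fermionic. The graded commutation rule is

$$ \phi^i(x)\,\phi^j(y) = (-1)^{\epsilon_i \epsilon_j}\,\phi^j(y)\,\phi^i(x) \tag{1} $$

For a gauge theory the action is invariant under a set of gauge transformations with infinitesimal form

$$ \delta\phi^i = R^i_\alpha\,\varepsilon^\alpha, \qquad \alpha = 1 \text{ or } 2 \text{ or } \ldots\ m \tag{2} $$

The $\varepsilon^\alpha$ are the infinitesimal gauge parameters and $R^i_\alpha$ the generators of the gauge transformations. When $\epsilon_\alpha = \epsilon(\varepsilon^\alpha) = 0$ we have an ordinary symmetry, when $\epsilon_\alpha = 1$ the equation is characteristic of a supersymmetry. The Grassmann parity of $R^i_\alpha$ is $\epsilon(R^i_\alpha) = (\epsilon_i + \epsilon_\alpha) \pmod 2$.

A subscript after a comma denotes the right derivative with respect to the corresponding field, that is, the field is to be commuted to the far right and then dropped. The field equations may then be written as

$$ S_{0,i} = 0 \tag{3} $$

where $S_0$ is the classical action. Let $\Sigma$ denote the surface in the space of solutions where the field equations are satisfied:

$$ S_{0,i}\big|_\Sigma = 0 \tag{4} $$

If the gauge transformations are independent on-shell, that is,

$$ \operatorname{rank} R^i_\alpha\big|_\Sigma = m \tag{5} $$

the gauge theory is said to be irreducible. We assume here that this is the case. When it is not, the theory is reducible. For details of the treatment in that case, see Gomis, Paris, and Samuel. The classical solutions are $\phi_0 \in \Sigma$.

The Noether identities are

$$ S_{0,i}\,R^i_\alpha = 0 \tag{6} $$

The general solution to the Noether identity is

$$ \lambda^i = R^i_\alpha T^\alpha - S_{0,j}\,E^{ji} \tag{7} $$

The commutator of two gauge transformations is

$$ [\delta_1, \delta_2]\,\phi^i = \left( R^i_{\alpha,j} R^j_\beta - (-1)^{\epsilon_\alpha \epsilon_\beta} R^i_{\beta,j} R^j_\alpha \right) \varepsilon_1^\beta\, \varepsilon_2^\alpha \tag{8} $$

Since this commutator is a symmetry of the action, it satisfies the Noether identity

$$ S_{0,i}\left( R^i_{\alpha,j} R^j_\beta - (-1)^{\epsilon_\alpha \epsilon_\beta} R^i_{\beta,j} R^j_\alpha \right) = 0 \tag{9} $$

which by eqn [7] implies that

$$ R^i_{\alpha,j} R^j_\beta - (-1)^{\epsilon_\alpha \epsilon_\beta} R^i_{\beta,j} R^j_\alpha = R^i_\gamma T^\gamma_{\alpha\beta} - S_{0,j}\,E^{ji}_{\alpha\beta} \tag{10} $$

Equations [8] and [10] lead to the following condition:

$$ [\delta_1, \delta_2]\,\phi^i = \left( R^i_\gamma T^\gamma_{\alpha\beta} - S_{0,j}\,E^{ji}_{\alpha\beta} \right) \varepsilon_1^\beta\, \varepsilon_2^\alpha \tag{11} $$

The tensors $T^\gamma_{\alpha\beta}$ are called the structure constants of the gauge algebra, although they depend, in general, on the fields of the theory. When $E^{ij}_{\alpha\beta} = 0$, the gauge algebra is said to be closed, otherwise it is open. Equation [11] defines a Lie algebra if the algebra is closed and the $T^\gamma_{\alpha\beta}$ are independent of the fields.

The gauge tensors have the following graded symmetry properties:

$$ T^\gamma_{\alpha\beta} = -(-1)^{\epsilon_\alpha \epsilon_\beta}\,T^\gamma_{\beta\alpha}, \qquad E^{ij}_{\alpha\beta} = -(-1)^{\epsilon_i \epsilon_j}\,E^{ji}_{\alpha\beta} = -(-1)^{\epsilon_\alpha \epsilon_\beta}\,E^{ij}_{\beta\alpha} \tag{12} $$

The Grassmann parities are

$$ \epsilon(T^\gamma_{\alpha\beta}) = (\epsilon_\alpha + \epsilon_\beta + \epsilon_\gamma) \pmod 2 \tag{13} $$

and

$$ \epsilon(E^{ij}_{\alpha\beta}) = (\epsilon_i + \epsilon_j + \epsilon_\alpha + \epsilon_\beta) \pmod 2 \tag{14} $$

Various restrictions are imposed by the Jacobi identity

$$ \sum_{\mathrm{cyclic}(123)} [\delta_1, [\delta_2, \delta_3]] = 0 \tag{15} $$

These restrictions are

$$ \sum_{\mathrm{cyclic}(123)} \left( R^i_\delta A^\delta_{\alpha\beta\gamma} - S_{0,j}\,B^{ji}_{\alpha\beta\gamma} \right) \varepsilon_1^\gamma\, \varepsilon_2^\beta\, \varepsilon_3^\alpha = 0 \tag{16} $$

where

$$ 3A^\delta_{\alpha\beta\gamma} \equiv \left( T^\delta_{\alpha\beta,k} R^k_\gamma - T^\delta_{\alpha\eta} T^\eta_{\beta\gamma} \right) + (-1)^{\epsilon_\alpha(\epsilon_\beta + \epsilon_\gamma)} \left( T^\delta_{\beta\gamma,k} R^k_\alpha - T^\delta_{\beta\eta} T^\eta_{\gamma\alpha} \right) + (-1)^{\epsilon_\gamma(\epsilon_\alpha + \epsilon_\beta)} \left( T^\delta_{\gamma\alpha,k} R^k_\beta - T^\delta_{\gamma\eta} T^\eta_{\alpha\beta} \right) $$

and

$$ 3B^{ji}_{\alpha\beta\gamma} \equiv \left( E^{ji}_{\alpha\beta,k} R^k_\gamma - E^{ji}_{\alpha\delta} T^\delta_{\beta\gamma} - (-1)^{\epsilon_i \epsilon_\alpha} R^j_{\alpha,k} E^{ki}_{\beta\gamma} + (-1)^{\epsilon_j(\epsilon_i + \epsilon_\alpha)} R^i_{\alpha,k} E^{kj}_{\beta\gamma} \right) + (-1)^{\epsilon_\alpha(\epsilon_\beta + \epsilon_\gamma)} (\alpha \to \beta,\ \beta \to \gamma,\ \gamma \to \alpha) + (-1)^{\epsilon_\gamma(\epsilon_\alpha + \epsilon_\beta)} (\alpha \to \gamma,\ \beta \to \alpha,\ \gamma \to \beta) $$

As in the familiar Faddeev–Popov procedure, it is useful to introduce ghost fields $C^\alpha$ with opposite Grassmann parities to the gauge parameters $\varepsilon^\alpha$:

$$ \epsilon(C^\alpha) = \epsilon_\alpha + 1 \pmod 2 \tag{17} $$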

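and to replace the gauge parameters by ghost fields. One must then modify the graded symmetry properties of the gauge structure tensors according to

$$ T_{\alpha_1 \alpha_2 \alpha_3 \alpha_4 \ldots} \to (-1)^{\epsilon_{\alpha_2} + \epsilon_{\alpha_4} + \cdots}\, T_{\alpha_1 \alpha_2 \alpha_3 \alpha_4 \ldots} \tag{18} $$

The Noether identities then take the form

$$ S_{0,i}\,R^i_\alpha\,C^\alpha = 0 \tag{19} $$

and the structure relations [10] become

$$ \left( 2R^i_{\alpha,j} R^j_\beta - R^i_\gamma T^\gamma_{\alpha\beta} + S_{0,j}\,E^{ji}_{\alpha\beta} \right) C^\beta C^\alpha = 0 \tag{20} $$

Introducing the Antifields

We incorporate the ghost fields into the field set $\Phi^A = \{\phi^i, C^\alpha\}$, where $i = 1, \ldots, n$ and $\alpha = 1, \ldots, m$. Clearly $A = 1, \ldots, N$, where $N = n + m$. One then further increases the set by introducing an antifield $\Phi^*_A$ for each field $\Phi^A$. The Grassmann parity of the antifields is

$$ \epsilon(\Phi^*_A) = \epsilon(\Phi^A) + 1 \pmod 2 \tag{21} $$

Each field is assigned a ghost number, with

$$ \mathrm{gh}[\phi^i] = 0, \qquad \mathrm{gh}[C^\alpha] = 1, \qquad \mathrm{gh}[\Phi^*_A] = -\mathrm{gh}[\Phi^A] - 1 \tag{22} $$

In the space of fields and antifields, the antibracket is defined by

$$ (X, Y) = \frac{\partial_r X}{\partial \Phi^A} \frac{\partial_l Y}{\partial \Phi^*_A} - \frac{\partial_r X}{\partial \Phi^*_A} \frac{\partial_l Y}{\partial \Phi^A} \tag{23} $$

where $\partial_r$ denotes the right, $\partial_l$ the left derivative. The antibracket is graded antisymmetric:

$$ (X, Y) = -(-1)^{(\epsilon_X + 1)(\epsilon_Y + 1)}\,(Y, X) \tag{24} $$

It satisfies a graded Jacobi identity

$$ ((X, Y), Z) + (-1)^{(\epsilon_X + 1)(\epsilon_Y + \epsilon_Z)}\,((Y, Z), X) + (-1)^{(\epsilon_Z + 1)(\epsilon_X + \epsilon_Y)}\,((Z, X), Y) = 0 \tag{25} $$

It is a graded derivation

$$ (X, YZ) = (X, Y)\,Z + (-1)^{\epsilon_Y \epsilon_Z}\,(X, Z)\,Y, \qquad (XY, Z) = X\,(Y, Z) + (-1)^{\epsilon_X \epsilon_Y}\,Y\,(X, Z) \tag{26} $$

It has ghost number

$$ \mathrm{gh}[(X, Y)] = \mathrm{gh}[X] + \mathrm{gh}[Y] + 1 \tag{27} $$

and Grassmann parity

$$ \epsilon((X, Y)) = \epsilon(X) + \epsilon(Y) + 1 \pmod 2 \tag{28} $$

For a bosonic functional $B$

$$ (B, B) = 2\,\frac{\partial_r B}{\partial \Phi^A} \frac{\partial_l B}{\partial \Phi^*_A} \tag{29} $$

for a fermionic functional $F$

$$ (F, F) = 0 \tag{30} $$

and for any $X$

$$ ((X, X), X) = 0 \tag{31} $$

If one groups the fields and the antifields together into the set

$$ z^a = \{\Phi^A, \Phi^*_A\}, \qquad a = 1, \ldots, 2N \tag{32} $$

then the antibracket is seen to define a symplectic structure on the space of fields and antifields

$$ (X, Y) = \frac{\partial_r X}{\partial z^a}\,\omega^{ab}\,\frac{\partial_l Y}{\partial z^b} \tag{33} $$

with

$$ \omega^{ab} = \begin{pmatrix} 0 & \delta^A_B \\ -\delta^A_B & 0 \end{pmatrix} \tag{34} $$

The antifields can be thought of as conjugate variables to the fields, since

$$ (\Phi^A, \Phi^*_B) = \delta^A_B \tag{35} $$

The Classical Master Equation

Let $S[\Phi^A, \Phi^*_A]$ be a functional of the fields and antifields with the dimension of an action, vanishing ghost number and even Grassmann parity. The equation

$$ (S, S) = 2\,\frac{\partial_r S}{\partial \Phi^A} \frac{\partial_l S}{\partial \Phi^*_A} = 0 \tag{36} $$

is the classical master equation. Solutions of the classical master equation with suitable boundary conditions turn out to be generating functionals for the gauge structure of the theory. $S$ is also the starting point for the quantization. One denotes by $\Sigma$ the subspace of stationary points of the action in the space of fields and antifields:

$$ \Sigma = \left\{ z^a \,\Big|\, \frac{\partial S}{\partial z^a} = 0 \right\} \tag{37} $$

Given a classical solution $\phi_0$ of $S_0$ one stationary point is

$$ \phi^i = \phi^i_0, \qquad C^\alpha = 0, \qquad \Phi^*_A = 0 \tag{38} $$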
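The grading rules of eqns [17], [21], [22], [27], and [28] are easy to apply mechanically. The following sketch is a bookkeeping aid only, under the assumption of a bosonic gauge field with an ordinary (bosonic) gauge symmetry; the class `Grade` and the helper names are hypothetical. It confirms that terms of the form $\phi^*_i R^i_\alpha C^\alpha$ and $C^*_\gamma T^\gamma_{\alpha\beta} C^\beta C^\alpha$ have vanishing ghost number and even parity, as required of an action, and that the antibracket of two such functionals has ghost number 1 and odd parity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grade:
    gh: int      # ghost number
    parity: int  # Grassmann parity, 0 (even) or 1 (odd)


def antifield(field):
    """Eqns [21] and [22]: parity shifted by one, gh -> -gh - 1."""
    return Grade(-field.gh - 1, (field.parity + 1) % 2)


def product(*factors):
    """Ghost numbers add; parities add modulo 2."""
    return Grade(sum(f.gh for f in factors), sum(f.parity for f in factors) % 2)


def antibracket(x, y):
    """Eqns [27] and [28]: the bracket raises gh by one and flips parity."""
    return Grade(x.gh + y.gh + 1, (x.parity + y.parity + 1) % 2)


phi = Grade(0, 0)          # bosonic gauge field (assumed example)
ghost = Grade(1, 1)        # ghost of a bosonic gauge parameter, eqn [17]
generator = Grade(0, 0)    # R and T depend on phi only in this example
phi_star = antifield(phi)
ghost_star = antifield(ghost)

term_phi_star = product(phi_star, generator, ghost)           # phi* R C
term_ghost_star = product(ghost_star, generator, ghost, ghost)  # C* T C C
master = antibracket(term_phi_star, term_ghost_star)

print(phi_star, ghost_star)
print(term_phi_star, term_ghost_star, master)
```

The same three helpers can be reused for fermionic fields or supersymmetries by changing the input grades.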

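An action which satisfies the classical master equation has its own set of invariances:

$$ \frac{\partial_r S}{\partial z^a}\,R^a_b = 0 \tag{39} $$

with

$$ R^a_b = \omega^{ac}\,\frac{\partial_l \partial_r S}{\partial z^c\,\partial z^b} \tag{40} $$

This equation implies

$$ R^a_c\,R^c_b\,\big|_\Sigma = 0 \tag{41} $$

One says that $R^a_b$ is nilpotent on-shell. A nilpotent $2N \times 2N$ matrix has rank $\le N$. Let $r$ be the rank of the hessian of $S$ at the stationary point:

$$ r = \operatorname{rank} \frac{\partial_l \partial_r S}{\partial z^a\,\partial z^b}\bigg|_\Sigma \tag{42} $$

We then have $r \le N$. The relevant solutions of the classical master equation are those for which $r = N$. In this case the number of independent gauge invariances of the type in eqn [39] equals the number of antifields. When at a later stage the gauge is fixed, the nonphysical antifields are eliminated.

To ensure the correct classical limit, the proper solution must contain the classical action $S_0$ in the sense that

$$ S[\Phi^A, \Phi^*_A]\big|_{\Phi^*_A = 0} = S_0[\phi^i] \tag{43} $$

The action $S[\Phi^A, \Phi^*_A]$ can be expanded in a series in the antifields, while maintaining vanishing ghost number and even Grassmann parity:

$$ S[\Phi, \Phi^*] = S_0 + \phi^*_i R^i_\alpha C^\alpha + C^*_\alpha\,\tfrac{1}{2}\,T^\alpha_{\beta\gamma}\,(-1)^{\epsilon_\beta} C^\gamma C^\beta + \phi^*_i \phi^*_j\,(-1)^{\epsilon_i}\,\tfrac{1}{4}\,E^{ji}_{\alpha\beta}\,(-1)^{\epsilon_\alpha} C^\beta C^\alpha + \cdots \tag{44} $$

When this is inserted into the classical master equation, one finds that this equation implies the gauge structure of the classical theory.

Gauge Fixing and Quantization

Equation [39] shows that the action $S$ still possesses gauge invariances, and hence is not yet suitable for quantization via the path integral approach: a gauge-fixing procedure is necessary. In the Batalin–Vilkovisky approach the gauge is fixed, and the antifields eliminated, by use of a gauge-fixing fermion $\Psi$ which has Grassmann parity $\epsilon(\Psi) = 1$ and $\mathrm{gh}[\Psi] = -1$. It is a functional of the fields $\Phi^A$ only; its relation to the antifields is

$$ \Phi^*_A = \frac{\partial \Psi}{\partial \Phi^A} \tag{45} $$

We define a surface in functional space

$$ \Sigma_\Psi = \left\{ (\Phi^A, \Phi^*_A) \,\Big|\, \Phi^*_A = \frac{\partial \Psi}{\partial \Phi^A} \right\} \tag{46} $$

so that for any functional $X[\Phi, \Phi^*]$

$$ X\big|_{\Sigma_\Psi} = X\!\left[\Phi, \frac{\partial \Psi}{\partial \Phi}\right] \tag{47} $$

To construct a gauge-fixing fermion $\Psi$ of ghost number $-1$, one must again introduce additional fields. The simplest choice utilizes a trivial pair $\bar C_\alpha$, $\bar\pi_\alpha$ with

$$ \epsilon(\bar C_\alpha) = \epsilon_\alpha + 1, \quad \epsilon(\bar\pi_\alpha) = \epsilon_\alpha, \qquad \mathrm{gh}[\bar C_\alpha] = -1, \quad \mathrm{gh}[\bar\pi_\alpha] = 0 \tag{48} $$

The fields $\bar C_\alpha$ are the Faddeev–Popov antighosts. Along with these fields we include the corresponding antifields $\bar C^{*\alpha}$, $\bar\pi^{*\alpha}$. Adding the term $\bar\pi_\alpha \bar C^{*\alpha}$ to the action $S$ does not spoil its properties as a proper solution to the classical master equation, and one gets the nonminimal action

$$ S^{\mathrm{non}} = S + \bar\pi_\alpha\,\bar C^{*\alpha} \tag{49} $$

The simplest possibility for $\Psi$ is

$$ \Psi = \bar C_\alpha\,\chi^\alpha(\phi) \tag{50} $$

where $\chi^\alpha$ are the gauge-fixing conditions for the fields $\phi$. The gauge-fixed action is denoted by

$$ S_\Psi = S^{\mathrm{non}}\big|_{\Sigma_\Psi} \tag{51} $$

Quantization is performed using the path integral to calculate a correlation function $X$, with the constraint [45] implemented by a $\delta$-function:

$$ I_\Psi(X) = \int [D\Phi][D\Phi^*]\;\delta\!\left(\Phi^*_A - \frac{\partial \Psi}{\partial \Phi^A}\right) \exp\!\left(\frac{i}{\hbar}\,W[\Phi, \Phi^*]\right) X[\Phi, \Phi^*] \tag{52} $$

Here $W$ is the quantum action, which reduces to $S$ in the limit $\hbar \to 0$. An admissible $\Psi$ leads to well-defined propagators when the path integral is expressed as a perturbation series expansion.

The results of a calculation should be independent of the gauge fixing. Consider the integrand in eqn [52],

$$ I[\Phi, \Phi^*] = \exp\!\left(\frac{i}{\hbar}\,W[\Phi, \Phi^*]\right) X[\Phi, \Phi^*] \tag{53} $$

Under an infinitesimal change in $\Psi$

$$ I_{\Psi + \delta\Psi}(X) - I_\Psi(X) = \int [D\Phi]\;\Delta I\;\delta\Psi \tag{54} $$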

where the Laplacian $\Delta$ is

$$ \Delta = (-1)^{\epsilon_A + 1}\,\frac{\partial_r}{\partial \Phi^A} \frac{\partial_r}{\partial \Phi^*_A} \tag{55} $$

Obviously, the integral $I_\Psi(X)$ is independent of $\Psi$ if $\Delta I = 0$. For $X = 1$ one gets the requirement

$$ \Delta \exp\!\left(\frac{i}{\hbar} W\right) = \exp\!\left(\frac{i}{\hbar} W\right) \left( \frac{i}{\hbar}\,\Delta W - \frac{1}{2\hbar^2}\,(W, W) \right) = 0 \tag{56} $$

The formula

$$ \tfrac{1}{2}\,(W, W) = i\hbar\,\Delta W \tag{57} $$

is the quantum master equation. A gauge-invariant correlation function satisfies

$$ (X, W) = i\hbar\,\Delta X \tag{58} $$

The terms of higher order in $\hbar$ by which the quantum action $W$ may differ from the solution of the classical master equation $S$ correspond to the counter-terms of the renormalizable gauge theory if

$$ \Delta S = 0 \tag{59} $$

One must, of course, use a regularization scheme which respects the symmetries of the theory. For $W = S + O(\hbar)$ the quantum master equation [57] reduces in this case to the classical master equation

$$ (S, S) = 0 \tag{60} $$

Hence, up to possible counter-terms, one may simply choose $W = S$.

To implement the gauge fixing, one uses for the action $W = S^{\mathrm{non}}$. For the path integral $Z = I_\Psi(X = 1)$, the integration over the antifields in eqn [52] is performed by using the $\delta$-function. The result is

$$ Z = \int [D\Phi]\;\exp\!\left(\frac{i}{\hbar}\,S_\Psi\right) \tag{61} $$

Geometrical Interpretation of Topological Field Theories

The Batalin–Vilkovisky formalism for topological field theories has been given a geometrical interpretation by AKSZ (1997).

A supermanifold equipped with an odd vector field $Q$ satisfying $Q^2 = 0$ is called a Q-manifold. A Q-manifold provided with an odd symplectic structure $\omega$ (P-structure) is called a QP-manifold if the odd symplectic structure is Q-invariant, that is, $L_Q \omega = 0$. Every solution to the classical master equation determines a QP-structure on $M$ and vice versa. The geometric object corresponding to a classical mechanical system in the Batalin–Vilkovisky formalism is a QP-manifold.

The nondegenerate closed 2-form $\omega$ is written as

$$ \omega = dz^a\,\omega_{ab}\,dz^b \tag{62} $$

where $z^a$ are local coordinates in the supermanifold $M$. For functions on $M$, an (odd) Poisson bracket is defined as in eqn [33], where $\omega^{ab}$ stands for the inverse matrix of $\omega_{ab}$. An even function $S$ on $M$ satisfies the classical master equation if $(S, S) = 0$. The correspondence between vector fields and functions on $M$ is given by $K_F G = (G, F)$, where $K_F$ is the vector field, $F$ the given function, and $G$ an arbitrary function. The function $F$ is called the Hamiltonian of the vector field $K_F$.

Geometrically, equivalent QP-manifolds describe the same physics. In particular, one can consider an even Hamiltonian vector field $K_F$ corresponding to an odd function $F$. This vector field determines an infinitesimal transformation preserving P-structure. It transforms a solution $S$ to the classical master equation into the physically equivalent solution $S + \varepsilon(S, F)$, where $\varepsilon$ is an infinitesimally small parameter.

A submanifold $L$ of a P-manifold $M$ is called a Lagrangian submanifold if the restriction of the form $\omega$ to $L$ vanishes. In the particular case when $M = \Pi T^*N$ (the cotangent bundle to $N$ with reversed parity of fibres) with standard P-structure, one can construct many examples of Lagrangian submanifolds in the following way. Fix an odd function $\Psi$ on $N$, the gauge fermion. The submanifold $L \subset M$ determined by the equation

$$ \xi_a = \frac{\partial \Psi}{\partial x^a} \tag{63} $$

where $\{x^a, \xi_a\}$ are coordinates corresponding to the identification of $M$, will be a Lagrangian submanifold of $M$.

The P-manifold $M$ in the neighborhood of $L$ can be identified with $\Pi T^*L$. In other words, one can find such a neighborhood $U$ of $L$ in $M$ and a neighborhood $V$ of $L$ in $\Pi T^*L$ that there exists an isomorphism of P-manifolds $U$ and $V$ leaving $L$ intact. Using this isomorphism a function $\Psi$ defined on a Lagrangian submanifold $L \subset M$ determines another Lagrangian submanifold $L_\Psi \subset M$.

Consider a solution $S$ to the classical master equation on $M$. In the Batalin–Vilkovisky formalism we have to restrict $S$ to a Lagrangian submanifold $L \subset M$, then the quantization of $S$ can be performed by integration of $\exp(iS/\hbar)$ over $L$. One may construct an odd vector field $Q$ on $L$ in such a

way that the functional S restricted to L is Considering the commutator of two gauge transfor-
Q-invariant. This invariance is BRST invariance. mations leads to (see eqns [8][11])
AKSZ apply these geometric constructions to obtain Z
 
in a natural way the action functionals of two-  2Pmi ;j Pnj  Pji Pmn ;j Cm Cn 0
dimensional sigma-models (Witten 1998) and to ZM 
show that the ChernSimons theory (Axelrod and  2Pjk i Dlj Pmk ;ij Am Pjl 70
Singer 1991) in BatalinVilkovisky formalism arises as M

a sigma-model with target space G, where G stands Dm kl  j kl
i P ;m  D X  P ;ji Cl Ck 0
for a Lie algebra and  denotes parity inversion.
The Jacobi identity is
Pij ;m Pmk Ci Cj Ck 0 71
The Poisson-Sigma Model
The quantization of the Poisson-sigma model was The fields and antifields of the model are
performed by Hirshfeld and Schwarzweller (2000)  
A fAi ; Xi ; Ci g and A Ai ; Xi ; Ci 72
and by Cattaneo and Felder (2001). The Poisson-
sigma model is the simplest topological field theory The extended action is
in two dimensions. It is a field theory on a two- Z 
dimensional world sheet without boundary (Schaller S   Ai @ Xi Pij XAi Aj
M
and Strobl 1994). It involves a set of bosonic scalar
j 1
fields, which can be seen as a set of maps Ai Di Cj Xi Pji XCj Ci Pjk ;i XCj Ck
Xi : M ! N, where N is a Poisson manifold. In 2
addition, one has a 1-form A on the world sheet M 1 i j kl
A A  P ;ij XCk Cl 73
which takes values in T  (N), for x coordinates on M 4
we have A = Ai dxi ^ dXi . Its action is
Z The gauge-fixing conditions are taken to be of the
  form i (A, X), so that the gauge fermion [50] becomes
S0 X; A   Ai @ Xi Pij XAi Aj 64  i i (A, X). The antifields are then fixed to be
M =C

where  is the antisymmetric tensor and  is the j @ j A; X
Ai C
volume form on M. The gauge transformations of @Ai
the model are
 j @ j A; X
Xi C 74
Xi Pij X"j ; Ai Di "j
j
65 @Xi

Ci 0
j j
where Di = @ i Pkj ,i Ak . The equations of motion   i A; X
C i
are
The gauge-fixed action is
 Dji Aj 0 66 Z 
S   Ai @ Xi Pij XAi Aj
and M
 i ij 
 @ X P Aj  D X 0 i
67 k @ k A; X j  k @ k A; X Pij Cj
C Di Cj C
@Ai @Xi
The gauge algebra is given by
1  m @ m A; X  n @ n A; X
C C  Pkl ;ij X
"1 ; "2 Xi Pji Pmn ;j "1n "2m 4 @Ai @Aj

j
"1 ; "2 Ai Di Pmn ;j "1n "2m 68 Ck Cl
i i A; X 75
  D Xj  Pmn ;ji "1n "2m
Now consider different gauge conditions:
In our general notation the generators of the gauge
j 1. First, the Landau gauge for the gauge potential
transformations R are here Pij and Di . The gauge
tensors T and E are Pij ,k and  Pmn ,ji . The higher- i = @  Ai , so that the gauge fermion becomes
order gauge tensors A and B vanish. =C  i @  Ai . The antifields are fixed to be
The ghost fields are again denoted by Ci . The i
Ai @  C
Noether identities are then
Z   Xi Ci 0 76
  Dji Aj Pki  D Xi Dki Ck 0 69   @  Ai
C i
M

for this gauge choice the gauge-fixed action is Notice that in the noncovariant gauges 2 and 3 the
Z  action simplifies, in that the term which arose
S  i @  Dj Cj
  Ai @ Xi Pij XAi Aj C because of the nonclosed nature of the gauge algebra
i
M vanishes.
1  i @  C
 j  Pkl ;ij X
@  C
4 See also: BF Theories; BRST Quantization; Constrained

Systems; Graded Poisson Algebras; Operads;
 Ck Cl  i @  Ai 77 Perturbative Renormalization Theory and BRST;
Supermanifolds; Topological Sigma Models.
Translating this action into the notation of Cattaneo
and Felder, one sees that it is exactly the expression
they use to derive the perturbation series.
Further Reading
2. Now consider the temporal gauge i = A0i . The
gauge fermion is given by  = C  i A0i . The anti- Alexandrov M, Kontsevich M, Schwarz A, and Zaboronsky O
fields are fixed to (1997) Geometry of the Master Equation. International
Journal of Modern Physics A12: 14051430.
i
A0i C Axelrod S and Singer IM (1991) ChernSimons Perturbation
Theory, Proceedings of the XXth Conference on Differential
A1i 0 Geometric Methods in Physics, Baruch College/CUNY, NY.
78 (hep-th/9110056).
Xi Ci 0
Batalin IA and Vilkovisky GA (1977) Gauge algebra and
  A0i
C quantization. Physics Letters 69B: 309312.
i
Becchi C, Rouet A, and Stora R (1976) Renormalization of gauge
The gauge-fixed action is theories. Annals of Physics (NY) 98: 287321.
Z  Cattaneo AS and Felder G (2001) On the AKSZ formulation of
S   Ai @ Xi Pij XAi Aj the PoissonSigma model. Letters of Mathematical Physics
M 56: 163179.

 i Dj Cj  i A0i Faddeev LD and Popov VN (1967) Feynman diagrams for the
C 0i 79 YangMills field. Physics Letters 25B: 2930.
Gomis J, Paris J, and Samuel S (1994) Antibracket Antifields and
3. Finally consider the SchwingerFock gauge gauge-theory quantization. Physics Reports 269: 1145.
i = x Ai . Then the antifields are fixed to be Hirshfeld AC and Schwarzweller T (2000) Path integral quantiza-
tion of the PoissonSigma model. Annals of Physics (Leipzig)
i
Ai x C 9: 83101.
Schaller P and Strobl T (1994) Poisson structure induced
Xi Ci 0 80 (topological) field theories. Modern Physics Letters A9:
  x Ai
C 31293136.
i
Witten E (1988) Topological sigma models. Communications in
for this gauge choice the gauge-fixed action is Mathematical Physics 118: 411449.
Z  Zinn-Justin J (1975) Renormalization of gauge theories. In:
Rollnik H and Dietz K (eds.) Trends in Elementary
S   Ai @ Xi Pij XAi Aj Particle Physics, Lecture Notes in Physics, vol. 37. Berlin:
M
 Springer.
C i x Dj Cj  i @  Ai 81
i

Bethe Ansatz
M T Batchelor, Australian National University, Canberra, ACT, Australia
© 2006 Elsevier Ltd. All rights reserved.

Introduction

The Bethe ansatz is a particular form of wave function introduced in the diagonalization of the Heisenberg spin chain. It underpins the majority of exactly solved models in statistical mechanics and quantum field theory. At the heart of the Bethe ansatz is the way in which multibody interactions factor into two-body interactions. The Bethe ansatz is thus intimately entwined with the theory of integrability.

The way in which the Bethe ansatz works is best understood by working through an explicit hands-on example. The canonical example is the isotropic antiferromagnetic Heisenberg Hamiltonian

\[ H = \sum_{i=1}^{L-1} h_{i,i+1} + h_{L,1}, \qquad h_{ij} = \tfrac{1}{2}\left(\boldsymbol{\sigma}_i \cdot \boldsymbol{\sigma}_j + 1\right) \tag{1} \]
where $\boldsymbol{\sigma} = (\sigma^x, \sigma^y, \sigma^z)$ are Pauli matrices and $L$ is the length of the chain. Periodic boundary conditions are imposed. However, open boundary conditions may also be treated, along with the addition of magnetic bulk and boundary fields. The z-components of each of the spins are either up or down. Since the z-component of the total spin commutes with the Hamiltonian, the total number $n$ of up spins serves as a good quantum number. A state of the system can therefore be conveniently described in terms of the coordinates of all the up spins. Denote these coordinates by $x_i$, with $1 \le x_i \le L$. The quantum number $n$ ensures that the Hamiltonian decomposes into $L+1$ sectors, each of size $L$ choose $n$. The antiferromagnetic ground state occurs in the largest sector.

The normalization of the Hamiltonian [1] is such that its action is that of the permutation operator:

\[ h\,|\!\uparrow\uparrow\rangle = |\!\uparrow\uparrow\rangle, \quad h\,|\!\uparrow\downarrow\rangle = |\!\downarrow\uparrow\rangle, \quad h\,|\!\downarrow\uparrow\rangle = |\!\uparrow\downarrow\rangle, \quad h\,|\!\downarrow\downarrow\rangle = |\!\downarrow\downarrow\rangle \tag{2} \]

Diagonalization of Sectors

One can address the diagonalization of the sectors for various cases.

Case 1: n = 0

Consider the case with all spins down. The eigenstate is $\Psi = |\!\downarrow \cdots \downarrow\rangle$, with $H\Psi = L\Psi$ and, thus, $E = L$ is the trivial solution.

Case 2: n = 1

There are $L$ states, with

\[ \Psi = \sum_{x=1}^{L} a(x)\,|\psi(x)\rangle \tag{3} \]

where $|\psi(x)\rangle$ is the state with an up spin at site $x$. The aim is to find the amplitudes $a(x)$. It is clear that

\[ H|\psi(x)\rangle = (L-2)\,|\psi(x)\rangle + |\psi(x-1)\rangle + |\psi(x+1)\rangle \tag{4} \]

in the bulk (away from either boundary). Insertion of [3] into $H\Psi = E\Psi$ gives

\[ E\,a(x) = (L-2)\,a(x) + a(x-1) + a(x+1) \tag{5} \]

Substitution of spin waves $a(x) = e^{ikx}$ gives

\[ E = L - 2 + 2\cos k \tag{6} \]

The boundary conditions are such that $a(0) = a(L)$ and $a(L+1) = a(1)$; either gives $e^{ikL} = 1$, from which the $L$ values of $k$ follow.

Case 3: n = 2

Here the wave function can be written in terms of the two flipped spins as

\[ \Psi = \sum_{x<y} a(x,y)\,|\psi(x,y)\rangle \tag{7} \]

It is to be emphasized that one is working in the region with $x < y$. There are two cases to consider: (1) $y > x+1$ and (2) $y = x+1$. Consider the interactions in the bulk. For (1) the action of the Hamiltonian implies

\[ E\,a(x,y) = (L-4)\,a(x,y) + a(x-1,y) + a(x+1,y) + a(x,y-1) + a(x,y+1) \tag{8} \]

and for (2)

\[ E\,a(x,x+1) = (L-2)\,a(x,x+1) + a(x-1,x+1) + a(x,x+2) \tag{9} \]

The compatibility of these two equations requires that

\[ 2a(x,x+1) = a(x,x) + a(x+1,x+1) \tag{10} \]

which is known as the collision or meeting condition.

Some adjustments need to be made for spins which get flipped at the boundaries. Looking at [8] and [9] with $x = 1$ and $x = L$, it is evident that one can take

\[ a(y, x+L) = a(x,y) \tag{11} \]

to restore the original ordering. The terms which arise involve up spins at sites $0$ and $L+1$. This illustrates the periodic boundary condition.

We now assume (the Bethe ansatz) that

\[ a(x,y) = A_{12}\, e^{ik_1 x} e^{ik_2 y} + A_{21}\, e^{ik_2 x} e^{ik_1 y} \tag{12} \]

Substitution of the ansatz [12] into [8] gives

\[ E = L - 4 + 2\cos k_1 + 2\cos k_2 \tag{13} \]

Substitution of [12] into [10] gives

\[ \frac{A_{12}}{A_{21}} = -\,\frac{1 - 2e^{ik_1} + e^{i(k_1+k_2)}}{1 - 2e^{ik_2} + e^{i(k_1+k_2)}} \tag{14} \]

The three relations [11], [12], and [14] give the Bethe equations

\[ e^{ik_1 L} = \frac{A_{12}}{A_{21}} \quad\text{and}\quad e^{ik_2 L} = \frac{A_{21}}{A_{12}} \tag{15} \]

which are to be solved for $k_1$ and $k_2$. Note that $e^{i(k_1+k_2)L} = 1$.

Case 4: n = 3

The full power of the Bethe ansatz method becomes evident for three particles. Here

\[ \Psi = \sum_{x<y<z} a(x,y,z)\,|\psi(x,y,z)\rangle \tag{16} \]

There are several cases to consider:

1. $y > x+1$ and $z > y+1$, where

\[ E\,a(x,y,z) = (L-6)\,a(x,y,z) + a(x\pm 1,y,z) + a(x,y\pm 1,z) + a(x,y,z\pm 1) \tag{17} \]

By $a(x\pm 1,y,z)$, we mean $a(x+1,y,z) + a(x-1,y,z)$, etc.

2. $y = x+1$ and $z > y+1$, with

\[ E\,a(x,x+1,z) = (L-4)\,a(x,x+1,z) + a(x-1,x+1,z) + a(x,x+2,z) + a(x,x+1,z\pm 1) \tag{18} \]

3. $y > x+1$ and $z = y+1$, where

\[ E\,a(x,y,y+1) = (L-4)\,a(x,y,y+1) + a(x\pm 1,y,y+1) + a(x,y-1,y+1) + a(x,y,y+2) \tag{19} \]

4. $y = x+1$ and $z = y+1$, for which

\[ E\,a(x,x+1,x+2) = (L-2)\,a(x,x+1,x+2) + a(x-1,x+1,x+2) + a(x,x+1,x+3) \tag{20} \]

Again, we must ensure that these equations are compatible. This involves comparison of the last three equations with [17]. The three equations to be satisfied are

\[ 2a(x,x+1,z) = a(x,x,z) + a(x+1,x+1,z) \tag{21} \]

\[ 2a(x,y,y+1) = a(x,y,y) + a(x,y+1,y+1) \tag{22} \]

\[ 4a(x,x+1,x+2) = a(x,x,x+2) + a(x,x+1,x+1) + a(x,x+2,x+2) + a(x+1,x+1,x+2) \tag{23} \]

But note that setting $z = x+2$ in [21] and $y = x+1$ in [22] leads to [23] being automatically satisfied. We are thus left with only two equations [21] and [22]. Note the similarity between these two equations and the meeting condition [10] for the n = 2 case.

In this case the Bethe ansatz is

\[ a(x,y,z) = A_{123}\, z_1^x z_2^y z_3^z + A_{132}\, z_1^x z_3^y z_2^z + A_{213}\, z_2^x z_1^y z_3^z + A_{231}\, z_2^x z_3^y z_1^z + A_{321}\, z_3^x z_2^y z_1^z + A_{312}\, z_3^x z_1^y z_2^z \tag{24} \]

in which $z_j = e^{ik_j}$. This is a sum over the 3! permutations of the integers 1, 2, 3. Inserting this ansatz into [17] gives

\[ E = L - 6 + 2(\cos k_1 + \cos k_2 + \cos k_3) \tag{25} \]

To determine the $k_j$, it is convenient to define

\[ s_{ij} = 1 - 2z_j + z_i z_j \tag{26} \]

Substitution of [24] into the meeting conditions [21] and [22] then gives

\[ s_{12}A_{123} + s_{21}A_{213} + s_{13}A_{132} + s_{31}A_{312} + s_{23}A_{231} + s_{32}A_{321} = 0 \tag{27} \]

\[ s_{23}A_{123} + s_{32}A_{132} + s_{13}A_{213} + s_{31}A_{231} + s_{21}A_{321} + s_{12}A_{312} = 0 \tag{28} \]

These equations are assumed to be satisfied in permutation pairs, that is,

\[ s_{12}A_{123} + s_{21}A_{213} = 0, \qquad s_{23}A_{123} + s_{32}A_{132} = 0, \quad\text{etc.} \tag{29} \]

Up to an overall constant, the relations [27] and [28] are satisfied by

\[ \begin{aligned} A_{123} &= s_{21}s_{31}s_{32}, & A_{132} &= -s_{31}s_{21}s_{23}, \\ A_{312} &= s_{13}s_{23}s_{21}, & A_{321} &= -s_{23}s_{13}s_{12}, \\ A_{231} &= s_{32}s_{12}s_{13}, & A_{213} &= -s_{12}s_{32}s_{31} \end{aligned} \tag{30} \]

The boundary condition, $a(y,z,x+L) = a(x,y,z)$, gives

\[ \begin{aligned} &\left(z_1^L A_{321} - A_{132}\right) z_1^x z_3^y z_2^z + \left(z_2^L A_{312} - A_{231}\right) z_2^x z_3^y z_1^z \\ &+ \left(z_1^L A_{231} - A_{123}\right) z_1^x z_2^y z_3^z + \left(z_3^L A_{213} - A_{321}\right) z_3^x z_2^y z_1^z \\ &+ \left(z_2^L A_{132} - A_{213}\right) z_2^x z_1^y z_3^z + \left(z_3^L A_{123} - A_{312}\right) z_3^x z_1^y z_2^z = 0 \end{aligned} \tag{31} \]

This leads to the equations

\[ \begin{aligned} z_1^L &= \frac{A_{123}}{A_{231}} = \frac{A_{132}}{A_{321}} = \frac{s_{21}s_{31}}{s_{12}s_{13}} \\ z_2^L &= \frac{A_{213}}{A_{132}} = \frac{A_{231}}{A_{312}} = \frac{s_{12}s_{32}}{s_{21}s_{23}} \\ z_3^L &= \frac{A_{321}}{A_{213}} = \frac{A_{312}}{A_{123}} = \frac{s_{13}s_{23}}{s_{31}s_{32}} \end{aligned} \tag{32} \]

which can be solved for the Bethe roots $k_j$.
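The small sectors treated above can be checked by brute force on a short chain. The following Python sketch is not part of the original article: it assumes NumPy, builds $H$ of [1] as a sum of nearest-neighbour permutation operators [2] on a periodic chain, and compares the sector spectra with the energies [6] and [13]. All function names are illustrative. The n = 2 check uses only the simple solutions $k_1 = 0$, $k_2 = 2\pi m/L$ ($m \neq 0$) of the Bethe equations [15], for which [14] gives $A_{12}/A_{21} = 1$.

```python
import math
import numpy as np


def heisenberg_hamiltonian(L):
    """H = sum of nearest-neighbour permutation operators on a periodic chain.

    Basis states are integers 0 .. 2**L - 1; bit i set means spin i is up.
    """
    dim = 2 ** L
    H = np.zeros((dim, dim))
    for state in range(dim):
        for i in range(L):
            j = (i + 1) % L  # periodic boundary: site L is coupled to site 1
            bit_i = (state >> i) & 1
            bit_j = (state >> j) & 1
            if bit_i == bit_j:
                new_state = state  # aligned spins: permutation acts trivially
            else:
                new_state = state ^ ((1 << i) | (1 << j))  # swap the two spins
            H[new_state, state] += 1.0
    return H


def sector_indices(L, n):
    """Basis states with exactly n up spins."""
    return [s for s in range(2 ** L) if bin(s).count("1") == n]


def sector_spectrum(L, n):
    """Sorted eigenvalues of H restricted to the sector with n up spins."""
    H = heisenberg_hamiltonian(L)
    idx = sector_indices(L, n)
    return np.sort(np.linalg.eigvalsh(H[np.ix_(idx, idx)]))


def one_magnon_energies(L):
    """Energies [6] with the L allowed momenta k = 2*pi*m/L."""
    return np.sort([L - 2 + 2 * math.cos(2 * math.pi * m / L) for m in range(L)])


def amplitude_ratio(k1, k2):
    """A12/A21 from the meeting condition, eqn [14]."""
    z1, z2 = np.exp(1j * k1), np.exp(1j * k2)
    return -(1 - 2 * z1 + z1 * z2) / (1 - 2 * z2 + z1 * z2)


if __name__ == "__main__":
    L = 6
    print("n = 0 sector:", sector_spectrum(L, 0))
    print("n = 1 sector:", sector_spectrum(L, 1))
    print("formula [6] :", one_magnon_energies(L))
    two_magnon = sector_spectrum(L, 2)
    for m in range(1, L):
        k2 = 2 * math.pi * m / L
        energy = L - 4 + 2 * math.cos(0.0) + 2 * math.cos(k2)  # eqn [13]
        gap = np.min(np.abs(two_magnon - energy))
        print(f"m = {m}: A12/A21 = {amplitude_ratio(0.0, k2):.3f}, "
              f"distance to nearest n = 2 eigenvalue = {gap:.2e}")
```

The states with $k_1 = 0$ are only the simplest solutions of [15]; the remaining n = 2 eigenvalues require solving the Bethe equations for both momenta, which this sketch does not attempt.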
General n

The general Bethe ansatz is

\[ a(x_1,\ldots,x_n) = \sum_P A_{p_1,\ldots,p_n}\, z_{p_1}^{x_1} \cdots z_{p_n}^{x_n} \tag{33} \]

where the sum is over all $n!$ permutations $P = \{p_1,\ldots,p_n\}$ of the integers $1,\ldots,n$. The boundary condition is

\[ a(x_2, x_3, \ldots, x_n, x_1 + L) = a(x_1, x_2, \ldots, x_n) \tag{34} \]

leading to the Bethe equations

\[ z_{p_1}^L = \frac{A_{p_1,\ldots,p_n}}{A_{p_2,\ldots,p_n,p_1}} \tag{35} \]

for all permutations, with

\[ A_{p_1,\ldots,p_n} = \epsilon_P \prod_{1\le i<j\le n} s_{p_j,p_i} \tag{36} \]

where $\epsilon_P$ is the signature of the permutation. Finally,

\[ z_{p_1}^L = (-1)^{n-1} \prod_{\ell=2}^{n} \frac{s_{p_\ell,p_1}}{s_{p_1,p_\ell}} \quad\text{or}\quad z_j^L = (-1)^{n-1} \prod_{\substack{\ell=1 \\ \ell\neq j}}^{n} \frac{s_{\ell,j}}{s_{j,\ell}} \tag{37} \]

for $j = 1,\ldots,n$. The eigenvalues are given by

\[ E = L + \sum_{j=1}^{n} \left(2\cos k_j - 2\right) \tag{38} \]

Another form of the Bethe equations is obtained by defining

\[ e^{ik_j} = \frac{u_j - \tfrac{1}{2}i}{u_j + \tfrac{1}{2}i} \tag{39} \]

which gives

\[ E = L - \sum_{j=1}^{n} \frac{1}{u_j^2 + \tfrac{1}{4}} \tag{40} \]

with $u_j$ satisfying

\[ \left(\frac{u_j - \tfrac{1}{2}i}{u_j + \tfrac{1}{2}i}\right)^{L} = \prod_{\substack{\ell=1 \\ \ell\neq j}}^{n} \frac{u_j - u_\ell - i}{u_j - u_\ell + i} \tag{41} \]

for $j = 1,\ldots,n$.

All eigenvalues of the Heisenberg spin chain may be obtained in terms of the Bethe ansatz solution. For example, the roots $u_j$ for the ground state are real and distributed symmetrically about the origin. Excitations may involve complex roots. Although obtained exactly in terms of the Bethe roots, the Bethe ansatz wave function is cumbersome.

We have thus seen how the Bethe ansatz works for the Heisenberg spin chain. The underlying mechanism is the way in which the collision or meeting conditions can be handled in terms of two-body interactions. To see this more clearly, the six permutation pair equations [29] can be written in the general form $A_{abc} = Y_{ab} A_{bac}$ and $A_{abc} = Y_{bc} A_{acb}$, where $Y_{ab} = -s_{ba}/s_{ab}$. Now there are two possible paths to get from $A_{abc}$ to $A_{cba}$, namely

\[ A_{cba} = Y_{ab} Y_{ac} Y_{bc} A_{abc}, \qquad A_{cba} = Y_{bc} Y_{ac} Y_{ab} A_{abc} \tag{42} \]

Both paths must be equivalent, with

\[ Y_{ab} Y_{ba} = 1 \quad\text{and}\quad Y_{ab} Y_{ac} Y_{bc} = Y_{bc} Y_{ac} Y_{ab} \tag{43} \]

The latter is a condition of nondiffraction or equivalently a manifestation of the Yang–Baxter equation.

Historically, the next model to be exactly solved in terms of the Bethe ansatz was the one-dimensional model of $N$ interacting bosons on a line of length $L$ defined by the Hamiltonian

\[ H = -\sum_{i=1}^{N} \frac{\partial^2}{\partial x_i^2} + 2c \sum_{1\le i<j\le N} \delta(x_i - x_j) \tag{44} \]

where $c$ is a measure of the interaction strength. For this model the Bethe ansatz wave function is of the same form as [33] with the two-body interaction term given by

\[ s_{ab} = k_a - k_b + ic \tag{45} \]

The Bethe equations are given by

\[ \exp(ik_j L) = -\prod_{\ell=1}^{N} \frac{k_j - k_\ell + ic}{k_j - k_\ell - ic} \quad\text{for } j = 1,\ldots,N \tag{46} \]

The energy eigenvalue is

\[ E = \sum_{j=1}^{N} k_j^2 \tag{47} \]

For repulsive ($c > 0$) interactions, one can prove that all Bethe roots are real.

The Bethe ansatz has been applied to a number of other and more general models, both for discrete spins and in the continuum. These include the anisotropic Heisenberg (XXZ) spin chain, for which the above working readily generalizes to trigonometric functions. The underlying ansatz [33] remains the same. One key generalization is the nested Bethe ansatz, which arises, for example, in the solution of the general N-state permutator model, the Hubbard model, and the Gaudin–Yang model of interacting fermions. For such models the nested Bethe ansatz involves an additional level of work to determine the amplitudes appearing in the wave function [33] due to higher symmetries. This results in Bethe equations involving different types or colors of roots.

The exactly solved one-dimensional quantum spin chains may also be obtained from their two-dimensional classical counterparts, the vertex models. For example, the six-vertex model shares the same Bethe ansatz wave function and Bethe equations as the XXZ spin chain. The more general permutator Hamiltonians are related to multistate vertex models. One may also consider other spin-S models.

The discussion in this article has centered on what is known as the coordinate Bethe ansatz. Another formulation is the algebraic Bethe ansatz, which was developed for the systematic treatment of the higher-spin models. In this formulation, operators create the Bethe states by acting on a vacuum. The algebraic Bethe ansatz goes hand-in-hand with the quantum inverse-scattering method. In all of the exactly solved Bethe ansatz models, it is possible to derive quantities like the ground-state energy per site via the root density method, which assumes that the Bethe roots form a uniform distribution in the infinite-size limit. The thermodynamics of the Bethe ansatz solvable models may also be calculated in a systematic fashion.

Despite Bethe's early optimism, the Bethe ansatz has not been extended to higher-dimensional systems.

See also: Affine Quantum Groups; Eight Vertex and Hard Hexagon Models; Integrability and Quantum Field Theory; Integrable Systems: Overview; Quantum Spin Systems; Yang–Baxter Equations.

Further Reading

Baxter RJ (1983) Exactly Solved Models in Statistical Mechanics. London: Academic Press.
Baxter RJ (2003) Completeness of the Bethe ansatz for the six- and eight-vertex models. Journal of Statistical Physics 108: 1–48.
Bethe HA (1931) Zur Theorie der Metalle I. Eigenwerte und Eigenfunktionen der linearen Atomkette. Zeitschrift für Physik 71: 205–226.
Gaudin M (1967) Un Système à Une Dimension de Fermions en Interaction. Physics Letters A 24: 55–56.
Gaudin M (1983) La Fonction d'Onde de Bethe. Paris: Masson.
Korepin VE, Izergin AG, and Bogoliubov NM (1993) Quantum Inverse Scattering Method and Correlation Functions. Cambridge: Cambridge University Press.
Lieb EH and Liniger W (1963) Exact analysis of an interacting Bose gas I. The general solution and the ground state. Physical Review 130: 1605–1616.
Mattis DC (1993) The Many-Body Problem: An Encyclopaedia of Exactly Solved Models in One-Dimension. Singapore: World Scientific.
McGuire JB (1964) Study of exactly soluble one-dimensional N-body problems. Journal of Mathematical Physics 5: 622–636.
Sutherland B (2004) Beautiful Models: 70 Years of Exactly Solved Quantum Many-Body Problems. Singapore: World Scientific.
Takahashi M (1999) Thermodynamics of One-Dimensional Solvable Models. Cambridge: Cambridge University Press.
Yang CN (1967) Some exact results for the many-body problem in one-dimension with repulsive Delta-function interaction. Physical Review Letters 19: 1312–1315.

BF Theories
M Blau, Université de Neuchâtel, Neuchâtel, Switzerland
© 2006 Elsevier Ltd. All rights reserved.

Introduction

BF theories are a class of gauge theories with a nontrivial metric-independent classical action. As such these theories are candidate topological field theories akin to the Chern–Simons theory in three dimensions, but in contrast to the Chern–Simons theory these exist and are well defined in arbitrary dimensions.

The name "BF theories" derives from the fact that, roughly (see [1] below and the subsequent discussion for a more precise description), the action of the BF theory takes the form $\int B \wedge F_A$ with $F_A$ the curvature of a connection $A$ and $B$ a Lagrange multiplier. The classical equations of motion imply that $A$ is flat, $F_A = 0$, and thus BF theories are topological gauge theories of flat connections.

Abelian BF theories and their relation to topological invariants (the Ray–Singer torsion) were originally discussed by Schwarz (1978, 1979). In the context of the topological field theory, nonabelian BF theories were introduced in Horowitz (1989) and Blau and Thompson (1989, 1991). Since then, BF theories have attracted a lot of attention as simple toy-models of (topological) gauge theories, and also because of their relationships with the Chern–Simons theory, the Yang–Mills theory, and gauge-theory formulations of gravity, as well as because of the rather rich and intricate structure of their quantum theories.

The purpose of this article is to provide an overview of these various features of BF theories. The standard reference for the basic classical and quantum properties of BF theories is Birmingham et al. (1991).
Basic Classical Properties of BF Theories

Nonabelian BF Theories

The classical action and equations of motion. Typically, the classical action of the BF theory takes the form

\[ S_{BF}(A,B) = \int_M \mathrm{tr}_G\, B \wedge F_A \tag{1} \]

where $F_A$ is the curvature of a connection $A$ on a principal $G$-bundle $P \to M$ over an $n$-dimensional manifold $M$, $B$ is an ad-equivariant horizontal $(n-2)$-form on $P$, and $\mathrm{tr}_G$ (a "trace") denotes an ad-invariant nondegenerate scalar product on the Lie algebra $\mathfrak{g}$ of the Lie group $G$. Generalizations of this are possible, in particular, for $G$ abelian or for $n = 3$ and are mentioned below.

We consider $F_A$ and $B$ as forms on $M$ taking values in the bundle of Lie algebras $\mathrm{ad}\,P = P \times_{\mathrm{ad}} \mathfrak{g}$ and refer to such objects as elements of $\Omega^{\bullet}(M,\mathfrak{g})$. Then $\mathrm{tr}\, B \wedge F_A \in \Omega^n(M,\mathbb{R})$ is a volume form on $M$. In order to simplify the exposition, in the following we will mostly assume that $G$ is compact semisimple and that $M$ is compact without a boundary (even though relaxing any one of these conditions is possible and also of interest in its own right).

Varying the action [1] with respect to $A$ and $B$, one obtains the classical equations of motion

\[ F_A = 0, \qquad d_A B = 0 \tag{2} \]

where

\[ d_A B = dB + [A, B] \tag{3} \]

is the covariant exterior derivative. In particular, therefore, the equations of motion imply that the connection $A$ is flat.

Gauge invariance. For any $n$, the action [1] is invariant under $G$ gauge transformations (vertical automorphisms of $P$) acting on $A$ and $B$ as

\[ A \to g^{-1}Ag + g^{-1}dg, \qquad B \to g^{-1}Bg \tag{4} \]

(the latter is what is meant by the fact that $B$ takes values in $\mathrm{ad}\,P$), because $F_A$ is also ad-equivariant, $F_A \to g^{-1}F_A g$, and $\mathrm{tr}_G$ is ad-invariant. The infinitesimal version of this statement is that the action is invariant under the variations

\[ \delta A = d_A\lambda, \qquad \delta B = [B, \lambda] \tag{5} \]

where $\lambda \in \Omega^0(M,\mathfrak{g})$ can (formally) be thought of as an element of the Lie algebra of the group of gauge transformations.

Gauge-fixing this symmetry can proceed in the usual way (via the Faddeev–Popov or Becchi–Rouet–Stora–Tyupkin procedure), a typical gauge choice being $d_{A_0} \star (A - A_0) = 0$ where $A_0$ is a reference connection, and $\star$ is the Hodge duality operator corresponding to a choice of metric on $M$.

Local p-form symmetries. For $n = 2$, the only local symmetries of the BF action are the above $G$ gauge transformations. For $n > 2$, however, there are other local symmetries associated with shifts of $B_p \in \Omega^p(M,\mathfrak{g})$ with $p = n-2 > 0$. Indeed, integration by parts using Stokes' theorem and $\partial M = 0$ shows that [1] is invariant under

\[ A \to A, \qquad B_p \to B_p + d_A\lambda_{p-1}, \qquad \lambda_{p-1} \in \Omega^{p-1}(M,\mathfrak{g}) \tag{6} \]

For $p = 1$, $\lambda$ is a 0-form and the invariance follows. For $p > 1$, however, the gauge parameter has, in some sense, its own gauge invariance. Namely, under the shift

\[ \lambda_{p-1} \to \lambda_{p-1} + d_A\lambda_{p-2} \tag{7} \]

one has

\[ d_A\lambda_{p-1} \to d_A\lambda_{p-1} + [F_A, \lambda_{p-2}] \tag{8} \]

Thus for $F_A = 0$, the shift [7] has no effect on the local symmetry [6]. Likewise, for $p > 2$ the parameter $\lambda_{p-2}$ itself has a similar invariance, etc. Since $F_A = 0$ is one of the classical equations of motion, the shift symmetry [6] is what is called an on-shell reducible symmetry. Gauge-fixing such symmetries is not straightforward, and one generally appeals to the Batalin–Vilkovisky formalism to accomplish this.

Diffeomorphisms and local symmetries. One manifestation of the general covariance of the BF action [1] is the on-shell equivalence of (infinitesimal) diffeomorphisms and (infinitesimal) local symmetries. Diffeomorphisms are generated by the Lie derivative $\mathcal{L}_X$ along a vector field $X$. The action of $\mathcal{L}_X$ on differential forms is given by the Cartan formula $\mathcal{L}_X = d\, i_X + i_X d$, where $i_{(\cdot)}$ is the operation of contraction. The action of the Lie derivative on $A$ and $B$ can be written in gauge covariant form as

\[ \mathcal{L}_X A = i_X F_A + d_A\lambda(X), \qquad \mathcal{L}_X B = i_X d_A B + [B, \lambda(X)] + d_A\lambda'(X) \tag{9} \]

where $\lambda(X) = i_X A$ and $\lambda'(X) = i_X B$. This shows that on-shell diffeomorphisms are equivalent to field-dependent gauge and p-form symmetries of the BF action.

The classical moduli space. The classical moduli space $\mathcal{C} = \mathcal{C}(P,M,G)$ is the space of solutions to the classical equations of motion modulo the local symmetries of the action. Since the field content
and the nature of the local symmetries of the BF theory depend strongly on the dimension $n$ of $M$, the structure and interpretation of the classical moduli space also depend on $n$.

For $n = 2$, by [5] the equation of motion [2] for $B \in \Omega^0(M,\mathfrak{g})$ says that $A$ is invariant under the infinitesimal gauge transformation generated by $B$. Thus if $A$ is irreducible, there are no nontrivial solutions for $B$ and, away from reducible flat connections, the classical moduli space is just the moduli space of flat connections on $P \to M$ over the surface $M$:

\[ \mathcal{C}_{n=2} = \mathcal{M}_{\mathrm{flat}}(P,G) \tag{10} \]

This space may or may not be empty, depending on whether $P$ admits flat connections or not.

For $n = 3$, the equation of motion [2] for $B \in \Omega^1(M,\mathfrak{g})$ says that $B$ is a tangent vector to the space of flat connections at the flat connection $A$, in the sense that under the variation $\delta A = B$, one has

\[ \delta F_A = d_A B = 0 \tag{11} \]

The local $G$ gauge symmetry and the 1-form symmetry [6] now imply that the moduli space of classical solutions can be identified with the (co-)tangent bundle of the moduli space of flat connections on $P \to M$ over the 3-manifold $M$:

\[ \mathcal{C}_{n=3} = T\mathcal{M}_{\mathrm{flat}}(P,G) \tag{12} \]

In higher dimensions there appears to be less geometrical structure associated with BF theories, and all that can be said in general is that the tangent space to $\mathcal{C}_n$ at a solution $(A,B)$ of the equations of motion [2] is the vector space:

\[ T_{(A,B)}\mathcal{C}_n = H_A^1(M,\mathfrak{g}) \oplus H_A^{n-2}(M,\mathfrak{g}) \tag{13} \]

where $H_A^k(M,\mathfrak{g})$ are the cohomology groups of the deformation complex

\[ d_A : \Omega^{\bullet}(M,\mathfrak{g}) \to \Omega^{\bullet+1}(M,\mathfrak{g}) \tag{14} \]

associated with the flat connection $A$, $F_A = (d_A)^2 = 0$.

When $M$ is topologically of the form $M = \Sigma \times \mathbb{R}$ (where one can think of $\mathbb{R}$ as time), one has

\[ T_{(A,B)}\mathcal{C}_n = H_A^1(\Sigma,\mathfrak{g}) \oplus H_A^{n-2}(\Sigma,\mathfrak{g}) \tag{15} \]

This is naturally a symplectic vector space (necessary for a phase space), the nondegenerate antisymmetric pairing being given by Poincaré duality:

\[ \omega\big((a_1,b_1),(a_2,b_2)\big) = \int_\Sigma \mathrm{tr}_G \left(a_1 \wedge b_2 - a_2 \wedge b_1\right) \tag{16} \]

Metric independence. Perhaps the most important property of the action [1] is that, in contrast to, for example, the usual Yang–Mills action for nonabelian gauge fields

\[ S_{YM} = \frac{1}{4g^2} \int_M \mathrm{tr}_G\, F_A \wedge \star F_A \tag{17} \]

it does not require a metric (or the corresponding Hodge duality operator $\star$) for its formulation. This makes it a candidate action for a topological field theory, this term loosely referring to field theories which, in a suitable sense, do not depend on additional structures imposed on the underlying space(-time) manifold $M$, in this case a Riemannian structure.

To establish that BF theories are topological quantum field theories, one needs to show that the partition function (and correlation functions) of the quantized BF theory are also metric independent. This is not completely automatic as typically the metric enters in the gauge fixing of the local symmetries of the action which is required to make the quantum theory well defined. The usual lore is that since the metric only enters through the gauge fixing and since the quantum theory should be independent of the choice of gauge, it should also be metric independent. In the case of nonabelian BF theories, the complexity of their local symmetries complicates the analysis somewhat, but it can nevertheless be shown that BF theories indeed define topological field theories also at the quantum level.

Special Features of Abelian BF Theories

All the features of nonabelian BF theories discussed above are, of course, also valid when $G$ is abelian (with some obvious modifications and simplifications). However, when $G$ is abelian, a more general action than [1] is possible. Indeed, although there is no obvious higher p-form analog of nonabelian gauge fields, in the abelian case $G = U(1)$ or $G = \mathbb{R}$ there is, and the condition $F_A \in \Omega^2(M,\mathbb{R})$ can be relaxed. In particular, one can consider the actions

\[ S(n,p) \equiv S[B_p, C_{n-p-1}] = \int_M B_p \wedge dC_{n-p-1} \tag{18} \]

with $B_p \in \Omega^p(M,\mathbb{R})$, $C_{n-p-1} \in \Omega^{n-p-1}(M,\mathbb{R})$, and $F_C = dC$ its $(n-p)$-form field strength. More generally, one can also consider the hybrid action

\[ S_A(n,p) = \int_M B_p \wedge d_A C_{n-p-1} \tag{19} \]

where $A$ is a fixed (nondynamical) flat $G$-connection, $d_A^2 = 0$, and $B$ and $C$ take values in the corresponding adjoint bundle. This action can be considered as the linearization of the nonabelian BF action [1] around
the flat connection $A$, and it reduces to the abelian BF action [18] for $\mathfrak{g} = \mathbb{R}$.

The action is invariant under the (reducible) local symmetries

\[ B_p \to B_p + d_A\lambda_{p-1}, \qquad C_{n-p-1} \to C_{n-p-1} + d_A\lambda'_{n-p-2} \tag{20} \]

The space of solutions to the equations of motion $d_A C = d_A B = 0$ modulo gauge symmetries is (cf. [13]) the finite-dimensional vector space

\[ \mathcal{C}_{n,p} = H_A^p(M,\mathfrak{g}) \oplus H_A^{n-p-1}(M,\mathfrak{g}) \tag{21} \]

which is naturally symplectic for $M = \Sigma \times \mathbb{R}$.

Uses and Applications of Quantum Abelian BF Theories

Quantization of Abelian BF Theories and the Ray–Singer Torsion

We will now show that the partition function of the abelian BF theory (actually more generally that of the linearized nonabelian BF action [19]) is related to the Ray–Singer torsion of $M$. This requires some preparatory material on Gaussian path integrals, determinants, and gauge fixing that we present first.

In order to simplify the exposition, we assume that there are no harmonic modes, either because they have been gauged away or because the cohomology groups of $d_A$ are trivial, $H_A^k(M,\mathfrak{g}) = 0$, that is, the deformation complex [14] is acyclic.

Laplacians, determinants, and the Ray–Singer torsion. Choosing a Riemannian metric $g$ (and Hodge duality operator $\star$) on $M$, the twisted Laplacian on p-forms is

\[ \Delta_A^{(p)} = (d_A + d_A^{\star})^2 = d_A d_A^{\star} + d_A^{\star} d_A \tag{22} \]

where $d_A^{\star} = \pm \star d_A \star$ is the adjoint of $d_A$ with respect to the scalar product on p-forms defined by $\star$. This is an elliptic operator whose determinant can be defined, for example, by a $\zeta$-function regularization. Denoting the (nonzero) eigenvalues of $\Delta_A^{(p)}$ by $\lambda_k^{(p)}$, its $\zeta$-function is

\[ \zeta^{(p)}(s) = \sum_k \left(\lambda_k^{(p)}\right)^{-s} \tag{23} \]

This converges for $\mathrm{Re}(s)$ sufficiently large and can be analytically continued to a meromorphic function of $s$ analytic at $s = 0$, so that

\[ \det \Delta_A^{(p)} := e^{-\zeta^{(p)\prime}(0)} \tag{24} \]

is well defined. The Ray–Singer torsion of (M, g) (with respect to the flat connection $A$) is then defined by

\[ T_A(M) = \prod_{p=0}^{n} \left(\det \Delta_A^{(p)}\right)^{(-1)^p p/2} \tag{25} \]

Even though this definition depends strongly on the metric $g$ on $M$, the Ray–Singer torsion has the remarkable property of being independent of $g$. The Ray–Singer torsion can be shown to be trivial (essentially $= 1$ modulo zero-mode contributions) in even dimensions, but is a nontrivial topological invariant in odd dimensions. Henceforth, we will suppress the dependence on $M$ and denote the n-dimensional Ray–Singer torsion by $T_A(n)$.

Gaussian path integrals and determinants. The path integral for abelian BF theories is modeled on the usual formula for a $\delta$-function

\[ \delta^n(x) = \frac{1}{(2\pi)^n} \int_{\mathbb{R}^n} d^n\lambda\; e^{i\lambda\cdot x} \tag{26} \]

from which one deduces the Gaussian integral formula

\[ \frac{1}{(2\pi)^n} \int_{\mathbb{R}^n} d^n\lambda \int_{\mathbb{R}^n} d^n x\; e^{i\lambda\cdot Dx + iK\cdot x + i\lambda\cdot J} = \int_{\mathbb{R}^n} d^n x\; \delta^n(Dx + J)\, e^{iK\cdot x} = \frac{1}{\det D}\, e^{-iK\cdot D^{-1}J} \tag{27} \]

Here, we have assumed that the operator (matrix) $D$ is invertible. The model that one uses in the path integral is that

\[ (\det D)^{\mp 1} = \int d\Phi\, d\Lambda\; e^{\,i\int_M \Lambda \star D\Phi} \tag{28} \]

where $\Phi$ is a set of fields and the $\Lambda$ are a set of dual fields with $D$ again a nondegenerate operator. The inverse determinant arises for Grassmann even fields (as in [27]), while it is the determinant that appears for Grassmann odd fields.

Gauge fixing: the Faddeev–Popov trick. If the action [19], $S_A(n,p) = \int B_p\, d_A C_{n-p-1}$, were nondegenerate, its partition function could be defined directly by [28]. However, because of gauge invariance of the action, the kinetic term is degenerate and one needs to eliminate the gauge freedom to obtain an (at least formally) well-defined expression for the partition function. Concretely, this degeneracy can be seen by
recalling that, when there are no harmonic forms (as we have assumed), there is a unique orthogonal Hodge decomposition of a p-form $B_p \in \Omega^p(M,\mathfrak{g})$ into a sum of a $d_A$-exact and a $d_A$-coexact form:

\[ B_p = d_A\alpha_{p-1} + d_A^{\star}\beta_{p+1} \tag{29} \]

(and likewise for $C$). Evidently, the exact (longitudinal) parts $d_A\alpha$ of $B$ and $C$ do not appear in the action, and these are precisely the gauge-dependent parts of $B$ and $C$ under the gauge transformation [20]. Gauge fixing amounts to imposing a condition $F(B_p) = 0$ on $B_p$ that determines the longitudinal part uniquely in terms of the transversal part $d_A^{\star}\beta$. A natural condition is

\[ d_A\alpha_{p-1} = 0 \;\Leftrightarrow\; F(B_p) = d_A^{\star}B_p = 0 \tag{30} \]

A partition function independent of the gauge-fixing condition results from inserting "1" in the form of

\[ 1 = \int_{\mathcal{G}} dg\; \delta\big(F(B^g)\big)\, \Delta_F(B) \tag{31} \]

into the functional integral (the Faddeev–Popov trick), where $\mathcal{G}$ is the gauge group. This defines the Faddeev–Popov determinant $\Delta_F$, and the functional properties of the delta functional imply that $\Delta_F$ is the determinant of the operator that one obtains upon gauge variation of $F(B)$.

In the general case of reducible gauge symmetries, the nature of the gauge group is complicated and requires some more thought. In the irreducible case, however, that is, for $p = 1$, the Lie algebra of the gauge group can be identified with $\Omega^0(M,\mathfrak{g})$, and $\Delta_F$ is the determinant of the operator:

\[ \frac{\delta F}{\delta B}\, d_A : \Omega^0(M,\mathfrak{g}) \to \Omega^0(M,\mathfrak{g}) \tag{32} \]

For [30], this is simply the Laplacian on 0-forms, and thus

\[ \Delta_F = \det \Delta_A^{(0)} \tag{33} \]

The partition function. Following the finite-dimensional model, both the $\delta$-function implementing the gauge-fixing condition and the Faddeev–Popov determinant can be lifted into the exponential, the former by a Lagrange multiplier $\pi$ [26], a Grassmann even 0-form, and the latter by a pair of Grassmann odd 0-forms $c$ and $\bar{c}$ [28], the ghost and antighost fields, respectively. The sum of the classical action and these gauge-fixing and ghost terms defines the (BRST-invariant) quantum action $S_A^q(n,p)$, and the partition function is

\[ Z_A(n,p) = \int d\Phi\; e^{\,iS_A^q(n,p)} \tag{34} \]

where $\Phi$ denotes collectively all the fields. Concretely, when $n = 2$ and $p = 0$ (or, equivalently, $p = 1$), the quantum action is

\[ S_A^q(2,0) = \int B_0\, d_A C_1 + \pi\, d_A \star C_1 + \bar{c} \star \Delta_A^{(0)} c \tag{35} \]

Likewise, for $n = 3$ and $p = 1$ (the only other case when the gauge symmetry is indeed irreducible), both $B_1$ and $C_1$ require separate gauge fixing, and the quantum action is

\[ S_A^q(3,1) = \int B_1\, d_A C_1 + \pi\, d_A \star C_1 + \bar{c} \star \Delta_A^{(0)} c + \pi'\, d_A \star B_1 + \bar{c}' \star \Delta_A^{(0)} c' \tag{36} \]

Formally, therefore, the two-dimensional partition function is

\[ Z_A(2,0) = \frac{\det \Delta_A^{(0)}}{\det D_A} \tag{37} \]

where $D_A$ is the operator:

\[ D_A = \begin{pmatrix} \star d_A \\ \star d_A \star \end{pmatrix} : \Omega^1(M,\mathfrak{g}) \to \Omega^0(M,\mathfrak{g}) \oplus \Omega^0(M,\mathfrak{g}) \tag{38} \]

One can define the determinant of this operator as the square root of the determinant of the operator $D_A^{\star}D_A = \Delta_A^{(1)}$, and therefore the partition function

\[ Z_A(2,0) = \det \Delta_A^{(0)} \left(\det \Delta_A^{(1)}\right)^{-1/2} = T_A(2) \tag{39} \]

is equal to the two-dimensional Ray–Singer torsion [25]. In this case, it is easy to see directly that the even-dimensional Ray–Singer torsion is trivial, as one could have equally well defined the determinant of $D_A$ as the square root of the determinant of the operator $D_A D_A^{\star} = \Delta_A^{(0)} \oplus \Delta_A^{(0)}$, which implies $Z_A(2,0) = 1$.

In three dimensions, the two pairs of ghosts each contribute a $\det \Delta_A^{(0)}$, and thus

\[ Z_A(3,1) = \frac{\left(\det \Delta_A^{(0)}\right)^2}{\det D_A} \tag{40} \]

where

\[ D_A = \begin{pmatrix} \star d_A & d_A \\ d_A \star & 0 \end{pmatrix} : \Omega^0(M,\mathfrak{g}) \oplus \Omega^1(M,\mathfrak{g}) \to \Omega^0(M,\mathfrak{g}) \oplus \Omega^1(M,\mathfrak{g}) \tag{41} \]

is the operator acting on the fields $(B_1, C_1, \pi, \pi')$. As before, this operator can be diagonalized by squaring it, $D_A D_A = \Delta_A^{(0)} \oplus \Delta_A^{(1)}$, and thus

\[ Z_A(3,1) = \left(\det \Delta_A^{(0)}\right)^{3/2} \left(\det \Delta_A^{(1)}\right)^{-1/2} = T_A(3)^{-1} \tag{42} \]
262 BF Theories

is again related to the (this time genuinely nontrivial) Since the dimension of  is equal to the codimen-
RaySinger torsion. sion of S0 = @0 ,  and S0 will generically intersect
In spite of the complications caused by reducible transversally at isolated points, and we define the
gauge symmetries, it can be shown that all of the linking number of S and S0 to be the intersection
above generalizes to arbitrary n and p, with the number of  and S0 , expressed in terms of de Rham
result that (for n odd) currents as
p Z Z
ZA n; p TA n1 43
LS; S0 S0  S0 46
confirming the topological nature of BF theories.  M

In the nonabelian case, the situation is significantly In terms of de Rham currents, the Wilson
R surface
more complicated because of the complexity of the classical moduli space, the (higher cohomology) zero modes, and the on-shell reducibility of the gauge symmetries. Nevertheless, ignoring all the zero modes except those of A, that is, except the moduli m of flat connections A(m), the result is similar to that in the abelian case, in that the partition function reduces to an integral over the moduli space of flat connections, with measure determined by the Ray–Singer torsion $T_{A(m)}$.

Linking Numbers as Observables of Abelian BF Theories

With the exception of p = 0, there are no interesting local observables (gauge-invariant functionals of the fields C and B) in the abelian BF theory, since the gauge-invariant field strengths dC and dB vanish by the equations of motion. (For p = 0, B is a gauge-invariant 0-form and hence B(x) is a good local observable.) However, as in the Chern–Simons and Yang–Mills theories, certain (weakly) nonlocal observables such as Wilson loops are also of interest. In the case at hand (eqn [18]), we have abelian Wilson surface operators

\[ W_S[B] = \int_S B, \qquad W_{S'}[C] = \int_{S'} C \tag{44} \]

associated with p- and (n − p − 1)-dimensional submanifolds S and S' of M, respectively. These operators are gauge invariant, that is, invariant under the local symmetries [20] provided that $\partial S = \partial S' = 0$, so that S and S' represent homology cycles of M.

For $M = \mathbb{R}^n$, correlation functions of these operators are related to the topological linking number of S and S'. We choose $S = \partial\Sigma$ and $S' = \partial\Sigma'$ to be disjoint compact-oriented boundaries of oriented submanifolds $\Sigma$ and $\Sigma'$ of $\mathbb{R}^n$. We also introduce de Rham currents $\delta_\Sigma$ and $\delta_S$ (essentially distributional differential forms with $\delta$-function support on $\Sigma$ or S, respectively), characterized by the properties

\[ \int_S \omega_p = \int_M \delta_S \wedge \omega_p, \qquad \int_\Sigma \omega_{p+1} = \int_M \delta_\Sigma \wedge \omega_{p+1} \tag{45} \]

for all $\omega_k \in \Omega^k(M, \mathbb{R})$ (and likewise for S' and $\Sigma'$).

operators can be written as $W_S[B] = \int_M \delta_S \wedge B$, etc. Thus, the generating functional for correlation functions of Wilson surface operators

\[ \langle e^{i\mu W_S[B]}\, e^{i\lambda W_{S'}[C]} \rangle = \int DC\, DB\; e^{\, i\left( \int_M B \wedge dC + \lambda \int_{S'} C + \mu \int_S B \right)} \tag{47} \]

is simply a Gaussian path integral. Using the defining properties of de Rham currents, this can be formally evaluated (using [27]) to give

\[ \langle e^{i\mu W_S[B]}\, e^{i\lambda W_{S'}[C]} \rangle = e^{\, i\mu\lambda L(S, S')} \tag{48} \]

As expected, correlation functions of these topological field theories encode topological information.

Uses and Applications of Classical Nonabelian BF Theories

Low-dimensional BF theories are closely related to other theories of interest, for example, the Yang–Mills theory, the Chern–Simons theory, and gravity. Here, we briefly review some of these relationships. In order to avoid the complexities of quantum nonabelian BF theories, we focus on their classical features. Brief suggestions for further reading are provided at the end of each subsection.

Relation with Yang–Mills Theory

In any dimension, the nonabelian BF action can be regarded as the zero-coupling limit $g^2 \to 0$ of the Yang–Mills theory since the Yang–Mills action [17] can be written in first-order form as

\[ \frac{1}{4g^2} \int_M \mathrm{tr}_G\, F_A \wedge \star F_A \;\simeq\; \int_M \mathrm{tr}_G \left( i B_{n-2} \wedge F_A + g^2 B_{n-2} \wedge \star B_{n-2} \right) \tag{49} \]

However, whereas for $n \geq 3$ the $B^2$-term breaks the p-form gauge invariance of the BF action (and thus liberates the physical Yang–Mills degrees of freedom), this limit is nonsingular in two dimensions where this p-form symmetry is absent and, indeed, both theories have zero physical degrees of freedom.
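The equivalence in [49] can be checked by eliminating $B_{n-2}$ through its algebraic equation of motion. The following is a sketch only, assuming Euclidean signature (so that $\star\star = 1$ on 2-forms and on (n − 2)-forms) and using $\alpha \wedge \star\beta = \beta \wedge \star\alpha$ for forms of equal degree; sign conventions in other signatures may differ.

```latex
% Vary the right-hand side of [49] with respect to B_{n-2}:
\delta_B:\qquad i F_A + 2 g^2 \star B_{n-2} = 0
\quad\Longrightarrow\quad
B_{n-2} = -\frac{i}{2g^2}\, \star F_A .

% Substitute back, term by term:
i B_{n-2} \wedge F_A = \frac{1}{2g^2}\, F_A \wedge \star F_A ,
\qquad
g^2 B_{n-2} \wedge \star B_{n-2} = -\frac{1}{4g^2}\, F_A \wedge \star F_A .

% The sum reproduces the second-order Yang--Mills Lagrangian:
\mathrm{tr}_G \left( i B_{n-2} \wedge F_A + g^2 B_{n-2} \wedge \star B_{n-2} \right)
\Big|_{B = -\frac{i}{2g^2} \star F_A}
= \frac{1}{4g^2}\, \mathrm{tr}_G\, F_A \wedge \star F_A .
```

At $g^2 = 0$ the quadratic term is absent, B can no longer be eliminated in this way, and it instead acts as a Lagrange multiplier enforcing $F_A = 0$, which is the BF theory.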
A nonsingular BF-like zero coupling limit of the Yang–Mills theory for $n \geq 3$ can be obtained by introducing an auxiliary (Stückelberg) field $\eta \in \Omega^{n-3}(M, \mathfrak{g})$ which restores the p-form gauge invariance. The resulting BF Yang–Mills action is

\[ S_{BFYM} = \int_M \mathrm{tr}_G \left[ i B_{n-2} \wedge F_A + g^2 \left( B_{n-2} - \frac{1}{\sqrt{2}\, g}\, d_A \eta \right) \wedge \star \left( B_{n-2} - \frac{1}{\sqrt{2}\, g}\, d_A \eta \right) \right] \tag{50} \]

This action is not only invariant under ordinary G gauge transformations, but also under the p-form gauge symmetry $B \to B + d_A \lambda'$ [6] provided that $\eta$ transforms as $\eta \to \eta + \sqrt{2}\, g \lambda'$. Thus, this shift can be used to set $\eta$ to zero, upon which one recovers the first-order form of the Yang–Mills action. Moreover, in the zero-coupling limit all that survives is a standard (and nontopological) minimal coupling of $\eta$ to the BF action:

\[ \lim_{g^2 \to 0} S_{BFYM} = \int_M \mathrm{tr}_G \left( i B_{n-2} \wedge F_A + \tfrac{1}{2}\, d_A \eta \wedge \star d_A \eta \right) \tag{51} \]

accounting for the correct number of degrees of freedom of the Yang–Mills theory (the (n − 3)-form $\eta$ being absent for n = 2).

Two-dimensional quantum BF and Yang–Mills theories have a variety of interesting topological properties. An account of some of them can be found in Blau and Thompson (1994) and Witten (1991). For a detailed discussion of the gauge symmetries and gauge fixing of the BFYM action, see Cattaneo et al. (1998).

Chern–Simons Theory, Gravity, and (Deformed) BF Theory

The Chern–Simons theory is a three-dimensional gauge theory. The Chern–Simons action for an H-connection C, H the gauge group, is

\[ S_{CS}(C) = \int_M \mathrm{tr}_H \left( C \wedge dC + \tfrac{2}{3}\, C \wedge C \wedge C \right) \tag{52} \]

It is invariant under the infinitesimal gauge transformations $\delta C = d_C \lambda$, $\lambda \in \Omega^0(M, \mathfrak{h})$, and the gauge-invariant equation of motion is the flatness condition $F_C = 0$.

Now let H = TG be the tangent bundle group $TG \simeq G \times_s \mathfrak{g}$. This is a semidirect product group with G acting on $\mathfrak{g}$ via the adjoint and $\mathfrak{g}$ regarded as an abelian Lie algebra of translations. Thus, in terms of generators $(J_a, P_a)$, where the $J_a$ are generators of G, the commutation relations are $[J_a, J_b] = f_{ab}{}^{c} J_c$, $[J_a, P_b] = f_{ab}{}^{c} P_c$ and $[P_a, P_b] = 0$, and the curvature of the TG-connection $C = J_a A^a + P_a B^a$ is

\[ F_C = J_a F_A^a + P_a (d_A B)^a \tag{53} \]

Thus, the equations of motion of the TG Chern–Simons theory are equivalent to the equations of motion [2] of the BF theory with gauge group G. This equivalence also holds at the level of the action:

\[ \tfrac{1}{2}\, S_{CS}(C) = S_{BF}(A, B) \tag{54} \]

provided that one chooses the nondegenerate invariant scalar product to be

\[ \mathrm{tr}_{TG}(J_a P_b) = \mathrm{tr}_G(J_a J_b), \qquad \mathrm{tr}_{TG}(J_a J_b) = \mathrm{tr}_{TG}(P_a P_b) = 0 \tag{55} \]

For G = SO(3), TG is the Euclidean group of isometries of $\mathbb{R}^3$ and for G = SO(2, 1), TG is the Poincaré group of isometries of the three-dimensional Minkowski space $\mathbb{R}^{2,1}$. For these gauge groups, the BF action takes the form of the three-dimensional (Euclidean or Lorentzian) Einstein–Hilbert action, with the interpretation of B = e as the dreibein and $A = \omega$ as the spin connection. The equations of motion for e and $\omega$ express the vanishing of the torsion and the Riemann tensor (equivalent to the vanishing of the Ricci tensor for n = 3), respectively. This Chern–Simons interpretation of three-dimensional gravity extends to gravity with a cosmological constant, with H the appropriate de Sitter or anti-de Sitter isometry group (SO(4), SO(3, 1), or SO(2, 2), depending on the signature and the sign of the cosmological constant). In terms of the BF interpretation, this corresponds to the simple topological deformation

\[ S^{\Lambda}_{BF}(A, B) = \int_M \mathrm{tr}_G \left( B \wedge F_A + \tfrac{\Lambda}{3}\, B \wedge B \wedge B \right) \tag{56} \]

of the BF action, which has the deformed local symmetries (cf. [5] and [6])

\[ \delta A = d_A \lambda + \Lambda [B, \lambda'], \qquad \delta B = [B, \lambda] + d_A \lambda' \tag{57} \]

A simple way to understand these symmetries is to note that the action can be written as the difference of two Chern–Simons actions:

\[ S_{CS}(A + \sqrt{\Lambda}\, B) - S_{CS}(A - \sqrt{\Lambda}\, B) = 4 \sqrt{\Lambda}\; S^{\Lambda}_{BF}(A, B) \tag{58} \]

whose evident standard local gauge symmetries $\delta(A \pm \sqrt{\Lambda}\, B) = d_{A \pm \sqrt{\Lambda} B}\, \lambda^{\pm}$ are equivalent to [57] for $\lambda^{\pm} = \lambda \pm \sqrt{\Lambda}\, \lambda'$.

A detailed account of three-dimensional classical and quantum gravity can be found in Carlip (1998).
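The commutation relations of TG and the scalar product [55] can be checked numerically in a small case. The following is a minimal sketch for G = SO(3), assuming structure constants $f_{ab}{}^{c} = \epsilon_{abc}$ and the normalization $\mathrm{tr}_G(J_a J_b) = \delta_{ab}$; the function names (`bracket`, `pairing`, and so on) are illustrative and do not come from the source. It verifies the Jacobi identity on a basis and the invariance of the scalar product under the adjoint action.

```python
from itertools import product


def eps(a, b, c):
    """Levi-Civita symbol for indices in {0, 1, 2}: the so(3) structure constants."""
    return (a - b) * (b - c) * (c - a) // 2


def basis(i):
    """Basis vector i of the 6-dimensional algebra: 0..2 are J_a, 3..5 are P_a."""
    return [1 if j == i else 0 for j in range(6)]


def bracket(x, y):
    """Lie bracket of coefficient vectors x, y in the (J, P) basis."""
    z = [0] * 6
    for a, b, c in product(range(3), repeat=3):
        f = eps(a, b, c)
        if f:
            # [J_a, J_b] = f_ab^c J_c
            z[c] += f * x[a] * y[b]
            # [J_a, P_b] = f_ab^c P_c and, by antisymmetry, [P_a, J_b] = f_ab^c P_c
            z[3 + c] += f * (x[a] * y[3 + b] + x[3 + a] * y[b])
    # [P_a, P_b] = 0, so there is no P-P contribution
    return z


def pairing(x, y):
    """Scalar product of [55], with tr_G(J_a J_b) normalized to delta_ab (assumed)."""
    return sum(x[a] * y[3 + a] + x[3 + a] * y[a] for a in range(3))


def add(*vectors):
    """Componentwise sum of coefficient vectors."""
    return [sum(components) for components in zip(*vectors)]


def jacobi_holds():
    """Check the Jacobi identity on all triples of basis elements."""
    e = [basis(i) for i in range(6)]
    return all(
        add(bracket(x, bracket(y, z)),
            bracket(y, bracket(z, x)),
            bracket(z, bracket(x, y))) == [0] * 6
        for x, y, z in product(e, repeat=3)
    )


def pairing_is_invariant():
    """Check <[x, y], z> + <y, [x, z]> = 0 on all triples of basis elements."""
    e = [basis(i) for i in range(6)]
    return all(
        pairing(bracket(x, y), z) + pairing(y, bracket(x, z)) == 0
        for x, y, z in product(e, repeat=3)
    )
```

Because the pairing only couples J with P, the Chern–Simons Lagrangian for $C = J_a A^a + P_a B^a$ retains only terms linear in B, which is the mechanism behind [54].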
Relation with Gravity

Theories of two-dimensional gravity and topological gravity also have a BF formulation (Blau and Thompson 1991, Birmingham et al. 1991) which resembles the Chern–Simons BF formulation of three-dimensional gravity described above, the natural gauge group now being SO(2, 1) or SO(3) or one of its contractions.

In the first-order (Palatini) formulation, the Einstein–Hilbert action for four-dimensional gravity can be written as

\[ S_{EH} = \int \mathrm{tr}\,( e \wedge e \wedge F_\omega ) \tag{59} \]

where e is the vierbein and $\omega$ is the spin connection. This action has the general form of a BF action with a constraint that $B = e \wedge e$ be a simple bi(co-)vector. Thus, four-dimensional general relativity can be regarded as a constrained BF theory. Although this constraint drastically changes the number of physical degrees of freedom (BF theory has zero degrees of freedom, while four-dimensional gravity has two), this is nevertheless a fruitful analogy which also lies at the heart of the spin-foam quantization approach to quantum gravity. This constrained BF description of gravity is also available for higher-dimensional gravity theories.

For further details, and references, see Freidel et al. (1999) and the review article (Baez 2000).

Knot and Generalized Knot Invariants

The known relationship between Wilson loop observables of the Chern–Simons theory with a compact gauge group and knot invariants (Witten 1989), and the interpretation of the three-dimensional BF theory as a Chern–Simons theory with a noncompact gauge group raise the question of the relation of observables of an n = 3 BF theory to knot invariants, and suggest the possibility of using an $n \geq 4$ BF theory to define higher-dimensional analogs of knot invariants. It turns out that an appropriate observable of n = 3 BF theory for G = SU(2) is related to the Alexander–Conway polynomial. The analysis of higher-dimensional BF theories requires the full power of the Batalin–Vilkovisky (BV) formalism. BV observables generalizing Wilson loops have been shown to give rise to cohomology classes on the space of imbedded curves. For a detailed discussion of these issues, see Cattaneo and Rossi (2001) and references therein. A relation between the algebra of generalized Wilson loops and string topology has been investigated in Cattaneo et al. (2003).

See also: Batalin–Vilkovisky Quantization; BRST Quantization; Chern–Simons Models: Rigorous Results; Gauge Theories From Strings; Knot Invariants and Quantum Gravity; Loop Quantum Gravity; Moduli Spaces: An Introduction; Nonperturbative and Topological Aspects of Gauge Theory; Schwarz-Type Topological Quantum Field Theory; Spin Foams; Topological Quantum Field Theory: Overview.

Further Reading

Baez J (2000) An introduction to spin foam models of quantum gravity and BF theory. Lecture Notes in Physics 543: 25–94.
Birmingham D, Blau M, Rakowski M, and Thompson G (1991) Topological field theory. Physics Reports 209: 129–340.
Blau M and Thompson G (1989) A New Class of Topological Field Theories and the Ray–Singer Torsion. Physics Letters B 228: 64–68.
Blau M and Thompson G (1991) Topological gauge theories of antisymmetric tensor fields. Annals of Physics 205: 130–172.
Blau M and Thompson G (1994) Lectures on 2d gauge theories: topological aspects and path integral techniques. In: Gava E, Masiero A, Narain KS, Randjbar-Daemi S, and Shafi Q (eds.) Proceedings of the 1993 Trieste Summer School on High Energy Physics and Cosmology, pp. 175–244. Singapore: World Scientific.
Carlip S (1998) Quantum Gravity in 2+1 Dimensions. Cambridge: Cambridge University Press.
Cattaneo A and Rossi C (2001) Higher-dimensional BF theories in the Batalin–Vilkovisky formalism: the BV action and generalized Wilson loops. Communications in Mathematical Physics 221: 591–657.
Cattaneo A, Cotta-Ramusino P, Fucito F, Martellini M, and Rinaldi M, et al. (1998) Four-dimensional Yang–Mills theory as a deformation of topological BF theory. Communications in Mathematical Physics 197: 571–621.
Cattaneo A, Pedrini P, and Fröhlich J (2003) Topological field theory interpretation of string topology. Communications in Mathematical Physics 240: 397–421.
Freidel L, Krasnov K, and Puzio R (1999) BF description of higher-dimensional gravity theories. Advances in Theoretical and Mathematical Physics 3: 1289–1324.
Horowitz GT (1989) Exactly soluble diffeomorphism invariant theories. Communications in Mathematical Physics 125: 417–437.
Schwarz AS (1978) The partition function of a degenerate quadratic functional and Ray–Singer Invariants. Letters in Mathematical Physics 2: 247–252.
Schwarz AS (1979) The partition function of a degenerate functional. Communications in Mathematical Physics 67: 1–16.
Witten E (1989) Quantum field theory and the Jones polynomial. Communications in Mathematical Physics 127: 351–399.
Witten E (1991) On quantum gauge theories in two dimensions. Communications in Mathematical Physics 141: 153–209.
Bicrossproduct Hopf Algebras and Noncommutative Spacetime

S Majid, Queen Mary, University of London, London, UK

© 2006 Elsevier Ltd. All rights reserved.

Introduction

One of the sources of quantum groups is a bicrossproduct construction coming in the case of Lie groups from considerations of Planck-scale physics in the 1980s. This article describes these objects and their currently known applications. See also the overview of Hopf algebras which provides the algebraic context (see Hopf Algebras and q-Deformation Quantum Groups).

The construction of quantum groups here is viewed as a microcosm of the problem of quantization in a manner compatible with geometry. Here quantization enters in the noncommutativity of the algebra of observables and curvature enters as a quantum nonabelian group structure on phase space. Among the main features of the resulting bicrossproduct models (Majid 1988) are

1. Compatibility takes the form of nonlinear "matched pair equations" generically leading to singular accumulation regions (event horizons or a maximum value of momentum depending on context).
2. The equations are solved in an "equal and opposite" form from local factorization of a larger object.
3. Different classical limits are related by observer–observed symmetry and Hopf algebra duality.
4. Nonabelian Born reciprocity re-emerges and is linked to T-duality.

It has also been argued that noncommutative geometry should emerge as an effective theory of the first corrections to geometry coming from any unknown theory of quantum gravity. Concrete models of noncommutative spacetime currently provide the first framework for the experimental verification of such effects. The most basic of these possible effects is curvature in momentum space or "cogravity". We start with this.

                    Position                                                          Momentum
Gravity             Curved: $\sum_i x_i^2 = 1/\gamma^2$                                Noncommutative: $[p_i, p_j] = i\hbar\gamma\,\epsilon_{ijk}\, p_k$
Cogravity           Noncommutative: $[x_i, x_j] = 2i\lambda\,\epsilon_{ijk}\, x_k$      Curved: $\sum_i p_i^2 = 1/\lambda^2$
Quantum mechanics   $[x_i, p_j] = i\hbar\,\delta_{ij}$

Figure 1 Noncommutative spacetime means curvature in momentum space. The equations are for illustration.

Cogravity

We recall that curvature in space or spacetime means by definition noncommutativity among the covariant derivatives $D_i$. Here the natural momenta are $p_i = -i\hbar D_i$ and the situation is typified by the top line in Figure 1. There are also mixed relations between the $D_i$ and position functions as indicated for flat space in the bottom line, which is quantum mechanics (there is a similar story for quantum mechanics on a curved space). We see however a third and dual possibility, noncommutativity in position space, which should be interpreted as curvature in momentum space, that is, the dual of gravity. This is an independent physical effect and comes therefore with its own length scale which we denote $\lambda$. These ideas were made precise in the mid 1990s using the quantum group Fourier transform; see Majid (2000). Here we show what is involved on three illustrative examples.

1. We consider the "spin space" algebra

\[ \mathbb{R}^3_\lambda : \quad [x_i, x_j] = i 2\lambda\, \epsilon_{ij}{}^{k}\, x_k \]

where $\epsilon_{12}{}^{3} = 1$ and where it is convenient to insert a factor 2. This is the enveloping algebra $U(su_2)$, that is, just angular momentum space but now regarded "upside down" as a coordinate algebra (see Hopf Algebras and q-Deformation Quantum Groups). Then a plane wave is of the form

\[ \psi_p = e^{i p \cdot x}, \qquad p \in \mathbb{R}^3 \]

where we set $\hbar = 1$ for this discussion. The momenta $p^i$ are nothing but local coordinates for the corresponding point $e^{i\lambda p \cdot \sigma} \in SU_2$ where $\sigma$ is the representation by Pauli matrices. It is really elements of this curved space $SU_2$ where momenta live. Here $\mathbb{R}^3_\lambda = U(su_2)$ has dual $\mathbb{C}[SU_2]$ and Hopf algebra Fourier transform (after suitable completion) takes one between these spaces. Thus, in one direction

\[ \mathcal{F}(f) = \int_{SU_2} du\, f(u)\, u \;\approx\; \int d^3p\; J(p)\, f(p)\, e^{i p \cdot x} \]

for f a function on $SU_2$. We use the Haar measure on $SU_2$. The local result on the right has J the Jacobian for the change to the local p coordinates and f is written in terms of these. Note that the coproduct in
$\mathbb{C}[SU_2]$ in terms of the $p^i$ generators is an infinite series given by the Campbell–Baker–Hausdorff series, and not the usual linear one (this is why the measure is not the Lebesgue one). The physical content here is in the plane waves themselves; one can use any other momentum coordinates to parametrize them with the corresponding measure and coproduct. Differential operators on $\mathbb{R}^3_\lambda$ are given by the action of elements of $\mathbb{C}[SU_2]$ and are diagonal on these plane waves,

\[ f.\psi_p = f(p)\, \psi_p \]

which corresponds under Fourier transform simply to pointwise multiplication in $\mathbb{C}[SU_2]$. For example, the function $\lambda^{-2}(\mathrm{tr} - 2)$ as a function on $SU_2$ will give a rotationally invariant wave operator which is also invariant under inversion in the group. Its value on plane waves is

\[ \frac{1}{\lambda^2} \left( \mathrm{tr}\,(e^{i\lambda p \cdot \sigma}) - 2 \right) = \frac{2}{\lambda^2} \left( \cos(\lambda |p|) - 1 \right) \]

In the limit $\lambda \to 0$ this gives the usual wave operator on $\mathbb{R}^3$.

It is also possible to put a differential graded algebra (DGA) structure of differential forms on this algebra, the natural one being

\[ dx_i = \lambda \sigma_i, \qquad x_i \theta - \theta x_i = i \frac{\lambda^2}{\mu}\, dx_i \]
\[ (dx_i)\, x_j - x_j\, dx_i = i\lambda\, \epsilon_{ij}{}^{k}\, dx_k + i\mu\, \delta_{ij}\, \theta \]

where $\theta$ is the 2 × 2 identity matrix which, together with the Pauli matrices $\sigma_i$, completes the basis of left-invariant 1-forms. The 1-form $\theta$ provides a natural time direction, even though there is no time coordinate, and the new parameter $\mu \neq 0$ appears as the freedom to change its normalization. The partial derivatives $\partial^i$ are defined by

\[ d\psi(x) = (\partial^i \psi)\, dx_i + (\partial^0 \psi)\, \theta \]

and act diagonally on plane waves as

\[ \partial^i = \frac{1}{2\lambda}\, \mathrm{tr}\,(\sigma_i\, \cdot\,) = i\, \frac{p^i}{\lambda |p|}\, \sin(\lambda |p|) \]

while $\partial^0 = i\mu (\mathrm{tr} - 2)/2\lambda^2$ is computed as above. Note that $\mu$ cannot be taken to be zero due to an anomaly for translation invariance of the DGA. It is in fact a typical feature of noncommutative differential geometry that there is a 1-form $\theta$ generating d by commutator which can be required as an extra cotangent direction with its associated partial derivative an induced Hamiltonian. In the present model we have

\[ \partial^0 = i \frac{\mu}{2} \sum_i (\partial^i)^2 + O(\lambda^2) \]

which is of the form of Schrödinger's equation with respect to an auxiliary time variable and for a particle with mass $1/\mu$.

The reader may ask what happens to the Euclidean group of translations and rotations in this context. From the above we find that $U_\lambda(\mathrm{poinc}_3) = \mathbb{C}[SU_2] \rtimes U(su_2)$, the semidirect product generated by translations $\partial^i$ and usual rotations. This in turn is the quantum double $D(U(su_2))$ of the classical enveloping algebra, and as such a quantum group with braiding etc. (see Hopf Algebras and q-Deformation Quantum Groups). This quantum double has been identified as part of an effective theory in 2+1 quantum gravity in a Euclidean version based on Chern–Simons theory with Lie algebra $\mathrm{poinc}_3$, and the spin space algebra is proposed as an effective theory for this. The quotient of $\mathbb{R}^3_\lambda$ by an allowed value of the quadratic Casimir $x^2$ (which then makes it a matrix algebra) is called a "fuzzy sphere" and appears as a world-volume algebra in certain string theories and reduced matrix models. The noncommutative differential geometry that we have described is due to Batista and the author.

2. We take the same type of construction to obtain the bicrossproduct model spacetime algebra

\[ \mathbb{R}^{1,3}_\lambda : \quad [t, x_i] = i\lambda\, x_i, \qquad [x_i, x_j] = 0 \]

These are the relations of a Lie algebra $b_\lambda$ (say) but again regarded as coordinates on a noncommutative spacetime. Here $\lambda$ is a timescale which can be written as a mass scale $\kappa = 1/\lambda$ instead. We parametrize the plane waves as

\[ \psi_{p, p^0} = e^{i p \cdot x}\, e^{i p^0 t}, \qquad \psi_{p, p^0}\, \psi_{p', p'^0} = \psi_{p + e^{-\lambda p^0} p',\; p^0 + p'^0} \]

which identifies the $p^\mu$ as the coordinates of the nonabelian group $B_\lambda = \mathbb{R} \ltimes \mathbb{R}^3$ with Lie algebra $b_\lambda$. The group law in these coordinates is read off as usual from the product of plane waves, which also gives the coproduct of $\mathbb{C}[B_\lambda]$ on the $p^\mu$. We have parametrized plane waves in this way (rather than the canonical way by the Lie algebra as before) in order to have a more manageable form for this. We do pay a price that in these coordinates group inversion is not simply $-p^\mu$, but

\[ (p, p^0)^{-1} = (-e^{\lambda p^0} p,\; -p^0) \]

which is also the action of the antipode S on the abstract $p^\mu$ generators.

In particular, the right-invariant Haar measure on $B_\lambda$ in these coordinates is the usual $d^4p$ so the
quantum group Fourier transform reduces to the usual one but normal ordered,

\[ \mathcal{F}(f) = \int_{\mathbb{R}^4} d^4p\; f(p)\, e^{i p \cdot x}\, e^{i p^0 t} \]

(one can also Fourier transform with respect to the left-invariant measure $d^4p\, e^{3\lambda p^0}$ on $B_\lambda$). The inverse is again given in terms of the usual inverse transform if we specify general fields in $\mathbb{R}^{1,3}_\lambda$ by normal ordering of usual functions, which we shall do. As before, the action of elements of $\mathbb{C}[B_\lambda]$ defines differential operators on $\mathbb{R}^{1,3}_\lambda$ and these act diagonally on plane waves. We also have a natural DGA with

\[ (dx_j)\, x_\mu = x_\mu\, dx_j, \qquad (dt)\, x_\mu - x_\mu\, dt = i\lambda\, dx_\mu \]

which leads to the partial derivatives

\[ \partial^i \psi = {:}\, \frac{\partial}{\partial x_i} \psi(x, t)\, {:} = i p^i . \psi \]
\[ \partial^0 \psi = {:}\, \frac{\psi(x, t + i\lambda) - \psi(x, t)}{i\lambda}\, {:} = \frac{i}{\lambda} \left( 1 - e^{-\lambda p^0} \right) . \psi \]

for normal-ordered polynomial functions $\psi$ or in terms of the action of the coordinates $p^\mu$ in $\mathbb{C}[B_\lambda]$. These $\partial^\mu$ do respect our implicit *-structure (unitarity) on $\mathbb{R}^{1,3}_\lambda$ but in a Hopf algebra sense which is not the usual sense, since the action of the antipode S is not just $-p^\mu$. This can be remedied by using adjusted derivatives $L^{-1/2} \partial^\mu$ where

\[ L\psi = {:}\, \psi(x, t + i\lambda)\, {:} = e^{-\lambda p^0} . \psi \]

In this case the natural 4D Laplacian is $L^{-1}\left( (\partial^0)^2 - \sum_i (\partial^i)^2 \right)$, which acts on plane waves as

\[ -\frac{2}{\lambda^2} \left( \cosh(\lambda p^0) - 1 \right) + p^2\, e^{\lambda p^0} \]

where $p^2 = \sum_{i=1}^{3} p_i^2$. This deforms the usual Laplacian in such a way as to remain invariant under the Lorentz group (which now acts nonlinearly on $B_\lambda$ in this model) and under group inversion.

This model may provide the first experimental test for noncommutative spacetime and cogravity. For the analysis of an experiment, we assume the identification of noncommutative waves in the above normal-ordered form with classical ones that a detector might register. In that case one may argue (Amelino-Camelia and Majid 2000) that the dispersion relation for such waves has the classical derivation as $|\partial p^0 / \partial p^i|$. For a massless particle the wave operator above vanishes on the plane wave, which gives $\lambda |p| = |1 - e^{-\lambda p^0}|$, so the propagation speed computes as

\[ \left| \frac{\partial p^0}{\partial p} \right| = e^{\lambda p^0} \]

in units where 1 is the usual speed of light. So the prediction is that the speed of light depends on energy. What is remarkable is that even if $\lambda \sim 10^{-44}$ s (the Planck timescale), this prediction could in principle be tested, for example using $\gamma$-ray bursts. These are known in some cases to travel cosmological distances before arriving on Earth, and have a spread of energies from 0.1 to 100 MeV. According to the above, the relative time delay $\Delta t$ on traveling distance L for frequencies corresponding to $p^0$, $p^0 + \Delta p^0$ is

\[ \Delta t \sim \lambda\, \Delta p^0\, \frac{L}{c} \sim 10^{-44}\ \mathrm{s} \times 100\ \mathrm{MeV} \times 10^{10}\ \mathrm{y} \sim 1\ \mathrm{ms} \]

which is in principle observable by statistical analysis of a large number of bursts correlated with distance (determined, e.g., by using the Hubble telescope to lock in on the host galaxy of each burst). Although the above is only one of a class of predictions, it is striking that even Planck-scale effects are now in principle within experimental reach.

We now explain what happens to the full Poincaré symmetry here. The nonlinear action of the Lorentz group on $B_\lambda$ Fourier transforms to an action on the generators of $\mathbb{R}^{1,3}_\lambda$, which combines with the above action of the $p^\mu$ to generate an entire Poincaré quantum group $U(so_{1,3}) \,{\triangleright\!\!\!\blacktriangleleft}\, \mathbb{C}[B_\lambda]$. We will say more about its bicrossproduct structure in a later section. The above wave operator in momentum space is the natural Casimir in these momentum coordinates. A common mistake in the literature for this model is to suppose that the Casimir relation alone amounts to a physical prediction, whereas in fact the momentum coordinates are arbitrary and have meaning only in conjunction with the plane waves that they parametrize. The deformed Poincaré as an algebra alone is actually isomorphic to the undeformed one by a different choice of generators, so by itself has no physical content; one needs rather the noncommutative spacetime as well. Prior work on the relevant deformed Poincaré algebra either did not consider it acting on spacetime or took it acting on classical (commutative) Minkowski spacetime with inconsistent results (there is no such action as a quantum group).

The above model was introduced by Majid and Ruegg (1994) and later tied up with a dual approach of Woronowicz. There is also a previous "$\kappa$-Poincaré" version of the Hopf algebra alone obtained (Lukierski et al. 1991) in another context (by contraction of $U_q(so_{2,3})$) but with fundamentally different generators and relations and hence different physical content (e.g., the Lorentz
generators there do not close among themselves but mix with momentum).

3. The usual Heisenberg algebra of quantum mechanics is another possible noncommutative (phase) space; one may also take the same algebra and view it as a noncommutative spacetime, so:

\[ \mathbb{R}^{1,3}_\theta : \quad [x_\mu, x_\nu] = i\theta_{\mu\nu} \]

for any antisymmetric tensor $\theta_{\mu\nu}$. This is not a Hopf algebra but it turns out that this model can also be completely solved by Hopf algebra methods, namely the theory of covariant twists. Twist models also include versions of the noncommutative torus studied by Connes, and related $\theta$-spaces, which are nontrivial at the level of C*-algebras. However, at an algebraic level, all covariant structures are automatically provided by applying the twisting functor $\mathcal{T}$ to the desired classical construction (see Hopf Algebras and q-Deformation Quantum Groups). This is not usually appreciated in the physics literature on such models, but see Oeckl (2000).

Thus, consider $H = U(\mathbb{R}^{1,3})$ with generators $p^\mu = -i\partial^\mu$ acting as usual on functions on Minkowski space. It has a cocycle

\[ F = e^{\frac{i}{2} \theta_{\mu\nu}\, p^\mu \otimes p^\nu} \]

which induces a new product $\bullet$ on functions by $\phi \bullet \psi = \cdot\,(F^{-1}(\phi \otimes \psi))$. This is just the standard Moyal product, in the present case on $\mathbb{R}^{1,3}_\theta$, viewed as a covariant twist using Hopf algebra methods. The Hopf algebra $U(\mathbb{R}^{1,3})$ in principle has a twisted coproduct given by $\Delta_F = F(\Delta(\ ))F^{-1}$ but this does not change as the algebra is commutative.

Next, H also acts covariantly on $\Omega(\mathbb{R}^{1,3})$, the usual algebra of differential forms, and twisting this in the same way gives $\Omega(\mathbb{R}^{1,3}_\theta)$ with

\[ x_\mu \bullet dx_\nu = x_\mu\, dx_\nu, \qquad dx_\mu \bullet dx_\nu = dx_\mu\, dx_\nu \]

unchanged. This is because no terms higher than $\theta_{\mu\nu}\, p^\mu \otimes p^\nu$ contribute and then d(1) = 0. The associated partial derivatives defined by d are likewise unchanged and act in the usual way as derivations with respect to both the $\bullet$ product and the undeformed product. The result may look different when the same $\psi(x)$ is expressed as a function of the variables with the $\bullet$ product. In other words, the only deformation comes from the Moyal product itself, with the rest being automatic. Moreover, the plane waves themselves are unchanged because $(x \cdot k)^{\bullet n} = (x \cdot k)^n$ due to $\theta$ being antisymmetric. Hence,

\[ \psi_k(x) = e_{\bullet}^{\, i x \cdot k} = e^{i x \cdot k}, \qquad p^\mu . \psi_k(x) = k^\mu\, \psi_k(x) \]

where $p^\mu = -i\partial^\mu$. The wave operator $\partial_\mu \partial^\mu$ is therefore given by the action of $-p_\mu p^\mu$ and has value $-k_\mu k^\mu$ as usual on plane waves. On the other hand,

\[ \psi_k \bullet \psi_{k'} = e^{-\frac{i}{2} k^\mu \theta_{\mu\nu} k'^\nu}\, \psi_{k + k'} \]

or in algebraic terms the twist functor $\mathcal{T}$ applied to the Fourier transform implies also a twisted coproduct or coaddition law for the abstract $k^\mu$ generators, now different from the linear one for the covariance momentum operators $p^\mu$. This leads to some of the more interesting features of the model.

One immediately also has a Poincaré quantum group here, $U_\theta(\mathrm{poinc}_{1,3})$, obtained by similarly twisting the classical $U(\mathrm{poinc}_{1,3})$. We just view F as living here rather than in the original H. The translation sector is unchanged as before but if $M_{\mu\nu}$ are the usual Lorentz generators, then conjugation by F terminates at first order in $\theta$ and gives

\[ \Delta_F M_{\mu\nu} = M_{\mu\nu} \otimes 1 + 1 \otimes M_{\mu\nu} + \frac{i}{2}\, \theta_{\alpha\beta} \left( [p^\alpha, M_{\mu\nu}] \otimes p^\beta + p^\alpha \otimes [p^\beta, M_{\mu\nu}] \right) \]

where the commutators $[p^\alpha, M_{\mu\nu}]$ are linear in the momenta, using the metric $\eta_{\mu\nu}$ to raise or lower indices. The antipode is also modified according to the theory in Majid (1995). The relations in the Poincaré algebra are not modified (so, e.g., $p_\mu p^\mu$ will remain central). Any construction originally Poincaré covariant becomes covariant under this twisted one after application of the twisting functor. As with the differentials above, the action on $\mathbb{R}^{1,3}_\theta$ is not actually modified but may appear so when functions are expressed in terms of the $\bullet$ product.

The above model is popular at the time of writing in connection with string theory. Here, an effective description of the endpoints of open strings landing on a fixed 4-brane has been modeled conveniently in terms of the $\bullet$ product above (Seiberg and Witten 1999). It should be borne in mind, however, that this fixed 4-brane lives in some of the higher dimensions of the string spacetime, so this is not necessarily a prediction of noncommutative spacetime $\mathbb{R}^{1,3}_\theta$.

In fact, a proposal superficially similar to $\mathbb{R}^{1,3}_\theta$ above was already proposed in Snyder (1947). Here

\[ [x_\mu, x_\nu] = i\lambda^2 M_{\mu\nu} \]

where $\lambda$ is our length scale and the $M_{\mu\nu}$ are now operators with the usual commutation rules for the Lorentz algebra with themselves and with $x_\mu$ and the momenta $p_\mu$. The latter obey

\[ [p_\mu, x_\nu] = i \left( \eta_{\mu\nu} - \lambda^2 p_\mu p_\nu \right), \qquad [p_\mu, p_\nu] = 0 \]
so the entire Poincaré algebra is undeformed but the phase-space relations are deformed. Snyder also constructed the orbital angular momentum realization $M_{\mu\nu} = x_\mu p_\nu - x_\nu p_\mu$. This model is not a proposal for a noncommutative spacetime because the algebra does not even close among the $x_\mu$. Rather it is a proposal for "mixing" of position and Lorentz generators. On the other hand (which was the point of view in Snyder (1947)), in any representation of the Poincaré algebra, the $M_{\mu\nu}$ become operators and in some sense "numerical". The rotational sector has discrete eigenvalues as usual, so to this extent the spacetime has been discretized. Although not fitting into the methods in this article, it is also of interest that the relations above were motivated by considering $p_\mu$ as coordinates projected from a 5D flat space to de Sitter space and $x_\mu$ as the 5-component of orbital angular momentum in the flat space.

To conclude this section, let us note that there are further models that we have not included for lack of space. One of them is a much-studied $\mathbb{R}^{1,3}_q$ in which t is central but the $x_i$ enjoy complicated q-relations best understood as q-deformed Hermitian matrices. One of the motivations in the theory was the result in Majid (1990) that q-deformation could be used to regularize infinities in quantum field theory as poles at q = 1. Another entire class is to use noncommutative geometry and quantum group methods on finite or discrete spaces. Unlike lattice theory where a finite lattice is viewed as approximation, these models are not approximations but exact noncommutative geometries valid even on a few points. The noncommutativity enters into the fact that finite differences are bilocal and hence naturally have different left and right multiplications by functions. Both aspects are mentioned briefly in the overview article (see Hopf Algebras and q-Deformation Quantum Groups). Also, on the experimental front, another large area that we have not had room to cover is the prediction of modified uncertainty relations both in spacetime and phase space (Kempf et al. 1995).

Moreover, for all of the models above, once one has a noncommutative differential calculus one may proceed to gauge theory etc., on noncommutative spacetimes, at least at the level where a connection is a noncommutative (anti-Hermitian) 1-form $\alpha$. Gauge transformations are invertible (unitary) elements u of the noncommutative "coordinate algebra" and the connection and curvature transform as

\[ \alpha \to u^{-1} \alpha u + u^{-1}\, du \]
\[ F(\alpha) = d\alpha + \alpha \wedge \alpha \to u^{-1} F(\alpha)\, u \]

The full extent of quantum bundles and gravity (see Quantum Group Differentials, Bundles and Gauge Theory) and quantum field theory is not always possible, although both have been done for covariant twist examples (for functorial reasons) and for small finite sets. For the first two models above, for example, it is not clear at the time of writing how to interpret scattering when the addition of momenta is nonabelian.

Matched Pair Equations

Although we have presented noncommutative spacetime first, the first actual application of quantum group methods to Planck-scale physics was the Planck-scale Hopf algebra obtained by a theory of bicrossproducts. Like the Snyder model, the intention here was to deform phase space itself, but since then bicrossproducts have had many further applications. The main ingredient here is the notion of a pair of groups (G, M), say, acting on each other as we explain now. The mathematics here goes back to the early 1910s in group theory, but also arose in mathematical physics as a toy version of Einstein's equation in the sense of compatibility between quantization and curvature (see the next section).

By definition, (G, M) are a matched pair of groups if there are left and right actions

\[ M \;\xleftarrow{\ \triangleleft\ }\; M \times G \;\xrightarrow{\ \triangleright\ }\; G \]

of each group on the set of the other, such that

\[ s \triangleleft e = s, \qquad e \triangleright u = u, \qquad s \triangleright e = e, \qquad e \triangleleft u = e \]
\[ (s \triangleleft u) \triangleleft v = s \triangleleft (uv), \qquad s \triangleright (t \triangleright u) = (st) \triangleright u \]
\[ s \triangleright (uv) = (s \triangleright u)\,\big( (s \triangleleft u) \triangleright v \big) \]
\[ (st) \triangleleft u = \big( s \triangleleft (t \triangleright u) \big)\,(t \triangleleft u) \]

for all $u, v \in G$, $s, t \in M$. Here e denotes the relevant group unit element. As a first application of such data, one may make a double cross product group $G \bowtie M$ with product

\[ (u, s).(v, t) = \big( u (s \triangleright v),\; (s \triangleleft v)\, t \big) \]

and with G, M as subgroups. Since it is built on the direct product space, the bigger group factorizes into these subgroups. Conversely, if X is a group factorization such that the product $G \times M \to X$ is bijective, each group acts on the other by actions $\triangleright$, $\triangleleft$ defined by $su = (s \triangleright u)(s \triangleleft u)$ for $u \in G$ and $s \in M$, where s, u are multiplied in X and the product is factorized as something in G and something in M. So finite group matched pairs are equivalent to group factorizations. In the Lie group context, the
corresponding system of differential equations is equivalent to a local factorization.

There is a nice graphical representation of the matched pair conditions which relates to "surface integration". Thus, consider squares

[Diagram: a square with s on the left edge and u on the bottom edge; the remaining two edges carry $s \triangleright u$ and $s \triangleleft u$.]

labeled by elements of M on the left edge and elements of G on the bottom edge. We can fill in the other two edges by thinking of an edge transformed by the other edge as it goes through the square either horizontally or vertically; the two together is the "surface transport" $\Rightarrow$ across the square. The matched pair equations have the meaning that a square can be subdivided either vertically or horizontally as shown in Figure 2, where the labeling on vertical edges is to be read from top down. The transport operation here is nothing other than normal ordering in the factorizing group. In the Lie setting, it means that the equations can be solved from infinitesimal solutions (a "matched pair of Lie algebras") by a simultaneous double integration over the group (i.e., building up a large box from many small ones). If one considers solving the quantum Yang–Baxter equations on groups, they appear in this notation as an equality of surface transport going two ways around a cube, and the classical Yang–Baxter equations as curvature of the underlying higher-order connection.

Also in this notation there is a bicrossproduct quantum group defined in Figure 3, at least when M is finite. The expressions are considered zero unless

Figure 3 Bicrossproduct Hopf algebra showing horizontal product and vertical coproduct as an "unproduct".

this, so is a semidirect coalgebra $C(M) \,{>\!\!\!\blacktriangleleft}\, \mathbb{C}G$. Hence the two together are denoted $C(M) \,{\blacktriangleright\!\!\triangleleft}\, \mathbb{C}G$. The dual needs G finite and has the same form but with vertical and horizontal compositions interchanged, that is, a bicrossproduct $\mathbb{C}M \,{\triangleright\!\!\!\blacktriangleleft}\, C(G)$. Both Hopf algebras have the above labeled squares as basis.

It is possible to generalize both bicrossproducts and double cross products associated to matched pairs to general Hopf algebras $H_1 \,{\blacktriangleright\!\!\triangleleft}\, H_2$ and $H_1 \bowtie H_2$, respectively, where $H_1$, $H_2$ are Hopf algebras (see Majid 1990) and to relate the two in general by dualization of one factor. Another general result (Majid 1995) is that $H_1 \,{\blacktriangleright\!\!\triangleleft}\, H_2$ acts covariantly on the algebra $H_1$ from the right, or $H_1 \,{\triangleright\!\!\!\blacktriangleleft}\, H_2$ acts covariantly on $H_2$ from the left. A third general result is that bicrossproducts solve the extension problem

\[ H_1 \to H \to H_2 \]

meaning that such a Hopf algebra H subject to some technical requirements (such as an algebra splitting
map H2 ! H) is of the form H H1 H2 . The
the juxtaposed edges have the same group labels. In
theory was also extended to include cocycle bicros-
that case, the product is a semidirect product
sproducts at the end of the 1980s (by the author).
algebra C(M) CG of functions on M by the
The finite group case, however, was first found by
group algebra of G. The coproduct is the adjoint of
Kac and Paljutkin (1966) in the Russian literature
and later rediscovered independently in Takeuchi
(st ) u s (t u) (1981) and in the course of Majid (1988).

s s s (t u)
(st ) u =
t t u
t t u
The Planck-Scale Hopf Algebra
u u
We consider a quantum algebra of observables H
s (uv ) s u (s u) v
and ask when it is a Hopf algebra extending some
s u classical position coordinate algebra C[M] and some
s s (uv) = s (s u ) v
possibly noncommutative momentum coordinate
uv u v
algebra U(g ) in the form of a strict extension
e u s e e CM ! H ! Ug
e e u = e u e s s e = s s
u From the theory above this problem is governed by local
u e e
solutions of the matched pair equations on (G, M). It
Figure 2 Matched pair condition as a subdivision property. requires that H C[M] U(g ) as an algebra, that is,

the quantization of a particle moving on orbits in M a background curvature scale , and the correspond-
under some action of G (in an algebraic setting, or ing bicrossproduct C[p] C[x] is
one can use von Neumann or C -algebras etc.). And
it requires the classical phase space to be a p; x ih1  e x ; x x 1 1 x
 x
nonabelian or curved group M g  . This extends p p e 1 p; x p 0
to a coproduct on H which becomes the bicross- Sx x; Sp pe x

product Hopf algebra C[M] U(g ). In this way, the


problem which was open at the start of the 1980s of where we should allow power series or take e x as
finding true examples of Hopf algebras was given a an invertible generator.
physical interpretation as being equivalent to finding It is important to note that the matched pair
quantum-mechanical systems reconciled with curva- equations here have only this solution and it is
ture, and the equations that governed this were the necessarily singular at p = 0 or x = 0. The inter-
matched pair ones (Majid 1988). pretation in position space is as follows. Consider an
We still have to solve these equations. In the infalling particle of mass m with fixed momentum
Lie case, they mean a pair of cross-coupled first- p = mv1 (in terms of the velocity at infinity). By
order equations on G  M. These can be solved definition, p is the free-particle momentum and acts
locally as a double-holonomy construction in line on R as above. This corresponds to a free-particle
with the surface transport point of view, but are Hamiltonian p ^2 =2m and induces
nonlinear typically with singularities in the non- p_ 0
compact case. The equations are also symmetric  
under interchange of G, M so Born reciprocity p  x 1
x_ 1  e v1 1 
between position and momentum is extended to m 1 x   
the quantum system with generally curved at the classical level. We see that the particle takes
position and momentum spaces. Moreover, in so an infinite time to reach the origin, which is an
far as Einsteins equation G
= 8T
is also a accumulation point. This can be compared with the
compatibility between a quantity in position formula in standard radial infalling coordinates
space and a quantity originating (ultimately) in !
momentum space, the matched pair equations can 1
be viewed as a toy version of these. x_ v1 1  c2 x
1 2GM
Let us note that the reason to look for H a Hopf
algebra in the first place, aside from the reasons for distance x from the event horizon of a black hole
already given, is for observerobserved symmetry of mass M (here G is Newtons constant and c the
(this was put forward as a postulate for Planck-scale speed of light). So  c2=GM and for the sake of
physics). Thus, H  is also an algebra of observables further discussion we will use this value. With a
of some dual system, in our case U(m) C[G] or little more work, one can then see that
particles in G moving on orbits under M. Thus,
Born reciprocity is truly implemented in the mM m2P
quantum/curved system by Hopf algebra duality. Cx Cpusual qu. mech.
Cx Cp!
!
Put another way, Hopf algebras are the simplest CXusual curved geometry
objects after abelian groups that admit Fourier mM  m2P
transform (see Hopf Algebras and q-Deformation
Quantum Groups) and we require this on phase where mP is the Planck mass of the order of 105 g
space if Born reciprocity is to be extended to the and X = R R is a nonabelian group. In the first
quantum/curved system. limit, the particle motion is not detectably different
The Planck-scale Hopf algebra is the simplest from usual flat space quantum mechanics outside
example of these ideas (Majid 1988). Here G = the Compton wavelength from the origin. In the
M = R and the matched pair equations can be solved second limit, the estimate is such that noncommu-
completely. The general solution is tativity would not show up for length scales much
larger than the background curvature scale.
@ i @ This Hopf algebra is also the simplest way to
p h1  e x
^ i ; x
^ 1  eh p
@x h
 @p extend classical position C[x] and momentum C[p]
in the sense above. In other words, requiring to
for the action of one group with generator p on maintain observerobserved symmetry or Born
functions of x in the other group and vice-versa. It reciprocity throws up both quantum mechanics (in
has two parameters which we have denoted as h and the form of h) and something with the flavor of

gravity (in the form of ) and both are required for a for i = 1, 2 and the usual additive ones for p3 , M3 .
nontrivial Hopf algebra. Moreover, the construction There is also an appropriate counit and antipode.
necessarily has a self-dual form and indeed the The deformed spheres under the nonlinear rotation
dually paired Hopf algebra is C[p] C[x] with new in Majid (1990) are constant values of the Casimir
parameters  h0 = 1=h and 0 = 
h if we take the for the above algebra. This is
standard pairing x, p across the two algebras. Hopf
2
algebra duality realized by the quantum group coshp3  1 p2 ep3
Fourier transform F takes one between the two 2
models. which from the group of motions point of view
generates the noncommutative Laplacian when
acting on R3 . The model here is a Euclidean
inhomogeneous one.
Bicrossproduct Poincare The four-dimensional (4D) version U(so1, 3 )
Quantum Groups C[B ] of this construction (Majid and Ruegg
Another example from the 1980s in the same family 1994) is again linked to Planck-scale predictions,
as the Planck-scale Hopf algebra is G = SU2 and this time as a generalized symmetry. In terms of
M = B , a nonabelian version of R3 with Lie algebra translation generators p , rotations Mi and boosts
b of the form Ni we have

x3 ; xi  ixi ; xi ; xj  0 p ; p
 0; Mi ; Mj  iij k Mk

for i = 1, 2. The required solution of the matched Ni ; Nj  iij k Mk ; Mi ; Nj  iij k Nk


pair equations was found in Majid (1990) and has a p0 ; Mi  0; pi ; Mj  ii jk pk ; p0 ; Ni  ipi
nonlinear action of rotations on B . The interpreta-
tion of C[B ] U(su2 ) is of particles moving along as usual, and the modified relations and coproduct
!
orbits which are deformed spheres in B , and there 0
i i i 1  e2p
is a dual model where particles move instead on p ; Nj   j p ipi pj
2
2 
orbits in SU2 under the action of b . Moreover,
0
from the general theory of bicrossproducts, we Ni Ni 1 ep Ni ijk pj Mk
automatically have a covariant action of C[B ] 0

U(su2 ) on the auxiliary noncommutative space pi pi 1 ep pi


R3 = U(b ) with relations as above. and the usual additive coproducts on p0 , Mi . This
The quantum group here was actually obtained as a time the Lorentz group orbits in B are deformed
Hopfvon Neumann algebra but we limit ourselves to hyperboloids rather than deformed spheres, and the
the underlying algebraic version. Also, there is of Casimir that controls this has the same form as
course nothing stopping one considering this Hopf above but with  in the cosh term, that is, the
algebra equally well as U (poinc3 ), that is, a deforma- model is a Lorentzian one. We know from the
tion of the group of motions on R3 , rather than as an general theory of bicrossproducts that this Hopf
algebra of observables. The only difference is to denote algebra acts on U(b ) = R 1, 3
the spacetime in the

the generators of C[B ] by the symbols pi , reserving xi section Cogravity, and the Casimir induces the
instead for the auxiliary noncommutative space. We wave operator as we have seen there.
lower i, j, k indices using the Euclidean metric. Then Let us look a bit more closely at the deformed
the bicrossproduct has the form hyperboloids. Because neither group here is com-
pact, one expects from the general theory of
pi ; pj  0; Mi ; Mj  iij k Mk
bicrossproducts to have limiting accumulation
M3 ; pj  i3j k pk ; Mi ; p3  ii3 k pk regions. This is visible in the contour plot of p0
against jpj in Figure 4, where the p0 > 0 mass shells
as usual, for i, j = 1, 2, 3, and the modified relations
are now cups with almost vertical walls, compressed
  into the vertical tube
i 1  e2p3
Mi ; pj  ij 3  p2 ii k3 pj pk
2  jpj < 1
for i, j = 1, 2 and p2 = p21 p22 . The coproducts are In other words, the 3-momentum is bounded above
by the Planck momentum scale (if  is the Planck
Mi Mi ep3 M3 pi 1 Mi
time). Indeed, the light-cone equation (setting the
pi pi ep3 1 pi Casimir to zero) reads jpj = 1  ep3 so this is

2 group and the quantization of this is provided


by the quantum group coordinate algebras Cq [G]
(see Hopf Algebras and q-Deformation Quantum
Groups and Classical r-matrices, Lie Bialgebras, and
1 Poisson Lie Groups). The bicrossproduct quantum
groups are nevertheless unrelated to the latter even
though they spring form related classical data.
0 As already discussed, one interpretation here is
of quantized particles in G moving on orbits
under G and in vice versa in the dual model. The
dual model is equivalent in the sense that the
1 states of one (in the sense of positive-linear
functionals) lie in the algebra of observables of
the other and we also saw in the Planck-scale
2 example inversion of structure constants reminis-
2 1 0 1 2 cent of T-duality in string theory. Motivated in
Figure 4 Deformed mass-shell orbits in the bicrossproduct part by this duality Klimcik (1996) along with
curved momentum space for  = 1. Severa in the mid 1990s showed that indeed a
-model on G could be constructed in such a way
immediate. Nevertheless, this observation is so that there was a matching dual -model on G in
striking that the bicrossproduct model has been some sense equivalent in terms of solutions to the
dubbed doubly special and spawned the search for equations of motion. The Lagrangians here have
other such models. Such accumulation regions are a the usual form
main discovery of the noncompact bicrossproduct
L Eu u1 @ u; u1 @ u;
theory visible already in the Planck-scale Hopf
algebra. The model further confirms the role of L^ E
^ s s1 @ s; s1 @ s
the matched pair equations as a toy version of
where u : R1, 1 ! G and s : R1, 1 ! G are the dyna-
Einsteins. ^
mical fields, except that the inner products E, E
are not constant. Rather they are obtained by
solving nonlinear differential equations on the
PoissonLie T-Duality groups defined through the structure constants
We have explained in Section 3 that the matched of g , g  and the Drinfeld double D(g ). At the time,
pair equations are equivalent to a local factorization T-duality here was well understood in the case of
of Lie groups, with the action and back-reaction abelian groups while these PoissonLie T-duality
created equally and oppositely from this. For the models provided the first convincing nonabelian
two models in the last section, these are SL2 (C) models.
factorizing as SU2 and a 3D B , and SO2, 3 locally as This construction was extended by Beggs and
SO1, 3 and a 4D B . The first of these examples is in Majid (2001) to a general matched pair (G, M), that
fact one of a general family based on the Iwasawa is, a -model on G dual to one on M. The Poisson
decomposition GC = G G where G is a compact Lie case is the special case where the actions are
Lie group with complexification GC and G a coadjoint actions and the Lie algebra of G M is
certain solvable group. From this, one may construct D(g ). The solutions of the equations of motion for
a solution (G, G ) of the matched pair equations and the two systems are created equally and oppo-
bicrossproduct quantum group sitely from one on the factorizing group. It could
be expected that T-duality ideas again play a role in
CG  Ug Planck-scale physics.
associated to all complex simple Lie algebras. This is
again part of the bicrossproduct theory from the
Other Bicrossproducts
1980s. On the other hand, the Lie algebra g  here
can be identified with the dual of g in which case its There are also infinite-dimensional factorizations
Lie algebra corresponds to a Lie coproduct such as the RiemannHilbert problem (see
 : g ! g g and makes (g , ) into a Lie bialgebra in RiemannHilbert Problem) in the theory of
the sense of Drinfeld. This  exponentiates to a integrable systems and hence infinite-dimensional
Poisson bracket on G making it a PoissonLie matched pairs and bicrossproducts linked to

them. Here we mention just one partly infinite See also: Classical r-Matrices, Lie Bialgebras, and
example of current interest. Poisson Lie Groups; Hopf Algebra Structure of
Thus, the diffeomorphisms on the line R may be Renormalizable Quantum Field Theory; Hopf Algebras
factorized into transformations of the form ax b and q-Deformation Quantum Groups; Quantum Group
Differentials, Bundles and Gauge Theory;
and diffeomorphisms that fix the origin and have
RiemannHilbert Problem; von Neumann Algebras:
unit differential there. After a (logarithmic) change
Introduction, Modular Theory, and Classification Theory.
of generators to arrive at an algebraic picture, one
has a bicrossproduct
H1 Ub H1
Further Reading
where b is now the two-dimensional (2D) Lie
Amelino-Camelia G and Majid S (2000) Waves on noncommu-
algebra with relations [x, y] = x and H1 is the algebra tative spacetime and gamma-ray bursts. International Journal
of polynomials in generators n and a certain of Modern Physics A 15: 43014323.
coalgebra as a model of the coordinate algebra of Beggs E and Majid S (2001) PoissonLie T-duality for quasi-
the group of diffeomorphisms that fix the origin with triangular Lie bialgebras. Communications in Mathematical
unit differential. The Hopf algebra H(1) was intro- Physics 220: 455488.
Connes A and Moscovici H (1998) Hopf algebras, cyclic
duced by Connes and Moscovici (1998) although not cohomology and the transverse index theory. Communications
actually as a bicrossproduct (but motivated by the in Mathematical Physics 198: 199246.
bicrossproduct theory) as part of a family H(n) useful Kac GI and Paljutkin VG (1966) Finite ring groups. Transactions
in cyclic cohomology computations. It has cross of the American Mathematical Society 15: 251294.
relations and coproduct determined by Kempf A, Mangano G, and Mann RB (1995) Hilbert space
representation of the minimal length uncertainty relation.
n ; x n1 ; n ; y nn ; Physical Review D 52: 11081118.
Klimcik C (1996) PoissonLie T-duality. Nuclear Physics B (Proc.
1 1 1 1 1 Suppl.) 46: 116121.
x x 1 1 x 1 y; Lukierski J, Nowicki A, Ruegg H, and Tolstoy VN (1991)
q-Deformation of Poincare algebra. Physics Letters B
y y 1 1 y 268: 331338.
Majid S (1988) Hopf algebras for physics at the Planck scale.
which we see has a semidirect product form where Journal of Classical and Quantum Gravity 5: 15871606.
n 3x = n1 , n 3y = nn . The coalgebra is also a Majid S (1990) Physics for algebraists: non-commutative and
semidirect coproduct by means of a back-reaction of non-cocommutative Hopf algebras by a bicrossproduct
H1 in B (expressed as a coaction). From the construction. Journal of Algebra 130: 1764.
Majid S (1990) Matched pairs of Lie groups associated to
bicrossproduct theory, we also have a dual model
solutions of the YangBaxter equations. Pacific Journal of
CB  Udiff 0 Mathematics 141: 311332.
Majid S (1990) On q-regularization. International Journal of
where diff 0 is the Lie algebra of the group of Modern Physics A 5: 46894696.
diffeomorphisms fixing the origin. As such it could be Majid S (1995) Foundations of Quantum Group Theory.
Cambridge: Cambridge University Press.
viewed as in the family of examples in the section
Majid S (2000) Meaning of noncommutative geometry and the
Bicrossproduct Poincare quantum groups but Planck-scale quantum group. Springer Lecture Notes in
now with a 2D B . We also conclude from Physics 541: 227276.
the bicrossproduct theory that this acts covariantly on Majid S and Ruegg H (1994) Bicrossproduct structure of the
R2 = U(b ) after introducing the scaling parameter . -Poincare group and non-commutative geometry. Physics
Letters B 334: 348354.
Finally, the Hopf algebra H(1) is also part of a
Oeckl R (2000) Untwisting noncommutative Rd and the
family of bicrossproduct Hopf algebras built on rooted equivalence of quantum field theories. Nuclear Physics B
trees and related to bookkeeping of overlapping 581: 559574.
divergences in renormalizable quantum field theories Seiberg N and Witten E (1999) String theory and noncommuta-
(see Hopf Algebra Structure of Renormalizable Quan- tive geometry. Journal of High Energy Physics 9909: 032.
Snyder HS (1947) Quantized space-time Physical Review D
tum Field Theory). While we have not had room to
67: 3841.
cover all bicrossproduct quantum groups of interest, it Takeuchi M (1981) Matched pairs of groups and bismash products
would appear that bicrossproducts are indeed inti- of Hopf algebras. Communications in Algebra 9: 841.
mately tied up with actual quantum physics.
Bifurcation Theory 275

Bifurcation Theory
M Haragus, Université de Franche-Comté, Besançon, France
G Iooss, Institut Non Linéaire de Nice, Valbonne, France

© 2006 Elsevier Ltd. All rights reserved.

Introduction

Consider the following equation:

    $F(X, \mu) = 0$    [1]

where $X$ is the variable, $\mu$ is a parameter, and $X$, $\mu$, $F$ belong to appropriate (finite- or infinite-dimensional) spaces. The problem of bifurcation theory is to describe the singularities of the set of solutions

    $S = \{X;\ (X, \mu) \text{ satisfies } F(X, \mu) = 0\}$

The word "bifurcation" was introduced by H Poincaré (1885) in his study of equilibria of rotating liquid masses.

The simplest example is the study of the real roots $x$ of a quadratic polynomial

    $x^2 + bx + c = 0$    [2]

where $\mu$ is represented by the pair of parameters $(b, c) \in \mathbb{R}^2$. As is well known, real roots are determined by the sign of

    $\Delta \overset{\text{def}}{=} b^2 - 4c$

For $\Delta < 0$, there is no real solution of [2], while there are two solutions $x_\pm$ in the region $\Delta > 0$, which merge when the distance between the point $(b, c)$ and the parabola $\Delta = 0$ tends towards 0. It is then clear that a singularity occurs in the structure of the set of solutions of [2] at the crossing of the parabola $\Delta = 0$ or, in other words, a bifurcation occurs in the parameter space $(b, c)$ on the parabola $\Delta = 0$. A point $(\mu_0, x_0) \in \mathbb{R}^3$ is then called a bifurcation point if $\mu_0 = (b, c)$ satisfies $\Delta = 0$, and $x_0 = -b/2$.

In the theory of differential equations, $F(X, \mu)$ often represents a vector field. This study is then concerned with the existence of equilibrium solutions to the differential equation

    $\frac{dX}{dt} = F(X, \mu)$    [3]

and is therefore referred to as static bifurcation theory. In addition, dynamic bifurcation theory is concerned here with changes in the dynamic properties of the solutions of the differential equation as $\mu$ varies. A widely used way to characterize these changes is to say that the vector field $F(\cdot, \mu_0)$ is structurally stable if the sets of orbits of the differential equation are homeomorphic for $\mu$ close to $\mu_0$, with homeomorphisms which preserve the orientation of the orbits in time $t$. Then a bifurcation occurs at $\mu = \mu_0$ if $F(\cdot, \mu_0)$ is not structurally stable. It turns out that there is a close link between the stability properties of equilibrium solutions of the differential equation and the type of the bifurcation in static theory.

The tools developed in bifurcation theory are extensively used to solve concrete problems arising in physics and natural sciences. These problems may be modeled by ordinary or partial differential equations, integral equations, but also delay equations or iteration maps, and in all these cases the presence of parameters naturally leads to bifurcation phenomena. They can be regarded as problems of the form [1] or [3], in suitable function spaces, and bifurcation theory allows one to detect solutions and to describe their qualitative properties. During the last decades, a class of problems in which the use of bifurcation theory led to significant progress is concerned with nonlinear waves in partial differential equations, including hydrodynamic problems, nonlinear water waves, elasticity, but also pattern formation, front propagation, or spiral waves in reaction–diffusion type systems.

Examples in One and Two Dimensions

The most complete results in bifurcation theory are available in one and two dimensions. The study of static bifurcations in one dimension is concerned with scalar equations

    $f(x, \mu) = 0$    [4]

where $x \in \mathbb{R}$, $\mu \in \mathbb{R}$, and the function $f$ is supposed to be regular enough with respect to $(x, \mu)$. When $f(x_0, \mu_0) = 0$ and the derivative of $f$ with respect to $x$ satisfies $\partial_x f(x_0, \mu_0) \neq 0$, the implicit function theorem gives a unique branch of solutions $x(\mu)$ for $\mu$ close to $\mu_0$, and shows the absence of bifurcation points near $(\mu_0, x_0)$. Bifurcation theory intervenes when

    $\partial_x f(x_0, \mu_0) = 0$    [5]

and one cannot apply the implicit function theorem for solving with respect to $x$ near $x_0$. A complete description of the set of solutions near $(x_0, \mu_0)$ can be obtained by looking at the partial derivatives of $f$ with respect to $x$ and $\mu$.
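
The quadratic example [2] gives a quick numerical check of condition [5]. The following is a minimal Python sketch, not part of the original article; the function names `discriminant`, `real_roots` and `dfdx` and the sample values of $(b, c)$ are illustrative assumptions. It counts the real roots of $f(x; b, c) = x^2 + bx + c$ on either side of the parabola $\Delta = 0$ and evaluates $\partial_x f = 2x + b$ at each root, which vanishes exactly at the merged root $x_0 = -b/2$.

```python
import math

def discriminant(b, c):
    """Delta = b^2 - 4c for f(x; b, c) = x^2 + b*x + c."""
    return b * b - 4.0 * c

def real_roots(b, c):
    """Real roots of x^2 + b*x + c = 0 in increasing order (empty if Delta < 0)."""
    delta = discriminant(b, c)
    if delta < 0:
        return []
    if delta == 0:
        return [-b / 2.0]          # the two roots have merged at x0 = -b/2
    r = math.sqrt(delta)
    return [(-b - r) / 2.0, (-b + r) / 2.0]

def dfdx(x, b):
    """Partial derivative of f with respect to x."""
    return 2.0 * x + b

if __name__ == "__main__":
    b = 2.0
    # Move c across the parabola Delta = 0, which sits at c = 1 when b = 2.
    for c in (1.5, 1.0, 0.5):
        roots = real_roots(b, c)
        slopes = [dfdx(x, b) for x in roots]
        print(f"c={c}: Delta={discriminant(b, c)}, roots={roots}, df/dx={slopes}")
```

Away from the parabola the derivative is nonzero at each root, so the implicit function theorem applies there and each root continues smoothly as $(b, c)$ varies.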

For example, if

    $\partial_\mu f(x_0, \mu_0) \neq 0,$

it is possible to solve with respect to $\mu$ and obtain a regular solution $\mu(x)$ such that $\mu(x_0) = \mu_0$ and $f(x, \mu(x)) \equiv 0$. In addition, if the second order derivative

    $\partial_x^2 f(x_0, \mu_0) \neq 0$

the picture of the solution set in the plane $(\mu, x)$, also called bifurcation diagram, shows a turning point with a fold opened to the left or to the right depending upon the sign of the product $\partial_\mu f(x_0, \mu_0)\, \partial_x^2 f(x_0, \mu_0)$; see Figure 1. Notice that here the bifurcation point $(\mu_0, x_0) \in \mathbb{R}^2$ corresponds to the appearance of a pair of solutions of [4] "from nowhere." This is the simplest example of a one-sided bifurcation in which the bifurcating solutions exist for either $\mu > \mu_0$ or $\mu < \mu_0$.

Figure 1 Turning point bifurcation in the case $\partial_\mu f(x_0, \mu_0) > 0$ and $\partial_x^2 f(x_0, \mu_0) < 0$. The solid (dashed) line indicates the branch of stable (unstable) solutions in the differential equation.

A particularly interesting situation arises when the equation possesses a symmetry. For example, assume that in [4] the function $f$ is odd with respect to $x$. This implies that we always have the solution $x = 0$, for any value of the parameter $\mu$. Assume now that $f$ satisfies

    $\partial_x f(0, \mu_0) = 0$    [6]

and that

    $\partial^2_{\mu x} f(0, \mu_0) \neq 0, \quad \partial_x^3 f(0, \mu_0) \neq 0$    [7]

Then the point $(\mu_0, 0)$ is a pitchfork bifurcation point, this name being related to the bifurcation diagram in the plane $(\mu, x)$; see Figure 2. Notice that here, the bifurcation point $(\mu_0, x_0) \in \mathbb{R}^2$ corresponds to the bifurcation from the origin of a pair of solutions exchanged by the symmetry $x \to -x$, in addition to the persistent "trivial" solution $x = 0$ which is invariant under the above symmetry. Such a bifurcation is also referred to as a symmetry-breaking bifurcation.

Figure 2 Supercritical pitchfork bifurcation in the case $\partial^2_{\mu x} f(0, \mu_0) > 0$ and $\partial_x^3 f(0, \mu_0) < 0$. The solid (dashed) lines indicate the branch of stable (unstable) solutions in the differential equation.

Similar bifurcation diagrams are found when the equation [4] has a known branch of solutions $x(\mu)$ for $\mu$ close to $\mu_0$. This situation arises often in applications where usually this branch consists of "trivial" solutions $x(\mu) = 0$. Then at a bifurcation point $(\mu_0, x_0)$ a second branch of solutions appears forming either a one-sided bifurcation, or a two-sided bifurcation; see Figure 3.

Figure 3 Typical bifurcation diagrams in the case of a branch of trivial solutions. One-sided bifurcations: (a) supercritical, (b) subcritical; two-sided bifurcation: (c) transcritical. The solid (dashed) lines indicate the branch of stable (unstable) solutions in the differential equation.

We can now view $f$ as a vector field in the ordinary differential equation

    $\frac{dx}{dt} = f(x, \mu)$    [8]

and the study above corresponds to looking for equilibrium solutions of [8]. The stability of such a solution is determined by the sign of the derivative $\partial_x f(x, \mu)$ of $f$ at this equilibrium, and it is closely related to the type of the static bifurcation.

In the case of a turning point bifurcation, when $\partial_x^2 f(x_0, \mu_0) \neq 0$, the sign of $\partial_x f(x, \mu)$ is different for the two bifurcating solutions. This means that one solution is attracting (i.e., stable), the other one being repelling (i.e., unstable); see Figure 1. In the case of a pitchfork bifurcation as above, the stability of the trivial solution $x = 0$ changes when $\mu$ crosses $\mu_0$, and the stability of both bifurcating nonzero solutions is the opposite from the stability of the origin on the side of the bifurcation. The bifurcation
is called supercritical if the bifurcating solutions lie on the side of the bifurcation point where the basic solution $x = 0$ is unstable, and subcritical otherwise; see Figure 2. The situation is the same in the case of one-sided bifurcations for an equation which has a known branch of solutions. In the case of a two-sided bifurcation, there is an exchange of stability at the bifurcation point $(\mu_0, x_0)$, solutions on the two branches having opposite stability for $\mu > \mu_0$ and $\mu < \mu_0$, which changes at $(\mu_0, x_0)$. Such a bifurcation is also referred to as transcritical; see Figure 3.

Notice that the study of fixed points or periodic points for maps enters the above framework. Specifically, the period-doubling process occurring in successive bifurcations of one-dimensional maps is a common phenomenon in physics.

The analysis of bifurcations in two dimensions leads to more complicated scenarios. Consider the differential equation [8] in which now $x \in \mathbb{R}^2$ and $f(x, \mu) \in \mathbb{R}^2$, and assume that $f(x_0, \mu_0) = 0$. The behavior of solutions near $(x_0, \mu_0)$ is determined by the differential $D_x f(x_0, \mu_0) =: L$ of $f$ with respect to $x$, which can be identified with a $2 \times 2$ matrix. For steady solutions, the implicit function theorem ensures the existence of a unique branch of solutions $x(\mu)$ provided $L$ is invertible or, in other words, zero does not belong to the spectrum of $L$. Consequently, the study of bifurcations of steady solutions is concerned with the case when zero belongs to the spectrum of $L$, and can be performed following the strategy described for one dimension, provided that the zero eigenvalue of $L$ is simple. For example, assuming that the second eigenvalue is negative leads in general to a saddle-node bifurcation, where an additional dimension is added to the previous picture of a turning point bifurcation, in which one of the two bifurcating steady solutions is a stable node, while the other one is a saddle. If, in addition, there is a symmetry $S$ commuting with $f$, that is, such that $f(Sx, \mu) = Sf(x, \mu)$, and if, for example, $x_0$ is invariant under $S$, $Sx_0 = x_0$, and the eigenvector $\zeta_0$ associated to the zero eigenvalue of $L$ is antisymmetric, $S\zeta_0 = -\zeta_0$, then there is again a pitchfork bifurcation. The equation possesses a branch of symmetric steady solutions the stability of which changes when crossing the value $\mu_0$ of the parameter, node on one side and saddle on the other, and a pair of solutions is created in a one-sided bifurcation which are exchanged by the symmetry $S$ and have stability opposite to the one of the symmetric solution, just as in the one-dimensional pitchfork bifurcation above.

A new type of bifurcation that arises for vector fields in two dimensions is the so-called Hopf bifurcation. This bifurcation was first understood by Poincaré, and then proved in two dimensions by Andronov (1937) using a Poincaré map, and later in $n$ dimensions by Hopf (1948) by means of a Liapunov–Schmidt-type method. For the differential equation, the absence of the zero eigenvalue in the spectrum of $L$ is not enough to ensure that the vector field $f(\cdot, \mu_0)$ is structurally stable in a neighborhood of $x_0$. This only holds when the spectrum of $L$ does not contain purely imaginary eigenvalues, as asserted by the Hartman–Grobman theorem. We are then left with the case when $L$ has a pair of purely imaginary eigenvalues $\pm i\omega$, $\omega \in \mathbb{R}^*$. Static bifurcation theory gives that the system has a unique branch of equilibria $(x(\mu), \mu)$ for $\mu$ close to $\mu_0$, and typically their stability changes as $\mu$ crosses $\mu_0$. For the differential equation a Hopf bifurcation occurs in which a branch of periodic orbits bifurcates on one side of $\mu_0$, and their stability is opposite to that of the steady solution on this side; see Figure 4. A convenient way to study this bifurcation is through normal form theory, which is briefly described below.

Figure 4 Supercritical Hopf bifurcation.

Local Bifurcation Theory

There are two aspects of bifurcation theory, local and global theory. As this designation suggests, local theory is concerned with (local) properties of the set of solutions in a neighborhood of a known solution, while global theory investigates solutions in the entire space.

An important class of tools in local bifurcation theory consists of reduction methods, among which the Liapunov–Schmidt reduction and the center manifold reduction are often used to investigate static and dynamic bifurcations, respectively. The basic idea is to replace the bifurcation problem by an equivalent problem in lower dimensions, for example, a one- or a two-dimensional problem as the ones above.

Consider again the equation [1] in which $F : \mathcal{X} \times \mathcal{M} \to \mathcal{Y}$ is sufficiently regular, and $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{M}$ are Banach spaces. Assume, without loss of generality,
that $F(0, 0) = 0$, or, in other words, that one solution is known. The equation can then be written as

$$LX + G(X, \mu) = 0$$

in which $L = D_X F(0, 0)$ represents the differential of F with respect to X at (0, 0), and is assumed to have a closed range. The implicit function theorem shows absence of bifurcation if L has a bounded inverse, so that bifurcations are related to the existence of a nontrivial kernel of L. The Liapunov–Schmidt reduction then goes as follows.

Let N(L) and R(L) denote the kernel and the range of L, respectively, and consider continuous projections $P : \mathcal{X} \to N(L)$ and $Q : \mathcal{Y} \to R(L)$. Then there exists a bounded linear operator $B : R(L) \to (\mathrm{id} - P)\mathcal{X}$, the right inverse of L, satisfying $LB = \mathrm{id}$ on R(L) and $BL = \mathrm{id} - P$ on $\mathcal{X}$. For $X \in \mathcal{X}$ one may write

$$X = X_0 + X_1, \quad X_0 = PX, \quad X_1 = (\mathrm{id} - P)X$$

and then by projecting with $\mathrm{id} - Q$ and Q the equation becomes

$$(\mathrm{id} - Q)G(X_0 + X_1, \mu) = 0$$
$$X_1 + BQG(X_0 + X_1, \mu) = 0$$

The implicit function theorem allows one to solve the second equation for $X_1 = \psi(X_0, \mu)$ in a neighborhood of the origin. Substitution into the first equation leads to the equation in $(\mathrm{id} - Q)\mathcal{Y}$ for $X_0$ in $P\mathcal{X}$,

$$(\mathrm{id} - Q)G(X_0 + \psi(X_0, \mu), \mu) = 0$$

also called the bifurcation equation. This equation completely describes the set of solutions to [1] in a neighborhood of (0, 0), and this problem is then posed in a space of dimension much smaller than the dimension of $\mathcal{X}$.

The basic principle of the Liapunov–Schmidt method has been discovered and used independently by different authors. E Schmidt (1908) used this method for integral equations, while Liapunov used it to study the stability of the zero solution of nonlinear partial differential equations when the linear part has zero eigenvalues (1947), and later in 1960 for the bifurcation problem studied by Poincaré (1885). Working in a Banach space of t-periodic functions, the Liapunov–Schmidt method may be used to solve the Hopf bifurcation problem, as did Hopf himself in 1948.

The analog of this reduction procedure for the differential equation [3] is the center manifold reduction. Assuming that $F(0, 0) = 0$, we obtain the differential equation

$$\frac{dX}{dt} = LX + G(X, \mu)$$

Since dynamic bifurcations are related to the existence of purely imaginary spectral values of L, the kernel of L alone is not enough to describe this situation. One has to consider the spectral space $\mathcal{Y}_c$ of L associated to the purely imaginary spectrum of L. A spectral gap is needed between this part of the spectrum and the rest (always true in finite dimensions), so that the spectral projection P onto $\mathcal{Y}_c$ is well defined. One writes

$$X = X_c + X_h, \quad X_c = PX, \quad X_h = (\mathrm{id} - P)X$$

and obtains the decomposed system

$$\frac{dX_c}{dt} = LX_c + PG(X_c + X_h, \mu)$$
$$\frac{dX_h}{dt} = LX_h + (\mathrm{id} - P)G(X_c + X_h, \mu)$$

The reduction procedure works provided the nonhomogeneous linear equation

$$\frac{dX_h}{dt} = LX_h + f(t)$$

possesses a unique solution in suitably chosen function spaces with weak exponential growth, such that one can then solve the second equation for $X_h = \Psi(X_c)$ in a neighborhood of the origin in these function spaces. This property is always true in finite dimensions, but it has to be checked in infinite dimensions. Different results showing the solvability of this equation are available in both Banach and Hilbert spaces, relying upon additional conditions on the spectrum of L, decay properties of the resolvent of L on the imaginary axis, and regularity properties of the nonlinearity G. The map $\Psi$ is then used to construct a map $\psi : P\mathcal{X} \times \mathcal{M} \to (\mathrm{id} - P)\mathcal{X}$, defined in a neighborhood of the origin, which parametrizes a local center manifold invariant under the flow of the equation. The flow on this center manifold is governed by the reduced equation in $\mathcal{Y}_c$,

$$\frac{dX_c}{dt} = LX_c + PG(X_c + \psi(X_c, \mu), \mu)$$

which completely describes the bifurcation problem.

The first proofs of this result were given in finite dimensions by Pliss (1964) and Kelley (1967). Center manifolds in infinite dimensions have been studied in different settings determined by assumptions on the linear part L and the nonlinear part G. One typical assumption in infinite dimensions is that the spectrum of L contains only a finite number of purely imaginary eigenvalues, so that the reduced equation above is a differential equation in a finite-dimensional space.

These reduction methods work for a large class of problems and the advantage of such an approach is that one is left with a bifurcation problem in a lower-dimensional space. The methods involved in
solving this reduced bifurcation problem can be very different from one problem to another, and often make use of some additional structure in the problem, such as a gradient-like structure, Hamiltonian structure, or the presence of symmetries, which are preserved by the reduction procedure.

A powerful tool for the analysis of these reduced differential equations is provided by the normal form theory, which goes back to works of Poincaré (1885) and Birkhoff (1927). The idea is to use coordinate transformations to make the expression of the vector field as simple as possible. The transformed vector field is called normal form. There is an extensive literature on normal forms for vector fields in many different contexts, in both finite- and infinite-dimensional cases. Typically the classes of normal forms are characterized in terms of the linear part of the differential equation.

For differential equations of the form

$$\frac{dx}{dt} = Lx + g(x, \mu)$$   [9]

in which L is a matrix and g a sufficiently regular map such that $g(0, 0) = 0$, $D_x g(0, 0) = 0$, as encountered in bifurcation theory, one possible characterization of normal forms makes use of the adjoint matrix $L^*$. Fixing any order $k \ge 2$, there exist polynomials $\Phi$ and N of degree k in x with coefficients which are regular functions of $\mu$, and $\Phi(0, 0) = N(0, 0) = 0$, $D_x \Phi(0, 0) = D_x N(0, 0) = 0$, such that by the change of variables

$$x = y + \Phi(y, \mu)$$

the equation [9] is transformed into the normal form

$$\frac{dy}{dt} = Ly + N(y, \mu) + o(\|y\|^k)$$   [10]

in which the polynomial N is characterized through

$$N(e^{tL^*} y, \mu) = e^{tL^*} N(y, \mu)$$

for all y, $\mu$, and t, or, equivalently,

$$D_y N(y, \mu) L^* y = L^* N(y, \mu)$$

for all y and $\mu$. This characterization allows one to determine the classes of possible normal forms for a given matrix L, and also provides an efficient way to compute the normal form for a given vector field g. As for the reduction methods, normal form transformations can be made to preserve the additional structure of the problem, such as Hamiltonian structure or symmetries.

As an example, consider a differential equation of the form [9] with $x \in \mathbb{R}^n$ and $\mu \in \mathbb{R}$, which supports a Hopf bifurcation so that L has simple eigenvalues $\pm i\omega$, $\omega > 0$, and no other eigenvalues with zero real part. The center manifold reduction provides a two-dimensional reduced system with linear part having the simple eigenvalues $\pm i\omega$, for which it is convenient to write the normal form in complex variables

$$\frac{dA}{dt} = i\omega A + A\,Q(|A|^2, \mu) + o(|A|^{2k+2})$$

for $A(t) \in \mathbb{C}$, where Q is a complex polynomial of degree k in $|A|^2$ with $Q(0, 0) = 0$, or, equivalently, in polar coordinates $A = re^{i\theta}$,

$$\frac{dr}{dt} = r\,Q_r(r^2, \mu) + o(r^{2k+2})$$
$$\frac{d\theta}{dt} = \omega + Q_\theta(r^2, \mu) + o(r^{2k+1})$$

$Q_r$ and $Q_\theta$ being the real and imaginary part of Q, respectively. The radial equation for r truncated at order $2k+1$ decouples and admits a pitchfork bifurcation. The bifurcating steady solutions of this equation then lead first to periodic solutions for the truncated system, which are then shown to persist for the full equation by a standard perturbation analysis.

A situation that occurs in a large class of problems is when the problem possesses a reversibility symmetry, which often comes from some reflection invariance in the physical space, that is, when the vector field $F(\cdot, \mu)$ anticommutes with a symmetry operator S. One of the simplest examples is the case of a differential equation [9] when the matrix L has a double eigenvalue in 0, no other eigenvalues with zero real part, and a one-dimensional kernel which is invariant by S. In this case, the center manifold reduction provides a two-dimensional reduced reversible system, which can be put in the normal form

$$\frac{da}{dt} = b$$
$$\frac{db}{dt} = \mu - a^2 + o\big((|a| + |b|)^3\big)$$

which anticommutes with the symmetry $(a, b) \mapsto (a, -b)$. The above system undergoes a reversible Takens–Bogdanov bifurcation and has for $\mu > 0$ a phase portrait as in Figure 5. There are two equilibria, one a saddle, the other a center, and a family of periodic orbits with the zero-amplitude limit at the center equilibrium, and the infinite-period limit a homoclinic orbit, originating at the saddle point. In concrete problems the bounded orbits of such a reduced system determine the shape of physically interesting solutions of the full system of equations, such as, for example, in water-wave theory where homoclinic and periodic orbits correspond to solitary and periodic waves, respectively.
Figure 5 Phase portrait of the reduced system in a reversible Takens–Bogdanov bifurcation (left) and sketch of the a-component of solutions corresponding to homoclinic and periodic orbits (right).

Figure 6 Phase portrait of the reduced system in absence of reversibility (left) and sketch of the a-component of the solution corresponding to the bounded orbit (right).

Notice that in the absence of the reversibility symmetry, the same type of bifurcation may lead to a completely different phase portrait for the reduced system as, for example, the one in Figure 6 in which the homoclinic and the periodic orbits disappear. This situation often occurs in the presence of a small dissipation in nearly reversible systems.

Global Bifurcation Theory

Most of the existing results in global bifurcation theory concern the static problem [1]. The analysis of global sets of solutions often relies upon topological methods, degree theory, but also variational methods, or analytic function theory. Significant progress in understanding global branches of solutions was made in the 1970s, in particular, for nonlinear eigenvalue problems and the Hopf bifurcation problem (see, e.g., works by Rabinowitz, Crandall, Dancer, and Alexander, Yorke, Ize, respectively).

A now-classical result in the topological theory of global bifurcations is the following theorem by Rabinowitz (1970), which gives a characterization of global sets of solutions for eigenvalue problems of the form

$$X = F(X, \mu) = \mu LX + H(X, \mu)$$

$H(X, \mu) = o(\|X\|)$, posed for $(X, \mu) \in \mathcal{X} \times \mathbb{R}$, $\mathcal{X}$ being a Banach space. In contrast to local theory where the function F is usually k-times differentiable (with a suitable k), in the global theory a typical assumption is that $F : \mathcal{X} \times \mathbb{R} \to \mathcal{X}$ is compact. The equation above possesses a trivial branch of solutions $(0, \mu)$ for any $\mu$. The bifurcation result asserts that if for some real parameter value $\mu_0$ zero is an eigenvalue of odd multiplicity of the operator $\mathrm{id} - \mu_0 L$, then the set S of nontrivial solutions $(X, \mu)$ possesses a maximal subcontinuum which contains $(0, \mu_0)$ and meets either infinity in $\mathcal{X} \times \mathbb{R}$ or another trivial solution $(0, \mu_1)$, $\mu_1 \ne \mu_0$. In particular, $(0, \mu_0)$ is a bifurcation point. A local version of this result is often referred to as Krasnoselski's theorem.

Different versions and extensions of these theorems can be found in the literature, as, for example, in the case of a simple eigenvalue, or if the field F is real-analytic when the set of solutions is path-connected. More recent works address the question of lack of compactness, and a number of results are now available for problems with additional structure (gradient-like or Hamiltonian structure), but also for concrete problems, such as the water-wave problem.

See also: Bifurcations in Fluid Dynamics; Bifurcations of Periodic Orbits; Central Manifolds, Normal Forms; Dynamical Systems in Mathematical Physics: An Illustration from Water Waves; Ginzburg–Landau Equation; Integrable Systems: Overview; Leray–Schauder Theory and Mapping Degree; Singularity and Bifurcation Theory; Stability Theory and KAM; Symmetry and Symmetry Breaking in Dynamical Systems.

Further Reading

Arnold VI (1988) Geometrical Methods in the Theory of Ordinary Differential Equations. Grundlehren der Mathematischen Wissenschaften, vol. 250. New York: Springer.
Buffoni B and Toland J (2003) Analytic Theory of Global Bifurcation. Princeton: Princeton University Press.
Chossat P and Lauterbach R (2000) Methods in Equivariant Bifurcations and Dynamical Systems. Advanced Series in Nonlinear Dynamics, vol. 15. River Edge, NJ: World Scientific.
Chow S-N and Hale JK (1982) Methods of Bifurcation Theory. Grundlehren der Mathematischen Wissenschaften, vol. 251. New York: Springer.
Golubitsky M and Schaeffer DG (1985) Singularities and Groups in Bifurcation Theory, Vol. I. Applied Mathematical Sciences, vol. 51. New York: Springer.
Golubitsky M, Stewart I, and Schaeffer DG (1988) Singularities and Groups in Bifurcation Theory, Vol. II. Applied Mathematical Sciences, vol. 69. New York: Springer.
Guckenheimer J and Holmes P (1990) Nonlinear Oscillations, Dynamical Systems, and Bifurcations of Vector Fields. Applied Mathematical Sciences, vol. 42. New York: Springer.
Iooss G and Adelmeyer M (1998) Topics in Bifurcation Theory and Applications, 2nd edn. Advanced Series in Nonlinear Dynamics, vol. 3. Singapore: World Scientific.
Iooss G, Helleman RHG, and Stora R (eds.) (1983) Chaotic Behavior of Deterministic Systems. Session XXXVI of the Summer School in Theoretical Physics held at Les Houches, June 29–July 31, 1981. Amsterdam: North-Holland.
Ize J and Vignoli A (2003) Equivariant Degree Theory. de Gruyter Series in Nonlinear Analysis and Applications, vol. 8. Berlin: de Gruyter and Co.
Kielhöfer H (2004) Bifurcation Theory. An Introduction with Applications to PDEs. Applied Mathematical Sciences, vol. 156. New York: Springer.
Kuznetsov YA (2004) Elements of Applied Bifurcation Theory, 3rd edn. Applied Mathematical Sciences, vol. 112. New York: Springer.
Ruelle D (1989) Elements of Differentiable Dynamics and Bifurcation Theory. Boston, MA: Academic Press.
Vanderbauwhede A (1989) Centre Manifolds, Normal Forms and Elementary Bifurcations. Dynamics Reported, Dynam. Report. Ser. Dynam. Systems Appl., vol. 2, pp. 89–169. Chichester: Wiley.
Vanderbauwhede A and Iooss G (1992) Center Manifold Theory in Infinite Dimensions. Dynamics Reported: Expositions in Dynamical Systems, vol. 1, pp. 125–163. Berlin: Springer.

Bifurcations in Fluid Dynamics


G Schneider, Universität Karlsruhe, Karlsruhe, Germany

© 2006 Elsevier Ltd. All rights reserved.

Introduction

Almost all classical hydrodynamical stability problems are experiments or gedankenexperiments which have been designed to understand and to extract special phenomena in more complicated situations. Examples are the Taylor–Couette problem, Bénard's problem, Poiseuille flow, or Kolmogorov flow.

The Taylor–Couette problem consists in finding the flow of a viscous incompressible fluid contained in between two coaxial co- or counterrotating cylinders, cf. Figure 1. If the rotational velocity of the inner cylinder is below a certain threshold, the trivial solution, called the Couette flow, is asymptotically stable. At the threshold, this spatially homogeneous solution becomes unstable and bifurcates via a pitchfork bifurcation or a Hopf bifurcation into different spatially periodic patterns, that is, depending on the rotational velocity of the outer cylinder the basic patterns are stationary (called the Taylor vortices) or time-periodic. If the rotational velocity of the inner cylinder is increased further, more complicated patterns occur. The bifurcation scenario is well understood from experiments and analytic investigations.

Figure 1 The Taylor–Couette problem with the Taylor vortices.

Bénard's problem consists in finding the flow of a viscous incompressible fluid contained in between two plates, where the lower plate is heated and the upper plate is kept at a constant temperature, cf. Figure 2. If the temperature difference between the two plates is below a certain threshold, the transport of energy from below to above is made by pure conduction. At this threshold, this spatially homogeneous solution becomes unstable, convection sets in, and spatially periodic patterns such as rolls or hexagons occur. Convection problems play a big role in geophysical applications, that is, in spherical domains, such as the earth. The paradigm for an anisotropic pattern-forming system is electro-convection in nematic liquid crystals.

Figure 2 Bénard's problem with rolls.

Poiseuille flow consists in finding the flow of a viscous incompressible fluid flowing through a pipe driven by some pressure gradient, cf. Figure 3. In noncircular pipes, the trivial laminar flow becomes unstable at a critical pressure gradient. Experimentally, a direct transition to turbulent flow with large amplitudes is observed, in accordance with the fact that in general a subcritical bifurcation occurs at the instability point of the trivial solution.

Figure 3 Poiseuille flow with the trivial solution.
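Why a subcritical bifurcation goes together with a direct jump to large amplitudes can be seen on a scalar toy model. The sketch below is an illustration only: the two amplitude equations, the quintic saturation term, and all function names are assumptions made here and are not derived from the Poiseuille problem. Slightly above threshold (eps > 0) the supercritical model eps*A - A^3 settles at the small amplitude sqrt(eps), whereas the subcritical model eps*A + A^3 - A^5 has no small stable state and runs to an amplitude of order one.

```python
import math


def integrate(rhs, a0, dt, steps):
    """Explicit Euler integration of the scalar equation dA/dt = rhs(A)."""
    a = a0
    for _ in range(steps):
        a += dt * rhs(a)
    return a


def supercritical(eps):
    """Assumed model eps*A - A**3: the bifurcating branch is stable and of size sqrt(eps)."""
    return lambda a: eps * a - a ** 3


def subcritical(eps):
    """Assumed model eps*A + A**3 - A**5: the cubic term destabilizes, the quintic term saturates."""
    return lambda a: eps * a + a ** 3 - a ** 5


if __name__ == "__main__":
    eps, a0 = 0.01, 1e-3  # slightly above threshold, tiny initial perturbation
    sup = integrate(supercritical(eps), a0, dt=0.05, steps=60000)
    sub = integrate(subcritical(eps), a0, dt=0.05, steps=60000)
    print("supercritical final amplitude:", sup, "expected", math.sqrt(eps))
    print("subcritical final amplitude:", sub,
          "expected", math.sqrt((1.0 + math.sqrt(1.0 + 4.0 * eps)) / 2.0))
```

The expected values printed alongside are the nonzero stable fixed points of the two model equations, obtained by solving eps - A^2 = 0 and eps + A^2 - A^4 = 0 for A > 0.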
Figure 4 The inclined-plane problem. The trivial Nusselt solution possesses a flat top surface and a parabolic flow profile.

Kolmogorov flow consists in finding the flow of a viscous incompressible fluid under the action of an external force parallel to the flow direction x and varying periodically in the perpendicular y-direction. This gedankenexperiment was designed by Kolmogorov in 1958 as a simplified model for the Poiseuille flow problem in order to study the nature of turbulence. The trivial solution, which is called Kolmogorov flow, can become unstable via a long-wave instability along the flow direction.

The inclined-plane problem consists in finding the flow of a viscous liquid running down an inclined plane, cf. Figure 4. The trivial solution, the so-called Nusselt solution, becomes sideband-unstable if the inclination angle is increased. Then the dynamics is dominated by traveling pulse trains, although the individual pulses are unstable due to the long-wave instability of the flat surface. Time series taken from the motion of the individual pulses indicate the occurrence of chaos directly at the onset of instability.

There are other famous hydrodynamical stability problems, with arbitrarily complicated bifurcation scenarios.

Spectral Analysis of the Trivial Solution

All classical hydrodynamical stability problems are described by the Navier–Stokes equations

$$\partial_t U = \nu \Delta U - \nabla p - (U \cdot \nabla)U + f$$
$$0 = \nabla \cdot U$$

where $U = U(x, t) \in \mathbb{R}^d$ with d = 2, 3 is the velocity field, $p = p(x, t) \in \mathbb{R}$ the pressure field, f some external forcing, and $\nu$ the dynamic viscosity. These equations are completed with boundary conditions. In case of Bénard's problem, the Navier–Stokes equations are coupled to a nonlinear heat equation.

By projecting U onto the space of divergence-free vector fields and by taking the trivial solution as new origin, all problems from the previous section can be written as an evolutionary system

$$\partial_t U = \Lambda U + N(U)$$

where U = 0 corresponds to the trivial solution, where $\Lambda$ is a linear and $N(U) = O(\|U\|^2)$ for $U \to 0$ a nonlinear operator. Most of the examples from the previous section are semilinear, that is, from a functional analytic point of view, the nonlinear operator N can be controlled in terms of the linear operator $\Lambda$.

Since the form of the bifurcating pattern is only slightly influenced by far away boundaries, that is, for instance, the upper and lower end of the rotating cylinders in the Taylor–Couette problem, the problems are considered from a theoretical point of view in unbounded domains, $\Omega = \mathbb{R}^d \times \Sigma$, with $\Sigma \subset \mathbb{R}^m$ the bounded cross section; for instance, the Taylor–Couette problem is considered with two cylinders of infinite length. Then the eigenfunctions of the linear operator $\Lambda$ are given by Fourier modes, that is,

$$\Lambda\big(e^{ik \cdot x} \varphi_{k,n}(z)\big) = \lambda_n(k)\, e^{ik \cdot x} \varphi_{k,n}(z)$$

with $x \in \mathbb{R}^d$, $k \in \mathbb{R}^d$, $k \cdot x = \sum_{j=1}^{d} k_j x_j$, $z \in \Sigma$, $n \in \mathbb{N}$. If an external control parameter is changed so that the trivial solution becomes unstable, then, independently of the underlying physical problem, the surface $k \mapsto \mathrm{Re}\,\lambda_1(k)$ intersects the plane $\{\mathrm{Re}\,\lambda_1(k) = 0\}$. Generically, this happens first at a nonzero wave vector $k_c \ne 0$ (cf. Figure 5).

Examples for such an instability are the Taylor–Couette problem, Bénard's problem, or Poiseuille flow. Very often, due to some conserved quantity in the problem we have $\mathrm{Re}\,\lambda_1(0) = 0$ for all values of the bifurcation parameter. Then, a so-called sideband instability can occur, cf. Figure 6.

Examples for such an instability are the Kolmogorov flow problem or the inclined-plane problem. According to some symmetries in the problem, for instance, reflection along the cylinders in the Taylor–Couette problem or rotational symmetry in Bénard's problem, the curves in Figure 5 are double or rotational symmetric.

Figure 5 Real part of the spectrum in case of an instability at a wave number $k_c \ne 0$. Definition of the small bifurcation parameter $\varepsilon$.

In case of $\Omega$ being spherical symmetric, we have

$$\Lambda\big(f_l(r) \varphi_{l,n}(z)\big) = \lambda_l\, f_l(r) \varphi_{l,n}(z)$$
with $r \ge 0$, $z \in S^d$, $\varphi_{l,n}$ for $l \in \mathbb{N}_0$ and $m = -l, -l+1, \ldots, l-1, l$ being a spherical harmonic, that is, if $\lambda_{l_0}$ is the eigenvalue having first positive real part, then by symmetry, simultaneously $2l_0 + 1$ eigenvalues cross the imaginary axis.

Figure 6 Real part of the spectrum in case of a sideband instability. Definition of the small bifurcation parameter $\varepsilon$.

Reduction of the Dimension

In order to understand the occurrence of the spatially periodic Taylor vortices in the Taylor–Couette problem and of the roll solutions and hexagons in Bénard's problem, the problems are considered with periodic boundary conditions along the unbounded directions. Then the instability of the trivial solution occurs when at least one eigenvalue crosses the imaginary axis. Generically, this happens by a simple real eigenvalue or a pair of complex-conjugate eigenvalues crossing the imaginary axis (Figure 7).

Figure 7 Generically, a simple real eigenvalue or a pair of complex-conjugate eigenvalues cross the imaginary axis.

Center manifold theory and the Lyapunov–Schmidt reduction allow one to reduce the a priori infinite-dimensional bifurcation problem to a finite-dimensional one. In case of a real eigenvalue $\lambda_1$ crossing the imaginary axis, the solution u can be written as a sum of the weakly unstable mode and the stable modes, that is, $u = c_1 \varphi_1 + u_r$ ($c_1 \in \mathbb{R}$), where $u_r$ lives in the closure of the span of the stable eigenfunctions $\{\varphi_2, \varphi_3, \ldots\}$. For the linearized system all solutions are attracted by the one-dimensional set $E_c = \{u \mid u_r = 0\}$, in which all solutions diverge to infinity.

For the nonlinear system and small bifurcation parameter this attracting structure survives, no longer as a linear space, but as a manifold

$$M_c = \{u = c_1 \varphi_1 + h(c_1) \mid h(c_1) \in \mathrm{span}\{\varphi_2, \varphi_3, \ldots\}\}$$

the so-called center manifold which is tangential to $E_c$, that is, $\|h(c_1)\| \le C \|c_1\|^2$ (Figure 8). The dynamics on $M_c$ is no longer trivial due to the nonlinear terms.

Figure 8 The center manifold is invariant under the flow, is tangential to the central subspace $E_c$, and attracts nearby solutions with some exponential rate.

Due to the fact that real problems are considered, $\mathrm{Re}\,\lambda_1(k_c) = 0$ implies $\mathrm{Re}\,\lambda_1(-k_c) = 0$, that is, in case of $2\pi/k_c$-periodic boundary conditions always two eigenvalues cross the imaginary axis simultaneously. For Bénard's problem in a strip or for the Taylor–Couette problem in case of a bifurcation of fixed points, the reduced system on the center manifold is derived with the ansatz

$$U = \varepsilon A(\varepsilon^2 t) e^{ik_c x} + \mathrm{c.c.} + O(\varepsilon^2)$$

where $0 < \varepsilon \ll 1$ is the small bifurcation parameter, cf. Figure 5. Then due to $e^{ik_c x} e^{ik_c x} e^{-ik_c x} = e^{ik_c x}$ the complex-valued amplitude A satisfies the so-called Landau equation

$$\partial_T A = A - \gamma A|A|^2 + O(\varepsilon^2)$$

where the Landau coefficient $\gamma \in \mathbb{R}$ is obtained by classical perturbation analysis (Figure 9).

Figure 9 The dynamics of the Landau equation. Except for the origin, which corresponds to the Couette flow, all solutions converge towards the circle of fixed points, which corresponds to the family of Taylor vortices. The translation invariance of the Taylor–Couette problem is reflected by the rotational symmetry of the reduced system.

The reduced system is symmetric under the $S^1$-symmetry
$A \mapsto Ae^{i\phi}$ with $\phi \in \mathbb{R}$ which corresponds to the translation invariance of the original systems. This so-called equivariant bifurcation theory has been applied successfully to convection problems in the plane and on the sphere.

The stability of time-periodic flows can be analyzed with Floquet multipliers. Bifurcations from a time-periodic solution can lead to quasiperiodic motion in time. Ruelle and Takens (1971) showed that already the next bifurcation leads to chaotic dynamics. Since this time many classical hydrodynamical stability problems have been analyzed with bifurcation theory up to turbulent flows.

It was observed that center manifold theory can also be applied successfully to elliptic PDE problems posed in spatially unbounded cylindrical domains. A famous example is the construction of capillary-gravity solitary waves for the so-called water-wave problem.

Modulation Equations

The analysis of the last section is of no use in case of a sideband instability occurring at the wave number $k_c = 0$, as it happens in the inclined-plane problem or in the Kolmogorov flow problem. Moreover, in case of an instability at a wave vector $k_c \ne 0$, based on the above analysis, front solutions cannot be described. In such situations, the method of modulation equations generalizes the role of the finite-dimensional amplitude equations from the last section.

The complex cubic Ginzburg–Landau equation in normal form is given by

$$\partial_T A = (1 + i\alpha)\partial_X^2 A + A - (1 + i\beta)A|A|^2$$

where the coefficients $\alpha, \beta \in \mathbb{R}$ are real, and we have $X \in \mathbb{R}$, $T \ge 0$, and $A(X, T) \in \mathbb{C}$. The Ginzburg–Landau equation is a universal amplitude equation that describes slowly varying modulations, in space and time, of the amplitude of bifurcating spatially periodic solutions in pattern-forming systems close to the threshold of the first instability. Whenever the instability drawn in Figure 5 occurs, that is, for the Taylor–Couette problem and Bénard's problem in a strip, that is, d = 1, it can be derived by a multiple scaling ansatz

$$u(x, t) = \varepsilon A\big(\varepsilon(x - c_g t), \varepsilon^2 t\big) e^{i(k_c x + \omega_0 t)} + \mathrm{c.c.}$$

For instance, in case of $\alpha = \beta = 0$, the Ginzburg–Landau equation possesses front solutions connecting the stable fixed point A = 1 with the unstable fixed point A = 0. Such solutions correspond in the Taylor–Couette problem to modulating fronts connecting the stable Taylor vortices with the unstable Couette flow, cf. Figure 10.

Figure 10 The front solution of the Ginzburg–Landau equation modulates the underlying pattern in the original system.

The diffusion operator in the Ginzburg–Landau equation reflects the parabolic shape of $\mathrm{Re}\,\lambda_1$ close to $k = k_c$ in Figure 5. In case of the long-wave instability, as drawn in Figure 6, the second-order differential operator changes into a fourth-order differential operator.

For Kolmogorov flow with $T = \varepsilon^4 t$ and $X = \varepsilon x$ and the amplitude scaled with $\varepsilon$, we obtain that in lowest order A has to satisfy a Cahn–Hilliard equation

$$\partial_T A = -\sqrt{2}\,\partial_X^2 A - 3\,\partial_X^4 A + \kappa\,\partial_X^2 (A^3)$$

where $A(X, T) \in \mathbb{R}$ and $\kappa \in \mathbb{R}$ a constant (cf. Figure 6). The Kuramoto–Sivashinsky (KS)-perturbed KdV equation

$$\partial_T A = -\partial_X^3 A - \partial_X (A^2)/2 - \varepsilon(\partial_X^2 + \partial_X^4) A$$

with $A = A(X, T) \in \mathbb{R}$, $X \in \mathbb{R}$, $T \ge 0$, where $0 < \varepsilon \ll 1$ is still a small parameter, can be derived for the inclined-plane problem with $T = \varepsilon^3 t$ and $X = \varepsilon x$ and the amplitude scaled with $\varepsilon^2$.

The theory of modulation equations is nowadays a well-established mathematical tool which allows us to construct special solutions, to prove global existence results for the solutions of pattern-forming systems, or to characterize the attractors in such systems. The method is based on approximation results, showing that solutions of the original systems can be approximated by the modulation equation, and attractivity results, showing that every solution of the original system develops in such a way that it can be described by the modulation equation.

This method can also be applied to secondary bifurcations describing instabilities of spatially periodic wave trains. Then the so-called phase-diffusion equations, conservation laws, Burgers equations, and again the KS equations occur.

However, this method cannot be applied successfully in all situations. There are counterexamples showing that not every formally derived modulation equation describes the original system in a correct way. Moreover, very often according to some symmetries in the original problem no consistent
multiple scaling analysis is possible, that is, the modulation equations still depend on $\varepsilon$.

Discussion

There is no satisfactory bifurcation analysis for situations where boundary layers play a role. The simplest problem is the flow around some obstacle. The difficulties are due to the fact that, because of the unbounded flow region, there is always continuous spectrum up to the imaginary axis. From the localized obstacle discrete eigenvalues are created (cf. Figure 11). In such a situation, so far there is no mathematical bifurcation theory available.

Figure 11 Spectrum for the flow around an obstacle.

See also: Bifurcation Theory; Dynamical Systems in Mathematical Physics: An Illustration from Water Waves; Leray–Schauder Theory and Mapping Degree; Multiscale Approaches; Newtonian Fluids and Thermohydraulics; Symmetry and Symmetry Breaking in Dynamical Systems; Turbulence Theories; Variational Methods in Turbulence.

Further Reading

Chandrasekhar S (1961) Hydrodynamic and Hydromagnetic Stability. Oxford: Clarendon.
Chang H-C and Demekhin EA (2002) Complex Wave Dynamics on Thin Films. Studies in Interface Science, vol. 14. Amsterdam: Elsevier.
Chossat P and Iooss G (1994) The Taylor–Couette Problem. Applied Mathematical Sciences, vol. 102. Springer.
Chow S-N and Hale J (1982) Methods of Bifurcation Theory. Grundlehren der Mathematischen Wissenschaften, vol. 251. Berlin: Springer.
Golubitsky M and Schaeffer DG (1985) Singularities and Groups in Bifurcation Theory I. Applied Mathematical Sciences, vol. 51. Berlin: Springer.
Golubitsky M, Stewart I, and Schaeffer DG (1988) Singularities and Groups in Bifurcation Theory II. Applied Mathematical Sciences, vol. 69. Berlin: Springer.
Haken H (1987) Advanced Synergetics. Berlin: Springer.
Henry D (1981) Geometric Theory of Semilinear Parabolic Equations. Lecture Notes in Mathematics, vol. 840. Berlin: Springer.
Mielke A (2002) The Ginzburg–Landau equation in its role as a modulation equation. In: Fiedler B (ed.) Handbook of Dynamical Systems II, pp. 759–834. Amsterdam: North-Holland.
Ruelle D and Takens F (1971) On the nature of turbulence. Communications in Mathematical Physics 20: 167–192.
Temam R (1988) Infinite-Dimensional Systems in Mechanics and Physics. Berlin: Springer.

Bifurcations of Periodic Orbits


J-P Françoise, Université P.-M. Curie, Paris VI, Paris, France

© 2006 Elsevier Ltd. All rights reserved.

Introduction

Bifurcation theory of periodic orbits relates to modeling of quite diverse subjects. It appeared classically in the field of celestial mechanics with the contributions of H Poincaré. Van der Pol (1926, 1927, 1928, 1931) observed the frequency-locking phenomenon in electrical circuits. More recently, Malkin's theory (Malkin 1952, 1956, Roseau 1966) was used to justify synchronization of weakly coupled oscillators modeling the electrical activity of the cells of the sinusal node in the heart. This article provides the essential mathematical background necessary for existence of frequency locking. Applications can be found, for instance, in Weakly Coupled Oscillators.

The Asymptotic Phase of a Stable Periodic Orbit

Let $\gamma$ be a periodic orbit of a vector field and let $S(\gamma)$ denote the stable manifold of $\gamma$ (resp. $U(\gamma)$ denotes the unstable manifold of $\gamma$). The following theorem can be found, for instance, in Hartman (1964).

Theorem There exist $\alpha$ and K such that $\mathrm{Re}(\lambda_j) < -\alpha$, $j = 1, \ldots, k$ and $\mathrm{Re}(\lambda_j) > \alpha$, $j = k+1, \ldots$, and for all $x \in S(\gamma)$, there is an asymptotic phase $t_0$ such that for all $t \ge 0$

$$|\phi_t(x) - \gamma(t - t_0)| < K e^{-\alpha t/T}$$

Similarly, for any $x \in U(\gamma)$, there is a $t_0$ such that for all $t \le 0$,

$$|\phi_t(x) - \gamma(t - t_0)| < K e^{\alpha t/T}$$

If the periodic orbit is stable, the local stable manifold coincides with an open neighborhood of $\gamma$. In such a case, there is a foliation of this open set
whose leaves are the points with a given asymptotic phase. The asymptotic phase can be considered as a coordinate function $\theta$ defined on the neighborhood $S(\gamma)$.

If we consider now the particular case of a plane system, this function can be completed with the square of the distance function to the orbit into a coordinate system called the amplitude–phase system and denoted as $(\rho, \theta)$.

Frequency Locking and Phase Locking

The term oscillator has two meanings. A conservative oscillator is a plane vector field which displays an open set of periodic orbits. It is said to be isochronous if all orbits have the same period. A dissipative oscillator is a planar vector field which displays an attractive limit cycle (attractive periodic orbit).

We consider $m$ dissipative oscillators:

\[ \frac{dx_i}{dt} = f(x_i, y_i), \qquad \frac{dy_i}{dt} = g(x_i, y_i) \tag{1} \]

where $i = 1, \dots, m$.

The dynamical system obtained by considering the space of all the variables $(x_i, y_i)$, $i = 1, \dots, m$, displays an invariant torus full of periodic orbits that we denote by $T^m(0)$.

Assume now that the $m$ oscillators are weakly coupled:

\[ \frac{dx_i}{dt} = f(x_i, y_i) + \varepsilon F_i(x, y, \varepsilon), \qquad \frac{dy_i}{dt} = g(x_i, y_i) + \varepsilon G_i(x, y, \varepsilon) \tag{2} \]

where $\varepsilon$ can be considered as small as we wish.

Definition The system [2] has a frequency locking if it displays a family of stable periodic orbits $\gamma_\varepsilon$ for all values of $\varepsilon$ small enough which tends to (in the sense of Hausdorff's topology) a periodic orbit of [1] contained in the periodic torus $T^m(0)$.

Assume now that [2] has a frequency locking associated with the periodic orbit $\gamma_\varepsilon(t)$. Consider the projections $\gamma_\varepsilon^i(t)$ of $\gamma_\varepsilon(t)$ on the coordinate planes $(x_i, y_i)$, $i = 1, \dots, m$. Assume that $\varepsilon$ is small enough so that each projection belongs to the open set $S_i$ on which the amplitude–phase coordinates of the system [1] are defined. We can write the system [2], restricted to the open set $S = \prod_{i=1}^m S_i$, as

\[ \frac{d\rho_i}{dt} = f_i(\rho, \theta, \varepsilon), \qquad \frac{d\theta_i}{dt} = F_i(\rho, \theta, \varepsilon), \qquad i = 1, \dots, m \tag{3} \]

Definition The system [2] has a phase locking if the system induced by [3] on $\gamma_\varepsilon(t)$,

\[ \frac{d\theta_i}{dt} = F_i(0, \theta, \varepsilon) \tag{4} \]

has an attractive singular point.

As the attractive singular points are structurally stable, it is enough to assume that the system

\[ \frac{d\theta_i}{dt} = F_i(0, \theta, 0) \tag{5} \]

displays an attractive singular point.

Periodic Orbits of Linear Systems

Consider the linear system

\[ \frac{dx}{dt} = P(t) \cdot x + q(t) \tag{6} \]

where $P$ is a continuous $T$-periodic matrix function and $q$ is a continuous $T$-periodic vector function, $x = (x_1, \dots, x_n)$. Consider also the two associated homogeneous equations:

\[ \frac{dx}{dt} = P(t) \cdot x \tag{7a} \]

\[ \frac{dx}{dt} = -P^*(t) \cdot x \tag{7b} \]

where $P^*$ denotes the transpose of $P$.

The set of $T$-periodic solutions of [7b] is a vector space; $m$ denotes its dimension. Let $U_j(t)$, $j = 1, \dots, m$, be a basis of this vector space. This basis is completed by adding $n - m$ solutions $U_j(t)$, $j = m+1, \dots, n$, to obtain a basis of $\mathbb{R}^n$. Let $U(t)$ be the matrix whose columns are these vectors; denote by $U_{ij}(t)$ the elements of this matrix.

With the change of variable $x = U^*(0)^{-1} y$, system [6] gets transformed into

\[ \frac{dy}{dt} = Q(t) y + r(t) \tag{8} \]

with $Q(t) = U^*(0) P(t) U^*(0)^{-1}$ and $r(t) = U^*(0) q(t)$. The matrix $V(t) = U^{-1}(0) U(t)$ is such that

\[ \frac{dV}{dt} + Q^*(t) V = 0, \qquad V(0) = I \]
and the first $m$ column vectors of $V(t)$, denoted as $V^j(t)$, $j = 1, \dots, m$, are $T$-periodic.

Let $X(t)$ be the fundamental solution defined by

\[ \frac{dX}{dt} = Q(t) \cdot X, \qquad X(0) = I \]

then,

\[ X^{-1}(t) = V^*(t) \]

The solution of [8] can be written as

\[ y(t) = X(t) \cdot y(0) + X(t) \cdot \int_0^t X^{-1}(u)\, r(u)\, du \tag{9} \]

This yields that $T$-periodic solutions of [8] have initial data $y(0)$ given by

\[ (V^*(T) - I) \cdot y(0) = \int_0^T V^*(s)\, r(s)\, ds \tag{10} \]

Conversely, given a solution $y(0)$ of [10], $T$-periodicity of $P$ and $q$ and uniqueness of solutions of a differential equation imply that $y(0)$ represents the initial data of a $T$-periodic solution of [8]. Hence, the $T$-periodic solutions of [8] are in one-to-one correspondence with the affine space defined by the solutions of [10]. The first $m$ rows of $V^*(T) - I$ are zero and its rank is exactly $n - m$. In the following, assume that the determinant $\Delta$ formed by the last $(n - m)$ rows and last columns of $(V^*(T) - I)$ is not zero.

A necessary and sufficient condition for [8] to display a $T$-periodic solution is

\[ \int_0^T \sum_{j=1}^n V_{jk}(u)\, r_j(u)\, du = 0, \qquad k = 1, \dots, m \tag{11a} \]

\[ \sum_{j=m+1}^n \left( V_{jk}(T) - \delta_{jk} \right) y_j(0) = \sum_{j=1}^n \int_0^T V_{jk}(s)\, r_j(s)\, ds, \qquad m+1 \le k \le n \tag{11b} \]

This yields the Fredholm alternative: if the $m$ conditions

\[ \sum_{j=1}^n \int_0^T U_{jk}(s)\, q_j(s)\, ds = 0, \qquad k = 1, \dots, m \tag{12} \]

are satisfied, then [6] displays a family $x_\alpha(t)$ of $T$-periodic solutions depending on $m$ parameters $(\alpha_1, \dots, \alpha_m)$:

\[ x_\alpha(t) = \alpha_1 \varphi_1(t) + \cdots + \alpha_m \varphi_m(t) + \bar{x}(t) \tag{13} \]

where $\bar{x}(t)$ is a particular $T$-periodic solution and the $\varphi_j(t)$ denote independent $T$-periodic solutions of [7a]. To be more specific, one can choose $\bar{x}(t)$ to be the unique solution of [6] such that $y(0)_k = 0$, $k = m+1, \dots, n$, and $\varphi_j(t)$ solutions of [7a], such that $y(0)_k = \delta_{jk}$. With these notations, $x_\alpha(t)$ is such that

\[ y(0)_k = \alpha_k, \qquad k = 1, \dots, m \]

and its other initial conditions $y(0)_k = \beta_k$, $k = m+1, \dots, n$, are fixed:

\[ \beta_k = \beta_k^0 \]

Malkin's Theorem for Quasilinear Systems

Consider now nonlinear systems with the perturbation:

\[ \frac{dx}{dt} = P(t) \cdot x + q(t) + \varepsilon f(x, t, \varepsilon) \tag{14} \]

where $f$ is $C^1$ and $T$-periodic in $t$.

Assume that the solutions $y(t, y(0), \varepsilon)$ of [14] exist for all values of $t$, $0 \le t \le T$. The solutions define a differentiable function of their initial data $y(0)$. This is, for instance, true for perturbations of linear systems if $\varepsilon$ is small enough.

Assume that $q$ satisfies the condition [12] and that there is a solution

\[ \alpha^0 = (\alpha_1^0, \dots, \alpha_m^0) \]

to the equations

\[ \Psi_k(\alpha) = \sum_{j=1}^n \int_0^T U_{jk}(u)\, f_j(x_\alpha(u), u, 0)\, du = 0, \qquad k = 1, \dots, m \tag{15a} \]

so that

\[ \left. \frac{\partial \Psi_k}{\partial \alpha_j} \right|_{\alpha = \alpha^0}, \qquad k = 1, \dots, m; \; j = 1, \dots, m \tag{15b} \]

is invertible.

Proceed as in the previous section with the coordinate change $x = U^*(0)^{-1} y$. Equation [14] gets transformed into

\[ \frac{dy}{dt} = Q(t) y + r(t) + \varepsilon F(y, t, \varepsilon) \tag{16} \]

with $F = U^*(0) f(U^*(0)^{-1} \cdot y, t, \varepsilon)$.

Solutions of [16] are uniquely determined by their initial data. We can understand the parameters $(\alpha, \beta)$ as coordinates on the space of solutions. With this viewpoint, for instance, the set of $T$-periodic solutions of [6] is an affine space of dimension $m$
given by the equations $\beta = \beta^0$ and is parametrized by the coordinates $\alpha$. In this space, we pick a point (which corresponds to a particular $T$-periodic solution of [6]): $\alpha = \alpha^0$. $T$-periodic solutions of [16] are in one-to-one correspondence with the solutions of

\[ C_k(\alpha, \beta, \varepsilon) = \sum_{j=1}^n \int_0^T V_{jk}(s)\, F_j(y(s, \alpha, \beta, \varepsilon), s, \varepsilon)\, ds = 0, \qquad k = 1, \dots, m \tag{17a} \]

\[ C_k(\alpha, \beta, \varepsilon) = \sum_{j=m+1}^{n} \left( V_{jk}(T) - \delta_{jk} \right) \beta_j - \sum_{j=1}^n \int_0^T V_{jk}(s)\, r_j(s)\, ds - \varepsilon \sum_{j=1}^n \int_0^T V_{jk}(s)\, F_j(y(s, \alpha, \beta, \varepsilon), s, \varepsilon)\, ds = 0, \qquad k = m+1, \dots, n \tag{17b} \]

where $\alpha_k$, $k = 1, \dots, m$, and $\beta_k = y_k(0)$, $k = m+1, \dots, n$, parametrize the solutions $y(t, \alpha, \beta, \varepsilon)$ of [14] in this way:

\[ y(0) = U^*(0) \cdot x(0), \qquad x(0) = \sum_{j=1}^m \alpha_j \varphi_j(0) + \bar{x}(0) \tag{18} \]

Consider the determinant of the Jacobian matrix of the mapping

\[ (\alpha, \beta) \mapsto C(\alpha, \beta, \varepsilon) \tag{19} \]

for $\alpha = \alpha^0$, $\beta_k = \beta_k^0$, $k = m+1, \dots, n$, $\varepsilon = 0$. This is equal to the product of $\Delta$ and the determinant of

\[ \left. \frac{\partial \Psi_k}{\partial \alpha_j} \right|_{\alpha = \alpha^0} \tag{20} \]

which is nonzero.

The implicit-function theorem shows that the differential equation [14] (and thus [16] as well) has, for $\varepsilon$ small enough, a unique $T$-periodic solution which tends to $x_{\alpha^0}$ when $\varepsilon$ tends to 0.

Generalization of Malkin's Theorem

Finally, we consider the most general situation of the perturbation of a general system (not necessarily linear):

\[ \frac{dx}{dt} = f(x, t) + \varepsilon g(x, t, \varepsilon) \tag{21} \]

where we assume that

\[ \frac{dx}{dt} = f(x, t) \tag{22} \]

displays an $m$-parameter family $x_\alpha(t)$ of $T$-periodic orbits.

Assume that the solutions $y(t, y(0), \varepsilon)$ exist for all $0 \le t \le T$ and define a differentiable mapping of the initial data $y(0)$. This is, for instance, the case if we assume that the nonperturbed equation defines a flow and if $\varepsilon$ is small enough.

Assume also that the different solutions $x_\alpha(t)$ are independent in the sense that the mapping

\[ \alpha \mapsto x_\alpha(t) \]

is an immersion for any $t$. In other words, the $m$ vectors $\partial x_\alpha(t)/\partial \alpha_j$ are independent.

We linearize the solution along the family of periodic orbits:

\[ x = x_\alpha(t) + \varepsilon \xi \tag{23} \]

Equation [21] gets transformed into

\[ \frac{d\xi}{dt} = D_x f(x_\alpha(t), t) \cdot \xi + g(x_\alpha(t), t, 0) + \varepsilon F(\xi, t, \varepsilon) \tag{24} \]

Set, furthermore,

\[ P(t) = D_x f(x_\alpha(t), t), \qquad r(t) = g(x_\alpha(t), t, 0) \]

and denote by $U(t)$ the fundamental solution of [7b] described earlier.

Theorem Assume that there is a solution

\[ \alpha^0 = (\alpha_1^0, \dots, \alpha_m^0) \]

of the $m$ equations:

\[ \Psi_k(\alpha) = \sum_{j=1}^n \int_0^T U_{jk}(u)\, g_j(x_\alpha(u), u, 0)\, du = 0, \qquad k = 1, \dots, m \tag{25a} \]

such that

\[ \left. \frac{\partial \Psi_k}{\partial \alpha_j} \right|_{\alpha = \alpha^0}, \qquad k = 1, \dots, m; \; j = 1, \dots, m \tag{25b} \]

is invertible. Then, for all $\varepsilon$ sufficiently small, eqn [21] has a unique $T$-periodic solution which tends to $x_{\alpha^0}$ when $\varepsilon$ tends to 0.

We show that under the hypothesis of the theorem, we can apply the results proved in the preceding section. Note that one can prove the theorem for eqn [24] because it reduces to [21] with the change of variables [23].
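Note first that the $m$ conditions [25a] imply that the $m$ equations,

\[ \frac{d\xi}{dt} = D_x f(x_{\alpha^0}(t), t) \cdot \xi + g(x_{\alpha^0}(t), t, 0) \]

display a family of $T$-periodic solutions which depend on $m$ parameters $\beta = (\beta_1, \dots, \beta_m)$. From [13], one can write

\[ \xi_\beta(t) = \beta_1 \varphi_1(t) + \cdots + \beta_m \varphi_m(t) + \bar{\xi}(t) \tag{26} \]

where $\bar{\xi}(t)$ is a particular $T$-periodic solution and the $\varphi_j(t)$ are independent $T$-periodic solutions of [7a].

Lemma 1 A possible choice for the solutions $\varphi_j(t)$ is $\partial x_\alpha(t)/\partial \alpha_j \,|_{\alpha = \alpha^0}$.

We have already assumed that these vectors are independent. They are obviously $T$-periodic solutions to [7a].

In the following, we will assume that all other periodic solutions of [7a] are linear combinations of these.

As a consequence of what was proved in the section on periodic orbits of linear systems, system [24] displays a periodic solution (for $\varepsilon$ small enough) if there exists a solution

\[ \beta^0 = (\beta_1^0, \dots, \beta_m^0) \]

to the equations

\[ \Phi_k(\beta) = \sum_{j=1}^n \int_0^T U_{jk}(s)\, F_j(\xi_\beta(s), s, 0)\, ds = 0, \qquad k = 1, \dots, m \]

such that

\[ \left. \frac{\partial \Phi_k}{\partial \beta_j} \right|_{\beta = \beta^0}, \qquad k = 1, \dots, m; \; j = 1, \dots, m \]

is invertible.

Lemma 2 The quantities $\Phi_k(\beta)$ depend linearly on $\beta$.

Proof Observe first that the quantities $F_j(\xi, s, 0)$ depend quadratically on $\xi$:

\[ F_j(\xi, s, 0) = \frac{1}{2} \sum_{k,l} \frac{\partial^2 f_j}{\partial z_k \partial z_l}(x_{\alpha^0}(s), s)\, \xi_k \xi_l + \sum_k \frac{\partial g_j}{\partial z_k}(x_{\alpha^0}(s), s, 0)\, \xi_k + \frac{\partial g_j}{\partial \varepsilon}(x_{\alpha^0}(s), s, 0) \tag{27} \]

Then, the solutions $\xi_\beta(t)$ depend linearly on $\beta$. We thus obtain that a priori the $\Phi_p(\beta)$ are quadratic functions of $\beta$ (here $z = x_\alpha(t)$, so that $\varphi_q = \partial z/\partial \alpha_q$):

\[ \Phi_p(\beta_1, \dots, \beta_m) = \frac{1}{2} \sum_{j,q,r,k,l} \beta_q \beta_r \int_0^T U_{jp} \frac{\partial^2 f_j}{\partial z_k \partial z_l} \frac{\partial z_k}{\partial \alpha_q} \frac{\partial z_l}{\partial \alpha_r}\, ds + \sum_{j,q,k,l} \beta_q \int_0^T U_{jp} \left[ \frac{1}{2} \frac{\partial^2 f_j}{\partial z_k \partial z_l} \left( \frac{\partial z_k}{\partial \alpha_q} \bar{\xi}_l + \frac{\partial z_l}{\partial \alpha_q} \bar{\xi}_k \right) + \frac{\partial g_j}{\partial z_k} \frac{\partial z_k}{\partial \alpha_q} \right] ds + \cdots \tag{28} \]

where the dots represent quantities independent of $\beta$. We then use the expression

\[ \frac{d}{dt} \left( \frac{\partial^2 z_j}{\partial \alpha_q \partial \alpha_r} \right) = \sum_{k,l} \frac{\partial^2 f_j}{\partial z_k \partial z_l} \frac{\partial z_k}{\partial \alpha_q} \frac{\partial z_l}{\partial \alpha_r} + \sum_k \frac{\partial f_j}{\partial z_k} \frac{\partial^2 z_k}{\partial \alpha_q \partial \alpha_r} \]

This allows one to write the homogeneous quadratic part as

\[ \sum_{j,k,l} \int_0^T U_{jp} \frac{\partial^2 f_j}{\partial z_k \partial z_l} \frac{\partial z_k}{\partial \alpha_q} \frac{\partial z_l}{\partial \alpha_r}\, ds = \sum_j \int_0^T U_{jp}(s) \frac{d}{ds} \left( \frac{\partial^2 z_j}{\partial \alpha_q \partial \alpha_r} \right) ds - \sum_{j,k} \int_0^T U_{jp}(s) \frac{\partial f_j}{\partial z_k} \frac{\partial^2 z_k}{\partial \alpha_q \partial \alpha_r}\, ds \]

Integration by parts (the boundary terms cancel by $T$-periodicity) yields

\[ \sum_{j,k,l} \int_0^T U_{jp} \frac{\partial^2 f_j}{\partial z_k \partial z_l} \frac{\partial z_k}{\partial \alpha_q} \frac{\partial z_l}{\partial \alpha_r}\, ds = - \sum_j \int_0^T \left[ \frac{dU_{jp}}{ds} \frac{\partial^2 z_j}{\partial \alpha_q \partial \alpha_r} + \sum_k U_{jp}(s) \frac{\partial f_j}{\partial z_k} \frac{\partial^2 z_k}{\partial \alpha_q \partial \alpha_r} \right] ds = 0 \]

because $U$ is a solution to [7b]. This shows that [28] is linear in $\beta$. It suffices to show that the determinant of this system does not vanish to have existence and uniqueness of the solution such that

\[ \frac{\partial (\Phi_1, \dots, \Phi_m)}{\partial (\beta_1, \dots, \beta_m)} \ne 0 \]

Consider now the coefficient of the linear part:

\[ \sum_{j,k,l} \int_0^T U_{jp} \left[ \frac{\partial^2 f_j}{\partial z_k \partial z_l} \bar{\xi}_l + \frac{\partial g_j}{\partial z_k} \right] \frac{\partial z_k}{\partial \alpha_q}\, ds \]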
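and the coefficient

\[ \Psi_p(\alpha) = \sum_{j=1}^n \int_0^T U_{jp}(u)\, g_j(x_\alpha(u), u, 0)\, du \]

We can write (summation over the repeated indices $j$, $k$ is understood)

\[ \frac{d\Psi_p}{d\alpha_q} = \int_0^T \left( \frac{\partial U_{jp}}{\partial \alpha_q}\, g_j + U_{jp} \frac{\partial g_j}{\partial z_k} \frac{\partial z_k}{\partial \alpha_q} \right) ds \]

Note that

\[ \frac{d\bar{\xi}_j}{dt} - \sum_r \frac{\partial f_j}{\partial z_r} \bar{\xi}_r = g_j(z(t, \alpha^0), 0) \]

and we obtain

\[ \frac{d\Psi_p}{d\alpha_q} = \int_0^T \left[ \frac{\partial U_{jp}}{\partial \alpha_q} \left( \frac{d\bar{\xi}_j}{ds} - \sum_r \frac{\partial f_j}{\partial z_r} \bar{\xi}_r \right) + U_{jp} \frac{\partial g_j}{\partial z_k} \frac{\partial z_k}{\partial \alpha_q} \right] ds \]

Integration by parts yields

\[ \left. \frac{d\Psi_p}{d\alpha_q} \right|_{\alpha^0} = - \int_0^T \left[ \frac{d}{ds} \left( \frac{\partial U_{jp}}{\partial \alpha_q} \right) \bar{\xi}_j + \frac{\partial U_{jp}}{\partial \alpha_q} \sum_r \frac{\partial f_j}{\partial z_r} \bar{\xi}_r \right] ds + \int_0^T U_{jp} \frac{\partial g_j}{\partial z_k} \frac{\partial z_k}{\partial \alpha_q}\, ds \]

From the equation

\[ \frac{dU_{jp}}{dt} + \sum_k \frac{\partial f_k}{\partial z_j} U_{kp} = 0 \]

we deduce that

\[ \frac{d}{dt} \left( \frac{\partial U_{jp}}{\partial \alpha_q} \right) = - \sum_k \frac{\partial f_k}{\partial z_j} \frac{\partial U_{kp}}{\partial \alpha_q} - \sum_{k,r} \frac{\partial^2 f_k}{\partial z_j \partial z_r} \frac{\partial z_r}{\partial \alpha_q} U_{kp} \]

and thus this shows that

\[ \left. \frac{d\Psi_p}{d\alpha_q} \right|_{\alpha^0} = \sum_{j,k,l} \int_0^T U_{jp} \left[ \frac{\partial^2 f_j}{\partial z_k \partial z_l} \bar{\xi}_l + \frac{\partial g_j}{\partial z_k} \right] \frac{\partial z_k}{\partial \alpha_q}\, ds \]

This completes the proof of the theorem. In the special case of Hamiltonian systems, in the case of perturbations of an isochronous system, the method explained is equivalent to Moser's averaging theory.

The reader is referred to other articles in this encyclopedia for a discussion of other aspects of synchronization, frequency locking, and phase locking.

See also: Bifurcation Theory; Fractal Dimensions in Dynamics; Integrable Systems: Overview; Isochronous Systems; Leray–Schauder Theory and Mapping Degree; Ljusternik–Schnirelman Theory; Singularity and Bifurcation Theory; Symmetry and Symmetry Breaking in Dynamical Systems; Synchronization of Chaos; Weakly Coupled Oscillators.

Further Reading

Hartman P (1964) Ordinary Differential Equations. New York: Wiley.
Malkin I (1952) Stability Theory of the Motion. Moscow–Leningrad: Izdat. Gos.
Malkin I (1956) Some Problems in the Theory of Nonlinear Oscillations. Gostekhisdat.
Moser J (1970) Regularization of Kepler's problem and the averaging method on a manifold. Communications on Pure and Applied Mathematics 23: 609–636.
Roseau M (1966) Vibrations non linéaires et théorie de la stabilité, Springer Tracts in Natural Philosophy, vol. 8. Berlin: Springer.
Van der Pol B (1926) On relaxation-oscillations. Philosophical Magazine 3(7): 978–992.
Van der Pol B (1931) Oscillations sinusoïdales et de relaxation. L'onde électrique 245–256.
Van der Pol B and Van der Mark J (1927) Frequency demultiplication. Nature 120: 363–364.
Van der Pol B and Van der Mark J (1928) The heart beat considered as a relaxation oscillation, and an electrical model of the heart. Philosophical Magazine 6(7): 763–775.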
Bi-Hamiltonian Methods in Soliton Theory


M Pedroni, Università di Bergamo, Dalmine (BG), Italy

© 2006 Elsevier Ltd. All rights reserved.

Introduction

At the end of the 1960s, the theory of integrable systems received a great boost from the discovery (made by Gardner, Greene, Kruskal, and Miura) of the inverse-scattering method (see Integrable Systems: Overview). It allows one to reduce the solution of the (nonlinear) Korteweg–de Vries equation (henceforth simply the KdV equation)

\[ u_t = \tfrac{1}{4} \left( u_{xxx} - 6 u u_x \right) \tag{1} \]

to the solution of linear equations. After the KdV equation, many other nonlinear partial differential equations solvable by means of the inverse-scattering method were found. A common feature of such equations is the existence of soliton solutions, that is, solutions in the shape of a solitary wave (with additional interaction properties). For this reason they are called soliton equations.
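To make the notion of a soliton solution of [1] concrete, the following sketch checks numerically that a traveling $\mathrm{sech}^2$ profile satisfies the equation in the normalization used here. The specific profile $u(x, t) = -2\kappa^2\,\mathrm{sech}^2(\kappa(x + \kappa^2 t))$ is not given in the article; it is obtained by inserting a traveling-wave ansatz into [1] and should be read as an assumption of this example, as should the function names `soliton` and `kdv_residual` and the finite-difference step.

```python
import math


def soliton(x, t, kappa=0.8):
    """Candidate one-soliton profile u = -2 k^2 sech^2(k (x + k^2 t))."""
    xi = kappa * (x + kappa ** 2 * t)
    return -2.0 * kappa ** 2 / math.cosh(xi) ** 2


def kdv_residual(u, x, t, h=1e-3):
    """Residual u_t - (1/4)(u_xxx - 6 u u_x) of eqn [1], by central differences."""
    u_t = (u(x, t + h) - u(x, t - h)) / (2.0 * h)
    u_x = (u(x + h, t) - u(x - h, t)) / (2.0 * h)
    u_xxx = (u(x + 2 * h, t) - 2.0 * u(x + h, t)
             + 2.0 * u(x - h, t) - u(x - 2 * h, t)) / (2.0 * h ** 3)
    return u_t - 0.25 * (u_xxx - 6.0 * u(x, t) * u_x)


# Sample the residual at a few points in space and time.
points = [(x, t) for x in (-3.0, -1.0, 0.0, 0.5, 2.0) for t in (0.0, 0.7)]
worst = max(abs(kdv_residual(soliton, x, t)) for x, t in points)
print("largest residual on the sample:", worst)
```

The residual is limited by the accuracy of the difference quotients, so it is small rather than exactly zero. Doubling the amplitude of the profile without changing its width or speed gives a function that no longer solves [1], which is a quick way to see that amplitude, width, and velocity of a soliton are tied together.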
It was soon observed that the KdV equation can be seen as an infinite-dimensional Hamiltonian system with an infinite sequence of constants of motion in involution; the corresponding (commuting) vector fields are symmetries for the KdV equation, and form the so-called KdV hierarchy. In particular, Zakharov and Faddeev constructed action-angle variables for the KdV equation. These facts pointed out that the KdV equation is an infinite-dimensional analog of a classical integrable Hamiltonian system (Dubrovin et al. 2001), whose theory was developed during the nineteenth century by Liouville, Jacobi, and many others. Moreover, the infinite-dimensional case suggested methods (such as the existence of a Lax pair) which were applied successfully also to finite-dimensional cases such as the Toda lattices and the Calogero systems. More recently, after the discovery by Witten and Kontsevich of remarkable relations between the KdV hierarchy and matrix models of two-dimensional (2D) quantum gravity, there has been a renewed interest in the study of soliton equations in the community of theoretical physicists. We also mention that the classical versions of the extended $W_n$-algebras of 2D conformal field theory are the (second) Poisson structures of the Gelfand–Dickey hierarchies.

In this article we describe the so-called bi-Hamiltonian formulation of soliton equations. This approach to integrable systems springs from the observation, made by Magri at the end of the 1970s, that the KdV equation can be seen as a Hamiltonian system in two different ways. In the same circle of ideas, there were important works by Adler, Dorfman, Gelfand, Kupershmidt, Wilson, and many others. Thus, the concept of bi-Hamiltonian manifold, which constitutes the geometric setting for the study of bi-Hamiltonian systems, emerged. This notion and its applications to the theory of finite-dimensional integrable systems are discussed in Multi-Hamiltonian Systems.

In the first section of this article, we discuss the Hamiltonian form of soliton equations and, more generally, we present an important class of infinite-dimensional Poisson (also called Hamiltonian) structures, namely those of hydrodynamic type. Then we show how to use the bi-Hamiltonian properties of the KdV equation in order to construct its conserved quantities. We also recall that the KdV equation can be seen as an Euler equation on the dual of the Virasoro algebra. In the third section, we deal with other examples of integrable evolution equations admitting a bi-Hamiltonian representation, that is, the Boussinesq and the Camassa–Holm equations, and we consider the bi-Hamiltonian structures of hydrodynamic type.

Hamiltonian Methods in Soliton Theory

The most famous example of a soliton equation is the KdV equation [1], where $u$ is usually a periodic or rapidly decreasing real function. The choice of the coefficients in the equation has no special meaning, since they can be changed arbitrarily by rescaling $x$, $t$, and $u$. Right after the discovery of the inverse-scattering method for solving the Cauchy problem for the KdV equation, it was realized that this equation can be seen as an infinite-dimensional Hamiltonian system. Indeed, from a geometrical point of view, eqn [1] defines a vector field $X(u) = \tfrac{1}{4}(u_{xxx} - 6 u u_x)$ on $M$, the infinite-dimensional vector space of $C^\infty$ functions from the unit circle $S^1$ to $\mathbb{R}$. (For the sake of simplicity, we consider only the periodic case; the integrals in this article are therefore understood to be taken on $S^1$.) The vector field $X$ associated with the KdV equation is Hamiltonian, that is, it can be factorized as

\[ X(u) = -2 \partial_x \left( -\tfrac{1}{8} \left( u_{xx} - 3u^2 \right) \right) \]

where $dH = -\tfrac{1}{8}(u_{xx} - 3u^2)$ is the differential of the functional

\[ H(u) = \frac{1}{8} \int \left( u^3 + \frac{1}{2} u_x^2 \right) dx \]

that is, the variational derivative $\delta h/\delta u$ of the density $h = \tfrac{1}{8}(u^3 + \tfrac{1}{2} u_x^2)$, and $P = -2\partial_x$ is a Poisson (or Hamiltonian) operator. This means that the corresponding composition law

\[ \{F, G\} = \int dF\, P(dG)\, dx = -2 \int dF\, (dG)_x\, dx \tag{2} \]

between functionals of $u$ has the usual properties of the Poisson bracket, that is, it is $\mathbb{R}$-bilinear and skew-symmetric, and it fulfills the Leibniz rule and the Jacobi identity. In other words, $(M, P)$ is an infinite-dimensional Poisson manifold. Using the Poisson bracket [2], eqn [1] can be written as

\[ u_t = \{u, H\} \tag{3} \]

corresponding to the usual Hamilton equation in $\mathbb{R}^{2n}$

\[ \dot{z}_i = \{z_i, H\}, \qquad i = 1, \dots, 2n \tag{4} \]

up to the replacement of $z$ with $u$, and of the discrete index $i$ with the continuous index $x$. More precisely, in the expression $u_t = \{u, H\}$ the symbol $u$ should be replaced by $u_x$ (in analogy with $z_i$), the functional assigning to the generic function $v \in M$ its value at a fixed point $x$, that is, $u_x : v \mapsto v(x)$. In
these notations, the Poisson bracket [2] takes the form

\[ \{u_x, u_y\} = -2 \delta'(x - y) \]

where the $\delta$-function is as usual defined as

\[ \int f(y)\, \delta(x - y)\, dy = f(x) \]

so that its derivatives are given by

\[ \int f(y)\, \delta^{(k)}(x - y)\, dy = f^{(k)}(x) \]

Another important example is given by the Boussinesq equation

\[ u_{tt} = \tfrac{1}{3} \left( -u_{xxxx} + 4u_x^2 + 4u u_{xx} \right) \tag{5} \]

describing, like KdV, shallow water (soliton) waves in a nonlinear approximation. It can be obtained from the first-order (in time) system

\[ u^1_t = \tfrac{2}{3} u^2 u^2_x + u^1_{xx} - \tfrac{2}{3} u^2_{xxx}, \qquad u^2_t = 2u^1_x - u^2_{xx} \tag{6} \]

by taking the derivative of its second equation with respect to $t$, plugging the first equation into the result, and setting $u = u^2$. The system [6] is Hamiltonian, since it can be written as

\[ u^1_t = \left( \frac{\delta h}{\delta u^2} \right)_x, \qquad u^2_t = \left( \frac{\delta h}{\delta u^1} \right)_x \]

with $h = (u^1)^2 + \tfrac{1}{9}(u^2)^3 - u^1 u^2_x + \tfrac{1}{3}(u^2_x)^2$, and

\[ \begin{pmatrix} 0 & \partial_x \\ \partial_x & 0 \end{pmatrix} \tag{7} \]

is easily seen to be a Poisson operator. Thus, the Poisson manifold associated with the Boussinesq equation is the space of periodic $C^\infty$ functions with values in $\mathbb{R}^2$. More generally, one can consider the space $M_n$ of $C^\infty$ functions from the unit circle $S^1$ to $\mathbb{R}^n$. If $P^{ij}$, for $i, j = 1, \dots, n$, are the entries of a constant skew-symmetric matrix and $u_{i,x}$ assigns to the generic function $v \in M_n$ the value of its $i$th component at a fixed point $x$, then

\[ \{u_{i,x}, u_{j,y}\} = P^{ij} \delta(x - y) \]

defines a Poisson bracket on $M_n$. One can also let the $P^{ij}$ depend on the $u^k$ in such a way that they form the components of a Poisson tensor on $\mathbb{R}^n$. If $H = \int h\, dx$ is a functional on $M_n$ with density $h$, the associated Hamiltonian vector field gives rise to the following system of partial differential equations:

\[ u^i_t = \sum_{j=1}^n P^{ij} \frac{\delta h}{\delta u^j}, \qquad i = 1, \dots, n \]

In particular, if $n = 2N$ and

\[ \left( P^{ij} \right) = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix} \]

then we have the Hamiltonian formulation of the field equations,

\[ q^i_t = \frac{\delta h}{\delta p_i}, \qquad p_{i,t} = -\frac{\delta h}{\delta q^i}, \qquad i = 1, \dots, N \]

Another important example of a Poisson bracket on $M_n$ is given by

\[ \{u_{i,x}, u_{j,y}\} = g^{ij} \delta'(x - y) \tag{8} \]

where $g^{ij}$ are the entries of a constant symmetric matrix. In this case, the Hamiltonian vector field associated with $H = \int h\, dx$ is given by

\[ u^i_t = \sum_{j=1}^n g^{ij} \partial_x \left( \frac{\delta h}{\delta u^j} \right), \qquad i = 1, \dots, n \tag{9} \]

Notice that this vector field is zero if $H = \int u^k\, dx$, with $k = 1, \dots, n$. This amounts to saying that such an $H$ is a Casimir function of the Poisson bracket [8], that is, that $\{H, F\} = 0$ for all functionals $F$. A simple example of this class (with $n = 2$) is given by the Poisson structure of the Boussinesq equation, corresponding to the choice $g^{11} = g^{22} = 0$ and $g^{12} = g^{21} = 1$. Suppose now that the matrix with entries $g^{ij}$ is invertible. Then they can be interpreted as the contravariant components of a flat pseudo-Riemannian metric in $\mathbb{R}^n$. A change of coordinates $(u^1, \dots, u^n) \mapsto (\tilde{u}^1, \dots, \tilde{u}^n)$ in $\mathbb{R}^n$ transforms the Poisson bracket [8] into

\[ \{\tilde{u}_{i,x}, \tilde{u}_{j,y}\} = \tilde{g}^{ij}(\tilde{u})\, \delta'(x - y) + \tilde{\Gamma}^{ij}_k(\tilde{u})\, \tilde{u}^k_x\, \delta(x - y) \tag{10} \]

where $\tilde{g}^{ij}(\tilde{u})$ are the components of the metric in the new coordinates and the $\tilde{\Gamma}^{ij}_k$ are the contravariant Christoffel symbols, related to the usual Christoffel symbols by

\[ \tilde{\Gamma}^{ij}_k = -\tilde{g}^{il}\, \tilde{\Gamma}^j_{lk} \tag{11} \]

Conversely, the expression [10] gives a Poisson bracket if the metric defined by $\tilde{g}^{ij}$ is flat and its Christoffel symbols are related to the $\tilde{\Gamma}^{ij}_k$ by [11]. These are the Poisson structures of hydrodynamic type introduced by Dubrovin and Novikov. We will consider them again later.

Bi-Hamiltonian Formulation of the KdV Equation

The KdV equation [1] has a lot of remarkable properties, such as the Lax representation and the existence of a $\tau$-function. In this section, we recall a geometrical feature of KdV, namely, the fact that it
has a second Hamiltonian structure, and we show that the integrability of KdV can be seen as a natural consequence of its double Hamiltonian representation. We have already seen that the KdV vector field $X(u) = \tfrac{1}{4}(u_{xxx} - 6 u u_x)$ can be written as

\[ X(u) = P_0\, dH_2 \]

where $P_0 = -2\partial_x$ and

\[ H_2 = \frac{1}{8} \int \left( u^3 + \frac{1}{2} u_x^2 \right) dx \]

But $X$ admits another Hamiltonian representation:

\[ X(u) = P_1\, dH_1 \]

where $P_1 = -\tfrac{1}{2} \partial_{xxx} + 2u \partial_x + u_x$ and

\[ H_1 = -\frac{1}{4} \int u^2\, dx \]

The important point is that $P_1$ is also a Poisson operator. Moreover, it is compatible with $P_0$, that is, any linear combination of $P_0$ and $P_1$ is still a Poisson operator. Thus, the KdV equation is a bi-Hamiltonian system, that is, it can be seen in two different (but compatible) ways as a Hamiltonian system. Next, we will show how this property can be used to construct an infinite sequence of conserved quantities for the KdV equation, which are in involution with respect to the Poisson brackets $\{\,,\}_0$ and $\{\,,\}_1$ associated with $P_0$ and $P_1$. In particular, the phase space $M$ of KdV is a bi-Hamiltonian manifold, that is, it has two different (but compatible) Poisson structures. Let us rename $X_1 = X$ the KdV vector field. Since $X = P_0\, dH_2 = P_1\, dH_1$, one is naturally led to consider the vector fields

\[ X_0 = P_0\, dH_1, \qquad X_2 = P_1\, dH_2 \]

Explicitly, $X_0(u) = u_x$ and $X_2(u) = \tfrac{1}{16}(u_{xxxxx} - 10 u u_{xxx} - 20 u_x u_{xx} + 30 u^2 u_x)$. One can check that these vector fields are also bi-Hamiltonian. Indeed, $X_0(u) = P_1\, dH_0$, with $H_0 = \int u\, dx$, and $X_2 = P_0\, dH_3$ with

\[ H_3 = -\frac{1}{64} \int \left( u_{xx}^2 + 10 u u_x^2 + 5 u^4 \right) dx \]

The functional $H_0$ is a Casimir of $P_0$, that is, $P_0\, dH_0 = 0$, so that the iteration ends on this side, but it can be continued indefinitely from the other side, as shown below. For the time being, let us take for granted that there exists an infinite sequence $\{H_k\}_{k \ge 0}$ of functionals such that $P_1\, dH_k = P_0\, dH_{k+1}$; in other words,

\[ \{\cdot, H_k\}_1 = \{\cdot, H_{k+1}\}_0 \tag{12} \]

Such relations are often called Lenard–Magri relations. Then the functionals $H_k$ are in involution with respect to both Poisson brackets. Indeed, for $k > j$, one has

\[ \{H_j, H_k\}_0 = \{H_j, H_{k-1}\}_1 = \{H_{j+1}, H_{k-1}\}_0 = \cdots = \{H_k, H_j\}_0 \]

so that $\{H_j, H_k\}_0 = 0$ for all $j, k \ge 0$, and therefore $\{H_j, H_k\}_1 = 0$ for all $j, k \ge 0$. Hence, these functionals are constants of motion (in involution) for the KdV equation. The Hamiltonian vector fields associated with them are symmetries for the KdV equation; the corresponding evolution equations are called higher-order KdV equations. The set of such equations is the well-known KdV hierarchy. We remark that the existence of a sequence of functionals $\{H_k\}_{k \ge 0}$, fulfilling the Lenard–Magri relations [12] and starting from a Casimir of $P_0$, is equivalent to the existence of a Casimir function $H(\lambda) = \sum_{k \ge 0} H_k \lambda^{-k}$ for the Poisson pencil $P_\lambda = P_1 - \lambda P_0$, where $\lambda$ is a real parameter. A straightforward way (due essentially to Miura, Gardner, and Kruskal) to determine such a Casimir function is to consider the (generalized) Miura map $h \mapsto u = h_x + h^2 - \lambda$. As shown by Kupershmidt and Wilson, it transforms the Poisson structure $\tfrac{1}{2} \partial_x$ (in the variable $h$) into the Poisson pencil $P_\lambda = -\tfrac{1}{2} \partial_{xxx} + 2(u + \lambda) \partial_x + u_x$. Given $u$, the Riccati equation

\[ h_x + h^2 = u + \lambda \tag{13} \]

admits a unique solution with the asymptotic expansion $h = z + \sum_{k \ge 1} h_k z^{-k}$, where $z^2 = \lambda$. Moreover, the coefficients $h_k$ are differential polynomials in $u$ (i.e., polynomials in $u$ and its $x$-derivatives) that can be computed by recurrence. Thus, the generalized Miura map can be seen as an invertible transformation. Since the functional $h \mapsto \int h\, dx$ is a Casimir of the Poisson structure $\tfrac{1}{2} \partial_x$, it follows that if $h(u)$ is the solution of the Riccati equation [13], then $u \mapsto \int h(u)\, dx$ is a Casimir of the Poisson pencil $P_\lambda$. More precisely, one has to introduce the functional $H(\lambda) = z \int h(u)\, dx$, which turns out to be a Laurent series in $\lambda$, because the even coefficients of $h(u)$ are $x$-derivatives. This is the Casimir function we were looking for. Explicitly, one finds that the first terms of $h(u)$ are

\[ h_1 = \tfrac{1}{2} u, \qquad h_2 = -\tfrac{1}{4} u_x, \qquad h_3 = \tfrac{1}{8} \left( u_{xx} - u^2 \right) \]

\[ h_4 = -\tfrac{1}{16} \left( u_{xxx} - 4 u u_x \right) \]

\[ h_5 = \tfrac{1}{32} \left( u_{xxxx} - 6 u u_{xx} - 5 u_x^2 + 2 u^3 \right) \]

Obviously, $h_1$ is the density of a Casimir function of $P_0$, while $h_3$ and $h_5$ are (one-half of) the densities of the
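two Hamiltonians $H_1$ and $H_2$ of the KdV equation. We conclude this section showing that, as observed by Khesin and Ovsienko (Arnold and Khesin 1998), the bi-Hamiltonian structures of KdV have a clear Lie-algebraic origin. Indeed, the second Hamiltonian structure is the Lie–Poisson structure on the dual of the Virasoro algebra, while the first one can be obtained by freezing the second one at a suitable point. Let $\mathcal{X}(S^1)$ be the Lie algebra of vector fields on $S^1$. The Virasoro algebra is the vector space $\mathfrak{g} = \mathcal{X}(S^1) \oplus \mathbb{R}$ endowed with the Lie-algebra structure

\[ \left[ \left( f(x) \frac{\partial}{\partial x}, a \right), \left( g(x) \frac{\partial}{\partial x}, b \right) \right] = \left( \left( f'(x) g(x) - g'(x) f(x) \right) \frac{\partial}{\partial x}, \int f'(x) g''(x)\, dx \right) \tag{14} \]

It is called a central extension of $\mathcal{X}(S^1)$ since it is obtained by considering the usual commutator between vector fields (up to a sign) and by adding a copy of $\mathbb{R}$, which turns out to be the center of the Virasoro algebra. Equation [14] gives rise indeed to a Lie-algebra structure because the expression $\int f'(x) g''(x)\, dx$ defines a 2-cocycle of $\mathcal{X}(S^1)$. The dual space $\mathfrak{g}^*$ of $\mathfrak{g}$ can be considered as the space of the pairs $(u\, dx \otimes dx, c)$, where $u \in C^\infty(S^1)$ and $c \in \mathbb{R}$. The pairing is obviously given by

\[ \left\langle (u\, dx \otimes dx, c), \left( f \frac{\partial}{\partial x}, a \right) \right\rangle = \int u(x) f(x)\, dx + ac \]

The Lie–Poisson structure on the dual $\mathfrak{g}^*$ of a Lie algebra $\mathfrak{g}$ is defined as

\[ \{F, G\}(X) = \langle X, [dF(X), dG(X)] \rangle \tag{15} \]

where $F, G \in C^\infty(\mathfrak{g}^*)$ and their differentials at $X \in \mathfrak{g}^*$ are seen as elements of $\mathfrak{g}$. When $\mathfrak{g}$ is the Virasoro algebra and $F(u, c) = \int f(u, c)\, dx$, $G(u, c) = \int g(u, c)\, dx$ are two functionals on $\mathfrak{g}^*$ whose densities $f$ and $g$ are differential polynomials in $u$, one has

\[ \{F, G\}(u, c) = \left\langle (u\, dx \otimes dx, c), \left( \left( \left( \frac{\delta f}{\delta u} \right)' \frac{\delta g}{\delta u} - \left( \frac{\delta g}{\delta u} \right)' \frac{\delta f}{\delta u} \right) \frac{\partial}{\partial x}, \int \left( \frac{\delta f}{\delta u} \right)' \left( \frac{\delta g}{\delta u} \right)'' dx \right) \right\rangle \]

\[ = \int u \left( \left( \frac{\delta f}{\delta u} \right)' \frac{\delta g}{\delta u} - \left( \frac{\delta g}{\delta u} \right)' \frac{\delta f}{\delta u} \right) dx + c \int \left( \frac{\delta f}{\delta u} \right)' \left( \frac{\delta g}{\delta u} \right)'' dx \tag{16} \]

This is (up to rescaling) the second Poisson bracket of KdV. The KdV equation is therefore an Euler equation, that is, it can be obtained from the Euler equations for the rigid body by replacing the Lie algebra of the rotation group with the Virasoro algebra. To be more precise, the Hamiltonian vector field associated with $H_1(u, c) = \tfrac{1}{2} \left( \int u^2\, dx + c \right)$ is

\[ u_t + 3 u u_x + c\, u_{xxx} = 0, \qquad c_t = 0 \]

If $c \ne 0$, this is (up to rescaling) the KdV equation [1]. For $c = 0$, we have the Burgers equation (also called dispersionless KdV equation), to be discussed again later on. The first Poisson bracket for the KdV hierarchy can be obtained by freezing the Lie–Poisson bracket at the point $(\tfrac{1}{2}\, dx \otimes dx, 0)$ of the dual of the Virasoro algebra. This means that instead of [16] one has to consider

\[ \{F, G\}_0(u, c) = \left\langle \left( \tfrac{1}{2}\, dx \otimes dx, 0 \right), \left( \left( \left( \frac{\delta f}{\delta u} \right)' \frac{\delta g}{\delta u} - \left( \frac{\delta g}{\delta u} \right)' \frac{\delta f}{\delta u} \right) \frac{\partial}{\partial x}, \int \left( \frac{\delta f}{\delta u} \right)' \left( \frac{\delta g}{\delta u} \right)'' dx \right) \right\rangle \]

\[ = \frac{1}{2} \int \left( \left( \frac{\delta f}{\delta u} \right)' \frac{\delta g}{\delta u} - \left( \frac{\delta g}{\delta u} \right)' \frac{\delta f}{\delta u} \right) dx \tag{17} \]

The corresponding Hamiltonian is $H_2 = \tfrac{1}{2} \int (u^3 - c\, u_x^2)\, dx$. From this (Lie algebraic) point of view, the compatibility between the two Poisson brackets follows from the fact that the pencil $\{\,,\}_\lambda = \{\,,\} - \lambda \{\,,\}_0$ is obtained from the Lie–Poisson bracket $\{\,,\}$ by applying the translation

\[ (u\, dx \otimes dx, c) \mapsto \left( \left( u - \frac{\lambda}{2} \right) dx \otimes dx, c \right) \]

Other Examples

In the previous section, we have presented the bi-Hamiltonian structure of the KdV equation and some of its properties. Now we give two more examples of equations, the Boussinesq equation and the Camassa–Holm equation, admitting a bi-Hamiltonian formulation. We have seen in an earlier section that the system [6] associated with the Boussinesq equation [5] is Hamiltonian with respect to the Poisson structure [7] and the Hamiltonian

\[ H_1(u^1, u^2) = \int \left( (u^1)^2 + \tfrac{1}{9} (u^2)^3 - u^1 u^2_x + \tfrac{1}{3} (u^2_x)^2 \right) dx \]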
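The Hamiltonian form of the Boussinesq system recalled above can be checked mechanically. The sketch below is only an illustration and its helper names (`var`, `mul`, `add`, `dx`, `partial`, `variational`) are hypothetical: it represents differential polynomials in $u^1$, $u^2$ as dictionaries with exact rational coefficients, computes the variational derivatives of the density $h = (u^1)^2 + \tfrac19 (u^2)^3 - u^1 u^2_x + \tfrac13 (u^2_x)^2$, applies the Poisson operator [7], and then eliminates $u^1$ to recover the scalar equation for $u = u^2$.

```python
from fractions import Fraction

# A differential polynomial is a dict: monomial -> coefficient, where a
# monomial is a sorted tuple of (field name, x-derivative order) factors.


def clean(p):
    return {m: c for m, c in p.items() if c != 0}


def const(c):
    return {(): Fraction(c)}


def var(name, order=0):
    return {((name, order),): Fraction(1)}


def add(p, q, scale=1):
    r = dict(p)
    for m, c in q.items():
        r[m] = r.get(m, 0) + scale * c
    return clean(r)


def mul(p, q):
    r = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            m = tuple(sorted(m1 + m2))
            r[m] = r.get(m, 0) + c1 * c2
    return clean(r)


def dx(p):
    """Total x-derivative (Leibniz rule over the factors of each monomial)."""
    r = {}
    for m, c in p.items():
        for i, (name, k) in enumerate(m):
            m2 = tuple(sorted(m[:i] + ((name, k + 1),) + m[i + 1:]))
            r[m2] = r.get(m2, 0) + c
    return clean(r)


def partial(p, sym):
    """Partial derivative with respect to the jet variable sym = (name, order)."""
    r = {}
    for m, c in p.items():
        n = m.count(sym)
        if n:
            rest = list(m)
            rest.remove(sym)
            r[tuple(rest)] = r.get(tuple(rest), 0) + c * n
    return clean(r)


def variational(h, name, max_order=4):
    """Variational derivative: sum_k (-1)^k d^k/dx^k (dh / d name_k)."""
    total = {}
    for k in range(max_order + 1):
        term = partial(h, (name, k))
        for _ in range(k):
            term = dx(term)
        total = add(total, term, (-1) ** k)
    return total


u1, u2, u2x = var("u1"), var("u2"), var("u2", 1)
h = add(add(mul(u1, u1), mul(const(Fraction(1, 9)), mul(u2, mul(u2, u2)))),
        add(mul(const(-1), mul(u1, u2x)), mul(const(Fraction(1, 3)), mul(u2x, u2x))))

# Poisson operator [7]: u1_t = d/dx (dh/du2), u2_t = d/dx (dh/du1).
u1_t = dx(variational(h, "u2"))
u2_t = dx(variational(h, "u1"))

# Eliminate u1: u2_tt = 2 d/dx(u1_t) - d^2/dx^2(u2_t).
u_tt = add(mul(const(2), dx(u1_t)), dx(dx(u2_t)), -1)
print(u1_t)
print(u2_t)
print(u_tt)
```

Exact rational arithmetic is used so that the cancellation of the $u^1$ terms in `u_tt` is exact rather than approximate; the same small toolkit could be reused to test other relations between differential polynomials in this article.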
A more complicated Poisson structure for this system is

\[
P = \begin{pmatrix}
A & 3\partial_x^4 + 3u_2\partial_x^2 + 9u_1\partial_x + 3u_{1x} \\
B & 6\partial_x^3 + 6u_2\partial_x + 3u_{2x}
\end{pmatrix} \tag{18}
\]

with

\[
A = -2\partial_x^5 - 4u_2\partial_x^3 - 6u_{2x}\partial_x^2 + \left(-2u_2^2 + 6u_{1x} - 6u_{2xx}\right)\partial_x + 3u_{1xx} - 2u_{2xxx} - 2u_2u_{2x}
\]

and

\[
B = -3\partial_x^4 - 3u_2\partial_x^2 + \left(9u_1 - 6u_{2x}\right)\partial_x + 6u_{1x} - 3u_{2xx}
\]

It can be obtained by means of the Drinfeld–Sokolov reduction (or also by means of a bi-Hamiltonian reduction) from the Lie–Poisson structure (modified with the cocycle $\partial_x$) on the space of $C^\infty$ maps from $S^1$ to the Lie algebra of $3 \times 3$ traceless matrices. This is the reason why it is a Poisson structure, compatible with [7]. The system [6] can be written as

\[
\begin{pmatrix} u_{1t} \\ u_{2t} \end{pmatrix} = P \begin{pmatrix} \delta h_2/\delta u_1 \\ \delta h_2/\delta u_2 \end{pmatrix}
\]

where $h_2 = (1/3)u_1$ is the density of a Casimir of the Poisson structure [7]. Thus, the Boussinesq equation is a bi-Hamiltonian system and can be shown to possess, like KdV, an infinite sequence of conserved quantities and symmetries, forming the Boussinesq hierarchy. The KdV and the Boussinesq hierarchies are indeed particular examples of Gelfand–Dickey hierarchies (Dickey 2003). They are hierarchies of systems of $n$ equations with $n$ unknown functions and they are related, via the Drinfeld–Sokolov approach, to the Lie algebra $\mathfrak{sl}(n+1)$. As shown by Adler, Dickey, and Gelfand, these hierarchies have a bi-Hamiltonian formulation. Also the generalized KdV equations, associated by Drinfeld and Sokolov with an arbitrary affine Kac–Moody Lie algebra, are bi-Hamiltonian (or are obtained as suitable reductions of bi-Hamiltonian systems).

Let us consider now the (dispersionless) Camassa–Holm equation

\[
u_t - u_{txx} = -3uu_x + 2u_xu_{xx} + uu_{xxx} \tag{19}
\]

which also describes shallow water waves, and possesses remarkable solutions called peakons, since they represent traveling waves with discontinuous first derivative. In order to supply this equation with a (bi-)Hamiltonian structure, one has to perform the change of variable $m = u - u_{xx}$, whose inverse, in the space of period-1 functions, turns out to be given by

\[
u(x) = \int_0^x m(y)\sinh(y - x)\,dy + \frac{1}{2\sinh(1/2)}\int_0^1 m(y)\cosh\left(y - x - \frac{1}{2}\right)dy
\]

The Camassa–Holm equation is then bi-Hamiltonian with respect to the Poisson pair

\[
P_1 = \partial_{xxx} - \partial_x, \qquad P_2 = 2m\partial_x + m_x
\]

Indeed, it can be written as $m_t = P_1\,dH_2 = P_2\,dH_1$, where

\[
H_1 = -\frac{1}{2}\int \left(u^2 + u_x^2\right)dx, \qquad H_2 = \frac{1}{2}\int \left(u^3 + uu_x^2\right)dx
\]

Notice that the Poisson pair of the Camassa–Holm equation can be obtained from that of KdV by moving the cocycle $\partial_{xxx}$ from the second Poisson structure to the first one. Indeed,

\[
P_{(a,b,c)} = a\,\partial_{xxx} + b\,\partial_x + c\left(2m\partial_x + m_x\right), \qquad a, b, c \in \mathbb{R} \tag{20}
\]

is a family of pairwise compatible Poisson operators. Moreover, we mention that Misiołek has shown that the Camassa–Holm equation, too, is an Euler equation on the dual of the Virasoro algebra.

We conclude this article with a brief discussion concerning the so-called bi-Hamiltonian structures of hydrodynamic type. They play a relevant role in the theory of Frobenius manifolds, which, in turn, have deep relations with many important topics in contemporary mathematics and physics, such as Gromov–Witten invariants and isomonodromic deformations. As we have seen in the earlier section, a Poisson structure of hydrodynamic type is given, on the space of $C^\infty$ maps from $S^1$ to (an open set of) $\mathbb{R}^n$, by

\[
\{u^i(x), u^j(y)\} = g^{ij}(u)\,\delta'(x - y) + \Gamma^{ij}_k(u)\,u^k_x\,\delta(x - y) \tag{21}
\]

where $g^{ij}(u)$ are the contravariant components of a (pseudo-)Riemannian flat metric and the $\Gamma^{ij}_k$ are the (contravariant) Christoffel symbols of the metric. If two Poisson structures of hydrodynamic type are given, it can be shown that they are compatible if and only if the two corresponding metrics form a flat pencil. This means that their linear combinations (with constant coefficients) are still flat (pseudo-)Riemannian metrics, and that the contravariant Christoffel symbols of the linear combinations are the linear combinations of the contravariant Christoffel symbols of the
two metrics. The simplest example is given by the bi-Hamiltonian formulation of the Burgers (or dispersionless KdV) equation,

\[
u_t - 3uu_x = 0
\]

that we have already encountered. We know that this equation is Hamiltonian with respect to the (Lie–)Poisson operator $2u\partial_x + u_x$, with Hamiltonian function $H_1 = (1/2)\int u^2\,dx$, and with respect to the Poisson operator $\partial_x$, with Hamiltonian function $H_2 = (1/2)\int u^3\,dx$. This also means that the bi-Hamiltonian structure of the Burgers equation comes from the family [20]. The first Hamiltonian structure corresponds to the standard metric on $\mathbb{R}$, that is, $du \otimes du$, whereas the second one is given by the metric $(2u)^{-1}\,du \otimes du$.

See also: Classical r-Matrices, Lie Bialgebras, and Poisson Lie Groups; Hamiltonian Fluid Dynamics; Infinite-Dimensional Hamiltonian Systems; Integrable Systems and Recursion Operators on Symplectic and Jacobi Manifolds; Integrable Systems: Overview; Korteweg–de Vries Equation and Other Modulation Equations; Multi-Hamiltonian Systems; Recursion Operators in Classical Mechanics; Solitons and Kac–Moody Lie Algebras; Toda Lattices; WDVV Equations and Frobenius Manifolds.

Further Reading

Arnold VI and Khesin BA (1998) Topological Methods in Hydrodynamics. New York: Springer.
Błaszak M (1998) Multi-Hamiltonian Theory of Dynamical Systems. Berlin: Springer.
Dickey LA (2003) Soliton Equations and Hamiltonian Systems, 2nd edn. River Edge: World Scientific.
Dorfman I (1993) Dirac Structures and Integrability of Nonlinear Evolution Equations. Chichester: Wiley.
Drinfeld VG and Sokolov VV (1985) Lie algebras and equations of Korteweg–de Vries type. Journal of Soviet Mathematics 30: 1975–2036.
Dubrovin BA (1996) Geometry of 2D topological field theories. In: Donagi R et al. (ed.) Integrable Systems and Quantum Groups (Montecatini Terme, 1993), Lecture Notes in Mathematics, vol. 1620, pp. 120–348. Berlin: Springer.
Dubrovin BA, Krichever IM, and Novikov SP (2001) Integrable systems. I. In: Arnold VI (ed.) Encyclopaedia of Mathematical Sciences. Dynamical Systems IV, pp. 177–332. Berlin: Springer.
Faddeev LD and Takhtajan LA (1987) Hamiltonian Methods in the Theory of Solitons. Berlin: Springer.
Magri F, Falqui G, and Pedroni M (2003) The method of Poisson pairs in the theory of nonlinear PDEs. In: Conte R et al. (ed.) Direct and Inverse Methods in Nonlinear Evolution Equations, Lecture Notes in Physics, vol. 632, pp. 85–136. Berlin: Springer.
Marsden JE and Ratiu TS (1999) Introduction to Mechanics and Symmetry, 2nd edn. New York: Springer.
Olver PJ (1993) Applications of Lie Groups to Differential Equations, 2nd edn. New York: Springer.
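The inversion of the change of variable $m = u - u_{xx}$ on period-1 functions, used in the article to write the Camassa–Holm equation in bi-Hamiltonian form, can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes the kernel formula in the form $u(x) = \int_0^x m(y)\sinh(y-x)\,dy + \frac{1}{2\sinh(1/2)}\int_0^1 m(y)\cosh(y-x-\tfrac12)\,dy$, uses a plain Simpson rule, and tests it on $u = \cos 2\pi x$, for which $m = (1 + 4\pi^2)\cos 2\pi x$. The function names `simpson`, `invert_helmholtz_periodic`, and `m_test` are hypothetical and do not come from the article.

```python
import math

def simpson(f, a, b, n=400):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    if a == b:
        return 0.0
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def invert_helmholtz_periodic(m, x):
    """Recover u(x), 0 <= x <= 1, from m = u - u_xx for period-1 functions."""
    # Local term: integral of m(y) sinh(y - x) over [0, x].
    local = simpson(lambda y: m(y) * math.sinh(y - x), 0.0, x)
    # Global term: integral of m(y) cosh(y - x - 1/2) over one period.
    glob = simpson(lambda y: m(y) * math.cosh(y - x - 0.5), 0.0, 1.0)
    return local + glob / (2 * math.sinh(0.5))

def m_test(y):
    """m = u - u_xx for the test function u = cos(2 pi y)."""
    return (1 + 4 * math.pi ** 2) * math.cos(2 * math.pi * y)

# Compare the reconstructed u with the exact cos(2 pi x) at a few sample points.
sample_points = [0.0, 0.13, 0.5, 0.77, 1.0]
max_err = max(
    abs(invert_helmholtz_periodic(m_test, x) - math.cos(2 * math.pi * x))
    for x in sample_points
)
print("max error:", max_err)
```

The same routine applied to the constant function $m = 1$ should return $u = 1$, which is a quick way to confirm the normalisation factor $1/(2\sinh(1/2))$.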

Billiards in Bounded Convex Domains

S Tabachnikov, Pennsylvania State University, University Park, PA, USA
© 2006 Elsevier Ltd. All rights reserved.

Billiard Flow and Billiard Ball Map

The billiard system describes the motion of a free particle inside a domain with elastic reflection off the boundary. More precisely, a billiard table is a Riemannian manifold M with a piecewise smooth boundary, for example, a domain in the plane. The point moves along a geodesic line with a constant speed until it hits the boundary. At a smooth boundary point, the billiard ball reflects so that the tangential component of its velocity remains the same, while the normal component changes its sign. This means that both energy and momentum are conserved. In dimension 2, this collision is described by a well-known law of geometrical optics: the angle of incidence equals the angle of reflection. Thus, the theory of billiards has much in common with geometrical optics. If the billiard ball hits a corner, its further motion is not defined.

The billiard reflection law satisfies a variational principle. Let A and B be fixed points in the billiard table and let AXB be a billiard trajectory from A to B with reflection at a boundary point X. Then, the position of a variable point X extremizes the length AXB. This is the Fermat principle of geometrical optics.

In this article, we discuss billiards in bounded convex domains with smooth boundary, also called Birkhoff billiards. A related article treats billiards in polygons (see Polygonal Billiards).

The billiard flow is defined as a continuous-time dynamical system. The time-t billiard transformation acts on unit tangent vectors to M, which constitute the phase space of the billiard flow, and the manifold M is its configuration space. Thus, the billiard flow is the geodesic flow on a manifold with boundary.

It is useful to reduce the dimensions by one and to replace continuous time by discrete time, that is, to replace the billiard flow by a mapping, called the billiard ball map and denoted by T. The phase space of the billiard ball map consists of unit tangent vectors $(x, v)$ with the foot point $x$ on the boundary of M and the inward direction $v$. A vector $(x, v)$ moves along the geodesic through $x$ in the direction of $v$ to the next point of its intersection $x_1$ with the boundary $\partial M$, and then $v$ reflects in $\partial M$ to the new
inward vector $v_1$. Then, one has: $T(x, v) = (x_1, v_1)$. For a convex M, the map T is continuous. If M is n-dimensional, then the dimension of the phase space of the billiard ball map is $2n - 2$.

Figure 1 Billiard ball map.

Equivalently, and more in the spirit of geometrical optics, one considers $L$, the space of oriented geodesics (rays of light) that intersect the billiard table. This space of lines is in one-to-one correspondence with the phase space of the billiard ball map: to an inward unit vector $(x, v)$ there corresponds the oriented line through $x$ in the direction $v$ (Figure 1).

The space of rays $L$ carries a canonical symplectic structure, that is, a closed nondegenerate differential 2-form. In the Euclidean case, this symplectic structure $\omega$ is defined as follows. Given an oriented line $\ell$ in $\mathbb{R}^n$, let $q$ be the unit vector along $\ell$ and $p$ be the vector obtained by dropping the perpendicular from the origin to $\ell$. Then, $\omega = dp \wedge dq = \sum dp_i \wedge dq_i$. This construction identifies $L$ with the cotangent bundle of the unit sphere: $q$ is a unit vector and $p$ is a (co)tangent vector at $q$, and $\omega$ identifies with the canonical symplectic structure of $T^*S^{n-1}$. In the general case of a Riemannian manifold M, the symplectic structure on the space of oriented geodesics is obtained from that on $T^*M$ by symplectic reduction.

One has an important result: the billiard ball map preserves the symplectic structure, $T^*(\omega) = \omega$. As a consequence, T is also measure preserving. In the planar case, one has the following explicit formula for this measure. Let $t$ be an arc length parameter along the boundary of the billiard table and let $\alpha \in [0, \pi]$ be the angle made by the unit vector with this boundary. Then, $(\alpha, t)$ are coordinates in the phase space, identified with the cylinder, and the invariant measure is $\sin\alpha\,d\alpha\,dt$.

As a consequence, the total area of the phase space equals $2L$, where $L$ is the perimeter length of the boundary of the billiard table, and the mean free path equals $\pi A/L$, where $A$ is the area of the billiard table. In the general n-dimensional case, the mean free path equals

\[
\frac{\mathrm{vol}(S^{n-1})\,\mathrm{vol}(M)}{\mathrm{vol}(B^{n-1})\,\mathrm{vol}(\partial M)}
\]

where $S^{n-1}$ and $B^{n-1}$ are the unit sphere and the unit disk in Euclidean spaces.

Existence and Nonexistence of Caustics

Given a plane billiard table, a caustic is a curve inside the table such that if a segment of a billiard trajectory is tangent to this curve then so is each reflected segment. Caustics correspond to invariant circles of the billiard ball map (i.e., invariant curves that go around the phase cylinder): such an invariant circle is a one-parameter family of oriented lines, and the respective caustic is their envelope. An envelope may have cusp-like singularities, but if the boundary of the billiard table is a smooth curve with positive curvature then a caustic, sufficiently close to the boundary, is smooth and convex.

One can recover the table from a caustic by the following string construction. Let $\gamma$ be a caustic. Wrap a closed nonstretchable string around $\gamma$, pull it tight at a point and move this point around $\gamma$ to obtain a new curve $\Gamma$. Then, $\gamma$ is a caustic for the billiard inside $\Gamma$. Note that this construction has one parameter, the length of the string.

The following useful mirror equation relates various quantities depicted in Figure 2:

\[
\frac{1}{a} + \frac{1}{b} = \frac{2k}{\sin\alpha}
\]

where $k$ is the curvature of the boundary at the impact point.

Figure 2 String construction and mirror equation.

Do caustics exist for every convex billiard table? This is important to know, in particular, because the existence of a caustic implies that the billiard ball map is not ergodic. The answer is given by a theorem of Lazutkin: if the boundary of the billiard table is sufficiently smooth and its curvature never vanishes, then there exists a collection of smooth caustics in the vicinity of the billiard curve whose union has a positive area. Originally this theorem asked for 553 continuous derivatives; later this was reduced to six. This result uses the techniques of the KAM (Kolmogorov–Arnold–Moser) theory. The
crucial fact is that, in appropriate coordinates, the billiard ball map is approximated, near the boundary of the phase cylinder, by the integrable map $(x, y) \mapsto (x + y, y)$.

On the other hand, by a theorem of Mather, if the curvature of a convex smooth billiard curve vanishes at some point, then this billiard ball map has no invariant circles. This result belongs to the well-developed theory of area-preserving twist maps of the cylinder, of which the billiard ball map is an example.

Integrable Billiards

Let a plane billiard table be an ellipse with foci $F_1$ and $F_2$. It has been known since antiquity that a billiard ball shot from $F_1$ reflects to $F_2$. A generalization of this optical property of the ellipse is the following theorem: a billiard trajectory inside an ellipse forever remains tangent to a fixed confocal conic. More precisely, if a segment of a billiard trajectory does not intersect the segment $F_1F_2$, then all the segments of this trajectory do not intersect $F_1F_2$ and are all tangent to the same ellipse with foci $F_1$ and $F_2$; and if a segment of a trajectory intersects $F_1F_2$, then all the segments of this trajectory intersect $F_1F_2$ and are all tangent to the same hyperbola with foci $F_1$ and $F_2$.

It follows that confocal ellipses are the caustics of the billiard inside an ellipse. In particular, a neighborhood of the boundary of such a billiard table is foliated by caustics. A long-standing conjecture, attributed to Birkhoff, asserts that if a neighborhood of a strictly convex smooth boundary of a billiard table is foliated by caustics, then this table is an ellipse. This conjecture remains open. The best result in this direction is a theorem of Bialy: if almost every phase point of the billiard ball map in a strictly convex billiard table belongs to an invariant circle, then the billiard table is a disk.

The multidimensional analogs of the optical properties of an ellipse are as follows. Consider an ellipsoid M in $\mathbb{R}^n$ given by the equation

\[
\frac{x_1^2}{a_1^2} + \frac{x_2^2}{a_2^2} + \cdots + \frac{x_n^2}{a_n^2} = 1 \tag{1}
\]

and define the confocal family of quadrics $M_\lambda$ by the equation

\[
\frac{x_1^2}{a_1^2 - \lambda} + \frac{x_2^2}{a_2^2 - \lambda} + \cdots + \frac{x_n^2}{a_n^2 - \lambda} = 1
\]

where $\lambda$ is a real parameter. The topological type of $M_\lambda$ changes as $\lambda$ passes the values $a_i^2$.

One has the following theorem: a billiard trajectory inside M remains tangent to $(n - 1)$ fixed confocal quadrics. A similar and closely related result holds for the geodesic curves on M: the tangent lines to a fixed geodesic on M are tangent to $(n - 2)$ other fixed quadrics, confocal with M. For a triaxial ellipsoid, this theorem goes back to Jacobi.

Explicit formulas for the integrals of the billiard in an n-dimensional ellipsoid [1] are as follows. Let $(x, v)$ be a phase point, a unit inward tangent vector whose foot point $x$ lies on the boundary. The following functions are invariant under the billiard ball map:

\[
F_i(x, v) = v_i^2 + \sum_{j \neq i} \frac{(v_i x_j - v_j x_i)^2}{a_i^2 - a_j^2}, \qquad i = 1, \ldots, n
\]

These functions are not independent: $F_1 + \cdots + F_n = 1$. In fact, the integrals $F_i$ Poisson-commute (with respect to the Poisson bracket associated with the symplectic structure in the phase space of the billiard ball map that was described above). According to the Arnold–Liouville theorem, this complete integrability of the billiard inside an ellipsoid implies that the phase space is foliated by invariant tori and, in appropriate coordinates, the map on each torus is a parallel translation.

Similar results on complete integrability hold for billiards inside quadrics in spaces of constant positive or negative curvature. The former is the intersection of a quadratic cone with the unit sphere, and the latter with the unit pseudosphere.

Periodic Orbits

Periodic billiard trajectories inside a planar billiard table correspond to inscribed polygons of extremal perimeter length. When counting periodic trajectories, one does not distinguish between polygons obtained from each other by cyclic permutation or reversing the order of the vertices. In other words, one counts the orbits of the dihedral group $D_n$ acting on n-periodic billiard polygons.

An additional topological characteristic of a periodic billiard trajectory is the rotation number defined as follows. Assume that the boundary $\Gamma$ of a billiard table is parametrized by the unit circle and consider a polygon $(x_1, x_2, \ldots, x_n)$ inscribed in $\Gamma$. For all $i$, one has $x_{i+1} = x_i + t_i$ with $t_i \in (0, 1)$. Since the polygon is closed, $t_1 + \cdots + t_n \in \mathbb{Z}$. This integer, which takes values from 1 to $n - 1$, is called the rotation number of the polygon and denoted by $\rho$. Changing the orientation of a polygon replaces the
rotation number $\rho$ by $n - \rho$. The leftmost 5-periodic trajectory in Figure 3 has $\rho = 1$ and the other three $\rho = 2$.

Figure 3 Rotation numbers of periodic trajectories.

The following theorem is due to Birkhoff: for every $n \geq 2$ and $\rho \leq \lfloor (n - 1)/2 \rfloor$, coprime with $n$, there exist two geometrically distinct n-periodic billiard trajectories with the rotation number $\rho$. For example, there are at least two 2-periodic billiard trajectories inside every smooth oval: one is the diameter, the longest chord, and another one is of minimax type, similar to the minor axis of an ellipse.

In higher dimensions, lower bounds on the number of periodic billiard trajectories inside strictly convex domains with smooth boundaries were obtained only recently by Farber and the present author. Here is one of the results: for a generic billiard table in $\mathbb{R}^m$, the number of n-periodic trajectories is not less than $(n - 1)(m - 1)$. The proof consists in using Morse theory to estimate from below the number of critical points of the perimeter length function on the space of inscribed n-gons and its quotient space by the dihedral group $D_n$, and the main difficulty is in describing the topology of these spaces.

Returning to convex smooth planar billiards, the following conjecture has remained open for a long time: the set of n-periodic points of the billiard ball map has zero measure. This is easy for $n = 2$; for $n = 3$ this is a theorem by M Rychlik. The motivation for this question comes from spectral geometry. In particular, according to a theorem of Ivrii, the above conjecture implies the Weyl conjecture on the second term for the spectral asymptotics of the Laplacian in a bounded domain with the Dirichlet or Neumann boundary conditions.

Length Spectrum

The set of lengths of the closed trajectories in a convex billiard M is called the length spectrum of M. There is a remarkable relation between the length spectrum and the spectrum of the Laplace operator in M with the Dirichlet boundary condition: $\Delta f = \lambda f$, $f|_{\partial M} = 0$. From the physical point of view, the eigenvalues $\lambda$ are the eigenfrequencies of the membrane M with a fixed boundary. Roughly speaking, one can recover the length spectrum from that of the Laplacian. More precisely, the following theorem of K Andersson and R Melrose holds:

\[
\sum_{\lambda_i \in \mathrm{spec}\,\Delta} \cos\left(t\sqrt{\lambda_i}\right)
\]

is a well-defined generalized function (distribution) of $t$, smooth away from the length spectrum. That is, if $l > 0$ belongs to the singular support of this distribution, then there exists either a closed billiard trajectory of length $l$, or a closed geodesic of length $l$ in the boundary of the billiard table.

This relation between the Laplacian and the length spectrum is due to the fact that geometric optics is not a very accurate description of light. In wave optics, light is considered as electromagnetic waves, and geometric optics gives a realistic approximation only when the wavelength is small. This small-wave approximation is based on the assumption that the waves are locally almost harmonic, while their amplitudes change slowly from point to point. The substitution of such a function into the corresponding PDEs gives, in the first approximation, the equations of wave fronts, that is, of geometric optics.

Here is another spectral result concerning a smooth strictly convex plane domain, due to S Marvizi and R Melrose. Let $L_n$ be the supremum and $l_n$ the infimum of the perimeters of simple billiard n-gons. Then,

\[
\lim_{n \to \infty} n^k (L_n - l_n) = 0
\]

for any positive $k$. Furthermore, $L_n$ has an asymptotic expansion, as $n \to \infty$,

\[
L_n \sim l + \sum_{i=1}^{\infty} \frac{c_i}{n^{2i}}
\]

where $l$ is the length of the boundary of the billiard table and $c_i$ are constants, depending on the curvature of the boundary.
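The Marvizi–Melrose expansion can be made concrete in the one case where everything is explicit, the unit disk. This is an illustrative sketch, not part of the original article: it assumes the table is the unit disk, where every simple billiard n-gon is a regular n-gon, so that $L_n = l_n = 2n\sin(\pi/n)$ and $l = 2\pi$. Expanding the sine gives $c_1 = -\pi^3/3$ and $c_2 = \pi^5/60$ for this table. The function name `regular_ngon_perimeter` is hypothetical.

```python
import math

def regular_ngon_perimeter(n, radius=1.0):
    """Perimeter of the regular n-gon inscribed in a circle of given radius.

    In a disk, every simple n-periodic billiard trajectory is such a polygon,
    so this value is both L_n and l_n.
    """
    return 2 * n * radius * math.sin(math.pi / n)

boundary_length = 2 * math.pi      # l, the length of the unit circle
c1 = -math.pi ** 3 / 3             # coefficient of 1/n^2 (Taylor expansion of sin)
c2 = math.pi ** 5 / 60             # coefficient of 1/n^4

for n in (10, 100, 1000):
    L_n = regular_ngon_perimeter(n)
    two_term = boundary_length + c1 / n ** 2 + c2 / n ** 4
    # The residual should shrink like 1/n^6.
    print(n, L_n, L_n - two_term)
```

For a general strictly convex table the coefficients $c_i$ depend on the curvature of the boundary and are not given by these closed forms; the disk only illustrates the shape of the expansion in even powers of $1/n$.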
Acknowledgments

This work was partially supported by NSF.

See also: Adiabatic Piston; Hamiltonian Systems: Obstructions to Integrability; Hyperbolic Billiards; Integrable Discrete Systems; Integrable Systems and Algebraic Geometry; Optical Caustics; Integrable Systems: Overview; Polygonal Billiards; Semiclassical Spectra and Closed Orbits; Separatrix Splitting; Stability Theory and KAM.

Further Reading

Chernov N and Markarian R, Theory of Chaotic Billiards (to appear).
Farber M and Tabachnikov S (2002) Topology of cyclic configuration spaces and periodic orbits of multi-dimensional billiards. Topology 41: 553–589.
Gutkin E (2003) Billiard dynamics: a survey with the emphasis on open problems. Regular and Chaotic Dynamics 8: 1–13.
Katok A and Hasselblatt B (1995) Introduction to the Modern Theory of Dynamical Systems. Cambridge: Cambridge University Press.
Kozlov V and Treshchev D (1991) Billiards. A Genetic Introduction to the Dynamics of Systems with Impacts. Providence: American Mathematical Society.
Lazutkin V (1993) KAM Theory and Semiclassical Approximations to Eigenfunctions. Berlin: Springer.
Moser J (1980) Various Aspects of Integrable Hamiltonian Systems. Progress in Mathematics, vol. 8, pp. 233–289. Basel: Birkhäuser.
Siburg KF (2004) The Principle of Least Action in Geometry and Dynamics, Lecture Notes in Mathematics, vol. 1844. Berlin: Springer.
Sinai Ya (1976) Introduction to Ergodic Theory. Princeton: Princeton University Press.
Tabachnikov S (1995) Billiards, Société Math. de France, Panoramas et Synthèses, No 1.
Tabachnikov S (2005) Geometry and Billiards. American Mathematical Society (to appear).
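The integrals of the billiard in an ellipsoid, discussed in the section on integrable billiards, can be checked numerically in the planar case. The sketch below is an illustration under stated assumptions rather than anything taken from the article: it works in the ellipse $x_1^2/a^2 + x_2^2/b^2 = 1$ with the arbitrary choice $a = 2$, $b = 1$, implements the billiard ball map directly (straight flight to the next boundary point followed by mirror reflection in the normal), and uses the integrals in the form $F_1 = v_1^2 + (v_1x_2 - v_2x_1)^2/(a^2 - b^2)$ and $F_2 = v_2^2 + (v_1x_2 - v_2x_1)^2/(b^2 - a^2)$. The names `billiard_step` and `integrals` are hypothetical.

```python
import math

def billiard_step(x, v, a, b):
    """One application of the billiard ball map inside x1^2/a^2 + x2^2/b^2 = 1.

    x is a boundary point, v a unit inward vector; returns the next boundary
    point and the reflected unit inward vector.
    """
    c1, c2 = 1.0 / a ** 2, 1.0 / b ** 2
    x_c_v = c1 * x[0] * v[0] + c2 * x[1] * v[1]
    v_c_v = c1 * v[0] ** 2 + c2 * v[1] ** 2
    t = -2.0 * x_c_v / v_c_v                    # flight time to the next hit
    hit = (x[0] + t * v[0], x[1] + t * v[1])
    n = (c1 * hit[0], c2 * hit[1])              # unnormalised outward normal
    k = 2.0 * (v[0] * n[0] + v[1] * n[1]) / (n[0] ** 2 + n[1] ** 2)
    return hit, (v[0] - k * n[0], v[1] - k * n[1])

def integrals(x, v, a, b):
    """The two integrals F_1, F_2 of the planar elliptic billiard."""
    cross = v[0] * x[1] - v[1] * x[0]
    f1 = v[0] ** 2 + cross ** 2 / (a ** 2 - b ** 2)
    f2 = v[1] ** 2 + cross ** 2 / (b ** 2 - a ** 2)
    return f1, f2

a, b = 2.0, 1.0
s = 0.3
x = (a * math.cos(s), b * math.sin(s))          # starting point on the boundary
d = (-0.5 - x[0], 0.2 - x[1])                   # aim at an interior point
norm = math.hypot(d[0], d[1])
v = (d[0] / norm, d[1] / norm)

f1_start, _ = integrals(x, v, a, b)
drift = 0.0
for _ in range(50):
    x, v = billiard_step(x, v, a, b)
    f1, f2 = integrals(x, v, a, b)
    # Track both the change in F_1 and the deviation of F_1 + F_2 from 1.
    drift = max(drift, abs(f1 - f1_start), abs(f1 + f2 - 1.0))
print("largest deviation over 50 reflections:", drift)
```

Up to rounding error, the printed deviation should be at the level of machine precision, reflecting the fact that $F_1$ encodes the parameter of the confocal conic to which the trajectory stays tangent.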

Black Hole Mechanics

A Ashtekar, Pennsylvania State University, University Park, PA, USA
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Over the last 30 years, black holes have been shown to have a number of surprising properties. These discoveries have revealed unforeseen relations between the otherwise distinct areas of general relativity, quantum physics, and statistical mechanics. This interplay, in turn, led to a number of deep puzzles at the very foundations of physics. Some have been resolved while others continue to baffle physicists. The starting point of these fascinating developments was the discovery of laws of black hole mechanics by Bardeen, Bekenstein, Carter, and Hawking. They dictate the behavior of black holes in equilibrium, under small perturbations away from equilibrium, and in fully dynamical situations. While they are consequences of classical general relativity alone, they have a close similarity with the laws of thermodynamics. The origin of this seemingly strange coincidence lies in quantum physics. For further discussion, see Asymptotic Structure and Conformal Infinity; Loop Quantum Gravity; Quantum Geometry and Its Applications; Quantum Field Theory in Curved Spacetime; Stationary Black Holes.

The focus of this article is just on black hole mechanics. The discussion is divided into three parts. In the first, we will introduce the notions of event horizons and black hole regions and discuss properties of globally stationary black holes. In the second, we will consider black holes which are themselves in equilibrium but in surroundings which may be time dependent. Finally, in the third part, we summarize what is known in the fully dynamical situations. For simplicity, all manifolds and fields are assumed to be smooth and, unless otherwise stated, spacetime is assumed to be four dimensional, with a metric of signature $-, +, +, +$, and the cosmological constant is assumed to be zero. An arrow under a spacetime index denotes the pullback of that index to the horizon.

Global Equilibrium

To capture the intuitive notion that a black hole is a region from which signals cannot escape to the asymptotic part of spacetime, one needs a precise definition of future infinity. The standard strategy is to use Penrose's conformal boundary $\mathscr{I}^+$. A black hole region $\mathcal{B}$ of a spacetime $(\mathcal{M}, g_{ab})$ is defined as $\mathcal{B} = \mathcal{M} \setminus I^-(\mathscr{I}^+)$, where $I^-$ denotes chronological past. The boundary $\partial\mathcal{B}$ of the black hole region is called the event horizon and denoted by $\mathcal{E}$. Thus, $\mathcal{E}$ is the boundary of the past of $\mathscr{I}^+$. It therefore follows that $\mathcal{E}$ is a null 3-surface, ruled by future inextendible null geodesics without caustics. If the spacetime is globally hyperbolic, an instant of time is represented by a Cauchy surface $M$. The intersection of $\mathcal{B}$ with $M$ may have several disjoint components, each representing a black hole at that instant of time. If $M'$ is a Cauchy surface to the future of $M$, the number of disjoint components of $M' \cap \mathcal{B}$ in the causal future of $M \cap \mathcal{B}$ must be less than or equal to those of $M \cap \mathcal{B}$
(see Hawking and Ellis (1973)). Thus, black holes can merge but cannot bifurcate. (By a time reversal, i.e., by replacing $\mathscr{I}^+$ with $\mathscr{I}^-$ and $I^-$ with $I^+$, one can define a white hole region $\mathcal{W}$. However, here we will focus only on black holes.)

A spacetime $(\mathcal{M}, g_{ab})$ is said to be stationary (i.e., time independent) if $g_{ab}$ admits a Killing field $t^a$ that represents an asymptotic time translation. By convention, $t^a$ is assumed to be unit at infinity. $(\mathcal{M}, g_{ab})$ is said to be axisymmetric if $g_{ab}$ admits a Killing field $\phi^a$ generating an SO(2) isometry. By convention $\phi^a$ is normalized such that the affine length of its integral curves is $2\pi$. Stationary spacetimes with nontrivial $\mathcal{M} \setminus I^-(\mathscr{I}^+)$ represent black holes which are in global equilibrium. In the Einstein–Maxwell theory in four dimensions, there exists a unique three-parameter family of stationary black hole solutions, generally parametrized by mass $m$, angular momentum $J$, and electric charge $Q$. This is the celebrated Kerr–Newman family. Therefore, in general relativity a great deal of work on black holes has focused on these solutions and perturbations thereof. The Kerr–Newman family is axisymmetric and, furthermore, its metric has the property that the 2-flats spanned by the Killing fields $t^a$ and $\phi^a$ are orthogonal to a family of 2-surfaces. This property is called $t$–$\phi$ orthogonality. These features of Kerr–Newman spacetimes are widely used in black hole physics. Note however that uniqueness fails in higher dimensions, and also in the presence of nonabelian gauge fields or rings of perfect fluids around black holes in four dimensions. In mathematical physics, there is significant literature on the new stationary black hole solutions in Einstein–Yang–Mills–Higgs theories. These are called hairy black holes. Research on stationary black hole solutions with rings received a boost by a recent discovery that these black holes can violate the Kerr inequality $J \leq Gm^2$ between angular momentum $J$ and mass $m$.

A null 3-manifold $\mathcal{K}$ in $\mathcal{M}$ is said to be a Killing horizon if $g_{ab}$ admits a Killing field $K^a$ which is everywhere normal to $\mathcal{K}$. On a Killing horizon, one can show that the acceleration of $K^a$ is proportional to $K^a$ itself:

\[
K^a \nabla_a K^b = \kappa K^b \tag{1}
\]

The proportionality function $\kappa$ is called surface gravity. We will show in the next section that if a mild energy condition holds on $\mathcal{K}$, then $\kappa$ must be constant. Note that if we rescale $K^a$ via $K^a \to cK^a$, where $c$ is a constant, surface gravity also rescales as $\kappa \to c\kappa$.

In the Kerr–Newman family, the event horizon is a Killing horizon. More generally, if an axisymmetric, stationary black hole spacetime $(\mathcal{M}, g_{ab})$ satisfies the $t$–$\phi$ orthogonality property, its event horizon $\mathcal{E}$ is a Killing horizon. (Although one can envisage stationary black holes in which these additional symmetry conditions are not met, this possibility has been ignored in black hole mechanics on stationary spacetimes. Quasilocal horizons, discussed below, do not require any spacetime symmetries.) In these cases, the normalization freedom in $K^a$ is fixed by requiring that $K^a$ have the form

\[
K^a = t^a + \Omega\,\phi^a \tag{2}
\]

on the horizon, where $\Omega$ is a constant, called the angular velocity of the horizon. The resulting $\kappa$ is called the surface gravity of the black hole. It is remarkable that $\kappa$ is constant for all such black holes, even when their horizon is highly distorted (i.e., far from being spherically symmetric) either due to rotation or due to external matter fields. This is analogous to the fact that the temperature of a thermodynamical system in equilibrium is constant, independently of the details of the system. In analogy with thermodynamics, constancy of $\kappa$ is referred to as the zeroth law of black hole mechanics.

Next, let us consider an infinitesimal perturbation $\delta$ within the three-parameter Kerr–Newman family. A simple calculation shows that the changes in the Arnowitt–Deser–Misner (ADM) mass $m$, angular momentum $J$, and the total charge $Q$ of the spacetime and in the area $a$ of the horizon are constrained via

\[
\delta m = \frac{\kappa}{8\pi G}\,\delta a + \Omega\,\delta J + \Phi\,\delta Q \tag{3}
\]

where the coefficients $\kappa$, $\Omega$, $\Phi$ are black hole parameters, $\Phi = -A_a K^a$ being the electrostatic potential at the horizon. The last two terms, $\Omega\,\delta J$ and $\Phi\,\delta Q$, have the interpretation of work required to spin the black hole up by an amount $\delta J$ or to increase its charge by $\delta Q$. Therefore, [3] has a striking resemblance to the first law, $\delta E = T\,\delta S + \delta W$, of thermodynamics if (as the zeroth law suggests) $\kappa$ is made proportional to the temperature $T$, and the horizon area $a$ to the entropy $S$. Therefore, [3] and its generalizations discussed below are referred to as the first law of black hole mechanics.

In Kerr–Newman spacetimes, the only contribution to the stress–energy tensor comes from the Maxwell field. Bardeen et al. (1973) consider stationary black holes with matter such as perfect fluids in the exterior region and stationary perturbations $\delta$ thereof. Using Einstein's equations, they show that the form [3] of the first law does not change; the only modification is addition of certain matter terms on the right-hand side which can be
interpreted as the work $\delta W$ done on the total system. A generalization in another direction was made by Iyer and Wald (1994) using Noether currents. They allow nonstationary perturbations and, more importantly, drop the restriction to general relativity. Instead, they consider a wide class of diffeomorphism-invariant Lagrangian densities $L(g_{ab}, R_{abcd}, \nabla_a R_{bcde}, \ldots, \Phi_{\ldots}{}^{\ldots}, \nabla_a \Phi_{\ldots}{}^{\ldots}, \ldots)$ which depend on the metric $g_{ab}$, matter fields $\Phi_{\ldots}{}^{\ldots}$, and a finite number of derivatives of the Riemann tensor and matter fields. Finally, they restrict themselves to $\kappa \neq 0$. In this case, on the maximal analytic extension of the spacetime, the Killing field $K^a$ vanishes on a 2-sphere $S_o$ called the bifurcate horizon. Then, [3] is generalized to

\[
\delta m = \frac{\kappa}{2\pi}\,\delta S_{\mathrm{hor}} + \delta W \tag{4}
\]

Here $\delta W$ again represents work terms and $S_{\mathrm{hor}}$ is given by

\[
S_{\mathrm{hor}} = -2\pi \oint_{S_o} \frac{\delta L}{\delta R_{abcd}}\, n_{ab} n_{cd} \tag{5}
\]

where $n_{ab}$ is the binormal to $S_o$ (with $n_{ab} n^{ab} = -2$), and the functional derivative inside the integral is evaluated by formally viewing the Riemann tensor as a field independent of the metric. For the Einstein–Hilbert action, this yields $S_{\mathrm{hor}} = a/4G$ and one recovers [3].

These results are striking. However, the underlying assumptions have certain unsatisfactory aspects. First, although the laws are meant to refer just to black holes, one assumes that the entire spacetime is stationary. In thermodynamics, by contrast, one only assumes that the system under consideration is in equilibrium, not the whole universe. Second, in the first law, quantities $a$, $\Omega$, $\Phi$ are evaluated at the horizon while $m$, $J$ are evaluated at infinity and include contributions from possible matter fields outside the black hole. A more satisfactory law of black hole mechanics would involve attributes of the black hole alone. Finally, the notion of the event horizon is extremely global and teleological since it explicitly refers to $\mathscr{I}^+$. An event horizon may well be developing in the very room you are sitting in today, in anticipation of a gravitational collapse in the center of our galaxy which may occur a billion years hence. This feature makes it impossible to generalize the first law to fully dynamical situations and relate the change in the event horizon area to the flux of energy and angular momentum falling across it. Indeed, one can construct explicit examples of dynamical black holes

physically. These considerations call for a replacement of $\mathcal{E}$ by a quasilocal horizon which leads to a first law involving only horizon attributes, and which can grow only in response to the influx of energy. Such horizons are discussed in the next two sections.

Local Equilibrium

The key idea here is to drop the requirement that spacetime should admit a stationary Killing field and ask only that the intrinsic horizon geometry be time independent. Consider a null 3-surface $\Delta$ in a spacetime $(\mathcal{M}, g_{ab})$ with a future-pointing normal field $\ell^a$. The pullback $q_{ab} := g_{\underleftarrow{ab}}$ of the spacetime metric to $\Delta$ is the intrinsic, degenerate metric of $\Delta$ with signature $0, +, +$. The first condition is that it be time independent, that is, $\mathcal{L}_\ell q_{ab} = 0$ on $\Delta$. Then by restriction, the spacetime derivative operator $\nabla$ induces a natural derivative operator $D$ on $\Delta$. While $D$ is compatible with $q_{ab}$, that is, $D_a q_{bc} = 0$, it is not uniquely determined by this property because $q_{ab}$ is degenerate. Thus, $D$ has extra information, not contained in $q_{ab}$. The pair $(q_{ab}, D)$ is said to determine the intrinsic geometry of the null surface $\Delta$. This notion leads to a natural definition of a horizon in local equilibrium. Let $\Delta$ be a null, three-dimensional submanifold of $(\mathcal{M}, g_{ab})$ with topology $S \times \mathbb{R}$, where $S$ is compact and without boundary.

Definition 1 $\Delta$ is said to be an isolated horizon if it admits a null normal $\ell^a$ such that:
(i) $\mathcal{L}_\ell q_{ab} = 0$ and $[\mathcal{L}_\ell, D] = 0$ on $\Delta$ and
(ii) $-T^a{}_b \ell^b$ is a future pointing causal vector on $\Delta$.
One can show that, generically, this null normal field $\ell^a$ is unique up to rescalings by positive constants.

Both conditions are local to $\Delta$. In particular, $(\mathcal{M}, g_{ab})$ is not required to be asymptotically flat and there is no longer any teleological feature. Since $\Delta$ is null and $\mathcal{L}_\ell q_{ab} = 0$, the area of any of its cross sections is the same, denoted by $a_\Delta$. As one would expect, one can show that there is no flux of gravitational radiation or matter across $\Delta$. This captures the idea that the black hole itself is in equilibrium. Condition (ii) is a rather weak energy condition which is satisfied by all matter fields normally considered in classical general relativity. The nontrivial condition is (i). It extracts from the notion of a Killing horizon just a tiny part that refers only to the intrinsic geometry of $\Delta$. As a result, every Killing horizon $\mathcal{K}$ is, in particular, an isolated horizon. However, a spacetime with an isolated horizon $\Delta$ can admit gravitational radiation
in which an event horizon E forms and grows in the and dynamical matter fields away from . In fact, as a
flat part of a spacetime where nothing happens family of RobinsonTrautman spacetimes illustrates,
gravitational radiation could even be present arbitrarily close to $\Delta$. Because of these possibilities, there are many nontrivial examples and the transition from event horizons of stationary spacetimes to isolated horizons represents a significant generalization of black hole mechanics. (In fact, the derivation of the zeroth and the first law requires slightly weaker assumptions, encoded in the notion of a weakly isolated horizon (Ashtekar et al. 2000, 2001).)

An immediate consequence of the requirement $\mathcal{L}_\ell q_{ab} = 0$ is that there exists a 1-form $\omega_a$ on $\Delta$ such that $D_a \ell^b = \omega_a \ell^b$. Following the definition of $\kappa$ on a Killing horizon, the surface gravity $\kappa_{(\ell)}$ of $(\Delta, \ell)$ is defined as $\kappa_{(\ell)} = \omega_a \ell^a$. Again, under $\ell^a \to c\ell^a$, we have $\kappa_{(c\ell)} = c\,\kappa_{(\ell)}$. Together with Einstein's equations, the two conditions of Definition 1 imply $\mathcal{L}_\ell \omega_a = 0$ and $\ell^a D_{[a}\omega_{b]} = 0$. The Cartan identity relating the Lie and exterior derivative now yields

\[ D_a(\omega_b \ell^b) \equiv D_a \kappa_{(\ell)} = 0 \tag*{[6]} \]

Thus, surface gravity is constant on every isolated horizon. This is the zeroth law, extended to horizons representing local equilibrium. In the presence of an electromagnetic field, Definition 1 and the field equations imply $\mathcal{L}_\ell F_{ab} = 0$ and $\ell^a F_{ab} = 0$. The first of these equations implies that one can always choose a gauge in which $\mathcal{L}_\ell A_a = 0$. By the Cartan identity it then follows that the electrostatic potential $\Phi_{(\ell)} := A_a \ell^a$ is constant on the horizon. This is the Maxwell analog of the zeroth law.

In this setting, the first law is derived using a Hamiltonian framework (Ashtekar et al. 2000, 2001). For concreteness, let us assume that we are in the asymptotically flat situation and the only gauge field present is electromagnetic. One begins by restricting oneself to horizon geometries such that $\Delta$ admits a rotational vector field $\varphi^a$ satisfying $\mathcal{L}_\varphi q_{ab} = 0$. (In fact for black hole mechanics, it suffices to assume only that $\mathcal{L}_\varphi \epsilon_{ab} = 0$, where $\epsilon_{ab}$ is the intrinsic area 2-form on $\Delta$. The same is true on dynamical horizons discussed in the next section.) One then constructs a phase space $\Gamma$ of gravitational and matter fields such that (1) $M$ admits an internal boundary $\Delta$ which is an isolated horizon; and (2) all fields satisfy asymptotically flat boundary conditions at infinity. Note that the horizon geometry is allowed to vary from one phase-space point to another; the pair $(q_{ab}, D)$ induced on $\Delta$ by the spacetime metric only has to satisfy Definition 1 and the condition $\mathcal{L}_\varphi q_{ab} = 0$.

Let us begin with angular momentum. Fix a vector field $\varphi^a$ on $M$ which coincides with the fixed $\varphi^a$ on $\Delta$ and is an asymptotic rotational symmetry at infinity. (Note that $\varphi^a$ is not restricted in any way in the bulk.) Lie derivatives of gravitational and matter fields along $\varphi^a$ define a vector field $X_{(\varphi)}$ on $\Gamma$. One shows that it is an infinitesimal canonical transformation, that is, satisfies $\mathcal{L}_{X_{(\varphi)}} \boldsymbol{\Omega} = 0$, where $\boldsymbol{\Omega}$ is the symplectic structure on $\Gamma$. The Hamiltonian $H^{(\varphi)}$ generating this canonical transformation is given by

\[ H^{(\varphi)} = J_\Delta^{(\varphi)} - J_\infty^{(\varphi)}, \qquad J_\Delta^{(\varphi)} = -\frac{1}{8\pi G}\oint_S (\omega_a \varphi^a)\,\epsilon - \frac{1}{4\pi}\oint_S (A_a \varphi^a)\,{}^{\star}F \tag*{[7]} \]

where $J_\infty^{(\varphi)}$ is the ADM angular momentum at infinity, $S$ is any cross section of $\Delta$, and $\epsilon$ the area element thereon. The term $J_\Delta^{(\varphi)}$ is independent of the choice of $S$ made in its evaluation and interpreted as the horizon angular momentum. It has numerous properties that support this interpretation. In particular, it yields the standard angular momentum expression in Kerr–Newman spacetimes.

To define horizon energy, one has to introduce a time-translation vector field $t^a$. At infinity, $t^a$ must tend to a unit time translation. On $\Delta$, it must be a symmetry of $q_{ab}$. Since $\ell^a$ and $\varphi^a$ are both horizon symmetries, $t^a = c\,\ell^a + \Omega\,\varphi^a$ on $\Delta$, for some constants $c$ and $\Omega$. However, unlike $\varphi^a$, the restriction of $t^a$ to $\Delta$ cannot be fixed once and for all but must be allowed to vary from one phase-space point to another. In particular, on physical grounds, one expects $\Omega$ to be zero at a phase-space point representing a nonrotating black hole but nonzero at a point representing a rotating black hole. This freedom in the boundary value of $t^a$ introduces a qualitatively new element. The vector field $X_{(t)}$ on $\Gamma$ defined by the Lie derivatives of gravitational and matter fields does not, in general, satisfy $\mathcal{L}_{X_{(t)}} \boldsymbol{\Omega} = 0$; it need not be an infinitesimal canonical transformation. The necessary and sufficient condition is that $(\kappa_{(c\ell)}/8\pi G)\,\delta a_\Delta + \Omega\,\delta J_\Delta + \Phi_{(c\ell)}\,\delta Q_\Delta$ be an exact variation. That is, $X_{(t)}$ generates a Hamiltonian flow if and only if there exists a function $E_\Delta^{(t)}$ on $\Gamma$ such that

\[ \delta E_\Delta^{(t)} = \frac{\kappa_{(c\ell)}}{8\pi G}\,\delta a_\Delta + \Omega\,\delta J_\Delta + \Phi_{(c\ell)}\,\delta Q_\Delta \tag*{[8]} \]

This is precisely the first law. Thus, the framework provides a deeper insight into the origin of the first law: it is the necessary and sufficient condition for the evolution generated by $t^a$ to be Hamiltonian. Equation [8] is a genuine restriction on the choice of phase-space functions $c$ and $\Omega$, that is, of restrictions to $\Delta$ of evolution fields $t^a$. It is easy to verify that $M$ admits many such vector fields. Given one, the Hamiltonian $H^{(t)}$ generating the time evolution along $t^a$ takes the form

\[ H^{(t)} = E_\infty^{(t)} - E_\Delta^{(t)} \tag*{[9]} \]
reinforcing the interpretation of $E_\Delta^{(t)}$ as the horizon energy.

In general, there is a multitude of first laws, one for each vector field $t^a$, the evolution along which preserves the symplectic structure. In the Einstein–Maxwell theory, given any phase-space point, one can choose a canonical boundary value $t_o^a$ exploiting the uniqueness theorem. $E_\Delta^{(t_o)}$ is then called the horizon mass and denoted simply by $m_\Delta$. In the Kerr–Newman family, $H^{(t_o)}$ vanishes and $m_\Delta$ coincides with the ADM mass $m_\infty$. Similarly, if $\varphi^a$ is chosen to be a global rotational Killing field, $J_\Delta^{(\varphi)}$ equals $J_\infty^{(\varphi)}$. However, in more general spacetimes where there are matter fields or gravitational radiation outside $\Delta$, these equalities do not hold; $m_\Delta$ and $J_\Delta$ represent quantities associated with the horizon alone while the ADM quantities represent the total mass and angular momentum in the spacetime, including contributions from matter fields and gravitational radiation in the exterior region. In the first law [8], only the contributions associated with the horizon appear.

When the uniqueness theorem fails, as, for example, in the Einstein–Yang–Mills–Higgs theory, first laws continue to hold but the horizon mass $m_\Delta$ becomes ambiguous. Interestingly, these ambiguities can be exploited to relate properties of hairy black holes with those of the corresponding solitons. (For a summary, see Ashtekar and Krishnan (2004).)

Dynamical Situations

A natural question now is whether there is an analog of the second law of thermodynamics. Using event horizons, Hawking showed that the answer is in the affirmative (see Hawking and Ellis (1973)). Let $(M, g_{ab})$ admit an event horizon $E$. Denote by $\ell^a$ a geodesic null normal to $E$. Its expansion is defined as $\theta_{(\ell)} := q^{ab}\nabla_a \ell_b$, where $q^{ab}$ is any inverse of the degenerate intrinsic metric $q_{ab}$ on $E$, and determines the rate of change of the area element of $E$ along $\ell^a$. Assuming that the null energy condition and Einstein's equations hold, the Raychaudhuri equation immediately implies that if $\theta_{(\ell)}$ were to become negative somewhere it would become infinite within a finite affine parameter. Hawking showed that, if there is a globally hyperbolic region containing $I^-(\mathcal{I}^+) \cup E$ (that is, if there are no naked singularities), this cannot happen, whence $\theta_{(\ell)} \geq 0$ on $E$. Hence, if a cross section $S_2$ of $E$ is to the future of a cross section $S_1$, we must have $a_{S_2} \geq a_{S_1}$. Thus, in any (i.e., not necessarily infinitesimal) dynamical process, the change $\Delta a$ in the horizon area is always non-negative. This result is known as the second law of black hole mechanics. As in the first law, the analog of entropy is the horizon area.

It is tempting to ask if there is a local physical process directly responsible for the growth of area. For event horizons, the answer is in the negative since they can grow in a flat portion of spacetime. However, one can introduce quasilocal horizons also in the dynamical situations and obtain the desired result (Ashtekar and Krishnan 2003). These constructions are strongly motivated by earlier ideas introduced by Hayward (1994).

Definition 2 A three-dimensional spacelike submanifold $H$ of $(M, g_{ab})$ is said to be a dynamical horizon if it admits a foliation by compact 2-manifolds $S$ (without boundary) such that:
(i) the expansion $\theta_{(\ell)}$ of one (future directed) null normal field $\ell^a$ to $S$ vanishes and the expansion of the other (future directed) null normal field, $n^a$, is negative; and
(ii) $-T^a{}_b \ell^b$ is a future pointing causal vector on $H$.

One can show that this foliation of $H$ is unique and that $S$ is either a 2-sphere or, under degenerate and physically over-restrictive conditions, a 2-torus. Each leaf $S$ is a marginally trapped surface and referred to as a cut of $H$. Unlike event horizons $E$, dynamical horizons $H$ are locally defined and do not display any teleological feature. In particular, they cannot lie in a flat portion of spacetime. Dynamical horizons commonly arise in numerical simulations of evolving black holes as world tubes of apparent horizons. As the black hole settles down, $H$ asymptotes to an isolated horizon $\Delta$, which tightly hugs the asymptotic future portion of the event horizon. However, during the dynamical phase, $H$ typically lies well inside $E$.

The two conditions in Definition 2 immediately imply that the area of cuts of $H$ increases monotonically along the outward direction defined by the projection of $\ell^a$ on $H$. Furthermore, this change turns out to be directly related to the flux of energy falling across $H$. Let $R$ denote the radius function on $H$ so that the area of any cut $S$ is given by $a_S = 4\pi R^2$. Let $N$ denote the norm of $\partial_a R$ and $\Delta H$ the portion of $H$ bounded by two cross sections $S_1$ and $S_2$. The appropriate energy turns out to be associated with the vector field $N\ell^a$, where $\ell^a$ is normalized such that its projection on $H$ is the unit normal $\hat{r}^a$ to the cuts $S$. In the generic and physically interesting case when $S$ is a 2-sphere, the Gauss and the Codazzi (i.e., constraint) equations imply

\[ \frac{1}{2G}\,(R_2 - R_1) = \int_{\Delta H} T_{ab}\, N\ell^a\, \hat{\tau}^b\, d^3V + \frac{1}{16\pi G}\int_{\Delta H} N\left(\sigma_{ab}\sigma^{ab} + 2\,\zeta_a\zeta^a\right) d^3V \tag*{[10]} \]
Here $\hat{\tau}^a$ is the unit normal to $H$, $\sigma_{ab}$ the shear of $\ell_a$ (i.e., the tracefree part of $q_a{}^m q_b{}^n \nabla_m \ell_n$), and $\zeta^a = q^{ab}\,\hat{r}^c \nabla_c \ell_b$, where $q_{ab}$ is the projector onto the tangent space of the cuts $S$. The first integral on the right-hand side can be directly interpreted as the flux across $\Delta H$ of matter–energy (relative to the vector field $N\ell^a$). The second term is purely geometric and is interpreted as the flux of energy carried by gravitational waves across $\Delta H$. It has several properties which support this interpretation. Thus, not only does the second law of black hole mechanics hold for a dynamical horizon $H$, but the cause of the increase in the area can be directly traced to physical processes happening near $H$.

Another natural question is whether the first law [8] can be generalized to fully dynamical situations, where $\delta$ is replaced by a finite transition. Again, the answer is in the affirmative. We will outline the idea for the case when there are no gauge fields on $H$. As with isolated horizons, to have a well-defined notion of angular momentum, let us suppose that the intrinsic 3-metric on $H$ admits a rotational Killing field $\varphi$. Then, the angular momentum associated with any cut $S$ is given by

\[ J_S^{(\varphi)} = -\frac{1}{8\pi G}\oint_S K_{ab}\,\varphi^a \hat{r}^b\, d^2V \equiv \frac{1}{8\pi G}\oint_S j^{(\varphi)}\, d^2V \tag*{[11]} \]

where $K_{ab}$ is the extrinsic curvature of $H$ in $(M, g_{ab})$ and $j^{(\varphi)}$ is interpreted as the angular momentum density. Now, in the Kerr family, the mass, surface gravity, and the angular velocity can be unambiguously expressed as well-defined functions $\bar{m}(a, J)$, $\bar{\kappa}(a, J)$, and $\bar{\Omega}(a, J)$ of the horizon area $a$ and angular momentum $J$. The idea is to use these expressions to associate mass, surface gravity, and angular velocity with each cut of $H$. Then, a surprising result is that the difference between the horizon masses associated with cuts $S_1$ and $S_2$ can be expressed as the integral of a locally defined flux across the portion $\Delta H$ of $H$ bounded by $S_1$ and $S_2$:

\[ \bar{m}_2 - \bar{m}_1 = \frac{1}{8\pi G}\int_{\Delta H} \bar{\kappa}\, da + \frac{1}{8\pi G}\left\{ \oint_{S_2} \bar{\Omega}\, j^{(\varphi)}\, d^2V - \oint_{S_1} \bar{\Omega}\, j^{(\varphi)}\, d^2V - \int_{\bar{\Omega}_1}^{\bar{\Omega}_2} d\bar{\Omega} \oint_S j^{(\varphi)}\, d^2V \right\} \tag*{[12]} \]

If the cuts $S_2$ and $S_1$ are only infinitesimally separated, this expression reduces precisely to the standard first law involving infinitesimal variations. Therefore, [12] is an integral generalization of the first law.

Let us conclude with a general perspective. On the whole, in the passage from event horizons in stationary spacetimes to isolated horizons and then to dynamical horizons, one considers increasingly more realistic situations. In all three cases, the analysis has been extended to allow the presence of a cosmological constant $\Lambda$. (The only significant change is that the topology of cuts $S$ of dynamical horizons is restricted to be $S^2$ if $\Lambda > 0$ and is completely unrestricted if $\Lambda < 0$.) In the first two frameworks, results have also been extended to higher dimensions. Since the notions of isolated and dynamical horizons make no reference to infinity, these frameworks can be used also in spatially compact spacetimes. The notion of an event horizon, by contrast, does not naturally extend to these spacetimes. On the other hand, the generalization [4] of the first law [3] is applicable to event horizons of stationary spacetimes in a wide class of theories while so far the isolated and dynamical horizon frameworks are tied to general relativity (coupled to matter satisfying rather weak energy conditions). From a mathematical physics perspective, extension to more general theories is an important open problem.

See also: Asymptotic Structure and Conformal Infinity; Branes and Black Hole Statistical Mechanics; Dirac Fields in Gravitation and Nonabelian Gauge Theory; Geometric Flows and the Penrose Inequality; Loop Quantum Gravity; Minimal Submanifolds; Quantum Field Theory in Curved Spacetime; Quantum Geometry and its Applications; Random Algebraic Geometry, Attractors and Flux Vacua; Shock Wave Refinement of the Friedman–Robertson–Walker Metric; Stationary Black Holes.

Further Reading

Ashtekar A, Beetle C, and Lewandowski J (2001) Mechanics of rotating black holes. Physical Review D 64: 044016 (gr-qc/0103026).
Ashtekar A, Fairhurst S, and Krishnan B (2000) Isolated horizons: Hamiltonian evolution and the first law. Physical Review D 62: 104025 (gr-qc/0005083).
Ashtekar A and Krishnan B (2003) Dynamical horizons and their properties. Physical Review D 68: 104030 (gr-qc/0308033).
Ashtekar A and Krishnan B (2004) Isolated and dynamical horizons and their applications. Living Reviews in Relativity 10: 1–78 (gr-qc/0407042).
Bardeen JW, Carter B, and Hawking SW (1973) The four laws of black hole mechanics. Communications in Mathematical Physics 31: 161.
DeWitt BS and DeWitt CM (eds.) (1972) Black Holes. Amsterdam: North-Holland.
Frolov VP and Novikov ID (1998) Black Hole Physics. Dordrecht: Kluwer.
Hawking SW and Ellis GFR (1973) Large Scale Structure of Space-Time. Cambridge: Cambridge University Press.
Hayward S (1994) General laws of black hole dynamics. Physical Review D 49: 6467–6474.
Iyer V and Wald RM (1994) Some properties of Noether charge and a proposal for dynamical black hole entropy. Physical Review D 50: 846–864.
Wald RM (1994) Quantum Field Theory in Curved Spacetime and Black Hole Thermodynamics. Chicago: University of Chicago Press.
Boltzmann Equation (Classical and Quantum)

M Pulvirenti, Università di Roma "La Sapienza", Rome, Italy
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Ludwig Boltzmann (1872) established an evolution equation to describe the behavior of a rarefied gas, starting from the mathematical model of elastic balls and using mechanical and statistical considerations. The importance of this equation is twofold. First, it provides a reduced description (as well as the hydrodynamical equations) of the microscopic world. Second, it is also an important tool for the applications, especially for dilute fluids when the hydrodynamical equations fail to hold.

The starting point of the Boltzmann analysis is to abandon the study of the gas in terms of the detailed motion of the molecules which constitute it, because of their large number. Instead, it is better to investigate a function $f(x, v)$, which is the probability density of a given particle, where $x$ and $v$ denote its position and velocity. Actually, $f(x, v)\,dx\,dv$ is often confused with the fraction of molecules falling in the cell of the phase space of size $dx\,dv$ around $x$, $v$. The two concepts are not exactly the same, but they are asymptotically equivalent (when the number of particles is diverging) if a law of large numbers holds.

The Boltzmann equation is the following:

\[ (\partial_t + v\cdot\nabla_x) f = Q(f, f) \tag*{[1]} \]

where $Q$, the collision operator, is defined by eqn [2]:

\[ Q(f, f) = \int_{\mathbb{R}^3} dv_1 \int_{S_+^2} dn\, (v - v_1)\cdot n\, \left[ f(x, v')\, f(x, v_1') - f(x, v)\, f(x, v_1) \right] \tag*{[2]} \]

and

\[ v' = v - n\,[n\cdot(v - v_1)], \qquad v_1' = v_1 + n\,[n\cdot(v - v_1)] \tag*{[3]} \]

Moreover, $n$ (the impact parameter) is a unit vector and $S_+^2 = \{n \mid n\cdot(v - v_1) \geq 0\}$.

Note that $v'$, $v_1'$ are the outgoing velocities after a collision of two elastic balls with incoming velocities $v$ and $v_1$ and centers $x$ and $x + rn$, $r$ being the diameter of the spheres. Obviously, the collision takes place if $n\cdot(v - v_1) \geq 0$. Equations [3] are a consequence of the conservation of total energy, momentum, and angular momentum. Note also that $r$ does not enter in eqn [1] as a parameter.

As fundamental features of eqn [1], we have the conservation in time of the following five quantities

\[ \int dx \int dv\, f(x, v; t)\, v^\alpha \tag*{[4]} \]

with $\alpha = 0, 1, 2$, expressing conservation of the probability, momentum, and energy.

From now on we shall set $\int = \int_{\mathbb{R}^3}$ for notational simplicity.

Moreover, Boltzmann introduced the (kinetic) entropy defined as

\[ H(f) = \int dx \int dv\, f \log f\,(x, v) \tag*{[5]} \]

and proved the famous H-theorem asserting the decrease of $H(f(t))$ along the solutions to eqn [1].

Finally, in the case of bounded domains or homogeneous solutions ($f = f(v; t)$ is independent of $x$), the distribution defined for some $\beta > 0$, $\rho > 0$, and $u \in \mathbb{R}^3$ by

\[ M(x, v) = \frac{\rho}{(2\pi/\beta)^{3/2}}\, e^{-(\beta/2)\,|v - u|^2} \tag*{[6]} \]

called the Maxwellian distribution, is stationary for the evolution given by eqn [1]. In addition, $M$ minimizes $H$ among all distributions with given total mass $\rho$, given mean velocity $u$, and mean energy. The parameter $\beta$ is interpreted as the inverse temperature.

In conclusion, Boltzmann was able to introduce not only an evolutionary equation with the remarkable properties expressing mass, momentum, and energy conservation, but also the trend to thermal equilibrium. In other words, he tried to reconcile Newton's laws with the second principle of thermodynamics.

The Boltzmann Heuristic Argument

Thus, we want to find an evolution equation for the quantity $f(x, v; t)$. The molecular system we are considering consists of $N$ identical particles of diameter $r$ in the whole space $\mathbb{R}^3$. We denote by $x_1, v_1, \ldots, x_N, v_N$ a state of the system, where $x_i$ and $v_i$ indicate the position and the velocity of the particle $i$. The particles cannot overlap (i.e., the centers of two particles cannot be at a distance smaller than the particle diameter $r$).

The particles are moving freely up to the first instance of contact, that is, the first time when two particles (say particles $i$ and $j$) arrive at a distance $r$. Then the pair interacts when an elastic collision occurs. This means that they change instantaneously
their velocities, according to the conservation of the energy and linear and angular momentum. More precisely, the velocities after a collision with incoming velocities $v$ and $v_1$ are those given by formula [3]. After the first collision, the system evolves by iterating the procedure. Here we neglect triple collisions because they are unlikely. The evolution equation for a tagged particle is then of the form

\[ (\partial_t + v\cdot\nabla_x) f = \mathrm{Coll} \tag*{[7]} \]

where Coll denotes the variation of $f$ due to the collisions.

We have

\[ \mathrm{Coll} = G - L \tag*{[8]} \]

where $L$ and $G$ (the loss and gain terms, respectively) are the negative and positive contributions to the variation of $f$ due to the collisions. More precisely, $L\,dx\,dv\,dt$ is the probability of the test particle to disappear from the cell $dx\,dv$ of the phase space because of a collision in the time interval $(t, t + dt)$ and $G\,dx\,dv\,dt$ is the probability to appear in the same time interval for the same reason. Let us consider the sphere of center $x$ with radius $r$ and a point $x + rn$ over the surface, where $n$ denotes the generic unit vector. Consider also the cylinder with base area $dS = r^2\,dn$ and height $|V|\,dt$ along the direction of $V = v_2 - v$.

Then a given particle (say particle 2) with velocity $v_2$ can contribute to $L$ because it can collide with the test particle in the time $dt$, provided it is localized in the cylinder and if $V\cdot n \leq 0$. Therefore, the contribution to $L$ due to the particle 2 is the probability of finding such a particle in the cylinder (conditioned to the presence of the first particle in $x$). This quantity is $f_2(x, v, x + nr, v_2)\,|(v_2 - v)\cdot n|\,r^2\,dn\,dv_2\,dt$, where $f_2$ is the joint distribution of two particles. Integrating in $dn$ and $dv_2$, we obtain that the total contribution to $L$ due to any predetermined particle is

\[ r^2 \int dv_2 \int_{S_-^2} dn\, f_2(x, v, x + nr, v_2)\, |(v_2 - v)\cdot n| \tag*{[9]} \]

where $S_-^2$ is the unit hemisphere $(v_2 - v)\cdot n < 0$. Finally, we obtain the total contribution multiplying by the total number of particles:

\[ L = (N - 1)\, r^2 \int dv_2 \int_{S_-^2} dn\, f_2(x, v, x + nr, v_2)\, |(v_2 - v)\cdot n| \tag*{[10]} \]

The gain term can be derived analogously by considering that we are looking at particles which have velocities $v$ and $v_2$ after the collisions so that we have to integrate over the hemisphere $S_+^2 = \{(v_2 - v)\cdot n > 0\}$:

\[ G = (N - 1)\, r^2 \int dv_2 \int_{S_+^2} dn\, f_2(x, v, x + nr, v_2)\, |(v_2 - v)\cdot n| \tag*{[11]} \]

Combining $G$ and $L$ as in [8], we get

\[ \mathrm{Coll} = (N - 1)\, r^2 \int dv_2 \int_{S^2} dn\, f_2(x, v, x + nr, v_2)\, (v_2 - v)\cdot n \tag*{[12]} \]

which, however, is not a very useful expression because the time derivative of $f$ is expressed in terms of another object, namely $f_2$. An evolution equation for $f_2$ will involve $f_3$, the joint distribution of three particles, and so on, until the total particle number $N$ is included. Here the main assumption of Boltzmann enters, namely that two given particles are uncorrelated if the gas is rarefied, namely

\[ f_2(x, v, x_2, v_2) = f(x, v)\, f(x_2, v_2) \tag*{[13]} \]

Condition [13], referred to as the propagation of chaos, seems contradictory at first sight: if two particles collide, correlations are created. Even though we could assume eqn [13] at some time, if the test particle collides with particle 2, such an equation cannot be satisfied anymore after the collision.

Before discussing the propagation of chaos hypothesis, we first analyze the size of the collision operator. We remark that, in practical situations for a rarefied gas, the combination $N r^3 \approx 10^{-4}\ \mathrm{cm}^3$ (i.e., the volume occupied by the particles) is very small, while $N r^2 = O(1)$. This implies that $G = O(1)$. Therefore, since we are dealing with a very large number of particles, we are tempted to perform the limit $N \to \infty$ and $r \to 0$ in such a way that $r^2 = O(N^{-1})$. As a consequence, the probability that two tagged particles collide (which is of the order of the surface of a ball, i.e., $O(r^2)$) is negligible. However, the probability that a given particle performs a collision with any one of the remaining $N - 1$ particles (which is $O(N r^2) = O(1)$) is not negligible. Therefore, condition [13] is referring to two preselected particles (say particles 1 and 2), so that it is not unreasonable to conceive that it holds in the limiting situation in which we are working.

However, we cannot insert [13] in [12] because this latter equation refers to instants before and after the collision and, if we know that a collision took place, we certainly cannot invoke eqn [13]. Hence, it is more convenient to assume eqn [13] in the loss term and work on the gain term to take advantage
of the factorization property which will be assumed only before the collision.

Coming back to eqn [11] for the outgoing pair velocities $v$, $v_2$ (satisfying the condition $(v_2 - v)\cdot n > 0$), we make use of the continuity property

\[ f_2(x, v, x + nr, v_2) = f_2(x, v', x + nr, v_2') \tag*{[14]} \]

where the pair $v'$, $v_2'$ is pre-collisional. On $f_2$ expressed before the collision, we can reasonably apply condition [13] and obtain

\[ G - L = (N - 1)\, r^2 \int dv_2 \int_{S_-^2} dn\, (v - v_2)\cdot n\, \left[ f(x, v')\, f(x - nr, v_2') - f(x, v)\, f(x + nr, v_2) \right] \tag*{[15]} \]

after a change $n \to -n$ in the gain term, using the notation $S_-^2$ for the hemisphere $\{n \mid (v_2 - v)\cdot n \leq 0\}$. This transforms the pair $v'$, $v_2'$ from a pre-collisional to a post-collisional pair.

Finally, in the limit $N \to \infty$, $r \to 0$, $N r^2 = \lambda^{-1}$, we find

\[ (\partial_t + v\cdot\nabla_x) f = \lambda^{-1} \int dv_2 \int_{S_-^2} dn\, (v - v_2)\cdot n\, \left[ f(x, v')\, f(x, v_2') - f(x, v)\, f(x, v_2) \right] \tag*{[16]} \]

The parameter $\lambda$, called mean free path, represents, roughly speaking, the typical length a particle can cover without undergoing any collision. In eqns [1] and [2], we just chose $\lambda = 1$.

Equation [16] (or, equivalently, eqns [1] and [2]) is the Boltzmann equation for hard spheres. Such an equation has a statistical nature, and it is not equivalent to the Hamiltonian dynamics from which it has been derived. Indeed, the H-theorem shows that such an equation is not reversible in time, as one would expect of any law of mechanics.

This concludes the heuristic preliminary analysis of the Boltzmann equation. We certainly know that the above arguments are delicate and require a more rigorous and deeper analysis. If we want the Boltzmann equation not to be a phenomenological model, derived by ad hoc assumptions and justified only by its practical relevance, but rather a consequence of a mechanical model, we must derive it rigorously. In particular, the propagation of chaos should be not a hypothesis but the statement of a theorem.

Beyond the Hard Spheres

The heuristic arguments we have developed so far can be extended to potentials other than that of the hard-sphere systems. If the particles interact via a two-body interaction $V = V(r)$, the resulting Boltzmann equation is eqn [1], with

\[ Q(f, f) = \int dv_1 \int_{S_+^2} dn\, B(v - v_1; n)\, \left[ f' f_1' - f f_1 \right] \tag*{[17]} \]

where we are using the usual shorthand notation:

\[ f' = f(x, v'), \quad f_1' = f(x, v_1'), \quad f = f(x, v), \quad f_1 = f(x, v_1) \tag*{[18]} \]

and $B = B(v - v_1; n)$ is a suitable function of the relative velocity $v - v_1$ and the impact parameter $n$, which is proportional to the cross section relative to the potential $V$. Another equivalent, sometimes more convenient, way to express eqn [17] is

\[ Q(f, f) = \int dv_1 \int dv' \int dv_1'\, W(v, v_1 \mid v', v_1')\, \left[ f' f_1' - f f_1 \right] \tag*{[19]} \]

with

\[ W(v, v_1 \mid v', v_1') = w(v, v_1 \mid v', v_1')\, \delta(v + v_1 - v' - v_1')\, \delta\!\left( \tfrac{1}{2}\left( v^2 + v_1^2 - (v')^2 - (v_1')^2 \right) \right) \tag*{[20]} \]

where $w$ is a suitable kernel. All the qualitative properties, such as the conservation laws and the H-theorem, are obviously still valid.

Consequences

The Boltzmann equation provoked a debate involving Loschmidt, Zermelo, and Poincaré, who outlined inconsistencies between the irreversibility of the equation and the reversible character of the Hamiltonian dynamics. Boltzmann argued the statistical nature of his equation and his answer to the irreversibility paradox was that most of the configurations behave as expected by the thermodynamical laws. However, he did not have the probabilistic tools for formulating in a precise way the statements of which he had a precise intuition.

Grad (1949) stated clearly the limit $N \to \infty$, $r \to 0$, $N r^2 \to \text{const.}$, where $N$ is the number of particles and $r$ is the diameter of the molecules, in which the Boltzmann equation is expected to hold. This limit is usually called the Boltzmann–Grad limit (B–G limit in the sequel).

The problem of a rigorous derivation of the Boltzmann equation was an open and challenging problem for a long time. Lanford (1975) showed that, although for a very short time, the Boltzmann equation can be derived starting from the mechanical model of the hard-sphere system. The proof has a deep content but is relatively simple from a technical viewpoint.
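The scaling behind the Boltzmann–Grad limit can be made concrete with a few lines of arithmetic. The sketch below is only an illustration, not part of the original derivation: the function name `boltzmann_grad_scaling` and the choice $N r^2 = 1$ are assumptions made here. It fixes $N r^2$ and shows that the occupied volume $N r^3$ then decays like $N^{-1/2}$, which is why the gas becomes dilute while the collision rate per particle stays of order one.

```python
import math


def boltzmann_grad_scaling(n_particles, nr2=1.0):
    """Return (r, N*r^2, N*r^3) when the diameter r is chosen so that N*r^2 = nr2.

    Hypothetical helper for illustration: nr2 plays the role of the constant
    in the Boltzmann-Grad limit (the inverse mean free path).
    """
    r = math.sqrt(nr2 / n_particles)
    return r, n_particles * r**2, n_particles * r**3


if __name__ == "__main__":
    for n in (10**3, 10**6, 10**9, 10**12):
        r, surface, volume = boltzmann_grad_scaling(n)
        # surface stays fixed (collisions per particle remain O(1)),
        # volume shrinks like N^(-1/2) (the gas becomes rarefied).
        print(f"N={n:.0e}  r={r:.3e}  N r^2={surface:.3f}  N r^3={volume:.3e}")
```

Any other positive value of `nr2` gives the same qualitative picture; only the constant mean free path changes.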
Existence

The mathematical study of the Boltzmann equation starts with the problem of proving the existence of the solutions. One would like to be able to show that, for all (or at least for a physically significant family of) initial distributions (which are positive and summable functions) with finite momentum, energy, and entropy, there exists a unique solution to eqn [1] with the same mass, momentum, and energy as the initial distribution. Moreover, the entropy should decrease and the solution should approach the right Maxwellian as $t \to \infty$. The problem, in such a generality, is still unsolved, but several results in this direction have been achieved since the pioneering works due to Carleman (1933) for the homogeneous equation. Actually, there are satisfactory results for some special situations, such as the homogeneous solutions (independent of $x$) close to the equilibrium, to the vacuum, or to homogeneous data. The most general result we have up to now is, unfortunately, not constructive. This is due to DiPerna and Lions (1989), who showed the existence of suitable weak solutions to eqn [1]. However, we still do not know whether such solutions, which preserve mass and momentum, and satisfy the H-theorem, are unique and also preserve the energy.

Hydrodynamics

The derivation of hydrodynamical equations from the Boltzmann equation is a problem as old as the equation itself and, in fact, it goes back to Maxwell and Hilbert. Preliminary to the discussion of the hydrodynamic limit, we establish a few properties of the collision kernel.

It is a well-known fact that the only solution to the equation

\[ Q(f, f) = 0 \tag*{[21]} \]

is a local Maxwellian, namely

\[ f(x, v) := M(x, v) = \frac{\rho(x)}{(2\pi T(x))^{3/2}}\, e^{-|v - u(x)|^2 / 2T(x)} \tag*{[22]} \]

where the local parameters $\rho$, $u$, and $T$ satisfy the relations

\[ \int M\, dv = \rho \tag*{[23]} \]

\[ \int v\, M\, dv = \rho u \tag*{[24]} \]

\[ \frac{1}{2}\int v^2 M\, dv = \frac{3}{2}\,\rho T + \frac{1}{2}\,\rho u^2 \tag*{[25]} \]

Moreover, the only solution to the equation

\[ \int h(v)\, Q(f, f)\, dv = 0 \tag*{[26]} \]

is any linear combination of the quantities $(1, v, v^2)$, called collision invariants. The last property obviously corresponds to the mass, momentum, and energy conservation.

With this in mind, consider a change of variables in the Boltzmann equation [1], passing from microscopic to macroscopic variables, $x \to \varepsilon x$, $t \to \varepsilon t$. Here $\varepsilon$ is a small scale parameter expressing the ratio between the typical interparticle distances and the typical distances over which the macroscopic equations are varying. Such a change yields

\[ (\partial_t + v\cdot\nabla_x) f_\varepsilon = \frac{1}{\varepsilon}\, Q(f_\varepsilon, f_\varepsilon) \tag*{[27]} \]

We need to allow the small parameter $\varepsilon$ (mean free path or the Knudsen number) to tend to zero. In order to eliminate the singularity on the right-hand side of [27], we multiply both sides by the collision invariants $v^\alpha$ with $\alpha = 0, 1, 2$, and obtain the five equations:

\[ \int dv\, v^\alpha\, (\partial_t + v\cdot\nabla_x) f_\varepsilon = 0 \tag*{[28]} \]

On the other hand, if $f_\varepsilon$ converges to $f$ as $\varepsilon \to 0$, necessarily $Q(f, f) = 0$ and hence $f = M$. Therefore, we expect that in the limit $\varepsilon \to 0$,

\[ \int dv\, v^\alpha\, (\partial_t + v\cdot\nabla_x) M = 0 \tag*{[29]} \]

Equation [29] fixes a relation among the fields $\rho$, $u$, $T$ as functions of $x$ and $t$. A standard computation gives us the Euler equations for compressible gas

\[ \partial_t \rho + \operatorname{div}(\rho u) = 0 \tag*{[30]} \]

\[ \partial_t u + (u\cdot\nabla) u + \frac{1}{\rho}\,\nabla p = 0 \tag*{[31]} \]

\[ \partial_t T + (u\cdot\nabla) T + \frac{2}{3}\, T\, \nabla\cdot u = 0 \tag*{[32]} \]

where the pressure $p$ is related to the density $\rho$ and the temperature $T$ by the perfect gas law

\[ p = \rho T \tag*{[33]} \]

In order to make the above arguments rigorous, Hilbert (1916) developed a useful tool, called the Hilbert expansion, to control the limiting procedure.
Namely, he expressed a formal solution to eqn [27] in the form of a power series expansion:

$$f^\varepsilon = \sum_{j \ge 0} f_j\, \varepsilon^j \qquad [34]$$

where $f_0$ is the local Maxwellian, with the parameters $\rho$, $u$, $T$ satisfying the Euler equations. All the other coefficients $f_j$ of the expansion can be determined by recurrence, inverting suitable operators. However, the series is not expected to be convergent, so that the way to show the validity of the hydrodynamical limit rigorously is to truncate the expansion and to control the remainder. The first result in this direction was obtained by Caflisch (1980). However, this approach is based on the regularity of the solutions to the Euler equations, which is known to hold only for short times since shocks can be formed. How to approximate the shocks in terms of a kinetic description is still a difficult and open problem.

Note that the hydrodynamical picture of the Boltzmann equation just means that we are looking at the solutions of this equation at a suitable macroscopic scale. The rarefaction hypothesis underlying the Boltzmann description is reflected in the law of perfect gas, which states that the particles, in the local thermal equilibrium, are free.

Stationary Problems

Stationary non-Maxwellian solutions to the Boltzmann equation should describe stationary nonequilibrium states exhibiting nontrivial flows. In spite of the physical relevance of these problems, not many complete mathematical results are, at the moment, available. Among them, there is the traveling-wave problem, which can be formulated in the following way. We look for a solution $f = f(x - ct, v)$, $f : \mathbb{R} \times \mathbb{R}^3 \to \mathbb{R}^+$, constant in form but traveling with a constant velocity $c > 0$, to

$$(v_1 - c)\, f' = Q(f,f) \qquad [35]$$

where $v_1$ is the first component of $v$ and $f'$ denotes the spatial derivative of $f$. Equation [35] must be complemented by the boundary conditions $f \to M^\pm$ as $x \to \pm\infty$, where $M^\pm$ are the right and left Maxwellians, namely two prescribed equilibrium situations at infinity. The parameters (density, mean velocity, and temperature) of the Maxwellians, however, cannot be chosen arbitrarily. Indeed, the conservation of mass, momentum, and energy (which are properties of $Q$) implies the conservation (in $x$) of the fluxes of these quantities. Hence, we have to impose five equations that relate the upstream and the downstream values of the densities, mean velocities, and temperatures. Such relations are known in gas dynamics as the Rankine–Hugoniot conditions. A solution of this problem has been found by Caflisch and Nicolaenko (1983) in the case of a weak shock (namely, when $M^+$ and $M^-$ are close) by using Hilbert expansion techniques. More recently, Liu and Yu (2004) also established stability and positivity of this solution.

Quantum Kinetic Theory

Uehling and Uhlenbeck (1933) introduced the following kinetic equation for describing a large system of weakly interacting bosons or fermions:

$$(\partial_t + v \cdot \nabla_x) f = \int dv_1 \int dv' \int dv_1'\, W(v, v_1 | v', v_1') \left\{ (1 \pm f)(1 \pm f_1)\, f' f_1' - (1 \pm f')(1 \pm f_1')\, f f_1 \right\} \qquad [36]$$

Here the upper and lower signs ($+/-$) stand for bosons and fermions, respectively, and

$$W(v, v_1 | v', v_1') = \left[ \hat V(v' - v) \pm \hat V(v' - v_1) \right]^2 \delta(v + v_1 - v' - v_1')\, \delta\!\left( \tfrac{1}{2} \left( v^2 + v_1^2 - v'^2 - v_1'^2 \right) \right) \qquad [37]$$

Moreover,

$$\hat V(p) = \int dx\, e^{i p \cdot x}\, V(x) \qquad [38]$$

where $V$ is the interaction potential. Note that eqn [37] is the expression of the cross section of a quantum scattering in the Born approximation.

The unknown $f = f(x, v; t)$ in eqn [36] is the expected number of molecules falling in the unit (quantum) cell of the phase space. This function is proportional to the one-particle Wigner function, introduced by Wigner (1932) to handle kinetic problems in quantum mechanics, and defined as (setting $\hbar = 1$):

$$\frac{1}{(2\pi)^3} \int dy\, e^{i y \cdot v}\, \rho\!\left( x + \tfrac{1}{2} y;\; x - \tfrac{1}{2} y \right)$$

where $\rho(x; z)$ is the kernel of a one-particle density matrix. Basically, the Wigner function is an equivalent way to describe a state of a quantum system. For instance, eqn [40] below expresses the equilibrium distributions for bosons and fermions in terms of Wigner functions. In general, the Wigner functions, due to the uncertainty principle, are real but not necessarily positive; however, the integral with respect to x and v gives the probability
distributions of the velocity and the position, respectively. In the kinetic regime, in which we are interested, the scales are mesoscopic, namely the typical quantum oscillations are on a scale much smaller than the characteristic scales of the problem, so that we expect that $f$ should be a genuine probability distribution, since the Heisenberg principle does not play an essential role. However, the interaction occurs on a microscopic scale, so that we expect that the statistics play a role in addition to the quantum rules for the scattering.

In this framework, the entropy functional is

$$H(f) = \int dx \int dv \left[ f(x,v) \log f(x,v) \mp \big(1 \pm f(x,v)\big) \log\big(1 \pm f(x,v)\big) \right] \qquad [39]$$

It is decreasing along the solutions to eqn [36] and it is also minimized (among the distributions with given mass, momentum, and energy) by the equilibria

$$M(v) = \frac{z}{e^{(\beta/2)|v-u|^2} \mp z} \qquad [40]$$

namely the Bose–Einstein and the Fermi–Dirac distributions, respectively. Here $\beta > 0$ and $z > 0$ are the inverse temperature and the activity, respectively. Note that, for the Bose–Einstein distribution, $z < 1$. This creates, in a sense, an inconsistency with eqn [36]. Indeed, assuming $u = 0$ and an initial distribution $f = f_0(v)$ with density larger than the maximal density allowed by eqn [40], namely

$$\rho_c := \int dv\, \frac{1}{e^{(\beta/2) v^2} - 1} \qquad [41]$$

it cannot converge to any equilibrium. In order to overcome this difficulty related to the Bose condensation, one can enlarge the definition of the equilibria family by setting

$$M(v) = \frac{1}{e^{(\beta/2) v^2} - 1} + \lambda\, \delta(v) \qquad [42]$$

to take care of the excess of mass by means of a condensate component. However, it is not clear whether eqn [36] can actually describe the Bose condensation, since its derivation from the Schrödinger equation requires, from the very beginning, the existence of bosonic quasifree states, which can be constructed only if the density is moderate. Further analyses are certainly needed to clarify the situation. A rigorous derivation of the Uehling and Uhlenbeck equation is, up to now, far from being obtained even for short times; nevertheless, such an equation is extensively used in the applications.

Equation [36] concerns a weakly interacting gas of quantum particles. From a mathematical viewpoint, it is expected to be valid in the so-called weak-coupling limit, which consists in scaling space and time and the interaction potential as

$$x \to \varepsilon x, \qquad t \to \varepsilon t, \qquad V \to \sqrt{\varepsilon}\, V \qquad [43]$$

where $\varepsilon^{-1} = N^{1/3}$ is a parameter diverging when the number of particles $N$ tends to infinity.

We mention, incidentally, that under such a scaling, a classical system is described by a transport equation, called the Fokker–Planck–Landau equation, with a diffusion operator in the velocity space.

The Boltzmann–Grad (BG) limit considered for classical particle systems is different from that considered here for weakly interacting quantum systems. It is actually equivalent to rescaling space and time according to

$$x \to \varepsilon x, \qquad t \to \varepsilon t \qquad [44]$$

leaving the interaction unscaled but, in order to control the total interaction, letting the density diverge gently as $\varepsilon^{-1} = N^{1/2}$.

A quantum system under such a scaling is expected to be described by a Boltzmann equation [1] with the collision operator $Q$ computed with the full quantum cross section. Now we do not have any effect of the statistics because in this rarefaction limit these corrections disappear. On the other hand, the cross section is that arising from the analysis of the quantum scattering. Since we do not rescale the interaction, all the other terms in the Born expansion of the cross section play a role. This kind of Boltzmann equation is a good description of a rarefied gas in which quantum effects are not negligible.

See also: Adiabatic Piston; Evolution Equations: Linear and Nonlinear; Gravitational N-Body Problem (Classical); Interacting Particle Systems and Hydrodynamic Equations; Kinetic Equations; Multiscale Approaches; Nonequilibrium Statistical Mechanics: Dynamical Systems Approach; Quantum Dynamical Semigroups.

Further Reading

Balescu R (1978) Equilibrium and Nonequilibrium Statistical Mechanics. Moscow: Mir (distributed by Imported Publications, Chicago, Ill).
Caflisch RE (1980) The fluid dynamical limit of the nonlinear Boltzmann equation. Communications on Pure and Applied Mathematics 33: 651–666.
Caflisch RE and Nicolaenko B (1983) Shock waves and the Boltzmann equation. Nonlinear partial differential equations. Contemporary Mathematics 17: 35–44.
Carleman T (1933) Sur la théorie de l'équation intégro-différentielle de Boltzmann. Acta Mathematica 60: 91–146.
Cercignani C (1998) Ludwig Boltzmann. The Man Who Trusted Atoms. Oxford: Oxford University Press.
Cercignani C, Illner R, and Pulvirenti M (1994) The Mathematical Theory of Dilute Gases. Springer Series in Applied Mathematics, vol. 106. New York: Springer.
Di Perna RJ and Lions P-L (1989) On the Cauchy problem for the Boltzmann equations: Global existence and weak stability. Annals of Mathematics 130: 321–366.
Grad H (1949) On the kinetic theory of rarefied gases. Communications on Pure and Applied Mathematics 2: 331–407.
Hilbert D (1916) Begründung der Kinetischen Gastheorie. Mathematische Annalen 72: 331–407.
Lanford OE III (1975) Time evolution of large classical systems. In: Ehlers J, Hepp K, and Weidenmüller HA (eds.) Lecture Notes in Physics, vol. 38, pp. 1–111. Berlin: Springer.
Liu T-P and Yu S-H (2004) Boltzmann equation: micro–macro decompositions and positivity of shock profiles. Communications in Mathematical Physics 246(1): 133–179.
Spohn H (1994) Quantum kinetic equations. In: Fannes M, Maes C, and Verbeure A (eds.) On Three Levels: Micro, Meso and Macro Approaches in Physics. New York: Plenum.
Uehling EA and Uhlenbeck GE (1933) Transport phenomena in Einstein–Bose and Fermi–Dirac gases. I. Physical Review 43: 552–561.
Wigner EP (1932) On the quantum correction for thermodynamic equilibrium. Physical Review 40: 749–759.

Bose–Einstein Condensates

F Dalfovo, L P Pitaevskii, and S Stringari, Università di Trento, Povo, Italy
© 2006 Elsevier Ltd. All rights reserved.

Introduction

In 1924 the Indian physicist S N Bose introduced a new statistical method to derive the blackbody radiation law in terms of a gas of light quanta (photons). His work, together with de Broglie's contemporary idea of matter–wave duality, led A Einstein to apply the same statistical approach to a gas of $N$ indistinguishable particles of mass $m$. An amazing result of his theory was the prediction that below some critical temperature a finite fraction of all the particles condense into the lowest-energy single-particle state. This phenomenon, named Bose–Einstein condensation (BEC), is a consequence of purely statistical effects. For several years, such a prediction received little attention, until 1938, when F London argued that BEC could be at the basis of the superfluid properties observed in liquid $^4$He below 2.17 K. A strong boost to the investigation of Bose–Einstein condensates was given in 1995 by the observation of BEC in dilute gases confined in magnetic traps and cooled down to temperatures of the order of a few nK. Unlike superfluid helium, these gases allow one to tune the relevant parameters (confining potential, particle density, interactions, etc.), making them an ideal testing ground for concepts and theories on BEC.

What Is BEC?

In nature, particles have either integer or half-integer spin. Those having half-integer spin, like electrons, are called fermions and obey the Fermi–Dirac statistics; those having integer spin are called bosons and obey the Bose–Einstein statistics. Let us consider a system of $N$ bosons. In order to introduce the concept of BEC on a general ground, one can start with the definition of the one-body density matrix

$$n^{(1)}(\mathbf{r}, \mathbf{r}') = \langle \hat\Psi^\dagger(\mathbf{r})\, \hat\Psi(\mathbf{r}') \rangle \qquad [1]$$

The quantities $\hat\Psi^\dagger(\mathbf{r})$ and $\hat\Psi(\mathbf{r})$ are the field operators which create and annihilate a particle at point $\mathbf{r}$, respectively; they satisfy the bosonic commutation relations

$$[\hat\Psi(\mathbf{r}), \hat\Psi^\dagger(\mathbf{r}')] = \delta(\mathbf{r} - \mathbf{r}'), \qquad [\hat\Psi(\mathbf{r}), \hat\Psi(\mathbf{r}')] = 0 \qquad [2]$$

If the system is in a pure state described by the $N$-body wave function $\Psi(\mathbf{r}_1, \ldots, \mathbf{r}_N)$, then the average [1] is taken following the standard rules of quantum mechanics and the one-body density matrix can be written as

$$n^{(1)}(\mathbf{r}, \mathbf{r}') = N \int d\mathbf{r}_2 \cdots d\mathbf{r}_N\, \Psi^*(\mathbf{r}, \mathbf{r}_2, \ldots, \mathbf{r}_N)\, \Psi(\mathbf{r}', \mathbf{r}_2, \ldots, \mathbf{r}_N) \qquad [3]$$

involving the integration over the $N - 1$ variables $\mathbf{r}_2, \ldots, \mathbf{r}_N$. In the more general case of a statistical mixture of pure states, expression [3] must be averaged according to the probability for a system to occupy the different states.

Since $n^{(1)}(\mathbf{r}, \mathbf{r}') = (n^{(1)}(\mathbf{r}', \mathbf{r}))^*$, the quantity $n^{(1)}$, when regarded as a matrix function of its indices $\mathbf{r}$ and $\mathbf{r}'$, is Hermitian. It is therefore always possible to find a complete orthonormal basis of single-particle eigenfunctions, $\varphi_i(\mathbf{r})$, in terms of which the density matrix takes the diagonal form

$$n^{(1)}(\mathbf{r}, \mathbf{r}') = \sum_i n_i\, \varphi_i^*(\mathbf{r})\, \varphi_i(\mathbf{r}') \qquad [4]$$

The real eigenvalues $n_i$ are subject to the normalization condition $\sum_i n_i = N$ and have the meaning of occupation numbers of the single-particle states $\varphi_i$. BEC occurs when one of these numbers (say, $n_0$) becomes macroscopic, that is, when $n_0 \equiv N_0$ is a number of order $N$, all the others remaining of order 1.
In this case eqn [4] can be conveniently rewritten in the form

$$n^{(1)}(\mathbf{r}, \mathbf{r}') = N_0\, \varphi_0^*(\mathbf{r})\, \varphi_0(\mathbf{r}') + \sum_{i \ne 0} n_i\, \varphi_i^*(\mathbf{r})\, \varphi_i(\mathbf{r}') \qquad [5]$$

and the state represented by $\varphi_0(\mathbf{r})$ is called Bose–Einstein condensate. This definition is rather general, since it applies to any macroscopic ($N \gg 1$) system of indistinguishable bosons independently of mutual interactions and external fields.

The one-body density matrix [1] contains information on important physical observables. By setting $\mathbf{r} = \mathbf{r}'$ one finds the diagonal density of the system

$$n(\mathbf{r}) \equiv n^{(1)}(\mathbf{r}, \mathbf{r}) = \langle \hat\Psi^\dagger(\mathbf{r})\, \hat\Psi(\mathbf{r}) \rangle \qquad [6]$$

with $N = \int d\mathbf{r}\, n(\mathbf{r})$. The off-diagonal components can instead be used to calculate the momentum distribution

$$n(\mathbf{p}) = \langle \hat\Psi^\dagger(\mathbf{p})\, \hat\Psi(\mathbf{p}) \rangle \qquad [7]$$

where $\hat\Psi(\mathbf{p}) = (2\pi\hbar)^{-3/2} \int d\mathbf{r}\, \hat\Psi(\mathbf{r}) \exp[i \mathbf{p} \cdot \mathbf{r}/\hbar]$ is the field operator in momentum representation. By inserting this expression for $\hat\Psi(\mathbf{p})$ into eqn [7] one finds

$$n(\mathbf{p}) = \frac{1}{(2\pi\hbar)^3} \int d\mathbf{R}\, d\mathbf{s}\; n^{(1)}\!\left( \mathbf{R} + \frac{\mathbf{s}}{2},\, \mathbf{R} - \frac{\mathbf{s}}{2} \right) e^{-i \mathbf{p} \cdot \mathbf{s}/\hbar} \qquad [8]$$

where $\mathbf{s} = \mathbf{r} - \mathbf{r}'$ and $\mathbf{R} = (\mathbf{r} + \mathbf{r}')/2$.

Let us consider a uniform system of $N$ particles in a volume $V$ and take the thermodynamic limit $N, V \to \infty$ with density $N/V$ kept fixed. The eigenfunctions of the density matrix are plane waves and the lowest-energy state has zero momentum, $\mathbf{p} = 0$, and constant wave function $\varphi_0(\mathbf{r}) = V^{-1/2}$. BEC in this state implies a macroscopic number of particles having zero momentum and constant density $N_0/V$. The density matrix only depends on $\mathbf{s} = \mathbf{r} - \mathbf{r}'$ and can be written as

$$n^{(1)}(\mathbf{s}) = \frac{N_0}{V} + \frac{1}{V} \sum_{\mathbf{p} \ne 0} n_{\mathbf{p}}\, e^{i \mathbf{p} \cdot \mathbf{s}/\hbar} \qquad [9]$$

In the $s \to \infty$ limit, the sum on the right vanishes due to destructive interference between different plane waves, but the first term survives. One thus finds that, in the presence of BEC, the one-body density matrix tends to a constant finite value at large distances. This behavior is named off-diagonal long-range order, since it involves the off-diagonal components of the density matrix. Its counterpart in momentum space is the appearance of a singular term at $\mathbf{p} = 0$:

$$n(\mathbf{p}) = N_0\, \delta(\mathbf{p}) + \sum_{\mathbf{p}' \ne 0} n_{\mathbf{p}'}\, \delta(\mathbf{p} - \mathbf{p}') \qquad [10]$$

The sum on the right is the number of noncondensed particles ($N - N_0$), and the quantity $N_0/N$ is called condensate fraction.

If the system is not uniform, the eigenfunctions of the density matrix are no longer plane waves but, provided $N$ is sufficiently large, the concept of BEC is still well defined, being associated with the occurrence of a macroscopic occupation of a single-particle eigenfunction $\varphi_0(\mathbf{r})$ of the density matrix. Thus, the condensed bosons can be described by means of the function $\Psi(\mathbf{r}) = \sqrt{N_0}\, \varphi_0(\mathbf{r})$, which is a classical complex field playing the role of an order parameter. This is the analog of the classical limit of quantum electrodynamics, where the electromagnetic field replaces the microscopic description of photons. The function $\Psi$ may also depend on time and can be written as

$$\Psi(\mathbf{r}, t) = |\Psi(\mathbf{r}, t)|\, e^{i S(\mathbf{r}, t)} \qquad [11]$$

Its modulus determines the contribution of the condensate to the diagonal density [6], while the phase $S$ is crucial in characterizing the coherence and superfluid properties of the system. The order parameter [11], also named macroscopic wave function or condensate wave function, is defined only up to a constant phase factor. One can always multiply this function by the numerical factor $e^{i\alpha}$ without changing any physical property. This reflects the gauge symmetry exhibited by all the physical equations of the problem. Making an explicit choice for the value of the order parameter, and hence for the phase, corresponds to a formal breaking of gauge symmetry.

BEC in Ideal Gases

Once we have defined what is a Bose–Einstein condensate, the next question is when such a condensation occurs in a given system. The ideal Bose gas provides the simplest example. So, let us consider a gas of noninteracting bosons described by the Hamiltonian $\hat H = \sum_i \hat H_i^{(1)}$, where the Schrödinger equation $\hat H^{(1)} \varphi_i(\mathbf{r}) = \epsilon_i\, \varphi_i(\mathbf{r})$ gives the spectrum of single-particle wave functions and energies. One can define an occupation number $n_i$ as the number of particles in the state with energy $\epsilon_i$. Thus, any given state of the many-body system is specified by a set $\{n_i\}$. The mean occupation numbers, $\bar n_i$, can be calculated by using the standard rules of statistical mechanics. For instance, by considering a grand canonical ensemble at temperature $T$, one finds

$$\bar n_i = \{\exp[\beta(\epsilon_i - \mu)] - 1\}^{-1} \qquad [12]$$
with $\beta = 1/(k_B T)$. The chemical potential $\mu$ is fixed by the normalization condition $\sum_i \bar n_i = N$, where $N$ is the average number of particles in the gas. For $T \to \infty$ the chemical potential is negative and large. It increases monotonically when $T$ is lowered. Let us call $\epsilon_0$ the lowest single-particle level in the spectrum. If at some critical temperature $T_c$ the normalization condition can be satisfied with $\mu \to \epsilon_0^-$, then the occupation of the lowest state, $\bar n_0 = N_0$, becomes of order $N$ and BEC is realized. Below $T_c$ the normalization condition must be replaced with $N = N_0 + N_T$, where $N_T = \sum_{i \ne 0} \bar n_i$ is the number of particles out of the condensate, that is, the thermal component of the gas. Whether BEC occurs or not, and what the value of $T_c$ is, depends on the dimensionality of the system and the type of single-particle spectrum.

The simplest case is that of a gas confined in a cubic box of volume $V = L^3$ with periodic boundary conditions, where $\hat H^{(1)} = -(\hbar^2/2m)\nabla^2$. The eigenfunctions are plane waves $\varphi_{\mathbf{p}}(\mathbf{r}) = V^{-1/2} \exp[i \mathbf{p} \cdot \mathbf{r}/\hbar]$, with energy $\epsilon_p = p^2/2m$ and momentum $\mathbf{p} = 2\pi\hbar\, \mathbf{n}/L$. Here $\mathbf{n}$ is a vector whose components $n_x$, $n_y$, $n_z$ are 0 or $\pm$ integers. The lowest eigenvalue has zero energy ($\epsilon_0 = 0$) and zero momentum. The mean occupation numbers are given by $\bar n_{\mathbf{p}} = \{\exp[\beta(p^2/2m - \mu)] - 1\}^{-1}$. In the thermodynamic limit ($N, V \to \infty$ with $N/V$ kept constant), one can replace the sum $\sum_{\mathbf{p}}$ with the integral $\int d\epsilon\, \rho(\epsilon)$, where $\rho(\epsilon) = (2\pi)^{-2} V (2m/\hbar^2)^{3/2} \sqrt{\epsilon}$ is the density of states. In this way, one can calculate the thermal component of the gas as a function of $T$, finding the critical temperature

$$k_B T_c = \frac{2\pi\hbar^2}{m} \left( \frac{N}{V\, \zeta(3/2)} \right)^{2/3} \qquad [13]$$

where $\zeta$ is the Riemann zeta function and $\zeta(3/2) \simeq 2.612$. For $T > T_c$, one has $\mu < 0$ and $N_T = N$. For $T < T_c$ one instead has $\mu = 0$, $N_T = N - N_0$ and

$$N_0(T) = N \left[ 1 - (T/T_c)^{3/2} \right] \qquad [14]$$

The critical temperature turns out to be fully determined by the density $N/V$ and by the mass of the constituents. These results were first obtained by A Einstein in his seminal paper and used by F London in the context of superfluid helium. We notice that the replacement of the sum with an integral in the above derivation is justified only if the thermal energy $k_B T$ is much larger than the energy spacing between single-particle levels, that is, if $k_B T \gg \hbar^2/2mV^{2/3}$. It is also worth noticing that the above expression for $T_c$ can be written as $\lambda_T^3 N/V \simeq 2.612$, where $\lambda_T = [2\pi\hbar^2/(m k_B T)]^{1/2}$ is the thermal de Broglie wavelength. This is equivalent to saying that BEC occurs when the mean distance between bosons is of the order of their de Broglie wavelength.

Another interesting case, which is relevant for the recent experiments with BEC in dilute gases confined in magnetic and/or optical traps, is that of an ideal gas subject to harmonic potentials. Let us consider, for simplicity, an isotropic external potential $V_{\rm ext}(\mathbf{r}) = (1/2)\, m \omega_{\rm ho}^2 r^2$. The single-particle Hamiltonian is $\hat H^{(1)} = -(\hbar^2/2m)\nabla^2 + V_{\rm ext}(\mathbf{r})$ and its eigenvalues are $\epsilon_{n_x, n_y, n_z} = (n_x + n_y + n_z + 3/2)\, \hbar\omega_{\rm ho}$. The corresponding density of states is $\rho(\epsilon) = (1/2)(\hbar\omega_{\rm ho})^{-3} \epsilon^2$. A natural thermodynamic limit for this system is obtained by letting $N \to \infty$ and $\omega_{\rm ho} \to 0$, while keeping the product $N \omega_{\rm ho}^3$ constant. The condition for BEC to occur is that $\mu$ approaches the value $\epsilon_{000} = (3/2)\hbar\omega_{\rm ho}$ from below by cooling the gas down to $T_c$. Following the same procedure as for the uniform gas, one finds

$$k_B T_c = \hbar\omega_{\rm ho} \left[ N/\zeta(3) \right]^{1/3} = 0.94\, \hbar\omega_{\rm ho} N^{1/3} \qquad [15]$$

and

$$N_0(T) = N \left[ 1 - (T/T_c)^3 \right] \qquad [16]$$

Notice that the condensate is not uniform in this case, since it corresponds to the lowest eigenfunction of the harmonic oscillator, which is a Gaussian of width $a_{\rm ho} = [\hbar/(m\omega_{\rm ho})]^{1/2}$. Correspondingly, the condensate in the momentum space is also a Gaussian, of width $a_{\rm ho}^{-1}$. This implies that, unlike the gas in a box, here the condensate can be seen both in coordinate and momentum space in the form of a narrow distribution emerging from a wider thermal component. Finally, results [15] and [16] remain valid even for anisotropic harmonic potentials, with trapping frequencies $\omega_x$, $\omega_y$, and $\omega_z$, provided the frequency $\omega_{\rm ho}$ is replaced by the geometric average $(\omega_x \omega_y \omega_z)^{1/3}$.

BEC in Interacting Gases

Actual condensates are made of interacting particles. The full many-body Hamiltonian is

$$\hat H = \int d\mathbf{r}\, \hat\Psi^\dagger(\mathbf{r})\, \hat H_0\, \hat\Psi(\mathbf{r}) + \frac{1}{2} \int d\mathbf{r}'\, d\mathbf{r}\, \hat\Psi^\dagger(\mathbf{r})\, \hat\Psi^\dagger(\mathbf{r}')\, V(\mathbf{r} - \mathbf{r}')\, \hat\Psi(\mathbf{r}')\, \hat\Psi(\mathbf{r}) \qquad [17]$$

where $V(\mathbf{r} - \mathbf{r}')$ is the particle–particle interaction and $\hat H_0 = -(\hbar^2/2m)\nabla^2 + V_{\rm ext}(\mathbf{r})$. Unlike the case of ideal gases, $\hat H$ is no longer a sum of single-particle Hamiltonians. However, the general definitions given in the section "What is BEC?" are still valid. In particular, the one-body density matrix, in the presence of BEC, can be separated as in eqn [5]. One
can write $n^{(1)}(\mathbf{r}, \mathbf{r}') = \Psi^*(\mathbf{r})\Psi(\mathbf{r}') + \tilde n^{(1)}(\mathbf{r}, \mathbf{r}')$, where $\Psi$ is the order parameter of the condensate ($\Psi^*(\mathbf{r})\Psi(\mathbf{r}')$ being of order $N$), while $\tilde n^{(1)}(\mathbf{r}, \mathbf{r}')$ vanishes for large $|\mathbf{r} - \mathbf{r}'|$. This is equivalent to saying that the bosonic field operator splits in two parts,

$$\hat\Psi(\mathbf{r}) = \Psi(\mathbf{r}) + \delta\hat\Psi(\mathbf{r}) \qquad [18]$$

where the first term is a complex function and the second one is the field operator associated with the noncondensed particles. This decomposition is particularly useful when the depletion of the condensate, that is, the fraction of noncondensed particles, is small. This happens when the interaction is weak, but also for particles with arbitrary interaction, provided the gas is dilute. In this case, one can expand the many-body Hamiltonian by treating the operator $\delta\hat\Psi$ as a small quantity.

A suitable strategy consists in writing the Heisenberg equation for the evolution of the field operators, $i\hbar\, \partial_t \hat\Psi = [\hat\Psi, \hat H]$, using the many-body Hamiltonian [17]:

$$i\hbar\, \partial_t \hat\Psi(\mathbf{r}, t) = \left[ \hat H_0 + \int d\mathbf{r}'\, \hat\Psi^\dagger(\mathbf{r}', t)\, V(\mathbf{r} - \mathbf{r}')\, \hat\Psi(\mathbf{r}', t) \right] \hat\Psi(\mathbf{r}, t) \qquad [19]$$

The zeroth order is thus obtained by replacing the operator $\hat\Psi$ with the classical field $\Psi$. In the integral containing the interaction $V(\mathbf{r} - \mathbf{r}')$, this replacement is, in general, a poor approximation when short distances $(\mathbf{r} - \mathbf{r}')$ are involved. In a dilute and cold gas, one can nevertheless obtain a proper expression for the interaction term by observing that, in this case, only binary collisions at low energy are relevant and these collisions are characterized by a single parameter, the s-wave scattering length, $a$, independently of the details of the two-body potential. This allows one to replace $V(\mathbf{r} - \mathbf{r}')$ in $\hat H$ with an effective interaction $V(\mathbf{r} - \mathbf{r}') = g\, \delta(\mathbf{r} - \mathbf{r}')$, where the coupling constant $g$ is given by $g = 4\pi\hbar^2 a/m$. The scattering length can be measured with several experimental techniques or calculated from the exact two-body potential. Using this pseudopotential and replacing the operator $\hat\Psi$ with the complex function $\Psi$ in the Heisenberg equation of motion, one gets

$$i\hbar\, \partial_t \Psi(\mathbf{r}, t) = \left( -\frac{\hbar^2 \nabla^2}{2m} + V_{\rm ext}(\mathbf{r}) + g |\Psi(\mathbf{r}, t)|^2 \right) \Psi(\mathbf{r}, t) \qquad [20]$$

This is known as the Gross–Pitaevskii (GP) equation and it was first introduced in 1961. It has the form of a nonlinear Schrödinger equation, the nonlinearity coming from the mean-field term, proportional to $|\Psi|^2$. It has been derived assuming that $N$ is large while the fraction of noncondensed atoms is negligible. On the one hand, this means that quantum fluctuations of the field operator have to be small, which is true when $n|a|^3 \ll 1$, where $n$ is the particle density. In fact, one can show that, at $T = 0$, the quantum depletion of the condensate is proportional to $(n|a|^3)^{1/2}$. On the other hand, thermal fluctuations also have to be negligible and this means that the theory is limited to temperatures much lower than $T_c$. Within these limits, one can identify the total density with the condensate density.

The stationary solution of eqn [20] corresponds to the condensate wave function in the ground state. One can write $\Psi(\mathbf{r}, t) = \Psi_0(\mathbf{r}) \exp(-i\mu t/\hbar)$, where $\mu$ is the chemical potential. Then the GP equation [20] becomes

$$\left( -\frac{\hbar^2 \nabla^2}{2m} + V_{\rm ext}(\mathbf{r}) + g |\Psi_0(\mathbf{r})|^2 \right) \Psi_0(\mathbf{r}) = \mu\, \Psi_0(\mathbf{r}) \qquad [21]$$

where $n(\mathbf{r}) = |\Psi_0(\mathbf{r})|^2$ is the particle density. The same equation can be obtained by minimizing, at fixed particle number, the energy of the system written as a functional of the density:

$$E[n] = \int d\mathbf{r} \left[ \frac{\hbar^2}{2m} |\nabla \sqrt{n}|^2 + n V_{\rm ext}(\mathbf{r}) + \frac{g n^2}{2} \right] \qquad [22]$$

The first term on the right corresponds to the quantum kinetic energy coming from the uncertainty principle; it is usually named "quantum pressure" and vanishes for uniform systems.

The next order in $\delta\hat\Psi$ gives the excited states of the condensate. In a uniform gas the ground-state order parameter, $\Psi_0$, is a constant and the first-order expansion of $\hat H$ was introduced by N Bogoliubov in 1947. In particular, he found an elegant way to diagonalize the Hamiltonian by using simple linear combinations of particle creation and annihilation operators. These are known as Bogoliubov's transformations and lie at the basis of the concept of quasiparticle, one of the most important concepts in quantum many-body theory.

A generalization of Bogoliubov's approach to the case of nonuniform condensates is obtained by considering small deviations around the ground state in the form

$$\Psi(\mathbf{r}, t) = e^{-i\mu t/\hbar} \left[ \Psi_0(\mathbf{r}) + u(\mathbf{r})\, e^{-i\omega t} + v^*(\mathbf{r})\, e^{i\omega t} \right] \qquad [23]$$

Inserting this expression into eqn [20] and keeping terms linear in the complex functions $u$ and $v$, one gets

$$\hbar\omega\, u(\mathbf{r}) = \left[ \hat H_0 - \mu + 2g \Psi_0^2(\mathbf{r}) \right] u(\mathbf{r}) + g \Psi_0^2(\mathbf{r})\, v(\mathbf{r}) \qquad [24]$$

$$-\hbar\omega\, v(\mathbf{r}) = \left[ \hat H_0 - \mu + 2g \Psi_0^2(\mathbf{r}) \right] v(\mathbf{r}) + g \Psi_0^2(\mathbf{r})\, u(\mathbf{r}) \qquad [25]$$
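As an illustration of how eqns [24] and [25] determine the excitation energies, the following minimal sketch treats the uniform case ($V_{\rm ext} = 0$). It assumes plane-wave amplitudes $u, v \propto e^{i\mathbf{q}\cdot\mathbf{r}}$, a real constant $\Psi_0 = \sqrt{n}$, and $\mu = gn$ (which follows from eqn [21] for a uniform gas); the function name and the units $\hbar = m = 1$ are choices made here for illustration. Under these assumptions the two equations reduce to a $2 \times 2$ eigenvalue problem, whose positive eigenvalue can be compared with the closed form $[\epsilon_q(\epsilon_q + 2gn)]^{1/2}$, with $\epsilon_q = \hbar^2 q^2/2m$.

```python
import numpy as np

def bogoliubov_energy(q, g_n, hbar=1.0, m=1.0):
    """Positive eigenvalue of the 2x2 problem obtained from eqns [24]-[25]
    for plane-wave u, v in a uniform gas (assumes mu = g*n, Psi_0 real)."""
    eps_q = (hbar * q) ** 2 / (2.0 * m)      # free-particle energy
    a = eps_q + g_n                          # H_0 - mu + 2 g n acting on a plane wave
    # hbar*omega * (u, v) = [[a, g_n], [-g_n, -a]] (u, v)
    matrix = np.array([[a, g_n], [-g_n, -a]])
    return np.max(np.linalg.eigvals(matrix).real)

g_n = 1.0                                    # interaction energy g*n (illustrative value)
for q in (0.01, 1.0, 10.0):
    eps_q = q**2 / 2.0
    closed_form = np.sqrt(eps_q * (eps_q + 2.0 * g_n))
    print(q, bogoliubov_energy(q, g_n), closed_form)
```

For small $q$ the computed energy grows linearly in $q$ (phonon-like), while for large $q$ it approaches the free-particle energy shifted by $gn$.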
These coupled equations allow one to calculate the energies $\varepsilon = \hbar\omega$ of the excitations. They also give the so-called quasiparticle amplitudes $u$ and $v$, which obey the normalization condition

$$\int d\mathbf{r} \left[ u_i^*(\mathbf{r})\, u_j(\mathbf{r}) - v_i^*(\mathbf{r})\, v_j(\mathbf{r}) \right] = \delta_{ij}$$

In a uniform gas, $u$ and $v$ are plane waves and one recovers the famous Bogoliubov spectrum

$$\hbar\omega = \left[ \frac{\hbar^2 q^2}{2m} \left( \frac{\hbar^2 q^2}{2m} + 2gn \right) \right]^{1/2} \qquad [26]$$

where $q$ is the wave vector of the excitations. For large momenta the spectrum coincides with the free-particle energy $\hbar^2 q^2/2m$. At low momenta, it instead gives the phonon dispersion $\omega = cq$, where $c = [gn/m]^{1/2}$ is the Bogoliubov sound velocity. The transition between the two regimes occurs when the excitation wavelength is of the order of the healing length,

$$\xi = (8\pi n a)^{-1/2} = \frac{\hbar}{mc\sqrt{2}} \qquad [27]$$

which is an important length scale for superfluidity. When the order parameter is forced to vanish at some point (by an impurity, a wall, etc.), the healing length provides the typical distance over which it recovers its bulk value. In a nonuniform condensate the excitations are no longer plane waves but, at low energy, they still have a phonon-like character, in the sense that they involve a collective motion of the condensate.

The GP equation [20] is the starting point for an accurate mean-field description of BEC in dilute cold gases, which is rigorous at $T = 0$ and for $n|a|^3 \ll 1$. Static and dynamic properties of condensates in different geometries can be calculated by solving the GP equation numerically or using suitable approximate methods. The inclusion of effects beyond mean field is a highly nontrivial and interesting problem. A rather extreme case is represented by liquid $^4$He, which is a dense system where the interaction between atoms causes a large depletion of the condensate even at $T = 0$ ($N_0/N$ being less than 10%) and thus a full many-body treatment is required for its rigorous description. Nevertheless, even in this case, the general definitions of the section "What is BEC?" are still useful.

Superfluidity and Coherence

With the word superfluidity, one summarizes a complex of macroscopic phenomena occurring in quantum fluids under particular conditions: persistent currents, equilibrium states at rest in rotating vessels, viscousless motion, quantized vorticity, and others. These features can also be observed in BEC. The link between BEC and superfluidity is given by the phase of the order parameter [11]. To understand this point, let us consider a uniform system. If $\hat\Psi(\mathbf{r}, t)$ is a solution of the Heisenberg equation [19] with $V_{\rm ext} = 0$, then

$$\hat\Psi'(\mathbf{r}, t) = \hat\Psi(\mathbf{r} - \mathbf{v}t, t)\, \exp\!\left[ \frac{i}{\hbar} \left( m \mathbf{v} \cdot \mathbf{r} - \frac{1}{2} m v^2 t \right) \right] \qquad [28]$$

where $\mathbf{v}$ is a constant vector, is also a solution. This equation gives the Galilean transformation of the field operator and also applies to its condensate component $\Psi$. At equilibrium, the ground-state order parameter is given by $\Psi_0 = \sqrt{n} \exp(-i\mu t/\hbar)$, where $n$ is a constant independent of $\mathbf{r}$. In a frame where the condensate moves with velocity $\mathbf{v}$, the order parameter instead takes the form $\Psi_0 = \sqrt{n} \exp(iS)$, with $S(\mathbf{r}, t) = \hbar^{-1} [m \mathbf{v} \cdot \mathbf{r} - (m v^2/2 + \mu) t]$. The velocity of the condensate can thus be identified with the gradient of the phase $S$:

$$\mathbf{v}(\mathbf{r}, t) = \frac{\hbar}{m} \nabla S(\mathbf{r}, t) \qquad [29]$$

This definition is also valid for $\mathbf{v}$ varying slowly in space and time. The modulus of the order parameter plays a minor role in this definition and it is not necessary to assume the gas to be dilute and close to $T = 0$. Indeed, the relation [29] between the velocity field and the phase of the order parameter also applies in the presence of large quantum depletion, as in superfluid $^4$He, and at $T \ne 0$. In this case, $n$ should not be identified with the condensate density. Conversely, in dilute gases at $T = 0$, $n$ is the condensate density and the velocity [29] can be simply obtained by applying the usual definition of current density operator, $\hat{\mathbf{j}}$, to the order parameter [11].

The velocity [29] describes a potential flow and corresponds to a collective motion of many particles occupying a single quantum state. Being equal to the gradient of a scalar function, it is irrotational ($\nabla \times \mathbf{v}_s = 0$) and satisfies the Onsager–Feynman quantization condition $\oint \mathbf{v}_s \cdot d\mathbf{l} = \kappa\, h/m$, with $\kappa$ a non-negative integer. These conditions are not satisfied by a classical fluid, where the hydrodynamic velocity field, $\mathbf{v}(\mathbf{r}, t) = \mathbf{j}(\mathbf{r}, t)/n(\mathbf{r}, t)$, is the average over many different states and does not correspond to a potential flow.

By using the definition of the phase $S$ and velocity $\mathbf{v}$, together with particle conservation, one can show that the dynamics of a condensate, as far as macroscopic motions are concerned, is governed by the hydrodynamic equations of an irrotational
nonviscous fluid. Within the mean-field theory, this can be easily seen by rewriting the GP equation [20] in terms of the density $n = |\Psi|^2$ and the velocity [29]. Neglecting the quantum pressure term $\nabla^2\sqrt{n}$ (hence limiting the description to length scales larger than the healing length $\xi$), one gets

$$ \frac{\partial n}{\partial t} + \nabla \cdot (\mathbf{v} n) = 0 \qquad [30] $$

and

$$ m\frac{\partial \mathbf{v}}{\partial t} + \nabla\left(V_{\rm ext} + \mu(n) + \frac{m v^2}{2}\right) = 0 \qquad [31] $$

with the local chemical potential $\mu(n) = gn$. These equations have the typical structure of the dynamic equations of superfluids at zero temperature and can be viewed as the $T = 0$ case of the more general Landau two-fluid theory.

One of the most striking pieces of evidence for superfluidity is the observation of quantized vortices, that is, vortices obeying the Onsager–Feynman quantization condition. A vast literature is devoted to vortices in superfluid helium and, more recently, vortices have also been produced and studied in condensates of ultracold gases, including nice configurations of many vortices in regular triangular lattices, similar to the Abrikosov lattices in superconductors. Other phenomena, such as the reduction of the moment of inertia, the occurrence of Josephson tunneling through barriers, the existence of thresholds for dissipative processes (Landau criterion), and others, are typical subjects of intense investigation.

Another important consequence of the fact that BEC is described by an order parameter with a well-defined phase is the occurrence of coherence effects which, in different words, mean that condensates behave like matter waves. For instance, one can measure the phase difference between two condensates by means of interference. This can be done in coordinate space by confining two condensates in two potential minima, $a$ and $b$, at a distance $d$. Let us take $d$ along $z$ and assume that, at $t = 0$, the order parameter is given by the linear combination $\Psi(\mathbf{r}) = \Psi_a(\mathbf{r}) + \exp(i\phi)\Psi_b(\mathbf{r})$ with $\Psi_a$ and $\Psi_b$ real and without overlap. Then let us switch off the confining potentials so that the condensates expand and overlap. If the overlap occurs when the density is small enough to neglect interactions, the motion is ballistic and the phase of each condensate evolves as $S(\mathbf{r}, t) \simeq m r^2/(2\hbar t)$, so that $\mathbf{v} = \mathbf{r}/t$. This implies a relative phase $\phi + S(x, y, z + d/2) - S(x, y, z - d/2) = \phi + m d z/(\hbar t)$. The total density $n = |\Psi|^2$ thus exhibits periodic modulations along $z$ with wavelength $2\pi\hbar t/(md)$. This interference pattern has indeed been observed in condensates of ultracold atoms. In these systems it was also possible to measure the coherence length, that is, the distance $|\mathbf{r} - \mathbf{r}'|$ at which the one-body density vanishes and the phase of the order parameter is no longer well defined. In most situations, the coherence length turns out to be of the order of, or larger than, the size of the condensates. However, interesting situations exist when the coherence length is shorter but the system still preserves some features of BEC (quasicondensates).

Final Remarks

Bose–Einstein condensates of ultracold atoms are easily manipulated by changing and tuning the external potentials. This means, for instance, that one can prepare condensates in different geometries, including very elongated (quasi-1D) or disk-shaped (quasi-2D) condensates. This is conceptually important, since BEC in lower dimensions is not as simple as in three dimensions: thermal and quantum fluctuations play a crucial role, superfluidity must be properly redefined, and very interesting limiting cases can be explored (Tonks–Girardeau regime, Luttinger liquid, etc.). Another possibility is to use laser beams to produce standing waves acting as an external periodic potential (optical lattice). Condensates in optical lattices behave as a sort of perfect crystal, whose properties are the analog of the dynamic and transport properties in solid-state physics, but with controllable spacing between sites, no defects and tunable lattice geometry. One can investigate the role of phase coherence in the lattice, looking, for instance, at Josephson effects as in a chain of junctions. By tuning the lattice depth one can explore the transition from a superfluid phase to a Mott-insulator phase, which is a nice example of a quantum phase transition. Controlling cold atoms in optical lattices can be a good starting point for applications in quantum engineering, interferometry, and quantum information.

Another interesting aspect of BECs is that the key equation for their description in mean-field theory, namely the GP equation [20], is a nonlinear Schrödinger equation very similar to the ones commonly used, for instance, in nonlinear quantum optics. This opens interesting perspectives in exploiting the analogies between the two fields, such as the occurrence of dynamical and parametric instabilities, the possibility to create different types of solitons, and the occurrence of nonlinear processes like, for example, higher harmonic generation and mode mixing.

A relevant part of the current research also involves systems made of mixtures of different gases, Bose–Bose or Fermi–Bose, and many activities with ultracold atoms now involve fermionic gases, where BEC can
also be realized by condensing molecules of fermionic pairs. An extremely active line of research now concerns the BCS–BEC crossover, which can be obtained in Fermi gases by tuning the scattering length (and hence the interaction) by means of Feshbach resonances.

Ten years after the first observation of BEC in ultracold gases, it is almost impossible to summarize all the research done in this field. A large amount of work has already been devoted to characterizing the condensates and several new lines have been opened. Rather detailed review articles and books are already available for the interested readers.

See also: Interacting Particle Systems and Hydrodynamic Equations; Quantum Phase Transitions; Quantum Statistical Mechanics: Overview; Renormalization: Statistical Mechanics and Condensed Matter; Superfluids; Variational Techniques for Ginzburg–Landau Energies.

Further Reading

Cornell EA and Wieman CE (2002) Nobel lecture: Bose–Einstein condensation in a dilute gas, the first 70 years and some recent experiments. Reviews of Modern Physics 74: 875.
Dalfovo F, Giorgini S, Pitaevskii LP, and Stringari S (1999) Theory of Bose–Einstein condensation in trapped gases. Reviews of Modern Physics 71: 463.
Griffin A, Snoke DW, and Stringari S (1995) Bose–Einstein Condensation. Cambridge: Cambridge University Press.
Huang K (1987) Statistical Mechanics, 2nd edn. New York: Wiley.
Inguscio M, Stringari S, and Wieman CE (1999) Bose–Einstein Condensation in Atomic Gases, Proceedings of the International School of Physics Enrico Fermi, Course CXL. Amsterdam: IOS Press.
Ketterle W (2002) Nobel lecture: when atoms behave as waves: Bose–Einstein condensation and the atom laser. Reviews of Modern Physics 74: 1131.
Landau LD and Lifshitz EM (1980) Statistical Physics, Part 1. Oxford: Pergamon Press.
Leggett AJ (2001) Bose–Einstein condensation in the alkali gases: some fundamental concepts. Reviews of Modern Physics 73: 307.
Lifshitz EM and Pitaevskii LP (1980) Statistical Physics, Part 2. Oxford: Pergamon Press.
Pethick CJ and Smith H (2002) Bose–Einstein Condensation in Dilute Gases. Cambridge: Cambridge University Press.
Pitaevskii LP and Stringari S (2003) Bose–Einstein Condensation. Oxford: Clarendon Press.

Bosons and Fermions in External Fields


E Langmann, KTH Physics, Stockholm, Sweden
© 2006 Elsevier Ltd. All rights reserved.

Introduction

In this article we discuss quantum theories which describe systems of nondistinguishable particles interacting with external fields. Such models are of interest also in the nonrelativistic case (in quantum statistical mechanics, nuclear physics, etc.), but the relativistic case has additional, interesting complications: relativistic models are genuine quantum field theories, that is, quantum theories with an infinite number of degrees of freedom, with nontrivial features like divergences and anomalies. Since interparticle interactions are ignored, such models can be regarded as a first approximation to more complicated theories, and they can be studied by mathematically precise methods.

Models of relativistic particles in external electromagnetic fields have received considerable attention in the physics literature, and interesting phenomena like the Klein paradox or particle–antiparticle pair creation in overcritical fields have been studied; see Rafelski et al. (1978) for an extensive review. We will not discuss these physics questions but only describe some prototype examples and a general Hamiltonian framework which has been used in mathematically precise work on such models. The general framework for this latter work is the mathematical theory of Hilbert space operators (see, e.g., Reed and Simon (1975)), but in our discussion we try to avoid presupposing knowledge of that theory. As mentioned briefly in the end, this work has had close relations to various topics of recent interest in mathematical physics, including anomalies, infinite-dimensional geometry and group theory, conformal field theory, and noncommutative geometry.

We restrict our discussion to spin-0 bosons and spin-1/2 fermions, and we will not discuss models of particles in external gravitational fields but only refer the interested reader to DeWitt (2003). We also only mention in passing that external field problems have also been studied using functional integral approaches, and mathematically precise work on this can be found in the extensive literature on determinants of differential operators.

Examples

Consider the Schrödinger equation describing a nonrelativistic particle of mass $m$ and charge $e$
moving in three-dimensional space and interacting with external vector and scalar potentials $\mathbf{A}$ and $\phi$, respectively,

$$ i\partial_t\psi = H\psi, \qquad H = \frac{1}{2m}(-i\nabla + e\mathbf{A})^2 - e\phi \qquad [1] $$

(we set $\hbar = c = 1$, $\partial_t = \partial/\partial t$, and $\psi$, $\phi$, and $\mathbf{A}$ can depend on the space and time variables $\mathbf{x} \in \mathbb{R}^3$ and $t \in \mathbb{R}$). This is a standard quantum-mechanical model, with the one-particle wave function $\psi$ allowing for the usual probabilistic interpretation. One interesting generalization to the relativistic regime is the Klein–Gordon equation

$$ \left[(i\partial_t + e\phi)^2 - (-i\nabla + e\mathbf{A})^2 - m^2\right]\psi = 0 \qquad [2] $$

with a $\mathbb{C}$-valued function $\psi$. There is another important relativistic generalization, the Dirac equation

$$ \left[(i\partial_t + e\phi) - (-i\nabla + e\mathbf{A})\cdot\boldsymbol{\alpha} - m\beta\right]\psi = 0 \qquad [3] $$

with $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \alpha_3)$ and $\beta$ Hermitian $4 \times 4$ matrices satisfying the relations

$$ \alpha_i\alpha_j + \alpha_j\alpha_i = 2\delta_{ij}, \qquad \alpha_i\beta = -\beta\alpha_i, \qquad \beta^2 = 1 \qquad [4] $$

and a $\mathbb{C}^4$-valued function $\psi$ (we also write 1 for the identity). These two relativistic equations differ by the transformation properties of $\psi$ under Lorentz transformations: in [2] it transforms like a scalar and thus describes spin-0 particles, and it transforms like a spinor describing spin-1/2 particles in [3]. While these equations are natural relativistic generalizations of the Schrödinger equation, they no longer allow one to consistently interpret $\psi$ as a one-particle wave function. The physical reason is that, in a relativistic theory, high-energy processes can create particle–antiparticle pairs, and this makes the restriction to a fixed particle number inconsistent. This problem can be remedied by constructing a many-body model allowing for an arbitrary number of particles and antiparticles. The requirement that this many-body model should have a ground state is an important ingredient in this construction.

It is obviously of interest to formulate and study many-body models of nondistinguishable particles already in the nonrelativistic case. An important empirical fact is that such particles come in two kinds, bosons and fermions, distinguished by their exchange statistics (we ignore the interesting possibility of exotic statistics). For example, the fermion many-particle version of [1] for suitable $\phi$ and $\mathbf{A}$ is a useful model for electrons in a metal. An elegant method to go from the one- to the many-particle description is the formalism of second quantization: one promotes $\psi$ to a quantum field operator with certain (anti-) commutator relations, and this is a convenient way to construct the appropriate many-particle Hilbert space, Hamiltonian, etc. In the nonrelativistic case, this formalism can be regarded as an elegant reformulation of a pedestrian construction of a many-body quantum-mechanical model, which is useful since it provides convenient computational tools. However, this formalism naturally generalizes to the relativistic case where the one-particle model no longer has an acceptable physical interpretation, and one finds that one can nevertheless give a consistent physical interpretation to [2] and [3] provided that $\psi$ are interpreted as quantum field operators describing bosons and fermions. This particular exchange statistics of the relativistic particles is a special case of the spin-statistics theorem: integer-spin particles are bosons and half-integer spin particles are fermions. While many structural features of this formalism are present already in the simpler nonrelativistic models, the relativistic models add some nontrivial features typical for quantum field theories.

In the following, we discuss a precise mathematical formulation of the quantum field theory models described above. We emphasize the functorial nature of this construction, which makes manifest that it also applies to other situations, for example, where the bosons and fermions are also coupled to a gravitational background, are considered in other spacetime dimensions than 3 + 1, etc.

Second Quantization: Nonrelativistic Case

Consider a quantum system of nondistinguishable particles where the quantum-mechanical description of one such particle is known. In general, this one-particle description is given by a Hilbert space $h$ and one-particle observables and transformations which are self-adjoint and unitary operators on $h$, respectively. The most important observable is the Hamiltonian $H$. We will describe a general construction of the corresponding many-body system.

Example As a motivating example we take the Hilbert space $h = L^2(\mathbb{R}^3)$ of square-integrable functions $f(\mathbf{x})$, $\mathbf{x} \in \mathbb{R}^3$, and the Hamiltonian $H$ in [1]. A specific example for a unitary operator on $h$ is the gauge transformation $(Uf)(\mathbf{x}) = \exp(i\chi(\mathbf{x}))f(\mathbf{x})$ with $\chi$ a smooth, real-valued function on $\mathbb{R}^3$.

In this example, the corresponding wave functions for $N$ identical such particles are the $L^2$-functions $f_N(\mathbf{x}_1, \ldots, \mathbf{x}_N)$, $\mathbf{x}_j \in \mathbb{R}^3$. It is obvious how to extend
one-particle observables and transformations to such $N$-particle states: for example, the $N$-particle Hamiltonian corresponding to $H$ in [1] is

$$ H_N = \sum_{j=1}^{N}\left[\frac{1}{2m}\bigl(-i\nabla_{\mathbf{x}_j} + e\mathbf{A}(t, \mathbf{x}_j)\bigr)^2 - e\phi(t, \mathbf{x}_j)\right] \qquad [5] $$

and the $N$-particle gauge transformation $U_N$ is defined through multiplication with $\prod_{j=1}^{N}\exp(i\chi(\mathbf{x}_j))$.

For systems of indistinguishable particles it is enough to restrict to wave functions which are even or odd under particle exchanges,

$$ f_N(\mathbf{x}_1, \ldots, \mathbf{x}_j, \ldots, \mathbf{x}_k, \ldots, \mathbf{x}_N) = \pm f_N(\mathbf{x}_1, \ldots, \mathbf{x}_k, \ldots, \mathbf{x}_j, \ldots, \mathbf{x}_N) \qquad [6] $$

for all $1 \le j < k \le N$, with the upper and lower signs corresponding to bosons and fermions, respectively (this empirical fact is usually taken as a postulate in nonrelativistic many-body quantum physics). It is convenient to define the zero-particle Hilbert space as $\mathbb{C}$ (complex numbers) and to introduce a Hilbert space containing states with all possible particle numbers: this so-called Fock space contains all states

$$ \begin{pmatrix} f_0 \\ f_1(\mathbf{x}_1) \\ f_2(\mathbf{x}_1, \mathbf{x}_2) \\ f_3(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3) \\ \vdots \end{pmatrix} \qquad [7] $$

with $f_0 \in \mathbb{C}$. The definition of $H_N$ and $U_N$ then naturally extends to this Fock space; see below.

General Construction

The construction of Fock spaces and many-particle observables and transformations just outlined in a specific example is conceptually simple. An alternative, more efficient construction method is to use quantum fields, which we denote as $\psi(\mathbf{x})$ and $\psi^\dagger(\mathbf{x})$, $\mathbf{x} \in \mathbb{R}^3$. They can be fully characterized by the following (anti-) commutator relations:

$$ [\psi(\mathbf{x}), \psi^\dagger(\mathbf{y})]_\mp = \delta^3(\mathbf{x} - \mathbf{y}), \qquad [\psi(\mathbf{x}), \psi(\mathbf{y})]_\mp = 0 \qquad [8] $$

where $[a, b]_\mp \equiv ab \mp ba$, with the commutator and anticommutators (upper and lower signs, respectively) corresponding to the boson and fermion case, respectively. It is convenient to smear these fields with one-particle wave functions and define

$$ \psi(f) = \int_{\mathbb{R}^3} d^3x\, \overline{f(\mathbf{x})}\,\psi(\mathbf{x}), \qquad \psi^\dagger(f) = \int_{\mathbb{R}^3} d^3x\, \psi^\dagger(\mathbf{x}) f(\mathbf{x}) \qquad [9] $$

for all $f \in h$. Then the relations characterizing the field operators can be written as

$$ [\psi(f), \psi^\dagger(g)]_\mp = (f, g), \qquad [\psi(f), \psi(g)]_\mp = 0 \qquad \forall f, g \in h \qquad [10] $$

where

$$ (f, g) = \int_{\mathbb{R}^3} d^3x\, \overline{f(\mathbf{x})}\, g(\mathbf{x}) $$

is the inner product in $h$. The Fock space $\mathcal{F}_\pm(h)$ can then be defined by postulating that it contains a normalized vector $\Omega$ called vacuum such that

$$ \psi(f)\Omega = 0 \qquad \forall f \in h \qquad [11] $$

and that all $\psi^{(\dagger)}(f)$ are operators on $\mathcal{F}_\pm(h)$ such that $\psi^\dagger(f) = \psi(f)^*$, where $*$ is the Hilbert space adjoint. Indeed, from this we conclude that $\mathcal{F}_\pm(h)$, as vector space, is generated by

$$ f_1 \wedge f_2 \wedge \cdots \wedge f_N \equiv \psi^\dagger(f_1)\psi^\dagger(f_2)\cdots\psi^\dagger(f_N)\Omega \qquad [12] $$

with $f_j \in h$ and $N = 0, 1, 2, \ldots$, and that the Hilbert space inner product of such vectors is

$$ \langle f_1 \wedge f_2 \wedge \cdots \wedge f_N,\; g_1 \wedge g_2 \wedge \cdots \wedge g_M \rangle = \delta_{N,M} \sum_{P \in S_N} (\pm 1)^{|P|} \prod_{j=1}^{N} (f_j, g_{Pj}) \qquad [13] $$

with $S_N$ the permutation group, with $(+1)^{|P|} = 1$ always, and $(-1)^{|P|} = +1$ and $-1$ for even and odd permutations, respectively. The many-body Hamiltonian $q(H)$ corresponding to the one-particle Hamiltonian $H$ can now be defined by the following relations:

$$ q(H)\Omega = 0, \qquad [q(H), \psi^\dagger(f)] = \psi^\dagger(Hf) \qquad [14] $$

for all $f \in h$ such that $Hf$ is defined. Indeed, this implies that

$$ q(H)\, f_1 \wedge f_2 \wedge \cdots \wedge f_N = \sum_{j=1}^{N} f_1 \wedge f_2 \wedge \cdots \wedge Hf_j \wedge \cdots \wedge f_N \qquad [15] $$

which defines a self-adjoint operator on $\mathcal{F}_\pm(h)$, and it is easy to check that this coincides with our down-to-earth definition of $H_N$ above. Similarly, the many-body transformation $Q(U)$ corresponding to a one-particle transformation $U$ can be defined as

$$ Q(U)\Omega = \Omega, \qquad Q(U)\psi^\dagger(f) = \psi^\dagger(Uf)Q(U) \qquad [16] $$

for all $f \in h$, which implies that

$$ Q(U)\, f_1 \wedge f_2 \wedge \cdots \wedge f_N = (Uf_1) \wedge (Uf_2) \wedge \cdots \wedge (Uf_N) \qquad [17] $$
and thus coincides with our previous definition of $U_N$.

While we presented the construction above for a particular example, it is important to note that it actually does not make reference to what the one-particle formalism actually is. For example, if we had a model of particles on a space $\mathcal{M}$ given by some nice manifold of any dimension and with $M$ internal degrees of freedom, we would take $h = L^2(\mathcal{M}) \otimes \mathbb{C}^M$ and replace [9] by

$$ \psi(f) = \int_{\mathcal{M}} d\mu(x) \sum_{j=1}^{M} \overline{f_j(x)}\,\psi_j(x) \qquad [18] $$

and its Hermitian conjugate, with the measure $\mu$ on $\mathcal{M}$ defining the inner product in $h$,

$$ (f, g) = \int_{\mathcal{M}} d\mu(x) \sum_{j} \overline{f_j(x)}\, g_j(x) $$

With that, all formulas after [9] hold true as they stand.

Given any one-particle Hilbert space $h$ with inner product $(\cdot, \cdot)$, observable $H$, and transformation $U$, the formulas above define the corresponding Fock spaces $\mathcal{F}_\pm(h)$ and many-body observable $q(H)$ and transformation $Q(U)$. It is also interesting to note that this construction has various beautiful general (functorial) properties: the set of one-particle observables has a natural Lie algebra structure with the Lie bracket given by the commutator (strictly speaking: $i$ times the commutator, but we drop the common factor $i$ for simplicity). The definitions above imply that

$$ [q(A), q(B)] = q([A, B]) \qquad [19] $$

for one-particle observables $A$, $B$, that is, the above-mentioned Lie algebra structure is preserved under this map $q$. In a similar manner, the set of one-particle transformations has a natural group structure preserved by the map $Q$,

$$ Q(U)Q(V) = Q(UV), \qquad Q(U)^{-1} = Q(U^{-1}) \qquad [20] $$

Moreover, if $A$ is self-adjoint, then $\exp(iA)$ is unitary, and one can show that

$$ Q(\exp(iA)) = \exp(iq(A)) \qquad [21] $$

For later use, we note that, if $\{f_n\}_{n \in \mathbb{Z}}$ is some complete, orthonormal basis in $h$, then operators $A$ on $h$ can be represented by infinite matrices $(A_{mn})_{m,n \in \mathbb{Z}}$ with $A_{mn} = (f_m, Af_n)$, and

$$ q(A) = \sum_{m,n} A_{mn}\, \psi^\dagger_m \psi_n \qquad [22] $$

where $\psi^{(\dagger)}_n = \psi^{(\dagger)}(f_n)$ obey

$$ [\psi_m, \psi^\dagger_n]_\mp = \delta_{m,n}, \qquad [\psi_m, \psi_n]_\mp = 0 \qquad [23] $$

for all $m, n$. We also note that, in our definition of $q(A)$, we made a convenient choice of normalization, but there is no physical reason to not choose a different normalization and define

$$ q'(A) = q(A) - b(A) \qquad [24] $$

where $b$ is some linear function mapping self-adjoint operators $A$ to real numbers. For example, one may wish to use another reference vector $\tilde\Omega$ instead of $\Omega$ in the Fock space, and then would choose $b(A) = \langle\tilde\Omega, q(A)\tilde\Omega\rangle$. Then the relations in [19] are changed to

$$ [q'(A), q'(B)] = q'([A, B]) + S_0(A, B) \qquad [25] $$

where $S_0(A, B) = b([A, B])$. However, the $\mathbb{C}$-number term $S_0(A, B)$ in the relations [25] is trivial, since it can be removed by going back to $q(A)$.

Physical Interpretation

The Fock space $\mathcal{F}_\pm(h)$ is the direct sum of subspaces of states with different particle numbers $N$,

$$ \mathcal{F}_\pm(h) = \bigoplus_{N=0}^{\infty} h_\pm^{(N)} \qquad [26] $$

where the zero-particle subspace $h_\pm^{(0)} = \mathbb{C}$ is generated by the vacuum $\Omega$, and $h_\pm^{(N)}$ is the $N$-particle subspace generated by the states $f_1 \wedge f_2 \wedge \cdots \wedge f_N$, $f_j \in h$. We note that

$$ \mathcal{N} \equiv q(1) \qquad [27] $$

is the particle-number operator, $\mathcal{N}F_N = NF_N$ for all $F_N \in h_\pm^{(N)}$. The field operators obviously change the particle number: $\psi^\dagger(f)$ increases the particle number by one (maps $h_\pm^{(N)}$ to $h_\pm^{(N+1)}$), and $\psi(f)$ decreases it by one. Since every $f \in h$ can be interpreted as one-particle state, it is natural to interpret $\psi^\dagger(f)$ and $\psi(f)$ as creation and annihilation operators, respectively: they create and annihilate one particle in the state $f \in h$. It is important to note that, in the fermion case, [10] implies that $\psi^\dagger(f)^2 = 0$, which is a mathematical formulation of the Pauli exclusion principle: it is not possible to have two fermions in the same one-particle state. In the boson case, there is no such restriction. Thus, even though the formalisms used to describe boson and fermion systems look very similar, they describe dramatically different physics.

Applications

In our example, the many-body Hamiltonian $\mathcal{H}_0 \equiv q(H)$ can also be written in the following suggestive form:

$$ \mathcal{H}_0 = \int d^3x\, \psi^\dagger(\mathbf{x})(H\psi)(\mathbf{x}) \qquad [28] $$
and similar formulas hold true for other observables and other Hilbert spaces $h = L^2(\mathcal{M}) \otimes \mathbb{C}^n$. It is rather easy to solve the model defined by such a Hamiltonian: all necessary computations can be reduced to one-particle computations. For example, in the static case, where $\mathbf{A}$ and $\phi$ are time independent, a main quantity of interest in statistical physics is the free energy

$$ E \equiv -\beta^{-1}\log\bigl(\mathrm{tr}\,\exp(-\beta[\mathcal{H}_0 - \mu\mathcal{N}])\bigr) \qquad [29] $$

where $\beta > 0$ is the inverse temperature, $\mu$ the chemical potential, and the trace is over the Fock space $\mathcal{F}_\pm(h)$. One can show that

$$ E = \pm\,\mathrm{tr}\bigl(\beta^{-1}\log(1 \mp \exp(-\beta[H - \mu]))\bigr) \qquad [30] $$

where the trace is over the one-particle Hilbert space $h$. Thus, to compute $E$, one only needs to find the eigenvalues of $H$.

It is important to mention that the framework discussed here is not only for external field problems but can be equally well used to formulate and study more complicated models with interparticle interactions. For example, while the model with the Hamiltonian $\mathcal{H}_0$ above is often too simple to describe systems in nature, it is easy to write down more realistic models, for example, the Hamiltonian

$$ \mathcal{H} = \mathcal{H}_0 + \frac{e^2}{2}\int d^3x \int d^3y\, \psi^\dagger(\mathbf{x})\psi^\dagger(\mathbf{y})\, |\mathbf{x} - \mathbf{y}|^{-1}\, \psi(\mathbf{y})\psi(\mathbf{x}) \qquad [31] $$

describes electrons in an external electromagnetic field interacting through Coulomb interactions. This illustrates an important point which we would like to stress: the task in quantum theory is twofold, namely to formulate and to solve (exactly or otherwise) models. Obviously, in the nonrelativistic case, it is equally simple to formulate many-body models with and without interparticle interactions, and only the latter are simpler because they are easier to solve: the two tasks of formulating and solving models can be clearly separated. As we will see, in the relativistic case, even the formulation of an external field problem is nontrivial, and one finds that one cannot formulate the model without at least partially solving it. This is a common feature of quantum field theories making them challenging and interesting.

Relativistic Fermion and Boson Systems

We now generalize the formalism developed in the previous section to the relativistic case.

Field Algebras and Quasifree Representations

In the previous section, we identified the field operators $\psi^{(\dagger)}(f)$ with particular Fock space operators. This is analogous to identifying the operators $p_j = -i\partial_{x_j}$ and $q_j = x_j$ on $L^2(\mathbb{R}^M)$ with the generators of the Heisenberg algebra, as usually done. (We recall: the Heisenberg algebra is the star algebra generated by $P_j$ and $Q_j$, $j = 1, 2, \ldots, M < \infty$, with the well-known relations

$$ [P_j, Q_k] = -i\delta_{jk}, \qquad [P_j, P_k] = [Q_j, Q_k] = 0, \qquad P_j^\dagger = P_j, \quad Q_j^\dagger = Q_j \qquad [32] $$

for all $j, k$.) Identifying the Heisenberg algebra with a particular representation is legitimate since, as is well known, all its irreducible representations are (essentially) the same (this statement is made precise by a celebrated theorem due to von Neumann). However, in case of the algebra generated by the field operators $\psi^{(\dagger)}(f)$, there exist representations which are truly different from the ones discussed in the last section, and such representations are needed to construct relativistic external field problems. It is therefore important to distinguish the fields as generators of an algebra from the operators representing them. We thus define the (boson or fermion) field algebra $\mathcal{A}_\pm(h)$ over a Hilbert space $h$ as the star algebra generated by $\Psi^\dagger(f)$, $f \in h$, such that the map $f \to \Psi^\dagger(f)$ is linear and the relations

$$ [\Psi(f), \Psi^\dagger(g)]_\mp = (f, g), \qquad [\Psi(f), \Psi(g)]_\mp = 0, \qquad \Psi^\dagger(f)^\dagger = \Psi(f) \qquad [33] $$

are fulfilled for all $f, g \in h$, with $\dagger$ the star operation in $\mathcal{A}_\pm(h)$. The particular representation of this algebra discussed in the last section will be denoted by $\pi_0$, $\pi_0(\Psi^{(\dagger)}(f)) = \psi^{(\dagger)}(f)$. Other representations $\pi_{P_-}$ can be constructed from any projection operator $P_-$ on $h$, that is, any operator $P_-$ on $h$ satisfying $P_-^* = P_-^2 = P_-$. Writing $\hat\psi^{(\dagger)}(f)$ short for $\pi_{P_-}(\Psi^{(\dagger)}(f))$, this so-called quasifree representation is defined by

$$ \hat\psi^\dagger(f) = \psi^\dagger(P_+ f) + \psi(\overline{P_- f}), \qquad \hat\psi(f) = \psi(P_+ f) \mp \psi^\dagger(\overline{P_- f}) \qquad [34] $$

where $P_+ = 1 - P_-$ and the bar means complex conjugation. It is important to note that, while the star operation is identical with the Hilbert space adjoint $*$ in the fermion case, we have

$$ \hat\psi(f)^* = \hat\psi^\dagger(Ff) \quad \text{with} \quad F = P_+ - P_- \quad \text{for bosons} \qquad [35] $$
where $F$ is a grading operator, that is, $F^* = F$ and $F^2 = 1$. We stress that the physical star operation always is $*$, that is, physical observables $A$ obey $A = A^*$.

The present framework suggests to regard quantization as the procedure which amounts to going from a one-particle Hilbert space $h$ to the corresponding field algebra $\mathcal{A}_\pm(h)$. Indeed, the Heisenberg algebra is identical with the boson field algebra $\mathcal{A}_+(\mathbb{C}^M)$ (since the latter is obviously identical with the algebra of $M$ harmonic oscillators), and thus conventional quantum mechanics can be regarded as boson quantization in the special case where the one-particle Hilbert space is finite dimensional. It is interesting to note that fermion quantum mechanics $\mathcal{A}_-(\mathbb{C}^M)$ is the natural framework for formulating and studying lattice fermion and spin systems which play an important role in condensed matter physics.

In the following, we elaborate the naive interpretations of the relativistic equations in [2] and [3] as a quantum theory of one particle, and we discuss why they are unphysical. For simplicity, we assume that the electromagnetic fields $\phi$, $\mathbf{A}$ are time independent. We then show that quasifree representations as discussed above can provide physically acceptable many-particle theories. We first consider the Dirac case, which is somewhat simpler.

Fermions

One-particle formalism Recalling that $i\partial_t$ is the energy operator, we define the Dirac Hamiltonian $D$ by rewriting [3] in the following form:

$$ i\partial_t\psi = D\psi, \qquad D = (-i\nabla + e\mathbf{A})\cdot\boldsymbol{\alpha} + m\beta - e\phi \qquad [36] $$

This Dirac Hamiltonian is obviously a self-adjoint operator on the one-particle Hilbert space $h = L^2(\mathbb{R}^3) \otimes \mathbb{C}^4$, but, different from the Schrödinger Hamiltonian in [1], it is not bounded from below: for any $E_0 > -\infty$, one can find a state $f$ such that the energy expectation value $(f, Df)$ is less than $E_0$. This can be easily seen for the simplest case where the external potential vanishes, $\mathbf{A} = \phi = 0$. Then the eigenvalues of $D$ can be computed by Fourier transformation, and one finds

$$ E = \pm\sqrt{\mathbf{p}^2 + m^2}, \qquad \mathbf{p} \in \mathbb{R}^3 \qquad [37] $$

Due to the negative energy eigenvalues we conclude that there is no ground state, and the Dirac Hamiltonian thus describes an unstable system, which is physically meaningless.

To summarize: an (unphysical) one-particle description of relativistic fermions is given by a Hilbert space $h$ together with a self-adjoint Hamiltonian $D$ unbounded from below. Other observables and transformations are given by self-adjoint and unitary operators on $h$, respectively.

Many-body formalism We now explain how to construct a physical many-body description from these data. To simplify notation, we first assume that $D$ has a purely discrete spectrum (which can be achieved by using a compact space). We can then label the eigenfunctions $f_n$ by integers $n$ such that the corresponding eigenvalues $E_n \ge 0$ for $n \ge 0$ and $E_n < 0$ for $n < 0$. Using the naive representation of the fermion field algebra discussed in the last section, we get (we use the notation introduced in [22])

$$ q(D) = \sum_{n \ge 0} |E_n|\, \psi^\dagger_n\psi_n - \sum_{n < 0} |E_n|\, \psi^\dagger_n\psi_n \qquad [38] $$

which is obviously not bounded from below and thus not physically meaningful. However, $\psi^\dagger_n\psi_n = 1 - \psi_n\psi^\dagger_n$, which suggests that we can remedy this problem by interchanging the creation and annihilation operators for $n < 0$. This is possible: it is easy to see that

$$ \hat\psi_n \equiv \psi_n \quad \forall n \ge 0 \qquad \text{and} \qquad \hat\psi_n \equiv \psi^\dagger_n \quad \forall n < 0 \qquad [39] $$

provides a representation of the algebra in [23]. We thus define

$$ \hat q(D) \equiv \sum_{n \in \mathbb{Z}} E_n : \hat\psi^\dagger_n\hat\psi_n : \qquad [40] $$

with the so-called normal ordering prescription

$$ : \hat\psi^\dagger_m\hat\psi_n : \;\equiv\; \hat\psi^\dagger_m\hat\psi_n - \langle\Omega, \hat\psi^\dagger_m\hat\psi_n\Omega\rangle \qquad [41] $$

where we made use of the freedom of normalization explained after [23] to eliminate unwanted additive constants. We get $\hat q(D) = \sum_{n \in \mathbb{Z}} |E_n|\, \psi^\dagger_n\psi_n$, which is manifestly a non-negative self-adjoint operator with $\Omega$ as ground state. We thus found a physical many-body description for our model. We can now define, for other one-particle observables,

$$ \hat q(A) \equiv \sum_{m,n \in \mathbb{Z}} A_{mn} : \hat\psi^\dagger_m\hat\psi_n : \qquad [42] $$

and, by straightforward computations, we obtain

$$ [\hat q(A), \hat q(B)] = \hat q([A, B]) + S(A, B) \qquad [43] $$

where $S(A, B) = \sum_{m < 0}\sum_{n \ge 0}(A_{mn}B_{nm} - B_{mn}A_{nm})$, that is,

$$ S(A, B) = \mathrm{tr}(P_- A P_+ B P_- - P_- B P_+ A P_-) \qquad [44] $$

with $P_- = \sum_{n < 0} f_n(f_n, \cdot)$ the projection onto the subspace spanned by the negative energy eigenvectors of $D$ and $P_+ = 1 - P_-$. One can show that $\hat q(A)$ is no longer defined for all operators but only if

$$ P_- A P_+ \text{ and } P_+ A P_- \text{ are Hilbert–Schmidt operators} \qquad [45] $$

(we recall that $a$ is a Hilbert–Schmidt operator if $\mathrm{tr}(a^* a) < \infty$). The $\mathbb{C}$-number term $S(A, B)$ in [43] is
often called Schwinger term and, different from the similar term in [25], it is now nontrivial, that is, it is no longer possible to remove it by a redefinition $\hat q'(A) = \hat q(A) - b(A)$. This Schwinger term is an example of an anomaly, and it has various interesting implications.

In a similar manner, one can construct the many-body transformations $\hat Q(U)$ of unitary operators $U$ on $h$ satisfying the very Hilbert–Schmidt condition in [45], and one obtains

$$ \hat Q(U)\hat Q(V) = \chi(U, V)\,\hat Q(UV) \qquad [46] $$

with interesting phase-valued functions $\chi$.

More generally, for any one-particle Hilbert space $h$ and Dirac Hamiltonian $D$, the physical representation is given by the quasifree representation $\pi_{P_-}$ in [34] with $P_-$ the projection onto the negative energy subspace of $D$. The results about $\hat q$ and $\hat Q$ mentioned hold true in any such representation.

Thus the one-particle Hamiltonian $D$ determines which representation one has to use, and one therefore cannot construct the physical representation without specific information about $D$. However, not all these representations are truly different: if there is a unitary operator $\mathcal{U}$ on the Fock space $\mathcal{F}_-(h)$ such that

$$ \mathcal{U}^*\, \pi_{P_-^{(1)}}(\Psi^\dagger(f))\, \mathcal{U} = \pi_{P_-^{(2)}}(\Psi^\dagger(f)) \qquad [47] $$

for all $f \in h$, then the quasifree representations associated with the different projections $P_-^{(1)}$ and $P_-^{(2)}$ are physically equivalent: one could equally well formulate the second model using the representation of the first. Two such quasifree representations are called unitarily equivalent, and a fundamental theorem due to Shale and Stinespring states that two quasifree representations $\pi_{P_-^{(1,2)}}$ are unitarily equivalent if and only if $P_-^{(1)} - P_-^{(2)}$ is a Hilbert–Schmidt operator (a similar result holds true in the boson case).

Bosons

$$ i\partial_t\Phi = K\Phi, \qquad \Phi = \begin{pmatrix} \psi \\ \pi^\dagger \end{pmatrix}, \qquad K = \begin{pmatrix} C & i \\ -iB^2 & C \end{pmatrix} \qquad [48] $$

with

$$ B^2 = (-i\nabla + e\mathbf{A})^2 + m^2, \qquad C = -e\phi \qquad [49] $$

Thus, one sees that the natural one-particle Hilbert space for the Klein–Gordon equation is $h = L^2(\mathbb{R}^3) \otimes \mathbb{C}^2$; here, and in the following, we identify $h$ with $h_0 \oplus h_0$, $h_0 = L^2(\mathbb{R}^3)$, and use a convenient $2 \times 2$ matrix notation naturally associated with that splitting. However, the one-particle Hamiltonian is not self-adjoint but rather obeys

$$ K^* = JKJ, \qquad J = \begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix} \qquad [50] $$

with $*$ the Hilbert space adjoint. It is important to note that $J$ is a grading operator. Thus, we can define a sesquilinear form

$$ (f, g)_J \equiv (f, Jg) \qquad \forall f, g \in h \qquad [51] $$

with $(\cdot, \cdot)$ the standard inner product, and [50] is equivalent to $K$ being self-adjoint with respect to this sesquilinear form; in this case, we say that $K$ is $J$-self-adjoint. Thus, in the Klein–Gordon case, this sesquilinear form takes the role of the Hilbert space inner product and, in particular, not $(\Phi, \Phi)$ but $(\Phi, \Phi)_J$ is preserved under time evolution. However, different from $\Phi^\dagger\Phi$, $\Phi^\dagger J\Phi$ is not positive definite, and it is therefore not possible to interpret it as probability density as in conventional quantum mechanics. For consistency, one has to require that one-particle transformations $U$ are unitary with respect to $(\cdot, \cdot)_J$, that is, $U^{-1} = JU^*J$. We call such operators $J$-unitary.

To summarize: an (unphysical) one-particle description of relativistic bosons is given by a Hilbert space of the form $h = h_0 \oplus h_0$, the grading operator $J$ in [50], and a $J$-self-adjoint Hamiltonian $K$ of the form as in eqn [48], where $B \ge 0$ and $C$ are self-adjoint operators on $h_0$. Other observables and transformations are given by $J$-self-adjoint and $J$-unitary operators on $h$, respectively.
One-particle formalism Similarly as for the Dirac
case, the solutions of the KleinGordon equation in Many-body formalism We first consider the quasi-
[2] also do not define a physically acceptable one- free representation P(0) 
of the boson field algebra
particle quantum theory with a ground state: the A (h) so that the grading operator in [35] is
energy eigenvalues in [37] for A =  = 0 are a equal to J, that is, P(0)  = (1  J)=2. Writing
(y) (y)
consequence the relativistic invariance and thus P(0)

( (f )) = (f ), one finds that
equally true for the KleinGordon case. However,
qA
qJAJ; QU
QJU
J 52
in this case there is a further problem. To find the
one-particle Hamiltonian, one can rewrite the and thus J-self-adjoint operators and J-unitary
second-order equation in [2] as a system of first- operators are mapped to proper observables and
order equations, transformations. In particular, q(K) is a self-adjoint

operator, which resolves one problem of the one-particle theory. However, q(K) is not bounded from below, and thus $\Pi_{P_-^{(0)}}$ is not yet the physical representation.
The physical representation can be constructed using the operators
$$T = \frac{1}{\sqrt{2}} \begin{pmatrix} B^{1/2} & iB^{-1/2} \\ B^{1/2} & -iB^{-1/2} \end{pmatrix}, \qquad F = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$ [53]
(for simplicity, we restrict ourselves to the case C = 0 and B > 0; we use the calculus of self-adjoint operators here) with the following remarkable properties:
$$T^{-1} = JT^*F, \qquad TKT^{-1} = \hat K = \begin{pmatrix} B & 0 \\ 0 & -B \end{pmatrix}$$ [54]
One can check that
$$\hat\psi^\dagger(f) = \psi^\dagger(Tf), \qquad \hat\psi(f) = \psi\big((T^{-1})^* f\big)$$ [55]
is a quasifree representation $\Pi_{P_-}$ of $\mathcal{A}(h)$ with $P_- = (1 - F)/2$. With that the construction of $\hat q$ and $\hat Q$ is very similar to the fermion case described above (the crucial simplification is that $\hat K$ and F now are diagonal). In particular, $\hat q(K)$ is a non-negative operator with the ground state $\Omega$, and $\hat q(A)$ and $\hat Q(U)$ are self-adjoint and unitary for every one-particle observable A and transformation U, respectively. One also gets relations as in [43] and [46].

Related Topics of Recent Interest
The impossibility to construct relativistic quantum-mechanical models played an important role in the early history of quantum field theory, as beautifully discussed in chapter 1 of Weinberg (1995).
The abstract formalism of quasifree representations of fermion and boson field algebras was developed in many papers (see, e.g., Ruijsenaars (1977), Grosse and Langmann (1992), and Langmann (1994) for explicit results on $\hat Q$ and $\chi$). A nice textbook presentation with many references can be found in chapter 13 of Gracia-Bondía et al. (2001) (this chapter is rather self-contained but mainly restricted to the fermion case).
Based on the Shale-Stinespring theorem, there has been a considerable amount of work to investigate whether the quasifree representations associated with different external electromagnetic fields $\phi_1, A_1$ and $\phi_2, A_2$ are unitarily equivalent, if and which time-dependent many-body Hamiltonians exist, etc. (see chapter 13 of Gracia-Bondía et al. (2001), and references therein).
The infinite-dimensional Lie algebra $\mathfrak{g}_2$ of Hilbert space operators satisfying the condition in [45] is an interesting infinite-dimensional Lie algebra with a beautiful representation theory. This subject is closely related to conformal field theory (see, e.g., Kac and Raina (1987) for a textbook presentation and Carey and Ruijsenaars (1987) for a detailed mathematical account within the framework described by us).
It turns out that the mathematical framework discussed in the previous section is sufficient for constructing fully interacting quantum field theories, in particular Yang-Mills gauge theories, in 1 + 1 but not in higher dimensions. The reason is that, in 3 + 1 dimensions, the one-particle observables A of interest do not obey the Hilbert-Schmidt condition in [45] but only the weaker condition
$$\operatorname{tr}(a^* a)^n < \infty, \qquad a = P_\mp A P_\pm$$ [56]
with n = 2, and the natural analog of $\mathfrak{g}_2$ in 3 + 1 dimensions thus seems to be the Lie algebra $\mathfrak{g}_{2n}$ of operators satisfying this condition with n = 2. Various results on the representation theory of such Lie algebras $\mathfrak{g}_{2n>2}$ have been developed (see Mickelsson (1989), where various interesting relations to infinite-dimensional geometry are also discussed).
As mentioned, the Schwinger term S(A,B) in [44] is an example of an anomaly. Mathematically, it is a nontrivial 2-cocycle of the Lie algebra $\mathfrak{g}_2$, and analogs for the groups $\mathfrak{g}_{2n>2}$ have been found. These cocycles provide a natural generalization of anomalies (in the meaning of particle physics) to operator algebras. They not only shed some interesting light on the latter, but also provide a link to notions and results from noncommutative geometry (see, e.g., Gracia-Bondía et al. (2001)). We believe that this link can provide a fruitful driving force and inspiration to find ways to deepen our understanding of quantum Yang-Mills theories in 3 + 1 dimensions (Langmann 1996).

See also: Anomalies; C*-Algebras and Their Classification; Dirac Fields in Gravitation and Nonabelian Gauge Theory; Dirac Operator and Dirac Field; Gerbes in Quantum Field Theory; Quantum Field Theory in Curved Spacetime; Quantum n-Body Problem; Superfluids; Two-Dimensional Models.

Further Reading
Carey AL and Ruijsenaars SNM (1987) On fermion gauge groups, current algebras and Kac-Moody algebras. Acta Applicandae Mathematicae 10: 1-86.
DeWitt B (2003) The Global Approach to Quantum Field Theory, International Series of Monographs on Physics, vols. 1 and 2, p. 114. New York: Oxford University Press.
Gracia-Bondía JM, Várilly JC, and Figueroa H (2001) Elements of Noncommutative Geometry, Birkhäuser Advanced Texts: Basel Textbooks. Boston: Birkhäuser.
Grosse H and Langmann E (1992) A superversion of quasifree second quantization. Journal of Mathematical Physics 33: 1032-1046.
Kac VG and Raina AK (1987) Bombay Lectures on Highest Weight Representations of Infinite-Dimensional Lie Algebras,

Advanced Series in Mathematical Physics, vol. 2. Teaneck: World Scientific Publishing.
Langmann E (1994) Cocycles for boson and fermion Bogoliubov transformations. Journal of Mathematical Physics 96-112.
Langmann E (1996) Quantum gauge theories and noncommutative geometry. Acta Physica Polonica B 27: 2477-2496.
Mickelsson J (1989) Current Algebras and Groups, Plenum Monographs in Nonlinear Physics. New York: Plenum Press.
Rafelski J, Fulcher LP, and Klein A (1978) Fermions and bosons interacting with arbitrary strong external fields. Physics Reports 38: 227-361.
Reed M and Simon B (1975) Methods of Modern Mathematical Physics. II. Fourier Analysis, Self-Adjointness. New York: Academic Press.
Ruijsenaars SNM (1977) On Bogoliubov transformations for systems of relativistic charged particles. Journal of Mathematical Physics 18: 517-526.
Weinberg S (1995) The Quantum Theory of Fields, vol. I: Foundations. Cambridge: Cambridge University Press.

Boundaries for Spacetimes


S G Harris, St. Louis University, St. Louis, MO, USA
© 2006 Elsevier Ltd. All rights reserved.

Introduction
There is a common practice in mathematics of placing a boundary on an object which may not appear to come naturally equipped with one; this is often thought of as adding ideal points to the object. Perhaps the most famous example is the addition of a single point at infinity to the complex plane, resulting in the Riemann sphere: this is a boundary point in the sense of providing an ideal endpoint for lines and other endless curves in the plane. Often, there is more than one reasonable way to construct a boundary for a given object, depending on the intent; for instance, the plane is sometimes equipped, not with a single point at infinity, but with a circle at infinity, resulting in a space homeomorphic to a closed disk. Both these boundaries on the plane have useful but different things to tell us about the nature of the plane; the common feature is that, by bringing the infinite reach of the plane within the confines of a more finite object, we are better able to grasp the behavior of the original object.
The general usefulness of the construction of boundaries for an object is to allow behavior of structures in the completed object to aid in visualization of behavior in the original object, such as by providing a degree of measurement or other classification of processes at infinity. This utility has not been overlooked for spacetimes. A variety of purposes may be served by various boundary construction methods: providing a locale for singularities (as the spacetime itself is modeled by a smooth manifold with a smooth metric, free of singular points); providing a platform from which to measure global properties such as total energy or angular momentum; displaying in finite form the causal structure at infinity; or providing a compact (or quasicompact) topological envelope for the spacetime while preserving the causal structure.
This article will consider several of the methods that have been used or proposed for constructing boundaries for spacetimes, ranging from the ad hoc (but practical) to the universal. Perhaps the simplest way to classify these methods is into those which employ or analyze embeddings of the spacetime in question and those that do not.

Boundaries from Embeddings
General
The simplest and most common method of constructing a boundary for a spacetime M is to find a suitable manifold N (of the same dimension) and an appropriate map $\phi: M \to N$ which is a topological embedding, that is, a homeomorphism onto its image $\phi(M)$. We can consider $\bar M_\phi$, the closure of $\phi(M)$ in N, as the $\phi$-completion of M, and $\partial_\phi(M) = \bar M_\phi - \phi(M)$ as the $\phi$-boundary. Typically, this embedding is chosen in such a way that curves of interest in M (such as timelike or null geodesics or causal curves of bounded acceleration) which have no endpoints in M do have endpoints in $\partial_\phi(M)$; in other words, if $c: [0, \infty) \to M$ is such a curve of interest, then $\lim_{t \to \infty} \phi(c(t))$ exists in N.
The common practice, initiated by Penrose in 1967, is to choose N to be another spacetime (often called the "unphysical" spacetime, while M is considered the spacetime of physical interest) and to require the embedding $\phi$ to be a conformal mapping, that is, $\phi$ carries the spacetime metric in M to a scalar multiple of the spacetime metric in N. As conformal maps preserve the local causal structure, leaving unchanged the notions of timelike curve or null curve, this means that $\bar M_\phi$ inherits from N a causal structure which, locally, is an extension of that of M. This allows us to speak of causal relationships within $\bar M_\phi$, closely related to those in M.

Minkowski Space
The prototypical example is the conformal embedding of Minkowski space into the Einstein static spacetime.
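For the two-dimensional case this embedding is easy to probe numerically. The following Python sketch is an illustration only: it assumes the arctangent form of the map given in the following paragraph, replaces |x| by the signed coordinate x so that the circle $S^1$ unrolls to a line, and uses hypothetical helper names (embed_l2, in_diamond) that do not appear in the original article.

```python
import math

def embed_l2(x, t):
    """Send (x, t) in Minkowski 2-space to (theta, tau) in the unrolled E^2.

    Assumption: the signed coordinate x replaces |x|, so theta may be negative.
    """
    u = math.atan(t + x)  # compactified null coordinate t + x
    v = math.atan(t - x)  # compactified null coordinate t - x
    return u - v, u + v   # (theta, tau)

def in_diamond(theta, tau):
    """True when the point lies in the open diamond |theta| + |tau| < pi."""
    return abs(theta) + abs(tau) < math.pi

# Sample events, including very distant ones, all land inside the diamond.
events = [(0.0, 0.0), (3.0, 1.0), (-50.0, 20.0), (1e6, -1e6), (0.0, 1e9)]
assert all(in_diamond(*embed_l2(x, t)) for x, t in events)

# Far along the timelike geodesic x = t/2 the image approaches the
# future-timelike endpoint (theta, tau) = (0, pi).
theta, tau = embed_l2(0.5e8, 1e8)
print(theta, math.pi - tau)  # both values are very small

# Far along the null geodesic x = t + 2 the image approaches the edge
# theta + tau = pi of the diamond, with tau strictly between 0 and pi.
theta, tau = embed_l2(1e8 + 2.0, 1e8)
print(math.pi - (theta + tau), tau)
```

Each null coordinate t + x and t - x is squeezed separately into the interval (-pi/2, pi/2), so null lines remain lines of slope one in the (theta, tau) picture; this is the sense in which the map leaves the causal structure unchanged.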

Let $\mathbb{R}^n$ denote Euclidean n-space, $S^n$ the unit n-sphere, and $L^n$ Minkowski n-space, that is, $\mathbb{R}^n$ with metric $ds^2 = dx_1^2 + \cdots + dx_{n-1}^2 - dt^2$ (so $L^n = \mathbb{R}^{n-1} \times L^1$). The n-dimensional Einstein static spacetime is the product spacetime $E^n = S^{n-1} \times L^1$. Consider $S^{n-1}$ as embedded in $\mathbb{R}^n = \mathbb{R}^{n-1} \times \mathbb{R}^1$. Then the conformal embedding is $\phi: L^n \to E^n$, expressed as $\phi: \mathbb{R}^{n-1} \times L^1 \to S^{n-1} \times L^1 \subset \mathbb{R}^{n-1} \times \mathbb{R}^1 \times L^1$ given by $\phi(x, t) = ((x/|x|) \sin\theta, \cos\theta, \tau)$, where $\theta = \tan^{-1}(t + |x|) - \tan^{-1}(t - |x|)$ and $\tau = \tan^{-1}(t + |x|) + \tan^{-1}(t - |x|)$. The boundary $\partial_\phi(L^n)$ consists of the following: the points $\{\theta + \tau = \pi;\ 0 < \tau \le \pi\}$, composed of an $S^{n-2}$ of null lines coming together at the point $i^+ = (0, 1, \pi)$; a similar cone of null lines $\{\theta - \tau = \pi;\ -\pi \le \tau < 0\}$ with vertex at $i^- = (0, 1, -\pi)$; and a single limit-point for both cones at $i^0 = (0, -1, 0)$. The $\tau > 0$ null cone is called $\mathscr{I}^+$ (the letter is read "scri" for "script-I"), its counterpart $\mathscr{I}^-$ (Figures 1 and 2). As all future-directed timelike geodesics in $L^n$ have $i^+$ as an endpoint in $E^n$, $i^+$ is called future-timelike infinity; similarly, $i^-$ is past-timelike infinity. Every future-directed null geodesic ends up on $\mathscr{I}^+$, which is thus termed future-null infinity, and $\mathscr{I}^-$ is past-null infinity. All spacelike geodesics come to $i^0$, spacelike infinity.

Figure 1 $L^2$ conformally embedded in $E^2 = S^1 \times L^1$.
Figure 2 $L^3$ conformally embedded in $E^3 = S^2 \times L^1$.

For n = 2, this picture produces the familiar "diamond" representation of $L^2$ (Figure 3): as $E^2$ is easily unrolled into another copy of $L^2$ (metric


$d\theta^2 - d\tau^2$), this means that $\phi(L^2)$ is the region $|\theta| + |\tau| < \pi$ in $L^2$; timelike curves and null geodesics in the original $L^2$ are the same as in $\phi(L^2)$, and their endpoints in the boundary of the diamond are evident. For higher dimensions, the picture is not as visually obvious, since $E^n$ cannot be unrolled; but the principle of reading the causal structure at infinity of $L^n$ via its boundary points in $E^n$ remains the same.

Figure 3 $L^2$ conformally embedded in unrolled $E^2$, i.e., $\mathbb{R}^1 \times L^1 = L^2$.

Conformal Embeddings
There have been various formulations designed to emulate the conformal mapping of $L^n$ with respect to spacetimes, which are, in some sense, asymptotically like Minkowski space being conformally mapped into larger spacetimes. A spacetime M with metric g is called asymptotically simple or (alternatively) asymptotically flat if there is a spacetime N with metric h, an embedding $\phi: M \to N$, and a scalar function $\Omega$ defined on N with $\phi^* h = (\Omega \circ \phi)^2 g$ (i.e., $\phi$ is conformal with $\Omega^2$ the conformal factor) and $\Omega = 0$ on $\partial_\phi(M)$, $d\Omega \ne 0$ on $\partial_\phi(M)$, and various other restrictions on $\Omega$, depending on the intent. One can define asymptotic symmetries of M by means of motions within $\partial_\phi(M)$, leading to notions of global energy and angular momentum (see Hawking and Ellis (1973) and Wald (1984) for details).

Classifications of Embeddings
As a general rule, there is no uniqueness in the choice of an embedding $\phi$ for a spacetime M to construct a boundary, nor in the topology of the resulting boundary $\partial_\phi(M)$, or even of which curves of interest end up having endpoints in the boundary. In an attempt to categorize which embeddings yield equivalent results and what sort of results there are in terms of endpoints of curves, Scott and Szekeres (1994) formulated what they called the abstract boundary of a spacetime. This depends on a choice of class of "interesting" curves, each characterizable as having either infinite or finite parameter length; typical choices for this class would be timelike geodesics or causal geodesics or timelike curves of bounded acceleration. For instance, a boundary point may be said to represent a singularity with respect to the chosen class of curves if it is the endpoint of one such curve with finite parameter length; nonsingular points are points at infinity. These classifications do not require conformal embeddings, nor even that the target of the embeddings be spacetimes; they accommodate boundaries of a far more general type than Penrose's notion stemming from conformal embeddings.
A somewhat different study of boundaries from embeddings has been formulated by García-Parrado and Senovilla (2003), classifying points at infinity and singularities in $\partial_\phi(M)$ for embeddings $\phi: M \to N$ in which N is a spacetime, $\phi$ preserves the chronology relation $\ll$, and there is also a diffeomorphism $\psi: \phi(M) \to N$ which again preserves $\ll$ (the chronology relation in a spacetime is defined thus: $x \ll y$ if and only if there is a future-directed timelike curve from x to y). This scheme applies more generally than to conformal embeddings, but the requirement for chronology-preserving maps in both directions guarantees a strong sensitivity to causality; it amounts to a mild extension of Penrose's notion that is often much easier to construct.

Universal Constructions
B-Boundary
Attempts have been made to formulate boundary concepts specifically for defining singularities as ideal endpoints for finite-length geodesics. The most complete venture in this direction is the b-boundary ("b" for "bundle") of Schmidt (Hawking and Ellis 1973, pp. 276-284). This is a formulation that takes note only of the connection in the linear frames bundle L(M) of a spacetime M (or of any manifold with a linear connection, metric or otherwise); in other words, it takes no particular note of the spacetime metric or even of the causal structure of the spacetime, but only of the notion of parallel translation of tangent vectors along curves. Parallel translation of a frame (a basis for the tangent space) along a curve is used to obtain an ad hoc length for the curve by treating the translated frame as positive-definite orthonormal at each point; whether this length is finite or infinite is independent of the choice of the original frame. The Schmidt construction

defines a boundary on M which gives an endpoint for each curve, endless in M, which is finite in that sense: select a positive-definite metric on L(M), give it a boundary by means of Cauchy completion, and then take the appropriate quotient by the bundle group. This has an appealing universality of application, but the problems of putting it into practice are quite formidable. Also, the fact that it takes no special note of the spacetime character of M suggests that it may not be of particular utility for physical insights.
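
The ad hoc length used in this construction can be written out explicitly. What follows is a sketch under the usual conventions for the generalized affine parameter in the singularity-theory literature; the symbols $E_i$, $V^i$, $\ell_E$ and $A$ are notation introduced here for illustration and do not appear in the original text.

```latex
% c : [0, a) -> M is a curve and E_1(t), ..., E_n(t) is a frame that is
% parallel along c. Expand the tangent in that frame and treat the frame
% as orthonormal:
\dot c(t) = \sum_{i=1}^{n} V^i(t)\, E_i(t), \qquad
\ell_E(c) = \int_0^a \Big( \sum_{i=1}^{n} V^i(t)^2 \Big)^{1/2} dt .
% Any other parallel frame is E'_j = \sum_i A^i{}_j E_i with A a constant
% invertible matrix, so V = A V' and, with \|\cdot\| the operator norm,
\|A\|^{-1}\, \ell_E(c) \;\le\; \ell_{E'}(c) \;\le\; \|A^{-1}\|\, \ell_E(c).
```

The two-sided bound is why finiteness of the length does not depend on the initial frame, even though its numerical value does.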

Causal Boundary: Basics
In 1972 Geroch, Kronheimer, and Penrose (GKP) formulated a notion of boundary (the causal boundary) that is specifically adapted to the causal character of a spacetime M; indeed, it is defined in such a way that one need know only the chronology relation $\ll$ on M without any further reference to the metric (another way of saying this is that the causal boundary is conformally invariant). Like Schmidt's b-boundary, the causal boundary is a universal construction, not depending on any extraneous choices; however, although it has an obvious clarity in its causal structure, there are subtleties in the choice of an appropriate topology which are perhaps not yet fully resolved. As this boundary construction appears to embody the best hopes for a practical universal construction, it is detailed here in some depth.
The causal boundary construction applies only to strongly causal spacetimes; essentially, this means that the local causal structure at each point is exactly reflective of the global causal structure.
The basic construction of the causal boundary of a spacetime M starts with two separate parts: the future and past (pre-)boundaries of M, intended as yielding endpoints for, respectively, future- and past-endless causal curves. Part of the difficulty of the causal boundary is knowing how best to meld these two into one; currently, there are several answers to this conundrum.
The elements of the future causal boundary of M are defined in terms of the past-set operator $I^-$. For a point $x \in M$, the past of x is $I^-(x) = \{y \mid y \ll x\}$; for a set $A \subset M$, $I^-[A] = \bigcup_{x \in A} I^-(x)$. A set $P \subset M$ is called a past set if $I^-[P] = P$; anything of the form $P = I^-[A]$ is a past set, and all past sets have this form. A past set P is an indecomposable past set (IP) if P cannot be written as $P_1 \cup P_2$ for past sets which are proper subsets $P_i \subsetneq P$. IPs come in exactly two varieties: pointlike IPs (PIPs), of the form $I^-(x)$ (Figure 4), and terminal IPs (TIPs), of the form $I^-[c]$ for c a future-endless causal curve (Figure 5). (Of course, any $I^-(x)$ can also be expressed as $I^-[c]$ for c a causal curve ending at x.) The future causal boundary of M, $\hat\partial(M)$, consists of all the TIPs of M; the future causal completion of M is $\hat M = \hat\partial(M) \cup M$. But that is just a set; the causal structure of M needs to be extended to $\hat M$.

Figure 4 PIP $P = I^-(x)$.
Figure 5 TIP $P = I^-[c]$.

For any $x \in M$ and $P \in \hat\partial(M)$, set $x \ll P$ if and only if $x \in P$; set $P \ll x$ if and only if $P \subset I^-(y)$ for some $y \ll x$ ($y \in M$); and for P and Q in $\hat\partial(M)$, set $P \ll Q$ if and only if $P \subset I^-(y)$ for some $y \in Q$. If we consider this an extension of the relation $\ll$ on M, then we end up with a relation which, like that on M, is transitive and antireflexive. Furthermore, it has the property that for all $\alpha, \beta \in \hat M$, $\alpha \ll \beta$ if and only if for some $x \in M$, $\alpha \ll x \ll \beta$. (One can also amend the chronology relation within M to be more like the definition in the extension; that is not of major import.)
We can also extend the causality relation $\prec$ on M to one on $\hat M$ (in M, $x \prec y$ if there is a future-directed

causal curve from x to y): for $x \in M$ and $P, Q \in \hat\partial(M)$, $x \prec P$ for $I^-(x) \subset P$, $P \prec x$ for $P \subset I^-(x)$, and $P \prec Q$ for $P \subset Q$.
The intent is to have the elements of $\hat\partial(M)$ provide future endpoints for future-endless causal curves in M; in particular, we want two such curves, $c_1$ and $c_2$, to be assigned the same future endpoint precisely when $I^-[c_1] = I^-[c_2]$. This is accomplished by the simple expedient of defining the future endpoint of a future-endless causal curve c to be $P = I^-[c]$. We do not have a topology on $\hat M$ as yet, but it is worth noting that if P is the assigned future endpoint of c, then $I^-(P) = I^-[c]$; this is at least the correct causal behavior for a putative future endpoint of c.
We can perform all the operations above in the time-dual manner, obtaining the past causal boundary $\check\partial(M)$, consisting of terminal indecomposable future sets (TIFs), and the past causal completion $\check M = \check\partial(M) \cup M$. The full causal boundary of M consists of the union of $\hat\partial(M)$ with $\check\partial(M)$ with some sort of identifications to be made.
As an example of the need for identifications, consider M to be $L^2$ with a closed timelike line segment deleted, say $M = L^2 - \{(0, t) \mid 0 \le t \le 1\}$. For $\hat\partial(M)$, we have first the boundary elements at infinity: the TIP $i^+ = M$ (the past of the positive time axis) and the set of TIPs making up $\mathscr{I}^+$ (the pasts of null lines going out to infinity in $L^2$); and then, the boundary elements coming from the deleted points: for each t with $0 < t \le 1$, two IPs emanating from (0, t), that is, $P_t^+$, the past of the null line going pastwards from (0, t) toward x > 0, and $P_t^-$, the past of the null line going pastwards from (0, t) toward x < 0; and $P_0$, emanating from (0, 0), that is, the past of the negative time axis. Similarly, $\check\partial(M)$ consists of $i^-$, $\mathscr{I}^-$, TIFs $F_t^+$ and $F_t^-$ emanating from (0, t) for $0 \le t < 1$, and the TIF $F_1$ emanating from (0, 1). We probably want to make at least the following identifications for each t with 0 < t < 1, $P_t^+ \sim F_t^+$ and $P_t^- \sim F_t^-$; $P_1^+ \sim F_1 \sim P_1^-$; and $F_0^+ \sim P_0 \sim F_0^-$. This results in a two-sided replacement for the deleted segment; for some purposes, it might be deemed desirable to identify the two sides as one, but a universal boundary is probably a good idea, leaving further identifications as optional quotients of the universal object.
How best to define the appropriate identifications in general is a matter of some controversy. GKP defined a somewhat complicated topology on $M^\sharp = \hat\partial(M) \cup \check\partial(M) \cup M$, then used an identification intended to result in a Hausdorff space. There are significant problems with this approach in some outré spacetimes, as pointed out by Budic and Sachs (1974) and Szabados (1989), both of whom recommended a different set of identifications. But what is of more concern is that the topology prescribed by GKP is not what might be expected in even the simplest of cases, for example, Minkowski space: $L^n$ needs no identifications among boundary points (no matter whose identification procedure is followed). The GKP topology on $L^n$, restricted to $\hat\partial(L^n)$, is not that of a cone ($S^{n-2} \times \mathbb{R}^1$ with a point added), as is the case for $\mathscr{I}^+$ in the conformal embedding into $E^n$; but, instead, each null line in $\hat\partial(L^n)$ (not including $i^+$) is an open set, and $i^+$ has no neighborhood in $\hat\partial(L^n)$ save for the entire boundary. This is a topology bearing no relation at all to that of any embedding.

Future Causal Boundary
Construction  An alternative approach, initiated by Harris (1998), is to forego the full causal boundary and concentrate only on $\hat M$ and $\check M$ separately. There is an advantage to this in that the process of future causal completion (that is to say, forming $\hat M$ from M) can be made functorial in an appropriate category of "chronological sets": a set X with a relation $\ll$ which is transitive and antireflexive such that it possesses a countable subset S which is chronologically dense, that is, for any $x, y \in X$ with $x \ll y$, there is some $s \in S$ with $x \ll s \ll y$. Any strongly causal spacetime M is a chronological set, as is $\hat M$. The entire construction of the future causal boundary works just as well for a chronological set. The role of a timelike curve in a chronological set is taken by a future chain: a sequence $c = \{x_n\}$ with $x_n \ll x_{n+1}$ for all n. For any future chain c, $I^-[c]$ is an IP, and any IP can be so expressed; but unlike in spacetimes, $I^-(x)$ may or may not be an IP for $x \in X$. Then, $\hat X$ is always future complete in the sense that for any future chain c in $\hat X$, there is an element $\alpha \in \hat X$ with $I^-(\alpha) = I^-[c]$: for instance, if the chain c lies in X but there is no $x \in X$ with $I^-(x) = I^-[c]$, just let $\alpha = I^-[c]$, which is an element of $\hat\partial(X)$. This yields a functor of future completion from the category of chronological sets to the category of future-complete chronological sets, and the embedding $X \to \hat X$ is a universal object in the sense of category theory; this implies that it is categorically unique and is the minimal future-completion process.
However, it is crucial to have more than the chronology relation operating in what is to be a boundary; topology of some sort is needed. This is accomplished by defining what might be called the future-chronological topology for any chronological set (including for $\hat M$ when M is a strongly causal spacetime). This topology is defined by means of a limit-operator $\hat L$ on sequences: if X is the chronological set, then for any sequence of points $\sigma = \{x_n\}$ in X, $\hat L(\sigma)$ denotes a subset of X which is the set of

limits of $\sigma$. It is explicitly recognized that there may be more than one limit of a sequence, as the space may not be Hausdorff; no attempt is made to remove any non-Hausdorffness, as this is viewed as giving important information on how, possibly, two points in the future causal boundary represent very similar and yet not identical pieces of information about the causal structure at infinity.
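
A standard toy example from point-set topology, not taken from the causal boundary literature, shows how a single sequence can have two distinct limits. The space $X$ and the labels $0_a$, $0_b$ below are illustrative notation only.

```latex
% Line with two origins: two copies of R glued along R \ {0}.
X = (\mathbb{R} \setminus \{0\}) \cup \{0_a, 0_b\}, \qquad
\text{basic neighborhoods of } 0_a:\
(-\varepsilon, 0) \cup \{0_a\} \cup (0, \varepsilon)
\quad (\text{similarly for } 0_b).
% The sequence x_n = 1/n eventually enters every neighborhood of both
% origins, so it has two limits:
x_n = \tfrac{1}{n} \to 0_a \quad \text{and} \quad x_n \to 0_b ,
\qquad \text{although } 0_a \ne 0_b .
```

Each point of this space is closed, so it is $T_1$ without being Hausdorff, which is the same separation behavior attributed to the future-chronological topology in this section.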

Once the limit operator is in place, the actual topology on X is defined thus: a subset $A \subset X$ is said to be closed if and only if for any sequence $\sigma \subset A$, $\hat L(\sigma) \subset A$ (and open sets are complements of closed sets). This yields the elements of $\hat L(\sigma)$ as topological limits of $\sigma$.
The definition of $\hat L$ is simplest when X has the property that $I^-(x)$ is an IP for any $x \in X$; as this is true for X being either a spacetime M or the future causal completion $\hat M$ of a spacetime, the discussion here is restricted to this situation. Let us also make the common assumption that X is past-distinguishing, that is, $I^-(x) = I^-(y)$ implies x = y.
Let $\sigma = \{x_n\}$ be a sequence of points in a past-distinguishing chronological set X in which the past of any point is an IP. Then $\hat L(\sigma)$ consists of those points x for which (see Figures 6 and 7)
1. for all $y \in I^-(x)$, for n sufficiently large, $y \ll x_n$, and
2. for any IP $P \supsetneq I^-(x)$, there is some $z \in P$ such that for n sufficiently large, $z \not\ll x_n$.

Figure 6 $x \in \hat L(\{x_n\})$: for all $y \in I^-(x)$, eventually $y \ll x_n$, and for all IP $P \supsetneq I^-(x)$, there is some $z \in P$ such that eventually $z \not\ll x_n$.
Figure 7 $x \notin \hat L(\{x_n\})$: there is some IP $P \supsetneq I^-(x)$ such that for all $z \in P$, $z \ll x_n$ for infinitely many n.

Then the future-chronological topology on X has these features:
1. It is a $T_1$ topology, that is, points are closed.
2. If $I^-(x) = I^-[c]$ for a future chain $c = \{x_n\}$, then x is a topological limit of the sequence $\{x_n\}$.
3. If X = M, a strongly causal spacetime, then the future-chronological topology is precisely the manifold topology.
4. If $X = \hat M$, the future causal completion of a strongly causal spacetime M, then the induced topology on M is the manifold topology, $\hat\partial(M)$ is a closed subset of $\hat M$, and M is dense in $\hat M$. As per property (2), for any future-endless causal curve c in M, the point $I^-[c]$ in $\hat\partial(M)$ is the topological endpoint of c in $\hat M$.
5. If $X = \widehat{L^n}$, then X is homeomorphic to the conformal image of $L^n$ in $E^n$ together with $\mathscr{I}^+$ and $i^+$; in particular, $\hat\partial(L^n)$ has the topology of a cone.

Examples  The future causal boundary with the future-chronological topology can be calculated with a fair degree of success. For instance, if M is conformal to a simple product spacetime $Q \times L^1$ (Q a Riemannian manifold), then $\hat\partial(M)$ is much like $\hat\partial(L^n)$ in that it consists of null or timelike lines factored over a particular boundary construction $\partial(Q)$ on Q, coming together at a single point $i^+$ (the IP which is all of M); if Q is complete, then these are all null lines, and together they may be called $\mathscr{I}^+$.
The elements of $\partial(Q)$ are defined in terms of the Lipschitz-1 functions on Q known as Busemann functions: if $c: [\alpha, \omega) \to Q$ is any endless unit-speed curve (typically, $\omega = \infty$), then the Busemann function $b_c: Q \to \mathbb{R}$ is defined by $b_c(q) = \lim_{s \to \omega} (s - d(c(s), q))$, where d is the distance function in Q; this function is either finite for all q or infinite for all q. The set B(Q) of finite Busemann functions has an $\mathbb{R}$-action defined by $a \cdot b_c = b_{a \cdot c}$, where $(a \cdot c)(s) = c(s + a)$. Then $\partial(Q) = B(Q)/\mathbb{R}$. For any $P \in \hat\partial(M)$, the boundary of P, as a subset of $Q \times L^1 \cong Q \times \mathbb{R}$, is the graph of a Busemann function (the function is $b_c$ for P generated by a null curve projecting to c); and a point x = (q, t) in M can be represented by $\partial(I^-(x))$, which is the graph of the function $t - d(\cdot, q)$. Thus, one could use the function-space topology on B(Q) to topologize $\hat M$; in that function-space topology $\hat\partial(M)$ is a cone on $\partial(Q)$, and $\hat M$, apart from $i^+$, is the topological product of $\mathbb{R}$ with $Q \cup \partial(Q)$. The future-chronological topology is sometimes different from the function-space topology, allowing more convergent sequences than the function-space topology does. When this happens, the result is non-Hausdorff, revealing pairs of points in $\hat\partial(M)$ which are more closely related to one another than the function-space

topology reveals; but it is still the case that $\hat\partial(M)$, apart from $i^+$, is fibered by $\mathbb{R}$ over $\partial(Q)$.

If $Q$ is a warped product $Q = (a, b) \times K$ for a compact manifold $K$ with metric $dr^2 + e^{\phi(r)} h$ with $h$ a metric on $K$, then one can calculate more precisely: if, for instance, $\phi$ has a minimum in the interior of $(a, b)$ and has suitable growth on either end, then $\partial(Q)$ represents two copies of $K$ (one for each end of $(a, b) \times K$), the future-chronological topology is the same as the function-space topology, and $\hat M$ (apart from $i^+$) is a simple product of $\mathbb{R}$ with $Q \cup \partial(Q)$: $\hat\partial(M)$ is precisely a null cone over two copies of $K$. This applies, for instance, to exterior Schwarzschild, where $K = S^2$; the boundary at one end of exterior Schwarzschild is the usual $\mathscr{I}^+$, and the boundary at the other end is the null cone $\{r = 2m\}$, where exterior attaches to interior Schwarzschild.

Calculations for the future-chronological topology become much easier when $\hat\partial(M)$ is purely spacelike, that is, no $P \in \hat\partial(M)$ is contained in the past of any other element of $\hat M$. For instance, if $M$ is conformal to a multiwarped product, $Q_1 \times \cdots \times Q_m \times (a, b)$ with metric $f_1(t)^2 h_1 + \cdots + f_m(t)^2 h_m - dt^2$, where $h_i$ is a Riemannian metric on $Q_i$, then $\hat\partial(M)$ will be purely spacelike if all the Riemannian factors are complete and for each $i$, $\int_{b^-}^{b} 1/f_i(t)\, dt < \infty$; in that case, $\hat\partial(M) \cong Q$, where $Q = Q_1 \times \cdots \times Q_m$ and $\hat M \cong Q \times (a, b]$. This applies, for instance, to interior Schwarzschild, where $Q_1 = \mathbb{R}^1$ and $Q_2 = S^2$, yielding the topology of $\mathbb{R}^1 \times S^2$ for the Schwarzschild singularity.

There is a categorical universality for spacelike boundaries and the future-chronological topology. This means that any other reasonable way of future-completing interior Schwarzschild must yield $\mathbb{R}^1 \times S^2$ or a topological quotient of that for the singularity; and if the result is to be past-distinguishing, $\mathbb{R}^1 \times S^2$ is the only possibility.

Of course, all this can be done in the time-dual fashion, using the past-chronological topology on $\check M$. It would be desirable to combine the future and past causal boundaries with a suitable topology as well as appropriate identifications. There has been some work in that direction.

Causal Boundary: Revisited

Marolf and Ross (2003) have proposed an identification of TIPs and TIFs that relies on the equivalence relation defined by Szabados. For an IP $P$ and IF $F$, call $(P, F)$ a Szabados pair if $P \subset I^-(x)$ for all $x \in F$, $P$ is maximal among IPs for that property, and dually for $F$ with respect to $P$. For instance, for any $x \in M$, $(I^-(x), I^+(x))$ is a Szabados pair. The Marolf–Ross version of the causal boundary, $\bar\partial(M)$, consists of all Szabados pairs formed of TIPs and TIFs, plus any TIP or TIF that cannot be paired; this produces an appropriate set of identifications within $\hat\partial(M) \cup \check\partial(M)$. The chronology relation on $M$ is extended to $\bar M = \bar\partial(M) \cup M$ by treating each point $x$ in $M$ as the Szabados pair $(I^-(x), I^+(x))$ and each unpaired IP $P$ as $(P, \emptyset)$ and unpaired IF $F$ as $(\emptyset, F)$, and then defining $(P, F) \ll (P', F')$ whenever $F \cap P' \neq \emptyset$.

The resulting chronological set is not necessarily either past- or future-distinguishing, but it is (past and future)-distinguishing. The topology they propose places endpoints in $\bar\partial(M)$ for all causal curves which are endless in $M$, but there may be multiple future endpoints for a single future-endless curve. The topology need not be $T_1$: points can fail to be closed. For a product spacetime $M = Q \times \mathbb{L}^1$, the Marolf–Ross topology on $\bar M$ is always the function-space topology.

As of this writing, there is active research by J L Flores to institute a Marolf–Ross type of identification of $\hat\partial(M)$ with $\check\partial(M)$ using a topology that partakes more of the future- and past-chronological topologies.

See also: Asymptotic Structure and Conformal Infinity; Spacetime Topology, Causal Structure and Singularities.

Further Reading

Budic R and Sachs RK (1974) Causal boundaries for general relativistic space-times. Journal of Mathematical Physics 15: 1302–1309.
García-Parrado A and Senovilla JMM (2003) Causal relationship: a new tool for the causal characterization of Lorentzian manifolds. Classical and Quantum Gravity 20: 625–664.
Geroch RP, Kronheimer EH, and Penrose R (1972) Ideal points in space-time. Proceedings of the Royal Society of London, Series A 327: 545–567.
Harris SG (1998) Universality of the future chronological boundary. Journal of Mathematical Physics 39: 5427–5445.
Harris SG (2000) Topology of the future chronological boundary: universality for spacelike boundaries. Classical and Quantum Gravity 17: 551–603.
Harris SG (2001) Causal boundary for standard static spacetimes. Nonlinear Analysis 47: 2971–2981 (Special Edition: Proceedings of the Third World Congress in Nonlinear Analysis).
Harris SG (2004a) Boundaries on spacetimes: an outline. Classical and Quantum Gravity 359: 65–85.
Harris SG (2004b) Discrete group actions on spacetimes: causality conditions and the causal boundary. Classical and Quantum Gravity 21: 1209–1236.
Harris SG and Dray T (1990) The causal boundary of the trousers space. Classical and Quantum Gravity 7: 149–161.
Hawking SW and Ellis GFR (1973) The Large Scale Structure of Space-Time. Cambridge: Cambridge University Press.
Marolf D and Ross SF (2003) A new recipe for causal completions. Classical and Quantum Gravity 20: 4085–4118.
Schmidt BG (1972) Local completeness of the b-boundary. Communications in Mathematical Physics 29: 49–54.
Scott SM and Szekeres P (1994) The abstract boundary – a new approach to singularities of manifolds. Journal of Geometry and Physics 13: 223–253.
Boundary Conformal Field Theory 333

Szabados LB (1988) Causal boundary for strongly causal spacetimes. Classical and Quantum Gravity 5: 121–134.
Szabados LB (1989) Causal boundary for strongly causal spacetimes: II. Classical and Quantum Gravity 6: 77–91.
Wald RM (1984) General Relativity. Chicago: University of Chicago Press.

Boundary Conformal Field Theory


J Cardy, Rudolf Peierls Centre for Theoretical Physics, Oxford, UK
© 2006 Elsevier Ltd. All rights reserved.

Boundary conformal field theory (BCFT) is simply the study of conformal field theory (CFT) in domains with a boundary. It gains its significance [1] because, in some ways, it is mathematically simpler: the algebraic and geometric structures of CFT appear in a more straightforward manner; and [2] because it has important applications: in string theory in the physics of open strings and D-branes, and in condensed matter physics in boundary critical behavior and quantum impurity models.

This article, however, describes the basic ideas from the point of view of quantum field theory, without regard to particular applications or to any deeper mathematical formulations.

Review of CFT

Stress Tensor and Ward Identities

Two-dimensional CFTs are massless, local, relativistic renormalized quantum field theories. Usually they are considered in imaginary time, that is, on two-dimensional manifolds with Euclidean signature. In this article, the metric is also taken to be Euclidean, although the formulation of CFTs on general Riemann surfaces is also of great interest, especially for string theory. For the time being, the domain is the entire complex plane.

Heuristically, the correlation functions of such a field theory may be thought of as being given by the Euclidean path integral, that is, as expectation values of products of local densities with respect to a Gibbs measure $Z^{-1} e^{-S_E(\{\psi\})}[d\psi]$, where the $\{\psi(x)\}$ are some set of fundamental local fields, $S_E$ is the Euclidean action, and the normalization factor $Z$ is the partition function. Of course, such an object is not in general well defined, and this picture should be seen only as a guide to formulating the basic principles of CFT which can then be developed into a mathematically consistent theory.

In two dimensions, it is useful to use the so-called complex coordinates $z = x_1 + i x_2$, $\bar z = x_1 - i x_2$. In CFT, there are local densities $\phi_j(z, \bar z)$, called primary fields, whose correlation functions transform covariantly under conformal mappings $z \to z' = f(z)$:

\[ \langle \phi_1(z_1, \bar z_1)\, \phi_2(z_2, \bar z_2) \cdots \rangle = \prod_j f'(z_j)^{h_j}\, \overline{f'(z_j)}^{\,\bar h_j}\, \langle \phi_1(z_1', \bar z_1')\, \phi_2(z_2', \bar z_2') \cdots \rangle \tag*{[1]} \]

where $(h_j, \bar h_j)$ (usually real numbers, not complex conjugates of each other) are called the conformal weights of $\phi_j$. These local fields can in general be normalized so that their two-point functions have the form

\[ \langle \phi_j(z_j, \bar z_j)\, \phi_k(z_k, \bar z_k) \rangle = \frac{\delta_{jk}}{(z_j - z_k)^{2h_j} (\bar z_j - \bar z_k)^{2\bar h_j}} \tag*{[2]} \]

They satisfy an algebra known as the operator product expansion (OPE)

\[ \phi_i(z_1, \bar z_1) \cdot \phi_j(z_2, \bar z_2) = \sum_k c_{ijk}\, (z_1 - z_2)^{-h_i - h_j + h_k} (\bar z_1 - \bar z_2)^{-\bar h_i - \bar h_j + \bar h_k}\, \phi_k(z_1, \bar z_1) + \cdots \tag*{[3]} \]

which is supposed to be valid when inserted into higher-order correlation functions in the limit when $|z_1 - z_2|$ is much less than the separations of all the other points. The ellipses denote the contributions of other nonprimary scaling fields to be described below. The structure constants $c_{ijk}$, along with the conformal weights, characterize the particular CFT.

An essential role is played by the energy–momentum tensor, or, in Euclidean field theory language, the stress tensor $T^{\mu\nu}$. Heuristically, it is defined as the response of the partition function to a local change in the metric:

\[ T^{\mu\nu}(x) = -(2\pi)\, \frac{\delta \ln Z}{\delta g_{\mu\nu}(x)} \tag*{[4]} \]

(the factor of $2\pi$ is included so that similar factors disappear in later equations).

The symmetry of the theory under translations and rotations implies that $T^{\mu\nu}$ is conserved, $\partial_\mu T^{\mu\nu} = 0$, and symmetric. Scale invariance implies that it is also traceless: $\Theta \equiv T^\mu_{\ \mu} = 0$. It should be noted that the vanishing of the trace of the stress tensor for a scale invariant classical field theory does

not usually survive when quantum corrections are taken into account: indeed, $\Theta \propto \beta(g)$, the renormalization group (RG) beta-function. A quantum field theory is thus only a CFT when this vanishes, that is, at an RG fixed point. In complex coordinates, the components $T_{z\bar z} = T_{\bar z z} = 4\Theta$ vanish, while the conservation equations read

\[ \partial_{\bar z} T_{zz} = \partial_z T_{\bar z \bar z} = 0 \tag*{[5]} \]

Thus, correlators of $T(z) \equiv T_{zz}$ are locally analytic (in fact, globally meromorphic) functions of $z$, while those of $\bar T(\bar z) \equiv T_{\bar z \bar z}$ are antianalytic. It is this property of analyticity which makes CFTs tractable in two dimensions.

Since an infinitesimal conformal transformation $z \to z + \alpha(z)$ induces a change in the metric, its effect on a correlation function of primary fields, given by [1], may also be expressed through an appropriate integral involving an insertion of the stress tensor. This leads to the conformal Ward identity:

\[ \int_C \Big\langle T(z) \prod_j \phi_j(z_j, \bar z_j) \Big\rangle\, \alpha(z)\, \frac{dz}{2\pi i} = \sum_j \Big( h_j\, \alpha'(z_j) + \alpha(z_j)\, \frac{\partial}{\partial z_j} \Big) \Big\langle \prod_j \phi_j(z_j, \bar z_j) \Big\rangle \tag*{[6]} \]

where $C$ is a contour encircling all the points $\{z_j\}$. (A similar equation holds for the insertion of $\bar T$.) Using Cauchy's theorem, this determines the first few terms in the OPE of $T$ with any primary density:

\[ T(z) \cdot \phi_j(z_j, \bar z_j) = \frac{h_j}{(z - z_j)^2}\, \phi_j(z_j, \bar z_j) + \frac{1}{z - z_j}\, \partial_{z_j} \phi_j(z_j, \bar z_j) + O(1) \tag*{[7]} \]

The other, regular, terms in the OPE generate new scaling fields, which are not in general primary, called descendants. One way of defining a density to be primary is by the condition that the most singular term in its OPE with $T$ is a double pole.

The OPE of $T$ with itself has the form

\[ T(z) \cdot T(z_1) = \frac{c/2}{(z - z_1)^4} + \frac{2}{(z - z_1)^2}\, T(z_1) + \cdots \tag*{[8]} \]

The first term is present because $\langle T(z) T(z_1) \rangle$ is nonvanishing, and must take the form shown, with $c$ being some number (which cannot be scaled to unity, since the normalization of $T$ is fixed by its definition) which is a property of the CFT. It is known as the conformal anomaly number or the central charge. This term implies that $T$ is not itself primary. In fact, under a finite conformal transformation $z \to z' = f(z)$,

\[ T(z) \to f'(z)^2\, T(z') + \frac{c}{12}\, \{z', z\} \tag*{[9]} \]

where $\{z', z\} = (f''' f' - \tfrac{3}{2} f''^2)/f'^2$ is the Schwartzian derivative.

Virasoro Algebra

As with any quantum field theory, the local fields can be realized as linear operators acting on a Hilbert space. In ordinary QFT, it is customary to quantize on a constant-time hypersurface. The generator of infinitesimal time translations is the Hamiltonian $H$, which itself is independent of which time slice is chosen, because of time translational symmetry. It is also given by the integral over the hypersurface of the time–time component of the stress tensor. In CFT, because of scale invariance, one may instead quantize on a fixed circle of a given radius. The analog of the Hamiltonian is the dilatation operator $D$, which generates scale transformations. Unlike $H$, the spectrum of $D$ is usually discrete, even in an infinite system. It may also be expressed as an integral over the radial component of the stress tensor:

\[ \hat D = \frac{1}{2\pi} \int_0^{2\pi} r\, \hat T_{rr}\, r\, d\theta = \frac{1}{2\pi i} \int_C z\, \hat T(z)\, dz - \frac{1}{2\pi i} \int_C \bar z\, \hat{\bar T}(\bar z)\, d\bar z \equiv \hat L_0 + \hat{\bar L}_0 \tag*{[10]} \]

where, because of analyticity, $C$ can be any contour encircling the origin.

This suggests that one define other operators

\[ \hat L_n \equiv \frac{1}{2\pi i} \int_C z^{n+1}\, \hat T(z)\, dz \tag*{[11]} \]

and similarly the $\hat{\bar L}_n$. From the OPE [8] then follows the Virasoro algebra $\mathcal{V}$:

\[ [\hat L_n, \hat L_m] = (n - m)\, \hat L_{n+m} + \frac{c}{12}\, n(n^2 - 1)\, \delta_{n+m,0} \tag*{[12]} \]

with an isomorphic algebra $\bar{\mathcal{V}}$ generated by the $\hat{\bar L}_n$.

In radial quantization, there is a vacuum state $|0\rangle$. Acting on this with the operator corresponding to a scaling field gives a state $|\phi_j\rangle \equiv \hat\phi_j(0, 0)|0\rangle$ which is an eigenstate of $D$: in fact,

\[ \hat L_0 |\phi_j\rangle = h_j |\phi_j\rangle, \qquad \hat{\bar L}_0 |\phi_j\rangle = \bar h_j |\phi_j\rangle \tag*{[13]} \]

From the OPE [7], one sees that $|L_n \phi_j\rangle \propto L_n |\phi_j\rangle$, and, if $\phi_j$ is primary, $L_n |\phi_j\rangle = 0$ for all $n \geq 1$.

The states corresponding to a given primary field, and those generated by acting on these with all the $L_n$ with $n < 0$ an arbitrary number of times, form a

highest-weight representation of $\mathcal{V}$. However, this is not necessarily irreducible. There may be null vectors, which are linear combinations of states at a given level which are themselves annihilated by all the $L_n$ with $n > 0$. They exist whenever $h$ takes a value from the Kac table:

\[ h = h_{r,s} = \frac{(r(m+1) - sm)^2 - 1}{4m(m+1)} \tag*{[14]} \]

with the central charge parametrized as $c = 1 - 6/(m(m+1))$, and $r, s$ are non-negative integers. These null states should be projected out, giving an irreducible representation $\mathcal{V}_h$.

The full Hilbert space of the CFT is then

\[ \mathcal{H} = \bigoplus_{h, \bar h} n_{h, \bar h}\, \mathcal{V}_h \otimes \bar{\mathcal{V}}_{\bar h} \tag*{[15]} \]

where the non-negative integers $n_{h, \bar h}$ specify how many distinct primary fields of weights $(h, \bar h)$ there are in the CFT.

The consistency of the OPE [3] with the existence of null vectors leads to the fusion algebra of the CFT. This applies separately to the holomorphic and antiholomorphic sectors, and determines how many copies of $\mathcal{V}_c$ occur in the fusion of $\mathcal{V}_a$ and $\mathcal{V}_b$:

\[ \mathcal{V}_a \odot \mathcal{V}_b = \sum_c N_{ab}^{c}\, \mathcal{V}_c \tag*{[16]} \]

where the $N_{ab}^{c}$ are non-negative integers.

A particularly important subset of all CFTs consists of the minimal models. These have rational central charge $c = 1 - 6(p - q)^2/pq$, in which case the fusion algebra closes with a finite number of possible values $1 \leq r \leq q$, $1 \leq s \leq p$ in the Kac formula [14]. For these models, the fusion algebra takes the form

\[ \mathcal{V}_{r_1, s_1} \odot \mathcal{V}_{r_2, s_2} = {\sum_{r = |r_1 - r_2| + 1}^{r_1 + r_2 - 1}}' \ {\sum_{s = |s_1 - s_2| + 1}^{s_1 + s_2 - 1}}' \ \mathcal{V}_{r,s} \tag*{[17]} \]

where the prime on the sums indicates that they are to be restricted to the allowed intervals of $r$ and $s$.

There is an important theorem which states that the only unitary CFTs with $c < 1$ are the minimal models with $p/q = (m+1)/m$, where $m$ is an integer $\geq 3$.

Modular Invariance

The fusion algebra limits which values of $(h, \bar h)$ might appear in a consistent CFT, but not which ones actually occur, that is, the values of the $n_{h, \bar h}$. This is answered by the requirement of modular invariance on the torus. First consider the theory on an infinitely long cylinder, of unit circumference. This is related to the (punctured) plane by the conformal mapping $z \to (1/2\pi) \ln z \equiv t + ix$. The result is a QFT on the circle $0 \leq x < 1$, in imaginary time $t$. The generator of infinitesimal time translations is related to that for dilatations in the plane:

\[ \hat H = 2\pi \hat D - \frac{\pi c}{6} = 2\pi (\hat L_0 + \hat{\bar L}_0) - \frac{\pi c}{6} \tag*{[18]} \]

where the last term comes from the Schwartzian derivative in [9]. Similarly, the generator of translations in $x$, the total momentum operator, is $\hat P = 2\pi (\hat L_0 - \hat{\bar L}_0)$.

A general torus is, up to a scale transformation, a parallelogram with vertices $(0, 1, \tau, 1 + \tau)$ in the complex plane, with the opposite edges identified. We can make this by taking a cylinder of unit circumference and length $\mathrm{Im}\,\tau$, twisting the ends by a relative amount $\mathrm{Re}\,\tau$, and sewing them together. This means that the partition function of the CFT on the torus can be written as

\[ Z(\tau, \bar\tau) = \mathrm{tr}\, e^{-(\mathrm{Im}\,\tau) \hat H + i (\mathrm{Re}\,\tau) \hat P} = \mathrm{tr}\, q^{\hat L_0 - c/24}\, \bar q^{\hat{\bar L}_0 - c/24} \tag*{[19]} \]

using the above expressions for $\hat H$ and $\hat P$ and introducing $q \equiv e^{2\pi i \tau}$.

Through the decomposition [15] of $\mathcal{H}$, the trace sum can be written as

\[ Z(\tau, \bar\tau) = \sum_{h, \bar h} n_{h, \bar h}\, \chi_h(q)\, \chi_{\bar h}(\bar q) \tag*{[20]} \]

where

\[ \chi_h(q) \equiv \mathrm{tr}_{\mathcal{V}_h}\, q^{\hat L_0 - c/24} = \sum_N d_h(N)\, q^{h - c/24 + N} \tag*{[21]} \]

is the character of the representation of highest weight $h$, which counts the degeneracy $d_h(N)$ at level $N$. It is purely an algebraic property of the Virasoro algebra, and its explicit form is known in many cases.

Figure 1 Two equivalent parametrizations of the same torus.

All of this would be less interesting were it not for the observation that the parametrization of the torus through $\tau$ is not unique. In fact, the transformations $S: \tau \to -1/\tau$ and $T: \tau \to \tau + 1$ give the same torus (see Figure 1). Together, these

operations generate the modular group $SL(2, \mathbb{Z})$, and the partition function $Z(\tau, \bar\tau)$ should be invariant under them. T-invariance is simply implemented by requiring that $h - \bar h$ is an integer, but the S-invariance of the right-hand side of [20] places highly nontrivial constraints on the $n_{h, \bar h}$. That this can be satisfied at all relies on the remarkable property of the characters that they transform linearly under $S$:

\[ \chi_h(e^{-2\pi i/\tau}) = \sum_{h'} S_h^{h'}\, \chi_{h'}(e^{2\pi i \tau}) \tag*{[22]} \]

This follows from applying the Poisson sum formula to the explicit expressions for the characters, which are related to Jacobi theta-functions. In many cases (e.g., the minimal models) this representation is finite dimensional, and the matrix $S$ is symmetric and orthogonal. This means that one can immediately obtain a modular invariant partition function by forming the diagonal sum

\[ Z = \sum_h \chi_h(q)\, \chi_h(\bar q) \tag*{[23]} \]

so that $n_{h, \bar h} = \delta_{h \bar h}$. However, because of various symmetries of the characters, other modular invariants are possible: for the minimal models (and some others) these have been classified. Because of an analogy of the results with the classification of semisimple Lie algebras, the diagonal invariants are called the A-series.

Boundary CFT

In any field theory in a domain with a boundary, one needs to consider how to impose a set of consistent boundary conditions. Since CFT is formulated independently of a particular set of fundamental fields and a Lagrangian, this must be done in a more general manner. A natural requirement is that the off-diagonal component $T_{\parallel\perp}$ of the stress tensor parallel/perpendicular to the boundary should vanish. This is called the conformal boundary condition. If the boundary is parallel to the time axis, it implies that there is no momentum flow across the boundary. Moreover, it can be argued that, under the RG, any uniform boundary condition will flow into a conformally invariant one. For a given bulk CFT, however, there may be many possible distinct such boundary conditions, and it is one task of BCFT to classify these.

To begin with, take the domain to be the upper-half plane, so that the boundary is the real axis. The conformal boundary condition then implies that $T(z) = \bar T(\bar z)$ when $z$ is on the real axis. This has the immediate consequence that correlators of $\bar T$ are those of $T$, analytically continued into the lower-half plane. The conformal Ward identity, cf. [7], now reads

\[ \Big\langle T(z) \prod_j \phi_j(z_j, \bar z_j) \Big\rangle = \sum_j \Big( \frac{h_j}{(z - z_j)^2} + \frac{1}{z - z_j}\, \partial_{z_j} + \frac{\bar h_j}{(z - \bar z_j)^2} + \frac{1}{z - \bar z_j}\, \partial_{\bar z_j} \Big) \Big\langle \prod_j \phi_j(z_j, \bar z_j) \Big\rangle \tag*{[24]} \]

In radial quantization, in order that the Hilbert spaces defined on different hypersurfaces be equivalent, one must choose semicircles centered on some point on the boundary, conventionally the origin. The dilatation operator is now

\[ \hat D = \frac{1}{2\pi i} \int_S z\, \hat T(z)\, dz - \frac{1}{2\pi i} \int_S \bar z\, \hat{\bar T}(\bar z)\, d\bar z \tag*{[25]} \]

where $S$ is a semicircle. Using the conformal boundary condition, this can also be written as

\[ \hat D = \hat L_0 = \frac{1}{2\pi i} \int_C z\, \hat T(z)\, dz \tag*{[26]} \]

where $C$ is a complete circle around the origin. As before, one may similarly define the $L_n$, and they satisfy a Virasoro algebra.

Note that there is now only one Virasoro algebra. This is related to the fact that conformal mappings which preserve the real axis correspond to real analytic functions. The eigenstates of $L_0$ correspond to boundary operators $\hat\phi_j(0)$ acting on the vacuum state $|0\rangle$. It is well known that in a renormalizable QFT operators at the boundary require a different renormalization from those in the bulk, and this will in general lead to a different set of conformal weights. It is one of the tasks of BCFT to determine these, for a given allowed boundary condition.

However, there is one feature unique to boundary CFT in two dimensions. Radial quantization also makes sense, leading to the same form [26] for the dilatation operator, if the boundary conditions on the negative and positive real axes are different. As far as the structure of BCFT goes, correlation functions with this mixed boundary condition behave as though a local scaling field were inserted at the origin. This has led to the term boundary condition changing (bcc) operator, but it must be stressed that these are not local operators in the conventional sense.

The Annulus Partition Function

Just as consideration of the partition function on the torus illuminates the bulk operator content $n_{h, \bar h}$, it

turns out that consistency on the annulus helps classify both the allowed boundary conditions, and the boundary operator content. To this end, consider a CFT in an annulus formed of a rectangle of unit width and height $\delta$, with the top and bottom edges identified (see Figure 2). The boundary conditions on the left and right edges, labeled by $a, b, \ldots$, may be different. The partition function with boundary conditions $a$ and $b$ on either edge is denoted by $Z_{ab}(\delta)$.

Figure 2 The annulus, with boundary conditions a and b on either boundary.

One way to compute this is by first considering the CFT on an infinitely long strip of unit width. This is conformally related to the upper-half plane (with an insertion of bcc operators at $0$ and $\infty$ if $a \neq b$) by the mapping $z \to (1/\pi) \ln z$. The generator of infinitesimal translations along the strip is

\[ \hat H_{ab} = \pi \hat D - \frac{\pi c}{24} = \pi \hat L_0 - \frac{\pi c}{24} \tag*{[27]} \]

Thus, for the annulus,

\[ Z_{ab}(\delta) = \mathrm{tr}\, e^{-\delta \hat H_{ab}} = \mathrm{tr}\, q^{\hat L_0 - c/24} \tag*{[28]} \]

with $q \equiv e^{-\pi\delta}$. As before, this can be decomposed into characters:

\[ Z_{ab}(\delta) = \sum_h n_{ab}^{h}\, \chi_h(q) \tag*{[29]} \]

but note that now the expression is linear. The non-negative integers $n_{ab}^{h}$ give the operator content with the boundary conditions $(ab)$: the lowest value of $h$ with $n_{ab}^{h} > 0$ gives the conformal weight of the bcc operator, and the others give conformal weights of the other allowed primary fields which may also sit at this point.

On the other hand, the annulus partition function may be viewed, up to an overall rescaling, as the path integral for a CFT on a circle of unit circumference, being propagated for (imaginary) time $\delta^{-1}$. From this point of view, the partition function is no longer a trace, but rather the matrix element of $e^{-\hat H/\delta}$ between boundary states:

\[ Z_{ab}(\delta) = \langle a | e^{-\hat H/\delta} | b \rangle \tag*{[30]} \]

Note that $\hat H$ is the same Hamiltonian that appears in [18], and the boundary states lie in $\mathcal{H}$, [15].

How are these boundary states to be characterized? Using the transformation law [9] the conformal boundary condition applied to the circle implies that $L_n = \bar L_{-n}$. This means that any boundary state $|B\rangle$ lies in the subspace satisfying

\[ \hat L_n |B\rangle = \hat{\bar L}_{-n} |B\rangle \tag*{[31]} \]

Moreover, because of the decomposition [15] of $\mathcal{H}$, $|B\rangle$ is also some linear superposition of states from $\mathcal{V}_h \otimes \bar{\mathcal{V}}_{\bar h}$. This condition can therefore be applied in each subspace. Taking $n = 0$ in [31] constrains $\bar h = h$. For simplicity, consider only the diagonal CFTs with $n_{h, \bar h} = \delta_{h, \bar h}$. It can then be shown that the solution of [31] is unique and has the following form. The subspace at level $N$ of $\mathcal{V}_h$ has dimension $d_h(N)$. Denote an orthonormal basis by $|h, N; j\rangle$, with $1 \leq j \leq d_h(N)$, and the same basis for $\bar{\mathcal{V}}_h$ by $\overline{|h, N; j\rangle}$. The solution to [31] in this subspace is then

\[ |h\rangle\rangle \equiv \sum_{N=0}^{\infty} \sum_{j=1}^{d_h(N)} |h, N; j\rangle \otimes \overline{|h, N; j\rangle} \tag*{[32]} \]

These are called Ishibashi states. Matrix elements of the translation operator along the cylinder between them are simple:

\[ \langle\langle h' | e^{-\hat H/\delta} | h \rangle\rangle = \sum_{N'=0}^{\infty} \sum_{j'=1}^{d_{h'}(N')} \sum_{N=0}^{\infty} \sum_{j=1}^{d_h(N)} \langle h', N'; j' | \otimes \overline{\langle h', N'; j' |}\; e^{-(2\pi/\delta)(\hat L_0 + \hat{\bar L}_0 - c/12)}\; |h, N; j\rangle \otimes \overline{|h, N; j\rangle} \tag*{[33]} \]

\[ = \delta_{h'h} \sum_{N=0}^{\infty} \sum_{j=1}^{d_h(N)} e^{-(4\pi/\delta)(h + N - c/24)} \tag*{[34]} \]

\[ = \delta_{h'h}\, \chi_h(e^{-4\pi/\delta}) \tag*{[35]} \]

Note that the characters which appear are related to those in [29] by the modular transformation $S$.

The physical boundary states satisfying [29], sometimes called the Cardy states, are linear combinations of the Ishibashi states:

\[ |a\rangle = \sum_h \langle\langle h | a \rangle\, |h\rangle\rangle \tag*{[36]} \]

Equating the two different expressions [29] and [30] for $Z_{ab}$, and using the modular transformation law

[22] and the linear independence of the characters gives the (equivalent) conditions:

\[ n_{ab}^{h} = \sum_{h'} S_{h'}^{h}\, \langle a | h' \rangle\rangle \langle\langle h' | b \rangle \tag*{[37]} \]

\[ \langle a | h' \rangle\rangle \langle\langle h' | b \rangle = \sum_h S_{h}^{h'}\, n_{ab}^{h} \tag*{[38]} \]

These are called the Cardy conditions. The requirements that the right-hand side of [37] should give a non-negative integer, and that the right-hand side of [38] should factorize in $a$ and $b$, give highly nontrivial constraints on the allowed boundary states and their operator content.

For the diagonal CFTs considered here (and for the nondiagonal minimal models) a complete solution is possible. It can be shown that the elements $S_0^h$ of $S$ are all non-negative, so one may choose $\langle\langle h | \tilde 0 \rangle = (S_0^h)^{1/2}$. This defines a boundary state

\[ |\tilde 0\rangle \equiv \sum_h (S_0^h)^{1/2}\, |h\rangle\rangle \tag*{[39]} \]

and a corresponding boundary condition such that $n_{00}^{h} = \delta_{h0}$. Then, for each $h' \neq 0$, one may define a boundary state

\[ \langle\langle h | \tilde h' \rangle \equiv S_{h'}^{h} / (S_0^h)^{1/2} \tag*{[40]} \]

From [37], this gives $n_{h'0}^{h} = \delta_{h'h}$. For each allowed $h'$ in the torus partition function, there is therefore a boundary state $|\tilde h'\rangle$ satisfying the Cardy conditions. However, there is a further requirement:

\[ n_{h'h''}^{h} = \sum_\ell \frac{S_\ell^{h}\, S_{h'}^{\ell}\, S_{h''}^{\ell}}{S_0^{\ell}} \tag*{[41]} \]

should be a non-negative integer. Remarkably, this combination of elements of $S$ occurs in the Verlinde formula, which follows from considering consistency of the CFT on the torus. This states that the right-hand side of [41] is equal to the fusion algebra coefficient $N_{h'h''}^{h}$. Since these are non-negative integers, the above ansatz for the boundary states is consistent.

We conclude that, at least for the diagonal models, there is a bijection between the allowed primary fields in the bulk CFT and the allowed conformally invariant boundary conditions. For the minimal models, with a finite number of such primary fields, this correspondence has been followed through explicitly.

Example The simplest example is the diagonal $c = \tfrac12$ unitary CFT corresponding to $m = 3$. The allowed values of the conformal weights are $h = 0, \tfrac12, \tfrac1{16}$, and

\[ S = \begin{pmatrix} \tfrac12 & \tfrac12 & \tfrac{1}{\sqrt2} \\ \tfrac12 & \tfrac12 & -\tfrac{1}{\sqrt2} \\ \tfrac{1}{\sqrt2} & -\tfrac{1}{\sqrt2} & 0 \end{pmatrix} \tag*{[42]} \]

from which one finds the allowed boundary states

\[ |\tilde 0\rangle = \tfrac{1}{\sqrt2} |0\rangle\rangle + \tfrac{1}{\sqrt2} |\tfrac12\rangle\rangle + \tfrac{1}{2^{1/4}} |\tfrac1{16}\rangle\rangle \tag*{[43]} \]

\[ |\tilde{\tfrac12}\rangle = \tfrac{1}{\sqrt2} |0\rangle\rangle + \tfrac{1}{\sqrt2} |\tfrac12\rangle\rangle - \tfrac{1}{2^{1/4}} |\tfrac1{16}\rangle\rangle \tag*{[44]} \]

\[ |\tilde{\tfrac1{16}}\rangle = |0\rangle\rangle - |\tfrac12\rangle\rangle \tag*{[45]} \]

The nontrivial part of the fusion algebra of this CFT is

\[ \mathcal{V}_{1/16} \odot \mathcal{V}_{1/16} = \mathcal{V}_0 + \mathcal{V}_{1/2} \tag*{[46]} \]

\[ \mathcal{V}_{1/16} \odot \mathcal{V}_{1/2} = \mathcal{V}_{1/16} \tag*{[47]} \]

\[ \mathcal{V}_{1/2} \odot \mathcal{V}_{1/2} = \mathcal{V}_0 \tag*{[48]} \]

from which can be read off the boundary operator content

\[ n^{0}_{\tilde{1/16}\, \tilde{1/16}} = n^{1/2}_{\tilde{1/16}\, \tilde{1/16}} = n^{1/16}_{\tilde{1/16}\, \tilde{1/2}} = n^{0}_{\tilde{1/2}\, \tilde{1/2}} = 1 \tag*{[49]} \]

The $c = \tfrac12$ CFT is known to describe the continuum limit of the critical Ising model, in which spins $s = \pm 1$ are localized on the sites of a regular lattice. The above boundary conditions may be interpreted as the continuum limit of the lattice boundary conditions $s = 1$, $s = -1$ and free, respectively. Note there is a symmetry of the fusion rules which means that one could equally well have inverted the ordering of this correspondence.

Other Topics

Boundary Entropy

The partition function on an annulus of length $L$ and circumference $\beta$ can be thought of as the quantum statistical mechanics partition function for a one-dimensional QFT in an interval of length $L$, at temperature $\beta^{-1}$. It is interesting to consider this in the thermodynamic limit when $\delta^{-1} = L/\beta$ is large. In that case, only the ground state of $\hat H$ contributes in [30], giving

\[ Z_{ab}(L, \beta) \sim \langle a | 0 \rangle \langle 0 | b \rangle\, e^{\pi c L/6\beta} \tag*{[50]} \]

from which the free energy $F_{ab} = -\beta^{-1} \ln Z_{ab}$ and the entropy $S_{ab} = \beta^2 (\partial F_{ab}/\partial\beta)$ can be obtained. The result is

\[ S_{ab} = (\pi c/3\beta) L + s_a + s_b + o(1) \tag*{[51]} \]

where the first term is the usual extensive contribution. The other two pieces $s_a \equiv \ln(\langle a | 0 \rangle)$ and $s_b \equiv \ln(\langle b | 0 \rangle)$ may be identified as the boundary entropy associated with the corresponding boundary states. A similar definition may be made in massive QFTs. It is an unproven but well-verified conjecture that the boundary entropy is a nonincreasing function along boundary RG flows, and is stationary only for conformal boundary states.

Bulk–Boundary OPE

The boundary Ward identity [24] has the implication that, from the point of view of the dependence of its correlators on $z_j$ and $\bar z_j$, a primary field $\phi_j(z_j, \bar z_j)$ may be thought of as the product of two local fields which are holomorphic functions of $z_j$ and $\bar z_j$, respectively. These will satisfy OPEs as $|z_j - \bar z_j| \to 0$, with the appearance of primary fields on the right-hand side being governed by the fusion rules. These fields are localized on the real axis: they are the boundary operators. There is therefore a kind of bulk–boundary OPE:

\[ \phi_j(z_j, \bar z_j) = \sum_k d_{jk}\, (\mathrm{Im}\, z_j)^{-h_j - \bar h_j + h_k}\, \phi_k^{b}(\mathrm{Re}\, z_j) \tag*{[52]} \]

where the sum on the right-hand side is, in principle, over all the boundary fields consistent with the boundary condition, and the coefficients $d_{jk}$ are analogous to the OPE coefficients in the bulk. As before, they are nonvanishing only if allowed by the fusion algebra: a boundary field of conformal weight $h_k$ is allowed only if $N^{h_k}_{h_j \bar h_j} > 0$.

For example, in the $c = \tfrac12$ CFT, the bulk operator with $h = \bar h = \tfrac1{16}$ goes over into the boundary operator with $h = 0$, or that with $h = \tfrac12$, depending on the boundary condition. The bulk operator with $h = \bar h = \tfrac12$, however, can only go over into the identity boundary operator with $h = 0$ (or a descendant thereof).

The fusion rules also apply to the boundary operators themselves. The consistency of these with bulk–boundary and bulk–bulk fusion rules, as well as the modular properties of partition functions, was examined by Lewellen.

Extended Algebras

CFTs may contain other conserved currents apart from the stress tensor, which generate algebras (Kac–Moody, superconformal, W-algebras) which extend the Virasoro algebra. In BCFT, in addition to the conformal boundary condition, it is possible (but not necessary) to impose further boundary conditions relating the holomorphic and antiholomorphic parts of the other currents on the boundary. It is believed that all rational CFTs can be obtained from Kac–Moody algebras via the coset construction. The classification of boundary conditions from this point of view is fruitful and also important for applications, but is beyond the scope of this article.

Stochastic Loewner Evolution

In recent years, there has emerged a deep connection between BCFT and conformally invariant measures on curves in the plane which start at a boundary of a domain. These arise naturally in the continuum limit of certain statistical mechanics models. The measure is constructed dynamically as the curve is extended, using a sequence of random conformal mappings called stochastic Loewner evolution (SLE). In CFT, the point where the curve begins can be viewed as the insertion of a boundary operator. The requirement that certain quantities should be conserved in mean under the stochastic process is then equivalent to this operator having a null state at level two. Many of the standard results of CFT correspond to an equivalent property of SLE.

Acknowledgments

This article was written while the author was a member of the Institute for Advanced Study. He thanks the School of Mathematics and the School of Natural Sciences for their hospitality. The work was supported by the Ellentuck Fund.

See also: Affine Quantum Groups; Eight Vertex and Hard Hexagon Models; Indefinite Metric; Operator Product Expansion in Quantum Field Theory; Quantum Phase Transitions; Stochastic Loewner Evolutions; String Field Theory; Superstring Theories; Symmetries in Quantum Field Theory: Algebraic Aspects; Two-Dimensional Conformal Field Theory and Vertex Operator Algebras.

Further Reading

Affleck I (1997) Boundary condition changing operators in conformal field theory and condensed matter physics. Nuclear Physics B Proceedings Supplement 58: 35.
Cardy J (1984) Conformal invariance and surface critical behavior. Nuclear Physics B 240: 514–532.
Cardy J (1989) Boundary conditions, fusion rules and the Verlinde formula. Nuclear Physics B 324: 581.
di Francesco P, Mathieu P, and Sénéchal D (1999) Conformal Field Theory. New York: Springer.
Kager W and Nienhuis B (2004) A guide to stochastic Loewner evolution and its applications. Journal of Statistical Physics 115: 1149.
Lawler G (2005) Conformally Invariant Processes in the Plane. American Mathematical Society.
Lewellen DC (1992) Sewing constraints for conformal field theories on surfaces with boundaries. Nuclear Physics B 372: 654.
Petkova V and Zuber JB Conformal Boundary Conditions and What They Teach Us, Lectures given at the Summer School and Conference on Nonperturbative Quantum Field Theoretic
340 Boundary Control Method and Inverse Problems of Wave Propagation

Methods and their Applications, August 2000, Budapest, Werner W Random Planar Curves and SchrammLoewner Evolu-
Hungary, hep-th/0103007. tions, Springer Lecture Notes (to appear), math.PR/0303354.
Verlinde E (1988) Fusion rules and modular transformations in
2D conformal field theory. Nuclear Physics B 300: 360.

Boundary Control Method and Inverse Problems of Wave Propagation
M I Belishev, Petersburg Department of Steklov Institute of Mathematics, St. Petersburg, Russia
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Inverse problems are generally positioned as the problems of determination of a system (its structure, parameters, etc.) from its "input → output" correspondence.

The boundary-value inverse problems deal with systems which describe processes (wave, heat, electromagnetic ones, etc.) occurring in media occupying a spatial domain. The process is initiated by a boundary source (input) and is described by a solution of a certain partial differential equation in the domain. Certain additional information about the solution, which can be extracted from measurements on the boundary, plays the role of the output. The objective is to determine the parameters of the medium, in particular the coefficients in the equation, from this information.

The boundary control (BC) method (Belishev 1986) is an approach to the boundary-value inverse problems based on their links with control theory and system theory. The present article presents a version of the BC method which solves the problem of reconstruction of a Riemannian manifold from its boundary spectral or dynamical data.

Forward Problems

Manifold

Let $(\Omega, d)$ be a smooth compact Riemannian manifold with the boundary $\Gamma$, $\dim \Omega \ge 2$; $d$ is the distance determined by the metric tensor $g$. For $A \subset \Omega$ denote

$$\langle A\rangle^r := \{x \in \Omega \mid d(x, A) \le r\}, \qquad r \ge 0;$$

the hypersurfaces $\Gamma^T := \{x \in \Omega \mid d(x, \Gamma) = T\}$, $T > 0$, are equidistant to $\Gamma$. In terms of the dynamics of the system, the value

$$T_* := \min\{T > 0 \mid \langle\Gamma\rangle^T = \Omega\} = \max_{\Omega} d(\cdot, \Gamma)$$

means the time needed for waves, moving from $\Gamma$ with the unit speed, to fill $\Omega$.

A point $x \in \Omega$ is said to belong to the set $c_0$ if $x$ is connected with $\Gamma$ via more than one shortest geodesic. The set $c := \overline{c_0}$ is called the separation set (cut locus) of $\Omega$ with respect to $\Gamma$. It is a closed set of zero volume. Let $\tau_*(\gamma)$ be the length of the geodesic emanating from $\gamma \in \Gamma$ orthogonally to $\Gamma$ and connecting $\gamma$ with $c$. The function $\tau_*(\cdot)$ is continuous on $\Gamma$.

For $x \in \Omega \setminus c$ the pair $(\gamma, \tau)$, such that $\tau = d(x, \Gamma) = d(x, \gamma)$, constitutes the semigeodesic coordinates of $x$. The set of these coordinates

$$\Theta := \{(\gamma, \tau) \mid \gamma \in \Gamma,\ 0 \le \tau < \tau_*(\gamma)\} \subset \Gamma \times [0, T_*]$$

is called the pattern of $\Omega$. Pictorially, to get the pattern, one needs to slit $\Omega$ along $c$ and then pull it on the cylinder $\Gamma \times [0, T_*]$. The part $\Theta^T := \Theta \cap (\Gamma \times [0, T])$ of the pattern consists of the semigeodesic coordinates of the points $x \in \langle\Gamma\rangle^T \setminus c$ (Figure 1).

Figure 1 Manifold and pattern. (Data from Belishev (1997).)

Dynamical System

Propagation of waves in the manifold is described by a dynamical system $\alpha^T$ of the form

$$u_{tt} - \Delta_g u = h \quad \text{in } \Omega \times (0, T) \qquad [1]$$

$$u|_{t=0} = u_t|_{t=0} = 0 \quad \text{in } \Omega \qquad [2]$$

$$u = f \quad \text{on } \Gamma \times [0, T] \qquad [3]$$

where $\Delta_g$ is the Beltrami–Laplace operator, $0 < T \le \infty$, $f$ and $h$ are the boundary and volume sources (controls), and $u = u^{f,h}(x, t)$ is the solution (wave).

Set $\mathcal{H} := L_2(\Omega)$; the spaces of the controls are

$$\mathcal{F}^T := L_2(\Gamma \times [0, T]), \qquad \mathcal{G}^T := L_2([0, T]; \mathcal{H}).$$
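The filling time $T_* = \max_\Omega d(\cdot, \Gamma)$ and the neighborhoods $\langle\Gamma\rangle^T$ can be illustrated on a toy domain. The sketch below assumes the flat unit square with the Euclidean metric, which is an illustrative choice and not an example taken from the article; the function names are hypothetical. For this domain $d(x, \Gamma)$ is the distance to the nearest side, so $T_* = 1/2$.

```python
import numpy as np

def boundary_distance(n):
    """Distance d(x, Gamma) sampled on an n-by-n grid of the flat unit square (toy domain)."""
    s = np.linspace(0.0, 1.0, n)
    x, y = np.meshgrid(s, s, indexing="ij")
    # Nearest side of the square in the Euclidean metric
    return np.minimum(np.minimum(x, 1.0 - x), np.minimum(y, 1.0 - y))

def fill_time(dist):
    """T_* = max over the domain of d(., Gamma): time for unit-speed waves to fill the domain."""
    return float(dist.max())

def filled_fraction(dist, T):
    """Fraction of grid nodes lying in the neighborhood <Gamma>^T = {x : d(x, Gamma) <= T}."""
    return float(np.mean(dist <= T))

if __name__ == "__main__":
    dist = boundary_distance(101)
    print("T_* ~", fill_time(dist))
    print("fraction filled at T = 0.1:", filled_fraction(dist, 0.1))
```

On a curved manifold the distance field would have to come from a geodesic solver; only the two reductions (maximum and sublevel set) carry over unchanged.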
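The "input → state" map of the system $\alpha^T$ is realized by the control operator

$$W^T: \mathcal{F}^T \times \mathcal{G}^T \to \mathcal{H}, \qquad W^T\{f, h\} := u^{f,h}(\cdot, T)$$

and its parts

$$W^T_{bd}: \mathcal{F}^T \to \mathcal{H}, \qquad W^T_{vol}: \mathcal{G}^T \to \mathcal{H},$$

$$W^T_{bd} f := u^{f,0}(\cdot, T), \qquad W^T_{vol} h := u^{0,h}(\cdot, T).$$

In the case $f = 0$ the evolution of the system is governed by the operator $L := -\Delta_g$ defined on the Sobolev class $H^2(\Omega) \cap H^1_0(\Omega)$ of functions vanishing on $\Gamma$, and the semigroup representation

$$u^{0,h}(\cdot, r) = W^r_{vol} h = \int_0^r L^{-1/2} \sin\!\left[(r - t) L^{1/2}\right] h(\cdot, t)\, dt \qquad [4]$$

holds for all $r \ge 0$.

The "input → output" map is implemented by the response operator $R^T: \mathcal{F}^T \to \mathcal{F}^T$,

$$R^T f := \partial_\nu u^{f,0} \quad \text{on } \Gamma \times [0, T]$$

defined on controls $f \in H^1(\Gamma \times [0, T])$ vanishing on $\Gamma \times \{t = 0\}$; here $\nu = \nu(\gamma)$ is the outward normal to $\Gamma$. The normal derivative $\partial_\nu u^{f,0}$ describes the forces appearing on $\Gamma$ as a result of interaction of the wave with the boundary.

The map $C^T: \mathcal{F}^T \to \mathcal{F}^T$, $C^T := (W^T_{bd})^* W^T_{bd}$, which is called the connecting operator, can be represented via the response operator of the system $\alpha^{2T}$:

$$C^T = \tfrac{1}{2}\,(S^T)^* R^{2T} J^{2T} S^T \qquad [5]$$

$S^T: \mathcal{F}^T \to \mathcal{F}^{2T}$ being the extension of controls from $\Gamma \times [0, T]$ onto $\Gamma \times [0, 2T]$ as odd functions of $t$ with respect to $t = T$, and $J^{2T}: \mathcal{F}^{2T} \to \mathcal{F}^{2T}$ being the integration

$$(J^{2T} f)(\cdot, t) = \int_0^t f(\cdot, s)\, ds.$$

Controllability

Open subsets $\sigma \subset \Gamma$ and $\omega \subset \Omega$ determine the subspaces

$$\mathcal{F}^T_\sigma := \{f \in \mathcal{F}^T \mid \operatorname{supp} f \subset \overline{\sigma} \times [0, T]\},$$

$$\mathcal{G}^T_\omega := \{h \in \mathcal{G}^T \mid \operatorname{supp} h \subset \overline{\omega} \times [0, T]\}$$

of controls acting from $\sigma$ and $\omega$, respectively. In view of hyperbolicity of the problem [1]–[3], the relation

$$\operatorname{supp} u^{f,h}(\cdot, t) \subset \langle\sigma\rangle^t \cup \langle\omega\rangle^t, \qquad t \ge 0 \qquad [6]$$

holds for $f \in \mathcal{F}^T_\sigma$ and $h \in \mathcal{G}^T_\omega$. This means that the waves propagate in $\Omega$ with unit speed.

The sets of waves

$$\mathcal{U}^T_\sigma := W^T_{bd} \mathcal{F}^T_\sigma, \qquad \mathcal{U}^T_\omega := W^T_{vol} \mathcal{G}^T_\omega$$

are said to be reachable at time $t = T$ from $\sigma$ and $\omega$, respectively. Denoting

$$\mathcal{H}A := \{y \in \mathcal{H} \mid \operatorname{supp} y \subset \overline{A}\}$$

by virtue of [6] one has the embeddings $\mathcal{U}^T_\sigma \subset \mathcal{H}\langle\sigma\rangle^T$ and $\mathcal{U}^T_\omega \subset \mathcal{H}\langle\omega\rangle^T$. The property of the system $\alpha^T$ that plays the key role in inverse problems is that these embeddings are dense:

$$\operatorname{cl} \mathcal{U}^T_\sigma = \mathcal{H}\langle\sigma\rangle^T, \qquad \operatorname{cl} \mathcal{U}^T_\omega = \mathcal{H}\langle\omega\rangle^T \qquad [7]$$

for any $T > 0$ (cl denotes the closure in $\mathcal{H}$).

In control theory, relations [7] are interpreted as an approximate controllability of the system in subdomains filled with waves; the name "BC method" is derived from the first one (boundary controllability). This property means that the sets of waves are rich enough: any function supported in the subdomain $\langle\sigma\rangle^T$ reachable for waves excited on $\sigma$ can be approximated with any precision in the $\mathcal{H}$-norm by the wave $u^{f,0}(\cdot, T)$ due to an appropriate choice of the control $f$ acting from $\sigma$. The proof of [7] relies on the fundamental Holmgren–John–Tataru unique continuation theorem for the wave equation (Tataru 1993).

Laplacian on Waves

If $h = 0$, so that the system is governed only by boundary controls, its trajectory $\{u^{f,0}(\cdot, t) \mid 0 \le t \le T\}$ does not leave the reachable set $\mathcal{U}^T_\Gamma$. In this case, the system possesses one more intrinsic operator $L^T$ which acts in the subspace $\operatorname{cl} \mathcal{U}^T_\Gamma$ and is introduced through its graph

$$\operatorname{gr} L^T := \operatorname{cl}\,\{\{W^T_{bd} f,\ -W^T_{bd} f_{tt}\} \mid f \in C_0^\infty(\Gamma \times (0, T))\} \qquad [8]$$

(closure in $\mathcal{H} \times \mathcal{H}$). By virtue of the relation $L^T W^T_{bd} f = -\Delta_g W^T_{bd} f$ following from the wave equation [1] and [6], the operator $L^T$ is interpreted as the Laplacian on waves filling the subdomain $\langle\Gamma\rangle^T$.

In the case $T > T_*$, one has $\langle\Gamma\rangle^T = \Omega$, $\operatorname{cl} \mathcal{U}^T_\Gamma = \mathcal{H}$, and $L^T$ is a densely defined operator in $\mathcal{H}$, satisfying $L^T \subset L$. Using [7], one proves the equality $L^T = L$. This equality and representation [4] imply that

$$W^r_{vol} h = \int_0^r (L^T)^{-1/2} \sin\!\left[(r - t)(L^T)^{1/2}\right] h(\cdot, t)\, dt \qquad [9]$$

for all $r \ge 0$ and any fixed $T > T_*$.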
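The representation of $W^r_{vol}$ as $\int_0^r L^{-1/2}\sin[(r-t)L^{1/2}]\,h(\cdot,t)\,dt$ can be checked in a finite-dimensional stand-in. The sketch below is only an illustration under assumptions: $L$ is replaced by the standard second-difference matrix of $-d^2/dx^2$ on $(0,1)$ with Dirichlet conditions (not the manifold Laplacian), the time integral is approximated by the trapezoid rule, and the function names are hypothetical. For a time-independent source $h_0$ the integral has the closed form $L^{-1}[I - \cos(rL^{1/2})]h_0$, which the quadrature should reproduce.

```python
import numpy as np

def dirichlet_laplacian(n):
    """Second-difference matrix of -d^2/dx^2 on (0, 1) with n interior nodes (toy stand-in for L)."""
    step = 1.0 / (n + 1)
    main = 2.0 * np.ones(n)
    off = -1.0 * np.ones(n - 1)
    return (np.diag(main) + np.diag(off, 1) + np.diag(off, -1)) / step**2

def duhamel(L, h, r, n_quad=2001):
    """Approximate u(r) = int_0^r L^{-1/2} sin[(r - t) L^{1/2}] h(t) dt for symmetric positive L."""
    lam, vecs = np.linalg.eigh(L)
    root = np.sqrt(lam)
    t = np.linspace(0.0, r, n_quad)
    # Coefficients of h(t) in the eigenbasis of L, one row per time sample
    coeffs = np.array([vecs.T @ h(s) for s in t])
    # Scalar kernel lambda^{-1/2} sin[(r - t) lambda^{1/2}] for every mode
    kernel = np.sin(np.outer(r - t, root)) / root
    integrand = kernel * coeffs
    dt = t[1] - t[0]
    # Composite trapezoid rule in time, mode by mode
    integral = dt * (integrand[0] / 2 + integrand[1:-1].sum(axis=0) + integrand[-1] / 2)
    return vecs @ integral

if __name__ == "__main__":
    L = dirichlet_laplacian(8)
    h0 = np.ones(8)
    u = duhamel(L, lambda s: h0, 0.7)
    lam, vecs = np.linalg.eigh(L)
    exact = vecs @ ((1.0 - np.cos(np.sqrt(lam) * 0.7)) / lam * (vecs.T @ h0))
    print("max deviation from closed form:", np.max(np.abs(u - exact)))
```

Working in the eigenbasis reduces the operator functions to scalar functions of the eigenvalues, which is the same mechanism that makes the formula usable once $L$ (or a unitary copy of it) is known.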
Spectral Problem

The Dirichlet homogeneous boundary-value problem is to find nontrivial solutions of the system

$$-\Delta_g \varphi = \lambda \varphi \quad \text{in } \Omega \qquad [10]$$

$$\varphi = 0 \quad \text{on } \Gamma \qquad [11]$$

This problem is equivalent to the spectral analysis of the operator $L$; it has the discrete spectrum $\{\lambda_k\}_{k=1}^\infty$, $0 < \lambda_1 < \lambda_2 \le \cdots$, $\lambda_k \to \infty$; the eigenfunctions $\{\varphi_k\}_{k=1}^\infty$, $L\varphi_k = \lambda_k \varphi_k$, form an orthonormal basis in $\mathcal{H}$.

Expanding the solutions of the problem [1]–[3] over the eigenfunctions of the problem [10], [11] one derives the spectral representation of waves:

$$u^{f,0}(\cdot, T) = W^T_{bd} f = \sum_{k=1}^\infty (f, s^T_k)_{\mathcal{F}^T}\, \varphi_k(\cdot) \qquad [12]$$

where

$$s^T_k(\gamma, t) := -\lambda_k^{-1/2} \sin\!\left[(T - t)\lambda_k^{1/2}\right] \partial_\nu \varphi_k(\gamma)$$

(the minus sign corresponds to $\nu$ being the outward normal). Thus, for a given control $f$, the Fourier coefficients of the wave $u^{f,0}$ are determined by the spectrum $\{\lambda_k\}_{k=1}^\infty$ and the derivatives $\{\partial_\nu \varphi_k\}_{k=1}^\infty$.

Inverse Problems

General Setup

The set of pairs $\Sigma := \{\lambda_k; \partial_\nu \varphi_k\}_{k=1}^\infty$ associated with the problem [10], [11] is said to be the Dirichlet spectral data of the manifold $(\Omega, d)$. The spectral (frequency domain) inverse problem is to recover the manifold from its spectral data.

Since the speed of wave propagation is unity, the response operator $R^T$ contains the information not about the entire manifold but only about its part $\langle\Gamma\rangle^{T/2}$. This fact is taken into account in the dynamical (time domain) inverse problem which aims to recover the manifold from the operator $R^{2T}$ given for a fixed $T > T_*$.

If the manifolds $(\Omega', d')$ and $(\Omega'', d'')$ are isometric via an isometry $i: \Omega' \to \Omega''$, then, identifying the boundaries by $i(\gamma) \equiv \gamma$, one gets two manifolds with the common boundary $\Gamma = \partial\Omega' = \partial\Omega''$ which possess identical inverse data: $\Sigma' = \Sigma''$, $R'^{\,2T} = R''^{\,2T}$. Such manifolds are called equivalent: they are indistinguishable for the external observer extracting $\Sigma$ or $R^{2T}$ from the boundary measurements. Therefore, these data do not determine the manifold uniquely and both of the inverse problems need to be clarified. The precise formulation is given in the form of two questions:

1. Does the coincidence of the inverse data imply the equivalence of the manifolds?
2. Given the inverse data of an unknown manifold, how to construct a manifold possessing these data?

The BC method gives an affirmative answer to the first question and provides a procedure producing a representative of the class of equivalent manifolds from its inverse data. The method is based on the concepts of model and coordinatization.

Model

A pair consisting of an auxiliary Hilbert space $\tilde{\mathcal{H}}$ and an operator $\tilde W^T_{bd}: \mathcal{F}^T \to \tilde{\mathcal{H}}$ is said to be a model of the system $\alpha^T$, if $\tilde W^T_{bd}$ is determined by inverse data, and the map $U: W^T_{bd} f \mapsto \tilde W^T_{bd} f$ is an isometry from $\operatorname{Ran} W^T_{bd} \subset \mathcal{H}$ onto $\operatorname{Ran} \tilde W^T_{bd} \subset \tilde{\mathcal{H}}$. The model is an intermediate object in solving inverse problems. It plays the role of an auxiliary copy of the original dynamical system which an external observer can build from measurements on the boundary. While the genuine wave process inside $\Omega$, initiated by a boundary control, remains inaccessible to direct measurements, its $\tilde{\mathcal{H}}$-representation can be visualized by means of the model control operator $\tilde W^T_{bd}$. This is illustrated by the diagram in Figure 2, where the upper part is invisible for an external observer, whereas the lower part can be extracted from inverse data.

Figure 2 Model of a system. (Data from Belishev (1997).)

Each type of data determines a corresponding model. The spectral model is the pair

$$\tilde{\mathcal{H}} := l_2, \qquad \tilde W^T_{bd} := \{(\,\cdot\,, s^T_k)_{\mathcal{F}^T}\}_{k=1}^\infty \qquad [13]$$

(see [12]); the role of the isometry $U$ is played by the Fourier transform $F: \mathcal{H} \to \tilde{\mathcal{H}}$, $Fy := \{(y, \varphi_k)_{\mathcal{H}}\}_{k=1}^\infty$. By virtue of [4], the data $\Sigma$ also determine the operator $\tilde W^r_{vol}: L_2([0, r]; \tilde{\mathcal{H}}) \to \tilde{\mathcal{H}}$,

$$\tilde W^r_{vol} \tilde h = \int_0^r \tilde L^{-1/2} \sin\!\left[(r - t)\tilde L^{1/2}\right] \tilde h(t)\, dt, \qquad r \ge 0 \qquad [14]$$

where $\tilde L := U L U^* = \operatorname{diag}\{\lambda_k\}_{k=1}^\infty$. Thus, the spectral model allows one to see the Fourier images of invisible waves.
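The statement that spectral data alone reproduce the Fourier coefficients of a wave can be tried on a one-dimensional toy problem. Everything in the sketch below is an assumption made for illustration: the "manifold" is the interval $(0,1)$ (the article requires $\dim\Omega \ge 2$), the control $f$ acts only at the endpoint $x = 0$, the time $T \le 1$ is short enough that no reflection occurs, so $u(x,T) = f(T-x)$ for $x \le T$ by d'Alembert's formula, and the function names are hypothetical. With $\varphi_k = \sqrt{2}\sin(k\pi x)$, $\lambda_k = (k\pi)^2$ and the outward normal derivative $\partial_\nu\varphi_k(0) = -\sqrt{2}\,k\pi$, the coefficient $(f, s^T_k)$ with $s^T_k = -\lambda_k^{-1/2}\sin[(T-t)\lambda_k^{1/2}]\,\partial_\nu\varphi_k$ should coincide with $(u(\cdot,T),\varphi_k)$.

```python
import numpy as np

def trapezoid(values, grid):
    """Composite trapezoid rule on the given grid."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))

def model_coefficient(f, k, T, n=4001):
    """(f, s_k^T): computed from the spectral data lambda_k and d_nu phi_k(0) only."""
    lam = (k * np.pi) ** 2
    dnu_phi = -np.sqrt(2.0) * k * np.pi  # outward normal derivative of phi_k at x = 0
    t = np.linspace(0.0, T, n)
    s = -np.sin((T - t) * np.sqrt(lam)) * dnu_phi / np.sqrt(lam)
    return trapezoid(f(t) * s, t)

def wave_coefficient(f, k, T, n=4001):
    """(u(., T), phi_k): computed from the wave itself, u(x, T) = f(T - x) for x <= T <= 1."""
    x = np.linspace(0.0, T, n)
    phi = np.sqrt(2.0) * np.sin(k * np.pi * x)
    return trapezoid(f(T - x) * phi, x)

if __name__ == "__main__":
    control = lambda t: t**2 * np.sin(3.0 * t)
    for k in range(1, 4):
        print(k, model_coefficient(control, k, 0.8), wave_coefficient(control, k, 0.8))
```

The first function never touches the interior of the interval, which is the point of a model: an observer holding only $\{\lambda_k; \partial_\nu\varphi_k\}$ can still evaluate the image of the wave.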
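According to [5], the response operator $R^{2T}$ determines the modulus of the control operator

$$|W^T_{bd}| := \left[(W^T_{bd})^* W^T_{bd}\right]^{1/2} = (C^T)^{1/2}$$

which enters in the polar decomposition $W^T_{bd} = \Phi^T |W^T_{bd}|$. Along with it, the response operator determines the dynamical model

$$\tilde{\mathcal{H}} := \operatorname{cl} \operatorname{Ran} (C^T)^{1/2}, \qquad \tilde W^T_{bd} := (C^T)^{1/2} \qquad [15]$$

The correspondence "system → model" is realized by the isometry $U = (\Phi^T)^*: W^T_{bd} f \mapsto |W^T_{bd}| f$. The operator $\tilde L^T := U L^T U^*$, dual to the Laplacian on waves, is determined by its graph

$$\operatorname{gr} \tilde L^T := \operatorname{cl}\,\{\{\tilde W^T_{bd} f,\ -\tilde W^T_{bd} f_{tt}\} \mid f \in C_0^\infty(\Gamma \times (0, T))\} \qquad [16]$$

(see [8]) and, therefore, $\tilde L^T$ is also determined by $R^{2T}$. In the case $T > T_*$, the operator $\tilde W^r_{vol}: L_2([0, r]; \tilde{\mathcal{H}}) \to \tilde{\mathcal{H}}$, dual to $W^r_{vol}$, is represented in the form

$$\tilde W^r_{vol} \tilde h = \int_0^r (\tilde L^T)^{-1/2} \sin\!\left[(r - t)(\tilde L^T)^{1/2}\right] \tilde h(t)\, dt, \qquad r \ge 0 \qquad [17]$$

in accordance with [9]. Thus, the dynamical model visualizes the $(\Phi^T)^*$-images of the waves propagating inside $\Omega$.

Wave Coordinatization

In a general sense, a coordinatization is a correspondence between points $x$ of the studied set $A$ and elements $\tilde x$ of another set $\tilde A$ such that: (i) the elements of $\tilde A$ are accessible and distinguishable; (ii) the map $x \mapsto \tilde x$ is a bijection; and (iii) relations between elements of $\tilde A$ determine those between points of $A$ which are studied (H Weyl). Coordinatization enables one to study $A$ via operations with coordinates $\tilde x \in \tilde A$.

The external observer investigating the manifold probes $\Omega$ with waves initiated by sources on $\Gamma$. The relevant coordinatization of $\Omega$ described below uses such waves and is implemented in three steps.

Step 1 (subdomains) Let $x(\gamma, \tau)$ be the end point of the geodesic of the length $\tau > 0$ emanating from $\gamma \in \Gamma$ in the direction $-\nu(\gamma)$, and let $\sigma_\varepsilon \subset \Gamma$ be a small neighborhood shrinking to $\gamma$ as $\varepsilon \to 0$. If $\tau \le \tau_*(\gamma)$, then the family of subdomains

$$\omega_\varepsilon(\gamma, \tau) := \left[\langle\Gamma\rangle^\tau \setminus \langle\Gamma\rangle^{\tau - \varepsilon}\right] \cap \langle\sigma_\varepsilon\rangle^\tau$$

(shaded domain in Figure 3) shrinks to $x(\gamma, \tau)$; if $\tau > \tau_*(\gamma)$, then the family terminates: $\omega_\varepsilon(\gamma, \tau) = \emptyset$ as $\varepsilon < \varepsilon_0(\tau)$ (the case $\gamma = \gamma'$ in Figure 3). Such behavior of subdomains implies that

$$\lim_{\varepsilon \to 0} \left\langle \left[\langle\Gamma\rangle^\tau \setminus \langle\Gamma\rangle^{\tau - \varepsilon}\right] \cap \langle\sigma_\varepsilon\rangle^\tau \right\rangle^r = \begin{cases} \langle x(\gamma, \tau)\rangle^r, & 0 \le \tau \le \tau_*(\gamma) \\ \emptyset, & \tau > \tau_*(\gamma) \end{cases} \qquad [18]$$

Figure 3 The subdomains.

Step 2 (wave subspaces) Pass from the subdomains to the corresponding subspaces $\mathcal{H}\langle\Gamma\rangle^\tau$, $\mathcal{H}\langle\sigma_\varepsilon\rangle^\tau$, $\mathcal{H}\langle\omega_\varepsilon(\gamma, \tau)\rangle^r$, and represent them via reachable sets by [7]:

$$\mathcal{H}\langle\Gamma\rangle^\tau = \operatorname{cl} W^\tau_{bd} \mathcal{F}^\tau, \qquad \mathcal{H}\langle\sigma_\varepsilon\rangle^\tau = \operatorname{cl} W^\tau_{bd} \mathcal{F}^\tau_{\sigma_\varepsilon},$$

$$\begin{aligned} \mathcal{H}\langle\omega_\varepsilon(\gamma, \tau)\rangle^r &= \operatorname{cl} W^r_{vol} L_2\!\left([0, r]; \mathcal{H}\omega_\varepsilon(\gamma, \tau)\right) \\ &= \operatorname{cl} W^r_{vol} L_2\!\left([0, r]; \left[\mathcal{H}\langle\Gamma\rangle^\tau \ominus \mathcal{H}\langle\Gamma\rangle^{\tau - \varepsilon}\right] \cap \mathcal{H}\langle\sigma_\varepsilon\rangle^\tau\right) \\ &= \operatorname{cl} W^r_{vol} L_2\!\left([0, r]; \left[\operatorname{cl} W^\tau_{bd} \mathcal{F}^\tau \ominus \operatorname{cl} W^{\tau - \varepsilon}_{bd} \mathcal{F}^{\tau - \varepsilon}\right] \cap \operatorname{cl} W^\tau_{bd} \mathcal{F}^\tau_{\sigma_\varepsilon}\right) \end{aligned}$$

Define

$$\mathcal{W}^r_{(\gamma, \tau)} := \lim_{\varepsilon \to 0} \operatorname{cl} W^r_{vol} L_2\!\left([0, r]; \left[\operatorname{cl} W^\tau_{bd} \mathcal{F}^\tau \ominus \operatorname{cl} W^{\tau - \varepsilon}_{bd} \mathcal{F}^{\tau - \varepsilon}\right] \cap \operatorname{cl} W^\tau_{bd} \mathcal{F}^\tau_{\sigma_\varepsilon}\right) \qquad [19]$$

$\mathcal{W}^r_{(\gamma, 0)} := \mathcal{W}^r_{(\gamma, +0)}$, $r \ge 0$ (the limits in the sense of the strong operator convergence of the projections in $\mathcal{H}$ on the corresponding subspaces). By the definitions, one has $\mathcal{W}^r_{(\gamma, \tau)} = \lim_{\varepsilon \to 0} \mathcal{H}\langle\omega_\varepsilon(\gamma, \tau)\rangle^r$, whereas [18] leads to the equality

$$\mathcal{W}^r_{(\gamma, \tau)} = \begin{cases} \mathcal{H}\langle x(\gamma, \tau)\rangle^r, & 0 \le \tau \le \tau_*(\gamma) \\ \{0\}, & \tau > \tau_*(\gamma) \end{cases} \qquad [20]$$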
for all $\gamma \in \Gamma$, $\tau \ge 0$, $r \ge 0$. As a result, since any $x \in \Omega$ can be represented as $x = x(\gamma, \tau)$, one attaches to every point of the manifold a family of expanding subspaces $\{\mathcal{W}^r_{(\gamma, \tau)} \mid r \ge 0\}$ built out of waves. As is seen from [20], the family is determined by the point $x$ (not dependent on the representation $x = x(\gamma, \tau)$); the subspaces which it consists of coincide with $\mathcal{H}\langle x\rangle^r$.

Expressing the distance as

$$d(x', x'') = 2 \inf\{r > 0 \mid \mathcal{H}\langle x'\rangle^r \cap \mathcal{H}\langle x''\rangle^r \ne \{0\}\}$$

in accordance with [20], one can represent

$$d(x', x'') = 2 \inf\{r > 0 \mid \mathcal{W}^r_{(\gamma', \tau')} \cap \mathcal{W}^r_{(\gamma'', \tau'')} \ne \{0\}\} \qquad [21]$$

where $x' = x(\gamma', \tau')$, $x'' = x(\gamma'', \tau'')$, and hence find the distance via the above families.

Step 3 (wave copy) By varying $\gamma \in \Gamma$, $\tau \ge 0$, gather all nonzero families $\{\mathcal{W}^r_{(\gamma, \tau)} \mid r \ge 0\} =: \tilde x$ in the set $\tilde\Omega = \{\tilde x\}$. Redenoting $\mathcal{W}^r_{\tilde x} := \mathcal{W}^r_{(\gamma, \tau)} \in \tilde x$, endow the set with the distance

$$\tilde d(\tilde x', \tilde x'') := 2 \inf\{r > 0 \mid \mathcal{W}^r_{\tilde x'} \cap \mathcal{W}^r_{\tilde x''} \ne \{0\}\} \qquad [22]$$

In view of [21], one has $\tilde d(\tilde x', \tilde x'') = d(x', x'')$, so that the metric space $(\tilde\Omega, \tilde d)$ is an isometric copy of $(\Omega, d)$ by construction. Thus, the correspondence $x \mapsto \tilde x$ ("point ↦ family") is an isometry and satisfies the general principles (i)–(iii) of coordinatization.

The manifold $(\tilde\Omega, \tilde d)$ is the end product of the wave coordinatization. It represents the original manifold as a collection of infinitesimal sources interacting with each other via the waves which they produce.

Solving Inverse Problems

The motivation for the above coordinatization is that the wave copy can be reproduced via any model. Namely, the external observer with the knowledge of $\Sigma$ or $R^{2T}$ ($T > T_*$) can recover $(\tilde\Omega, \tilde d)$ up to isometry by the following procedure:

1. Construct the model corresponding to the given inverse data and determine the operators $\tilde W^\tau_{bd}$, $0 \le \tau \le T$, by [13], [15]; then determine $\tilde L$, $\tilde L^T$, and $\tilde W^r_{vol}$ by [14] or [16], [17].
2. Replace on the right-hand side of [19] all operators $W$ without tildes by the ones with tildes, and get the subspaces $\tilde{\mathcal{W}}^r_{(\gamma, \tau)} = U \mathcal{W}^r_{(\gamma, \tau)}$, $\gamma \in \Gamma$, $\tau \ge 0$, $r \ge 0$.
3. Gather all nonzero families $\{\tilde{\mathcal{W}}^r_{(\gamma, \tau)} \mid r \ge 0\} =: \tilde{\tilde x}$ in the set $\tilde{\tilde\Omega} = \{\tilde{\tilde x}\}$ and redenote the subspaces as $\tilde{\mathcal{W}}^r_{\tilde{\tilde x}} := \tilde{\mathcal{W}}^r_{(\gamma, \tau)} \in \tilde{\tilde x}$; endow the set with the metric $\tilde{\tilde d}(\tilde{\tilde x}', \tilde{\tilde x}'') := 2 \inf\{r > 0 \mid \tilde{\mathcal{W}}^r_{\tilde{\tilde x}'} \cap \tilde{\mathcal{W}}^r_{\tilde{\tilde x}''} \ne \{0\}\}$ (see [22]), and get a sample $(\tilde{\tilde\Omega}, \tilde{\tilde d})$ of the wave copy $(\tilde\Omega, \tilde d)$.

This sample is isometric to the original $(\Omega, d)$ by construction. Identifying properly the boundaries $\partial\tilde{\tilde\Omega}$ and $\Gamma$, one turns $(\tilde{\tilde\Omega}, \tilde{\tilde d})$ into a canonical representative of the class of equivalent manifolds possessing the given inverse data.

If the response operator $R^{2T}$ is given for a fixed $T < T_*$, the above procedure produces the wave copy of the submanifold $(\langle\Gamma\rangle^T, d)$. This locality in time is an intrinsic feature and advantage of the BC method: longer time of observation on $\Gamma$ increases the depth of penetration into $\Omega$.

Amplitude Formula

Another variant of the BC method is based on geometrical optics formulas describing the propagation of singularities of the waves.

Let $y \in \mathcal{H}$, and let $\beta$ be the density of the volume in semigeodesic coordinates: $dx = \beta\, d\Gamma\, d\tau$; the function

$$\tilde y(\gamma, \tau) := \begin{cases} \beta^{1/2}(\gamma, \tau)\, y(x(\gamma, \tau)), & (\gamma, \tau) \in \Theta \\ 0, & \text{otherwise} \end{cases}$$

defined on $\Gamma \times [0, T_*]$ is called the image of $y$. The amplitude formula represents the images of waves initiated by boundary controls in the form

$$\widetilde{u^{f,0}(\cdot, T)}\,(\gamma, \tau) = \lim_{t \to T - \tau - 0} \left[(W^T_{bd})^* (I - P^\tau)\, W^T_{bd} f\right](\gamma, t), \qquad 0 < \tau < T$$

where $I$ is the identity operator and $P^\tau$ is the projection in $\mathcal{H}$ onto $\operatorname{cl} W^\tau_{bd} \mathcal{F}^\tau$. The formula is derived by the ray method going back to J Hadamard; the derivation uses the controllability [7].

Any model determines the right-hand side of the last relation by the isometry: $(W^T_{bd})^* (I - P^\tau) W^T_{bd} = (\tilde W^T_{bd})^* (\tilde I - \tilde P^\tau) \tilde W^T_{bd}$, where $\tilde W^T_{bd} = U W^T_{bd}$, $\tilde I$ is the identity operator, and $\tilde P^\tau = U P^\tau U^*$ is the projection in $\tilde{\mathcal{H}}$ onto $\operatorname{cl} \tilde W^\tau_{bd} \mathcal{F}^\tau$. This leads to the representation

$$\widetilde{u^{f,0}(\cdot, T)}\,(\gamma, \tau) = \lim_{t \to T - \tau - 0} \left[(\tilde W^T_{bd})^* (\tilde I - \tilde P^\tau)\, \tilde W^T_{bd} f\right](\gamma, t), \qquad 0 < \tau < T \qquad [23]$$

and makes the amplitude formula a useful tool for solving the inverse problems. The external observer can construct a model via inverse data and then visualize by [23] the wave images on the part $\Theta^T$ of the pattern (see Figure 1). The collection of images $\widetilde{u^{f,0}}$ corresponding to all possible controls $f$ is rich enough for recovering the tensor $g$ on $\Theta^T$ (i.e., the metric tensor in semigeodesic coordinates) and turning the pattern into an isometric copy of the submanifold $(\langle\Gamma\rangle^T, d)$. This variant of the method is
more appropriate if one needs to recover unknown coefficients of the wave equation in $\Omega$; it can be realized in terms of numerical algorithms.

Extensions of the Method

Electromagnetic waves are also well suited for coordinatization and for constructing the wave copy $(\tilde\Omega, \tilde d)$. An appropriate version of the amplitude formula also exists for the system governed by the Maxwell equations (see Further Reading). At present (2004), the applicability of the BC method to three-dimensional inverse problems of elasticity theory is still an open question. The following hypothesis concerns the Lamé system: the wave coordinatization procedure (steps 1–3) using the elastic waves instead of the above $u^{f,0}$ gives rise to the copy of $\Omega \subset \mathbb{R}^3$ endowed with the metric $|dx|^2 / c_p^2$, where $c_p = \sqrt{(\lambda + 2\mu)/\rho}$ is the speed of the pressure waves.

The concept of model is used for solving inverse problems for the heat and Schrödinger equations (Avdonin and Belishev, 1995–2004), as well as for the problem of boundary data continuation (Belishev 2001, Kurylev and Lassas 2002). A variant of the BC method allows one to recover not only the manifold but also the Schrödinger type operators on it and/or the dissipative term in the scalar wave equation (Kurylev and Lassas 1993–2003).

An appropriate version of the amplitude formula solves the inverse problem for a one-dimensional two-velocity dynamical system which describes the waves consisting of two modes propagating with different speeds and interacting with each other (Belishev, Blagoveschenskii, Ivanov, 1997–2000).

One more variant of coordinatization, going back to the first paper on the BC method, associates with points $x \in \Omega$ the Dirac measures $\delta_x$; then, their images $\tilde\delta_x$ are identified via suitable models. This variant solves inverse problems on graphs and the two-dimensional elliptic Calderon problem. The reader is referred to articles by the present author listed in Further Reading.

Within the scope of the method, one derives some natural analogs of the classical Gelfand–Levitan–Krein–Marchenko equations (Belishev, 1987–2001). Also, an appropriate analog solves the kinematic inverse problem for a class of two-dimensional manifolds (Pestov 2004).

There exists an abstract version of the approach, embedding the BC method into the framework of linear system theory (Belishev 2001). The method is also related to the problem of triangular factorization of operators (Belishev and Pushnitski 1996).

Numerical algorithms for solving two-dimensional spectral and dynamical inverse problems for the wave equation $\rho u_{tt} - \Delta u = 0$ which recover the variable density $\rho$ have been developed and tested (Filippov, Gotlib, Ivanov, 1994–1999).

See also: Dynamical Systems and Thermodynamics; Geophysical Dynamics; Inverse Problem in Classical Mechanics.

Further Reading

Belishev MI (1988) On an approach to multidimensional inverse problems for the wave equation. Soviet Mathematics. Doklady 36(3): 481–484.
Belishev MI (1996) Canonical model of a dynamical system with boundary control in the inverse problem of heat conductivity. St. Petersburg Mathematical Journal 7(6): 869–890.
Belishev MI (1997) Boundary control in reconstruction of manifolds and metrics. Inverse Problems 13(5): R1–R45.
Belishev MI (2001) Dynamical systems with boundary control: models and characterization of inverse data. Inverse Problems 17: 659–682.
Belishev MI (2002) How to see waves under the Earth surface (the BC-method for geophysicists). In: Kabanikhin SI and Romanov VG (eds.) Ill-Posed and Inverse Problems, pp. 67–84. Utrecht/Boston: VSP.
Belishev MI (2003) The Calderon problem for two-dimensional manifolds by the BC-method. SIAM Journal of Mathematical Analysis 35(1): 172–182.
Belishev MI (2004) Boundary spectral inverse problem on a class of graphs (trees) by the BC-method. Inverse Problems 20(3): 647–672.
Belishev MI and Glasman AK (2001) Dynamical inverse problem for the Maxwell system: recovering the velocity in the regular zone (the BC-method). St. Petersburg Mathematical Journal 12(2): 279–319.
Belishev MI and Gotlib VYu (1999) Dynamical variant of the BC-method: theory and numerical testing. Journal of Inverse and Ill-Posed Problems 7(3): 221–240.
Belishev MI, Isakov VM, Pestov LN, and Sharafutdinov VA (2000) On reconstruction of metrics from external electromagnetic measurements. Russian Academy of Sciences. Doklady. Mathematics 61(3): 353–356.
Belishev MI and Ivanov SA (2002) Characterization of data of dynamical inverse problem for two-velocity system. Journal of Mathematical Sciences 109(5): 1814–1834.
Belishev MI and Lasiecka I (2002) The dynamical Lamé system: regularity of solutions, boundary controllability and boundary data continuation. ESAIM COCV 8: 143–167.
Katchalov A, Kurylev Y, and Lassas M (2001) Inverse Boundary Spectral Problems. Chapman and Hall/CRC Monographs and Surveys in Pure and Applied Mathematics, vol. 123. Boca Raton, FL: Chapman and Hall/CRC.
Boundary-Value Problems for Integrable Equations

B Pelloni, University of Reading, UK
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Integrable equations are a special class of nonlinear equations arising in the modeling of a wide variety of physical phenomena. It has been argued that integrable PDEs are in a certain, specific sense "universal" models for physical phenomena involving weak nonlinearity. Indeed, integrable equations are obtained by a procedure involving rescaling and an asymptotic expansion from very large classes of nonlinear evolution equations, which preserves integrability while retaining in the limit weakly nonlinear effects. For this reason, integrable equations are a very important class of PDEs. Important examples are the nonlinear Schrödinger (NLS) equation

$$i q_t + q_{xx} - 2\lambda |q|^2 q = 0, \qquad \lambda = \pm 1 \qquad [1]$$

the Korteweg–de Vries (KdV) equation

$$q_t + q_x \pm q_{xxx} + 6 q q_x = 0 \qquad [2]$$

the modified KdV (mKdV) equation

$$q_t - q_{xxx} - 6\lambda q^2 q_x = 0, \qquad \lambda = \pm 1 \qquad [3]$$

and the sine-Gordon (sG) equation in light-cone or laboratory coordinates

$$q_{xt} + \sin q = 0 \quad \text{or} \quad q_{tt} - q_{xx} + \sin q = 0 \qquad [4]$$

A general method for solving the initial-value problem for integrable equations in one space dimension was discovered in 1967, when in a pioneering and much celebrated work (Gardner et al. 1967), the initial-value problem for KdV with decaying initial condition was completely solved. Soon afterwards, it was understood that this method, now known as the inverse scattering transform, is of more general applicability. Indeed, it can be applied to those nonlinear equations that can be written as the compatibility condition of a pair of linear eigenvalue equations. The method of solution for the Cauchy problem essentially relies on the possibility of expressing the equation through this pair, now called a Lax pair after the work of Lax (1968), who first clarified the connection. Zakharov and Shabat (1972) constructed such a pair for the NLS equation, and in subsequent years the Lax pairs associated with all important integrable equations in one and two spatial variables were constructed. These include the NLS, sG, mKdV, Davey–Stewartson I and II, and Kadomtsev–Petviashvili I and II equations.

There is no universally accepted definition of an integrable PDE, but on account of the above results, the existence of a Lax pair can be taken as the defining property of such equations. In the course of the 1970s, the inverse scattering transform was applied to solve the initial-value (Cauchy) problem for many integrable equations. In principle, there is no obstruction to solving analytically the initial-value problem by the inverse scattering transform as soon as a Lax pair is constructed for the equation, and appropriate decaying initial conditions are prescribed. The solution is then characterized in terms of a certain integral equation. This approach is equivalent to associating with the initial-value problem a classical problem in complex analysis, namely a matrix Riemann–Hilbert problem, defined in the complex spectral space. This point of view is currently taken by many authors as it provides a unifying and very flexible framework for the analysis.

After the success of the inverse scattering transform in solving the Cauchy problem, it was natural to attempt to generalize the approach to boundary-value problems. To describe the difficulties involved in this generalization, consider the case of evolution equations in one space and one time dimensions. The independent variables can be denoted by $(x, t)$, with $t > 0$ representing time. While the initial-value problem is posed on the full real line, hence for $x \in (-\infty, \infty)$, the simplest boundary-value problem is posed on a half-line, for $x \in (0, \infty)$. In addition to initial conditions for initial time $t = 0$, it is necessary to prescribe conditions at the boundary $x = 0$. The number of conditions that must be prescribed to obtain a problem which admits a unique solution depends on the particular equation, but for evolution equations it is roughly equal to half the number of $x$-derivatives involved in the equation. For example, for the NLS equation, a well-posed problem is defined as soon as one boundary condition at $x = 0$ is prescribed; hence a typical boundary-value problem for this equation is obtained, for example, when $q(x, 0) = q_0(x)$ and $q(0, t) = g_0(t)$ are prescribed and compatible, so that $q_0(0) = g_0(0)$. It follows that, while $q_{xx}(0, t)$ can be computed from the equation, $q_x(0, t)$ is not immediately known. An even more difficult situation arises for the KdV equation [2] (with the $+$ sign), for which a well-posed problem is again defined as soon as one boundary condition is prescribed, so that there are two unknown boundary values.
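The remark that $q_{xx}(0, t)$ follows from the equation while $q_x(0, t)$ does not can be spelled out in one step. The lines below are a worked illustration, assuming the NLS equation in the form $i q_t + q_{xx} - 2\lambda|q|^2 q = 0$ with $\lambda = \pm 1$ and a differentiable boundary datum $q(0, t) = g_0(t)$:

```latex
\begin{align*}
  i q_t + q_{xx} - 2\lambda |q|^2 q &= 0
  \quad\Longrightarrow\quad
  q_{xx}(x,t) = -i\,q_t(x,t) + 2\lambda\,|q(x,t)|^2\,q(x,t), \\
  q_{xx}(0,t) &= -i\,g_0'(t) + 2\lambda\,|g_0(t)|^2\,g_0(t).
\end{align*}
```

The right-hand side involves only the prescribed function $g_0$, whereas no relation of this kind expresses the first derivative $q_x(0, t)$ through $g_0$ alone, which is why it counts as an unknown boundary value.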
Because of this simple fact, a straightforward application of the ideas of the inverse scattering transform immediately encounters one crucial difficulty. This transform method yields an integral representation of the solution which involves not only the given boundary conditions $f(t)$, but also the other unknown boundary values: in our example for the NLS equation, the function $q_x(0, t)$. The problem of characterizing these unknown boundary values has impeded progress in this direction for over thirty years.

On account of their physical significance, various boundary-value problems for the KdV equation have been considered, and classical PDE techniques (not specific to integrable models) have been used to establish existence and uniqueness results (Bona et al. 2001, Colin and Ghidaglia 2001, Colliander and Kenig 2001). These approaches, and in particular the approach of Colliander and Kenig, are quite general and possibly of wide applicability, and give global existence results in wide functional classes. However, they do not rely on integrability properties. Indeed, none of these results use the integrable structure of the equation in any fundamental or systematic way. However, the fact that these equations are integrable on the full line implies very special properties that should be exploited in the analysis and it is natural to try to generalize the inverse scattering transform approach.

Such a generalization is sometimes directly possible. For example, it has been used for studying the problem on the half-line for the hyperbolic version of the sG equation [4a] which does not involve unknown boundary values (Fokas 2000, Pelloni). It has also been used to study some specific boundary-value problems for the NLS equation, for example, for homogeneous Dirichlet or Neumann conditions, when it is possible to use even or odd extensions of the problem to the full line (Ablowitz and Segur 1974), or more recently in Degasperis et al. (2001). In the latter case, however, the unknown boundary values are characterized through an integral Fredholm equation, which does not admit a unique solution. Some special cases of boundary-value problems for the KdV equation (Adler et al. 1997, Habibullin 1999) and elliptic sG (Sklyanin 1987) have also been studied via the inverse scattering transform. However, all the examples considered are nongeneric, and it has recently been shown (Fokas, in press) that the boundary conditions chosen fall in the special class of the so-called "linearizable" boundary conditions, for which the problem can be solved as if it were posed on the full line. One cannot hope to use similar methods to solve the problem with generic boundary conditions.

Recently, Fokas (2000) introduced a general methodology to extend the ideas of the inverse scattering transform to boundary-value problems. This methodology provides the tools to analyze boundary-value problems for integrable equations to a considerable degree of generality. We note as a side remark that linear PDEs are trivially integrable, in the sense of admitting a Lax pair (in this case the Lax pair can be found algorithmically, while the construction of the Lax pair associated with a nonlinear equation is by no means trivial). As a consequence of this remark, the extension of the inverse scattering transform also provides a method for solving boundary-value problems for a large variety of linear PDEs of mathematical physics.

What follows is a general description of the approach of Fokas, considering, for the sake of concreteness, the case of an integrable PDE in the two variables $(x, t)$ which vary in the domain $D$ (typically, for an evolution problem $D = (0, \infty) \times (0, T)$). We assume that $q(x, t)$ denotes the unique solution of a boundary-value problem posed for such an equation.

The method consists of the following steps.

1. Write the PDE as the compatibility condition of a Lax pair. This is a pair of linear ODEs for the function $\mu = \mu(x, t, k)$ involving the solution $q(x, t)$ of the PDE, the derivatives of this solution, and a complex parameter $k$, called the spectral parameter. This can be done algorithmically for linear PDEs, and in this case $\mu(x, t, k)$ is a scalar function. For nonlinear integrable PDEs, $\mu(x, t, k)$ is in general a matrix-valued function.
The equivalence of the PDE with a Lax pair can be reformulated in the language of differential forms, and in this language it is easier to describe the methodology in general. Assume then that $\omega(x, t, k)$ is a differential 1-form expressed in terms of a function $q(x, t)$ and its derivatives, and of a complex variable $k$, and one which is characterized by the property that $d\omega = 0$ if and only if $q(x, t)$ satisfies the given PDE. The closure of the form $\omega$ yields the two important consequences 2(a) and 2(b) below.
2. (a) Since the domain $D$ under consideration is simply connected, the closed form $\omega$ is also exact; hence, it is possible to find a particular 0-form $\mu(x, t, k)$ solving $d\mu = \omega$. In particular, $\mu(x, t, k)$ can be chosen to be sectionally bounded with respect to $k$ by solving either a Riemann–Hilbert problem or a d-bar problem in the complex spectral $k$ plane, and the solution $\mu(x, t, k)$ is then expressed in terms of certain spectral functions depending on all the boundary values
of the solution q(x, t) of the PDE. The function q(x, t) can then be expressed in terms of μ(x, t, k). (b) The integral of ω along the boundary of the domain D vanishes. This yields an integral constraint between all boundary values of the solution of the PDE, which becomes an algebraic constraint for the spectral functions. The resulting algebraic identity is called the global relation.
3. The last step is the analysis of the k-invariance properties of the global relation. This analysis yields the characterization of the spectral functions in terms only of the given boundary conditions.

The crucial and most difficult step in the solution process is the characterization described above. The analysis required depends on the type of problem under consideration. For nonlinear integrable evolution PDEs posed on the half-line x > 0, in general the characterization mentioned in step (3) involves solving a system of nonlinear Volterra integral equations. This is an important difference from the case of the Cauchy problem, where the solution is given by a single integral equation where all the terms are explicitly known.

The method outlined above has been applied successfully to solve a variety of boundary-value problems for linear and integrable nonlinear PDEs. For concreteness, here the focus is on the important case of integrable evolution PDEs in one space dimension, which illustrates clearly the generalities of this method.

Integrable Evolution Equations in One Space Dimension

The crucial property of integrable PDEs which is used in the inverse scattering transform approach to solve the initial-value problem is the fact that they can be written as the compatibility condition of a Lax pair. Many integrable evolution equations of physical significance (such as NLS, KdV, sG, and mKdV) admit a Lax pair of the form

\[
\mu_x + i f_1(k)\,\sigma_3\,\mu = Q(x,t,k)\,\mu, \qquad
\mu_t + i f_2(k)\,\sigma_3\,\mu = \tilde{Q}(x,t,k)\,\mu \tag*{[5]}
\]

where μ(x, t, k) is a 2 × 2 matrix, σ3 = diag(1, −1), f_i(k), i = 1, 2, are analytic functions of the complex parameter k, and Q, Q̃ are analytic functions of k, of the function q(x, t) (and of its complex conjugate q̄(x, t) for complex-valued problems) and of its derivatives. For example, the NLS equation [1] is equivalent to the compatibility condition of the pair

\[
\mu_x + ik\,\sigma_3\,\mu = Q\,\mu, \qquad
Q = \begin{pmatrix} 0 & q \\ \bar{q} & 0 \end{pmatrix}, \qquad
\mu_t + 2ik^2\sigma_3\,\mu = \bigl(2kQ - iQ_x\sigma_3 - i|q|^2\sigma_3\bigr)\mu \tag*{[6]}
\]

The first step towards a systematic new approach to solving boundary-value problems was the work of Fokas and Its, who associated the boundary-value problem for NLS on the half-line to a single Riemann-Hilbert problem determined by both equations in the Lax pair. The jump determining this Riemann-Hilbert problem has an explicit exponential dependence on both x and t. This differs from the classical inverse scattering approach, in which the x-part of the Lax pair is used to determine an x-transform with t-dependent scattering data, and the t-part of the Lax pair is then exploited to find the time evolution of these data. The work of Fokas and Its led to the understanding that both equations in the Lax pair [6] must be considered in order to construct a spectral transform appropriate to solve boundary-value problems. Fokas (2000) reviews his systematic way to solve these problems by performing the simultaneous spectral analysis of both equations in the Lax pair. The transform thus obtained, which is a nonlinearization of the Fourier transform, precisely generalizes the inverse scattering transform.

This simultaneous analysis also leads naturally to the identification of the global relation which holds between initial and boundary data, and which plays an essential role in deriving an expression for the solution of the problem which does not involve unknown boundary values.

The Riemann-Hilbert problem with explicit (x, t) dependence, the global relation, and the invariance properties of the latter with respect to the spectral parameter are the fundamental ingredients of this systematic approach to solve boundary-value problems for integrable equations.

The steps involved in this method are summarized in the introduction. While steps (1) and (2) can be described generally, and, once the Lax pair is identified, can be performed algorithmically (at least under the assumption that the solution of the PDE exists), the last step is the most difficult part of the analysis, and it needs to be considered separately for each given problem. However, it is this step that yields the effective characterization of the solution.

The results obtained for the particular case of eqn [1] are reviewed in detail in the next section, as they provide an important example, which can be generalized without any conceptual difficulty to eqns [2]-[4].
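
The compatibility statement for the NLS pair can be checked numerically. The sketch below assumes that the pair [6] reads μ_x = Uμ, μ_t = Vμ with U = −ikσ3 + Q, V = −2ik²σ3 + 2kQ − iQ_xσ3 − i|q|²σ3 and Q = [[0, q], [q̄, 0]], and it evaluates the zero-curvature residual U_t − V_x + [U, V]. With these sign conventions (an assumption of this sketch) the residual vanishes exactly when i q_t + q_xx − 2|q|²q = 0. The plane wave q = A exp(i(ax − ωt)) used as test data, and all function names, are illustrative choices and do not come from the text; for this test function the equation holds when ω = a² + 2A².

```python
import cmath

SIGMA3 = [[1, 0], [0, -1]]

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][m] * b[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]

def combine(*terms):
    """Linear combination of 2x2 matrices given as (coefficient, matrix) pairs."""
    return [[sum(c * m[i][j] for c, m in terms) for j in range(2)]
            for i in range(2)]

def offdiag(z):
    """Matrix [[0, z], [conj(z), 0]], the shape assumed for Q and its derivatives."""
    return [[0, z], [z.conjugate(), 0]]

def zero_curvature_residual(q, qx, qxx, qt, k):
    """Return U_t - V_x + [U, V] from pointwise values of q and its derivatives."""
    Q, Qx, Qxx, Qt = offdiag(q), offdiag(qx), offdiag(qxx), offdiag(qt)
    mod2 = abs(q) ** 2
    mod2_x = 2 * (q.conjugate() * qx).real  # x-derivative of |q|^2
    U = combine((-1j * k, SIGMA3), (1, Q))
    V = combine((-2j * k ** 2, SIGMA3), (2 * k, Q),
                (-1j, matmul(Qx, SIGMA3)), (-1j * mod2, SIGMA3))
    V_x = combine((2 * k, Qx), (-1j, matmul(Qxx, SIGMA3)),
                  (-1j * mod2_x, SIGMA3))
    commutator = combine((1, matmul(U, V)), (-1, matmul(V, U)))
    # U depends on t only through Q, so U_t = Q_t.
    return combine((1, Qt), (-1, V_x), (1, commutator))

def plane_wave(x, t, amp, a, omega):
    """Values of q, q_x, q_xx, q_t for q = amp * exp(i(a x - omega t))."""
    q = amp * cmath.exp(1j * (a * x - omega * t))
    return q, 1j * a * q, -(a ** 2) * q, -1j * omega * q

def max_residual(amp, a, omega, x=0.4, t=0.9, k=0.5 + 0.25j):
    res = zero_curvature_residual(*plane_wave(x, t, amp, a, omega), k)
    return max(abs(entry) for row in res for entry in row)

amp, a = 0.7, 1.3
on_shell = max_residual(amp, a, a ** 2 + 2 * amp ** 2)  # q solves the PDE
off_shell = max_residual(amp, a, a ** 2)                # q does not solve it
print(on_shell, off_shell)
```

The residual is independent of k once the k-dependent terms cancel, which is why a single complex sample value of k is enough for this check.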
The NLS Equation

As already mentioned, the initial-value problem for NLS was solved, for decaying initial condition, by Zakharov and Shabat, and studied in depth by many others. However, by the mid-1990s only a handful of papers had been written on the solution of the boundary-value problem posed on the half-line, all on a specific example or aspect of the problem, or attempts at solving the problem using general PDE techniques.

For this equation, the approach of Fokas yields the following results. Let the complex-valued function q(x, t) satisfy the NLS equation [1], for x > 0 and t > 0, with one prescribed initial condition and one prescribed boundary condition. For the sake of concreteness, we select the specific initial and boundary conditions

\[
q(x,0) = q_0(x) \in S(\mathbb{R}^+), \qquad
q(0,t) = g_0(t) \in S(\mathbb{R}^+), \qquad
q_0(0) = g_0(0) \tag*{[7]}
\]

where S denotes the space of Schwartz functions (similar results hold for different choices of boundary conditions, and less restrictive function classes). The solution of this initial boundary-value (IBV) problem can be constructed as follows (Fokas 2000, 2002; in press):

• Given q0(x) construct the spectral functions {a(k), b(k)}. These functions are defined by

\[
a(k) = \phi_2(0,k), \qquad b(k) = \phi_1(0,k)
\]

where the vector φ(x, k) with components φ1(x, k) and φ2(x, k) is the following solution of the x-problem of the associated Lax pair evaluated at t = 0:

\[
\phi_x + ik\,\sigma_3\,\phi = Q(x,0,k)\,\phi, \quad 0 < x < \infty, \ \operatorname{Im} k \ge 0; \qquad
\phi(x,k) = e^{ikx}\left[\begin{pmatrix} 0 \\ 1 \end{pmatrix} + o(1)\right] \ \text{as } x \to \infty; \qquad
Q(x,0,k) = \begin{pmatrix} 0 & q_0(x) \\ \bar{q}_0(x) & 0 \end{pmatrix}
\]

(σ3 and Q(x, t, k) are defined after eqns [5] and [6], respectively).

• Given q0(x) and g0(t) characterize g1(t) by the requirement that the spectral functions {A(t, k), B(t, k)} satisfy the global relation

\[
B(t,k) - R(k)\,A(t,k) = e^{4ik^2t}\,\frac{c(t,k)}{a(k)}, \qquad
R(k) = \frac{b(k)}{a(k)}, \qquad t \in [0,T], \ k \in \bar{D} \tag*{[8]}
\]

where D denotes the first quadrant of the complex k-plane:

\[
D = \{\, k \mid \operatorname{Re} k > 0, \ \operatorname{Im} k > 0 \,\}
\]

D̄ denotes the closure of D, and c(t, k) is a function of k analytic in D and of order O(1/k) as k → ∞. The spectral functions are defined by

\[
A(t,k) = e^{2ik^2t}\,\overline{\Phi_2(t,\bar{k})}, \qquad
B(t,k) = -e^{2ik^2t}\,\Phi_1(t,k) \tag*{[9]}
\]

where the vector Φ(t, k) with components Φ1 and Φ2 is the following solution of the t-problem of the associated Lax pair evaluated at x = 0:

\[
\Phi_t + 2ik^2\sigma_3\,\Phi = \tilde{Q}(0,t,k)\,\Phi, \quad 0 < t < T, \ k \in \mathbb{C}; \qquad
\Phi(0,k) = \begin{pmatrix} 0 \\ 1 \end{pmatrix}; \qquad
\tilde{Q}(0,t,k) = \begin{pmatrix} -i|g_0(t)|^2 & 2kg_0(t) + ig_1(t) \\ 2k\bar{g}_0(t) - i\bar{g}_1(t) & i|g_0(t)|^2 \end{pmatrix} \tag*{[10]}
\]

• Given a(k), b(k) and A(k), B(k), define a 2 × 2 matrix Riemann-Hilbert problem. This problem has the distinctive feature that its jump has explicit (x, t) dependence in the exponential form of exp{ikx + 2ik²t}. Determine q(x, t) in terms of the solution of this Riemann-Hilbert problem by using the fact that these functions are related by the Lax pair. Then the function q(x, t) solves the IBV problem [1]-[7] with q(x, 0) = q0(x), q(0, t) = g0(t), and q_x(0, t) = g1(t).

The above construction can be summarized in the following theorem (Fokas 2002):

Theorem 1 Consider the boundary-value problem for the NLS equation [1] determined by the conditions [7]. Let a(k), b(k) be given by [8], and suppose that there exists a function g1(t) such that if A(k), B(k) are defined by [9], then the global relation [8] holds. Let M(x, t, k) be the solution of the 2 × 2 Riemann-Hilbert problem with jump on the real and imaginary axes given by

• M_−(x, t, k) = M_+(x, t, k) J(x, t, k), with M = M_− in the second and fourth quadrants of ℂ, M = M_+ in the first and third quadrants of ℂ, and J(x, t, k) defined in terms of a, b, A, B and the exponential e^{ikx + 2ik²t};
• M = I + O(1/k) as k → ∞, with appropriate residue conditions if there are poles.

Then M(x, t, k) exists and is unique, and

\[
q(x,t) = 2i \lim_{k\to\infty} \bigl(k\,M(x,t,k)\bigr)_{12}
\]
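
The first step of the construction (from the initial datum to the spectral functions) can be sketched numerically. The code below assumes that the x-problem reads φ_x + ikσ3φ = Qφ with Q = [[0, q0], [q̄0, 0]], that φ is normalized to e^{ikx}(0, 1)ᵀ at large x, and that a(k) = φ2(0, k), b(k) = φ1(0, k). The Gaussian initial datum, the truncation length L, and the classical Runge-Kutta integrator are choices made here for illustration only; they are not part of the method as described. For real k this system conserves |φ2|² − |φ1|², so |a(k)|² − |b(k)|² = 1 is a convenient consistency check.

```python
import cmath
import math

def rhs(x, phi, k, q0):
    """Right-hand side of phi_x = (-i k sigma3 + Q) phi, Q = [[0, q0], [conj(q0), 0]]."""
    q = complex(q0(x))
    p1, p2 = phi
    return (-1j * k * p1 + q * p2, 1j * k * p2 + q.conjugate() * p1)

def shifted(phi, h, slope):
    """Return phi + h * slope for two-component vectors."""
    return (phi[0] + h * slope[0], phi[1] + h * slope[1])

def spectral_functions(k, q0, L=8.0, steps=1600):
    """Integrate from x = L down to x = 0 with classical RK4; return (a(k), b(k))."""
    h = -L / steps
    x = L
    # Large-x normalization phi ~ exp(ikx) (0, 1)^T, imposed at the cutoff x = L.
    phi = (0j, cmath.exp(1j * k * L))
    for _ in range(steps):
        s1 = rhs(x, phi, k, q0)
        s2 = rhs(x + h / 2, shifted(phi, h / 2, s1), k, q0)
        s3 = rhs(x + h / 2, shifted(phi, h / 2, s2), k, q0)
        s4 = rhs(x + h, shifted(phi, h, s3), k, q0)
        phi = tuple(p + h / 6 * (u + 2 * v + 2 * w + z)
                    for p, u, v, w, z in zip(phi, s1, s2, s3, s4))
        x += h
    return phi[1], phi[0]

def gaussian(x):
    """Hypothetical rapidly decaying initial datum q0(x)."""
    return 0.8 * math.exp(-x * x)

a, b = spectral_functions(1.0, gaussian)
print(abs(a) ** 2 - abs(b) ** 2)
```

For q0 = 0 the exact solution is φ = e^{ikx}(0, 1)ᵀ, so the routine should return a(k) = 1 and b(k) = 0, which gives a second check of the integrator.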
The result above relies on characterizing the unknown boundary value g1(t) a priori by requiring that the global relation hold. Recently, substantial progress has been made in this direction in the case of integrable nonlinear evolution equations, in particular of NLS. Namely, Fokas (in press) contains an effective description of the map assigning to each given q(x, 0) = q0(x) and g0(t) = q(0, t) a unique value for q_x(0, t) (called the Dirichlet to Neumann map) for the NLS, as well as for a version of the Korteweg-de Vries and sG equations. We state below the relevant theorem for the case of the NLS equation.

Theorem 2 Let q(x, t) satisfy the NLS equation on the half-line 0 < x < ∞, t > 0 with the initial and boundary conditions [7]. Then g1(t) := q_x(0, t) is given by

\[
g_1(t) = \frac{g_0(t)}{\pi}\int_{\partial D} e^{-2ik^2t}\bigl[\Phi_2(t,k) - \Phi_2(t,-k)\bigr]\,dk
+ \frac{4i}{\pi}\int_{\partial D} e^{-2ik^2t}\,k\,R(k)\,\overline{\Phi_2(t,\bar{k})}\,dk
+ \frac{2i}{\pi}\int_{\partial D} \Bigl\{ e^{-2ik^2t}\,k\bigl[\Phi_1(t,k) - \Phi_1(t,-k)\bigr] + i g_0(t) \Bigr\}\,dk
\]

with Φ = (Φ1, Φ2) given by the solution of [10]. The Neumann datum g1(t) is unique and exists globally in t.

This result yields a rigorous proof of the global existence of the solution of boundary-value problems on the half-line for the NLS equation. Therefore, the assumption in Theorem 1 that a suitable function g1(t) exists can be dropped.

Generalizations and Summary of Results

Results analogous to the ones presented in the previous section can be phrased exclusively in terms of integral equations rather than in terms of the Riemann-Hilbert problems, as done for example in Khruslov and Kotlyarov (2003). This is the point of view of the school of Gelfand and Marchenko, and in this setting the functions Φ are given in the so-called Gelfand-Levitan-Marchenko representation. Results on boundary-value problems for the NLS equation using this representation have been obtained only under additional assumptions on the unknown part of the boundary values. It was only after the idea that the x- and t-parts of the spectral equations should be treated simultaneously that this approach yielded complete results. However, the Gelfand-Levitan-Marchenko representation yields a crucial simplification for deriving the explicit form of the Dirichlet to Neumann map and proving Theorem 2. This representation has now been derived for all equations [1]-[3], see Fokas (in press).

The analysis of the invariance properties of the global relation with respect to k also yields the characterization of all the boundary conditions for which the transform obtained to represent the solution linearizes. For these boundary conditions, called linearizable, the solution can be represented as effectively as for the Cauchy problem. For example, the linearizable boundary conditions for the NLS equation are given by any boundary values that satisfy

\[
g_0(t)\,\bar{g}_1(t) - \bar{g}_0(t)\,g_1(t) = 0
\]

An example of a boundary condition satisfying this constraint, encompassing also Dirichlet and Neumann homogeneous conditions, is q(0, t) − χ q_x(0, t) = 0, with χ a non-negative constant.

As mentioned at the beginning of the previous section, the approach described in general can be used to obtain results similar to those given for the NLS equation for many other integrable evolution equations, in particular, mKdV (Boutet de Monvel et al. 2004), sG, and KdV (Fokas 2002). The results obtained are essentially the same as for NLS, starting from the general form [5] of the Lax pair, and include the derivation of the solution representation, the complete characterization of linearizable boundary conditions, and the analysis of the Dirichlet to Neumann map.

The approach above can also be used for studying boundary-value problems posed on finite domains, for x ∈ [0, 1]. This has been done for a model for transient stimulated Raman scattering (Fokas and Menyuk 1999), for the sG equation in light-cone coordinates (Pelloni, in press), and for the NLS equation (Fokas and Its 2004). In this case also the method yields a representation of the solution which is suitable for asymptotic analysis. In this respect, the question of soliton generation from boundary data is of some importance, and has been recently considered by various authors (Fokas and Menyuk 1999, Boutet de Monvel and Kotlyarov 2003, Pelloni in press, Boutet de Monvel et al. 2004). The results are however still considered case by case, and no general framework for this problem has been identified yet. For problems on the half-line, solitons may be generated but not necessarily in correspondence to the singularities that generate solitons for the full line problem, even when the same singularities are present. For problems posed on finite domains, in some specific cases at least for the stimulated Raman scattering and the sG equations, it appears that the dominant asymptotic behavior is given by a similarity solution.
In conclusion, the extension of the inverse scattering transform given by Fokas provides the tool for analyzing boundary-value problems specific to nonlinear integrable equations. This tool relies, in an essential way, on the integrability structure of the problem, and yields a full characterization of the solution as well as uniqueness and existence results. The solution representation thus obtained is not always fully explicit, but it is always suitable for asymptotic analysis using standard techniques such as the recent nonlinearization of the classical steepest descent method.

See also: ∂-bar Approach to Integrable Systems; Integrable Discrete Systems; Integrable Systems and the Inverse Scattering Method; Integrable Systems: Overview; Nonlinear Schrödinger Equations; Riemann-Hilbert Methods in Integrable Systems; Separation of Variables for Differential Equations; Sine-Gordon Equation.

Further Reading

Ablowitz MJ and Segur HJ (1974) The inverse scattering transform: semi-infinite interval. Journal of Mathematical Physics 16: 1054.
Adler VE, Gurel B, Gurses M, and Habibullin IT (1997) Journal of Physics A 30: 3505.
Bona J, Sun S, and Zhang BY (2001) A non-homogeneous boundary value problem for the Korteweg-de Vries equation. Transactions of the American Mathematical Society 354: 427-490.
Boutet de Monvel A, Fokas AS, and Shepelsky D (2004) The modified KdV equation on the half-line. Journal of the Institute of Mathematics of Jussieu 3: 139-164.
Boutet de Monvel A and Kotlyarov VP (2003) Generation of asymptotic solitons of the nonlinear Schrödinger equation by boundary data. Journal of Mathematical Physics 44: 3185-3215.
Colin T and Ghidaglia J-M (2001) An initial-boundary value problem for the Korteweg-de Vries equation posed on a finite interval. Advanced Differential Equations 6(12): 1463-1492.
Colliander JE and Kenig CE (2001) The generalized Korteweg-de Vries equation on the half line (http://arxiv.org/abs/math.AP/0111294).
Degasperis A, Manakov S, and Santini PM (2001) The nonlinear Schrödinger equation on the half line. JETP Letters 74(10): 481-485.
Fokas AS (2000) On the integrability of linear and nonlinear PDEs. Journal of Mathematical Physics 41: 4188.
Fokas AS (2002) Integrable nonlinear evolution equations on the half line. Communications in Mathematical Physics 230: 1-39.
Fokas AS (2005) A generalised Dirichlet to Neumann map for certain nonlinear evolution PDEs. Communications on Pure and Applied Mathematics 58: 639-670.
Fokas AS and Its AR (2004) The nonlinear Schrödinger equation on the interval. Journal of Physics A: Mathematical and General 37: 6091-6114.
Fokas AS and Menyuk CR (1999) Integrability and self-similarity in transient stimulated Raman scattering. Journal of Nonlinear Science 9: 1-31.
Gardner GS, Greene JM, Kruskal MD, and Miura RM (1967) Method for solving the Korteweg-de Vries equation. Physical Review Letters 19: 1095.
Habibullin IT (1999) KdV equation on a half-line with the zero boundary condition. Theoretical and Mathematical Fizika 119: 397.
Khruslov E and Kotlyarov VP (2003) Generation of asymptotic solitons in an integrable model of stimulated Raman scattering by periodic boundary data. Mat. Fiz. Anal. Geom. 10(3): 366-384.
Lax PD (1968) Integrals of nonlinear equations of evolution and solitary waves. Communications in Pure and Applied Mathematics 21: 467-490.
Pelloni B (2005) The asymptotic behaviour of the solution of boundary value problems for the Sine-Gordon equation on a finite interval. Journal of Nonlinear Mathematical Physics 12: 518-529.
Sklyanin EK (1987) Boundary conditions for integrable equations. Functional Analysis and its Applications 21: 86-87.
Zakharov VE and Shabat AB (1972) An exact theory of two-dimensional self-focusing and one-dimensional automodulation of waves in a nonlinear medium. Soviet Physics JETP 34: 62-78.

Braided and Modular Tensor Categories


V Lyubashenko, Institute of Mathematics, Kyiv, Ukraine
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Tensor or monoidal categories are encountered in various branches of modern mathematical physics. First examples came without mentioning the name of a monoidal category as categories of modules over a group or a Lie algebra. The operation of a monoidal product in this case is the usual tensor product X ⊗_ℂ Y of modules (representations) X and Y. These categories are symmetric: the modules X ⊗ Y and Y ⊗ X are isomorphic; moreover, the permutation isomorphism (the twist) c : X ⊗ Y → Y ⊗ X, x ⊗ y ↦ y ⊗ x, is involutive, c² = id_{X⊗Y}. Next examples of monoidal categories were given by categories of representations of supergroups or Lie superalgebras. They are also symmetric: now the symmetry (Koszul's rule) c : X ⊗ Y → Y ⊗ X, x ⊗ y ↦ (−1)^{deg x · deg y} y ⊗ x, is the twist with a sign, which depends on the degree (or parity) deg x of elements x ∈ X.

The development of the theory of exactly solvable models in statistical mechanics led Drinfeld (1987) to the notion of quantum groups, that is, Hopf algebras H with additional structures (quasitriangular Hopf algebras). H-modules also form a monoidal category; however, it is not symmetric, but only braided.
It means that a canonical braiding isomorphism c : X ⊗ Y → Y ⊗ X still exists, but it is not involutive any more, c² ≠ id. The braiding c satisfies the Yang-Baxter equation

\[
(c\otimes 1)(1\otimes c)(c\otimes 1) = (1\otimes c)(c\otimes 1)(1\otimes c) : X\otimes Y\otimes Z \to Z\otimes Y\otimes X
\]

for any three H-modules X, Y, Z.

In the above examples, we also have an obvious isomorphism of associativity a : X ⊗ (Y ⊗ Z) → (X ⊗ Y) ⊗ Z of the iterated tensor product. There are, however, monoidal categories of modules, where such an isomorphism is nontrivial, namely, modules over quasi-Hopf algebras. These were introduced by Drinfeld (1989a, b) in connection with the Knizhnik-Zamolodchikov equations. These nontrivial associativity isomorphisms a : X ⊗ (Y ⊗ Z) → (X ⊗ Y) ⊗ Z are required to satisfy the pentagon equation of Mac Lane and Stasheff.

Braided monoidal categories also arise in rational conformal field theories (RCFTs), integrable models of statistical mechanics and topological quantum field theories (TQFTs). The common feature of these categories is that they are semisimple abelian with a finite number of simple modules. In other words, such a category C is equivalent to the category of finite-dimensional ℂⁿ = ℂ × ⋯ × ℂ-modules for some n. However, the equivalence is not monoidal, and the monoidal structure can be rather involved. For instance, from the Ising model one can obtain the monoidal category with two simple objects 1 and X, which obey the monoidal law 1 ⊗ 1 = 1, 1 ⊗ X = X ⊗ 1 = X, X ⊗ X = 1 ⊕ X. Clearly, such relations cannot be satisfied by finite-dimensional ℂ-vector spaces 1 and X, if ⊗ would mean the usual tensor product ⊗_ℂ of ℂ-vector spaces. However, here ⊗ means simply a functor ⊗ : C × C → C with certain properties. Categories which come from RCFT, integrable models or TQFT often enjoy additional properties. They are rigid: for each object X, there exists a dual object X^∨. They are ribbon (balanced): there is a canonical endomorphism v_X : X → X for each object X, which is related to the braiding. They are modular, which is defined as nondegeneracy of a certain matrix. The meaning of modularity is that the ribbon category is suitable for producing a TQFT out of it.

For categories equivalent to the category of ℂ × ⋯ × ℂ-modules, the ribbon (braided) monoidal structure can be specified by a finite number of complex matrices. For instance, 6j-symbols or q-6j-symbols encode the associativity isomorphism. In this form, modular categories appeared in the work of Moore and Seiberg (1989) on RCFTs. Such categories can be realized as categories of modules over weak Hopf algebras, but we stress again that the monoidal product for such modules does not coincide with the tensor product of vector spaces. So, general features are better seen at the level of category theory, and we now start with precise definitions.

Rigid Monoidal Categories

We recall here the basic definitions of monoidal categories, monoidal functors, and dual objects.

Definition 1 A monoidal category (C, ⊗, a, 1, l, r) is a category C, a functor ⊗ : C × C → C (called the tensor product), a functorial isomorphism a : X ⊗ (Y ⊗ Z) → (X ⊗ Y) ⊗ Z, the associativity isomorphism, a unit object 1, and two functorial isomorphisms l : 1 ⊗ X → X, r : X ⊗ 1 → X such that

\[
\begin{array}{ccccc}
X\otimes(Y\otimes(Z\otimes W)) & \xrightarrow{\;a\;} & (X\otimes Y)\otimes(Z\otimes W) & \xrightarrow{\;a\;} & ((X\otimes Y)\otimes Z)\otimes W \\
{\scriptstyle X\otimes a}\downarrow & & & & \uparrow{\scriptstyle a\otimes W} \\
X\otimes((Y\otimes Z)\otimes W) & & \xrightarrow{\quad a\quad} & & (X\otimes(Y\otimes Z))\otimes W
\end{array}
\]

commutes (the pentagon equation) and

\[
a_{X,1,Y} = \bigl( X\otimes(1\otimes Y) \xrightarrow{\;X\otimes l_Y\;} X\otimes Y \xrightarrow{\;r_X^{-1}\otimes Y\;} (X\otimes 1)\otimes Y \bigr)
\]

Definition 2 A monoidal functor (F, φ, f) : (C, ⊗) → (D, ⊗) is a functor F : C → D, a functorial isomorphism φ = φ_{X,Y} : F(X) ⊗ F(Y) → F(X ⊗ Y) ∈ D, and an isomorphism f : 1 → F1 ∈ D such that

\[
\begin{array}{ccccc}
FX\otimes(FY\otimes FZ) & \xrightarrow{\;1\otimes\phi\;} & FX\otimes F(Y\otimes Z) & \xrightarrow{\;\phi\;} & F(X\otimes(Y\otimes Z)) \\
\downarrow{\scriptstyle a} & & & & \downarrow{\scriptstyle Fa} \\
(FX\otimes FY)\otimes FZ & \xrightarrow{\;\phi\otimes 1\;} & F(X\otimes Y)\otimes FZ & \xrightarrow{\;\phi\;} & F((X\otimes Y)\otimes Z)
\end{array}
\]

\[
\begin{array}{ccc}
F1\otimes FX & \xrightarrow{\;\phi\;} & F(1\otimes X) \\
{\scriptstyle f\otimes 1}\uparrow & & \downarrow{\scriptstyle Fl} \\
1\otimes FX & \xrightarrow{\;l\;} & FX
\end{array}
\qquad
\begin{array}{ccc}
FX\otimes F1 & \xrightarrow{\;\phi\;} & F(X\otimes 1) \\
{\scriptstyle 1\otimes f}\uparrow & & \downarrow{\scriptstyle Fr} \\
FX\otimes 1 & \xrightarrow{\;r\;} & FX
\end{array}
\]

commute. A morphism of monoidal functors λ : (F, φ, f) → (G, ψ, g) is a functorial morphism λ : F → G such that

\[
\begin{array}{ccc}
FX\otimes FY & \xrightarrow{\;\phi\;} & F(X\otimes Y) \\
{\scriptstyle \lambda\otimes\lambda}\downarrow & & \downarrow{\scriptstyle \lambda} \\
GX\otimes GY & \xrightarrow{\;\psi\;} & G(X\otimes Y)
\end{array}
\qquad
g = \bigl( 1 \xrightarrow{\;f\;} F1 \xrightarrow{\;\lambda\;} G1 \bigr)
\]

The f datum of a monoidal functor (F, φ, f) is uniquely determined by the (F, φ) data, so we can denote a monoidal functor as (F, φ) or even F.
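
The difference between a symmetry and a genuine braiding, and the Yang-Baxter equation, can be illustrated on a small example that is not taken from the text: vector spaces graded by the cyclic group Z_n, with c(x ⊗ y) = q^{deg x · deg y} y ⊗ x for a primitive n-th root of unity q. This is an assumption chosen for simplicity; for n = 2 it reduces to the Koszul sign rule. In the sketch below a basis tensor is stored together with the exponent of q that multiplies it, so all arithmetic is exact; the function names are hypothetical.

```python
from itertools import product

def braid(term, pos, n):
    """Apply c to tensor factors (pos, pos + 1) of a basis term.

    A term is (exponent, degrees): the scalar q**exponent times the basis
    tensor whose factors have the listed degrees in Z_n.
    """
    exponent, degs = term
    a, b = degs[pos], degs[pos + 1]
    swapped = degs[:pos] + (b, a) + degs[pos + 2:]
    return ((exponent + a * b) % n, swapped)

def apply_word(term, positions, n):
    """Apply braidings one after another; the first listed position acts first."""
    for pos in positions:
        term = braid(term, pos, n)
    return term

def yang_baxter_holds(n):
    """Check (c x 1)(1 x c)(c x 1) = (1 x c)(c x 1)(1 x c) on all basis tensors."""
    for degs in product(range(n), repeat=3):
        start = (0, degs)
        if apply_word(start, [0, 1, 0], n) != apply_word(start, [1, 0, 1], n):
            return False
    return True

def is_involutive(n):
    """Check whether c composed with c is the identity on all basis tensors."""
    return all(apply_word((0, degs), [0, 0], n) == (0, degs)
               for degs in product(range(n), repeat=2))

for n in (2, 3):
    print(n, yang_baxter_holds(n), is_involutive(n))
```

In this toy model the Yang-Baxter equation holds for every n, while c² = id holds for n = 2 (the symmetric, Koszul case) and fails for n = 3, where c² multiplies a tensor of degrees (1, 1) by q².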
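The coherence theorem of Mac Lane (1963) states that any monoidal category C is equivalent to a strictly monoidal category, in which X ⊗ (Y ⊗ Z) = (X ⊗ Y) ⊗ Z, 1 ⊗ X = X = X ⊗ 1, and the isomorphisms a, l, r are identity isomorphisms. Thus, in theoretical constructions, one may ignore the associativity isomorphism. It is not always so in practice. For instance, working with quasi-Hopf algebras related with the Knizhnik-Zamolodchikov equation one prefers to keep the original category, which is (a deformation of) the category of modules over a Lie algebra, rather than to replace it with a strict monoidal category, that is not a category of modules any more.

Definition 3 A rigid category C is a monoidal category in which, to every object X ∈ C, dual objects X^∨ and ^∨X ∈ C are assigned together with morphisms of evaluation and coevaluation

\[
\begin{aligned}
\mathrm{ev}_X &: X\otimes X^\vee \to 1, &\qquad \mathrm{ev}'_X &: {}^\vee X\otimes X \to 1, \\
\mathrm{coev}_X &: 1 \to X^\vee\otimes X, &\qquad \mathrm{coev}'_X &: 1 \to X\otimes{}^\vee X,
\end{aligned}
\]

drawn as cups ∪ (evaluations) and caps ∩ (coevaluations) whose ends are labeled by the corresponding objects. The evaluations and coevaluations are chosen such that the compositions

\[
\begin{aligned}
& X \xrightarrow{\;r^{-1}\;} X\otimes 1 \xrightarrow{\;1\otimes\mathrm{coev}\;} X\otimes(X^\vee\otimes X) \xrightarrow{\;a\;} (X\otimes X^\vee)\otimes X \xrightarrow{\;\mathrm{ev}\otimes 1\;} 1\otimes X \xrightarrow{\;l\;} X \\
& X \xrightarrow{\;l^{-1}\;} 1\otimes X \xrightarrow{\;\mathrm{coev}'\otimes 1\;} (X\otimes{}^\vee X)\otimes X \xrightarrow{\;a^{-1}\;} X\otimes({}^\vee X\otimes X) \xrightarrow{\;1\otimes\mathrm{ev}'\;} X\otimes 1 \xrightarrow{\;r\;} X \\
& X^\vee \xrightarrow{\;l^{-1}\;} 1\otimes X^\vee \xrightarrow{\;\mathrm{coev}\otimes 1\;} (X^\vee\otimes X)\otimes X^\vee \xrightarrow{\;a^{-1}\;} X^\vee\otimes(X\otimes X^\vee) \xrightarrow{\;1\otimes\mathrm{ev}\;} X^\vee\otimes 1 \xrightarrow{\;r\;} X^\vee \\
& {}^\vee X \xrightarrow{\;r^{-1}\;} {}^\vee X\otimes 1 \xrightarrow{\;1\otimes\mathrm{coev}'\;} {}^\vee X\otimes(X\otimes{}^\vee X) \xrightarrow{\;a\;} ({}^\vee X\otimes X)\otimes{}^\vee X \xrightarrow{\;\mathrm{ev}'\otimes 1\;} 1\otimes{}^\vee X \xrightarrow{\;l\;} {}^\vee X
\end{aligned}
\]

are all identity morphisms.

In a rigid monoidal category C, there is a pairing

\[
(X\otimes Y)\otimes(Y^\vee\otimes X^\vee) \cong X\otimes(Y\otimes Y^\vee)\otimes X^\vee \xrightarrow{\;X\otimes\mathrm{ev}_Y\otimes X^\vee\;} X\otimes 1\otimes X^\vee \cong X\otimes X^\vee \xrightarrow{\;\mathrm{ev}_X\;} 1
\]

which induces an isomorphism j_{X,Y} : Y^∨ ⊗ X^∨ → (X ⊗ Y)^∨, such that the above pairing coincides with

\[
(X\otimes Y)\otimes(Y^\vee\otimes X^\vee) \xrightarrow{\;1\otimes j\;} (X\otimes Y)\otimes(X\otimes Y)^\vee \xrightarrow{\;\mathrm{ev}\;} 1
\]

The equation

\[
\mathrm{coev}_{X\otimes Y} = \bigl( 1 \xrightarrow{\;\mathrm{coev}_Y\;} Y^\vee\otimes Y \cong Y^\vee\otimes 1\otimes Y \xrightarrow{\;1\otimes\mathrm{coev}_X\otimes 1\;} Y^\vee\otimes X^\vee\otimes X\otimes Y \xrightarrow{\;j\otimes 1\;} (X\otimes Y)^\vee\otimes(X\otimes Y) \bigr)
\]

also holds. Similarly, there is an isomorphism j'_{X,Y} : ^∨Y ⊗ ^∨X → ^∨(X ⊗ Y).

Morphisms constructed from braidings and (co)evaluations are often described by tangles. The conventions are listed in Figure 1. The suggested assignment of morphisms in C to elementary pictures extends to a unique functor Φ from the category of C-colored tangles to the category C itself. With the above interpretation, these tangles need not be oriented. We shall use the same notation for framed tangles, and the framing will be within the plane.

Figure 1 Conventions for notation of morphisms from tangles. [Elementary pictures for: a morphism f : X → Y; the braiding c_{X,Y} : X ⊗ Y → Y ⊗ X; the inverse braiding c^{-1} : X ⊗ Y → Y ⊗ X; the evaluation ev_X : X ⊗ X^∨ → 1; the coevaluation coev_X : 1 → X^∨ ⊗ X.]

The maps Ob C → Ob C, X ↦ X^∨, and X ↦ ^∨X extend to contravariant self-equivalences C → C, f ↦ f^t, and f ↦ ^t f. For given f, the morphisms f^t and ^t f can be defined, respectively, by pictures using the assignment from Figure 1.

[Tangle diagrams defining f^t and ^t f from f : X → Y, evaluations, and coevaluations.]

We have a monoidal self-equivalence of C,

\[
(-^{\vee\vee}, j_2) : (C,\otimes,1) \to (C,\otimes,1), \qquad X \mapsto X^{\vee\vee}, \quad f \mapsto f^{tt}
\]

\[
j_{2\,X,Y} = \bigl( X^{\vee\vee}\otimes Y^{\vee\vee} \xrightarrow{\;j\;} (Y^\vee\otimes X^\vee)^\vee \xrightarrow{\;(j^{-1})^t\;} (X\otimes Y)^{\vee\vee} \bigr)
\]

It is not always true that the two duals X^∨ and ^∨X are isomorphic. However, there are canonical isomorphisms

\[
X \to {}^\vee(X^\vee), \qquad X \to ({}^\vee X)^\vee
\]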
We may replace the category C with an equivalent one, These are isomorphisms of monoidal functors
such that the above isomorphisms become identity (see [1])
morphisms, and the functors _ and _  are inverse to
each other. We shall assume this to simplify notations. u21 : Id; c2 ! __ ; j2
Finally, we denote the iterated duals by X(n_) = X__ u21 : Id; c2 ! __ ; j2
(n times) and X(n_) = __ X (n times) for n 0.
In particular, this implies the commutativity of the
diagram
Braided Categories XY c2
XY

^
Here we review the definitions of the braiding u21  u21 # #u21
isomorphism and further derived isomorphisms. Sev- j2
eral basic relations between them are listed. Two X__  Y __ ! X  Y__
important classes of examples of braided categories The square of the monoidal functor (__ , j2 ) is
are given by the categories of modules over quasitrian-
gular Hopf algebras and the categories of tangles. ____ ; j4 : C; ; 1 ! C; ; 1;
Definition 4 A braided category (C, c) is a monoidal X 7! X____ ; f 7! f tttt
category C equipped with a functorial isomorphism
where
c = cX, Y : X  Y ! Y  X the braiding, or the
 
commutativity isomorphism such that the two ____ ____ j2 __
jtt
__ __ 2 ____
j4X;Y X Y ! X  Y ! X  Y
hexagons commute,

X  Y  Z 1c
1 X  Z  Y ! X  Z  Y
a The natural isomorphism u40 = u21 u21 is, in fact, an
^

a # # c
1  1 isomorphism of monoidal functors u40 : (Id, id) !
c
1 a (____ , j4 ).
X  Y  Z ! Z  X  Y ! Z  X  Y

(one for c and one for c1 ).


Ribbon Categories
The graphical notation for the braiding and its
Now we define balancing and recall some properties
inverse is
of balanced (ribbon) categories.
X Y Definition 5 Let C be a rigid braided category.
c cX;Y : X  Y ! Y  X A balancing X : X ! X__ is an isomorphism of
Y X monoidal functors  : (Id, id, id) ! (__ , j2 , d2 ) such
that  2 = u40 and X
t 1
= X _ :X
___
! X_ . The cate-
X Y gory C equipped with a balancing is called
c balanced.
Y X We also use the notation u20 = . In any balanced
In a rigid braided category, we can define category, there exists a canonical ribbon twist v.
functorial isomorphisms using again the conventions A ribbon twist v = vX : X ! X, v : Id ! Id is a self-
from Figure 1: adjoint (vX_ = vtX ) automorphism of the identity
functor such that c2 = (v1 1
X  vY ) vXY . It can be
determined from the equations

u20 u21 v1 u21 v : X ! X__


u12 = , 2 =
u1
 1 u2 2
0 u1 v
1
u2
1 v : X !
__
X
X X In particular, its square is given by the canonical
isomorphism v2 = u2 2
1 u1 . Conversely, in any
X X rigid braided category with a ribbon twist (called
ribbon category) there exists a canonical balan-
u2
1
= , u2
1
= cing u20 given by the above formulas. Thus, ribbon
categories and balanced categories are synonyms.
In the case of X = 1, we have v1 = id1 .
The following result can be used to simplify are very similar to those of usual Hopf algebras, for
notations: example, the antipode is antimultiplicative with
respect to the braiding (see, e.g., Majid (1993)).
Proposition 1 For any ribbon category C there exists
For Hopf algebras in rigid braided categories, there
a ribbon category D equivalent to C such that in it
exist integrals in a sense very much similar to the
(i) 1_ = 1; case of ordinary finite-dimensional Hopf algebras,
(ii) for any object X we have _ X = X_ , X__ = X, as shown by Bespalov et al. (2000).
and X = idX : X ! X__ X.
(iii) for any object X we have evX = ev0X_ : X 
X_ ! 1, and coevX = coev0X_ : 1 ! X_  X.
Modular Categories
In the category C = H-mod, where H is a ribbon
Assume that a braided rigid monoidal category C is
Hopf algebra, the equation X_ = _ X is not neces-
equivalent as a category (with monoidal structure
sarily satisfied. Nevertheless, X_ is canonically
ignored) to the category of finite-dimensional mod-
isomorphic to _ X. The same holds in any ribbon
ules over a finite-dimensional algebra. In particular,
category. We identify these objects via  = u20 :
_ C is abelian. Then there exists an object F in C,
X ! X_ . This allows us to use the right dual
equipped with a morphism iX : X  X_ ! F for each
objects in place of the left ones. In that role, the
X 2 Ob C, such that the diagram
right duals are equipped with the left evaluation
and coevaluation, called flipped evaluation and f Y _
X  Y_ Y  Y_

^
coevaluation, respectively:
Xf t # #iY
_ X_  _ __ ev
e :X  X
ev X  X ! 1 _ iX
^

XX F

^
g :1
coev coev X__  X_  1 X_ X  X_
^

is commutative for all morphisms f : X ! Y of C, and,


They are often denoted simply ev and coev and moreover, F is universal between objects with such
e and coev
should be replaced by ev g in applications. In properties. Here f t : Y _ ! X_ is the transpose of a
the context of Hopf algebra,  is given by the action morphism f : X ! Y. In other words, FRis a direct limit,
Z2C
of a group-like element introduced by Drinfeld. called the coend and denoted as F = Z  Z_ . It
can also be defined via an exact sequence
M M iZ
X  Y _ f Y Xf Z  Z_ ! F ! 0
_ t
^

Hopf Algebras in Braided Categories f :X!Y2C Z2C

Let C be a braided monoidal category. A Hopf It turns out that the coend F is a Hopf algebra in
algebra H in C is an object H 2 Ob C together with the braided category C, when it is equipped with the
an associative multiplication m : H  H ! H and an following operations. The comultiplication in F is
associative comultiplication  : H ! H  H, obeying uniquely determined by the equation
the bialgebra axiom  
iX 
 m 
 X  X_ !F ! F  F
H  H!H!H  H 
 X  X_ X  1  X_
H  H  H  H  H  H
^

XcoevX_ X  X_  X  X_
^

HcH
HHHH 
^

 iX iX FF
^

mm
HH
^

The counit in F is determined by the equation


Moreover, H has a unit  : 1 ! H, a counit " : H ! 1,    
iX " ev
an antipode  : H ! H, and the inverse antipode X  X_ ! F ! 1 X  X_ ! 1
 1 : H ! H. The defining relations for these are the
same as in the classical case. Notice, in particular, The multiplication m : F  F ! F is defined by the
that the unit is also a morphism. Associativity of following diagram:
multiplication, as well as coassociativity of comulti-
X X Y Y X  X_  Y  Y _ iX iY
plication, is formulated with the use of associativity FF
^

isomorphism (in the nonstrict case). m = and Xc # #


9 m
Hopf algebras in braided categories have also
X  Y  X  Y_ iXY F
^

been called braided groups. Their basic properties X Y Y X


356 Braided and Modular Tensor Categories

The unit is given by the morphism and  is universal between morphisms with such
property. By duality, the integral functional  : F ! 1
i1
 : 1 1  1_ ! F is also two sided. It satisfies
  1

The diagram corresponding to the antipode F ! F  F ! F  1 F
F : F ! F is given by   

F ! 1 ! F
F   
1
F ! F  F ! 1  F F

F = and is universal between morphisms with such property.


The integral element and the integral functional are
unique up to a multiplication by an element of AutC 1.
F

The structure of the coend F as a Hopf algebra can


also be found directly from its universal property, as
Semisimple Abelian Modular Categories
in Majid (1993). Reshetikhin and Turaev proposed to construct invari-
There is a pairing of Hopf algebras ! : F  F ! 1 in C: ants of 3-manifolds via quantum groups. More
F F precisely, they use certain abelian semisimple ribbon
categories obtained from quantum groups at roots of
unity as trace quotients. One can forget about the origin
= of these categories and work simply with semisimple
modular categories. We shall describe them as input
data for the modular functor construction.
It induces a homomorphism of Hopf algebras F ! F_ . Let C be a C-linear abelian semisimple modular
ribbon category. Assume that the number of
Definition 6 A ribbon category C, equivalent as isomorphism classes of simple objects is finite.
a category to the category of finite-dimensional Assume also that 1 is simple and for each simple
modules over a finite-dimensional algebra, is called object X the endomorphism algebra End X = C. We
modular if the pairing ! is nondegenerate, that is, denote by S = {Xi }i the list of (representatives of
the induced morphism F ! F_ is invertible. isomorphism classes of) all simple objects.
Examples of nonsemisimple modular categories Under these assumptions, many formulas simplify.
include C = H-mod, where H = uq (g ) is a finite- The coend F 2 C takes the form
M
dimensional algebra, quotient of the quantum F X  X_ 2 C
universal enveloping algebra Uq (g ), and q is a root X2S
of unity of odd degree. In these examples, the
coalgebra F identifies with the dual Hopf algebra Any morphism 1 ! F is a C-linear combination of the
H , but the multiplication in F differs from that of standard morphisms for X 2 S,
H . Explicit formula for the multiplication in F uses X
X
the R-matrix for H (see, e.g., Majid (1993)).
A definition of modularity for another type of u20
coev 1u20 i
categories (not necessarily abelian) was given by X : 1 ! X _ X! X  X_ ! F
iX
Turaev (1994).
When the category C is modular, the integrals for
the Hopf algebra F have especially simple properties. F
The integral element in F is two sided. It is a The morphisms X form a basis of the commu-
morphism  : 1 ! F such that tative algebra Inv F = HomC (1, F). The Grothen-
  dieck ring of the category C determines the
1 m
F F  1 ! F  F ! F multiplication law in Inv F via the algebra
 " 
 isomorphism C Z K0 (C) ! Inv F, [X] 7! X .
F ! 1 ! F Any morphism F ! 1 can be represented as a
  linear combination of the morphisms
1 m
F 1  F ! F  F ! F prX evX
X : F ! X  X_ ! 1

where X 2 S. The functional 1 : F ! 1 satisfies the sX1 sX;YZ


properties of a two-sided integral  of the braided
Hopf algebra F. X Y
u02
The Verlinde Formula
Y
The number X X
u02
= u20
X X
dimq X u20 Z
u02
coev 1u20 ev X
: 1 ! X_  X! X_  X__ !1 Z
is called the dimension of an object X 2 Ob C. (The
index q reminds us that this number coincides with
the q-dimension in the case C = Uq (g )-mod.) We
have dimq (X_ ) = dimq (X). X Y X Z
u02 u02
Definition 7 Introduce a biadditive function of two
variables s : Ob C  Ob C ! C on the class of objects of C: = Y Z
Y
X Y u02 u02
u2
0 u02 X X
XY =


X sXY sXZ

In particular, its restriction to S is a matrix sjS : S  This proves the second formula. &
S ! C, denoted again by s = (sXY )X, Y2S by abuse of Proposition 2 (Criterion of modularity) In the
notation; here X and Y run over simple objects. above assumption of semisimplicity, the following
Notice that sXY = sYX , so the matrix s is symmetric. conditions are equivalent:
Let us consider the C-algebra Inv F = HomC (1, F). It has (i) C is modular (! is nondegenerate);
the basis X , X 2 S; hence, it is n-dimensional, where (ii) the matrix (sXY )X, Y2S is nondegenerate;
n = Card S. The form ! on F induces a bilinear form (iii) for any X 2 S its dimension dimq X does not
 vanish, and there exist numbers 0Y , Y 2 S, such
! 0 : Inv F  Inv F ! Hom1; F  F Hom1;! 1 P
^

that for all X 2 S we have Y2S sXY 0Y = X1 ; and


The matrix (sXY ) is the matrix of the form !0 in the (iv) for
P each simple X 6 1 we have
basis (X ). s
Y2S XY dim q Y = 0 and dim q X
6 0.
Lemma 1 (The Verlinde formula) For any simple The easy implication (ii) ) (iii) can be deduced
X 2 S and any objects Y and Z of C, we have from the Verlinde formula. If the dimension
dimq (X) = sX1 of a simple object X vanishes, then
sX1 dimq X; sX1 sX;YZ sXY sXZ 2
s2XY = 0 for all Y 2 Ob C. This contradicts to the
Proof The first formula is straightforward. Since assumption of nondegeneracy of (sXY ).
Let us determine the coefficients Y of the integral
X Y
element
Y X
u02  Y Y : 1 ! F
 End X C Y2S

of the Hopf algebra F. It also has a two-sided


integral-functional  : F ! 1. The corresponding
endomorphism is
is a number, we can move it from the second factor   Z

~Z Z ! F  Z ! 1  Z Z
Z

to the first in the following computation:



for an arbitrary object Z of C, where Z is the Multiplying both sides of [7] with 1 , we find
natural coaction. The equation
Y 1  dimq Y
X X X X The normalization is fixed by eqn [6], which we can
Y Y write as

= XY 3 Y
X Y
1 1  1 Y u20
Y2S
Y Y Y Y
follows from the properties of the two-sided integral X
21 dimq Y2
 of the Hopf algebra F. Due to uniqueness of Y2S
integrals,  is proportional to 1 . In eqn [3], X and
Y vary over S. The right-hand side is the identity Hence,
morphism if X = Y, and vanishes otherwise. Sub- !1
X 2
stituting the definition of Y , we rewrite the 2
1 dimq Y 8
equation as follows: Y2S

X Y X Y So, we find 1 , unique up to a sign.

u02
~ Conjugation Properties
y = XY 4
From the Verlinde formula [2], we conclude that
the commutative C-algebra Inv F possesses
X Y X Y homomorphisms
For X = 1, we get X : Inv F ! C
Y  ~Y 1Y  idY : Y ! Y 5 Y 7! dimq X1 sXY sXY =sX1
If Y 6 1, then ~Y = 0. So [5] tells essentially that The matrix s is invertible, so that its columns cannot
be proportional. Hence, all X are different char-
1  ~1 id1 : 1 ! 1 6 acters. Their number is n = Card S = dimC F; hence,
Now return to [4] with X = Y. If we compose that there is an isomorphism of C-algebras
equation with coev : 1 ! Y _  Y, we obtain
: Inv F ! C      C Cn
X X  7! 1 ; . . . ; n 
Now we show that the dimensions dimq (Y) are
y . ~n = y
~
real numbers, so that 1 is also a real number. One
can introduce in Inv F an antilinear involution,
Y Y Y Y  : Inv F ! Inv F; X X_
and a scalar (Hermitian) product
Y u02 Y X jY XY ; X; Y 2 S
7 Then Inv F becomes a finite-dimensional commu-
=
tative Hilbert algebra. Indeed,
Y Y X Y jZ dim HomX  Y; Z
dim HomX; Y _  Z X j Y Z
From the theory of finite-dimensional commutative
= dimqY Hilbert algebras, we know that idempotents in the
algebra Inv F are self-adjoint (only in that case the
scalar product can be positive definite). Hence, is
Y Y a -morphism, that is, X ( ) = X (). Therefore,

$s_{XY^\vee}/s_{X1} = \overline{s_{XY}/s_{X1}}$. In the particular case of $X = 1$, we obtain

\[ \dim_q Y = \dim_q Y^\vee = s_{1Y^\vee} = \overline{s_{1Y}} = \overline{\dim_q Y} \]

since $s_{11} = 1$. This proves that for any $Y \in \mathcal{C}$ its dimension $\dim_q(Y)$ is a real number.
It is natural to take for $\mu_1$ the positive root of the right-hand side of [8]. Positiveness fixes $\mu_1$ uniquely.

Examples of Semisimple Modular Categories

In their original paper, Reshetikhin and Turaev (1991) use as algebraic input data the representation theory of the quantum deformation $U = U_q(\mathfrak{sl}_2)$ of the Lie algebra $\mathfrak{sl}(2, \mathbb{C})$, where $q$ is a root of unity. They construct the invariant as a trace over $U$-equivariant morphisms, and prove the necessary modularity condition concerning the nondegeneracy of the braided pairing.
The general picture is drawn by Turaev (1994), where 3-manifold invariants and TQFTs are constructed from semisimple modular categories. He shows how to obtain the latter as quotients of certain subcategories of representations of a modular Hopf algebra by the ideal of trace-negligible morphisms.
Finkelberg (1996), based on results of Gelfand and Kazhdan, establishes (via the theory of Kazhdan and Lusztig) an equivalence between two modular categories. The first is the semisimple category $\mathcal{C}$ of integrable modules over an affine Lie algebra $\hat{\mathfrak{g}}$ of positive integer level $k$. The second is a certain subquotient of the category of $U_q(\mathfrak{g})$-modules for $q = \exp(\pi i m^{-1}/(k + h^\vee))$, where $m \in \{1, 2, 3\}$ and $h^\vee$ is the dual Coxeter number of $\mathfrak{g}$. Huang and Lepowsky (1999) describe the rigid braided structure of $\mathcal{C}$ using vertex operators. Bakalov and Kirillov (2001) use geometrical constructions to make $\mathcal{C}$ into a modular category, associated with the Wess–Zumino–Witten (WZW) model. They construct the corresponding WZW modular functor.

Modular Functor and TQFT

Modular categories give rise to a modular functor and a TQFT. The meanings of those differ from author to author, but the common features are the following. Such a TQFT is a functor from the category whose objects are smooth surfaces with additional structures and morphisms are three-dimensional manifolds with additional structures to the category of vector spaces. A modular functor is the restriction of such TQFT to the subcategory whose morphisms are homeomorphisms of surfaces. One of the constructions due to Kerler and Lyubashenko (2001) takes a nonsemisimple modular category as an input and assigns to it a double TQFT functor, that is, a functor between double categories. The target is the 2-category of abelian categories.

See also: Axiomatic Approach to Topological Quantum Field Theory; Hopf Algebras and q-Deformation Quantum Groups; The Jones Polynomial; Knot Invariants and Quantum Gravity; Quantum 3-Manifold Invariants; Symmetries in Quantum Field Theory of Lower Spacetime Dimensions; Topological Quantum Field Theory: Overview; von Neumann Algebras: Introduction, Modular Theory, and Classification Theory; von Neumann Algebras: Subfactor Theory.

Further Reading

Bakalov B and Kirillov A Jr. (2001) Lectures on Tensor Categories and Modular Functors, University Lecture Series, vol. 21. Providence, RI: American Mathematical Society.
Bespalov Y, Kerler T, Lyubashenko VV, and Turaev VG (2000) Integrals for braided Hopf algebras. Journal of Pure and Applied Algebra 148(2): 113–164 (arXiv:math.QA/9709020).
Drinfeld VG (1987) Quantum groups. In: Gleason A (ed.) Proceedings of the International Congress of Mathematicians (Berkeley, 1986), vol. 1, pp. 798–820. Providence, RI: American Mathematical Society.
Drinfeld VG (1989a) Quasi-Hopf algebras. Algebra i Analiz 1(6): 114–148.
Drinfeld VG (1989b) Quasi-Hopf algebras and Knizhnik–Zamolodchikov equations. In: Problems of Modern Quantum Field Theory, pp. 1–13. Berlin–New York: Springer.
Finkelberg M (1996) An equivalence of fusion categories. Geometric and Functional Analysis 6(2): 249–267.
Huang Y-Z and Lepowsky J (1999) Intertwining operator algebras and vertex tensor categories for affine Lie algebras. Duke Mathematical Journal 99(1): 113–134 (arXiv:q-alg/9706028).
Joyal A and Street RH (1991) Tortile Yang–Baxter operators in tensor categories. Journal of Pure and Applied Algebra 71: 43–51.
Kerler T and Lyubashenko VV (2001) Non-Semisimple Topological Quantum Field Theories for 3-Manifolds with Corners, Lecture Notes in Mathematics, vol. 1765, vi + 379 pp. Heidelberg: Springer.
Mac Lane S (1971) Categories for the Working Mathematician, GTM, vol. 5. New York: Springer.
Majid S (1993) Braided groups. Journal of Pure and Applied Algebra 86(2): 187–221.
Majid S (1995) Foundations of Quantum Group Theory. Cambridge: Cambridge University Press.
Moore G and Seiberg N (1989) Classical and quantum conformal field theory. Communications in Mathematical Physics 123: 177–254.
Reshetikhin NY and Turaev VG (1991) Invariants of 3-manifolds via link polynomials and quantum groups. Inventiones Mathematicae 103(3): 547–597.
Turaev VG (1994) Quantum Invariants of Knots and 3-Manifolds, de Gruyter Stud. Math, vol. 18. Berlin–New York: Walter de Gruyter.
360 Brane Construction of Gauge Theories

Brane Construction of Gauge Theories


S L Cacciatori, Università di Milano, Milan, Italy
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Branes appear in string theories and M-theory as extended objects which contain some nonperturbative information about the theory, and, apart from gravity, they can couple with gauge fields.
At low energies, M-theory can be approximated with an 11-dimensional $N = 1$ supergravity, which in fact is unique and contains a graviton field (the metric $g_{\mu\nu}$), a spin 3/2 field (the gravitino) and a gauge field consisting of a 3-form potential field $c$. The gauge field, whose field strength is a 4-form $G = dc$, can then couple electrically with two-dimensional extended objects, called M2 membranes. Moving in spacetime, an M2 membrane describes a three-dimensional world volume $W_3$ so that its coupling to the gauge field is

\[ S_2 = k \int_{W_3} c \qquad [1] \]

$k$ representing the charge.
With $c$ we can associate a dual field $\tilde{c}$ such that $d\tilde{c} = {*G}$. It is a 6-form and can then electrically couple with a five-dimensional object, the M5 membrane. However, as $c$ is the true field, we say that M5 couples magnetically with $c$.
In superstring theories, which however are related to M-theory by a dualities web, there are many more objects to be considered. In particular, we will consider type II strings, which at low energies are described by ten-dimensional $N = 2$ supergravity theories. They contain a Neveu–Schwarz sector consisting of a graviton $g_{\mu\nu}$, a 2-form potential $B_{\mu\nu}$, and a scalar field $\phi$, the dilaton. The content of the Ramond–Ramond fields depends on the chirality of the supercharges.
Type IIA strings are nonchiral (their left and right supercharges having opposite chiralities) and contain only odd-dimensional $p$-form potentials $A^{(p)}$, with $p = 1, 3, 5, 7, 9$.
Type IIB strings are chiral and contain only even-dimensional $p$-form potentials $A^{(p)}$, with $p = 0, 2, 4, 6, 8$.
Proceeding as before, we see that a $(p+1)$-form potential can couple electrically with a $p$-dimensional object and magnetically with a $(6 - p)$-dimensional object. Such objects in fact exist in type II strings: the D$p$ branes are $p$-dimensional extended objects, with $p = 0, 2, 4, 6, 8$ for IIA strings and $p = -1, 1, 3, 5, 7, 9$ for IIB strings. In particular, D0 and D1 branes are called D-particles and D-strings respectively, whereas D($-1$) branes are instantons, that is, points in spacetime. Concretely, D-branes are extended regions in spacetime where the endpoints of open strings are constrained to live. Mathematically, they are defined imposing Dirichlet conditions (whence the D of D-brane) on the ends of the string, along certain spatial directions. Excitation of these string states gives rise to the dynamics of the brane. They correspond to a ten-dimensional U(1) gauge field, whose components, which are tangent to the brane world volume, give rise to a gauge field in $p + 1$ dimensions, whereas the orthogonal components generate deformations of the brane shape. Moreover, if $n$ parallel $p$-branes overlap, the gauge theory on the world volume is enhanced to a U($n$) gauge theory. Closed strings can generate gravitational interactions responsible for wrappings of the brane. However, in the cases when gravitational interaction is negligible, we can use this mechanism to construct $(p+1)$-dimensional gauge theories, as we will see.
Before explaining how the construction works let us remember that there are two other interesting objects which often appear. In fact, we have not yet considered the Neveu–Schwarz $B$-field: this field can couple electrically with a one-dimensional object and magnetically with a five-dimensional object. These are the usual string (also called a fundamental or F-string) and a five-dimensional membrane called NS5 brane.
We will see how supersymmetric gauge theory configurations can be realized geometrically, considering more or less simple configurations of branes. We will also show that quantum corrections, be they exact or perturbative, can be described in this geometrical fashion. To be explicit, we will work with four-dimensional gauge theories, but it is clear that similar constructions can be done in different dimensions.

Gauge Groups on the Branes

A deeper understanding of how D-branes and related world-volume gauge theories work requires the introduction of dualities, but a quite simple heuristic argument can be given, giving up some rigor in favor of intuition.
To set our ideas, let us think of an open string moving in a nearly flat (but ten-dimensional) spacetime. Its trajectory will describe a two-dimensional surface having a boundary traced by the ends of the string (Figure 1). The string can then be described by a map from a two-dimensional surface $\Sigma$, having a boundary $\Gamma = \partial\Sigma$, to spacetime, say $X^\mu(\sigma, \tau)$ with $\mu = 0, 1, \ldots, 9$. Here we chose on $\Sigma$ local coordinates $\sigma^\alpha = (\sigma, \tau)$, where $\sigma \in [0, \pi]$ is a spacelike coordinate and $\tau$ is a timelike one. Then $\sigma = 0, \pi$ individuate the ends of the string and are identified for the closed string. Now, on a given background, the string evolution is usually described as a two-dimensional (supersymmetric) conformal field theory for the fields $X^\mu(\sigma, \tau)$. The action for the bosonic part is the same for both type IIA and IIB strings, and reads

\[ S[X] = \frac{1}{4\pi\alpha'} \int_\Sigma d^2\sigma\, \sqrt{h}\, h^{\alpha\beta} g_{\mu\nu}(X) \frac{\partial X^\mu}{\partial\sigma^\alpha} \frac{\partial X^\nu}{\partial\sigma^\beta} + \frac{1}{4\pi\alpha'} \int_\Sigma B_{\mu\nu}(X) \frac{\partial X^\mu}{\partial\sigma^\alpha} \frac{\partial X^\nu}{\partial\sigma^\beta}\, d\sigma^\alpha \wedge d\sigma^\beta \qquad [2] \]

where $g_{\mu\nu}$ and $B_{\mu\nu}$ are the metric and a 2-form potential field for the given spacetime background, and $h_{\alpha\beta}$ is a metric for $\Sigma$. In general, we must also add a scalar field $\phi(X)$, but it will not play any role here. Using conformal invariance, we can reduce $h_{\alpha\beta}$ to the flat metric. Also consider a flat background $g_{\mu\nu}(X) = \eta_{\mu\nu}$ and concentrate for a moment on the $B$-field.

Figure 1 Strings moving in spacetime (panels: closed string, open string).

Conceived as a 2-form field over the spacetime, the potential field $B$ is a gauge field: its field strength 3-form $H = dB$ is unchanged under a shift

\[ B \to B + dA \qquad [3] \]

generated by the 1-form field $A(X)$. Here $A$ should be a totally unphysical field. However, note that if one considers open strings, the action for the $B$-field, and then the full action is shifted by a boundary term

\[ S[X] \to S[X] + \frac{1}{4\pi\alpha'} \int_\Gamma A_\mu(X) \frac{\partial X^\mu}{\partial\tau}\, d\tau \qquad [4] \]

The boundary $\Gamma$ just describes the timelike world lines of the ends of the string. Thus, the ends of the string carry a U(1) charge and, even though the $B$-field vanishes, we can have the open-string action

\[ S[X] = \frac{1}{4\pi\alpha'} \int_\Sigma \partial_\alpha X^\mu\, \partial^\alpha X_\mu\, d^2\sigma + \int_\Gamma A_\mu(X)\, \partial_\tau X^\mu\, d\tau \qquad [5] \]

Here we conventionally rescaled the $A$ field to normalize the action. To define the equation of motion, however, we must also specify boundary conditions for $X^\mu(\sigma, \tau)$ on $\Gamma$. Let us choose Neumann conditions for $\mu = 0, 1, \ldots, p$ and Dirichlet conditions for the remaining directions

\[ \partial_\sigma X^a \big|_\Gamma = 0, \quad a = 0, \ldots, p \qquad [6] \]

\[ \partial_\tau X^i \big|_\Gamma = 0, \quad i = p+1, \ldots, 9 \qquad [7] \]

This means that the extrema of the string are bound on a $(p+1)$-dimensional region (including time): the D$p$ brane. If for $\Sigma$ we consider the full strip $(\sigma, \tau) \in [0, \pi] \times \mathbb{R}$ then the U(1) action reduces to

\[ S_A[X] = \int_{-\infty}^{\infty} A_a\, \partial_\tau X^a(\pi, \tau)\, d\tau - \int_{-\infty}^{\infty} A_a\, \partial_\tau X^a(0, \tau)\, d\tau \qquad [8] \]

Thus, only the components $A_a$ tangent to the brane interact with the ends of the strings. What about the normal components $A_i$?
To understand their meaning, let us proceed to compute the mean momentum transferred by the string, as if it were rigid. Imitating the Hamilton–Jacobi procedures for particles, let us consider the action up to a fixed time, say $\tau = 0$, so that $\Sigma = [0, \pi] \times [-\infty, 0]$. It is then a function of the position $X^\mu(\sigma, 0)$ of the string at the instant $\tau = 0$. To compute the momentum, we must vary the action by changing the position by a constant shift $\delta X^\mu(\sigma) = \epsilon_0^\mu$. The variation will then contain some boundary terms which, for reasons of consistency, we must make vanish.
Before doing such a computation, let us make some further comments. It is plausible to assume that the two ends of the string could be charged for different U(1) fields. To the states of the open string we can in fact add two discrete labels $I, J = 1, \ldots, n$, for some integer $n$, called Chan–Paton factors, and referring, respectively, to the two ends of the string. We will indicate the ends of the string as $X^\mu(0, \tau; I)$ and $X^\mu(\pi, \tau; J)$ when we need to specify the states. If the string is in the excited state $(I, J)$, then $X(0, \tau; I)$ can couple with the field $A^{(I)}$ and $X(\pi, \tau; J)$ with $A^{(J)}$. For simplicity, we will now assume that these fields are constant. Note however that $A^{(I)}$ must be intended as a function of $X(0, \tau)$ only, and similarly for $A^{(J)}$. Also to realize the variation we can vary $X^\mu(\sigma, \tau)$ by a function $\delta X^\mu(\sigma, \tau) = \epsilon^\mu(\tau)$ strictly peaked to $\epsilon_0^\mu$ at $\tau = 0$ so that essentially

\[ \partial_\tau \epsilon^\mu = \epsilon_0^\mu\, \delta(\tau) \qquad [9] \]

where $\delta(\tau)$ is the Dirac delta function.
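The passage from the boundary term in eqn [5] to eqn [8] can be written out explicitly. The following is a sketch rather than a derivation taken from the text: it assumes that the boundary $\Gamma$ of the strip consists of the two world lines $\sigma = \pi$ and $\sigma = 0$ taken with opposite orientations (this sign convention is an assumption made here), and that the Dirichlet conditions [7] hold on both of them.

```latex
\begin{align*}
\int_\Gamma A_\mu\,\partial_\tau X^\mu\,d\tau
  &= \int_{-\infty}^{\infty} A_\mu\,\partial_\tau X^\mu(\pi,\tau)\,d\tau
   - \int_{-\infty}^{\infty} A_\mu\,\partial_\tau X^\mu(0,\tau)\,d\tau \\
  &= \sum_{a=0}^{p}\left[\int_{-\infty}^{\infty} A_a\,\partial_\tau X^a(\pi,\tau)\,d\tau
   - \int_{-\infty}^{\infty} A_a\,\partial_\tau X^a(0,\tau)\,d\tau\right] \\
  &\quad + \sum_{i=p+1}^{9}\left[\int_{-\infty}^{\infty} A_i\,
     \underbrace{\partial_\tau X^i(\pi,\tau)}_{=\,0 \text{ by [7]}}\,d\tau
   - \int_{-\infty}^{\infty} A_i\,
     \underbrace{\partial_\tau X^i(0,\tau)}_{=\,0 \text{ by [7]}}\,d\tau\right] \\
  &= \int_{-\infty}^{\infty} A_a\,\partial_\tau X^a(\pi,\tau)\,d\tau
   - \int_{-\infty}^{\infty} A_a\,\partial_\tau X^a(0,\tau)\,d\tau .
\end{align*}
```

Under these assumptions the normal components $A_i$ drop out of the boundary coupling, which is why their interpretation has to come from a separate argument.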

Using the chosen boundary conditions, the varia- boundary terms, the total variation of the action
tion of the full action contains the boundary terms due to the shift X (, 0) =  becomes
 Z 0 Z
1
J
Sbound Ai  Ai
I
@ i d S @  @ X d2
2 0 
Z 1 Z
1 
i @ Xi ; 0d @ X ; 0d 12
2 0 0 2 0 0

i The resulting momentum is
Xi ; 0  Xi 0; 0
2 0 Z
  P
1
@ X ; 0d
0 J I
2  Ai  Ai 10 2 0 0

Imposing the condition of its vanishing gives the On the bulk, the fields X satisfy the standard wave
physical interpretation for the normal components equation in two dimensions, so that the general
of the U(1) fields solution is the sum of a left-moving and a right-
  moving part, X (, ) = XL ( ) XR (  ).
J I
Xi ; 0  Xi 0; 0 2 0 Ai  Ai 11 Imposing the boundary conditions, one finds

This means that, up to a constant shift, the fields Xa ;  XaL   XaL   


A(K)
i measure the positions of the ends of the strings 2 0 pa  Xa0 13
in the transverse directions! (Figure 2). Equivalently,
we can say that the string ends on two different Dp Xi ;  XiL    XiL   
 
branes, parallel but displaced in the transverse 20 AJi  AIi  Xi0 14
directions by a quantity 2 0 A(i J)  A(I)
i . We are
thus also able to interpret the ChanPaton factors. Here X0 and pa are integration constants and
They mean that the string is living in a background XiL ( )  XiL (  ) = 0. A direct computation
of n parallel branes, stretched between the Ith and then shows that Pa = pa and Pi = 0, which is also
the Jth brane. On every brane, a U(1) gauge group what intuition suggests: the string can freely move
lives so that the full gauge group is U(1)n . However, along the branes but is fixed between them in the
when k of the branes overlap, the corresponding set orthogonal directions. However, if it is stretched
of states become indistinguishable, so that the gauge between two separated branes (i.e., if I 6 J), there is
group can be enhanced to a U(k) group. In another contribution to the energy. In fact the factor
conclusion, n overlapping parallel Dp branes carry T := 1=(2 0 ) represents the string tension, so that if
a (p 1)-dimensional U(n) gauge theory which  is its minimal length, its minimal contribution to
breaks in U(ki ) block factors if the branes separate the energy will be E = T. This energy must
in stacks of ki overlapping branes. equally contribute to the spectrum of the excited
We can say a little bit more about this. If the modes, the gauge field bosons. Here in fact, is where
string excited states represent gauge degree of T-duality comes into play, but we will not discuss it.
freedom, they must become massive to break gauge The conclusion is that the spectrum corresponding to
symmetry when the branes separate. To see this, let the stretched string must satisfy the condition E  T,
us conclude by computing the mean momentum which is as if the string states acquired a mass T,
carried by the string. After elimination of the that is,
9 
X 2
m2 A Ji  AIi 15
ip1
Aa
This gives us a geometric tool to construct (p 1)-
dimensional gauge theories: on n coincident Dp
Aa Aa branes there exists a U(n) gauge theory which can be
Ai broken separating the branes and thus giving a mass
to the gauge bosons. Such a mass is proportional to
the distance between the branes (Figure 3).
Before continuing with some examples, let us
Figure 2 Tangential components of Aa appear as gauge make two comments. First, the theory obtained in
modes. Normal components Ai appear as shift modes. this way is a supersymmetric one, because the
Brane Construction of Gauge Theories 363

Massless NS5
v
x6
x
D4

Massive

Figure 4 D4 branes ending on an NS5 brane. Gauge degrees


of freedom are frozen in four dimensions.
Figure 3 Stretched strings acquire a mass.

Dirichlet conditions allow the action of supersym- can try to consider the coexistence of more kinds of
metric transformations of the form L QL R QR , branes.
where QL and QR are the fermionic left and right One way to do this is to consider n parallel 4-branes
supercharge operators and L , R are spinors satisfy- ending on an NS5 brane in type IIA string theory
ing the brane projection condition L = 0 1  . . .  (Figure 4), and then analyze the gauge theory restricted
p R . Here  are the ten-dimensional Dirac to the four-dimensional intersection (here the theory is
matrices and one refers to antibranes for the nonchiral as 0  . . .  9 L=R =  L=R ). What kind of
negative sign. branes can end on other kind of branes can be
Second, the gauge group can be converted into an established, starting from the fact that strings can end
SO(n) or an Sp(n=2) (for even n), adding an on a brane, and using the dualities tool (Giveon and
orientifold plane parallel to the branes. The orienti- Kutasov 1999).
fold plane acts on the orthogonal spacetime direc- Let us fix some conventions. We will indicate with
tions with a Z2 -action x = (x0 , x1 , x2 , x3 ) 2 R4 the coordinates on the inter-
section, so that (x; v) = (x; x4 , x5 ) 2 R6 define the NS5
Xi Xi 16 brane, and (x, x6 ), with x6 2 [0, 1), the 4-branes. Also
if Xi = 0 is the position of the orientifold. It further vI will indicate the position of the Ith 4-brane on the 5-
acts on the string world sheet as    making it brane, and y = (x7 , x8 , x9 ) will collect the remaining
an unoriented string. The effect is to project out coordinates. Finally, we will indicate the product of -
some states from the spectra, thus reducing the matrices, corresponding to given directions, indicizing
gauge group. a simple  with the respective coordinates. For
example v = 4 5 . With these conventions, the
brane projection conditions for D4 and NS5 branes,
Geometric Engineering of Gauge respectively, read
Theories from Branes L x 6 R 17
To illustrate how brane construction of gauge
L x v L ; R x v R 18
theories works, we will consider a particular con-
figuration of branes (Witten 1997). These projections reduce supersymmetry to N = 2.
We would like to obtain a four-dimensional U(n) After a short manipulation and using for example
gauge theory. A possibility could be to take n D3 antichirality of R , it is easy to see that the first
branes in a type IIB string background. However, condition can be substituted by
such a model would contain too many supersymme-
L x y R 19
tries: in ten dimensions, supersymmetries are gener-
ated by two 16-dimensional chiral spinors L , R In other words, we could add a number of 6-branes
(0  . . .  9 L,R = L,R ). From the four-dimensional in the (x, y) directions, without further reducing
point of view, each of them represents four four- supersymmetry. We will consider this possibility
dimensional spinors giving an N = 8 supersymmetric later.
theory. The projection condition, due to the branes, On the D4 branes there is an eventually broken
reduces the number of supersymmetries to four. U(n) gauge theory. Here the vector fields
Supersymmetry not being manifest in nature, it is A ,  = 0, 1, 2, 3, 6, and the scalar fields vI and y
desirable to have fewer supersymmetric gauge theo- live. The last ones are set to zero by the Dirichlet
ries at hand. Because different brane projection conditions, whereas vI measure the fluctuations of
conditions can further reduce supersymmetry, we the D3 brane positions over NS5. The O(2) group
364 Brane Construction of Gauge Theories

of rotations of the (x4 , x5 ) coordinates acts on NS5 NS5


them, which can be broken by an expectation
value hvI i 6 0. The SO(3) rotations of (x6 , x7 , x8 ) v
(under which vI are singlets) do not influence the projection conditions and can then be identified with the R-symmetry group $SU(2)_R$. It could be broken by a nonvanishing expectation value $\langle y \rangle \neq 0$, but, as we said, this cannot happen in the present configuration. This highlights an unbroken supersymmetric Coulomb branch.

What is the physics as seen by an observer living on the four-dimensional spacetime x? The components $A_\mu$, $\mu = 0, 1, 2, 3$, of the vector fields transform as vectors with respect to the four-dimensional Lorentz group SO(1, 3). They satisfy Neumann boundary conditions at $x^6 = 0$ and therefore survive as U(n) gauge vector fields. The $A_6$ component behaves as a scalar with respect to SO(1, 3) but is eliminated by a Dirichlet condition at $x^6 = 0$. The v scalar field will be responsible for the eventual breaking of the gauge group.

This seems to be quite a good scenario, but the situation is actually unsatisfactory. If a 4-brane extends over the interval [0, L] in the $x^6$ direction, the effective action for the gauge fields takes the form

$$ \frac{1}{g_{D4}^2} \int_0^L dx^6 \int_{\mathbb{R}^4} d^4x \, \mathrm{tr}\, F_{\mu\nu} F^{\mu\nu} = \frac{L}{g_{D4}^2} \int_{\mathbb{R}^4} d^4x \, \mathrm{tr}\, F_{\mu\nu} F^{\mu\nu} $$ [20]

where $\mu, \nu = 0, 1, 2, 3$. Thus, the gauge coupling in four dimensions appears to be $g_4 = g_{D4}/\sqrt{L}$. In our case, where L goes to infinity, the gauge coupling vanishes and the gauge degrees of freedom are frozen. Moreover, an argument similar to the one made for the stretched strings shows that the energy of the D4 brane is very high, which makes the mechanism of gauge group breaking difficult. The same is true for the NS5 brane, which also turns out to be extremely massive and does not participate in the dynamics. But this is what we want.

To solve the problem and restore gauge dynamics in four dimensions, one must consider a stack of 4-branes of finite length in the $x^6$ direction. This can be achieved by placing at $x^6 = L$ a second NS5 brane parallel to the first one and at the same point in y (Figure 5). In this way, the D4 branes can stretch between the NS5 branes. If L is small enough, the gauge dynamics is restored, provided that $g_{D4}$ is also small, so that the gravitational coupling (and the couplings with the Kaluza-Klein and NS5 modes) is negligible. However, L must be larger than the $x^6$ fluctuations in order to avoid quantum corrections.

Figure 5 N = 2 four-dimensional super Yang-Mills theory, with U(n) gauge group.

What we just obtained is an N = 2 supersymmetric classical U(n) gauge theory in four dimensions, without matter, and in the Coulomb branch.

Before considering quantization, let us briefly discuss some possible generalizations. For example, matter can be realized by attaching to the left-hand side NS5 brane new D4 branes parallel to the previous ones, but extended in the $x^6$ direction from $-\infty$ to 0 (Figure 6). Considering strings stretched between long and short branes, we obtain states whose half-gauge action, associated with the end connected to the long brane, is frozen. The corresponding states thus appear in the fundamental representation and can be interpreted as matter states.

To consider the Higgs branch, one should be able to break supersymmetry by giving an expectation value to y. As mentioned above, in the present configuration this cannot happen because y is set to 0 by Dirichlet conditions. Fortunately, as we said, one can add 6-branes in the (x, y) directions. If we insert such branes to stop the long D4 branes at a large but finite value of $x^6$, say $x^6 = -M$ with $M \gg L$, then the long branes have Neumann conditions in the y directions. Thus, fluctuations of the long branes can give an expectation value to y, breaking supersymmetry, and subsequently the Higgs branch can be tuned by shifting 4-branes stretched between 6-branes (Figure 7).

Figure 6 Adding matter.
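The dependence of the four-dimensional coupling on the brane length can be made concrete numerically. What follows is only a minimal sketch, assuming the reduction in eqn [20], that is, $1/g_4^2 = L/g_{D4}^2$. The function name `four_dim_coupling` and the sample values of L are illustrative assumptions and do not come from the original construction.

```python
import math


def four_dim_coupling(g_d4: float, length: float) -> float:
    """Return g_4 = g_D4 / sqrt(L) for a D4 brane of length L along x^6.

    This is the relation read off from eqn [20]; units are left implicit.
    """
    if length <= 0:
        raise ValueError("the brane length L must be positive")
    return g_d4 / math.sqrt(length)


# Hypothetical sample lengths: the coupling shrinks as the brane gets longer.
for length in (1.0, 1e2, 1e4, 1e6):
    print(f"L = {length:g}  ->  g_4 = {four_dim_coupling(1.0, length):g}")
```

As L grows, the printed coupling tends to zero, which is the freezing of the gauge degrees of freedom described for semi-infinite branes. A finite L, as obtained with a second NS5 brane, keeps $g_4$ finite.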
Figure 7 Permitting Higgs phases.

The details require some careful inspection, but we shall stop our analysis here (Giveon and Kutasov 1999).

More general gauge configurations can be realized by adding more parallel NS5 branes, thus obtaining product groups. By adding orientifold planes, one can change gauge groups as explained in the previous section (Figure 8).

Figure 8 N = 2 four-dimensional super Yang-Mills theory with $U(n_1) \times U(n_2)$ gauge group and matter. Strings crossing the central NS5 brane give matter in the $(n_1, n_2)$ representation.

Finally, we can take a further step towards more physical models by constructing N = 1 gauge theories. For example, this can be achieved from the previous N = 2 model by rotating the second NS5 brane from the (x, v) position to the (x, w) position, where $w = (x^8, x^9)$ (Figure 9). Then a new brane projection condition appears ( L = x w R ), breaking supersymmetry down to N = 1.

Figure 9 Going down to N = 1 supersymmetry.

In this case, one could also obtain chiral matter by adding, for example, orientifold planes.

Quantum Corrections from M-Theory

Up to this point we have considered classical gauge configurations. Quantum corrections could be computed by switching on brane fluctuations. However, it is an amusing fact that, working with M-theory, one can obtain exact quantum results. As an example, let us sketch how the exact Seiberg-Witten solution can be obtained for the N = 2 model described in the previous section, in the simplest case without matter.

The full web of dualities suggests the existence of a unique unifying theory called M-theory. At low energies, M-theory appears as the strong-coupling limit of type IIA strings. In such a limit, D0 branes become the dominant objects and the corresponding states can be interpreted as Kaluza-Klein modes coming from an eleventh dimension $x^{10}$ compactified on a circle $S^1$ (Figure 10).

Figure 10 In M-theory one can think of an $S^1$ circle of radius $R_{10}$ as attached to every point of ten-dimensional spacetime.

Thus, M-theory manifests itself as an 11-dimensional supergravity. In particular, it can be shown that there can be only a unique 11-dimensional supergravity. As said, here the nonperturbative objects are two- or five-dimensional membranes.

From the M-theory point of view, the D4 branes considered in our model appear as M5 membranes wrapped on the eleventh direction $S^1$ (Figure 11). Because quantum corrections are no longer negligible, we can no longer think of these branes as stretched in the $x^6$ direction only, but v must also be considered. Thus, the M5 membranes will describe, in $\mathbb{R}^{10} \times S^1$, a region $\mathbb{R}^4 \times S$, where $\mathbb{R}^4$ are the x coordinates, and S is a Riemann surface immersed in $Q \times S^1$, Q being spanned by the $(v, x^6)$ coordinates. In fact, supersymmetry constrains the surface to be a holomorphic curve, so that to describe it, it is convenient to collect $v = (x^4, x^5)$ and $(x^6, x^{10})$ into complex coordinates $v = x^4 + i x^5$ and $s = x^6 + i x^{10}$.

Figure 11 D4 branes become M5 membranes in M-theory.

To compute quantum fluctuations, let us note that the end of a D4 brane on an NS5 brane is free to move along the v directions. A fully free end of a brane would satisfy a free wave equation. However, as $x^6$ is constrained in all directions but the v ones, it will simply satisfy a Laplace equation in two dimensions: $\nabla_v^2 x^6 = 0$. Let us solve it for a fixed NS5 brane. The solution is (at least for large values of v)

$$ x^6(v) = k \sum_{i=1}^{n_L} \log |v - v_{Li}^{(\alpha)}| - k \sum_{i=1}^{n_R} \log |v - v_{Ri}^{(\alpha)}| $$ [21]

where $n_L$ is the number of D4 branes ending on the left-hand side of the NS5 brane, at the positions $v_{Li}^{(\alpha)}$, and similarly for the R index, which refers to the right-hand side. Here $(\alpha)$ refers to the $\alpha$th NS5 brane, and k is an integration constant.

Because $x^6$ is the real part of a holomorphic field, whose imaginary part is compactified on a circle of radius $R_{10}$, we then find

$$ s(v) = R_{10} \sum_{i=1}^{n_L} \log (v - v_{Li}^{(\alpha)}) - R_{10} \sum_{i=1}^{n_R} \log (v - v_{Ri}^{(\alpha)}) $$ [22]

This describes the quantum fluctuations of the NS5 brane as seen in M-theory. In particular, because of the imaginary part of s, the ends of the D4 branes appear as vortices on the NS5 brane. In place of s, it is now convenient to introduce a new field $t := \exp(-s/R_{10})$ so that

$$ t(v) = \frac{\prod_{i=1}^{n_R} (v - v_{Ri}^{(\alpha)})}{\prod_{i=1}^{n_L} (v - v_{Li}^{(\alpha)})} $$ [23]

Before continuing, let us look again at the classical limit. In this case, a fixed value of v corresponds to the position of a D4 brane, whereas a fixed value of s corresponds to the fixed position of an NS5 brane. The classical configuration is then

$$ (s - s^{(1)}) (s - s^{(2)}) \prod_{i=1}^{n} (v - v_i) = 0 $$ [24]

Here $s^{(\alpha)}$ are the positions of the NS5 branes, and the positions $v_i$ of the D4 branes coincide for both the NS5 branes. Also, for large values of v, one has $t^{(1)} \sim v^n$ and $t^{(2)} \sim v^{-n}$.

Quantum mechanically, the configuration is determined in terms of v and t by the holomorphic curve S, which can be described as an algebraic curve F(v, t) = 0, generalizing the classical configuration. As there are two NS5 branes and n D4 branes, F must be a polynomial of degree 2 in t,

$$ F(v, t) = A_2(v) t^2 + A_1(v) t + A_0(v) $$ [25]

where $A_a$, a = 0, 1, 2, are all polynomials of degree n. Note that values of v such that $A_0$ vanishes give the solution t = 0, which corresponds to sending the right-hand side NS5 brane to infinity. Similarly, $A_2 = 0$ sends the other NS5 brane to infinity. To avoid these undesirable configurations, we can set $A_0 = A_2 = 1$. For $A_1$, we can take the most general choice, up to a possible shift in v, giving the quantum configuration

$$ t^2 + (v^n + a_{n-2} v^{n-2} + \cdots + a_1 v + a_0) t + 1 = 0 $$ [26]

This realizes a quantum-mechanical correspondence between the M5 membrane configurations described by the given polynomials and the N = 2 super Yang-Mills vacua. But this is also the claimed Seiberg-Witten curve. In particular, M-theory gives a concrete physical meaning to the Riemann surfaces supporting the Seiberg-Witten solutions.

To conclude, let us make some further comments. It is clear how the construction can be extended to more general configurations, for example, with more NS5 branes or with added matter.

Also, we have seen that the geometrical picture which branes give of gauge theories extends to the quantum level.

A similar construction can be made for the N = 1 model, which also permits a full geometrical proof of the Seiberg duality at both classical and quantum levels.

Finally, we should note that there are also other methods, which work in spacetimes where extra dimensions are compactified. There, the branes wrap around certain singular loci which contain information about gauge symmetries (Lerche 1997).

See also: AdS/CFT Correspondence; Compactification of Superstring Theory; Gauge Theories from Strings; Noncommutative Geometry from Strings; Seiberg-Witten Theory; Supergravity; Superstring Theories; Supersymmetric Particle Models.
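The curve in eqn [26] can be explored numerically. The sketch below is only an illustration under stated assumptions: it takes the curve in the form $t^2 + P(v)\,t + 1 = 0$ with $P(v) = v^n + a_{n-2} v^{n-2} + \cdots + a_0$, computes the two sheets t(v) at a given v, and locates the branch points where the sheets meet, that is, where $P(v)^2 - 4 = 0$. The function names and the sample moduli are hypothetical, and NumPy is used only for polynomial root finding.

```python
import numpy as np


def curve_polynomial(moduli, n):
    """Coefficients (highest power first) of P(v) = v^n + a_{n-2} v^{n-2} + ... + a_0.

    `moduli` is the list [a_0, a_1, ..., a_{n-2}]; the v^{n-1} term is absent.
    """
    if len(moduli) != n - 1:
        raise ValueError("expected n - 1 moduli a_0, ..., a_{n-2}")
    coeffs = np.zeros(n + 1, dtype=complex)
    coeffs[0] = 1.0  # coefficient of v^n
    for k, a_k in enumerate(moduli):
        coeffs[n - k] = a_k  # coefficient of v^k
    return coeffs


def sheets(v, coeffs):
    """The two roots t of t^2 + P(v) t + 1 = 0 at a given point v."""
    p_of_v = np.polyval(coeffs, v)
    return np.roots([1.0, p_of_v, 1.0])


def branch_points(coeffs):
    """Values of v where the two sheets coincide: P(v)^2 - 4 = 0."""
    p = np.poly1d(coeffs)
    return (p * p - 4.0).roots


# Hypothetical example: n = 2 with a_0 = 0, so P(v) = v^2.
coeffs = curve_polynomial([0.0], n=2)
t_minus, t_plus = sheets(0.7 + 0.2j, coeffs)
print("product of the two sheets:", t_minus * t_plus)
print("branch points:", branch_points(coeffs))
```

Because the constant term of the quadratic in t is 1, the product of the two sheets equals 1 at every v, and a curve with n D4 branes has 2n branch points in the v plane.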
Further Reading

Giveon A and Kutasov D (1999) Brane dynamics and gauge theory. Reviews of Modern Physics 71: 983.
Johnson CV (2003) D-Branes. Cambridge: Cambridge University Press.
Lerche W (1997) Introduction to Seiberg-Witten Theory and Its Stringy Origin. Nucl. Phys. Proc. Suppl. B 55: 83.
Polchinski J (1998) String Theory. Vol. 1: An Introduction to the Bosonic String. Cambridge: Cambridge University Press.
Polchinski J (2004) String Theory. Vol. 2: Superstring Theory and Beyond. Cambridge: Cambridge University Press.
Witten E (1997) Solutions of four-dimensional field theories via M-theory. Nuclear Physics B 500: 3.
Zwiebach B (2004) A First Course in String Theory. Cambridge: Cambridge University Press.

Brane Worlds
R Maartens, Portsmouth University, Portsmouth, UK
© 2006 Elsevier Ltd. All rights reserved.

Introduction

At high enough energies, Einstein's classical theory of general relativity breaks down, and will be superseded by a quantum gravity theory. The singularities predicted by general relativity in gravitational collapse and in the hot big bang origin of the universe are thought to be artifacts of the classical nature of Einstein's theory, which will be removed by a quantum theory of gravity. Developing a quantum theory of gravity and a unified theory of all the forces and particles of nature are the two main goals of current work in fundamental physics. The problem is that general relativity and quantum field theory cannot simply be molded together. There is as yet no generally accepted (pre-)quantum gravity theory.

The quest for a quantum gravity theory has a long and thus far not very successful history. Many different lines of attack have been developed, each having a different way of dealing with the classical singularities that arise from point particles and smooth spacetime geometry. String theory does away with zero-dimensional point particles, and particles are modeled as different states of new fundamental objects, the one-dimensional strings. It turns out, however, that there is a price to pay: the number of spacetime dimensions must be greater than four for a consistent theory. When fermions are included, which leads to superstring theory, the required number of dimensions is ten: one time and nine space dimensions.

There are in fact five distinct (1+9)-dimensional superstring theories. In the mid-1990s, duality transformations were discovered that relate these superstring theories to each other and to the (1+10)-dimensional supergravity theory. This led to the conjecture that all of these theories arise as different limits of a single theory, which has come to be known as M theory. It was also discovered that extended objects of higher dimension than strings play a fundamental role in the theory. These objects are known as branes (from membranes), and the relation between them and strings leads to a new picture of how gravity and matter may be connected in the universe. Roughly speaking, open strings describe the particles of the nongravitational sector, and their ends are attached to branes, while closed strings, which describe the graviton and associated particles of the gravitational sector, can move freely in all dimensions.

Thus, the observable universe could be a (1+3)-surface, the brane, embedded in a (1+3+d)-dimensional spacetime, the bulk, with standard-model particles and fields trapped on the brane, while gravity is free to access the bulk. Brane-world models offer a phenomenological way to test some of the novel predictions and corrections to general relativity that are implied by M theory.

Higher-Dimensional Gravity

Brane worlds can be seen as reviving the original higher-dimensional ideas of Kaluza and Klein in the 1920s, but in a new context of quantum gravity. An important consequence of extra dimensions is that the four-dimensional Planck scale $M_p \equiv M_4 = 1.2 \times 10^{19}$ GeV is no longer the fundamental energy scale of gravity. The fundamental scale is instead $M_{4+d}$. This can be seen from the modification of the gravitational potential. For an Einstein-Hilbert gravitational action,

$$ S_{\rm gravity} = \frac{1}{2\kappa_{4+d}^2} \int d^4x \, d^dy \, \sqrt{-{}^{(4+d)}g} \, \left[ {}^{(4+d)}R - 2\Lambda_{4+d} \right] $$ [1]

we have the higher-dimensional Einstein field equations,

$$ {}^{(4+d)}G_{AB} \equiv {}^{(4+d)}R_{AB} - \tfrac{1}{2} \, {}^{(4+d)}R \; {}^{(4+d)}g_{AB} = -\Lambda_{4+d} \, {}^{(4+d)}g_{AB} + \kappa_{4+d}^2 \, {}^{(4+d)}T_{AB} $$ [2]
where $x^A = (x^a, y_1, \ldots, y_d)$ and $\kappa_{4+d}^2$ is the gravitational coupling constant given by

$$ \kappa_{4+d}^2 = 8\pi G_{4+d} = \frac{8\pi}{M_{4+d}^{2+d}} $$ [3]

The static weak field limit of the field equations leads to the (4+d)-dimensional Poisson equation, whose solution is the gravitational potential

$$ V(r) \propto \frac{\kappa_{4+d}^2}{r^{1+d}} $$ [4]

In the simplest scenario, we can assume a toroidal configuration for the d extra dimensions, with each compactified on the same length scale L. Then on scales $r \lesssim L$, the potential is (4+d)-dimensional, $V \sim r^{-(1+d)}$. By contrast, on scales large relative to L, where the extra dimensions do not contribute to variations in the potential, V behaves like a four-dimensional potential, $V \sim L^{-d} r^{-1}$. This means that the usual Planck scale becomes an effective coupling constant, describing gravity on scales much larger than the extra dimensions, and related to the fundamental scale via the volume of the extra dimensions:

$$ M_p^2 \sim M_{4+d}^{2+d} L^d $$ [5]

Large Extra Dimensions

If the extra-dimensional volume is significantly above the Planck scale, then the true fundamental scale $M_{4+d}$ can be much less than the effective scale $M_p$,

$$ L^d \gg M_p^{-d} \;\Rightarrow\; M_{4+d} \ll M_p $$ [6]

In this case, we understand the weakness of gravity as due to the fact that it spreads into extra dimensions, and only a part of it is felt in four dimensions.

A lower limit on $M_{4+d}$ is given by null results in table-top experiments to test for deviations from Newton's law in four dimensions, $V \propto r^{-1}$. These experiments currently probe submillimeter scales, and find no detectable deviation, so that

$$ L \lesssim 10^{-1}\ {\rm mm} \sim (10^{-15}\ {\rm TeV})^{-1} \;\Rightarrow\; M_{4+d} \gtrsim 10^{(32-15d)/(d+2)}\ {\rm TeV} $$ [7]

Stronger bounds can be derived from null results in particle accelerators in some brane-world models, or from constraints imposed by observations of supernovae or of light-element abundance.

Brane worlds, arising in the framework of string theory, thus incorporate the possibility that the fundamental scale is much less than the Planck scale felt in four dimensions. This emerges by virtue of the large size of the extra dimensions. It is not necessary for all extra dimensions to be of equal size for this mechanism to operate. There are string theory solutions (Horava-Witten solutions) with two (1+9)-branes located at the boundaries of the bulk, at the endpoints of an $S^1/Z_2$ orbifold, that is, a circle folded on itself across a diameter. The orbifold extra dimension is the large one, whereas the other six extra dimensions on the branes are compactified on a very small scale, close to the fundamental scale, and their effect on the dynamics is felt through moduli fields, that is, five-dimensional scalar fields.

These solutions can be thought of as effectively five dimensional, with an extra dimension that can be large relative to the fundamental scale. They provide the basis for the Randall-Sundrum 1 (RS1) phenomenological models of five-dimensional gravity. The single-brane Randall-Sundrum 2 (RS2) models with infinite extra dimension arise when the orbifold radius tends to infinity. The RS models are not the only phenomenological realizations of M theory ideas. They were preceded by the brane-world models of Arkani-Hamed, Dimopoulos, and Dvali (ADD), which put forward the idea that a large volume for the compact extra dimensions would lower the effective Planck scale $M_{4+d}$. If $M_{4+d}$ is close to the electroweak scale, $M_{\rm ew}$, then this would address the long-standing hierarchy problem, that is, why there is such a large gap between $M_{\rm ew} \sim 1$ TeV and $M_p \sim 10^{16}$ TeV.

In the ADD models, more than one extra dimension is required for agreement with experiments, and there is democracy among the equivalent extra dimensions, which, in addition, are flat. By contrast, the RS models have a preferred extra dimension, with other extra dimensions treated as ignorable (i.e., stabilized except at energies near the fundamental scale). Furthermore, this extra dimension is curved or warped rather than flat: the bulk is a portion of anti-de Sitter ($AdS_5$) spacetime. The RS branes are $Z_2$-symmetric (mirror symmetry), and have a tension, which serves to counter the influence on the brane of the negative bulk cosmological constant. This also means that the self-gravity of the branes is incorporated in the RS models. The novel feature of the RS models compared to previous higher-dimensional models is that the observable three dimensions are protected from the large extra dimension (at low energies) by curvature (warping), rather than straightforward compactification.

The RS brane worlds provide phenomenological models that reflect at least some of the features of
M theory, and that bring exciting new geometric and particle physics ideas into play. The RS2 models also provide a framework for exploring holographic ideas that have emerged in M theory. Roughly speaking, holography suggests that higher-dimensional dynamics may be determined from a knowledge of the fields on a lower-dimensional boundary. The AdS/CFT correspondence is an example in which the classical dynamics of the higher-dimensional AdS gravitational field are equivalent to the quantum dynamics of a conformal field theory (CFT) on the boundary.

Kaluza-Klein Modes

The dilution of gravity via extra dimensions not only weakens gravity, it also broadens the range of graviton modes felt on the brane. The graviton is more than just the four-dimensional massless mode of four-dimensional gravity: other modes, with an effective mass on the brane, arise from the fact that the graviton is a (4+d)-dimensional massless particle. These extra modes on the brane are known as Kaluza-Klein (KK) modes of the graviton.

For simplicity, consider a flat brane with one flat extra dimension, compactified through the identification $y \leftrightarrow y + 2\pi n L$, where n = 0, 1, 2, . . . . The perturbative five-dimensional graviton is defined via

$$ {}^{(5)}\eta_{AB} \to {}^{(5)}\eta_{AB} + h_{AB} $$ [8]

where ${}^{(5)}\eta_{AB}$ is the five-dimensional Minkowski metric and $h_{AB}$ is a small transverse traceless perturbation. Its amplitude can be Fourier expanded as

$$ h(x^a, y) = \sum_n e^{iny/L} \, h_n(x^a) $$ [9]

where $h_n$ are the amplitudes of the KK modes, that is, the effective four-dimensional modes of the five-dimensional graviton. To see that these KK modes are massive from the brane viewpoint, we start from the five-dimensional wave equation that the massless five-dimensional field h satisfies (in a suitable gauge):

$$ {}^{(5)}\Box h = 0 \;\Rightarrow\; \Box h + \partial_y^2 h = 0 $$ [10]

It follows that the KK modes satisfy a four-dimensional Klein-Gordon equation with an effective four-dimensional mass, $m_n$:

$$ \Box h_n = m_n^2 h_n, \qquad m_n = \frac{n}{L} $$ [11]

The massless mode, $h_0$, is the usual four-dimensional graviton mode. But there is a tower of massive modes, $L^{-1}, 2L^{-1}, \ldots$, which imprint the effect of the five-dimensional gravitational field on the four-dimensional brane. Compactness of the extra dimension leads to discreteness of the spectrum. For an infinite extra dimension, $L \to \infty$, the separation between the modes disappears and the tower forms a continuous spectrum.

Randall-Sundrum Brane Worlds

RS brane worlds do not rely on compactification to localize gravity at the brane, but on the curvature of the bulk. What prevents gravity from leaking into the extra dimension at low energies is a negative bulk cosmological constant,

$$ \Lambda_5 = -\frac{6}{\ell^2} = -6\mu^2 $$ [12]

where $\ell$ is the curvature radius of $AdS_5$ and $\mu$ is the corresponding energy scale. The bulk cosmological constant with its repulsive gravity effect acts to squeeze the gravitational potential closer to the brane. We can see this clearly in Gaussian normal coordinates $x^A = (x^\mu, y)$ based on the brane at y = 0, for which the metric takes the form

$$ {}^{(5)}ds^2 = dy^2 + e^{-2|y|/\ell} \, \eta_{\mu\nu} \, dx^\mu dx^\nu $$ [13]

with $\eta_{\mu\nu}$ the Minkowski metric. The exponential warp factor reflects the confining role of the bulk cosmological constant. The $Z_2$-symmetry about the brane at y = 0 is incorporated via the |y| term. In the bulk, this metric is a solution of the five-dimensional Einstein equations,

$$ {}^{(5)}G_{AB} = -\Lambda_5 \, {}^{(5)}g_{AB} $$ [14]

that is, ${}^{(5)}T_{AB} = 0$ in eqn [2]. The brane is a flat Minkowski spacetime, $g_{AB}(x^\alpha, 0) = \eta_{\mu\nu} \delta^\mu_A \delta^\nu_B$, with self-gravity in the form of brane tension.

The two RS models are distinguished as follows:

RS1 There are two branes in RS1, at y = 0 and y = L, with $Z_2$-symmetry identifications

$$ y \leftrightarrow -y, \qquad y + L \leftrightarrow L - y $$ [15]

The branes have equal and opposite tensions, $\pm\lambda$, where

$$ \lambda = \frac{3 M_p^2}{4\pi \ell^2} $$ [16]

The positive-tension TeV brane has fundamental scale $M_5 \sim 1$ TeV. Because of the exponential
warping factor, the effective scale on the negative tension Planck brane at y = L is $M_p$. On the positive tension brane,

$$ M_p^2 = M_5^3 \, \ell \left[ 1 - e^{-2L/\ell} \right] $$ [17]

So RS1 gives a new approach to the hierarchy problem. Because of the finite separation between the branes, the KK spectrum is discrete.

RS2 In RS2, there is only one, positive-tension, brane. This may be thought of as arising from sending the negative tension brane off to infinity, $L \to \infty$. Then the energy scales are related via

$$ M_5^3 = \frac{M_p^2}{\ell} $$ [18]

On the RS2 brane, the negative $\Lambda_5$ is offset by the positive brane tension $\lambda$. The fine-tuning in eqn [16] ensures that there is zero effective cosmological constant on the brane, so that the brane has the induced geometry of Minkowski spacetime. To see how gravity is localized at low energies, we consider the five-dimensional graviton perturbations of the metric:

$$ {}^{(5)}g_{AB} \to {}^{(5)}g_{AB} + h_{AB}, \qquad h_{Ay} = 0 = h^\mu{}_\mu = \partial_\nu h^{\mu\nu} $$ [19]

We split the amplitude h into three-dimensional Fourier modes, and the linearized five-dimensional Einstein equations lead to the wave equation (y > 0)

$$ e^{2y/\ell} \left[ \ddot{h} + k^2 h \right] = h'' - \frac{4}{\ell} h' $$ [20]

Separability means we can write

$$ h(t, y) = \sum_m \varphi_m(t) \, h_m(y) $$ [21]

and the wave equation reduces to

$$ \ddot{\varphi}_m + (m^2 + k^2) \varphi_m = 0 $$ [22]

$$ h_m'' - \frac{4}{\ell} h_m' + e^{2y/\ell} m^2 h_m = 0 $$ [23]

The zero-mode solution is

$$ \varphi_0(t) = A_0^+ e^{+ikt} + A_0^- e^{-ikt} $$ [24]

$$ h_0(y) = B_0 + C_0 e^{4y/\ell} $$ [25]

and the massive KK mode (m > 0) solutions are

$$ \varphi_m(t) = A_m^+ \exp\left( +i\sqrt{m^2 + k^2}\, t \right) + A_m^- \exp\left( -i\sqrt{m^2 + k^2}\, t \right) $$ [26]

$$ h_m(y) = e^{2y/\ell} \left[ B_m J_2\left( m\ell e^{y/\ell} \right) + C_m Y_2\left( m\ell e^{y/\ell} \right) \right] $$ [27]

where $J_2$, $Y_2$ are Bessel functions.

The boundary condition for the perturbations is $h'(t, 0) = 0$, which implies

$$ C_0 = 0, \qquad C_m = -\frac{J_1(m\ell)}{Y_1(m\ell)} B_m $$ [28]

In the RS1 model, we have a further boundary condition, $h'(t, L) = 0$, which leads to a discrete eigenspectrum, namely the masses m that satisfy

$$ J_1\left( m\ell e^{L/\ell} \right) Y_1(m\ell) - Y_1\left( m\ell e^{L/\ell} \right) J_1(m\ell) = 0 $$ [29]

The zero mode is normalizable, since

$$ \left| \int_0^\infty B_0 \, e^{-2y/\ell} \, dy \right| < \infty $$ [30]

Its contribution to the gravitational potential $V = \tfrac{1}{2} h_{00}$ gives the four-dimensional result, $V \propto r^{-1}$. The contribution of the massive KK modes sums to a correction of the four-dimensional potential. For $r \ll \ell$, one obtains

$$ V(r) \approx \frac{GM\ell}{r^2} $$ [31]

which simply reflects the fact that the potential becomes truly five dimensional on small scales. For $r \gg \ell$,

$$ V(r) \approx \frac{GM}{r} \left( 1 + \frac{2\ell^2}{3r^2} \right) $$ [32]

which gives the small correction to four-dimensional gravity at low energies from extra-dimensional effects.

Cosmological Brane Worlds

The RS models contain vacuum (Minkowski) branes. In order to pursue brane-world ideas in cosmology, we need to generalize the RS models to incorporate cosmological branes with matter and radiation on them. The effective field equations on the brane are the vehicle for brane-bound observers to interpret cosmological dynamics. They arise from projecting the five-dimensional field equations onto the brane, via the Gauss-Codazzi equations. These equations involve also the extrinsic curvature $K_{\mu\nu}$ of the brane, which determines how the brane is imbedded in the bulk.

The stress-energy on the brane (tension, matter, radiation) means that there is a jump in $K_{\mu\nu}$ across
the brane. More precisely, the junction conditions across the brane are

$$ g^+_{\mu\nu} - g^-_{\mu\nu} = 0 $$ [33]

$$ K^+_{\mu\nu} - K^-_{\mu\nu} = -\kappa_5^2 \left[ T^{\rm brane}_{\mu\nu} - \tfrac{1}{3} T^{\rm brane} g_{\mu\nu} \right] $$ [34]

where

$$ T^{\rm brane}_{\mu\nu} = T_{\mu\nu} - \lambda g_{\mu\nu} $$ [35]

is the total energy-momentum tensor on the brane and $T^{\rm brane} = g^{\mu\nu} T^{\rm brane}_{\mu\nu}$. The $Z_2$-symmetry means that when approaching the brane from one side and going through it, one emerges into a bulk that looks the same, but with the normal reversed. This implies that

$$ K^-_{\mu\nu} = -K^+_{\mu\nu} $$ [36]

so that we can use the junction condition (eqn [34]) to determine the extrinsic curvature:

$$ K_{\mu\nu} = -\tfrac{1}{2} \kappa_5^2 \left[ T_{\mu\nu} + \tfrac{1}{3} (\lambda - T) g_{\mu\nu} \right] $$ [37]

where $T = T^\mu{}_\mu$, we have dropped the (+) and we evaluate quantities on the brane by taking the limit $y \to +0$.

Together with the Gauss-Codazzi equations, eqn [37] leads to the induced field equations on the brane:

$$ G_{\mu\nu} = -\Lambda g_{\mu\nu} + \kappa^2 T_{\mu\nu} + 6 \frac{\kappa^2}{\lambda} S_{\mu\nu} - E_{\mu\nu} $$ [38]

where

$$ \kappa^2 \equiv \kappa_4^2 = \tfrac{1}{6} \lambda \kappa_5^4 $$ [39]

$$ \Lambda \equiv \Lambda_4 = \tfrac{1}{2} \left[ \Lambda_5 + \kappa^2 \lambda \right] $$ [40]

$$ S_{\mu\nu} = \tfrac{1}{12} T T_{\mu\nu} - \tfrac{1}{4} T_{\mu\alpha} T^\alpha{}_\nu + \tfrac{1}{24} g_{\mu\nu} \left[ 3 T_{\alpha\beta} T^{\alpha\beta} - T^2 \right] $$ [41]

and

$$ E_{\mu\nu} = {}^{(5)}C_{ACBD} \, n^C n^D g_\mu{}^A g_\nu{}^B $$ [42]

where $n^A$ is the unit normal to the brane and ${}^{(5)}C_{ACBD}$ is the Weyl tensor in the bulk.

The induced field equations [38] show two key modifications to the standard four-dimensional Einstein field equations arising from extra-dimensional effects.

$S_{\mu\nu} \sim (T_{\mu\nu})^2$ is the high-energy correction term, which is negligible for $\rho \ll \lambda$, but dominant for $\rho \gg \lambda$ (where $\rho$ is the energy density):

$$ \frac{|\kappa^2 S_{\mu\nu}/\lambda|}{|\kappa^2 T_{\mu\nu}|} \sim \frac{|T_{\mu\nu}|}{\lambda} \sim \frac{\rho}{\lambda} $$ [43]

$E_{\mu\nu}$, the projection of the bulk Weyl tensor on the brane, encodes corrections from KK or five-dimensional graviton effects. From the brane-observer viewpoint, the energy-momentum corrections in $S_{\mu\nu}$ are local, whereas the KK corrections in $E_{\mu\nu}$ are nonlocal, since they incorporate five-dimensional gravity wave modes. These nonlocal corrections cannot be determined purely from data on the brane. In the perturbative analysis of RS2 which leads to the corrections in the gravitational potential, eqn [32], the KK modes that generate this correction are responsible for a nonzero $E_{\mu\nu}$; this term is what carries the modification to the weak-field field equations.

The effective field equations are not a closed system. One needs to supplement them by five-dimensional equations governing $E_{\mu\nu}$, which are obtained from the five-dimensional Einstein equations.

Cosmological Dynamics

A (1+4)-dimensional spacetime with spatial 4-isotropy (four-dimensional spherical/plane/hyperbolic symmetry) has a natural splitting into hypersurfaces of symmetry, which are (1+3)-dimensional surfaces with 3-isotropy and 3-homogeneity, that is, Friedmann-Robertson-Walker (FRW) surfaces. In particular, the $AdS_5$ bulk of the RS2 brane world, which admits a foliation into Minkowski surfaces, also admits an FRW foliation since it is 4-isotropic. The generalization of $AdS_5$ that preserves 4-isotropy and solves the five-dimensional Einstein equation is Schwarzschild-$AdS_5$, and this bulk therefore admits an FRW foliation. It follows that an FRW cosmological brane world can be embedded in Schwarzschild-$AdS_5$ spacetime.

The black hole in the bulk is felt on the brane via the $E_{\mu\nu}$ term. The bulk black hole gives rise to dark radiation on the brane via its Coulomb effect. The FRW brane can be thought of as moving radially along the fifth dimension, with the junction conditions determining the velocity via the Friedmann equation. Thus, one can interpret the expansion of the universe as motion of the brane through the static bulk. In the special case of no black hole and no brane motion, the brane is empty and has Minkowski geometry, that is, the original RS2 brane world is recovered, in different coordinates.

An intriguing aspect of the cosmological metric is that five-dimensional gravitational wave signals can take shortcuts through the bulk in traveling
between points A and B on the brane. The travel time for such a graviton signal is less than the time taken for a photon signal (which is stuck to the brane) from A to B.

Cosmological dynamics on the brane are governed by the modified Friedmann equation:

$$ H^2 = \frac{\kappa^2}{3} \rho \left( 1 + \frac{\rho}{2\lambda} \right) + \frac{m}{a^4} + \frac{\Lambda}{3} - \frac{K}{a^2} $$ [44]

where $H = \dot{a}/a$ is the Hubble expansion rate, a(t) is the scale factor, K is the curvature index, and m is the mass of the bulk black hole.

The $\rho^2/\lambda$ term is the high-energy term. When $\rho \gg \lambda$, in the early universe, then $H^2 \propto \rho^2$. This means that a given energy density produces a greater rate of expansion than it would in standard four-dimensional gravity. As a consequence, inflation in the early universe is modified in interesting ways, some of which may leave a signature in cosmological observations.

The $m/a^4$ term in eqn [44] is the dark radiation, so called because it redshifts with expansion like ordinary radiation. But, unlike ordinary radiation, it is not a form of detectable matter, but the imprint on the brane of the gravitational field in the bulk (the Coulomb effect of the bulk black hole). This additional effective relativistic degree of freedom is constrained by nucleosynthesis in the early universe. Any extra radiative energy not thermally coupled to radiation affects the rate of production of light elements, and observed abundances place tight constraints on such extra energy. The dark radiation can be no more than 3% of the radiation energy density at nucleosynthesis:

$$ \frac{3m}{\kappa^2 \rho_{\rm nuc}} \lesssim 0.03 $$ [45]

The other modification to the Hubble rate is via the high-energy correction $\rho/\lambda$. In order to recover the observational successes of general relativity, the high-energy regime where significant deviations occur must take place before nucleosynthesis, that is, cosmological observations impose the lower limit

$$ \lambda > (1\ {\rm MeV})^4 \;\Rightarrow\; M_5 > 10^4\ {\rm GeV} $$ [46]

This is much weaker than the limit imposed by table-top experiments, which limit the curvature radius to $\ell \lesssim 0.2$ mm, leading to

$$ \lambda > (100\ {\rm GeV})^4 \;\Rightarrow\; M_5 > 10^8\ {\rm GeV} $$ [47]

The high-energy regime during radiation domination is short-lived. Since $\rho^2/\lambda$ decays as $a^{-8}$ during the radiation era, it will rapidly drop below one, and the universe will enter the low-energy four-dimensional regime. However, traces of the high-energy era may be left in the perturbation spectra that leave an imprint in the cosmic microwave background radiation.

In conclusion, simple brane-world models of RS2 type provide a rich phenomenology for exploring some of the ideas that are emerging from M theory. The higher-dimensional degrees of freedom for the gravitational field, and the confinement of standard model fields to the visible brane, lead to a complex but fascinating interplay between gravity, particle physics, and geometry, which enlarges and enriches general relativity in the direction of a quantum gravity theory. High-precision astronomical data mean that cosmology is a potential laboratory for testing and constraining these brane worlds. The models predict extra-dimensional signatures in the cosmic microwave background and other observations, and these predictions can in principle be tested against data.

See also: String Theory: Phenomenology; Supergravity; Superstring Theories.

Further Reading

Brax P and van de Bruck C (2003) Cosmology and brane worlds: a review. Classical and Quantum Gravity 20: R201 (arXiv:hep-th/0303095).
Cavaglia M (2003) Black hole and brane production in TeV gravity: a review. International Journal of Modern Physics A18: 1843 (arXiv:hep-ph/0210296).
Langlois D (2003) Cosmology in a brane-universe. Astrophysics and Space Science 283: 469 (arXiv:astro-ph/0301022).
Maartens R (2004) Brane-world gravity. Living Reviews in Relativity 7: 7 (arXiv:gr-qc/0312059).
Quevedo F (2002) Lectures on string/brane cosmology. Classical and Quantum Gravity 19: 5721 (arXiv:hep-th/0210292).
Rubakov V (2001) Large and infinite extra dimensions. Physics-Uspekhi 44: 871 (arXiv:hep-ph/0104152).
Wands D (2002) String-inspired cosmology. Classical and Quantum Gravity 19: 3403 (arXiv:hep-th/0203107).
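The modified Friedmann equation, eqn [44] of the Brane Worlds article, lends itself to a short numerical illustration. The following is a minimal sketch, assuming the form $H^2 = (\kappa^2/3)\rho(1 + \rho/2\lambda) + m/a^4 + \Lambda/3 - K/a^2$ in units where all quantities are dimensionless. The function names and the sample densities are hypothetical and are chosen only to show the low-energy and high-energy regimes.

```python
def hubble_squared(rho, lam, kappa2, m=0.0, a=1.0,
                   cosmological_constant=0.0, curvature=0.0):
    """Right-hand side of the modified Friedmann equation (eqn [44]).

    rho: energy density, lam: brane tension, kappa2: kappa^2,
    m: bulk black hole mass parameter, a: scale factor,
    cosmological_constant: Lambda, curvature: curvature index K.
    """
    matter_term = (kappa2 / 3.0) * rho * (1.0 + rho / (2.0 * lam))
    dark_radiation = m / a**4
    return (matter_term + dark_radiation
            + cosmological_constant / 3.0 - curvature / a**2)


def high_energy_enhancement(rho, lam):
    """Factor by which H^2 exceeds the standard value when m = Lambda = K = 0."""
    return 1.0 + rho / (2.0 * lam)


# Hypothetical densities in units of the brane tension (lam = 1).
for rho in (1e-6, 1.0, 1e3):
    print(f"rho/lambda = {rho:g}: "
          f"H^2 enhanced by {high_energy_enhancement(rho, 1.0):g}")
```

For $\rho \ll \lambda$ the enhancement factor is close to one and standard four-dimensional behavior is recovered, while for $\rho \gg \lambda$ the factor grows linearly with $\rho$, which is the $H^2 \propto \rho^2$ regime.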
Branes and Black Hole Statistical Mechanics


S R Das, University of Kentucky, Lexington, KY, USA
© 2006 Elsevier Ltd. All rights reserved.

Introduction

In classical general relativity, a black hole is a solution of Einstein's equations with a region of spacetime which is causally disconnected from the asymptotic region at infinity. The boundary of such a region is called the event horizon. The spacetime around the simplest black hole in three space dimensions is described by the Schwarzschild metric

$$ds^2 = -\left(1 - \frac{2GM}{rc^2}\right) dt^2 + \left(1 - \frac{2GM}{rc^2}\right)^{-1} dr^2 + r^2\, d\Omega^2 \qquad [1]$$

where $G$ is Newton's gravitational constant, $c$ is the velocity of light, and we have used spherical coordinates with $d\Omega^2$ the line element on an $S^2$. A nonrotating, uncharged star which is too massive to form a neutron star will eventually collapse, and at late times the metric will be given by [1]. The horizon is a null surface $S^2 \times t$ and the radius of the $S^2$ is $r_{\rm horizon} = 2GM/c^2$. The Schwarzschild solution has generalizations to black holes with charge and angular momentum, and no-hair theorems guarantee that a black hole has no other characteristic property. All these solutions can be generalized to other theories like supergravity in various dimensions.

In 1974, Hawking showed that due to pair production of particles near the horizon, black holes radiate thermally. Hawking's calculation is valid for black holes whose masses are much larger than the Planck mass: for such black holes, the curvature at the horizon is weak and normal semiclassical quantization is valid. Remarkably, the properties of Hawking radiation are quite universal. A black hole can be characterized by an entropy called the Bekenstein-Hawking entropy. The leading result for the entropy $S_{BH}$ for all black holes in any theory with the standard Einstein-Hilbert action is given by

$$S_{BH} = \frac{A_H}{4G} \qquad [2]$$

where $A_H$ denotes the area of the horizon. The temperature $T_H$ is given by

$$T_H = \frac{\kappa}{2\pi} \qquad [3]$$

where $\kappa$ is the surface gravity at the horizon. The principle of detailed balance further ensures that the radiation rate of some species of particle $i$, $\Gamma_i(k)$, in some given momentum range $(k, k + dk)$ is related to the corresponding absorption cross section $\sigma_i(k)$ by

$$\Gamma_i(k) = \frac{\sigma_i(k)}{e^{\omega/T_H} \pm 1}\, \frac{d^d k}{(2\pi)^d} \qquad [4]$$

where $\omega$ is the energy and $d$ denotes the number of spatial dimensions. The $\pm$ sign refers to fermions (bosons), respectively. A nontrivial $k$ dependence of $\sigma_i$ signifies a departure from black-body behavior. Consequently, $\sigma_i(k)$ is often called a grey-body factor. Equations [2] and [3] may be derived by combining Hawking's calculation of the radiation with standard thermodynamic relations. Alternatively, they follow from the leading semiclassical approximations of path-integral formulations of Euclidean gravity based on the standard Einstein-Hilbert action. For an account of black-hole thermodynamics, see Wald (1994).

Unlike usual thermodynamic systems, black holes appear to pose a deep puzzle. In usual systems, thermodynamics is a coarse-grained description of a system which is in a highly degenerate state. Typically, such systems are described in terms of a few macroscopic parameters such as the total energy, the total volume, and the total charge. For each set of values of these macroscopic parameters, there are a large number of microscopic states which can be described in terms of the constituents such as atoms or molecules. This degeneracy manifests itself as an entropy $S$ which is related to the number of microscopic states for a given set of values of the macroscopic parameters, $\Omega$, by Boltzmann's relation

$$S = \log \Omega \qquad [5]$$

where units have been chosen such that the Boltzmann constant is unity. For a black hole, the macrostates are specified by its mass, charge, and angular momentum. No-hair theorems, however, seem to suggest that there are no other properties and hence no obvious candidate for microstates. In the absence of such a statistical basis, one would be inevitably led to the conclusion that there is loss of information in processes involving black holes.

In a consistent quantum theory of gravity, there would be such a statistical basis since quantum mechanics is unitary. String theory is a strong candidate for a unified theory which contains gravity. Indeed, string theory provides a microscopic description for a class of black holes.
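As a worked illustration of eqns [2] and [3], the following sketch evaluates the Bekenstein-Hawking entropy and Hawking temperature for the Schwarzschild solution [1]. It assumes units with $\hbar = c = 1$ (so that $r_{\rm horizon} = 2GM$) and uses the standard Schwarzschild surface gravity $\kappa = 1/(4GM)$, which is quoted here as an input rather than derived in the text above.

```latex
\begin{align}
A_H &= 4\pi r_{\rm horizon}^2 = 16\pi G^2 M^2, \\
S_{BH} &= \frac{A_H}{4G} = 4\pi G M^2, \\
T_H &= \frac{\kappa}{2\pi} = \frac{1}{8\pi G M}, \\
\frac{dS_{BH}}{dM} &= 8\pi G M = \frac{1}{T_H}.
\end{align}
```

The last line is the first law $dM = T_H\, dS_{BH}$, which serves as a consistency check between [2] and [3]: the entropy grows as the square of the mass while the temperature falls as its inverse.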
Black Hole Solutions in String Theory

Perturbatively, the basic excitations of string theory are fundamental closed and open strings characterized by a string tension $T_s$ and hence a length scale, the string length $l_s = 1/\sqrt{2\pi T_s}$. Consistency requires that the string should be able to propagate in ten spacetime dimensions and should be supersymmetric at the fundamental level. Formulated in this fashion, there are several consistent string theories: type IIA, type IIB, and heterotic string theory (which contain only closed strings perturbatively) and type I theory (which contains both open and closed strings).

At energies much smaller than $1/l_s$, only the massless modes of the string can be excited. For all these string theories, the massless spectrum of closed strings contains the graviton and the low-energy dynamics is given by the appropriate supersymmetric generalization of general relativity, supergravity. In addition, the closed-string spectrum contains a neutral scalar field, the dilaton $\phi$, whose expectation value gives rise to a dimensionless parameter governing interactions, called the string coupling $g_s$:

$$g_s = e^{\langle \phi \rangle} \qquad [6]$$

The ten-dimensional gravitational constant is given by

$$G_{10} = 8\pi^6 g_s^2 l_s^8 \qquad [7]$$

Ten-dimensional supergravity has a wide variety of black hole solutions, the simplest of which is the straightforward generalization of the Schwarzschild solution.

Black p-Brane Solutions

More significantly, there are solutions which are charged with respect to the various gauge fields that appear in the supergravity spectrum. Generically, these charged solutions represent extended objects. For accounts of such solutions, see Maldacena (1996).

Consider, for example, the supergravity which follows from type IIB string theory. This theory has a pair of 2-form gauge fields $B_{MN}$ and $B'_{MN}$ and a 4-form gauge field $A_{MNPQ}$ with a self-dual field strength. Just as an ordinary point electric charge produces a 1-form gauge field, a $(p+1)$-form gauge field may be sourced by an electrically charged $p$-dimensional extended object. The corresponding field strength is a $(p+2)$-form, whose Hodge dual in $d$ spacetime dimensions is a $(d-p-2)$-form. This shows that there should be magnetically charged $(d-p-4)$-dimensional extended objects as well. These extended objects are called branes.

In the type IIB example, there should be two kinds of one-dimensional extended objects which carry electric charge under $B_{MN}$, $B'_{MN}$, called the F-string and the D-string, respectively. There are also two kinds of five-dimensional branes which carry magnetic charges under $B_{MN}$, $B'_{MN}$, called the NS 5-brane and D5 brane, respectively. Finally, there should be a 3-brane, since the corresponding 5-form field strength is self-dual, as well as a D7 brane. A similar catalog can be prepared for other string theories, as well as for 11-dimensional supergravity, which is the low-energy limit of M-theory.

The classical solutions for a set of $p$-branes of the same kind generally have inner and outer horizons which have the topology $t \times S^{8-p} \times R^p$. The outer horizon is then associated with a Hawking temperature and a Bekenstein-Hawking entropy. Of particular interest are extremal limits. In this limit, the inner and outer horizons coincide and the mass density is simply proportional to the charge. Given some charge, the extremal solution has the lowest energy. Extremal limits are interesting because in supergravity these correspond to solutions in which some of the supersymmetries (in this case, half of the supersymmetries) are retained; such solutions are called Bogomolny-Prasad-Sommerfield (BPS) saturated solutions. The charge in question appears as a central charge in the extended supersymmetry algebra. This fact may be used to show that such BPS solutions are absolutely stable. Indeed, for the particular solution considered here, the Hawking temperature $T_H \to 0$, so that there is no Hawking radiation, as required by stability. Furthermore, the entropy $S_{BH} \to 0$. The horizon shrinks to a point which appears as a naked null singularity.

Not all of the ten dimensions of string theory need be noncompact. In fact, to describe the real world, one must have a solution of string theory in which six of the dimensions are wrapped up and form a compact space. In principle, however, one can compactify any number of dimensions. In the above example of a $p$-brane, it is trivial to compactify the directions along which the brane is extended to a $p$-dimensional torus, $T^p$, which can be chosen to be a product of $p$ circles each of radius $R$. At length scales much larger than $R$, the theory then becomes a $(10-p)$-dimensional theory. The $p$-brane appears as a black hole with a spherical horizon and, since the original p-form gauge field now behaves as an ordinary 1-form gauge field with a nonzero time component, this is an electrically charged black hole.
D1-D5-N System and Five-Dimensional Black Holes

For reasons which will become clear in the next section, it is useful to get extremal black holes with large horizon areas, so that Hawking's semiclassical formulas are valid. It turns out that such solutions involve branes of various types which intersect each other and are suitably wrapped on compact internal spaces. Such black holes then necessarily have different kinds of charges. It turns out that the simplest case is a five-dimensional black hole with three kinds of charges, which is obtained by brane systems wrapped on a compact five-dimensional space. An example is a type IIB solution which has D5 branes which are wrapped on either $T^4 \times S^1$ or $K3 \times S^1$, together with D1 branes wrapped on the $S^1$ as well as some momentum along the $S^1$. From the noncompact five-dimensional point of view, this is a black hole with three kinds of gauge charges: the D5 charge $Q_5$, the D1 charge $Q_1$, and a Kaluza-Klein charge $N$ coming from the momentum $P = N/R$ along the circle of radius $R$.

When the internal space is $T^4 \times S^1$ the five-dimensional Einstein frame metric is given by

$$ds^2 = -[f(r)]^{-2/3}\left(1 - \frac{r_0^2}{r^2}\right) dt^2 + [f(r)]^{1/3}\left[\frac{dr^2}{1 - r_0^2/r^2} + r^2\, d\Omega_3^2\right] \qquad [8]$$

where

$$f(r) = \left(1 + \frac{r_0^2 \sinh^2 \alpha_1}{r^2}\right)\left(1 + \frac{r_0^2 \sinh^2 \alpha_5}{r^2}\right)\left(1 + \frac{r_0^2 \sinh^2 \sigma}{r^2}\right) \qquad [9]$$

and the three charges are

$$Q_1 = \frac{V r_0^2 \sinh 2\alpha_1}{32\pi^4 g_s l_s^6}, \qquad Q_5 = \frac{r_0^2 \sinh 2\alpha_5}{2 g_s l_s^2}, \qquad N = \frac{V R^2}{32\pi^4 l_s^8 g_s^2}\, r_0^2 \sinh 2\sigma \qquad [10]$$

where $V$ is the volume of the $T^4$ and $R$ is the radius of the circle $S^1$.

The ADM mass of the black hole is

$$M_{ADM} = \frac{R V r_0^2}{32\pi^4 g_s^2 l_s^8}\left[\cosh 2\alpha_1 + \cosh 2\alpha_5 + \cosh 2\sigma\right] \qquad [11]$$

The Bekenstein-Hawking entropy is given by

$$S_{BH} = \frac{R V r_0^3}{8\pi^3 l_s^8 g_s^2} \cosh \alpha_1 \cosh \alpha_5 \cosh \sigma \qquad [12]$$

while the Hawking temperature is

$$T_H = \frac{1}{2\pi r_0 \cosh \alpha_1 \cosh \alpha_5 \cosh \sigma} \qquad [13]$$

The extremal limit of this solution is given by

$$r_0 \to 0; \qquad \alpha_1, \alpha_5, \sigma \to \infty; \qquad Q_1, Q_5, N\ \text{fixed} \qquad [14]$$

The extremal solution is a BPS saturated state and retains four of the original supersymmetries. In this limit, the inner and outer horizons coincide. However, the horizon is now a smooth $S^3$ with a finite area in the Einstein frame metric. Consequently, the extremal Bekenstein-Hawking entropy is also finite and may be seen to be

$$S_{BH,\,{\rm extremal}}^{\rm 3\text{-}charge} = 2\pi\sqrt{Q_1 Q_5 N} \qquad [15]$$

The temperature, however, is zero in this limit, which is consistent with the stability of a BPS saturated state.

The above five-dimensional black hole is in fact a generalization of the Reissner-Nordström black hole. Similar solutions with large horizon areas in the extremal limit can be constructed in four dimensions. One such construction is in the IIB theory wrapped on $T^6$ in which there are four sets of D3 branes which wrap four different $T^3$'s contained in the $T^6$. Black holes with lower supersymmetry may be obtained by replacing the $T^6$ by a Calabi-Yau space.

Duality and Branes

String theory has a rich set of symmetries called duality symmetries which relate different kinds of string theories that are suitably compactified. These symmetries relate different classical solutions. For example, application of these symmetries relates the five-dimensional black holes above to other five-dimensional black holes with different kinds of charges. Furthermore, at the level of supergravity, these various theories may be derived from an as yet unknown 11-dimensional theory called M-theory, whose low-energy limit is 11-dimensional supergravity.

Branes in String Theory

For a given string theory, the perturbative spectrum consists of strings. However, at the nonperturbative
level, there are, in addition, extended objects of other dimensionalities. Duality symmetries imply that these extended objects are as fundamental as the strings themselves. Such extended objects are also called branes. For an exhaustive account of branes in string theory, see Johnson (2003).

Like their counterparts in supergravity, branes in string theory are typically charged with respect to some gauge fields. While supergravity solutions are possible with any value of the charge, in string theory the brane charges have to be quantized. Multiple units of the minimum quantum of charge can appear as collections of branes each with unit charge or, alternatively, branes which wrap around compact cycles in space a multiple number of times.

D-Branes

The extended objects in string theory are described in terms of their collective excitations. These are best understood for the class of branes called D-branes in the type II theory, discovered by Polchinski. These are D1, D3, D5, and D7 branes in type IIB and D0, D2, D4, and D6 branes in type IIA theory. Dp branes are characterized by the fact that they couple to, and act as sources for, $(p+1)$-form gauge fields which belong to the Ramond-Ramond sector of the theory. Collective excitations of a $p$-dimensional extended object in field theory are expected to be described by waves on its $(p+1)$-dimensional world volume. The collective coordinate action would be a quantum field theory which has vectors, corresponding to longitudinal oscillations of the brane, and scalars which correspond to transverse oscillations. For D-branes in string theory, the theory of collective excitations is a string field theory of open strings whose endpoints lie on the brane. (This is the origin of the nomenclature D-brane: an open string whose ends are constrained to lie on the brane has a world-sheet description in which the bosonic fields corresponding to transverse target space coordinates have Dirichlet boundary conditions.) The lowest-energy states of open superstrings are ordinary massless gauge fields and their supersymmetric partners so that the low-energy limit of the string field theory is a supersymmetric gauge theory.

The fact that the underlying theory is a string theory has an important consequence. For a system of $N$ parallel D-branes of the same type, one would have open strings which join different branes as well as the same brane. The low-energy theory then becomes a supersymmetric nonabelian gauge theory with gauge group $U(N)$. In a suitable gauge, the off-diagonal gauge fields and their supersymmetric partners (which include scalar fields in the adjoint representation) are the low-energy degrees of freedom of open strings which connect different branes.

The mass density or tension $T_p$ of a single Dp brane is given by

$$T_p = \frac{1}{g_s (2\pi)^p l_s^{p+1}} \qquad [16]$$

This couples to the $(p+1)$-form gauge field with a charge

$$\mu_p = g_s T_p \qquad [17]$$

and the Yang-Mills coupling constant for the collective theory on the brane world volume is given by

$$g_{YM,\,Dp}^2 = (2\pi)^{p-2} g_s l_s^{p-3} \qquad [18]$$

The ground state of a single Dp brane is a BPS state which preserves 16 of the 32 supersymmetries of the original theory. One consequence of this is that two or more parallel Dp branes of the same type form a threshold bound state preserving the same supersymmetries, with no net force between them. As a result, the tension of $N$ parallel Dp branes is simply $N T_p$.

Branes of different dimensionalities can also form bound states. Of particular interest are configurations which can form threshold bound states which preserve some supersymmetries. For example, a set of $N_1$ parallel Dp branes can form a threshold bound state with a set of $N_2$ parallel D$(4+p)$ branes with all the $p$-branes lying entirely along the $(4+p)$-branes. This configuration is also a BPS saturated state preserving eight of the original supersymmetries and would have charges under both $(p+1)$-form and $(p+5)$-form gauge potentials. The BPS nature ensures that the total mass density is the sum of the individual mass densities.

NS Branes

The other extended objects in string theory are called NS branes since they couple to $p$-form gauge fields which arise from the Neveu-Schwarz/Neveu-Schwarz sector of the world-sheet theory. These are present in all five string theories and appear in two types. The first is a macroscopic fundamental string which may be wound around a compact direction. The second is called a solitonic 5-brane. While the collective dynamics of a fundamental string is the standard world-sheet description of string theory, the description for the NS 5-brane is rather complicated and not known in full detail. The rest of this article deals exclusively with D-branes.
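The D-brane formulas [16]-[18] are easy to evaluate numerically. The sketch below is a minimal illustration, with hypothetical function names and the string length set to $l_s = 1$ by default (an arbitrary choice of units). It also exposes a purely algebraic consequence of [16] and [18]: the product $T_p\, g_{YM}^2 = 1/((2\pi)^2 l_s^4)$ does not depend on $p$ or on $g_s$.

```python
import math


def dp_tension(p: int, g_s: float, l_s: float = 1.0) -> float:
    """Tension T_p of a single Dp brane, eqn [16]."""
    return 1.0 / (g_s * (2.0 * math.pi) ** p * l_s ** (p + 1))


def dp_charge(p: int, g_s: float, l_s: float = 1.0) -> float:
    """Charge mu_p = g_s * T_p under the (p+1)-form gauge field, eqn [17]."""
    return g_s * dp_tension(p, g_s, l_s)


def ym_coupling_squared(p: int, g_s: float, l_s: float = 1.0) -> float:
    """Yang-Mills coupling squared on the Dp world volume, eqn [18]."""
    return (2.0 * math.pi) ** (p - 2) * g_s * l_s ** (p - 3)


def stack_tension(n: int, p: int, g_s: float, l_s: float = 1.0) -> float:
    """Tension of n parallel Dp branes; the BPS property makes it n * T_p."""
    return n * dp_tension(p, g_s, l_s)


if __name__ == "__main__":
    g_s = 0.1  # illustrative weak coupling, not a value from the text
    for p in (1, 3, 5):
        product = dp_tension(p, g_s) * ym_coupling_squared(p, g_s)
        print(p, dp_tension(p, g_s), ym_coupling_squared(p, g_s), product)
```

For $p = 3$ the Yang-Mills coupling is dimensionless, $g_{YM}^2 = 2\pi g_s$, and the charge $\mu_p$ is independent of $g_s$ for every $p$, as [16] and [17] together require.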
D-Branes and Black Branes

The idea that black holes correspond to highly degenerate states in string theory is quite old and dates back to 't Hooft (1990) and Susskind (1993). In the following two sections we discuss such black holes which are described by D-branes. For reviews see Maldacena (1996), Das and Mathur (2001), and David et al. (2002).

We have so far discussed the string-theoretic branes in two different ways. In the first description, branes are solutions of the low-energy equations of motion; this is the setting in which branes provide conventional descriptions of black holes. In the second description, branes are certain states in the quantum theory of superstrings. More specifically, D-branes are described in terms of states of the open-string field theory which lives on the branes. The first description is necessarily approximate. On the other hand, the second description is exact in principle, although in practice one might not know how to write down and analyze the string-field theory in an exact fashion.

The description in terms of open-string field theory should reduce to the description in terms of a classical solution when the charges and masses become large. If black-hole thermodynamics has a microscopic origin, D-branes should be highly degenerate states in this limit and the entropy should be given by the Boltzmann formula. Furthermore, Hawking radiation should be understood as an ordinary decay process.

For a system of $Q_p$ parallel Dp branes, the mass is $\sim Q_p/g_s$, while Newton's gravitational constant $G \sim g_s^2$. Gravitational effects are controlled by $GM \sim g_s Q_p$. A semiclassical limit in closed-string theory requires $g_s \to 0$, while a nontrivial gravitational effect in this limit requires $g_s Q_p$ finite, which implies one must have $Q_p \gg 1$. Furthermore, when $g_s Q_p \gg 1$ the typical curvatures are small compared to the string scale and the semiclassical string theory reduces to classical supergravity. This is the limit in which branes are well described as classical solutions.

Similar considerations apply for brane systems with multiple charges. For example, in the D1-D5-N system the classical solution becomes a good description when all the quantities $g_s Q_1$, $g_s Q_5$, and $g_s^2 N$ become large. (The relevant quantity which comes with the momentum has $g_s^2$ rather than $g_s$ because the mass contribution from the momentum is simply $N/R$ without any inverse power of $g_s$.)

However, $g_s$ is the square of the coupling constant of the open-string theory living on the brane; in fact, eqn [18] shows this relation in the low-energy limit. It is well known that in a $U(Q_p)$ gauge theory the real coupling constant is $g_{YM}\sqrt{Q_p} \sim \sqrt{g_s Q_p}$. This means that the semiclassical limit corresponds to a strongly coupled string-field theory which reduces to strongly coupled gauge theory in the low-energy limit, and the picture of D-branes as a collection of open strings is not very useful. In fact, known calculational methods in gauge theory or open-string theory are not valid in this regime.

Microscopic Entropy for Two-Charge Systems

The prospects are much better for extremal black holes, which appear as BPS states in string theory. This is because the spectra of BPS states do not depend on the coupling. The degeneracy of such states may therefore be calculated at weak coupling, where techniques are well known, and the result can be extrapolated to strong coupling without change.

The simplest BPS state is the ground state of a set of parallel D-branes of the same type. This state is indeed 128-fold degenerate, which would imply a microscopic entropy. This entropy, however, is small and therefore invisible in the corresponding classical solution. Indeed, the classical solution shows that in the extremal limit the horizon area is zero, leading to a vanishing Bekenstein-Hawking entropy.

The next interesting class of states consists of threshold bound states with two kinds of charges. Consider, for example, the D1-D5 system on $T^4 \times S^1$ considered above with no momentum along the D1s. By known duality transformations, this is equivalent to a fundamental IIB string which is wound $Q_5$ times around the $S^1$ and with a net momentum $P = Q_1/R$ (where $R$ is the radius of the $S^1$), with four of the transverse directions compactified on a $T^4$. For this system, it is easy to count the number of states for given values of $Q_1$ and $Q_5$ at weak string coupling by simply enumerating the perturbative oscillator states of the string. For large values of $Q_1$ and $Q_5$, we can alternatively calculate this entropy by using a canonical ensemble of eight massless bosons corresponding to the eight transverse polarizations and their supersymmetric partners, eight massless fermions, moving on the string with some temperature $T$ and a chemical potential $\mu$ for the total momentum.

Consider a noninteracting gas of $f$ massless bosons and $f$ fermions living on a circle with circumference $L$. The average number of left- and right-moving particles with some energy $e$, denoted by $\rho_L$, $\rho_R$, respectively, are

$$\rho_i(e) = \frac{1}{e^{e/T_i} \pm 1}, \qquad i = L, R \qquad [19]$$
where the $\pm$ sign refers to fermions and bosons, respectively, and we have introduced left- and right-moving temperatures $T_L$, $T_R$. The physical temperature is

$$\frac{1}{T} = \frac{1}{2}\left(\frac{1}{T_L} + \frac{1}{T_R}\right) \qquad [20]$$

The extensive quantities, such as the energy $E$, momentum $P$, and entropy $S$, then become the sum of left- and right-moving pieces:

$$E = E_L + E_R, \qquad P = P_L + P_R, \qquad S = S_L + S_R \qquad [21]$$

and the distribution function [19] leads to the following thermodynamic relations:

$$T_i = \sqrt{\frac{8E_i}{L\pi f}} = \frac{4S_i}{\pi f L}, \qquad i = L, R \qquad [22]$$

Since the total momentum $P = P_R + P_L = E_R - E_L$ is nonzero, the lowest-energy state is clearly the one in which all the particles move in the same direction, for example, right moving. This is a BPS state and corresponds to the extremal solution in supergravity. Then $E = E_R = P = P_R$. This approach to the black hole entropy was initiated by Das and Mathur (1996) and Callan and Maldacena (1996).

For our two-charge system, $f = 8$, $P = 2\pi Q_1 Q_5/L$, and $L = 2\pi R Q_5$. Using [22] we get

$$S_{\rm micro}^{\rm 2\text{-}charge,\,II} = 2\sqrt{2}\,\pi\sqrt{Q_1 Q_5} \qquad [23]$$

This is the microscopic entropy for the fundamental string with momentum in the type II theory. By duality, this is also the microscopic entropy of the D1-D5 system. This is a large number which should agree with the macroscopic entropy calculated from the corresponding classical solution.

The discussion is almost identical for the fundamental heterotic string, except that now we have 24 right-moving bosons, eight left-moving bosons, and eight left-moving fermions, and the BPS state consists only of right movers. If $n_w$ denotes the winding number and $n_p$ the quantized momentum, the extremal heterotic string entropy is

$$S_{\rm micro}^{\rm 2\text{-}charge,\,heterotic} = 4\pi\sqrt{n_p n_w} \qquad [24]$$

The supergravity solution for the D1-D5 system may be obtained by substituting $\sigma = 0$ in eqns [8]-[13]. In the extremal limit, the classical Bekenstein-Hawking entropy vanishes, as is clear from the expression [15], in which $N = 0$. This appears to be in contradiction with the fact that the state has a large microscopic entropy.

The key point, however, is that the two-charge solution has a singular horizon where the string frame curvature is large. Consequently, low-energy tree-level supergravity breaks down near the horizon and higher-derivative terms (e.g., higher powers of curvature) become important. This issue has been best studied for the fundamental heterotic string compactified on $T^6$. This is dual to the D1-D5 system in type IIB theory compactified on $K3 \times T^2$. The classical supergravity solution is then a singular black hole in four spacetime dimensions. In one of the first papers on the string-theoretic understanding of black hole thermodynamics, Sen (1995) showed that, for large $n_p$, $n_w$, string-loop effects are small near the horizon so that the only relevant corrections are higher-derivative terms coming from integrating out the massive modes of the string at tree level. Furthermore, a robust scaling argument shows that regardless of the detailed nature of the derivative corrections, the macroscopic entropy defined through the horizon area must be of the form $a\sqrt{n_p n_w}$, where $a$ is a pure number. Finally, one can define a stretched horizon as the surface where the curvature becomes of the order of the string scale, and the area of the stretched horizon is indeed proportional to $\sqrt{n_p n_w}$. This result gives a strong indication that string theory provides a microscopic basis for black hole thermodynamics, although the coefficient $a$ cannot be determined without more detailed knowledge of higher-derivative terms.

Microscopic Entropy of Extremal Three-Charge System

Brane bound states with three kinds of charge provide examples of black holes whose extremal limits have large horizons with curvatures much smaller than the string scale. In this case, a microscopic count of states in string theory should exactly account for the Bekenstein-Hawking formula, without corrections coming from higher derivatives. This is indeed true, as first found by Strominger and Vafa (1996). In the following, we will outline how this calculation can be done in the D1-D5-N system on $K3 \times S^1$ or $T^4 \times S^1$ following the treatment of Dijkgraaf et al. (1996).

D1 branes can be considered as instanton strings in the six-dimensional supersymmetric $U(Q_5)$ gauge theory of D5 branes (actually, these should be called solitonic strings rather than instantons, since the configurations are time independent). The total instanton number is the D1-brane charge $Q_1$. The moduli space of these instantons is then a blown-up version of the
orbifold $(T^4)^{Q_1 Q_5}/S(Q_1 Q_5)$ or $(K3)^{Q_1 Q_5}/S(Q_1 Q_5)$ and is $4Q_1 Q_5$ dimensional. Since any instanton configuration is independent of time $x^0$ and the $S^1$ direction $x^5$, the collective coordinate dynamics is a $(1+1)$-dimensional field theory which lives in the $(x^0, x^5)$ space. At low energies, this flows to a conformal field theory with a central charge $c = 6Q_1 Q_5$ since there are $4Q_1 Q_5$ bosons each contributing 1 to the central charge and an equal number of fermions each contributing $1/2$. The BPS state with momentum $N/R$ is a purely right- or left-moving state in this conformal field theory which has a conformal weight $N$. From general principles of conformal invariance, the degeneracy of such states for large $N$ is given by Cardy's formula

$$d(N) \sim e^{2\pi\sqrt{cN/6}} \qquad [25]$$

so that the microscopic entropy is

$$S_{\rm micro}^{\rm 3\text{-}charge} = \log d(N) = 2\pi\sqrt{cN/6} \qquad [26]$$

Substituting the value of $c = 6Q_1 Q_5$, this is in exact agreement with the Bekenstein-Hawking entropy of the classical solution given in [15].

Nonextremal Black Holes and Hawking Radiation

The BPS property of ground states of D-brane systems enables us to compute the degeneracy of microstates exactly in the regime of parameters where the state can be reliably described as a black hole solution in the low-energy theory. However, extremal black holes have vanishing temperature and do not radiate. To understand the microscopic origins of Hawking radiation, one has to go away from extremality. Such states are not supersymmetric and an extrapolation of weak-coupling calculations to strong coupling is not a priori justified. Nevertheless, it turns out that for small departures from extremality, weak-coupling results still reproduce semiclassical answers for entropy, temperature, and luminosity.

Near-Extremal Entropy

Nonextremal properties are best understood for the D1-D5-N system on $T^4 \times S^1$. In the orbifold limit, the conformal field theory which describes the low-energy dynamics is equivalent to a gas of strings which are wound around the $S^1$ and which can oscillate along the $T^4$. The total winding number is $k = Q_1 Q_5$ and may be achieved by sets of strings which are multiply wound in various ways. As argued below, entropically the most favored configuration is a single long string wound around $Q_1 Q_5$ times. Thus, the thermodynamics may be analyzed exactly along the lines of the fundamental string in the previous section. The thermodynamic relations are given by [22] with $f = 4$ and $L = 2\pi R Q_1 Q_5$. The extremal state consists entirely of right movers and $E = E_R = N/R$. Substituting these values in [22] yields the correct formula for the microscopic entropy

$$S_{\rm micro}^{\rm 3\text{-}charge} = 2\pi\sqrt{Q_1 Q_5 N} \qquad [27]$$

The same expression follows if $f = 4Q_1 Q_5$ and $L = 2\pi R$, corresponding to $Q_1 Q_5$ singly wound strings. However, for statistical methods to hold, the entropy must be much larger than the number of flavors. The ratio of the entropy to the number of flavors is $S/f \sim \sqrt{N/Q_1 Q_5}$ for multiple singly wound strings and is not guaranteed to be large when all of $Q_1$, $Q_5$, $N$ are large. On the other hand, this ratio is $S/f \sim \sqrt{Q_1 Q_5 N}$ for the long string. This shows that the long string is always entropically favored.

A departure from the extremal state is achieved by adding a left-moving momentum $2\pi n/L$ as well as a right-moving momentum $2\pi n/L$ to the extremal state, thus adding energy to the system but maintaining the total momentum. For the long string, this yields

$$S_R = 2\pi\sqrt{Q_1 Q_5 N + n}, \qquad S_L = 2\pi\sqrt{n} \qquad [28]$$

For small departures from extremality, $n \ll N$, the expressions for the total entropy and temperature as a function of the excess energy $\Delta E = 2n/(R Q_1 Q_5)$ agree exactly with the near-extremal Bekenstein-Hawking entropy and the Hawking temperature of the classical solution, as shown by Callan and Maldacena (1996) and by Horowitz and Strominger.

The necessity of the long string appears in another important physical consideration. For statistical mechanics to be valid, the specific heat of the system has to be larger than unity. This implies that for the case considered here the energy gap $\Delta E$ must be larger than $1/(R Q_1 Q_5)$, which is precisely what the long string yields.

Hawking Radiation

A nonextremal state described above is unstable, since a left mover can annihilate a right mover into a closed-string mode which may leave the brane system and propagate to the asymptotic region. The resulting closed-string state will be in a thermal state whose temperature is the physical temperature of the initial state. This process is the microscopic
description of Hawking radiation. The decay rate is related to the absorption cross section of the corresponding mode by the principle of detailed balance, encoded in eqn [4].

From the point of view of the classical solution, the absorption cross section can be calculated by solving the linearized wave equation in the background geometry and calculating the ratio of the incident and reflected waves. It follows from these calculations that at low energies, absorption (and hence emission) is dominated by massless minimally coupled scalars. In fact, for any spherically symmetric black hole in any number of dimensions, there is a general theorem which ensures that the low-energy limit of this absorption cross section is exactly equal to the horizon area.

In the microscopic model for the three-charge black hole, this absorption cross section may be calculated by the usual rules of quantum mechanics. In the long-string limit and in the approximation that the modes on the long string form a dilute gas, the result has been derived by Das and Mathur (1996):

$$\sigma(\omega) = \frac{2\pi G_{10} Q_1 Q_5}{V}\,\omega\,\frac{e^{\omega/T} - 1}{\left(e^{\omega/2T_R} - 1\right)\left(e^{\omega/2T_L} - 1\right)} \qquad [29]$$

where $V$ is the volume of the $T^4$ and $T$ is the physical temperature given by [20]. For a near-extremal hole $T_R \gg T_L$, so that $T \approx 2T_L$. Then in the extreme low-energy limit $\omega \ll T_R$, so that the corresponding Bose factor may be approximated as $1/(e^{\omega/2T_R} - 1) \approx 2T_R/\omega$. The cross section [29] becomes

$$\sigma \to \frac{4\pi Q_1 Q_5 G_{10} T_R}{V} = \frac{4 G_{10} S_R}{2\pi R V} = 4 G_5 S_{\text{extremal}} = A_H \qquad [30]$$

where $G_5$ is the five-dimensional Newton's gravitational constant. We have used the relation [22] with $L = 2\pi R Q_1 Q_5$ and $f = 4$. The fact that in the near-extremal limit $S_R$ is simply the extremal entropy and the fact that the extremal entropy reproduces the Bekenstein-Hawking formula have been used as well. Thus, the microscopic cross section exactly reproduces the semiclassical result at low energies. Even more remarkably, the full cross section [29] agrees with the semiclassical answer for the gray-body factor for parameters which correspond to the dilute-gas regime, as shown by Maldacena and Strominger.

It is rather surprising that the results for the microscopic absorption cross section calculated at weak coupling agree with the semiclassical answers, since the relevant process involves states which are not supersymmetric and therefore a naive extrapolation to strong coupling is not a priori justified. There are strong indications, however, that low-energy nonrenormalization theorems are at work. This agreement has been established not only for black holes with finite-horizon areas, but also for other systems with no horizons (most significantly, a set of parallel 3-branes) and forms the basis for Maldacena's conjecture about AdS/CFT Correspondence (see AdS/CFT Correspondence).

Effects of Higher-Derivative Terms

The classical low-energy limit of string theory is supergravity. The effect of the massive modes of the string, as well as of string loops, is to add terms to the supergravity action which involve a higher number of spacetime derivatives, for example, terms containing higher powers of the curvature. In the presence of such terms, the Bekenstein-Hawking formula for black hole entropy [2] receives corrections which can be calculated in a systematic fashion. It turns out that for a class of extremal black holes, this corrected entropy as computed in the modified supergravity is also in exact agreement with a microscopic calculation.

One example of this agreement is provided by four-dimensional extremal black holes in type IIA string theory compactified on a Calabi-Yau manifold. These are obtained by wrapping D4 branes on three different 4-cycles on the Calabi-Yau and having in addition a number of D0 branes. Let $p^A$, $A = 1, \ldots, 3$ denote the three D4 charges and $q_0$ denote the D0 charge. The microscopic entropy of the BPS state can be computed by embedding this in M-theory:

$$S^{\text{CY black hole}}_{\text{micro}} = 2\pi \sqrt{\tfrac{1}{6}\,|q_0|\left(C_{ABC}\, p^A p^B p^C + c_{2A}\, p^A\right)} \qquad [31]$$

where $C_{ABC}$ is the intersection number of the 4-cycles and $c_2$ denotes the second Chern class of the Calabi-Yau space. When all the charges $p^A$ are large, the term involving $c_2$ is subdominant. In this case, the result agrees with the Bekenstein-Hawking entropy of the corresponding classical solution. When the charges are not all large (so that the second term is appreciable), the curvatures of the supergravity solution become large at the horizon and higher-derivative corrections to the action cannot be ignored. In this particular case, it turns out that these higher-derivative corrections are string-loop corrections and can be computed using general properties of $N = 2$ supersymmetry, so that one can compute corrections to near-horizon geometry. Furthermore, one has to now modify the
expression for macroscopic entropy using the formalism of Wald. Putting these together, it is found that the macroscopic entropy following from the modified supergravity is in exact agreement with [31]. This subject is reviewed in Mohaupt (2000).

These methods have also been applied to the problem of two-charge black holes in heterotic string theory on $T^6$ or, equivalently, type IIA on $K3 \times T^2$ (Dabholkar 2004). Recall that in this case the horizon of the usual supergravity solution is singular. It has been found that leading-order higher-derivative corrections smooth out the horizon into an $\text{AdS}_2 \times S^2$ spacetime and the modified expression for the macroscopic entropy is again in exact agreement with the microscopic answer [23].

Geometry of Microstates

A satisfactory solution of the information-loss paradox requires a much more detailed understanding of black holes in string theory. The discussion above shows that black holes have microstates which may be described well in the weak-coupling regime. It is interesting to ask whether there is a description of these microstates in the strong-coupling regime in terms of the effective geometry perceived by suitable probes. This question has been answered for the two-charge system in great detail (see Mathur (2004)). It turns out that the D1-D5 microstates can be described by perfectly smooth metrics with no horizons, and they asymptote to the standard two-charge metric discussed above. The location of the erstwhile stretched horizon marks the point where the different microstates start differing from each other significantly. Since each such geometry does not have a horizon, neither does it have any entropy; this is consistent with their identification with nondegenerate microstates. Indeed, the number of such microstates correctly accounts for the microscopic entropy. Whether a similar picture holds for the three-charge system remains to be seen in detail, although there are some indications that this may be true. In this approach, it is not yet fully understood how a horizon emerges and why the entropy scales as the horizon area.

Outlook

One key feature of the understanding of black hole statistical mechanics from the dynamics of branes is the fact that a problem in gravity is mapped to a problem in a theory without gravity, for example, open-string field theory. In fact, the closed strings in the bulk are already contained in the spectrum of the open strings. This is a consequence of the basic duality between open strings and closed strings. Furthermore, the open-string theory lives in a lower-dimensional spacetime. This is a manifestation of the holographic principle. As argued by Maldacena, the presence of a horizon implies that the low-energy limit retains all the modes of the closed strings near the horizon, while it truncates the open-string theory to a gauge theory. Open-closed duality then reduces to gauge-string duality. This provides strong evidence that black holes obey the normal laws of quantum mechanics and hence their time evolution is unitary.

One of the most outstanding problems in the subject is a proper understanding of neutral black holes. Most of the quantitative results described above depend on supersymmetry, which allows extrapolation of weak-coupling answers to the strong-coupling domain. Some of these results can be extended to situations which have small departures from supersymmetry, for example, near-extremal black holes. States corresponding to neutral black holes are, however, far from supersymmetry and known calculational techniques fail. There are good reasons to expect, however, that the general philosophy, in particular the holographic principle, is still valid. Finally, so far string theory has been able to attack problems of eternal black holes. A satisfactory understanding of the information-loss problem requires an understanding of the dynamics of black hole formation and subsequent evaporation. Unfortunately, very little is known about this at the moment.

See also: AdS/CFT Correspondence; Black Hole Mechanics; Supergravity; Superstring Theories.

Glossary

ADM (Arnowitt-Deser-Misner) mass: Mass of a gravitational background which is asymptotically flat.
$\text{AdS}_n$ (anti-de Sitter space): A space (or spacetime) with constant negative curvature in $n$ dimensions.
BPS state (Bogomolny-Prasad-Sommerfeld state): In a theory of extended supersymmetry, a state that is invariant under a nontrivial subalgebra of the full supersymmetry algebra. These states always carry conserved charges, and supersymmetry determines the mass exactly in terms of the charges.
Calabi-Yau space: Complex Kähler manifold with vanishing first Chern class.
Compactify (n. compactification): To consider a field or string theory in a spacetime some of whose spatial dimensions are compact.
Dirichlet boundary condition: The boundary condition which fixes the value of a field on the boundary.
Duality: Equivalence of systems which appear to be distinct. For string theories, such equivalences relate string theories on different spacetimes as well as theories with different coupling constants.
Einstein-Hilbert action: The standard action for gravity which leads to Einstein's equation, $S = (1/16\pi G) \int d^d x\, \sqrt{g}\, R$, where $R$ is the Ricci scalar, $g$ denotes the determinant of the metric, and $G$ is Newton's gravitational constant.
Instanton: A classical solution of Euclidean field theory with finite action.
Kaluza-Klein gauge field: In a compactified theory, the gauge field which arises from the metric of the higher-dimensional theory.
K3: The unique Calabi-Yau manifold in four dimensions having an SU(2) holonomy.
Loop levels: In a Feynman diagram expansion of a field theory, terms which contribute in higher orders of the Planck constant h.
Macroscopic entropy: Entropy associated with gravitational backgrounds via the Bekenstein-Hawking formula or its generalization.
Microscopic entropy: Entropy which follows from the degeneracy of states of a system via Boltzmann's relation.
Minimally coupled scalar: A scalar field whose equation of motion is the standard Klein-Gordon equation where the derivatives are covariant derivatives.
Neveu-Schwarz/Neveu-Schwarz states: In type I and II string theories, bosonic closed-string states whose left- and right-moving parts are bosonic.
No-hair theorem: A theorem in general relativity which states that black holes with nonsingular horizons are uniquely characterized by their mass, angular momenta, and charges which can couple to long-range gauge fields.
Orbifold: A coset space $M/G$ where $G$ is a group of discrete symmetries of a manifold $M$. If $G$ has a fixed point, the space is singular.
p-Form: A fully antisymmetric p-index tensor.
Ramond-Ramond states: In type I and II string theories, bosonic closed-string states whose left- and right-moving parts are fermionic.
Reissner-Nordström black hole: Black hole solution of general relativity with electric Maxwell charge.
$S^n$: n-Dimensional sphere.
Supergravity: Supersymmetric extension of general relativity.
Supersymmetry: A symmetry between bosons and fermions.
Threshold bound state: A bound state which is marginally bound, that is, the binding energy is zero.
Tree level: In a Feynman diagram expansion of a field theory, terms which contribute to lowest order of the Planck constant h.
U(N): The group of $N \times N$ unitary matrices. If the determinant is unity, the subgroup is called SU(N).

Further Reading

Callan CG and Maldacena JM (1996) D-brane approach to black hole quantum mechanics. Nuclear Physics B 472: 591 (arXiv:hep-th/9602043).
Dabholkar A (2004) Exact counting of black hole microstates, arXiv:hep-th/0409148.
Das SR and Mathur SD (1996) Comparing decay rates for black holes and D-branes. Nuclear Physics B 478: 561 (arXiv:hep-th/9606185).
Das SR and Mathur SD (2001) The quantum physics of black holes: results from string theory. Annual Review of Nuclear and Particle Science 50: 153 (arXiv:gr-qc/0105063).
David JR, Mandal G, and Wadia SR (2002) Microscopic formulation of black holes in string theory. Physics Reports 369: 549 (arXiv:hep-th/0203048).
Dijkgraaf R, Moore GW, and Verlinde E (1996) Elliptic genera of symmetric products and second quantized strings. Communications in Mathematical Physics 185: 197 (arXiv:hep-th/9608096).
't Hooft G (1990) The black hole interpretation of string theory. Nuclear Physics B 335: 138.
Johnson C (2003) D-Branes. Cambridge: Cambridge University Press.
Maldacena JM (1996) Black holes in string theory, arXiv:hep-th/9607235.
Maldacena J, Strominger A, and Witten E (1997) Black hole entropy in M-theory. Journal of High Energy Physics 9712: 002 (arXiv:hep-th/9711053).
Mathur SD (2004) Where are the states of a black hole?, arXiv:hep-th/0401115.
Mohaupt T (2000) Black hole entropy, special geometry and strings. Fortschritte der Physik 49: 3 (arXiv:hep-th/0007195).
Sen A (1995) Extremal black holes and elementary string states. Modern Physics Letters A 10: 2081.
Strominger A and Vafa C (1996) Microscopic origin of the Bekenstein-Hawking entropy. Physics Letters B 379: 99 (arXiv:hep-th/9601029).
Susskind L (1993) Some speculations about black hole entropy in string theory, arXiv:hep-th/9309145.
Wald R (1994) Quantum Field Theory In Curved Space-Time and Black Hole Thermodynamics. Chicago, IL: University of Chicago Press.
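
Worked numerical check (illustrative). The low-energy limit of the absorption cross section [29] discussed under "Hawking Radiation" can be checked numerically. The sketch below is not part of the original article. It assumes that the thermal factor of [29] is $\omega\,(e^{\omega/T}-1)/[(e^{\omega/2T_R}-1)(e^{\omega/2T_L}-1)]$, and that the physical temperature obeys $1/T = \tfrac{1}{2}(1/T_L + 1/T_R)$; the latter is an assumed form of relation [20], which is not reproduced in this part of the article. The overall prefactor is set to 1, and the function names are hypothetical.

```python
import math


def physical_temperature(T_L, T_R):
    """Assumed relation 1/T = (1/2)(1/T_L + 1/T_R); not quoted from the article."""
    return 2.0 / (1.0 / T_L + 1.0 / T_R)


def sigma(omega, T_L, T_R, prefactor=1.0):
    """Thermal structure of the cross section [29], with the prefactor normalized.

    math.expm1(x) evaluates e**x - 1 accurately for small x, which matters
    in the low-energy regime omega << T_L, T_R.
    """
    T = physical_temperature(T_L, T_R)
    numerator = math.expm1(omega / T)
    denominator = math.expm1(omega / (2.0 * T_R)) * math.expm1(omega / (2.0 * T_L))
    return prefactor * omega * numerator / denominator


if __name__ == "__main__":
    T_L, T_R = 1e-3, 1.0          # near-extremal regime: T_R >> T_L
    for omega in (1e-4, 1e-5, 1e-6):
        s = sigma(omega, T_L, T_R)
        # Ratio to 2*T_R, the near-extremal low-energy value in units of the prefactor.
        print(omega, s, s / (2.0 * T_R))
```

Under these assumptions the ratio printed in the last column approaches 1 as $T_L/T_R \to 0$, which is the statement that [29] reduces to [30] at low energy.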
Breaking Water Waves


A Constantin, Trinity College, Dublin, Republic of Ireland
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Watching the sea or a lake it is often possible to trace a wave as it propagates on the water's surface. One can roughly distinguish two types of breaking waves. All waves break while reaching the shore but certain waves break far from the shore. In the first case, the change in water depth or the presence of an obstacle (e.g., a rock) seems to cause wave breaking, while for certain waves within the second category, these factors appear not to be essential. It is a matter of observation that for many waves that break in the open water a drastic increase in their slope near breaking is noticeable. This leads us to the following mathematical definition: the wave profile gradually steepens as it propagates until it develops a point where the slope is vertical and the wave is said to have broken (Whitham 1980). Throughout this article, we are concerned with wave breaking that is not caused by a drastic change of the topography of the bottom; for a discussion of wave breaking at the beach we refer to Johnson (1997). The governing equations for water waves (see the next section) are too difficult to be dealt with in their full generality. Therefore, to gain some insight, one has to find simpler models that are more tractable mathematically. Investigating the properties of the model, certain predictions can be made. The conclusions reached will reflect reality only to some limited extent. The value of a model depends on the number and the degree of accuracy of physically useful deductions that can be made from it; the truth of the model is meaningless as all experiments contain inaccuracies and effects other than those accounted for (while deriving the model) cannot be totally excluded. We intend to discuss the way in which a recent model due to Camassa and Holm (1993) can lead to a better understanding of breaking water waves. Firstly we survey a few classical nonlinear partial differential equations that model the propagation of water waves over a flat bed (within the confines of the linear theory one cannot cope with the wave breaking phenomenon) and discuss their relevance to the study of breaking waves. We then analyze the breaking of waves within the context of the Camassa-Holm equation: existence of breaking waves, criteria that guarantee that a certain initial shape develops into a breaking wave, specific features of wave breaking (blow-up rate and blow-up set for certain types of breaking waves). We conclude the presentation with a discussion of the way in which solutions to the Camassa-Holm equation can be continued after wave breaking.

The Governing Equations

The water waves that one typically sees propagating on the surface of the sea or on a lake are, as a matter of common experience, approximately two dimensional. That is, the motion is identical in any direction parallel to the crest line. To describe these waves, it suffices to consider a cross section of the flow that is perpendicular to the crest line. Choose Cartesian coordinates $(x, y)$ with the y-axis pointing vertically upwards and the x-axis being the direction of wave propagation, while the origin lies at the mean water level. Let $(u(t, x, y), v(t, x, y))$ be the velocity field of the flow, let $y = -d$ be the flat bed (for some fixed $d > 0$), and let $y = \eta(t, x)$ be the water's free surface. Homogeneity (constant density) is a physically reasonable assumption for gravity waves (Johnson 1997), and it implies the equation of mass conservation

$$u_x + v_y = 0 \qquad [1]$$

The inviscid setting is realistic since experimental evidence confirms that the length scales associated with an adjustment of the velocity distribution due to laminar viscosity or turbulent mixing are long compared to typical wavelengths. Under the assumption of inviscid flow the equation of motion is Euler's equation

$$\begin{aligned} u_t + u u_x + v u_y &= -P_x \\ v_t + u v_x + v v_y &= -P_y - g \end{aligned} \qquad [2]$$

where $P(t, x, y)$ denotes the pressure and $g$ is the gravitational constant of acceleration. The free surface decouples the motion of the water from that of the air so that (Johnson 1997) the dynamic boundary condition

$$P = P_0 \quad \text{on } y = \eta(t, x) \qquad [3]$$

must hold if we neglect surface tension, where $P_0$ is the (constant) atmospheric pressure. Moreover, since the same particles always form the free surface, we have the kinematic boundary condition

$$v = \eta_t + u \eta_x \quad \text{on } y = \eta(t, x) \qquad [4]$$

On the flat bed we have the kinematic boundary condition

$$v = 0 \quad \text{on } y = -d \qquad [5]$$
expressing the fact that the flow is tangent to the horizontal bed (or, equivalently, that water cannot penetrate the rigid bed). The governing equations for water waves are [1]-[5]. Other than the fact that they are highly nonlinear, a main difficulty in analyzing the governing equations lies in the fact that we deal with a free boundary problem: the free surface $y = \eta(t, x)$ is not specified a priori. In our discussion, we suppose that initially (at time $t = 0$), a disturbance of the flat surface of still water was created and we analyze the subsequent motion of the water. The balance between the restoring gravity force and the inertia of the system governs the evolution of the mass of water and our primary objective is the behavior of the free surface.

An important category of flows are those of zero vorticity, characterized by the additional assumption

$$u_y = v_x \qquad [6]$$

The vorticity of a flow, $\omega = u_y - v_x$, measures the local spin or rotation of a fluid element. In flows for which [6] holds the local whirl is completely absent and for this reason such flows are called irrotational. Relation [6] ensures the existence of a velocity potential, namely a function $\phi(t, x, y)$ defined up to a constant via

$$\phi_x = u, \qquad \phi_y = v$$

Notice that [1] ensures that $\phi$ is a harmonic function, that is, $(\partial_x^2 + \partial_y^2)\phi = 0$. In this way, the powerful methods of complex analysis become available for the study of irrotational flows. Thus, while most water flows are with vorticity, the study of irrotational flows can be defended mathematically on grounds of beauty. Concerning the physical relevance of irrotational water flows, experimental evidence indicates that for waves entering a region of still water the assumption of irrotational flow is realistic (Johnson 1997). Moreover, as a consequence of Kelvin's circulation theorem (Acheson 1990), a water flow that is irrotational initially has to be irrotational at all later times. It is thus reasonable to consider that water motions starting from rest will remain irrotational at later times.

Nonlinear Model Equations

Starting from the governing equations [1]-[6] one can derive a variety of model equations using the nondimensionalization and scaling approach: a suitable set of nondimensional variables is introduced, which, after scaling, leads to the appearance of parameters. The sizes and relative sizes of these parameters then govern the type of phenomenon that is of interest. An asymptotic expansion in one or several parameters yields an equation that is usually of significance in some region of space/time. The aim of this process is to obtain a simpler model that can be used to gain some understanding and to make some predictions for specific physical processes. This scaling method yields the Korteweg-de Vries (KdV) equation

$$\eta_t + \eta \eta_x + \eta_{xxx} = 0, \qquad t > 0,\ x \in \mathbb{R} \qquad [7]$$

as a model for the unidirectional propagation of shallow water waves over a flat bed (Johnson 1997). In [7] the function $\eta(t, x)$ represents the height of the water's free surface above the flat bed. We would like to emphasize that the shallow water regime does not refer to water of insignificant depth; it indicates that the typical wavelength is much larger than the typical depth (e.g., tidal waves are considered to be shallow water waves although they affect the motion of the deep sea). The KdV model admits the solitary wave solutions

$$\eta_c(t, x) = 3c\, \operatorname{sech}^2\!\left(\frac{\sqrt{c}}{2}\,(x - ct)\right), \qquad c \in \mathbb{R} \qquad [8]$$

For any fixed $c > 0$, the profile $\eta_c$ propagates without change of form at constant speed $c$ on the surface of the water, that is, it represents a traveling wave. Since the profiles [8] of the traveling waves drop rapidly to the undisturbed water level $\eta = 0$ ahead and behind the crest of the wave, $\eta_c$ are called solitary waves. Notice that [8] shows that taller solitary waves travel faster. They have other special properties: an initial profile consisting of two solitary waves, with the taller preceding the smaller one, evolves in such a way that the taller wave catches up the other, there is a period of complicated nonlinear interaction but eventually both solitary waves emerge completely unscathed! This special type of nonlinear interaction (the superposition principle is not valid since KdV is a nonlinear equation) in which solitary waves regain their form upon collision occurs only for special equations, in which case the solitary waves are called solitons. A further interesting property of the KdV model, relevant for the understanding of the interaction of solitons, is the fact that it is completely integrable (McKean 1998): there is a transformation which converts the equation into an infinite sequence of linear ordinary differential equations which can be trivially integrated. Moreover, the KdV-solitons $\eta_c$ are stable: an initial profile that is close to the form of a soliton will evolve into a wave that at any later time has a form close to that of a soliton (Benjamin 1972). Despite all these intriguing features of the KdV-model, for all initial profiles $x \mapsto \eta(0, x)$ within the Sobolev space $H^1(\mathbb{R})$ of square-integrable functions with a square-integrable distributional derivative, eqn [7] has a unique solution
defined for all times $t \geq 0$ (cf. Kenig et al. (1996)) so that the KdV model cannot be used to shed light on the wave breaking phenomenon.

Whitham (1980) suggested the equation

$$\eta_t + \eta \eta_x + \int_{\mathbb{R}} k(x - y)\, \eta_y(t, y)\, dy = 0 \qquad [9]$$

for the free surface profile $x \mapsto \eta(t, x)$, with the singular kernel

$$k(x) = \frac{1}{2\pi} \int_{\mathbb{R}} \left(\frac{\tanh \xi}{\xi}\right)^{1/2} e^{i \xi x}\, d\xi$$

to model wave breaking. It can be shown (see Constantin and Escher (1998) and references therein) that [9] describes wave breaking: there are smooth initial profiles $x \mapsto \eta(0, x)$ such that the resulting unique solution of [9] exists on a maximal time interval $[0, T)$ with

$$\sup_{(t, x) \in [0, T) \times \mathbb{R}} \{ |\eta(t, x)| \} < \infty, \qquad \inf_{x \in \mathbb{R}} \{ \eta_x(t, x) \} \to -\infty \ \text{ as } t \uparrow T$$

(the solution remains bounded but its slope becomes infinite in finite time). However, in contrast to the KdV model, eqn [9] is not integrable and does not possess soliton solutions. As emphasized by Whitham (1980), it is intriguing to find models for water waves which exhibit both soliton interaction and wave breaking.

The Camassa-Holm equation

$$\eta_t - \eta_{txx} + 3 \eta \eta_x = 2 \eta_x \eta_{xx} + \eta \eta_{xxx} \qquad [10]$$

was first obtained by Fokas and Fuchssteiner (1981/82) as a nonlinear partial differential equation with infinitely many conservation laws. Camassa and Holm (1993) derived [10] as a model for shallow water waves, established that the equation possesses soliton solutions and found that it is formally integrable (for a discussion of the integrability issues we refer to Constantin (2001), and Lenells (2002)). Moreover, the solitons of [10] are stable (Constantin and Strauss 2000). An astonishing plenitude of structures is tied into the Camassa-Holm equation: [10] is a re-expression of geodesic flow on the diffeomorphism group (Constantin 2000, Kouranbaeva 1999), a property that can be used to show that the least action principle holds in the sense that there is a unique flow transforming a wave profile into a nearby profile within the class of flows that minimize the kinetic energy (see the discussion in Constantin (2000) and Constantin and Kolev (2003)). Interestingly, the Camassa-Holm equation also models wave breaking. More precisely (see the discussion in Constantin (2000)), for any initial data $x \mapsto \eta_0(x) = \eta(0, x)$ in $H^3(\mathbb{R})$ there is a unique solution of [10] defined on some maximal time interval $[0, T)$ and the solution stays uniformly bounded on $[0, T)$ with

$$\lim_{t \uparrow T} \left[ \inf_{x \in \mathbb{R}} \{ \eta_x(t, x) \}\, (T - t) \right] = -2 \quad \text{if } T < \infty$$

In addition to this, for a large class of initial data, there is precisely one point where the slope of the wave becomes infinite at breaking time (Constantin 2000): if $\eta_0 \not\equiv 0$ is odd and such that $\eta_0(x) - \eta_0''(x) \leq 0$ for all $x \geq 0$, then the corresponding wave $t \mapsto [x \mapsto \eta(t, x)]$ will break in finite time $T < \infty$ and

$$\lim_{t \uparrow T} \eta_x(t, 0) = -\infty$$

whereas

$$|\eta_x(t, x)| \leq K + K\, \frac{\cosh(x)}{|\sinh(x)|}, \qquad t \in [0, T),\ x \neq 0$$

for some constant $K > 0$. Thus, the Camassa-Holm model is an integrable infinite-dimensional Hamiltonian system with stable solitons and eqn [10] admits also breaking waves as local solutions (see Constantin and Escher (1998) and McKean (1998) and references therein for further results on wave breaking for the Camassa-Holm equation).

We conclude our discussion by pointing out that it is possible to continue solutions of the Camassa-Holm equation past the breaking time. For this purpose it is convenient to rewrite [10] as the nonlinear nonlocal conservation law

$$\eta_t + \eta \eta_x + \partial_x\, \frac{1}{2} \int_{\mathbb{R}} e^{-|x - y|} \left( \eta^2 + \frac{1}{2} \eta_x^2 \right) dy = 0 \qquad [11]$$

reminiscent to some extent of the form of [7] and [9] and obtained by formally applying the operator $(1 - \partial_x^2)^{-1}$ to [10] in view of the fact that

$$(1 - \partial_x^2)^{-1} f = P * f \quad \text{for } f \in L^2(\mathbb{R})$$

the kernel of the convolution being

$$P(x) = \tfrac{1}{2}\, e^{-|x|}, \qquad x \in \mathbb{R}$$

By introducing a new set of independent and dependent variables it is possible to resolve all singularities due to wave breaking in the sense that [11] is transformed into a semilinear system, the unique solution of which can be obtained as a fixed point of a contractive operator (Bressan and Constantin 2005). In terms of [11], a semigroup of global conservative solutions (in the sense that the total energy

$$\frac{1}{2} \int_{\mathbb{R}} \left( \eta^2 + \eta_x^2 \right) dx$$
equals a constant, for almost every time), depending continuously on the initial data $\eta(0, \cdot) \in H^1(\mathbb{R})$, is thus constructed.

See also: Compressible Flows: Mathematical Theory; Dynamical Systems in Mathematical Physics: An Illustration from Water Waves; Integrable Systems: Overview; Interfaces and Multicomponent Fluids.

Further Reading

Acheson DJ (1990) Elementary Fluid Dynamics. New York: Oxford University Press.
Benjamin TB (1972) The stability of solitary waves. Proceedings of the Royal Society of London Series A 328: 153-183.
Bressan A and Constantin A (2005) Global conservative solutions of the Camassa-Holm equation, Preprints on Conservation Laws 2005-016 (www.math.ntnu.no/conservation/2005/016).
Camassa R and Holm DD (1993) A new integrable shallow water equation with peaked solitons. Physical Review Letters 71: 1661-1664.
Constantin A (2000) Existence of permanent and breaking waves for a shallow water equation: a geometric approach. Annales de l'Institut Fourier (Grenoble) 50: 321-362.
Constantin A (2001) On the scattering problem for the Camassa-Holm equation. Proceedings of the Royal Society of London Series A 457: 953-970.
Constantin A and Escher J (1998) Wave breaking for nonlinear nonlocal shallow water equations. Acta Mathematica 181: 229-243.
Constantin A and Kolev B (2003) Geodesic flow on the diffeomorphism group of the circle. Commentarii Mathematici Helvetici 78: 787-804.
Constantin A and Strauss WA (2000) Stability of peakons. Communications on Pure and Applied Mathematics 53: 603-610.
Fokas AS and Fuchssteiner B (1981/82) Symplectic structures, their Bäcklund transformations and hereditary symmetries. Physica D 4: 47-66.
Gesztesy F and Holden H (2003) Soliton Equations and their Algebro-Geometric Solutions. Cambridge: Cambridge University Press.
Johnson RS (1997) A Modern Introduction to the Mathematical Theory of Water Waves. Cambridge: Cambridge University Press.
Johnson RS (2002) Camassa-Holm, Korteweg-de Vries and related models for water waves. Journal of Fluid Mechanics 455(2002): 63-82.
Kenig CE, Ponce G, and Vega LA (1996) A bilinear estimate with applications to the KdV equation. Journal of the American Mathematical Society 9: 573-603.
Kouranbaeva S (1999) The Camassa-Holm equation as a geodesic flow on the diffeomorphism group. Journal of Mathematical Physics 40: 857-868.
Lenells J (2002) The scattering approach for the Camassa-Holm equation. Journal of Nonlinear Mathematical Physics 9: 389-393.
McKean HP (1979) Integrable systems and algebraic curves. In: Global Analysis, Lecture Notes in Mathematics, vol. 755, pp. 83-200. Berlin: Springer.
McKean HP (1998) Breakdown of a shallow water equation. Asian Journal of Mathematics 2: 867-874.
Whitham GB (1980) Linear and Nonlinear Waves. New York: Wiley.
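
Worked numerical check (illustrative). The claim that the solitary waves [8] solve the KdV equation [7], and that taller waves travel faster, can be checked with a few lines of code. The sketch below is not part of the original article. It assumes the normalization $\eta_t + \eta\eta_x + \eta_{xxx} = 0$ for [7] and $\eta_c(t,x) = 3c\,\operatorname{sech}^2\!\big(\tfrac{\sqrt{c}}{2}(x - ct)\big)$ for [8], and estimates the derivatives with central finite differences; the function names and the step size `h` are hypothetical choices.

```python
import math


def eta(t, x, c):
    """Solitary wave [8]: 3c * sech^2( sqrt(c)/2 * (x - c*t) ), for c > 0."""
    return 3.0 * c / math.cosh(0.5 * math.sqrt(c) * (x - c * t)) ** 2


def kdv_residual(t, x, c, h=1e-3):
    """Finite-difference estimate of eta_t + eta*eta_x + eta_xxx (left side of [7])."""
    eta_t = (eta(t + h, x, c) - eta(t - h, x, c)) / (2.0 * h)
    eta_x = (eta(t, x + h, c) - eta(t, x - h, c)) / (2.0 * h)
    # Second-order central stencil for the third derivative in x.
    eta_xxx = (eta(t, x + 2 * h, c) - 2.0 * eta(t, x + h, c)
               + 2.0 * eta(t, x - h, c) - eta(t, x - 2 * h, c)) / (2.0 * h ** 3)
    return eta_t + eta(t, x, c) * eta_x + eta_xxx


def max_residual(c, t=0.7):
    """Largest absolute residual on a grid of sample points in [-6, 6]."""
    xs = [-6.0 + 0.25 * i for i in range(49)]
    return max(abs(kdv_residual(t, x, c)) for x in xs)


if __name__ == "__main__":
    for c in (0.5, 1.0, 2.0):
        # Speed c, crest height 3c, and the (small) discretization residual.
        print(c, eta(1.0, c * 1.0, c), max_residual(c))
```

The residual is zero only up to discretization and rounding error, so it should be read as a consistency check rather than a proof. The crest height $3c$ growing with the speed $c$ is the "taller waves travel faster" property noted in the text.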

BRST Quantization
M Henneaux, Université Libre de Bruxelles, Bruxelles, Belgium
© 2006 Elsevier Ltd. All rights reserved.

Introduction

The BRST symmetry was originally introduced in the seminal papers by Becchi et al. (1976) and Tyutin (1975) for Yang-Mills gauge theories as a tool for controlling the renormalization of the models in a consistent (gauge-independent) way. This symmetry was discovered as a residual symmetry of the gauge-fixed action. It was realized later that, in fact, the BRST construction is quite general, in the sense that it covers arbitrary gauge theories and not just Yang-Mills gauge models. Furthermore, it is intrinsic, in that no gauge choice is actually necessary to define it.

The purpose of this review is to explain the general, intrinsic features of the BRST formalism applicable to any gauge theory. The proper setting for discussing these issues is that of homological algebra (Stasheff (1998), and references therein). This article first explains the necessary algebraic material underlying the construction and then illustrates it in the cases of the Hamiltonian BRST formalism and the Lagrangian BRST formalism.

A Result from Homological Algebra

The main result of homological algebra needed in the BRST construction deals with a differential complex $C$ with two gradings. The first grading is an $\mathbb{N}$-degree and is called the resolution degree, or r-degree. The second grading is a $\mathbb{Z}$-degree and is called the total ghost number. It is denoted by gh. We assume that there are two odd derivations $\delta$ and $s_0$ that have the following properties:

$$\begin{aligned} r(\delta) &= -1, & \mathrm{gh}(\delta) &= 1 \\ r(s_0) &= 0, & \mathrm{gh}(s_0) &= 1 \end{aligned} \qquad [1]$$

and

$$\delta^2 = 0, \qquad \delta s_0 + s_0 \delta = 0, \qquad s_0^2 = -[\delta, s_1] \qquad [2]$$
for some derivation $s_1$ of r-degree 1 and ghost number 1. The bracket $[\ ,\ ]$ is the graded commutator (in this specific case, the anticommutator). We also assume that the homology of $\delta$ vanishes at nonzero value of the r-degree, both in the original complex $C$,
\[ H_k(\delta, C) = 0, \qquad k > 0 \tag{3} \]
(which is equivalent to $\delta a = 0,\ r(a) > 0 \Rightarrow a = \delta b$) and in the space of derivations,
\[ [\delta, \alpha] = 0, \quad r(\alpha) \neq 0 \ \Rightarrow\ \alpha = [\delta, \beta] \tag{4} \]
where $\alpha$ and $\beta$ are both derivations in $C$. The r-degree of a homogeneous linear operator $\alpha$ is defined through $r(\alpha(x)) = r(\alpha) + r(x)$ for any element $x \in C$ and is negative when $\alpha$ decreases the r-degree.
In $H_0(\delta, C)$, the (odd) derivation $s_0$ defines a differential. The cohomology of $s_0$ modulo $\delta$, denoted $H^k(s_0, H_0(\delta, C))$, is the cohomology of $s_0$ in $H_0(\delta, C)$. It is explicitly defined through the cocycle condition
\[ s_0 a = \delta m \tag{5} \]
with coboundaries of the form
\[ s_0 b + \delta n \tag{6} \]
The central result underlying the BRST construction is:
Theorem 1 Given the above setting, there exists an odd derivation $s$ in $C$ with the following properties:
\[ s = \delta + s_0 + s_1 + \cdots \tag{7} \]
\[ r(s_k) = k, \qquad \mathrm{gh}(s_k) = 1 \tag{8} \]
\[ s^2 = 0 \tag{9} \]
Furthermore, one has
\[ H^k(s, C) \simeq H^k(s_0, H_0(\delta, C)) \tag{10} \]
The proof is straightforward (see, e.g., Henneaux and Teitelboim (1992)). In particular, the proof of [10] is a standard spectral sequence argument with a sequence that collapses after the second step. It is interesting to note that, contrary to $s_0$, which is only a differential modulo $\delta$, $s$ is a true differential. The construction of $s$ provides a model for $H^k(s_0, H_0(\delta, C))$. The differential $s$ is not unique, but this does not affect the subsequent discussion.
In physical applications, the total ghost number is a derived quantity. The primary gradings are the resolution degree and the filtration degree called the pure ghost number and denoted pgh. It is an N-degree and one has
\[ \mathrm{gh} = \mathrm{pgh} - r \tag{11} \]
The r-degree is known as the antighost or antifield number, depending on the context (see below). When $r(x) = 0$, one has $\mathrm{gh}(x) = \mathrm{pgh}(x)$. Since the pure ghost number is non-negative, this implies that
\[ H^k(s, C) = 0, \qquad k < 0 \tag{12} \]

A Geometric Application
Geometric Setting
Theorem 1 is relevant to the following situation. Consider a surface $\Sigma$ in a manifold $M$, defined by equations
\[ f_a = 0 \tag{13} \]
which may or may not be independent. (We assume for definiteness that the variables in $M$ are bosonic, that is, that $M$ is an ordinary manifold as opposed to a supermanifold. The graded case can be covered without difficulty by including appropriate sign factors at the relevant places.) Assume that $\Sigma$ is partitioned by orbits generated by vector fields $X_\alpha$ defined everywhere in $M$, tangent to $\Sigma$ and closing on $\Sigma$ in the Lie bracket,
\[ [X_\alpha, X_\beta] = C^{\gamma}{}_{\alpha\beta} X_\gamma + \text{"more"} \tag{14} \]
where "more" denotes terms that vanish on $\Sigma$. We assume, for simplicity, that the vector fields $X_\alpha$ are linearly independent on $\Sigma$, although this is not necessary. The formalism can be developed in the nonindependent case, but it then requires more variables. We are interested in the quotient space $\Sigma/O$ of the surface $\Sigma$ by the orbits. To guide the geometrical intuition, we shall assume that this quotient space is a smooth manifold (the fiber of the orbits, etc.), and we shall suggestively adopt notations adapted to this best possible case. The approach, being purely algebraic, is in fact more general. (Accordingly, the notations should be understood with a liberal mind.)
The aim here is to describe the algebra of observables, that is, the algebra $C^\infty(\Sigma/O)$ of functions on the quotient space $\Sigma/O$. The terminology "observables" anticipates the physical situation discussed below, where the orbits are the gauge orbits.
In order to describe algebraically the algebra of observables, one observes that this algebra is obtained

through a two-step procedure. First, one restricts the functions from $M$ to $\Sigma$. Second, one imposes the invariance condition along the orbits. To each of these steps corresponds a separate differential.

Longitudinal Complex
The longitudinal complex is associated with the second step. One can consider on $\Sigma$ an exterior derivative operator $D$ along the gauge orbits. This operator is defined on functions on $\Sigma$ as
\[ D f = (X_\alpha f)\, C^\alpha \tag{15} \]
where the 1-forms $C^\alpha$ dual to the $X_\alpha$'s are called ghosts. In the physical context, the form-degree is the pgh described earlier, and so $\mathrm{pgh}(C^\alpha) = 1$. The action of $D$ on the ghosts is given by
\[ D C^\gamma = \tfrac{1}{2}\, C^{\gamma}{}_{\alpha\beta}\, C^\beta C^\alpha \tag{16} \]
The longitudinal complex $L$ is the complex of exterior forms along the gauge orbits. In the representation used here, it is given by the space of polynomials in the ghosts $C^\alpha$ with coefficients that are functions on $\Sigma$. The exterior derivative $D$ is defined on this space by extending the formulas [15] and [16] so that it is an odd derivation. One clearly has (on $\Sigma$)
\[ D^2 = 0 \tag{17} \]
The functions on the quotient space $\Sigma/O$ are just the elements of the zeroth cohomological group $H^0(D, L)$,
\[ H^0(D, L) \simeq C^\infty(\Sigma/O) \tag{18} \]
In general, $H^k(D, L) \neq 0$.

Koszul-Tate Differential $\delta$
The Koszul-Tate differential $\delta$ implements the first step in the reduction procedure. More precisely, it provides an algebraic resolution of the algebra $C^\infty(\Sigma)$ of the smooth functions on the surface $\Sigma$. That algebra can be identified with the quotient algebra
\[ C^\infty(\Sigma) = C^\infty(M)/N \tag{19} \]
where $N$ is the ideal of functions that vanish on $\Sigma$. The Koszul-Tate complex $K$ is defined by adding one new generator for each equation $f_a = 0$ defining $\Sigma$, denoted $t_a$ and assigned r-degree 1. In the algebra $C^\infty(M) \otimes \Lambda(t_a)$ (where $\Lambda(t_a)$ is the exterior algebra on the $t_a$), one defines $\delta$ through
\[ \delta f = 0 \quad \forall f \in C^\infty(M), \qquad \delta t_a = f_a \tag{20} \]
and extends it as an odd derivation. It is clear that $r(\delta) = -1$ and that $\delta^2 = 0$. Because the functions on $M$ are annihilated by $\delta$, they are clearly cycles at r-degree zero. Because the left-hand sides $f_a$ of the equations $f_a = 0$ are exact (equal to $\delta t_a$), the ideal $N$ coincides with the set of boundaries in degree zero. Thus,
\[ H_0(\delta, K) \simeq C^\infty(\Sigma) \tag{21} \]
We see accordingly that $\delta$ successfully enforces the restriction to the surface $\Sigma$ through its homology in degree zero.
However, if the equations $f_a = 0$ are not independent, this is not the end of the story. Indeed, any identity $Z^a_A f_a = 0$ on the functions $f_a$ leads to a nontrivial cycle $Z^a_A t_a$ in r-degree 1, $\delta(Z^a_A t_a) = 0$. This is undesirable. To cure this drawback, one introduces further generators $t_A$ in r-degree 2, one for each identity $Z^a_A f_a = 0$, and defines
\[ \delta t_A = Z^a_A t_a, \qquad r(t_A) = 2 \tag{22} \]
in order to kill the unwanted cycles $Z^a_A t_a$. The Koszul complex $K$ is thus enlarged to contain these new (even) variables and redefined as
\[ K = C^\infty(M) \otimes \Lambda(t_a) \otimes S(t_A) \tag{23} \]
where $S(t_A)$ is the symmetric algebra in the $t_A$. The operator $\delta$ is extended to $K$ as an odd derivation. One has $\delta^2 = 0$ and the property [21] is unaffected by the inclusion of the new generators. Furthermore, by construction,
\[ H_1(\delta, K) = 0 \tag{24} \]
If there is no identity on the identities, we shall assume that the process stops. Otherwise, one needs to introduce further generators in r-degree 3 and possibly higher. When all the appropriate variables are included, there is no homology at higher r-degree. Thus,
\[ H_k(\delta, K) = 0, \qquad k > 0 \tag{25} \]

Combining $\delta$ with $D$
We now turn to the problem of combining the Koszul-Tate complex with the longitudinal complex, so as to implement the full reduction. To that end, we define $C$ by adding the ghosts to $K$,
\[ C = K \otimes \Lambda(C^\alpha) \tag{26} \]
We then extend the action of the Koszul-Tate differential in the simplest way which preserves all gradings, namely
\[ \delta C^\alpha = 0 \tag{27} \]
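The effect of the extension [27] on the homology of $\delta$ can be made explicit with a short computation. What follows is only a sketch under the assumptions already stated in the text ($\delta$ is an odd derivation, it annihilates the ghosts, and the homology of $\delta$ in $K$ is concentrated in degree zero); the coefficient notation $x_{\alpha_1 \dots \alpha_p}$ is introduced here purely for illustration.

```latex
% Sketch: homology of \delta in C = K \otimes \Lambda(C^\alpha), assuming \delta C^\alpha = 0.
% Expand a general element of C in ghost monomials, with totally antisymmetric
% coefficients x_{\alpha_1 ... \alpha_p} taken in K (illustrative notation):
\[
  x = \sum_{p \ge 0} \frac{1}{p!}\, C^{\alpha_1} \cdots C^{\alpha_p}\, x_{\alpha_1 \dots \alpha_p},
  \qquad x_{\alpha_1 \dots \alpha_p} \in K .
\]
% \delta is odd and annihilates the ghosts, so it only acts on the coefficients,
% picking up a sign (-1)^p when it is moved through p ghosts:
\[
  \delta x = \sum_{p \ge 0} \frac{(-1)^p}{p!}\, C^{\alpha_1} \cdots C^{\alpha_p}\,
  \delta x_{\alpha_1 \dots \alpha_p} .
\]
% The ghost monomials are linearly independent, so x is a cycle (resp. a boundary)
% exactly when each coefficient is a cycle (resp. a boundary) in K:
\[
  H_k(\delta, C) \simeq H_k(\delta, K) \otimes \Lambda(C^\alpha) .
\]
% Using H_0(\delta, K) \simeq C^\infty(\Sigma) and H_k(\delta, K) = 0 for k > 0:
\[
  H_0(\delta, C) \simeq C^\infty(\Sigma) \otimes \Lambda(C^\alpha) = L,
  \qquad H_k(\delta, C) = 0 \quad (k > 0) .
\]
```

In words, the ghosts are spectators for $\delta$, so the homology in $C$ is simply the homology in $K$ with ghost-polynomial coefficients, which is the longitudinal complex described above.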

It is clear that the homology of $\delta$ in $C$ is given by
\[ H_0(\delta, C) \simeq L, \qquad H_k(\delta, C) = 0 \quad (k > 0) \tag{28} \]
One can also extend the longitudinal derivative $D$ to the whole complex $C$ because the vector fields $X_\alpha$ are defined throughout $M$ and so, the definitions [15] and [16] make sense in $C$. One defines the action of $D$ on the generators $t$ by requiring that
\[ D\delta + \delta D = 0 \tag{29} \]
This is easily verified to be possible. However, the (odd) derivation so obtained fails to be a differential in $C$ when the vector fields $X_\alpha$ do not close off the surface $\Sigma$. In that case, the gauge transformations are not integrable off $\Sigma$; one says that they form an "open algebra". One has then $D^2 = 0$ only on $\Sigma$, or, more precisely,
\[ D^2 = -\delta s_1 - s_1 \delta \tag{30} \]
for some (odd) derivation $s_1$ (that vanishes in the closed algebra case). But this situation is precisely the one discussed earlier, with the Koszul-Tate differential being indeed $\delta$, as anticipated by the notation, and the longitudinal differential $D$ playing the role of $s_0$ (the degrees also match). Applying the theorem discussed there, we can conclude:
Theorem 2 There exists a differential $s$ in $C$,
\[ s = \delta + D + s_1 + \cdots, \qquad s^2 = 0 \tag{31} \]
such that
\[ H^0(s, C) \simeq C^\infty(\Sigma/O) \tag{32} \]
This is an immediate consequence of Theorem 1 and eqns [18] and [28]. The differential $s$ is known in the physical applications described below as the BRST differential.

Hamiltonian BRST Construction
As a first application of the above setting, we consider the Hamiltonian description of gauge systems. As already known, gauge systems are characterized in the Hamiltonian description by constraints and, for this reason, are called constrained Hamiltonian systems. Furthermore, the gauge transformations generate gauge orbits on the constraint surface and the physical observables are the functions on the quotient space of the constraint surface by the gauge orbits.
A further important feature arises in the Hamiltonian formalism: the gauge transformations are canonical transformations that are generated by the first-class constraints. Assuming that all the second-class constraints have been eliminated and that the bracket being used is the Dirac bracket, one sees that there is a vector field $X_\alpha$ for each constraint function $f_a$, $\alpha \leftrightarrow a$. (The functions $f_a$ are thus assumed to be independent since the vector fields $X_\alpha$ are assumed to be so. If not, further variables are needed, but the analysis proceeds along the same ideas.)
This implies, in turn, that there is a pairing between the ghosts $C^a$ associated with the longitudinal exterior derivative and the generators $t_a$ of the Koszul-Tate complex. This pairing enables one to extend the bracket structure defined on the phase space to the pairs $(C^a, t_a)$ by declaring that these are canonically conjugate. The variables $t_a$ are the momenta conjugate to the ghosts, $[t_a, C^b] = \delta_a^b$. Accordingly, the complex $C$ relevant to the Hamiltonian situation,
\[ C = C^\infty(P) \otimes \Lambda(C^a) \otimes \Lambda(t_a) \tag{33} \]
has a phase-space structure (here, $P \equiv M$ is the manifold obtained after eliminating the second-class constraints, equipped with the Dirac bracket). The space $C$ is known as the extended phase space. The r-degree is called antighost number in the Hamiltonian context.
By the general theorem described in the previous section, one knows that the cohomology at gh = 0 of the BRST differential is isomorphic to the algebra of the observables. Thus, there are two alternative ways to describe this physical algebra, either through reduction, by eliminating the redundant (gauge) variables, or cohomologically in an extended space containing additional variables, the ghosts, and their momenta.
There is an additional interesting feature of the BRST construction in the Hamiltonian case: the BRST transformation is a canonical transformation in the extended phase space, in the sense that
\[ sF = [\Omega, F] \tag{34} \]
for some BRST generator $\Omega$ of ghost number 1 ($F, \Omega \in C$). The nilpotency $s^2 = 0$ of the BRST differential is equivalent to
\[ [\Omega, \Omega] = 0 \tag{35} \]
That $s$ is canonically generated implies that the cohomological BRST groups come with a natural bracket structure: the Poisson bracket of the extended phase space passes on to the BRST cohomological groups. In particular, $H^0(s, C)$, equipped with this bracket structure, is isomorphic (as Poisson algebra) to the algebra of physical observables.
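The simplest instance of a canonically generated BRST differential may help fix ideas. The following is a minimal sketch, not taken from the article: it assumes abelian first-class constraints, $[f_a, f_b] = 0$ everywhere on $P$, the normalization $[t_a, C^b] = \delta_a^b$ for the ghost pairs, and vanishing brackets between the ghost variables and functions on $P$. Overall signs depend on the conventions chosen for the graded bracket and should be read with that in mind.

```latex
% Illustrative assumptions (not from the article): abelian constraints [f_a, f_b] = 0,
% normalization [t_a, C^b] = \delta_a^b, and [C^a, F] = [t_a, F] = 0 for F in C^\infty(P).
% Candidate generator, of pure ghost number 1 and antighost number 0:
\[
  \Omega = C^a f_a , \qquad \mathrm{gh}(\Omega) = 1 .
\]
% Action on the generators, using sF = [\Omega, F] and the graded Leibniz rule:
\[
  s\, t_a = [C^b f_b,\, t_a] = [C^b, t_a]\, f_b = f_a , \qquad
  s\, C^a = [C^b f_b,\, C^a] = 0 , \qquad
  s\, F = C^a\, [f_a, F] \quad (F \in C^\infty(P)) .
\]
% Nilpotency: the only term that could survive is proportional to [f_a, f_b]:
\[
  [\Omega, \Omega] = C^a C^b\, [f_a, f_b] = 0 .
\]
```

Under these assumptions $s t_a$ reproduces the Koszul-Tate action $\delta t_a = f_a$, while $sF$ and $sC^a$ reproduce the longitudinal derivative along the Hamiltonian vector fields of the constraints (with vanishing structure functions), so that $s = \delta + D$ and no $s_1$ is needed. For constraints that close with nonvanishing structure functions, $C^a C^b [f_a, f_b]$ no longer vanishes and $\Omega$ has to be completed by terms involving the momenta $t_a$.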

Lagrangian BRST Construction
The analysis of the Lagrangian BRST construction, due to Batalin and Vilkovisky (1981) (antifield formalism), proceeds in the same way because the covariant description of the space of observables involves also the same geometric ingredients. The surface $\Sigma$ is now the stationary surface, that is, the space of solutions to the equations of motion. The space $M$ in which it is embedded is the space of all field histories. The gauge symmetry acts on this space. Furthermore, the gauge vector fields are tangent to $\Sigma$ since a solution is mapped on a solution by a gauge transformation. The integral submanifolds are the gauge orbits. The observables are the functions on the quotient space.
Since the equations of motion follow from an action principle, there are as many equations as there are fields $\phi^i$. The corresponding generators $t_a$ in the Koszul-Tate complex (at degree 1) are called antifields conjugate to the fields and are denoted $\phi^*_i$. The r-degree is known as antifield (or also antighost) number. The gauge symmetry of the action implies Noether identities on the equations of motion. These are, therefore, not independent. According to the above general discussion, there are further generators in the Koszul-Tate complex, at degree 2. More precisely, there are as many new generators in degree 2 as there are Noether identities or independent gauge symmetries. These are called antifields conjugate to the ghosts and denoted $C^*_\alpha$. In the longitudinal complex, one has the ghosts $C^\alpha$, with as many ghosts as there are gauge symmetries. Thus, the BRST complex is the space
\[ C = C^\infty(M) \otimes \Lambda(C^\alpha) \otimes \Lambda(\phi^*_i) \otimes S(C^*_\alpha) \tag{36} \]
where $M$ is the space of all field histories. There is now a natural pairing between the original field variables $\phi^i$ and the antifields $\phi^*_i$, as well as between the ghosts $C^\alpha$ and the antifields $C^*_\alpha$. One thus defines a bracket in which the fields $\phi^i$ and the ghosts $C^\alpha$ on the one hand, and the antifields $\phi^*_i$ and $C^*_\alpha$ on the other, are declared to be conjugate. This bracket is denoted by parentheses,
\[ (\phi^i, \phi^*_j) = \delta^i_j, \qquad (C^\alpha, C^*_\beta) = \delta^\alpha_\beta \tag{37} \]
However, since the bracket pairs variables with ghost numbers that add up to $-1$, it is in fact an odd bracket, called the antibracket.
The BRST differential is again canonically generated, but this time in the antibracket,
\[ sF = (S, F), \qquad F \in C \tag{38} \]
where the generator $S$ is an even function of the fields, the ghosts and the antifields, with gh = 0 (the ghost number is carried by the odd antibracket). The nilpotency $s^2 = 0$ of the BRST differential is equivalent to the crucial master equation,
\[ (S, S) = 0 \tag{39} \]
Because the BRST differential is canonically generated, there is a natural bracket in cohomology. This bracket is not the Poisson bracket of observables (at gh = 0) because it changes the ghost number by one unit. One can, however, relate it to the Poisson bracket of observables (Barnich and Henneaux 1996); furthermore, it plays an important role in the study of the consistent deformations of the action.

Spacetime Locality
In the context of local field theory, one is often interested in a particular class of functions of the field histories, namely the so-called space of local functionals. A local functional is, by definition, the integral of a local n-form (where $n$ is the spacetime dimension). A local n-form reads, in local coordinates,
\[ \omega = f(x)\, d^n x \tag{40} \]
where $f(x)$ depends on the fields at $x$ as well as on a finite number of their derivatives. When the ghosts and the antifields are included, the local functions depend on them in the same way.
The previous general cohomological result was derived in the space of all function(al)s, without locality restriction. When changing the space of cochains, one may change the cohomology. For instance, a local functional which is BRST-trivial in the space of all functionals may become nontrivial in the space of local functionals. This indeed happens here because the homology of the Koszul-Tate differential usually no longer vanishes at strictly positive r-degree in the space of local functionals, where it is related to local conservation laws. As a result, the analysis of the BRST cohomology in the space of local functionals is an interesting and nontrivial problem. In particular, the cohomological groups $H^k(s)$ in the space of local functionals may not vanish at negative ghost numbers.

BRST Quantization
The quantization of a dynamical system can proceed along different lines. For gauge models, the path-integral approach is most efficiently pursued in the context of the antifield formalism. We shall briefly outline here the general principles underlying the

operator approach, which is based on the Hamiltonian formalism.
In the operator approach, all the variables, including the ghosts and the conjugate momenta, are realized as operators in a space endowed with a nonpositive-definite inner product (because of the ghosts and the gauge modes). Real dynamical variables become formally Hermitian operators. Ignoring anomalies, the BRST generator $\Omega$ becomes an operator that fulfills the conditions
\[ \Omega^* = \Omega, \qquad \Omega^2 = 0 \tag{41} \]
(which allows for nontrivial solutions $\Omega \neq 0$ because the inner product is not positive definite). The second relation is a consequence of the classical Poisson bracket relation $[\Omega, \Omega] = 0$ and the fact that the graded Poisson bracket of two odd objects becomes the anticommutator.
To remove the ghost and gauge redundancy, which has no physical content, one must impose a condition that selects physical states. The appropriate condition is motivated by the general cohomological result connecting the BRST cohomology with the algebra of physical observables. One imposes the condition
\[ \Omega |\psi\rangle = 0 \tag{42} \]
Because of [41], states of the form $\Omega|\chi\rangle$ are solutions of [42], but they have a vanishing inner product with any other physical states, including themselves. They are called null states. The physical states are given by the BRST state cohomology. The physical operators are given by the BRST operator cohomology at gh = 0 and induce a well-defined action in the state cohomology. In particular, the Hamiltonian, being gauge invariant in the original theory, is represented by a BRST cohomological class, so that the time evolution maps physical states on physical states.
The whole scheme is (formally) consistent because exact BRST operators have vanishing matrix elements between states annihilated by the BRST operator $\Omega$, while null states $\Omega|\chi\rangle$ are such that $\langle\psi|A\,\Omega|\chi\rangle = 0$ whenever $A$ is a BRST-closed operator, $[A, \Omega] = 0$, and $|\psi\rangle$ a physical state. Problems may arise, however, if the classical relations $[\Omega, \Omega] = 0$ and $[H, \Omega] = 0$ are not satisfied in presence of extra terms of order $\hbar$, that is,
\[ \Omega^2 \neq 0 \quad \text{or} \quad H\Omega - \Omega H \neq 0 \tag{43} \]
In such cases, one says that there are anomalies. These are usually fatal to the consistency of the theory.

Some Applications
The number of applications of the BRST formalism is so large that it would be out of place to try to be exhaustive here. Some of its main successes are outlined here, with suggestions for Further reading.

Renormalization of Gauge Theories
First, there is the original context of perturbative renormalization and anomalies for gauge theories of the Yang-Mills type. The relevant cohomology here is the BRST cohomology in the space of local functionals involving the fields, the ghosts, and the antifields. The antifields are also known in this context as Zinn-Justin sources for the BRST variations of the fields and ghosts, since Zinn-Justin was the first to introduce them (with that meaning). Many authors have contributed to the full computation of the local BRST cohomology. A review is given in Barnich et al. (2000), where extensions to other theories are also indicated.

String Theory
Modern string theory would be inconceivable without the BRST formalism. This started with the pioneering paper by Kato and Ogawa (1983), where the critical dimension of the bosonic string was derived from the condition that $\Omega^2$ should vanish (quantum mechanically), and where it was shown that the string physical states could be identified with the state BRST cohomology. The reader is referred to excellent monographs on modern string theory (see Further reading).

Deformations of Gauge Models
The study of consistent deformations of a given gauge theory (i.e., the problem of introducing consistent couplings) is also efficiently dealt with in the BRST context. References to applications may be found in Henneaux (1998).

See also: Anomalies; Batalin-Vilkovisky Quantization; BF Theories; Constrained Systems; Functional Integration in Quantum Physics; Graded Poisson Algebras; Indefinite Metric; Perturbative Renormalization Theory and BRST; Quantum Chromodynamics; Quantum Field Theory: A Brief Introduction; Renormalization: General Theory; String Field Theory; Supermanifolds; Topological Sigma Models.

Further Reading
Barnich G, Brandt F, and Henneaux M (2000) Local BRST cohomology in gauge theories. Physics Reports 338: 439.
Barnich G and Henneaux M (1996) Isomorphisms between the Batalin-Vilkovisky antibracket and the Poisson bracket. Journal of Mathematical Physics 37: 5273.

Batalin IA and Vilkovisky GA (1977) Relativistic S-matrix of dynamical systems with boson and fermion constraints. Physics Letters B69: 309.
Batalin IA and Vilkovisky GA (1981) Gauge algebra and quantization. Physics Letters B102: 27.
Becchi C, Rouet A, and Stora R (1976) Renormalization of gauge theories. Annals of Physics, NY 98: 287.
Fradkin ES and Vilkovisky GA (1975) Quantization of relativistic systems with constraints. Physics Letters B55: 224.
Green MB, Schwarz JH, and Witten E (1987) Superstring Theory, vols. 1 and 2. Cambridge: Cambridge University Press.
Henneaux M (1998) Consistent interactions between gauge fields: the cohomological approach. Contemporary Mathematics 219: 93.
Henneaux M and Teitelboim C (1992) Quantization of Gauge Systems. Princeton: Princeton University Press.
Kato M and Ogawa K (1983) Covariant quantization of strings based on BRS invariance. Nuclear Physics B212: 443.
Kugo T and Ojima I (1979) Local covariant operator formalism of nonabelian gauge theories and quark confinement problem. Progress of Theoretical Physics (Suppl.) 66: 1.
Polchinski J (1998) String Theory, vols. 1 and 2. Cambridge: Cambridge University Press.
Stasheff JD (1998) The (secret?) homological algebra of the Batalin-Vilkovisky approach. Contemporary Mathematics 219: 195.
Tyutin IV (1975) Gauge invariance in field theory and statistical physics in the operator formalism. Preprint Lebedev-75-39.
C
C*-Algebras and their Classification
G A Elliott, University of Toronto, Toronto, Canada
© 2006 Elsevier Ltd. All rights reserved.

The study of algebras of Hilbert space operators, closed under the adjoint operation and in the weak operator topology, was begun by John von Neumann shortly after the discovery of quantum mechanics, and partly with the aim of understanding the monolithic ideas proposed by Heisenberg and Schrödinger.
Seventy-five years later, the theory of these algebras has become a monolith in its own right (see von Neumann Algebras: Introduction, Modular Theory and Classification Theory; von Neumann Algebras: Subfactor Theory), with more internal structure and with more external reference, to physics and, as it turns out, to other areas of mathematics, than could possibly have been imagined at the outset. (The most striking example of an application to mathematics is perhaps the discovery of the Jones knot polynomial (see The Jones Polynomial); note that this has also had repercussions for physics.)
Twenty-five years after the beginning of the theory of von Neumann algebras, as these algebras are now called, Gelfand and Naimark noticed that a second class of algebras of operators on a Hilbert space, closed under the adjoint operation, was worthy of study, namely those closed in the norm topology. Gelfand and Naimark made two important discoveries concerning this class of operator algebras, now called C*-algebras.
First, Gelfand and Naimark showed that, in the commutative case at least, when the C*-algebra is considered only up to isomorphism (with its identity as a concrete algebra of operators suppressed), the information contained in a C*-algebra is purely topological. More precisely, Gelfand and Naimark showed that the category of unital commutative C*-algebras, with unit-preserving algebra homomorphisms (these necessarily preserve the adjoint operation), is equivalent in a contravariant way (i.e., with reversal of arrows) to the category of compact Hausdorff spaces, with continuous maps. The compact space associated with a unital commutative C*-algebra under the Gelfand-Naimark correspondence may be viewed as the space of maximal proper ideals, with a natural topology (the hull-kernel, or Jacobson, topology), and is called the spectrum. This space may also be viewed as the set of (unital, linear, multiplicative) maps from the algebra into the complex numbers, in which case the topology is that of pointwise convergence.
Second, using this result, Gelfand and Naimark proved that arbitrary C*-algebras could be axiomatized in a simple way abstractly, as *-algebras (that is, as algebras over the complex numbers with a conjugate linear anti-automorphism of order 2) with certain special properties. It is now known that the only property that needs to be assumed is the existence of a (necessarily unique) Banach space norm related to the *-algebra structure by means of the so-called C*-algebra identity:
\[ \|x^* x\| = \|x^*\|\, \|x\| \tag{1} \]
This is clearly related to and in fact implies the normed algebra inequality
\[ \|x y\| \le \|x\|\, \|y\| \tag{2} \]
One reason that the Gelfand-Naimark axiomatization of C*-algebras is important is that it underlines how natural it is to consider a C*-algebra abstractly, i.e., independently of any particular representation. Indeed, while one of the fundamental phenomena of von Neumann algebra theory (discovered by Murray and von Neumann) is that, essentially (in rather a strong sense), there is only one way to represent a given von Neumann algebra on a Hilbert space (and there is even a canonical way, called the standard representation!), it is an equally fundamental phenomenon of C*-algebra theory that, except in extremely special cases, this is no longer true.
For instance, although the C*-algebra of compact operators on a given Hilbert space has, up to unitary equivalence, only a single irreducible representation (this is what underlies the fact, proved by von Neumann, referred to as the uniqueness of the

Heisenberg commutation relations for a quantum-mechanical system with finitely many degrees of freedom), as soon as one considers a physical system with infinitely many degrees of freedom, one finds that the naturally associated C*-algebra has infinitely many (indeed, uncountably many) unitary equivalence classes of irreducible representations, and it is impossible to parametrize these in any reasonable way.
This striking dichotomy presents itself also in other contexts, more elementary perhaps than the physics of infinitely many degrees of freedom. Consider the dynamical system consisting of a circle and a fixed rotation acting on it. If the rotation is of finite order (i.e., if the angle is a rational multiple of $2\pi$), then the naturally associated C*-algebra is relatively easy to study. In the case of angle zero, it is the unital commutative C*-algebra with Gelfand-Naimark spectrum the torus. In the general case of a rational angle, the space of unitary equivalence classes of irreducible representations is still naturally parametrized by the torus. (And this is the same as the space of primitive ideals, the kernels of the irreducible representations, with the Jacobson topology.)
In the irrational case, the case of a rotation by an irrational multiple of $2\pi$ (still elementary from a geometrical point of view; note that the calendar is based on such a system!), the irreducible representations are no longer parametrized up to unitary equivalence by the torus and the space of primitive ideals consists of a single point: the C*-algebra is simple. (But it is decidedly not simple to study!)
This fundamental dichotomy in the classification of C*-algebras (conjectured by Gaarding and Wightman in the quantum-mechanical setting and by Mackey in the geometrical one) was established by Glimm. Glimm proved (in the setting of separability; most of his results were generalized later to the nonseparable case) that a large number of a priori different ways that a C*-algebra could behave well were in fact one and the same behavior: either all present for a given C*-algebra, or all catastrophically absent!
Some of the properties considered by Glimm, and shown to be equivalent (for a separable C*-algebra), were as follows. First of all, every representation of the C*-algebra on a Hilbert space should be of type I, i.e., should generate a von Neumann algebra of type I. (A von Neumann algebra was said by Murray and von Neumann to be of type I if it contained a minimal projection of central support one, i.e., a projection not contained in a proper direct summand and minimal with this property.) Second, in every irreducible representation (not necessarily injective) on a Hilbert space, the image of the C*-algebra should contain the compact operators. Third, any two irreducible representations with the same kernel should be unitarily equivalent. Fourth, it should be possible to parametrize the unitary equivalence classes of irreducible representations by a real number in a natural way (respecting the natural Borel structure introduced by Mackey).
The first of the equivalent properties listed above, that all representations of a C*-algebra should be of type I, suggested a name for the property: that the C*-algebra itself should be of type I. This property of a C*-algebra, identified by Glimm, or rather its opposite, which as mentioned above is much more common (just as irrational numbers are more common than rationals, or systems with infinitely many degrees of freedom are, at least in theory, much more common than those with finitely many degrees of freedom), is a fundamental unifying principle of nature.
Besides commutative C*-algebras (as mentioned above, just another way of looking at topological spaces; compact Hausdorff spaces, that is) and besides the C*-algebra associated to a rotation or to a physical system with infinitely many degrees of freedom, what are some of the naturally occurring examples of C*-algebras, of type I or not?
First, let us take a closer look at what arises from a system with infinitely many degrees of freedom in the fermion case. As shown by Jordan and Wigner, one obtains what, as a C*-algebra, is very easy to describe, namely, just the infinite tensor product (in the category of unital C*-algebras) of copies of the algebra of $2 \times 2$ matrices over the complex numbers. As it happens, in work earlier than that referred to above, Glimm had considered such infinite tensor product C*-algebras, also allowing the components to be matrix algebras of order different from two. This raised a problem of classification for those C*-algebras, all of which were simple and not of type I. (The only simple unital C*-algebra of type I is a single matrix algebra, or a finite tensor product of matrix algebras!)
In a pioneering classification paper (the first paper on the classification of C*-algebras being perhaps that of Gelfand and Naimark, in which the commutative case was described), Glimm obtained the classification of infinite tensor products of matrix algebras, showing that it was a direct extension of the classification of finite tensor products, i.e., just of the matrix algebras themselves. As described later by Dixmier, Glimm's classification was as follows. Given a sequence $n_1, n_2, \ldots$ of natural numbers (equal to one or more), form the infinite product in a natural way just by keeping track of the total number of times each prime number appears in the

finite products $n_1 \cdots n_k$ (a multiplicity which may be either finite or infinite). Call such a formal infinite product a generalized integer or, perhaps, a supernatural number! Two (countably) infinite tensor products of matrix algebras are isomorphic (just as in the finite tensor product case) if and only if the corresponding supernatural numbers are equal.

In formulating Glimm's classification of infinite tensor products of matrix algebras in this way, Dixmier pointed out that each supernatural number determines a subgroup of the rational numbers (those with denominator dividing the supernatural number) and that every subgroup of the rational numbers containing the integers arises in this way. He then gave an alternative derivation of Glimm's theorem by recovering this subgroup of the rational numbers as a natural invariant of the algebra, namely, as the subgroup generated by the values on projections of the unique normalized trace. (By a trace is meant here a unitarily invariant positive linear functional.) This could even be interpreted as an alternative statement of Glimm's theorem.

Soon afterwards, Bratteli considered an extension of Glimm's class of C*-algebras, namely, the inductive limits of arbitrary sequences of finite-dimensional C*-algebras, and gave a classification of these algebras in terms of the embedding multiplicity data in the sequences. This was exactly analogous to the original classification of Glimm, but now vastly more complex, with the multiplicity data of the sequence encoded in what is now called a Bratteli diagram. (Note that a finite-dimensional C*-algebra is just a direct sum of matrix algebras over the complex numbers.) Bratteli diagrams have proved to be very important, and in particular have been shown by Putnam and others to be useful for the study of minimal homeomorphisms of the Cantor set.

Bratteli's extension of Glimm's tensor product classification was followed by a corresponding extension by the present author of Dixmier's approach to Glimm's result. It was no longer possible to express the appropriate data in terms of traces (even in the case of a unique normalized trace). Instead, the present author recalled the concept of equivalence of projections introduced by Murray and von Neumann forty years earlier, together with the fact, proved by Murray and von Neumann, that equivalence is compatible with addition of orthogonal projections. (Two projections in a *-algebra are equivalent if they are equal to $x^*x$ and $xx^*$ for some element $x$.) The resulting elementary invariant, namely the set of equivalence classes of projections with the operation of addition whenever defined (whenever the equivalence classes to be added have orthogonal representatives), which one might refer to as a local abelian semigroup and which was used by Murray and von Neumann to divide von Neumann algebras into what they called types I, II, and III, was shown by the author to determine Bratteli's algebras up to isomorphism. Bratteli called his algebras approximately finite-dimensional C*-algebras, or AF algebras. The author referred to his invariant simply as the range of the (abstract) dimension, and pointed out that this structure determined an enveloping ordered abelian group, which he called the dimension group. It was soon noticed that the dimension group was related to the K-group introduced by Grothendieck in algebraic geometry (see K-Theory), and by Atiyah and Hirzebruch (see K-Theory) in topology.

Grothendieck's K-group was defined for an arbitrary ring with unit, and Atiyah and Hirzebruch in effect considered the special case of the ring of continuous functions on a compact Hausdorff space (in other words, a commutative C*-algebra), in the process showing that the deep phenomenon of Bott periodicity could be expressed in terms of this invariant. The invariant itself (see below) is essentially the same as that of Murray and von Neumann. In the special case that the ring is an AF algebra, the K-group coincides with the dimension group. (The K-group has a natural ordered, or pre-ordered, structure, although this was often suppressed.)

Let us consider the definition of the K-group of a not necessarily unital C*-algebra; it is in this setting that the statement of Bott periodicity attains its simplest form.

First, in the unital case, one constructs the abelian local semigroup (addition just partially defined) of Murray–von Neumann equivalence classes of projections, as described above in the case of an AF algebra. Let us call this the dimension range. As stated above, for AF algebras this is all that needs to be done: the enveloping group of the dimension range is already the K-group. In the general case, one must repeat the construction for the algebra of $2 \times 2$ matrices over the given algebra, with the given algebra considered as embedded as the upper left-hand corner of the matrix algebra. The dimension range of the given algebra then maps naturally into (but not necessarily onto) the dimension range of the matrix algebra. One should then repeat this construction, doubling the order of the matrix algebra at every stage (or, alternatively, increasing it just by one). The enveloping group of the (algebraic) inductive limit of this sequence of local semigroups is then the K-group of the given algebra. (Alternatively, one may just consider immediately the *-algebra of all infinite matrices over the given
C*-algebra with only finitely many nonzero entries, and form the dimension range of this *-algebra and the enveloping group of this abelian local semigroup, now in fact a semigroup.)

In the case of a nonunital C*-algebra, one adjoins a unit (as may be done, for instance, by representing the C*-algebra faithfully on a Hilbert space, and showing that the C*-algebra obtained by adjoining the identity operator is independent of the representation; actually, one need only check that the *-algebra structure is unique, as the C*-algebra norm on a C*-algebra is always determined by the *-algebra structure). The K-group of the resulting unital C*-algebra then maps naturally into the K-group of the natural one-dimensional quotient, and the kernel of this map is, for reasons that will become clearer later, defined to be the K-group of the nonunital algebra.

Atiyah and Hirzebruch in fact referred to the K-group of the C*-algebra as $K_0$, the reason being that there is another very natural group to consider, namely, the K-group of the suspension of the C*-algebra. (The suspension, $SA$, of a C*-algebra $A$ is defined as the C*-algebra of all continuous functions from the real line $\mathbb{R}$ into $A$ which converge to zero at $\pm\infty$, with the pointwise *-algebra operations and the supremum norm. It may also be defined as the (unique) C*-algebra tensor product $A \otimes C_0(\mathbb{R})$, where $C_0(\mathbb{R})$ denotes the suspension of the C*-algebra $\mathbb{C}$ of complex numbers.) Denoting the $K_0$-group of the suspension of a given C*-algebra by $K_1$, one might expect this process to continue, but in fact it is periodic ($K_0, K_1, K_0, K_1, \ldots$). Bott periodicity states that there is a natural isomorphism of $K_2$ with $K_0$. (C*-algebras can also be defined with the field of real numbers as scalars, and in this case the period of Bott periodicity is eight.)

Another way of stating Bott periodicity, or, more precisely, of embedding it into the K-theory of C*-algebras, is as follows. Given a short exact sequence of C*-algebras,

\[ 0 \to J \to A \to A/J \to 0 \tag{3} \]

i.e., given a C*-algebra $A$ and a closed two-sided ideal $J$ (the quotient *-algebra is then a C*-algebra with the quotient norm; $A$ is sometimes referred to as an extension of $J$ by $A/J$), consider the natural short (not necessarily exact) sequences

\[ K_0(J) \to K_0(A) \to K_0(A/J) \tag{4} \]

and

\[ K_1(J) \to K_1(A) \to K_1(A/J) \tag{5} \]

($K_0$ and $K_1$ are functors!). There exist natural connecting maps $K_1(A/J) \to K_0(J)$ and $K_0(A/J) \to K_1(J)$, the first referred to as the index map, and the second (sometimes referred to as the odd-order index map) obtained from this immediately from Bott periodicity (as stated above), such that the periodic six-term sequence

\[
\begin{array}{ccccc}
K_0(J) & \to & K_0(A) & \to & K_0(A/J) \\
\uparrow & & & & \downarrow \\
K_1(A/J) & \leftarrow & K_1(A) & \leftarrow & K_1(J)
\end{array}
\]

is exact. (The periodicity stated above can also be recovered from this.)

Given that the functor $K_0$ classifies AF algebras, one might expect the functor $K_1$ to be useful for classification purposes also. In fact, this is the case. (Indeed, as shown by Brown, the $K_1$-functor is already important for the theory of AF algebras, in spite of, or even because of (!), the fact that the $K_1$-group of an AF algebra is zero.) Using the six-term exact sequence of Bott periodicity described above, corresponding to an extension of C*-algebras, together with results of the present author, Brown showed that any extension of one AF algebra by another is again an AF algebra.

A rather large class of simple unital C*-algebras has by now been classified by means of the invariants $K_0$ and $K_1$ (together with the class of the unit in $K_0$, and the order (or pre-order) structure on $K_0$) and also taking into account the compact convex set of tracial states on the C*-algebra (a positive linear functional on a C*-algebra is called a trace if it has the same value on $x^*x$ and $xx^*$ for every element $x$, and a tracial state if it is a state, that is, has norm 1, or has value 1 on the unit in the case the algebra has a unit). In addition to the set of tracial states, together with its natural topology and convex structure, one should also keep track of the natural pairing between traces and $K_0$ (any trace on a unital C*-algebra has the same value on two equivalent projections, equal to $x^*x$ and $xx^*$ for some element $x$, and hence gives rise to an additive real-valued functional on $K_0$).

In terms of these invariants (which might, broadly speaking, be called K-theoretical), it has been possible to classify the simple unital C*-algebras (not of type I) arising as inductive limits (i.e., as the completions of increasing unions) of sequences of finite direct sums of matrix algebras over separable commutative C*-algebras, these assumed to have spectra of dimension at most three, on the one hand (work of the present author together with Guihua Gong and Liangqing Li, a culmination of earlier work of these authors together with a number of others), and, on the other hand, it has been possible (work of Kirchberg and Phillips, also based on earlier work by a number of authors) to classify the
C*-algebra tensor products (in a natural sense) of these C*-algebras with what is called the Cuntz C*-algebra $\mathcal{O}_\infty$ (see below). In the first of these two cases, the compact convex set of tracial states (always a Choquet simplex) is an arbitrary (metrizable) such space.

In the second case, this space is empty (as it is for $\mathcal{O}_\infty$ in particular). In both cases, $K_0$ and $K_1$ are arbitrary countable abelian groups, with the proviso that $K_0$ is not the sum of a torsion group and a cyclic group. In the first case, the order structure on $K_0$, the class of the unit element, and the pairing of $K_0$ with the space of traces have certain special properties; as it turns out, these can be expressed in a simple way. (The class of the unit need only be positive and nonzero.) In the second case, the order structure on $K_0$ is degenerate (every element is positive) and the class of the unit can be arbitrary (including zero!).

Let us just note that the Cuntz C*-algebra $\mathcal{O}_\infty$ is the unital C*-algebra generated by an infinite sequence $s_1, s_2, \ldots$ of isometries with orthogonal ranges (in other words, elements $s_i$ such that $s_i^* s_i$ is the unit and $s_j^* s_i = 0$ if $j \neq i$). One need not require the C*-algebra to have the universal property with respect to these generators and relations as it is in fact unique (up to an isomorphism preserving these generators). In particular, this C*-algebra is simple. (If one considers a finite sequence of isometries with orthogonal ranges, and assumes in addition that the sum of these is the unit, one also obtains a simple C*-algebra, the Cuntz C*-algebra $\mathcal{O}_n$, $n = 2, 3, \ldots$). The $K_0$-group and $K_1$-group of $\mathcal{O}_\infty$ are, respectively, $\mathbb{Z}$ and 0. (The $K_0$-group and $K_1$-group of $\mathcal{O}_n$ for $n = 2, 3, \ldots$ are, respectively, $\mathbb{Z}/(n-1)\mathbb{Z}$ and 0.)

Both classes of C*-algebras considered in the classification result stated above, although described in rather a concrete way (in terms of inductive limits and tensor products), can also be characterized axiomatically, in a way that makes it clear that they are, in fact, much more general than they seem. (These axiomatizations are due to Lin and to Kirchberg and Phillips. Typically, the abstract axioms are easier to establish in a given case than the inductive limit form described above.)

In view of this, and the fact that one of the axioms is a notion of amenability (the analogous property for C*-algebras of a notion that has also been considered for von Neumann algebras), and since amenable von Neumann algebras (on a separable Hilbert space) have been classified completely (in remarkable work of Connes, together with many others, starting with Murray and von Neumann and, one must also mention, ending with Haagerup, who settled a particularly stubborn case), it is natural to ask whether the K-theoretical invariants described above might be sufficient to classify all amenable separable C*-algebras, say, those which are simple and unital.

The work of Villadsen has shown that additional invariants must in fact be considered, if one is to deal with arbitrary amenable simple C*-algebras, and this has been confirmed in subsequent work of Rørdam and of Toms. (Villadsen's examples were obtained by removing the condition of low dimension on the spectra of the commutative C*-algebras appearing in the inductive limit decomposition considered above.) The very nature of these authors' work, however, has been to introduce additional invariants, all of which it seems natural to consider as, broadly speaking, K-theoretical. (And all of which, as it happens, are already familiar.)

The question of the classifiability, in terms of simple invariants (K-theoretical in nature, at least in the broad sense, and including the spectrum, which is indispensable in the nonsimple case), of all (separable) amenable C*-algebras would therefore still appear to be on the agenda.

Already, in any case, just like the analogous question for von Neumann algebras (now settled), this question would appear to have had a noticeable influence on the development of the subject, not least in underlining the importance of K-theoretical methods, which have proved to be pertinent both in connection with the index theory of differential operators on geometrical structures (from foliations to fractals) and in connection with questions in physics, related to quantum statistical mechanics (see e.g., Quantum Hall Effect), to quantum field theory (e.g., the standard model), and even to string theory and M-theory.

See also: Axiomatic Quantum Field Theory; Bosons and Fermions in External Fields; The Jones Polynomial; K-Theory; Positive Maps on C*-Algebras; Quantum Hall Effect; von Neumann Algebras: Introduction, Modular Theory, and Classification Theory; von Neumann Algebras: Subfactor Theory.

Further Reading

Davidson KR (1996) C*-Algebras by Example. Fields Institute Monographs, 6. Providence, RI: American Mathematical Society.
Dixmier J (1969) Les C*-Algèbres et leurs Représentations, 2nd edn. Paris: Gauthier-Villars.
Elliott GA (1995) The classification problem for amenable C*-algebras. In: Chatterji SD (ed.) Proceedings of the International Congress of Mathematicians, vols. 1, 2, pp. 922–932. (Zürich, 1994). Basel: Birkhäuser.
Evans DE and Kawahigashi Y (1998) Quantum Symmetries on Operator Algebras. Oxford: Oxford University Press.
Fillmore PA (1996) A User's Guide to Operator Algebras. New York: Wiley.
Kadison RV and Ringrose J (1983–92) Fundamentals of the Theory of Operator Algebras (4 volumes). New York: Academic Press.
Lin H (2001) An Introduction to the Classification of Amenable C*-Algebras. Singapore: World Scientific.
Pedersen GK (1979) C*-Algebras and their Automorphism Groups, London Math. Soc. Monographs. London: Academic Press.
Rørdam M (2002) Classification of Nuclear, Simple C*-Algebras, Encyclopaedia of Mathematical Sciences, vol. 126, pp. 1–145. Berlin: Springer.
Sakai S (1971) C*-Algebras and W*-Algebras. Berlin: Springer.

Calibrated Geometry and Special Lagrangian Submanifolds


D D Joyce, University of Oxford, Oxford, UK
© 2006 Elsevier Ltd. All rights reserved.

Calibrated Geometry

Calibrated geometry, introduced by Harvey and Lawson (1982), is the study of special classes of minimal submanifolds $N$ of a Riemannian manifold $(M, g)$, defined using a closed form $\varphi$ on $M$ called a calibration. For example, if $(M, J, g)$ is a Kähler manifold with Kähler form $\omega$, then complex $k$-submanifolds of $M$ are calibrated with respect to $\varphi = \omega^k/k!$. Another important class of calibrated submanifolds are special Lagrangian submanifolds in Calabi–Yau manifolds, which is the focus of the section "Special Lagrangian Geometry."

Calibrations and Calibrated Submanifolds

We begin by defining calibrations and calibrated submanifolds.

Definition 1 Let $(M, g)$ be a Riemannian manifold. An oriented tangent $k$-plane $V$ on $M$ is a vector subspace $V$ of some tangent space $T_xM$ to $M$ with $\dim V = k$, equipped with an orientation. If $V$ is an oriented tangent $k$-plane on $M$ then $g|_V$ is a Euclidean metric on $V$; so, combining $g|_V$ with the orientation on $V$ gives a natural volume form $\mathrm{vol}_V$ on $V$, which is a $k$-form on $V$.

Now let $\varphi$ be a closed $k$-form on $M$. $\varphi$ is said to be a calibration on $M$, if for every oriented $k$-plane $V$ on $M$, $\varphi|_V \le \mathrm{vol}_V$. Here, $\varphi|_V = \alpha \cdot \mathrm{vol}_V$ for some $\alpha \in \mathbb{R}$, and $\varphi|_V \le \mathrm{vol}_V$ if $\alpha \le 1$. Let $N$ be an oriented submanifold of $M$ with dimension $k$. Then each tangent space $T_xN$ for $x \in N$ is an oriented tangent $k$-plane. We say that $N$ is a calibrated submanifold if $\varphi|_{T_xN} = \mathrm{vol}_{T_xN}$ for all $x \in N$.

It is easy to show that calibrated submanifolds are automatically minimal submanifolds. We prove this in the compact case, but noncompact calibrated submanifolds are locally volume-minimizing as well.

Proposition 2 Let $(M, g)$ be a Riemannian manifold, $\varphi$ a calibration on $M$, and $N$ a compact $\varphi$-submanifold in $M$. Then $N$ is volume-minimizing in its homology class.

Proof Let $\dim N = k$, and let $[N] \in H_k(M, \mathbb{R})$ and $[\varphi] \in H^k(M, \mathbb{R})$ be the homology and cohomology classes of $N$ and $\varphi$. Then

\[ [\varphi] \cdot [N] = \int_{x \in N} \varphi|_{T_xN} = \int_{x \in N} \mathrm{vol}_{T_xN} = \mathrm{Vol}(N) \]

since $\varphi|_{T_xN} = \mathrm{vol}_{T_xN}$ for each $x \in N$, as $N$ is a calibrated submanifold. If $N'$ is any other compact $k$-submanifold of $M$ with $[N'] = [N]$ in $H_k(M, \mathbb{R})$, then

\[ [\varphi] \cdot [N] = [\varphi] \cdot [N'] = \int_{x \in N'} \varphi|_{T_xN'} \le \int_{x \in N'} \mathrm{vol}_{T_xN'} = \mathrm{Vol}(N') \]

since $\varphi|_{T_xN'} \le \mathrm{vol}_{T_xN'}$ because $\varphi$ is a calibration. The last two equations give $\mathrm{Vol}(N) \le \mathrm{Vol}(N')$. Thus, $N$ is volume-minimizing in its homology class. □

Now let $(M, g)$ be a Riemannian manifold with a calibration $\varphi$, and let $\iota : N \to M$ be an immersed submanifold. Whether $N$ is a $\varphi$-submanifold depends upon the tangent spaces of $N$. That is, it depends on $\iota$ and its first derivative. So, for $N$ to be calibrated with respect to $\varphi$ is a first-order partial differential equation on $\iota$. But if $N$ is calibrated then $N$ is minimal, and for $N$ to be minimal is a second-order partial differential equation on $\iota$.

One moral is that the calibrated equations, being first order, are often easier to solve than the minimal submanifold equations, which are second order. So calibrated geometry is a fertile source of examples of minimal submanifolds.

Calibrated Submanifolds and Special Holonomy

A calibration $\varphi$ on $(M, g)$ is only interesting if there exist plenty of $\varphi$-submanifolds $N$ in $M$, locally or globally. Since $\varphi|_{T_xN} = \mathrm{vol}_{T_xN}$ for each $x \in N$, $\varphi$-submanifolds will be abundant only if the family $\mathcal{F}$ of calibrated tangent $k$-planes $V$ with $\varphi|_V = \mathrm{vol}_V$
is reasonably large, say, if $\mathcal{F}$ has small codimension in the family of all tangent $k$-planes $V$ on $M$. A maximally boring example is the $k$-form $\varphi = 0$, which is a calibration but has no calibrated tangent $k$-planes, so no $\varphi$-submanifolds.

Thus, most calibrations $\varphi$ will have few or no $\varphi$-submanifolds, and only special calibrations $\varphi$ with $\mathcal{F}$ large will have interesting calibrated geometries. Now the field of Riemannian holonomy groups is a natural companion for calibrated geometry, because it gives a simple way to generate interesting calibrations $\varphi$ which automatically have $\mathcal{F}$ large.

Let $G \subset O(n)$ be a possible holonomy group of a Riemannian metric. In particular, we can take $G$ to be one of the holonomy groups $U(m)$, $SU(m)$, $Sp(m)$, $G_2$, or $Spin(7)$ from Berger's classification. Then $G$ acts on the $k$-forms $\Lambda^k(\mathbb{R}^n)^*$ on $\mathbb{R}^n$, so we can look for $G$-invariant $k$-forms on $\mathbb{R}^n$. Suppose $\varphi_0$ is a nonzero, $G$-invariant $k$-form on $\mathbb{R}^n$.

By rescaling $\varphi_0$ we can arrange that for each oriented $k$-plane $U \subset \mathbb{R}^n$, we have $\varphi_0|_U \le \mathrm{vol}_U$, and that $\varphi_0|_U = \mathrm{vol}_U$ for at least one such $U$. Let $H$ be the stabilizer subgroup of this $U$ in $G$. Then $\varphi_0|_{\gamma \cdot U} = \mathrm{vol}_{\gamma \cdot U}$ by $G$-invariance, so $\gamma \cdot U$ is a calibrated $k$-plane for all $\gamma \in G$. Thus, the family $\mathcal{F}_0$ of $\varphi_0$-calibrated $k$-planes in $\mathbb{R}^n$ contains $G/H$, so it is reasonably large, and it is likely that the calibrated submanifolds will have an interesting geometry.

Now let $M$ be a manifold of dimension $n$, and $g$ a metric on $M$ with Levi-Civita connection $\nabla$ and holonomy group $G$. Then there is a $k$-form $\varphi$ on $M$ with $\nabla\varphi = 0$, corresponding to $\varphi_0$. Hence $d\varphi = 0$, and $\varphi$ is closed. Also, the condition $\varphi_0|_U \le \mathrm{vol}_U$ for all oriented $k$-planes $U$ in $\mathbb{R}^n$ implies that $\varphi|_V \le \mathrm{vol}_V$ for all oriented tangent $k$-planes $V$ in $M$. Thus, $\varphi$ is a calibration on $M$. The family $\mathcal{F}$ of calibrated tangent $k$-planes on $M$ fibers over $M$ with fiber $\mathcal{F}_0$; so, it is reasonably large.

This gives a general method for finding interesting calibrations on manifolds with reduced holonomy. Here are the most significant examples.

• Let $G = U(m) \subset O(2m)$. Then $G$ preserves a 2-form $\omega_0$ on $\mathbb{R}^{2m}$. If $g$ is a metric on $M$ with holonomy $U(m)$, then $g$ is Kähler with complex structure $J$, and the 2-form $\omega$ on $M$ associated to $\omega_0$ is the Kähler form of $g$. One can show that $\omega$ is a calibration on $(M, g)$, and the calibrated submanifolds are exactly the holomorphic curves in $(M, J)$. More generally, $\omega^k/k!$ is a calibration on $M$ for $1 \le k \le m$, and the corresponding calibrated submanifolds are the complex $k$-dimensional submanifolds of $(M, J)$.
• Let $G = SU(m) \subset O(2m)$. Then $G$ preserves a complex volume form $\Omega_0 = dz_1 \wedge \cdots \wedge dz_m$ on $\mathbb{C}^m$. Thus, a Calabi–Yau $m$-fold $(M, g)$ with $\mathrm{Hol}(g) = SU(m)$ has a holomorphic volume form $\Omega$. The real part $\mathrm{Re}\,\Omega$ is a calibration on $M$, and the corresponding calibrated submanifolds are called special Lagrangian submanifolds.
• The group $G_2 \subset O(7)$ preserves a 3-form $\varphi_0$ and a 4-form $*\varphi_0$ on $\mathbb{R}^7$. Thus, a Riemannian 7-manifold $(M, g)$ with holonomy $G_2$ comes with a 3-form $\varphi$ and 4-form $*\varphi$, which are both calibrations. The corresponding calibrated submanifolds are called associative 3-folds and coassociative 4-folds.
• The group $Spin(7) \subset O(8)$ preserves a 4-form $\Omega_0$ on $\mathbb{R}^8$. Thus a Riemannian 8-manifold $(M, g)$ with holonomy $Spin(7)$ has a 4-form $\Omega$, which is a calibration. The $\Omega$-submanifolds are called Cayley 4-folds.

It is an important general principle that to each calibration $\varphi$ on an $n$-manifold $(M, g)$ with special holonomy constructed in this way, there corresponds a constant calibration $\varphi_0$ on $\mathbb{R}^n$. Locally, $\varphi$-submanifolds in $M$ resemble the $\varphi_0$-submanifolds in $\mathbb{R}^n$, and have many of the same properties. Thus, to understand the calibrated submanifolds in a manifold with special holonomy, it is often a good idea to start by studying the corresponding calibrated submanifolds of $\mathbb{R}^n$.

In particular, singularities of $\varphi$-submanifolds in $M$ will be locally modeled on singularities of $\varphi_0$-submanifolds in $\mathbb{R}^n$. (In the sense of geometric measure theory, the tangent cone at a singular point of a $\varphi$-submanifold in $M$ is a conical $\varphi_0$-submanifold in $\mathbb{R}^n$.) So by studying singular $\varphi_0$-submanifolds in $\mathbb{R}^n$, we may understand the singular behavior of $\varphi$-submanifolds in $M$.

Special Lagrangian Geometry

We now focus on one class of calibrated submanifolds, special Lagrangian submanifolds in Calabi–Yau manifolds. Calabi–Yau 3-folds are used to make the spacetime vacuum in string theory, and special Lagrangian 3-folds are the classical versions of A-branes, or supersymmetric 3-cycles, in Calabi–Yau 3-folds. Special Lagrangian geometry aroused great interest amongst string theorists because of its role in the SYZ conjecture, providing a geometric basis for mirror symmetry of Calabi–Yau 3-folds.

Calabi–Yau Manifolds

Here is our definition of Calabi–Yau manifold. Readers are warned that there are several different definitions of Calabi–Yau manifolds in use in the literature. Ours is unusual in regarding $\Omega$ as part of the given structure.


Definition 3 Let $m \ge 2$. A Calabi–Yau $m$-fold is a quadruple $(M, J, g, \Omega)$ such that $(M, J)$ is a compact $m$-dimensional complex manifold, $g$ a Kähler metric on $(M, J)$ with Kähler form $\omega$, and $\Omega$ a holomorphic $(m, 0)$-form on $M$ called the holomorphic volume form, which satisfies

\[ \omega^m/m! = (-1)^{m(m-1)/2} (i/2)^m \, \Omega \wedge \bar{\Omega} \tag{1} \]

The constant factor in [1] is chosen to make $\mathrm{Re}\,\Omega$ a calibration. It follows from [1] that $g$ is Ricci-flat, $\Omega$ is constant under the Levi-Civita connection, and the holonomy group of $g$ has $\mathrm{Hol}(g) \subseteq SU(m)$.

Let $(M, J)$ be a compact, complex manifold, and $g$ a Kähler metric on $M$, with Ricci curvature $R_{ab}$. Define the Ricci form $\rho$ of $g$ by $\rho_{ac} = J_a^b R_{bc}$. Then $\rho$ is a closed real $(1, 1)$-form on $M$, with de Rham cohomology class $[\rho] = 2\pi c_1(M) \in H^2(M, \mathbb{R})$, where $c_1(M)$ is the first Chern class of $M$ in $H^2(M, \mathbb{Z})$. The Calabi conjecture specifies which closed $(1, 1)$-forms can be the Ricci forms of a Kähler metric on $M$.

The Calabi conjecture Let $(M, J)$ be a compact, complex manifold, and $g'$ a Kähler metric on $M$, with Kähler form $\omega'$. Suppose that $\rho$ is a real, closed $(1, 1)$-form on $M$ with $[\rho] = 2\pi c_1(M)$. Then there exists a unique Kähler metric $g$ on $M$ with Kähler form $\omega$, such that $[\omega] = [\omega'] \in H^2(M, \mathbb{R})$, and the Ricci form of $g$ is $\rho$.

Note that $[\omega] = [\omega']$ says that $g$ and $g'$ are in the same Kähler class. The conjecture was posed by Calabi in 1954, and was eventually proved by Yau in 1976. Its importance to us is that when the canonical bundle $K_M$ is trivial, so that $c_1(M) = 0$, we can take $\rho \equiv 0$, and then $g$ is Ricci-flat. Since $K_M$ is trivial, it has a nonzero holomorphic section, a holomorphic $(m, 0)$-form $\Omega$. As $g$ is Ricci-flat, it follows that $\nabla\Omega = 0$, where $\nabla$ is the Levi-Civita connection of $g$. Rescaling $\Omega$ by a complex constant makes [1] hold, and then $(M, J, g, \Omega)$ is a Calabi–Yau $m$-fold. This proves:

Theorem 4 Let $(M, J)$ be a compact complex $m$-manifold with $K_M$ trivial. Then every Kähler class on $M$ contains a unique Ricci-flat Kähler metric $g$. There exists a holomorphic $(m, 0)$-form $\Omega$, unique up to change of phase $\Omega \mapsto e^{i\theta}\Omega$, such that $(M, J, g, \Omega)$ is a Calabi–Yau $m$-fold.

Using algebraic geometry, one can produce many examples of complex $m$-folds $(M, J)$ satisfying these conditions, such as the Fermat $(m+2)$-tic

\[ \bigl\{ [z_0, \ldots, z_{m+1}] \in \mathbb{CP}^{m+1} : z_0^{m+2} + \cdots + z_{m+1}^{m+2} = 0 \bigr\} \tag{2} \]

Therefore, Calabi–Yau $m$-folds are very abundant.

Special Lagrangian Submanifolds

Definition 5 Let $(M, J, g, \Omega)$ be a Calabi–Yau $m$-fold. Then $\mathrm{Re}\,\Omega$ is a calibration on the Riemannian manifold $(M, g)$. An oriented real $m$-dimensional submanifold $N$ in $M$ is called a special Lagrangian submanifold (SL $m$-fold) if it is calibrated with respect to $\mathrm{Re}\,\Omega$.

Here is an alternative definition of SL $m$-folds. It is often more useful than Definition 5.

Proposition 6 Let $(M, J, g, \Omega)$ be a Calabi–Yau $m$-fold, with Kähler form $\omega$, and $N$ a real $m$-dimensional submanifold in $M$. Then $N$ admits an orientation making it into an SL $m$-fold in $M$ if and only if $\omega|_N \equiv 0$ and $\mathrm{Im}\,\Omega|_N \equiv 0$.

Regard $N$ as an immersed submanifold, with immersion $\iota : N \to M$. Then $[\omega|_N]$ and $[\mathrm{Im}\,\Omega|_N]$ are unchanged under continuous variations of the immersion $\iota$. Thus, $[\omega|_N] = [\mathrm{Im}\,\Omega|_N] = 0$ is a necessary condition not just for $N$ to be special Lagrangian, but also for any isotopic submanifold $N'$ in $M$ to be special Lagrangian. This proves:

Corollary 7 Let $(M, J, g, \Omega)$ be a Calabi–Yau $m$-fold, and $N$ a compact real $m$-submanifold in $M$. Then a necessary condition for $N$ to be isotopic to a special Lagrangian submanifold $N'$ in $M$ is that $[\omega|_N] = 0$ in $H^2(N, \mathbb{R})$ and $[\mathrm{Im}\,\Omega|_N] = 0$ in $H^m(N, \mathbb{R})$.

Deformations of Compact SL m-Folds

The deformation theory of compact special Lagrangian manifolds was studied by McLean (1998), who proved the following result:

Theorem 8 Let $(M, J, g, \Omega)$ be a Calabi–Yau $m$-fold, and $N$ a compact special Lagrangian $m$-fold in $M$. Then the moduli space $\mathcal{M}_N$ of special Lagrangian deformations of $N$ is a smooth manifold of dimension $b^1(N)$, the first Betti number of $N$.

Sketch proof. Suppose for simplicity that $N$ is an embedded submanifold. There is a natural orthogonal decomposition $TM|_N = TN \oplus \nu$, where $\nu \to N$ is the normal bundle of $N$ in $M$. As $N$ is Lagrangian, the complex structure $J : TM \to TM$ gives an isomorphism $J : \nu \to TN$. But the metric $g$ gives an isomorphism $TN \cong T^*N$. Composing these two gives an isomorphism $\nu \cong T^*N$.

Let $T$ be a small tubular neighborhood of $N$ in $M$. Then we can identify $T$ with a neighborhood of the zero section in $\nu$. Using the isomorphism $\nu \cong T^*N$, we have an identification between $T$ and a neighborhood of the zero section in $T^*N$. This can be chosen to identify the Kähler form $\omega$ on $T$ with the natural symplectic
structure on $T^*N$. Let $\pi : T \to N$ be the obvious projection.

Under this identification, submanifolds $N'$ in $T \subset M$ which are $C^1$ close to $N$ are identified with the graphs of small smooth sections $\alpha$ of $T^*N$. That is, submanifolds $N'$ of $M$ close to $N$ are identified with 1-forms $\alpha$ on $N$. We need to know: which 1-forms $\alpha$ are identified with SL $m$-folds $N'$?

Now, $N'$ is special Lagrangian if $\omega|_{N'} \equiv \mathrm{Im}\,\Omega|_{N'} \equiv 0$. But $\pi|_{N'} : N' \to N$ is a diffeomorphism, so we can push $\omega|_{N'}$ and $\mathrm{Im}\,\Omega|_{N'}$ down to $N$, and regard them as functions of $\alpha$. Calculation shows that

\[ \pi_*(\omega|_{N'}) = d\alpha \quad \text{and} \quad \pi_*(\mathrm{Im}\,\Omega|_{N'}) = F(\alpha, \nabla\alpha) \]

where $F$ is a nonlinear function of its arguments. Thus, the moduli space $\mathcal{M}_N$ is locally isomorphic to the set of small 1-forms $\alpha$ on $N$ such that $d\alpha \equiv 0$ and $F(\alpha, \nabla\alpha) \equiv 0$.

Now it turns out that $F$ satisfies $F(\alpha, \nabla\alpha) \approx d(*\alpha)$ when $\alpha$ is small. Therefore, $\mathcal{M}_N$ is locally approximately isomorphic to the vector space of 1-forms $\alpha$ with $d\alpha = d(*\alpha) = 0$. But by Hodge theory, this is isomorphic to the de Rham cohomology group $H^1(N, \mathbb{R})$, and is a manifold with dimension $b^1(N)$.

To carry out this last step rigorously requires some technical machinery: one must work with certain Banach spaces of sections of $T^*N$, $\Lambda^2 T^*N$ and $\Lambda^m T^*N$, use elliptic regularity results to prove that the map $\alpha \mapsto (d\alpha, F(\alpha, \nabla\alpha))$ has closed image in these Banach spaces, and then use the implicit function theorem for Banach spaces to show that the kernel of the map is what is expected.

Obstructions to Existence of Compact SL m-Folds

Let $\{(M, J_t, g_t, \Omega_t) : t \in (-\epsilon, \epsilon)\}$ be a smooth one-parameter family of Calabi–Yau $m$-folds. Suppose $N_0$ is an SL $m$-fold in $(M, J_0, g_0, \Omega_0)$. When can we extend $N_0$ to a smooth family of SL $m$-folds $N_t$ in $(M, J_t, g_t, \Omega_t)$ for $t \in (-\epsilon, \epsilon)$?

By Corollary 7, a necessary condition is that $[\omega_t|_{N_0}] = [\mathrm{Im}\,\Omega_t|_{N_0}] = 0$ for all $t$. Our next result shows that locally, this is also a sufficient condition.

Theorem 9 Let $\{(M, J_t, g_t, \Omega_t) : t \in (-\epsilon, \epsilon)\}$ be a smooth one-parameter family of Calabi–Yau $m$-folds, with Kähler forms $\omega_t$. Let $N_0$ be a compact SL $m$-fold in $(M, J_0, g_0, \Omega_0)$, and suppose that $[\omega_t|_{N_0}] = 0$ in $H^2(N_0, \mathbb{R})$ and $[\mathrm{Im}\,\Omega_t|_{N_0}] = 0$ in $H^m(N_0, \mathbb{R})$ for all $t \in (-\epsilon, \epsilon)$. Then $N_0$ extends to a smooth one-parameter family $\{N_t : t \in (-\delta, \delta)\}$, where $0 < \delta \le \epsilon$ and $N_t$ is a compact SL $m$-fold in $(M, J_t, g_t, \Omega_t)$.

This can be proved using similar techniques to Theorem 8. Note that the condition $[\mathrm{Im}\,\Omega_t|_{N_0}] = 0$ for all $t$ can be satisfied by choosing the phases of the $\Omega_t$ appropriately, and if the image of $H_2(N, \mathbb{Z})$ in $H_2(M, \mathbb{R})$ is zero, then the condition $[\omega|_N] = 0$ holds automatically.

Thus, the obstructions $[\omega_t|_{N_0}] = [\mathrm{Im}\,\Omega_t|_{N_0}] = 0$ in Theorem 9 are actually fairly mild restrictions, and SL $m$-folds should be considered as pretty stable under small deformations of the Calabi–Yau structure.

Remark The deformation and obstruction theory of compact SL $m$-folds are extremely well behaved compared to many other moduli space problems in differential geometry. In other geometric problems (such as the deformations of complex structures on a complex manifold, or pseudoholomorphic curves in an almost-complex manifold, or instantons on a Riemannian 4-manifold), the deformation theory often has the following general structure.

There are vector bundles $E$, $F$ over a compact manifold $M$, and an elliptic operator $P : C^\infty(E) \to C^\infty(F)$, usually first order. The kernel $\mathrm{Ker}\,P$ is the set of infinitesimal deformations, and the cokernel $\mathrm{Coker}\,P$ the set of obstructions. The actual moduli space $\mathcal{M}$ is locally the zeros of a nonlinear map $\Psi : \mathrm{Ker}\,P \to \mathrm{Coker}\,P$.

In a generic case, $\mathrm{Coker}\,P = 0$, and then the moduli space $\mathcal{M}$ is locally isomorphic to $\mathrm{Ker}\,P$, and so is locally a manifold with dimension $\mathrm{ind}(P)$. However, in nongeneric situations $\mathrm{Coker}\,P$ may be nonzero, and then the moduli space $\mathcal{M}$ may be singular, or have an unexpected dimension.

However, SL $m$-folds do not follow this pattern. Instead, the obstructions are topologically determined, and the moduli space is always smooth, with dimension given by a topological formula. This should be regarded as a minor mathematical miracle.

Mirror Symmetry and the SYZ Conjecture

Mirror symmetry is a mysterious relationship between pairs of Calabi–Yau 3-folds $M$, $\hat{M}$, arising from a branch of physics known as string theory, and leading to some very strange and exciting conjectures about Calabi–Yau 3-folds, many of which have been proved in special cases.

In the beginning (the 1980s), mirror symmetry seemed mathematically completely mysterious. But there are now two complementary conjectural theories, due to Kontsevich and Strominger–Yau–Zaslow, which explain mirror symmetry in a fairly mathematical way. Probably both are true, at some level. The second proposal, due to Strominger, Yau, and Zaslow (1996), is known as the SYZ conjecture. Here is an attempt to state it.
The SYZ conjecture Suppose $M$ and $\hat M$ are mirror Calabi–Yau 3-folds. Then (under some additional conditions), there should exist a compact topological 3-manifold $B$ and surjective, continuous maps $f : M \to B$ and $\hat f : \hat M \to B$, such that
(i) There exists a dense open set $B_0 \subset B$, such that for each $b \in B_0$, the fibers $f^{-1}(b)$ and $\hat f^{-1}(b)$ are nonsingular special Lagrangian 3-tori $T^3$ in $M$ and $\hat M$. Furthermore, $f^{-1}(b)$ and $\hat f^{-1}(b)$ are in some sense dual to one another.
(ii) For each $b \in \Delta = B \setminus B_0$, the fibers $f^{-1}(b)$ and $\hat f^{-1}(b)$ are expected to be singular special Lagrangian 3-folds in $M$ and $\hat M$.
The fibrations $f$ and $\hat f$ are called special Lagrangian fibrations, and the set of singular fibers $\Delta$ is called the discriminant. In part (i), the nonsingular fibers of $f$ and $\hat f$ are supposed to be dual tori. What does this mean?
On the topological level, we can define duality between two tori $T$, $\hat T$ to be a choice of isomorphism $H^1(T, \mathbb{Z}) \cong H_1(\hat T, \mathbb{Z})$. We can also define duality between tori equipped with flat Riemannian metrics. Write $T = V/\Lambda$, where $V$ is a Euclidean vector space and $\Lambda$ a lattice in $V$. Then the dual torus $\hat T$ is defined to be $V^*/\Lambda^*$, where $V^*$ is the dual vector space and $\Lambda^*$ the dual lattice. However, there is no notion of duality between nonflat metrics on dual tori.
Strominger, Yau, and Zaslow argue only that their conjecture holds when $M$, $\hat M$ are close to the large complex structure limit. In this case, the diameters of the fibers $f^{-1}(b)$, $\hat f^{-1}(b)$ are expected to be small compared to the diameter of the base space $B$, and away from singularities of $f$, $\hat f$, the metrics on the nonsingular fibers are expected to be approximately flat. So, part (i) of the SYZ conjecture says that for $b \in B_0$, $f^{-1}(b)$ is approximately a flat Riemannian 3-torus, and $\hat f^{-1}(b)$ is approximately the dual flat Riemannian torus.
Mathematical research on the SYZ conjecture has followed two broad approaches. The first could be described as symplectic topological. For this, we treat $M$, $\hat M$ just as symplectic manifolds and $f$, $\hat f$ just as Lagrangian fibrations. We also suppose $B$ is a smooth 3-manifold and $f$, $\hat f$ are smooth maps. Under these simplifying assumptions, Mark Gross, Wei-Dong Ruan, and others have built up a beautiful, detailed picture of how dual SYZ fibrations work at the global topological level.
The second approach could be described as local geometric. Here, we try to take the special Lagrangian condition seriously from the outset, and focus on the local behavior of special Lagrangian submanifolds, and especially their singularities, rather than on global topological questions. In addition, we are interested in what fibrations of generic Calabi–Yau 3-folds might look like.
There is now a well-developed theory of SL m-folds with isolated singularities modeled on cones (Joyce 2003a). This is applied to SL fibrations and the SYZ conjecture in Joyce (2003a, b), leading to the tentative conclusions that for generic Calabi–Yau 3-folds $M$, special Lagrangian fibrations $f : M \to B$ will be only piecewise smooth, and have discriminants $\Delta$ of real codimension 1 in $B$, in contrast to smooth fibrations which have $\Delta$ of codimension 2. We also argue that for generic mirrors $M$, $\hat M$ and $f$, $\hat f$, the discriminants $\Delta$, $\hat\Delta$ cannot be homeomorphic and so do not coincide. This contradicts part (ii) above.
A better way to formulate the SYZ conjecture may be in terms of families of mirror Calabi–Yau 3-folds $M_t$, $\hat M_t$ and fibrations $f_t : M_t \to B$, $\hat f_t : \hat M_t \to B$ for $t \in (0, \epsilon)$ which approach the large complex structure limit as $t \to 0$. Then we could require the discriminants $\Delta_t$, $\hat\Delta_t$ of $f_t$, $\hat f_t$ to converge to some common, codimension 2 limit $\Delta_0$ as $t \to 0$.
It is an important, and difficult, open problem to construct examples of special Lagrangian fibrations of compact, holonomy SU(3) Calabi–Yau 3-folds. None are currently known.

See also: Minimal submanifolds; Mirror Symmetry: A Geometric Survey; Moduli Spaces: An Introduction; Riemannian Holonomy Groups and Exceptional Holonomy.

Further Reading

Gross M, Huybrechts D, and Joyce D (2003) Calabi–Yau Manifolds and Related Geometries, Universitext Series, Berlin: Springer.
Harvey R and Lawson HB (1982) Calibrated geometries. Acta Mathematica 148: 47–157.
Joyce DD (2000) Compact Manifolds with Special Holonomy. Oxford: Oxford University Press.
Joyce DD (2003a) Special Lagrangian submanifolds with isolated conical singularities. V. Survey and applications. Journal of Differential Geometry 63: 279–347, math.DG/0303272.
Joyce DD (2003b) Singularities of special Lagrangian fibrations and the SYZ conjecture. Communications in Analysis and Geometry 11: 859–907, math.DG/0011179.
Joyce DD (2003c) U(1)-invariant special Lagrangian 3-folds in $\mathbb{C}^3$ and special Lagrangian fibrations. Turkish Mathematical Journal 27: 99–114, math.DG/0206016.
McLean RC (1998) Deformations of calibrated submanifolds. Communications in Analysis and Geometry 6: 705–747.
Strominger A, Yau S-T, and Zaslow E (1996) Mirror symmetry is T-duality. Nuclear Physics B 479: 243–259, hep-th/9606040.
Calogero–Moser–Sutherland Systems of Nonrelativistic and Relativistic Type

S N M Ruijsenaars, Centre for Mathematics and Computer Science, Amsterdam, The Netherlands
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Systems of Calogero–Moser–Sutherland (CMS) type form a class of finite-dimensional dynamical systems that are integrable both at the classical and at the quantum level. The CMS systems describe $N$ point particles moving on a line or on a ring, interacting via pair potentials that are specific functions of four types, namely rational (I), hyperbolic (II), trigonometric (III), and elliptic (IV). They occur not only in a nonrelativistic (Galilei-invariant), but also in a relativistic (Poincaré-invariant) setting. Thus, one can distinguish a hierarchy of 16 physically distinct versions (classical/quantum, nonrelativistic/relativistic, type I–IV), the most general one being the quantum relativistic type IV system.
The nonrelativistic systems date back to pioneering work by Calogero, Sutherland, and Moser in the early 1970s. The pair potential structure of the interaction can be encoded in the root system $A_{N-1}$, and there also exist integrable versions for all of the remaining root systems. The classical systems are given by $N$ Poisson commuting Hamiltonians with a polynomial dependence on the particle momenta $p_1, \ldots, p_N$. Accordingly, the quantum versions are described by $N$ commuting Hamiltonians that are partial differential operators.
The relativistic systems were introduced in the mid-1980s, at the classical level by Ruijsenaars and Schneider, and at the quantum level by Ruijsenaars. They converge to the nonrelativistic systems in the limit $c \to \infty$, where $c$ is the speed of light. Again, the systems can be related to the root system $A_{N-1}$, and they admit integrable versions for other root systems. All of the commuting classical Hamiltonians depend exponentially on generalized momenta $p_1, \ldots, p_N$. Hence, the associated commuting quantum Hamiltonians are analytic difference operators.
The above integrable systems can be further generalized by allowing supersymmetry or internal degrees of freedom ('spins'), coupled in quite special ways to retain integrability. In this article, however, the focus is on the 16 versions of the $A_{N-1}$-symmetric CMS systems without internal degrees of freedom. The primary aim is to acquaint the reader with their definition and integrability, and with their most prominent features and interrelationships. Second, we intend to give a rough sketch of the state of the art concerning explicit solutions for the various versions. This involves a concretization of the action-angle maps and eigenfunction transforms that simultaneously diagonalize the commuting dynamics, paying special attention to their remarkable duality properties.
It is beyond the scope of this article to review the hundreds of papers specifically dealing with CMS type systems, let alone the much larger literature where they play some role. Indeed, the systems have been encountered in a great many different contexts and they are related to a host of other integrable systems in various ways. Accordingly, they can be studied from the perspective of various subfields of mathematics and theoretical physics. First, some of these perspectives and relations to seemingly quite different topics will be mentioned before embarking on the far more focused survey.
Staying first within the confines of the CMS type systems, some nonobvious limits yielding other familiar finite-dimensional integrable systems will be mentioned. To begin with, all of the $A_{N-1}$ type systems give rise to systems with a Toda type (exponential nearest neighbor) interaction via a suitable limiting transition (basically a strong-coupling limit). This leads to integrable $N$-particle systems with a classical/quantum, nonrelativistic/relativistic, nonperiodic/periodic version; starting from the quantum relativistic periodic Toda system, the remaining seven versions can be obtained by suitable limits.
Next, we recall that the quantum system of $N$ nonrelativistic bosons on the line or ring interacting via a pair potential of $\delta$-function type is soluble via a Bethe ansatz, with the line version exhibiting quantum soliton behavior (factorized scattering). It has been shown that there exist scaling limits of eigenfunctions for suitable CMS systems that give rise to the latter Bethe type eigenfunctions for $N = 2$, while convergence for $N > 2$ is plausible, but has not been demonstrated thus far.
Via suitable analytic continuations preserving reality/formal self-adjointness, one can arrive at CMS systems with more than one species of particle ('particles' and 'antiparticles'). Likewise, analytic continuations and appropriate limits of CMS systems associated with root systems other than $A_{N-1}$ lead to a further proliferation of $N$-dimensional integrable systems. Typically, such limits refer either
to the commuting Hamiltonians (the Toda limit being a case in point) or to the joint eigenfunctions (as exemplified by the $\delta$-function system limit); it seems difficult to control both sets of quantities at once.
Starting from the spin type CMS systems, another kind of limit can be taken. Specifically, by 'freezing' the particles at equilibrium positions, it is possible to arrive at integrable spin chains of Haldane–Shastry and Inozemtsev type.
At this point, it is expedient to insert a brief remark on finite-dimensional integrable systems. As the term suggests, one may expect that, with due effort, such systems can be 'integrated', or, equivalently, 'solved'. But it should be noted that the latter terms (let alone the qualifier 'due effort') have no unambiguous mathematical meaning. Certainly, 'solving' involves obtaining explicit information on the action-angle map and joint eigenfunction transform at the classical and quantum level, resp., but a priori it is not at all clear how far one can proceed.
Focusing again on the CMS systems and their relatives, it should be stressed that, in many cases, one is still far removed from a complete solution, especially for the elliptic CMS systems. In this regard the previous remark serves not only as a caveat, but also to make clear why the various vantage points provided by different subfields in mathematics and physics are crucial: typically, they yield complementary insights and distinct representations for solutions, serving different purposes.
To be sure, in first approximation the mathematics involved at the classical and quantum level is symplectic geometry and Hilbert space theory, resp. In point of fact, however, far more ingredients have turned out to be quite natural and useful. On the classical level, these include the theory of groups, Lie algebras and symmetric spaces, linear algebra and spectral theory, Riemann surface theory, and more generally algebraic geometry.
On the quantum level, the viewpoint of harmonic analysis on symmetric spaces is particularly natural and fruitful for the nonrelativistic CMS systems and their arbitrary root-system versions, whereas quantum groups/algebras/symmetric spaces can be tied in with the relativistic systems and their versions for other root systems. (The $c \to \infty$ limit amounts to the $q \to 1$ limit in the quantum group picture.) As a matter of fact, the whole area of special functions and their $q$-analogs is intimately related to the quantum CMS type systems (cf. also the last section of this article). Finally, the occurrence of commuting analytic difference operators in the relativistic ($q \ne 1$) systems leads to largely uncharted territory in the intersection of the theory of Hilbert space eigenfunction expansions and the theory of linear analytic difference equations.
The study of the thermodynamics ($N \to \infty$ limit with temperature $\ge 0$ and density $\ge 0$ fixed) associated with the trigonometric and elliptic CMS systems and their spin cousins yields its own circle of problems. It was initiated by Sutherland three decades ago, and even though a host of results on partition functions, correlation functions, fractional statistics, strong–weak coupling duality, relations to Yangians, etc., have meanwhile been obtained, many questions are still open. This area also has links with random-matrix theory, but the input from this field is thus far limited to certain discrete couplings.
The above $N$-dimensional integrable systems are related to a great many infinite-dimensional integrable systems, both at the classical and at the quantum level. On the one hand, there are structural analogs that have been used to advantage in the study of CMS systems, including Lax pair and R-matrix formulations, zero-curvature representations, bi-Hamiltonian formalism, Bäcklund transformations, time discretizations, and tools such as Baker–Akhiezer functions, Bethe ansatz, separation of variables, and Baxter-type Q-operators.
On the other hand, there are striking physical similarities between various soliton field theories (a prominent one being the sine-Gordon field theory) and infinite soliton lattices (in particular several Toda type lattices), and the CMS systems for special parameter values. Particularly conspicuous are the ties between the classical CMS systems and the KP and two-dimensional Toda hierarchies. The latter relations actually extend beyond the solitons, including rational and theta function solutions.
CMS systems are relevant in various other contexts not yet mentioned. A prominent one among these is a class of supersymmetric gauge field theories. In this quantum context, the classical CMS systems have surfaced in the description of moduli spaces encoding the vacuum structure (Seiberg–Witten theory). Equally surprising, certain classical CMS systems (with internal degrees of freedom) have found a second application in a quantum context, namely in the description of quantum chaos (level repulsion).
We conclude this introduction by listing additional disparate subjects where connections with CMS type systems have been found. These include the theory of Sklyanin, affine Hecke, Kac–Moody, Virasoro and W-algebras, equations of Knizhnik–Zamolodchikov, Yang–Baxter, Witten–Dijkgraaf–Verlinde–Verlinde, and Painlevé type, Gaudin,
Hitchin, Wess–Zumino, matrix and quasi-exactly solvable models, Dunkl–Cherednik and Polychronakos operators, the quantum Hall effect and quantum transport, two-dimensional Yang–Mills theory, functional equations, integrable mappings, Huygens' principle, and the bispectral problem.

Classical Nonrelativistic CMS Systems

A system of $N$ nonrelativistic equal-mass $m$ particles on the line interacting via pair potentials can be described by a Hamiltonian

\[ H = \frac{1}{2m}\sum_{j=1}^{N} p_j^2 + \sum_{1 \le j < k \le N} V(x_j - x_k), \qquad m > 0 \tag*{[1]} \]

The CMS systems are defined by four distinct choices of pair potential. The simplest choice reads

\[ V(x) = \frac{g^2}{m x^2}, \qquad g > 0 \qquad \text{(I)} \tag*{[2]} \]

Hence, the coupling constant $g$ has dimension [action] (the product of [position] and [momentum]). This potential is clearly repulsive. Thus, each initial state in the phase space

\[ \Omega = \{ (x, p) \in \mathbb{R}^{2N} \mid x \in G \} \tag*{[3]} \]

where $G$ is the configuration space

\[ G = \{ x \in \mathbb{R}^{N} \mid x_N < \cdots < x_1 \} \tag*{[4]} \]

is a scattering state.
The next level is given by the hyperbolic choice

\[ V(x) = \frac{g^2 \nu^2}{m \sinh^2(\nu x)}, \qquad \nu > 0 \qquad \text{(II)} \tag*{[5]} \]

Hence, $\nu$ has dimension [position]$^{-1}$, and the previous system arises by taking $\nu$ to 0. It is clear that [5] yields again a repulsive particle system, so that each state in $\Omega$ given by [3] is a scattering state.
The highest level in the hierarchy is the elliptic level, where

\[ V(x) = \frac{g^2}{m}\, \wp(x; \omega, \omega'), \qquad \omega, -i\omega' > 0 \qquad \text{(IV)} \tag*{[6]} \]

and $\wp(x; \omega, \omega')$ denotes the Weierstrass $\wp$-function with periods $2\omega$ and $2\omega'$. It is beyond the scope of this article to elaborate on the elliptic regime, even though it is of considerable interest. It reappears in later sections as the most general regime in which integrability holds true. Indeed, a prominent feature of the elliptic case [6] is that it can be specialized both to the hyperbolic case [5] and to the trigonometric case, given by

\[ V(x) = \frac{g^2 \nu^2}{m \sin^2(\nu x)} \qquad \text{(III)} \tag*{[7]} \]

To obtain the hyperbolic specialization, one should take $\omega' = i\pi/2\nu$ and send $\omega$ to $\infty$; then [6] reduces to [5] (up to an additive constant). Likewise, [7] results from [6] by choosing $\omega = \pi/2\nu$ and taking $-i\omega'$ to $\infty$.
The physical picture associated with the trigonometric and elliptic systems is quite different from that of the rational and hyperbolic ones. Of course, the potentials [7] and [6] are again repulsive, but now the internal motion is confined and oscillatory. More specifically, due to energy conservation the phase spaces

\[ \text{(III)} \qquad \Omega_{\rm III} = G_{\rm III} \times \mathbb{R}^{N}, \qquad G_{\rm III} = \{ x_N < \cdots < x_1,\ x_1 - x_N < \pi/\nu \} \tag*{[8]} \]

\[ \text{(IV)} \qquad \Omega_{\rm IV} = G_{\rm IV} \times \mathbb{R}^{N}, \qquad G_{\rm IV} = \{ x_N < \cdots < x_1,\ x_1 - x_N < 2\omega \} \tag*{[9]} \]

are left invariant by the flow generated by the trigonometric and elliptic $N$-particle Hamiltonian, resp. Alternatively, one may interpret the trigonometric Hamiltonian as describing particles constrained to move on a circle and interacting via the inverse square potential [2]. In this picture, the quantities $2\nu x_1, \ldots, 2\nu x_N$ are viewed as angular positions on the circle, and one needs a suitable quotient of the phase space [8] by a discrete group action to describe a state of the system.
Turning to integrability aspects, we begin by noting that the total momentum Hamiltonian

\[ P = \sum_{j=1}^{N} p_j \tag*{[10]} \]

obviously Poisson commutes with the above defining Hamiltonians of the systems. For $N = 2$, therefore, integrability is plain. It is possible to write down explicitly the higher commuting Hamiltonians for $N > 2$ as well but, in the nonrelativistic setting, it is more illuminating to characterize them as the power traces or (equivalently) the symmetric functions of a so-called Lax matrix.
The Lax matrix is an $N \times N$ matrix-valued function on the phase space of the system. It plays a pivotal role not only for understanding integrability, but also for setting up an action-angle transformation. The latter issue is discussed again later. Here the more conspicuous features of the Lax matrix will be explained, focusing on the type II system for expository ease. Then one can choose

\[ L_{jj} = p_j, \qquad L_{jk} = \frac{i g \nu}{\sinh \nu (x_j - x_k)}, \qquad j, k = 1, \ldots, N, \quad j \ne k \tag*{[11]} \]

Thus, $L$ is Hermitean and we have

\[ \mathrm{tr}\, L = P, \qquad \mathrm{tr}\, L^2 = 2mH \tag*{[12]} \]
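The relations [12] can be checked numerically at a single phase-space point. The following Python sketch (standard library only) builds the type II Lax matrix of [11] and compares its traces with $P$ from [10] and with $2mH$, where $H$ is [1] with the hyperbolic potential [5]. The function names and the sample values of $x$, $p$, $g$, $\nu$ and $m$ are illustrative assumptions and do not come from the article.

```python
import math


def lax_matrix(x, p, g, nu):
    """Type II Lax matrix of [11]: L_jj = p_j, L_jk = i g nu / sinh(nu (x_j - x_k))."""
    n = len(x)
    return [
        [complex(p[j]) if j == k else 1j * g * nu / math.sinh(nu * (x[j] - x[k]))
         for k in range(n)]
        for j in range(n)
    ]


def total_momentum(p):
    """P = sum_j p_j, cf. [10]."""
    return sum(p)


def hamiltonian(x, p, g, nu, m):
    """H of [1] with the hyperbolic pair potential [5]."""
    n = len(x)
    kinetic = sum(pj ** 2 for pj in p) / (2.0 * m)
    potential = sum(
        g ** 2 * nu ** 2 / (m * math.sinh(nu * (x[j] - x[k])) ** 2)
        for j in range(n) for k in range(j + 1, n)
    )
    return kinetic + potential


def trace(a):
    """Sum of the diagonal entries of a square matrix given as nested lists."""
    return sum(a[j][j] for j in range(len(a)))


def trace_of_square(a):
    """tr(A^2) = sum_{j,k} A_jk A_kj, computed without forming A^2."""
    n = len(a)
    return sum(a[j][k] * a[k][j] for j in range(n) for k in range(n))


def is_hermitian(a, tol=1e-12):
    """True if A_jk equals the complex conjugate of A_kj up to tol."""
    n = len(a)
    return all(abs(a[j][k] - a[k][j].conjugate()) <= tol
               for j in range(n) for k in range(n))


# Illustrative sample point (an assumption) with x_N < ... < x_1, as required by [4].
x = [1.3, 0.4, -0.7]
p = [0.5, -1.2, 0.8]
g, nu, m = 0.9, 1.1, 2.0

L = lax_matrix(x, p, g, nu)
P = total_momentum(p)
H = hamiltonian(x, p, g, nu, m)

print("L Hermitian:     ", is_hermitian(L))
print("|tr L - P|:      ", abs(trace(L) - P))
print("|tr L^2 - 2mH|:  ", abs(trace_of_square(L) - 2.0 * m * H))
```

Both differences should vanish up to rounding. The reason is that $L_{jk} L_{kj} = g^2\nu^2/\sinh^2 \nu(x_j - x_k)$ for $j \ne k$, so the off-diagonal part of $\mathrm{tr}\, L^2$ reproduces twice the sum of $mV(x_j - x_k)$ over pairs.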
(The rational Lax matrix results from [11] by taking $\nu \to 0$, and the trigonometric one by taking $\nu \to i\nu$. The elliptic Lax matrix has a similar structure, but it involves an extra 'spectral' parameter.)
Although not obvious, it is true that all of the power traces

\[ H_k = \frac{1}{k}\, \mathrm{tr}\, L^k, \qquad k = 1, \ldots, N \tag*{[13]} \]

are in involution (i.e., Poisson commute). One way to understand this involves the so-called Lax pair equation associated with the Hamiltonian flow generated by $H = H_2/m$. This involves a second $N \times N$ matrix function given by

\[ M_{jj} = -\sum_{l \ne j} \frac{i g \nu^2}{m \sinh^2 \nu (x_j - x_l)}, \qquad M_{jk} = \frac{i g \nu^2 \cosh \nu (x_j - x_k)}{m \sinh^2 \nu (x_j - x_k)}, \quad j \ne k \tag*{[14]} \]

When the positions and momenta in $L$ and $M$ evolve according to the $H$-flow, one has

\[ \dot{L}_t = [M_t, L_t] \tag*{[15]} \]

where $[\cdot, \cdot]$ is the matrix commutator. (Indeed, [15] amounts to the Hamilton equations, as is readily checked.) Since $M$ is anti-Hermitean, it is not difficult to derive from this Lax pair equation that the flow is isospectral: $L_t$ is related to $L_0$ by a unitary transformation $L_t = U_t L_0 U_t^{*}$ obtained from $M_t$, so that the spectrum of $L_t$ is time independent.
This argument already shows the existence of $N$ conserved quantities under the $H$-flow, namely the $N$ eigenvalues of $L$. It is, however, simpler to work with either the power traces $H_k$ given by [13] or with the symmetric functions $S_k$ of $L$, given by

\[ \det(1_N + \lambda L) = \sum_{k=0}^{N} \lambda^k S_k \tag*{[16]} \]

These Hamiltonians depend only on the eigenvalues of $L$, so they are also conserved under the flow. Note that

\[ S_1 = P, \qquad S_2 = \tfrac{1}{2} P^2 - mH \tag*{[17]} \]

To see why these Hamiltonians are in involution, one can invoke the long-time asymptotics of the $H$-flow. It reads

\[ p(t) \sim \hat{p}, \quad \hat{p}_N < \cdots < \hat{p}_1; \qquad x_j(t) \sim \hat{x}_j + t \hat{p}_j / m, \quad j = 1, \ldots, N, \quad t \to \infty \tag*{[18]} \]

Accordingly, one gets

\[ L_t \sim \mathrm{diag}(\hat{p}_1, \ldots, \hat{p}_N) = L_\infty, \qquad t \to \infty \tag*{[19]} \]

Since the time evolution is a canonical transformation and the Poisson brackets $\{H_k, H_l\}$ are time independent (by the Jacobi identity), it now readily follows from [19] that they vanish. (Indeed, $H_k$ and $H_l$ reduce to power traces of $L_\infty$, and the asymptotic momenta $\hat{p}_1, \ldots, \hat{p}_N$ Poisson commute.)

Quantum Nonrelativistic CMS Systems

The canonical quantization prescription

\[ p_j \to -i\hbar\, \partial/\partial x_j, \qquad j = 1, \ldots, N \tag*{[20]} \]

($\hbar$ being the Planck constant) gives rise to an unambiguous quantum Hamiltonian

\[ H = -\frac{\hbar^2}{2m} \sum_{j=1}^{N} \partial_j^2 + \sum_{1 \le j < k \le N} V(x_j - x_k) \tag*{[21]} \]

for any classical Hamiltonian [1]. Thus, the defining Hamiltonians of the above systems give rise to well-defined partial differential operators (PDOs), which act on suitable dense subspaces of the Hilbert space $L^2(G_\sigma, dx)$, $\sigma$ = I, ..., IV, with $G_{\rm I}$ and $G_{\rm II}$ given by $G$ in [4], and $G_{\rm III}$, $G_{\rm IV}$ by [8] and [9], respectively.
We recall that there is no general result ensuring that a classically integrable system admits an integrable quantum version. More precisely, when one substitutes [20] in $N$ Poisson commuting Hamiltonians, it need not be true that they commute as quantum operators, even when no ordering ambiguities are present. For the power trace Hamiltonians such ambiguities do occur. (For example, [11] gives rise to a term in $H_3$ proportional to $p_1/\sinh^2 \nu (x_1 - x_2)$.) On the other hand, no noncommuting factors occur in the quantization of $S_1, \ldots, S_N$. To verify this, one need only note that $S_k$ equals the sum of all $k \times k$ principal minors of $L$, cf. [16]; choosing a diagonal element $p_j$ in a summand, one therefore has no dependence on $x_j$ in the remaining factors, hence no ordering ambiguity.
As a result, the prescription [20] yields $N$ unambiguous operators $S_k(x, -i\hbar\nabla)$, which are moreover formally self-adjoint on $L^2(G_\sigma, dx)$ for each of the four cases $\sigma$ = I, ..., IV. Although by no means obvious, it is true that these operators do commute. Thus, integrability is preserved under quantization of the above systems. Now the power traces of a matrix can be expressed as polynomials in the symmetric functions (via the Newton
identities), so this yields an ordering ensuring that the quantized power traces commute as well.
Just as the action-angle transformation for a classically integrable system 'diagonalizes' all of the Poisson commuting Hamiltonians at once (in the sense that the transformed Hamiltonians depend only on the action variables), one expects that there exists a unitary operator that transforms all of the commuting Hamiltonians to diagonal form. In the classical setting, the existence of this diagonalizing map follows (under suitable technical restrictions) from the Liouville–Arnold theorem, whereas in the quantum context the existence of such a joint eigenfunction transformation is a far more delicate issue. This problem is briefly discussed later again, noting here that the solutions obtained to date vary considerably in completeness and explicitness for the four regimes.

Classical Relativistic CMS Systems

The nonrelativistic spacetime symmetry group is the Galilei group. Its Lie algebra is represented by the time translation generator $H$ given by [1], space translation generator $P$ given by [10], and the Galilei boost generator

\[ B = -m \sum_{j=1}^{N} x_j \tag*{[22]} \]

More precisely, the Poisson brackets are given by

\[ \{H, P\} = 0, \qquad \{H, B\} = P, \qquad \{P, B\} = Nm \tag*{[23]} \]

so that the last bracket does not vanish (as is the case for the Galilei Lie algebra). This deviation is inconsequential, however, since the constant $Nm$ (central extension) yields trivial Hamilton equations.
The relativistic spacetime symmetry group (Poincaré group) yields a Lie algebra that differs from [23] only in $Nm$ being replaced by $H/c^2$, where $c$ is the speed of light. Clearly, the functions

\[ H = mc^2 \sum_{j=1}^{N} \cosh\!\left(\frac{p_j}{mc}\right), \qquad P = mc \sum_{j=1}^{N} \sinh\!\left(\frac{p_j}{mc}\right) \tag*{[24]} \]

together with $B$ given by [22] give rise to these altered Poisson brackets. Physically, these three generators describe a system of $N$ relativistic free mass-$m$ particles in terms of their rapidities $p_j/mc$.
A natural ansatz to take interaction into account now reads

\[ H = mc^2 \sum_{j=1}^{N} \cosh\!\left(\frac{p_j}{mc}\right) V_j(x), \qquad P = mc \sum_{j=1}^{N} \sinh\!\left(\frac{p_j}{mc}\right) V_j(x), \qquad V_j(x) = \prod_{k \ne j} f(x_j - x_k) \tag*{[25]} \]

Indeed, it is plain that this still entails

\[ \{H, B\} = P, \qquad \{P, B\} = H/c^2 \tag*{[26]} \]

But to obtain a relativistic particle system, the time and space translations must also commute. The corresponding requirement $\{H, P\} = 0$ yields a severe constraint on the pair potential function $f(x)$ in [25] whenever $N > 2$. (For $N = 2$, one gets $\{H, P\} = 0$ irrespective of the choice of $f$.)
As it turns out, the vanishing requirement is satisfied when

\[ f^2(x) = a + b\, \wp(x) \tag*{[27]} \]

where $a$, $b$ are constants and $\wp(x)$ is the Weierstrass function already encountered. Taking, for example, $a, b > 0$, one can take the positive square root of the right-hand side of [27]. This choice of $f(x)$ yields the defining Hamiltonian of the relativistic elliptic system (type IV). In the three degenerate cases, it is convenient to choose

\[ f(x) = \begin{cases} \left( 1 + g^2 / m^2 c^2 x^2 \right)^{1/2} & \text{(I)} \\ \left( 1 + \sin^2(\nu g / mc) / \sinh^2(\nu x) \right)^{1/2} & \text{(II)} \\ \left( 1 + \sinh^2(\nu g / mc) / \sin^2(\nu x) \right)^{1/2} & \text{(III)} \end{cases} \tag*{[28]} \]

It is an elementary exercise to check that this implies

\[ \lim_{c \to \infty} (H - Nmc^2) = H_{\rm nr}, \qquad \lim_{c \to \infty} P = P_{\rm nr} \tag*{[29]} \]

where $H_{\rm nr}$ and $P_{\rm nr}$ are the above nonrelativistic time and space translation generators. Hence, the defining Hamiltonians of the relativistic systems reduce to their nonrelativistic counterparts in the limit $c \to \infty$.
The special character of the function [27] makes itself felt not only in ensuring Poincaré invariance, but also in entailing integrability. To begin with, note that the functions

\[ S_{\pm N} = \exp\!\Big( \pm \beta \sum_{j=1}^{N} p_j \Big), \qquad \beta = 1/mc \tag*{[30]} \]
commute with $H$ and $P$, so that integrability for $N = 3$ is plain. More generally, the Hamiltonians

\[ S_{\pm l} = \sum_{\substack{I \subset \{1, \ldots, N\} \\ |I| = l}} \exp\!\Big( \pm \beta \sum_{j \in I} p_j \Big) \prod_{\substack{j \in I \\ k \notin I}} f(x_j - x_k), \qquad l = 1, \ldots, N \tag*{[31]} \]

can be shown to mutually commute. Clearly, one has

\[ S_{-l} = S_{-N}\, S_{N-l}, \qquad l = 1, \ldots, N - 1 \tag*{[32]} \]

and

\[ H = (S_1 + S_{-1}) / 2m\beta^2, \qquad P = (S_1 - S_{-1}) / 2\beta \tag*{[33]} \]

As anticipated by the notation, the functions $S_1, \ldots, S_N$ may be viewed as the symmetric functions of a Lax matrix. More precisely, in the elliptic case this is true up to multiplicative constants that depend on a spectral parameter occurring in the Lax matrix. As before, only the Lax matrix for the type II system is specified here. In this case, one can dispense with the spectral parameter and choose

\[ L_{jk} = e_j C_{jk} e_k, \qquad j, k = 1, \ldots, N \tag*{[34]} \]

where

\[ e_j = \exp(\nu x_j + \beta p_j / 2) \prod_{l \ne j} f(x_j - x_l)^{1/2} \tag*{[35]} \]

\[ C_{jk} = \exp(-\nu (x_j + x_k))\, \frac{\sinh(i \nu \beta g)}{\sinh \nu (x_j - x_k + i \beta g)} \tag*{[36]} \]

In [35], $f(x)$ is the type II function given by [28]. The matrix $C$ arises from Cauchy's matrix $1/(w_j - z_k)$ via a suitable substitution, and Cauchy's identity

\[ \det\!\left( \frac{1}{w_j - z_k} \right)_{j,k=1}^{N} = \prod_{j=1}^{N} \frac{1}{w_j - z_j} \prod_{1 \le j < k \le N} \frac{(w_j - w_k)(z_j - z_k)}{(w_j - z_k)(z_j - w_k)} \tag*{[37]} \]

ensures that [34] yields the Hamiltonians $S_l$ of [31].
To conclude this section, we point out that the relation

\[ L = 1_N + \beta L_{\rm nr} + O(\beta^2), \qquad \beta \to 0 \tag*{[38]} \]

where $L_{\rm nr}$ denotes the nonrelativistic Lax matrix [11], can be used to deduce the involutivity of the nonrelativistic Hamiltonians from that of their relativistic counterparts.

Quantum Relativistic CMS Systems

When the canonical quantization prescription [20] is applied to the classical Hamiltonians [31] with $f(x) = 1$, one obtains commuting quantum operators whose action is exemplified by

\[ \exp\!\left( -i \frac{\hbar}{mc} \frac{d}{dx} \right) F(x) = F\!\left( x - i \frac{\hbar}{mc} \right) \tag*{[39]} \]

That is, the operators act on functions that have an analytic continuation in $x_1, \ldots, x_N$ from the real line $\mathbb{R}$ to a strip around $\mathbb{R}$ in the complex plane $\mathbb{C}$, whose width is at least $2\hbar/mc$.
Operators of this type are called analytic difference operators (henceforth A$\Delta$Os). The choice $f(x) = 1$ amounts to the free case $g = 0$ in [28]. For $g \ne 0$, however, the canonical quantization exemplified by [39] yields noncommuting A$\Delta$Os. Thus, the factor ordering following from [31] would entail that integrability breaks down at the quantum level.
As mentioned before, there is no general result guaranteeing that a different ordering that preserves integrability exists. Even so, this is true in the present case. Specifically, the function $f(x)$ can be factorized as $f_+(x) f_-(x)$, and then the A$\Delta$Os

\[ S_{\pm l} = \sum_{\substack{I \subset \{1, \ldots, N\} \\ |I| = l}} \prod_{\substack{j \in I \\ k \notin I}} f_{\mp}(x_j - x_k) \cdot \exp\!\Big( \mp i \hbar \beta \sum_{j \in I} \partial_j \Big) \cdot \prod_{\substack{j \in I \\ k \notin I}} f_{\pm}(x_j - x_k) \tag*{[40]} \]

do commute. In the elliptic case [27], this factorization involves the Weierstrass $\sigma$-function, and commutativity can be encoded in a sequence of functional equations satisfied by the $\sigma$-function. For the type I–III systems the pertinent factorization of [28] is given by

\[ f_{\pm}(x) = \begin{cases} (1 \pm i \beta g / x)^{1/2} & \text{(I)} \\ \left( \sinh \nu (x \pm i \beta g) / \sinh \nu x \right)^{1/2} & \text{(II)} \\ \left( \sin \nu (x \pm i \beta g) / \sin \nu x \right)^{1/2} & \text{(III)} \end{cases} \tag*{[41]} \]

(Here one has $g > 0$, and the choice of square root is such that $f_{\pm}(x) \to 1$ for $g \downarrow 0$.)
The nonrelativistic limit $c \to \infty$ of the quantum Hamiltonians [33] can be determined by expanding $S_1$ and $S_{-1}$ in a power series in $\beta = 1/mc$. In this way, one obtains once more [29], except for a small, but crucial change in $H_{\rm nr}$: instead of the coupling constant dependence $g^2$ in the potential energy, one gets $g(g - \hbar)$. The extra term arises from the action of the term linear in $\beta$ in the expansion of the exponential on the term linear in $\beta$ in the expansion of the functions $f_{\pm}(x)$.
From the perspective of the nonrelativistic quantum CMS systems, the change $g^2 \to g(g - \hbar)$ appears ad hoc. As it transpires, however, the different
dependence on $g$ ensures that the eigenfunctions of $H_{nr}$ depend on $g$ in a far simpler way. This will become clear shortly.

Action-Angle Transforms and Duality

Under certain technical assumptions, any integrable system given by $N$ independent Poisson commuting Hamiltonians $S_1(x, p), \ldots, S_N(x, p)$ on a $2N$-dimensional phase space admits local canonical transformations to action-angle variables. Like the spectral theorem on the quantum level, this structural result is of limited practical value. Indeed, just as the spectral theorem yields no concrete information concerning eigenfunctions, bound-state energies, scattering, etc., associated with a given self-adjoint Hamiltonian, the Liouville–Arnold theorem only yields general insight in the type of motion that can occur and the geometric character of the local maps (in terms of invariant tori).

To fully comprehend (solve) a given integrable system, one should render the associated action-angle map as concrete as possible. For the CMS type systems, a complete solution to this problem has only been achieved for the systems of type I–III. The motion in the trigonometric systems is oscillatory, so that a closeup via the action-angle transform involves extensive geometric constructions. By contrast, the type I and II systems are scattering systems, and here the action-angle map can be tied in with the classical wave maps (Møller transformations).

We now sketch some salient features of the action-angle maps for systems of type I and II. In all cases the map (denoted $\Phi$) is a canonical transformation from the phase space $\Omega$ (eqn [3]) with 2-form $dx \wedge dp$ to the phase space

\[ \hat{\Omega} = \{ (\hat{x}, \hat{p}) \in \mathbb{R}^{2N} \mid \hat{p} \in G \} \qquad [42] \]

with 2-form $d\hat{x} \wedge d\hat{p}$. Thus, the actions $\hat{p}_1, \ldots, \hat{p}_N$ vary over $G$ given by [4] and the angles $\hat{x}_1, \ldots, \hat{x}_N$ over $\mathbb{R}$. Consequently, $\hat{\Omega}$ amounts to $\Omega$ with $x$ and $p$ interchanged.

As should be the case, the transformed commuting Hamiltonians

\[ \hat{S}_k = S_k \circ \Phi^{-1}, \quad k = 1, \ldots, N \qquad [43] \]

depend only on the action vector $\hat{p}$. To be specific, they arise from $S_k(x, p)$ by taking $g = 0$ (no interaction, hence no $x$ dependence) and substituting $p \to \hat{p}$. Indeed, the actions $\hat{p}_k$ are the $t \to \infty$ limits of the momenta $p_k(t)$, where the $t$ dependence refers to the defining Hamiltonian of the system.

As it happens, the Lax matrix $L$ is of decisive importance to concretize the action-angle map $\Phi$, and in particular to reveal its hidden duality properties. The starting point is a commutation relation of $L(x, p)$ with a diagonal matrix $A(x)$ given by

\[ A(x) = \mathrm{diag}(d(x_1), \ldots, d(x_N)), \qquad d(y) = \begin{cases} y & \text{(I)} \\ \exp(2\nu y) & \text{(II)} \end{cases} \qquad [44] \]

Obviously, the symmetric functions $D_k(x)$ of $A(x)$ yield an integrable system on $\Omega$, so the Hamiltonians

\[ \hat{D}_k(\hat{x}, \hat{p}) = (D_k \circ \Phi^{-1})(\hat{x}, \hat{p}), \quad k = 1, \ldots, N \qquad [45] \]

yield an integrable system on the action-angle phase space $\hat{\Omega}$. The crux of the matter is now that these systems are familiar: they are also systems of type I and II!

To be specific, let us denote the dual systems just described by a caret, and the nonrelativistic/relativistic systems by a suffix nr/rel, resp. Then the duality properties alluded to are given by

\[ \hat{\mathrm{I}}_{nr} \simeq \mathrm{I}_{nr}, \quad \hat{\mathrm{I}}_{rel} \simeq \mathrm{II}_{nr}, \quad \hat{\mathrm{II}}_{nr} \simeq \mathrm{I}_{rel}, \quad \hat{\mathrm{II}}_{rel} \simeq \mathrm{II}_{rel} \qquad [46] \]

and $\Phi^{-1}$ serves as the action-angle map for the dual systems.

In order to sketch why this state of affairs holds true for the $\mathrm{II}_{rel}$ system, recall that its Lax matrix is given by [34]. From this, one readily checks the commutation relation

\[ \coth(i\beta\nu g)\,[A, L] = 2\,e \otimes e - (AL + LA) \qquad [47] \]

Since $L$ is Hermitean, there exists a unitary $U$ diagonalizing $L$. It can now be shown that the spectrum of $L$ is positive and nondegenerate, and that $U^* e$ has nonzero components. The gauge ambiguity in $U$ (given by a permutation matrix and diagonal phase matrix) can, therefore, be fixed by requiring

\[ U^* L U = \mathrm{diag}(\exp(\beta\hat{p}_1), \ldots, \exp(\beta\hat{p}_N)), \quad \hat{p}_N < \cdots < \hat{p}_1 \qquad [48] \]

\[ (U^* e)_j > 0, \quad j = 1, \ldots, N \qquad [49] \]

A suitable reparametrization of $U^* e$ then yields the angle vector $\hat{x}$.

As a consequence, $U^* A U$ becomes a function of $\hat{x}$ and $\hat{p}$. In detail, one finds

\[ (U^* A U)(\hat{x}, \hat{p}) = L(\beta/2, 2\nu; \hat{p}, \hat{x})^T \qquad [50] \]

where $L(\nu, \beta; x, p)$ is given by [34] and $T$ denotes the transpose. Therefore, the dual Lax matrix $\hat{A} = U^* A U$ is essentially equal to $L$, explaining the self-duality $\hat{\mathrm{II}}_{rel} \simeq \mathrm{II}_{rel}$ announced above.
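The commutation relation [47] can be probed numerically. The sketch below is only an illustration under stated assumptions: since [34] is not reproduced in this part of the article, the Lax matrix is taken to be of the Cauchy type $L_{jk} = e_j e_k \sinh(a)/\sinh(\nu(x_j - x_k) + a)$, with $a$ standing in for $i\beta\nu g$ and with an arbitrary positive vector `e` in place of the momentum-dependent factors. The function names and sample values are hypothetical. With $A = \mathrm{diag}(\exp(2\nu x_j))$ as in [44], the identity holds with the vector $\tilde{e}_j = \exp(\nu x_j)\,e_j$ on the right-hand side.

```python
import numpy as np

def lax_matrix(x, e, a, nu):
    """Assumed Cauchy-type form: L_jk = e_j e_k sinh(a) / sinh(nu (x_j - x_k) + a)."""
    diff = nu * (x[:, None] - x[None, :])
    return np.outer(e, e) * np.sinh(a) / np.sinh(diff + a)

def commutation_residual(x, e, a, nu):
    """Largest entry of coth(a) [A, L] - (2 e~ (x) e~ - (A L + L A))."""
    L = lax_matrix(x, e, a, nu)
    A = np.diag(np.exp(2.0 * nu * x))      # type II choice d(y) = exp(2 nu y)
    e_tilde = np.exp(nu * x) * e           # rescaled vector entering the rank-one term
    lhs = (A @ L - L @ A) / np.tanh(a)     # coth(a) [A, L]
    rhs = 2.0 * np.outer(e_tilde, e_tilde) - (A @ L + L @ A)
    return float(np.max(np.abs(lhs - rhs)))

def hermiticity_residual(x, e, a, nu):
    """Largest entry of L - L^dagger; vanishes for real e and purely imaginary a."""
    L = lax_matrix(x, e, a, nu)
    return float(np.max(np.abs(L - L.conj().T)))

# Hypothetical sample data: four particles, ordered positions, positive weights.
rng = np.random.default_rng(0)
x = np.sort(rng.normal(size=4))[::-1].copy()
e = rng.uniform(0.5, 1.5, size=4)
nu = 0.7
a = 0.3j   # plays the role of i * beta * nu * g

print("commutation residual:", commutation_residual(x, e, a, nu))
print("hermiticity residual:", hermiticity_residual(x, e, a, nu))
```

Both residuals should be at the level of rounding error. The check only confirms the algebraic identity for the assumed matrix form; it says nothing about the gauge fixing [48], [49] or the reparametrization leading to [50].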
With the action-angle transform under explicit control, much more can be said about the solutions to Hamilton's equations for each of the commuting Hamiltonians, both as regards finite times and as regards long-time asymptotics (scattering). It is beyond the scope of this article to enlarge on this, but it is worth mentioning that the scattering reveals the solitonic character of the particles. Indeed, the set of asymptotic momenta $p_1, \ldots, p_N$ is conserved under the scattering and the asymptotic position shifts are factorized in terms of pair shifts. A quite remarkable feature of the type I systems is that the shifts actually vanish (billiard ball scattering).

Eigenfunction Transforms and Duality

Both at the relativistic and at the nonrelativistic level the commuting quantum Hamiltonians $S_1, \ldots, S_N$ are formally self-adjoint on the Hilbert space $L^2(G_\tau, dx)$, $\tau = \mathrm{I}, \ldots, \mathrm{IV}$. Thus, it may be expected that it is possible to construct a unitary eigenfunction transform

\[ \Psi_\tau : L^2(G_\tau, dx) \to L^2(\hat{G}_\tau, d\mu_\tau(p)), \quad \tau = \mathrm{I}, \ldots, \mathrm{IV} \qquad [51] \]

diagonalizing $S_k$ as multiplication by a real-valued function $M_k(p)$. Here $\hat{G}_\tau$ encodes the joint spectrum and $d\mu_\tau(p)$ is a suitable measure on $\hat{G}_\tau$.

Obviously, this expectation is borne out in the free case $g = 0$. Then, $\Psi_\tau$ is basically Fourier transformation, its kernel consisting of a sum of joint eigenfunctions

\[ \exp(i x \cdot \sigma(p)/\hbar), \quad \sigma \in S_N \qquad [52] \]

with $\sigma$ ranging over the permutation group $S_N$. For $\tau = \mathrm{I}, \mathrm{II}$, one can take $\hat{G}_\tau = G_\tau = G$ (eqn [4]) and $d\mu_\tau(p) = dp$. Here one gets

\[ M_k(p) = \sum_{1 \le i_1 < \cdots < i_k \le N} \begin{cases} p_{i_1} \cdots p_{i_k} \\ \exp(\beta p_{i_1}) \cdots \exp(\beta p_{i_k}) \end{cases} \qquad [53] \]

in the nonrelativistic and relativistic case, resp. For $\tau = \mathrm{III}, \mathrm{IV}$, one needs to take into account periodic boundary conditions on the walls of $G_\tau$, yielding a discrete joint spectrum after the center-of-mass motion is omitted. (With the above choices of $G_{\mathrm{III}}$ and $G_{\mathrm{IV}}$, cf. [8] and [9], the center-of-mass motion is a free motion along the line, so the total momentum still varies continuously.) Of course, the diagonalized $S_k$ are once more given by [53], since the kernel of $\Psi_\tau$ consists of free boson states.

Taking next $g > 0$, the above expectation has not been confirmed for all of the eight regimes involved. This is not only because in some cases not even the existence of joint eigenfunctions has been shown, but also because in the relativistic case the unitarity of $\Psi_{\mathrm{II}}$ and $\Psi_{\mathrm{IV}}$ already breaks down for $N = 2$ when $g$ increases beyond a critical value, cf. [57] below. It is quite likely that this happens for $N > 2$ as well, but this is not readily apparent from the current fragmentary knowledge on joint eigenfunctions for $N > 2$.

The only two cases where the $g > 0$ joint eigenfunction transform is of an elementary nature are the $\mathrm{III}_{nr}$ and $\mathrm{III}_{rel}$ cases. Indeed, the joint eigenfunctions describing the internal motion are of the form

\[ \psi_n(x) = W(x)^{1/2} P_n(x), \quad n \in \mathbb{N}^{N-1} \qquad [54] \]

Here,

\[ W(x) = \prod_{1 \le j < k \le N} w(x_j - x_k) \qquad [55] \]

is a positive weight function on $G_{\mathrm{III}}$ and the $P_n(x)$ are multivariable orthogonal polynomials. Thus, $P_n(x)$ is a finite linear combination of the above free boson states, with $p$ in [52] a linear function of $n$. For the $\mathrm{III}_{nr}$ case, these eigenfunctions were already found by Sutherland. (Here, the functions $P_n(x)$ amount to polynomials, often called the Jack polynomials, which arose in a statistics context.) The $\mathrm{III}_{rel}$ polynomials may be viewed as the special $A_{N-1}$ case of Macdonald's orthogonal $q$-polynomials for arbitrary root systems, with

\[ q = \exp(-2\beta\nu\hbar) \qquad [56] \]

(Note that $q$ converges to 1 both in the nonrelativistic limit $c \to \infty$ and in the classical limit $\hbar \to 0$.)

For the $\mathrm{II}_{nr}$ case, the joint eigenfunctions were found and studied a couple of decades ago by Heckman and Opdam, yielding a multivariable hypergeometric transform. Indeed, for $N = 2$, the eigenfunctions can be expressed in terms of the hypergeometric function ${}_2F_1$, as has been known since the early days of quantum mechanics. Likewise, the arbitrary-$N$ $\mathrm{I}_{nr}$ joint eigenfunction transform (studied in detail by de Jeu) can be viewed as a multivariable Hankel transform, the $N = 2$ kernel being essentially a Hankel function.

Much less is known concerning $\mathrm{IV}_{nr}$ eigenfunctions, and a fortiori for the associated transform $\Psi_{\mathrm{IV}}$. For $N = 2$ the time-independent Schrödinger equation amounts to the Lamé equation. Hence, solutions are Lamé functions that can be studied in particular via Fuchs theory (regular singularities). A far more explicit form of the eigenfunctions dates back to work by Hermite in the nineteenth century. More precisely, provided the $g$ dependence of the
defining Hamiltonian is changed from $g^2$ to $g(g - \hbar)$ (a change already encountered above), Hermite's results apply to couplings $g = l\hbar$, $l = 2, 3, 4, \ldots$ His eigenfunctions have a structure that is nowadays referred to as the Bethe ansatz. For the same $g$ values and arbitrary $N$, $H_{nr}$ eigenfunctions of Bethe ansatz type were found and studied by Felder and Varchenko, but even for these $g$ values much remains to be done to achieve a complete understanding of the $\Psi_{\mathrm{IV}}$ transform.

A quite different approach, due to Komori and Takemura, does yield rather detailed information on $\Psi_{\mathrm{IV}}$ for arbitrary $g > 0$. The key feature of their strategy is to view the $\mathrm{IV}_{nr}$ case as a perturbation of the $\mathrm{III}_{nr}$ case. This entails, however, that the validity of their results is restricted to large imaginary period of the $\wp$-function.

For the $\mathrm{IV}_{rel}$ system, there are only rather complete results on $\Psi_{\mathrm{IV}}$ for $N = 2$. More specifically, the eigenfunction transform is known to be unitary for

\[ g \in [0, \hbar + \pi/\beta\nu] \qquad [57] \]

and a dense set in a corresponding parameter space. (For $g$ outside this interval, unitarity is violated.) The kernel of $\Psi_{\mathrm{IV}}$ involves eigenfunctions of Bethe ansatz structure. For $g = l\hbar$, $l = 2, 3, \ldots$ and arbitrary $N$, Bethe ansatz type $H_{rel}$ eigenfunctions were found by Billey, generalizing the Felder–Varchenko results mentioned above.

It remains to discuss the $\mathrm{I}_{rel}$ and $\mathrm{II}_{rel}$ systems. To this end, we first recall the classical dualities [46]. It is natural to expect that these dualities are still present at the quantum level. For the $\mathrm{I}_{nr}$ case, this is readily confirmed: the transform is indeed invariant under interchange of $x$ and $p$. In fact, the $N = 2$ center-of-mass Hankel transform even depends only on $(x_1 - x_2)(p_1 - p_2)$, so that self-duality is manifest in this case.

More generally, for $N = 2$ the expected dualities [46] are indeed present. The $\mathrm{II}_{nr}$ ${}_2F_1$ transform satisfies the $\mathrm{I}_{rel}$ analytic difference equation in $p_1 - p_2$ due to the contiguous relations obeyed by ${}_2F_1$. The $\mathrm{II}_{rel}$ transform is only unitary when $g$ is restricted by [57], and it is indeed self-dual in the same sense as the action-angle map (Ruijsenaars).

Turning finally to the case $N > 2$, the multivariable hypergeometric transform $\Psi_{\mathrm{II}}$ does have the expected duality property. More specifically, its inverse diagonalizes the commuting $\mathrm{I}_{rel}$ A$\Delta$Os (Chalykh). For $\mathrm{II}_{rel}$ with $N > 2$ and $g = l\hbar$, $l = 2, 3, \ldots$, Chalykh also finds elementary joint eigenfunctions with the expected self-duality. To date, no Hilbert space results for the $N > 2$ $\mathrm{II}_{rel}$ case have been obtained.

To conclude, we mention that the soliton scattering behavior at the classical level is preserved under quantization in all cases where this can be checked. That is, no new momenta are created in the scattering process and the S-matrix is factorized as a product of pair S-matrices. Moreover, for the type I cases, the S-matrix is a momentum-independent (but $g$-dependent) phase, as a quantum analog of the classical billiard ball scattering.

See also: Bethe Ansatz; Classical r-Matrices, Lie Bialgebras, and Poisson Lie Groups; Functional Equations and Integrable Systems; Integrable Discrete Systems; Integrable Systems and Algebraic Geometry; Integrable Systems in Random Matrix Theory; Integrable Systems: Overview; Isochronous Systems; Ordinary Special Functions; q-Special Functions; Quantum Calogero–Moser Systems; Seiberg–Witten Theory; Separation of Variables for Differential Equations; Sine-Gordon Equation; Toda Lattices.

Further Reading

Babelon O, Bernard D, and Talon M (2003) Introduction to Classical Integrable Systems. Cambridge: Cambridge University Press.
Calogero F (1971) Solution of the one-dimensional N-body problem with quadratic and/or inversely quadratic pair potentials. Journal of Mathematical Physics 12: 419–436.
Calogero F (2001) Classical Many-Body Problems Amenable to Exact Treatments. Berlin: Springer.
van Diejen JF and Vinet L (eds.) (2000) Calogero–Moser–Sutherland Models. Berlin: Springer.
Fock V, Gorsky A, Nekrasov N, and Rubtsov V (2000) Duality in integrable systems and gauge theories. Journal of High Energy Physics 7(28): 1–39.
Marshakov A (1999) Seiberg–Witten Theory and Integrable Systems. Singapore: World Scientific.
Moser J (1975) Three integrable Hamiltonian systems connected with isospectral deformations. Advances in Mathematics 16: 197–220.
Olshanetsky MA and Perelomov AM (1981) Classical integrable finite-dimensional systems related to Lie algebras. Physics Reports 71: 313–400.
Olshanetsky MA and Perelomov AM (1983) Quantum integrable systems related to Lie algebras. Physics Reports 94: 313–404.
Ruijsenaars SNM (1987) Complete integrability of relativistic Calogero–Moser systems and elliptic function identities. Communications in Mathematical Physics 110: 191–213.
Ruijsenaars SNM (1999) Systems of Calogero–Moser type. In: Semenoff G and Vinet L (eds.) Proceedings of the 1994 Banff Summer School Particles and Fields, pp. 251–352. Berlin: Springer.
Ruijsenaars SNM and Schneider H (1986) A new class of integrable systems and its relation to solitons. Annals of Physics (NY) 170: 370–405.
Sutherland B (1972) Exact results for a quantum many-body problem in one dimension II. Physical Review A 5: 1372–1376.
Canonical General Relativity


C Rovelli, Université de la Méditerranée et Centre de Physique Théorique, Marseilles, France
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Lagrangian formulations of general relativity (GR) were found by Hilbert and by Einstein himself, almost immediately after the discovery of the theory. The construction of Hamiltonian formulations of GR, on the other hand, has taken much longer, and has required decades of theoretical research.

The first such formulations were developed by Dirac and by Bergmann and his collaborators, in the 1950s. Their cumbersome formalism was simplified by the introduction of new variables: first by Arnowitt, Deser, and Misner in the 1960s and then by Ashtekar in the 1980s. A large number of variants and improvements of these formalisms have been developed by many other authors. Most likely the process is not over, and there is still much to learn about the canonical formulation of GR.

A number of reasons motivate the study of canonical GR. In general, the canonical formalism can be an important step towards quantum theory; it allows the identification of the physical degrees of freedom, and the gauge-invariant states and observables of the theory; and it is an important tool for analyzing formal aspects of the theory such as its Cauchy problem. All these issues are highly nontrivial, and present open problems, in GR.

In turn, the structural peculiarity and the conceptual novelty of GR have motivated re-analyses and extensions of the canonical formalism itself. The following sections discuss the source of the peculiar difficulty of canonical GR, and summarize the formulations of the theory that are most commonly used.

The Origin of the Difficulties

The reason for the complexity of the Hamiltonian formulation of GR is not so much in the intricacy of its nonlinear field equations; rather, it must be found in the conceptual novelty introduced by GR at the very foundation of the structure of mechanics.

The dynamical systems considered before GR can be formulated in terms of states evolving in time. One assumes that a time variable $t$ can be measured by a physical clock, and that certain observable quantities $A$ of the system can be measured at every instant of time. If we know the state $s$ of the system at some initial time, the theory predicts the value $A(t)$ of these quantities for any given later instant of time $t$. The space of the possible initial states $s$ is the phase space $\Gamma_0$. Observables are real functions on $\Gamma_0$. Infinitesimal time evolution can be represented as a vector field in $\Gamma_0$. This vector field is determined by the Hamiltonian, which is also a function on $\Gamma_0$. The integral lines $s(t)$ of this vector field determine the time evolution $A(t) = A(s(t))$ of the observables.

This conceptual structure is very general. It can be easily adapted to special-relativistic systems. However, it is not general enough for general-relativistic systems. GR is not formulated as the evolution of states and observables in a preferred time variable which can be measured by a physical clock. Rather, it is formulated as the relative (common) evolution of many observable quantities. Accordingly, in GR there is no quantity playing the same role as the conventional Hamiltonian. In fact, the canonical Hamiltonian density that one obtains from a Legendre transformation from a Lagrangian vanishes identically in GR.

The origin of this peculiar behavior of the theory is the following. The field equations are written as evolution equations in a time coordinate $t$. However, they are invariant under arbitrary changes of $t$. That is, if we replace $t$ with an arbitrary function $t' = t'(t)$ in a solution of the field equations, we obtain another solution. This underdetermination does not lead to a lack of predictivity in GR, because we do not interpret the variable $t$ as the measurable reading of a physical clock, as we do in non-general-relativistic theories. Rather, we interpret $t$ as a nonobservable mathematical parameter, void of physical significance. Accordingly, the notions of "state at a given time" and "value of an observable at a given time" are very unnatural in GR.

A Hamiltonian formulation of GR requires a version of the canonical formalism sufficiently general to deal with this broader notion of evolution. Generalizations of the Hamiltonian formalism have been developed by many authors, such as Dirac (see below), Souriau, Arnold, Witten, and many others. The first step in this direction was taken by Lagrange himself: Lagrange gave a time-independent interpretation of the phase space as the space $\Gamma$ of the solutions of the equations of motion (modulo gauges). As we shall see, however, consensus is still lacking on a fully satisfactory formalism.

Dirac Theory of Constrained Systems

Dirac has developed a Hamiltonian theory for mechanical systems with constraints, precisely in
view of its application to GR. Dirac's theory is beautiful, finds vast applications, and is still commonly taken as the basis to discuss Hamiltonian GR, although GR does not fit very naturally into Dirac's scheme. In the following, only the part of Dirac's theory relevant for GR is summarized.

Consider a Lagrangian system with Lagrangian variables $q_i$, with $i = 1, \ldots, n$. Call $v_i$ the corresponding velocities. Let the system be defined by the Lagrangian $L(q_i, v_i)$. The momenta are defined as functions of $q_i$ and $v_i$ by $p_i(q_i, v_i) = \partial L(q_i, v_i)/\partial v_i$. The canonical Hamiltonian $H(q_i, p_i) = v_i(q_i, p_i)\,p_i - L(q_i, v_i(q_i, p_i))$ (summation over repeated indices is understood) is obtained by inverting the function $p_i(q_i, v_i)$ and expressing the velocities as functions of the momenta, $v_i(q_i, p_i)$. The phase space $\Gamma_0$ is the space of the variables $(q_i, p_i)$. Infinitesimal time evolution is given by the vector field $V = v_i(q_i, p_i)\,\partial/\partial q_i + f_i(q_i, p_i)\,\partial/\partial p_i$, where velocities and forces are given by the Hamilton equations $v_i = \partial H/\partial p_i$ and $f_i = -\partial H/\partial q_i$.

More formally, the 2-form $\omega = dp_i \wedge dq_i$ endows $\Gamma_0$ with a symplectic structure. In the presence of such a structure, every function $A$ determines a vector field $V_A$, defined by $i_{V_A}\omega = -dA$. By integrating this field, we have a flow in $\Gamma_0$, called the flow generated by $A$. Time evolution is the flow generated by the Hamiltonian. Given two functions $A$ and $B$, their Poisson brackets are defined by the function $\{A, B\} = -V_A(B) = V_B(A)$. Therefore, the time evolution of an observable $A$ satisfies $dA/dt = \{A, H\}$. A dynamical system is completely characterized by the set $(\Gamma_0, \omega, A, H)$, where $A = (A_1, \ldots, A_N)$ is the ensemble of the observables.

A constrained system, in the sense of Dirac, is a system for which the image of the function $v_i \to p_i(q_i, v_i)$ is smaller than $\mathbb{R}^n$. We can characterize the image $I$ of the map $(q_i, v_i) \to (q_i, p_i)$ with a set of equations on $\Gamma_0$

\[ C_\alpha(q_i, p_i) = 0 \qquad [1] \]

where $\alpha = 1, \ldots, m'$. These are called the primary constraints.

The constraint surface $C$ is the largest subspace of $I$ which is preserved by time evolution. It can be characterized by adding additional constraints, still of the form [1], with $\alpha = m' + 1, \ldots, m$. These additional constraints, called secondary constraints, can be computed as the Poisson brackets of the primary constraints with the Hamiltonian (plus the Poisson brackets of these secondary constraints with the Hamiltonian, and so on, until the Poisson brackets of all the constraints with the Hamiltonian vanish on $C$). We say that an equation holds weakly if it holds on $C$.

A constrained system is first class if the Poisson brackets of the constraints among themselves vanish weakly. Maxwell theory and GR are first-class constrained systems. In a first-class constrained system, the constraints generate flows that preserve $C$ and foliate it into orbits. The space of these orbits is called the physical phase space $\Gamma_{ph}$ (see Figure 1).

Figure 1 The structure of a first-class constrained system. $\Gamma_0$: phase space; $C$: constraint surface; $\Gamma_{ph}$: physical phase space (space of the orbits); $i$: imbedding of $C$ in $\Gamma_0$; $\pi$: projection to orbit space (sending each point into its orbit).

This flow is interpreted as a gauge transformation, namely as a change of mathematical description of the same physical state. As first observed by Dirac, such interpretation is necessary if we demand a deterministic physical evolution, for the following reason. A first-class constrained system is a system in which the time evolution $q_i(t)$ of the Lagrangian variables is not completely determined by the equations of motion. (The relation between constraints and underdetermination of the evolution is simple to understand. In a Lagrangian system, the number of equations of motion is equal to the number of Lagrangian variables. If one of these equations is a constraint (between the initial velocities and initial coordinates), then one evolution equation is missing.) To recover a deterministic physical evolution, we must interpret two mathematical states that can evolve from the same initial data as describing the same physical state. As shown by Dirac, the transformations generated by the constraints are precisely the ones that implement such an identification.

It follows that the physical states must be identified with the equivalence classes of the points of $C$ under the gauge transformations generated by the constraints, namely with the orbits of their flow. It is easy to show that (locally) there is a unique symplectic 2-form $\omega_{ph}$ on $\Gamma_{ph}$ such that its pullback to $C$ is equal to the pullback of $\omega$ to $C$ ($i^*\omega = \pi^*\omega_{ph}$, see Figure 1). Physical observables $A_{ph}$ are functions on $C$ that are gauge invariant, namely constant on
the orbits. That is, they are functions on $\Gamma_{ph}$. The Hamiltonian is a physical observable. The dynamical system $(\Gamma_{ph}, \omega_{ph}, A_{ph}, H)$, where $A_{ph}$ is the ensemble of the physical observables, is a complete description of the physical system, called the gauge-invariant formulation, with no more constraints or gauges.

For instance, the phase space of Maxwell theory is coordinatized by the Maxwell potential $A_\mu(x)$, $\mu = 0, 1, 2, 3$, and its conjugate momentum $E^\mu(x)$. Since the time derivative of $A_0$ does not appear in the Maxwell action, the primary constraint is

\[ E^0(x) = 0 \qquad [2] \]

The secondary constraint turns out to be the Gauss law,

\[ \partial_a E^a(x) = 0 \qquad [3] \]

where $a = 1, 2, 3$. The first generates arbitrary transformations of $A_0$, while the second generates the time-independent gauge transformations $\delta A_a(x) = \partial_a \lambda(x)$. The pair $(A_0, E^0)$ can be dropped altogether, since it is formed by a pure gauge variable and a variable constrained to vanish. The (gauge-invariant) Hamiltonian is $H = \frac{1}{8\pi}\int d^3x\,(E_a E_a + B_a B_a)$, where $B_a = \epsilon_{abc}\partial_b A_c$ is the magnetic field and $E_a$ is easily recognized as the electric field. $E_a$ and $B_a$ are the physical observables.

General Structure of GR Constraints

GR fits into Dirac theory with a certain difficulty. Since the constraints are the generators of the gauge invariances, it is easy to determine their structure in GR. The gauge invariances of GR are given by the coordinate transformations $x^\mu \to x'^\mu = f^\mu(x)$, where $x = (\vec{x}, t)$. Accordingly, we have four primary constraints $\pi_\mu = 0$, analogous to [2], and four secondary constraints $C_\mu(x) = 0$, analogous to [3]. These are usually separated into the three momentum constraints

\[ C_a(x) = 0 \qquad [4] \]

which generate fixed-time spatial coordinate transformations and the Hamiltonian constraint

\[ C(x) = 0 \qquad [5] \]

which generates changes in the $t$ coordinate.

The metric $g_{\mu\nu}(x)$ that represents the gravitational field in Einstein's original formulation has ten independent components per point. Each first-class constraint indicates that one Lagrangian variable is a gauge degree of freedom. The physical degrees of freedom of GR are therefore $(10 - 4 - 4) = 2$ per point. In the linearized theory, these are the two degrees of freedom that describe the two polarizations of a gravitational wave of given momentum. Formulations of GR in which there are additional gauge invariances (such as Cartan's tetrad formulation, see below) have, accordingly, more constraints.

Since the Hamiltonian generates evolution in the Lagrangian evolution parameter $t$, and since such evolution can be obtained as a gauge transformation, it follows that the Hamiltonian is a constraint in GR. The vanishing of the Hamiltonian is a characteristic feature of general-relativistic systems. The Hamiltonian structure of GR is therefore determined by its phase space and its constraints. The gauge-invariant formulation of the theory is given just by the set $(\Gamma_{ph}, \omega_{ph}, A_{ph})$ and no Hamiltonian. The physical interpretation of this structure is discussed in the last section.

ADM Formalism

In Einstein's formulation, the Lagrangian variable of GR is the metric field $g_{\mu\nu}(\vec{x}, t)$ (here we use the signature $[-, +, +, +]$). Arnowitt, Deser, and Misner have introduced the following change of variables:

\[ q_{ab} = g_{ab}, \quad N = 1/\sqrt{-g^{00}}, \quad N^a = q^{ab} g_{b0} \qquad [6] \]

where $q^{ab}$ is the inverse of the three-dimensional metric $q_{ab}$, used henceforth to raise and lower space indices $a, b = 1, 2, 3$. This is equivalent to writing the invariant interval in the form

\[ ds^2 = -N^2 dt^2 + q_{ab}\,(dx^a + N^a dt)(dx^b + N^b dt) \]

These variables have an interesting geometric interpretation. Consider a family of spacelike ("ADM") surfaces $\Sigma_t$ defined by $t$ = constant. $q_{ab}$ is the 3-metric induced on the surface. $N$ is called the lapse function and $N^a$ is called the shift function. Their geometrical interpretation is illustrated in Figure 2.

When written in terms of these variables, the action of GR takes the form

\[ S[q_{ab}, N, N^a] = \int d^4x\, \sqrt{q}\, N\, [R + k_{ab} k^{ab} - k^2] \]

where $q = \det q_{ab}$ and $R$ are the determinant and the Ricci scalar of the metric $q_{ab}$;

\[ k_{ab} = \frac{1}{2N}\,(\partial_t q_{ab} - D_a N_b - D_b N_a) \]

is the extrinsic curvature of the constant time surface; and $D_a$ is the covariant derivative of $q_{ab}$. This action is independent of the time derivatives of
$N$ and $N^a$. The conjugate momenta $\pi$ and $\pi_a$ of these quantities are therefore the primary constraints and the pairs $(\pi, N)$ and $(\pi_a, N^a)$ can be taken out of the phase space as for the pair $(E^0, A_0)$ in the Maxwell example. We can therefore take the 3-metric $q_{ab}(x)$ and its conjugate momentum $p^{ab}(x)$ as the canonical variables of GR. The momentum is related to the velocity $\partial_t q_{ab}$ by

\[ p^{ab} = \sqrt{q}\,(k^{ab} - k\,q^{ab}) \]

where $k = k_{ab} q^{ab}$.

Figure 2 The geometrical interpretation of the lapse $N(\vec{x}, t)$ and shift $N^a(\vec{x}, t)$ fields. Two ADM surfaces, defined by the values $t$ and $t + dt$, are displayed. $N(\vec{x}, t)\,dt$ is the proper length of the vector joining the two surfaces, normal to the first surface at $(\vec{x}, t)$. This is the proper time lapsed between the two surfaces for an observer at rest on the first surface at $(\vec{x}, t)$. The quantity $dx^a = N^a(\vec{x}, t)\,dt$ is the shift (the displacement) between the endpoint of this vector and the point $(\vec{x}, t + dt)$ having the same spatial coordinates as $(\vec{x}, t)$.

The secondary constraints [4] and [5] turn out to be

\[ C_a = \sqrt{q}\, D_b\!\left(\frac{1}{\sqrt{q}}\, p^b{}_a\right) = 0 \qquad [7] \]

and

\[ C = \frac{1}{\sqrt{q}}\left(p_{ab}\,p^{ab} - \frac{1}{2}\,p^2\right) - \sqrt{q}\,R = 0 \qquad [8] \]

where $p = p^{ab} q_{ab}$.

If the two fields $q_{ab}(\vec{x}, t)$ and $p^{ab}(\vec{x}, t)$ satisfy the Hamilton equations

\[ \frac{\partial q_{ab}(\vec{x}, t)}{\partial t} = \{ q_{ab}(\vec{x}, t), H(t) \} \qquad [9] \]

\[ \frac{\partial p^{ab}(\vec{x}, t)}{\partial t} = \{ p^{ab}(\vec{x}, t), H(t) \} \qquad [10] \]

where

\[ H(t) = \int d^3x\, \left[ N(\vec{x}, t)\, C(q_{ab}(\vec{x}, t), p^{ab}(\vec{x}, t)) + N^a(\vec{x}, t)\, C_a(q_{ab}(\vec{x}, t), p^{ab}(\vec{x}, t)) \right] \]

with arbitrary functions $N(\vec{x}, t)$, $N^a(\vec{x}, t)$, then the metric $g_{\mu\nu}(\vec{x}, t)$, defined from $q_{ab}$, $N$, $N^a$ by eqn [6], is the general solution of the vacuum Einstein equation $\mathrm{Ricci}[g] = 0$. Therefore, these equations provide a Hamiltonian form of the Einstein field equation.

Tetrad Formalism

The tetrad formalism, developed by Cartan, Weyl, and Schwinger, has definite advantages with respect to the metric formalism. It allows the coupling of fermion fields to GR and is, therefore, needed to couple the standard model to GR. In the tetrad formalism, the gravitational field is represented by four covariant fields $e^I_\mu(x)$, where $I, J, \ldots = 0, 1, 2, 3$ are flat Lorentz indices raised and lowered with the Minkowski metric $\eta_{IJ} = \mathrm{diag}[-1, 1, 1, 1]$. The relation with the metric formalism is given by

\[ g_{\mu\nu} = \eta_{IJ}\, e^I_\mu e^J_\nu \]

In this formulation, GR has an additional local SO(3,1) gauge invariance, given by local Lorentz transformations on the $I$ indices. The corresponding canonical formalism is usually defined in a gauge in which $e^0_a = 0$, where $i, j, \ldots = 1, 2, 3$ are flat three-dimensional indices raised and lowered with the $\delta_{ij} = \mathrm{diag}[1, 1, 1]$. In this gauge, the Lorentz group is reduced to the local SO(3) group of spatial transformations, and the ADM variables are defined by

\[ e^I_\mu = \begin{pmatrix} N & N^i \\ 0 & e^i_a \end{pmatrix} \qquad [11] \]

where $N^i = e^i_a N^a$. This is equivalent to writing the invariant interval in the form

\[ ds^2 = -N^2 dt^2 + (e_{ai}\,dx^a + N_i\,dt)(e^i_b\,dx^b + N^i\,dt) \]

The reduced canonical variables can be taken to be the field $e^i_a(x)$ that represents the triad of the ADM surface, and its conjugate momentum $p^a_i(x)$. Their relation with the three-dimensional metric variables is given by transforming internal indices into tangent indices with the triad field $e^i_a$ and its inverse $e^a_i$. In particular,

\[ q_{ab} = \delta_{ij}\, e^i_a e^j_b \qquad [12] \]

\[ p^{ab} = e^b_i\, p^{ai} \qquad [13] \]

Also, for later reference,

\[ k^i_a \equiv e^{ib} k_{ab} = \frac{2}{\det e}\left(p^i_a - \frac{1}{2}\, e^i_a\, p\right) \qquad [14] \]

where $p = e^i_a p^a_i$.

The momentum and Hamiltonian constraints are the same as in the ADM formulation, with $q_{ab}$ and $p^{ab}$ expressed in terms of the triad variables. The additional constraint that generates the internal rotations is

\[ G_i = \epsilon_{ijk}\, e^j_a\, p^{ak} = 0 \qquad [15] \]
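The relation between the time-gauge tetrad [11], the triad form of the 3-metric [12] and the ADM variables [6] can be checked with a few lines of linear algebra at a single spacetime point. The following is a minimal sketch, not part of the original article: the function names, the random sample values and the storage convention `E[I, mu]` for $e^I_\mu$ are illustrative assumptions. It also shows why the triad formulation carries the extra rotation constraint [15]: a local SO(3) rotation of the internal index changes the triad but not the metric.

```python
import numpy as np

def tetrad_from_adm(N, N_int, triad):
    """Assemble e^I_mu in the time gauge e^0_a = 0.

    N      : lapse
    N_int  : internal shift components N^i, shape (3,)
    triad  : triad[i, a] = e^i_a, shape (3, 3)
    Returns E with E[I, mu] = e^I_mu.
    """
    E = np.zeros((4, 4))
    E[0, 0] = N          # e^0_0 = N
    E[1:, 0] = N_int     # e^i_0 = N^i
    E[1:, 1:] = triad    # e^i_a
    return E

def metric_from_tetrad(E):
    """g_mu_nu = eta_IJ e^I_mu e^J_nu with eta = diag(-1, 1, 1, 1)."""
    eta = np.diag([-1.0, 1.0, 1.0, 1.0])
    return E.T @ eta @ E

# Hypothetical sample data at one point.
rng = np.random.default_rng(1)
triad = np.eye(3) + 0.1 * rng.normal(size=(3, 3))
N = 1.3
N_int = rng.normal(size=3)

g = metric_from_tetrad(tetrad_from_adm(N, N_int, triad))
q = triad.T @ triad                    # q_ab = delta_ij e^i_a e^j_b
N_low = triad.T @ N_int                # N_a = e^i_a N^i
N_up = np.linalg.solve(q, N_low)       # N^a = q^{ab} N_b
g_inv = np.linalg.inv(g)

# A random proper rotation acting on the internal index i.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1.0
g_rot = metric_from_tetrad(tetrad_from_adm(N, Q @ N_int, Q @ triad))

print("g_ab equals q_ab:", np.allclose(g[1:, 1:], q))
print("g_0a equals N_a:", np.allclose(g[0, 1:], N_low))
print("g^00 equals -1/N^2:", np.isclose(g_inv[0, 0], -1.0 / N**2))
print("metric unchanged by internal rotation:", np.allclose(g, g_rot))
```

The checks are purely kinematical. They confirm that the components of $g_{\mu\nu}$ built from [11] reproduce the ADM decomposition, but they do not test the constraints [7], [8] or [15] themselves.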
Ashtekar Formalism

The Ashtekar formalism simplifies the form of the constraints and casts GR in a form having the same kinematics as Yang–Mills theory. With its variants, it is widely used in nonperturbative quantum gravity, in particular in the loop formulation (see Loop Quantum Gravity). It can be obtained from the tetrad canonical formalism by the canonical transformation

$$A^i_a = \tfrac{1}{2}\,\epsilon^i{}_{jk}\,\omega_a^{jk} + i\,k^i_a \qquad [16]$$

$$E^a_i = (\det e)\, e^a_i \qquad [17]$$

where $\omega^{ij} = \omega^{ij}_a\,dx^a$ is the (torsion-free) spin connection of the triad 1-form field $e^i = e^i_a\,dx^a$, determined by the Cartan equation

$$de^i + \omega^i{}_k \wedge e^k = 0$$

The electric field $E$ is real, while the Sen–Ashtekar connection $A^i = A^i_a\,dx^a$ is complex and satisfies the reality condition

$$A^i + \bar{A}^i = 2\,\omega^i[e] \qquad [18]$$

where $\omega^i = \tfrac{1}{2}\,\epsilon^i{}_{jk}\,\omega^{jk}$.

The connection $A^i$ has a simple geometrical interpretation. It is the pullback $A^i_a = \omega^{(+)0i}_a$ on the $t = 0$ ADM surface of the self-dual part

$$\omega^{(+)IJ} = \frac{1}{2}\left(\omega^{IJ} - \frac{i}{2}\,\epsilon^{IJ}{}_{KL}\,\omega^{KL}\right)$$

of the four-dimensional torsion-free spin connection $\omega^{IJ}_\mu$ determined by the tetrad field $e^I_\mu$.

In terms of these fields, the constraint equations can be written in the form

$$G_i = D_a E^a_i = 0 \qquad [19]$$

$$C_a = F^i_{ab}\, E^b_i = 0 \qquad [20]$$

$$C = \epsilon_{ijk}\, F^i_{ab}\, E^{aj} E^{bk} = 0 \qquad [21]$$

where $D_a$ is the covariant derivative and $F^i_{ab}$ is the curvature defined by the connection $A$. The first of these constraints is the nonabelian version of the Gauss law [3]: it is the gauge constraint of Yang–Mills theory. The constraints are polynomial in the canonical variables.

These equations are often written using a basis $\tau_i$ in the su(2) Lie algebra, and defining the su(2) connection $A = A^i \tau_i$ and the su(2)-valued vector field $E^a = E^{ai}\tau_i$. In terms of these fields the constraints can be written in the form

$$G = D_a E^a = 0$$

$$C_a = \mathrm{tr}(F_{ab}\, E^b) = 0$$

$$C = \mathrm{tr}(F_{ab}\, E^a E^b) = 0$$

where the trace is on su(2).

A variant of this formalism commonly used in quantum gravity is obtained by replacing [16] with the Barbero connection

$$A^i_a = \tfrac{1}{2}\,\epsilon^i{}_{jk}\,\omega_a^{jk} + \gamma\,k^i_a \qquad [22]$$

where $\gamma$ is an arbitrary complex number, called the Immirzi parameter. In terms of this connection, [21] is replaced by

$$C = \epsilon_{ijk}\, F^i_{ab}\, E^{aj} E^{bk} + \frac{1+\gamma^2}{4}\,(\det e)\left(k_{ab}\,k^{ab} - k^2\right) = 0$$

where $e^i_a$ and $k_{ab}$ are given as functions of $E$ and $A$ by [22] and [17]. The choice $\gamma = 1$, with the constraints [19]–[21], gives the canonical formulation of Euclidean GR.

All the formulations described extend readily to matter couplings. The structure of the constraints remains the same, with additional constraints corresponding to matter gauge invariances, if any. The GR constraints are modified by the addition of matter terms. In particular, the Hamiltonian constraint $C$ and the momentum constraint $C_a$ are modified by the addition of terms determined by the energy density and the momentum density of the matter, respectively. In the Ashtekar formulation, a fermion field modifies the Gauss law constraint by the addition of a torsion term.

Evolution

In the gauge-invariant canonical structure of GR, there is no explicit time flow generated by a Hamiltonian. If the formalism is utilized just in order to express the Einstein equation in first-order canonical form, this is not a difficulty, because evolution in the coordinate time is generated by the constraints. On the other hand, if we are interested in understanding the structure of states, observables, and evolution of GR, the situation appears to be puzzling. An additional complication arises from the fact that virtually no gauge-invariant observable $A_{ph}$ is known explicitly as a function on the phase space. These issues become especially relevant when the canonical formalism is taken as a starting point for quantization. How is physical evolution represented in canonical GR?

The first relevant observation is that the gauge-invariant phase space $\Gamma_{ph}$ is better understood as a phase space in the sense of Lagrange: namely as the space of the solutions of the equations of motion modulo gauges, rather than a space of instantaneous states. Recall that in GR the notion of "instantaneous state" is rather unnatural.

In the ADM formulation, for instance, an orbit on the constraint surface of GR can be understood as the ensemble of all possible values that the variables

$(q_{ab}(x), p^{ab}(x))$ can take on arbitrary spacelike ADM surfaces embedded in a given solution of the Einstein equation. Motion along the orbit (which has dimension $4 \times \infty^3$) corresponds to arbitrary deformations of the surface.

Physical applications of classical GR deal with relations between partial observables. A partial observable is any variable physical quantity that can be measured, even if its value cannot be determined from the knowledge of the physical state. An example of partial observable in nonrelativistic mechanics is given precisely by the nonrelativistic time $t$. Partial observables are represented in GR as functions on $\Gamma_0$. A physical state in $\Gamma_{ph}$ determines an orbit in $C$, and therefore a set of relations between partial observables (see Figure 1). That is, it determines the possible values that the partial observables can take "when" and "where" other partial observables have given values. All physical predictions of classical GR can be expressed in this form.

One of the partial observables can be selected to play the role of a physical clock time, and evolution can be expressed in terms of such clock time. In general, it is difficult if not impossible to find a clock time observable in terms of which evolution is a proper conventional Hamiltonian evolution. Matter couplings partially simplify the task. For instance, if the motion of planet Earth is coupled to GR, then proper time along this motion from a significant event on Earth, which is a partial observable, can be a convenient clock time. In pure gravity, the York time defined as the trace of the extrinsic curvature $T_Y = k$, on ADM surfaces where $k$ is spatially constant, has been extensively and effectively used as a clock time in formal analysis of the theory. A Hamiltonian that generates evolution in a given clock time $T$ can be formally obtained by solving the Hamiltonian constraint with respect to a momentum $P_T$ conjugate to $T$. Such reparametrizations of the relative evolution of the partial observables can be useful to analyze equations and to help intuition, but they are by no means necessary to have a well-defined interpretation of the theory.

Another possibility to introduce a preferred time flow is to consider asymptotically flat solutions of the field equations. In this case, one can define a nonvanishing Hamiltonian, given by a boundary integral at spatial infinity. This Hamiltonian generates evolution in an asymptotic Minkowski time. This choice is convenient for describing observations performed from a large distance on isolated gravitational systems. Many general-relativistic physical observations do not belong to this category.

Various other techniques to define a fully generally covariant canonical formalism have been explored. Among these: definitions of the physical symplectic structure directly on the space of the solutions of the field equations; generalization of the initial and final surfaces to boundaries of compact spacetime regions; construction of "evolving constants of motion", namely families of gauge-invariant observables depending on a clock time parameter; multisymplectic formalisms that treat space and time derivatives on a more equal footing; and others. Many of these techniques are attempts to overcome the unequal way in which time and space dependence are treated in the conventional Hamiltonian formalism.

GR has deeply modified our understanding of space and time. An extension of the canonical formalism of mechanics, compatible with such a modification, is needed, but consensus on the way (or even the possibility) of formulating a fully satisfactory general-relativistic extension of Hamiltonian mechanics is still lacking.

See also: Asymptotic Structure and Conformal Infinity; Constrained Systems; General Relativity: Overview; Loop Quantum Gravity; Quantum Cosmology; Quantum Geometry and its Applications; Spin Foams; Wheeler–De Witt Theory.

Further Reading

Arnowitt R, Deser S, and Misner CW (1962) The dynamics of general relativity. In: Witten L (ed.) Gravitation: An Introduction to Current Research, p. 227. New York: Wiley.
Ashtekar A (1991) Non-Perturbative Canonical Gravity. Singapore: World Scientific.
Bergmann P (1989) The canonical formulation of general relativistic theories: the early years, 1930–1959. In: Howard D and Stachel J (eds.) Einstein and the History of General Relativity. Boston: Birkhäuser.
Dirac PAM (1950) Generalized Hamiltonian dynamics. Canadian Journal of Mathematics 2: 129–148.
Dirac PAM (1958) The theory of gravitation in Hamiltonian form. Proceedings of the Royal Society of London, Series A 246: 333.
Dirac PAM (1964) Lectures on Quantum Mechanics. New York: Belfer Graduate School of Science, Yeshiva University.
Gotay MJ, Isenberg J, Marsden JE, and Montgomery R (1998) Momentum maps and classical relativistic fields. Part 1: Covariant field theory. Archives: physics/9801019.
Hanson A, Regge T, and Teitelboim C (1976) Constrained Hamiltonian Systems. Rome: Accademia Nazionale dei Lincei.
Henneaux M and Teitelboim C (1992) Quantization of Gauge Systems. Princeton: Princeton University Press.
Isham CJ (1993) Canonical quantum gravity and the problem of time. In: Ibort LA and Rodriguez MA (eds.) Recent Problems in Mathematical Physics, Salamanca. Dordrecht: Kluwer Academic.
Lagrange JL (1808) Mémoires de la première classe des sciences mathématiques et physiques. Paris: Institut de France.
Rovelli C (2004) Quantum Gravity. Cambridge: Cambridge University Press.
Souriau JM (1969) Structure des Systèmes Dynamiques. Paris: Dunod.
418 Capacities Enhanced by Entanglement

Capacities Enhanced by Entanglement


P Hayden, McGill University, Montreal, QC, Canada
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Shared entanglement between a sender and receiver can significantly improve the usefulness of a quantum channel for the communication of either classical or quantum data. Superdense coding and teleportation provide the most well-known examples of this improvement; free entanglement doubles the classical capacity of a noiseless quantum channel and makes it possible for a noiseless classical channel to send quantum data. In fact, the entanglement-assisted classical and quantum capacities of a quantum channel are in many senses simpler and better behaved than their unassisted counterparts (Holevo 1998, Schumacher and Westmoreland 1997, Devetak 2005). Most importantly, these capacities can be calculated using simple formulas and finite optimization procedures (Bennett et al. 1999, 2002). No such finite procedure is known for either of the unassisted capacities. Moreover, the entanglement-assisted classical and quantum capacities are related by a simple factor of 2. The unassisted capacities, in contrast, have completely different formulas. In fact, the simple factor of 2 generalizes to a statement known as the quantum reverse Shannon theorem, which governs the rate at which one quantum channel can simulate another (Bennett et al. 2005). The answer is given by the ratio of the entanglement-assisted capacities.

Notation

Quantum systems will be denoted by $A$, $B$, and so on, as well as their variants such as $A'$ and $\hat{A}$. The choice of letter will generally indicate which party holds a given system, with $A$ reserved for the sender, Alice, and $B$ for the receiver, Bob. Given a quantum system $C$, $C^{\otimes n}$ will often be written as $C^n$. These symbols will be used to denote both the Hilbert space of the quantum system and the set of density operators on that system. Thus, a quantum channel $\mathcal{N} : A' \to B$ refers to a trace-preserving, completely positive (TPCP) map from the operators on the Hilbert space of $A'$ to those of $B$. $\mathrm{id}_C$ refers to the identity channel on $C$. The map $\mathcal{N} \otimes \mathrm{id}_C$ will frequently be abbreviated to $\mathcal{N}$ in order to simplify long expressions. Likewise, the density operator $|\phi\rangle\langle\phi|$ of a pure quantum state $|\phi\rangle$ will be abbreviated to $\phi$. $\pi^C$ will refer to the maximally mixed state on $C$ and $\pi_d$ to the maximally mixed state on a specified $d$-dimensional quantum system. For a given quantum state $\rho^{AB}$ on the composite system $AB$, $\rho^A = \mathrm{tr}_B\,\rho^{AB}$ and

$$H(A)_\rho = H(\rho^A) = -\mathrm{tr}\left(\rho^A \log_2 \rho^A\right) \qquad [1]$$

is the von Neumann entropy of $\rho^A$, while

$$H(A|B)_\rho = -I_c(A\rangle B)_\rho = H(AB)_\rho - H(B)_\rho \qquad [2]$$

is its conditional entropy and

$$I(A;B)_\rho = H(A)_\rho + H(B)_\rho - H(AB)_\rho \qquad [3]$$

its mutual information.

Entanglement-Assisted Classical and Quantum Capacities

The entanglement-assisted classical capacity of a quantum channel $\mathcal{N} : A' \to B$ is the optimal rate at which classical information can be communicated through the channel while in addition making use of an unlimited number of maximally entangled states. The formal definition proceeds as follows. Alice and Bob are assumed to share $nS$ ebits in the form of a maximally entangled state $|\Phi\rangle^{\tilde{A}\tilde{B}}$ of Schmidt rank $2^{nS}$. Conditioned on her message $m \in \{1, 2, \ldots, 2^{nR}\}$, Alice will apply an encoding operation $\mathcal{E}_m : \tilde{A} \to A'^n$. Bob's decoding is given by a POVM $\{\Lambda_m\}_{m=1}^{2^{nR}}$ on the composite system $B^n\tilde{B}$. The procedure is said to have maximum probability of error $\epsilon$ if every message is decoded correctly with probability at least $1 - \epsilon$:

$$\min_m\, \mathrm{tr}\left[\Lambda_m\,(\mathcal{N}^{\otimes n} \circ \mathcal{E}_m)(\Phi^{\tilde{A}\tilde{B}})\right] \ge 1 - \epsilon \qquad [4]$$

Figure 1 Circuit representation of the elements of an entanglement-assisted classical code for the channel $\mathcal{N}$. Alice encodes message $m$ by applying the operation $\mathcal{E}_m$ to her half of the shared entanglement. Bob decodes by applying the POVM $\{\Lambda_{m'}\}$ on the output of the channel and his half of the shared entanglement.

These elements, illustrated in Figure 1, consisting of the shared entanglement, as well as the encoding and decoding operations meeting the criterion of eqn [4], are called a $(2^{nR}, 2^{nS}, n, \epsilon)$ entanglement-assisted classical code for the channel $\mathcal{N}$. A rate $R$ is said to be achievable if there exists a choice of $S \ge 0$ and a sequence of entanglement-assisted classical codes $(2^{nR}, 2^{nS}, n, \epsilon_n)$ with $\epsilon_n \to 0$. The entanglement-assisted

classical capacity $C_E(\mathcal{N})$ of $\mathcal{N}$ is defined to be the supremum over all achievable rates.

Theorem 1 (Bennett et al. 1999, 2002). The entanglement-assisted classical capacity $C_E$ of a quantum channel $\mathcal{N} : A' \to B$ is given by

$$C_E(\mathcal{N}) = \max_{\phi}\, I(A;B)_\sigma \qquad [5]$$

where the maximization is over states $\sigma^{AB} = \mathcal{N}(\phi^{AA'})$ arising from the channel by acting on the $A'$ half of any pure state $|\phi\rangle^{AA'}$.

The theorem bears a strong formal resemblance to Shannon's noisy coding theorem for the classical capacity of a classical noisy channel. There the capacity formula is also given by an optimization of the mutual information, but over joint distributions between the input and output alphabets arising from the action of the channel. Such a joint distribution cannot exist in general for a quantum channel because the no-cloning theorem excludes the possibility of the input and output existing simultaneously. Equation [5] instead refers to a natural substitute for the joint input–output distribution: a quantum state arising from the quantum channel acting on half of an entangled pure state.

Another point worth stressing is that, unlike the known formulas for the unassisted classical and quantum capacities of a quantum channel, eqn [5] refers to only a single use of $\mathcal{N}$ instead of the limit of many uses, $\mathcal{N}^{\otimes n}$. The formula can therefore readily be used to evaluate $C_E$ for any channel of interest.

Consider, for example, the $d$-dimensional depolarizing channel

$$\mathcal{D}_p(\rho) = (1 - p)\,\rho + p\,\pi_d \qquad [6]$$

that with probability $p$ completely randomizes the input but otherwise leaves the input invariant. For such channels, the maximum is achieved by choosing a maximally entangled state for $|\phi\rangle^{AA'}$, yielding

$$C_E(\mathcal{D}_p) = 2\log_2 d - h_{d^2}\!\left(1 - p\,\frac{d^2 - 1}{d^2}\right) \qquad [7]$$

where for any $0 \le q \le 1$ and integer $r > 1$,

$$h_r(q) = -q\log_2 q - (1 - q)\log_2\frac{1 - q}{r - 1} \qquad [8]$$

is the Shannon entropy of the distribution $(q, (1-q)/(r-1), \ldots, (1-q)/(r-1))$.

Entanglement assistance also simplifies the relationship between the classical and quantum capacities of a channel. Proceeding as before to formally define the quantum capacity, Alice and Bob are again assumed to share a maximally entangled state $|\Phi\rangle^{\tilde{A}\tilde{B}}$ of Schmidt rank $2^{nS}$. Alice's encoding operation will be a TPCP map $\mathcal{E} : \hat{A}\tilde{A} \to A'^n$ acting on an input system $\hat{A}$ and her half of the shared entanglement, $\tilde{A}$. Bob's decoding will likewise be a TPCP map $\mathcal{D} : B^n\tilde{B} \to \hat{B}$ acting on the output of the channel, $B^n$, and his half of the shared entanglement, $\tilde{B}$. $\hat{A}$ and $\hat{B}$ are assumed to be isomorphic quantum systems of some fixed dimension $2^{nQ}$. The procedure is said to have subspace fidelity $1 - \epsilon$ if

$$\langle\phi|^{\hat{B}}\,(\mathcal{D} \circ \mathcal{N}^{\otimes n} \circ \mathcal{E})\big(\Phi^{\tilde{A}\tilde{B}} \otimes \phi^{\hat{A}}\big)\,|\phi\rangle^{\hat{B}} \ge 1 - \epsilon \qquad [9]$$

for all $|\phi\rangle^{\hat{A}} \in \hat{A}$. These elements, illustrated in Figure 2, are together called a $(2^{nQ}, 2^{nS}, n, \epsilon)$ entanglement-assisted quantum code for the channel $\mathcal{N}$. A rate $Q$ is said to be achievable if there exists a choice of $S \ge 0$ and a sequence of entanglement-assisted quantum codes $(2^{nQ}, 2^{nS}, n, \epsilon_n)$ with $\epsilon_n \to 0$. The entanglement-assisted quantum capacity $Q_E(\mathcal{N})$ of $\mathcal{N}$ is defined to be the supremum over all achievable rates.

Figure 2 Circuit representation of the elements of an entanglement-assisted quantum code for the channel $\mathcal{N}$. $\mathcal{E}$ is Alice's encoding operation, which acts on both her input state and her half of the shared entanglement. Bob decodes using a quantum operation $\mathcal{D}$ acting on the output of the channel and his half of the shared entanglement.

There is considerable freedom in the definition of the entanglement-assisted quantum capacity. It could, for example, be defined as the largest amount of maximal entanglement that can be generated using the channel, minus the entanglement consumed during the protocol itself. Alternatively, the fidelity criterion eqn [9] could be strengthened to require that $\mathcal{D} \circ \mathcal{N}^{\otimes n} \circ \mathcal{E}$ preserve not only pure states on $\hat{A}$ but any entanglement between $\hat{A}$ and a reference system. All of these variants yield the same capacity formula:

$$Q_E(\mathcal{N}) = \tfrac{1}{2}\,C_E(\mathcal{N}) \qquad [10]$$

This equivalence is a direct consequence of the existence of the teleportation and superdense coding protocols. When maximal entanglement is available, teleportation converts the ability to send classical data into the ability to send quantum data at half the classical rate. Conversely, by consuming

maximal entanglement, superdense coding converts the ability to send quantum data into the ability to send classical data at double the quantum rate.

Sketch of Proof

The proof of a capacity theorem can usually be broken into two parts, achievability and optimality. The achievability part demonstrates the existence of a sequence of codes reaching the prescribed rate while the optimality part shows that it is impossible to do better.

The main idea in the achievability proof can be understood by studying the special case where $\phi^{A'} = \pi^{A'}$. Let $d_{A'} = \dim A'$ and $\{U_j\}_{j=1}^{d_{A'}^{2n}}$ be a set of Weyl operators for $A'^n$. The relevant property of these operators is that averaging over them implements the constant map: for all density operators $\rho$,

$$\frac{1}{d_{A'}^{2n}} \sum_{j=1}^{d_{A'}^{2n}} U_j\,\rho\,U_j^\dagger = \pi^{A'^n} \qquad [11]$$

Consider the state $\sigma_j$ that arises if Alice acts with $U_j$ on the $A'^n$ half of a rank-$d_{A'}^n$ maximally entangled state $|\Phi\rangle^{AA'^n}$ and then sends the $A'^n$ half of the resulting state through $\mathcal{N}$. (Note that here $A'^n$ also plays the role of $\tilde{A}$.) The entropy of the resulting state is

$$H(\sigma_j) = H\Big(\mathcal{N}\big((U_j \otimes I^{\tilde{B}})\,\Phi\,(U_j^\dagger \otimes I^{\tilde{B}})\big)\Big) \qquad [12]$$

$$= H\big(\mathcal{N}(\Phi)\big) \qquad [13]$$

since $U_j$ does not change the local density operator on $A'^n$.

On the other hand, if Alice selects a value of $j$ from the uniform distribution, then the resulting average input state to the channel will be

$$\pi^{A'^n} \otimes \pi^{A} = \phi^{A'^n} \otimes \phi^{A} \qquad [14]$$

and the corresponding average output state will be $\mathcal{N}(\phi^{A'^n}) \otimes \phi^{A}$, which has entropy

$$H\big(\mathcal{N}(\phi^{A'^n})\big) + H(\phi^{A}) \qquad [15]$$

Therefore, the Holevo quantity of the ensemble of output states, defined as the entropy of the average state minus the average of the entropies of the individual output states, will be equal to

$$H(\phi^{A}) + H\big(\mathcal{N}(\phi^{A'^n})\big) - H\big(\mathcal{N}(\phi^{AA'^n})\big) \qquad [16]$$

This is precisely the quantity $I(A;B)$ for the state $\mathcal{N}(\phi^{AA'^n})$ since the channel $\mathcal{N}$ transforms the $A'^n$ system into $B$. Moreover, if Bob is given the $A$ part of the maximally entangled state, then this is the Holevo quantity of an ensemble of states that can be produced by Alice acting on half of a shared entangled state and then sending her half through the channel. Invoking the Holevo–Schumacher–Westmoreland (HSW) theorem for the classical capacity (Holevo 1998, Schumacher and Westmoreland 1997) therefore completes the proof; using coding, the Holevo quantity is an achievable communication rate.

The proof that eqn [5] is optimal involves a series of entropy manipulations similar to the optimality proofs for the unassisted classical and quantum capacities. From the point of view of quantum information, the truly unusual part of the proof is the demonstration that it is unnecessary to consider multiple copies of $\mathcal{N}$ (Cerf and Adami 1997). Specifically, let

$$f(\mathcal{N}) = \max_{\phi}\, I(A;B)_\sigma \qquad [17]$$

where the maximization is defined as in Theorem 1. Techniques analogous to those used for the unassisted capacities yield the upper bound

$$C_E(\mathcal{N}) \le \lim_{n\to\infty} \frac{1}{n}\, f(\mathcal{N}^{\otimes n}) \qquad [18]$$

Unlike the unassisted case, however, a relatively easy argument shows that

$$f(\mathcal{N}_1 \otimes \mathcal{N}_2) = f(\mathcal{N}_1) + f(\mathcal{N}_2) \qquad [19]$$

(The analogous statement is an important conjecture for the classical capacity and is known to be false for the quantum capacity (DiVincenzo et al. 1998).) As a result, $C_E(\mathcal{N}) \le f(\mathcal{N})$, which is the optimality part of Theorem 1.

To see the origin of eqn [19], it will be helpful to invoke Stinespring's theorem to write $\mathcal{N}_j = \mathrm{tr}_{E_j}\, U_j^{B_jE_j}$, where $U_j : A'_j \to B_jE_j$ is an isometry. Fix a state $|\phi\rangle^{AA'_1A'_2}$ and let $\sigma = (U_1 \otimes U_2)(\phi)$. Equation [19] follows from the fact that

$$I(A;B_1B_2)_\sigma \le I(AB_2E_2;B_1)_\sigma + I(AB_1E_1;B_2)_\sigma \qquad [20]$$

Simply redefining $A$ to be $AB_2E_2$ shows that the first term of the right-hand side is upper bounded by $f(\mathcal{N}_1)$. The second term, likewise, is upper bounded by $f(\mathcal{N}_2)$. Equation [20] is itself equivalent to the inequality

$$H(B_1B_2|E_1E_2)_\sigma + H(B_1B_2)_\sigma \le H(B_1|E_1)_\sigma + H(B_2|E_2)_\sigma + H(B_1)_\sigma + H(B_2)_\sigma \qquad [21]$$

The inequality $H(B_1B_2) \le H(B_1) + H(B_2)$ holds by the subadditivity of the von Neumann entropy.

Repeated applications of the strong subadditivity inequality, moreover, lead to the inequality

$$H(B_1B_2|E_1E_2)_\sigma \le H(B_1|E_1)_\sigma + H(B_2|E_2)_\sigma \qquad [22]$$

Together, they prove eqn [20] and, thence, eqn [19]. The intuitive meaning of this "single-letterization" is unclear, but regardless, it is interesting to note that the proof involved invoking a pair of purifying environment systems, $E_1$ and $E_2$, and studying the entropy relationships between the true outputs of the channel and the environments' share.

The Quantum Reverse Shannon Theorem

A strong argument can be made that the entanglement-assisted capacity of a quantum channel is the most important capacity of that channel and that all the other capacities are, in some sense, of less significance. The fact that it is unnecessary to distinguish between the classical and quantum entanglement-assisted capacities because they are related by a factor of 2 is a hint in that direction, as is the simple, single-letter formula for $C_E(\mathcal{N})$.

A more general argument can be made by considering the problem of having one channel simulate another. Indeed, the quantum capacity of a quantum channel is simply the optimal rate at which that channel can simulate the noiseless channel $\mathrm{id}_2$ on a single qubit. Likewise, the classical capacity of a quantum channel is its optimal rate for simulation of a qubit dephasing channel

$$\rho \mapsto |0\rangle\langle 0|\,\rho\,|0\rangle\langle 0| + |1\rangle\langle 1|\,\rho\,|1\rangle\langle 1| \qquad [23]$$

In this spirit, the fact that $C_E(\mathcal{N}) = 2Q_E(\mathcal{N})$ can be re-expressed in the form

$$Q_E(\mathcal{N}) = \frac{C_E(\mathcal{N})}{C_E(\mathrm{id}_2)} \qquad [24]$$

Equivalently, when entanglement is free, the optimal rate at which $\mathcal{N}$ can simulate a noiseless qubit channel is given by the ratio between the entanglement-assisted classical capacities of $\mathcal{N}$ and $\mathrm{id}_2$. The quantum reverse Shannon theorem generalizes this statement to the simulation of arbitrary channels in the presence of free entanglement.

Suppose that Alice and Bob would like to use $\mathcal{N}_1 : A' \to B$ to simulate another channel $\mathcal{N}_2 : A' \to B$. Fix an input state $\rho^{A'}$ and let $|\phi\rangle^{AA'^n}$ be a purification of $(\rho^{A'})^{\otimes n}$. As always, assume that Alice and Bob share a maximally entangled state $|\Phi\rangle^{\tilde{A}\tilde{B}}$ of Schmidt rank $2^{nS}$. Alice's encoding operation will be a TPCP map $\mathcal{E} : \tilde{A}A'^n \to A'^m$ acting on $n$ copies of the input system $A'$ and her half of the shared entanglement, $\tilde{A}$. Bob's decoding will likewise be a TPCP map $\mathcal{D} : B^m\tilde{B} \to B^n$ acting on $m$ copies of the output of the channel, and his half of the shared entanglement, $\tilde{B}$. This procedure is said to $\epsilon$-simulate $\mathcal{N}_2^{\otimes n}$ on $(\rho^{A'})^{\otimes n}$ if

$$F\Big(\mathcal{N}_2^{\otimes n}(\phi^{AA'^n}),\; (\mathcal{D} \circ \mathcal{N}_1^{\otimes m} \circ \mathcal{E})\big(\Phi^{\tilde{A}\tilde{B}} \otimes \phi^{AA'^n}\big)\Big) \ge 1 - \epsilon \qquad [25]$$

where $F$ is the mixed state fidelity $F(\rho, \sigma) = \big(\mathrm{tr}\sqrt{\rho^{1/2}\,\sigma\,\rho^{1/2}}\big)^2$. The entire procedure, illustrated in Figure 3, is said to be a $(2^{nS}, m, n, \epsilon)$ entanglement-assisted simulation of $\mathcal{N}_2$ by $\mathcal{N}_1$. A rate $R$, measured in copies of $\mathcal{N}_2$ per copy of $\mathcal{N}_1$, is said to be achievable for $\rho^{A'}$ if there exists a choice of $S \ge 0$ and a sequence of $(2^{nS}, m_n, n, \epsilon_n)$ entanglement-assisted simulations with $n/m_n \to R$ while $\epsilon_n \to 0$.

Figure 3 Circuit representation of an entanglement-assisted simulation of $\mathcal{N}_2$ by $\mathcal{N}_1$. (a) The simulation circuit, with Alice's encoding operation $\mathcal{E}$ acting on $n$ copies of $A'$ and Bob's decoding operation producing $n$ copies of $B$. (b) The circuit that the protocol is intended to simulate. As stated, the quantum reverse Shannon theorem allows the simulation circuit to depend on the density operator of the input state restricted to $A'^n$.

The quantum reverse Shannon theorem states that the entanglement-assisted capacity completely governs the achievable simulation rates.

Theorem 2 (Winter 2004, Bennett et al.). Given two channels $\mathcal{N}_1 : A' \to B$ and $\mathcal{N}_2 : A' \to B$, $R$ is an achievable simulation rate for $\mathcal{N}_2$ by $\mathcal{N}_1$ and all input states $\rho^{A'}$ if and only if

$$R \le \frac{C_E(\mathcal{N}_1)}{C_E(\mathcal{N}_2)} \qquad [26]$$

Note that the form of eqn [26] ensures that the simulation is asymptotically reversible: if a channel $\mathcal{N}_1$ is used to simulate $\mathcal{N}_2$ and the simulation is then used to simulate $\mathcal{N}_1$ again, then the overall rate becomes

$$\frac{C_E(\mathcal{N}_1)}{C_E(\mathcal{N}_2)} \cdot \frac{C_E(\mathcal{N}_2)}{C_E(\mathcal{N}_1)} = 1 \qquad [27]$$

Thus, in the presence of free entanglement and for a known input density operator of the form $(\rho^{A'})^{\otimes n}$, a single parameter, the entanglement-assisted classical capacity, suffices to completely characterize the asymptotic properties of a quantum channel.
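The reversibility expressed by eqns [26] and [27] can be made concrete with the depolarizing channel of eqn [6]. The following is a minimal sketch, not taken from the article: it assumes the closed form of eqns [7] and [8] for the entanglement-assisted capacity, and the function names and the sample noise level p = 0.25 are illustrative choices only.

```python
import math

def h_r(q, r):
    """Shannon entropy in bits of (q, (1-q)/(r-1), ..., (1-q)/(r-1)), eqn [8]."""
    total = 0.0
    if q > 0:
        total -= q * math.log2(q)
    if q < 1:
        total -= (1 - q) * math.log2((1 - q) / (r - 1))
    return total

def ce_depolarizing(p, d):
    """Entanglement-assisted classical capacity of D_p in dimension d, eqn [7]."""
    q = 1 - p * (d * d - 1) / (d * d)
    return 2 * math.log2(d) - h_r(q, d * d)

def simulation_rate(ce_1, ce_2):
    """Largest rate of copies of N_2 per copy of N_1 allowed by eqn [26]."""
    return ce_1 / ce_2

# p = 0 is the noiseless qubit channel id_2; p = 0.25 is an arbitrary noisy example.
ce_noiseless = ce_depolarizing(0.0, 2)
ce_noisy = ce_depolarizing(0.25, 2)

rate_forward = simulation_rate(ce_noiseless, ce_noisy)  # id_2 simulating the noisy channel
rate_back = simulation_rate(ce_noisy, ce_noiseless)     # noisy channel simulating id_2

print("C_E(id_2)      =", ce_noiseless)
print("C_E(D_0.25)    =", ce_noisy)
print("round-trip rate =", rate_forward * rate_back)
```

In this sketch the noiseless qubit channel has capacity 2, consistent with eqn [24], and the product of the two simulation rates is 1 up to rounding, as in eqn [27].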

Moreover, since two channels that are asymptotically equivalent without free entanglement will surely remain equivalent if free entanglement is permitted, eqn [26] gives essentially the only possible nontrivial, single-parameter asymptotic characterization of quantum channels. This is the sense in which the entanglement-assisted capacity should be regarded as the most important capacity of a quantum channel.

The proof of the quantum reverse Shannon theorem is quite involved, but some of its features can be understood without much work. First, note that by the optimality statement of the entanglement-assisted classical capacity, the desired simulation can exist only if eqn [26] holds. Otherwise, composing the simulation of $\mathcal{N}_2$ by $\mathcal{N}_1$ with a sequence of codes achieving $C_E(\mathcal{N}_2)$ would result in a sequence of codes beating the capacity formula for $\mathcal{N}_1$.

Similarly, note that one method to simulate a channel $\mathcal{N}_1$ using $\mathcal{N}_2$ is to first use $\mathcal{N}_2$ to simulate the noiseless channel and then use the simulated noiseless channel to simulate $\mathcal{N}_1$. Since the achievable rates for the first step are characterized by the entanglement-assisted capacity theorem, proving the achievability part of Theorem 2 reduces to finding protocols for simulating a general noisy quantum channel $\mathcal{N}_2$ by a noiseless one. That perhaps sounds like a strange goal, but nonetheless is the difficult part of the quantum reverse Shannon theorem.

It is likely that the quantum reverse Shannon theorem can be extended to cover other types of inputs than the known tensor power states $(\rho^{A'})^{\otimes n}$. The most desirable form of the theorem would be one valid for all possible input density operators on $A'^n$, providing a single simulation procedure dependent only on the channels and not the input state. It is known that without modifying the form of the free entanglement, this most ambitious form of the theorem fails, but it is conjectured that the full-strength theorem does hold provided very large amounts of entanglement are supplied in the form of the so-called "embezzling states" (van Dam and Hayden 2003).

Relationships between Protocols

There is another sense in which the entanglement-assisted capacity can be viewed as the fundamental capacity of a quantum channel: an efficient protocol for achieving the entanglement-assisted capacity can be converted into protocols achieving the unassisted quantum and classical capacities, or at least very close variants thereof.

An efficient protocol in this case refers to one that does not waste entanglement. Suppose that $\mathcal{N} : A' \to B$ can be written $\mathrm{tr}_E\, U^{BE}$ for some isometry $U^{BE}$. Let $|\phi\rangle^{AA'}$ be a pure state and $|\psi\rangle^{ABE} = U^{BE}|\phi\rangle^{AA'}$ the corresponding purified channel output state. Careful analysis of the entanglement-assisted classical communication protocol achieving the rate $I(A;B)_\psi$ leads to an entanglement-assisted quantum communication protocol consuming entanglement at the rate $(1/2)I(A;E)_\psi$ ebits per use of $\mathcal{N}$ and yielding communication at the rate of $(1/2)I(A;B)_\psi$ qubits per use of $\mathcal{N}$. The protocol achieving this goal is known as the "father" (Devetak et al. 2004).

If the entanglement consumed in the father were actually supplied by quantum communication from Alice to Bob, then the net rate of quantum communication produced by the resulting protocol would be $(1/2)I(A;B)_\psi - (1/2)I(A;E)_\psi$ qubits from Alice to Bob, that is, the total produced minus the total consumed.

This quantity, how much more information $B$ has about $A$ than $E$ does, can be simplified using an interesting identity. Since $|\psi\rangle^{ABE}$ is pure,

$$I(A;E)_\psi = H(A)_\psi + H(E)_\psi - H(AE)_\psi \qquad [28]$$

$$= H(A)_\psi + H(AB)_\psi - H(B)_\psi \qquad [29]$$

Expanding $I(A;B)$ and canceling terms then reveals that

$$\tfrac{1}{2}I(A;B)_\psi - \tfrac{1}{2}I(A;E)_\psi = -H(A|B)_\psi = I_c(A\rangle B)_\psi \qquad [30]$$

where the function $I_c$ is known as the coherent information. After optimizing over input states and multiple channel uses, this is precisely the formula for the unassisted quantum capacity of a quantum channel (Devetak 2005). Thus, the net rate of qubit communication for the protocol derived from the father exactly matches the rates necessary to achieve the unassisted quantum capacity. The only caveat is that the protocol derived from the father uses quantum communication catalytically, meaning that some communication needs to be invested in order to get a gain of $I_c(A\rangle B)$. For the unassisted quantum capacity, no investment is necessary. Nonetheless, detailed analysis of the situation reveals that the amount of catalytic communication required can be reduced to an amount sublinear in the number of channel uses, meaning the rate of required investment can be made arbitrarily small. In this sense, the father protocol essentially generates the optimal protocols for the unassisted quantum capacity.

Protocols achieving the unassisted classical capacity can be constructed in a similar way. In this case, one starts from an ensemble $\mathcal{E} = \{p_j, \mathcal{N}(\psi_j^{A'})\}$ of states generated by the channel. Achievability of

the unassisted classical capacity formula follows from achievability of rates of the form

$$ \chi(\mathcal{E}) = H\Big(\sum_j p_j\, N(\varphi_j^{A'})\Big) - \sum_j p_j\, H\big(N(\varphi_j^{A'})\big) \tag{31} $$

for arbitrary ensembles of output states. Consider the channel

$$ \tilde{N}(\rho) = \sum_j \langle j|\rho|j\rangle\, N(\varphi_j) \tag{32} $$

and input state $|\varphi\rangle^{AA'} = \sum_j \sqrt{p_j}\, |j\rangle^{A} |j\rangle^{A'}$. If $\sigma = \tilde{N}(\varphi)$, then $I(A;B)$ is equal to $\chi(\mathcal{E})$. Thus, there are protocols consuming entanglement that achieve the classical communications rate $\chi(\mathcal{E})$ for the modified channel $\tilde{N}$. Because the channel $\tilde{N}$ includes an orthonormal measurement which destroys all entanglement between A and B, however, it can be argued that any entanglement used in such a protocol could be replaced by shared randomness, which could then in turn be eliminated by a standard derandomization argument. The net result is a procedure for choosing rate $\chi(\mathcal{E})$ codes for the channel $N$ consisting of states of the form $\varphi_{j_1} \otimes \cdots \otimes \varphi_{j_n}$, which is the essence of the achievability proof for the unassisted classical capacity.

This may seem like an unnecessarily cumbersome and even circular approach to the unassisted classical capacity given that the proof sketched above for the entanglement-assisted classical capacity itself invokes the unassisted result in the form of the HSW theorem. The approach becomes more satisfying when one learns that simple and direct proofs of the father protocol exist that completely bypass the HSW theorem (Abeyesinghe et al. 2005). Thus, the entanglement-assisted communication protocols can be easily transformed into their unassisted analogs, confirming the central place of entanglement-assisted communication in quantum information theory.

Acknowledgments

The author is grateful to the inventors of the quantum reverse Shannon theorem for letting him discuss their results prior to their publication and to Jon Yard for a careful reading of the manuscript. This work has been supported by the Canadian Institute for Advanced Research, the Canada Research Chairs program, and Canada's NSERC.

See also: Capacity for Quantum Information; Channels in Quantum Information Theory; Entanglement; Finite Weyl Systems; Quantum Channels: Classical Capacity; Quantum Entropy.

Further Reading

Abeyesinghe A, Devetak I, Hayden P, and Winter A (2005) Fully quantum Slepian–Wolf (in preparation).
Bennett CH, Devetak I, Harrow AW, Shor PW, and Winter A (2005) The quantum Reverse Shannon Theorem (in preparation).
Bennett CH, Shor PW, Smolin JA, and Thapliyal AV (1999) Entanglement-assisted classical capacity of noisy quantum channels. Physical Review Letters 83: 3081 (arXiv.org:quant-ph/9904023).
Bennett CH, Shor PW, Smolin JA, and Thapliyal AV (2002) Entanglement-assisted capacity of a quantum channel and the reverse Shannon theorem. IEEE Transactions on Information Theory 48(10): 2637 (arXiv.org:quant-ph/0106052).
Cerf N and Adami C (1997) Von Neumann capacity of noisy quantum channels. Physical Review A 56: 3470 (arXiv.org:quant-ph/9609024).
Devetak I (2005) The private classical capacity and quantum capacity of a quantum channel. IEEE Transactions on Information Theory 51(1): 44 (arXiv.org:quant-ph/0304127).
Devetak I, Harrow AW, and Winter A (2004) A family of quantum protocols. Physical Review Letters 93: 230504 (arXiv.org:quant-ph/0308044).
DiVincenzo DP, Smolin JA, and Shor PW (1998) Quantum channel capacity of very noisy channels. Physical Review A 57: 830 (arXiv.org:quant-ph/9706061).
Holevo AS (1998) The capacity of the quantum channel with general signal states. IEEE Transactions on Information Theory 44: 269–273.
Schumacher B and Westmoreland MD (1997) Sending classical information via noisy quantum channels. Physical Review A 56: 131–138.
van Dam W and Hayden P (2003) Universal entanglement transformation without communication. Physical Review A 67: 060302 (arXiv.org:quant-ph/0201041).
Winter A (2004) Extrinsic and intrinsic data in quantum measurements: asymptotic convex decomposition of positive operator valued measures. Communications in Mathematical Physics 244(1): 157 (arXiv.org:quant-ph/0109050).

Capacity for Quantum Information


D Kretschmann, Technische Universität Braunschweig, Braunschweig, Germany
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Any processing of quantum information, be it storage or transfer, can be represented as a quantum channel: a completely positive and trace-preserving map that transforms states (density matrices) on the sender's end of the channel into states on the receiver's end. Very often, the channel S that sender and receiver (conventionally called Alice and Bob, respectively) would like to implement is not readily available, typically due to detrimental noise effects, limited technology, or insufficient funding. They may then try to simulate S with some other channel T, which they happen to have at their disposal. The quantum channel capacity Q(T, S) of T with respect to S quantifies how well this simulation can be performed, in the limit of long input strings, so that Alice and Bob can take advantage of collective pre- and post-processing (cf. Figure 1). Higher capacities may result if Alice and Bob are allowed to use additional resources in the process, such as classical side channels or a bunch of maximally entangled pairs shared between them.

Quantum capacity thus gives the ultimate benchmarks for the simulation of one quantum channel by another and for the optimal use of auxiliary resources. Together with the compression rate of a quantum source (see Source Coding in Quantum Information Theory), it lies at the heart of quantum information theory.

In a very typical scenario, Alice and Bob would like to implement the ideal (noiseless) quantum channel S = id: they are interested in sending quantum states undistorted over some distance, or want to store them safely for some period of time, so that all the precious quantum correlations are preserved. The capacity $Q(T) \equiv Q(T, \mathrm{id})$ is then the maximal number of qubit transmissions per use of the channel, taken in the limit of long messages and using collective encoding and decoding schemes asymptotically eliminating all transmission errors. This is what is generally called the quantum capacity of the channel T, and it is our main focus in this article. Little is known so far about the quantum capacity for the simulation of other (nonideal) channels (cf. the section "Related Capacities").

In remarkable contrast to the classical setting, quantum channel capacities are very much affected by additional resources. This leads to unexpected and fascinating applications such as teleportation and dense coding. But it also results in a bewildering variety of inequivalent channel capacities, which still hold many challenges for future research.

Notation

A quantum channel which transforms input systems on a Hilbert space $\mathcal{H}_A$ into output systems on a (possibly different) Hilbert space $\mathcal{H}_B$ is represented (in Schrödinger picture) by a completely positive and trace-preserving linear map $T : \mathcal{B}_*(\mathcal{H}_A) \to \mathcal{B}_*(\mathcal{H}_B)$, where $\mathcal{B}_*(\mathcal{H})$ denotes the space of trace class operators on the Hilbert space $\mathcal{H}$ (see Channels in Quantum Information Theory). We write $\mathcal{A}$ instead of $\mathcal{B}_*(\mathcal{H}_A)$ to streamline the presentation, and $\mathcal{A}^{\otimes n}$ for the n-fold tensor product $\mathcal{B}_*(\mathcal{H}_A)^{\otimes n}$.
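As a minimal illustration of the notation above, the following sketch represents a channel on a finite-dimensional system by a list of Kraus operators, so that $T(\rho) = \sum_i K_i \rho K_i^*$. The Kraus form, the choice of the qubit depolarizing channel, the parameter `p`, and all function names are assumptions made for this example; they are not taken from the article.

```python
import numpy as np

def depolarizing_kraus(p):
    """Kraus operators of a qubit depolarizing channel (illustrative choice).

    With probability 1 - p the state is left alone; with probability p/3 each,
    one of the Pauli operators X, Y, Z is applied.
    """
    I = np.eye(2, dtype=complex)
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
    Z = np.array([[1, 0], [0, -1]], dtype=complex)
    return [np.sqrt(1 - p) * I] + [np.sqrt(p / 3) * P for P in (X, Y, Z)]

def apply_channel(kraus, rho):
    """Schroedinger-picture action T(rho) = sum_i K_i rho K_i^*."""
    return sum(K @ rho @ K.conj().T for K in kraus)

def is_trace_preserving(kraus, tol=1e-12):
    """Trace preservation is equivalent to sum_i K_i^* K_i = identity."""
    d = kraus[0].shape[1]
    total = sum(K.conj().T @ K for K in kraus)
    return np.allclose(total, np.eye(d), atol=tol)

if __name__ == "__main__":
    kraus = depolarizing_kraus(0.3)
    rho = np.array([[1, 0], [0, 0]], dtype=complex)  # pure state |0><0|
    out = apply_channel(kraus, rho)
    print(is_trace_preserving(kraus), np.round(out.real, 3))
```

Any map given in Kraus form is completely positive by construction, so only trace preservation needs to be checked here.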
Figure 1 Equipped with collective encoding and decoding operations (and perhaps some auxiliary resources), n = 3 instances of the channel T simulate m = 2 instances of the channel S. The transmission rate of the above scheme is 2/3. Capacity is the largest such rate, in the limit of long messages and optimal encoding and decoding.

It is evident that the definition of channel capacity requires the comparison of different quantum channels. A suitable distance measure is the norm of complete boundedness (or cb-norm, for short), denoted by $\|\cdot\|_{cb}$. For two channels T and S, the distance $(1/2)\|T - S\|_{cb}$ can be defined as the largest difference between the overall probabilities in two statistical quantum experiments differing only by exchanging one use of S by one use of T. These experiments may involve entangling the systems on which the channels act with arbitrary further systems; hence the cb-norm remains a valid distance measure if the given channel is only part of a larger system. Equivalently, we may set $\|T\|_{cb} := \sup_n \|T \otimes \mathrm{id}_n\|$, where $\|R\| := \sup_{\|\rho\|_1 \le 1} \|R(\rho)\|_1$

denotes the norm of linear operators, and $\|\rho\|_1 := \operatorname{tr}\sqrt{\rho^*\rho}$ is the trace norm on the space of trace-class operators $\mathcal{B}_*(\mathcal{H})$.

We use base two logarithms throughout, and we write $\operatorname{ld} x := \log_2 x$ and $\exp_2 x := 2^x$.

Quantum Channel Capacity

The intuitive concept underlying quantum channel capacity is made rigorous in the following definition:

Definition 1 A positive number R is called achievable rate for the quantum channel $T : \mathcal{A} \to \mathcal{B}$ with respect to the quantum channel $S : \mathcal{A}' \to \mathcal{B}'$ iff for any pair of integer sequences $(n_\nu)_{\nu \in \mathbb{N}}$ and $(m_\nu)_{\nu \in \mathbb{N}}$ with $\lim_{\nu \to \infty} n_\nu = \infty$ and $\limsup_{\nu \to \infty} m_\nu / n_\nu \le R$ we have

$$ \lim_{\nu \to \infty}\, \inf_{D,E}\, \big\| D\, T^{\otimes n_\nu} E - S^{\otimes m_\nu} \big\|_{cb} = 0 \tag{1} $$

the infimum taken over all encoding channels E and decoding channels D with suitable domain and range. The channel capacity Q(T, S) of T with respect to S is defined to be the supremum of all achievable rates. The quantum capacity is the special case $Q(T) := Q(T, \mathrm{id}_2)$, with $\mathrm{id}_2$ being the ideal qubit channel.

In this article, we mainly concentrate on channels between finite-dimensional systems. This is enough to bring out the basic ideas. Many of the concepts and results discussed here can be generalized to Gaussian channels, which play a central role as building blocks for quantum optical communication lines (Holevo and Werner 2001, Eisert and Wolf).

There is considerable freedom in the definition of quantum channel capacity, at least for ideal reference channels (Kretschmann and Werner 2004). In particular, the encoding channels E in eqn [1] may always be restricted to isometric embeddings.

In addition, it is not necessary to check an infinite number of pairs of sequences $(n_\nu)_{\nu \in \mathbb{N}}$ and $(m_\nu)_{\nu \in \mathbb{N}}$ when testing a given rate R, as Definition 1 would suggest. Instead, it is enough to find one such pair which achieves the rate R infinitely often, $\limsup_{\nu \to \infty} m_\nu / n_\nu = R$.

Without affecting the capacity, the cb-norm $\|T\|_{cb}$ may be replaced by the unstabilized operator norm $\|T\|$ or by fidelity measures, which are in general much easier to compute. In particular, one might choose the minimum fidelity,

$$ F(T) := \min_{\|\psi\| = 1} \langle \psi |\, T(|\psi\rangle\langle\psi|)\, | \psi \rangle \tag{2} $$

or even the average fidelity,

$$ \overline{F}(T) := \int \langle \psi |\, T(|\psi\rangle\langle\psi|)\, | \psi \rangle \, d\psi \tag{3} $$

Unfortunately, this equivalence is restricted to capacities with noiseless reference channel S = id. In the vicinity of other (nonideal) channels, equivalence of the stabilized and unstabilized error criteria may be lost. Of course, the comparison of channels is ultimately based on the comparison of a state to its image, and here the pure states are the worst case. Hence, the remarkable insensitivity of the quantum capacity to the choice of the error criterion stems from the observation that the comparison between an arbitrary state and a pure state is rather insensitive to the criterion used.

Instead of requiring the error quantity in eqn [1] to approach zero in the large block limit $\nu \to \infty$, one might feel tempted to impose that the errors vanish completely for some sufficiently large block length, since this is the standard setup in the theory of quantum error correction (see Quantum Error Correction and Fault Tolerance). While it is true that errors can always be assumed to vanish exponentially in eqn [1], requiring perfect correction may completely change the picture: if a channel has some small positive probability for depolarization, the same also holds for its tensor powers, and no such channel allows the perfect transmission of even one qubit. Hence, the capacity for perfect correction will vanish for such channels, while the standard capacity (in accordance with Definition 1) will be close to maximal, $Q(T) \approx 1$. The existence of perfect error-correcting codes thus gives lower bounds on the channel capacity, but is not required for a positive transfer rate.

In the other extreme, one might sometimes feel inclined to tolerate (small) finite errors in the transmission. For some $\varepsilon > 0$, we define $Q_\varepsilon(T)$ exactly like the quantum capacity in Definition 1, but require only that the error quantity in eqn [1] falls below $\varepsilon$ for some sufficiently large $\nu$. Obviously, $Q_\varepsilon(T) \ge Q(T)$ for any quantum channel T. We also have $\lim_{\varepsilon \to 0} Q_\varepsilon(T) = Q(T)$ (Kretschmann and Werner 2004). In the classical setting, even a strong converse is known: if $\varepsilon > 0$ is small enough, one cannot achieve bigger rates by allowing small errors, that is, $C_\varepsilon(T) = C(T)$. It is still undecided whether an analogous property holds for the quantum capacity Q(T).

Related Capacities

This article is chiefly concerned with the quantum capacity of a quantum channel. A variety of other

capacities have been derived from Definition 1 by either amending the channel S to be simulated, or allowing Alice and Bob to make use of additional resources. Their interrelations are reviewed in Bennett et al. (2004).

Much interest has been devoted to the hybrid problem of transmitting classical information undistorted over noisy quantum channels. The classical capacity C(T) of a quantum channel T is discussed in the article Quantum Channels: Classical Capacity of this Encyclopedia. It is obtained by choosing the ideal one-bit channel rather than the one-qubit channel as the standard of reference in Definition 1. Encoding channels E and decoding channels D are then restricted to preparations and measurements, respectively. Since a quantum channel can also be employed to send classical information, we have $C(T) \ge Q(T)$.

There are, obviously, examples in which this inequality is strict: the entanglement-breaking channel $T(\rho) = \sum_j \langle j|\rho|j\rangle\, |j\rangle\langle j|$ is composed of a measurement in the orthonormal basis $\{|j\rangle\}_j$, followed by a preparation of the corresponding basis states. It destroys all the entanglement between the sender and a reference system, implying Q(T) = 0. Yet all the basis states $|j\rangle$ are transmitted undistorted, which is enough to guarantee that C(T) = 1.

Definition 1 also applies to purely classical channels, and thus to the setting of Shannon's information theory. A classical channel T between two d-level systems is completely specified by the $d \times d$ matrix $(T_{xy})_{x,y=1}^{d}$ of transition probabilities. For these channels the cb-norm difference is just (twice) the maximal error probability:

$$ \|\mathrm{id} - T\|_{cb} = 2 \sup_x \{1 - T_{xx}\} $$

which is the standard error criterion for classical information transfer.

Dense coding and teleportation suggest that entanglement is a powerful resource for information transfer. It doubles the classical channel capacity of a noiseless channel, and it allows one to send quantum information over purely classical channels. Surprisingly, the entanglement-assisted capacities are often simpler and better behaved than their unassisted counterparts. Unlike the classical and quantum capacities proper, they are relatively easy to calculate using finite optimization procedures, and there has recently been significant progress in understanding the simulation rates for nonideal channels in this scenario (see Capacities Enhanced by Entanglement).

The quantum channel capacity is unaffected by entanglement-breaking side channels. In particular, classical forward communication alone cannot enhance it. However, unlike in the purely classical case, both the quantum and classical channel capacity (but not the entanglement-assisted capacity) may increase under classical feedback.

Elementary Properties

The capacity of a composite channel $T_1 \circ T_2$ cannot be bigger than the capacity of the channel with the smallest bandwidth. This in turn suggests that simulating a concatenated channel is in general easier than simulating any of the individual channels. These relations are known as bottleneck inequalities:

$$ Q(T_1 \circ T_2, S) \le \min\{Q(T_1, S),\, Q(T_2, S)\} \tag{4} $$

$$ Q(T, S_1 \circ S_2) \ge \max\{Q(T, S_1),\, Q(T, S_2)\} \tag{5} $$

Instead of running $T_1$ and $T_2$ in succession, we may also run them in parallel. In this case, the capacity can be shown to be superadditive,

$$ Q(T_1 \otimes T_2, S) \ge Q(T_1, S) + Q(T_2, S) \tag{6} $$

For the standard ideal channels, we even have additivity. The same holds true if both S and one of the channels $T_1$, $T_2$ are noiseless, the third channel being arbitrary. However, results on the activation of bound-entangled states seem to suggest that the inequality in eqn [6] may be strict for some channels (see Entanglement).

Finally, the two-step coding inequality tells us that by using an intermediate channel in the coding process we cannot increase the transmission rate:

$$ Q(T_1, T_2) \ge Q(T_1, T_3)\, Q(T_3, T_2) \tag{7} $$

Applying eqn [7] twice with $T_2 = \mathrm{id}$ and $T_3 = \mathrm{id}$ immediately yields upper and lower bounds on the channel capacity with nonideal reference channel,

$$ \frac{Q(T_1)}{Q(T_2)} \ge Q(T_1, T_2) \ge Q(T_1)\, Q(\mathrm{id}, T_2) \tag{8} $$

The evaluation of the lower bound in eqn [8] then requires efficient protocols for simulating a noisy channel $T_2$ with a noiseless resource.

There are special cases in which the quantum channel capacity can be evaluated relatively easily, the most relevant one being the noiseless channel $\mathrm{id}_n$, where by the subscript n we denote the dimension of the underlying Hilbert space. In this case, we have

$$ Q(\mathrm{id}_n, \mathrm{id}_m) = \frac{\operatorname{ld} n}{\operatorname{ld} m} \tag{9} $$

The lower bound $Q(\mathrm{id}_n, \mathrm{id}_m) \ge \operatorname{ld} n / \operatorname{ld} m$ is immediate from counting dimensions. To establish the upper bound, we use the fact that a noiseless quantum channel cannot simulate itself with a rate

exceeding unity: $Q(\mathrm{id}_m, \mathrm{id}_m) \le 1$. This is just the upper bound we want to prove for the special case n = m, and it can be extended to the general case with the help of the two-step coding inequality [7]: $Q(\mathrm{id}_m, \mathrm{id}_n)\, Q(\mathrm{id}_n, \mathrm{id}_m) \le Q(\mathrm{id}_m, \mathrm{id}_m) \le 1$, implying $Q(\mathrm{id}_n, \mathrm{id}_m) \le 1 / Q(\mathrm{id}_m, \mathrm{id}_n) \le \operatorname{ld} n / \operatorname{ld} m$, where in the last step we have applied the lower bound with the roles of n and m interchanged.

Combining eqn [9] with the two-step coding inequality [7], we see that for any channel T

$$ Q(T, \mathrm{id}_n) = \frac{\operatorname{ld} m}{\operatorname{ld} n}\, Q(T, \mathrm{id}_m) \tag{10} $$

which shows that quantum channel capacities relative to noiseless channels of different dimensionality only differ by a constant factor. Fixing the dimensionality of the reference channel then only corresponds to a choice of units. Conventionally, the ideal qubit channel $\mathrm{id}_2$ is chosen as a standard of reference, as in Definition 1 above, thereby fixing the unit bit.

The upper bound on the capacity of ideal channels can also be obtained from a general upper bound on quantum capacities (Holevo and Werner 2001), which has the virtue of being easily calculated in many situations. It involves the transposition map, which we denote by $\Theta$, defined as matrix transposition with respect to some fixed orthonormal basis. The transposition is positive but not completely positive, and thus does not describe a physical channel (see Channels in Quantum Information Theory). We have $\|\Theta\|_{cb} = d$ for a d-level system. For any channel T and small $\varepsilon > 0$,

$$ Q(T) \le Q_\varepsilon(T) \le \operatorname{ld} \|T\Theta\|_{cb} =: Q_\Theta(T) \tag{11} $$

where $Q_\varepsilon$ is the finite error capacity introduced in the section "Quantum Channel Capacity".

The upper bound $Q_\Theta(T)$ has some remarkable properties, which make it a capacity-like quantity in its own right. For example, it is exactly additive,

$$ Q_\Theta(S \otimes T) = Q_\Theta(S) + Q_\Theta(T) \tag{12} $$

for any pair S, T of channels, and it satisfies the bottleneck inequality:

$$ Q_\Theta(ST) \le \min\{Q_\Theta(S),\, Q_\Theta(T)\} $$

Moreover, it coincides with the quantum capacity on ideal channels, $Q_\Theta(\mathrm{id}_n) = Q(\mathrm{id}_n) = \operatorname{ld} n$, and it vanishes whenever $T\Theta$ is completely positive. In particular, if $\mathrm{id} \otimes T$ maps any entangled state to a state with positive partial transpose, we have $Q_\Theta(T) = 0$.

State–Channel Duality

Quantum capacity is closely related to the distillable entanglement, which is the optimal rate m/n at which n copies of a given bipartite quantum state $\rho$ shared between Alice and Bob can be asymptotically converted into m maximally entangled qubit pairs (see Entanglement). Similar to the quantum capacity, the definition involves the large block limit $n, m \to \infty$ and an optimization over all conceivable distillation protocols. These may consist of several rounds of local quantum operations and (forward or two-way) classical communication. The one-way and two-way distillable entanglement of $\rho$ will be denoted by $D_1(\rho)$ and $D_2(\rho)$, respectively.

Suppose that Alice and Bob are connected by a quantum channel T and run such a one-way distillation protocol on (many copies of) the state $\rho_T := (T \otimes \mathrm{id})|\Omega\rangle\langle\Omega|$, where $|\Omega\rangle := (1/\sqrt{d_A}) \sum_i |i, i\rangle$ is maximally entangled on $\mathcal{H}_A \otimes \mathcal{H}_{A'}$. If the distillation yields maximally entangled qubits at positive rate R, Alice may apply the standard teleportation scheme to send arbitrary quantum states to Bob undistorted at that same rate R. Like the distillation protocol itself, teleportation requires classical forward communication, which however does not affect the channel capacity (cf. the section "Related Capacities"). Thus, $Q(T) \ge D_1(\rho_T)$. If two-way distillation is allowed, we have $Q_2(T) \ge D_2(\rho_T)$ for the capacity $Q_2(T)$ assisted by two-way classical side communication.

Conversely, if Alice and Bob use a bipartite quantum state $\rho$ shared between them as a substitute for the maximally entangled state $|\Omega\rangle$ in the standard teleportation protocol, they will implement some noisy quantum channel $T_\rho$. If this channel allows the transfer of quantum information at nonvanishing rate R, Alice may share maximally entangled states with Bob at that same rate R. Consequently, $D_1(\rho) \ge Q(T_\rho)$ and $D_2(\rho) \ge Q_2(T_\rho)$.

These relations (Bennett et al. 1996) allow one to bound channel capacities in terms of distillable entanglement and vice versa. If the two maps $T \mapsto \rho_T$ and $\rho \mapsto T_\rho$ are mutually inverse, we even have $D_1(\rho) = Q(T_\rho)$ and $D_2(\rho) = Q_2(T_\rho)$. In this case, the duality $\rho \leftrightarrow T_\rho$ is the physical implementation of Jamiołkowski's isomorphism between bipartite states and channels (see Channels in Quantum Information Theory). This has been shown (Horodecki et al. 1999) to hold for isotropic states, which are invariant under the group of all $U \otimes \overline{U}$ transformations, where $\overline{U}$ is the complex conjugate of the unitary U. The corresponding channels are partly depolarizing.

In general, $T_{\rho_T} \ne T$. However, the so-called conclusive teleportation allows us to implement T at least probabilistically, resulting in the relation

$$ \frac{1}{d_A^2}\, Q(T) \le D_1(\rho_T) \le Q(T) \tag{13} $$
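To make the map from a channel T to the state $\rho_T = (T \otimes \mathrm{id})|\Omega\rangle\langle\Omega|$ concrete, here is a small numerical sketch for a qubit depolarizing channel. The parametrization with error probability `p`, the function names, and the use of a single sample unitary are assumptions made for illustration only. The sketch checks that the resulting state is invariant under $U \otimes \overline{U}$, which is the isotropy property mentioned in the text.

```python
import numpy as np

def depolarize(x, p):
    """Linear extension of a qubit depolarizing channel (illustrative form).

    T(x) = lam * x + (1 - lam) * tr(x) * I/2 with lam = 1 - 4p/3.
    """
    lam = 1 - 4 * p / 3
    return lam * x + (1 - lam) * np.trace(x) * np.eye(2) / 2

def channel_state(channel, d=2):
    """rho_T = (T (x) id)|Omega><Omega| with |Omega> = d^(-1/2) sum_i |i,i>."""
    rho_T = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            e_ij = np.zeros((d, d), dtype=complex)
            e_ij[i, j] = 1.0
            # |Omega><Omega| = (1/d) sum_ij |i><j| (x) |i><j|; T acts on the first factor.
            rho_T += np.kron(channel(e_ij), e_ij) / d
    return rho_T

def sample_unitary(d, seed=0):
    """Some unitary matrix, obtained from a QR decomposition of a random matrix."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, _ = np.linalg.qr(z)
    return q

if __name__ == "__main__":
    rho_T = channel_state(lambda x: depolarize(x, 0.3))
    U = sample_unitary(2)
    W = np.kron(U, U.conj())
    print(np.allclose(W @ rho_T @ W.conj().T, rho_T))  # invariance under U (x) conj(U)
```

The check passes because this channel commutes with every unitary conjugation and $|\Omega\rangle$ itself is invariant under $U \otimes \overline{U}$; a single sample unitary is of course only a spot check, not a proof.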

The duality [13] can be applied to show that both the unassisted and the two-way quantum capacities are continuous in any open set of channels having nonvanishing capacities (Horodecki and Nowakowski 2005).

Coding Theorems

Computing channel capacities straight from Definition 1 is a tricky business. It involves optimization in systems of asymptotically many tensor factors, and can only be performed in special cases, like the noiseless channels in the section "Elementary Properties". Coding theorems aspire to reduce this problem to an optimization over a low-dimensional space. They usually come in two parts: the converse provides an upper bound on the channel capacity (typically in terms of some entropic expression), while the direct part consists of a coding scheme that attains this bound. By Shannon's celebrated coding theorem, the classical capacity of a classical noisy channel can be obtained from a maximization of the mutual information over all joint input–output distributions.

For the quantum channel capacity, the relevant entropic quantity is the coherent information,

$$ I_c(T, \rho) := H\big(T(\rho)\big) - H\big((T \otimes \mathrm{id})|\psi_\rho\rangle\langle\psi_\rho|\big) \tag{14} $$

where H denotes the von Neumann entropy: $H(\rho) = -\operatorname{tr} \rho \operatorname{ld} \rho$, and $\psi_\rho \in \mathcal{H}_A \otimes \mathcal{H}_{A'}$ is a purification of the density operator $\rho \in \mathcal{A}$. The coherent information does not increase under quantum operations, $I_c(S \circ T, \rho) \le I_c(T, \rho)$ for any quantum channel S and state $\rho \in \mathcal{A}$. This is the data processing inequality (Barnum et al. 1998), which shows that the regularized coherent information provides an upper bound on the quantum channel capacity: if Alice and Bob have a coding scheme for the channel T with capacity Q(T), n channel uses allow them to share a maximally entangled state of size $\exp_2 nQ(T)$. The coherent information of this state equals $nQ(T)$, and was no larger prior to Bob's decoding.

Recently, Devetak (2005) developed a coding scheme to show that this bound is in fact attainable. Different proofs were outlined by Lloyd and Shor.

Theorem 1 For every quantum channel T,

$$ Q(T) = \lim_{n \to \infty} \frac{1}{n} \max_\rho I_c(T^{\otimes n}, \rho) \tag{15} $$

Unlike the classical or quantum mutual information, coherent information is strictly superadditive for some channels (DiVincenzo et al. 1998). Hence, taking the limit $n \to \infty$ in eqn [15] is indeed required, and in general the evaluation of the capacity formula [15] still demands the solution of asymptotically large variational problems. This should be contrasted with the entanglement-assisted capacities $C_E(T) = 2Q_E(T)$ (where a simple nonregularized coding theorem is known to hold, see Capacities Enhanced by Entanglement) and the capacity for classical information C(T) (where additivity is conjectured but not proved, see Quantum Channels: Classical Capacity). Even a maximization of the single-shot coherent information $I_c(T, \rho)$ appears to be a difficult optimization problem, since this quantity is neither convex nor concave and may have multiple local maxima (Shor 2003). Thus, even for simple-looking systems like the qubit depolarizing channel, so far we only have upper and lower bounds on the quantum channel capacity, but do not yet know how to compute its exact value.

We now sketch Devetak's proof of Theorem 1, assuming only some familiarity with Holevo–Schumacher–Westmoreland (HSW) random codes for the classical channel capacity (see Quantum Channels: Classical Capacity). It is easily seen from Stinespring's dilation theorem (see Channels in Quantum Information Theory) that a noiseless quantum channel provides perfect security against eavesdropping. This is one of the characteristic traits of quantum mechanics and lies at the heart of quantum cryptography. In his proof, Devetak showed a way to turn this around and upgrade coding schemes for private classical information to quantum channel codes.

The relation between quantum information transfer over a channel $T : \mathcal{A} \to \mathcal{B}$ and privacy against eavesdropping is best understood in terms of the companion channel $T_E : \mathcal{A} \to \mathcal{E}$. $T_E$ arises from a given Stinespring isometry $V : \mathcal{H}_A \to \mathcal{H}_B \otimes \mathcal{H}_E$ of $T \equiv T_B$ by interchanging the roles of the output system B and the environment E:

$$ T_B(\rho) = \operatorname{tr}_E V \rho V^* \quad\Longleftrightarrow\quad T_E(\rho) = \operatorname{tr}_B V \rho V^* \tag{16} $$

The channel $T_E$ describes the information flow into the environment E, a system we assume to be under complete control of a potential eavesdropper, Eve say. The setup for private classical information transfer (including the definition of rates and capacity) is then exactly the same as for the classical channel capacity (see Quantum Channels: Classical Capacity), but the protocols now have to satisfy the additional requirement that $T_E$ releases (almost) no information to the environment. This can be achieved by randomizing over $\nu_E \approx \exp_2 n\chi(T_E, \{p_i, \rho_i\})$ code words of a standard HSW code of total size $\exp_2 n\chi(T_B, \{p_i, \rho_i\})$, where $\{p_i, \rho_i\}$ is the quantum ensemble from which a set of random code words

$\{\varphi_{kl}\}_{k=1,\,l=1}^{\nu_B,\,\nu_E}$ is generated. The appearance of the Holevo bound

$$ \chi(T, \{p_i, \rho_i\}) := H\Big(\sum_i p_i\, T(\rho_i)\Big) - \sum_i p_i\, H\big(T(\rho_i)\big) \tag{17} $$

in the dimension of both these code spaces can be understood from the size of the relevant typical subspaces (Devetak and Winter 2004).

The randomization guarantees that the remaining $\nu_B \approx \exp_2 n(\chi(T_B) - \chi(T_E))$ code words are almost indistinguishable to Eve:

$$ \Big\| \frac{1}{\nu_E} \sum_{l=1}^{\nu_E} T_E^{\otimes n}\big(\varphi_{kl} - \varphi_{jl}\big) \Big\|_1 \le \varepsilon \quad \forall\, j, k = 1, \ldots, \nu_B \tag{18} $$

The net transfer rate for private classical information is then $R \approx \chi(T_B) - \chi(T_E)$, which is just the total transfer rate for the channel Alice → Bob reduced by the transfer rate Alice → Eve.

Remarkably, if $\rho = \sum_i p_i |\psi_i\rangle\langle\psi_i|$ is a decomposition of $\rho \in \mathcal{A}$ into pure states, the private transfer rate exactly equals the coherent information,

$$ I_c(T_B, \rho) = H\big(T_B(\rho)\big) - H\big(T_E(\rho)\big) = \chi(T_B) - \chi(T_E) \tag{19} $$

The so-called entropy exchange

$$ H\big(T_E(\rho)\big) = H\big((T_B \otimes \mathrm{id})|\psi_\rho\rangle\langle\psi_\rho|\big) $$

quantifies the extent to which a formerly pure ancilla state becomes mixed via interaction with the signal states. Equation [19] then nicely reflects the intuition that for high-rate quantum information transfer the signal states should not entangle too much with the environment. In fact, for an almost noiseless channel the entropy exchange nearly vanishes, and the optimized coherent information almost attains the maximal value 1, while for nearly depolarizing channels we have $I_c(T_B, \rho) \approx -H(\rho) \le 0$.

So far, we have sketched a protocol for private classical information transfer. Devetak's coherentification allows one to pass from the transmission of classical messages to the transmission of coherent superpositions. This technique has also been applied to obtain entanglement distillation protocols from secret key distillation, and offers a unified view on the secret classical resources and their quantum counterparts (Devetak and Winter 2004, Devetak et al. 2004).

In order to transfer quantum information, Alice will only need to send one half of a maximally entangled state of dimensionality $\exp_2 n I_c(T_B, \rho)$. As described in the previous section, teleportation then allows her to transfer arbitrary quantum states from a subspace of that size.

Given a set of pure state code words $\{|\varphi_{kl}\rangle\}_{k=1,\,l=1}^{\nu_B,\,\nu_E}$ of a private classical information protocol, for entanglement transfer Alice prepares the input state

$$ |\Psi\rangle_{A'A} := \frac{1}{\sqrt{\nu_B}} \sum_{k=1}^{\nu_B} |k\rangle_{A'} \otimes \frac{1}{\sqrt{\nu_E}} \sum_{l=1}^{\nu_E} |\varphi_{kl}\rangle_A \tag{20} $$

where A' denotes a reference system that Alice keeps in her lab. On his share of the resulting output state $|\Psi'\rangle_{A'BE}$ Bob will then employ the corresponding measurement operators $\{M_{kl}\}_{k=1,\,l=1}^{\nu_B,\,\nu_E}$ to implement the coherent measurement

$$ V_M |\psi\rangle_B := \sum_{k,l} \sqrt{M_{kl}}\, |\psi\rangle_B \otimes |kl\rangle_{B_1 B_2} $$

which places the measurement outcomes into some reference system $B_1 \otimes B_2$. Any measurement which identifies the output with high probability only slightly disturbs the output state, and thus Bob's coherent measurement leaves the total system in an approximation of the state

$$ |\Psi''\rangle := \frac{1}{\sqrt{\nu_B \nu_E}} \sum_{k=1,\,l=1}^{\nu_B,\,\nu_E} |k\rangle_{A'}\, |k\rangle_{B_1}\, |l\rangle_{B_2}\, |\varphi'_{kl}\rangle_{BE} \tag{21} $$

in which Eve and Bob are still entangled. A completely depolarizing channel $T_E$ would directly yield a factorized output state $\sigma_B \otimes \sigma_E$ here. Although the randomization in eqn [18] does not necessarily result in complete depolarization, there is a controlled unitary operation which Bob may apply to effectively decouple Eve's system, resulting in the output state $(1/\sqrt{\nu_B}) \sum_k |kk\rangle_{A'B_1} \otimes \sigma_E$, which is the maximally entangled state of size $\nu_B \approx \exp_2 n I_c(T_B, \rho)$ required for teleportation. The direct part of the capacity theorem then follows by applying the above coding scheme to large blocks and maximizing over (pure) input ensembles, concluding the proof.

Devetak's proof of the coding theorem seems to indicate that the private classical capacity $C_p(T)$ equals the quantum capacity Q(T) for every quantum channel T. However, for the coherentification protocol, we have restricted the private coding schemes to pure state input ensembles, and thus we can only conclude that $Q(T) \le C_p(T)$. The existence of bound-entangled states with positive one-way distillable secret key rate (Horodecki et al. 2005) implies that this inequality can be strict. A general procedure does exist to retrieve (almost) all the information from the output of a noisy quantum channel that releases (almost) no information to the environment. But this requires a stronger form of privacy than eqn [18].
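The coherent information of eqn [14] can be evaluated numerically for small systems. The sketch below does this for a qubit depolarizing channel given by Kraus operators; the Kraus parametrization with error probability `p`, the function names, and the choice of the maximally mixed input are assumptions made for this example. It only evaluates the single-shot quantity $I_c(T, \rho)$ at one input state, which gives a lower bound on the maximum in eqn [15] for n = 1, and makes no claim about the capacity itself.

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy H(rho) = -tr rho ld rho, in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]  # drop numerically zero eigenvalues
    return float(-np.sum(ev * np.log2(ev)))

def depolarizing_kraus(p):
    """Kraus operators of a qubit depolarizing channel (illustrative choice)."""
    I = np.eye(2, dtype=complex)
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
    Z = np.array([[1, 0], [0, -1]], dtype=complex)
    return [np.sqrt(1 - p) * I] + [np.sqrt(p / 3) * P for P in (X, Y, Z)]

def coherent_information(kraus, rho):
    """I_c(T, rho) = H(T rho) - H((T (x) id)|psi_rho><psi_rho|), cf. eqn [14]."""
    d = rho.shape[0]
    # Square root of rho via its eigendecomposition.
    w, v = np.linalg.eigh(rho)
    sqrt_rho = (v * np.sqrt(np.clip(w, 0, None))) @ v.conj().T
    # Purification |psi_rho> = (sqrt(rho) (x) 1) sum_i |i,i>.
    omega = np.eye(d, dtype=complex).reshape(d * d)
    psi = np.kron(sqrt_rho, np.eye(d)) @ omega
    proj = np.outer(psi, psi.conj())
    out_B = sum(K @ rho @ K.conj().T for K in kraus)
    out_joint = sum(
        np.kron(K, np.eye(d)) @ proj @ np.kron(K, np.eye(d)).conj().T
        for K in kraus
    )
    return entropy(out_B) - entropy(out_joint)

if __name__ == "__main__":
    for p in (0.0, 0.1, 0.25):
        print(p, coherent_information(depolarizing_kraus(p), np.eye(2) / 2))
```

For this input the result agrees with the closed form $1 + (1-p)\operatorname{ld}(1-p) + p \operatorname{ld}(p/3)$, which equals 1 for the noiseless case p = 0 and is already negative at p = 0.25, in line with the remark that the coherent information is nonpositive for nearly depolarizing channels.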

Quantum Channels with Memory shown to die out even exponentially. The set of
these channels is open and dense in the set of
This article has so far been restricted to memory-
quantum memory channels. Hence, generic memory
less quantum channels, in which successive chan-
channels are forgetful.
nel inputs are acted on independently. Messages of $n$ symbols are then processed by the tensor product channel $T^{\otimes n}$, as in Definition 1 and illustrated in Figure 1. In many real-world applications, the assumption of having uncorrelated noise cannot be justified, and memory effects need to be taken into account. For a quantum channel $T$ with register input $A$ and register output $B$, such effects are conveniently modeled (Bowen and Mancini 2004) by introducing an additional memory system $M$, so that now $T : M \otimes A \to B \otimes M$ is a completely positive and trace-preserving map with two input systems and two output systems. Long messages with $n$ signal states will then be processed by the concatenated channel $T_n : M \otimes A^{\otimes n} \to B^{\otimes n} \otimes M$. In such a concatenation, the memory system is passed on from one channel application to the next, and thus introduces (classical or quantum) correlations between consecutive register inputs.

Remarkably, this relatively simple model can be shown (Kretschmann and Werner 2005) to encompass every reasonable physical process: every stationary channel $S : A_\infty \to B_\infty$ which turns an infinite string of input states (on the quasilocal algebra $A_\infty$) into an infinite string of output states on $B_\infty$ and satisfies the causality constraint is in fact a concatenated memory channel. Causality here means that the outputs of the stationary channel $S$ at given time $t_0$ do not depend on inputs at times $t > t_0$. Figure 2 illustrates the structure theorem for causal stationary quantum channels. In general, it produces not only the memory channel $T$ with memory algebra $M$, but also a map $R$ describing the influence of input states in the remote past.

Intuitively, such a map is often not needed, because memory effects decrease in time: the memory channel $T$ is called forgetful if outputs at a large time $t$ depend only weakly on the memory initialization at time zero. In fact, memory effects can be

Figure 2 By the structure theorem, a causal automaton $S$ can be decomposed into a chain of concatenated memory channels $T$ plus some input initializer $R$. Evaluation with the partial trace tr means that the corresponding output is ignored.

The capacity of memory channels is defined in complete analogy to the memoryless case, replacing the $n$-fold tensor product $T^{\otimes n}$ in Definition 1 by the $n$-fold concatenation $T_n$. The coding theorems for (private) classical and quantum information can then be extended from the memoryless case to the very important class of forgetful channels (Kretschmann and Werner 2005).

Nonforgetful channels call for universal coding schemes, which apply irrespective of the initialization of the input memory. Such schemes are presently known only for very special cases.

Acknowledgments

The author thanks the members of the quantum information group at TU Braunschweig for their careful reading of the manuscript and many helpful suggestions. He also gratefully acknowledges the funding from Deutsche Forschungsgemeinschaft (DFG).

See also: Capacities Enhanced by Entanglement; Channels in Quantum Information Theory; Entanglement; Positive Maps on C*-Algebras; Quantum Channels: Classical Capacity; Quantum Error Correction and Fault Tolerance; Source Coding in Quantum Information Theory.

Further Reading

Barnum H, Nielsen MA, and Schumacher B (1998) Information transmission through a noisy quantum channel. Physical Review A 57: 4153 (quant-ph/9702049).
Bennett CH, Devetak I, Shor PW, and Smolin JA (2004) Inequalities and separations among assisted capacities of quantum channels, quant-ph/0406086.
Bennett CH, DiVincenzo DP, Smolin JA, and Wootters WK (1996) Mixed-state entanglement and quantum error correction. Physical Review A 54: 3824 (quant-ph/9604024).
Bowen G and Mancini S (2004) Quantum channels with a finite memory. Physical Review A 69: 012306 (quant-ph/0305010).
Devetak I (2005) The private classical information capacity and quantum information capacity of a quantum channel. IEEE Transactions on Information Theory 51: 44 (quant-ph/0304127).
Devetak I, Harrow AW, and Winter A (2004) A family of quantum protocols. Physical Review Letters 93: 230504 (quant-ph/0308044).
Devetak I and Winter A (2004) Relating quantum privacy and quantum coherence: an operational approach. Physical Review Letters 93: 080501 (quant-ph/0307053).
DiVincenzo DP, Shor PW, and Smolin JA (1998) Quantum channel capacities of very noisy channels. Physical Review A 57: 830 (quant-ph/9706061).
Eisert J and Wolf MM Gaussian quantum channels. In Cerf N, Leuchs G, and Polzik E (eds.) Quantum Information with Continuous Variables of Atoms and Light. London: Imperial College Press (in preparation) (quant-ph/0505151).
Holevo AS and Werner RF (2001) Evaluating capacities of bosonic Gaussian channels. Physical Review A 63: 032312 (quant-ph/9912067).
Horodecki M, Horodecki P, and Horodecki R (1999) General teleportation channel, singlet fraction, and quasidistillation. Physical Review A 60: 1888 (quant-ph/9807091).
Horodecki P and Nowakowski ML (2005) Simple test for quantum channel capacity, quant-ph/0503070.
Horodecki K, Pankowski L, Horodecki M, and Horodecki P (2005) Low dimensional bound entanglement with one-way distillable cryptographic key, quant-ph/0506203.
Kretschmann D and Werner RF (2004) Tema con variazioni: quantum channel capacity. New Journal of Physics 6: 26 (quant-ph/0311037).
Kretschmann D and Werner RF (2005) Quantum channels with memory. Physical Review A 72: 062323 (quant-ph/0502106).
Shor PW (2003) Capacities of quantum channels and how to find them. Mathematical Programming 97: 311 (quant-ph/0304102).

Capillary Surfaces
R Finn, Stanford University, Stanford, CA, USA
© 2006 Elsevier Ltd. All rights reserved.

Historical and Conceptual Background

A capillary surface is the interface separating two fluids that lie adjacent to each other and do not mix. Examples of such surfaces are the upper surface of liquid partially filling a vertical cylinder (capillary tube), the surface of a liquid drop resting in equilibrium on a tabletop (sessile drop) and the surface of a liquid drop hanging from a ceiling (pendent drop); further instances are the surface of a falling raindrop, the bounding surface of the liquid in the fuel tank of a spaceship, and the interface formed by a fluid mass rotating within another fluid. This last example extends to the problem of rotating stars.

Interfaces separating fluids and solids share some of the physical attributes of capillary surfaces, and the study of wetted portions of rigid support surfaces becomes essential for describing global behavior of capillary configurations. However, some significant distinctions appear that change the formal structure of the problems, and must be accounted for in the theory.

Phenomena governed by capillarity pervade all of daily life, and most are so familiar as to escape special notice. By contrast, throughout the eighteenth century and presumably earlier, great attention centered on the rise of liquid in a narrow glass circular-cylindrical tube dipped vertically into a liquid reservoir (Figure 1); this striking event had a dramatic impact that confounded intuition. Clarification of the behavior became one of the major problems challenging the scientific world of the time, and was not achieved during that period. The term "capillary," adapted from the Latin capillus for hair, was applied to the phenomenon since it was observed only for tubes with very fine openings; the more general usage adopted in the definition above derives from the recognition of a class of phenomena with a common physical basis.

Figure 1 Capillary tube in infinite reservoir, in downward gravity field.

The first recorded observations concerning capillarity seem due to Aristoteles c. 350 BC. He wrote that a broad flat body, even of heavy material, will float on water, however a narrow thin one such as a needle will always sink. Any reader with access to a needle and a glass of water will have little difficulty refuting the assertion. Remarkably, the error in reasoning seems not to have been pointed out for almost 2000 years, when Galileo addressed the problem in his Discorsi, about 1600. The only substantive studies till that time are apparently those of Leonardo da Vinci a hundred years earlier. Leonardo introduced reasoning close in spirit to that of current literature; however, the Calculus was not available to him, and he was not in a position to develop his ideas in quantitative ways.

Young's Contribution

The later discovery of the Calculus provided a driving impetus guiding many new studies during the eighteenth century. But despite the enormity of that weapon, it did not on its own suffice, and initial quantitative success had to await two initiatives
taken by Thomas Young in 1805. Young based his studies on the concept of surface tension that had been introduced by von Segner half a century earlier. Segner hypothesized that every curve on a fluid/fluid interface $S$ experiences on both its sides an orthogonal force $\sigma$ per unit length, which (for given temperature) depends only on the materials and is directed into the tangent planes on the respective sides. The presence of such forces can be indicated by simple experiments. They become clearly evident in the case of thin (soap) films spanning a frame, in which case there is an easily observed orthogonal pull on the frame, see the section "Dual interpretation of $\sigma$: distinction between fluids and solids."

Young made two basic conceptual contributions (Y1, Y2):

Y1. Relation of pressure jump across a free interface to mean curvature and surface tension.

Consider a piece of surface $S$ in the shape of a spherical bowl of radius $R$, separating two immiscible fluid media, as in Figure 2. In equilibrium, any pressure difference $\delta p$ across $S$ must be balanced by a tension $\sigma$ on its rim $\Sigma$. If $S$ projects to a disk of (small) radius $r$ on the plane tangent to $S$ at the symmetry point, we are led to

$$\pi r^2\, \delta p = 2\pi r \sigma \sin\vartheta \qquad [1]$$

where $\vartheta$ is inclination of $S$ at the rim, relative to the plane. We thus find at the base point

$$\delta p = 2\sigma \frac{d \sin\vartheta}{dr} = 2\sigma \frac{1}{R} \qquad [2]$$

Young then went on to consider a general $S$, without symmetry hypothesis. Letting $1/R_1$, $1/R_2$ denote the planar curvatures at a point in $S$ of two normal sections in orthogonal directions, he asserted that

$$\delta p = 2\sigma \cdot \frac{1}{2}\left(\frac{1}{R_1} + \frac{1}{R_2}\right) = 2\sigma H \qquad [3]$$

where $H$ is the mean curvature of $S$ at the point. Although Young provided no formal justification for this step, we can establish it with the aid of a general formula from differential geometry that was not known in his lifetime:

$$\int_S 2H\mathbf{N}\, dS = \oint_\Sigma \mathbf{n}\, ds \qquad [4]$$

where $\mathbf{N}$ is a unit normal on $S$, and $\mathbf{n}$ is unit conormal (as indicated in Figure 2) on $\Sigma$. Multiplying both sides of [4] by $\sigma$, the right-hand side becomes the net surface tension force on $S$. Since that must equal the net balancing pressure force, we obtain

$$\int_S (\delta p - 2\sigma H)\,\mathbf{N}\, dS = 0 \qquad [5]$$

Letting the diameter of $S$ tend to zero, the assertion follows.

Figure 2 Pressure change across fluid element, balanced by surface tension. Residual normal force remains.

We emphasize here the implicit assumption above, that $\sigma$ is a constant depending only on the particular materials, and not on the shape of $S$. This author knows of no source in which that is clearly established, although experiments and experience provide some a posteriori justification. See the further comments under Y2, and later in sections "Gauss' contribution: the energy method" and "Dual interpretation of $\sigma$: distinction between fluids and solids."

Y2. The capillary contact angle.

Young asserted that there are surface tensions for solid/fluid interfaces analogous to those just introduced, and again depending only on the materials. This assertion is erroneous, as was suggested in writings of Bikerman and of others, and more recently established in a definitive example by Finn. Using his premise, Young attempted to characterize the contact angle $\gamma$ made by the fluid surface with a rigid boundary, by requiring that the net tangential component of the three surface tension vectors vanish at the triple interface; this leads to the often employed but incorrect "Young diagram," see Figure 3, and the relation

$$\cos\gamma = \frac{\sigma_1 - \sigma_2}{\sigma_0} \qquad [6]$$

Figure 3 Young diagram; balance of tangential forces.
for $\cos\gamma$ in terms of the magnitudes of the three surface tensions. Young concluded that the contact angle depends only on the materials, and in no other way on the conditions of the problem. This basic assertion is by a fortuitous accident correct, as follows from the contribution by Gauss described below; it underlies all modern theory.

Using Y1 and Y2, Young produced the first verifiable prediction for the rise height $u_0$ in the circular capillary tube of Figure 1. He assumed the interface to be spherical, so that $H$ is constant and $a = \cos\gamma / H$. He assumed vanishing outside pressure. According to classic laws of hydrostatics, $\delta p = \rho g u_0 = 2\sigma H$ by Y1, where $\rho$ is fluid density; there follows the celebrated relation, presented entirely in words in his 1805 article:

$$u_0 \approx \frac{2\cos\gamma}{\kappa a}; \qquad \kappa = \frac{\rho g}{\sigma} \qquad [7]$$

Young scorned the mathematical method, and made a point of deriving and publishing his results on capillarity without use of any mathematical symbols. This personal idiosyncrasy causes his publications to be something of a challenge to read.

The Laplace Contribution

In 1806, Laplace published the first analytical expression for the mean curvature of a surface $u(x, y)$, and showed that the expression can be written as a divergence. He obtained the equation

$$\operatorname{div} Tu = 2H; \qquad Tu \equiv \frac{\nabla u}{\sqrt{1 + |\nabla u|^2}} \qquad [8]$$

Thus, if $H$ is known from geometrical or physical considerations, as it is for the capillary tube in the example just considered, one finds a second-order (nonlinear) equation for the surface height of any solution as a graph. The equation is elliptic for any function $u(x, y)$ inserted into the coefficients, however not uniformly so; the particular nonuniformity leads to some striking and unusual behavior of its solutions, as we shall see. With the aid of [8], Laplace improved the Young estimate [7] to

$$u_0 \approx \frac{2\cos\gamma}{\kappa a} - \left( \frac{1}{\cos\gamma} - \frac{2}{3}\, \frac{1 - \sin^3\gamma}{\cos^3\gamma} \right) a \qquad [9]$$

Both Young and Laplace proposed their formulas for "narrow tubes," but neither gave any quantitative indication of what "narrow" should signify. Note that whenever $0 \le \gamma < \pi/2$, [9] becomes negative when the nondimensional "Bond Number" $B = \kappa a^2$ exceeds 8; since $u$ is known to be positive in the indicated range for $\gamma$, [9] provides no information in that case, whereas [7] is still of some value. Nevertheless, [9] is asymptotically exact and consists of the first two terms of the formal expansion in powers of $a$; that was first proved by D Siegel in 1980, almost 200 years following the discovery of the formulas. In 1968, P Concus extended the formal expansion for the height to the entire traverse $0 < r < a$. F Brulois (1981) and independently E Miersemann (1994) proved the expansion to be asymptotic to every order. Explicit bounds for the rise height above and below, making quantitative the notion of "narrow," were obtained by Finn.

Laplace supplied the first detailed mathematical investigations into the behavior of capillary surfaces, applying his ideas to many specific examples. His underlying motivation apparently derived at least partly from astronomical problems, and he published his contributions in two "Supplements" to the tenth volume of his Mécanique Céleste.

Gauss' Contribution: The Energy Method

Young and Laplace both based their reasonings on force-balance arguments, which at best were unclear and at worst conceptually wrong. In 1830, Gauss took up the problem anew from a variational point of view, using the Johann Bernoulli "principle of virtual work." To do so, he attempted to characterize both surface energies and bulk fluid energies in terms of postulated particle attractions and repulsions. In an astonishing 30 pages, he essentially introduced foundations of modern potential theory, of measure theory, and of thermodynamics. He ended up with elaborate expressions that could not readily be applied, and which at least to some extent he did not use. He asserted, for example, that the bulk internal energy would be proportional to volume, which for an incompressible fluid is constant under admissible deformations, and on that basis he ignored the bulk energy term completely. His procedures then led him, in an independent and more convincing way, to the identical equation and boundary condition that had been produced by his predecessors. It must, of course, be remarked that his justification for ignoring the bulk energy term would not be correct for a compressible liquid (see the section "Compressibility"), and it is open to some
question for the central motivating problem of a capillary tube dipped into an infinite liquid bath, in which event there is no volume constraint.

The material that follows is guided by the ideas of Gauss; however, I have found it advantageous to replace his elaborate hypotheses on particle attractions and repulsions by a simpler phenomenological reasoning as to the nature of the energy terms to be expected.

To fix ideas, we consider a semi-infinite cylinder of general section $\Omega$ and of homogeneous material, closed at the bottom, situated vertically in a downward gravity field $g$ per unit mass, and partly filled with an incompressible liquid of density $\rho$ covering the bottom (a more exact discussion, taking account of compressibility, is indicated below in the section "Compressibility"). We assume an equilibrium fluid configuration with the liquid bounded above by an ideally thin interface $S : u(x, y)$ (see Figure 4). We distinguish the energy terms that occur:

Figure 4 Liquid in cylindrical capillary tube, of general section $\Omega$. Reproduced with permission from the American Institute of Aeronautics and Astronautics.

1. Surface energy. This is the energy required to create the surface interface $S$. We can characterize it by noting that fluid particles within or exterior to the liquid are attracted equally to neighboring particles in all directions; however, at the surface $S$ there is a differential attraction, to particles of the exterior medium (such as air) above, or to the liquid below (see Figure 5). Thus, particles in the interface are pulled orthogonally to $S$. In general, for a liquid-gas interface, significant work will be done only on the liquid and those particles will be pulled toward the liquid; otherwise, the liquid would evaporate across the interface and disappear. The work done in that (infinitesimal) motion is proportional to the area of $S$, so that for the surface energy $E_S$ we obtain

$$E_S = \sigma \int_\Omega \sqrt{1 + |\nabla u|^2}\, dx \qquad [10]$$

Figure 5 Attractions on a fluid element: (1) interior to the fluid; (2) on the surface interface.

The constant $\sigma$ has the dimensions of force per unit length, and turns out to be the surface tension of the interface. We note from [10] its dual interpretation as areal energy density on $S$, arising from formation of that surface. This alternative interpretation lends conceptual support to the supposition that $\sigma$ is constant on $S$. See the section "Dual interpretation of $\sigma$: distinction between fluids and solids."

Implicit in the above discussion are deep premises about the nature of the forces acting within the fluid. Essentially these forces must be perceptible only at infinitesimal distances, and grow rapidly with decreasing distance. Forces both of attraction and of repulsion must be present. The recognition of the need for such forces can be traced back to Newton. Quantitative postulates as to their precise nature were introduced by van der Waals in the late nineteenth century, and the topic remains still in active study. Since these forces appear at molecular distance levels, their introduction leads inevitably to questions of statistical mechanics. Additionally, our discussion of work done in forming the surface implicitly assumes a compressible transition layer there, in conflict with our treatment of $S$ as an ideally thin interface bounding an incompressible fluid. In these senses, it is striking that [10], which is in accord with classical constructions, could be obtained via global qualitative postulates concerning a continuum in static equilibrium, in which the specific nature of the forces is not introduced. Rayleigh measured the thickness of the surface interface between water and air to be of molecular size, thus providing experimental justification for the procedure adopted.

2. Wetting energy. A similar discussion applies at the interface separating the liquid and solid at the cylinder walls; however, this time the net attraction can be in either direction, as particles from neither medium can migrate significantly into the other. For the wetting energy $E_W$, we write, with $\Sigma$ the boundary of $\Omega$,

$$E_W = -\sigma\beta \oint_\Sigma u\, ds \qquad [11]$$
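As a rough numerical check on the energy integrals [10] and [11], the following Python sketch evaluates both for a hypothetical test surface: a lower spherical cap of radius `R` over a circular section of radius `a`. The test surface, the parameter values, the function names, and the use of a midpoint rule are illustrative assumptions and are not taken from the discussion above; energies are reported in units of the surface tension.

```python
import math

def cap_height(r, R):
    """Height u(r) of a lower spherical cap of radius R (assumed test surface)."""
    return -math.sqrt(R * R - r * r)

def surface_energy_over_sigma(a, R, n=20000):
    """Midpoint-rule value of the integral in [10] over the disc r < a."""
    h = a / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * h
        slope = r / math.sqrt(R * R - r * r)  # |grad u| for the cap
        # area element of the graph: sqrt(1 + |grad u|^2) * (2 pi r dr)
        total += math.sqrt(1.0 + slope * slope) * 2.0 * math.pi * r * h
    return total

def wetting_energy_over_sigma(a, R, beta):
    """-beta times the boundary integral of u in [11], on the circle r = a."""
    return -beta * 2.0 * math.pi * a * cap_height(a, R)

# Hypothetical values, chosen only for illustration.
a, R, beta = 1.0, 2.0, 0.5
numeric = surface_energy_over_sigma(a, R)
exact = 2.0 * math.pi * R * (R - math.sqrt(R * R - a * a))  # area of the cap
wetting = wetting_energy_over_sigma(a, R, beta)
print(numeric, exact, wetting)
```

For this test surface the integral in [10] is simply the area of the cap, $2\pi R\,(R - \sqrt{R^2 - a^2})$, so the quadrature value and the closed form should agree closely.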
We designate $\beta$ as the relative adhesion coefficient of the liquid-gas-solid configuration. We assume that the cylinder walls are of homogeneous material, so that $\beta$ will be constant. In general, $\beta$ is a difference of factors that apply on the walls at the two interfaces, with the liquid and with the external medium.

3. Gravitational energy. The work done in lifting an amount of liquid $\rho\,\Delta\Omega\,\Delta h$ against the gravity field from the base level to a height $h$ in a vertical tube of small section $\Delta\Omega$ is $\rho g h\,\Delta h\,\Delta\Omega$. Thus, the work done in filling that tube up to the surface height $u$ is $(\rho g u^2/2)\,\Delta\Omega$, and the total gravitational energy is

$$E_G = \frac{\rho g}{2} \int_\Omega u^2\, dx \qquad [12]$$

4. Volume constraint. In the configuration considered the volume is to be unvaried during admissible deformations; we take account of the constraint by introducing a Lagrange parameter $\lambda$, and an additional "energy" term

$$E_V = \sigma\lambda \int_\Omega u\, dx \qquad [13]$$

According to the principle of virtual work, the sum $E$ of the above energies must remain unvaried in any deformation that respects all mechanical constraints other than the volume constraint. We choose a deformation $u \to u + \varepsilon\eta$, with $\eta$ smooth in the closure of $\Omega$, which determines a functional $E(\varepsilon)$. From $E'(0) = 0$ follows

$$\int_\Omega \left\{ \frac{\nabla u \cdot \nabla \eta}{\sqrt{1 + |\nabla u|^2}} + \kappa u \eta + \lambda \eta \right\} dx - \beta \oint_\Sigma \eta\, ds = 0 \qquad [14]$$

from which

$$\int_\Omega \eta \left( -\operatorname{div} Tu + \kappa u + \lambda \right) dx + \oint_\Sigma \eta \left( \nu \cdot Tu - \beta \right) ds = 0 \qquad [15]$$

with $Tu \equiv \nabla u / \sqrt{1 + |\nabla u|^2}$, and with $\nu$ the unit exterior normal on $\Sigma$. Choosing first $\eta$ to have compact support in $\Omega$, the boundary term vanishes, and the fundamental lemma of the calculus of variations yields

$$\operatorname{div} Tu = \kappa u + \lambda; \qquad \kappa = \rho g/\sigma \qquad [16]$$

throughout $\Omega$. Thus, the area integral in [15] vanishes for any $\eta$. We are therefore free to choose $\eta$ as we wish on the boundary, and the fundamental lemma now yields $\nu \cdot Tu = \beta$ on $\Sigma$. We now note that for any liquid surface $u(x, y)$ there holds

$$\nu \cdot Tu = \cos\gamma \qquad [17]$$

on $\Sigma$, where $\gamma$ is the angle between the cylinder wall and the surface $S$, measured within the liquid. Since $\beta$ is assumed to be constant, that is so also for $\gamma$. It is a physical constant: the contact angle, that must be measured in an independent experiment, and cannot be prescribed in advance or calculated within the scope of the theory.

The constant $\beta$, originally introduced as a general proportionality constant, is now characterized as $\beta = \cos\gamma$. We thus see that a physical surface of the form envisaged is possible only if $-1 \le \beta \le 1$. Physically, one expects that if $\beta < -1$ the liquid will separate from the walls, while, if $\beta > 1$, the liquid will spread over the walls as a thin film.

Equation [16] and boundary condition [17] provide a nonlinear second-order equation that is elliptic for any function $u(x, y)$, and also a nonlinear transversality condition on the boundary, for determining the surface interface $S$. The expression $\operatorname{div} Tu$ is exactly twice the mean curvature of the surface $S$. If $\kappa \ne 0$ then $\lambda$ can be eliminated by addition of a constant to $u$. The problem [16]–[17] for the fluid in a vertical cylindrical capillary tube of general section becomes thus a geometrical one: to find a surface whose mean curvature is a prescribed function of position in space, and which meets the cylindrical boundary walls in a prescribed angle $\gamma$.

In the absence of gravity, [16] takes the form

$$\operatorname{div} Tu = 2H \qquad [18]$$

for a surface of constant mean curvature $H$. The constant $H$ is determined by integrating [18] over $\Omega$, and using [17]:

$$2H = \frac{|\Sigma| \cos\gamma}{|\Omega|} \qquad [19]$$

where $|\Sigma|$ and $|\Omega|$ denote the respective perimeter and area, and thus $H$ is independent of volume. From the known uniqueness up to an additive constant of the solutions of [18], [17] it follows that the shape of the solution surface is independent of volume. That result holds also for [16], [17] in view of the possibility to eliminate $\lambda$ from the equation by addition of a constant, and the uniqueness of the solutions of the resulting equation.

Equations [16]–[17] or [18]–[17] are appropriate for determining capillary surfaces that are graphs
$u(x, y)$ over a base domain $\Omega$. More generally, any surface $S$ in 3-space satisfies the equation

$$\Delta \mathbf{x} = 2H\mathbf{N} \qquad [20]$$

where $H$ is its scalar mean curvature and $\mathbf{N}$ is a unit normal vector on $S$. Here $\Delta$ is the intrinsic Laplacian in the metric of $S$. This is the appropriate relation to be applied in situations for which the physical surface folds over itself and cannot be expressed globally as a graph. The formal simplicity of [20] is deceptive; the challenges arising from the nonlinearity in the equation can be formidable, and very little general theory is as yet available.

Dual Interpretation of $\sigma$: Distinction between Fluids and Solids

We have already remarked the duality in connection with eqn [10] above. It can be made explicit with a simple experiment proposed by Dupré. One makes a rigid frame with a sliding bar of length $l$, as in Figure 6, and dips the frame into soap solution. On lifting the frame from the solution the opening will be filled with a soap film, and one finds a force $F = 2\sigma l$ on the bar, directed orthogonal to the bar (the factor 2 appears since the film has two sides). The work done in sliding the bar a distance $\delta x$ is $F\,\delta x = 2\sigma l\,\delta x$, which can also be written $F\,\delta x = 2\sigma\,\delta A$ with $\delta A$ an element of area. In this sense, the two interpretations of $\sigma$ are formally equivalent, for fluid/fluid interfaces.

Figure 6 Dupré apparatus for exhibiting surface tension.

The equivalence cannot be extended to solid/fluid interfaces. Consider a rigid spherical ball of generic material and radius $R$, freely floating in an infinite liquid bath in a gravity-free environment, see Figure 7a. It can be shown that the unique symmetric solution to the problem is a horizontal surface, as in the figure. A variational procedure as above shows that if $e_0$, $e_1$, $e_2$ are the interfacial energy densities associated with the three interfaces, then

$$\cos\gamma = \frac{e_1 - e_2}{e_0} \qquad [21]$$

in formal analogy with the Young relation [6]. But $e_1$, $e_2$ cannot be interpreted as interfacial forces whose net tangential component cancels that of $e_0$. To do so would lead to a net downward force $v$ on the ball (see Figure 7b), contradicting the supposed equilibrium state.

Figure 7 (a) Floating spherical ball; presumed Young forces. (b) Normal and vertical components of Young forces; contradiction to presumed equilibrium.

Mathematical and Physical Predictions: Experiments

In the following sections, we study the kinds of behavior imposed on a surface $S$ by the requirement that it appear as solution of one of the indicated equations and boundary conditions. Some of these properties are quite surprising in the context of classically expected behavior of solutions of equations of mathematical physics. The mathematical predictions were, however, corroborated in certain cases experimentally, as we discuss below.

Uniqueness and Nonuniqueness

We begin by considering uniqueness questions. We start with a semi-infinite capillary tube, closed at the bottom, to be partially filled with a prescribed volume of (incompressible) liquid making contact angle $\gamma$ on the container walls (Figure 8a). If $\kappa \ge 0$, any solution is uniquely determined. That is a quite general theorem, valid for a wide class of domains $\Omega$ including all piecewise smooth domains (at the corners of which data of the form [17] cannot be prescribed); formally, data can be omitted on any boundary set of linear Hausdorff measure zero. In this result, no growth conditions need be imposed near the boundary (note that such a statement would be false for solutions of the Laplace equation under Dirichlet boundary conditions).

Next we consider a sessile liquid drop on a horizontal plate (Figure 8b). Again the solution is uniquely determined by the volume and by $\gamma$, although the known proof differs greatly from that of the other case.

We now consider a smooth deformation of the base plane, depending on a parameter $t$, which carries it into the cylinder; that can be done in such a way that the supporting surface is at all times "bowl-shaped," as in Figure 8c. Since the bowl formation tends to restrict the possible deformations
of the fluid consistent with smooth contact with the supporting rigid surface, one might expect that the corresponding capillary surface $S(t)$, arising from the identical fluid mass, will for each $t$ be uniquely determined.

Figure 8 Support configurations: (a) capillary tube, general section; (b) horizontal plate; (c) convex surface appearing during deformation of horizontal plate to capillary tube; and (d) nonuniqueness of configuration appearing during convex deformation. Reproduced from Mathematical Intelligencer 24(3) 2002 21–33 with permission from Springer-Verlag Heidelberg.

That is however not true, even for symmetric configurations. We can see that from the configuration of Figure 8d, consisting of a vertical circular cylinder whose base is a 45° cone. We assume a contact angle $\gamma = 45°$ and adjust the radius so that a horizontal surface lying just below the cylinder/cone juncture provides the prescribed volume. This is a formal solution surface. Now fill the configuration with a larger volume, so that the contact line will lie above the juncture. The upper surface will no longer be flat, in view of the 45° contact angle, and takes an appearance as indicated in the figure. Finally, we decrease the fluid volume, keeping all other parameters unchanged. As noted above, the upper surface moves rigidly downward, and it is clear that if the original surface is close enough to the juncture line, then the prescribed volume will be attained before the contact line reaches the juncture. Thus, uniqueness fails.

In this construction as just described, the bounding surface is not smooth; however, one sees easily that the procedure continues to work if the edge and vertex are smoothed locally. In fact, one can carry the procedure to a striking conclusion; by appropriate smoothing, one can construct a bounding surface admitting an entire continuum of distinct solution interfaces, all with the same contact angle and enclosing the same fluid volume (Gulliver and Hildebrandt; Finn). This can be done for any gravity field. Figure 9 illustrates seven members of the family of interfaces, in the particular case $\kappa = 0$.

Figure 9 Seven spherical capillary interfaces in an exotic container of homogeneous material in zero gravity. All interfaces bound the same volume and have the same sum of free surface and wetting energies. If all pressures above the interfaces are the same, then the pressures below them successively increase as the curvature vectors of the vertical sections change from upwardly to downwardly directed. Reproduced from Mathematical Intelligencer 24(3) 2002 21–33 with permission from Springer-Verlag Heidelberg.

The question immediately arises as to which if any of the continuum of surfaces will be seen in an experiment. In fact, it can be proved that none of the indicated surfaces is mechanically stable (Finn, Concus and Finn, Wente). Since the indicated family includes all symmetric surfaces that are stationary for the energy functional, we find that any stable stationary configuration must be asymmetric. Thus, we have obtained an example of "symmetry breaking," in which all conditions of the problem are symmetric, but for which all physically acceptable solutions are asymmetric.

These results were subjected to computational test by M Callahan using the Surface Evolver software, to experimental test by M Weislogel in a drop tower, and to experimental test by S Lucid in the Mir Space Station. The results of the latter experiment are compared in Figure 10 with the computer calculations. In both cases, both a local minimizer ("potato chip") and a presumed global minimizer ("spoon") were observed.

The seven surface interfaces indicated in Figure 9 all provide the same sum of surface and wetting energy, and bound the same volume of fluid. They all satisfy an eqn [18] with constant $H$, in accordance with hypotheses of incompressibility and vanishing gravity. Thus, formally, all configurations have identical mechanical energy. The surfaces
are all spherical caps; however, the radii $R$ of the caps vary considerably.
According to Y1 above, the pressure change across each interface is
$\delta p = 2\sigma/R$. Since one may assume the outer region to be a vacuum
with zero pressure for all caps, we find that the pressures within the fluids
vary greatly among the configurations. One would thus expect that work is done
within the fluid in passing from one configuration to another, a circumstance
we have excluded by hypothesis when determining the family. From this point of
view, the (customary) hypothesis of incompressibility that was used in
determining the family is put into significant question; we examine this point
in some detail in the section "Compressibility."

Figure 10 Symmetry breaking in exotic container, $g = 0$. Below: calculated
presumed global minimizer (spoon) and local minimizer (potato chip). Above:
experiment on Mir: symmetric insertion of fluid (center); spoon (left); potato
chip (right). This is a grayscale version of a color figure reproduced from
Journal of Fluid Mechanics, 224: 383–394, (1991) with permission of Cambridge
University Press.

Discontinuous Dependence I

Capillary surfaces can exhibit striking discontinuous dependence on the
defining data. As initial example, we consider the behavior of a solution of
[18]–[17] at a protruding corner point $P$ of the domain $\Omega$ of
definition. For simplicity, we assume the corner bounded locally by straight
segments, meeting in an opening angle $2\alpha < \pi$, thus forming locally a
wedge domain. In anticipation of material to follow, we assume contact angles
$\gamma_1$ and $\gamma_2$ on the respective sides,
$0 \le \gamma_1, \gamma_2 \le \pi$. One can show that a necessary condition for
a solution surface over a domain $\Omega$ as in Figure 11 to have a continuous
normal vector up to $P$ is that the data point $(\gamma_1, \gamma_2)$ lie in
the closure of the rectangle $R$ of Figure 12. (This figure includes also
additional material anticipating the section "Drops in wedges.")

Figure 11 Wedge domain. Reproduced from Finn R Capillary Surface Interfaces in
Notices of AMS 46 No.7 (1999) with permission of the American Mathematical
Society.

Figure 12 Domain $R$ of data yielding continuous normal to capillary surface in
wedge of opening $2\alpha < \pi$. The symbols D and I are clarified in the
section "Behavior at a corner point." Reproduced from Capillary Wedges
Revisited in SIAM J. Math. Anal. 27 No.1 (1996) 56–69 with permission from
SIAM.

For data points interior to $R$, this criterion also suffices for the existence
of at least one such solution surface, for any prescribed $H$; such surfaces
can in fact be produced explicitly as spherical caps (planes if $H = 0$). It
remains to discuss what can occur with data arising from the remaining four
subregions of the square.

If $(\gamma_1, \gamma_2) \in D_1^\pm$, then there is no solution to [18]–[17]
in any neighborhood of the corner point $P$. On the other hand, an explicit
solution for any $H > 0$ can be found as a lower spherical cap on the segment
$\gamma_1 + \gamma_2 = \pi - 2\alpha$ that separates $D_1^+$ from $R$ (see
Figure 13, which indicates the equatorial circle). Correspondingly, if $H < 0$
then an explicit solution can be found on the separation line between $D_1^-$
and $R$. Thus, there is a
discontinuous change in behavior in crossing from $R$ to either of the
$D_1^\pm$ regions.

Figure 13 Construction of solution as lower hemisphere;
$\gamma_1 + \gamma_2 = \pi - 2\alpha$, $H > 0$. Reproduced from Capillary
Wedges Revisited in SIAM J. Math. Anal. 27 No.1 (1996) 56–69 with permission
from SIAM.

This behavior was put to experimental test by W Masica, who considered the case
$0 < \gamma_1 = \gamma_2 = \gamma < \pi/2$ near the crossing point
$\gamma = \gamma_{cr}$ with $D_1^+$, for which
$\alpha + \gamma_{cr} = \pi/2$. He partially filled a regular hexagonal
cylinder of acrylic plastic, successively with two different liquids, making
respective contact angles greater or less than $\gamma_{cr}$ with the plastic.
For each liquid, Masica then allowed the cylinder to fall in a 132 m drop
tower. Figure 14 compares the two configurations after about 5 s of free fall.
In the case $\gamma > \gamma_{cr}$ he obtained the spherical-cap solution,
which in this case covers the entire base domain $\Omega$ and appears as an
explicit solution of [18]–[17]. When $\gamma < \gamma_{cr}$, the liquid rose to
the top of the cylinder near the edges, filling out the edges over the corner
points. The surface interface $S$ does not cover $\Omega$, but instead folds
back over itself, doubly covering a portion of $\Omega$. Thus, a physical
surface appears as it must, but it is not a solution of [18] over $\Omega$.

Figure 14 Liquid in hexagonal cylinder, during free fall in drop tower:
(a) $\alpha + \gamma > \pi/2$; (b) $\alpha + \gamma < \pi/2$.

Discontinuous Dependence II

About 1970, M Miranda raised informally the question, whether a capillary tube
$Z_0$, whose section $\Omega_0$ lies strictly interior to a section $\Omega_1$
of a tube $Z_1$, will raise liquid from an infinite reservoir in a downward
directed gravity field to a higher level over $\Omega_0$ than will $Z_1$ over
that subdomain of its section. That is true if both cylinders are circular, and
in the intervening years its correctness was established in a number of other
cases of particular interest.

Finn and Kosmodemyanskii, Jr. showed, however, by example that the assertion
fails in a large range of cases, and in fact can fail with arbitrarily large
height differences, uniformly over $\Omega_0$. Beyond that, the construction
exhibits a strikingly discontinuous change of behavior, under perturbations of
a disk as inner domain. Perhaps more remarkably, the assertion can hold with
the inner domain a disk, but with discontinuous reversal of behavior as the
disk is perturbed to neighboring disks. That was shown in a form of the example
given later by Finn, and illustrated in Figure 15. Here the outer domain
$\Omega_1$ is polygonal, with sides that extend to be tangent to a unit disk
$\Omega_0$, as indicated. The angle $\gamma$ is to be chosen so that
$0 \le \pi/2 - \gamma \le \alpha_{\min}$, where $\alpha_{\min}$ is the smallest
of the interior vertex half-angles of $\Omega_1$. In view of the assumed
infinite fluid reservoir, there is no volume constraint, and the governing
equation [16] takes the form

$$\operatorname{div} Tu = \kappa u, \qquad \kappa = \rho g/\sigma > 0 \tag{22}$$

Figure 15 Discontinuous reversal of limiting height behavior. All sides of the
polygonal domain $\Omega_1$ are tangent to the unit disk $\Omega_0$. For the
corresponding solution heights $u^0$ in $\Omega_0$, $u^\varepsilon$ in the disk
$\Omega_\varepsilon$ of radius $1 + \varepsilon$, and $u^1$ in $\Omega_1$,
there holds $u^1 - u^0 < 0$, for any downward gravity. But
$\lim_{\kappa \to 0} (u^1 - u^\varepsilon) = \infty$, for any
$\varepsilon > 0$.

Taking at first the inner domain to be $\Omega_0$, it can be shown that for the
corresponding solutions $u^0$ and $u^1$ of [22], there holds $u^0 > u^1$ over
$\Omega_0$ for
any $\kappa > 0$, and thus the Miranda question has a positive answer for that
configuration. But if we replace $\Omega_0$ by a concentric disk
$\Omega_\varepsilon \subset \Omega_1$ of radius $1 + \varepsilon$, we find

$$\left| \left\{ \inf_{\Omega_\varepsilon} u^1(x;\kappa) - \sup_{\Omega_\varepsilon} u^\varepsilon(x;\kappa) \right\} - \frac{2\varepsilon\cos\gamma}{\kappa(1+\varepsilon)} \right| < \frac{1-\sin\omega}{\cos\gamma} + (1+\varepsilon)\,\frac{1-\sin\gamma}{\cos\gamma} \tag{23}$$

where $\omega = \arccos(\cos\gamma / \sin\alpha)$, and $u^\varepsilon$ is the
solution of [22], [17] in $\Omega_\varepsilon$. Since $\kappa$ does not appear
on the right side of [23], there follows in particular that for any
$\varepsilon > 0$, there holds

$$\lim_{\kappa \to 0} \left\{ \inf_{\Omega_\varepsilon} u^1(x;\kappa) - \sup_{\Omega_\varepsilon} u^\varepsilon(x;\kappa) \right\} = \infty \tag{24}$$

In particular, a negative answer to Miranda's question appears for all gravity
sufficiently small. But as observed above, a positive answer occurs in
$\Omega_0$, for any positive gravity. Thus, the limiting behavior as
$\kappa \to 0$ changes discontinuously, as $\varepsilon \to 0$. We find that
the two limiting procedures cannot be interchanged: for any $x \in \Omega_0$,
we obtain

$$\lim_{\varepsilon \to 0} \lim_{\kappa \to 0} \left\{ u^1(x;\kappa) - u^\varepsilon(x;\kappa) \right\} = \infty; \qquad \lim_{\kappa \to 0} \lim_{\varepsilon \to 0} \left\{ u^1(x;\kappa) - u^\varepsilon(x;\kappa) \right\} = \text{const.} < 0 \tag{25}$$

Existence Questions I

For the general equation [20] there is an established literature on existence
of surfaces containing a prescribed space curve. There is very little
literature relating to the capillarity boundary condition that the solution
surface $S$ meet a prescribed support surface $W$ in a prescribed angle
$\gamma$. The existence of at least one such surface interior to a prescribed
sufficiently smooth closed space domain was proved by Almgren, and then Taylor
proved smoothness at the contact curve. These are abstract theorems that are
basic for the theory but in general do not provide specific information in
particular cases of interest.

Special interest attaches to the nonparametric cases [16] or [18] with boundary
condition [17], especially in view of the discontinuous behavior properties
described above. These cases were studied in depth by a number of authors, with
results that put the above examples into some perspective.

M Emmer proved the existence of a unique solution of [16]–[17] for any compact
$\Omega$ having Lipschitz boundary with Lipschitz constant $L$ such that
$\sqrt{1 + L^2}\,\cos\gamma < 1 - \varepsilon$ for some $\varepsilon > 0$. Finn
and Gerhardt (F and G) extended this condition, and showed in particular that
solutions exist in general in piecewise smooth $\Omega$. This result contrasts
with the zero-gravity case [18] discussed in the section "Existence questions
II," for which solutions fail to exist when $\sqrt{1 + L^2}\,\cos\gamma > 1$ at
a protruding corner (see the section "Discontinuous dependence I").

However, in the cases $\sqrt{1 + L^2}\,\cos\gamma > 1$ studied by F and G the
solution $u(x)$ is necessarily unbounded in the corner. This condition is
equivalent to $\alpha < |\gamma - \pi/2|$ at the corner. Concus and Finn showed
that if $\alpha \ge |\gamma - \pi/2|$ in a neighborhood $\Omega_\delta$ of a
corner with rectilinear sides, as indicated in Figure 11, then the solution
$u(x)$ satisfies

$$|u(x)| < \frac{2}{\kappa\delta} + \delta \tag{26}$$

independent of $\alpha$, $\gamma$ in the range considered. Here it is assumed
that [16] is normalized so that $\lambda = 0$; when $\kappa \ne 0$ this can
always be achieved by adding a constant to $u$. On the other hand, if
$\alpha < |\gamma - \pi/2|$, then

$$u(x) \approx \frac{\cos\vartheta - \sqrt{k^2 - \sin^2\vartheta}}{k \kappa r} \tag{27}$$

where $k = \sin\alpha / \cos\gamma$ and $\vartheta$ is polar angle relative to
a bisector at the vertex; hence $u$ becomes unbounded as $O(1/r)$. Thus, the
behavior changes discontinuously as the configuration for which
$\alpha = |\gamma - \pi/2|$ is crossed.

This prediction was corroborated by T Coburn in a "kitchen sink" experiment in
the Medical School at Stanford University. Coburn formed a wedge using two
sheets of acrylic plastic, resting on a glass plate, and inserted a drop of
distilled water at the base of the wedge. Initially, the wedge was opened
sufficiently that $\alpha + \gamma \ge \pi/2$, and he obtained the
configuration of Figure 16a, with the maximum height slightly lower than that
indicated by [26]. By closing down the angle slightly, the liquid rose to over
ten times that height, as shown in Figure 16b. This experiment was later
repeated by Weislogel under laboratory conditions; it incidentally establishes
the contact angle of water and acrylic plastic in the Earth's atmosphere as
$80^\circ \pm 2^\circ$.

The indicated procedure provides in general a very accurate way to measure
contact angles, when the angle is not far from $\pi/2$. For $\gamma$ near zero
or $\pi$ in the Earth's gravity field, the discontinuity is confined to a
microscopic neighborhood of the vertex, and can be difficult to observe. This
technical difficulty was addressed by Fischer and Finn, who introduced
"canonical proboscis" domains, the theory of which was further developed by
Finn and
Leise and by Finn and Marek. For such domains the change in behavior is not
strictly discontinuous, but it is nearly so, and it extends over large portions
of the cylinder section, so that it is easily observable. Concus, Finn, and
Weislogel conducted space experiments, demonstrating the feasibility of the
method as a means for measuring contact angles in general ranges.

Figure 16 Distilled water in wedges formed by acrylic plastic plates; $g > 0$.
(a) $\alpha + \gamma > \pi/2$; (b) $\alpha + \gamma < \pi/2$. Reproduced from
P Concus and R Finn, On Capillary Free Surfaces in a Gravitational Field in
Acta Math 132 (1974) 207–223 with permission of Institut Mittag-Leffler.

In [26]–[27] no growth conditions at the corner are imposed; the estimates hold
for every solution defined in $\Omega_\delta$ and assuming the prescribed data
on the side walls, with no data prescribed at the vertex. The formula [27] is
the initial term of a formal asymptotic expansion of the solution, in powers of
$r$. Miersemann obtained the complete expansion, asymptotic to every order,
when $\alpha < |\gamma - \pi/2|$. He obtained somewhat less complete
information in the bounded case [26].

Chen, Finn, and Miersemann provided a form of [27] that is applicable for any
data $(\gamma_1, \gamma_2)$ on the respective sides of the wedge, that arise
from the $D_1^\pm$ regions of Figure 12. Lancaster and Siegel and independently
Chen, Finn, and Miersemann showed that if
$\pi - 2\alpha \le \gamma_1 + \gamma_2 \le \pi + 2\alpha$, then every solution
is bounded at the vertex. This result holds also for the zero gravity eqn [18].

In the case of [18], Concus and Finn showed that in the $D_1^\pm$ regions no
solution exists, regardless of $H$. Again, this result holds without growth
conditions.

From these considerations and from remarks in the section "Discontinuous
dependence I" follows that for data in $D_2^\pm$, all solutions either of [18]
or of [16] are bounded but have discontinuous derivatives at the vertex $P$.
Extrapolating from the behavior of particular computed solutions, Concus and
Finn conjectured that all solutions of [18] or of [16] that arise from data in
$D_2^\pm$ are discontinuous at $P$. A number of attempts to prove or to
disprove this conjecture have till now been unsuccessful.

An existence theorem for [16]–[17] alternative to that of Emmer was obtained
independently by Ural'tseva, using a very different approach. This procedure
yielded smoothness estimates up to the boundary, but required a hypothesis of
boundary smoothness, so that the result does not mesh with the discontinuous
dependence behavior as does that of Emmer. Later versions of the existence
result, again under boundary smoothness requirements, were given by Gerhardt,
Spruck, and Simon and Spruck. In the procedure introduced by Emmer, the
boundary trace is shown to exist only in a very weak sense (which, however,
suffices for a uniqueness proof). The later work can be adapted to show that
the Emmer solutions are smooth on the smooth parts of $\partial\Omega$. None of
the above procedures provides existence for the zero gravity case [18]. As we
shall see in the following section, that is not an accident of the methods, but
reflects subtle properties of the equations.

Existence Questions II

We consider here the zero-gravity case [18], over a domain $\Omega$ bounded by
a piecewise smooth curve $\Sigma$, under the boundary condition [17].
Integrating [18] over $\Omega$ and using [17], we find
$2H|\Omega| = |\Sigma| \cos\gamma$. Let $\Omega^* \subset \Omega$,
$\Sigma^* = \Sigma \cap \partial\Omega^*$,
$\Gamma = \Omega \cap \partial\Omega^*$. The same procedure over $\Omega^*$,
using that $|Tu| < 1$ for any $u(x, y)$, leads to the bound

$$\Phi(\Omega^*; \gamma) > 0 \tag{28}$$

where $\Phi$ is defined by

$$\Phi(\Omega^*; \gamma) \equiv |\Gamma| - |\Sigma^*| \cos\gamma + 2H|\Omega^*| \tag{29}$$

The inequality [28] must hold for any choice of $\Omega^* \subset \Omega$. This
provides a necessary condition for existence of a solution to [18]–[17] in
$\Omega$. E Giusti showed that when $\Omega^*$ is interpreted in a generalized
sense as a Caccioppoli set, the condition [28] becomes also sufficient for
existence.

It is easy to give specific examples of convex analytic domains $\Omega$, in
which subdomains $\Omega^*$ can be found such that [28] fails. Thus, the
general existence results for [16] do not carry over to [18], regardless of
local domain smoothness. Nevertheless, in many cases of interest (e.g., a
circular disk or an ellipse that is not too eccentric), solutions of [18]–[17]
do exist for any $\gamma$ and are well behaved.

Finn investigated the condition [28] in general by showing the existence of a
system of arcs $\{\Gamma\} \subset \Omega$
that minimize $\Phi$. All such arcs are circular of radius $1/2H$, and meet
$\Sigma$ either at smooth points in an angle $\gamma$, or else at a reentrant
corner point in an angle $\ge \gamma$, measured on the side of $\Gamma$
opposite to that into which the curvature vector points (Figure 17). All
minimizing configurations are bounded by arcs of that form, although not all
such configurations minimize. In a typical situation one will encounter only a
finite number of such arcs, in which case only a finite number of cases need be
examined. If $\Phi > 0$ in each such case, then a solution of [18]–[17] exists
for the given $\Omega$ and $\gamma$. It may occur that no such arcs exist; we
then observe that since $\Phi[\emptyset; \gamma] = \Phi[\Omega; \gamma] = 0$,
$\Phi$ cannot become nonpositive for any $\Omega^* \subset \Omega$ unless a
minimizing $\Omega^*$ can be found in $\Omega$, contradicting the assumed
nonexistence of minimizers. Thus, the criterion is then vacuously satisfied,
and we conclude that a solution of [18]–[17] exists.

Figure 17 Extremal configuration for the functional $\Phi$.

One has, of course, to ask what happens physically in cases for which
$\Phi[\Omega^*; \gamma] \le 0$ for some $\Omega^*$ as above. The possible modes
of behavior were studied in particular cases by Tam and later by deLazzer,
Langbein, Dreyer, and Rath; Finn and Neel characterized the general case.
Formally, the fluid rises to infinity throughout domains $\Omega^*$ of the form
indicated, but with $H$ replaced by a value $H^* < H$; on the opposite side of
the circular arcs $\Gamma^*$, the fluid is asymptotic to the vertical cylinders
over $\Gamma^*$. In a physical situation, the fluid will rise to the top of the
container in a nearly cylindrical region adjacent to a portion of the container
walls, approximating the indicated behavior and partially wetting the top of
the container. One sees that behavior in Figure 14b, in which the fluid fills
out regions adjacent to the corners. An analogous configuration would still be
observed if the corners were smoothed locally. If insufficient fluid is
available, a portion of the base $\Omega$ could become unwetted.

Behavior at a Corner Point

Lancaster and Siegel (L and S) studied the behavior of the limits (which they
designate by $Ru$) of bounded solutions of [16] or of [18] along radial
segments tending to a corner point $P$ of a domain $\Omega$. These limits can
exhibit remarkable idiosyncratic behavior. For simplicity of exposition, we
restrict ourselves here to rectilinear boundary segments at $P$, and assume
constant boundary angles $\gamma_1, \gamma_2 \ne 0, \pi$ on the two sides.
L and S prove first that the limits $Ru$ exist and vary continuously with
direction of approach; then they show the existence of "fan" regions of
directions adjacent to those of the sides, in which the limits are constant
independent of direction, see Figure 18. They obtain that if the opening angle
$2\alpha$ at $P$ satisfies $2\alpha < \pi$, then for data in the rectangle $R$
of Figure 12 the fans overlap (see Figure 18a), so that the solution is
necessarily continuous at $P$. For data in $D_2^+$, the solution decreases from
the $\gamma_1$ side $\Sigma_1$ to the $\gamma_2$ side $\Sigma_2$ (D behavior),
subject to the Concus–Finn conjecture (see the section "Existence questions
I"), with the reverse behavior (I) in $D_2^-$. Concus and Finn showed that if
$2\alpha < \pi$ then in $D_1^\pm$ there is no bounded solution of [16]–[17] or
[18]–[17] as a graph. For [16]–[17], unbounded solutions do however exist for
such data (see the section "Existence questions I").

Figure 18 (a) Fan domains $APA'$ and $BPB'$ of constant limiting values;
$2\alpha < \pi$ so that the fans overlap when data are in $R$.
(b) $2\alpha > \pi$; case 1. Fans $APA'$ and $BPB'$ of constant radial limits
appear. Limiting value changes strictly monotonically as approach direction
changes from $A'P$ to $B'P$. (c) $2\alpha > \pi$; case 2. In addition to the
two fans adjacent to the sides of the wedge, a half plane of constant radial
limits appears.
If $2\alpha > \pi$, then the fans do not overlap, and in fact continuity at $P$
cannot in general be expected. Outside the indicated fan regions adjacent to
the wedge sides, the limit values either change strictly monotonically with
angle of approach, as in Figure 18b, or else they do so except for approaches
within a third, central fan, which covers a full half-space, and interior to
which the limiting values again remain constant, see Figure 18c. L and S give
an example under which that behavior actually occurs. Remarkably, in the
example the prescribed data are the same on both boundary segments. The
solution is nevertheless discontinuous at $P$, with an interval in which the
radial limit increases, another interval in which it decreases, two fans of
constant limit adjacent to the sides, and a fan of breadth $\pi$ in-between.

General conditions for continuity at a reentrant corner ($2\alpha > \pi$) have
not yet been established. L and S give a sufficient condition, depending on a
hypothesis of symmetry. Since no such hypothesis is needed when
$2\alpha < \pi$, one might at first expect it to be superfluous. However, Shi
and Finn showed that by introducing an asymmetric domain perturbation that in
an asymptotic sense can be arbitrarily small, the solution can be made
discontinuous at $P$. That can be done without affecting any other hypotheses
of the L and S theorem.

In as yet unpublished work, D Shi characterized all possible behaviors at a
reentrant corner, subject to the validity of the Concus–Finn conjecture at a
protruding corner. If $\kappa \ge 0$ then all solutions of [16] or of [18] in a
neighborhood of $P$ in $\Omega$ are bounded at $P$. The further behavior
depends on the particular data, and is indicated in Figure 19. Note the analogy
with Figure 12, although the interpretations in the figures differ in detail.
Here the symbol I denotes strictly increasing from the side $\Sigma_1$ to
$\Sigma_2$, except on the fan regions of constant limits; ID denotes constancy
on a fan adjacent to $\Sigma_1$, then strictly increasing, then constancy on a
fan of opening $\pi$, then strictly decreasing, then constancy on a fan
adjacent to $\Sigma_2$. D and DI are defined analogously. All cases can be
realized in particular configurations.

Figure 19 $\pi < 2\alpha < 2\pi$. Possible modes of behavior. Reproduced with
permission from the Pacific Journal of Mathematics.

Drops in Wedges

Closely related to the material just discussed is the question of the possible
configurations of a connected drop of liquid placed into a wedge formed by
intersecting plates of possibly differing materials, in the absence of gravity.
Thus, one has distinct contact angles $\gamma_1, \gamma_2$ on the two plates.
Finn and McCuan showed that if $(\gamma_1, \gamma_2) \in R$ then the only
possibility is that the drop surface $S$ is part of a sphere. For data in
$D_1^\pm$, no such drop can exist, barring exotically singular behavior at the
vertex points where the edge of the wedge meets $S$.

For data in $D_2^\pm$ the situation is less clear. Concus, Finn, and McCuan
(CFM) showed that local behavior exhibiting such data is indeed possible;
however, they conjectured that such behavior cannot occur for simple drops. In
conjunction with the above results, they were led to the conjecture that the
free surface $S$ of any liquid drop in a planar wedge, that meets the wedge in
exactly two vertices and the wedge faces in constant contact angles
$\gamma_1, \gamma_2$, is necessarily spherical. Here it is supposed only that
$0 \le \gamma_1, \gamma_2 \le \pi$.

The behavior of a drop of prescribed volume, as the data move from the midpoint
of $R$ to the $D^\pm$ regions along parallels to the sides of $R$, is displayed
in Figure 20. As one moves into the $D_2^\pm$ regions, the drop detaches from
one side of the wedge and becomes a spherical cap resting on a single planar
surface, in accord with the above conjecture. As $D_1^+$ is approached, the
liquid becomes a drop of very large radius that fills out a long thin region in
the wedge, and disappears to infinity as the boundary of $R$ is crossed.
However, as $D_1^-$ is entered, the configuration transforms smoothly into a
spherical liquid bridge, connecting the two faces of the wedge without
contacting the wedge line.

Stability Questions

A number of authors, for example, Langbein, Vogel, Finn and Vogel, Steen, and
Zhou, have studied the
stability of liquid drops trapped between parallel plates, forming an annular
liquid bridge joining the plates under the capillarity boundary condition of
prescribed contact angles $\gamma_1, \gamma_2$ on the respective plates. These
studies consider the effects of disturbances within the fluid, assuming the
plates are rigid and perfectly parallel. CFM show that from the point of view
of physical prediction, the results of these studies may be open to some
question. Specifically, they show that unless the drop is initially of
spherical form, then infinitesimal tilting of one of the plates always results
in a discontinuous transition of the drop form. Depending on the particular
data, the transition can be to a spherical drop; however, it can also occur
that the tilting causes the entire fluid to disappear to infinity in the wedge.

Figure 20 (A) Drop configurations in wedge with opening angle
$2\alpha = 50^\circ$, for three data positions on the line
$\gamma_1 = \gamma_2 = \gamma$: (a) $\gamma = 70^\circ$ (in $R$, near $D_1^+$);
(b) $\gamma = 90^\circ$ (in $R$, near $D_1^-$); (c) $\gamma = 110^\circ$ (in
$D_1^-$). The first two cases yield edge blobs, the third a spherical tube that
does not contact the edge line. (B) Drop configurations in a wedge of opening
angle $2\alpha = 50^\circ$, for three data choices in $R$, on the line
$\gamma_1 = \pi - \gamma_2 = \gamma$: (a) $\gamma = 70^\circ$ (near $D_2^+$);
(b) $\gamma = 90^\circ$ (center of $R$); (c) $\gamma = 35^\circ$ (near
$D_2^-$). As $D_2^\pm$ is entered, original boundary conditions can no longer
be satisfied by spherical drop, but configuration changes smoothly into drop on
single plane, with prescribed data for that plane. Reproduced with permission
from Concus P, Finn R and McCuan J (2001) Liquid bridges, edge blobs, and
Scherk-type capillary surfaces. Indiana University Mathematics Journal 50:
411–441.

CFM proved that if a connected liquid mass with spherical outer surface $S$
cuts off areas $|W_1|, |W_2|$ from plates $\Pi_1, \Pi_2$ which it meets in
angles $\gamma_1, \gamma_2$, as in Figure 20, then

$$|S| - \sum_{j=1}^{2} |W_j| \cos\gamma_j = \frac{3|V|}{R} \tag{30}$$

where $|S|$ denotes area of the spherical free surface interface, $|V|$ the
enclosed volume, and $R$ the radius. An immediate consequence is that the
mechanical energy $E$ of the configuration is

$$E = \frac{3\sigma|V|}{R} \tag{31}$$

where $\sigma$ is surface tension. Using this result, they show that if a
spherical liquid mass meets two wedge faces in angles $\gamma_1, \gamma_2$ in
the absence of gravity, then the configuration has smaller mechanical energy
than does any connected liquid mass of the same volume that meets only one of
the faces in the contact angle for that face. In turn, the drop on a single
face has smaller energy than does a spherical ball of the same volume that
meets no face. Note that in all zero-gravity cases for which stability relative
to plate tilting can be expected, the liquid mass must be spherical.

Compressibility

Until very recently, all literature on capillarity was based on a hypothesis
that the body of the fluid is incompressible. Indeed, from the point of view of
macroscopic mechanical measurements, most liquids are nearly incompressible.
But all liquids are also to some extent compressible, and this property was
even conceptually essential in our characterization in the section "Gauss'
contribution: the energy method" of the surface energy, even for the nominally
incompressible case. It is as yet unclear to what extent the compressibility
properties of the bulk liquid will influence the physical predictions of the
theory. In this connection, see the remarks at the end of the section
"Uniqueness and nonuniqueness."

The Equations I

Finn derived two possible equations extending [16] and [17], arising from
different modelings. Both characterize equilibrium points as stationary points
for the mechanical energy, and both are based on a hypothesized
pressure–density relation $\rho = \rho_0 + \chi(p - p_0)$. The first equation
takes account of the change in density with height, arising from
the gravity field. For a container consisting of a semi-infinite vertical
cylinder, closed at the bottom, one obtains

$$\operatorname{div} Tu = \frac{\rho_0 g}{\sigma}\, u - \chi g (1 - \cos\omega) + \lambda \tag{32}$$

where $\omega$ is the angle between the upward directed surface normal and the
vertical axis, and $\lambda$ is to be determined by a volume constraint.
Athanassenas and Finn proved that for a general smooth domain $\Omega$,
prescribed $\gamma$, and prescribed fluid mass $M$ subject to the restriction

$$M < \rho_0 |\Omega| / (\chi g) \tag{33}$$

there exists exactly one solution of [32] achieving the boundary data $\gamma$.
The condition [33] is necessary for existence with the prescribed mass.

The methods used for this theorem do not permit regularity conditions to be
relaxed to allow domains with corner points. An approximation procedure yields
an existence theorem for such cases; however, the uniqueness proof then fails.
It can be replaced by a weaker result, estimating the difference between two
eventual solutions: Let $u$, $v$ be solutions of [32] in a piecewise smooth
domain $\Omega$, and suppose $\nu \cdot Tu \ge \nu \cdot Tv$ on
$\Sigma = \partial\Omega$ except at the corner points, where no data are
prescribed. Then

$$u - v \ge 0 \tag{34}$$

throughout $\Omega$.

Note that in this result, no growth condition is imposed at the corner points.
It can happen that both $u$ and $v$ are unbounded at a corner point;
nevertheless, [34] holds uniformly over $\Omega$.

The solutions of [32] emulate many of the characteristics of solutions of [16].
Notably, there is again a dichotomy of behavior, depending on opening angle
$2\alpha$ at a corner point, with all solutions either bounded, or unbounded
with growth like $1/r$.

The Equations II

If in addition to taking account of the change of density with height, one
accounts for the energy change due to expansion or contraction of volume
elements with changing density, one is led to the equation

$$\operatorname{div} Tu = \frac{\rho_0 - \chi p_0}{\chi\sigma}\left(e^{\chi g u} - 1\right) - \chi g (1 - \cos\omega) + \lambda \tag{35}$$

Here the changes from the incompressible case are much more significant than
for [32]. In order to ensure stable behavior of solutions, it seems appropriate
to impose the condition $\rho_0 > \chi p_0$. The general existence theorem
above can no longer be expected; it is possible to give explicit examples of
analytic domains, and constant data $\gamma$, for which no solution of the
problem exists. Thus, even in a large downward gravity field, the solutions can
emulate the behavior of solutions of [18]. That can happen, however, only for
data $\gamma$ exceeding $\pi/2$. The condition [33] is again necessary for
existence.

For eqn [35], $\lambda$ cannot be eliminated by addition of a constant to the
solution, and its determination creates a new level of difficulty toward
solution of the physical existence question. Athanassenas and Finn proved
unique existence of solutions of [35], [17] for a capillary tube of general
smooth section $\Omega$ dipped into an infinite liquid bath (which corresponds
to $\lambda = 0$), when $0 \le \gamma \le \pi/2$. If $\gamma > \pi/2$ then
solutions do not always exist; it can happen that the surface moves down to the
bottom of the tube, regardless of the depth of immersion. Under a hypothesis of
radial symmetry, Finn and Luli were able to prove the existence of solutions
with prescribed mass in a semi-infinite cylinder closed at the bottom, in the
range $0 \le \gamma < \pi$, and uniqueness if $0 \le \gamma \le \pi/2$. Note
that in this case, values $\gamma > \pi/2$ are not excluded. For large enough
mass, the surface will always cover the base of the tube.

Closing Remarks

This brief survey is intended only as a general indication of the current state
of the theory; much material of interest could not be included. Nor have we
addressed hysteresis effects on contact angle. Detailed references to the
material discussed and also to further information can be found in the articles
listed below. More recent publications can be located by following links in
MathSciNet or Zentralblatt.

Acknowledgment

I owe a special debt of thanks to my colleague Paul Concus, who read the
material in detail and provided many effectual suggestions, leading to a
much-improved exposition.

See also: Compressible Flows: Mathematical Theory; Interfaces and
Multicomponent Fluids; Newtonian Fluids and Thermohydraulics.

Further Reading

References for text material and for further reading are cited in the
expository articles:

Finn R (2002a) Milan Journal of Mathematics 70: 1–23.
Finn R (2002b) Mathematical Intelligencer 24: 21–33.

Cartan Model see Equivariant Cohomology and the Cartan Model

Cauchy Problem for Burgers-Type Equations


G M Henkin, Université P.-M. Curie, Paris VI, Paris, France
© 2006 Elsevier Ltd. All rights reserved.

Burgers Type Equations

We consider here two types of equations: the scalar partial differential equations (PDEs) of the form

$$\frac{\partial f}{\partial t} + \varphi(f)\,\frac{\partial f}{\partial x} = \varepsilon\,\frac{\partial^2 f}{\partial x^2}, \quad \varepsilon > 0 \qquad [1]$$

$f = f(x,t)$, $x \in \mathbb{R}$, $t \in \mathbb{R}_+$, and the scalar difference-differential equations of the form

$$\frac{\partial F}{\partial t} + \varphi(F)\,\frac{F(x,t) - F(x-\varepsilon,t)}{\varepsilon} = 0, \quad \varepsilon > 0 \qquad [2]$$

$F = F(x,t)$, $x \in \mathbb{R}$, $t \in \mathbb{R}_+$.

Equation [1] for the case of linear $f \mapsto \varphi(f)$ was called the Burgers equation by Hopf (1950), who justified this by the remark: "equation

$$\frac{\partial f}{\partial t} + f\,\frac{\partial f}{\partial x} = \varepsilon\,\frac{\partial^2 f}{\partial x^2}$$

was first introduced by J. M. Burgers (1940) as a simplest model to the differential equations of fluid flow." In fact, eqn [1] for linear $\varphi(f)$ was introduced earlier, in 1915, by Bateman. Equation [1] for general $\varphi(f)$ appeared later in very different models, for example, in the model for displacement of oil by water, in a model of road traffic, etc.

For $\varphi(f) = a + bf$, Hopf and Cole studied [1] using the substitution

$$f = -\frac{1}{b}\left(a + \frac{2\varepsilon}{g}\,\frac{\partial g}{\partial x}\right)$$

reducing [1] to the heat equation

$$\frac{\partial g}{\partial t} = \varepsilon\,\frac{\partial^2 g}{\partial x^2}$$

This transformation (often called the Hopf–Cole transform) appeared for the first time in 1906 in the book of Forsyth, "Theory of differential equations."

Equation [2] first appeared for $\varphi(F) = a + bF$, $\varepsilon = 1$, $x = n \in \mathbb{Z}$, in Levi, Ragnisco, Bruschi (1983) as a semidiscrete equation reducible to the linear equation

$$\frac{dG_n(t)}{dt} = a\,\bigl(G_{n-1}(t) - G_n(t)\bigr)$$

by the substitution

$$F(n,t) = -\frac{a}{b}\,\frac{G_n(t) - G_{n-1}(t)}{G_n(t)}$$

Equation [2] for general $\varphi(F)$ was introduced by Henkin, Polterovich (1991) for the description of a Schumpeterian evolution of industry. For any $\varepsilon > 0$, one can consider [2] as a family of difference-differential equations, depending on the parameter $\theta = \{x/\varepsilon\} \in [0,1)$, where $\{x/\varepsilon\}$ denotes the fractional part of $x/\varepsilon$. For physical applications of [1] (see Gelfand (1959), Landau and Lifschitz (1968), Lax (1973)), the inviscid case ($\varepsilon = 0$) is the most interesting. But, for some special physical models and for some social and biological applications (see Henkin, Polterovich (1991), Serre (1999)), the interesting case concerns eqn [2] with $\varepsilon = 1$ and $x \in \mathbb{Z}$.

The results considered in this article concern mainly the Cauchy problem for eqns [1] and [2] with initial data $f(x,0)$, $F(x,0)$ satisfying the conditions

$$f(x,0) \to \alpha^{\pm}, \quad x \to \pm\infty; \qquad \int_{-\infty}^{0} |f(x,0) - \alpha^-|\,dx + \int_{0}^{\infty} |\alpha^+ - f(x,0)|\,dx < \infty \qquad [3]$$

and correspondingly

$$F(k\varepsilon + \theta\varepsilon, 0) \to \alpha^{\pm}, \quad k \to \pm\infty; \qquad \sum_{k=-\infty}^{0} |F(k\varepsilon + \theta\varepsilon, 0) - \alpha^-| + \sum_{k=0}^{\infty} |\alpha^+ - F(k\varepsilon + \theta\varepsilon, 0)| < \infty \qquad [4]$$

where $\alpha^- \le \alpha^+$, $\theta \in [0,1)$ and the mapping $\theta \mapsto \{F(k\varepsilon + \theta\varepsilon, 0) - \alpha^{\operatorname{sgn} k},\ k \in \mathbb{Z}\} \in l^1$ is smooth.

The standard classical questions concerning From references one can deduce the following gene-
Cauchy problems [1], [3] and [2], [4], namely ral properties of Cauchy problems [1], [3] and [2], [4].
those relating to existence, unicity, regularity, and
Theorem 0 Under Assumption 1, we have:
conservation laws are well established (see Oleinik
(1959), and Serre (1999)). This section formulates (i) There exists a unique (weak) solution f(x, t), x 2
only those which are essential for the study R, t 2 R of the problem [1], [3]; this solution is
of asymptotic behavior of solutions f(x, t) and necessarily smooth for t > 0; besides, it satisfies
F(x, t), when t ! 1 or " ! 0, and of the relation the following conservation laws for t > 0:
between vanishing viscosity and difference scheme
f x; t !  ; x ! 1
approximations for inviscid Burgers type
equations. f x; t !  ; x ! 1
One can see that asymptotic behavior of solutions Z 1  Z 0
d
of [2], [4] when " ! 0 is not the same as the   f x; t dx  f x; t   dx
dt 0 1
asymptotic behavior of [1], [3] when " ! 0, in Z 
spite of fact that in the limiting case " = 0 both [1] ydy
and [2] look identical. It can be explained by the 
fact that eqn [2] can be interpreted as a semidiscrete Moreover, if the initial value f(x, 0) is nonde-
approximation of the nonconservative (nonphysical) creasing as a function of x, then solution f(x, t)
equation is nondecreasing as a function of x for all t  0.
@F @F " @2F (ii) There exists a unique solution F(x, t) x 2 R, t 2
F F 2 R of the problem [2], [4]; this solution is
@t @x 2 @x
smooth for t > 0; besides, it satisfies the follow-
However, the problem [2], [4] can be naturally ing conservation laws for t > 0 and  2 [0, 1):
transformed into conservative (physical) initial pro-
blem. Indeed, the substitution Fk" "; t !  ; k ! 1

Z F Fk" "; t !  ; k ! 1
dy " #
f 1 Z  Z Fk"";t
0 y
d X dy X0
dy

dt k1 Fk"";t y k1  y
(under condition of integrability of 1=(y)) trans-
forms [2] into the equation   

@f x; t f x; t  f x  "; t Moreover, if for some  2 [0, 1) the F(k" ", 0) is


0 5 nondecreasing as a function of k 2 Z then solution
@t "
F(k" ", t) is also nondecreasing as a function of
where 0 (f ) = (F). Equation [5] is the so-called k 2 Z for all t  0 and the same .
monotone one-sided semidiscrete approximation of
conservative viscous equation,
 
@f @f " @ @f Gelfands Problem and IljinOleinik
F F 6
@t @x 2 @x @x Theorem
where The main results considered in this article are related
Z 
to the following problem, formulated explicitly by

dy Gelfand (1959): to find the asymptotic (t ! 1) of the
f x; 0 ! ; x ! 1
0 y solution f (x, t) of the eqn [1] with the initial condition
 
The results of finite-difference approximations  ; if x > x
for nonlinear conservation laws (see A. Harten, f x; 0 0 7
f x; if x 2 x ; x 
J. Hyman, P. Lax (1976)) explain both the similarity
of behavior of [6] and [5] as well as some difference where    .
in the behavior of [1] and [2]. Gelfand found a solution to this problem for the
For further exposition the following assumption is inviscid case " = 0 with initial conditions
useful: f (x, 0) =  if x < 0, and f (x, 0) =  if x  0 (see
below), and remarked that it would be interesting to
Assumption 1 Let in [1], [2] be a positive and prove that the main term of the asymptotic (t ! 1)
continuously differentiable function on the interval of f (x, t) satisfying [1], [7] coincides with the
[ ,  ]. Let 0 have only isolated zeros. solution of [1], [7] for " = 0.

Gelfands problem admits natural extension for N-wave has been obtained by Dafermos (1977)
eqn [2] with the initial conditions and Liu (1978).
For the case of a general (f ), in particular, for
Fx; 0  ; if x > x the case of nonincreasing (f ), we need the notion
8
Fx; 0 F0 x; if x 2 x ; x  of shock profile. Following Serre (1999), three
definitions can be introduced.
Let us Rintroduce, for u 2 [ ,  ], the function
u Definition The initial problem [1], [3] (correspond-
(u) =   (y)dy. Let the function (u), u 2
[ ,  ], be upper bound of the convex set ingly, [2], [4]) admits ( ,  )-shock profile ( <  )
if there exists a traveling-wave solution of this equation,
fu; v: v  u; u 2  ;  g that is, of the form f = f(x  ct) (correspondingly,
F = F(x  Ct)), such that f(x) !  when x ! 1
By Assumption 1, the set s = {u 2 [ ,  ]:
(correspondingly, F(x) !  when x ! 1).
(u) < (u)} is the finite union of intervals,
s= ( , 0 ) [ (1 , 1 ) [   (L ,  ), where  = 0  From the results of Gelfand (1959) and Oleinik
0  1 < 1    L  L =  . (1959), it follows that initial problem [1], [3] admits
Let us define the function f(x, t) by ( ,  )-shock profile iff
8  Z 
< ; if x <   t 1
^f x; t ^0 1 x=t; if   t  x    t c ydy
:    
 ; if x >   t Z u
1
0 < ydy; 8u 2  ;  9
where in the case (u)  l , u 2 (l , l ), l = 0, u    
0
1, . . . , L; also, by definition, ( )( 1) (l ) = [l , l ]. From the results of Henkin and Polterovich
Theorem 1 (Gelfand) The solution f (x, t) of the (1991) and Belenky (1990), it follows that initial
problem [1], [7] for the case " = 0 and initial problem [2], [4] admits ( ,  )-shock profile iff
conditions f (x, 0) =  , if x > 0, has the explicit Z 
form: f (x, t) = f(x, t). 1 1 dy

C     y
The analogous statement is valid also for the Z u
1 dy
problem [2], [8] if, in the construction above, one > 
; 8u 2  ;  10
u  u  y
takes
Z u In the case " = 0, the equality in [9] and [10] is
dy
u called the RankineHugoniot condition, the inequal-
0 y ity in [9] and [10] is called the entropy condition (or
instead of (u), u 2 [ ,  ]. the GelfandOleinik condition).
The Gelfand problem for [1], [3] and [1], [7] with Definition For initial problem [1], [3] (correspond-
monotonic (f ) was solved by Iljin and Oleinik ingly, [2], [4]) admitting ( ,  )-shock profile and
(1960). In the case  =  , the solution of this for " = 0, we will call by shock waves the weak
problem follows from an earlier work of Lax (1957). solutions of [1], [3] (correspondingly, [2], [5], [4]) of
For the case of linear (f ), the solution of this problem the form
follows from an earlier work of Hopf (1950).

For semidiscrete initial problems [2], [4] and [2], ~f  x  ct  ; if x  ct
[8], the analog of the asymptotic results of Hopf and

IljinOleinik have been obtained and applied by ~ x  Ct  ;
F if x  Ct
Henkin and Polterovich (1991).
The case of increasing (f ) has been studied in where c, C satisfy RankineHugoniot and entropy
detail. In this case, for both initial problems [1], [3] conditions [9], [10].
and [2], [4], there is uniform convergence of solutions Definition The ( ,  )-shock profile for [1] (cor-
f (x, t) and F(x, t) to the so-called rarefaction profile respondingly, for [2]) is called strict if in addition to
  [9], [10] we have the Lax (1954) condition:
 ; x >  t
gx=t 1
x=t; x 2   t;   t  < c <  11
t ! 1 (see Iljin and Oleinik (1960) and Henkin and correspondingly
and Polterovich (1991)). More precise result in
this case about convergence to the so-called  < C <  12

The ( ,  )-shock profile for [1] or [2] is called The values of d0 and D0 are determined by
semicharacteristic if one of the inequalities in [11] or Z d0 Z 1
[12] is strict and the other is an equality. This profile f x; 0   dx f x; 0   dx 0
is called characteristic if both inequalities in [11] or 1 d0
Z D0 Z
[12] are equalities. 1
Fx; 0   dx Fx; 0   dx 0
One can check (Iljin and Oleinik 1960, Henkin and 1 D0
Polterovich 1991) that if in addition to Assumption 1
the function on [ ,  ] is nonconstant and
nonincreasing then eqn [1] (correspondingly, [2])
admits a strict ( ,  )-shock profile. Remarks
The main result of IljinOleinik (1960) for eqn [1] (i) The statements of Theorem 2 give a positive
and analogous statement of Henkin and Polterovich answer to Gelfands question for the case of
(1991) for eqn [2] can be presented as follows. initial problem [1], [3] and [2], [4], admitting
Theorem 2 strict shock profiles.
(ii) For linear (f ) = a bf , a > 0, a b > 0,
(i) Let the initial problem [1], [3] admit a strict b < 0, the traveling waves f, F for [1], [3] and
( ,  )-shock profile f. Let f (x, t), x 2 R, t 2 [2], [4] can be found explicitly:
R , be a solution of [1], [3]. Then there exists
d0 2 R   
~f 
sup jf x; t  ~f x  ct  d0 j ! 0; t ! 1 13 1 expfpx  ctg
x2R b    b
c a   ; p
The value of d0 is determined uniquely by relation 2 2"

Z 1 ~    
F
ff x; 0  ~f x  d0 g dx 0 1 expfPx  Ctg
1 
a b  a b
(ii) Let the initial problem [2], [4] admit a strict C b ln 
; P ln
a b " a b
( ,  )-shock profile F. Let F(x, t), x 2 R, t 2
R be a solution of [2], [4]. Then there exists where
continuous function D0 (),  2 [0, 1), such that    
 a b
~  Ct  D0 fx="gj ! 0;
sup jFx; t  Fx b  b 1 
a b
x2R 14
t!1 (iii) For initial problems [1], [7] and [2], [8],  >
The function D0 (),  2 [0, 1], is determined  , the asymptotic convergence statements
uniquely from relation [13][15] admit the precise asymptotic esti-
mates (see Iljin and Oleinik (1960) for [1], [7]:
X
1
~  D0 g 0
fFn; 0  Fn
k1 sup jf x; t  ~f x  ct  d0 j Oet
x2R 16
where  > 0; " > 0
Z A
dy ~<A
F ; F < A; F ~  Ct  D0 fx="gj Oet
sup jFx; t  Fx
F y
x2R
(iii) If in conditions (i) and (ii), we take " = 0 then  > 0; " > 0 17
there exist d0 , D0 such that 8 > 0, we have

sup j  f x; tj f x; t  for  x > ct d0


xctd0  t  t0 ; " 0
18
sup j  f x; tj ! 0; t!1 Fx; t  for  x > Ct D0
xctd0 

15 t  t0 ; " 0
sup j  Fx; tj
xCtD0 

sup j  Fx; tj ! 0; t!1 Theorem 2(i) is proved basing on the following
xCtD0  idea. Let f satisfy the initial problem [1], [3] and let

f(x  ct d0 ) be ( ,  )-shock profile for [1], and, correspondingly,


satisfying condition [13]. Put X
1
Z x ~
fFk" "  D0  Fk" "; 0g 1
x; t ff y; t  ~f y  ct  d0 gdy 1
1
So, the crucial argument, related to conservation
The function (x, t) satisfies the nonlinear parabolic law, does not hold.
equation
One can extend the important Theorems 2(i), 2(ii)
@ @ @  2 for the case of nonstrict shock profiles in two different
~f 1  f " 2 ways: by changing conditions of these theorems or by
@t @x @x
changing conclusions of these theorems.
where (x, t) is some smooth function of (x, t) with The first method (started by Mei, Matsumura, and
values in [0, 1]. Nishihara in 1994) was completed by the following
Besides, by conservation law of Theorem 0(i), we L1 -asymptotic stability result (Serre 2004).
have (x, t) ! 0, x ! 1, 8t  0.
Estimates basing on maximum principle and Theorem 3 (FreistuhlerSerre). Let eqns [1], [2]
appropriate comparison statements give that admit ( ,  )-shock profiles and f, F the corre-
(x, t) ) 0, x 2 R, t ! 1. It implies that sponding train-wave solutions of [1], [2]. Let
f (x, t), F(n, t), x 2 R, n 2 Z, t 2 R be solutions of
f x; t  ~f x  ct  d0 ) 0; x 2 R; t ! 1 eqns [1], [2] with such initial conditions that
Z 1
Theorem 2(ii) is proved in a similar way. Let F(n, jf x; 0  ~f xjdx < 1
t) satisfy the initial problem [2], [4] with x = n 2 1
Z, " = 1,  = {x} = 0, and let F(n  Ct  D0 ) be X
1
~
jFn; 0  Fnj <1
( ,  )-shock profile for [2], satisfying condition 1
[14]. Put
Then
X
n Z 1
n; t ~  Ct  D0 g
fFn; t  Fn
1
jf x; t  ~f x  ct  d0 jdx ! 0
1

Then function (n, t) satisfies the semidiscrete and, correspondingly,


parabolic equation
X
1
~  Ct  D0 j ! 0;
jFn; t  Fn t!1
dn; t
1 F 1
dt
~ where constants d0 and D0 are calculated from the
1  Fn  1; t  n; t
same relations as in Theorem 2.
where (n, t) is some function with values in [0, 1]. Remark For the inviscid case " = 0, the state-
Besides, by conservation law of Theorem 0(ii), we ment of Theorem 3 is still valid for equations
have admitting strict shock profiles, but generally is not
n; t ! 0; n ! 1; 8 t  0 valid for equations admitting only nonstrict shock
profiles (see Serre (2004)).
Estimates, basing on generalized maximum prin-
The second method permits, keeping initial con-
ciple and comparison statements, give that
ditions [3], [4], to localize the positions of viscous
(n, t) ) 0, n 2 Z, t ! 1. It implies that
shock waves for generalized Burgers equations
~  Ct D0 ) 0;
Fn; t  Fn n 2 Z; t ! 1 (see the next section).

Remark For the cases of nonstrict shock profiles Asymptotic Behavior of Solutions of
(characteristic or semicharacteristic) the statements Generalized Burgers Equations
of Theorem 2 are not valid. The reason is that,
under initial conditions [3], [4] for any d0 and D0 , The main current interest and the main difficulty in
we have the study of Gelfands problem for generalized
Z 1 Burgers equations consist in the following question
ff x; o  ~f x  d0 gdx 1 formulated explicitly for initial problem [1], [3] by
1 Liu et al. (1998): In the Cauchy problem there is

the question of determining the location of viscous show, on the contrary, that characteristic shock
shock waves. A similar question and related profiles and, as a consequence, the behavior of
conjecture were formulated by Henkin and Polterovich initial problems [1], [3] and [2], [4] as in Theorem
(1999) for the initial problem [2], [4]. 4 are rather a rule than an exception.
For solving this problem, it is important to solve it (ii) The statement of Theorem 4(i) (and also of
first for the Burgers type equations admitting Theorem 5(i)) below) disprove the Gelfand hope
nonstrict shock profiles. that the main term of asymptotic (t ! 1) of
f (x, t), satisfying [1], [7], coincides with the
Theorem 4 (HenkinShananinTumanov).
solution of [1], [7] for = 0 with the same
(i) Let the initial problem [1], [3] admit the nonstrict initial condition. Indeed, in conditions of Theorem
( ,  )-shock profile [9] and f(x  ct) be a 4, we have ( ) = c or ( ) = c, but 0 ( ) 6
corresponding traveling-wave solution. Let 0 ( ); then for any > 0 the traveling wave
f(x  ct  0 ln t  d0 ) for [1], [3], concentrated
0  6 0; if  c
near the point x (t) = ct 0 ln t d0 , moves
0  6 0; if  c away (t ! 1) from the shockwave for [1], [7] for
= 0, concentrated near the point x0 (t) = ct
Let f (x, t) be a solution of [1], [3]. Then there
o( ln t), where o( ln t)= ln t ! 0, t ! 1.
exist constants 0 and d0 such that
(iii) Theorem 4 (and also Theorem 5 below) also
sup jf x; t  ~f x  ct  0 ln t  d0 j ! 0; t!1 illustrate another interesting phenomenon: for
x2R
the case 0 ( ) 6 0 ( ), one has asymptotic
where convergence of the solution of [1], [3] (corre-
spondingly of [2], [4]) to the traveling
    0
8 wave f(x  ct  0 ln t  d0 ) (correspondingly
0
< 1=  ;
> if  > c  F(x  Ct  0 ln t  D0 )), which does not
0 
1=  ; if  c >  satisfy eqn [1] or correspondingly eqn [2]. Such
>
:
1=0   1=0  ; if  c  a phenomenon was first discovered by Liu and
Yu (1997) in the special boundary-value pro-
(ii) Let the initial problem [2], [4] with = 1 admit the
blem for the classical Burgers equations, if
nonstrict ( ,  )-shock profile [10] and F(n  Ct)
u(x, t) satisfies the following conditions:
be a corresponding traveling-wave solution. Let
0  6 0; if  C if ut u  ux uxx ; u0; t 1; u1; t 1;
x
0  6 0; if  C ux; 0 th ; then
2
Let F(n, t) be a solution of [2], [4]. Let 1
jux; t th x  ln1 tj ! 0; t ! 1; x  0
def 2
Fn; 0 Fn; 0  Fn  1; 0  0
The proof of Theorem 4 is based on the following
Then there exist constants 0 and D0 such that idea. Let f (x, t) satisfy [1], [3] and F(n, t) satisfy [2],
~  Ct  0 ln t  D0 j ! 0;
sup jFn; t  Fn [4]. Let f(x  ct) be the traveling wave for [1], [3]
n2Z and F(n  Ct) be the traveling wave for [2], [4].
t!1 Suppose that ( ) > c = C = ( ). Let dA (t) and
DA (t), A > 0 be functions such that
where
Z ctApt
    0 ~
8
C=20  ; if  > C  p ff x; t  f x  ct  dA tgdx 0 19
>
> ctA t
>
< C=20  ; if  C > 
and, correspondingly,
>
> C=21=0 
>
:
1=0  ; 
if  C  p
X t
CtA
~  Ct  DA tg
fFk; t  Fk
p
kCtA t
Remarks p p p
Ct A t  Ct A tFCt A t 1; t
(i) One could think that nonstrict shock profiles p
~
 FCt A t 1  Ct DA t 0
as in Theorem 4 can appear only in exceptional
cases. But Proposition 2 and Theorem 5 below 20

The relations [9], [20] can be called localized initial problems [1], [3] and [2], [4] and some
conservation law. The proof contains two difficult partial results which confirm this conjecture. To
parts. p simplify formulation we admit the following.
The first part consists in
pproving
that for A > 2 c
Assumption 2 Let (u) and (u) be upper bounds of
(correspondingly, A > 2 C) the following asymp-
the convex hulls for the graphs of
totics are valid:
Z u
 ln t u  ydy
dA t d0 o1; t ! 1 

  0 
C ln t and
DA t D0 o1; t ! 1 Z
2   0 
 u
dy
21 u
 y
where d0 , D0 are independent of A. respectively, with u 2 [ ,  ]. We suppose that
The second part gives the following convergence
statements: s fu 2  ;   : u < ^ug
Z
x  ; 0 [ 1 ; 1 [    L ; 
sup ~
p p p ff y; t  f y  ct  dA tg
x2ctA t;ctA t ctA t
where
dy ! 0; t!1
   0 < 0 <  1 < 1 <    <  L < L 
X n
sup fFk; t
p p or, correspondingly,
x2CtA t;CtA t kCtApt
^
~  Ct  DA tg ! 0; t ! 1
 Fk S fu 2  ;   : u < ug
 ; b0 [ a1 ; b1 [    aM ; 
The precise a priori estimates of local solutions of
[1], [2] play an important role in the proof. An where
example of such an estimate, also useful for further
results, is given below.  a0 < b0 < a1 < b1 <    < aM < bM 
Proposition 1 Let, in eqn [2], C = (0) > 0, =
p In addition, we suppose that 0 (l ) 6 0, 0 (l ) 6
1, 0  0 (0) < 0 , x def
= (x  Ct)= Ct . Let the func- 0, l = 0, 1, . . . , L or, correspondingly, 0 (am ) 6
tion F(x, t), defined in the domain 0 = {(x, t): a1 < 0, 0 (bm ) 6 0, m = 0, 1, . . . , M.
x < a2 }, a2 > 0, satisfy eqn [2],
Proposition 2 (Weinberger 1990, Henkin and
def
Fx; t Fx; t  Fx  1; t  0 Polterovich 1999). Under Assumptions 1, 2, one has:
 (i) If u 2 [ ,  ] n s and, correspondingly, u 2
jFx; tj  p ; x; t 2 0 ; t  t0
Ct [ ,  ] n S, then following functions are well
defined:
Then
8
> l ; if x < l  t
B >
> 1
Fx; t  ; x; t 2 0 ; t  t0
x < x=t ; if l  t  x
Ct gl  l1  t
where t >
>
>
: l1 ; if x > l1  t;
    l 0; 1; . . . ; L
1 0 
B B 0 a2 1 ln1 a2
d C and, correspondingly,
d minx  a1 ; a2  x
 8
> b ; if x < bm  t
> m1
and B0 is an absolute constant.
x > < x=t; if bm  t  x
Gl  am1  t
It is interesting to compare a priori estimate of t >
>
>
: am ; if x > am1  t;
Proposition 1 with some similar (but less precise) m 0; 1; . . . ; M
estimates in the theory of classical quasilinear
parabolic equations (Ladyzhenskaya et al. 1968). (ii) For any interval (l , l ) s and, correspond-
We will formulate now the general conjecture ingly, (am , bm ) S there exist traveling waves
concerning asymptotic behavior of solutions of fl (x  cl t) for [1] with overfall (l , l ) and,

correspondingly, Fm (x  Cm t) for [2] with over- (iii) For any solution F(n", t), n 2 Z, t 2 R , of initial
fall (am , bm ), where problem [2], [4], there exist shift-functions m (t):
Z l
1 
m ln t O1  m t  m ln t O1
cl ydy
l  l l 0  
l 0; 1; . . . ; L
m  m < 1;
cl l ; l 0; . . . ; L  1
such that
cl l ; l 1; . . . ; L
~
sup jFn"; t  Fn"; t; 0 ; 1 ; . . . ; M j ! 0;
and, correspondingly, n2Z
Z bm t!1
1 dy
C1
m
bm  am am y (iv) Moreover, in (iii) one can take
Cm bm ; m 0; . . . ; M  1

m m
Cm am ; m 1; . . . ; M Cm

Conjecture (Henkin and Polterovich 1994, 1999, bm  am
8
Henkin and Shananin 2004). Let > 1
>
> 0 ; if m 0 < M; a0 6 b0
>
> b m
>
>
~f x; t; 0 ; . . . ; L < 1 1

 ; if 0 < m < M
X
L L1

X X
L1 >
> 0 am 0 bm
x >
>
~f x  c t  " t g  l >
> 1
l l l l >
: 0 ; if m M > 0; aM 6 bM
l0 l0
t l0 am
X
L
The main result confirming formulated conjec-
 l ; L1
l1
tures is the following.
~
Fn"; t; 0 ; . . . ; M Theorem 5 (Henkin and Shananin). Conjecture
X
M X
M 1
n" (i) for L = 1 and corresponding conjecture (iii) for
~ m n"  Cm t  "m t
F Gm M = 1 are true, that is,for solution of initial problem
m0 m0
t [1], [3] there exist shift functions l (t) = O (ln t) such
X
M 1 X
M that for t ! 1 we have
 bm  am ; M1 8
m0 m1 < ~f 0 x  c0 t  "0 t; if x  c0 t
f x; t7! 1 x=t; if c0 t  x  c1 t
Then under Assumptions 1, 2, the following state- :~
ments are valid: f 1 x  c1 t  "1 t; if x  c1 t

(i) For any solution f (x, t), x 2 R, t 2 R , of ini- and for solution of initial problem [2], [4] there exist
tial problem [1], [3], there exist shift-functions l (t): shift functions m (t) = O(ln t) such that for t ! 1
we have
l ln t O1  l t  l ln t O1 8
0  l  l < 1; l 0; 1; . . . ; L > ~ n"  C0 t  "0 t; if n"  C0 t
F
>
< 01
n"=t; if C0 t  n"
such that Fn"; t7!
>
>  C1 t
:~
F1 n"  C1 t  "1 t; if n"  C1 t
sup jf x; t  ~f x; t; 0 ; 1 ; . . . ; L j ! 0;
x2R
The proof of Theorem 5 is of the same nature as
t!1 the proof of Theorem 4.
(ii) Moreover, in (i) one can take Remarks
l l (i) The proof of stronger Conjectures (ii) and (iv)
"
for L = 1 or M = 1 are in preparation.
8
l  l
> 1 (ii) The numerical results, Rykova and Spivak (pre-
>
> ; if l 0 < L; 0 6 0
>
> 0
 print, 2004), confirm conjecture (iii) for M = 2.
>
< 1 l 1 (iii) The results of Weinberger (1990) and Henkin

 0 ; if 0 < l < L
>
> 0 
l  l and Polterovich (1999) confirm convergence
>
> 1
>
> statements of Conjectures (i), (iii) for all L and
: 0 ; if l L > 0; L 6 L
l M, but only on the intervals of rarefaction

profiles: $x \in [\varphi(\beta_l)t, \varphi(\alpha_{l+1})t]$ or, correspondingly, $x \in [\varphi(b_m)t, \varphi(a_{m+1})t]$, $t > 0$.

The problem of finding asymptotics ($t \to \infty$) of solutions of (viscous) conservation laws was originally posed not only for generalized Burgers equations but also for systems of conservation laws in one spatial variable (see Gelfand (1959)). In this direction many important results on existence and asymptotic stability of viscous shock profiles (continuous and discrete) have been obtained and applied (see Benzoni-Gavage (2004), Lax (1973), Serre (1999), Zumbrun and Howard (1998) and references therein). Results of the type of Theorems 4, 5 have not yet been obtained for systems of conservation laws.

It is also very interesting to study asymptotic behavior of scalar (viscous) conservation laws in several spatial variables (continuous or discrete), based on the asymptotic properties of Burgers type equations. In this direction there have been several important results and problems (see Bauman and Phillips (1986), Henkin and Polterovich (1991), Hoff and Zumbrun (2000), Serre (1999), Weinberger (1990), and references therein).

Further Reading

Bauman P and Phillips D (1986b) Large-time behavior of solutions to a scalar conservation law in several space dimensions. Transactions of the American Mathematical Society 298: 401–419.
Belenky V (1990) Diagram of growth of a monotonic function and a problem of their reconstruction by the diagram. Preprint, CEMI Academy of Science, Moscow, 144 (in Russian).
Benzoni-Gavage S (2002a) Stability of semi-discrete shock profiles by means of an Evans function in infinite dimension. J. Dyn. Diff. Equations 14: 613–674.
Burgers JM (1940) Application of a model system to illustrate some points of the statistical theory of free turbulence. Proc. Acad. Sci. Amsterdam 43: 2–12.
Dafermos CM (1977) Characteristics in hyperbolic conservation laws. A study of structure and the asymptotic behavior of solutions. In: Knops RJ (ed.) Nonlinear Analysis and Mechanics: Heriot–Watt Symposium, vol. 17, pp. 1–58. Research Notes in Mathematics, London: Pitman.
Gelfand IM (1959) Some problems in the theory of quasilinear equations. Usp. Mat. Nauk 14: 87–158 (in Russian). ((1963) American Mathematical Society Translations 33).
Harten A, Hyman JM, and Lax PD (1976) On finite-difference approximations and entropy conditions for shocks. Communications on Pure and Applied Mathematics 29: 297–322.
Henkin GM and Polterovich VM (1991) Schumpeterian dynamics as a nonlinear wave theory. Journal of Mathematical Economics 20: 551–590.
Henkin GM and Polterovich VM (1999) A difference-differential analogue of the Burgers equation and some models of economic development. Discrete and Continuous Dynamical Systems 5: 697–728.
Henkin GM and Shananin AA (2004) Asymptotic behavior of solutions of the Cauchy problem for Burgers type equations. Journal de Mathématiques Pures et Appliquées 83: 1457–1500.
Henkin GM, Shananin AA, and Tumanov AE (2005) Estimates for solutions of Burgers type equations and some applications. Journal de Mathématiques Pures et Appliquées 84: 717–752.
Hoff D and Zumbrun K (2000) Asymptotic behavior of multidimensional viscous shock fronts. Indiana University Mathematics Journal 49: 427–474.
Hopf E (1950) The partial differential equation $u_t + uu_x = \mu u_{xx}$. Communications on Pure and Applied Mathematics 3: 201–230.
Iljin AM and Oleinik OA (1960) Asymptotic behavior of the solutions of the Cauchy problem for some quasilinear equations for large values of time. Mat. Sbornik 51: 191–216 (in Russian).
Ladyzhenskaya OA, Solonnikov VA, and Uralceva NN (1968) Linear and Quasilinear Equations of Parabolic Type. Amer. Math. Soc. Transl. Monogr., vol. 23. Providence, RI.
Landau LD and Lifschitz EM (1968) Fluid Mechanics. Elmsford, NY: Pergamon.
Lax PD (1954) Weak solutions of nonlinear hyperbolic equation and their numerical computation. Communications on Pure and Applied Mathematics 7: 159–193.
Lax PD (1957) Hyperbolic systems of conservation laws, II. Communications on Pure and Applied Mathematics 10: 537–566.
Lax PD (1973) Hyperbolic systems of conservation laws and the mathematical theory of shock waves. Conference Board of the Mathematical Science, Monograph 11. SIAM.
Levi D, Ragnisco O, and Bruschi M (1983) Continuous and discrete matrix Burgers Hierarchies. Nuovo Cimento 74: 33–51.
Liu T-P (1978) Invariants and asymptotic behavior of solutions of a conservation law. Proceedings of American Mathematical Society 71: 227–231.
Liu T-P, Matsumura A, and Nishihara K (1998) Behaviors of solutions for the Burgers equation with boundary corresponding to rarefaction waves. SIAM Journal on Mathematical Analysis 29: 293–308.
Liu T-P and Yu S-H (1997) Propagation of stationary viscous Burgers shock under the effect of boundary. Archive for Rational Mechanics and Analysis 139: 57–92.
Oleinik OA (1959) Uniqueness and stability of the generalized solution of the Cauchy problem for a quasi-linear equation. Usp. Mat. Nauk 14: 165–170. ((1963) American Mathematical Society Translations 33).
Serre D (1999) Systems of Conservation Laws, I. Cambridge: Cambridge University Press.
Serre D (2004) $L^1$-stability of nonlinear waves in scalar conservation laws. In: Dafermos C and Feireisl E (eds.) Handbook of Differential Equations, pp. 473–553. Elsevier.
Weinberger HF (1990) Long-time behavior for a regularized scalar conservation law in the absence of genuine nonlinearity. Annales de l'Institut Henri Poincaré (C) Analyse Non Linéaire.
Zumbrun K and Howard D (1998) Pointwise semigroup methods and stability of viscous shock waves. Indiana University Mathematics Journal 47: 63–185.
Cellular Automata 455

Cellular Automata
M Bruschi, Università di Roma "La Sapienza", Rome, Italy
F Musso, Università Roma Tre, Rome, Italy
© 2006 Elsevier Ltd. All rights reserved.

What is a Cellular Automaton?

Cellular automata (CAs) were first introduced by J von Neumann in his investigation of "complexity," following an inspired suggestion by S Ulam. In the last 50 years they have been investigated and used in a number of fields, and such widely different terminologies have been used by researchers that it is now difficult even to give a precise general definition of a CA. Thus, some definitions and approximations are in order.

First a broad definition:

1. there are a number of cells (boxes);
2. at any (discrete) time step, any cell can present itself in a certain state among a finite number of different states;
3. the state of any cell can change (evolve) from a time step to the subsequent time step; and
4. there is a rule (evolution law, EL) which determines this transition.

Note that the number of cells can be finite or infinite; the cells can be arranged on a line, on a surface, in the ordinary three-dimensional (3D) space, or possibly in a hyperspace (in any case, the cells can be numbered); the different states of a cell can be denoted by integer numbers but, in different contexts of application of CAs, different imaginative pictures have also been used (e.g., different colors, dead and living cells, number of balls in a box, etc.); the evolution of a CA proceeds in finite time steps (time is also discrete); the EL, provided that it is effective on any possible configuration of a given CA (computability), is otherwise completely arbitrary (indeed, there are not only deterministic and probabilistic ELs, but also those that evolve in time following a meta-EL, which in turn can be deterministic or probabilistic).

Consider some examples of CAs.

Example 1 (CA1) Consider a linear array of seven boxes (cells; one can number them $c(i)$, $i = 1, 2, \ldots, 7$). Each box can be empty or it can contain a ball (so there are just two states for each cell). Given a configuration of this CA at time $t$, what happens at time $t + 1$ (EL)?

(i) the state of the first box $c(1)$ never changes;
(ii) for each other box $c(i)$, $i = 2, 3, \ldots, 7$:
(iia) if the box is empty and the box on its left is empty then put a ball in the box;
(iib) if there is a ball in the box and also there is a ball in the box on its left then empty the box.

An example of the evolution of such a rather trivial CA is given in Figure 1.

Figure 1 A seven time-step evolution of CA1 (cells $c(1)$ to $c(7)$, time steps $t = 0$ to $t = 7$) starting from a given ID ($t = 0$). Note that a stable configuration has been reached at $t = 6$.

A more precise notation can now be established. First, let us denote the state of a cell at time $t$ by a state function, say $S$. According to point 2 above, the number of possible states is arbitrary but finite: denote this number by the positive integer $M$ ($M > 1$). Then $S$ takes values in a finite set, say $\mathbb{Z}_M = \mathbb{Z}/M\mathbb{Z} = \{0, 1, 2, \ldots, M - 1\}$ (in plain words, we have denoted the $M$ states for the CA by the first $M$ non-negative integers). Different cells can be labeled with a progressive number: $c(n)$, $n = n_1, n_1 + 1, \ldots, n_2 - 1, n_2$; possibly, in case of an infinite number of cells, one has $n_1 \to -\infty$ and/or $n_2 \to +\infty$. In the case of $n_1 = -\infty$, $n_2 = +\infty$, one speaks of a unidimensional CA. Of course, the field $S$ depends on $n$ as well as on time (remember that, for a CA, time is a discrete variable: $t = 0, 1, 2, \ldots$). The field $S(n, t)$ describes completely the CA. If the EL is deterministic, then one can determine (compute) $S(n, t)$ step by step for $t > 0$ from the initial configuration $S(n, 0)$ (initial datum, ID). Consider only static ELs, namely those that do not change in time. A further distinction can be made: there are ELs such that the future state of the generic cell, $S(n, t + 1)$, depends on the whole current configuration of the CA (these are called nonlocal ELs) and there are ELs for which $S(n, t + 1)$ depends only on
the current state of a finite number, say N, of cells (local ELs):

\[ \{ S(n + k_i, t) \},\quad i = 1, 2, \ldots, N;\ k_i \in \mathbb{Z} \;\Rightarrow\; S(n, t+1) \qquad [1] \]

Note that, in principle, the set of cells that determine, according to the EL, the future state of the generic cell n, could depend on n, namely one can have N = N(n), as well as k_i = k_i(n), i = 1, 2, ..., N(n) (see CA2 below). In any case, such a set of cells is called the interaction set (IS). Moreover, the distance from the cell n of the farthest cell in the IS is called the range R (of the interaction): R = max(|k_i|). If IS ≡ {c(n - R), c(n - R + 1), ..., c(n), ..., c(n + R - 1), c(n + R)}, then this IS is called a neighborhood of range R. It is, moreover, clear that, for unidimensional CA, there exists at least one infinite subset of cells that have the same state. If there is only one such subset, then it is called the vacuum set and the state of its cells is called vacuum state: let V denote the value of this state (0 ≤ V < M, S(n, t) → V as n → ∞).

Example 2 (CA2) An example of CA with n-dependent IS (M = 2, R = 3, V = 0). This is the EL: the cell c(n) changes its state (0 → 1, 1 → 0) iff
(i) n is even and at least one of the two cells on its left is not in the vacuum state;
(ii) n is odd and one or three of the three cells on its right are not in the vacuum state.
An example of the evolution of such a CA is given in Figure 2.

Figure 2 Three hundred and eighty time steps of CA2, starting from a randomly chosen initial configuration. Note the left-right asymmetry due to the asymmetry of its IS and EL.

Usually, only a subclass of ELs is considered for which the phenomenon of vacuum excitation cannot occur. Namely, during the evolution of the CA, an infinite subset of the vacuum set cannot change its state in just one time step. In other words: if the set of cells starting from the first cell and ending with the last one for which S(n, t) ≠ V be called population set (PS), then PS is a finite set at each time.

Of course, one can easily devise an EL for which this is not true; nevertheless, the EL itself is still valid (computable), for instance,

Example 3 (CA3) This is a unidimensional CA, namely there are infinite cells on a line (n ∈ Z). The cells have M states and V = 0; the EL reads: the state of each cell cycles in the set of available states

0 → 1, 1 → 2, ..., M - 2 → M - 1, M - 1 → 0

Note that the range R is zero, there is a vacuum excitation; nevertheless, the EL is effective.

Deterministic, static, and local ELs that do not give rise to vacuum excitation are called normal ELs (NELs). Since M, N are finite for an NEL, one can give the NEL itself as a table, considering every possible configuration of the IS and specifying the outcome for each configuration (note that there are M^N possible configurations).

Example 4 (CA4) n ∈ Z, M = 2, V = 0, IS ≡ {c(n), c(n - 1), c(n + 2)}, N = 3, R = 2. The EL is:

S(n, t)        0 0 0 0 1 1 1 1
S(n - 1, t)    0 0 1 1 0 0 1 1
S(n + 2, t)    0 1 0 1 0 1 0 1        [2]
S(n, t + 1)    0 1 1 0 1 1 0 1

An example of the evolution of such a CA is given in Figure 3.

Figure 3 Four hundred and sixty-one time steps of CA4, starting from a randomly chosen PS of 50 cells.

However, these NELs can also be given in an alternative representation (more useful in view of the extensions of the concept of CA itself, see below). Namely, an NEL can be given as a discrete-time EL for the state function S(n, t) in the finite field Z_M = {0, 1, 2, ..., M - 1}.
For example, the NEL above for CA4 can be expressed as follows:

\[ S(n, t+1) \overset{2}{\equiv} S(n-1, t) + S(n, t) + S(n+2, t) + S(n, t)\,S(n+2, t) + S(n-1, t)\,S(n, t)\,S(n+2, t) \qquad [3] \]

Here and in the following, the symbol $\overset{M}{\equiv}$ denotes a congruence mod M.

Another example is the following.

Example 5 (CA5) n ∈ Z, M = 3, N = 3, V = 0, R = 1, IS ≡ {c(n - 1), c(n), c(n + 1)}. The NEL is:

\[ S(n, t+1) \overset{3}{\equiv} S(n-1, t) + S(n, t) + S(n+1, t) + 2\,S(n-1, t)\,S(n+1, t) \qquad [4] \]

An example of the evolution of such a CA is given in Figure 4.

Figure 5 A class-1 CA: every ID rapidly evolves to periodic structures; M = 3, V = 0, R = 2, EL: S(n, t + 1) = S(n, t) + S(n - 1, t)S(n + 2, t).
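As a check on the two representations of the CA4 rule, the following Python sketch compares the lookup table [2] with the polynomial form [3] on all 2^3 configurations of the interaction set. The names `TABLE`, `ca4_poly` and `step` are illustrative only, and the finite window with vacuum (0) outside it is an assumption made to keep the example runnable; it is not part of the definition of CA4, which lives on an infinite line.

```python
from itertools import product

# Lookup table [2] for CA4.
# Keys are (S(n,t), S(n-1,t), S(n+2,t)); values are S(n,t+1).
TABLE = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 0,
    (1, 0, 0): 1, (1, 0, 1): 1, (1, 1, 0): 0, (1, 1, 1): 1,
}

def ca4_poly(center, left1, right2):
    """Polynomial form [3], evaluated mod 2."""
    return (left1 + center + right2
            + center * right2
            + left1 * center * right2) % 2

def step(cells):
    """One time step on a finite window (assumption: vacuum 0 outside the window)."""
    get = lambda i: cells[i] if 0 <= i < len(cells) else 0
    return [ca4_poly(get(n), get(n - 1), get(n + 2)) for n in range(len(cells))]

# The two representations agree on every configuration of the interaction set.
agree = all(TABLE[c] == ca4_poly(*c) for c in product((0, 1), repeat=3))
print(agree)
```

Since M and N are finite, this exhaustive comparison is a complete proof that the table and the polynomial define the same NEL; the same pattern applies to any other rule given in both forms.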

Classification of ELs

Considering a CA with given M > 1, N ≥ 1, the number L of possible deterministic, static ELs is

\[ L(M, N) = M^{M^N} \qquad [5] \]

Of course, this number can be very large for relatively small values of M and N also. Nevertheless, it is a finite positive integer, so that, for given M, N, one could denote every EL by an integer number and investigate the typical behavior of each EL. A considerable reduction of this number is obtained if one limits attention to totalistic ELs, namely to those whose outcome depends only on the global configuration of the IS, often just on

\[ \sigma(n, t) = \sum_{i=1}^{N} S(n + k_i, t), \quad i = 1, 2, \ldots, N;\ k_i \in \mathbb{Z} \qquad [6] \]

Deep and extensive computer investigations have been exploited for unidimensional CAs with small values of M, N. Surprisingly enough, it seems that the typical behavior of all these CAs can be (roughly and heuristically) classified in just four classes (Wolfram 2002):

• Class 1 (simple): possibly after a complicated transient, simple patterns emerge.
• Class 2 (fractalic): possibly after a transient, overall regular nested structures are obtained.
• Class 3 (chaotic): complicated but seemingly random behavior.
• Class 4 (complex): possibly after a transient, localized structures emerge that interact in complex ways.

Due to the looseness of the above definitions, perhaps a better way to distinguish between classes is to train one's eye. Consider some examples of CAs for each class: the typical behavior of class-1 CA is shown in Figures 5 and 6, of class-2 CA in Figures 7 and 8, of class-3 CA in Figures 4 and 9, of class-4 CA in Figures 10 and 11. Note, however, that often one has mixed type CA: for example, CA4 is of class 1 on the right and of class 2 on the left (see Figure 3); Figure 12 exhibits a CA where the typical behaviors of classes 2 and 3 are superimposed.
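To give a feeling for the size of the rule space in [5], the sketch below evaluates L(M, N) for a few small cases and compares it with the number of totalistic ELs that depend only on the sum σ of [6]. The totalistic count M^(N(M-1)+1) is not stated in the text; it is derived here under the assumption that σ is the ordinary integer sum, which then takes the N(M-1)+1 values 0, ..., N(M-1). The function names are illustrative.

```python
def count_rules(M, N):
    """Number of deterministic, static ELs, eq. [5]: one of M outcomes for each of M**N configurations."""
    return M ** (M ** N)

def count_totalistic_rules(M, N):
    """Assumed count of ELs depending only on sigma in [6]: sigma takes N*(M-1)+1 values."""
    return M ** (N * (M - 1) + 1)

for M, N in [(2, 3), (3, 3), (2, 5)]:
    print(M, N, count_rules(M, N), count_totalistic_rules(M, N))
```

For M = 2, N = 3 this gives the 256 elementary rules, of which 16 are totalistic in the above sense, which illustrates how strongly the restriction reduces the number of cases to inspect.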

Figure 4 Four hundred and sixty-one time steps of CA5, starting from a randomly chosen PS of 50 cells.

Extensions

The concept of a CA is so simple that many extensions of the above-sketched definition of a CA can be easily devised. A (nonexhaustive) survey of such extensions follows.
Figure 6 A class-1 CA, a random ID vanishes after 337 time steps, M = 5, V = 0, R = 2, EL: S(n, t + 1) = S(n - 1, t)S(n - 2, t) + S(n + 1, t)S(n + 2, t) + S(n - 1, t)S(n + 1, t) + S(n - 2, t)S(n + 2, t).

Figure 7 A class-2 CA: Sierpinski triangles appear; M = 2, V = 0, R = 1, EL: S(n, t + 1) = S(n - 1, t) + S(n + 1, t).

Figure 8 A class-2 CA: a double fractal structure appears; M = 4, V = 0, R = 2, EL: S(n, t + 1) = S(n - 2, t) + S(n, t) + S(n + 2, t).

Figure 9 A class-3 CA: M = 5, V = 0, R = 2, EL: S(n, t + 1) = 2S(n - 1, t) + S(n + 1, t) + S(n, t)(S(n + 1, t) + S(n + 2, t)) + S(n - 1, t)S(n + 1, t).

Vector CA

In this extension, the state function S(n, t) is considered as a vector, namely S(n, t) ≡ (S_1(n, t), S_2(n, t), ..., S_L(n, t)), L being a positive integer. Each component S_l(n, t) (l = 1, 2, ..., L) takes values in a finite field, say Z_{M_l} = {0, 1, 2, ..., M_l - 1}, and evolves, according to some EL, interacting with the other components. Of course, one can give separately the time evolution for each component; however, it is also possible to give a global representation of a vector CA, introducing a global function S~(n, t) that takes values in the finite field Z_M, M = M_1 M_2 ... M_L; for example,

\[ \tilde{S}(n, t) = S_L(n, t) + \sum_{l=1}^{L-1} S_l(n, t) \Bigl( \prod_{k>l}^{L} M_k \Bigr) \qquad [7] \]

Thus, in a sense, vector CAs are still usual CAs with a complicated EL.

Example 6 (CA6) A two-component vector CA:

\[ S_1(n, t+1) \overset{M_1}{\equiv} S_1(n, t)\,S_1(n+1, t) + (M_1 - 1)\,S_2(n-1, t)\,S_2(n, t) + c_1 \qquad [8] \]

\[ S_2(n, t+1) \overset{M_2}{\equiv} S_1(n-1, t)\,S_2(n, t) + S_1(n, t)\,S_2(n+1, t) + c_2 \qquad [9] \]
The global behavior of this CA can be expressed, for example, through the global state function

\[ \tilde{S}(n, t) = M_2\,S_1(n, t) + S_2(n, t) \qquad [10] \]

Obviously, S~ ∈ Z_M with M = M_1 M_2. Figure 13 represents the global behavior of this CA with M_1 = 2, M_2 = 3, c_1 = 1, c_2 = 1, V = 0. Note that this CA can be considered as an extension of the celebrated quadratic map.

Figure 10 A class-4 CA (Wolfram CA 110): M = 2, V = 0, R = 1, EL: S(n, t + 1) = S(n, t) + S(n + 1, t) + S(n, t)S(n + 1, t) + S(n - 1, t)S(n, t)S(n + 1, t).

Figure 11 A class-4 CA. Note the interacting moving structures on the left and on the right; note also the apparently chaotic behavior in the center; M = 2, V = 0, R = 2, EL: S(n, t + 1) = S(n, t) + S(n + 1, t) + S(n - 1, t)S(n + 2, t).

Figure 12 A mixed-class CA: a fractalic structure is superimposed on a chaotic one; M = 4, V = 0, R = 2, EL: S(n, t + 1) = S(n, t)(S(n - 2, t) + S(n + 2, t)) + S(n - 1, t)S(n + 1, t).

Figure 13 Global behavior of the vector CA6.

Multidimensional CA

Up to now we have considered CAs with finite number of cells (finite CAs) or with an infinite number of cells arranged on a line (unidimensional CAs). Now we consider CAs with cells arranged on a surface, usually a plane (bidimensional CAs), or on 3-space (tridimensional CAs), or even on a hyperspace (multidimensional CAs). In any case, if the number of cells is finite, the evolution of such CAs, according to an NEL, must end up to a final cycle: this is due to the finiteness of the phase space (thus, these CAs should be classified as class 1; however, note that, if the phase space is large enough, the dynamics of
such CAs can still be very rich). Usually, one considers an infinite number of cells tessellating the whole s-space, s = 2, 3, ... (e.g., squares or hexagons on the plane, cubic cells in 3-space). The changes in the previous notation and definitions are plain: for example, for a bidimensional CA, the state function depends now on two discrete space variables (S(n_1, n_2, t), n_1 ∈ Z, n_2 ∈ Z); furthermore, there is a greater freedom in choosing a neighborhood of range R. Two most-used neighborhoods of range 1 are shown below:

[Diagram: the Neumann neighborhood (a cell together with its four edge-adjacent neighbors) and the Moore-Conway neighborhood (a cell together with its eight surrounding neighbors).] [11]

The most famous (and interesting) bidimensional CA is Life, introduced by J H Conway, which is discussed next.

Example 7 (CA Life; Moore-Conway neighborhood, V = 0, M = 2). A cell in the vacuum state 0 is called dead; a cell in the state 1 is called alive. The EL is as follows:
(i) If a cell is dead at time t, it comes alive at time t + 1 if and only if exactly three of its eight neighbors are alive at time t (reproduction).
(ii) If a cell is alive at time t, it dies at time t + 1 if and only if fewer than two (loneliness) or more than three (overcrowding) neighbors are alive at time t.

Clearly, this is a totalistic NEL. Now considering the explicit form of σ (see [6]):

\[ \sigma(n_1, n_2, t) = -S(n_1, n_2, t) + \sum_{k_1=-1}^{1} \sum_{k_2=-1}^{1} S(n_1 + k_1, n_2 + k_2, t) \qquad [12] \]

the above EL can be simply expressed as:

\[ S(n_1, n_2, t+1) = \delta_{3,\sigma} + \delta_{2,\sigma}\,S(n_1, n_2, t) \qquad [13] \]

where δ is the Kronecker symbol.

Life is a class-4 CA; it exhibits a rich variety of interesting structures: stable structures, oscillators (periodic structures), gliders and ships (moving structures), emitters and absorbers (namely, structures that, after a time period, reconstitute themselves, but meanwhile they have emitted or absorbed moving structures). These structures are essential to prove that Life can be used to construct a universal Turing machine (see below). One can get a rough idea of such richness from Figure 14.

As in the previous case of vector CA, one could object that also multidimensional CAs are not true extensions of the unidimensional CAs. Indeed, since the whole set of cells is still a countable set, one could number the cells with just a discrete space variable (say n ∈ Z). For example, in the case of a square tessellation of the plane, we could enumerate the cells in the plane starting from the origin as follows:

[Diagram: the cells of the square lattice numbered along a double spiral around the origin (cell 0), one arm carrying the positive and the other the negative integers, shown up to |n| = 22.] [14]

Thus, any multidimensional CA could in principle be viewed as a unidimensional one. Of course, one has to pay a price for this: ISs and ELs that are simple for a multidimensional CA become cumbersome for its unidimensional version and vice versa.

Higher Time Derivatives

Up to now, we have considered CAs whose evolved state S(t + 1) depends only on the state S(t), namely the state of the CA itself at the previous time step. In other words the EL involves just the first (discrete) time derivative (1 CA). One can easily extend all the previous definitions to consider higher-order discrete time derivatives (K CA). Of course, the ID and the IS for such a CA involve the state of the CA at K subsequent time steps.

An example of a unidimensional 2 CA is given below.

Example 8 (CA7) M = 3, V = 0, R = 1. The EL is:

\[ S(n, t+1) \overset{3}{\equiv} S(n-1, t) + S(n, t-1) + S(n+1, t) \qquad [15] \]

An example of the evolution of such a CA is given in Figure 15.
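A minimal Python sketch of the second-order rule of CA7, eq. [15], may help to see why the initial datum must now consist of two consecutive configurations. The helper names `ca7_step` and `evolve` are illustrative, and the finite window with vacuum (0) outside it is an assumption of this sketch: once the pattern reaches the border, the result differs from that of the infinite CA.

```python
def ca7_step(prev, curr, M=3):
    """Eq. [15]: S(n,t+1) = S(n-1,t) + S(n,t-1) + S(n+1,t) mod M.

    prev is the configuration at time t-1, curr the configuration at time t.
    Cells outside the window are assumed to be in the vacuum state 0.
    """
    get = lambda row, i: row[i] if 0 <= i < len(row) else 0
    return [(get(curr, n - 1) + get(prev, n) + get(curr, n + 1)) % M
            for n in range(len(curr))]

def evolve(row0, row1, steps):
    """Evolve from the two-row initial datum (row0 at t = 0, row1 at t = 1)."""
    history = [list(row0), list(row1)]
    for _ in range(steps):
        history.append(ca7_step(history[-2], history[-1]))
    return history

width = 31
row0 = [0] * width
row1 = [0] * width
row1[width // 2] = 1          # a single excited cell at t = 1
for row in evolve(row0, row1, 12):
    print("".join(".12"[s] for s in row))
```

The only structural change with respect to a first-order rule is that `evolve` keeps the last two rows instead of one; for a rule involving K time steps the same loop would pass the last K rows to the step function.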
Figure 14 CA Life, panels (a)-(f): (a) Time 0. Near the lower border, five stable structures (from the left to the right: a block, a boat, a ship, a loaf, a beehive); near the left border three blinkers (period-2 oscillators); near the right corner, a symmetric structure that, in one time step, evolves into a pulsar (a period-3 oscillator); in the upper-left corner a glider (a moving structure); in the upper-right corner a medium weight spaceship (another moving structure); in the center, a configuration that vanishes in a few time steps. (b) Time 1. The structures on the lower border are unchanged, the blinkers, the glider, and the spaceship are in an intermediate state; on the right border, the pulsar starts to pulse. (c) Time 2. The three blinkers on the left border are again in their original configurations (periodic structure with period 2), the pulsar, the glider and the spaceship are in another intermediate state. (d) Time 3. The pulsar is in its second state, the glider and the spaceship in their third, the structure in the center is going to vanish. (e) Time 4. The pulsar has completed its pulsation (period-3 oscillator, see Figure 14b); the structure in the center has vanished, the glider and the spaceship have recovered their original configurations (see Figure 14a) but meanwhile they have moved by one cell in four time steps (1/4 of the highest velocity attainable by a moving structure in a CA of range 1). The glider is moving downward and to the right, the spaceship horizontally to the left. (f) Time 60. The spaceship has almost completed its crossing, the glider has reached the center and it is on a collision route with the pulsar.

Figure 15 CA7, clearly a class-2 CA.

It is plain that taking a suitable continuum limit of a K CA one gets a partial differential equation of order K for the evolution. However, there are also special and interesting CAs, called filter CAs, that in a suitable continuum limit end up in integral evolution equations. For a filter unidimensional CA, the evolved state at the cell n, S(n, t + 1), depends also on the (already) evolved states of the cells on its left (or right): for example, an NEL of the type

\[ S(n, t+1) \overset{M}{\equiv} F\bigl( S(n + k_i, t),\ S(n - \tilde{k}_j, t+1) \bigr), \quad i = 1, 2, \ldots, N;\ k_i \in \mathbb{Z};\quad j = 1, 2, \ldots, \tilde{N};\ \tilde{k}_j \in \mathbb{N} \qquad [16] \]

is still valid (computable). Extensions to K CAs or vector CAs or multidimensional CA are plain. Very often filter CAs exhibit a class-4 behavior with particle-like structures moving and interacting in a complex way; see the following example and examples in the next section.

Example 9 (CA8) M = 2, V = 0, R = 2. The EL is:

\[ S(n, t+1) \overset{2}{\equiv} S(n-1, t+1)\,S(n-2, t) + S(n, t) + S(n+1, t)\,S(n+2, t) \qquad [17] \]

An example of the evolution of such a CA is given in Figure 16.

Invertible CA

For most of the ELs there is a loss of information in the course of the evolution (see, e.g., Figures 5 and 6). Indeed, different definitions of CA entropy have been introduced to measure the randomness in the behavior of a given CA. However, since CAs are important in physical
modeling as well as in cryptography and data compression, there is great interest in a special subclass of CAs which are invertible (time reversible). Namely, for an invertible CA following a given EL and starting from an arbitrary ID, there exists an inverse EL such that one can recover the ID from the evolved states.

Figure 16 CA8, a filter CA. Note the emerging of particle-like structures moving to the left and to the right and interacting in complex ways.

Invertible CAs can be easily devised in the case of K CA (K > 1). For example, if K = 2, 3, ..., one can consider ELs of the form

\[ S(n, t+1) \overset{M}{\equiv} S(n, t-K+1) + F\bigl( \{ S(n + k_i^{(j)}, t - j) \} \bigr) \qquad [18a] \]

where

\[ i = 1, 2, \ldots, N^{(j)};\ k_i^{(j)} \in \mathbb{Z};\quad j = 0, 1, 2, \ldots, K-2 \qquad [18b] \]

and F is an arbitrary polynomial function.

It is then clear that the inverse EL reads

\[ \tilde{S}(n, \tilde{t}+1) \overset{M}{\equiv} \tilde{S}(n, \tilde{t}-K+1) + (M-1)\,F\bigl( \{ \tilde{S}(n + k_i^{(j)}, \tilde{t} + j - K + 2) \} \bigr) \qquad [19] \]

Indeed, if an arbitrary ID evolves according to the EL [18], then applying the inverse EL [19] to K subsequent evolved states (taken in reversed order), eventually the original ID is recovered (in reversed order) (see the following example).

Example 10 (CA9) A 6 CA: M = 2, V = 0, R = 1. The EL is:

\[ S(n, t+1) \overset{2}{\equiv} S(n, t-5) + S(n, t-3) + S(n+1, t-2) + S(n-1, t-1) + S(n, t-2)\,S(n+1, t-2) + S(n, t)\,S(n-1, t) \qquad [20] \]

The inverse EL, according to [19], reads (Figure 17)

\[ \tilde{S}(n, \tilde{t}+1) \overset{2}{\equiv} \tilde{S}(n, \tilde{t}-5) + \tilde{S}(n, \tilde{t}-1) + \tilde{S}(n+1, \tilde{t}-2) + \tilde{S}(n-1, \tilde{t}-3) + \tilde{S}(n, \tilde{t}-2)\,\tilde{S}(n+1, \tilde{t}-2) + \tilde{S}(n, \tilde{t}-4)\,\tilde{S}(n-1, \tilde{t}-4) \qquad [21] \]

Figure 17 CA9, a 6 CA: (a) a 50 time-step evolution from a peculiar ID; (b) a 50 time-step evolution of the inverse EL, starting from the last six configurations of Figure 17a (taken in inverse order); the ID of Figure 17a is recovered (in inverse order).
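The recovery of the ID can be tried out numerically. The Python sketch below evolves six random initial rows with the EL [20] for 50 steps and then applies the inverse EL [21] to the last six rows taken in reversed order. It assumes the reading of [20] and [21] as sums of the listed terms mod 2, and it uses a periodic ring of 40 cells instead of the infinite line; both the ring and the names `forward_step`, `inverse_step` and `run` are assumptions of this sketch.

```python
import random

def forward_step(h):
    """EL [20]. h holds the six most recent rows; h[-1 - j] is S(., t - j)."""
    size = len(h[-1])
    S = lambda j, n: h[-1 - j][n % size]   # periodic ring (assumption of this sketch)
    return [(S(5, n) + S(3, n) + S(2, n + 1) + S(1, n - 1)
             + S(2, n) * S(2, n + 1) + S(0, n) * S(0, n - 1)) % 2
            for n in range(size)]

def inverse_step(h):
    """Inverse EL [21]. h[-1 - j] is the reversed-time state at reversed time minus j."""
    size = len(h[-1])
    S = lambda j, n: h[-1 - j][n % size]
    return [(S(5, n) + S(1, n) + S(2, n + 1) + S(3, n - 1)
             + S(2, n) * S(2, n + 1) + S(4, n) * S(4, n - 1)) % 2
            for n in range(size)]

def run(step, rows, steps):
    """Append `steps` new rows, each computed from the six most recent ones."""
    rows = [list(r) for r in rows]
    for _ in range(steps):
        rows.append(step(rows[-6:]))
    return rows

rng = random.Random(0)
initial = [[rng.randint(0, 1) for _ in range(40)] for _ in range(6)]
forward = run(forward_step, initial, 50)
backward = run(inverse_step, forward[-6:][::-1], 50)
recovered = backward[-6:][::-1]
print(recovered == initial)
```

The inverse works because, mod 2, adding the same polynomial F twice cancels it; for general M the factor (M - 1) in [19] plays the role of the minus sign.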
Of course, more complicated invertible ELs can be devised. Invertible ELs can be also easily devised for filter CA, for example, if an NEL for a filter CA reads

\[ S(n, t+1) \overset{M}{\equiv} S(n, t) + F\bigl( S(n + k_i, t),\ S(n - \tilde{k}_j, t+1) \bigr) \qquad [22] \]

where k_i and k~_j are positive integers (i = 1, 2, ..., N; j = 1, 2, ..., N~) and F is an arbitrary (polynomial) function, then it is invertible and the inverse NEL reads

\[ \tilde{S}(n, \tilde{t}+1) \overset{M}{\equiv} \tilde{S}(n, \tilde{t}) + (M-1)\,F\bigl( \tilde{S}(n + k_i, \tilde{t}+1),\ \tilde{S}(n - \tilde{k}_j, \tilde{t}) \bigr) \qquad [23] \]

Note that [22] is computable starting from n = -∞, whereas [23] is computable starting from n = +∞.

Example 11 (CA10) A 1.5 CA, M = 2, V = 0, R = 3. The EL is:

\[ S(n, t+1) \overset{2}{\equiv} S(n, t) + S(n-3, t+1)\,S(n-2, t+1) + S(n+2, t)\,S(n+3, t) + S(n-2, t+1)\,S(n-1, t+1) + S(n+1, t)\,S(n+2, t) \qquad [24] \]

Note that this EL is of the form [22]; therefore, it is invertible (see Figure 18a). According to [23], the inverse EL reads:

\[ \tilde{S}(n, \tilde{t}+1) \overset{2}{\equiv} \tilde{S}(n, \tilde{t}) + \tilde{S}(n+3, \tilde{t}+1)\,\tilde{S}(n+2, \tilde{t}+1) + \tilde{S}(n-2, \tilde{t})\,\tilde{S}(n-3, \tilde{t}) + \tilde{S}(n+2, \tilde{t}+1)\,\tilde{S}(n+1, \tilde{t}+1) + \tilde{S}(n-1, \tilde{t})\,\tilde{S}(n-2, \tilde{t}) \qquad [25] \]

Figure 18 CA10: (a) 230 time-step evolution, then the inverse EL is applied for 230 further time steps in order to recover the initial configuration. (b) Collisions between different kinds of particle-like coherent moving structures. The last collision (on the right) is a solitonic one: the interaction produces just a phase shift, preserving number, shape, and velocities of the involved particles. (c) Particles moving with different velocities and interacting in complex ways (solitonic collisions, particle creations and annihilations). (d) A particle goes through a nonhomogeneous medium and undergoes refraction by the medium itself.

This CA exhibits a very rich dynamics: any complex ID rapidly decays in a great variety of coherent particle-like structures, steady or moving to the right or
to the left with different velocities. The interactions between different particles may be solitonic (the particles emerge unchanged but shifted) or annihilation-creation phenomena can occur (see Figures 18a-d).

Applications of CAs

CAs as Universal Constructors and Turing Machines

In the 1950s, von Neumann, who contributed to the development of the first computer (ENIAC), decided to work out a mathematical theory of automata. Such a theory was meant to answer the following question: is it possible to build an automaton such that it allows universal computation (i.e., it embodies a universal Turing machine) and, moreover, it is able to build (in order of decreasing generality)
1. an arbitrary automaton (universal constructor);
2. a copy of itself (self-reproducing); and
3. an automaton that is itself a universal Turing machine (constructor)?

The last question von Neumann intended to address was whether, in the process of automata self-reproduction (if possible), a process of evolution could take place, that is, whether a simpler automaton could generate a more complex one.

In the beginning, the idea of von Neumann was to describe, using mathematical axioms, an automaton moving inside a warehouse and selecting various elementary spare parts (e.g., muscles, switches, rigid girders) and then assembling them into a new automaton. While this original idea was very realistic, it was also very difficult to pursue, so that von Neumann, following a suggestion by Ulam, decided to consider his questions in the more abstract framework of CAs.

The particular CA he considered is an infinite square CA with 29 possible states. The transition rule is dependent upon the cell to update and its north, east, south, and west neighbor cell (the von Neumann neighborhood). Among the 29 possible states there is one state that is quiescent (the vacuum state). von Neumann proved the existence of a configuration of 50 000 cells immersed in a sea of quiescent states that embodies a universal Turing machine and that is a universal constructor. An infinite one-dimensional tape is used to store a description of the automaton to build. The universal constructor reads the description on the tape, develops a constructing arm that builds the configuration described on the tape in an unoccupied part of the cellular space, makes a copy of the tape and finally attaches it to the newly built automaton and retracts the constructing arm. When the tape stores a description of the universal constructor itself, the automaton self-reproduces. The total size of the self-reproducing automaton amounts to 200 000 cells. (Some computer simulations of von Neumann's self-reproducing automaton are available on the web.)

Since von Neumann's CA is a very complex one, it led researchers to think that a CA able to simulate a universal Turing machine should also be quite complex. The perspective changed completely after the introduction of CA Life. Conway was looking for a simple CA with a possible rich dynamics; however, it was subsequently realized that Life was much more complicated than anyone could have thought. Finally, thanks to the development of faster computers that allowed visualization of the evolution of quite large populations and through the contribution of a large number of researchers, it was proved that a universal Turing machine could be embedded in Life.

The discovery that even a simple CA such as Life could incorporate a universal Turing machine led to the question whether it could be possible to build a universal Turing machine inside a simple one-dimensional CA. This is indeed the case: up to now, the simplest CA capable of universal computation is the W110 CA (see Figure 10), as proved recently by Cook after a conjecture formulated by Wolfram in 1985.

CAs for Computer Simulations

One of the major applications of CAs is the computer simulation of various dynamical processes. Even if CAs were not invented for this purpose, they possess peculiarities that make them particularly suitable for this task. The main advantage of using a CA for a dynamical simulation is due to their completely discrete nature that allows exact simulations on a computer. Thus, any spurious effect due to rounding errors is ruled out. Another advantage is that the EL of a CA can be seen as a function between finite sets. For this reason, one can specify the EL through a lookup table (see [2]): then when running the simulations, the computer has only to access the table instead of computing the function every time, shortening considerably the computation time. Another great advantage of CAs in computer simulations is that, for their very nature (at least for local EL), they can be implemented on parallel machines. These two concepts are at the basis of dedicated computers for CA simulations developed by Toffoli, Margolus, and co-workers (CAM series). The possibility to use efficiently parallel computers for CA simulation could prove
to be fundamental when computer speeds approach saturation. Moreover, CAs themselves can mimic parallel computations, see, for example, Figure 19, where a nonlocal CA computes very efficiently the celebrated Collatz-Ulam 3n + 1 map.

Figure 19 A CA that computes the 3n + 1 Collatz-Ulam map. The ID for the CA is the initial number for the iterated map (binary notation, order 2^300, randomly chosen, displayed on the left vertical axis). The CA, according to the Collatz conjecture, ends up to the final stable configuration (horizontal line on the right for the CA, 1 → 4 → 2 → 1 for the map).

CAs in Physics

Since Newton, physics has been described through differential equations and continuous functions. However, such a mathematical description is not fit for simulation on a computer, and some discretizations must be considered. First, one has to discretize space and time passing from differential equations to (finite systems of) finite difference equations; second, one has to round off the values of the functions to store them in the memory of the computer. The main drawback of this procedure is that in chaotic systems such approximations can rapidly lead to great differences between the real and the simulated behavior. As already noticed, this problem does not appear in CA. Thus, one would like to use this good characteristic of CAs in physical modeling taking due account of the continuous nature of the physics involved. This requires attention and ingenuity in constructing reliable CA models for physical processes. For example, this goal has been achieved in the so-called lattice gas automata (LGAs).

LGAs are CA models for the microscopic dynamics of fluids and gases. The thermodynamic limit of these CAs yields the correct continuous functions for the macroscopic quantities (density, pressure, viscosity, etc.).

The first step toward LGAs was the discovery that the HPP model developed in the 1970s by Hardy, Pomeau and De Pazzis was in fact a CA. The HPP model describes the behavior of a fluid (or a gas) in a plane. The configuration space is given by a bidimensional square lattice and the particles are described by arrows lying on the edges of the lattice and pointing to some vertex (see Figure 20a). The particles are assumed to be all identical and with the same velocity, and particles on the same edge with the same direction are not allowed (exclusion principle). The EL prescribes that particles move with unit velocity along the edges in the direction pointed by the arrow (free flight) unless there are exactly two particles on the edges connected to a given vertex and they point in opposite directions (collision); in this case they are replaced by two arrows pointing outward on the previously empty edges (see Figure 20b). Clearly, the EL conserves the number and the momentum of the particles.

The HPP model can be described algebraically. The admissible particle velocities are just

\[ c_1 = \hat{x}, \quad c_2 = \hat{y}, \quad c_3 = -\hat{x}, \quad c_4 = -\hat{y} \qquad [26] \]

Figure 20 (a) An example of configuration for the HPP model. (b) Head-on collisions and three-particle collisions in the HPP model. [Panel labels: Collisions; Free flight.]
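The verbal description of the HPP rule (head-on collision followed by free flight) is already enough for a small simulation. The Python sketch below is one possible reading of it, with a few assumptions that are not in the text: a periodic size × size lattice, the array layout `n[j][y][x]`, and the helper names. It checks the two conservation laws mentioned above on a random configuration.

```python
import random

# Velocities c1..c4 of [26]: +x, +y, -x, -y.
VELOCITIES = [(1, 0), (0, 1), (-1, 0), (0, -1)]

def hpp_step(n, size):
    """One HPP time step: head-on collisions, then free flight.

    n[j][y][x] is 1 if a particle with velocity VELOCITIES[j] is at vertex (x, y).
    The lattice is periodic (assumption of this sketch).
    """
    new = [[[0] * size for _ in range(size)] for _ in range(4)]
    for y in range(size):
        for x in range(size):
            occ = [n[j][y][x] for j in range(4)]
            if occ == [1, 0, 1, 0]:      # head-on pair along x leaves along y
                occ = [0, 1, 0, 1]
            elif occ == [0, 1, 0, 1]:    # head-on pair along y leaves along x
                occ = [1, 0, 1, 0]
            for j, (dx, dy) in enumerate(VELOCITIES):
                if occ[j]:
                    new[j][(y + dy) % size][(x + dx) % size] = 1
    return new

def particle_number(n):
    return sum(v for plane in n for row in plane for v in row)

def momentum(n):
    px = sum(VELOCITIES[j][0] * v for j in range(4) for row in n[j] for v in row)
    py = sum(VELOCITIES[j][1] * v for j in range(4) for row in n[j] for v in row)
    return px, py

rng = random.Random(1)
size = 16
state = [[[int(rng.random() < 0.3) for _ in range(size)] for _ in range(size)]
         for _ in range(4)]
before = (particle_number(state), momentum(state))
for _ in range(20):
    state = hpp_step(state, size)
after = (particle_number(state), momentum(state))
print(before == after)
```

Because the update only moves and rotates bits, the simulation is exact: no rounding is involved, which is the advantage of CA models stressed in the text.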
Accordingly, only four bits n_j(x, t), j = 1, 2, 3, 4, are required to denote the presence (1) or the absence (0) of a particle with velocity c_j pointing to vertex x at time t. The dynamical rule for HPP can be written in the form

\[ n_j(x + c_j, t+1) = n_j(x, t) + \omega_j(x, t) \qquad [27] \]

where term n_j(x, t) on the right-hand side accounts for the free flight of particles, while ω_j(x, t) modifies the trajectories in the case of collisions. The ω_j are determined by the state of the system according to the following rules:

\[ \omega_1 = -n_1 (1 - n_2)\, n_3 (1 - n_4) + (1 - n_1)\, n_2 (1 - n_3)\, n_4 \qquad [28a] \]
\[ \omega_2 = -n_2 (1 - n_3)\, n_4 (1 - n_1) + (1 - n_2)\, n_3 (1 - n_4)\, n_1 \qquad [28b] \]
\[ \omega_3 = -n_3 (1 - n_4)\, n_1 (1 - n_2) + (1 - n_3)\, n_4 (1 - n_1)\, n_2 \qquad [28c] \]
\[ \omega_4 = -n_4 (1 - n_1)\, n_2 (1 - n_3) + (1 - n_4)\, n_1 (1 - n_2)\, n_3 \qquad [28d] \]

It is plain that eqns [27] and [28] can be interpreted as the EL for a CA.

In the thermodynamic limit, the equations governing the dynamics of the macroscopic quantities of the fluid are given by the continuity equation and by anisotropic Navier-Stokes equations. The anisotropy in the Navier-Stokes equations is due to the fact that the invariance group of the square lattice is too small. This problem was solved by Frisch, Hasslacher, and Pomeau in 1986, with the introduction of the FHP model. It turns out that a hexagonal lattice has enough symmetries to recover the isotropic Navier-Stokes equations in the thermodynamic limit. So, the FHP model is an example of a model where even if the microscopic dynamics is almost a caricature of the real dynamics, the thermodynamic limit gives rise to the correct physical equations.

CAs have been used to simulate many other physical processes (unfortunately, there is no space here for a sufficiently elaborate description). The principal fields of application are: percolation theory, magnetism, diffusion phenomena, sandpiles, models of earthquakes, crystal growth, etc.

The most intriguing aspect of some even simple CAs (e.g., CA8, CA10: see Figures 16 and 18) is their very rich particle-like dynamics. For instance, the existence of solitonic collisions suggested that the techniques recently developed to find and treat integrable nonlinear dynamical systems (nonlinear continuous and discrete evolution equations, many-body problems) could profitably be extended to find integrable CAs. Indeed, many such CAs have been found that exhibit solitons and are endowed with nontrivial conservation laws (of course, this is very important in physical modeling). Moreover, the above-cited similarity between certain CA behaviors and elementary particle physics phenomena suggests that the fundamental structure of reality (at the Planck level) could indeed be that of a CA (cells of Planck length, discrete time flow): attempts to construct this underlying CA physics have been pursued.

Other Applications

CAs exhibit a great plasticity, which makes them well suited to model systems in a wide range of fields. This is mainly due to the fact that CAs with very simple rules can also simulate universal Turing machines, so that they can exhibit a very rich and complicated overall dynamics (in principle, one could simulate any dynamical system using a simple CA). There is another reason for the wide applicability of CA modeling even outside of physics: namely, it is well known that algorithms, not differential equations, are better instruments to schematize dynamical processes for complex and organized systems. Since simple algorithms can be naturally implemented on CAs, the latter are very useful for realizing simple models and simulations in many fields: biology, economics, ecology, neural networks, traffic models, etc.

Moreover, applications of CAs in informatics and specifically in cryptography and data compression have been investigated.

See also: Dynamical Systems in Mathematical Physics: An Illustration from Water Waves; Generic Properties of Dynamical Systems; Integrable Systems: Overview.

Further Reading

Berlekamp ER, Conway JH, and Guy R (1982) Winning Ways for Your Mathematical Plays. London: Academic Press.
Boghosian BM (1999) Lattice gases and cellular automata. Future Generation Computer Systems 16: 171-185.
Boon JP, Dab D, Kapral R, and Lawniczak A (1996) Lattice gas automata for reactive systems. Physics Reports 273: 55-147.
Burks AW (ed.) (1970) Essays on Cellular Automata. Urbana: University of Illinois Press.
Chopard B and Droz M (1998) Cellular Automata Modeling of Physical Systems. Cambridge: Cambridge University Press.
Doolen G (ed.) (1990) Lattice Gas Methods for Partial Differential Equations. New York: Addison-Wesley.
Gardner M (1983) Wheels, Life, and Other Mathematical Amusements. New York: W H Freeman.
Central Manifolds, Normal Forms 467

Jackson EA (1990) Perspectives of Nonlinear Dynamics. Cambridge: Cambridge University Press.
Toffoli T and Margolus N (1987) Cellular Automata Machines: A New Environment for Modeling. Cambridge: The MIT Press.
von Neumann J (1966) In: Burks AW (ed.) Theory of Self-Reproducing Automata. Urbana: University of Illinois Press.
Wolfram S (2002) A New Kind of Science. Champaign: Wolfram Media.

Central Manifolds, Normal Forms


P Bonckaert, Universiteit Hasselt, Diepenbeek, Belgium
© 2006 Elsevier Ltd. All rights reserved.

Introduction

We consider differentiable dynamical systems generated by a diffeomorphism or a vector field on a manifold. We restrict to the finite-dimensional case, although some of the ideas can also be developed in the general case (Vanderbauwhede and Iooss 1992). We also restrict to the behavior near a stationary point or a periodic orbit of a flow.

Let the origin 0 of $\mathbb{R}^n$ be a stationary point of a $C^1$ vector field $X$, that is, $X(0) = 0$. We consider the linear approximation $A = dX(0)$ of $X$ at 0 and its spectrum $\sigma(A)$, which we decompose as $\sigma(A) = \sigma_s \cup \sigma_c \cup \sigma_u$, where $\sigma_s$ resp. $\sigma_c$ resp. $\sigma_u$ consists of those eigenvalues with real part $< 0$ resp. $= 0$ resp. $> 0$. If $\sigma_c = \emptyset$ then there is no central manifold, and the stationary point 0 is called hyperbolic. Let $E_s$, $E_c$, and $E_u$ be the linear $A$-invariant subspaces corresponding to $\sigma_s$ resp. $\sigma_c$ resp. $\sigma_u$. Then $\mathbb{R}^n = E_s \oplus E_c \oplus E_u$. We look for corresponding $X$-invariant manifolds in the neighborhood of 0, in the form of graphs of maps. More precisely:

Theorem 1 Let the vector field $X$ above be of class $C^r$ ($1 \le r < \infty$). There exist map germs $\phi_{ss} : (E_s, 0) \to E_c \oplus E_u$, $\phi_{sc} : (E_s \oplus E_c, 0) \to E_u$, $\phi_{uu} : (E_u, 0) \to E_s \oplus E_c$, $\phi_{cu} : (E_c \oplus E_u, 0) \to E_s$, and $\phi_c : (E_c, 0) \to E_s \oplus E_u$ of class $C^r$ such that the graphs of these maps are invariant for the flow of $X$. Moreover, these maps are of class $C^r$, and their linear approximation at 0 is zero, that is, their graphs are tangent to, respectively, $E_s$, $E_s \oplus E_c$, $E_u$, $E_c \oplus E_u$, and $E_c$. If $X$ is of class $C^\infty$ then $\phi_{ss}$ and $\phi_{uu}$ are also of class $C^\infty$. If $X$ is analytic then $\phi_{ss}$ and $\phi_{uu}$ are also analytic.

The graph of $\phi_c$ is called the (local) central (or, center) manifold of $X$ at 0 and it is often denoted by $W^c$. Thus, it is an invariant manifold of $X$ tangent at the generalized eigenspace of $dX(0)$ corresponding to the eigenvalues having zero real part.

(Non)uniqueness, Smoothness

Most proofs in the literature (Vanderbauwhede 1989) use a cutoff in order to construct globally defined objects, and then obtain the invariant graph as the solution of some fixed-point problem of a contraction in an appropriate function space. Although this solution is unique for the globalized problem, this is not the case at the germ level: another cutoff may produce a different germ of a central manifold. In other words, locally a central manifold might not be unique, as is easily seen on the planar example $x^2\,\partial/\partial x - y\,\partial/\partial y$. On the other hand, the $\infty$-jet of the map $\phi_c$, in case of a $C^\infty$ vector field, is unique, so if there would exist an analytic central manifold then this last one is unique; in the foregoing example, it is the $x$-axis. But for the (polynomial) example $(x - y^2)\,\partial/\partial x + y^2\,\partial/\partial y$ one can calculate that the $\infty$-jet of $x = \phi_c(y)$ is given by $j_\infty \phi_c(y) = \sum_{n \ge 1} n!\, y^{n+1}$, which has a vanishing radius of convergence, so there is no analytic central manifold. On the other hand, by the Borel theorem we can choose a $C^\infty$-representative for $\phi_c$. This can be generalized in the planar case:

Proposition 1 If $n = 2$ and if $X$ is $C^\infty$ and if the $\infty$-jet of $X$ in the direction of the central manifold is nonzero, then this central manifold is $C^\infty$. In particular, if $X$ is analytic then the central manifold is either an analytic curve of stationary points or is a $C^\infty$ curve along which $X$ has a nonzero jet.

For proofs and additional reading, the reader is referred to Aulbach (1992). In general, a central manifold is not necessarily $C^\infty$ (van Strien 1979, Arrowsmith and Place 1990): for the system in $\mathbb{R}^3$ given by

$$(x^2 - z^2)\,\frac{\partial}{\partial x} + (y + x^2 - z^2)\,\frac{\partial}{\partial y} + 0\,\frac{\partial}{\partial z}$$

one can find a $C^k$ central manifold for every $k$ but there is no $C^\infty$ central manifold. Indeed, in this case the domain of definition of $\phi_c$ shrinks to zero when $k$ tends to infinity.
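
The divergence of the formal series in the polynomial example can be checked by hand or by machine. The following is a minimal sketch, assuming the example reads $(x - y^2)\,\partial/\partial x + y^2\,\partial/\partial y$: a graph $x = \psi(y)$ is invariant when $\psi'(y)\,y^2 = \psi(y) - y^2$, and inserting a formal series $\psi(y) = \sum_k a_k y^k$ gives a recurrence for the coefficients. The function name `formal_center_manifold` is purely illustrative and does not come from the cited literature.

```python
from math import factorial


def formal_center_manifold(order):
    """Return [a_0, ..., a_order] such that x = sum_k a_k * y**k is formally
    invariant for the planar system x' = x - y**2, y' = y**2 (assumed signs)."""
    # a_0 = a_1 = 0: the graph passes through 0 and is tangent to the y-axis.
    a = [0, 0]
    for k in range(2, order + 1):
        # Coefficient of y**k in psi'(y) * y**2 = psi(y) - y**2:
        #   (k - 1) * a_{k-1} = a_k - [k == 2]
        a.append((k - 1) * a[k - 1] + (1 if k == 2 else 0))
    return a


coeffs = formal_center_manifold(10)

# The coefficient of y**(n+1) equals n!, as stated in the text.
assert all(coeffs[n + 1] == factorial(n) for n in range(1, 10))

# Ratios of consecutive coefficients grow without bound, so the
# radius of convergence of the series is zero.
ratios = [coeffs[k + 1] / coeffs[k] for k in range(2, 10)]
print(coeffs)
print(ratios)
```

Since the ratio $a_{k+1}/a_k = k$ is unbounded, the ratio test confirms that the series has vanishing radius of convergence, which is why only a $C^\infty$ representative (via the Borel theorem) can be expected here.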

Central Manifold Reduction

The importance of a central manifold lies in the principle of central manifold reduction, which roughly says that for local bifurcation phenomena it is enough to study the behavior on the central manifold, that is, if two vector fields, restricted to their central manifolds, have homeomorphic integral curve portraits, and if the dimensions of $E_s$ and $E_u$ are equal, then the two vector fields have homeomorphic integral curve portraits in $\mathbb{R}^n$, at least locally near 0. Let us be more precise:

Theorem 2 Let $m$ be the dimension of $E_c$. There exists $p$, $0 \le p \le n - m$, such that $X$ is locally $C^0$-conjugate to

$$X' = \sum_{i=1}^{m} \tilde{X}_i(z_1, \ldots, z_m)\,\frac{\partial}{\partial z_i} + \sum_{i=m+1}^{m+p} z_i\,\frac{\partial}{\partial z_i} - \sum_{i=m+p+1}^{n} z_i\,\frac{\partial}{\partial z_i}$$

where $(z_1, \ldots, z_m)$ is a coordinate system on a central manifold, $(z_1, \ldots, z_n)$ is a coordinate system on $\mathbb{R}^n$ extending $(z_1, \ldots, z_m)$ and $\sum_{i=1}^{m} \tilde{X}_i\,\partial/\partial z_i$ is the restriction of $X$ to a central manifold. Moreover, if

$$Y' = \sum_{i=1}^{m} \tilde{Y}_i(z_1, \ldots, z_m)\,\frac{\partial}{\partial z_i} + \sum_{i=m+1}^{m+p} z_i\,\frac{\partial}{\partial z_i} - \sum_{i=m+p+1}^{n} z_i\,\frac{\partial}{\partial z_i}$$

and if $\sum_{i=1}^{m} \tilde{Y}_i\,\partial/\partial z_i$ is $C^0$-equivalent (resp. $C^0$-conjugate) to $\sum_{i=1}^{m} \tilde{X}_i\,\partial/\partial z_i$ then $X'$ is $C^0$-equivalent (resp. $C^0$-conjugate) to $Y'$.

For a proof and further reading (a generalization) see Palis and Takens (1977).

In case that more smoothness than just $C^0$ is needed, we have the principle of normal linearization along the central manifold. More concretely, let $x$ denote a coordinate in the central manifold and let $y$ be a complementary variable, that is, let $X = X_c\,\partial/\partial x + X_h\,\partial/\partial y$. We define the normally linear part along the central manifold by

$$NX := X_c(x, 0)\,\frac{\partial}{\partial x} + \frac{\partial X_h}{\partial y}(x, 0) \cdot y\,\frac{\partial}{\partial y}$$

Under certain nonresonance conditions (Takens 1971, Bonckaert 1997) on the real parts of the eigenvalues of $dX(0)$, there exists a $C^r$ local conjugacy between $X$ and $NX$ for each $r \in \mathbb{N}$ (assuming $X$ to be of class $C^\infty$). If there are resonances, then one can conjugate with the so-called seminormal or renormal form containing higher-order terms (see Bonckaert (1997, 2000) and references therein; here one can also find results for cases where extra constraints should be respected, like symmetry, reversibility, or invariance of some given foliation etc.).

Parameters

Having an eigenvalue with zero real part is ungeneric, so in bifurcation problems we consider $p$-parameter families $X_\mu$ near, say, $\mu = 0$. With respect to the results above, we remark that such a family can be considered as a vector field near $(0, 0) \in \mathbb{R}^n \times \mathbb{R}^p$ tangent to the leaves $\mathbb{R}^n \times \{\mu\}$. In fact, the parameter direction $\mathbb{R}^p$ is contained in $E_c$. In all the results mentioned, this structure of being a family is respected. For example, in Theorem 2 we replace $\tilde{X}_i(z_1, \ldots, z_m)$ by $\tilde{X}_i(z_1, \ldots, z_m, \mu)$. Hence, if $\tilde{X}_\mu$ is a versal unfolding of $\tilde{X}_0$ then $X_\mu$ is a versal unfolding of $X_0$. By this, the search for versal unfoldings is reduced to the unfolding of singularities whose linear approximation at 0 has a purely imaginary spectrum.

Diffeomorphisms, Periodic Orbits

A completely analogous theory can be developed for fixed points of diffeomorphisms $f : (\mathbb{R}^n, 0) \to \mathbb{R}^n$. Here we split up the spectrum of the linear part $L = df(0)$ at 0 as $\sigma(L) = \sigma_s \cup \sigma_c \cup \sigma_u$, where $\sigma_s$ resp. $\sigma_c$ resp. $\sigma_u$ consists of those eigenvalues with modulus $< 1$ resp. $= 1$ resp. $> 1$. This theory can be applied to the time-$t$ map of a vector field (and will give the same invariant manifolds) and to the Poincaré map of a transversal section of a periodic orbit of a vector field (Chow et al. 1994).

Normal Forms

The general idea of a normal form is to put a (complicated) system into a form as simple as possible by means of a change of coordinates. This idea was already developed to a great extent by H Poincaré. Simple examples are: (1) putting a square matrix into Jordan form, (2) the flow box theorem (Arrowsmith and Place 1990) near a nonsingular point. Depending on the context and on the purpose of the simplification, this concept may vary greatly. It depends on the kind of changes of coordinates that are tolerated (linear, polynomial, formal series, smooth, analytic) and on the possible structures that must be preserved (e.g., symplectic, volume-preserving, symmetric, reversible etc.). Let us restrict to local normal forms, that is, in the vicinity of a stationary point of a vector field or a diffeomorphism (the latter can be
applied to the Poincaré map of a periodic orbit). We concentrate on the simplification of the Taylor series. The general idea is to apply consecutive polynomial changes of variables; at each step we simplify terms of a degree higher than in the step before. The ideal simplification would be to put all higher-order terms to zero, which would (at least at the level of formal series) linearize the system. But as soon as there are resonances (see below), this is impossible: the planar system $x\,\partial/\partial x + (2y + x^2)\,\partial/\partial y$ cannot be formally linearized, since its eigenvalues 1 and 2 satisfy the resonance relation $2 \cdot 1 - 2 = 0$.

Setting

Let $X$ be a $C^{r+1}$ vector field defined on a neighborhood of $0 \in \mathbb{R}^n$, and denote $A = dX(0)$ (its linear approximation at 0). The Taylor expansion of $X$ at 0 takes the form

$$X(x) = A \cdot x + \sum_{k=2}^{r} X_k(x) + O(|x|^{r+1})$$

where $X_k \in H^k$, the space of vector fields whose components are homogeneous polynomials of degree $k$. The classical formal normal-form theorem is as follows. We define the operator $L_A$ on $H^k$ by putting $L_A h(x) = dh(x) \cdot A \cdot x - A \cdot h(x)$; one calls $L_A$ the homological operator. One checks that $L_A(H^k) \subset H^k$. One also denotes this by $\mathrm{ad}\,A(h)(x)$: see further in the Lie algebra setting. Let $R^k$ be the range of $L_A$, that is, $R^k = L_A(H^k)$. Let $G^k$ denote any complementary subspace to $R^k$ in $H^k$. The formal normal-form theorem states, under the above settings:

Theorem 3 (Chow et al. 1994, Dumortier 1991) There exists a composition of near identity changes of variables of the form

$$x = y + \varphi_k(y) \qquad [1]$$

where the components of $\varphi_k$ are homogeneous polynomials of degree $k$, such that the vector field $X$ is transformed into

$$Y(y) = A \cdot y + \sum_{k=2}^{r} g_k(y) + O(|y|^{r+1})$$

where $g_k \in G^k$, $k = 2, \ldots, r$.

Sometimes this theorem is applied to the restriction of a vector field to its central manifold, for reasons explained in the last section. This is the reason why we did not assume $X$ to be $C^\infty$; in the latter case one can let $r \to \infty$ and obtain a normal form on the level of formal Taylor series (also called $\infty$-jets). Using a theorem of Borel, we infer the existence of a $C^\infty$ change of variables $\Phi$ such that the Taylor series of $\Phi_*(X)$ is $A \cdot y + \sum_{k=2}^{\infty} g_k(y)$. For practical computations, it is often appropriate to first simplify the linear part $A$ and to diagonalize it whenever possible. Hence, it is convenient to use a complexified setting and to use complex polynomials or power series. One can show that all involved changes of variables preserve the property of being a complex system coming from a real system, that is, at the final stage we can return to a real system (see, e.g., Arrowsmith and Place (1990) for a more precise mathematical description).

Hence, we can assume that $A$ is an upper triangular matrix. Let the eigenvalues be $\lambda_1, \ldots, \lambda_n$. It can be calculated that the eigenvalues of $L_A$, as an operator $H^k \to H^k$, are then the numbers $\langle \lambda, \alpha \rangle - \lambda_j$ where $\alpha \in \mathbb{N}^n$, $\sum_{j=1}^{n} \alpha_j = k$ and $1 \le j \le n$. Hence, if these would all be nonzero then $R^k = H^k$, and then we have an ideal simplification, that is, all $g_k$ equal to zero. However, if such a number is zero, that is,

$$\langle \lambda, \alpha \rangle - \lambda_j = 0 \qquad [2]$$

it is called a resonance between the eigenvalues. In such a case, we have to choose a complementary space $G^k$. From linear algebra it follows that one can always choose

$$G^k = \ker(L_{A^*}) \qquad [3]$$

where $A^*$ is the adjoint operator. But this choice [3] is not unique and is, from the computational point of view, not always optimal, especially if there are nilpotent blocks. This fact has been exploited by many authors. A typical example is the case where $A = y\,\partial/\partial x$. On the other hand, if $A$ is semisimple we can choose the complementary space to be $\ker(L_A)$, so $L_A g_k = 0$; we can assume $A$ to be the (complex) matrix $\mathrm{diagonal}[\lambda_1, \ldots, \lambda_n]$. In that case we can be more explicit as follows. Let $e_j = \partial/\partial x_j$ denote the standard basis on $\mathbb{C}^n$. For a monomial $x^\alpha e_j$ one can calculate that

$$L_A(x^\alpha e_j) = (\langle \lambda, \alpha \rangle - \lambda_j)\, x^\alpha e_j \qquad [4]$$

If the latter is zero, then the monomial is called resonant. This implies that the normal form can be chosen so that it only contains resonant monomials.

Putting a system into normal form not only simplifies the original system, it also gives more geometric insight on the Taylor series. To be more precise, suppose (for simplicity, this can be generalized (Dumortier 1997)) that $A$ is semisimple. One can calculate that the condition $L_A g_k = 0$ implies: $\exp(-At)\, g_k(\exp(At)\, x) = g_k(x)$ for all $t \in \mathbb{R}$. This means that $g_k$ is invariant for the one-parameter group $\exp(At)$. A typical example in the plane is: $A$ has eigenvalues $i$, $-i$. Note that the (only) resonances are $\langle (i, -i), (p + 1, p) \rangle - i = 0$ and
$\langle (i, -i), (p, p + 1) \rangle + i = 0$ for all $p \in \mathbb{N}$. We suppose that the original system was real, that is, on $\mathbb{R}^2$; we can choose linear coordinates such that for $z = x + iy$, $\bar{z} = x - iy$ the linear part is $A = \mathrm{diagonal}[i, -i]$. Applying the remarks above, we conclude that the normal form only contains the monomials $(z\bar{z})^p z\,\partial/\partial z$ and $(z\bar{z})^p \bar{z}\,\partial/\partial \bar{z}$. The geometric interpretation here is that these monomials are invariant for rotations around $(0, 0)$. This can also be seen on the real variant of this: the Taylor series of the (real) normalized system has the form $(1 + f(x^2 + y^2))(x\,\partial/\partial y - y\,\partial/\partial x) + g(x^2 + y^2)(x\,\partial/\partial x + y\,\partial/\partial y)$ and is invariant for rotations.

Warning: the dynamic behavior of a formal normal form in the central manifold can be very different from that of the original vector field, since we are only looking at the formal level. A trivial example is (take $f = g = 0$ in the foregoing example) $X(x, y) = (x\,\partial/\partial y - y\,\partial/\partial x) - \exp(-1/x^2)\,\partial/\partial x$, where orbits near $(0, 0)$ spiral to $(0, 0)$, whereas the normal form is just a linear rotation. This difference is due to the so-called flat terms, that is, the difference between the transformed vector field and a $C^\infty$-realization of its normalized Taylor series (or polynomial). In case of analyticity of $X$, one can ask for analyticity of the normalizing transformation $\Phi$. Generically, this is not the case in many situations. The precise meaning of this genericity condition is too elaborate to explain in this brief review article. We provide some suggestions for further reading in the next section. One could roughly say that, in the central manifold, the normal form has too much symmetry and is too poor to model more complicated dynamics of the system, which can be hidden in the flat terms. To quote Ilyashenko (1981): "In the theory of normal forms of analytic differential equations, divergence is the rule and convergence the exception . . . ."

In many applications, we want to preserve some extra structure, such as a symplectic structure, a volume form, some symmetry, reversibility, some projection etc.; the case of a projection is important since it includes vector fields depending on a parameter. Sometimes a superposition of these structures appears (e.g., a family of volume-preserving systems). We would like that the normal-form procedure respects this structure at each step. One can often formulate this in terms of vector fields belonging to some Lie subalgebra $L_0$. The idea is then to use changes of variables like [1], where $\varphi_k$ is then generated by a vector field in $L_0$. This will guarantee that all changes of variables are compatible with the extra structure. Unlike the general case where we could work with monomials as in [4], we will have to consider vector fields $h_k$ in $L_0$ whose components are homogeneous polynomials of degree $k$. If this can be done, one says that $L_0$ respects the grading by the homogeneous polynomials. In order to fix ideas, suppose that $L_0$ are the divergence-free planar vector fields. Note that a monomial $x^i y^j\,\partial/\partial x$ is not divergence free. We can instead use time mappings of homogeneous vector fields of the form $a(q + 1)\,x^{p+1} y^q\,\partial/\partial x - a(p + 1)\,x^p y^{q+1}\,\partial/\partial y$. Up to terms of higher order we can use the time-one map of $h_k$ instead of $x + h_k(x)$. In case that one asks for a $C^\infty$-realization of the normalizing transformation, we need an extra assumption on the extra structure, that is, on $L_0$, called the Borel property: denote by $J_{\infty,0}$ the set of formal series such that each truncation is the Taylor polynomial of an element of $L_0$. The extra assumption is: each element of $J_{\infty,0}$ must be the Taylor series of a $C^\infty$ vector field in $L_0$. It can be proved (Broer 1981) that the following structures respect the grading and satisfy the Borel property: being an $r$-parameter family, respecting a volume form on $\mathbb{R}^n$, being a Hamiltonian vector field ($n$ even), and being reversible for a linear involution.

One could consider other types of grading of the Lie-algebras involved.

This method, using the framework of the so-called filtered Lie algebras, is explained and developed systematically in a more general and abstract context in Broer (1981).

In nonlocal bifurcations, such as near a homoclinic loop, for example, it is not enough to perform central manifold reduction near the singularity: a simplified smooth model in a full neighborhood of the singularity is often needed, for example, in order to compute Poincaré maps.

Let us start with the purely hyperbolic case (i.e., $\dim E_c = 0$). First we compute the formal normal form such as the above. If there are no resonances [2] then we can formally linearize the vector field $X$. If $X$ is $C^\infty$ then a classical theorem of Sternberg (1958) states that this linearization can be realized by a $C^\infty$ change of variables (i.e., no more flat terms remaining). In case there are resonances, we must allow nonlinear terms: the resonant monomials. In this case we can also reduce $C^\infty$ to this normal form. Using the same methods, it is also possible to reduce to a polynomial normal form, but this time using $C^k$ ($k < \infty$) changes of variables. More precisely, if $k$ is a given number and if we write the vector field as $X = X_N + R_N$, where $X_N$ is the Taylor polynomial up to order $N$ (which can be assumed to be in normal form) and where $R_N(x) = O(|x|^{N+1})$, then for $N$ sufficiently large there is a $C^k$ change of variables conjugating $X$ to $X_N$ near 0. The number $N$ depends on the spectrum of $A = dX(0)$. An elegant proof of these facts can be found in Ilyashenko and Yakovenko (1991). For the case when extra structure must be
preserved, see Bonckaert (1997), which also deals with the partially hyperbolic case ($\dim E_c \ge 1$). As already remarked above, the case of a parameter-dependent family can be regarded as a partially hyperbolic stationary point preserving this extra structure.

The question of an analytic normal form, also in the hyperbolic case, leads to convergence questions and calls upon the so-called small-divisor problems. The classical results are due to Poincaré and Siegel. Let us summarize them; they are formulated in the complex analytic setting:

Theorem 4
(i) If the convex hull of the spectrum of $A$ does not contain $0 \in \mathbb{C}$ then $X$ can locally be put into normal form by an analytic change of variables. Moreover, this normal form is polynomial.
(ii) If the spectrum $\{\lambda_1, \ldots, \lambda_n\}$ of $A$ satisfies the condition that there exists $C > 0$ and $\nu > 0$ such that for any $m \in \mathbb{N}^n$ with $|m| = \sum_j m_j \ge 2$:

$$|\langle (\lambda_1, \ldots, \lambda_n), m \rangle - \lambda_j| \ge \frac{C}{|m|^{\nu}} \qquad [5]$$

for $1 \le j \le n$ then $X$ can be locally linearized by an analytic change of variables.

Note that case (i) contains the case where 0 is a hyperbolic source or sink. This case (i) in Theorem 4 can be extended if there are parameters: if $X$ depends analytically on a parameter $\varepsilon \in \mathbb{C}^p$ near $\varepsilon = 0$ then the change of variables is also analytic in $\varepsilon$; moreover, the normal form is then a polynomial in the space variables whose coefficients are analytically dependent on the parameter $\varepsilon$.

For case (ii) this is surely not the case, since the condition [5] is fragile: a small distortion of the parameter generically causes resonances, be it of a high order. To fix ideas, consider $n = 2$ and suppose $\lambda_1 < 0 < \lambda_2$. By a generic but arbitrary small perturbation, we can have that the ratio of these eigenvalues becomes a negative rational number $-p/q$, which gives a resonance of the form [2] with $j = 1$ and $\alpha = (q + 1, p)$, so [5] is violated. So analytic linearization, or even a polynomial analytic normal form, is ungeneric for families of such hyperbolic stationary points. The search for analytic normal forms, that is, simplified models, for families is still under investigation. A first simplification is obtained via the stable and unstable manifold from Theorem 1, that is, the graphs of $\phi_{ss}$ and $\phi_{uu}$. When $X$ is analytic near 0 then these manifolds are also analytic. So, up to an analytic change of variables, we can assume that $E_s$ and $E_u$ are invariant, which gives a simplification of the expression of $X$. Moreover, there is analytic dependence on parameters.

For local diffeomorphisms there are completely similar theorems pertaining to all the cases considered above.

Concluding Remarks

The concept of central manifold can be extended to more general invariant sets (see Chow et al. (2000) and references therein). It can also be extended to the infinite-dimensional case and can be applied to partial differential equations (Vanderbauwhede and Iooss 1992).

Concerning the generic divergence of normalizing transformations, the reader is referred to Broer and Takens (1989), Bruno (1989), Ilyashenko (1981), and Ilyashenko and Pyartli (1986). Although the power series giving the normalizing transformation generally diverges, the study of the dynamics is often performed by truncating the normal form at a certain order. Recently, Iooss and Lombardi (2005) considered the question as to what an optimal truncation is. It is shown, in case $dX(0)$ is semisimple, that the order of the normal form can be optimized so that the remainder satisfies some estimate shrinking exponentially fast to zero as a function of the radius of the domain.

Concerning normal forms preserving the Hamiltonian structure, see Birkhoff (1966) and Siegel and Moser (1995) for a starting point; this is an extended subject on its own, sometimes called Birkhoff normal form, and it would require another review article.

Further simplifications of the normal form can sometimes be obtained by taking into account nonlinear terms (instead of just $A$) in order to obtain reductions of higher-order terms (see Gaeta (2002) and especially the references therein).

Applications of normal forms and central manifolds to bifurcation theory have been explained in Dumortier (1991).

See also: Averaging Methods; Bifurcation Theory; Dynamical Systems and Thermodynamics; Dynamical Systems in Mathematical Physics: An Illustration from Water Waves; Finite Group Symmetry Breaking; Korteweg–de Vries Equation and Other Modulation Equations; Multiscale Approaches; Normal Forms and Semiclassical Approximation; Symmetry and Symmetry Breaking in Dynamical Systems.

Further Reading

Arrowsmith D and Place C (1990) Dynamical Systems. Cambridge: Cambridge University Press.
Aulbach B (1992) One-dimensional center manifolds are $C^\infty$. Results in Mathematics 21: 3–11.
472 Channels in Quantum Information Theory

Birkhoff GD (1966) Dynamical Systems. With an addendum by Jürgen Moser. American Mathematical Society Colloquium Publications, vol. IX. Providence, RI: American Mathematical Society.
Bonckaert P (1997) Conjugacy of vector fields respecting additional properties. Journal of Dynamical and Control Systems 3: 419–432.
Bonckaert P (2000) Symmetric and reversible families of vector fields near a partially hyperbolic singularity. Ergodic Theory and Dynamical Systems 20: 1627–1638.
Broer H (1981) Formal normal forms for vector fields and some consequences for bifurcations in the volume preserving case. In: Dynamical Systems and Turbulence, Warwick 1980, vol. 898, Lecture Notes in Mathematics. New York: Springer.
Broer H and Takens F (1989) Formally symmetric normal forms and genericity. Dynamics Reported. A Series in Dynamical Systems and their Applications 2: 11–18.
Bruno AD (1989) Local Methods in Nonlinear Differential Equations. New York: Springer.
Chow S-N, Li C, and Wang D (1994) Normal Forms and Bifurcations of Planar Vector Fields. Cambridge: Cambridge University Press.
Chow S-N, Liu W, and Yi Y (2000) Center manifolds for invariant sets. Journal of Differential Equations 168: 355–385.
Dumortier F (1991) Local study of planar vector fields: singularities and their unfoldings. In: Van Groesen E and De Jager EM (eds.) Structures in Dynamics, Studies in Mathematical Physics, vol. 2, pp. 161–241. Amsterdam: Elsevier.
Gaeta G (2002) Poincaré normal and renormalized forms. Acta Applicandae Mathematicae 70(1–3): 113–131 (symmetry and perturbation theory).
Ilyashenko YS (1981) In the theory of normal forms of analytic differential equations violating the conditions of Bryuno divergence is the rule and convergence the exception. Moscow University Mathematical Bulletin 36(2): 11–18.
Ilyashenko YS and Pyartli AS (1986) Materialization of resonances and divergence of normalizing series for polynomial differential equations. Journal of Mathematical Sciences 32(3): 300–313.
Ilyashenko YS and Yakovenko SY (1991) Finitely smooth normal forms of local families of diffeomorphisms and vector fields. Russian Mathematical Surveys 46: 1–43.
Iooss G and Lombardi E (2005) Polynomial normal forms with exponentially small remainder for analytic vector fields. Journal of Differential Equations 212: 1–61.
Palis J and Takens F (1977) Topological equivalence of normally hyperbolic dynamical systems. Topology 16(4): 335–345.
Siegel CL and Moser JK (1971) Lectures on Celestial Mechanics, (reprint 1995). Berlin: Springer.
Sternberg S (1958) On the structure of local homeomorphisms of Euclidean n-space. II. American Journal of Mathematics 80: 623–631.
Takens F (1971) Partially hyperbolic fixed points. Topology 10: 133–147.
Vanderbauwhede A (1989) Center manifolds, normal forms and elementary bifurcations. In: Kirchgraber U and Walther O (eds.) Dynamics Reported, vol. 2, pp. 89–169. New York: Wiley.
Vanderbauwhede A and Iooss G (1992) Center manifold theory in infinite dimensions. In: Jones CKRT et al. (eds.) Dynamics Reported, vol. 1, New Series, pp. 125–163. Berlin: Springer.

Channels in Quantum Information Theory


M Keyl, Università di Pavia, Pavia, Italy
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Consider a typical quantum system such as a string of ions in a trap. To process the quantum information the ions carry, we have to perform in general many steps of a quite different nature. Typical examples are: free time evolution (including unwanted but unavoidable interactions with the environment), controlled time evolution (e.g., the application of a quantum gate in a quantum computer), preparations and measurements. Each processing step can be described by a channel which transforms input systems into output system of a possibly different type (e.g., a measurement transforms quantum systems into classical information).

Systems, States, and Algebras

To get a unified mathematical description of systems of different physical nature, it is useful to consider $C^*$-algebras (which are, in our case, always finite dimensional): quantum systems can be represented in terms of the algebra $\mathcal{B}(\mathcal{H})$ of (bounded) operators on the Hilbert space $\mathcal{H} = \mathbb{C}^d$; for classical information we have to choose the set $\mathcal{C}(X)$ of (continuous), complex-valued functions on the finite alphabet $X$; and the tensor product of both $\mathcal{B}(\mathcal{H}) \otimes \mathcal{C}(X)$ describes hybrid systems which are half-classical and half-quantum. Assume now that $\mathcal{A}$ is one of these algebras. Effects (i.e., yes/no measurements on the system in question) are then described by $A \in \mathcal{A}$ satisfying $0 \le A \le \mathbb{1}$, states are positive, normalized linear functionals $\omega : \mathcal{A} \to \mathbb{C}$, and the probability to get the result "yes" during an $A$ measurement on a system in the state $\omega$ is given by $\omega(A)$. Since $\mathcal{A}$ is assumed to be finite dimensional, each state $\omega$ on $\mathcal{B}(\mathcal{H})$ is represented by a density operator $\rho$, that is, $\omega(A) = \mathrm{tr}(\rho A)$. Likewise, a state $\omega$ on $\mathcal{C}(X)$ has the form $\omega(A) = \sum_x A(x)\,p_x$, where $(p_x)_{x \in X}$ denotes a probability distribution on $X$, and a state $\omega$ on $\mathcal{B}(\mathcal{H}) \otimes \mathcal{C}(X)$ is described by a sequence $(\rho_x)_{x \in X}$ of positive (trace-class) operators on $\mathcal{B}(\mathcal{H})$ with $\sum_x \mathrm{tr}(\rho_x) = 1$ such that $\omega(A) = \sum_x \mathrm{tr}(\rho_x A_x)$. Here
we have used the fact that an element $A \in \mathcal{B}(\mathcal{H}) \otimes \mathcal{C}(X)$ can be represented in a canonical way by a sequence $(A_x)_{x \in X}$ of operators on $\mathcal{H}$. The set of states will be denoted in the following by $\mathcal{S}(\mathcal{A})$ and the set of effects by $\mathcal{E}(\mathcal{A})$.

Completely Positive Maps

Our aim is now to get a mathematical object which can be used to describe a channel. To this end, consider two $C^*$-algebras, $\mathcal{A}$, $\mathcal{B}$, describing the input and output system, respectively, and an effect $A \in \mathcal{B}$ of the output system. If we invoke first a channel which transforms $\mathcal{A}$ systems into $\mathcal{B}$ systems, and measure $A$ afterwards on the output systems, we end up with a measurement of an effect $T(A)$ on the input systems. Hence, we get a map $T : \mathcal{E}(\mathcal{B}) \to \mathcal{E}(\mathcal{A})$ which completely describes the channel (note that the direction of the mapping arrow is reversed compared to the natural ordering of processing). Alternatively, we can look at the states and interpret a channel as a map $T^* : \mathcal{S}(\mathcal{A}) \to \mathcal{S}(\mathcal{B})$ which transforms $\mathcal{A}$ systems in the state $\rho \in \mathcal{S}(\mathcal{A})$ into $\mathcal{B}$ systems in the state $T^*(\rho)$. To distinguish between both maps, we can say that $T$ describes the channel in the Heisenberg picture and $T^*$ in the Schrödinger picture. On the level of the statistical interpretation, both points of view should coincide of course, that is, the probabilities $(T^*\rho)(A)$ and $\rho(TA)$ to get the result "yes" during an $A$ measurement on $\mathcal{B}$ systems in the state $T^*\rho$, respectively a $TA$ measurement on $\mathcal{A}$ systems in the state $\rho$, should be the same. Since $(T^*\rho)(A)$ is linear in $A$, we see immediately that $T$ must be an affine map, that is, $T(\lambda_1 A_1 + \lambda_2 A_2) = \lambda_1 T(A_1) + \lambda_2 T(A_2)$ for each convex linear combination $\lambda_1 A_1 + \lambda_2 A_2$ of effects in $\mathcal{B}$, and this in turn implies that $T$ can be extended naturally to a linear map, which we will identify in the following with the channel itself, that is, we say that $T$ is the channel.

Let us now change slightly our point of view and start with a linear operator $T : \mathcal{A} \to \mathcal{B}$. To be a channel, $T$ must map effects to effects, that is, $T$ has to be positive: $T(A) \ge 0$ $\forall A \ge 0$ and bounded from above by $\mathbb{1}$, that is, $T(\mathbb{1}) \le \mathbb{1}$. In addition, it is natural to require that two channels in parallel are again a channel. More precisely, if two channels $T : \mathcal{A}_1 \to \mathcal{B}_1$ and $S : \mathcal{A}_2 \to \mathcal{B}_2$ are given, we can consider the map $T \otimes S$ which associates to each $A \otimes B \in \mathcal{A}_1 \otimes \mathcal{A}_2$ the tensor product $T(A) \otimes S(B) \in \mathcal{B}_1 \otimes \mathcal{B}_2$. It is natural to assume that $T \otimes S$ is a channel which converts composite systems of type $\mathcal{A}_1 \otimes \mathcal{A}_2$ into $\mathcal{B}_1 \otimes \mathcal{B}_2$ systems. Hence, $S \otimes T$ should be positive as well.

Definition 1 Consider two observable algebras $\mathcal{A}$, $\mathcal{B}$ and a linear map $T : \mathcal{A} \to \mathcal{B} \subset \mathcal{B}(\mathcal{H})$.
(i) $T$ is called positive if $T(A) \ge 0$ holds for all positive $A \in \mathcal{A}$.
(ii) $T$ is called completely positive (CP) if $T \otimes \mathrm{Id} : \mathcal{A} \otimes \mathcal{B}(\mathbb{C}^n) \to \mathcal{B}(\mathcal{H}) \otimes \mathcal{B}(\mathbb{C}^n)$ is positive for all $n \in \mathbb{N}$. Here $\mathrm{Id}$ denotes the identity map on $\mathcal{B}(\mathbb{C}^n)$.
(iii) $T$ is called unital if $T(\mathbb{1}) = \mathbb{1}$ holds.

Consider now the map $T^* : \mathcal{B}^* \to \mathcal{A}^*$ which is dual to $T$, that is, $T^*\rho(A) = \rho(TA)$ for all $\rho \in \mathcal{B}^*$ and $A \in \mathcal{A}$. It is called the Schrödinger-picture representation of the channel $T$, since it maps states to states provided $T$ is unital. (Complete) positivity can be defined in the Schrödinger picture as in the Heisenberg picture, and we immediately see that $T$ is (completely) positive iff $T^*$ is.

It is natural to ask whether the distinction between positivity and complete positivity is really necessary, that is, whether there are positive maps which are not CP. If at least one of the algebras $\mathcal{A}$ or $\mathcal{B}$ is classical, the answer is no: each positive map is CP in this case. If both algebras are quantum however, complete positivity is not implied by positivity alone. The most prominent example for this fact is the transposition map.

If item (ii) holds only for a fixed $n \in \mathbb{N}$, the map $T$ is called $n$-positive. This is obviously a weaker condition than complete positivity. However, $n$-positivity implies $m$-positivity for all $m \le n$, and for $\mathcal{A} = \mathcal{B}(\mathbb{C}^d)$ complete positivity is implied by $n$-positivity, provided $n \ge d$ holds.

Let us consider now the question whether a channel should be unital or not. We have already mentioned that $T(\mathbb{1}) \le \mathbb{1}$ must hold since effects should be mapped to effects. If $T(\mathbb{1})$ is not equal to $\mathbb{1}$, we get $\rho(T\mathbb{1}) = T^*\rho(\mathbb{1}) < 1$ for the probability to measure the effect $\mathbb{1}$ on systems in the state $T^*\rho$, but this is impossible for channels which produce an output with certainty, because $\mathbb{1}$ is the effect which is always true. In other words, if a CP map is not unital, it describes a channel which sometimes produces no output at all and $T(\mathbb{1})$ is the effect which measures whether we have got an output. We will assume henceforth that channels are unital if nothing else is explicitly stated.

Quantum Channels

In this section we will discuss some basic properties of CP maps which transform quantum systems into quantum systems, in particular the Stinespring theorem, which constitutes the most important structural result. For a more detailed presentation, including generalizations to more general input/

output algebras the reader should consult the textbook by Paulsen (2002).

The Stinespring Theorem

Hence consider channels between quantum systems, i.e., $\mathcal{A} = \mathcal{B}(H_1)$ and $\mathcal{B} = \mathcal{B}(H_2)$. A fairly simple example (not necessarily unital) is given in terms of an operator $V : H_1 \to H_2$ by $\mathcal{B}(H_1) \ni A \mapsto VAV^* \in \mathcal{B}(H_2)$. A second example is the restriction to a subsystem, which is given in the Heisenberg picture by $\mathcal{B}(H) \ni A \mapsto A \otimes 1_K \in \mathcal{B}(H \otimes K)$. Finally the composition $S \circ T = ST$ of two channels is again a channel. The following theorem says that each channel can be represented as a composition of these two examples [7].

Theorem 2 (Stinespring dilation theorem). Every completely positive map $T : \mathcal{B}(H_1) \to \mathcal{B}(H_2)$ has the form

$$T(A) = V^*(A \otimes 1_K)V \qquad [1]$$

with an additional Hilbert space $K$ and an operator $V : H_2 \to H_1 \otimes K$. Both (i.e., $K$ and $V$) can be chosen such that the span of all $(A \otimes 1)V\phi$ with $A \in \mathcal{B}(H_1)$ and $\phi \in H_2$ is dense in $H_1 \otimes K$. This particular decomposition is unique (up to unitary equivalence) and is called the minimal decomposition.

By introducing a family $|\chi_j\rangle\langle\chi_j|$ of one-dimensional projectors with $\sum_j |\chi_j\rangle\langle\chi_j| = 1$, we can define the Kraus operators $\langle\psi, V_j\phi\rangle = \langle\psi \otimes \chi_j, V\phi\rangle$. In terms of these, we can rewrite eqn [1] in the following form (Kraus 1983):

Corollary 3 (Kraus form). Every CP map $T : \mathcal{B}(H_1) \to \mathcal{B}(H_2)$ can be written in the form

$$T(A) = \sum_{j=1}^{N} V_j^* A V_j \qquad [2]$$

with operators $V_j : H_2 \to H_1$.

To get a third representation of channels, consider the Stinespring form [1] of $T$ and a vector $\psi \in K$ such that $U(\phi \otimes \psi) = V(\phi)$ can be extended to a unitary map $U : H \otimes K \to H \otimes K$. It is then easy to see that the dual $T^*$ of $T$ can be written as:

Corollary 4 (Ancilla form). Assume that $T : \mathcal{B}(H) \to \mathcal{B}(H)$ is a channel. Then there is a Hilbert space $K$, a pure state $\rho_0$, and a unitary map $U : H \otimes K \to H \otimes K$ such that

$$T^*(\rho) = \mathrm{tr}_K\big(U(\rho \otimes \rho_0)U^*\big) \qquad [3]$$

holds.

This representation of a channel has a (seemingly) very nice physical interpretation, because we can look at eqn [3] as the unitary interaction of the system with an unobservable environment, which is initially in the state $\rho_0$. The problem, however, is that there is a great arbitrariness in the choice of $U$ and $\rho_0$. This is the weakness of the ancilla form compared to the Stinespring representation.

Finally, let us state a related result. It characterizes all decompositions of a given completely positive map into completely positive summands. By analogy with results for states on abelian algebras (i.e., probability measures), we will call it a Radon–Nikodym theorem (see Arveson (1969) for a proof).

Theorem 5 (Radon–Nikodym theorem). Let $T_x : \mathcal{B}(H_1) \to \mathcal{B}(H_2)$, $x \in X$ be a family of CP maps and let $V : H_2 \to H_1 \otimes K$ be the Stinespring operator of $T = \sum_x T_x$; then there are uniquely determined positive operators $F_x$ in $\mathcal{B}(K)$ with $\sum_x F_x = 1$ and

$$T_x(A) = V^*(A \otimes F_x)V \qquad [4]$$

The Jamiołkowski Isomorphism

The subject of this section is a relation between CP maps and states of bipartite systems, first discovered by Jamiołkowski (1972), and which is very useful in translating properties of bipartite systems into properties of positive maps and vice versa.

The idea is based on the following setup. Alice and Bob share a bipartite system in a maximally entangled state

$$\Psi = \frac{1}{\sqrt{d}} \sum_{\alpha=1}^{d} e_\alpha \otimes e_\alpha \in H \otimes H \qquad [5]$$

(where $e_1, \ldots, e_d$ denote an orthonormal basis of $H$). Alice applies to her subsystem a channel $T : \mathcal{B}(H) \to \mathcal{B}(H')$ while Bob does nothing. At the end of the processing, the overall system ends up in a state

$$R_T = (T^* \otimes \mathrm{Id})|\Psi\rangle\langle\Psi| \qquad [6]$$

Mathematically, eqn [6] makes sense if $T$ is only linear but not necessarily positive or CP (but then $R_T$ is not positive either). If we denote the space of all linear maps from $\mathcal{B}(H)$ into $\mathcal{B}(H')$ by $L$, we get a map

$$L \ni T \mapsto R_T \in \mathcal{B}(K \otimes H) \qquad [7]$$

which is easily shown to be linear (i.e., $R_{\lambda T + \mu S} = \lambda R_T + \mu R_S$ for all $\lambda, \mu \in \mathbb{C}$ and all $T, S \in L$). Furthermore, this map is bijective, hence a linear isomorphism.
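A minimal Python sketch of the Kraus form [2] and of the duality between the Heisenberg and Schrödinger pictures may help here. It assumes a concrete example that is not taken from the text: the qubit amplitude-damping channel with a hypothetical damping parameter `gamma`. All function names are illustrative. The sketch checks unitality, T(1) = 1, and the identity tr(ρ T(A)) = tr(T*(ρ) A).

```python
import math

def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def adjoint(a):
    """Conjugate transpose."""
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def zeros(n):
    return [[0j] * n for _ in range(n)]

def kraus_operators(gamma):
    """Assumed example: amplitude damping, with V_0^* V_0 + V_1^* V_1 = 1."""
    v0 = [[1.0, 0.0], [0.0, math.sqrt(1.0 - gamma)]]
    v1 = [[0.0, math.sqrt(gamma)], [0.0, 0.0]]
    return [v0, v1]

def heisenberg(kraus, A):
    """Kraus form [2]: T(A) = sum_j V_j^* A V_j."""
    out = zeros(len(A))
    for v in kraus:
        out = add(out, matmul(adjoint(v), matmul(A, v)))
    return out

def schroedinger(kraus, rho):
    """Dual map on states: T^*(rho) = sum_j V_j rho V_j^*."""
    out = zeros(len(rho))
    for v in kraus:
        out = add(out, matmul(v, matmul(rho, adjoint(v))))
    return out

kraus = kraus_operators(gamma=0.3)
identity = [[1.0, 0.0], [0.0, 1.0]]
rho = [[0.25, 0.1j], [-0.1j, 0.75]]   # a density matrix (Hermitian, trace 1, positive)
A = [[0.6, 0.2], [0.2, 0.1]]          # an effect (0 <= A <= 1)

T_of_one = heisenberg(kraus, identity)                     # should equal the identity
lhs = trace(matmul(rho, heisenberg(kraus, A)))             # tr(rho T(A))
rhs = trace(matmul(schroedinger(kraus, rho), A))           # tr(T^*(rho) A)
```

The two traces agree by cyclicity of the trace, which is the statement that both pictures give the same measurement statistics.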

Theorem 6 The map defined in eqns [7] and [6] is a linear isomorphism. The inverse map is given by

$$\mathcal{B}(H \otimes H') \ni \rho \mapsto T_\rho \in L \qquad [8]$$

with

$$\langle e'_\mu, T_\rho^*(\sigma)\, e'_\nu \rangle = d\, \mathrm{tr}\big(\rho\, |e'_\nu\rangle\langle e'_\mu| \otimes \sigma^T\big) \qquad [9]$$

where $e'_1, \ldots, e'_{d'} \in H'$ denote an (arbitrary) orthonormal basis of $H'$ and the transposition of $\sigma$ is defined with respect to the basis $e_\alpha$, $\alpha = 1, \ldots, d$ used to define $\Psi$ in [5].

From the definition of $R_T$ in eqn [6], it is obvious that $R_T$ is positive, if $T$ is CP. To see that the converse is also true is not as trivial (because a transposition is involved), but it requires only a short calculation, which is omitted here. Hence, we get:

Corollary 7 The operator $R_T$ is positive, iff the map $T$ is CP.

Examples

Let us return now to the general case (i.e., arbitrary input and output algebras) and discuss several examples.

Channels Under Symmetry

It is often useful to consider channels with special symmetry properties. To be more precise, consider a group $G$ and two unitary representations $\pi_1$, $\pi_2$ on the Hilbert spaces $H_1$ and $H_2$, respectively. A channel $T : \mathcal{B}(H_1) \to \mathcal{B}(H_2)$ is called covariant (with respect to $\pi_1$ and $\pi_2$) if

$$T[\pi_1(U) A \pi_1(U)^*] = \pi_2(U) T[A] \pi_2(U)^* \quad \forall A \in \mathcal{B}(H_1)\ \forall U \in G \qquad [10]$$

holds. The general structure of covariant channels is governed by a fairly powerful variant of Stinespring's theorem (Keyl and Werner 1999).

Theorem 8 Let $G$ be a group with finite-dimensional unitary representations $\pi_j : G \to U(H_j)$ and $T : \mathcal{B}(H_1) \to \mathcal{B}(H_2)$ a $\pi_1$, $\pi_2$-covariant channel.
(i) Then there is a finite-dimensional unitary representation $\tilde\pi : G \to U(K)$ and an operator $V : H_2 \to H_1 \otimes K$ with $V\pi_2(U) = \pi_1(U) \otimes \tilde\pi(U) V$ and $T(A) = V^*(A \otimes 1)V$.
(ii) If $T = \sum_\alpha T^\alpha$ is a decomposition of $T$ in CP and covariant summands, there is a decomposition $1 = \sum_\alpha F_\alpha$ of the identity operator on $K$ into positive operators $F_\alpha \in \mathcal{B}(K)$ with $[F_\alpha, \tilde\pi(g)] = 0$ such that $T^\alpha(X) = V^*(X \otimes F_\alpha)V$.

The most prominent examples of covariant channels arise with $H_1 = H_2 = \mathbb{C}^d$, $G = U(d)$ and $\pi_1(U) = \pi_2(U) = U$. All channels of this type are of the form

$$T(A) = (1 - \vartheta)A + \vartheta d^{-1}\, \mathrm{tr}(A) 1 \quad \text{with } \vartheta \in [0, d^2/(d^2 - 1)] \qquad [11]$$

and are known as depolarizing channels. They often serve as a standard model for noise. Two particular cases are the ideal channel arising with $\vartheta = 0$, and the completely depolarizing channel ($\vartheta = 1$) which erases all information. If we choose $\pi_2(U) = \bar{U}$ (where the bar denotes complex conjugate) instead of $\pi_2(U) = U$, we get

$$T(A) = \frac{\vartheta}{d + 1}\big[\mathrm{tr}(A) 1 + A^T\big] + \frac{1 - \vartheta}{d - 1}\big[\mathrm{tr}(A) 1 - A^T\big], \quad \vartheta \in [0, 1] \qquad [12]$$

If we map these channels to states of bipartite systems (using the Jamiołkowski isomorphism from the last section), we get Isotropic states from eqn [11] and Werner states from [12].

Classical Channels

The classical analog to a quantum operation is a channel $T : C(X) \to C(Y)$ which describes the transmission or manipulation of classical information. As already mentioned in the subsection "Completely positive maps," positivity and complete positivity are equivalent in this case. Hence, we have to assume only that $T$ is positive and unital. Obviously, $T$ is characterized by its matrix elements $T_{xy} = \delta_y(Te_x)$, where $\delta_y \in C^*(X)$ denotes the Dirac measure at $y \in Y$ and $e_x \in C(X)$ is the canonical basis in $C(X)$. More precisely, $\delta_y$ and $e_x$ denote, respectively, the probability distribution and the function on $X$, given by

$$\delta_y = (\delta_{xy})_{x \in X} \quad \text{and} \quad e_x(y) = \delta_{xy} \qquad [13]$$

We will keep this notation up to the end of this article. Positivity and normalization of $T$ imply that $0 \leq T_{xy} \leq 1$ and

$$1 = \delta_y(1) = \delta_y(T(1)) = \delta_y\Big[T\Big(\sum_x e_x\Big)\Big] = \sum_x T_{xy} \qquad [14]$$

holds. Hence the family $(T_{xy})_{x \in X}$ is a probability distribution on $X$ and $T_{xy}$ is, therefore, the transition probability to get the information $x \in X$ at the output side of the channel if $y \in Y$ was sent.
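To connect the depolarizing channels [11] with Corollary 7, the following Python sketch builds the operator R_T of eqn [6] for d = 2 and evaluates its expectation value in the maximally entangled vector of eqn [5]. It relies on the observation that the map [11] has the same form in both pictures, since tr(ρ T(A)) is symmetric in ρ and A; the function names and the sample values of ϑ are assumptions made for the illustration. A negative expectation value shows that R_T is not positive, so the map is not CP.

```python
def depolarize(A, theta):
    """Eqn [11]: T(A) = (1 - theta) A + theta tr(A) 1 / d."""
    d = len(A)
    tr = sum(A[i][i] for i in range(d))
    return [[(1.0 - theta) * A[i][j] + (theta * tr / d if i == j else 0.0)
             for j in range(d)] for i in range(d)]

def matrix_unit(d, a, b):
    """The operator |e_a><e_b|."""
    return [[1.0 if (i, j) == (a, b) else 0.0 for j in range(d)] for i in range(d)]

def jamiolkowski_operator(channel, d):
    """R = (channel (x) Id)|Psi><Psi| = (1/d) sum_ab channel(|e_a><e_b|) (x) |e_a><e_b|."""
    R = [[0.0] * (d * d) for _ in range(d * d)]
    for a in range(d):
        for b in range(d):
            block = channel(matrix_unit(d, a, b))
            for i in range(d):
                for j in range(d):
                    # composite index: (first factor, second factor) -> first * d + second
                    R[i * d + a][j * d + b] = block[i][j] / d
    return R

def expectation_in_psi(R, d):
    """<Psi, R Psi> for Psi = d^(-1/2) sum_a e_a (x) e_a."""
    return sum(R[a * d + a][b * d + b] for a in range(d) for b in range(d)) / d

def psi_expectation(theta, d=2):
    R = jamiolkowski_operator(lambda A: depolarize(A, theta), d)
    return expectation_in_psi(R, d)

ideal = psi_expectation(0.0)           # ideal channel
boundary = psi_expectation(4.0 / 3.0)  # upper end of the range in [11] for d = 2
outside = psi_expectation(1.5)         # beyond that range
```

For this family R_T = (1 − ϑ)|Ψ⟩⟨Ψ| + ϑ 1/d², so the expectation value equals 1 − ϑ(d² − 1)/d² and changes sign exactly at the upper bound d²/(d² − 1) stated in [11].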

   
Observables

Let us consider now a channel which transforms quantum information $\mathcal{B}(H)$ into classical information $C(X)$. Since positivity and complete positivity are again equivalent, we just have to look at a positive and unital map $E : C(X) \to \mathcal{B}(H)$. With the canonical basis $e_x$, $x \in X$, of $C(X)$, we get a family $E_x = E(e_x)$, $x \in X$, of positive operators $E_x \in \mathcal{B}(H)$ with $\sum_{x \in X} E_x = 1$. Hence, the $E_x$ form a positive operator valued (POV) measure, i.e., an observable. If, on the other hand, a POV measure $E_x \in \mathcal{B}(H)$, $x \in X$, is given, we can define a quantum-to-classical channel $E : C(X) \to \mathcal{B}(H)$ by

$$E(f) = \sum_{x \in X} f(x) E_x \qquad [15]$$

This shows that the observable $E_x$, $x \in X$, and the channel $E$ can be identified.

Preparations

Let us now exchange the role of $C(X)$ and $\mathcal{B}(H)$; in other words, let us consider a channel $R : \mathcal{B}(H) \to C(X)$ with a classical input and a quantum output algebra. In the Schrödinger picture, we get a family of density matrices $\rho_x := R^*(\delta_x) \in \mathcal{B}^*(H)$, $x \in X$, where $\delta_x \in C^*(X)$ denotes again the Dirac measure on $X$. Hence, we get a parameter-dependent preparation that can be used to encode the classical information $x \in X$ into the quantum information $\rho_x \in \mathcal{B}^*(H)$.

Instruments

An observable describes only the statistics of measuring results, but does not contain information about the state of the system after the measurement. To get a description which fills this gap, we have to consider channels which operate on quantum systems and produce hybrid systems as output, that is, $T : \mathcal{B}(H) \otimes C(X) \to \mathcal{B}(K)$. Following Davies (1976), we will call such an object an instrument. From $T$ we can derive the subchannel

$$C(X) \ni f \mapsto T(1 \otimes f) \in \mathcal{B}(K) \qquad [16]$$

which is the observable measured by $T$, that is, $\mathrm{tr}(\rho\, T(1 \otimes e_x))$ is the probability to measure $x \in X$ on systems in the state $\rho$. On the other hand, we get for each $x \in X$ a quantum channel (which is not unital)

$$\mathcal{B}(H) \ni A \mapsto T_x(A) = T(A \otimes e_x) \in \mathcal{B}(K) \qquad [17]$$

It describes the operation performed by the instrument $T$ if $x \in X$ was measured. More precisely, if a measurement on systems in the state $\rho$ gives the result $x \in X$, we get (up to normalization) the state $T_x^*(\rho)$ after the measurement, while

$$\mathrm{tr}(T_x^*(\rho)) = \mathrm{tr}(T_x^*(\rho) 1) = \mathrm{tr}(\rho\, T(1 \otimes e_x)) \qquad [18]$$

is (again) the probability to measure $x \in X$ on $\rho$. The instrument $T$ can be expressed in terms of the operations $T_x$ by

$$T(A \otimes f) = \sum_x f(x) T_x(A) \qquad [19]$$

Hence, we can identify $T$ with the family $T_x$, $x \in X$. Finally, we can consider the second marginal of $T$

$$\mathcal{B}(H) \ni A \mapsto T(A \otimes 1) = \sum_{x \in X} T_x(A) \in \mathcal{B}(K) \qquad [20]$$

It describes the operation we get if the outcome of the measurement is ignored.

The best-known example of an instrument is a von Neumann–Lüders measurement associated with a PV measure given by a family of projections $E_x$, $x = 1, \ldots, d$; for example, the eigenprojections of a self-adjoint operator $A \in \mathcal{B}(H)$. It is defined as the channel

$$T : \mathcal{B}(H) \otimes C(X) \to \mathcal{B}(H) \quad \text{with } X = \{1, \ldots, d\} \text{ and } T_x(A) = E_x A E_x \qquad [21]$$

Hence, we get the final state $\mathrm{tr}(E_x \rho)^{-1} E_x \rho E_x$ if we measure the value $x \in X$ on systems initially in the state $\rho$; this is well known from quantum mechanics.

Parameter-Dependent Operations

Let us change now the role of $\mathcal{B}(H) \otimes C(X)$ and $\mathcal{B}(K)$; in other words, consider a channel $T : \mathcal{B}(K) \to \mathcal{B}(H) \otimes C(X)$ with hybrid input and quantum output. It describes a device which changes the state of a system depending on the additional classical information. As for an instrument, $T$ decomposes into a family of (unital!) channels $T_x : \mathcal{B}(K) \to \mathcal{B}(H)$ such that we get $T^*(\rho \otimes p) = \sum_x p_x T_x^*(\rho)$ in the Schrödinger picture. Physically, $T$ describes a parameter-dependent operation: depending on the classical information $x \in X$, the quantum information $\rho \in \mathcal{B}^*(K)$ is transformed by the operation $T_x$.

Finally, we can consider a channel $T : \mathcal{B}(H) \otimes C(X) \to \mathcal{B}(K) \otimes C(Y)$ with hybrid input and output to get a parameter-dependent instrument: similarly to the above discussion, we can define a family of instruments $T_y : \mathcal{B}(H) \otimes C(X) \to \mathcal{B}(K)$, $y \in Y$, by the equation $T^*(\rho \otimes p) = \sum_y p_y T_y^*(\rho)$. Physically, $T$ describes the following device: it receives the classical information $y \in Y$ and a quantum system in the state $\rho \in \mathcal{B}^*(K)$ as input. Depending on $y$, a measurement with the instrument $T_y$ is performed, which in turn produces the measuring value $x \in X$ and leaves the quantum system in the state (up to normalization) $T_{y,x}^*(\rho)$; with $T_{y,x}$ given as in eqn [17] by $T_{y,x}(A) = T_y(A \otimes e_x)$.

See also: Capacities Enhanced by Entanglement; Capacity for Quantum Information; Entanglement; Optimal Cloning of Quantum States; Positive Maps on C*-Algebras; Quantum Channels: Classical Capacity; Quantum Dynamical Semigroups; Quantum Entropy; Quantum Spin Systems; Source Coding in Quantum Information Theory.

Further Reading

Arveson W (1969) Subalgebras of C*-algebras. Acta Mathematica 123: 141–224.
Davies EB (1976) Quantum Theory of Open Systems. London: Academic Press.
Jamiołkowski A (1972) Linear transformations which preserve trace and positive semidefiniteness of operators. Reports on Mathematical Physics 3: 275–278.
Keyl M and Werner RF (1999) Optimal cloning of pure states, testing single clones. Journal of Mathematical Physics 40: 3283–3299.
Kraus K (1983) States, Effects and Operations. Berlin: Springer.
Paulsen VI (2002) Completely Bounded Maps and Dilations. Cambridge: Cambridge University Press.
Stinespring WF (1955) Positive functions on C*-algebras. Proceedings of the American Mathematical Society 6: 211–216.

Chaos and Attractors

R Gilmore, Drexel University, Philadelphia, PA, USA
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Chaos is a type of behavior that can be exhibited by a large class of physical systems and their mathematical models. These systems are deterministic. They are modeled by sets of coupled nonlinear ordinary differential equations (ODEs):

$$\dot{x}_i = \frac{dx_i}{dt} = f_i(x; c) \qquad [1]$$

called dynamical systems. The coordinates $x$ designate points in a state space or phase space. Typically, $x \in \mathbb{R}^n$ or some n-dimensional manifold for some $n \geq 3$, and $c \in \mathbb{R}^k$ are called control parameters. They describe parameters that can be controlled in physical systems, such as pumping rates in lasers or flow rates in chemical mixing reactions. The most important mathematical property of dynamical systems is the uniqueness theorem, which states that there is a unique trajectory through every point at which $f(x; c)$ is continuous and Lipschitz and $f(x; c) \neq 0$. In particular, two distinct periodic orbits cannot have any points in common.

The properties of dynamical systems are governed, in lowest order, by the number, stability, and distribution of their fixed points, defined by $\dot{x}_i = f_i(x; c) = 0$. It can happen that a dynamical system has no stable fixed points and no stable limit cycles ($x(t) = x(t + T)$, some $T > 0$, all $t$). In such cases, if the solution is bounded and recurrent but not periodic, it represents an unfamiliar type of attractor. If the system exhibits sensitivity to initial conditions ($|x(t) - y(t)| \approx e^{\lambda t}|x(0) - y(0)|$ for $|x(0) - y(0)| = \epsilon$ and $\lambda > 0$ for most $x(0)$), the solution set is called a chaotic attractor. If the attractor has fractal structure, it is called a strange attractor.

Tools to study strange attractors have been developed that depend on three types of mathematics: geometry, dynamics, and topology.

Geometric tools attempt to study the metric relations among points in a strange attractor. These include a spectrum of fractal dimensions. These real numbers are difficult to compute, require very long, very clean data sets, provide a number without error estimates for which there is no underlying statistical theory, and provide very little information about the attractor.

Dynamical tools include estimation of Lyapunov exponents and a Lyapunov dimension. They include globally averaged exponents and local Lyapunov exponents. These are eigenvalues related to the different stretching ($\lambda > 0$) and squeezing ($\lambda < 0$) eigendirections in the phase space. To each globally averaged Lyapunov exponent $\lambda_i$, $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$, there corresponds a partial dimension $\epsilon_i$, $0 \leq \epsilon_i \leq 1$, with $\epsilon_i = 1$ if $\lambda_i \geq 0$. The Lyapunov dimension is the sum of the partial dimensions $d_L = \sum_{i=1}^{n} \epsilon_i$. That the partial dimension $\epsilon_i = 1$ for $\lambda_i \geq 0$ indicates that the flow is smooth in the stretching ($\lambda_i > 0$) and flow directions and fractal in the squeezing ($\lambda_i < 0$) directions with $\epsilon_i < 1$. Dynamical indices provide some useful information about a strange attractor. In particular, they can be used to estimate some fractal properties of a strange attractor, but not vice versa.

Topological tools are very powerful for a restricted class of dynamical systems. These are dynamical systems in three dimensions ($n = 3$). For such systems there are three Lyapunov exponents $\lambda_1 > \lambda_2 > \lambda_3$, with $\lambda_1 > 0$ describing the stretching direction and responsible for sensitivity to initial conditions, $\lambda_2 = 0$ describing the direction of the flow, and $\lambda_3 < 0$ describing the squeezing direction

and responsible for recurrence. Strange attractors are generated by dissipative dynamical systems, which satisfy the additional condition $\lambda_1 + \lambda_2 + \lambda_3 < 0$. For such attractors, $\epsilon_1 = \epsilon_2 = 1$ and $\epsilon_3 = \lambda_1/|\lambda_3|$ by the Kaplan–Yorke conjecture, so that $d_L = 2 + \epsilon_3 = 2 + \lambda_1/|\lambda_3|$.

A number of tools from classical topology have been exploited to probe the structure of strange attractors in three dimensions. These include the Gauss linking number, the Euler characteristic, the Poincaré–Hopf index theorem, and braid theory. More recent topological contributions include several definitions for entropy, the development of a theory for knot holders or braid holders (also called branched manifolds), the Birman–Williams theorem for these objects, and relative rotation rates, a topological index for individual periodic orbits and orbit pairs.

Three-dimensional strange attractors are remarkably well understood; those in higher dimensions are not. As a result, the description that follows is largely restricted to strange attractors with $d_L < 3$ that exist in $\mathbb{R}^3$ or other three-dimensional manifolds (e.g., $\mathbb{R}^2 \times S^1$). The obstacle to progress in higher dimensions is the lack of a higher-dimensional analog of the Gauss linking number for orbit pairs in $\mathbb{R}^3$.

Overview

The program described below has two objectives:

1. classify the global topological structure of strange attractors in $\mathbb{R}^3$; and
2. determine the perestroikas (changes) that such attractors can undergo as experimental conditions or control parameters change.

Four levels of structure are required to complete this program. Each is topological and discretely quantifiable. This provides a beautiful interaction between a rigidity of structure, demanded by topological constraints, and freedom within this rigidity. These four levels of structure are:

1. basis sets of orbits,
2. branched manifolds or knot holders,
3. bounding tori, and
4. embeddings of bounding tori.

Branched Manifolds: Stretching and Squeezing

A strange attractor is generated by the repetition of two mechanisms: stretching and squeezing. Stretching occurs in the directions identified by the positive Lyapunov exponents and squeezing occurs in the directions identified by the negative Lyapunov exponents. In $\mathbb{R}^3$ there is one stretching direction and one squeezing direction.

A simple stretch-and-squeeze mechanism that nature appears to be very fond of is illustrated in Figure 1. In this illustration, a cube of initial conditions at (a) is advected by the flow in a short time to (b). During this process, the cube is deformed by being stretched ($\lambda_1 > 0$). It also shrinks in a transverse direction ($\lambda_3 < 0$). During the initial phase of this deformation, two nearby points typically separate exponentially in time. If they were to continue to separate exponentially for all times, the invariant set would not be bounded. Therefore, this separation cannot continue indefinitely, and in fact it must somehow reverse itself after some time because the motion is recurrent. The mechanism shown in Figure 1 involves folding, which begins between (b) and (c) and continues through to (d). Squeezing occurs where points from distant parts of the attractor approach each other exponentially, as at (d). Finally, the cube, shown deformed at (d), returns to the neighborhood of initial conditions (a). This process repeats itself and builds up the strange attractor. As can be inferred from this figure, the strange attractor constructed by the repetitive process is smooth in the expanding ($\lambda_1$) and flow ($\lambda_2 = 0$) directions but fractal in the squeezing ($\lambda_3$) direction. The attractor's fractal dimension is $\epsilon_1 + \epsilon_2 + \epsilon_3 = 2 + \epsilon_3 = 2 + \lambda_1/|\lambda_3|$.

Figure 1 A common stretch-and-fold mechanism generates many experimentally observed strange attractors. The Topology of Chaos; R Gilmore and M Lefranc; Copyright 2002, Wiley. This material is used by permission of John Wiley & Sons, Inc.

Figure 1 summarizes the boundedness and recurrence conditions that were introduced to define strange attractors, and illustrates one stretching and squeezing mechanism that occurs repetitively to build up the fractal structure of the strange attractor

and to organize all the (unstable) periodic orbits in it in a unique way. The particular mechanism shown in Figure 1 is called a stretch-and-fold mechanism. Other mechanisms involve stretch and roll, and tear and squeeze.

The stretch-and-squeeze mechanisms are well summarized by the cartoons shown in Figure 2. On the left, a cube of initial conditions (top) is deformed under the flow. The flow is downward. Stretching occurs in one direction (horizontal) and shrinking occurs in a transverse direction (perpendicular to the page). In the limit of extreme shrinking ($\lambda_3 \to -\infty$), the dynamics of the stretching part of the flow is represented by the two-dimensional surface shown on the bottom left. This surface fails to be a manifold because of the singularity, called a splitting point. This singularity represents an initial condition that flows to an unstable fixed point with at least one stable direction. On the right (squeezing), two distant cubes of initial conditions (top) in the flow are deformed and brought to each other's proximity under the flow (middle). In the limit of extreme dissipation, two two-dimensional surfaces representing inflows are joined at a branch line to a single surface representing an outflow. This surface fails to be a manifold because of the branch line, which is a singularity of a different kind. Points below the branch line in this representation of the flow (on the outflow side of the branch line) have two preimages above the branch line, one in each inflow sheet. This structure generates positive entropy.

Figure 2 Left: The stretch mechanism is modeled by a two-dimensional surface with a splitting point singularity. Right: The squeeze mechanism is modeled by a two-dimensional surface with a branch line singularity. The Topology of Chaos; R Gilmore and M Lefranc; Copyright 2002, Wiley. This material is used by permission of John Wiley & Sons, Inc.

A beautiful theorem of Birman and Williams justifies the use of the two cartoons shown at the bottom of Figure 2 to characterize strange attractors in $\mathbb{R}^3$. As preparation for the theorem, Birman and Williams introduced an important identification for the nongeneric or atypical points that are not sensitive to initial conditions

$$x \sim y \quad \text{if} \quad |x(t) - y(t)| \xrightarrow{\,t \to \infty\,} 0 \qquad [2]$$

That is, two points in a strange attractor are identified if they have asymptotically the same future. In practice, this amounts to projecting the flow down along the stable ($\lambda_3 < 0$) direction onto a two-dimensional surface described by the stretching ($\lambda_1 > 0$) and the flow ($\lambda_2 = 0$) directions. This surface is not a manifold because of lower-dimensional singularities: splitting points and branch lines. The two-dimensional surface has many names, for example, knot holder (because it holds the periodic orbits that exist in abundance in strange attractors), braid holders, templates, branched manifolds. The flow, restricted to this surface, is called a semiflow. Under the semiflow, points in the branched manifold have a unique future but do not have a unique past. The degree of nonuniqueness is measured by the topological entropy of the dynamical system. The Birman–Williams theorem is:

Theorem Assume that a flow $\Phi_t$
(i) on $\mathbb{R}^3$ is dissipative ($\lambda_1 > 0$, $\lambda_2 = 0$, $\lambda_3 < 0$ and $\lambda_1 + \lambda_2 + \lambda_3 < 0$);
(ii) generates a hyperbolic strange attractor (the eigenvectors of the local Lyapunov exponents $\lambda_1, \lambda_2, \lambda_3$ span everywhere on the attractor).
Then the projection [2] maps the strange attractor SA to a branched manifold BM and the flow $\Phi_t$ on SA to a semiflow $\bar{\Phi}_t$ on BM in $\mathbb{R}^3$. The periodic orbits in SA under $\Phi_t$ correspond 1:1 with the periodic orbits in BM under $\bar{\Phi}_t$ with perhaps one or two specified exceptions. On any finite subset of orbits the correspondence can be taken via isotopy.

The beauty of this theorem is that it guarantees that a flow $\Phi_t$ that generates a (fractal) strange attractor SA can be continuously deformed to a new flow $\bar{\Phi}_t$ on a simple two-dimensional structure BM. During this deformation, periodic orbits are neither created nor destroyed. The uniqueness theorem for ODEs is satisfied during the deformation, so orbit segments do not pass through each other. As a result, the topological organization of all the

unstable periodic orbits in the strange attractor is the same as the topological organization of all the unstable periodic orbits in the branched manifold. In fact, the branched manifold (knot holder) defines the topological organization of all the unstable periodic orbits that it supports. Topological organization is defined by the Gauss linking number and the relative rotation rates, another braid index.

The significance of this theorem is that strange attractors can be characterized, in fact classified, by their branched manifolds. Figure 3 shows a branched manifold for a figure-8 knot as well as the figure-8 knot itself (dark curve). If a constant current is sent through a conducting wire tied into the shape of a figure-8 knot, a discrete countable set of magnetic field lines will be closed. These closed field lines can be deformed onto the two-dimensional surface shown in Figure 3. Each of the eight branches of this branched manifold can be named. One way to do this specifies the two branch lines that are joined by the branch in the sense of the flow (e.g., $(a\alpha)$ and $(\alpha\beta)$, but not $(\alpha a)$). Every closed field line can be labeled by a symbol sequence that is unique up to cyclic permutation. This symbol sequence provides a symbolic name for the orbit. For example, $(a\alpha)(\alpha\beta)(\beta b)(ba)$ is a period-4 orbit.

The structure of a branched manifold is determined in part by a transition matrix $T$. The matrix element $T_{ij}$ is 1 if the transition from branch $i$ to branch $j$ is allowed, 0 otherwise. The transition matrix for the figure-8 branched manifold is shown in Figure 3.

The Birman–Williams theorem is stronger than its statement suggests. More systems satisfy the statement of the theorem than do the assumptions of the theorem. The figure-8 knot, and its attendant magnetic field, is not dissipative; in fact, it is not even a dynamical system, yet the closed loops can be isotoped to the figure-8 knot holder. There are other ways in which the Birman–Williams theorem is stronger than its statement suggests.

It is apparent from Figure 3 that the figure-8 branched manifold can be built up Lego fashion from the two basic building blocks shown in Figure 2. This is more generally true. Every branched manifold can be built up, Lego fashion, from the stretch (with a splitting point singularity) and the squeeze (with a branch line singularity) building blocks, subject to the following two conditions:

1. outputs flow to inputs and
2. there are no free ends.

The figure-8 branched manifold is built up from four stretch and four squeeze building blocks. As a result, there are eight branches and four branch lines.

Two often-studied strange attractors are shown in Figures 4 and 5. Figure 4 shows the details of the Rössler dynamical system. A similar spectrum of features is shown in Figure 5 for the Lorenz equations. The knot holder in Figure 5e is obtained from the caricature in Figure 5d by twisting the right-hand lobe by $\pi$ radians.

Branched manifolds can be used to characterize all three-dimensional strange attractors. Branched manifolds that classify the strange attractors generated by four familiar sets of equations (for some control parameter values) are shown in Figure 6. The sets of equations, and one set of parameter values that generate strange attractors, are presented in Table 1.

The beauty of this topological classification of strange attractors is that it is apparent, just by inspection, that there is no smooth change of variables that will map any of these systems to any of the others for the parameter values shown.

Branched manifolds can be described algebraically. In Figure 7 we provide the algebraic

Figure 3 Figure-8 knot (dark curve) and the figure-8 branched manifold. Transition matrix for the eight branches of the figure-8 branched manifold is also shown. Flow direction is shown by arrows. The Topology of Chaos; R Gilmore and M Lefranc; Copyright 2002, Wiley. This material is used by permission of John Wiley & Sons, Inc.

Transition matrix of Figure 3 (rows and columns in the order $ab$, $a\alpha$, $\alpha\beta$, $\alpha a$, $ba$, $b\beta$, $\beta\alpha$, $\beta b$):

$$T = \begin{pmatrix}
0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 & 0
\end{pmatrix}$$
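The role of the transition matrix can be illustrated with a short Python sketch. It uses the 0/1 entries of the Figure 3 matrix in the row order printed there; the branches are referred to by their 0-based row index because the sketch does not depend on the branch names. The function names are illustrative. The trace of the p-th power counts the allowed cyclic symbol sequences of length p (each counted once per starting point), and because every row sums to 2 the largest eigenvalue is 2, which suggests an exponential growth rate of log 2 for this branched manifold.

```python
import math

# Transition matrix of the figure-8 branched manifold, entries as printed in Figure 3.
T = [
    [0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0],
]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def closed_sequences(matrix, p):
    """tr(matrix^p): allowed cyclic symbol sequences of length p, counted with starting point."""
    power = matrix
    for _ in range(p - 1):
        power = matmul(power, matrix)
    return sum(power[i][i] for i in range(len(matrix)))

def is_allowed_cycle(matrix, branches):
    """True if every transition in the cyclic sequence of branch indices is allowed."""
    return all(matrix[i][j] == 1
               for i, j in zip(branches, branches[1:] + branches[:1]))

row_sums = [sum(row) for row in T]
counts = {p: closed_sequences(T, p) for p in range(1, 7)}
growth_rate = math.log(row_sums[0])          # log of the common row sum
period4_ok = is_allowed_cycle(T, [1, 2, 7, 4])
```

Only even periods occur for this matrix; along even p the quantity log(tr T^p)/p approaches log 2 from above.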

Figure 4 The Rössler dynamical system. (a) Rössler equations:

$$\frac{dx}{dt} = -y - z, \qquad \frac{dy}{dt} = x + ay, \qquad \frac{dz}{dt} = b + z(x - c)$$

(b) Time series $z(t)$ and $x(t)$ generated by these equations, and (c) projection of the strange attractor onto the x–y plane. (d) Caricature of the flow and (e) knot holder derived directly from the caricature. Control parameter values $(a, b, c) = (2.0, 4.0, 0.398)$. The Topology of Chaos; R Gilmore and M Lefranc; Copyright 2002, Wiley. This material is used by permission of John Wiley & Sons, Inc.
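A trajectory of the Rössler equations in Figure 4 can be generated with a few lines of Python. The sketch below uses a hand-written fourth-order Runge–Kutta step; the step size, the initial condition and the function names are assumptions made for the illustration. It also assumes that the three control parameter values quoted in the caption are to be read as a = 0.398, b = 2.0, c = 4.0, which is the assignment commonly used for this system in the literature; the triple as printed would set a = 2.0.

```python
def rossler(state, a=0.398, b=2.0, c=4.0):
    """Right-hand side of the Rossler equations (parameter assignment is an assumption)."""
    x, y, z = state
    return (-y - z, x + a * y, b + z * (x - c))

def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt * (p + 2.0 * q + 2.0 * r + w) / 6.0
                 for s, p, q, r, w in zip(state, k1, k2, k3, k4))

def trajectory(f, start, dt, steps):
    points = [start]
    for _ in range(steps):
        points.append(rk4_step(f, points[-1], dt))
    return points

points = trajectory(rossler, (1.0, 1.0, 0.0), dt=0.01, steps=20000)
attractor = points[5000:]   # discard the initial transient

x_range = (min(p[0] for p in attractor), max(p[0] for p in attractor))
z_range = (min(p[2] for p in attractor), max(p[2] for p in attractor))
```

Plotting `p[0]` against `p[1]` for the points in `attractor` gives a projection of the kind described in panel (c); the time series of panel (b) are the first and third components.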

Figure 5 (a) Lorenz equations:

$$\frac{dx}{dt} = -\sigma x + \sigma y, \qquad \frac{dy}{dt} = Rx - y - xz, \qquad \frac{dz}{dt} = -bz + xy$$

(b) Time series $x(t)$ and $z(t)$ generated by these equations, and (c) projection of the strange attractor onto the x–y plane. (d) Caricature of the flow and (e) knot holder derived directly from the caricature by rotating the right-hand lobe by $\pi$ radians. Control parameter values $(R, \sigma, b) = (26.0, 10.0, 8/3)$. The Topology of Chaos; R Gilmore and M Lefranc; Copyright 2002, Wiley. This material is used by permission of John Wiley & Sons, Inc.
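For the Lorenz equations of Figure 5, the Lyapunov dimension d_L = 2 + λ1/|λ3| can be estimated numerically. The Python sketch below is a rough illustration, not a production method: it estimates λ1 by following two nearby trajectories and rescaling their separation after every step, takes λ2 = 0, and obtains λ3 from the fact that the divergence of the Lorenz vector field is the constant −(σ + 1 + b), so that λ1 + λ2 + λ3 = −(σ + 1 + b). The step size, run length, initial condition and function names are assumptions.

```python
import math

SIGMA, R, B = 10.0, 26.0, 8.0 / 3.0   # (R, sigma, b) from the Figure 5 caption

def lorenz(state):
    x, y, z = state
    return (-SIGMA * x + SIGMA * y, R * x - y - x * z, -B * z + x * y)

def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt * (p + 2.0 * q + 2.0 * r + w) / 6.0
                 for s, p, q, r, w in zip(state, k1, k2, k3, k4))

def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def largest_exponent(f, start, dt=0.01, transient=2000, steps=20000, d0=1e-8):
    """Average exponential separation rate of two nearby trajectories."""
    ref = start
    for _ in range(transient):           # let the reference orbit settle onto the attractor
        ref = rk4_step(f, ref, dt)
    pert = (ref[0] + d0, ref[1], ref[2])
    log_sum = 0.0
    for _ in range(steps):
        ref = rk4_step(f, ref, dt)
        pert = rk4_step(f, pert, dt)
        d = distance(ref, pert)
        log_sum += math.log(d / d0)
        # pull the perturbed point back to distance d0 along the current separation
        pert = tuple(r + (p - r) * d0 / d for r, p in zip(ref, pert))
    return log_sum / (steps * dt)

lambda1 = largest_exponent(lorenz, (1.0, 1.0, 1.0))
lambda3 = -(SIGMA + 1.0 + B) - lambda1    # uses lambda2 = 0 and the constant divergence
d_L = 2.0 + lambda1 / abs(lambda3)
```

Because |λ3| is much larger than λ1 for these parameter values, the estimated dimension lies only slightly above 2, consistent with an attractor that is nearly a two-dimensional branched surface.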

description of two branched manifolds. Figure 7a shows the branched manifold that describes experimental data generated by many physical systems. The mechanism is a simple stretch-and-fold deformation with zero global torsion that generates a typical Smale horseshoe. There are two branches. The diagonal elements of the matrix identify the local torsion of the flow through the corresponding branch, measured in units of $\pi$. Branch 0 has no local torsion, and branch 1 shows a half-twist and has local torsion 1. The off-diagonal matrix elements are twice the linking number of the period-1 orbits in the corresponding pair of branches. Since the period-1 orbits in these two branches do not link, the off-diagonal matrix elements are 0. The period-1 orbits in the branches labeled 1 and 2 in Figure 7b have linking number 1, so the off-diagonal matrix elements are $T(1, 2) = T(2, 1) = 2 \times 1$. The array identifies the order (above, below) that the two branches are joined at the branch line, the smaller the value, the closer to the viewer. These two pieces of information, four integers in Figure 7a and eight in

Figure 7b, serve to determine the topological organization of all the unstable periodic orbits in any strange attractor with either branched manifold.
The periodic orbits are identified by a repeating symbol sequence of least period p, which is unique up to cyclic permutation. The symbol sequence consists of a string of integers, sequentially identifying the branches through which the orbit passes. For a branched manifold with two branches, there are two symbols. The number of orbits of period p, N(p), obeys the recursion relation

$$p\,N(p) = 2^p - \sum_{k \mid p,\; 1 \le k \le p/2} k\,N(k) \qquad [3]$$

Table 2 shows the number of orbits of period $p \le 20$ for the branched manifolds with two and three branches shown in Figure 7. The number of orbits of period p grows exponentially with p, and the limit $h_T = \lim_{p \to \infty} \log(N(p))/p$ defines the topological entropy $h_T$ for the branched manifold. The limits are ln 2 and ln 3 for the branched manifolds with two and three branches, respectively. The linking numbers of orbits up to period 5 in the Smale horseshoe branched manifold are shown in Table 3, which identifies each of the orbits by its symbol sequence (e.g., 00111).

Figure 6 Branched manifolds for four standard sets of equations: (a) Rössler equations, (b) periodically driven Duffing equations, (c) periodically driven van der Pol equations, and (d) Lorenz equations. The Topology of Chaos; R Gilmore and M Lefranc; Copyright 2002, Wiley. This material is used by permission of John Wiley & Sons, Inc.

Figure 7 Branched manifolds are described algebraically. The diagonal matrix elements describe the twist of each branch. The off-diagonal matrix elements are twice the linking number of the period-1 orbits in each of the two branches. The array describes the order in which the branches are connected at the branch line. (a) Smale horseshoe branched manifold. (b) Beginning of a gâteau roulé (jelly roll) branched manifold.

Table 1 Four sets of equations that generate strange attractors

Dynamical system   ODEs                                                   Parameter values
Rössler            $\dot{x} = -y - z$                                     $(a, b, c) = (2.0, 4.0, 0.398)$
                   $\dot{y} = x + ay$
                   $\dot{z} = b + z(x - c)$
Duffing            $\dot{x} = y$                                          $(\delta, A, \omega) = (0.4, 0.4, 1.0)$
                   $\dot{y} = -\delta y - x^3 + x + A \sin(\omega t)$
van der Pol        $\dot{x} = by + (c - dy^2)x$                           $(b, c, d, A, \omega) = (0.7, 1.0, 10.0, 0.25, \pi/2)$
                   $\dot{y} = -x + A \sin(\omega t)$
Lorenz             $\dot{x} = -\sigma x + \sigma y$                       $(R, \sigma, b) = (26.0, 10.0, 8/3)$
                   $\dot{y} = Rx - y - xz$
                   $\dot{z} = -bz + xy$

Table 2 Number of orbits of period p on the branched manifolds with two and three branches, shown in Figure 7. The integers $N_3(p)$ are constructed by replacing $2^p$ by $3^p$ in eqn [3]

Period p   Two branches N_2(p)   Three branches N_3(p)
 1                  2                        3
 2                  1                        3
 3                  2                        8
 4                  3                       18
 5                  6                       48
 6                  9                      116
 7                 18                      312
 8                 30                      810
 9                 56                     2184
10                 99                     5880
11                186                    16104
12                335                    44220
13                630                   122640
14               1161                   341484
15               2182                   956576
16               4080                  2690010
17               7710                  7596480
18              14532                 21522228
19              27954                 61171656
20              52377                174336264
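
The recursion relation [3] can be evaluated directly and compared with Table 2. The following is a minimal sketch; the function name `orbit_counts` and its arguments are illustrative and not part of the text. Passing 3 instead of 2 as the number of branches implements the replacement of $2^p$ by $3^p$ described in the caption of Table 2.

```python
import math


def orbit_counts(n_branches, max_period):
    """Return {p: N(p)} from p*N(p) = n**p - sum of k*N(k) over k | p, k <= p/2."""
    counts = {}
    for p in range(1, max_period + 1):
        # Points on orbits whose least period k is a proper divisor of p.
        shorter = sum(k * counts[k] for k in range(1, p // 2 + 1) if p % k == 0)
        counts[p] = (n_branches ** p - shorter) // p
    return counts


if __name__ == "__main__":
    n2 = orbit_counts(2, 20)
    n3 = orbit_counts(3, 20)
    for p in (1, 5, 10, 20):
        # log(N(p))/p approaches the topological entropy only slowly.
        print(p, n2[p], n3[p], math.log(n2[p]) / p, math.log(n3[p]) / p)
```

The integer division is exact here because the numerator counts the points of all orbits of least period p, each of which contributes p points.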

Table 3 Linking numbers of orbits to period 5 in the Smale horseshoe branched manifold with zero global torsion

Orbit Symbol     0    1  2_1  3_1  3_1  4_1  4_2  4_2  5_1  5_1  5_2  5_2  5_3  5_3
0     0          0    0    0    0    0    0    0    0    0    0    0    0    0    0
1     1          0    0    1    1    1    2    1    1    2    2    2    2    1    1
2_1   01         0    1    1    2    2    3    2    2    4    4    3    3    2    2
3_1   011        0    1    2    2    3    4    3    3    5    5    5    5    3    3
3_1   001        0    1    2    3    2    4    3    3    5    5    4    4    3    3
4_1   0111       0    2    3    4    4    5    4    4    8    8    7    7    4    4
4_2   0011       0    1    2    3    3    4    3    4    5    5    5    5    4    4
4_2   0001       0    1    2    3    3    4    4    3    5    5    5    5    4    4
5_1   01111      0    2    4    5    5    8    5    5    8   10    9    9    5    5
5_1   01101      0    2    4    5    5    8    5    5   10    8    8    8    5    5
5_2   00111      0    2    3    5    4    7    5    5    9    8    6    7    5    5
5_2   00101      0    2    3    5    4    7    5    5    9    8    7    6    5    5
5_3   00011      0    1    2    3    3    4    4    4    5    5    5    5    4    5
5_3   00001      0    1    2    3    3    4    4    4    5    5    5    5    5    4
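
Linking numbers such as those in Table 3 can be estimated for sampled closed curves by discretizing the Gauss linking integral, $\mathrm{Link}(A,B) = \frac{1}{4\pi}\oint\oint \frac{(\mathbf{r}_A - \mathbf{r}_B)\cdot(d\mathbf{r}_A \times d\mathbf{r}_B)}{|\mathbf{r}_A - \mathbf{r}_B|^3}$. The following is a minimal sketch, assuming each curve is given as a list of 3D points that closes on itself; the midpoint-rule discretization and the names `gauss_linking_number` and `circle` are illustrative choices, not a procedure stated in the text. The test curves are two plain circles, not orbits from any attractor.

```python
import math


def _segments(curve):
    """Midpoint and displacement of every edge of a closed polygon."""
    n = len(curve)
    out = []
    for i in range(n):
        p, q = curve[i], curve[(i + 1) % n]
        mid = tuple((a + b) / 2.0 for a, b in zip(p, q))
        step = tuple(b - a for a, b in zip(p, q))
        out.append((mid, step))
    return out


def gauss_linking_number(curve_a, curve_b):
    """Midpoint-rule estimate of the Gauss linking integral (a real number)."""
    seg_b = _segments(curve_b)
    total = 0.0
    for ra, da in _segments(curve_a):
        for rb, db in seg_b:
            r = (ra[0] - rb[0], ra[1] - rb[1], ra[2] - rb[2])
            cross = (
                da[1] * db[2] - da[2] * db[1],
                da[2] * db[0] - da[0] * db[2],
                da[0] * db[1] - da[1] * db[0],
            )
            dist = math.sqrt(r[0] ** 2 + r[1] ** 2 + r[2] ** 2)
            total += (r[0] * cross[0] + r[1] * cross[1] + r[2] * cross[2]) / dist ** 3
    return total / (4.0 * math.pi)


def circle(center, u, v, n):
    """n points on the unit circle spanned by orthonormal vectors u and v."""
    pts = []
    for i in range(n):
        t = 2.0 * math.pi * i / n
        pts.append(tuple(c + math.cos(t) * a + math.sin(t) * b
                         for c, a, b in zip(center, u, v)))
    return pts


if __name__ == "__main__":
    a = circle((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), 200)
    b = circle((1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0), 200)
    # Rounding recovers the integer; the sign depends on the orientations.
    print(round(gauss_linking_number(a, b)))
```

For noisy experimental segments the estimate is rounded to the nearest integer, and reversing the orientation of one curve flips its sign.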

Tables of linking numbers have been used successfully to identify mechanisms that nature uses to generate chaotic data. This analysis procedure is called topological analysis. Segments of data are identified that closely approximate unstable periodic orbits existing in the strange attractor. These data segments are then embedded in $R^3$. Each orbit is given a trial identification (symbol sequence). Their pairwise linking numbers are computed either by counting signed crossings or using the time-parametrized data segments and estimating the integers numerically using the Gauss linking integral

$$\mathrm{Link}(A, B) = \frac{1}{4\pi} \oint \oint \frac{(\mathbf{r}_A(t_1) - \mathbf{r}_B(t_2)) \cdot (d\mathbf{r}_A(t_1) \times d\mathbf{r}_B(t_2))}{|\mathbf{r}_A(t_1) - \mathbf{r}_B(t_2)|^3}$$

This table of experimental integers is compared with the table of linking numbers for orbits with the same symbolic name on a trial branched manifold. This procedure serves to identify the branched manifold and refine the symbolic identifications of the experimental orbits, if necessary. The procedure is vastly overdetermined. For example, the linking numbers of only three low-period orbits serve to identify the four pieces of information required to specify a branched manifold with two branches. Since six or more surrogate periodic orbits can typically be extracted from experimental data, providing $\binom{6}{2} = 15$ or more linking numbers, this topological analysis procedure has built-in self-consistency checks, unlike analysis procedures based on geometric and dynamical tools.

Basis Sets of Orbits

A branched manifold determines the topological organization of all the periodic orbits that it supports. Whenever a low-dimensional strange attractor is subjected to topological analysis, it is always the case that fewer periodic orbits are present and identified than are allowed by the branched manifold that classifies it. This is the case for strange attractors generated by experimental data as well as strange attractors generated by ODEs. The full spectrum occurs only in the hyperbolic limit, which has never been seen.
The orbits that are present are organized exactly as in the hyperbolic limit, that is, as determined by the underlying branched manifold. As control parameters change, the strange attractor undergoes perestroikas. New orbits are created and/or old orbits are annihilated in direct or inverse period-doubling and saddle-node bifurcations. The orbits that are present are always organized as determined by the branched manifold. Orbits are not created or annihilated independently of each other. Rather, there is a partial order (forcing order) involved in orbit creation and annihilation. This partial order is poorly understood for general branched manifolds. It is much better understood for the two-branch Smale horseshoe branched manifold.
The forcing diagram for this branched manifold is shown in Figure 8 for orbits up to period 8. It is typically the case that the existence of one orbit in a strange attractor forces the presence of a spectrum of additional orbits. Forcing is transitive, so if orbit A forces orbit B ($A \Rightarrow B$) and B forces C, then A forces C: if $A \Rightarrow B$ and $B \Rightarrow C$ then $A \Rightarrow C$. For this reason, it is sufficient to show only the first-order forcing in this figure. The orbits shown are labeled by their period and the order in which they are created in a particular highly dissipative limit of the dynamics: the logistic map (U-sequence order in Figure 8). For example, $5_2$ describes the second (pair) of period-5 orbits created in the

[Figure 8, panel (a): "Forcing of horseshoe orbits to period 8"; vertical axis: entropy, from 0.35 to ln 2; horizontal axis: U-sequence order; orbit classes: period doubled, well ordered, other finite order. Panel (b): U-sequence order $2_1, 4_1, 8_1, 6_1, 8_2, 7_1, 5_1, 7_2, 8_3, 3_1, 6_2, 8_4, 7_3, 8_5, 5_2, 8_6, 7_4, 8_7, 6_3, 8_8, 7_5, 4_2, 8_9, 7_6, 8_{10}, 6_4, 8_{11}, 7_7, 8_{12}, 5_3, 8_{13}, 7_8, 8_{14}, 6_5, 8_{15}, 7_9, 8_{16}$.]

Figure 8 (a) Forcing diagram for orbits up to period 8 in the Smale horseshoe branched manifold. (b) The sequence (universal order) in which orbits are created in the highly dissipative limit, which is the logistic map. The Topology of Chaos; R Gilmore and M Lefranc; Copyright 2002, Wiley. This material is used by permission of John Wiley & Sons, Inc.

logistic map in the transition from simple, nonchaotic behavior to fully chaotic (hyperbolic) behavior.
The orbits in the forcing diagram are organized according to their one-dimensional entropy (horizontal axis, U-sequence order) and their two-dimensional entropy (vertical axis). Nonchaotic (laminar) behavior occurs at the lower left of this figure, where both entropies are zero. Fully chaotic behavior occurs at the upper right, where both entropies are ln 2. As control parameters change, a dynamical system that can exhibit chaos generated by a stretch-and-fold mechanism follows a path in the forcing diagram from the lower left to the upper right. Each such path is a route to chaos. The Smale horseshoe mechanism exhibits many different routes to chaos: each follows a different path in the forcing diagram.
The state of a strange attractor at any stage in its route to chaos can be specified by a basis set of orbits. This is a set of orbits whose presence forces the existence of all other orbits that can concurrently be found in the attractor, up to any finite period. The basis set of orbits can be constructed algorithmically. The algorithm is as follows:

1. Write down all the orbits that are present in order of increasing two-dimensional entropy from left to right.
2. For orbits with the same two-dimensional entropy, order by increasing one-dimensional entropy.
3. Remove the highest (rightmost) orbit from this list, together with all the orbits that it forces. This is the first basis orbit.
4. Of the orbits remaining, again remove the rightmost and all the orbits that it forces. This is the second basis orbit.
5. Continue until all orbits have been removed.

For any finite period, the above algorithm terminates because there is only a finite number of orbits. For example, if the orbit $5_2$ is present as well as all orbits with lower one-dimensional entropy, the basis set is $8_7R$, $7_6$, $7_4F$, $8_6F$, $8_8$, $5_2$. As control parameters change, a strange attractor undergoes perestroikas that are quantitatively determined by changes in the basis sets of orbits.

Bounding Tori

As experimental conditions or control parameters change, strange attractors can undergo grosser perestroikas than those that can be described by a change in the basis set of orbits. This occurs when new orbits are created that cannot be contained on the initial branched manifold, for example, when orbits are created that must be described by a new symbol. This is seen experimentally in the transition from horseshoe type dynamics to gâteau roulé type dynamics. This involves the addition of a third branch to the branched manifold with two branches, as shown in Figures 7a and 7b. Strange attractors can undergo perestroikas described by the addition of new branches to, or deletion of old branches from, a branched manifold. These perestroikas are in a very real sense grosser than the perestroikas that can be described by changes in the basis sets of orbits on a fixed branched manifold.
There is a structure that provides constraints on the allowed bifurcations of branched manifolds (creation/annihilation of branches), which is analogous to the constraints that a branched manifold provides on the bifurcations and topological organization of the periodic orbits that can exist on it. This structure is called a bounding torus.
Bounding tori are constructed as follows. The semiflow on a branched manifold is inflated or blown up to a flow on a thin open set in $R^3$ containing this branched manifold. The boundary of this open set is a two-dimensional surface. Such surfaces have been classified. They are uniquely tori of genus g; g = 0 (sphere), g = 1 (tire tube), g = 2, 3, .... The torus of genus g has Euler characteristic $\chi = 2 - 2g$. The flow is into this surface. The flow, restricted to the surface, exhibits a singularity wherever it is normal to the surface. At such singularities the stability is determined by the local Lyapunov exponents: $\lambda_1 > 0$ and $\lambda_3 < 0$, since the flow direction ($\lambda_2 = 0$) is normal to the surface. As a result, all singularities are saddles; so, by the Poincaré-Hopf theorem, the number of singularities is strongly related to the genus. The number is $2(g - 1)$.
The flow, restricted to the genus-g surface, can be put into canonical form and these canonical forms can be classified. The classification involves projection of the genus-g torus onto a two-dimensional surface. The planar projection consists of a disk with outer boundary and g interior holes. All singularities can be placed on the interior holes. The flow on the interior holes without singularities is in the same direction as the flow on the exterior boundary. Interior holes with singularities have an even number, 4, 6, .... Some canonical forms are shown in Figure 9.
Poincaré sections have been used to simplify the study of flows in low-dimensional spaces by effectively reducing the dimension of the dynamics. In three dimensions, a Poincaré surface of section for a strange attractor is a minimal two-dimensional surface with the property that all points in the attractor intersect this surface transversally an infinite number of times under the flow. The Poincaré surface need not be connected and in fact is often not connected. The Poincaré section for the flow in a genus-g torus consists of the union of $g - 1$ disjoint disks ($g \ge 3$) or is a single disk (g = 1). The locations of the disks are determined algorithmically, as shown in Figure 9. The interior circles without singularities are labeled by capital letters A, B, C, ... and those with singularities are labeled with lowercase letters a, b, c, .... The components of the global Poincaré surface of section are numbered sequentially 1, 2, ..., $g - 1$, in the order they are encountered when traversing the outer boundary in the direction of the flow, starting from any point on that boundary. Each component of the global Poincaré surface of section connects (in the projection) an interior circle without singularities to the exterior boundary. There is one component between each successive encounter of the flow with

[Figure 9, panel labels: (a) ABCBDED, abbacca; (b) ABCDCBE, abccbaa; (c) ABCBDBE, abbccaa.]

Figure 9 Three inequivalent canonical forms of genus 8 are shown. Each is identified by a period-7 orbit and its dual. Reprinted figure with permission from Physical Review E, 69, 056206, 2004. Copyright (2004) by the American Physical Society.

holes that have singularities. Heavy lines are used to show the location of the seven components of the global Poincaré surface of section for each of the three inequivalent genus-8 canonical forms shown in Figure 9. The structure of the flow is summarized by a transition matrix. For the canonical form shown in Figure 9c the transition matrix is

$$T = \begin{bmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}$$

where $T_{i,j} = 1$ if the flow can proceed directly from component i to component j, 0 otherwise.
Bounding tori, dressed with flows, can be labeled. In fact, two dual labeling schemes are possible. Following the outer boundary in the direction of the flow, one encounters the $g - 1$ components of the global Poincaré surface of section sequentially, the interior holes without singularities at least once each, and the interior holes with singularities at least twice each. The canonical form (genus-g torus dressed with a flow) on the genus-8 bounding torus shown in Figure 9a can be labeled by the sequence in which the holes without singularities are encountered (ABCBDED) or the order in which the holes with singularities are encountered (abbacca). Both sequences contain $g - 1$ symbols. These labels are unique up to cyclic permutation. Symbol sequences for canonical forms for bounding tori act in many ways like symbol sequences for periodic orbits on branched manifolds. Although there is a 1:1 correspondence between bounded closed two-dimensional surfaces in $R^3$ and genus g, the number of canonical forms grows rapidly with g, as shown in Table 4. In fact, the number, N(g), grows exponentially and can even be assigned an entropy:

$$\lim_{g \to \infty} \frac{\ln N(g)}{g - 1} = \ln 3 \qquad [5]$$

In some sense, canonical forms that constrain branched manifolds within them behave like branched manifolds that constrain periodic orbits on them.
Every strange attractor that has been studied in $R^3$ has been described by a canonical bounding torus that contains it. This classification is shown in Table 5.
Branched manifold perestroikas are constrained by bounding tori as follows. Each branch line of any branched manifold can be moved into one of the $g - 1$ components of the global Poincaré surface of section. Any branched manifold contained in a genus-g bounding torus ($g \ge 3$) must have at least one branch between each pair of components of the global Poincaré surface of section between which the flow is allowed, as summarized by the canonical form's transition matrix. New branches can only be added in a way that is consistent with the canonical form's transition matrix, continuity requirements, and the no intersection condition.

Table 4 Number of canonical bounding tori as a function of genus g

 g    N(g)      g    N(g)      g     N(g)
 3       1      9      15     15     2211
 4       1     10      28     16     5549
 5       2     11      67     17    14290
 6       2     12     145     18    36824
 7       5     13     368     19    96347
 8       6     14     870     20   252927
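
The growth of N(g) in Table 4 can be compared with the limit in eqn [5]. The following is a minimal sketch that uses only the values listed in Table 4; the function names are illustrative. It computes both the finite-g estimate $\ln N(g)/(g-1)$ and the logarithm of the successive ratio $N(g+1)/N(g)$, which would tend to the same limit if the ratio converges.

```python
import math

# Values of N(g) transcribed from Table 4.
N = {
    3: 1, 4: 1, 5: 2, 6: 2, 7: 5, 8: 6,
    9: 15, 10: 28, 11: 67, 12: 145, 13: 368, 14: 870,
    15: 2211, 16: 5549, 17: 14290, 18: 36824, 19: 96347, 20: 252927,
}


def entropy_estimate(g):
    """Finite-g value of ln N(g) / (g - 1), the quantity in eqn [5]."""
    return math.log(N[g]) / (g - 1)


def growth_ratio(g):
    """Successive ratio N(g + 1) / N(g)."""
    return N[g + 1] / N[g]


if __name__ == "__main__":
    print("ln 3 =", math.log(3.0))
    for g in (10, 15, 19):
        print(g, entropy_estimate(g), math.log(growth_ratio(g)))
```

For the tabulated range both quantities remain below ln 3, so the table by itself illustrates the exponential growth rather than the limiting value.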

Table 5 All known strange attractors of dimension $d_L < 3$ are bounded by one of the standard dressed tori. Dual labels for the bounding tori depend on $g - 1$ symbols describing holes with or without singularities

Strange attractor                          Holes w/o singularities       Holes with singularities      Genus
Rössler, Duffing, Burke, and Shaw          A                                                           1
Various lasers, gâteau roulé               A                                                           1
Neuron with subthreshold oscillations      A                                                           1
Shaw-van der Pol                           A                                                           1
Lorenz, Shimizu-Morioka, Rikitake          AB                            aa                            3
C_2 covers of Rössler                      AB                            a^2                           3
C_2 cover of Lorenz (a)                    ABCD                          a^4                           5
C_2 cover of Lorenz (b)                    ABCB                          abba                          5
2 -> 1 image of figure-8 branched manifold ABCB                          ab(ab)^{-1}                   5
Figure-8 branched manifold                 AEBECEDE                      a^2 b^2 c^2 d^2               9
C_n covers of Rössler                      AB...N                        a^n                           n + 1
C_n cover of Lorenz (a)                    AB...(2N)                     a^{2n}                        2n + 1
C_n cover of Lorenz (b)                    (AZ)(BZ)...(NZ)               a^2 b^2 ... n^2               2n + 1
Multispiral attractors                     A(B...M)N(B...M)^{-1}         (ab...m)(ab...m)^{-1}         2m - 1

(a) Rotation axis through origin.
(b) Rotation axis through one focus.

In the simplest case, g = 1, a third branch can be added to a branched manifold with two branches only if its local torsion differs by $\pm 1$ from the adjacent branch. In addition, the ordering of the new branch must be consistent with the continuity and no intersection (ODE uniqueness theorem) requirements.

Embeddings of Bounding Tori

The last level of topological structure needed for the classification of strange attractors in $R^3$ describes their embeddings in $R^3$. The classification using genus-g bounding tori is intrinsic, that is, the canonical form shows how the flow looks from inside the torus. Strange attractors, and the tori that bound them, are actually embedded in $R^3$. For a complete classification, we must specify not only the canonical form but also how this form sits in $R^3$. This program has not yet been completed, but we illustrate it with the genus-1 bounding torus in Figure 10. Figure 10a shows the canonical form, and two different embeddings of it in $R^3$. The embedding on the left is unknotted. The embedding on the right is knotted like a figure-8 knot. Extrinsic embeddings of genus-1 tori are described by tame knots in $R^3$, and tame knots can be used as centerlines for extrinsically embedded genus-1 tori. Higher-genus ($g \ge 3$) canonical forms (intrinsic genus-g tori dressed with a canonical flow) have a larger (but discrete) variety of extrinsic embeddings in $R^3$.

The Embedding Question

The mechanism that nature uses to generate chaotic behavior in physical systems is not directly observable, and must be deduced by examining the data that are generated. Typically, the data consist of a single scalar time series that is discretely recorded: $x_i$, i = 1, 2, .... In order to exhibit a strange attractor, a mapping of the data into $R^N$ must also be constructed. If the attractor is low dimensional ($d_L < 3$), one can hope that a mapping into $R^3$ can be constructed that exhibits no self-intersections or other degeneracies. Such a map is called an embedding. Once an embedding in $R^3$ is available, a topological analysis can be carried out. The analysis reveals the mechanism that underlies the creation of the embedded strange attractor.
But how do you know that the mechanism that generates the observed, embedded strange attractor has anything to do with the mechanism nature used to generate the experimental data?
If the embedding is contained in a genus-1 bounding torus, then the topological mechanism that generates the data, as defined by some unknown branched manifold $BM_{EXP}$, and the topological mechanism that is identified from the embedded strange attractor, $BM_{EMB}$, are identical up to three degrees of freedom: parity, global torsion, and the knot type. As a result, in this case (genus-1) a topological analysis of embedded data does reveal nature's hidden secrets.
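
The text does not say which mapping of the scalar series $x_i$ into $R^3$ is used. One common choice, assumed here purely for illustration, is a time-delay map $x_i \mapsto (x_i, x_{i+\tau}, x_{i+2\tau})$. The following is a minimal sketch; the function name `delay_embed`, the dimension, and the lag are hypothetical parameters, and whether the result is a genuine embedding (free of self-intersections) must still be checked for the data at hand.

```python
def delay_embed(series, dim=3, lag=1):
    """Map a scalar series to vectors (x_i, x_{i+lag}, ..., x_{i+(dim-1)*lag})."""
    if dim < 1 or lag < 1:
        raise ValueError("dim and lag must be positive integers")
    n_vectors = len(series) - (dim - 1) * lag
    if n_vectors <= 0:
        raise ValueError("series too short for this dim and lag")
    return [
        tuple(series[i + j * lag] for j in range(dim))
        for i in range(n_vectors)
    ]


if __name__ == "__main__":
    data = [0, 1, 2, 3, 4, 5]          # stand-in for a recorded time series
    print(delay_embed(data, dim=3, lag=2))
```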

Figure 10 (a) Canonical form for genus-1 bounding torus. Extrinsic embeddings of the torus into $R^3$ that are (b) unknotted and (c) knotted like the figure-8 knot.

See also: Ergodic theory; Fractal dimensions in dynamics; Generic Properties of Dynamical Systems; Gravitational N-body Problem (Classical); Homeomorphisms and Diffeomorphisms of the Circle; Homoclinic phenomena; Inviscid Flows; Lyapunov Exponents and Strange Attractors; Nonequilibrium Statistical Mechanics (Stationary): Overview; Random Algebraic Geometry, Attractors and Flux Vacua; Random Matrix Theory in Physics; Regularization for Dynamical Zeta Functions; Singularity and Bifurcation Theory; Symmetry and Symmetry Breaking in Dynamical Systems; Synchronization of Chaos.

Further Reading
Abraham R and Shaw CD (1992) Dynamics: The Geometry of Behavior, Studies in Nonlinearity, 2nd edn. Reading, MA: Addison-Wesley.
Eckmann J-P and Ruelle D (1985) Ergodic theory of chaos and strange attractors. Reviews of Modern Physics 57(3): 617-656.
Gilmore R (1998) Topological analysis of chaotic dynamical systems. Reviews of Modern Physics 70(4): 1455-1529.
Gilmore R and Lefranc M (2002) The Topology of Chaos, Alice in Stretch and Squeezeland. New York: Wiley.
Gilmore R and Letellier C (2006) The Symmetry of Chaos: Alice in the Land of Mirrors. Oxford: Oxford University Press.
Gilmore R and Pei X (2001) The topology and organization of unstable periodic orbits in Hodgkin-Huxley models of receptors with subthreshold oscillations. In: Moss F and Gielen S (eds.) Handbook of Biological Physics, Neuro-informatics, Neural Modeling, vol. 4, pp. 155-203. Amsterdam: North-Holland.
Ott E (1993) Chaos in Dynamical Systems. Cambridge: Cambridge University Press.
Solari HG, Natiello MA, and Mindlin GB (1996) Nonlinear Physics and Its Mathematical Tools. Bristol: IoP Publishing.
Tufillaro NB, Abbott T, and Reilly J (1992) An Experimental Approach to Nonlinear Dynamics and Chaos. Reading, MA: Addison-Wesley.

Characteristic Classes
P B Gilkey, University of Oregon, Eugene, OR, USA
R Ivanova, University of Hawaii Hilo, Hilo, HI, USA
S Nikcevic, SANU, Belgrade, Serbia and Montenegro
© 2006 Elsevier Ltd. All rights reserved.

Vector Bundles
Let $\mathrm{Vect}_k(M, F)$ be the set of isomorphism classes of real ($F = R$) or complex ($F = C$) vector bundles of rank k over a smooth connected m-dimensional manifold M. Let

$$\mathrm{Vect}(M, F) = \bigcup_k \mathrm{Vect}_k(M, F)$$

Principal Bundles Examples
Let H be a Lie group. A fiber bundle

$$\pi : P \to M$$

with fiber H is said to be a principal bundle if there is a right action of H on P which acts transitively on the fibers, that is, if $P/H = M$. If H is a closed subgroup of a Lie group G, then the natural projection $G \to G/H$ is a principal H bundle over the homogeneous space $G/H$. Let O(k) and U(k) denote the orthogonal and unitary groups, respectively. Let $S^k$ denote the unit sphere in $R^{k+1}$. Then we have natural principal bundles:

$$O(k) \subset O(k+1) \to S^k$$
$$U(k) \subset U(k+1) \to S^{2k+1}$$

Let $RP^k$ and $CP^k$ denote the real and complex projective spaces of lines through the origin in $R^{k+1}$ and $C^{k+1}$, respectively. Let

$$Z_2 = \{\pm \mathrm{Id}\} \subset O(k)$$
$$S^1 = \{\lambda \cdot \mathrm{Id} : |\lambda| = 1\} \subset U(k)$$

One has $Z_2$ and $S^1$ principal bundles:

$$Z_2 \to S^{k-1} \to RP^{k-1}$$
$$S^1 \to S^{2k-1} \to CP^{k-1}$$

Frames
A frame $s := (s_1, \ldots, s_k)$ for $V \in \mathrm{Vect}_k(M, F)$ over an open set $O \subset M$ is a collection of k smooth sections to $V|_O$ so that $\{s_1(P), \ldots, s_k(P)\}$ is a basis for the fiber $V_P$ of V over any point $P \in O$. Given such a frame s, we can construct a local trivialization which identifies $O \times F^k$ with $V|_O$ by the mapping

$$(P; \lambda_1, \ldots, \lambda_k) \to \lambda_1 s_1(P) + \cdots + \lambda_k s_k(P)$$

Conversely, given a local trivialization of V, we can take the coordinate frame

$$s_i(P) = P \times (0, \ldots, 0, 1, 0, \ldots, 0)$$

Thus, frames and local trivializations of V are equivalent notions.

Simple Covers
An open cover $\{O_\alpha\}$ of M, where $\alpha$ ranges over some indexing set A, is said to be a simple cover if any finite intersection $O_{\alpha_1} \cap \cdots \cap O_{\alpha_k}$ is either empty or contractible.
Simple covers always exist. Put a Riemannian metric on M. If M is compact, then there exists a uniform $\varepsilon > 0$ so that any geodesic ball of radius $\varepsilon$ is geodesically convex. The intersection of geodesically convex sets is either geodesically convex (and hence contractible) or empty. Thus, covering M by a finite number of balls of radius $\varepsilon$ yields a simple cover. The argument is similar even if M is not compact where an infinite number of geodesic balls is used and the radii are allowed to shrink near $\infty$.

Transition Cocycles
Let Hom(F, k) be the set of linear transformations of $F^k$ and let $GL(F, k) \subset \mathrm{Hom}(F, k)$ be the group of all invertible linear transformations.
Let $\{s_\alpha\}$ be frames for a vector bundle V over some open cover $\{O_\alpha\}$ of M. On the intersection $O_\alpha \cap O_\beta$, one may express $s_\alpha = \psi_{\alpha\beta} s_\beta$, that is

$$s_{\alpha,i}(P) = \sum_{1 \le j \le k} \psi^{j}_{\alpha\beta,i}(P)\, s_{\beta,j}(P)$$

The maps $\phi_{\alpha\beta} : O_\alpha \cap O_\beta \to GL(\mathbb{F}, k)$ satisfy

    $\phi_{\alpha\alpha} = \mathrm{Id}$ on $O_\alpha$
    $\phi_{\alpha\beta}\phi_{\beta\gamma} = \phi_{\alpha\gamma}$ on $O_\alpha \cap O_\beta \cap O_\gamma$    [1]

Let G be a Lie group. Maps belonging to a collection $\{\phi_{\alpha\beta}\}$ of smooth maps from $O_\alpha \cap O_\beta$ to G which satisfy eqn [1] are said to be transition cocycles with values in G; if $G \subset GL(\mathbb{F}, k)$, they can be used to define a vector bundle by making appropriate identifications.

Reducing the Structure Group

If G is a subgroup of $GL(\mathbb{F}, k)$, then V is said to have a G-structure if we can choose frames so the transition cocycles belong to G; that is, we can reduce the structure group to G.

Denote the subgroup of orientation-preserving linear maps by

    $GL^+(\mathbb{R}, k) := \{\psi \in GL(\mathbb{R}, k) : \det(\psi) > 0\}$

If $V \in \mathrm{Vect}_k(M, \mathbb{R})$, then V is said to be orientable if we can choose the frames so that

    $\phi_{\alpha\beta} \in GL^+(\mathbb{R}, k)$

Not every real vector bundle is orientable; the first Stiefel-Whitney class $sw_1(V) \in H^1(M; \mathbb{Z}_2)$, which is defined later, vanishes if and only if V is orientable. In particular, the Möbius line bundle over the circle is not orientable.

Similarly, a real (resp. complex) bundle V is said to be Riemannian (resp. Hermitian) if we can reduce the structure group to the orthogonal group $O(k) \subset GL(\mathbb{R}, k)$ (resp. to the unitary group $U(k) \subset GL(\mathbb{C}, k)$).

We can use a partition of unity to put a positive-definite symmetric (resp. Hermitian symmetric) fiber metric on V. Applying the Gram-Schmidt process then constructs orthonormal frames and shows that the structure group can always be reduced to O(k) (resp. to U(k)); if V is a real vector bundle, then the structure group can be reduced to the special orthogonal group SO(k) if and only if V is orientable.

Lifting the Structure Group

Let $\tau$ be a representation of a Lie group H to $GL(\mathbb{F}, k)$. One says that the structure group of V can be lifted to H if there exist frames $\{s_\alpha\}$ for V and smooth maps $\psi_{\alpha\beta} : O_\alpha \cap O_\beta \to H$, so $\tau\psi_{\alpha\beta} = \phi_{\alpha\beta}$ where eqn [1] holds for $\psi$.

Spin Structures

For $k \ge 3$, the fundamental group of SO(k) is $\mathbb{Z}_2$. Let Spin(k) be the universal cover of SO(k) and let

    $\tau : \mathrm{Spin}(k) \to SO(k)$

be the associated double cover; set $\mathrm{Spin}(2) = S^1$ and let $\tau(\lambda) = \lambda^2$. An oriented bundle V is said to be spin if the transition functions can be lifted from SO(k) to Spin(k); this is possible if and only if the second Stiefel-Whitney class of V, which is defined later, vanishes. There can be inequivalent spin structures, which are parametrized by the cohomology group $H^1(M; \mathbb{Z}_2)$.

The Tangent Bundle of Projective Space

The tangent bundle $T\mathbb{RP}^m$ of real projective space is orientable if and only if m is odd; $T\mathbb{RP}^m$ is spin if and only if $m \equiv 3$ mod 4. If $m \equiv 3$ mod 4, there are two inequivalent spin structures on this bundle as $H^1(\mathbb{RP}^m; \mathbb{Z}_2) = \mathbb{Z}_2$.

The tangent bundle $T\mathbb{CP}^m$ of complex projective space is always orientable; $T\mathbb{CP}^m$ is spin if and only if m is odd.

Principal and Associated Bundles

Let H be a Lie group and let

    $\psi_{\alpha\beta} : O_\alpha \cap O_\beta \to H$

be a collection of smooth functions satisfying the compatibility conditions given in eqn [1]. We define a principal bundle P by gluing $O_\alpha \times H$ to $O_\beta \times H$ using $\psi$:

    $(P, h)_\alpha \sim (P, \psi_{\alpha\beta}(P)h)_\beta$ for $P \in O_\alpha \cap O_\beta$

Because right multiplication and left multiplication commute, right multiplication gives a natural action of H on P:

    $(P, h)_\alpha \cdot \tilde h := (P, h \cdot \tilde h)_\alpha$

The natural projection $P \to P/H = M$ is an H fiber bundle.

Let $\tau$ be a representation of H to $GL(\mathbb{F}, k)$. For $\xi \in P$, $\lambda \in \mathbb{F}^k$, and $h \in H$, define a gluing

    $(\xi, \lambda) \sim (\xi \cdot h^{-1}, \tau(h)\lambda)$

The associated vector bundle is then given by

    $P \times_\tau \mathbb{F}^k := P \times \mathbb{F}^k / \sim$

Clearly, $\{\tau\psi_{\alpha\beta}\}$ are the transition cocycles of the vector bundle $P \times_\tau \mathbb{F}^k$.
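The orientability criterion from "Reducing the Structure Group" can be illustrated on the simplest example. The following is a minimal sketch, not taken from the article: it assumes a real line bundle over the circle, a cover by three arcs O_0, O_1, O_2 with only pairwise overlaps (so eqn [1] imposes no triple-overlap condition), and transition cocycles that are constant signs. The function names are hypothetical. Rescaling the frame on O_a by a sign eps[a] replaces the sign on O_a ∩ O_b by eps[a]·g_ab·eps[b], and the bundle is orientable exactly when some rescaling makes every sign positive.

```python
from itertools import product

def is_orientable(g01, g12, g20):
    """True if some change of frames makes every transition sign +1.

    Assumption: a line bundle over the circle covered by three arcs,
    with constant transition signs g01, g12, g20 on the three overlaps.
    """
    signs = {(0, 1): g01, (1, 2): g12, (2, 0): g20}
    for eps in product((1, -1), repeat=3):
        # New transition sign on O_a ∩ O_b after rescaling the frames.
        if all(eps[a] * s * eps[b] == 1 for (a, b), s in signs.items()):
            return True
    return False

def sign_around_circle(g01, g12, g20):
    """Product of the signs around the circle; unchanged by frame rescaling."""
    return g01 * g12 * g20

trivial = (1, 1, 1)
regauged_trivial = (-1, -1, 1)   # trivial bundle written in other frames
moebius = (1, 1, -1)             # the Möbius line bundle

print(is_orientable(*trivial), is_orientable(*regauged_trivial), is_orientable(*moebius))
```

In this toy setting the product of the signs around the circle plays the role of the first Stiefel-Whitney class: it is +1 exactly when the bundle is orientable.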

Frame Bundles

If V is a vector bundle, the associated principal $GL(\mathbb{F}, k)$ bundle is the bundle of all frames; if V is given an inner product on each fiber, then the associated principal O(k) or U(k) bundle is the bundle of orthonormal frames. If V is an oriented Riemannian vector bundle, the associated principal SO(k) bundle is the bundle of oriented orthonormal frames.

Direct Sum and Tensor Product

Fiber-wise direct sum (resp. tensor product) defines the direct sum (resp. tensor product) of vector bundles:

    $\oplus : \mathrm{Vect}_k(M, \mathbb{F}) \times \mathrm{Vect}_n(M, \mathbb{F}) \to \mathrm{Vect}_{k+n}(M, \mathbb{F})$
    $\otimes : \mathrm{Vect}_k(M, \mathbb{F}) \times \mathrm{Vect}_n(M, \mathbb{F}) \to \mathrm{Vect}_{kn}(M, \mathbb{F})$

The transition cocycles of the direct sum (resp. tensor product) of two vector bundles are the direct sum (resp. tensor product) of the transition cocycles of the respective bundles.

The set of line bundles $\mathrm{Vect}_1(M, \mathbb{F})$ is a group under $\otimes$. The unit in the group is the trivial line bundle $\mathbf{1} := M \times \mathbb{F}$; the inverse of a line bundle L is the dual line bundle $L^* := \mathrm{Hom}(L, \mathbb{F})$ since

    $L \otimes L^* = \mathbf{1}$

Pullback Bundle

Let $\pi : V \to M$ be the projection associated with $V \in \mathrm{Vect}_k(M, \mathbb{F})$. If f is a smooth map from N to M, then the pullback bundle $f^*V$ is the vector bundle over N which is defined by setting

    $f^*V := \{(P, v) \in N \times V : f(P) = \pi(v)\}$

The fiber of $f^*V$ over P is the fiber of V over f(P). Let $\{s_\alpha\}$ be local frames for V over an open cover $\{O_\alpha\}$ of M. For $P \in f^{-1}(O_\alpha)$, define

    $\{f^*s_\alpha\}(P) := (P, s_\alpha(f(P)))$

This gives a collection of frames for $f^*V$ over the open cover $\{f^{-1}(O_\alpha)\}$ of N. Let

    $f^*\phi_{\alpha\beta} := \phi_{\alpha\beta} \circ f$

be the pullback of the transition functions. Then

    $\{f^*s_\alpha\}(P) = (P, \phi_{\alpha\beta}(f(P))\, s_\beta(f(P))) = \{(f^*\phi_{\alpha\beta})(f^*s_\beta)\}(P)$

This shows that the pullback of the transition functions for V are the transition functions of the pullback $f^*(V)$.

Homotopy

Two smooth maps $f_0$ and $f_1$ from N to M are said to be homotopic if there exists a smooth map $F : N \times I \to M$ so that $f_0(P) = F(P, 0)$ and so that $f_1(P) = F(P, 1)$. If $f_0$ and $f_1$ are homotopic maps from N to M, then $f_0^*V$ is isomorphic to $f_1^*V$.

Let [N, M] be the set of all homotopy classes of smooth maps from N to M. The association $V \to f^*V$ induces a natural map

    $[N, M] \times \mathrm{Vect}_k(M, \mathbb{F}) \to \mathrm{Vect}_k(N, \mathbb{F})$

If M is contractible, then the identity map is homotopic to the constant map c. Consequently, $V = \mathrm{Id}^*V$ is isomorphic to $c^*V = M \times \mathbb{F}^k$. Thus, any vector bundle over a contractible manifold is trivial. In particular, if $\{O_\alpha\}$ is a simple cover of M and if $V \in \mathrm{Vect}(M, \mathbb{F})$, then $V|_{O_\alpha}$ is trivial for each $\alpha$. This shows that a simple cover is a trivializing cover for every $V \in \mathrm{Vect}(M, \mathbb{F})$.

Stabilization

Let $\mathbf{1} \in \mathrm{Vect}_1(M, \mathbb{F})$ denote the isomorphism class of the trivial line bundle $M \times \mathbb{F}$ over an m-dimensional manifold M. The map $V \to V \oplus \mathbf{1}$ induces a stabilization map

    $s : \mathrm{Vect}_k(M, \mathbb{F}) \to \mathrm{Vect}_{k+1}(M, \mathbb{F})$

which induces an isomorphism

    $\mathrm{Vect}_k(M, \mathbb{R}) = \mathrm{Vect}_{k+1}(M, \mathbb{R})$ for $k > m$
    $\mathrm{Vect}_k(M, \mathbb{C}) = \mathrm{Vect}_{k+1}(M, \mathbb{C})$ for $2k > m$    [2]

These values of k comprise the stable range.

The K-Theory

The direct sum and tensor product make $\mathrm{Vect}(M, \mathbb{F})$ into a semiring; we denote the associated ring defined by the Grothendieck construction by KF(M). If $V \in \mathrm{Vect}(M, \mathbb{F})$, let $[V] \in KF(M)$ be the corresponding element of K-theory; KF(M) is generated by formal differences $[V_1] - [V_2]$; such formal differences are called virtual bundles.

The Grothendieck construction (see K-theory) introduces nontrivial relations. Let $S^m$ denote the standard sphere in $\mathbb{R}^{m+1}$. Since

    $T(S^m) \oplus \mathbf{1} = (m+1)\mathbf{1}$

we can easily see that $[TS^m] = m[\mathbf{1}]$ in $KR(S^m)$, despite the fact that $T(S^m)$ is not isomorphic to $m\mathbf{1}$ for $m \ne 1, 3, 7$.

Let L denote the nontrivial real line bundle over $\mathbb{RP}^k$. Then $T\mathbb{RP}^k \oplus \mathbf{1} = (k+1)L$, so

    $[T\mathbb{RP}^k] = (k+1)[L] - [\mathbf{1}]$
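The statement under "Direct Sum and Tensor Product" that direct sums and tensor products of transition cocycles are again transition cocycles can be checked at a single point of a triple overlap. The sketch below is illustrative only: the integer matrices are arbitrary invertible choices, the helper names are hypothetical, and the third cocycle value is defined through the relation $\phi_{\alpha\beta}\phi_{\beta\gamma} = \phi_{\alpha\gamma}$ of eqn [1]. Block-diagonal sums model the direct sum and Kronecker products model the tensor product.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def direct_sum(a, b):
    """Block-diagonal matrix diag(a, b) for square a and b."""
    n, m = len(a), len(b)
    out = [[0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        for j in range(n):
            out[i][j] = a[i][j]
    for i in range(m):
        for j in range(m):
            out[n + i][n + j] = b[i][j]
    return out

def kron(a, b):
    """Kronecker product of square matrices a (n x n) and b (m x m)."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

# Values of two cocycles at one point (arbitrary invertible integer matrices).
phi_ab = [[1, 2], [0, 1]]
phi_bc = [[0, -1], [1, 0]]
phi_ac = matmul(phi_ab, phi_bc)            # eqn [1] for the rank-2 bundle

psi_ab = [[1, 0, 1], [0, 1, 0], [0, 0, 1]]
psi_bc = [[0, 1, 0], [1, 0, 0], [0, 0, -1]]
psi_ac = matmul(psi_ab, psi_bc)            # eqn [1] for the rank-3 bundle

direct_sum_ok = (matmul(direct_sum(phi_ab, psi_ab), direct_sum(phi_bc, psi_bc))
                 == direct_sum(phi_ac, psi_ac))
tensor_ok = (matmul(kron(phi_ab, psi_ab), kron(phi_bc, psi_bc))
             == kron(phi_ac, psi_ac))
print(direct_sum_ok, tensor_ok)
```

The resulting matrices have sizes 5 and 6, matching the ranks k + n and kn in the displayed maps.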

The map $V \to \mathrm{Rank}(V)$ extends to a surjective map from KF(M) to $\mathbb{Z}$. We denote the associated ideal of virtual bundles of virtual rank 0 by

    $\widetilde{KF}(M) := \ker(\mathrm{Rank})$

In the stable range, $V \to [V] - k[\mathbf{1}]$ identifies

    $\mathrm{Vect}_k(M, \mathbb{R}) = \widetilde{KR}(M)$ if $k > m$
    $\mathrm{Vect}_k(M, \mathbb{C}) = \widetilde{KC}(M)$ if $2k > m$    [3]

These groups contain nontrivial torsion. Let L be the nontrivial real line bundle over $\mathbb{RP}^k$. Then

    $\widetilde{KR}(\mathbb{RP}^k) = \mathbb{Z} \cdot \{[L] - [\mathbf{1}]\} \,/\, 2^{\nu(k)} \mathbb{Z} \cdot \{[L] - [\mathbf{1}]\}$

where $\nu(k)$ is the Adams number.

Classifying Spaces

Let $Gr_k(\mathbb{F}, n)$ be the Grassmannian of k-dimensional subspaces of $\mathbb{F}^n$. By mapping a k-plane $\pi$ in $\mathbb{F}^n$ to the corresponding orthogonal projection on $\pi$, we can identify $Gr_k(\mathbb{F}, n)$ with the set of orthogonal projections of rank k:

    $\{\pi \in \mathrm{Hom}(\mathbb{F}^n) : \pi^2 = \pi,\ \pi^* = \pi,\ \mathrm{tr}(\pi) = k\}$

There is a natural associated tautological k-plane bundle

    $V_k(\mathbb{F}, n) \in \mathrm{Vect}_k(Gr_k(\mathbb{F}, n), \mathbb{F})$

whose fiber over a k-plane $\pi$ is the k-plane itself:

    $V_k(\mathbb{F}, n) := \{(\pi, x) \in \mathrm{Hom}(\mathbb{F}^n) \times \mathbb{F}^n : \pi x = x\}$

Let $[M, Gr_k(\mathbb{F}, n)]$ denote the set of homotopy equivalence classes of smooth maps f from M to $Gr_k(\mathbb{F}, n)$. Since $[f_1] = [f_2]$ implies that $f_1^*V$ is isomorphic to $f_2^*V$, the association

    $f \to f^*V_k(\mathbb{F}, n) \in \mathrm{Vect}_k(M, \mathbb{F})$

induces a map

    $[M, Gr_k(\mathbb{F}, n)] \to \mathrm{Vect}_k(M, \mathbb{F})$

This map defines a natural equivalence of functors in the stable range:

    $[M, Gr_k(\mathbb{R}, k+\ell)] = \mathrm{Vect}_k(M, \mathbb{R})$ for $\ell > m$
    $[M, Gr_k(\mathbb{C}, k+\ell)] = \mathrm{Vect}_k(M, \mathbb{C})$ for $2\ell > m$    [4]

The natural inclusion of $\mathbb{F}^n$ in $\mathbb{F}^{n+1}$ induces natural inclusions

    $Gr_k(\mathbb{F}, n) \subset Gr_k(\mathbb{F}, n+1)$
    $V_k(\mathbb{F}, n) \subset V_k(\mathbb{F}, n+1)$    [5]

Let $Gr_k(\mathbb{F}, \infty)$ and $V_k(\mathbb{F}, \infty)$ be the direct limit spaces under these inclusions; these are the infinite-dimensional Grassmannians and classifying bundles, respectively. The topology on these spaces is the weak or inductive topology. The Grassmannians are called classifying spaces. The isomorphisms of eqn [4] are compatible with the inclusions of eqn [5] and we have

    $[M, Gr_k(\mathbb{F}, \infty)] = \mathrm{Vect}_k(M, \mathbb{F})$    [6]

Spaces with Finite Covering Dimension

A metric space X is said to have a covering dimension at most m if, given any open cover $\{U_\alpha\}$ of X, there exists a refinement $\{O_\beta\}$ of the cover so that any intersection of more than m + 1 of the $\{O_\beta\}$ is empty. For example, any manifold of dimension m has covering dimension at most m. More generally, any m-dimensional cell complex has covering dimension at most m.

The isomorphisms of [2]-[4], and [6] continue to hold under the weaker assumption that M is a metric space with covering dimension at most m.

Characteristic Classes of Vector Bundles

The Cohomology of $Gr_k(\mathbb{F}, \infty)$

The cohomology algebras of the Grassmannians are polynomial algebras on suitably chosen generators:

    $H^*(Gr_k(\mathbb{R}, \infty); \mathbb{Z}_2) = \mathbb{Z}_2[sw_1, \ldots, sw_k]$
    $H^*(Gr_k(\mathbb{C}, \infty); \mathbb{Z}) = \mathbb{Z}[c_1, \ldots, c_k]$    [7]

The Stiefel-Whitney Classes

Let $V \in \mathrm{Vect}_k(M, \mathbb{R})$. We use eqn [6] to find $\Psi : M \to Gr_k(\mathbb{R}, \infty)$ which classifies V; the map $\Psi$ is uniquely determined up to homotopy and, using eqn [7], one sets

    $sw_i(V) := \Psi^* sw_i \in H^i(M; \mathbb{Z}_2)$

The total Stiefel-Whitney class is then defined by

    $sw(V) = 1 + sw_1(V) + \cdots + sw_k(V)$

The Stiefel-Whitney class has the properties:

1. If $f : X_1 \to X_2$, then $f^*(sw(V)) = sw(f^*V)$.
2. $sw(V \oplus W) = sw(V)sw(W)$.
3. If L is the Möbius bundle over $S^1$, then $sw_1(L)$ generates $H^1(S^1; \mathbb{Z}_2) = \mathbb{Z}_2$.

The cohomology algebra of real projective space is a truncated polynomial algebra:

    $H^*(\mathbb{RP}^k; \mathbb{Z}_2) = \mathbb{Z}_2[x]/(x^{k+1} = 0)$
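The truncated polynomial algebra makes Stiefel-Whitney computations on real projective space mechanical. As a sketch, assume the relation $T\mathbb{RP}^k \oplus \mathbf{1} = (k+1)L$ from the K-theory discussion, the product formula (property 2), and $sw(L) = 1 + x$; together these give $sw(T\mathbb{RP}^k) = (1+x)^{k+1}$ modulo 2 and modulo $x^{k+1}$. The function names below are hypothetical, and the spin test relies on the criterion, stated earlier, that an oriented bundle is spin exactly when its second Stiefel-Whitney class vanishes.

```python
from math import comb

def sw_real_projective(k):
    """Mod-2 coefficients of (1 + x)^(k+1) in degrees 0..k (x^(k+1) = 0)."""
    return [comb(k + 1, i) % 2 for i in range(k + 1)]

def is_orientable(k):
    """sw_1 = 0, read off from the coefficient of x (requires k >= 1)."""
    return sw_real_projective(k)[1] == 0

def is_spin(k):
    """sw_1 = 0 and sw_2 = 0; for k = 1 the class x^2 vanishes for degree reasons."""
    sw = sw_real_projective(k)
    sw2 = sw[2] if k >= 2 else 0
    return sw[1] == 0 and sw2 == 0

for k in range(1, 9):
    print(k, sw_real_projective(k), is_orientable(k), is_spin(k))
```

For k between 2 and 19 the output reproduces the pattern quoted for the tangent bundle of projective space: orientable exactly for odd k, and spin exactly for k ≡ 3 mod 4.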

Since $T\mathbb{RP}^k \oplus \mathbf{1} = (k+1)L$, one has

    $sw(T\mathbb{RP}^k) = (1+x)^{k+1} = 1 + (k+1)x + \dfrac{(k+1)k}{2}x^2 + \cdots$    [8]

Orientability and Spin Structures

The Stiefel-Whitney classes have real geometric meaning. For example, $sw_1(V) = 0$ if and only if V is orientable; if $sw_1(V) = 0$, then $sw_2(V) = 0$ if and only if V admits a spin structure. With reference to the discussion on the tangent bundle of projective space, eqn [8] yields

    $sw_1(T\mathbb{RP}^k) = 0$ if $k \equiv 1$ mod 2
    $sw_1(T\mathbb{RP}^k) = x$ if $k \equiv 0$ mod 2

Thus, $\mathbb{RP}^k$ is orientable if and only if k is odd. Furthermore,

    $sw_2(T\mathbb{RP}^k) = 0$ if $k \equiv 3$ mod 4
    $sw_2(T\mathbb{RP}^k) = x^2$ if $k \equiv 1$ mod 4

Thus, $T\mathbb{RP}^k$ is spin if and only if $k \equiv 3$ mod 4.

Chern Classes

Let $V \in \mathrm{Vect}_k(M, \mathbb{C})$. We use eqn [6] to find $\Psi : M \to Gr_k(\mathbb{C}, \infty)$ which classifies V; the map $\Psi$ is uniquely determined up to homotopy and, using eqn [7], one sets

    $c_i(V) := \Psi^* c_i \in H^{2i}(M; \mathbb{Z})$

The total Chern class is then defined by

    $c(V) := 1 + c_1(V) + \cdots + c_k(V)$

The Chern class has the properties:

1. If $f : X_1 \to X_2$, then $f^*(c(V)) = c(f^*V)$.
2. $c(V \oplus W) = c(V)c(W)$.
3. Let L be the classifying line bundle over $S^2 = \mathbb{CP}^1$. Then $\int_{S^2} c_1(L) = 1$.

The cohomology algebra of complex projective space also is a truncated polynomial algebra

    $H^*(\mathbb{CP}^k; \mathbb{Z}) = \mathbb{Z}[x]/(x^{k+1})$

where $x = c_1(L)$ and L is the complex classifying line bundle over $\mathbb{CP}^k = Gr_1(\mathbb{C}, k+1)$. If $T_c\mathbb{CP}^k$ is the complex tangent bundle, then

    $c(T_c\mathbb{CP}^k) = (1+x)^{k+1}$

The Pontrjagin Classes

Let V be a real vector bundle over a topological space X of rank r = 2k or r = 2k + 1. The Pontrjagin classes $p_i(V) \in H^{4i}(X; \mathbb{Z})$ are characterized by the properties:

1. $p(V) = 1 + p_1(V) + \cdots + p_k(V)$.
2. If $f : X_1 \to X_2$, then $f^*(p(V)) = p(f^*V)$.
3. $p(V \oplus W) = p(V)p(W)$ mod elements of order 2.
4. $\int_{\mathbb{CP}^2} p_1(T\mathbb{CP}^2) = 3$.

We can complexify a real vector bundle V to construct an associated complex vector bundle $V_\mathbb{C}$. We have

    $p_i(V) := (-1)^i c_{2i}(V_\mathbb{C})$

Conversely, if V is a complex vector bundle, we can construct an underlying real vector bundle $V_\mathbb{R}$ by forgetting the underlying complex structure. Modulo elements of order 2, we have

    $\sum_i (-1)^i p_i(V_\mathbb{R}) = c(V)c(V^*)$

Let $T\mathbb{CP}^k$ be the real tangent bundle of complex projective space. Then

    $p(T\mathbb{CP}^k) = (1 + x^2)^{k+1}$

Line Bundles

Tensor product makes $\mathrm{Vect}_1(M, \mathbb{F})$ into an abelian group. One has natural equivalences of functors which are group homomorphisms:

    $sw_1 : \mathrm{Vect}_1(M, \mathbb{R}) \to H^1(M; \mathbb{Z}_2)$
    $c_1 : \mathrm{Vect}_1(M, \mathbb{C}) \to H^2(M; \mathbb{Z})$

A real line bundle L is trivial if and only if it is orientable or, equivalently, if $sw_1(L)$ vanishes. A complex line bundle L is trivial if and only if $c_1(L) = 0$. There are nontrivial vector bundles of rank k > 1 with vanishing Stiefel-Whitney classes. For example, $sw_i(TS^k) = 0$ for i > 0 despite the fact that $TS^k$ is trivial if and only if k = 1, 3, 7.

Curvature and Characteristic Classes

de Rham Cohomology

We can replace the coefficient group $\mathbb{Z}$ by $\mathbb{C}$ at the cost of losing information concerning torsion. Thus, we may regard $p_i(V) \in H^{4i}(M; \mathbb{C})$ if V is real or $c_i(V) \in H^{2i}(M; \mathbb{C})$ if V is complex. Let M be a smooth manifold. Let $C^\infty(\Lambda^p M)$ be the space of smooth p-forms and let

    $d : C^\infty(\Lambda^p M) \to C^\infty(\Lambda^{p+1} M)$

be the exterior derivative. The de Rham cohomology groups are then defined by

    $H^p_{deR}(M) := \dfrac{\ker(d : C^\infty(\Lambda^p M) \to C^\infty(\Lambda^{p+1} M))}{\mathrm{im}(d : C^\infty(\Lambda^{p-1} M) \to C^\infty(\Lambda^p M))}$

The de Rham theorem identifies the topological cohomology groups $H^p(M; \mathbb{C})$ with the de Rham cohomology groups $H^p_{deR}(M)$ which are given differential geometrically.

Given a connection on V, the Chern-Weil theory enables us to compute Pontrjagin and Chern classes in de Rham cohomology in terms of curvature.

Connections

Let V be a vector bundle over M. A connection

    $\nabla : C^\infty(V) \to C^\infty(T^*M \otimes V)$

on V is a first-order partial differential operator which satisfies the Leibniz rule, that is, if s is a smooth section to V and if f is a smooth function on M,

    $\nabla(fs) = df \otimes s + f\nabla s$

If X is a tangent vector field, we define

    $\nabla_X s = \langle X, \nabla s \rangle$

where $\langle\,,\rangle$ denotes the natural pairing between the tangent and cotangent spaces. This generalizes to the bundle setting the notion of a directional derivative and has the properties:

1. $\nabla_{fX} s = f\nabla_X s$.
2. $\nabla_X(fs) = X(f)s + f\nabla_X s$.
3. $\nabla_{X_1 + X_2} s = \nabla_{X_1} s + \nabla_{X_2} s$.
4. $\nabla_X(s_1 + s_2) = \nabla_X s_1 + \nabla_X s_2$.

The Curvature 2-Form

Let $\omega_p$ be a smooth p-form. Then $\nabla$ can be extended to a map

    $\nabla : C^\infty(\Lambda^p M \otimes V) \to C^\infty(\Lambda^{p+1} M \otimes V)$

by defining

    $\nabla(\omega_p \otimes s) = d\omega_p \otimes s + (-1)^p \omega_p \wedge \nabla s$

In contrast to ordinary exterior differentiation, $\nabla^2$ need not vanish. We set

    $\Omega(s) := \nabla^2 s$

This is not a second-order partial differential operator; it is a zeroth-order operator, that is,

    $\Omega(fs) = ddf \otimes s - df \wedge \nabla s + df \wedge \nabla s + f\nabla^2 s = f\Omega(s)$

The curvature operator $\Omega$ can also be computed locally. Let $(s_i)$ be a local frame. Expand

    $\nabla s_i = \sum_j \omega_i^j s_j$

to define the connection 1-form $\omega$. One then has

    $\nabla^2 s_i = \sum_j \bigl( d\omega_i^j - \sum_k \omega_i^k \wedge \omega_k^j \bigr) s_j$

and so

    $\Omega_i^j = d\omega_i^j - \sum_k \omega_i^k \wedge \omega_k^j$

If $\tilde s_i = g_i^j s_j$ is another local frame, we compute

    $\tilde\omega = dg\,g^{-1} + g\omega g^{-1}$ and $\tilde\Omega = g\Omega g^{-1}$

Although the connection 1-form $\omega$ is not tensorial, the curvature is an invariantly defined 2-form-valued endomorphism of V.

Unitary Connections

Let $(\cdot,\cdot)$ be a nondegenerate Hermitian inner product on V. We say that $\nabla$ is a unitary connection if

    $(\nabla s_1, s_2) + (s_1, \nabla s_2) = d(s_1, s_2)$

Such connections always exist and, relative to a local orthonormal frame, the curvature is skew-symmetric, that is,

    $\Omega + \Omega^* = 0$

Thus, $\Omega$ can be regarded as a 2-form-valued element of the Lie algebra of the structure group, O(V) in the real setting or U(V) in the complex setting.

Projections

We can always embed V in a trivial bundle $\mathbf{1}^\nu$ of dimension $\nu$; let $\pi_V$ be the orthogonal projection on V. We project the flat connection to V to define a natural connection on V. For example, if M is embedded isometrically in the Euclidean space $\mathbb{R}^\nu$, this construction gives the Levi-Civita connection on the tangent bundle TM. The curvature of this connection is then given by

    $\Omega = \pi_V\, d\pi_V\, d\pi_V$

Let $V_P$ be the fiber of V over a point $P \in M$. The inclusion $i : V \subset \mathbb{R}^n$ defines the classifying map $f : P \to Gr_k(\mathbb{R}, n)$ where we set

    $f(P) = i(V_P)$
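The local formulas for the connection 1-form and the curvature are easiest to see in rank one. The following worked computation is an illustration, not part of the original text; it assumes that V is a line bundle, so that $\omega$ and $\Omega$ are ordinary forms and the change of frame g is a nowhere-vanishing function.

```latex
% Rank-one illustration (assumption: V is a line bundle, g a nowhere-vanishing function).
\begin{align*}
\Omega        &= d\omega - \omega \wedge \omega = d\omega
                 && \text{since } \omega \wedge \omega = 0 \text{ for a 1-form,} \\
\tilde\omega  &= dg\, g^{-1} + g\,\omega\, g^{-1} = g^{-1} dg + \omega, \\
d(g^{-1} dg)  &= -g^{-2}\, dg \wedge dg = 0, \\
\tilde\Omega  &= d\tilde\omega = d(g^{-1} dg) + d\omega = \Omega = g\,\Omega\, g^{-1}.
\end{align*}
```

The connection 1-form changes by the inhomogeneous term $g^{-1}dg$, while the curvature does not change at all; this is the rank-one case of the statement that $\omega$ is not tensorial but $\Omega$ is.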

Chern-Weil Theory

Let $\nabla$ be a Riemannian connection on a real vector bundle V of rank k. We set

    $p(\Omega) := \det\left(I + \dfrac{1}{2\pi}\Omega\right)$

Let $\Omega^T$ denote the transpose matrix of differential forms. Since $\Omega + \Omega^T = 0$, the polynomials of odd degree in $\Omega$ vanish and we may expand

    $p(\Omega) = 1 + p_1(\Omega) + \cdots + p_r(\Omega)$

where k = 2r or k = 2r + 1 and the differential forms $p_i(\Omega) \in C^\infty(\Lambda^{4i} M)$ are forms of degree 4i.

Changing the gauge (i.e., the local frame) replaces $\Omega$ by $g\Omega g^{-1}$ and hence $p(\Omega)$ is independent of the local frame chosen. One can show that $dp_i(\Omega) = 0$; let $[p_i(\Omega)]$ denote the corresponding element of de Rham cohomology. This is independent of the particular connection chosen and $[p_i(\Omega)]$ represents $p_i(V)$ in $H^{4i}(M; \mathbb{C})$.

Similarly, let V be a complex vector bundle of rank k with a Hermitian connection $\nabla$. Set

    $c(\Omega) := \det\left(I + \dfrac{\sqrt{-1}}{2\pi}\Omega\right) = 1 + c_1(\Omega) + \cdots + c_k(\Omega)$

Again $c_i(\Omega)$ is independent of the local gauge and $dc_i(\Omega) = 0$. The de Rham cohomology class $[c_i(\Omega)]$ represents $c_i(V)$ in $H^{2i}(M; \mathbb{C})$.

The Chern Character

The total Chern character is defined by the formal sum

    $ch(\Omega) := \mathrm{tr}\left(e^{\sqrt{-1}\,\Omega/2\pi}\right) = \sum_\nu \dfrac{(\sqrt{-1})^\nu}{(2\pi)^\nu\, \nu!}\, \mathrm{tr}(\Omega^\nu) = ch_0(\Omega) + ch_1(\Omega) + \cdots$

Let $ch(V) = [ch(\Omega)]$ denote the associated de Rham cohomology class; it is independent of the particular connection chosen. We then have the relations

    $ch(V \oplus W) = ch(V) + ch(W)$
    $ch(V \otimes W) = ch(V)ch(W)$

The Chern character extends to a ring isomorphism from $KU(M) \otimes \mathbb{Q}$ to $H^e(M; \mathbb{Q})$, which is a natural equivalence of functors; modulo torsion, K theory and cohomology are the same functors.

Other Characteristic Classes

The Chern character is defined by the exponential function. Other characteristic classes which appear in the index theorem are defined using other generating functions. Let $x := (x_1, \ldots)$ be a collection of indeterminates. Let $s_\nu(x)$ be the $\nu$th elementary symmetric function;

    $\prod_\nu (1 + x_\nu) = 1 + s_1(x) + s_2(x) + \cdots$

For a diagonal matrix $A := \mathrm{diag}(\lambda_1, \ldots)$, denote the normalized eigenvalues by $x_j := \sqrt{-1}\,\lambda_j/2\pi$. Then

    $c(A) = \det\left(1 + \dfrac{\sqrt{-1}}{2\pi}A\right) = 1 + s_1(x) + \cdots$

Thus, the Chern class corresponds in a certain sense to the elementary symmetric functions.

Let f(x) be a symmetric polynomial or more generally a formal power series which is symmetric. We can express $f(x) = F(s_1(x), \ldots)$ in terms of the elementary symmetric functions and define $f(\Omega) = F(c_1(\Omega), \ldots)$ by substitution. For example, the Chern character is defined by the generating function

    $f(x) := \sum_{\nu=1}^{k} e^{x_\nu}$

The Todd class is defined using a different generating function:

    $td(x) := \prod_\nu x_\nu (1 - e^{-x_\nu})^{-1} = 1 + td_1(x) + \cdots$

If V is a real vector bundle, we can define some additional characteristic classes similarly. Let $\{\pm\sqrt{-1}\,\lambda_1, \ldots\}$ be the nonzero eigenvalues of a skew-symmetric matrix A. We set $x_j = -\lambda_j/2\pi$ and define the Hirzebruch polynomial L and the $\hat{A}$ genus by

    $L(x) := \prod_\nu \dfrac{x_\nu}{\tanh(x_\nu)} = 1 + L_1(x) + L_2(x) + \cdots$
    $\hat{A}(x) := \prod_\nu \dfrac{x_\nu}{2\sinh((1/2)x_\nu)} = 1 + \hat{A}_1(x) + \hat{A}_2(x) + \cdots$

The generating functions

    $\dfrac{x}{\tanh(x)}$ and $\dfrac{x}{2\sinh((1/2)x)}$

are even functions of x, so the ambiguity in the choice of sign in the eigenvalues plays no role. This defines characteristic classes

    $L_i(V) \in H^{4i}(M; \mathbb{C})$ and $\hat{A}_i(V) \in H^{4i}(M; \mathbb{C})$

Summary of Formulas

We summarize below some of the formulas in terms of characteristic classes:

1. $c_1(\Omega) = \dfrac{\sqrt{-1}}{2\pi}\mathrm{tr}(\Omega)$,
2. $c_2(\Omega) = \dfrac{1}{8\pi^2}\{\mathrm{tr}(\Omega^2) - \mathrm{tr}(\Omega)^2\}$,
3. $p_1(\Omega) = -\dfrac{1}{8\pi^2}\mathrm{tr}(\Omega^2)$,
4. $ch(V) = \left(k + c_1 + \dfrac{c_1^2 - 2c_2}{2} + \cdots\right)(V)$,
5. $td(V) = \left(1 + \dfrac{c_1}{2} + \dfrac{c_1^2 + c_2}{12} + \dfrac{c_1 c_2}{24} + \cdots\right)(V)$,
6. $\hat{A}(V) = \left(1 - \dfrac{p_1}{24} + \dfrac{7p_1^2 - 4p_2}{5760} + \cdots\right)(V)$,
7. $L(V) = \left(1 + \dfrac{p_1}{3} + \dfrac{7p_2 - p_1^2}{45} + \cdots\right)(V)$,
8. $td(V \oplus W) = td(V)td(W)$,
9. $\hat{A}(V \oplus W) = \hat{A}(V)\hat{A}(W)$,
10. $L(V \oplus W) = L(V)L(W)$.

The Euler Form

So far, this article has dealt with the structure groups O(k) in the real setting and U(k) in the complex setting. There is one final characteristic class which arises from the structure group SO(k). Suppose k = 2n is even. While a real antisymmetric matrix A of shape $2n \times 2n$ cannot be diagonalized, it can be put in block-diagonal form with $2 \times 2$ blocks

    $\begin{pmatrix} 0 & \lambda_\nu \\ -\lambda_\nu & 0 \end{pmatrix}$

The top Pontrjagin class $p_n(A) = x_1^2 \cdots x_n^2$ is a perfect square. The Euler class

    $e_{2n}(A) := x_1 \cdots x_n$

is the square root of $p_n$. If V is an oriented vector bundle of dimension 2n, then

    $e_{2n}(V) \in H^{2n}(M; \mathbb{C})$

is a well-defined characteristic class satisfying $e_{2n}(V)^2 = p_n(V)$.

If V is the underlying real oriented vector bundle of a complex vector bundle W,

    $e_{2n}(V) = c_n(W)$

If M is an even-dimensional manifold, let $e_m(M) := e_m(TM)$. If we reverse the local orientation of M, then $e_m(M)$ changes sign. Consequently, $e_m(M)$ is a measure rather than an m-form; we can use the Riemannian measure on M to regard $e_m(M)$ as a scalar. Let $R_{ijkl}$ be the components of the curvature of the Levi-Civita connection with respect to some local orthonormal frame field; we adopt the convention that $R_{1221} = 1$ on the standard sphere $S^2$ in $\mathbb{R}^3$. If $\varepsilon^{I,J} := (e^I, e^J)$ is the totally antisymmetric tensor, then

    $e_{2n} := \sum_{I,J} \dfrac{\varepsilon^{I,J}\, R_{i_1 i_2 j_2 j_1} \cdots R_{i_{m-1} i_m j_m j_{m-1}}}{(8\pi)^n\, n!}$

Let $R := R_{ijji}$ and $\rho_{ij} := R_{ikkj}$ be the scalar curvature and the Ricci tensor, respectively. Then

    $e_2 = \dfrac{1}{4\pi}R$
    $e_4 = \dfrac{1}{32\pi^2}\left(R^2 - 4|\rho|^2 + |R|^2\right)$

Characteristic Classes of Principal Bundles

Let $\mathfrak{g}$ be the Lie algebra of a compact Lie group G. Let $\pi : P \to M$ be a principal G bundle over M. For $\xi \in P$, let

    $\mathcal{V}_\xi := \ker \pi_* : T_\xi P \to T_{\pi\xi} M$ and $\mathcal{H}_\xi := \mathcal{V}_\xi^\perp$

be the vertical and horizontal distributions of the projection $\pi$, respectively. We assume that the metric on P is chosen to be G-invariant and such that $\pi_* : \mathcal{H}_\xi \to T_{\pi\xi} M$ is an isometry; thus, $\pi$ is a Riemannian submersion. If F is a tangent vector field on M, let $\mathcal{H}F$ be the corresponding horizontal lift. Let $\rho_\mathcal{V}$ be orthogonal projection on the distribution $\mathcal{V}$. The curvature is defined by

    $\Omega(F_1, F_2) = \rho_\mathcal{V}[\mathcal{H}(F_1), \mathcal{H}(F_2)]$

The horizontal distribution $\mathcal{H}$ is integrable if and only if the curvature vanishes. Since the metric is G-invariant, $\Omega(F_1, F_2)$ is invariant under the group action. We may use a local section s to P over a contractible coordinate chart O to split $\pi^{-1}O = O \times G$. This permits us to identify $\mathcal{V}$ with TG and to regard $\Omega$ as a $\mathfrak{g}$-valued 2-form. If we replace the section s by a section $\tilde s$, then $\tilde\Omega = g\Omega g^{-1}$ changes by the adjoint action of G on $\mathfrak{g}$.

If V is a real or complex vector bundle over M, we can put a fiber metric on V to reduce the structure group to the orthogonal group O(r) in the real setting or the unitary group U(r) in the complex setting. Let $P_V$ be the associated frame bundle. A Riemannian connection $\nabla$ on V induces an invariant splitting of $TP_V = \mathcal{V} \oplus \mathcal{H}$ and defines a natural

metric on $P_V$; the curvature $\Omega$ of the connection $\nabla$ defined here agrees with the definition given previously.

Let Q(G) be the algebra of all polynomials on $\mathfrak{g}$ which are invariant under the adjoint action. If $Q \in Q(G)$, then $Q(\Omega)$ is well defined. One has $dQ(\Omega) = 0$. Furthermore, the de Rham cohomology class $Q(P) := [Q(\Omega)]$ is independent of the particular connection chosen. We have

    $Q(U(k)) = \mathbb{C}[c_1, \ldots, c_k]$
    $Q(SU(k)) = \mathbb{C}[c_2, \ldots, c_k]$
    $Q(O(2k)) = \mathbb{C}[p_1, \ldots, p_k]$
    $Q(O(2k+1)) = \mathbb{C}[p_1, \ldots, p_k]$
    $Q(SO(2k)) = \mathbb{C}[p_1, \ldots, p_k, e_k]/(e_k^2 = p_k)$
    $Q(SO(2k+1)) = \mathbb{C}[p_1, \ldots, p_k]$

Thus, for this category of groups, no new characteristic classes ensue. Since the invariants are Lie-algebra theoretic in nature,

    $Q(\mathrm{Spin}(k)) = Q(SO(k))$

Other groups, of course, give rise to different characteristic rings of invariants.

Acknowledgments

Research of P Gilkey was partially supported by the MPI (Leipzig, Germany), that of R Ivanova by the UHH Seed Money Grant, and of S Nikcevic by MM 1646 (Serbia), DAAD (Germany), and Dierks von Zweck Stiftung (Essen, Germany).

See also: Cohomology Theories; Gerbes in Quantum Field Theory; Instantons: Topological Aspects; K-Theory; Mathai-Quillen Formalism; Riemann Surfaces.

Further Reading

Besse AL (1987) Einstein Manifolds. Ergebnisse der Mathematik und ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)], p. 10. Berlin: Springer-Verlag.
Bott R and Tu LW (1982) Differential Forms in Algebraic Topology. Graduate Texts in Mathematics, p. 82. New York-Berlin: Springer-Verlag.
Chern S (1944) A simple intrinsic proof of the Gauss-Bonnet formula for closed Riemannian manifolds. Annals of Mathematics 45: 747-752.
Chern S (1945) On the curvatura integra in a Riemannian manifold. Annals of Mathematics 46: 674-684.
Conner PE and Floyd EE (1964) Differentiable Periodic Maps. Ergebnisse der Mathematik und ihrer Grenzgebiete, N.F., Band 33. New York: Academic Press; Berlin-Göttingen-Heidelberg: Springer-Verlag.
de Rham G (1950) Complexes à automorphismes et homéomorphie différentiable (French). Ann. Inst. Fourier Grenoble 2: 51-67.
Eguchi T, Gilkey PB, and Hanson AJ (1980) Gravitation, gauge theories and differential geometry. Physics Reports 66: 213-393.
Eilenberg S and Steenrod N (1952) Foundations of Algebraic Topology. Princeton, NJ: Princeton University Press.
Greub W, Halperin S, and Vanstone R (1972) Connections, Curvature, and Cohomology. Vol. I: De Rham Cohomology of Manifolds and Vector Bundles. Pure and Applied Mathematics, vol. 47. New York-London: Academic Press.
Hirzebruch F (1956) Neue topologische Methoden in der algebraischen Geometrie (German). Ergebnisse der Mathematik und ihrer Grenzgebiete (N.F.), Heft 9. Berlin-Göttingen-Heidelberg: Springer-Verlag.
Husemoller D (1966) Fibre Bundles. New York-London-Sydney: McGraw-Hill.
Karoubi M (1978) K-theory. An Introduction. Grundlehren der Mathematischen Wissenschaften, Band 226. Berlin-New York: Springer-Verlag.
Kobayashi S (1987) Differential Geometry of Complex Vector Bundles. Publications of the Mathematical Society of Japan, 15. Kano Memorial Lectures, 5. Princeton, NJ: Princeton University Press; Tokyo: Iwanami Shoten.
Milnor JW and Stasheff JD (1974) Characteristic Classes. Annals of Mathematics Studies, No. 76. Princeton, NJ: Princeton University Press; Tokyo: University of Tokyo Press.
Steenrod NE (1951) The Topology of Fibre Bundles. Princeton Mathematical Series, vol. 14. Princeton, NJ: Princeton University Press.
Steenrod NE (1962) Cohomology Operations. Lectures by NE Steenrod written and revised by DBA Epstein. Annals of Mathematics Studies, No. 50. Princeton, NJ: Princeton University Press.
Stong RE (1968) Notes on Cobordism Theory. Mathematical Notes. Princeton, NJ: Princeton University Press; Tokyo: University of Tokyo Press.
Weyl H (1939) The Classical Groups. Their Invariants and Representations. Princeton, NJ: Princeton University Press.
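As a check on items 5-7 of the Summary of Formulas, the one-variable generating functions for the Todd class, the Hirzebruch L-polynomial, and the $\hat{A}$ genus can be expanded with exact rational arithmetic. This is a sketch under stated assumptions, not part of the original article: it treats a single indeterminate x (so $c_1 = x$, $c_2 = 0$ in the complex case and $p_1 = x^2$, $p_2 = 0$ in the real case), truncates at $x^4$, and uses hypothetical helper names.

```python
from fractions import Fraction
from math import factorial

N = 5  # keep the coefficients of x^0 .. x^4

def mul(a, b):
    """Product of two truncated power series (lists of N coefficients)."""
    out = [Fraction(0)] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                out[i + j] += ai * bj
    return out

def inverse(a):
    """Multiplicative inverse of a truncated power series with a[0] != 0."""
    inv = [Fraction(0)] * N
    inv[0] = 1 / a[0]
    for n in range(1, N):
        inv[n] = -sum(a[k] * inv[n - k] for k in range(1, n + 1)) / a[0]
    return inv

def even_only(coeff):
    """Series whose odd coefficients vanish; coeff(n) gives the even ones."""
    return [coeff(n) if n % 2 == 0 else Fraction(0) for n in range(N)]

# (1 - e^{-x}) / x = sum_n (-1)^n x^n / (n+1)!
one_minus_exp_over_x = [Fraction((-1) ** n, factorial(n + 1)) for n in range(N)]
cosh_x = even_only(lambda n: Fraction(1, factorial(n)))
sinh_x_over_x = even_only(lambda n: Fraction(1, factorial(n + 1)))
# sinh(x/2) / (x/2) = sum over even n of (x/2)^n / (n+1)!
sinh_half = even_only(lambda n: Fraction(1, factorial(n + 1) * 2 ** n))

todd = inverse(one_minus_exp_over_x)            # x / (1 - e^{-x})
l_series = mul(cosh_x, inverse(sinh_x_over_x))  # x / tanh(x)
a_hat = inverse(sinh_half)                      # x / (2 sinh(x/2))

print(todd, l_series, a_hat)
```

With $p_1 = x^2$ and $p_2 = 0$, the coefficients 1/3 and -1/45 match item 7, and -1/24 and 7/5760 match item 6; the coefficients 1/2 and 1/12 match item 5 with $c_2 = 0$.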

Chern-Simons Models: Rigorous Results

A N Sengupta, Louisiana State University, Baton Rouge, LA, USA
© 2006 Elsevier Ltd. All rights reserved.

Introduction

The relationship between topological invariants and functional integrals from quantum Chern-Simons theory discovered by Witten (1989) raised several challenges for mathematicians. Most of the tremendous amount of mathematical activity generated by Witten's discovery has been concerned primarily with issues that arise after one has accepted the functional integral as a formal object. This has left, as an important challenge, the task of giving rigorous meaning to the functional integrals themselves and of rigorously deriving their relation to topological invariants. The present article will discuss efforts to put the functional integral itself on a rigorous basis.

Chern-Simons Functional Integrals

We shall describe here the typical Chern-Simons functional integral. For the purposes of this article, we will confine ourselves to a simpler setting rather than the most general possible one. In fact, we shall work with fields over three-dimensional Euclidean space $\mathbb{R}^3$ (instead of a general 3-manifold).

The typical Chern-Simons functional integral is of the form

    $\int_{\mathcal{A}} e^{i(k/4\pi) S_{CS}(A)}\, W_{C_1,R_1}(A) \cdots W_{C_n,R_n}(A)\, DA$    [1]

Our objective in this section will be to specify what the terms in this formal integral mean. Very briefly, the integration is with respect to a formal "Lebesgue measure" on $\mathcal{A}$, an infinite-dimensional space of geometric objects A called connections over $\mathbb{R}^3$ with values in the Lie algebra LG of a group G. In the first term in the integrand, in the exponent, k is a real number, and $S_{CS}(A)$ is the Chern-Simons action for the connection A. Each term $W_{C_i,R_i}(A)$ is a Wilson loop observable, the trace in some representation $R_i$ of the holonomy of the connection A around the loop $C_i$. The entire integral, formal though it may be, provides an invariant associated with the system of loops $C_1, \ldots, C_n$.

Let G be a compact Lie group; for ease of exposition, let us take G to be a closed, connected subgroup of U(n). Thus, each element of G is an $n \times n$ complex matrix g with $g^*g = I$, the identity. The Lie algebra LG consists of all $n \times n$ matrices A which are skew-Hermitian, that is, satisfy $A^* = -A$, and for which $e^{tA} \in G$ for all real numbers t. On LG there is a convenient inner product given by

    $\langle A, B \rangle = -\mathrm{tr}(AB)$

This inner product is invariant under the conjugation action of the group G on its Lie algebra LG.

By a connection over $\mathbb{R}^3$ we shall mean a $C^\infty$ 1-form with values in LG. The set of all connections is an affine (in our case, actually a linear) space $\mathcal{A}$. If $A \in \mathcal{A}$, then define

    $S_{CS}(A) = \int_{\mathbb{R}^3} \mathrm{tr}\left(A \wedge dA + \tfrac{2}{3} A \wedge A \wedge A\right)$    [2]

This is, up to constant multiple, the Chern-Simons action functional.

Let A be a connection and consider a piecewise smooth path

    $C : [0, 1] \to \mathbb{R}^3$

With this one can associate a G-valued path $[0, 1] \to G : t \mapsto g(t) \in G$ satisfying the differential equation

    $g'(t)g(t)^{-1} = -A(C'(t))$

subject to the initial condition g(0) = I, the identity. The path $t \mapsto g(t)$ describes parallel transport along C by the connection A. If C is a loop then the final value g(1) is the holonomy of A around C. If R is a representation of G on some finite-dimensional vector space then the trace of R(g(1)) is the Wilson loop observable:

    $W_{C,R}(A) = \mathrm{tr}(R(g(1)))$    [3]

Thus, we have specified the meaning of the terms appearing in the formal integral [1], where $C_1, \ldots, C_n$ of eqn [1] form a link (a family of nonintersecting, imbedded loops) in $\mathbb{R}^3$ and $R_1, \ldots, R_n$ are finite-dimensional representations of G. Witten showed that, at least for suitable values of k, integrals of this form ought to produce topological invariants, which he identified, for the link.

The integral [1] is problematic for several reasons. First, there is no reasonable and useful analog of Lebesgue measure on an infinite-dimensional space. Even if one were to regularize this measure in some simple way, one would run into the problem that the measure would not live on the space of smooth connections, and so the integrand would become meaningless.

There are several different approaches to a mathematical interpretation of [1]. The approach that is often taken in practice is to simply ignore the analytical problem and define the value of the integral [1] to be what Witten's calculations have given. One approach, used, for instance, by Bar-Natan (1995), is to expand the integrand in a series and relate each individual integral in this expansion separately to topological invariants. Discrete approximation procedures to the continuum integral have also been explored. In the abelian case, infinite-dimensional oscillatory integral techniques have been used to understand the functional integral. Fröhlich and King (1999) showed the possibility of interpreting parallel transport using ideas from stochastic differential equations. Such an approach has been used successfully in the case of two-dimensional Yang-Mills theory, where the functional integral actually corresponds to integration with respect to a measure. In this article, we focus on a method of understanding the normalized Chern-Simons functional integral in terms of infinite-dimensional distribution theory and on examining some ideas for understanding Wilson loop expectation values in this setting.

Infinite Dimensional Distributions

Let $(x_0, x_1, x_2)$ denote the usual coordinates on $\mathbb{R}^3$. Gauge symmetry, an issue which will not be examined here, may be used to simplify the problem of the Chern-Simons integral. In particular, one

need only focus on connections which vanish in the $x_2$-direction, that is, connections of the form $A = A_0\,dx_0 + A_1\,dx_1$. For such A, the triple wedge-product term in the Chern-Simons action disappears, and we are left with the quadratic expression:

    $S_{CS}(A) = \int_{\mathbb{R}^3} \mathrm{tr}(A \wedge dA)$    [4]

This is good, since the functional integral now involves a quadratic exponent and so stands a good chance of rigorous realization, just as Gaussian measure can be given rigorous meaning in infinite dimensions. However, in the Chern-Simons situation, there is no hope of actually getting a measure, not even a complex measure.

The next best thing to a measure is a distribution or generalized function. A distribution over a space Y is a continuous linear functional on a topological vector space of functions on Y. Thus, the objective is to realize the Chern-Simons functional integral as a continuous linear functional on some space of test functions over $\mathcal{A}$ (more precisely, on an extension of $\mathcal{A}$). Before turning to the specific case of the Chern-Simons integral, let us examine some elements of the theory of infinite-dimensional distributions, inasmuch as they are relevant to our needs.

Let us consider a Hilbert space $E_0$, and a positive Hilbert-Schmidt operator T on $E_0$. For each integer $p \ge 0$, let $E_p = T^p(E_0)$, which is a Hilbert space with the inner product $\langle x, y \rangle_p = \langle T^{-p}x, T^{-p}y \rangle$. Then we have the chain of inclusions

    $E := \bigcap_{p \ge 1} E_p \subset \cdots \subset E_2 \subset E_1 \subset E_0$    [5]

with each inclusion $E_{p+1} \to E_p$ being Hilbert-Schmidt. Let $E_{-p} = E_p'$ be the topological dual of $E_p$, the space of continuous linear functionals on $E_p$, and let $E'$ be the topological dual of E, where the latter is given the topology generated by all the norms $\|\cdot\|_p$. Then we have the inclusions

    $E_0 \simeq E_0' \subset E_{-1} \subset E_{-2} \subset \cdots \subset E' = \bigcup_{p \ge 0} E_{-p}$    [6]

For each $x \in E$ there is the evaluation map $\hat{x} : E' \to \mathbb{R} : \phi \mapsto \phi(x)$. A very special case of a general theorem of Minlos guarantees that on the dual $E'$ there is a measure $\mu$ on the sigma algebra generated by all the functions $\hat{x}$ such that each $\hat{x}$ is a Gaussian random variable of mean zero and variance $|x|_0^2$, that is,

    $\int_{E'} e^{it\hat{x}}\, d\mu = e^{-t^2 |x|_0^2 / 2}$

for all $x \in E$ and $t \in \mathbb{R}$. This measure $\mu$ is the

The inner products $\langle\,,\rangle_p$ give rise to a nuclear space structure on function spaces over E. Let $\mathcal{U}$ be the algebra of functions on $E'$ generated by the exponentials $e^{\lambda\hat{x}}$, with x running over E and $\lambda$ over $\mathbb{C}$. For each $p \ge 0$, there is an inner product $\langle\langle\,,\rangle\rangle_p$ on $\mathcal{U}$ such that

    $\left\langle\left\langle e^{\hat{x} - |x|_p^2/2},\ e^{\hat{y} - |y|_p^2/2} \right\rangle\right\rangle_p = e^{\langle x, y \rangle_p}$    [7]

For p = 0 the left-hand side coincides with the $L^2(\mu)$ inner product. Let $[E]_p$ be the Hilbert space completion of $\mathcal{U}$ in the $\langle\langle\,,\rangle\rangle_p$ inner product. Then

    $\cdots \subset [E]_3 \subset [E]_2 \subset [E]_1 \subset [E]_0 = L^2(E', \mu)$    [8]

Let $[E] = \bigcap_{p \ge 0} [E]_p$, equipped with topology from all the norms $\|\cdot\|_p$, and $[E]'$ its topological dual. Elements of $[E]'$, being continuous linear functionals on the test function space [E], are called distributions over E, in the language of white-noise analysis.

A fundamental tool in the study of infinite-dimensional distributions is the S-transform. This generalizes the traditional Segal-Bargmann transform from the $L^2$-setting to the context of distributions. Let $E_c$ be the complexification of E. The inner product $\langle\,,\rangle_0$ on E extends to a complex-bilinear pairing $E_c \times E_c \to \mathbb{C} : (z, w) \mapsto z \cdot w$. The evaluation pairing $E' \times E \to \mathbb{R}$ also extends naturally to the complexifications. For $\Phi$ a distribution belonging to $[E]'$, define a function $S\Phi$ on $E_c$ by

    $S\Phi(z) = \Phi(c_z)$

for all $z \in E_c$. Here $c_z$ is the coherent state function on $E'$ given by $c_z(\phi) = e^{\phi(z) - (1/2) z \cdot z}$. A fundamental and useful result in white-noise analysis, due originally to Potthoff and Streit, specifies the range of the transform S and allows reconstruction of a distribution $\Phi$ from the function $S\Phi$. Briefly, the range of S consists of functions which are holomorphic, in an appropriate sense, and have at most quadratic exponential growth. In particular, this theorem implies that a function of the form $z \mapsto e^{a z \cdot z}$, for any constant a, is in the range of S.

Rigorous Realization of Chern-Simons Integrals

We return to the Chern-Simons context. As mentioned earlier, gauge symmetry may be invoked to reduce the space of connections to the smaller space:

    $E = X \oplus X$    [9]

where $X = \mathcal{S}(\mathbb{R}^3) \otimes LG$ is the space of rapidly decreasing functions with values in the Lie algebra LG. Let
$\to \infty$
standard Gaussian measure on E 0 for the infinite- d2 x2
T1  2
dimensional nuclear space E. dx 4
as a linear operator on $L^2(\mathbb{R})$, $T_2 = T_1^{\otimes 3} \otimes I$ the induced operator on $L^2(\mathbb{R}^3) \otimes LG$, and $T = T_2 \oplus T_2$. Then, as described in the preceding section, we have the space $E$ and its dual $E'$. There is then the standard Gaussian measure $\mu$ on $E'$, and the space $[E]'$ of distributions over $E'$.

The normalized Chern–Simons integral may be viewed as a linear functional

\[ \Phi_{CS} : F \mapsto \frac{1}{N} \int_E e^{i(k/4\pi) S_{CS}(A)} F(A)\, DA \tag{10} \]

where $N$ is a normalizing factor. Rigorous meaning can be given to this by first formally working out what the S-transform of $\Phi_{CS}$ ought to be. Calculation shows that $S\Phi_{CS}$ is indeed a holomorphic function on $E_c$ of quadratic growth. The Potthoff–Streit theorem then implies that $\Phi_{CS}$ does exist as a distribution in the space $[E]'$. Let us examine this in some more detail.

As before, we take $A$ to be of the form $A = A_0\,dx_0 + A_1\,dx_1$, with the component $A_2$ equal to 0. Integration by parts shows that

\[ \frac{k}{4\pi} S_{CS}(A) = -\frac{k}{2\pi} \int_{\mathbb{R}^3} \operatorname{tr}(A_0\, \partial_2 A_1)\, d\mathrm{vol} \tag{11} \]

A formal computation reveals that $S(\Phi_{CS})(j)$ should be given by

\[ \exp\left( \frac{2\pi i}{k} \operatorname{tr} \left\langle j_0, \partial_2^{-1} j_1 \right\rangle \right) \tag{12} \]

where $j = (j_0, j_1)$, and

\[ \partial_2^{-1} f(x) = \frac{1}{2} \int ds \left[ 1_{(-\infty, x_2]}(s) - 1_{(x_2, \infty)}(s) \right] f(x_0, x_1, s) \]

The Potthoff–Streit criterion implies the existence of a distribution $\Phi_{CS}$ whose S-transform is given by the above expression.

The distribution $\Phi_{CS}$ is, however, not a sufficiently powerful object to allow determination of the Wilson loop expectations that one would really like to have. For instance, $\Phi_{CS}$ does not live on the space of smooth connections and so the meaning of parallel transport needs to be defined. The state of knowledge, at the rigorous level, is still evolving at this point, with progress reported by A. Hahn. We describe some ideas for the Wilson loop expectations in the following.

The strategy for defining parallel transport along a path is to smear out the path by means of bump functions and essentially replace the path by a path of test functions in $E$. The description given here is mainly for the case of abelian $G$. Choose first a $C^\infty$ non-negative bump function $\psi$ on $\mathbb{R}^3$, vanishing outside the unit ball and having $L^1$ norm equal to 1. For $\epsilon > 0$, let $\psi^\epsilon$ be the scaled bump function given by $\psi^\epsilon(x) = \epsilon^{-3} \psi(x/\epsilon)$. Next, for a smooth loop $[0, 1] \to \mathbb{R}^3 : t \mapsto l(t) = (l_0(t), l_1(t), l_2(t))$, let $l^\epsilon(t) = \psi^\epsilon(\cdot - l(t))$, the scaled bump function centered now at the path point $l(t)$. Now consider a generalized connection $A = (A_0, A_1) \in E'$. Set

\[ B_A^{l^\epsilon}(t) = A_0\bigl(l^\epsilon(t)\bigr)\, l'(t)_0 + A_1\bigl(l^\epsilon(t)\bigr)\, l'(t)_1 \tag{13} \]

The equation of parallel transport can be reformulated as a differential equation for a matrix-valued path $t \mapsto P_A^{l^\epsilon}(t)$ satisfying

\[ \frac{d}{dt} P_A^{l^\epsilon}(t) + B_A^{l^\epsilon}(t)\, P_A^{l^\epsilon}(t) = 0 \tag{14} \]

and the initial condition $P_A^{l^\epsilon}(0) = I$. With this smearing, one can consider functions of the form

\[ W^\epsilon(L; A) = \prod_{i=1}^{n} \operatorname{tr} P_A^{l_i^\epsilon} \tag{15} \]

for a link $L$ consisting of loops $l_1, \ldots, l_n$, instead of the classical Wilson loop variable.

At this stage, it would be natural to consider taking $\epsilon \downarrow 0$ in $\Phi_{CS}(W^\epsilon(L))$. However, this is still problematic. A further regularization is needed, roughly corresponding to the geometric notion of framing. In the definition of $\Phi_{CS}$, an alteration is made to the quadratic form $Q(j, j)$ in the exponent which appears in the expression for $S(\Phi_{CS})$, replacing it with $Q(j, \phi_s j)$, where $\{\phi_s\}_{s>0}$ is a family of suitable diffeomorphisms of $\mathbb{R}^3$, with $\phi_0$ being the identity. In a sense, this splits a single loop $l$ into $l$ and a neighboring loop $\phi_s l$. At the end, one has to take $s \downarrow 0$. The resulting limiting value is the expected link invariant. We shall not go into the case of nonabelian $G$, which is more complex and on which work is still in progress.

Infinite-dimensional distributions can be used to formulate a rigorous theory for normalized Chern–Simons functional integrals. The more specific questions raised by the Wilson-loop integrals in this setting open up new problems for further developments in the distribution theory, connecting geometry, topology, and infinite-dimensional analysis.

Acknowledgments

This research is supported by US NSF grant DMS-0201683.

See also: BF Theories; Feynman Path Integrals; Fractional Quantum Hall Effect; Knot Theory and Physics; Large-N and Topological Strings; Large-N Dualities; Quantum 3-Manifold Invariants; Quantum Hall Effect; Spin Foams; String Field Theory; Topological Quantum Field Theory: Overview; Twistor Theory: Some Applications.
Further Reading

Albeverio S, Hahn A, and Sengupta AN (2003) Chern–Simons theory, Hida distributions, and state models. Infinite Dimensional Analysis Quantum Probability and Related Topics 6: 65–81.
Albeverio S and Schäfer J (1994) Abelian Chern–Simons theory and linking numbers via oscillatory integrals. Journal of Mathematical Physics (N.Y.) 36 (suppl. 5): 2135–2169.
Albeverio S and Sengupta A (1997) A mathematical construction of the non-Abelian Chern–Simons functional integral. Communications in Mathematical Physics 186: 563–579.
Altschuler D and Freidel L (1997) Vassiliev knot invariants and Chern–Simons perturbation theory to all orders. Communications in Mathematical Physics 187: 261–287.
Atiyah M (1990) The Geometry and Physics of Knot Polynomials. Cambridge: Cambridge University Press.
Bar-Natan D (1995) Perturbative Chern–Simons theory. Journal of Knot Theory and its Ramifications 4: 503.
Chern S-S and Simons J (1974) Characteristic forms and geometric invariants. Annals of Mathematics 99: 48–69.
Fröhlich J and King C (1989) The Chern–Simons theory and knot polynomials. Communications in Mathematical Physics 126: 167–199.
Kondratiev Yu, Leukert P, Potthoff J, Streit L, and Westerkamp W (1996) Generalized functionals in Gaussian spaces: the characterization theorem revisited. Journal of Functional Analysis 141 (suppl. 2): 301–318.
Kuo H-H (1996) White Noise Distribution Theory. Boca Raton, FL: CRC Press.
Landsman NP, Pflaum M, and Schlichenmaier M (2001) Quantization of Singular Symplectic Quotients. Basel–Boston–Berlin: Birkhäuser.
Leukert P and Schäfer J (1996) A rigorous construction of Abelian Chern–Simons path integrals using white noise analysis. Rev. Math. Phys. 8 (suppl. 3): 445–456.
Sen Samik, Sen Siddhartha, Sexton JC, and Adams DH (2000) Geometric discretization scheme applied to the Abelian Chern–Simons theory. Physical Review E 61: 3174–3185.
Simon B (1971) Distributions and their Hermite expansions. Journal of Mathematical Physics (N.Y.) 12: 140–148.
Witten E (1989) Quantum field theory and the Jones polynomial. Communications in Mathematical Physics 121: 351–399.

Classical Groups and Homogeneous Spaces

S Gindikin, Rutgers University, Piscataway, NJ, USA
© 2006 Elsevier Ltd. All rights reserved.

Classical groups are Lie groups corresponding to three classical geometries: linear, metric, and symplectic. Let us start with the complex field $\mathbb{C}$. We consider the linear space $\mathbb{C}^n$ and the group $GL(n; \mathbb{C})$ of its automorphisms, the nondegenerate (invertible) linear transformations. The complex linear metric space is the space $\mathbb{C}^n$ endowed with a nondegenerate symmetric bilinear form; the orthogonal group $O(n; \mathbb{C})$ is the subgroup in $GL(n; \mathbb{C})$ of automorphisms of this structure. If, for $n = 2l$, we replace the symmetric form by a nondegenerate skew-symmetric form, we obtain the linear symplectic space and the group $Sp(l; \mathbb{C})$ of its automorphisms, the symplectic group.

A fundamental observation of nineteenth-century geometry was that the transfer from the complex field to the real one gives not only three corresponding groups for $\mathbb{R}$ but a much richer collection of real forms of complex classical groups: unitary, pseudounitary, pseudoorthogonal, etc. (see below). Classical geometries correspond to homogeneous manifolds with classical groups of transformations. Geometers understood that this produces a very rich world of non-Euclidean geometries, including the first example of non-Euclidean geometry, hyperbolic geometry. Some classical algebraic theories obtain a geometrical interpretation through such an approach (see below the consideration of the cone of symmetric positive forms). Among the classical manifolds are Minkowski space, Grassmannians, and multidimensional analogs of the disk and the half-plane. A substantial part of this theory is a matrix geometry, which serves as a background for matrix analysis. A rich geometry on classical manifolds with many symmetries is a background for a rich multidimensional analysis with many explicit formulas. Classical geometries, starting with Minkowski geometry, have appeared in some problems of mathematical physics.

A crucial technical fact is the embedding of the classical groups in the class of semisimple Lie groups; it gives a very strong unified method to work with semisimple groups and the corresponding geometries, the method of roots. Nevertheless, some special realizations and constructions for classical groups can also be very useful. A very impressive example is the twistors of Penrose, where an initial construction is the realization of points of four-dimensional Minkowski space as lines in three-dimensional complex projective space. We mention below some general facts about semisimple groups and homogeneous manifolds, but the focus will be on special possibilities for the classical groups. The class of simple Lie groups contains, besides the classical groups, only a finite number of exceptional groups which are also very interesting and are connected, in particular, with noncommutative and nonassociative geometries; they have applications to mathematical physics.
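The orthogonal and symplectic groups named in this introduction are the automorphisms of a symmetric or a skew-symmetric bilinear form, which in matrix terms is the identity $g^\top F g = F$. A minimal Python sketch of that test follows. The helper names (`transpose`, `matmul`, `preserves_form`) and the sample matrices are illustrative assumptions, not notation from the article, and the skew form chosen here is one standard choice for $n = 2l = 2$.

```python
def transpose(m):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def preserves_form(g, form):
    """True when g^T * form * g == form (exact for integer entries)."""
    return matmul(matmul(transpose(g), form), g) == form


# Assumed sample forms: the unit matrix (symmetric) and a skew-symmetric J.
E2 = [[1, 0], [0, 1]]
J2 = [[0, 1], [-1, 0]]

rotation = [[0, 1], [-1, 0]]   # quarter turn
shear = [[1, 1], [0, 1]]       # unimodular shear
scaling = [[2, 0], [0, 1]]     # determinant 2

print(preserves_form(rotation, E2))  # orthogonal for the form E2
print(preserves_form(shear, J2))     # symplectic for the form J2
print(preserves_form(shear, E2))     # not orthogonal
print(preserves_form(scaling, J2))   # not symplectic
```

In dimension 2 the symplectic condition reduces to determinant 1, which is why the shear passes the skew-form test while the scaling does not.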
Complex Groups and Homogeneous Manifolds

Complex Classical Groups

The complete linear group $GL(n; \mathbb{C})$ is the group of nondegenerate matrices $g$ of order $n$ ($\det g \ne 0$) and the special linear group $SL(n; \mathbb{C})$ is its subgroup of matrices with determinant equal to 1 (the unimodular condition). The unimodular condition kills the one-dimensional center, leaving only a finite center. We realize the direct products of several copies of complete linear groups with different dimensions, for example, $GL(k; \mathbb{C}) \times GL(l; \mathbb{C})$, as the groups of block-diagonal nondegenerate matrices. The letter S always means that we take matrices with determinant 1. So the notation $S(GL(k; \mathbb{C}) \times GL(l; \mathbb{C}))$ means that we take block-diagonal matrices with blocks of sizes $k$, $l$ and with determinant 1.

Let $I$ be a nondegenerate symmetric matrix of order $n$; then the orthogonal group $O(n; \mathbb{C})$ is the subgroup in $GL(n; \mathbb{C})$ of matrices preserving the corresponding symmetric form, so that

\[ g^\top I g = I \]

These matrices can have determinant $\pm 1$. The special orthogonal group $SO(n; \mathbb{C})$ is the subgroup of orthogonal matrices with determinant 1. Different $I$'s give isomorphic orthogonal groups since they are all linearly equivalent. If we take as $I$ the unit matrix $E = E_n$, then we obtain the group of orthogonal matrices in the classical sense: $g^\top g = E$.

If $n = 2l$ and we replace in this definition the symmetric matrix $I$ by a nondegenerate skew-symmetric matrix $J$, we obtain the symplectic group $Sp(l; \mathbb{C})$. Again, different $J$'s give isomorphic groups. The typical example of $J$ is

\[ J = \begin{pmatrix} 0 & E_l \\ -E_l & 0 \end{pmatrix} \]

It is convenient then to represent matrices $g$ as

\[ g = \begin{pmatrix} A & B \\ C & D \end{pmatrix} \]

where the blocks $A$, $B$, $C$, $D$ are matrices of order $l$. Then the symplectic condition is that $A^\top D - C^\top B = E$ and the matrices $A^\top C$ and $D^\top B$ are symmetric. If $C = 0$ then $D = (A^\top)^{-1}$ and $A^{-1}B$ is a symmetric matrix. In this way, we have in $Sp(l; \mathbb{C})$ a subgroup $P$ of block-triangular matrices of a very simple structure; it is an example of the subgroups which are called parabolic.

There are two principal classes of homogeneous spaces with complex semisimple Lie groups: flag manifolds and Stein manifolds.

Flag Manifolds

These homogeneous spaces $F = G/P$ with semisimple (in our case classical) groups $G$ have parabolic subgroups $P$ as the isotropy subgroups. The group $G = GL(n; \mathbb{C})$ acts transitively on the flag manifolds $F(n_1, \ldots, n_r)$, $0 < n_1 < \cdots < n_r < n$, whose elements are $(n_1, \ldots, n_r)$-flags: sequences of embedded subspaces in $\mathbb{C}^n$ of the dimensions $(n_1, \ldots, n_r)$. The isotropy subgroup $P = P(n_1, \ldots, n_r)$ is the subgroup of block-triangular matrices with diagonal blocks of sizes $k_1, \ldots, k_{r+1}$, $k_j = n_j - n_{j-1}$, $n_0 = 0$, $n_{r+1} = n$. The flag manifolds are compact complex manifolds. The matrices proportional to the unit matrix $E_n$ act trivially and we can consider, instead of the action of $G = GL(n; \mathbb{C})$, the transitive action of $G = SL(n; \mathbb{C})$.

Let us pay particular attention to two extremal cases. The first one is the case of the maximal flag manifold, when we have the sequence of all integers $(1, 2, 3, \ldots, n - 1)$: complete flags; the subgroup $P$ in this case is called Borelian. The other case is that of minimal flag manifolds with $r = 1$ (for them the unipotent radical of the parabolic subgroups is commutative). Then in the case of $SL(n; \mathbb{C})$ the sequence has only one element $n_1 = k < n$ and we have the Grassmannian manifolds $Gr_{\mathbb{C}}(k; n) = F(k)$ of $k$-dimensional subspaces in $\mathbb{C}^n$. If $k = 1$ or $k = n - 1$, we obtain the dual realizations of the complex projective space $\mathbb{C}P^{n-1}$. We can interpret points of $Gr_{\mathbb{C}}(k; n)$ also as $(k - 1)$-dimensional planes in $\mathbb{C}P^{n-1}$.

We can define points of the projective space $\mathbb{C}P^{n-1}$ by homogeneous coordinates as the equivalency classes ($z \sim cz$, $z \in \mathbb{C}^n \setminus \{0\}$, $c \in \mathbb{C} \setminus 0$). For the Grassmannians we can similarly use matrix homogeneous coordinates (Stiefel's coordinates): classes of $(k \times n)$-matrices $Z \in \mathrm{Mat}(k, n)$ of the maximal rank $k$ relative to the equivalency

\[ Z \sim uZ, \quad u \in GL(k; \mathbb{C}) \]

The rows of a matrix $Z$ correspond to a base in the subspace with the homogeneous coordinate $Z$; left multiplication by a matrix $u$ replaces this base, but does not change the subspace. The group $GL(n; \mathbb{C})$ acts by right multiplications:

\[ Z \mapsto Zg \]

and this action preserves the equivalency classes. Suppose $k \le n - k$ and the left $k$-minor of $Z$ is not zero. Such matrices give the dense coordinate chart $\mathbb{C}^{k(n-k)}$: we can pick in the equivalency classes the representatives $(E_k, z)$, $z \in \mathrm{Mat}(k, n - k)$, and consider the matrices $z$ as (inhomogeneous) local coordinates. In the inhomogeneous coordinates the
action of the group has a matrix fractional linear form: let

\[ g = \begin{pmatrix} A & B \\ C & D \end{pmatrix}, \quad A \in \mathrm{Mat}(k),\; D \in \mathrm{Mat}(n - k),\; B \in \mathrm{Mat}(k, n - k),\; C \in \mathrm{Mat}(n - k, k) \]

Then we have the transformation in inhomogeneous coordinates:

\[ z \mapsto (A + zC)^{-1}(B + zD) \]

The condition $C = 0$ defines the parabolic subgroup, which has an affine action in inhomogeneous coordinates that is transitive in the coordinate chart. In such a way the Grassmannian is a compactification of $\mathbb{C}^{k(n-k)}$ (realized as a space of $k \times (n - k)$ matrices). If $n = 2k$, we can consider it as the compactification of the space of square matrices $z$ of order $k$ with the flat generalized conformal structure defined by translations of the isotropy cone $\{\det z = 0\}$.

There are similar constructions of flag manifolds for other classical groups. We will consider only the minimal flag manifolds. For $O(2k; \mathbb{C})$ we consider the isotropic Grassmannian $Gr^I_{\mathbb{C}}(k; 2k)$ of isotropic $k$-subspaces relative to the symmetric form $I$. We take the matrix realization of $Gr_{\mathbb{C}}(k; 2k)$, using Stiefel's homogeneous coordinates, and add the matrix equation

\[ Z I Z^\top = 0 \]

which is well defined in the homogeneous coordinates (compatible with the equivalency classes) and defines isotropic subspaces relative to $I$. This matrix cone is preserved by the subgroup $O(2k; \mathbb{C}) \subset GL(2k; \mathbb{C})$ corresponding to the matrix $I$. If we take the symmetric matrix

\[ I = \begin{pmatrix} 0 & E_k \\ E_k & 0 \end{pmatrix} \]

then in inhomogeneous coordinates ($z$ is a square $k$-matrix) this equation is transformed into the condition that the matrix $z$ is skew-symmetric. So, in a natural sense, the isotropic Grassmannian is the compactification of the linear space of skew-symmetric matrices $\mathrm{Alt}(k) = \mathbb{C}^N$, $N = k(k - 1)/2$.

A similar construction makes sense for the symplectic group: if we replace the symmetric form $I$ with the skew-symmetric form $J$, we obtain the equation of the matrix cone representing the Lagrangian Grassmannian $Gr^L_{\mathbb{C}}(k; 2k)$ of Lagrangian subspaces in the $2k$-dimensional linear symplectic space. If we were to choose $J$ as above, then in the (inhomogeneous) coordinate chart we obtain the condition that the matrix $z$ is symmetric. Thus, we have the (dense) coordinate chart on the Lagrangian Grassmannian $\mathbb{C}^N = \mathrm{Sym}(k)$, $N = k(k + 1)/2$: the linear space of symmetric matrices.

There is one more type of minimal flag manifold for the orthogonal group $SO(n; \mathbb{C})$: the quadric $Q$ in the projective space:

\[ I(z) = z I z^\top = 0 \]

where rows $z \in \mathbb{C}^n \setminus \{0\}$ represent, in homogeneous coordinates, points in $\mathbb{C}P^{n-1}$. If $I = E_n$ we have the equation $(z_1)^2 + \cdots + (z_n)^2 = 0$. This quadric is the complex compact conformal flat manifold $C\mathbb{C}^N$, $N = n - 2$; it is the compactification of $\mathbb{C}^N$ endowed with the flat conformal structure corresponding to the quadratic isotropic cone. The parabolic group is generated by linear conformal transformations and translations. On the quadric $Q$ the conformal structure is defined by intersections of tangent spaces with $Q$. Evidently, this structure is invariant relative to the natural action of $SO(n; \mathbb{C})$.

Classical Stein Manifolds

Such homogeneous complex manifolds $X = G/H$ have complex reductive isotropy subgroups $H$. Contrary to the flag manifolds, which are compact, these manifolds are Stein and there are many holomorphic functions on them. The typical examples for $G = GL(n; \mathbb{C})$ are the homogeneous spaces $S(k_1, \ldots, k_{r+1})$, $n = k_1 + \cdots + k_{r+1}$, for which the isotropy subgroups are block-diagonal matrices with blocks of sizes $k_1, \ldots, k_{r+1}$. Then points of the manifold can be realized as generic sets of subspaces $L_j \subset \mathbb{C}^n$, $\dim L_j = k_j$, $1 \le j \le r + 1$ or, what is equivalent, generic sets of $(k_j - 1)$-dimensional planes in $\mathbb{C}P^{n-1}$. Since the isotropy subgroup of such a homogeneous space is a subgroup of the parabolic subgroup $P(n_1, \ldots, n_r)$, $k_j = n_j - n_{j-1}$, we have the natural fibering $S(k_1, \ldots, k_{r+1}) \to F(n_1, \ldots, n_r)$ (it is simple to see this geometrically: the $i$th subspace of a flag in the base is the direct sum of the first $i$ subspaces representing a point in the fiber). This is a convenient tool to apply complex analysis on $S$ to the compact manifold $F$, where there are no nontrivial holomorphic functions. Let us emphasize that such a connection exists only for special classes of classical Stein manifolds.

Let us pay special attention to the subclass of symmetric Stein manifolds. For such manifolds $X$, the isotropy subgroup $H$ is fixed relative to a holomorphic involutive automorphism of $G$. Complex semisimple Lie groups $G$ (including classical ones) are symmetric Stein manifolds relative to the action of their square $G \times G$ by left and right multiplications.
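The matrix fractional linear action on the Grassmannian chart given earlier in this section, $z \mapsto (A + zC)^{-1}(B + zD)$, comes from right multiplication $Z \mapsto Zg$ on representatives $(E_k, z)$, so acting by $g$ and then by $h$ should agree with acting by the product $gh$. The following Python sketch checks this in exact rational arithmetic for $k = 2$, $n = 4$. The function names (`matmul`, `matadd`, `inverse`, `act`) and the sample matrices are assumptions made for illustration only.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def matadd(a, b):
    """Entrywise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def inverse(m):
    """Exact inverse by Gauss-Jordan elimination over the rationals."""
    n = len(m)
    aug = [[Fraction(x) for x in row] +
           [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular matrix: the image leaves the chart")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def act(z, g):
    """Apply z -> (A + zC)^(-1) (B + zD), where g = [[A, B], [C, D]]."""
    k = len(z)
    A = [row[:k] for row in g[:k]]
    B = [row[k:] for row in g[:k]]
    C = [row[:k] for row in g[k:]]
    D = [row[k:] for row in g[k:]]
    return matmul(inverse(matadd(A, matmul(z, C))),
                  matadd(B, matmul(z, D)))


z = [[1, 2], [3, 4]]
# C = 0: an element of the parabolic subgroup, acting as the translation z + E.
g = [[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]]
# C = E: a genuinely fractional linear element.
h = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1]]

print(act(act(z, g), h) == act(z, matmul(g, h)))
```

With `C = 0` and `A = D = E` the action reduces to a translation, which illustrates the affine action of the parabolic subgroup on the chart.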
Classical Stein manifolds for $SL(n; \mathbb{C})$ considered above are symmetric if $r = 1$, and we have the manifold of pairs of subspaces of complementary dimensions intersecting only in $\{0\}$. The simplest example is the manifold of pairs of different points of the projective line $\mathbb{C}P^1$. Let us point out again that the transition to the generic pairs of points transforms the compact complex manifold without nonconstant holomorphic functions into a Stein manifold with a large collection of holomorphic functions.

Some other examples of symmetric Stein manifolds are connected with classical geometry and linear algebra. The affine hyperboloid in $\mathbb{C}^n$,

\[ Q(z) = 1 \]

is a symmetric space for $G = O(n; \mathbb{C})$, $H = O(n - 1; \mathbb{C})$. We can compare it with the projective quadric $Q(z) = 0$, which is a minimal flag manifold. Let us remark that there is a duality here: it is possible to interpret points of the hyperboloid of dimension $n$ as generic hyperplane sections of the projective quadric of dimension $n - 1$.

The space $X$ of complex symmetric matrices of order $n$ with determinant 1 is symmetric for the group $SL(n; \mathbb{C})$, which acts by the changes of variables in the corresponding quadratic forms:

\[ z \mapsto g^\top z g, \quad g \in SL(n; \mathbb{C}) \]

The transitive action reflects the possibility of transforming such a form into a sum of squares. The isotropy subgroup is $SO(n; \mathbb{C})$.

The Stein symmetric manifold $X = SO(n; \mathbb{C})/S(O(k; \mathbb{C}) \times O(n - k; \mathbb{C}))$ is realized as the manifold of $k$-dimensional subspaces in $\mathbb{C}^n$ on which the restriction of the principal symmetric form $I$ is nondegenerate.

Isomorphisms in Small Dimensions

Isomorphisms of classical groups in small dimensions produce isomorphisms of some classical homogeneous manifolds. Such isomorphisms were very important in the history of geometry; below are a few examples. We will consider local isomorphisms (up to a finite center). We have $SL(2; \mathbb{C}) \cong SO(3; \mathbb{C})$. Let us realize $\mathbb{C}^3$ as the space of symmetric matrices $z$ of order 2. Then, as we remarked above, the two-dimensional submanifold $X$ of matrices with determinant 1 is the symmetric Stein manifold for the group $SL(2; \mathbb{C})$. On the other hand, we can take $\det z$ as the quadratic symmetric form $I$ in $\mathbb{C}^3$; then $X$ is the hyperboloid for this form and the action of $SL(2; \mathbb{C})$ on symmetric matrices gives the orthogonal transformations relative to this form $I$.

Similarly, we can interpret the local isomorphism $SO(4; \mathbb{C}) \cong SL(2; \mathbb{C}) \times SL(2; \mathbb{C})$. We realize $\mathbb{C}^4$ as the space of square matrices $z$ of order 2 with the symmetric quadratic form $I(z, z) = \det(z)$. Then left and right multiplications of $z$ by unimodular matrices ($z \mapsto uzv$, $u, v \in SL(2; \mathbb{C})$) induce orthogonal transforms for the form $I$, and any orthogonal transform can be represented in such a form (one can see it by a calculation of dimensions).

The local isomorphism $SL(4; \mathbb{C}) \cong SO(6; \mathbb{C})$ has a slightly more complicated nature. Let us consider the Grassmannian $Gr_{\mathbb{C}}(2; 4)$ of lines in the projective space $\mathbb{C}P^3$ with $2 \times 4$ matrices $Z$ as matrix homogeneous coordinates. Let $p_{ij}$, $i < j$, be the minors of $Z$ with the $i$th and $j$th columns. They are called Plücker coordinates on $Gr_{\mathbb{C}}(2; 4)$: the equivalency class of $Z$ is defined by the sequence of six numbers $p = (p_{ij}, 1 \le i < j \le 4) \ne (0, \ldots, 0)$ up to a constant factor. Thus, we have an imbedding of $Gr_{\mathbb{C}}(2; 4)$ in the projective space $\mathbb{C}P^5$. The image will be the quadric

\[ p_{12} p_{34} - p_{13} p_{24} + p_{14} p_{23} = 0 \]

Thus, we have the isomorphism of two flag manifolds, and the action of $SL(4; \mathbb{C})$ on the Grassmannian is transformed into orthogonal transformations of the four-dimensional quadric in $\mathbb{C}P^5$. The Plücker coordinates can be defined for any Grassmannian, but in other cases they do not produce isomorphisms with other flag manifolds; nevertheless, they realize them as intersections of quadrics in projective spaces.

Compact Classical Homogeneous Manifolds

Compact classical groups $U(n)$, $SU(n)$, $O(n)$, $SO(n)$, $Sp(l)$ are maximal compact subgroups in the corresponding classical complex groups $GL(n; \mathbb{C})$, $SL(n; \mathbb{C})$, $O(n; \mathbb{C})$, $SO(n; \mathbb{C})$, $Sp(l; \mathbb{C})$. This condition defines them up to an isomorphism. They are fixed subgroups of some antiholomorphic involutive automorphisms.

The unitary groups $U(n)$ and $SU(n)$ are the groups of unitary matrices ($g^* g = E$) and, correspondingly, of unitary matrices with determinant 1. As the compact orthogonal group we can take the intersection $U(n) \cap O(n; \mathbb{C})$. For the standard form $I$, it will be the group of real orthogonal matrices: $g^\top g = E$ (so the involution in $O(n; \mathbb{C})$ is the conjugation $g \mapsto \bar{g}$). Similarly, we can take $Sp(l) = SU(2l) \cap Sp(l; \mathbb{C})$ (then the involution is $g \mapsto -J\bar{g}J$).

Compact classical groups act on compact homogeneous Riemann manifolds. There are two mechanisms connecting compact and complex homogeneous manifolds. We observe the first possibility in the case of flag manifolds which are
compact. We considered them so far relative to the action of complex (noncompact) groups. It turns out that on the flag manifold $F = G/P$ the maximal compact subgroup $U \subset G$ continues to be transitive: so we can consider flag manifolds also as being homogeneous with compact groups. Then $F = U/C$, where $C$ is the centralizer of a torus in $U$. There is a Kähler metric on $F$, invariant relative to $U$. Thus, $G$ is the group of all automorphisms of $F$ as a complex manifold, but $U$ is the group of its automorphisms as a Kähler manifold. This defines two sides of the geometry of flag manifolds: complex and Kähler. Flag manifolds are the only compact homogeneous Kähler manifolds with semisimple Lie groups (the class of all compact Kähler manifolds also contains locally flat compact manifolds, the tori). In the example considered above we have $F(n_1, \ldots, n_r) = SU(n)/S(U(k_1) \times \cdots \times U(k_{r+1}))$. In the language of Stiefel (homogeneous) coordinates, we fix a positive Hermitian form in $\mathbb{C}^n$ and characterize subspaces by orthonormal bases. For $r = 1$ we have Grassmannians $Gr_{\mathbb{C}}(k; n)$, in particular the projective space $\mathbb{C}P^{n-1}$, which we consider relative to the action of the unitary groups. Relative to this action they are Hermitian symmetric spaces. In the case of minimal flag manifolds for other groups the action of maximal compact subgroups also defines on them the structure of compact Hermitian symmetric spaces. Let us emphasize that relative to noncompact groups of biholomorphic automorphisms $G$, the minimal flag manifolds (including the Grassmannians) are not symmetric.

In the case of homogeneous Stein manifolds $X = G/H$, the picture is different: the maximal compact subgroups have no open orbits. There are totally real orbits which are the compact forms of $X$: $X_{\mathbb{R}} = G_{\mathbb{R}}/H_{\mathbb{R}}$, where $G_{\mathbb{R}}$ and $H_{\mathbb{R}}$ are compact forms of $G$ and $H$, respectively. It is the canonical embedding of compact homogeneous manifolds in their complexifications. The important special case is the embedding of compact symmetric manifolds in the Stein symmetric manifolds, their complexifications.

For compact symmetric manifolds $X = U/K$ the groups $U$, $K$ are compact Lie groups and elements of $K$ are fixed for an involutive automorphism $\sigma$ such that $K$ contains the connected component of the subgroup of all fixed elements of $\sigma$. This possibility to connect several symmetric manifolds with one involution is illustrated by the next example. The sphere $S^{n-1} \subset \mathbb{R}^n$ is the symmetric space $SO(n)/SO(n - 1)$; the real projective space $\mathbb{R}P^{n-1}$ is $SO(n)/O(n - 1)$. Here $SO(n - 1)$ is the connected component of $O(n - 1)$ and $S^{n-1}$ is a double covering of $\mathbb{R}P^{n-1}$. A few more examples: the real Grassmannian $Gr_{\mathbb{R}}(k; n)$ of $k$-subspaces in $\mathbb{R}^n$ can be defined as $SO(n)/S(O(k) \times O(n - k))$. This representation corresponds to the characterization of subspaces by orthonormal bases. The consideration of arbitrary bases defines the action of the larger group $GL(n; \mathbb{R})$ on $Gr_{\mathbb{R}}(k; n)$. Relative to this action, the real Grassmannian is not symmetric since the isotropy subgroup is parabolic and is not involutive. Such a possibility to extend the group is typical for a class of compact symmetric manifolds called symmetric R-spaces. They are real forms of Hermitian compact symmetric manifolds (minimal flag manifolds). Let us also mention the compact symmetric space $SU(n)/SO(n)$, which is the compact form of the space of unimodular symmetric matrices and can be presented by the submanifold of unitary matrices in it. Also, all compact Lie groups $G$ are symmetric spaces relative to the action of $G \times G$.

Noncompact Riemannian Symmetric Manifolds

This class of symmetric manifolds has the strongest connections with classical mathematics. Let us consider noncompact real semisimple Lie groups: real forms of complex semisimple Lie groups. They correspond to antiholomorphic involutions in complex groups.

Among the real forms of $SL(n; \mathbb{C})$ are the real and quaternionic unimodular groups $SL(n; \mathbb{R})$, $SL(n; \mathbb{H})$ and the pseudounitary groups $SU(p, q)$ of complex matrices preserving a Hermitian form $H$ of signature $(p, q)$. The complex orthogonal group has as real forms, in particular, the pseudoorthogonal groups $SO(p, q)$ of real matrices preserving a quadratic form of signature $(p, q)$.

Let $G$ be a real simple Lie group and $K$ be its maximal compact subgroup. Then $X = G/K$ is a Riemann symmetric manifold of noncompact type; $K$ is defined by an involutive automorphism of $G$. Therefore, in the irreducible situation there is a correspondence between noncompact Riemann symmetric manifolds and real simple noncompact Lie groups. $K$-orbits on $X$ are parametrized by points of the orbit on $X$ of a maximal abelian subgroup $A$, the Cartan subgroup of the symmetric space $X$. Its dimension $l$ is the important invariant of $X$, its rank. The algebraic base for the geometry of $X$ is the Iwasawa decomposition

\[ G = KAN \]

where $N$ is a maximal unipotent subgroup (in a natural sense compatible with $A$). Then the parabolic subgroup $P = AN$ is transitive on $X$.
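For a concrete feel of the decomposition $G = KAN$, one can take $G = GL(n; \mathbb{R})$ with $K = O(n)$, $A$ the positive diagonal matrices, and $N$ the upper triangular matrices with unit diagonal; in that case the factors can be read off from Gram–Schmidt orthogonalization of the columns of $g$. The Python sketch below works under exactly those assumptions (this choice of $G$ is an illustration, since $GL(n; \mathbb{R})$ is not itself simple), and the function names are hypothetical.

```python
import math


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def diag(values):
    """Diagonal matrix with the given diagonal entries."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)]
            for i in range(n)]


def iwasawa(g):
    """Split an invertible real matrix as g = k * diag(a) * n.

    k has orthonormal columns, a is a list of positive numbers,
    n is upper triangular with ones on the diagonal.
    """
    size = len(g)
    columns = transpose(g)
    q = []                                   # orthonormalized columns
    r = [[0.0] * size for _ in range(size)]  # upper triangular factor
    for j, v in enumerate(columns):
        w = list(v)
        for i, u in enumerate(q):
            r[i][j] = sum(x * y for x, y in zip(u, v))
            w = [wi - r[i][j] * ui for wi, ui in zip(w, u)]
        r[j][j] = math.sqrt(sum(x * x for x in w))
        if r[j][j] == 0.0:
            raise ValueError("matrix is not invertible")
        q.append([x / r[j][j] for x in w])
    k = transpose(q)
    a = [r[i][i] for i in range(size)]
    n = [[r[i][j] / a[i] for j in range(size)] for i in range(size)]
    return k, a, n


g = [[2.0, 1.0], [1.0, 3.0]]
k, a, n = iwasawa(g)
print(matmul(matmul(k, diag(a)), n))  # reproduces g up to rounding
```

The same factorization explains the transitivity of $P = AN$ on $X = G/K$: every coset $gK$ contains a representative built from the triangular part alone.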
Symmetric Cones

Let us start with $X = GL(n, \mathbb{R})/O(n)$. This manifold corresponds to the classical theory of quadratic forms: $X$ can be realized as the manifold $\mathrm{Sym}_+(n)$ of symmetric positive matrices $x \gg 0$ of order $n$ (corresponding to positive quadratic forms). Then the transitivity of $GL(n; \mathbb{R})$ on $X$ corresponds to the possibility to transform positive forms to a sum of squares. The sufficiency of triangle matrices for such transformations corresponds to the transitivity on $X = \mathrm{Sym}_+(n)$ of the parabolic subgroup $P$ of (upper) triangle matrices with positive diagonal elements. So $A$ is the group of diagonal matrices with positive elements and the submanifold of diagonal matrices in $X$ parametrizes $K$-orbits. The general fact about $A$-parametrization in this example is the classical fact about the reduction of quadratic forms to diagonal form by orthogonal transformations.

There are complex and quaternionic versions of this picture. The symmetric manifold $X = GL(n; \mathbb{C})/U(n)$ is realized as the manifold $\mathrm{Herm}_+(n)$ of positive complex Hermitian matrices (forms) and similarly classical facts of linear algebra on Hermitian quadratic forms are transformed into geometrical statements on symmetric spaces. Let us emphasize that we consider here the group $GL(n; \mathbb{C})$ as the real group. The same situation exists with the manifold $\mathrm{Herm}_+(\mathbb{H}, n)$ of positive quaternionic Hermitian matrices, which is the symmetric manifold for the real group $GL(n; \mathbb{H})$.

These three manifolds can be included in an impressive geometrical structure. They all are convex homogeneous cones $V$ in linear spaces $\mathbb{R}^N$ which are self-dual ($V = V^*$) relative to a bilinear form $\langle \cdot, \cdot \rangle$. Let us recall that

$$V^* = \{x;\ \langle x, y \rangle > 0,\ y \in \bar{V} \setminus 0\}$$

Here $\bar{V}$ is the closure of $V$. So these three symmetric manifolds are linear homogeneous self-dual cones.

There is only one more type of classical homogeneous self-dual cones: quadratic (Lorentzian) cones

$$L_n = \{x \in \mathbb{R}^{n+1};\ x_1^2 - x_2^2 - \cdots - x_{n+1}^2 > 0,\ x_1 > 0\}$$

which is also called the future light cone (the condition $x_1 < 0$ defines the past light cone). The group of linear automorphisms of this cone is $SO(1, n) \times \mathbb{R}_+$; the first factor is the Lorentz group.

There is also one exceptional 27-dimensional cone; it is possible to interpret this cone as the cone of positive Hermitian matrices of third order over Cayley numbers. There is a very nice structural theory of homogeneous self-dual cones; it is convenient to develop this theory in the language of Jordan algebras (Faraut and Koranyi 1994). Such cones participate as elements of explicit constructions of other classes of symmetric spaces (see below).

Following Siegel, it is possible to connect with homogeneous self-dual cones multidimensional versions of Euler integrals ($\Gamma$- and $B$-functions) (Faraut and Koranyi 1994). They have many applications, including those to integral formulas for complex symmetric domains.

Riemann Symmetric Manifolds of Rank 1

The first example of non-Euclidean geometry is connected with the Riemann symmetric manifolds of rank 1: hyperbolic spaces; $X = SO(1, n)/O(n)$ is the hyperbolic space of dimension $n$. It can be realized as the upper sheet of the two-sheeted hyperboloid:

$$x_0^2 - x_1^2 - \cdots - x_n^2 = 1,\quad x_0 > 0$$

Pseudoorthogonal linear transformations from $SO(1, n)$ preserve this surface; they play the role of hyperbolic motions. The equivalent realization is in the real ball $x_1^2 + \cdots + x_n^2 < 1$ relative to the projective transformations preserving this ball.

Another example of a Riemann symmetric manifold of rank 1 is the complex hyperbolic symmetric space $X = SU(1; n)/U(n)$. It can similarly be realized either as the hyperboloid

$$|z_0|^2 - |z_1|^2 - \cdots - |z_n|^2 = 1$$

in $\mathbb{C}^{n+1}$ relative to pseudounitary linear transformations or as the complex ball $|z_1|^2 + \cdots + |z_n|^2 < 1$ relative to complex projective transformations preserving it. There are also quaternionic hyperbolic spaces which are realized as the quaternionic balls in the quaternionic projective spaces. These three series exhaust all classical Riemann symmetric manifolds of rank 1 of noncompact type. There is only one exceptional symmetric manifold of rank 1: it has the dimension 16 and can be interpreted as a two-dimensional ball for Cayley numbers.

Classical Symmetric Domains in $\mathbb{C}^n$ (Cartan Domains)

Riemann symmetric manifolds of noncompact type which admit an invariant complex structure also have an invariant Hermitian form corresponding to the Riemann metrics. For this reason, we will call them noncompact Hermitian symmetric manifolds (we considered above the compact Hermitian symmetric manifolds). They are Stein manifolds, but as opposed to symmetric Stein manifolds, which we considered above, they are homogeneous relative to real groups. The condition for a Riemann symmetric
manifold $X = G/K$ to be Hermitian is that $K$ has a one-dimensional center. All Hermitian symmetric manifolds of noncompact type can be realized as bounded domains in $\mathbb{C}^n$ (but, of course, not all their holomorphic automorphisms extend in $\mathbb{C}^n$). In the case of classical manifolds, these domains are called Cartan's domains: Cartan gave their explicit matrix realizations.

The nature of groups of holomorphic automorphisms of symmetric domains $X = G/K \subset \mathbb{C}^N$ is explained by Cartan's duality. Each such domain (Hermitian symmetric manifold of noncompact type) admits an embedding in a Hermitian symmetric manifold of compact type $X_C$ such that the complexification $G_C$ of $G$ is the group of holomorphic automorphisms of $X_C$ (correspondingly, $X$ is an open $G$-orbit on $X_C$). Moreover, $X$ lies inside a (Zariski open) coordinate chart $\mathbb{C}^N$, which is an orbit of a parabolic subgroup.

The simplest example is the complex ball $\mathbb{CB}^n$ (complex hyperbolic space) imbedded in the complex projective space $\mathbb{CP}^n$. The affine chart $\mathbb{C}^n$ is the orbit of the parabolic subgroup of affine transformations. Let us consider more complicated examples.

Let $X_C$ be the Grassmannian $\mathrm{Gr}_C(k; n)$, $q = n - k \geq p$; we will use matrix homogeneous coordinates $Z$ ($k \times n$ matrices) for the description of the symmetric domain. Then $G_C = SL(n; \mathbb{C})$. Let us take its real form $G = SU(k; q)$, $k + q = n$. We fix a Hermitian form $H$ of the signature $(k, q)$ and realize $G$ as the group of matrices preserving $H$:

$$g H g^* = H$$

Then $X = X_{k,q} = SU(k, q)/S(U(k) \times U(q))$ can be realized as the domain in the Grassmannian

$$Z H Z^* \gg 0$$

so that this Hermitian matrix of order $k$ must be positive. It is essential that this condition is invariant relative to multiplications of $Z$ on nondegenerate matrices $u$ on the left and, therefore, it is a well-defined condition in homogeneous coordinates.

Let us specify the choice of $H$:

$$H_1 = \begin{pmatrix} E_k & 0 \\ 0 & -E_q \end{pmatrix}$$

Then the corresponding domain $X_1$ is defined in inhomogeneous coordinates $Z = (E_k, z)$, $z \in \mathrm{Mat}(k, q)$, by the condition

$$E_k - z z^* \gg 0$$

This matrix ball lies completely in the coordinate chart $\mathbb{C}^{kq}$. Its rank is equal to $\min(k, q)$. Thus, we have the realization of this Hermitian symmetric space as a bounded domain in $\mathbb{C}^N$, $N = kq$. In the case $k = 1$, we have the usual (scalar) complex ball. Let us remark that the edge of the boundary (Shilov's boundary) is the compact symmetric space

$$z z^* = E_k$$

with the group of automorphisms $S(U(k) \times U(q))$ (the isotropy subgroup of $X$). For $k = q$ the edge coincides with the set of unitary matrices $U(k)$.

Different forms $H$ of the signature $(k, q)$ are linearly equivalent and they correspond to different (biholomorphically equivalent) realizations of this Hermitian symmetric space. Let us, in the beginning, set $k = q$; the inhomogeneous matrix coordinates are square matrices of order $k$. Let us take the form

$$H_2 = \begin{pmatrix} 0 & iE_k \\ -iE_k & 0 \end{pmatrix}$$

Then, in inhomogeneous matrix coordinates, we have the domain $X_2$:

$$\frac{1}{i}(z - z^*) \gg 0$$

(complex matrices with positive skew-Hermitian parts). This domain (but not its boundary) lies in the chart. It has the structure of the tube domain $T = \mathbb{R}^n + iV$, $n = k^2$, corresponding to the symmetric cone of positive Hermitian matrices (we take the space of such matrices as a real form of $\mathbb{C}^n$). The group of affine transformations of the tube domain:

$$z \mapsto u z u^* + a,\quad u \in GL(k; \mathbb{C}),\ a \in \mathrm{Herm}(k)$$

is transitive on $X_2$; it is the parabolic subgroup in $SU(k, q)$.

The biholomorphic equivalency of the realizations of $X$ corresponding to different $H$ is induced by the equivalency of these forms. We have

$$H_2 = \gamma H_1 \gamma^*,\qquad \gamma = \frac{\sqrt{2}}{2}\begin{pmatrix} E_k & -iE_k \\ -iE_k & E_k \end{pmatrix}$$

Then the transform $Z \mapsto Z\gamma$ transforms $X_2$ in $X_1$. In inhomogeneous coordinates it is the fractional linear matrix transform

$$z \mapsto i(z + iE_k)^{-1}(z - iE_k)$$

It is the matrix version of the classical Cayley transform. Similarly, we can write down the inverse transform.

If $q \neq k$, then there is also an analog of the tube realization. Let $r = q - k > 0$ and

$$H_2 = \begin{pmatrix} 0 & iE_k & 0 \\ -iE_k & 0 & 0 \\ 0 & 0 & -E_r \end{pmatrix}$$
Let us represent the inhomogeneous coordinates as $Z = (E_k, w, u)$, $w \in \mathrm{Mat}(k)$, $u \in \mathrm{Mat}(k, r)$. Then the domain $X_2$ is defined by the condition

$$\frac{1}{i}(w - w^*) - u u^* \gg 0$$

This is an example of Siegel domains of the second kind (Pyatetskii-Shapiro 1969). This domain has a transitive group of affine transformations:

$$(w, u) \mapsto \left(w + a + i\,u b^* + \tfrac{i}{2}\,b b^*,\ u + b\right),\quad a \in \mathrm{Herm}(k),\ b \in \mathrm{Mat}(k, r)$$

$$(w, u) \mapsto (c w c^*,\ c u),\quad c \in GL(k; \mathbb{C})$$

This class of symmetric domains in Grassmannians is called Cartan's domains of the first class.

There are similar constructions for minimal flag domains (compact Hermitian symmetric spaces) with other groups. Let us consider the Lagrangian Grassmannian $\mathrm{Gr}^L_C(k; 2k)$ corresponding to the form $J$ above. Here $G_C = Sp(k, \mathbb{C})$. Its real form $G = Sp(k; \mathbb{R})$ can be realized as the subgroup of complex symplectic matrices preserving a Hermitian form $H$ of the signature $(k, k)$. In other words, we intersect the domains from the last example with the Lagrangian Grassmannians. We consider the coordinate chart with inhomogeneous coordinates given by symmetric matrices $z \in \mathrm{Sym}(k)$. For $H_1$ we have the domain of symmetric matrices $z$ with the condition

$$E_k - z \bar{z} \gg 0,\quad z = z^{\top}$$

This bounded realization is called Siegel's disk. For $H_2$ the real form is the group of real symplectic matrices and $X_2$ is the domain

$$\Im z = \frac{1}{2i}(z - \bar{z}) \gg 0,\quad z = z^{\top}$$

of complex symmetric matrices with positive imaginary parts; it is called Siegel's half-plane. This is the third class of Cartan's domains. There are Siegel domains of second kind connecting with the cones of positive symmetric matrices; some of them are homogeneous, but they are never symmetric.

There are two more series of classical minimal flag manifolds: the isotropic Grassmannians and quadrics. They both contain the dual bounded symmetric domains (Cartan's domains of second and fourth classes correspondingly). Some of these domains in the isotropic Grassmannians admit the realizations as tubes with the cone of positive Hermitian quaternionic matrices and others as Siegel domains of the second kind corresponding to the same cones.

Symmetric domains in quadrics can be realized as tube domains with the Lorentzian (light) cones. The corresponding tubes are called the future (past) tube, depending on which light cone was taken. Let us consider this construction. The group of holomorphic automorphisms of these domains is $G = SO(2; n)$, the conformal extension of the Lorentz group. To realize this group, let us fix a real symmetric matrix $Q$ of signature $(2, n)$ and the group is the group of linear transformations preserving simultaneously the quadratic symmetric and Hermitian forms with this matrix $Q$:

$$g^{\top} Q g = Q,\quad g^* Q g = Q$$

The standard realization corresponds to the diagonal matrix $Q$ with the diagonal $(1, 1, -1, \dots, -1)$. Cartan's domains of the fourth class are connected components of the manifold

$$Z Q Z^{\top} = 0,\quad Z Q Z^* > 0$$

where rows $Z$ are homogeneous coordinates in the projective space $\mathbb{CP}^{n+1}$. In other words, we consider a domain on the quadric in the projective space (which is the complex flat conformal space $\mathbb{CC}^n$). For the standard $Q$ the domain will lie in the coordinate chart; thus it is the bounded realization. For the tube realization, we take

$$Q = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & E_{1,n-1} \end{pmatrix}$$

where the lower-right block $E_{1,n-1} = \mathrm{diag}(1, -1, \dots, -1)$ is the matrix of the form $q$ defined next, so that $Q$ has signature $(2, n)$. Let $Z = (z_0, z_1, w_1, \dots, w_n)$, $w = u + iv$, $q(s, t) = s_1 t_1 - s_2 t_2 - \cdots - s_n t_n$ and we consider the affine chart $\mathbb{C}^{n+1} = \{z_0 = 1\}$. We have

$$Z Q Z^{\top} = 2 z_1 + q(w, w) = 0$$

$$Z Q Z^* = 2\Re z_1 + q(w, \bar{w}) > 0$$

The first condition gives $2\Re z_1 = q(v, v) - q(u, u)$ and then the second condition gives the final description of the considered set in $\mathbb{C}^n_w$:

$$q(v, v) = v_1^2 - v_2^2 - \cdots - v_n^2 > 0,\quad w = u + iv$$

as the union of the future and the past tubes ($T_{\pm} = \{v_1 \gtrless 0\}$). The edge $\mathbb{R}^n$ of these tubes ($v = 0$) has the structure of the Minkowski space corresponding to the form $q$. The parabolic subgroup is the affine conformal group of the Minkowski space. It includes the Poincare group and is transitive on tubes. The complete group of holomorphic automorphisms of tubes $G = SO(2, n)$ is the group of all (not only affine) conformal transformations of the Minkowski space. The complete edge of these symmetric domains in the quadric $\mathbb{CC}^n$ is the conformal compactification of the Minkowski space (a compact symmetric R-space with the compact group $S(O(2) \times O(n))$ on which the noncompact group $SO(2, n)$ also acts).
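The passage from the two conditions on $Z$ to the inequality $q(v, v) > 0$ can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes the tube form of $Q$ to have blocks $\begin{pmatrix}0&1\\1&0\end{pmatrix}$ and $\mathrm{diag}(1, -1, \dots, -1)$ (a sign convention reconstructed so that $Q$ has signature $(2, n)$), works in the chart $z_0 = 1$, and uses the hypothetical helper names `q_form` and `tube_point`.

```python
import numpy as np

def q_form(s, t):
    """Bilinear form q(s, t) = s1*t1 - s2*t2 - ... - sn*tn (no conjugation)."""
    return s[0] * t[0] - np.dot(s[1:], t[1:])

def tube_point(w):
    """Row Z = (1, z1, w) with z1 chosen so that Z Q Z^T = 0."""
    z1 = -q_form(w, w) / 2
    return np.concatenate(([1.0, z1], w))

n = 4
Q = np.zeros((n + 2, n + 2))
Q[0, 1] = Q[1, 0] = 1.0                          # hyperbolic 2x2 block
Q[2:, 2:] = np.diag([1.0] + [-1.0] * (n - 1))    # assumed matrix of the form q

rng = np.random.default_rng(1)
u, v = rng.standard_normal(n), rng.standard_normal(n)
w = u + 1j * v
Z = tube_point(w)

quadric = Z @ Q @ Z                    # value of Z Q Z^T, should vanish
hermitian = (Z @ Q @ Z.conj()).real    # value of Z Q Z^*, should be 2 q(v, v)
```

With these conventions the Hermitian form restricted to the quadric equals $2q(v, v)$, so its positivity depends only on the imaginary part $v$ of $w$, which is exactly the tube description.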
In addition to four Cartan's classes of classical domains there are two exceptional symmetric domains in the dimensions 27 and 16 (dual to two exceptional compact Hermitian symmetric spaces of these dimensions). The first of them can be realized as the tube domain corresponding to the exceptional cone of positive Hermitian matrices with Cayley numbers of order 3 (the dimension 27) and another can be realized as a Siegel domain of the second kind associated with the eight-dimensional future tube. It is possible, using $\Gamma$-function of self-dual homogeneous cones, to write explicit Bergman and Cauchy-Szego integral formulas.

Noncompact Symmetric R-Spaces

There are several other interesting noncompact symmetric manifolds. Let us mention the noncompact symmetric R-spaces which are real forms of complex symmetric domains. The typical example is the domain of real square matrices $x \in \mathrm{Mat}(k)$:

$$E_k - x x^{\top} \gg 0$$

The condition is that this symmetric matrix is positive. It is the Riemann symmetric space with the group $G = SO(k, k)$. It can be embedded in the real Grassmannian $\mathrm{Gr}_R(k; 2k)$ with the matrix homogeneous coordinates $X \in \mathrm{Mat}_R(k, 2k)$ and the group $SL(2k; \mathbb{R})$ acting on $X$ by right multiplications. Let

$$I_1 = \begin{pmatrix} E_k & 0 \\ 0 & -E_k \end{pmatrix}$$

and $SO(k, k)$ be the subgroup of matrices preserving the quadratic form $I_1$: $g I_1 g^{\top} = I_1$. This group will preserve the domain $X I_1 X^{\top} \gg 0$ and, in the inhomogeneous coordinates $X = (E_k, x)$, $x \in \mathrm{Mat}_R(k)$, it will be exactly the same as the domain above. The group $SO(k, k)$ acts by matrix fractional linear transformations. This domain is the real form of Siegel's ball. If we replace the form by

$$I_2 = \begin{pmatrix} 0 & E_k \\ E_k & 0 \end{pmatrix}$$

then we realize our symmetric manifold as the domain

$$x + x^{\top} \gg 0$$

So, the symmetric part of the matrix $x$ must be positive. This realization is homogeneous relative to the linear automorphisms: $x \mapsto a x a^{\top} + b$, $a \in GL(k; \mathbb{R})$, $b = -b^{\top}$. A similar construction exists for rectangular matrices.

Geometry of Isomorphisms in Small Dimensions

We connected above several local isomorphisms of complex classical groups with some geometrical facts. Let us mention now several similar examples for real groups. We start from isomorphisms of symmetric cones. The cone $\mathrm{Sym}_+(2)$ of symmetric positive matrices of second order is (linearly) isomorphic to the future light cone $L(2)$. The comparison of the groups of automorphisms gives the local isomorphism

$$SL(2; \mathbb{R}) \cong SO(1; 2)$$

This isomorphism corresponds also to the isomorphism of two classical realizations of hyperbolic plane of Poincare and Klein. Let us also mention that the isomorphism $SL(2, \mathbb{R}) \cong SU(1, 1)$ corresponds to the holomorphic equivalency of the disk and the upper half-plane. The isomorphism $\mathrm{Herm}_+(2) = L(3)$ corresponds to the presentation of any Hermitian matrix of the order 2 in Pauli's coordinates,

$$z = \begin{pmatrix} t - x_1 & x_2 + i x_3 \\ x_2 - i x_3 & t + x_1 \end{pmatrix}$$

Then, $\det z = t^2 - x_1^2 - x_2^2 - x_3^2$. Comparing the groups of automorphisms, we obtain

$$SL(2; \mathbb{C}) \cong SO(1; 3)$$

Similarly, in the quaternionic case, the isomorphism of the cones $\mathrm{Herm}_+(2, \mathbb{H})$ gives the isomorphism

$$SL(2; \mathbb{H}) \cong SO(1; 5)$$

The linear isomorphism of cones produces the holomorphic isomorphism of corresponding tubes and their groups of holomorphic automorphisms. So each of these three isomorphisms gives automatically one more isomorphism. Let us give it for the first two cones (the tube over a light cone in $\mathbb{R}^n$ has the group $SO(2, n)$):

$$Sp(2; \mathbb{R}) \cong SO(2; 3),\qquad SU(2; 2) \cong SO(2; 4)$$

We just compared the descriptions of automorphisms of classical tubes from above.

Considering $\det(x)$ as the quadratic form of signature $(2, 2)$ on $\mathrm{Mat}(2) \cong \mathbb{R}^4$, we obtain

$$SO(2; 2) \cong SL(2; \mathbb{R}) \times SL(2; \mathbb{R})$$

Each of local isomorphisms in the complex case has different real forms which admit some geometrical interpretations. We mentioned above two real forms of the isomorphism $SL(4; \mathbb{C}) \cong SO(6; \mathbb{C})$. The isomorphism for $SU(2, 2)$ admits another interpretation in the language of Plucker's coordinates: points of the quadric in $\mathbb{RP}^5$ of the signature $(2, 4)$ can be interpreted as (complex) lines in $\mathbb{CP}^3$ which lie on a Hermitian quadric of the signature $(2, 2)$ (Gindikin
1983). The isomorphism above for the group $SL(2, \mathbb{H})$ also corresponds to Hopf's fibering of $\mathbb{CP}^3$ on complex lines over the sphere $S^4$ or the isomorphism of $S^4$ and the quaternionic projective line $\mathbb{HP}^1$. In all these cases, isomorphisms of homogeneous manifolds intertwine the actions of locally isomorphic groups.

Pseudo-Riemann Symmetric Manifolds

We obtain the next broad class of homogeneous manifolds if we preserve conditions that the group $G$ is a real semisimple one, the isotropy subgroup $H$ is involutive, but we remove the restriction that $H$ must be (maximal) compact. Such symmetric manifolds are often called semisimple pseudo-Riemann symmetric manifolds (since there are also pseudo-Riemann symmetric manifolds whose groups are not semisimple). This class of spaces contains symmetric Stein manifolds $X_C = G_C/H_C$. Each semisimple symmetric manifold $X = G/H$ admits complexification as a symmetric Stein manifold. Each real semisimple Lie group $G$ is symmetric relative to the group $G \times G$.

The simplest family of semisimple symmetric manifolds is the family of all hyperboloids of all signatures

$$H_{p,q} = \{x_1^2 + \cdots + x_p^2 - x_{p+1}^2 - \cdots - x_n^2 = 1\}$$

with the groups $SO(p, q)$. Their complexifications are complex hyperboloids. There are two types of Riemann manifolds in these families: compact ones (spheres) and noncompact ones (two-sheeted hyperboloids); all others are pseudo-Riemann.

The Cartan duality holds for pseudo-Hermitian symmetric manifolds: they are domains in compact Hermitian symmetric manifolds (minimal flag manifolds) $Z = G_C/P_C$. They are open orbits of real forms $G$ of the groups of holomorphic automorphisms $G_C$. We construct examples of such manifolds if we consider one of the above-described realizations of noncompact Hermitian symmetric manifolds (through matrix homogeneous coordinates) and replace the condition of positivity with the condition that the symmetric (Hermitian) matrix in the definition has a fixed nondegenerate signature $(i, k - i)$. We can call such pseudo-Hermitian symmetric manifolds satellites of Hermitian ones. Correspondingly, we can consider nonconvex tubes, for example, the set $T$ of such symmetric matrices whose imaginary parts have the signature $(i, n - i)$. This domain is linear homogeneous, but it is not symmetric; to receive the symmetric manifold we need to extend the nonconvex tube by a manifold of smaller dimension (which plays a role of infinity).

There are pseudo-Hermitian symmetric manifolds which are not satellites of Hermitian ones. Let us give an interesting example. The group $SL(2p, \mathbb{R})$ has two open orbits on the Grassmannian $\mathrm{Gr}_C(p; 2p)$ which are both pseudo-Hermitian symmetric spaces. Let us consider as above the Stiefel coordinates $Z \in \mathrm{Mat}_C(p, 2p)$ and let $Z = X + iY$. Then the orbits are defined by the conditions

$$\det \begin{pmatrix} X \\ Y \end{pmatrix} \gtrless 0$$

In the intersection with the coordinate chart $Z = (E, z)$, $z \in \mathrm{Mat}_C(p)$, $z = x + iy$, we have the conditions

$$\det y \gtrless 0$$

Therefore, we obtain (nonconvex) tube domains in $\mathbb{C}^N = \mathrm{Mat}_C(p)$, $N = p^2$, corresponding to nonconvex homogeneous cones $V_{\pm}$ of real matrices with positive (negative) determinants. These tubes do not coincide with the symmetric manifolds which include also some sets of small dimensions outside of the coordinate chart (on infinity). There are other homogeneous nonconvex cones such that corresponding tube domains are Zariski open parts of pseudo-Hermitian symmetric spaces (D'Atri and Gindikin 1993). Between these cones are cones of nondegenerate skew-symmetric matrices, of skew-Hermitian quaternionic matrices. We again observe strong connections with classical mathematics. Not all pseudo-Hermitian symmetric manifolds admit such tube realizations of dense parts. Analysis in pseudo-Hermitian symmetric manifolds is very interesting: we consider there instead of holomorphic functions $\bar{\partial}$-cohomology of some degree.

Geometric relations between different symmetric manifolds are usually important for analytic applications since they can produce some nontrivial integral transformations. In a broad sense, such transforms are considered in integral geometry (Gelfand et al. 2003). An important example is duality between some compact Hermitian symmetric manifolds (when points in one of them are interpreted as submanifolds in another one). The simplest example is the projective duality between dual copies of projective spaces or, more generally, the realization of points of Grassmannians as projective planes. Such a duality can induce a duality between orbits of real forms of groups. In a special case, it can be a duality between Hermitian and pseudo-Hermitian symmetric manifolds.

Here is one important example. Let us consider in the projective space $\mathbb{CP}^{2k-1}$ the domain $D$ which in
homogeneous coordinates (rows $z = (z_0, z_1, \dots, z_n)$) is defined by the equation $z H z^* > 0$, where $H$ is a Hermitian form of the signature $(k, k)$, for example,

$$|z_0|^2 + \cdots + |z_k|^2 - |z_{k+1}|^2 - \cdots - |z_n|^2 > 0$$

This domain is $(k - 1)$-pseudoconcave and it contains $(k - 1)$-dimensional complex compact cycles, namely $(k - 1)$-dimensional planes. The manifold of these planes is exactly the domain $X$ in the Grassmannian $\mathrm{Gr}_C(k; 2k)$ (of projective $(k - 1)$-planes) which is the noncompact Hermitian symmetric space, the orbit of the group $SU(k, k)$ (see above). This picture is the geometrical basis for a deep analytic construction. In the domain $D$ the spaces of $(k - 1)$-dimensional $\bar{\partial}$-cohomology are infinite dimensional for some coefficients. Their integration on $(k - 1)$-planes (the Penrose transform) gives sections of corresponding vector bundles on $X$. The images are described by differential equations: generalized massless equations. The basic twistor theory corresponds to $k = 2$ when $X$ is isomorphic to four-dimensional future tube (see above).

Similar dual realizations of Hermitian symmetric manifolds exist only in special cases. The twistor realization of four-dimensional future tube was possible since the Grassmannian $\mathrm{Gr}_C(2; 4)$ is isomorphic to the quadric in $\mathbb{CP}^5$. This does not work for the future tubes of bigger dimensions but there is another possibility (Gindikin 1998). Let us have the quadric $Q_{n-1} \subset \mathbb{CP}^n$ be defined in the homogeneous coordinates by the equation

$$\Box(z) = (z_0)^2 - (z_1)^2 - \cdots - (z_n)^2 = 0$$

and $z \cdot \zeta$ is the bilinear form. As already mentioned, the set of (nondegenerate) hyperplane sections

$$\zeta \cdot z = 0,\quad \zeta \in \mathbb{C}^{n+1},\ \Box(\zeta) = 1$$

of $Q_{n-1}$ is the corresponding hyperboloid $H_n$. Thus, we have the duality between a flag manifold (the quadric $Q_{n-1}$) and a symmetric Stein manifold (the hyperboloid $H_n$) with the same group $SO(n + 1, \mathbb{C})$; they have different dimensions.

The group $SO(1, n)$ has two orbits on $Q_{n-1}$: the real quadric $Q_R = \{z \in Q_{n-1};\ \Im(z) = 0\}$ and its complement $X = Q_{n-1} \setminus Q_R$. Hyperplane sections which do not intersect $Q_R$ (lie at $X$) correspond to such $\zeta \in H_n$ that

$$\Box(\Re \zeta) > 0$$

This set has two connected components $D_{\pm}$ which are biholomorphically equivalent to the future and past tubes $T_{\pm}$ of the dimension $n$. Let us emphasize that their group of automorphisms is $SO(2, n)$ in spite of the fact that this group acts neither on $X$ nor on $H_n$. Such an extension of the symmetry group is a very interesting phenomenon. It happens for several other symmetric manifolds, but is not a general fact. This geometrical construction gives a possibility to construct a multidimensional version of the Penrose transform from $(n - 2)$-dimensional $\bar{\partial}$-cohomology with different coefficients into solutions of massless equations on the future (past) tubes.

The last duality is connected with some general geometrical construction. We mentioned that each of the Riemann symmetric manifolds $X = G/K$ admits a canonical embedding in the symmetric Stein manifold $X_C = G_C/K_C$. It turns out that $X$ has in $X_C$ a canonical Stein neighborhood, the complex crown $\Xi(X)$, such that many analytic objects on $X$ can be holomorphically extended on the crown (Gindikin 2002). For example, all solutions of all invariant differential equations on $X$ (which are elliptic) admit such holomorphic extension. In the last example, $D_{\pm}$ is the crown of the Riemann symmetric space which is defined, in $H_n$, by the condition $\Im(\zeta) = 0$, $\Re(\zeta_0) > 0$.

Symmetric manifolds are distinguished from most other homogeneous manifolds by a very rich geometry which is a background for deep analytic considerations. There are several important nonsymmetric homogeneous manifolds. We already mentioned flag manifolds and Stein homogeneous manifolds with complex semisimple Lie groups which can be nonsymmetric. Pseudo-Riemann symmetric manifolds are open orbits of real groups on compact Hermitian symmetric spaces. It turns out that open orbits on other flag manifolds also produce interesting homogeneous manifolds. Let $F = G_C/P_C$ be a flag manifold. Flag domains are open orbits of a real form $G$ on $F$. Of course, pseudo-Hermitian symmetric manifolds are a special case of this construction. Let us consider a simple example with $G_C = SL(3; \mathbb{C})$ and $P$ the triangle group. Then points of $F$ are pairs {a point $z$ and a line $l$ passing through it}. Let $G = SU(2; 1)$; it has two open orbits on $\mathbb{CP}^2$: the complex ball $D$ and its complementary $D^C$. On $F$, the group $G$ has three open orbits (flag domains): in the first $z \in D$, $l$ is arbitrary; in the second $l \subset D^C$; in the third $z \in D^C$, $l$ intersects $D$. They are all 1-pseudoconcave. In one-dimensional $\bar{\partial}$-cohomology of these flag domains with coefficients in line bundles, are realized all three discrete series of unitary representations of $SU(2, 1)$. For arbitrary semisimple Lie groups, all discrete series of representations can also be realized in $\bar{\partial}$-cohomology of flag domains. Crowns of Riemann symmetric spaces which we just mentioned parametrize cycles (complex compact submanifolds)
in flag domains. Some general version of the Penrose transform connects through the integration along cycles cohomology in flag domains with holomorphic solutions of some differential equations in crowns (generalized massless equations).

See also: Combinatorics: Overview; Compact Groups and their Representations; Lie Groups: General Theory; Pseudo-Riemannian Nilpotent Lie Groups; Several Complex Variables: Compact Manifolds; Stability of Minkowski Space; Symmetry Classes in Random Matrix Theory; Twistor Theory: Some Applications; Twistors.

Further Reading

Akhiezer D (1990) Homogeneous complex manifolds. In: Gindikin S and Henkin G (eds.) Several Complex Variables IV, vol. 10, Encyclopaedia of Mathematical Science. New York: Springer.
D'Atri J and Gindikin S (1993) Siegel domain realization of pseudo-Hermitian symmetric manifolds. Geometriae Dedicata 46: 91-126.
Faraut J and Koranyi A (1994) Analysis on Symmetric Cones. Oxford: Oxford University Press.
Gelfand I, Gindikin S, and Graev M (2003) Selected Topics in Integral Geometry. Providence, RI: American Mathematical Society.
Gindikin S (1983) The complex universe of Roger Penrose. Mathematical Intelligencer 5(1): 27-35.
Gindikin S (1998) SO(1; n)-twistors. Journal of Geometry and Physics 26: 26-36.
Gindikin S (2002) Some remarks on complex crowns of real symmetric spaces. Acta Mathematica Applicata 73(1-2): 95-101.
Helgason S (1978) Differential Geometry, Lie Groups and Symmetric Spaces. New York: Academic Press.
Helgason S (1994) Geometric Analysis on Symmetric Spaces. Providence, RI: American Mathematical Society.
Onishchik A and Vinberg E (1993) Lie Groups and Lie Algebras I. Foundations of Lie Theory. In: Onishchik A (ed.) Lie Groups and Lie Algebras. Encyclopaedia of Mathematical Sciences, vol. 20. New York: Springer.
Pyatetskii-Shapiro I (1969) Automorphic Functions and Geometry of Classical Domains. Amsterdam: Gordon and Breach.

Classical r-Matrices, Lie Bialgebras, and Poisson Lie Groups


M A Semenov-Tian-Shansky, Steklov Institute of Mathematics, St. Petersburg, Russia, and Université de Bourgogne, Dijon, France

© 2006 Elsevier Ltd. All rights reserved.

Introduction

The notion of classical r-matrices has emerged as a by-product of the quantum inverse scattering method (which was developed mainly by L D Faddeev and his team in their work at the Steklov Mathematical Institute in Leningrad); it has given a new insight into the study of Hamiltonian structures associated with classical integrable systems solvable by the classical inverse scattering method and its generalizations. Important classification results for classical r-matrices are due to Belavin and Drinfeld. Based on the initial results of Sklyanin, Drinfeld introduced the important concepts of Poisson Lie groups and Lie bialgebras which arise as a semiclassical approximation in the study of quantum groups.

A Poisson group is a Lie group $G$ equipped with a Poisson bracket such that the multiplication $m : G \times G \to G$ is a Poisson mapping. A Poisson bracket on $G$ with this property is called multiplicative. More explicitly, let $\lambda_x$, $\rho_x$ be the left and right translation operators in $C^\infty(G)$ by an element $x \in G$, $\lambda_x \varphi(y) = \varphi(xy)$, $\rho_x \varphi(y) = \varphi(yx)$. Multiplication in $G$ is a Poisson mapping, if for any $\varphi, \psi \in C^\infty(G)$, we have

$$\{\varphi, \psi\}(xy) = \{\lambda_x \varphi, \lambda_x \psi\}(y) + \{\rho_y \varphi, \rho_y \psi\}(x) \qquad [1]$$

Note that in general, multiplicative brackets are neither left nor right invariant; in other words, for fixed $x$ translation operators $\lambda_x$, $\rho_x$ do not preserve Poisson brackets.

Multiplicative Poisson brackets naturally arise in the study of integrable systems which admit the so-called zero-curvature representation. The study of zero-curvature equations, and in particular, of the Poisson properties of the associated monodromy map, was the main source of nontrivial examples (associated with classical r-matrices, classical Yang-Baxter equations, and factorizable Lie bialgebras). The special class of multiplicative Poisson brackets encountered in this context is closely related to factorization problems in Lie groups (in particular, the matrix Riemann problem); these problems represent the key tools in constructing solutions of zero-curvature equations.

The equivalent definition of Poisson Lie groups uses the dual language of Hopf algebras. Let $A = F(G)$ be the commutative algebra of (smooth) functions on a Lie group $G$ equipped with the standard coproduct $\Delta : A \to A \otimes A$

$$\Delta\varphi(x, y) = \varphi(xy),\quad \varphi \in F(G),\ x, y \in G$$

as usual, we identify the (topological) tensor product $F(G) \otimes F(G)$ with $F(G \times G)$. The multiplicative
512 Classical r-Matrices, Lie Bialgebras, and Poisson Lie Groups

Poisson bracket on G equips F(G) with the structure we conclude from eqn [4] that  : g ! g ^ g is a
of a PoissonHopf algebra, that is 1-cocycle on g, that is,

f; g f;  g 2 X; Y X  I I  X; Y


 Y  I I  Y; X
Equation [2] is the starting point for the study of
relations between Poisson groups and quantum groups. Following the general philosophy of deformation quantization, we can look for a deformation $A_h$ of the commutative Hopf algebra $A$ with the deformation germ determined by the Poisson structure on $A$ satisfying eqn [2]. The fundamental theorem (conjectured by Drinfeld and proved by Etingof and Kazhdan) asserts that any Poisson algebra associated with a Poisson group admits a formal quantization (in the category of Hopf algebras).

Poisson Groups and Lie Bialgebras

Let $G$ be a Lie group with Lie algebra $\mathfrak g$ equipped with a multiplicative Poisson bracket. Any Poisson bracket is bilinear in differentials of functions; it is convenient to express it by means of right- or left-invariant differentials. For $\varphi \in F(G)$ set

\[ \langle \nabla\varphi(x), X\rangle = \frac{d}{dt}\Big|_{t=0}\varphi(e^{tX}x), \qquad \langle \nabla'\varphi(x), X\rangle = \frac{d}{dt}\Big|_{t=0}\varphi(xe^{tX}), \qquad X \in \mathfrak g,\ \ \nabla\varphi(x), \nabla'\varphi(x) \in \mathfrak g^* \]

Let us define the Poisson operator $\eta : G \to \mathrm{Hom}(\mathfrak g^*, \mathfrak g)$ by setting

\[ \{\varphi, \psi\}(x) = \langle \eta(x)\nabla\varphi(x), \nabla\psi(x)\rangle \tag{3} \]

For a finite-dimensional Lie algebra, we can identify $\mathrm{Hom}(\mathfrak g^*, \mathfrak g)$ with $\mathfrak g \otimes \mathfrak g$; the skew symmetry of the Poisson bracket implies that $\eta \in \mathfrak g \wedge \mathfrak g$. By an abuse of language, the same identification is traditionally used for infinite-dimensional algebras (e.g., for loop algebras) as well. Of course, in the latter case, the corresponding Poisson tensors are represented by singular kernels which do not lie in the algebraic tensor product and should be regarded as distributions.

Multiplicativity of the Poisson bracket on $G$ implies a functional equation for $\eta$,

\[ \eta(xy) = (\mathrm{Ad}\,x \otimes \mathrm{Ad}\,x)\cdot\eta(y) + \eta(x) \tag{4} \]

which means that $\eta$ is a 1-cocycle on $G$ (with values in $\mathfrak g \wedge \mathfrak g$). By setting

\[ \delta(X) = \Big(\frac{d}{dt}\Big)_{t=0}\eta(e^{tX}), \qquad X \in \mathfrak g \]

we obtain a linear map $\delta : \mathfrak g \to \mathfrak g \wedge \mathfrak g$.

Equation [4] implies that $\eta(e) = 0$, that is, a multiplicative Poisson structure is identically zero at the unit element. Its linearization at this point induces the structure of a Lie algebra on the cotangent space $T_e^*G \simeq \mathfrak g^*$; namely, for any $\xi, \xi' \in \mathfrak g^*$, choose $\varphi, \varphi' \in F(G)$ in such a way that $\nabla_e\varphi = \xi$, $\nabla_e\varphi' = \xi'$, and set

\[ [\xi, \xi']_* = \nabla_e\{\varphi, \varphi'\} \tag{5} \]

It is easy to see that $\langle [\xi, \xi']_*, X\rangle = \langle \xi \wedge \xi', \delta(X)\rangle$, which proves that the bracket is well defined, while eqn [5] implies the Jacobi identity.

Definition 1 Let $\mathfrak g$, $\mathfrak g^*$ be a pair of linear spaces set in duality; $(\mathfrak g, \mathfrak g^*)$ is called a Lie bialgebra if both $\mathfrak g$ and $\mathfrak g^*$ are Lie algebras and the mapping $\delta : \mathfrak g \to \mathfrak g \otimes \mathfrak g$ which is dual to the commutator map $[\ ,\ ]_* : \mathfrak g^* \otimes \mathfrak g^* \to \mathfrak g^*$ is a 1-cocycle on $\mathfrak g$.

Thus if $G$ is a Poisson–Lie group, the pair $(\mathfrak g, \mathfrak g^*)$ is a Lie bialgebra (called the tangent Lie bialgebra of $G$). Poisson–Lie groups form a category in which the morphisms are Lie group homomorphisms which are also Poisson mappings. A morphism $(\mathfrak g, \mathfrak g^*) \to (\mathfrak h, \mathfrak h^*)$ in the category of Lie bialgebras is a Lie algebra homomorphism $\mathfrak g \to \mathfrak h$ such that the dual map $\mathfrak h^* \to \mathfrak g^*$ is again a Lie algebra homomorphism. It is easy to see that morphisms of Poisson groups induce morphisms of their tangent bialgebras. The converse is also true.

Theorem 1
(i) Let $(\mathfrak g, \mathfrak g^*)$ be a Lie bialgebra, $G$ a connected, simply connected Lie group with Lie algebra $\mathfrak g$. There is a unique multiplicative Poisson bracket on $G$ such that $(\mathfrak g, \mathfrak g^*)$ is its tangent Lie bialgebra.
(ii) Morphisms of Lie bialgebras induce Poisson mappings of the corresponding Poisson groups.

Basically, the theorem asserts that a Poisson tensor is uniquely restored from the infinitesimal cocycle on the corresponding Lie algebra; moreover, the obstruction for the Jacobi identity vanishes globally if this is true for its infinitesimal part at the unit element of the group.

It is important to observe that Lie bialgebras possess a remarkable symmetry: if $(\mathfrak g, \mathfrak g^*)$ is a Lie bialgebra, the same is true for $(\mathfrak g^*, \mathfrak g)$. Hence, the dual group $G^*$ (which corresponds to $\mathfrak g^*$) also carries a multiplicative Poisson bracket. The duality theory for Lie bialgebras, based on the key notion of the Drinfeld double, is discussed in the next section.
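The passage from the group cocycle to the Lie algebra cocycle can be sketched as follows. This is the standard computation, not spelled out in the text; it assumes the notation $\eta$ for the Poisson operator of eqn [4], $\delta$ for its derivative at the unit element, and the shorthand $X\cdot a = (\mathrm{ad}\,X \otimes 1 + 1 \otimes \mathrm{ad}\,X)\,a$ for $a \in \mathfrak g \wedge \mathfrak g$, which is introduced here only for illustration.

```latex
% Step 1: put x = y = e in [4].
\eta(e) = \eta(e) + \eta(e) \quad\Longrightarrow\quad \eta(e) = 0 .

% Step 2: put y = x^{-1} in [4] and use Step 1.
\eta(x^{-1}) = -(\mathrm{Ad}\,x^{-1} \otimes \mathrm{Ad}\,x^{-1})\,\eta(x) .

% Step 3: apply [4] twice to x y x^{-1}, then use Step 2.
\eta(x y x^{-1}) = \eta(x) + (\mathrm{Ad}\,x)^{\otimes 2}\,\eta(y)
                   - (\mathrm{Ad}\,(x y x^{-1}))^{\otimes 2}\,\eta(x) .

% Step 4: set y = e^{tY}, differentiate at t = 0;
% then set x = e^{sX}, differentiate at s = 0, using \eta(e) = 0.
\delta([X, Y]) = X \cdot \delta(Y) - Y \cdot \delta(X) ,
\qquad X, Y \in \mathfrak g .
```

The last line is the 1-cocycle condition on $\mathfrak g$ required in Definition 1.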
Classical r-Matrices, Lie Bialgebras, and Poisson Lie Groups 513

Classical r-Matrices and Special Classes of Lie Bialgebras

The general classification problem for Lie bialgebras is unfeasible (e.g., classification of abelian Lie bialgebras includes classification of all Lie algebras). In applications, one mainly deals with important special classes of Lie bialgebras, of which factorizable Lie bialgebras are probably the most important. In a sense, this class may be regarded as exhaustive, since (as explained below) any Lie bialgebra is canonically embedded into a factorizable one. Various other special classes discussed in the literature are coboundary bialgebras, triangular bialgebras, and quasitriangular bialgebras.

The Lie bialgebra $(\mathfrak g, \mathfrak g^*, \delta)$ is called a coboundary bialgebra if the cobracket $\delta$ is a trivial 1-cocycle on $\mathfrak g$, that is,

\[ \delta(X) = [X \otimes I + I \otimes X, r] \quad \text{for all } X \in \mathfrak g \tag{6} \]

the constant element $r \in \mathfrak g \wedge \mathfrak g$ is called the classical r-matrix. If $\mathfrak g$ is semisimple, $H^1(\mathfrak g, V) = 0$ for any $\mathfrak g$-module $V$ by the classical Whitehead theorem, and hence all Lie bialgebra structures on $\mathfrak g$ are of coboundary type. The associated Lie bracket on $\mathfrak g^*$ is given by the formula

\[ [\xi, \xi']_* = \mathrm{ad}^*_{\mathfrak g}\, r(\xi)\cdot\xi' - \mathrm{ad}^*_{\mathfrak g}\, r(\xi')\cdot\xi \tag{7} \]

where we identified $r \in \mathfrak g \wedge \mathfrak g$ with a skew-symmetric linear operator $r : \mathfrak g^* \to \mathfrak g$. The restrictions imposed on $r$ by the Jacobi identity are formulated in terms of the so-called Yang–Baxter tensor $[[r, r]] \in \mathfrak g \wedge \mathfrak g \wedge \mathfrak g$, which is a quadratic expression in $r$. To define it, let us mark different factors in tensor products, for example, $\mathfrak g \otimes \mathfrak g \otimes \mathfrak g$, by fixed numbers 1, 2, 3, ... which indicate their place; for simplicity, we assume that $\mathfrak g$ is embedded in an associative algebra $A$ with a unit. The embeddings are defined as

\[ i_{12}, i_{23}, i_{13} : \mathfrak g \otimes \mathfrak g \to A \otimes A \otimes A \]

by setting $i_{12}(X \otimes Y) = X \otimes Y \otimes I$, and similarly in other cases. For $a \in \mathfrak g \otimes \mathfrak g$, we put $i_{12}(a) = a_{12}$, etc. Set

\[ [[r, r]] = [r_{12}, r_{13}] + [r_{12}, r_{23}] + [r_{13}, r_{23}] \tag{8} \]

The commutators in the RHS are computed in the associative algebra $A \otimes A \otimes A$; it is easy to check that the result does not depend on the choice of the embedding $\mathfrak g \hookrightarrow A$.

Proposition 1 The Jacobi identity for $[\ ,\ ]_*$ is valid if and only if $[[r, r]]$ is ad $\mathfrak g$-invariant, that is, if

\[ [X \otimes I \otimes I + I \otimes X \otimes I + I \otimes I \otimes X, [[r, r]]] = 0 \quad \text{for all } X \in \mathfrak g \]

A coboundary Lie bialgebra with $[[r, r]] \in (\wedge^3 \mathfrak g)^{\mathfrak g}$ is called quasitriangular; it is called triangular if $r$ satisfies the classical Yang–Baxter equation $[[r, r]] = 0$. (Both terms come from another name of the classical Yang–Baxter equation, the classical triangle equation.)

When a Lie algebra $\mathfrak g$ admits a nondegenerate invariant inner product, the class of quasitriangular Lie bialgebra structures on $\mathfrak g$ admits an important specialization. Let $\mathfrak g \otimes \mathfrak g^* \simeq \mathfrak g \otimes \mathfrak g$ be the natural isomorphism induced by the inner product. Let $I \in \mathfrak g \otimes \mathfrak g^*$ be the canonical element; its image $t \in \mathfrak g \otimes \mathfrak g$ under this isomorphism is called the tensor Casimir element. Clearly, $t \in (S^2\mathfrak g)^{\mathfrak g}$ and, moreover, $[t_{12}, t_{23}] \in (\wedge^3 \mathfrak g)^{\mathfrak g}$. When $\mathfrak g$ is semisimple, the mapping $(S^2\mathfrak g)^{\mathfrak g} \to (\wedge^3 \mathfrak g)^{\mathfrak g} : s \mapsto [s_{12}, s_{23}]$ is an isomorphism; in particular, if $\mathfrak g$ is simple, both spaces are one dimensional and generated by a tensor Casimir (which is unique up to a scalar multiple). A Lie bialgebra $(\mathfrak g, r)$ is called factorizable if $r \in \mathfrak g \wedge \mathfrak g$ satisfies the modified classical Yang–Baxter equation

\[ [[r, r]] = c\,[t_{12}, t_{23}], \qquad c = \mathrm{const} \neq 0 \tag{9} \]

The convenient normalization is $c = 1/4$ (it can be achieved by an appropriate normalization of $r$). Instead of dealing with the modified Yang–Baxter equation, we may relax the antisymmetry condition imposed on $r$. Set $r_\pm = r \pm (1/2)t \in \mathfrak g \otimes \mathfrak g$. Since $t$ is ad $\mathfrak g$-invariant, the symmetric part of $r_\pm$ drops out from the cobracket; on the other hand, one has $[[r_\pm, r_\pm]] = 0$. Regarding $r_\pm$ as a linear operator, $r_\pm \in \mathrm{Hom}(\mathfrak g^*, \mathfrak g)$, we get the following important result:

Proposition 2 Let $(\mathfrak g, \mathfrak g^*)$ be a factorizable Lie bialgebra.
(i) The mappings $r_\pm \in \mathrm{Hom}(\mathfrak g^*, \mathfrak g)$ are Lie algebra homomorphisms; moreover, $r_+^* = -r_-$.
(ii) The combined mapping
\[ i_r : \mathfrak g^* \to \mathfrak g \oplus \mathfrak g : X \mapsto (r_+X, r_-X) \]
is a Lie algebra embedding.
(iii) Any $X \in \mathfrak g$ admits a unique decomposition $X = X_+ - X_-$ with $(X_+, X_-) \in \mathrm{Im}\, i_r$.

The additive decomposition in a factorizable Lie bialgebra gives rise to a multiplicative factorization problem in the associated Lie group. Namely, $i_r$ may be extended to a Lie group embedding $i_r : G^* \to G \times G$ and any $x \in G$ which is sufficiently close to the unit element admits a decomposition $x = x_+ x_-^{-1}$ with $(x_+, x_-) \in \mathrm{Im}\, i_r$.

Any Lie bialgebra $(\mathfrak g, \mathfrak g^*)$ admits a canonical embedding into a larger Lie bialgebra (called its

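double) which is already factorizable. Namely, set $\mathfrak d = \mathfrak g \oplus \mathfrak g^*$ as a linear space and equip it with the natural inner product,

\[ \langle\langle (X, F), (X', F')\rangle\rangle = \langle F, X'\rangle + \langle F', X\rangle \tag{10} \]

Theorem 2
(i) There exists a unique structure of the Lie algebra on $\mathfrak d$ such that: (a) $\mathfrak g, \mathfrak g^* \subset \mathfrak d$ are Lie subalgebras. (b) The inner product [10] is invariant.
(ii) Let $P_{\mathfrak g}$, $P_{\mathfrak g^*}$ be the projection operators onto $\mathfrak g, \mathfrak g^* \subset \mathfrak d$ parallel to the complementary subalgebra. Set $r_{\mathfrak d}^+ = P_{\mathfrak g}$, $r_{\mathfrak d}^- = -P_{\mathfrak g^*}$; then $(\mathfrak d, r_{\mathfrak d}^\pm)$ is a factorizable Lie bialgebra.
(iii) The inclusion map $(\mathfrak g, \mathfrak g^*) \to (\mathfrak d, \mathfrak d^*)$ is a homomorphism of Lie bialgebras and the dual inclusion map $(\mathfrak g^*, \mathfrak g) \to (\mathfrak d, \mathfrak d^*)$ is an antihomomorphism.

Conversely, let $\mathfrak a$ be a Lie algebra equipped with a nondegenerate invariant inner product, $\mathfrak a_\pm \subset \mathfrak a$ its Lie subalgebras such that (i) $\mathfrak a_\pm$ are isotropic with respect to the inner product, (ii) $\mathfrak a = \mathfrak a_+ \dotplus \mathfrak a_-$ as a linear space. The triple $(\mathfrak a, \mathfrak a_+, \mathfrak a_-)$ is called a Manin triple. Let $P_\pm$ be the projection operators onto $\mathfrak a_\pm$ in this decomposition. Set $r_\pm = \pm P_\pm$. Then $(\mathfrak a, r_\pm)$ is a factorizable Lie bialgebra; moreover, $\mathfrak a_+$ and $\mathfrak a_-$ are set into duality by the inner product in $\mathfrak a$ and inherit the structure of a Lie bialgebra, and $\mathfrak a$ is their double.

If $(\mathfrak g, \mathfrak g^*)$ is itself a factorizable Lie bialgebra, its double admits a simple explicit description. Set $\mathfrak d = \mathfrak g \oplus \mathfrak g$ (direct sum of Lie algebras); let us equip $\mathfrak d$ with the inner product

\[ \langle\langle (X, X'), (Y, Y')\rangle\rangle = \langle X, Y\rangle - \langle X', Y'\rangle \]

Let $\mathfrak g^\delta \subset \mathfrak d$ be the diagonal subalgebra; we identify $\mathfrak g^*$ with the embedded subalgebra $i_r(\mathfrak g^*) \subset \mathfrak d$.

Proposition 3
(i) $(\mathfrak d, \mathfrak g^\delta, i_r(\mathfrak g^*))$ is a Manin triple.
(ii) As a Lie algebra, $\mathfrak d = \mathfrak g \oplus \mathfrak g$ is isomorphic to the double of $\mathfrak g$.

Key examples of factorizable Lie bialgebras are associated with semisimple Lie algebras and their loop algebras.

1. Let $\mathfrak k$ be a compact semisimple Lie algebra, $\mathfrak g = \mathfrak k_{\mathbb C}$ its complexification regarded as a real Lie algebra, $\sigma \in \mathrm{Aut}\,\mathfrak g$ the Cartan involution which fixes $\mathfrak k$, and $\mathfrak g = \mathfrak k \oplus \mathfrak p$ the associated Cartan decomposition. Fix a real split Cartan subalgebra $\mathfrak a \subset \mathfrak p$ and the associated Iwasawa decomposition $\mathfrak g = \mathfrak k \dotplus \mathfrak a \dotplus \mathfrak n$; put $\mathfrak s = \mathfrak a \dotplus \mathfrak n$. Let $B$ be the complex Killing form on $\mathfrak g$; let us equip $\mathfrak g$ with the real inner product $(X, Y) = \mathrm{Im}\, B(X, Y)$; then $(\mathfrak g, \mathfrak k, \mathfrak s)$ is a Manin triple. Hence, any compact semisimple Lie group $K$ carries a natural Poisson structure; its double $G = D(K)$ is the complex group $G = K_{\mathbb C}$ (regarded as a real Lie group). The associated factorization problem in $G$ is the Iwasawa decomposition $G = KAN$, which exists globally.

2. Let $\mathfrak g$ be a real split semisimple Lie algebra, $\mathfrak h$ its Cartan subalgebra, and $\Delta_+$ a system of positive roots. Fix an invariant inner product on $\mathfrak g$ which is positive on $\mathfrak h$, and let $\{e_\alpha;\ \alpha \in \pm\Delta_+\}$ be the root vectors normalized in such a way that $(e_\alpha, e_{-\alpha}) = 1$. Let

\[ \mathfrak n_\pm = \bigoplus_{\alpha \in \Delta_+} \mathbb R \cdot e_{\pm\alpha} \]

Fix an orthonormal basis $\{H_i\}$ in $\mathfrak h$; let $P_\pm$, $P_0$ be the projection operators onto $\mathfrak n_\pm$, $\mathfrak h$ in the Bruhat decomposition $\mathfrak g = \mathfrak n_+ \dotplus \mathfrak h \dotplus \mathfrak n_-$. The standard Lie bialgebra structure on $\mathfrak g$ is given by the r-matrices $r_\pm = \pm\bigl(P_\pm + \tfrac12 P_0\bigr)$. In tensor notation,

\[ r_\pm = \sum_{\alpha \in \Delta_+} e_\alpha \wedge e_{-\alpha} \pm \frac12 \sum_i H_i \otimes H_i \tag{11} \]

Let $\mathfrak b_\pm = \mathfrak h \dotplus \mathfrak n_\pm$ be the opposite Borel subalgebras; the inner product in $\mathfrak g$ sets them into duality, and $(\mathfrak b_+, \mathfrak b_-)$ is a Lie sub-bialgebra in $(\mathfrak g, \mathfrak g^*)$. Let $G$ be the connected, simply connected Lie group associated with $\mathfrak g$, $B_\pm = HN_\pm$ its opposite Borel subgroups which correspond to $\mathfrak b_\pm$. Let $p : B_\pm \to B_\pm/N_\pm \simeq H$ be the canonical projection. The associated factorization problem in $G$, $g = b_+ b_-^{-1}$, $(b_+, b_-) \in B_+ \times B_-$, $p(b_+) = p(b_-)^{-1}$, is closely related to the Bruhat decomposition; it is solvable for all $g$ in the open Bruhat cell $B_+N_- \subset G$.

3. Let $L\mathfrak g = \mathfrak g \otimes \mathbb C((z))$ be the loop algebra of a finite-dimensional semisimple Lie algebra $\mathfrak g$; as usual, we denote the ring of formal Laurent series by $\mathbb C((z))$. Put $L\mathfrak g_+ = \mathfrak g \otimes \mathbb C[[z]]$, $L\mathfrak g_- = \mathfrak g \otimes z^{-1}\mathbb C[z^{-1}]$. Fix an invariant inner product on $\mathfrak g$ and equip $L\mathfrak g$ with the inner product

\[ \langle\langle X, Y\rangle\rangle = \mathrm{Res}_{z=0}\, \langle X(z), Y(z)\rangle\, dz \]

Then $(L\mathfrak g, L\mathfrak g_+, L\mathfrak g_-)$ is a Manin triple. The associated classical r-matrix is called the rational r-matrix; in tensor notation, it is represented by a singular kernel

\[ r(z, z') = \frac{t}{z - z'} \]

where $t \in \mathfrak g \otimes \mathfrak g$ is the tensor Casimir; this kernel is essentially the Cauchy kernel.

4. Let us assume that $\mathfrak g = \mathfrak{sl}(n)$; in this case, the loop algebra $L\mathfrak g$ admits a nontrivial decomposition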

associated with the so-called elliptic r-matrix. Set

\[ I_1 = \mathrm{diag}(1, \varepsilon, \ldots, \varepsilon^{n-1}), \qquad I_2 = \begin{pmatrix} 0 & 1 & \cdots & 0\\ \vdots & \ddots & \ddots & \vdots\\ 0 & & \ddots & 1\\ 1 & 0 & \cdots & 0 \end{pmatrix}, \qquad \varepsilon = e^{2\pi i/n} \tag{12} \]

Put $\mathbb Z_n^2 = \mathbb Z/n\mathbb Z \times \mathbb Z/n\mathbb Z$; for $a = (a_1, a_2) \in \mathbb Z_n^2$, set $I_a = I_1^{a_1} I_2^{a_2}$; the matrices $I_a$ define an irreducible projective representation of $\mathbb Z_n^2$ (they form the so-called finite Heisenberg group). Let us denote the elliptic curve of modulus $\tau$ by $E = \mathbb C/(\mathbb Z + \tau\mathbb Z)$ and let $P \to E$ be the n-dimensional holomorphic vector bundle with flat connection and with monodromies given by

\[ z \mapsto z + 1 : h_1 = \mathrm{Ad}\, I_1; \qquad z \mapsto z + \tau : h_2 = \mathrm{Ad}\, I_2 \]

Let $G_E \subset L\mathfrak g$ be the subspace of Laurent expansions at zero of the global meromorphic sections of $P$ with a unique pole at $0 \in E$. Then $(L\mathfrak g, L\mathfrak g_+, G_E)$ is again a Manin triple. The associated classical r-matrix is the kernel of a singular integral operator which associates a meromorphic section of $P$ to its principal part at 0. Explicitly, it is given by

\[ r(z - z') = \frac1n \sum_{a,b=0}^{n-1} \zeta\Bigl(\frac{z - z'}{n} - a - b\tau\Bigr)\, (\mathrm{Ad}\, I_{a,b} \otimes I)\cdot t \tag{13} \]

where $\zeta$ is the Weierstrass zeta function.

5. Let $\mathfrak g$ be an arbitrary semisimple Lie algebra again. Let us equip the loop algebra $L\mathfrak g$ with the inner product

\[ \langle\langle X, Y\rangle\rangle_0 = \mathrm{Res}_{z=0}\, \langle X(z), Y(z)\rangle\, z^{-1}\, dz \]

Set $N_+ = \mathfrak n_+ \dotplus \mathfrak g \otimes z\mathbb C[[z]]$, $N_- = \mathfrak n_- \dotplus \mathfrak g \otimes z^{-1}\mathbb C[z^{-1}]$. We have $L\mathfrak g = N_+ \dotplus \mathfrak h \dotplus N_-$, where we identify $\mathfrak h, \mathfrak n_\pm \subset \mathfrak g$ with the corresponding subalgebras of constant loops in $L\mathfrak g$. Let $P_\pm$, $P_0$ be the projection operators onto $N_\pm$, $\mathfrak h$ in this decomposition and $r_\pm = \pm\bigl(P_\pm + (1/2)P_0\bigr)$. The classical r-matrices $r_\pm$ define on $L\mathfrak g$ the structure of a factorizable Lie bialgebra. The associated tensor kernels are called the trigonometric classical r-matrices.

Classical r-matrices described above are associated with factorization problems in the infinite-dimensional loop groups: matrix Riemann problems or matrix Cousin problems (in the elliptic case). Belavin and Drinfeld have given a complete classification of factorizable Lie bialgebra structures for semisimple Lie algebras; in the loop algebra case, the problem they solved consists of classification of all meromorphic solutions of the classical Yang–Baxter equation. In other words, we assume that the distribution kernel associated with the classical r-matrix is represented by a meromorphic function (of two complex variables). Up to an equivalence, any such solution depends only on one variable and belongs to the rational, trigonometric, or elliptic type (in the latter case, the underlying Lie algebra is necessarily $\mathfrak{sl}(n)$). Classification of solutions in the elliptic case is completely rigid; in the trigonometric case, the moduli space is finite dimensional and admits an explicit description. In the rational case, the classification is somewhat less explicit (it has been completed by A Stolin under some nondegeneracy condition). Contrary to the popular belief, there are many other structures of a factorizable Lie bialgebra on loop algebras, for which the associated r-matrices are given by more singular distribution kernels.

Poisson Lie Groups

If the tangent Lie bialgebra of a Poisson Lie group is of coboundary type, the cocycle $\eta$ is also trivial, $\eta(g) = r - \mathrm{Ad}\, g \otimes \mathrm{Ad}\, g \cdot r$. Hence, the Poisson bracket on $G$ is given by

\[ \{\varphi, \psi\} = \langle r, \nabla'\varphi \wedge \nabla'\psi\rangle - \langle r, \nabla\varphi \wedge \nabla\psi\rangle, \qquad r \in \mathfrak g \wedge \mathfrak g \]

where $\nabla\varphi, \nabla'\varphi \in \mathfrak g^*$ are left and right differentials of $\varphi \in C^\infty(G)$. This is the so-called Sklyanin bracket.

Let us assume that $G$ is a matrix group; its affine ring is generated by evaluation functions $\varphi_{ij}$ which assign to $L \in G$ its matrix coefficients, $\varphi_{ij}(L) = L_{ij}$. The Poisson bracket on $G$ is completely determined by its values on $\varphi_{ij}$. Explicitly, we get

\[ \{\varphi_{ij}, \varphi_{km}\}(L) = [r, L \otimes L]_{ikjm} \tag{14} \]

the commutator in the RHS is in $\mathrm{Mat}(n^2)$. By an abuse of language, evaluation functions and their values on a generic element $L \in G$ are denoted by the same letter; using tensor notation to suppress matrix indices, we get

\[ \{L_1, L_2\} = [r, L_1L_2], \qquad L_1 = L \otimes I,\quad L_2 = I \otimes L \tag{15} \]

In the case of loop algebras, these Poisson bracket relations take the form

\[ \{L_1(\lambda), L_2(\mu)\} = [r(\lambda, \mu), L_1(\lambda)L_2(\mu)] \]

Let us assume that $G$ is factorizable and the associated factorization problem is globally solvable. The Poisson bracket on the dual group $G^*$

$\simeq i_r(G^*) \subset G \times G$ may be characterized in terms of the matrix coefficients of $(h_+, h_-) = i_r(h)$, or of their quotient $h = h_+ h_-^{-1}$. Explicitly, we get

\[ \{h_1^\pm, h_2^\pm\} = [r, h_1^\pm h_2^\pm], \qquad \{h_1^+, h_2^-\} = [r_+, h_1^+ h_2^-] \tag{16} \]

\[ \{h_1, h_2\} = r h_1 h_2 + h_1 h_2 r - h_2 r_+ h_1 - h_1 r_- h_2, \qquad r = \tfrac12 (r_+ + r_-) \tag{17} \]

The key question in the geometry of Poisson groups consists in the description of symplectic leaves in $G$, $G^*$. This question is already nontrivial when $G^*$ is abelian (and hence may be identified with the dual of the Lie algebra $\mathfrak g = \mathrm{Lie}(G)$). The Poisson bracket on $\mathfrak g^*$ is linear; this is the well-known Lie–Poisson (alias Berezin–Kirillov–Kostant) bracket. Its symplectic leaves coincide with the orbits of the coadjoint representation of $G$ in $\mathfrak g^*$. The natural way to prove this fundamental result (which goes back to Lie) is to consider first the natural action of $G$ on the cotangent bundle $T^*G \simeq G \times \mathfrak g^*$; this action is Hamiltonian, and the coadjoint orbits arise as a result of Hamiltonian reduction associated with this action. The generalization of the theory of coadjoint orbits to the case of arbitrary Poisson groups starts with the notion of symplectic double, which is the nonlinear analog of the cotangent bundle.

Let $D$ be the double of $(G, G^*)$; assume for simplicity that $D = G \cdot G^*$ globally and hence the associated factorization problem is always solvable. Let $r_{\mathfrak d} = (1/2)(P_{\mathfrak g} - P_{\mathfrak g^*})$. Set

\[ \{\varphi, \psi\}_\pm = \langle r_{\mathfrak d}\nabla\varphi, \nabla\psi\rangle \pm \langle r_{\mathfrak d}\nabla'\varphi, \nabla'\psi\rangle \tag{18} \]

The bracket $\{\ ,\ \}_-$ is the usual Sklyanin bracket which defines the structure of a Poisson group on $D$, while $\{\ ,\ \}_+$ is nondegenerate and defines a symplectic structure on $D$. Let us denote the copies of $D$ equipped with the bracket $\{\ ,\ \}_\pm$ by $D_\pm$. The bracket on $D_+$ is not multiplicative, but it is covariant with respect to the action of $D_-$ by left and right translations; in other words, the natural mappings $D_- \times D_+ \to D_+$ and $D_+ \times D_- \to D_+$, associated with multiplication in $D$, preserve Poisson brackets. Since $G, G^* \subset D_-$ are Poisson subgroups, natural actions $G \times D_+ \to D_+$ and $G^* \times D_+ \to D_+$ by left and right translations are Poisson mappings. Consider the natural projections

\[ G^* \simeq D_+/G \xleftarrow{\ \pi\ } D_+ \xrightarrow{\ \pi'\ } G\backslash D_+ \simeq G^*, \qquad G \simeq D_+/G^* \xleftarrow{\ p\ } D_+ \xrightarrow{\ p'\ } G^*\backslash D_+ \simeq G \]

onto the space of left and right coset classes. It is easy to see that functions on $D_+$ which are constant on each projection fiber are closed with respect to the Poisson bracket. This means that the quotient spaces inherit the Poisson structure. Moreover, the maps $\pi, \pi'$ and $p, p'$ form the so-called dual pairs, that is, the algebras of functions which are constant on the fibers of $\pi$ and $\pi'$ (or of $p$ and $p'$) are mutual centralizers of one another in the big Poisson algebra $F(D_+)$. Since $D = G \cdot G^* = G^* \cdot G$, we have $G^* \simeq D/G$, $G \simeq D/G^*$; it is easy to check that the quotient Poisson structure induced on $G$, $G^*$ coincides with the original one. Applying the fundamental theorem on dual pairs of Poisson mappings (going back to S. Lie), we conclude that symplectic leaves in $G$ and $G^*$, respectively, coincide with the orbits of $G^*$ (respectively, $G$) in these quotient spaces. The actions $G \times G^* \to G^*$, $G^* \times G \to G$ are called dressing transformations. Unit elements in $G$ and $G^*$ are fixed points of dressing transformations; their linearizations at the tangent spaces $T_eG^* \simeq \mathfrak g^*$, $T_eG \simeq \mathfrak g$ coincide with the coadjoint actions of $G$ and $G^*$, respectively.

When $D \neq G \cdot G^*$ (i.e., the factorization problem in $D$ is not always solvable), dressing actions are still well defined as global transformations of the quotient spaces; in this case $G$, $G^*$ may be identified with open cells in $D/G^*$, $D/G$, respectively, which means that the dressing action on $G$, $G^*$ is, in general, incomplete.

If the group $G$ is factorizable, symplectic leaves in the dual group $G^*$ admit a nice uniform description: since in this case $D = G \times G$ and $G \subset D$ is the diagonal subgroup, the quotient $D/G$ may be modeled on $G$ itself. The quotient Poisson bracket in this realization coincides with [17], while the dressing action coincides with conjugation in $G$ (and is independent of $r$). Hence, symplectic leaves in $D/G$ coincide with conjugacy classes in $G$; the equivalence of this model with $G^*$ (equipped with the bracket [16]) is provided by the factorization map. The description of symplectic leaves in $G$ is more subtle (and already crucially depends on the choice of $r$!); for semisimple Lie groups with the standard Poisson structure, it is related to the geometry of double Bruhat cells.

For loop groups with rational, trigonometric, or elliptic r-matrices, dressing action is associated with auxiliary factorization problems in the loop group. Roughly speaking, symplectic leaves correspond to rational loops with prescribed singularities. Many important examples have been described in connection with integrable lattice systems, although a complete classification theorem is still not available. For $\mathfrak g = \mathfrak{sl}(2)$, the elliptic Manin triple described earlier leads to the Poisson structure on the group of elliptic loops with values in $SL(2)$; its simplest symplectic leaves (corresponding to loops with simple poles) are associated with a remarkable Poisson algebra, the Sklyanin algebra (with four generators and two Casimir functions), which admits an interesting explicit quantization.
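The brackets of this section rest on the fact that the r-matrices involved solve the classical Yang–Baxter equation, or its modified form for the skew part. The following sketch checks this numerically for the standard structure on sl(2) in the 2 × 2 matrix representation. It assumes the inner product (X, Y) = tr XY, for which the tensor Casimir is t = e⊗f + f⊗e + h⊗h/2 and the r-matrices ±(P± + P0/2) become r+ = e⊗f + h⊗h/4 and r− = −f⊗e − h⊗h/4; all function names (`kron`, `embed`, `yang_baxter`, and so on) are illustrative helpers, not taken from any library.

```python
from fractions import Fraction

def kron(a, b):
    """Kronecker product of two square matrices stored as nested lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def mul(a, b):
    """Matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b, s=1):
    """Entrywise a + s*b."""
    return [[x + s * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, a):
    return [[c * x for x in row] for row in a]

def comm(a, b):
    """Commutator [a, b] = ab - ba."""
    return add(mul(a, b), mul(b, a), -1)

ID = [[1, 0], [0, 1]]
E = [[0, 1], [0, 0]]
F = [[0, 0], [1, 0]]
H = [[1, 0], [0, -1]]

def embed(tensor, slots):
    """Place sum_i c_i X_i (x) Y_i into two of the three tensor slots (8 x 8 result)."""
    total = [[Fraction(0)] * 8 for _ in range(8)]
    for c, x, y in tensor:
        factors = [ID, ID, ID]
        factors[slots[0]], factors[slots[1]] = x, y
        term = kron(kron(factors[0], factors[1]), factors[2])
        total = add(total, scale(c, term))
    return total

def yang_baxter(tensor):
    """[[r, r]] = [r12, r13] + [r12, r23] + [r13, r23], as in eqn [8]."""
    r12, r13, r23 = (embed(tensor, s) for s in [(0, 1), (0, 2), (1, 2)])
    return add(add(comm(r12, r13), comm(r12, r23)), comm(r13, r23))

# Tensors are lists of (coefficient, X, Y) meaning coefficient * X (x) Y.
casimir = [(1, E, F), (1, F, E), (Fraction(1, 2), H, H)]
r_skew = [(Fraction(1, 2), E, F), (Fraction(-1, 2), F, E)]   # skew part r
r_plus = [(1, E, F), (Fraction(1, 4), H, H)]                 # r + t/2
r_minus = [(-1, F, E), (Fraction(-1, 4), H, H)]              # r - t/2

ZERO = [[0] * 8 for _ in range(8)]
t12, t23 = embed(casimir, (0, 1)), embed(casimir, (1, 2))

cybe_plus = yang_baxter(r_plus) == ZERO
cybe_minus = yang_baxter(r_minus) == ZERO
modified = yang_baxter(r_skew) == scale(Fraction(1, 4), comm(t12, t23))
print(cybe_plus, cybe_minus, modified)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the comparisons are equalities rather than floating-point tolerances. The last check corresponds to eqn [9] with the normalization c = 1/4.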

Dressing action is a nontrivial example of a Poisson group action. In general, such actions are not Hamiltonian in the usual sense; the appropriate generalization is provided by the notion of the nonabelian moment map. Let $G \times M \to M$ be an action of a Poisson group $G$ on a Poisson manifold $M$, and $\mathfrak g \to \mathrm{Vect}\, M$ the associated homomorphism of Lie algebras. A mapping $\mu : M \to G^*$ is called the nonabelian moment map associated with this action, if for any $X \in \mathfrak g$ and $\varphi \in F(M)$, we have

\[ X \cdot \varphi = \langle \mu^{-1}\{\mu, \varphi\}_M, X\rangle \]

In this case, $G \times M \to M$ is a fortiori a Poisson map. Both dressing actions $G \times G^* \to G^*$ and $G^* \times G \to G$ admit nonabelian moment maps, which are just the identity maps $\mu = \mathrm{id}_{G^*}$ and $\mu^* = \mathrm{id}_G$. For compact Poisson groups, the nonabelian moment map has good convexity properties, which generalize the convexity properties of the ordinary moment map for Hamiltonian group actions.

The general theory of homogeneous Poisson spaces has some peculiarities. Typically, the G-covariant Poisson structure on a given homogeneous space is not unique (when it exists); this is true already for principal homogeneous spaces (a simple example is provided by the symplectic double $D_+$). Let $G$ be a Poisson Lie group, $(\mathfrak g, \mathfrak g^*)$ its tangent Lie bialgebra, $\mathfrak d$ its double, $U$ its Lie subgroup, $\mathfrak u = \mathrm{Lie}\, U$. A subalgebra $\mathfrak l \subset \mathfrak d$ is called Lagrangian if it is isotropic with respect to the canonical inner product in $\mathfrak d$. The general classification result, according to Drinfeld, asserts that there is a bijection between G-covariant Poisson structures on $G/U$ and the set of all Lagrangian subalgebras $\mathfrak l \subset \mathfrak d$ such that $\mathfrak l \cap \mathfrak g = \mathfrak u$. Various nontrivial examples arise, notably in the study of integrable systems. For instance, the geometric proof of the factorization theorem for lattice zero-curvature equations, which is stated in the following section, uses a different Poisson structure on the double (the so-called twisted symplectic double).

Applications to Integrable Systems

The definition of Poisson–Lie groups was motivated by key examples which arise in the theory of integrable systems. In applications, one often deals with nonlinear differential equations which may be written in the form of the so-called lattice zero-curvature equations

\[ \frac{dL_m}{dt} = L_m M_m - M_{m+1} L_m, \qquad m \in \mathbb Z \tag{19} \]

where $L_m$, $M_m$ are matrices, possibly depending on an additional parameter (or, more generally, abstract linear operators). Equations [19] give the compatibility conditions for the auxiliary linear system

\[ \psi_{m+1} = L_m\psi_m, \qquad \frac{d\psi_m}{dt} = M_m\psi_m, \qquad m \in \mathbb Z \tag{20} \]

The use of finite-difference operators associated with a one-dimensional lattice, as in [20], is particularly well suited for the study of multiparticle lattice models. Let us assume that the "potential" $L_m$ in [20] is periodic, $L_{m+N} = L_m$; the period $N$ may be interpreted as the number of copies of an elementary system. It is natural to presume that the Lax matrices $L_m$ in [19] are elements of a matrix Lie group $G$ (or of a loop group, if they depend on an extra parameter). The auxiliary linear problem [20] leads to a family of dynamical systems on $G^N$ which remain integrable for any $N$. Let $T : G^N \to G$ be the monodromy map which assigns to the set $L_1, \ldots, L_N$ of local Lax matrices their ordered product $T_L = L_N L_{N-1} \cdots L_1$. Let us assume that $G$ is equipped with the Sklyanin bracket associated with a factorizable r-matrix $r$. Then $T$ is a Poisson map. Let $I(G)$ be the algebra of central functions on $G$; for $\varphi \in I(G)$, set $H_\varphi = \varphi \circ T$. All functions $H_\varphi$, $\varphi \in I(G)$, are in involution with respect to the product Poisson bracket on $G^N$ and give rise to lattice zero-curvature equations of the same form as [19]; for a given $\varphi$, we may choose the M-matrix in either of the two forms:

\[ M_m^\pm = r_\pm\bigl(\psi_m \nabla\varphi(T_L)\psi_m^{-1}\bigr), \qquad \psi_m = \prod^{\leftarrow}_{1 \le k < m} L_k \]

Let $L_m(t)$, $m = 1, \ldots, N$, be the integral curve of this equation which starts at $L_m^0$. The construction of this curve reduces to the factorization problem associated with the chosen r-matrix. Explicitly, we get

\[ L_m(t) = g_{m+1}(t)_+^{-1} L_m^0\, g_m(t)_+ = g_{m+1}(t)_-^{-1} L_m^0\, g_m(t)_- \]

where $(g_m(t)_+, g_m(t)_-)$ is the curve in $G^*$ which solves the factorization problem

\[ g_m(t)_+\, g_m(t)_-^{-1} = \psi_m^0 \exp\bigl(t\nabla\varphi(T(L^0))\bigr)(\psi_m^0)^{-1}, \qquad \psi_m^0 = \psi_m(L^0) \]

This result exhibits the double role of the r-matrix. On the one hand, it serves to define the Poisson structure on $G^N$ which is adapted to the study of lattice zero-curvature equations; in particular, the dynamical flow associated with these equations is automatically confined to symplectic leaves in $G^N$. (In applications, $G$ is usually a loop group equipped with a factorizable r-matrix; despite the fact that $\dim G = \infty$, it admits plenty of finite-dimensional symplectic leaves.) In its second incarnation, the r-matrix serves to define the factorization problem which solves these zero-curvature equations. In the loop

group case, this is a matrix Riemann problem; its explicit solution is based on the study of the spectral curve associated with the monodromy matrix $T_L$ and uses the technique of algebraic geometry.

The monodromy map $T : G^N \to G$ may be regarded as a nonabelian moment map associated with an action of the dual Lie algebra $\mathfrak g^*$ on the phase space. This action actually extends to an action of the (local) Lie group $G^*$ which transforms solutions into solutions again. This is the prototype dressing action (originally defined by Zakharov and Shabat in their study of zero-curvature equations related to Riemann–Hilbert problems). Dressing provides an effective tool to produce new solutions of zero-curvature equations from the trivial ones; it was also the first nontrivial example of a Poisson group action.

See also: Affine Quantum Groups; Bicrossproduct Hopf Algebras and Noncommutative Spacetime; Bi-Hamiltonian Methods in Soliton Theory; Deformations of the Poisson Bracket on a Symplectic Manifold; Functional Equations and Integrable Systems; Hamiltonian Fluid Dynamics; Hopf Algebras and q-Deformation Quantum Groups; Integrable Systems and Recursion Operators on Symplectic and Jacobi Manifolds; Integrable Systems: Overview; Lie, Symplectic and Poisson Groupoids, and their Lie Algebroids; Multi-Hamiltonian Systems; Poisson Reduction; Recursion Operators in Classical Mechanics; Toda Lattices; Yang–Baxter Equations.

Further Reading

Babelon O, Bernard D, and Talon M (2003) Introduction to Classical Integrable Systems. Cambridge: Cambridge University Press.
Belavin AA and Drinfeld VG (1984) Triangle equations and simple Lie algebras. In: Mathematical Physics Reviews, vol. 4, Soviet Scientific Reviews Section C Mathematical Physics Reviews, pp. 93–165. Chur: Harwood Academic Publishers. Reprinted in 1998, Classic Reviews in Mathematics and Mathematical Physics, vol. 1. Amsterdam: Harwood Academic Publishers.
Chari V and Pressley A (1995) A Guide to Quantum Groups. Cambridge: Cambridge University Press.
Drinfeld VG (1987) Quantum groups. In: Proceedings of the International Congress of Mathematicians (Berkeley, Calif., 1986), vol. 1, pp. 798–820. Providence, RI: American Mathematical Society.
Etingof P and Schiffmann O (1998) Lectures on Quantum Groups. Boston: International Press.
Frenkel E, Reshetikhin N, and Semenov-Tian-Shansky MA (1998) Drinfeld–Sokolov reduction for difference operators and deformations of W-algebras. I. The case of Virasoro algebra. Communications in Mathematical Physics 192(3): 605–629.
Lu J-H (1991) Momentum mappings and reduction of Poisson actions. In: Symplectic Geometry, Groupoids, and Integrable Systems (Berkeley, CA, 1989), Mathematical Sciences Research Institute Publications, vol. 20, pp. 209–226. New York: Springer.
Lu J-H and Weinstein A (1990) Poisson–Lie groups, dressing transformations, and Bruhat decompositions. Journal of Differential Geometry 31(2): 501–526.
Reshetikhin N (2000) Characteristic systems on Poisson–Lie groups and their quantization. In: Integrable Systems: From Classical to Quantum (Montreal, QC, 1999), CRM Proceedings Lecture Notes, vol. 26, pp. 165–188. Providence, RI: American Mathematical Society.
Reshetikhin NY and Semenov-Tian-Shansky MA (1990) Central extensions of quantum current groups. Letters in Mathematical Physics 19(2): 133–142.
Reyman AG and Semenov-Tian-Shansky MA (1994) Group-theoretical methods in the theory of finite-dimensional integrable systems. In: Encyclopaedia of Mathematical Sciences, Dynamical Systems VII, ch. 2, vol. 16, pp. 116–225. Berlin: Springer.
Semenov-Tian-Shansky MA (1994) Lectures on R-matrices, Poisson–Lie groups and integrable systems. In: Babelon O, Cartier P, and Kosmann-Schwarzbach Y (eds.) Lectures on Integrable Systems (Sophia-Antipolis, 1991), pp. 269–317. River Edge: World Scientific.
Terng C-L and Uhlenbeck K (1998) Poisson actions and scattering theory for integrable systems. In: Surveys in Differential Geometry: Integrable Systems, pp. 315–402. Lectures on geometry and topology, sponsored by Lehigh University's Journal of Differential Geometry. A supplement to the Journal of Differential Geometry. Edited by Chuu Lian Terng and Karen Uhlenbeck. Surveys in Differential Geometry IV. Boston: International Press.

Clifford Algebras and Their Representations


A Trautman, Warsaw University, Warsaw, Poland
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Introductory and Historical Remarks

Clifford (1878) introduced his "geometric algebras" as a generalization of Grassmann algebras, complex numbers, and quaternions. Lipschitz (1886) was the first to define groups constructed from "Clifford numbers" and use them to represent rotations in a Euclidean space. Cartan discovered representations of the Lie algebras $\mathfrak{so}_n(\mathbb C)$ and $\mathfrak{so}_n(\mathbb R)$, $n > 2$, that do not lift to representations of the orthogonal groups. In physics, Clifford algebras and spinors appear for the first time in Pauli's nonrelativistic theory of the "magnetic electron." Dirac (1928), in his work on the relativistic wave equation of the electron, introduced matrices that provide a representation of the Clifford algebra of Minkowski space. Brauer and Weyl (1935) connected the Clifford and Dirac ideas with Cartan's spinorial representations of Lie algebras; they found, in any number of dimensions, the spinorial, projective representations of the orthogonal groups.

Clifford algebras and spinors are implicit in Euclid's solution of the Pythagorean equation $x^2 - y^2 + z^2 = 0$, which is equivalent to

\[ \begin{pmatrix} y - x & z\\ z & y + x \end{pmatrix} = 2 \begin{pmatrix} p\\ q \end{pmatrix} \begin{pmatrix} p & q \end{pmatrix} \tag{1} \]

and gives $x = q^2 - p^2$, $y = p^2 + q^2$, $z = 2pq$. If the numbers appearing in [1] are real, then this equation can be interpreted as providing a representation of a vector $(x, y, z) \in \mathbb R^3$, null with respect to a quadratic form of signature (1, 2), as the "square" of a spinor $(p, q) \in \mathbb R^2$. The pure spinors of Cartan (1938) provide a generalization of this observation to higher dimensions.

Multiplying the square matrix in [1] on the left by a real, 2 × 2 unimodular matrix, on the right by its transpose, and taking the determinant, one arrives at the exact sequence of group homomorphisms:

\[ 1 \to \mathbb Z_2 \to \mathrm{SL}_2(\mathbb R) = \mathrm{Spin}^0_{1,2} \to \mathrm{SO}^0_{1,2} \to 1 \]

Multiplying the same matrix by

\[ \varepsilon = \begin{pmatrix} 0 & -1\\ 1 & 0 \end{pmatrix} \tag{2} \]

on the left and computing the square of the product, one obtains

\[ \begin{pmatrix} z & x + y\\ x - y & -z \end{pmatrix}^2 = (x^2 - y^2 + z^2) \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix} \]

This equation is an illustration of the idea of representing a quadratic form as the square of a linear form in a Clifford algebra. Replacing $y$ by $iy$, one arrives at complex spinors, the Pauli matrices,

\[ \sigma_x = \begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix}, \qquad \sigma_y = i\varepsilon, \qquad \sigma_z = \begin{pmatrix} 1 & 0\\ 0 & -1 \end{pmatrix} \]

$\mathrm{Spin}_3 = \mathrm{SU}_2$, etc.

This article reviews Clifford algebras, the associated groups, and their representations, for quadratic spaces over complex or real numbers. These notions have been generalized by Chevalley (1954) to quadratic spaces over arbitrary number fields.

Notation

If $S$ is a vector space over $K = \mathbb R$ or $\mathbb C$, then $S^*$ denotes its dual, that is, the vector space over $K$ of all $K$-linear maps from $S$ to $K$. The value of $\omega \in S^*$ on $s \in S$ is sometimes written as $\langle s, \omega\rangle$. The transpose of a linear map $f : S_1 \to S_2$ is the map $f^* : S_2^* \to S_1^*$ defined by $\langle s, f^*(\omega)\rangle = \langle f(s), \omega\rangle$ for every $s \in S_1$ and $\omega \in S_2^*$. If $S_1$ and $S_2$ are complex vector spaces, then a map $f : S_1 \to S_2$ is said to be semilinear if it is $\mathbb R$-linear and $f(is) = -if(s)$. The complex conjugate of a finite-dimensional complex vector space $S$ is the complex vector space $\bar S$ of all semilinear maps from $S^*$ to $\mathbb C$. There is a natural semilinear isomorphism (complex conjugation) $S \to \bar S$, $s \mapsto \bar s$, such that $\langle \omega, \bar s\rangle = \overline{\langle s, \omega\rangle}$ for every $\omega \in S^*$. The space $\bar{\bar S}$ can be identified with $S$ and then $\bar{\bar s} = s$. The spaces $(\bar S)^*$ and $\overline{S^*}$ are identified. If $f : S_1 \to S_2$ is a complex-linear map, then there is the complex-conjugate map $\bar f : \bar S_1 \to \bar S_2$ given by $\bar f(\bar s) = \overline{f(s)}$ and the Hermitian conjugate map $f^\dagger \overset{\mathrm{def}}{=} \bar f^* : \bar S_2^* \to \bar S_1^*$. A linear map $A : S \to \bar S^*$ such that $A^\dagger = A$ is said to be Hermitian. $K(N)$ denotes, for $K = \mathbb R$, $\mathbb C$ or $\mathbb H$, the set of all $N$ by $N$ matrices with elements in $K$.

Real, Complex, and Quaternionic Structures

A real structure on a complex vector space $S$ is a complex-linear map $C : S \to \bar S$ such that $\bar C C = \mathrm{id}_S$. A vector $s \in S$ is said to be real if $\bar s = C(s)$. The set of all real vectors is a real vector space; its real dimension is the same as the complex dimension of $S$.

A complex-linear map $C : S \to \bar S$ such that $\bar C C = -\mathrm{id}_S$ defines on $S$ a quaternionic structure; a necessary condition for such a structure to exist is that the complex dimension $m$ of $S$ be even, $m = 2n$, $n \in \mathbb N$. The space $S$ with a quaternionic structure can be made into a right vector space over the field $\mathbb H$ of quaternions. In the context of quaternions, it is convenient to represent the imaginary unit of $\mathbb C$ as $\sqrt{-1}$. Multiplication on the right by the quaternion unit $i$ is realized as the multiplication (on the left) by $\sqrt{-1}$. If $j$ and $k = ij$ are the other two quaternion units and $s \in S$, then one puts $sj = \bar C(\bar s)$ and $sk = (si)j$.

A real vector space $S$ can be complexified by forming the tensor product $\mathbb C \otimes_{\mathbb R} S = S \oplus iS$. The realification of a complex vector space $S$ is the real vector space having $S$ as its set of vectors so that $\dim_{\mathbb R} S = 2 \dim_{\mathbb C} S$. The complexification of a realification of $S$ is the "double" $S \oplus \bar S$ of the original space.

Inner-Product Spaces and Their Groups

Definitions: quadratic and symplectic spaces A bilinear map $B : S \times S \to K$ on a vector space $S$ over $K$ is said to make $S$ into an inner-product space. To save on notation, one also writes $B : S \to S^*$ so that $\langle s, B(t)\rangle = B(s, t)$ for all $s, t \in S$. The group of automorphisms of an inner-product space,

\[ \mathrm{Aut}(S, B) = \{R \in \mathrm{GL}(S) \mid R^* \circ B \circ R = B\} \]

is a Lie subgroup of the general linear group $\mathrm{GL}(S)$. An inner-product space $(S, B)$ is said here to be
quadratic (resp., symplectic) if $B$ is symmetric (resp., antisymmetric and nonsingular). A quadratic space is characterized by its quadratic form $s \mapsto B(s, s)$. For $K = \mathbb{C}$, a Hermitian map $A : S \to \bar{S}^*$ defines a Hermitian scalar product $A(s, t) = \langle \bar{s}, A(t) \rangle$.

An orthogonal space is defined here as a quadratic space $(S, B)$ such that $B : S \to S^*$ is an isomorphism. The group of automorphisms of an orthogonal space is the orthogonal group $\mathrm{O}(S, B)$. The group of automorphisms of a symplectic space is the symplectic group $\mathrm{Sp}(S, B)$. The dimension of a symplectic space is even. If $S = K^{2n}$ is a symplectic space over $K = \mathbb{R}$ or $\mathbb{C}$, then its symplectic group is denoted by $\mathrm{Sp}_{2n}(K)$. Two quaternionic symplectic groups appear in the list of spin groups of low-dimensional spaces:
\[
\mathrm{Sp}_2(\mathbb{H}) = \{ a \in \mathbb{H}(2) \mid a^\dagger a = I \}
\]
and
\[
\mathrm{Sp}_{1,1}(\mathbb{H}) = \{ a \in \mathbb{H}(2) \mid a^\dagger \sigma_z a = \sigma_z \}
\]
Here $a^\dagger$ denotes the matrix obtained from $a$ by transposition and quaternionic conjugation.

Contractions, frames, and orthogonality. From now on, unless otherwise specified, $(V, g)$ is a quadratic space of dimension $m$. Let $\wedge V = \bigoplus_{p=0}^{m} \wedge^p V$ be its exterior (Grassmann) algebra. For every $v \in V$ and $w \in \wedge V$ there is the contraction $g(v) \lrcorner w$ characterized as follows. The map $V \times \wedge V \to \wedge V$, $(v, w) \mapsto g(v) \lrcorner w$, is bilinear; if $x \in \wedge^p V$, then $g(v) \lrcorner (x \wedge w) = (g(v) \lrcorner x) \wedge w + (-1)^p x \wedge (g(v) \lrcorner w)$ and $g(v) \lrcorner v = g(v, v)$.

A frame $(e_\mu)$ in a quadratic space $(V, g)$ is said to be a quadratic frame if $\mu \neq \nu$ implies $g(e_\mu, e_\nu) = 0$. For every subset $W$ of $V$ there is the orthogonal subspace $W^\perp$ containing all vectors that are orthogonal to every element of $W$.

If $(V, g)$ is a real orthogonal space, then there is an orthonormal frame $(e_\mu)$, $\mu = 1, \ldots, m$, in $V$ such that $k$ frame vectors have squares equal to $-1$, $l$ frame vectors have squares equal to $1$ and $k + l = m$. The pair $(k, l)$ is the signature of $g$. The quadratic form $g$ is said to be neutral if the orthogonal space $(V, g)$ admits two maximal totally null subspaces $W$ and $W'$ such that $V = W \oplus W'$. Such a space $V$ is $2n$-dimensional, either complex or real with $g$ of signature $(n, n)$. A Lorentzian space has maximal totally null subspaces of dimension 1 and a Euclidean space, characterized by a definite quadratic form, has no null subspaces. The Minkowski space is a Lorentzian space of dimension 4.

If $(V, g)$ is a complex orthogonal space, then an orthonormal frame $(e_\mu)$, $\mu = 1, \ldots, m$, can be chosen in $V$ so that, defining $g_{\mu\nu} = g(e_\mu, e_\nu)$, one has $g_{\mu\mu} = (-1)^{\mu+1}$ and, if $\mu \neq \nu$, then $g_{\mu\nu} = 0$.

If $A : S \to \bar{S}^*$ is a Hermitian isomorphism, then there is a (pseudo)unitary frame $(e_\mu)$ in $S$ such that the matrix $A_{\bar{\mu}\nu} = A(e_\mu, e_\nu)$ is diagonal, has $p$ entries equal to $1$ and $q$ entries equal to $-1$ on the diagonal, $p + q = \dim S$. If $p = q$, then $A$ is said to be neutral. $A$ is definite if either $p$ or $q = 0$.

Algebras

Definitions. An algebra over $K$ is a vector space $A$ over $K$ with a bilinear map $A \times A \to A$, $(a, b) \mapsto ab$, which is distributive with respect to addition. The algebra is associative if $(ab)c = a(bc)$ holds for all $a, b, c \in A$. It is commutative if $ab = ba$ for all $a, b \in A$. An element $1_A$ is the unit of $A$ if $1_A a = a 1_A = a$ holds for every $a \in A$.

From now on, unless otherwise specified, the bare word algebra denotes a finite-dimensional, associative algebra over $K = \mathbb{R}$ or $\mathbb{C}$, with a unit element. If $S$ is an $N$-dimensional vector space over $K$, then the set $\mathrm{End}\, S$ of all endomorphisms of $S$ is an $N^2$-dimensional algebra over $K$, the product being defined by composition; if $f, g \in \mathrm{End}\, S$, then one writes $fg$ instead of $f \circ g$; the unit of $\mathrm{End}\, S$ is the identity map $I$. By definition, homomorphisms of algebras map units into units. The map $K \to A$, $a \mapsto a 1_A$, is injective and one identifies $K$ with its image in $A$ by this map so that the unit can be represented by $1 \in K \subset A$. A set $B \subset A$ is said to generate $A$ if every element of $A$ can be represented as a linear combination of products of elements of $B$. For example, if $V$ is a vector space over $K$, then its tensor algebra
\[
T(V) = \bigoplus_{p=0}^{\infty} \otimes^p V
\]
is an (infinite-dimensional) algebra over $K$ generated by $K \oplus V$. The algebra of all $N \times N$ matrices with entries in an algebra $A$ is denoted by $A(N)$. Its unit element is the unit matrix $I$. In particular, $\mathbb{R}(N)$, $\mathbb{C}(N)$, and $\mathbb{H}(N)$ are algebras over $\mathbb{R}$. The algebra $\mathbb{R}(2)$ is generated by the set $\{\sigma_x, \sigma_z\}$. As a vector space, the algebra $\mathbb{R}(2)$ is spanned by the set $\{I, \sigma_x, \varepsilon, \sigma_z\}$.

The direct sum $A \oplus B$ of the algebras $A$ and $B$ over $K$ is an algebra over $K$ such that its underlying vector space is $A \oplus B$ and the product is defined by $(a, b) \cdot (a', b') = (aa', bb')$ for every $a, a' \in A$ and $b, b' \in B$. Similarly, the product in the tensor product algebra $A \otimes_K B$ is defined by
\[
(a \otimes b) \cdot (a' \otimes b') = aa' \otimes bb' \qquad [3]
\]
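The product rule [3] can be checked numerically when both factors are matrix algebras, since the tensor product of two matrices is then realized by the Kronecker product. The following is a minimal sketch in plain Python and is not taken from the article: the helper names `matmul` and `kron` are hypothetical, the test matrices are arbitrary, and $\varepsilon$ is assumed to be the matrix with rows $(0, -1)$ and $(1, 0)$.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def kron(a, b):
    """Kronecker product, used here as a concrete model of a (x) b."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


# Generators of R(2) and the matrix epsilon (sign convention is an assumption).
sigma_x = [[0, 1], [1, 0]]
sigma_z = [[1, 0], [0, -1]]
epsilon = [[0, -1], [1, 0]]
identity = [[1, 0], [0, 1]]

# sigma_x and sigma_z generate R(2): their product gives the remaining
# basis element epsilon of the spanning set {I, sigma_x, epsilon, sigma_z}.
assert matmul(sigma_x, sigma_z) == epsilon

# Rule [3]: (a (x) b)(a' (x) b') = aa' (x) bb', on arbitrary integer matrices.
a, a2 = [[1, 2], [3, 4]], [[0, 1], [5, -2]]
b, b2 = [[2, 0], [1, 1]], [[1, -1], [4, 3]]
lhs = matmul(kron(a, b), kron(a2, b2))
rhs = kron(matmul(a, a2), matmul(b, b2))
assert lhs == rhs

# The Kronecker product of two 2 x 2 matrices is a 4 x 4 matrix.
assert len(lhs) == 4 and len(lhs[0]) == 4
```

Working with integer entries keeps every comparison exact, so the equalities hold without any rounding tolerance.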
For example, if $A$ is an algebra over $\mathbb{R}$, then the tensor product algebra $\mathbb{R}(N) \otimes_{\mathbb{R}} A$ is isomorphic to $A(N)$ and
\[
K(N) \otimes_K K(N') = K(NN') \qquad [4]
\]
for $K = \mathbb{R}$ or $\mathbb{C}$ and $N, N' \in \mathbb{N}$. There are isomorphisms of algebras over $\mathbb{R}$:
\[
\begin{aligned}
\mathbb{C} \otimes_{\mathbb{R}} \mathbb{C} &= \mathbb{C} \oplus \mathbb{C} \\
\mathbb{C} \otimes_{\mathbb{R}} \mathbb{H} &= \mathbb{C}(2) \\
\mathbb{H} \otimes_{\mathbb{R}} \mathbb{H} &= \mathbb{R}(4)
\end{aligned} \qquad [5]
\]
An algebra over $\mathbb{R}$ can be complexified by complexifying its underlying vector space; it follows from [5] that $\mathbb{C}(2)$ is the complex algebra obtained by complexification of the real algebra $\mathbb{H}$.

The center of an algebra $A$ is the set
\[
Z(A) = \{ a \in A \mid ab = ba \ \ \forall\, b \in A \}
\]
The center is a commutative subalgebra containing $K$. An algebra over $K$ is said to be central if its center coincides with $K$. The algebras $\mathbb{R}(N)$ and $\mathbb{H}(N)$ are central over $\mathbb{R}$. The algebra $\mathbb{C}(N)$ is central over $\mathbb{C}$, but not over $\mathbb{R}$.

Simplicity and representations. Let $B_1$ and $B_2$ be subsets of the algebra $A$. Define $B_1 B_2 = \{ b_1 b_2 \mid b_1 \in B_1, b_2 \in B_2 \}$. A vector subspace $B$ of $A$ is said to be a left (resp., right) ideal of $A$ if $AB \subset B$ (resp., $BA \subset B$). A two-sided ideal or simply an ideal is a left and right ideal. An algebra $A \neq \{0\}$ is said to be simple if its only two-sided ideals are $\{0\}$ and $A$. For example, the algebras $\mathbb{R}(N)$ and $\mathbb{H}(N)$ are simple over $\mathbb{R}$; the algebra $\mathbb{C}(N)$ is simple when considered as an algebra over both $\mathbb{R}$ and $\mathbb{C}$; every associative, finite-dimensional simple algebra over $\mathbb{R}$ or $\mathbb{C}$ is isomorphic to one of them.

A representation of an algebra $A$ over $K$ in a vector space $S$ over $K$ is a homomorphism of algebras $\rho : A \to \mathrm{End}\, S$. If $\rho$ is injective, then the representation is said to be faithful. For example, the regular representation $\rho : A \to \mathrm{End}\, A$ of an algebra $A$, defined by $\rho(a)b = ab$ for all $a, b \in A$, is faithful. A vector subspace $T$ of the vector space $S$ carrying a representation $\rho$ of $A$ is said to be invariant for $\rho$ if $\rho(a)T \subset T$ for every $a \in A$; it is proper if distinct from both $\{0\}$ and $S$. For example, a left ideal of $A$ is invariant for the regular representation. Given an invariant subspace $T$ of $\rho$ one can reduce $\rho$ to $T$ by forming the representation $\rho_T : A \to \mathrm{End}\, T$, where $\rho_T(a)s = \rho(a)s$ for every $a \in A$ and $s \in T$. A representation is irreducible if it has no proper invariant subspaces.

A linear map $F : S_1 \to S_2$ is said to intertwine the representations $\rho_1 : A \to \mathrm{End}\, S_1$ and $\rho_2 : A \to \mathrm{End}\, S_2$ if $F\rho_1(a) = \rho_2(a)F$ holds for every $a \in A$. If $F$ is an isomorphism, then the representations $\rho_1$ and $\rho_2$ are said to be equivalent, $\rho_1 \sim \rho_2$. The following two propositions are classical:

Proposition (A)
(i) An algebra over $K$ is simple if and only if it admits a faithful irreducible representation in a vector space over $K$. Such a representation is unique, up to equivalence.
(ii) The complexification of a central simple algebra over $\mathbb{R}$ is a central simple algebra over $\mathbb{C}$.

For real algebras, one often considers complex representations, that is, representations in complex vector spaces. Two such representations $\rho_1 : A \to \mathrm{End}\, S_1$ and $\rho_2 : A \to \mathrm{End}\, S_2$ are said to be complex equivalent if there is a complex isomorphism $F : S_1 \to S_2$ intertwining the representations; they are real equivalent if there is an isomorphism between the realifications of $S_1$ and $S_2$ intertwining the representations. For example, $\mathbb{C}$, considered as an algebra over $\mathbb{R}$, has two complex-inequivalent representations in $\mathbb{C}$: the identity representation and its complex conjugate. The realifications of these representations, given by $i \mapsto \varepsilon$ and $i \mapsto -\varepsilon$, respectively, are real equivalent: they are intertwined by $\sigma_z$. The real algebra $\mathbb{H}$, being central simple, has only one, up to complex equivalence, representation in $\mathbb{C}^2$: every such representation is equivalent to the one given by
\[
i \mapsto \sigma_x / \sqrt{-1}, \quad j \mapsto \sigma_y / \sqrt{-1}, \quad k \mapsto \sigma_z / \sqrt{-1}
\]
This representation extends to an injective homomorphism of algebras $i : \mathbb{H}(N) \to \mathbb{C}(2N)$ which is used to define the quaternionic determinant of a matrix $a \in \mathbb{H}(N)$ as $\det_{\mathbb{H}} a = \det i(a)$, so that $\det_{\mathbb{H}}(a) \geq 0$ and $\det_{\mathbb{H}}(ab) = \det_{\mathbb{H}}(a) \det_{\mathbb{H}}(b)$ for every $a, b \in \mathbb{H}(N)$. In particular, if $q \in \mathbb{H}$ and $\lambda, \mu \in \mathbb{R}$, then $\det_{\mathbb{H}}(q) = \bar{q}q$ and
\[
\det\nolimits_{\mathbb{H}} \begin{pmatrix} \lambda & q \\ -\bar{q} & \mu \end{pmatrix} = (\lambda\mu + \bar{q}q)^2 \qquad [6]
\]
There are quaternionic unimodular groups $\mathrm{SL}_N(\mathbb{H}) = \{ a \in \mathbb{H}(N) \mid \det_{\mathbb{H}} a = 1 \}$. For example, the group $\mathrm{SL}_1(\mathbb{H})$ is isomorphic to $\mathrm{SU}_2$ and $\mathrm{SL}_2(\mathbb{H})$ is a noncompact, 15-dimensional Lie group, one of the spin groups in six dimensions.

Antiautomorphisms and inner products. An automorphism of an algebra $A$ is a linear isomorphism $\alpha : A \to A$ such that $\alpha(ab) = \alpha(a)\alpha(b)$. An invertible element $c \in A$ defines an inner automorphism $\mathrm{Ad}(c) \in \mathrm{GL}(A)$, $\mathrm{Ad}(c)a = cac^{-1}$. Complex conjugation in $\mathbb{C}$, considered as an algebra over $\mathbb{R}$, is an automorphism that is not inner. An antiautomorphism of an
algebra $A$ is a linear isomorphism $\beta : A \to A$ such that $\beta(ab) = \beta(b)\beta(a)$ for all $a, b \in A$. An (anti)automorphism $\beta$ is involutive if $\beta^2 = \mathrm{id}$. For example, conjugation of quaternions defines an involutive antiautomorphism of $\mathbb{H}$.

Let $\rho : A \to \mathrm{End}\, S$ be a representation of an algebra with an involutive antiautomorphism $\beta$. There is then the contragredient representation $\check{\rho} : A \to \mathrm{End}\, S^*$ given by $\check{\rho}(a) = (\rho(\beta(a)))^*$. If, moreover, $A$ is central simple and $\rho$ is faithful irreducible, then there is an isomorphism $B : S \to S^*$ intertwining $\rho$ and $\check{\rho}$ which is either symmetric, $B^* = B$, or antisymmetric, $B^* = -B$. It defines on $S$ the structure of an inner-product space. This structure extends to $\mathrm{End}\, S$: there is a symmetric isomorphism $B \otimes B^{-1} : \mathrm{End}\, S \to (\mathrm{End}\, S)^* = \mathrm{End}\, S^*$ given, for every $f \in \mathrm{End}\, S$, by $(B \otimes B^{-1})(f) = B f B^{-1}$.

Let $K^\times = K \setminus \{0\}$ be the multiplicative group of the field $K$. Given a simple algebra $A$ with an involutive antiautomorphism $\beta$, one defines $N(a) = \beta(a)a$ and the group
\[
G(\beta) = \{ a \in A \mid N(a) \in K^\times \}
\]
Let $\rho : A \to \mathrm{End}\, S$ be the faithful irreducible representation as above; then, for $a \in G(\beta)$ and $s, t \in S$, one has
\[
B(\rho(a)s, \rho(a)t) = N(a) B(s, t)
\]
If $a \in G(\beta)$ and $\lambda \in K^\times$, then $\lambda a \in G(\beta)$ and the norm $N$ satisfies $N(\lambda a) = \lambda^2 N(a)$. The inner product $B$ is invariant with respect to the action of the group
\[
G_1(\beta) = \{ a \in G(\beta) \mid N(a) = 1 \}
\]

Proposition (B) Let $A$ be a central simple algebra over $K$ with an involutive antiautomorphism $\beta$ and a faithful irreducible representation $\rho$ so that
\[
\check{\rho}(a) = B \rho(a) B^{-1}
\]
The map $h : A \times A \to K$ defined by
\[
h(a, b) = \mathrm{tr}\, \rho(\beta(a)b)
\]
is bilinear, symmetric, and nondegenerate. The map $\rho$ is an isometry of the quadratic space $(A, h)$ on its image in the quadratic space $(\mathrm{End}\, S, B \otimes B^{-1})$.

Graded Algebras

Definitions. An algebra $A$ is said to be $\mathbb{Z}$-graded (resp., $\mathbb{Z}_2$-graded) if there is a decomposition of the underlying vector space $A = \bigoplus_{p \in \mathbb{Z}} A^p$ (resp., $A = A^0 \oplus A^1$) such that $A^p A^q \subset A^{p+q}$. In a $\mathbb{Z}_2$-graded algebra, it is understood that $p + q$ is reduced mod 2. If $a \in A^p$, then $a$ is said to be homogeneous of degree $p$. The exterior algebra $\wedge V$ of a vector space $V$ is $\mathbb{Z}$-graded. Every $\mathbb{Z}$-graded algebra becomes $\mathbb{Z}_2$-graded when one reduces the degree of every element mod 2. A graded isomorphism of graded algebras is an isomorphism that preserves the grading. A $\mathbb{Z}_2$-grading of $A$ is characterized by the involutive automorphism $\alpha$ such that, if $a \in A^p$, then $\alpha(a) = (-1)^p a$. From now on, grading means $\mathbb{Z}_2$-grading unless otherwise specified. The elements of $A^0$ (resp., $A^1$) are said to be even (resp., odd). It is often convenient to denote the graded algebra as
\[
A^0 \to A \qquad [7]
\]
Given such an algebra over $K$ and $N \in \mathbb{N}$, one constructs the graded algebra $A^0(N) \to A(N)$. Two graded algebras over $K$, $A^0 \to A$ and $A'^0 \to A'$, are said to be of the same type if there are integers $N$ and $N'$ such that the algebras $A^0(N) \to A(N)$ and $A'^0(N') \to A'(N')$ are graded isomorphic. The property of being of the same type is an equivalence relation in the set of all graded algebras over $K$.

Given an algebra $A$, one constructs two canonical graded algebras as follows:

1. the double algebra
\[
A \to A \oplus A
\]
graded by the swap automorphism, $\alpha(a_1, a_2) = (a_2, a_1)$ for $a_1, a_2 \in A$;
2. the algebra
\[
A \oplus A \to A(2)
\]
defined by declaring the diagonal (resp., antidiagonal) elements of $A(2)$ to be even (resp., odd).

The real algebra $\mathbb{R}(2)$ has also another grading, given by the involutive automorphism $\alpha$ such that $\alpha(a) = \varepsilon a \varepsilon^{-1}$, where $a \in \mathbb{R}(2)$ and $\varepsilon$ is as in [2]. In this case, [7] reads
\[
\mathbb{C} \to \mathbb{R}(2)
\]
There are also graded algebras over $\mathbb{R}$:
\[
\mathbb{R} \to \mathbb{C}, \quad \mathbb{C} \to \mathbb{H}, \quad \text{and} \quad \mathbb{H} \to \mathbb{C}(2)
\]
The grading of the last algebra can be defined by declaring the Pauli matrices and $iI$ to be odd.

Super Lie algebras. A super Lie algebra is a graded algebra $A$ such that the product $(a, b) \mapsto [a, b]$ is super anticommutative, $[a, b] = -(-1)^{pq}[b, a]$, and satisfies the super Jacobi identity,
\[
[a, [b, c]] = [[a, b], c] + (-1)^{pq}[b, [a, c]]
\]
for every $a \in A^p$, $b \in A^q$ and $c \in A$. To every graded associative algebra $A$ there corresponds a super Lie algebra $\mathrm{GL}(A)$: its underlying vector space and grading are as in $A$ and the product, for $a \in A^p$
and $b \in A^q$, is given as the supercommutator $[a, b] = ab - (-1)^{pq}ba$.

Supercentrality and graded simplicity. A graded algebra $A$ over $K$ is supercentral if $Z(A) \cap A^0 = K$. The algebra $\mathbb{R} \to \mathbb{C}$ is supercentral, but the real ungraded algebra $\mathbb{C}$ is not central.

A subalgebra $B$ of a graded algebra $A$ is said to be a graded subalgebra if $B = (B \cap A^0) \oplus (B \cap A^1)$. A graded ideal of $A$ is an ideal that is a graded subalgebra. A graded algebra $A \neq \{0\}$ is said to be graded simple if it has no graded ideals other than $\{0\}$ and $A$. The double algebra of a simple algebra is graded simple, but not simple.

The graded tensor product. Let $A$ and $B$ be graded algebras; the tensor product of their underlying vector spaces admits a natural grading, $(A \otimes B)^p = \bigoplus_q A^q \otimes B^{p-q}$. The product defined in [3] makes $A \otimes B$ into a graded algebra. There is another super product in the same graded vector space given by
\[
(a \otimes b) \cdot (a' \otimes b') = (-1)^{pq} aa' \otimes bb'
\]
for $a' \in A^p$ and $b \in B^q$. The resulting graded algebra is referred to as the graded tensor product and denoted by $A \mathbin{\hat{\otimes}} B$. For example, if $V$ and $W$ are vector spaces, then the Grassmann algebra $\wedge(V \oplus W)$ is isomorphic to $\wedge V \mathbin{\hat{\otimes}} \wedge W$.

Clifford Algebras

Definitions: The Universal Property and Grading

The Clifford algebra associated with a quadratic space $(V, g)$ is the quotient algebra
\[
\mathrm{C}\ell(V, g) = T(V) / J(V, g) \qquad [8]
\]
where $J(V, g)$ is the ideal in the tensor algebra $T(V)$ generated by all elements of the form $v \otimes v - g(v, v) 1_{T(V)}$, $v \in V$.

The Clifford algebra is associative with a unit element denoted by 1. One denotes by $\kappa$ the canonical map of $T(V)$ onto $\mathrm{C}\ell(V, g)$ and by $ab$ the product of two elements $a, b \in \mathrm{C}\ell(V, g)$ so that $\kappa(P \otimes Q) = \kappa(P)\kappa(Q)$ for $P, Q \in T(V)$. The map $\kappa$ is injective on $K \oplus V$, and one identifies this subspace of $T(V)$ with its image under $\kappa$. With this identification, for all $u, v \in V$, one has
\[
uv + vu = 2g(u, v)
\]
Clifford algebras are characterized by their universal property described in the following proposition.

Proposition (C) Let $A$ be an algebra with a unit $1_A$ and let $f : V \to A$ be a Clifford map, that is, a linear map such that $f(v)^2 = g(v, v) 1_A$ for every $v \in V$. There then exists a homomorphism $\hat{f} : \mathrm{C}\ell(V, g) \to A$ of algebras with units, an extension of $f$, so that $f(v) = \hat{f}(v)$ for every $v \in V$.

As a corollary, one obtains

Proposition (D) If $f$ is an isometry of $(V, g)$ into $(W, h)$, then there is a homomorphism of algebras $\mathrm{C}\ell(f) : \mathrm{C}\ell(V, g) \to \mathrm{C}\ell(W, h)$ extending $f$ so that there is the commutative diagram
\[
\begin{array}{ccc}
\mathrm{C}\ell(V, g) & \xrightarrow{\ \mathrm{C}\ell(f)\ } & \mathrm{C}\ell(W, h) \\
\uparrow & & \uparrow \\
V & \xrightarrow{\ f\ } & W
\end{array}
\]
For example, the isometry $v \mapsto -v$ extends to the involutive main automorphism $\alpha$ of $\mathrm{C}\ell(V, g)$, defining its $\mathbb{Z}_2$-grading:
\[
\mathrm{C}\ell(V, g) = \mathrm{C}\ell^0(V, g) \oplus \mathrm{C}\ell^1(V, g)
\]
The algebra $\mathrm{C}\ell(V, g)$ admits also an involutive canonical antiautomorphism $\beta$ characterized by $\beta(1) = 1$ and $\beta(v) = v$ for every $v \in V$.

The Vector Space Structure of Clifford Algebras

Referring to proposition (C), let $A = \mathrm{End}(\wedge V)$ and, for every $v \in V$ and $w \in \wedge V$, put $f(v)w = v \wedge w + g(v) \lrcorner w$; then $f : V \to \mathrm{End}(\wedge V)$ is a Clifford map and the map
\[
i : \mathrm{C}\ell(V, g) \to \wedge V \qquad [9]
\]
given by $i(a) = \hat{f}(a) 1_{\wedge V}$ is an isomorphism of vector spaces. This proves

Proposition (E) As a vector space, the algebra $\mathrm{C}\ell(V, g)$ is isomorphic to the exterior algebra $\wedge V$.

If $V$ is $m$-dimensional, then $\mathrm{C}\ell(V, g)$ is $2^m$-dimensional. The linear isomorphism [9] defines a $\mathbb{Z}$-grading of the vector space underlying the Clifford algebra: if $i(a_k) \in \wedge^k V$, then $a_k$ is said to be of Grassmann degree $k$. Every element $a \in \mathrm{C}\ell(V, g)$ decomposes into its Grassmann components, $a = \sum_{k \in \mathbb{Z}} a_k$. The Clifford product of two elements of Grassmann degrees $k$ and $l$ decomposes as follows: $a_k b_l = \sum_{p \in \mathbb{Z}} (a_k b_l)_p$, and $(a_k b_l)_p = 0$ if $p < |k - l|$ or $p \equiv k - l + 1 \bmod 2$ or $p > m - |m - k - l|$.

One often uses [9] to identify the vector spaces $\wedge V$ and $\mathrm{C}\ell(V, g)$; this having been done, one can write, for every $v \in V$ and $a \in \mathrm{C}\ell(V, g)$,
\[
va = v \wedge a + g(v) \lrcorner a \qquad [10]
\]
so that $[v, a] = 2 g(v) \lrcorner a$, where $[\ ,\ ]$ is the supercommutator. It defines a super Lie algebra structure in the vector space $K \oplus V$. The quadratic form defined by $g$ need not be nondegenerate; for example, if it is the
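0-form, then [10] shows that the Clifford and exterior multiplications coincide and $\mathrm{C}\ell(V, 0)$ is isomorphic, as an algebra, to the Grassmann algebra.

Complexification of Real Clifford Algebras

Proposition (F) If $(V, g)$ is a real quadratic space, then the algebras $\mathbb{C} \otimes \mathrm{C}\ell(V, g)$ and $\mathrm{C}\ell(\mathbb{C} \otimes V, \mathbb{C} \otimes g)$ are isomorphic, as graded algebras over $\mathbb{C}$.

From now on, through the end of the article, one assumes that $(V, g)$ is an orthogonal space over $K = \mathbb{R}$ or $\mathbb{C}$.

The Clifford algebra associated with the orthogonal space $\mathbb{C}^m$ is denoted by $\mathrm{C}\ell_m$. The Clifford algebra associated with the orthogonal space $(\mathbb{R}^{k+l}, g)$, where $g$ is of signature $(k, l)$, is denoted by $\mathrm{C}\ell_{k,l}$, so that $\mathbb{C} \otimes \mathrm{C}\ell_{k,l} = \mathrm{C}\ell_{k+l}$.

Relations between Clifford Algebras in Spaces of Adjacent Dimensions

Consider an orthogonal space $(V, g)$ over $K$ and the one-dimensional orthogonal space $(K, h_1)$, having a unit vector $w \in K$, $h_1(w, w) = \varepsilon$, where $\varepsilon = 1$ or $-1$. The map $V \ni v \mapsto vw \in \mathrm{C}\ell^0(V \oplus K, g \oplus h_1)$ satisfies $(vw)^2 = -\varepsilon g(v, v)$ and extends to the isomorphism of algebras $\mathrm{C}\ell(V, -\varepsilon g) \to \mathrm{C}\ell^0(V \oplus K, g \oplus h_1)$. This proves

Proposition (G) There are isomorphisms of algebras: $\mathrm{C}\ell_m \to \mathrm{C}\ell^0_{m+1}$ and $\mathrm{C}\ell_{k,l} \to \mathrm{C}\ell^0_{k+1,l}$.

Consider the orthogonal space $(K^2, h)$ with a neutral $h$ such that, for $\lambda, \mu \in K$, one has $\langle (\lambda, \mu), h(\lambda, \mu) \rangle = \lambda\mu$. The map
\[
K^2 \to K(2), \quad (\lambda, \mu) \mapsto \begin{pmatrix} 0 & \lambda \\ \mu & 0 \end{pmatrix}
\]
has the Clifford property and establishes the isomorphisms represented by the horizontal arrows in the diagram
\[
\begin{array}{ccc}
\mathrm{C}\ell(K^2, h) & \to & K(2) \\
\uparrow & & \uparrow \\
\mathrm{C}\ell^0(K^2, h) & \to & K \oplus K
\end{array} \qquad [11]
\]

Proposition (H) If $(K^2, h)$ is neutral and $(V, g)$ is over $K$, then the algebra $\mathrm{C}\ell(V \oplus K^2, g \oplus h)$ is isomorphic to the algebra $\mathrm{C}\ell(V, g) \otimes K(2)$. Specifically, there are isomorphisms
\[
\begin{aligned}
\mathrm{C}\ell_{k+1,l+1} &= \mathrm{C}\ell_{k,l} \otimes \mathbb{R}(2) \\
\mathrm{C}\ell_{m+2} &= \mathrm{C}\ell_m \otimes \mathbb{C}(2)
\end{aligned} \qquad [12]
\]

The Chevalley Theorem and the Brauer–Wall Group

If $(V, g)$ and $(W, h)$ are quadratic spaces over $K$, then their sum is the quadratic space $(V \oplus W, g \oplus h)$ characterized by $g \oplus h : V \oplus W \to V^* \oplus W^*$ so that $(g \oplus h)(v, w) = (g(v), h(w))$. By noting that the map
\[
V \oplus W \ni (v, w) \mapsto v \otimes 1 + 1 \otimes w \in \mathrm{C}\ell(V, g) \mathbin{\hat{\otimes}} \mathrm{C}\ell(W, h)
\]
has the Clifford property, Chevalley proved

Proposition (I) The algebra $\mathrm{C}\ell(V \oplus W, g \oplus h)$ is isomorphic to the algebra $\mathrm{C}\ell(V, g) \mathbin{\hat{\otimes}} \mathrm{C}\ell(W, h)$.

The type of the (graded) algebra $\mathrm{C}\ell(V \oplus W, g \oplus h)$ depends only on the types of $\mathrm{C}\ell(V, g)$ and $\mathrm{C}\ell(W, h)$. The Chevalley theorem (I) shows that the set of types of Clifford algebras over $K$ forms an abelian group for a multiplication induced by the graded tensor product. The unit of this Brauer–Wall group of $K$ is the type of the algebra $\mathrm{C}\ell(K^2, h)$ described in [11]; for a full account with proofs, see Wall (1963).

The Volume Element and the Centers

Let $e = (e_\mu)$ be an orthonormal frame in $(V, g)$. The volume element associated with $e$ is
\[
\eta = e_1 e_2 \cdots e_m
\]
If $\eta'$ is the volume element associated with another orthonormal frame $e'$ in the same orthogonal space, then either $\eta' = \eta$ ($e$ and $e'$ are of the same orientation) or $\eta' = -\eta$ ($e$ and $e'$ are of opposite orientation). For $K = \mathbb{C}$, one has $\eta^2 = 1$; for $K = \mathbb{R}$ and $g$ of signature $(k, l)$ one has
\[
\eta^2 = (-1)^{(1/2)(k-l)(k-l+1)} \qquad [13]
\]
It is convenient to define $\iota \in \{1, i\}$ so that $\eta^2 = \iota^2$. For every $v \in V$ one has $v\eta = (-1)^{m+1} \eta v$. The structure of the centers of Clifford algebras is as follows:

Proposition (J) If $m$ is even, then $Z(\mathrm{C}\ell(V, g)) = K$ and $Z(\mathrm{C}\ell^0(V, g)) = K \oplus K\eta$. If $m$ is odd, then $Z(\mathrm{C}\ell(V, g)) = K \oplus K\eta$ and $Z(\mathrm{C}\ell^0(V, g)) = K$.

The graded algebra $\mathrm{C}\ell(V, g)$ is supercentral for every $m$.

The Structure of Clifford Algebras

The complex case. Using [4] one obtains from [11] and [12] the isomorphisms of algebras
\[
\mathrm{C}\ell^0_{2n+1} = \mathrm{C}\ell_{2n} = \mathbb{C}(2^n) \qquad [14]
\]
\[
\mathrm{C}\ell_{2n+1} = \mathrm{C}\ell^0_{2n+2} = \mathbb{C}(2^n) \oplus \mathbb{C}(2^n) \qquad [15]
\]
for $n = 0, 1, 2, \ldots$. Therefore, there are only two types of complex Clifford algebras, represented by $\mathbb{C} \to \mathbb{C} \oplus \mathbb{C}$ and $\mathbb{C} \oplus \mathbb{C} \to \mathbb{C}(2)$: the Brauer–Wall group of $\mathbb{C}$ is $\mathbb{Z}_2$.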
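The classification in [14] and [15] and the sign rule [13] lend themselves to a quick consistency check. The sketch below is an illustration in plain Python rather than part of the article: the function names are hypothetical, the sign rule is taken to be $\eta^2 = (-1)^{(k-l)(k-l+1)/2}$ with $k$ counting frame vectors of square $-1$, and it is compared with the sign obtained by reordering the product $e_1 \cdots e_m$ directly.

```python
def complex_clifford(m):
    """Return (N, copies) such that the complex Clifford algebra in
    dimension m is isomorphic to `copies` copies of the matrix algebra C(N)."""
    n = m // 2
    return (2 ** n, 1) if m % 2 == 0 else (2 ** n, 2)


def eta_squared(k, l):
    """Sign of the square of the volume element in signature (k, l),
    following the rule assumed for [13]."""
    d = k - l
    # d * (d + 1) is a product of consecutive integers: even and non-negative.
    return (-1) ** (d * (d + 1) // 2)


def eta_squared_direct(k, l):
    """Same sign computed directly: reversing m anticommuting vectors costs
    m(m-1)/2 sign changes, and k of the vectors have square -1."""
    m = k + l
    return (-1) ** (m * (m - 1) // 2 + k)


# The matrix description must reproduce the total dimension 2^m.
for m in range(12):
    size, copies = complex_clifford(m)
    assert copies * size ** 2 == 2 ** m

# The closed formula agrees with the direct count on a range of signatures.
for k in range(9):
    for l in range(9):
        assert eta_squared(k, l) == eta_squared_direct(k, l)
```

For instance, `complex_clifford(4)` returns `(4, 1)` and `complex_clifford(5)` returns `(4, 2)`, matching a single matrix algebra in even dimensions and a sum of two copies in odd dimensions.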
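The real case. In view of proposition (I) and $\mathrm{C}\ell_{1,1} = \mathbb{R}(2)$, the algebra $\mathrm{C}\ell_{k,l}$ is of the same type as $\mathrm{C}\ell_{k-l,0}$ if $k > l$ and of the same type as $\mathrm{C}\ell_{0,l-k}$ if $k < l$. Since $\mathrm{C}\ell_{k,l} \mathbin{\hat{\otimes}} \mathrm{C}\ell_{l,k} = \mathrm{C}\ell_{k+l,k+l}$, the type of $\mathrm{C}\ell_{l,k}$ is the inverse of the type of $\mathrm{C}\ell_{k,l}$. The algebra $\mathrm{C}\ell^0_{4,0} \to \mathrm{C}\ell_{4,0}$ is isomorphic to $\mathbb{H} \oplus \mathbb{H} \to \mathbb{H}(2)$: if $x = (x_1, x_2, x_3, x_4) \in \mathbb{R}^4 \subset \mathrm{C}\ell_{4,0}$, and $q = i x_1 + j x_2 + k x_3 + x_4 \in \mathbb{H}$, then an isomorphism is obtained from the Clifford map $f$,
\[
f(x) = \begin{pmatrix} 0 & q \\ -\bar{q} & 0 \end{pmatrix} \qquad [16]
\]
In view of [13], the volume element $\eta$ satisfies $\eta^2 = 1$. By replacing $-\bar{q}$ with $\bar{q}$ in [16], one shows that $\mathrm{C}\ell_{0,4}$ is also isomorphic to $\mathbb{H}(2)$. The map $\mathbb{R}^4 \oplus \mathbb{R}^{k+l} \to \mathbb{H}(2) \otimes \mathrm{C}\ell_{k,l}$ given by $(x, y) \mapsto f(x) \otimes 1 + \eta \otimes y$ has the Clifford property and establishes the isomorphism of algebras $\mathrm{C}\ell_{k+4,l} = \mathbb{H}(2) \otimes \mathrm{C}\ell_{k,l}$. Since, similarly, $\mathrm{C}\ell_{k,l+4} = \mathbb{H}(2) \otimes \mathrm{C}\ell_{k,l}$, one obtains the isomorphism
\[
\mathrm{C}\ell_{k+4,l} = \mathrm{C}\ell_{k,l+4}
\]
Therefore,
\[
\mathrm{C}\ell_{k+8,l} = \mathrm{C}\ell_{k+4,l+4} = \mathrm{C}\ell_{k,l+8} = \mathrm{C}\ell_{k,l} \otimes \mathbb{R}(16)
\]
and the algebras $\mathrm{C}\ell_{k,l}$, $\mathrm{C}\ell_{k+8,l}$, and $\mathrm{C}\ell_{k,l+8}$ are all of the same type. This double periodicity of period 8 is subsumed by saying that real Clifford algebras can be arranged on a spinorial chessboard. The type of $\mathrm{C}\ell^0_{k,l} \to \mathrm{C}\ell_{k,l}$ depends only on $k - l \bmod 8$; the eight types have the following low-dimensional algebras as representatives: $\mathrm{C}\ell_{1,0}$, $\mathrm{C}\ell_{2,0}$, $\mathrm{C}\ell_{3,0}$, $\mathrm{C}\ell_{4,0} = \mathrm{C}\ell_{0,4}$, $\mathrm{C}\ell_{0,3}$, $\mathrm{C}\ell_{0,2}$, and $\mathrm{C}\ell_{0,1}$. The Brauer–Wall group of $\mathbb{R}$ is $\mathbb{Z}_8$, generated by the type of $\mathrm{C}\ell^0_{1,0} \to \mathrm{C}\ell_{1,0}$, that is, by $\mathbb{R} \to \mathbb{C}$. Bearing in mind the isomorphism $\mathrm{C}\ell_{k,l} = \mathrm{C}\ell^0_{k+1,l}$ and abbreviating $\mathbb{C} \to \mathbb{R}(2)$ to $\mathbb{C} \to \mathbb{R}$, etc., one can arrange the types of real Clifford algebras in the form of a spinorial clock:
\[
\begin{array}{ccccc}
\mathbb{R} & \xrightarrow{\ 7\ } & \mathbb{R} \oplus \mathbb{R} & \xrightarrow{\ 0\ } & \mathbb{R} \\
{\scriptstyle 6}\uparrow & & & & \downarrow{\scriptstyle 1} \\
\mathbb{C} & & & & \mathbb{C} \\
{\scriptstyle 5}\uparrow & & & & \downarrow{\scriptstyle 2} \\
\mathbb{H} & \xleftarrow{\ 4\ } & \mathbb{H} \oplus \mathbb{H} & \xleftarrow{\ 3\ } & \mathbb{H}
\end{array} \qquad [17]
\]

Proposition (K) Recipe for determining $\mathrm{C}\ell^0_{k,l} \to \mathrm{C}\ell_{k,l}$:
(i) find the integers $\mu$ and $\nu$ such that $k - l = 8\mu + \nu$ and $0 \leq \nu \leq 7$;
(ii) from the spinorial clock, read off $A^0_\nu \xrightarrow{\ \nu\ } A_\nu$ and compute the real dimensions, $\dim A^0_\nu = 2^{\nu_0}$ and $\dim A_\nu = 2^{\nu_1}$; and
(iii) form $\mathrm{C}\ell^0_{k,l} = A^0_\nu(2^{(1/2)(k+l-1-\nu_0)})$ and $\mathrm{C}\ell_{k,l} = A_\nu(2^{(1/2)(k+l-\nu_1)})$.

The spinorial clock is symmetric with respect to the reflection in the vertical line through its center; this is a consequence of the isomorphism of algebras $\mathrm{C}\ell_{k,l+2} = \mathrm{C}\ell_{l,k} \otimes \mathbb{R}(2)$.

Note that the abstract algebra $\mathrm{C}\ell_{k,l}$ carries, in general, less information than the Clifford algebra defined in [8], which contains $V$ as a distinguished vector subspace with the quadratic form $v \mapsto v^2 = g(v, v)$. For example, the algebras $\mathrm{C}\ell_{8,0}$, $\mathrm{C}\ell_{4,4}$, and $\mathrm{C}\ell_{0,8}$ are all graded isomorphic.

Theorem on Simplicity

From general theory (Chevalley 1954) or by inspection of [14], [15], and [17], one has

Proposition (L) Let $m$ be the dimension of the orthogonal space $(V, g)$ over $K$.
(i) If $m$ is even (resp., odd), then the algebra $\mathrm{C}\ell(V, g)$ (resp., $\mathrm{C}\ell^0(V, g)$) over $K$ is central simple.
(ii) If $K = \mathbb{C}$ and $m$ is odd (resp., even), then the algebra $\mathrm{C}\ell(V, g)$ (resp., $\mathrm{C}\ell^0(V, g)$) is the direct sum of two isomorphic complex central simple algebras.
(iii) If $K = \mathbb{R}$ and $m$ is odd (resp., even), then the algebra $\mathrm{C}\ell(V, g)$ (resp., $\mathrm{C}\ell^0(V, g)$) when $\eta^2 = 1$ is the direct sum of two isomorphic central simple algebras and when $\eta^2 = -1$ is simple with a center isomorphic to $\mathbb{C}$.

Representations

The Pauli, Cartan, Dirac, and Weyl Representations

Odd dimensions. Let $(V, g)$ be of dimension $m = 2n + 1$ over $K$. From propositions (A) and (L) it follows that the central simple algebra $\mathrm{C}\ell^0(V, g)$ has a unique, up to equivalence, faithful, and irreducible representation in the complex $2^n$-dimensional vector space $S$ of Pauli spinors. By putting $\sigma(\eta) = \iota I$ it is extended to a Pauli representation $\sigma : \mathrm{C}\ell(V, g) \to \mathrm{End}\, S$. Given an orthonormal frame $(e_\mu)$ in $V$, Pauli endomorphisms (matrices if $S$ is identified with $\mathbb{C}^{2^n}$) are defined as $\sigma_\mu = \sigma(e_\mu) \in \mathrm{End}\, S$. The representations $\sigma$ and $\sigma \circ \alpha$ are complex inequivalent. For $K = \mathbb{C}$ none of them is faithful; their direct sum is the faithful Cartan representation of $\mathrm{C}\ell(V, g)$ in $S \oplus S$. For $K = \mathbb{R}$ and $(1/2)(k - l - 1)$ even, the representations $\sigma$ and $\sigma \circ \alpha$ are real equivalent and faithful. On computing $\check{\sigma}(\eta)$ one finds that the contragredient representation $\check{\sigma}$ is equivalent to $\sigma$ for $n$ even and to $\sigma \circ \alpha$ for $n$ odd.

Even dimensions. Similarly, for $(V, g)$ of dimension $m = 2n$ over $K$, the central simple algebra $\mathrm{C}\ell(V, g)$ has a unique, up to equivalence, faithful, and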
irreducible representation $\gamma : \mathrm{C}\ell(V, g) \to \mathrm{End}\, S$ in the $2^n$-dimensional complex vector space $S$ of Dirac spinors. The Dirac endomorphisms (matrices) are $\gamma_\mu = \gamma(e_\mu)$. Put $\Gamma = \iota \gamma(\eta)$ so that $\Gamma^2 = I$: the matrix $\Gamma$ generalizes the familiar $\gamma_5$. The Dirac representation $\gamma$ restricted to $\mathrm{C}\ell^0(V, g)$ decomposes into the sum $\gamma_+ \oplus \gamma_-$ of two irreducible representations in the vector spaces
\[
S_\pm = \{ s \in S \mid \Gamma s = \pm s \}
\]
of Weyl (chiral) spinors. The elements of $S_+$ are said to be of opposite chirality with respect to those of $S_-$. The transpose $\Gamma^*$ defines a similar split of $S^*$. The representations $\gamma_+$ and $\gamma_-$ are never complex-equivalent, but they are real equivalent and faithful for $K = \mathbb{R}$ and $(1/2)(k - l)$ odd.

The representations $\check{\gamma}$ and $\gamma \circ \alpha$ are both equivalent to $\gamma$. It is convenient to describe simultaneously the properties of the transpositions of the Pauli and Dirac matrices; let $\gamma_\mu$ be either the Pauli matrices for $V$ of dimension $2n + 1$ or the Dirac matrices for $V$ of dimension $2n$. There is a complex isomorphism $B : S \to S^*$ such that
\[
\gamma_\mu^* = (-1)^n B \gamma_\mu B^{-1} \qquad [18]
\]
In the case of the Dirac matrices, the factor $(-1)^n$ in [18] implies that this equation also holds for $\Gamma$ in place of $\gamma_\mu$. The isomorphism $B$ preserves (resp., changes) the chirality of Weyl spinors for $n$ even (resp., odd). Every matrix of the form $B \gamma_{\mu_1} \cdots \gamma_{\mu_p}$, where
\[
1 \leq \mu_1 < \cdots < \mu_p \leq 2n \qquad [19]
\]
is either symmetric or antisymmetric, depending on $p$ and the symmetry of $B$. A simple argument, based on counting the number of such products of one symmetry, leads to the equation
\[
B^* = (-1)^{(1/2)n(n+1)} B
\]
valid in dimensions $2n$ and $2n + 1$.

Inner products on spinor spaces. Let $S$ be the complex vector space of Dirac or Pauli spinors associated with $(V, g)$ over $K$. The isomorphism $B : S \to S^*$ defines on $S$ an inner product $B(s, t) = \langle s, B(t) \rangle$, $s, t \in S$, which is orthogonal for $m \equiv 0, 1, 6$, or $7 \bmod 8$ and symplectic for $m \equiv 2, 3, 4$, or $5 \bmod 8$. For $m \equiv 0 \bmod 4$, this product restricts to an inner product on the space of Weyl spinors that is orthogonal for $m \equiv 0 \bmod 8$ and symplectic for $m \equiv 4 \bmod 8$. For $m \equiv 2 \bmod 4$, the map $B$ defines the isomorphisms $B : S_\pm \to S_\mp^*$.

Example. One of the most used representations $\gamma : \mathrm{C}\ell_{3,1} \to \mathbb{C}(4)$ is given by the Dirac matrices
\[
\gamma_1 = \begin{pmatrix} 0 & \sigma_x \\ -\sigma_x & 0 \end{pmatrix}, \quad
\gamma_2 = \begin{pmatrix} 0 & \sigma_y \\ -\sigma_y & 0 \end{pmatrix}, \quad
\gamma_3 = \begin{pmatrix} 0 & \sigma_z \\ -\sigma_z & 0 \end{pmatrix}, \quad
\gamma_4 = \begin{pmatrix} 0 & I \\ I & 0 \end{pmatrix} \qquad [20]
\]

Charge Conjugation and Majorana Spinors

Throughout this section and the next, one assumes $K = \mathbb{R}$ so that, given a representation $\rho : \mathrm{C}\ell(V, g) \to \mathrm{End}\, S$, one can form the complex- (charge) conjugate representation $\bar{\rho} : \mathrm{C}\ell(V, g) \to \mathrm{End}\, \bar{S}$ defined by $\bar{\rho}(a) = \overline{\rho(a)}$ and the Hermitian conjugate representation $\rho^\dagger : \mathrm{C}\ell(V, g) \to \mathrm{End}\, \bar{S}^*$, where $\rho^\dagger(a) = \rho(\beta(a))^\dagger$.

Even dimensions. The representations $\bar{\gamma}$ and $\gamma$ are equivalent: there is an isomorphism $C : S \to \bar{S}$ such that
\[
\bar{\gamma}_\mu = C \gamma_\mu C^{-1} \qquad [21]
\]
The automorphism $\bar{C}C$ is in the commutant of $\gamma$; it is, therefore, proportional to $I$ and, by a change of scale, one can achieve $\bar{C}C = I$ for $k - l \equiv 0$ or $6 \bmod 8$ and $\bar{C}C = -I$ for $k - l \equiv 2$ or $4 \bmod 8$. The spinor $s_c = C^{-1}\bar{s} \in S$ is the charge conjugate of $s \in S$. If $\psi : V \to S$ is a solution of the Dirac equation
\[
(\gamma^\mu(\partial_\mu - i q A_\mu) - \kappa)\psi = 0
\]
for a particle of electric charge $q$, then $\psi_c$ is a solution of the same equation with the opposite charge. Since
\[
\bar{\Gamma} = \iota^2 C \Gamma C^{-1}
\]
charge conjugation preserves (resp., changes) the chirality of Weyl spinors for $(1/2)(k - l)$ even (resp., odd).

If $\bar{C}C = I$, then
\[
\mathrm{Re}\, S = \{ s \in S \mid s_c = s \}
\]
is a real vector space of dimension $2^n$, the space of Dirac–Majorana spinors. The representation $\gamma$ is real: restricted to $\mathrm{Re}\, S$ and expressed with respect to a frame in this space, it is given by real $2^n \times 2^n$ matrices. For $k - l \equiv 0 \bmod 8$ the representations $\gamma_+$ and $\gamma_-$ are both real: in this case there are Weyl–Majorana spinors.

Odd dimensions. On computing $\bar{\sigma}(\eta)$ one finds that the conjugate representation $\bar{\sigma}$ is equivalent to $\sigma$
(resp., $\sigma \circ \alpha$) if $\eta^2 = 1$ (resp., $\eta^2 = -1$). There is an isomorphism $C : S \to \bar{S}$ such that
\[
\bar{\sigma}_\mu = (-1)^{(1/2)(k-l+1)} C \sigma_\mu C^{-1} \qquad [22]
\]
and $\bar{C}C = I$ (resp., $\bar{C}C = -I$) for $k - l \equiv 1$ or $7 \bmod 8$ (resp., $k - l \equiv 3$ or $5 \bmod 8$). For $k - l \equiv 1 \bmod 8$, the restriction of the Pauli representation to $\mathrm{C}\ell^0_{k,l}$ is real and the Pauli matrices are pure imaginary; for $k - l \equiv 7 \bmod 8$, the Pauli representations of $\mathrm{C}\ell_{k,l}$ are both real and so are the Pauli matrices. In both these cases there are Pauli–Majorana spinors.

Hermitian Scalar Products and Multivectors

For $m = k + l$ odd and $C$ as in [22], the map $A = \bar{B}C : S \to \bar{S}^*$ intertwines the representations $\sigma^\dagger$ and $\sigma$ (resp., $\sigma \circ \alpha$) for $k$ even (resp., odd),
\[
\sigma_\mu^\dagger = (-1)^k A \sigma_\mu A^{-1}
\]
By rescaling of $B$, the map $A$ can be made Hermitian. The corresponding Hermitian form $s \mapsto A(s, s)$ is definite if and only if $k$ or $l = 0$; otherwise, it is neutral.

For $m = k + l$ even, the representations $\gamma^\dagger$ and $\gamma$ are equivalent and one can define a Hermitian isomorphism $A : S \to \bar{S}^*$ so that
\[
\gamma_\mu^\dagger = A \gamma_\mu A^{-1} \qquad [23]
\]
The isomorphism $A' = A\Gamma$ intertwines the representations $\gamma^\dagger$ and $\gamma \circ \alpha$; it can also be made Hermitian by rescaling. The Hermitian form $A(s, s)$ is definite for $k = 0$ and $A'(s, s)$ is definite for $l = 0$; otherwise, these forms are neutral. For example, in the familiar representation [20], one has $A = \gamma_4$, a neutral form.

For $p = 0, 1, \ldots, m = 2n$, two spinors $s$ and $t \in S$ define the $p$-vector with components
\[
A_{\mu_1 \ldots \mu_p}(s, t) = \langle \bar{s}, A \gamma_{\mu_1} \cdots \gamma_{\mu_p} t \rangle \qquad [24]
\]
where the indices are as in [19]. The Hermiticity of $A$ and [23] imply
\[
\overline{A_{\mu_1 \ldots \mu_p}(s, t)} = (-1)^{(1/2)p(p-1)} A_{\mu_1 \ldots \mu_p}(t, s)
\]
In view of $\Gamma^\dagger = (-1)^k A \Gamma A^{-1}$, the map $A$ defines, for $k$ even, a nondegenerate Hermitian scalar product on the spaces $S_\pm$ whereas $A(s, t) = 0$ if $s$ and $t$ are Weyl spinors of opposite chiralities. For $k$ odd, $A$ changes the chirality.

The Radon–Hurwitz Numbers

Proposition (M) For every integer $m > 0$, the algebra $\mathrm{C}\ell_{m,0}$ has an irreducible real representation $\rho$ of dimension $2^{\chi(m)}$, where $\chi(m)$ is the $m$th Radon–Hurwitz number given by
\[
\begin{array}{c|cccccccc}
m & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \hline
\chi(m) & 1 & 2 & 2 & 3 & 3 & 3 & 3 & 4
\end{array}
\]
and $\chi(m + 8) = \chi(m) + 4$. The matrices $\rho_\mu \in \mathbb{R}(2^{\chi(m)})$, $\mu = 1, \ldots, m$, defining these representations satisfy
\[
\rho_\mu \rho_\nu + \rho_\nu \rho_\mu = -2 \delta_{\mu\nu} I
\]
and can be chosen so as to be antisymmetric. In all dimensions other than $m \equiv 3 \bmod 4$ the representations are faithful.

For $m \equiv 2$ and $4 \bmod 8$ (resp., $m \equiv 1$, $3$, and $5 \bmod 8$) the representations $\rho$ are the realifications of the corresponding Dirac (resp., Pauli) representations. In dimensions $m \equiv 0$ and $6 \bmod 8$ (resp., $m \equiv 7 \bmod 8$) the Dirac (resp., Pauli) representations themselves are real.

Inductive Construction of Representations

An inductive construction of the Pauli representations
\[
\sigma : \mathrm{C}\ell_{n-1,n} \to \mathbb{R}(2^{n-1}), \quad n = 1, 2, \ldots
\]
and of the Dirac representations
\[
\gamma : \mathrm{C}\ell_{n,n} \to \mathbb{R}(2^n), \quad n = 1, 2, \ldots
\]
is as follows.

1. In dimension 1, put $\sigma_1 = 1$.
2. Given $\sigma_\mu \in \mathbb{R}(2^{n-1})$, $\mu = 1, \ldots, 2n - 1$, define
\[
\gamma_\mu = \begin{pmatrix} 0 & \sigma_\mu \\ \sigma_\mu & 0 \end{pmatrix} \quad \text{for } \mu = 1, \ldots, 2n - 1
\]
and
\[
\gamma_{2n} = \begin{pmatrix} 0 & -I \\ I & 0 \end{pmatrix}
\]
3. Given $\gamma_\mu \in \mathbb{R}(2^n)$, $\mu = 1, \ldots, 2n$, define $\sigma_\mu = \gamma_\mu$ for $\mu = 1, \ldots, 2n$, and $\sigma_{2n+1} = \gamma_1 \cdots \gamma_{2n}$.

All entries of these matrices are either $0$, $1$, or $-1$; therefore, they can be used to construct representations of Clifford algebras of orthogonal spaces over any commutative field of characteristic $\neq 2$.

By induction, one has $\gamma_\mu^* = (-1)^{\mu+1} \gamma_\mu$. Therefore, the isomorphisms appearing in [18] are
\[
B = \gamma_2 \gamma_4 \cdots \gamma_{2n}
\]
for both $m = 2n$ and $2n + 1$.

By multiplying some of the matrices $\sigma_\mu$ or $\gamma_\mu$ by the imaginary unit, one obtains complex representations of the Clifford algebras associated with the quadratic

forms of other signatures. For example, in dimension fields on odd-dimensional spheres can be constructed
3, (1 , i2 , 3 ) are the Pauli matrices. In dimension 4, with the help of the representation  described in
multiplying 2 by i one obtains the Dirac matrices for g proposition (M). Given a positive even integer N, let
of signature (1, 3), in the chiral representation: m be the largest integer such that N = 2(m) p, where
    p is an odd integer. Consider the unit sphere
0 x 0 y
1 ; 2 SN1 = fx 2 RN j jjxjj = 1g of dimension N  1. For
x 0 y 0 v 2 Rm , put 0 (v) = (v)  I, where I 2 R(p) is the
    25
0 z 0 I unit matrix. Since (v) is antisymmetric, so is the
3 ; 4
z 0 I 0 matrix 0 (v) 2 R(N). Therefore, for every x 2 SN1 ,
the vector 0 (v)x is orthogonal to x. The map
To obtain the real Majorana representation one uses x 7! 0 (v)x defines a vector field on SN1 that
the following fact: vanishes nowhere unless v = 0 : the (N1)-sphere
Proposition (N) If the matrix C 2 R(2n ) is such admits a set of m tangent vector fields which are
that C2 = I and [21] holds, then the matrices linearly independent at every point. Using methods of
(I iC)  (I iC)1 ,  = 1, . . . , 2n, {\it are real}. algebraic topology, it has been shown that this
method gives the maximum number of linearly
For the matrices [25], one can take C = 1 3 4 to independent tangent vector fields on spheres.
obtain If m = 1, 3, or 7, then m 1 = 2(m) and, for these
! ! values of m, the sphere Sm is parallelizable. More-
0 x I 0
0
1 = ; 0
2 = over, one can then introduce in Rm1 the structure
x 0 0 I of an algebra Am as follows. Put 0 = I. If e0 2 Rm1
! ! is a unit vector and e =  (e0 ), then (e0 , e1 , . . . , em )
0 z 0 I is anPorthonormal framePin Rm1 . The product of
30 = ; 40 = x= m m
 = 0 x e and y =  = 0 y e is defined to be
z 0 I 0
X
m

The real representations described in proposition x


y= x yv  ev
;v = 0
(M) can be obtained by the following direct inductive
construction. Consider the following seven real anti- so that e0 is the unit element for this product.
symmetric and anticommuting 8  8 matrices: Defining Re x= x0 e0 , Im x = x  Re x, x  = Rex  Im x,
one has x
x = e0 jjxjj2 and x

(x
y)= (
x
x)
y, so that
1 z  I  ";  2 z  "  x
x
y= 0 implies x = 0 or y = 0: Am is a normed
 3 z  "  z ;  4 x  "  I algebra without zero divisors. The algebras A1 and
26 A3 are isomorphic to C and H, respectively, and A7
5 x  x  ";  6 x  z  "
is, by definition, the algebra O of octonions
7 "  I  I discovered by Graves and Cayley. The algebra O is
nonassociative; its multiplication table is obtained
For m = 4, 5, 6, and 7 the matrices 1 , . . . , m gener-
with the help of [26].
ate the representations of Cm, 0 in R8 . The eight
matrices  = x   ,  = 1, . . . , 7, and 8 = "  I 
I  I give the required representation of C8, 0 in Spinor Groups
R16 . By dropping the first factor in 1 , 2 , 3 , one
obtains the matrices generating a representation of Let (V, g) be a quadratic space over K. If u 2 V is
C3, 0 in R 4 , etc. The symmetric matrix not null, then it is invertible as an element of
 = 1


8 = z  I  I  I anticommutes with all C(V, g) and the map v 7! uvu1 is a reflection in
the s and 2 = I. If the matrices  2 R(2(m) ) the hyperplane orthogonal to u. The orthogonal
correspond to a representation of Cm, 0 , then the group O(V, g) = O(V, g) = fR 2 GL(V) j R  g 
m 8 matrices   1 , . . . ,   m , 1  I, . . . , 8  I R = gg is generated by the set of all such reflections.
generate the required representation of Cm8, 0 . A spinor group G is a subset of C(V, g) that is a
group with respect to multiplication induced by the
product in the algebra, with a homomorphism
 : G ! GL(V) whose image contains the connected
Vector Fields on Spheres
component SO0 (V, g) of the group of rotations of
and Division Algebras
(V, g). In the case of real quadratic spaces, one
It is known that even-dimensional spheres have no considers also spinor groups that are subsets of C 
nowhere-vanishing tangent vector fields. All such C(V, g) with similar properties. By restriction, every

representation of C(V, g) or C  C(V, g) gives u1 . . . u2p v1 . . . v2q such that u2i = 1 and v2j = 1.
spinor representations of the spinor groups it The connected groups Spinm:0 and Spin0, m are
contains. isomorphic and denoted by Spinm . Since Spin0k, l
G1 (), the Hermitian form A and the bilinear form
Pin Groups
B are invariant with respect to the action of this
group. Moreover, for k l even, from [24] and
It is convenient to define a unit vector v 2 V [28] there follows the transformation law of
C(V, g) to be such that v2 = 1 for V complex and multivectors formed from pairs of spinors,
v2 = 1 or 1 for V real. The group Pin(V, g) is
A1

p as; at
defined as the subgroup of Cpin(V, g) consisting of
products of all finite sequences of unit vectors. = Av1 ...vp s; tRv11 a1 . . . Rvpp a1
f
Defining now the twisted adjoint representation Ad
f 1
Consider Spin0 (V, g) and assume that either V is
by Ad(a)v = (a)va , one ontains the exact sequence complex of dimension 52 or real with k or l 5 2.
e
Ad Then there are two unit orthogonal vectors
1 ! Z2 ! PinV; g!OV; g ! 1 27
e1 , e2 2 V such that (e1 , e2 )2 = 1. The vector
If dimV is even, then the adjoint representation u(t) = e1 cos t e2 sin t is obtained from e1 by rotation
Ad(a)v = ava1 also yields an exact sequence like in the plane span fe1 , e2 g by the angle t 2 R. The
[27]; if it is odd, then the image of Ad is SO(V, g) and curve t 7! e1 u(t), 0  t  , connects the elements
the kernel is the four-element group f1, 1,
, 
g. 1 and 1 of Spin0 (V, g). Its image in SO0 (V, g), that
Given an orthonormal frame (e ) in (V, g) and is, the curve t 7! Ad(e1 u(t)), 0  t  , is closed:
a 2 Pin(V, g), one defines the orthogonal matrix Ad(1) = Ad(1). This fact is often expressed by
R(a) = (Rv (a)) by saying that a spinor undergoing a rotation by 2
f v
changes sign. There is no homomorphism not
Adae  = ev R a 28 even a continuous map f : SO0 (V, g) ! Spin0 (V, g)
If (V, g) is complex, then the algebras C(V, g) and such that Ad  f = id.
C(V, g) are isomorphic; this induces an iso-
morphism of the groups Pin(V, g) and Pin(V, g). Spinc Groups
If V = Cm , then this group is denoted by Pinm (C). If
V = Rkl and g of signature (k, l), then one writes For the purposes of physics, to describe charged
Pin(V, g) = Pink, l . A similar notation is used for the fermions, and in the theory of the SeibergWitten
groups spin, see below. invariants, one needs the Spinc groups that are spinorial
extensions of the real orthogonal groups by the group U1
of phase factors. Assume V to be real and g of
Spin Groups
signature (k, l) so that the sequence [29] can be
The spin group Spin(V, g) = Pin(V, g) \ C0 (V, g) is written as
generated by products of all sequences of an even
1 ! Z2 ! Spink;l ! SOk;l ! 1
number of unit vectors. Since the algebras C0 (V, g)
and C0 (V, g) are isomorphic, so are the groups Define the action of Z2 = f1, 1g in Spink, l  U1 so
Spin(V, g) and Spin(V, g). Since (a) = a for a 2 that (1)(a, z) = ( a,  z). The quotient (Spink, l 
Spin(V, g), the twisted adjoint representation U1 )=Z2 = Spinck, l yields the extensions
reduces to the adjoint representation and yields the
exact sequence 1 ! U1 ! Spinck;l ! SOk;l ! 1
Ad
1 ! Z2 ! SpinV; g ! SOV; g ! 1 29 and
1 ! Spink;l ! Spinck;l ! U1 ! 1
For V = Cm , the spin group is denoted by Spinm (C).
Since Spinm (C) G1 (), the bilinear form B is For example, Spin3 = SU2 and Spinc3 = U2 .
invariant with respect to the action of this group.

Spin0 Groups Spin Groups in Dimensions <6


The connected component Spin0 (V, g) of the group The connected components of spin groups asso-
Spin(V, g) coincides with Spin(V, g) if either the ciated with orthogonal spaces of dimension 46 are
quadratic space (V, g) is complex or real and kl = 0. isomorphic to classical groups. They can be expli-
In signature (k, l), the connect group Spin0k, l is citly described starting from the following
generated in C0k, l by all products of the form observations.

Consider the four-dimensional vector space (of twistors) $T$ over $K$, with a volume element $\mathrm{vol} \in \wedge^4 T$. The six-dimensional vector space $V = \wedge^2 T$ has a scalar product $g$ defined by $g(u, v)\,\mathrm{vol} = 2\,u \wedge v$ for $u, v \in V$. The quadratic form $g(u, u)$ is the Pfaffian, $\mathrm{Pf}(u)$. If $u \in V$ is represented by the corresponding isomorphism $T^* \to T$ and $a \in \mathrm{End}\, T$, then $\mathrm{Pf}(a u a^*) = \det a\, \mathrm{Pf}(u)$. The last formula shows $\mathrm{Spin}^0(V, g) = \mathrm{SL}(T)$, so that $\mathrm{Spin}_6(\mathbb{C}) = \mathrm{SL}_4(\mathbb{C})$. For $K = \mathbb{R}$, the Pfaffian is of signature (3, 3), so that $\mathrm{Spin}^0_{3,3} = \mathrm{SL}_4(\mathbb{R})$. A non-null vector $v \in V$ defines a symplectic form on $T^*$. The five-dimensional vector space $v^\perp \subset V$ is invariant with respect to the symplectic group $\mathrm{Sp}(T^*, u) = \mathrm{Spin}^0(v^\perp, \mathrm{Pf}|_{v^\perp})$. This shows that $\mathrm{Spin}_5(\mathbb{C}) = \mathrm{Sp}_4(\mathbb{C})$ and $\mathrm{Spin}^0_{2,3} = \mathrm{Sp}_4(\mathbb{R})$. Spin groups for other signatures in real dimensions 6 and 5 are obtained by considering appropriate real subspaces of $\mathbb{C}^6$ and $\mathbb{C}^5$, respectively. For example, [6] is used to show that $\mathrm{Spin}^0_{1,5} = \mathrm{SL}_2(\mathbb{H})$.

Spin groups in dimensions 4 and lower are similarly obtained from the observation that $\det$ is a quadratic form on the four-dimensional space $K(2)$ and $\mathrm{C}\ell^0(K(2), \det) = K(2) \oplus K(2)$.

Several spin groups are listed below.

The complex spin groups

$\mathrm{Spin}_2(\mathbb{C}) = \mathbb{C}^\times$; $\mathrm{Spin}_3(\mathbb{C}) = \mathrm{SL}_2(\mathbb{C})$
$\mathrm{Spin}_4(\mathbb{C}) = \mathrm{SL}_2(\mathbb{C}) \times \mathrm{SL}_2(\mathbb{C})$
$\mathrm{Spin}_5(\mathbb{C}) = \mathrm{Sp}_4(\mathbb{C})$
$\mathrm{Spin}_6(\mathbb{C}) = \mathrm{SL}_4(\mathbb{C})$

The real, compact spin groups

$\mathrm{Spin}_2 = \mathrm{U}_1$; $\mathrm{Spin}_3 = \mathrm{SU}_2$
$\mathrm{Spin}_4 = \mathrm{SU}_2 \times \mathrm{SU}_2$; $\mathrm{Spin}_5 = \mathrm{Sp}_2(\mathbb{H})$
$\mathrm{Spin}_6 = \mathrm{SU}_4$

The groups $\mathrm{Spin}^0_{k,l}$ for $1 \le k \le l$ and $k + l \le 6$

$\mathrm{Spin}^0_{1,1} = \mathbb{R}^{+}$; $\mathrm{Spin}^0_{1,2} = \mathrm{SL}_2(\mathbb{R})$
$\mathrm{Spin}^0_{1,3} = \mathrm{SL}_2(\mathbb{C})$
$\mathrm{Spin}^0_{2,2} = \mathrm{SL}_2(\mathbb{R}) \times \mathrm{SL}_2(\mathbb{R})$
$\mathrm{Spin}^0_{1,4} = \mathrm{Sp}_{1,1}(\mathbb{H})$
$\mathrm{Spin}^0_{2,3} = \mathrm{Sp}_4(\mathbb{R})$; $\mathrm{Spin}^0_{1,5} = \mathrm{SL}_2(\mathbb{H})$
$\mathrm{Spin}^0_{2,4} = \mathrm{SU}_{2,2}$
$\mathrm{Spin}^0_{3,3} = \mathrm{SL}_4(\mathbb{R})$

See also: Dirac Operator and Dirac Field; Index Theorems; Relativistic Wave Equations Including Higher Spin Fields; Spinors and Spin Coefficients; Twistors.

Further Reading

Adams JF (1981) Spin(8), triality, F4 and all that. In: Hawking SW and Roček M (eds.) Superspace and Supergravity. Cambridge: Cambridge University Press.
Atiyah MF, Bott R, and Shapiro A (1964) Clifford modules. Topology 3(suppl. 1): 3–38.
Baez JC (2002) The octonions. Bulletin of the American Mathematical Society 39: 145–205.
Brauer R and Weyl H (1935) Spinors in n dimensions. American Journal of Mathematics 57: 425–449.
Budinich P and Trautman A (1988) The Spinorial Chessboard, Trieste Notes in Physics. Berlin: Springer.
Cartan E (1938) Théorie des spineurs. Actualités Scientifiques et Industrielles, No. 643 et 701. Paris: Hermann (English transl.: The Theory of Spinors. Paris: Hermann, 1966).
Chevalley C (1954) The Algebraic Theory of Spinors. New York: Columbia University Press.
Clifford WK (1878) Applications of Grassmann's extensive algebra. American Journal of Mathematics 1: 350–358.
Clifford WK (1882) On the classification of geometric algebras. In: Tucker R (ed.) Mathematical Papers by William Kingdon Clifford, pp. 397–401. London: Macmillan.
Dirac PAM (1928) The quantum theory of the electron. Proceedings of the Royal Society of London A 117: 610–624.
Eckmann B (1942) Gruppentheoretischer Beweis des Satzes von Hurwitz–Radon über die Komposition quadratischer Formen. Commentarii Mathematici Helvetici 15: 358–366.
Karoubi M (1968) Algèbres de Clifford et K-théorie. Annales Scientifiques de l'École Normale Supérieure, 4ème sér. 1: 161–270.
Lipschitz RO (1886) Untersuchungen über die Summen von Quadraten. Berlin: Max Cohen und Sohn.
Lounesto P (2001) Clifford Algebras and Spinors, 2nd edn. London Math. Soc. Lecture Note Series, vol. 286. Cambridge: Cambridge University Press.
Pauli W (1927) Zur Quantenmechanik des magnetischen Elektrons. Z. Physik 43: 601–623.
Penrose R and MacCallum MAH (1973) Twistor theory: an approach to the quantisation of fields and space-time. Physics Report 6C(4): 241–316.
Porteous IR (1995) Clifford Algebras and the Classical Groups, Cambridge Studies in Advanced Mathematics, vol. 50. Cambridge: Cambridge University Press.
Postnikov MM (1986) Lie Groups and Lie Algebras. Moscow: Mir.
Sudbery A (1987) Division algebras, (pseudo)orthogonal groups and spinors. Journal of Physics A17: 939–955.
Trautman A (1997) Clifford and the square root ideas. Contemporary Mathematics 203: 3–24.
Trautman A and Trautman K (1994) Generalized pure spinors. Journal of Geometry and Physics 15: 1–22.
Wall CTC (1963) Graded Brauer groups. Journal für die Reine und Angewandte Mathematik 213: 187–199.

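Cluster Expansion
R Kotecký, Charles University, Prague, Czech Republic, and the University of Warwick, UK
© 2006 Elsevier Ltd. All rights reserved.

Introduction

The method of cluster expansions in statistical physics provides a systematic way of computing power series for thermodynamic potentials (logarithms of partition functions) as well as correlations. It originated from the works of Mayer and others devoted to expansions for dilute gas.

Mayer Expansion

Consider a system of interacting particles with Hamiltonian
\[ H_N(\mathbf{p}_1, \ldots, \mathbf{p}_N; \mathbf{r}_1, \ldots, \mathbf{r}_N) = \sum_{i=1}^{N} \frac{\mathbf{p}_i^2}{2m} + \sum_{i,j=1}^{N} \Phi(\mathbf{r}_i - \mathbf{r}_j) \tag{1} \]
where $\Phi$ is a stable and regular pair potential. Namely, we assume that there exists $B \ge 0$ such that
\[ \sum_{i,j=1}^{N} \Phi(\mathbf{r}_i - \mathbf{r}_j) \ge -BN \tag{2} \]
for all $N = 2, 3, \ldots$ and all $(\mathbf{r}_1, \ldots, \mathbf{r}_N) \in \mathbb{R}^{3N}$, and that
\[ C(\beta) = \int \left| e^{-\beta\Phi(\mathbf{r})} - 1 \right| d^3 r < \infty \tag{3} \]
for some $\beta > 0$ (and hence all $\beta > 0$). Basic thermodynamic quantities are given in terms of the grand-canonical partition function
\[ Z(\beta, \lambda, V) = \sum_{N=0}^{\infty} \frac{z^N}{N!} \int_{\mathbb{R}^{3N} \times V^N} e^{-\beta H_N}\, \frac{\prod d^3 p_i\, d^3 r_i}{h^{3N}} = \sum_{N=0}^{\infty} \frac{\lambda^N}{N!} \int_{V^N} e^{-\beta \sum_{i,j} \Phi(\mathbf{r}_i - \mathbf{r}_j)} \prod d^3 r_i \tag{4} \]
In the second expression we absorbed the factor resulting from the integration over momenta into the (configurational) activity $\lambda = (2\pi m/\beta h^2)^{3/2} z$. In particular, the pressure $p$ and the density $\rho$ are defined by the thermodynamic limits (with $V \to \infty$ in the sense of Van Hove)
\[ p(\beta, \lambda) = \frac{1}{\beta} \lim_{V \to \infty} \frac{1}{|V|} \log Z(\beta, \lambda, V) \tag{5} \]
\[ \rho(\beta, \lambda) = \lim_{V \to \infty} \frac{1}{|V|}\, \lambda \frac{\partial}{\partial \lambda} \log Z(\beta, \lambda, V) \tag{6} \]
Mayer series are the expansions of $p$ and $\rho$ in powers of $\lambda$:
\[ \beta p(\beta, \lambda) = \sum_{n=1}^{\infty} b_n \lambda^n \tag{7} \]
and
\[ \rho(\beta, \lambda) = \sum_{n=1}^{\infty} n\, b_n \lambda^n \tag{8} \]
Mayer's idea for a systematic computation of the coefficients $b_n$ was based on a reformulation of the partition function $Z(\beta, \lambda, V)$ in terms of cluster integrals. Introducing the function
\[ f(\mathbf{r}) = e^{-\beta\Phi(\mathbf{r})} - 1 \tag{9} \]
and using $G[N]$ to denote the set of all graphs on $N$ vertices $\{1, \ldots, N\}$, we get
\[ Z(\beta, \lambda, V) = \sum_{N=0}^{\infty} \frac{\lambda^N}{N!} \int_{V^N} \prod_{i,j=1}^{N} \bigl(1 + f(\mathbf{r}_i - \mathbf{r}_j)\bigr) \prod d^3 r_i = \sum_{N=0}^{\infty} \frac{\lambda^N}{N!} \sum_{g \in G[N]} w(g) \tag{10} \]
where
\[ w(g) = \int_{V^N} \prod_{\{i,j\} \in g} f(\mathbf{r}_i - \mathbf{r}_j) \prod d^3 r_i \tag{11} \]
Observing that the weight $w$ is multiplicative in connected components (clusters) $g_1, \ldots, g_k$ of the graph $g$,
\[ w(g) = \prod_{l=1}^{k} w(g_l) \tag{12} \]
we can rewrite
\[ Z(\beta, \lambda, V) = \sum_{N=0}^{\infty} \frac{\lambda^N}{N!} \sum_{\{g_l\}} \prod_{l} w(g_l) \tag{13} \]
with the sum running over all disjoint collections $\{g_l\}$ of connected graphs with vertices in $\{1, \ldots, N\}$. A straightforward exponential expansion can be used to show that, at least in the sense of formal power series,
\[ \log Z(\beta, \lambda, V) = \sum_{n=1}^{\infty} \frac{\lambda^n}{n!} \sum_{g \in C[n]} w(g) \tag{14} \]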

where $C[n]$ is the set of all connected graphs on $n$ vertices. Using $b_n^{(V)}$ to denote the coefficients
\[ b_n^{(V)} = \frac{1}{|V|} \frac{1}{n!} \sum_{g \in C[n]} w(g) \tag{15} \]
and observing that the limits $\lim_{V \to \infty} (1/|V|)\, w(g)$ of cluster integrals exist, we get $b_n = \lim_{V \to \infty} b_n^{(V)}$. The convergence of Mayer series can be controlled directly by combinatorial estimates on the coefficients $b_n^{(V)}$. As a result, the diameter of convergence of the series [7] and [8] can be proved to be at least $(C(\beta)\, e^{2\beta B + 1})^{-1}$. A less direct proof is based on linear integral Kirkwood–Salsburg equations in a suitable Banach space of correlation functions.

Similar combinatorial methods are also available for the evaluation of the coefficients of the virial expansion of the pressure in powers of the gas density,
\[ \beta p(\beta, \rho) = \sum_{n=1}^{\infty} \beta_n \rho^n \tag{16} \]
obtained by inverting [8] (notice that $b_1 = 1$) and inserting it into [7]. One gets $\beta_n = \lim_{V \to \infty} \beta_n^{(V)}$ with
\[ \beta_n^{(V)} = \frac{1}{|V|} \frac{1}{n!} \sum_{g \in B[n]} w(g) \tag{17} \]
where $B[n] \subset C[n]$ is the set of all 2-connected graphs on $\{1, \ldots, n\}$; namely, those graphs that cannot be split into disjoint subgraphs by erasing one vertex (and all adjacent edges). The diameter of convergence of the virial expansion turns out to be no less than $(C(\beta)\, e\, (e^{2\beta B} + 1))^{-1}$.

Abstract Polymer Models

An application of the ideas of Mayer expansions to lattice models is based on a reformulation of the partition function in terms of a polymer model, a formulation akin to [13] above. Namely, the partition function is rewritten as a sum over collections of pairwise compatible geometric objects, called polymers. Most often, compatibility simply means disjointness.

While the reformulation of a physical partition function in terms of a polymer model (including the definition of compatibility) depends on the particularities of a given lattice model and on the considered region of parameters (high temperature, low temperature, large external fields, etc.), the essence and results of cluster expansion may be conveniently formulated in terms of an abstract polymer model.

Let $G = (V, E)$ be any (possibly infinite) countable graph and suppose that a map $w : V \to \mathbb{C}$ is given. Vertices $v \in V$ are called abstract polymers, and two abstract polymers connected by an edge in the graph $G$ are called incompatible. We shall refer to $w(v)$ as the weight of the abstract polymer $v$. For any finite $W \subset V$, we consider the induced subgraph $G[W]$ of $G$ spanned by $W$ and define
\[ Z_W(w) = \sum_{I \subset W} \prod_{v \in I} w(v) \tag{18} \]
Here the sum runs over all collections $I$ of compatible abstract polymers or, in other words, over all independent sets $I$ of vertices in $W$ (no two vertices in $I$ are connected by an edge).

The partition function $Z_W(w)$ is an entire function in $w = \{w(v)\}_{v \in W} \in \mathbb{C}^{|W|}$ and $Z_W(0) = 1$. Hence, it is nonvanishing in some neighborhood of the origin $w = 0$ and its logarithm is, on this neighborhood, an analytic function yielding a convergent Taylor series
\[ \log Z_W(w) = \sum_{X \in \mathcal{X}(W)} a_W(X)\, w^X \tag{19} \]
Here, $\mathcal{X}(W)$ is the set of all multi-indices $X : W \to \{0, 1, \ldots\}$ and $w^X = \prod_v w(v)^{X(v)}$. Inspecting the formula for $a_W(X)$ in terms of the corresponding derivatives of $\log Z_W(w)$, it is easy to show that the Taylor coefficients $a_W(X)$ actually do not depend on $W$: $a_W(X) = a_{\operatorname{supp} X}(X)$, where $\operatorname{supp} X = \{v \in V : X(v) \ne 0\}$. As a result, one gets the existence of coefficients $a(X)$ such that
\[ \log Z_W(w) = \sum_{X \in \mathcal{X}(W)} a(X)\, w^X \tag{20} \]
for every finite $W \subset V$.

The coefficients $a(X)$ can be obtained explicitly. One can pass from [18] to [20] in a similar way as passing from [10] to [13]. The starting point is to replace the restriction to compatible collections of abstract polymers in the sum [18] by the factor $\prod_{v, v' \in W} (1 + F(v, v'))$ with
\[ F(v, v') = \begin{cases} 0 & \text{if } v \text{ and } v' \text{ are compatible} \\ -1 & \text{otherwise ($v$ and $v'$ connected by an edge from $G$)} \end{cases} \tag{21} \]
and to expand the product afterwards. The resulting formula is
\[ a(X) = (X!)^{-1} \sum_{H \subset G(X)} (-1)^{|E(H)|} \tag{22} \]
Here, $G(X)$ is the graph with $|X| = \sum |X(v)|$ vertices induced from $G[\operatorname{supp} X]$ by replacing each of its vertices $v$ by the complete graph on $|X(v)|$ vertices, and $X!$ is the multifactorial $X! = \prod_{v \in \operatorname{supp} X} X(v)!$. The sum is over all connected subgraphs $H \subset G(X)$ spanned by the set of vertices of $G(X)$, and $|E(H)|$ is the number of edges of the graph $H$.
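As a sanity check on [18] and [22], the following is a minimal brute-force sketch, usable only for very small graphs. The function names `partition_function`, `cluster_coefficient`, and `log_z_truncated` are illustrative choices, not notation from the article. The first enumerates independent sets as in [18], the second evaluates [22] by listing all connected spanning subgraphs of $G(X)$, and the third sums the series [20] up to a given total degree $|X|$.

```python
from itertools import combinations, product
from math import factorial, log, prod


def partition_function(weights, edges):
    """Z_W(w) of [18]: sum over independent sets I of prod_{v in I} w(v)."""
    verts = list(weights)
    incompatible = {frozenset(e) for e in edges}
    total = 0.0
    for k in range(len(verts) + 1):
        for subset in combinations(verts, k):
            if all(frozenset(p) not in incompatible
                   for p in combinations(subset, 2)):
                total += prod(weights[v] for v in subset)
    return total


def is_connected(n, edge_list):
    """True if the graph on vertices 0..n-1 with the given edges is connected."""
    adj = {i: set() for i in range(n)}
    for i, j in edge_list:
        adj[i].add(j)
        adj[j].add(i)
    seen, stack = {0}, [0]
    while stack:
        for nb in adj[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return len(seen) == n


def cluster_coefficient(X, edges):
    """a(X) of [22]; X maps each polymer to its multiplicity X(v) >= 1."""
    incompatible = {frozenset(e) for e in edges}
    # Vertices of G(X): X(v) copies of every polymer v in supp X.
    copies = [v for v, mult in X.items() for _ in range(mult)]
    n = len(copies)
    # Edges of G(X): copies of the same polymer, or of incompatible polymers.
    gx_edges = [(i, j) for i, j in combinations(range(n), 2)
                if copies[i] == copies[j]
                or frozenset((copies[i], copies[j])) in incompatible]
    total = 0
    for k in range(len(gx_edges) + 1):
        for H in combinations(gx_edges, k):
            if is_connected(n, H):
                total += (-1) ** k
    return total / prod(factorial(m) for m in X.values())


def log_z_truncated(weights, edges, order):
    """Partial sum of [20] over multi-indices X with 1 <= |X| <= order."""
    verts = list(weights)
    total = 0.0
    for mults in product(range(order + 1), repeat=len(verts)):
        if 0 < sum(mults) <= order:
            X = {v: m for v, m in zip(verts, mults) if m > 0}
            w_X = prod(weights[v] ** m for v, m in X.items())
            total += cluster_coefficient(X, edges) * w_X
    return total
```

For a single polymer, $Z = 1 + w$ and the coefficients reduce to those of $\log(1 + w)$; for two compatible polymers the mixed coefficient vanishes because $G(X)$ has no connected spanning subgraph. For three polymers on a path with small equal weights, `log_z_truncated(..., 4)` can be compared against `log(partition_function(...))`.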

A useful property of the coefficients $a(X)$ is their alternating sign,
\[ (-1)^{|X|+1} a(X) \ge 0 \tag{23} \]
More important than an explicit form of the coefficients $a(X)$ are the convergence criteria for the series [20]. One way to proceed is to find direct combinatorial bounds on the coefficients as expressed by [22]. While doing so, one has to take into account the cancellations arising from the presence of terms of opposite signs in [22]. Indeed, disregarding them would lead to a failure since, as is easy to verify, the number of connected graphs on $|X|$ vertices is bounded from below by $2^{(|X|-1)(|X|-2)/2}$. An alternative approach is to prove the convergence of [20] on polydisks $D_{W,R} = \{w : |w(v)| \le R(v) \text{ for } v \in W\}$ by induction in $|W|$, once a proper condition on the set of radii $R = \{R(v);\ v \in V\}$ is formulated. The most natural condition for the inductive proof (leading at the same time to the strongest claim) turns out to be the Dobrushin condition:

There exists a function $r : V \to [0, 1)$ such that, for each $v \in V$,
\[ R(v) \le r(v) \prod_{v' \in \mathcal{N}(v)} (1 - r(v')) \tag{24} \]
Here $\mathcal{N}(v)$ is the set of vertices $v' \in V$ adjacent in the graph $G$ to the vertex $v$.

Using $\mathcal{X}$ to denote the set of all multi-indices $X : V \to \{0, 1, \ldots\}$ with finite $|X| = \sum |X(v)|$ and saying that $X \in \mathcal{X}$ is a cluster if the graph $G(\operatorname{supp} X)$ is connected, we can summarize the cluster expansion claim for an abstract polymer model in the following way:

Theorem (Cluster expansion). There exists a function $a : \mathcal{X} \to \mathbb{R}$ that is nonvanishing only on clusters, so that for any sequence of diameters $R$ satisfying the condition [24] with a sequence $\{r(v)\}$, the following holds true:

(i) For every finite $W \subset V$, and any contour weight $w \in D_{W,R}$, one has $Z_W(w) \ne 0$ and
\[ \log Z_W(w) = \sum_{X \in \mathcal{X}(W)} a(X)\, w^X \]
(ii) $\sum_{X \in \mathcal{X} :\, \operatorname{supp} X \ni v} |a(X)|\, |w|^X \le -\log(1 - r(v))$.

Notice that we have obtained not only absolute convergence of the Taylor series of $\log Z_W$ in the closed polydisk $D_{W,R}$, but also the bound (ii) (uniform in $W$) on the sum over all terms containing a fixed vertex $v$. Such a bound turns out to be very useful in applications of cluster expansions. It yields, eventually, bounds on various error terms, avoiding the need for an explicit evaluation of the number of clusters of given size.

The restriction to compatible collections of polymers can actually be relaxed. Namely, replacing [18] by
\[ Z_W(w) = \sum_{W' \subset W} \prod_{v \in W'} w(v) \prod_{v, v' \in W'} U(v, v') \tag{25} \]
with $U(v, v') \in [0, 1]$ (soft repulsive interaction), and the condition [24] by
\[ R(v) \le r(v) \prod_{v' \ne v} \frac{1 - r(v')}{1 - U(v, v')\, r(v')} \tag{26} \]
one can prove that the partition function $Z_W(w)$ does not vanish on the polydisk $D_{W,R}$, which implies that the power series of $\log Z_W(w)$ converges absolutely on $D_{W,R}$.

Polymers that arise in typical applications are geometric objects endowed with a support in the considered lattice, say $\mathbb{Z}^d$, $d \ge 1$, and their weights satisfy the condition of translation invariance. Cluster expansions then yield an explicit power series for the pressure (resp. free energy) in the thermodynamic limit as well as its finite-volume approximation. To formulate it for an abstract polymer model, we assume that for each $x \in \mathbb{Z}^d$, an isomorphism $\tau_x : G \to G$ is given and that with each abstract polymer $v \in V$ a finite set $\Lambda(v) \subset \mathbb{Z}^d$ is associated so that $\Lambda(\tau_x(v)) = \Lambda(v) + x$ for every $v \in V$ and every $x \in \mathbb{Z}^d$. For any finite $W \subset V$ and any multi-index $X$, let $\Lambda(W) = \cup_{v \in W} \Lambda(v)$ and $\Lambda(X) = \Lambda(\operatorname{supp}(X))$. On the other hand, for any finite $\Lambda \subset \mathbb{Z}^d$, let $W(\Lambda) = \{v \in V : \Lambda(v) \subset \Lambda\}$. Assuming also that the weight $w : V \to \mathbb{C}$ is translation invariant, that is, $w(v) = w(\tau_x(v))$ for every $v \in V$ and every $x \in \mathbb{Z}^d$, we get an explicit expression for the pressure of the abstract polymer model in the thermodynamic limit
\[ p = \lim_{\Lambda \to \infty} \frac{1}{|\Lambda|} \log Z_{W(\Lambda)}(w) = \sum_{X :\, \Lambda(X) \ni 0} \frac{a(X)\, w^X}{|\Lambda(X)|} \tag{27} \]
In addition, the finite-volume approximation can be explicitly evaluated, yielding
\[ \log Z_{W(\Lambda)}(w) = p\,|\Lambda| - \sum_{X :\, \Lambda(X) \cap \Lambda^c \ne \emptyset} \frac{|\Lambda(X) \cap \Lambda|}{|\Lambda(X)|}\, a(X)\, w^X \tag{28} \]
Using the claim (ii), the second term can be bounded by $\mathrm{const} \cdot |\partial\Lambda|$.

Cluster Expansions for Lattice Models

There is a variety of applications of cluster expansions to lattice models. As noticed above, the first step is always to rewrite the model in terms of a polymer representation.
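The Dobrushin condition [24] can be tried out numerically on a small example. The sketch below is an illustration under stated assumptions, not part of the article: the helper names `dobrushin_radii`, `partition_function`, and `min_abs_z_on_polydisk` are hypothetical, the neighbourhood $\mathcal{N}(v)$ is taken to exclude $v$ itself (as in the definition above), and the random sampling only probes the polydisk, so it cannot replace the proof that $Z_W$ does not vanish there.

```python
import cmath
import random
from itertools import combinations
from math import prod


def dobrushin_radii(r, edges):
    """Largest radii allowed by [24]: R(v) = r(v) * prod_{v' in N(v)} (1 - r(v'))."""
    nbrs = {v: set() for v in r}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return {v: r[v] * prod(1.0 - r[u] for u in nbrs[v]) for v in r}


def partition_function(weights, edges):
    """Z_W(w) of [18] for real or complex weights (brute force)."""
    verts = list(weights)
    incompatible = {frozenset(e) for e in edges}
    total = 0
    for k in range(len(verts) + 1):
        for subset in combinations(verts, k):
            if all(frozenset(p) not in incompatible
                   for p in combinations(subset, 2)):
                total += prod(weights[v] for v in subset)
    return total


def min_abs_z_on_polydisk(R, edges, samples=2000, seed=0):
    """Smallest |Z_W(w)| seen at w = -R and at random points with |w(v)| <= R(v)."""
    rng = random.Random(seed)
    best = abs(partition_function({v: -R[v] for v in R}, edges))
    for _ in range(samples):
        w = {v: R[v] * rng.random() * cmath.exp(2j * cmath.pi * rng.random())
             for v in R}
        best = min(best, abs(partition_function(w, edges)))
    return best
```

For instance, on a 4-cycle of mutually incompatible neighbours with $r(v) = 0.3$ for every $v$, `dobrushin_radii` gives $R(v) = 0.3 \cdot 0.7^2 = 0.147$, and the sampled values of $|Z_W(w)|$ stay well away from zero, in line with claim (i).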

High-Temperature Expansions yielding [34] (1  t > e2t for t < 1=2). To have w 2
DW, R (for any W) is, for R(B) = (e2 )jBj , sufficient
Let us illustrate this point in the simplest case of the Ising
to take   0 with tanh 0 = e2 .
model. Its partition function in volume   Zd , with
As a consequence, for   0 we can use the
free boundary conditions and vanishing external field, is
8 9 cluster expansion theorem to obtain a convergent
>
< >
= power series in powers of tanh . In particular,
X X
Z  exp x y 29 using (X) = [B2suppX (B), we get the pressure by

>
: x;y2 >
; the explicit formula
jxyj1

p
Using the identity
X aX X 37
ex y cosh  x y sinh  30 log 2 d logcosh  w
X:X3x
jXj
it can be rewritten in the form
X for any fixed x 2 Zd (by translation invariance of
Z  2jj cosh jBj tanh jBj 31 the contributing terms, the choice of x is irrelevant).
B The function p() is analytic on the region   0
Here, the sum runs over all subsets B of the set B() of since it is obtained as a uniformly absolutely
all bonds in  (pairs of nearest-neighbor sites from ) convergent series of analytic terms ( tanh )jXj .
such that each site is contained in an even number of This type of high-temperature cluster expansion
bonds from B. Using (B) to denote the set of sites can be extended to a large class of models P with
contained in bonds from B, we say that B1 , B2  B() Boltzmann factor in the form exp { A UA (
)},
are disjoint if (B1 ) \ (B2 ) = ;. Splitting now B into a where
= (
x ; x 2 Zd ) is the configuration with
collection B = {B1 , . . . , Bk } of its connected components a priori on-site probability distribution (d
x ) and
called (high-temperature) polymers and using B() to UA , for any finite A  Zd , are the multi-site
denote the set of all polymers in , we are getting interactions (depending only on (
x ; x 2 A)). Using
X Y the Mayer trick we can rewrite
Z  2jj cosh jBj tanh jBj 32 ( )
X Y
BB B2B
exp  UA
1 fA
38
with the sum running over all collections B of mutually A A

disjoint polymers. This expression is exactly of the with fA (


) = exp {UA (
)}  1. Expanding the
form [18], once we define compatibility of polymers product we will get a polymer representation with
by their disjointness. Introducing the weights polymers A consisting of connected collections
A = (A1 , . . . , Ak ) with weights
wB tanh  jBj 33 Z Y Y
and taking the set B() of all polymers in  for W, wA fA
d
x 39
we get the polymer representation Z () = A2A x2[A2A A

2jj ( cosh )jB()j ZB() (w). under appropriate bounds on the interactions UA
To apply the cluster expansion theorem, we have to and for  small enough, using (A) to denote the set
find a function r such that the right-hand side of [24] is [A2A A, we get,
positive and yields thus the radius of a polydisk of X
convergence. Taking r(B) = jBj with a suitable , we get jwAj  1 40
Y A:A 3 x
1  rB0  e2jBj 34
B0 2NB This assumption allows, as before in the case of the
2 jBj high-temperature Ising model, to apply the cluster
allowing to choose R(B) = r(B)e2jBj = (e ) .
expansion theorem yielding an explicit series expan-
Indeed, to verify [34] we just notice that the number
sion for the pressure.
of polymers of size n containing a fixed site is
bounded by n with a suitable constant . Thus,
X X
1 Correlations
0
jB j  n n  1 35
Cluster expansions can be applied for evaluation of
B0 : B0 3x n1
decay of correlations. Let us consider, for the class
once  is sufficiently small, and thus of models discussed above, the expectation
X Z Y
jBj  jBj  jBj 36 1
hi 
eH
d
x 41
B0 2NB Z x2

P
with H (
) = A UA (
) and a function  we extend AS () to AS = [ AS () and X S,A0 () to
depending only on variables
x on sites x from a X S,A0 = [ X S,A0 (). As a result, we have an explicit
finite set S    Zd . expression for the limiting expectation hi in terms of
A convenient way of evaluating the expectation starts an absolutely convergent power series. This can be
with introduction of the modified partition function immediately applied to show that jhi  hi j decay
exponentially in distance between S and the comple-
Z; Z Z; Z 1 hi 42
ment
P of . Indeed, it suffices to find a suitable bound on
X
Clearly, X ja(X)jjwj with the sum running over all clusters
 X reaching from the set S to c . To this end one does not
d log Z; 
hi  43 need to evaluate explicitly the P number of clusters of
d 0 given diameter diam(X)= A X(A) diam((A))=m
Thus, one may get an expression for the expectation with m  dist(S,c ). The needed estimate is actually
hi , by forming a polymer representation of Z,  ( ) already contained in the condition (ii) from the cluster
and isolating terms linear in in the corresponding expansion theorem. It just suffices to choose a suitable
cluster expansion. For the first step, in the just cited k and assume that  is small P enough to assure validity
high-temperature case with general multi-site inter- of (40) in a stronger form, A:(A)3x jw(A)jK(A)j  1,
actions, we first enlarge the original set A() of all yielding eventually
X
polymers in  (consisting of connected collections c
jaXjjwjX  KdistS;  jSj
A = (A1 , . . . , Ak )) to W S () = A() [ AS (), where X : diamX  distS; c
AS () is the set of all collections (A1 , . . . , Ak ) of X P
polymers such that each of them intersects the set S jaXjjwjX K XAjAj

X:[A 2 supp X A3 x


(polymers (A1 , . . . , Ak ) are glued by S into a single c
entity). Compatibility is defined as before by disjoint-  jSjKdistS;  49
ness; in addition, any two collections from AS () are
Exponential decay of correlations h1 ; 2 i =
declared to be incompatible as well as any polymer A
h1 2 i  h1 i h2 i (and the limiting h1 ; 2 i)
from A() intersecting S is considered to be incompa-
in distance between supports of 1 and 2 can be
tible with any collection from AS (). Defining now
established in a similar way by isolating terms
w (A) = w(A) for A 2 A() and
Z proportional to 1 2 in the cluster expansion of
Y log Z, 1 ; 2 ( 1 ; 2 ) with
w A 
eH
d
x
x2[A2A1 [  [ Ak A[S Z;1 ;2 1 ; 2
44 Z 1 1 h1 i 2 h2 i 1 2 h1 2 i 50

for A = (A1 , . . . , Ak ) 2 AS (), we get Z,  ( ) The resulting claim can be readily generalized to one
exactly in the form [18], about the decay of the correlation h1 ; . ..; k i in
X Y terms of the shortest tree connecting supports
Z; w A 45 S1 , ... , Sk of the functions 1 , . .., k .
I W S  A2I
Low-Temperature Expansions
As a result, we have
X Finally, in some models with symmetries, we can apply
log Z ; aXwX
46 cluster expansion also at low temperatures. Let us
X2XW S  illustrate it again in the case of Ising model. This time,
we take the partition function Z  () with plus
allowing easily to isolate terms linear in : namely,
boundary conditions. First, let us define for each
the terms with multi-indices X with supp X \ AS ()
nearest-neighbor bond hx, yi its dual as the (d  1)-
consisting of a single collection, say A0 , that occurs
dimensional closed unit hypercube orthogonal to the
with multiplicity one, X(A0 ) = 1. Explicitly, using
segment from x to y and bisecting it at its center. For a
X S;A0  fX 2 X W S  : supp X \ AS  given configuration  , we consider the boundary of
fA0 g; XA0 1g 47 the regions of constant spins consisting of the union
@( ) of all hypercubes that are dual to nearest-
we get neighbor bonds hx, yi for which x 6 y . The contours
X X corresponding to  are now defined as the connected
hi aXwX 48
A0 2AS  X2X S;A0 
components of @( ). Notice that, under the fixed
boundary condition, there is a one-to-one correspon-
It is easy to show that, for sufficiently small , the series dence between configurations  and sets  of
on the right-hand side is absolutely convergent even if mutually compatible (disconnected) contours in .
536 Cluster Expansion

Observing that the number of faces in @( ) is just does not vanish only if A(X) \  6 ;, we can expand
the sum of the areas j j of the contours 2 , we the product to obtain decorations of the boundary
get the polymer representation @ by clusters fX . In the case of interface these clusters
! can be incorporated into the weight of interface, while
X X

Z  e jEj
exp  j j 51 on a fixed boundary they yield a wall free energy.
 2 The possibility of the (low-temperature) polymer
representation of the partition function in terms of
where the sum is over all collections of disjoint contours is based on the $  symmetry of the
contours in . Here E() is the set of all bonds hx, yi Ising model. In absence of such a symmetry, cluster
with at least one endpoint x, y in . expansions can still be used, but in the framework of
The condition [24] with r( ) =  yields a similar PirogovSinai theory (see PirogovSinai Theory).
bound on the weights w( ) = ej j as in the high-
temperature expansion. To verify it, for  sufficiently
large, boils down to the evaluation of number of Bibliographical Notes
contours of size n that contain a fixed site.
As a result, we can employ the cluster expansion Cluster expansions originated from the works of Ursell,
theorem to get Yvon, Mayer, and others and were first studied in terms
X of formal power series. The combinatorial and enu-
log Z  jEj aXwX 52 meration problems considered in this framework were
X:X2X C summarized in Uhlenbeck and Ford (1962). For related
with an explicit formula for the limit topics in modern language, see Bergeron et al. (1998).
The convergence results for Mayer and virial expansions
X aX X for dilute gas were first proved in the works of Penrose,
p d w 53
jAXj Lebowitz, Groenveld, and Ruelle (see Ruelle (1969) for
X:AX30
a detailed survey). General polymer models on lattice
Here, A(X) is the set of sites attached to contours were discussed by Gruber and Kunz (1971) (see also
from supp X, Simon (1993) for discussion of high-temperature and
low-temperature cluster expansions of lattice models).
AX [ 2supp X A 54
Abstract polymer models were introduced in Kotecky
with and Preiss (1986). An elegant proof of a general claim
presented by Dobrushin (1996) was further extended
A fx 2 Zd j such that distx;  1=2g 55 and summarized by Scott and Sokal (2005). We follow
As a consequence of the fact that [53] is, for large their reformulation of the Dobrushin condition. Cluster
, an absolutely convergent
P sum of analytic terms expansions with a view on applications in quantum field
a(X)wX = a(X)e

X( )j j
(considered as functions theory are reviewed in Brydges (1986).
of ), the function p() is, for large , analytic in .
See also: Phase Transitions in Continuous Systems;
The fact that one can explicitly express the
PirogovSinai Theory; Wulff Droplets.
difference log Z  ()  jjp() (cf. [28]) found
numerous applications in situations where one
needs an accurate evaluation of the influence of the Further Reading
boundary of the region  on the partition function.
One such example is a study of microscopic Bergeron F, Labelle G, and Leroux P (1998) Combinatorial
behavior of interfaces. The main idea is to use the Species and Tree-Like Structures, Coll. Encyclopaedia of
Mathematics and Its Applications, vol. 67. Cambridge, MA:
explicit expression in the form Cambridge University Press.
Z Brydges DC (1986) A short course on cluster expansions. In:
 
8 9 Osterwalder K and Stora R (eds.) Critical Phenomena, Random
< X = Systems, Gauge Theories, pp. 129183. Les Houches, Session
X jAX \ j
expfpjjgexp aXw XLIII, 1984. Amsterdam/New York: Elsevier.
: jAXj ;
X:AX\c 6; Dobrushin RL (1996) Estimates of semi-invariants for the Ising
Y model at low temperatures. In: Dobrushin RL, Minlos RA,
expfpjjg 1 fX 56 Shukin MA, and Vershik AM (eds.) Topics in Statistical and
X:AX\c 6; Theoretical Physics, pp. 5981. Providence, RI: American
Mathematical Society.
Noticing that
Gruber C and Kunz H (1971) General properties of polymer
  systems. Communications Mathematical Physics 22: 133161.
jAX \ j
fX exp aXwX 1 Kotecky R and Preiss D (1986) Cluster expansion for abstract polymer
jAXj models. Communications in Mathematical Physics 103: 491498.

Ruelle D (1969) Statistical Mechanics: Rigorous Results, The Mathematical Physics Monograph Series. Reading, MA: Benjamin.
Scott AD and Sokal AD (2005) The repulsive lattice gas, the independent-set polynomial, and the Lovász local lemma. Journal of Statistical Physics 118: 1151–1261.
Simon B (1993) The Statistical Mechanics of Lattice Gases, Princeton Series in Physics, vol. 1. Princeton: Princeton University Press.
Uhlenbeck GE and Ford GW (1962) The theory of linear graphs with applications to the theory of the virial development of the properties of gases. In: de Boer J and Uhlenbeck GE (eds.) Studies in Statistical Mechanics, vol. I. Amsterdam: North-Holland.

Coherent States
S T Ali, Concordia University, Montreal, QC, Canada
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Very generally, a family of coherent states is a set of continuously labeled quantum states, with specific mathematical and physical properties, in terms of which arbitrary quantum states can be expressed as linear superpositions. Since coherent states are continuously labeled, they form overcomplete sets of vectors in the Hilbert space of states.

Originally these states were introduced into physics by Schrödinger (1926), as a family of quantum states in terms of which the transition from quantum to classical mechanics could be conveniently studied. These states have the minimal uncertainty property, in the sense that they saturate the Heisenberg uncertainty relations. The name coherent state was applied when these states were rediscovered in the context of quantum optical radiation by Glauber, Klauder, and Sudarshan. It was demonstrated that in these states the correlation functions of the quantum optical field factorize as they do in classical optics, so that the optical field has a near-classical behavior, with the optical beam being coherent. In this article, we shall refer to these originally studied coherent states as canonical coherent states (CCS).

The canonical coherent states, apart from their use in quantum optics, have also been found to be extremely useful in computations in atomic and molecular physics, in quantum statistical mechanics, and in certain areas of mathematics and mathematical physics, including harmonic analysis, symplectic geometry, and quantization theory. Their wide applicability has prompted the search for other families of states sharing similar mathematical and physical properties. These other families of states are usually called generalized coherent states, even when there is no link to optical coherence in such studies.

Some Properties of CCS

In addition to the minimal uncertainty property, the canonical coherent states have a number of analytical and group-theoretical properties which are taken as starting points in looking for generalizations. We now define the canonical coherent states mathematically and enumerate a few of these properties.

Suppose that the vectors $|0\rangle, |1\rangle, \ldots, |n\rangle, \ldots$ correspond to quantum states of $0, 1, \ldots, n, \ldots$ excitons, respectively. The Hilbert space of these states, in which they form an orthonormal basis, is often known as Fock space. The canonical coherent states are then defined in terms of this basis, for each complex number $z$, by the analytic expansion:

$$|z\rangle = e^{-|z|^2/2} \sum_{n=0}^{\infty} \frac{z^n}{\sqrt{n!}}\, |n\rangle \qquad [1]$$

The states $|z\rangle$ are normalized to unity: $\langle z|z\rangle = 1$. They satisfy the formal eigenvalue equation

$$a|z\rangle = z|z\rangle \qquad [2]$$

where $a$ is the annihilation operator for excitons, which acts on the basis vectors (Fock states) $|n\rangle$ as follows:

$$a|n\rangle = \sqrt{n}\, |n-1\rangle \qquad [3]$$

Its adjoint $a^\dagger$ has the action

$$a^\dagger|n\rangle = \sqrt{n+1}\, |n+1\rangle \qquad [4]$$

and

$$[a, a^\dagger] = a a^\dagger - a^\dagger a = I \qquad [5]$$

$I$ being the identity operator on Fock space. Introducing the self-adjoint operators $Q$ and $P$, of position and momentum, respectively,

$$Q = \frac{a + a^\dagger}{\sqrt{2}}, \qquad P = \frac{a - a^\dagger}{i\sqrt{2}} \qquad [6]$$

it is possible to demonstrate the minimal uncertainty property referred to above (we take $\hbar = 1$):

$$\langle \Delta Q\rangle \langle \Delta P\rangle = \tfrac{1}{2} \qquad [7]$$

where for any observable $A$,

$$\langle \Delta A\rangle = \left[\langle z|A^2|z\rangle - \langle z|A|z\rangle^2\right]^{1/2}$$

is its dispersion in the state $|z\rangle$.
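The properties [1]–[7] can be checked numerically. The following Python sketch is an illustration only: it truncates the Fock space at `n_max` levels, and the function names, the truncation size, and the sample value of `z` are assumptions introduced here, not part of the article. It verifies the normalization of [1], the eigenvalue equation [2], and the uncertainty product [7].

```python
import math

def ccs_coefficients(z, n_max=60):
    """Return [<n|z> for n = 0..n_max-1] from expansion [1], truncated at n_max."""
    coeffs = []
    term = complex(math.exp(-abs(z) ** 2 / 2))  # n = 0 coefficient
    for n in range(n_max):
        coeffs.append(term)
        term = term * z / math.sqrt(n + 1)  # z^(n+1)/sqrt((n+1)!) from z^n/sqrt(n!)
    return coeffs

def inner(u, v):
    """Scalar product <u|v>, antilinear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def annihilate(c):
    """Components of a|psi>, using a|n> = sqrt(n)|n-1> as in [3]."""
    return [math.sqrt(m + 1) * c[m + 1] for m in range(len(c) - 1)] + [0j]

def create(c):
    """Components of a^dagger|psi>, using [4]; the top level is cut off by the truncation."""
    return [0j] + [math.sqrt(m + 1) * c[m] for m in range(len(c) - 1)]

def apply_q(c):
    """Q = (a + a^dagger)/sqrt(2), as in [6]."""
    return [(x + y) / math.sqrt(2) for x, y in zip(annihilate(c), create(c))]

def apply_p(c):
    """P = (a - a^dagger)/(i sqrt(2)), as in [6]."""
    return [(x - y) / (1j * math.sqrt(2)) for x, y in zip(annihilate(c), create(c))]

def dispersion(apply_op, c):
    """Dispersion [<A^2> - <A>^2]^(1/2) of a self-adjoint A in the normalized state c."""
    ac = apply_op(c)
    mean = inner(c, ac).real
    mean_sq = inner(ac, ac).real  # <psi|A^2|psi> = ||A psi||^2 for self-adjoint A
    return math.sqrt(mean_sq - mean ** 2)

z = 0.8 - 0.6j                     # sample label, chosen arbitrarily
c = ccs_coefficients(z)
norm_sq = inner(c, c).real         # should be close to 1
residual = max(abs(x - z * y) for x, y in zip(annihilate(c), c))  # a|z> - z|z>
product = dispersion(apply_q, c) * dispersion(apply_p, c)         # should be close to 1/2
print(norm_sq, residual, product)
```

The coefficients are built by recursion rather than from explicit factorials, which avoids overflow for large `n`. For labels with larger $|z|$ the truncation size would need to grow, since the weight of $|z\rangle$ is concentrated near $n \approx |z|^2$.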

One can also prove the resolution of the identity,

$$\int_{\mathbb{C}} |z\rangle\langle z|\, \frac{dq\, dp}{2\pi} = I \qquad [8]$$

where $z = (1/\sqrt{2})(q + ip)$ has been written in terms of its real and imaginary parts $(1/\sqrt{2})q$ and $(1/\sqrt{2})p$, respectively. The above operator integral is to be understood in the weak sense, as will be explained later. Equation [8] incorporates the mathematical fact that the set of vectors $|z\rangle$ is overcomplete in the Hilbert space. Indeed, using [8] any vector $|\phi\rangle$ in the Hilbert space can be written as a linear (integral) superposition of these states:

$$|\phi\rangle = \int_{\mathbb{C}} \Phi(z)\, |z\rangle\, \frac{dq\, dp}{2\pi}$$

where $\Phi$ is the component function, $\Phi(z) = \langle z|\phi\rangle$. Thus, the coherent states $|z\rangle$ form a continuously labeled total set of vectors in the Hilbert space and since this space is separable, they are an overcomplete set.

Analytic properties of the vectors $|z\rangle$ emerge when the scalar product $\langle\phi|z\rangle$ is taken with respect to an arbitrary vector $|\phi\rangle$ in Fock space. From [1] it is clear that

$$F(z) = \langle\phi|z\rangle = e^{-|z|^2/2} f(z)$$

where $f$ is an entire analytic function in the complex variable $z$. Moreover, the mapping $\phi \mapsto f$ is an isometric embedding of the Fock space onto the Hilbert space of analytic functions, with respect to the norm

$$\|f\| = \left[\int_{\mathbb{C}} |f(z)|^2\, d\nu(z, \bar{z})\right]^{1/2} \qquad [9]$$

defined by the measure $d\nu(z, \bar{z}) = (1/2\pi)\, e^{-|z|^2}\, dq\, dp$.

Group-theoretical properties of the CCS can be demonstrated by noting that

$$|n\rangle = \frac{(a^\dagger)^n}{\sqrt{n!}}\, |0\rangle \quad \text{and} \quad a|0\rangle = 0$$

using which [1] can be recast into the form

$$|z\rangle = e^{-|z|^2/2}\, e^{z a^\dagger} |0\rangle = U(z)|0\rangle, \qquad U(z) = e^{z a^\dagger - \bar{z} a} \qquad [10]$$

The vectors $|z\rangle$ and the unitary operator $U(z)$ can be reexpressed in terms of the real variables $q, p$ and the operators $Q, P$ as

$$|z\rangle = |q, p\rangle = U(q, p)|0\rangle, \qquad U(q, p) = e^{i(pQ - qP)} \qquad [11]$$

The operators $U(q, p)$ realize a (projective) unitary, irreducible representation of the Weyl–Heisenberg group, which is the group whose Lie algebra has the generators $Q$, $P$, and $I$, obeying the commutation relations $[Q, P] = iI$. The existence of the resolution of the identity [8] is the statement of the fact that this representation is square integrable (a notion which will be elaborated upon in the section "Some Examples") which gives us the next paradigm for building coherent states, namely by the action, on a fixed vector, of the unitary operators of a square-integrable representation of a locally compact group.

The above range of properties, which are enjoyed by the CCS, cannot all be expected to hold when looking for generalizations. It then becomes necessary to adopt one or other of these properties as the starting point and to proceed from there. In so doing, it is best first to set down a general definition of coherent states, involving a minimal mathematical structure. Motivated more by possible applications to physics, we do this in the following section.

General Definition

Let $\mathfrak{H}$ be an abstract, separable Hilbert space over the complexes, $X$ a locally compact space and $d\nu$ a measure on $X$. Let $|x, i\rangle$ be a family of vectors in $\mathfrak{H}$, defined for each $x$ in $X$ and $i = 1, 2, 3, \ldots, N$, where $N$ is usually a finite integer, although it could also be infinite. We assume that this set of vectors possesses the following properties:

1. For each $i$, the mapping $x \mapsto |x, i\rangle$ is weakly continuous, that is, for each vector $|\phi\rangle$ in $\mathfrak{H}$, the function $\Phi_i(x) = \langle x, i|\phi\rangle$ is continuous (in the topology of $X$).
2. For each $x$ in $X$, the vectors $|x, i\rangle$, $i = 1, 2, \ldots, N$, are linearly independent.
3. The resolution of the identity

$$\sum_{i=1}^{N} \int_X |x, i\rangle\langle x, i|\, d\nu(x) = I_{\mathfrak{H}} \qquad [12]$$

holds in the weak sense on the Hilbert space $\mathfrak{H}$, that is, for any two vectors $|\phi\rangle, |\psi\rangle$ in $\mathfrak{H}$, the following equality holds:

$$\sum_{i=1}^{N} \int_X \langle\phi|x, i\rangle\langle x, i|\psi\rangle\, d\nu(x) = \langle\phi|\psi\rangle$$

A set of vectors $|x, i\rangle$ satisfying the above three properties is called a family of generalized vector coherent states. In case $N = 1$, the set is called a family of generalized coherent states. Sometimes the resolution of the identity condition is replaced by a weaker

condition, with the vectors $|x, i\rangle$ simply forming a total set in $\mathfrak{H}$ and the functions $F_i(x) = \langle x, i|\phi\rangle$, as $|\phi\rangle$ runs through $\mathfrak{H}$, forming a reproducing kernel Hilbert space. Alternatively, the identity on the right-hand side of [12] could also be replaced by a bounded, positive operator $T$ with bounded inverse. In this case, the term frame is also used for the family of generalized coherent states. For physical applications, however, the resolution of the identity condition is always assumed to hold, although the measure $d\nu$ could be of a very general nature (possibly also singular). The objective in all these cases is to ensure that an arbitrary vector $|\phi\rangle$ be expressible as a linear (integral) combination of these vectors. Indeed, [12] is immediately seen to imply that

$$|\phi\rangle = \sum_{i=1}^{N} \int_X \Phi_i(x)\, |x, i\rangle\, d\nu(x) \qquad [13]$$

where $\Phi_i(x) = \langle x, i|\phi\rangle$.

Associated to a family of generalized coherent states on a Hilbert space $\mathfrak{H}$, there is an intrinsic isomorphism between this space and a Hilbert space of (in general, vector valued) continuous functions over $X$. Using this isomorphism, it is always possible to look upon coherent states as a family of continuous functions which are square integrable with respect to the measure $d\nu$. To demonstrate this, we note that, in view of [12], for each vector $|\phi\rangle$ in $\mathfrak{H}$, the vector-valued function $\Psi(x)$ on $X$, with components $\Phi_i(x) = \langle x, i|\phi\rangle$, $i = 1, 2, \ldots, N$, satisfies the norm condition

$$\sum_{i=1}^{N} \int_X |\Phi_i(x)|^2\, d\nu(x) = \|\phi\|^2_{\mathfrak{H}}$$

This means that the set of vectors $\Psi$, as $|\phi\rangle$ runs through $\mathfrak{H}$, is a closed subspace of the Hilbert space $L^2_{\mathbb{C}^N}(X, d\nu)$ of $N$-vector-valued functions on $X$. Let us denote this subspace by $\mathfrak{H}_K$ and note that this space is a reproducing kernel Hilbert space with a matrix-valued kernel $K(x, y)$ having matrix elements

$$K(x, y)_{ij} = \langle x, i|y, j\rangle, \qquad i, j = 1, 2, \ldots, N \qquad [14]$$

and enjoying the properties

$$K(x, y)_{ij} = \overline{K(y, x)_{ji}}, \qquad K(x, x)_{ii} > 0 \qquad [15]$$

and

$$\sum_{\ell=1}^{N} \int_X K(x, z)_{i\ell}\, K(z, y)_{\ell j}\, d\nu(z) = K(x, y)_{ij} \qquad [16]$$

If $e^i$, $i = 1, 2, \ldots, N$, are the vectors constituting the canonical basis of $\mathbb{C}^N$, then for each $x$ in $X$ and $i = 1, 2, \ldots, N$, the vector-valued function $\xi^i_x$ on $X$, defined by $\xi^i_x(y) = K(y, x)e^i$, is the image in $\mathfrak{H}_K$ of the generalized vector coherent state $|x, i\rangle$, under the above-mentioned isometry. The vectors $\xi^i_x$ span the space $\mathfrak{H}_K$ and for an arbitrary element $\Psi$ of this Hilbert space, the reproducing property [16] of the kernel implies the relation

$$\int_X K(x, y)\Psi(y)\, d\nu(y) = \Psi(x) \qquad [17]$$

Conversely, given any reproducing kernel Hilbert space, with a kernel satisfying the relations [15] and [16], generalized coherent states can be constructed as above in terms of this kernel. Mathematically, therefore, generalized coherent states are just the set of vectors naturally defined by the kernel in a reproducing kernel Hilbert space.

Some Examples

We present in this section some of the more commonly used types of coherent states, as illustrations of the general structure given above.

A large class of generalizations of the canonical coherent states [1] is obtained by a simple modification of their analytic structure. Let $x_1 \le x_2 \le \cdots \le x_n \le \cdots$ be an infinite sequence of positive numbers ($x_1 \ne 0$). Define $x_n! = x_1 x_2 \cdots x_n$ and by convention set $x_0! = 1$. In the same Fock space in which the CCS were described, we now define the related deformed or nonlinear coherent states via the analytic expansion

$$|z\rangle = \mathcal{N}(|z|^2)^{-1/2} \sum_{n=0}^{\infty} \frac{z^n}{\sqrt{x_n!}}\, |n\rangle \qquad [18]$$

The normalization factor $\mathcal{N}(|z|^2)$ is chosen so that $\langle z|z\rangle = 1$. These generalized coherent states are overcomplete in the Fock space and satisfy a resolution of the identity of the type

$$\int_{\mathcal{D}} |z\rangle\langle z|\, \mathcal{N}(|z|^2)\, d\nu(z, \bar{z}) = I \qquad [19]$$

$\mathcal{D}$ being an open disk in the complex plane of radius $L$, the radius of convergence of the series $\sum_{n=0}^{\infty} (z^n/\sqrt{x_n!})$. (In the case of the CCS, $L = \infty$.) The measure $d\nu$ is generically of the form $d\theta\, d\lambda(r)$ (for $z = re^{i\theta}$), where $d\lambda$ is related to the $x_n!$ through the moment condition

$$\frac{x_n!}{2\pi} = \int_0^L r^{2n}\, d\lambda(r), \qquad n = 0, 1, 2, \ldots \qquad [20]$$

This means that once the quantities $x_n!$ are specified, the measure $d\lambda$ is to be determined by solving the

moment problem [20], which of course may not always have a solution. This puts a constraint on the type of sequences $\{x_n\}$ which may be used in the construction.

Once again, we see that for an arbitrary vector $|\phi\rangle$ in the Fock space, the function $F(z) = \langle\phi|z\rangle$, of the complex variable $z$, is of the form $F(z) = \mathcal{N}(|z|^2)^{-1/2} f(z)$, where $f$ is an analytic function on the domain $\mathcal{D}$. The reproducing kernel associated to these coherent states is

$$K(\bar{z}, z') = \langle z|z'\rangle = \left[\mathcal{N}(|z|^2)\, \mathcal{N}(|z'|^2)\right]^{-1/2} \sum_{n=0}^{\infty} \frac{(\bar{z} z')^n}{x_n!} \qquad [21]$$

By analogy with [2], one can define a generalized annihilation operator $A$ by its action on the vectors $|z\rangle$,

$$A|z\rangle = z|z\rangle \qquad [22]$$

and its adjoint operator $A^\dagger$. These act on the Fock states $|n\rangle$ as follows:

$$A|n\rangle = \sqrt{x_n}\, |n-1\rangle, \qquad A^\dagger|n\rangle = \sqrt{x_{n+1}}\, |n+1\rangle \qquad [23]$$

Depending on the exact values of the quantities $x_n$, these two operators, together with the identity $I$ and all their commutators, could generate a wide range of algebras including various deformed quantum algebras. The term nonlinear, as often applied to these generalized coherent states, comes again from quantum optics, where many such families of states are used in studying the interaction between the radiation field and atoms, and the strength of the interaction itself depends on the frequency of radiation. Of course, these coherent states will not in general have either the group-theoretical or the minimal uncertainty properties of the CCS.

The following is an example of generalized coherent states of the above type, built over the unit disk, $\mathcal{D} = \{z \in \mathbb{C} \mid |z| < 1\}$: on the Fock space, we define the states

$$|z\rangle = (1 - r^2)^{\kappa} \sum_{n=0}^{\infty} \left[\frac{(2\kappa)_n}{n!}\right]^{1/2} z^n\, |n\rangle, \qquad r = |z| \qquad [24]$$

where $\kappa = 1, 3/2, 2, 5/2, \ldots$, and

$$(a)_m = \frac{\Gamma(a + m)}{\Gamma(a)} = a(a+1)(a+2) \cdots (a+m-1)$$

Comparing [24] with [18] we see that $x_n = n/(2\kappa + n - 1)$ so that $\lim_{n \to \infty} x_n = 1$. Thus, the infinite sum is convergent for any $z$ lying in the unit disk. These generalized coherent states arise from representations of the group SU(1, 1) belonging to the discrete series, each irreducible representation being labeled by a specific value of the index $\kappa$. The associated Hilbert space of functions, analytic on the unit disk, is a subspace of $L^2(\mathcal{D}, d\nu)$, with

$$d\nu(z, \bar{z}) = (2\kappa - 1)\, \frac{(1 - r^2)^{2\kappa - 2}}{\pi}\, r\, dr\, d\theta, \qquad z = re^{i\theta}$$

which can be obtained by solving the moment problem [20]. The resolution of the identity satisfied by these states is

$$\frac{2\kappa - 1}{\pi} \int_{\mathcal{D}} |z\rangle\langle z|\, \frac{r\, dr\, d\theta}{(1 - r^2)^2} = I \qquad [25]$$

The associated generalized creation and annihilation operators are

$$A|n\rangle = \sqrt{\frac{n}{2\kappa + n - 1}}\, |n-1\rangle, \qquad A^\dagger|n\rangle = \sqrt{\frac{n+1}{2\kappa + n}}\, |n+1\rangle \qquad [26]$$

so that, clearly, $[A, A^\dagger] \ne I$.

Operators $A$ and $A^\dagger$ of the general type defined in [23] are also known as ladder operators. When such operators appear as generators of representations of Lie algebras, their eigenvectors (see [22]) are usually called Barut–Girardello coherent states. As an example, the representation of the Lie algebra of SU(1,1) on the Fock space is generated by the three operators $K_+$, $K_-$, and $K_3$, which satisfy the commutation relations

$$[K_3, K_\pm] = \pm K_\pm, \qquad [K_-, K_+] = 2K_3 \qquad [27]$$

They act on the vectors $|n\rangle$ as follows:

$$K_-|n\rangle = \sqrt{n(2\kappa + n - 1)}\, |n-1\rangle, \qquad K_+ = K_-^\dagger, \qquad K_3|n\rangle = (\kappa + n)|n\rangle \qquad [28]$$

Thus, $K_-|0\rangle = 0$ and

$$|n\rangle = \frac{1}{\sqrt{n!\,(2\kappa)_n}}\, K_+^n\, |0\rangle$$

The Barut–Girardello coherent states $|z\rangle$ are now defined as the formal eigenvectors of the ladder operator $K_-$:

$$K_-|z\rangle = z|z\rangle, \qquad z \in \mathbb{C} \qquad [29]$$

They have the analytic form

$$|z\rangle = \left[\frac{|z|^{2\kappa - 1}}{I_{2\kappa - 1}(2|z|)}\right]^{1/2} \sum_{n=0}^{\infty} \frac{z^n}{\sqrt{n!\,(2\kappa + n - 1)!}}\, |n\rangle \qquad [30]$$

where $I_\nu(x)$ is the order-$\nu$ modified Bessel function of the first kind. These coherent states satisfy the resolution of the identity,

$$\frac{2}{\pi} \int_{\mathbb{C}} |z\rangle\langle z|\, K_{2\kappa - 1}(2r)\, I_{2\kappa - 1}(2r)\, r\, dr\, d\theta = I, \qquad z = re^{i\theta} \qquad [31]$$

where again, $K_\nu(x)$ is the order-$\nu$ modified Bessel function of the second kind.

A nonanalytic extension of the expression [18] is often used to define generalized coherent states associated to physical Hamiltonians having pure point spectra. These coherent states, known as Gazeau–Klauder coherent states, are labeled by action–angle variables. Suppose that we are given the physical Hamiltonian $H = \sum_{n=0}^{\infty} E_n |n\rangle\langle n|$, with $E_0 = 0$, that is, it has the energy eigenvalues $E_n$ and eigenvectors $|n\rangle$, which we assume to form an orthonormal basis for the Hilbert space of states $\mathfrak{H}$. Let us write the eigenvalues as $E_n = \omega\varepsilon_n$ by introducing a sequence of dimensionless quantities $\{\varepsilon_n\}$ ordered as: $0 = \varepsilon_0 < \varepsilon_1 < \varepsilon_2 < \cdots$. Then, for all $J \ge 0$ and $\gamma \in \mathbb{R}$, the Gazeau–Klauder coherent states are defined as

$$|J, \gamma\rangle = \mathcal{N}(J)^{-1/2} \sum_{n=0}^{\infty} \frac{J^{n/2}\, e^{-i\varepsilon_n\gamma}}{\sqrt{\varepsilon_n!}}\, |n\rangle \qquad [32]$$

where $\varepsilon_n! = \varepsilon_1\varepsilon_2 \cdots \varepsilon_n$, with $\varepsilon_0! = 1$, and $\mathcal{N}$ is again a normalization factor, which turns out to be dependent on $J$ only. These coherent states satisfy the temporal stability condition

$$e^{-iHt}\, |J, \gamma\rangle = |J, \gamma + \omega t\rangle \qquad [33]$$

and the action identity

$$\langle J, \gamma|H|J, \gamma\rangle_{\mathfrak{H}} = \omega J \qquad [34]$$

While these generalized coherent states do form an overcomplete set in $\mathfrak{H}$, the resolution of the identity is generally not given by an integral relation of the type [12].

For the second set of examples of generalized coherent states, we take the group-theoretical structure of the CCS as the point of departure. Let $G$ be a locally compact group and suppose that it has a continuous, irreducible representation on a Hilbert space $\mathfrak{H}$ by unitary operators $U(g)$, $g \in G$. This representation is called square integrable if there exists a nonzero vector $|\psi\rangle$ in $\mathfrak{H}$ for which the integral

$$c(\psi) = \int_G |\langle\psi|U(g)\psi\rangle|^2\, d\mu(g) \qquad [35]$$

converges. Here $d\mu$ is a Haar measure of $G$, which for definiteness, we take to be the left-invariant measure. (The value of the above integral is independent of whether the left- or the right-invariant measure is used, so we could just as well have used the right-invariant measure.) A vector $|\psi\rangle$, satisfying [35], is said to be admissible, and it can be shown that the existence of one such vector guarantees the existence of an entire dense set of such vectors in $\mathfrak{H}$. Moreover, if the group $G$ is unimodular, that is, if the left- and the right-invariant measures coincide, then the existence of one admissible vector implies that every vector in $\mathfrak{H}$ is admissible. Given a square-integrable representation and an admissible vector $|\psi\rangle$, let us define the vectors

$$|g\rangle = \frac{1}{\sqrt{c(\psi)}}\, U(g)|\psi\rangle \qquad [36]$$

for all $g$ in the group $G$. These vectors are to be seen as the analogs of the canonical coherent states [11], written there in terms of the representation of the Weyl–Heisenberg group. Next, it can be shown that the resolution of the identity

$$\int_G |g\rangle\langle g|\, d\mu(g) = I_{\mathfrak{H}} \qquad [37]$$

holds on $\mathfrak{H}$. Thus, the vectors $|g\rangle$ constitute a family of generalized coherent states. The functions $F(g) = \langle g|\phi\rangle$ for all vectors $|\phi\rangle$ in $\mathfrak{H}$ are square integrable with respect to the measure $d\mu$ and the set of such functions, which in fact are continuous in the topology of $G$, forms a closed subspace of $L^2(G, d\mu)$. Furthermore, the mapping $\phi \mapsto F$ is a linear isometry between $\mathfrak{H}$ and $L^2(G, d\mu)$ and under this isometry the representation $U$ gets mapped to a subrepresentation of the left regular representation of $G$ on $L^2(G, d\mu)$.

A typical example of the above construction is provided by the affine group, $G_{\mathrm{Aff}}$. This is the group of all $2 \times 2$ matrices of the type

$$g = \begin{pmatrix} a & b \\ 0 & 1 \end{pmatrix} \qquad [38]$$

$a$ and $b$ being real numbers with $a \ne 0$. We shall also write $g = (b, a)$. This group is nonunimodular, with the left-invariant measure being given by $d\mu(b, a) = (1/a^2)\, db\, da$. (The right-invariant measure is $(1/|a|)\, db\, da$.) The affine group has a unitary irreducible representation on the Hilbert space $L^2(\mathbb{R}, dx)$. Vectors in $L^2(\mathbb{R}, dx)$ are measurable functions $\phi(x)$ of the real variable $x$ and the (unitary) operators $U(b, a)$ of this representation act on them in the manner

$$(U(b, a)\phi)(x) = \frac{1}{\sqrt{|a|}}\, \phi\!\left(\frac{x - b}{a}\right) \qquad [39]$$
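The action [39] can be explored numerically. The sketch below is a hedged illustration in Python: the function names, the Gaussian test function, the integration window, and the sample group elements are all assumptions introduced here. It checks that the operators $U(b, a)$ preserve the $L^2$ norm and that they compose according to the matrix product in [38], which in the notation $g = (b, a)$ reads $(b_1, a_1)(b_2, a_2) = (b_1 + a_1 b_2,\ a_1 a_2)$.

```python
import math

def compose(g1, g2):
    """Group law for g = (b, a), read off from the product of the matrices in [38]."""
    b1, a1 = g1
    b2, a2 = g2
    return (b1 + a1 * b2, a1 * a2)

def U(g, f):
    """Return the function U(b, a)f : x -> |a|^(-1/2) f((x - b)/a), as in [39]."""
    b, a = g
    return lambda x: f((x - b) / a) / math.sqrt(abs(a))

def l2_norm_sq(f, lo=-60.0, hi=60.0, steps=24000):
    """Trapezoidal estimate of the integral of |f(x)|^2 over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (abs(f(lo)) ** 2 + abs(f(hi)) ** 2)
    for k in range(1, steps):
        total += abs(f(lo + k * h)) ** 2
    return total * h

def gaussian(x):
    """Normalized Gaussian, used here only as a convenient test function."""
    return math.pi ** -0.25 * math.exp(-x * x / 2)

g1, g2 = (1.5, -2.0), (-0.7, 0.5)     # sample group elements, chosen arbitrarily
norm_before = l2_norm_sq(gaussian)
norm_after = l2_norm_sq(U(g1, gaussian))
lhs = U(g1, U(g2, gaussian))          # U(g1) U(g2) f
rhs = U(compose(g1, g2), gaussian)    # U(g1 g2) f
max_diff = max(abs(lhs(x) - rhs(x)) for x in (-3.0, -0.4, 0.0, 1.1, 2.5))
print(norm_before, norm_after, max_diff)
```

The factor $|a|^{-1/2}$ is what compensates the change of variable $x \mapsto (x - b)/a$ in the norm integral; dropping it would leave the composition law intact but break unitarity.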

If $\psi$ is a function in $L^2(\mathbb{R}, dx)$ such that its Fourier transform $\hat{\psi}$ satisfies the condition

$$\int_{\mathbb{R}} \frac{|\hat{\psi}(k)|^2}{|k|}\, dk < \infty \qquad [40]$$

then it can be shown to be an admissible vector, that is,

$$c(\psi) = \int_{G_{\mathrm{Aff}}} |\langle\psi|U(b, a)\psi\rangle|^2\, \frac{db\, da}{a^2} < \infty$$

Thus, following the general construction outlined above, the vectors

$$|b, a\rangle = \frac{1}{\sqrt{c(\psi)}}\, U(b, a)\psi, \qquad (b, a) \in G_{\mathrm{Aff}} \qquad [41]$$

define a family of generalized coherent states and one has the resolution of the identity

$$\int_{G_{\mathrm{Aff}}} |b, a\rangle\langle b, a|\, \frac{db\, da}{a^2} = I \qquad [42]$$

on $L^2(\mathbb{R}, dx)$.

In the signal-analysis literature a vector satisfying the admissibility condition [40] is called a mother wavelet and the generalized coherent states [41] are called wavelets. Signals are then identified with vectors $|\phi\rangle$ in $L^2(\mathbb{R}, dx)$ and the function

$$F(b, a) = \langle b, a|\phi\rangle \qquad [43]$$

is called the continuous wavelet transform of the signal $\phi$.

There exist alternative ways of constructing generalized coherent states using group representations. For example, the Perelomov method is based on the observation that the vector $|0\rangle$, appearing in the construction of the canonical coherent states in [10] and [11] using the representation of the Weyl–Heisenberg group, is invariant up to a phase, under the action of its center. Consequently, the coherent states $|z\rangle$, as written in [10], are labeled, not by elements of the group itself, but only by the points in the quotient space of the group by its (central) phase subgroup. Generally, let $G$ be a locally compact group and $U$ a unitary irreducible representation of it on the Hilbert space $\mathfrak{H}$. We do not assume $U$ to be square integrable. We fix a vector $|\psi\rangle$ in $\mathfrak{H}$, of unit norm and denote by $H$ the subgroup of $G$ consisting of all elements $h$ for which

$$U(h)|\psi\rangle = e^{i\omega(h)}\, |\psi\rangle \qquad [44]$$

where $\omega$ is a real-valued function of $h$. Let $X = G/H$ be the left-coset space and $x$ an arbitrary element in $X$. Choosing a coset representative $g(x) \in G$, for each coset $x$, we define the vectors

$$|x\rangle = U(g(x))|\psi\rangle \qquad [45]$$

in $\mathfrak{H}$. The dependence of these vectors on the specific choice of the coset representative $g(x)$ is only through a phase. Thus, if instead of $g(x)$ we took a different representative $g(x)' \in G$ for the same coset $x$, then since $g(x)' = g(x)h$ for some $h \in H$, in view of [44] we would have $U(g(x)')|\psi\rangle = e^{i\omega(h)}|x\rangle$. Hence, quantum mechanically, both $|x\rangle$ and $U(g(x)')|\psi\rangle$ represent the same physical state and in particular, the projection operator $|x\rangle\langle x|$ depends only on the coset. Vectors $|x\rangle$, defined in this manner, are called Gilmore–Perelomov coherent states. Since $U$ is assumed to be irreducible, the set of all these vectors as $x$ runs through $G/H$ is dense in $\mathfrak{H}$. In this definition of generalized coherent states, no resolution of the identity is postulated. However, if $X$ carries an invariant measure, under the natural action of $G$, and if the formal operator $B$ defined as

$$B = \int_X |x\rangle\langle x|\, d\nu(x)$$

is bounded, then it is necessarily a multiple of the identity and a resolution of the identity is again retrieved.

The Perelomov construction can be used to define coherent states for any locally compact group. On the other hand, there exist other constructions of generalized coherent states, using group representations, which generalize the notion of square integrability to homogeneous spaces of the group. Briefly, in this approach one starts with a unitary irreducible representation $U$ and attempts to find a vector $|\psi\rangle$, a subgroup $H$ and a section $\sigma : G/H \to G$ such that

$$\int_{G/H} |x\rangle\langle x|\, d\nu(x) = T \qquad [46]$$

where $|x\rangle = U(\sigma(x))|\psi\rangle$, $T$ is a bounded, positive operator with bounded inverse and $d\nu$ is a quasi-invariant measure on $X = G/H$. It is not assumed that $|\psi\rangle$ be invariant up to a phase under the action of $H$ and clearly, the best situation is when $T$ is a multiple of the identity. Although somewhat technical, this general construction is of enormous versatility for semidirect product groups of the type $\mathbb{R}^n \rtimes K$, where $K$ is a closed subgroup of $GL(n, \mathbb{R})$. Thus, it is useful for many physically important groups, such as the Poincaré or the Euclidean group, which do not have square-integrable representations in the sense of the earlier definition (see eqn [35]). The integral condition [46] ensures that any vector $|\phi\rangle$ in $\mathfrak{H}$ can be written in terms of the $|x\rangle$. Indeed, it

is easy to see that one has the integral representation of a vector,

$$|\phi\rangle = \int_X \Phi(x)\, |x\rangle\, d\nu(x), \qquad \Phi(x) = \langle x|T^{-1}\phi\rangle$$

in terms of the generalized coherent states.

The canonical coherent states satisfy the minimal uncertainty relation [7]. It is possible to build families of coherent states by generalizing from this condition. To do this, one typically starts with two self-adjoint generators in the Lie algebra of a particular group representation and then looks for appropriate eigenvectors of a complex combination of these two generators. For two self-adjoint operators $B$ and $C$ on a Hilbert space $\mathfrak{H}$, satisfying the commutation relation $[B, C] = iD$ and any normalized vector $\phi$ in $\mathfrak{H}$, one can prove the Heisenberg uncertainty relation

$$(\Delta B)^2 (\Delta C)^2 \ge \frac{\langle D\rangle^2}{4} \qquad [47]$$

where $\langle X\rangle = \langle\phi|X\phi\rangle$ and $(\Delta X)^2 = \langle X^2\rangle - \langle X\rangle^2$, for any operator $X$ on $\mathfrak{H}$. More generally, one can prove the Schrödinger–Robertson uncertainty relation

$$(\Delta B)^2 (\Delta C)^2 \ge \frac{1}{4}\left[\langle D\rangle^2 + \langle F\rangle^2\right] \qquad [48]$$

where $\langle F\rangle = \langle BC + CB\rangle - 2\langle B\rangle\langle C\rangle$ measures the correlation between $B$ and $C$ in the state $\phi$. If $\langle F\rangle = 0$, the above relation reduces to the Heisenberg uncertainty relation. On the other hand, if $\langle D\rangle = 0$, the Heisenberg uncertainty relations become redundant. Suppose now that $B$ and $C$ are two self-adjoint elements of the Lie algebra in the unitary irreducible representation of a Lie group and we look for states $|\phi\rangle$ which minimize the uncertainty relation [48], that is, for which the equality holds. It turns out that such states can be found by considering the linear combination $B + i\lambda C$, for a fixed complex number $\lambda$, and solving the formal eigenvalue equation

$$[B + i\lambda C]\, |z, \lambda\rangle = z\, |z, \lambda\rangle, \qquad \text{with } z = \langle B\rangle + i\lambda\langle C\rangle \qquad [49]$$

Solutions to this equation for which $|\lambda| \ne 1$ are called squeezed states, since in this case $\Delta B \ne \Delta C$. Generally, the states $|z, \lambda\rangle$ are known as intelligent states. As an example, for the operators $Q$ and $P$ in [6], for which one has

$$(\Delta Q)^2 (\Delta P)^2 \ge \frac{1}{4}\left[1 + \langle F\rangle^2\right]$$

taking the combination $Q + i\lambda P$, one obtains the minimal uncertainty states,

$$|z, \lambda\rangle = \mathcal{N}(z, \lambda)^{-1/2}\, e^{-w a^{\dagger 2}/2}\, e^{(z/\sqrt{2})(1 + w) a^\dagger}\, |0\rangle \qquad [50]$$

$\mathcal{N}(z, \lambda)$ being a normalization constant and $w = (1 - \lambda)/(1 + \lambda)$. The case $\lambda = -1$ does not lead to any solutions, while $\lambda = 1$ gives the canonical coherent states [10]. For real $\lambda \ne 1$ the above states are the well-known squeezed states of quantum optics.

Our final example is that of a family of vector coherent states, which will be obtained essentially by replacing the complex variable $z$ in [18] by a matrix variable. We choose the domain $\Omega = \mathbb{C}^{2\times 2}$ (all $2 \times 2$ complex matrices), equipped with the measure

$$d\nu(Z, Z^\dagger) = \frac{e^{-\mathrm{tr}[Z^\dagger Z]}}{\pi^4} \prod_{k,j=1}^{2} dx_{kj} \wedge dy_{kj}$$

where $Z$ is an element of $\Omega$ and $z_{kj} = x_{kj} + iy_{kj}$ are its entries. One can then prove the matrix orthogonality relation

$$\int_\Omega Z^k Z^{\dagger\ell}\, d\nu(Z, Z^\dagger) = \frac{1}{2}\left[\int_\Omega \mathrm{tr}[Z^k Z^{\dagger\ell}]\, d\nu(Z, Z^\dagger)\right] I_2 = b(k)\, \delta_{k\ell}\, I_2, \qquad k, \ell = 0, 1, 2, \ldots, \infty \qquad [51]$$

$I_2$ being the $2 \times 2$ identity matrix and

$$b(k) = \frac{(k+3)!}{2(k+1)(k+2)}, \quad k = 1, 2, 3, \ldots; \qquad b(0) = 1 \qquad [52]$$

Consider the Hilbert space $\tilde{\mathfrak{H}} = L^2_{\mathbb{C}^2}(\Omega, d\nu)$ of square integrable, two-component vector-valued functions on $\Omega$ and in it consider the vectors $|\Psi^i_k\rangle$, $i = 1, 2$, $k = 0, 1, 2, \ldots, \infty$, defined by the $\mathbb{C}^2$-valued functions,

$$\Psi^i_k(Z^\dagger) = \frac{1}{\sqrt{b(k)}}\, Z^{\dagger k} \chi^i \qquad [53]$$

where the vectors $\chi^i$, $i = 1, 2$, form an orthonormal basis of $\mathbb{C}^2$. By virtue of [51], the vectors $|\Psi^i_k\rangle$ constitute an orthonormal set in $\tilde{\mathfrak{H}}$, that is,

$$\langle\Psi^i_k|\Psi^j_\ell\rangle_{\tilde{\mathfrak{H}}} = \delta_{k\ell}\, \delta_{ij}$$

Denote by $\mathfrak{H}_K$ the Hilbert subspace of $\tilde{\mathfrak{H}}$ generated by this set of vectors. This can be shown to be a reproducing kernel Hilbert space of analytic

functions in the variable $Z^\dagger$, with the matrix valued kernel $K : \Omega \times \Omega \to \mathbb{C}^{2\times 2}$:

$$K(Z'^\dagger, Z) = \sum_{i=1}^{2} \sum_{k=0}^{\infty} \Psi^i_k(Z'^\dagger)\, \Psi^i_k(Z^\dagger)^\dagger = \sum_{k=0}^{\infty} \frac{Z'^{\dagger k} Z^k}{b(k)} \qquad [54]$$

Vector coherent states in $\mathfrak{H}_K$ are then naturally associated to this kernel and are given by

$$|Z^\dagger, i\rangle = \sum_{j=1}^{2} \sum_{k=0}^{\infty} \frac{\chi^{j\dagger} Z^k \chi^i}{\sqrt{b(k)}}\, |\Psi^j_k\rangle, \qquad \text{that is,} \quad |Z^\dagger, i\rangle(Z'^\dagger) = K(Z'^\dagger, Z)\chi^i \qquad [55]$$

for $i = 1, 2$ and all $Z$ in $\Omega$. They satisfy the resolution of the identity

$$\sum_{i=1}^{2} \int_\Omega |Z^\dagger, i\rangle\langle Z^\dagger, i|\, d\nu(Z, Z^\dagger) = I_{\mathfrak{H}_K} \qquad [56]$$

The expression for the $|Z^\dagger, i\rangle$ in [55], involving the sum, should be compared to [18], of which it is a direct analog.

Some Applications of Coherent States

Generalized coherent states have many applications in physics, signal analysis, and mathematics, of which we mention a few here. As an example of an application of deformed coherent states, we take

As already mentioned, generalized coherent states are widely used in signal analysis. The wavelet transform $F(b, a) = \langle b, a|\phi\rangle$, introduced in [43], is a time–frequency transform, in which the parameter $b$ is identified with time and $1/a$ with frequency. Wavelet transforms are used extensively to analyze, encode, and reconstruct signals arising in many different branches of physics, engineering, seismography, electronic data processing, etc. Similarly, the canonical coherent states, as written in [11], give rise to the transform $F(q, p) = \langle q, p|\psi\rangle$. Again, if $q$ is interpreted as time and $p$ as frequency, then this is just the windowed Fourier transform, also used extensively in signal processing. More general wavelets, from higher-dimensional affine groups, are used to analyze higher-dimensional signals, while wavelet-like transforms from other groups have been used to study signals exhibiting different geometries. In particular, wavelet transforms from spherical geometries have been applied to the study of brain signals and to astrophysical data.

Our final example is taken from quantization theory. A quantization technique is a method for performing the transition from a given classical mechanical system to its quantum counterpart. Many methods have been developed to accomplish this and the use of coherent states is one of them. Suppose that we are given a family of coherent states $|x\rangle$ in a Hilbert space $\mathfrak{H}$, where the set $X$ from which $x$ is taken is a classical phase space. This means that $X$ is a symplectic manifold with an associated 2-form $\omega$, which defines a Poisson
bracket on the set of observables of the classical
 n  system, which are real-valued functions on X. There
q  qn 1=2
xn ; q>0 57 is a natural measure d!, defined on X by the 2-form
q  q1
!. Let us assume that the coherent states jxi satisfy a
in the definition of these states in [18]. It is then easy resolution on the identity with respect to this
to see that the operators A and Ay , defined in [23], measure:
satisfy the q-deformed commutation relation
Z
y y N
AA  qA A q 58 jxihxjd!x IH
X
where N is the usual number operator, which acts
on the Fock states as Njni = njni. Clearly, in the In this case, the coherent states may be used to
limit as q ! 1, these q-deformed coherent states go quantize the observables of the classical system in
over to the canonical coherent states, with the the following way: let f be a real-valued function on
operators A and Ay becoming the usual creation X, representing a classical observable and suppose
and annihilation operators a and ay , respectively. that the formal operator
The operators A and Ay and the commutation Z
relation [58] describe a system of q-deformed b
f f xjxihxjd!x 59
oscillators, which have been used to describe, for X
example, the vibrations of polyatomic molecules.
The potential energy between the atoms of such is well defined as a self-adjoint operator on H . Then
a molecule has anharmonic terms, leading to we may take the operator b f to be the quantized
a deformation of the usual oscillator algebra, observable corresponding to the classical observable
generated by the operators a and ay . f. Suppose that we have two such operators, b f and b
g,
corresponding to the two classical observables $f$ and $g$, which have the Poisson bracket $\{f, g\}$, defined via the 2-form $\omega$. We then check if the quantization condition

\[ \widehat{\{f, g\}} = \frac{2\pi}{ih}\,[\hat{f}, \hat{g}] \tag{60} \]

where $h$ is Planck's constant, is satisfied. Generally this will be the case for a certain number of classical observables. This method of quantization has been most successfully used for manifolds $X$ which have a (complex) Kähler structure. Over such a manifold, one can define a Hilbert space of analytic functions, which has a reproducing kernel and hence a naturally associated set of coherent states. As a specific example, we take the case of canonical coherent states [11]. We can identify the complex plane $\mathbb{C}$ with the phase space $\mathbb{R}^2$ of a free classical particle having a single degree of freedom. The measure $d\omega$ in this case is just $(1/2\pi)\,dq\,dp$. If we now quantize the classical observables $f(q, p) = q$ and $f(q, p) = p$, of position and momentum, respectively, using the canonical coherent states, we obtain the two operators

\[ Q = \int_{\mathbb{R}^2} q\,|q, p\rangle\langle q, p|\,\frac{dq\,dp}{2\pi}, \qquad P = \int_{\mathbb{R}^2} p\,|q, p\rangle\langle q, p|\,\frac{dq\,dp}{2\pi} \tag{61} \]

It can be verified that these two operators satisfy the canonical commutation relations $[Q, P] = i I_{\mathcal{H}}$, as required.

See also: Solitons and Kac-Moody Lie Algebras; Wavelets: Mathematical Theory.

Further Reading

Ali ST, Antoine J-P, and Gazeau J-P (2000) Coherent States, Wavelets and Their Generalizations. New York: Springer.
Ali ST and Engliš M (2005) Quantization methods: a guide for physicists and analysts. Reviews in Mathematical Physics 17: 391-490.
Brif C (1997) SU(2) and SU(1,1) algebra eigenstates: a unified analytic approach to coherent and intelligent states. International Journal of Theoretical Physics 36: 1651-1682.
Klauder JR and Sudarshan ECG (1968) Fundamentals of Quantum Optics. New York: Benjamin.
Klauder JR and Skagerstam BS (1985) Coherent States - Applications in Physics and Mathematical Physics. Singapore: World Scientific.
Perelomov AM (1986) Generalized Coherent States and their Applications. Berlin: Springer.
Schrödinger E (1926) Der stetige Übergang von der Mikro- zur Makromechanik. Naturwissenschaften 14: 664-666.
Sivakumar S (2000) Studies on nonlinear coherent states. Journal of Optics B: Quantum Semiclass. Opt. 2: R61-R75.
Zhang W-M, Feng DH, and Gilmore RG (1990) Coherent states: theory and some applications. Reviews of Modern Physics 62: 867-927.

Cohomology Theories
U Tillmann, University of Oxford, Oxford, UK
© 2006 Elsevier Ltd. All rights reserved.

Introduction

The origins of cohomology theory are found in topology and algebra at the beginning of the last century but since then it has become a tool of nearly every branch of mathematics. It's a way of life! Naturally, this article can only give a glimpse at the rich subject. We take here the point of view of algebraic topology and discuss only the cohomology of spaces.

Cohomology reflects the global properties of a manifold, or more generally of a topological space. It has two crucial properties: it only depends on the homotopy type of the space and is determined by local data. The latter property makes it in general computable.

To illustrate the interplay between the local and global structure, consider the Euler characteristic of a compact manifold; as will be explained below, cohomology is a refinement of the Euler characteristic. For simplicity, assume that the manifold $M$ is a surface and that we have chosen a way of dividing the surface into triangles. The Euler characteristic is then defined to be

\[ \chi(M) = F - E + V \]

where $F$ denotes the number of faces, $E$ the number of edges, and $V$ the number of vertices in the triangulation. Remarkably, this number does not depend on the triangulation. Yet, this simple, easy to compute number can already distinguish the different types of closed, oriented surfaces: for the sphere we have $\chi = 2$, the torus $\chi = 0$, and in general for any surface $M_g$ of genus $g$

\[ \chi(M_g) = 2 - 2g \]
The Euler characteristic also tells us something about the geometry and analysis of the manifold. For example, the total curvature of a closed surface is equal to $2\pi$ times its Euler characteristic. This is the Gauss-Bonnet theorem and an analogous result holds in higher dimensions. Another striking result is the Poincaré-Hopf theorem which equates the Euler characteristic with the total index of a vector field and thus gives strong restrictions on what kind of vector fields can exist on a manifold. This interplay between global analysis and topology has been one of the most exciting and fruitful research areas and is most powerfully expressed in the celebrated Atiyah-Singer index theorem, which determines the analytic index of an elliptic operator, such as the Dirac operator on a spin manifold, in terms of cohomology classes.

Chain Complexes and Homology

There are several different geometric definitions of the cohomology of a topological space. All share some basic algebraic structure which we will explain first.

A chain complex $(C_*, \partial_*)$

\[ \cdots \to C_{i+1} \xrightarrow{\partial_{i+1}} C_i \xrightarrow{\partial_i} C_{i-1} \to \cdots \xrightarrow{\partial_1} C_0 \tag{1} \]

is a collection of vector spaces (or $R$-modules more generally) $C_i$, $i \geq 0$, and linear maps ($R$-module maps) $\partial_i : C_i \to C_{i-1}$ with the property that for all $i$

\[ \partial_i \circ \partial_{i+1} = 0 \tag{2} \]

The scalar fields one tends to consider are the rationals $\mathbb{Q}$, reals $\mathbb{R}$, complex numbers $\mathbb{C}$, or a prime field $\mathbb{Z}_p$, while the most important ring $R$ is the ring of integers $\mathbb{Z}$, though we will also consider localizations such as $\mathbb{Z}[1/p]$, which has the effect of suppressing any $p$-primary torsion information.

Of particular interest are the elements in $C_i$ that are mapped to zero by $\partial_i$, the $i$-dimensional cycles, and those that are in the image of $\partial_{i+1}$, the $i$-dimensional boundaries. Because of [2], every boundary is a cycle, and we may define the quotient vector space ($R$-module), the $i$th-dimensional homology,

\[ H_i(C_*; \partial_*) := \frac{\ker \partial_i}{\operatorname{im} \partial_{i+1}} \tag{3} \]

$(C_*, \partial_*)$ is exact if all its cycles are boundaries. Homology thus measures to what extent the sequence [1] fails to be exact.

Simplicial Homology

A triangulation of a surface gives rise to its simplicial chain complex: Taking coefficients in $\mathbb{Z}$, $C_2$, $C_1$, $C_0$ are the free abelian groups generated by the set of faces, edges, and vertices, respectively; $C_i = \{0\}$ for $i \geq 3$. The map $\partial_2$ assigns to a triangle the sum of its edges; $\partial_1$ maps an edge to the sum of its endpoints. If we are working with $\mathbb{Z}_2$ coefficients, this defines for us a chain complex as [2] is clearly satisfied; in general, one needs to keep track of the orientations of the triangles and edges and take sums with appropriate signs (cf. [6] below). An easy calculation shows that for an oriented, closed surface $M_g$ of genus $g$, we have

\[ \begin{aligned} H_0(M_g; \mathbb{Z}) &= \mathbb{Z} \\ H_1(M_g; \mathbb{Z}) &= \mathbb{Z}^{2g} \\ H_2(M_g; \mathbb{Z}) &= \mathbb{Z} \\ H_i(M_g; \mathbb{Z}) &= 0 \quad \text{for } i \geq 3 \end{aligned} \tag{4} \]

Note that the Euler characteristic can be recovered as the alternating sum of the rank of the homology groups:

\[ \chi(M) = \sum_{i=0}^{\dim M} (-1)^i \operatorname{rk} H_i(M; \mathbb{Z}) \tag{5} \]

Every smooth manifold $M$ has a triangulation, so that its simplicial homology can be defined just as above. More generally, simplicial homology can be defined for any simplicial space, that is, a space that is built up out of points, edges, triangles, tetrahedra, etc. Formula [5] remains valid for any compact manifold or simplicial space.

Singular Homology

Let $X$ be any topological space, and let $\Delta^n$ be the oriented $n$-simplex $[v_0, \ldots, v_n]$ spanned by the standard basis vectors $v_i$ in $\mathbb{R}^{n+1}$. The set of singular $n$-chains $S_n(X)$ is the free abelian group on the set of continuous maps $\sigma : \Delta^n \to X$. The boundary of $\sigma$ is defined by the alternating sum of the restriction of $\sigma$ to the faces of $\Delta^n$:

\[ \partial_n(\sigma) := \sum_{i=0}^{n} (-1)^i\, \sigma|_{[v_0, \ldots, \hat{v}_i, \ldots, v_n]} \tag{6} \]

One easily checks that the boundary of a boundary is zero, and hence $(S_*(X), \partial_*)$ defines a chain complex. Its homology is by definition the singular homology $H_*(X; \mathbb{Z})$ of $X$. For any simplicial space, the inclusion of the simplicial chains into the singular chains induces an isomorphism of homology groups. In particular, this implies that the simplicial homology of a manifold, and hence its Euler characteristic, do not depend on its triangulation.

If in the definition of simplicial and singular homology we take free $R$-modules (where $R$ may
also be a field) instead of free abelian groups, we get the homology $H_*(X; R)$ of $X$ with coefficients in $R$. The universal-coefficient theorem describes the homology with arbitrary coefficients in terms of the homology with integer coefficients. In particular, if $R$ is a field of characteristic zero,

\[ \dim H_n(X; R) = \operatorname{rk} H_n(X; \mathbb{Z}) \]

Basic Properties of Singular Homology

While simplicial homology (and the more efficient cellular homology which we will not discuss) is easier to compute and easier to understand geometrically, singular homology lends itself more easily to theoretical treatment.

1. Homotopy invariance. Any continuous map $f : X \to Y$ induces a map on homology $f_* : H_*(X; R) \to H_*(Y; R)$ which only depends on the homotopy class of $f$.

In particular, a homotopy equivalence $f : X \to Y$ induces an isomorphism in homology. So, for example, the inclusion of the circle $S^1$ into the punctured plane $\mathbb{C} \setminus \{0\}$ is a homotopy equivalence, and thus

\[ H_i(\mathbb{C} \setminus \{0\}; R) \cong H_i(S^1; R) = \begin{cases} R & \text{for } i = 0, 1 \\ 0 & \text{for } i \geq 2 \end{cases} \]

For the one point space we have $H_0(\mathrm{pt}; R) = R$. Define reduced homology by $\tilde{H}_*(X; R) := \ker(H_*(X; R) \to H_*(\mathrm{pt}; R))$.

2. Dimension axiom. $\tilde{H}_i(\mathrm{pt}; R) = 0$ for all $i$.

More generally, it follows immediately from the definition of simplicial homology that the homology of any $n$-dimensional manifold is zero in dimensions larger than $n$.

We mentioned in the introduction that homology depends only on local data. This is made precise by the

3. Mayer-Vietoris theorem. Let $X = A \cup B$ be the union of two open subspaces. Then the following sequence is exact:

\[ \cdots \to H_n(A \cap B; R) \to H_n(A; R) \oplus H_n(B; R) \to H_n(X; R) \xrightarrow{\partial} H_{n-1}(A \cap B; R) \to \cdots \to H_0(X; R) \to 0 \]

On the level of chains, the first map is induced by the diagonal inclusion, while the second map takes the difference between the first and second summands. Finally, $\partial$ takes a cycle $c = a + b$ in the chains of $X$ that can be expressed as the sum of a chain $a$ in $A$ and $b$ in $B$ to $\partial c := \partial_n a = -\partial_n b$. For example, consider two cones, $A$ and $B$, on a space $X$ and identify them at the base $X$ to define the suspension $\Sigma X$ of $X$. Then $\Sigma X = A \cup B$ with $A, B \simeq \mathrm{pt}$ and $A \cap B \simeq X$. The boundary map $\partial$ is then an isomorphism:

\[ \tilde{H}_n(\Sigma X; R) \cong \tilde{H}_{n-1}(X; R) \quad \text{for all } n \geq 1 \tag{7} \]

From this one can easily compute the homology of a sphere. First note that

\[ \tilde{H}_0(X; \mathbb{Z}) = \mathbb{Z}^{k-1} \]

where $k$ is the number of connected components in $X$. Also, $S^n \simeq \Sigma S^{n-1} \simeq \cdots \simeq \Sigma^n S^0$. Thus, by [7],

\[ \tilde{H}_n(S^n; \mathbb{Z}) = \mathbb{Z} \quad \text{and} \quad \tilde{H}_*(S^n; \mathbb{Z}) = 0 \ \text{for } * \neq n \tag{8} \]

If $Y$ is a subspace of $X$, relative homology groups $H_*(X, Y; R)$ can be defined as the homology of the quotient complex $S_*(X)/S_*(Y)$. When $Y$ has a good neighborhood in $X$ (i.e., it is a neighborhood deformation retract in $X$), then, by the excision theorem,

\[ H_*(X, Y; R) \cong \tilde{H}_*(X/Y; R) \]

where $X/Y$ denotes the quotient space of $X$ with $Y$ identified to a point. There is a long exact sequence

\[ \cdots \to H_n(Y; R) \to H_n(X; R) \to H_n(X, Y; R) \xrightarrow{\partial} H_{n-1}(Y; R) \to \cdots \to H_0(X, Y; R) \to 0 \]

This and the Mayer-Vietoris sequence give two ways of breaking up the problem of computing the homology of a space into computing the homology of related spaces. An iteration of this process leads to the powerful tool of spectral sequences (see Spectral Sequences).

Relation to Homotopy Groups

Let $\pi_1(X, x_0)$ denote the fundamental group of $X$ relative to the base point $x_0$. These are the based homotopy classes of based maps from a circle to $X$.

\[ \text{If } X \text{ is connected, then } H_1(X; \mathbb{Z}) \text{ is the abelianization of } \pi_1(X, x_0) \tag{9} \]

Indeed, every map from a (triangulated) sphere to $X$ defines a cycle and hence gives rise to a homology class. This defines the Hurewicz map $h : \pi_*(X; x_0) \to H_*(X; \mathbb{Z})$. In general there is no good description of its image. However, if $X$ is $k$-connected with $k \geq 1$, then $h$ induces an isomorphism in dimension $k + 1$ and an epimorphism in dimension $k + 2$.

Though [9] indicates that homology cannot distinguish between all homotopy types, the fundamental group is in a sense the only obstruction to this. A simple form of the Whitehead theorem states:
Theorem If a map $f : X \to Y$ between two simplicial complexes with trivial fundamental groups induces an isomorphism on all homology groups, then it is a homotopy equivalence.

Warning: This does not imply that two simply connected spaces with isomorphic homology groups are homotopy equivalent! The existence of the map $f$ inducing this isomorphism is crucial and counterexamples can easily be constructed.

Dual Chain Complexes and Cohomology

The process of dualizing itself cannot be expected to yield any new information. Nevertheless, the cohomology of a space, which is obtained by dualizing its simplicial chain complex, carries important additional structure: it possesses a product, and moreover, when the coefficients are a prime field, it is an algebra over the rich Steenrod algebra. As with homology we start with the algebraic setup.

Every chain complex $(C_*, \partial_*)$ gives rise to a dual chain complex $(C^*, \partial^*)$ where $C^i = \hom_R(C_i, R)$ is the dual $R$-module of $C_i$; because of [2], the composition of two dual boundary morphisms $\partial^{i+1} : C^i \to C^{i+1}$ is trivial. Hence we may define the $i$th dimensional cohomology group as

\[ H^i(C^*; \partial^*) := \frac{\ker \partial^{i+1}}{\operatorname{im} \partial^{i}} \tag{10} \]

Evaluation $(\sigma, \phi) \mapsto \phi(\sigma)$ descends to a dual pairing

\[ H_n(C_*; \partial_*) \otimes_R H^n(C^*; \partial^*) \to R \]

and when $R$ is a field, this identifies the cohomology groups as the duals of the homology groups. More generally, the universal-coefficient theorem relates the two. A simple version states: let $(C_*, \partial_*)$ be a chain complex of free abelian groups (such as the simplicial or singular chain complexes) with finitely generated homology groups. Then,

\[ H^i(C^*; \partial^*) \cong H_i^{\mathrm{free}}(C_*; \partial_*) \oplus H_{i-1}^{\mathrm{tor}}(C_*; \partial_*) \tag{11} \]

where $H^{\mathrm{tor}}$ denotes the torsion subgroup of $H$ and $H^{\mathrm{free}}$ denotes the quotient group $H/H^{\mathrm{tor}}$.

Singular Cohomology

The dual $S^*(X)$ of the singular chain complex of a space $X$ carries a natural pairing, the cup product, $\cup : S^p(X) \otimes S^q(X) \to S^{p+q}(X)$ defined by

\[ (\phi_1 \cup \phi_2)(\sigma) := \phi_1(\sigma|_{[v_0, \ldots, v_p]})\, \phi_2(\sigma|_{[v_p, \ldots, v_{p+q}]}) \]

This descends to a multiplication on cohomology groups and makes $H^*(X; R) := \bigoplus_{n \geq 0} H^n(X; R)$ into an associative, graded commutative ring: $u \cup v = (-1)^{\deg u \, \deg v}\, v \cup u$.

The Künneth theorem gives some geometric intuition for the cup product. A simple version states: for spaces $X$ and $Y$ with $H^*(Y; R)$ a finitely generated free $R$-module, the cup product defines an isomorphism of graded rings

\[ H^*(X; R) \otimes_R H^*(Y; R) \to H^*(X \times Y; R) \]

For example, for a sphere, all products are trivial for dimension reasons. Hence,

\[ H^*(S^n; \mathbb{Z}) = \bigwedge(x) \tag{12} \]

is an exterior algebra on one generator $x$ of degree $n$. On the other hand, the cohomology of the $n$-dimensional torus $T^n$ is an exterior algebra on $n$ degree-1 generators,

\[ H^*(T^n; \mathbb{Z}) = \bigwedge(x_1, \ldots, x_n) \tag{13} \]

The dual pairing can be generalized to the slant or cap product

\[ \cap : H_n(X; R) \otimes_R H^i(X; R) \to H_{n-i}(X; R) \]

defined on the chain level by the formula $(\sigma, \phi) \mapsto \phi(\sigma|_{[v_0, \ldots, v_i]})\, \sigma|_{[v_i, \ldots, v_n]}$.

Steenrod Algebra

The cup product on the chain level is homotopy commutative, but not commutative. Steenrod used this defect to define operations

\[ Sq^i : H^n(X; \mathbb{Z}_2) \to H^{n+i}(X; \mathbb{Z}_2) \]

for all $i \geq 0$ which refine the cup-squaring operation: when $n = i$, then $Sq^n(x) = x \cup x$. These are natural group homomorphisms which commute with suspension. Furthermore, they satisfy the Cartan and Adem relations

\[ Sq^n(x \cup y) = \sum_{i+j=n} Sq^i(x) \cup Sq^j(y) \]

\[ Sq^i Sq^j = \sum_{k=0}^{[i/2]} \binom{j-k-1}{i-2k}\, Sq^{i+j-k} Sq^k \quad \text{for } i < 2j \]

The mod-2 Steenrod algebra $\mathcal{A}$ is then the free $\mathbb{Z}_2$-algebra generated by the Steenrod squares $Sq^i$, $i \geq 0$, subject only to the Adem relations. With the help of Adem's relations, Serre and Cartan found a $\mathbb{Z}_2$-basis for $\mathcal{A}$:

\[ \{ Sq^I := Sq^{i_1} \cdots Sq^{i_n} \mid i_j \geq 2 i_{j+1} \text{ for all } j \} \]
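The Adem relation can be evaluated mechanically, since only the parity of the binomial coefficient matters mod 2. The sketch below is an illustration added here, not part of the original article; the function names `adem` and `is_admissible` are hypothetical, and `Sq^0` is treated as the identity, a standard convention that the text does not state explicitly.

```python
from math import comb

def adem(i, j):
    """Expand Sq^i Sq^j for 0 < i < 2j using the Adem relation mod 2.

    Returns a list of pairs (a, b) standing for Sq^a Sq^b, where b = 0
    means the single square Sq^a (Sq^0 is assumed to be the identity).
    An empty list means the product is zero."""
    if not 0 < i < 2 * j:
        raise ValueError("the Adem relation applies only for 0 < i < 2j")
    terms = []
    for k in range(i // 2 + 1):
        # keep the term only if the binomial coefficient is odd
        if comb(j - k - 1, i - 2 * k) % 2 == 1:
            terms.append((i + j - k, k))
    return terms

def is_admissible(seq):
    """Check the Serre-Cartan condition i_j >= 2 * i_{j+1} for all j."""
    return all(a >= 2 * b for a, b in zip(seq, seq[1:]))

print(adem(1, 1))  # Sq^1 Sq^1
print(adem(1, 2))  # Sq^1 Sq^2
print(adem(2, 2))  # Sq^2 Sq^2
print(adem(2, 3))  # Sq^2 Sq^3
```

Every term produced is admissible in the sense of the Serre-Cartan basis, which is how the relations rewrite an arbitrary product of squares into basis elements.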
The Steenrod algebra is also a Hopf algebra with a commutative comultiplication $\Delta : \mathcal{A} \to \mathcal{A} \otimes \mathcal{A}$ induced by

\[ \Delta(Sq^n) := \sum_{i+j=n} Sq^i \otimes Sq^j \]

The Cartan relation implies that the mod-2 cohomology of a space is compatible with the comultiplication, that is, $H^*(X; \mathbb{Z}_2)$ is an algebra over the Hopf algebra $\mathcal{A}$. There are odd primary analogs of the Steenrod algebra based on the reduced $p$th power operations

\[ P^i : H^n(X; \mathbb{Z}_p) \to H^{n+2i(p-1)}(X; \mathbb{Z}_p) \]

with similar properties to $\mathcal{A}$.

One of the most striking applications of the Steenrod algebra can be found in the work of Adams on the vector fields on spheres problem: for each $n$, find the greatest number $k$, denoted $K(n)$, such that there is a $k$-field on the $(n-1)$-sphere $S^{n-1}$. Recall that a $k$-field is an ordered set of $k$ pointwise linearly independent tangent vector fields. If we write $n$ in the form $n = 2^{4a+b}(2s+1)$ with $0 \leq b < 4$, Adams proved that $K(n) = 2^b + 8a - 1$. In particular, when $n$ is odd, $K(n) = 0$. We give an outline of the proof for this special case in the next section.

The failure of associativity of the cup product at the chain level gives rise to secondary operations, the so-called Massey products.

Cohomology of Smooth Manifolds

A smooth manifold $M$ of dimension $n$ can be triangulated by smooth simplices $\sigma : \Delta^n \to M$. If $M$ is compact, oriented, without boundary, the sum of these simplices defines a homology cycle $[M]$, the fundamental class of $M$. The most remarkable property of the cohomology of manifolds is that they satisfy Poincaré duality: taking cap product with $[M]$ defines an isomorphism:

\[ D := [M] \cap \; : H^k(M; \mathbb{Z}) \to H_{n-k}(M; \mathbb{Z}) \quad \text{for all } k \tag{14} \]

In particular, for connected manifolds, $H^n(M; \mathbb{Z}) \cong \mathbb{Z}$; and every map $f : M' \to M$ between oriented, compact closed manifolds of the same dimension has a degree: $f^* : H^n(M; \mathbb{Z}) \to H^n(M'; \mathbb{Z})$ is multiplication by an integer $\deg(f)$, the degree of $f$. For smooth maps, the degree is the number of points in the inverse image of a generic point $p \in M$ counted with signs:

\[ \deg(f) = \sum_{p' \in f^{-1}(p)} \operatorname{sign}(p') \]

where $\operatorname{sign}(p')$ is $+1$ or $-1$ depending on whether $f$ is orientation preserving or reversing in a neighborhood of $p'$. For example, a complex polynomial of degree $d$ defines a map of the two-dimensional sphere to itself of degree $d$: a generic point has $d$ points in its inverse image and the map is locally orientation preserving. On the other hand, a map of $S^{n-1}$ induced by a reflection of $\mathbb{R}^n$ reverses orientation and has degree $-1$. Thus, as degrees multiply on composing maps, the antipodal map $x \mapsto -x$ has degree $(-1)^n$. As an application we prove:

Every tangent vector field on an even-dimensional sphere $S^{n-1}$ has a zero.

Proof Assume $v(x)$ is a vector field which is nonzero for all $x \in S^{n-1}$. Then $x$ is perpendicular to $v(x)$, and after rescaling, we may assume that $v(x)$ has length 1. The function $F(x, t) = \cos(t)\,x + \sin(t)\,v(x)$ is a well-defined homotopy from the identity map ($t = 0$) to the antipodal map ($t = \pi$). But this is impossible as homotopic maps induce the same map in (co)homology and we have already seen that the degree of the identity map is 1 while the degree of the antipodal map is $(-1)^n = -1$ when $n$ is odd.

It is well known that two self-maps of a sphere of any dimension are homotopic if and only if they have the same degree, that is, $\pi_n(S^n) \cong \mathbb{Z}$ for $n \geq 1$.

When $M$ is not orientable, $[M]$ still defines a cycle in homology with $\mathbb{Z}_2$-coefficients, and $[M] \cap$ defines an isomorphism between the cohomology and homology with $\mathbb{Z}_2$ coefficients.

As $[M]$ represents a homology class, so does every other closed (orientable) submanifold of $M$. It is however not the case that every homology class can be represented by a submanifold or linear combinations of such.

Cohomology is a contravariant functor. Poincaré duality however allows us to define, for any $f : M' \to M$ between oriented, compact, closed manifolds of arbitrary dimensions, a transfer or Umkehr map,

\[ f_! := D^{-1} f_* D' : H^*(M'; \mathbb{Z}) \to H^{*-c}(M; \mathbb{Z}) \]

which lowers the degree by $c = \dim M' - \dim M$. It satisfies the formula

\[ f_!(f^*(x) \cup y) = x \cup f_!(y) \]

for all $x \in H^*(M; \mathbb{Z})$ and $y \in H^*(M'; \mathbb{Z})$. When $f$ is a covering map then $f_!$ can be defined on the chain level by

\[ f_!(x)(\sigma) := \sum_{f(\tilde{\sigma}) = \sigma} x(\tilde{\sigma}) \]

where $x \in C^*(M')$ and $\sigma \in C_*(M)$.
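The vector-field statement proved above is the odd-$n$ case of Adams' formula $K(n) = 2^b + 8a - 1$ quoted in the discussion of the Steenrod algebra. A minimal sketch of how that formula is evaluated follows; the function name `adams_K` is a hypothetical choice made here, and the code only evaluates the stated formula rather than reproducing any part of Adams' proof.

```python
def adams_K(n):
    """Evaluate K(n) = 2^b + 8a - 1, where n = 2^(4a+b) * (2s+1) with
    0 <= b < 4. K(n) is the maximal size of a k-field on S^(n-1)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    exponent = 0                     # exponent of 2 in n, equal to 4a + b
    while n % 2 == 0:
        n //= 2
        exponent += 1
    a, b = divmod(exponent, 4)
    return 2 ** b + 8 * a - 1

for n in (1, 2, 3, 4, 8, 16, 32):
    print("S^%d carries at most a %d-field" % (n - 1, adams_K(n)))
```

For odd $n$ the exponent is zero, so $a = b = 0$ and $K(n) = 0$, in agreement with the theorem that every tangent vector field on an even-dimensional sphere has a zero.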
de Rham Cohomology

If $x_1, \ldots, x_n$ are the local coordinates of $\mathbb{R}^n$, define an algebra $\Omega^*$ to be the algebra generated by symbols $dx_1, \ldots, dx_n$ subject to the relations $dx_i\,dx_j = -dx_j\,dx_i$ for all $i, j$. We say $dx_{i_1} \cdots dx_{i_q}$ has degree $q$. The differential forms on $\mathbb{R}^n$ are the algebra

\[ \Omega^*(\mathbb{R}^n) := \{ C^\infty \text{ functions on } \mathbb{R}^n \} \otimes_{\mathbb{R}} \Omega^* \]

The algebra $\Omega^*(\mathbb{R}^n) = \bigoplus_{q=0}^{n} \Omega^q(\mathbb{R}^n)$ is naturally graded by degree. There is a differential operator $d : \Omega^q(\mathbb{R}^n) \to \Omega^{q+1}(\mathbb{R}^n)$ defined by

1. if $f \in \Omega^0(\mathbb{R}^n)$, then $df = \sum (\partial f/\partial x_i)\,dx_i$
2. if $\omega = \sum f_I\,dx_I$, then $d\omega = \sum df_I\,dx_I$

$I$ stands here for a multi-index. For example, in $\mathbb{R}^3$ the differential assigns to 0-forms (= functions) the gradient, to 1-forms the curl, and to 2-forms the divergence. An easy exercise shows that $d^2 = 0$ and the $q$th de Rham cohomology of $\mathbb{R}^n$ is the vector space

\[ H^q_{\mathrm{de\,R}}(\mathbb{R}^n) := \frac{\ker d : \Omega^q(\mathbb{R}^n) \to \Omega^{q+1}(\mathbb{R}^n)}{\operatorname{im} d : \Omega^{q-1}(\mathbb{R}^n) \to \Omega^q(\mathbb{R}^n)} \]

More generally, the de Rham complex $\Omega^*(M)$ and its cohomology $H^*_{\mathrm{de\,R}}(M)$ can be defined for any smooth manifold $M$.

Let $\sigma$ be a smooth, singular, real $(q+1)$-chain on $M$, and let $\omega \in \Omega^q(M)$. Stokes' theorem then says

\[ \int_{\partial\sigma} \omega = \int_{\sigma} d\omega \]

and therefore integration defines a pairing between the $q$th singular homology and the $q$th de Rham cohomology of $M$. This pairing is perfect and thus de Rham cohomology is isomorphic to singular cohomology with real coefficients:

\[ H^*_{\mathrm{de\,R}}(M) \cong H_*(M; \mathbb{R})^* \cong H^*(M; \mathbb{R}) \]

Let $\Omega^*_c(M)$ denote the subcomplex of compactly supported forms and $H^*_c(M)$ its cohomology. Integration with respect to the first $i$ coordinates defines a map

\[ \Omega^*_c(\mathbb{R}^n) \to \Omega^{*-i}_c(\mathbb{R}^{n-i}) \]

which induces an isomorphism in cohomology; note in particular $H^n_c(\mathbb{R}^n) = \mathbb{R}$. More generally, when $E \to M$ is an $i$-dimensional orientable, real vector bundle over a compact, orientable manifold $M$, integration over the fiber gives the Thom isomorphism:

\[ H^*_c(E) \cong H^{*-i}_c(M) = H^{*-i}_{\mathrm{de\,R}}(M) \]

For orientable fiber bundles $F \to M' \xrightarrow{f} M$ with compact, orientable fiber $F$, integration over the fiber provides another definition of the transfer map

\[ f_! : H^*_{\mathrm{de\,R}}(M') \to H^{*-i}_{\mathrm{de\,R}}(M) \]

Hodge Decomposition

Let $M$ be a compact oriented Riemannian manifold of dimension $n$. The Hodge star operator, $*$, associates to every $q$-form an $(n-q)$-form. For $\mathbb{R}^n$ and any orthonormal basis $\{e_1, \ldots, e_n\}$, it is defined by setting

\[ *(e_1 \wedge \cdots \wedge e_q) := \pm\, e_{q+1} \wedge \cdots \wedge e_n \]

where one takes $+$ if the orientation defined by $\{e_1, \ldots, e_n\}$ is the same as the given one, and $-$ otherwise. Using local coordinate charts this definition can be extended to $M$. Clearly, $*$ depends on the chosen metric and orientation of $M$. If $M$ is compact, we may define an inner product on the $q$-forms by

\[ (\omega, \omega') := \int_M \omega \wedge *\omega' \]

With respect to this inner product $*$ is an isometry. Define the codifferential via

\[ \delta := (-1)^{nq+n+1} * d\, * : \Omega^q(M) \to \Omega^{q-1}(M) \]

and the Laplace-Beltrami operator via

\[ \Delta := \delta d + d \delta \]

The codifferential satisfies $\delta^2 = 0$ and is the adjoint of the differential. Indeed, for $q$-forms $\omega$ and $(q+1)$-forms $\omega'$:

\[ (d\omega, \omega') = (\omega, \delta\omega') \tag{15} \]

It follows easily that $\Delta$ is self-adjoint, and furthermore,

\[ \Delta\omega = 0 \ \text{ if and only if } \ d\omega = 0 \text{ and } \delta\omega = 0 \tag{16} \]

A form $\omega$ satisfying $\Delta\omega = 0$ is called harmonic. Let $\mathcal{H}^q$ denote the subspace of all harmonic $q$-forms. It is not hard to prove the Hodge decomposition theorem:

\[ \Omega^q = \mathcal{H}^q \oplus \operatorname{im} d \oplus \operatorname{im} \delta \]

Furthermore, by adjointness [15], a form $\omega$ is closed if and only if it is orthogonal to $\operatorname{im} \delta$. On calculating the de Rham cohomology we can also ignore the summand $\operatorname{im} d$ and find that:

Each de Rham cohomology class on a compact oriented Riemannian manifold $M$ contains a unique harmonic representative, that is, $H^q_{\mathrm{de\,R}}(M) \cong \mathcal{H}^q$.

Warning: This is an isomorphism of vector spaces and in general does not extend to an isomorphism of algebras.

Examples

We list the cohomology of some important examples.
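As a warm-up example, the Hodge theorem can be checked by hand on the torus. The following is a sketch under an assumption not made in the text, namely that $T^n = \mathbb{R}^n/\mathbb{Z}^n$ carries the flat metric induced from the Euclidean one, with periodic coordinates $x_1, \ldots, x_n$. In such coordinates the Laplace-Beltrami operator acts on each coefficient function separately, and a harmonic function on a compact connected manifold is constant:

```latex
\[
  \Delta\Big(\sum_I f_I\,dx_I\Big)
  = -\sum_I \Big(\sum_{j=1}^{n} \frac{\partial^2 f_I}{\partial x_j^2}\Big)\,dx_I ,
\]
\[
  \mathcal{H}^q = \operatorname{span}\{\, dx_{i_1}\cdots dx_{i_q} : i_1 < \cdots < i_q \,\},
  \qquad
  \dim \mathcal{H}^q = \binom{n}{q}.
\]
```

This agrees with the exterior algebra on $n$ degree-1 generators in [13], taken with real coefficients. In this special case products of harmonic forms are again harmonic; the warning above concerns general metrics, where this need not hold.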
Projective Spaces

Let $\mathbb{RP}^n$ be real projective space of dimension $n$. Then,

\[ H^*(\mathbb{RP}^n; \mathbb{Z}_2) = \mathbb{Z}_2[x]/(x^{n+1}) \]

is a stunted polynomial ring on a generator $x$ of degree 1.

Similarly, let $\mathbb{CP}^n$ and $\mathbb{HP}^n$ denote complex and quaternionic projective space of real dimensions $2n$ and $4n$, respectively. Then,

\[ H^*(\mathbb{CP}^n; \mathbb{Z}) = \mathbb{Z}[y]/(y^{n+1}), \qquad H^*(\mathbb{HP}^n; \mathbb{Z}) = \mathbb{Z}[z]/(z^{n+1}) \]

are stunted polynomial rings with $\deg(y) = 2$ and $\deg(z) = 4$.

Lie Groups

Let $G$ be a compact, connected Lie group of rank $l$, that is, the dimension of the maximal torus of $G$ is $l$. Then,

\[ H^*(G; \mathbb{Q}) \cong \bigwedge\nolimits_{\mathbb{Q}} [a_{2d_1-1}, a_{2d_2-1}, \ldots, a_{2d_l-1}] \]

where $|a_i| = i$ and $d_1, \ldots, d_l$ are the fundamental degrees of $G$ which are known for all $G$. Often this structure lifts to the integral cohomology. In particular we have:

\[ \begin{aligned} H^*_{\mathrm{free}}(SO(2k+1); \mathbb{Z}) &= \bigwedge\nolimits_{\mathbb{Z}} [a_3, a_7, \ldots, a_{4k-1}] \\ H^*_{\mathrm{free}}(SO(2k); \mathbb{Z}) &= \bigwedge\nolimits_{\mathbb{Z}} [a_3, a_7, \ldots, a_{4k-5}, a_{2k-1}] \\ H^*(U(k); \mathbb{Z}) &= \bigwedge\nolimits_{\mathbb{Z}} [a_1, a_3, \ldots, a_{2k-1}] \end{aligned} \]

Classifying Spaces

For any group $G$ there exists a classifying space $BG$, well defined up to homotopy. Classifying spaces are of central interest to geometers and topologists, for the set of isomorphism classes of principal $G$-bundles over a space $X$ is in one-to-one correspondence with the set of homotopy classes of maps from $X$ to $BG$. In particular, every cohomology class $c \in H^*(BG; R)$ defines a characteristic class of principal $G$-bundles $E$ over $X$: if $E$ corresponds to the map $f_E : X \to BG$, then $c(E) := f_E^*(c)$.

$BG$ can be constructed as the space of $G$-orbits of a contractible space $EG$ on which $G$ acts freely. Thus, for example,

\[ \begin{aligned} B\mathbb{Z} &= \mathbb{R}/\mathbb{Z} \simeq S^1 \\ B\mathbb{Z}_2 &= \lim_{n \to \infty} S^n/\mathbb{Z}_2 = \mathbb{RP}^\infty \\ BS^1 &= \lim_{n \to \infty} S^{2n+1}/S^1 = \mathbb{CP}^\infty \end{aligned} \]

and more generally, infinite Grassmannian manifolds are classifying spaces for linear groups. When $G$ is a compact connected Lie group,

\[ H^*(BG; \mathbb{Q}) \cong \mathbb{Q}[x_{2d_1}, \ldots, x_{2d_l}] \]

with $d_i$ as above and $|x_i| = i$. In particular,

\[ \begin{aligned} H^*(BSO(2k+1); \mathbb{Z}[1/2]) &= \mathbb{Z}[1/2][p_1, p_2, \ldots, p_k] \\ H^*(BSO(2k); \mathbb{Z}[1/2]) &= \mathbb{Z}[1/2][p_1, p_2, \ldots, p_{k-1}, e_k] \\ H^*(BU(k); \mathbb{Z}) &= \mathbb{Z}[c_1, c_2, \ldots, c_k] \end{aligned} \]

where the Pontryagin, Euler, and Chern classes have degree $|p_i| = 4i$, $|e_k| = 2k$, and $|c_i| = 2i$, respectively.

Moduli Spaces

Let $\mathcal{M}_g^n$ be the space of Riemann surfaces of genus $g$ with $n$ ordered, marked points. There are naturally defined classes $\kappa_i$ and $e_1, \ldots, e_n$ of degree $2i$ and 2, respectively. By Harer-Ivanov stability and the recent proof of the Mumford conjecture (Madsen-Weiss, preprint 2004), there is an isomorphism up to degree $* < 3g/2$ of the rational cohomology of $\mathcal{M}_g^n$ with

\[ \mathbb{Q}[\kappa_1, \kappa_2, \ldots] \otimes \mathbb{Q}[e_1, \ldots, e_n] \]

The rational cohomology vanishes in degrees $* > 4g - 5$ if $n = 0$, and $* > 4g - 4 + n$ if $n > 0$. Though the stable part of the cohomology is now well understood, the structure of the unstable part, as proposed by Faber (Viehweg 1999), remains conjectural.

Generalized Cohomology Theories

The three basic properties of singular homology, appropriately dualized, hold of course also for cohomology. Furthermore, they (essentially) determine (co)homology uniquely as a functor from the category of simplicial spaces and continuous functions to the category of abelian groups. If we drop the dimension axiom (2), we are left with homotopy invariance (1), and the Mayer-Vietoris sequence (3). Abelian group valued functors satisfying (1) and (3) are so called generalized (co)homology theories.
K-theory and cobordism theory are two well-known examples but there are many more.

K-Theory

The geometric objects representing elements in complex K-theory $K^0(X)$ are isomorphism classes of finite dimensional complex vector bundles $E$ over $X$. Vector bundles $E$, $E'$ can be added to form a new bundle $E \oplus E'$ over $X$, and $K^0(X)$ is just the group completion of the arising monoid. Thus, for example, for the point space we have $K^0(\mathrm{pt}) = \mathbb{Z}$. Tensor product of vector bundles $E \otimes E'$ induces a multiplication on K-theory making $K^*(X)$ into a graded commutative ring.

In many ways K-theory is easier than cohomology. In particular, the groups are 2-periodic: all even degree groups are isomorphic to the reduced K-theory group $\tilde{K}^0(X) := \operatorname{coker}(K^0(\mathrm{pt}) = \mathbb{Z} \to K^0(X))$, and all odd degree groups are isomorphic to $K^1(X) := \tilde{K}^0(\Sigma X)$.

The theory of characteristic classes gives a close relation between the two cohomology theories. The Chern character map, a rational polynomial in the Chern classes, defines

\[ \mathrm{ch} : K^0(X) \otimes_{\mathbb{Z}} \mathbb{Q} \to H^{\mathrm{even}}(X; \mathbb{Q}) := \bigoplus_{k \geq 0} H^{2k}(X; \mathbb{Q}) \]

an isomorphism of rings. Thus, the K-theory and cohomology of a space carry the same rational information. But they may have different torsion parts. This became an issue in string theory when D-brane charges which had formerly been thought of as differential forms (and hence cohomology classes) were later reinterpreted more naturally as K-theory classes by Witten (1998).

There are real and quaternionic K-theory groups which are 8-periodic.

Elliptic Cohomology

Quillen proved that complex cobordism theory is universal for all complex oriented cohomology theories, that is, those cohomology theories that allow a theory of Chern classes. In a complex oriented theory, the first Chern class of the tensor product of two line bundles can be expressed in terms of the first Chern class of each of them via a two-variable power series: $c_1(E \otimes E') = F(c_1(E), c_1(E'))$. $F$ defines a formal group law and Quillen's theorem asserts that the one arising from complex cobordism theory is the universal one.

Vice versa, given a formal group law, one may try to construct a complex oriented cohomology theory from it. In particular, an elliptic curve gives rise to a formal group law and an elliptic cohomology theory. Hopkins et al. have described and studied an inverse limit of these elliptic theories, which they call the theory of topological modular forms, tmf, as the theory is closely related to modular forms. In particular, there is a natural map from the groups $\mathrm{tmf}^{2n}(\mathrm{pt})$ to the group of modular forms of weight $n$ over $\mathbb{Z}$. After inverting a certain element (related to the discriminant), the theory becomes periodic with period $24^2 = 576$.

Witten (1998) showed that the purely theoretically constructed elliptic cohomology theories should play an important role in string theory: the index of the Dirac operator on the free loop space of certain manifolds should be interpreted as an element of it. But unlike for ordinary cohomology, K-theory, and cobordism theory, we do not (yet) know a good geometric object representing elements in this theory, without which its use for geometry and analysis remains limited. Segal speculated some 20 years ago that conformal field theories should define such geometric objects. Though progress has been made, the search for a good geometric interpretation of elliptic cohomology (and tmf) remains an active and important research area.
Cobordism Theory
The geometric objects representing an element in the
Infinite Loop Spaces
oriented cobordism group nSO (X) are pairs (M, f )
where M is a smooth, orientable n-dimensional Browns representability theorem implies that for
manifold and f : M ! X is a continuous map. Two each (reduced) generalized cohomology theory h we
pairs (M, f) and (M0 , f 0 ) represent the same cobord- can find a sequence of spaces E such that hn (X) is
ism class if there exists a pair (W, F) where W is an the set of homotopy classes [X, En ] from the space X
(n 1)-dimensional, smooth, oriented manifold to En for all n. Recall that the MayerVietoris
with boundary @W = M [ M0 such that F: W ! X sequence implies that hn (X) hn1 (X). The sus-
restricts to f and f 0 on the boundary @W. Disjoint pension functor  is adjoint to the based loop space
union and Cartesian product of manifolds define an functor  which takes a space X to the space of
addition and multiplication so that SO (X) is a based maps from the circle to X. Hence,
graded, commutative ring.
hn X X; En  X; En1 
Similarly, unoriented, complex, or spin cobordism
groups can be defined. X; En1 
and it follows that every generalized cohomology theory is represented by an infinite loop space

$$E_0 \simeq \Omega E_1 \simeq \cdots \simeq \Omega^n E_n \simeq \cdots$$

Vice versa, any such infinite loop space gives rise to a generalized cohomology theory.

One may think of infinite loop spaces as the abelian groups up to homotopy in the strongest sense. Indeed, ordinary cohomology with integer coefficients is represented by

$$\mathbb{Z} \simeq \Omega S^1 \simeq \Omega^2 \mathbb{CP}^\infty \simeq \cdots \simeq \Omega^n K(n, \mathbb{Z}) \simeq \cdots$$

where by definition the Eilenberg–MacLane space $K(n, \mathbb{Z})$ has trivial homotopy groups for all dimensions not equal to n and $\pi_n K(n, \mathbb{Z}) = \mathbb{Z}$. Complex K-theory is represented by

$$\mathbb{Z} \times BU \simeq \Omega U \simeq \Omega^2 BU \simeq \Omega^3 U \simeq \cdots$$

This is Bott's celebrated periodicity theorem. Finally, oriented cobordism theory is represented by

$$\Omega^\infty MSO := \lim_{n \to \infty} \Omega^n \mathrm{Th}(\gamma_n)$$

where $\gamma_n \to BSO_n$ is the universal n-dimensional vector bundle over the Grassmannian manifold of oriented n-planes in $\mathbb{R}^\infty$, and $\mathrm{Th}(\gamma_n)$ denotes its Thom space.

A good source of infinite loop spaces are symmetric monoidal categories. Indeed every infinite loop space can be constructed from such a category: the symmetric monoidal structure gives the corresponding homotopy abelian group structure. For example, the category of finite-dimensional, complex vector spaces and their isomorphisms gives rise to $\mathbb{Z} \times BU$. To give another example, in quantum field theory, one considers the (d+1)-dimensional cobordism category with objects the compact, oriented d-dimensional manifolds, and their (d+1)-dimensional cobordisms as morphisms. Disjoint union of manifolds makes this category into a symmetric monoidal category. The associated infinite loop space and hence generalized cohomology theory has recently been identified as a (d+1)-dimensional slice of oriented cobordism theory (Galatius et al. preprint 2005).

See also: Characteristic Classes; Equivariant Cohomology and the Cartan Model; Functional Equations and Integrable Systems; Index Theorems; Intersection Theory; K-Theory; Moduli Spaces: An Introduction; Riemann Surfaces; Spectral Sequences.

Further Reading

Adams F (1978) Infinite Loop Spaces. Annals of Mathematical Studies 90: PUP.
Bott R and Tu L (1982) Differential Forms in Algebraic Topology. Springer.
Galatius, Madsen, Tillmann, Weiss (2005).
Hatcher A (2002) Algebraic Topology, (http://www.math.cornell.edu). Cambridge: Cambridge University Press.
Madsen, Weiss (2004).
Mosher R and Tangora M (1968) Cohomology Operations and Applications in Homotopy Theory. Harper and Row.
Viehweg (1999) Aspects of Mathematics E33.
Witten (1998) Journal of Higher Energy Physics 12.
Witten (1998) Springer Lecture Notes in Mathematics, vol. 1326.

Combinatorics: Overview
C Krattenthaler, Universität Wien, Vienna, Austria
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Combinatorics is a vast field which enters particularly in a crucial way in statistical physics. There, it is particularly the enumerative problems that are of importance. Therefore, in this article, we shall mainly concentrate on the enumerative aspects of combinatorics. We first recall the basic terminology, in particular the basic combinatorial objects and numbers, together with the simplest facts about them. We then provide introductions into the most important techniques of enumeration: the generating function technique, Redfield–Pólya theory, methods of solving functional equations of combinatorial origin, methods of asymptotic enumeration, the theory of heaps, and the transfer matrix method. The subsequent sections then discuss specific problem circles with relation to statistical physics more closely. We discuss lattice path problems, explain Kasteleyn's method of enumerating perfect matchings and tilings, present the fundamental theorems on nonintersecting paths, and provide an introduction into the research field involving vicious walkers, plane partitions, rhombus tilings, alternating sign matrices, six-vertex configurations, and fully packed loop configurations. Finally, we explain how one should treat binomial and hypergeometric series, which frequently arise in enumeration problems.
Basic Combinatorial Terminology

In this section we review basic combinatorial notions and facts. The reader can find a more detailed treatment and further results, for example, in chapter 1 of Stanley (1986).

The basic combinatorial choice problems and their solutions are: there are $2^n$ subsets of an n-element set. There are $\binom{n}{k}$ k-element subsets of an n-element set. Given an alphabet $A = \{a_1, a_2, \ldots\}$, a word is a (finite or infinite) sequence of elements of A. Usually, a finite word is written in the form $w_1 w_2 \ldots w_n$ (with $w_i \in A$). Out of the letters {1, 2, ..., k}, one can build $k^n$ words of length n. Out of the letters {1, 2, ..., k}, one can build $\binom{n+k-1}{n}$ (weakly) increasing sequences of length n. The number of permutations of an n-element set is n!. The set of permutations of {1, 2, ..., n} is denoted by $S_n$. The number of permutations of an n-element set with exactly k cycles is the Stirling number of the first kind, s(n, k). These numbers are given as the expansion coefficients of falling factorials,

$$x(x-1)\cdots(x-n+1) = \sum_{k=0}^{n} (-1)^{n-k} s(n,k)\, x^k$$

or in form of the double (formal) power series

$$\sum_{n,k \geq 0} s(n,k)\, x^k \frac{y^n}{n!} = (1-y)^{-x}$$

A partition of a set is a collection of pairwise disjoint subsets the union of which is the complete set. The subsets in the collection are called the blocks of the partition. The total number of partitions of an n-element set is the Bell number $B_n$. These numbers are given by

$$\sum_{n \geq 0} B_n \frac{x^n}{n!} = e^{e^x - 1}$$

The number of partitions of an n-element set into exactly k blocks is the Stirling number of the second kind, S(n, k). These numbers are given by

$$\sum_{n,k \geq 0} S(n,k)\, x^k \frac{y^n}{n!} = e^{x(e^y - 1)}$$

or, explicitly, by

$$S(n,k) = \frac{1}{k!} \sum_{j=0}^{k} (-1)^{k-j} \binom{k}{j} j^n$$

A composition of a positive integer n is a representation of n as a sum $n = s_1 + s_2 + \cdots + s_k$ of other positive integers $s_i$, where the order of the summands matters. The total number of compositions of n is $2^{n-1}$. The number of compositions of n with exactly k summands is $\binom{n-1}{k-1}$.

A partition of a positive integer n is a representation of n as a sum $n = \lambda_1 + \lambda_2 + \cdots + \lambda_k$ of other positive integers $\lambda_i$, where the order of the summands does not matter. Thus, we may assume that the summands are ordered, $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_k > 0$. This is the motivation to write partitions most often in the form of tuples $(\lambda_1, \lambda_2, \ldots, \lambda_k)$ the entries of which are weakly decreasing. The summands of a partition are called the parts of the partition. Let p(n) denote the number of partitions of n. These numbers are given by

$$\sum_{n \geq 0} p(n)\, x^n = \frac{1}{\prod_{i=1}^{\infty} (1 - x^i)}$$

If p(n, k) denotes the number of partitions of n into at most k parts, then we have

$$\sum_{n \geq 0} p(n,k)\, x^n = \frac{1}{\prod_{i=1}^{k} (1 - x^i)}$$

Finally, if p(n, k, m) denotes the number of partitions of n into at most k parts, all of which are at most m, then

$$\sum_{n \geq 0} p(n,k,m)\, x^n = \frac{(1 - x^{k+m})(1 - x^{k+m-1}) \cdots (1 - x^{m+1})}{(1 - x^k)(1 - x^{k-1}) \cdots (1 - x)}$$

The expression on the right-hand side is called the q-binomial coefficient, and is denoted by $\begin{bmatrix} k+m \\ k \end{bmatrix}_x$.

Partitions are frequently encoded in terms of their Ferrers diagrams. The Ferrers diagram of a partition $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_\ell)$ is an array of cells with left-justified rows and $\lambda_i$ cells in row i. For example, the diagram in Figure 1 is the Ferrers diagram of the partition (3, 3, 2).

Figure 1 A Ferrers diagram.

A lattice path P in $\mathbb{Z}^d$ (where $\mathbb{Z}$ denotes the set of integers) is a path in the d-dimensional integer lattice $\mathbb{Z}^d$ which uses only points of the lattice, that is, it is a sequence $(P_0, P_1, \ldots, P_l)$, where $P_i \in \mathbb{Z}^d$ for all i. The vectors $\overrightarrow{P_0 P_1}, \overrightarrow{P_1 P_2}, \ldots, \overrightarrow{P_{l-1} P_l}$ are called the steps of P. The number of steps, l, is called the length of P. Figure 2 shows a lattice path in $\mathbb{Z}^2$ of length 11.
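The counting formulas of this section are easy to check numerically for small n. The following Python sketch is an illustration only: the function names (`stirling2`, `bell`, `partition_counts`, `compositions`) are assumptions made for this example and do not come from the article. It evaluates the explicit formula for S(n, k), sums it to obtain Bell numbers, expands the product formula for p(n) as a truncated power series, and lists compositions by brute force.

```python
from math import comb, factorial

def stirling2(n, k):
    """Explicit formula: S(n,k) = (1/k!) * sum_j (-1)^(k-j) C(k,j) j^n."""
    total = sum((-1) ** (k - j) * comb(k, j) * j ** n for j in range(k + 1))
    return total // factorial(k)  # the sum is always divisible by k!

def bell(n):
    """Bell number B_n as the sum of S(n,k) over all block counts k."""
    return sum(stirling2(n, k) for k in range(n + 1))

def partition_counts(n_max):
    """Coefficients p(0), ..., p(n_max) of prod_{i>=1} 1/(1 - x^i)."""
    p = [1] + [0] * n_max
    for part in range(1, n_max + 1):       # multiply by 1/(1 - x^part)
        for n in range(part, n_max + 1):
            p[n] += p[n - part]
    return p

def compositions(n):
    """All compositions of n (ordered sums), generated by brute force."""
    if n == 0:
        return [[]]
    return [[first] + rest
            for first in range(1, n + 1)
            for rest in compositions(n - first)]

if __name__ == "__main__":
    print([bell(n) for n in range(6)])
    print(partition_counts(10))
    print(len(compositions(5)), 2 ** (5 - 1))
```

The brute-force enumeration of compositions is exponential in n and is meant only as an independent check of the closed forms $2^{n-1}$ and $\binom{n-1}{k-1}$.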
Figure 2 A Motzkin path.

A Dyck path is a lattice path in the integer plane $\mathbb{Z}^2$ consisting of up-steps (1, 1) and down-steps (1, −1), which starts at the origin, never passes below the x-axis, and ends on the x-axis. See Figure 3 for an example.

Figure 3 A Dyck path.

The number of Dyck paths of length 2n is the Catalan number

$$C_n = \frac{1}{n+1} \binom{2n}{n}$$

The generating function (see the next section for an introduction to the theory of generating functions) for these numbers is

$$\sum_{n=0}^{\infty} C_n x^n = \frac{1 - \sqrt{1 - 4x}}{2x} \tag{1}$$

The reader is referred to exercise 6.19 in Stanley (1999) for countless occurrences of the Catalan numbers.

A Motzkin path is a lattice path in the integer plane $\mathbb{Z}^2$ consisting of up-steps (1, 1), level steps (1, 0), and down-steps (1, −1), which starts at the origin, never passes below the x-axis, and ends on the x-axis. The path in Figure 2 is in fact a Motzkin path. The number of Motzkin paths of length n is the Motzkin number

$$M_n = \sum_{k \geq 0} \frac{1}{k+1} \binom{2k}{k} \binom{n}{2k}$$

The generating function for these numbers is

$$\sum_{n=0}^{\infty} M_n x^n = \frac{1 - x - \sqrt{1 - 2x - 3x^2}}{2x^2} \tag{2}$$

The reader is referred to exercise 6.38 in Stanley (1999) for numerous occurrences of the Motzkin numbers.

A Schröder path is a lattice path in the integer plane $\mathbb{Z}^2$ consisting of horizontal steps (1, 0), vertical steps (0, 1), and diagonal steps (1, 1), which starts at the origin, never passes below the diagonal x = y, and ends on the diagonal x = y. See Figure 4 for an example.

Figure 4 A Schröder path.

The number of Schröder paths of length n is the (large) Schröder number

$$S_n = \sum_{k \geq 0} \frac{1}{k+1} \binom{2k}{k} \binom{n+k}{2k}$$

The generating function for these numbers is

$$\sum_{n=0}^{\infty} S_n x^n = \frac{1 - x - \sqrt{1 - 6x + x^2}}{2x} \tag{3}$$

The reader is referred to exercise 6.39 in Stanley (1999) for numerous occurrences of the Schröder numbers.

There is another famous sequence of numbers which we have not touched on yet, the Fibonacci numbers $F_n$. They are given by

$$F_n = \frac{1}{\sqrt{5}} \left( \frac{1 + \sqrt{5}}{2} \right)^{n+1} - \frac{1}{\sqrt{5}} \left( \frac{1 - \sqrt{5}}{2} \right)^{n+1}$$

with generating function

$$\sum_{n=0}^{\infty} F_n x^n = \frac{1}{1 - x - x^2} \tag{4}$$

They also occur in numerous places. For example, the number $F_n$ counts all paths on the integers $\mathbb{Z}$ from 0 to n with steps (1, 0) and (2, 0).

An undirected graph G consists of vertices and edges. An edge is a two-element subset of the vertices, which, however, is thought of as a line or curve connecting the two vertices. See Figure 5a for an example. The usual notation for a graph G is G = (V, E), where V is the set of vertices and E is the set of edges of G. A graph is planar if it is
embedded in the plane (sphere) in such a way that the curves which mark the edges do not intersect in their interiors. There can be several different ways to embed the same graph in the plane (or in another surface). When we speak of a planar graph then we assume the graph already to be embedded in a given way. For example, the graph in Figure 5 is not a planar graph, by its drawing. However, there is a different embedding which is planar (namely, all embeddings which put the vertex $v_3$ above the vertex $v_5$ and leave the other vertices as they are). A tree is a graph without any cycles.

Figure 5 (a) An undirected graph. (b) A directed graph.

A directed graph (or digraph) G consists of vertices and arcs (which are sometimes also called directed edges). An arc is a pair of vertices, which, however, is thought of as an arrow pointing from the first vertex of the pair to the second. See Figure 5b for an example. The usual notation for a directed graph G is again G = (V, E), where V is the set of vertices and E is the set of arcs of G. All other notions explained for undirected graphs have analogous meanings for directed graphs.

Graphs can be labeled, in which case each vertex is assigned a label, or unlabeled. The (undirected) graph in Figure 5a is labeled, whereas the (directed) graph in Figure 5b is unlabeled.

Generating Functions

Generating functions are the very basic tools of enumeration. For introductions to this technique, from different points of view, the reader is referred to Bergeron et al. (1998), Flajolet and Sedgewick (chapter 1 in the reference listed in Further reading section), and Stanley (1998, chapter 1; 1999, chapter 4).

Let A be a set of (unlabeled) objects. Each object a in A has a certain size, |a|, which is a non-negative integer. Let us also assume that there is only a finite number of objects from A of a given size. Let $a_n$ be the number of objects from A of size n. The (ordinary) generating function for A is the formal power series

$$F_A(x) = \sum_{a \in A} x^{|a|} = \sum_{n=0}^{\infty} a_n x^n$$

("formal" means that x is just an indeterminate, not a real or complex number. One can compute with formal power series in the same way as with analytic series, only that convergence issues do not arise, respectively that convergence has a different meaning; cf. Stanley (1998, section 1.1)). Typical examples are Sets (the collection containing all unlabeled sets, that is, all objects of the form {∘, ∘, ..., ∘}, including the empty set, where the size of {∘, ∘, ..., ∘} is the number of ∘'s), Sequences (the collection containing all unlabeled sequences, that is, all objects of the form (∘, ∘, ..., ∘), including the empty sequence), Cycles (unlabeled cycles), with respective generating functions

$$F_{\mathrm{Sets}}(x) = F_{\mathrm{Sequences}}(x) = \frac{1}{1-x}, \qquad F_{\mathrm{Cycles}}(x) = \frac{x}{1-x} \tag{5}$$

or Trees (unlabeled trees).

If A and B are two sets of objects, one can define several other sets of objects using them. The union of A and B, written $A \cup B$, has as a groundset the disjoint union of A and B, and the size of an element from A is its size in A, while the size of an element from B is its size in B. We have

$$F_{A \cup B}(x) = F_A(x) + F_B(x) \tag{6}$$

The product of A and B, written $A \times B$, has as a groundset the set of pairs $A \times B$, and the size of an element (a, b) from $A \times B$ is the sum of the sizes of a (in A) and of b (in B). We have

$$F_{A \times B}(x) = F_A(x) \cdot F_B(x) \tag{7}$$

The substitution of two sets A and B of objects can only be defined in certain circumstances, and only in certain more restrictive circumstances the generating function for the substitution can be computed by substituting the generating functions for A and B. Let us assume that any object a from A of size n, by its structure, has n atoms (nodes). For example, if A is a certain set of trees, where the size of a tree is the number of leaves in the tree, then we may take, as the atoms, the leaves of the tree. In this situation, the substitution of B in A, denoted by A(B), is the set of objects which arises by replacing the atoms of objects from A by objects from B in all possible ways. The size of an object from A(B) is the sum of the sizes of the objects from B that it
contains. In order that A(B) contains only a finite number of objects of a given size, we must assume that B contains no elements of size 0. If, in addition, the atoms of any element a from A inherit an order (e.g., if A is a set of binary trees, then the leaves of a binary tree are ordered in a natural way from left to right), then we have

$$F_{A(B)}(x) = F_A(F_B(x)) \tag{8}$$

However, this equation is not true in general. The general formula comes out of Redfield–Pólya theory (see [21] and [24]) and requires the notion of cycle index series. For example, if B is the set of connected (unlabeled) graphs, A is Sets, so that A(B) is the set of all (connected and disconnected) graphs, then [8] is not true, but what is true is

$$F_{\mathrm{Sets}(B)}(x) = \exp\left( F_B(x) + \tfrac{1}{2} F_B(x^2) + \tfrac{1}{3} F_B(x^3) + \cdots \right) \tag{9}$$

This holds, in fact, for any set B of unlabeled objects. (This is seen by combining [24], [17], and [21].)

Next we deal with the enumeration of labeled objects. Let A be a set of labeled objects, again, each object a with a certain size |a| which is a non-negative integer. "Labeled" means that each object of size n, by its structure, comes with n atoms (nodes) which are labeled 1, 2, ..., n. For example, A may be the set of all labeled graphs, where the size of a graph is the number of its vertices, and where the vertices are labeled 1, 2, ..., n. Again, we assume that there is only a finite number of objects from A of a given size. Let $a_n$ be the number of objects from A of size n. The exponential generating function for A is the formal power series

$$E_A(x) = \sum_{a \in A} \frac{x^{|a|}}{|a|!} = \sum_{n=0}^{\infty} a_n \frac{x^n}{n!}$$

Typical examples are Sets (the collection containing all labeled sets, that is, all objects of the form {1, 2, ..., n}, including the empty set), Permutations, Cycles (labeled cycles), with respective generating functions

$$E_{\mathrm{Sets}}(x) = \exp(x) \tag{10}$$

$$E_{\mathrm{Permutations}}(x) = \frac{1}{1-x} \tag{11}$$

$$E_{\mathrm{Cycles}}(x) = \log \frac{1}{1-x} \tag{12}$$

or Trees (labeled trees). The explicit form of the generating function for Trees is discussed in the section "Solving equations for generating functions: the Lagrange inversion formula and the kernel method."

If A and B are two sets of objects, one defines again several other sets of objects using them. The union of A and B, written $A \cup B$, has as a groundset the disjoint union of A and B, and the size of an element from A is its size in A, while the size of an element from B is its size in B. We have

$$E_{A \cup B}(x) = E_A(x) + E_B(x) \tag{13}$$

To define the product of A and B, written $A \times B$, we cannot simply take $A \times B$ as a groundset; we must also say something about the labeling of the objects. So, as a groundset we take all pairs (a, b) with $a \in A$ and $b \in B$, but labeled in all possible ways by 1, 2, ..., |a| + |b| such that the order of labels assigned to a respects the original order of labels of a, and the same for b. The size of such an element (a, b) is again the sum of the sizes of a (in A) and of b (in B). We have

$$E_{A \times B}(x) = E_A(x) \cdot E_B(x) \tag{14}$$

Since, in the labeled world, objects come automatically with atoms, the substitution of two sets A and B of objects can now always be defined. The substitution of B in A, denoted by A(B), is the set of objects which arises by replacing the atoms of objects from A by objects from B in all possible ways, and labeling the substituted objects in all possible ways by 1, 2, ..., $\sum_b |b|$ (the sum being over the objects from B which were put in the places of the atoms) that are consistent with the original labelings of the objects from B. The size of an object from A(B) is the sum of the sizes of the objects from B that it contains. In order that A(B) contains only a finite number of objects of a given size, we must assume that B contains no elements of size 0. Then we have

$$E_{A(B)}(x) = E_A(E_B(x)) \tag{15}$$

An example of a composition is

$$\mathrm{Permutations} = \mathrm{Sets}(\mathrm{Cycles})$$

Thus, from [15] we have

$$E_{\mathrm{Permutations}}(x) = E_{\mathrm{Sets}}(E_{\mathrm{Cycles}}(x))$$

corresponding to the identity

$$\frac{1}{1-x} = \exp\left( \log \frac{1}{1-x} \right)$$

Another manifestation of the composition rule is, for example, the fact (which is sometimes called the exponential principle) that, if one takes the log of the partition function for some maps, the result is the partition function for the connected maps among them.
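The composition rule [15] can be checked coefficient by coefficient with exact rational arithmetic. The sketch below is a minimal illustration, assuming a truncation order `N = 8` and a helper `series_exp` (both names are hypothetical, chosen for this example). It exponentiates a truncated power series B(x) with B(0) = 0 using the relation E' = B'E, then confirms that exp(E_Cycles(x)) has all coefficients equal to 1, that is, equals 1/(1 − x). As a second instance, exponentiating the series for non-empty labeled sets, exp(x) − 1, recovers the Bell numbers, in agreement with the formula $e^{e^x - 1}$.

```python
from fractions import Fraction
from math import factorial

N = 8  # truncation order (illustrative choice)

def series_exp(b, order):
    """Coefficients of exp(B(x)) up to x^order, for a series B with B(0) = 0.

    Uses E' = B' * E, i.e. n * e_n = sum_{k=1..n} k * b_k * e_{n-k}.
    """
    assert b[0] == 0
    e = [Fraction(1)] + [Fraction(0)] * order
    for n in range(1, order + 1):
        e[n] = sum(k * b[k] * e[n - k] for k in range(1, n + 1)) / n
    return e

# E_Cycles(x) = log(1/(1-x)) = sum_{n>=1} x^n / n,
# because there are (n-1)! labeled cycles on n atoms.
e_cycles = [Fraction(0)] + [Fraction(1, n) for n in range(1, N + 1)]
e_perms = series_exp(e_cycles, N)           # should equal 1/(1-x)

# Sets of non-empty sets are set partitions: exp(e^x - 1).
e_nonempty_sets = [Fraction(0)] + [Fraction(1, factorial(n)) for n in range(1, N + 1)]
bell = [c * factorial(n) for n, c in enumerate(series_exp(e_nonempty_sets, N))]

if __name__ == "__main__":
    print(e_perms)
    print(bell)
```

Multiplying the n-th coefficient of `e_perms` by n! gives n!, the number of permutations of an n-element set, as expected from [11].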
All of the above can be generalized to a weighted setting. Namely, if A is a set of objects (labeled or unlabeled), and if $w : A \to R$ is a weight function from A into some ring R, then all of the above remains true, if we replace the definitions of $F_A(x)$ and $E_A(x)$ above by the weighted sums

$$F_A(x) = \sum_{a \in A} w(a)\, x^{|a|}$$

and

$$E_A(x) = \sum_{a \in A} w(a)\, \frac{x^{|a|}}{|a|!}$$

respectively, if in the definition of the union of A and B we define the weight of an object to be its weight in A, respectively B, if in the definition of the product of A and B we define the weight of an object (a, b) to be the product of the weights of a and b, and if in the definition of the substitution we define the weight of an object in A(B) as the product of the weights of the objects from B that were put in place of the atoms.

Redfield–Pólya Theory of Colored Enumeration

The natural and uniform environment for the separate treatment of generating functions for unlabeled and labeled objects in the last section is the theory for counting colored objects founded by Redfield and Pólya, in the modern treatment through cycle index series due to Joyal. We refer the reader to Bergeron et al. (1998, appendix 1), de Bruijn (1981), and Stanley (1999, chapter 7) for further reading.

Let A be a set of labeled objects with the constraint that there is only a finite number of objects of a given size. The cycle index series for A is the formal multivariable series

$$Z_A(x_1, x_2, \ldots) = \sum_{n=0}^{\infty} \frac{1}{n!} \sum_{\sigma \in S_n} \mathrm{fix}_\sigma(A)\, x_1^{c_1(\sigma)} x_2^{c_2(\sigma)} x_3^{c_3(\sigma)} \cdots \tag{16}$$

where $\mathrm{fix}_\sigma(A)$ is the number of objects a from A that remain invariant when the labels are permuted according to the permutation $\sigma$ (in particular, if $\sigma \in S_n$, the size of a must be n in order that $\sigma$ can be applied to the labels), and where $c_i(\sigma)$ denotes the number of cycles of length i of $\sigma$.

In most cases, it is difficult to obtain compact expressions for the cycle index series. However, for our familiar families of objects, compact expressions are available:

$$Z_{\mathrm{Sets}}(x_1, x_2, \ldots) = \exp\left( x_1 + \frac{x_2}{2} + \frac{x_3}{3} + \cdots \right) \tag{17}$$

$$Z_{\mathrm{Permutations}}(x_1, x_2, \ldots) = \prod_{i=1}^{\infty} \frac{1}{1 - x_i} \tag{18}$$

$$Z_{\mathrm{Cycles}}(x_1, x_2, \ldots) = \sum_{i=1}^{\infty} \frac{\varphi(i)}{i} \log \frac{1}{1 - x_i} \tag{19}$$

where $\varphi(i)$ is the Euler totient function (the number of positive integers $j \leq i$ relatively prime to i).

What makes the cycle index series so fundamental is the fact that the generating functions from the last section are specializations of it. Namely, the exponential generating function for A is equal to

$$E_A(x) = Z_A(x, 0, 0, \ldots) \tag{20}$$

If, given the set of labeled objects A, we produce a set of unlabeled objects $\tilde{A}$ by taking all the objects from A but forgetting the labels, then the ordinary generating function for $\tilde{A}$ is another specialization of the cycle index series,

$$F_{\tilde{A}}(x) = Z_A(x, x^2, x^3, \ldots) \tag{21}$$

The cycle index series satisfies the following properties with respect to union, product and composition of sets of objects:

$$Z_{A \cup B}(x_1, x_2, \ldots) = Z_A(x_1, x_2, \ldots) + Z_B(x_1, x_2, \ldots) \tag{22}$$

$$Z_{A \times B}(x_1, x_2, \ldots) = Z_A(x_1, x_2, \ldots) \cdot Z_B(x_1, x_2, \ldots) \tag{23}$$

$$Z_{A(B)}(x_1, x_2, \ldots) = Z_A\bigl( Z_B(x_1, x_2, x_3, \ldots),\ Z_B(x_2, x_4, x_6, \ldots),\ Z_B(x_3, x_6, x_9, \ldots),\ \ldots \bigr) \tag{24}$$

Similar to the theory of generating functions surveyed in the last section, one can also develop a weighted version of the cycle index series. Given a set of labeled objects A, where each object a is assigned a weight w(a), one changes the definition [16] insofar as $\mathrm{fix}_\sigma(A)$ gets replaced by the weighted sum $\sum_{\sigma(a) = a} w(a)$, where $\sigma(a)$ means the object arising from a by permuting the labels according to $\sigma$. Then all the above formulas remain true in this weighted setting.

Cycle index series are instrumental in the enumeration of colored objects. The basic situation is that we have given a set $\tilde{A}$ of unlabeled objects so that every object of size n comes with n atoms (nodes). For example, we may think of $\tilde{A}$ as the set of cycles. We are now going to color each atom by a
color from the set of colors C. The question that we pose is: how many different colored objects of a given size are there? In our example, if C consists of the two colors black and white, then we are asking the question of how many necklaces one can make out of n pearls that can be black or white. In terms of generating functions, we want to compute

$$\tilde{A}(x) = \sum_{c} x^{|c|}$$

where the sum is over all colored objects c that one can obtain by coloring the objects from $\tilde{A}$.

The central result of Redfield–Pólya theory is that, if A is the set of labeled objects that one obtains from $\tilde{A}$ by labeling the objects of $\tilde{A}$ in all possible ways, then

$$\tilde{A}(x) = Z_A(|C|x, |C|x^2, |C|x^3, \ldots)$$

There is again a weighted version. One allows the objects a from $\tilde{A}$ to have weight $w(a) \in R$. Moreover, one assumes a weight function $f : C \to R$ on the colors with values in the ring R. One defines the weight of a colored object obtained by coloring the atoms of a to be w(a) multiplied by the product of all $f(\gamma)$, where $\gamma$ ranges over all the colors of the atoms (including repetitions of colors). Let $\tilde{A}(w, f)$ denote the sum of all the weights of all colored objects obtained from $\tilde{A}$. Then

$$\tilde{A}(w, f) = Z_A\left( \sum_{c \in C} f(c),\ \sum_{c \in C} f(c)^2,\ \sum_{c \in C} f(c)^3,\ \ldots \right)$$

We remark that these results cover also the case of enumeration of objects under a group action. This includes the enumeration of objects on which we impose certain symmetries. See Bergeron et al. (1998, appendix 1), de Bruijn (1981), and Stanley (1999, chapter 7) for more details. The enumeration of asymmetric objects is the subject of an ongoing research program (cf. Labelle and Lamathe (2004)).

Solving Equations for Generating Functions: The Lagrange Inversion Formula and the Kernel Method

In this section, we describe two methods to solve functional equations for generating functions. The Lagrange inversion makes it possible (in some situations) to find explicit expressions for the coefficients of an implicitly given series. The kernel method (and its extensions), on the other hand, is a powerful method to obtain an explicit expression for an implicitly given function. We refer the reader to Flajolet and Sedgewick (section VII.5 of the reference in Further reading section) for further reading.

In many situations it will happen that, when we apply the methods from the last section, we end up with a functional equation for the generating function $f(x) = \sum_{n=0}^{\infty} f_n x^n$ that we wanted to compute. For example, if $t_n$ denotes the number of labeled rooted trees with n nodes, and if we write $T(x) = \sum_{n=1}^{\infty} t_n x^n / n!$, then, by applying a straightforward decomposition of a tree into its root and its set of subtrees attached to the root, we obtain the equation

$$T(z) = z \exp(T(z)) \tag{25}$$

How does one solve such an equation? As a matter of fact, for T(z), there is no expression in terms of known functions. However, the Lagrange inversion formula enables one to find the coefficients $t_n / n!$ of T(z) explicitly. The theorem reads as follows.

Theorem Let g(x) be a formal Laurent series containing only a finite number of negative powers of x, and let f(x) be a formal power series without constant term. If we expand g(x) in powers of f(x),

$$g(x) = \sum_{k} c_k f^k(x) \tag{26}$$

then the coefficients $c_n$ are given by

$$c_n = \frac{1}{n} [x^{-1}]\, g'(x) f^{-n}(x) \quad \text{for } n \neq 0 \tag{27}$$

or, alternatively, by

$$c_n = [x^{-1}]\, g(x) f'(x) f^{-n-1}(x) \tag{28}$$

Here, $[x^n] h(x)$ denotes the coefficient of $x^n$ in the power series h(x).

With this theorem in hand, eqn [25] is easy to solve. We write it in the form

$$T(x) \exp(-T(x)) = x \tag{29}$$

We want to know the coefficients in the expansion $T(x) = \sum_{n=0}^{\infty} t_n x^n / n!$. Since, by [29], T(x) is the compositional inverse of $x \exp(-x)$, substitution of $x \exp(-x)$ instead of x gives

$$x = \sum_{n=0}^{\infty} \frac{t_n}{n!} \bigl( x \exp(-x) \bigr)^n$$

This equation is in the form [26] with $f(x) = x \exp(-x)$ and g(x) = x. Hence, by [27], we obtain

$$\frac{t_n}{n!} = \frac{1}{n} [x^{-1}] \bigl( x \exp(-x) \bigr)^{-n} = \frac{1}{n} [x^{n-1}] \exp(nx) = \frac{n^{n-1}}{n!}$$

and, thus, $t_n = n^{n-1}$.
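The result $t_n = n^{n-1}$ can be confirmed independently of the Lagrange inversion formula by solving [25] as a fixed-point equation on truncated power series. The Python sketch below is an illustration under stated assumptions: the helper names `series_exp` and `rooted_tree_counts` are hypothetical, and the iteration T ← x·exp(T) is a plain substitution scheme, not the method of the article. Each pass fixes one more coefficient, so `order` passes suffice for a truncation at $x^{\mathrm{order}}$.

```python
from fractions import Fraction
from math import factorial

def series_exp(b, order):
    """Coefficients of exp(B(x)) up to x^order, for a series B with B(0) = 0.

    Uses E' = B' * E, i.e. n * e_n = sum_{k=1..n} k * b_k * e_{n-k}.
    """
    assert b[0] == 0
    e = [Fraction(1)] + [Fraction(0)] * order
    for n in range(1, order + 1):
        e[n] = sum(k * b[k] * e[n - k] for k in range(1, n + 1)) / n
    return e

def rooted_tree_counts(order):
    """Solve T = x * exp(T) up to x^order and return [t_0, ..., t_order]."""
    t = [Fraction(0)] * (order + 1)
    for _ in range(order):                  # each pass fixes one more coefficient
        e = series_exp(t, order)
        t = [Fraction(0)] + e[:order]       # multiplying by x shifts coefficients
    # t holds t_n / n!; rescale to the counts t_n
    return [int(c * factorial(n)) for n, c in enumerate(t)]

if __name__ == "__main__":
    print(rooted_tree_counts(6))
    print([n ** (n - 1) for n in range(1, 7)])
```

The exact `Fraction` arithmetic avoids rounding, so the comparison with $n^{n-1}$ is an equality of integers rather than an approximate match.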
The second method to solve functional equations which we explain in this section is the kernel method. We illustrate the method by an example. Let us consider the problem of counting Dyck paths of length 2n (see the section Basic combinatorial terminology). Rather than attempting to arrive at a solution of the problem directly, we consider the more general problem of counting the number $a_{n,k}$ of paths consisting of steps (1, 1) and (1, −1), which start at the origin, never drop below y = 0, have length n, and end at height k. We then form the bivariate generating function $F(u, x) = \sum_{n,k \geq 0} a_{n,k} x^n u^k$. We then have the functional equation

$$F(u, x) = 1 + xu F(u, x) + \frac{x}{u} \bigl( F(u, x) - F(0, x) \bigr) \tag{30}$$

since a path can be empty (this explains the term 1), it can end by a step (1, 1) (this explains the term xuF(u, x)), or it can end by a step (1, −1). The latter can only happen if the path before that last step did not end at height 0. The generating function for these paths is F(u, x) − F(0, x), and this explains the third term in eqn [30]. In fact, we may replace [30] by

$$F(u, x) = 1 + xu F(u, x) + \frac{x}{u} \bigl( F(u, x) - F_1(x) \bigr) \tag{31}$$

because [31] implies that $F_1(x) = F(0, x)$.

The idea of the kernel method is to get rid of the unknown series F(u, x). This is possible because F(u, x) occurs linearly in [31], which can be rewritten as

$$F(u, x) \left( 1 - xu - \frac{x}{u} \right) = 1 - \frac{x}{u} F_1(x) \tag{32}$$

We simply equate the coefficient of F(u, x) in this equation to zero,

$$1 - xu - \frac{x}{u} = 0$$

solve this for u,

$$u = \frac{1 - \sqrt{1 - 4x^2}}{2x}$$

(the other solution for u makes no sense in [31]), and substitute this back in [32], to obtain

$$F_1(x) = \frac{1 - \sqrt{1 - 4x^2}}{2x^2}$$

the familiar generating function [1] for the Catalan numbers, with x replaced by $x^2$. Now, by substituting this result in [31], we can even compute the full series F(u, x).

While this was certainly a complicated, and unusual, way to compute the Catalan numbers, this approach generalizes when one considers paths with different step sets (see section VII.5 of the Flajolet and Sedgewick reference in Further reading section). In a more general situation, one has a functional equation

$$P\bigl( F(u, x), F_1(x), \ldots, F_k(x), x, u \bigr) = 0 \tag{33}$$

where F(u, x) appears linearly, as well as the unknown series $F_1(x), \ldots, F_k(x)$, whereas x and u appear rationally. It is clear that one can apply the same technique, namely collecting all the terms involving F(u, x), equating the coefficient of F(u, x) to zero, solving for u and substituting back in [33]. If there is more than one function $F_i(x)$, then this will only give one equation for the $F_i(x)$. However, when equating the coefficient of F(u, x), which was a polynomial equation, there can be more solutions. (That was actually also the case in our example, although only one solution could be used.) All these solutions can be substituted in [33] to give many more equations for the $F_i(x)$. The kernel method will work if we have enough equations to determine the unknown functions $F_i(x)$ (see the Flajolet and Sedgewick reference, section VII.5 for further details).

In the variant of the obstinate kernel method, more equations are produced in more sophisticated ways. The method has been largely extended by Bousquet-Mélou and co-workers to cover equations of the form [33], where P is a polynomial such that eqn [33] determines all involved series uniquely. This extension covers in particular the so-called quadratic method due to Brown, which is of great significance in the work of Tutte on the enumeration of maps. We refer the reader to Bousquet-Mélou and Jehanne (2005) and the references given there for these extensions.

Extracting Asymptotic Information from Generating Functions

There is powerful machinery available to extract the asymptotic behavior of the coefficients of a power series out of analytic properties of the power series. We describe the corresponding methods, singularity analysis and the saddle point method, in this section. The survey by Odlyzko (1995) and the Flajolet and Sedgewick reference in Further reading are excellent sources for further reading, which, in particular, contain several other methods which we cannot cover here for reasons of limited space.

Let us suppose that we are interested in the asymptotic behavior of the sequence $(f_n)_{n \geq 0}$ of real (or complex) numbers as n tends to infinity. Let us suppose that the power series $f(z) = \sum_{n=0}^{\infty} f_n z^n$ converges in some neighborhood of the origin. (If this series converges only at z = 0, then either one has to try to scale, that is, for example, look at the

power series $f(z) = \sum_{n=0}^{\infty} f_n z^n/n!$ instead, or one must apply methods other than singularity analysis or the saddle point method. In the latter case, depending on the nature of the coefficients $f_n$, this may be the Euler-Maclaurin or the Poisson summation formulas, the Mellin transform technique, or other direct methods. The reader is referred to Odlyzko (1995) and the Flajolet and Sedgewick reference.) The idea is then to consider $f(z)$ as a complex function in $z$ (and extend the range of $f$ beyond the disk of convergence about the origin), and to study the singularities of $f(z)$. (The point at infinity can also be a singularity.) The upshot is that the singularities of $f(z)$ with smallest modulus dictate the asymptotic behavior of the coefficients $f_n$. These singularities of smallest modulus are called the dominating singularities.
If there is an infinite number of dominant singularities, then one has to try the circle method. We refer the reader to Andrews (1976) and Ayoub (1963) for details of this method.
If there is a finite number of dominant singularities, then there can be again two different situations, depending on whether these are small or large singularities. Roughly speaking, a singularity is small if the function $f(z)$ grows at most polynomially when $z$ approaches the singularity, otherwise it is large. A typical example of a small singularity is $z = 1/4$ in $(1 - 4z)^{-1/2}$, whereas a typical example of a large singularity is $z = \infty$ in $\exp(z)$ or $z = 1$ in $\exp(1/(1 - z))$.
The method to apply for small singularities is the method of singularity analysis as developed by Flajolet and Odlyzko. (Singularity analysis implies Darboux's method, which occurs frequently in the literature, and, thus, supersedes it.) For the sake of simplicity, we consider first only the case of a unique dominant singularity. We shall address the issue of several dominant singularities shortly. Furthermore, we assume the singularity to be $z = 1$, again for the sake of simplicity of presentation. The general result can then be obtained by rescaling $z$.
The basic idea is the transfer principle:

$$\text{If } f(z) \underset{z\to 1}{=} \sigma(z) + O(\tau(z)) \text{ then } f_n \underset{n\to\infty}{=} \sigma_n + O(\tau_n) \tag{34}$$

where $\sigma(z) = \sum_{n=0}^{\infty} \sigma_n z^n$ is a linear combination of standard functions of the form $(1 - z)^{-\alpha}$, or logarithmic variants, and $\tau(z) = \sum_{n=0}^{\infty} \tau_n z^n$ also lies in the scale (see sections VI.3,4 of the Flajolet and Sedgewick reference for the exact statement). The expansion for $f(z)$ in [34] is called the singular expansion of $f(z)$. For the above-mentioned standard functions, we have

$$[z^n]\,(1 - z)^{-\alpha}\left(\frac{1}{z}\log\frac{1}{1 - z}\right)^{\beta} \sim \frac{n^{\alpha-1}}{\Gamma(\alpha)}(\log n)^{\beta}\left(1 + \frac{C_1(\alpha)}{1!}\,\frac{\beta}{\log n} + \frac{C_2(\alpha)}{2!}\,\frac{\beta(\beta - 1)}{(\log n)^2} + \cdots\right) \tag{35}$$

where $[z^n]g(z)$ denotes the coefficient of $z^n$ in $g(z)$, and where

$$C_k(\alpha) = \Gamma(\alpha)\,\frac{d^k}{ds^k}\frac{1}{\Gamma(s)}\bigg|_{s=\alpha}$$

If $\alpha$ is a nonpositive integer, then this expansion has to be taken with care (cf. section VI.2 of the Flajolet and Sedgewick reference).
To see how this works, consider the example $f_n = \sum_{k=0}^{n}\binom{2k}{k}$. We have

$$\sum_{n\ge 0} f_n z^n = \frac{1}{(1 - z)\sqrt{1 - 4z}}$$

The function on the right-hand side is meromorphic in all of $\mathbb{C}$ (where $\mathbb{C}$ denotes the complex numbers), with singularities at $z = 1$ and $z = 1/4$. The dominant singularity is $z = 1/4$. We determine the singular expansion of $f(z)$ about $z = 1/4$,

$$f(z) = \frac{4}{3}(1 - 4z)^{-1/2} - \frac{4}{9}(1 - 4z)^{1/2} + \frac{4}{27}(1 - 4z)^{3/2} + O\bigl((1 - 4z)^{5/2}\bigr)$$

(We stopped the expansion after three terms. The farther we go, the more terms of the asymptotic expansion for $f_n$ we can compute.) Hence, we obtain

$$f_n = 4^n\left(\frac{4}{3}\,\frac{n^{-1/2}}{\Gamma(1/2)}\left(1 - \frac{1}{8n} + \frac{1}{128n^2}\right) - \frac{4}{9}\,\frac{n^{-3/2}}{\Gamma(-1/2)}\left(1 + \frac{3}{8n}\right) + \frac{4}{27}\,\frac{n^{-5/2}}{\Gamma(-3/2)} + O\bigl(n^{-7/2}\bigr)\right) = \frac{4^n}{\sqrt{\pi n}}\left(\frac{4}{3} + \frac{1}{18n} + \frac{59}{288n^2} + O\!\left(\frac{1}{n^3}\right)\right)$$

If there are several small dominant singularities (but only a finite number of them), then one simply applies the above procedure for all of them and, to obtain the desired asymptotic expansion, one adds up the corresponding contributions.
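As a numerical sanity check of the worked example, the following minimal sketch compares the exact partial sums $f_n = \sum_{k=0}^{n}\binom{2k}{k}$ with the first two terms of the asymptotic expansion, $\frac{4^n}{\sqrt{\pi n}}\bigl(\frac{4}{3} + \frac{1}{18n}\bigr)$. The function names are illustrative choices and not part of the text; only the Python standard library is assumed.

```python
from math import comb, pi, sqrt


def partial_sum_central_binomials(n: int) -> int:
    """Exact value of f_n = sum_{k=0}^{n} binom(2k, k)."""
    return sum(comb(2 * k, k) for k in range(n + 1))


def two_term_asymptotic(n: int) -> float:
    """First two terms of the expansion: 4^n / sqrt(pi n) * (4/3 + 1/(18 n))."""
    return 4.0**n / sqrt(pi * n) * (4 / 3 + 1 / (18 * n))


if __name__ == "__main__":
    # The ratio exact/approx should approach 1, with an error of order 1/n^2.
    for n in (10, 50, 200):
        exact = partial_sum_central_binomials(n)
        approx = two_term_asymptotic(n)
        print(n, exact / approx)
```

Since the neglected terms are of relative order $1/n^2$, the printed ratios should approach 1 roughly quadratically fast in $1/n$; truncating after fewer terms would make the convergence visibly slower.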

The method to apply for large singularities is the saddle point method. For the following considerations, we assume that $f(z)$ is analytic in $|z| < R \le \infty$. At the heart of the saddle point method lies Cauchy's formula

$$f_n = [z^n]f(z) = \frac{1}{2\pi i}\int_C \frac{f(z)}{z^{n+1}}\,dz \tag{36}$$

for writing the $n$th coefficient in the power series expansion of $f(z)$. Here, $C$ is some simple closed contour around the origin that stays in the range $|z| < R$. The idea is to exploit the fact that we are free to deform the contour. The aim is to choose a contour such that the main contribution to the integral in [36] comes from a very tiny part of the contour, whereas the contribution of the rest is negligible. This will be possible if we put the contour through a saddle point of the integrand $f(z)/z^{n+1}$. Under suitable conditions, the main contribution will then come from the small passage of the path through the saddle point, and the contribution of the rest will be negligible.
In practice, the saddle point method is not always straightforward to apply, but has to be adapted to the specific properties of the function $f(z)$ that we are encountering. We refer the reader to the corresponding chapters in the Flajolet and Sedgewick reference and Odlyzko (1995) for more details. There is one important exception though, namely the Hayman admissible functions. We will not reproduce the definition of Hayman admissibility because it is cumbersome (cf. section VIII.5 in the Flajolet and Sedgewick reference and definition 12.4 of Odlyzko (1995)). However, in many applications, it is not even necessary to go back to it because of the closure properties of Hayman admissible functions. Namely, it is known (cf. Odlyzko (1995), theorem 12.8) that $\exp(p(z))$ is Hayman admissible in $|z| < \infty$ for any polynomial $p(z)$ with real coefficients as long as the coefficients $a_n$ of the Taylor series of $\exp(p(z))$ are positive for all sufficiently large $n$ (thus, e.g., $\exp(z)$ is Hayman admissible), and it is known that, if $f(z)$ and $g(z)$ are Hayman admissible in $|z| < R \le \infty$, then $\exp(f(z))$ and $f(z)g(z)$ are also (thus, e.g., $\exp(\exp(z) - 1)$ is Hayman admissible).
The central result of Hayman's theory is the following: if $f(z) = \sum_{n\ge 0} f_n z^n$ is Hayman admissible in $|z| < R$, then

$$f_n \sim \frac{f(r_n)}{r_n^{\,n}\sqrt{2\pi b(r_n)}} \quad\text{as } n\to\infty \tag{37}$$

where $r_n$ is the unique solution for large $n$ of the equation $a(r) = n$ in $(R_0, R)$, with $a(r) = rf'(r)/f(r)$, $b(r) = ra'(r)$, and a suitably chosen constant $R_0 > 0$.
This result covers only the first term in the asymptotic expansion. There is an even more sophisticated theory due to Harris and Schoenfeld, which allows one to also find a complete asymptotic expansion. We refer the reader to section VIII.5 of the Flajolet and Sedgewick reference and Odlyzko (1995) for more details.
Methods for the asymptotic analysis of multivariable generating functions are also available (see the corresponding chapters in Flajolet and Sedgewick, Odlyzko (1995) and the recent important development surveyed in the Pemantle and Wilson reference listed in Further reading). We add that both the method of singularity analysis and Hayman's theory of admissible functions have been made largely automatic, and that this has been implemented in the Maple program gdev (see Further reading).

The Theory of Heaps

The theory of heaps, developed by Viennot, is a geometric rendering of the theory of the partial commutation monoid of Cartier and Foata, which is now most often called the Cartier-Foata monoid. Its importance stems from the fact that several objects which appear in statistical physics, such as Motzkin paths, animals, respectively polyominoes, or Lorentzian triangulations (see the Viennot and James reference in Further reading and the references therein), are in bijection with heaps.
Informally, a heap is what we would imagine. We take a collection of pieces, say $B_1, B_2, \ldots$, and put them one upon the other, sometimes also sideways, to form a heap, see Figure 6.

Figure 6 A heap of pieces.

There, we imagine that pieces can only move vertically, so that the heap in Figure 6 would indeed form a stable arrangement. Note that we allow several copies of a piece to appear in a heap. (This means that they differ only by a vertical translation.) For example, in Figure 6 there appear two copies of $B_2$. Under these assumptions, there are pieces which can move past each other, and others which cannot. For example, in Figure 6, we can move the piece $B_6$ higher up, thus moving it higher than $B_1$ if we wish. However, we cannot move $B_7$ higher than $B_6$,

because $B_6$ blocks the way. On the other hand, we can move $B_7$ past $B_1$ (thus taking $B_6$ with us). Thus, a rigorous way to introduce heaps is by beginning with a set $\mathcal{B}$ of pieces (in our example, $\mathcal{B} = \{B_1, B_2, \ldots, B_7\}$), and we declare which pieces can be moved past another and which cannot. We indicate this by a symmetric relation $R$: we write $aRb$ to indicate that $a$ cannot move past $b$ (and vice versa). When we consider a word $a_1a_2\ldots a_n$ of pieces, $a_i \in \mathcal{B}$, we think of it as putting first $a_1$, then putting $a_2$ on top of it (and, possibly, moving it past $a_1$), then putting $a_3$ on top of what we already have, etc. We declare two words to be equivalent if one arises from the other by commuting adjacent letters which are not in relation. A heap is then an equivalence class of words under this equivalence relation. What we have described just now is indeed the original definition of Cartier and Foata.
The class of heaps which occurs most frequently in applications is the class of heaps of monomers and dimers, which we now introduce. Let $\mathcal{B} = M \cup D$, where $M = \{m_0, m_1, \ldots\}$ is the set of monomers and $D = \{d_1, d_2, \ldots\}$ is the set of dimers. We think of a monomer $m_i$ as a point, symbolized by a circle, with $x$-coordinate $i$, see Figure 7. We think of a dimer $d_i$ as two points, symbolized by circles, with $x$-coordinates $i - 1$ and $i$ which are connected by an edge, see Figure 7. We impose the relations $m_iRm_i$, $m_iRd_i$, $m_iRd_{i+1}$, $i = 0, 1, \ldots$, $d_iRd_j$, $i - 1 \le j \le i$, and extend $R$ to a symmetric relation. Figure 8 shows two heaps of monomers and dimers.

Figure 7 Monomers and dimers.

Figure 8 Two heaps of monomers and dimers.

For example, Motzkin paths are in bijection with heaps of monomers and dimers. To see this, given a Motzkin path, we read the steps of the path from the beginning to the end. Whenever we read a level-step at height $h$, we make it into a monomer with $x$-coordinate $h$; whenever we read a down-step from height $h$ to height $h - 1$, we make it into a dimer whose endpoints have $x$-coordinates $h - 1$ and $h$. Up-steps are ignored. Figure 9 shows an example. In the figure, the heap is not in standard fashion, in the sense that the $x$-axis is not shown as a horizontal line but as a vertical line (cf. Figure 7). But it could be easily transformed into standard fashion by a simple reflection with respect to a line of slope 1.

Figure 9 Bijection between Motzkin paths and heaps of monomers and dimers.

Lattice animals on the triangular lattice and on the quadratic lattice are also in bijection with heaps, this time with heaps consisting entirely of dimers. Given an animal, one simply replaces each vertex of the animal by a dimer, see Figures 10 and 11. While in the case of animals on the triangular lattice this gives a constraintless bijection (see Figure 10), in the case of the quadratic lattice this sets up a bijection with heaps of dimers in which two dimers of the same type can never be placed directly one over the other (see Figure 11). For example, two dimers $d_5$, one placed directly over the other (as they occur in Figure 10), are forbidden under this rule.

Figure 10 Bijection between animals and heaps of dimers.

Figure 11 Bijection between animals and heaps of dimers.

Next we make heaps into a monoid by introducing a composition of heaps. (A monoid is a set with an associative binary operation that has an identity element.) Intuitively, given two heaps $H_1$ and $H_2$, the composition of $H_1$ and $H_2$, the heap $H_1 \circ H_2$, is the heap which results

by putting $H_2$ on top of $H_1$. In terms of words, the composition of two heaps is the equivalence class of the concatenation $uw$, where $u$ is a word from the equivalence class of $H_1$, and $w$ is a word from the equivalence class of $H_2$.
The composition of the two heaps in Figure 8 is shown in Figure 12.

Figure 12 The composition of the heaps in Figure 8.

Given pieces $\mathcal{B}$ with relation $R$, let $\mathcal{H}(\mathcal{B}, R)$ be the set of all heaps consisting of pieces from $\mathcal{B}$, including the empty heap, the latter denoted by $\emptyset$. It is easy to see that the composition makes $(\mathcal{H}(\mathcal{B}, R), \circ)$ into a monoid with unit $\emptyset$.
For the statement of the main theorem in the theory of heaps, we need two more terms. A trivial heap is a heap consisting of pieces all of which are pairwise unrelated. Figure 13a shows a trivial heap consisting of monomers and dimers. A pyramid is a heap with exactly one maximal (= topmost) element. Figure 13b shows a pyramid consisting of monomers and dimers. Finally, if $H$ is a heap, then we write $|H|$ for the number of pieces in $H$.

Figure 13 (a) A trivial heap. (b) A pyramid.

In applications, heaps will have weights, which are defined by introducing a weight $w(B)$ for each piece $B$ in $\mathcal{B}$, and by extending the weight $w$ to all heaps $H$ by letting $w(H)$ denote the product of all weights of the pieces in $H$ (multiplicities of pieces included).
Let $\mathcal{M}$ be a subset of the pieces $\mathcal{B}$. Then, the generating function for all heaps with maximal pieces contained in $\mathcal{M}$ is given by

$$\sum_{\substack{H \in \mathcal{H}(\mathcal{B}, R) \\ \text{maximal pieces} \subseteq \mathcal{M}}} w(H) = \frac{\sum_{T \in \mathcal{T}(\mathcal{B}\setminus\mathcal{M}, R)} (-1)^{|T|} w(T)}{\sum_{T \in \mathcal{T}(\mathcal{B}, R)} (-1)^{|T|} w(T)} \tag{38}$$

where $\mathcal{T}(\mathcal{B}, R)$ denotes the set of all trivial heaps with pieces from $\mathcal{B}$. In particular, the generating function for all heaps is given by

$$\sum_{H \in \mathcal{H}(\mathcal{B}, R)} w(H) = \frac{1}{\sum_{T \in \mathcal{T}(\mathcal{B}, R)} (-1)^{|T|} w(T)} \tag{39}$$

Furthermore, if $\mathcal{P}(\mathcal{B}, R)$ denotes the set of all pyramids with pieces from $\mathcal{B}$, then

$$\sum_{P \in \mathcal{P}(\mathcal{B}, R)} \frac{w(P)}{|P|} = \log\left(\sum_{H \in \mathcal{H}(\mathcal{B}, R)} w(H)\right) \tag{40}$$

where $|P|$ is the number of pieces of $P$. (As the reader will have guessed, this is a consequence of the exponential principle mentioned in the section on generating functions.)

The Transfer Matrix Method

The transfer matrix method (cf. Stanley (1986), chapter 4 for further reading) applies whenever we are able to build the combinatorial objects that we are interested in by moving on a finite number of states in a step-by-step fashion, where the current step does not depend on the previous ones. (In statistical language, we are considering a finite-state Markov chain.) For example, Motzkin paths which are constrained to stay between two parallel lines, say between $y = 0$ and $y = K$, can be described in such a way: the states are the heights $0, 1, \ldots, K$, and, if we are in state $h$, then in the next step we are allowed to move to states $h + 1$, $h$, or $h - 1$, except that from state 0 we cannot move to $-1$ (there is no state $-1$), and when we are in state $K$ we cannot move to $K + 1$ (there is no state $K + 1$).
For describing the general situation, let $G = (V, E)$ be a directed graph with vertex set $V$ and edge set $E$. Let $w_n(u, v)$ denote the number of walks of length $n$ from vertex $u$ to vertex $v$ along edges of $G$. To compute these numbers, we consider the adjacency matrix of $G$, $A(G)$. By definition, using our notation, $A(G) = (w_1(u, v))_{u,v\in V}$. Obviously, $(w_n(u, v))_{u,v\in V} = (A(G))^n$. Thus,

$$\left(\sum_{n\ge 0} w_n(u, v)\,x^n\right)_{u,v\in V} = \sum_{n\ge 0} A(G)^n x^n = \bigl(I_n - A(G)x\bigr)^{-1}$$

where $I_n$ is the $n \times n$ identity matrix (here $n = |V|$ is the number of vertices). In other words, the generating functions $\sum_{n=0}^{\infty} w_n(u, v)x^n$ for the walk numbers between $u$ and $v$ form the entries of a matrix which is the inverse matrix of $I_n - A(G)x$. By elementary linear algebra,

$$\sum_{n\ge 0} w_n(u, v)\,x^n = \frac{(-1)^{\#u+\#v}\det\bigl(I_n - A(G)x\bigr)_{v,u}}{\det\bigl(I_n - A(G)x\bigr)} \tag{41}$$

where $\det(I_n - A(G)x)_{v,u}$ is the minor of $I_n - A(G)x$ with the row indexed by $v$ and the column indexed

by $u$ omitted, and where $\#u$ denotes the row number of $u$ and similarly for $\#v$. A weighted version could also be developed in the same way, where we put a weight $w(e)$ on each edge, and the weight of a walk is the product of the weights of all its edges.
In particular, the expression [41] is a rational function in $x$. Then, by the basic theorem on rational generating functions (cf. Stanley (1986), section 4.1), the number $w_n(u, v)$ can be expressed as a sum $\sum_{i=1}^{d} P_i(n)\lambda_i^n$, where the $\lambda_i$'s are the different roots of the polynomial $\det(xI_n - A(G))$, and $P_i(n)$ is a polynomial of degree less than the multiplicity of the root $\lambda_i$. (The $P_i(n)$'s depend on $u$ and $v$, whereas the $\lambda_i$'s do not.) If there exists a unique root $\lambda_j$ with maximal modulus, then this implies that, asymptotically as $n \to \infty$, $w_n(u, v) \sim P_j(n)\lambda_j^n$.

Lattice Paths

Recall from the section on basic combinatorial terminology that a lattice path $P$ in $\mathbb{Z}^d$ is a path in the $d$-dimensional integer lattice $\mathbb{Z}^d$ which uses only points of the lattice, that is, it is a sequence $(P_0, P_1, \ldots, P_l)$, where $P_i \in \mathbb{Z}^d$ for all $i$. The vectors $\overrightarrow{P_0P_1}, \overrightarrow{P_1P_2}, \ldots, \overrightarrow{P_{l-1}P_l}$ are called the steps of $P$. The number of steps, $l$, is called the length of $P$.
The enumeration of lattice paths has always been an intensively studied topic in statistics, because of their importance in the study of random walks, of rank order statistics for nonparametric testing, and of queueing processes. The reader is referred to Feller (1957) and particularly Mohanty's (1979) book, which is a rich source for enumerative results on lattice paths, albeit in a statistical language. We review the most important results in this section. Most of these concern two-dimensional lattice paths, that is, the case $d = 2$.
To begin with, we consider paths in the integer plane $\mathbb{Z}^2$ consisting of horizontal and vertical unit steps in the positive direction. Clearly, the number of all (unrestricted) paths from the origin to $(n, m)$ is the binomial coefficient $\binom{n+m}{n}$. By the reflection principle, which is commonly attributed to D. André (see, e.g., Comtet (1974) p. 22), it follows that the number of paths from the origin to $(n, m)$ which do not pass above the line $y = x + t$, where $m \le n + t$, is given by

$$\binom{n+m}{n} - \binom{n+m}{n+t+1} \tag{42}$$

Roughly, the reflection principle sets up a bijection between the paths from the origin to $(n, m)$ which do pass above the line $y = x + t$ and all paths from $(-t-1, t+1)$ to $(n, m)$, by reflecting the path portion between the origin and the last touching point on $y = x + t + 1$ in this latter line. Thus, the result of the enumeration problem is the number of all paths from $(0, 0)$ to $(n, m)$, which is given by the binomial coefficient $\binom{n+m}{n}$, minus the number of all paths from $(-t-1, t+1)$ to $(n, m)$, which is given by the binomial coefficient $\binom{n+m}{n+t+1}$, whence the formula [42].
If one considers more generally paths bounded by the line $my = nx + t$, no compact formula is known. It seems that the most conceptual way to approach this problem is through the so-called kernel method (see the section on solving equations for generating functions), which, in combination with the saddle point method, allows one also to obtain strong asymptotic results. There is one special instance, however, which has a nice formula. The number of all lattice paths from the origin to $(n, m)$ which never pass above $x = \mu y$, where $\mu$ is a positive integer, is given by

$$\frac{n - \mu m + 1}{n + m + 1}\binom{n+m+1}{m} \tag{43}$$

The most elegant way to prove this formula is by means of the cycle lemma of Dvoretzky and Motzkin (see Mohanty (1979), p. 9 where the cycle lemma occurs under the name of penetrating analysis).
Iteration of the reflection principle shows that the number of paths from the origin to $(n, m)$ which stay between the lines $y = x + t$ and $y = x + s$ (being allowed to touch them), where $t \ge 0 \ge s$ and $n + t \ge m \ge n + s$, is given by the finite (!) sum (see, e.g., Mohanty (1979), p. 6)

$$\sum_{k\in\mathbb{Z}}\left(\binom{n+m}{n - k(t - s + 2)} - \binom{n+m}{n - k(t - s + 2) + t + 1}\right) \tag{44}$$

The enumeration of lattice paths restricted to regions bounded by hyperplanes has also been considered for other regions, such as quadrants, octants, and rectangles, as well as in higher dimensions. A general result due to Gessel and Zeilberger, and Biane, independently, on the number of lattice paths in a chamber (alcove) of an (affine) reflection group (see Krattenthaler (2003) for the corresponding references and pointers to further results) shows how far one can go when one uses the reflection principle. In particular, this result covers [42] and [44], the enumeration of lattice paths in quadrants, octants, rectangles, and many other results that have

appeared (before and after) in the literature. We present a particularly elegant (and frequently occurring) special case. (In reflection group language, it corresponds to the reflection group of type $A_{n-1}$. See Humphreys (1990) for terminology and information on reflection groups.)
Let $A = (a_1, a_2, \ldots, a_d)$ and $E = (e_1, e_2, \ldots, e_d)$ be points in $\mathbb{Z}^d$ with $a_1 \ge a_2 \ge \cdots \ge a_d$ and $e_1 \ge e_2 \ge \cdots \ge e_d$. The number of all paths from $A$ to $E$ in the integer lattice $\mathbb{Z}^d$, which consist of positive unit steps and which stay in the region $x_1 \ge x_2 \ge \cdots \ge x_d$, equals

$$\left(\sum_{i=1}^{d}(e_i - a_i)\right)!\;\det_{1\le i,j\le d}\left(\frac{1}{(e_i - a_j - i + j)!}\right) \tag{45}$$

The counting problem of the theorem is equivalent to numerous other counting problems. It was originally formulated as an $n$-candidate ballot problem, but it is as well equivalent to counting the number of standard Young tableaux of a given shape. In the case that all $a_j$'s are equal, the determinant does in fact evaluate into a closed-form product. In Young tableaux theory, a particular way to write the result is known as the hook-length formula (see, e.g., Stanley (1999), corollary 7.21.6).
We return to lattice paths in the plane, mentioning some more closely related results. The first is a result of Mohanty (1979, section 4.2), which expresses the number of all lattice paths from the origin to $(n, m)$ which touch the line $y = x + t$ exactly $r$ times, never crossing it, as the difference

$$\binom{n+m-r}{n+t-1} - \binom{n+m-r}{n+t}, \quad r \ge 1 \tag{46}$$

Not forbidding that the paths cross the bounding line, we arrive at the problem of counting the lattice paths from the origin to $(n, m)$, which cross the main diagonal $y = x$ exactly $r$ times, the answer being

$$\begin{cases} \dfrac{m - n + 2r + 1}{m + n + 1}\dbinom{m+n+1}{n-r} & \text{if } m > n \\[2ex] \dfrac{2r + 2}{n}\dbinom{2n}{n-r-1} & \text{if } m = n \end{cases} \tag{47}$$

Next, we give the number of lattice paths from the origin to $(n, n)$ which have $2r$ steps on one side of the line $y = x$, as

$$\binom{2r}{r}\binom{2n-2r}{n-r} \tag{48}$$

a result due to Sparre Andersen. We refer the reader to Mohanty (1979, chapter 3) for further results in this direction.
Enumerating lattice paths with a fixed number of maximal straight pieces (which correspond to runs) is intimately connected to another basic enumeration problem concerning lattice paths: the enumeration of lattice paths having a fixed number of turns. An effective way to attack the latter problem is by means of two-rowed arrays (see the survey article by Krattenthaler (1997), where in particular analogs of the reflection principle for two-rowed arrays are developed). These imply formulas for the number of lattice paths with fixed starting points and endpoints and a fixed number of north-east (respectively east-north) turns, for unrestricted paths, as well as for paths bounded by lines. (A north-east turn in a lattice path is a point where the direction changes from north to east. An east-north turn is defined analogously.) In particular, analogs of [42]-[44] are known when the number of north-east (respectively east-north) turns is fixed.
These formulas imply for example (see again Krattenthaler (1997, section 3.5)) that the number of lattice paths from the origin to $(n, n)$ which never pass above the line $y = x + t$ and have exactly $2r$ maximal straight pieces is given by

$$2\binom{n-1}{r-1}^2 - \binom{n+t-1}{r-2}\binom{n-t-1}{r} - \binom{n+t-1}{r-1}\binom{n-t-1}{r-1} \tag{49}$$

with a similar result for the case of $2r + 1$ maximal straight pieces. (If $t = 0$, the numbers in [49] become

$$\frac{1}{n}\binom{n}{r}\binom{n}{r-1}$$

and they are known as the Narayana numbers.) Furthermore, they imply that the number of lattice paths from the origin to $(n, n)$ which never pass above the line $y = x + t$ and never below the line $y = x - t$ and have exactly $2r$ maximal straight pieces is given by

$$\sum_{k=-\infty}^{\infty}\left(2\binom{n-2kt-1}{r+k-1}\binom{n+2kt-1}{r-k-1} - \binom{n-2kt+t-1}{r+k-2}\binom{n+2kt-t-1}{r-k} - \binom{n-2kt+t-1}{r+k-1}\binom{n+2kt-t-1}{r-k-1}\right) \tag{50}$$

with a similar result for the case of $2r + 1$ maximal straight pieces.
The most general boundary for lattice paths that one can imagine is the restriction that it stays

between two given (fixed) paths. Let us assume that the horizontal steps of the upper (fixed) path are at heights $a_1 \le a_2 \le \cdots \le a_n$, whereas the horizontal steps of the lower (fixed) path are at heights $b_1 \le b_2 \le \cdots \le b_n$, $a_i \ge b_i$, $i = 1, 2, \ldots, n$. Then the number of all paths from $(0, b_1)$ to $(n, a_n)$ satisfying the property that for all $i = 1, 2, \ldots, n$ the height of the $i$th horizontal step is between $b_i$ and $a_i$ is given by the determinant

$$\det_{1\le i,j\le n}\binom{a_i - b_j + 1}{j - i + 1} \tag{51}$$

In the statistical literature, this formula is often known as Steck's formula, but it is actually a special case of a much more general theorem due to Kreweras. A generalization of [51] to higher-dimensional paths was given by Handa and Mohanty (see Mohanty (1979, section 2.4)).
Next, we consider three-step lattice paths in the integer plane $\mathbb{Z}^2$, that is, paths consisting of up-steps $(1, 1)$, level steps $(1, 0)$, and down-steps $(1, -1)$. The particular problem that we are interested in is to count such three-step paths starting at $(0, r)$ and ending at $(\ell, s)$, which do not pass below the $x$-axis and do not pass above the horizontal line $y = K$. Furthermore, we assign the weight 1 to an up-step, the weight $b_h$ to a level-step at height $h$, and the weight $\lambda_h$ to a down-step from height $h$ to $h - 1$. The weight $w(P)$ of a path $P$ is defined as the product of the weights of all its steps. Then we have the following result, which can be obtained by the transfer matrix method described in the last section.
Define the sequence $(p_n(x))_{n\ge 0}$ of polynomials by

$$xp_n(x) = p_{n+1}(x) + b_np_n(x) + \lambda_np_{n-1}(x) \quad\text{for } n \ge 1 \tag{52}$$

with initial conditions $p_0(x) = 1$ and $p_1(x) = x - b_0$. Furthermore, define $(Sp_n(x))_{n\ge 0}$ to be the sequence of polynomials which arises from the sequence $(p_n(x))$ by replacing $\lambda_i$ by $\lambda_{i+1}$ and $b_i$ by $b_{i+1}$, $i = 0, 1, 2, \ldots$, everywhere in the three-term recurrence [52] and in the initial conditions. Finally, given a polynomial $p(x)$ of degree $n$, we denote the corresponding reciprocal polynomial $x^np(1/x)$ by $p^*(x)$.
With the weight $w$ defined as before, the generating function $\sum_P w(P)x^{\ell(P)}$, where the sum is over all three-step paths which start at $(0, r)$, terminate at height $s$, do not pass below the $x$-axis, and do not pass above the line $y = K$, is given by

$$\begin{cases} \dfrac{x^{s-r}\,p_r^*(x)\,S^{s+1}p_{K-s}^*(x)}{p_{K+1}^*(x)}, & r \le s \\[2ex] \lambda_r\cdots\lambda_{s+1}\,\dfrac{x^{r-s}\,p_s^*(x)\,S^{r+1}p_{K-r}^*(x)}{p_{K+1}^*(x)}, & r \ge s \end{cases} \tag{53}$$

The sequence of polynomials $(p_n(x))_{n\ge 0}$ is in fact a sequence of orthogonal polynomials (cf. Koekoek and Swarttouw (1998) and Szegő (1959)).
We remark that in the case that $r = s = 0$ there is also an elegant expression for the generating function due to Flajolet (see section V.2 of the Flajolet and Sedgewick reference in Further reading) in terms of a continued fraction.
In order to solve our problem, we just have to extract the coefficient of $x^{\ell}$ in [53]. By a partial fraction expansion, a formula of the type

$$\sum_m c_m\rho_m^{\ell} \tag{54}$$

results, where the $\rho_m$'s are the zeroes of $p_{K+1}(x)$, and the $c_m$'s are some coefficients, only a finite number of them being nonzero.
It should be noted that, because of the many available parameters (the $b_n$'s and $\lambda_n$'s), by appropriate specializations one can also obtain numerous results about enumerating three-step paths according to various statistics, such as the number of touchings on the bounding lines, etc.
There are two important special cases in which a completely explicit solution in terms of elementary functions can be given.
The first case occurs for $b_i = 0$ and $\lambda_i = 1$ for all $i$. In this case, the polynomials $p_n(x)$ defined by the three-term recurrence [52] are Chebyshev polynomials of the second kind, $p_n(x) = U_n(x/2)$. (The Chebyshev polynomial of the second kind $U_n(x)$ is defined by $U_n(\cos t) = \sin((n + 1)t)/\sin t$ (see Koekoek and Swarttouw (1998) for almost exhaustive information on these polynomials and, more generally, on hypergeometric orthogonal polynomials).) The result which is then obtained from the general theorem (clearly, the zeros of $U_n(x)$ are $x = \cos(k\pi/(n + 1))$, $k = 1, 2, \ldots, n$, and therefore the partial fraction expansion of [53] is easily determined) is that the number of lattice paths from $(0, r)$ to $(\ell, s)$ with only up- and down-steps, which always stay between the $x$-axis and the line $y = K$, is given by (see also Feller (1957, chapter XIV, eqn [5.7]))

$$\frac{2}{K + 2}\sum_{k=1}^{K+1}\left(2\cos\frac{\pi k}{K + 2}\right)^{\ell}\sin\frac{\pi k(r + 1)}{K + 2}\,\sin\frac{\pi k(s + 1)}{K + 2} \tag{55}$$

a formula which goes back to Lagrange.
The second case occurs for $b_i = 1$ and $\lambda_i = 1$ for all $i$. In this case, the polynomials $p_n(x)$ defined by the three-term recurrence [52] are again Chebyshev polynomials of the second kind, $p_n(x) = U_n((x - 1)/2)$. The result which is then

obtained from the general theorem is that the number of three-step lattice paths from (0, r) to ($\ell$, s), which always stay between the x-axis and the line y = K, is given by

\[
\frac{2}{K+2} \sum_{k=1}^{K+1} \left( 2\cos\frac{\pi k}{K+2} + 1 \right)^{\ell} \sin\frac{\pi k (r+1)}{K+2} \, \sin\frac{\pi k (s+1)}{K+2}. \tag{56}
\]

Perfect Matchings and Tilings

In this section we consider the problem of counting the perfect matchings of a graph. For an introduction to the problem and to methods for solving it, as well as for a report on recent developments, we refer the reader to Propp (1999).

Let G = (V, E) be a finite loopless graph with vertex set V and edge set E. A matching (also called 1-factor in graph theory) is a subset of the edges with the property that no two edges share a vertex. A matching is perfect if it covers all the vertices. Let M(G) denote the number of perfect matchings of the graph G. More generally, we could assign a weight w(e) to each edge e of the graph and define the weight of a matching to be the product of the weights of all its edges. Let $M_w(G)$ denote the sum of the weights of all perfect matchings of the graph G.

Kasteleyn's method for determining M(G), respectively $M_w(G)$, makes use of determinants and Pfaffians. Recall that the Pfaffian Pf(A) of a triangular array $A = (a_{i,j})_{1 \le i < j \le 2n}$ is defined by

\[
\operatorname{Pf}(A) = \sum_{m} \operatorname{sgn} m \prod_{\{i,j\} \in m} a_{i,j}, \tag{57}
\]

where the sum is over all perfect matchings m of the complete graph on vertices {1, 2, . . . , 2n}, and where the product is over all edges {i, j}, i < j, of m. The sign sgn m of m is $(-1)^{\#\text{crossings of } m}$, where a crossing is a pair ({i, j}, {k, l}) of edges such that i < k < j < l. Usually, one extends the triangular array A to a matrix by setting $a_{j,i} = -a_{i,j}$, i < j, and $a_{i,i} = 0$ for all i. Then, abusing notation, we identify the triangular array with the skew-symmetric matrix $A = (a_{i,j})_{1 \le i,j \le 2n}$. The Pfaffian satisfies the following useful properties:

\[
\operatorname{Pf}(B^{t} A B) = \det(B) \operatorname{Pf}(A)
\]

and

\[
\operatorname{Pf}(A)^{2} = \det(A). \tag{58}
\]

The latter equality shows in particular that Pfaffians are very close to determinants. They do, in fact, generalize determinants since

\[
\operatorname{Pf} \begin{pmatrix} 0 & B \\ -B^{t} & 0 \end{pmatrix} = \pm \det(B) \tag{59}
\]

for any square matrix B.

Thus, given a graph with vertices $v_1, v_2, \dots, v_{2n}$, specializing $a_{i,j}$ to the weight of the edge between $v_i$ and $v_j$, if it exists, and setting $a_{i,j} = 0$ otherwise in the definition of the Pfaffian, we obtain almost $M_w(G)$. The only difference is that there could be signs in front of the individual terms of the sum, whereas in $M_w(G)$ the sign in front of each term must be +. (The object obtained by omitting the sign in [57] is called Hafnian. Unfortunately, in contrast to the Pfaffian, it does not have any nice properties and it is therefore extremely difficult to compute.) Kasteleyn's idea is to circumvent this problem by orienting the edges of the graph, defining signed weights of the edges, in such a way that the Pfaffian of the array with signed weights produces exactly $M_w(G)$.

More precisely, given a (weighted) graph G with vertices $v_1, v_2, \dots, v_{2n}$, we make it into an oriented (weighted) graph $\vec{G}$. That is, if there is an edge between $v_i$ and $v_j$, $e_{i,j}$ say, we orient it either from $v_i$ to $v_j$ or the other way. Now we define the signed adjacency matrix $A(\vec{G})$ of $\vec{G}$ by letting its (i, j)-entry be $w(e_{i,j})$ if there is an edge from $v_i$ to $v_j$ oriented that way, $-w(e_{i,j})$ if there is an edge from $v_j$ to $v_i$ oriented that way, and 0 if there is no edge between $v_i$ and $v_j$. Such an orientation is called Pfaffian if

\[
\operatorname{Pf}(A(\vec{G})) = \pm M_w(G).
\]

Clearly, the question remains whether a Pfaffian orientation can be found for a given graph. In general, this is an open question. However, Kasteleyn shows that for planar graphs such a Pfaffian orientation can always be found. Moreover, he shows that any orientation of a planar graph which has the property that around any face bounded by 4k edges an odd number of edges is oriented in either direction and that around any face bounded by 4k + 2 edges an even number of edges is oriented in either direction is Pfaffian.

For bipartite graphs (i.e., for graphs in which the set of vertices can be split into two disjoint sets such that every edge connects a vertex of one of these sets to a vertex of the other), the situation is even nicer. This is because for a bipartite graph G in which both parts of the bipartition of the vertices are of the same size (otherwise, there is no perfect matching), any signed
adjacency matrix $A(\vec{G})$ has the block form of the matrix on the left-hand side of [59] and, hence, the Pfaffian reduces to a determinant. More precisely, let G be a bipartite graph with vertex set $V = U \cup W$, $U = \{u_1, u_2, \dots, u_n\}$ and $W = \{w_1, w_2, \dots, w_n\}$, with edges connecting some $u_i$ to some $w_j$. Given a Pfaffian orientation $\vec{G}$, we build the signed bipartite adjacency matrix $B(\vec{G}) = (b_{i,j})_{1 \le i,j \le n}$ of $\vec{G}$ by setting $b_{i,j} = w(e_{i,j})$ if there is an edge from $u_i$ to $w_j$ oriented that way, $-w(e_{i,j})$ if there is an edge from $w_j$ to $u_i$ oriented that way, and 0 if there is no edge between $u_i$ and $w_j$. Then we have

\[
\det B(\vec{G}) = \pm M_w(G).
\]

In particular, this holds for any bipartite planar graph. See Robertson et al. (1999) for a structural description of which (not necessarily planar) bipartite graphs admit a Pfaffian orientation.

Kasteleyn's construction in the planar case has been generalized to graphs on surfaces of any genus g in Dolbilin et al. (1996), Galluccio and Loebl (1999), and Tesler (2000), independently. As predicted by Kasteleyn, the solution is in terms of a linear combination of $4^g$ Pfaffians.

With the help of his method, Kasteleyn computed the number of dimer coverings of an m x n rectangle. (A dimer is a 2 x 1 rectangle. Thus, this is equivalent to counting the number of perfect matchings on the m x n grid graph. The formula was independently found by Temperley and Fisher.) The result is

\[
\prod_{i=1}^{m} \prod_{j=1}^{n} \left| 2\cos\frac{\pi i}{m+1} + 2\sqrt{-1}\,\cos\frac{\pi j}{n+1} \right|^{1/2}.
\]

For even m and n, the formula can be rewritten as

\[
\prod_{i=1}^{m/2} \prod_{j=1}^{n/2} \left( 4\cos^{2}\frac{\pi i}{m+1} + 4\cos^{2}\frac{\pi j}{n+1} \right).
\]

There is a similar rewriting if one of m or n is odd. (If both m and n are odd, there is no dimer covering.)

For further reading and references see Dimer Problems and Kuperberg (1998).

Nonintersecting Paths

Let G = (V, E) be a directed acyclic graph with vertices V and directed edges E. Furthermore, we are given a function w which assigns a weight w(x) to every vertex or edge x. Let us define the weight w(P) of a walk P in the graph by $\prod_{e} w(e) \prod_{v} w(v)$, where the first product is over all edges e of the walk P and the second product is over all vertices v of P. We denote the set of all walks in G from u to v by $\mathcal{P}(u \to v)$, and the set of all families $(P_1, P_2, \dots, P_n)$ of walks, where $P_i$ runs from $u_i$ to $v_i$, i = 1, 2, . . . , n, by $\mathcal{P}(\mathbf{u} \to \mathbf{v})$, with $\mathbf{u} = (u_1, u_2, \dots, u_n)$ and $\mathbf{v} = (v_1, v_2, \dots, v_n)$. The symbol $\mathcal{P}^{+}(\mathbf{u} \to \mathbf{v})$ stands for the set of all families $(P_1, P_2, \dots, P_n)$ in $\mathcal{P}(\mathbf{u} \to \mathbf{v})$ with the additional property that no two walks share a vertex. We call such families of walk(er)s vicious walkers or, alternatively, nonintersecting paths. The weight $w(\mathbf{P})$ of a family $\mathbf{P} = (P_1, P_2, \dots, P_n)$ of walks is defined as the product $\prod_{i=1}^{n} w(P_i)$ of all the weights of the walks in the family. Finally, given a set M with weight function w, we write GF(M; w) for the generating function $\sum_{x \in M} w(x)$.

We need two further notations before we are able to state the Lindström-Gessel-Viennot theorem. (For references and historical remarks, we refer the reader to footnote 5 in Krattenthaler (2005a).) As earlier, the symbol $S_n$ denotes the symmetric group on n elements. Given a permutation $\sigma \in S_n$, we write $\mathbf{u}_\sigma$ for $(u_{\sigma(1)}, u_{\sigma(2)}, \dots, u_{\sigma(n)})$. Then

\[
\sum_{\sigma \in S_n} (\operatorname{sgn} \sigma) \cdot \mathrm{GF}(\mathcal{P}^{+}(\mathbf{u}_\sigma \to \mathbf{v}); w) = \det_{1 \le i,j \le n} \big( \mathrm{GF}(\mathcal{P}(u_j \to v_i); w) \big). \tag{60}
\]

Most often, this theorem is applied in the case where the only permutation $\sigma$ for which vicious walks exist is the identity permutation, so that the sum on the left-hand side reduces to a single term that counts all families $(P_1, P_2, \dots, P_n)$ of vicious walks, the ith walk $P_i$ running from $u_i$ to $v_i$, i = 1, 2, . . . , n. This case occurs, for example, if for any pair of walks (P, Q) with P running from $u_a$ to $v_d$ and Q running from $u_b$ to $v_c$, a < b and c < d, it is true that P and Q must have a common vertex. Explicitly, in that case we have

\[
\mathrm{GF}(\mathcal{P}^{+}(\mathbf{u} \to \mathbf{v}); w) = \det_{1 \le i,j \le n} \big( \mathrm{GF}(\mathcal{P}(u_j \to v_i); w) \big). \tag{61}
\]

If the starting points and/or the endpoints are not fixed, then the corresponding number is given by a Pfaffian, a result obtained by Okada and Stembridge (see Bressoud (1999) for references). For a set A of starting points, let $\mathcal{P}^{+}(A \to \mathbf{v})$ denote the set of all families $(P_1, P_2, \dots, P_{2n})$ of nonintersecting lattice paths, where $P_i$ runs from some point of A to $v_i$, i = 1, 2, . . . , 2n. Furthermore, let us suppose that the elements of $A = \{u_1, u_2, \dots\}$ are ordered in such a way that for any pair of walks (P, Q) with P running from $u_a$ to $v_d$ and Q running from $u_b$ to $v_c$, a < b and c < d, it is true that P and Q must have a common vertex. (This is the same condition as the one which makes [61] valid, with the only difference that, here,
the number of $u_i$'s could be larger than the number of $v_i$'s.) Then,

\[
\mathrm{GF}(\mathcal{P}^{+}(A \to \mathbf{v}); w) = \operatorname*{Pf}_{1 \le i,j \le 2n} \left( \sum_{a<b} \Big[ \mathrm{GF}(\mathcal{P}(u_a \to v_i); w)\, \mathrm{GF}(\mathcal{P}(u_b \to v_j); w) - \mathrm{GF}(\mathcal{P}(u_b \to v_i); w)\, \mathrm{GF}(\mathcal{P}(u_a \to v_j); w) \Big] \right). \tag{62}
\]

If the number of paths is odd, then one can use the same formula by adding an artificial point to the endpoints and to the set of starting points A. There is also a theorem by Okada and Stembridge which covers the case that starting points and endpoints vary. Refinements when the number of turns is fixed can be found in Krattenthaler (1997).

Vicious Walkers, Plane Partitions, Rhombus Tilings, and Fully Packed Loop Configurations

In this section we describe the interrelations between four frequently appearing objects in statistical mechanics and combinatorics: vicious walkers, plane partitions, rhombus tilings, and fully packed loop configurations.

Given a lattice, vicious walkers, as introduced by Fisher (1984), are particles which move on lattice sites in such a way that two particles never occupy the same lattice site. Models of vicious walkers have been the object of numerous studies from various points of view. Rather than attempting the impossible task of providing a complete overview of references, we refer the reader to the basic reference Fisher (1984) and to Krattenthaler (2005a) for further pointers to the literature.

Most of the known results apply to vicious walkers on the line. There are in fact two different models: in the random turns vicious walker model, n walkers move on the integral points of the real line in such a way that at each tick of the clock exactly one walker moves to the right or to the left, whereas in the lock step vicious walker model n walkers move on the integral points of the real line in such a way that at each tick of the clock each walker moves to the right or to the left.

The first model is equivalent to a model of one walker in $\mathbb{Z}^n$ ($\mathbb{Z}$ denoting the set of integers) which at each tick of the clock moves a positive or negative unit step in the direction of one of the coordinate axes, always staying in the wedge $x_1 > x_2 > \cdots > x_n$. This point of view was already put forward by Fisher (1984). However, this problem belongs to the problem of counting paths in chambers of reflection groups discussed in the section Lattice paths.

The second model could also be realized as a single walker model (cf. Krattenthaler (2003)). However, most often it is realized as a model of n paths in the plane consisting of steps (1, 1) and (1, -1) with the property that no two paths have a point in common. In this picture, the x-axis becomes the time line, the kth path doing an up-step (1, 1) from (t - 1, y) to (t, y + 1) meaning that the kth particle moves to the left at time t, whereas the kth path doing a down-step (1, -1) from (t - 1, y) to (t, y - 1) meaning that the kth particle moves to the right at time t.

The reader should consult Figure 14a for an example. (The labelings should be ignored at this point.) Clearly, what we encounter here is a particular instance of the nonintersecting paths of the last section. Therefore, for fixed starting points and endpoints, formula [61] applies, whereas if the starting points vary and the endpoints are fixed, it is formula [62] that applies.

Figure 14 (a) Vicious walkers. (b) A tableau. The tableau in (b) has rows 2 3 4 6 / 4 4 6 / 5 6.

At this point, the links to the other objects, semistandard tableaux and plane partitions (cf. Bressoud (1999)), emerge. A filling of the cells of the Ferrers diagram of $\lambda$ with elements of the set {1, 2, . . . }, which is weakly increasing along rows and strictly increasing along columns, is called a (semistandard) tableau of shape $\lambda$. Figure 14b shows such a semistandard tableau of shape (4, 3, 2). In fact, vicious walkers and semistandard tableaux are equivalent objects. To see this, first label down-steps by the x-coordinate of their endpoint, so that a step from (a - 1, b) to (a, b - 1) is labeled by a, see Figure 14a. Then, out of the labels of the jth path, form the jth column of the corresponding tableau,
see Figure 14b. The resulting array of numbers is indeed a semistandard tableau. This can be readily seen, since the entries are trivially strictly increasing along columns, and they are weakly increasing along rows because the paths do not touch each other. Thus, problems of enumerating vicious walkers can be translated into tableau enumeration problems, and vice versa.

The significance of semistandard tableaux lies particularly in the representation theory for classical groups, see Classical Groups and Homogeneous Spaces and Compact Groups and Their Representations. Namely, the irreducible characters for GL(n, C) and SL(n, C), the Schur functions, are generating functions for semistandard tableaux of a given shape. If the entries of the ith row of a semistandard tableau are required to be at least 2i - 1, then one speaks of symplectic tableaux, and the irreducible characters for Sp(2n, C) are generating functions for symplectic tableaux of a given shape. We refer the reader to Krattenthaler et al. (2000) for more information on these topics.

Objects which are very close to semistandard tableaux are plane partitions. According to MacMahon, a plane partition of shape $\lambda$ is a filling of the Ferrers diagram of $\lambda$ with non-negative integers which is weakly decreasing along rows and columns. See Figure 15b for an example of a plane partition of shape (3, 3, 3). In particular, semistandard tableaux and plane partitions of rectangular shape are actually equivalent. For, let T be a semistandard tableau of rectangular shape. Then, from each element of the ith row we subtract i. Finally, the obtained array is rotated by 180°. As a result, we obtain a plane partition. See Figure 15 for a semistandard tableau and a plane partition which correspond to each other under these transformations.

Figure 15 (a) A semistandard tableau. (b) A plane partition.

    (a)  1 1 2        (b)  2 2 1
         3 3 3             1 1 1
         4 5 5             1 0 0

On the other hand, plane partitions can also be realized as three-dimensional objects, by interpreting each entry in the array as a pile of unit cubes of the size of the entry. For example, the plane partition in Figure 15 corresponds to the pile of cubes in Figure 16a. But then, forgetting the three-dimensional view, by embedding the picture in a minimally bounding hexagon, and by filling the emerging empty regions by rhombi of unit length in the unique way this is possible, we obtain a rhombus tiling of a hexagon in which opposite sides have the same length, see Figure 16b.

Figure 16 (a) A plane partition; three-dimensional view. (b) A rhombus tiling.

From the rhombus tiling, there is then again an elegant way to go to nonintersecting paths: we mark the mid-points of the edges along two opposite sides, see Figure 17a. Now we draw lattice paths which connect points on different sides, by following along the other lozenges, as indicated in Figure 17a by the dashed lines. Clearly, the resulting paths are nonintersecting, that is, no two paths have a common vertex. If we slightly distort the underlying lattice, we get orthogonal paths with horizontal and vertical steps in the positive direction, see Figure 17b.

Figure 17 (a) A rhombus tiling. (b) A family of nonintersecting paths.

Rhombus tilings, on their part, are equivalent to perfect matchings of hexagonal graphs. To see this, one places the tiling on the underlying triangular grid, see Figure 18a. Then one places a bond into each rhombus, so that it connects the mid-points of the two triangles out of which the rhombus is composed, see Figure 18b. Finally, one forgets the contour of the tiling, but instead one introduces all the other edges which connect mid-points of adjacent triangles of the triangular grid, see Figure 18c. Thus, one arrives at a perfect matching of the hexagonal graph consisting of the edges connecting mid-points of triangles.

Because of these various connections, enumeration problems for vicious walkers, plane partitions, tableaux, rhombus tilings can be approached by the different methods which are available for the various objects: the determinant theorem from the section Nonintersecting paths, together with determinant evaluation techniques (cf. the survey Krattenthaler (2005b)), apply, as well as the Kasteleyn method from the section Perfect
matchings and tilings, and also methods from character theory for the classical groups. All of these methods have been applied extensively (see the surveys by Kenyon (2003), Propp (1999), and Krattenthaler et al. (2000)), the first and third more frequently for exact enumeration, while the second particularly for asymptotic studies. It should be noted that methods from random matrix theory also apply in certain situations, see Johansson (2002). See Growth Processes in Random Matrix Theory and Random Matrix Theory in Physics.

Figure 18 (a) A rhombus tiling. (b) Bonds in rhombi. (c) A perfect matching of a hexagonal graph.

In fact, we missed mentioning a further object, from statistical physics, which in some cases is equivalent to vicious walkers, etc.: fully packed loop configurations. (Fully packed loop configurations are in bijection with six-vertex configurations, see the next section.) If one imposes certain connectivity constraints on fully packed loop configurations, then one can construct bijections with rhombus tilings and, hence, with nonintersecting paths and with the other objects discussed in this section. The reader is referred to Di Francesco et al. (2004) and references therein.

Having explained the various connections, we cite some fundamental results in the area. (We refer the reader to Bressoud (1999) and Stanley (1999, chapter 7).) MacMahon proved that the number of all plane partitions contained in an a x b x c box (when viewed in three dimensions) is equal to

\[
\prod_{i=1}^{a} \prod_{j=1}^{b} \prod_{k=1}^{c} \frac{i+j+k-1}{i+j+k-2}. \tag{63}
\]

Thus, the number of rhombus tilings of a hexagon with side lengths a, b, c, a, b, c is given by the same number, as well as the number of all vicious walkers $(P_1, P_2, \dots, P_a)$, where $P_i$ runs from (0, 2i) to (b + c, b - c + 2i), i = 1, 2, . . . , a. More generally, the number of semistandard tableaux of shape $\lambda$ with entries at most m is given by the hook-content formula

\[
\prod_{u \in \lambda} \frac{c(u) + m}{h(u)}, \tag{64}
\]

where u ranges over all the cells of the Ferrers diagram of $\lambda$, with c(u) being the content of u, defined as the difference of the column number and the row number of u, and with h(u) being the hook length of u, defined as the number of cells in the hook of u, the latter consisting of the cells to the right of u in the same row and below u in the same column, including u. Thus, this also gives a formula for the number of all vicious walkers $(P_1, P_2, \dots, P_a)$, where $P_i$ runs from (0, 2i) to $(N, h_i)$. See Krattenthaler et al. (2000, section 2) for details. There it is also explained that a Schur function summation formula, together with an analog of the hook-content formula for special orthogonal characters, proves that the number of all vicious walkers $(P_1, P_2, \dots, P_a)$, where $P_i$ runs from (0, 2i) for N steps, is given by

\[
\prod_{1 \le i \le j \le N} \frac{a+i+j-1}{i+j-1}. \tag{65}
\]

The reader is referred to the references given in this section for many more results, in particular, on the enumeration of plane partitions with symmetry, the enumeration of rhombus tilings of regions other than hexagons, and the enumeration of vicious walkers with various starting points and endpoints, under various constraints.

Six-Vertex Model and Alternating-Sign Matrices

An alternating-sign matrix is a square matrix of 0s, 1s and -1s for which the sum of entries in each row and in each column is 1 and the nonzero entries of each row and of each column alternate in sign. For instance,

\[
\begin{pmatrix}
0 & 0 & 1 & 0 & 0 \\
1 & 0 & -1 & 1 & 0 \\
0 & 1 & 0 & -1 & 1 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0
\end{pmatrix}
\]
is a 5 x 5 alternating-sign matrix. Zeilberger proved that the number of n x n alternating-sign matrices is given by

\[
\prod_{i=0}^{n-1} \frac{(3i+1)!}{(n+i)!}, \tag{66}
\]

and he went on to prove the finer version that the number of n x n alternating-sign matrices with the (unique) 1 in the first row in position j is given by

\[
\frac{\binom{n+j-2}{n-1} \binom{2n-j-1}{n-1}}{\binom{3n-2}{n-1}} \prod_{i=0}^{n-1} \frac{(3i+1)!}{(n+i)!}. \tag{67}
\]

The first number is also equal to the number of totally symmetric self-complementary plane partitions contained in the (2n) x (2n) x (2n) box, but there is no intrinsic explanation why this is so. We refer the reader to Bressoud (1999) for an exposition of these results, and for pointers to the literature containing further unexplained connections between alternating-sign matrices and plane partitions.

While the first result was achieved by a brute-force constant-term approach, the second result is based on the observation that alternating-sign matrices are in bijection with configurations in the six-vertex model on the square grid under domain-wall boundary conditions. This then allowed one to use a formula due to Izergin for the partition function for these six-vertex configurations. Similar formulas for variations of the model have been found by Kuperberg, and by Razumov and Stroganov (see Razumov and Stroganov (2005) and references therein).

A configuration in the six-vertex model is an orientation of edges of a 4-regular graph (i.e., at each vertex there meet exactly four edges) such that at each vertex two edges are oriented towards the vertex and two are oriented away from the vertex. Thus, there are six possible vertex configurations, giving the name of the model, see Figure 19. To go from one object to the other, one uses the translation between local configurations at a vertex and entries in alternating-sign matrices indicated in the figure. An example of the correspondence can be found in Figure 20.

Figure 19 The six vertex configurations. The corresponding entries in alternating-sign matrices are 0, 0, 0, 0, 1, -1.

Figure 20 (a) An alternating-sign matrix. (b) A six-vertex configuration. The matrix in (a) has rows 0 1 0 / 1 -1 1 / 0 1 0.

Another manifestation of alternating-sign matrices and six-vertex configurations are fully packed loop configurations. A fully packed loop configuration on a graph is a collection of edges such that each vertex is incident to exactly two edges. One obtains a fully packed loop configuration out of a six-vertex configuration by dividing the square lattice into its even and odd sublattice denoted by A and B, respectively. Instead of arrows, only those edges are drawn that, on sublattice A, point inward and, on sublattice B, point outward. The reader is referred to de Gier (2005) and Di Francesco et al. (2004) for further reading.

The story of alternating-sign matrices and their connection to the six-vertex model is given a vivid account in Bressoud (1999), with further important results by Kuperberg, Okada, Razumov and Stroganov, referenced in Razumov and Stroganov (2005).

Fully packed loop configurations seem to play an important role in the explicit form of the ground-state vectors of certain Hamiltonians in the dense O(1) loop model. The corresponding conjectures are surveyed in de Gier (2005). There is important progress on these conjectures by Di Francesco and Zinn-Justin (2005, and references therein).

Binomial Sums and Hypergeometric Series

When dealing with enumerative problems, it is inevitable to deal with binomial sums, that is, sums in which the summands are products/quotients of binomial coefficients and factorials, such as, for example,

\[
\sum_{k=0}^{n} \binom{2k}{k} \binom{2n-2k}{n-k}.
\]

In most cases, the right environment in which one should work is the theory of (generalized) hypergeometric series. These are defined as follows:

\[
{}_rF_s\!\left[ \begin{matrix} a_1, \dots, a_r \\ b_1, \dots, b_s \end{matrix} ; z \right] = \sum_{k=0}^{\infty} \frac{(a_1)_k \cdots (a_r)_k}{(b_1)_k \cdots (b_s)_k} \frac{z^k}{k!},
\]

where $(\alpha)_k = \alpha(\alpha+1)(\alpha+2) \cdots (\alpha+k-1)$ for k > 0, and $(\alpha)_0 = 1$. The symbol $(\alpha)_k$ is called the Pochhammer symbol or shifted factorial. For in-depth treatments of the subject, we refer the reader
to Andrews et al. (1999), Gasper and Rahman (2004), and Slater (1966).

Hypergeometric series can be characterized as series in which the quotient of the (k + 1)st by the kth summand is a rational function in k. This is also the way to convert binomial sums into their hypergeometric form (respectively to see if this is possible; in most cases it is): form the quotient of the (k + 1)st by the kth summand and read off the parameters $a_1, \dots, a_r$, $b_1, \dots, b_s$, and the argument z from the factorization of the numerator and the denominator polynomials of the rational function, out of these form the corresponding hypergeometric series, and multiply the series by the summand for k = 0. This is, in fact, a completely routine task, and, indeed, computer algebra programs such as Maple and Mathematica do this automatically.

The reason why hypergeometric series are much more fundamental than the binomial sums themselves is that there are hundreds of ways to write the same sum using binomial coefficients and factorials, whereas there is just one hypergeometric form, that is, hypergeometric series are a kind of normal form for binomial sums. In particular, given a specific binomial sum, it is a hopeless enterprise to scan through all the identities available in the literature for this sum. There may be an identity for it, but perhaps written differently. On the contrary, given a specific hypergeometric series, the list of available identities which apply to this series is usually not large, and tables of such identities can be set up in a systematic way. This has been done (cf. Slater (1966); the most comprehensive table available to this date is contained in the manual of the Mathematica package HYP, see Further reading), and scanning through these tables is largely facilitated by the use of the Mathematica package HYP.

We give here some of the most important identities for hypergeometric series. Aside from the binomial theorem, the most important summation formulas are: the Gauß ${}_2F_1$-summation formula

\[
{}_2F_1\!\left[ \begin{matrix} a, b \\ c \end{matrix} ; 1 \right] = \frac{\Gamma(c)\,\Gamma(c-a-b)}{\Gamma(c-a)\,\Gamma(c-b)}
\]

provided $\Re(c-a-b) > 0$,
the Pfaff-Saalschütz summation formula

\[
{}_3F_2\!\left[ \begin{matrix} a, b, -n \\ c, 1+a+b-c-n \end{matrix} ; 1 \right] = \frac{(c-a)_n\,(c-b)_n}{(c)_n\,(c-a-b)_n}
\]

provided n is a non-negative integer, and
the Dougall summation formula

\[
{}_7F_6\!\left[ \begin{matrix} a, \frac{a}{2}+1, b, c, d, 1+2a-b-c-d+n, -n \\ \frac{a}{2}, 1+a-b, 1+a-c, 1+a-d, -a+b+c+d-n, a+1+n \end{matrix} ; 1 \right]
= \frac{(1+a)_n\,(1+a-b-c)_n\,(1+a-b-d)_n\,(1+a-c-d)_n}{(1+a-b)_n\,(1+a-c)_n\,(1+a-d)_n\,(1+a-b-c-d)_n}
\]

provided n is a non-negative integer.

Some of the most important transformation formulas are
the Euler transformation formula

\[
{}_2F_1\!\left[ \begin{matrix} a, b \\ c \end{matrix} ; z \right] = (1-z)^{c-a-b}\, {}_2F_1\!\left[ \begin{matrix} c-a, c-b \\ c \end{matrix} ; z \right]
\]

provided $|z| < 1$,
the Kummer transformation formula

\[
{}_3F_2\!\left[ \begin{matrix} a, b, c \\ d, e \end{matrix} ; 1 \right] = \frac{\Gamma(e)\,\Gamma(d+e-a-b-c)}{\Gamma(e-a)\,\Gamma(d+e-b-c)}\, {}_3F_2\!\left[ \begin{matrix} a, d-b, d-c \\ d, d+e-b-c \end{matrix} ; 1 \right]
\]

provided both series converge,
and the Whipple transformation formulas

\[
{}_4F_3\!\left[ \begin{matrix} a, b, c, -n \\ e, f, 1+a+b+c-e-f-n \end{matrix} ; 1 \right]
= \frac{(e-a)_n\,(f-a)_n}{(e)_n\,(f)_n}\, {}_4F_3\!\left[ \begin{matrix} -n, a, 1+a+c-e-f-n, 1+a+b-e-f-n \\ 1+a+b+c-e-f-n, 1+a-e-n, 1+a-f-n \end{matrix} ; 1 \right] \tag{68}
\]

where n is a non-negative integer, and

\[
{}_7F_6\!\left[ \begin{matrix} a, 1+\frac{a}{2}, b, c, d, e, -n \\ \frac{a}{2}, 1+a-b, 1+a-c, 1+a-d, 1+a-e, 1+a+n \end{matrix} ; 1 \right]
= \frac{(1+a)_n\,(1+a-d-e)_n}{(1+a-d)_n\,(1+a-e)_n}\, {}_4F_3\!\left[ \begin{matrix} 1+a-b-c, d, e, -n \\ 1+a-b, 1+a-c, -a+d+e-n \end{matrix} ; 1 \right] \tag{69}
\]

provided n is a non-negative integer.
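The conversion recipe and the summation formulas above lend themselves to a quick machine check. The following Python sketch is an illustration only (the function names are ours, not part of any package mentioned in the text). For the example sum $\sum_{k=0}^{n} \binom{2k}{k}\binom{2n-2k}{n-k}$, the quotient of the (k + 1)st by the kth summand works out to $(k+\frac12)(k-n) / ((k+1)(k+\frac12-n))$, so, assuming this factorization, the sum equals $\binom{2n}{n}\,{}_2F_1[\frac12, -n; \frac12-n; 1]$. The sketch compares this with the direct sum and with the value $4^n$, and it also checks the Pfaff-Saalschütz formula in exact rational arithmetic for one arbitrarily chosen set of parameters.

```python
from fractions import Fraction
from math import comb


def poch(x, k):
    """Pochhammer symbol (x)_k = x (x+1) ... (x+k-1), with (x)_0 = 1."""
    result = Fraction(1)
    for i in range(k):
        result *= x + i
    return result


def hyp_partial_sum(upper, lower, z, n_terms):
    """Exact sum of the first n_terms terms of rFs[upper; lower; z].

    For a terminating series (an upper parameter equal to -n) taking
    n_terms = n + 1 gives the full value of the series.
    """
    total = Fraction(0)
    for k in range(n_terms):
        term = Fraction(z) ** k / poch(Fraction(1), k)  # z^k / k!
        for a in upper:
            term *= poch(a, k)
        for b in lower:
            term /= poch(b, k)
        total += term
    return total


def binomial_sum(n):
    """Direct evaluation of sum_k C(2k, k) C(2n-2k, n-k)."""
    return sum(comb(2 * k, k) * comb(2 * n - 2 * k, n - k) for k in range(n + 1))


def binomial_sum_hypergeometric(n):
    """Same sum in hypergeometric form: C(2n, n) * 2F1[1/2, -n; 1/2 - n; 1]."""
    half = Fraction(1, 2)
    series = hyp_partial_sum([half, Fraction(-n)], [half - n], 1, n + 1)
    return comb(2 * n, n) * series


def pfaff_saalschuetz_sides(a, b, c, n):
    """Return both sides of the Pfaff-Saalschuetz summation formula."""
    lhs = hyp_partial_sum([a, b, Fraction(-n)], [c, 1 + a + b - c - n], 1, n + 1)
    rhs = poch(c - a, n) * poch(c - b, n) / (poch(c, n) * poch(c - a - b, n))
    return lhs, rhs


if __name__ == "__main__":
    for n in range(6):
        print(n, binomial_sum(n), binomial_sum_hypergeometric(n), 4 ** n)
    # Parameters chosen arbitrarily so that no denominator vanishes.
    a, b, c = Fraction(1, 2), Fraction(1, 3), Fraction(5, 7)
    for n in range(6):
        lhs, rhs = pfaff_saalschuetz_sides(a, b, c, n)
        print(n, lhs == rhs)
```

Exact fractions are used instead of floating point so that equality of the two sides can be tested directly. Such a check is of course only numerical evidence for particular parameter values, not a proof.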
Since about 1990, automatic tools have been available for the verification of binomial and hypergeometric series identities. The book by Petkovšek et al. (1996) is an excellent introduction to these aspects. The philosophy is as follows. Suppose we are given a binomial or hypergeometric series $S(n) = \sum_k F(n, k)$. The Gosper-Zeilberger algorithm (see Further reading) (cf. Petkovšek et al. (1996); a simplified version was presented in the reference Zeilberger in Further reading) will find a linear recurrence

\[
A_0(n) S(n) + A_1(n) S(n+1) + \cdots + A_d(n) S(n+d) = C(n) \tag{70}
\]

for some d, where the coefficients $A_i(n)$ are polynomials in n, and where C(n) is a certain function in n, with proof!

If, for example, we suspected that S(n) = RHS(n), where RHS(n) is some closed-form expression, then we just have to verify that RHS(n) satisfies the recurrence [70] and check S(n) = RHS(n) for sufficiently many initial values of n to have a proof for the identity S(n) = RHS(n) for all n. On the other hand, if RHS(n) was a different sum, then we would apply the algorithm to find a recurrence for RHS(n). If it turns out to be the same recurrence then, again, a check of S(n) = RHS(n) for a few initial values will provide a full proof of S(n) = RHS(n) for all n.

Even in the case that we do not have a conjectured expression RHS(n), this is not the end of the story. Given a recurrence of the type [70], the Petkovšek algorithm (see Further reading) (cf. Petkovšek et al. (1996)) is able to find a closed-form solution (where closed form has a precise meaning), respectively tell that there is no closed-form solution.

The fascinating point about both algorithms is that neither do we have to know what the algorithm does internally nor do we have to check that. For the Petkovšek algorithm, this is obvious anyway because, once the computer says that a certain expression is a solution of [70], it is a routine matter to check that. This is less obvious for the Gosper-Zeilberger algorithm. However, what the Gosper-Zeilberger algorithm does is, for a given sum $S(n) = \sum_k F(n, k)$, it finds polynomials $A_0(n), A_1(n), \dots, A_d(n)$ and an expression G(n, k) (which is, in fact, a rational multiple of F(n, k)), such that

\[
A_0(n) F(n, k) + A_1(n) F(n+1, k) + \cdots + A_d(n) F(n+d, k) = G(n, k+1) - G(n, k) \tag{71}
\]

for some d. Because of the properties of F(n, k) and G(n, k), which are part of the theory, this is an identity which can be directly verified by clearing all common factors and checking the remaining identity between rational functions in n and k. However, we may now sum both sides of [71] over k to obtain a recurrence of the form [70].

Algorithms for multiple sums are also available (see Further reading). They follow ideas by Wilf and Zeilberger (1992) (of which a simplified version is presented in a Mohammed and Zeilberger preprint (see Further reading)); however, they run into capacity problems more quickly. Schneider (2005) is currently developing a very promising new algorithmic approach to the automatic treatment of multisums. See q-Special Functions and Statistical Mechanics and Combinatorial Problems.

See also: Classical Groups and Homogeneous Spaces; Compact Groups and Their Representations; Dimer Problems; Growth Processes in Random Matrix Theory; Ordinary Special Functions; q-Special Functions; Saddle Point Problems; Statistical Mechanics and Combinatorial Problems.

Further Reading

http://algo.inria.fr This site includes, among its libraries, the Maple program gdev.
Andrews GE (1976) The Theory of Partitions, Encyclopedia of Mathematics and Its Applications, vol. 2. Reading: Addison-Wesley (reprinted by Cambridge University Press, Cambridge, 1998).
Andrews GE, Askey RA, and Roy R (1999) In: Rota GC (ed.) Special Functions, Encyclopedia of Mathematics and Its Applications, vol. 71. Cambridge: Cambridge University Press.
Ayoub R (1963) An Introduction to the Analytic Theory of Numbers. Mathematical Surveys, vol. 10. Providence, RI: American Mathematical Society.
Bergeron F, Labelle G, and Leroux P (1998) Combinatorial Species and Tree-Like Structures. Cambridge: Cambridge University Press.
Bousquet-Mélou M and Jehanne A (2005) Polynomial equations with one catalytic variable, algebraic series, and map enumeration. Preprint, arXiv:math.CO/0504018.
Bressoud DM (1999) Proofs and Confirmations: The Story of the Alternating Sign Matrix Conjecture. Cambridge: Cambridge University Press.
de Bruijn NG (1964) Pólya's theory of counting. In: Beckenbach EF (ed.) Applied Combinatorial Mathematics. New York: Wiley (reprinted by Krieger, Malabar, Florida, 1981).
Comtet L (1974) Advanced Combinatorics. Dordrecht: Reidel.
Dolbilin NP, Mishchenko AS, Shtanko MA, Shtogrin MI, and Zinoviev YuM (1996) Homological properties of dimer configurations for lattices on surfaces. Functional Analysis and its Application 30: 163-173.
Feller W (1957) An Introduction to Probability Theory and Its Applications, vol. 1, 2nd edn. New York: Wiley.
Fisher ME (1984) Walks, walls, wetting and melting. Journal of Statistical Physics 34: 667-729.
Flajolet P and Sedgewick R, Analytic Combinatorics, book project, available at http://algo.inria.fr.
Di Francesco P, Zinn-Justin P, and Zuber J-B (2004) Determinant formulae for some tiling problems and application to fully packed loops. Preprint, arXiv:math-ph/0410002.
Di Francesco P and Zinn-Justin P (2005) Quantum Knizhnik-Zamolodchikov equation, generalized Razumov-Stroganov sum rules and extended Joseph polynomials. Preprint, arXiv:math-ph/0508059.
576 Compact Groups and Their Representations

Galluccio A and Loebl M (1999) On the theory of Pfaffian orientations I. Perfect matchings and permanents. Electronic Journal of Combinatorics 6: Article #R6, 18 pp.
http://www.fmf.uni-lj.si website of Faculty of Mathematics of University of Ljubljana. A Mathematica implementation by Marko Petkovsek is available here.
Gasper G and Rahman M (2004) Basic Hypergeometric Series, 2nd edn. Encyclopedia of Mathematics and Its Applications, vol. 96. Cambridge: Cambridge University Press.
de Gier J (2005) Loops matchings and alternating-sign matrices. Discrete Mathematics 365–388.
Humphreys JE (1990) Reflection Groups and Coxeter Groups. Cambridge: Cambridge University Press.
Johansson K (2002) Non-intersecting paths, random tilings and random matrices. Probability Theory and Related Fields 123: 225–280.
Kenyon R (2003) An Introduction to the Dimer Model, Lecture Notes for a Short Course at the ICTP, 2002; arXiv:math.CO/0310326.
Koekoek R and Swarttouw RF, The Askey-scheme of hypergeometric orthogonal polynomials and its q-analogue, TU Delft, The Netherlands, 1998; on the www: http://aw.twi.tudelft.nl.
Krattenthaler C (1997) The enumeration of lattice paths with respect to their number of turns. In: Balakrishnan N (ed.) Advances in Combinatorial Methods and Applications to Probability and Statistics, pp. 29–58. Boston: Birkhauser.
Krattenthaler C (2003), Asymptotics for random walks in alcoves of affine Weyl groups. Preprint, arXiv:math.CO/0301203.
Krattenthaler C (2005a), Watermelon configurations with wall interaction: exact and asymptotic results. Preprint, arXiv:math.CO/0506323.
Krattenthaler C (2005b) Advanced determinant calculus: a complement. Linear Algebra Applications 411: 68–166.
Krattenthaler C, Guttmann AJ, and Viennot XG (2000) Vicious walkers, friendly walkers and Young tableaux II: with a wall. Journal of Physics A: Mathematical and General 33: 8835–8866.
Kuperberg G (1998) An exploration of the permanent-determinant method. Electronic Journal of Combinatorics 5: Article #R46, 34 pp.
Labelle G and Lamathe C (2004) A shifted asymmetry index series. Advances in Applied Mathematics 32: 576–608.
Mohammed M and Zeilberger D (2005) Multi-variable Zeilberger and Almkvist–Zeilberger algorithms and the sharpening of Wilf–Zeilberger theory. Advanced Applications in Mathematics (to appear).
Mohanty SG (1979) Lattice Path Counting and Applications. New York: Academic Press.
Odlyzko AM (1995) Asymptotic enumeration methods. In: Graham RL, Grotschel M, and Lovasz L (eds.) Handbook of Combinatorics, pp. 1063–1229. Amsterdam: Elsevier.
Pemantle R and Wilson MC, Twenty combinatorial examples of asymptotics derived from multivariate generating functions. Preprint, available at http://www.cs.auckland.ac.nz.
Petkovsek M, Wilf H, and Zeilberger D (1996) A = B. Wellesley: Peters AK.
http://www.mat.univie.ac.at Website of Faculty of Mathematics, University of Vienna. It provides the manual of the Mathematica package HYP.
Propp J (1999) Enumeration of matchings: problems and progress. In: Billera L, Bjorner A, Greene C, Simion R, and Stanley RP (eds.) New Perspectives in Algebraic Combinatorics, Mathematical Sciences Research Institute Publications, vol. 38, pp. 255–291. Cambridge: Cambridge University Press.
Razumov AV and Stroganov YG (2005) Enumeration of quarter-turn symmetric alternating-sign matrices of odd order. Preprint, arXiv:math-ph/0507003.
Robertson N, Seymour PD, and Thomas R (1999) Permanents, Pfaffian orientations, and even directed circuits. Annals of Mathematics 150(2): 929–975.
Schneider C (2005) A new Sigma approach to multi-summation. Advances in Applied Mathematics 34(4): 740–767.
Slater LJ (1966) Generalized Hypergeometric Functions. Cambridge: Cambridge University Press.
Stanley RP (1986) Enumerative Combinatorics, Pacific Grove, CA: Wadsworth & Brooks/Cole, (reprinted by Cambridge University Press, Cambridge, 1998).
Stanley RP (1999) Enumerative Combinatorics, vol. 2. Cambridge: Cambridge University Press.
Szegő G (1959) Orthogonal Polynomials, American Mathematical Society Colloquium Publications, vol. 23. New York. Providence RI: American Mathematical Society.
Tesler G (2000) Matchings in graphs on non-oriented surfaces. Journal of Combinatorial Theory Series B 78: 198–231.
http://www.risc.uni.linz.ac.at website of RISC (Research Institute for Symbolic Computation). Mathematica implementations written by Peter Paule and Markus Schorn, and Axel Riese and Kurt Wegschaider are available here.
http://www.math.rutgers.edu website of Department of Mathematics, Rutgers University. Computer implementations written by D Zeilberger are available here.
Viennot X and James W Heaps of segments, q-Bessel functions in square lattice enumeration and applications in quantum gravity. Preprint.
Wilf HS and Zeilberger D (1992) An algorithmic proof theory for hypergeometric (ordinary and "q") multisum/integral identities. Inventiones Mathematicae 108: 575–633.
Zeilberger D (2005) Deconstructing the Zeilberger algorithm. Journal of Difference Equations and Applications 11: 851–856.

Compact Groups and Their Representations


A Kirillov, University of Pennsylvania, Philadelphia, PA, USA
A Kirillov, Jr., Stony Brook University, Stony Brook, NY, USA
© 2006 Elsevier Ltd. All rights reserved.

In this article, we describe the structure and representation theory of compact Lie groups. Throughout the article, $G$ is a compact real Lie group with Lie algebra $\mathfrak{g}$. Unless otherwise stated, $G$ is assumed to be connected. The word "group" will always mean a Lie group and the word "subgroup" will mean a closed Lie subgroup. The notation Lie(H) stands for the Lie algebra of a Lie group $H$. We assume that the reader is familiar with the basic facts of the theory of Lie groups and Lie algebras, which can be found in Lie Groups: General Theory, or in the books listed in the bibliography.

Examples of Compact Lie Groups

Examples of compact groups include

• finite groups,
• quotient groups $\mathbb{T}^n = \mathbb{R}^n/\mathbb{Z}^n$, or more generally, $V/L$, where $V$ is a finite-dimensional real vector space and $L$ is a lattice in $V$, that is, a discrete subgroup generated by some basis in $V$; groups of this type are called tori, and it is known that every commutative connected compact group is a torus;
• unitary groups U(n) and special unitary groups SU(n), $n \ge 2$;
• orthogonal groups O(n) and SO(n), $n \ge 3$; and
• the groups $U(n, \mathbb{H})$, $n \ge 1$, of unitary quaternionic transformations, which are isomorphic to $Sp(n) := Sp(n, \mathbb{C}) \cap SU(2n)$.

The groups O(n) have two connected components, one of which is SO(n). The groups SU(n) and Sp(n) are connected and simply connected.

The groups SO(n) are connected but not simply connected: for $n \ge 3$, the fundamental group of SO(n) is $\mathbb{Z}_2$. The universal cover of SO(n) is a simply connected compact Lie group denoted by Spin(n). For small $n$, we have isomorphisms: $Spin(3) \simeq SU(2)$, $Spin(4) \simeq SU(2) \times SU(2)$, $Spin(5) \simeq Sp(2)$, and $Spin(6) \simeq SU(4)$.

Relation to Semisimple Lie Algebras and Lie Groups

Reductive Groups

A Lie algebra $\mathfrak{g}$ is called

• simple if it is nonabelian and has no ideals different from $\{0\}$ and $\mathfrak{g}$ itself;
• semisimple if it is a direct sum of simple ideals; and
• reductive if it is a direct sum of semisimple and commutative ideals.

We call a connected Lie group $G$ simple or semisimple if Lie(G) has this property.

Theorem 1 Let $G$ be a connected compact Lie group and $\mathfrak{g} = \mathrm{Lie}(G)$. Then
(i) The Lie algebra $\mathfrak{g} = \mathrm{Lie}(G)$ is reductive: $\mathfrak{g} = \mathfrak{a} \oplus \mathfrak{g}'$, where $\mathfrak{a}$ is abelian and $\mathfrak{g}' = [\mathfrak{g}, \mathfrak{g}]$ is semisimple.
(ii) The group $G$ can be written in the form $G = (A \times K)/Z$, where $A$ is a torus, $K$ is a connected, simply connected compact semisimple Lie group, and $Z$ is a finite central subgroup in $A \times K$.
(iii) If $G$ is simply connected, it is a product of simple compact Lie groups.

The proof of these results is based on the fact that the Killing form of $\mathfrak{g}$ is negative semidefinite.

Example 1 The group U(n) contains as the center the subgroup $C$ of scalar matrices. The quotient group $U(n)/C$ is simple and isomorphic to $SU(n)/\mathbb{Z}_n$. The presentation of Theorem 1 in this case is

    $U(n) = (\mathbb{T}^1 \times SU(n))/\mathbb{Z}_n = (C \times SU(n))/(C \cap SU(n))$

For the group SO(4) the presentation is $(SU(2) \times SU(2))/\{\pm(1 \times 1)\}$.

This theorem effectively reduces the study of the structure of connected compact groups to the study of simply connected compact simple Lie groups.

Complexification of a Compact Lie Group

Recall that for a real Lie algebra $\mathfrak{g}$, its complexification is $\mathfrak{g}_{\mathbb{C}} = \mathfrak{g} \otimes \mathbb{C}$ with obvious commutator. It is also well known that $\mathfrak{g}_{\mathbb{C}}$ is semisimple or reductive iff $\mathfrak{g}$ is semisimple or reductive, respectively. There is a subtlety in the case of simple algebras: it is possible that a real Lie algebra is simple, but its complexification $\mathfrak{g}_{\mathbb{C}}$ is only semisimple. However, this problem never arises for Lie algebras of compact groups: if $\mathfrak{g}$ is a Lie algebra of a real compact Lie group, then $\mathfrak{g}$ is simple if and only if $\mathfrak{g}_{\mathbb{C}}$ is simple.

The notion of complexification for Lie groups is more delicate.

Definition 1 Let $G$ be a connected real Lie group with Lie algebra $\mathfrak{g}$. A complexification of $G$ is a connected complex Lie group $G_{\mathbb{C}}$ (i.e., a complex manifold with a structure of a Lie group such that group multiplication is given by a complex analytic map $G_{\mathbb{C}} \times G_{\mathbb{C}} \to G_{\mathbb{C}}$), which contains $G$ as a closed subgroup, and such that $\mathrm{Lie}(G_{\mathbb{C}}) = \mathfrak{g}_{\mathbb{C}}$. In this case, we will also say that $G$ is a real form of $G_{\mathbb{C}}$.

It is not obvious why such a complexification exists at all; in fact, for an arbitrary real group it may not exist. However, for compact groups we do have the following theorem.

Theorem 2 Let $G$ be a connected compact Lie group. Then it has a unique complexification $G_{\mathbb{C}} \supset G$. Moreover, the following properties hold:
(i) The inclusion $G \subset G_{\mathbb{C}}$ is a homotopy equivalence. In particular, $\pi_1(G) = \pi_1(G_{\mathbb{C}})$ and the quotient space $G_{\mathbb{C}}/G$ is contractible.
(ii) Every complex finite-dimensional representation of $G$ can be uniquely extended to a complex analytic representation of $G_{\mathbb{C}}$.

Since the Lie algebra of a compact Lie group $G$ is reductive, we see that $G_{\mathbb{C}}$ must be reductive; if $G$ is semisimple or simple, then so is $G_{\mathbb{C}}$. The natural question is whether every complex reductive group can be obtained in this way. The following theorem gives a partial answer.

Theorem 3 Every connected complex semisimple Lie group $H$ has a compact real form: there is a compact real subgroup $K \subset H$ such that $H = K_{\mathbb{C}}$. Moreover, such a compact real form is unique up to conjugation.

Example 2
(i) The unitary group U(n) is a compact real form of the group $GL(n, \mathbb{C})$.
(ii) The orthogonal group SO(n) is a compact real form of the group $SO(n, \mathbb{C})$.
(iii) The group Sp(n) is a compact real form of the group $Sp(n, \mathbb{C})$.
(iv) The universal cover of $GL(n, \mathbb{C})$ has no compact real form.

These results have a number of important applications. For example, they show that the study of representations of a semisimple complex group $H$ can be replaced by the study of representations of its compact form; in particular, every representation is completely reducible (this argument is known as Weyl's unitary trick).

Classification of Simple Compact Lie Groups

Theorem 1 essentially reduces such classification to classification of simply connected simple compact groups, and Theorems 2 and 3 reduce it to the classification of simple complex Lie algebras. Since the latter is well known, we get the following result.

Theorem 4 Let $G$ be a connected, simply connected simple compact Lie group. Then $\mathfrak{g}_{\mathbb{C}}$ must be a simple complex Lie algebra and thus can be described by a Dynkin diagram of one of the following types: $A_n, B_n, C_n, D_n, E_6, E_7, E_8, F_4, G_2$.
Conversely, for each Dynkin diagram in the above list, there exists a unique, up to isomorphism, simply connected simple compact Lie group whose Lie algebra is described by this Dynkin diagram.

For types $A_n, \ldots, D_n$, the corresponding compact Lie groups are well-known classical groups shown in the table below:

    Type     A_n, n ≥ 1    B_n, n ≥ 2    C_n, n ≥ 3    D_n, n ≥ 4
    Group    SU(n+1)       Spin(2n+1)    Sp(n)         Spin(2n)

The restrictions on $n$ in this table are made to avoid repetitions which appear for small values of $n$. Namely, $A_1 = B_1 = C_1$, which gives $SU(2) = Spin(3) = Sp(1)$; $D_2 = A_1 \cup A_1$, which gives $Spin(4) = SU(2) \times SU(2)$; $B_2 = C_2$, which gives $Spin(5) = Sp(2)$; and $A_3 = D_3$, which gives $SU(4) = Spin(6)$. Other than that, all entries are distinct.

Exceptional groups $E_6, \ldots, G_2$ also admit explicit geometric and algebraic descriptions which are related to the exceptional nonassociative algebra $\mathbb{O}$ of the so-called octonions (or Cayley numbers). For example, the compact group of type $G_2$ can be defined as a subgroup of SO(7) which preserves an almost-complex structure on $S^6$. It can also be described as the subgroup of $GL(7, \mathbb{R})$ which preserves one quadratic and one cubic form, or, finally, as a group of all automorphisms of $\mathbb{O}$.

Maximal Tori

Main Properties

In this section, $G$ is a compact connected Lie group.

Definition 2 A maximal torus in $G$ is a maximal connected commutative subgroup $T \subset G$.

The following theorem lists the main properties of maximal tori.

Theorem 5
(i) For every element $g \in G$, there exists a maximal torus $T \ni g$.
(ii) Any two maximal tori in $G$ are conjugate.
(iii) If $g \in G$ commutes with all elements of a maximal torus $T$, then $g \in T$.
(iv) A connected subgroup $H \subset G$ is a maximal torus iff the Lie algebra Lie(H) is a maximal abelian subalgebra in Lie(G).

Example 3 Let G = U(n). Then the set $T$ of diagonal unitary matrices is a maximal torus in $G$; moreover, every maximal torus is of this form after a suitable unitary change of basis. In particular, this implies that every element in $G$ is conjugate to a diagonal matrix.

Example 4 Let G = SO(3). Then the set $D$ of diagonal matrices is a maximal commutative subgroup in $G$, but not a torus. Here $D$ consists of four elements and is not connected.

Maximal Tori and Cartan Subalgebras

The study of maximal tori in compact Lie groups is closely related to the study of Cartan subalgebras in reductive complex Lie algebras. For convenience of readers, we briefly recall the appropriate definitions

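here; details can be found in Serre (2001) or in Lie Groups: General Theory.

Definition 3 Let $\mathfrak{a}$ be a complex reductive Lie algebra. A Cartan subalgebra $\mathfrak{h} \subset \mathfrak{a}$ is a maximal commutative subalgebra consisting of semisimple elements.

Note that for general Lie algebras a Cartan subalgebra is defined in a different way; however, for reductive algebras the definition given above is equivalent to the standard one.

A choice of a Cartan subalgebra gives rise to the so-called root decomposition: if $\mathfrak{h} \subset \mathfrak{a}$ is a Cartan subalgebra in a complex reductive Lie algebra, then we can write

    $\mathfrak{a} = \mathfrak{h} \oplus \Bigl(\bigoplus_{\alpha \in R} \mathfrak{a}_\alpha\Bigr)$    [1]

where

    $\mathfrak{a}_\alpha = \{x \in \mathfrak{a} \mid \mathrm{ad}\,h.x = \langle \alpha, h \rangle x \ \ \forall h \in \mathfrak{h}\}$
    $R = \{\alpha \in \mathfrak{h}^* \setminus \{0\} \mid \mathfrak{a}_\alpha \ne 0\} \subset \mathfrak{h}^*$

The set $R$ is called the root system of $\mathfrak{a}$ with respect to Cartan subalgebra $\mathfrak{h}$; elements $\alpha \in R$ are called roots. We will also frequently use elements $\alpha^\vee \in \mathfrak{h}$ defined by $\langle \alpha^\vee, \beta \rangle = 2(\alpha, \beta)/(\alpha, \alpha)$, where $( \cdot , \cdot )$ is a nondegenerate invariant bilinear form on $\mathfrak{a}^*$ and $\langle \cdot , \cdot \rangle$ is the pairing between $\mathfrak{a}$ and $\mathfrak{a}^*$. It can be shown that so defined $\alpha^\vee$ does not depend on the choice of the form $( \cdot , \cdot )$.

Theorem 6 Let $G$ be a connected compact Lie group with Lie algebra $\mathfrak{g}$, and let $T \subset G$ be a maximal torus in $G$, $\mathfrak{t} = \mathrm{Lie}(T) \subset \mathfrak{g}$. Let $\mathfrak{g}_{\mathbb{C}}$, $G_{\mathbb{C}}$ be the complexification of $\mathfrak{g}$, $G$ as in Theorem 2.
Let $\mathfrak{h} = \mathfrak{t}_{\mathbb{C}} \subset \mathfrak{g}_{\mathbb{C}}$. Then $\mathfrak{h}$ is a Cartan subalgebra in $\mathfrak{g}_{\mathbb{C}}$, and the corresponding root system $R \subset i\mathfrak{t}^*$. Conversely, every Cartan subalgebra in $\mathfrak{g}_{\mathbb{C}}$ can be obtained as $\mathfrak{t}_{\mathbb{C}}$ for some maximal torus $T \subset G$.

Weights and Roots

Let $G$ be semisimple. Recall that the root lattice $Q \subset i\mathfrak{t}^*$ is the abelian group generated by roots $\alpha \in R$, and let the coroot lattice $Q^\vee \subset i\mathfrak{t}$ be the abelian group generated by coroots $\alpha^\vee$, $\alpha \in R$. Define also the weight and coweight lattices by

    $P = \{\lambda \mid \langle \alpha^\vee, \lambda \rangle \in \mathbb{Z} \ \ \forall \alpha \in R\} \subset i\mathfrak{t}^*$
    $P^\vee = \{t \mid \langle t, \alpha \rangle \in \mathbb{Z} \ \ \forall \alpha \in R\} \subset i\mathfrak{t}$,

where $\langle \cdot , \cdot \rangle$ is the pairing between $\mathfrak{t}$ and the dual vector space $\mathfrak{t}^*$.

It follows from the definition of root system that we have inclusions

    $Q \subset P \subset i\mathfrak{t}^*$
    $Q^\vee \subset P^\vee \subset i\mathfrak{t}$    [2]

Both $P$, $Q$ are lattices in $i\mathfrak{t}^*$; thus, the index $(P : Q)$ is finite. It can be computed explicitly: if $\alpha_i$ is a basis of the root system, then the fundamental weights $\omega_i$ defined by

    $\langle \alpha_i^\vee, \omega_j \rangle = \delta_{ij}$

form a basis of $P$. The simple roots $\alpha_i$ are related to fundamental weights $\omega_j$ by the Cartan matrix $A$: $\alpha_i = \sum_j A_{ij}\,\omega_j$. Therefore, $(P : Q) = (P^\vee : Q^\vee) = |\det A|$.

Definitions of $P$, $Q$, $P^\vee$, $Q^\vee$ also make sense when $\mathfrak{g}$ is reductive but not semisimple. However, in this case they are no longer lattices: $\mathrm{rk}\,Q < \dim \mathfrak{t}^*$, and $P$ is not discrete.

We can now give more precise information about the structure of the maximal torus.

Lemma 1 Let $T$ be a compact connected commutative Lie group, and $\mathfrak{t} = \mathrm{Lie}(T)$ its Lie algebra. Then the exponential map is surjective and the preimage of the unit is a lattice $L \subset \mathfrak{t}$. There is an isomorphism of Lie groups

    $\exp : \mathfrak{t}/L \to T$

In particular, $T \simeq \mathbb{R}^r/\mathbb{Z}^r = \mathbb{T}^r$, $r = \dim T$.

Let $X(T) \subset i\mathfrak{t}^*$ be the lattice dual to $(2\pi i)^{-1} L$:

    $X(T) = \{\lambda \in i\mathfrak{t}^* \mid \langle \lambda, l \rangle \in 2\pi i\,\mathbb{Z} \ \ \forall l \in L\}$    [3]

It is called the character lattice for $T$ (see the subsection "Examples of representations").

Theorem 7 Let $G$ be a compact connected Lie group, and let $T \subset G$ be a maximal torus in $G$. Then $Q \subset X(T) \subset P$. Moreover, the group $G$ is uniquely determined by the Lie algebra $\mathfrak{g}$ and the lattice $X(T) \subset i\mathfrak{t}^*$, which can be any lattice between $Q$ and $P$.

Corollary For a given complex semisimple Lie algebra $\mathfrak{a}$, there are only finitely many (up to isomorphism) compact connected Lie groups $G$ with $\mathfrak{g}_{\mathbb{C}} = \mathfrak{a}$.

The largest of them is the simply connected group, for which $T = \mathfrak{t}/2\pi i Q^\vee$, $X(T) = P$; the smallest is the so-called adjoint group, for which $T = \mathfrak{t}/2\pi i P^\vee$, $X(T) = Q$.

Example 5 Let G = U(n). Then $i\mathfrak{t}$ = {real diagonal matrices}. Choosing the standard basis of matrix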

units in $i\mathfrak{t}$, we identify $i\mathfrak{t} \simeq \mathbb{R}^n$, which also allows us to identify $i\mathfrak{t}^* \simeq \mathbb{R}^n$. Under this identification,

    $Q = \{(\lambda_1, \ldots, \lambda_n) \mid \lambda_i \in \mathbb{Z}, \ \sum \lambda_i = 0\}$
    $P = \{(\lambda_1, \ldots, \lambda_n) \mid \lambda_i \in \mathbb{R}, \ \lambda_i - \lambda_j \in \mathbb{Z}\}$
    $X(T) = \mathbb{Z}^n$

Note that $Q$, $P$ are not lattices: $Q \simeq \mathbb{Z}^{n-1}$, $P \simeq \mathbb{R} \times \mathbb{Z}^{n-1}$.

Now let G = SU(n). Then $i\mathfrak{t}^* = \mathbb{R}^n/\mathbb{R}\cdot(1, \ldots, 1)$, and $Q$, $P$ are the images of $Q$, $P$ for G = U(n) in this quotient. In this quotient they are lattices, and $(P : Q) = n$. The character lattice in this case is $X(T) = P$, since SU(n) is simply connected. The adjoint group is $PSU(n) = SU(n)/C$, where $C = \{\lambda \cdot \mathrm{id} \mid \lambda^n = 1\}$ is the center of SU(n).

Weyl Group

Let us fix a maximal torus $T \subset G$. Let $N(T) \subset G$ be the normalizer of $T$ in $G$: $N(T) = \{g \in G \mid gTg^{-1} = T\}$. For any $g \in N(T)$ the transformation $A(g) : t \mapsto gtg^{-1}$ is an automorphism of $T$. According to Theorem 5, this automorphism is trivial iff $g \in T$. So in fact, it is the quotient group $N(T)/T$ which acts on $T$.

Definition 4 The group $W = N(T)/T$ is called the Weyl group of $G$.

Since the Weyl group acts faithfully on $\mathfrak{t}$ and $\mathfrak{t}^*$, it is common to consider $W$ as a subgroup in $GL(\mathfrak{t}^*)$. It is known that $W$ is finite.

The Weyl group can also be defined in terms of the Lie algebra $\mathfrak{g}$ and its complexification $\mathfrak{g}_{\mathbb{C}}$.

Theorem 8 The Weyl group coincides with the subgroup in $GL(i\mathfrak{t}^*)$ generated by reflections $s_\alpha : x \mapsto x - \frac{2(\alpha, x)}{(\alpha, \alpha)}\,\alpha$, $\alpha \in R$, where, as before, $( \cdot , \cdot )$ is a nondegenerate invariant bilinear form on $\mathfrak{g}^*$.

Theorem 9
(i) Two elements $t_1, t_2 \in T$ are conjugate in $G$ iff $t_2 = w(t_1)$ for some $w \in W$.
(ii) There exists a natural homeomorphism of quotient spaces $G/\mathrm{Ad}\,G \simeq T/W$, where $\mathrm{Ad}\,G$ stands for the action of $G$ on itself by conjugation. (Note, however, that these quotient spaces are not manifolds: they have singularities.)
(iii) Let us call a function $f$ on $G$ central if $f(hgh^{-1}) = f(g)$ for any $g, h \in G$. Then the restriction map gives an isomorphism

    {continuous central functions on $G$} $\simeq$ {$W$-invariant continuous functions on $T$}

Example 6 Let G = U(n). The set of diagonal unitary matrices is a maximal torus, and the Weyl group is the symmetric group $S_n$ acting on diagonal matrices by permutations of entries. In this case, Theorem 9 shows that if $f(U)$ is a central function of a unitary matrix, then $f(U) = \tilde{f}(\lambda_1, \ldots, \lambda_n)$, where $\lambda_i$ are eigenvalues of $U$ and $\tilde{f}$ is a symmetric function in $n$ variables.

Representations of Compact Groups

Basic Notions

By a representation of $G$ we understand a pair $(\pi, V)$, where $V$ is a complex vector space and $\pi$ is a continuous homomorphism $G \to \mathrm{Aut}(V)$. This notation is often shortened to $\pi$ or $V$. In this article, we only consider finite-dimensional (f.d.) representations; in this case, the homomorphism $\pi$ is automatically smooth and even real-analytic.

We associate to any f.d. representation $(\pi, V)$ of $G$ the representation $(\pi_*, V)$ of the Lie algebra $\mathfrak{g} = \mathrm{Lie}(G)$ which is just the derivative of the map $\pi : G \to \mathrm{Aut}\,V$ at the unit point $e \in G$. In terms of the exponential map, we have the following commutative diagram:

    $\begin{array}{ccc} G & \xrightarrow{\ \pi\ } & \mathrm{Aut}\,V \\ {\scriptstyle \exp}\uparrow & & \uparrow{\scriptstyle \exp} \\ \mathfrak{g} & \xrightarrow{\ \pi_*\ } & \mathrm{End}\,V \end{array}$

Choosing a basis in $V$, we can write the operators $\pi(g)$ and $\pi_*(X)$ in matrix form and consider $\pi$ and $\pi_*$ as matrix-valued functions on $G$ and $\mathfrak{g}$. The diagram above means that

    $\pi(\exp X) = e^{\pi_*(X)}$    [4]

Recall that if $G$ is connected, simply connected, then every representation of $\mathfrak{g}$ can be uniquely lifted to a representation of $G$. Thus, classification of representations of connected simply connected Lie groups is equivalent to the classification of representations of Lie algebras.

Let $(\pi_1, V_1)$ and $(\pi_2, V_2)$ be two representations of the same group $G$. An operator $A \in \mathrm{Hom}(V_1, V_2)$ is called an intertwining operator, or simply an intertwiner, if $A \circ \pi_1(g) = \pi_2(g) \circ A$ for all $g \in G$. Two representations are called equivalent if they admit an invertible intertwiner. In this case, using an appropriate choice of bases, we can write $\pi_1$ and $\pi_2$ by the same matrix-valued function.

Let $(\pi, V)$ be a representation of $G$. If all operators $\pi(g)$, $g \in G$, preserve a subspace $V_1 \subset V$, then the restrictions $\pi_1(g) = \pi(g)|_{V_1}$ define a subrepresentation $(\pi_1, V_1)$ of $(\pi, V)$. In this case, the quotient space $V_2 = V/V_1$ also has a canonical structure of a representation, called the quotient representation.
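The notions of subrepresentation and quotient representation can be checked by hand on a small compact group. The following minimal sketch uses the finite group $S_3$ (finite groups are compact) acting on $\mathbb{C}^3$ by permuting coordinates; this example and all names in it (`act`, `compose`, `in_line`, and so on) are illustrative assumptions, not taken from the article. It verifies that the line $V_1$ spanned by (1, 1, 1) is preserved by every operator, that the action on cosets of $V_1$ is well defined, and that the sum-zero plane is an invariant complement.

```python
import itertools

# Elements of S_3 as tuples p, where p[i] is the image of i.
G = list(itertools.permutations(range(3)))


def act(p, v):
    """Permutation representation pi(p): the basis vector e_i goes to e_{p[i]}."""
    w = [0] * len(v)
    for i, vi in enumerate(v):
        w[p[i]] = vi
    return w


def compose(p, q):
    """Group product: (p * q)[i] = p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(q)))


def in_line(v):
    """Membership test for V_1 = span{(1, 1, 1)}."""
    return v[0] == v[1] == v[2]


def difference(u, v):
    return [a - b for a, b in zip(u, v)]


standard_basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# pi is a homomorphism: pi(pq) = pi(p) pi(q), checked on a basis.
is_homomorphism = all(
    act(compose(p, q), e) == act(p, act(q, e))
    for p in G for q in G for e in standard_basis
)

# Every pi(g) preserves V_1, so V_1 carries a subrepresentation.
line_invariant = all(in_line(act(p, [1, 1, 1])) for p in G)

# The action on V / V_1 is well defined: two vectors that differ by an
# element of V_1 still differ by an element of V_1 after applying pi(g).
quotient_well_defined = all(
    in_line(difference(act(p, [x + 5 for x in e]), act(p, e)))
    for p in G for e in standard_basis
)

# The sum-zero plane is an invariant complement of V_1.
plane_invariant = all(
    sum(act(p, v)) == 0 for p in G for v in ([1, -1, 0], [0, 1, -1])
)

print(is_homomorphism, line_invariant, quotient_well_defined, plane_invariant)
```

Integer arithmetic is used throughout, so all four checks are exact rather than approximate.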

A representation $(\pi, V)$ is called reducible if it has a nontrivial (different from $V$ and $\{0\}$) subrepresentation. Otherwise it is called irreducible.

We call a representation $(\pi, V)$ unitary if $V$ is a Hilbert space and all operators $\pi(g)$, $g \in G$, are unitary, that is, given by unitary matrices in any orthonormal basis. We use the short term "unirrep" for a unitary irreducible representation.

Main Theorems

The following simple but important result was one of the first discoveries in representation theory. It holds for representations of any group, not necessarily compact.

Theorem 10 (Schur lemma). Let $(\pi_i, V_i)$, $i = 1, 2$, be any two irreducible finite-dimensional representations of the same group $G$. Then any intertwiner $A : V_1 \to V_2$ is either invertible or zero.

Corollary 1 If $V$ is an irreducible f.d. representation, then any intertwiner $A : V \to V$ is scalar: $A = c \cdot \mathrm{id}$, $c \in \mathbb{C}$.

Corollary 2 Every irreducible representation of a commutative group is one dimensional.

The following theorem is one of the fundamental results of the representation theory of compact groups. Its proof is based on the technique of invariant integrals on a compact group, which will be discussed in the next section.

Theorem 11
(i) Any f.d. representation of a compact group is equivalent to a unitary representation.
(ii) Any f.d. representation is completely reducible: it can be decomposed into a direct sum

    $V = \bigoplus n_i V_i$

where $V_i$ are pairwise nonequivalent unirreps. Numbers $n_i \in \mathbb{Z}_+$ are called multiplicities.

Examples of Representations

The representation theory looks rather different for abelian (i.e., commutative) and nonabelian groups. Here we consider the two simplest examples of both kinds.

Our first example is a one-dimensional compact connected Lie group. Topologically, it is a circle which we realize as a set $T \simeq U(1)$ of all complex numbers $t$ with absolute value 1.

Every unirrep of $T$ is one dimensional; thus, it is just a continuous multiplicative map $\pi$ of $T$ to itself. It is well known that every such map has the form

    $\pi_k(t) = t^k$ for some $k \in \mathbb{Z}$

The collection of all unirreps of $T$ is itself a group, called the Pontrjagin dual of $T$ and denoted by $\widehat{T}$. This group is isomorphic to $\mathbb{Z}$.

By Theorem 11, any f.d. representation $\pi$ of $T$ is equivalent to a direct sum of one-dimensional unirreps. So, an equivalence class of $\pi$ is defined by the multiplicity function $\mu$ on $\widehat{T} = \mathbb{Z}$ taking non-negative values:

    $\pi \simeq \sum_{k \in \mathbb{Z}} \mu(k) \cdot \pi_k$

The many-dimensional case of a compact connected abelian Lie group can be treated in a similar way. Let $T$ be a torus, that is, an abelian compact connected group, $\mathfrak{t} = \mathrm{Lie}(T)$. Then every irreducible representation of $T$ is one dimensional and thus is defined by a group homomorphism $\chi : T \to \mathbb{T}^1 = U(1)$. Such homomorphisms are called characters of $T$. One easily sees that such characters themselves form a group (the Pontrjagin dual of $T$). If we denote by $L$ the kernel of the exponential map $\mathfrak{t} \to T$ (see Lemma 1), one easily sees that every character has the form

    $\chi_\lambda(\exp(t)) = e^{\langle t, \lambda \rangle}, \quad t \in \mathfrak{t}, \ \lambda \in X(T)$

where $X(T) \subset i\mathfrak{t}^*$ is the lattice defined by [3]. Thus, we can identify the group of characters $\widehat{T}$ with $X(T)$. In particular, this shows that $\widehat{T} \simeq \mathbb{Z}^{\dim T}$.

The second example is the group G = SU(2), the simplest connected, simply connected nonabelian compact Lie group. Topologically, $G$ is a three-dimensional sphere since the general element of $G$ is a matrix of the form

    $g = \begin{pmatrix} a & b \\ -\bar{b} & \bar{a} \end{pmatrix}, \quad a, b \in \mathbb{C}, \quad |a|^2 + |b|^2 = 1$

Let $V$ be a two-dimensional complex vector space, realized by column vectors $\binom{u}{v}$. The group $G$ acts naturally on $V$. This action induces the representation $\Pi$ of $G$ in the space $S(V)$ of all polynomials in $u$, $v$. It is infinite dimensional, but has many f.d. subrepresentations. In particular, let $S^k(V)$, or simply $S^k$, be the space of all homogeneous polynomials of degree $k$. Clearly, $\dim S^k = k + 1$.

It turns out that the corresponding f.d. representations $(\Pi_k, S^k)$, $k \ge 0$, are irreducible, pairwise nonequivalent, and exhaust the set $\widehat{G}$ of all unirreps.

Some particular cases are of special interest:

1. $k = 0$. The space $V_0 = S^0$ consists of constant functions and $\Pi_0$ is the trivial one-dimensional representation: $\Pi_0(g) \equiv 1$.
2. $k = 1$. The space $V_1 = S^1$ is identical to $V$ and $\Pi_1$ is just the tautological representation $\pi(g) \equiv g$.
3. $k = 2$. The space $V_2 = S^2$ is spanned by monomials $u^2$, $uv$, $v^2$. The remarkable fact is that this

representation is equivalent to a real one. Namely, in the new basis

    $x = \dfrac{u^2 + v^2}{2}, \quad y = \dfrac{u^2 - v^2}{2i}, \quad z = iuv$

we have, with rows and columns written in the order $x, z, y$ and with $G$ acting on polynomials by $(\Pi(g)f)(w) = f(g^{-1}w)$,

    $\Pi_2\begin{pmatrix} a & b \\ -\bar{b} & \bar{a} \end{pmatrix} = \begin{pmatrix} \mathrm{Re}(a^2 + b^2) & 2\,\mathrm{Im}(ab) & \mathrm{Im}(b^2 - a^2) \\ 2\,\mathrm{Im}(a\bar{b}) & |a|^2 - |b|^2 & 2\,\mathrm{Re}(a\bar{b}) \\ \mathrm{Im}(a^2 + b^2) & -2\,\mathrm{Re}(ab) & \mathrm{Re}(a^2 - b^2) \end{pmatrix}$

This formula defines a homomorphism $\Pi_2 : SU(2) \to SO(3)$. It can be shown that this homomorphism is surjective, and its kernel is the subgroup $\{\pm 1\} \subset SU(2)$:

    $1 \to \{\pm 1\} \hookrightarrow SU(2) \xrightarrow{\ \Pi_2\ } SO(3) \to 1$

The simplest way to see it is to establish the equivalence of $\Pi_2$ with the adjoint representation of $G$ in $\mathfrak{g}$. The corresponding intertwiner is

    S2 3  i u2 2iuv
    2 i  i
     i v ! 2g
     i i

Note that SU(2) and SO(3) are the only compact groups associated with the Lie algebra $sl(2, \mathbb{C})$.

The group $G$ contains the subgroup $H$ of diagonal matrices, isomorphic to $\mathbb{T}^1$. Consider the restriction of $\Pi_n$ to $\mathbb{T}^1$. It splits into the sum of unirreps $\pi_k$ as follows:

    $\mathrm{Res}^G_{\mathbb{T}^1}\,\Pi_n = \sum_{s=0}^{n} \pi_{n-2s}$

The characters $\pi_k$ which enter this decomposition are called the weights of $\Pi_n$. The collection of all weights (together with multiplicities) forms a multiset in $\widehat{T}$ denoted by $P(\Pi_n)$ or $P(S^n)$.

Note the following features of this multiset:

1. $P(\Pi_n)$ is invariant under reflection $k \mapsto -k$.
2. All weights of $\Pi_n$ are congruent modulo 2.
3. The nonequivalent unirreps have different multisets of weights.

Below we show how these features are generalized to all compact connected Lie groups.

Fourier Transform

Haar Measure and Invariant Integral

The important feature of compact groups is the existence of the so-called "invariant integral", or "average".

Theorem 12 For every compact Lie group $G$, there exists a unique measure $dg$ on $G$, called Haar measure, which is invariant under left shifts $L_g : h \mapsto gh$ and satisfies $\int_G dg = 1$.
In addition, this measure is also invariant under right shifts $h \mapsto hg$ and under involution $h \mapsto h^{-1}$.

Invariance of the Haar measure implies that for every integrable function $f(g)$, we have

    $\int_G f(g)\,dg = \int_G f(hg)\,dg = \int_G f(gh)\,dg = \int_G f(g^{-1})\,dg$

For a finite group $G$, the integral with respect to the Haar measure is just averaging over the group:

    $\int_G f(g)\,dg = \dfrac{1}{|G|} \sum_{g \in G} f(g)$

For compact connected Lie groups, the Haar measure is given by a differential form of top degree which is invariant under right and left translations.

For a torus $\mathbb{T}^n = \mathbb{R}^n/\mathbb{Z}^n$ with real coordinates $\theta_k \in \mathbb{R}/\mathbb{Z}$ or complex coordinates $t_k = e^{2\pi i \theta_k}$, the Haar measure is $d^n\theta := d\theta_1\, d\theta_2 \cdots d\theta_n$ or

    $d^n t := \prod_{k=1}^{n} \dfrac{dt_k}{2\pi i\, t_k}$

In particular, consider a central function $f$ (see Theorem 9). Since every conjugacy class contains elements of the maximal torus $T$ (see Theorem 5), such a function is determined by its values on $T$, and the integral of a central function can be reduced to integration over $T$. The resulting formula is called the Weyl integration formula. For G = U(n) it looks as follows:

    $\int_{U(n)} f(g)\,dg = \dfrac{1}{n!} \int_T f(t) \prod_{i<j} |t_i - t_j|^2 \, d^n t$

where $T$ is the maximal torus consisting of diagonal matrices

    $t = \mathrm{diag}(t_1, \ldots, t_n), \quad t_k = e^{2\pi i \theta_k}$

and $d^n t$ is defined above.

The Weyl integration formula for an arbitrary compact group $G$ can be found in Simon (1996) or Bump (2004, section 18).

The main applications of the Haar measure are the proof of the complete reducibility theorem (Theorem 11) and orthogonality relations (see below).

Orthogonality Relations and Peter–Weyl Theorem

Let $V_1$, $V_2$ be unirreps of a compact group $G$. Taking any linear operator $A : V_1 \to V_2$ and averaging the expression $A(g) := \pi_2(g^{-1}) \circ A \circ \pi_1(g)$ over

R
G, we get an intertwining operator hAi = G A(g)dg. b as the space
We introduce the Hilbert space L2 (G)
Comparing this fact with the Schur lemma, one b whose value at a point
of matrix-valued functions on G
obtains the following fundamental results. 2G b belongs to Matd() (C). The norm is defined as
Let (, V) be any unirrep of a compact group G. X
Choose any orthonormal basis {vk , 1  k  dim V} kFk2 2 b d trFF
L G
V  b
in V and denote by tkl , or tkl , the function on G 2G
defined by
For a function f on G define its Fourier transform e
f
V
tkl g gvl ; vk as a matrix-valued function on G:b
Z
V
The functions $t_{kl}$ are called matrix elements of the unirrep $(\pi, V)$.

Theorem 13 (Orthogonality relations)
(i) The matrix elements $t^V_{kl}$ are pairwise orthogonal and have norm $(\dim V)^{-1/2}$ in $L^2(G, dg)$.
(ii) The matrix elements corresponding to equivalent unirreps span the same subspace in $L^2(G, dg)$.
(iii) The matrix elements of two nonequivalent unirreps are orthogonal.
(iv) The linear span of all matrix elements of all unirreps is dense in $C(G)$, $C^\infty(G)$, and in $L^2(G, dg)$ (generalized Peter-Weyl theorem).

In particular, this theorem implies that the set $\hat G$ of equivalence classes of unirreps is countable. For an f.d. representation $(\pi, V)$ we introduce the character of $\pi$ as a function

\[ \chi_\pi(g) = \operatorname{tr} \pi(g) = \sum_{k=1}^{\dim V} t_{kk}(g) \qquad [5] \]

It is obviously a central function on $G$.

Remark Traditionally, in representation theory the word "character" has two different meanings: (1) a multiplicative map from a group to $U(1)$, and (2) the trace of a representation operator $\pi(g)$. For one-dimensional representations both notions coincide.

From the orthogonality relations we get the following result.

Corollary The characters of unirreps of $G$ form an orthonormal basis in the subspace of central functions in $L^2(G, dg)$.

Noncommutative Fourier Transform

The noncommutative Fourier transform on a compact group $G$ is defined as follows. Let $\hat G$ denote the set of equivalence classes of unirreps of $G$. Choose for any $\lambda \in \hat G$ a representation $(\pi_\lambda, V_\lambda)$ of class $\lambda$ and an orthonormal basis in $V_\lambda$. Denote by $d(\lambda)$ the dimension of $V_\lambda$. The Fourier transform of a function $f$ on $G$ is the matrix-valued function on $\hat G$ given by

\[ \tilde f(\lambda) = \int_G f(g)\, \pi_\lambda(g^{-1})\, dg \]

Note that in the case $G = T^1$ this transform associates to a function $f$ the set of its Fourier coefficients. In general this transform keeps some important features of Fourier coefficients.

Theorem 14
(i) For a function $f \in L^1(G, dg)$ the Fourier transform $\tilde f$ is a well-defined and bounded (in matrix norm) function on $\hat G$.
(ii) For a function $f \in L^1(G, dg) \cap L^2(G, dg)$ the following analog of the Plancherel formula holds:

\[ \|f\|^2_{L^2(G, dg)} := \int_G |f(g)|^2\, dg = \sum_{\lambda \in \hat G} d(\lambda)\, \operatorname{tr}\bigl(\tilde f(\lambda)\, \tilde f(\lambda)^*\bigr) =: \|\tilde f\|^2_{L^2(\hat G)} \]

(iii) The following inversion formula expresses $f$ in terms of $\tilde f$:

\[ f(g) = \sum_{\lambda \in \hat G} d(\lambda)\, \operatorname{tr}\bigl(\tilde f(\lambda)\, \pi_\lambda(g)\bigr) \]

(iv) The Fourier transform sends the convolution to the matrix multiplication:

\[ \widetilde{f_1 * f_2} = \tilde f_1 \cdot \tilde f_2 \]

where the convolution product is defined by

\[ (f_1 * f_2)(h) = \int_G f_1(hg)\, f_2(g^{-1})\, dg \]

Note the special case of the inversion formula for $g = e$:

\[ f(e) = \sum_{\lambda \in \hat G} d(\lambda)\, \operatorname{tr}\bigl(\tilde f(\lambda)\bigr), \]

or

\[ \delta(g) = \sum_{\lambda \in \hat G} d(\lambda)\, \chi_\lambda(g) \]

where $\delta(g)$ is Dirac's delta-function: $\int_G f(g)\, \delta(g)\, dg = f(e)$. Thus, we get a presentation of Dirac's delta-function as a linear combination of characters.
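The orthonormality of characters stated in the Corollary can be checked numerically for $G = SU(2)$. The sketch below is an illustration rather than part of the article. It assumes two standard facts not derived here: on the maximal torus the character of the $(k+1)$-dimensional unirrep is $\sin((k+1)\theta)/\sin\theta$, and the Haar integral of a central function reduces to $(2/\pi)\int_0^\pi f(\theta)\sin^2\theta\, d\theta$ (Weyl integration formula). The function names are hypothetical.

```python
import math


def su2_character(k, theta):
    """Character of the (k+1)-dimensional unirrep of SU(2) at the torus
    element diag(e^{i*theta}, e^{-i*theta}); requires sin(theta) != 0."""
    return math.sin((k + 1) * theta) / math.sin(theta)


def character_inner_product(j, k, steps=2000):
    """Approximate <chi_j, chi_k> in L^2(G, dg) with the midpoint rule.

    Assumption: for central functions the normalized Haar measure reduces
    to (2/pi) * sin(theta)^2 dtheta on [0, pi].
    """
    h = math.pi / steps
    total = 0.0
    for n in range(steps):
        theta = (n + 0.5) * h  # midpoints avoid theta = 0 and theta = pi
        weight = (2.0 / math.pi) * math.sin(theta) ** 2
        total += su2_character(j, theta) * su2_character(k, theta) * weight * h
    return total


if __name__ == "__main__":
    # Rows of the Gram matrix of chi_0, chi_1, chi_2; it should be close
    # to the 3 x 3 identity matrix.
    for j in range(3):
        print([round(character_inner_product(j, k), 6) for k in range(3)])
```

The midpoint rule is used only because it never evaluates the character at $\theta = 0$ or $\theta = \pi$, where the quotient is defined by a limit.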
584 Compact Groups and Their Representations

Classification of Finite-Dimensional Representations

In this section, we give a classification of unirreps of a connected compact Lie group $G$.

Weight Decomposition

Let $G$ be a connected compact group with maximal torus $T$, and let $(\pi, V)$ be a f.d. representation of $G$. Restricting it to $T$ and using complete reducibility, we get the following result.

Theorem 15 The vector space $V$ can be written in the form

\[ V = \bigoplus_{\lambda \in X(T)} V_\lambda, \qquad V_\lambda = \{ v \in V \mid \pi_*(t)\, v = \langle \lambda, t \rangle\, v \ \ \forall t \in \mathfrak{t} \} \qquad [6] \]

where $X(T)$ is the character group of $T$ defined by [3].

The spaces $V_\lambda$ are called weight subspaces, vectors $v \in V_\lambda$ weight vectors of weight $\lambda$. The set

\[ P(V) = \{ \lambda \in X(T) \mid V_\lambda \neq \{0\} \} \qquad [7] \]

is called the set of weights of $\pi$, or the spectrum of $\operatorname{Res}^G_T \pi$, and

\[ \operatorname{mult}_{(\pi, V)} \lambda := \dim V_\lambda \]

is called the multiplicity of $\lambda$ in $V$.

The next theorem easily follows from the definition of the Weyl group.

Theorem 16 For any f.d. representation $V$ of $G$, the set of weights with multiplicities is invariant under the action of the Weyl group:

\[ w(P(V)) = P(V), \qquad \operatorname{mult}_{(\pi, V)} \lambda = \operatorname{mult}_{(\pi, V)} w(\lambda) \]

for any $w \in W$.

Classification of Unirreps

Recall that $R$ is the root system of $\mathfrak{g}_{\mathbb C}$. Assume that we have chosen a basis of simple roots $\alpha_1, \dots, \alpha_r \in R$. Then $R = R_+ \cup R_-$; roots $\alpha \in R_+$ can be written as a linear combination of simple roots with positive coefficients, and $R_- = -R_+$.

A (not necessarily f.d.) representation of $\mathfrak{g}_{\mathbb C}$ is called a highest-weight representation if it is generated by a single vector $v \in V_\lambda$ (the highest-weight vector) such that $\mathfrak{g}_\alpha v = 0$ for all positive roots $\alpha \in R_+$.

It can be shown that for every $\lambda \in X(T)$, there is a unique irreducible highest-weight representation of $\mathfrak{g}_{\mathbb C}$ with highest weight $\lambda$, which is denoted $L(\lambda)$. However, this representation can be infinite dimensional; moreover, it may not be possible to lift it to a representation of $G$.

Definition 5 A weight $\lambda \in X(T)$ is called dominant if $\langle \lambda, \alpha_i^\vee \rangle \in \mathbb{Z}_+$ for any simple root $\alpha_i$. The set of all dominant weights is denoted by $X_+(T)$.

Theorem 17
(i) All weights of $L(\lambda)$ are of the form $\mu = \lambda - \sum n_i \alpha_i$, $n_i \in \mathbb{Z}_+$.
(ii) Let $\lambda \in X_+$. Then the irreducible highest-weight representation $L(\lambda)$ is f.d. and lifts to a representation of $G$.
(iii) Every irreducible f.d. representation of $G$ is of the form $L(\lambda)$ for some $\lambda \in X_+$.

Thus, we have a bijection $\{\text{unirreps of } G\} \leftrightarrow X_+$.

Example 7 Let $G = SU(2)$. There is a unique simple root $\alpha$ and the unique fundamental weight $\omega$, related by $\alpha = 2\omega$. Therefore, $X_+ = \mathbb{Z}_+ \cdot \omega$ and unirreps are indexed by non-negative integers. The representation with highest weight $k \cdot \omega$ is precisely the representation $\pi_k$ constructed in the subsection "Examples of representations".

Example 8 Let $G = U(n)$. Then $X = \mathbb{Z}^n$, and $X_+ = \{ (\lambda_1, \dots, \lambda_n) \in \mathbb{Z}^n \mid \lambda_1 \geq \dots \geq \lambda_n \}$. Such objects are well known in combinatorics: if we additionally assume that $\lambda_n \geq 0$, then such dominant weights are in bijection with partitions with $n$ parts. They can also be described by Young diagrams with $n$ rows (see Fulton and Harris (1991)).

Explicit Construction of Representations

In addition to the description of unirreps as highest-weight representations, they can also be constructed in other ways. In particular, they can be defined analytically as follows. Let $B = HN_+$ be the Borel subgroup in $G_{\mathbb C}$; here $H = \exp \mathfrak{h}$, $N_+ = \exp \sum_{\alpha \in R_+} (\mathfrak{g}_{\mathbb C})_\alpha$. For $\lambda \in \mathfrak{h}^*$, let $\chi_\lambda : B \to \mathbb{C}^\times$ be a multiplicative map defined by

\[ \chi_\lambda(hn) = e^{\langle h, \lambda \rangle} \qquad [8] \]

Theorem 18 (Cartan-Borel-Weil). Let $\lambda \in X(T)$. Denote by $V(\lambda)$ the space of complex-analytic functions on $G_{\mathbb C}$ which satisfy the following transformation property:

\[ f(gb) = \chi_\lambda^{-1}(b)\, f(g), \qquad g \in G_{\mathbb C},\ b \in B \]

The group $G_{\mathbb C}$ acts on $V(\lambda)$ by left shifts:

\[ (\pi(g) f)(h) = f(g^{-1} h) \qquad [9] \]


Then

(i) $V(\lambda) \neq \{0\}$ iff $-\lambda \in X_+$.
(ii) If $-\lambda \in X_+$, the representation of $G$ in $V(\lambda)$ is equivalent to $L(w_0(\lambda))$, where $w_0 \in W$ is the unique element of the Weyl group which sends $R_+$ to $R_-$.

This theorem can also be reformulated in more geometric terms: the spaces $V(\lambda)$ are naturally interpreted as spaces of global sections of appropriate line bundles on the flag variety $\mathcal{B} = G_{\mathbb C}/B = G/T$.

For classical groups, irreducible representations can also be constructed explicitly as the subspaces in tensor powers $(\mathbb{C}^n)^{\otimes k}$, transforming in a certain way under the action of the symmetric group $S_k$.

Characters and Multiplicities

Characters

Let $(\pi, V)$ be a f.d. representation of $G$ and let $\chi_\pi$ be its character as defined by [5]. Since $\chi_\pi$ is central, and every element in $G$ is conjugate to an element of $T$, $\chi_\pi$ is completely determined by its restriction to $T$, which can be computed from the weight decomposition [6]:

\[ \chi_\pi|_T = \sum_{\lambda \in X(T)} \dim V_\lambda\, e^\lambda = \sum_{\lambda \in X(T)} \operatorname{mult}_\pi \lambda \; e^\lambda \qquad [10] \]

where $e^\lambda$ is the function on $T$ defined by $e^\lambda(\exp(t)) = e^{\langle t, \lambda \rangle}$, $t \in \mathfrak{t}$. Note that $e^{\lambda + \mu} = e^\lambda e^\mu$ and that $e^0 = 1$.

Weyl Character Formula

Theorem 19 (Weyl character formula). Let $\lambda \in X_+$. Then

\[ \chi_{L(\lambda)} = \frac{A_{\lambda + \rho}}{A_\rho}, \qquad A_\mu = \sum_{w \in W} \varepsilon(w)\, e^{w(\mu)} \]

where, for $w \in W$, we denote $\varepsilon(w) = \det w$ considered as a linear map $\mathfrak{t} \to \mathfrak{t}$, and $\rho = (1/2) \sum_{\alpha \in R_+} \alpha$.

In particular, computing the value of the character at point $t = 0$ by L'Hôpital's rule, it is possible to deduce the following formula for the dimension of irreducible representations:

\[ \dim L(\lambda) = \prod_{\alpha \in R_+} \frac{\langle \alpha^\vee, \lambda + \rho \rangle}{\langle \alpha^\vee, \rho \rangle} \qquad [11] \]

Example 9 Let $G = SU(2)$. Then the Weyl character formula gives, for the irreducible representation $\pi_k$ with highest weight $k \cdot \omega$,

\[ \chi_{\pi_k} = \frac{x^{k+1} - x^{-(k+1)}}{x - x^{-1}} = x^k + x^{k-2} + \dots + x^{-k}, \qquad x = e^\omega \]

which implies $\dim \pi_k = k + 1$.

The Weyl character formula is equivalent to the following formula for weight multiplicities, due to Kostant:

\[ \operatorname{mult}_{L(\lambda)} \mu = \sum_{w \in W} \varepsilon(w)\, K\bigl(w(\lambda + \rho) - \rho - \mu\bigr) \]

where $K$ is Kostant's partition function: $K(\tau)$ is the number of ways of writing $\tau$ as a sum of positive roots (with repetitions).

For classical Lie groups such as $G = U(n)$, there are more explicit combinatorial formulas for weight multiplicities; for $U(n)$, the answer can be written in terms of the number of Young tableaux of a given shape. Details can be found in Fulton and Harris (1991).

Tensor Product Multiplicities

Let $(\pi, V)$ be a f.d. representation of $G$. By complete reducibility, one can write $V = \bigoplus n_\lambda L(\lambda)$. The coefficients $n_\lambda$ are called multiplicities; finding them is an important problem in many applications. In particular, a special case of this is finding the multiplicities in the tensor product of two unirreps:

\[ L(\lambda) \otimes L(\mu) = \bigoplus_\nu N^\nu_{\lambda\mu}\, L(\nu) \]

Characters provide a practical tool for computing multiplicities: since characters of unirreps are linearly independent, multiplicities can be found from the condition that $\chi_V = \sum n_\lambda \chi_{L(\lambda)}$. In particular,

\[ \chi_{L(\lambda)}\, \chi_{L(\mu)} = \sum_\nu N^\nu_{\lambda\mu}\, \chi_{L(\nu)} \]

Example 10 For $G = SU(2)$, tensor product multiplicities are given by

\[ \pi_n \otimes \pi_m = \bigoplus_l \pi_l \]

where the sum is taken over all $l$ such that $|m - n| \leq l \leq m + n$, $m + n - l$ is even.

For $G = U(n)$, there is an algorithm for finding the tensor product multiplicities, formulated in the language of Young tableaux (Littlewood-Richardson rule). There are also tables and computer programs for computing these multiplicities; some of them are listed in the bibliography.
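The character method for tensor product multiplicities can be made concrete for $G = SU(2)$. The following sketch is an illustration, not an algorithm taken from the article or from any of the packages listed in the bibliography. It stores a character as a Laurent polynomial in $x = e^\omega$ (a map from exponents to coefficients), multiplies two characters, and then repeatedly subtracts the character of the highest remaining weight. All function names are hypothetical.

```python
from collections import Counter


def su2_character(k):
    """Character of pi_k as a Laurent polynomial in x = e^omega:
    x^k + x^(k-2) + ... + x^(-k), stored as {exponent: coefficient}."""
    return Counter({k - 2 * j: 1 for j in range(k + 1)})


def multiply(a, b):
    """Product of two Laurent polynomials."""
    out = Counter()
    for ea, ca in a.items():
        for eb, cb in b.items():
            out[ea + eb] += ca * cb
    return out


def decompose(char):
    """Write a character as a sum of n_k * chi_{pi_k} by peeling off the
    highest weight; returns {k: n_k}."""
    remaining = Counter(char)
    multiplicities = {}
    while True:
        support = [e for e, c in remaining.items() if c != 0]
        if not support:
            return multiplicities
        top = max(support)
        n = remaining[top]
        if top < 0 or n < 0:
            raise ValueError("input is not the character of a representation")
        multiplicities[top] = n
        for e, c in su2_character(top).items():
            remaining[e] -= n * c


def tensor_product(n, m):
    """Multiplicities of the unirreps pi_l in pi_n (x) pi_m."""
    return decompose(multiply(su2_character(n), su2_character(m)))


if __name__ == "__main__":
    # Example 10 predicts l = 1, 3, 5, each with multiplicity one.
    print(tensor_product(2, 3))
```

The same peeling strategy works for any compact group once characters are stored as polynomials in several variables and "highest" is taken with respect to the dominance order; only the rank-one case is shown here.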
See also: Classical Groups and Homogeneous Spaces; Combinatorics: Overview; Equivariant Cohomology and the Cartan Model; Finite Group Symmetry Breaking; Lie Groups: General Theory; Ljusternik-Schnirelman Theory; Noncommutative Geometry and the Standard Model; Optimal Cloning of Quantum States; Ordinary Special Functions; Quasiperiodic Systems; Symmetry Classes in Random Matrix Theory.

Further Reading

Bump D (2004) Lie Groups. New York: Springer.
Bröcker T and tom Dieck T (1995) Representations of Compact Lie Groups, Graduate Texts in Mathematics, vol. 98. New York: Springer.
Fulton W and Harris J (1991) Representation Theory. New York: Springer.
Knapp A (2002) Lie Groups beyond an Introduction, 2nd edn. Boston: Birkhäuser.
LiE: A Computer algebra package for Lie group computations, available from http://young.sp2mi.univ-poitiers.fr
McKay WG, Patera J, and Rand DW (1990) Tables of Representations of Simple Lie Algebras, vol. I. Exceptional Simple Lie Algebras. Montreal: CRM.
Serre J-P (2001) Complex Semisimple Lie Algebras. Berlin: Springer.
Simon B (1996) Representations of Finite and Compact Groups. Providence, RI: American Mathematical Society.
Zelobenko DP (1973) Compact Lie Groups and Their Representations. Providence, RI: American Mathematical Society.

Compactification of Superstring Theory


M R Douglas, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Superstring theories and M-theory, at present the best candidate quantum theories which unify gravity, Yang-Mills fields, and matter, are directly formulated in ten and eleven spacetime dimensions. To obtain a candidate theory of our four-dimensional universe, one must find a solution of one of these theories whose low-energy physics is well described by a four-dimensional effective field theory (EFT), containing the well-established standard model (SM) of particle physics coupled to Einstein's general relativity (GR). The standard paradigm for finding such solutions is compactification, along the lines originally proposed by Kaluza and Klein in the context of higher-dimensional general relativity. One postulates that the underlying D-dimensional spacetime is a product of four-dimensional Minkowski spacetime with a (D − 4)-dimensional compact and small Riemannian manifold K. One then finds that low-energy physics effectively averages over K, leading to a four-dimensional EFT whose field content and Lagrangian are determined in terms of the topology and geometry of K.

Of the huge body of prior work on this subject, the part most relevant for string/M-theory is supergravity compactification, as in the limit of low energies, small curvatures and weak coupling, the various string theories and M-theory reduce to ten- and eleven-dimensional supergravity theories. Many of the qualitative features of string/M-theory compactification, and a good deal of what is known quantitatively, can be understood simply in terms of compactification of these field theories, with the addition of a few crucial ingredients from string/M-theory. Thus, most of this article will restrict attention to this case, leaving many "stringy" topics to the articles on conformal field theory, topological string theory, and so on. We also largely restrict attention to compactifications based on Ricci-flat compact spaces. There is an equally important class in which K has positive curvature; these lead to anti-de Sitter (AdS) spacetimes and are discussed in the article on AdS/CFT (see AdS/CFT Correspondence).

After a general review, we begin with compactification of the heterotic string on a three complex dimensional Calabi-Yau manifold. This was the first construction which led convincingly to the SM, and remains one of the most important examples. We then survey the various families of compactifications to higher dimensions, with an eye on the relations between these compactifications which follow from superstring duality. We then discuss some of the phenomena which arise in the regimes of large curvature and strong coupling. In the final section, we bring these ideas together in a survey of the various known four-dimensional constructions.

General Framework

Let us assume we are given a D- (= d + k) dimensional field theory T. A compactification is then a D-dimensional spacetime which is topologically the product of a d-dimensional spacetime with a k-dimensional manifold K, the compactification or "internal" manifold, carrying a Riemannian metric and with definite expectation values for all other fields in T. These must solve the equations of motion, and preserve d-dimensional Poincaré invariance (or, perhaps, another d-dimensional symmetry group).

The most general metric ansatz for a Poincaré invariant compactification is

\[ G_{IJ} = \begin{pmatrix} f\, \eta_{\mu\nu} & 0 \\ 0 & G_{ij} \end{pmatrix} \]

where the tangent space indices are $0 \leq I < d + k = D$, $0 \leq \mu < d$, and $1 \leq i \leq k$. Here $\eta_{\mu\nu}$ is the Minkowski metric, $G_{ij}$ is a metric on K, and $f$ is a real-valued function on K called the "warp factor".

As the simplest example, consider pure D-dimensional GR. In this case, Einstein's equations reduce to Ricci flatness of $G_{IJ}$. Given our metric ansatz, this requires $f$ to be constant, and the metric $G_{ij}$ on K to be Ricci flat. Thus, any K which admits such a metric, for example, the k-dimensional torus, will lead to a compactification.

Typically, if a manifold admits a Ricci-flat metric, it will not be unique; rather there will be a moduli space of such metrics. Physically, one then expects to find solutions in which the choice of Ricci-flat metric is slowly varying in d-dimensional spacetime. General arguments imply that such variations must be described by variations of d-dimensional fields, governed by an EFT. Given an explicit parametrization of the family of metrics, say $G_{ij}(\phi^\alpha)$ for some parameters $\phi^\alpha$, in principle the EFT could be computed explicitly by promoting the parameters to d-dimensional fields, substituting this parametrization into the D-dimensional action, and expanding in powers of the d-dimensional derivatives. In pure GR, we would find the four-dimensional effective Lagrangian

\[ \mathcal{L}_{\rm EFT} = \int d^k y\, \sqrt{\det G(\phi)}\; R^{(4)} + \sqrt{\det G(\phi)}\; G^{ik}(\phi)\, G^{jl}(\phi)\, \frac{\partial G_{ij}}{\partial \phi^\alpha} \frac{\partial G_{kl}}{\partial \phi^\beta}\, \partial_\mu \phi^\alpha\, \partial^\mu \phi^\beta + \cdots \qquad [1] \]

While this is easily evaluated for K a symmetric space or torus, in general a direct computation of $\mathcal{L}_{\rm EFT}$ is impossible. This becomes especially clear when one learns that the Ricci-flat metrics $G_{ij}$ are not explicitly known for the examples of interest. Nevertheless, clever indirect methods have been found that give a great deal of information about $\mathcal{L}_{\rm EFT}$; this is much of the art of superstring compactification. However, in this section, let us ignore this point and continue as if we could do such computations explicitly.

Given a solution, one proceeds to consider its small perturbations, which satisfy the linearized equations of motion. If these include exponentially growing modes (often called "tachyons"), the solution is unstable. (Note that this criterion is modified for AdS compactifications.) The remaining perturbations can be divided into massless fields, corresponding to zero modes of the linearized equations of motion on K, and massive fields, the others. General results on eigenvalues of Laplacians imply that the masses of massive fields depend on the diameter of K as $m \sim 1/\operatorname{diam}(K)$, so at energies far smaller than $m$, they cannot be excited (this is not universal; given strong negative curvature on K, or a rapidly varying warp factor, one can have perturbations of small nonzero mass). Thus, the massive fields can be "integrated out", to leave an EFT with a finite number of fields. In the classical approximation, this simply means solving their equations of motion in terms of the massless fields, and using these solutions to eliminate them from the action. At leading order in an expansion around a solution, these fields are zero and this step is trivial; nevertheless, it is useful in making a systematic definition of the interaction terms in the EFT.

As we saw in pure GR, the configuration space parametrized by the massless fields in the EFT is the moduli space of compactifications obtained by deforming the original solution. Thus, from a mathematical point of view, low-energy EFT can be thought of as a sort of enhancement of the concept of moduli space, and a dictionary set up between mathematical and physical languages. To give its next entry, there is a natural physical metric on moduli space, defined by restriction from the metric on the configuration space of the theory T; this becomes the sigma-model metric for the scalars in the EFT. Because the theories T arising from string theory are geometrically natural, this metric is also natural from a mathematical point of view, and one often finds that much is already known about it. For example, the somewhat fearsome two derivative terms in eqn [1] are (perhaps) less so when one realizes that this is an explicit expression for the Weil-Petersson metric on the moduli space of Ricci-flat metrics. In any case, knowing this dictionary is essential for taking advantage of the literature.

Another important entry in this dictionary is that the automorphism group of a solution translates into the gauge group in the EFT. This can be either continuous, leading to the gauge symmetry of Maxwell and Yang-Mills theories, or discrete, leading to discrete gauge symmetry. For example, if the metric on K has continuous isometry group G, the resulting EFT will have gauge symmetry G, as in the original example of Kaluza and Klein with $K \cong S^1$ and $G \cong U(1)$. Mathematically, these phenomena of "enhanced symmetry" are often treated using the languages of equivariant theories (cohomology, K-theory, etc.), stacks, and so on.

To give another example, obstructed deformations (solutions of the linearized equations which do not correspond to elements of the tangent space of the true moduli space) correspond to scalar fields which, while massless, appear in the effective potential in a way which prevents giving them expectation values. Since the quadratic terms $V''$ are masses, this dependence must be at cubic or higher order.

While the preceding concepts are general and apply to compactification of all local field theories, string and M-theory add some particular ingredients to this general recipe. In the limits of small curvatures and weak coupling, string and M-theory are well described by the ten- and 11-dimensional supergravity theories, and thus the string/M-theory discussion usually starts with Kaluza-Klein compactification of these theories, which we denote I, IIa, IIb, HE, HO and M. Let us now discuss a particular example.

Calabi-Yau Compactification of the Heterotic String

Contact with the SM requires finding compactifications to d = 4 either without supersymmetry, or with at most N = 1 supersymmetry, because the SM includes chiral fermions, which are incompatible with N > 1. Let us start with the $E_8 \times E_8$ heterotic string or "HE" theory. This choice is made rather than HO because only in this case can we find the SM fermion representations as subrepresentations of the adjoint of the gauge group.

Besides the metric, the other bosonic fields of the HE supergravity theory are a scalar $\Phi$ called the dilaton, Yang-Mills gauge potentials for the group $G \equiv E_8 \times E_8$, and a 2-form gauge potential B (often called the "Neveu-Schwarz" or NS 2-form) whose defining characteristic is that it minimally couples to the heterotic string world-sheet. We will need their gauge field strengths below: for Yang-Mills, this is a 2-form $F^a_{IJ}$ with $a$ indexing the adjoint of Lie G, and for the NS 2-form this is a 3-form $H_{IJK}$. Denoting the two Majorana-Weyl spinor representations of SO(1, 9) as S and C, the fermions are the gravitino $\psi_I \in S \otimes V$, a spin 1/2 "dilatino" $\lambda \in C$, and the adjoint gauginos $\chi^a \in S$. We use $\Gamma_I$ to denote Dirac matrices contracted with a "zehnbein", satisfying $\{\Gamma_I, \Gamma_J\} = 2 G_{IJ}$, and $\Gamma_{IJ} = (1/2)[\Gamma_I, \Gamma_J]$, etc.

A local supersymmetry transformation with parameter $\epsilon$ is then

\[ \delta\psi_I = D_I \epsilon + \tfrac{1}{8} H_{IJK} \Gamma^{JK} \epsilon \qquad [2] \]

\[ \delta\lambda = \partial_I \Phi\, \Gamma^I \epsilon - \tfrac{1}{12} H_{IJK} \Gamma^{IJK} \epsilon \qquad [3] \]

\[ \delta\chi^a = F^a_{IJ} \Gamma^{IJ} \epsilon \qquad [4] \]

We now assume N = 1 supersymmetry. An unbroken supersymmetry is a spinor $\epsilon$ for which the left-hand side is zero, so we seek compactifications with a unique solution of these equations.

We first discuss the case H = 0. Setting $\delta\psi_\mu$ in eqn [2] to zero, we find that the warp factor $f$ must be constant. The vanishing of $\delta\psi_i$ requires $\epsilon$ to be a covariantly constant spinor. For a six-dimensional K to have a unique such spinor, it must have SU(3) holonomy; in other words, K must be a Calabi-Yau manifold. In the following, we use basic facts about their geometry.

The vanishing of $\delta\lambda$ then requires constant dilaton $\Phi$, while the vanishing of $\delta\chi^a$ requires the gauge field strength F to solve the hermitian Yang-Mills equations,

\[ F^{2,0} = F^{0,2} = 0, \qquad g^{i\bar\jmath} F_{i\bar\jmath} = 0 \]

where the last condition is the vanishing of the trace of $F^{1,1}$ with respect to the Kähler metric. By the theorem of Donaldson and Uhlenbeck-Yau, such solutions are in one-to-one correspondence with $\mu$-stable holomorphic vector bundles with structure group H contained in the complexification of G. Choose such a bundle E; by the general discussion above, the commutant of H in G will be the automorphism group of the connection on E and thus the low-energy gauge group of the resulting EFT. For example, since $E_8$ has a maximal $E_6 \times SU(3)$ subgroup, if E has structure group H = SL(3), there is an embedding such that the unbroken gauge symmetry is $E_6 \times E_8$, realizing one of the standard grand unified groups $E_6$ as a factor.

The choice of E is constrained by anomaly cancellation. This discussion (Green et al. 1987) modifies the Bianchi identity for H to

\[ dH = \operatorname{tr} R \wedge R - \frac{1}{30} \sum_a F^a \wedge F^a \qquad [5] \]

where R is the matrix of curvature 2-forms. The normalization of the $F \wedge F$ term is such that if we take $E \cong TK$ the holomorphic tangent bundle of K, with isomorphic connection, then using the embedding we just discussed, we obtain a solution of eqn [5] with H = 0.

Thus, we have a complete solution of the equations of motion. General arguments imply that supersymmetric Minkowski solutions are stable, so the small fluctuations consist of massless and massive fields. Let us now discuss a few of the massless fields. Since the EFT has N = 1 supersymmetry, the massless scalars live in chiral multiplets, which are local coordinates on a complex Kähler manifold.

First, the moduli of Ricci-flat metrics on K will lead to massless scalar fields: the complex structure

moduli, which are naturally complex, and Kähler moduli, which are not. However, in string compactification the latter are complexified to the periods of the 2-form $B + iJ$ integrated over a basis of $H_2(K, \mathbb{Z})$, where J is the Kähler form and B is the NS 2-form. In addition, there is a complex field pairing the dilaton (actually, $\exp(-\Phi)$) and the "model-independent axion", the scalar dual in d = 4 to the 2-form $B_{\mu\nu}$. Finally, each complex modulus of the holomorphic bundle E will lead to a chiral multiplet. Thus, the total number of massless uncharged chiral multiplets is $1 + h^{1,1}(K) + h^{2,1}(K) + \dim H^1(K, \operatorname{End}(E))$.

Massless charged matter will arise from zero modes of the gauge field and its supersymmetric partner spinor $\chi^a$. It is slightly easier to discuss the spinor, and then appeal to supersymmetry to get the bosons. Decomposing the spinors of SO(6) under SU(3), one obtains (0, p) forms, and the Dirac equation becomes the condition that these forms are harmonic. By the Hodge theorem, these are in one-to-one correspondence with classes in Dolbeault cohomology $H^{0,p}(K, V)$, for some bundle V. The bundle V is obtained by decomposing the spinor into representations of the holonomy group of E. For H = SU(3), the decomposition of the adjoint under the embedding of $SU(3) \times E_6$ in $E_8$,

\[ 248 = (8, 1) + (1, 78) + (3, 27) + (\bar 3, \overline{27}) \qquad [6] \]

implies that charged matter will form "generations" in the 27, of number $\dim H^{0,1}(K, E)$, and "antigenerations" in the $\overline{27}$, of number $\dim H^{0,1}(K, E^*) = \dim H^{0,2}(K, E)$. The difference in these numbers is determined by the Atiyah-Singer index theorem to be

\[ N_{\rm gen} \equiv N_{27} - N_{\overline{27}} = \tfrac{1}{2} c_3(E) \]

In the special case of $E \cong TK$, these numbers are separately determined to be $N_{27} = b^{1,1}$ and $N_{\overline{27}} = b^{2,1}$, so their difference is $\chi(K)/2$, half the Euler number of K. In the real world, this number is $N_{\rm gen} = 3$, and matching this under our assumptions so far is very constraining.

Substituting these zero modes into the ten-dimensional Yang-Mills action and integrating, one can derive the d = 4 EFT. For example, the cubic terms in the superpotential, usually called Yukawa couplings after the corresponding fermion-boson interactions in the component Lagrangian, are obtained from the cubic product of zero modes

\[ \int_K \Omega \wedge \operatorname{tr}(\phi_1 \wedge \phi_2 \wedge \phi_3) \]

where $\Omega$ is the holomorphic 3-form, $\phi_i \in H^{0,1}(K, \operatorname{Rep} E)$ are the zero modes, and tr arises from decomposing the $E_8$ cubic group invariant.

Note the very important fact that this expression only depends on the cohomology classes of the $\phi_i$ (and $\Omega$). This means the Yukawa couplings can be computed without finding the explicit harmonic representatives, which is not possible (we do not even know the explicit metric). More generally, one expects to be able to explicitly compute the superpotential and all other holomorphic quantities in the effective Lagrangian solely from "topological" information (the Dolbeault cohomology ring, and its generalizations within topological string theory). On the other hand, computing the Kähler metric in an N = 1 EFT is usually out of reach as it would require having explicit normalized zero modes. Most results for this metric come from considering closely related compactifications with extended supersymmetry, and arguing that the breaking to N = 1 supersymmetry makes small corrections to this.

There are several generalizations of this construction. First, the necessary condition to solve eqn [5] is that the right-hand side be exact, which requires

\[ c_2(E) = c_2(TK) \qquad [7] \]

This allows for a wide variety of E's to be used, so that $N_{\rm gen} = 3$ can be attained with many more K's. This class of models is often called "(0, 2) compactification" to denote the world-sheet supersymmetry of the heterotic string in these backgrounds. One can also use bundles with larger structure group; for example, H = SL(4) leads to unbroken $SO(10) \times E_8$, and H = SL(5) leads to unbroken $SU(5) \times E_8$.

The subsequent breaking of the grand unified group to the SM gauge group is typically done by choosing K with nontrivial $\pi_1$, so that it admits a flat line bundle W with nontrivial holonomy (usually called a "Wilson line"). One then uses the bundle $E \otimes W$ in the above discussion, to obtain the commutant of $H \otimes W$ as gauge group. For example, if $\pi_1(K) \cong \mathbb{Z}_5$, one can use W whose holonomy is an element of order 5 in SU(5), to obtain as commutant the SM gauge group $SU(3) \times SU(2) \times U(1)$.

Another generalization is to take the 3-form $H \neq 0$. This discussion begins by noting that, for supersymmetry, we still require the existence of a unique spinor $\epsilon$; however, it will no longer be covariantly constant in the Levi-Civita connection. One way to structure the problem is to note that the right-hand side of eqn [2] takes the form of a connection with torsion; the resulting equations have been discussed mathematically in (Li and Yau 2004).

Another recent approach to these compactifications (Gauntlett 2004) starts out by arguing that $\epsilon$ cannot vanish on K, so it defines a "weak SU(3) structure", a local reduction of the structure group of

TK to SU(3) which need not be integrable. This structure must be present in all N = 1, d = 4 supersymmetric compactifications and there are hopes that it will lead to a useful classification of the possible local structures and corresponding partial differential equations (PDEs) on K.

Higher-Dimensional and Extended Supersymmetric Compactifications

While there are similar quasirealistic constructions which start from the other string theories and M-theory, before we discuss these, let us give an overview of compactifications with $N \geq 2$ supersymmetry in four dimensions, and in higher dimensions. These are simpler analog models which can be understood in more depth; their study led to one of the most important discoveries in string/M-theory, the theory of superstring duality.

As before, we require a covariantly constant spinor. For Ricci-flat K with other background fields zero, this requires the holonomy of K to be one of trivial, SU(n), Sp(n), or the exceptional holonomies $G_2$ or Spin(7). In Table 1 we tabulate the possibilities with spacetime dimension d greater than or equal to 3, listing the supergravity theory, the holonomy type of K, and the type of the resulting EFT: dimension d, total number of real supersymmetry parameters Ns, and the number of spinor supercharges N (in d = 6, since left- and right-chirality Majorana spinors are inequivalent, there are two numbers).

The structure of the resulting supergravity EFTs is heavily constrained by Ns. We now discuss the various possibilities.

Table 1 String/M-theories, holonomy groups and the resulting supersymmetry

Theory       Holonomy   d     Ns   N
M, II        Torus      Any   32   Max
M            SU(2)      7     16   1
             SU(3)      5     8    1
             G2         4     4    1
             Sp(4)      3     6    3
             SU(4)      3     4    2
             Spin(7)    3     2    1
IIa          SU(2)      6     16   (1, 1)
             SU(3)      4     8    2
             G2         3     4    2
IIb          SU(2)      6     16   (0, 2)
             SU(3)      4     8    2
             G2         3     4    2
HE, HO, I    Torus      Any   16   Max/2
             SU(2)      6     8    1
             SU(3)      4     4    1
             G2         3     2    1

Ns = 32

Given the supersymmetry algebra, if such a supergravity exists, it is unique. Thus, toroidal compactifications of d = 11 supergravity, IIa and IIb supergravity lead to the same series of maximally supersymmetric theories. Their structure is governed by the exceptional Lie algebra $E_{11-d}$; the gauge charges transform in a fundamental representation of this algebra, while the scalar fields parametrize a coset space G/H, where G is the maximally split real form of the Lie group $E_{11-d}$, and H is a maximal compact subgroup of G. Nonperturbative duality symmetries lead to a further identification by a maximal discrete subgroup of G.

Ns = 16

This supergravity can be coupled to maximally supersymmetric super Yang-Mills theory, which given a choice of gauge group G is unique. Thus, these theories (with zero cosmological constant and thus allowing super-Poincaré symmetry) are uniquely determined by the choice of G.

In d = 10, the choices $E_8 \times E_8$ and $Spin(32)/\mathbb{Z}_2$ which arise in string theory are almost uniquely determined by the Green-Schwarz anomaly cancellation analysis. Compactification of these HE, HO and type I theories on $T^n$ produces a unique theory with moduli space

\[ \mathbb{R}^+ \times SO(n, n+16; \mathbb{Z}) \backslash SO(n, n+16; \mathbb{R}) / SO(n; \mathbb{R}) \times SO(n+16; \mathbb{R}) \qquad [8] \]

In Kaluza-Klein (KK) reduction, this arises from the choice of metric $g_{ij}$, the antisymmetric tensor $B_{ij}$ and the choice of a flat $E_8 \times E_8$ or $Spin(32)/\mathbb{Z}_2$ connection on $T^n$, while a more unified description follows from the heterotic string world-sheet analysis. Here the group SO(n, n+16) is defined to preserve an even self-dual quadratic form $\eta$ of signature (n, n+16); for example, $\eta = (-E_8) \oplus (-E_8) \oplus I \oplus I \oplus \cdots \oplus I$, where I is the form of signature (1, 1) and $E_8$ is the Cartan matrix. In fact, all such forms are equivalent under orthogonal integer similarity transformation; so, the resulting EFT is unique. It has a rank 16 + 2n gauge group, which at generic points in moduli space is $U(1)^{16+2n}$, but is enhanced to a nonabelian group G at special points. To describe G, we first note that a point p in moduli space determines an n-dimensional subspace $V_p$ of $\mathbb{R}^{16+2n}$, and an orthogonal subspace $V_p^\perp$ (of varying dimension). Lattice points of length squared $-2$ contained in $V_p^\perp$ then correspond to roots of the Lie algebra of $G_p$.
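As a quick check on eqn [8], the dimension of the moduli space and the rank of the gauge group can be tabulated as functions of n. The sketch below is illustrative only: it assumes the standard dimension formula $\dim SO(p, q) = (p+q)(p+q-1)/2$ and uses the fact that quotienting by the discrete group $SO(n, n+16; \mathbb{Z})$ does not change the dimension. The function names are hypothetical.

```python
def dim_so(p, q=0):
    """Dimension of the real Lie group SO(p, q)."""
    m = p + q
    return m * (m - 1) // 2


def heterotic_torus_data(n):
    """Moduli-space dimension and gauge-group rank for heterotic or
    type I theory on T^n, read off from eqn [8]."""
    # Coset SO(n, n+16; R) / (SO(n; R) x SO(n+16; R)); the discrete
    # quotient by SO(n, n+16; Z) leaves the dimension unchanged.
    coset = dim_so(n, n + 16) - dim_so(n) - dim_so(n + 16)
    return {
        "moduli_dimension": coset + 1,  # +1 for the R^+ factor
        "gauge_rank": 16 + 2 * n,
    }


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, heterotic_torus_data(n))
```

The coset dimension simplifies to n(n + 16), which matches the count of independent components of the metric, the B-field and the Wilson lines on $T^n$: n(n+1)/2 + n(n−1)/2 + 16n.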

The other compactifications with $N_s = 16$ are M-theory on K3 and its further toroidal reductions, and IIb on K3. M-theory compactification to $d = 7$ is dual to heterotic on $T^3$, with the same moduli space and enhanced gauge symmetry. As we discuss at the end of the section "Stringy and quantum corrections," the extra massless gauge bosons of enhanced gauge symmetry are M2 branes wrapped on 2-cycles with topology $S^2$. For such a cycle to have zero volume, the integral of the Kähler form and holomorphic 2-form over the cycle must vanish; expressing this in a basis for $H^2(K3, \mathbb{R})$ leads to exactly the same condition we discussed for enhanced gauge symmetry above. The final result is that all such K3 degenerations lead to one of the two-dimensional canonical singularities, of types A, D, or E, and the corresponding EFT phenomenon is the enhanced gauge symmetry of corresponding Dynkin type A, D, or E.

IIb on K3 is similar, but reducing the self-dual Ramond–Ramond (RR) 4-form potential on the 2-cycles leads to self-dual tensor multiplets instead of Maxwell theory. The moduli space is eqn [8] but with $n = 5$, not $n = 4$, incorporating periods of RR potentials and the $SL(2, \mathbb{Z})$ duality symmetry of IIb theory.

One may ask if the $N_s = 16$ I/HE/HO theories in $d = 8$ and $d = 9$ have similar duals. For $d = 8$, these are obtained by a pretty construction known as F-theory. Geometrically, the simplest definition of F-theory is to consider the special case of M-theory on an elliptically fibered Calabi–Yau, in the limit that the Kähler modulus of the fiber becomes small. One check of this claim for $d = 8$ is that the moduli space of elliptically fibered K3s agrees with eqn [8] with $n = 2$.

Another definition of F-theory is the particular case of IIb compactification using Dirichlet 7-branes and orientifold 7-planes. This construction is T-dual to the type I theory on $T^2$, which provides its simplest string theory definition. As discussed in Polchinski (1999), one can think of the open strings giving rise to type I gauge symmetry as living on 32 Dirichlet 9-branes (or D9-branes) and an orientifold nine-plane. T-duality converts Dirichlet and orientifold $p$-branes to $(p - 1)$-branes; thus this relation follows by applying two T-dualities.

These compactifications can also be parametrized by elliptically fibered Calabi–Yaus, where $K$ is the base, and the branes correspond to singularities of the fibration. The relation between these two definitions follows fairly simply from the duality between M-theory on $T^2$ and IIb string on $S^1$. There is a partially understood generalization of this to $d = 9$.

Finally, these constructions admit further discrete choices, which break some of the gauge symmetry. The simplest to explain is in the toroidal compactification of I/HE/HO. The moduli space of theories we discussed uses flat connections on the torus which are continuously connected to the trivial connection, but in general the moduli space of flat connections has other components. The simplest example is the moduli space of flat $E_8 \times E_8$ connections on $S^1$, which has a second component in which the holonomy exchanges the two $E_8$s. On $T^3$, there are connections for which the holonomies cannot be simultaneously diagonalized. This structure and the M-theory dual of these choices are discussed in (de Boer et al. 2001).

$N_s = 8$, $d < 6$

Again, the gravity multiplet is uniquely determined, so the most basic classification is by the gauge group $G$. The full low-energy EFT is determined by the matter content and action, and there are two types of matter multiplets. First, vector multiplets contain the Yang–Mills fields, fermions and $6 - d$ scalars; their action is determined by a prepotential which is a $G$-invariant function of the fields. Since the vector multiplets contain massless adjoint scalars, a generic vacuum in which these take nonzero distinct vacuum expectation values (VEVs) will have $U(1)^r$ gauge symmetry, the commutant of $G$ with a generic matrix (for $d < 5$, while there are several real scalars, the potential forces these to commute in a supersymmetric vacuum). Vacua with this type of gauge symmetry breaking, which does not reduce the rank of the gauge group, are usually referred to as on a Coulomb branch of the moduli space. To summarize, this sector can be specified by $n_V$, the number of vector multiplets, and the prepotential $\mathcal{F}$, a function of the $n_V$ VEVs which is cubic in $d = 5$, and holomorphic in $d = 4$.

Hypermultiplets contain scalars which parametrize a quaternionic Kähler manifold, and partner fermions. Thus, this sector is specified by a $4n_H$ real dimensional quaternionic Kähler manifold. The $G$ action comes with triholomorphic moment maps; if nontrivial, VEVs in this sector can break gauge symmetry and reduce it in rank. Such vacua are usually referred to as on a Higgs branch.

The basic example of these compactifications is M-theory on a Calabi–Yau 3-fold (CY$_3$). Reduction of the 3-form leads to $h^{1,1}(K)$ vector multiplets, whose scalar components are the CY Kähler moduli. The CY complex structure moduli pair with periods of the 3-form to produce $h^{2,1}(K)$ hypermultiplets. Enhanced gauge symmetry then appears when the

CY$_3$ contains ADE singularities fibered over a curve, from the same mechanism involving wrapped M2 branes we discussed under $N_s = 16$. If degenerating curves lead to other singularities (e.g., the ODP or conifold), it is possible to obtain extremal transitions which translate physically into Coulomb–Higgs transitions. Finally, singularities in which surfaces degenerate lead to nontrivial fixed-point theories.

Reduction on $S^1$ leads to IIa on CY$_3$, with the spectrum above plus a universal hypermultiplet which includes the dilaton. Perhaps the most interesting new feature is the presence of world-sheet instantons, which correct the metric on vector multiplet moduli space. This metric satisfies the restrictions of special geometry and thus can be derived from a prepotential.

The same theory can be obtained by compactification of IIb theory on the mirror CY$_3$. Now vector multiplets are related to the complex structure moduli space, while hypermultiplets are related to Kähler moduli space. In this case, the prepotential derived from variation of complex structure receives no instanton corrections, as we discuss in the next section.

Finally, one can compactify the heterotic string on $K3 \times T^{6-d}$, but this theory follows from toroidal reduction of the $d = 6$ case we discuss next.

$N_s = 8$, $d = 6$

These supergravities are similar to $d < 6$, but there is a new type of matter multiplet, the self-dual tensor (in $d < 6$ this is dual to a vector multiplet). Since fermions in $d = 6$ are chiral, there is an anomaly cancellation condition relating the numbers of the three types of multiplets (Aspinwall 1996, section 6.6),

$$n_H - n_V + 29\, n_T = 273 \tag*{[9]}$$

One class of examples is the heterotic string compactified on K3. In the original perturbative constructions, to satisfy eqn [7], we need to choose a vector bundle with $c_2(V) = \chi(K3) = 24$. The resulting degrees of freedom are a single self-dual tensor multiplet and a rank-16 gauge group. More generally, one can introduce $N_{5B}$ heterotic 5-branes, which generalize eqn [7] to $c_2(E) + N_{5B} = c_2(TK)$. Since this brane carries a self-dual tensor multiplet, this series of models is parametrized by $n_T$. They are connected by transitions in which an $E_8$ instanton shrinks to zero size and becomes a 5-brane; the resulting decrease in the dimension of the moduli space of $E_8$ bundles on K3 agrees with eqn [9].

Another class of examples is F-theory on an elliptically fibered CY$_3$. These are related to M-theory on an elliptically fibered CY$_3$ in the same general way we discussed under $N_s = 16$. The relation between F-theory and the heterotic string on K3 can be seen by lifting M-theory–heterotic duality; this suggests that the two constructions are dual only if the CY$_3$ is a K3 fibration as well. Since not all elliptically fibered CY$_3$s are K3 fibered, the F-theory construction is more general.

We return to $d = 4$ and $N_s = 4$ in the final section. The cases of $N_s < 4$ which exist in $d \le 3$ are far less studied.

Stringy and Quantum Corrections

The $D$-dimensional low-energy effective supergravity actions on which we based our discussion so far are only approximations to the general story of string/M-theory compactification. However, if Planck's constant is small, $K$ is sufficiently large, and its curvature is small, then they are controlled approximations.

In M-theory, as in any theory of quantum gravity, corrections are controlled by the Planck scale parameter $M_P^{D-2}$, which sits in front of the Einstein term of the $D$-dimensional effective Lagrangian, and plays the role of $\hbar$. In general, this is different from the four-dimensional Planck scale, which satisfies $M_{P4}^2 = \mathrm{Vol}(K)\, M_P^{D-2}$. After taking the low-energy limit $E \ll M_P$, the remaining corrections are controlled by the dimensionless parameters $l_P/R$, where $R$ can be any characteristic length scale of the solution: a curvature radius, the length of a nontrivial cycle, and so on.

In string theory, one usually thinks of the corrections as a double series expansion in $g_s$, the dimensionless (closed) string coupling constant, and $\alpha'$, the inverse string tension parameter, of dimensions (length)$^2$. The ten-dimensional Planck scale is related to these parameters as $M_P^8 = 1/(g_s^2 (\alpha')^4)$, up to a constant factor that depends on conventions. Besides perturbative corrections, which have power-like dependence on these parameters, there can be world-sheet and brane instanton corrections. For example, a string world sheet can wrap around a topologically nontrivial spacelike 2-cycle $\Sigma$ in $K$, leading to an instanton correction to the effective action which is suppressed as $\exp(-\mathrm{Vol}(\Sigma)/2\pi\alpha')$. More generally, any $p$-brane wrapping a $p$-cycle can produce a similar effect. As for which terms in the effective Lagrangian receive corrections, this depends largely on the number and symmetries of the fermion zero modes on the instanton world volumes.

Let us start by discussing some cases in which one can argue that these corrections are not present.
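Before turning to those cases, the size of the two control parameters can be made concrete numerically. The sketch below is only illustrative: it sets the convention-dependent constant in $M_P^8 = 1/(g_s^2 (\alpha')^4)$ to 1, works in arbitrary units, and uses hypothetical function names (`planck_mass_10d`, `worldsheet_instanton_factor`) that do not appear in the article.

```python
import math


def planck_mass_10d(g_s, alpha_prime):
    """Ten-dimensional Planck mass from M_P^8 = 1 / (g_s^2 alpha'^4).

    Assumption: the convention-dependent constant factor is set to 1.
    """
    return (1.0 / (g_s ** 2 * alpha_prime ** 4)) ** (1.0 / 8.0)


def worldsheet_instanton_factor(vol_cycle, alpha_prime):
    """Suppression factor exp(-Vol(Sigma) / (2 pi alpha')) for a wrapped world sheet."""
    return math.exp(-vol_cycle / (2.0 * math.pi * alpha_prime))


# The suppression becomes stronger as the 2-cycle grows in units of alpha'.
for vol in (1.0, 10.0, 100.0):
    print(vol, worldsheet_instanton_factor(vol, alpha_prime=1.0))
```

In this normalization, a cycle of volume $2\pi\alpha'$ gives a factor $e^{-1}$, so the instanton expansion is under control only when cycle volumes are large compared with $\alpha'$.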

First, extended supersymmetry can serve to eliminate many corrections. This is analogous to the familiar result that the superpotential in $d = 4$, $N = 1$ supersymmetric field theory does not receive (or is protected from) perturbative corrections, and in many cases follows from similar formal arguments. In particular, supersymmetry forbids corrections to the potential and two-derivative terms in the $N_s = 32$ and $N_s = 16$ Lagrangians.

In $N_s = 8$, the superpotential is protected, but the two-derivative terms can receive corrections. However, there is a simple argument which precludes many such corrections: since vector multiplet and hypermultiplet moduli spaces are decoupled, a correction whose control parameter sits in (say) a vector multiplet cannot affect hypermultiplet moduli space. This fact allows for many exact computations in these theories.

As an example, in IIb on CY$_3$, the metric on vector multiplet moduli space is precisely eqn [1] as obtained from supergravity (in other words, the Weil–Petersson metric on complex structure moduli space). First, while in principle it could have been corrected by world-sheet instantons, since these depend on Kähler moduli which sit in hypermultiplets, it is not. The only other instantons with the requisite zero modes to modify this metric are wrapped Dirichlet branes. Since in IIb theory these wrap even-dimensional cycles, they also depend on Kähler moduli and thus leave vector moduli space unaffected.

As previously discussed, for K3-fibered CY$_3$, this theory is dual to the heterotic string on $K3 \times T^2$. There, the vector multiplets arise from Wilson lines on $T^2$, and reduce to an adjoint multiplet of $N = 2$ supersymmetric Yang–Mills theory. Of course, in the quantum theory, the metric on this moduli space receives instanton corrections. Thus, the duality allows deriving the exact moduli space metric, and many other results of the Seiberg–Witten theory of $N = 2$ gauge theory, as aspects of the geometry of Calabi–Yau moduli space.

In $N_s = 4$, only the superpotential is protected, and that only in perturbation theory; it can receive nonperturbative corrections. Indeed, it appears that this is fairly generic, suggesting that the effective potentials in these theories are often sufficiently complicated to exhibit the structure required for supersymmetry breaking and the other symmetry breakings of the SM. Understanding this is an active subject of research.

We now turn from corrections to novel physical phenomena which arise in these regimes. While this is too large a subject to survey here, one of the basic principles which governs this subject is the idea that string/M-theory compactification on a singular manifold $K$ is typically consistent, but has new light degrees of freedom in the EFT, not predicted by KK arguments. We implicitly touched on one example of this in the discussion of M-theory compactification on K3 above, as the space of Ricci-flat K3 metrics has degeneration limits in which curvatures grow without bound, while the volumes of 2-cycles vanish. On the other hand, the structure of $N_s = 16$ supersymmetry essentially forces the $d = 7$ EFT in these limits to be nonsingular. Its only noteworthy feature is that a nonabelian gauge symmetry is restored, and thus certain charged vector bosons and their superpartners become massless.

To see what is happening microscopically, we must consider an M-theory membrane (or 2-brane), wrapped on a degenerating 2-cycle. This appears as a particle in $d = 7$, charged under the vector potential obtained by reduction of the $D = 11$ 3-form potential. The mass of this particle is the volume of the 2-cycle multiplied by the membrane tension, so as this volume shrinks to zero, the particle becomes massless. Thus, the physics is also well defined in 11 dimensions, though not literally described by 11-dimensional supergravity.

This phenomenon has numerous generalizations. Their common point is that, since the essential physics involves new light degrees of freedom, they can be understood in terms of a lower-dimensional quantum theory associated with the region around the singularity. Depending on the geometry of the singularity, this is sometimes a weakly coupled field theory, and sometimes a nontrivial conformal field theory. Occasionally, as in IIb on K3, the lightest wrapped brane is a string, leading to a "little string theory" (Aharony 2000).

$N = 1$ Supersymmetry in Four Dimensions

Having described the general framework, we conclude by discussing the various constructions which lead to $N = 1$ supersymmetry. Besides the heterotic string on a CY$_3$, these compactifications include type IIa and IIb on orientifolds of CY$_3$, the related F-theory on elliptically fibered Calabi–Yau 4-folds CY$_4$, and M-theory on $G_2$ manifolds. Let us briefly spell out their ingredients, the known nonperturbative corrections to the superpotential, and the duality relations between these constructions.

To start, we recap the heterotic string construction. We must specify a CY$_3$ $K$, and a bundle $E$ over $K$ which admits a Hermitian Yang–Mills connection. The gauge group $G$ is the commutant of the structure group of $E$ in $E_8 \times E_8$ or $\mathrm{Spin}(32)/\mathbb{Z}_2$,

while the chiral matter consists of metric moduli of $K$, and fields corresponding to a basis for the Dolbeault cohomology group $H^{0,1}(K, \mathrm{Rep}\, E)$ where $\mathrm{Rep}\, E$ is the bundle $E$ embedded into an $E_8$ bundle and decomposed into $G$-reps.

There is a general (though somewhat formal) expression for the superpotential,

$$W = \int \Omega \wedge \mathrm{tr}\left( A\, \bar{\partial} A + \frac{2}{3} A^3 \right) + \int \Omega \wedge H^{(3)} + W_{NP} \tag*{[10]}$$

The first term is the holomorphic Chern–Simons action, whose variation enforces the $F^{0,2} = 0$ condition. The second is the "flux superpotential," while the third term is the nonperturbative corrections. The best understood of these arise from supersymmetric gauge theory sectors. In some, but not all, cases, these can be understood as arising from gauge theoretic instantons, which can be shown to be dual to heterotic 5-branes wrapped on $K$. Heterotic world-sheet instantons can also contribute.

The HO theory is S-dual to the type I string, with the same gauge group, realized by open strings on Dirichlet 9-branes. This construction involves essentially the same data. The two classes of heterotic instantons are dual to D1- and D5-brane instantons, whose world-sheet theories are somewhat simpler.

If the CY$_3$ $K$ has a fibration by tori, by applying T-duality to the fibers along the lines discussed for tori under $N_s = 16$ above, one obtains various type II orientifold compactifications. On an elliptic fibration, double T-duality produces a IIb compactification with D7s and O7s. Using the relation between IIb theory on $T^2$ and F-theory on K3 fiberwise, one can also think of this as an F-theory compactification on a K3-fibered CY$_4$. More generally, one can compactify F-theory on any elliptically fibered 4-fold to obtain $N = 1$. These theories have D3-instantons, the T-duals of both the type I D1- and D5-brane instantons.

The theory of mirror symmetry predicts that all CY$_3$s have $T^3$ fibration structures. Applying the corresponding triple T-duality, one obtains a IIa compactification on the mirror CY$_3$ $\tilde{K}$, with D6-branes and O6-planes. Supersymmetry requires these to wrap special Lagrangian cycles in $\tilde{K}$. As in all Dirichlet brane constructions, enhanced gauge symmetry arises from coincident branes wrapping the same cycle, and only the classical groups are visible in perturbation theory. Exceptional gauge symmetry arises as a strong coupling phenomenon of the sort described in the previous section. The superpotential can also be thought of as mirror to eqn [10], but now the first term is the sum of a real Chern–Simons action on the special Lagrangian cycles, with disk world-sheet instanton corrections, as studied in open string mirror symmetry. The gauge theory instantons are now D2-branes.

Using the duality relation between the IIa string and 11-dimensional M-theory, this construction can be lifted to a compactification of M-theory on a seven-dimensional manifold $L$, which is an $S^1$ fibration over $K$. The D6 and O6 planes arise from singularities in the $S^1$ fibration. Generically, $L$ can be smooth, and the only candidate in Table 1 for such an $N = 1$ compactification is a manifold with $G_2$ holonomy; therefore, $L$ must have such holonomy. Finally, both the IIa world-sheet instantons and the D2-brane instantons lift to membrane instantons in M-theory.

This construction implicitly demonstrates the existence of a large number of $G_2$ holonomy manifolds. Another way to arrive at these is to go back to the heterotic string on $K$, and apply the duality (again under $N_s = 16$) between heterotic on $T^3$ and M-theory on K3 to the $T^3$ fibration structure on $K$, to arrive at M-theory on a K3-fibered manifold of $G_2$ holonomy. Wrapping membranes on 2-cycles in these fibers, we can see enhanced gauge symmetry in this picture fairly directly. It is an illuminating exercise to work through its dual realizations in all of these constructions.

Our final construction uses the interpretation of the strong coupling limit of the HE theory as M-theory on a one-dimensional interval $I$, in which the two $E_8$ factors live on the two boundaries. Thus, our original starting point can also be interpreted as the heterotic string on $K \times I$. This construction is believed to be important physically as it allows generalizing a heterotic string tree-level relation between the gauge and gravitational couplings which is phenomenologically disfavored. One can relate it to a IIa orientifold as well, now with D8- and O8-branes.

These multiple relations are often referred to as the "web of dualities." They lead to numerous relations between compactification manifolds, moduli spaces, superpotentials, and other properties of the EFTs, whose full power has only begun to be appreciated.

Suggestions for further reading

Original references for all but the most recent of these topics can be found in the following textbooks and proceedings. We have also referenced a few research articles which are good starting points for the more recent literature. There are far more reviews than we could reference here, and a partial listing of these appears at http://www.slac.stanford.edu/spires/reviews/

See also: Brane Construction of Gauge Theories; Random Algebraic Geometry, Attractors and Flux Vacua;
Compressible Flows: Mathematical Theory 595

String Theory: Phenomenology; Superstring Theories; Two-Dimensional Conformal Field Theory and Vertex Operator Algebras; Viscous Incompressible Fluids: Mathematical Theory.

Further Reading

Aharony O (2000) A brief review of "little string theories". Classical and Quantum Gravity 17: 929–938.
Aspinwall PS (1996) K3 surfaces and string duality, 1996 preprint, arXiv:hep-th/9611137.
Bachas C et al. (eds.) (2002) Les Houches 2001: Unity from Duality: Gravity, Gauge Theory and Strings. Berlin: Springer.
de Boer J et al. (2002) Triples, fluxes, and strings. Advances in Theoretical and Mathematical Physics 4: 995.
Connes A and Gawedzki K (eds.) (1998) Les Houches 1995: Quantum Symmetries. Amsterdam: North-Holland.
Deligne P et al. (eds.) (1999) Quantum Fields and Strings: A Course for Mathematicians. Providence, RI: American Mathematical Society.
Douglas M et al. (eds.) (2004) Strings and Geometry: Proceedings of the 2002 Clay School. Providence, RI: American Mathematical Society.
Gauntlett J (2004) Branes, calibrations and supergravity. In: Douglas M et al. (eds.) Strings and Geometry, pp. 79–126. Providence, RI: American Mathematical Society.
Green MB, Schwarz JH, and Witten E (1987) Superstring Theory, 2 vols. Cambridge: Cambridge University Press.
Li J and Yau S-T (2004) The existence of supersymmetric string theory with torsion, 2004 preprint, arXiv:hep-th/0411136.
Polchinski J (1998) String Theory, 2 vols. Cambridge: Cambridge University Press.

Compressible Flows: Mathematical Theory


G-Q Chen, Northwestern University, Evanston, IL, USA
© 2006 Elsevier Ltd. All rights reserved.

Introduction

The Euler equations for compressible fluids consist of the conservation laws of mass, momentum, and energy:

$$\partial_t \rho + \nabla_x \cdot m = 0, \qquad x \in \mathbb{R}^d \tag*{[1]}$$

$$\partial_t m + \nabla_x \cdot \left( \frac{m \otimes m}{\rho} \right) + \nabla_x p = 0 \tag*{[2]}$$

$$\partial_t E + \nabla_x \cdot \left( \frac{m}{\rho} (E + p) \right) = 0 \tag*{[3]}$$

Equivalently, these correspond to the general form of nonlinear hyperbolic systems of conservation laws:

$$\partial_t u + \nabla_x \cdot f(u) = 0, \qquad x \in \mathbb{R}^d, \quad u \in \mathbb{R}^n \tag*{[4]}$$

System [1]–[3] is closed by the following constitutive relations:

$$p = p(\rho, e), \qquad E = \frac{1}{2} \frac{|m|^2}{\rho} + \rho e \tag*{[5]}$$

In [1]–[3] and [5], $\tau = 1/\rho$ is the deformation gradient (specific volume for fluids, strain for solids), $v = (v_1, \dots, v_d)^{\top}$ is the fluid velocity with $\rho v = m$ the momentum vector, $p$ is the scalar pressure, and $E$ is the total energy with $e$ the internal energy which is a given function of $(\rho, p)$ or $(\tau, p)$ defined through thermodynamical relations. The other two thermodynamic variables are temperature $\theta$ and entropy $S$. If $(\rho, S)$ are chosen as independent variables, then the constitutive relations can be written as

$$(e, p, \theta) = (e(\rho, S),\ p(\rho, S),\ \theta(\rho, S)) \tag*{[6]}$$

governed by $\theta\, dS = de + p\, d\tau = de - \dfrac{p}{\rho^2}\, d\rho$. For polytropic gases,

$$p = p(\rho, S) = \kappa \rho^{\gamma} e^{S/c_v}, \qquad e = \frac{p}{(\gamma - 1)\rho}, \qquad \theta = \frac{p}{R \rho} \tag*{[7]}$$

where $R > 0$ may be taken to be the universal gas constant divided by the effective molecular weight of the particular gas, $c_v > 0$ is the specific heat at constant volume, $\gamma = 1 + R/c_v > 1$ is the adiabatic exponent, and $\kappa$ can be any positive constant under scaling.

The most important criterion of applicability of any mathematical model is its well-posedness: existence, uniqueness, and stability. The well-posedness theory for compressible fluid flows is far from being complete, and many further issues are still unexplored. In particular, the global existence and uniqueness of solutions in $\mathbb{R}^d$, $d \ge 2$, is still a major open problem, and only partial results shed some light on the amazing complexity of the problem. Below, we will mainly focus on the well-posedness issues with emphasis on the Cauchy problem, the initial value problem:

$$u|_{t=0} = u_0 \tag*{[8]}$$

first for inviscid compressible fluid flows and then for viscous compressible fluid flows.

Throughout this article, where a cited reference is not shown in the "Further reading" section, it may usually be found by consulting Bressan (2000),

Chen (2005), Dafermos (2005), Feireisl (2004), Lions (1986, 1988) or Liu (2000).

Inviscid Compressible Fluid Flows: Euler Equations

Solutions to the Euler equations [1]–[3] are generically discontinuous functions obeying the Clausius–Duhem inequality, the second law of thermodynamics:

$$\partial_t (\rho S) + \nabla_x \cdot (m S) \ge 0 \tag*{[9]}$$

in the sense of distributions. Such discontinuous solutions are called entropy solutions.

When a flow is isentropic, that is, entropy $S$ is a uniform constant $S_0$ in the flow, then the Euler equations for the flow take the simpler form:

$$\begin{aligned} &\partial_t \rho + \nabla_x \cdot m = 0 \\ &\partial_t m + \nabla_x \cdot \left( \frac{m \otimes m}{\rho} \right) + \nabla_x p = 0 \end{aligned} \tag*{[10]}$$

where the pressure is a function of the density, $p = p(\rho, S_0)$, with constant $S_0$. For a polytropic gas,

$$p(\rho) = \kappa \rho^{\gamma}, \qquad \gamma > 1 \tag*{[11]}$$

where $\kappa$ can be any positive constant by scaling. This system can be derived from [1] to [3] as follows: for smooth solutions of [1]–[3], entropy $S(\rho, m, E)$ is conserved along fluid particle trajectories, that is,

$$\partial_t (\rho S) + \nabla_x \cdot (m S) = 0$$

If the entropy is initially a uniform constant and the solution remains smooth, then the energy equation can be eliminated and entropy $S$ keeps the same constant in later time. Thus, under constant initial entropy, a smooth solution of [1]–[3] satisfies the equations in [10]. Furthermore, solutions of system [10] are also a good approximation to solutions of system [1]–[3] even after shocks form, since the entropy increases across a shock to the third order in wave strength for solutions of [1]–[3], while in [10] the entropy is constant. Moreover, system [10] is an excellent model for the isothermal fluid flow with $\gamma = 1$ and for the shallow-water flow with $\gamma = 2$. For such barotropic flows (i.e., $p = p(\rho)$), the energy equation [3] serves as an entropy inequality (see Lax (1973)):

$$\partial_t E + \nabla_x \cdot \left( \frac{m (E + p)}{\rho} \right) \le 0 \quad \text{in the sense of distributions}$$

In the one-dimensional case, system [1]–[3] in Eulerian coordinates is

$$\begin{aligned} &\partial_t \rho + \partial_x m = 0, \qquad \partial_t m + \partial_x \left( \frac{m^2}{\rho} + p \right) = 0 \\ &\partial_t E + \partial_x \left( \frac{m (E + p)}{\rho} \right) = 0 \end{aligned} \tag*{[12]}$$

The system above can be rewritten in Lagrangian coordinates:

$$\begin{aligned} &\partial_t \tau - \partial_x v = 0, \qquad \partial_t v + \partial_x p = 0 \\ &\partial_t \left( e + \frac{v^2}{2} \right) + \partial_x (p v) = 0 \end{aligned} \tag*{[13]}$$

with $v = m/\rho$, where the coordinates $(t, x)$ are the Lagrangian coordinates, which are different from the Eulerian coordinates for [12]; for simplicity of notation, we do not distinguish them. For the barotropic case, systems [12] and [13] reduce to

$$\partial_t \rho + \partial_x m = 0, \qquad \partial_t m + \partial_x \left( \frac{m^2}{\rho} + p \right) = 0 \tag*{[14]}$$

and

$$\partial_t \tau - \partial_x v = 0, \qquad \partial_t v + \partial_x p = 0 \tag*{[15]}$$

respectively, where pressure $p = p(\rho) = \tilde{p}(\tau)$, $\tau = 1/\rho$. The solutions of [12] and [13], as well as [14] and [15], are equivalent even for entropy solutions with vacuum where $\rho = 0$.

The potential flow is well known in transonic aerodynamics, beyond the isentropic approximation [10] from [1] to [3]. Denote $D_t = \partial_t + \sum_{k=1}^{d} v_k \partial_{x_k}$ the convective derivative along fluid particle trajectories. From [1] to [3], we have

$$D_t S = 0 \tag*{[16]}$$

and, by taking the curl of the momentum equations,

$$D_t \left( \frac{\omega}{\rho} \right) = \frac{\omega}{\rho} \cdot \nabla_x v + \frac{p_S(\rho, S)}{\rho^3}\, \nabla_x \rho \times \nabla_x S \tag*{[17]}$$

The identities [16] and [17] imply that a smooth solution of [1]–[3] which is both isentropic and irrotational at time $t = 0$ remains isentropic and irrotational for all later times, as long as this solution stays smooth. Then, the conditions $S = S_0 = \text{const.}$ and $\omega = \mathrm{curl}_x v = 0$ are reasonable for smooth solutions. For a smooth irrotational solution, we integrate the $d$-momentum equations in [10] through Bernoulli's law:

$$\partial_t v + \nabla_x \left( \frac{|v|^2}{2} \right) + \nabla_x h(\rho) = 0$$

where $h'(\rho) = p_{\rho}(\rho, S_0)/\rho$. On a simply connected space region, the condition $\mathrm{curl}_x v = 0$ implies that there exists $\Phi$ such that $v = \nabla_x \Phi$. Then,

$$\begin{aligned} &\partial_t \rho + \nabla_x \cdot (\rho \nabla_x \Phi) = 0 \\ &\partial_t \Phi + \tfrac{1}{2} |\nabla_x \Phi|^2 + h(\rho) = K \end{aligned} \tag*{[18]}$$

for some constant $K$. From the second equation in [18], we have

$$\rho(D\Phi) = h^{-1}\left( K - \left( \partial_t \Phi + \tfrac{1}{2} |\nabla_x \Phi|^2 \right) \right)$$
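For the polytropic law [11] this inversion can be written in closed form. Since $h'(\rho) = p_\rho/\rho = \kappa\gamma\rho^{\gamma-2}$, one admissible choice is $h(\rho) = \frac{\kappa\gamma}{\gamma-1}\rho^{\gamma-1}$, where the normalization $h(0) = 0$ is an assumption made here (any additive constant can be absorbed into $K$). The following minimal sketch evaluates the density from the potential under that assumption; the function names and default parameter values are illustrative only.

```python
def enthalpy(rho, kappa=1.0, gamma=1.4):
    """h(rho) = kappa*gamma/(gamma-1) * rho**(gamma-1), normalized so that h(0) = 0."""
    return kappa * gamma / (gamma - 1.0) * rho ** (gamma - 1.0)


def enthalpy_inverse(h, kappa=1.0, gamma=1.4):
    """Inverse of `enthalpy` on h >= 0 (requires gamma > 1)."""
    if h < 0.0:
        raise ValueError("Bernoulli's law would require a negative enthalpy")
    return ((gamma - 1.0) * h / (kappa * gamma)) ** (1.0 / (gamma - 1.0))


def density_from_potential(K, phi_t, grad_phi, kappa=1.0, gamma=1.4):
    """rho = h^{-1}(K - (phi_t + |grad_phi|^2 / 2)), from the second equation in [18]."""
    speed_sq = sum(g * g for g in grad_phi)
    return enthalpy_inverse(K - (phi_t + 0.5 * speed_sq), kappa, gamma)


# Choose K so that the fluid at rest has unit density, then speed the flow up.
K = enthalpy(1.0)
print(density_from_potential(K, 0.0, (0.0, 0.0, 0.0)))
print(density_from_potential(K, 0.0, (0.5, 0.0, 0.0)))
```

With $K$ fixed by a state at rest, increasing the flow speed lowers the density, and the formula breaks down once $\partial_t\Phi + \frac12|\nabla_x\Phi|^2$ exceeds $K$, which corresponds to cavitation in this model.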

Then, system [18] can be rewritten as the following time-dependent potential flow equation of second order:

$$\partial_t \rho(D\Phi) + \nabla_x \cdot \left( \rho(D\Phi)\, \nabla_x \Phi \right) = 0 \tag*{[19]}$$

For a steady solution $\Phi = \varphi(x)$, that is, $\partial_t \Phi = 0$, we obtain the celebrated steady potential flow equation of aerodynamics:

$$\nabla_x \cdot \left( \rho(\nabla_x \varphi)\, \nabla_x \varphi \right) = 0 \tag*{[20]}$$

In applications in aerodynamics, [18] or [19] is used for discontinuous solutions, and the empirical evidence is that entropy solutions of [18] or [19] are fairly good approximations to entropy solutions for [1]–[3] provided that (1) the shock strengths are small, (2) the curvature of shock fronts is not too large, and (3) there is a small amount of vorticity in the region of interest. Model [19] or [18] is an excellent model to capture multidimensional shock waves by ignoring vorticity waves, while the incompressible Euler equations are an excellent model to capture multidimensional vorticity waves by ignoring shock waves.

Local Well-Posedness for Classical Solutions

Consider the Cauchy problem for the Euler equations [1]–[3] with Cauchy data [8]:

Assume that $u_0 : \mathbb{R}^d \to D$ is in $H^s \cap L^{\infty}$ with $s > d/2 + 1$. Then, for the Cauchy problem [1]–[3] and [8], there exists a finite time $T = T(\|u_0\|_s, \|u_0\|_{L^{\infty}}) \in (0, \infty)$ such that there is a unique, stable bounded classical solution $u \in C^1([0, T] \times \mathbb{R}^d)$ with $u(t, x) \in D$ for $(t, x) \in [0, T] \times \mathbb{R}^d$ and $u \in C([0, T]; H^s) \cap C^1([0, T]; H^{s-1})$. Moreover, the interval $[0, T)$ with $T < \infty$ is the maximal interval of the classical $H^s$ existence for [1]–[3] if and only if either $\|(u_t, \nabla_x u)\|_{L^{\infty}} \to \infty$ or $u(t, x)$ escapes every compact subset $K \Subset D$ as $t \to T$.

This local existence can be established by relying solely on the elementary linear existence theory for symmetric hyperbolic systems with smooth coefficients (cf. Majda (1984)), or by the abstract semigroup theory (Kato 1975).

Formation of Singularities

For the one-dimensional case, singularities include the development of shock waves and formation of vacuum states. For the multidimensional case, the situation is much more complicated: besides shock waves and vacuum states, singularities can also be generated from vortex sheets, focusing and breaking of waves, among others.

Consider the Cauchy problem of the Euler equations [1]–[3] in $\mathbb{R}^3$ for polytropic gases with smooth initial data:

$$(\rho, v, S)|_{t=0} = (\rho_0, v_0, S_0)(x), \qquad \rho_0(x) > 0, \quad x \in \mathbb{R}^3 \tag*{[21]}$$

satisfying $(\rho_0, v_0, S_0)(x) = (\bar{\rho}, 0, \bar{S})$ for $|x| \ge L$, where $\bar{\rho} > 0$, $\bar{S}$, and $L$ are given constants. The equations possess a unique local $C^1$ solution $(\rho, v, S)(t, x)$ with $\rho(t, x) > 0$ provided that the initial data [21] is sufficiently regular. The support of the smooth disturbance $(\rho_0(x) - \bar{\rho},\ v_0(x),\ S_0(x) - \bar{S})$ propagates with speed at most $\sigma = \sqrt{p_{\rho}(\bar{\rho}, \bar{S})}$ (the sound speed), that is,

$$(\rho, v, S)(t, x) = (\bar{\rho}, 0, \bar{S}) \quad \text{if } |x| \ge L + \sigma t \tag*{[22]}$$

Define

$$P(t) = \int_{\mathbb{R}^3} \left( p(\rho(t, x), S(t, x))^{1/\gamma} - p(\bar{\rho}, \bar{S})^{1/\gamma} \right) dx$$

$$F(t) = \int_{\mathbb{R}^3} \rho v(t, x) \cdot x \, dx$$

which, roughly speaking, measure the entropy and the radial component of momentum. Then, if $(\rho, v, S)(t, x)$ is a $C^1$ solution of [1]–[3] and [21] for $0 < t < T$, and

$$P(0) \ge 0, \qquad F(0) > \alpha \sigma R^4 \max_x \rho_0(x) \quad \text{with } \alpha = 16\pi/3 \tag*{[23]}$$

then the lifespan $T$ of the $C^1$ solution is finite (Sideris 1985).

To illustrate a way in which the conditions in [23] may be satisfied, consider the initial data: $\rho_0 = \bar{\rho}$, $S_0 = \bar{S}$. Then $P(0) = 0$, and [23] holds if

$$\int_{|x| < R} v_0(x) \cdot x \, dx > \alpha \sigma R^4$$

Comparing both sides, one finds that the initial velocity must be supersonic in some region relative to the sound speed at infinity. The formation of a singularity (presumably a shock wave) is detected as the disturbance overtakes the wave front forcing the front to propagate with supersonic speed.

Singularities are formed even without the condition of largeness, such as [23], being satisfied. For example, if $S_0(x) \ge \bar{S}$ and, for some $0 < R_0 < R$,

$$\begin{aligned} &\int_{|x| > r} |x|^{-1} (|x| - r)^2 (\rho_0(x) - \bar{\rho}) \, dx > 0 \\ &\int_{|x| > r} |x|^{-3} (|x|^2 - r^2)\, \rho_0(x)\, v_0(x) \cdot x \, dx \ge 0 \end{aligned} \tag*{[24]}$$

for $R_0 < r < R$, then the lifespan $T$ of the $C^1$ solution of [1]–[3] and [21] is finite. The

assumptions in [24] mean that, in an average sense, the gas must be slightly compressed and outgoing directly behind the wave front.

Local Well-Posedness for Shock-Front Solutions

For a general hyperbolic system of conservation laws [4], shock-front solutions are discontinuous, piecewise smooth entropy solutions with the following structure:

1. There exists a $C^2$ spacetime hypersurface $\mathcal{S}(t)$ defined in $(t, x)$ for $0 \le t \le T$ with spacetime normal $(\nu_t, \nu_x) = (\nu_t, \nu_1, \ldots, \nu_d)$ as well as two $C^1$ vector-valued functions: $u^+(t, x)$ and $u^-(t, x)$, defined on respective domains $D^+$ and $D^-$ on either side of the hypersurface $\mathcal{S}(t)$ and satisfying
$$\partial_t u^\pm + \nabla_x \cdot f(u^\pm) = 0 \quad \text{in } D^\pm;$$
2. The jump across the hypersurface $\mathcal{S}(t)$ satisfies the Rankine–Hugoniot condition:
$$\{\nu_t (u^+ - u^-) + \nu_x \cdot (f(u^+) - f(u^-))\}|_{\mathcal{S}} = 0$$

For [4], the surface $\mathcal{S}$ is not known in advance and must be determined as part of the solution of the problem; thus, the two equations in (1)–(2) describe a multidimensional, highly nonlinear, free-boundary-value problem. The initial data yielding shock-front solutions is defined as follows. Let $\mathcal{S}_0$ be a smooth hypersurface parametrized by $\alpha$, and let $\nu(\alpha) = (\nu_1, \ldots, \nu_d)(\alpha)$ be a unit normal to $\mathcal{S}_0$. Define the piecewise smooth initial values for respective domains $D_0^+$ and $D_0^-$ on either side of the hypersurface $\mathcal{S}_0$ as
$$u_0(x) = \begin{cases} u_0^+(x), & x \in D_0^+ \\ u_0^-(x), & x \in D_0^- \end{cases} \qquad [25]$$
It is assumed that the initial jump in [25] satisfies the Rankine–Hugoniot condition, that is, there is a smooth scalar function $\sigma(\alpha)$ so that
$$-\sigma(\alpha)\,(u_0^+(\alpha) - u_0^-(\alpha)) + \nu(\alpha) \cdot (f(u_0^+(\alpha)) - f(u_0^-(\alpha))) = 0 \qquad [26]$$
and that $\sigma(\alpha)$ does not define a characteristic direction, that is,
$$\sigma(\alpha) \ne \lambda_i(u_0^\pm), \quad \alpha \in \mathcal{S}_0, \quad 1 \le i \le n \qquad [27]$$
where $\lambda_i$, $i = 1, \ldots, n$, are the eigenvalues of [4]. It is natural to require that $\mathcal{S}(0) = \mathcal{S}_0$.

Consider the Euler equations [1]–[3] in $\mathbb{R}^3$ for polytropic gases with piecewise smooth initial data:
$$(\rho, v, E)|_{t=0} = \begin{cases} (\rho_0^+, v_0^+, E_0^+)(x), & x \in D_0^+ \\ (\rho_0^-, v_0^-, E_0^-)(x), & x \in D_0^- \end{cases} \qquad [28]$$
Assume that $\mathcal{S}_0$ is a smooth compact surface in $\mathbb{R}^3$ and that $(\rho_0^+, v_0^+, E_0^+)(x)$ belongs to the uniform local Sobolev space $H^s_{ul}(D_0^+)$, while $(\rho_0^-, v_0^-, E_0^-)(x)$ belongs to the Sobolev space $H^s(D_0^-)$, for some fixed $s \ge 10$. Assume also that there is a function $\sigma(\alpha) \in H^s(\mathcal{S}_0)$ so that [26] and [27] hold, and the compatibility conditions up to order $s - 1$ are satisfied on $\mathcal{S}_0$ by the initial data, together with the entropy condition:
$$v_0^+ \cdot \nu(\alpha) + \sqrt{p_\rho(\rho_0^+, S_0^+)} < \sigma(\alpha) < v_0^- \cdot \nu(\alpha) + \sqrt{p_\rho(\rho_0^-, S_0^-)} \qquad [29]$$
Then, there are a $C^2$ hypersurface $\mathcal{S}(t)$ and $C^1$ functions $(\rho^\pm, v^\pm, E^\pm)(t, x)$ defined for $t \in [0, T]$, with $T$ sufficiently small, so that
$$(\rho, v, E)(t, x) = \begin{cases} (\rho^+, v^+, E^+)(t, x), & (t, x) \in D^+ \\ (\rho^-, v^-, E^-)(t, x), & (t, x) \in D^- \end{cases} \qquad [30]$$
is the discontinuous shock-front solution of the Cauchy problem [1]–[3] and [28]. Here a vector function $u$ is in $H^s_{ul}$, provided that there exists some $r > 0$ so that $\max_{y \in \mathbb{R}^d} \|w_{r,y} u\|_{H^s} < \infty$ with $w_{r,y}(x) = w((x - y)/r)$, where $w \in C_0^\infty(\mathbb{R}^d)$ is a function so that $w(x) \ge 0$, $w(x) = 1$ when $|x| \le 1/2$, and $w(x) = 0$ when $|x| > 1$.

The compatibility conditions are needed in order to avoid the formation of discontinuities in higher derivatives along other characteristic surfaces emanating from $\mathcal{S}_0$. Once the main condition [26] is satisfied, the compatibility conditions are automatically guaranteed for a wide class of initial data. The idea of the proof is to use the existence of a strictly convex entropy and the symmetrization of [4]; the shock-front solutions are defined as the limit of a convergent classical iteration scheme based on a linearization by using the theory of linearized stability for shock fronts (Majda 1984). The uniform existence time of shock-front solutions in shock strength can be achieved (Métivier 1990).

Global Theory in $L^\infty$ for the Isentropic Euler Equations for $x \in \mathbb{R}$

Consider the Cauchy problem for [14] with initial data:
$$(\rho, m)|_{t=0} = (\rho_0, m_0)(x) \qquad [31]$$
where $\rho_0$ and $m_0$ are in the physical region $\{(\rho, m) : \rho \ge 0, |m| \le C_0 \rho\}$ for some $C_0 > 0$. System [14] is strictly hyperbolic at the states with $\rho > 0$, and strict hyperbolicity fails at the vacuum states $V := \{(\rho, m/\rho) : \rho = 0, |m/\rho| < \infty\}$. Then, we have:

1. There exists a global solution $(\rho, m)(t, x)$ of the Cauchy problem [14] and [31] satisfying
$$0 \le \rho(t, x) \le C, \quad |m(t, x)| \le C \rho(t, x) \qquad [32]$$

for some $C > 0$ depending only on $C_0$ and $\gamma$, and the entropy inequality
$$\partial_t \eta(\rho, m) + \partial_x q(\rho, m) \le 0 \qquad [33]$$
in the sense of distributions for any convex weak entropy–entropy flux pair $(\eta, q)$, that is,
$$\nabla q(\rho, m) = \nabla \eta(\rho, m)\, \nabla f(\rho, m)$$
with
$$\nabla^2 \eta(\rho, m) \ge 0 \quad \text{and} \quad \eta|_V = 0$$
2. The solution operator $(\rho, m)(t, \cdot) = S_t(\rho_0, m_0)(\cdot)$, determined by (1), is compact in $L^1_{loc}(\mathbb{R})$ for $t > 0$;
3. Furthermore, if $(\rho_0, m_0)(x)$ is periodic with period $P$, then there exists a global periodic solution $(\rho, m)(t, x)$ with [32] such that $(\rho, m)(t, x)$ asymptotically decays to
$$\frac{1}{|P|} \int_P (\rho_0, m_0)(x)\, dx$$
in $L^1$.

The convergence of the Lax–Friedrichs scheme, the Godunov scheme, and the vanishing viscosity method for system [14] has also been established. The results are based on a compensated compactness framework to replace the BV compactness framework. For a gas obeying the $\gamma$-law, the case $\gamma = (N + 2)/N$, $N \ge 5$ odd, was first studied by DiPerna (1983), and the case $1 < \gamma \le 5/3$ for usual gases was first solved by Chen (1986) and Ding-Chen-Luo (1985). The cases $\gamma \ge 3$ and $5/3 < \gamma < 3$ were treated by Lions–Perthame–Tadmor (1994) and Lions–Perthame–Souganidis (1996), respectively. The case of general pressure laws was solved by Chen–LeFloch (2000, 2003). All the results for entropy solutions to [14] in Eulerian coordinates can equivalently be presented as the corresponding results for entropy solutions to [15] in Lagrangian coordinates. The isothermal case $\gamma = 1$ was treated by Huang–Wang (2002).

Global Theory in BV for the Adiabatic Euler Equations for $x \in \mathbb{R}$

Consider the Euler equations [13] for polytropic gases with the Cauchy data:
$$(\rho, v, S)|_{t=0} = (\rho_0, v_0, S_0)(x) \qquad [34]$$
Then we have (Liu 1977, Temple 1981, Chen and Wagner 2003):

Let $K \subset \{(\rho, v, S) : \rho > 0\}$ be a compact set in $\mathbb{R}_+ \times \mathbb{R}^2$, and let $N \ge 1$ be any constant. Then there exists a constant $C_0 = C_0(K, N)$, independent of $\gamma \in (1, 5/3]$, such that, for every initial data $(\rho_0, v_0, S_0) \in K$ with $TV_{\mathbb{R}}(\rho_0, v_0, S_0) \le N$, when
$$(\gamma - 1)\, TV_{\mathbb{R}}(\rho_0, v_0, S_0) \le C_0 \quad \text{for any } \gamma \in (1, 5/3]$$
the Cauchy problem [13] and [34] has a global entropy solution $(\rho, v, S)(t, x)$ which is bounded and satisfies
$$TV_{\mathbb{R}}(\rho, v, S)(t, \cdot) \le C\, TV_{\mathbb{R}}(\rho_0, v_0, S_0)$$
for some constant $C > 0$ independent of $\gamma$.

This result includes, as a special case, that for the barotropic case (Nishida 1968, Nishida–Smoller 1973, DiPerna 1973). Some efforts in the direction of relaxing the requirement of small total variation have been made. Some extensions to the initial-boundary value problems have also been made. In addition, an entropy solution in BV with periodic data or compact support decays when $t \to \infty$. Furthermore, even for a general hyperbolic system [4] for $x \in \mathbb{R}$, we have:

If the initial data functions $u_0(x)$ and $v_0(x)$ have sufficiently small total variation and $u_0 - v_0 \in L^1(\mathbb{R})$, then, for the corresponding exact Glimm, or wave-front tracking, or vanishing viscosity solutions $u(t, x)$ and $v(t, x)$ of the Cauchy problem [4] and [8], there exists a constant $C > 0$ such that
$$\|u(t, \cdot) - v(t, \cdot)\|_{L^1(\mathbb{R})} \le C \|u_0 - v_0\|_{L^1(\mathbb{R})} \quad \text{for all } t > 0 \qquad [35]$$
An immediate consequence is that the whole sequence of the approximate solutions constructed by the Glimm (1965) scheme, as well as the wave-front tracking method and the vanishing viscosity method, converges to a unique entropy solution of [4] and [8] when the mesh size or the viscosity coefficient tends to zero. More detailed discussions and extensive references about the $L^1$-stability of BV entropy solutions and related topics can be found in Bressan (2000) and Dafermos (2000); also see Chen and Wang (2002). Furthermore, the Riemann solution is unique and asymptotically stable in the class of entropy solutions to [13] with large variation satisfying only one physical entropy inequality (Chen-Frid-Li 2002).

Multidimensional Steady Theory

The mathematical study of two-dimensional steady supersonic flows past wedges, whose vertex angles are less than the critical angle, dates back to the 1940s, since the stability of such flows is fundamental in applications (cf. Courant–Friedrichs (1948)). Local solutions around the wedge vertex were first constructed (Gu 1962, Schaeffer 1976, Li 1980).
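The $L^1$-stability estimate [35] and the Lax–Friedrichs scheme mentioned in the one-dimensional theory can be illustrated numerically. The following is a minimal sketch, not the construction used in the cited works: it assumes the scalar inviscid Burgers equation as a stand-in for the system [4], a periodic grid, and hypothetical helper names (`burgers_flux`, `lax_friedrichs_step`, `l1_history`). For a scalar law the Lax–Friedrichs scheme is monotone under the CFL condition, so the discrete $L^1$ distance between two numerical solutions cannot grow, which mirrors [35] with $C = 1$.

```python
import math

def burgers_flux(u):
    """Flux f(u) = u^2/2 of the inviscid Burgers equation (scalar model, an assumption)."""
    return 0.5 * u * u

def lax_friedrichs_step(u, lam):
    """One Lax-Friedrichs step on a periodic grid, with lam = dt/dx."""
    n = len(u)
    new = []
    for j in range(n):
        left, right = u[j - 1], u[(j + 1) % n]  # u[-1] wraps around when j = 0
        new.append(0.5 * (left + right)
                   - 0.5 * lam * (burgers_flux(right) - burgers_flux(left)))
    return new

def l1_distance(u, v, dx):
    """Discrete L^1 distance between two grid functions."""
    return dx * sum(abs(a - b) for a, b in zip(u, v))

def l1_history(u0, v0, dx, t_final, cfl=0.9):
    """Evolve two initial data sets side by side and record their L^1 distance."""
    # For Burgers, f'(u) = u; the scheme obeys a maximum principle under the
    # CFL condition, so the initial maximum speed bounds all later speeds.
    max_speed = max(max(abs(a) for a in u0), max(abs(b) for b in v0))
    lam = cfl / max_speed
    steps = math.ceil(t_final / (lam * dx))
    u, v = list(u0), list(v0)
    history = [l1_distance(u, v, dx)]
    for _ in range(steps):
        u = lax_friedrichs_step(u, lam)
        v = lax_friedrichs_step(v, lam)
        history.append(l1_distance(u, v, dx))
    return history

n = 200
dx = 1.0 / n
xs = [j * dx for j in range(n)]
u0 = [math.sin(2 * math.pi * x) for x in xs]
v0 = [a + 0.1 * math.cos(4 * math.pi * x) for a, x in zip(u0, xs)]

history = l1_history(u0, v0, dx, t_final=0.5)
print("initial L1 distance:", history[0])
print("final L1 distance:  ", history[-1])
```

The recorded distances are nonincreasing in time. For genuine systems the constant $C$ in [35] is in general larger than 1 and the proof is far more delicate, so this sketch only conveys the meaning of the estimate.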

Such global potential solutions were constructed when the wedge has some convexity, or is a small perturbation of the straight wedge with fast decay in the flow direction (Chen 2001, Chen-Xin-Yin 2002), or is piecewise smooth and a small perturbation of the straight wedge (Zhang 2003). For the two-dimensional steady supersonic flows governed by the full Euler equations past Lipschitz wedges, it has been shown (Chen-Zhang-Zhu 2005a) that, when the wedge vertex angle is less than the critical angle, the strong shock front emanating from the wedge vertex is nonlinearly stable in structure globally, although there may be many weak shocks and vortex sheets between the wedge boundary and the strong shock front, under the BV perturbation of the wedge so that the total variation of the tangent function along the wedge boundary is suitably small. This asserts that any supersonic shock for the wedge problem is nonlinearly stable.

A self-similar gas flow past an infinite cone in $\mathbb{R}^3$ with small vertex angle is also nonlinearly stable upon the BV perturbation of the obstacle (Lien-Liu 1999). The nonlinear stability remains open when the infinite cone in $\mathbb{R}^3$ has arbitrary vertex angle.

The stability issues of supersonic vortex sheets have been studied by classical linearized stability analysis, large-scale numerical simulations, and asymptotic analysis. In particular, the nonlinear development of instabilities of supersonic vortex sheets at high Mach number was predicted as time evolves (Woodward 1985, Artola-Majda 1989). In contrast with the prediction of evolution instability, steady supersonic vortex sheets, as time-asymptotics, are stable globally in structure, even under the BV perturbation of the Lipschitz walls, although there may be many weak shocks and supersonic vortex sheets away from the strong vortex sheet (Chen-Zhang-Zhu 2005b).

Transonic shock problems for steady fluid flows are important in applications (cf. Courant and Friedrichs (1948)). A program on the existence and stability of multidimensional transonic shocks has been initiated and three new analytical approaches have been developed (Chen-Feldman 2003, 2004). The transonic problems include the existence and stability of transonic shocks in the whole $\mathbb{R}^d$, the existence and stability of transonic flows past finite or infinite nozzles, the stability of transonic flows past infinite nonsmooth wedges, and the existence of regular shock reflection solutions. The first approach is an iteration scheme based on the nondegeneracy of the free boundary condition: the jump of the normal derivative of a solution across the free boundary has a strictly positive lower bound (Chen-Feldman 2003, 2004), which works for the nonlinear equations whose coefficients may depend on not only the solution itself but also the gradients of the solution. The second approach is a partial hodograph procedure, with which the existence and stability of multidimensional transonic shocks that are not nearly orthogonal to the flow direction can be handled (Chen-Feldman 2004): one of the main ingredients in this approach is to employ a partial hodograph transform to reduce the free boundary problem into a conormal boundary value problem for the corresponding nonlinear equations of divergence form and then develop techniques to solve the conormal boundary value problem. When the regularity of the steady perturbation is $C^3$, or higher, the third approach is to employ the implicit function theorem to deal with the existence and stability problem. Another iteration approach, which works well for the two-dimensional equations whose coefficients depend only on the solution itself, has also been developed (Čanić-Keyfitz-Lieberman 2000). Further longstanding open problems include the existence of global transonic flows past an airfoil or a smooth obstacle (Morawetz 1956–58, 1985).

Multidimensional Unsteady Problems

Now we present some multidimensional time-dependent problems with a simplifying feature that the data (domain and/or the initial data) coupled with the structure of the underlying equations obey certain geometric structure so that the multidimensional problems can be reduced to lower-dimensional problems with more complicated couplings. Different types of geometric structure call for different techniques.

The Euler equations for compressible fluids with geometric structure describe many important fluid flows, including spherically symmetric flows and self-similar flows. Such geometric flows are motivated by many physical problems such as shock diffractions, supernova formation in stellar dynamics, inertial confinement fusion, and underwater explosions. For the initial data with large amplitude having geometric structure, the required physical insight is: (1) whether the solution has the same geometric structure globally and (2) whether the solution blows up to infinity in a finite time. These questions are not easily understood in physical experiments and numerical simulations, especially for the blow-up, because of the limited capacity of available instruments and computers.
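As a worked illustration of how geometric structure reduces a multidimensional problem to a lower-dimensional one with more complicated couplings, consider spherically symmetric isentropic flow. The sketch below assumes the isentropic Euler equations in $\mathbb{R}^d$ written in terms of density $\rho$ and momentum, and it introduces the scalar radial momentum $m(t, r)$ as notation that is not taken from the article.

```latex
% Spherically symmetric ansatz (assumed notation): r = |x|,
%   \rho(t,x) = \rho(t,r), \qquad \text{momentum} = m(t,r)\,\frac{x}{r}.
% Using  \nabla_x \cdot \big( g(r)\,\tfrac{x}{r} \big) = g'(r) + \tfrac{d-1}{r}\, g(r)
% and    \nabla_x \cdot \big( g(r)\,\tfrac{x \otimes x}{r^2} \big)
%            = \big( g'(r) + \tfrac{d-1}{r}\, g(r) \big) \tfrac{x}{r},
% the d-dimensional system reduces to a one-dimensional system in (t, r)
% with geometric source terms:
\begin{aligned}
  \partial_t \rho + \partial_r m &= -\frac{d-1}{r}\, m, \\
  \partial_t m + \partial_r\!\left( \frac{m^2}{\rho} + p(\rho) \right)
      &= -\frac{d-1}{r}\, \frac{m^2}{\rho}.
\end{aligned}
```

The left-hand sides have the form of the one-dimensional isentropic system, while the right-hand sides are singular at $r = 0$; this is one way to see why the behavior of the density near the origin is the delicate point for focusing waves.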

The first type of geometric structure is spherical symmetry. A criterion for $L^\infty$ Cauchy data functions of arbitrarily large amplitude was observed to guarantee the existence of spherically symmetric solutions in $L^\infty$ in the large for the isentropic flows, which model outgoing blast waves and large-time asymptotic solutions (Chen 1997). On the other hand, it is evident that the density blows up as $|x| \to 0$ in general, especially for the focusing case; the singularity at the origin makes the problem truly multidimensional due to the reflection of waves from infinity and their strengthening as they move radially inwards. One of the important open questions is to understand the order of singularity, $\rho(t, |x|) \sim |x|^{-\alpha}$, at the origin for bounded Cauchy data.

The second type of geometric structure is self-similarity, that is, the solutions with initial data functions that give rise to self-similar solutions, especially including Riemann solutions. Compressible flow equations in $\mathbb{R}^d$, $d \ge 2$, with one or more linearly degenerate modes of wave propagation have additional difficulties. In that case, the global flow is governed by a reduced (self-similar) system which is of composite (hyperbolic–elliptic) type in the subsonic region. The linearly degenerate waves give rise to one or more families of degenerate characteristics which remain real in the subsonic region. In some cases, the reduced equations couple an elliptic (degenerate elliptic) problem for the density with a hyperbolic (transport) equation for the vorticity.

An important prototype for both practical applications and the theory of multidimensional complex wave patterns is the problem of diffraction of a shock wave which is incident along an inclined ramp (see Glimm and Majda (1991)). When a plane shock hits a wedge head-on, a self-similar reflected shock moves outward as the original shock moves forward. The computational and asymptotic analysis shows that various patterns of reflected shocks may occur, including regular reflection and (simple, double, and complex) Mach reflections. The main part or the whole of the reflected shock is a transonic shock in the self-similar coordinates, for which the corresponding equation changes type from hyperbolic to elliptic across the shock. There are few rigorous mathematical results on the global existence and stability of shock reflection solutions and the transition among regular, simple Mach, double Mach, and complex Mach reflections for the potential flow equation [19] and the full Euler equations [1]–[3]. Some results were recently obtained for simplified models including the transonic small-disturbance equation near the reflection point and the pressure gradient equation when the wedge is close to a flat wall.

For the potential flow equation [19], a self-similar solution is a solution of the form: $\Phi = t\,\psi(y)$, $y = x/t$. Letting $\varphi(y) = -|y|^2/2 + \psi(y)$, the system can be rewritten in the form of a second-order equation of mixed hyperbolic–elliptic type in $y \in \mathbb{R}^d$ by scaling:
$$\nabla_y \cdot \big(\rho(|\nabla_y \varphi|^2, \varphi)\, \nabla_y \varphi\big) + d\, \rho(|\nabla_y \varphi|^2, \varphi) = 0 \qquad [36]$$
with $\rho(q^2, z) = \big(1 - (\gamma - 1)(q^2 + 2z)/2\big)^{1/(\gamma - 1)}$. Equation [36] at $|\nabla_y \varphi| = q$ is hyperbolic (pseudosupersonic) if $\rho(q^2, z) + q \rho_q(q^2, z) < 0$ and elliptic (pseudosubsonic) if $\rho(q^2, z) + q \rho_q(q^2, z) > 0$. Under this framework, the nature of the shock reflection pattern has been explored for weak incident shocks (strength $b$) and small wedge angles $2\theta_w$ by a number of different scalings, a study of mixed equations, and matching asymptotics for the different scalings, where the parameter $\beta = c_1 \theta_w^2 / (b(\gamma + 1))$ ranges from 0 to $\infty$ and $c_1$ is the speed of sound behind the incident shock (Morawetz 1994). For $\beta > 2$, a regular reflection of both strong and weak kinds is possible as well as a Mach reflection; for $\beta < 1/2$, a Mach reflection occurs and the flow behind the reflection is subsonic and can be constructed in principle (with an elliptic problem) and matched; and for $1/2 < \beta < 2$, the flow behind a Mach reflection may be transonic, which is a solution of a nonlinear boundary-value problem of mixed type. The basic pattern of reflection has been shown to be an almost semicircular shock issuing, for a regular reflection, from the reflection point on the wedge and, for a Mach reflection, matched with a local interaction flow. Some related observations were also made (Keller-Blank 1951, Hunter-Keller 1984, Hunter 1988). It is important to establish rigorous proofs. Recently, a rigorous existence proof was established for global solutions to shock reflection by large-angle wedges in Chen and Feldman (2005).

Analytical Frameworks for Entropy Solutions

The recent great progress for entropy solutions for one-dimensional time-dependent Euler equations and two-dimensional steady Euler equations, based on BV, $L^1$, or even $L^\infty$ estimates, naturally raises the expectation that a similar approach may also be effective for the multidimensional Euler equations, or more generally, hyperbolic systems of conservation laws, especially,
$$\|u(t, \cdot)\|_{BV} \le C \|u_0\|_{BV} \qquad [37]$$

Unfortunately, this is not the case. The necessary condition for [37] to hold for $p \ne 2$ (Rauch 1986) is
$$\nabla f_k(u)\, \nabla f_l(u) = \nabla f_l(u)\, \nabla f_k(u) \quad \text{for all } k, l = 1, 2, \ldots, d \qquad [38]$$
The analysis suggests that only systems in which the commutativity relation [38] holds offer any hope for treatment in the framework of BV. This special case includes the scalar case $n = 1$ and the case of one space dimension $d = 1$. Beyond that, it contains very few systems of physical interest.

In this regard, it is important to identify effective analytical frameworks for studying entropy solutions of the multidimensional Euler equations [1]–[3], which are not in BV. Naturally, we want to approach the questions of existence, stability, uniqueness, and long-time behavior of entropy solutions with as much generality as possible. For this purpose, a theory of divergence-measure fields to construct such a global framework has been developed for studying entropy solutions (Chen-Frid 1999, 2000, Chen-Torres 2005, Chen-Torres-Ziemer 2005). For more details, see Chen (2005).

Viscous Compressible Fluid Flows: Navier–Stokes Equations

Compressible fluid flows that are viscous and conduct heat are governed by the following Navier–Stokes equations:
$$\partial_t \rho + \nabla_x \cdot m = 0, \quad x \in \mathbb{R}^d \qquad [39]$$
$$\partial_t m + \nabla_x \cdot \Big(\frac{m \otimes m}{\rho}\Big) + \nabla_x p = \nabla_x \cdot \mathbb{S} \qquad [40]$$
$$\partial_t E + \nabla_x \cdot \Big(\frac{m}{\rho}(E + p)\Big) = \nabla_x \cdot \Big(\frac{m}{\rho} \cdot \mathbb{S}\Big) - \nabla_x \cdot q \qquad [41]$$
Here, $\mathbb{S} = \mathbb{S}(\nabla_x v, \rho, \theta)$ is the viscous stress tensor, which is symmetric by the conservation of angular momentum, and $q$ is the heat flux. If the fluid is isotropic and the viscous tensor $\mathbb{S}$ is a linear function of $\nabla_x v$ and invariant under a change of reference frame (translation and rotation), then we deduce from elementary algebraic manipulations that necessarily
$$\mathbb{S} = \lambda(\rho, \theta)\,(\nabla_x \cdot v)\, I + 2\mu(\rho, \theta)\, D \qquad [42]$$
which corresponds to the Newtonian fluids, where $D = (\nabla_x v + (\nabla_x v)^\top)/2$ is the deformation tensor and $\lambda$ and $\mu$ are the Lamé viscosity coefficients.

Furthermore, since the fluid is isotropic, we are led to the Fourier law:
$$q = -k(\rho, \theta, |\nabla_x \theta|)\, \nabla_x \theta$$
for a scalar function $k$ which, in most cases, is taken to be simply a function of $\rho$ and $\theta$, or even a constant called the thermal conduction coefficient. Again, system [39]–[41] is closed by the constitutive relations in [5]. The equation for entropy $S$ is
$$\partial_t(\rho S) + \nabla_x \cdot \Big(m S + \frac{q}{\theta}\Big) = \frac{\mathbb{S}(\nabla_x v) : \nabla_x v}{\theta} - \frac{q \cdot \nabla_x \theta}{\theta^2} \qquad [43]$$
The second law of thermodynamics indicates that the right-hand side of [43] should be non-negative, which yields the restriction:
$$k(\rho, \theta, |\nabla_x \theta|) \ge 0, \quad \mu \ge 0, \quad \lambda + 2\mu/d \ge 0$$
The case of strictly positive viscosity coefficients is the viscous case with heat conductivity $k > 0$. In particular, the kinetic theory indicates that the Stokes relationship should hold, namely $\lambda = -2\mu/d$, with the adiabatic exponent $\gamma = 5/3$ for monatomic gases.

In mathematical viscous fluid dynamics, an important model is the barotropic model for viscous fluids, that is, $p = p(\rho)$. Then, the specific energy $E$ can be taken in the form of $E = (1/2)\rho|v|^2 + \rho\, e(\rho)$ with $e'(\rho) = p(\rho)/\rho^2$. For classical solutions, the energy of a barotropic flow satisfies the equality:
$$\partial_t E + \nabla_x \cdot ((E + p) v) = \nabla_x \cdot (\mathbb{S} v) - \mathbb{S} : \nabla_x v$$
which is now a direct consequence of [39] and [40].

The question of local existence of classical solutions to [39]–[41] for regular initial data was addressed by Nash (1962), where there is no indication whether or not these solutions exist for all times.

In the case of one space dimension, the well-posedness is largely settled. The basic result for the existence of classical solutions is that of Kazhikhov (1976); see Lions (1998) and Feireisl (2004) for extensive references. The discontinuous solutions have been constructed (Shelukhin 1979, Serre 1986, Hoff 1987, Chen-Hoff-Trivisa 2000).

For the Navier–Stokes equations in $\mathbb{R}^3$ with general equation of state, the global classical solutions for the Cauchy problem and various initial-boundary value problems whose initial data is small around a constant state have been

constructed (Matsumura-Nishida 1980, 1983). The approach is to obtain a priori estimates via energy methods for extending the local solution or for a difference method globally. These results have been extended to the Cauchy problem or the initial-boundary value problems with small discontinuous initial data (Hoff 1997).

For the Navier–Stokes equations in $\mathbb{R}^d$ for barotropic flows with [11] and large initial data, the global existence of solutions containing vacuum for the Cauchy problem or various initial-boundary value problems was first established by Lions (1998) for $\gamma \ge 3/2$ if $d = 2$, $\gamma \ge 9/5$ if $d = 3$, and $\gamma > d/2$ if $d \ge 4$. The gap was closed by Feireisl–Novotný–Petzeltová (2001) for the full range $\gamma > d/2$. These results have been extended to the full Navier–Stokes equations describing the motion of a general compressible, viscous, and heat conducting fluid (see Feireisl (2004)). The physically relevant isothermal case, $\gamma = 1$, is completely open even if $d = 2$. The only large data existence result is that for radially symmetric data (Hoff 1992). The general case $\gamma \ge 1$ and $d = 3$ for radially symmetric data was solved only recently (Jiang-Zhang 2001).

The lower-bound estimate on the density is a delicate issue. Weak solutions containing vacuum for the isentropic viscous flows with constant viscosity are unstable in general (Hoff-Serre 1991). Hence, it is important to see whether vacuum will never develop if the initial data is away from vacuum; this has been shown for the one-dimensional case for large initial data and for the multidimensional case with small data. On the other hand, from the kinetic theory, if solutions contain vacuum, then the viscosity coefficients in the Navier–Stokes equations should depend on the density near vacuum; this indeed stabilizes the solutions for the one-dimensional case.

The stability of viscous shock waves has been studied for the one-dimensional case (see Liu (2000) and the references therein). The compressible–incompressible limits from the isentropic compressible to incompressible Navier–Stokes equations when the Mach number tends to zero have been established for arbitrarily weak solutions (Lions-Masmoudi 1998) and for smooth solutions and a class of initial data functions (Hoff 1998).

The inviscid limits from the Navier–Stokes equations to the Euler equations have been established as long as the solutions of the Euler equations are smooth, when the viscosity and heat conductivity coefficients tend to zero (Klainerman-Majda 1982). It is completely open for general entropy solutions, even in the one-dimensional case.

See also: Breaking Water Waves; Capillary Surfaces; Fluid Mechanics: Numerical Methods; Geophysical Dynamics; Incompressible Euler Equations: Mathematical Theory; Inviscid Flows; Magnetohydrodynamics; Newtonian Fluids and Thermohydraulics; Non-Newtonian Fluids; Partial Differential Equations: Some Examples; Stability of Flows; Viscous Incompressible Fluids: Mathematical Theory.

Further Reading

Bressan A (2000) Hyperbolic Systems of Conservation Laws: The One-Dimensional Cauchy Problem. Oxford: Oxford University Press.
Chen G-Q (2005) Euler equations and related hyperbolic conservation laws. In: Dafermos CM and Feireisl E (eds.) Handbook of Differential Equations II: Evolutionary Differential Equations, Chapter 1, pp. 1–104. Amsterdam: Elsevier.
Chen G-Q and Wang D (2002) The Cauchy problem for the Euler equations for compressible fluids. In: Friedlander S and Serre D (eds.) Handbook of Mathematical Fluid Dynamics, vol. 1, ch. 5, pp. 421–543. Amsterdam: Elsevier Science B.V.
Courant R and Friedrichs KO (1948) Supersonic Flow and Shock Waves. New York: Springer.
Dafermos CM (2005) Hyperbolic Conservation Laws in Continuum Physics (2nd edn). Berlin: Springer.
Feireisl E (2004) Dynamics of Viscous Compressible Fluids. Oxford: Oxford University Press.
Glimm J (1965) Solutions in the large for nonlinear hyperbolic system of equations. Communications on Pure and Applied Mathematics 18: 95–105.
Glimm J and Majda A (1991) Multidimensional Hyperbolic Problems and Computations. New York: Springer.
Lax PD (1973) Hyperbolic Systems of Conservation Laws and the Mathematical Theory of Shock Waves. Philadelphia: SIAM.
Lions PL (1996, 1998) Mathematical Topics in Fluid Mechanics, vols. 1–2. New York: Oxford University Press.
Liu T-P (2000) Hyperbolic and Viscous Conservation Laws, CBMS-NSF RCSAM, vol. 72. Philadelphia: SIAM.
Majda A (1984) Compressible Fluid Flow and Systems of Conservation Laws in Several Space Variables. New York: Springer.
604 Computational Methods in General Relativity: The Theory

Computational Methods in General Relativity: The Theory


M W Choptuik, University of British Columbia, Vancouver, Canada
© 2006 Elsevier Ltd. All rights reserved.

Conventions and Units

This article adopts many of the conventions and notations of Misner, Thorne, and Wheeler (1973), hereafter denoted MTW, including metric signature $(- + + +)$; definitions of Christoffel symbols and curvature tensors (up to index permutations permitted by standard symmetries of the tensors in a coordinate basis); the use of Greek indices ($\mu, \nu, \ldots$), ranging over the spacetime coordinate values $(0, 1, 2, 3) \to (t, x^1, x^2, x^3)$, to denote the components of spacetime tensors such as $g_{\mu\nu}$; the similar use of Latin indices $i, j, k, \ldots$, ranging over the spatial coordinate values $(1, 2, 3) \to (x^1, x^2, x^3)$, for spatial tensors such as $\gamma_{ij}$; the use of the Einstein summation convention for both types of indices; the use of standard Kronecker delta symbols (tensors), $\delta^\mu{}_\nu$ and $\delta^i{}_j$; the choice of geometric units, $G = c = 1$; and, finally, the normalization of the matter fields implicit in the choice of the constant $8\pi$ in [1].

The majority of the equations that appear in this article are tensor equations, or specific components of tensor equations, written in traditional index (not abstract index) form. Thus, these equations are generally valid in any coordinate system, $(t, x^i)$, but, of course, do require the introduction of a coordinate basis and its dual. This approach is also largely a matter of convention, since all of what follows can be derived in a variety of fashions, some of them purely geometrical, and there are also approaches to numerical relativity based, for example, on frames rather than coordinate bases.

This article departs from MTW in its use of $\alpha$, $\beta^i$, and $\gamma_{ij}$ to denote the lapse, shift, and spatial metric, respectively, rather than MTW's $N$, $N^i$, and ${}^{(3)}g_{ij}$. Finally, the operations of partial differentiation with respect to coordinates $x^\mu$, $t$, and $x^i$ are denoted $\partial_\mu$, $\partial_t$, and $\partial_i$, respectively.

Introduction

The numerical analysis of general relativity, or numerical relativity, is concerned with the use of computational methods to derive approximate solutions to the Einstein field equations
$$G_{\mu\nu} = 8\pi T_{\mu\nu} \qquad [1]$$
Here, $G_{\mu\nu}$ is the Einstein tensor (that contracted piece of the Riemann curvature tensor that has vanishing divergence) and $T_{\mu\nu}$ is the stress tensor of the matter content of the spacetime. $T_{\mu\nu}$ likewise has vanishing divergence, an expression of the principle of local conservation of stress–energy that general relativity embodies.

The elegant tensor formulation [1] belies the fact that, ultimately, the field equations are generically a complicated and nonlinear set of partial differential equations (PDEs) for the components of the spacetime metric tensor, $g_{\mu\nu}(x^\alpha)$, in some coordinate system $x^\alpha$. Moreover, implicit in a numerical solution of [1] is the numerical solution of the equations of motion for any matter fields that couple to the gravitational field, that is, that contribute to $T_{\mu\nu}$. The reader is reminded that it is a hallmark of general relativity that, in principle, all matter fields, including massless ones such as the electromagnetic field, contribute to $T_{\mu\nu}$.

Now, in the 3+1 approach to general relativity that is described below, the task of solving the field equations [1] is formulated as an initial-value or Cauchy problem. Specifically, the spacetime metric, $g_{\mu\nu}(x^\alpha) = g_{\mu\nu}(t, x^k)$, which encodes all geometric information concerning the spacetime, $M$, is viewed as the time history, or dynamical evolution, of the spatial metric, $\gamma_{ij}(0, x^k)$, of an initial spacelike hypersurface, $\Sigma(0)$. In any practical calculation, the degree to which the matter fields back-react on the gravitational field, that is, contribute to $T_{\mu\nu}$ substantially enough to cause perturbations in $g_{\mu\nu}$ at or above the desired accuracy threshold, will thus depend on the specifics of the initial configuration.

In astrophysics, there are relatively few well-identified environments in which it is generally thought to be crucial to the faithful emulation of the physics that the matter fields be fully coupled to the gravitational field. However, both observationally and theoretically, the existence of gravitationally compact objects is quite clear. Gravitationally compact means that a star with mass, $M$, has a radius, $R$, comparable to its Schwarzschild radius, $R_M$, which is defined by
$$R_M = \frac{2G}{c^2} M \approx 10^{-27}\, M\ \mathrm{m\,kg^{-1}} \qquad [2]$$
Here, and only here, $G$ and $c$ (Newton's gravitational constant and the speed of light, respectively) have been explicitly reintroduced. The fact that $R_M/R$ is about $10^{-6}$ and $10^{-9}$ at the surfaces of the sun and earth, respectively, is a reminder of just how

weak gravity is in the locality of Earth. However, as befits anything of Einsteinian nature, the weakness of gravity is relative, so that at the surface of a neutron star, one would find

$$\frac{R_M}{R} \sim 0.4 \qquad [3]$$

while for black holes, one has

$$\frac{R_M}{R} \sim 1 \qquad [4]$$

In such circumstances, gravity is anything but weak! Furthermore, in situations where the matter-energy distribution has a highly time-dependent quadrupole moment, such as occurs naturally with a compact-binary system (i.e., a gravitationally bound two-body system, in which each of the bodies is either a black hole or a neutron star), the dynamics of the gravitational field, including, crucially, the dynamics of the radiative components of the gravitational field, can be expected to dominate the dynamics of the overall system, matter included. For scenarios such as these, it should come as no surprise that the solution of the combined gravitohydrodynamical system begs for numerical analysis.

In addition, both from the physical and mathematical perspectives, it is also natural to study the strong-field, dynamic regimes ($R \to R_M$ and/or $v \to c$, where $v$ is the typical speed characterizing internal bulk motion of the matter) of general relativity within the context of a variety of matter models. Typical processes addressed by these theoretical studies include the process of black hole formation, end-of-life events for various types of model stars, and, again, the interaction, including collisions, of gravitationally compact objects. Note that it is another hallmark of general relativity that highly dynamical spacetimes need not contain any matter; indeed, the interaction of two black holes (the natural analog of the Kepler problem in relativity) is a vacuum problem; that is, it is described by a solution of [1] with $T_{\mu\nu} = 0$.

Motivated in significant part by the large-scale efforts currently underway to directly detect gravitational radiation (gravitational waves), much of the contemporary work in numerical relativity is focused on precisely the problem of the late phases of compact-binary inspiral and merger. Such binaries are expected to be the most likely candidates for early detection by existing instruments such as TAMA, GEO, VIRGO, LIGO, and, more likely, by planned detectors including LIGO II and LISA (see, e.g., Hough and Rowan (2000)). Detailed and accurate predictions of expected waveforms from these events using the techniques of numerical relativity have the potential to substantially hasten the discovery process, on the basis of the general principle that if one knows what signal to look for, it is much easier to extract that signal from the experimental noise.

The computational task facing numerical relativists who study problems such as binary inspiral is formidable. In particular, such problems are intrinsically 3D, to use the CFD (computational fluid dynamics) nomenclature in which time dependence is always assumed. That is, the PDEs that must be solved govern functions, $F(t, x^k)$, that depend on all three spatial coordinates, $x^k$, as well as on time, $t$. Unfortunately, even a cursory description of 3D work in numerical relativity as it stands at this time is far beyond the scope of this article.

What follows, then, is an outline of a traditional approach to numerical relativity that underpins many of the calculations from the early years of the field (1970s and 1980s), most of which were carried out with simplifying restrictions to either spherical symmetry or axisymmetry. The mathematical development, which will hereafter be called the 3+1 approach to general relativity, has the advantage of using tensors and an associated tensor calculus that are reasonably intuitive for the physicist. This standard 3+1 approach is also sufficient in many instances (particularly those with symmetry) in the sense that it leads to well-posed sets of PDEs that can be discretized and then solved computationally in a convergent (stable) fashion. In addition, a thorough understanding of the 3+1 approach will be of significant help to the reader wishing to study any of the current literature in numerical relativity, including the 3D work.

However, the reader is strongly cautioned that the blind application of any of the equations that follow, especially in a 3D context, may well lead to ill-posed systems, numerical analysis of which is useless. Anyone specifically interested in using the methods of numerical relativity to generate discrete, approximate solutions to [1], particularly in the generic 3D case, is thus urged to first consult one of the comprehensive reviews of numerical relativity that continue to appear at fairly regular intervals (see, e.g., Lehner (2001), or Baumgarte and Shapiro (2003)). Most such references will also provide a useful overview of many of the most popular numerical techniques that are currently being used to discretize (convert to algebraic form) the Einstein equations, as well as the main algorithms that are used to solve the resulting discrete equations. These subjects are not

described below, not least since discussion of the available discretization techniques only makes sense in the context of PDEs of specific systems with specific boundary conditions, while there is only space here to describe the general mathematical setting for 3+1 numerical relativity.

The 3+1 Spacetime Split

At least at the current time, computations in numerical relativity are restricted to the case of globally hyperbolic spacetimes. A spacetime (four-dimensional pseudo-Riemannian manifold), $M$, endowed with a metric, $g_{\mu\nu}$, is globally hyperbolic if there is at least one edgeless, spacelike hypersurface, $\Sigma(0)$, that serves as a Cauchy surface. That is, provided that the initial data for the gravitational field are set consistently on $\Sigma(0)$, so that the four constraint equations are satisfied (see below), the entire metric $g_{\mu\nu}(t, x^i)$ can be determined from the field equations [1] (with appropriate boundary conditions), and thus, so can the complete geometric structure of the spacetime manifold.

To be sure, global hyperbolicity is restrictive. It excludes, for example, the highly interesting Gödel universe. However, particularly from the point of view of studying asymptotically flat solutions (or solutions asymptotic to any of the currently popular cosmologies), as is usually the case in astrophysics, the requirement of global hyperbolicity is natural.

The 3+1 split is based on the complete foliation of $M$ based on level surfaces of a scalar function, $t$, the time function. That is, the $t = \text{const.}$ slices are three-dimensional spacelike (Riemannian) hypersurfaces, $\Sigma(t)$, and, as $t$ ranges from $-\infty$ to $+\infty$, completely fill the spacetime manifold, $M$. In order for the $\Sigma(t)$ to be everywhere spacelike, $\nabla_\mu t$ must be everywhere timelike:

$$g^{\mu\nu}\,\nabla_\mu t\,\nabla_\nu t < 0 \qquad [5]$$

Here $\nabla_\mu$ is the spacetime covariant derivative operator compatible with the four-metric, $g_{\mu\nu}$, thus satisfying $\nabla_\lambda g_{\mu\nu} = 0$, and $g^{\mu\nu}$ is the inverse metric tensor, which satisfies $g^{\mu\lambda} g_{\lambda\nu} = \delta^\mu{}_\nu$. The reader is reminded that $\delta^\mu{}_\nu$ is a Kronecker delta symbol; that is, $\delta^\mu{}_\nu$ has the value 1 if $\mu = \nu$, and the value 0 otherwise.

Furthermore, the scalar function $t$ is now adopted as the temporal coordinate, so that $x^\mu = (t, x^i)$, where the $x^i$ are the three spatial coordinates. As noted implicitly above, since the problem under consideration is a pure Cauchy evolution, the range of $t$ should nominally be infinite, both to the future as well as to the past; that is, the solution domain is

$$-\infty < t < \infty \qquad [6]$$

$$|X| \equiv \left(\delta_{ij}\, x^i x^j\right)^{1/2} < \infty \qquad [7]$$

However, this assumes that one has global existence for arbitrarily strong initial data, which is decidedly not always the case in general relativity. Indeed, continued or catastrophic gravitational collapse (that is, the process of black hole formation), signaled, in modern language, by the appearance of a trapped surface, inexorably leads to a physical singularity, which (the somewhat vague nature of the singularity theorems of Penrose, Hawking, and others notwithstanding) in actual numerical computations invariably turns out to be catastrophic in terms of Cauchy evolution.

Such behavior in time-dependent nonlinear PDEs is quite familiar in the mathematical community at large, where it is frequently known as finite-time blow-up (or finite-time singularity). However, despite the fact that such behavior is one of the most fascinating aspects of solutions of the Einstein equations, the following discussion will be, implicitly at least, restricted to the case of weak initial data, that is, to initial data for which there is global existence.

With the manifold $M$ sliced into an infinite stack of spacelike hypersurfaces, $\Sigma(t)$, attention shifts to any single surface, as well as to the manner in which such a generic surface is embedded in the spacetime.

First, each spacelike hypersurface, $\Sigma(t)$, is itself a three-dimensional Riemannian differential manifold with a metric $\gamma_{ij}(t, x^k)$. (Note that in this discussion, the symbol $t$ is to be understood to represent any specific value of coordinate time.) From this metric, one can construct an inverse metric, $\gamma^{ij}(t, x^k)$, defined, as usual, so that

$$\gamma^{ik}\gamma_{kj} = \delta^i{}_j \qquad [8]$$

Associated with the spatial metric, $\gamma_{ij}$, is a natural spatial covariant derivative operator, $D_i$, that is compatible with $\gamma_{ij}$:

$$D_k \gamma_{ij} = 0 \qquad [9]$$

With the spatial metric, $\gamma_{ij}$, and its inverse, $\gamma^{ij}$, in hand, the standard formulas of tensor analysis can be applied to compute the usual suite of geometrical tensors. All tensors thus computed, and indeed, all tensors defined intrinsically to the

hypersurfaces $\Sigma(t)$ are called spatial tensors, and have their indices (if any) raised and lowered with $\gamma^{ij}$ and $\gamma_{ij}$, respectively.

Thus, the Christoffel symbols of the second kind, $\Gamma^i{}_{jk}$, are given by

$$\Gamma^i{}_{jk} = \tfrac{1}{2}\,\gamma^{il}\left(\partial_k \gamma_{lj} + \partial_j \gamma_{lk} - \partial_l \gamma_{jk}\right) \qquad [10]$$

Note that these quantities are symmetric in their last two indices

$$\Gamma^i{}_{jk} = \Gamma^i{}_{kj} \qquad [11]$$

and that they can be used, as usual, in explicit calculation of the action of the spatial covariant derivative operator on an arbitrary tensor. In particular, for the special cases of a spatial vector, $V^i$, and a covector (1-form), $W_i$, one has

$$D_i V^j = \partial_i V^j + \Gamma^j{}_{ik} V^k \qquad [12]$$

and

$$D_i W_j = \partial_i W_j - \Gamma^k{}_{ij} W_k \qquad [13]$$

respectively.

Given the Christoffel symbols, the components of the spatial Riemann tensor, denoted here $R_{ijk}{}^l$, are computed using

$$R_{ijk}{}^l = \partial_j \Gamma^l{}_{ik} - \partial_i \Gamma^l{}_{jk} + \Gamma^m{}_{ik}\Gamma^l{}_{mj} - \Gamma^m{}_{jk}\Gamma^l{}_{mi} \qquad [14]$$

Finally, the Ricci tensor, $R^i{}_j$, and Ricci scalar, $R$, are defined in the usual fashion

$$R^i{}_j = \gamma^{ik} R_{kj} = \gamma^{ik} R_{klj}{}^l \qquad [15]$$

$$R = \gamma^{ij} R_{ij} \qquad [16]$$

The reader should again note that all of the tensors just defined live on each and every single spacelike hypersurface, $\Sigma(t)$, and are thus known as hypersurface-intrinsic quantities. In particular, the spatial Riemann tensor, $R_{ijk}{}^l$, which encodes all intrinsic geometric information about $\Sigma(t)$, in no way depends on how the slice is embedded in the spacetime $M$.

The next step in the 3+1 approach involves rewriting the fundamental spacetime line element for the squared proper distance, $ds^2$, between two spacetime events, P and Q, having coordinates $x^\mu$ and $x^\mu + dx^\mu$, respectively,

$$ds^2 = g_{\mu\nu}\,dx^\mu dx^\nu \qquad [17]$$

[Figure 1: two neighboring slices, $\Sigma(t)$ and $\Sigma(t + dt)$, with the displacements $\alpha\,dt$, $\beta^i\,dt$, $dx^i$, and $dx^\mu$ marked.]

Figure 1 Spacetime displacement in the 3+1 approach, following Misner, Thorne, and Wheeler (1973). Solid lines represent surfaces of constant time, $t$; that is, each solid line represents a single spacelike hypersurface, $\Sigma(t)$. Dotted lines denote trajectories of constant spatial coordinate, that is, trajectories with $x^k = \text{const.}$ The lapse function, $\alpha(t, x^k)$, encodes the (local) ratio between elapsed coordinate time, $dt$, and elapsed proper time, $d\tau = \alpha\,dt$, for an observer moving normal to the slices (i.e., for an observer with a 4-velocity, $u^\mu$, identical to the hypersurface normal, $n^\mu$). Similarly, the shift vector, $\beta^i(t, x^k)$, describes the shift, $\beta^i(t, x^k)\,dt$, in trajectories of constant spatial coordinate (the dotted lines in the figure) relative to motion perpendicular to the slices. The 3+1 form of the line element [18] then follows immediately from an application of the spacetime version of the Pythagorean theorem.

As Figure 1 illustrates, a quick route to the 3+1 decomposition of the above expression, and thus of the tensor $g_{\mu\nu}$ itself, is based on an application of the four-dimensional Pythagorean theorem. In setting up the calculation, one naturally identifies four functions, the scalar lapse, $\alpha(t, x^k)$, and the vector shift, $\beta^i(t, x^k)$, that encode the full coordinate (gauge) freedom of the theory. That is, complete specification of the lapse and shift is equivalent to completely fixing the spacetime coordinate system.

In light of the above discussion, and again referring to Figure 1, one readily deduces the 3+1 decomposition of the spacetime line element:

$$ds^2 = -\alpha^2\,dt^2 + \gamma_{ij}\left(dx^i + \beta^i dt\right)\left(dx^j + \beta^j dt\right) \qquad [18]$$

A rearranged form of this last expression is also often seen in the literature:

$$ds^2 = \left(-\alpha^2 + \beta^k \beta_k\right)dt^2 + 2\beta_k\,dx^k dt + \gamma_{ij}\,dx^i dx^j \qquad [19]$$

The following useful identifications of the time-time, time-space, and space-space pieces of the spacetime metric, $g_{\mu\nu}$, follow immediately from [19]:

$$g_{00} = -\alpha^2 + \beta^i \beta_i \qquad [20]$$

$$g_{0i} = g_{i0} = \beta_i \equiv \gamma_{ik}\beta^k \qquad [21]$$

$$g_{ij} = \gamma_{ij} \qquad [22]$$

This last relation is an example of a useful general result; the purely spatial components, $Q_{ijk\cdots}$, of a

completely covariant, but otherwise arbitrary, spacetime tensor, $Q_{\mu\nu\lambda\cdots}$, constitute the components of a completely covariant spatial tensor.

A straightforward calculation, which provides a good exercise in the use of the 3+1 calculus, yields the following equally useful identifications for various pieces of the inverse spacetime metric, $g^{\mu\nu}$:

$$g^{00} = -\alpha^{-2} \qquad [23]$$

$$g^{0i} = g^{i0} = \alpha^{-2}\beta^i \qquad [24]$$

$$g^{ij} = \gamma^{ij} - \alpha^{-2}\beta^i\beta^j \qquad [25]$$

Since the Einstein field equations are equations with, loosely speaking, geometry on one side and matter on the other, tensors built from matter fields must also be decomposed. In particular, it is conventional to define tensors, $\rho$, $j_i$, and $S_{ij}$, that result from various projections of the spacetime stress-energy tensor, $T_{\mu\nu}$, onto the hypersurface:

$$\rho \equiv n^\mu n^\nu T_{\mu\nu} \qquad [26]$$

$$j_i \equiv -n^\mu T_{\mu i} \qquad [27]$$

$$S_{ij} \equiv T_{ij} \qquad [28]$$

For observers with 4-velocities $u^\mu$ equal to $n^\mu$, and only for those observers with $u^\mu = n^\mu$, the above quantities have the interpretation of the locally and instantaneously measured energy density, momentum density, and spatial stresses, respectively. As with the geometric quantities, all of the matter variables, $\rho$, $j_i$, and $S_{ij}$, defined in [26]–[28] are spatial tensors and thus have their indices (if any) raised and lowered with the 3-metric. Note that the identification $S_{ij} = T_{ij}$ is another illustration of the general result mentioned in the context of the previous identification of $\gamma_{ij}$ and $g_{ij}$.

Finally, observing that time parameters are naturally defined in terms of level surfaces (equipotential surfaces), it should be no surprise that the covariant components, $n_\mu$, of the hypersurface normal field,

$$n_\mu = (-\alpha, 0, 0, 0) \qquad [29]$$

are simpler than the components, $n^\mu$, of the normal itself,

$$n^\mu = \left(\alpha^{-1}, -\alpha^{-1}\beta^i\right) \qquad [30]$$

and, in fact, eqn [29] can also be deduced from a quick study of Figure 1.

In the 3+1 approach, in addition to the 3-metric, $\gamma_{ij}(t, x^k)$, and coordinate functions, $\alpha(t, x^i)$ and $\beta^i(t, x^i)$, it is convenient to introduce an additional rank-2 symmetric spatial tensor, $K_{ij}(t, x^k)$, known as the extrinsic curvature (or second fundamental form). This additional tensor is analogous to a time derivative of $\gamma_{ij}(t, x^k)$, or, from a Hamiltonian perspective, to a variable that is dynamically conjugate to $\gamma_{ij}(t, x^k)$.

As the name suggests, the extrinsic curvature describes the manner in which the slice $\Sigma(t)$ is embedded in the manifold (to be contrasted with $R_{ijk}{}^l$ defined by [14], which is, as mentioned previously, completely insensitive to the manner in which the hypersurface is embedded in $M$). Geometrically, $K_{ij}$ is computed by calculating the spacetime gradient of the normal covector field, $n_\mu$, and projecting the result onto the hypersurface,

$$K_{ij} = -\tfrac{1}{2}\left(\nabla_i n_j + \nabla_j n_i\right) \qquad [31]$$

where it must be stressed that $\nabla_\mu$ is the spacetime covariant derivative operator compatible with the 4-metric, $g_{\mu\nu}$; that is, $\nabla_\lambda g_{\mu\nu} = 0$. A straightforward tensor calculus calculation then yields the following, which can be viewed as a definition of the $K_{ij}$:

$$K_{ij} = \frac{1}{2\alpha}\left(-\partial_t \gamma_{ij} + D_i \beta_j + D_j \beta_i\right) \qquad [32]$$

Here, $D_i$ is the spatial covariant derivative operator, compatible with $\gamma_{ij}$ ($D_k \gamma_{ij} = 0$), that was defined previously. Observe that this equation can be easily solved for $\partial_t \gamma_{ij}$ (this will be done below), and thus, in the 3+1 approach it is [32] that is the origin of the evolution equations for the 3-metric components, $\gamma_{ij}$.

Einstein's Equations in 3+1 Form

The Constraint Equations

As is well known, as a result of the coordinate (gauge) invariance of the theory, general relativity is overdetermined in a sense completely analogous to the situation in electrodynamics with the Maxwell equations. One of the ways that this situation is manifested is via the existence of the constraint equations of general relativity. Briefly, starting from the naive view that the ten metric functions, $g_{\mu\nu}(t, x^i)$, that completely determine the spacetime geometry are all dynamical (that is, that they satisfy second-order-in-time equations of motion), one finds that the Einstein equations do not provide dynamical equations of motion for the lapse, $\alpha$, or the shift, $\beta^i$. Rather, four of the field equations [1] are equations of constraint for the true dynamical variables of the theory, $\{\gamma_{ij}, \partial_t \gamma_{ij}\}$, or, equivalently, $\{\gamma_{ij}, K^i{}_j\}$. Note that in the following, the mixed form, $K^i{}_j$, is at times used, again by convention, as the principal representation of the extrinsic curvature tensor (instead of $K_{ij}$ as previously, or $K^{ij}$).
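The "good exercise" of checking the inverse-metric identifications [23]–[25] against [20]–[22] can also be done numerically at a single point. The following Python sketch assembles both 4 × 4 matrices from a lapse, a shift, and a spatial metric, and confirms that their product is the Kronecker delta and that the normal components [29] and [30] satisfy $n^\mu n_\mu = -1$. The sample values of `alpha`, `beta_up`, and `gamma`, as well as all function names, are illustrative assumptions and are not taken from the article.

```python
# Hypothetical sample values at one spacetime point (illustration only).
alpha = 1.3                      # lapse
beta_up = [0.2, -0.1, 0.4]       # shift components beta^i
gamma = [[1.5, 0.1, 0.0],
         [0.1, 2.0, 0.3],
         [0.0, 0.3, 1.2]]        # spatial metric gamma_ij (symmetric, positive definite)


def inverse3(m):
    """Inverse of a 3x3 matrix via the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]


def metric_3plus1(alpha, beta_up, gamma):
    """Covariant 4-metric g_{mu nu} assembled from eqs [20]-[22]."""
    beta_dn = [sum(gamma[i][k] * beta_up[k] for k in range(3)) for i in range(3)]
    g = [[0.0] * 4 for _ in range(4)]
    g[0][0] = -alpha**2 + sum(beta_up[i] * beta_dn[i] for i in range(3))
    for i in range(3):
        g[0][i + 1] = g[i + 1][0] = beta_dn[i]
        for j in range(3):
            g[i + 1][j + 1] = gamma[i][j]
    return g


def inverse_metric_3plus1(alpha, beta_up, gamma):
    """Contravariant 4-metric g^{mu nu} assembled from eqs [23]-[25]."""
    gamma_up = inverse3(gamma)
    ginv = [[0.0] * 4 for _ in range(4)]
    ginv[0][0] = -1.0 / alpha**2
    for i in range(3):
        ginv[0][i + 1] = ginv[i + 1][0] = beta_up[i] / alpha**2
        for j in range(3):
            ginv[i + 1][j + 1] = gamma_up[i][j] - beta_up[i] * beta_up[j] / alpha**2
    return ginv


g = metric_3plus1(alpha, beta_up, gamma)
ginv = inverse_metric_3plus1(alpha, beta_up, gamma)

# g^{mu lambda} g_{lambda nu} should reproduce the Kronecker delta.
product = [[sum(ginv[m][l] * g[l][n] for l in range(4)) for n in range(4)]
           for m in range(4)]
max_error = max(abs(product[m][n] - (1.0 if m == n else 0.0))
                for m in range(4) for n in range(4))

# Normal field: n_mu from [29], n^mu from [30]; the contraction should be -1.
n_dn = [-alpha, 0.0, 0.0, 0.0]
n_up = [1.0 / alpha] + [-b / alpha for b in beta_up]
norm = sum(n_up[m] * n_dn[m] for m in range(4))

# Raising the index of n_mu with g^{mu nu} should reproduce [30].
n_up_raised = [sum(ginv[m][l] * n_dn[l] for l in range(4)) for m in range(4)]
raise_error = max(abs(n_up_raised[m] - n_up[m]) for m in range(4))

print(max_error, norm, raise_error)
```

A check of this kind only tests the algebra at one point; it says nothing about the well-posedness issues the article warns about.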

Thus, four of the components of [1] can be written in the form

$$C^\mu\left(\gamma_{ij}, K^i{}_j, \partial_k \gamma_{ij}, \partial_l \partial_k \gamma_{ij}, \partial_k K^i{}_j\right) = T^\mu \qquad [33]$$

where $T^\mu$ depends only on the matter content in the spacetime. Note that in addition to having no dependence on $\partial_t^2 \gamma_{ij}$, the constraints are also independent of $\alpha$ and $\beta^i$.

If the Einstein equations [1] are to hold throughout the spacetime, then the constraints [33] must hold on each and every spacelike hypersurface, $\Sigma(t)$, including, crucially, the initial hypersurface, $\Sigma(0)$. From the point of view of Cauchy evolution, this means that the 12 functions, $\{\gamma_{ij}(0, x^k), K^i{}_j(0, x^k)\}$, constituting the gravitational part of the initial data, are not completely freely specifiable, but must satisfy the four constraints

$$C^\mu\left(\gamma_{ij}(0, x^k), K^i{}_j(0, x^k), \ldots\right) = T^\mu(0, x^k) \qquad [34]$$

However, provided initial data that do satisfy the equations are chosen, then (as consistency of the theory demands) the dynamical equations of motion for the $\{\gamma_{ij}, K^i{}_j\}$ (eqns [37] and [38] below) guarantee that the constraints will be satisfied on all future (or past) hypersurfaces, $\Sigma(t)$. In this internal self-consistency, the geometrical Bianchi identities, $\nabla^\mu G_{\mu\nu} = 0$, and the local conservation of stress-energy, $\nabla^\mu T_{\mu\nu} = 0$, play crucial roles.

In the 3+1 approach, as one would expect, the constraint equations further naturally subdivide into a scalar equation

$$R - K_{ij}K^{ij} + K^2 = 16\pi\rho \qquad [35]$$

and a (spatial) vector equation

$$D_j K^{ij} - D^i K = 8\pi j^i \qquad [36]$$

where the energy and momentum densities, $\rho$ and $j^i = \gamma^{ik} j_k$, are given by [26]–[28]. Equations [35] and [36] are often known as the Hamiltonian and momentum constraints, respectively, not least since the behavior of their solutions as $X \equiv \sqrt{\delta_{ij} x^i x^j} \to \infty$ encodes the conserved mass and linear momentum (four numbers) that can be defined in asymptotically flat spacetimes.

In a general 3+1 coordinate system, and with an appropriate choice of variables, the constraints can be written as a set of quasilinear elliptic equations for four of the $\{\gamma_{ij}, K^i{}_j\}$ (or, more properly, for certain algebraic combinations of the $\{\gamma_{ij}, K^i{}_j\}$). Thus, especially for 2D and 3D calculations, the setting of initial data for the Cauchy problem in general relativity is itself a highly nontrivial mathematical and computational exercise. Readers wishing more details on this subject are directed to the comprehensive review by Cook (2000).

The Evolution Equations

As discussed above, in the 3+1 form of the Einstein equations [1], the spatial metric, $\gamma_{ij}$, and the extrinsic curvature, $K^i{}_j$, are viewed as the dynamical variables for the gravitational field. The remainder of the 3+1 equations are thus two sets of six first-order-in-time evolution equations; one set for $\gamma_{ij}$,

$$\partial_t \gamma_{ij} = -2\alpha\,\gamma_{ik} K^k{}_j + \beta^k \partial_k \gamma_{ij} + \gamma_{ik}\,\partial_j \beta^k + \gamma_{kj}\,\partial_i \beta^k \qquad [37]$$

and the other set for $K^i{}_j$,

$$\partial_t K^i{}_j = \beta^k \partial_k K^i{}_j - \partial_k \beta^i\, K^k{}_j + \partial_j \beta^k\, K^i{}_k - D^i D_j \alpha + \alpha\left[R^i{}_j + K K^i{}_j + 8\pi\left(\tfrac{1}{2}\,\delta^i{}_j\,(S - \rho) - S^i{}_j\right)\right] \qquad [38]$$

As also noted previously, the evolution equations [37] for the spatial metric components, $\gamma_{ij}$, follow from the definition of the extrinsic curvature [31]. The derivation of the equations for the extrinsic curvature, on the other hand, requires lengthy, but well-documented, manipulations of the spatial components of the field equations [1].

The (Naive) Cauchy Problem

A naive statement of the Cauchy problem for 3+1 numerical relativity is thus as follows: fix a specified number, $N$, of matter fields $\phi^A(t, x^k)$, $A = 1, 2, \ldots, N$, all minimally coupled to the gravitational field, with a total stress tensor, $T_{\mu\nu}$, given by

$$T_{\mu\nu} = \sum_{A=1}^{N} T^A_{\mu\nu} \qquad [39]$$

where $T^A_{\mu\nu}$ is the stress tensor corresponding to the matter field $\phi^A$. Choose a topology for $\Sigma(0)$ (e.g., $R^3$ with asymptotically flat boundary conditions; $T^3$, with no boundaries, etc.). This also fixes the topology of $M$ to be $R \times$ the topology of $\Sigma(0)$. Next, freely specify eight of the 12 $\{\gamma_{ij}(0, x^k), K^i{}_j(0, x^k)\}$, as well as initial values, $\phi^A(0, x^k)$, for the matter fields. Then determine the remaining four dynamical gravitational fields from the constraints [35] and [36]. This completes the initial data specification.

One must now choose a prescription for the kinematical (coordinate) functions, $\alpha$ and $\beta^i$, so that either explicitly or implicitly, they are completely fixed; for the case of implicit specification, this may well mean that the coordinate functions themselves will satisfy PDEs, which, furthermore, can be of essentially any type in practice (i.e., elliptic, hyperbolic, parabolic, . . .). Finally, with consistent initial data, $\{\gamma_{ij}(0, x^k), K^i{}_j(0, x^k); \phi^A(0, x^k)\}$, in hand, and with a prescription for the coordinate functions, the evolution

equations [37] and [38] can be used to advance the dynamical variables forward or backward in time.

The above description is naive since, apart from a consistent mathematical specification, the most crucial issue in the solution of a time-dependent PDE as a Cauchy problem is that the problem be well posed. Roughly speaking, this means that solutions do not grow without bound (blow-up) without physical cause, and that small, smooth changes to initial data yield correspondingly small, smooth changes to the evolved data. In short, the Cauchy problem must be stable, and whether or not a particular subset of the equations displayed in this section yields a well-posed problem is a complicated and delicate issue, especially in the generic 3D case. The reader is thus again cautioned against blind application of any of the equations displayed in this article.

Boundary Conditions

In principle, because all spacelike hypersurfaces, $\Sigma(t)$, in a pure Cauchy evolution are edgeless (and provided that the initial data $\{\gamma_{ij}(0, x^k), K^i{}_j(0, x^k); \phi^A(0, x^k)\}$ are consistent with asymptotic flatness, or whatever other condition is appropriate given the topology of the $\Sigma(t)$), there are essentially no boundary conditions to be imposed on the dynamical variables, $\{\gamma_{ij}(t, x^k), K^i{}_j(t, x^k)\}$, during Cauchy evolution. Note that asymptotic flatness generally requires that

$$\lim_{X \to \infty} \gamma_{ij} = f_{ij} + O\!\left(\frac{1}{X}\right) \qquad [40]$$

and

$$\lim_{X \to \infty} K^i{}_j = O\!\left(\frac{1}{X^2}\right) \qquad [41]$$

where $X$ is defined by

$$X \equiv \sqrt{\delta_{ij}\, x^i x^j} \qquad [42]$$

as previously, and $f_{ij}$ is the flat 3-metric. Similarly, should the lapse, $\alpha$, and shift, $\beta^i$, be constrained by elliptic PDEs, as is frequently the case in practice, then the only natural place to set boundary conditions is at spatial infinity, and then, provided that the frame at spatial infinity is inertial, with coordinate time $t$ measuring proper time, one should have

$$\lim_{X \to \infty} \alpha = 1 + O\!\left(\frac{1}{X}\right) \qquad [43]$$

and

$$\lim_{X \to \infty} \beta^i = O\!\left(\frac{1}{X}\right) \qquad [44]$$

It is critical to note at this point, however, that in the vast bulk of past and current work in numerical relativity, including most of the ongoing work in 3D, the Einstein equations [1] have been solved, not as a pure Cauchy problem, but as a mixed initial-value/boundary-value problem (IBVP). That is, in the discretization process in which the continuum equations [1] are replaced with algebraic equations, the continuum domain [6]–[7] is typically replaced with a truncated spatial domain

$$|x^i| \le X^i_{\max} \qquad [45]$$

where the $X^i_{\max}$ are a priori specified constants (parameters of the computational solution) that define the extremities of the computational box. As one might expect, the theory underlying stability and well-posedness of IBVPs, especially for differential systems as complicated as [1], is even more involved than for the pure initial-value case, and is another very active area of research in both mathematical and numerical relativity (see, e.g., Friedrich and Nagy (1999)).

See also: Critical Phenomena in Gravitational Collapse; Einstein Equations: Initial Value Formulation; Fluid Mechanics: Numerical Methods; General Relativity: Overview; Geometric Analysis and General Relativity; Gravitational Waves; Hamiltonian Reduction of Einstein's Equations; Magnetohydrodynamics; Spacetime Topology, Causal Structure and Singularities; Symmetric Hyperbolic Systems and Shock Waves.

Further Reading

Baumgarte T and Shapiro SL (2003) Numerical relativity and compact binaries. Physics Reports 376: 41–131.
Cook G (2000) Initial data for numerical relativity. Living Reviews in Relativity 3: 5 (lrr-2000-5).
Font JA (2003) Numerical hydrodynamics in general relativity. Living Reviews in Relativity 6: 4 (lrr-2003-4).
Frauendiener J (2004) Conformal infinity. Living Reviews in Relativity 7: 1 (lrr-2004-1).
Friedrich H and Nagy G (1999) The initial boundary value problem for Einstein's vacuum field equation. Communications in Mathematical Physics 201: 619–655.
Hough J and Rowan S (2000) Gravitational wave detection by interferometry (ground and space). Living Reviews in Relativity 3: 3 (lrr-2000-3).
Lehner L (2001) Numerical relativity: a review. Classical and Quantum Gravity 18: R25–R86.
Misner CW, Thorne KS, and Wheeler JA (1973) Gravitation. San Francisco: W.H. Freeman.
Reula OA (1998) Hyperbolic methods for Einstein's equations. Living Reviews in Relativity 1: 3 (lrr-1998-3).
Winicour J (2001) Characteristic evolution and matching. Living Reviews in Relativity 4: 3 (lrr-2001-3).
Constrained Systems 611

Confinement see Quantum Chromodynamics

Conformal Geometry see Two-dimensional Conformal Field Theory and Vertex Operator Algebras

Conservation Laws see Symmetries and Conservation Laws

Constrained Systems
M Henneaux, Université Libre de Bruxelles, Brussels, Belgium
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Consider a dynamical system with coordinates $q^i$ ($i = 1, \ldots, n$) and Lagrangian $L(q^i, \dot q^i)$ (field theory is formally covered by regarding the spatial coordinates as a continuous index). When going to the Hamiltonian formulation, it is usually assumed that the Legendre transformation between the velocities $\dot q^i$ and the momenta

$$p_i = \frac{\partial L}{\partial \dot q^i} \qquad [1]$$

can be inverted to yield the velocities as functions of the $q$'s and the $p$'s. This regular situation occurs for most systems appearing in standard classical mechanics and enables one to proceed to the Hamiltonian formulation of the theory without difficulty.

In field theory, however, the regular case is the exception rather than the rule. This is due to gauge invariance and first-order Lagrangians.

• Gauge invariance. A system possesses gauge symmetries if it is invariant under transformations that involve arbitrary functions of time (gauge transformations). In that case, the solution of the equations of motion with given initial data is not unique, since it is always possible to perform a gauge transformation in the course of the evolution without changing the initial data. It is then clear that the Legendre transformation cannot be invertible, for if it were, one could rewrite the equations of motion in the standard canonical form $\dot q^i = \partial H/\partial p_i$, $\dot p_i = -\partial H/\partial q^i$. These canonical equations are in normal form and have a unique solution for given initial data, which would contradict the presence of a gauge symmetry.

A simple example that illustrates this phenomenon is given by the following model for three variables $q^1$, $q^2$, and $\lambda$, the Lagrangian of which reads

$$L = \tfrac{1}{2}\left[\left(\dot q^1 - \lambda\right)^2 + \left(\dot q^2 - \lambda\right)^2\right] \qquad [2]$$

This model is inspired by electromagnetism: the variables $q^1$ and $q^2$ play a role somewhat similar to that of the spatial components of the vector potential, while $\lambda$ corresponds to the temporal component. The Lagrangian is invariant under the gauge transformations

$$q^1 \to q^1 + \varepsilon, \quad q^2 \to q^2 + \varepsilon, \quad \lambda \to \lambda + \dot\varepsilon \qquad [3]$$

where $\varepsilon$ is an arbitrary function of time. The conjugate momenta are

$$p_1 = \dot q^1 - \lambda, \quad p_2 = \dot q^2 - \lambda, \quad \pi = 0$$

One cannot invert the Legendre transformation since one cannot express the velocity $\dot\lambda$ in terms of the momenta.

• First-order Lagrangians. Fermionic fields obey first-order equations. Their Lagrangian is linear in the derivatives, so that the conjugate momenta $p_i$ depend on the coordinates $q^i$ only. It is then clearly impossible to express the velocities in terms of the momenta through the Legendre transformation. More generally, any first-order Lagrangian, with or without gauge symmetry, leads to a noninvertible Legendre transformation.
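The obstruction in the gauge model [2] can be made concrete with a small numerical check. The Legendre map from velocities to momenta can be solved locally for the velocities only where the Hessian $\partial^2 L/\partial \dot q^i \partial \dot q^j$ is nonsingular; for [2] this matrix has a vanishing row and column associated with the third variable. The Python sketch below estimates the Hessian by central finite differences. The function names, the sample velocities, and the step size are illustrative assumptions, not part of the source.

```python
def lagrangian(v, lam):
    """Model [2]: L = 1/2 [(q1dot - lam)^2 + (q2dot - lam)^2].

    v = (q1dot, q2dot, lamdot); note that L never involves lamdot.
    """
    q1dot, q2dot, _lamdot = v
    return 0.5 * ((q1dot - lam) ** 2 + (q2dot - lam) ** 2)


def velocity_hessian(L, v, lam, h=1e-3):
    """Central finite-difference estimate of W_ij = d^2 L / dv_i dv_j."""
    n = len(v)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                w = list(v)
                w[i] += si * h
                w[j] += sj * h
                return L(w, lam)
            W[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return W


def det3(m):
    """Determinant of a 3x3 matrix."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


# Hypothetical sample point in velocity space (illustration only).
v_sample = [0.7, -0.3, 1.1]
lam_sample = 0.5

W = velocity_hessian(lagrangian, v_sample, lam_sample)
det = det3(W)

for row in W:
    print([round(x, 6) for x in row])
print("det W =", round(det, 6))
```

Up to rounding, the Hessian is diagonal with entries 1, 1, 0, so its determinant vanishes: the velocity of the third variable cannot be recovered from the momenta, in line with the vanishing of its conjugate momentum.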

A simple system that exhibits this feature is described by the Lagrangian

$$L = z^2 \dot z^1 - \tfrac{1}{2}\left(z^2\right)^2 \qquad [4]$$

for two bosonic degrees of freedom $(z^1, z^2)$. This is in fact the canonical form of the Lagrangian for a free particle in one dimension ($z^2$ is the momentum conjugate to the position $z^1$): the system is already in Hamiltonian form. There is no gauge invariance, but because the Lagrangian is first order, the Legendre transformation with [4] as starting point,

$$p_1 = z^2, \quad p_2 = 0 \qquad [5]$$

is noninvertible for the velocities (which do not even appear in the formulas for the momenta).

Dirac showed how to develop the Hamiltonian formalism in the case when the Legendre transformation is not invertible. One can still reformulate the equations in phase space and write them in terms of brackets with the Hamiltonian, but a new major feature emerges, namely the canonical variables are no longer free. Rather, the permissible phase-space points are constrained to be on the so-called constrained surface. For this reason, systems for which the Legendre transformation is not invertible are also called constrained Hamiltonian systems. We shall adopt this terminology here.

The purpose of this article is to explain the main ideas underlying the Dirac method. To simplify the discussions and to focus on the features peculiar to the Dirac construction, we shall assume as a rule that all necessary smoothness conditions are fulfilled by the functions, surfaces, etc., appearing in the formalism. How to develop the analysis when some of the smoothness conditions are not fulfilled is of definite interest but goes beyond the scope of this review. We shall also assume, for definiteness, that all the variables are bosonic in order to avoid straightforward but somewhat cumbersome sign factors in the formulas.

General Theory

Dirac Algorithm

Primary constraints. When the Legendre transformation [1] cannot be inverted, the momenta $p_i$ do not span an $n$-dimensional space but are constrained by relations

$$\phi_m(q, p) = 0, \quad m = 1, \ldots, M \qquad [6]$$

which follow from their definition. These equations reduce to identities when the momenta are replaced by their expression [1] in terms of the coordinates and the velocities. They are called primary constraints. We shall assume that the matrix

$$\frac{\partial(\phi_m)}{\partial(p_i, q^i)}$$

is everywhere of constant (maximum) rank $M$ on the phase-space surface defined by eqns [6], which is assumed to be smooth. This surface is of dimension $2n - M$.

Canonical Hamiltonian. The next step in the Dirac procedure is to define the canonical Hamiltonian $H$ through

$$H = \dot q^i p_i - L \qquad [7]$$

As shown by Dirac, $H$ can be re-expressed as a function $H(q, p)$ of the momenta and the coordinates, even when the Legendre transformation is not invertible: the canonical Hamiltonian $H$ depends on the velocities only through the $p_i$'s. Furthermore, the original equations of motion in Lagrangian form are equivalent to the Hamiltonian equations

$$\dot q^i = \frac{\partial H}{\partial p_i} + u^m \frac{\partial \phi_m}{\partial p_i} \qquad [8]$$

$$\dot p_i = -\frac{\partial H}{\partial q^i} - u^m \frac{\partial \phi_m}{\partial q^i} \qquad [9]$$

$$\phi_m(q, p) = 0 \qquad [10]$$

where the $u^m$'s are parameters, some of which will be determined through the consistency algorithm to be discussed shortly. (In [7]–[9] and everywhere below, there is a summation over the repeated indices.)

Secondary constraints. The equations of motion [8] and [9] can be rewritten as

$$\dot F = [F, H] + u^m [F, \phi_m] \qquad [11]$$

where $F = F(q, p)$ is any function of the canonical variables. Here, the Poisson bracket is defined as usual by

$$[G, F] = \frac{\partial G}{\partial q^i}\frac{\partial F}{\partial p_i} - \frac{\partial G}{\partial p_i}\frac{\partial F}{\partial q^i} \qquad [12]$$

If one takes for $F$ one of the primary constraints $\phi_m$, one should get zero, $\dot\phi_m = 0$. This yields the consistency conditions

$$[\phi_m, H] + u^{m'} [\phi_m, \phi_{m'}] = 0 \qquad [13]$$

These conditions can imply further restrictions on the canonical variables and/or impose conditions on the

variables $u^m$. Any new relation $X(q, p) = 0$ on the canonical variables leads, in turn, to a further consistency condition $\dot X = [X, H] + u^{m'} [X, \phi_{m'}] = 0$, which can bring in either further restriction on the constraint surface or fix more variables $u^m$. Constraints that follow from the consistency algorithm are called secondary constraints. Finally, one is left with a certain number of secondary constraints, which are denoted by $\phi_k = 0$, $k = M+1, \ldots, M+K$. We assume again that all the constraints (primary and secondary) define a smooth surface, called the constraint surface, and fulfill the condition that $\partial(\phi_k)/\partial(q^i, p_i)$ is of maximum rank $J \equiv M + K$ on the constraint surface. (We also assume for simplicity that there is no branching in the consistency algorithm.)

Restrictions on the $u$'s. Having a complete set of constraints

$$\phi_j = 0, \qquad j = 1, \ldots, M + K \equiv J \qquad [14]$$

we can now investigate more precisely the restrictions on the variables $u^m$. These read

$$[\phi_j, H] + u^m [\phi_j, \phi_m] \approx 0, \qquad j = 1, \ldots, J \qquad [15]$$

where the notation $\approx$ means equal modulo the constraints. In [15], $m$ is summed from 1 to $M$. Equations [15] are a set of $J$ linear, inhomogeneous equations for the $u$'s, with coefficients that are functions of the canonical variables $q^i$, $p_i$. The general solution of this system is of the form

$$u^m = U^m + u^a V_a^m \qquad [16]$$

where $U^m$ is a particular solution and where the $V_a^m$ ($a = 1, \ldots, A$) provide a complete set of independent solutions of the homogeneous system

$$V_a^m [\phi_j, \phi_m] \approx 0 \qquad [17]$$

The coefficients $u^a$ ($a = 1, \ldots, A$) are completely arbitrary.

We thus see the emergence of another new feature in the theory, in addition to the appearance of constraints. It is that the general solution of the equations of motion may contain arbitrary functions of time (when $A \neq 0$), in agreement with the possible presence of a gauge symmetry.

First- and Second-Class Constraints

First- and second-class functions. A function $F(q, p)$ is called a first-class function if it generates a canonical transformation that maps the constraint surface on itself. Thus, $F(q, p)$ is first class if its Poisson brackets with all the constraints vanish weakly (i.e., are zero on the constraint surface),

$$[F, \phi_j] \approx 0, \qquad j = 1, \ldots, J \qquad [18]$$

A function is second class otherwise, that is, if there is at least one constraint $\phi_j$ such that $[F, \phi_j] \not\approx 0$ (not even weakly). Second-class functions generate canonical transformations that do not leave the constraint surface invariant. Since canonical transformations that map the constraint surface on itself form a group, the Poisson bracket of two first-class functions is itself a first-class function.

Because the system is constrained to lie on the constraint surface, the only allowed canonical transformations are those that are generated by first-class functions. The importance of the distinction between first-class and second-class functions stems from this elementary fact. Note, in particular, that the time evolution is generated as it should by a first-class generator since the equations of motion [11] can be rewritten as

$$\dot F \approx [F, H'] + u^a [F, V_a^m \phi_m] \qquad [19]$$

with

$$H' = H + U^m \phi_m \qquad [20]$$

One has both $[H', \phi_m] \approx 0$ and $[V_a^m \phi_m, \phi_j] \approx 0$.

Splitting of the constraints. One can separate the constraints between first-class and second-class constraints. This can be achieved by considering the matrix $C_{jj'}$ of the Poisson bracket of the constraints,

$$C_{jj'} = [\phi_j, \phi_{j'}], \qquad j, j' = 1, \ldots, J \qquad [21]$$

One has the following theorem due to Dirac.

Theorem 1. If $\det C_{jj'} \approx 0$, there exists at least one first-class constraint among the $\phi_j$'s.

Proof. Straightforward: if $\det C_{jj'} \approx 0$, one can find a nontrivial solution $\lambda^j$ of $\lambda^j C_{jj'} \approx 0$. The corresponding constraint $\lambda^j \phi_j$ is easily verified to be first class.

By redefining the constraints as $\phi_j \to \phi'_j = a_j{}^{j'} \phi_{j'}$ with $a_j{}^{j'}(q, p)$ invertible, one can bring the Poisson brackets of the constraints to the form

$$[\gamma_a, \gamma_b] \approx 0, \qquad [\gamma_a, \chi_\alpha] \approx 0, \qquad [\chi_\alpha, \chi_\beta] = C_{\alpha\beta} \qquad [22]$$

with $(\phi'_j) \equiv (\gamma_a, \chi_\alpha)$ and where the matrix $C_{\alpha\beta}$ is invertible. (We assume, for simplicity, throughout that the rank of the matrix $C_{jj'}$ is constant on the constraint surface ("regular case").) In this representation, the constraints are completely split into first-class constraints ($\gamma_a$) and second-class

constraints ($\chi_\alpha$): there is no first-class constraint left among the $\chi_\alpha$'s, and the set $\{\gamma_a\}$ exhausts all the first-class constraints. Note that now the index $a = 1, \ldots, A, A+1, \ldots, \bar A$ runs over all (primary and secondary) first-class constraints.

This separation of the constraints into first-class and second-class constraints is quite important because, as already seen above, the first-class constraints generate admissible canonical transformations, while the second-class constraints do not.

For a bosonic system, the matrix $C_{\alpha\beta}$ is antisymmetric. As $C_{\alpha\beta}$ is invertible, this implies that the number of second-class constraints is even. In the fermionic case, $C_{\alpha\beta}$ is symmetric (in the fermionic sector) and, therefore, the number of second-class constraints can be even or odd.

First-class constraints and gauge symmetries. The first-class constraints not only map the constraint surface on itself, but generate, in fact, transformations that do not change the physical state of the system, that is, gauge transformations. Indeed, the presence of arbitrary functions in the solutions of the equations of motion indicates that the $q$'s and the $p$'s involve some redundancy and are not all physically distinct. Only those phase-space functions whose time evolution does not depend on the arbitrary functions $u^a$ are observables.

That the first-class constraints generate gauge transformations is rather clear in the case of the first-class primary constraints, since these appear explicitly in the generator of the time evolution multiplied by arbitrary functions. That it also holds for the first-class secondary constraints is known as the "Dirac conjecture". This conjecture can be proved under reasonable assumptions (see, e.g., Henneaux et al. 1990). The reason that the secondary first-class constraints also correspond to gauge transformations is that they appear in the brackets of the Hamiltonian with the primary first-class constraints. Thus, different choices of arbitrary functions $u^a$ in the dynamical equations of motion will lead to phase-space points that differ by a canonical transformation whose generator involves the secondary first-class constraints as well.

In any case, as noted below, one must identify the phase-space points in the same orbit generated by all the first-class constraints (primary and secondary) in order to get a reduced space with a symplectic structure ("reduced phase space"). For this reason, one postulates that the first-class constraints always generate gauge transformations, even for systems which are counterexamples to the Dirac conjecture (i.e., in that case, one defines the gauge transformations as being the transformations generated by the first-class constraints).

The extended Hamiltonian $H_E$ is defined to be the sum of the first-class Hamiltonian [20] and of all the first-class constraints $\gamma_a$ multiplied by an arbitrary Lagrange multiplier,

$$H_E = H' + v^a \gamma_a \qquad [23]$$

(with $a$ summed from 1 to $\bar A$). It is the generator of the time evolution in which the complete gauge symmetry is fully displayed.

Elimination of second-class constraints: Dirac brackets. Second-class constraints do not generate permissible canonical transformations, since they do not map the constraint surface on itself. For this reason, it is convenient to eliminate them. This can consistently be done by using the Dirac brackets instead of the Poisson brackets. By definition, the Dirac bracket $[F, G]_D$ of two phase-space functions $F$ and $G$ is given by

$$[F, G]_D = [F, G] - [F, \chi_\alpha]\, C^{\alpha\beta}\, [\chi_\beta, G] \qquad [24]$$

where $C^{\alpha\beta}$ is the inverse to $C_{\alpha\beta}$,

$$C^{\alpha\beta} C_{\beta\gamma} = \delta^\alpha_\gamma$$

(which exists since the $\chi_\alpha$'s are second class). As shown by Dirac, the bracket [24] is indeed a bracket (antisymmetry, derivation property, and Jacobi identity). Furthermore, it fulfills the crucial property that the Dirac bracket of anything with any second-class constraint is zero,

$$[F, \chi_\alpha]_D = 0 \qquad (F \text{ arbitrary}) \qquad [25]$$

Thus, one can consistently eliminate the second-class constraints and replace the Poisson bracket by the Dirac bracket. Once this is done, one has fewer canonical variables and only first-class constraints remain (if any). It also follows from the definition that the Dirac bracket of two first-class functions is equal to their Poisson bracket.

Gauge conditions. One can push the reduction procedure further and eliminate the first-class constraints by means of gauge conditions. Gauge conditions $C_a = 0$ are conditions on the phase-space variables which do not follow from the Lagrangian and which have the property that they cut each gauge orbit once and only once. Since the gauge transformations are generated by the first-class constraints, this requirement is (locally) equivalent to

$$[C_a, \gamma_b]\, \varepsilon^b \approx 0 \;\Rightarrow\; \varepsilon^b \approx 0 \qquad [26]$$
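The Poisson bracket [12] and the Dirac bracket [24] lend themselves to a quick numerical check. The following is a minimal sketch, not a general implementation: it assumes exactly two second-class constraints (so that $C_{\alpha\beta}$ is a 2x2 antisymmetric matrix), uses central differences for the derivatives (exact up to rounding for the polynomials of degree at most two used here), and takes as illustration the pair of constraints $p_1 - z^2 = 0$, $p_2 = 0$ on the phase space $(z^1, z^2, p_1, p_2)$. The function names `grad`, `poisson`, `dirac` and the sample point are hypothetical choices made for this sketch.

```python
# Sketch: Poisson bracket [12] and Dirac bracket [24] by finite differences.
# A phase-space point is a list x = [q^1, ..., q^n, p_1, ..., p_n].

def grad(f, x, h=1e-3):
    """Central-difference gradient of f at the phase-space point x."""
    g = []
    for i in range(len(x)):
        up = list(x)
        dn = list(x)
        up[i] += h
        dn[i] -= h
        g.append((f(up) - f(dn)) / (2 * h))
    return g

def poisson(f, g, x):
    """[f, g] = sum_i (df/dq^i dg/dp_i - df/dp_i dg/dq^i), as in [12]."""
    n = len(x) // 2
    df, dg = grad(f, x), grad(g, x)
    return sum(df[i] * dg[n + i] - df[n + i] * dg[i] for i in range(n))

def dirac(f, g, chis, x):
    """Dirac bracket [24], assuming exactly two second-class constraints."""
    c12 = poisson(chis[0], chis[1], x)      # C = [[0, c12], [-c12, 0]]
    cinv = [[0.0, -1.0 / c12], [1.0 / c12, 0.0]]   # inverse of C
    corr = sum(
        poisson(f, chis[a], x) * cinv[a][b] * poisson(chis[b], g, x)
        for a in range(2) for b in range(2)
    )
    return poisson(f, g, x) - corr

# Illustration: coordinates (z1, z2, p1, p2), constraints p1 - z2 and p2.
def z1(x): return x[0]
def z2(x): return x[1]
def chi1(x): return x[2] - x[1]
def chi2(x): return x[3]
def sample_f(x): return x[0] * x[3] + x[1] * x[1]   # arbitrary test function

point = [0.3, -1.2, 0.7, 0.5]   # arbitrary phase-space point
chis = [chi1, chi2]

if __name__ == "__main__":
    print("[chi2, chi1]   =", poisson(chi2, chi1, point))
    print("[z1, z2]_D     =", dirac(z1, z2, chis, point))
    print("[F, chi1]_D    =", dirac(sample_f, chi1, chis, point))
    print("[F, chi2]_D    =", dirac(sample_f, chi2, chis, point))
```

With these constraints the sketch reproduces property [25] for the test function and gives a unit Dirac bracket between $z^1$ and $z^2$. For more than two second-class constraints one would replace the hand-written 2x2 inverse by a general matrix inversion.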

Condition [26] means that the constraints ($\gamma_a$, $C_b$) form together a second-class system: there is no first-class constraint left once the conditions $C_a = 0$ are included. One can then eliminate all the constraints and gauge conditions and introduce the corresponding Dirac bracket. For gauge-invariant functions, this Dirac bracket coincides with the original Poisson bracket.

The reduced phase space is the unconstrained space obtained after this reduction, equipped with the Dirac bracket. It has dimension $2n - s - 2\bar A$, where $2n$ is the dimension of the original phase space, $s$ is the number of second-class constraints, and $\bar A$ is the number of first-class constraints. In the bosonic case, this number is even (as it should be) because $s$ is even. One sees that first-class constraints "strike twice" since they need gauge conditions.

The observables of the theory are the reduced phase-space functions. They form a Poisson algebra, the relevant reduced phase-space bracket being the Dirac bracket associated with all the constraints and gauge conditions. The symplectic structure defined in the reduced phase space is nondegenerate because one has removed all the first-class constraints.

The definition of reduced phase space given above is useful in practice but has the conceptual drawback of relying on gauge conditions. This approach does not display clearly its intrinsic significance and, furthermore, in the case of the so-called "Gribov problems" (global obstructions to cutting each gauge orbit once and only once), may yield the incorrect expectation that the reduced phase space does not exist. We shall provide a more intrinsic definition below, which does not involve gauge conditions.

Examples

First example (see eqn [2]). There is here one primary constraint, namely $\pi = 0$. The canonical Hamiltonian is $H = (1/2)((p_1)^2 + (p_2)^2) + \lambda (p_1 + p_2)$. The consistency algorithm yields the secondary constraint $p_1 + p_2 = 0$ and no condition on the $u$'s. The constraints are first class. They generate the gauge transformations $q^1 \to q^1 + \varepsilon$, $q^2 \to q^2 + \varepsilon$, and $\lambda \to \lambda + \eta$, which coincide with the Lagrangian gauge transformations if one identifies $\eta$ with $\dot\varepsilon$ ($\varepsilon$ and $\dot\varepsilon$ are, of course, independent at any given time). One can fix the gauge by means of the gauge conditions $\lambda = 0$, $q^1 + q^2 = 0$. The reduced phase space is two-dimensional and the observables can be identified with the functions of the gauge-invariant variables $(1/2)(q^1 - q^2)$ and $p_1 - p_2$, which are conjugate. Any other gauge condition leads to the same reduced phase space.

Second example (see eqn [4]). The primary constraints are $p_1 - z^2 = 0$ and $p_2 = 0$ and define a two-dimensional plane in the four-dimensional phase space $(z^1, z^2, p_1, p_2)$. The consistency algorithm forces $u^1 = z^2$ and $u^2 = 0$ and does not bring any further constraint. The constraints are second class since $[p_2, p_1 - z^2] = 1$. One can eliminate $p_1$ and $p_2$ through the constraints. The Dirac brackets of the remaining variables vanish, except $[z^1, z^2]_D = 1$. The reduced phase space is the space of the $z$'s, with $z^2$ conjugate to $z^1$. The Hamiltonian is the free-particle Hamiltonian, $H = (1/2)(z^2)^2$. Thus, one recovers the original description which was already in Hamiltonian form. (The recognition that a system is already in first-order form often enables one to shortcut some aspects of the Dirac procedure by not introducing the unnecessary momenta which would in any case be eliminated in the end.)

Quantization

The phase space of physical interest is the reduced phase space and the physical algebra is the algebra of the observables. The quantization of the theory then amounts to quantizing the algebra of the observables. This can be achieved along two different lines:

1. Reduce then quantize: In this direct approach, one represents as quantum operators only the reduced phase-space functions. There is no operator associated with non-gauge-invariant functions.
2. Quantize then reduce: In this approach, one represents as quantum operators the bigger algebra of functions of all the phase-space variables. One must then take into account the constraints. The second-class constraints are enforced as operator equations, which is consistent with the correspondence rule that the commutator in the quantum theory is $i\hbar$ times the Dirac bracket,

$$AB - BA = i\hbar\, [A, B]_D \qquad [27]$$

(plus higher-order terms in $\hbar$). The first-class constraints are implemented in a more subtle way. It would be inconsistent to impose them as operator equations since in general $[\gamma_a, F]_D \neq 0$ (even in the Dirac bracket). What one does is to impose them as conditions on the physical states: these are defined as the states annihilated by the first-class constraints,

$$\gamma_a |\psi\rangle = 0 \qquad [28]$$

For simple systems, it is easy to verify that the two procedures are equivalent. There is yet another

approach, in which one extends the system rather than reduce it. This is the Becchi–Rouet–Stora–Tyutin (BRST) approach, in which the new variables are called ghosts.

Geometric Description

We defined above first-class and second-class constraints through algebraic means. It turns out that these definitions also have a geometrical interpretation, which gives considerable insight into their nature.

The phase-space symplectic 2-form $\sigma$ induces, by pullback, a 2-form $\sigma_\Sigma$ on the constraint surface $\Sigma$. While $\sigma$ is of maximal rank, this may not be the case for the induced $\sigma_\Sigma$, which may be degenerate. In fact, the rank of $\sigma_\Sigma$ fails to be equal to the maximum rank $2n - J$ (where $J$ is the total number of constraints) by precisely the number $\bar A$ of first-class constraints.

Indeed, the Hamiltonian vector fields $X_a$ associated with the first-class constraints are tangent to the constraint surface $\Sigma$ and are null eigenvectors of $\sigma_\Sigma$,

$$\sigma_\Sigma(X_a, Y) = 0 \qquad \forall\, Y \text{ tangent to } \Sigma \qquad [29]$$

as an immediate consequence of the first-class property. Here, all first-class constraints (primary and secondary) yield a null eigenvector. The integral surfaces of the vector fields $X_a$ are the gauge orbits. The reduced phase space is nothing else but the quotient space of the constraint surface by the gauge orbits. The 2-form induced in the quotient space is invertible because one has removed all degeneracy directions (including the ones associated with secondary first-class constraints). Reaching the reduced phase space falls under the scope of Hamiltonian reduction. The observables are the functions on the reduced phase space.

Thus, the reduced phase space is obtained through a two-step procedure. First, one restricts the functions to functions on the constraint surface $\Sigma$. One may view the algebra $C^\infty(\Sigma)$ of smooth functions on $\Sigma$ as the quotient algebra $C^\infty(P)/N$ of the algebra $C^\infty(P)$ of smooth phase-space functions by the ideal $N$ of phase-space functions that vanish on the constraint surface $\Sigma$. The second step in the reduction procedure is to impose the gauge-invariant condition on the functions in $C^\infty(\Sigma)$, that is, to impose that they are constant along the gauge orbits $O$. Assuming all necessary smoothness and regularity conditions to be fulfilled (i.e., that the orbits fiber $\Sigma$, which is, for instance, the case if the gauge orbits are the orbits of a free and proper group action), one may denote the algebra of observables as $C^\infty(\Sigma/O)$. This algebra is a Poisson algebra because the induced 2-form on the quotient space $\Sigma/O$ is nondegenerate. The algebraic description of the observables underlies the BRST construction.

It is interesting to note that in the covariant approach to phase space, a similar two-step reduction procedure occurs. What plays the role of the constraint surface is the stationary surface in the space of all histories $q^i(t)$ of the dynamical variables. The gauge symmetry acts on this space and the reduced phase space is just the quotient space. One can establish the equivalence of the two descriptions (Barnich et al. 1991).

See also: Batalin–Vilkovisky Quantization; BRST Quantization; Canonical General Relativity; Operads; Perturbative Renormalization Theory and BRST; Quantum Dynamics in Loop Quantum Gravity; Quantum Field Theory: A Brief Introduction.

Further Reading

Anderson JL and Bergmann PG (1951) Constraints in covariant field theories. Physical Review 83: 1018.
Barnich G, Henneaux M, and Schomblond C (1991) On the covariant description of the canonical formalism. Physical Review D 44: 939.
Dirac PAM (1950) Generalized Hamiltonian dynamics. Canadian Journal of Mathematics 2: 129.
Dirac PAM (1967) Lectures on Quantum Mechanics. New York: Academic Press.
Flato M, Lichnerowicz A, and Sternheimer D (1976) Deformations of Poisson brackets, Dirac brackets and applications. Journal of Mathematical Physics 17: 1754.
Hanson A, Regge T, and Teitelboim C (1976) Constrained Hamiltonian Systems. Rome: Accad. Naz. dei Lincei.
Henneaux M and Teitelboim C (1992) Quantization of Gauge Systems. Princeton: Princeton University Press.
Henneaux M, Teitelboim C, and Zanelli J (1990) Gauge invariance and degree of freedom count. Nuclear Physics B 332: 169.
Marsden JE and Weinstein A (1974) Reduction of symplectic manifolds with symmetry. Reports on Mathematical Physics 5: 121.

Constructive Quantum Field Theory


G Gallavotti, Università di Roma "La Sapienza", Rome, Italy
© 2006 G Gallavotti. Published by Elsevier Ltd. All rights reserved.

Euclidean Quantum Fields

The construction of a relativistic quantum field is still an open problem for fields in spacetime dimension $d \geq 4$. The conceptual difficulty that sometimes led to fear an incompatibility between nontrivial quantum systems and special relativity has however been solved in the case of dimension $d = 2, 3$ although, so far, this has not influenced the corresponding debate on the foundations of quantum mechanics, still much alive.

It began in the early 1960s with Wightman's work on the axioms and the attempts at understanding the mathematical aspects of renormalization theory and with Hepp's renormalization theory for scalar fields. The breakthrough idea was, perhaps, Nelson's realization that the problem could really be studied in Euclidean form. A solution in dimensions $d = 2, 3$ has been obtained in the 1960s and 1970s through a remarkable series of papers by Nelson, Glimm, Jaffe, and Guerra. While the works of Nelson and Guerra relied on the Euclidean approach (see below) and on $d = 2$, the early works of Glimm and Jaffe dealt with $d = 3$ making use of the Minkowskian approach (based on second quantization) but making already use of a multiscale analysis technique. The latter received great impulsion and systematization by the adoption of Wilson's views and methods on renormalization: in physics terminology, renormalization group methods; a point of view taken here following the Euclidean approach. The solution dealt initially with scalar fields but it has been subsequently considerably extended.

The Euclidean approach studies quantum fields through the following problems:

1. existence of the functional integrals defining the generating functions (see below) of the probability distribution of the interacting fields in finite volume: the "ultraviolet stability problem",
2. existence of the infinite-volume limit of the generating functions: the "infrared problem", and
3. check that the infinite volume generating functions satisfy the axioms needed to pass from the Euclidean, probabilistic, formulation to a Minkowskian formulation guaranteeing the existence of the Hamiltonian operator, relativistic covariance, Ruelle–Haag scattering theory: the "reconstruction problem".

The characteristic problem for the construction of quantum fields is (1) and here attention will be confined to it with the further restriction to the paradigmatic massive scalar fields cases. The dimension $d$ of the spacetime will be $d = 2, 3$ unless specified otherwise.

Given a cube $\Lambda$ of side $L$, $\Lambda \subset R^d$, consider the following functional integral on the space of the fields on $\Lambda$, that is, on functions $\varphi^{(N)}_\xi$ defined for $\xi \in \Lambda$,

$$Z_N(\Lambda, f) = \int \exp\Big( -\int_\Lambda \big( \lambda_N \varphi^{(N)\,4}_\xi + \mu_N \varphi^{(N)\,2}_\xi + \nu_N + f_\xi\, \varphi^{(N)}_\xi \big)\, d\xi \Big)\, P_N(d\varphi^{(N)}) \qquad [1]$$

The fields $\varphi^{(N)}_\xi$ are called Euclidean fields with ultraviolet cutoff $N > 0$, $f_\xi$ is a smooth function with compact support bounded by $|f_\xi| \leq 1$ (for definiteness), the constants $\lambda_N > 0$, $\mu_N$, $\nu_N$ are called bare couplings, and $P_N$ is a Gaussian probability distribution defining the free-field distribution with mass $m$ and ultraviolet cutoff $N$; the probability distribution $P_N$ is determined by its covariance $C^{(N)}_{\xi,\eta} \overset{def}{=} \int \varphi^{(N)}_\xi \varphi^{(N)}_\eta\, dP_N$, which in the physics literature is called a propagator, given by

$$C^{(N)}_{\xi,\eta} = \frac{1}{(2\pi)^d} \sum_{n \in Z^d} \int \frac{e^{i p \cdot (\xi - \eta + nL)}}{p^2 + m^2}\, \chi_N(|p|)\, d^d p \qquad [2]$$

The sum over the integers $n \in Z^d$ is introduced so that the field $\varphi^{(N)}_\xi$ is periodic over the box $\Lambda$: this is not really necessary as in the limit $L \to \infty$ either translation invariance would be recovered or lack of it properly understood, but it makes the problem more symmetric and generates a few technical simplifications; here $\chi_N(z)$ is a regularizer and a standard choice is

$$\chi_N(|p|) = \frac{m^2 (\gamma^{2N} - 1)}{p^2 + \gamma^{2N} m^2}$$

with $\gamma > 1$, which is such that

$$\frac{\chi_N(|p|)}{p^2 + m^2} \equiv \frac{1}{p^2 + m^2} - \frac{1}{p^2 + \gamma^{2N} m^2} \equiv \sum_{h=1}^{N} \Big( \frac{1}{p^2 + \gamma^{2(h-1)} m^2} - \frac{1}{p^2 + \gamma^{2h} m^2} \Big) \qquad [3]$$

here $\gamma > 1$ can be chosen arbitrarily: so $\gamma = 2$. If $d > 3$, the above regularization will not be sufficient and a $\chi_N$ decaying faster than $p^{-2}$ would be needed.
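Identity [3] is elementary but underlies the whole multiscale analysis, so a direct numerical check can be reassuring. The following is a minimal sketch under stated assumptions: the values $m = 1$, $\gamma = 2$, the sample momenta and the function names (`chi`, `regularized`, `difference`, `scale_sum`) are illustrative choices for this sketch and are not taken from the text.

```python
# Sketch: check that the three expressions in identity [3] coincide.
# m, gamma and the sample values of p^2 are illustrative assumptions.

def chi(p2, N, m=1.0, gamma=2.0):
    """Regularizer chi_N(|p|) = m^2 (gamma^(2N) - 1) / (p^2 + gamma^(2N) m^2)."""
    g2n = gamma ** (2 * N)
    return m * m * (g2n - 1.0) / (p2 + g2n * m * m)

def regularized(p2, N, m=1.0, gamma=2.0):
    """Left-hand side of [3]: chi_N(|p|) / (p^2 + m^2)."""
    return chi(p2, N, m, gamma) / (p2 + m * m)

def difference(p2, N, m=1.0, gamma=2.0):
    """Middle expression of [3]: difference of two free propagators."""
    return 1.0 / (p2 + m * m) - 1.0 / (p2 + gamma ** (2 * N) * m * m)

def scale_sum(p2, N, m=1.0, gamma=2.0):
    """Right-hand side of [3]: telescoping sum over the scales h = 1..N."""
    return sum(
        1.0 / (p2 + gamma ** (2 * (h - 1)) * m * m)
        - 1.0 / (p2 + gamma ** (2 * h) * m * m)
        for h in range(1, N + 1)
    )

if __name__ == "__main__":
    for p2 in (0.0, 1.0, 50.0, 1.0e4):
        print(p2, regularized(p2, 5), difference(p2, 5), scale_sum(p2, 5))
```

Each term of the sum in `scale_sum` corresponds to a single momentum scale $\gamma^h m$, which is the decomposition the field is split along in the multiscale representation.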

A simple estimate yields, if $\varepsilon \in (0, 1)$ is fixed and $c$ is suitably chosen,

$$|C^{(N)}_{\xi,\eta}| \leq c\, \gamma^{(d-2)N} e^{-m|\xi - \eta|}$$
$$|C^{(N)}_{\xi,\eta} - C^{(N)}_{\xi,\eta'}| \leq c\, \gamma^{(d-2)N} (\gamma^N m |\eta - \eta'|)^{\varepsilon} \qquad [4]$$

with $\gamma^{(d-2)N}$ interpreted as $N$ if $d = 2$.

The functional

$$f \mapsto \log \frac{Z_N(\Lambda, f)}{Z_N(\Lambda, 0)}$$

defines a generating function of a probability distribution $P_{int}$ over the fields on $\Lambda$ which will be called the distribution with $\varphi^4$-interaction regularized on $\Lambda$ and at length scale $m^{-1}\gamma^{-N}$: the integral, in [1],

$$V_N(\varphi^{(N)}) \overset{def}{=} \int_\Lambda \big( \lambda_N \varphi^{(N)\,4}_\xi + \mu_N \varphi^{(N)\,2}_\xi + \nu_N + f_\xi\, \varphi^{(N)}_\xi \big)\, d^d\xi \qquad [5]$$

will be called the interaction potential with external field $f$. The regularization is introduced to guarantee that the integral [1], $\int e^{-V_N} dP_N$, is well defined if $\lambda_N > 0$. The momenta of $P_{int}$ are the functional derivatives of this generating function: they are called Schwinger functions.

The problem (1) can now be made precise: it is to show the existence of $\lambda_N$, $\mu_N$, $\nu_N$ so that the limit

$$\lim_{N \to \infty} \frac{Z_N(\Lambda, f)}{Z_N(\Lambda, 0)}$$

exists for all $f$ and is not Gaussian, that is, it is not the exponential of a quadratic form in $f$: which would be the case if $\lambda_N, \mu_N \to 0$ fast enough: the last requirement is of course essential because the Gaussian case describes, in the physical interpretation, free fields and noninteracting particles, that is, it is trivial. Note that $\nu_N$ does not play a role: its introduction is useful to be able to study separately the numerator and the denominator of the fraction $Z_N(\Lambda, f)/Z_N(\Lambda, 0)$.

For more details, the reader is referred to Wightman and Garding (1965), Streater and Wightman (1964), Nelson (1966), Guerra (1972), Osterwalder and Schrader (1973), and Simon (1974).

The Regularized Free Field

Since the propagator, see [4], decays exponentially over a scale $m^{-1}$ and is smooth over a scale $m^{-1}\gamma^{-N}$, the fields $\varphi^{(N)}_\xi$ sampled with distribution $P_N$ are rather singular objects. Their properties cannot be described by a single length scale: they are extremely large for large $N$, take independent values only beyond distances of order $m^{-1}$ but, at the same time, they look smooth only on the much smaller scale $m^{-1}\gamma^{-N}$. Their essential feature is that, fixed $\varepsilon < 1$, for example, $\varepsilon = 1/2$, with $P_N$-probability 1 there is $B > 0$ such that (interpreting $\gamma^{(d-2)N/2}$ as $N$ if $d = 2$)

$$|\varphi^{(N)}_\xi| \leq B\, \gamma^{N(d-2)/2}$$
$$|\varphi^{(N)}_\xi - \varphi^{(N)}_\eta| < B\, \gamma^{N(d-2)/2} (\gamma^N m |\xi - \eta|)^{\varepsilon/2} \qquad [6]$$

and furthermore the probability of the relations in [6] will be $N$-independent, that is, $\varphi^{(N)}_\xi$ are bounded and roughly of size $\gamma^{N(d-2)/2}$ as $N \to \infty$ and, on a very small length scale $m^{-1}\gamma^{-N}$, almost constant.

Substantial control on the field $\varphi^{(N)}_\xi$ "statistically sampled" with distribution $P_N$ can be obtained by decomposing it, through [3], into "components of various scales": that is, as a sum of statistically mutually independent fields whose properties are entirely characterized by a single scale of length. This means that they have size of order 1 and are independent and smooth on the same length scale.

Assuming the side of $\Lambda$ to be an integer multiple of $m^{-1}$, let $Q_h$ be a pavement of $\Lambda$ into boxes of side $m^{-1}\gamma^{-h}$, imagined hierarchically arranged so that the boxes of $Q_h$ are exactly paved by those of $Q_{h+1}$.

Define $z^{(h)}_\xi$ to be the random field with propagator $C^{(h)}_{\xi,\eta}$ with Fourier transform

$$\Big( \frac{1}{p^2 + \gamma^{-2} m^2} - \frac{1}{p^2 + m^2} \Big) \sum_{n \in Z^d} e^{i\, n \cdot p\, \gamma^h L}$$

so that $\varphi^{(N)}_\xi$ and its propagator $C^{(N)}_{\xi,\eta}$ can be represented, see [2], [3], as

$$\varphi^{(N)}_\xi = \sum_{h=1}^{N} \gamma^{h(d-2)/2}\, z^{(h)}_{\gamma^h \xi}, \qquad C^{(N)}_{\xi,\eta} = \sum_{h=1}^{N} \gamma^{h(d-2)}\, C^{(h)}_{\gamma^h \xi,\, \gamma^h \eta} \qquad [7]$$

where the fields $z^{(h)}$ are independently distributed Gaussian fields. Note that the fields $z^{(h)}$ are also almost identically distributed because their propagator is obtained by periodizing over the period $\gamma^h L$ the same function

$$C^{(0)}_{\xi,\eta} \overset{def}{=} \int \frac{e^{i p \cdot (\xi - \eta)}}{(2\pi)^d} \Big( \frac{1}{p^2 + \gamma^{-2} m^2} - \frac{1}{p^2 + m^2} \Big)\, d^d p$$

that is, their propagator is

$$C^{(h)}_{\xi,\eta} = \sum_{n \in Z^d} C^{(0)}_{\xi,\, \eta + \gamma^h n L}$$

The reason why they are not exactly equally distributed is that the field $z^{(h)}_\xi$ is periodic with period $\gamma^h L$ rather than $L$. But proceeding with care the sum over $n$ in the above expressions can be essentially ignored: this is a little price to pay if one wants translation invariance built in the analysis since the beginning.

The representation [7] defines a multiscale representation of the field $\varphi^{(N)}_\xi$. Smoothness properties for the field $\varphi^{(N)}_\xi$ can be read from those of its "components" $z^{(h)}$. Define, for $\Delta \in Q_0$,

$$\| z^{(h)} \|_\Delta \overset{def}{=} \max_{\xi, \eta \in \Delta,\ |\xi - \eta| m < 1} \Big( |z^{(h)}_\xi| + \zeta\, \frac{|z^{(h)}_\xi - z^{(h)}_\eta|}{|\xi - \eta|^{1/4}} \Big) \qquad [8]$$

and $\zeta$ will be chosen $\zeta = 0$ or $\zeta = 1$ as needed (in practice $\zeta = 0$ if $d = 2$ and $\zeta = 1$ if $d = 3$): $\zeta = 1$ will allow us to discuss some smoothness properties of the fields which will be necessary (e.g., if $d = 3$). Then the size $\|z\|_\Delta$ of any field $z^{(h)}$, for all $h \geq 1$, is estimated by

$$P\Big( \max_{\Delta \in Q_0} \|z\|_\Delta \geq B \Big) \leq c\, e^{-c' B^2} |\Lambda|$$
$$P\big( \|z\|_\Delta \geq B_\Delta,\ \forall \Delta \in D \big) \leq \prod_{\Delta \in D} c\, e^{-c' B_\Delta^2} \qquad [9]$$

where $P$ is the Gaussian probability distribution of $z$, $D$ is any collection of boxes $\Delta \in Q_0$ and $c, c' > 0$ are suitable constants. The [9] imply in particular [6]. The estimates [9] follow from the Markovian nature of the Gaussian field $z^{(h)}$, that is, from the fact that the propagator is the Green's function of an elliptic operator (of fourth order, see the first of [3]), with constant coefficients which implies also the inequalities (fixing $\varepsilon \in (0, 1)$)

$$|C^{(h)}_{\xi,\eta}| = \Big| \int z_\xi z_\eta\, P(dz) \Big| \leq c\, e^{-m c' |\xi - \eta|}$$
$$|C^{(h)}_{\xi,\eta} - C^{(h)}_{\xi,\eta'}| \leq c\, (m |\eta - \eta'|)^{\varepsilon} \qquad [10]$$

where $|\xi - \eta|$ is reinterpreted as the distance between $\xi$, $\eta$ measured over the periodic box $\gamma^h \Lambda$ (hence $|\xi - \eta|$ differs from the ordinary distance only if the latter is of the order of $\gamma^h L$). The interpretation of [10] is that $z^{(h)}_\xi$ are essentially bounded variables which, on scale $m^{-1}$, are essentially constant and furthermore beyond length $m^{-1}$ are essentially independently distributed.

For more details, the reader is referred to Wilson (1970, 1972) and Gallavotti (1981, 1985).

Perturbation Theory

The naive approach to the problem is to fix $\lambda_N \equiv \lambda > 0$ and to develop $Z_N(\Lambda, f)$ or, more conveniently and equivalently, $(1/|\Lambda|) \log Z_N(\Lambda, f)$ in powers of $\lambda$. If one fixes a priori $\mu_N$, $\nu_N$ independent of $N$, however, even a formal power series is not possible: this is trivially due to the divergence of the coefficients of the power series, already to second order, for generic $f$ in the limit $N \to \infty$. Nevertheless it is possible to determine $\mu_N(\lambda)$, $\nu_N(\lambda)$ as functions of $N$ and $\lambda$ so that a formal power series exists (to all orders in $\lambda$): this is the key result of renormalization theory.

To find the perturbative expansion, the simplest is to use a graphical representation of the coefficients of the power expansion in $\lambda$, $\mu_N$, $\nu_N$, $f$ and the Gaussian integration rules which yield (after a classical computation) that the coefficient of $\lambda^n \mu_N^p f_{\xi_1} \cdots f_{\xi_r}$ is obtained by considering the graph elements shown in Figure 1, where the segments will be called half-lines and the graph elements will be called, respectively, coupling or $\varphi^4$-vertex, mass vertex, vacuum vertex, and external vertex.

Figure 1 The graph elements representing $\varphi^{(N)\,4}_\xi$, $\varphi^{(N)\,2}_\xi$, a constant, and $\varphi^{(N)}_\xi$.

The half-lines of the graph elements are considered distinct (i.e., imagine a label attached to distinguish them). Then consider all possible connected graphs $G$ obtained by first drawing, respectively, $n$, $p$, $r$ graph elements in Figure 1, which are not vacuum vertices, with their nodes marked by points in $\Lambda$ named $\xi_1, \ldots, \xi_n, \xi_{n+1}, \ldots, \xi_{n+p+r}$; and form all possible graphs obtained by attaching pairs of half-lines emerging from the vertices of the graph elements. These are the nontrivial graphs.

Furthermore, consider also the single trivial graph formed just by the third graph element and consisting of a single point. All graphs obtained in this way are particular Feynman graphs.

Given a nontrivial graph $G$ (there are many of them) we define its value to be the product

$$W_G(\xi_1, \ldots, \xi_n, \xi_{n+1}, \ldots, \xi_{n+p+r}) = (-1)^{n+p+r}\, \frac{\lambda^n \mu_N^p \prod_{j=1}^{r} f_{\xi_{n+p+j}}}{n!\, p!\, r!} \prod_{\ell} C^{(N)}_{\xi_\ell, \eta_\ell} \qquad [11]$$

where the last product runs over all pairs $\ell = (\xi_\ell, \eta_\ell)$ of half-lines of $G$ that are joined and connect two vertices labeled by points $\xi_\ell$, $\eta_\ell$: call line of $G$ any such pair. If the graph consists of the single vacuum

R
vertex its value will be N . The series for C(N)
ax
C(N)3
xh
(C(N)
hb
 C(N)
xb
) dh. If d = 2, we only
(1=jj) log ZN (, f ) is then need to define N as the first term on the right-hand
Z side (RHS) of [14] and we can leave the subgraphs like
npr
1 X Y the second in Figure 2 as they are (without any
N WG x1 ; . . . ; xnpr dxj 12
jj G j1
renormalization).
Graphs without external lines are called vacuum
and the integral will be called the integrated graph graphs and there are a few such graphs which are
value. divergent. Namely, if d = 3, they are the first three
Suppose first that N = N = 0. Then if a graph G drawn in Figure 3; furthermore, if N is set to the
contains subgraphs like in Figure 2, the correspond- above nonzero value a new vacuum graph, the
ing respective contribution to the integral in [12] fourth in Figure 3, can be formed. Such graphs
(considering only the integrals over h and suitably contribute to the graph value, respectively, the terms
taking care of the combinatorial factors) is a factor in the sum
obtained by integrating over x the quantities Z
N2 4! N4 23  3!3 3
3Cx ;x 2 Cx x dx2  
N
6Cax Cxx Cxb
N N 1 1 2 1 2 3!
Z
Z N2 N2 N2 N
42  3! 2 N 13
Cx x Cx x Cx x dx2 dx3  N Cx x 15
N3 N 1 2 2 3 3 1 1 1
or  Cax Cxh Chb dh
2!
and diverge, respectively, as  2N ,  N , N,  2N if d = 3
which if d = 3 diverge as N ! 1 as  or, respec- N while, if d = 2, only the first and the last (see [14])
tively, as N; the second factor does not diverge in diverge, like N 2 .
dimension d = 2 while the first still diverges as N. The Therefore, if we fix N as minus the quantity in
divergences arise from the fact that as x  h ! 0 the [15] we can disregard graphs like those in Figure 3;
propagator behaves as jx  hjN if d = 3 or as if d = 2 N can be defined to be the sum of the first
log jx  hj if d = 2, all the way until saturation and last terms in [15].
occurs at distance jx  hj m1  N : for this reason The formal series in  and f thus obtained is called
the latter divergences are called ultraviolet the renormalized series for the field 4 in
divergences. dimension d = 2 or, respectively, d = 3. Note that
However, if we set N 6 0, then for every graph with the given definitions and choices of N , N the
containing a subgraph like those in Figure 2 there only graphs G that need to be considered to
is another one identical except that the points construct the expansion in  and f are formed by
a, b are connected via a mass vertex, see Figure 1, the first and last graph elements in Figure 1, paying
with the vertex in x, by a line ax and a line xb; attention that the graphs in Figure 3 do not
the new graph value receives a contribution from contribute and, if d = 3, the graphs with subgraphs
the mass vertex inserted in x between a and b like the second in Figure 2 have to be computed with
simply given by a factor N . Therefore if we fix, the modification described.
for d = 3, In the next section, it will be shown that the
above are the only sources of divergences as N ! 1
N 42  3! 2 and therefore the problem of studying [1] is solved
N 6Cxx 
2 at the level of formal power series by the subtraction
Z
N3 def N in [14]. This also shows that giving a meaning to the

Cxh dh  6Cxx N 14 series thus obtained is likely to be much easier if

d = 2 than if d = 3.
we can simply consider graphs which do not contain The coefficients of order k of the expansion in 
any mass graph element and in which there are no of (1=jj) log ZN (, f ) can be ordered by the number
subgraphs like the first in Figure 2 while the subgraphs 2n of vertices
R representing Q external fields: and have
2n
Rlike(N)
the second in Figure 2 do not contribute a factor the form S(k) (
2n 1 x , . . . , x 2n ) i = 1 (fxi dx i ): the kernels
Cax C(N)3
xh
(N)
Chb dh but a renormalized factor (k)
S2n are the Schwinger functions of order 2n, see the
section Euclidean quantum fields.

Figure 2 Divergent subgraphs, if d = 3. If d = 2 only the first diverges.
Figure 3 Divergent vacuum graphs.
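The two divergence rates quoted for d = 3 (growth like $\gamma^N$ for the tadpole $C_{xx}$ and growth like N for the integral of the cube of the propagator) can be checked on a toy model. The sketch below assumes a propagator equal to the difference of two Yukawa potentials, $C(r) = (e^{-mr} - e^{-m\gamma^N r})/(4\pi r)$; this is an illustrative choice and not necessarily the regularization [2], [3] of the text. The function names and the default values of gamma and m are hypothetical.

```python
import math

def tadpole(N, gamma=2.0, m=1.0):
    """C_xx for the assumed propagator, i.e. the limit r -> 0 of C(r)."""
    return m * (gamma ** N - 1.0) / (4.0 * math.pi)

def sunset_closed_form(N, gamma=2.0, m=1.0):
    """Integral of C(r)^3 over R^3, evaluated with the Frullani integral."""
    a, b = m, m * gamma ** N
    frullani = math.log(b / a) + 3.0 * math.log((2.0 * a + b) / (a + 2.0 * b))
    return frullani / (16.0 * math.pi ** 2)

def sunset_numeric(N, gamma=2.0, m=1.0, step=0.01):
    """Same integral by the trapezoidal rule in the variable t = log r."""
    a, b = m, m * gamma ** N
    lo = -math.log(b) - 25.0   # far below the cutoff length 1/b
    hi = -math.log(a) + 8.0    # far above the correlation length 1/a
    n = int(round((hi - lo) / step))
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        r = math.exp(lo + i * h)
        g = (math.exp(-a * r) - math.exp(-b * r)) ** 3
        total += g if 0 < i < n else 0.5 * g
    return total * h / (16.0 * math.pi ** 2)

if __name__ == "__main__":
    for N in (5, 10, 15, 20):
        print(N, tadpole(N), sunset_closed_form(N), sunset_numeric(N))
```

In this toy model the tadpole doubles when N increases by one (for gamma = 2), while the cubic integral increases by an amount approaching $\log\gamma/(16\pi^2)$, which is the linear growth in N.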
Constructive Quantum Field Theory 621

Remark If d = 4, the regularization at cutoff N in The distinctions between the cases d = 2, 3, 4, >4
[2] is not sufficient as in the subtraction procedure explain the terminology given to the 4 -scalar field
smoothness of the first derivatives of the field theories calling them super-renormalizable if
(N) is necessary, while the regularization [2] does d = 2, 3, renormalizable if d = 4 and nonrenormaliz-
not even imply [6], that is, not even Holder able if d > 4. Since the (divergent) coefficients in the
continuity. A higher regularization (i.e., using a formal power series defining N , N , N , N are
N like the square of the N in [3]). Furthermore, called counter-terms, the 4 -scalar fields require
the subtractions discussed in the case d = 3 are not finitely many counter-terms (see [14]) in the super-
sufficient to generate a formal power series and renormalizable cases and infinitely many in the
many more subtractions are needed: for instance, renormalizable case. The nonrenormalizable cases
graphs with a subgraph like the one in Figure 4 (d > 4) cannot be treated in a way analogous to the
would give a contribution to the graph value which renormalizable ones.
is a factor For more details, the reader is referred
2 Z to Gallavotti (1985), Aizenman (1982), and
2 def 2  6 2 N2
 N  Cxh dh Frohlich (1982).
2! 

also divergent as N ! 1 proportionally to N.


Although this divergence could be canceled by Finiteness of the Renormalized Series,
changing  into N =  2 N the previously dis- d = 2, 3: Power Counting
cussed cancelations would be affected and a change
in the value of N would become necessary; Checking that the renormalized series is well defined
furthermore, the subtraction in [14] will not be to all orders is a simple dimensional estimate
sufficient to make finite the graphs, not even to characteristic of many multiscale arguments that in
second order in , unless a new term physics have become familiar with the name of
R R
 N (@x (N)
x
) 2
dx with N = (1=2) 2
@ h C (N)3
xh
renormalization group arguments.
(x  h)2 is added in the exponential in [1]. Consider a graph G with n r vertices built over n
But all this will not be enough and still new graph elements with vertices x1 , . . . , xn each with four
divergences, proportional to 3 , will appear. half-lines and r graph elements with vertices
xn1 , . . . , xnr representing the external fields: as
And so on indefinitely, the consequence being that remarked in the previous section, these are the only
it will be necessary to define N , N , N , N as graphs to be considered to form the renormalized series.
formal power series in  (with coefficients diverging Develop each propagator into a sum of propaga-
as N ! 1) in order to obtain a formal power series tors as in [7]. The graph G value will, as a
in  for [1] in which all coefficients have a finite consequence, be represented as a sum of values of
limit as N ! 1. Thus, the interpretation of the new graphs obtained from G by adding scale labels
formal renormalized series in the case d = 4 is on its lines and the value of the graph will
substantially different and naturally harder than be computed as a product of factors in which a
the cases d = 2, 3. Beyond formal perturbation line joining xh and bearing a scale label h
expansions, the case d = 4 is still an open problem: will contribute with C(h) replacing C(N) . To avoid
xh xh
the most widespread conjecture is that the series proliferation of symbols, we shall call the
cannot be given a meaning other than setting to 0 all graphs obtained in this way, i.e., with the scale
coefficients of j , j > 0. In other words, the con- labels attached to each line, still G: no confusion
jecture claims, there should be no nontrivial solution should arise as we shall, henceforth, only consider
to the ultraviolet problem for scalar 4 fields in graphs G with each line carrying also a scale label.
d = 4. But this is far from being proved, even at a The scale labels added on the lines of the graph G
heuristic level. The situation is simpler if d  5: in allow us to organize the vertices of G into
such cases, it is impossible to find formal power clusters: a cluster of scale h consists in a maximal
series in  for (1=jj) log ZN (, f ), even allowing set of vertices (of the graph elements in the graph)
N , N , N , N to be formal power series in  with connected by lines of scale h0  h among which one
divergent coefficients. at least has scale h.
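The definition of a cluster of scale h (a maximal set of vertices connected by lines of scale at least h, one of which has scale exactly h) can be made concrete with a short sketch. It assumes, purely for illustration, that a graph is given as a list of vertices and a list of triples (a, b, h) for a line of scale h joining a and b; the function name and this data layout are not taken from the text.

```python
from collections import defaultdict

def clusters_by_scale(vertices, lines):
    """Return the nontrivial clusters as (scale, frozenset of vertices) pairs.

    Lines are added from the highest scale down.  After all lines of scale
    >= h have been added, every connected component that contains a line of
    scale exactly h is a cluster of scale h.
    """
    parent = {v: v for v in vertices}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    by_scale = defaultdict(list)
    for a, b, h in lines:
        by_scale[h].append((a, b))

    clusters = []
    for h in sorted(by_scale, reverse=True):
        for a, b in by_scale[h]:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb
        # Components touched by a line of scale exactly h.
        roots = {find(a) for a, _ in by_scale[h]}
        for root in roots:
            members = frozenset(v for v in vertices if find(v) == root)
            clusters.append((h, members))
    return clusters

if __name__ == "__main__":
    example_lines = [(1, 2, 3), (3, 4, 2), (2, 3, 1)]
    for scale, members in clusters_by_scale([1, 2, 3, 4], example_lines):
        print(scale, sorted(members))
```

Ordering the resulting sets by inclusion gives the hierarchy of boxes, hence the tree, described in the text.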
Figure 4 The simplest new divergent subgraph in d = 4.

It is convenient to consider the vertices of the graph elements as "trivial clusters" of highest scale: conventionally call them clusters of scale N + 1. The clusters can be of "first generation" if they contain only trivial clusters, of "second generation"
if they contain only clusters which are trivial or of the first generation, and so on.

Imagine enclosing in a box the vertices of graph elements inside a cluster of the first generation and then into a larger box the vertices of the clusters of the second generation and so on: the set of boxes ordered by inclusion can then be represented by a rooted tree graph whose nodes correspond to the clusters and whose "top points" are nodes representing the trivial clusters (i.e., the vertices of the graph).

If the maximum number of nodes that have to be crossed to reach a top point of the tree starting from a node v is $n_v$ (v included and the top nodes included), then the node v represents a cluster of the $n_v$th generation. The first node before the root is a cluster containing all vertices of G and the root of the tree will not be considered a node and it can conventionally bear the scale label 0: it represents symbolically the value of the graph.

For instance, in Figure 5 a tree is drawn: its nodes correspond to clusters whose scale is indicated next to them; in the second part of the drawing, the trivial clusters as well as the clusters of the first generation are enclosed into boxes.

Figure 5 A tree and its clusters of generation 1 and 2.

Then consider the next generation clusters, that is, the clusters which only contain clusters of the first generation or trivial ones, and draw boxes enclosing all the graph vertices that can be reached from each of them by descending the tree, etc. Figure 6 represents all boxes (of any generation) corresponding to the nodes of the tree in Figure 5. The representations of the clusters of a graph G by a tree or by hierarchically ordered boxes (see Figures 5 and 6) are completely equivalent provided inside each box not representing a top point of the tree the scale $h_v$ of the corresponding cluster v is marked. For instance, in the case of Figure 6 one gets Figure 7.

Figure 6 All clusters of any generation for the tree in Figure 5.

Figure 7 The clusters in Figure 6 after affixing the scale labels.

By construction, if two top points x and $\eta$ are inside the same box $b_v$ of scale $h_v$ but not in inner boxes, then there is a path of graph lines joining x and $\eta$ all of which have scales $\ge h_v$ and one at least has scale $h_v$.

Given a graph G, fix one of its points $x_1$ (say) and integrate the absolute value of the graph over the positions of the remaining points. The exponential decay of the propagators implies that if a point $\eta$ is linked to a point $\eta'$ by a line of scale h the integration over the position of $\eta'$ is essentially constrained to extend only over a distance $\gamma^{-h} m^{-1}$. Furthermore, the maximum size of the propagator associated with a line of scale h is bounded proportionally to $\gamma^{(d-2)h}$. Therefore, recalling that $|f_x|$ is supposed bounded by 1, the mentioned integral can be immediately bounded by

$$\frac{\lambda^n}{n!\,r!}\,C^{n+r}\,I \;\overset{\mathrm{def}}{=}\; \frac{\lambda^n C^{n+r}}{n!\,r!}\,\prod_{\ell}\gamma^{\frac{d-2}{2}h_\ell}\,\prod_{v}\gamma^{-d\,h_v(s_v-1)} \qquad [16]$$

where, C being a suitable constant, the first product is over the half-lines $\ell$ composing the graph lines and the second is over the tree nodes (i.e., over the clusters of the graph G), $s_v$ is the number of subclusters contained in the cluster v but not in inner clusters; and in [16] the scale of a half-line $\ell$ is $h_\ell$ if $\ell$ is paired with another half-line to form a line (in the graph G) of scale label $h_\ell$.

Denoting by $v'$ the cluster immediately containing v in G, by $n^{inner}_v$ the number of half-lines in the cluster v, by $n_v$, $r_v$ the numbers of graph elements of the first type or of the fourth type in Figure 1 with vertices in the cluster v, and denoting by $n^e_v$ the number of lines which are not in the cluster v but have one extreme on a vertex in v ("lines external to v"), the identities (k = 0)

$$\sum_{v>\mathrm{root}} (h_v - k)(s_v - 1) = \sum_{v>\mathrm{root}} (h_v - h_{v'})(n_v + r_v - 1)$$
$$\sum_{v>\mathrm{root}} (h_v - k)\,n^{inner}_v = \sum_{v>\mathrm{root}} (h_v - h_{v'})\,\tilde n^{inner}_v \qquad [17]$$

with

$$\tilde n^{inner}_v \;\overset{\mathrm{def}}{=}\; 4 n_v + r_v - n^e_v$$

hold, so that the estimate [16] can be elaborated into

$$I = \prod_{v>r}\gamma^{-\rho_v (h_v - h_{v'})}, \qquad \rho_v \;\overset{\mathrm{def}}{=}\; -d + (4-d)\,n_v + \frac{d+2}{2}\,r_v + \frac{d-2}{2}\,n^e_v \qquad [18]$$

where $h_{v'} = k = 0$ if v is the first nontrivial node (i.e., $v'$ = root), and an estimate of the integral of the absolute value of the graphs G with given tree structure but different scale labels is proportional to $\sum_{\{h_v\}} I < \infty$ if (and only if) $\rho_v > 0$, $\forall v$.

But there may be clusters v with only two external lines, $n^e_v = 2$, and two graph vertices inside: for which $\rho_v = 0$ (indeed $-3 + 2 + 1 = 0$ for d = 3, $n_v = 2$, $r_v = 0$). However, this can happen only if d = 3 and in only one case: namely if the graph G contains a subgraph of the second type in Figure 2 and the three intermediate lines form a cluster v of scale $h_v$ while the other two lines are external to it: hence on scale $h' < h_v$. In this case, one has to remember that the subtraction in the previous section has led to a modification of the contribution of such a subgraph to the value of the graph (integrated over the position labels of the vertices). As discussed in the previous section, the change amounts to replacing the propagator $C^{(h')}_{\eta b}$ by $C^{(h')}_{\eta b} - C^{(h')}_{x b}$.

This improves, in [18], the estimate of the contribution of the line joining $\eta$ to b from being proportional to $\int C^{(h_v)3}_{x\eta}\, C^{(h')}_{\eta b}\, d\eta$ to being proportional to $\int C^{(h_v)3}_{x\eta}\,(C^{(h')}_{\eta b} - C^{(h')}_{x b})\, d\eta$; and this changes the contribution of the line $\eta b$ from $\gamma^{(d-2)h'}$ to $\int e^{-m\gamma^{h_v}|x-\eta|}\,(\gamma^{h'}|x-\eta|)^{1/2}\, d\eta$ because $C^{(h')}$ is regular on scale $\gamma^{-h'} m^{-1}$, see [10] with $\varepsilon = 1/2$.

Since x, $\eta$ are in a cluster of higher scale $h_v$ this means that the estimate is improved by $\gamma^{-(1/2)(h_v - h')}$. In terms of the final estimate, this means that $\rho_v$ in [18] can be improved to $\bar\rho_v = \rho_v + 1/2$ for the clusters for which $\rho_v = 0$. Hence, the integrated value of the graph G (after taking also into account the integration over the initially selected vertex $x_1$, trivially giving a further factor $|\Lambda|$ by translation invariance), and summed over the possible scale labels is bounded proportionally to $|\Lambda| \sum_{\{h_v\}} I < \infty$ once the estimate of I is improved as described.

Note that the graphs contributing to the perturbation series for $(1/|\Lambda|) \log Z_N(\Lambda, f)$ to order n are finitely many because the number r of external vertices is $r \le 2n + 2$ (since graphs must be connected). Hence, the perturbation series is finite to all orders in $\lambda$.

The above is the renormalizability proof of the scalar $\varphi^4$-fields in dimension d = 2, 3. The theory is renormalizable even if d = 4 as mentioned in the remark at the end of the previous section. The analysis would be very similar to the above: it is just a little more involved power-counting argument.

For more details, the reader is referred to Hepp (1966), Gallavotti (1985), sections 8 and 16.

Asymptotic Freedom (d = 2, 3). Heuristic Analysis

Finiteness to all orders of the perturbation expansions is by no means sufficient to prove the existence of the ultraviolet limit for $Z_N(\Lambda, f)$ or for $(1/|\Lambda|) \log Z_N(\Lambda, f)$: and a priori it might not even be necessary. For this purpose, the first step is to check uniform (upper and lower) boundedness of $Z_N(\Lambda, f)$ as $N \to \infty$.

The reason behind the validity of a bound $e^{|\Lambda| E_-(\lambda, f)} \le Z_N(\Lambda, f) \le e^{|\Lambda| E_+(\lambda, f)}$ with $E_\pm(\lambda, f)$ cutoff independent has been made very clear after the introduction of the renormalization group methods in field theory. The approach studies the integral $Z_N(\Lambda, f)$ recursively, decomposing the field $\varphi^{(N)}_x$ into its regular components $z^{(h)}_x$, see [7], and integrating first over $z^{(N)}$, then over $z^{(N-1)}$ and so on.

The idea emerges naturally if the potential $V_N$ in [1] and [4] is written in terms of the normalized variables $X^{(N)}_x \overset{\mathrm{def}}{=} \gamma^{-N(d-2)/2} \varphi^{(N)}_x$, see [6]; here if d = 2 the factor $\gamma^{-\frac{d-2}{2}N}$ is interpreted as $N^{-1/2}$.

The key remark is that as far as the integration over the small-scale component $z^{(N)}$ is concerned the field $X^{(N)}_x$ is a sum of two fields of size of order 1 (statistically),

$$X^{(N)}_x = z^{(N)}_x + \gamma^{-\frac{d-2}{2}} X^{(N-1)}_x$$

if d = 2 this becomes

$$X^{(N)}_x = \frac{1}{N^{1/2}}\, z^{(N)}_x + \frac{(N-1)^{1/2}}{N^{1/2}}\, X^{(N-1)}_x$$

and it can be considered to be smooth on scale $m^{-1}\gamma^{-N}$ (also statistically). Hence, approximately constant and of size of order O(1) on the small cubes $\Delta$ of volume $\gamma^{-dN} m^{-d}$ of the pavement $Q_N$ introduced before [7]; at the same time it can be considered to take (statistically) independent values on different cubes of $Q_N$. This is suggested by the inequalities [8]-[10].

Therefore, it is natural to decompose the potential $V_N$, see [5], as a sum over the small cubes $\Delta$ of volume $\gamma^{-dN} m^{-d}$ of the pavement $Q_N$ as (see [14] for the definition of $\mu_N$, $\nu_N$), taking henceforth m = 1,

$$V_N(z^{(N)}) \;\overset{\mathrm{def}}{=}\; -\gamma^{-Nd} \sum_{\Delta \in Q_N} \int_\Delta \Big( \lambda\,\gamma^{2(d-2)N} X^{(N)4}_x + \mu_N\,\gamma^{(d-2)N} X^{(N)2}_x + \nu_N + f_x\,\gamma^{\frac{d-2}{2}N} X^{(N)}_x \Big) \frac{dx}{|\Delta|} \qquad [19]$$

where  (d2)N is interpreted as N if d = 2. Hence, if divergent when the fields were not properly scaled,
d = 3 it is are in fact of the same order or much smaller than
the main 4 -term.
VN zN Therefore, the integration over z(N) can be, heur-
X Z 
def N N 4 N 2 istically, performed by techniques well established
  Xx N Xx
2QN  in statistical mechanics (i.e., by straightforward
 dx perturbation expansions): at least if the field
X(N1)
3 N
 N fx  2N Xx 20 x
is smooth and bounded, as prescribed
jj by [6], with B = BN1 growing as a power of N.
where In this case, denoting symbolically the integration
over z(N) by P or by h. . .i, it can be expected that it
def should give
N 6cN 2 N N c0N ;
def Z  
 N 3c2N 2  N bN 3 N 2N b0N eVN dP zN  eVj;N1 Rj;Njj 22
and cN , c0N , bN , b0N , computable from [15] and [14],
admit a limit as N ! 1. While if d = 2 it is where
R Vj; N1 is the Taylor expansion of
log eVN dP(z(N) ) in powers of  (hence essentially
VN zN in the very small parameter  (4d)N ) truncated at
X Z  order j, that is,
def 2 2N N 4 N 2
 N  Xx N Xx
2QN  V1;N1 hVN i1
 dx " #2
32 N
2
hVN i  hVN i2
 N fx N Xx 21 V2;N1 hVN i
jj 2!
"
def 2
hVN i  hVN i2
where N = 6cN and  N = 3c2N and cN , compu- V3;N1 hVN i
2!
table from [13], admits a limit as N ! 1.  #
The fields z(N) and X(N1) can be considered 2
hVN hVN i  hVN i2 i  hVN hVN
2
i  hVN i2 3
; ...
constant over boxes  2 QN : z(N) x
= s , X(N1)
x
= x 3!
for x 2  and the s can be considered statistically 23
independent on the scale of the lattice QN .
j
Therefore, [20] and [21] show that integration over where [] denotes truncation to order j in ,
z(N) in the integral defining ZN (, f ) is not too and R(j, N) is a remainder (depending on (N1)
x
)
different from the computation of a partition func- which can be expected to be estimated, for d = 2, 3, by
tion of a lattice continuous spin model in which the
spins are s and, most important, interact extre- jRj; Nj  Rj; N
mely weakly if N is large. In fact, the coupling def 4j
Cj BN  N 2  4dN j1  dN 24
constants are of order of a power of jX(N1) j times
O( N ) if d = 3 (O(N 2  2N ) if d = 2), or of order for suitable constants Cj , that is, a remainder
O( N(d2)=2 max jfx j), no matter how large  and f. estimated by the (j 1)th power of the coupling
This says that the smallest scale fields are times the number of boxes of scale N in . The
extremely weakly coupled. The fields X(N1) can be relations [22][24] resultR from a naive Taylor
regarded as external fields of size that will be called expansion (in  of the log eVN dP(z(N) ), taking into
BN1 , of order 1 or even allowed to grow with a account that, in VN as a function of z(N) , the z(N) s
power of N, see [6]. Their presence in VN does not appear multiplied by quantities at most of size
affect the size of the couplings, as far as the analysis  4d N 2 B3N , by [20] and [21] if jX(N1) j  BN1 ).
of the integral over z(N) is concerned, because the In a statistical mechanics model for a lattice spin
couplings remain exponentially small in N, see [20] system, such a calculation of ZN would lead to a
and [21], being at worst multiplied by a power of mean-field equation of state once the remainder was
BN1 , i.e., changed by a factor which is a power of N. neglected.
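The truncated expansion [23] is the cumulant expansion of the logarithm of an average, cut at order j. A minimal numerical sketch on a single box can illustrate why a small effective coupling makes the truncation accurate. It assumes, as a caricature and not as the construction of the text, one standard Gaussian variable z in place of $z^{(N)}$, a frozen background value X in place of the field of the lower scales, and a potential $V(z) = -\epsilon\,(z + X)^4$ in which $\epsilon$ stands for the small effective coupling. All names and numerical values are hypothetical.

```python
import math

def gaussian_mean(g, half_width=12.0, step=0.01):
    """E[g(z)] for a standard Gaussian z, by the trapezoidal rule."""
    n = int(round(2.0 * half_width / step))
    total = 0.0
    for i in range(n + 1):
        z = -half_width + i * step
        weight = 1.0 if 0 < i < n else 0.5
        total += weight * g(z) * math.exp(-0.5 * z * z)
    return total * step / math.sqrt(2.0 * math.pi)

def truncated_log_expectation(V, order):
    """Cumulant expansion of log E[exp(V)] truncated at the given order (1 to 3)."""
    mean = gaussian_mean(V)
    value = mean                                    # first cumulant
    if order >= 2:                                  # second cumulant / 2!
        value += gaussian_mean(lambda z: (V(z) - mean) ** 2) / 2.0
    if order >= 3:                                  # third cumulant / 3!
        value += gaussian_mean(lambda z: (V(z) - mean) ** 3) / 6.0
    return value

eps, X = 1e-3, 0.5                                  # hypothetical coupling and background

def V(z):
    return -eps * (z + X) ** 4

exact = math.log(gaussian_mean(lambda z: math.exp(V(z))))
errors = [abs(exact - truncated_log_expectation(V, j)) for j in (1, 2, 3)]

if __name__ == "__main__":
    for j, err in zip((1, 2, 3), errors):
        print("order", j, "truncation error", err)
```

Since V is proportional to $\epsilon$, the cumulant of order k is of order $\epsilon^k$, so here truncating the cumulant series and truncating in powers of the coupling coincide; in the field theory the remainder must in addition be multiplied by the number of boxes, as in [24].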
The smallness of the coupling at small scale is a The peculiarity of field theory is that a relation like
property called asymptotic freedom. Once fields [22] and [24] has to be applied again to Vj; N1 to
and coordinates are correctly scaled, the real size perform the integration over z(N1) and define Vj; N2
of the coupling becomes manifest, that is, it is and, then, again to Vj; N2 . . . . Therefore, it will be
extremely small and the addends in VN proportional essential to perform the integral in [22] to an order
to the counter-terms N , N , which looked (in ) high enough so that the bound R(j, N) can be

summed over N: this requires (see [24]) an explicit The relevant part in d = 2 is simply of the form
calculation of [23] pushed at least to order j = 1 if [21] with h replacing N: call it Vh(rel, 1) . If d = 3, it is
d = 2 or to order j = 3 if d = 3; furthermore it is also given by [20] with h replacing N plus, for h < N, a
necessary to check that the resulting Vj; N1 can still second nonlocal term
be interpreted as low-coupling spin model so that Z
2 
[22] can be iterated with N  1 replacing N and then rel;2 def 4 3! 2 h 3 N 3
Vh  Chh0  Chh0
with N  2 replacing N  1, . . . . 2! 2!
The first necessary check towards a proof of the  2
h h

h  h0 dhdh0
discussed heuristic expectations is that, defining
recursively Vj; h from Vj, h1 for h = N  1, . . . , 1, 0
which is conveniently expressed in terms of a
by [23] with VN replaced by Vj; h1 and Vj; N1
nonlocal field
replaced by Vj; h , the couplings between the variables
z(h) do not become worse than those discussed in h h
h  h0
the case h = N. Furthermore, the field (N1)
x
has a h def
Yhh0 1
high probability of satisfying [6], but fluctuations  h jh  h0 j4
are possible: hence the R-estimate has to be rel rel;1 rel;2
combined with another one dealing with the large as Vh Vh Vh with
fluctuations of X(N1)
x
which has to be shown to be rel;2 def
X Z h2 h
not worse. Vh 2  2h Yhh0 Ahh0
;0 2Q 
0
For more details, the reader is referred to Gallavotti h

(1978, 1985) and Benfatto and Gallavotti (1995). 0 h 0 dhdh0



ec  jhh j
25
jj j0 j
where
Effective Potentials and Their
h
!
Scale (In)Dependence Ahh0
0<a < a0
To analyze the first problem mentioned at the end of  h jh  h0 j31=2 N
the previous section, define Vj; h by [23] with VN
0 0
replaced by Vj; h1 for h = N  1, N  2, . . . , 0. The with a, a , c > 0 and the subscript N means that the
quantities Vj; h , which are called effective poten- expression in parenthesis saturates at scale N, i.e.,
tials on scale h (and order j), turn out to be in a its denoninator becomes  (3(1=2))(hN) as jh  h0 j ! 0.
natural sense scale independent: this is a conse- The expression [25] is not the full part of the
quence of renormalizability, realized by Wilson as a potential Vj; h which is of second order in the fields:
much more general property which can be checked, there are several other contributions which are
in the very special cases considered here with collected below as irrelevant.
d = 2, 3, at fixed j by induction, and in the super- It should be stressed that irrelevant is a
renormalizable models considered here it requires traditional technical term: by no means it should
only an elementary computation of a few Gaussian suggest negligibility. On the contrary, it could be
integrals as the case j = 3 (or even j = 1 if d = 2) is maintained that the whole purpose of the theory is
already sufficient for our purposes. to study the irrelevant terms. The irrelevant part of
It can also be (more easily) proved for general j by the potential can be better designated as the driven
a dimensional argument parallel to the one pre- part, as its behavior is controlled by the relevant
sented earlier to check finiteness of the renormalized part: although initially Vj; h , h = N, contains
series. The derivation is elementary but it should be no irrelevant terms, it eventually contains them for
stressed that, again, it is possible only because of the h < N and they keep getting generated as h
special choice of the counter-terms N , N . If d = 3, diminishes. Furthermore, the part of the irrelevant
the boundedness and smoothness of the fields ( h) terms generated at scale h0  N becomes very small
and z(h) expressed by the second of [6] and of [10] is at scales h h0 so that the irrelevant part of Vj; h at
essential; while if d = 2 the smoothness is not small h (e.g., at h = 0, i.e., on the physical scale of
necessary. the observer) only depend on the relevant terms in a
The structure of Vj; h is conveniently expressed few scales near h.
in terms of the fields X(h) x
, as a sum of three terms It also turns out that the Schwinger functions are
Vh(rel) (standing for relevant part), Vh(irr) (standing simply related to the irrelevant terms.
for irrelevant part), and a field independent The irrelevant part of the effective potential can
part E(j, h)jj. be expressed as a finite sum of integrals of

monomials in the fields X(h)


x
if d = 2, or in the fields Remarks
X(h)
x
(h)
and Yhh 0 if d = 3, which can be written as V
(irr)
j; h (i) Checking scale R independence for j = 1 is just
given by
checking that P(dz(h) )V1; h = V1; h1 . Note that
Z Y
p q
Y  Z  
h nk h n0 h 0
c dx1 ;...;h0q n ht def h4 h h2 h2
Xx Yh 0 h0k0 e   V1;h  x  6C00 x 3C00 dx
k k k0 
k1 k0 1
p
Y dxk
q
Ydhk0 dh0k0 hence, calling :x(h)4 : the polynomial in the integral

Wx1 ; . . . ; h0q 26 (Wicks monomial of order 4), we have here an
jk j k0 1 j1k0 j j2k0 j
k1 elementary Gaussian integral (martingale property
of Wick monomials). Note the essential role of the
with the integral extended to products 1
  

counter-terms. For j > 1, the computation is similar


p
  
(1q
2q ) of boxes  2 Qh , and
but it involves higher-order polynomials (up to 4j)
d(x1 , . . . , h0q ) is the length of the shortest tree
and the distinction between d = 2 and d = 3
graph that connects all the p 2q > 0 points, the
becomes important.
exponents n, t are  2, and t is  3 if q > 0;
(ii) Vj; 0 contains only the field-independent part
the kernel W depends on all coordinates
Qq x1 , . . . , h0q
E(j, 0)jj (see above) which is just a number (as
and it is bounded P abovePby Cj k0 = 1 Ahk0 hk0 for some
0
there are no fields of scale 0): by the above
Cj ; the sums nk n0k0 cannot exceed 4j. The
definitions, it is identical to the perturbative
test functions f do not appear in [26] because by
expansion truncated to jth order in  of
assumption they are bounded by 1: but W depends
log ZN (, f ), well defined as discussed earlier.
on the f s as well.
The field-independent part is simply the value
of log ZN (, f ) computed by the perturbation Nonperturbative Renormalization:
analysis in the section Perturbation theory up to Small Fields
order j in  but using as propagator (C(N)  C(h) ):
thus, E(j, h) is a constant depending on N but Having introduced the notion of effective potential
uniformly bounded as N ! 1 (because of the Vj; h , of order j and scale h, satisfying the bounds
renormalizability proved in the section Perturba- (described after [26]) on the kernels W representing
tion theory). it, the problem is to estimate the remainder in [22]
If d = 2, there is no need to introduce the nonlocal and find its relation with the value [24] given by the
fields Y (h) and in [26] one can simply take q = 0, heuristic Taylor expansion. Assume  < 1 to avoid
and the relevant part also can be expressed by distinguishing this case from that with   1 which
omitting the term Vh(rel, 2) in [25]: unlike the d = 3 would lead to very similar estimates but to different
case, the estimate on the kernels W by an -dependence on some constants.
N-independent Cj holds uniformly in h without Define B (z(h) ) = 1 if kz(h) k  Bh2 for all  2 Qh ,
having to introduce Y. For d = 2, it will therefore be see [8], and 0 otherwise; then the following lemma
supposed that Vh(rel, 2)  0 in [25] and q = 0 in [26]. holds:
It is not necessary to have more information on Lemma 1 Let kX(h) k be defined as [8] with z
the structure of Vj; h even though one can find simple replaced by X and suppose kX(h) k  Bh4 for all 
graphical rules, closely related to the ones in the then, for all j  1, it is
section Perturbation theory, to construct the Z
coefficients W in full detail. The W depend, of eVj;h1 B zh1 dPzh1
course, on h but the uniformity of the bound on W
0
is the only relevant property and in this sense the eVj;h R j;h1jj 27
effective potentials are said to be (almost) scale
independent. with, for suitable constants c , c0 ,
The above bounds on the irrelevant part can jR0 j; h 1j  R j; h 1
be checked by an elementary direct computation if
def 0 2
h12
j  3: in spite of its elementary character, the Rj; h 1 c ec B
uniformity in h  N is a result ultimately playing an
and R(j; h 1) given by [24] with h 1 in place
essential role in the theory together with the
of N.
dominance of the relevant part over the irrelevant R Q
one which, once the fields are properly scaled, is Since ZN (, f )  eVN N (h) (h)
h = 1 B (z )P(dz ) this
much smaller (by a factor of order  h , see [26]), immediately gives a lower bound on E = (1=jj)
0
at least if h is large. log ZN (, f ): in fact if B (kz(h ) k) = 1 for

h0 = 1, . . . , h, then kX(h) k  c Bh04 for some c so is  N 2  2N (BN 4 ) < 0 and it overwhelmingly
that, by recursive PN application of Lemma 1, dominates on the remaining terms whose value is
ZN (, f )  eVj, 0  h = 1 R (j, h)jj . By the remark at the bounded by a similar expression with a smaller
end of the previous section, given j the lower bound power of N. Then if E c def= =E denotes the comple-
on E just described agrees with the perturbation ment in  of a set E  :
expansion of E = (1=jj) log ZN (, f ) truncated to Lemma 2 Let d = 2. Define Vh (Dch ) to be given by
order j (in ) up to an error bounded by
P the expression [22] with the integrals extending over
1
h = 1 R (j, h). j =Dh and define R(j, h 1) by [24]. Then
Z  
Remark The problem solved by Lemma 1 is c c

usually referred to as the small-field problem, to eVh1 Dh1 dP zh1 eVh Dh R j;h1jj 28
contrast it with the large-field problem discussed
later. The proof of the lemma is a simple Taylor where jR (j, h 1j  R (j, h 1 def
= R(j; h 1)
c0 B2 (h1)2
expansion in  h if d = 3 or in h2  2h if d = 2 to c e with suitable c , c0 .
order j (in ). The constraint on z(h1) makes the Remark Lemma 2 is genuinely not perturbative
integrations over z(h1) , necessary to compute Vj; h and making essential use of the positivity of .
from Vj; h1 , not Gaussian. But the tail estimates [9], Below the analysis of the proof of the lemma, which
together with the Markov property of the distribu- consists essentially in its reduction to Lemma 1, is
tion of z(h) can be used to estimate the difference described in detail. It is perhaps the most interesting
with respect to the Gaussian unconstrained integra- part and the core of the theory of the proof that
tions of z(h1) : and the result is the addition of the truncating the expansion in  of (1=jj) log ZN (, f )
small tail error changing R into R in [27]. The to order j gives as a result an estimate exact to order
estimate of the main part of the remainder R would j1 of (1=jj) log ZN (, f ).
be obvious if the fields z(h) were independent on
boxes of scale  h : they are not independent but Let RN be the cubes  2 QN in which there is at
they are Markovian and the estimate can be done by least one point x where jz(N)
x
j  BN 2 . By definition,
taking into account the Markov property. the region DN =DN1 is covered by RN .
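The orders j = 1 (d = 2) and j = 3 (d = 3) quoted for the truncation follow from a one-line exponent count. The remainder is the (j + 1)th power of a coupling of size $\gamma^{-(4-d)h}$ (up to powers of h) times the number of boxes per unit volume, $\gamma^{dh}$, so it behaves like $\gamma^{[d-(4-d)(j+1)]h}$ and is summable over h only if the exponent is negative. The sketch below encodes just this count, ignoring the polynomial factors in h and B; the function names are hypothetical.

```python
def remainder_exponent(d, j):
    """Exponent e such that the remainder of order j scales like gamma**(e*h)."""
    return d - (4 - d) * (j + 1)

def minimal_order(d):
    """Smallest truncation order j making the remainder summable over the scales h."""
    if d >= 4:
        raise ValueError("the count applies only to d < 4, where the coupling decays with h")
    j = 0
    # An exponent equal to 0 is not enough: the prefactor grows like a power of h.
    while remainder_exponent(d, j) >= 0:
        j += 1
    return j

if __name__ == "__main__":
    for d in (2, 3):
        j = minimal_order(d)
        print("d =", d, "minimal j =", j, "exponent =", remainder_exponent(d, j))
```

The strict inequality matters: for d = 2 and j = 0 the exponent vanishes and the powers of h left in the bound would make the sum over scales diverge.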
Remark that in the region DN1 =RN the field
For more details, the reader is referred to Wilson X(N1) is large but zN is not large so that X(N) is still
(1970, 1972), Gallavotti (1978, 1981, 1985), and very large: this is so because the bounds set to define
Benfatto et al. (1978). the regions D and R are quite different being BN 4
and BN 2 , respectively. Hence, if a point is in DN1
and not in RN , then the field X(N) must be of the
Nonperturbative Renormalization: Large order BN 3 . Therefore, by positivity of the (N)4
x
Fields, Ultraviolet Stability term (which dominates all other terms so that
The small-field estimates are not sufficient to obtain V (N) ((N)
x
) < 0 for x 2 DN [ (DN1 =RN )) we can
ultraviolet stability: to control the cases in which replace VN (DcN ) by V((DN [ (DN1 =RN ))c ), for the
jX(h) (h)
j > Bh4 for some x or some h, or jYxh j > Bh4 for purpose of obtaining an upper bound.
x
h
some jx  hj <  , a further idea is necessary and it Furthermore, modulo a suitable correction, it is
rests on making use of the assumption that  > 0 possible to replace V((DN [ (DN1 =RN ))c ) by
which, in a sense to be determined, should suppress V((DN1 [ RN )c ): because the integrand in VN is
the contribution to the integral defining ZN (, f ) bounded below by
coming from very large values of the field. Assume b 2N N 2
also  < 1 for the same reasons advanced in the
section Effective potentials and their scale if d = 2 (by b N if d = 3), for some b, so that the
(in)dependence. points in RN can at most lower V((DN [
Consider first d = 2. Let DN be the large-field (DN1 =RN ))c ) by bN 2  (4d)N #(RN ) if #RN is
region where jX(N)x
j > BN 4 and let VN (=DN ) be the number of boxes of QN in RN and V(x ) is
the integral defining the potential in [21] extended bounded below by its minimum: thus,
to the region =DN , complement of DN . This region
VDN1 [ RN c bN2  4dN #RN
is typically very irregular (and random as X itself is
random with distribution PN ). is an upper bound to V((DN [ (DN1 =RN ))c ).
An upper bound on the integral defining ZN (, f ) In the complement of DN1 [ RN , all fields are
is obtained by simply replacing eVN by eVN (=DN ) small; if X(N1) and RN are fixed this region is not
because in DN the first term in the integrand in [21] random (as a function of z(N) ) any more. Therefore,

if X(N1) , RN are fixed the integration over z(N) , quantity like b0 N 2  (4d)N (BN 4 )4 #(RN ) (because
conditioned to having z(N) fixed (and large) in the the reintroduction occurs in the region RN =DN1
region RN , is performed by means of the same which is covered by RN and in such points the field
argument necessary to prove Lemma 1 (essentially a Xx(N1) is not large, being bounded by B(N  1)4 );
Taylor expansion in  (4d)N ). The large size of so that their contribution to the effective potential
z(N) in RN does not affect too much the result is still dominated by the 4 -term and therefore by
because on the boundary of RN the field z(N) is  (4d)N times a power of BN 4 times the volume of
BN 2 (recalling that z(N) is continuous) and since RN (in units  N , i.e., #RN ). All this is taken care
the variable z(N) is Markovian, the boundary effect of by suitably fixing c00 .
decays exponentially from the boundary @RN : it
Note that the sum over RN of [29] is
adds a quantity that can be shown to be bounded by
the number of boxes in RN on the boundary of RN , 0 2
N4 00
 4dN N2 BN4 4  dN jj
hence by #RN , times b0 (N  1)2  (4d) (B(N  1)4 )4 1 c ec B ec
for some b0 .
The result of the integration over z(N) of (because  contains jj dN0 cubes of QN ); hence, it is
c B2 N 2
VN ((DN [(DN1 =RN ))c ) c e
e conditioned to the large-field bounded above by e for suitably defined
values of z (N)
in RN leads to an upper bound on
c , c0 .
R V
e N P(dz(N) ) as The same argument can be repeated for Vj; h (Dch )
with any h if Vj; h (Dch ) is defined by the sum over s
X c
eVj;N1 DN1 Rj;Njj in Qh of the same integrals as those in [25] and [26]
RN with j =Dh replacing j in the integration domains.
Y 2 2
#RN Applying Lemma 1 recursively with j  1 (if
0 00
 4dN N2 BN 4 4

c ec BN ec 29 d = 3 it would become necessary to take j  3), it
2RN follows that there exist N-independent upper and
lower P bounds E jj on log Z(, f ) of the form
where c, c0 , c00 are suitable constants: this is Vj; 0 1 c0 B2 h2
)jj for c , c0 > 0
h = 1 (R(j, h) c e
explained as follows. suitably chosen and -independent for  < 1.
1. Taylor expansion (in ) of the integral By the remark at the end of Sec.6, given j, the
c 2 (4d )N
eVN ((DN1 [RN ) )bN  #(RN )
(which, by cons- bounds just described agree with the perturbation
c
truction, is an upper bound on eVN (DN ) ) with expansion E(j, 0)jj  Vj; 0 of log Z(, f ) truncated
respect to the field z(N) , conditioned to be fixed toP order j (in ) up to the remainders
and large in RN , would lead to an upper bound as 1 h = 1 R (j, h). Hence, if B is chosen proportional
to log 1 def = log (e 1 ), the upper and lower
c 0 00
BN 4 4  4dN #RN bounds coincide to order j in  with the value
eVj; N1 DN1 [RN R j;Njjb
obtained by truncating to order j the perturbative
with R0 equal to [24] possibly with some C0j series.
replacing Cj . The second exponential on the RHS The latter remark is important as it implies
of [29] arises partly from the above correction not only that the bounds are finite (by the
b00 (BN 4 )4  (4d)N #(RN ) and partly from a section Perturbation theory) but also that the
contribution of similar form explained in (3) function (1=jj) log Z(, f ) is not quadratic in f:
below. already to order 1Rin  it is quartic in f (containing a
2. Integration over the large conditioning fields term equal to ( Cx, 0 fx dx)4 ).
fixed in RN is controlled by the second estimate The latter property is important as it excludes
in [9] (the tail estimate): the first factors in that the result is a Gaussian generating function.
parentheses in [29] is the tail estimate just Thus, the outline of the proof of Lemma 2, which
mentioned, i.e., the probability that z(N) is large together with Lemma 1 forms the core of the
in the region RN . The second factor is only partly analysis of the ultraviolet stability for d = 2, is
explained in (1) above. completed.
3. Without further estimates, the bound [29] would If d = 3, more care is needed because (very mild)
contain Vj; N1 ((DN1 [ RN )c ) rather than smoothness, like the considered Holder continuity
c
Vj; N1 (DN1 ). Hence, there is the need to change with exponent 1/4, of z, X is necessary to obtain the
the potential Vj;N1 ((DN1 [ RN )c ) by reintrodu- key scale independence property discussed in earlier:
cing the contribution due to the fields in therefore, the natural measure of the size of z(h) and
RN =DN1 in order to reconstruct Vj; N1 (DcN1 ). X(h) in a box  2 Qh is no longer the maximum of
Reintroducing this part of the potential costs a jz(h)
x
j or of jX(h) x
j. The region Dh becomes more

involved as it has to consist of the points $x$ where $|X^{(h)}_x| > B h^4$ and of the pairs $\eta, \eta'$ where

$$|Y_{\eta,\eta'}| \equiv \frac{|X^{(h)}_{\eta} - X^{(h)}_{\eta'}|}{(\gamma^{h}\,|\eta-\eta'|)^{1/4}} > B h^4$$

i.e., it is not just a subset of $\Lambda$.

However, if $d = 3$, the relevant part also contains the negative term $V^{(\mathrm{rel},2)}$, see [25]: and since it dominates over all other terms which contain a $Y$-field (because their couplings [25] are smaller by about $\gamma^{-h}$), the argument given for $d = 2$ can be adapted to the new situation. Two regions $D^1_h$, $D^2_h$ will be defined: the first consists of all the points $x$ where $|X^{(h)}_x| > B h^4$ and the second of all the pairs $\eta, \eta'$ where $|Y^{(h)}_{\eta,\eta'}| > B h^4$. The region $R_h$ will be the collection of all $\Delta \in Q_h$ where $\|z^{(h)}\|_\Delta > B h^2$, see [8] with = 0. Then $V(D^c_h)$ will be defined as the sum of the integrals in [25] and [26] with the integrals over $x_i$ further restricted to $x_i \notin D^1_h$ and those over pairs $\eta_i, \eta'_i$ further restricted to $(\eta_i, \eta'_i) \notin D^2_h$. With the new settings, Lemma 2 can be proved also for $d = 3$ along the same lines as in the $d = 2$ case.

For more details, the reader is referred to Wilson (1970, 1972), Benfatto et al. (1978), and Gallavotti (1981).

Ultraviolet Limit, Infrared Behavior, and Other Applications

The results on the ultraviolet stability are nonperturbative, as no assumption is made on the size of $\lambda$ (the assumption $\lambda < 1$ has been imposed in the last two sections only to obtain simpler expressions for the $\lambda$-dependence of various constants): nevertheless the multiscale analysis has allowed us to use perturbative techniques (i.e., the Taylor expansion in Lemmata 1, 2) to find the solution. The latter procedure is the essence of the renormalization group methods: they aim at reducing a difficult multiscale problem to a sequence of simple single-scale problems. Of course, in most cases, it is difficult to implement the approach and the scalar quantum fields in dimensions 2, 3 are among the simplest examples. The analysis of the beta function and of the running couplings, which appear in essentially all renormalization group applications, does not play a role here (or, better, their role is so inessential that it has even been possible to avoid mentioning them). This makes the models somewhat special from the renormalization group viewpoint: the running couplings at length scale $h$, if introduced, would tend exponentially to 0 as $h \to \infty$; unlike what happens in the most interesting renormalization group applications in which they either tend to zero only as powers of $h$ or do not tend to zero at all.

The multiscale analysis method, i.e., the renormalization group method, in a form close to the one discussed here has been applied very often since its introduction in physics and it has led to the solution of several important problems. The following is not an exhaustive list and includes a few open questions.

1. The arguments just discussed imply, with minor extra work, that $Z_N(\Lambda, f)$ as $N \to \infty$ not only admits uniform upper and lower bounds but also that the limit as $N \to \infty$ actually exists and is a $C^\infty$ function of $\lambda, f$. Its $\lambda$- and $f$-derivatives at $\lambda = 0$ and $f = 0$ are given by the formal perturbation calculation. In some cases, it is even possible to show that the formal series for $Z_N(\Lambda, f)$ in powers of $\lambda$ is Borel summable.

2. The problem of removing the infrared cutoff (i.e., the limit $\Lambda \to \infty$) is in a sense more a problem of statistical mechanics. In fact, it can be solved for $d = 2, 3$ by a typical technique used in statistical mechanics, the cluster expansion. This is not intended to mean that it is technically an easy task: understanding its connection with the low-density expansions and the possibility of using such techniques has been a major achievement that is not discussed here.

3. The third problem mentioned in the introduction, that is, checking the axioms so that the theory could be interpreted as a quantum field theory, is a difficult problem which required important efforts to control and which is not analyzed here. Its analysis in the $d = 2$ case can serve as an introduction to it.

4. Also the problem of keeping the ultraviolet cutoff and removing the infrared cutoff while the parameter $m^2$ in the propagator approaches 0 is a very interesting problem related to many questions in statistical mechanics at the critical point.

5. Field theory methods can be applied to various statistical mechanics problems away from criticality: particularly interesting is the theory of the neutral Coulomb gas and of the dipole gas in two dimensions.

6. The methods can be applied to Fermi systems in field theory as well as in equilibrium statistical mechanics. The understanding of the ground state in not exactly soluble models of spinless fermions in one dimension at small coupling is one of the results. And via the transfer matrix theory it has led to the understanding of nontrivial critical behavior in two-dimensional models that are not exactly soluble (like Ising next-nearest-neighbor or Ashkin-Teller model). Fermi systems are of particular interest also because in their analysis the large-fields problem is absent, but this great

technical advantage is somewhat offset by the anticommutation properties of the fermionic fields, which do not allow us to employ probabilistic techniques in the estimates.

7. An outstanding open problem is whether the scalar $\varphi^4$-theory is possible and nontrivial in dimension $d = 4$: this is a case of a renormalizable, not asymptotically free theory. The conjecture that many support is that the theory is necessarily trivial (i.e., the function $Z_N(\Lambda, f)$ becomes necessarily a Gaussian in the limit $N \to \infty$). One of the main problems is the choice of the ultraviolet cut-off; unlike the $d = 2, 3$ cases, in which the choice is a matter of convenience, it does not seem that the issue of triviality can be settled without a careful analysis of the choice and of the role of the ultraviolet cut-off.

8. Very interesting problems can be found in the study of highly symmetric quantum fields: gauge invariance presents serious difficulties to be studied (rigorously or even heuristically) because in its naive forms it is incompatible with regularizations. Rigorous treatments have been possible in some cases, and in a few cases it has been shown that the naive treatment is not only not rigorous but leads to incorrect results.

9. In connection with item (8) an outstanding problem is to understand relativistic pure gauge Higgs fields in dimension $d = 4$: the latter have been shown to be ultraviolet stable but the result has not been followed by the study of the infrared limit.

10. The classical gauge theory problem is quantum electrodynamics, QED, in dimension 4: it is a renormalizable theory (taking into account gauge invariance) and its perturbative series truncated after the first few orders gives results that can be directly confronted with experience, giving very accurate predictions. Nevertheless, the model is widely believed to be incomplete: in the sense that, if treated rigorously, the result would be a field describing free noninteracting assemblies of photons and electrons. It is believed that QED can make sense only if embedded in a model with more fields, representing other particles (e.g., the standard model), which would influence the behavior of the electromagnetic field by providing an effective ultraviolet cutoff high enough not to alter the predictions on the observations on the time and energy scales on which present (and, possibly, future over a long time span) experiments are performed. In dimension $d = 3$, QED is super-renormalizable, once the gauge symmetry is properly taken into account, and it can be studied with the techniques described above for the scalar fields in the corresponding dimension.

In general, constructive quantum field theory seems to be in a deep crisis: the few solutions that have been found concern very special problems and are very demanding technically; the results obtained have often not been considered to contribute appreciably to any progress. And many consider that the work dedicated to the subject is not worth the results that one can even hope to obtain. Therefore, in recent years, attempts have been made to follow other paths: an attitude that in the past usually did not lead to great achievements but that is always tempting and worth pursuing because the rare major advances made in physics resulted precisely from such changes of attitude, leaving aside developments requiring work which was too technical and possibly hopeless: just to mention an important case, one can recall quantum mechanics, which disposed of all attempts at understanding the observed quantization of atomic levels on the basis of refined developments of classical electromagnetism.

For more details, the reader is referred to Nelson (1966), Guerra (1972), Glimm et al. (1973), Glimm and Jaffe (1981), Simon (1974), Benfatto et al. (1978, 2003), Aizenman (1982), Gawedzki and Kupiainen (1983, 1985a, b), Balaban (1983), and Giuliani and Mastropietro (2005).

See also: Algebraic Approach to Quantum Field Theory; Axiomatic Quantum Field Theory; Euclidean Field Theory; Integrability and Quantum Field Theory; Perturbation Theory and its Techniques; Quantum Field Theory: A Brief Introduction; Scattering, Asymptotic Completeness and Bound States.

Further Reading

Aizenman M (1982) Geometric analysis of $\varphi^4$-fields and Ising models. Communications in Mathematical Physics 86: 1-48.
Balaban T (1983) $(\mathrm{Higgs})_{3,2}$ quantum fields in a finite volume. III. Renormalization. Communications in Mathematical Physics 88: 411-445.
Benfatto G, Cassandro M, Gallavotti G et al. (1978) Some probabilistic techniques in field theory. Communications in Mathematical Physics 59: 143-166.
Benfatto G, Cassandro M, Gallavotti G et al. (1980) Ultraviolet stability in Euclidean scalar field theories. Communications in Mathematical Physics 71: 95-130.
Benfatto G and Gallavotti G (1995) Renormalization Group, pp. 1-143. Princeton: Princeton University Press.
Benfatto G, Giuliani A, and Mastropietro V (2003) Low temperature analysis of two dimensional Fermi systems with symmetric Fermi surface. Annales Henri Poincaré 4: 137-193.
De Calan C and Rivasseau V (1981) Local existence of the Borel transform in euclidean $\varphi^4_4$. Communications in Mathematical Physics 82: 69-100.
Fröhlich J (1982) On the triviality of $\lambda\varphi^4_d$ theories and the approach to the critical point in $d \ge 4$ dimensions. Nuclear Physics B 200: 281-296.

Gallavotti G (1978) Some aspects of renormalization problems in statistical mechanics. Memorie dell'Accademia dei Lincei 15: 23-59.
Gallavotti G (1981) Elliptic operators and Gaussian processes. In: Aspects Statistiques et Aspects Physiques des Processus Gaussiens, pp. 349-360. Colloques Internat. C.N.R.S, St. Flour. Publications du CNRS, Paris.
Gallavotti G (1985) Renormalization theory and ultraviolet stability via renormalization group methods. Reviews of Modern Physics 57: 471-569.
Gawedzki K and Kupiainen A (1983) Block spin renormalization group for dipole gas and $(\partial\varphi)^4$. Annals of Physics 147: 198-243.
Gawedzki K and Kupiainen A (1985a) Gross-Neveu model through convergent perturbation expansion. Communications in Mathematical Physics 102: 1-30.
Gawedzki K and Kupiainen A (1985b) Massless lattice $\varphi^4_4$ theory: rigorous control of a renormalizable asymptotically free model. Communications in Mathematical Physics 99: 197-252.
Giuliani A and Mastropietro V (2005) Anomalous universality in the anisotropic Ashkin-Teller model. Communications in Mathematical Physics 256: 681-735.
Glimm J, Jaffe A, and Spencer T (1973) In: Velo G and Wightman A (eds.) Constructive Field Theory, Lecture Notes in Physics, vol. 25, pp. 132-242. New York: Springer.
Glimm J and Jaffe A (1981) Quantum Physics. Springer.
Guerra F (1972) Uniqueness of the vacuum energy density and Van Hove phenomena in the infinite volume limit for two-dimensional self-coupled Bose fields. Physical Review Letters 28: 1213-1215.
Hepp K (1966) Théorie de la renormalisation. Lecture Notes in Physics, vol. 2. Heidelberg: Springer.
Nelson E (1966) A quartic interaction in two dimensions. In: Goodman R and Segal I (eds.) Mathematical Theory of Elementary Particles, pp. 69-73. Cambridge: M.I.T.
Osterwalder K and Schrader R (1973) Axioms for Euclidean Green's functions. Communications in Mathematical Physics 31: 83-112.
Simon B (1974) The $P(\varphi)_2$ Euclidean (Quantum) Field Theory. Princeton: Princeton University Press.
Streater RF and Wightman AS (1964) PCT, Spin, Statistics and All That. Benjamin-Cummings (reprinted Princeton University Press, 2000).
Wightman AS and Gårding L (1965) Fields as operator-valued distributions in relativistic quantum theory. Arkiv för Fysik 28: 129-189.
Wilson KG (1970) Model of coupling constant renormalization. Physical Review D 2: 1438-1472.
Wilson KG (1972) Renormalization of a scalar field in strong coupling. Physical Review D 6: 419-426.

Contact Manifolds
J B Etnyre, University of Pennsylvania, Philadelphia, PA, USA
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Contact geometry has been seen to underlie many physical phenomena and is related to many other mathematical structures. Contact structures first appeared in the work of Sophus Lie on partial differential equations. They reappeared in Gibbs' work on thermodynamics, Huygens' work on geometric optics, and in Hamiltonian dynamics. More recently, contact structures have been seen to have relations with fluid mechanics, Riemannian geometry, and low-dimensional topology, and these structures provide an interesting class of subelliptic operators.

After summarizing the basic definitions, examples, and facts concerning contact geometry, this article discusses the connections between contact geometry and symplectic geometry, Riemannian geometry, complex geometry, analysis, and dynamics. The article ends by discussing two of the most-studied connections with physics: Hamiltonian dynamics and geometric optics. References for other important topics in contact geometry (e.g., thermodynamics, fluid dynamics, holomorphic curves, and open book decompositions) are provided in the Further reading section.

Basic Definitions and Examples

A hyperplane field $\xi$ on a manifold $M$ is a codimension-1 sub-bundle of the tangent bundle $TM$. Locally, a hyperplane field can always be described as the kernel of a 1-form. In other words, for every point in $M$ there is a neighborhood $U$ and a 1-form $\alpha$ defined on $U$ such that the kernel of the linear map $\alpha_x : T_xM \to \mathbb{R}$ is $\xi_x$ for all $x$ in $U$. The form $\alpha$ is called a local defining form for $\xi$. A contact structure on a $(2n+1)$-dimensional manifold $M$ is a maximally nonintegrable hyperplane field $\xi$. The hyperplane field $\xi$ is maximally nonintegrable if for any (and hence every) locally defining 1-form $\alpha$ for $\xi$ the following equation holds:

$$\alpha \wedge (d\alpha)^n \neq 0 \qquad [1]$$

(this means that the form is, pointwise, never equal to 0). Geometrically, the nonintegrability of $\xi$ means that no hypersurface in $M$ can be tangent to $\xi$ along an open subset of the hypersurface. Intuitively, this means that the hyperplanes twist too much to be tangent to hypersurfaces (Figure 1). The pair $(M, \xi)$

is called a contact manifold and any locally defining form $\alpha$ for $\xi$ is called a contact form for $\xi$.

Figure 1 The standard contact structure on $\mathbb{R}^3$ given as the kernel of $dz - y\,dx$. Courtesy of Stephan Schönenberger.

Example 1 The most basic example of a contact structure can be seen on $\mathbb{R}^{2n+1}$ as the kernel of the 1-form $\alpha = dz - \sum_{i=1}^{n} y_i\,dx_i$, where the coordinates on $\mathbb{R}^{2n+1}$ are $(x_1, y_1, \ldots, x_n, y_n, z)$. This example is shown in Figure 1 when $n = 1$.

Example 2 Recall that on the cotangent space of any $n$-manifold $M$, there is a canonical 1-form $\lambda$, called the Liouville form. If $(q_1, \ldots, q_n)$ are local coordinates on $M$, then any 1-form can be expressed as $\sum_{i=1}^{n} p_i\,dq_i$, so $(q_1, p_1, \ldots, q_n, p_n)$ are local coordinates on $T^*M$. In these coordinates,

$$\lambda = \sum_{i=1}^{n} p_i\,\pi^* dq_i \qquad [2]$$

where $\pi : T^*M \to M$ is the natural projection map. The 1-jet space of $M$ is the manifold $J^1(M) = T^*M \times \mathbb{R}$ and can be considered as a bundle over $M$. The 1-jet space has a natural contact structure given as the kernel of $\alpha = dz - \lambda$, where $z$ is the coordinate on $\mathbb{R}$. Note that if $M = \mathbb{R}^n$ then we recover the previous example.

Example 3 The (oriented) projectivized cotangent space of a manifold $M$ is the set $P^*M$ of nonzero covectors in $T^*M$ where two covectors are identified if they differ by a positive real number, that is,

$$P^*M = (T^*M \setminus \{0\})/\mathbb{R}_+ \qquad [3]$$

where $\{0\}$ is the zero section of $T^*M$ and $\mathbb{R}_+$ denotes the positive real numbers. If $M$ has a metric then $P^*M$ can be easily identified with the space of unit covectors. Considering $P^*M$ as unit covectors, we can restrict the canonical 1-form $\lambda$ to $P^*M$ to get a 1-form $\alpha$ whose kernel defines a contact structure $\xi$ on $P^*M$. (Although there is no canonical contact form on $P^*M$, the contact structure $\xi$ is still well defined.) Note that if $M$ is compact then so is $P^*M$; so this gives examples of contact structures on compact manifolds.

If $\alpha$ and $\alpha'$ are two locally defining 1-forms for $\xi$, then there is a nonzero function $f$ such that $\alpha' = f\alpha$. Thus, $\alpha' \wedge (d\alpha')^n = f^{n+1}\,\alpha \wedge (d\alpha)^n$ is a nonzero top dimensional form on $M$ and if $n$ is odd then the orientation defined by the local defining form is independent of the actual form. Hence, when $n$ is odd, a contact structure defines an orientation on $M$ (this is independent of whether or not $\xi$ is orientable!). If $M$ had a preassigned orientation (and $n$ is odd), then the contact structure is called positive if it induces the given orientation and negative otherwise. One should be careful when reading the literature, as some authors build "positive" into their definition of contact structure, especially when $n = 1$. If there is a globally defined 1-form $\alpha$ whose kernel defines $\xi$, then $\xi$ is called transversally orientable or co-orientable. This is equivalent to the bundle $\xi$ being orientable when $n$ is odd or when $n$ is even and $M$ is orientable. In this article the discussion is restricted to transversely orientable contact structures.

Suppose that $\alpha$ is a contact form for $\xi$, then eqn [1] implies that $d\alpha|_\xi$ is a symplectic form on $\xi$. This is one sense in which a contact structure is like an odd-dimensional analog of a symplectic structure. A submanifold $L$ of a contact manifold $(M, \xi)$ is called Legendrian if $\dim M = 2 \dim L + 1$ and $T_pL \subset \xi_p$.

Example 4 A fiber in the unit cotangent bundle with the contact structure from Example 3 is a Legendrian sphere.

Example 5 Let $f : M \to \mathbb{R}$ be a function. Then $j^1(f)(q) = (q, df_q, f(q))$ is a section of the 1-jet space $J^1(M)$ of $M$; it is called the 1-jet of $f$. If $s$ is any section of the 1-jet space, then it is Legendrian if and only if it is the 1-jet of a function.

This observation is the basis for Lie's study of partial differential equations. More specifically, a first-order partial differential equation on $M$ can be considered as giving an algebraic equation on $J^1(M)$. Then, a section of $J^1(M)$ satisfying this algebraic equation corresponds to the 1-jet of a solution to the original partial differential equation if and only if it is Legendrian.

Recently, Legendrian submanifolds have been much studied. There are various classification results in three dimensions and several striking existence results in higher dimensions.

Local Theory

The natural equivalence between contact structures is contactomorphism. Two contact structures $\xi_0$ and

$\xi_1$ on manifolds $M_0$ and $M_1$, respectively, are contactomorphic if there is a diffeomorphism $f : M_0 \to M_1$ such that $f_*(\xi_0) = \xi_1$. All contact structures are locally contactomorphic. In particular, we have the following theorem.

Theorem 1 (Darboux's Theorem). Suppose $\xi_i$ is a contact structure on the manifold $M_i$, $i = 0, 1$, and $M_0$ and $M_1$ have the same dimension. Given any points $p_0$ and $p_1$ in $M_0$ and $M_1$, respectively, there are neighborhoods $N_i$ of $p_i$ in $M_i$ and a contactomorphism from $(N_0, \xi_0|_{N_0})$ to $(N_1, \xi_1|_{N_1})$. Moreover, if $\alpha_i$ is a contact form for $\xi_i$ near $p_i$, then the contactomorphism can be chosen to pull $\alpha_1$ back to $\alpha_0$.

Thus, locally all contact structures (and contact forms!) look like the one given in Example 1 above. Furthermore, contact structures are local in time. That is, compact deformations of contact structures do not produce new contact structures.

Theorem 2 (Gray's theorem). Let $M$ be an oriented $(2n+1)$-dimensional manifold and $\xi_t$, $t \in [0, 1]$, a family of contact structures on $M$ that agree off of some compact subset of $M$. Then there is a family of diffeomorphisms $\phi_t : M \to M$ such that $(\phi_t)_* \xi_t = \xi_0$.

In particular, on a compact manifold, all deformations of contact structures come from diffeomorphisms of the underlying manifold. The theorem is not true if the contact structures do not agree off of a compact set. For example, there is a one-parameter family of noncontactomorphic contact structures on $S^1 \times \mathbb{R}^2$.

Existence and Classification

Establishing the existence of contact structures on closed odd-dimensional manifolds is quite difficult. However, Gromov has shown that contact structures on open manifolds obey an h-principle. To explain this, we note that if $(M^{2n+1}, \xi)$ is a co-oriented contact manifold then the tangent bundle of $M$ can be written as $\xi \oplus \mathbb{R}$ and thus the structure group of $TM$ can be reduced to $U(n)$ (since $\xi$ has a conformal symplectic structure on it). Such a reduction of the structure group is called an almost contact structure on $M$. Clearly, a contact structure on $M$ induces an almost contact structure. If $M$ is an open manifold, Gromov proved that the inclusion of the space of co-oriented contact structures on $M$ into the space of almost contact structures on $M$ is a weak homotopy equivalence. In particular, if an open manifold meets the necessary algebraic condition for the existence of an almost contact structure, then the manifold has a co-oriented contact structure.

Lutz and Martinet proved a similar, but weaker, result for oriented closed 3-manifolds. More specifically, every closed oriented 3-manifold admits a co-oriented contact structure and in fact has at least one for every homotopy class of plane field. There has been much progress on classifying contact structures on 3-manifolds and here an interesting dichotomy has appeared. Contact structures break into one of two types: tight or overtwisted. Overtwisted contact structures obey an h-principle and are in general easy to understand. Tight contact structures have a more subtle, geometric nature. In higher dimensions there is much less known about the existence (or classification) of contact structures.

Relations with Symplectic Geometry

Let $(X, \omega)$ be a symplectic manifold. A vector field $v$ satisfying

$$L_v \omega = \omega \qquad [4]$$

(where $L_v \omega$ is the Lie derivative of $\omega$ in the direction of $v$) is called a symplectic dilation. A compact hypersurface $M$ in $(X, \omega)$ is said to have contact type if there exists a symplectic dilation $v$ in a neighborhood of $M$ that is transverse to $M$. Given a hypersurface $M$ in $(X, \omega)$, the characteristic line field $LM$ in the tangent bundle of $M$ is the symplectic complement of $TM$ in $TX$. (Since $M$ is codimension 1, it is coisotropic; thus, the symplectic complement lies in $TM$ and is one dimensional.)

Theorem 3 Let $M$ be a compact hypersurface in a symplectic manifold $(X, \omega)$ and denote the inclusion map $i : M \to X$. Then $M$ has contact type if and only if there exists a 1-form $\alpha$ on $M$ such that $d\alpha = i^*\omega$ and the form $\alpha$ is never zero on the characteristic line field.

If $M$ is a hypersurface of contact type, then the 1-form $\alpha$ is obtained by contracting the symplectic dilation $v$ into the symplectic form: $\alpha = \iota_v \omega$. It is easy to verify that the 1-form $\alpha$ is a contact form on $M$. Thus, a hypersurface of contact type in a symplectic manifold inherits a co-oriented contact structure.

Given a co-orientable contact manifold $(M, \xi)$, its symplectization $\mathrm{Symp}(M, \xi) = (X, \omega)$ is constructed as follows. The manifold is $X = M \times (0, \infty)$, and given a global contact form $\alpha$ for $\xi$ the symplectic form is $\omega = d(t\alpha)$, where $t$ is the coordinate on $\mathbb{R}$. (The symplectization is also equivalently defined as $(M \times \mathbb{R}, d(e^t\alpha))$.)

Example 6 The symplectization of the standard contact structure on the unit cotangent bundle

(see Example 3) is the standard symplectic structure on the complement of the zero section in the cotangent bundle.

The symplectization is independent of the choice of contact form $\alpha$. To see this, fix a co-orientation for $\xi$ and note that the manifold $X$ can be identified (in many ways) with the sub-bundle of $T^*M$ whose fiber over $x \in M$ is

$$\{\beta \in T^*_xM : \beta(\xi_x) = 0 \text{ and } \beta > 0 \text{ on vectors positively transverse to } \xi_x\} \qquad [5]$$

and restricting $d\lambda$ to this subspace yields a symplectic form $\omega$, where $\lambda$ is the Liouville form on $T^*M$ defined in Example 2. A choice of contact form $\alpha$ fixes an identification of $X$ with the sub-bundle of $T^*M$ under which $d(t\alpha)$ is taken to $d\lambda$.

The vector field $v = t\,\partial/\partial t$ on $(X, \omega)$ is a symplectic dilation that is transverse to $M \times \{1\} \subset X$. Clearly, $\iota_v \omega|_{M \times \{1\}} = \alpha$. Thus, we see that any co-orientable contact manifold can be realized as a hypersurface of contact type in a symplectic manifold. In summary, we have the following theorem.

Theorem 4 If $(M, \xi)$ is a co-oriented contact manifold, then there is a symplectic manifold $\mathrm{Symp}(M, \xi)$ in which $M$ sits as a hypersurface of contact type. Moreover, any contact form $\alpha$ for $\xi$ gives an embedding of $M$ into $\mathrm{Symp}(M, \xi)$ that realizes $M$ as a hypersurface of contact type.

We also note that all the hypersurfaces of contact type in $(X, \omega)$ look locally, in $X$, like a contact manifold sitting inside its symplectization.

Theorem 5 Given a compact hypersurface $M$ of contact type in a symplectic manifold $(X, \omega)$ with the symplectic dilation given by $v$, there is a neighborhood of $M$ in $X$ symplectomorphic to a neighborhood of $M \times \{1\}$ in $\mathrm{Symp}(M, \xi)$ where the symplectization is identified with $M \times (0, \infty)$ using the contact form $\alpha = \iota_v \omega|_M$ and $\xi = \ker \alpha$.

The Reeb Vector Field and Riemannian Geometry

Let $(M, \xi)$ be a contact manifold. Associated to a contact form $\alpha$ for $\xi$ is the Reeb vector field $v_\alpha$. This is the unique vector field satisfying

$$\alpha(v_\alpha) = 1 \quad \text{and} \quad \iota_{v_\alpha} d\alpha = 0 \qquad [6]$$

One may readily check that $v_\alpha$ is transverse to the contact hyperplanes and the flow of $v_\alpha$ preserves $\xi$ (in fact, it preserves $\alpha$). These two conditions characterize Reeb vector fields; that is, a vector field $v$ is the Reeb vector field for some contact form for $\xi$ if and only if it is transverse to $\xi$ and its flow preserves $\xi$.

The fundamental question concerning a Reeb vector field asks if its flow has a (contractible) periodic orbit. A paraphrasing of the Weinstein conjecture asserts a positive answer to this question. Most progress on this conjecture has been made in dimension 3 where H Hofer has proved the existence of periodic orbits for all Reeb fields on $S^3$ and on 3-manifolds with essential spheres (i.e., embedded $S^2$'s that do not bound a 3-ball in the manifold). Relations with Hamiltonian dynamics are discussed below.

Recall, from Example 3, that a Riemannian metric $g$ on a manifold $M$ provides an identification of the (oriented) projectivized cotangent bundle $P^*M$ with the unit cotangent bundle. Considered as a subset of $T^*M$, $P^*M$ inherits not only a contact structure but also a contact form $\alpha$ (by restricting the Liouville form). Let $v_\alpha$ be the associated Reeb vector field. The metric $g$ also provides an identification of the tangent and cotangent bundles of $M$. Thus, $P^*M$ may be considered as the unit tangent bundle of $M$. Let $w_g$ be the vector field on the unit tangent bundle generating the geodesic flow on $M$.

Theorem 6 The Reeb vector field $v_\alpha$ is identified with the geodesic flow field $w_g$ when $P^*M$ is identified with the unit tangent space using the metric $g$.

Relations with Complex Geometry and Analysis

Let $X$ be a complex manifold with boundary and denote the induced complex structure on $TX$ by $J$. The complex tangencies $\xi$ to $M = \partial X$ are described by the equation $d\phi \circ J = 0$, where $\phi$ is a function defined in a neighborhood of the boundary such that 0 is a regular value and $\phi^{-1}(0) = M$. The form $L(v, w) = -d(d\phi \circ J)(v, Jw)$, for $v, w \in \xi$, is called the Levi form, and when $L(v, w)$ is positive (negative) definite, then $X$ is said to have strictly pseudoconvex (pseudoconcave) boundary. The hyperplane field $\xi$ will be a contact structure if and only if $d(d\phi \circ J)$ is a nondegenerate 2-form on $\xi$ (if and only if $L(v, w)$ is definite). A well-studied source of examples comes from Stein manifolds.

Example 7 Let $X$ be a complex manifold and again let $J$ denote the induced complex structure on $TX$. From a function $\phi : X \to \mathbb{R}$, we can define a 2-form $\omega = -d(d\phi \circ J)$ and a symmetric form $g(v, w) = \omega(v, Jw)$. If this symmetric form is positive definite, the function $\phi$ is called strictly plurisubharmonic. The manifold $X$ is a Stein manifold if $X$
admits a proper strictly plurisubharmonic function $\phi : X \to \mathbb{R}$. An important result says that $X$ is Stein if and only if it can be realized as a closed complex submanifold of $\mathbb{C}^n$. Clearly any noncritical level set of $\phi$ gives a contact manifold.

Contact manifolds also give rise to an interesting class of differential operators. Specifically, a contact structure $\xi$ on $M$ defines a symbol-filtered algebra of pseudodifferential operators $\Psi^*_\xi(M)$, called the Heisenberg calculus. Operators in this algebra are modeled on smooth families of convolution operators on the Heisenberg group. An important class of operators of this type are the sum-of-squares operators. Locally, the highest-order part of such an operator takes the form

\[ L = \sum_{j=1}^{2n} v_j^2 + i a v \tag{7} \]

where $\{v_1, \ldots, v_{2n}\}$ is a local framing for the contact field and $v$ is a Reeb vector field. This operator belongs to $\Psi^2_\xi(M)$ and is subelliptic for $a$ outside a discrete set.

Hamiltonian Dynamics

Given a symplectic manifold $(X, \omega)$, a function $H : X \to \mathbb{R}$ will be called a Hamiltonian. (Only autonomous Hamiltonians are discussed here.) The unique vector field satisfying

\[ \iota_{v_H} \omega = -dH \]

is called the Hamiltonian vector field associated to $H$. Many problems in classical mechanics can be formulated in terms of studying the flow of $v_H$ for various $H$.

Example 8 If $(X, \omega) = (\mathbb{R}^{2n}, d\lambda)$, where $\lambda$ is from Example 2, then the flow of the Hamiltonian vector field is given by

\[ \dot q = \frac{\partial H}{\partial p}, \qquad \dot p = -\frac{\partial H}{\partial q} \]

A standard fact says that the flow of $v_H$ preserves the level sets of $H$.

Theorem 7 If $M$ is a level set of $H$ corresponding to a regular value and $M$ is a hypersurface of contact type, then the trajectories of $v_H$ and of the Reeb vector field (associated to $M$ in Theorem 3) agree.

Thus, under suitable hypotheses, Hamiltonian dynamics is a reparametrization of Reeb dynamics. In particular, searching for periodic orbits in such a Hamiltonian system is equivalent to searching for periodic orbits in a Reeb flow. Thus in this context, Weinstein's conjecture asserts a positive answer to the question: Does the Hamiltonian flow along a regular level set of contact type have a periodic orbit? Viterbo proved that the answer is yes if the hypersurface is compact and in $(\mathbb{R}^{2n}, \omega = d\lambda)$. Other progress has been made by studying Reeb dynamics.

Geometric Optics

In this section, we study the propagation of light (or various other disturbances) in a medium (for the moment, we do not specify the properties of this medium). The medium will be given by a three-dimensional manifold $M$. Given a point $p$ in $M$ and $t > 0$, let $I_p(t)$ be the set of all points to which light can travel in time $t$. The wave front of $p$ at time $t$ is the boundary of this set and is denoted as $\Phi_p(t) = \partial I_p(t)$.

Theorem 8 (Huygens' principle). $\Phi_p(t + t')$ is the envelope of the wave fronts $\Phi_q(t')$ for all $q \in \Phi_p(t)$.

This is best understood in terms of contact geometry. Let $\pi : (T^*M \setminus \{0\}) \to P^*M$ be the natural projection (see Example 3) and let $S$ be any smooth sub-bundle of $T^*M \setminus \{0\}$ that is transverse to the radial vector field in each fiber and for which $\pi|_S : S \to P^*M$ is a diffeomorphism. The restriction of the Liouville form to $S$ gives a contact form $\alpha$ and a corresponding Reeb vector field $v$. Given a subset $F$ of $M$ with a well-defined tangent space at every point, set

\[ L_F = \{ p \in S : \pi(p) \in F \text{ and } p(w) = 0 \text{ for all } w \in T_{\pi(p)} F \} \tag{8} \]

The set $L_F$ is a Legendrian submanifold of $S$ and is called the Legendrian lift of $F$. If $L$ is a generic Legendrian submanifold in $S$, then $\pi(L)$ is called the front projection of $L$ and $L_{\pi(L)} = L$. Given a Legendrian submanifold $L$, let $\Psi_t(L)$ be the Legendrian submanifold obtained from $L$ by flowing along $v$ for time $t$.

Example 9 Given a metric $g$ on $M$, Fermat's principle says that light travels along geodesics. Thus, if $S$ is the unit cotangent bundle, then using $g$ to identify the geodesic flow with the Reeb flow one sees that light will travel along trajectories of the Reeb vector field. Given a point $p$ in $M$, the Legendrian submanifold $L_p$ is a sphere sitting in $T^*_p M$. The Huygens principle follows from the observation that $\Phi_p(t) = \pi(\Psi_t(L_p))$.

Using the more general $S$ discussed above, one can generalize this example to light traveling in a medium that is nonhomogeneous (i.e., the speed differs from point to point in $M$) and anisotropic (i.e., the speed differs depending on the direction of travel).
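As a small numerical illustration of Huygens' principle, the sketch below treats the simplest possible medium: the Euclidean plane with unit speed, so that wave fronts are circles. This choice of medium, the two-dimensional setting, and the function names `wave_front` and `envelope_radius` are assumptions made only for this example. It samples the secondary fronts of radius $t'$ centered on the front of radius $t$ and checks that their outermost points lie at distance $t + t'$ from the source.

```python
import math

def wave_front(center, radius, n=180):
    """Sample n points of a Euclidean wave front (a circle) around center."""
    cx, cy = center
    return [(cx + radius * math.cos(2 * math.pi * k / n),
             cy + radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

def envelope_radius(p, t, t_prime, n=180):
    """Largest distance from p over all secondary fronts of radius t_prime
    centered at the sampled points q of the front of p at time t."""
    farthest = 0.0
    for q in wave_front(p, t, n):
        for r in wave_front(q, t_prime, n):
            farthest = max(farthest, math.hypot(r[0] - p[0], r[1] - p[1]))
    return farthest

if __name__ == "__main__":
    # Outer envelope of the secondary fronts: approximately t + t' = 1.5.
    print(envelope_radius((0.0, 0.0), 1.0, 0.5))
```

In a nonhomogeneous or anisotropic medium the circles would be replaced by the fronts of the corresponding Reeb flow; that case is not sketched here.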
See also: Hamiltonian Fluid Dynamics; Integrable Systems and Recursion Operators on Symplectic and Jacobi Manifolds; Minimax Principle in the Calculus of Variations.

Control Problems in Mathematical Physics


B Piccoli, Istituto per le Applicazioni del Calcolo, Rome, Italy
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Control Theory is an interdisciplinary research area, bridging mathematics and engineering, dealing with physical systems which can be controlled, that is, whose evolution can be influenced by some external agent. A general model can be written as

\[ y(t) = A(t, y(0), u(\cdot)) \tag{1} \]

where $y$ describes the state variables, $y(0)$ the initial condition, and $u(\cdot)$ the control function. Thus, eqn [1] means that the state at time $t$ depends on the initial condition but also on some parameters $u$ which can be chosen as a function of time. To be precise, there are some control problems which are not of evolutionary type; however, in this presentation we restrict ourselves to this case.

One has to distinguish between the control set $U$ where the control function can take values, $u(t) \in U$, and the space of control functions $\mathcal{U}$, to which each control function should belong, $u(\cdot) \in \mathcal{U}$. Thus, for example, we may have $U = \mathbb{R}^m$ and $\mathcal{U} = L^\infty([0, T], \mathbb{R}^m)$.

There are various problems one can formulate regarding systems of type [1], among which:

Controllability Given any two states $y_0$ and $y_1$, determine a control function $u(\cdot)$ such that for some time $t > 0$ we have $y_1 = A(t, y_0, u(\cdot))$.

Optimal control Consider a cost function $J(y(\cdot), u(\cdot))$ depending both on the evolutions of $y$ and $u$, and determine a control function $\tilde u(\cdot)$ and a trajectory $\tilde y(t) = A(t, y_0, \tilde u(\cdot))$ such that $\tilde y(\cdot)$ steers the system from $y_0$ to $y_1$, as before, and the cost $J$ is minimized (or maximized).

Stabilization We say that $\bar y$ is an equilibrium if there exists $\bar u \in U$ such that $A(t, \bar y, \bar u) = \bar y$ for every $t > 0$ (here $\bar u$ indicates also the constant-in-time control function). Determine the control $u$ as a function of the state $y$ so that $\bar y$ is a (Lyapunov) stable equilibrium for the uncontrolled dynamical system $y(t) = A(t, y(0), u(y(\cdot)))$.

Observability Assume that we can observe not the state $y$, but a function $\phi(y)$ of the state. Determine conditions on $\phi$ so that the state $y$ can be reconstructed from the evolution of $\phi(y)$, choosing $u(\cdot)$ suitably.

For the sake of simplicity, we restrict ourselves mainly to the first two problems and just mention
some facts about the others. Also, we focus on two cases:

Control of ordinary differential equations (ODEs) In this case $t \in \mathbb{R}$, $y \in \mathbb{R}^n$, $U$ is a set, typically $U \subset \mathbb{R}^m$, and $A$ is determined by a controlled ODE

\[ \dot y = f(t, y, u) \tag{2} \]

A typical example in mathematical physics is the control of mechanical systems (Bloch 2003, Bullo and Lewis 2005).

Control of partial differential equations (PDEs) In this case $t \in \mathbb{R}$, $x \in \mathbb{R}^n$, $y(x)$ belongs to a Banach functional space, for example, $H^s(\mathbb{R}^n, \mathbb{R})$, $\mathcal{U}$ is a functional space, and $A$ is determined by a controlled PDE,

\[ F(t, x, y, y_t, y_{x_1}, \ldots, y_{x_n}, y_{tt}, \ldots, u) = 0 \tag{3} \]

A typical example in mathematical physics is the control of the wave equation using boundary conditions, see below.

There are various other possible situations we do not treat here: stochastic control, when $y$ is a random variable and $A$ is defined by a (controlled) stochastic differential equation; discrete time control, where $t \in \mathbb{N}$; hybrid control, where $t$ and $y$ may have both discrete and continuous components, and so on.

As shown above, the control law can be assigned in (at least) two basically different ways. In open-loop form, as a function of time: $t \to u(t)$, and in closed-loop form or feedback, as a function of the state: $y \to u(y)$. For example, in optimal control we look for a control $\tilde u(t)$ in open-loop form, while in stabilization we search for a feedback control $u(y)$. The open-loop control depends on $y(0)$, while a feedback control can stabilize regardless of the initial condition.

Example 1 A point with unit mass moves along a straight line; if a controller is able to apply an external force $u$, then, calling $y_1(t)$, $y_2(t)$, respectively, the position and the velocity of the point at time $t$, the motion is described by the control system

\[ (\dot y_1, \dot y_2) = (y_2, u) \tag{4} \]

It is easy to check that the feedback control $u(y_1, y_2) = -y_1 - y_2$ stabilizes the system asymptotically to the origin, that is, for every initial data $(\bar y_1, \bar y_2)$, the solution of the corresponding Cauchy problem satisfies $\lim_{t \to \infty} (y_1, y_2)(t) = (0, 0)$.

Another simple problem consists in driving the point to the origin with zero velocity in minimum time from given initial data. It is quite easy to see that the optimal strategy is to accelerate towards the origin with maximum force on some interval $[0, t]$ and then to decelerate with maximum force to reach the origin at velocity zero. The set of optimal trajectories is depicted in Figure 1a: they can be obtained using the following discontinuous feedback, see Figure 1b. With the force normalized to $|u| \le 1$, define the curves $\zeta^\pm = \{(y_1, y_2) : \mp y_2 > 0,\ y_1 = \pm y_2^2/2\}$ and let $\zeta$ be defined as the union $\zeta^\pm \cup \{0\}$. We define $A^+$ to be the region below $\zeta$ and $A^-$ the one above. Then the feedback is given by

\[ u(y_1, y_2) = \begin{cases} +1 & \text{if } (y_1, y_2) \in A^+ \cup \zeta^+ \\ -1 & \text{if } (y_1, y_2) \in A^- \cup \zeta^- \\ 0 & \text{if } (y_1, y_2) = (0, 0) \end{cases} \]

Figure 1 Example 1. The simplest example of (a) optimal synthesis and (b) corresponding feedback. [Panels in the $(y_1, y_2)$ plane; trajectories labeled $u = -1$ and $u = +1$, regions labeled $u(y) = -1$ and $u(y) = +1$.]

Example 2 Consider a (one-dimensional) vibrating string of unitary length with a fixed endpoint. The model for the motion of the displacement of the string with respect to the rest position is given by

\[ y_{tt} - \Delta y = 0, \qquad y(t, 0) = 0 \tag{5} \]

with initial data

\[ y(0, \cdot) = y_0, \qquad y_t(0, \cdot) = y_1 \tag{6} \]

Assume that we can control the position of the second endpoint; then,

\[ y(t, 1) = u(t) \tag{7} \]

for some control function $u(\cdot) \in \mathbb{R}$.

Let us introduce another key concept: the reachable set at time $t$ from $y$ is the set

\[ R(t; y) = \{ A(t, y, u(\cdot)) : u(\cdot) \in \mathcal{U} \} \]

Various problems can be formulated in terms of reachable sets; for example, controllability requires that for every $y$ the union of all $R(t; y)$ as $t \to \infty$ includes the entire space. The dependence of $R(t; y)$ on time $t$ and on the set of controls $\mathcal{U}$ is also a subject of investigation: one may ask whether the same points in $R(t; y)$ can be reached by using controls which are piecewise constant, or take values within some subsets of $U$.
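The two feedback laws of Example 1 can be tried out numerically. The following is a minimal sketch, assuming the force bound $|u| \le 1$ and the switching curve $y_1 = -y_2 |y_2| / 2$ that this bound gives for a unit mass. The explicit Euler integrator, the step size, and all function names are choices made for this illustration, not part of the original text.

```python
import math

def linear_feedback(y1, y2):
    """Stabilizing feedback u = -y1 - y2 from Example 1."""
    return -y1 - y2

def bang_bang_feedback(y1, y2):
    """Discontinuous minimum-time feedback, assuming |u| <= 1."""
    s = y1 + 0.5 * y2 * abs(y2)   # s < 0 below the switching curve, s > 0 above
    if s < 0 or (s == 0 and y2 < 0):
        return 1.0                # accelerate
    if s > 0 or (s == 0 and y2 > 0):
        return -1.0               # decelerate
    return 0.0                    # at the origin

def simulate(feedback, y1, y2, t_final, dt=1e-3):
    """Explicit Euler integration of (y1', y2') = (y2, u(y1, y2))."""
    for _ in range(int(round(t_final / dt))):
        u = feedback(y1, y2)
        y1, y2 = y1 + dt * y2, y2 + dt * u
    return y1, y2

def min_time_below_curve(y1, y2):
    """Minimum time to the origin from a point below the switching curve:
    apply u = +1 until the curve is reached, then u = -1."""
    return 2.0 * math.sqrt(0.5 * y2 * y2 - y1) - y2

if __name__ == "__main__":
    print(simulate(linear_feedback, 1.0, 0.0, 20.0))     # near the origin
    print(simulate(bang_bang_feedback, -1.0, 0.0, 2.0))  # near the origin
    print(min_time_below_curve(-1.0, 0.0))
```

Starting at rest from $y_1 = -1$, the optimal control accelerates for one time unit and decelerates for one time unit. The simulation reproduces this only up to discretization error, since the Euler scheme does not land exactly on the switching curve.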
Control of ODEs

For most proofs we refer to Agrachev and Sachkov (2004) and Sontag (1998).

Controllability

Consider first the case of a linear system:

\[ \dot y = Ay + Bu, \qquad u \in U, \qquad y(0) = y_0 \tag{8} \]

where $y, y_0 \in \mathbb{R}^n$, $U \subset \mathbb{R}^m$, $A$ is an $n \times n$ matrix and $B$ an $n \times m$ matrix. We have the following property of reachable sets:

Theorem 1 If $U$ is compact and convex, then the reachable set $R(t)$ for [8] is compact and convex.

A control system [8] is controllable if, taking $U = \mathbb{R}^m$, we have $R(t) = \mathbb{R}^n$ for every $t > 0$. By linearity, this is equivalent to requiring the reachable set to be a neighborhood of the origin in the case of bounded controls. Define the controllability matrix to be the $n \times nm$ matrix

\[ C(A, B) = (B, AB, \ldots, A^{n-1}B) \]

Controllability is characterized by the following:

Theorem 2 (Kalman controllability theorem). The linear system [8] is controllable if and only if $\mathrm{rank}(C(A, B)) = n$.

For linear systems, there exists a duality between controllability and observability in the sense of the following theorem:

Theorem 3 Consider the linear control system [8] and assume that we observe the variable $z(y) = Cy$ for some $p \times n$ matrix $C$. Then, observability holds if and only if the linear system $\dot y = A^{T} y + C^{T} v$ is controllable.

There exists no characterization of controllability for nonlinear systems as for linear ones, but we have the linearization result:

Theorem 4 A nonlinear system is locally controllable if its linearization is. The converse is false.

There are many results for the important class of control-affine systems

\[ \dot y = f_0(y) + \sum_{i=1}^{m} f_i(y) u_i \tag{9} \]

where $f_0, \ldots, f_m$ are smooth vector fields on $\mathbb{R}^n$ and $U = \mathbb{R}^m$. In general, there exists no explicit representation for the trajectories of [9] in terms of integrals of the control, as happens for linear systems. Still, a rich mathematical theory has been developed applying techniques and ideas from differential geometry: the so-called geometric control theory. The main idea is that controllability (and properties of optimal trajectories) is determined by the Lie algebra generated by the vector fields $f_i$. For example:

Theorem 5 (Lie-algebraic rank condition). Let $L$ be the Lie algebra generated by the vector fields $f_i$, $i = 1, \ldots, m$, and assume $f_0 = 0$. If $L(y)$ is of dimension $n$ at every point $y$, then the system is controllable.

We refer to Agrachev and Sachkov (2004) and Jurdjevic (1997) for a general presentation of geometric control theory and give a simple example to show how Lie brackets characterize reachable directions.

Example 3 Consider the Brockett integrator

\[ \dot y_1 = u_1, \qquad \dot y_2 = u_2, \qquad \dot y_3 = u_1 y_2 - u_2 y_1 \]

Starting from the origin, using constant controls, we can move along curves tangent to the $y_1 y_2$ plane. However, let $f_1 = (1, 0, y_2)$ and $f_2 = (0, 1, -y_1)$ (fields corresponding to constant controls); then their Lie bracket is given by

\[ [f_1, f_2](0) = (Df_2 \cdot f_1 - Df_1 \cdot f_2)(0) = (0, 0, -2) \]

Moving for time $t$ first along the integral curve of $f_1$, then of $f_2$, then of $-f_1$, and finally of $-f_2$, we reach a point $t^2 [f_1, f_2](0) + o(t^2)$ along the vertical direction $y_3$. This corresponds to saying that the system satisfies the Lie-algebraic rank condition (LARC).

Optimal Control

The theory of optimal control has developed in three main directions:

Existence of optimal controls, under various assumptions on $L$, $f$, $U$. When the sets of admissible velocities $F(t, y) = \{f(t, y, u) : u \in U\}$ are convex, optimal solutions can be constructed following the direct method of Tonelli for the calculus of variations, that is, as limits of minimizing sequences: the two main ingredients are compactness and lower-semicontinuity. If convexity does not hold, existence is not guaranteed in general, but only in special cases.

Necessary conditions for the optimality of a control $u(\cdot)$. The major result in this direction is the celebrated Pontryagin maximum principle (PMP), which extends to control systems the Euler–Lagrange equation and the Weierstrass necessary conditions for a strong local minimum in the calculus of variations. Various extensions and other necessary conditions are now available (Agrachev and Sachkov 2004).

Sufficient conditions for optimality. The standard procedure resorts to embedding the optimal control problem in a family of problems, obtained by
varying the initial conditions. One defines the value function $V$ by

\[ V(t, \bar y) = \inf J(y(\cdot), u(\cdot)) \]

where the inf is taken over the set of trajectories and controls satisfying $y(t) = \bar y$. Under suitable assumptions, $V$ is the solution to a first-order Hamilton–Jacobi PDE. The lack of regularity of the value function $V$ has long provided a major obstacle to a rigorous mathematical analysis, solved by the theory of viscosity solutions (Bardi and Capuzzo Dolcetta 1997). Another method consists in building an optimal synthesis, that is, a collection of trajectory–control pairs.

Pontryagin maximum principle Consider a general autonomous control system:

\[ \dot y = f(y, u) \tag{10} \]

where $y \in \mathbb{R}^n$ and $u \in U$, a compact subset of $\mathbb{R}^m$. We assume enough regularity of $f$ to guarantee existence and uniqueness of trajectories for every $u(\cdot) \in \mathcal{U}$. For a fixed $T > 0$, an optimal control problem in Mayer form is given by

\[ \min_{u \in \mathcal{U}} \psi(y(T, u)), \qquad y(0) = \bar y \tag{11} \]

where $\psi$ is the final cost and $\bar y$ the initial condition. More generally, one can consider also the Lagrangian cost $\int L(y, u)\,dt$ and reduce to this case by adding a variable $y_0$ with $y_0(0) = 0$ and $\dot y_0 = L$.

The well-known PMP provides, under suitable assumptions, a necessary condition for optimality in terms of a lift of the candidate optimal trajectory to the cotangent bundle. For problems such as [11], PMP can be stated as follows:

Theorem 6 Let $u^*(\cdot)$ be a (bounded) admissible control whose corresponding trajectory $y^*(\cdot) = y(\cdot, u^*)$ is optimal. Call $p : [0, T] \to \mathbb{R}^n$ the solution of the adjoint linear equation

\[ \dot p(t) = -p(t) \cdot D_y f(y^*(t), u^*(t)), \qquad p(T) = -\nabla \psi(y^*(T)) \tag{12} \]

Then the maximality condition

\[ p(t) \cdot f(y^*(t), u^*(t)) = \max_{\omega \in U} p(t) \cdot f(y^*(t), \omega) \tag{13} \]

holds for almost every time $t \in [0, T]$.

Notice that the conclusion of the theorem can be interpreted by saying that the pair $(y^*, p)$ satisfies the system:

\[ \dot y = \frac{\partial H(y^*, p, u^*)}{\partial p}, \qquad \dot p = -\frac{\partial H(y^*, p, u^*)}{\partial y} \]

where $H(y, p, u) = \langle p, f(y, u) \rangle$. This is a pseudo-Hamiltonian system, since $H$ also depends on $u^*$. Alternatively, one can define the maximized Hamiltonian

\[ H(y, p) = \max_{u} \langle p, f(y, u) \rangle \]

but $H$ may fail to be smooth. Another difficulty lies in the fact that an initial condition is given for $y$ and a final condition is given for $p$.

The proof of PMP relies on a special type of variations, called needle variations, of a reference trajectory. Given a candidate optimal control $u^*$ and corresponding trajectory $y^*$, a time $\tau$ of approximate continuity for $f(y^*(\cdot), u^*(\cdot))$ and $\omega \in U$, a needle variation is a family of controls $u_\varepsilon$ obtained by replacing $u^*$ with $\omega$ on the interval $[\tau - \varepsilon, \tau]$. A needle variation gives rise to a variation $v$ of the trajectory satisfying the variational equation

\[ \dot v(t) = D_y f(y^*(t), u^*(t)) \cdot v(t) \tag{14} \]

in the classical sense only after time $\tau$. Recently Piccoli and Sussmann (2000) introduced a setting in which needle and other variations happen to be differentiable.

One may also consider some final (or initial) constraint:

\[ (T, y(T)) \in S \tag{15} \]

where $S \subset \mathbb{R} \times \mathbb{R}^n$ (and $T$ is not fixed). In this case, the final condition for $p$ is more complicated, as is the proof of PMP. It is interesting to note the many connections between PMP and the classical mechanics framework, well illustrated by Bloch (2003) and Jurdjevic (1997).

Value function and HJB equation In this section we consider the minimization problem

\[ \inf_{u \in \mathcal{U}} \psi(T, y(T, u)) \tag{16} \]

for the control system

\[ \dot y = f(t, y, u), \qquad u(t) \in U \ \text{a.e.} \tag{17} \]

subject to the terminal constraints [15], where $S \subset \mathbb{R}^{n+1}$ is a closed target set.

Theorem 7 (PDE of dynamic programming). Assume that the value function $V$, for [15]–[17], is $C^1$ on some open set $\Omega \subseteq \mathbb{R} \times \mathbb{R}^n$, not intersecting the target set $S$. Then $V$ satisfies the Hamilton–Jacobi equation

\[ V_s(s, y) + \min_{\omega \in U} \left[ V_y(s, y) \cdot f(s, y, \omega) \right] = 0 \qquad \forall (s, y) \in \Omega \tag{18} \]

Equation [18] is called the Hamilton–Jacobi–Bellman (HJB) equation, after Richard Bellman. In general,
however, $V$ fails to be differentiable: this is the case for Example 1 along the lines $\zeta^\pm$. To isolate $V$ as the unique solution of the HJB equation, one has to resort to the concept of viscosity solution. The dynamic programming and HJB equation apparatus applies also to stochastic problems, for which the equation happens to be parabolic because of the Ito formula.

Optimal syntheses Roughly speaking, an optimal synthesis is a collection of optimal trajectories, one for each initial condition $\bar y$. Geometric techniques provide a systematic method to construct syntheses:

Step 1 Study the properties of optimal trajectories via PMP and other necessary conditions.
Step 2 Determine a (finite-dimensional) sufficient family for optimality, that is, a class of trajectories (satisfying PMP) containing all possible optimal ones.
Step 3 Construct a synthesis selecting one trajectory for every initial condition in such a way as to cover the state space in a regular fashion.
Step 4 Prove that the synthesis of Step 3 is indeed optimal.

One of the main problems in step 2 is the possible presence of optimal controls with an infinite number of discontinuities, known as the Fuller phenomenon. The key concept of regular synthesis, of step 3, was introduced by Boltianskii and recently refined by Piccoli and Sussmann (2000) to include Fuller phenomena. The above strategy works only in some special cases, for example for two-dimensional minimum-time problems (Boscain and Piccoli 2004): we report below an example.

Example 4 Consider the problem of orienting in minimum time a satellite with two orthogonal rotors: the speed of one rotor is controlled, while the second rotor has constant speed. This problem is modelled by a left-invariant control system on $SO(3)$:

\[ \dot y = y(F + uG), \qquad y \in SO(3), \qquad |u| \le 1 \]

where $F$ and $G$ are two matrices of $so(3)$, the Lie algebra of $SO(3)$. Using the isomorphism of Lie algebras $(so(3), [\cdot, \cdot]) \cong (\mathbb{R}^3, \times)$, the condition that the rotors are orthogonal reads: $\mathrm{trace}(F \cdot G) = 0$. If we are interested in orienting only a fixed semi-axis, then we project the system on the sphere $S^2$:

\[ \dot y = y(F + uG), \qquad y \in S^2, \qquad |u| \le 1 \]

In this case, $F + G$ and $F - G$ are rotations around two fixed axes and, if the angle between these two axes is less than $\pi/2$, every optimal trajectory is a finite concatenation of arcs corresponding to constant control $+1$ or $-1$. The optimal synthesis can be obtained by the feedback shown in Figure 2.

Figure 2 Optimal feedback for Example 4. [Sphere with the axes of $F - G$ and $F + G$ marked and regions labeled $u = +1$ and $u = -1$.]

Control of PDEs

The theory for control of models governed by PDEs is, as expected, much more ramified and much less complete. An exhaustive résumé of the available results is not possible in a short space; thus we focus on Example 2 and a few others to illustrate some techniques to treat control problems and give various references (see also Fursikov and Imanuvilov (1996), Komornik (1994), and Lasiecka and Triggiani (2000), and references therein).

Besides the variety of control problems illustrated in the Introduction, for PDE models one can consider different ways of applying the control, for example:

Boundary control One considers the system [3] (with $F$ independent of $u$) and imposes the condition $y(t, x) = u(t, x)$ to hold for every time $t$ and every $x$ in some region. Usually, we assume $y(t)$ to be defined on a bounded region $\Omega$ and the control acts on some set $\Gamma \subset \partial\Omega$. Obviously, Neumann conditions are also natural, as $\partial_\nu y = u$, where $\nu$ is the exterior normal to $\Omega$.

Internal control One considers the system [3] with $F$ depending on $u$. Thus, the control acts on the equation directly.

Other controls There are various other control problems one may consider, such as Galerkin-type approximation and control of some finite family of modes. An interesting example is given by Coron (2002), where the position of a tank is controlled to regulate the water level inside.

Control of a Vibrating String

We consider Example 2, but various results hold for hyperbolic linear systems in general. First consider the uncontrolled system

\[ z_{tt} = \Delta z, \qquad z(0, t) = z(1, t) = 0 \tag{19} \]

A first integral is the energy given by

\[ E(t) = \frac{1}{2} \int \left[ |z_x|^2 + |z_t|^2 \right] dx \]
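That the energy is a first integral can be checked on an explicit solution. The sketch below uses the sample solution $z(x, t) = \sin(\pi x)\cos(\pi t) + \tfrac{1}{2}\sin(2\pi x)\sin(2\pi t)$ of the uncontrolled string with both ends fixed. This solution, the midpoint quadrature rule, and the function names are assumptions chosen for illustration. For this $z$ one computes by hand $E(t) = \pi^2/2$ for every $t$.

```python
import math

def z_x(x, t):
    """Space derivative of the sample solution z(x, t)."""
    return (math.pi * math.cos(math.pi * x) * math.cos(math.pi * t)
            + math.pi * math.cos(2 * math.pi * x) * math.sin(2 * math.pi * t))

def z_t(x, t):
    """Time derivative of the sample solution z(x, t)."""
    return (-math.pi * math.sin(math.pi * x) * math.sin(math.pi * t)
            + math.pi * math.sin(2 * math.pi * x) * math.cos(2 * math.pi * t))

def energy(t, n=1000):
    """E(t) = 1/2 * integral over [0, 1] of (z_x^2 + z_t^2), midpoint rule."""
    h = 1.0 / n
    total = 0.0
    for j in range(n):
        x = (j + 0.5) * h
        total += z_x(x, t) ** 2 + z_t(x, t) ** 2
    return 0.5 * h * total

if __name__ == "__main__":
    for t in (0.0, 0.3, 1.7):
        print(t, energy(t))   # each value is close to pi**2 / 2
```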
Then we say that the system [19] is observable at time $T$ if there exists $C(T)$ such that

\[ E(0) \le C(T) \int_0^T |z_x(1, t)|^2 \, dt \]

which means that if we observe zero displacement on the right end for time $T$ then the solution has zero energy and hence vanishes. In this case, the system is observable for every time $T \ge 2$: this is precisely the time taken by a wave to travel from the right end point to the left one and back.

Thanks to a duality as for the finite-dimensional case, observability of [19] is equivalent to null controllability for [5]–[7], that is, to the property that for all initial conditions $y_0$, $y_1$ there exists a control $u(\cdot)$ such that the corresponding solution verifies $y(x, T) = y_t(x, T) = 0$. More precisely, the desired control is given by $u(t) = \tilde z_x(1, t)$, where $\tilde z$ is the solution of [19] minimizing the functional (over $L^2 \times H^{-1}$)

\[ J(z(\cdot, 0), z_t(\cdot, 0)) = \frac{1}{2} \int_0^T |z_x(1, t)|^2 \, dt + \int y_0 \, z_t(\cdot, 0) \, dx - \int y_1 \, z(\cdot, 0) \, dx \]

One can check that this functional is continuous and convex, and the coercivity is granted by the observability of [19]; thus, a minimum exists by the direct method of Tonelli. This is an example of the method known as the Hilbert uniqueness method introduced by Lions (1988).

In the multidimensional case, controllability can be characterized by imposing a condition on the region $\Gamma \subset \partial\Omega$ on which the control acts. More precisely, rays of geometric optics in $\Omega$ should intersect $\Gamma$ (Zuazua 2005).

If we consider the infinite-time horizon $T = \infty$ and introduce the functional

\[ J = \int_0^\infty \|y\|^2 \, dt + N \int_{(0, \infty) \times \Gamma} u^2 \, dt \, dx \]

then the optimal control is determined as follows. If $(y, p)$ is a solution of the optimality system: [5]–[6] with $y = 0$ outside $\Gamma$ and

\[ p_{tt} - \Delta p + y = 0; \qquad \partial_\nu p + N y = 0 \ \text{on } \Gamma; \qquad p = 0 \ \text{on } \partial\Omega \setminus \Gamma \]

then $u = y$ on $\Gamma$ (Lions 1988, Zuazua 2005).

Controllability via Return Method of Coron

As we saw in Theorem 4, a nonlinear system may be controllable even if its linearization is not. In this case, controllability can be proved by the return method of Coron, which consists in finding a trajectory $\bar y$ such that the following hold:

1. $\bar y(0) = \bar y(T) = 0$;
2. the linearized system around $\bar y$ is controllable.

Then, by the implicit-function theorem, local controllability is granted, that is, there exists $\varepsilon > 0$ such that for every data $y_0$, $y_1$ of norm less than $\varepsilon$, there exists a control steering the system from $y_0$ to $y_1$ in time $T$.

This method does not give many advantages in the finite-dimensional case, but yields excellent results for PDE systems such as Euler, Navier–Stokes, Saint-Venant, and others (Coron 2002).

Control of Schrödinger Equation

Consider the issue of designing an efficient transfer of population between different atomic or molecular levels using laser pulses. The mathematical description consists in controlling the Schrödinger equation. Many results are available in the finite-dimensional case. Finite-dimensional closed quantum systems are in fact left-invariant control systems on $SU(n)$, or on the corresponding Hilbert sphere $S^{2n-1} \subset \mathbb{C}^n$, where $n$ is the number of atomic or molecular levels, and powerful techniques of geometric control are available both for controllability and for optimal control (Agrachev and Sachkov 2004, Boscain and Piccoli 2004, Jurdjevic 1997).

Recent papers consider the minimum-time problem with unbounded controls as well as minimization of the energy of transition. Boscain et al. (2002) have applied the techniques of sub-Riemannian geometry on Lie groups and of optimal synthesis on two-dimensional manifolds to the population transfer problem in a three-level quantum system driven by two external fields of arbitrary shape and frequency.

Although many results are available for finite-dimensional systems, only a few controllability properties have been proved for the Schrödinger equation as a PDE, and in particular no satisfactory global controllability results are available at the moment.

Further Reading

Agrachev A and Sachkov Y (2004) Control from a Geometric Perspective. Springer.
Bardi M and Capuzzo Dolcetta I (1997) Optimal Control and Viscosity Solutions of Hamilton–Jacobi–Bellman Equations. Boston: Birkhäuser.
Bloch AM (2003) Nonholonomic Mechanics and Control, with the collaboration of J. Baillieul, P. Crouch and J. Marsden, with scientific input from P. S. Krishnaprasad, R. M. Murray and D. Zenkov. New York: Springer.
Boscain U and Piccoli B (2004) Optimal Synthesis for Control Systems on 2-D Manifolds. Springer SMAI, vol. 43. Heidelberg: Springer.
Boscain U, Chambrion T, and Gauthier J-P (2002) On the K+P problem for a three-level quantum system: optimality implies resonance. Journal of Dynamical and Control Systems 8: 547–572.
Bullo F and Lewis AD (2005) Geometric Control of Mechanical Systems. New York: Springer.
Coron JM (2002) Return method: some application to flow control. Mathematical Control Theory, Part 1, 2 (Trieste, 2001). In: Agrachev A (ed.) ICTP Lecture Notes, vol. VIII. Trieste: Abdus Salam Int. Cent. Theoret. Phys.
Fursikov AV and Imanuvilov O Yu (1996) Controllability of Evolution Equations. Lecture Notes Series, vol. 34. Seoul: Seoul National University.
Jurdjevic V (1997) Geometric Control Theory. Cambridge: Cambridge University Press.
Komornik V (1994) Exact Controllability and Stabilization. The Multiplier Method. Chichester: Wiley.
Lasiecka I and Triggiani R (2000) Control Theory for Partial Differential Equations: Continuous and Approximation Theories. Cambridge: Cambridge University Press.
Lions JL (1988) Exact controllability, stabilization and perturbations for distributed systems. SIAM Review 30: 1–68.
Piccoli B and Sussmann HJ (2000) Regular synthesis and sufficiency conditions for optimality. SIAM Journal on Control and Optimization 39: 359–410.
Sontag ED (1998) Mathematical Control Theory. New York: Springer.
Zuazua E (2005) Propagation, observation and control of waves approximated by finite difference methods. SIAM Review 47: 197–243.

Convex Analysis and Duality Methods


G Bouchitté, Université de Toulon et du Var, La Garde, France
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Convexity is an important notion in nonlinear optimization theory as well as in infinite-dimensional functional analysis. As will be seen below, very simple and powerful tools will be derived from elementary duality arguments (which are by-products of the Moreau–Fenchel transform and the Hahn–Banach theorem). We will emphasize applications to a large range of variational problems. Some arguments of measure theory will be skipped.

Basic Convex Analysis

In the following, we denote by $X$ a normed vector space, and by $X^*$ the topological dual of $X$. If a topology different from the normed topology is used on $X$, we will denote it by $\tau$. For every $x \in X$ and $A \subset X$, $\mathcal{V}_x$ denotes the open neighborhoods of $x$ and $\operatorname{int} A$, $\operatorname{cl} A$, respectively, the interior and the closure of $A$. We deal with extended real-valued functions $f : X \to \mathbb{R} \cup \{+\infty\}$. We denote by $\operatorname{dom} f = f^{-1}(\mathbb{R})$ and by $\operatorname{epi} f = \{(x,\alpha) \in X \times \mathbb{R} : f(x) \le \alpha\}$ the domain and the epigraph of $f$, respectively. We say that $f$ is proper if $\operatorname{dom} f \ne \emptyset$. Recall that $f$ is convex if for every $(x,y) \in X^2$ and $t \in [0,1]$, there holds

\[ f(tx + (1-t)y) \le t f(x) + (1-t) f(y) \qquad (\text{by convention } +\infty + a = +\infty). \]

The notion of convexity for a subset $A \subset X$ is recovered by saying that $\chi_A$ is convex, where its indicator function $\chi_A$ is defined by setting

\[ \chi_A(x) = \begin{cases} 0 & \text{if } x \in A \\ +\infty & \text{otherwise} \end{cases} \]

Continuity and Lower-Semicontinuity

A first consequence of the convexity is the continuity on the topological interior of the domain. We refer for instance to Borwein and Lewis (2000) for a proof of

Theorem 1 Let $f : X \to \mathbb{R} \cup \{+\infty\}$ be convex and proper. Assume that $\sup_U f < +\infty$, where $U$ is a suitable open subset of $X$. Then $f$ is continuous and locally Lipschitzian on all $\operatorname{int}(\operatorname{dom} f)$.

As an immediate corollary, a convex function on a normed space is continuous provided it is majorized by a locally bounded function. In the finite-dimensional case, it is easily deduced that a finite-valued convex function $f : \mathbb{R}^d \to \mathbb{R}$ is locally Lipschitz. Furthermore, by Aleksandrov's theorem, $f$ is almost everywhere twice differentiable and the non-negative Hessian matrix $\nabla^2 f$ coincides with the absolutely continuous part of the distributional Hessian matrix $D^2 f$ (it is a Radon measure taking values in the non-negative symmetric matrices).

However, in infinite-dimensional spaces, for ensuring compactness properties (as, e.g., in condition (ii) of Theorem 4 below), we need to use weak topologies and the situation is not so simple. A major idea consists in substituting the continuity property with lower-semicontinuity.

Definition 2 A function $f : X \to \mathbb{R} \cup \{+\infty\}$ is $\tau$-l.s.c. at $x_0 \in X$ if for all $\alpha \in \mathbb{R}$ with $\alpha < f(x_0)$, there exists $U \in \mathcal{V}_{x_0}$ such that $f > \alpha$ on $U$. In particular, $f$ will be l.s.c. on all $X$ provided $f^{-1}((r, +\infty])$ is open for every $r \in \mathbb{R}$.
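As a small worked illustration of Definition 2 (added here as a sketch, not taken from the original article), consider the indicator function $\chi_A$ of a subset $A \subset X$ defined above. Its superlevel sets can be listed explicitly, which gives a direct test for lower-semicontinuity with respect to whichever topology $\tau$ is used:

```latex
% Worked example: lower-semicontinuity of an indicator function.
% \chi_A is the indicator function defined above (0 on A, +\infty elsewhere).
\[
  \chi_A^{-1}\big((r,+\infty]\big) =
  \begin{cases}
    X, & r < 0,\\[2pt]
    X \setminus A, & r \ge 0.
  \end{cases}
\]
% Hence \chi_A is \tau-l.s.c. on all X if and only if X \setminus A is \tau-open,
% that is, if and only if A is \tau-closed.
```

In particular, for a convex set $A$, the function $\chi_A$ is convex and l.s.c. exactly when $A$ is closed and convex.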

Remark 3

(i) The following sequential notion can be also used: $f$ is $\tau$-sequentially l.s.c. at $x_0$ if

\[ \forall (x_n) \subset X, \quad x_n \xrightarrow{\tau} x_0 \;\Longrightarrow\; \liminf_{n \to \infty} f(x_n) \ge f(x_0). \]

It turns out that this notion (weaker in general) is equivalent to the previous one provided $x_0$ admits a countable basis of neighborhoods.

(ii) A well-known consequence of the Hahn–Banach theorem is that, for convex functions, the lower-semicontinuity property with respect to the normed topology of $X$ is equivalent to the weak (or weak sequential) lower-semicontinuity.

Theorem 4 (Existence). Let $f : X \to \mathbb{R} \cup \{+\infty\}$ be proper, such that
(i) $f$ is $\tau$-l.s.c.,
(ii) $\forall r \in \mathbb{R}$, $f^{-1}((-\infty, r])$ is $\tau$-relatively compact.
Then there is $\bar{x} \in X$ such that $f(\bar{x}) = \inf f$ and $\operatorname{argmin} f := \{x \in X \mid f(x) = \inf f\}$ is $\tau$-compact.

In practice, the choice of the topology $\tau$ is ruled by the condition (ii) above. For example, if $X$ is a reflexive infinite-dimensional Banach space and if $f$ is coercive (i.e., $\lim_{\|x\| \to \infty} f(x) = +\infty$), we may take for $\tau$ the weak topology (but never the normed topology). This restriction implies in practice that the first condition in Theorem 4 may fail. In this case, it is often useful to substitute $f$ with its lower-semicontinuous (l.s.c.) envelope.

Definition 5 Given a topology $\tau$, the relaxed function $\bar{f}$ ($= \bar{f}^{\tau}$) is defined as

\[ \bar{f}(x) = \sup\{ g(x) \mid g : X \to \mathbb{R} \cup \{+\infty\},\ g \text{ is } \tau\text{-l.s.c.},\ g \le f \} \]

It is easy to check that $f$ is $\tau$-l.s.c. at $x_0$ if and only if $\bar{f}(x_0) = f(x_0)$. Furthermore,

\[ \bar{f}(x) = \sup_{U \in \mathcal{V}_x} \inf_{U} f, \qquad \operatorname{epi} \bar{f} = \operatorname{cl}_{X \times \mathbb{R}}(\operatorname{epi} f) \]

We can now state the relaxed version of Theorem 4.

Theorem 6 (Relaxation). Let $f : X \to \mathbb{R} \cup \{+\infty\}$, then: $\inf f = \inf \bar{f}$. Assume further that, for all real $r$, $f^{-1}((-\infty, r])$ is $\tau$-relatively compact; then $\bar{f}$ attains its minimum and $\operatorname{argmin} f = \operatorname{argmin} \bar{f} \cap \{x \in X \mid f(x) = \bar{f}(x)\}$.

Moreau–Fenchel Conjugate

The duality between $X$ and $X^*$ will be denoted by the symbol $\langle \cdot \mid \cdot \rangle$. If $X$ is a Euclidean space, we identify $X^*$ with $X$ via the scalar product denoted $(\cdot \mid \cdot)$.

Definition 7 Let $f : X \to \mathbb{R} \cup \{+\infty\}$. The Moreau–Fenchel conjugate $f^* : X^* \to \mathbb{R} \cup \{+\infty\}$ of $f$ is defined by setting, for every $x^* \in X^*$:

\[ f^*(x^*) = \sup\{ \langle x \mid x^* \rangle - f(x) \mid x \in X \} \]

In a symmetric way, if $f^*$ is proper on $X^*$, we define the biconjugate $f^{**} : X \to \mathbb{R} \cup \{+\infty\}$ by setting

\[ f^{**}(x) = \sup\{ \langle x \mid x^* \rangle - f^*(x^*) \mid x^* \in X^* \} \]

As a consequence, the so-called Fenchel inequality holds:

\[ \langle x \mid x^* \rangle \le f(x) + f^*(x^*), \qquad (x, x^*) \in X \times X^* \]

Notice that $f$ does not need to be convex. However, if $f$ is convex, then $f^*$ agrees with the Legendre–Fenchel transform.

Definition 8 Let $f : X \to \mathbb{R} \cup \{+\infty\}$. The subdifferential of $f$ at $x$ is the possibly void subset $\partial f(x) \subset X^*$ defined by

\[ \partial f(x) := \{ x^* \in X^* : f(x) + f^*(x^*) = \langle x \mid x^* \rangle \} \]

It is easy to check that $\partial f(x)$ is convex and weak-star closed. Moreover, if $f$ is convex and has a differential (or Gateaux derivative) $f'(x)$ at $x$, then $\partial f(x) = \{f'(x)\}$. After summarizing some elementary properties of the Fenchel transform, we give examples in $\mathbb{R}^d$ or in infinite-dimensional spaces.

Lemma 9
(i) $f^*$ is convex, l.s.c. with respect to the weak star topology of $X^*$.
(ii) $f^*(0) = -\inf f$ and $f \ge g \Rightarrow f^* \le g^*$.
(iii) $(\inf_i f_i)^* = \sup_i f_i^*$, for every family $\{f_i\}$.
(iv) $f^{**}(x) = \sup\{ g(x) : g \text{ affine continuous on } X \text{ and } g \le f \}$ (by convention, the supremum is identically $-\infty$ if no such $g$ exists).

Proof (i) This assertion is a direct consequence of the fact that $f^*$ can be written as the supremum of functions $g_x$, where $g_x := \langle x \mid \cdot \rangle - f(x)$. Clearly, these functions are affine and weakly star-continuous on $X^*$. The assertions (ii), (iii) are trivial. To obtain (iv), it is enough to observe that an affine function $g$ of the form $g(x) = \langle x \mid x^* \rangle - \beta$ satisfies $g \le f$ iff $f^*(x^*) \le \beta$. □

Example 1 Let $f : X \to \mathbb{R}$ be defined by

\[ f(x) = \frac{1}{p} \|x\|_X^p, \qquad 1 < p < \infty \]

then,

\[ f^*(x^*) = \frac{1}{p'} \|x^*\|_{X^*}^{p'}, \qquad \text{with } \frac{1}{p} + \frac{1}{p'} = 1 \]

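whereas, for $p = 1$, we find $f^* = \chi_{B^*}$, where $B^* = \{\|x^*\| \le 1\}$.

Example 2 Let $A \in \mathbb{R}^{d^2}_{\mathrm{sym}}$ be a symmetric positive-definite matrix and let $f(x) := (1/2)(Ax \mid x)$ $(x \in \mathbb{R}^d)$. Then, for all $y \in \mathbb{R}^d$, we have $f^*(y) = (1/2)(A^{-1}y \mid y)$. Notice that if $A$ has a negative eigenvalue, then $f^* \equiv +\infty$.

Particular examples on $\mathbb{R}^d$ are also very popular. For instance:

Minimal surfaces

\[ f(x) = \sqrt{1 + |x|^2}, \qquad f^*(y) = \begin{cases} -\sqrt{1 - |y|^2} & \text{if } |y| \le 1 \\ +\infty & \text{otherwise} \end{cases} \]

Entropy

\[ f(x) = \begin{cases} x \log x & \text{if } x \in \mathbb{R}_+ \\ +\infty & \text{otherwise} \end{cases}, \qquad f^*(y) = \exp(y - 1) \]

Example 3 Let $C \subset X$ be convex, and let $f = \chi_C$. Then,

\[ f^*(x^*) = \chi_C^*(x^*) = \sup_{x \in C} \langle x \mid x^* \rangle \qquad (\text{support function of } C) \]

Notice that if $M$ is a subspace of $X$, then $(\chi_M)^* = \chi_{M^\perp}$. We specify now a particular case of interest.

Let $\Omega$ be a bounded open subset of $\mathbb{R}^n$. Take $X = C_0(\overline{\Omega}; \mathbb{R}^d)$ to be the Banach space of continuous functions on the compact $\overline{\Omega}$ with values in $\mathbb{R}^d$. As usual, we identify the dual $X^*$ with the space $\mathcal{M}_b(\overline{\Omega}; \mathbb{R}^d)$ of $\mathbb{R}^d$-valued Borel measures on $\overline{\Omega}$ with finite total variation. Let $K$ be a closed convex subset of $\mathbb{R}^d$ such that $0 \in K$. Then $\rho^0_K(\xi) := \sup\{(\xi \mid z) : z \in K\}$ is a non-negative, convex, l.s.c. and positively 1-homogeneous function on $\mathbb{R}^d$ (e.g., $\rho^0_K$ is the Euclidean norm if $K$ is the unit ball of $\mathbb{R}^d$). Let us define $C := \{\varphi \in X : \varphi(x) \in K,\ \forall x \in \overline{\Omega}\}$. Then, we have

\[ (\chi_C)^*(\lambda) = \int_{\overline{\Omega}} \rho^0_K(\lambda) := \int_{\overline{\Omega}} \rho^0_K\!\left( \frac{d\lambda}{d\theta} \right) \theta(dx) \tag{1} \]

where $\theta$ is any non-negative Radon measure such that $\lambda \ll \theta$ (the choice of $\theta$ is indifferent). In the case where $K$ is the unit ball, we recover the total variation of $\lambda$.

Example 4 (Integral functionals). Given $1 \le p < +\infty$, $(\Omega, \mu, \mathcal{T})$ a measured space and $\varphi : \Omega \times \mathbb{R}^d \to [0, +\infty]$ a $\mathcal{T} \otimes \mathcal{B}_{\mathbb{R}^d}$-measurable integrand. Then the partial conjugate $\varphi^*(x, z^*) := \sup\{ \langle z \mid z^* \rangle - \varphi(x, z) : z \in \mathbb{R}^d \}$ is a convex measurable integrand. Let us define

\[ I_\varphi : u \in (L^p_\mu)^d \mapsto \int_\Omega \varphi(x, u(x))\, d\mu \in \mathbb{R} \cup \{+\infty\} \]

and assume that $I_\varphi$ is proper. Then there holds $(I_\varphi)^* = I_{\varphi^*}$, where

\[ I_{\varphi^*} : v \in (L^{p'}_\mu)^d \mapsto \int_\Omega \varphi^*(x, v(x))\, d\mu \]

Duality Arguments

Two Key Results

The first result related to the biconjugate $f^{**}$ is a consequence of the Hahn–Banach theorem. Recalling the assertion (iv) of Lemma 9, we notice that the existence of an affine minorant for $f$ is equivalent to the properness of $f^*$ (i.e., $\exists x_0^* \in X^* : f^*(x_0^*) < +\infty$).

Theorem 10 Let $f : X \to \mathbb{R} \cup \{+\infty\}$ be convex and proper. Then
(i) $f$ is l.s.c. at $x_0$ if and only if $f^*$ is proper and $f^{**}(x_0) = f(x_0)$. In particular, the lower-semicontinuity of $f$ on all $X$ is equivalent to the identity $f \equiv f^{**}$.
(ii) If $f^*$ is proper, then $f^{**} = \bar{f}$.

Proof We notice that by Lemma 9, $f^{**} \le f$ and $f^{**}$ is l.s.c. (even for the weak topology). Therefore, $f^{**} \le \bar{f}$ and, moreover, $f$ is l.s.c. at $x_0$ if $f^{**}(x_0) \ge f(x_0)$. Conversely, if $f$ is l.s.c. at $x_0$, for every $\alpha_0 < f(x_0)$, there exists a neighborhood $V$ of $x_0$ such that $(V \times (-\infty, \alpha_0)) \cap \operatorname{epi} f = \emptyset$. It follows that $\operatorname{epi} \bar{f}$ is a proper closed convex subset of $X \times \mathbb{R}$ which does not intersect the compact singleton $\{(x_0, \alpha_0)\}$. By applying the Hahn–Banach strict separation theorem, there exists $(x_0^*, \beta_0) \in X^* \times \mathbb{R}$ such that

\[ \langle x_0 \mid x_0^* \rangle + \beta_0 \alpha_0 < \langle x \mid x_0^* \rangle + \beta_0 \alpha \qquad \text{for all } (x, \alpha) \in \operatorname{epi} f \]

Taking $\alpha \to \infty$ and $x \in \operatorname{dom} f$, we find $\beta_0 \ge 0$. In fact, $\beta_0 > 0$ as the strict inequality above would be violated for $x = x_0$. Eventually, we obtain that $f$ is minorized by the affine continuous function $g(x) = -\langle x - x_0 \mid x_0^*/\beta_0 \rangle + \alpha_0$. Thus, we conclude that $f^*$ is proper and that $f^{**}(x_0) \ge \alpha_0$.

The assertion (ii) is a direct consequence of the equivalence in (i). □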
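The inequality $f^{**} \le f$ of Lemma 9 and the identity $f^{**} = f$ for convex l.s.c. functions (Theorem 10) can be explored numerically. The following is a minimal sketch, not part of the original article: it assumes a brute-force discretization of the suprema on finite grids in dimension one, and the grids, the double-well test function and all names (`conjugate`, `double_well`, `quadratic`) are illustrative choices.

```python
# Sketch: discrete Moreau-Fenchel conjugate and biconjugate on a 1-D grid.
# Grids, test functions and names are illustrative assumptions.

def conjugate(points, values, slopes):
    """Discrete conjugate: for each slope s, max over grid points x of s*x - f(x)."""
    return [max(s * x - fx for x, fx in zip(points, values)) for s in slopes]

# Primal grid on [-2, 2]; dual grid of slopes on [-24, 24]
# (24 bounds |f'| for the double well below on [-2, 2]).
xs = [i / 100 for i in range(-200, 201)]
ss = [j / 20 for j in range(-480, 481)]

def double_well(x):
    return (x * x - 1.0) ** 2      # nonconvex: f(0) = 1, f(-1) = f(1) = 0

def quadratic(x):
    return 0.5 * x * x             # convex and continuous

f_vals = [double_well(x) for x in xs]
f_star = conjugate(xs, f_vals, ss)     # f* sampled on the slope grid
f_bi = conjugate(ss, f_star, xs)       # f** sampled back on the primal grid

g_vals = [quadratic(x) for x in xs]
g_bi = conjugate(ss, conjugate(xs, g_vals, ss), xs)

i0 = xs.index(0.0)
i_half = xs.index(0.5)
print("f(0)   =", f_vals[i0])          # nonconvex function at 0
print("f**(0) =", f_bi[i0])            # its discrete biconjugate at 0
print("g(0.5), g**(0.5) =", g_vals[i_half], g_bi[i_half])
```

On these grids the biconjugate of the double well stays below the function and drops strictly below it near 0, where convexity fails, while the convex quadratic is recovered at the sampled point. The discrete transform only approximates the continuous one, so agreement is expected up to the grid resolution.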

Theorem 11 Let $X$ be a normed space and let $f : X \to [0, +\infty]$ be a convex and proper function; assume that $f$ is continuous at 0, then
(i) $f^*$ achieves its minimum on $X^*$
(ii) $f(0) = f^{**}(0) = -\inf f^*$

Proof
(i) Let $M$ be an upper bound of $f$ on the ball $\{\|x\| \le R\}$. Then

\[ f^*(x^*) \ge \sup\{ \langle x \mid x^* \rangle - f(x) : \|x\| \le R \} \ge R\, \|x^*\|_{X^*} - M \]

Hence, for every $r$, the set $\{x^* \in X^* : f^*(x^*) \le r\}$ is bounded, thus $\tau$-relatively compact, where $\tau$ is the weak-star topology on $X^*$. By assertion (i) of Lemma 9, $f^*$ is $\tau$-l.s.c. and Theorem 4 applies.
(ii) By Theorem 10, since $f$ is convex, proper and l.s.c. at $x_0 = 0$, we have $f(0) = f^{**}(0) = -\inf f^*$. □

Some Useful Consequences

Proposition 12 (Conjugate of a sum). Let $f, g : X \to \mathbb{R} \cup \{+\infty\}$ be convex such that

\[ \exists x_0 \in X : f \text{ is continuous at } x_0 \text{ and } g(x_0) < +\infty \tag{2} \]

Then
(i) \[ (f + g)^*(x^*) = \inf_{x_1^* + x_2^* = x^*} \{ f^*(x_1^*) + g^*(x_2^*) \} \] (the equality holds in $\overline{\mathbb{R}}$).
(ii) If both sides of the equality in (i) are finite, then the infimum in the right-hand side is achieved.

Proof Without any loss of generality, we may assume that $x^* = 0$ (we reduce to this case by substituting $g$ with $g - \langle \cdot \mid x^* \rangle$). We let

\[ h(p) = \inf\{ f(x + p) + g(x) \mid x \in X \} \]

Noticing that $(p, x) \mapsto f(x + p) + g(x)$ is convex, we infer that $h(p)$ is convex as well. As $h$ is majorized by the function $p \mapsto f(x_0 + p) + g(x_0)$, which by [2] is continuous at 0, we deduce from Theorems 1 and 11 that $h(0) = h^{**}(0)$ and that $h^*$ achieves its infimum. Now $h(0) = \inf(f + g) = -(f + g)^*(0)$ and

\[ h^*(p^*) = \sup\{ \langle p \mid p^* \rangle - h(p) : p \in X \} = \sup\{ \langle p \mid p^* \rangle - f(x + p) - g(x) : x \in X,\ p \in X \} = g^*(-p^*) + f^*(p^*) \]

The assertions (i), (ii) follow since $h^{**}(0) = -\min h^* = -\min \{ g^*(-p^*) + f^*(p^*) \}$. □

Proposition 13 (Composition). Let $X$, $Y$ be two Banach spaces and $A : X \to Y$ a linear operator with dense domain $D(A)$. Let $\Phi : Y \to \mathbb{R} \cup \{+\infty\}$ be a convex l.s.c. function and let $F : X \to \mathbb{R} \cup \{+\infty\}$ be the convex functional defined by

\[ F(u) = \begin{cases} \Phi(Au) & \text{if } u \in D(A) \\ +\infty & \text{otherwise} \end{cases} \]

Assume that there exists $u_0 \in D(A)$ such that $\Phi$ is continuous at $Au_0$. Then
(i) The Fenchel conjugate of $F$ is given by

\[ \forall f \in X^*, \qquad F^*(f) = \inf\{ \Phi^*(\sigma) : \sigma \in Y^*,\ A^*\sigma = f \} \]

where, if both sides of the equality are finite, the infimum on the right-hand side is achieved.
(ii) If, in addition, $Y$ is reflexive and $\Phi$ is l.s.c. and coercive, we have

\[ \bar{F}(u) = F^{**}(u) = \inf\{ \Phi(p) \mid (u, p) \in \overline{G(A)} \} \tag{3} \]

where $G(A)$ denotes the graph of $A$ and the bar denotes its closure in $X \times Y$.

Proof
(i) Define $H, K : X \times Y \to \mathbb{R} \cup \{+\infty\}$ by

\[ H(u, p) = \chi_{G(A)}(u, p), \qquad K(u, p) = \Phi(p) \]

Then we have the identity $F^*(f) = (H + K)^*(f, 0)$, where the conjugate of $H + K$ is taken with respect to the duality $(X \times Y, X^* \times Y^*)$. From the assumption, $K$ is continuous at $(u_0, Au_0) \in \operatorname{dom} H$. By Proposition 12, we obtain

\[ (H + K)^*(f, 0) = \inf_{(g, \sigma) \in X^* \times Y^*} \{ K^*(f - g, -\sigma) + H^*(g, \sigma) \} \]

After a simple computation, it is easy to check that

\[ H^*(g, \sigma) = \begin{cases} 0 & \text{if } A^*\sigma = -g \\ +\infty & \text{otherwise} \end{cases} \qquad K^*(f - g, -\sigma) = \begin{cases} \Phi^*(-\sigma) & \text{if } g = f \\ +\infty & \text{otherwise} \end{cases} \]

so that the infimum reduces to $\inf\{ \Phi^*(-\sigma) : A^*\sigma = -f \}$, which is the claimed formula after changing $\sigma$ into $-\sigma$.
(ii) Let $J(u) := \inf\{ \Phi(p) : (u, p) \in \overline{G(A)} \}$. As observed for $F$ in the proof of (i), we have the identity $J^*(f) = (H + K)^*(f, 0)$. Therefore, in view of Theorem 10, $\bar{F} = F^{**} = J^{**}$ and it is enough to prove that $J$ is convex, l.s.c. and proper. Let us consider a sequence $(u_n)$ in $X$ converging to some $u \in X$. Without any loss of generality, we may assume that $\liminf J(u_n) = \lim J(u_n) < +\infty$. Then there is a sequence $(p_n)$ such that, for every $n$, $(u_n, p_n) \in \overline{G(A)}$ and $J(u_n) \ge \Phi(p_n) - 1/n$. As $\Phi$ is coercive, $\{p_n\}$ is bounded in the reflexive space $Y$ and possibly passing to a subsequence,

we may assume that $p_n$ converges weakly to some $p$. Since $\overline{G(A)}$ is a (weakly) closed subspace of $X \times Y$, we infer that $(u, p)$, as the limit of $(u_n, p_n)$, still belongs to $\overline{G(A)}$. Thus, we conclude, thanks to the (weak) lower-semicontinuity of $\Phi$,

\[ \liminf_n J(u_n) = \lim_n \Phi(p_n) \ge \Phi(p) \ge J(u) \]
□

An immediate consequence of Propositions 12 and 13 is the following variant:

Proposition 14 Under the same notation as in Proposition 13, let $\Psi : X \to \mathbb{R} \cup \{+\infty\}$ be a convex function and assume that there exists $u_0 \in D(A)$ such that $\Psi(u_0) < +\infty$ and $\Phi$ is continuous at $Au_0$. Then we have

\[ \inf_{u \in X} \{ \Psi(u) + \Phi(Au) \} = \sup_{\sigma \in Y^*} \{ -\Psi^*(-A^*\sigma) - \Phi^*(\sigma) \} \]

where the supremum on the right-hand side is achieved. Furthermore, a pair $(\bar{u}, \bar{\sigma})$ is optimal if and only if it satisfies the relations: $\bar{\sigma} \in \partial\Phi(A\bar{u})$ and $-A^*\bar{\sigma} \in \partial\Psi(\bar{u})$.

Remark 15 From the assertion (ii) of Proposition 13, we may conclude that $F$ is l.s.c. whenever the operator $A$ is closed. If now $A$ is merely closable (with closure denoted by $\bar{A}$), we obtain

\[ \bar{F}(u) = \begin{cases} \Phi(\bar{A}u) & \text{if } u \in \operatorname{dom} \bar{A} \\ +\infty & \text{otherwise} \end{cases} \]

This is the typical situation when $F$ is an integral functional defined on smooth functions of the kind

\[ F(u) = \int_\Omega f(x, \nabla u)\, dx \]

where $\Omega$ is a bounded open subset of $\mathbb{R}^n$, $f : \Omega \times \mathbb{R}^n \to \mathbb{R}$ is a convex integrand with quadratic growth (i.e., $c|z|^2 \le f(x, z) \le C(1 + |z|^2)$ for suitable $C \ge c > 0$). Then $X = L^2(\Omega)$, $Y = L^2(\Omega; \mathbb{R}^n)$, $\Phi = G$ with

\[ G(v) = \int_\Omega f(x, v(x))\, dx \]

and $A : u \in C^1(\overline{\Omega}) \mapsto \nabla u \in L^2(\Omega; \mathbb{R}^n)$. It turns out that $A$ is closable and that the domain of $\bar{A}$ characterizes the Sobolev space $W^{1,2}(\Omega)$, on which $\bar{A}$ coincides with the distributional gradient operator.

The situation is more involved if we consider

\[ F(u) = \int_{\overline{\Omega}} f(x, \nabla u)\, d\mu \]

where $\mu$ is a possibly concentrated Radon measure supported on $\overline{\Omega}$. In general, the operator $A : u \in C^1(\overline{\Omega}) \subset L^2_\mu(\Omega) \mapsto \nabla u \in L^2_\mu(\Omega; \mathbb{R}^n)$ is not closable and we need to come back to the general formula [3]. The general structure of $\overline{G(A)}$ has been given in Bouchitté et al. (1997) and Bouchitté and Fragalà (2002, 2003), namely

\[ (u, \xi) \in \overline{G(A)} \iff u \in W^{1,2}_\mu,\ \exists\, \eta \in L^2_\mu(\Omega; \mathbb{R}^n) : \ \xi = \nabla_\mu u + \eta,\ \ \eta(x) \in T_\mu(x)^\perp \]

where $T_\mu(x)$, $\nabla_\mu$ are suitable notions of tangent space and tangential gradient with respect to $\mu$, and $W^{1,2}_\mu$ denotes the domain of the extended tangential gradient operator.

Remark 16 The assertion (ii) of Proposition 13 is not valid in the nonreflexive case. In particular, for

\[ F(u) = \int_\Omega f(x, \nabla u)\, dx \]

where $f(x, \cdot)$ has a linear growth at infinity, we need to take $Y$ as the space of $\mathbb{R}^n$-valued vector measures on $\overline{\Omega}$ and the relaxed functional $\bar{F}$ needs to be identified on the space $BV(\Omega)$ of integrable functions with bounded variation. The computation of $\bar{F}$ is a delicate problem for which we refer to Bouchitté and Dal Maso (1993) and Bouchitté and Valadier (1998).

Remark 17 By duality techniques, it is possible also to handle variational integrals of the kind

\[ F(u) = \int_\Omega f(x, u(x), \nabla u(x))\, dx \]

even if the dependence of $f(x, u, z)$ with respect to $u$ is nonconvex. The idea consists in embedding the space $BV(\Omega)$ in the larger space $BV(\Omega \times \mathbb{R})$ through the map $u \mapsto 1_u$, where $1_u$ is the characteristic function defined on $\Omega \times \mathbb{R}$ by setting

\[ 1_u(x, t) := \begin{cases} 1 & \text{if } u(x) > t \\ 0 & \text{otherwise} \end{cases} \]

Then it is possible to show, under suitable conditions on the integrand $f$, that there exists a convex, l.s.c., 1-homogeneous functional $G : BV(\Omega \times \mathbb{R}) \to \mathbb{R} \cup \{+\infty\}$ such that $F(u) = G(1_u)$. This functional $G$ is constructed as in Example 3, taking $C$ to be a suitable convex subset of $C_0(\Omega \times \mathbb{R})$. This idea has been the key tool of the calibration method developed recently (Alberti et al. 2003).

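Convex Variational Problems in Duality

Finite-Dimensional Case

We sketch the duality scheme in two cases.

Linear programming Let $c \in \mathbb{R}^n$, $b \in \mathbb{R}^m$ and $A$ an $m \times n$ matrix. We denote by $A^T$ the transpose matrix. We consider the linear program

\[ (P) \qquad \inf\{ (c \mid x) : x \ge 0,\ Ax \ge b \} \]

and its perturbed version ($p \in \mathbb{R}^m$)

\[ h(p) := \inf\{ (c \mid x) : x \ge 0,\ Ax + p \ge b \} \]

An easy computation gives

\[ \forall y \in \mathbb{R}^m, \qquad h^*(y) = \begin{cases} (b \mid y) & \text{if } A^T y + c \ge 0,\ y \le 0 \\ +\infty & \text{otherwise} \end{cases} \tag{4} \]

Lemma 18 Assume that $\inf(P)$ is finite. Then:
(i) $h$ is convex, proper and l.s.c. at 0.
(ii) $(P)$ has at least one solution.

Proof We introduce the $(m+1) \times (n+m)$ matrix $B$ defined by

\[ B := \begin{pmatrix} c^T & 0 \\ A & -I_m \end{pmatrix} \]

($I_m$ is the $m$-dimensional identity matrix). Denote by $\{b_1, b_2, \dots, b_{n+m}\} \subset \mathbb{R}^{m+1}$ the columns of $B$ and by $K$ the convex cone $K := \{ \sum_{j=1}^{n+m} \lambda_j b_j : \lambda_j \ge 0 \}$. By Farkas lemma, this cone $K$ is closed.

(i) Let $\beta := \liminf\{ h(p) : p \to 0 \}$. We have to prove that $\beta \ge h(0) = \inf(P)$. Let $\{p_\varepsilon\}$ be a sequence in $\mathbb{R}^m$ such that $p_\varepsilon \to 0$ and $h(p_\varepsilon) \to \beta$. By the definition of $h$, we may choose $x_\varepsilon \ge 0$ such that $Ax_\varepsilon + p_\varepsilon \ge b$ and $(c \mid x_\varepsilon) \to \beta$. Then we see that the column vector $\tilde{x}_\varepsilon$ associated with $(x_\varepsilon, Ax_\varepsilon + p_\varepsilon - b) \in \mathbb{R}^{n+m}_+$ satisfies $B\tilde{x}_\varepsilon \in K$ and

\[ B\tilde{x}_\varepsilon \to \begin{pmatrix} \beta \\ b \end{pmatrix} \]

Therefore,

\[ \begin{pmatrix} \beta \\ b \end{pmatrix} \in K \]

and there exists $\tilde{x} = (x, x')$ such that $x \ge 0$, $x' \ge 0$, $(c \mid x) = \beta$ and $Ax - x' = b$. It follows that $x$ is admissible for $(P)$ and then $(c \mid x) = \beta \ge h(0)$.
(ii) We repeat the proof of (i) choosing $p_\varepsilon = 0$ so that $\beta = \inf(P)$. □

Thanks to the assertion (i) in Lemma 18, we deduce from Theorem 10 that $\inf(P) = h(0) = h^{**}(0) = \sup(-h^*)$. Recalling [4], we therefore consider the dual problem:

\[ (P^*) \qquad \sup\{ b \cdot y : y \ge 0,\ A^T y - c \le 0 \} \]

Theorem 19 The following assertions are equivalent:
(i) $(P)$ has a solution.
(ii) $(P^*)$ has a solution.
(iii) There exists $(x_0, y_0) \in \mathbb{R}^n_+ \times \mathbb{R}^m_+$ such that $Ax_0 \ge b$, $A^T y_0 - c \le 0$.
In this case, we have $\min(P) = \max(P^*)$ and an admissible pair $(\bar{x}, \bar{y})$ is optimal if and only if $c \cdot \bar{x} = b \cdot \bar{y}$ or, equivalently, satisfies the complementarity relations: $(A\bar{x} - b) \cdot \bar{y} = (A^T \bar{y} - c) \cdot \bar{x} = 0$.

Convex programming Let $f, g_1, \dots, g_m : X \to \mathbb{R}$ be convex l.s.c. functions and consider the optimization problem

\[ (P) \qquad \inf\{ f(x) : g_j(x) \le 0,\ j = 1, 2, \dots, m \} \]

Here $X = \mathbb{R}^n$ or any Banach space. As before, we introduce the value function

\[ p \in \mathbb{R}^m, \qquad h(p) := \inf\{ f(x) : g_j(x) + p_j \le 0,\ j \in \{1, 2, \dots, m\} \} \]

and compute its Fenchel conjugate:

\[ \lambda \in \mathbb{R}^m, \qquad h^*(\lambda) = \begin{cases} -\inf_{x \in X} L(x, \lambda) & \text{if } \lambda \ge 0 \\ +\infty & \text{otherwise} \end{cases} \]

where $L(x, \lambda) := f(x) + \sum_{j=1}^m \lambda_j g_j(x)$ is the so-called Lagrangian. We notice that $h$ is convex and that the equality $h(0) = h^{**}(0)$ is equivalent to the zero-duality gap relation

\[ \inf_x \sup_{\lambda \ge 0} L(x, \lambda) = \sup_{\lambda \ge 0} \inf_x L(x, \lambda) \]

This condition is fulfilled, in particular, if we make the following qualification assumption (ensuring that $h$ is continuous at 0 and Theorem 11 applies):

\[ \exists x_0 \in X : f \text{ continuous at } x_0,\ g_j(x_0) < 0,\ \forall j \tag{5} \]

Theorem 20 Assume that [5] holds. Then $\bar{x}$ is optimal for $(P)$ if and only if there exist Lagrangian multipliers $\lambda_1, \lambda_2, \dots, \lambda_m$ in $\mathbb{R}_+$ such that

\[ \bar{x} \in \operatorname{argmin}\Big( f + \sum_j \lambda_j g_j \Big), \qquad \lambda_j g_j(\bar{x}) = 0,\ \forall j \]

Notice that the existence of such a solution $\bar{x}$ is ensured if, for example, $X = \mathbb{R}^n$ and if, for some $k > 0$, the function $f + k \sum_j g_j$ is coercive.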
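The linear programming statement of Theorem 19 can be checked by hand on a toy instance. The following is a minimal sketch, not part of the original article: the data `c`, `A`, `b` are made up, and the solver is a naive vertex enumeration that only works for two variables and assumes that the optimum is attained at a vertex of the feasible polygon. Exact rational arithmetic is used so that the equality of the optimal values can be tested without rounding.

```python
# Sketch: check LP duality and complementarity (Theorem 19) on a toy problem.
# The data c, A, b and the vertex-enumeration solver are illustrative assumptions.
from fractions import Fraction
from itertools import combinations

def feasible_vertices(G, h):
    """Vertices of the polygon {z in R^2 : G z >= h}, by pairwise line intersections."""
    rows = list(zip(G, h))
    verts = []
    for (g1, h1), (g2, h2) in combinations(rows, 2):
        det = g1[0] * g2[1] - g1[1] * g2[0]
        if det == 0:
            continue  # parallel constraint lines: no intersection point
        # Cramer's rule for g1.z = h1, g2.z = h2
        z = (Fraction(h1 * g2[1] - h2 * g1[1], det),
             Fraction(g1[0] * h2 - g2[0] * h1, det))
        if all(g[0] * z[0] + g[1] * z[1] >= r for g, r in rows):
            verts.append(z)
    return verts

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

c = (2, 3)
A = ((1, 1), (1, 2))
b = (3, 2)
AT = tuple(zip(*A))  # transpose of A

# (P): inf { c.x : x >= 0, A x >= b }, written as G x >= h.
primal = feasible_vertices([(1, 0), (0, 1), *A], [0, 0, *b])
x_opt = min(primal, key=lambda x: dot(c, x))

# (P*): sup { b.y : y >= 0, A^T y - c <= 0 }, written as G y >= h.
neg_AT = [tuple(-a for a in row) for row in AT]
dual = feasible_vertices([(1, 0), (0, 1), *neg_AT], [0, 0, *(-ci for ci in c)])
y_opt = max(dual, key=lambda y: dot(b, y))

primal_value = dot(c, x_opt)
dual_value = dot(b, y_opt)
slack_primal = [dot(row, x_opt) - bi for row, bi in zip(A, b)]   # A x - b
slack_dual = [dot(row, y_opt) - ci for row, ci in zip(AT, c)]    # A^T y - c

print("min (P)  =", primal_value, "at x =", x_opt)
print("max (P*) =", dual_value, "at y =", y_opt)
print("complementarity:", dot(slack_primal, y_opt), dot(slack_dual, x_opt))
```

On this instance the two optimal values coincide and both complementarity products vanish, even though one primal constraint and one dual constraint are inactive at the optimum. For larger problems a proper LP solver would replace the enumeration; the duality check itself is unchanged.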

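Primal–Dual Formulations in Mechanics

We present here the example of elasticity which motivated the pioneering work by J J Moreau on convex duality techniques. Further examples can be found in Ekeland and Temam (1976). An elastic body is placed in a bounded domain $\Omega \subset \mathbb{R}^n$ whose boundary $\Gamma$ consists of two disjoint parts $\Gamma = \Gamma_0 \cup \Gamma_1$. The unknown $u : \Omega \to \mathbb{R}^n$ (deformation) satisfies a Dirichlet condition $u = 0$ on $\Gamma_0$, where the body is clamped. The system is subjected to a surface load $g \in L^2(\Gamma_1; \mathbb{R}^n)$ and to a volumic load $f \in L^2(\Omega; \mathbb{R}^n)$. The static equilibrium problem has the following variational formulation:

\[ (P) \qquad \inf_{u = 0 \text{ on } \Gamma_0} \left\{ \int_\Omega j(x, e(u))\, dx - \int_\Omega f \cdot u\, dx - \int_{\Gamma_1} g \cdot u\, d\mathcal{H}^{n-1} \right\} \]

where $e(u) := (1/2)(u_{i,j} + u_{j,i})$ denotes the symmetric strain tensor and $j : (x, z) \in \Omega \times \mathbb{R}^{n^2}_{\mathrm{sym}} \to \mathbb{R}$ is a convex integrand representing the local elastic behavior of the material. We assume a quadratic growth as in Remark 15 (in the case of linear elasticity, an isotropic homogeneous material is characterized by the quadratic form

\[ j(x, z) = \frac{\lambda}{2} |\operatorname{tr}(z)|^2 + \mu |z|^2 \]

$\lambda$, $\mu$ being the Lamé constants).

We apply Proposition 14 with $X = W^{1,2}(\Omega; \mathbb{R}^n)$, $Y = L^2(\Omega; \mathbb{R}^{n^2}_{\mathrm{sym}})$, $Au = e(u)$ and where we set

\[ \Psi(u) = \begin{cases} -\int_\Omega f \cdot u\, dx - \int_{\Gamma_1} g \cdot u\, d\mathcal{H}^{n-1} & \text{if } u = 0 \text{ on } \Gamma_0 \\ +\infty & \text{otherwise} \end{cases} \qquad \Phi(v) = \int_\Omega j(x, v)\, dx \]

After some computations, we may write the supremum appearing in Proposition 14 as our dual problem

\[ (P^*) \qquad \sup\left\{ -\int_\Omega j^*(x, \sigma)\, dx : \sigma \in L^2(\Omega; \mathbb{R}^{n^2}_{\mathrm{sym}}),\ -\operatorname{div} \sigma = f \text{ on } \Omega,\ \sigma \cdot n = g \text{ on } \Gamma_1 \right\} \]

where $j^*$ is the Moreau–Fenchel conjugate with respect to the second argument and $n(x)$ denotes the exterior unit normal on $\Gamma$. The matrix-valued map $\sigma$ is called the stress tensor and $j^*$ the stress potential. Note that the boundary conditions for $\sigma \cdot n$ have to be understood in the sense of traces.

Theorem 21 The problems $(P)$ and $(P^*)$ have solutions and we have the equality: $\inf(P) = \sup(P^*)$. Furthermore, a pair $(\bar{u}, \bar{\sigma})$ is optimal if and only if it satisfies the following system:

\[ \begin{cases} -\operatorname{div} \bar{\sigma} = f & \text{on } \Omega & \text{(equilibrium)} \\ \bar{\sigma}(x) \in \partial j(x, e(\bar{u})) & \text{a.e. on } \Omega & \text{(constitutive law)} \\ \bar{u} = 0 & \text{a.e. on } \Gamma_0 \\ \bar{\sigma} \cdot n = g & \text{on } \Gamma_1 \end{cases} \]

Duality in Mass Transport Problems

General Cost Functions

Let $X$, $Y$ be compact metric spaces and $c : X \times Y \to [0, +\infty)$ a continuous cost function. We denote by $\mathcal{P}(X)$, $\mathcal{P}(X \times Y)$ the sets of probability measures on $X$ and $X \times Y$, respectively. Given two elements $\mu \in \mathcal{P}(X)$, $\nu \in \mathcal{P}(Y)$, we denote by $\Gamma(\mu, \nu)$ the subset of probability measures in $\mathcal{P}(X \times Y)$ whose marginals are, respectively, $\mu$ and $\nu$. Identified as a subset of $(C_0(X \times Y))^*$ (the space of signed Radon measures on $X \times Y$), it is convex and weakly-star compact. The Monge–Kantorovich formulation of the mass transport problem reads as follows:

\[ T_c(\mu, \nu) := \inf\left\{ \int_{X \times Y} c(x, y)\, \gamma(dx\,dy) : \gamma \in \Gamma(\mu, \nu) \right\} \tag{6} \]

This formulation, where the infimum is achieved (as we minimize an l.s.c. functional on a compact set for the weak star topology), is already a relaxation of the initial Monge mass transport problem,

\[ \inf_T \left\{ \int_X c(x, Tx)\, \mu(dx) : T^{\#}(\mu) = \nu \right\} \]

where the infimum is searched among all transport maps $T : X \to Y$ pushing forward $\mu$ on $\nu$ (i.e., such that $\mu(T^{-1}(B)) = \nu(B)$ for all Borel subsets $B \subset Y$). This is equivalent to restricting the infimum in [6] to the subclass $\{\gamma_T\} \subset \Gamma(\mu, \nu)$, where

\[ \langle \gamma_T, \varphi(x, y) \rangle := \int_X \varphi(x, Tx)\, \mu(dx) \]

In order to find a dual problem for [6], we fix $\nu \in \mathcal{P}(Y)$ and consider the functional $F : \mathcal{M}_b(X) \to [0, +\infty]$ defined by

\[ F(\mu) = \begin{cases} T_c(\mu, \nu) & \text{if } \mu \ge 0,\ \mu(X) = 1 \\ +\infty & \text{otherwise} \end{cases} \]

($\mathcal{M}_b(X)$ denotes the Banach space of (bounded) signed Radon measures on $X$).

Lemma 22 $F$ is convex, weakly-star l.s.c. and proper. Its Moreau–Fenchel conjugate is given by

\[ \forall \varphi \in C_0(X), \qquad F^*(\varphi) = -\int_Y \varphi^c(y)\, \nu(dy) \]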

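where

\[ \varphi^c(y) := \inf\{ c(x, y) - \varphi(x) : x \in X \} \]

Proof The convexity property is obvious and the properness follows from the fact that

\[ F(\mu) \le \int_{X \times Y} c(x, y)\, \mu \otimes \nu(dx\,dy) \]

Let $\mu_n$ be such that $\mu_n \rightharpoonup \mu$ (weakly star). We may assume that $\liminf_n F(\mu_n) = \lim_n F(\mu_n) := \beta$ is finite. Then $\mu_n$ and the associated optimal $\gamma_n$ are probability measures on $X$ and on $X \times Y$, respectively. As $X$ and $Y$ are compact, possibly passing to a subsequence, we may assume that $\gamma_n \rightharpoonup \gamma$, and clearly we have $\gamma \in \Gamma(\mu, \nu)$. Since $c(x, y)$ is l.s.c. and non-negative, we conclude that

\[ \liminf_n F(\mu_n) = \liminf_n \int_{X \times Y} c(x, y)\, \gamma_n(dx\,dy) \ge \int_{X \times Y} c(x, y)\, \gamma(dx\,dy) \ge F(\mu) \]

Let us compute now $F^*(\varphi)$. We have

\[ \begin{aligned} F^*(\varphi) &= -\inf\left\{ \int_{X \times Y} c(x, y)\, \gamma(dx\,dy) - \int_X \varphi\, d\mu : \mu \in \mathcal{P}(X),\ \gamma \in \Gamma(\mu, \nu) \right\} \\ &= -\inf\left\{ \int_{X \times Y} (c(x, y) - \varphi(x))\, \gamma(dx\,dy) : \mu \in \mathcal{P}(X),\ \gamma \in \Gamma(\mu, \nu) \right\} \\ &\le -\int_Y \varphi^c(y)\, \nu(dy) \end{aligned} \]

To prove that the last inequality is actually an equality, we observe that, for every $y \in Y$ and $\varphi \in C_0(X)$, the minimum of the l.s.c. function $c(\cdot, y) - \varphi$ is attained on the compact set $X$ and there exists a Borel selection map $S(y)$ such that $\varphi^c(y) = c(S(y), y) - \varphi(S(y))$ for all $y \in Y$. We obtain the desired equality by choosing $\gamma$ defined, for every test function $\psi$, by

\[ \int_{X \times Y} \psi(x, y)\, \gamma(dx\,dy) := \int_Y \psi(S(y), y)\, \nu(dy) \]
□

We observe that, for every $\varphi \in C_0(X)$, the function $\varphi^c$ introduced in Lemma 22 is continuous (use the uniform continuity of $c$) and therefore the pair $(\varphi, \varphi^c)$ belongs to the class

\[ \mathcal{F}_c := \{ (\varphi, \psi) \in C_0(X) \times C_0(Y) : \varphi(x) + \psi(y) \le c(x, y) \} \]

Let us introduce the dual problem of [6]:

\[ \sup\left\{ \int_X \varphi\, d\mu + \int_Y \psi\, d\nu : (\varphi, \psi) \in \mathcal{F}_c \right\} \tag{7} \]

We will say that $(\varphi, \psi) \in \mathcal{F}_c$ is a pair of c-concave conjugate functions if $\psi = \varphi^c$ and $\psi^c = \varphi$ (where symmetrically $\psi^c(x) := \inf\{ c(x, y) - \psi(y) : y \in Y \}$). Checking the latter condition amounts to verifying that $\varphi$ enjoys the so-called c-concavity property $\varphi^{cc} = \varphi$ (in general, we have only $\varphi^{cc} \ge \varphi$, whereas $\varphi^{ccc} = \varphi^c$). We refer for instance to Villani (2003) for further details about this c-duality.

Now, by exploiting Theorem 10 and Lemma 22, we obtain a very simple proof of the Kantorovich duality theorem:

Theorem 23 The following duality formula holds:

\[ T_c(\mu, \nu) = \sup\left\{ \int_X \varphi\, d\mu + \int_Y \psi\, d\nu : (\varphi, \psi) \in \mathcal{F}_c \right\} \]

Moreover, the supremum in the right-hand side member is achieved by a pair $(\bar{\varphi}, \bar{\psi})$ of conjugate c-concave functions such that, for any optimal $\bar{\gamma}$ in [6], there holds $\bar{\varphi}(x) + \bar{\psi}(y) = c(x, y)$, $\bar{\gamma}$-a.e.

Proof By Theorem 10 and Lemma 22, we have

\[ \begin{aligned} T_c(\mu, \nu) = F^{**}(\mu) &= \sup\left\{ \int_X \varphi\, d\mu + \int_Y \varphi^c\, d\nu : \varphi \in C_0(X) \right\} \\ &\le \sup\left\{ \int_X \varphi\, d\mu + \int_Y \psi\, d\nu : (\varphi, \psi) \in \mathcal{F}_c \right\} \le T_c(\mu, \nu) \end{aligned} \]

where the last inequality follows from the definition of $\mathcal{F}_c$. Therefore, $\inf[6] = \sup[7]$. Furthermore, on the right-hand side of the first equality, we increase the supremum by substituting $\varphi$ with $\varphi^{cc}$ (recall that $\varphi^{ccc} = \varphi^c$). Thus,

\[ \sup[7] = \sup\left\{ \int_X \varphi\, d\mu + \int_Y \varphi^c\, d\nu : \varphi \in C_0(X),\ \varphi \text{ c-concave} \right\} \]

Take a maximizing sequence $(\varphi_n, \varphi_n^c)$ of c-concave conjugate functions. It is easy to check that $\{\varphi_n\}$ is equicontinuous on $X$: this follows from the c-concavity property and from the uniform continuity of $c$ (observe that $\varphi_n(x_1) - \varphi_n(x_2) = \varphi_n^{cc}(x_1) - \varphi_n^{cc}(x_2) \le \sup_Y \{ c(x_1, \cdot) - c(x_2, \cdot) \}$). Then, by Ascoli's theorem, possibly passing to subsequences, we may assume that $\varphi_n - c_n$ converges uniformly to some continuous function $\bar{\varphi}$, where $\{c_n\}$ is a suitable sequence of reals. Then, one checks that $\bar{\varphi}$ is still c-concave and that $(\varphi_n - c_n)^c = \varphi_n^c + c_n$ converges uniformly to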
$\bar{\varphi}^c$. Thus, recalling that $\mu(X) = \nu(Y) = 1$, we deduce that

\[ \begin{aligned} \sup[7] &= \lim_n \left( \int_X \varphi_n\, d\mu + \int_Y \varphi_n^c\, d\nu \right) = \lim_n \left( \int_X (\varphi_n - c_n)\, d\mu + \int_Y (\varphi_n^c + c_n)\, d\nu \right) \\ &= \int_X \bar{\varphi}\, d\mu + \int_Y \bar{\varphi}^c\, d\nu \end{aligned} \]

The last assertion is a consequence of the extremality relation:

\[ 0 = \inf[6] - \sup[7] = \int_{X \times Y} \big( c(x, y) - \bar{\varphi}(x) - \bar{\psi}(y) \big)\, \bar{\gamma}(dx\,dy) \]
□

Remark 24
(i) In their discrete version (i.e., $\mu$, $\nu$ are atomic measures), problems [6] and [7] can be seen as particular linear programming problems (see the section "Finite-Dimensional Case").
(ii) The case $X = Y \subset \mathbb{R}^n$ and $c(x, y) = (1/2)|x - y|^2$ is important. In this case, the notion of c-concavity is linked to convexity and the Fenchel transform since, for every $\varphi \in C_0(X)$, one has

\[ \frac{|\cdot|^2}{2} - \varphi^c = \left( \frac{|\cdot|^2}{2} - \varphi \right)^{*} \]

Then if $(\bar{\varphi}, \bar{\varphi}^c)$ is a solution of [7], we find that

\[ \varphi_0(x) := \frac{|x|^2}{2} - \bar{\varphi}(x) \]

is convex and continuous, and that the extremality condition $\bar{\varphi}(x) + \bar{\varphi}^c(y) = c(x, y)$ is equivalent to the Fenchel equality $\varphi_0(x) + \varphi_0^*(y) = (x \mid y)$. Therefore, any optimal $\bar{\gamma}$ is supported in the graph of the subdifferential map $\partial\varphi_0$. In the case where $\mu$ is absolutely continuous with respect to the Lebesgue measure, it is then easy to deduce that the optimal $\bar{\gamma}$ is unique and that $\bar{\gamma} = \gamma_{T_0}$, where $T_0 = \nabla\varphi_0$ is the unique gradient (a.e. defined) of a convex function such that $\nabla\varphi_0^{\#}(\mu) = \nu$. This is a celebrated result by Y Brenier (see, e.g., the monographs by Evans (1997) and Villani (2003)).

The Distance Case

In the following, we assume that $X = Y$ and that $c(x, y)$ is a semidistance. As an immediate consequence of the triangular inequality, we have the following equivalence:

\[ \varphi \text{ c-concave} \iff \varphi(x) - \varphi(y) \le c(x, y),\ \forall (x, y) \iff \varphi^c = -\varphi \]

Let us denote $\mathrm{Lip}_1(X) := \{ u \in C_0(X) : u(x) - u(y) \le c(x, y) \}$. The first assertion of Theorem 23 becomes the Kantorovich–Rubinstein duality formula:

\[ T_c(\mu, \nu) = \max\left\{ \int_X u\, d(\mu - \nu) : u \in \mathrm{Lip}_1(X) \right\} \tag{8} \]

As it appears, $T_c(\mu, \nu)$ depends only on the difference $f = \mu - \nu$, which belongs to the space $\mathcal{M}_0(X)$ of signed measures on $X$ with zero average. Defining $N(f) := T_c(f^+, f^-)$ provides a seminorm (Kantorovich norm) on $\mathcal{M}_0(X)$ (it turns out that $\mathcal{M}_0(X)$ is not complete and that in general its completion is a strict subspace of the dual of $\mathrm{Lip}(X)$).

We will now specialize to the case where $X$ is a compact manifold equipped with a geodesic distance. This will allow us to link the original problem to another primal–dual formulation closer to that considered in the section "Primal–Dual Formulations in Mechanics" and leading to a connection with partial differential equations. As a model example, let us assume that $X = \overline{\Omega}$, where $\Omega$ is a bounded connected open subset of $\mathbb{R}^n$ with a Lipschitz boundary. Let $\Sigma \subset \overline{\Omega}$ be a compact subset (on which the transport will have zero cost) and define

\[ c(x, y) := \inf\left\{ \mathcal{H}^1(S \setminus \Sigma) : S \text{ Lipschitz curve joining } x \text{ to } y,\ S \subset \overline{\Omega} \right\} \tag{9} \]

where $\mathcal{H}^1$ denotes the one-dimensional Hausdorff measure (length). It is easy to check that

\[ c(x, y) = \min\{ \delta_\Omega(x, y),\ \delta_\Omega(x, \Sigma) + \delta_\Omega(y, \Sigma) \} \]

where $\delta_\Omega(x, y)$ is the geodesic distance on $\overline{\Omega}$ (induced by the Euclidean norm). Furthermore, the following characterization holds:

\[ u \in \mathrm{Lip}_1(X) \iff u \in W^{1,\infty}(\Omega),\ |\nabla u| \le 1 \text{ a.e. in } \Omega,\ u = \text{cte on } \Sigma \tag{10} \]

Since $f := \mu - \nu$ is balanced, the value of the constant on $\Sigma$ in [10] is irrelevant and can be set to 0. Thus we may rewrite the right-hand side member of [8] in an equivalent way as

\[ \max\left\{ \int_{\overline{\Omega}} u\, df : u \in W^{1,\infty}(\Omega),\ |\nabla u| \le 1 \text{ a.e. on } \Omega,\ u = 0 \text{ on } \Sigma \right\} \tag{11} \]

We will now derive a new dual problem for [11] by using Proposition 14. To this aim, we consider
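$X = C^1(\overline{\Omega})$ (as a closed subspace of $W^{1,\infty}(\Omega)$), $Y = C_0(\overline{\Omega}; \mathbb{R}^n)$, $Y^* = \mathcal{M}_b(\overline{\Omega}; \mathbb{R}^n)$ and the operator $A : u \in X \mapsto \nabla u \in Y$.

Theorem 25 Let $\mu, \nu \in \mathcal{P}(\overline{\Omega})$, $f = \mu - \nu$ and $c$ defined by [9]. Then,

\[ T_c(\mu, \nu) = \min\left\{ \int_{\overline{\Omega}} |\sigma| : \sigma \in \mathcal{M}_b(\overline{\Omega}; \mathbb{R}^n),\ -\operatorname{div} \sigma = f \text{ on } \overline{\Omega} \setminus \Sigma \right\} \tag{12} \]

where the divergence condition is intended in the sense that

\[ \int_{\overline{\Omega}} \sigma \cdot \nabla\varphi = \int_{\overline{\Omega}} \varphi\, df \]

for all $\varphi \in C^\infty$ compactly supported in $\mathbb{R}^n \setminus \Sigma$.

Proof (sketch) We apply Proposition 14 with $\Psi(u) = -\int_{\overline{\Omega}} u\, df$ if $u = 0$ on $\Sigma$ ($+\infty$ otherwise), $A = \nabla$, and $\Phi(v) = 0$ if $|v| \le 1$ on $\overline{\Omega}$ ($+\infty$ otherwise). We obtain that the minimum $\beta$ in [12] is reached and that $\beta = -\alpha$, where

\[ \alpha := \inf\left\{ -\int_{\overline{\Omega}} u\, df : u \in C^1(\overline{\Omega}),\ |\nabla u| \le 1 \text{ on } \overline{\Omega},\ u = 0 \text{ on } \Sigma \right\} \]

To prove that $\beta = T_c(\mu, \nu) = \sup[11]$, we consider a maximizer $\bar{u}$ in [11] and prove that it can be approximated uniformly by a sequence $\{u_n\}$ of functions in $C^1(\overline{\Omega})$ which satisfy the same constraints. This technical part is done by truncation and convolution arguments (we refer to Bouchitté et al. (2003) for details). □

Remark 26 By localizing the integral identity associated with [12], it is possible to deduce the optimality conditions which characterize optimal pairs $(\bar{u}, \bar{\sigma})$ for [11], [12] (without requiring any regularity). This is done by using a weak notion of tangential gradient with respect to a measure (see Bouchitté et al. (1997) and Bouchitté and Fragalà (2002)). If $\bar{\sigma} = \sigma\, dx$ where $\sigma \in L^1(\Omega; \mathbb{R}^n)$ and if $\Sigma \subset \partial\Omega$, then we find that $\sigma = a \nabla\bar{u}$, where the pair $(\bar{u}, a)$ solves the following system:

\[ \begin{cases} -\operatorname{div}(a \nabla\bar{u}) = f & \text{on } \Omega & \text{(diffusion equation)} \\ |\nabla\bar{u}| = 1 & \text{a.e. on } \{a > 0\} & \text{(eikonal equation)} \\ \bar{u} = 0 & \text{a.e. on } \Sigma \\ \dfrac{\partial\bar{u}}{\partial n} = 0 & \text{on } \partial\Omega \setminus \Sigma \end{cases} \]

Remark 27 Given a solution $\bar{\gamma}$ for [6], we can construct a solution $\bar{\sigma}$ for [12] by selecting for every $(x, y) \in \operatorname{spt}(\bar{\gamma})$ a geodesic curve $S_{xy}$ joining $x$ and $y$ (possibly passing through the free-cost zone $\Sigma$) and by setting, for every test function $\varphi$:

\[ \langle \bar{\sigma}, \varphi \rangle := \int_{\overline{\Omega} \times \overline{\Omega}} \left( \int_{S_{xy}} \varphi \cdot \tau_{S_{xy}}\, d\mathcal{H}^1 \right) \bar{\gamma}(dx\,dy) \]

where $\tau_{S_{xy}}$ denotes the unit oriented tangent vector (see Bouchitté and Buttazzo (2001)). It is also possible to show (see Ambrosio (2003)) that any solution $\bar{\sigma}$ can be represented as before through a particular solution $\bar{\gamma}$. As a consequence, any solution $\bar{\sigma}$ of [12] is supported in the geodesic envelope of the set $\operatorname{spt}(\mu) \cup \operatorname{spt}(\nu) \cup \Sigma$. However, we stress the fact that, in general, there is no uniqueness at all of the optimal triple $(\bar{\gamma}, \bar{u}, \bar{\sigma})$ for [6], [11] and [12].

Remark 28 An approximation procedure for particular solutions of problems [11], [12] can be obtained by solving a p-Laplace equation and then by sending $p$ to infinity. Precisely, consider the solution $u_p \in W^{1,p}(\Omega)$ of

\[ \begin{cases} -\operatorname{div}(|\nabla u|^{p-2} \nabla u) = f & \text{on } \overline{\Omega} \setminus \Sigma \\ u = 0 & \text{on } \Sigma \end{cases} \]

which, for $p > n$, exists (due to the compact embedding $W^{1,p}(\Omega) \subset C_0(\overline{\Omega})$) and is unique. In Bouchitté et al. (2003) it is proved that the sequence $\{(u_p, \sigma_p)\}$, where $\sigma_p = |\nabla u_p|^{p-2} \nabla u_p$, is relatively compact in $C_0(\overline{\Omega}) \times \mathcal{M}_b(\overline{\Omega}; \mathbb{R}^n)$ (weakly star with respect to the second component) and that every cluster point $(\bar{u}, \bar{\sigma})$ solves [11], [12]. It is an open problem to know whether or not such a cluster point is unique. If the answer is yes, the process described above would select one optimal pair among all possible solutions. As far as problem [11] is concerned, this problem is connected with the theory of viscosity solutions for the infinite Laplacian (see Evans (1997)), although this theory does not provide an answer as it erases the role of the source term $f$. On the other hand, a new entropy selection principle should be found for the solutions of the dual problem [12]. In fact, the following partial result holds: let $E : \mathcal{M}_b(\overline{\Omega}; \mathbb{R}^n) \to \mathbb{R} \cup \{+\infty\}$ be the functional defined by

\[ E(\sigma) := \begin{cases} \int_\Omega |\sigma| \log|\sigma|\, dx & \text{if } \sigma \ll dx \\ +\infty & \text{otherwise} \end{cases} \]

Assume that [12] admits at least one solution $\sigma_0$ such that $E(\sigma_0) < +\infty$. Then it can be shown that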
652 Convex Analysis and Duality Methods


the sequence {p } does converge weakly-star to , Further Reading
the unique minimizer of the problem
Alberti G, Bouchitte G, and Dal Maso G (2003) The calibration
inffE : solution of 12
g method for the MumfordShah functional and free-disconti-
nuity problems. Calculus of Variations and Partial Differential
The general case, in particular when all optimal Equations 16(3): 299333.
Ambrosio L (2003) Lecture notes on optimal transport problems.
measures are singular, is open.
In: Mathematical Aspects of Evolving Interfaces (Funchal
Remark 29 Variational problems [11], [12] have 2000), Lecture Notes in Mathematics, vol. 1812, pp. 152.
Berlin: Springer.
important counterparts in the theory of elasticity
Borwein M and Lewis SA (2000) Convex Analysis and Nonlinear
and in optimal design problems (see Bouchitte and Optimization. Theory and Examples, CMS Series. Berlin:
Buttazo (2001)). They read, respectively, as Springer.
Z Bouchitte G and Buttazzo G (2001) Characterization of optimal
shapes and masses through MongeKantorovich equations.
max u  df: u 2 \p>1 W 1;p ; R n ;


Journal of the European Mathematical Society 3: 139168.
 Bouchitte G, Buttazzo G, and De Pascale L (2003) A p-Laplacian
rux 2 K a:e: on ; u 0 on  approximation for some mass optimization problems. Journal
of Optimization Theory and Applications 118: 125.
Z Bouchitte G, Buttazzo G, and Seppecher P (1997) Energies with
min  R n2 ;
0K : 2 Mb ; respect to a measure and applications to low dimensional
sym

 structures. Calculus of Variations and Partial Differential
 Equations 5: 3754.
div f on n  Bouchitte G and Dal Maso G (1993) Integral representation and
relaxation of convex local functionals on BV. Annali della
2 Scuola Superiore di Pisa 20(4): 483533.
where K  R nsym ) is a convex compact subset of Bouchitte G and Fragala I (2002) Variational theory of weak
symmetric second-order tensors associated with the geometric structures: the measure method and its applications.
elastic material, 0K () = sup {  z: z 2 K} is convex Variational Methods for Discontinuous Structures, Ser.
positively R1-homogeneous and the functional on PNLDE, vol. 51, pp. 1940. Basel: Birkhauser.
Bouchitte G and Fragala I (2003) Second order energies on thin
measures  0K ( ) is intended in the sense given in
structures: variational theory and non-local effects. Journal of
[1]. A celebrated example is given by Michells Functional Analysis 204(1): 228267.
problem (Michell 1904) where n = 2 and K := {z 2 Bouchitte G and Valadier M (1988) Integral representation of
2
Rnsym , j(z)j  1}, (z) being the largest singular value convex functionals on a space of measures. Journal of
of z. The potential 0K is given by the nondifferenti- Functional Analysis 80: 398420.
Ekeland I and Temam R (1976) Analyse convexe et problemes
able convex function 0K () = 1 () 2 (), where the
variationnels. Paris: Dunod-Gauthier Villars.
i ()s are the singular values of . Evans LC (1997) Partial differential equations and Monge
Kantorovich mass transfer. In: Bott R, Jaffe A, Jerison D,
Unfortunately, it is not known if the vector
Lutsztig G, Singer I, and Yau JT (eds.) Current Developments
variational problem above can be linked to an in Mathematics, pp. 65126. Cambridge.
optimal transportation problem of the type [6], Michell AGM (1904) The limits of economy of material in frame
even if the analogous of equivalence [10] does exist structures. Philosophical Magazine and Journal of Science
in the Michells case, namely (for  convex): 6: 589597.
Rockafellar RT (1970) Convex Analysis. Princeton: Princeton
eu  1 on  University Press.
Villani C (2003) Topics in Optimal transportation, Graduate
() jux  uyjx  yj  jx  yj2 ; 8x; y studies in Mathematics, vol. 58. Providence, RI: AMS.

Cosmic Censorship see Spacetime Topology, Causal Structure and Singularities



Cosmology: Mathematical Aspects


G F R Ellis, University of Cape Town, Cape Town, South Africa
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Mathematical cosmology focuses on the geometrical and mathematical aspects of the study of the universe as a whole. Because the structure of spacetime (with metric tensor $g_{ab}(x^j)$) is governed by gravity, with matter and energy causing spacetime curvature according to the nonlinear gravitational field equations of the theory of general relativity, it has its roots in differential geometry. It is to be distinguished from the three other major aspects of modern cosmology, namely astrophysical cosmology, high-energy physics cosmology, and observational cosmology; see Peacock (1999) for these aspects.

The Einstein field equations (EFEs) are

$$R_{ab} - \tfrac{1}{2} R g_{ab} + \Lambda g_{ab} = \kappa T_{ab} \qquad [1]$$

where $R_{ab}$ is the Ricci tensor, $R$ the Ricci scalar, $T_{ab}$ the matter tensor, $\Lambda$ the cosmological constant, and $\kappa$ the gravitational constant. Cosmological models differ from generic solutions of these equations in that they have preferred world lines in spacetime associated with the motion of matter and distribution of radiation (Ellis 1971). This is a classic case of a broken symmetry: the underlying equations [1] are locally Lorentz invariant but their solutions are not. These preferred world lines, characterized by a unit 4-velocity vector $u^a$, are associated at late times with "fundamental observers," and a key aspect of cosmological modeling is determining the observational relations such observers would determine through astronomical observations.

The dynamics of cosmological models is determined by their matter content. This is usually represented in simplified form, often using the perfect-fluid approximation to represent the effect of matter or radiation; that is,

$$T_{ab} = (\rho + p)\,u_a u_b + p\,g_{ab} \qquad [2]$$

where $\rho$ is the energy density and $p$ the pressure, and the matter 4-velocity $u^b$ is the preferred cosmological 4-velocity. This description can include a scalar field $\phi$ with dynamics governed by the Klein–Gordon equation, provided $u^a$ is normal to spacelike surfaces $\{\phi = \text{const.}\}$. Suitable equations of state describe the nature of the matter envisaged (e.g., $p = 0$ for baryons, whereas $p = \rho/3$ for radiation); in the case of a scalar field with potential $V(\phi)$ and spacelike surfaces $\{\phi = \text{const.}\}$, on choosing $u^a$ orthogonal to these surfaces, the stress tensor has a perfect-fluid form with $\rho = \tfrac{1}{2}\dot\phi^2 + V(\phi)$, $p = \tfrac{1}{2}\dot\phi^2 - V(\phi)$. A cosmological constant $\Lambda$ can be represented as a perfect fluid with $\rho + p = 0$, $\Lambda = -\kappa p$. More general matter may involve a momentum flux density $q^a$ and anisotropic pressures $\pi_{ab}$ (Ehlers 1961). Whatever the nature of the matter, it will usually be required to satisfy energy conditions (Hawking and Ellis 1973). All realistic matter has a positive inertial mass density:

$$\rho + p > 0 \qquad [3]$$

(note that realistic cosmological models are non-empty), whereas all ordinary matter has a positive gravitational mass density:

$$\rho + 3p > 0 \qquad [4]$$

but this is not necessarily true for a scalar field or effective cosmological constant.

Mathematical cosmology (Ellis and van Elst 1999) studies (1) generic properties of solutions with a preferred 4-velocity field and matter content as indicated above, (2) the standard FLRW models, (3) approximate FLRW solutions, and (4) other exact and approximate cosmological solutions. The ultimate underlying issue is (5) the origin of the universe. We look at these in turn. We aim to use covariant methods as far as possible, to avoid being misled by coordinate effects, and to obtain exact solutions and exact results as far as possible, because approximate methods can be misleading in the case of these nonlinear field equations.

Exact Properties

We can split the equations into spacelike and timelike parts relative to the 4-velocity $u^a$, obtaining the (1 + 3) covariant dynamical equations and identities in terms of the fluid shear $\sigma_{ab}$, vorticity $\omega_{ab}$, expansion $\Theta = u^a{}_{;a}$, and acceleration $a^a = u^a{}_{;b}u^b$ (Ehlers 1961, Ellis 1971, Ellis and van Elst 1999). The energy density of a perfect fluid obeys the conservation equation

$$\dot\rho = -3(\rho + p)\,\frac{\dot S}{S} \qquad [5]$$

with extra terms occurring in the case of more complex matter. From the momentum equations, pressure-free solutions are geodesic ($a^b = 0$). The crucial Raychaudhuri–Ehlers equation for the
time derivative of the expansion (Ehlers 1961) can be written as

$$3\,\frac{\ddot S}{S} = 2(\omega^2 - \sigma^2) + a^b{}_{;b} - \frac{\kappa}{2}(\rho + 3p) + \Lambda \qquad [6]$$

where the representative length scale $S$ is defined by $\Theta = 3\dot S/S$. This is the basis of the fundamental singularity theorem: if in an expanding universe $\omega = 0 = a^b$ and the combined matter present satisfies [4], with $\Lambda \le 0$, then there was a singularity where $S \to 0$ a finite time $t_0 < 1/H_0$ ago, $H_0 = (\dot S/S)_0$ being the present value of the Hubble constant. The energy density will diverge there, so this is a spacetime singularity: an origin of physics, matter, and spacetime itself. However, the deduction does not follow if there is rotation or acceleration, which could conceivably avoid the singularity, so this result is by itself inconclusive for realistic cosmologies.

The vorticity obeys conservation laws analogous to those in Newtonian theory (Ehlers 1961). Vorticity-free solutions ($\omega = 0$) occur whenever the fluid flow lines are hypersurface-orthogonal in spacetime, that is, there exists a cosmic time function for the comoving observers, which will measure proper time along the flow lines if additionally the fluid flow is geodesic. The rate of change of shear is related to the conformal curvature (Weyl) tensor, which represents the free gravitational field, and which splits into an electric part $E_{ab}$ and a magnetic part $H_{ab}$ in close analogy with electromagnetic theory. Shear-free solutions ($\sigma = 0$) are very special because they strongly constrain the Weyl tensor; indeed if the flow is shear free and geodesic, then it either does not expand ($\Theta = 0$), or does not rotate ($\omega = 0$) (Ellis 1967). The set of cosmological observations associated with generic cosmological models has been characterized in power series form by Kristian and Sachs (1966), and that result has been extended to general models by Ellis et al. (1985).

The local regularity of the theory is expressed in existence and uniqueness theorems for the EFEs, provided the matter behavior is well defined through prescription of suitable equations of state (Hawking and Ellis 1973). However, in general the theory breaks down in the large, and this feature is specified by the Hawking–Penrose singularity theorems, predicting the existence of a geodesic incompleteness of spacetime under conditions applicable to realistic cosmological models satisfying the energy conditions given by eqns [3] and [4] (Hawking and Ellis 1973, Tipler et al. 1980). However, the conclusion does not follow if the energy conditions are not satisfied. Furthermore, the deduction follows only if the gravitational field equations remain valid to arbitrarily early times; but we would in fact expect that, at high enough energy densities, quantum gravity would take over from classical gravity, so whether or not there was indeed a singularity would depend on the nature of the as yet unknown theory of quantum gravity. The "cash value" of the singularity theorems then is the implication that, when the energy conditions are satisfied, one would indeed be involved in such a quantum gravity realm in the very early universe.

The Standard Friedmann–Lemaître Models

The standard models of cosmology are the Friedmann–Lemaître (FL) models with Robertson–Walker (RW) geometry: that is, they are exactly spatially homogeneous and locally isotropic, invariant under a $G_6$ of isometries (Robertson 1933, Ehlers 1961). They have a unique cosmic time function $t$, with space sections $\{t = \text{const.}\}$ of constant spatial curvature orthogonal to the uniquely preferred 4-velocity $u^a$. The fluid acceleration, vorticity, and shear all vanish, and all physical quantities depend only on the time coordinate $t$. They can be represented by a metric with scale factor $S(t)$:

$$ds^2 = g_{ab}\,dx^a dx^b = -dt^2 + S^2(t)\left\{dr^2 + f^2(r)\left(d\theta^2 + \sin^2\theta\, d\phi^2\right)\right\} \qquad [7]$$

in comoving coordinates $(x^a) = (t, r, \theta, \phi)$, where $f(r) = \{\sin r,\ r,\ \sinh r\}$ if $\{k = +1,\ 0,\ -1\}$, and the matter is a perfect fluid with 4-velocity vector $u^a = dx^a/ds = \delta^a_0$. The curvature of the space sections $\{t = \text{const.}\}$ is $K = k/S^2$; these 3-spaces are necessarily closed (compact) if they are positively curved ($k = +1$), but may be open or closed in the flat ($k = 0$) and negatively curved ($k = -1$) cases, depending on their topology (Lachièze-Rey and Luminet 1995).

Matter obeys the conservation equation [5], whose outcome depends on the equation of state; for baryons $\rho = M/S^3$, whereas for radiation $\rho = M/S^4$, where $M$ is a constant. The dynamics of the models is governed by the Raychaudhuri equation

$$3\,\frac{\ddot S}{S} = -\frac{\kappa}{2}(\rho + 3p) + \Lambda \qquad [8]$$

which has the Friedmann equation

$$\frac{3\dot S^2}{S^2} = \kappa\rho + \Lambda - \frac{3k}{S^2} \qquad [9]$$

as a first integral whenever $\dot S \ne 0$. Depending on the matter components present, one can qualitatively
characterize the dynamical behavior of these models (Robertson 1933) and find exact and approximate solutions to these equations as well as phase planes representing the relation of the different models to each other; for example, Ehlers and Rindler (1989) give the phase planes for models with noninteracting matter and radiation and an arbitrary cosmological constant. Universes with maxima or minima in $S(t)$ can only occur if $k = +1$; when $\Lambda = 0$, the universe recollapses in the future iff $k = +1$. Static solutions are possible only if $k = +1$ and (assuming [4]) $\Lambda > 0$. The simplest expanding solutions are the Einstein–de Sitter universes with $k = 0 = \Lambda$.

Equation [8] is a special case of [6], with corresponding implications: if the combined matter present satisfies [4], with $\Lambda \le 0$, then there must have been an initial singularity, or at least the universe must have emerged from a quantum gravity domain. The temperature would have been arbitrarily high in the past, so there was a hot big bang era in the early universe where matter and radiation were in equilibrium with each other at very high temperatures that rapidly fell as the universe expanded. Many physical processes took place then; in particular, nucleosynthesis of light elements took place at $\approx 10^9$ K. Decoupling of matter and radiation took place at a temperature of $\approx 4000$ K, followed by formation of stars and galaxies (see Peacock (1999) for a discussion of these physical processes). The black-body radiation emitted by the surface of last scattering at 4000 K is observed by us today as cosmic black-body radiation (CBR) at a temperature of 2.75 K.

One can determine observational relations for these models such as the magnitude–redshift relation for standard candles at recent times from the EFEs (Sandage 1961). The aim of observations is to determine the Hubble constant $H_0$, dimensionless deceleration parameter $q_0 = -(1/H_0^2)(\ddot S/S)_0$, and normalized density parameters $\Omega_{0i} = \kappa\rho_{0i}/3H_0^2$ for each component of matter present. The spatial curvature and the cosmological constant then follow from [6] and [9]; also the present scale factor $S_0$ is determined if $k \ne 0$. The universe is of positive spatial curvature ($k = +1$) iff $\Omega_0 \equiv \Omega_m + \Omega_\Lambda > 1$, where $\Omega_m \equiv \sum_i \Omega_{0i}$, $\Omega_\Lambda = \Lambda/3H_0^2$. Current observations indicate $\Omega_m \approx 0.3$, $\Omega_\Lambda \approx 0.7$, $\Omega_0 \approx 1.02 \pm 0.02$. Because the nucleosynthesis results limit the baryon density to a very low value ($\Omega_{0b} \approx 0.02$), which is about the same as the density of luminous matter, this indicates the dominant presence of both nonbaryonic dark matter and a repulsive force corresponding to either a cosmological constant or varying scalar field ("dark energy").

Crucial causal limitations occur because of the existence of particle horizons (Rindler 1956), the nature of which is most clear when represented in conformal diagrams (Hawking and Ellis 1973, Tipler et al. 1980). These result from the fact that light can only proceed a finite distance in the finite time since the origin of the universe, and imply that for a standard radiation-dominated hot-big-bang early universe, regions of larger than about $1^\circ$ angular size on the surface of last scattering, which emits the CBR, are causally disconnected: hence, no causal process since the start of the universe can account for the extreme isotropy of the CBR ($\Delta T/T \approx 10^{-5}$ over the whole sky, once a dipole anisotropy $\Delta T/T \approx 10^{-3}$ due to our local velocity relative to the cosmological rest frame is allowed for). This is the "horizon problem," one of the driving forces behind the theory of inflation (Guth 1981): the idea that, in the very early universe, a slow-rolling scalar field led to a brief exponential expansion through at least 50 e-folds (during which time the spacetime was approximately de Sitter), thus smoothing the universe and solving the horizon problem (Guth 1981, Peacock 1999). This is possible because a scalar field can violate the energy condition [4] and so, by [8], allows acceleration: $\ddot S > 0$. Consequently, there are now many studies of the dynamics of FLRW solutions driven by scalar fields and the subsequent decay of these scalar fields into radiation. One interesting point is that one can obtain exact solutions of this kind for arbitrarily chosen evolutions $S(t)$, provided they satisfy a restriction on the magnitude of $\dot S^2$, by running the field equations backwards to determine the needed potential $V(\phi)$ (Ellis and Madsen 1991). The inflationary paradigm is dominant in present-day theoretical cosmology, but suffers from the problem that it is not in fact a well-defined theory, for there is no single accepted proposal for the physical nature of the effective scalar field underlying the supposed exponential expansion; rather there are numerous competing proposals. As the inflaton has not yet been identified, this theory is not yet soundly linked to well-established physics.

Approximate FL Solutions

The real universe is, of course, not exactly FL, and studies of structure formation depend on studies of solutions that are approximately FL models; they are realistic ("lumpy") universe models. These enable detailed studies of observable properties such as CBR anisotropies and gravitational lensing induced by matter inhomogeneities, and of the development of those inhomogeneities from quantum fluctuations in the very early universe that then get expanded to very large scales by inflation.
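
As a numerical aside on the exact FL background on which these approximate models are built, the relations among the density parameters, the sign of the spatial curvature, the deceleration parameter, and the age of the universe can be checked directly from the Friedmann equation [9]. The Python sketch below is illustrative only and is not taken from the article. It assumes pressure-free matter plus a cosmological constant (radiation is neglected), uses the standard normalization $q_0 = -(\ddot S/S)_0/H_0^2$, and works with the dimensionless scale factor $s = S/S_0$. All function names are hypothetical.

```python
import math


def curvature_sign(omega_m, omega_lambda, tol=1e-12):
    """Return k in {+1, 0, -1} from Omega_0 = Omega_m + Omega_Lambda."""
    omega_0 = omega_m + omega_lambda
    if omega_0 > 1.0 + tol:
        return 1
    if omega_0 < 1.0 - tol:
        return -1
    return 0


def deceleration_dust(omega_m, omega_lambda):
    """q_0 for dust plus Lambda: equation [8] with p = 0, divided by 3 H_0^2."""
    return 0.5 * omega_m - omega_lambda


def hubble_ratio(s, omega_m, omega_lambda):
    """H/H_0 as a function of s = S/S_0, from equation [9] divided by 3 H_0^2.

    The curvature term is fixed by requiring H = H_0 at s = 1.
    Assumes parameters for which the expression under the root stays positive.
    """
    omega_k = 1.0 - omega_m - omega_lambda
    return math.sqrt(omega_m / s**3 + omega_k / s**2 + omega_lambda)


def age_in_hubble_times(omega_m, omega_lambda, n=100_000):
    """H_0 * t_0 = integral over s in (0, 1] of ds / (s * H/H_0), midpoint rule."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h  # midpoints avoid evaluating at s = 0
        total += h / (s * hubble_ratio(s, omega_m, omega_lambda))
    return total


if __name__ == "__main__":
    for om, ol in [(1.0, 0.0), (0.3, 0.7)]:
        print(om, ol, curvature_sign(om, ol),
              deceleration_dust(om, ol), age_in_hubble_times(om, ol))
```

For the Einstein–de Sitter case ($\Omega_m = 1$, $\Omega_\Lambda = 0$) the integral can be done by hand and gives $H_0 t_0 = 2/3$, consistent with the bound $t_0 < 1/H_0$ that holds when $\Lambda \le 0$. The midpoint rule is used because the integrand is finite at $s = 0$ but has unbounded slope there.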
The key problem here is that apart from the standard coordinate freedom allowed in general relativity, there is a serious gauge issue: the background FL model is not uniquely determined by the realistic universe model; however, the magnitudes of many perturbed quantities depend on how it is fitted into the lumpy model. For example, the density perturbation $\delta\rho$ is determined pointwise by the equation

$$\delta\rho(x^i) = \rho(x^i) - \bar\rho(x^i)$$

where $\bar\rho(x^i)$ is the background density. But by altering the correspondence between the background and realistic models (specifically, by the choice of surfaces $\bar\rho(x^i) = \text{const.}$ in the realistic model) one can assign that quantity any value, including zero (if one chooses $\bar\rho(x^i) = \rho(x^i)$). This is the gauge problem.

One can handle it by using standard variables and keeping close track of the gauge freedom at all times. However, one then ends up with higher-order equations than necessary because some of the perturbation modes present are pure gauge modes with no physical significance. Alternatively, one can fix the gauge by some unique specification of how the background model is fitted into the realistic model, but there is no agreement on a unique way to do this, and different choices give different answers. The preferable resolution is to use gauge-invariant variables, either coordinate based (Bardeen 1980) or covariant, based on the (1 + 3) covariant decomposition of spacetime quantities mentioned above (Ellis and Bruni 1989), in either case resulting in perturbation equations without gauge freedom and of order corresponding to the physical degrees of freedom. The key point in the latter approach is to choose covariant variables that vanish in the background spacetime; they are then automatically gauge invariant. Realistic structure formation studies carry out this process for a mixture of matter components with different average velocities, and extend to a kinetic theory description of the background radiation (see Ellis and van Elst (1999) and references therein). The outcome is a prediction of the CBR anisotropy power spectrum, determined by the inhomogeneities in the gravitational field and the motions of the matter components at decoupling (Sachs and Wolfe 1967). This spectrum can then be compared with observations and used in determining the values of the cosmological parameters mentioned above (see Peacock 1999).

One crucial issue is why it is reasonable to use a perturbed FL model for the observable region of the universe. The key argument is that this is plausible because of the high isotropy of all observations around us when averaged on a sufficiently large spatial scale, and particularly the very low anisotropy of the CBR. The Ehlers–Geren–Sachs (EGS) theorem (Ehlers et al. 1968) provides a sound basis for this argument: it shows that if freely propagating CBR (obeying the Liouville equation) is exactly isotropic in an expanding universe domain $U$, then the universe is exactly FL in that domain (i.e., it has exactly the RW spatially homogeneous and isotropic geometry there), the point being that any inhomogeneities in the matter distribution between us and the surface of last scattering will produce anisotropies in the CBR temperature we measure. But that result does not apply to the real universe, because the CBR is not exactly isotropic. The "almost EGS" theorem (Stoeger et al. 1995) shows that this result is stable: almost isotropic CBR in the domain $U$ implies that the universe is almost-FL in that domain. The application to the real universe comes by making a weak Copernican assumption: we assume we are not special, so all observers in $U$ (taken to be the visible part of the universe) will also see almost isotropic CBR, just as we do. The result then follows. A further argument for homogeneity of the universe comes from postulating uniform thermal histories (Bonnor and Ellis 1986), but that argument is yet to be completed and applied in a practical way.

Anisotropic and Inhomogeneous Models

The FL universes are geometrically extremely special. We wish further to understand the full range of possible universe models, their dynamical behaviors, and which of them might, at some epoch, realistically represent the real universe. This enables us to see how the approximate FL models fit into this wider set of possibilities, and under what circumstances they are attractors in this set of cosmologies.

Exact solutions are characterized by their spacetime symmetries. Symmetries are characterized by the dimension $t$ of the surfaces of homogeneity and the dimension $q$ of the isotropy group at a general point, together giving the dimension $r = t + q$ of the group of isometries $G_r$ (at special points, such as a center of symmetry, $t$ can decrease and $q$ increase but always so that $r$ stays unchanged). In the case of a cosmological model, because the 4-velocity $u^a$ is invariant under isotropies, the only possible dimensions for the isotropy group are $q = 3, 1, 0$; whereas the dimension $t$ of the surfaces of homogeneity can take any value from 4 to 0. This gives the basis for a classification of cosmological spacetimes (Ellis 1967, Ellis and van Elst 1999).

When $q = 3$, we have isotropic solutions (there are no preferred spatial directions), and it is then a theorem that they must be spatially homogeneous FL universes (Ehlers 1961). When $q = 1$, we
have locally rotationally symmetric (LRS) solutions, with precisely one preferred spacelike direction at a generic point (Ellis 1967). When $q = 0$, the solutions are anisotropic in that there can be no continuous group of rotations leaving the solution invariant; however, there can be discrete isotropies in some special cases.

When $t = 4$, we have spacetime homogeneous solutions, with all physical quantities constant; they cannot expand (by [5] and [3]). Nevertheless, two cases are of interest. For $q = 1$ ($r = 5$) we find the Gödel universe, rotating everywhere with constant vorticity, which illustrates important causal anomalies (Gödel 1949, Hawking and Ellis 1973). For $q = 3$ ($r = 6$), we find the Einstein static universe (Einstein 1917), the unique nonexpanding FL model with $k = +1$ and $\Lambda > 0$. It is of interest because it could possibly represent the asymptotic initial state of nonsingular inflationary universe models (Ellis et al. 2003). The higher-symmetry models (de Sitter and anti-de Sitter universes with higher-dimensional isotropy groups) are not included here because they do not obey the energy condition [3]: they are empty universes, which can be interesting asymptotic states but are not by themselves good cosmological models.

When $t = 3$, we have spatially homogeneous evolving universe models. For $q = 0$ ($r = 3$), there is a large family of Bianchi universes, spatially homogeneous but anisotropic, classified into nine types according to the structure constants of the Lie algebra of the three-dimensional symmetry group $G_3$. These can be orthogonal (the fluid flow is orthogonal to the surfaces of homogeneity) or tilted; the latter case can have fluid rotation or acceleration, but the former cannot. They exhibit a large variety of behaviors, including power-law, oscillatory, and nonscalar singularities (Tipler et al. 1980). A vexed question is whether truly chaotic behavior occurs in Bianchi IX models. The behavior of large families of these models has been characterized in dynamical systems terms (Wainwright and Ellis 1996), showing the intriguing way that higher-symmetry solutions provide a skeleton that guides the behavior of lower-symmetry solutions in the space of spacetimes. Many Bianchi models can be shown to isotropize at late times, particularly if viscosity is present; thus, they are asymptotic to the FL universes in the far future. In some cases, Bianchi models exhibit intermediate isotropization: they are much like FL models for a large part of their life, but are very different from them both at very early and very late stages of their evolution. These could be good models of the real universe. An important theorem by Wald (1983) shows that a cosmological constant will tend to isotropize Bianchi solutions at late times. This is an indication that inflation can succeed in making anisotropic early states resemble FL models at later times. Observational properties like element abundances and CBR anisotropy patterns can be worked out in these models (some of them develop a characteristic isolated "hot spot" in the CBR sky). For $q = 1$ ($r = 4$), we have spatially homogeneous LRS models, either Kantowski–Sachs or Bianchi universes, and again observations can be worked out in detail and phase planes developed showing their dynamical behavior, often isotropizing at late times. There are orthogonal and tilted cases, the latter possibly involving nonscalar singularities. For $q = 3$ ($r = 6$), we have the isotropic FL models, discussed above. Both the LRS and isotropic cases could be good models of the real universe.

When $t = 2$, we have inhomogeneous evolving models. This is a very large family, but the LRS ($q = 1$, $r = 3$) cases have been examined in detail; in the case of pressure-free matter, these are the Tolman–Bondi inhomogeneous models (Bondi 1947) that can be integrated exactly, and have been used for many interesting astrophysical and cosmological studies. Krasinski (1997) gives a very complete catalog of these and lower-symmetry inhomogeneous models and their uses in cosmology. A considerable challenge is the dynamical systems analysis for generic inhomogeneous models, needed to properly understand the early evolution of generic universe models (Uggla et al. 2003), and hence to determine what is generic behavior.

The Origin of the Universe

The issue underlying all this is what led to the initial conditions for the universe, for example, providing the starting conditions for inflation. There are many approaches to studying the quantum gravity phase of cosmology, including the Wheeler–DeWitt equation, the path-integral approach, string cosmology, pre-big bang theory, brane cosmology, the ekpyrotic universe, the cyclic universe, and loop quantum gravity approaches. These lie beyond the purview of the present article, except to say that they are all based on unproven extrapolations of known physics. The physically possible paths will become clearer as the nature of quantum gravity is elucidated.

It is pertinent to note that there exist nonsingular realistic cosmological solutions, possible in the light of the violations of the energy condition enabled by the supposed scalar fields that underlie inflationary universe theory. These nonsingular solutions can even avoid the quantum gravity era (Ellis et al. 2003). However, they have very fine-tuned initial conditions, which is nowadays considered as a disadvantage; but
there is no proof that whatever processes led to the existence of the universe preferred generic rather than fine-tuned conditions; this is a philosophical rather than physical assumption. It may well be that, as regards the start of the universe, the options are that either an initial singularity occurred, or the initial conditions were very finely tuned and allowed an infinitely existing universe. Investigation of whether this conjecture is in fact valid, and if so which is the best option, are intriguing open topics.

See also: Einstein Equations: Exact Solutions; Einstein–Cartan Theory; General Relativity: Experimental Tests; General Relativity: Overview; Gravitational Lensing; Lie Groups: General Theory; Newtonian Limit of General Relativity; Quantum Cosmology; Shock Wave Refinement of the Friedman–Robertson–Walker Metric; Spacetime Topology, Causal Structure and Singularities; String Theory: Phenomenology.

Further Reading

Bardeen JM (1980) Physical Review D 22: 1882.
Bondi H (1947) Monthly Notices of the Royal Astronomical Society 107: 410.
Bonnor WB and Ellis GFR (1986) Monthly Notices of the Royal Astronomical Society 218: 605.
Ehlers J (1961) Abh Mainz Akad Wiss u Lit (translated in Gen Rel Grav 25: 1225, 1993).
Ehlers J, Geren P, and Sachs RK (1968) Journal of Mathematical Physics 9: 1344.
Ehlers J and Rindler W (1989) Monthly Notices of the Royal Astronomical Society 238: 503.
Einstein A (1917) Sitz Ber Preuss Akad Wiss (translated in The Principle of Relativity, 1993). Dover.
Ellis GFR (1967) Journal of Mathematical Physics 8: 1171.
Ellis GFR (1971) In: Sachs RK (ed.) General Relativity and Cosmology, Proc. Int. School of Physics "Enrico Fermi," Course XLVII, p. 104. Academic Press.
Ellis GFR and Bruni M (1989) Physical Review D 40: 1804.
Ellis GFR and van Elst H (1999) In: Lachièze-Rey M (ed.) Theoretical and Observational Cosmology, vol. 541 [gr-qc/9812046], Nato Series C: Mathematical and Physical Sciences: Kluwer.
Ellis GFR and Madsen M (1991) Classical and Quantum Gravity 8: 667.
Ellis GFR, Murugan J, and Tsagas CG (2003) gr-qc/0307112.
Ellis GFR, Nel SD, Stoeger W, Maartens R, and Whitman AP (1985) Physics Reports 124(5 and 6): 315.
Gödel K (1949) Reviews of Modern Physics 21: 447.
Guth A (1981) Physical Review D 23: 347.
Hawking SW and Ellis GFR (1973) The Large Scale Structure of Space Time. Cambridge: Cambridge University Press.
Krasinski A (1997) Inhomogeneous Cosmological Models. Cambridge: Cambridge University Press.
Kristian J and Sachs RK (1966) The Astrophysical Journal 143: 379.
Lachièze-Rey M and Luminet JP (1995) Physics Reports 254: 135–214.
Robertson HP (1933) Reviews of Modern Physics 5: 62.
Peacock JA (1999) Cosmological Physics. Cambridge: Cambridge University Press.
Rindler W (1956) Monthly Notices of the Royal Astronomical Society 116: 662.
Sachs RK and Wolfe A (1967) Astrophysical Journal 147: 73.
Sandage A (1961) Astrophysical Journal 133: 355.
Stoeger W, Maartens R, and Ellis GFR (1995) Astrophysical Journal 443: 1.
Tipler FJ, Clarke CJS, and Ellis GFR (1980) In: Held A (ed.) General Relativity and Gravitation: One Hundred Years after the Birth of Albert Einstein, vol. 2, p. 97. Plenum.
Uggla C, van Elst H, Wainwright J, and Ellis GFR (2003) Physical Review D gr-qc/0304002 (to appear).
Wainwright J and Ellis GFR (eds.) (1996) The Dynamical Systems Approach to Cosmology. Cambridge: Cambridge University Press.
Wald RM (1983) Physical Review D 28: 2118.

Cotangent Bundle Reduction


J-P Ortega, Université de Franche-Comté, Besançon, France
T S Ratiu, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
© 2006 Elsevier Ltd. All rights reserved.

Introduction

The general symplectic reduction theory (see Symmetry and Symplectic Reduction) becomes much richer and has many applications if the symplectic manifold is the cotangent bundle $(T^*Q, \Omega_Q = -d\Theta_Q)$ of a manifold $Q$. The canonical 1-form $\Theta_Q$ on $T^*Q$ is given by $\Theta_Q(\alpha_q)(V_{\alpha_q}) = \alpha_q(T_{\alpha_q}\pi_Q(V_{\alpha_q}))$, for any $q \in Q$, $\alpha_q \in T_q^*Q$, and tangent vector $V_{\alpha_q} \in T_{\alpha_q}(T^*Q)$, where $\pi_Q : T^*Q \to Q$ is the cotangent bundle projection and $T_{\alpha_q}\pi_Q : T_{\alpha_q}(T^*Q) \to T_qQ$ is its tangent map (or derivative) at $\alpha_q$. In natural cotangent bundle coordinates $(q^i, p_i)$, we have $\Theta_Q = p_i\,dq^i$ and $\Omega_Q = dq^i \wedge dp_i$.

Let $\Phi : G \times Q \to Q$ be a left smooth action of the Lie group $G$ on the manifold $Q$. Denote by $g \cdot q = \Phi(g, q)$ the action of $g \in G$ on the point $q \in Q$ and by $\Phi_g : Q \to Q$ the diffeomorphism of $Q$ induced by $g$. The lifted left action $G \times T^*Q \to T^*Q$, given by $g \cdot \alpha_q = T^*_{g \cdot q}\Phi_{g^{-1}}(\alpha_q)$ for $g \in G$ and $\alpha_q \in T_q^*Q$, preserves $\Theta_Q$, and admits the equivariant momentum map $J : T^*Q \to \mathfrak{g}^*$ whose expression is $\langle J(\alpha_q), \xi \rangle = \alpha_q(\xi_Q(q))$, where $\xi \in \mathfrak{g}$, the Lie algebra of $G$, $\langle\,,\rangle : \mathfrak{g}^* \times \mathfrak{g} \to \mathbb{R}$ is the duality pairing between the dual $\mathfrak{g}^*$ and $\mathfrak{g}$, and $\xi_Q(q) = d\Phi(\exp t\xi, q)/dt\,|_{t=0}$ is the value of the
infinitesimal generator vector field $\xi_Q$ of the $G$-action at $q \in Q$ (see Hamiltonian Group Actions and Symmetries and Conservation Laws). Throughout this article, it is assumed that the $G$-action on $Q$, and hence on $T^*Q$, is free and proper. Recall also that $((T^*Q)_\mu, (\Omega_Q)_\mu)$ denotes the reduced manifold at $\mu \in \mathfrak{g}^*$ (see Symmetry and Symplectic Reduction), where $(T^*Q)_\mu := J^{-1}(\mu)/G_\mu$ is the orbit space of the $G_\mu$-action on the momentum level manifold $J^{-1}(\mu)$ and $G_\mu := \{g \in G \mid \mathrm{Ad}^*_g\mu = \mu\}$ is the isotropy subgroup of the coadjoint representation of $G$ on $\mathfrak{g}^*$. The left-coadjoint representation of $g \in G$ on $\mu \in \mathfrak{g}^*$ is denoted by $\mathrm{Ad}^*_{g^{-1}}\mu$.

Cotangent bundle reduction at zero is already quite interesting and has many applications. Let $\rho : Q \to Q/G$ be the $G$-principal bundle projection defined by the proper free action of $G$ on $Q$, usually referred to as the shape space bundle. Zero is a regular value of $J$ and the map $\varphi_0 : ((T^*Q)_0, (\Omega_Q)_0) \to (T^*(Q/G), \Omega_{Q/G})$ given by $\varphi_0([\alpha_q])(T_q\rho(v_q)) := \alpha_q(v_q)$, where $\alpha_q \in J^{-1}(0)$, $[\alpha_q] \in (T^*Q)_0$, and $v_q \in T_qQ$, is a well-defined symplectic diffeomorphism.

This theorem generalizes in two nontrivial ways when one reduces at a nonzero value of $J$: an embedding and a fibration theorem.

Embedding Version of Cotangent Bundle Reduction

Let $\mu \in \mathfrak{g}^*$, $Q_\mu := Q/G_\mu$, $\rho_\mu : Q \to Q_\mu$ the projection onto the $G_\mu$-orbit space, $\mathfrak{g}_\mu := \{\xi \in \mathfrak{g} \mid \mathrm{ad}^*_\xi\mu = 0\}$ the Lie algebra of the coadjoint isotropy subgroup $G_\mu$, where $\mathrm{ad}_\xi\eta := [\xi, \eta]$ for any $\xi, \eta \in \mathfrak{g}$, $\mathrm{ad}^*_\xi : \mathfrak{g}^* \to \mathfrak{g}^*$ the dual map, $\mu' := \mu|_{\mathfrak{g}_\mu} \in \mathfrak{g}_\mu^*$ the restriction of $\mu$ to $\mathfrak{g}_\mu$, and $((T^*Q)_\mu, (\Omega_Q)_\mu)$ the reduced space at $\mu$. The induced $G_\mu$-action on $T^*Q$ admits the equivariant momentum map $J^\mu : T^*Q \to \mathfrak{g}_\mu^*$ given by $J^\mu(\alpha_q) = J(\alpha_q)|_{\mathfrak{g}_\mu}$. Assume there is a $G_\mu$-invariant 1-form $\alpha_\mu$ on $Q$ with values in $(J^\mu)^{-1}(\mu')$. Then there is a unique closed 2-form $\beta_\mu$ on $Q_\mu$ such that $\rho_\mu^*\beta_\mu = d\alpha_\mu$. Define the magnetic term $B_\mu := \pi_{Q_\mu}^*\beta_\mu$, where $\pi_{Q_\mu} : T^*Q_\mu \to Q_\mu$ is the cotangent bundle projection, which is a closed 2-form on $T^*Q_\mu$. Then the map $\varphi_\mu : ((T^*Q)_\mu, (\Omega_Q)_\mu) \to (T^*Q_\mu, \Omega_{Q_\mu} - B_\mu)$ given by $\varphi_\mu([\alpha_q])(T_q\rho_\mu(v_q)) := (\alpha_q - \alpha_\mu(q))(v_q)$, for $\alpha_q \in J^{-1}(\mu)$, $[\alpha_q] \in (T^*Q)_\mu$, and $v_q \in T_qQ$, is a symplectic embedding onto a submanifold of $T^*Q_\mu$ covering the base $Q_\mu$. The embedding $\varphi_\mu$ is a diffeomorphism onto $T^*Q_\mu$ if and only if $\mathfrak{g} = \mathfrak{g}_\mu$. If the 1-form $\alpha_\mu$ takes values in the smaller set $J^{-1}(\mu)$ then the image of $\varphi_\mu$ is the vector sub-bundle $[T\rho_\mu(VQ)]^\circ$ of $T^*Q_\mu$, where $VQ \subset TQ$ is the vertical vector sub-bundle consisting of vectors tangent to the $G$-orbits, that is, its fiber at $q \in Q$ equals $V_qQ = \{\xi_Q(q) \mid \xi \in \mathfrak{g}\}$, and $^\circ$ denotes the annihilator relative to the natural duality pairing between $TQ_\mu$ and $T^*Q_\mu$. Note that if $\mathfrak{g}$ is abelian or $\mu = 0$, the embedding $\varphi_\mu$ is always onto and thus the reduced space is again, topologically, a cotangent bundle.

It should be noted that there is a choice in this theorem, namely the 1-form $\alpha_\mu$. Whereas the reduced symplectic space $((T^*Q)_\mu, (\Omega_Q)_\mu)$ is intrinsic, the symplectic structure on the space $T^*Q_\mu$ depends on $\alpha_\mu$. The theorem above states that no matter how $\alpha_\mu$ is chosen, there is a symplectic diffeomorphism, which also depends on $\alpha_\mu$, of the reduced space onto a submanifold of $T^*Q_\mu$.

Connections

The 1-form $\alpha_\mu$ is usually obtained from a left connection on the principal bundle $\rho_\mu : Q \to Q/G_\mu$ or $\rho : Q \to Q/G$. A left connection 1-form $A \in \Omega^1(Q; \mathfrak{g})$ on the left principal $G$-bundle $\rho : Q \to Q/G$ is a Lie algebra-valued 1-form $A : TQ \to \mathfrak{g}$, where $\mathfrak{g}$ denotes the Lie algebra of $G$, satisfying the conditions $A(\xi_Q) = \xi$ for all $\xi \in \mathfrak{g}$ and $A(T_q\Phi_g(v)) = \mathrm{Ad}_g(A(v))$ for all $g \in G$ and $v \in T_qQ$, where $\mathrm{Ad}_g$ denotes the adjoint action of $G$ on $\mathfrak{g}$. The horizontal vector sub-bundle $HQ$ of the connection $A$ is defined as the kernel of $A$, that is, its fiber at $q \in Q$ is the subspace $H_q := \ker A(q)$. The map $v_q \mapsto \mathrm{ver}_q(v_q) := [A(q)(v_q)]_Q(q)$ is called the vertical projection, while the map $v_q \mapsto \mathrm{hor}_q(v_q) := v_q - \mathrm{ver}_q(v_q)$ is called the horizontal projection. Since for any vector $v_q \in T_qQ$ we have $v_q = \mathrm{ver}_q(v_q) + \mathrm{hor}_q(v_q)$, it follows that $TQ = HQ \oplus VQ$ and the maps $\mathrm{hor}_q : T_qQ \to H_qQ$ and $\mathrm{ver}_q : T_qQ \to V_qQ$ are projections onto the horizontal and vertical subspaces at every $q \in Q$.

Connections can be equivalently defined by the choice of a sub-bundle $HQ \subset TQ$ complementary to the vertical sub-bundle $VQ$ satisfying the following $G$-invariance property: $H_{g \cdot q}Q = T_q\Phi_g(H_qQ)$ for every $g \in G$ and $q \in Q$. The sub-bundle $HQ$ is called, as before, the horizontal sub-bundle and a connection 1-form $A$ is defined by setting $A(q)(\xi_Q(q) + u_q) = \xi$, for any $\xi \in \mathfrak{g}$ and $u_q \in H_qQ$.

The curvature of the connection $A$ is the Lie algebra-valued 2-form on $Q$ defined by $B(u_q, v_q) = dA(\mathrm{hor}_q(u_q), \mathrm{hor}_q(v_q))$. When one replaces vectors in the exterior derivative with their horizontal projections, then the result is called the exterior covariant derivative and the preceding formula for $B$ is often written as $B = DA$. Curvature measures the lack of integrability of the horizontal distribution, namely $B(u, v) = -A([\mathrm{hor}(u), \mathrm{hor}(v)])$ for any two vector fields $u$ and $v$ on $Q$. The Cartan structure equations state that $B(u, v) = dA(u, v) - [A(u), A(v)]$, where the bracket on the right-hand side is the Lie bracket in $\mathfrak{g}$.
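As a hedged illustration (not part of the original article), the curvature identities above can be checked by hand in the simplest abelian situation. The setting $Q = \mathbb{R}^3 \times S^1$ with $G = S^1$ translating the angle $\theta$, and the $\theta$-independent 1-form $a$ on $\mathbb{R}^3$, are assumptions chosen only for this sketch; since $\mathfrak{g} = \mathbb{R}$ is abelian, the bracket term in the Cartan structure equations vanishes.

```latex
% Assumed setting: coordinates (x, y, z, \theta) on Q = R^3 x S^1,
% infinitesimal generator \xi_Q = \xi\,\partial_\theta.
\begin{align*}
A &= d\theta + a, \qquad a = a_x\,dx + a_y\,dy + a_z\,dz, \qquad A(\xi_Q) = \xi, \\
\mathrm{hor}(\partial_x) &= \partial_x - a_x\,\partial_\theta, \qquad
\mathrm{hor}(\partial_y) = \partial_y - a_y\,\partial_\theta, \\
[\mathrm{hor}(\partial_x), \mathrm{hor}(\partial_y)]
  &= -(\partial_x a_y - \partial_y a_x)\,\partial_\theta, \\
B(\partial_x, \partial_y)
  &= -A\big([\mathrm{hor}(\partial_x), \mathrm{hor}(\partial_y)]\big)
   = \partial_x a_y - \partial_y a_x
   = dA(\partial_x, \partial_y).
\end{align*}
```

In this sketch the curvature is simply $B = da$, so the horizontal distribution is integrable exactly when $a$ is closed.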
Since the connection $A$ is a Lie algebra-valued 1-form, for each $\mu \in \mathfrak{g}^*$ the formula $\alpha_\mu(q) := A(q)^*(\mu)$, where $A(q)^* : \mathfrak{g}^* \to T_q^*Q$ is the dual of the linear map $A(q) : T_qQ \to \mathfrak{g}$, defines a usual 1-form on $Q$. This 1-form $\alpha_\mu$ takes values in $J^{-1}(\mu)$ and is equivariant in the following sense: $\Phi_g^*\alpha_\mu = \alpha_{\mathrm{Ad}_g^*\mu}$ for any $g \in G$.

Magnetic Terms and Curvature

There are two methods to construct the 1-form $\alpha_\mu$ from a connection. The first is to start with a connection 1-form $A^\mu \in \Omega^1(Q; \mathfrak{g}_\mu)$ on the principal $G_\mu$-bundle $\rho_\mu : Q \to Q/G_\mu$. Then the 1-form $\alpha_\mu := \langle \mu|_{\mathfrak{g}_\mu}, A^\mu \rangle \in \Omega^1(Q)$ is $G_\mu$-invariant and has values in $(J^\mu)^{-1}(\mu|_{\mathfrak{g}_\mu})$. The magnetic term $B_\mu$ is the pullback to $T^*(Q/G_\mu)$ of the $\mu|_{\mathfrak{g}_\mu}$-component $d\alpha_\mu$ of the curvature of $A^\mu$ thought of as a 2-form on the base $Q/G_\mu$.

The second method is to start with a connection $A \in \Omega^1(Q; \mathfrak{g})$ on the principal bundle $\rho : Q \to Q/G$, to define $\alpha_\mu := \langle \mu, A \rangle \in \Omega^1(Q)$, and to observe that this 1-form is $G_\mu$-invariant and has values in $J^{-1}(\mu)$. The magnetic term $B_\mu$ is in this case the pullback to $T^*(Q/G_\mu)$ of the $\mu$-component $d\alpha_\mu$ of the curvature of $A$ thought of as a 2-form on the base $Q/G_\mu$.

The Mechanical Connection

If $(Q, \langle\langle\,,\rangle\rangle)$ is a Riemannian manifold and $G$ acts by isometries, there is a natural connection on the bundle $\rho : Q \to Q/G$, namely, define the horizontal space at a point to be the metric orthogonal to the vertical space. This connection is called the mechanical connection and its horizontal bundle consists of all vectors $v_q \in TQ$ such that $J(\langle\langle v_q, \cdot \rangle\rangle) = 0$.

To determine the Lie algebra-valued 1-form $A$ of this connection, the notion of locked inertia tensor needs to be introduced. This is the linear map $\mathbb{I}(q) : \mathfrak{g} \to \mathfrak{g}^*$ depending smoothly on $q \in Q$ defined by the identity $\langle \mathbb{I}(q)\xi, \eta \rangle = \langle\langle \xi_Q(q), \eta_Q(q) \rangle\rangle$ for any $\xi, \eta \in \mathfrak{g}$. Since the $G$-action is free, each $\mathbb{I}(q)$ is invertible. The connection 1-form whose horizontal space was defined above is given by $A(q)(v_q) = \mathbb{I}(q)^{-1}(J(\langle\langle v_q, \cdot \rangle\rangle))$.

Denote by $K : T^*Q \to \mathbb{R}$ the kinetic energy of the metric $\langle\langle\,,\rangle\rangle$ on the cotangent bundle, that is, $K(\langle\langle v_q, \cdot \rangle\rangle) := (1/2)\|v_q\|^2$. The 1-form $\alpha_\mu = A(\cdot)^*\mu$ is characterized for the mechanical connection $A$ by the condition $K(\alpha_\mu(q)) = \inf\{K(\beta_q) \mid \beta_q \in J^{-1}(\mu) \cap T_q^*Q\}$.

The Amended Potential

A simple mechanical system is a Hamiltonian system on a cotangent bundle $T^*Q$ whose Hamiltonian function is the sum of the kinetic energy of a Riemannian metric on $Q$ and a potential function $V : Q \to \mathbb{R}$. If there is a Lie group $G$ acting on $Q$ by isometries and leaving the potential invariant, then we have a simple mechanical system with symmetry. The amended or effective potential $V_\mu : Q \to \mathbb{R}$ at $\mu \in \mathfrak{g}^*$ is defined by $V_\mu := H \circ \alpha_\mu$, where $\alpha_\mu$ is the 1-form associated to the mechanical connection. Its expression in terms of the locked moment of inertia tensor is given by $V_\mu(q) := V(q) + (1/2)\langle \mu, \mathbb{I}(q)^{-1}\mu \rangle$. The amended potential naturally induces a smooth function $\hat{V}_\mu \in C^\infty(Q/G_\mu)$.

The fundamental result about simple mechanical systems with symmetry is the following. The push-forward by the embedding $\varphi_\mu : ((T^*Q)_\mu, (\Omega_Q)_\mu) \to (T^*Q_\mu, \Omega_{Q_\mu} - B_\mu)$ of the reduced Hamiltonian $H_\mu \in C^\infty((T^*Q)_\mu)$ of a simple mechanical system $H = K + V \circ \pi_Q \in C^\infty(T^*Q)$ is the restriction to the vector sub-bundle $\varphi_\mu((T^*Q)_\mu) \subset T^*(Q/G_\mu)$, which is also a symplectic submanifold of $(T^*(Q/G_\mu), \Omega_{Q/G_\mu} - B_\mu)$, of the simple mechanical system on $T^*(Q/G_\mu)$ whose kinetic energy is given by the quotient Riemannian metric on $Q/G_\mu$ and whose potential is $\hat{V}_\mu$. However, Hamilton's equations on $T^*(Q/G_\mu)$ for this simple mechanical system are computed relative to the magnetic symplectic form $\Omega_{Q/G_\mu} - B_\mu$.

There is a wealth of applications starting from this classical theorem to mechanical systems, spanning such diverse areas as topological characterization of the level sets of the energy-momentum map to methods of proving nonlinear stability of relative equilibria (block-diagonalization of the stability form in the application of the energy-momentum method).

Fibration Version of Cotangent Bundle Reduction

There is a second theorem that realizes the reduced space of a cotangent bundle as a locally trivial bundle over shape space $Q/G$. This version is particularly well suited in the study of quantization problems and in control theory. The result is the following. Assume that $G$ acts freely and properly on $Q$. Then the reduced symplectic manifold $(T^*Q)_\mu$ is a fiber bundle over $T^*(Q/G)$ with fiber the coadjoint orbit $\mathcal{O}_\mu$. How this is related to the Poisson structure of the quotient $(T^*Q)/G$ will be discussed later.

The Kaluza-Klein Construction

The extra term in the symplectic form of the reduced space is called a magnetic term because it has this interpretation in electromagnetism. To understand why $B_\mu$ is called a magnetic term, consider the problem of a particle of mass $m$ and charge $e$ moving in $\mathbb{R}^3$ under the influence of a given
magnetic field $\mathbf{B} = B_x\mathbf{i} + B_y\mathbf{j} + B_z\mathbf{k}$, $\operatorname{div}\mathbf{B} = 0$. The Lorentz force law (written in the International System) gives the equations of motion

\[ m\frac{d\mathbf{v}}{dt} = e\,\mathbf{v} \times \mathbf{B} \qquad [1] \]

where $e$ is the charge and $\mathbf{v} = (\dot{x}, \dot{y}, \dot{z}) = \dot{\mathbf{q}}$ is the velocity of the particle. What is the Hamiltonian description of these equations?

There are two possible answers to this question. To formulate them, associate to the divergence-free vector field $\mathbf{B}$ the closed 2-form $B = B_x\,dy \wedge dz - B_y\,dx \wedge dz + B_z\,dx \wedge dy$. Also, write $\mathbf{B} = \operatorname{curl}\mathbf{A}$ for some other vector field $\mathbf{A} = (A_x, A_y, A_z)$ on $\mathbb{R}^3$, called the magnetic potential.

Answer 1. Take on $T^*\mathbb{R}^3$ the symplectic form $\Omega_B = dx \wedge dp_x + dy \wedge dp_y + dz \wedge dp_z - eB$, where $(p_x, p_y, p_z) = \mathbf{p} := m\mathbf{v}$ is the momentum of the particle, and $h = m\|\mathbf{v}\|^2/2 = m(\dot{x}^2 + \dot{y}^2 + \dot{z}^2)/2$ is the Hamiltonian, the kinetic energy of the particle. A direct verification shows that $dh = \Omega_B(X_h, \cdot)$, where

\[ X_h = \dot{x}\frac{\partial}{\partial x} + \dot{y}\frac{\partial}{\partial y} + \dot{z}\frac{\partial}{\partial z} + e(B_z\dot{y} - B_y\dot{z})\frac{\partial}{\partial p_x} + e(B_x\dot{z} - B_z\dot{x})\frac{\partial}{\partial p_y} + e(B_y\dot{x} - B_x\dot{y})\frac{\partial}{\partial p_z} \qquad [2] \]

which gives the equations of motion [1].

Answer 2. Take on $T^*\mathbb{R}^3$ the canonical symplectic form $\Omega = dx \wedge dp_x + dy \wedge dp_y + dz \wedge dp_z$ and the Hamiltonian $h_A = \|\mathbf{p} - e\mathbf{A}\|^2/2m$. A direct verification shows that $dh_A = \Omega(X_{h_A}, \cdot)$, where $X_{h_A}$ has the same expression [2].

Next we show how the magnetic term in the symplectic form $\Omega_B$ is obtained by reduction from the Kaluza-Klein system. Let $Q = \mathbb{R}^3 \times S^1$ with the circle $G = S^1$ acting on $Q$, only on the second factor. Identify the Lie algebra $\mathfrak{g}$ of $S^1$ with $\mathbb{R}$. Since the infinitesimal generator of this action defined by $\xi \in \mathfrak{g} = \mathbb{R}$ has the expression $\xi_Q(\mathbf{q}, \theta) = (\mathbf{q}, \theta; 0, \xi)$, if $TS^1$ is trivialized as $S^1 \times \mathbb{R}$, a momentum map $J : T^*Q = \mathbb{R}^3 \times S^1 \times \mathbb{R}^3 \times \mathbb{R} \to \mathfrak{g}^* = \mathbb{R}$ is given by $J(\mathbf{q}, \theta; \mathbf{p}, p_\theta)\xi = (\mathbf{p}, p_\theta) \cdot (0, \xi) = p_\theta\xi$, that is, $J(\mathbf{q}, \theta; \mathbf{p}, p_\theta) = p_\theta$. In this case, the coadjoint action is trivial, so for any $\mu \in \mathfrak{g}^* = \mathbb{R}$, we have $G_\mu = S^1$, $\mathfrak{g}_\mu = \mathbb{R}$, and $\mu' = \mu$. The 1-form $\alpha_\mu = \mu(A_x\,dx + A_y\,dy + A_z\,dz + d\theta) \in \Omega^1(Q)$, where $d\theta$ denotes the length 1-form on $S^1$, is clearly $G_\mu = S^1$-invariant, has values in $J^{-1}(\mu) = \{(\mathbf{q}, \theta; \mathbf{p}, \mu) \mid \mathbf{q}, \mathbf{p} \in \mathbb{R}^3, \theta \in S^1\}$, and its exterior differential equals $d\alpha_\mu = \mu B$. Thus, the closed 2-form $\beta_\mu$ on the base $Q_\mu = Q/G_\mu = Q/S^1 = \mathbb{R}^3$ equals $\mu B$ and hence the magnetic term, that is, the closed 2-form $B_\mu = \pi_{Q_\mu}^*\beta_\mu$ on $T^*Q_\mu = T^*\mathbb{R}^3$, is also $\mu B$ since $\pi_{Q_\mu} : Q = \mathbb{R}^3 \times S^1 \to Q/G_\mu = \mathbb{R}^3$ is the projection. Therefore, the reduced space $(T^*Q)_\mu$ is symplectically diffeomorphic to $(T^*\mathbb{R}^3, dx \wedge dp_x + dy \wedge dp_y + dz \wedge dp_z - \mu B)$, which coincides with the phase space in Answer 1 if we put $\mu = e$. This also gives the physical interpretation of the momentum map $J : T^*Q = \mathbb{R}^3 \times S^1 \times \mathbb{R}^3 \times \mathbb{R} \to \mathfrak{g}^* = \mathbb{R}$, $J(\mathbf{q}, \theta; \mathbf{p}, p_\theta) = p_\theta$ and hence of the variable conjugate to the circle variable $\theta$: $p_\theta$ represents the charge. Moreover, the magnetic term in the symplectic form is, up to a charge factor, the magnetic field.

The kinetic energy Hamiltonian

\[ h(\mathbf{q}, \theta; \mathbf{p}, p_\theta) := \frac{1}{2m}\|\mathbf{p}\|^2 + \frac{1}{2}p_\theta^2 \]

of the Kaluza-Klein metric, that is, the Riemannian metric obtained by keeping the standard metrics on each factor and declaring $\mathbb{R}^3$ and $S^1$ orthogonal, induces the reduced Hamiltonian

\[ h_\mu(\mathbf{q}) = \frac{1}{2m}\|\mathbf{p}\|^2 + \frac{1}{2}\mu^2 \]

which, up to the constant $\mu^2/2$, equals the kinetic energy Hamiltonian in Answer 1. Note that this reduced system is not the geodesic flow of the Euclidean metric because of the presence of the magnetic term in the symplectic form. However, the equations of motion of a charged particle in a magnetic field are obtained by reducing the geodesic flow of the Kaluza-Klein metric.

A similar construction is carried out in Yang-Mills theory where $A$ is a connection on a principal bundle and $B$ is its curvature. Magnetic terms also appear in classical mechanics. For example, in rotating systems the Coriolis force (up to a dimensional factor) plays the role of the magnetic term.

Reconstruction of Dynamics for Cotangent Bundles

A general reconstruction method of the dynamics from the reduced dynamics was given in Symmetry and Symplectic Reduction. For cotangent bundles, using the mechanical connection, this method simplifies considerably.

Start with the following general situation. Let $G$ act freely on the configuration manifold $Q$; let $h : T^*Q \to \mathbb{R}$ be a $G$-invariant Hamiltonian, $\mu \in \mathfrak{g}^*$, $\alpha_q \in J^{-1}(\mu)$, and $c_\mu(t)$ the integral curve of the reduced system with initial condition $[\alpha_q] \in (T^*Q)_\mu$ given by the reduced Hamiltonian function $h_\mu : (T^*Q)_\mu \to \mathbb{R}$. In terms of a connection $A \in \Omega^1(J^{-1}(\mu); \mathfrak{g}_\mu)$ on the left $G_\mu$-principal bundle $J^{-1}(\mu) \to (T^*Q)_\mu$ the reconstruction procedure proceeds in four steps:

Step 1: Horizontally lift the curve $c_\mu(t) \in (T^*Q)_\mu$ to a curve $d(t) \in J^{-1}(\mu)$ with $d(0) = \alpha_q$.

Step 2: Set $\xi(t) = A(d(t))(X_h(d(t))) \in \mathfrak{g}_\mu$.
Step 3: With $\xi(t) \in \mathfrak{g}_\mu$ determined in step 2, solve the nonautonomous differential equation $\dot{g}(t) = T_eL_{g(t)}\xi(t)$ with initial condition $g(0) = e$, where $L_g$ denotes left translation on $G$; this is the step that involves quadratures and is the main obstacle to finding explicit formulas.

Step 4: The curve $c(t) = g(t) \cdot d(t)$, with $d(t)$ found in step 1 and $g(t)$ found in step 3, is the integral curve of $X_h$ with initial condition $c(0) = \alpha_q$.

This method depends on the choice of the connection $A \in \Omega^1(J^{-1}(\mu); \mathfrak{g}_\mu)$. Here are several particular cases when this procedure simplifies.

(a) One-dimensional coadjoint isotropy group. If $G_\mu = S^1$ or $G_\mu = \mathbb{R}$, identify $\mathfrak{g}_\mu$ with $\mathbb{R}$ via the map $a \in \mathbb{R} \leftrightarrow a\zeta \in \mathfrak{g}_\mu$, where $\zeta \in \mathfrak{g}_\mu$, $\zeta \neq 0$, is a generator of $\mathfrak{g}_\mu$. Then a connection 1-form on the $S^1$ (or $\mathbb{R}$) principal bundle $J^{-1}(\mu) \to (T^*Q)_\mu$ is the 1-form $A$ on $J^{-1}(\mu)$ given by $A = (1/\langle \mu, \zeta \rangle)\theta_\mu$, where $\theta_\mu$ is the pullback of the canonical 1-form $\theta \in \Omega^1(T^*Q)$ to the submanifold $J^{-1}(\mu)$. The curvature of this connection is the 2-form on $(T^*Q)_\mu$ given by $\mathrm{curv}(A) = -(1/\langle \mu, \zeta \rangle)\omega_\mu$, where $\omega_\mu$ is the reduced symplectic form on $(T^*Q)_\mu$. In this case, the curve $\xi(t) \in \mathfrak{g}_\mu$ in step 2 is given by $\xi(t) = \Lambda[h](d(t))$, where $\Lambda \in \mathfrak{X}(T^*Q)$ is the Liouville vector field characterized by the property of being the unique vector field on $T^*Q$ that satisfies the relation $d\theta(\Lambda, \cdot) = \theta$. In canonical coordinates $(q^i, p_i)$ on $T^*Q$, $\Lambda = p_i\,\partial/\partial p_i$.

(b) Induced connection. Any connection $A \in \Omega^1(Q; \mathfrak{g}_\mu)$ on the left principal bundle $Q \to Q/G_\mu$ induces a connection $\tilde{A} \in \Omega^1(J^{-1}(\mu); \mathfrak{g}_\mu)$ by $\tilde{A}(\alpha_q)(V_{\alpha_q}) := A(q)(T_{\alpha_q}\pi_Q(V_{\alpha_q}))$, where $q \in Q$, $\alpha_q \in T_q^*Q$, $V_{\alpha_q} \in T_{\alpha_q}(T^*Q)$, and $\pi_Q : T^*Q \to Q$ is the cotangent bundle projection. In this case, the curve $\xi(t) \in \mathfrak{g}_\mu$ in step 2 is given by $\xi(t) = A(q(t))(\mathbb{F}h(d(t)))$, where $q(t) := \pi_Q(d(t))$ is the base integral curve and the vector bundle morphism $\mathbb{F}h : T^*Q \to TQ$ is the fiber derivative of $h$ given by

\[ \mathbb{F}h(\alpha_q)(\beta_q) := \left.\frac{d}{dt}\right|_{t=0} h(\alpha_q + t\beta_q) \]

for any $\alpha_q, \beta_q \in T_q^*Q$. Two particular instances of this situation are noteworthy.

(b1) Assume that the Hamiltonian $h$ is that of a simple mechanical system with symmetry. Choosing $A$ to be the mechanical connection $A_{\mathrm{mech}}$, the curve $\xi(t) \in \mathfrak{g}_\mu$ in step 2 is given by $\xi(t) = A_{\mathrm{mech}}(q(t))(\langle\langle d(t), \cdot \rangle\rangle)$.

(b2) If $Q = G$ is a Lie group, $\dim G_\mu = 1$, and $\zeta$ is a generator of $\mathfrak{g}_\mu$, then the connection $A \in \Omega^1(G)$ can be chosen to equal $A(g) := (1/\langle \mu, \zeta \rangle)\,T_g^*R_{g^{-1}}(\mu)$, where $R_g$ is right translation on $G$.

(c) Reconstruction of dynamics for simple mechanical systems with symmetry. The case of simple mechanical systems with symmetry deserves special attention since several steps in the reconstruction method can be simplified. For simple mechanical systems, the knowledge of the base integral curve $q(t)$ suffices to determine the entire integral curve on $T^*Q$. Indeed, if $h = K + V \circ \pi_Q$ is the Hamiltonian, the Legendre transformation $\mathbb{F}h : T^*Q \to TQ$ determines the Lagrangian system on $TQ$ given by $\ell(u_q) = (1/2)\|u_q\|^2 - V(q)$, for $u_q \in T_qQ$. Lagrange's equations are second-order and thus the evolution of the velocities is given by the time derivative $\dot{q}(t)$ of the base integral curve. Since $\mathbb{F}h = (\mathbb{F}\ell)^{-1}$, the solution of the Hamiltonian system is given by $\mathbb{F}\ell(\dot{q}(t))$. Using the explicit expression of the mechanical connection and the notation given in the general procedure, the method of reconstruction simplifies to the following steps.

To find the integral curve $c(t)$ of the simple mechanical system with $G$-symmetry $h = K + V \circ \pi_Q$ on $T^*Q$ with initial condition $c(0) = \alpha_q \in T_q^*Q$, knowing the integral curve $c_\mu(t)$ of the reduced Hamiltonian system on $(T^*Q)_\mu$ given by the reduced Hamiltonian function $h_\mu : (T^*Q)_\mu \to \mathbb{R}$ with initial condition $c_\mu(0) = [\alpha_q]$, one proceeds in the following manner. Recall the symplectic embedding $\varphi_\mu : ((T^*Q)_\mu, (\Omega_Q)_\mu) \to (T^*(Q/G_\mu), \Omega_{Q/G_\mu} - B_\mu)$. The curve $\varphi_\mu(c_\mu(t)) \in T^*(Q/G_\mu)$ is an integral curve of the Hamiltonian system on $(T^*(Q/G_\mu), \Omega_{Q/G_\mu} - B_\mu)$ given by the function that is the sum of the kinetic energy of the quotient Riemannian metric and the quotient amended potential $\hat{V}_\mu$. Let $q_\mu(t) := \pi_{Q/G_\mu}(\varphi_\mu(c_\mu(t)))$ be the base integral curve of this system, where $\pi_{Q/G_\mu} : T^*(Q/G_\mu) \to Q/G_\mu$ is the cotangent bundle projection.

Step 1: Relative to the mechanical connection $A_{\mathrm{mech}} \in \Omega^1(Q; \mathfrak{g}_\mu)$, horizontally lift $q_\mu(t) \in Q/G_\mu$ to a curve $q_h(t) \in Q$ passing through $q_h(0) = q$.

Step 2: Determine $\xi(t) \in \mathfrak{g}_\mu$ from the algebraic system $\langle\langle \xi(t)_Q(q_h(t)), \eta_Q(q_h(t)) \rangle\rangle = \langle \mu, \eta \rangle$ for all $\eta \in \mathfrak{g}_\mu$, where $\langle\langle\,,\rangle\rangle$ is the $G$-invariant kinetic energy Riemannian metric on $Q$. This implies that $\dot{q}_h(0)$ and $\xi(0)_Q(q)$ are the horizontal and vertical components of the vector $\alpha_q^\sharp \in T_qQ$ which is associated by the metric $\langle\langle\,,\rangle\rangle$ to the initial condition $\alpha_q$.

Step 3: Solve $\dot{g}(t) = T_eL_{g(t)}\xi(t)$ in $G_\mu$ with initial condition $g(0) = e$.

Step 4: The curve $q(t) := g(t) \cdot q_h(t)$, with $q_h(t)$ and $g(t)$ determined in steps 1 and 3, respectively, is the base integral curve of the simple mechanical system with symmetry defined by the function $h$ satisfying $q(0) = q$. The curve $(\mathbb{F}h)^{-1}(\dot{q}(t)) \in T^*Q$ is the integral curve of this system with initial
condition $c(0) = \alpha_q$. In addition, $\dot{q}(t) = g(t) \cdot (\dot{q}_h(t) + \xi(t)_Q(q_h(t)))$ is the horizontal plus vertical decomposition relative to the connection induced on $J^{-1}(\mu) \to (T^*Q)_\mu$ by the mechanical connection $A_{\mathrm{mech}} \in \Omega^1(Q; \mathfrak{g}_\mu)$.

There are several important situations when step 3, the main obstruction to an explicit solution of the reconstruction problem, can be carried out. We shall review some of them below.

(c1) The case $G_\mu = S^1$. If $G_\mu$ is abelian, the equation in step 3 has the solution $g(t) = \exp\int_0^t \xi(s)\,ds$. If, in addition, $G_\mu = S^1$, then $\xi(s)$ can be explicitly determined by step 2. Indeed, if $\zeta \in \mathfrak{g}_\mu$ is a generator of $\mathfrak{g}_\mu$, writing $\xi(s) = a(s)\zeta$ for some smooth real-valued function $a$ defined on some open interval around the origin, the algebraic equation in step 2 implies that $\langle\langle a(s)\zeta_Q(q_h(s)), \zeta_Q(q_h(s)) \rangle\rangle = \langle \mu, \zeta \rangle$, which gives $a(s) = \langle \mu, \zeta \rangle/\|\zeta_Q(q_h(s))\|^2$. Therefore, the base integral curve of the solution of the simple mechanical system with symmetry on $T^*Q$ passing through $\alpha_q$ is

\[ q(t) = \exp\left(\langle \mu, \zeta \rangle \int_0^t \frac{ds}{\|\zeta_Q(q_h(s))\|^2}\,\zeta\right) \cdot q_h(t) \]

and

\[ \dot{q}(t) = \exp\left(\langle \mu, \zeta \rangle \int_0^t \frac{ds}{\|\zeta_Q(q_h(s))\|^2}\,\zeta\right) \cdot \left(\dot{q}_h(t) + \frac{\langle \mu, \zeta \rangle}{\|\zeta_Q(q_h(t))\|^2}\,\zeta_Q(q_h(t))\right). \]

(c2) The case of compact Lie groups. An obvious situation when the differential equation in step 3 can be solved is if $\xi(t) = \xi$ for all $t$, where $\xi$ is a given element of $\mathfrak{g}_\mu$. Then the solution is $g(t) = \exp(t\xi)$. However, step 2 puts certain restrictions under this hypothesis, because it requires that $\langle\langle \xi_Q(q_h(t)), \eta_Q(q_h(t)) \rangle\rangle = \langle \mu, \eta \rangle$ for any $\eta \in \mathfrak{g}_\mu$. This is satisfied if there is a bilinear nondegenerate form $(\cdot, \cdot)$ on $\mathfrak{g}$ satisfying $(\zeta, \eta) = \langle\langle \zeta_Q(q), \eta_Q(q) \rangle\rangle$ for all $q \in Q$ and $\zeta, \eta \in \mathfrak{g}$. This implies that $(\cdot, \cdot)$ is positive definite and invariant under the adjoint action of $G$ on $\mathfrak{g}$, so semisimple Lie algebras of noncompact type are excluded. If $G$ is compact, which ensures the existence of a positive adjoint invariant inner product on $\mathfrak{g}$, and $Q = G$, this condition implies that the kinetic energy metric is invariant under the adjoint action. There are examples in which such conditions are natural, such as in Kaluza-Klein theories. Thus, if $G$ is a compact Lie group and $(\cdot, \cdot)$ is a positive-definite metric invariant under the adjoint action of $G$ on $\mathfrak{g}$ satisfying $(\zeta, \eta) = \langle\langle \zeta_Q(q), \eta_Q(q) \rangle\rangle$ for all $q \in Q$ and $\zeta, \eta \in \mathfrak{g}$, then the element $\xi(t)$ in step 2 can be chosen to be constant and is determined by the identity $(\xi, \cdot) = \mu|_{\mathfrak{g}_\mu}$ on $\mathfrak{g}_\mu$. The solution of the equation in step 3 is then $g(t) = \exp(t\xi)$.

(c3) The case when $\dot{\xi}(t)$ is proportional to $\xi(t)$. Try to find a real-valued function $f(t)$ such that $g(t) = \exp(f(t)\xi(t))$ is a solution of the equation $\dot{g}(t) = T_eL_{g(t)}\xi(t)$ with $f(0) = 0$. This gives, for small $t$, the equation $\dot{f}(t)\xi(t) + f(t)\dot{\xi}(t) = \xi(t)$, that is, it is necessary that $\xi(t)$ and $\dot{\xi}(t)$ be proportional. So, if $\dot{\xi}(t) = \alpha(t)\xi(t)$ for some known smooth function $\alpha(t)$, then this gives $f(t) = \int_0^t \exp\left(\int_t^s \alpha(r)\,dr\right) ds$.

(c4) The case of $G_\mu$ solvable. Write $g(t) = \exp(f_1(t)\xi_1)\exp(f_2(t)\xi_2) \cdots \exp(f_n(t)\xi_n)$, for some basis $\{\xi_1, \xi_2, \ldots, \xi_n\}$ of $\mathfrak{g}_\mu$ and some smooth real-valued functions $f_i$, $i = 1, 2, \ldots, n$, defined around zero. It is known that if $G_\mu$ is solvable, the equation in step 3 can be solved by quadratures for the $f_i$.

Reconstruction Phases for Simple Mechanical Systems with $S^1$ Symmetry

Consider a simple mechanical system with symmetry $G$ on the Riemannian manifold $(Q, \langle\langle\,,\rangle\rangle)$ with $G$-invariant potential $V \in C^\infty(Q)$. If $\mu \in \mathfrak{g}^*$, let $V_\mu$ be the amended potential and $\hat{V}_\mu \in C^\infty(Q/G_\mu)$ the induced function on the base. Let $c : [0, T] \to T^*Q$ be an integral curve of the system with Hamiltonian $h = K + V \circ \pi_Q$ and suppose that its projection $c_\mu : [0, T] \to (T^*Q)_\mu$ to the reduced space is a closed integral curve of the reduced system with Hamiltonian $h_\mu$. The reconstruction phase associated to the loop $c_\mu(t)$ is the group element $g \in G_\mu$ satisfying the identity $c(T) = g \cdot c(0)$. We shall present two explicit formulas of the reconstruction phase for the case when $G_\mu = S^1$. Let $\zeta \in \mathfrak{g}_\mu = \mathbb{R}$ be a generator of the coadjoint isotropy algebra and write $c(T) = \exp(\varphi\zeta) \cdot c(0)$; in this case, $\varphi$ is identified with the reconstruction phase and, as we shall see in concrete mechanical examples, it truly represents an angle.

If $G_\mu = S^1$, the $G_\mu$-principal bundle $\pi_\mu : J^{-1}(\mu) \to (T^*Q)_\mu := J^{-1}(\mu)/G_\mu$ admits two natural connections: $A = (1/\langle \mu, \zeta \rangle)\theta_\mu \in \Omega^1(J^{-1}(\mu))$, where $\theta_\mu$ is the pullback of the canonical 1-form on the cotangent bundle to the momentum level submanifold $J^{-1}(\mu)$, and $\pi_Q^*A_{\mathrm{mech}} \in \Omega^1(J^{-1}(\mu))$. There is no reason to choose one connection over the other and thus there are two natural formulas for the reconstruction phase in this case. Let $c_\mu(t)$ be a periodic orbit of period $T$ of the reduced system and denote also by $h_\mu$ the value of the Hamiltonian function on it.
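The abelian case (c1) and the phase $\varphi$ just introduced can be sketched numerically. The following is a minimal illustration under stated assumptions, not the authors' construction: the configuration space is taken to be the punctured plane with $G = S^1$ acting by rotations, unit mass, and the hypothetical potential $V(r) = r^2/2$; the function names (`rk4_step`, `reduced_rhs`, `reconstruct`, `integrate_full`) are invented for this example. In this setting $\zeta_Q(q) = (-y, x)$, so $\|\zeta_Q(q)\|^2 = r^2$, the horizontal lift keeps the polar angle fixed, and step 3 reduces to the quadrature $\dot{\theta} = \mu/r^2$.

```python
import numpy as np

def rk4_step(f, y, dt):
    """One classical Runge-Kutta step for the autonomous system y' = f(y)."""
    k1 = f(y)
    k2 = f(y + 0.5 * dt * k1)
    k3 = f(y + 0.5 * dt * k2)
    k4 = f(y + dt * k3)
    return y + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def dV(r):
    """Radial derivative of the assumed potential V(r) = r**2 / 2."""
    return r

def reduced_rhs(mu):
    """Reduced radial dynamics for the amended potential V + mu**2 / (2 r**2),
    augmented by the step-3 quadrature d(theta)/dt = mu / r**2."""
    def rhs(s):
        r, p_r, _theta = s
        return np.array([p_r, -dV(r) + mu**2 / r**3, mu / r**2])
    return rhs

def full_rhs(s):
    """Unreduced Hamilton equations in Cartesian coordinates (unit mass)."""
    x, y, px, py = s
    r = np.hypot(x, y)
    return np.array([px, py, -dV(r) * x / r, -dV(r) * y / r])

def reconstruct(q0, p0, t_end, n_steps):
    """Return (q(t_end), accumulated angle) via reduction plus reconstruction."""
    x, y = q0
    px, py = p0
    r0 = np.hypot(x, y)
    mu = x * py - y * px           # value of the momentum map (angular momentum)
    p_r0 = (x * px + y * py) / r0  # radial momentum
    theta0 = np.arctan2(y, x)
    s = np.array([r0, p_r0, theta0])
    rhs = reduced_rhs(mu)
    dt = t_end / n_steps
    for _ in range(n_steps):
        s = rk4_step(rhs, s, dt)
    r, _, theta = s
    # q(t) = g(t) . q_h(t): rotate the horizontal lift r(t) (cos theta0, sin theta0)
    return np.array([r * np.cos(theta), r * np.sin(theta)]), theta - theta0

def integrate_full(q0, p0, t_end, n_steps):
    """Integrate the unreduced system directly, for comparison."""
    s = np.array([q0[0], q0[1], p0[0], p0[1]], dtype=float)
    dt = t_end / n_steps
    for _ in range(n_steps):
        s = rk4_step(full_rhs, s, dt)
    return s[:2]
```

Calling `reconstruct((1.0, 0.0), (0.3, 0.8), 5.0, 5000)` and `integrate_full` with the same arguments should give the same base point up to integration error. For this particular potential the reduced radial loop closes after $t = \pi$, and the angle returned over that interval plays the role of the reconstruction phase.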
Assume that $D$ is a two-dimensional surface in $(T^*Q)_\mu$ whose boundary is the loop $c_\mu(t)$. Since the manifolds $(T^*Q)_\mu$ and $T^*(Q/S^1)$ are diffeomorphic (but not symplectomorphic), it makes sense to consider the base integral curve $q_\mu(t)$ obtained by projecting $c_\mu(t)$ to the base $Q/S^1$, which is a closed curve of period $T$. Denote by

\[ \langle \hat{V}_\mu \rangle := \frac{1}{T}\int_0^T \hat{V}_\mu(q_\mu(t))\,dt \]

the average of $\hat{V}_\mu$ over the loop $q_\mu(t)$. Let $q_h(t) \in Q$ be the $A_{\mathrm{mech}}$-horizontal lift of $q_\mu(t)$ to $Q$ and let $\chi$ be the $A_{\mathrm{mech}}$-holonomy of the loop $q_\mu(t)$ measured from $q(0)$, the base point of $c(0)$; its expression is given by $\exp\chi = \exp\left(-\iint_D B\right)$, where $B$ is the curvature of the mechanical connection. Denote by $\omega_\mu$ the reduced symplectic form on $(T^*Q)_\mu$. With these notations the phase $\varphi$ is given by

\[ \varphi = \frac{1}{\langle \mu, \zeta \rangle}\iint_D \omega_\mu + \frac{2(h_\mu - \langle \hat{V}_\mu \rangle)T}{\langle \mu, \zeta \rangle} = \chi + \langle \mu, \zeta \rangle \int_0^T \frac{ds}{\|\zeta_Q(q_h(s))\|^2} \qquad [3] \]

The first terms in both formulas are the so-called geometric phases because they carry only geometric information given by the connection, whereas the second terms are called the dynamic phases since they encapsulate information directly linked to the Hamiltonian. The expression of the total phase as a sum of a geometric and a dynamic phase is not intrinsic and is connection dependent. It can even happen that one of these summands vanishes. We shall consider now two concrete examples: the free rigid body and the heavy top.

Reconstruction Phases for the Free Rigid Body

The motion of the free rigid body is a geodesic with respect to a left-invariant Riemannian metric on $SO(3)$ given by the moment of inertia of the body. The phase space of the free rigid body motion is $T^*SO(3)$ and a momentum map $J : T^*SO(3) \to \mathbb{R}^3$ of the lift of left translation to the cotangent bundle is given by right translation to the identity element. We have identified here $\mathfrak{so}(3)$ with $\mathbb{R}^3$ by the Lie algebra isomorphism $x \in (\mathbb{R}^3, \times) \mapsto \hat{x} \in (\mathfrak{so}(3), [\,,])$, where $\hat{x}(y) = x \times y$, and $\mathfrak{so}(3)^*$ with $\mathbb{R}^3$ by the inner product on $\mathbb{R}^3$. The reduced manifold $J^{-1}(\mu)/G_\mu$ is identified with the sphere $S^2_{\|\mu\|}$ in $\mathbb{R}^3$ of radius $\|\mu\|$ with the symplectic form $\omega_\mu = -dS/\|\mu\|$, where $dS$ is the standard area form on $S^2_{\|\mu\|}$ and $G_\mu \cong S^1$ is the group of rotations around the axis $\mu$. These concentric spheres are the coadjoint orbits of the Lie-Poisson space $\mathfrak{so}(3)^*$ and represent the level sets of the Casimir functions that are all smooth functions of $\|\Pi\|^2$, where $\Pi \in \mathbb{R}^3$ denotes the body angular momentum.

The Hamiltonian of the rigid body on the Lie-Poisson space $T^*SO(3)/SO(3) \cong \mathbb{R}^3$ is given by

\[ h(\Pi) := \frac{1}{2}\left(\frac{\Pi_1^2}{I_1} + \frac{\Pi_2^2}{I_2} + \frac{\Pi_3^2}{I_3}\right) \]

where $I_1, I_2, I_3 > 0$ are the principal moments of inertia of the body. Let $\mathbb{I} := \mathrm{diag}(I_1, I_2, I_3)$ denote the moment of inertia tensor diagonalized in a principal-axis body frame. The Lie-Poisson bracket on $\mathbb{R}^3$ is given by $\{f, g\}(\Pi) = -\Pi \cdot (\nabla f(\Pi) \times \nabla g(\Pi))$ and the equations of motion are $\dot{\Pi} = \Pi \times \Omega$, where $\Omega \in \mathbb{R}^3$ is the body angular velocity given in terms of $\Pi$ by $\Omega_i := \Pi_i/I_i$, for $i = 1, 2, 3$, that is, $\Omega = \mathbb{I}^{-1}\Pi$. The trajectories of these equations are found by intersecting a family of homothetic energy ellipsoids with the angular momentum concentric spheres. If $I_1 > I_2 > I_3$, one immediately sees that all orbits are periodic with the exception of four centers (the two possible rotations about the long and the short moment of inertia axis of the body), two saddles (the two rotations about the middle moment of inertia axis of the body), and four heteroclinic orbits connecting the two saddles.

Suppose that $\Pi(t)$ is a periodic orbit on the sphere $S^2_{\|\mu\|}$ with period $T$. After time $T$, by how much has the rigid body rotated in space? The answer to this question follows directly from [3]. Taking $\zeta = \mu/\|\mu\|$ and the potential $V \equiv 0$ we get

\[ \varphi = -\Lambda + \frac{2h_\mu T}{\|\mu\|} = \iint_D \frac{2\|\mathbb{I}\Pi\|^2 - (\Pi \cdot \mathbb{I}\Pi)(\mathrm{tr}\,\mathbb{I})}{(\Pi \cdot \mathbb{I}\Pi)^2}\,dS + \|\mu\|^3 \int_0^T \frac{ds}{\Pi(s) \cdot \mathbb{I}\Pi(s)} \]

where $D$ is one of the two spherical caps on $S^2_{\|\mu\|}$ whose boundary is the periodic orbit $\Pi(t)$, $h_\mu$ is the value of the total energy on the solution $\Pi(t)$, and $\Lambda$ is the oriented solid angle, that is,

\[ \Lambda := -\frac{1}{\|\mu\|}\iint_D \omega_\mu, \qquad |\Lambda| = \frac{\mathrm{area}\,D}{\|\mu\|^2}. \]

Reconstruction Phases for the Heavy Top

The heavy top is a simple mechanical system with symmetry $S^1$ on $T^*SO(3)$ whose Hamiltonian function is given by $h(\alpha_h) := (1/2)\|\alpha_h^\sharp\|^2 + Mg\ell\,\mathbf{k} \cdot h\chi$, where $h \in SO(3)$, $\alpha_h \in T_h^*SO(3)$, $\mathbf{k}$ is the unit vector of the spatial $Oz$ axis (pointing in the direction opposite to

that of the gravity force), M 2 R is the total mass of the action. The leaves of this Poisson manifold are the
body, g 2 R is the value of the gravitational accelera- orbit reduced spaces J 1 (O )=G, where O  g  is
tion, the fixed point about which the body moves is the the coadjoint G-orbit through  2 g  (see Symmetry
origin, and is the unit vector of the straight line and Symplectic Reduction). Is there an explicit
segment of length connecting the origin to the center formula for this reduced Poisson bracket on a
of mass of the body. This Hamiltonian is left invariant manifold diffeomorphic to (T  Q)=G? It turns out
under rotations about the spatial Oz axis. A momen- that this question has two possible answers, once a
tum map induced by this S1 -action is given by connection on the principal bundle  : Q ! Q=G is
J : T  SO(3) ! R, J(h ) = Te Lh (h )  k; recall that introduced. The discussion below will also link to
Te Lh (h ) =:  2 R3 is the body angular momentum. the fibration version of cotangent bundle reduction.
The reduced space J 1 ()=S1 is generically the cotan- In order to present these answers, we review two
gent bundle of the unit sphere endowed with the bundle constructions. Let G act freely and properly
symplectic structure given by the sum of the canonical on the manifold P and consider the (left) principal
form plus a magnetic term; equivalently, this is the G-bundle  : P ! P=G := M. Let : N ! M be a
coadjoint orbit in the dual of the Euclidean Lie algebra surjective submersion. Then the pullback bundle
se(3) = R 3  R 3 given by O = {(, ) j    = ,  : (n, p) 2 P~ := {(n, p) 2 N  P j (p) = (n)} 7! n 2 N
kk2 = 1}. The projection map J 1 () ! O imple- over N is also a principal (left) G-bundle relative to
menting the symplectic diffeomorphism between the the action g  (n, p) := (n, g  p).
reduced space and the coadjoint orbit in se(3) is If there is a (left) G-action on a manifold V, then the
given by h 7! (, ) := (Te Lh (h ), h1 k). The orbit diagonal G-action g  (p, v) = (g  p, g  v) on P  V is
symplectic form ! on O has the expression also free and proper and one can form the asso-
! (, )((  x   y,   x), (  x0   y0 , ciated bundle P G V := (P  V)=G which is a
  x0 )) =   (x  x0 )    (x  y0  x0  y) for any locally trivial fiber bundle E : [p, v] 2 E := P G
x, x0 , y, y0 2 R3 . The heavy-top equations  =   V 7! (p) 2 M over M with fibers diffeomorphic to
Mg  ,  =    are LiePoisson equations on V. Analogously, one can form the associated fiber
se(3) for the Hamiltonian h(, ) = (1=2)   bundle E~ : E ~ := P
~ G V ! N. Summarizing, the
Mg  and the LiePoisson bracket {f , g}(, ) = associated bundle E ~ =P
~ G V ! N is obtained
  (r f  r g)    (r f  r g  r g  r f ), from the principal bundle  : P ! M, the surjective
where r and r denote the partial gradients. submersion : N ! M, and the G-manifold V by
Let ((t), (t)) be a periodic orbit of period T of pullback and association, in this order.
the heavy-top equations. After time T, by how much These operations can be reversed. First, form the
has the heavy top rotated in space? The answer is associated bundle E : E = P G V ! M and then
provided by [3]: pull it back by the surjective submersion : N ! M
ZZ  Z T  to N to get the pullback bundle ~E : E ~ ! N. The map
1 1 ~ ~
 : P G V ! E defined by ([(n, p), v]) := (n, [p, v])
! 2h T  2Mg s  ds
 D  0 is an isomorphism of locally trivial fiber bundles.
ZZ
2kIsk2  s  Istr I These general considerations will be used now to
ds realize the quotient Poisson manifold (T  Q)=G in
D s  Is2
Z T two different ways. Let Q be a manifold and G a Lie
ds group (with Lie algebra g ) acting freely and properly

0 s  Is on it. Let A 2 1 (Q; g ) be a connection 1-form on
where D is the spherical cap on the unit sphere the left G-principal bundle  : Q ! Q=G. Pull back
whose boundary is the closed curve (t) and D is a the G-bundle  : Q ! Q=G by the cotangent bundle
two-dimensional submanifold of the orbit O projection Q=G : T  (Q=G) ! Q=G to T  (Q=G) to
obtain the G-principal bundle ~Q=G : ([q] , q) 2 Q ~ :=
bounded by the closed integral curve ((t), (t)). 
The first terms in each summand represent the {([q] , q) j [q] = (q), q 2 Q} 7! [q] 2 T (Q=G). This
geometric phase and the second terms the dynamic bundle is isomorphic to the annihilator (VQ) 
phase. T  Q of the vertical bundle VQ := ker T  TQ.
Next, form the coadjoint bundle S : S := Q ~ G
  ~
g ! T (Q=G) of Q, S (([q] , q), ) = [q] , that is,
Gauged Poisson Structures
the associated vector bundle to the G-principal
If the Lie group G acts freely and properly on a bundle Q ~ ! T  (Q=G) given by the coadjoint repres-
smooth manifold Q, then (T  Q)=G is a quotient entation of G on g  . The connection-dependent map
Poisson manifold (see Poisson Reduction), where the A : S ! (T  Q)=G defined by A ([([q] , q), ]) :=
quotient is taken relative to the (left) lifted cotangent [Tq ([q] ) A(q) ], where q 2 Q, q 2 Tq Q, and
 2 g  , is a vector bundle isomorphism over Q=G. by re W f (w)(v ) = df (w)(v , T(q, ) Qg  (horq
A [q] [q]
The Sternberg space is the Poisson manifold (S, { , }S ), (T[q] Q=G (v[q] )), 0)) where Qg  : Q  g  !
where { , }S is the pullback to S by A of the quotient Q G g  = e g  is the orbit map. The symbol r eW
A
Poisson bracket on (T  Q)=G. signifies that this is a covariant derivative on the
Next, we proceed in the opposite order. Construct pullback bundle W induced by the covariant
first the coadjoint bundle  ~g  : [q, ] 2 e g  := Q G derivative rA on the coadjoint bundle e g  . This
g  7! [q] 2 Q=G associated to the principal bundle covariant derivative rA is induced on e g  by the
 : Q ! Q=G and then pull it back by the cotangent connection A.
bundle projection Q=G : T  (Q=G) ! Q=G to
For f 2 C1 (W), we have dSA~ (f  ) = (r e W f )  .
A
T  (Q=G) to obtain the vector bundle W : W :=
To write the two gauged Poisson brackets on S and
{([q] , [q, ]) j Q=G ([q] ) =  ~g  ([q, ]) = [q]}, W ([q] ,
on W explicitly, we denote by ~g = Q G g the
[q, ]) = [q] over T  (Q=G). Note that W = T 
adjoint bundle of  : Q ! Q=G, by Q=G the
(Q=G)  e g  and hence W is also a vector bundle over
canonical symplectic structure on T  (Q=G), by
Q/G. Let HQ be the horizontal sub-bundle defined by
B 2 2 (Q; g ) the curvature of A, and by B the
the connection A; thus, TQ = HQ  VQ, where
~g -valued 2-form B 2 2 (Q=G; ~g ) on the base Q=G
Hq Q := ker A(q). For each q 2 Q, the linear map
defined by B([q])(u[q] , v[q] ) = [q, B(q)(uq , vq )], for any
Tq jHq Q : Hq Q ! T[q] (Q=G) is an isomorphism. Let
uq , vq 2 Tq Q that satisfy Tq (uq ) = u[q] and
horq := (Tq jHq Q )1 : T[q] (Q=G) ! Hq Q  Tq Q be
Tq (vq ) = v[q] . Note that both S and W  are Lie
the horizontal lift operator induced by the connection
algebra bundles, that is, their fibers are Lie algebras
A. Thus, horq : Tq Q ! T[q] 
(Q=G) is a linear surjective
and the fiberwise Lie bracket operation depends
map whose kernel is the annihilator (Hq Q) of the
smoothly on the base point. If f 2 C1 (S), denote by
horizontal space. The connection-dependent map ~ G g the usual fiber derivative of f.
f = s 2 S = Q
A : (T  Q)=G ! W defined by A ([q ]) := (horq
Similarly, if f 2 C1 (W) denote by f = w 2 W  the
(q ), [q, J(q )]), where q 2 Q, q 2 Tq Q, and J : T 
usual fiber derivative of f. Finally, ] : T 
Q ! g  is the momentum map of the lifted action,
(T  (Q=G)) ! T(T  (Q=G)) is the vector bundle iso-
h J(q ), i = q ((Q (q)) for  2 g , is a vector bundle
morphism induced by Q=G . The Poisson bracket of
isomorphism over Q/G and A  A = . The Wein-
f , g 2 C1 (S) is given by
stein space is the Poisson manifold (W, { , }W ), where
{ , }W is the push-forward by A of the Poisson 
ff ; ggS s Q=G q d SA~f s] ; d SA~gs]
bracket of (T  Q)=G. In particular,  : S ! W is a


connection independent Poisson diffeomorphism. The f g
Poisson brackets on S and on W are called gauged  s; ;
s s
Poisson brackets. They are expressed explicitly in terms D  E
of various covariant derivatives induced on S and on v; Q=G Bq d SA~f s] ; d SA~gs]
W by the connection A 2 1 (Q; g ).
Recall that the connection A on the principal g  . The Poisson bracket f , g 2
where v = [q, ] 2 e
bundle  : Q ! Q=G naturally induces connections C1 (W) is given by
on pullback bundles and affine connections on  W
associated vector bundles. Thus, both S and W ff ; ggW w Q=G q r e W gw]
e f w] ; r
A A
carry covariant derivatives induced by A. They are

f g
given, according to general definitions, in the cases  w; ;
w w
under consideration, by: D  W E

v; Q=G Bq r e f w] ; re W gw]
A A

If f 2 C1 (S), s = [([q] , q), ] 2 S, and v[q] 2 T[q]


T  (Q=G), then d SA~ f (s) 2 T [q] T  (Q=G) is defined Note that their structure is of the form: canonical
by d SA~ f (s)(v[q] ) := df (s)T(([q] , q), ) Qg
~  v[q] , horq bracket plus a (left) LiePoisson bracket plus a
(T[q] (v[q] )), 0 where Qg ~ 
~  : Q  g !QG
~ curvature coupling term.
 S
g = S is the orbit map. The symbol d A~ signifies
that this is a covariant derivative on the The Symplectic Leaves of the Sternberg
associated bundle S induced by the connection and Weinstein Spaces
~ on the principal G-pullback bundle
A ~  g  ! T  Q given by A (([q] , q),
The map A : Q
Q~ ! T  (Q=G). This connection A ~ is the pullback ~  g ,
) := Tq ([q] ) A(q) , where (([q] , q),)2 Q

connection defined by A. is a G-equivariant diffeomorphism; the G-action

If f 2 C1 (W), w = ([q] , [q, ]) 2 W, and v[q] 2 T[q] on T  Q is by cotangent lift and on Q ~  g  is
T  (Q=G), then r e W f (w) 2 T  T  (Q=G) is defined 
g  (([q] , q), ) = (([q] , g  q), Adg1 ). The pullback J A
A [q]
of the momentum map to Q ~  g  has the expression Further Reading


J A (([q] , q), ) = , so if O  g  is a coadjoint orbit we
~ Abraham R and Marsden JE (1978) Foundations of Mechanics,
have J 1 A (O)= Q  O, and hence the orbit reduced 2nd edn. Reading, MA: Addison-Wesley.
manifold J 1 A (O)=G, whose connected components Guichardet A (1984) On rotation and vibration motions of
are the symplectic leaves of S, equals Q ~ G O. Its molecules. Annales de lInstitut Henri Poincare. Physique
symplectic form is the Sternberg minimal coupling Theorique 40: 329342.
form ! ~  Iwai T (1987) A geometric setting for classical molecular
O S Q=G .
dynamics. Annales de lInstitut Henri Poincare. Physique
In this formula, the 2-form ! ~
O has not been Theorique. 47: 199219.
defined yet. It is uniquely defined by the identity Kummer M (1981) On the construction of the reduced phase
~ b  
Qg
~ ! O = d A O !O , where !O is the minus orbit space of a Hamiltonian system with symmetry. Indiana
symplectic form on O (see Symmetry and Symplectic University Mathematics Journal 30: 281291.
Reduction), O : Q ~  O ! O is the projection on the Lewis D, Marsden JE, Montgomery R, and Ratiu TS (1986) The
b 2 2 (Q ~  O) is the 2-form Hamiltonian structure for dynamic free boundary problems.
second factor, and A Physica D 18: 391404.
given by b [q] , q), )
A(( ((u[q] , vq ), ) =
  Marsden JE, Montgomery R, and Ratiu TS (1990) Reduction,
 , A(q)(vq ) for (([q] , q), ) 2 Q ~  O, (u , vq ) 2 symmetry, and phases in mechanics. Memoirs of the American
[q]
~ and  2 g  .
T([q] , q) Q, Mathematical Society 88(436).
The symplectic leaves of the Weinstein space Marsden JE, Misioek G, Ortega J-P, Perlmutter M, and Ratiu TS
(2005) Hamiltonian Reduction by Stages, Lecture Notes in
W are obtained by pushing forward by  the Mathematics. Springer.
symplectic leaves of the Sternberg space. They are Marsden JE and Perlmutter M (2000) The orbit bundle picture of
the connected components of the symplectic cotangent bundle reduction. Comptes Rendus Mathematiques de
manifolds (T  (Q=G)  (Q G O), T  (Q=G) Q=G lAcademie des Sciences. La Societe Royale du Canada
Q G O !  22: 3354.
Q G O ), where O is a coadjoint orbit in g ,
 Marsden JE and Ratiu TS (2003) Introduction to Mechanics and
Q=G is the canonical symplectic form on T (Q=G), Symmetry, 2nd edn. second printing; 1st edn. (1994), Texts in
! QG O is a closed 2-form on Q G O to be defined Applied Mathematics, vol. 17. New York: Springer.
below, and T  (Q=G) : T  (Q=G)  (Q G O) ! Montgomery R (1984) Canonical formulations of a particle in a
T (Q=G), QG O : T  (Q=G)  (Q G O) ! Q G O
 YangMills field. Letters in Mathematical Physics 8: 5967.
are the projections. The closed 2-form ! Montgomery R (1991) How much does a rigid body rotate? A
QG O 2
Berrys phase from the eighteenth century. American Journal
2 (Q G O) is uniquely determined by the identity of Physics 59: 394398.
Q  O ! 
Q G O = !QO , where QO : Q  O ! Q G O Montgomery R, Marsden JE, and Ratiu TS (1984) Gauged Lie
2
is the orbit space projection, ! QO 2  (Q  O) is Poisson structures. In: Marsden J (ed.) Fluids and Plasmas:

closed and given by ! QO (q, )((u q , ad ),
Geometry and Dynamics, Contemporary Mathematiques,
  vol. 28, pp. 101114. Providence, RI: American Mathematical
(vq , ad )) := d(A  idO )(q, ) ((uq , ad ), (vq ,
Society.
ad )) !  
O ()(ad , ad ), and A  idO 2  (Q
1
Satzer WJ (1977) Canonical reduction of mechanical systems

g ) is given  by (A  idO )(q, )(uq , ad ) = invariant under abelian group actions with an application to
, A(q)(uq ) , for q 2 Q,  2 g  , uq , vq 2 Tq Q, ,  2 g . celestial mechanics. Indiana, University Mathematics Journal
Thus, on the Sternberg and Weinstein spaces, 26: 951976.
Simo JC, Lewis D, and Marsden JE (1991) The stability of relative
both the Poisson bracket as well as the symplectic
equilibria. Part I: The reduced energymomentum method.
form on the leaves have explicit connection Archive for Rational Mechanics and Analysis 115: 1559.
dependent formulas (see Gauge Theory: Mathema- Smale S (1970) Topology and mechanics. Inventiones Mathema-
tical Applications for a general treatment of gauge ticae 10: 305331, 11: 4564.
theories). Sternberg S (1977) Minimal coupling and the symplectic mechanics of
a classical particle in the presence of a YangMills field.
Proceedings of the National Academy of Sciences 74: 52535254.
See also: Gauge Theory: Mathematical Applications;
Weinstein A (1978) A universal phase space for particles in Yang
Hamiltonian Group Actions; Poisson Reduction;
Mills fields. Letters in Mathematical Physics 2: 417420.
Symmetries and Conservation Laws; Symmetry and Zaalani N (1999) Phase space reduction and Poisson structure.
Symplectic Reduction. Journal of Mathematical Physics 40(7): 34313438.
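
Returning to the section "Reconstruction Phases for the Heavy Top": the following Python sketch evaluates the heavy-top Lie-Poisson vector field in its standard textbook form, $\dot\Pi = \Pi \times \Omega + Mg\ell\, \Gamma \times \chi$, $\dot\Gamma = \Gamma \times \Omega$ with $\Omega = I^{-1}\Pi$, and checks at a random point that the flow is tangent to the level sets of $\Pi \cdot \Gamma$ and $\|\Gamma\|^2$ (the two functions that cut out the coadjoint orbit) and conserves the Hamiltonian. It is only a sketch under stated assumptions: the sign convention, the diagonal inertia tensor, the numerical parameter values, and every function name are illustrative choices and are not taken from the article.

```python
import numpy as np

# Illustrative parameters (assumptions, not values from the article).
INERTIA = np.array([1.0, 2.0, 3.0])  # principal moments of inertia (diagonal I)
MGL = 0.7                            # the product M * g * l
CHI = np.array([0.0, 0.0, 1.0])      # unit vector toward the center of mass (body frame)


def angular_velocity(pi):
    """Body angular velocity Omega = I^{-1} Pi for a diagonal inertia tensor."""
    return pi / INERTIA


def heavy_top_rhs(pi, gamma):
    """Heavy-top vector field in the assumed standard sign convention."""
    omega = angular_velocity(pi)
    pi_dot = np.cross(pi, omega) + MGL * np.cross(gamma, CHI)
    gamma_dot = np.cross(gamma, omega)
    return pi_dot, gamma_dot


def hamiltonian(pi, gamma):
    """h(Pi, Gamma) = (1/2) Pi . Omega + M g l Gamma . chi."""
    return 0.5 * np.dot(pi, angular_velocity(pi)) + MGL * np.dot(gamma, CHI)


def invariant_rates(pi, gamma):
    """Time derivatives of Pi.Gamma, |Gamma|^2 and h along the flow."""
    pi_dot, gamma_dot = heavy_top_rhs(pi, gamma)
    d_pi_dot_gamma = np.dot(pi_dot, gamma) + np.dot(pi, gamma_dot)
    d_gamma_norm_sq = 2.0 * np.dot(gamma, gamma_dot)
    # For symmetric I, d/dt (1/2 Pi . I^{-1} Pi) = Omega . Pi_dot.
    d_energy = np.dot(angular_velocity(pi), pi_dot) + MGL * np.dot(gamma_dot, CHI)
    return d_pi_dot_gamma, d_gamma_norm_sq, d_energy


rng = np.random.default_rng(0)
pi0 = rng.normal(size=3)
gamma0 = rng.normal(size=3)
gamma0 /= np.linalg.norm(gamma0)  # place Gamma on the unit sphere

rates = invariant_rates(pi0, gamma0)
print("rates of change of (Pi.Gamma, |Gamma|^2, h):", rates)
```

All three printed rates should vanish up to floating-point rounding, reflecting that $\Pi \cdot \Gamma$ and $\|\Gamma\|^2$ are Casimir functions of the Lie-Poisson structure in this convention and that $h$ is the conserved energy.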
Critical Phenomena in Gravitational Collapse


C Gundlach, University of Southampton, Southampton, UK
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Sufficiently dense concentrations of mass-energy in general relativity collapse irreversibly and form black holes. More precisely, the singularity theorems state that once a closed trapped surface has developed, some world lines will only extend to a finite length in the future: they end in a spacetime singularity. Furthermore, the cosmic censorship hypothesis states that this singularity is hidden away inside a black hole. One can, therefore, classify initial data in general relativity which describe an isolated system with no black hole present into those which remain regular, and those which form a black hole during their evolution. Theorems on the stability of Minkowski spacetime, and similar results for some types of matter coupled to gravity, imply that sufficiently weak (in some technical sense) initial data will remain regular. On the other hand, no necessary or sufficient criterion for black hole formation is known. For very strong data the existence of a closed trapped surface implies black hole formation, but although the data themselves may be regular, the trapped surface must already be inside the black hole. Between the very weak and very strong regime, there is a middle regime of initial data for which one cannot decide if they will or will not form a black hole, other than evolving them in time.

The threshold between collapse and dispersion was first explored systematically by Choptuik (1992). He concentrated on the simple model of a spherically symmetric massless scalar (matter) field $\phi(r, t)$. In this model, the scalar-field matter must either form a black hole, or disperse to infinity; it cannot form stable stars. Choptuik explored the space of initial data by means of one-parameter families of initial data which interpolate between strong data (say with large parameter $p$) that form a black hole and weak data (with small $p$) that disperse. The critical value $p_*$ of the parameter $p$ can be found for each family by evolving many data sets from that family. Near the black hole threshold, Choptuik found the following phenomena:

1. Mass scaling. By fine-tuning the initial data to the threshold along any one-parameter family, one can make arbitrarily small black holes. Near the threshold, the black hole mass scales as

$$M \simeq C (p - p_*)^{\gamma} \quad \text{for } p \gtrsim p_* \qquad [1]$$

for the black hole mass $M$ in the limit $p \to p_*$ from above.

2. Universality. While $p_*$ and $C$ depend on the particular one-parameter family of data, the critical exponent $\gamma$ has a universal value, $\gamma \simeq 0.374$, for all one-parameter families of scalar-field data. Furthermore, for a finite time in a finite region of space, the solutions generated by all near-critical data approach one and the same solution $\phi_*$, called the critical solution:

$$\phi(r, t) \simeq \phi_*\!\left(\frac{r}{L}, \frac{t - t_*}{L}\right) \qquad [2]$$

The constants $t_*$ and $L$ depend again on the family of initial data, but $\phi_*(r, t)$ is universal. This universal phase ends when the evolution decides between black hole formation and dispersion. The universal critical solution is approached by any initial data that are sufficiently close to the black hole threshold, on either side, and from any one-parameter family.

3. Scale-echoing. The critical solution $\phi_*(r, t)$ is unchanged when one rescales space and time by a factor $e^{\Delta}$:

$$\phi_*(r, t) = \phi_*\!\left(e^{\Delta} r, e^{\Delta} t\right) \qquad [3]$$

where $\Delta \simeq 3.44$ for the scalar field.

The same phenomena were quickly discovered in many other types of matter coupled to gravity, and even in vacuum gravity (where gravitational waves can form black holes). The echoing period $\Delta$ and critical exponent $\gamma$ depend on the type of matter, but the existence of the phenomena appears to be generic. For some types of matter (e.g., perfect fluid matter), the critical solution is continuously scale invariant (or continuously self-similar, CSS) in the sense that

$$\phi_*(r, t) = \phi_*(r/t) \qquad [4]$$

rather than scale-periodic (or discretely self-similar, DSS) as in [3]. (We use the notation $\phi_*(x)$ for the function of one variable $r/t$.) We have described scale invariance and scale-echoing here in terms of coordinates, but these do admit geometric, coordinate-invariant definitions, which are not restricted to spherical symmetry.

There is also another kind of critical behavior at the black hole threshold. Here, too, the evolution goes through a universal critical solution, but it is static, rather than scale invariant. As a consequence, the mass of black holes near the threshold takes a universal finite value (some fixed fraction of the mass of the critical solution), instead of showing power-law
scaling. In an analogy with first- and second-order phase transitions in statistical mechanics, the critical phenomena with a finite mass at the black hole threshold are called type I, and the critical phenomena with power-law scaling of the mass are called type II.

At this point, we characterize the degree of rigor of the various parts of the theory that is summarized in this article. Critical phenomena were discovered in the numerical time evolution of generic asymptotically flat initial data. Numerical evolution of many elements of a specific one-parameter family, and fine-tuning to the black hole threshold along that family, showed self-similarity and mass scaling near the threshold. Doing this for a number of randomly chosen one-parameter families suggests that these phenomena, and in particular the echoing scale $\Delta$ and mass-scaling exponent $\gamma$, are universal between initial data within one model (e.g., the spherical scalar field). Numerical experiments, however, can only explore a finite-dimensional subspace of the infinite-dimensional space of initial data (phase space) of the field theory, and so cannot prove universality.

We go further by applying the theory of dynamical systems to general relativity. The arguments summarized in the next section would be difficult to make rigorous, as the dynamical system under consideration is infinite dimensional, but they suggest a focus on fixed points of the dynamical system and their linear perturbations. Even though the dynamical systems motivation is not mathematically rigorous, the linearized analysis itself is a well-defined problem that can be solved numerically to essentially arbitrary precision. This proves universality on a perturbative level, and provides numerical values of $\gamma$ and $\Delta$. A combination of the global dynamical systems analysis and perturbative analysis even predicts further critical exponents for black hole charge and angular momentum. Finally, critical phenomena have been discovered in a number of systems (different types of matter and symmetry restrictions), and this suggests that they may be generic for some large class of field theories (although details such as the numerical values of $\gamma$ and $\Delta$ do depend on the system), but there is no conclusive evidence for this at present.

The Dynamical Systems Picture

When we consider general relativity as an infinite-dimensional dynamical system, a solution curve is a spacetime. Points along the curve are Cauchy surfaces in the spacetime, which can be thought of as moments of time. An important difference between general relativity and other field theories is that the same spacetime can be sliced in many different ways, none of which is preferred. Therefore, to turn general relativity into a dynamical system, one has to fix a slicing (and in practice also coordinates on each slice). In the example of the spherically symmetric massless scalar field, using polar slicing and an area radial coordinate $r$, a point in phase space can be characterized by the two functions

$$Z = \left\{ \phi(r),\; r\,\frac{\partial \phi}{\partial t}(r) \right\} \qquad [5]$$

In spherical symmetry, there are no degrees of freedom in the gravitational field, and Cauchy data for the metric can be reconstructed from $Z$ using the Einstein constraints.

The phase space consists of two halves: initial data whose time evolution always remains regular, and data which contain a black hole or form one during time evolution. The numerical evidence collected from individual one-parameter families of data suggests that the black hole threshold that separates the two is a smooth hypersurface. The mass-scaling law [1] can, therefore, be restated without explicit reference to one-parameter families. Let $P$ be any function on phase space such that data sets with $P > 0$ form black holes, and data with $P < 0$ do not, and which is analytic in a neighborhood of the black hole threshold $P = 0$. The black hole mass as a function on phase space is then given by

$$M \simeq F(P)\, P^{\gamma} \qquad [6]$$

for $P > 0$, where $F(P) > 0$ is an analytic function.

Consider now the time evolution in this dynamical system, near the threshold (critical surface) between black hole formation and dispersion. A phase-space trajectory that starts out in a critical surface by definition never leaves it. A critical surface is, therefore, a dynamical system in its own right, with one dimension fewer. If it has an attracting fixed point, such a point is called a critical point. It is an attractor of codimension 1, and the critical surface is its basin of attraction. The fact that the critical solution is an attractor of codimension 1 is visible in its linear perturbations: it has an infinite number of decaying perturbation modes tangential to (and spanning) the critical surface, and a single growing mode not tangential to the critical surface.

Any trajectory beginning near the critical surface, but not necessarily near the critical point, moves almost parallel to the critical surface toward the critical point. As the phase point approaches the critical point, its movement parallel to the surface
slows down, while its distance and velocity out of the critical surface are still small. The phase point spends some time moving slowly near the critical point. Eventually, it moves away from the critical point in the direction of the growing mode, and ends up on an attracting fixed point.

[Figure 1 diagram; labels: Flat space fixed point; Black hole threshold; One-parameter family of initial data; Critical point; $p < p_*$, $p = p_*$, $p > p_*$; Black hole fixed point.]
Figure 1 The phase-space picture for the black hole threshold in the presence of a critical point. The arrow lines are time evolutions, corresponding to spacetimes. The line without an arrow is not a time evolution, but a one-parameter family of initial data that crosses the black hole threshold at $p = p_*$. (Reproduced with permission from Gundlach C (2003) Critical phenomena in gravitational collapse. Physics Reports 376: 339-405.)

This is the origin of universality: any initial data set that is close to the black hole threshold (on either side) evolves to a spacetime that approximates the critical spacetime for some time. When it finally approaches either the dispersion fixed point or the black hole fixed point, it does so on a trajectory that appears to be coming from the critical point itself. All near-critical solutions are passing through one of these two funnels. All details of the initial data have been forgotten, except for the distance from the black hole threshold: the closer the initial phase point is to the critical surface, the more the solution curve approaches the critical point, and the longer it will remain close to it.

In all systems that have been examined, the black hole threshold contains at least one critical point. A fixed point of the dynamical system represents a spacetime with an additional continuous symmetry that generic solutions do not have. If the critical spacetime is time independent in the usual sense, we have type I critical phenomena; if the symmetry is scale invariance, we have type II critical phenomena. The attractor within the critical surface may also be a limit cycle, rather than a fixed point. In spacetime terms this corresponds to a discrete symmetry (DSS rather than CSS in type II, or a pulsating critical solution, rather than a stationary one, in type I).

Self-Similarity and Mass Scaling

Type II critical phenomena occur where the critical solution is scale invariant (self-similar, CSS or DSS). Using suitable spacetime coordinates, a CSS solution can be characterized as independent of a time coordinate $\tau$ which is also a logarithmic scale. Similarly, a DSS solution can be characterized as periodic in $\tau$. For example, starting from the scale periodicity [3] in polar-radial coordinates, we replace $r$ and $t$ by new coordinates

$$x \equiv -\frac{r}{t - t_*}, \qquad \tau \equiv -\ln\!\left(-\frac{t - t_*}{L}\right) \qquad [7]$$

where the accumulation time $t_*$ and scale $L$ must be matched to the one-parameter family under consideration. $\tau$ has been defined so that it increases as $t$ increases and approaches $t_*$ from below. It is useful to think of $r$, $t$, and $L$ as having dimension length in units $c = G = 1$, and of $x$ and $\tau$ as dimensionless. Choptuik's observation, expressed in these coordinates, is that in any near-critical solution there is a spacetime region where the fields $Z$ are well approximated by the critical solution, or

$$Z(x, \tau) \simeq Z_*(x, \tau) \qquad [8]$$

with

$$Z_*(x, \tau + \Delta) = Z_*(x, \tau) \qquad [9]$$

Note that the time parameter of the dynamical system must be chosen as $\tau$ if a CSS solution is to be a fixed point, or a DSS solution a cycle. More generally (going beyond spherical symmetry), on any self-similar spacetime one can introduce coordinates $x^{\mu} = (\tau, x^1, x^2, x^3)$ in which the metric is of the form

$$g_{\mu\nu} = e^{-2\tau}\, \tilde g_{\mu\nu} \qquad [10]$$

and where $\tilde g_{\mu\nu}$ is independent of $\tau$ for a CSS spacetime, and periodic in $\tau$ for a DSS spacetime. These coordinates are not unique.

The critical exponent $\gamma$ can be calculated from the linear perturbations of the critical solution. In order to keep the notation simple, the discussion will be restricted to a critical solution that is spherically symmetric and CSS, which is correct, for example, for perfect-fluid matter.

Let us assume that we have fine-tuned initial data close to the black hole threshold so that in a region the resulting spacetime is well approximated by the CSS critical solution. This part of the spacetime
corresponds to the section of the phase-space trajectory that lingers near the critical point. In this region, we can linearize around $Z_*$. As $Z_*$ does not depend on $\tau$, its linear perturbations can depend on $\tau$ only exponentially. Labeling the perturbation modes by $i$, a single mode perturbation is of the form

$$\delta Z = C_i\, e^{\lambda_i \tau} Z_i(x) \qquad [11]$$

In the near-critical regime, we can therefore approximate the solution as

$$Z(x, \tau) \simeq Z_*(x) + \sum_{i=0}^{\infty} C_i(p)\, e^{\lambda_i \tau} Z_i(x) \qquad [12]$$

The notation $C_i(p)$ is used because the perturbation amplitudes $C_i$ depend on the initial data, and hence on the parameter $p$ that controls the initial data.

If $Z_*$ is a critical solution, by definition there is exactly one $\lambda_i$ with positive real part (in fact, it is purely real), say $\lambda_0$. As $t \to t_*$ from below, which corresponds to $\tau \to \infty$, all other perturbations decay and can be neglected. By definition, the critical solution corresponds to $p = p_*$, and so we must have $C_0(p_*) = 0$. Linearizing around $p_*$, we obtain

$$Z(x, \tau) \simeq Z_*(x) + \left.\frac{dC_0}{dp}\right|_{p_*} (p - p_*)\, e^{\lambda_0 \tau} Z_0(x) \qquad [13]$$

in a region of the spacetime.

Now we extract Cauchy data at one particular value of $\tau$ within that region, namely at $\tau_p$ defined by

$$\left.\frac{dC_0}{dp}\right|_{p_*} |p - p_*|\, e^{\lambda_0 \tau_p} \equiv \epsilon \qquad [14]$$

where $\epsilon$ is an arbitrary small constant, so that

$$Z(x, \tau_p) \simeq Z_*(x) \pm \epsilon\, Z_0(x) \qquad [15]$$

where $\pm$ is the sign of $p - p_*$, left behind because by definition $\epsilon$ is positive. As $\tau$ increases from $\tau_p$, the growing perturbation becomes nonlinear and the approximation [13] breaks down. Then either a black hole forms (say for the positive sign), or the solution disperses (for the negative sign). We need not follow this nonlinear evolution in detail to find the black hole mass scaling in the former case: dimensional analysis is sufficient. Going back to coordinates $t$ and $r$, we have

$$Z(r, t_p) \simeq Z_*\!\left(\frac{r}{L_p}\right) \pm \epsilon\, Z_0\!\left(\frac{r}{L_p}\right) \qquad [16]$$

where

$$L_p \equiv L\, e^{-\tau_p} \qquad [17]$$

These Cauchy data at $t = t_p$ depend on the initial data at $t = 0$ only through the overall scale $L_p$, and through the sign in front of $\epsilon$. If the field equations themselves are scale invariant, or asymptotically scale invariant at scales $L_p$ and smaller, the black hole mass, which has dimensions of length in gravitational units, must be proportional to the initial data scale $L_p$, the only length scale that is present. Therefore,

$$M \propto L_p \propto (p - p_*)^{1/\lambda_0} \qquad [18]$$

and we have found the critical exponent to be $\gamma = 1/\lambda_0$.

The Analogy with Statistical Mechanics

The existence of a threshold where a qualitative change takes place, universality, scale invariance, and critical exponents suggest that there is a mathematical analogy between type II critical phenomena and critical phase transitions in statistical mechanics.

In equilibrium statistical mechanics, observable macroscopic quantities, such as the magnetization of a ferromagnetic material, are derived as statistical averages over microstates of the system. The expected value of an observable is

$$\langle A \rangle = \sum_{\text{microstates}} A(\text{microstate})\, e^{-H(\text{microstate},\, \mu)} \qquad [19]$$

The Hamiltonian $H$ depends on the parameters $\mu$, which comprise the temperature, parameters characterizing the system such as interaction energies of the constituent molecules, and macroscopic forces such as the external magnetic field. The objective of statistical mechanics is to derive relations between the macroscopic quantities $A$ and parameters $\mu$.

Phase transitions in thermodynamics are thresholds in the space of external forces $\mu$ at which the macroscopic observables $A$, or one of their derivatives, change discontinuously. In a ferromagnetic material at high temperatures, the magnetization $m$ of the material (alignment of atomic spins) is determined by the external magnetic field $B$. At low temperatures, the material shows a spontaneous magnetization even at zero external field, which breaks rotational symmetry. With increasing temperature, the spontaneous magnetization $m$ decreases and vanishes at the Curie temperature $T_*$ as

$$|m| \sim (T_* - T)^{\gamma} \qquad [20]$$

In the presence of a very weak external field, the spontaneous magnetization aligns itself with the external field $B$, while its strength is, to leading order, independent of $B$. The function $m(B, T)$,
therefore, changes discontinuously at $B = 0$. The line $B = 0$ for $T < T_*$ is, therefore, a line of first-order phase transitions between the possible directions of the spontaneous magnetization (in a one-dimensional system, between $m$ up and $m$ down). This line ends at the critical point ($B = 0$, $T = T_*$) where the order parameter $|m|$ vanishes. The role of $B = 0$ as the critical value of $B$ is obscured by the fact that $B = 0$ is singled out by symmetry.

A critical phase transition involves scale-invariant physics. One sign of this is that fluctuations appear on a large range of length scales between the underlying atomic scale and the scale of the sample. In particular, the atomic scale, and any dimensionful parameters associated with that scale, must become irrelevant at the critical point. This can be taken as the starting point for obtaining properties of the system at the critical point.

One first defines a semigroup acting on microstates: the renormalization group. Its action is to group together a small number of particles as a single particle of a fictitious new system, using some averaging procedure. Alternatively, this can also be done in Fourier space. One then defines a dual action of the renormalization group on the space of Hamiltonians by demanding that the partition function is invariant under the renormalization group action:

\[ \sum_{\text{microstates}} e^{-H} = \sum_{\text{microstates}'} e^{-H'} \tag{21} \]

The renormalized Hamiltonian $H'$ is in general more complicated than the original one, but it can be approximated by a fixed expression where only a finite number of parameters $\mu$ are adjusted. Fixed points of the renormalization group correspond to Hamiltonians with the parameters $\mu$ at their critical values. The critical value of any dimensional parameter $\mu$ must be zero (or infinity). Only dimensionless combinations can have nontrivial critical values.

The behavior of thermodynamical quantities at the critical point is in general not trivial to calculate. But the action of the renormalization group on length scales is given by its definition. The blowup of the correlation length $\xi$ at the critical point is, therefore, the easiest critical exponent to calculate.

We make contact with critical phenomena in gravitational collapse by considering the time evolution in coordinates $(\tau, x)$ as a renormalization group action. The calculation of the critical exponent for the black hole mass $M$ is the precise analog of the calculation of the critical exponent for the correlation length $\xi$, substituting $T_* - T$ for $p - p_*$, and taking into account that the $\tau$-evolution in critical collapse is toward smaller scales, while the renormalization group flow goes toward larger scales: therefore, $\xi$ diverges at the critical point, while $M$ vanishes.

We have shown above that the black hole mass is controlled by one global function $P$ on phase space. Clearly, $P$ is the gravity equivalent of $T - T_*$ in the ferromagnet. But it is tempting to speculate (Gundlach 2002) that there is also a gravity equivalent of the external magnetic field $B$, which gives rise to a second independent critical exponent. At least in some situations, the angular momentum of the initial data can play this role. Note that, like $B$, angular momentum is a vector, with a critical value that is zero because all other values break rotational symmetry. Furthermore, the final black hole can have nonvanishing angular momentum, which must depend on the angular momentum of the initial data. The former is analogous to the magnetization $m$, the latter to the external field $B$. It can be shown that this analogy holds perturbatively for small angular momentum. Future numerical simulations will show if it goes further.

Universality and Cosmic Censorship

Critical phenomena in gravitational collapse first generated interest because a complicated self-similar structure and dimensionless numbers $\gamma$ and $\Delta$ arise from generic initial data evolved by quite simple field equations. Another point of interest is the rather detailed analogy of phenomena in a deterministic field theory with critical phase transitions in statistical mechanics. But critical phenomena are important for general relativity mostly for a different reason.

Black holes are among the most important solutions of general relativity because of their universality: the black hole uniqueness theorems state that stable black holes are completely determined by their mass, angular momentum, and electric charge (the Kerr-Newman family of black holes). Perturbation theory shows that any perturbations of black holes from the Kerr-Newman solutions must be radiated away.

Critical solutions have a similar importance because they are generic intermediate states of the evolution that are also independent of the initial data. An important distinction is that critical solutions depend on the matter model, and are therefore less universal than black holes. However, critical phenomena in gravitational collapse seem to arise in axisymmetric vacuum spacetimes, and so are apparently not linked to the

presence of matter. Furthermore, they also arise in perfect-fluid matter with the equation of state $p = \rho/3$, which is that of an ultrarelativistic gas. This is a good approximation for matter at very high density, such as in the big bang. This is important because critical phenomena probe arbitrarily large matter densities or spacetime curvatures as the initial data are fine-tuned to the black hole threshold. At even higher densities, presumably on the Planck scale, scale invariance is again broken by quantum-gravity effects, and so critical phenomena will end there.

The cosmic censorship conjecture states that naked singularities do not arise from suitably generic initial data for suitably well-behaved matter. Critical phenomena in gravitational collapse have forced a tightening of this conjecture. Type II (self-similar) critical solutions contain a naked singularity, that is, a point of infinite spacetime curvature from which information can reach a distant observer. (By contrast, the singularity inside a black hole is hidden from distant observers.) On a kinematical level, this could be seen already from the form [10] of the metric. Because the critical solution is the end state for all initial data that are exactly on the black hole threshold, all initial data on the black hole threshold form a naked singularity. As type II critical phenomena appear to be generic at least in spherical symmetry, this means that in generic self-gravitating systems, the space of regular initial data that form naked singularities is larger than expected, namely of codimension 1. Excluding naked singularities from generic initial data may be the sharpest version of cosmic censorship one can now hope to prove.

Another point of interest in critical collapse is that it allows one to make a small region of arbitrarily high curvature from finite-curvature initial data. This may be a route for probing quantum-gravity effects. Similarly, one can make black holes that are much smaller than any length scale present in the initial data or the matter equation of state. An application has been suggested for this in cosmology, where primordial black holes could have masses much smaller than the Hubble scale at which they are created, rather than of the order of this scale.

Outlook

Critical phenomena in gravitational collapse are now well understood in spherical symmetry, both theoretically and in numerical simulations. In some matter models, the phenomenology is quite complicated, but it still fits into the basic picture outlined here.

The crucial question as to what happens beyond spherical symmetry remains largely unanswered at the time of writing. Perturbation theory around spherical symmetry suggests that critical phenomena are not restricted to exactly spherical situations. This is also supported by simulations in axisymmetric (highly nonspherical) vacuum gravity. Other simulations of nonspherical gravitational collapse which cover the necessary range of spacetime scales required to see critical phenomena are only just becoming available, and the results are not yet clear-cut. For collapse with angular momentum, no high-resolution calculations have yet been carried out. As the necessary techniques become available, one should be prepared for numerical simulations to make dramatic extensions or corrections to the picture of critical collapse drawn up here.

See also: Computational Methods in General Relativity: The Theory; Spacetime Topology, Causal Structure and Singularities; Stability of Minkowski Space; Stationary Black Holes.

Further Reading

Abrahams AM and Evans CR (1993) Critical behavior and scaling in vacuum axisymmetric gravitational collapse. Physical Review Letters 70: 2980-2983.
Choptuik MW (1993) Universality and scaling in gravitational collapse of a massless scalar field. Physical Review Letters 70: 9-12.
Choptuik MW (1999) Critical behavior in gravitational collapse. Progress of Theoretical Physics 136 (suppl.): 353-365.
Evans CR and Coleman JS (1994) Critical phenomena and self-similarity in the gravitational collapse of radiation fluid. Physical Review Letters 72: 1782-1785.
Gundlach C (1999) Living Reviews in Relativity 2: 4 (published electronically at http://www.livingreviews.org).
Gundlach C (2002) Critical gravitational collapse with angular momentum: from critical exponents to universal scaling functions. Physical Review D 65: 064019.

Current Algebra
G A Goldin, Rutgers University, Piscataway, NJ, USA
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Certain commutation relations among the current density operators in quantum field theories define an infinite-dimensional Lie algebra. The original current algebra of Gell-Mann described weak and electromagnetic currents of the strongly interacting particles (hadrons), leading to the Adler-Weisberger formula and other important physical results. This helped inspire mathematical and quantum-theoretic developments such as the Sugawara model, light cone currents, Virasoro algebra, the mathematical theory of affine Kac-Moody algebras, and nonrelativistic current algebra in quantum and statistical physics. Lie algebras of local currents may be the infinitesimal representations of loop groups, local current groups or gauge groups, diffeomorphism groups, and their semidirect products or other extensions. Broadly construed, current algebra thus leads directly into the representation theory of infinite-dimensional groups and algebras. Applications have ranged across conformally invariant field theory, vertex operator algebras, exactly solvable lattice and continuum models in statistical physics, exotic particle statistics and q-commutation relations, hydrodynamics and quantized vortex motion. This brief survey describes but a few highlights.

Relativistic Local Current Algebra for Hadrons

To model superfluidity, Landau had proposed in 1941 a quantum hydrodynamics fundamentally based on local fluid densities and currents as (operator) dynamical variables. However, current algebra came into its own in theoretical physics with the ideas of Gell-Mann in the early 1960s. The basic concept, in the era just preceding quantum chromodynamics (QCD), was that even without knowing the Lagrangian governing hadron dynamics in detail, exact kinematical information (the local symmetry) could still be encoded in an algebra of currents. The local (vector and axial vector) current density operators, expressed where possible in terms of underlying quantized field operators in Hilbert space, were to form two octets of Lorentz 4-vectors, with each octet corresponding to the eight generators of the compact Lie group SU(3).

More specifically (Adler and Dashen 1968), let $F^\mu_a(x)$, $a = 1, 2, \ldots, 8$, $\mu = 0, 1, 2, 3$, be an octet of hadronic vector currents, where as usual $x = (x^\mu) = (x^0, \mathbf{x})$ denotes a point in four-dimensional spacetime. Likewise, introduce an axial vector octet $F^{5\mu}_a(x)$. Unless otherwise specified, we use natural units, where $\hbar = 1$ and $c = 1$. Define the corresponding charges $F_a$ and $F^5_a$ to be the space integrals of the time components of these currents, that is,

\[
\begin{aligned}
F_a(x^0) &= \int d^3x\, F^0_a(x^0, \mathbf{x}) \\
F^5_a(x^0) &= \int d^3x\, F^{50}_a(x^0, \mathbf{x})
\end{aligned}
\tag{1}
\]

where $d^3x = dx^1\, dx^2\, dx^3$. Then $F_1$, $F_2$, $F_3$ are the three components $I_1$, $I_2$, $I_3$ of the isotopic spin, and $Y = (2\sqrt{3}/3)F_8$ is the hypercharge. The usual electromagnetic current $J^\mu_{\mathrm{em}}(x^0, \mathbf{x})$ is given by

\[ J^\mu_{\mathrm{em}} = q\left( F^\mu_3 + \frac{\sqrt{3}}{3} F^\mu_8 \right) \tag{2} \]

where $q$ is the unit elementary charge, and the total charge is given by $Q = \int d^3x\, J^0_{\mathrm{em}}(x^0, \mathbf{x}) = q(I_3 + Y/2)$. The hadronic part of the weak current entering an effective Lagrangian can be written as

\[
\begin{aligned}
J^\mu_w ={}& \left[ \left(F^\mu_1 - F^{5\mu}_1\right) + i\left(F^\mu_2 - F^{5\mu}_2\right) \right] \cos\theta_C \\
&+ \left[ \left(F^\mu_4 - F^{5\mu}_4\right) + i\left(F^\mu_5 - F^{5\mu}_5\right) \right] \sin\theta_C
\end{aligned}
\tag{3}
\]

where $\theta_C$ is the Cabibbo angle (determined experimentally to be approximately 0.27 rad). The terms with $F^\mu_1 - F^{5\mu}_1$ and $F^\mu_2 - F^{5\mu}_2$ are strangeness conserving, those with $F^\mu_4 - F^{5\mu}_4$ and $F^\mu_5 - F^{5\mu}_5$ are not.

The main current algebra hypothesis is that the time components $F^0_a$ and $F^{50}_a$ of these octets satisfy the equal-time commutation relations:

\[
\begin{aligned}
\left[F^0_a(x^0, \mathbf{x}),\, F^0_b(y^0, \mathbf{y})\right]_{x^0 = y^0} &= i\,\delta^{(3)}(\mathbf{x} - \mathbf{y}) \sum_d c_{abd}\, F^0_d(x^0, \mathbf{x}) \\
\left[F^0_a(x^0, \mathbf{x}),\, F^{50}_b(y^0, \mathbf{y})\right]_{x^0 = y^0} &= i\,\delta^{(3)}(\mathbf{x} - \mathbf{y}) \sum_d c_{abd}\, F^{50}_d(x^0, \mathbf{x}) \\
\left[F^{50}_a(x^0, \mathbf{x}),\, F^{50}_b(y^0, \mathbf{y})\right]_{x^0 = y^0} &= i\,\delta^{(3)}(\mathbf{x} - \mathbf{y}) \sum_d c_{abd}\, F^0_d(x^0, \mathbf{x})
\end{aligned}
\tag{4}
\]

where the $c_{abd}$ are structure constants of the Lie algebra of SU(3), antisymmetric in the indices. Since current commutators relate bilinear expressions to

linear ones, they fix the normalizations of the currents. The chiral currents $F^{L\mu}_a = \tfrac{1}{2}(F^\mu_a - F^{5\mu}_a)$ and $F^{R\mu}_a = \tfrac{1}{2}(F^\mu_a + F^{5\mu}_a)$ commute with each other, so that the local current algebra decomposes into two independent pieces.

The Dirac $\delta$-functions in eqns [4] require that $F^0_a$ and $F^{50}_a$ be interpreted as (unbounded) operator-valued distributions; while the fixed-time condition suggests these should make mathematical sense as three-dimensional distributions, with $x^0$ held constant. Such distributions may be modeled on the test-function space $\mathcal{D}$ of real-valued, compactly supported, $C^\infty$ functions on the spacelike hyperplane $\mathbb{R}^3$. For functions $f_a, f^5_a \in \mathcal{D}$, one has formally the smeared currents that are expected to be bona fide (unbounded) operators in Hilbert space; suppressing $x^0$, write

\[
\begin{aligned}
F^0_a(f_a) &= \int_{\mathbb{R}^3} d^3x\, f_a(\mathbf{x})\, F^0_a(x^0, \mathbf{x}) \\
F^{50}_a(f^5_a) &= \int_{\mathbb{R}^3} d^3x\, f^5_a(\mathbf{x})\, F^{50}_a(x^0, \mathbf{x})
\end{aligned}
\tag{5}
\]

Equations [4] then become

\[
\begin{aligned}
\left[F^0_a(f_a),\, F^0_b(f_b)\right] = \left[F^{50}_a(f_a),\, F^{50}_b(f_b)\right] &= i \sum_d F^0_d(c_{abd}\, f_a f_b) \\
\left[F^0_a(f_a),\, F^{50}_b(f_b)\right] &= i \sum_d F^{50}_d(c_{abd}\, f_a f_b)
\end{aligned}
\tag{6}
\]

Let $g(\mathbf{x})$ be a $C^\infty$ map from $\mathbb{R}^3$ to the Lie algebra $\mathcal{G}$ of chiral SU(3) $\times$ SU(3), equal to zero outside a compact set. The set of all such $\mathcal{G}$-valued functions forms an infinite-dimensional Lie algebra under the pointwise bracket, $[g, g'](\mathbf{x}) = [g(\mathbf{x}), g'(\mathbf{x})]$. Let us call this Lie algebra $\mathrm{map}_0(\mathbb{R}^3, \mathcal{G})$, where the subscript 0 indicates the condition of compact support when that is applicable (on compact manifolds, we omit the subscript). Expanding $g(\mathbf{x})$ with respect to a fixed basis of $\mathcal{G}$, we straightforwardly identify the map $g$ with the two octets of test functions $f_a$ and $f^5_a$. Then, defining $F(g) = \sum_a F^0_a(f_a) + \sum_a F^{50}_a(f^5_a)$, eqns [6] are interpreted (for fixed $x^0$) as a representation $F$ of $\mathrm{map}_0(\mathbb{R}^3, \mathcal{G})$.

Integrating out the spatial variables entirely using eqns [1] leads to a representation at $x^0$ of $\mathcal{G}$ by the charges $F_a$ and $F^5_a$. The Adler-Weisberger sum rule was first derived (in 1965) from the commutation relations of these charges, together with the assumption of a partially conserved axial-vector current (PCAC). It connected nucleon $\beta$-decay coupling with pion-nucleon scattering cross sections, agreeing well with experiment. Various low-energy theorems followed, also in accord with experiment. Shortly thereafter, Adler was able to eliminate the PCAC assumption, and derived a further sum rule going beyond an experimental test of the algebra of charges to test the actual local current algebra. Here, the prediction pertained to structure functions in the deep inelastic scattering of neutrinos. This was elaborated by Bjorken to inelastic electron scattering. On the theoretical side, the study of the chiral current in perturbation theory led into the theory of anomalies. All these ideas were highly influential in subsequent theoretical work (Treiman et al. 1985, Mickelsson 1989).

It is a natural idea to try to extend eqns [4] or [6], which elegantly express the combined ideas of locality and symmetry, to an equal-time commutator algebra that would also include the space components of the local currents $F^k_a$, $k = 1, 2, 3$. One may write without difficulty the commutators of the charges in [1] with these space components:

\[
\begin{aligned}
\left[F_a(x^0),\, F^k_b(x^0, \mathbf{x})\right] = \left[F^5_a(x^0),\, F^{5k}_b(x^0, \mathbf{x})\right] &= i \sum_d c_{abd}\, F^k_d(x^0, \mathbf{x}) \\
\left[F_a(x^0),\, F^{5k}_b(x^0, \mathbf{x})\right] = \left[F^5_a(x^0),\, F^k_b(x^0, \mathbf{x})\right] &= i \sum_d c_{abd}\, F^{5k}_d(x^0, \mathbf{x})
\end{aligned}
\tag{7}
\]

But the commutator of the local time component with the local space component of the current cannot be merely the obvious extrapolation from eqns [4] and [7], that is, it cannot be

\[ \left[F^0_a(x^0, \mathbf{x}),\, F^k_b(y^0, \mathbf{y})\right]_{x^0 = y^0} = i\,\delta^{(3)}(\mathbf{x} - \mathbf{y}) \sum_d c_{abd}\, F^k_d(x^0, \mathbf{x}) \]

and so forth. Under very general conditions, for a relativistic theory based on local quantum fields or local observables, additional Schwinger terms are required on the right-hand sides of such commutators (Renner 1968).

Well-known difficulties in specifying the Schwinger terms are associated with the fact that operator-valued distributions are singular when regarded as if they were functions of spacetime points. Thus, the product of two distributions at a point is often singular or undefined. When the currents forming a local current algebra are written as normal-ordered products of field operator distributions and their derivatives, the Schwinger terms in their commutation relations may be calculated, for example, by splitting points in the arguments of the underlying fields, and subsequently letting the separation tend toward zero. The general form of a Schwinger term typically involves the derivative of a $\delta$-function times an operator. This may be a multiple of the identity (i.e., a c-number) or not, depending on the underlying

field-theoretic model. Furthermore, when the number of spacetime dimensions is greater than 1 + 1, the c-number Schwinger terms turn out to be infinite. Hence, we do not obtain this way a bona fide infinite-dimensional, equal-time commutator algebra comprising all the components of the local currents.

Sugawara, Kac-Moody, and Virasoro Algebras

Since equations such as [4] and [6] are not explicitly dependent on how the currents are constructed from underlying canonical fields, one has the possibility of writing a theory entirely in terms of self-adjoint currents as the dynamical variables, bypassing the field operators entirely, and expressing a Hamiltonian operator directly in terms of such local currents. This is in the spirit of approaches to quantum field theory based on local algebras of observables. It suggests consideration of relativistic current algebras with finite c-number or operator Schwinger terms in $s + 1$ dimensions, $s \geq 1$. The Sugawara model, which is of this type, turned out to be one of the most influential of those proposed in the late 1960s and early 1970s.

Henceforth, let $G$ be a compact Lie group, and $\mathcal{G}$ its Lie algebra; let $F_a$, $a = 1, \ldots, \dim \mathcal{G}$, be a basis for $\mathcal{G}$, with $[F_a, F_b] = i \sum_d c_{abd} F_d$. The Sugawara current algebra, at the fixed time $x^0 = y^0$ (which, from here on, we suppress in the notation), is given by

\[
\begin{aligned}
\left[J^0_a(\mathbf{x}),\, J^0_b(\mathbf{y})\right] &= i\,\delta^{(3)}(\mathbf{x} - \mathbf{y}) \sum_d c_{abd}\, J^0_d(\mathbf{x}) \\
\left[J^0_a(\mathbf{x}),\, J^k_b(\mathbf{y})\right] &= i\,\delta^{(3)}(\mathbf{x} - \mathbf{y}) \sum_d c_{abd}\, J^k_d(\mathbf{x}) + i c\, \delta_{ab}\, \frac{\partial}{\partial x^k}\, \delta^{(3)}(\mathbf{x} - \mathbf{y})\, I \\
\left[J^k_a(\mathbf{x}),\, J^\ell_b(\mathbf{y})\right] &= 0 \qquad (k, \ell = 1, 2, 3)
\end{aligned}
\tag{8}
\]

where $J_a = (J^0_a, J^k_a)$, $k = 1, 2, 3$, is again a 4-vector, $c$ is a finite constant, and $I$ is the identity operator. The time components in eqns [8] behave like the local currents in eqns [4]. The Schwinger term is a c-number, while setting the commutators of the space components to zero is the simplest choice consistent with the Jacobi identity. The Sugawara Hamiltonian is given in terms of the local currents by the formal expression:

\[ H = \frac{1}{2c} \sum_a \int_{\mathbb{R}^3} d^3x \left[ J^0_a(\mathbf{x})^2 + \sum_{k=1}^{3} J^k_a(\mathbf{x})^2 \right] \tag{9} \]

where the pointwise products of the currents require interpretation in the particular representation. This Hamiltonian leads to current conservation equations for the $J_a$.

Related to the Sugawara current algebra, with $s = 1$ and the spatial dimension compactified, are affine Kac-Moody and Virasoro algebras (Goddard and Olive 1986, Kac 1990). Consider the infinite-dimensional Lie algebra $\mathrm{map}(S^1, \mathcal{G})$ of smooth functions from the circle to $\mathcal{G}$ under the pointwise bracket. This is also called a loop algebra. Referring to the basis $F_a$, define $T_a^{(m)}$ for integer $m$ to be the Fourier function $\theta \to F_a \exp[-im\theta]$. The pointwise bracket in $\mathrm{map}(S^1, \mathcal{G})$ gives $[T_a^{(m)}, T_b^{(n)}] = i \sum_d c_{abd} T_d^{(m+n)}$ for these generators. The corresponding (untwisted) affine Kac-Moody algebra is a (uniquely defined, nontrivial) one-dimensional central extension of this loop algebra; that is, the new generator commutes with all elements of the Lie algebra and, in an irreducible representation, must be a multiple of the identity. In such a representation, the new bracket can be written as

\[ \left[T_a^{(m)},\, T_b^{(n)}\right] = i \sum_d c_{abd}\, T_d^{(m+n)} + k\, m\, \delta_{ab}\, \delta_{m,-n}\, I \tag{10} \]

where $k$ is a constant. Here, $T_a^{(m=0)}$ is again a representation of $\mathcal{G}$. Self-adjointness of the local currents in the representation imposes the condition $T_a^{(m)*} = T_a^{(-m)}$.

Now the compactly supported $C^\infty$ (tangent) vector fields on a $C^\infty$ manifold $M$ form a natural Lie algebra under the Lie bracket, denoted by $\mathrm{vect}_0(M)$. In local Euclidean coordinates, for $\mathbf{g}_1, \mathbf{g}_2 \in \mathrm{vect}_0(M)$, one can write this bracket as

\[ [\mathbf{g}_1, \mathbf{g}_2] = \mathbf{g}_1 \cdot \nabla \mathbf{g}_2 - \mathbf{g}_2 \cdot \nabla \mathbf{g}_1 \tag{11} \]

As the affine Kac-Moody algebras are central extensions of the algebra of $\mathcal{G}$-valued functions on $S^1$, so are Virasoro algebras central extensions of the algebra of vector fields on $S^1$. Let $L^{(m)}$ denote the (complexified) vector field described by $\exp[-im\theta](1/i)\,\partial/\partial\theta$, for integer $m$. These generators then satisfy $[L^{(m)}, L^{(n)}] = (m - n)L^{(m+n)}$. Adjoining to the Lie algebra of vector fields a new central element (commuting with all the $L^{(m)}$), the Virasoro bracket in an irreducible representation is given by the formula

\[ \left[L^{(m)},\, L^{(n)}\right] = (m - n)\, L^{(m+n)} + c\, \frac{(m+1)\,m\,(m-1)}{12}\, \delta_{m,-n}\, I \tag{12} \]

where the numerical coefficient $c$ is called the Virasoro central charge; self-adjointness of the currents imposes $L^{(m)*} = L^{(-m)}$. It is straightforward to verify that eqn [12] satisfies the Jacobi identity. The special form of the central term in the Virasoro current algebra results from the Gelfand-Fuks cohomology on the algebra of vector fields.
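The Jacobi identity claimed for the Virasoro bracket in eqn [12] can be spot-checked by machine. The following is a minimal sketch, not a proof: it tests the identity on a finite window of mode numbers using exact rational arithmetic. The helper names (`virasoro_central`, `bracket`, `jacobi_defect`, `jacobi_holds`) and the dictionary representation of algebra elements are illustrative assumptions, not part of the article.

```python
from fractions import Fraction
from itertools import product


def virasoro_central(m, c):
    """Coefficient of I in [L^(m), L^(-m)] as written in eqn [12]."""
    return Fraction(c) * (m + 1) * m * (m - 1) / 12


def bracket(x, y, c, central=virasoro_central):
    """Bracket of two elements stored as {mode or "I": coefficient}.

    Integer keys label the generators L^(m); the key "I" labels the
    central element, which commutes with everything.
    """
    out = {}
    for (m, xm), (n, yn) in product(x.items(), y.items()):
        if m == "I" or n == "I":
            continue  # central element contributes nothing
        coeff = xm * yn
        out[m + n] = out.get(m + n, 0) + coeff * (m - n)
        if m + n == 0:
            out["I"] = out.get("I", 0) + coeff * central(m, c)
    return {k: v for k, v in out.items() if v != 0}


def L(m):
    """Basis generator L^(m) with unit coefficient."""
    return {m: Fraction(1)}


def jacobi_defect(a, b, d, c, central=virasoro_central):
    """Sum of the three cyclic double brackets; empty dict means zero."""
    total = {}
    for x, y, z in ((a, b, d), (b, d, a), (d, a, b)):
        inner = bracket(L(y), L(z), c, central)
        for k, v in bracket(L(x), inner, c, central).items():
            total[k] = total.get(k, 0) + v
    return {k: v for k, v in total.items() if v != 0}


def jacobi_holds(c, max_mode=6, central=virasoro_central):
    """Check the Jacobi identity for all mode triples in a finite window."""
    modes = range(-max_mode, max_mode + 1)
    return all(
        not jacobi_defect(a, b, d, c, central)
        for a, b, d in product(modes, repeat=3)
    )


if __name__ == "__main__":
    print(jacobi_holds(Fraction(1, 2)))
    # A quadratic central term (hypothetical, for contrast) fails the check.
    print(jacobi_holds(1, central=lambda m, c: Fraction(c) * m * m))
```

The contrast case at the end is there only to show that the check is not vacuous: replacing the cubic central term by a quadratic one leaves a nonzero multiple of I in the cyclic sum for some triples, for example modes (1, 2, -3).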

The Kac-Moody and Virasoro algebras, both modeled on $S^1$, may be combined to form a natural semidirect sum of Lie algebras, with the additional bracket

\[ \left[T_a^{(m)},\, L^{(n)}\right] = m\, T_a^{(m+n)} \tag{13} \]

Roughly speaking, the Kac-Moody generators correspond to Fourier transforms of charge densities on $S^1$, whereas the Virasoro generators correspond to Fourier transforms of infinitesimal motions in $S^1$. The central extensions provide the finite, c-number Schwinger terms. These structures have important application to light cone current algebra, conformally invariant quantum field theories in (1 + 1)-dimensional spacetime, the quantum theory of strings, exactly solvable models in statistical mechanics, and many other domains.

Of greatest physical importance, both in quantum field theory and statistical mechanics, are those irreducible, self-adjoint representations of the Virasoro algebra known as highest weight representations, where the spectrum of the operator $L^{(m=0)}$ is bounded below. In these applications, one represents a pair of Virasoro algebras by mutually commuting sets of operators $L^{(m)}$ and $\bar{L}^{(m)}$. In the quantum theory, for example, one takes the total energy $H \propto L^{(0)} + \bar{L}^{(0)}$, and the total momentum $P \propto L^{(0)} - \bar{L}^{(0)}$. In a highest weight representation, there is a unique eigenstate of $L^{(0)}$ having the lowest eigenvalue $h$; for this "vacuum" $|h\rangle$, $L^{(m)}|h\rangle = 0$, $m > 0$.

Friedan, Qiu, and Shenker showed in 1984 that highest weight representations are characterized by a class of specific, non-negative values of the central charge $c$ and, correspondingly, of $h$: either $c \geq 1$ (and $h \geq 0$) or $c = 1 - 6(\ell + 2)^{-1}(\ell + 3)^{-1}$, $\ell = 1, 2, 3, \ldots$ (and $h$ assumes a corresponding, specified set of values for each value of $\ell$). In a beautiful application to the study of the critical behavior of well-known statistical systems, in which the generator of dilations is proportional to $L^{(0)} + \bar{L}^{(0)}$, they discovered a direct correspondence with permitted values of the central charge; thus, $c = 1/2$ for the Ising model, $c = 7/10$ for the tricritical Ising model, $c = 4/5$ for the three-state Potts model, and $c = 6/7$ for the tricritical three-state Potts model.

Current Algebras and Groups

Local current algebras may be exponentiated to obtain corresponding infinite-dimensional topological groups (Pressley and Segal 1986, Mickelsson 1989, Kac 1990). Let $G$ be a Lie group whose Lie algebra is $\mathcal{G}$. The algebra $\mathrm{map}_0(M, \mathcal{G})$, consisting of smooth, compactly supported $\mathcal{G}$-valued functions on $M$ under the pointwise bracket, exponentiates to the local current group $\mathrm{Map}_0(M, G)$, consisting of smooth maps from $M$ to $G$ that are the identity outside a compact set in $M$, under the pointwise group operation. When $M$ is taken to be the four-dimensional spacetime manifold (rather than a spacelike hyperplane), the local current group modeled on $M$ is mathematically a gauge group for nonabelian gauge field theory.

Likewise, the algebra $\mathrm{vect}_0(M)$ exponentiates to the group $\mathrm{Diff}_0(M)$ of compactly supported $C^\infty$ diffeomorphisms of $M$ (under composition). The Kac-Moody and Virasoro algebras exponentiate to central extensions of the loop group $\mathrm{Map}(S^1, G)$ and the diffeomorphism group $\mathrm{Diff}(S^1)$, respectively. The semidirect sums of the Lie algebras are the infinitesimal generators of semidirect products of the groups.

Under appropriate technical conditions, self-adjoint representations of current algebras generate (and may be obtained from) continuous unitary representations of the corresponding groups. The needed technical conditions have to do with the existence of a dense set of analytic vectors belonging to a common, dense invariant domain of essential self-adjointness for the currents.

Nonrelativistic Current Algebra

In nonrelativistic local current algebra, Schwinger terms do not appear. In 1968, Dashen and Sharp defined (at fixed time $t$, suppressed in the present notation) a mass density $\rho(\mathbf{x}) = m\,\psi^*(\mathbf{x})\psi(\mathbf{x})$ and a momentum density $\mathbf{J}(\mathbf{x}) = (\hbar/2i)\{\psi^*(\mathbf{x})\nabla\psi(\mathbf{x}) - [\nabla\psi^*(\mathbf{x})]\psi(\mathbf{x})\}$, where $\psi$ is a second-quantized canonical field; here we keep $\hbar$ in the notation. The resulting equal-time algebra is the semidirect sum:

\[
\begin{aligned}
\left[\rho(\mathbf{x}),\, \rho(\mathbf{y})\right] &= 0 \\
\left[\rho(\mathbf{x}),\, J_k(\mathbf{y})\right] &= -i\hbar\, \frac{\partial}{\partial x^k} \left[ \delta^{(3)}(\mathbf{x} - \mathbf{y})\, \rho(\mathbf{x}) \right] \\
\left[J_k(\mathbf{x}),\, J_\ell(\mathbf{y})\right] &= i\hbar \left\{ \frac{\partial}{\partial y^k} \left[ \delta^{(3)}(\mathbf{x} - \mathbf{y})\, J_\ell(\mathbf{y}) \right] - \frac{\partial}{\partial x^\ell} \left[ \delta^{(3)}(\mathbf{x} - \mathbf{y})\, J_k(\mathbf{x}) \right] \right\}
\end{aligned}
\tag{14}
\]

Since this current algebra is independent of whether $\psi$ obeys commutation or anticommutation relations, the information as to particle statistics (Bose or Fermi) is not encoded in the Lie algebra itself but in the choice of its representation (up to unitary equivalence). Again interpreting $\rho$ and $J_k$ as operator-valued distributions, define $\rho(f) = \int_{\mathbb{R}^3} d^3x\, f(\mathbf{x})\rho(\mathbf{x})$ and $J(\mathbf{g}) = \int_{\mathbb{R}^3} d^3x \sum_{k=1}^{3} g_k(\mathbf{x}) J_k(\mathbf{x})$, where $f$ and the

components $g_k$ of the vector field $\mathbf{g}$ belong to the function-space $\mathcal{D}$. Then the Lie algebra becomes

\[
\begin{aligned}
\left[\rho(f_1),\, \rho(f_2)\right] &= 0 \\
\left[\rho(f),\, J(\mathbf{g})\right] &= i\hbar\, \rho(\mathbf{g} \cdot \nabla f) \\
\left[J(\mathbf{g}_1),\, J(\mathbf{g}_2)\right] &= -i\hbar\, J([\mathbf{g}_1, \mathbf{g}_2])
\end{aligned}
\tag{15}
\]

Equations [15] are a representation by self-adjoint operators of the semidirect sum of the abelian Lie algebra $\mathcal{D}$ with $\mathrm{vect}_0(\mathbb{R}^3)$. The corresponding group is the natural semidirect product of the space $\mathcal{D}$ (regarded as an abelian topological group under addition) with $\mathrm{Diff}_0(\mathbb{R}^3)$.

The construction generalizes to a general manifold $M$ or manifold with boundary (in place of $\mathbb{R}^3$), and to a general set of charge densities that generate the local Lie algebra $\mathrm{map}_0(M, \mathcal{G})$. When $M = S^1$, we have the Kac-Moody and Virasoro algebras with central charge zero. However, $L^{(0)}$ in the nonrelativistic (1 + 1)-dimensional quantum theories is proportional to the total momentum $P$, and thus is unbounded above and below.

The continuous unitary representations of $\mathrm{Diff}_0(M)$, or its semidirect product with a local current group at fixed time, thus describe nonrelativistic quantum systems (Albeverio et al. 1999, Goldin 2004). The unitary representation $V(\phi)$, $\phi \in \mathrm{Diff}_0(M)$, satisfies $V(\phi^{\mathbf{g}}_r) = \exp[i(r/\hbar)J(\mathbf{g})]$, where $r \in \mathbb{R}$ and $\phi^{\mathbf{g}}_r$ is the one-parameter flow in $\mathrm{Diff}_0(M)$ generated by the vector field $\mathbf{g}$. Such a representation may be described very generally by means of a measure $\mu$ on a configuration space $\Delta$, quasi-invariant under a group action of $\mathrm{Diff}_0(M)$ on $\Delta$, together with a unitary 1-cocycle $\chi$ on $\mathrm{Diff}_0(M) \times \Delta$. The Hilbert space for the representation is $\mathcal{H} = L^2_{d\mu}(\Delta, W)$, which is the space of measurable functions $\Psi(\gamma)$, $\gamma \in \Delta$, taking values in an inner product space $W$, and square integrable with respect to $\mu$. The unitary representation $V$ is given by

\[ [V(\phi)\Psi](\gamma) = \chi_\phi(\gamma)\, \Psi(\phi\gamma)\, \sqrt{\frac{d\mu_\phi}{d\mu}(\gamma)} \tag{16} \]

where $\phi\gamma$ denotes the group action $\mathrm{Diff}_0(M) \times \Delta \to \Delta$; $\mu_\phi$ is the measure on $\Delta$ transformed by $\phi$ (which, by the quasi-invariance of $\mu$, is absolutely continuous with respect to $\mu$); $d\mu_\phi/d\mu$ is the Radon-Nikodym derivative; and $\chi_\phi(\gamma) : W \to W$ is a system of unitary operators in $W$ obeying the cocycle equation

\[ \chi_{\phi_1 \phi_2}(\gamma) = \chi_{\phi_1}(\gamma)\, \chi_{\phi_2}(\phi_1 \gamma) \tag{17} \]

Equations [16] and [17] hold outside sets of $\mu$-measure zero in $\Delta$. Given the quasi-invariant measure $\mu$ on $\Delta$, one may always choose $W = \mathbb{C}$ and $\chi_\phi(\gamma) \equiv 1$ to obtain a unitary group representation on complex-valued wave functions; but inequivalent cocycles describe unitarily inequivalent representations.

The configuration space $\Delta^{(N)}$, $N = 1, 2, 3, \ldots$, consists of $N$-point subsets of $\mathbb{R}^s$, and $\mu^{(N)}$ is the (local) Lebesgue measure on $\Delta^{(N)}$. The corresponding diffeomorphism group and local current algebra representations describe $N$ identical quantum particles in $s$-dimensional space. When $\chi \equiv 1$, we have bosonic exchange symmetry. Inequivalent cocycles on $\Delta^{(N)}$ are obtained (for $s \geq 2$) by inducing (generalizing Mackey's method) from inequivalent unitary representations of the fundamental group $\pi_1[\Delta^{(N)}]$. For $s \geq 3$, this fundamental group is the symmetric group $S_N$ of particle permutations; the odd representation of $S_N$, $N \geq 2$, gives fermionic exchange symmetry, while the higher-dimensional representations are associated with particles satisfying the parastatistics of Greenberg and Messiah. When $s = 2$, however, $\pi_1[\Delta^{(N)}]$ is the braid group $B_N$. Goldin, Menikoff, and Sharp obtained induced representations of the current algebra describing the intermediate statistics proposed by Leinaas and Myrheim for identical particles in 2-space. Such excitations, subsequently termed "anyons" by Wilczek and characterized as charge-flux tube composites, are important constructs in the theory of surface phenomena such as the quantum Hall effect, and anyonic statistics has also been applied to the study of high-$T_c$ superconductivity. Current algebra representations induced by higher-dimensional representations of $B_N$ describe the statistics of "plektons." Similarly, current algebra in nonsimply connected space describes the Aharonov-Bohm effect.

Let $\psi^*(h) = \int_{\mathbb{R}^s} d^s x\, h(\mathbf{x})\, \psi^*(\mathbf{x})$ denote the smeared creation field. Let the indexed set of representations $\rho_N$, $J_N$, $N = 0, 1, 2, \ldots$, satisfying the current algebra [15], act in Hilbert spaces $\mathcal{H}_N$, where $\psi^*(h) : \mathcal{H}_N \to \mathcal{H}_{N+1}$, $\psi(h) : \mathcal{H}_{N+1} \to \mathcal{H}_N$, $\psi(h)|\mathcal{H}_0 = 0$, so that $\psi^*$ and $\psi$ intertwine the $N$-particle diffeomorphism group representations. Let $\rho(f)$ and $J(\mathbf{g})$ act on $\bigoplus_{N=0}^{\infty} \mathcal{H}_N$, so that $\rho(f)\Psi_N = \rho_N(f)\Psi_N$, $J(\mathbf{g})\Psi_N = J_N(\mathbf{g})\Psi_N$. Then conditions for a Fock space hierarchy are specified by commutator brackets of the fields with the currents:

\[
\begin{aligned}
\left[\rho(f),\, \psi^*(h)\right] &= \psi^*(\rho_{N=1}(f)\, h) \\
\left[J(\mathbf{g}),\, \psi^*(h)\right] &= \psi^*(J_{N=1}(\mathbf{g})\, h)
\end{aligned}
\tag{18}
\]

The local creation and annihilation fields for anyons in $\mathbb{R}^2$, obeying [18], satisfy q-commutation relations, where $q$ is the relative phase change associated with a single counterclockwise exchange of two anyons,

and the q-commutator $[A, B]_q = AB - qBA$. These relations generalize the canonical commutation ($q = 1$) and anticommutation ($q = -1$) relations of quantum field theory.

When $\Delta$ is the configuration space of infinite but locally finite subsets of $\mathbb{R}^s$, nonrelativistic current algebra describes the physics of infinite gases in continuum classical or quantum statistical mechanics. Here, the most important kinds of measures $\mu$ are Poisson measures (associated with gases of noninteracting particles at fixed average density) or Gibbsian measures (associated with translation-invariant two-body interactions). These measures describe equilibrium states and correlation functions in the classical case, and specify the current algebra representations in the quantum theory.

The group of volume-preserving diffeomorphisms was taken by Arnold as the symmetry group of an ideal, classical, incompressible fluid, and Marsden and Weinstein described the hydrodynamics of such a fluid using the Lie-Poisson bracket associated with the nonrelativistic current algebra of divergenceless vector fields. The idea of using this algebra to study quantized fluid motion, included in the program proposed by Rasetti and Regge, formed the basis of the subsequent study of quantized vortex structures in superfluids from the point of view of geometric quantization on coadjoint orbits of the diffeomorphism group. This leads to quantum configuration spaces whose elements are no longer sets of points: for example, spaces of vortex filaments in $\mathbb{R}^2$, or ribbons and tubes in $\mathbb{R}^3$.

See also: Algebraic Approach to Quantum Field Theory; Electroweak Theory; Quantum Chromodynamics; Solitons and Kac-Moody Lie Algebras; Symmetries in Quantum Field Theory: Algebraic Aspects; Toda Lattices; Two-Dimensional Conformal Field Theory and Vertex Operator Algebras.

Further Reading

Adler SL and Dashen RF (1968) Current Algebras and Applications to Particle Physics. New York: Benjamin.
Albeverio S, Kondratiev YuG, and Röckner M (1999) Diffeomorphism groups and current algebras: configuration space analysis in quantum theory. Reviews in Mathematical Physics 11: 1-23.
Arnold VI and Khesin BA (1998) Topological Methods in Hydrodynamics. Applied Mathematical Sciences, vol. 125. Berlin: Springer.
Gell-Mann M and Ne'eman Y (2000) The Eightfold Way (1964) (reissued). Cambridge, MA: Perseus Publishing.
Goddard P and Olive D (1986) Kac-Moody and Virasoro algebras in relation to quantum physics. International Journal of Modern Physics A 1: 303-414.
Goldin GA (1996) Quantum vortex configurations. Acta Physica Polonica B 27: 2341-2355.
Goldin GA (2004) Lectures on diffeomorphism groups in quantum physics. In: Govaerts J, Hounkonnou N, and Msezane AZ (eds.) Contemporary Problems in Mathematical Physics: Proceedings of the Third International Workshop, pp. 3-93. Singapore: World Scientific.
Goldin GA and Sharp DH (1991) The diffeomorphism group approach to anyons. International Journal of Modern Physics B 5: 2625-2640.
Ismagilov RS (1996) Representations of Infinite-Dimensional Groups. Translations of Mathematical Monographs, vol. 152. Providence, RI: American Mathematical Society.
Kac V (1990) Infinite Dimensional Lie Algebras. Cambridge: Cambridge University Press.
Marsden J and Weinstein A (1983) Coadjoint orbits, vortices, and Clebsch variables for incompressible fluids. Physica D 7: 305-323.
Mickelsson J (1989) Current Algebras and Groups. New York: Plenum.
Ottesen JT (1995) Infinite Dimensional Groups and Algebras in Quantum Physics. Berlin: Springer.
Pressley A and Segal G (1986) Loop Groups. Oxford: Oxford University Press.
Renner B (1968) Current Algebras and Their Applications. Oxford: Pergamon.
Sharp DH and Wightman AS (eds.) (1974) Local Currents and Their Applications. New York: Elsevier.
Treiman SB, Jackiw R, Zumino B, and Witten E (1985) Current Algebra and Anomalies. Singapore: World Scientific.
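
As an illustrative note on the q-commutator [A, B]_q = AB - qBA used in this article, the following minimal sketch checks numerically that it reduces to the ordinary commutator at q = 1 and to the anticommutator at q = -1. The 2x2 matrices (a single fermionic mode and the Pauli matrices) and all function names are assumptions chosen for illustration; they are not taken from the article and do not model anyon fields, whose q is a general phase.

```python
import cmath


def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def q_commutator(a, b, q):
    """Return [A, B]_q = AB - q*BA for square matrices A and B."""
    ab = matmul(a, b)
    ba = matmul(b, a)
    n = len(a)
    return [[ab[i][j] - q * ba[i][j] for j in range(n)] for i in range(n)]


def close(a, b, tol=1e-12):
    """Entrywise comparison of two square matrices within a tolerance."""
    n = len(a)
    return all(abs(a[i][j] - b[i][j]) <= tol
               for i in range(n) for j in range(n))


# Toy single fermionic mode in the basis (|0>, |1>); illustrative only.
ANNIHILATE = [[0, 1], [0, 0]]
CREATE = [[0, 0], [1, 0]]
IDENTITY = [[1, 0], [0, 1]]

# Pauli matrices, used to exercise the q = 1 (commutator) case.
SIGMA_X = [[0, 1], [1, 0]]
SIGMA_Y = [[0, -1j], [1j, 0]]
SIGMA_Z = [[1, 0], [0, -1]]

if __name__ == "__main__":
    # q = -1: anticommutation relation {a, a*} = 1 for one fermionic mode.
    print(close(q_commutator(ANNIHILATE, CREATE, -1), IDENTITY))
    # q = 1: ordinary commutator [sigma_x, sigma_y] = 2i sigma_z.
    two_i_sigma_z = [[2j * v for v in row] for row in SIGMA_Z]
    print(close(q_commutator(SIGMA_X, SIGMA_Y, 1), two_i_sigma_z))
    # A unit-modulus phase q = exp(i*theta) interpolates between the two cases;
    # theta = pi reproduces the q = -1 result up to rounding.
    q = cmath.exp(1j * cmath.pi)
    print(close(q_commutator(ANNIHILATE, CREATE, q), IDENTITY))
```

Writing q as exp(i*theta) is only a convenient parameterization of a phase for this sketch; the physical value of q for a given anyon species is fixed by the representation, as described in the text.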
