Encyclopedia of Mathematical Physics 1
EDITORS
In bygone centuries, our physical world appeared to be filled to the brim with mysteries. Divine powers
could provide for genuine miracles; water and sunlight could turn arid land into fertile pastures, but the
same powers could lead to miseries and disasters. The force of life, the vis vitalis, was assumed to be the
special agent responsible for all living things. The heavens, whatever they were for, contained stars and other
heavenly bodies that were the exclusive domain of the Gods.
Mathematics did exist, of course. Indeed, there was one aspect of our physical world that was recognised to
be controlled by precise, mathematical logic: the geometric structure of space, elaborated to become a genuine
form of art by the ancient Greeks. From my perspective, the Greeks were the first practitioners of mathematical
physics, when they discovered that all geometric features of space could be reduced to a small number of
axioms. Today, these would be called fundamental laws of physics. The fact that the flow of time could be
addressed with similar exactitude, and that it could be handled geometrically together with space, was only
recognised much later. And, yes, there were a few crazy people who were interested in the magic of numbers,
but the real world around us seemed to contain so much more that was way beyond our capacities of analysis.
Gradually, all this changed. The Moon and the planets appeared to follow geometrical laws. Galilei and
Newton managed to identify their logical rules of motion, and by noting that the concept of mass could be
applied to things in the sky just like apples and cannon balls on Earth, they made the sky a little bit more
accessible to us. Electricity, magnetism, light and sound were also found to behave in complete accordance
with mathematical equations.
Yet all of this was just a beginning. The real changes came with the twentieth century. A completely new
way of thinking, by emphasizing mathematical, logical analysis rather than empirical evidence, was pioneered
by Albert Einstein. Applying advanced mathematical concepts, only known to a few pure mathematicians, to
notions as mundane as space and time, was new to the physicists of his time. Einstein himself had a hard
time struggling through the logic of connections and curvatures, notions that were totally new to him, but are
only too familiar to students of mathematical physics today. Indeed, there is no better testimony of Einstein's
deep insights at that time, than the fact that we now teach these things regularly in our university classrooms.
Special and general relativity are only small corners of the realm of modern physics that is presently being
studied using advanced mathematical methods. We have notoriously complex subjects such as phase transitions in
condensed matter physics, superconductivity, Bose–Einstein condensation, the quantum Hall effect, particularly
the fractional quantum Hall effect, and numerous topics from elementary particle physics, ranging from fibre
bundles and renormalization groups to supergravity, algebraic topology, superstring theory, Calabi–Yau spaces
and what not, all of which require the utmost of our mental skills to comprehend them.
The most bewildering observation that we make today is that our entire physical world
appears to be controlled by mathematical equations, and these are not just sloppy and debatable models, but
precisely documented properties of materials, of systems, and of phenomena in all echelons of our universe.
Does this really apply to our entire world, or only to parts of it? Do features, notions, entities exist that are
emphatically not mathematical? What about intuition, or dreams, and what about consciousness? What
about religion? Here, most of us would say, one should not even try to apply mathematical analysis, although
even here, some brave social scientists are making attempts at coordinating rational approaches.
No, there are clear and important differences between the physical world and the mathematical world.
Where the physical world stands out is the fact that it refers to reality, whatever reality is. Mathematics is
the world of pure logic and pure reasoning. In physics, it is the experimental evidence that ultimately decides
whether a theory is acceptable or not. Also, the methodology in physics is different.
A beautiful example is the serendipitous discovery of superconductivity. In 1911, the Dutch physicist Heike
Kamerlingh Onnes was the first to achieve the liquefaction of helium, for which a temperature below 4.25 K
had to be realized. Heike decided to measure the specific conductivity of mercury, a metal that is frozen solid
at such low temperatures. But something appeared to go wrong during the measurements, since the voltmeter
did not show any voltage at all. All experienced physicists in the team assumed that they were dealing
with a malfunction. It would not have been the first time for a short circuit to occur in the electrical
equipment, but, this time, in spite of several efforts, they failed to locate it. One of the assistants was
responsible for keeping the temperature of the sample well within that of liquid helium, a dull job, requiring
nothing else than continuously watching some dials. During one of the many tests, however, he dozed off.
The temperature rose, and suddenly the measurements showed the normal values again. It then occurred to
the investigators that the effect and its temperature dependence were completely reproducible. Below 4.19
degrees Kelvin the conductivity of mercury appeared to be strictly infinite. Above that temperature, it is
finite, and the transition is a very sudden one. Superconductivity was discovered (D. van Delft, Heike
Kamerlingh Onnes, Uitgeverij Bert Bakker, Amsterdam, 2005 (in Dutch)).
This is not the way mathematical discoveries are made. Theorems are not produced by assistants falling
asleep, even if examples do exist of incidents involving some miraculous fortune.
The hybrid science of mathematical physics is a very curious one. Some of the topics in this Encyclopedia
are undoubtedly physical. High Tc superconductivity, breaking water waves, and magneto-hydrodynamics,
are definitely topics of physics where experimental data are considered more decisive than any high-brow
theory. Cohomology theory, Donaldson–Witten theory, and AdS/CFT correspondence, however, are examples
of purely mathematical exercises, even if these subjects, like all of the others in this compilation, are strongly
inspired by, and related to, questions posed in physics.
It is inevitable, in a compilation of a large number of short articles with many different authors, to see quite a
bit of variation in style and level. In this Encyclopedia, theoretical physicists as well as mathematicians together
made a huge effort to present in a concise and understandable manner their vision on numerous important
issues in advanced mathematical physics. All include references for further reading. We hope and expect that
these efforts will serve a good purpose.
Gerard 't Hooft,
Spinoza Institute,
Utrecht University,
The Netherlands.
PREFACE
Jean-Pierre Francoise
Gregory L. Naber
Tsou Sheung Tsun
PERMISSION ACKNOWLEDGMENTS
The following material is reproduced with kind permission of Nature Publishing Group
Figures 11 and 12 of Point-vortex Dynamics
http://www.nature.com/nature
The following material is reproduced with kind permission of Oxford University Press
Figure 1 of Random Walks in Random Environments
http://www.oup.co.uk
GUIDE TO USE OF THE ENCYCLOPEDIA
Example
If you were attempting to locate material on path integral methods via the alphabetical contents list:
PATH INTEGRAL METHODS see Functional Integration in Quantum Physics; Feynman Path Integrals
The dummy entry directs you to two other entries in which path integral methods are covered. At the appropriate
locations in the contents list, the volume and page numbers for these entries are given.
If you were trying to locate the material by browsing through the text and you had looked up Path Integral Methods,
then the following information would be provided in the dummy entry:
Path Integral Methods see Functional Integration in Quantum Physics; Feynman Path Integrals
3. Cross-References
All of the articles in the Encyclopedia have been extensively cross-referenced. The cross-references, which appear at the
end of an entry, serve three different functions:
Example
The following list of cross-references appears at the end of the entry STOCHASTIC HYDRODYNAMICS
Here you will find examples of all three functions of the cross-reference list: a topic discussed in greater detail elsewhere
(e.g. Incompressible Euler Equations: Mathematical Theory), parallel discussion in other entries (e.g. Stochastic
Differential Equations) and reference to entries that broaden the discussion (e.g. Turbulence Theories).
The eight Introductory Articles are not cross-referenced from any of the main entries, as it is expected that introductory
articles will be of general interest. As mentioned above, the Introductory Articles may be found at the start of Volume 1.
4. Index
The index will provide you with the volume and page number where the material is located. The index entries
differentiate between material that is a whole entry, is part of an entry, or is data presented in a figure or table. Detailed
notes are provided on the opening page of the index.
5. Contributors
A full list of contributors appears at the beginning of each volume.
CONTRIBUTORS
A Abbondandolo L Andersson
Universita di Pisa University of Miami
Pisa, Italy Coral Gables, FL, USA and Albert Einstein Institute
Potsdam, Germany
M J Ablowitz
University of Colorado B Andreas
Boulder, CO, USA Humboldt-Universitat zu Berlin
Berlin, Germany
S L Adler
Institute for Advanced Study V Araujo
Princeton, NJ, USA Universidade do Porto
Porto, Portugal
H Airault
Universite de Picardie A Ashtekar
Amiens, France Pennsylvania State University
University Park, PA, USA
G Alberti
W Van Assche
Universita di Pisa
Katholieke Universiteit Leuven
Pisa, Italy
Leuven, Belgium
S Albeverio
G Aubert
Rheinische Friedrich-Wilhelms-Universitat Bonn
Universite de Nice Sophia Antipolis
Bonn, Germany
Nice, France
S T Ali H Au-Yang
Concordia University Oklahoma State University
Montreal, QC, Canada Stillwater, OK, USA
R Alicki M A Aziz-Alaoui
University of Gdansk Universite du Havre
Gdansk, Poland Le Havre, France
G Altarelli V Bach
CERN Johannes Gutenberg-Universitat
Geneva, Switzerland Mainz, Germany
C Amrouche C Bachas
Universite de Pau et des Pays de l'Adour Ecole Normale Superieure
Pau, France Paris, France
M Anderson V Baladi
State University of New York at Stony Brook Institut Mathematique de Jussieu
Stony Brook, NY, USA Paris, France
D Bambusi M Blasone
Universita di Milano Universita degli Studi di Salerno
Milan, Italy Baronissi (SA), Italy
C Bardos M Blau
Universite de Paris 7 Universite de Neuchatel
Paris, France Neuchatel, Switzerland
D Bar-Natan S Boatto
University of Toronto IMPA
Toronto, ON, Canada Rio de Janeiro, Brazil
L V Bogachev
E L Basor
University of Leeds
California Polytechnic State University
Leeds, UK
San Luis Obispo, CA, USA
L Boi
M T Batchelor EHESS and LUTH
Australian National University Paris, France
Canberra, ACT, Australia
M Bojowald
S Bauer The Pennsylvania State University
Universitat Bielefeld University Park, PA, USA
Bielefeld, Germany
C Bonatti
V Beffara Universite de Bourgogne
Ecole Normale Superieure de Lyon Dijon, France
Lyon, France
P Bonckaert
R Beig Universiteit Hasselt
Universitat Wien Diepenbeek, Belgium
Vienna, Austria
F Bonetto
M I Belishev Georgia Institute of Technology
Petersburg Department of Steklov Institute Atlanta, GA, USA
of Mathematics
St. Petersburg, Russia G Bouchitte
Universite de Toulon et du Var
La Garde, France
P Bernard
Universite de Paris Dauphine
A Bovier
Paris, France
Weierstrass Institute for Applied Analysis and Stochastics
Berlin, Germany
D Birmingham
University of the Pacific H W Braden
Stockton, CA, USA University of Edinburgh
Edinburgh, UK
Jiří Bičák
Charles University, Prague, Czech Republic H Bray
and Albert Einstein Institute Duke University
Potsdam, Germany Durham, NC, USA
C Blanchet Y Brenier
Universite de Bretagne-Sud Universite de Nice Sophia Antipolis
Vannes, France Nice, France
J Bros J Cardy
CEA/DSM/SPhT, CEA/Saclay Rudolf Peierls Centre for Theoretical Physics
Gif-sur-Yvette, France Oxford, UK
R Brunetti R Caseiro
Universitat Hamburg Universidade de Coimbra
Hamburg, Germany Coimbra, Portugal
M Bruschi A S Cattaneo
Universita di Roma La Sapienza Universitat Zurich
Rome, Italy Zurich, Switzerland
T Brzezinski A Celletti
University of Wales Swansea Universita di Roma Tor Vergata
Swansea, UK Rome, Italy
D Buchholz D Chae
Universitat Gottingen Sungkyunkwan University
Gottingen, Germany Suwon, South Korea
F H Busse L Chierchia
Universitat Bayreuth Universita degli Studi Roma Tre
Bayreuth, Germany Rome, Italy
G Buttazzo S Chmutov
Universita di Pisa Petersburg Department of Steklov
Pisa, Italy Institute of Mathematics
St. Petersburg, Russia
P Butta
Universita di Roma La Sapienza M W Choptuik
Rome, Italy University of British Columbia
Vancouver, Canada
S L Cacciatori
Universita di Milano Y Choquet-Bruhat
Milan, Italy Universite P.-M. Curie, Paris VI
Paris, France
P T Callaghan
Victoria University of Wellington P T Chrusciel
Wellington, New Zealand Universite de Tours
Tours, France
Francesco Calogero
University of Rome, Rome, Italy and Institute Chong-Sun Chu
Nazionale di Fisica Nucleare University of Durham
Rome, Italy Durham, UK
A Carati F Cipriani
Universita di Milano Politecnico di Milano
Milan, Italy Milan, Italy
R L Cohen G W Delius
Stanford University University of York
Stanford, CA, USA York, UK
T H Colding G F dell'Antonio
University of New York Universita di Roma La Sapienza
New York, NY, USA Rome, Italy
J C Collins
C DeWitt-Morette
Penn State University
The University of Texas at Austin
University Park, PA, USA
Austin, TX, USA
G Comte
Universite de Nice Sophia Antipolis L Diosi
Nice, France Research Institute for Particle and Nuclear Physics
Budapest, Hungary
A Constantin
Trinity College A Doliwa
Dublin, Republic of Ireland University of Warmia and Mazury in Olsztyn
Olsztyn, Poland
D Crowdy
Imperial College G Dolzmann
London, UK University of Maryland
College Park, MD, USA
A B Cruzeiro
University of Lisbon
S K Donaldson
Lisbon, Portugal
Imperial College
London, UK
G Dal Maso
SISSA
Trieste, Italy T C Dorlas
Dublin Institute for Advanced Studies
F Dalfovo Dublin, Republic of Ireland
Universita di Trento
Povo, Italy M R Douglas
Rutgers, The State University of New Jersey
A S Dancer Piscataway, NJ, USA
University of Oxford
Oxford, UK M Dutsch
Universitat Zurich
P D'Ancona Zurich, Switzerland
Universita di Roma La Sapienza
Rome, Italy
B Dubrovin
SISSA-ISAS
S R Das
Trieste, Italy
University of Kentucky
Lexington, KY, USA
J J Duistermaat
E Date Universiteit Utrecht
Osaka University Utrecht, The Netherlands
Osaka, Japan
S Duzhin
N Datta Petersburg Department of Steklov Institute of
University of Cambridge Mathematics
Cambridge, UK St. Petersburg, Russia
G Ecker B Ferrario
Universitat Wien Universita di Pavia
Vienna, Austria Pavia, Italy
M Efendiev R Finn
Universitat Stuttgart Stanford University
Stuttgart, Germany Stanford, CA, USA
T Eguchi D Fiorenza
University of Tokyo Universita di Roma La Sapienza
Tokyo, Japan Rome, Italy
J Ehlers A E Fischer
Max Planck Institut fur Gravitationsphysik University of California
(Albert-Einstein Institut) Santa Cruz, CA, USA
Golm, Germany
A S Fokas
P E Ehrlich University of Cambridge
University of Florida Cambridge, UK
Gainesville, FL, USA
J-P Francoise
D Einzel Universite P.-M. Curie, Paris VI
Bayerische Akademie der Wissenschaften Paris, France
Garching, Germany
S Franz
G A Elliott The Abdus Salam ICTP
University of Toronto Trieste, Italy
Toronto, Canada
L Frappat
G F R Ellis Universite de Savoie
University of Cape Town Chambery-Annecy, France
Cape Town, South Africa
J Frauendiener
C L Epstein Universitat Tubingen
University of Pennsylvania Tubingen, Germany
Philadelphia, PA, USA
K Fredenhagen
J Escher Universitat Hamburg
Universitat Hannover Hamburg, Germany
Hannover, Germany
S Friedlander
J B Etnyre
University of Illinois-Chicago
University of Pennsylvania
Chicago, IL, USA
Philadelphia, PA, USA
G Falkovich M R Gaberdiel
Weizmann Institute of Science ETH Zurich
Rehovot, Israel Zurich, Switzerland
M Farge G Gaeta
Ecole Normale Superieure Universita di Milano
Paris, France Milan, Italy
L Galgani H Gottschalk
Universita di Milano Rheinische Friedrich-Wilhelms-Universitat Bonn
Milan, Italy Bonn, Germany
G Gallavotti O Goubet
Universita di Roma La Sapienza Universite de Picardie Jules Verne
Rome, Italy Amiens, France
R Gambini T R Govindarajan
Universidad de la Republica The Institute of Mathematical Sciences
Montevideo, Uruguay Chennai, India
G Gentile A Grassi
Universita degli Studi Roma Tre University of Pennsylvania
Rome, Italy Philadelphia, PA, USA
P G Grinevich
A Di Giacomo
L D Landau Institute for
Universita di Pisa
Theoretical Physics
Pisa, Italy
Moscow, Russia
P B Gilkey
Ch Gruber
University of Oregon
Ecole Polytechnique Federale de Lausanne
Eugene, OR, USA
Lausanne, Switzerland
S Gindikin F Guerra
Rutgers University Universita di Roma La Sapienza
Piscataway, NJ, USA Rome, Italy
A Giorgilli T Guhr
Universita di Milano Lunds Universitet
Milan, Italy Lund, Sweden
G A Goldin C Guillope
Rutgers University Universite Paris XII Val de Marne
Piscataway, NJ, USA Creteil, France
G Gonzalez C Gundlach
Louisiana State University University of Southampton
Baton Rouge, LA, USA Southampton, UK
R Gopakumar S Gutt
Harish-Chandra Research Institute Universite Libre de Bruxelles
Allahabad, India Brussels, Belgium
D Gottesman K Hannabuss
Perimeter Institute University of Oxford
Waterloo, ON, Canada Oxford, UK
M Haragus D D Holm
Universite de Franche-Comte Imperial College
Besancon, France London, UK
B Hasselblatt A Huckleberry
Tufts University Ruhr-Universitat Bochum
Medford, MA, USA Bochum, Germany
P Hayden K Hulek
McGill University Universitat Hannover
Montreal, QC, Canada Hannover, Germany
D C Heggie D Iagolnitzer
The University of Edinburgh CEA/DSM/SPhT, CEA/Saclay
Edinburgh, UK Gif-sur-Yvette, France
B Helffer R Illge
Universite Paris-Sud Friedrich-Schiller-Universitat Jena
Orsay, France Jena, Germany
G M Henkin P Imkeller
Universite P.-M. Curie, Paris VI Humboldt Universitat zu Berlin
Paris, France Berlin, Germany
M Henneaux G Iooss
Universite Libre de Bruxelles Institut Non Lineaire de Nice
Bruxelles, Belgium Valbonne, France
S Herrmann M Irigoyen
Universite Henri Poincare, Nancy 1 Universite P.-M. Curie, Paris VI
Vandoeuvre-les-Nancy, France Paris, France
C P Herzog J Isenberg
University of California at Santa Barbara University of Oregon
Santa Barbara, CA, USA Eugene, OR, USA
J G Heywood R Ivanova
University of British Columbia University of Hawaii Hilo
Vancouver, BC, Canada Hilo, HI, USA
A C Hirshfeld E M Izhikevich
Universitat Dortmund The Neurosciences Institute
Dortmund, Germany San Diego, CA, USA
A S Holevo R W Jackiw
Steklov Mathematical Institute Massachusetts Institute of Technology
Moscow, Russia Cambridge, MA, USA
T J Hollowood J K Jain
University of Wales Swansea The Pennsylvania State University
Swansea, UK University Park, PA, USA
M Jardim L H Kauffman
IMECC–UNICAMP University of Illinois at Chicago
Campinas, Brazil Chicago, IL, USA
L C Jeffrey R K Kaul
University of Toronto The Institute of Mathematical Sciences
Toronto, ON, Canada Chennai, India
J Jimenez Y Kawahigashi
Universidad Politecnica de Madrid University of Tokyo
Madrid, Spain Tokyo, Japan
S Jitomirskaya B S Kay
University of California at Irvine University of York
Irvine, CA, USA York, UK
P Jizba R Kenyon
Czech Technical University University of British Columbia
Prague, Czech Republic Vancouver, BC, Canada
A Joets M Keyl
Universite Paris-Sud Universita di Pavia
Orsay, France Pavia, Italy
K Johansson T W B Kibble
Kungl Tekniska Hogskolan Imperial College
Stockholm, Sweden London, UK
G Jona-Lasinio S Kichenassamy
Universita di Roma La Sapienza Universite de Reims Champagne-Ardenne
Rome, Italy Reims, France
V F R Jones J Kim
University of California at Berkeley University of California at Irvine
Berkeley, CA, USA Irvine, USA
N Joshi S B Kim
University of Sydney Chonnam National University
Sydney, NSW, Australia Gwangju, South Korea
D D Joyce A Kirillov
University of Oxford University of Pennsylvania
Oxford, UK Philadelphia, PA, USA
G Kasperski K Kirsten
Universite Paris-Sud XI Baylor University
Orsay, France Waco, TX, USA
F Kirwan M Krbec
University of Oxford Academy of Sciences
Oxford, UK Prague, Czech Republic
S Klainerman D Kreimer
Princeton University IHES
Princeton, NJ, USA Bures-sur-Yvette, France
I R Klebanov A Kresch
Princeton University University of Warwick
Princeton, NJ, USA Coventry, UK
Y Kondratiev D Kretschmann
Universitat Bielefeld Technische Universitat Braunschweig
Bielefeld, Germany Braunschweig, Germany
A Konechny P B Kronheimer
Rutgers, The State University of New Jersey Harvard University
Piscataway, NJ, USA Cambridge, MA, USA
K Konishi B Kuckert
Universita di Pisa Universitat Hamburg
Pisa, Italy Hamburg, Germany
T H Koornwinder Y Kuramoto
University of Amsterdam Hokkaido University
Amsterdam, The Netherlands Sapporo, Japan
P Kornprobst J M F Labastida
INRIA CSIC
Sophia Antipolis, France Madrid, Spain
V P Kostov G Labrosse
Universite de Nice Sophia Antipolis Universite Paris-Sud XI
Nice, France Orsay, France
R Kotecky C Landim
Charles University IMPA, Rio de Janeiro, Brazil and UMR 6085
Prague, Czech Republic and the and Universite de Rouen
University of Warwick, UK France
Y Kozitsky E Langmann
Uniwersytet Marii Curie-Sklodowskiej KTH Physics
Lublin, Poland Stockholm, Sweden
P Kramer S Laporta
Universitat Tubingen Universita di Parma
Tubingen, Germany Parma, Italy
C Krattenthaler O D Lavrentovich
Universitat Wien Kent State University
Vienna, Austria Kent, OH, USA
G F Lawler M Lyubich
Cornell University University of Toronto
Ithaca, NY, USA Toronto, ON, Canada and Stony Brook University
NY, USA
C Le Bris
CERMICS ENPC R Leandre
Champs Sur Marne, France Universite de Bourgogne
Dijon, France
A Lesne
Universite P.-M. Curie, Paris VI P Levay
Paris, France Budapest University of Technology and Economics
Budapest, Hungary
D Levi
Universita Roma Tre R Maartens
Rome, Italy Portsmouth University
Portsmouth, UK
J Lewandowski
Uniwersytet Warszawski N MacKay
Warsaw, Poland University of York
York, UK
R G Littlejohn
J Magnen
University of California at Berkeley
Ecole Polytechnique
Berkeley, CA, USA
France
R Livi
F Magri
Universita di Firenze
Universita di Milano Bicocca
Sesto Fiorentino, Italy
Milan, Italy
R Longoni
J Maharana
Universita di Roma La Sapienza
Institute of Physics
Rome, Italy
Bhubaneswar, India
J Lowengrub S Majid
University of California at Irvine Queen Mary, University of London
Irvine, USA London, UK
C Lozano C Marchioro
INTA Universita di Roma La Sapienza
Torrejon de Ardoz, Spain Rome, Italy
T T Q Le K Marciniak
Georgia Institute of Technology Linkoping University
Atlanta, GA, USA Norrkoping, Sweden
B Lucquin-Desreux M Marcolli
Universite P.-M. Curie, Paris VI Max-Planck-Institut fur Mathematik
Paris, France Bonn, Germany
V Lyubashenko M Marino
Institute of Mathematics CERN
Kyiv, Ukraine Geneva, Switzerland
J Marklof P K Mitter
University of Bristol Universite de Montpellier 2
Bristol, UK Montpellier, France
L Mason
Y Morita
University of Oxford
Ryukoku University
Oxford, UK
Otsu, Japan
V Mastropietro
Universita di Roma Tor Vergata P J Morrison
Rome, Italy University of Texas at Austin
Austin, TX, USA
V Mathai
University of Adelaide J Mund
Adelaide, SA, Australia Universidade de Sao Paulo
Sao Paulo, Brazil
J Mawhin
Universite Catholique de Louvain F Musso
Louvain-la-Neuve, Belgium Universita Roma Tre
Rome, Italy
S Mazzucchi
Universita di Trento
Povo, Italy G L Naber
Drexel University
B M McCoy Philadelphia, PA, USA
State University of New York at Stony Brook
Stony Brook, NY, USA B Nachtergaele
University of California at Davis
E Meinrenken Davis, CA, USA
University of Toronto
Toronto, ON, Canada
C Nash
National University of Ireland
I Melbourne
Maynooth, Ireland
University of Surrey
Guildford, UK
S Necasova
J Mickelsson Academy of Sciences
KTH Physics Prague, Czech Republic
Stockholm, Sweden
A I Neishtadt
W P Minicozzi II Russian Academy of Sciences
University of New York Moscow, Russia
New York, NY, USA
N Neumaier
S Miracle-Sole
Albert-Ludwigs-University in Freiburg
Centre de Physique Theorique, CNRS
Freiburg, Germany
Marseille, France
A Miranville S E Newhouse
Universite de Poitiers Michigan State University
Chasseneuil, France E. Lansing, MI, USA
C M Newman P E Parker
New York University Wichita State University
New York, NY, USA Wichita KS, USA
S Paycha
S Nikcevic
Universite Blaise Pascal
SANU
Aubiere, France
Belgrade, Serbia and Montenegro
P A Pearce
M Nitsche University of Melbourne
University of New Mexico Parkville VIC, Australia
Albuquerque, NM, USA
P Pearle
R G Novikov Hamilton College
Universite de Nantes Clinton, NY, USA
Nantes, France
M Pedroni
J M Nunes da Costa Universita di Bergamo
Universidade de Coimbra Dalmine (BG), Italy
Coimbra, Portugal
B Pelloni
University of Reading
S O'Brien
UK
Tyndall National Institute
Cork, Republic of Ireland
R Penrose
University of Oxford
A Okounkov Oxford, UK
Princeton University
Princeton, NJ, USA A Perez
Penn State University,
A Onuki University Park, PA, USA
Kyoto University
Kyoto, Japan J H H Perk
Oklahoma State University
J-P Ortega Stillwater, OK, USA
Universite de Franche-Comte
Besancon, France T Peternell
Universitat Bayreuth
Bayreuth, Germany
H Osborn
University of Cambridge
D Petz
Cambridge, UK
Budapest University of Technology and Economics
Budapest, Hungary
Maciej P Wojtkowski
University of Arizona
M J Pflaum
Tucson, AZ, USA and Institute of Mathematics PAN
Johann Wolfgang Goethe-Universitat
Warsaw, Poland
Frankfurt, Germany
J Palmer B Piccoli
University of Arizona Istituto per le Applicazioni del Calcolo
Tucson, AZ, USA Rome, Italy
J H Park C Piquet
Sungkyunkwan University Universite P.-M. Curie, Paris VI
Suwon, South Korea Paris, France
S Pokorski E Remiddi
Warsaw University Universita di Bologna
Warsaw, Poland Bologna, Italy
E Presutti
J E Roberts
Universita di Roma Tor Vergata
Universita di Roma Tor Vergata
Rome, Italy
Rome, Italy
E Previato
Boston University L Rey-Bellet
Boston, MA, USA University of Massachusetts
Amherst, MA, USA
B Prinari
Universita degli Studi di Lecce R Robert
Lecce, Italy Universite Joseph Fourier
Saint Martin d'Heres, France
J Pullin
Louisiana State University F A Rogers
Baton Rouge, LA, USA King's College London
London, UK
M Pulvirenti
Universita di Roma La Sapienza
R M S Rosa
Rome, Italy
Universidade Federal do Rio de Janeiro
Rio de Janeiro, Brazil
O Ragnisco
Universita Roma Tre
Rome, Italy C Rovelli
Universite de la Mediterranee et Centre
P Ramadevi de Physique Theorique
Indian Institute of Technology Bombay Marseilles, France
Mumbai, India
S N M Ruijsenaars
S A Ramakrishna Centre for Mathematics and Computer Science
Indian Institute of Technology Amsterdam, The Netherlands
Kanpur, India
F Russo
J Rasmussen Universite Paris 13
Princeton University Villetaneuse, France
Princeton, NJ, USA
L H Ryder
L Rastelli
University of Kent
Princeton University
Canterbury, UK
Princeton, NJ, USA
T S Ratiu S Sachdev
Ecole Polytechnique Federale de Lausanne Yale University
Lausanne, Switzerland New Haven, CT, USA
S Rauch-Wojciechowski H Sahlmann
Linkoping University Universiteit Utrecht
Linkoping, Sweden Utrecht, The Netherlands
M Salmhofer M A Semenov-Tian-Shansky
Universitat Leipzig Steklov Institute of Mathematics
Leipzig, Germany St. Petersburg, Russia and Universite de Bourgogne
Dijon, France
P M Santini
Universita di Roma La Sapienza A N Sengupta
Rome, Italy Louisiana State University
Baton Rouge LA, USA
A Sarmiento
Universidade Federal de Minas Gerais
S Serfaty
Belo Horizonte, Brazil
New York University
New York, NY, USA
R Sasaki
Kyoto University
E R Sharpe
Kyoto, Japan
University of Utah
Salt Lake City, UT, USA
A Savage
University of Toronto
Toronto, ON, Canada D Shepelsky
Institute for Low Temperature Physics and Engineering
M Schechter Kharkov, Ukraine
University of California at Irvine
Irvine, CA, USA S Shlosman
Universite de Marseille
D-M Schlingemann Marseille, France
Technical University of Braunschweig
Braunschweig, Germany A Siconolfi
Universita di Roma La Sapienza
R Schmid Rome, Italy
Emory University
Atlanta, GA, USA
V Sidoravicius
IMPA
G Schneider
Rio de Janeiro, Brazil
Universitat Karlsruhe
Karlsruhe, Germany
J A Smoller
K Schneider University of Michigan
Universite de Provence Ann Arbor MI, USA
Marseille, France
M Socolovsky
B Schroer Universidad Nacional Autonoma de Mexico
Freie Universitat Berlin Mexico DF, Mexico
Berlin, Germany
J P Solovej
T Schucker
University of Copenhagen
Universite de Marseille
Copenhagen, Denmark
Marseille, France
S Scott A Soshnikov
King's College London University of California at Davis
London, UK Davis, CA, USA
P Selick J M Speight
University of Toronto University of Leeds
Toronto, ON, Canada Leeds, UK
H Spohn B Temple
Technische Universitat Munchen University of California at Davis
Garching, Germany Davis, CA, USA
J Stasheff R P Thomas
Lansdale, PA, USA Imperial College
London, UK
D L Stein
University of Arizona U Tillmann
Tucson, AZ, USA University of Oxford
Oxford, UK
K S Stelle
Imperial College K P Tod
London, UK University of Oxford
Oxford, UK
G Sterman
J A Toth
Stony Brook University
McGill University
Stony Brook, NY, USA
Montreal, QC, Canada
S Stringari
C A Tracy
Universita di Trento
University of California at Davis
Povo, Italy
Davis, CA, USA
S J Summers
A Trautman
University of Florida
Warsaw University
Gainesville, FL, USA
Warsaw, Poland
V S Sunder
D Treschev
The Institute of Mathematical Sciences
Moscow State University
Chennai, India
Moscow, Russia
Y B Suris L Triolo
Technische Universitat Munchen Universita di Roma Tor Vergata
Munchen, Germany Rome, Italy
R J Szabo J Troost
Heriot-Watt University Ecole Normale Superieure
Edinburgh, UK Paris, France
H Tasaki V Turaev
Gakushuin University IRMA
Tokyo, Japan Strasbourg, France
M E Taylor D Ueltschi
University of North Carolina University of Arizona
Chapel Hill, NC, USA Tucson, AZ, USA
R Temam A M Uranga
Indiana University Consejo Superior de Investigaciones Cientificas
Bloomington, IN, USA Madrid, Spain
A Valentini
Perimeter Institute for Theoretical Physics
Waterloo, ON, Canada
M Vaugon
Université P.-M. Curie, Paris VI
Paris, France
P Di Vecchia
Nordita
Copenhagen, Denmark
A F Verbeure
Institute for Theoretical Physics
KU Leuven, Belgium
M Viana
IMPA
Rio de Janeiro, Brazil
G Vitiello
Università degli Studi di Salerno
Baronissi (SA), Italy
S Waldmann
Albert-Ludwigs-Universität Freiburg
Freiburg, Germany
J Wambsganss
Universität Heidelberg
Heidelberg, Germany
R S Ward
University of Durham
Durham, UK
E Wayne
Boston University
Boston, MA, USA
F W Wehrli
University of Pennsylvania
Philadelphia, PA, USA
R F Werner
Technische Universität Braunschweig
Braunschweig, Germany
H Widom
University of California at Santa Cruz
Santa Cruz, CA, USA
C M Will
Washington University
St. Louis, MO, USA
N M J Woodhouse
University of Oxford
Oxford, UK
V Wunsch
Friedrich-Schiller-Universität Jena
Jena, Germany
D R Yafaev
Université de Rennes
Rennes, France
M Yuri
Hokkaido University
Sapporo, Japan
D Zubrinic
University of Zagreb
Zagreb, Croatia
V Zupanovic
University of Zagreb
Zagreb, Croatia
R Zecchina
International Centre for Theoretical Physics (ICTP)
Trieste, Italy
S Zelditch
Johns Hopkins University
Baltimore, MD, USA
S Zelik
Universität Stuttgart
Stuttgart, Germany
M B Ziane
University of Southern California
Los Angeles, CA, USA
M R Zirnbauer
Universität Köln
Köln, Germany
CONTENTS LIST BY SUBJECT
Location references refer to the volume number and page number (separated by a colon).
Symmetries in Quantum Field Theory: Algebraic Aspects 5:179
Symmetries in Quantum Field Theory of Lower Spacetime Dimensions 5:172
Symmetry Breaking in Field Theory 5:198
Two-Dimensional Models 5:328
Thermal Quantum Field Theory 5:227
Tomita–Takesaki Modular Theory 5:251
Topological Defects and Their Homotopy Classification 5:257
Twistor Theory: Some Applications 5:303

Quantum Mechanics: Weak Measurements 4:276
Quantum n-Body Problem 4:283
Quantum Spin Systems 4:295
Quasiperiodic Systems 4:308
Schrödinger Operators 4:487
Stability of Matter 5:8
Stationary Phase Approximation 5:44
Supersymmetric Quantum Mechanics 5:145
Topological Defects and Their Homotopy Classification 5:257

Hamiltonian Group Actions 2:600
Mirror Symmetry: A Geometric Survey 3:439
Multi-Hamiltonian Systems 3:459
Recursion Operators in Classical Mechanics 4:371
Singularity and Bifurcation Theory 4:588
Stationary Phase Approximation 5:44

Variational Techniques
Capillary Surfaces 1:431
Control Problems in Mathematical Physics 1:636
Convex Analysis and Duality Methods 1:642
Free Interfaces and Free Discontinuities: Variational Problems 2:411
Γ-Convergence and Homogenization 2:449
Gauge Theory: Mathematical Applications 2:468
Geometric Measure Theory 2:520
Hamilton–Jacobi Equations and Dynamical Systems: Variational Aspects 2:636
Minimax Principle in the Calculus of Variations 3:432
Optimal Transportation 3:632
Variational Techniques for Ginzburg–Landau Energies 5:355
Variational Techniques for Microstructures 5:363
CONTENTS
VOLUME 1
A
Abelian and Nonabelian Gauge Theories Using Differential Forms A C Hirshfeld 141
Abelian Higgs Vortices J M Speight 151
Adiabatic Piston Ch Gruber and A Lesne 160
AdS/CFT Correspondence C P Herzog and I R Klebanov 174
Affine Quantum Groups G W Delius and N MacKay 183
Aharonov–Bohm Effect M Socolovsky 191
Algebraic Approach to Quantum Field Theory R Brunetti and K Fredenhagen 198
Anderson Localization see Localization for Quasiperiodic Potentials
Anomalies S L Adler 205
Arithmetic Quantum Chaos J Marklof 212
Asymptotic Structure and Conformal Infinity J Frauendiener 221
Averaging Methods A I Neishtadt 226
Axiomatic Approach to Topological Quantum Field Theory C Blanchet and V Turaev 232
Axiomatic Quantum Field Theory B Kuckert 234
B
Bäcklund Transformations D Levi 241
Batalin–Vilkovisky Quantization A C Hirshfeld 247
Bethe Ansatz M T Batchelor 253
BF Theories M Blau 257
Bicrossproduct Hopf Algebras and Noncommutative Spacetime S Majid 265
C
C*-Algebras and their Classification G A Elliott 393
Calibrated Geometry and Special Lagrangian Submanifolds D D Joyce 398
Calogero–Moser–Sutherland Systems of Nonrelativistic and Relativistic Type S N M Ruijsenaars 403
Canonical General Relativity C Rovelli 412
Capacities Enhanced by Entanglement P Hayden 418
Capacity for Quantum Information D Kretschmann 424
Capillary Surfaces R Finn 431
Cartan Model see Equivariant Cohomology and the Cartan Model
Cauchy Problem for Burgers-Type Equations G M Henkin 446
Cellular Automata M Bruschi and F Musso 455
Central Manifolds, Normal Forms P Bonckaert 467
Channels in Quantum Information Theory M Keyl 472
Chaos and Attractors R Gilmore 477
Characteristic Classes P B Gilkey, R Ivanova and S Nikcevic 488
Chern–Simons Models: Rigorous Results A N Sengupta 496
Classical Groups and Homogeneous Spaces S Gindikin 500
Classical r-Matrices, Lie Bialgebras, and Poisson Lie Groups M A Semenov-Tian-Shansky 511
Clifford Algebras and Their Representations A Trautman 518
Cluster Expansion R Kotecky 531
Coherent States S T Ali 537
Cohomology Theories U Tillmann 545
Combinatorics: Overview C Krattenthaler 553
Compact Groups and Their Representations A Kirillov and A Kirillov, Jr. 576
Compactification of Superstring Theory M R Douglas 586
Compressible Flows: Mathematical Theory G-Q Chen 595
Computational Methods in General Relativity: The Theory M W Choptuik 604
VOLUME 2
D
Deformation Quantization A C Hirshfeld 1
Deformation Quantization and Representation Theory S Waldmann 9
Deformation Theory M J Pflaum 16
Deformations of the Poisson Bracket on a Symplectic Manifold S Gutt and S Waldmann 24
∂̄-Approach to Integrable Systems P G Grinevich 34
Derived Categories E R Sharpe 41
Determinantal Random Fields A Soshnikov 47
Diagrammatic Techniques in Perturbation Theory G Gentile 54
Dimer Problems R Kenyon 61
Dirac Fields in Gravitation and Nonabelian Gauge Theory J A Smoller 67
Dirac Operator and Dirac Field S N M Ruijsenaars 74
Dispersion Relations J Bros 87
Dissipative Dynamical Systems of Infinite Dimension M Efendiev, S Zelik and A Miranville 101
Donaldson Invariants see Gauge Theoretic Invariants of 4-Manifolds
Donaldson–Witten Theory M Marino 110
Duality in Topological Quantum Field Theory C Lozano and J M F Labastida 118
Dynamical Systems and Thermodynamics A Carati, L Galgani and A Giorgilli 125
Dynamical Systems in Mathematical Physics: An Illustration from Water Waves O Goubet 133
E
Effective Field Theories G Ecker 139
Eigenfunctions of Quantum Completely Integrable Systems J A Toth 148
Eight Vertex and Hard Hexagon Models P A Pearce 155
Einstein Equations: Exact Solutions Jiří Bičák 165
Einstein Equations: Initial Value Formulation J Isenberg 173
Einstein Manifolds A S Dancer 182
Einstein–Cartan Theory A Trautman 189
Einstein's Equations with Matter Y Choquet-Bruhat 195
Electric–Magnetic Duality Tsou Sheung Tsun 201
Electroweak Theory K Konishi 209
Elliptic Differential Equations: Linear Theory C Amrouche, M Krbec, S Necasova and B Lucquin-Desreux 216
Entanglement R F Werner 228
F
Falicov–Kimball Model Ch Gruber and D Ueltschi 283
Fedosov Quantization N Neumaier 291
Feigenbaum Phenomenon see Universality and Renormalization
Fermionic Systems V Mastropietro 300
Feynman Path Integrals S Mazzucchi 307
Finite-Dimensional Algebras and Quivers A Savage 313
Finite Group Symmetry Breaking G Gaeta 322
Finite Weyl Systems D-M Schlingemann 328
Finitely Correlated States R F Werner 334
Finite-Type Invariants D Bar-Natan 340
Finite-Type Invariants of 3-Manifolds T T Q Le 348
Floer Homology P B Kronheimer 356
Fluid Mechanics: Numerical Methods J-L Guermond 365
Fourier Law F Bonetto and L Rey-Bellet 374
Fourier–Mukai Transform in String Theory B Andreas 379
Four-Manifold Invariants and Physics C Nash 386
Fractal Dimensions in Dynamics V Zupanovic and D Zubrinic 394
Fractional Quantum Hall Effect J K Jain 402
Free Interfaces and Free Discontinuities: Variational Problems G Buttazzo 411
Free Probability Theory D-V Voiculescu 417
Frobenius Manifolds see WDVV Equations and Frobenius Manifolds
Functional Equations and Integrable Systems H W Braden 425
Functional Integration in Quantum Physics C DeWitt-Morette 434
G
Γ-Convergence and Homogenization G Dal Maso 449
Gauge Theoretic Invariants of 4-Manifolds S Bauer 457
Gauge Theories from Strings P Di Vecchia 463
Gauge Theory: Mathematical Applications S K Donaldson 468
General Relativity: Experimental Tests C M Will 481
General Relativity: Overview R Penrose 487
Generic Properties of Dynamical Systems C Bonatti 494
Geometric Analysis and General Relativity L Andersson 502
Geometric Flows and the Penrose Inequality H Bray 510
Geometric Measure Theory G Alberti 520
Geometric Phases P Levay 528
Geophysical Dynamics M B Ziane 534
Gerbes in Quantum Field Theory J Mickelsson 539
H
Hamiltonian Fluid Dynamics P J Morrison 593
Hamiltonian Group Actions L C Jeffrey 600
Hamiltonian Reduction of Einstein's Equations A E Fischer and V Moncrief 607
Hamiltonian Systems: Obstructions to Integrability M Irigoyen 624
Hamiltonian Systems: Stability and Instability Theory P Bernard 631
Hamilton–Jacobi Equations and Dynamical Systems: Variational Aspects A Siconolfi 636
Hard Hexagon Model see Eight Vertex and Hard Hexagon Models
High Tc Superconductor Theory S-C Zhang 645
Holomorphic Dynamics M Lyubich 652
Holonomic Quantum Fields J Palmer 660
Homeomorphisms and Diffeomorphisms of the Circle A Zumpano and A Sarmiento 665
Homoclinic Phenomena S E Newhouse 672
Hopf Algebra Structure of Renormalizable Quantum Field Theory D Kreimer 678
Hopf Algebras and q-Deformation Quantum Groups S Majid 687
h-Pseudodifferential Operators and Applications B Helffer 701
Hubbard Model H Tasaki 712
Hydrodynamic Equations see Interacting Particle Systems and Hydrodynamic Equations
Hyperbolic Billiards M P Wojtkowski 716
Hyperbolic Dynamical Systems B Hasselblatt 721
VOLUME 3
I
Image Processing: Mathematics G Aubert and P Kornprobst 1
Incompressible Euler Equations: Mathematical Theory D Chae 10
Indefinite Metric H Gottschalk 17
Index Theorems P B Gilkey, K Kirsten, R Ivanova and J H Park 23
Inequalities in Sobolev Spaces M Vaugon 32
Infinite-Dimensional Hamiltonian Systems R Schmid 37
Instantons: Topological Aspects M Jardim 44
Integrability and Quantum Field Theory T J Hollowood 50
Integrable Discrete Systems O Ragnisco 59
Integrable Systems and Algebraic Geometry E Previato 65
Integrable Systems and Discrete Geometry A Doliwa and P M Santini 78
Integrable Systems and Recursion Operators on Symplectic and Jacobi Manifolds R Caseiro and J M Nunes da Costa 87
Integrable Systems and the Inverse Scattering Method A S Fokas 93
Integrable Systems in Random Matrix Theory C A Tracy and H Widom 102
Integrable Systems: Overview Francesco Calogero 106
J
The Jones Polynomial V F R Jones 179
K
Kac–Moody Lie Algebras see Solitons and Kac–Moody Lie Algebras
KAM Theory and Celestial Mechanics L Chierchia 189
Kinetic Equations C Bardos 200
Knot Homologies J Rasmussen 208
Knot Invariants and Quantum Gravity R Gambini and J Pullin 215
Knot Theory and Physics L H Kauffman 220
Kontsevich Integral S Chmutov and S Duzhin 231
Korteweg–de Vries Equation and Other Modulation Equations G Schneider and E Wayne 239
K-Theory V Mathai 246
L
Lagrangian Dispersion (Passive Scalar) G Falkovich 255
Large Deviations in Equilibrium Statistical Mechanics S Shlosman 261
Large-N and Topological Strings R Gopakumar 263
Large-N Dualities A Grassi 269
Lattice Gauge Theory A Di Giacomo 275
Leray–Schauder Theory and Mapping Degree J Mawhin 281
Lie Bialgebras see Classical r-Matrices, Lie Bialgebras, and Poisson Lie Groups
Lie Groups: General Theory R Gilmore 286
Lie Superalgebras and Their Representations L Frappat 305
Lie, Symplectic, and Poisson Groupoids and Their Lie Algebroids C-M Marle 312
Liquid Crystals O D Lavrentovich 320
Ljusternik–Schnirelman Theory J Mawhin 328
Localization for Quasiperiodic Potentials S Jitomirskaya 333
Loop Quantum Gravity C Rovelli 339
Lorentzian Geometry P E Ehrlich and S B Kim 343
Lyapunov Exponents and Strange Attractors M Viana 349
M
Macroscopic Fluctuations and Thermodynamic Functionals G Jona-Lasinio 357
Magnetic Resonance Imaging C L Epstein and F W Wehrli 367
Magnetohydrodynamics C Le Bris 375
N
Negative Refraction and Subdiffraction Imaging S O'Brien and S A Ramakrishna 483
Newtonian Fluids and Thermohydraulics G Labrosse and G Kasperski 492
Newtonian Limit of General Relativity J Ehlers 503
Noncommutative Geometry and the Standard Model T Schucker 509
Noncommutative Geometry from Strings Chong-Sun Chu 515
Noncommutative Tori, Yang–Mills, and String Theory A Konechny 524
Nonequilibrium Statistical Mechanics (Stationary): Overview G Gallavotti 530
Nonequilibrium Statistical Mechanics: Dynamical Systems Approach P Butta and C Marchioro 540
Nonequilibrium Statistical Mechanics: Interaction between Theory and Numerical Simulations R Livi 544
Nonlinear Schrödinger Equations M J Ablowitz and B Prinari 552
Non-Newtonian Fluids C Guillope 560
Nonperturbative and Topological Aspects of Gauge Theory R W Jackiw 568
Normal Forms and Semiclassical Approximation D Bambusi 578
N-Particle Quantum Scattering D R Yafaev 585
Nuclear Magnetic Resonance P T Callaghan 592
Number Theory in Physics M Marcolli 600
O
Operads J Stasheff 609
Operator Product Expansion in Quantum Field Theory H Osborn 616
Optical Caustics A Joets 620
Optimal Cloning of Quantum States M Keyl 628
Optimal Transportation Y Brenier 632
Ordinary Special Functions W Van Assche 637
VOLUME 4
P
Painlevé Equations N Joshi 1
Partial Differential Equations: Some Examples R Temam 6
Path Integral Methods see Functional Integration in Quantum Physics; Feynman Path Integrals
Path Integrals in Noncommutative Geometry R Leandre 8
Peakons D D Holm 12
Penrose Inequality see Geometric Flows and the Penrose Inequality
Percolation Theory V Beffara and V Sidoravicius 21
Perturbation Theory and Its Techniques R J Szabo 28
Perturbative Renormalization Theory and BRST K Fredenhagen and M Dutsch 41
Phase Transition Dynamics A Onuki 47
Phase Transitions in Continuous Systems E Presutti 53
Pirogov–Sinai Theory R Kotecky 60
Point-Vortex Dynamics S Boatto and D Crowdy 66
Poisson Lie Groups see Classical r-Matrices, Lie Bialgebras, and Poisson Lie Groups
Poisson Reduction J-P Ortega and T S Ratiu 79
Polygonal Billiards S Tabachnikov 84
Positive Maps on C*-Algebras F Cipriani 88
Pseudo-Riemannian Nilpotent Lie Groups P E Parker 94
Q
q-Special Functions T H Koornwinder 105
Quantum 3-Manifold Invariants C Blanchet and V Turaev 117
Quantum Calogero–Moser Systems R Sasaki 123
Quantum Central-Limit Theorems A F Verbeure 130
Quantum Channels: Classical Capacity A S Holevo 142
Quantum Chromodynamics G Sterman 144
Quantum Cosmology M Bojowald 153
Quantum Dynamical Semigroups R Alicki 159
Quantum Dynamics in Loop Quantum Gravity H Sahlmann 165
Quantum Electrodynamics and Its Precision Tests S Laporta and E Remiddi 168
Quantum Entropy D Petz 177
Quantum Ergodicity and Mixing of Eigenfunctions S Zelditch 183
Quantum Error Correction and Fault Tolerance D Gottesman 196
Quantum Field Theory in Curved Spacetime B S Kay 202
Quantum Field Theory: A Brief Introduction L H Ryder 212
Quantum Fields with Indefinite Metric: Non-Trivial Models S Albeverio and H Gottschalk 216
Quantum Fields with Topological Defects M Blasone, G Vitiello and P Jizba 221
Quantum Geometry and Its Applications A Ashtekar and J Lewandowski 230
Quantum Group Differentials, Bundles and Gauge Theory T Brzezinski 236
Quantum Hall Effect K Hannabuss 244
Quantum Mechanical Scattering Theory D R Yafaev 251
Quantum Mechanics: Foundations R Penrose 260
Quantum Mechanics: Generalizations P Pearle and A Valentini 265
Quantum Mechanics: Weak Measurements L Diosi 276
Quantum n-Body Problem R G Littlejohn 283
R
Random Algebraic Geometry, Attractors and Flux Vacua M R Douglas 323
Random Dynamical Systems V Araujo 330
Random Matrix Theory in Physics T Guhr 338
Random Partitions A Okounkov 347
Random Walks in Random Environments L V Bogachev 353
Recursion Operators in Classical Mechanics F Magri and M Pedroni 371
Reflection Positivity and Phase Transitions Y Kondratiev and Y Kozitsky 376
Regularization for Dynamical ζ-Functions V Baladi 386
Relativistic Wave Equations Including Higher Spin Fields R Illge and V Wunsch 391
Renormalization: General Theory J C Collins 399
Renormalization: Statistical Mechanics and Condensed Matter M Salmhofer 407
Resonances N Burq 415
Ricci Flow see Singularities of the Ricci Flow
Riemann Surfaces K Hulek 419
Riemann–Hilbert Methods in Integrable Systems D Shepelsky 429
Riemann–Hilbert Problem V P Kostov 436
Riemannian Holonomy Groups and Exceptional Holonomy D D Joyce 441
S
Saddle Point Problems M Schechter 447
Scattering in Relativistic Quantum Field Theory: Fundamental Concepts and Tools D Buchholz and S J Summers 456
Scattering in Relativistic Quantum Field Theory: The Analytic Program J Bros 465
Scattering, Asymptotic Completeness and Bound States D Iagolnitzer and J Magnen 475
Schrödinger Operators V Bach 487
Schwarz-Type Topological Quantum Field Theory R K Kaul, T R Govindarajan and P Ramadevi 494
Seiberg–Witten Theory Siye Wu 503
Semiclassical Approximation see Stationary Phase Approximation; Normal Forms and Semiclassical Approximation
Semiclassical Spectra and Closed Orbits Y Colin de Verdiere 512
Semilinear Wave Equations P D'Ancona 518
Separation of Variables for Differential Equations S Rauch-Wojciechowski and K Marciniak 526
Separatrix Splitting D Treschev 535
Several Complex Variables: Basic Geometric Theory A Huckleberry and T Peternell 540
Several Complex Variables: Compact Manifolds A Huckleberry and T Peternell 551
Shock Wave Refinement of the Friedman–Robertson–Walker Metric B Temple and J Smoller 559
Shock Waves see Symmetric Hyperbolic Systems and Shock Waves
Short-Range Spin Glasses: The Metastate Approach C M Newman and D L Stein 570
Sine-Gordon Equation S N M Ruijsenaars 576
Singularities of the Ricci Flow M Anderson 584
Singularity and Bifurcation Theory J-P Francoise and C Piquet 588
VOLUME 5
T
't Hooft–Polyakov Monopoles see Solitons and Other Extended Field Configurations
Thermal Quantum Field Theory C D Jakel 227
Thermohydraulics see Newtonian Fluids and Thermohydraulics
Toda Lattices Y B Suris 235
Toeplitz Determinants and Statistical Mechanics E L Basor 244
Tomita–Takesaki Modular Theory S J Summers 251
Topological Defects and Their Homotopy Classification T W B Kibble 257
Topological Gravity, Two-Dimensional T Eguchi 264
Topological Knot Theory and Macroscopic Physics L Boi 271
Topological Quantum Field Theory: Overview J M F Labastida and C Lozano 278
Topological Sigma Models D Birmingham 290
Turbulence Theories R M S Rosa 295
Twistor Theory: Some Applications L Mason 303
Twistors K P Tod 311
Two-Dimensional Conformal Field Theory and Vertex Operator Algebras M R Gaberdiel 317
Two-Dimensional Ising Model B M McCoy 322
Two-Dimensional Models B Schroer 328
U
Universality and Renormalization M Lyubich 343
V
Variational Methods in Turbulence F H Busse 351
Variational Techniques for Ginzburg–Landau Energies S Serfaty 355
Variational Techniques for Microstructures G Dolzmann 363
Vertex Operator Algebras see Two-Dimensional Conformal Field Theory and Vertex Operator Algebras
Viscous Incompressible Fluids: Mathematical Theory J G Heywood 369
von Neumann Algebras: Introduction, Modular Theory, and Classification Theory V S Sunder 379
von Neumann Algebras: Subfactor Theory Y Kawahigashi 385
Vortex Dynamics M Nitsche 390
Vortices see Abelian Higgs Vortices; Point-Vortex Dynamics
W
Wave Equations and Diffraction M E Taylor 401
Wavelets: Application to Turbulence M Farge and K Schneider 408
Wavelets: Applications M Yamada 420
Wavelets: Mathematical Theory K Schneider and M Farge 426
WDVV Equations and Frobenius Manifolds B Dubrovin 438
Weakly Coupled Oscillators E M Izhikevich and Y Kuramoto 448
Wheeler–De Witt Theory J Maharana 453
Wightman Axioms see Axiomatic Quantum Field Theory
Wulff Droplets S Shlosman 462
Y
YangBaxter Equations J H H Perk and H Au-Yang 465
INDEX 475
Introductory Articles
Introductory Article: Classical Mechanics
G Gallavotti, Università di Roma La Sapienza, Rome, Italy
© 2006 G Gallavotti. Published by Elsevier Ltd. All rights reserved.

General Principles

Classical mechanics is a theory of motions of point particles. If $X = (x_1, \ldots, x_n)$ are the particle positions in a Cartesian inertial system of coordinates, the equations of motion are determined by their masses $(m_1, \ldots, m_n)$, $m_j > 0$, and by the potential energy of interaction, $V(x_1, \ldots, x_n)$, as

$$m_i \ddot{x}_i = -\partial_{x_i} V(x_1, \ldots, x_n), \qquad i = 1, \ldots, n \qquad [1]$$

here $x_i = (x_{i1}, \ldots, x_{id})$ are coordinates of the $i$th particle and $\partial_{x_i}$ is the gradient $(\partial_{x_{i1}}, \ldots, \partial_{x_{id}})$; $d$ is the space dimension (i.e., $d = 3$, usually). The potential energy function will be supposed smooth, that is, analytic except, possibly, when two positions coincide. The latter exception is necessary to include the important cases of gravitational attraction or, when dealing with electrically charged particles, of Coulomb interaction. A basic result is that if $V$ is bounded below, eqn [1] admits, given initial data $X_0 = X(0)$, $\dot{X}_0 = \dot{X}(0)$, a unique global solution $t \to X(t)$, $t \in (-\infty, \infty)$; otherwise a solution can fail to be global if and only if, in a finite time, it reaches infinity or a singularity point (i.e., a configuration in which two or more particles occupy the same point: an event called a collision).

In eqn [1], $-\partial_{x_i} V(x_1, \ldots, x_n)$ is the force acting on the points. More general forces are often admitted. For instance, velocity-dependent friction forces: they are not considered here because of their phenomenological nature as models for microscopic phenomena which should also, in principle, be explained in terms of conservative forces (furthermore, even from a macroscopic viewpoint, they are rather incomplete models, as they should be considered together with the important heat generation phenomena that accompany them). Another interesting example of forces not corresponding to a potential are certain velocity-dependent forces like the Coriolis force (which, however, appears only in noninertial frames of reference) and the closely related Lorentz force (in electromagnetism): they could be easily accommodated in the Hamiltonian formulation of mechanics; see Appendix 2.

The action principle states that an equivalent formulation of the eqns [1] is that a motion $t \to X_0(t)$ satisfying [1] during a time interval $[t_1, t_2]$ and leading from $X_1 = X_0(t_1)$ to $X_2 = X_0(t_2)$, renders stationary the action

$$A(\{X\}) = \int_{t_1}^{t_2} \left( \sum_{i=1}^{n} \frac{1}{2} m_i \dot{X}_i(t)^2 - V(X(t)) \right) dt \qquad [2]$$

within the class $M_{t_1, t_2}(X_1, X_2)$ of smooth (i.e., analytic) motions $t \to X(t)$ defined for $t \in [t_1, t_2]$ and leading from $X_1$ to $X_2$.

The function

$$L(Y, X) \stackrel{\text{def}}{=} \frac{1}{2} \sum_{i=1}^{n} m_i y_i^2 - V(X) = K(Y) - V(X), \qquad Y = (y_1, \ldots, y_n)$$

is called the Lagrangian function and the action can be written as

$$\int_{t_1}^{t_2} L(\dot{X}(t), X(t))\, dt$$

The quantity $K(\dot{X}(t))$ is called kinetic energy and motions satisfying [1] conserve energy as time $t$ varies, that is,

$$K(\dot{X}(t)) + V(X(t)) = E = \text{const.} \qquad [3]$$

Hence the action principle can be intuitively thought of as saying that motions proceed by keeping constant the energy, sum of the kinetic and potential energies, while trying to share as evenly as possible their (average over time) contribution to the energy.

In the special case in which $V$ is translation invariant, motions conserve linear momentum $Q \stackrel{\text{def}}{=} \sum_i m_i \dot{x}_i$; if $V$
is rotation invariant around the origin $O$, motions conserve angular momentum $M \stackrel{\text{def}}{=} \sum_i m_i x_i \wedge \dot{x}_i$, where $\wedge$ denotes the vector product in $\mathbb{R}^d$, that is, it is the tensor $(a \wedge b)_{ij} = a_i b_j - b_i a_j$, $i, j = 1, \ldots, d$: if the dimension $d = 3$ then $a \wedge b$ will be naturally regarded as a vector. More generally, to any continuous symmetry group of the Lagrangian correspond conserved quantities: this is formalized in the Noether theorem.

It is convenient to think that the scalar product in $\mathbb{R}^{dn}$ is defined in terms of the ordinary scalar product in $\mathbb{R}^d$, $a \cdot b = \sum_{j=1}^{d} a_j b_j$, by $(v, w) = \sum_{i=1}^{n} m_i v_i \cdot w_i$: so that kinetic energy and line element $ds$ can be written as $K(\dot{X}) = \frac{1}{2} (\dot{X}, \dot{X})$ and $ds^2 = \sum_{i=1}^{n} m_i\, dx_i^2$, respectively. Therefore, the metric generated by the latter scalar product can be called kinetic energy metric.

The interest of the kinetic metric appears from the Maupertuis principle (equivalent to [1]): the principle allows us to identify the trajectory traced in $\mathbb{R}^{nd}$ by a motion that leads from $X_1$ to $X_2$ moving with energy $E$. Parametrizing such trajectories as $\tau \to X(\tau)$ by a parameter $\tau$ varying in $[0, 1]$ so that the line element is $ds^2 = (\partial_\tau X, \partial_\tau X)\, d\tau^2$, the principle states that the trajectory of a motion with energy $E$ which leads from $X_1$ to $X_2$ makes stationary, among the analytic curves $x \in M_{0,1}(X_1, X_2)$, the function

$$L(x) = \int_x \sqrt{E - V(x(s))}\; ds \qquad [4]$$

so that the possible trajectories traced by the solutions of [1] in $\mathbb{R}^{nd}$ and with energy $E$ can be identified with the geodesics of the metric $dm^2 \stackrel{\text{def}}{=} (E - V(X))\, ds^2$.

For more details, the reader is referred to Landau and Lifshitz (1976) and Gallavotti (1983).

Constraints

Often particles are subject to constraints which force the motion to take place on a surface $M \subset \mathbb{R}^{nd}$, i.e., $X(t)$ is forced to be a point on the manifold $M$. A typical example is provided by rigid systems in which motions are subject to forces which keep the mutual distances of the particles constant: $|x_i - x_j| = \rho_{ij}$, with $\rho_{ij}$ time-independent positive quantities. In essentially all cases, the forces that imply constraints, called constraint reactions, are velocity dependent and, therefore, are not in the class of conservative forces considered here, cf. [1]. Hence, from a fundamental viewpoint admitting only conservative forces, constrained systems should be regarded as idealizations of systems subject to conservative forces which approximately imply the constraints.

In general, the $\ell$-dimensional manifold $M$ will not admit a global system of coordinates: however, it will be possible to describe points in the vicinity of any $X_0 \in M$ by using $N = nd$ coordinates $q = (q_1, \ldots, q_\ell, q_{\ell+1}, \ldots, q_N)$ varying in an open ball $B_{X_0}$: $X = X(q_1, \ldots, q_\ell, q_{\ell+1}, \ldots, q_N)$.

The $q$-coordinates can be chosen well adapted to the surface $M$ and to the kinetic metric, i.e., so that the points of $M$ are identified by $q_{\ell+1} = \cdots = q_N = 0$ (which is the meaning of adapted); furthermore, infinitesimal displacements $(0, \ldots, 0, d\varepsilon_{\ell+1}, \ldots, d\varepsilon_N)$ out of a point $X_0 \in M$ are orthogonal to $M$ (in the kinetic metric) and have a length independent of the position of $X_0$ on $M$ (which is the meaning of well adapted to the kinetic metric).

Motions constrained on $M$ arise when the potential $V$ has the form

$$V(X) = V_a(X) + \lambda W(X) \qquad [5]$$

where $W$ is a smooth function which reaches its minimum value, say equal to 0, precisely on the manifold $M$ while $V_a$ is another smooth potential. The factor $\lambda > 0$ is a parameter called the rigidity of the constraint.

A particularly interesting case arises when the level surfaces of $W$ also have the geometric property of being parallel to the surface $M$: in the precise sense that the matrix $\partial^2_{q_i q_j} W(X)$, $i, j > \ell$, is positive definite and $X$-independent, for all $X \in M$, in a system of coordinates well adapted to the kinetic metric.

A potential $W$ with the latter properties can be called an approximately ideal constraint reaction. In fact, it can be proved that, given an initial datum $X_0 \in M$ with velocity $\dot{X}_0$ tangent to $M$, i.e., given an initial datum whose coordinates in a local system of coordinates are $(q_0, 0)$ and $(\dot{q}_0, 0)$ with $q_0 = (q_{01}, \ldots, q_{0\ell})$ and $\dot{q}_0 = (\dot{q}_{01}, \ldots, \dot{q}_{0\ell})$, the motion generated by [1] with $V$ given by [5] is a motion $t \to X_\lambda(t)$ which

1. as $\lambda \to \infty$ tends to a motion $t \to X_\infty(t)$;
2. as long as $X_\infty(t)$ stays in the vicinity of the initial data, say for $0 \le t \le t_1$, so that it can be described in the above local adapted coordinates, its coordinates have the form $t \to (q(t), 0) = (q_1(t), \ldots, q_\ell(t), 0, \ldots, 0)$: that is, it is a motion developing on the constraint surface $M$; and
3. the curve $t \to X_\infty(t)$, $t \in [0, t_1]$, as an element of the space $M_{0, t_1}(X_0, X_\infty(t_1))$ of analytic curves on $M$ connecting $X_0$ to $X_\infty(t_1)$, renders the action

$$A(X) = \int_0^{t_1} \left( K(\dot{X}(t)) - V_a(X(t)) \right) dt \qquad [6]$$

stationary.
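The limit of infinite rigidity described in items 1 to 3 can be explored numerically. The sketch below is only an illustration under stated assumptions and is not taken from the article: it assumes a single particle of unit mass in the plane, an applied potential $V_a(x, y) = g\,y$ (uniform gravity), and a constraint potential $W(x, y) = (x^2 + y^2 - \rho^2)^2$, which vanishes exactly on the circle $M$ of radius $\rho$. The function name `max_radial_deviation`, the velocity Verlet integrator, and all numerical values are hypothetical choices made for the example. The code integrates [1] with $V = V_a + \lambda W$ from an initial datum on $M$ with tangent velocity and reports how far the motion strays from $M$.

```python
import math


def max_radial_deviation(lam, rho=1.0, g=1.0, dt=1e-4, t_end=3.0):
    """Integrate x'' = -grad(V_a + lam * W) for unit mass by velocity Verlet.

    Assumed model (illustrative only):
      V_a(x, y) = g * y                      applied potential (gravity)
      W(x, y)   = (x^2 + y^2 - rho^2)^2      constraint potential, zero on M
    Returns the largest value of | |X(t)| - rho | along the trajectory.
    """
    def force(x, y):
        s = x * x + y * y - rho * rho   # vanishes exactly on the circle M
        fx = -lam * 4.0 * s * x         # -lam * dW/dx
        fy = -lam * 4.0 * s * y - g     # -lam * dW/dy - dV_a/dy
        return fx, fy

    # Initial datum on M (lowest point) with velocity tangent to M.
    x, y = 0.0, -rho
    vx, vy = 1.0, 0.0
    fx, fy = force(x, y)
    worst = 0.0
    for _ in range(int(round(t_end / dt))):
        vx += 0.5 * dt * fx             # half kick
        vy += 0.5 * dt * fy
        x += dt * vx                    # drift
        y += dt * vy
        fx, fy = force(x, y)
        vx += 0.5 * dt * fx             # half kick
        vy += 0.5 * dt * fy
        worst = max(worst, abs(math.hypot(x, y) - rho))
    return worst
```

Under these assumptions, comparing `max_radial_deviation(1.0e2)` with `max_radial_deviation(1.0e4)` should show the distance from the circle shrinking as the rigidity $\lambda$ grows, which is the qualitative behavior expected of the motions $X_\lambda(t)$ as they approach a motion on $M$. The time step must stay small compared with the period of the fast oscillation transverse to $M$, which shortens as $\lambda$ increases.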
Property 3 above can be formulated intrinsically, that is, referring only to $M$ as a surface, via the restriction of the metric $ds^2$ to line elements $ds = (dq_1, \ldots, dq_\ell, 0, \ldots, 0)$ tangent to $M$ at the point $X = (q_0, 0, \ldots, 0) \in M$; we write $ds^2 = \sum_{i,j=1}^{\ell} g_{ij}(q)\, dq_i\, dq_j$. The symmetric positive-definite matrix $g$ can be called the metric on $M$ induced by the kinetic energy. Then the action in [6] can be written as

$$A(q) = \int_0^{t_1} \left( \frac{1}{2} \sum_{i,j=1}^{\ell} g_{ij}(q(t))\, \dot{q}_i(t)\, \dot{q}_j(t) - \bar{V}_a(q(t)) \right) dt \qquad [7]$$

where $\bar{V}_a(q) \stackrel{\text{def}}{=} V_a(X(q_1, \ldots, q_\ell, 0, \ldots, 0))$: the function

$$L(\eta, q) \stackrel{\text{def}}{=} \frac{1}{2} \sum_{i,j=1}^{\ell} g_{ij}(q)\, \eta_i \eta_j - \bar{V}_a(q) \equiv \frac{1}{2}\, g(q) \eta \cdot \eta - \bar{V}_a(q) \qquad [8]$$

is called the constrained Lagrangian of the system. An important property is that the constrained motions conserve the energy defined as $E = \frac{1}{2} (g(q) \dot{q}, \dot{q}) + \bar{V}_a(q)$; see next section.

The constrained motion $X_\infty(t)$ of energy $E$ satisfies the Maupertuis principle in the sense that the curve on $M$ on which the motion develops renders

$$L(x) = \int_x \sqrt{E - V_a(x(s))}\; ds \qquad [9]$$

stationary among the (smooth) curves that develop on $M$ connecting two fixed values $X_1$ and $X_2$. In the particular case in which $\ell = nd$ (no constraint) this is again the Maupertuis principle for unconstrained motions under the potential $V(X)$. In general, $\ell$ is called the number of degrees of freedom because a complete description of the initial data requires $2\ell$ coordinates $q(0)$, $\dot{q}(0)$.

If $W$ is minimal on $M$ but the condition on $W$ of having level surfaces parallel to $M$ is not satisfied, i.e., if $W$ is not an approximate ideal constraint reaction, it still remains true that the limit motion $X_\infty(t)$ takes place on $M$. However, in general, it will not satisfy the above variational principles. For this reason, motions arising as limits (as $\lambda \to \infty$) of motions developing under the potential [5] with $W$ having minimum on $M$ and level curves parallel (in the above sense) to $M$ are called ideally constrained motions or motions subject by ideal constraints to the surface $M$.

As an example, suppose that $W$ has the form $W(X) = \sum_{i,j \in P} w_{ij}(|x_i - x_j|)$ with $w_{ij}(|x|) \ge 0$ an analytic function vanishing only when $|x| = \rho_{ij}$ for $i, j$ in some set of pairs $P$ and for some given distances $\rho_{ij}$ (e.g., $w_{ij}(x) = (x^2 - \rho_{ij}^2)^2 \gamma$, $\gamma > 0$). Then $W$ can be shown to satisfy the mentioned conditions and therefore, the so constrained motions $X_\infty(t)$ of the body satisfy the variational principles mentioned in connection with [7] and [9]: in other words, the above natural way of realizing a rather general rigidity constraint is ideal.

The modern viewpoint on the physical meaning of the constraint reactions is as follows: looking at motions in an inertial Cartesian system, it will appear that the system is subject to the applied forces with potential $V_a(X)$ and to constraint forces which are defined as the differences $R_i = m_i \ddot{x}_i + \partial_{x_i} V_a(X)$. The latter reflect the action of the forces with potential $\lambda W(X)$ in the limit of infinite rigidity ($\lambda \to \infty$).

In applications, sometimes the action of a constraint can be regarded as ideal: the motion will then verify the variational principles mentioned and $R$ can be computed as the differences between the $m_i \ddot{x}_i$ and the active forces $-\partial_{x_i} V_a(X)$. In dynamics problems it is, however, a very difficult and important matter, particularly in engineering, to judge whether a system of particles can be considered as subject to ideal constraints: this leads to important decisions in the construction of machines. It simplifies the calculations of the reactions and fatigue of the materials but a misjudgment can have serious consequences about stability and safety. For statics problems, the difficulty is of lower order: usually assuming that the constraint reaction is ideal leads to an overestimate of the requirements for stability of equilibria. Hence, employing the action principle to statics problems, where it constitutes the principle of virtual work, generally leads to economic problems rather than to safety issues. Its discovery even predates Newtonian mechanics.

We refer the reader to Arnold (1989) and Gallavotti (1983) for more details.

Lagrange and Hamilton Forms of the Equations of Motion

The stationarity condition for the action $A(q)$, cf. [7], [8], is formulated in terms of the Lagrangian $L(\eta, \xi)$, see [8], by

$$\frac{d}{dt}\, \partial_{\eta_i} L(\dot{q}(t), q(t)) = \partial_{\xi_i} L(\dot{q}(t), q(t)), \qquad i = 1, \ldots, \ell \qquad [10]$$

which is a second-order differential equation called the Lagrangian equation of motion. It can be cast in normal form: for this purpose, adopting the convention of summation over repeated indices, introduce the generalized momenta

$$p_i \stackrel{\text{def}}{=} g(q)_{ij}\, \dot{q}_j, \qquad i = 1, \ldots, \ell \qquad [11]$$
Since $g(q) > 0$, the motions $t \to q(t)$ and the corresponding velocities $t \to \dot{q}(t)$ can be described equivalently by $t \to (q(t), p(t))$: and the equations of motion [10] become the first-order equations

$$\dot{q}_i = \partial_{p_i} H(p, q), \qquad \dot{p}_i = -\partial_{q_i} H(p, q) \qquad [12]$$

where the function $H$, called the Hamiltonian of the system, is defined by

$$H(p, q) \stackrel{\text{def}}{=} \frac{1}{2} \left( g(q)^{-1} p, p \right) + \bar{V}_a(q) \qquad [13]$$

Equations [12], regarded as equations of motion for phase space points $(p, q)$, are called Hamilton equations. In general, $q$ are local coordinates on $M$ and motions are specified by giving $q, \dot{q}$ or $p, q$.

Looking for a coordinate-free representation of motions consider the pairs $X, Y$ with $X \in M$ and $Y$ a vector $Y \in T_X$ tangent to $M$ at the point $X$. The collection of pairs $(Y, X)$ is denoted $T(M) = \cup_{X \in M} (T_X \times \{X\})$ and a motion $t \to (\dot{X}(t), X(t)) \in T(M)$ in local coordinates is represented by $(\dot{q}(t), q(t))$. The space $T(M)$ can be called the space of initial data for Lagrange's equations of motion: it has $2\ell$ dimensions (also known as the tangent bundle of $M$).

[12] can be equivalently formulated by requiring that the function

$$A_H(\varphi) \stackrel{\text{def}}{=} \int_{t_1}^{t_2} \left( p(t) \cdot \dot{\kappa}(t) - H(p(t), \kappa(t)) \right) dt \qquad [14]$$

be stationary for $\varphi = \varphi_0$: in fact, eqns [12] are the stationarity conditions for the Hamilton action [14] on $M_{t_1, t_2}((p_1, q_1), (p_2, q_2); M)$. And, since the derivatives of $p(t)$ do not appear in [14], stationarity is even achieved in the larger space $M_{t_1, t_2}(q_1, q_2; M)$ of the motions $\varphi : t \to (p(t), \kappa(t))$ leading from $q_1$ to $q_2$ without any restriction on the initial and final momenta $p_1, p_2$ (which, therefore, cannot be prescribed a priori independently of $q_1, q_2$). If the prescribed data $p_1, q_1, p_2, q_2$ are not compatible with the equations of motion (e.g., $H(p_1, q_1) \ne H(p_2, q_2)$), then the action functional has no stationary trajectory in $M_{t_1, t_2}((p_1, q_1), (p_2, q_2); M)$.

For more details, the reader is referred to Landau and Lifshitz (1976), Arnold (1989), and Gallavotti (1983).
Likewise, the space of initial data for the
Hamilton equations will be denoted T
(M) and it Canonical Transformations of Phase
consists of pairs X, P with X 2 M and P = g(X)Y Space Coordinates
with Y a vector tangent to M at X. The space T
(M)
The Hamiltonian form, [13], of the equations of
is called the phase space of the system: it has
motion turns out to be quite useful in several
2 dimensions (and it is occasionally called the
problems. It is, therefore, important to remark that
cotangent bundle of M).
it is invariant under a special class of transformations
Immediate consequence of [12] is
of coordinates, called canonical transformations.
d Consider a local change of coordinates on phase
Hpt; qt 0
dt space, i.e., a smooth, smoothly invertible map
C(p, k ) = (p 0 , k 0 ) between an open set U in the
and it means that H(p(t), q(t)) is constant along phase space of a Hamiltonian system with
the solutions of [12]. Noting that H(p, q) = degrees of freedom, into an open set U0 in a
(1=2)(g(q) q, _ q)
_ V a (q) is the sum of the kinetic 2-dimensional space. The change of coordinates is
and potential energies, it follows that the conservation said to be canonical if for any solution
of H along solutions means energy conservation in t ! (p(t), k (t)) of equations like [12], for any
presence of ideal constraints. Hamiltonian H(p, k ) defined on U, the Cimage
Let St be the flow generated on the phase space t ! (p 0 (t), k 0 (t)) = C(p(t), k (t)) is a solution of [12]
variables (p, q) by the solutions of the equations of with the same Hamiltonian, that is, with
motion [12], that is, let t ! St (p, q) (p(t), q(t)) def
Hamiltonian H0 (p 0 , k 0 ) = H(C1 (p 0 , k 0 )).
denote a solution of [12] with initial data (p, q). The condition that a transformation of coordi-
Then a (measurable) set in phase space evolves in nates is canonical is obtained by using the
time t into a new set St with the same volume: this arbitrariness of the function H and is simply
is obvious because the Hamilton equations [12] have expressed as a necessary and sufficient property of
manifestly zero divergence (Liouvilles theorem). the Jacobian L,
The Hamilton equations also satisfy a variational
principle, called the Hamilton action principle: that A B
L
is, if Mt1 , t2 ((p1 , q1 ), (p2 , q2 ); M) denotes the space of C D
the analytic functions j : t ! (p(t), k (t)) which in the 15
Aij @j 0i ; Bij @j 0i ;
time interval [t1 , t2 ] lead from (p1 , q1 ) to (p2 , q2 ),
then the condition that j 0 (t) = (p(t), q(t)) satisfies Cij @j 0i ; Dij @j 0i
$$\{F, G\}(\pi, \kappa) \overset{\text{def}}{=} \sum_{k=1}^{\ell} \big( \partial_{\pi_k} F(\pi, \kappa)\, \partial_{\kappa_k} G(\pi, \kappa) - \partial_{\kappa_k} F(\pi, \kappa)\, \partial_{\pi_k} G(\pi, \kappa) \big) \qquad [18]$$

The latter satisfies Jacobi's identity: $\{\{F, G\}, Q\} + \{\{G, Q\}, F\} + \{\{Q, F\}, G\} = 0$, for any three functions $F, G, Q$ on the phase space. It is quite useful to remark that if $t \to (p(t), q(t)) = S_t(p, q)$ is a solution to Hamilton equations with Hamiltonian $H$ then, given any observable $F(p, q)$, it evolves as $F(t) \overset{\text{def}}{=} F(p(t), q(t))$ satisfying

$$\partial_t F(p(t), q(t)) = \{H, F\}(p(t), q(t))$$

Requiring the latter identity to hold for all observables $F$ is equivalent to requiring that the $t \to (p(t), q(t))$ be a solution of Hamilton's equations for $H$.

Let $C : U \to U'$ be a smooth, smoothly invertible transformation between two open $2\ell$-dimensional sets: $C(\pi, \kappa) = (\pi', \kappa')$. Suppose that there is a function $\Phi(\pi', \kappa)$ defined on a suitable domain $W$ such that

$$C(\pi, \kappa) = (\pi', \kappa') \;\Rightarrow\; \begin{cases} \pi = \partial_{\kappa} \Phi(\pi', \kappa) \\ \kappa' = \partial_{\pi'} \Phi(\pi', \kappa) \end{cases} \qquad [19]$$

then $C$ is canonical. This is because [19] implies that if $\kappa, \pi'$ are varied and if $\pi, \kappa', \pi', \kappa$ are related by $C(\pi, \kappa) = (\pi', \kappa')$, then $\pi \cdot d\kappa + \kappa' \cdot d\pi' = d\Phi(\pi', \kappa)$, which implies that

$$\pi \cdot d\kappa - H(\pi, \kappa)\, dt \equiv \pi' \cdot d\kappa' - H(C^{-1}(\pi', \kappa'))\, dt + d\Phi(\pi', \kappa) - d(\pi' \cdot \kappa') \qquad [20]$$

and inverting the first equation in the form $\pi' = X(\pi, \kappa)$ and substituting the value for $\pi'$ thus obtained, in the second equation, a map $C(\pi, \kappa) = (\pi', \kappa')$ is defined on some domain (where the mentioned operations can be performed) and if such domain is open and not empty then $C$ is a canonical map.

For similar reasons, if $\Gamma(\kappa, \kappa')$ is a function defined on some domain then setting $\pi = \partial_{\kappa} \Gamma(\kappa, \kappa')$, $\pi' = -\partial_{\kappa'} \Gamma(\kappa, \kappa')$ and solving the first relation to express $\kappa' = D(\pi, \kappa)$ and substituting in the second relation a map $(\pi', \kappa') = C(\pi, \kappa)$ is defined on some domain (where the mentioned operations can be performed) and if such domain is open and not empty then $C$ is a canonical map.

Likewise, canonical transformations can be constructed starting from a priori given functions $F(\pi, \kappa')$ or $G(\pi, \pi')$. And the most general canonical map can be generated locally (i.e., near a given point in phase space) by a single one of the above four ways, possibly composed with a few "trivial" canonical maps in which one pair of coordinates $(\pi_i, \kappa_i)$ is transformed into $(-\kappa_i, \pi_i)$. The necessity of also including the trivial maps can be traced to the existence of homogeneous canonical maps, that is, maps such that $\pi \cdot d\kappa = \pi' \cdot d\kappa'$ (e.g., the identity map, see below or [49] for nontrivial examples) which are action preserving hence canonical, but which evidently cannot be generated by a function $\Gamma(\kappa, \kappa')$ although they can be generated by a function depending on $\pi', \kappa$.
Simple examples of homogeneous canonical maps are maps in which the coordinates $q$ are changed into $q' = R(q)$ and, correspondingly, the $p$'s are transformed as $p' = \big((\partial_q R(q))^{-1}\big)^{T} p$, linearly: indeed, this map is generated by the function $F(p', q) \overset{\text{def}}{=} p' \cdot R(q)$.

For instance, consider the map "Cartesian-polar" coordinates $(q_1, q_2) \to (\rho, \theta)$ with $(\rho, \theta)$ the polar coordinates of $q$ (namely $\rho = \sqrt{q_1^2 + q_2^2}$, $\theta = \arctan(q_2 / q_1)$) and let $n \overset{\text{def}}{=} q / |q| = (n_1, n_2)$ and $t = (-n_2, n_1)$. Setting $p_\rho \overset{\text{def}}{=} p \cdot n$, $p_\theta \overset{\text{def}}{=} \rho\, p \cdot t$, the map $(p_1, p_2, q_1, q_2) \to (p_\rho, p_\theta, \rho, \theta)$ is homogeneous canonical (because $p \cdot dq = p \cdot n\, d\rho + p \cdot t\, \rho\, d\theta = p_\rho\, d\rho + p_\theta\, d\theta$).

As a further example, any area-preserving map $(p, q) \to (p', q')$ defined on an open region of the plane $R^2$ is canonical: because in this case the matrices $A, B, C, D$ are just numbers, which satisfy $AD - BC = 1$ and, therefore, [16] holds.

For more details, the reader is referred to Landau and Lifshitz (1976) and Gallavotti (1983).

Quadratures

The simplest mechanical systems are integrable by quadratures. For instance, the Hamiltonian on $R^2$,

$$H(p, q) = \frac{1}{2m} p^2 + V(q) \qquad [21]$$

generates a motion $t \to q(t)$ with initial data $q_0, \dot q_0$ such that $H(p_0, q_0) = E$, i.e., $\frac{1}{2} m \dot q_0^2 + V(q_0) = E$, satisfying

$$\dot q(t) = \pm \sqrt{\frac{2}{m} \big(E - V(q(t))\big)}$$

If the equation $E = V(q)$ has only two solutions $q_-(E) < q_+(E)$ and $|\partial_q V(q_\pm(E))| > 0$, the motion is periodic with period

$$T(E) = 2 \int_{q_-(E)}^{q_+(E)} \frac{dx}{\sqrt{(2/m)(E - V(x))}} \qquad [22]$$

The special solution with initial data $q_0 = q_-(E)$, $\dot q_0 = 0$ will be denoted $Q(t)$, and it is an analytic function (by the general regularity theorem on ordinary differential equations). For $0 \le t \le T/2$ or for $T/2 \le t \le T$ it is given, respectively, by

$$t = \int_{q_-(E)}^{Q(t)} \frac{dx}{\sqrt{(2/m)(E - V(x))}} \qquad [23a]$$

or

$$t = \frac{T}{2} - \int_{q_+(E)}^{Q(t)} \frac{dx}{\sqrt{(2/m)(E - V(x))}} \qquad [23b]$$

The most general solution with energy $E$ has the form $q(t) = Q(t_0 + t)$, where $t_0$ is defined by $q_0 = Q(t_0)$, $\dot q_0 = \dot Q(t_0)$, i.e., it is the time needed for the "standard solution" $Q(t)$ to reach the initial data for the new motion.

If the derivative of $V$ vanishes in one of the extremes or if at least one of the two solutions $q_\pm(E)$ does not exist, the motion is not periodic and it may be unbounded: nevertheless, it is still expressible via integrals of the type [22]. If the potential $V$ is periodic in $q$ and the variable $q$ is considered to be varying on a circle then essentially all solutions are periodic: exceptions can occur if the energy $E$ has a value such that $V(q) = E$ admits a solution where $V$ has zero derivative.

Typical examples are the harmonic oscillator, the pendulum, and the Kepler oscillator: whose Hamiltonians, if $m, \omega, g, h, G, k$ are positive constants, are, respectively,

$$\frac{p^2}{2m} + \frac{1}{2} m \omega^2 q^2, \qquad \frac{p^2}{2m} + m g \Big(1 - \cos \frac{q}{h}\Big), \qquad \frac{p^2}{2m} - m k \frac{1}{|q|} + m \frac{G^2}{2 q^2} \qquad [24]$$

the Kepler oscillator Hamiltonian has a potential which is singular at $q = 0$ but if $G \ne 0$ the energy conservation forbids too close an approach to $q = 0$ and the singularity becomes irrelevant.

The integral in [23] is called a quadrature and the systems in [21] are therefore integrable by quadratures. Such systems, at least when the motion is periodic, are best described in new coordinates in which periodicity is more manifest. Namely when $V(q) = E$ has only two roots $q_\pm(E)$ and $\pm V'(q_\pm(E)) > 0$ the energy-time coordinates can be used by replacing $q, \dot q$ or $p, q$ by $E, \tau$, where $\tau$ is the time needed for the standard solution $t \to Q(t)$ to reach the given data, that is, $Q(\tau) = q$, $\dot Q(\tau) = \dot q$. In such coordinates, the motion is simply $(E, \tau) \to (E, \tau + t)$ and, of course, the variable $\tau$ has to be regarded as varying on a circle of radius $T / 2\pi$. The $E, \tau$ variables are a kind of polar coordinates, as can be checked by drawing the curves of constant $E$, "energy levels", in the plane $p, q$ in the cases in [24]; see Figure 1.

In the harmonic oscillator case, all trajectories are periodic. In the pendulum case, all motions are periodic except the ones which separate the oscillatory motions (the closed curves in the second drawing) from the rotatory motions (the apparently open curves) which, in fact, are on closed curves as well if the $q$ coordinate, that is, the vertical
choice of $I$, the other coordinates vary on a "standard" $\ell$-dimensional torus $\mathbb{T}^\ell$: hence, it is possible to say that a phase space region of integrability is foliated into $\ell$-dimensional invariant tori $\mathcal{T}(I)$ parametrized by the values of the constants of motion $I \in \mathcal{I}$.

If an integrable system is anisochronous then it is canonically integrable: that is, it is possible to define on $W$ a canonical change of coordinates $(p, q) = C(A, \alpha)$ mapping $W$ onto $J \times \mathbb{T}^\ell$ and such that $H(C(A, \alpha)) = h(A)$ for a suitable $h$. Then, if $\boldsymbol{\omega}(A) \overset{\text{def}}{=} \partial_A h(A)$, the equations of motion become

$$\dot A = 0, \qquad \dot\alpha = \boldsymbol{\omega}(A) \qquad [28]$$

Given a system $(I, \varphi)$ of coordinates integrating an anisochronous system the construction of action-angle coordinates can be performed, in principle, via a classical procedure (under a few extra assumptions).

Let $\gamma_1, \dots, \gamma_\ell$ be $\ell$ topologically independent circles on $\mathbb{T}^\ell$, for definiteness let $\gamma_i(I) = \{\varphi \mid \varphi_1 = \varphi_2 = \dots = \varphi_{i-1} = \varphi_{i+1} = \dots = 0,\ \varphi_i \in [0, 2\pi]\}$, and set

$$A_i(I) = \frac{1}{2\pi} \oint_{\gamma_i(I)} p \cdot dq \qquad [29]$$

If the map $I \to A(I)$ is analytically invertible as $I = I(A)$, the function

$$S(A, \varphi) = \int_0^{\varphi} p \cdot dq \qquad [30]$$

is well defined if the integral is over any path joining the points $(p(I(A), 0), q(I(A), 0))$ and $(p(I(A), \varphi), q(I(A), \varphi))$ and lying on the torus parametrized by $I(A)$.

The key remark in the proof that [30] really defines a function of the only variables $A, \varphi$ is that anisochrony implies the vanishing of the Poisson brackets (cf. [18]): $\{I_i, I_j\} = 0$ (hence also $\{A_i, A_j\} \equiv \sum_{h,k} \partial_{I_k} A_i\, \partial_{I_h} A_j\, \{I_k, I_h\} = 0$). And the property $\{I_i, I_j\} = 0$ can be checked to be precisely the integrability condition for the differential form $p \cdot dq$ restricted to the surface obtained by varying $q$ while $p$ is constrained so that $(p, q)$ stays on the surface $I$ = constant, i.e., on the invariant torus of the points with fixed $I$.

The latter property is necessary and sufficient in order that the function $S(A, \varphi)$ be well defined (i.e., be independent of the integration path) up to an additive quantity of the form $\sum_i 2\pi n_i A_i$ with $n = (n_1, \dots, n_\ell)$ integers.

Then the action-angle variables are defined by the canonical change of coordinates with $S(A, \varphi)$ as generating function, i.e., by setting

$$\alpha_i = \partial_{A_i} S(A, \varphi), \qquad I_i = \partial_{\varphi_i} S(A, \varphi) \qquad [31]$$

and, since the computation of $S(A, \varphi)$ is reduced to integrations which can be regarded as a natural extension of the quadratures discussed in the one-dimensional cases, such systems are also called integrable by quadratures. The just-described construction is a version of the more general Arnold-Liouville theorem.

In practice, however, the actual evaluation of the integrals in [29], [30] can be difficult: its analysis in various cases (even as "elementary" as the pendulum) has in fact led to key progress in various domains, for example, in the theory of special functions and in group theory.

In general, any surface on phase space on which the restriction of the differential form $p \cdot dq$ is locally integrable is called a Lagrangian manifold: hence the invariant tori of an anisochronous integrable system are Lagrangian manifolds.

If an integrable system is anisochronous, it cannot admit more than $\ell$ independent constants of motion; furthermore, it does not admit invariant tori of dimension $> \ell$. Hence $\ell$-dimensional invariant tori are called maximal.

Of course, invariant tori of dimension $< \ell$ can also exist: this happens when the variables $I$ are such that the frequencies $\boldsymbol{\omega}(I)$ admit nontrivial rational relations; i.e., there is an integer components vector $\nu \in Z^\ell$, $\nu = (\nu_1, \dots, \nu_\ell) \ne 0$ such that

$$\boldsymbol{\omega}(I) \cdot \nu = \sum_i \omega_i(I)\, \nu_i = 0 \qquad [32]$$

in this case, the invariant torus $\mathcal{T}(I)$ is called resonant. If the system is anisochronous then $\det \partial_I \boldsymbol{\omega}(I) \ne 0$ and, therefore, the resonant tori are associated with values of the constants of motion $I$ which form a set of measure zero in the space $\mathcal{I}$ but which is not empty and dense.

Examples of isochronous systems are the systems of harmonic oscillators, i.e., systems with Hamiltonian

$$\sum_{i=1}^{\ell} \frac{1}{2 m_i} p_i^2 + \frac{1}{2} \sum_{i,j=1}^{\ell} c_{ij}\, q_i q_j$$

where the matrix $c = (c_{ij})$ is a positive-definite matrix. This is an isochronous system with frequencies $\boldsymbol{\omega} = (\omega_1, \dots, \omega_\ell)$ whose squares are the eigenvalues of the matrix $m_i^{-1/2} c_{ij}\, m_j^{-1/2}$. It is integrable in the region $W$ of the data $x = (p, q) \in R^{2\ell}$ such that, setting

$$A_\beta = \frac{1}{2 \omega_\beta} \left( \Big( \sum_{i=1}^{\ell} \frac{v_{\beta,i}\, p_i}{\sqrt{m_i}} \Big)^{2} + \omega_\beta^2 \Big( \sum_{i=1}^{\ell} v_{\beta,i} \sqrt{m_i}\, q_i \Big)^{2} \right)$$

for all eigenvectors $v_\beta$, $\beta = 1, \dots, \ell$, of the above matrix, the vectors $A$ have all components $> 0$.
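For a single harmonic oscillator the action integral [29] can be evaluated by a numerical quadrature and compared with the closed form $A = E / \omega$, which is what the expression for $A_\beta$ reduces to when $\ell = 1$. The sketch below is a minimal illustration under that assumption; the function names, the default values of `mass` and `omega`, and the midpoint rule with the substitution $q = c + r \sin s$ are choices made here, not taken from the text.

```python
import math

def action(energy, potential, q_minus, q_plus, mass, n=2000):
    """Approximate A = (1 / 2 pi) * closed integral of p dq at fixed energy.

    The closed curve is covered by p = +sqrt(...) and p = -sqrt(...), so
    A = (1 / pi) * integral from q_minus to q_plus of sqrt(2 m (E - V(q))) dq.
    The substitution q = c + r sin(s) smooths the square-root endpoints.
    """
    c = 0.5 * (q_plus + q_minus)
    r = 0.5 * (q_plus - q_minus)
    ds = math.pi / n
    total = 0.0
    for k in range(n):
        s = -0.5 * math.pi + (k + 0.5) * ds          # midpoint rule in s
        q = c + r * math.sin(s)
        p = math.sqrt(max(0.0, 2.0 * mass * (energy - potential(q))))
        total += p * r * math.cos(s) * ds
    return total / math.pi

def oscillator_action(energy, mass=2.0, omega=3.0):
    """Action of the oscillator p^2 / 2m + m omega^2 q^2 / 2 at the given energy."""
    q_max = math.sqrt(2.0 * energy / (mass * omega**2))   # turning points are +/- q_max
    return action(energy, lambda q: 0.5 * mass * omega**2 * q * q,
                  -q_max, q_max, mass)
```

The same `action` function can be applied to other one-dimensional potentials once the two turning points are supplied; for a potential such as the pendulum one, the resulting $A(E)$ is no longer proportional to $E$, which is the numerical signature of anisochrony.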
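Even though this system is isochronous, it nevertheless admits a system of canonical action-angle coordinates in which the Hamiltonian takes the simplest form

$$h(A) = \sum_{\beta=1}^{\ell} \omega_\beta A_\beta \equiv \boldsymbol{\omega} \cdot A \qquad [33]$$

with

$$\alpha_\beta = -\arctan \left( \frac{\sum_{i=1}^{\ell} v_{\beta,i}\, p_i / \sqrt{m_i}}{\sum_{i=1}^{\ell} \sqrt{m_i}\, \omega_\beta\, v_{\beta,i}\, q_i} \right)$$

as conjugate angles.

An example of anisochronous system is the free rotators or free wheels: i.e., $\ell$ noninteracting points on a circle of radius $R$ or $\ell$ noninteracting homo-

$$L = \frac{m}{2} \big( \dot\rho^2 + \rho^2 \dot\theta^2 \big) - V(\rho)$$

The planarity of the motion is not a strong restriction as central motion always takes place on a plane.

Hence, the equations of motion are

$$\frac{d}{dt} \big( m \rho^2 \dot\theta \big) = 0$$

i.e., $m \rho^2 \dot\theta = G$ is a constant of motion (it is the angular momentum), and

$$m \ddot\rho = -\partial_\rho V(\rho) + \partial_\rho \frac{m}{2} \rho^2 \dot\theta^2 = -\partial_\rho V(\rho) + \frac{G^2}{m \rho^3} \overset{\text{def}}{=} -\partial_\rho V_G(\rho)$$

Then the energy conservation yields a second constant of motion $E$,

$$\frac{m}{2} \dot\rho^2 + \frac{1}{2} \frac{G^2}{m \rho^2} + V(\rho) = E$$

$$t = \frac{T(E, G)}{2} - \int_{\rho_+(E, G)}^{R(t)} \frac{dx}{\sqrt{(2/m)(E - V_G(x))}} \qquad [36b]$$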
determined, via the second quadrature, as follows. The function $G m^{-1} R(t)^{-2}$ is periodic with period $T(E, G)$; hence it can be expressed in a Fourier series

$$\chi_0(E, G) + \sum_{k \ne 0} \chi_k(E, G) \exp \Big( \frac{2\pi}{T(E, G)}\, i t k \Big)$$

the quadrature for $\theta(t)$ can be performed by integrating the series terms. Setting

$$\bar\theta(t_0) \overset{\text{def}}{=} \frac{T(E, G)}{2\pi} \sum_{k \ne 0} \frac{\chi_k(E, G)}{i k} \exp \Big( \frac{2\pi}{T(E, G)}\, i t_0 k \Big)$$

and $\varphi_1(0) = \theta_0 - \bar\theta(t_0)$, the expression

$$\theta(t) = \theta_0 + \int_0^t \frac{G}{m R(t' + t_0)^2}\, dt'$$

becomes

$$\varphi_1(t) = \varphi_1(0) + \chi_0(E, G)\, t \qquad [37]$$

Hence the system is integrable and the spectrum is $\boldsymbol{\omega}(E, G) = (\omega_0(E, G), \omega_1(E, G)) \equiv (\omega_0, \omega_1)$ with

$$\omega_0 \overset{\text{def}}{=} \frac{2\pi}{T(E, G)} \quad \text{and} \quad \omega_1 \overset{\text{def}}{=} \chi_0(E, G)$$

while $I = (E, G)$ are constants of motion and the angles $\varphi = (\varphi_0, \varphi_1)$ can be taken as

$$\varphi_0 \overset{\text{def}}{=} \omega_0 t_0, \qquad \varphi_1 \overset{\text{def}}{=} \theta_0 - \bar\theta(t_0)$$

At $E, G$ fixed, the motion takes place on a two-dimensional torus $\mathcal{T}(E, G)$ with $\varphi_0, \varphi_1$ as angles.

In the anisochronous cases, i.e., when $\det \partial_{E, G}\, \boldsymbol{\omega}(E, G) \ne 0$, canonical action-angle variables conjugated to $(p_\rho, \rho, p_\theta, \theta)$ can be constructed via [29], [30] by using two cycles $\gamma_1, \gamma_2$ on the torus $\mathcal{T}(E, G)$. It is convenient to choose

1. $\gamma_1$ as the cycle consisting of the points $\rho = x$, $\theta = 0$ whose first half (where $p_\rho \ge 0$) consists in the set $\rho_{E,-}(G) \le x \le \rho_{E,+}(G)$, $p_\rho = \sqrt{2m(E - V_G(x))}$ and $d\theta = 0$; and

In terms of the above $\omega_0, \chi_0$ the Jacobian matrix $\partial(E, G) / \partial(A_1, A_2)$ is computed from [38], [39] to be $\begin{pmatrix} \omega_0 & \chi_0 \\ 0 & 1 \end{pmatrix}$. It follows that $\partial_E S = t$, $\partial_G S = \theta - \bar\theta(t) - \chi_0 t$ so that, see [31],

$$\alpha_1 \overset{\text{def}}{=} \partial_{A_1} S = \omega_0 t, \qquad \alpha_2 \overset{\text{def}}{=} \partial_{A_2} S = \theta - \bar\theta(t) \qquad [40]$$

and $(A_1, \alpha_1)$, $(A_2, \alpha_2)$ are the action-angle pairs.

For more details, see Landau and Lifshitz (1976) and Gallavotti (1983).

Newtonian Potential and Kepler's Laws

The anisochrony property, that is, $\det \partial(\omega_0, \chi_0) / \partial(A_1, A_2) \ne 0$ or, equivalently, $\det \partial(\omega_0, \chi_0) / \partial(E, G) \ne 0$, is not satisfied in the important cases of the harmonic potential and the Newtonian potential. Anisochrony being only a sufficient condition for canonical integrability it is still possible (and true) that, nevertheless, in both cases the canonical transformation generated by [39] integrates the system. This is expected since the two potentials are limiting cases of anisochronous ones (e.g., $|q|^{2 + \varepsilon}$ and $|q|^{-1 - \varepsilon}$ with $\varepsilon \to 0$).

The Newtonian potential

$$H(p, q) = \frac{1}{2m} p^2 - \frac{k m}{|q|}$$

is integrable in the region $G \ne 0$, $E_0(G) = -k^2 m^3 / 2 G^2 < E < 0$, $|G| < \sqrt{k^2 m^3 / (-2E)}$. Proceeding as in the last section, one finds integrating coordinates and that the integrable motions develop on ellipses with one focus on the center of attraction $S$ so that motions are periodic, hence not anisochronous: nevertheless, the construction of the canonical coordinates via [29]-[31] (hence [39]) works and leads to canonical coordinates $(L_0, \lambda_0, G_0, \gamma_0)$. To obtain action-angle variables with a simple
interpretation, it is convenient to perform on the variables $(L_0, \lambda_0, G_0, \gamma_0)$ (constructed by following the procedure just indicated) a further trivial canonical transformation by setting $L = L_0 + G_0$, $G = G_0$, $\lambda = \lambda_0$, $\gamma = \gamma_0 - \lambda_0$; then

1. $\lambda$ (average anomaly) is the time necessary for the point $P$ to move from the pericenter to its actual position, in units of the period, times $2\pi$;
2. $L$ (action) is essentially the energy $E = -k^2 m^3 / 2 L^2$;
3. $G$ (angular momentum);
4. $\gamma$ (axis longitude), is the angle between a fixed axis and the major axis of the ellipse oriented from the center of the ellipse $O$ to the center of attraction $S$.

The eccentricity of the ellipse is $e$ such that $G = \pm L \sqrt{1 - e^2}$. The ellipse equation is $\rho = a(1 - e \cos \xi)$, where $\xi$ is the eccentric anomaly (see Figure 2), $a = L^2 / k m^2$ is the major semiaxis, and $\rho$ is the distance to the center of attraction $S$.

Figure 2 Eccentric and true anomalies of $P$, which moves on a small circle $E$ centered at a point $c$ moving on the circle $D$ located half-way between the two concentric circles containing the Keplerian ellipse: the anomaly of $c$ with respect to the axis $OS$ is $\xi$. The circle $D$ is eccentric with respect to $S$ and therefore $\xi$ is, even today, called eccentric anomaly, whereas the circle $D$ is, in ancient terminology, the deferent circle (eccentric circles were introduced in astronomy by Ptolemy). The small circle $E$ on which the point $P$ moves is, in ancient terminology, an epicycle. The deferent and the epicyclical motions are synchronous (i.e., they have the same period); Kepler discovered that his key a priori hypothesis of inverse proportionality between angular velocity on the deferent and distance between $P$ and $S$ (i.e., $\rho \dot\xi$ = constant) implied both synchrony and elliptical shape of the orbit, with focus in $S$. The latter law is equivalent to $\rho^2 \dot\theta$ = constant (because of the identity $a \dot\xi = \rho \dot\theta$). Small eccentricity ellipses can hardly be distinguished from circles.

Finally, the relations between eccentric anomaly $\xi$, average anomaly $\lambda$, true anomaly $\theta$ (the latter is the polar angle), and $SP$ distance $\rho$ are given by the Kepler equations

$$\lambda = \xi - e \sin \xi$$
$$(1 - e \cos \xi)(1 + e \cos \theta) = 1 - e^2$$
$$\lambda = (1 - e^2)^{3/2} \int_0^{\theta} \frac{d\theta'}{(1 + e \cos \theta')^2} \qquad [41]$$
$$\frac{\rho}{a} = \frac{1 - e^2}{1 + e \cos \theta}$$

and the relation between true anomaly and average anomaly can be inverted in the form

$$\xi = \lambda + g_\lambda, \qquad \theta = \lambda + f_\lambda \;\Rightarrow\; \frac{\rho}{a} = \frac{1 - e^2}{1 + e \cos(\lambda + f_\lambda)} \qquad [42]$$

where $g_\lambda = g(e \sin \lambda, e \cos \lambda)$, $f_\lambda = f(e \sin \lambda, e \cos \lambda)$, and $g(x, y)$, $f(x, y)$ are suitable functions analytic for $|x|, |y| < 1$. Furthermore, $g(x, y) = x(1 + y + \dots)$, $f(x, y) = 2x(1 + \frac{5}{4} y + \dots)$ and the ellipses denote terms of degree 2 or higher in $x, y$, containing only even powers of $x$.

For more details, the reader is referred to Landau and Lifshitz (1976) and Gallavotti (1983).

Rigid Body

Another fundamental integrable system is the rigid body in the absence of gravity and with a fixed point $O$. It can be naturally described in terms of the Euler angles $\theta_0, \varphi_0, \psi_0$ (see Figure 3) and their derivatives $\dot\theta_0, \dot\varphi_0, \dot\psi_0$.

Figure 3 The Euler angles of the comoving frame $\mathbf{i}_1, \mathbf{i}_2, \mathbf{i}_3$ with respect to a fixed frame $x, y, z$. The direction $n$ is the node line, intersection between the planes $x, y$ and $\mathbf{i}_1, \mathbf{i}_2$.

Let $I_1, I_2, I_3$ be the three principal inertia moments of the body along the three principal axes with unit vectors $\mathbf{i}_1, \mathbf{i}_2, \mathbf{i}_3$. The inertia moments and the principal axes are the eigenvalues and the associated unit eigenvectors of the $3 \times 3$ inertia matrix $\mathcal{I}$, which is defined by $\mathcal{I}_{hk} = \sum_{i=1}^{n} m_i (x_i)_h (x_i)_k$, where $h, k = 1, 2, 3$ and $x_i$ is the position of the $i$th particle in a reference frame with origin at $O$ and in which
all particles are at rest: this comoving frame exists as a consequence of the rigidity constraint. The principal axes form a coordinate system which is comoving as well: that is, in the frame $(O; \mathbf{i}_1, \mathbf{i}_2, \mathbf{i}_3)$ as well, the particles are at rest.

The Lagrangian is simply the kinetic energy: we imagine the rigidity constraint to be ideal (e.g., as realized by internal central forces in the limit of infinite rigidity, as mentioned in the section "Lagrange and Hamilton forms of equations of motion"). The angular velocity of the rigid motion is defined by

$$\boldsymbol{\omega} = \dot\theta_0\, n + \dot\varphi_0\, z + \dot\psi_0\, \mathbf{i}_3 \qquad [43]$$

expressing that a generic infinitesimal motion must consist of a variation of the three Euler angles and, therefore, it has to be a rotation of speeds $\dot\theta_0, \dot\varphi_0, \dot\psi_0$ around the axes $n, z, \mathbf{i}_3$ as shown in Figure 3.

Let $(\omega_1, \omega_2, \omega_3)$ be the components of $\boldsymbol{\omega}$ along the principal axes $\mathbf{i}_1, \mathbf{i}_2, \mathbf{i}_3$: for brevity, the latter axes will often be called 1, 2, 3. Then the angular momentum $\mathbf{M}$, with respect to the pivot point $O$, and the kinetic energy $K$ can be checked to be

$$\mathbf{M} = I_1 \omega_1 \mathbf{i}_1 + I_2 \omega_2 \mathbf{i}_2 + I_3 \omega_3 \mathbf{i}_3, \qquad K = \frac{1}{2} \big( I_1 \omega_1^2 + I_2 \omega_2^2 + I_3 \omega_3^2 \big) \qquad [44]$$

and are constants of motion. From Figure 3 it follows that $\omega_1 = \dot\theta_0 \cos \psi_0 + \dot\varphi_0 \sin \theta_0 \sin \psi_0$, $\omega_2 = -\dot\theta_0 \sin \psi_0 + \dot\varphi_0 \sin \theta_0 \cos \psi_0$ and $\omega_3 = \dot\varphi_0 \cos \theta_0 + \dot\psi_0$, so that the Lagrangian, uninspiring at first, is

$$L \overset{\text{def}}{=} \frac{1}{2} I_1 \big( \dot\theta_0 \cos \psi_0 + \dot\varphi_0 \sin \theta_0 \sin \psi_0 \big)^2 + \frac{1}{2} I_2 \big( -\dot\theta_0 \sin \psi_0 + \dot\varphi_0 \sin \theta_0 \cos \psi_0 \big)^2 + \frac{1}{2} I_3 \big( \dot\varphi_0 \cos \theta_0 + \dot\psi_0 \big)^2 \qquad [45]$$

Angular momentum conservation does not imply that the components $\omega_j$ are constants because $\mathbf{i}_1, \mathbf{i}_2, \mathbf{i}_3$ also change with time according to

$$\frac{d}{dt} \mathbf{i}_j = \boldsymbol{\omega} \wedge \mathbf{i}_j, \qquad j = 1, 2, 3$$

Hence, $\dot{\mathbf{M}} = 0$ becomes, by the first of [44] and denoting $I\boldsymbol{\omega} = (I_1 \omega_1, I_2 \omega_2, I_3 \omega_3)$, the Euler equations $I \dot{\boldsymbol{\omega}} + \boldsymbol{\omega} \wedge I \boldsymbol{\omega} = 0$, or

$$I_1 \dot\omega_1 = (I_2 - I_3)\, \omega_2 \omega_3$$
$$I_2 \dot\omega_2 = (I_3 - I_1)\, \omega_3 \omega_1 \qquad [46]$$
$$I_3 \dot\omega_3 = (I_1 - I_2)\, \omega_1 \omega_2$$

which can be considered together with the conserved quantities [44].

Since angular momentum is conserved, it is convenient to introduce the laboratory frame $(O; x_0, y_0, z_0)$ with fixed axes $x_0, y_0, z_0$ and (see Figure 4):

1. $(O; x, y, z)$, the momentum frame with fixed axes, but with $z$-axis oriented as $\mathbf{M}$, and $x$-axis coinciding with the node (i.e., the intersection) of the $x_0$-$y_0$ plane and the $x$-$y$ plane (orthogonal to $\mathbf{M}$). Therefore, $x, y, z$ is determined by the two Euler angles $\zeta, \gamma$ of $(O; x, y, z)$ in $(O; x_0, y_0, z_0)$;
2. $(O; 1, 2, 3)$, the comoving frame, that is, the frame fixed with the body, and with unit vectors $\mathbf{i}_1, \mathbf{i}_2, \mathbf{i}_3$ parallel to the principal axes of the body. The frame is determined by three Euler angles $\theta_0, \varphi_0, \psi_0$;
3. the Euler angles of $(O; 1, 2, 3)$ with respect to $(O; x, y, z)$, which are denoted $\theta, \varphi, \psi$;
4. $G$, the total angular momentum: $G^2 = \sum_j I_j^2 \omega_j^2$;
5. $M_3$, the angular momentum along the $z_0$ axis; $M_3 = G \cos \zeta$; and
6. $L$, the projection of $\mathbf{M}$ on the axis 3, $L = G \cos \theta$.

The quantities $G, M_3, L, \varphi, \gamma, \psi$ determine $\theta_0, \varphi_0, \psi_0$ and $\dot\theta_0, \dot\varphi_0, \dot\psi_0$, or the $p_{\theta_0}, p_{\varphi_0}, p_{\psi_0}$ variables conjugated to $\theta_0, \varphi_0, \psi_0$ as shown by the following comment.

Considering Figure 4, the angles $\zeta, \gamma$ determine location, in the fixed frame $(O; x_0, y_0, z_0)$ of the direction of $\mathbf{M}$ and the node line $m$, which are, respectively, the $z$-axis and the $x$-axis of the fixed frame associated with the angular momentum; the angles $\theta, \varphi, \psi$ then determine the position of the comoving frame with respect to the fixed frame $(O; x, y, z)$, hence its position with respect to $(O; x_0, y_0, z_0)$, that is, $(\theta_0, \varphi_0, \psi_0)$. From this and $G$, it is possible to determine $\boldsymbol{\omega}$ because

$$\cos \theta = \frac{I_3 \omega_3}{G}, \qquad \tan \psi = \frac{I_2 \omega_2}{I_1 \omega_1}, \qquad \omega_2^2 = I_2^{-2} \big( G^2 - I_1^2 \omega_1^2 - I_3^2 \omega_3^2 \big) \qquad [47]$$

and, from [43], $\dot\theta_0, \dot\varphi_0, \dot\psi_0$ are determined.

Figure 4 The laboratory frame, the angular momentum frame, and the comoving frame (and the Deprit angles).
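The statement that the kinetic energy $K$ and the total angular momentum $G^2 = \sum_j I_j^2 \omega_j^2$ are conserved along solutions of the Euler equations [46] can be checked numerically. The sketch below is a minimal illustration: the inertia moments, the initial angular velocity, the step size and the fourth-order Runge-Kutta scheme are all arbitrary choices made here, and the function names are hypothetical.

```python
def euler_rhs(w, I):
    """Right-hand side of the Euler equations for the body-frame components w."""
    w1, w2, w3 = w
    I1, I2, I3 = I
    return ((I2 - I3) * w2 * w3 / I1,
            (I3 - I1) * w3 * w1 / I2,
            (I1 - I2) * w1 * w2 / I3)

def rk4_step(w, I, dt):
    """One fourth-order Runge-Kutta step for the Euler equations."""
    def shifted(a, b, s):
        return tuple(x + s * y for x, y in zip(a, b))
    k1 = euler_rhs(w, I)
    k2 = euler_rhs(shifted(w, k1, dt / 2), I)
    k3 = euler_rhs(shifted(w, k2, dt / 2), I)
    k4 = euler_rhs(shifted(w, k3, dt), I)
    return tuple(x + dt * (a + 2 * b + 2 * c + d) / 6
                 for x, a, b, c, d in zip(w, k1, k2, k3, k4))

def kinetic_energy(w, I):
    """K = (I1 w1^2 + I2 w2^2 + I3 w3^2) / 2."""
    return 0.5 * sum(Ij * wj * wj for Ij, wj in zip(I, w))

def momentum_squared(w, I):
    """G^2 = sum over j of (Ij wj)^2."""
    return sum((Ij * wj) ** 2 for Ij, wj in zip(I, w))

def drift(w0=(1.0, 0.2, 0.5), I=(1.0, 2.0, 3.0), dt=1e-3, steps=5000):
    """Absolute change of K and G^2 between the initial and the final time."""
    w = w0
    for _ in range(steps):
        w = rk4_step(w, I, dt)
    return (abs(kinetic_energy(w, I) - kinetic_energy(w0, I)),
            abs(momentum_squared(w, I) - momentum_squared(w0, I)))
```

The components $\omega_j$ themselves vary in time while both returned drifts stay at the level of the integration error, which is the content of the remark that angular momentum conservation does not make the $\omega_j$ constant.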
The Lagrangian [45] gives immediately (after expressing $\boldsymbol{\omega}$, i.e., $n, z, \mathbf{i}_3$, in terms of the Euler angles $\theta_0, \varphi_0, \psi_0$) an expression for the variables $p_{\theta_0}, p_{\varphi_0}, p_{\psi_0}$ conjugated to $\theta_0, \varphi_0, \psi_0$:

$$p_{\theta_0} = \mathbf{M} \cdot n_0, \qquad p_{\varphi_0} = \mathbf{M} \cdot z_0, \qquad p_{\psi_0} = \mathbf{M} \cdot \mathbf{i}_3 \qquad [48]$$

and, in principle, we could proceed to compute the Hamiltonian.

However, the computation can be avoided because of the very remarkable property (DEPRIT), which can be checked with some patience, making use of [48] and of elementary spherical trigonometry identities,

$$M_3\, d\gamma + G\, d\varphi + L\, d\psi = p_{\varphi_0}\, d\varphi_0 + p_{\psi_0}\, d\psi_0 + p_{\theta_0}\, d\theta_0 \qquad [49]$$

which means that the map $((M_3, \gamma), (L, \psi), (G, \varphi)) \to ((p_{\theta_0}, \theta_0), (p_{\varphi_0}, \varphi_0), (p_{\psi_0}, \psi_0))$ is a canonical map. And in the new coordinates, the kinetic energy, hence the Hamiltonian, takes the form

$$K = \frac{1}{2} \left[ \frac{L^2}{I_3} + (G^2 - L^2) \Big( \frac{\sin^2 \psi}{I_1} + \frac{\cos^2 \psi}{I_2} \Big) \right] \qquad [50]$$

This again shows that $G, M_3$ are constants of motion, and the $L, \psi$ variables are determined by a quadrature, because the Hamilton equation for $\psi$ combined with the energy conservation yields

$$\dot\psi = \pm \Big( \frac{1}{I_3} - \frac{\sin^2 \psi}{I_1} - \frac{\cos^2 \psi}{I_2} \Big) \sqrt{ \frac{2E - G^2 \Big( \dfrac{\sin^2 \psi}{I_1} + \dfrac{\cos^2 \psi}{I_2} \Big)}{\dfrac{1}{I_3} - \dfrac{\sin^2 \psi}{I_1} - \dfrac{\cos^2 \psi}{I_2}} } \qquad [51]$$

In the integrability region, this motion is periodic with some period $T_L(E, G)$. Once $\psi(t)$ is determined, the Hamilton equation for $\varphi$ leads to the further quadrature

$$\dot\varphi = G \Big( \frac{\sin^2 \psi(t)}{I_1} + \frac{\cos^2 \psi(t)}{I_2} \Big) \qquad [52]$$

which determines a second periodic motion with period $T_G(E, G)$. The $\gamma, M_3$ are constants and, therefore, the motion takes place on three-dimensional invariant tori $\mathcal{T}_{E, G, M_3}$ in phase space, each of which is always foliated into two-dimensional invariant tori parametrized by the angle $\gamma$ which is constant (by [50], because $K$ is $M_3$-independent): the latter are in turn foliated by one-dimensional invariant tori, that is, by periodic orbits, with $E, G$ such that the value of $T_L(E, G) / T_G(E, G)$ is rational.

Note that if $I_1 = I_2 = I$, the above analysis is extremely simplified. Furthermore, if gravity $g$ acts on the system the Hamiltonian will simply change by the addition of a potential $m g z$ if $z$ is the height of the center of mass. Then (see Figure 4), if the center of mass of the body is on the axis $\mathbf{i}_3$ and $z = h \cos \theta_0$, and $h$ is the distance of the center of mass from $O$, since $\cos \theta_0 = \cos \theta \cos \zeta - \sin \theta \sin \zeta \cos \varphi$, the Hamiltonian will become $H = K + m g h \cos \theta_0$ or

$$H = \frac{L^2}{2 I_3} + \frac{G^2 - L^2}{2 I} + m g h \left( \frac{M_3 L}{G^2} - \Big( 1 - \frac{M_3^2}{G^2} \Big)^{1/2} \Big( 1 - \frac{L^2}{G^2} \Big)^{1/2} \cos \varphi \right) \qquad [53]$$

so that, again, the system is integrable by quadratures (with the roles of $\psi$ and $\varphi$ "interchanged" with respect to the previous case) in suitable regions of phase space. This is called Lagrange's gyroscope.

A less elementary integrable case is when the inertia moments are related as $I_1 = I_2 = 2 I_3$ and the center of mass is in the $\mathbf{i}_1$-$\mathbf{i}_2$ plane (rather than on the $\mathbf{i}_3$-axis) and only gravity acts, besides the constraint force on the pivot point $O$; this is called Kowalevskaia's gyroscope.

For more details, see Gallavotti (1983).

Other Quadratures

An interesting classical integrable motion is that of a point mass attracted by two equal-mass centers of gravitational attraction, or a point ideally constrained to move on the surface of a general ellipsoid.

New integrable systems have been discovered quite recently and have generated a wealth of new developments ranging from group theory (as integrable systems are closely related to symmetries) to partial differential equations.

It is convenient to extend the notion of integrability by stating that a system is integrable in a region $W$ of phase space if

1. there is a change of coordinates $(p, q) \in W \to \{A, \alpha, Y, y\} \in (U \times \mathbb{T}^\ell) \times (V \times R^m)$ where $U \subset R^\ell$, $V \subset R^m$, with $\ell + m \ge 1$, are open sets; and
2. the $A, Y$ are constants of motion while the other coordinates vary "linearly":

$$(\alpha, y) \to (\alpha + \boldsymbol{\omega}(A, Y)\, t,\ y + v(A, Y)\, t) \qquad [54]$$

where $\boldsymbol{\omega}(A, Y)$, $v(A, Y)$ are smooth functions.

In the new sense, the systems studied in the previous sections are integrable in much wider regions (essentially on the entire phase space with the exception of a set of data which lie on lower-dimensional surfaces
forming sets of zero volume). The notion is convenient also because it allows us to say that even the systems of free particles are integrable.

Two very remarkable systems integrable in the new sense are the Hamiltonian systems, respectively called Toda lattice (KRUSKAL, ZABUSKY), and Calogero lattice (CALOGERO, MOSER); if $(p_i, q_i) \in \mathbb{R}^2$, they are

$$H_T(\mathbf{p}, \mathbf{q}) = \frac{1}{2m} \sum_{i=1}^{n} p_i^2 + \sum_{i=1}^{n-1} g\, e^{-\kappa (q_{i+1} - q_i)}$$

$$H_C(\mathbf{p}, \mathbf{q}) = \frac{1}{2m} \sum_{i=1}^{n} p_i^2 + \sum_{i<j}^{n} \frac{g}{(q_i - q_j)^2} + \frac{1}{2} \sum_{i=1}^{n} m \omega^2 q_i^2 \tag{55}$$

where $m > 0$ and $\kappa, \omega, g \ge 0$. They describe the motion of $n$ interacting particles on a line.

The integration method for the above systems is again to find first the constants of motion and later to look for quadratures, when appropriate. The constants of motion can be found with the method of the Lax pairs. One shows that there is a pair of self-adjoint $n \times n$ matrices $M(\mathbf{p}, \mathbf{q})$, $N(\mathbf{p}, \mathbf{q})$ such that the equations of motion become

$$\frac{d}{dt} M(\mathbf{p}, \mathbf{q}) = i\, [M(\mathbf{p}, \mathbf{q}), N(\mathbf{p}, \mathbf{q})], \qquad i = \sqrt{-1} \tag{56}$$

which imply that $M(t) = U(t) M(0) U(t)^{-1}$, with $U(t)$ a unitary matrix. When the equations can be written in the above form, it is clear that the $n$ eigenvalues of the matrix $M(0) = M(\mathbf{p}_0, \mathbf{q}_0)$ are constants of motion. When appropriate (e.g., in the Calogero lattice case with $\omega > 0$), it is possible to proceed to find canonical action-angle coordinates: a task that is quite difficult due to the arbitrariness of $n$, but which is possible.

The Lax pairs for the Calogero lattice (with $\omega = 0$, $g = m = 1$) are

$$M_{hh} = p_h, \qquad N_{hh} = \sum_{l \ne h} \frac{1}{(q_h - q_l)^2}$$

$$M_{hk} = \frac{i}{q_h - q_k}, \qquad N_{hk} = -\frac{1}{(q_h - q_k)^2}, \qquad h \ne k \tag{57}$$

(for $n = 2$ the diagonal part of $N$ is a multiple of the identity and can be dropped), while for the Toda lattice (with $m = g = \frac{1}{2}\kappa = 1$) the nonzero matrix elements of $M$, $N$ are

$$M_{hh} = p_h, \qquad M_{h,h+1} = M_{h+1,h} = e^{q_h - q_{h+1}}$$

$$N_{h,h+1} = -N_{h+1,h} = -i\, e^{q_h - q_{h+1}} \tag{58}$$

which are checked by first trying the case $n = 2$.

Another integrable system (SUTHERLAND) is

$$H_S(\mathbf{p}, \mathbf{q}) = \frac{1}{2m} \sum_{k=1}^{n} p_k^2 + \sum_{h<k}^{n} \frac{g}{\sinh^2 (q_h - q_k)} \tag{59}$$

whose Lax pair is related to that of the Calogero lattice.

By taking suitable limits as $n \to \infty$ and as the other parameters tend to $0$ or $\infty$ at suitable rates, integrability of a few differential equations, among which the Korteweg-de Vries equation or the nonlinear Schrödinger equation, can be derived.

As mentioned in the introductory section, symmetry properties under continuous groups imply existence of constants of motion. Hence, it is natural to think that integrability of a mechanical system reflects enough symmetry to imply the existence of as many constants of motion, independent and in involution, as the number of degrees of freedom, $n$.

This is in fact always true, and in some respects it is a tautological statement in the anisochronous cases. Integrability in a region $W$ implies existence of canonical action-angle coordinates $(\mathbf{A}, \boldsymbol{\alpha})$ (see the section "Quasiperiodicity and integrability") and the Hamiltonian depends solely on the $\mathbf{A}$'s: therefore, its restriction to $W$ is invariant with respect to the action of the continuous commutative group $\mathbb{T}^n$ of the translations of the angle variables. The actions can be seen as constants of motion whose existence follows from Noether's theorem, at least in the anisochronous cases in which the Hamiltonian formulation is equivalent to a Lagrangian one.

What is nontrivial is to recognize, prior to realizing integrability, that a system admits this kind of symmetry: in most of the interesting cases, the systems either do not exhibit obvious symmetries or they exhibit symmetries apparently unrelated to the group $\mathbb{T}^n$, which nevertheless imply existence of sufficiently many independent constants of motion as required for integrability. Hence, nontrivial integrable systems possess a "hidden" symmetry under $\mathbb{T}^n$: the rigid body is an example.

However, very often the symmetries of a Hamiltonian $H$ which imply integrability also imply partial isochrony, that is, they imply that the number of independent frequencies is smaller than $n$ (see the section "Quasiperiodicity and integrability"). Even in such cases, often a map exists from the original coordinates $(\mathbf{p}, \mathbf{q})$ to the integrating variables $(\mathbf{A}, \boldsymbol{\alpha})$ in which $\mathbf{A}$ are constants of motion and the $\boldsymbol{\alpha}$ are uniformly rotating angles (some of which are also constant) with spectrum $\boldsymbol{\omega}(\mathbf{A})$, which is the gradient $\partial_{\mathbf{A}} h(\mathbf{A})$ for some function $h(\mathbf{A})$ depending only on a few of the $\mathbf{A}$ coordinates. However, the map might fail to be canonical. The system is then said to be bi-Hamiltonian: in the sense that one can represent motions in two systems of canonical coordinates, not related by a canonical transformation, and by two Hamiltonian functions $H$ and $H' = h$ which generate the same motions in the respective coordinates (the latter changes of variables are sometimes called "canonical with respect to the pair $H$, $H'$" while the transformations considered in the section "Canonical transformations of phase space coordination" are called completely canonical).

For more details, we refer the reader to Calogero and Degasperis (1982).

Generic Nonintegrability

It is natural to try to prove that a system close to an integrable one has motions with properties very close to quasiperiodic. This is indeed the case, but in a rather subtle way. That there is a problem is easily seen in the case of a perturbation of an anisochronous integrable system.

Assume that a system is integrable in a region $W$ of phase space which, in the integrating action-angle variables $(\mathbf{A}, \boldsymbol{\alpha})$, has the standard form $U \times \mathbb{T}^\ell$ with a Hamiltonian $h(\mathbf{A})$ with gradient $\boldsymbol{\omega}(\mathbf{A}) = \partial_{\mathbf{A}} h(\mathbf{A})$. If the forces are perturbed by a potential which is smooth then the new system will be described, in the same coordinates, by a Hamiltonian like

$$H_\varepsilon(\mathbf{A}, \boldsymbol{\alpha}) = h(\mathbf{A}) + \varepsilon f(\mathbf{A}, \boldsymbol{\alpha}) \tag{60}$$

with $h$, $f$ analytic in the variables $\mathbf{A}$, $\boldsymbol{\alpha}$.

If the system really behaved like the unperturbed one, it ought to have constants of motion of the form $F_\varepsilon(\mathbf{A}, \boldsymbol{\alpha})$ analytic in $\varepsilon$ near $\varepsilon = 0$ and uniform, that is, single valued (which is the same as periodic) in the variables $\boldsymbol{\alpha}$. However, the following theorem (POINCARÉ) shows that this is a somewhat unlikely possibility.

Integrability in the sense (i), (ii) can be called analytic integrability and it is the strongest (and most naive) sense that can be given to the attribute.

The first part of the theorem, that is, (i), (ii), holds simply because, if integrability was assumed, a generating function of the integrating map would have the form $\mathbf{A}' \cdot \boldsymbol{\alpha} + \Phi_\varepsilon(\mathbf{A}', \boldsymbol{\alpha})$ with $\Phi_\varepsilon$ admitting a power series expansion in $\varepsilon$ as $\Phi_\varepsilon = \varepsilon \Phi_1 + \varepsilon^2 \Phi_2 + \cdots$. Hence, $\Phi_1$ would have to satisfy

$$\boldsymbol{\omega}(\mathbf{A}') \cdot \partial_{\boldsymbol{\alpha}} \Phi_1(\mathbf{A}', \boldsymbol{\alpha}) + f(\mathbf{A}', \boldsymbol{\alpha}) = \bar{f}(\mathbf{A}') \tag{61}$$

where $\bar{f}(\mathbf{A}')$ depends only on $\mathbf{A}'$ (hence integrating both sides with respect to $\boldsymbol{\alpha}$, it appears that $\bar{f}(\mathbf{A}')$ must coincide with the average of $f(\mathbf{A}', \boldsymbol{\alpha})$ over $\boldsymbol{\alpha}$).

This implies that the Fourier transform $f_{\boldsymbol{\nu}}(\mathbf{A})$, $\boldsymbol{\nu} \in \mathbb{Z}^\ell$, should satisfy

$$f_{\boldsymbol{\nu}}(\mathbf{A}') = 0 \quad \text{if } \boldsymbol{\omega}(\mathbf{A}') \cdot \boldsymbol{\nu} = 0, \ \boldsymbol{\nu} \ne \mathbf{0} \tag{62}$$

which is equivalent to the existence of $\tilde{f}_{\boldsymbol{\nu}}(\mathbf{A}')$ such that $f_{\boldsymbol{\nu}}(\mathbf{A}') = \boldsymbol{\omega}(\mathbf{A}') \cdot \boldsymbol{\nu}\, \tilde{f}_{\boldsymbol{\nu}}(\mathbf{A}')$ for $\boldsymbol{\nu} \ne \mathbf{0}$. But since there is no relation between $\boldsymbol{\omega}(\mathbf{A})$ and $f(\mathbf{A}, \boldsymbol{\alpha})$, this property "generically" will not hold in the sense that as close as wished to an $f$ which satisfies the property [62] there will be another $f$ which does not satisfy it essentially no matter how "closeness" is defined (e.g., with respect to the metric $\|f - g\| = \sum_{\boldsymbol{\nu}} |f_{\boldsymbol{\nu}}(\mathbf{A}) - g_{\boldsymbol{\nu}}(\mathbf{A})|$). This is so because the rank of $\partial^2_{\mathbf{A}\mathbf{A}} h(\mathbf{A})$ is higher than 1 and $\boldsymbol{\omega}(\mathbf{A})$ varies at least on a two-dimensional surface, so that $\boldsymbol{\omega} \cdot \boldsymbol{\nu} = 0$ becomes certainly possible for some $\boldsymbol{\nu} \ne \mathbf{0}$ while $f_{\boldsymbol{\nu}}(\mathbf{A})$ in general will not vanish, so that $\Phi_1$, hence $\Phi_\varepsilon$, does not exist.

This means that close to a function $f$ there is a function $f'$ which violates [62] for some $\boldsymbol{\nu}$. Of course, this depends on what is meant by "close": however, here essentially any topology introduced on the space of the functions $f$ will make the statement correct. This is the case, for instance, if the distance between two functions is defined by $\sum_{\boldsymbol{\nu}} \sup_{\mathbf{A} \in U} |f_{\boldsymbol{\nu}}(\mathbf{A}) - g_{\boldsymbol{\nu}}(\mathbf{A})|$ or by $\sup_{\mathbf{A}, \boldsymbol{\alpha}} |f(\mathbf{A}, \boldsymbol{\alpha}) - g(\mathbf{A}, \boldsymbol{\alpha})|$.

The idea behind the last statement of the theorem is in essence the same: consider, for simplicity, the anisochronous case in which the matrix $\partial^2_{\mathbf{A}\mathbf{A}} h(\mathbf{A})$

The condition that $B$ is a constant of motion can be written order by order in its expansion in $\varepsilon$: the first two orders are

$$\boldsymbol{\omega}(\mathbf{A}) \cdot \partial_{\boldsymbol{\alpha}} B_0(\mathbf{A}, \boldsymbol{\alpha}) = 0$$

$$\partial_{\mathbf{A}} f(\mathbf{A}, \boldsymbol{\alpha}) \cdot \partial_{\boldsymbol{\alpha}} B_0(\mathbf{A}, \boldsymbol{\alpha}) - \partial_{\boldsymbol{\alpha}} f(\mathbf{A}, \boldsymbol{\alpha}) \cdot \partial_{\mathbf{A}} B_0(\mathbf{A}, \boldsymbol{\alpha}) + \boldsymbol{\omega}(\mathbf{A}) \cdot \partial_{\boldsymbol{\alpha}} B_1(\mathbf{A}, \boldsymbol{\alpha}) = 0 \tag{64}$$
Then the above two relations and anisochrony imply (1) that $B_0$ must be a function of $\mathbf{A}$ only and (2) that $\boldsymbol{\omega}(\mathbf{A}) \cdot \boldsymbol{\nu}$ and $\partial_{\mathbf{A}} B_0(\mathbf{A}) \cdot \boldsymbol{\nu}$ vanish simultaneously for all $\boldsymbol{\nu}$. Hence, the gradient of $B_0$ must be proportional to $\boldsymbol{\omega}(\mathbf{A})$, that is, to the gradient of $h(\mathbf{A})$: $\partial_{\mathbf{A}} B_0(\mathbf{A}) = \lambda(\mathbf{A})\, \partial_{\mathbf{A}} h(\mathbf{A})$. Therefore, generically (because of the anisochrony) it must be that $B_0$ depends on $\mathbf{A}$ through $h(\mathbf{A})$: $B_0(\mathbf{A}) = F(h(\mathbf{A}))$ for some $F$.

Looking again, with the new information, at the second of [64] it follows that at fixed $\mathbf{A}$ the $\boldsymbol{\alpha}$-derivative in the direction $\boldsymbol{\omega}(\mathbf{A})$ of $B_1$ equals $F'(h(\mathbf{A}))$ times the $\boldsymbol{\alpha}$-derivative of $f$, that is, $B_1(\mathbf{A}, \boldsymbol{\alpha}) = f(\mathbf{A}, \boldsymbol{\alpha}) F'(h(\mathbf{A})) + C_1(\mathbf{A})$.

Summarizing: the constant of motion $B$ has been written as $B(\mathbf{A}, \boldsymbol{\alpha}) = F(h(\mathbf{A})) + \varepsilon F'(h(\mathbf{A})) f(\mathbf{A}, \boldsymbol{\alpha}) + \varepsilon C_1(\mathbf{A}) + \varepsilon^2 B_2 + \cdots$ which is equivalent to $B(\mathbf{A}, \boldsymbol{\alpha}) = F(H_\varepsilon) + \varepsilon (B'_0 + \varepsilon B'_1 + \cdots)$ and therefore $B'_0 + \varepsilon B'_1 + \cdots$ is another analytic constant of motion. Repeating the argument also $B'_0 + \varepsilon B'_1 + \cdots$ must have the form $F_1(H_\varepsilon) + \varepsilon (B''_0 + \varepsilon B''_1 + \cdots)$; conclusion

$$B = F(H_\varepsilon) + \varepsilon F_1(H_\varepsilon) + \varepsilon^2 F_2(H_\varepsilon) + \cdots + \varepsilon^n F_n(H_\varepsilon) + O(\varepsilon^{n+1}) \tag{65}$$

By analyticity, $B = F_\varepsilon(H_\varepsilon(\mathbf{A}, \boldsymbol{\alpha}))$ for some $F_\varepsilon$: hence generically all constants of motion are trivial.

Therefore, a system close to integrable cannot behave as it would naively be expected. The problem, however, was not manifest until POINCARÉ's proof of the above results: because in most applications the function $f$ has only finitely many Fourier components, or at least is replaced by an approximation with this property, so that at least [62] and even a few of the higher-order constraints like [64] become possible in open regions of action space. In fact, it may happen that the values of $\mathbf{A}$ of interest are restricted so that $\boldsymbol{\omega}(\mathbf{A}) \cdot \boldsymbol{\nu} = 0$ only for large values of $\boldsymbol{\nu}$ for which $f_{\boldsymbol{\nu}} = 0$. Nevertheless, the property that $f_{\boldsymbol{\nu}}(\mathbf{A}) = (\boldsymbol{\omega}(\mathbf{A}) \cdot \boldsymbol{\nu})\, \tilde{f}_{\boldsymbol{\nu}}(\mathbf{A})$ (or the analogous higher-order conditions, e.g., [64]), which we have seen to be necessary for analytic integrability of the perturbed system, can be checked to fail in important problems, if no approximation is made on $f$. Hence a conceptual problem arises.

For more details see Poincaré (1987).

Perturbing Functions

To check, in a given problem, the nonexistence of nontrivial constants of motion along the lines indicated in the previous section, it is necessary to express the potential, usually given in Cartesian coordinates as $\varepsilon V(\mathbf{x})$, in terms of the action-angle variables of the unperturbed, integrable, system.

In particular, the problem arises when trying to check nonexistence of nontrivial constants of motion when the anisochrony assumption (cf. the previous section) is not satisfied. Usually it becomes satisfied "to second order" (or higher): but to show this, a more detailed information on the structure of the perturbing function expressed in action-angle variables is needed. For instance, this is often necessary even when the perturbation is approximated by a trigonometric polynomial, as it is essentially always the case in celestial mechanics.

Finding explicit expressions for the action-angle variables is in itself a rather nontrivial task which leads to many problems of intrinsic interest even in seemingly simple cases. For instance, in the case of the planar gravitational central motion, the Kepler equation $\ell = \xi - \varepsilon \sin \xi$ (see the first of [41]) must be solved expressing $\xi$ in terms of $\ell$ (see the first of [42]). It is obvious that for small $\varepsilon$, the variable $\xi$ can be expressed as an analytic function of $\varepsilon$: nevertheless, the actual construction of this expression leads to several problems. For small $\varepsilon$, an interesting algorithm is the following.

Let $h(\ell) = \xi - \ell$, so that the equation to solve (i.e., the first of [41]) is

$$h(\ell) = \varepsilon \sin(\ell + h(\ell)) \equiv -\varepsilon\, \frac{\partial c}{\partial \ell}(\ell + h(\ell)) \tag{66}$$

where $c(\lambda) = \cos \lambda$; the function $\ell \to h(\ell)$ should be periodic in $\ell$, with period $2\pi$, and analytic in $\varepsilon$, $\ell$ for $\varepsilon$ small and $\ell$ real. If $h(\ell) = \varepsilon h^{(1)} + \varepsilon^2 h^{(2)} + \cdots$, the Fourier transform of $h^{(k)}(\ell)$ satisfies the recursion relation

$$h^{(k)}_\nu = -\sum_{p=1}^{\infty} \frac{1}{p!} \sum_{\substack{k_1 + \cdots + k_p = k-1 \\ \nu_0 + \nu_1 + \cdots + \nu_p = \nu}} (i\nu_0)\, c_{\nu_0}\, (i\nu_0)^p \prod_{j=1}^{p} h^{(k_j)}_{\nu_j}, \qquad k > 1 \tag{67}$$

with $c_\nu$ the Fourier transform of the cosine ($c_{\pm 1} = \frac{1}{2}$, $c_\nu = 0$ if $\nu \ne \pm 1$), and (of course) $h^{(1)}_\nu = -i\nu c_\nu$.

Equation [67] is obtained by expanding the RHS of [66] in powers of $h$ and then taking the Fourier transform of both sides retaining only terms of order $k$ in $\varepsilon$.

Iterating the above relation, imagine drawing all trees $\vartheta$ with $k$ "branches", or "lines", distinguished by a label taking $k$ values, and $k$ nodes and attach to each node $v$ a harmonic label $\nu_v = \pm 1$ as in Figure 5. The trees will be assumed to start with a root line $vr$ linking a point $r$ and the first node $v$ (see Figure 5) and then bifurcate arbitrarily (such trees are sometimes called "rooted trees").

Figure 5 An example of a tree graph and its labels. It contains only one simple node (3). Harmonics are indicated next to their nodes. Labels distinguishing lines are not marked.

Imagine the tree oriented from the endpoints towards the root $r$ (not to be considered a node) and given a node $v$ call $v'$ the node immediately following it. If $v$ is the first node before the root $r$, let $v' = r$ and $\nu_{v'} = 1$. For each such decorated tree define its numerical value

$$\mathrm{Val}(\vartheta) = \frac{-i}{k!} \prod_{\text{lines } l = v'v} (\nu_{v'} \nu_v) \prod_{\text{nodes}} c_{\nu_v} \tag{68}$$

and define a current $\nu(l)$ on a line $l = v'v$ to be the sum of the harmonics of the nodes preceding $v'$: $\nu(l) = \sum_{w \le v} \nu_w$. Call $\nu(\vartheta)$ the current flowing in the root branch and call order of $\vartheta$ the number of nodes (or branches). Then

$$h^{(k)}_\nu = \sum_{\substack{\vartheta,\ \nu(\vartheta) = \nu \\ \mathrm{order}(\vartheta) = k}} \mathrm{Val}(\vartheta) \tag{69}$$

provided trees are considered identical if they can be overlapped (labels included) after suitably scaling the lengths of their branches and pivoting them around the nodes out of which they emerge (the root is always imagined to be fixed at the origin).

If the trees are stripped of the harmonic labels, their number is finite and it can be estimated to be $\le k!\, 4^k$ (because the labels which distinguish the lines can be attached to an unlabeled tree in many ways). The harmonic labels (i.e., $\nu_v = \pm 1$) can be laid down in $2^k$ ways, and the value of each tree can be bounded by $(1/k!)\, 2^{-k}$ (because $c_{\pm 1} = \frac{1}{2}$).

Hence $\sum_\nu |h^{(k)}_\nu| \le 4^k$, which gives a (rough) estimate of the radius of convergence of the expansion of $h$ in powers of $\varepsilon$: namely 0.25 (easily improvable to 0.3678 if $4^k k!$ is replaced by $k^{k-1}$ using Cayley's formula for the enumeration of rooted trees). A simple expression for $h^{(k)}(\ell)$ (LAGRANGE) is

$$h^{(k)}(\ell) = \frac{1}{k!}\, \partial_\ell^{\,k-1} \sin^k \ell$$

(also readable from the tree representation): the actual radius of convergence, first determined by Laplace, of the series for $h$ can also be determined from the latter expression for $h$ (ROUCHÉ) or directly from the tree representation: it is $\approx 0.6627$.

One can find better estimates or at least more efficient methods for evaluating the sums in [69]: in fact, in performing the sum in [69] important cancellations occur. For instance, the harmonic labels can be subject to the further strong constraint that no line carries zero current because the sum of the values of the trees of fixed order and with at least one line carrying zero current vanishes.

The above expansion can also be simplified by "partial resummations". For the purpose of an example, let the nodes with one entering and one exiting line (see Figure 5) be called "simple" nodes. Then all tree graphs which, on any line between two nonsimple nodes, contain any number of simple nodes can be eliminated. This is done by replacing, in evaluating the (remaining) tree values, the factors $\nu_{v'} \nu_v$ in [68] by $\nu_{v'} \nu_v / (1 - \varepsilon \cos \ell)$: then the value of $\vartheta$ (denoted $\overline{\mathrm{Val}}(\vartheta)$) for a tree $\vartheta$ becomes a function of $\ell$ and $\varepsilon$ and [69] is replaced by

$$h(\ell) = \sum_{k=1}^{\infty} \varepsilon^k \sum^{*}_{\substack{\vartheta,\ \nu(\vartheta) = \nu \\ \mathrm{order}(\vartheta) = k}} e^{i \nu \ell}\, \overline{\mathrm{Val}}(\vartheta) \tag{70}$$

where the $*$ means that the trees are subject to the further restriction of not containing any simple node. It should be noted that the above graphical representation of the solution of the Kepler equation is strongly reminiscent of the representations of quantities in terms of graphs that occur often in quantum field theory. Here the trees correspond to Feynman graphs, the factors associated with the nodes are the couplings, the factors associated with the lines are the propagators, and the resummations are analogous to the self-energy resummations, while the cancellations mentioned above can be related to the class of identities called Ward identities. Not only can the analogy be shown not to be superficial, but it also turns out to be very helpful in key mechanical problems: see Appendix 1.

The existence of a vast number of identities relating the tree values is shown already by the simple form of the Lagrange series and by the even more remarkable resummation (LEVI-CIVITA) leading to

$$h(\ell) = \sum_{k=1}^{\infty} \frac{(\varepsilon \sin \ell)^k}{k!} \left( \frac{1}{1 - \varepsilon \cos \ell}\, \partial_\ell \right)^{k} \ell \tag{71}$$
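The Lagrange expression $h^{(k)}(\ell) = \frac{1}{k!} \partial_\ell^{k-1} \sin^k \ell$ quoted above can be checked numerically against a direct solution of the Kepler equation. The following is only a minimal sketch under stated assumptions: the function names (`lagrange_term`, `h_series`, `h_newton`), the truncation order and the use of Newton's method as the reference solution are choices made here for illustration and are not taken from the article. The derivative is evaluated by expanding $\sin^k \ell$ into Fourier modes, so that each mode $e^{i\nu\ell}$ is simply multiplied by $(i\nu)^{k-1}$.

```python
import cmath
import math


def lagrange_term(k, ell):
    """Return h^(k)(ell) = (1/k!) d^(k-1)/d ell^(k-1) [sin(ell)]^k.

    sin^k is expanded into Fourier modes exp(i*nu*ell) with nu = k - 2j,
    so the (k-1)-th derivative multiplies each mode by (i*nu)^(k-1).
    """
    total = 0j
    for j in range(k + 1):
        nu = k - 2 * j
        # Fourier coefficient of sin^k on the mode exp(i*nu*ell); 2j is the complex number 2i
        coeff = math.comb(k, j) * (-1) ** j / (2j) ** k
        total += coeff * (1j * nu) ** (k - 1) * cmath.exp(1j * nu * ell)
    return total.real / math.factorial(k)


def h_series(eps, ell, order=30):
    """Partial sum of h(ell) = sum_k eps^k h^(k)(ell), truncated at `order`."""
    return sum(eps ** k * lagrange_term(k, ell) for k in range(1, order + 1))


def h_newton(eps, ell, tol=1e-14):
    """Solve xi - eps*sin(xi) = ell by Newton's method and return h = xi - ell."""
    xi = ell
    for _ in range(100):
        step = (xi - eps * math.sin(xi) - ell) / (1.0 - eps * math.cos(xi))
        xi -= step
        if abs(step) < tol:
            break
    return xi - ell
```

For eccentricities well inside the Laplace radius (for example `eps = 0.3`) the truncated series and the Newton solution should agree closely, while for `eps` approaching 0.6627 the partial sums are expected to converge more and more slowly.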
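It is even possible to further collect the series terms to express it as a series with much better convergence properties; for instance, its terms can be reorganized and collected (resummed) so that $h$ is expressed as a power series in the parameter

$$\eta = \frac{\varepsilon\, e^{\sqrt{1 - \varepsilon^2}}}{1 + \sqrt{1 - \varepsilon^2}} \tag{72}$$

with radius of convergence 1, which corresponds to $\varepsilon = 1$ (via a simple argument by Levi-Civita). The analyticity domain for the Lagrange series is $|\eta| < 1$. This also determines the value of Laplace radius, which is the point closest to the origin of the complex curve $|\eta(\varepsilon)| = 1$: it is imaginary so that it is the root of the equation

$$\frac{\varepsilon\, e^{\sqrt{1 + \varepsilon^2}}}{1 + \sqrt{1 + \varepsilon^2}} = 1$$

The analysis provides an example, in a simple case of great interest in applications, of the kind of computations actually necessary to represent the perturbing function in terms of action-angle variables. The property that the function $c(\lambda)$ in [66] is the cosine has been used only to limit the range of the label $\nu$ to be $\pm 1$; hence the same method, with similar results, can be applied to study the inversion of the relation between the average anomaly $\ell$ and the true anomaly $\theta$ and to efficiently obtain, for instance, the properties of $f$, $g$ in [42].

For more details, the reader is referred to Levi-Civita (1956).

Lindstedt and Birkhoff Series: Divergences

Nonexistence of constants of motion, rather than being the end of the attempts to study motions close to integrable ones by perturbation methods, marks the beginning of renewed efforts to understand their nature.

Let $(\mathbf{A}, \boldsymbol{\alpha}) \in U \times \mathbb{T}^\ell$ be action-angle variables defined in the integrability region for an analytic Hamiltonian and let $h(\mathbf{A})$ be its value in the action-angle coordinates. Suppose that $h(\mathbf{A})$ is anisochronous and let $f(\mathbf{A}, \boldsymbol{\alpha})$ be an analytic perturbing function. Consider, for $\varepsilon$ small, the Hamiltonian $H_\varepsilon(\mathbf{A}, \boldsymbol{\alpha}) = H_0(\mathbf{A}) + \varepsilon f(\mathbf{A}, \boldsymbol{\alpha})$.

Let $\boldsymbol{\omega}_0 = \boldsymbol{\omega}(\mathbf{A}_0) \equiv \partial_{\mathbf{A}} H_0(\mathbf{A}_0)$ be the frequency spectrum (see the section "Quasiperiodicity and integrability") of one of the invariant tori of the unperturbed system corresponding to an action $\mathbf{A}_0$. Short of integrability, the question to ask at this point is whether the perturbed system admits an analytic invariant torus on which the motion is quasiperiodic and

1. has the same spectrum $\boldsymbol{\omega}_0$,
2. depends analytically on $\varepsilon$ at least for $\varepsilon$ small,
3. reduces to the unperturbed torus $\{\mathbf{A}_0\} \times \mathbb{T}^\ell$ as $\varepsilon \to 0$.

More concretely, the question is:

Are there functions $\mathbf{H}_\varepsilon(\boldsymbol{\psi})$, $\mathbf{h}_\varepsilon(\boldsymbol{\psi})$ analytic in $\boldsymbol{\psi} \in \mathbb{T}^\ell$ and in $\varepsilon$ near 0, vanishing as $\varepsilon \to 0$ and such that the torus with parametric equations

$$\mathbf{A} = \mathbf{A}_0 + \mathbf{H}_\varepsilon(\boldsymbol{\psi}), \qquad \boldsymbol{\alpha} = \boldsymbol{\psi} + \mathbf{h}_\varepsilon(\boldsymbol{\psi}), \qquad \boldsymbol{\psi} \in \mathbb{T}^\ell \tag{73}$$

is invariant and, if $\boldsymbol{\omega}_0 \stackrel{\text{def}}{=} \boldsymbol{\omega}(\mathbf{A}_0)$, the motion on it is simply $\boldsymbol{\psi} \to \boldsymbol{\psi} + \boldsymbol{\omega}_0 t$, i.e., it is quasiperiodic with spectrum $\boldsymbol{\omega}_0$?

In this context, Poincaré's theorem (in the section "Generic nonintegrability") had followed another key result, earlier developed in particular cases and completed by him, which provides a partial answer to the question.

Suppose that $\boldsymbol{\omega}_0 = \boldsymbol{\omega}(\mathbf{A}_0) \in \mathbb{R}^\ell$ satisfies a Diophantine property, namely suppose that there exist constants $C, \tau > 0$ such that

$$|\boldsymbol{\omega}_0 \cdot \boldsymbol{\nu}| \ge \frac{1}{C |\boldsymbol{\nu}|^\tau}, \qquad \text{for all } \mathbf{0} \ne \boldsymbol{\nu} \in \mathbb{Z}^\ell \tag{74}$$

which, for each $\tau > \ell - 1$ fixed, is a property enjoyed by all $\boldsymbol{\omega} \in \mathbb{R}^\ell$ but for a set of zero measure. Then the motions on the unperturbed torus run over trajectories that fill the torus densely because of the "irrationality" of $\boldsymbol{\omega}_0$ implied by [74]. Writing Hamilton's equations,

$$\dot{\boldsymbol{\alpha}} = \partial_{\mathbf{A}} H_0(\mathbf{A}) + \varepsilon\, \partial_{\mathbf{A}} f(\mathbf{A}, \boldsymbol{\alpha}), \qquad \dot{\mathbf{A}} = -\varepsilon\, \partial_{\boldsymbol{\alpha}} f(\mathbf{A}, \boldsymbol{\alpha})$$

with $\mathbf{A}$, $\boldsymbol{\alpha}$ given by [73] with $\boldsymbol{\psi}$ replaced by $\boldsymbol{\psi} + \boldsymbol{\omega}_0 t$, and using the density of the unperturbed trajectories implied by [74], the condition that [73] are equations for an invariant torus on which the motion is $\boldsymbol{\psi} \to \boldsymbol{\psi} + \boldsymbol{\omega}_0 t$ are

$$\boldsymbol{\omega}_0 + (\boldsymbol{\omega}_0 \cdot \partial_{\boldsymbol{\psi}})\, \mathbf{h}_\varepsilon(\boldsymbol{\psi}) = \partial_{\mathbf{A}} H_0(\mathbf{A}_0 + \mathbf{H}_\varepsilon(\boldsymbol{\psi})) + \varepsilon\, \partial_{\mathbf{A}} f(\mathbf{A}_0 + \mathbf{H}_\varepsilon(\boldsymbol{\psi}), \boldsymbol{\psi} + \mathbf{h}_\varepsilon(\boldsymbol{\psi}))$$

$$(\boldsymbol{\omega}_0 \cdot \partial_{\boldsymbol{\psi}})\, \mathbf{H}_\varepsilon(\boldsymbol{\psi}) = -\varepsilon\, \partial_{\boldsymbol{\alpha}} f(\mathbf{A}_0 + \mathbf{H}_\varepsilon(\boldsymbol{\psi}), \boldsymbol{\psi} + \mathbf{h}_\varepsilon(\boldsymbol{\psi})) \tag{75}$$

The theorem referred to above (POINCARÉ) is that

Theorem 2 If the unperturbed system is anisochronous and $\boldsymbol{\omega}_0 = \boldsymbol{\omega}(\mathbf{A}_0)$ satisfies [74] for some $C, \tau > 0$ there exist two well defined power series $\mathbf{h}_\varepsilon(\boldsymbol{\psi}) = \sum_{k=1}^{\infty} \varepsilon^k \mathbf{h}^{(k)}(\boldsymbol{\psi})$ and $\mathbf{H}_\varepsilon(\boldsymbol{\psi}) = \sum_{k=1}^{\infty} \varepsilon^k \mathbf{H}^{(k)}(\boldsymbol{\psi})$ which solve [75] to all orders in $\varepsilon$. The series for $\mathbf{H}_\varepsilon$ is uniquely determined, and such is also the series for $\mathbf{h}_\varepsilon$ up to the addition of an arbitrary constant at each order, so that it is unique if $\mathbf{h}_\varepsilon$ is required, as henceforth done with no loss of generality, to have zero average over $\boldsymbol{\psi}$.

The algorithm for the construction is illustrated in a simple case in the next section (see eqns [83], [84]). Convergence of the above series, called Lindstedt series, even for small $\varepsilon$ has been a problem for rather a long time. Poincaré proved the existence of the formal solution; but his other result, discussed in the section "Generic nonintegrability", casts doubts on convergence although it does not exclude it, as was immediately stressed by several authors (including Poincaré himself). The result in that section shows the impossibility of solving [75] for all $\boldsymbol{\omega}_0$'s near a given spectrum, analytically and uniformly, but it does not exclude the possibility of solving it for a single $\boldsymbol{\omega}_0$.

The theorem admits several extensions or analogs: an interesting one is to the case of isochronous unperturbed systems:

Given the Hamiltonian $H_\varepsilon(\mathbf{A}, \boldsymbol{\alpha}) = \boldsymbol{\omega}_0 \cdot \mathbf{A} + \varepsilon f(\mathbf{A}, \boldsymbol{\alpha})$, with $\boldsymbol{\omega}_0$ satisfying [74] and $f$ analytic, there exist power series $C_\varepsilon(\mathbf{A}', \boldsymbol{\alpha}')$, $u_\varepsilon(\mathbf{A}')$ such that $H_\varepsilon(C_\varepsilon(\mathbf{A}', \boldsymbol{\alpha}')) = \boldsymbol{\omega}_0 \cdot \mathbf{A}' + u_\varepsilon(\mathbf{A}')$ holds as an equality between formal power series (i.e., order by order in $\varepsilon$) and at the same time the $C_\varepsilon$, regarded as a map, satisfies order by order the condition (i.e., (4.3)) that it is a canonical map.

This means that there is a generating function $\mathbf{A}' \cdot \boldsymbol{\alpha} + F_\varepsilon(\mathbf{A}', \boldsymbol{\alpha})$ also defined by a formal power series $F_\varepsilon(\mathbf{A}', \boldsymbol{\alpha}) = \sum_{k=1}^{\infty} \varepsilon^k F^{(k)}(\mathbf{A}', \boldsymbol{\alpha})$, that is, such that if $C_\varepsilon(\mathbf{A}', \boldsymbol{\alpha}') = (\mathbf{A}, \boldsymbol{\alpha})$ then it is true, order by order in powers of $\varepsilon$, that $\mathbf{A} = \mathbf{A}' + \partial_{\boldsymbol{\alpha}} F_\varepsilon(\mathbf{A}', \boldsymbol{\alpha})$ and $\boldsymbol{\alpha}' = \boldsymbol{\alpha} + \partial_{\mathbf{A}'} F_\varepsilon(\mathbf{A}', \boldsymbol{\alpha})$. The series for $F_\varepsilon$, $u_\varepsilon$ are called Birkhoff series.

In this isochronous case, if Birkhoff series were convergent for small $\varepsilon$ and $(\mathbf{A}', \boldsymbol{\alpha})$ in a region of the form $U \times \mathbb{T}^\ell$, with $U \subset \mathbb{R}^\ell$ open and bounded, it would follow that, for small $\varepsilon$, $H_\varepsilon$ would be integrable in a large region of phase space (i.e., where the generating function can be used to build a canonical map: this would essentially be $U \times \mathbb{T}^\ell$ deprived of a small layer of points near the boundary of $U$). However, convergence for small $\varepsilon$ is false (in general), as shown by the simple two-dimensional example

$$H_\varepsilon(\mathbf{A}, \boldsymbol{\alpha}) = \boldsymbol{\omega}_0 \cdot \mathbf{A} + \varepsilon\, (A_2 + f(\boldsymbol{\alpha})), \qquad (\mathbf{A}, \boldsymbol{\alpha}) \in \mathbb{R}^2 \times \mathbb{T}^2 \tag{76}$$

with $f(\boldsymbol{\alpha})$ an arbitrary analytic function with all Fourier coefficients $f_{\boldsymbol{\nu}}$ positive for $\boldsymbol{\nu} \ne \mathbf{0}$ and $f_{\mathbf{0}} = 0$. In the latter case, the solution is

$$u_\varepsilon(\mathbf{A}') = \varepsilon A'_2$$

$$F_\varepsilon(\mathbf{A}', \boldsymbol{\alpha}) = -\sum_{k=1}^{\infty} \varepsilon^k \sum_{\mathbf{0} \ne \boldsymbol{\nu} \in \mathbb{Z}^2} f_{\boldsymbol{\nu}}\, e^{i \boldsymbol{\alpha} \cdot \boldsymbol{\nu}}\, \frac{(-\nu_2)^{k-1}}{i\, (\omega_{01} \nu_1 + \omega_{02} \nu_2)^{k}} \tag{77}$$

The series does not converge: in fact, its convergence would imply integrability and, consequently, bounded trajectories in phase space: however, the equations of motion for [76] can be easily solved explicitly and in any open region near given initial data there are other data which have unbounded trajectories if $\omega_{01} / (\omega_{02} + \varepsilon)$ is rational.

Nevertheless, even in this elementary case a formal sum of the series yields

$$u(\mathbf{A}') = \varepsilon A'_2, \qquad F_\varepsilon(\mathbf{A}', \boldsymbol{\alpha}) = -\varepsilon \sum_{\mathbf{0} \ne \boldsymbol{\nu} \in \mathbb{Z}^2} \frac{f_{\boldsymbol{\nu}}\, e^{i \boldsymbol{\alpha} \cdot \boldsymbol{\nu}}}{i\, (\omega_{01} \nu_1 + (\omega_{02} + \varepsilon) \nu_2)} \tag{78}$$

and the series in [78] (no longer a power series in $\varepsilon$) is really convergent if $\boldsymbol{\omega} = (\omega_{01}, \omega_{02} + \varepsilon)$ is a Diophantine vector (by [74], because analyticity implies exponential decay of $|f_{\boldsymbol{\nu}}|$). Remarkably, for such values of $\varepsilon$ the Hamiltonian $H_\varepsilon$ is integrable and it is integrated by the canonical map generated by [78], in spite of the fact that [78] is obtained, from [77], via the nonrigorous sum rule

$$\sum_{k=0}^{\infty} z^k = \frac{1}{1 - z} \qquad \text{for } z \ne 1 \tag{79}$$

(applied to cases with $|z| \ge 1$, which are certainly realized for a dense set of $\varepsilon$'s even if $\boldsymbol{\omega}$ is Diophantine because the $z$'s have values $z = -\varepsilon \nu_2 / \boldsymbol{\omega}_0 \cdot \boldsymbol{\nu}$). In other words, the integration of the equations is elementary and once performed it becomes apparent that, if $\boldsymbol{\omega}$ is Diophantine, the solutions can be rigorously found from [78]. Note that, for instance, this means that relations like $\sum_{k=0}^{\infty} 2^k = -1$ are really used to obtain [78] from [77].

Another extension of Lindstedt series arises in a perturbation of an anisochronous system when asking the question as to what happens to the unperturbed invariant tori $\mathcal{T}_{\boldsymbol{\omega}_0}$ on which the spectrum is resonant, that is, $\boldsymbol{\omega}_0 \cdot \boldsymbol{\nu} = 0$ for some $\boldsymbol{\nu} \ne \mathbf{0}$, $\boldsymbol{\nu} \in \mathbb{Z}^\ell$. The result is that even in such a case there is a formal power series solution showing that at least a few of the (infinitely many) invariant tori into which $\mathcal{T}_{\boldsymbol{\omega}_0}$ is in turn foliated in the unperturbed case can be formally continued at $\varepsilon \ne 0$ (see the section "Resonances and their stability").

For more details, we refer the reader to Poincaré (1987).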
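The Diophantine property [74], which controls the divisors $\boldsymbol{\omega}_0 \cdot \boldsymbol{\nu}$ appearing in the series of this section, can be probed numerically on a finite range of harmonics. The sketch below is illustrative only: the function name `diophantine_constant`, the cutoff `n_max` and the choice $|\boldsymbol{\nu}| = \sum_i |\nu_i|$ for the norm are assumptions made here, and a finite scan can of course only suggest, not prove, that [74] holds for all $\boldsymbol{\nu}$.

```python
import math
from itertools import product


def diophantine_constant(omega, tau, n_max):
    """Smallest C with |omega . nu| >= 1 / (C |nu|^tau) over the scanned harmonics.

    The scan covers all nonzero integer vectors nu with |nu_i| <= n_max.
    |nu| is taken as the sum of the |nu_i| (an assumption of this sketch).
    Returns math.inf if an exact resonance omega . nu = 0 is met.
    """
    worst = 0.0
    for nu in product(range(-n_max, n_max + 1), repeat=len(omega)):
        if not any(nu):
            continue  # skip nu = 0, excluded in [74]
        divisor = abs(sum(w * n for w, n in zip(omega, nu)))
        if divisor == 0.0:
            return math.inf  # resonant spectrum: no finite C exists
        size = sum(abs(n) for n in nu)
        worst = max(worst, 1.0 / (divisor * size ** tau))
    return worst


if __name__ == "__main__":
    golden = (1.0, (math.sqrt(5.0) - 1.0) / 2.0)
    print(diophantine_constant(golden, 1.0, 200))     # stays bounded as n_max grows
    print(diophantine_constant((1.0, 0.5), 1.0, 10))  # resonant: inf
```

With the golden-mean spectrum the estimated constant stays bounded as the cutoff grows, whereas a rational frequency ratio produces an exact zero divisor, which is the situation in which the trajectories of [76] become unbounded.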
Quasiperiodicity and KAM Stability This is a stability result: for instance, in systems
with two degrees of freedom the invariant tori of
To discuss more advanced results, it is convenient
dimension two which lie on a given three-dimensional
to restrict attention to a special (nontrivial) para-
energy surface, will separate the points on the energy
digmatic case
surface into the set which is inside the torus and the
H" A; a 12 A2 "f a 80 set which is outside. Hence, an initial datum
starting (say) inside cannot reach the outside. Like-
In this simple case (called Thirring model: represent- wise, a point starting between two tori has to stay in
ing particles on a circle interacting via a potential between forever. Further, if the two tori are close, this
"f (a)) the equations for the maximal tori [75] means that motion will stay very localized in action
reduce to equations for the only functions h" : space, with a trajectory accessing only points close to
the tori and coming close to all such points, within a
w y 2 h" y " a f y h" y ; y 2 T 81 distance of the order of the distance between the
confining tori. The case of three or more degrees of
as the second of [75] simply becomes the definition
freedom is quite different (see sections Diffusion in
of H " because the RHS does not involve H " .
phase space and The three-body problem).
The real problem is therefore whether the formal
In the simple case of the rotators system [80] the
series considered in the last section converge at least
equations for the parametric representation of the
for small ": and the example [76] on the Birkhoff
tori are given by [81]. The latter bear some analogy
series shows that sometimes sum rules might be
with the easier problem in [66]: but [81] are
needed in order to give a meaning to the series. In
equations instead of one and they are differential
fact, whenever a problem (of physical interest)
equations rather than ordinary equations. Further-
admits a formal power series solution which is not
more, the function f (a) which plays here the role of
convergent, or which is such that it is not known
c() in [66] has Fourier coefficient fn with no
whether it is convergent, then one should look for
restrictions on n, while the Fourier coefficients c
sum rules for it.
for c in [66] do not vanish only for = 1.
The modern theory of perturbations starts with
The above differences are, to some extent,
the proof of the convergence for " small enough of
minor and the power series solution to [81] can
the Lindstedt series (KOLMOGOROV). The general
be constructed by the same algorithm as used in the
KAM result is:
case of [66]: namely one forms trees as in Figure 5
Theorem 3 (KAM) Consider the Hamiltonian with the harmonic labels v 2 Z replaced by n v 2 Z
H" (A, a) = h(A) "f (A, a), defined in U = V T (still to be thought of as possible harmonic indices in
with V R open and bounded and with f (A, a), the Fourier expansion of the perturbing function f).
h(A) analytic in the closure V T where h(A) is also All other labels affixed to the trees in the section
def
anisochronous; let w 0 = w(A0 ) = @A h(A0 ) and assume Generic nonintegrability will be the same. In
that w 0 satisfies [74]. Then particular, the current flowing on a branch l = v0 v
will be defined as the sum of the harmonics of the
(i) there is "C, > 0 such that the Lindstedt series
nodes w v preceding v:
converges for j"j < "C, ;
(ii) its sum yields two function H " (y ), h" (y ) on T def
X
which parametrize an invariant torus nl nw 82
wv
T C, (A0 , ");
(iii) on T C, (A0 , ") the motion is y ! y w 0 t, see
and we call n(
) the current flowing in the root
[73]; and
branch.
(iv) the set of data in U which belong to invariant
Here the value Val(
) of a tree has to be defined
tori T C, (A0 , ") with w(A0 ) satisfying [74]
differently because the equation to be solved ([81])
with prefixed C, has complement with volume
contains the differential operator (w 0 y )2 which,
<const Ca for a suitable a > 0 and with area
when Fourier transformed, becomes multiplication
also <const Ca on each nontrivial surface of
of the Fourier component with harmonic n by
constant energy H" = E.
(iw n)2 .
In other words, for small " the spectra of most The variation due to the presence of the operator
unperturbed quasiperiodic motions can still be found (w 0 y )2 and the necessity of its inversion in the
as spectra of perturbed quasiperiodic motions devel- evaluation of u h(k)
n , that is, of the component of
oping on tori which are close to the corresponding h(k)
n along an arbitrary unit vector u, is nevertheless
unperturbed ones (i.e., with the same spectrum). quite simple: the value of a tree graph
of order k
Introductory Article: Classical Mechanics 21
(i.e., with k nodes and k branches) has to be defined which lie on the same path to the root carry the
by (cf. [68]) same current and, furthermore, the node harmonics
! are bounded by jnj N for some N. Then the
def i1
k Y n v0 n v number of lines in
with divisor w 0 n satisfying
Val
k! 2 2n < Cjw 0 n j 2n1 does not exceed 4Nk2n= .
lines lv0 v w 0 nl
!
Y Hence, setting
fn v 83
def
nodes v F C2 maxjnjN jfn j
where the n v0 appearing in the factor relative to the the corresponding Val(
) can be bounded by
root line rv from the first node v to the root r (see
1 k 2k Y 1
n= def 1
Figure 5) is interpreted as a unit vector u (it was F N 22n4Nk2 Bk
interpreted as 1 in the one-dimensional case [66]). k! n0
k!
X 85
Equation [83] makes sense only for trees in which B FN 2 2 8n2n=
no line carries zero current. Then the component n
along u (the harmonic label attached to the root of a
since the product is convergent. In the case in which
tree) of h(k) is given (see also [69]) by
f is a trigonometric polynomial of degree N, the
X
above restricted contributions to u h(k) n would
u hk
n Val
84 generate a convergent series for " small enough. In
; n
n
order
k
fact, the number of trees is bounded (as in the
section Perturbing
P funct ions) by k!4k (2N 1)k so
k (k)
where the
means that the sum is only over trees in that the series n j"j ju hn j would converge for
which a nonzero current n(l) flows on the lines l 2
. small " (i.e., j"j < (B 4(2N 1) )1 ).
The quantity u h(k) 0 will be defined to be 0 (see the Given this comment, the analysis of the remain-
previous section). ing contributions becomes the real problem, and it
In the case of [66] zero-current lines could appear: requires new ideas because among the excluded trees
but the contributions from tree graphs containing at there are some simple kth order trees whose value
least one zero current line would cancel. In the alone, if considered separately from the other
present case, the statement that the above algorithm actually gives $h^{(k)}_{\boldsymbol{\nu}}$ by simply ignoring trees with lines with zero current is nontrivial. It was Poincaré's contribution to the theory of Lindstedt series to show that even in the general case (cf. [75]) the equations for the invariant tori can be solved by a formal power series. Equation [84] is proved by induction on $k$ after checking it for the first few orders.

The algorithm just described leading to [83] can be extended to the case of the general Hamiltonian considered in the KAM theorem.

The convergence proof is more delicate than the (elementary) one for eqn [66]. In fact, the values of trees of order $k$ can give large contributions to $h^{(k)}_{\boldsymbol{\nu}}$: because the new factors $(\boldsymbol{\omega}_0\cdot\boldsymbol{\nu}(l))^2$, although not zero, can be quite small and their small size can overwhelm the smallness of the factors $f_{\boldsymbol{\nu}}$ and $\varepsilon$. In fact, even if $f$ is a trigonometric polynomial (so that $f_{\boldsymbol{\nu}}$ vanishes identically for $|\boldsymbol{\nu}|$ large enough) the currents flowing in the branches can be very large, of the order of the number $k$ of nodes in the tree; see [82].

This is called the small-divisors problem. The key to its solution goes back to a related work (SIEGEL) which shows that

Theorem 4 Consider the contribution to the sum in [82] from graphs $\theta$ in which no pairs of lines

contributions, would generate a factorially divergent power series in $\varepsilon$.

However, the contributions of all large-valued trees of order $k$ can be shown to cancel: although not exactly (unlike the case of the elementary problem in the section "Perturbing functions," where the cancellation is not necessary for the proof, in spite of its exact occurrence), but enough so that in spite of the existence of exceedingly large values of individual tree graphs their total sum can still be bounded by a constant to the power $k$ so that the power series actually converges for $\varepsilon$ small enough. The idea is discussed in Appendix 1.

For more details, the reader is referred to Poincaré (1987), Kolmogorov (1954), Moser (1962), and Arnold (1989).

Resonances and their Stability

A quasiperiodic motion with $r$ rationally independent frequencies is called resonant if $r$ is strictly less than the number of degrees of freedom, $\ell$. The difference $s = \ell - r$ is the degree of the resonance.

Of particular interest are the cases of a perturbation of an integrable system in which resonant motions take place.
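As a rough illustration of the definition above, the degree of a resonance can be computed mechanically once the frequencies are written as rational combinations of numbers that are assumed to be rationally independent. The sketch below is not part of the article: the function names `rational_rank` and `resonance_degree`, and the encoding of each frequency as a row of rational coefficients with respect to such an assumed basis, are choices made only for this example.

```python
from fractions import Fraction


def rational_rank(rows):
    """Rank over the rationals of a matrix with rational entries."""
    m = [[Fraction(x) for x in row] for row in rows]
    ncols = len(m[0]) if m else 0
    rank = 0
    for col in range(ncols):
        # Look for a row at or below position `rank` with a nonzero entry.
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate the column below the pivot row.
        for i in range(rank + 1, len(m)):
            factor = m[i][col] / m[rank][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


def resonance_degree(freq_coeffs):
    """Return (ell, r, s) for frequencies given by rational coefficient rows.

    Row i holds the coefficients of the i-th frequency with respect to a
    basis of numbers ASSUMED rationally independent (e.g. 1 and sqrt(2)).
    """
    ell = len(freq_coeffs)          # number of degrees of freedom
    r = rational_rank(freq_coeffs)  # rationally independent frequencies
    return ell, r, ell - r          # s = ell - r is the degree of the resonance


# Frequencies (1, sqrt(2), 1 + sqrt(2)) in the assumed basis (1, sqrt(2)).
print(resonance_degree([[1, 0], [0, 1], [1, 1]]))
```

For two independent Keplerian periods in space (three degrees of freedom per body, one frequency each), the rows `[[1, 0], [0, 0], [0, 0], [0, 1], [0, 0], [0, 0]]` give six degrees of freedom, two independent frequencies, and a resonance of degree four.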
A typical example is the $n$-body problem which studies the mutual perturbations of the motions of $n-1$ particles gravitating around a more massive particle. If the particle masses can be considered to be negligible, the system will consist of $n-1$ central Keplerian motions: it will therefore have $\ell = 3(n-1)$ degrees of freedom. In general, only one frequency per body occurs in the absence of the perturbations (the period of the Keplerian orbit). Hence, $r \le n-1$ and $s \ge 2(n-1)$ (or in the planar case $s \ge (n-1)$) with equality holding when the periods are rationally independent.

Another example is the rigid body with a fixed point perturbed by a conservative force: in this case, the unperturbed system has three degrees of freedom but, in general, only two frequencies (see the discussion following [52]).

Furthermore, in the above examples there is the possibility that the independent frequencies assume, for special initial data, values which are rationally related, giving rise to resonances of even higher order (i.e., with smaller values of $r$).

In an integrable anisochronous system, resonant motions will be dense in phase space because the frequencies $\boldsymbol{\omega}(\mathbf{A})$ will vary as much as the actions and therefore resonances of any order (i.e., any $r < \ell$) will be dense in phase space: in particular, the periodic motions (i.e., the highest-order resonances) will be dense.

Resonances, in integrable systems, can arise in a priori stable integrable systems and in a priori unstable systems: the former are systems whose Hamiltonian admits canonical action-angle coordinates $(\mathbf{A}, \boldsymbol{\alpha}) \in U \times \mathbb{T}^\ell$ with $U \subset \mathbb{R}^\ell$ open, while the latter are systems whose Hamiltonian has, in suitable local canonical coordinates, the form

$$H_0(\mathbf{A}) + \sum_{i=1}^{s_1} \frac{1}{2}\big(p_i^2 - \lambda_i^2 q_i^2\big) + \sum_{j=1}^{s_2} \frac{1}{2}\big(\pi_j^2 + \mu_j^2 \kappa_j^2\big), \qquad \lambda_i, \mu_j > 0$$ [86]

where $(\mathbf{A}, \boldsymbol{\alpha}) \in U \times \mathbb{T}^r$, $U \subset \mathbb{R}^r$, $(\mathbf{p}, \mathbf{q}) \in V \subset \mathbb{R}^{2s_1}$, $(\boldsymbol{\pi}, \boldsymbol{\kappa}) \in V' \subset \mathbb{R}^{2s_2}$ with $V, V'$ neighborhoods of the origin and $\ell = r + s_1 + s_2$, $s_i \ge 0$, $s_1 + s_2 > 0$ and $\pm\sqrt{\lambda_j}$, $\pm\sqrt{\mu_j}$ are called Lyapunov coefficients of the resonance. The perturbations considered are supposed to have the form $\varepsilon f(\mathbf{A}, \boldsymbol{\alpha}, \mathbf{p}, \mathbf{q}, \boldsymbol{\pi}, \boldsymbol{\kappa})$. The denomination of a priori stable or unstable refers to the properties of the "a priori given unperturbed Hamiltonian." The label "a priori unstable" is certainly appropriate if $s_1 > 0$: here also $s_1 = 0$ is allowed for notational convenience implying that the Lyapunov coefficients in a priori unstable cases are all of order 1 (whether real $\pm\sqrt{\lambda_j}$ or imaginary $\pm i\sqrt{\mu_j}$). In other words, the a priori stable case, $s_1 = s_2 = 0$ in [86], is the only excluded case. Of course, the stability properties of the motions when a perturbation acts will depend on the perturbation in both cases.

The a priori stable systems usually have a great variety of resonances (e.g., in the anisochronous case, resonances of any dimension are dense). The a priori unstable systems have (among possible other resonances) some very special $r$-dimensional resonances occurring when the unstable coordinates $(\mathbf{p}, \mathbf{q})$ and $(\boldsymbol{\pi}, \boldsymbol{\kappa})$ are zero and the frequencies of the $r$ action-angle coordinates are rationally independent.

In the first case (a priori stable), the general question is whether the resonant motions, which form invariant tori of dimension $r$ arranged into families that fill $\ell$-dimensional invariant tori, continue to exist, in presence of small enough perturbations $\varepsilon f(\mathbf{A}, \boldsymbol{\alpha})$, on slightly deformed invariant tori. Similar questions can be asked in the a priori unstable cases. To examine the matter more closely consider the formulation of the simplest problems.

A priori stable resonances: more precisely, suppose $H_0 = \frac{1}{2}\mathbf{A}^2$ and let $\{\mathbf{A}_0\} \times \mathbb{T}^\ell$ be the unperturbed invariant torus $\mathcal{T}_{\mathbf{A}_0}$ with spectrum $\boldsymbol{\omega}_0 = \boldsymbol{\omega}(\mathbf{A}_0) = \partial_{\mathbf{A}} H_0(\mathbf{A}_0)$ with only $r$ rationally independent components. For simplicity, suppose that $\boldsymbol{\omega}_0 = (\omega_1, \ldots, \omega_r, 0, \ldots, 0) \stackrel{\rm def}{=} (\boldsymbol{\omega}, \mathbf{0})$ with $\boldsymbol{\omega} \in \mathbb{R}^r$. The more general case in which $\boldsymbol{\omega}$ has only $r$ rationally independent components can be reduced to the special case above by a canonical linear change of coordinates at the price of changing the $H_0$ to a new one, still quadratic in the actions but containing mixed products $A_i B_j$: the proofs of the results that are discussed here would not be really affected by such more general form of $H$.

It is convenient to distinguish between the fast angles $\alpha_1, \ldots, \alpha_r$ and the resonant angles $\alpha_{r+1}, \ldots, \alpha_\ell$ (also called slow or secular) and call $\boldsymbol{\alpha} = (\boldsymbol{\alpha}', \boldsymbol{\beta})$ with $\boldsymbol{\alpha}' \in \mathbb{T}^r$ and $\boldsymbol{\beta} \in \mathbb{T}^s$. Likewise, we distinguish the fast actions $\mathbf{A}' = (A_1, \ldots, A_r)$ and the resonant ones $A_{r+1}, \ldots, A_\ell$ and set $\mathbf{A} = (\mathbf{A}', \mathbf{B})$ with $\mathbf{A}' \in \mathbb{R}^r$ and $\mathbf{B} \in \mathbb{R}^s$.

Therefore, the torus $\mathcal{T}_{\mathbf{A}_0}$, $\mathbf{A}_0 = (\mathbf{A}'_0, \mathbf{B}_0)$, is in turn a continuum of invariant tori $\mathcal{T}_{\mathbf{A}_0, \boldsymbol{\beta}}$ with trivial parametric equations: $\boldsymbol{\beta}$ fixed, $\boldsymbol{\alpha}' = \boldsymbol{\psi}$, $\boldsymbol{\psi} \in \mathbb{T}^r$, and $\mathbf{A}' = \mathbf{A}'_0$, $\mathbf{B} = \mathbf{B}_0$. On each of them the motion is: $\mathbf{A}', \mathbf{B}, \boldsymbol{\beta}$ constant and $\boldsymbol{\alpha}' \to \boldsymbol{\alpha}' + \boldsymbol{\omega} t$, with rationally independent $\boldsymbol{\omega} \in \mathbb{R}^r$.

Then the natural question is whether there exist functions $\mathbf{h}_\varepsilon, \mathbf{k}_\varepsilon, \mathbf{H}_\varepsilon, \mathbf{K}_\varepsilon$ smooth in $\varepsilon$ near $\varepsilon = 0$ and in $\boldsymbol{\psi} \in \mathbb{T}^r$, vanishing for $\varepsilon = 0$, and such that the torus $\mathcal{T}_{\mathbf{A}_0, \boldsymbol{\beta}_0, \varepsilon}$ with parametric equations

$$\mathbf{A}' = \mathbf{A}'_0 + \mathbf{H}_\varepsilon(\boldsymbol{\psi}), \quad \boldsymbol{\alpha}' = \boldsymbol{\psi} + \mathbf{h}_\varepsilon(\boldsymbol{\psi}), \quad \mathbf{B} = \mathbf{B}_0 + \mathbf{K}_\varepsilon(\boldsymbol{\psi}), \quad \boldsymbol{\beta} = \boldsymbol{\beta}_0 + \mathbf{k}_\varepsilon(\boldsymbol{\psi}), \qquad \boldsymbol{\psi} \in \mathbb{T}^r$$ [87]
meaning via summation rules provided $f$ and $\boldsymbol{\beta}_0$ satisfy certain additional conditions and provided certain values of $\varepsilon$ are excluded. An example of a theorem is the following:

Theorem 6 Given the Hamiltonian [80] and a resonant torus $\mathcal{T}_{\mathbf{A}'_0, \boldsymbol{\beta}_0}$ with $\boldsymbol{\omega} = \mathbf{A}'_0 \in \mathbb{R}^r$ satisfying a Diophantine property let $\boldsymbol{\beta}_0$ be a nondegenerate maximum point for the average potential $\bar{f}(\boldsymbol{\beta}) \stackrel{\rm def}{=} (2\pi)^{-r} \int_{\mathbb{T}^r} f(\boldsymbol{\alpha}', \boldsymbol{\beta})\, d^r\boldsymbol{\alpha}'$. Consider the Lindstedt series solution for eqns [89] of the perturbed resonant torus with spectrum $(\boldsymbol{\omega}, \mathbf{0})$. It is possible to express the single $n$th-order term of the series as a sum of many terms and then rearrange the series thus obtained so that the resummed series converges for $\varepsilon$ in a domain $\mathcal{E}$ which contains a segment $[0, \varepsilon_0]$ and also a subset of $[-\varepsilon_0, 0]$ which, although with open dense complement, is so large that it has 0 as a Lebesgue density point. Furthermore, the resummed series for $\mathbf{h}_\varepsilon, \mathbf{k}_\varepsilon$ define an invariant $r$-dimensional analytic torus with spectrum $\boldsymbol{\omega}$.

More generally, if $\boldsymbol{\beta}_0$ is only a nondegenerate stationarity point for $\bar{f}(\boldsymbol{\beta})$, the domain of definition of the resummed series is a set $\mathcal{E} \subset [-\varepsilon_0, \varepsilon_0]$ which on both sides of the origin has an open dense complement although it has 0 as a Lebesgue density point.

Theorem 6 can be naturally extended to the general case in which the Hamiltonian is the most general perturbation of an anisochronous integrable system $H_\varepsilon(\mathbf{A}, \boldsymbol{\alpha}) = h(\mathbf{A}) + \varepsilon f(\mathbf{A}, \boldsymbol{\alpha})$ if $\partial^2_{\mathbf{A}\mathbf{A}} h$ is a nonsingular matrix and the resonance arises from a spectrum $\boldsymbol{\omega}(\mathbf{A}_0)$ which has $r$ independent components (while the remaining are not necessarily zero).

We see that the convergence is a delicate problem for the Lindstedt series for nearly integrable resonant motions. They might even be divergent (mathematically, a proof of divergence is an open problem but it is a very reasonable conjecture in view of the above physical interpretation); nevertheless, Theorem 6 shows that sum rules can be given that sometimes (i.e., for $\varepsilon$ in a large set near $\varepsilon = 0$) yield a true solution to the problem.

This is reminiscent of the phenomenon met in discussing perturbations of isochronous systems in [76], but it is a much more complex situation. It leaves many open problems: foremost among them is the question of uniqueness. The sum rules of divergent series always contain some arbitrary choices, which lead to doubts about the uniqueness of the functions parametrizing the invariant tori constructed in this way. It might even be that the convergence set $\mathcal{E}$ may depend upon the arbitrary choices, and that considering several of them no $\varepsilon$ with $|\varepsilon| < \varepsilon_0$ is left out.

The case of a priori unstable systems has also been widely studied. In this case too resonances with Diophantine $r$-dimensional spectrum $\boldsymbol{\omega}$ are considered. However, in the case $s_2 = 0$ (called a priori unstable hyperbolic resonance) the Lindstedt series can be shown to be convergent, while in the case $s_1 = 0$ (called a priori unstable elliptic resonance) or in the mixed cases $s_1, s_2 > 0$ extra conditions are needed. They involve $\boldsymbol{\omega}$ and $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_{s_2})$ (cf. [86]) and properties of the perturbations as well. It is also possible to study a slightly different problem: namely to look for conditions on $\boldsymbol{\omega}, \boldsymbol{\mu}, f$ which imply that, for small $\varepsilon$, invariant tori with spectrum $\varepsilon$-dependent but close, in a suitable sense, to $\boldsymbol{\omega}$ exist.

The literature is vast, but it seems fair to say that, given the above comments, particularly those concerning uniqueness and analyticity, the situation is still quite unsatisfactory. We refer the reader to Gallavotti et al. (2004) for more details.

Diffusion in Phase Space

The KAM theorem implies that a perturbation of an analytic anisochronous integrable system, i.e., with an analytic Hamiltonian $H_\varepsilon(\mathbf{A}, \boldsymbol{\alpha}) = H_0(\mathbf{A}) + \varepsilon f(\mathbf{A}, \boldsymbol{\alpha})$ and nondegenerate Hessian matrix $\partial^2_{\mathbf{A}\mathbf{A}} H_0(\mathbf{A})$, generates large families of maximal invariant tori. Such tori lie on the energy surfaces but do not have codimension 1 on them, i.e., they do not split the $(2\ell - 1)$-dimensional energy surfaces into disconnected regions except, of course, in the case of systems with two degrees of freedom (see the section "Quasiperiodicity and KAM stability").

Therefore, there might exist trajectories with initial data close to $\mathbf{A}^i$ in action space which reach phase space points close to $\mathbf{A}^f \neq \mathbf{A}^i$ in action space for $\varepsilon \neq 0$, no matter how small. Such diffusion phenomenon would occur in spite of the fact that the corresponding trajectory has to move in a space in which very close to each $\{\mathbf{A}\} \times \mathbb{T}^\ell$ there is an invariant surface on which points move keeping $\mathbf{A}$ constant within $O(\varepsilon)$, which for $\varepsilon$ small can be $\ll |\mathbf{A}^f - \mathbf{A}^i|$.

In a priori unstable systems (cf. the section "Resonances and their stability") with $s_1 = 1$, $s_2 = 0$, it is not difficult to see that the corresponding phenomenon can actually occur: the paradigmatic example (ARNOLD) is the a priori unstable system

$$H_\varepsilon = \frac{A_1^2}{2} + A_2 + \frac{p^2}{2} + g(\cos q - 1) + \varepsilon(\cos\alpha_1 + \sin\alpha_2)(\cos q - 1)$$ [90]
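To make the structure of the example [90] concrete, the following sketch writes out Hamilton's equations for it and integrates them with a classical fourth-order Runge-Kutta step. It is only an illustration under stated assumptions: the function names, the initial datum, the step size, and the parameter values $g = 1$, $\varepsilon = 0.05$ are arbitrary choices for this example and do not come from the article, and such a short integration says nothing about diffusion itself. It only shows that $A_1$ is exactly conserved at $\varepsilon = 0$ and starts to move as soon as $\varepsilon \neq 0$.

```python
import math


def hamiltonian(state, g, eps):
    """Value of the Hamiltonian [90] at state = (A1, A2, p, q, alpha1, alpha2)."""
    A1, A2, p, q, a1, a2 = state
    return (0.5 * A1 ** 2 + A2 + 0.5 * p ** 2 + g * (math.cos(q) - 1.0)
            + eps * (math.cos(a1) + math.sin(a2)) * (math.cos(q) - 1.0))


def vector_field(state, g, eps):
    """Hamilton's equations: actions and p get -dH/d(angle), angles get +dH/d(action)."""
    A1, A2, p, q, a1, a2 = state
    c = math.cos(q) - 1.0
    dA1 = eps * math.sin(a1) * c        # -dH/d(alpha1)
    dA2 = -eps * math.cos(a2) * c       # -dH/d(alpha2)
    dp = (g + eps * (math.cos(a1) + math.sin(a2))) * math.sin(q)  # -dH/dq
    return (dA1, dA2, dp, p, A1, 1.0)   # dq/dt = p, d(alpha1)/dt = A1, d(alpha2)/dt = 1


def rk4_step(state, dt, g, eps):
    """One classical Runge-Kutta step of size dt."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))
    k1 = vector_field(state, g, eps)
    k2 = vector_field(shifted(k1, 0.5 * dt), g, eps)
    k3 = vector_field(shifted(k2, 0.5 * dt), g, eps)
    k4 = vector_field(shifted(k3, dt), g, eps)
    return tuple(s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def integrate(state, steps, dt, g, eps):
    """Apply `steps` Runge-Kutta steps and return the final state."""
    for _ in range(steps):
        state = rk4_step(state, dt, g, eps)
    return state


start = (0.5, 0.0, 0.1, 1.0, 0.3, 0.2)  # hypothetical initial datum
print(integrate(start, 200, 0.01, 1.0, 0.0)[0])   # A1 at eps = 0
print(integrate(start, 200, 0.01, 1.0, 0.05)[0])  # A1 at eps = 0.05
```

Monitoring `hamiltonian` along the computed trajectory is a simple consistency check on the equations of motion, since the energy should stay constant up to the integration error.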
called the Arnold diffusion. Simple sufficient conditions for a transition from near $\mathbf{A}^i$ to near $\mathbf{A}^f$ are expressed by the following result:

Theorem 7 Given the Hamiltonian [91] with $H_u$ admitting two hyperbolic fixed points $P_\pm$ with heteroclinic connections, $t \to (\mathbf{p}_a(t), \mathbf{q}_a(t))$, $a = 1, 2$, suppose that:

(i) On the unperturbed energy surface of energy $E = H(\mathbf{A}^i) + H_u(P_\pm)$ there is a regular curve $\gamma: s \to \mathbf{A}(s)$ joining $\mathbf{A}^i$ to $\mathbf{A}^f$ such that the unperturbed tori $\{\mathbf{A}(s)\} \times \mathbb{T}^\ell$ can be continued at $\varepsilon \neq 0$ into invariant tori $\mathcal{T}_{\mathbf{A}(s), \varepsilon}$ for a set of values of $s$ which fills the curve $\gamma$ leaving only gaps of size of order $o(\varepsilon)$.

(ii) The $\ell \times \ell$ matrix $D_{ij}$ of the second derivatives of the integral of $f$ over the heteroclinic motions is not degenerate, that is,

$$|\det D| = \Big| \det\Big( \int_{-\infty}^{\infty} dt\, \partial_{\alpha_i \alpha_j} f\big(\mathbf{A}, \boldsymbol{\alpha} + \boldsymbol{\omega}(\mathbf{A}) t, \mathbf{p}_a(t), \mathbf{q}_a(t)\big) \Big) \Big|$$

Long-Time Stability of Quasiperiodic Motions

A more difficult problem is whether the same phenomenon of migration in action space occurs in a priori stable systems. The root of the difficulty is a remarkable stability property of quasiperiodic motions. Consider Hamiltonians $H_\varepsilon(\mathbf{A}, \boldsymbol{\alpha}) = h(\mathbf{A}) + \varepsilon f(\mathbf{A}, \boldsymbol{\alpha})$ with $H_0(\mathbf{A}) = h(\mathbf{A})$ strictly convex, analytic, and anisochronous on the closure $\bar{U}$ of an open bounded region $U \subset \mathbb{R}^\ell$, and a perturbation $\varepsilon f(\mathbf{A}, \boldsymbol{\alpha})$ analytic in $\bar{U} \times \mathbb{T}^\ell$.

Then a priori bounds are available on how long it can possibly take to migrate from an action close to $\mathbf{A}_1$ to one close to $\mathbf{A}_2$: and the bound is of exponential type as $\varepsilon \to 0$ (i.e., it admits a lower bound which behaves as the exponential of an inverse power of $\varepsilon$). The simplest theorem is (NEKHOROSSEV):

Theorem 7 There are constants $0 < a, b, d, g, \tau$ such that any initial datum $(\mathbf{A}, \boldsymbol{\alpha})$ evolves so that $\mathbf{A}$ will not change by more than $a\varepsilon^g$ before a long time bounded below by $\tau \exp(b\varepsilon^{-d})$.

Thus, this puts an exponential bound, i.e., a
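of degrees of freedom and $h(\mathbf{A})$ is far from strictly convex), leaving wide open the possibility of observing rapid diffusion.

Further, changing the assumptions can dramatically change the results. For instance, rapid diffusion can sometimes be proved even though it might be feared that it should require exponentially long times: an example that has been proposed is the case of a three-timescales system, with Hamiltonian

$$\omega_1 A_1 + \omega_2 A_2 + \frac{p^2}{2} + g(1 + \cos q) + \varepsilon f(\alpha_1, \alpha_2, p, q)$$ [93]

with $\boldsymbol{\omega}_\varepsilon \stackrel{\rm def}{=} (\omega_1, \omega_2)$, where $\omega_1 = \varepsilon^{-1/2}\bar{\omega}$, $\omega_2 = \varepsilon^{1/2}\tilde{\omega}$ and $\bar{\omega}, \tilde{\omega} > 0$ constants. The three scales are $\omega_1^{-1}$, $\sqrt{g^{-1}}$, $\omega_2^{-1}$. In this case, there are many (although by no means all) pairs $A_1, A_2$ which can be connected within a time that can be estimated to be of order $O(\varepsilon^{-1} \log \varepsilon^{-1})$.

This is a rapid-diffusion case in an a priori unstable system in which condition [92] is not satisfied: because the $\varepsilon$-dependence of $\boldsymbol{\omega}(\mathbf{A})$ implies that the lower bound $c$ in [92] must depend on $\varepsilon$ (and be exponentially small with an inverse power of $\varepsilon$ as $\varepsilon \to 0$).

The unperturbed system in [93] is nonresonant in the $H_0$ part for $\varepsilon > 0$ outside a set of zero measure (i.e., where the vector $\boldsymbol{\omega}_\varepsilon$ satisfies a suitable Diophantine property) and, furthermore, it is a priori unstable: cases met in applications can be a priori stable and resonant (and often not anisochronous) in the $H_0$ part. In such a system, not only is the speed of diffusion not understood, but proposals to prove its existence, if present (as expected), have so far not given really satisfactory results.

For more details, the reader is referred to Nekhorossev (1977).

The Three-Body Problem

with $\varepsilon$ small and the mass $m_M$ moves in the plane of the circular orbit. This will be called the circular restricted three-body problem.

In a reference system with center $S$ and rotating at the angular speed of $J$ around $S$ inertial forces (centrifugal and Coriolis) act. Supposing that the body $J$ is located on the axis with unit vector $\mathbf{i}$ at distance $R$ from the origin $S$, the acceleration of the point $M$ is

$$\ddot{\boldsymbol{\varrho}} = \mathbf{F} + \omega_0^2\Big(\boldsymbol{\varrho} - \frac{\varepsilon R}{1+\varepsilon}\,\mathbf{i}\Big) - 2\,\boldsymbol{\omega}_0 \wedge \dot{\boldsymbol{\varrho}}$$

if $\mathbf{F}$ is the force of attraction and $\boldsymbol{\omega}_0 \wedge \dot{\boldsymbol{\varrho}} \equiv \omega_0 \dot{\boldsymbol{\varrho}}^\perp$ where $\boldsymbol{\omega}_0$ is a vector with $|\boldsymbol{\omega}_0| = \omega_0$ and perpendicular to the orbital plane and $\boldsymbol{\varrho}^\perp \stackrel{\rm def}{=} (-\rho_2, \rho_1)$ if $\boldsymbol{\varrho} = (\rho_1, \rho_2)$. Here, taking into account that the origin $S$ rotates around the fixed center of mass, $\omega_0^2(\boldsymbol{\varrho} - \varepsilon R/(1+\varepsilon)\,\mathbf{i})$ is the centrifugal force while $-2\boldsymbol{\omega}_0 \wedge \dot{\boldsymbol{\varrho}}$ is the Coriolis force. The equations of motion can therefore be derived from a Lagrangian

$$\mathcal{L} = \frac{1}{2}\dot{\boldsymbol{\varrho}}^2 - W + \omega_0\, \boldsymbol{\varrho}^\perp \cdot \dot{\boldsymbol{\varrho}} + \frac{1}{2}\omega_0^2 \boldsymbol{\varrho}^2 - \omega_0^2 \frac{\varepsilon R}{1+\varepsilon}\, \boldsymbol{\varrho}\cdot\mathbf{i}$$ [94]

with

$$\omega_0^2 R^3 = k m_S (1+\varepsilon) \stackrel{\rm def}{=} g_0, \qquad W = -\frac{k m_S}{|\boldsymbol{\varrho}|} - \frac{k m_S\, \varepsilon}{|\boldsymbol{\varrho} - R\,\mathbf{i}|}$$

where $k$ is the gravitational constant, $R$ the distance between $S$ and $J$, and finally the last three terms in [94] come from the Coriolis force (the first) and from the centripetal force (the other two, taking into account that the origin $S$ rotates around the fixed center of mass).

Setting $g = g_0/(1+\varepsilon) \equiv k m_S$, the Hamiltonian of the system is

$$H = \frac{1}{2}\big(\mathbf{p} - \omega_0 \boldsymbol{\varrho}^\perp\big)^2 - \frac{g}{|\boldsymbol{\varrho}|} - \frac{1}{2}\omega_0^2 \boldsymbol{\varrho}^2 - \varepsilon\,\frac{g}{R}\Big( \Big|\frac{\boldsymbol{\varrho}}{R} - \mathbf{i}\Big|^{-1} - \frac{\boldsymbol{\varrho}\cdot\mathbf{i}}{R} \Big)$$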
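The rotating-frame equations above can be checked numerically. The sketch below is an illustration and not part of the article: it integrates the stated acceleration (attraction, centrifugal term, Coriolis term) with a fourth-order Runge-Kutta step and monitors the energy function obtained from the Lagrangian [94], written in terms of velocities. The function names, the units ($k m_S = g = 1$, $R = 1$), the value $\varepsilon = 10^{-3}$ and the initial datum are assumptions chosen only for this example.

```python
import math


def acceleration(state, g, R, eps):
    """Acceleration of M in the frame rotating with J (state = x, y, vx, vy)."""
    x, y, vx, vy = state
    omega0 = math.sqrt(g * (1.0 + eps) / R ** 3)  # omega0^2 R^3 = g (1 + eps)
    shift = eps * R / (1.0 + eps)                 # offset of the center of mass
    r3 = (x * x + y * y) ** 1.5
    d3 = ((x - R) ** 2 + y * y) ** 1.5
    # Attraction F = -grad W by S (at the origin) and by J (at distance R on the i-axis).
    fx = -g * x / r3 - g * eps * (x - R) / d3
    fy = -g * y / r3 - g * eps * y / d3
    # Centrifugal term plus Coriolis term -2 omega0 (-vy, vx).
    ax = fx + omega0 ** 2 * (x - shift) + 2.0 * omega0 * vy
    ay = fy + omega0 ** 2 * y - 2.0 * omega0 * vx
    return ax, ay


def energy(state, g, R, eps):
    """Conserved energy associated with the Lagrangian [94], in velocity form."""
    x, y, vx, vy = state
    omega0_sq = g * (1.0 + eps) / R ** 3
    shift = eps * R / (1.0 + eps)
    W = -g / math.hypot(x, y) - g * eps / math.hypot(x - R, y)
    return (0.5 * (vx * vx + vy * vy) + W
            - 0.5 * omega0_sq * (x * x + y * y) + omega0_sq * shift * x)


def rk4_step(state, dt, g, R, eps):
    """One classical Runge-Kutta step for the first-order system (x, y, vx, vy)."""
    def deriv(s):
        ax, ay = acceleration(s, g, R, eps)
        return (s[2], s[3], ax, ay)

    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))

    k1 = deriv(state)
    k2 = deriv(shifted(k1, 0.5 * dt))
    k3 = deriv(shifted(k2, 0.5 * dt))
    k4 = deriv(shifted(k3, dt))
    return tuple(s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def energy_drift(steps=5000, dt=1e-3, g=1.0, R=1.0, eps=1e-3):
    """Integrate an interior orbit and return the absolute change of the energy."""
    omega0 = math.sqrt(g * (1.0 + eps) / R ** 3)
    r0 = 0.4 * R
    # Nearly circular interior orbit, velocity measured in the rotating frame.
    state = (r0, 0.0, 0.0, math.sqrt(g / r0) - omega0 * r0)
    e0 = energy(state, g, R, eps)
    for _ in range(steps):
        state = rk4_step(state, dt, g, R, eps)
    return abs(energy(state, g, R, eps) - e0)


print(energy_drift())
```

The Coriolis term does no work, so the energy should change only by the integration error; a drift that grows with the step size in the expected way is a useful sanity check on the signs of the inertial terms.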
which is convenient if we study the interior problem, i.e., $|\boldsymbol{\varrho}| < R$. This can be expressed in the action-angle coordinates via [41], [42]:

$$\theta_0 = \lambda_0 + f_{\lambda_0}, \qquad \theta_0 + \gamma_0 = \lambda_0 + \gamma_0 + f_{\lambda_0},$$
$$e = \Big(1 - \frac{G_0^2}{L_0^2}\Big)^{1/2}, \qquad \frac{|\boldsymbol{\varrho}|}{R} = \frac{G_0^2}{gR}\,\frac{1}{1 + e\cos(\lambda_0 + f_{\lambda_0})}$$ [97]

where (see [42]), $f_\lambda = f(e\sin\lambda, e\cos\lambda)$ and

$$f(x, y) = 2x\Big(1 + \frac{5}{4}\,y + \cdots\Big)$$

with the ellipsis denoting higher orders in $x, y$ even in $x$. The Hamiltonian takes the form, if $\omega^2 = gR^{-3}$,

$$H_\varepsilon = -\frac{g^2}{2L_0^2} - \omega G_0 + \varepsilon\,\frac{g}{R}\, F(G_0, L_0, \lambda_0, \lambda_0 + \gamma_0)$$ [98]

where the only important feature (for our purposes) is that $F(L, G, \alpha, \beta)$ is an analytic function of $L, G, \alpha, \beta$ near a datum with $|G| < L$ (i.e., $e > 0$) and $|\boldsymbol{\varrho}| < R$. However, the domain of analyticity in $G$ is rather small as it is constrained by $|G| < L$ excluding in particular the circular orbit case $G = L$.

Note that apparently the KAM theorem fails to be applicable to [98] because the matrix of the second derivatives of $H_0(L, G)$ has vanishing determinant. Nevertheless, the proof of the theorem also goes through in this case, with minor changes. This can be checked by studying the proof or, following a remark by Poincaré, by simply noting that the squared Hamiltonian $H'_\varepsilon \stackrel{\rm def}{=} (H_\varepsilon)^2$ has the form

$$H'_\varepsilon = \Big(-\frac{g^2}{2L_0^2} - \omega G_0\Big)^2 + \varepsilon F'(G_0, L_0, \lambda_0, \lambda_0 + \gamma_0)$$ [99]

with $F'$ still analytic. But this time

$$\det \frac{\partial^2 H'_0}{\partial(G_0, L_0)} = -6 g^2 L_0^{-4}\, \omega^2 h \neq 0 \qquad \text{if } h = -g^2 L_0^{-2} - 2\omega G_0 \neq 0$$

Therefore, the KAM theorem applies to $H'_\varepsilon$ and the key observation is that the orbits generated by the Hamiltonian $(H_\varepsilon)^2$ are geometrically the same as those generated by the Hamiltonian $H_\varepsilon$: they are only run at a different speed because of the need of a time rescaling by the constant factor $2H_\varepsilon$.

This shows that, given an unperturbed ellipse of parameters $(L_0, G_0)$ such that $\boldsymbol{\omega} = (g^2/L_0^3, -\omega)$, $G_0 > 0$, with $\omega_1/\omega_2$ Diophantine, then the perturbed system admits a motion which is quasiperiodic with spectrum proportional to $\boldsymbol{\omega}$ and takes place on an orbit which wraps around a torus remaining forever close to the unperturbed torus (which can be visualized as described by a point moving, according to the area law on an ellipse rotating at a rate $-\omega_0$) with actions $(L_0, G_0)$, provided $\varepsilon$ is small enough. Hence,

The KAM theorem answers, at least conceptually, the classical question: can a solution of the three-body problem remain close to an unperturbed one forever? That is, is it possible that a solar system is stable forever?

Assuming $e, |\boldsymbol{\varrho}|/R \ll 1$ and retaining only the lowest orders in $e$ and $|\boldsymbol{\varrho}|/R \ll 1$ the Hamiltonian [98] simplifies into

$$H = -\frac{g^2}{2L_0^2} - \omega G_0 + \delta_\varepsilon(G_0) - \frac{\varepsilon g}{2R}\,\frac{G_0^4}{g^2 R^2}\Big( 3\cos^2(\lambda_0 + \gamma_0) - e\cos\lambda_0 - \frac{9}{2}\, e\cos(\lambda_0 + 2\gamma_0) + \frac{3}{2}\, e\cos(3\lambda_0 + 2\gamma_0) \Big)$$ [100]

where

$$\delta_\varepsilon(G_0) = -\big((1+\varepsilon)^{1/2} - 1\big)\,\omega G_0 + \frac{\varepsilon g}{2R}\,\frac{G_0^4}{g^2 R^2}, \qquad e = \Big(1 - \frac{G_0^2}{L_0^2}\Big)^{1/2}$$

It is an interesting exercise to estimate, assuming as model the Hamiltonian [100] and following the proof of the KAM theorem, how small $\varepsilon$ has to be if a planet with the data of Mercury can be stable forever on a (slowly precessing) orbit with actions close to the present-day values under the influence of a mass $\varepsilon$ times the solar mass orbiting on a circle, at a distance from the Sun equal to that of Jupiter. It is possible to follow either the above reduction to the ordinary KAM theorem or to apply directly to [100] the Lindstedt series expansion, proceeding along the lines of the section "Quasiperiodicity and KAM stability." The first approach is easy but the second is more efficient: in both cases, unless the estimates are done in a particularly careful manner, the value found for $\varepsilon m_S$ is not interesting from the viewpoint of astronomy.

The reader is referred to Arnold (1989) for more details.

Rationalization and Regularization of Singularities

Often integrable systems have interesting data which lie on the boundary of the integrability domain. For instance, the central motion when $L = G$ (circular orbits) or the rigid body in a rotation around one of the principal axes or the two-body problem when $G = 0$ (collisional data). In such cases, perturbation
theory cannot be applied as discussed above. Typically, the perturbation depends on quantities like $\sqrt{L - G}$ and is not analytic at $L = G$. Nevertheless, it is sometimes possible to enlarge phase space and introduce new coordinates in the vicinity of the data which in the initial phase space are singular.

A notable example is the failure of the analysis of the circular restricted three-body problem: it apparently fails when the orbit that we want to perturb is circular.

It is convenient to introduce the canonical coordinates $L, \lambda$ and $G, \gamma$:

$$L = L_0, \qquad G = L_0 - G_0, \qquad \lambda = \lambda_0 + \gamma_0, \qquad \gamma = -\gamma_0$$ [101]

so that $e = \sqrt{2GL^{-1}}\,\sqrt{1 - G(2L)^{-1}}$ and $\lambda_0 = \lambda + \gamma$ and $\theta_0 = \lambda_0 + f_{\lambda_0}$, where $f_{\lambda_0}$ is defined in [42] (see also [97]). Hence,

$$\theta_0 = \lambda + \gamma + f_{\lambda+\gamma}, \qquad \theta_0 + \gamma_0 = \lambda + f_{\lambda+\gamma},$$
$$e = \sqrt{2G}\,\frac{1}{\sqrt{L}}\,\sqrt{1 - \frac{G}{2L}}, \qquad \frac{|\boldsymbol{\varrho}|}{R} = \frac{L^2(1 - e^2)}{gR}\,\frac{1}{1 + e\cos(\lambda + \gamma + f_{\lambda+\gamma})}$$ [102]

and the Hamiltonian [100] takes the form

$$H_\varepsilon = -\frac{g^2}{2L^2} - \omega L + \omega G + \varepsilon\,\frac{g}{R}\, F(L - G, L, \lambda + \gamma, \lambda)$$ [103]

In the coordinates $L, G$ of [101] the unperturbed circular case corresponds to $G = 0$ and [96], once expressed in the action-angle variables $G, L, \gamma, \lambda$, is analytic in a domain whose size is controlled by $\sqrt{G}$. Nevertheless, very often problems of perturbation theory can be "regularized."

This is done by "enlarging the integrability domain" by adding to it points (one or more) around the singularity (a boundary point of the domain of the coordinates) and introducing new coordinates to describe simultaneously the data close to the singularity and the newly added points: in many interesting cases, the equations of motion are no longer singular (i.e., become analytic) in the new coordinates and are therefore apt to describe the motions that reach the singularity in a finite time. One can say that the singularity was only apparent.

Perhaps this is best illustrated precisely in the above circular restricted three-body problem, with the singularity occurring where $G = 0$, that is, at a circular unperturbed orbit. If we describe the points with $G$ small in a new system of coordinates obtained from the one in [101] by letting alone $L, \lambda$ and setting

$$p = \sqrt{2G}\cos\gamma, \qquad q = \sqrt{2G}\sin\gamma$$ [104]

then $p, q$ vary in a neighborhood of the origin with the origin itself excluded.

Adding the origin of the $p$-$q$ plane then in a full neighborhood of the origin, the Hamiltonian [96] is analytic in $L, \lambda, p, q$. This is because it is analytic (cf. [96], [97]) as a function of $L, \lambda$ and $e\cos\theta_0$ and of $\cos(\theta_0 + \gamma_0)$. Since $\theta_0 = \lambda + \gamma + f_{\lambda+\gamma}$ and $\theta_0 + \gamma_0 = \lambda + f_{\lambda+\gamma}$ by [97], the Hamiltonian [96] is analytic in $L, \lambda, e\cos(\lambda + \gamma + f_{\lambda+\gamma}), \cos(\lambda + f_{\lambda+\gamma})$ for $e$ small (i.e., for $G$ small) and, by [42], $f_{\lambda+\gamma}$ is analytic in $e\sin(\lambda + \gamma)$ and $e\cos(\lambda + \gamma)$. Hence the trigonometric identities

$$e\sin(\lambda + \gamma) = \frac{p\sin\lambda + q\cos\lambda}{\sqrt{L}}\,\sqrt{1 - \frac{G}{2L}}, \qquad e\cos(\lambda + \gamma) = \frac{p\cos\lambda - q\sin\lambda}{\sqrt{L}}\,\sqrt{1 - \frac{G}{2L}}$$ [105]

together with $G = (1/2)(p^2 + q^2)$ imply that [103] is analytic near $p = q = 0$ and $L > 0$, $\lambda \in [0, 2\pi]$. The Hamiltonian becomes analytic and the new coordinates are suitable to describe motions crossing the origin: for example, by setting

$$C \stackrel{\rm def}{=} \frac{1}{2}\Big(1 - \frac{p^2 + q^2}{4L}\Big)^{1/2} L^{-1/2}$$

[100] becomes

$$H = -\frac{g^2}{2L^2} - \omega L + \omega\,\tfrac{1}{2}(p^2 + q^2) + \delta_\varepsilon\big(L - \tfrac{1}{2}(p^2 + q^2)\big) - \frac{\varepsilon g}{2R}\,\frac{\big(L - \tfrac{1}{2}(p^2 + q^2)\big)^4}{g^2 R^2}\Big( 3\cos^2\lambda - \big( (11\cos\lambda - 3\cos 3\lambda)\, p + (7\sin\lambda + 3\sin 3\lambda)\, q \big) C \Big)$$ [106]

The KAM theorem does not apply in the form discussed above to "Cartesian coordinates," that is, when, as in [106], the unperturbed system is not assigned in action-angle variables. However, there are versions of the theorem (actually its corollaries) which do apply and therefore it becomes possible to obtain some results even for the perturbations of circular motions by the techniques that have been illustrated here.

Likewise, the Hamiltonian of the rigid body with a fixed point $O$ and subject to analytic external forces becomes singular, if expressed in the action-angle coordinates of Deprit, when the body motion nears a rotation around a principal axis or, more generally, nears a configuration in which any two of
the axes $\mathbf{i}_3$, $\mathbf{z}$, or $\mathbf{z}_0$ coincide (i.e., any two among the principal axis, the angular momentum axis and the inertial $z$-axis coincide; see the section "Rigid body"). Nevertheless, by imitating the procedure just described in the simpler cases of the circular three-body problem, it is possible to enlarge the phase space so that in the new coordinates the Hamiltonian is analytic near the singular configurations.

A regularization also arises when considering collisional orbits in the unrestricted planar three-body problem. In this respect, a very remarkable result is the regularization of collisional orbits in the planar three-body problem. After proving that if the total angular momentum does not vanish, simultaneous collisions of the three masses cannot occur within any finite time interval, the question is reduced to the regularization of two-body collisions, under the assumption that the total angular momentum does not vanish.

The local change of coordinates, which changes the relative position coordinates $(x, y)$ of two colliding bodies as $(x, y) \to (\xi, \eta)$, with $x + iy = (\xi + i\eta)^2$, is not one to one, hence it has to be regarded as an enlargement of the positions space, if points with different $(\xi, \eta)$ are considered different. However, the equations of motion written in the variables $\xi, \eta$ have no singularity at $\xi, \eta = 0$ (LEVI-CIVITA).

Another celebrated regularization is the regularization of the Schwarzschild metric, i.e., of the general relativity version of the two-body problem: it is, however, somewhat out of the scope of this review (SYNGE, KRUSKAL).

For more details, the reader is referred to Levi-Civita (1956).

Appendix 1: KAM Resummation Scheme

The idea to control the remaining contributions is to reduce the problem to the case in which there are no pairs of lines that follow each other in the tree order and which have the same current. Mark by a scale label "0" the lines, see [74], [83], of a tree whose divisors $C|\boldsymbol{\omega}_0\cdot\boldsymbol{\nu}(l)|$ are $>1$: these are lines which give no problems in the estimates. Then mark by a scale label "$\ge 1$" the lines with current $\boldsymbol{\nu}(l)$ such that $C|\boldsymbol{\omega}_0\cdot\boldsymbol{\nu}(l)| \le 2^{-n+1}$ for $n = 1$ (i.e., the remaining lines).

The lines labeled 0 are said to be on scale 0, while those labeled $\ge 1$ are said to be on scale $\ge 1$. A cluster of scale 0 will be a maximal collection of lines of scale 0 forming a connected subgraph of a tree $\theta$.

Consider only trees $\theta_0 \in \Theta_0$ of the family $\Theta_0$ of trees containing no clusters of lines with scale label 0 which have only one line entering the cluster and one exiting it with equal current.

It is useful to introduce the notion of a line $\ell_1$ situated "between" two lines $\ell, \ell'$ with $\ell' > \ell$: this will mean that $\ell_1$ precedes $\ell'$ but not $\ell$.

All trees $\theta$ in which there are some pairs $l' > l$ of consecutive lines of scale label $\ge 1$ which have equal current and such that all lines between them bear scale label 0 are obtained by "inserting" on the lines of trees in $\Theta_0$ with label $\ge 1$ any number of clusters of lines and nodes, with lines of scale 0 and with the property that the sum of the harmonics of the nodes inserted vanishes.

Consider a line $l_0 \in \theta_0 \in \Theta_0$ linking nodes $v_1 < v_2$ and labeled $\ge 1$ and imagine inserting on it a cluster $\gamma$ of lines of scale 0 with sum of the node harmonics vanishing and out of which emerges one line connecting a node $v_{\rm out}$ in $\gamma$ to $v_2$ and into which enters one line linking $v_1$ to a node $v_{\rm in} \in \gamma$. The insertion of a $k$-lines, $|\gamma| = (k+1)$-nodes, cluster changes the tree value by replacing the line factor, that will be briefly called "value of the cluster $\gamma$," as

$$\frac{\boldsymbol{\nu}_{v_1}\cdot\boldsymbol{\nu}_{v_2}}{(\boldsymbol{\omega}_0\cdot\boldsymbol{\nu}(l_0))^2} \;\to\; \frac{\boldsymbol{\nu}_{v_1}\cdot M(\gamma; \boldsymbol{\nu}(l_0))\,\boldsymbol{\nu}_{v_2}}{(\boldsymbol{\omega}_0\cdot\boldsymbol{\nu}(l_0))^2}\,\frac{1}{(\boldsymbol{\omega}_0\cdot\boldsymbol{\nu}(l_0))^2}$$ [107]

where $M$ is an $\ell \times \ell$ matrix

$$M_{rs}(\gamma; \boldsymbol{\nu}(l_0)) = \frac{\varepsilon^{|\gamma|}}{k!}\,\nu_{{\rm out}, r}\,\nu_{{\rm in}, s} \prod_{v \in \gamma} f_{\boldsymbol{\nu}_v} \prod_{\ell \in \gamma} \frac{\boldsymbol{\nu}_v\cdot\boldsymbol{\nu}_{v'}}{(\boldsymbol{\omega}_0\cdot\boldsymbol{\nu}(\ell))^2}$$

if $\ell = v'v$ denotes a line linking $v'$ and $v$. Therefore, if all possible connected clusters are inserted and the resulting values are added up, the result can be taken into account by attributing to the original line $l_0$ a factor like [107] with $M^{(0)}(\boldsymbol{\nu}(l_0)) \stackrel{\rm def}{=} \sum_\gamma M(\gamma; \boldsymbol{\nu}(l_0))$ replacing $M(\gamma; \boldsymbol{\nu}(l_0))$.

If several connected clusters are inserted on the same line and their values are summed, the result is a modification of the factor associated with the line $l_0$ into

$$\sum_{k=0}^{\infty} \boldsymbol{\nu}_{v_1}\cdot\Big(\frac{M^{(0)}(\boldsymbol{\nu}(l_0))}{(\boldsymbol{\omega}_0\cdot\boldsymbol{\nu}(l_0))^2}\Big)^k \frac{1}{(\boldsymbol{\omega}_0\cdot\boldsymbol{\nu}(l_0))^2}\,\boldsymbol{\nu}_{v_2} = \boldsymbol{\nu}_{v_1}\cdot\Big(\frac{1}{(\boldsymbol{\omega}_0\cdot\boldsymbol{\nu}(l_0))^2 - M^{(0)}(\boldsymbol{\nu}(l_0))}\Big)\boldsymbol{\nu}_{v_2}$$ [108]

The series defining $M^{(0)}$ involves, by construction, only trees with lines of scale 0, hence with large divisors, so that it converges to a matrix of small size of order $\varepsilon$ (actually $\varepsilon^2$, more precisely) if $\varepsilon$ is small enough.

Convergence can be established by simply remarking that the series defining $M^{(1)}$ is built with lines with values $>(1/2)$ of the propagator, so that it certainly converges for $\varepsilon$ small enough (by the estimates in the section "Perturbing functions," where the propagators were identically 1) and the
sum is of order " (actually "2 ), hence <1. However, follow each other while any line between them has
such an argument cannot be repeated when dealing lower scale (i.e., 0), here between means preced-
with lines with smaller propagators (which still have ing l0 but not preceding l, as above.
to be discussed). Therefore, a method not relying on Therefore, a scale-independent method has to be
so trivial a remark on the size of the propagators has devised to check the convergence for M(1) and for the
eventually to be used when considering lines of scale matrices to be introduced later to deal with even
higher than 1, as it will soon become necessary. smaller propagators. This is achieved by the following
The advantage of the collection of terms achieved extension of Siegels theorem mentioned in the section
with [108] is that we can represent h as a sum of Quasiperiodicity and KAM stability:
values of trees which are simpler because they
Theorem 8 Let w 0 satisfy [74] and set w = Cw 0 .
contain no pair of lines of scale 1 with in between
Consider the contribution to the sum in [82] from
lines of scale 0 with total sum of the node harmonics
graphs
in which
vanishing. The price is that the divisors are now more
involved and we even have a problem due to the fact (i) no pairs 0 > of lines which lie on the same
that we have not proved that the series in [108] path to the root carry the same current n if all
converges. In fact, it is a geometric series whose value lines 1 between them have current n(1 ) such
is the RHS of [108] obtained by the sum rule [79] that jw n(1 )j > 2jw nj;
unless we can prove that the ratio of the geometric (ii) the node harmonics are bounded by jnj N for
series is <1. This is trivial in this case by the previous some N.
remark: but it is better to note that there is another
Then the number of lines in
with divisor w n
reason for convergence, whose use is not really
satisfying 2n < jw n j 2n1 does not exceed
necessary here but will become essential later.
4 Nk2n= , n = 1, 2, . . . .
The property that the ratio of the geometric series
is <1 can be regarded as due to the consequence of This implies, by the same estimates in [85], that
the cancellation mentioned in the section Quasi- the series defining M(1) converges. Again, it must be
periodicity and KAM stability which can be checked that there are cancellations implying that
shown to imply that the ratio is <1 because M(1) (n) = "2 (w 0 n)2 m(1) (n) with jm(1) (n)j < D0 for
M(0) (n) = "2 (w 0 n)2 m(0) (n) with C jm(0) (n)j < D0 the same D0 > 0 and the same "0 .
for some D0 > 0 and for all j"j < "0 for some "0 . At this point, one deals with trees containing only
Then for small " the divisor in [108] is essentially lines carrying labels 0, 1, 2, and the line factors for
still what it was before starting the resummation. the lines = v0 v of scale 0 are n v0 n v =(w 0 n())2 ,
At this point, an induction can be started. Consider those of the lines = v0 v of scale 1 have line factors
trees evaluated with the new rule and place a scale n v0 (w 0 n()2 M(0) (n()))1 n v , and those of the
level 2 on the lines with C jw 0 n(l)j 2n1 for lines = v0 v of scale 2 have line factors
n = 2: leave the label 0 on the lines already marked
so and label by 1 the other lines. The lines of scale n v0 w 0 n2 M1 n1 n v
1 will satisfy 2n < jw 0 n(l)j 2n1 for n = 1.
The graphs will now possibly contain lines of scale 0, Furthermore, no pair of lines of scale 1 or of scale
1 or 2 while lines with label 1 no longer can 2 with the same momentum and with only lines
appear, by construction. of lower scale (i.e., of scale 0 in the first case or of
A cluster of scale 1 will be a maximal collection of scale 0, 1 in the second) between them can
lines of scales 0, 1 forming a connected subgraph of follow each other.
a tree
and containing at least one line of scale 1. This procedure can be iterated until, after infi-
The construction carried out by considering clusters nitely many steps, the problem is reduced to the
of scale 0 can be repeated by considering trees
1 2 1 , evaluation of tree values in which each line carries a
with 1 the collection of trees with lines marked 0, 1, scale label n and there are no pairs of lines which
or 2 and in which no pairs of lines with equal follow each other and which have only lines of
momentum appear to follow each other if between lower scale in between. Then the Siegel argument
them there are only lines marked 0 or 1. applies once more and the series so resumed is an
Insertion of connected clusters of such lines on a absolutely convergent series of functions analytic in
line l0 of
1 leads to define a matrix M(1) formed by ": hence the original series is convergent.
summing tree values of clusters with lines of scales Although at each step there is a lower bound on the
0 or 1 evaluated with the line factors defined in denominators, it would not be possible to avoid using
[107] and with the restriction that in there are no Siegel's theorem. In fact, the lower bound would become
pairs of lines < 0 with the same current and which worse and worse as the scale increases. In order to check
the estimates of the constants $D_0$, $\varepsilon_0$ which control the scale independence of the convergence of the various series, it is necessary to take advantage of the theorem, and of the absence (at each step) of the necessity of considering trees with pairs of consecutive lines with equal momentum and intermediate lines of higher scale.

One could also perform the analysis by bounding $h^{(k)}$ order by order with no resummations (i.e., without changing the line factors) and exhibiting the necessary cancellations. Alternatively, the paths that Kolmogorov, Arnold and Moser used to prove the first three (somewhat different) versions of the theorem, by successive approximations of the equations for the tori, can be followed.

The invariant tori are Lagrangian manifolds just as the unperturbed ones (cf. comments after [31]) and, in the case of the Hamiltonian [80], the generating function $A\cdot\psi + \Phi(A,\psi)$ can be expressed in terms of their parametric equations

\[
\begin{aligned}
\Phi(A,\psi) &= G(\psi) + a\cdot\psi + h(\psi)\cdot\big(A - \omega - \Delta h(\psi)\big)\\
\partial_\psi G(\psi) &\overset{\mathrm{def}}{=} -\Delta h(\psi) + h(\psi)\cdot\partial_\psi \Delta h(\psi) - a\\
a &\overset{\mathrm{def}}{=} \int \big(-\Delta h(\psi) + h(\psi)\cdot\partial_\psi \Delta h(\psi)\big)\,\frac{d\psi}{(2\pi)^{\ell}} = \int h(\psi)\cdot\partial_\psi \Delta h(\psi)\,\frac{d\psi}{(2\pi)^{\ell}}
\end{aligned}
\qquad [109]
\]

where $\Delta = (\omega\cdot\partial_\psi)$ and the invariant torus corresponds to $A' = \omega$ in the map $\alpha = \psi + \partial_A \Phi(A,\psi)$ and $A' = A + \partial_\psi \Phi(A,\psi)$. In fact, by [109] the latter becomes $A' = A - \Delta h$ and, from the second of [75] written for $f$ depending only on the angles $\alpha$, it is $A = \omega + \Delta h$ when $A$, $\alpha$ are on the invariant torus.

Note that if $a$ exists it is necessarily determined by the third relation in [109] but the check that the second equation in [109] is soluble (i.e., that the RHS is an exact gradient up to a constant) is nontrivial. The canonical map generated by $A\cdot\psi + \Phi(A,\psi)$ is also defined for $A'$ close to $\omega$ and foliates the neighborhood of the invariant torus with other tori: of course, for $A' \neq \omega$ the tori defined in this way are, in general, not invariant.

The reader is referred to Gallavotti et al. (2004) for more details.

Appendix 2: Coriolis and Lorentz Forces - Larmor Precession

Larmor precession refers to the motion of an electrically charged particle in a magnetic field $H$ (in an inertial frame of reference). It is due to the Lorentz force which, on a unit mass with unit charge, produces an acceleration $\ddot R = v \wedge H$ if the speed of light is $c = 1$.

Therefore, if $H = H\,k$ is directed along the $k$-axis, the acceleration it produces is the same as that which the Coriolis force would impress on a unit mass located in a reference frame which rotates with angular velocity $\omega_0 k$ around the $k$-axis if $H = 2\omega_0 k$.

The above remarks imply that a homogeneous sphere electrically charged uniformly with a unit charge and freely pivoting about its center in a constant magnetic field $H$ directed along the $k$-axis undergoes the same motion as it would follow if not subject to the magnetic field but seen in a noninertial reference frame rotating at constant angular velocity $\omega_0$ around the $k$-axis if $H$ and $\omega_0$ are related by $H = 2\omega_0$: in this frame, the Coriolis force is interpreted as a magnetic field.

This holds, however, only if the centrifugal force has zero moment with respect to the center: true in the spherical symmetry case only. In spherically nonsymmetric cases, the centrifugal forces have in general nonzero moment, so the equivalence between Coriolis force and the Lorentz force is only approximate.

The Larmor theorem makes this more precise. It gives a quantitative estimate of the difference between the motion of a general system of particles of mass $m$ in a magnetic field and the motion of the same particles in a rotating frame of reference but in the absence of a magnetic field. The approximation is estimated in terms of the size of the Larmor frequency $eH/2mc$, which should be small compared to the other characteristic frequencies of the motion of the system: the physical meaning is that the centrifugal force should be small compared to the other forces.

The vector potential $A$ for a constant magnetic field in the $k$-direction, $H = 2\omega_0 k$, is $A = 2\omega_0\, k \wedge R \equiv 2\omega_0 R^{\perp}$. Therefore, from the treatment of the Coriolis force in the section "Three-body problem" (see [95]), the motion of a charge $e$ with mass $m$ in a magnetic field $H$ with vector potential $A$ and subject to other forces with potential $W$ can be described, in an inertial frame and in generic units, in which the speed of light is $c$, by a Hamiltonian

\[
H = \frac{1}{2m}\Big(p - \frac{e}{c}A\Big)^2 + W(R) \qquad [110]
\]

where $p = m\dot R + (e/c)A$ and $R$ are canonically conjugate variables.

Further Reading

Arnold VI (1989) Mathematical Methods of Classical Mechanics. Berlin: Springer.
Calogero F and Degasperis A (1982) Spectral Transform and Solitons. Amsterdam: North-Holland.
Chierchia L and Valdinoci E (2000) A note on the construction of Hamiltonian trajectories along heteroclinic chains. Forum Mathematicum 12: 247-255.
Fassò F (1998) Quasi-periodicity of motions and complete integrability of Hamiltonian systems. Ergodic Theory and Dynamical Systems 18: 1349-1362.
Gallavotti G (1983) The Elements of Mechanics. New York: Springer.
Gallavotti G, Bonetto F, and Gentile G (2004) Aspects of the Ergodic, Qualitative and Statistical Properties of Motion. Berlin: Springer.
Kolmogorov N (1954) On the preservation of conditionally periodic motions. Doklady Akademia Nauk SSSR 96: 527-530.
Landau LD and Lifshitz EM (1976) Mechanics. New York: Pergamon Press.
Levi-Civita T (1956) Opere Matematiche. Accademia Nazionale dei Lincei. Bologna: Zanichelli.
Moser J (1962) On invariant curves of an area preserving mapping of the annulus. Nachrichten Akademie Wissenschaften Göttingen 11: 1-20.
Nekhorossev V (1977) An exponential estimate of the time of stability of nearly integrable Hamiltonian systems. Russian Mathematical Surveys 32(6): 1-65.
Poincaré H (1987) Méthodes nouvelles de la mécanique céleste, vol. I. Paris: Gauthier-Villars. (Reprinted by Gabay, Paris, 1987.)

Introductory Article: Differential Geometry
finite dimensional, Fréchet, Banach, or Hilbert for example) is a topological space $M$ equipped with a family of local coordinate charts $(U_i, \varphi_i)_{i\in I}$ such that the open subsets $U_i \subset M$ cover $M$ and where $\varphi_i : U_i \to V$, $i \in I$, are homeomorphisms which give rise to smooth transition maps $\varphi_i \circ \varphi_j^{-1} : \varphi_j(U_i \cap U_j) \to \varphi_i(U_i \cap U_j)$.

An $n$-dimensional differentiable manifold is a differentiable manifold modeled on $\mathbb{R}^n$. The sphere $S^{n-1} := \{(x_1,\dots,x_n) \in \mathbb{R}^n,\ \sum_{i=1}^n x_i^2 = 1\}$ is a differentiable manifold of dimension $n-1$.

Simple differentiable curves in $\mathbb{R}^n$ are one-dimensional differentiable manifolds locally specified by coordinates $x(t) = (x_1(t),\dots,x_n(t)) \in \mathbb{R}^n$, where $t \mapsto x_j(t)$ is of class $C^k$. The tangent at point $x(t_0)$ to such a curve, which is a straight line passing through this point with direction given by the vector $x'(t_0)$, generalizes to the concept of tangent space $T_mM$ at point $m \in M$ of a smooth manifold $M$ modeled on $V$, which is a vector space isomorphic to $V$ spanned by tangent vectors at point $m$ to curves $\gamma(t)$ of class $C^1$ on $M$ such that $\gamma(t_0) = m$.

In order to make this more precise, one needs the notion of differentiable mapping. Given two differentiable manifolds $M$ and $N$, a mapping $f : M \to N$ is differentiable at point $m$ if, for every chart $(U, \varphi)$ of $M$ containing $m$ and every chart $(V, \psi)$ of $N$ such that $f(U) \subset V$, the mapping $\psi \circ f \circ \varphi^{-1} : \varphi(U) \to \psi(V)$ is differentiable at point $\varphi(m)$. In particular, differentiable mappings $f : M \to \mathbb{R}$ form the algebra $C^\infty(M,\mathbb{R})$ of smooth real-valued functions on $M$. Differentiable mappings $\gamma : [a,b] \to M$ from an interval $[a,b] \subset \mathbb{R}$ to a differentiable manifold $M$ are called differentiable curves on $M$. A differentiable mapping $f : M \to N$ which is invertible and with differentiable inverse $f^{-1} : N \to M$ is called a diffeomorphism.

The derivative of a function $f \in C^\infty(M,\mathbb{R})$ along a curve $\gamma : [a,b] \to M$ at point $\gamma(t_0) \in M$ with $t_0 \in [a,b]$ is given by

\[
Xf := \frac{d}{dt}\Big|_{t=t_0} f\circ\gamma(t)
\]

and the map $f \mapsto Xf$ is called the tangent vector to the curve $\gamma$ at point $\gamma(t_0)$. Tangent vectors to some curve $\gamma : [a,b] \to M$ at a given point $m \in \gamma([a,b])$ form a vector space $T_mM$ called the tangent space to $M$ at point $m$.

A (smooth) map which, to a point $m \in M$, assigns a tangent vector $X \in T_mM$ is called a (smooth) vector field. It can also be seen as a derivation $\tilde X : f \mapsto \tilde X f$ on $C^\infty(M,\mathbb{R})$ defined by $(\tilde X f)(m) := X(m)f$ for any $m \in M$, and the bracket of vector fields is thereby defined from the operator bracket $\widetilde{[X,Y]} := \tilde X \circ \tilde Y - \tilde Y \circ \tilde X$. The linear operations on tangent vectors carry over to vector fields, $(X+Y)(m) := X(m) + Y(m)$, $(\lambda X)(m) := \lambda X(m)$ for any $m \in M$ and for any $X, Y \in T_mM$, $\lambda \in \mathbb{R}$, so that vector fields on $M$ build a linear space.

One can generate tangent vectors to $M$ via local one-parameter groups of differentiable transformations of $M$, that is, mappings $(t,m) \mapsto \varphi_t(m)$ from $]-\varepsilon,\varepsilon[\, \times U$ to $U$ (with $\varepsilon > 0$ and $U \subset M$ an open subset of $M$) such that $\varphi_0 = \mathrm{Id}$, $\varphi_{t+s} = \varphi_t \circ \varphi_s$ $\forall s,t \in\, ]-\varepsilon,\varepsilon[$ with $t+s \in\, ]-\varepsilon,\varepsilon[$, and $m \mapsto \varphi_t(m)$ is a diffeomorphism of $U$ onto an open subset $\varphi_t(U)$. The tangent vector at $t = 0$ to the curve $\gamma(t) = \varphi_t(m)$ yields a tangent vector to $M$ at point $m = \gamma(0)$. Conversely, when $M$ is finite dimensional, the fundamental theorem for systems of ordinary differential equations yields, for any vector field $X$ on $M$, the existence (around any point $m \in M$) of a local one-parameter group of local transformations $\varphi :\, ]-\varepsilon,\varepsilon[\, \times U \to M$ (with $U$ an open subset containing $m$) which induces the tangent vector $X(m) \in T_mM$.

A differentiable mapping $\varphi : M \to N$ induces a map $\varphi_*(m) : T_mM \to T_{\varphi(m)}N$ defined by $\varphi_*X f = X(f\circ\varphi)$. An immersion of a manifold $M$ in a manifold $N$ is a differentiable mapping $\varphi : M \to N$ such that the maps $\varphi_*(m)$ are injective at any point $m \in M$. Such a map is an embedding if it is moreover injective, in which case $\varphi(M) \subset N$ is a submanifold of $N$. The unit sphere $S^n$ is a submanifold of $\mathbb{R}^{n+1}$. Whitney showed that every smooth real $n$-dimensional manifold can be embedded in $\mathbb{R}^{2n+1}$.

A differentiable manifold whose coordinate charts take values in a complex vector space $V$ and whose transition maps are holomorphic is called a complex manifold, which is complex $n$-dimensional if $V = \mathbb{C}^n$. The complex projective space $\mathbb{C}P^n$, the set of complex straight lines through 0 in $\mathbb{C}^{n+1}$, is a compact complex manifold of dimension $n$. Similarly to the notion of differentiable mapping between differentiable manifolds, we have the notion of holomorphic mapping between complex manifolds.

A smooth family $m \mapsto J_m$ of endomorphisms of the tangent spaces $T_mM$ to a differentiable manifold $M$ such that $J_m^2 = -\mathrm{Id}$ gives rise to an almost-complex manifold. The prototype is the almost-complex structure on $\mathbb{C}^n$ defined by $J(\partial_{x_i}) = \partial_{y_i}$; $J(\partial_{y_i}) = -\partial_{x_i}$ with $z = (x_1 + iy_1, \dots, x_n + iy_n) \in \mathbb{C}^n$, which can be transferred to a complex manifold $M$ by means of local charts. An almost-complex structure $J$ on a manifold $M$ is called complex if $M$ is the underlying differentiable manifold of a complex manifold which induces $J$ in this way.

Studying smooth functions on a differentiable manifold can provide information on the topology of the manifold: for example, the behavior of a smooth function on a compact manifold at its critical points is strongly restricted by the topological properties of the manifold. This leads to the Morse
critical point theory which extends to infinite-dimensional manifolds and, among other consequences, leads to conclusions on extremals or closed extremals of variational problems. Rather than privileging points on a manifold, one can study instead the geometry of manifolds from the point of view of spaces of functions, which leads to an algebraic approach to differential geometry. The initial concept there is a commutative ring (which becomes a possibly noncommutative algebra in the framework of noncommutative geometry), namely the ring of smooth functions on the manifold, while the manifold itself is defined in terms of the ring as the space of maximal ideals. In particular, this point of view proves to be fruitful to understand supermanifolds, a generalization of manifolds which is important for supersymmetric field theories.

One can further consider the sheaf of smooth functions on an open subset of the manifold; this point of view leads to sheaf theory which provides a unified approach to establishing connections between local and global properties of topological spaces.

Metric Properties

Riemann focused on the metric properties of manifolds but the first clear formulation of the concept of a manifold equipped with a metric was given by Weyl in Die Idee der Riemannschen Fläche. A Riemannian metric on a differentiable manifold $M$ is a positive-definite scalar product $g_m$ on $T_mM$ for every point $m \in M$ depending smoothly on the point $m$. A manifold equipped with a Riemannian metric is called a Riemannian manifold. A Weyl transformation, which is multiplying the metric by a smooth positive function, yields a new Riemannian metric with the same angle measurement as the original one, and hence leaves the conformal structure on $M$ unchanged.

Riemann also suggested considering metrics on the tangent spaces that are not induced from scalar products; metrics on the manifold built this way were first systematically investigated by Finsler and are therefore called Finsler metrics. Geodesics on a Riemannian manifold $M$, which correspond to smooth curves $\gamma : [a,b] \to M$ that minimize the length functional

\[
L(\gamma) := \frac{1}{2}\int_a^b \sqrt{g_{\gamma(t)}\Big(\frac{d\gamma}{dt}, \frac{d\gamma}{dt}\Big)}\; dt
\]

then generalize to curves which realize the shortest distance between two points chosen sufficiently close.

Euclid's axioms, which naturally lead to Riemannian geometry, are also satisfied up to the axiom of parallelism by a geometry developed by Lobachevsky in 1829 and Bolyai in 1832. Non-Euclidean geometries actually played a major role in the development of differential geometry and Lobachevsky's work inspired Riemann and later Klein.

Dropping the positivity assumption for the bilinear forms $g_m$ on $T_mM$ leads to Lorentzian manifolds, which are $(n+1)$-dimensional smooth manifolds equipped with bilinear forms on the tangent spaces with signature $(1, n)$. These occur in general relativity, and tangent vectors with negative, positive, or vanishing squared length are called timelike, spacelike, and lightlike, respectively.

Just as complex vector spaces can be equipped with positive-definite Hermitian products, a complex manifold $M$ can come equipped with a Hermitian metric, namely a positive-definite Hermitian product $h_m$ on $T_mM$ for every point $m \in M$ depending smoothly on the point $m$; every Hermitian metric induces a Riemannian one given by its real part. The complex projective space $\mathbb{C}P^n$ comes naturally equipped with the Fubini-Study Hermitian metric.

Transformation Groups

Metric properties can be seen from the point of view of transformation groups. Poncelet in his Traité projectif des figures (1822) had investigated classical Euclidean geometry from a projective geometric point of view, but it was not until Cayley (1858) that metric properties were interpreted as those stable under any projective transformation which leaves cyclic points (points at infinity on the imaginary axis of the complex plane) invariant. Transformation groups were further investigated by Lie, leading to the modern concept of Lie group, a smooth manifold endowed with a group structure such that the group operations are smooth.

A vector field $X$ on a Lie group $G$ is called left- (resp. right-) invariant if it is invariant under left translations $L_g : h \mapsto gh$ (resp. right translations $R_g : h \mapsto hg$) for every $g \in G$, that is, if $(L_g)_* X(h) = X(gh)$ $\forall (g,h) \in G^2$ (resp. $(R_g)_* X(h) = X(hg)$ $\forall (g,h) \in G^2$). The set of all left-invariant vector fields equipped with the sum, scalar multiplication, and the bracket operation on vector fields forms an algebra called the Lie algebra of $G$.

The group $Gl_n(\mathbb{R})$ (resp. $Gl_n(\mathbb{C})$) of all real (resp. complex) invertible $n \times n$ matrices is a Lie group with Lie algebra the algebra $gl_n(\mathbb{R})$ (resp. $gl_n(\mathbb{C})$) of all real (resp. complex) $n \times n$ matrices, and the bracket operation reads $[A,B] = AB - BA$.

The orthogonal (resp. unitary) group $O_n(\mathbb{R}) := \{A \in Gl_n(\mathbb{R}),\ A^tA = 1\}$, where $A^t$ denotes the transposed matrix (resp. $U_n(\mathbb{C}) := \{A \in Gl_n(\mathbb{C}),\ A^*A = 1\}$, where $A^* = \bar A^t$), is a compact Lie group with Lie
algebra $o_n(\mathbb{R}) := \{A \in gl_n(\mathbb{R}),\ A^t = -A\}$ (resp. $u_n(\mathbb{C}) := \{A \in gl_n(\mathbb{C}),\ A^* = -A\}$).

A left-invariant vector field $X$ on a finite-dimensional Lie group $G$ (or equivalently an element $X$ of the Lie algebra of $G$) generates a global one-parameter group of transformations $\varphi_X(t)$, $t \in \mathbb{R}$. The mapping from the Lie algebra of $G$ into $G$ defined by $\exp(X) := \varphi_X(1)$ is called the exponential mapping. The exponential mapping on $Gl_n(\mathbb{R})$ (resp. $Gl_n(\mathbb{C})$) is given by the series $\exp(A) = \sum_{i=0}^{\infty} A^i/i!$.

As symmetry groups of physical systems, Lie groups play an important role in physics, in particular in quantum mechanics and Yang-Mills theory. Infinite-dimensional Lie groups arise as symmetry groups, such as the group of diffeomorphisms of a manifold in general relativity, the group of gauge transformations in Yang-Mills theory, and the group of Weyl transformations of metrics on a surface in string theory. The principle "the physics should not depend on how it is described" translates to an invariance under the action of the (possibly infinite-dimensional) group of symmetries of the theory. Anomalies arise when such an invariance holds for the classical action of a physical theory but breaks at the quantized level.

In his Erlangen program (1872), Klein puts the concept of transformation group in the foreground, introducing a novel idea by which one should consider a space endowed with some properties as a set of objects invariant under a given group of transformations. One thereby reaches a classification of geometric results according to which group is relevant in a particular problem as, for example, the projective linear group for projective geometry, the orthogonal group for Riemannian geometry, or the symplectic group for symplectic geometry.

Fiber Bundles

Transformation groups give rise to principal fiber bundles which play a major role in Yang-Mills theory. The notion of fiber bundle first arose out of questions posed in the 1930s on the topology and the geometry of manifolds, and by 1950 the definition of fiber bundle had been clearly formulated by Steenrod.

A smooth fiber bundle with typical fiber a manifold $F$ is a triple $(E, \pi, B)$, where $E$ and $B$ are smooth manifolds called the total space and the base space, and $\pi : E \to B$ is a smooth surjective map called the projection of the bundle, such that the preimage $\pi^{-1}(b)$ of a point $b \in B$, called the fiber of the bundle over $b$, is isomorphic to $F$ and any base point $b$ has a neighborhood $U \subset B$ with preimage $\pi^{-1}(U)$ diffeomorphic to $U \times F$, where the diffeomorphisms commute with the projection on the base space. Smooth sections of $E$ are maps $\sigma : B \to E$ such that $\pi \circ \sigma = I_B$.

When $F$ is a vector space and when, given open subsets $U_i \subset B$ that cover $B$ with corresponding coordinate charts $(U_i, \varphi_i)_{i\in I}$, the local diffeomorphisms $\Phi_i : \pi^{-1}(U_i) \simeq \varphi_i(U_i) \times F$ give rise to transition maps $\Phi_i \circ \Phi_j^{-1} : \varphi_j(U_i \cap U_j) \times F \to \varphi_i(U_i \cap U_j) \times F$ that are linear in the fiber, the bundle is called a vector bundle. The tangent bundle $TM = \bigcup_{m\in M} T_mM$ to a differentiable manifold $M$ modeled on a vector space $V$ is a vector bundle with typical fiber $V$ and transition maps $\Phi_{ij} = (\varphi_i \circ \varphi_j^{-1}, d(\varphi_i \circ \varphi_j^{-1}))$ expressed in terms of the differentials of the transition maps on the manifold $M$. So are the cotangent bundle, the dual of the tangent bundle, and tensor products of the tangent and cotangent vector bundles, with typical fiber the dual $V^*$ and tensor products of $V$ and $V^*$. Vector fields defined previously are sections of the tangent bundle, 1-forms on $M$ are sections of the cotangent bundle, and contravariant tensors, resp. covariant tensors, are sections of tensor products of the tangent, resp. cotangent, bundles. A differentiable mapping $\varphi : M \to N$ takes covariant $p$-tensor fields on $N$ to their pullbacks by $\varphi$, covariant $p$-tensors on $M$ given by

\[
(\varphi^* T)(X_1, \dots, X_p) := T(\varphi_* X_1, \dots, \varphi_* X_p)
\]

for any vector fields $X_1, \dots, X_p$ on $M$.

Differentiating a smooth function $f$ on $M$ gives rise to a 1-form $df$ on $M$. More generally, exterior $p$-forms are antisymmetric smooth covariant $p$-tensors so that $\omega(X_{\sigma(1)}, \dots, X_{\sigma(p)}) = \varepsilon(\sigma)\,\omega(X_1, \dots, X_p)$ for any vector fields $X_1, \dots, X_p$ on $M$ and any permutation $\sigma \in \Sigma_p$ with signature $\varepsilon(\sigma)$.

Riemannian metrics are covariant 2-tensors and the space of Riemannian metrics on a manifold $M$ is an infinite-dimensional manifold which arises as a configuration space in string theory and general relativity.

A principal bundle is a fiber bundle $(P, \pi, B)$ with typical fiber a Lie group $G$ acting freely and properly on the total space $P$ via a right action $(p, g) \in P \times G \mapsto pg = R_g(p) \in P$ and such that the local diffeomorphisms $\pi^{-1}(U) \simeq U \times G$ are $G$-equivariant. Given a principal fiber bundle $(P, \pi, B)$ with structure group a finite-dimensional Lie group $G$, the action of $G$ on $P$ induces a homomorphism which to an element $X$ of the Lie algebra of $G$ assigns a vector field $X^*$ on $P$ called the fundamental vector field generated by $X$. It is defined at $p \in P$ by

\[
X^*(p) := \frac{d}{dt}\Big|_{t=0} R_{\exp(tX)}(p)
\]

where exp is the exponential map on $G$.
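As a small numerical illustration of the exponential mapping on matrix groups, the following sketch evaluates the series $\exp(A) = \sum_i A^i/i!$ by truncation and checks, for one skew-symmetric matrix (an element of $o_2(\mathbb{R})$), that the result is orthogonal and that $t \mapsto \exp(tA)$ behaves as a one-parameter group. The helper names (`mat_exp`, `mat_mul`, and so on) and the truncation order are assumptions made for this example only; this is a sketch, not a robust matrix exponential.

```python
import math

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def transpose(a):
    return [list(col) for col in zip(*a)]

def scale(a, s):
    return [[s * x for x in row] for row in a]

def mat_exp(a, terms=40):
    """Truncated series exp(A) = sum_{i>=0} A^i / i! (illustrative only)."""
    n = len(a)
    result = identity(n)
    term = identity(n)  # holds A^i / i! at step i
    for i in range(1, terms):
        term = [[x / i for x in row] for row in mat_mul(term, a)]
        result = [[r + t for r, t in zip(rr, tr)]
                  for rr, tr in zip(result, term)]
    return result

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Hypothetical test data: A is skew-symmetric (A^t = -A), so exp(A) should
# be a rotation, i.e. an element of the orthogonal group.
t = 0.7
A = [[0.0, -t], [t, 0.0]]
R = mat_exp(A)
rotation = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]
```

For this particular `A`, `R` agrees with the rotation by the angle `t` up to rounding, `transpose(R)` times `R` is the identity, and `mat_exp(scale(A, 2.0))` agrees with `mat_mul(R, R)`, which is the group law $\exp((s+t)A) = \exp(sA)\exp(tA)$ at $s = t = 1$.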
Given an action of $G$ on a vector space $V$, one builds from a principal bundle with typical fiber $G$ an associated vector bundle with typical fiber $V$. Principal bundles are essential in gauge theory; $U(1)$-principal bundles arise in electromagnetism and nonabelian structure groups arise in Yang-Mills theory. There the fields are connections on the principal bundle, and the action of gauge transformations on (irreducible) connections gives rise to an infinite-dimensional principal bundle over the moduli space with structure group given by gauge transformations. Infinite-dimensional bundles arise in other field theories such as string theory, where the moduli space corresponds to inequivalent complex structures on a Riemann surface and the infinite-dimensional structure group is built up from Weyl transformations of the metric and diffeomorphisms of the surface.

Connections

On a manifold there is no canonical method to identify tangent spaces at different points. Such an identification, which is needed in order to differentiate vector fields, can be achieved on a Riemannian manifold via parallel transport of the vector fields. The basic concepts of the theory of covariant differentiation on a Riemannian manifold were given at the end of the nineteenth century by Ricci and, in a more complete form, in 1901 in collaboration with Levi-Civita in Méthodes de calcul différentiel absolu et leurs applications; on a Riemannian manifold, it is possible to define in a canonical manner a parallel displacement of tangent vectors and thereby to differentiate vector fields covariantly using what has since been called the Levi-Civita connection.

More generally, a (linear) connection (or equivalently a covariant derivation) on a vector bundle $E$ over a manifold $M$ provides a way to identify fibers of the vector bundle at different points; it is a map $\nabla$ taking sections $\sigma$ of $E$ to $E$-valued 1-forms on $M$ which satisfies a Leibniz rule, $\nabla(f\sigma) = df \otimes \sigma + f\,\nabla\sigma$, for any smooth function $f$ on $M$. When $E$ is the tangent bundle over $M$, curves $\gamma$ on the manifold with covariantly constant velocity, $\nabla\dot\gamma(t) = 0$, give rise to geodesics. Given an initial velocity $\dot\gamma(0) = X \in T_mM$ and provided $X$ has small enough norm, $\gamma_X(1)$ defines a point on the corresponding geodesic, and the map $\exp : X \mapsto \gamma_X(1)$, a diffeomorphism from a neighborhood of 0 in $T_mM$ to a neighborhood of $m \in M$, is called the exponential map of $\nabla$.

The concept of connection extends to principal bundles, where it was developed by Ehresmann building on the work of Cartan. A connection on a principal bundle $(P, \pi, B)$ with structure group $G$, which is a smooth equivariant (under the action of the group $G$) decomposition of the tangent space $T_pP = H_pP \oplus V_pP$ at each point $p$ into a horizontal space $H_pP$ and the vertical space $V_pP = \mathrm{Ker}\, d\pi_p$, gives rise to a linear connection on the associated vector bundle.

A connection on $P$ gives rise to a 1-form $\omega$ on $P$ with values in the Lie algebra of the structure group $G$ called the connection 1-form and defined as follows. For each $X \in T_pP$, $\omega(X)$ is the unique element $U$ of the Lie algebra of $G$ such that the corresponding fundamental vector field $U^*(p)$ at point $p$ coincides with the vertical component of $X$. In particular, $\omega(U^*) = U$ for any element $U$ of the Lie algebra of $G$.

The space of connections, which is an infinite-dimensional manifold, arises as a configuration space in Yang-Mills theory and also comes into play in the Seiberg-Witten theory.

Geometric Differential Operators

From connections one defines a number of differential operators on a Riemannian manifold, among them second-order Laplacians. In particular, the Laplace-Beltrami operator $f \mapsto -\mathrm{tr}(\nabla^{T^*M} df)$ on smooth functions, where $\nabla^{T^*M}$ is the connection on the cotangent bundle induced by the Levi-Civita connection on $M$, generalizes the ordinary Laplace operator on Euclidean space. This in turn generalizes to second-order operators $\Delta^E := -\mathrm{tr}(\nabla^{T^*M\otimes E}\,\nabla^E)$ acting on smooth sections of a vector bundle $E$ over a Riemannian manifold $M$, where $\nabla^E$ is a connection on $E$ and $\nabla^{T^*M\otimes E}$ the connection on $T^*M \otimes E$ induced by $\nabla^E$ and the Levi-Civita connection on $M$.

The Dirac operator on a spin Riemannian manifold, a first-order differential operator whose square coincides with the Laplace-Beltrami operator up to zeroth-order terms, can be best understood going back to the initial idea of Dirac. A first-order differential operator with constant matrix coefficients $\sum_{i=1}^n \gamma_i\,(\partial/\partial x_i)$ has square given by the Laplace operator $-\sum_{i=1}^n \partial^2/\partial x_i^2$ on $\mathbb{R}^n$ if and only if its coefficients satisfy the Clifford relations

\[
\gamma_i^2 = -1 \quad \forall i = 1, \dots, n, \qquad
\gamma_i\gamma_j + \gamma_j\gamma_i = 0 \quad \forall i \neq j
\]

The resulting Clifford algebra, once complexified, is isomorphic in even dimensions $n = 2k$ to the space $\mathrm{End}(S_n)$ (and $\mathrm{End}(S_n) \oplus \mathrm{End}(S_n)$ in odd dimensions $n = 2k+1$) of endomorphisms of the space $S_n = \mathbb{C}^{2^k}$ of complex $n$-spinors. When instead of the canonical metric on $\mathbb{R}^n$ one starts from the metric on the tangent bundle $TM$ induced by the Riemannian
metric on $M$ and provided the corresponding spinor spaces patch up to a spinor bundle over $M$, $M$ is called a spin manifold. The Dirac operator on a spin Riemannian manifold $M$ is a first-order differential operator acting on spinors given by $D_g = \sum_{i=1}^n \gamma_i \nabla_{e_i}$, where $\nabla$ is the connection on spinors (sections of the spinor bundle $S$) induced by the Levi-Civita connection and $e_1, \dots, e_n$ is an orthonormal frame of the tangent bundle $TM$. This is a particular case of more general twisted Dirac operators $D_g^W$ on a twisted spinor bundle $S \otimes W$ equipped with the connection $\nabla^{S\otimes W}$ which combines the connection $\nabla$ with a connection $\nabla^W$ on an auxiliary vector bundle $W$. Their square $(D_g^W)^2$ relates to the Laplacian $\Delta^{S\otimes W}$ built from this twisted connection via the Lichnerowicz formula, which is useful for estimates on the spectrum of the Dirac operator in terms of the underlying geometric data.

When there is no spin structure on $M$, one can still hope for a Spin$^c$ structure and a Dirac operator $D^c$ associated with a connection compatible with that structure. In particular, every compact orientable 4-manifold can be equipped with a Spin$^c$ structure and one can build invariants of the differentiable manifold, called Seiberg-Witten invariants, from solutions of a system of two partial differential equations, one of which is the Dirac equation $D^c\psi = 0$ associated with a connection compatible with the Spin$^c$ structure and the other a nonlinear equation involving the curvature.

Curvature

The concept of curvature, which is now understood in terms of connections (the curvature of a connection $\nabla$ is defined by $\Omega = \nabla^2$), historically arose prior to that of connection. In its modern form, the concept of curvature dates back to Gauss. Using a spherical representation of surfaces (the Gauss map $\nu$, which sends a point $m$ of an oriented surface $\Sigma \subset \mathbb{R}^3$ to the outward pointing unit normal vector $\nu_m$), Gauss defined what has since been called the Gaussian curvature $K_m$ at a point $m \in U$ as the limit, when the area of $U$ tends to zero, of the ratio $\mathrm{area}(\nu(U))/\mathrm{area}(U)$. It measures the obstruction to finding a distance-preserving map from a piece of the surface around $m$ to a region in the standard plane. Gauss's Theorema Egregium says that the Gaussian curvature of a smooth surface in $\mathbb{R}^3$ is defined in terms of the metric on the surface, so that it agrees for two isometric surfaces.

From the curvature $\Omega$ of a connection on a Riemannian manifold $(M, g)$, one builds the Riemannian curvature tensor, a 4-tensor which in local coordinates reads

\[
R_{ijkl} := g\Big(\Omega\Big(\frac{\partial}{\partial_i}, \frac{\partial}{\partial_j}\Big)\frac{\partial}{\partial_k}, \frac{\partial}{\partial_l}\Big);
\]

further taking a partial trace leads to the Ricci curvature given by the 2-tensor $\mathrm{Ric}_{ij} = \sum_k R_{ikjk}$, the trace of which gives in turn the scalar curvature $R = \sum_i \mathrm{Ric}_{ii}$. Sectional curvature at a point $m$ in the direction of a two-dimensional plane spanned by two vectors $U$ and $V$ corresponds to $K(U,V) = g(\Omega(U,V)V, U)$. A manifold has constant sectional curvature whenever $K(U,V)/\|U \wedge V\|^2$ is a constant $K$ for all linearly independent vectors $U$, $V$. A Riemannian manifold with constant sectional curvature is said to be of spherical, flat, or hyperbolic type depending on whether $K > 0$, $K = 0$, or $K < 0$, respectively. One owes to Cartan the discovery of an important class of Riemannian manifolds, symmetric spaces, which contains the spheres, the Euclidean spaces, the hyperbolic spaces, and compact Lie groups. A connected Riemannian manifold $M$ equipped at every point $m$ with an isometry $\sigma_m$ such that $\sigma_m(m) = m$ and the tangent map $T_m\sigma_m$ equals $-\mathrm{Id}$ on the tangent space (it therefore reverses the geodesics through $m$) is called symmetric. $\mathbb{C}P^n$ equipped with the Fubini-Study metric is a symmetric space with the isometry given by the reflection with respect to a line in $\mathbb{C}^{n+1}$. A compact symmetric space has non-negative sectional curvature $K$.

Constraints on the curvature can have topological consequences. Spheres are the only simply connected manifolds with constant positive sectional curvature; if a simply connected complete Riemannian manifold of dimension $>1$ has non-positive sectional curvature along every plane, then it is homeomorphic to the Euclidean space.

A manifold with Ricci curvature tensor proportional to the metric tensor is called an Einstein manifold. Since Einstein, curvature has been a cornerstone of general relativity, with gravitational force being interpreted in terms of curvature. For example, the vacuum Einstein equation reads $\mathrm{Ric}_g = (1/2) R_g\, g$ with $\mathrm{Ric}_g$ the Ricci curvature of a metric $g$ and $R_g$ its scalar curvature. In addition, Kaluza-Klein supergravity is a unified theory modeled on a direct product of the Minkowski four-dimensional space and an Einstein manifold with positive scalar curvature.

The Ricci flow $dg(t)/dt = -2\,\mathrm{Ric}_{g(t)}$, which is related to the Einstein equation in general relativity, was only fairly recently introduced in the mathematical literature. There are strong hopes of obtaining a classification of closed 3-manifolds using the Ricci flow as an essential ingredient.
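Gauss's definition of curvature as a limit of area ratios under the Gauss map can be tried out numerically. The sketch below is an illustration under stated assumptions, not a general-purpose routine: the surface is the hypothetical paraboloid $z = (ax^2 + by^2)/2$, whose Gaussian curvature at the origin is $ab$, the region $U$ is approximated by a small flat triangle around the origin, and the upward unit normal of the graph is used. Since unsigned areas are compared, the estimate only recovers $|K|$. All function names are introduced here for the example.

```python
import math

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def triangle_area(p, q, r):
    """Area of the flat triangle with vertices p, q, r in R^3."""
    c = cross(sub(q, p), sub(r, p))
    return 0.5 * math.sqrt(sum(x * x for x in c))

def paraboloid_point(a, b, x, y):
    """Point of the graph z = (a x^2 + b y^2) / 2 above (x, y)."""
    return (x, y, 0.5 * (a * x * x + b * y * y))

def paraboloid_normal(a, b, x, y):
    """Upward unit normal (-f_x, -f_y, 1) / norm of the same graph."""
    n = (-a * x, -b * y, 1.0)
    norm = math.sqrt(sum(c * c for c in n))
    return tuple(c / norm for c in n)

def gauss_curvature_estimate(a, b, h):
    """Ratio area(nu(U)) / area(U) for a small triangle U around the origin."""
    base = [(h, 0.0), (0.0, h), (-h, -h)]
    points = [paraboloid_point(a, b, x, y) for x, y in base]
    normals = [paraboloid_normal(a, b, x, y) for x, y in base]
    return triangle_area(*normals) / triangle_area(*points)

# Assumed sample values: a = 2, b = 3, so the curvature at the origin is 6.
estimate = gauss_curvature_estimate(2.0, 3.0, 1e-3)
```

As the triangle size `h` shrinks, `estimate` approaches $ab = 6$; for the flat case $a = b = 0$ the normals coincide and the ratio vanishes, consistent with the plane having zero curvature.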
manifold. Chern (or characteristic) classes are topological invariants associated to fiber bundles and play a crucial role in index theory. Chern-Weil theory builds representatives of these de Rham cohomology classes from a connection $\nabla$ of the form $\mathrm{tr}(f(\nabla^2))$, where $f$ is some analytic function.

When the manifold is Riemannian, the Laplace-Beltrami operator on functions generalizes to differential forms in two different ways, namely to the Bochner Laplacian $\Delta^{\Lambda T^*M}$ on forms (i.e., sections of $\Lambda T^*M$), where the cotangent bundle $T^*M$ is equipped with a connection induced by the Levi-Civita connection, and to the Laplace-Beltrami operator on forms $(d + d^*)^2 = d^*d + d\,d^*$, where $d^*$ is the (formal) adjoint of the exterior differential $d$. These are related via Weitzenböck's formula, which in the particular case of 1-forms states that the difference of those two operators is measured by the Ricci curvature.

When the manifold is compact, Hodge's theorem asserts that the de Rham cohomology groups are

where $g$ denotes a Riemannian metric on a spin manifold $M$, $D_g^W$ a Dirac operator acting on sections of some twisted bundle $S \otimes W$ with $S$ the spinor bundle on $M$ and $W$ an auxiliary vector bundle over $M$, $\mathrm{ind}(D_g^W)$ the index of the Dirac operator, and $\Omega_g$, $\Omega^W$ respectively the curvatures of the Levi-Civita connection and a connection on $W$, and $\hat A(\Omega_g)$ a particular Chern form called the $\hat A$-genus. Index theorems are useful to compute anomalies in gauge theories arising from functional quantisation of classical actions.

Given an even-dimensional closed spin manifold $(M, g)$ and a Hermitian vector bundle $W$ over $M$, the index of the associated Dirac operator $D_g^W$ yields the so-called Atiyah map $K^0(M) \to \mathbb{Z}$ defined by $W \mapsto \mathrm{ind}(D_g^W)$, where $K^0(M)$ is the group of formal differences of stable homotopy classes of smooth vector bundles over $M$. This is the starting point for the noncommutative geometry approach to index theory, in which the space of smooth functions on a
40 Introductory Article: Electromagnetism
manifold which arises here in a disguised form since $K^0(M) \simeq K_0(C^\infty(M))$ (which consists of formal differences of smooth homotopy classes of idempotents in the inductive limit of spaces of matrices $\mathrm{gl}_n(C^\infty(M))$) is generalized to any noncommutative smooth algebra.

Further Reading

Bishop R and Crittenden R (2001) Geometry of Manifolds. Providence, RI: AMS Chelsea Publishing.
Chern SS, Chen WH, and Lam KS (2000) Lectures on Differential Geometry, Series on University Mathematics. Singapore: World Scientific.
Choquet-Bruhat Y, DeWitt-Morette C, and Dillard-Bleick M (1982) Analysis, Manifolds and Physics, 2nd edn. Amsterdam–New York: North Holland.
Gallot S, Hulin D, and Lafontaine J (1993) Riemannian Geometry, Universitext. Berlin: Springer.
Helgason S (2001) Differential Geometry, Lie Groups and Symmetric Spaces. Graduate Studies in Mathematics 36. Providence, RI: AMS.
Husemoller D (1994) Fibre Bundles, 3rd edn. Graduate Texts in Mathematics 20. New York: Springer Verlag.
Jost J (1998) Riemannian Geometry and Geometric Analysis, Universitext. Berlin: Springer.
Klingenberg W (1995) Riemannian Geometry, 2nd edn. Berlin: de Gruyter.
Kobayashi S and Nomizu K (1996) Foundations of Differential Geometry I, II. Wiley Classics Library, a Wiley-Interscience Publication. New York: Wiley.
Lang S (1995) Differential and Riemannian Manifolds, 3rd edn. Graduate Texts in Mathematics 160. New York: Springer Verlag.
Milnor J (1997) Topology from the Differentiable Viewpoint. Princeton Landmarks in Mathematics. Princeton, NJ: Princeton University Press.
Nakahara M (2003) Geometry, Topology and Physics, 2nd edn. Bristol: Institute of Physics.
Spivak M (1979) A Comprehensive Introduction to Differential Geometry, vols. 1, 2 and 3. Wilmington, DE: Publish or Perish Inc.
Sternberg S (1983) Lectures on Differential Geometry, 2nd edn. New York: Chelsea Publishing Co.
moving charges. They unify the treatment of electricity and magnetism by revealing for the first time the full duality between the electric and magnetic fields. They have been verified over an almost unimaginable variety of physical processes, from the propagation of light over cosmological distances, through the behavior of the magnetic fields of stars and the everyday applications in electrical engineering and laboratory experiments, down (in their quantum version) to the exchange of photons between individual electrons.

The history of Maxwell's equations is convoluted, with many false turns. Maxwell himself wrote down an inconsistent form of the equations, with a different sign for $\rho$ in the first equation, in his 1865 work "A dynamical theory of the electromagnetic field". The consistent form appeared later in his Treatise on Electricity and Magnetism (1873); see Chalmers (1975).

In this article, we shall not follow the historical route to the equations. Some of the complex story of the development hinted at in the remarks above can be found in the articles by Chalmers (1975), Siegel (1985), and Roche (1998). Neither shall we follow the traditional pedagogic route of many textbooks in building up to the full dynamical equations through the study of basic electrical and magnetic phenomena. Instead, we shall follow a path to Maxwell's equations that is informed by knowledge of their most critical feature, invariance under Lorentz transformations. Maxwell, of course, knew nothing of this.

We shall start with a summary of basic facts about the behavior of charges in electric and magnetic fields, and then establish the full dynamical framework by considering this behavior as seen from moving frames of reference. It is impossible, of course, to do this consistently within the framework of classical ideas of space and time since Maxwell's equations are inconsistent with Galilean relativity. But it is at least possible to understand some of the key features of the equations, in particular the need for the term involving the time derivative of E, the so-called "displacement current", in the third of Maxwell's equations.

We shall begin with some remarks concerning the role of relativity in classical dynamics.

Relativity in Newtonian Dynamics

Newton's laws hold in all inertial frames. The formalism of classical mechanics is invariant under Galilean transformations and it is impossible to tell by observing the dynamical behavior of particles and other bodies whether a frame of reference is at rest or in uniform motion. In the world of classical mechanics, therefore:

Principle of Relativity There is no absolute standard of rest; only relative motion is observable.

In his Dialogue concerning the two chief world systems, Galileo illustrated the principle by arguing that the uniform motion of a ship on a calm sea does not affect the behavior of fish, butterflies, and other moving objects, as observed in a cabin below deck.

Relativity theory takes the principle as fundamental, as a statement about the nature of space and time as much as about the properties of the Newtonian equations of motion. But if it is to be given such universal significance, then it must apply to all of physics, and not just to Newtonian dynamics. At first this seems unproblematic: it is hard to imagine that it holds at such a basic level, but not for more complex physical interactions. Nonetheless, deep problems emerge when we try to extend it to electromagnetism since Galilean invariance conflicts with Maxwell's equations.

All appears straightforward for systems involving slow-moving charges and slowly varying electric and magnetic fields. These are governed by laws that appear to be invariant under transformations between uniformly moving frames of reference. One can imagine a modern version of Galileo's ship also carrying some magnets, batteries, semiconductors, and other electrical components. Salviati's argument for relativity would seem just as compelling.

The problem arises when we include rapidly varying fields, in particular when we consider the propagation of light. As Einstein (1905) put it, "Maxwell's electrodynamics . . . , when applied to moving bodies, leads to asymmetries which do not appear to be inherent in the phenomena". The central difficulty is that Maxwell's equations give light, along with other electromagnetic waves, a definite velocity: in empty space, it travels with the same speed in every direction, independently of the motion of the source, a fact that is incompatible with Galilean invariance. Light traveling with speed $c$ in one frame should have speed $c + u$ in a frame moving towards the source of the light with speed $u$. Thus, it should be possible for light to travel with any speed. Light that travels with speed $c$ in a frame in which its source is at rest should have some other speed in a moving frame; so Galilean invariance would imply dependence of the velocity of light on the motion of the source.

A full resolution of the conflict can only be achieved within the special theory of relativity: here, remarkably, Maxwell's equations retain exactly
their classical form, but the transformations between the space and time coordinates of frames of reference in relative motion do not. The difference appears when the velocities involved are not insignificant when compared with the velocity of light. So long as one can ignore terms of order $u^2/c^2$, Maxwell's equations are compatible with the Galilean principle of relativity.

Charges, Fields, and the Lorentz-Force Law

The basic objects in the modern form of electromagnetic theory are

• charged particles; and
• the electric and magnetic fields E and B, which are vector quantities that depend on position and time.

The charge e of a particle, which can be positive or negative, is an intrinsic quantity analogous to gravitational mass. It determines the strength of the particle's interaction with the electric and magnetic fields as its mass determines the strength of its interaction with gravitational fields.

The interaction is in two directions. First, electric and magnetic fields exert a force on a charged particle which depends on the value of the charge, the particle's velocity, and the values of E and B at the location of the particle. The force is given by the Lorentz-force law

$$\mathbf{f} = e(\mathbf{E} + \mathbf{u} \wedge \mathbf{B}) \qquad [5]$$

in which e is the charge and u is the velocity. It is analogous to the gravitational force

EM2. A stationary point charge e generates an electric field, but no magnetic field. The electric field is given by

$$\mathbf{E} = \frac{k e\, \mathbf{r}}{r^3} \qquad [7]$$

where $\mathbf{r}$ is the position vector from the charge, $r = |\mathbf{r}|$, and k is a positive constant, analogous to the gravitational constant.

By combining [7] and [5], we obtain an inverse-square law electrostatic force

$$\frac{k e e'}{r^2} \qquad [8]$$

between two stationary charges; unlike gravity, it is repulsive when the charges have the same sign.

EM3. A point charge moving with velocity v generates a magnetic field

$$\mathbf{B} = \frac{k' e\, \mathbf{v} \wedge \mathbf{r}}{r^3} \qquad [9]$$

where $k'$ is a second positive constant.

This is extrapolated from measurements of the magnetic field generated by currents flowing in electrical circuits.

The constants $k$ and $k'$ in EM2 and EM3 determine the strengths of electric and magnetic interactions. They are usually denoted by

$$k = \frac{1}{4\pi\epsilon_0}, \qquad k' = \frac{\mu_0}{4\pi} \qquad [10]$$

Charge e is measured in coulombs, $|\mathbf{B}|$ in teslas, and $|\mathbf{E}|$ in volts per meter. With other quantities in SI units,

$$\epsilon_0 \approx 8.9 \times 10^{-12}, \qquad \mu_0 \approx 1.3 \times 10^{-6} \qquad [11]$$
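As a rough numerical illustration of EM2, EM3, and the Lorentz-force law [5], the following Python sketch evaluates the fields of a point charge and the resulting force, using the rounded constants quoted in [11]. The function names and the representation of vectors as 3-tuples are choices made here for illustration; they are not part of the article.

```python
import math

EPS0 = 8.9e-12                     # rounded SI value from [11]
MU0 = 1.3e-6                       # rounded SI value from [11]
K = 1.0 / (4.0 * math.pi * EPS0)   # k in [10]
K_PRIME = MU0 / (4.0 * math.pi)    # k' in [10]

def cross(a, b):
    """Vector product a ^ b of two 3-tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    """Euclidean length of a 3-tuple."""
    return math.sqrt(sum(x * x for x in a))

def electric_field(e, r):
    """EM2: field at displacement r from a stationary charge e."""
    s = K * e / norm(r) ** 3
    return tuple(s * x for x in r)

def magnetic_field(e, v, r):
    """EM3: field at displacement r from a slowly moving charge e."""
    s = K_PRIME * e / norm(r) ** 3
    return tuple(s * x for x in cross(v, r))

def lorentz_force(e, u, E, B):
    """Law [5]: force on a charge e moving with velocity u."""
    u_wedge_b = cross(u, B)
    return tuple(e * (Ei + wi) for Ei, wi in zip(E, u_wedge_b))

# Two like charges of one microcoulomb, 1 m apart along the x-axis.
E = electric_field(1e-6, (1.0, 0.0, 0.0))
f = lorentz_force(1e-6, (0.0, 0.0, 0.0), E, (0.0, 0.0, 0.0))
print(f)  # the x-component is positive, so the force is repulsive
```

With both charges at rest the magnetic term drops out and the force reduces to the inverse-square expression [8].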
continuous distribution of charge. The densities are defined as the limits

$$\rho = \lim_{V \to 0} \frac{\sum e}{V}, \qquad \mathbf{J} = \lim_{V \to 0} \frac{\sum e\mathbf{v}}{V} \qquad [12]$$

where V is a small volume containing the point, e is a charge within the volume, and v is its velocity; the sums are over the charges in V and the limits are taken as the volume is shrunk (although we shall not worry too much about the precise details of the limiting process).

Stationary Distributions of Charge

We begin the task of converting the basic principles into partial differential equations by looking at the electric field of a stationary distribution of charge, where the passage to the continuous limit is made by using the Gauss theorem to restate the inverse-square law.

The Gauss theorem relates the integral of the electric field over a closed surface to the total charge contained within it. For a point charge, the electric field is given by EM2:

$$\mathbf{E} = \frac{e\,\mathbf{r}}{4\pi\epsilon_0 r^3}$$

Since $\operatorname{div} \mathbf{r} = 3$ and $\operatorname{grad} r = \mathbf{r}/r$, we have

$$\operatorname{div} \mathbf{E} = \operatorname{div}\left(\frac{e\,\mathbf{r}}{4\pi\epsilon_0 r^3}\right) = \frac{e}{4\pi\epsilon_0}\left(\frac{3}{r^3} - \frac{3\,\mathbf{r}\cdot\mathbf{r}}{r^5}\right) = 0$$

everywhere except at $r = 0$. Therefore, by the divergence theorem,

$$\int_{\partial V} \mathbf{E}\cdot d\mathbf{S} = 0 \qquad [13]$$

for any closed surface $\partial V$ bounding a volume V that does not contain the charge.

What if the volume does contain the charge? Consider the region bounded by the sphere $S_R$ of radius R centered on the charge; $S_R$ has outward unit normal $\mathbf{r}/r$. Therefore,

$$\int_{S_R} \mathbf{E}\cdot d\mathbf{S} = \frac{e}{4\pi R^2 \epsilon_0}\int_{S_R} dS = \frac{e}{\epsilon_0}$$

In particular, the value of the surface integral on the left-hand side does not depend on R.

Now consider an arbitrary finite volume bounded by a closed surface S. If the charge is not inside the volume, then the integral of E over S vanishes by [13]. If it is, then we can apply [13] to the volume V between S and a small sphere $S_R$ to deduce that

$$\int_{S} \mathbf{E}\cdot d\mathbf{S} - \int_{S_R} \mathbf{E}\cdot d\mathbf{S} = \int_{\partial V} \mathbf{E}\cdot d\mathbf{S} = 0$$

and that the integrals of E over S and $S_R$ are the same. Therefore,

$$\int_{S} \mathbf{E}\cdot d\mathbf{S} = \begin{cases} e/\epsilon_0 & \text{if the charge is in the volume bounded by } S \\ 0 & \text{otherwise} \end{cases}$$

When we sum over a distribution of charges, the integral on the left picks out the total charge within S. Therefore, we have the Gauss theorem.

The Gauss theorem. For any closed surface $\partial V$ bounding a volume V,

$$\int_{\partial V} \mathbf{E}\cdot d\mathbf{S} = Q/\epsilon_0$$

where E is the total electric field and Q is the total charge within V.

Now we can pass to the continuous limit. Suppose that E is generated by a distribution of charges with density $\rho$ (charge per unit volume). Then by the Gauss theorem,

$$\int_{\partial V} \mathbf{E}\cdot d\mathbf{S} = \frac{1}{\epsilon_0}\int_V \rho\, dV$$

for any volume V. But then, by the divergence theorem,

$$\int_V \left(\operatorname{div} \mathbf{E} - \rho/\epsilon_0\right) dV = 0$$

Since this holds for any volume V, it follows that

$$\operatorname{div} \mathbf{E} = \rho/\epsilon_0 \qquad [14]$$

By an argument in a similar spirit, we can also show that the electric field of a stationary distribution of charge is conservative in the sense that the total work done by the field when a charge is moved around a closed loop vanishes; that is,

$$\oint \mathbf{E}\cdot d\mathbf{s} = 0$$

for any closed path. This is equivalent to

$$\operatorname{curl} \mathbf{E} = 0 \qquad [15]$$

since, by Stokes' theorem,

$$\oint \mathbf{E}\cdot d\mathbf{s} = \int_S \operatorname{curl} \mathbf{E}\cdot d\mathbf{S}$$

where S is any surface spanning the path. This vanishes for every path and for every S if and only if [15] holds.
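The statement that the flux of E through a closed surface is $e/\epsilon_0$ when the charge lies inside and zero when it lies outside can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes the surface to be a sphere centred on the origin, places the charge off-centre so that the result is not trivial, and uses a simple midpoint rule; the function name and the grid size are hypothetical choices.

```python
import math

EPS0 = 8.9e-12  # rounded SI value

def flux_through_sphere(e, charge_pos, R, n=100):
    """Midpoint-rule estimate of the flux of E through the sphere |x| = R.

    The point charge e sits at charge_pos and E is the inverse-square field of EM2.
    """
    dtheta = math.pi / n
    dphi = math.pi / n              # 2n steps cover the range 0..2*pi
    coeff = e / (4.0 * math.pi * EPS0)
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * dtheta
        st, ct = math.sin(theta), math.cos(theta)
        for j in range(2 * n):
            phi = (j + 0.5) * dphi
            nx, ny, nz = st * math.cos(phi), st * math.sin(phi), ct
            # vector from the charge to the point on the surface
            rx = R * nx - charge_pos[0]
            ry = R * ny - charge_pos[1]
            rz = R * nz - charge_pos[2]
            d3 = (rx * rx + ry * ry + rz * rz) ** 1.5
            e_dot_n = coeff * (rx * nx + ry * ny + rz * nz) / d3
            total += e_dot_n * R * R * st * dtheta * dphi
    return total

e = 1e-9
inside = flux_through_sphere(e, (0.3, 0.0, 0.0), 1.0)
outside = flux_through_sphere(e, (2.0, 0.0, 0.0), 1.0)
print(inside * EPS0 / e, outside * EPS0 / e)  # close to 1 and close to 0
```

The two printed ratios compare the computed flux with $e/\epsilon_0$; the quadrature is approximate, so agreement is only to a few significant figures.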
The field of a single stationary charge is conservative since

$$\mathbf{E} = -\operatorname{grad} \phi, \qquad \phi = \frac{e}{4\pi\epsilon_0 r}$$

and therefore $\operatorname{curl} \mathbf{E} = 0$ since the curl of a gradient vanishes identically. For a continuous distribution, $\mathbf{E} = -\operatorname{grad} \phi$, where

$$\phi(\mathbf{r}) = \frac{1}{4\pi\epsilon_0}\int_{\mathbf{r}' \in V} \frac{\rho(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|}\, dV' \qquad [16]$$

In the integral, $\mathbf{r}$ (the position of the point at which $\phi$ is evaluated) is fixed, and the integration is over the positions $\mathbf{r}'$ of the individual charges. In spite of the singularity at $\mathbf{r} = \mathbf{r}'$, the integral is well defined. So, [15] also holds for a continuous distribution of stationary charge.

The Divergence of the Magnetic Field

We can apply the same argument that established the Gauss theorem to the magnetic field of a slow-moving charge. Here,

$$\mathbf{B} = \frac{\mu_0 e\, \mathbf{v} \wedge \mathbf{r}}{4\pi r^3}$$

where $\mathbf{r}$ is the vector from the charge to the point at which the field is measured. Since $\mathbf{r}/r^3 = -\operatorname{grad}(1/r)$, we have

$$\operatorname{div}\left(\mathbf{v} \wedge \frac{\mathbf{r}}{r^3}\right) = \mathbf{v} \cdot \operatorname{curl}\left(\operatorname{grad} \frac{1}{r}\right) = 0$$

Therefore, $\operatorname{div} \mathbf{B} = 0$ except at $r = 0$, as in the case of the electric field. However, in the magnetic case, the integral of the field over a surface surrounding the charge also vanishes, since if $S_R$ is a sphere of radius R centered on the charge, then

$$\int_{S_R} \mathbf{B}\cdot d\mathbf{S} = \frac{\mu_0 e}{4\pi}\int_{S_R} \frac{\mathbf{v} \wedge \mathbf{r}}{r^3} \cdot \frac{\mathbf{r}}{r}\, dS = 0$$

By the divergence theorem, the same is true for any surface surrounding the charge. We deduce that if magnetic fields are generated only by moving charges, then

$$\int_{\partial V} \mathbf{B}\cdot d\mathbf{S} = 0$$

for any volume V, and hence that

$$\operatorname{div} \mathbf{B} = 0 \qquad [17]$$

Of course, if there were free magnetic poles generating magnetic fields in the same way that charges generate electric fields, then this would not hold; there would be a "magnetic pole density" on the right-hand side, by analogy with the charge density in [14].

Inconsistency with Galilean Relativity

Our central concern is the compatibility of the laws of electromagnetism with the principle of relativity. As Einstein observed, simple electromagnetic interactions do indeed depend only on relative motion; the current induced in a conductor moving through the field of a magnet is the same as that generated in a stationary conductor when a magnet is moved past it with the same relative velocity (Einstein 1905). Unfortunately, this symmetry is not reflected in our basic principles. We very quickly come up against contradictions if we assume that they hold in every inertial frame of reference.

One emerges as follows. An observer O can measure the values of B and E at a point by measuring the force on a particle of standard charge, which is related to the velocity v of the charge by the Lorentz-force law,

$$\mathbf{f} = e(\mathbf{E} + \mathbf{v} \wedge \mathbf{B})$$

A second observer O′ moving relative to the first with velocity v will see the same force, but now acting on a particle at rest. He will therefore measure the electric field to be $\mathbf{E}' = \mathbf{f}/e$. We conclude that an observer moving with velocity v through a magnetic field B and an electric field E should see an electric field

$$\mathbf{E}' = \mathbf{E} + \mathbf{v} \wedge \mathbf{B} \qquad [18]$$

By interchanging the roles of the two observers, we should also have

$$\mathbf{E} = \mathbf{E}' - \mathbf{v} \wedge \mathbf{B}' \qquad [19]$$

where $\mathbf{B}'$ is the magnetic field measured by the second observer. If both are to hold, then adding [18] and [19] gives $\mathbf{v} \wedge (\mathbf{B} - \mathbf{B}') = 0$, so $\mathbf{B} - \mathbf{B}'$ must be a scalar multiple of v.

But this is incompatible with EM3; if the fields are those of a point charge at rest relative to the first observer, then E is given by [7], and

$$\mathbf{B} = 0$$

On the other hand, the second observer sees the field of a point charge moving with velocity $-\mathbf{v}$. Therefore,

$$\mathbf{B}' = -\frac{\mu_0 e\, \mathbf{v} \wedge \mathbf{r}}{4\pi r^3}$$

So $\mathbf{B} - \mathbf{B}'$ is orthogonal to v, not parallel to it.

This conspicuous paradox is resolved, in part, by the realization that EM3 is not exact; it holds only when the velocities are small enough for the magnetic force between two particles to be negligible in comparison with the electrostatic force. If v is a typical velocity, then the condition is that $v^2\mu_0$
should be much less than $1/\epsilon_0$. That is, the velocities involved should be much less than

$$c = \frac{1}{\sqrt{\epsilon_0\mu_0}} \approx 3 \times 10^{8}\ \mathrm{m\,s^{-1}}$$

This, of course, is the velocity of light.

The Limits of Galilean Invariance

Our basic principles EM1–EM3 must now be seen to be approximations: they describe the interactions of particles and fields when the particles are moving relative to each other at speeds much less than that of light. To emphasize that we cannot expect, in particular, EM3 to hold for particles moving at speeds comparable with c, we must replace it by

EM3′. A charge moving with velocity v, where $v \ll c$, generates a magnetic field

$$\mathbf{B} = \frac{\mu_0 e\, \mathbf{v} \wedge \mathbf{r}}{4\pi r^3} + O(v^2/c^2) \qquad [20]$$

The magnetic field of a system of charges in general motion satisfies

$$\operatorname{div} \mathbf{B} = 0 \qquad [21]$$

In the second part, we have retained [21] as a differential form of the statement that there are no free magnetic poles; the magnetic field is generated only by the motion of the charges. With this change, the theory is consistent with the principle of relativity, provided that we ignore terms of order $v^2/c^2$. The substitution of EM3′ for EM3 resolves the conspicuous paradox; the symmetry noted by Einstein between the current generated by the motion of the conductor in a magnetic field and by the motion of a magnet past a conductor is explained, provided that the velocities are much less than that of light.

The central problem remains however; the equations of electromagnetism are not invariant under a Galilean transformation with velocity comparable to c. The paradox is still there, but it is more subtle than it appeared to be at first. There are three possible ways out: (1) the noninvariance is real and has observable effects (necessarily of order $v^2/c^2$ or smaller); (2) Maxwell's theory is wrong; or (3) the Galilean transformation is wrong. Disconcertingly, it is the last path that physics has taken. But that is to jump ahead in the story. Our task is to complete the derivation of Maxwell's equations.

Faraday's Law of Induction

The magnetic field of a slow-moving charge will always be small in relation to its electric field (even when we replace B by cB to put it into the same units as E). The magnetic fields generated by currents in electrical circuits are not, however, dominated by large electric fields. This is because the currents are created by the flow, at slow velocity, of electrons, while overall the matter in the wire is roughly electrically neutral, with the electric fields of the positively charged nuclei and negatively charged electrons canceling.

This is the physical context to keep in mind in the following deduction of Faraday's law of induction from Galilean invariance for velocities much less than c. The law relates the electromotive force or "voltage" around an electrical circuit to the rate of change of the magnetic field B over a surface spanning the circuit. In its differential form, the law becomes one of Maxwell's equations.

Suppose first that the fields are generated by charges all moving relative to a given inertial frame of reference R with the same velocity v. Then in a second frame R′ moving relative to R with velocity v, there is a stationary distribution of charge. If the velocity is much less than that of light, then the electric field $\mathbf{E}'$ measured in R′ is related to the electric and magnetic fields E and B measured in R by

$$\mathbf{E}' = \mathbf{E} + \mathbf{v} \wedge \mathbf{B}$$

Since the field measured in R′ is that of a stationary distribution of charge, we have

$$\operatorname{curl} \mathbf{E}' = 0$$

In R, the charges are all moving with velocity v, so their configuration looks exactly the same from the point $\mathbf{r}$ at time $t$ as it does from the point $\mathbf{r} + \mathbf{v}\tau$ at time $t + \tau$. Therefore,

$$\mathbf{B}(\mathbf{r} + \mathbf{v}\tau, t + \tau) = \mathbf{B}(\mathbf{r}, t), \qquad \mathbf{E}(\mathbf{r} + \mathbf{v}\tau, t + \tau) = \mathbf{E}(\mathbf{r}, t)$$

and hence by taking derivatives with respect to $\tau$ at $\tau = 0$,

$$\mathbf{v}\cdot\operatorname{grad} \mathbf{B} + \frac{\partial \mathbf{B}}{\partial t} = 0, \qquad \mathbf{v}\cdot\operatorname{grad} \mathbf{E} + \frac{\partial \mathbf{E}}{\partial t} = 0 \qquad [22]$$

So we must have

$$0 = \operatorname{curl} \mathbf{E}' = \operatorname{curl} \mathbf{E} + \operatorname{curl}(\mathbf{v} \wedge \mathbf{B}) = \operatorname{curl} \mathbf{E} + \mathbf{v}\operatorname{div} \mathbf{B} - \mathbf{v}\cdot\operatorname{grad} \mathbf{B} = \operatorname{curl} \mathbf{E} + \frac{\partial \mathbf{B}}{\partial t} \qquad [23]$$
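The step from rigid translation to [22] can be checked with finite differences. In the sketch below, the field profile, the velocity, and the sample point are arbitrary illustrative choices, and `translating_field` and `convective_residual` are hypothetical helper names; the only assumption taken from the text is that the configuration moves rigidly with velocity v.

```python
import math

def translating_field(profile, v):
    """Field of a configuration moving rigidly with velocity v: F(r, t) = profile(r - v t)."""
    def field(r, t):
        return profile(tuple(ri - vi * t for ri, vi in zip(r, v)))
    return field

def convective_residual(field, v, r, t, h=1e-4):
    """Central-difference estimate of dF/dt + (v . grad) F at (r, t)."""
    ddt = [(p - m) / (2 * h) for p, m in zip(field(r, t + h), field(r, t - h))]
    adv = [0.0, 0.0, 0.0]
    for j in range(3):
        rp, rm = list(r), list(r)
        rp[j] += h
        rm[j] -= h
        fp, fm = field(tuple(rp), t), field(tuple(rm), t)
        for i in range(3):
            adv[i] += v[j] * (fp[i] - fm[i]) / (2 * h)
    return tuple(a + b for a, b in zip(ddt, adv))

def profile(r):
    """An arbitrary smooth vector profile standing in for B at t = 0."""
    x, y, z = r
    return (math.sin(x) * math.cos(y), z * math.exp(-x * x), math.cos(y + 2.0 * z))

v = (0.3, -0.2, 0.5)
B = translating_field(profile, v)
residual = convective_residual(B, v, (0.4, 0.1, -0.7), 1.3)
print(residual)  # each component should be near zero, as [22] requires
```

Any smooth profile would do here; the residual vanishes (up to discretisation error) because the time dependence enters only through the combination r − vt.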
where $\rho$ is the charge density, J is the current density, and $c^2 = 1/(\epsilon_0\mu_0)$. These are Maxwell's equations, the basis of modern electrodynamics. Together with the Lorentz-force law, they describe the dynamics of charges and electromagnetic fields.

We have arrived at them by considering how basic electromagnetic processes appear in moving frames of reference, an unsatisfactory route because we have seen on the way that the principles on which we based the derivation are incompatible with Galilean invariance for velocities comparable with that of light. Maxwell derived them by analyzing an elaborate mechanical model of electric and magnetic fields as displacements in the luminiferous ether. That is also unsatisfactory because the model has long been abandoned. The reason that they are accepted today as the basis of theoretical and practical applications of electromagnetism has little to do with either argument. It is first that they are self-consistent, and second that they describe the behavior of real fields with unreasonable accuracy.

The Continuity Equation

It is not immediately obvious that the equations are self-consistent. Given $\rho$ and J as functions of the coordinates and time, Maxwell's equations are two scalar and two vector equations in the unknown components of E and B. That is, a total of eight equations for six unknowns: more equations than unknowns. Therefore, it is possible that they are in fact inconsistent.

If we take the divergence of eqn [29], then we obtain

$$\frac{\partial}{\partial t}(\operatorname{div} \mathbf{B}) = 0$$

which is consistent with eqn [27]; so no problem arises here. However, by taking the divergence of eqn [28] and substituting from eqn [26], we get

$$0 = \operatorname{div}\operatorname{curl} \mathbf{B} = \frac{1}{c^2}\frac{\partial}{\partial t}(\operatorname{div} \mathbf{E}) + \mu_0 \operatorname{div} \mathbf{J} = \mu_0\left(\frac{\partial \rho}{\partial t} + \operatorname{div} \mathbf{J}\right)$$

This gives a contradiction unless

$$\frac{\partial \rho}{\partial t} + \operatorname{div} \mathbf{J} = 0 \qquad [30]$$

So the choice of $\rho$ and J is not unconstrained; they must be related by the continuity equation [30]. This holds for physically reasonable distributions of charge; it is a differential form of the statement that charges are neither created nor destroyed.

Conservation of Charge

To see the connection between the continuity equation and charge conservation, let us look at the total charge within a fixed volume V bounded by a surface S. If charge is conserved, then any increase or decrease in a short period of time must be exactly balanced by an inflow or outflow of charge across S.

Consider a small element $d\mathbf{S}$ of S with outward unit normal $\mathbf{n}$ and consider all the particles that have a particular charge e and a particular velocity v at time t. Suppose that there are $\sigma$ of these per unit volume ($\sigma$ is a function of position). Those that cross the surface element between $t$ and $t + \delta t$ are those that at time t lie in the region of volume

$$|\mathbf{v}\cdot\mathbf{n}\, dS\, \delta t|$$

shown in Figure 1. They contribute $e\sigma\, \mathbf{v}\cdot d\mathbf{S}\, \delta t$ to the outflow of charge through the surface element. But the value of J at the surface element is the sum of $e\sigma\mathbf{v}$ over all possible values of v and e. By summing over v, e, and the elements of the surface, therefore, and by passing to the limit of a continuous distribution, the total rate of outflow is

$$\int_S \mathbf{J}\cdot d\mathbf{S}$$

Charge conservation implies that the rate of outflow should be equal to the rate of decrease in the total charge within V. That is,

$$\frac{d}{dt}\int_V \rho\, dV + \int_S \mathbf{J}\cdot d\mathbf{S} = 0 \qquad [31]$$

By differentiating the first term under the integral sign and by applying the divergence theorem to the second integral,

$$\int_V \left(\frac{\partial \rho}{\partial t} + \operatorname{div} \mathbf{J}\right) dV = 0 \qquad [32]$$

If this is to hold for any choice of V, then $\rho$ and J must satisfy the continuity equation. Conversely, the continuity equation implies charge conservation.

Figure 1 The outflow through a surface element.
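The balance between the rate of change of the enclosed charge and the outflow through the boundary can be illustrated in one dimension. The sketch below is a toy model, not taken from the article: it assumes a Gaussian cloud of charge drifting rigidly along the x-axis with a single velocity (so that J is the density times that velocity), and it replaces the volume V by an interval whose "surface" is its two end points.

```python
import math

V_DRIFT = 0.8  # drift velocity of the charge cloud (arbitrary units)

def rho(x, t):
    """Charge density of a Gaussian cloud drifting rigidly along the x-axis."""
    return math.exp(-(x - V_DRIFT * t) ** 2)

def current(x, t):
    """Current density J = rho * v for charges that all share one velocity."""
    return rho(x, t) * V_DRIFT

def charge_in_interval(a, b, t, n=2000):
    """Midpoint-rule estimate of the total charge between x = a and x = b."""
    h = (b - a) / n
    return h * sum(rho(a + (k + 0.5) * h, t) for k in range(n))

def conservation_residual(a, b, t, dt=1e-4):
    """dQ/dt plus the net outflow through the end points (1D analogue of [31])."""
    q_after = charge_in_interval(a, b, t + dt)
    q_before = charge_in_interval(a, b, t - dt)
    dq_dt = (q_after - q_before) / (2 * dt)
    outflow = current(b, t) - current(a, t)
    return dq_dt + outflow

residual = conservation_residual(-1.0, 2.0, 0.5)
print(residual)  # should be close to zero
```

The same density and current also satisfy the differential form [30], since both depend on x and t only through x − vt.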
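The Displacement Current

The third of Maxwell's equations can be written as

$$\operatorname{curl} \mathbf{B} = \mu_0\left(\mathbf{J} + \epsilon_0 \frac{\partial \mathbf{E}}{\partial t}\right) \qquad [33]$$

in which form it can be read as an equation for an unknown magnetic field B in terms of a known current distribution J and electric field E. When E and J are independent of t, it reduces to

$$\operatorname{curl} \mathbf{B} = \mu_0 \mathbf{J}$$

which determines the magnetic field of a steady current, in a way that was already familiar to Maxwell's contemporaries. But his second term on the right-hand side of [33] was new; it adds to J the so-called "vacuum displacement current"

$$\epsilon_0 \frac{\partial \mathbf{E}}{\partial t}$$

The name comes from an analogy with the behavior of charges in an insulating material. Here no steady current can flow, but the distribution of charges within the material is distorted by an external electric field. When the field changes, the distortion also changes, and the result appears as a current, the displacement current, which flows during the period of change. Maxwell's central insight was that the same term should be present even in empty space. The consequence was profound; it allowed him to explain the propagation of light as an electromagnetic phenomenon.

The Source-Free Equations

In a region of empty space, away from the charges generating the electric and magnetic fields, we have $\rho = 0 = \mathbf{J}$, and Maxwell's equations reduce to

$$\operatorname{div} \mathbf{E} = 0 \qquad [34]$$

$$\operatorname{div} \mathbf{B} = 0 \qquad [35]$$

$$\operatorname{curl} \mathbf{B} - \frac{1}{c^2}\frac{\partial \mathbf{E}}{\partial t} = 0 \qquad [36]$$

$$\operatorname{curl} \mathbf{E} + \frac{\partial \mathbf{B}}{\partial t} = 0 \qquad [37]$$

where $c = 1/\sqrt{\epsilon_0\mu_0}$. By taking the curl of eqn [36] and by substituting from eqns [35] and [37], we obtain

$$0 = \operatorname{grad}(\operatorname{div} \mathbf{B}) - \nabla^2 \mathbf{B} - \frac{1}{c^2}\operatorname{curl}\frac{\partial \mathbf{E}}{\partial t} = -\nabla^2 \mathbf{B} - \frac{1}{c^2}\frac{\partial}{\partial t}(\operatorname{curl} \mathbf{E}) = -\nabla^2 \mathbf{B} + \frac{1}{c^2}\frac{\partial^2 \mathbf{B}}{\partial t^2} \qquad [38]$$

Therefore, the three components of B in empty space satisfy the (scalar) wave equation

$$\Box u = 0$$

Here $\Box$ is the d'Alembertian operator, defined by

$$\Box = \frac{1}{c^2}\frac{\partial^2}{\partial t^2} - \nabla^2 = \frac{1}{c^2}\frac{\partial^2}{\partial t^2} - \frac{\partial^2}{\partial x^2} - \frac{\partial^2}{\partial y^2} - \frac{\partial^2}{\partial z^2}$$

By taking the curl of eqn [37], we also obtain $\Box \mathbf{E} = 0$.

Monochromatic Plane Waves

The fact that E and B are vector-valued solutions of the wave equation in empty space suggests that we look for "plane wave" solutions of Maxwell's equations in which

$$\mathbf{E} = \mathbf{a}\cos\Omega + \mathbf{b}\sin\Omega \qquad [39]$$

where $\mathbf{a}$, $\mathbf{b}$ are constant vectors and

$$\Omega = \frac{\omega}{c}(ct - \mathbf{r}\cdot\mathbf{e}), \qquad \mathbf{e}\cdot\mathbf{e} = 1 \qquad [40]$$

with $\omega > 0$ and $\mathbf{e}$ constant; $\omega$ is the frequency and $\mathbf{e}$ is a unit vector that gives the direction of propagation (adding $\tau$ to $t$ and $c\tau\mathbf{e}$ to $\mathbf{r}$ leaves $\Omega$ unchanged). This satisfies the wave equation, but for a general choice of the constants, it will not be possible to find B such that eqns [34]–[37] also hold.

By taking the divergence of eqn [39], we obtain

$$\operatorname{div} \mathbf{E} = \frac{\omega}{c}\left(\mathbf{e}\cdot\mathbf{a}\sin\Omega - \mathbf{e}\cdot\mathbf{b}\cos\Omega\right) \qquad [41]$$

For eqn [34] to hold, therefore, we must choose $\mathbf{a}$ and $\mathbf{b}$ orthogonal to $\mathbf{e}$. For eqn [37] to hold, we must find B such that

$$\operatorname{curl} \mathbf{E} = \frac{\omega}{c}\left(\mathbf{e} \wedge \mathbf{a}\sin\Omega - \mathbf{e} \wedge \mathbf{b}\cos\Omega\right) = -\frac{\partial \mathbf{B}}{\partial t} \qquad [42]$$

A possible choice is

$$\mathbf{B} = \frac{\mathbf{e} \wedge \mathbf{E}}{c} = \frac{1}{c}\left(\mathbf{e} \wedge \mathbf{a}\cos\Omega + \mathbf{e} \wedge \mathbf{b}\sin\Omega\right) \qquad [43]$$

and it is not hard to see that E and B then satisfy [35] and [36] as well.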
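The claim that the plane-wave fields satisfy the source-free equations can be spot-checked numerically. The following sketch builds E and B from [39], [40], and [43] and evaluates the left-hand sides of [36] and [37] by central differences. It works in units with c = 1 purely for numerical convenience, and the amplitudes, frequency, and sample point are arbitrary illustrative values; the helper names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def plane_wave(a, b, e, omega, c):
    """Return the functions E(r, t) and B(r, t) built from [39], [40] and [43]."""
    def E(r, t):
        phase = (omega / c) * (c * t - dot(r, e))
        return tuple(ai * math.cos(phase) + bi * math.sin(phase)
                     for ai, bi in zip(a, b))
    def B(r, t):
        return tuple(x / c for x in cross(e, E(r, t)))
    return E, B

def curl(F, r, t, h=1e-4):
    """Central-difference curl of the vector field F at (r, t)."""
    def d(i, j):  # derivative of component i along coordinate j
        rp, rm = list(r), list(r)
        rp[j] += h
        rm[j] -= h
        return (F(tuple(rp), t)[i] - F(tuple(rm), t)[i]) / (2 * h)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))

def ddt(F, r, t, h=1e-4):
    """Central-difference time derivative of F at (r, t)."""
    return tuple((p - m) / (2 * h) for p, m in zip(F(r, t + h), F(r, t - h)))

c, omega = 1.0, 2.0                        # units with c = 1 (an assumption)
e = (0.0, 0.0, 1.0)                        # direction of propagation
a, b = (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)    # both orthogonal to e
E, B = plane_wave(a, b, e, omega, c)
r, t = (0.3, -0.2, 0.5), 0.7
faraday = [x + y for x, y in zip(curl(E, r, t), ddt(B, r, t))]          # [37]
ampere = [x - y / c ** 2 for x, y in zip(curl(B, r, t), ddt(E, r, t))]  # [36]
print(faraday, ampere)  # all components should be near zero
```

Choosing a or b with a component along e would make the divergence [41] nonzero, and the residuals would then no longer vanish.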
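The solutions obtained in this way are called "monochromatic electromagnetic plane waves". Note that such waves are transverse in the sense that E and B are orthogonal to the direction of propagation. The definition of E can be written more concisely in the form

$$\mathbf{E} = \operatorname{Re}\left((\mathbf{a} + i\mathbf{b})\, e^{-i\Omega}\right) \qquad [44]$$

It is an exercise in Fourier analysis to show that every solution in empty space is a combination of monochromatic plane waves. A plane wave has "plane" or "linear" polarization if $\mathbf{a}$ and $\mathbf{b}$ are proportional. It has "circular" polarization if $\mathbf{a}\cdot\mathbf{a} = \mathbf{b}\cdot\mathbf{b}$, $\mathbf{a}\cdot\mathbf{b} = 0$.

At the heart of Maxwell's theory was the idea that a light wave with definite frequency or color is represented by a monochromatic plane solution of his equations.

Potentials

For every solution of Maxwell's equations in vacuo, the components of E and B satisfy the three-dimensional wave equation; but the converse is not true. That is, it is not true in general that if

$$\Box \mathbf{B} = 0, \qquad \Box \mathbf{E} = 0$$

then E and B satisfy Maxwell's equations. For this to happen, the divergence of both fields must vanish, and they must be related by [36] and [37]. These additional constraints are somewhat simpler to handle if we work not with the fields themselves, but with auxiliary quantities called "potentials".

The definition of the potentials depends on standard integrability conditions from vector calculus. Suppose that $\mathbf{v}$ is a vector field, which may depend on time. If $\operatorname{curl} \mathbf{v} = 0$, then there exists a function $\phi$ such that

$$\mathbf{v} = \operatorname{grad} \phi \qquad [45]$$

If $\operatorname{div} \mathbf{v} = 0$, then there exists a second vector field $\mathbf{a}$ such that

$$\mathbf{v} = \operatorname{curl} \mathbf{a} \qquad [46]$$

Neither $\phi$ nor $\mathbf{a}$ is uniquely determined by $\mathbf{v}$. In the first case, if [45] holds, then it also holds when $\phi$ is replaced by $\phi' = \phi + f$, where $f$ is a function of time alone; in the second, if [46] holds, then it also holds when $\mathbf{a}$ is replaced by

$$\mathbf{a}' = \mathbf{a} + \operatorname{grad} u$$

for any scalar function $u$ of position and time. It should be kept in mind that the existence statements are local. If $\mathbf{v}$ is defined on a region U with nontrivial topology, then it may not be possible to find a suitable $\phi$ or $\mathbf{a}$ throughout the whole of U.

Suppose now that we are given fields E and B satisfying Maxwell's equations [26]–[29] with sources represented by the charge density $\rho$ and the current density J. Since $\operatorname{div} \mathbf{B} = 0$, there exists a time-dependent vector field $\mathbf{A}(t, x, y, z)$ such that

$$\mathbf{B} = \operatorname{curl} \mathbf{A}$$

If we substitute $\mathbf{B} = \operatorname{curl} \mathbf{A}$ into [29] and interchange curl with the time derivative, then we obtain

$$\operatorname{curl}\left(\mathbf{E} + \frac{\partial \mathbf{A}}{\partial t}\right) = 0$$

It follows that there exists a scalar $\phi(t, x, y, z)$ such that

$$\mathbf{E} = -\operatorname{grad} \phi - \frac{\partial \mathbf{A}}{\partial t} \qquad [47]$$

Such a vector field A is called a "magnetic vector potential"; a function $\phi$ such that eqn [47] holds is called an "electric scalar potential".

Conversely, given scalar and vector functions $\phi$ and $\mathbf{A}$ of $t, x, y, z$, we can define B and E by

$$\mathbf{B} = \operatorname{curl} \mathbf{A}, \qquad \mathbf{E} = -\operatorname{grad} \phi - \frac{\partial \mathbf{A}}{\partial t} \qquad [48]$$

Then two of Maxwell's equations hold automatically, since

$$\operatorname{div} \mathbf{B} = 0, \qquad \operatorname{curl} \mathbf{E} + \frac{\partial \mathbf{B}}{\partial t} = 0$$

The remaining pair translate into conditions on A and $\phi$. Equation [26] becomes

$$\operatorname{div} \mathbf{E} = -\nabla^2 \phi - \frac{\partial}{\partial t}(\operatorname{div} \mathbf{A}) = \frac{\rho}{\epsilon_0}$$

and eqn [28] becomes

$$\operatorname{curl} \mathbf{B} - \frac{1}{c^2}\frac{\partial \mathbf{E}}{\partial t} = -\nabla^2 \mathbf{A} + \operatorname{grad}\operatorname{div} \mathbf{A} + \frac{1}{c^2}\frac{\partial}{\partial t}\left(\operatorname{grad} \phi + \frac{\partial \mathbf{A}}{\partial t}\right) = \mu_0 \mathbf{J}$$

If we put

$$\delta = \frac{1}{c^2}\frac{\partial \phi}{\partial t} + \operatorname{div} \mathbf{A}$$

then we can rewrite the equations for A and $\phi$ more simply as

$$\Box \phi - \frac{\partial \delta}{\partial t} = \frac{\rho}{\epsilon_0}$$

$$\Box \mathbf{A} + \operatorname{grad} \delta = \mu_0 \mathbf{J}$$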
Here we have four equations (one scalar, one vector) in four unknowns ($\phi$ and the components of $\mathbf{A}$). Any set of solutions $\phi$, $\mathbf{A}$ determines a solution of Maxwell's equations via [48].

Gauge Transformations

Given solutions $\mathbf{E}$ and $\mathbf{B}$ of Maxwell's equations, what freedom is there in the choice of $\mathbf{A}$ and $\phi$? First, $\mathbf{A}$ is determined by $\operatorname{curl}\mathbf{A} = \mathbf{B}$ up to the replacement of $\mathbf{A}$ by

\[ \mathbf{A}' = \mathbf{A} + \operatorname{grad} u \]

for some function $u$ of position and time. The scalar potential $\phi'$ corresponding to $\mathbf{A}'$ must be chosen so that

\[ -\operatorname{grad}\phi' = \mathbf{E} + \frac{\partial\mathbf{A}'}{\partial t} = \mathbf{E} + \frac{\partial\mathbf{A}}{\partial t} + \operatorname{grad}\frac{\partial u}{\partial t} = -\operatorname{grad}\left(\phi - \frac{\partial u}{\partial t}\right) \]

That is, $\phi' = \phi - \partial u/\partial t + f(t)$, where $f$ is a function of $t$ alone. We can absorb $f$ into $u$ by subtracting

\[ \int f\,dt \]

If we impose the Lorenz condition, then the only remaining freedom in the choice of $\mathbf{A}$ and $\phi$ is to make gauge transformations [49] in which $u$ is a solution of the wave equation $\Box u = 0$. Under the Lorenz condition, Maxwell's equations take the form

\[ \Box\phi = \frac{\rho}{\epsilon_0}, \qquad \Box\mathbf{A} = \mu_0\mathbf{J} \tag{51} \]

Consistency with the Lorenz condition follows from the continuity equation on $\rho$ and $\mathbf{J}$.

In the absence of sources, therefore, Maxwell's equations for the potential in the Lorenz gauge reduce to

\[ \Box\phi = 0, \qquad \Box\mathbf{A} = 0 \tag{52} \]

together with the constraint

\[ \operatorname{div}\mathbf{A} + \frac{1}{c^2}\frac{\partial\phi}{\partial t} = 0 \]

We can, for example, choose three arbitrary solutions of the scalar wave equation for the components of the vector potential, and then define $\phi$ by

\[ \phi = -c^2\int \operatorname{div}\mathbf{A}\,dt \]

Whatever choice we make, we shall get a solution of Maxwell's equations, and every solution of Maxwell's equations (without sources) will arise from some such choice.
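The gauge freedom described above can be checked numerically. The following is a minimal sketch, not part of the original article: it assumes the standard relations $\mathbf{E} = -\operatorname{grad}\phi - \partial\mathbf{A}/\partial t$ and $\mathbf{B} = \operatorname{curl}\mathbf{A}$, approximates derivatives by central differences, and uses arbitrary illustrative potentials `phi`, `A` and gauge function `u` (all hypothetical choices). It compares the fields obtained before and after the replacement $\mathbf{A}' = \mathbf{A} + \operatorname{grad} u$, $\phi' = \phi - \partial u/\partial t$.

```python
import math

H = 1e-4  # finite-difference step (illustrative choice)

def partial(f, args, i, h=H):
    """Central-difference partial derivative of f(x, y, z, t) w.r.t. argument i."""
    up, dn = list(args), list(args)
    up[i] += h
    dn[i] -= h
    return (f(*up) - f(*dn)) / (2 * h)

def fields(phi, A, point):
    """Return (E, B) at point = (x, y, z, t), with E = -grad phi - dA/dt, B = curl A."""
    E = tuple(-partial(phi, point, i) - partial(A[i], point, 3) for i in range(3))
    B = (
        partial(A[2], point, 1) - partial(A[1], point, 2),
        partial(A[0], point, 2) - partial(A[2], point, 0),
        partial(A[1], point, 0) - partial(A[0], point, 1),
    )
    return E, B

# Hypothetical smooth potentials and gauge function, chosen only for illustration.
phi = lambda x, y, z, t: x * y * math.cos(t)
A = (
    lambda x, y, z, t: y * z * t,
    lambda x, y, z, t: math.sin(x) * t ** 2,
    lambda x, y, z, t: x * y + z * t,
)
u = lambda x, y, z, t: math.sin(x * y) * math.exp(-t) + z ** 2 * t

def make_A_prime(i):
    """Component i of A' = A + grad u (gradient taken numerically)."""
    return lambda x, y, z, t: A[i](x, y, z, t) + partial(u, (x, y, z, t), i)

A_prime = tuple(make_A_prime(i) for i in range(3))
# phi' = phi - du/dt
phi_prime = lambda x, y, z, t: phi(x, y, z, t) - partial(u, (x, y, z, t), 3)

point = (0.3, -0.7, 1.1, 0.5)
E, B = fields(phi, A, point)
E2, B2 = fields(phi_prime, A_prime, point)

# Largest componentwise change in E and B caused by the gauge transformation.
max_diff = max(abs(a - b) for a, b in zip(E + B, E2 + B2))
print("largest change in E, B under the gauge transformation:", max_diff)
```

The printed difference should be at the level of finite-difference round-off, reflecting that $\operatorname{curl}\operatorname{grad} u = 0$ and that the two mixed derivatives of $u$ cancel in $\mathbf{E}$.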
Maxwell's contribution was decisive, although much of what we now call "Maxwell's theory" is due to his successors (Lorentz, Hertz, Einstein, and so on); and, as we shall see, a key element in Maxwell's own description of electromagnetism, the "electromagnetic ether," an all-pervasive medium which was supposed to transmit electromagnetic waves, was thrown out by Einstein.

A rough chronology is as follows.

1800 Volta demonstrated the connection between galvanism and static electricity.
1820 Oersted showed that the current from a battery generates a force on a magnet.
1822 Ampere suggested that light was a wave motion in a luminiferous ether made up of two types of electric fluid. In the same year, Galileo's "Dialogue concerning the two chief world systems" was removed from the index of prohibited books.
1831 Faraday showed that moving magnets can induce currents.
1846 Faraday suggested that light is a vibration in magnetic lines of force.
1863 Maxwell published the equations that describe the dynamics of electric and magnetic fields.
1905 Einstein's paper "On the electrodynamics of moving bodies."

Further Reading

Chalmers AF (1975) Maxwell and the displacement current. Physics Education January 1975: 45-49.
Einstein A (1905) On the Electrodynamics of Moving Bodies. A translation of the paper can be found in The Principle of Relativity by Lorentz HA, Einstein A, Minkowski H, and Weyl H, with notes by Sommerfeld A. New York: Dover, 1952.
Roche J (1998) The present status of Maxwell's displacement current. European Journal of Physics 19: 155-166.
Siegel DM (1985) Mechanical image and reality in Maxwell's electromagnetic theory. In: Harman PM (ed.) Wranglers and Physicists. Manchester: Manchester University Press.
statistical mechanics of systems of $N$ identical point particles $Q = (q_1, \ldots, q_N)$ enclosed in a cubic box $\Omega$, with volume $V$ and side $L$, normally assumed to have perfectly reflecting walls.

Particles of mass $m$ located at $q$, $q'$ will be supposed to interact via a pair potential $\varphi(q - q')$. The microscopic motion follows the equations

\[ m\ddot{q}_i = -\partial_{q_i}\Big(\sum_{j=1}^{N}\varphi(q_i - q_j) + \sum_i W_{wall}(q_i)\Big) \overset{def}{=} -\partial_{q_i}\Phi(Q) \tag{1} \]

where the potential $\varphi$ is assumed to be smooth except, possibly, for $|q - q'| \le r_0$ where it could be $+\infty$, that is, the particles cannot come closer than $r_0$, and at $r_0$ [1] is interpreted by imagining that they undergo elastic collisions; the potential $W_{wall}$ models the container and it will be replaced, unless explicitly stated, by an elastic collision rule.

The time evolution $(Q, \dot{Q}) \to S_t(Q, \dot{Q})$ will, therefore, be described on the position-velocity space, $\widehat{\mathcal{F}}(N)$, of the $N$ particles or, more conveniently, on the phase space, i.e., by a time evolution $S_t$ on the momentum-position ($(P, Q)$, with $P = m\dot{Q}$) space, $\mathcal{F}(N)$. The motion being conservative, the energy

\[ U \overset{def}{=} \sum_i \frac{1}{2m}p_i^2 + \sum_{i<j}\varphi(q_i - q_j) + \sum_i W_{wall}(q_i) \overset{def}{=} K(P) + \Phi(Q) \]

will be a constant of motion; the last term in $\Phi$ is missing if walls are perfect. This makes it convenient to regard the dynamics as associated with two dynamical systems $(\mathcal{F}(N), S_t)$ on the $6N$-dimensional phase space, and $(\mathcal{F}_U(N), S_t)$ on the $(6N - 1)$-dimensional surface of energy $U$. Since the dynamics [1] is Hamiltonian on phase space, with Hamiltonian

\[ H(P, Q) \overset{def}{=} \sum_i \frac{1}{2m}p_i^2 + \Phi(Q) \overset{def}{=} K + \Phi \]

it follows that the volume $d^{3N}P\,d^{3N}Q$ is conserved (i.e., a region $E$ has the same volume as $S_t E$) and also the area $\delta(H(P, Q) - U)\,d^{3N}P\,d^{3N}Q$ is conserved.

The above dynamical systems are well defined, i.e., $S_t$ is a map on phase space globally defined for all $t \in (-\infty, \infty)$, when the interaction potential is bounded below: this is implied by the a priori bounds due to energy conservation. For gravitational or Coulomb interactions, much more has to be said, assumed, and done in order to even define the key quantities needed for a statistical theory of motion.

Although our world is three dimensional (or at least was so believed to be until recent revolutionary theories), it will be useful to consider also systems of particles in dimension $d \ne 3$: in this case the above $6N$ and $3N$ become, respectively, $2dN$ and $dN$. Systems with dimension $d = 1, 2$ are in fact sometimes very good models for thin filaments or thin films. For the same reason, it is often useful to imagine that space is discrete and particles can only be located on a lattice, for example, on $\mathbb{Z}^d$ (see the section "Lattice models").

The reader is referred to Gallavotti (1999) for more details.

Pressure, Temperature, and Kinetic Energy

The beginning was Bernoulli's derivation of the perfect gas law via the identification of the pressure at numerical density $\rho$ with the average momentum transferred per unit time to a surface element of area $dS$ on the walls: that is, the average of the observable $2mv\,\rho v\,dS$, with $v$ the normal component of the velocity of the particles that undergo collisions with $dS$. If $f(v)\,dv$ is the distribution of the normal component of velocity and $f(\mathbf{v})\,d^3\mathbf{v} \equiv \prod_i f(v_i)\,d^3\mathbf{v}$, $\mathbf{v} = (v_1, v_2, v_3)$, is the total velocity distribution, the average of the momentum transferred is $p\,dS$ given by

\[ dS\int_{v>0} 2mv^2\rho f(v)\,dv = dS\int mv^2\rho f(v)\,dv = \rho\,dS\,\frac{2}{3}\int \frac{m}{2}\mathbf{v}^2 f(\mathbf{v})\,d^3\mathbf{v} = \rho\,dS\,\frac{2}{3}\Big\langle\frac{K}{N}\Big\rangle \tag{2} \]

Furthermore $(2/3)\langle K/N\rangle$ was identified as proportional to the absolute temperature, $\langle K/N\rangle \overset{def}{=} \mathrm{const}\,(3/2)\,T$, which, with present-day notations, is written as $(2/3)\langle K/N\rangle = k_B T$. The constant $k_B$ was (later) called Boltzmann's constant and it is the same for at least all perfect gases. Its independence on the particular nature of the gas is a consequence of Avogadro's law stating that equal volumes of gases at the same conditions of temperature and pressure contain equal number of molecules.

Proportionality between average kinetic energy and temperature via the universal constant $k_B$ became in fact a fundamental assumption extending to all aggregates of particles gaseous or not, never challenged in all later works (until quantum mechanics, where this is no longer true, see the section "Quantum statistics").

For more details, we refer the reader to Gallavotti (1999).
Introductory Article: Equilibrium Statistical Mechanics 53
bounce back. One particle in the layer will contribute to the average of $\partial_L\varphi(x)$ the amount

\[ \frac{1}{\text{total time}}\,2\int_{t_0}^{t_1}\varphi'(L - \xi_j)\,dt \tag{6} \]

if $t_0$ is the first instant when the point $j$ enters the layer and $t_1$ is the instant when the $\xi$-component of the velocity vanishes "against the wall." Since $\varphi'(L - \xi_j)$ is the $\xi$-component of the force, the integral is $-2m|\dot{\xi}_j|$ (by Newton's law), provided, of course, $\dot{\xi}_j > 0$.

Suppose that no collisions between particles occur while the particles travel within the range of the potential of the wall, i.e., the mean free path is much greater than the range of the potential defining the wall. The contribution of collisions to the average momentum transfer to the wall per unit time is therefore given by, see [2],

\[ \int_{v>0} 2mv\,f(v)\,\rho_{wall}\,A\,v\,dv \]

if $\rho_{wall}$, $f(v)$ are the average density near the wall and, respectively, the average fraction of particles with a velocity component normal to the wall between $v$ and $v + dv$. Here $\rho_{wall}$, $f$ are supposed to be independent of the point on the wall: this should be true up to corrections of size $o(A)$.

Thus, writing the average kinetic energy per particle and per velocity component, $\int (m/2)v^2 f(v)\,dv$, as $(1/2)\beta^{-1}$ (cf. [2]) it follows that

\[ p \overset{def}{=} -\langle\partial_V\varphi\rangle = \rho_{wall}\,\beta^{-1} \tag{7} \]

has the physical interpretation of pressure. $(1/2)\beta^{-1}$ is the average kinetic energy per degree of freedom: hence, it is proportional to the absolute temperature $T$ (cf. the section "Pressure, temperature, and kinetic energy").

On the other hand, if motion on the energy surface takes place on a single periodic orbit, the quantity $p$ in [7] is the right quantity that would make the heat theorem work; see [4]. Hence, regarding the trajectory on each energy surface as periodic (i.e., the system as monocyclic) leads to the heat theorem with $p$, $U$, $V$, $T$ having the right physical interpretation corresponding to their appellations. This shows that monocyclic systems provide natural models of thermodynamic behavior.

Assuming that a chaotic system like a gas in a container of volume $V$ will satisfy, for practical purposes, the above property, a quantity $p$ can be defined such that $dU + p\,dV$ admits the inverse of the average kinetic energy $\langle K\rangle$ as an integrating factor and, furthermore, $p$, $U$, $V$, $\langle K\rangle$ have the physical interpretations of pressure, energy, volume, and (up to a proportionality factor) absolute temperature, respectively.

Boltzmann's conception of space (and time) as discrete allowed him to conceive the property that the energy surface is constituted by points all of which belong to a single trajectory: a property that would be impossible if the phase space was really a continuum. Regarding phase space as consisting of a finite number of cells of finite volume $h^{dN}$, for some $h > 0$ (rather than of a continuum of points), allowed him to think, without logical contradiction, that the energy surface consisted of a single trajectory and, hence, that motion was a cyclic permutation of its points (actually cells).

Furthermore, it implied that the time average of an observable $F(P, Q)$ had to be identified with its average on the energy surface computed via the Liouville distribution

\[ C^{-1}\int F(P, Q)\,\delta(H(P, Q) - U)\,dP\,dQ \]

with

\[ C = \int \delta(H(P, Q) - U)\,dP\,dQ \]

(the appropriate normalization factor): a property that was written symbolically

\[ \frac{dt}{T} = \frac{dP\,dQ}{\int dP\,dQ} \]

or

\[ \lim_{T\to\infty}\frac{1}{T}\int_0^T F(S_t(P, Q))\,dt = \frac{\int F(P', Q')\,\delta(H(P', Q') - U)\,dP'\,dQ'}{\int \delta(H(P', Q') - U)\,dP'\,dQ'} \tag{8} \]

The validity of [8] for all (piecewise smooth) observables $F$ and for all points of the energy surface, with the exception of a set of zero area, is called the ergodic hypothesis.

For more details, the reader is referred to Boltzmann (1968) and Gallavotti (1999).

Ensembles

Eventually Boltzmann in 1884 realized that the validity of the heat theorem for averages computed via the right-hand side (rhs) of [8] held independently of the ergodic hypothesis, that is, [8] was not necessary because the heat theorem (i.e., [3]) could also be derived under the only assumption that the averages involved in its formulation were computed
as averages over phase space with respect to the probability distribution on the rhs of [8].

Furthermore, if $T$ was identified with the average kinetic energy, $U$ with the average energy, and $p$ with the average force per unit surface on the walls of the container $\Omega$ with volume $V$, the relation [3] held for a variety of families of probability distributions on phase space, besides [8]. Among these are:

1. The "microcanonical ensemble," which is the collection of probability distributions on the rhs of [8] parametrized by $u = U/N$, $v = V/N$ (energy and volume per particle),

\[ \mu^{mc}_{u,v}(dP\,dQ) = \frac{1}{Z_{mc}(U, N, V)}\,\delta(H(P, Q) - U)\,\frac{dP\,dQ}{N!\,h^{dN}} \tag{9} \]

where $h$ is a constant with the dimensions of an action which, in the discrete representation of phase space mentioned in the previous section, can be taken such that $h^{dN}$ equals the volume of the cells and, therefore, the integrals with respect to [9] can be interpreted as an (approximate) sum over the cells conceived as microscopic configurations of $N$ indistinguishable particles (whence the $N!$).

2. The "canonical ensemble," which is the collection of probability distributions parametrized by $\beta$, $v = V/N$,

\[ \mu^{c}_{\beta,v}(dP\,dQ) = \frac{1}{Z_{c}(\beta, N, V)}\,e^{-\beta H(P, Q)}\,\frac{dP\,dQ}{N!\,h^{dN}} \tag{10} \]

to which more ensembles can be added, such as the grand canonical ensemble (Gibbs).

3. The "grand canonical ensemble" which is the collection of probability distributions parametrized by $\beta$, $\lambda$ and defined over the space $\mathcal{F}_{gc} = \bigcup_{N=0}^{\infty}\mathcal{F}(N)$,

\[ \mu^{gc}_{\beta,\lambda}(dP\,dQ) = \frac{1}{Z_{gc}(\beta, \lambda, V)}\,e^{\beta\lambda N - \beta H(P, Q)}\,\frac{dP\,dQ}{N!\,h^{dN}} \tag{11} \]

Hence, there are several different models of thermodynamics. The key tests for accepting them as real microscopic descriptions of macroscopic thermodynamics are as follows.

1. A correspondence between the macroscopic states of thermodynamic equilibrium and the elements of a collection of probability distributions on phase space can be established by identifying, on the one hand, macroscopic thermodynamic states with given values of the thermodynamic functions and, on the other, probability distributions attributing the same average values to the corresponding microscopic observables (i.e., whose averages have the interpretation of thermodynamic functions).

2. Once the correct correspondence between the elements of the different ensembles is established, that is, once the pairs $(u, v)$, $(\beta, v)$, $(\beta, \lambda)$ are so related to produce the same values for the averages $U$, $V$, $k_B T \overset{def}{=} \beta^{-1}$, $p\,|\partial\Omega|$ of

\[ H(P, Q);\quad V;\quad \frac{2K(P)}{3N};\quad \int \delta_{\partial\Omega}(q_1)\,2m\,(v_1\cdot n)^2\,dq_1 \tag{12} \]

(where $\delta_{\partial\Omega}(q_1)$ is a delta-function pinning $q_1$ to the surface $\partial\Omega$), then the averages of all physically interesting observables should coincide at least in the thermodynamic limit, $\Omega \to \infty$. In this way, the elements $\mu$ of the considered collection of probability distributions can be identified with the states of macroscopic equilibrium of the system. The $\mu$'s depend on parameters and therefore they form an ensemble: each of them corresponds to a macroscopic equilibrium state whose thermodynamic functions are appropriate averages of microscopic observables and therefore are functions of the parameters identifying $\mu$.

Remark The word ensemble is often used to indicate the individual probability distributions of what has been called here an ensemble. The meaning used here seems closer to the original sense in the 1884 paper of Boltzmann (in other words, often by "ensemble" one means that collection of the phase space points on which a given probability distribution is considered, and this does not seem to be the original sense).

For instance, in the case of the microcanonical distributions this means interpreting energy, volume, temperature, and pressure of the equilibrium state with specific energy $u$ and specific volume $v$ as proportional, through appropriate universal proportionality constants, to the integrals with respect to $\mu^{mc}_{u,v}(dP\,dQ)$ of the mechanical quantities in [12]. The averages of other thermodynamic observables in the state with specific energy $u$ and specific volume $v$ should be given by their integrals with respect to $\mu^{mc}_{u,v}$.

Likewise, one can interpret energy, volume, temperature, and pressure of the equilibrium state with specific energy $u$ and specific volume $v$ as the averages of the mechanical quantities [12] with respect to the canonical distribution $\mu^{c}_{\beta,v}(dP\,dQ)$ which has average specific energy precisely $u$. The averages of other thermodynamic observables in the state with specific energy and volume $u$ and $v$ are
given by their integrals with respect to $\mu^{c}_{\beta,v}$. A similar definition can be given for the description of thermodynamic equilibria via the grand canonical distributions.

For more details, see Gibbs (1981) and Gallavotti (1999).

Equivalence of Ensembles

Boltzmann proved that, computing averages via the microcanonical or canonical distributions, the essential property [3] was satisfied when changes in their parameters (i.e., $u$, $v$ or $\beta$, $v$, respectively) induced changes $du$ and $dv$ on energy and volume, respectively. He also proved that the function $s$, whose existence is implied by [3], was the same function once expressed as a function of $u$, $v$ (or of any pair of thermodynamic parameters, e.g., of $T$, $v$ or $p$, $u$).

A close examination of Boltzmann's proof shows that [3] holds exactly in the canonical ensemble and up to corrections tending to 0 as $\Omega \to \infty$ in the microcanonical ensemble. Identity of thermodynamic functions evaluated in the two ensembles holds, as a consequence, up to corrections of this order. In addition, Gibbs added that the same held for the grand canonical ensemble.

Of course, not every collection of stationary probability distributions on phase space would provide a model for thermodynamics: Boltzmann called "orthodic" the collections of stationary distributions which generated models of thermodynamics through the above-mentioned identification of its elements with macroscopic equilibrium states. The microcanonical, canonical, and the later grand canonical ensembles are the chief examples of orthodic ensembles. Boltzmann and Gibbs proved these ensembles to be not only orthodic but to generate the same thermodynamic functions, that is to generate the same thermodynamics.

This meant freedom from the analysis of the truth of the doubtful ergodic hypothesis (still unproved in any generality) or of the monocyclicity (manifestly false if understood literally rather than regarding the phase space as consisting of finitely many small, discrete cells), and allowed Gibbs to formulate the problem of statistical mechanics of equilibrium as follows.

Problem Study the properties of the collection of probability distributions constituting (any) one of the above ensembles.

However, by no means the three ensembles just introduced exhaust the class of orthodic ensembles producing the same models of thermodynamics in the limit of infinitely large systems. The wealth of ensembles with the orthodicity property, hence leading to equivalent mechanical models of thermodynamics, can be naturally interpreted in connection with the phenomenon of phase transition (see the section "Phase transitions and boundary conditions").

Clearly, the quoted results do not "prove" that thermodynamic equilibria are described by the microcanonical, canonical, or grand canonical ensembles. However, they certainly show that, for most systems, independently of the number of degrees of freedom, one can define quite unambiguously a mechanical model of thermodynamics establishing parameter-free, system-independent, physically important relations between thermodynamic quantities (e.g., $\partial_u\big(p(u, v)/T(u, v)\big) \equiv \partial_v\big(1/T(u, v)\big)$, from [3]).

The ergodic hypothesis which was at the root of the mechanical theorems on heat and entropy cannot be taken as a justification of their validity. Naively one would expect that the time scale necessary to see an equilibrium attained, called recurrence time scale, would have to be at least the time that a phase space point takes to visit all possible microscopic states of given energy: hence, an explanation of why the necessarily enormous size of the recurrence time is not a problem becomes necessary.

In fact, the recurrence time can be estimated once the phase space is regarded as discrete: for the purpose of countering mounting criticism, Boltzmann assumed that momentum was discretized in units of $(2mk_BT)^{1/2}$ (i.e., the average momentum size) and space was discretized in units of $\rho^{-1/3}$ (i.e., the average spacing), implying a volume of cells $h^{3N}$ with $h \overset{def}{=} \rho^{-1/3}(2mk_BT)^{1/2}$; then he calculated that, even with such a gross discretization, a cell representing a microscopic state of 1 cm$^3$ of hydrogen at normal condition would require a time (called "recurrence time") of the order of $10^{10^{19}}$ times the age of the Universe (!) to visit the entire energy surface. In fact, the phase space volume is $\Gamma = (\rho^{-3}N(2mk_BT)^{3/2})^N \equiv h^{3N}$ and the number of cells of volume $h^{3N}$ is $\Gamma/(N!\,h^{3N}) \simeq e^{3N}$; and the time to visit all will be $e^{3N}\tau_0$, with $\tau_0$ a typical atomic unit, e.g., $10^{-12}$ s but $N = 10^{19}$. In this sense, the statement boldly made by young Boltzmann that "aperiodic motions can be regarded as periodic with infinite period" was even made quantitative.

The recurrence time is clearly so long to be irrelevant for all purposes: nevertheless, the correctness of the microscopic theory of thermodynamics can still rely on the microscopic dynamics once it is understood (as stressed by Boltzmann) that the reason why we observe approach to equilibrium, and equilibrium itself, over "human" timescales
(which are far shorter than the recurrence times) is due to the property that on most of the energy surface the (very few) observables whose averages yield macroscopic thermodynamic functions (namely pressure, temperature, energy, . . .) assume the same value even if $N$ is only very moderately large (of the order of $10^3$ rather than $10^{19}$). This implies that this value coincides with the average and therefore satisfies the heat theorem without any contradiction with the length of the recurrence time. The latter rather concerns the time needed to the generic observable to thermalize, that is, to reach its time average: the generic observable will indeed take a very long time to "thermalize" but no one will ever notice, because the generic observable (e.g., the position of a pre-identified particle) is not relevant for thermodynamics.

The word "proof" is not used in the mathematical sense so far in this article: the relevance of a mathematically rigorous analysis was widely realized only around the 1960s at the same time when the first numerical studies of the thermodynamic functions became possible and rigorous results were needed to check the correctness of various numerical simulations.

For more details, the reader is referred to Boltzmann (1968a, b) and Gallavotti (1999).

Thermodynamic Limit

Adopting Gibbs' axiomatic point of view, it is interesting to see the path to be followed to achieve an equivalence proof of three ensembles introduced in the section "Heat theorem and ergodic hypothesis."

A preliminary step is to consider, given a cubic box $\Omega$ of volume $V = L^d$, the normalization factors $Z_{gc}(\beta, \lambda, V)$, $Z_c(\beta, N, V)$, and $Z_{mc}(U, N, V)$ in [9], [10], and [11], respectively, and to check that the following thermodynamic limits exist:

\[
\begin{aligned}
\beta\,p_{gc}(\beta, \lambda) &\overset{def}{=} \lim_{V\to\infty}\frac{1}{V}\log Z_{gc}(\beta, \lambda, V) \\
-\beta f_c(\beta, \rho) &\overset{def}{=} \lim_{V\to\infty,\ N/V=\rho}\frac{1}{N}\log Z_c(\beta, N, V) \\
k_B^{-1}s_{mc}(u, \rho) &\overset{def}{=} \lim_{V\to\infty,\ N/V=\rho,\ U/N=u}\frac{1}{N}\log Z_{mc}(U, N, V)
\end{aligned}
\tag{13}
\]

where the density $\rho \overset{def}{=} v^{-1} \equiv N/V$ is used, instead of $v$, for later reference. The normalization factors play an important role because they have simple thermodynamic interpretation (see the next section): they are called grand canonical, canonical, and microcanonical partition functions, respectively.

Not surprisingly, assumptions on the interparticle potential $\varphi(q - q')$ are necessary to achieve an existence proof of the limits in [13]. The assumptions on $\varphi$ are not only quite general but also have a clear physical meaning. They are

1. stability: that is, existence of a constant $B \ge 0$ such that $\sum_{i<j}^{N}\varphi(q_i - q_j) \ge -BN$ for all $N \ge 0$, $q_1, \ldots, q_N \in \mathbb{R}^d$, and
2. temperedness: that is, existence of constants $\varepsilon_0$, $R > 0$ such that $|\varphi(q - q')| < B|q - q'|^{-d-\varepsilon_0}$ for $|q - q'| > R$.

The assumptions are satisfied by essentially all microscopic interactions with the notable exceptions of the gravitational and Coulombic interactions, which require a separate treatment (and lead to somewhat different results on the thermodynamic behavior).

For instance, assumptions (1), (2) are satisfied if $\varphi(q)$ is $+\infty$ for $|q| < r_0$ and smooth for $|q| > r_0$, for some $r_0 \ge 0$, and furthermore $\varphi(q) > B_0|q|^{-(d+\varepsilon_0)}$ if $r_0 < |q| \le R$, while for $|q| > R$ it is $|\varphi(q)| < B_1|q|^{-(d+\varepsilon_0)}$, for some $B_0$, $B_1$, $\varepsilon_0 > 0$, $R > r_0$. Briefly, $\varphi$ is fast diverging at contact and fast approaching 0 at large distance. This is called a (generalized) Lennard-Jones potential. If $r_0 > 0$, $\varphi$ is called a hard-core potential. If $B_1 = 0$, the potential is said to have finite range. (See Appendix 1 for physical implications of violations of the above stability and temperedness properties.) However, in the following, it will be necessary, both for simplicity and to contain the length of the exposition, to restrict consideration to the case $B_1 = 0$, i.e., to

\[ \varphi(q) > B_0|q|^{-(d+\varepsilon_0)},\quad r_0 < |q| \le R; \qquad |\varphi(q)| \equiv 0,\quad |q| > R \tag{14} \]

unless explicitly stated.

Assuming stability and temperedness, the existence of the limits in [13] can be mathematically proved: in Appendix 2, the proof of the first is analyzed to provide the simplest example of the technique. A remarkable property of the functions $p_{gc}(\beta, \lambda)$, $f_c(\beta, \rho)$, and $s_{mc}(u, \rho)$ is that they are convex functions: hence, they are continuous in the interior of their domains of definition and, at one variable fixed, are differentiable with respect to the other with at most countably many exceptions.

In the case of a potential without hard core ($\rho_{max} = \infty$), $f_c(\beta, \rho)$ can be checked to tend to 0 slower than $\rho$ as $\rho \to 0$, and to $-\infty$ faster than $-\rho$ as $\rho \to \infty$ (essentially proportionally to $-\rho\log\rho$ in both cases). Likewise, in the same case, $s_{mc}(u, \rho)$ can be shown to tend to 0 slower than $u - u_{min}$ as $u \to u_{min}$, and to $-\infty$ faster than $-u$ as $u \to \infty$. The latter
asymptotic properties can be exploited to derive, from the relations between the partition functions in [13],

\[
\begin{aligned}
Z_{gc}(\beta, \lambda, V) &= \sum_{N=0}^{\infty} e^{\beta\lambda N} Z_c(\beta, N, V) \\
Z_c(\beta, N, V) &= \int_{-B}^{\infty} e^{-\beta U} Z_{mc}(U, N, V)\,dU
\end{aligned}
\tag{15}
\]

and, from the above-mentioned convexity, the consequences

\[
\begin{aligned}
\beta\,p_{gc}(\beta, \lambda) &= \max_v\big(\beta\lambda v^{-1} - \beta v^{-1} f_c(\beta, v^{-1})\big) \\
-\beta f_c(\beta, v^{-1}) &= \max_u\big(-\beta u + k_B^{-1}s_{mc}(u, v^{-1})\big)
\end{aligned}
\tag{16}
\]

and that the maxima are attained in points, or intervals, internal to the intervals of definition. Let $v_{gc}$, $u_c$ be points where the maxima are, respectively, attained in [16].

Note that the quantity $e^{\beta\lambda N}Z_c(\beta, N, V)/Z_{gc}(\beta, \lambda, V)$ has the interpretation of probability of a density $v^{-1} = N/V$ evaluated in the grand canonical distribution. It follows that, if the maximum in the first of [16] is strict, that is, it is reached at a single point, the values of $v^{-1}$ in closed intervals not containing the maximum point $v_{gc}^{-1}$ have a probability behaving as $< e^{-cV}$, $c > 0$, as $V \to \infty$, compared to the probability of $v^{-1}$'s in any interval containing $v_{gc}^{-1}$. Hence, $v_{gc}$ has the interpretation of average value of $v$ in the grand canonical distribution, in the limit $V \to \infty$.

Likewise, the interpretation of

\[ e^{-\beta uN}Z_{mc}(uN, N, V)/Z_c(\beta, N, V) \]

as probability in the canonical distribution of an energy density $u$ shows that, if the maximum in the second of [16] is strict, the values of $u$ in closed intervals not containing the maximum point $u_c$ have a probability behaving as $< e^{-cV}$, $c > 0$, as $V \to \infty$, compared to the probability of $u$'s in any interval containing $u_c$. Hence, in the limit $\Omega \to \infty$, the average value of $u$ in the canonical distribution is $u_c$.

If the maxima are strict, [16] also establishes a relation between the grand canonical density, the canonical free energy and the grand canonical parameter $\lambda$, or between the canonical energy, the microcanonical entropy, and the canonical parameter $\beta$:

\[ \lambda = \partial_{v^{-1}}\big(v_{gc}^{-1} f_c(\beta, v_{gc}^{-1})\big), \qquad \beta = k_B^{-1}\,\partial_u s_{mc}(u_c, v^{-1}) \tag{17} \]

where convexity and strictness of the maxima imply the derivatives' existence.

Remark Therefore, in the equivalence between canonical and microcanonical ensembles, the canonical distribution with parameters $(\beta, v)$ should correspond with the microcanonical with parameters $(u_c, v)$. The grand canonical distribution with parameters $(\beta, \lambda)$ should correspond with the canonical with parameters $(\beta, v_{gc})$.

For more details, the reader is referred to Ruelle (1969) and Gallavotti (1999).

Physical Interpretation of Thermodynamic Functions

The existence of the limits [13] implies several properties of interest. The first is the possibility of finding the physical meaning of the functions $p_{gc}$, $f_c$, $s_{mc}$ and of the parameters $\lambda$, $\beta$.

Note first that, for all $V$ the grand canonical average $\langle K\rangle_{\beta,\lambda}$ is $(d/2)\beta^{-1}\langle N\rangle_{\beta,\lambda}$, so that $\beta^{-1}$ is proportional to the temperature $T_{gc} = T(\beta, \lambda)$ in the grand canonical distribution: $\beta^{-1} = k_B T(\beta, \lambda)$. Proceeding heuristically, the physical meaning of $p(\beta, \lambda)$ and $\lambda$ can be found through the following remarks.

Consider the microcanonical distribution $\mu^{mc}_{u,v}$ and denote by $\int^*$ the integral over $(P, Q)$ extended to the domain of the $(P, Q)$ such that $H(P, Q) = U$ and, at the same time, $q_1 \in dV$, where $dV$ is an infinitesimal volume surrounding the region $\Omega$. Then, by the microscopic definition of the pressure $p$ (see the introductory section), it is

\[ p\,dV = \frac{N}{Z(U, N, V)}\int^* \delta\,\frac{2}{3}\,\frac{p_1^2}{2m}\,\frac{dP\,dQ}{N!\,h^{dN}} \equiv \frac{2}{3Z(U, N, V)}\int^* \delta\,K(P)\,\frac{dP\,dQ}{N!\,h^{dN}} \tag{18} \]

where $\delta \equiv \delta(H(P, Q) - U)$. The RHS of [18] can be compared with

\[ \frac{\partial_V Z(U, N, V)\,dV}{Z(U, N, V)} = \frac{N}{Z(U, N, V)}\int^* \delta\,\frac{dP\,dQ}{N!\,h^{dN}} \]

to give

\[ \frac{\partial_V Z\,dV}{Z} = N\,\frac{p\,dV}{(2/3)\langle K\rangle^*} = \beta\,p\,dV \]

because $\langle K\rangle^*$, which denotes the average $\int^* K/\int^* 1$, should be essentially the same as the microcanonical average $\langle K\rangle_{mc}$ (i.e., insensitive to the fact that one particle is constrained to the volume $dV$) if $N$ is large. In the limit $V \to \infty$, $V/N = v$, the latter remark together with the second of [17] yields

\[ k_B^{-1}\,\partial_v s_{mc}(u, v) = \beta\,p(u, v), \qquad k_B^{-1}\,\partial_u s_{mc}(u, v) = \beta \tag{19} \]

respectively. Note that $p \ge 0$ and it is not increasing in $v$ because $s_{mc}(\rho)$ is concave as a function of $v = \rho^{-1}$ (in fact, by the remark following [14] $s_{mc}(u, \rho)$ is convex in $\rho$ and, in general, if $g(\rho)$ is convex in $\rho$ then $g(v^{-1})$ is always concave in $v = \rho^{-1}$).
Hence, $ds_{mc}(u, v) = (du + p\,dv)/T$, so that taking into account the physical meaning of $p$, $T$ (as pressure and temperature, see the section "Pressure, temperature, and kinetic energy"), $s_{mc}$ is, in thermodynamics, the entropy. Therefore (see the second of [16]), $-\beta f_c(\beta, \rho) = -\beta u_c + k_B^{-1}s_{mc}(u_c, \rho)$ becomes

\[ f_c(\beta, \rho) = u_c - T_c\,s_{mc}(u_c, \rho), \qquad df_c = -p\,dv - s_{mc}\,dT \tag{20} \]

and since $u_c$ has the interpretation (as mentioned in the last section) of average energy in the canonical distribution $\mu^{c}_{\beta,v}$ it follows that $f_c$ has the thermodynamic interpretation of free energy (once compared with the definition of free energy, $F = U - TS$, in thermodynamics).

By [17] and [20],

\[ \lambda \equiv \partial_{v^{-1}}\big(v_{gc}^{-1} f_c(\beta, v_{gc}^{-1})\big) \equiv u_c - T_c\,s_{mc} + p\,v_{gc} \]

and $v_{gc}$ has the meaning of specific volume $v$. Hence, after comparison with the definition of chemical potential, $\lambda N = U - TS + pV$, in thermodynamics, it follows that the thermodynamic interpretation of $\lambda$ is the chemical potential and (see [16], [17]), the grand canonical relation

\[ \beta\,p_{gc}(\beta, \lambda) = \beta\lambda\,v_{gc}^{-1} - \beta\,v_{gc}^{-1}u_c + v_{gc}^{-1}k_B^{-1}s_{mc}(u_c, v^{-1}) \]

shows that $p_{gc}(\beta, \lambda) \equiv p$, implying that $p_{gc}(\beta, \lambda)$ is the pressure expressed, however, as a function of temperature and chemical potential.

To go beyond the heuristic derivations above, it should be remarked that convexity and the property that the maxima in [16], [17] are reached in the interior of the intervals of variability of $v$ or $u$ are sufficient to turn the above arguments into rigorous mathematical deductions: this means that given [19] as definitions of $p(u, v)$, $\beta(u, v)$, the second of [20] follows as well as $p_{gc}(\beta, \lambda) \equiv p(u_v, v_{gc}^{-1})$. But the values $v_{gc}$ and $u_c$ in [16] are not necessarily unique: convex functions can contain horizontal segments and therefore the general conclusion is that the maxima may possibly be attained in intervals. Hence, instead of a single $v_{gc}$, there might be a whole interval $[v_-, v_+]$, where the rhs of [16] reaches the maximum and, instead of a single $u_c$, there might be a whole interval $[u_-, u_+]$ where the rhs of [17] reaches the maximum.

Convexity implies that the values of $\lambda$ or $\beta$ for which the maxima in [16] or [17] are attained in intervals rather than in single points are rare (i.e., at most denumerably many): the interpretation is, in such cases, that the thermodynamic functions show discontinuities, and the corresponding phenomena are called phase transitions (see the next section).

For more details the reader is referred to Ruelle (1969) and Gallavotti (1999).

Phase Transitions and Boundary Conditions

The analysis in the last two sections of the relations between elements of ensembles of distributions describing macroscopic equilibrium states not only allows us to obtain mechanical models of thermodynamics but also shows that the models, for a given system, coincide at least as $\Omega \to \infty$. Furthermore, the equivalence between the thermodynamic functions computed via corresponding distributions in different ensembles can be extended to a full equivalence of the distributions.

If the maxima in [16] are attained at single points $v_{gc}$ or $u_c$ the equivalence should take place in the sense that a correspondence between $\mu^{gc}_{\beta,\lambda}$, $\mu^{c}_{\beta,v}$, $\mu^{mc}_{u,v}$ can be established so that any local observable $F(P, Q)$, defined as an observable depending on $(P, Q)$ only through the $p_i$, $q_i$ with $q_i \in \Lambda$, where $\Lambda \subset \Omega$ is a finite region, has the same average with respect to corresponding distributions in the limit $\Omega \to \infty$.

The correspondence is established by considering $(\beta, \lambda) \leftrightarrow (\beta, v_{gc}) \leftrightarrow (u_{mc}, v)$, where $v_{gc}$ is where the maximum in [16] is attained, $u_{mc} \equiv u_c$ is where the maximum in [17] is attained and $v_{gc} \equiv v$ (cf. also [19], [20]). This means that the limits

\[ \lim_{V\to\infty}\int F(P, Q)\,\mu^{a}(dP\,dQ) \overset{def}{=} \langle F\rangle_a \quad (a\text{-independent}), \qquad a = gc,\ c,\ mc \tag{21} \]

coincide if the averages are evaluated by the distributions $\mu^{gc}_{\beta,\lambda}$, $\mu^{c}_{\beta,v_c}$, $\mu^{mc}_{u_{mc},v_{mc}}$.

Exceptions to [21] are possible, and are certainly likely to occur at values of $u$, $v$ where the maxima in [16] or [17] are attained in intervals rather than in isolated points; but this does not exhaust, in general, the cases in which [21] may not hold.

However, no case in which [21] fails has to be regarded as an exception. It rather signals that an interesting and important phenomenon occurs. To understand it properly, it is necessary to realize that the grand canonical, canonical, and microcanonical families of probability distributions are by far not the only ensembles of probability distributions whose elements can be considered to generate models of thermodynamics, that is, which are orthodic in the sense of the discussion in the section "Equivalence of ensembles." More general families of orthodic statistical ensembles of probability
60 Introductory Article: Equilibrium Statistical Mechanics
bulk effects and at a phase transition the averages of the local observable, if existing at all, will
exhibit a nontrivial dependence on the boundary conditions. This is also called long range order.
3. It is possible to show that when this happens then some thermodynamic function whose value is
independent of the boundary condition (e.g., the free energy in the canonical distributions) has
discontinuous derivatives in terms of the parameters of the ensemble. This is in fact one of the
frequently-used alternative definitions of phase transitions: the latter two natural definitions of
first-order phase transition are equivalent. However, it is very difficult to prove that a given system
shows a phase transition. For instance, existence of a liquid-gas phase transition is still an open
problem in systems of the type considered until the section "Lattice models" below.
4. A remarkable unification of the theory of the equilibrium ensembles emerges: all distributions of
any ensemble describe equilibrium states. If a boundary condition is fixed once and for all, then
some equilibrium states might fail to be described by an element of an ensemble. However, if all
boundary conditions are allowed then all equilibrium states should be realizable in a given
ensemble by varying the boundary conditions.
5. The analysis leads us to consider as completely equivalent without exceptions grand canonical,
canonical, or microcanonical ensembles enlarged by adding to them the distributions with potential
energy augmented by the interaction with fixed external particles.
6. The above picture is really proved only for special classes of models (typically in models
in which particles are constrained to occupy points of a lattice and in systems with hard core
interactions, $r_0 > 0$ in [14]) but it is believed to be correct in general. At least it is consistent
with all that is known so far in classical statistical mechanics. The difficulty is that,
conceivably, one might even need boundary conditions more complicated than the fixed
particles boundary conditions (e.g., putting different particles outside, interacting with
the system with an arbitrary potential, rather than via $\varphi$).

The discussion of the equivalence of the ensembles and the question of the importance of boundary
conditions has already imposed the consideration of several limits as $\Lambda \to \infty$. Occasionally, it
will again come up. For conciseness, it is useful to set up a formal definition of equilibrium states of an
infinite-volume system: although infinite volume is an idealization void of physical reality, it is
nevertheless useful to define such states because certain notions (e.g., that of pure state) can be sharply
defined, with few words and avoiding wide circumvolutions, in terms of them. Therefore, let:

Definition An infinite-volume state with parameters $(\beta, v)$, $(u, v)$ or $(\beta, \lambda)$ is a
collection of average values $F \to \langle F\rangle$ obtained, respectively, as limits of finite-volume
averages $\langle F\rangle_{\Lambda_n}$ defined from canonical, microcanonical, or grand canonical
distributions in $\Lambda_n$ with fixed parameters $(\beta, v)$, $(u, v)$ or $(\beta, \lambda)$ and with general
boundary condition of fixed external particles, on sequences $\Lambda_n \to \infty$ for which such limits
exist simultaneously for all local observables $F$.

Having set the definition of infinite-volume state, consider a local observable $G(X)$ and let
$\tau_\xi G(X) = G(X + \xi)$, $\xi \in R^d$, with $X + \xi$ denoting the configuration $X$ in which all particles
are translated by $\xi$: then an infinite-volume state is called a pure state if for any pair of local
observables $F, G$ it is

$$\langle F\,\tau_\xi G\rangle - \langle F\rangle\,\langle \tau_\xi G\rangle \xrightarrow[\xi\to\infty]{} 0 \qquad [22]$$

which is called a cluster property of the pair $F, G$.
The result alluded to in remark (6) is that at least in the case of hard-core systems (or of the simple
lattice systems discussed in the section "Lattice models") the infinite-volume equilibrium states in the
above sense exhaust at least the totality of the infinite-volume pure states. Furthermore, the other states
that can be obtained in the same way are convex combinations of the pure states, i.e., they are
statistical mixtures of pure phases. Note that $\langle \tau_\xi G\rangle$ cannot be replaced, in general, by
$\langle G\rangle$ because not all infinite-volume states are necessarily translation invariant and in simple
cases (e.g., crystals) it is even possible that no translation-invariant state is a pure state.

Remarks
1. This means that, in the latter models, generalizing the boundary conditions, for example
considering external particles to be not identical to the ones inside the system, using periodic or
partially periodic boundary conditions, or the widely used alternative of introducing a small
auxiliary potential and first taking the infinite-volume states in presence of it and then letting
the potential vanish, does not enlarge further the set of states (but may sometimes be useful: an
example of a study of a phase transition by using the latter method of small fields will be given in
the section "Continuous symmetries: no d = 2 crystal theorem").
2. If $\chi$ is the indicator function of a local event, it will make sense to consider the probability of
occurrence of the event in an infinite-volume state defining it as $\langle\chi\rangle$. In particular, the
probability density for finding $p$ particles at $x_1, x_2, \ldots, x_p$, called the $p$-point correlation
function, will thus be defined in an infinite-volume state. For instance, if the state is obtained as a
limit of canonical states $\langle\cdot\rangle_{\Lambda_n}$ with parameters $\beta, \rho$, $\rho = N_n/V_n$, in a
sequence of containers $\Lambda_n$, then

$$\rho(x) = \lim_{n} \Big\langle \sum_{j=1}^{N_n} \delta(x - q_j) \Big\rangle_{\Lambda_n}$$

$$\rho(x_1, x_2, \ldots, x_p) = \lim_{n} \Big\langle \sum_{i_1,\ldots,i_p} \prod_{j=1}^{p} \delta(x_j - q_{i_j}) \Big\rangle_{\Lambda_n}$$

where the sum is over the ordered $p$-ples $(i_1, \ldots, i_p)$. Thus, the pair correlation $\rho(q, q')$
and its possible cluster property are

$$\rho(q, q') \overset{def}{=} \lim_{n} \frac{\int_{\Lambda_n} e^{-\beta U(q, q', q_1, \ldots, q_{N_n-2})}\, dq_1 \cdots dq_{N_n-2}}{(N_n - 2)!\; Z^{c}_{0}(\beta, \rho, V_n)}$$

both sides of the equations of motion, $m\ddot q_i = f_i$, by $-(1/2)\,q_i$ and summing over $i$, it follows that

$$-\frac12 \sum_{i=1}^{N} m\, q_i \cdot \ddot q_i = -\frac12 \sum_{i=1}^{N} q_i \cdot f_i \overset{def}{=} \frac12\, C(q)$$

and the quantity $C(q)$ defines the virial of the forces in the configuration $q$. Note that $C(q)$ is not
translation invariant because of the presence of the forces due to the walls.
Writing the force $f_i$ as a sum of the internal and the external forces (due to the walls), the virial $C$ can
be expressed naturally as the sum of the virial $C_{int}$ of the internal forces (translation invariant) and of
the virial $C_{ext}$ of the external forces.
By dividing both sides of the definition of the virial by $\tau$ and integrating over the time interval
$[0, \tau]$, one finds in the limit $\tau \to \infty$, that is, up to quantities relatively infinitesimal as
$\tau \to \infty$, that

$$\langle K\rangle = \tfrac12 \langle C\rangle \quad\text{and}\quad \langle C_{ext}\rangle = 3pV$$

where $p$ is the pressure and $V$ the volume. Hence

$$\langle K\rangle = \tfrac32\, pV + \tfrac12 \langle C_{int}\rangle$$
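The relation $\langle K\rangle = \frac12\langle C\rangle$ can be checked on the simplest bounded motion. The following is a minimal sketch, assuming a single one-dimensional harmonic oscillator (an illustrative choice not taken from the text; the function name and parameter values are hypothetical). With $C(q) = -q f$ and $f = -kq$, both time averages are computed along the exact orbit.

```python
import math

def time_averages(mass=1.0, k=4.0, amplitude=0.5, periods=50, steps_per_period=2000):
    """Time-average the kinetic energy K and the virial C = -q*f
    along the exact orbit q(t) = A cos(omega t) of a harmonic oscillator."""
    omega = math.sqrt(k / mass)
    total_time = periods * 2.0 * math.pi / omega
    n = periods * steps_per_period
    dt = total_time / n
    k_sum = 0.0
    c_sum = 0.0
    for step in range(n):
        t = (step + 0.5) * dt                      # midpoint rule in time
        q = amplitude * math.cos(omega * t)
        v = -amplitude * omega * math.sin(omega * t)
        f = -k * q                                  # restoring force
        k_sum += 0.5 * mass * v * v                 # kinetic energy
        c_sum += -q * f                             # virial C(q) = -q * f
    return k_sum / n, c_sum / n

avg_k, avg_c = time_averages()
print(avg_k, 0.5 * avg_c)   # the two numbers should agree
```

Here there are no walls, so the whole virial is "internal"; the pressure term $\frac32 pV$ only appears once wall forces are included.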
(see Figure 1) representing a sequence of possible macroscopic equilibrium states (the ones
corresponding to the plateau) or states with extremely long time of stability ("metastable") represented by
the curved part. This would be an isothermal Carnot cycle which, therefore, could not produce work: since
the work produced in the cycle (i.e., $\oint p\,dv$) is the signed area enclosed by the cycle, the rule just
means that the area is zero. The argument is doubtful at least because it is not clear that the intermediate
states with $p$ increasing with $v$ could be realized experimentally or could even be theoretically possible.
A striking prediction of [27], taken literally, is

and call $P_0(v)$ the ($\beta$-independent) product of $\beta$ times the pressure of the hard-core system
without any attractive tail ($P_0(v)$ is not explicitly known except if $d = 1$, in which case it is
$P_0(v)(v - b) = 1$, $b = r_0$), and let

$$a = \frac12 \int_{|q| > r_0} |\varphi_1(q)|\, dq$$

If $p(\beta, v; \gamma)$ is the pressure when $\gamma > 0$ then it can be proved that

$$p(\beta, v) \overset{def}{=} \lim_{\gamma \to 0} p(\beta, v; \gamma)$$
configuration $Q$ of particles to the left of a reference particle (located at the origin $O$, say) and a
configuration $Q'$ to the right of the particle (with $Q \cup O \cup Q'$ compatible with the hard cores) is
uniformly bounded below. Then a mathematical proof can be devised showing that the influence of
boundary conditions disappears as the boundaries recede to infinity. One also says that no long-range
order can be established in a one-dimensional case, in the sense that one loses any trace of the boundary
conditions imposed.
The analysis fails if the space dimension is $\ge 2$: in this case, even if the interaction is short-ranged,
the energy of interaction between two regions of space separated by a boundary is of the order of the
boundary area. Hence, one cannot bound above and below the probability of any two configurations in
two half-spaces by the product of the probabilities of the two configurations, each computed as if the
other was not there. This is because such a bound would be proportional to the exponential of the
surface of separation, which tends to $\infty$ when the surface grows large. This means that we cannot
consider, at least not in general, the configurations in the two half-spaces as independently distributed.
Analytically, a condition on the potential sufficient to imply that the energy between a configuration
to the left and one to the right of the origin is bounded below, if $d = 1$, is simply expressed by

$$\int_{r'}^{\infty} r\,|\varphi(r)|\, dr < +\infty \quad\text{for } r' > r_0$$

Therefore, in order to have phase transitions in $d = 1$, a potential is needed that is so long range
that it has a divergent first moment. It can be shown by counterexamples that if the latter condition
fails there can be phase transitions even in $d = 1$ systems.
The results just quoted also apply to discrete models like lattice gases or lattice spin models that
will be considered later in the article.
For more details, we refer the reader to Landau and Lifschitz (1967), Dyson (1969), Gallavotti
(1999), and Gallavotti et al. (2004).

Continuous Symmetries: "No d = 2 Crystal" Theorem

A second case in which it is possible to rule out existence of phase transitions or at least of certain
kinds of transitions arises when the system under analysis enjoys large symmetry. By symmetry is
meant a group of transformations acting on the configurations and transforming each of them into a
configuration which, at least for one boundary condition (e.g., periodic or open), has the same energy.
A symmetry is said to be continuous if the group of transformations is a continuous group. For
instance, continuous systems have translational symmetry if considered in a container $\Lambda$ with
periodic boundary conditions. Systems with too much symmetry sometimes cannot show phase
transitions. For instance, the continuous translation symmetry of a gas in a container $\Lambda$ with periodic
boundary conditions is sufficient to exclude the possibility of crystallization in dimension $d = 2$.
To discuss this, which is a prototype of a proof which can be used to infer absence of many
transitions in systems with continuous symmetries, consider the translational symmetry and a potential
satisfying, besides the usual [14] and with the symbols used in [14], the further property that
$|q|^2\,|\partial^2_{ij}\varphi(q)| < B\,|q|^{-(d+\varepsilon_0)}$, with $\varepsilon_0 > 0$, for some $B$ holds for
$r_0 < |q| \le R$. This is a very mild extra requirement (and it allows for a hard-core interaction).
Consider an "ideal crystal" on a square lattice (for simplicity) of spacing $a$, exactly fitting in its
container $\Lambda$ of side $L$ assumed with periodic boundary conditions: so that $N = (L/a)^d$ is the
number of particles and $a^{-d}$ is the density, which is supposed to be smaller than the close packing
density if the interaction $\varphi$ has a hard core. The probability distribution of the particles is rather
trivial:

$$\mu = \sum_{p} \prod_{n} \delta(q_{p(n)} - a\,n)\, \frac{dQ}{N!}$$

the sum running over the permutations $m \to p(m)$ of the sites $m \in \Lambda$, $m \in Z^d$,
$0 < m_i \le L a^{-1}$. The density at $q$ is

$$\hat\rho(q) = \sum_{n} \delta(q - a\,n) \equiv \Big\langle \sum_{j=1}^{N} \delta(q - q_j) \Big\rangle$$

and its Fourier transform is proportional to

$$\rho(k) \overset{def}{=} \frac1N \Big\langle \sum_{j} e^{-i k\cdot q_j} \Big\rangle, \qquad k = \frac{2\pi}{L}\, n,\quad n \in Z^d$$

$\rho(k)$ has value $1$ for all $k$ of the form $K = (2\pi/a)\,n$ and
$(1/N)\,O(\max_{c=1,2} |e^{i k_c a} - 1|^{-2})$ otherwise. In presence of interaction, it has to be expected
that, in a crystal state, $\rho(k)$ has peaks near the values $K$: but the value of $\rho(k)$ can depend on the
boundary conditions.
Since the system is translation invariant a crystal state defined as a state with a distribution close to $\mu$,
symmetry enjoyed by the system in finite volume and under suitable boundary conditions is called a
spontaneous symmetry breaking. It is yet another manifestation of instability with respect to changes
in boundary conditions, hence its occurrence reveals a phase transition. There is a large class of systems
for which an infrared inequality implies absence of spontaneous symmetry breaking: in most of the one-
or two-dimensional systems a continuous symmetry cannot be spontaneously broken.
The limitation to dimension $d \le 2$ is a strong limitation to the generality of the applicability of
infrared theorems to exclude phase transitions. More precisely, systems can be divided into classes
each of which has a "critical dimension" below which too much symmetry implies absence of
phase transitions (or of certain kinds of phase transitions).
It should be stressed that, at the critical dimension, the symmetry breaking is usually so weakly
forbidden that one might need astronomically large containers to destroy small effects (due to boundary
conditions or to very small fields) which break the symmetry. For example, in the crystallization just
discussed, the Fourier transform peaks are only bounded by $O(1/\sqrt{\log \varepsilon^{-1}})$. Hence, from a
practical point of view, it might still be possible to have some kind of order even in large containers.
The reader is referred to Mermin (1968), Hohenberg (1969), and Ruelle (1969).

High Temperature and Small Density

There is another class of systems in which no phase transitions take place. These are the systems with
stable and tempered interactions $\varphi$ (e.g., those satisfying [14]) in the high-temperature and
low-density region. The property is obtained by showing that the equation of state is analytic in the
variables $(\beta, \rho)$ near the origin $(0, 0)$.
A simple algorithm (Mayer's series) yields the coefficients of the virial series

$$\beta p(\beta, \rho) = \rho + \sum_{k=2}^{\infty} c_k(\beta)\, \rho^k$$

It has the drawback that the $k$th order coefficient $c_k(\beta)$ is expressed as a sum of many terms (a
number growing more than exponentially fast in the order $k$) and it is not so easy (but possible) to show
combinatorially that their sum is bounded exponentially in $k$ if $\beta$ is small enough. A more efficient
approach leads quickly to the desired solution.
Denoting $\Phi(q_1, \ldots, q_n) \overset{def}{=} \sum_{i<j} \varphi(q_i - q_j)$, consider the (spatial or
configurational) correlation functions defined, in the grand canonical distribution with parameters
$\beta, \lambda$ (and empty boundary conditions), by

$$\rho_\Lambda(q_1, \ldots, q_n) \overset{def}{=} \frac{1}{Z^{gc}(\beta, \lambda, V)} \sum_{m=0}^{\infty} z^{n+m} \int_\Lambda e^{-\beta\Phi(q_1, \ldots, q_n, y_1, \ldots, y_m)}\, \frac{dy_1 \cdots dy_m}{m!} \qquad [32]$$

This is the probability density for finding particles with any momentum in the volume element
$dq_1 \cdots dq_n$ (irrespective of where other particles are), and
$z = e^{\beta\lambda}\,(\sqrt{2\pi m \beta^{-1} h^{-2}})^{d}$ accounts for the integration over the momenta variables
and is called the activity: it has the dimension of a density (cf. [23]).
Assuming that the potential has a hard core (for simplicity) of radius $R$, the interaction energy
$\Phi_{q_1}(q_2, \ldots, q_n)$ of a particle at $q_1$ with any number of other particles at $q_2, \ldots, q_n$
with $|q_i - q_j| > R$ is bounded below by $-B$ for some $B \ge 0$ (related but not equal to the $B$ in [14]).
The functions $\rho_\Lambda$ will be regarded as a sequence of functions of one, two, ... particle positions:
$\rho_\Lambda = \{\rho_\Lambda(q_1, \ldots, q_n)\}_{n=1}^{\infty}$, vanishing for $q_j \notin \Lambda$. Then, one checks
that

$$\rho_\Lambda(q_1, \ldots, q_n) = z\,\delta_{n,1}\,\chi_\Lambda(q_1) + z\,K\rho_\Lambda(q_1, \ldots, q_n) \qquad [33a]$$

with

$$K\rho_\Lambda(q_1, \ldots, q_n) \overset{def}{=} e^{-\beta\Phi_{q_1}(q_2, \ldots, q_n)} \Big( \rho_\Lambda(q_2, \ldots, q_n)\,\delta_{n>1} + \sum_{s=1}^{\infty} \int_\Lambda \frac{dy_1 \cdots dy_s}{s!} \prod_{k=1}^{s} \big(e^{-\beta\varphi(q_1 - y_k)} - 1\big)\, \rho_\Lambda(q_2, \ldots, q_n, y_1, \ldots, y_s) \Big) \qquad [33b]$$

where $\delta_{n,1}$, $\delta_{n>1}$ are Kronecker deltas and $\chi_\Lambda(q)$ is the indicator function of $\Lambda$.
Equation [33] is called the Kirkwood-Salzburg equation for the family of correlation functions in
$\Lambda$. The kernel $K$ of the equations is independent of $\Lambda$, but the domain of integration is $\Lambda$.
Calling $\alpha_\Lambda$ the sequence of functions $\alpha_\Lambda(q_1, \ldots, q_n) \equiv 0$ if $n \ne 1$ and
$\alpha_\Lambda(q) = \chi_\Lambda(q)$, a recursive expansion arises, namely

$$\rho_\Lambda = z\,\alpha_\Lambda + z^2 K\alpha_\Lambda + z^3 K^2\alpha_\Lambda + z^4 K^3\alpha_\Lambda + \cdots \qquad [34]$$

It gives the correlation functions, provided the series converges. The inequality

$$|K^p\alpha_\Lambda(q_1, \ldots, q_n)| \le e^{(2\beta B + 1)p} \Big( \int |e^{-\beta\varphi(q)} - 1|\, dq \Big)^{p} \overset{def}{=} e^{(2\beta B + 1)p}\, r(\beta)^{3p} \qquad [35]$$

shows that the series [34], called Mayer's series, converges if $|z| < e^{-(2\beta B + 1)}\, r(\beta)^{-3}$.
Convergence is uniform (as $\Lambda \to \infty$) and $(K^p\alpha_\Lambda)(q_1, \ldots, q_n)$ tends to a limit as
$V \to \infty$ at fixed $q_1, \ldots, q_n$ and the limit is simply $(K^p\alpha)(q_1, \ldots, q_n)$, if
$\alpha(q_1, \ldots, q_n) \equiv 0$ for $n \ne 1$, and $\alpha(q_1) \equiv 1$. This is because the kernel $K$ contains
the factors $(e^{-\beta\varphi(q_1 - y)} - 1)$ which decay rapidly or, if $\varphi$ has finite range, will
eventually even vanish. It is also clear that $(K^p\alpha)(q_1, \ldots, q_n)$ is translation invariant.
Hence, if $|z|\, e^{2\beta B + 1}\, r(\beta)^3 < 1$, the limits, as $\Lambda \to \infty$, of the correlation functions
exist and can be computed by a convergent power series in $z$; the correlation functions will be
translation invariant (in the thermodynamic limit).
In particular, the one-point correlation function $\rho = \rho(q)$ is $\rho = z\,(1 + O(z\, r(\beta)^3))$, which,
to lowest order in $z$, just shows that activity and density essentially coincide when they are small
enough. Furthermore, $\beta p_\Lambda = (1/V) \log Z^{gc}(\beta, \lambda, V)$ is such that

$$z\,\partial_z\, \beta p_\Lambda = \frac1V \int_\Lambda \rho_\Lambda(q)\, dq$$

therefore their configurations do not contain momentum variables.
The interaction energy is just the potential energy, and ensembles are defined as collections of
probability distributions on the position coordinates of the particle configurations. Usually, the
potential is a pair potential decaying fast at $\infty$ and, often, with a hard core forbidding double or
higher occupancy of the same lattice site. For instance, the lattice gas with potential $\varphi$, in a cubic
box $\Lambda$ with $|\Lambda| = V = L^d$ sites of a square lattice with mesh $a > 0$, is defined by the potential
energy attributed to the configuration $X$ of occupied distinct sites, i.e., subsets $X \subset \Lambda$:

$$H(X) = \sum_{x, y \in X} \varphi(x - y) \qquad [37]$$
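Returning to the convergence condition $|z| < e^{-(2\beta B + 1)}\, r(\beta)^{-3}$ for Mayer's series, the bound is easy to evaluate for a concrete potential. The following is a minimal sketch, assuming a pure hard-core potential in $d = 3$ (so that one may take $B = 0$; this choice, the function names, and the quadrature parameters are illustrative assumptions). It computes $r(\beta)^3 = \int |e^{-\beta\varphi(q)} - 1|\, dq$ by a radial midpoint rule and then the resulting bound on the activity.

```python
import math

def r_cubed(beta, potential, r_max, n=20000):
    """Integral over R^3 of |exp(-beta*phi(|q|)) - 1| dq for a radial
    potential, by the midpoint rule on [0, r_max]."""
    dr = r_max / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        integrand = abs(math.exp(-beta * potential(r)) - 1.0)
        total += integrand * 4.0 * math.pi * r * r * dr
    return total

def activity_bound(beta, lower_bound_b, potential, r_max):
    """Right-hand side of |z| < exp(-(2*beta*B + 1)) / r(beta)^3."""
    return math.exp(-(2.0 * beta * lower_bound_b + 1.0)) / r_cubed(beta, potential, r_max)

def hard_core(r, radius=1.0):
    # Infinite repulsion inside the core, no interaction outside (assumed model).
    return math.inf if r < radius else 0.0

r3 = r_cubed(1.0, hard_core, r_max=2.0)
z_max = activity_bound(1.0, 0.0, hard_core, r_max=2.0)
print(r3, z_max)
```

For the hard core the integral is just the volume of the excluded sphere, $4\pi/3$ for unit radius, so the bound is independent of $\beta$; an attractive tail would make both $B$ and $r(\beta)$ temperature dependent.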
The canonical and grand canonical ensembles in the box $\Lambda$ with respective parameters $(\beta, m)$ or
$(\beta, h)$ will be defined as the probability distributions on the spin configurations
$\sigma = \{\sigma_x\}_{x\in\Lambda}$ with $\sum_{x\in\Lambda} \sigma_x = M = mV$ or without constraint on $M$,
respectively; hence,

$$p_{\beta,m}(\sigma) = \frac{\exp\big(-\beta \sum_{(x,y)} \varphi(x - y)\,\sigma_x\sigma_y\big)}{Z^{c}_{s}(\beta, M, \Lambda)}$$

$$p_{\beta,h}(\sigma) = \frac{\exp\big(\beta h \sum_{x} \sigma_x - \beta \sum_{(x,y)} \varphi(x - y)\,\sigma_x\sigma_y\big)}{Z^{gc}_{s}(\beta, h, \Lambda)} \qquad [40]$$

where the denominators are normalization factors again called, respectively, the canonical and grand
canonical partition functions. As in the study of the previous continuous systems, canonical and grand
canonical ensembles with "external fixed particle configurations" can be defined together with the
corresponding ensembles with "external fixed spin configurations"; the subscript $s$ stands for spins.
For each configuration $X \subset \Lambda$ of a lattice gas, let $\{n_x\}$ be $n_x = 1$ if $x \in X$ and $n_x = 0$ if
$x \notin X$. Then the transformation $\sigma_x = 2n_x - 1$ establishes a correspondence between lattice gas
and spin distributions. In the correspondence, the potential $\varphi(x - y)$ of the lattice gas generates a
potential $(1/4)\,\varphi(x - y)$ for the corresponding spin system and the chemical potential $\lambda$ for the
lattice gas is associated with a magnetic field $h$ for the spin system with
$h = (1/2)\big(\lambda - \sum_{x \ne 0} \varphi(x)\big)$.
The correspondence between boundary conditions is natural: for instance, a boundary condition for the
lattice gas in which all external sites are occupied becomes a boundary condition in which external
sites contain a spin $+$. The close relation between lattice gas and spin systems permits switching from
one to the other with little discussion.
In the case of spin systems, empty boundary conditions are often considered (no spins outside $\Lambda$).
In lattice gases and spin systems (as well as in continuum systems), often periodic and semiperiodic
boundary conditions are considered (i.e., periodic in one or more directions and with empty or fixed
external particles or spins in the others).
Thermodynamic limits for the partition functions

$$-\beta f(\beta, v) = \lim_{\Lambda\to\infty,\ V/N = v} \frac1N \log Z^{c}_{p}(\beta, N, \Lambda)$$

$$\beta p(\beta, \lambda) = \lim_{\Lambda\to\infty} \frac1V \log Z^{gc}_{p}(\beta, \lambda, \Lambda)$$

$$-\beta g(\beta, m) = \lim_{\Lambda\to\infty,\ M/V \to m} \frac1V \log Z^{c}_{s}(\beta, M, \Lambda) \qquad [41]$$

$$\beta f(\beta, h) = \lim_{\Lambda\to\infty} \frac1V \log Z^{gc}_{s}(\beta, h, \Lambda)$$

can be shown to exist by a method similar to the one discussed in Appendix 2. They have convexity
and continuity properties as in the cases of the continuum systems. In the case of a lattice gas, the
$f, p$ functions are still interpreted as free energy and pressure, respectively. In the case of spin,
$f(\beta, h)$ has the interpretation of magnetic free energy, while $g(\beta, m)$ does not have a special name
in the thermodynamics of magnetic systems. As in the continuum systems, it is occasionally useful to
define infinite-volume equilibrium states:

Definition An infinite-volume state with parameters $(\beta, h)$ or $(\beta, m)$ is a collection of average
values $F \to \langle F\rangle$ obtained, respectively, as limits of finite-volume averages
$\langle F\rangle_{\Lambda_n}$ defined from canonical or grand canonical distributions in $\Lambda_n$ with fixed
parameters $(\beta, h)$ or $(\beta, m)$, or $(u, v)$ and with general boundary condition of fixed external
spins or empty sites, on sequences $\Lambda_n \to \infty$ for which such limits exist simultaneously for all
local observables $F$.

This is taken verbatim from the definition in the section "Phase transitions and boundary conditions".
In this way, it makes sense to define the spin correlation functions for $X = (x_1, \ldots, x_n)$ as
$\langle\sigma_X\rangle$ if $\sigma_X = \prod_j \sigma_{x_j}$. For instance, we shall call
$\rho(x_1, x_2) \overset{def}{=} \langle\sigma_{x_1}\sigma_{x_2}\rangle$ and a pure phase can be defined as an
infinite-volume state such that

$$\langle\sigma_X\,\sigma_{Y+x}\rangle - \langle\sigma_X\rangle\,\langle\sigma_{Y+x}\rangle \xrightarrow[x\to\infty]{} 0 \qquad [42]$$

Again, for more details, we refer the reader to Ruelle (1969) and Gallavotti (1969).

Thermodynamic Limits and Inequalities

An interesting property of lattice systems is that it is possible to study delicate questions like the
existence of infinite-volume states in some (moderate) generality. A typical tool is the use of
inequalities. As the simplest example of a vast class of inequalities, consider the ferromagnetic Ising
model with some finite (but arbitrary) range interaction $J_{xy} \ge 0$ in a field $h_x \ge 0$: $J, h$ may even
be not translationally invariant. Then the average of
$\sigma_X \overset{def}{=} \sigma_{x_1}\sigma_{x_2}\cdots\sigma_{x_n}$, $X = (x_1, \ldots, x_n)$, in a state with "empty
boundary conditions" (i.e., no external spins) satisfies the inequalities

$$\langle\sigma_X\rangle,\ \ \partial_{h_x}\langle\sigma_X\rangle,\ \ \partial_{J_{xy}}\langle\sigma_X\rangle \ \ge 0, \qquad X = (x_1, \ldots, x_n)$$

More generally, let $H(\sigma)$ in [39] be replaced by $H(\sigma) = -\sum_X J_X\,\sigma_X$ with $J_X \ge 0$ and $X$
can be any finite set; then, if $Y = (y_1, \ldots, y_n)$, $X = (x_1, \ldots, x_n)$, the following Griffiths
inequalities hold:

$$\langle\sigma_X\rangle \ge 0, \qquad \partial_{J_Y}\langle\sigma_X\rangle \equiv \beta\big(\langle\sigma_X\sigma_Y\rangle - \langle\sigma_X\rangle\langle\sigma_Y\rangle\big) \ge 0 \qquad [43]$$
which some of the opposite sides of $\Lambda$ are identified while $+$ or $-$ conditions are assigned on
the remaining sides: call these cylindrical or semiperiodic boundary conditions.
A new description of the spin configurations is useful: given $\sigma$, draw a unit segment perpendicular
to the center of each bond $b$ having opposite spins at its extremes. An example of this construction is
provided by Figure 2 for the boundary condition $(+, -)$.
The set of segments can be grouped into lines separating regions where the spins are positive from
regions where they are negative. If the boundary condition is $(+)$ or $(-)$, the lines form closed polygons,
whereas, if the condition is $(+, -)$, there is also a single polygon $\lambda_1$ which is not closed (as in
Figure 2). If the boundary condition is periodic or cylindrical, all polygons are closed but some may go
around $\Lambda$.
The polygons are also called contours and the length of a polygon $\gamma$ will be denoted $|\gamma|$.
The correspondence $(\gamma_1, \gamma_2, \ldots, \gamma_n, \lambda_1) \leftrightarrow \sigma$, for the boundary condition
$(+, -)$ or, for the boundary condition $(+)$ (or $(-)$), $\sigma \leftrightarrow (\gamma_1, \ldots, \gamma_n)$ is
one-to-one and, if $h = 0$, the energy $H_\Lambda(\sigma)$ of a configuration is higher than
$-J\cdot$(number of bonds in $\Lambda$) by an amount $2J(|\lambda_1| + \sum_i |\gamma_i|)$ or, respectively,
$2J \sum_i |\gamma_i|$. The grand canonical probability of each spin configuration is therefore
proportional, if $h = 0$, respectively, to

$$e^{-2\beta J(|\lambda_1| + \sum_i |\gamma_i|)} \quad\text{or}\quad e^{-2\beta J \sum_i |\gamma_i|} \qquad [45]$$

and the up-down symmetry is clearly reflected by [45].
The average $\langle\sigma_x\rangle_{\Lambda,+}$ of $\sigma_x$ with $+$ boundary conditions is given by
$\langle\sigma_x\rangle_{\Lambda,+} = 1 - 2P_{\Lambda,+}(-)$, where $P_{\Lambda,+}(-)$ is the probability that the spin
$\sigma_x$ is $-1$. If the site $x$ is occupied by a negative spin then the point $x$ is inside some contour
$\gamma$ associated with the spin configuration $\sigma$ under consideration. Hence, if $\rho(\gamma)$ is the
probability that a given contour belongs to the set of contours describing a configuration $\sigma$, it is
$P_{\Lambda,+}(-) \le \sum_{\gamma o x} \rho(\gamma)$ where $\gamma o x$ means that $\gamma$ surrounds $x$.
If $\Gamma = (\gamma_1, \ldots, \gamma_n)$ is a spin configuration and if the symbol $\Gamma\ \mathrm{comp}\ \gamma$
means that the contour $\gamma$ is disjoint from $\gamma_1, \ldots, \gamma_n$ (i.e., $\{\gamma \cup \Gamma\}$ is a new
spin configuration), then

$$\rho(\gamma) = \frac{\sum_{\Gamma \ni \gamma} e^{-2\beta J \sum_{\gamma' \in \Gamma} |\gamma'|}}{\sum_{\Gamma} e^{-2\beta J \sum_{\gamma' \in \Gamma} |\gamma'|}} \equiv e^{-2\beta J|\gamma|}\, \frac{\sum_{\Gamma\,\mathrm{comp}\,\gamma} e^{-2\beta J \sum_{\gamma' \in \Gamma} |\gamma'|}}{\sum_{\Gamma} e^{-2\beta J \sum_{\gamma' \in \Gamma} |\gamma'|}} \le e^{-2\beta J|\gamma|} \qquad [46]$$

because the last ratio in [46] does not exceed 1. Note that there are at most $3^p$ different shapes of
$\gamma$ with perimeter $p$ and at most $p^2$ congruent $\gamma$'s containing $x$; therefore, the probability
that the spin at $x$ is $-$ when the boundary condition is $+$ satisfies the inequality

$$P_{\Lambda,+}(-) \le \sum_{p=4}^{\infty} p^2\, 3^p\, e^{-2\beta J p} \xrightarrow[\beta\to\infty]{} 0$$

This probability can be made arbitrarily small so that $\langle\sigma_x\rangle_{\Lambda,+}$ is estimated by a
quantity which is as close to 1 as desired provided $\beta$ is large enough and the closeness of
$\langle\sigma_x\rangle_{\Lambda,+}$ to 1 is estimated by a quantity which is both $x$ and $\Lambda$ independent.
A similar argument for the $(-)$-boundary condition, or the remark that for $h = 0$ it is
$\langle\sigma_x\rangle_{\Lambda,-} = -\langle\sigma_x\rangle_{\Lambda,+}$, leads to conclude that, at large $\beta$,
$\langle\sigma_x\rangle_{\Lambda,-} \ne \langle\sigma_x\rangle_{\Lambda,+}$ and the difference between the two quantities
is positive uniformly in $\Lambda$. This is the proof (Peierls' theorem) of the fact that there is, if $\beta$
is large, a strong instability of the magnetization with respect to the boundary conditions, i.e., the
nearest-neighbor Ising model in dimension 2 (or greater, by an identical argument) has a phase
transition. If the dimension is 1, the argument clearly fails and no phase transition occurs (see the
section "Absence of phase transitions: d = 1").
For more details, see Gallavotti (1999).

Finite-Volume Effects

The description in the last section of the phase transition in the nearest-neighbor Ising model can be
made more precise both from physical and mathematical points of view giving insights into the nature
of the phase transitions. Assume that the boundary condition is the $(+)$-boundary condition and
describe a spin configuration $\sigma$ by means of the associated closed disjoint polygons
$(\gamma_1, \ldots, \gamma_n)$. Attribute to $\sigma = (\gamma_1, \ldots, \gamma_n)$ a probability proportional to [45].
Then the following Minlos-Sinai's theorem holds:

Theorem If $\beta$ is large enough there exist $C > 0$, $\rho(\gamma) > 0$ with
$\rho(\gamma) \le e^{-2\beta J|\gamma|}$ and such that a spin configuration $\sigma$ randomly chosen out of the grand
canonical distribution with $+$ boundary conditions and $h = 0$ will contain, with probability approaching
1 as $\Lambda \to \infty$, a number $K_{(\gamma)}(\sigma)$ of contours congruent to $\gamma$ such that

$$\big|K_{(\gamma)}(\sigma) - \rho(\gamma)\,|\Lambda|\big| \le C\,\sqrt{|\Lambda|}\; e^{-\beta J|\gamma|} \qquad [47]$$

and this relation holds simultaneously for all $\gamma$'s.
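The Peierls estimate $P_{\Lambda,+}(-) \le \sum_{p \ge 4} p^2\, 3^p\, e^{-2\beta J p}$ quoted above is a plain numerical series, and it is instructive to see how quickly it becomes small. The following is a minimal sketch, assuming the sum is truncated at a large cutoff (the function name and the cutoff are illustrative assumptions); it returns infinity when $3e^{-2\beta J} \ge 1$, where the series diverges and the estimate carries no information.

```python
import math

def peierls_bound(beta_j, p_max=2000):
    """Truncated sum over p >= 4 of p^2 * 3^p * exp(-2*beta*J*p),
    written in terms of x = 3 * exp(-2*beta*J)."""
    x = 3.0 * math.exp(-2.0 * beta_j)
    if x >= 1.0:
        return math.inf          # geometric-type series diverges
    return sum(p * p * x ** p for p in range(4, p_max + 1))

for beta_j in (0.5, 1.0, 1.5, 2.0):
    print(beta_j, peierls_bound(beta_j))
```

The bound only drops below 1/2, which is what is needed to conclude $\langle\sigma_x\rangle_{\Lambda,+} > 0$, at fairly large $\beta J$; the estimate is crude and is not meant to locate the transition temperature.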
Thus, there are very few contours (and the larger they are the smaller is, in absolute and relative
value, their number): a typical spin configuration in the grand canonical ensemble with $(+)$-boundary
conditions is such that the large majority of the spins is "positive" and, in the "sea" of positive spins,
there are a few negative spins distributed in small and rare regions (their number, however, is still of
order of $|\Lambda|$).
Another consequence of the analysis in the last section concerns the approximate equation of
state near the phase transition region at low temperatures and finite $\Lambda$. If $\Lambda$ is finite, the graph
of $h$ versus $m_\Lambda(\beta, h)$ will have a rather different behavior depending on the possible boundary
conditions. For example, if the boundary condition is one of the fixed-sign conditions, $(+)$ or $(-)$, one
gets the results depicted in Figure 3a and 3b, where $m^*(\beta)$ denotes the spontaneous magnetization
(i.e., $m^*(\beta) \overset{def}{=} \lim_{h\to 0^+} \lim_{\Lambda\to\infty} m_\Lambda(\beta, h)$).
With periodic or empty boundary conditions, the diagram changes as in Figure 4. The thermodynamic
limit $m(\beta, h) = \lim_{\Lambda\to\infty} m_\Lambda(\beta, h)$ exists for all $h \ne 0$ and the resulting graph is in
Figure 4b, which shows that at $h = 0$ the limit is discontinuous. It can be proved, if $\beta$ is large
enough, that $\infty > \lim_{h\to 0^+} \partial_h m(\beta, h) = \chi(\beta) > 0$ (i.e., the angle between the vertical
part of the graph and the rest is sharp).
Furthermore, it can be proved that $m(\beta, h)$ is analytic in $h$ for $h \ne 0$. If $\beta$ is small enough,
analyticity holds at all $h$. For $\beta$ large, the function $f(\beta, h)$ has an essential singularity at
$h = 0$: a result that can be interpreted as excluding a naive theory of metastability as a description of
states governed by an equation of state obtained from an analytic continuation to negative values of
$h$ of $f(\beta, h)$.
The above considerations and results further clarify the meaning of a phase transition for a
finite system. For more details, we refer the reader to Gallavotti (1999) and Friedli and Pfister
(2004).

Beyond Low Temperatures (Ferromagnetic Ising Model)

A limitation of the results discussed above is the condition of low temperature ("$\beta$ large enough").
A natural problem is to go beyond the low-temperature region and to describe fully the
phenomena in the region where boundary condition instability takes place and first develops. A number
of interesting partial results are known, which considerably improve the picture emerging from
the previous analysis. A striking list, but far from exhaustive, of such results follows and focuses on
the properties of ferromagnetic Ising spin systems. The reason for restricting to such cases is that they
are simple enough to allow a rather fine analysis, which sheds considerable light on the structure of
statistical mechanics suggesting precise formulation
Figure 3 The $h$ vs $m_\Lambda(\beta, h)$ graphs for $\Lambda$ finite and the two fixed-sign boundary conditions, panels (a) and (b). (Axes: $h$ horizontal, $m_\Lambda(\beta, h)$ vertical; marked values $1$, $m^*(\beta)$, and $O(|\Lambda|^{-1/2})$.)
Figure 4 (a) The $h$ vs $m_\Lambda(\beta, h)$ graph for periodic or empty boundary conditions. (b) The discontinuity (at $h = 0$) of the thermodynamic limit. (Axes: $h$ horizontal, $m_\Lambda(\beta, h)$ vertical; marked values $1$, $m^*(\beta)$, and $O(|\Lambda|^{-1/2})$.)
of the problems that it would be desirable to understand in more general systems.

1. Let $z \overset{\text{def}}{=} e^{\beta h}$ and consider the product of $z^V$ ($V$ is the number of sites $|\Lambda|$ of $\Lambda$) times the partition function with periodic or perfect-wall boundary conditions and with finite-range ferromagnetic interaction, not necessarily nearest-neighbor: a polynomial in $z$ (of degree $2V$) is thus obtained. Its zeros lie on the unit circle $|z| = 1$: this is Lee-Yang's theorem. It implies that the only singularities of $f(\beta, h)$ in the region $0 < \beta < \infty$, $-\infty < h < +\infty$ can be found at $h = 0$.

A singularity can appear only if the point $z = 1$ is an accumulation point of the limiting distribution (as $\Lambda \to \infty$) of the zeros on the unit circle: if the zeros are $z_1, \ldots, z_{2V}$ then

$$\frac{1}{V} \log z^V Z(\Lambda, \beta, h, \text{periodic}) = 2\beta J + \beta h + \frac{1}{V} \sum_{i=1}^{2V} \log(z - z_i)$$

and if

$$V^{-1} \cdot \big(\text{number of zeros of the form } z_j = e^{i\vartheta_j},\ \vartheta \le \vartheta_j \le \vartheta + d\vartheta\big) \xrightarrow[\Lambda \to \infty]{} \frac{d\rho_\beta(\vartheta)}{2\pi}$$

it is

$$\beta f(\beta, h) = 2\beta J + \frac{1}{2\pi} \int_{-\pi}^{\pi} \log(z - e^{i\vartheta})\, d\rho_\beta(\vartheta) \qquad [48]$$

The existence of the measure $d\rho_\beta(\vartheta)$ follows from the existence of the thermodynamic limit: but $d\rho_\beta(\vartheta)$ is not necessarily $d\vartheta$-continuous, i.e., not necessarily proportional to $d\vartheta$.

2. It can be shown that, with not necessarily a nearest-neighbor interaction, the zeros of the partition function do not move too much under small perturbations of the potential even if one perturbs the energy (at perfect-wall or periodic boundary conditions) into

$$H'_\Lambda(\sigma) = H_\Lambda(\sigma) + \delta H_\Lambda(\sigma), \qquad \delta H_\Lambda(\sigma) = \sum_{X \subset \Lambda} J'(X)\, \sigma_X \qquad [49]$$

where $J'(X)$ is very general and defined on subsets $X = (x_1, \ldots, x_k) \subset \Lambda$ such that the quantity $\|J'\| = \sup_{y \in Z^d} \sum_{X \ni y} |J'(X)|$ is small enough. More precisely, with a ferromagnetic pair potential $J$ fixed, suppose that one knows that, when $J' = 0$, the partition function zeros in the variable $z = e^{\beta h}$ lie in a certain closed set $N$ (of the unit circle) in the $z$-plane. Then, if $J' \neq 0$, they lie in a closed set $N^1$, $\Lambda$-independent and contained in a neighborhood of $N$ of width shrinking to 0 when $\|J'\| \to 0$. This allows one to establish various relations between analyticity properties and boundary condition instability as described in (3) below.

3. In the ferromagnetic Ising model, with not necessarily a nearest-neighbor interaction, one says that there is a gap around 0 if $d\rho_\beta(\vartheta) = 0$ near $\vartheta = 0$. It can be shown that if $\beta$ is small enough there is a gap for all $h$ of width uniform in $h$.

4. Another question is whether the boundary condition instability is always revealed by the one-spin correlation function (i.e., by the magnetization) or whether it might be shown only by some correlation functions of higher order. It can be proved that no boundary condition instability occurs for $h \neq 0$; at $h = 0$ it is possible only if

$$\lim_{h \to 0^-} m(\beta, h) \neq \lim_{h \to 0^+} m(\beta, h) \qquad [50]$$

5. A consequence of the Griffiths inequalities (cf. the section "Thermodynamic limits and inequalities") is that if [50] is true for a given $\beta_0$ then it is true for all $\beta > \beta_0$. Therefore, item (4) leads to a natural definition of the critical temperature $T_c$ as the least upper bound of the $T$'s such that [50] holds ($k_B T = \beta^{-1}$).

6. If $d = 2$ the free energy of the nearest-neighbor ferromagnetic Ising model has a singularity at $\beta_c$ and the value of $\beta_c$ is known exactly from the exact solutions of the model: $m(\beta, 0^+) \overset{\text{def}}{=} m^*(\beta) = (1 - \sinh^{-4} 2\beta J)^{1/8}$. The location and nature of the singularities of $f(\beta, 0)$ as a function of $\beta$ remains an open question for $d = 3$. In particular, the question whether there is a singularity of $f(\beta, 0)$ at $\beta = \beta_c$ is open.

7. For $\beta > \beta_c$ there is instability with respect to boundary conditions (see (6) above) and a natural question is: how many pure phases can exist in the ferromagnetic Ising model? (cf. the section "Phase transitions and boundary conditions", eqn [22]). Intuition suggests that there should be only two phases: the positively magnetized and the negatively magnetized ones.

One has to distinguish between translation-invariant pure phases and non-translation-invariant ones. It can be proved that, in the case of the two-dimensional nearest-neighbor ferromagnetic Ising models, all infinite-volume states (cf. the section "Lattice models") are translationally invariant. Furthermore, they can be obtained by
considering just the two boundary conditions $+$ and $-$: the latter states are also pure states for models with non-nearest-neighbor ferromagnetic interaction. The solution of this problem has led to the introduction of many new ideas and techniques in statistical mechanics and probability theory.

8. In any dimension $d \ge 2$, for $\beta$ large enough, it can be proved that the nearest-neighbor Ising model has only two translation-invariant phases. If the dimension is $\ge 3$ and $\beta$ is large, the $+$ and $-$ phases exhaust the set of translation-invariant pure phases but there exist non-translation-invariant phases. For $\beta$ close to $\beta_c$, however, the question is much more difficult.

For more details, see Onsager (1944), Lee and Yang (1952), Ruelle (1971), Sinai (1991), Gallavotti (1999), Aizenman (1980), Higuchi (1981), and Friedli and Pfister (2004).

Geometry of Phase Coexistence

Intuition about the phenomena connected with the classical phase transitions is usually based on the properties of the liquid-gas phase transition; this transition is usually experimentally investigated in situations in which the total number of particles is fixed (canonical ensemble) and in presence of an external field (gravity).

The importance of such experimental conditions is obvious; the external field produces a nontranslationally invariant situation and the corresponding separation of the two phases. The fact that the number of particles is fixed determines, on the other hand, the fraction of volume occupied by each of the two phases.

Once more, consider the nearest-neighbor ferromagnetic Ising model: the results available for it can be used to obtain a clear picture of the solution to problems that one would like to solve but which in most other models are intractable with present-day techniques.

It will be convenient to discuss phase coexistence in the canonical ensemble distributions on configurations of fixed total magnetization $M = mV$ (see the section "Lattice models"; [40]). Let $\beta$ be large enough to be in the two-phase region and, for a fixed $\alpha \in (0, 1)$, let

$$m = \alpha\, m^*(\beta) - (1 - \alpha)\, m^*(\beta) = -(1 - 2\alpha)\, m^*(\beta) \qquad [51]$$

that is, $m$ is in the vertical part of the diagram $m = m(\beta, h)$ at fixed $\beta$ (see Figure 4).

Fixing $m$ as in [51] does not yet determine the separation of the phases in two different regions; for this effect, it will be necessary to introduce some external cause favoring the occupation of a part of the volume by a single phase. Such an asymmetry can be obtained in at least two ways: through a weak uniform external field (in complete analogy with the gravitational field in the liquid-vapor transition) or through an asymmetric field acting only on boundary spins. The latter should have the same qualitative effect as the former, because in a phase transition region a boundary perturbation produces volume effects (see sections "Phase transitions and inequalities" and "Symmetry-breaking phase transitions").

From a mathematical point of view, it is simpler to use a boundary asymmetry to produce phase separations and the simplest geometry is obtained by considering $++$-cylindrical or $+-$-cylindrical boundary conditions: this means $++$ or $+-$ boundary conditions periodic in one direction (e.g., in Figure 2 imagine the right and left boundary identified after removing the boundary spins on them).

Spins adjacent to the bases of $\Lambda$ act as symmetry-breaking external fields. The $++$-cylindrical boundary condition should favor the formation inside $\Lambda$ of the positively magnetized phase; therefore, it will be natural to consider, in the canonical distribution, this boundary condition only when the total magnetization is fixed to be the spontaneous magnetization $m^*(\beta)$.

On the other hand, the $+-$-boundary condition favors the separation of phases (positively magnetized phase near the top of $\Lambda$ and negatively magnetized phase near the bottom). Therefore, it will be natural to consider the latter boundary condition in the case of a canonical distribution with magnetization $m = -(1 - 2\alpha)\, m^*(\beta)$ with $0 < \alpha < 1$ ([51]). In the latter case, the positive phase can be expected to adhere to the top of $\Lambda$ and to extend, in some sense to be discussed, up to a distance $O(L)$ from it; and then to change into the negatively magnetized pure phase.

To make the phenomenological description precise, consider the spin configurations $\sigma$ through the associated sets of disjoint polygons (cf. the section "Symmetry-breaking phase transitions"). Fix the boundary conditions to be $++$ or $+-$-cylindrical boundary conditions and note that polygons associated with a spin configuration $\sigma$ are all closed and of two types: the ones of the first type, denoted $\gamma_1, \ldots, \gamma_n$, are polygons which do not encircle $\Lambda$; the second type of polygons, denoted by the symbols $\lambda_j$, are the ones which wind up, at least once, around $\Lambda$.

So, a spin configuration $\sigma$ will be described by a set of polygons; the statistical weight of a configuration $\sigma = (\gamma_1, \ldots, \gamma_n, \lambda_1, \ldots, \lambda_h)$ is (cf. [45]):

$$e^{-2\beta J \left( \sum_i |\gamma_i| + \sum_j |\lambda_j| \right)} \qquad [52]$$
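As a small numerical illustration of the weight [52], the following Python sketch evaluates the unnormalized weight of a contour configuration from the lengths of its polygons. The function name `contour_weight`, the representation of a configuration as two lists of polygon lengths, and the choice of units with $J = 1$ in the usage lines are assumptions made here for illustration; they do not come from the text.

```python
import math


def contour_weight(beta, J, gamma_lengths, lambda_lengths):
    """Unnormalized statistical weight of a contour configuration, as in [52].

    gamma_lengths  -- lengths |gamma_i| of the polygons that do not wind around the cylinder
    lambda_lengths -- lengths |lambda_j| of the polygons that wind around the cylinder
    """
    total_length = sum(gamma_lengths) + sum(lambda_lengths)
    return math.exp(-2.0 * beta * J * total_length)


if __name__ == "__main__":
    beta, J, L = 1.0, 1.0, 10
    # A single straight winding contour of length L ...
    straight = contour_weight(beta, J, [], [L])
    # ... compared with the same contour plus one extra closed polygon of length 4.
    with_small_polygon = contour_weight(beta, J, [4], [L])
    # The ratio depends only on the added length: exp(-2 * beta * J * 4).
    print(with_small_polygon / straight)
```

The ratio printed at the end shows why, for $\beta$ large, configurations with few and short polygons dominate: each added unit of contour length costs a factor $e^{-2\beta J}$.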
The reason why the contours that go around the cylinder are denoted by $\lambda$ (rather than by $\gamma$) is that they look like open contours (see the section "Symmetry-breaking phase transitions") if one forgets that the opposite sides of $\Lambda$ have to be identified. In the case of the $+-$-boundary conditions the number of polygons of $\lambda$-type must be odd (hence $\neq 0$), while for the $++$-boundary condition the number of $\lambda$-type polygons must be even (hence it could be 0).

For more details, the reader is referred to Sinai (1991) and Gallavotti (1999).

Separation and Coexistence of Phases

In the context of the geometric description of the spin configuration in the last section, consider the canonical distributions with $++$-cylindrical or the $+-$-cylindrical boundary conditions and zero field: they will be denoted briefly as $\mu^{++}_{\Lambda,\beta}$, $\mu^{+-}_{\Lambda,\beta}$, respectively. The following theorem (Minlos-Sinai's theorem) provided the foundations of the microscopic theory of coexistence: it is formulated in dimension $d = 2$ but, modulo obvious changes, it holds for $d \ge 2$.

Theorem For $0 < \alpha < 1$ fixed, let $m = -(1 - 2\alpha)\, m^*(\beta)$; then for $\beta$ large enough a spin configuration $\sigma = (\gamma_1, \ldots, \gamma_n, \lambda_1, \ldots, \lambda_{2h+1})$ randomly chosen with the distribution $\mu^{+-}_{\Lambda,\beta}$ enjoys the properties (i)-(iv) below with a $\mu^{+-}_{\Lambda,\beta}$-probability approaching 1 as $\Lambda \to \infty$:

(i) $\sigma$ contains only one contour of $\lambda$-type and

$$\big|\, |\lambda| - (1 + \varepsilon(\beta)) L \,\big| < o(L) \qquad [53]$$

where $\varepsilon(\beta) > 0$ is a suitable ($\alpha$-independent) function of $\beta$ tending to zero exponentially fast as $\beta \to \infty$.

(ii) If $\Lambda_+$, $\Lambda_-$ denote, respectively, the regions above and below $\lambda$, and $|\Lambda| \equiv V$, $|\Lambda_+|$, $|\Lambda_-|$ are, respectively, the volumes of $\Lambda$, $\Lambda_+$, $\Lambda_-$, then

$$\big|\, |\Lambda_+| - \alpha V \,\big| < \kappa(\beta) V^{3/4}, \qquad \big|\, |\Lambda_-| - (1 - \alpha) V \,\big| < \kappa(\beta) V^{3/4} \qquad [54]$$

where $\kappa(\beta) \to 0$ exponentially fast as $\beta \to \infty$; the exponent 3/4, here and below, is not optimal.

(iii) If $M_+ = \sum_{x \in \Lambda_+} \sigma_x$ and $M_- = \sum_{x \in \Lambda_-} \sigma_x$, then

$$|M_+ - \alpha\, m^*(\beta) V| < \kappa(\beta) V^{3/4}, \qquad |M_- + (1 - \alpha)\, m^*(\beta) V| < \kappa(\beta) V^{3/4} \qquad [55]$$

(iv) If $K_\gamma(\sigma)$ denotes the number of contours congruent to a given $\gamma$ and lying in $\Lambda_+$ then, simultaneously for all the shapes of $\gamma$:

$$|K_\gamma(\sigma) - \rho(\gamma)\, \alpha V| \le C e^{-\beta J |\gamma|} V^{1/2}, \qquad C > 0 \qquad [56]$$

where $\rho(\gamma) \simeq e^{-2\beta J |\gamma|}$ is the same quantity as already mentioned in the text of the theorem of "Finite-volume effects". A similar result holds for the contours below $\lambda$ (cf. the comments on [47]).

The above theorem not only provides a detailed and rather satisfactory description of the phase separation phenomenon, but it also furnishes a precise microscopic definition of the line of separation between the two phases, which should be naturally identified with the (random) line $\lambda$.

A similar result holds in the canonical distribution $\mu^{++}_{\Lambda,\beta,m^*(\beta)}$ where (i) is replaced by: no $\lambda$-type polygon is present, while (ii), (iii) become superfluous, and (iv) is modified in the obvious way. In other words, a typical configuration for the distribution $\mu^{++}_{\Lambda,\beta,m^*(\beta)}$ has the same appearance as a typical configuration of the corresponding grand canonical ensemble with $(+)$-boundary condition (whose properties are described by the theorem given in the section "Beyond low temperatures (ferromagnetic Ising model)").

For more details, see Sinai (1991) and Gallavotti (1999).

Phase Separation Line and Surface Tension

Continuing to refer to the nearest-neighbor Ising ferromagnet, the theorem of the last section means that, if $\beta$ is large enough, then the microscopic line $\lambda$, separating the two phases, is almost straight (since $\varepsilon(\beta)$ is small). The deviations of $\lambda$ from a straight line are more conveniently studied in the grand canonical distributions $\mu^0$ with boundary condition set to $+1$ in the upper half of $\partial\Lambda$, vertical sites included, and to $-1$ in the lower half: this is illustrated in Figure 2 (see the section "Symmetry-breaking phase transitions"). The results can be converted into very similar results for grand canonical distributions with $+-$-cylindrical boundary conditions of the last section.

Define $\lambda$ to be rigid if the probability that $\lambda$ passes through the center of the box $\Lambda$ (i.e., 0) does not tend to 0 as $\Lambda \to \infty$; otherwise, it is not rigid.

The notion of rigidity distinguishes between the possibilities for the line $\lambda$ to be straight. The excess length $\varepsilon(\beta) L$ (see [53]) can be obtained in two ways: either the line $\lambda$ is essentially straight (in the geometric sense) with a few bumps distributed with a density of order $\varepsilon(\beta)$ or, otherwise, it is only locally straight and with an important part of the excess length being gained through a small bending on a large length scale. In three dimensions a similar phenomenon is possible. Rigidity of $\lambda$, or its failure, can in principle be investigated by optical means;
there can be interference of coherent light scattered by macroscopically separated surface elements of $\lambda$ only if $\lambda$ is rigid in the above sense.

It has been rigorously proved that the line $\lambda$ is not rigid in dimension 2. And, at least at low temperature, the fluctuation of the middle point is of the order $O(\sqrt{L})$. In dimension 3, however, it has been shown that the surface $\lambda$ is rigid at low enough temperature.

A deeper analysis is needed to study the shape of the separation surface under other conditions, for example, with $+$ boundary conditions in a canonical distribution with magnetization intermediate between $\pm m^*(\beta)$. It involves, as a prerequisite, the definition and many properties of the surface tension between the two phases. Here only the definition of surface tension in the case of $+-$-boundary conditions in the two-dimensional case will be mentioned. If $Z^{++}(\Lambda, m^*(\beta))$ and $Z^{+-}(\Lambda, m)$ are, respectively, the canonical partition functions for the $++$- and $+-$-cylindrical boundary conditions the tension $\tau(\beta)$ is defined as

$$\tau(\beta) = -\lim_{\Lambda \to \infty} \frac{1}{\beta L} \log \frac{Z^{+-}(\Lambda, m)}{Z^{++}(\Lambda, m^*(\beta))}$$

temperature $T_c$ (the latter being defined as the highest temperature below which there are at least two pure phases). The temperature $\tilde{T}_c$, whose existence is rather well established in numerical experiments, would be called the roughening transition temperature. The rigidity of $\lambda$ is connected with the existence of translationally noninvariant equilibrium states. The latter exist in dimension $d = 3$, but not in dimension $d = 2$, where the discussed nonrigidity of $\lambda$, established all the way to $T_c$, provides the intuitive reason for the absence of non-translation-invariant states. It has been shown that in $d = 3$ the roughening temperature $\tilde{T}_c$ cannot be smaller than the critical temperature of the two-dimensional Ising model with the same coupling.

Note that existence of translationally noninvariant equilibrium states is not necessary for the description of coexistence phenomena. The theory of the nearest-neighbor two-dimensional Ising model is a clear proof of this statement.

The reader is referred to Onsager (1944), van Beyeren (1975), Sinai (1991), Miracle-Sole (1995), Pfister and Velenik (1999), and Gallavotti (1999) for more details.
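The $O(\sqrt{L})$ size of the midpoint fluctuations quoted above for dimension 2 can be made plausible with a toy computation. The sketch below replaces the phase separation line by a simple random walk of $L$ steps of height $\pm 1$ conditioned to return to its starting height (a "bridge") and computes the exact variance of its height at the midpoint. This replacement is an assumption made purely for illustration: it is not the Ising measure, and the function name `bridge_midpoint_variance` is hypothetical.

```python
from math import comb, sqrt


def bridge_midpoint_variance(L):
    """Exact variance of the midpoint height of a +/-1 random-walk bridge of L steps.

    The bridge starts and ends at height 0, so exactly L/2 of its steps go up.
    If `ups` of them fall in the first half, the midpoint height is 2*ups - L/2.
    """
    if L <= 0 or L % 2:
        raise ValueError("L must be a positive even integer")
    half = L // 2
    total = comb(L, half)  # total number of bridges
    variance = 0.0
    for ups in range(half + 1):
        height = 2 * ups - half
        # ways to place `ups` up-steps in the first half and the rest in the second half
        count = comb(half, ups) * comb(half, half - ups)
        variance += height * height * count / total  # the mean height is 0 by symmetry
    return variance


if __name__ == "__main__":
    for L in (100, 400, 1600):
        print(L, sqrt(bridge_midpoint_variance(L)) / sqrt(L))
```

In this toy model the printed ratio of the standard deviation to $\sqrt{L}$ stays close to a constant as $L$ grows, and the probability that the walk passes through height 0 at the midpoint tends to 0, which is the sense in which such a line is not rigid.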
distribution. In systems with short-range interaction (i.e., with $\varphi(r)$ vanishing for $|r|$ large enough) the point c is a critical point if the pair correlation tends to 0 (see [57]), slower than exponential (e.g., as a power of the distance $|r| = |q_1 - q_2|$).

A typical example is the two-dimensional Ising model on a square lattice and with nearest-neighbor ferromagnetic interaction of size $J$. It has a single critical point at $\beta = \beta_c$, $h = 0$ with $\sinh 2\beta_c J = 1$. The cluster property is that $\langle \sigma_x \sigma_y \rangle - \langle \sigma_x \rangle \langle \sigma_y \rangle \to 0$ as $|x - y| \to \infty$, as

$$A_+(\beta)\, \frac{e^{-\kappa(\beta)|x - y|}}{\sqrt{|x - y|}}, \qquad A_-(\beta)\, \frac{e^{-\kappa(\beta)|x - y|}}{|x - y|^2}, \qquad A_c\, \frac{1}{|x - y|^{1/4}} \qquad [58]$$

for $\beta < \beta_c$, $\beta > \beta_c$, or $\beta = \beta_c$, respectively, where $A_\pm(\beta), A_c, \kappa(\beta) > 0$. The properties [58] stem from the exact solution of the model.

At the critical point, several interesting phenomena occur: the lack of exponential decay indicates lack of a length scale over which really distinct phenomena can take place, and properties of the system observed at different length scales are likely to be simply related by suitable scaling transformations. Many efforts have been dedicated to finding ways of understanding quantitatively the scaling properties pertaining to different observables. The result has been the development of the renormalization group approach to critical phenomena (cf. the section "Renormalization group"). The picture that emerges is that the closer the critical point is the larger becomes the maximal scale of length below which scaling properties are observed. For instance, in a lattice spin system in zero field the magnetization $M_\Delta |\Delta|^{-a}$ in a box $\Delta \subset \Lambda$ should have essentially the same distribution for all $\Delta$'s with side $< l_0(\beta)$ and $l_0(\beta) \to \infty$ as $\beta \to \beta_c$, provided $a$ is suitably chosen. The number $a$ is called a critical exponent.

There are several other critical exponents that can be defined near a critical point. They can be associated with singularities of the thermodynamic function or with the behavior of the correlation functions involving joint densities at two or more than two points. As an example, consider a lattice spin system: then the $2n$-spins correlation $\langle \sigma_0 \sigma_{\xi_1} \cdots \sigma_{\xi_{2n-1}} \rangle_c$ could behave proportionally to $\chi_{2n}(0, \xi_1, \ldots, \xi_{2n-1})$, $n = 1, 2, 3, \ldots$, for a suitable family of homogeneous functions $\chi_n$, of some degree $\omega_{2n}$, of the coordinates $(\xi_1, \ldots, \xi_{2n-1})$ at least when the reciprocal distances are large but $< l_0(\beta)$ and

$$l_0(\beta) = \text{const.}\, (\beta - \beta_c)^{-\nu} \xrightarrow[\beta - \beta_c \to 0]{} \infty$$

This means that if $\xi_i$ are regarded as points in $R^d$ there are functions $\chi_{2n}$ such that

$$\chi_{2n}\!\left(0, \frac{\xi_1}{\lambda}, \ldots, \frac{\xi_{2n-1}}{\lambda}\right) = \lambda^{\omega_{2n}}\, \chi_{2n}(0, \xi_1, \ldots, \xi_{2n-1}), \qquad 0 < \lambda \in R \qquad [59]$$

and $\langle \sigma_0 \sigma_{\xi_1} \cdots \sigma_{\xi_{2n-1}} \rangle_c \propto \chi_{2n}(0, \xi_1, \ldots, \xi_{2n-1})$ if $1 \ll |\xi_i - \xi_j| \ll l_0(\beta)$. The numbers $\omega_{2n}$ define a sequence of critical exponents.

Other critical exponents can be associated with approaching the critical point along other directions (e.g., along $h \to 0$ at $\beta = \beta_c$). In this case, the length up to which there are scaling phenomena is $l_0(h) = \ell_o h^{-\bar\nu}$. Further, the magnetization $m(h)$ tends to 0 as $h \to 0$ at fixed $\beta = \beta_c$ as $m(h) = m_0 h^{1/\delta}$ for $\delta > 0$.

None of the features of critical exponents is known rigorously, including their existence. An exception is the case of the two-dimensional nearest-neighbor Ising ferromagnet where some exponents are known exactly (e.g., $\omega_2 = 1/4$, $\omega_{2n} = n\omega_2$, or $\nu = 1$, while $\delta$, $\bar\nu$ are not rigorously known). Nevertheless, for Ising ferromagnets (not even nearest-neighbor but, as always here, finite-range) in all dimensions, all of the exponents mentioned are conjectured to be the same as those of the nearest-neighbor Ising ferromagnet. A further exception is the derivation of rigorous relations between critical exponents and, in some cases, even their values under the assumption that they exist.

Remark Naively it could be expected that in a pure state in zero field with $\langle \sigma_x \rangle = 0$ the quantity $s = |\Lambda|^{-1/2} \sum_{x \in \Lambda} \sigma_x$, if $\Lambda$ is a cubic box of side $\ell$, should have a probability distribution which is Gaussian, with dispersion $\lim_{\Lambda \to \infty} \langle s^2 \rangle$. This is usually true, but not always. Properties [58] show that in the $d = 2$ ferromagnetic nearest-neighbor Ising model, $\langle s^2 \rangle$ diverges proportionally to $\ell^{2 - 1/4}$ so that the variable $s$ cannot have the above Gaussian distribution. The variable $S = |\Lambda|^{-7/8} \sum_{x \in \Lambda} \sigma_x$ will have a finite dispersion: however, there is no reason that it should be Gaussian. This makes clear the great interest of a fluctuation theory and its relevance for the critical point studies (see the next two sections).

For more details, the reader is referred to Onsager (1944), Domb and Green (1972), McCoy and Wu (1973), and Aizenman (1982).

Fluctuations

As it appears from the discussion in the last section, fluctuations of observables around their averages have interesting properties particularly at critical points. Of particular interest are observables that
are averages, over large volumes $\Lambda$, of local functions $F(x)$ on phase space: this is so because macroscopic observables often have this form. For instance, given a region $\Lambda$ inside the system container $\Lambda_0$, $\Lambda \subset \Lambda_0$, consider a configuration $x = (P, Q)$ and the number of particles $N_\Lambda = \sum_{q \in \Lambda} 1$ in $\Lambda$, or the potential energy $\Phi_\Lambda = \sum_{(q, q') \in \Lambda^2} \varphi(q - q')$ or the kinetic energy $K_\Lambda = \sum_{q \in \Lambda} (1/2m)\, p^2$. In the case of lattice spin systems, consider a configuration $\sigma$ and, for instance, the magnetization $M_\Lambda = \sum_{i \in \Lambda} \sigma_i$ in $\Lambda$. Label the above four examples by $\alpha = 1, \ldots, 4$.

Let $\mu_\alpha$ be the probability distribution describing the equilibrium state in which the quantities $X_\alpha$ are considered; let $x_\alpha \overset{\text{def}}{=} \langle X_\alpha / |\Lambda| \rangle_{\mu_\alpha}$ and $p \overset{\text{def}}{=} X_\alpha / |\Lambda| - x_\alpha$. Then typical properties of fluctuations that should be investigated are ($\alpha = 1, \ldots, 4$):

1. for all $\delta > 0$ it is $\lim_{\Lambda \to \infty} \mu_\alpha(|p| > \delta) = 0$ (law of large numbers);
2. there is $D_\alpha > 0$ such that

function of $m(h)$. If $p = M_\Lambda / |\Lambda|$ the function $F(p)$ is given by

$$F(p) = \beta \big( f(\beta, h(p)) - f(\beta, h) - \partial_h f(\beta, h)\, (h(p) - h) \big) \qquad [60]$$

then a quite general result is:

Theorem The relations (1)-(3) hold if the potential satisfies $\sum_x |\varphi(x)| < \infty$ and if $F(p)$ [60] is smooth and $F''(p) \neq 0$ in open intervals around those in which $p$ is considered, that is, around $p = 0$ for the law of large numbers and for the central limit law or in an open interval containing $a, b$ for the case of the large deviations law.

In the cases envisaged, the theory of equivalence of ensembles implies that the function $F$ can also be computed via thermodynamic functions naturally associated with other equilibrium ensembles. For instance, instead of the grand canonical $f(\beta, h)$, one could consider the canonical $g(\beta, m)$ (see [41]), then
It then follows, heuristically, that the probability of $p$ in zero field has the form const. $e^{F(p)|\Lambda|}\, dp$ so that the probability that $p \in [a, b]$ will be const. $\exp\big(|\Lambda| \max_{p \in [a, b]} F(p)\big)$.

Conversely, the large deviations law for $p$ at $h = 0$ implies the validity of the central limit law for the fluctuations of $p$ in all small enough fields $h$: this simply arises from the function $F(p)$ having a negative second derivative.

This means that there is a duality between central limit law and large deviation law or that the law of large deviations is a global version of the central limit law, in the sense that:

1. if the central limit law holds for $h$ in an interval around $h_0$ then the fluctuations of the magnetization at field $h_0$ satisfy a large deviation law in a small enough interval $J$ around $m(h_0)$; and
2. if a large deviation law is satisfied in an interval around $h_0$ then the central limit law holds for the fluctuations of magnetization around its average in all fields $h$ with $h - h_0$ small enough.

Going beyond the heuristic level in establishing the duality amounts to giving a precise meaning to "small enough" and to discussing which properties of $m(h)$ and $D(h)$, or $F(p)$, are needed to derive properties (1), (2).

For purposes of illustration consider the Ising model with ferromagnetic short range interaction $\varphi$: then the central limit law holds for all $h$ if $\beta$ is small enough and, under the same condition on $\beta$, the large deviations law holds for all $h$ and all intervals $[a, b] \subset (-1, 1)$. If $\beta$ is not small then the condition $h \neq 0$ has to be added. Hence, the conditions are fairly weak and the apparent exceptions concern the value $h = 0$ and $\beta$ not small where the statements may become invalid because of possible phase transitions.

In presence of phase transitions, the law of large numbers, the central limit law, and law of large deviations should be reformulated. Basically, one has to add the requirement that fluctuations are considered in pure phases and change, in a natural way, the formulation of the laws. For instance, the large fluctuations of magnetization in a pure phase of the Ising model in zero field and large $\beta$ (i.e., in a state obtained as limit of finite-volume states with $+$ or $-$ boundary conditions) in intervals $[a, b]$ which do not contain the average magnetization $m^*$ are not necessarily exponentially small with the size of $|\Lambda|$: if $[a, b] \subset [-m^*, m^*]$ they are exponentially small but only with the size of the surface of $\Lambda$ (i.e., with $|\Lambda|^{(d-1)/d}$) while they are exponentially small with the volume if $[a, b] \cap [-m^*, m^*] = \emptyset$.

The discussion of the last section shows that at the critical point the nature of the large fluctuations is also expected to change: no central limit law is expected to hold in general because of the example of [58] with the divergence of the average of the normal second moment of the magnetization in a box as the side tends to $\infty$.

For more details the reader is referred to Olla (1987).

Renormalization Group

The theory of fluctuations just discussed concerns only fluctuations of a single quantity. The problem of joint fluctuations of several quantities is also interesting and in fact led to really new developments in the 1970s. It is necessary to restrict attention to rather special cases in order to illustrate some ideas and the philosophy behind the approach.

Consider, therefore, the equilibrium distribution $\mu_0$ associated with one of the classical equilibrium ensembles. To fix the ideas we consider the equilibrium distribution of an Ising energy function $H_0$, having included the temperature factor $\beta$ in the energy: the inclusion is done because the discussion will deal with the properties of $\mu_0$ as a function of $\beta$. It will also be assumed that the average of each spin is zero (no magnetic field, see [39] with $h = 0$). Keeping in mind a concrete case, imagine that $H_0$ is the energy function of the nearest-neighbor Ising ferromagnet in zero field.

Imagine that the volume $\Lambda$ of the container has periodic boundary conditions and is very large, ideally infinite. Define the family of blocks $k\xi$, parametrized by $\xi \in Z^d$ and with $k$ an integer, consisting of the lattice sites $x = \{ k\xi_i \le x_i < k(\xi_i + 1) \}$. This is a lattice of cubic blocks with side size $k$ that will be called the "$k$-rescaled lattice".

Given $\alpha$, the quantities $m_\xi = k^{-\alpha d} \sum_{x \in k\xi} \sigma_x$ are called the block spins and define the map $R^*_{\alpha,k}\, \mu_0 = \mu_k$ transforming the initial distribution on the original spins into the distribution of the block spins. Note that if the initial spins have only two values $\sigma_x = \pm 1$, the block spins take values between $-k^d / k^{\alpha d}$ and $k^d / k^{\alpha d}$ at steps of size $2 / k^{\alpha d}$. Furthermore, the map $R^*_{\alpha,k}$ makes sense independently of how many values the initial spins can assume, and even if they assume a continuum of values $S_x \in R$.

Taking $\alpha = 1$ means, for $k$ large, looking at the probability distribution of the joint large fluctuations in the blocks $k\xi$. Taking $\alpha = 1/2$ corresponds to studying a joint central limit property for the block variables.

Considering a one-parameter family of initial distributions $\mu_0^\beta$ parametrized by a parameter $\beta$
(that will be identified with the inverse temperature), typically there will be a unique value $\alpha(\beta)$ of $\alpha$ such that the joint fluctuations of the block variables admit a limiting distribution,

$$\text{prob}_k\big( m_\xi \in [a_\xi, b_\xi],\ \xi \in \Delta \big) \xrightarrow[k \to \infty]{} \int_{\{a_\xi\}}^{\{b_\xi\}} g_\Delta\big( (S_\xi)_{\xi \in \Delta} \big) \prod_{\xi \in \Delta} dS_\xi \qquad [63]$$

for some distribution $g_\Delta(z)$ on $R^\Delta$.

If $\alpha > \alpha(\beta)$, the limit will then be $\prod_{\xi \in \Delta} \delta(S_\xi)\, dS_\xi$, or if $\alpha < \alpha(\beta)$ the limit will not exist (because the block variables will be too large, with a dispersion diverging as $k \to \infty$).

It is convenient to choose as sequence of $k \to \infty$ the sequence $k = 2^n$ with $n = 0, 1, \ldots$ because in this way it is $R^*_{\alpha,k} \equiv R^{*n}_{\alpha,1}$ and the limits $k \to \infty$ along the sequence $k = 2^n$ can be regarded as limits on a sequence of iterations of a map $R^*_{\alpha,1}$ acting on the probability distributions of generic spins $S_x$ on the lattice $Z^d$ (the sequence $3^n$ would be equally suited).

It is even more convenient to consider probability distributions that are expressed in terms of energy functions $H$ which generate, in the thermodynamic limit, a distribution $\mu$: then $R^*_{\alpha,1}$ defines an action $R_\alpha$ on the energy functions so that $R_\alpha H = H'$ if $H$ generates $\mu$, $H'$ generates $\mu'$ and $R^*_{\alpha,1}\, \mu = \mu'$. Of course, the energy function will be more general than [39] and at least a form like $\delta H_\Lambda$ in [49] has to be admitted.

In other words, $R_\alpha$ gives the result of the action of $R^*_{\alpha,1}$ expressed as a map acting on the energy functions. Its iterates also define a semigroup which is called the block spin renormalization group.

While the map $R^*_{\alpha,1}$ is certainly well defined as a map of probability distributions into probability distributions, it is by no means clear that $R_\alpha$ is well defined as a map on the energy functions: if $\mu$ is given by an energy function, it is not clear that $R^*_{\alpha,1}\, \mu$ is such.

A remarkable theorem can be (easily) proved when $R^*_{\alpha,1}$ and its iterates act on initial $\mu_0$'s which are equilibrium states of a spin system with short-range interactions and at high temperature ($\beta$ small). In this case, if $\alpha = 1/2$, the sequence of distributions $R^{*n}_{1/2,1}\, \mu_0(\beta)$ admits a limit which is given by a product of independent Gaussians:

$$\text{prob}_k\big( m_\xi \in [a_\xi, b_\xi],\ \xi \in \Delta \big) \xrightarrow[k \to \infty]{} \int_{\{a_\xi\}}^{\{b_\xi\}} \prod_{\xi \in \Delta} \exp\!\Big( -\frac{1}{2D} S_\xi^2 \Big) \prod_{\xi \in \Delta} \frac{dS_\xi}{\sqrt{2\pi D}} \qquad [64]$$

Note that this theorem is stated without even mentioning the renormalization maps $R^n_{1/2}$: it can nevertheless be interpreted as stating that

$$R^n_{1/2} H_0 \xrightarrow[n \to \infty]{} \sum_{\xi \in Z^d} \frac{1}{2D} S_\xi^2 \qquad [65]$$

but the interpretation is not rigorous because [64] does not require that $R^n_{1/2} H_0(\beta)$ makes sense for $n \ge 1$. It states that at high temperature block spins have normal independent fluctuations: it is therefore an extension of the central limit law.

There are a few cases in which the map $R_\alpha$ can be rigorously shown to be well defined at least when acting on special equilibrium states like the high-temperature lattice spin systems: but these are exceptional cases of relatively little interest.

Nevertheless, there is a vast literature dealing with approximate representations of the map $R_\alpha$. The reason is that, assuming not only its existence but also that it has the properties that one would normally expect to hold for a map acting on a finite dimensional space, it follows that a number of consequences can be drawn; quite nontrivial ones as they led to the first theory of the critical point that goes beyond the van der Waals theory described in the section "van der Waals theory".

The argument proceeds essentially as follows. At the critical point, the fluctuations are expected to be anomalous (cf. the last remark in the section "Critical points") in the sense that $\big\langle \big( \sum_{x \in \Lambda} \sigma_x / \sqrt{|\Lambda|} \big)^2 \big\rangle$ will tend to $\infty$, because $\alpha = 1/2$ does not correspond to the right fluctuation scale of $\sum_{x \in \Lambda} \sigma_x$, signaling that $R^{*n}_{1/2,1}\, \mu_0(\beta_c)$ will not have a limit but, possibly, there is $\alpha_c > 1/2$ such that $R^{*n}_{\alpha_c,1}\, \mu_0(\beta_c)$ converges to a limit in the sense of [63]. In the case of the critical nearest-neighbor Ising ferromagnet $\alpha_c = 7/8$ (see ending remark in the section "Critical points"). Therefore, if the map $R^*_{\alpha_c,1}$ is considered as acting on $\mu_0(\beta)$, it will happen that for all $\beta < \beta_c$, $R^{*n}_{\alpha_c,1}\, \mu_0(\beta)$ will converge to a trivial limit $\prod_{\xi \in \Delta} \delta(S_\xi)\, dS_\xi$ because the value $\alpha_c$ is greater than 1/2 while normal fluctuations are expected.

If the map $R_{\alpha_c}$ can be considered as a map on the energy functions, this says that $\prod_{\xi \in \Delta} \delta(S_\xi)\, dS_\xi$ is a "(trivial) fixed point of the renormalization group" which "attracts" the energy functions $H_0$ corresponding to the high-temperature phases.

The existence of the critical $\beta_c$ can be associated with the existence of a nontrivial fixed point $H^*$ for $R_{\alpha_c}$ which is hyperbolic with just one Lyapunov exponent $\lambda > 1$; hence, it has a stable manifold of codimension 1. Call $\mu^*$ the probability distribution corresponding to $H^*$.

The migration towards the trivial fixed point for $\beta < \beta_c$ can be explained simply by the fact that for
such values of $\beta$ the initial energy function $H_0$ is outside the stable manifold of the nontrivial fixed point and under application of the renormalization transformation $R^n_{\alpha_c}$, $H_0$ migrates toward the trivial fixed point, which is attractive in all directions.

By increasing $\beta$, it may happen that, for $\beta = \beta_c$, $H_0$ crosses the stable manifold of the nontrivial fixed point $H^*$ for $R_{\alpha_c}$. Then $R^n_{\alpha_c} H_0$ will no longer tend to the trivial fixed point but it will tend to $H^*$: this means that the block spin variables will exhibit a completely different fluctuation behavior. If $\beta$ is close to $\beta_c$, the iterations of $R_{\alpha_c}$ will bring $R^n_{\alpha_c} H_0$ close to $H^*$, only to be eventually repelled along the unstable direction reaching a distance from it increasing as $\lambda^n |\beta - \beta_c|$.

This means that up to a scale length $O(2^{n(\beta)})$ lattice units with $\lambda^{n(\beta)} |\beta - \beta_c| = 1$ (i.e., up to a scale $O(|\beta - \beta_c|^{-\log_\lambda 2})$), the fluctuations will be close to those of the fixed point distribution $\mu^*$, but beyond that scale they will come close to those of the trivial fixed point: to see them the block spins would have to be normalized with index $\alpha = 1/2$ and they would appear as uncorrelated Gaussian fluctuations (cf. [64], [65]).

The next question concerns finding the nontrivial fixed points, which means finding the energy functions $H^*$ and the corresponding $\alpha_c$ which are fixed points of $R_{\alpha_c}$. If the above picture is correct, the distributions $\mu^*$ corresponding to the $H^*$ would describe the critical fluctuations and, if there was only one choice, or a limited number of choices, of $\alpha_c$ and $H^*$ this would open the way to a universality theory of the critical point hinted already by the "primitive" results of van der Waals' theory.

The initial hope was, perhaps, that there would be a very small number of critical values $\alpha_c$ and $H^*$ possible: but it rapidly faded away leaving, however, the possibility that the critical fluctuations could be classified into universality classes. Each class would contain many energy functions which, upon iterated actions of $R_{\alpha_c}$, would evolve under the control of the trivial fixed point (always existing) for $\beta$ small while, for $\beta = \beta_c$, they would be controlled, instead, by a nontrivial fixed point $H^*$ for $R_{\alpha_c}$ with the same $\alpha_c$ and the same $H^*$. For $\beta < \beta_c$, a "resolution" of the approach to the trivial fixed point would be seen by considering the map $R_{1/2}$ rather than $R_{\alpha_c}$ whose iterates would, however, lead to a Gaussian distribution like [64] (and to a limit energy function like [65]).

The picture is highly hypothetical: but it is the first suggestion of a mechanism leading to critical points with the character of universality and with exponents different from those of the van der Waals theory or, for ferromagnets on a lattice, from those of its lattice version (the Curie–Weiss theory). Furthermore, accepting the approximations (e.g., the Wilson–Fisher $\varepsilon$-expansion) that allow one to pass from the well-defined $R_{\alpha,1}$ to the action of $R_\alpha$ on the energy functions, it is possible to obtain quite unambiguously values for $\alpha_c$ and expressions for $\mu^*$ which are associated with the action of $R_{\alpha_c}$ on various classes of models.

For instance, it can lead to the conclusion that all ferromagnetic finite-range lattice spin systems (with energy functions given by [39]) have critical points controlled by the same $\alpha_c$ and the same nontrivial fixed point: this property is far from being mathematically proved, but it is considered a major success of the theory. One has to compare it with van der Waals' critical point theory: for the first time, an approximation scheme has led, even though under approximations not fully controllable, to computable critical exponents which are not equal to those of the van der Waals theory.

The renormalization group approach to critical phenomena has many variants, depending on which kind of fluctuations are considered and on the models to which it is applied. In statistical mechanics, there are a few mathematically complete applications: certain results in higher dimensions, theory of dipole gas in $d = 2$, hierarchical models, some problems in condensed matter and in statistical mechanics of lattice spins, and a few others. Its main mathematical successes have occurred in various related fields where not only the philosophy described above can be applied but it leads to renormalization transformations that can be defined precisely and studied in detail: for example, constructive field theory, KAM theory of quasiperiodic motions, and various problems in dynamical systems.

However, the applications always concern special cases and in each of them the general picture of the trivial–nontrivial fixed point dichotomy appears realized but without being accompanied, except in rare cases (like the hierarchical models or the universality theory of maps of the interval), by the full description of stable manifold, unstable direction, and action of the renormalization transformation on objects other than the one of immediate interest (a generality which looks often an intractable problem, but which also turns out not to be necessary).

In the renormalization group context, mathematical physics has played an important role also by providing clear evidence that universality classes could not be too few: this was shown by the numerous exact solutions after Onsager's solution of the nearest-neighbor Ising ferromagnet: there are in fact several lattice models in $d = 2$ that exhibit critical points with some critical exponents exactly computable and that depend continuously on the model's parameters.
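The crossover scale discussed in this section, defined by $\lambda^{n(\beta)}|\beta - \beta_c| = 1$, can be made concrete with a few lines of arithmetic. The following is a minimal sketch under the linearized picture only; the function name `crossover_scale` and the sample values of $\lambda$, $\beta$, $\beta_c$ are hypothetical and are not taken from any specific model.

```python
import math

def crossover_scale(beta, beta_c, lam):
    """Return (n, scale) for the linearized flow near the nontrivial fixed point.

    n solves lam**n * |beta - beta_c| = 1, i.e. it is the number of block-spin
    iterations after which the unstable coordinate has grown to order 1;
    scale = 2**n is the corresponding length in lattice units.
    """
    if lam <= 1.0:
        raise ValueError("the unstable Lyapunov exponent must exceed 1")
    delta = abs(beta - beta_c)
    if delta == 0.0:
        # On the stable manifold the flow never leaves the fixed point.
        return math.inf, math.inf
    n = -math.log(delta) / math.log(lam)
    return n, 2.0 ** n

# Hypothetical numbers: lam = 4 and |beta - beta_c| = 1/16.
n, scale = crossover_scale(1.0 + 1.0 / 16.0, 1.0, 4.0)
# The same scale written as |beta - beta_c| ** (-log_lam 2).
closed_form = (1.0 / 16.0) ** (-math.log(2.0) / math.log(4.0))
```

Both `scale` and `closed_form` evaluate the same quantity, which illustrates why the scale diverges as a power of $|\beta - \beta_c|$ when $\beta \to \beta_c$.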
For more details, we refer the reader to McCoy and Wu (1973), Baxter (1982), Bleher and Sinai (1975), Wilson and Fisher (1972), Gawedzki and Kupiainen (1983, 1985), Benfatto and Gallavotti (1995), and Mastropietro (2004).

Quantum Statistics

Statistical mechanics is extended to assemblies of quantum particles rather straightforwardly. In the case of $N$ identical particles, the observables are operators $O$ on the Hilbert space

$$\mathcal{H}_N = L_2(\Omega)^N_\alpha \quad \text{or} \quad \mathcal{H}_N = \big(L_2(\Omega) \otimes C^2\big)^N_\alpha$$

where $\alpha = +, -$, of the symmetric ($\alpha = +$, bosonic particles) or antisymmetric ($\alpha = -$, fermionic particles) functions $\psi(Q)$, $Q = (q_1, \ldots, q_N)$, of the position coordinates of the particles or of the position and spin coordinates $\psi(Q, \sigma)$, $\sigma = (\sigma_1, \ldots, \sigma_N)$, normalized so that

$$\int |\psi(Q)|^2\, dQ = 1 \quad \text{or} \quad \sum_\sigma \int |\psi(Q, \sigma)|^2\, dQ = 1$$

here only $\sigma_j = \pm 1$ is considered. As in classical mechanics, a state is defined by the average values $\langle O \rangle$ that it attributes to the observables.

Microcanonical, canonical, and grand canonical ensembles can be defined quite easily. For instance, consider a system described by the Hamiltonian ($\hbar$ = Planck's constant)

$$H_N = -\frac{\hbar^2}{2m} \sum_{j=1}^N \Delta_{q_j} + \sum_{j<j'} \varphi(q_j - q_{j'}) + \sum_j w(q_j) \overset{def}{=} K + \Phi \qquad [66]$$

where periodic boundary conditions are imagined on $\Omega$ and $w(q)$ is a periodic, smooth potential (the side of $\Omega$ is supposed to be a multiple of the periodic potential period if $w \ne 0$). Then a canonical equilibrium state with inverse temperature $\beta$ and specific volume $v = V/N$ attributes to the observable $O$ the average value

$$\langle O \rangle \overset{def}{=} \frac{\mathrm{tr}\, e^{-\beta H_N} O}{\mathrm{tr}\, e^{-\beta H_N}} \qquad [67]$$

Similar definitions can be given for the grand canonical equilibrium states.

Remarkably, the ensembles are orthodic and a "heat theorem" (see the section "Heat theorem and ergodic hypothesis") can be proved. However, "equipartition" does not hold: that is, $\langle K \rangle \ne (d/2) N \beta^{-1}$, although $\beta^{-1}$ is still the integrating factor of $dU + p\, dV$ in the heat theorem; hence, $\beta^{-1}$ continues to be proportional to temperature.

Lack of equipartition is important, as it solves paradoxes that arise in classical statistical mechanics applied to systems with infinitely many degrees of freedom, like crystals (modeled by lattices of coupled oscillators) or fields (e.g., the electromagnetic field important in the study of black body radiation). However, although this has been the first surprise of quantum statistics (and in fact responsible for the very discovery of quanta), it is by no means the last.

At low temperatures, new unexpected (i.e., with no analogs in classical statistical mechanics) phenomena occur: Bose–Einstein condensation (superfluidity), Fermi surface instability (superconductivity), and appearance of off-diagonal long-range order (ODLRO) will be selected to illustrate the deeply different kinds of problems of quantum statistical mechanics. Largely not yet understood, such phenomena pose very interesting problems not only from the physical point of view but also from the mathematical point of view and may pose challenges even at the level of a definition. However, it should be kept in mind that in the interesting cases (i.e., three-dimensional systems and even most two- and one-dimensional systems) there is no proof that the objects defined below really exist for the systems like [66] (see, however, the final comment for an important exception).

Bose–Einstein Condensation

In a canonical state with parameters $\beta$, $v$, a definition of the occurrence of Bose condensation is in terms of the eigenvalues $\nu_j(\Omega, N)$ of the kernel $\rho(q, q')$ on $L_2(\Omega)$, called the one-particle reduced density matrix, defined by

$$N \sum_{n=1}^\infty \frac{e^{-\beta E_n(\Omega, N)}}{\mathrm{tr}\, e^{-\beta H_N}} \int \psi_n(q, q_1, \ldots, q_{N-1})\, \overline{\psi_n(q', q_1, \ldots, q_{N-1})}\; dq_1 \ldots dq_{N-1} \qquad [68]$$

where $E_n(\Omega, N)$ are the eigenvalues of $H_N$ and $\psi_n(q_1, \ldots, q_N)$ are the corresponding eigenfunctions. If $\nu_j$ are ordered by decreasing value, the state with parameters $\beta$, $v$ is said to contain a Bose–Einstein condensate if $\nu_1(\Omega, N) \ge bN > 0$ for all large $\Omega$ at $v = V/N$, $\beta$ fixed. This receives the interpretation that there are more than $bN$ particles with equal momentum. The free Bose gas exhibits a Bose condensation phenomenon at fixed density and small temperature.

Fermi Surface

The wave functions $\psi_n(q_1, \sigma_1, \ldots, q_N, \sigma_N) \equiv \psi_n(Q, \sigma)$ are now antisymmetric in the permutations of the pairs $(q_i, \sigma_i)$. Let $\psi(Q, \sigma; N, n)$ denote the $n$th
eigenfunction of the $N$-particle energy $H_N$ in [66] with eigenvalue $E(N, n)$ (labeled by $n = 0, 1, \ldots$ and non-decreasingly ordered). Setting $Q'' = (q''_1, \ldots, q''_{N-p})$, $\sigma'' = (\sigma''_1, \ldots, \sigma''_{N-p})$, introduce the kernels $\rho_p^{H_N}(Q, \sigma; Q', \sigma')$ by

$$\rho_p(Q, \sigma; Q', \sigma') \overset{def}{=} p! \binom{N}{p} \sum_{\sigma''} \int d^{N-p}Q'' \sum_{n=0}^\infty \frac{e^{-\beta E(N,n)}}{\mathrm{tr}\, e^{-\beta H_N}}\, \psi(Q, \sigma; Q'', \sigma''; N, n)\, \overline{\psi(Q', \sigma'; Q'', \sigma''; N, n)} \qquad [69]$$

which are called $p$-particle reduced density matrices (extending the corresponding one-particle reduced density matrix [68]). Denote $\rho(q_1 - q_2) \overset{def}{=} \sum_\sigma \rho_1(q_1, \sigma, q_2, \sigma)$. It is also useful to consider spinless fermionic systems: the corresponding definitions are obtained simply by suppressing the spin labels and will not be repeated.

Let $r_1(k)$ be the Fourier transform of $\rho_1(q - q')$: the Fermi surface can be defined as the locus of the $k$'s in the neighborhood of which $\partial_k r_1(k)$ is unbounded as $\Omega \to \infty$, $\beta \to \infty$. The limit as $\beta \to \infty$ is important because the notion of a Fermi surface is, possibly, precise only at zero temperature, that is at $\beta = \infty$.

So far, existence of Fermi surface (i.e., the smoothness of $r_1(k)$ except on a smooth surface in $k$-space) has been proved in free Fermi systems ($\varphi = 0$) and

1. certain exactly soluble one-dimensional spinless systems and
2. in rather general one-dimensional spinless systems or systems with spin and repulsive pair interaction, possibly in an external periodic potential.

The spinning case in a periodic potential and dimension $d \ge 2$ is the most interesting case to study for its relevance in the theory of conduction in crystals. Essentially no mathematical results are available as the above-mentioned ones do not concern any case in dimension $> 1$: this is a rather disappointing aspect of the theory and a challenge.

In dimension 2 or higher, for fermionic systems with Hamiltonian [66], not only are there no results available, even without spin, but it is not even clear that a Fermi surface can exist in presence of interesting interactions.

Cooper Pairs

The superconductivity theory has been phenomenologically related to the existence of Cooper pairs. Consider the Hamiltonian [66] and define (cf. [69])

$$\rho(x - y, \sigma; x' - y', \sigma'; x - x') \overset{def}{=} \rho_2(x, \sigma, y, -\sigma; x', \sigma', y', -\sigma')$$

The system is said to contain Cooper pairs with spins $\sigma$, $-\sigma$ ($\sigma = +$ or $\sigma = -$) if there exist functions $g_\alpha(q, \sigma) \ne 0$ with

$$\int g_\alpha(q, \sigma)\, \overline{g_{\alpha'}(q, \sigma)}\, dq = 0 \quad \text{if } \alpha \ne \alpha'$$

such that

$$\lim_{V \to \infty} \rho(x - y, \sigma; x' - y', \sigma'; x - x') \xrightarrow[x - x' \to \infty]{} \sum_\alpha g_\alpha(x - y, \sigma)\, \overline{g_\alpha(x' - y', \sigma')} \qquad [70]$$

In this case, $g_\alpha(x - y, \sigma)$ with largest $L_2$ norm can be called, after normalization, the wave function of the paired state of lowest energy: this is the analog of the plane wave for a free particle (and, like it, it is manifestly not normalizable, i.e., it is not square integrable as a function of $x$, $y$). If the system contains Cooper pairs and the nonleading terms in the limit [70] vanish quickly enough the two-particle reduced density matrix [70] regarded as a kernel operator has an eigenvalue of order $V$ as $V \to \infty$: that is, the state of lowest energy is macroscopically occupied, quite like the free Bose condensation in the ground state.

Cooper pairs instability might destroy the Fermi surface in the sense that $r_1(k)$ becomes analytic in $k$; but it is also possible that, even in the presence of them, there remains a surface which is the locus of the singularities of the function $r_1(k)$. In the first case, there should remain a trace of it as a very steep gradient of $r_1(k)$ of the order of an exponential in the inverse of the coupling strength; this is what happens in the BCS model for superconductivity. The model is, however, a mean-field model and this particular regularity aspect might be one of its peculiarities. In any event, a smooth singularity surface is very likely to exist for some interesting density matrix (e.g., in the BCS model with "gap parameter $\gamma$" the wave function

$$g(x - y, \sigma) \equiv \frac{1}{(2\pi)^d} \int_{\varepsilon(k) > 0} e^{i k \cdot (x - y)}\, \frac{\gamma}{\sqrt{\varepsilon(k)^2 + \gamma^2}}\; dk$$

of the lowest energy level of the Cooper pairs is singular on a surface coinciding with the Fermi surface of the free system).

ODLRO

Consider the $k$-fermion reduced density matrix $\rho_k(Q, \sigma; Q', \sigma')$ as kernel operators $O_k$ on $L_2((\Omega \times C^2)^k)$. Suppose $k$ is even, then if $O_k$ has a (generalized) eigenvalue of order $N^{k/2}$ as $N \to \infty$, $N/V = \rho$, the system is said to exhibit off-diagonal long-range order of order $k$. For $k$ odd, ODLRO is defined to exist if $O_k$ has an eigenvalue of order $N^{(k-1)/2}$ and $k \ge 3$ (if $k = 1$ the largest eigenvalue of $O_1$ is necessarily $\le 1$).
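The remark that for $k = 1$ the largest eigenvalue of $O_1$ cannot exceed 1 for fermions, while Bose condensation corresponds to an eigenvalue of order $N$, can be illustrated on free gases. The sketch below is only an illustration under stated assumptions: it uses the grand canonical ideal gas (not the canonical state used in the definitions above) on a periodic cubic lattice with a hypothetical nearest-neighbor hopping dispersion, in units where the hopping scale is 1. In that setting the one-particle reduced density matrix is diagonal on plane waves and its eigenvalues are the mean occupation numbers. The function names are hypothetical.

```python
import numpy as np

def lattice_energies(L):
    """Energies eps(k) = 2 * sum_i (1 - cos k_i) of plane waves on an L^3 periodic lattice."""
    k = 2.0 * np.pi * np.arange(L) / L
    e1 = 2.0 * (1.0 - np.cos(k))
    return (e1[:, None, None] + e1[None, :, None] + e1[None, None, :]).ravel()

def occupations(eps, beta, mu, sign):
    """Mean occupations 1 / (exp(beta * (eps - mu)) + sign); sign=+1 fermions, sign=-1 bosons."""
    x = np.clip(beta * (eps - mu), -700.0, 700.0)  # clip only to avoid overflow
    if sign < 0:
        return 1.0 / np.expm1(x)  # Bose: requires mu below the lowest energy
    return 1.0 / (np.exp(x) + 1.0)  # Fermi

def largest_occupation(L, n_particles, beta, sign, iters=200):
    """Largest eigenvalue of the one-particle density matrix of the free gas.

    The chemical potential is fixed by bisection so that the total mean
    particle number equals n_particles (it increases monotonically with mu).
    """
    eps = lattice_energies(L)
    lo = eps.min() - 50.0
    hi = eps.min() - 1e-12 if sign < 0 else eps.max() + 50.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if occupations(eps, beta, mid, sign).sum() < n_particles:
            lo = mid
        else:
            hi = mid
    return occupations(eps, beta, 0.5 * (lo + hi), sign).max()

# Hypothetical parameters: 8^3 sites, 256 particles.
bose_cold = largest_occupation(8, 256, beta=10.0, sign=-1) / 256
bose_hot = largest_occupation(8, 256, beta=0.1, sign=-1) / 256
fermi_cold = largest_occupation(8, 256, beta=10.0, sign=+1)
```

In this toy setting `bose_cold` is a fraction of order 1 (macroscopic occupation of the zero-momentum state), `bose_hot` is small, and `fermi_cold` never exceeds 1, in line with the statement that fermionic systems cannot have ODLRO of order 1.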
For bosons, consider the reduced density matrix $\rho_k(Q; Q')$ regarding it as a kernel operator $O_k$ on $L_2(\Omega)^k_+$ and define ODLRO of order $k$ to be present if $O_k$ has a (generalized) eigenvalue of order $N^k$ as $N \to \infty$, $N/V = \rho$.

ODLRO can be regarded as a unification of the notions of Bose condensation and of the existence of Cooper pairs, because Bose condensation could be said to correspond to the kernel operator $\rho_1(q_1 - q_2)$ in [68] having a (generalized) eigenvalue of order $N$, and to be a case of ODLRO of order 1. If the state is pure in the sense that it has a cluster property (see the sections "Phase transitions and boundary conditions" and "Lattice models"), then the existence of ODLRO, Bose condensation, and Cooper pairs implies that the system shows a spontaneously broken symmetry: conservation of particle number and clustering imply that the off-diagonal elements of (all) reduced density matrices vanish at infinite separation in states obtained as limits of states with periodic boundary conditions and Hamiltonian [66], and this is incompatible with ODLRO.

The free Fermi gas has no ODLRO, the BCS model of superconductivity has Cooper pairs and ODLRO with $k = 2$, but no Fermi surface in the above sense (possibly too strict). Fermionic systems cannot have ODLRO of order 1 (because the reduced density matrix of order 1 is bounded by 1).

The contribution of mathematical physics has been particularly effective in providing exactly soluble models: however, the soluble models deal with one-dimensional systems and it can be shown that in dimensions 1, 2 no ODLRO can take place. A major advance is the recent proof of ODLRO and Bose condensation in the case of a lattice version of [66] at a special density value (and $d \ge 3$).

In no case, for the Hamiltonian [66] with $\varphi \ne 0$, has the existence of Cooper pairs been proved, nor the existence of a Fermi surface for $d > 1$. Nevertheless, both Bose condensation and Cooper pairs formation can be proved to occur rigorously in certain limiting situations. There are also a variety of phenomena (e.g., simple spectral properties of the Hamiltonians) which are believed to occur once some of the above-mentioned ones do occur and several of them can be proved to exist in concrete models.

If $d = 1, 2$, ODLRO can be proved to be impossible at $T > 0$ through the use of Bogoliubov's inequality (used in the "no $d = 2$ crystal theorem", see the section "Continuous symmetries: 'no $d = 2$ crystal' theorem").

For more details, the reader is referred to Penrose and Onsager (1956), Yang (1962), Ruelle (1969), Hohenberg (1967), Gallavotti (1999), and Aizenman et al. (2004).

Appendix 1: The Physical Meaning of the Stability Conditions

It is useful to see what would happen if the conditions of stability and temperedness (see [14]) are violated. The analysis also illustrates some of the typical methods of statistical mechanics.

Coalescence Catastrophe due to Short-Distance Attraction

The simplest violation of the first condition in [14] occurs when the potential $\varphi$ is smooth and negative at the origin.

Let $\delta > 0$ be so small that the potential at distances $\le 2\delta$ is $\le -b < 0$. Consider the canonical distribution with parameters $\beta$, $N$ in a (cubic) box $\Omega$ of volume $V$. The probability $P_{collapse}$ that all the $N$ particles are located in a little sphere of radius $\delta$ around the center of the box (or around any prefixed point of the box) is estimated from below by remarking that

$$\Phi \le -b \binom{N}{2} \sim -\frac{b}{2} N^2$$

so that, denoting by $\mathcal{C}$ the set of such configurations,

$$P_{collapse} = \frac{\int_{\mathcal{C}} \frac{dp\, dq}{h^{3N} N!}\, e^{-\beta (K(p) + \Phi(q))}}{\int \frac{dp\, dq}{h^{3N} N!}\, e^{-\beta (K(p) + \Phi(q))}} \ge \frac{\Big(\frac{4\pi \sqrt{2\pi m \beta^{-1}}^{\,3}}{3 h^3}\Big)^N \frac{\delta^{3N}}{N!}\, e^{\beta b \frac{1}{2} N(N-1)}}{\sqrt{2\pi m \beta^{-1}}^{\,3N} \int \frac{dq}{h^{3N} N!}\, e^{-\beta \Phi(q)}} \qquad [71]$$

The phase space is extremely small: nevertheless, such configurations are far more probable than the configurations which "look macroscopically correct", that is, configurations with particles more or less spaced by the average particle distance expected in a macroscopically homogeneous configuration, namely $(N/V)^{-1/3} = \rho^{-1/3}$. Their energy $\Phi(q)$ is of the order of $uN$ for some $u$, so that their probability will be bounded above by

$$P_{regular} \le \frac{\int \frac{dp\, dq}{h^{3N} N!}\, e^{-\beta (K(p) + uN)}}{\int \frac{dp\, dq}{h^{3N} N!}\, e^{-\beta (K(p) + \Phi(q))}} = \frac{\frac{V^N \sqrt{2\pi m \beta^{-1}}^{\,3N}}{h^{3N} N!}\, e^{-\beta u N}}{\sqrt{2\pi m \beta^{-1}}^{\,3N} \int \frac{dq}{h^{3N} N!}\, e^{-\beta \Phi(q)}} \qquad [72]$$
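Dividing the upper bound [72] by the lower bound [71], the common denominator and the momentum factors cancel, leaving $P_{regular}/P_{collapse} \le V^N e^{-\beta u N} \big/ \big[(4\pi\delta^3/3)^N e^{\beta b N(N-1)/2}\big]$. A minimal numerical sketch of the logarithm of this bound follows; the function name and all parameter values are hypothetical and serve only to show how the quadratic term in $N$ takes over.

```python
import math

def log_ratio_bound(n, v, delta, beta, b, u):
    """Upper bound on log(P_regular / P_collapse) obtained from [72] / [71].

    n: particle number, v: specific volume (V = v * n), delta: collapse radius,
    beta: inverse temperature, b: depth of the attraction at short distance,
    u: energy per particle of the regular configurations.
    """
    volume = v * n
    log_regular = n * math.log(volume) - beta * u * n
    log_collapse = (n * math.log(4.0 * math.pi * delta ** 3 / 3.0)
                    + beta * b * n * (n - 1) / 2.0)
    return log_regular - log_collapse

# Hypothetical values: a shallow attraction b = 0.01 and a small radius delta = 0.1.
small_system = log_ratio_bound(10, 1.0, 0.1, 1.0, 0.01, 0.0)
large_system = log_ratio_bound(10 ** 5, 1.0, 0.1, 1.0, 0.01, 0.0)
```

For the small system the bound is positive, so it says nothing, while for the large one it is hugely negative: the term $\beta b N^2/2$ eventually dominates $N \log V$ however small $b$ and $\delta$ are.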
However, no matter how small $\delta$ is, the ratio $P_{regular}/P_{collapse}$ will approach 0 as $V \to \infty$, $N/V \to v^{-1}$; this occurs extremely rapidly because $e^{\beta b N^2/2}$ eventually dominates over $V^N \sim e^{N \log N}$.

Thus, it is far more probable to find the system in a microscopic volume of size $\delta$ rather than in a configuration in which the energy has some macroscopic value proportional to $N$. This catastrophe can be called an ultraviolet catastrophe (as it is due to the behavior at very short distances) and it causes the collapse of the particles into configurations concentrated in regions as small as we please as $V \to \infty$.

Coalescence Catastrophe due to Long-Range Attraction

It occurs when the potential is too attractive near $\infty$. For simplicity, suppose that the potential has a hard core, i.e., it is $+\infty$ for $r < r_0$, so that the above-discussed coalescence cannot occur and the system density is bounded above by a certain quantity $\rho_{cp} < \infty$ (close-packing density).

The catastrophe occurs if $\varphi(q) \sim -g|q|^{-3+\varepsilon}$, $g, \varepsilon > 0$, for $|q|$ large. For instance, this is the case for matter interacting gravitationally; if $k$ is the gravitational constant, $m$ is the particle mass, then $g = k m^2$ and $\varepsilon = 2$.

The probability $P_{regular}$ of "regular configurations", where particles are at distances of order $\rho^{-1/3}$ from their close neighbors, is compared with the probability $P_{collapse}$ of "catastrophic configurations", with the particles at distances $r_0$ from their close neighbors to form a configuration of density $\rho_{cp}/(1+\delta)^3$ almost in close packing (so that $r_0$ is equal to the hard-core radius times $1+\delta$). In the latter case, the system does not fill the available volume and leaves empty a region whose volume is a fraction $\sim ((\rho_{cp} - \rho)/\rho_{cp}) V$ of $V$. Further, it can be checked that the ratio $P_{regular}/P_{collapse}$ tends to 0 at a rate $O(\exp(-\beta g \frac{1}{2} N (\rho_{cp} (1+\delta)^{-3} - \rho)))$ if $\delta$ is small enough (and $\rho < \rho_{cp}$).

A system which is too attractive at infinity will not occupy the available volume but will stay confined in a close-packed configuration even in empty space.

This is important in the theory of stars: stars cannot be expected to obey "regular thermodynamics" and in particular will not "evaporate" because their particles interact via the gravitational force at large distances. Stars do not occupy the whole volume given to them (i.e., the universe); they do not collapse to a point only because the interaction has a strongly repulsive core (even when they are burnt out and the radiation pressure is no longer able to keep them at a reasonable size).

Evaporation Catastrophe

This is another infrared catastrophe, that is, a catastrophe due to the long-range structure of the interactions in the above subsection; it occurs when the potential is too repulsive at $\infty$, that is,

$$\varphi(q) \sim +g|q|^{-3+\varepsilon} \quad \text{as } q \to \infty$$

so that the temperedness condition is again violated.

In addition, in this case, the system does not occupy the whole volume: it will generate a layer of particles sticking, in close-packed configuration, to the walls of the container. Therefore, if the density is lower than the close-packing density, $\rho < \rho_{cp}$, the system will leave a region around the center of the container empty; and the volume of the empty region will still be of the order of the total volume of the box (i.e., its diameter will be a fraction of the box side $L$). The proof is completely analogous to the one of the previous case; except that now the configuration with lowest energy will be the one sticking to the wall and close packed there, rather than the one close packed at the center.

Also this catastrophe is important as it is realized in systems of charged particles bearing the same charge: the charges adhere to the boundary in close-packing configuration, and dispose themselves so that the electrostatic potential energy is minimal. Therefore, charges deposited on a metal will not occupy the whole volume: they will rather form a surface layer minimizing the potential energy (i.e., so that the Coulomb potential in the interior is constant). In general, charges in excess of neutrality do not behave thermodynamically: for instance, besides not occupying the whole volume given to them, they will not contribute normally to the specific heat.

Neutral systems of charges behave thermodynamically if they have hard cores, so that the ultraviolet catastrophe cannot occur or if they obey quantum-mechanical laws and consist of fermionic particles (plus possibly bosonic particles with charges of only one sign).

For more details, we refer the reader to Lieb and Lebowitz (1972) and Lieb and Thirring (2001).

Appendix 2: The Subadditivity Method

A simple consequence of the assumptions is that the exponential in (5.2) can be bounded above by $e^{\beta B N} \exp(-\frac{\beta}{2m} \sum_{i=1}^N p_i^2)$ so that

$$1 \le Z_{gc}(\beta, \lambda, V) \le \exp\Big(V e^{\beta\lambda} e^{\beta B} \sqrt{2\pi m \beta^{-1}}^{\,d}\Big) \;\Longrightarrow\; 0 \le \frac{1}{V} \log Z_{gc}(\beta, \lambda, V) \le e^{\beta\lambda} e^{\beta B} \sqrt{2\pi m \beta^{-1}}^{\,d} \qquad [73]$$

Consider, for simplicity, the case of a hard-core interaction with finite range (cf. [14]). Consider a
sequence of boxes n with sides 2n L0 , where L0 > 0 be the Poisson bracket. Integration by parts, with
is arbitrarily fixed to be > 2R. The partition function periodic boundary conditions, yields
Zgc (, z) relative to the volume n is R
1 NZ A fC; eH gdPdQ
X z hA fC; Hgi
Zn dQeFQ Zc ; ; N
N!
N0 n
1 hfA ; Cgi 75
because the integral over the P variables can be
explicitly performed and included in zN if z is as a general identity. The latter identity implies, for
defined as z = e (2m1 )d=2 . A = {C, H}, that
Then the box n contains 2d boxes n1 for n 1
hfH; Cg fH; Cgi 1 hfC; fH; C ggi 76
and
d
Hence, the Schwartz inequality hA Aih{H, C}
1 Zn Z2n1 exp B2dLn1 =Rd1 22d 74
{H, C}i jh{A , C}ij2 combined with the two
because the corridor of width 2R around the relations in [75], [76] yields Bogoliubovs inequality:
boundaries of the 2d cubes n1 filling n has
jhfA ; Cgij2
volume 2RLn1 2d and contains at most hA Ai 1 77
(Ln1 =R)d1 2d particles, each of which interacts hfC; fC ; Hggi
with at most 2d other particles. Therefore,
Let g, h be arbitrary complex (differentiable)
def functions and @ j = @ qj
pn Ldn log Zn
Ldn1 log Zn1 B
d 2n L0 =Rd1 def
X
N
def
X
N
AQ gqj ; CP; Q pj hqj 78
for some
d > 0. Hence, 0 pn pn1 d 2n j1 j1
for some d > 0 and pn is bounded above and below P1 2
uniformly in n. So, the limit [13] exists on the sequence Then H = 2 pj F(q1 , . . . , qN ), if
Ln = L0 2n and defines a function p1 (, ).
1X X
A box of arbitrary size L can be filled with about Fq1 ; . . . ; qN jqj qj0 j " Wqj
(L=Ln )d boxes of side Ln with n so large that, 2 j6j0 j
prefixed > 0, jp1 pn j < for all n n . Likewise,
a box of size Ln can be filled by about (Ln =L)d so that, via algebra,
boxes of size L if n is large. The latter remarks lead X
us to conclude, by standard inequalities, that the fC; Hg hj @ j F pj pj @ j hj
j
limit in [13] exists and coincides with p1 .
The subadditivity method just demonstrated for def
with hj = h(qj ). If h is real valued, h{C, {C , H}}i
finite-range potentials with hard core can be extended becomes, again via algebra,
to the potentials satisfying just stability and tempered- * +
ness (cf. the section Thermodynamic limit). X
For more details, the reader is referred to Ruelle hj hj0 @ j @ j0 FQ
jj0
(1969) and Gallavotti (1999). * +
X 4X 2
2
" hj Wqj @ j hj
j
j
Appendix 3: An Infrared Inequality
(integrals on pj just replace p2j by 21 and
The infrared inequalities stem from Bogoliubovs
h(pj )i (pj )i0 i = 1 i, i0 ). Therefore, the average
inequality. Consider as an example the problem of
h{C, {C , H}}i becomes
crystallization discussed in the section Continuous
symmetries: no d = 2 crystal theorem. Let h i *
1X
denote average over a canonical equilibrium state hj hj0 2 jqj qj0 j
with Hamiltonian 2 jj0
+
N p2
X X X
H
j
UQ "WQ " h2j Wqj 41 @ j hj 2 79
j1
2 j j
with given temperature P and density parameters Choose g(q) ei(k K) q , h(q) = cos q k and
, , = a3 . Let {X, Y} = j (@pj X @ qj Y @qj X @pj Y) bound (hj hj0 )2 by k 2 (qj qj0 )2 , (@ j hj )2 by k 2 and
h2j by 1. Hence [79] is bounded above by ND(k ) the interior points, in this case on the derivatives of FV
with with respect to , , at 0. The latter are identical to
* ! the averages in [80], [81]. In this way, the constants
def 2 1 1 X 2 B1 , B2 , B0 such that D(k ) k 2 B1 "B2 and B0 > D1
Dk k 4 q qj0 jqj qj0 j
2N j6j0 j are found.
+ For more details, the reader is referred to Mermin
1X (1968).
" jWqj j 80
N j
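For reference, the chain of steps leading from the identities [75] and [76] to Bogoliubov's inequality [77] can be written out explicitly. This is a sketch that assumes the sign conventions $\langle A^*\{C,H\}\rangle = -\beta^{-1}\langle\{A^*,C\}\rangle$ for [75] and $\langle\{H,C\}^*\{H,C\}\rangle = -\beta^{-1}\langle\{C,\{H,C^*\}\}\rangle$ for [76]; the minus signs are not visible in the extracted text and are inferred from integration by parts and from the positivity of the left-hand side.

```latex
\begin{align*}
\langle A^{*}A\rangle\,\langle\{H,C\}^{*}\{H,C\}\rangle
  &\ge \bigl|\langle A^{*}\{H,C\}\rangle\bigr|^{2}
   = \beta^{-2}\,\bigl|\langle\{A^{*},C\}\rangle\bigr|^{2}
  && \text{(Schwarz inequality, then [75])} \\
\langle\{H,C\}^{*}\{H,C\}\rangle
  &= -\beta^{-1}\langle\{C,\{H,C^{*}\}\}\rangle
   = \beta^{-1}\langle\{C,\{C^{*},H\}\}\rangle
  && \text{([76] and antisymmetry of the bracket)} \\
\langle A^{*}A\rangle
  &\ge \beta^{-1}\,
     \frac{\bigl|\langle\{A^{*},C\}\rangle\bigr|^{2}}
          {\langle\{C,\{C^{*},H\}\}\rangle}
  && \text{(dividing the first line by the second: [77])}
\end{align*}
```

The infrared bound then follows by inserting the choices of $A$ and $C$ in [78] and estimating the denominator from above, as done in [79] and [80].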
Landau L and Lifschitz LE (1967) Physique Statistique. Moscow: MIR.
Lanford O and Ruelle D (1969) Observables at infinity and states with short range correlations in statistical mechanics. Communications in Mathematical Physics 13: 194–215.
Lebowitz JL (1974) GHS and other inequalities. Communications in Mathematical Physics 28: 313–321.
Lebowitz JL and Penrose O (1979) Towards a rigorous molecular theory of metastability. In: Montroll EW and Lebowitz JL (eds.) Fluctuation Phenomena. Amsterdam: North-Holland.
Lee TD and Yang CN (1952) Statistical theory of equations of state and phase transitions, II. Lattice gas and Ising model. Physical Review 87: 410–419.
Lieb EH (2002) Inequalities. Berlin: Springer.
Lieb EH and Lebowitz JL (1972) Lectures on the Thermodynamic Limit for Coulomb Systems. In: Lenard A (ed.) Springer Lecture Notes in Physics, vol. 20, pp. 135–161. Berlin: Springer.
Lieb EH and Thirring WE (2001) Stability of Matter from Atoms to Stars. Berlin: Springer.
Mastropietro V (2004) Ising models with four spin interaction at criticality. Communications in Mathematical Physics 244: 595–642.
McCoy BM and Wu TT (1973) The Two Dimensional Ising Model. Cambridge: Harvard University Press.
Mermin ND (1968) Crystalline order in two dimensions. Physical Review 176: 250–254.
Miracle-Sole S (1995) Surface tension, step free energy and facets in the equilibrium crystal shape. Journal of Statistical Physics 79: 183–214.
Olla S (1987) Large deviations for Gibbs random fields. Probability Theory and Related Fields 77: 343–357.
Onsager L (1944) Crystal statistics. I. A two dimensional Ising model with an order–disorder transition. Physical Review 65: 117–149.
Penrose O and Onsager L (1956) Bose–Einstein condensation and liquid helium. Physical Review 104: 576–584.
Pfister C and Velenik Y (1999) Interface, surface tension and reentrant pinning transition in the 2D Ising model. Communications in Mathematical Physics 204: 269–312.
Ruelle D (1969) Statistical Mechanics. New York: Benjamin.
Ruelle D (1971) Extension of the Lee–Yang circle theorem. Physical Review Letters 26: 303–304.
Sinai Ya G (1991) Mathematical Problems of Statistical Mechanics. Singapore: World Scientific.
van Beyeren H (1975) Interphase sharpness in the Ising model. Communications in Mathematical Physics 40: 1–6.
Wilson KG and Fisher ME (1972) Critical exponents in 3.99 dimensions. Physical Review Letters 28: 240–243.
Yang CN (1962) Concept of off-diagonal long-range order and the quantum phases of liquid He and of superconductors. Reviews of Modern Physics 34: 694–704.
analysis, global analysis, the theory of pseudodifferential operators, differential geometry, operator algebras, noncommutative geometry, etc.

Topological Vector Spaces

Most topological spaces one comes across in practice are metric spaces. A metric on a topological space $E$ is a map $d : E \times E \to [0, \infty[$ which is symmetric, such that $d(u, v) = 0 \Leftrightarrow u = v$ and which verifies the triangle inequality $d(u, w) \le d(u, v) + d(v, w)$ for all vectors $u, v, w$. A topological space $E$ is metrizable if there is a metric $d$ on $E$ compatible with the topology on $E$, in which case the balls with radius $1/n$ centered at any point $x \in E$ form a local base at $x$, that is, a collection of neighborhoods of $x$ such that every neighborhood of $x$ contains a member of this collection. A sequence $(u_n)$ in $E$ then converges to $u \in E$ if and only if $d(u_n, u)$ converges to 0.

The Banach fixed-point theorem on a complete metric space $(E, d)$ is a useful tool in nonlinear functional analysis: it states that a (strict) contraction on $E$, that is, a map $T : E \to E$ such that $d(Tu, Tv) \le k\, d(u, v)$ for all $u \ne v \in E$ and fixed $0 < k < 1$, has a unique fixed point $T u_0 = u_0$. In particular, it provides local existence and uniqueness of solutions of differential equations $dy/dt = F(y, t)$ with initial condition $y(0) = y_0$, where $F$ is Lipschitz continuous.

Linear functional analysis starts from topological vector spaces, that is, vector spaces equipped with a topology for which the operations are continuous. A topological vector space equipped with a local base whose members are convex is said to be locally convex. Examples of locally convex spaces are normed linear spaces, namely vector spaces equipped with a norm, a concept that first arose in the work of Fréchet. A seminorm on a vector space $V$ is a map $\rho : V \to [0, \infty[$ which obeys the triangle inequality $\rho(u + v) \le \rho(u) + \rho(v)$ for any vectors $u, v$ and such that $\rho(\lambda u) = |\lambda| \rho(u)$ for any scalar $\lambda$ and any vector $u$; if $\rho(u) = 0 \Rightarrow u = 0$, it is a norm, often denoted by $\|\cdot\|$. A norm on a vector space $E$ gives rise to a translation-invariant distance function $d(u, v) = \|u - v\|$ making it a metric space.

Historically, one of the first examples of normed spaces is the space $C([0, 1])$ investigated by Riesz of (real- or complex-valued) continuous functions on the interval $[0, 1]$ equipped with the supremum norm $\|f\|_\infty := \sup_{x \in [0,1]} |f(x)|$. In the 1920s, the general definition of Banach space arose in connection with the works of Hahn and Banach. A normed linear space is a Banach space if it is complete as a metric space for the induced metric, $C([0, 1])$ being a prototype of a Banach space. More generally, for any non-negative integer $k$, the space $C^k([0, 1])$ of functions on $[0, 1]$ of class $C^k$ equipped with the norm $\|f\|_k = \sum_{i=0}^k \|f^{(i)}\|_\infty$ expressed in terms of a finite number of seminorms $\|f^{(i)}\|_\infty = \sup_{x \in [0,1]} |f^{(i)}(x)|$, $i = 0, \ldots, k$, is also a Banach space.

The space $C^\infty([0, 1])$ of smooth functions on the interval $[0, 1]$ is no longer a Banach space since its topology is described by a countable family of seminorms $\|f\|_k$ with $k$ varying in the positive integers. The metric

$$d(f, g) = \sum_{k=1}^\infty 2^{-k}\, \frac{\|f - g\|_k}{1 + \|f - g\|_k}$$

turns it into a Fréchet space, that is, a locally convex complete metric space. The space $S(R^n)$ of rapidly decreasing functions, which are smooth functions $f$ on $R^n$ for which

$$\|f\|_{\alpha,\beta} := \sup_{x \in R^n} |x^\alpha D_x^\beta f(x)|$$

is finite for any multiindices $\alpha$ and $\beta$, is also a Fréchet space with the topology given by the seminorms $\|\cdot\|_{\alpha,\beta}$. Further examples of Fréchet spaces are the space $C_0^\infty(K)$ of smooth functions with support in a fixed compact subset $K \subset R^n$ equipped with the countable family of seminorms

$$\|D^\alpha f\|_{\infty, K} = \sup_{x \in K} |D_x^\alpha f(x)|, \quad \alpha \in N_0^n$$

and the space $C^\infty(M, E)$ of smooth sections of a vector bundle $E$ over a closed manifold $M$ equipped with a similar countable family of seminorms. Given an open subset $\Omega = \cup_{p \in N} K_p$ with $K_p$, $p \in N$ compact subsets of $R^n$, the space $D(\Omega) = \cup_{p \in N} C_0^\infty(K_p)$ equipped with the inductive limit topology (for which a sequence $(f_n)$ in $D(\Omega)$ converges to $f \in D(\Omega)$ if each $f_n$ has support in some fixed compact subset $K$ and $(D^\alpha f_n)$ converges uniformly to $D^\alpha f$ on $K$ for each multiindex $\alpha$) is a locally convex space.

Among Banach spaces are Hilbert spaces which have properties very similar to those of finite-dimensional spaces and are historically the first type of infinite-dimensional space to appear with the works of Hilbert at the beginning of the twentieth century. A Hilbert space is a Banach space equipped with a norm $\|\cdot\|$ that derives from an inner product, that is, $\|u\|^2 = \langle u, u \rangle$ with $\langle \cdot, \cdot \rangle$ a positive-definite bilinear (or sesquilinear according to whether the base space is real or complex) form. Hilbert spaces are fundamental building blocks in quantum mechanics; using (closed) tensor products, from a Hilbert space $H$ one builds the Fock space $F(H) = \sum_{k=0}^\infty \otimes^k H$ and from there the bosonic Fock space $F(H) = \sum_{k=0}^\infty \otimes_s^k H$ (where $\otimes_s$ stands for the (closed) symmetrized tensor product) as well
as the fermionic Fock space $\mathcal{F}_a(H) = \sum_{k=0}^{\infty} \Lambda^k H$ (where $\Lambda^k$ stands for the antisymmetrized (closed) tensor product).

A prototype of Hilbert space is the space $l^2(\mathbb{Z})$ of complex-valued sequences $(u_n)_{n \in \mathbb{Z}}$ such that $\sum_{n \in \mathbb{Z}} |u_n|^2$ is finite, which is already implicit in Hilbert's Grundzügen. Shortly afterwards, Riesz and Fischer, with the help of the integration tool introduced by Lebesgue, showed that the space $L^2(]0, 1[)$ (first introduced by Riesz) of square-summable functions on the interval $]0, 1[$, that is, functions $f$ such that

$$\|f\|_{L^2} = \left( \int_0^1 |f(x)|^2 \, dx \right)^{1/2}$$

is finite, provides an example of Hilbert space. These were then further generalized to spaces $L^p(]0, 1[)$ of $p$-summable ($1 \le p < \infty$) functions on $]0, 1[$ (i.e., functions $f$ such that

$$\|f\|_{L^p} = \left( \int_0^1 |f(x)|^p \, dx \right)^{1/p}$$

is finite), which are not Hilbert unless $p = 2$ but which provide further examples of Banach spaces, the space $L^\infty(]0, 1[)$ of functions on $]0, 1[$ bounded almost everywhere with respect to the Lebesgue measure offering yet another example of Banach space.

In 1936, Sobolev gave a generalization of the notion of function and their derivatives through integration by parts, which led to the so-called Sobolev spaces $W^{k,p}(]0, 1[)$ of functions $f \in L^p(]0, 1[)$ with derivatives up to order $k$ lying in $L^p(]0, 1[)$, obtained as the closure of $C^\infty(]0, 1[)$ for the norm

$$f \mapsto \|f\|_{W^{k,p}} = \left( \sum_{j=0}^{k} \|\partial^j f\|_{L^p}^p \right)^{1/p}$$

(for $p = 2$, $W^{k,p}(]0, 1[)$ is a Hilbert space often denoted by $H^k(]0, 1[)$). They differ from the Sobolev spaces $W_0^{k,p}(]0, 1[)$, which correspond to the closure of the set $D(]0, 1[)$ for the norm $f \mapsto \|f\|_{W^{k,p}}$; for example, an element $u \in W^{1,p}(]0, 1[)$ lies in $W_0^{1,p}(]0, 1[)$ if and only if it vanishes at 0 and 1, that is, if and only if it satisfies Dirichlet-type boundary conditions on the boundary of the interval. Similarly, one defines Sobolev spaces $W_0^{k,p}(\mathbb{R}) = W^{k,p}(\mathbb{R})$ on $\mathbb{R}$, Sobolev spaces $W^{k,p}(\Omega)$ and $W_0^{k,p}(\Omega)$ on open subsets $\Omega \subset \mathbb{R}^n$ and, using a partition of unity on a closed manifold $M$, Sobolev spaces $H^k(M, E) = W^{k,2}(M, E)$ of sections of vector bundles $E$ over $M$. Using the Fourier transform (discussed later), one can drop the assumption that $k$ be an integer and extend the notion of Sobolev space to define $W^{s,p}(\Omega)$ and $H^s(M, E)$ with $s$ any real number.

Sobolev spaces arise in many areas of mathematics; one central example in probability theory is the Cameron-Martin space $H^1([0, t])$ embedded in the Wiener space $C([0, t])$. This embedding is a particular case of more general Sobolev embedding theorems, which embed (possibly continuously, sometimes even compactly (the notion of compact operator is discussed in a later section)) $W^{k,p}$-Sobolev spaces in $L^q$-spaces with $q > p$, such as the continuous inclusion $W^{k,p}(\mathbb{R}^n) \subset L^q(\mathbb{R}^n)$ with $1/q = 1/p - k/n$, or in $C^l$-spaces with $l \le k$, such as, for a bounded open and regular enough subset $\Omega$ of $\mathbb{R}^n$ and for any $s \ge l + n/p$ with $p > n$, the continuous inclusion $W^{s,p}(\Omega) \subset C^l(\bar{\Omega})$ (the set of functions in $C^l(\Omega)$ such that $D^\alpha u$ can be continuously extended to the closure $\bar{\Omega}$ for all $|\alpha| \le l$). Sobolev embeddings have important applications for the regularity of solutions of partial differential equations, when showing that weak solutions one constructs are in fact smooth. In particular, on an $n$-dimensional closed manifold $M$, for $s > l + n/2$ the Sobolev space $H^s(M, E)$ can be continuously embedded in the space $C^l(M, E)$ of sections of $E$ of class $C^l$, which in particular implies that the solutions of a hypoelliptic partial differential equation $Au = v$ with $v \in L^2(M, E)$ are smooth, as for example in the case of solutions of the Seiberg-Witten equations.

Duality

The concept of duality (in a topological sense) was initiated at the beginning of the twentieth century by Hadamard, who was looking for continuous linear functionals on the Banach space $C(I)$ of continuous functions on a compact interval $I$ equipped with a uniform topology. It is implicit in Hilbert's theory and plays a central part in the work of Riesz, who managed to express such continuous functionals as Stieltjes integrals, one of the starting points for the modern theory of integration.

The topological dual of a topological vector space $E$ is the space $E^*$ of continuous linear forms on $E$ which, when $E$ is a normed space, can be equipped with the dual norm $\|L\|_{E^*} = \sup_{u \in E, \|u\| \le 1} |L(u)|$. Dual spaces often provide a receptacle for singular objects; any of the functions $f \in L^p(\mathbb{R}^n)$ ($p \ge 1$) and the delta-function at a point $x \in \mathbb{R}^n$, $\delta_x : f \mapsto f(x)$, all lie in the space $S'(\mathbb{R}^n)$, dual to $S(\mathbb{R}^n)$, of tempered distributions on $\mathbb{R}^n$, which is itself contained in the space $D'(\mathbb{R}^n)$ of distributions dual to $D(\mathbb{R}^n)$. Furthermore, the topological dual $E^*$ of a nuclear space $E$ contains the support of a probability
measure with characteristic function (see the next section) given by a continuous positive-definite function on $E$. Among nuclear spaces are projective limits $E = \cap_{p \in \mathbb{N}} H_p$ (a sequence $(u_n) \in E$ converges to $u \in E$ whenever it converges to $u$ in each $H_p$) of countably many nested Hilbert spaces $\cdots \subset H_p \subset H_{p-1} \subset \cdots \subset H_0$ such that the embedding $H_p \subset H_{p-1}$ is a trace-class operator (see the section "Operator Algebras"). If $H_p$ is the closure of $E$ for the norm $\|\cdot\|_p$, the topological dual $E'$ of $E$ for the norm $\|\cdot\|_0$ is an inductive limit $E' = \cup_{p \in \mathbb{N}_0} H_{-p}$, where the $H_{-p}$ are the dual (with respect to $\|\cdot\|_0$) Hilbert spaces with norm $\|\cdot\|_{-p}$ (a sequence $(u_n) \in E'$ converges to $u \in E'$ whenever it lies in some $H_{-p}$ and converges to $u$ for the topology of $H_{-p}$) and we have

$$E \subset \cdots \subset H_p \subset H_{p-1} \subset \cdots \subset H_0 = H_0' \subset H_{-1} \subset \cdots \subset H_{-p} \subset \cdots \subset E'$$

As a result of the theory of elliptic operators on a closed manifold, the Fréchet space $C^\infty(M, E)$ of smooth sections of a vector bundle over a closed manifold $M$ is nuclear as the projective limit of countably many Sobolev spaces $H^p(M, E)$, with $L^2$-dual given by the inductive limit of countably many Sobolev spaces $H^{-p}(M, E)$.

The existence of nontrivial continuous linear forms on a normed linear space $E$ is ensured by the Hahn-Banach theorem, which asserts that for any proper closed linear subspace $F$ of $E$, there is a nonvanishing continuous linear form that vanishes on $F$. When the space is a Hilbert space $(H, \langle \cdot, \cdot \rangle_H)$, it follows from the Riesz-Fréchet theorem that any continuous linear form $L$ on $H$ is represented in a unique way by a vector $v \in H$ such that $L(u) = \langle v, u \rangle_H$ for all $u \in H$, thus relating the dual pairing on the left with the Hilbert inner product on the right and identifying the topological dual $H^*$ with $H$.

The strong topology induced by the norm $\|\cdot\|$ on a normed vector space $E$ (that is, the topology in which a sequence $(u_n)$ converges to $u$ whenever $\|u_n - u\| \to 0$) is too fine to have enough compact sets when $E$ is infinite dimensional, since the compactness of the unit ball in $E$ for the strong topology characterizes finite-dimensional spaces. Since compact sets are useful for existence theorems, one is inclined to weaken the topology: the weak topology on $E$ (which coincides with the strong topology when $E$ is finite dimensional and for which a sequence $(u_n)$ converges to $u$ if and only if $L(u_n) \to L(u)$ $\forall L \in E^*$) has compact unit ball if and only if $E$ is reflexive or, in other words, if $E$ can be canonically identified with its double dual $(E^*)^*$. For $1 < p < \infty$, given an open subset $\Omega \subset \mathbb{R}^n$, the topological dual of $L^p(\Omega)$ can be identified via the Riesz representation with $L^{p^*}(\Omega)$ with $p^*$ conjugate to $p$, that is, $1/p + 1/p^* = 1$, and $L^p(\Omega)$ is reflexive, whereas the topological duals of $W^{s,p}(\Omega)$ and $W_0^{s,p}(\Omega)$ both coincide with $W_0^{-s,p^*}(\Omega)$ so that only $W_0^{s,p}(\Omega)$ is reflexive. Neither $L^1(\Omega)$ nor its topological dual $L^\infty(\Omega)$ is reflexive since $L^1(\Omega)$ is strictly contained in the topological dual of $L^\infty(\Omega)$, for there are continuous linear forms $L$ on $L^\infty(\Omega)$ that are not of the form

$$L(u) = \int_\Omega u v \quad \forall u \in L^\infty(\Omega) \text{ with } v \in L^1(\Omega)$$

Similarly, the topological dual $E^*$ of a normed linear space $E$ can be equipped with the topology induced by the dual norm $\|\cdot\|_{E^*}$ and with the weak-* topology, namely the weakest one for which the maps $L \mapsto L(u)$, $u \in E$, are continuous, and the unit ball in $E^*$ is indeed compact for this topology (Banach-Alaoglu theorem).

Duality does not always preserve separability (a topological vector space is separable if it has a countable dense subset) since $L^\infty(\Omega)$, which is not separable, is the topological dual of $L^1(\Omega)$, which is separable. However, as a consequence of the Hahn-Banach theorem, if the topological dual of a Banach space is separable then so is the original space, and one has equivalence when adding the reflexivity assumption; a Banach space is reflexive and separable whenever its topological dual is. For $1 \le p < \infty$, $L^p(\Omega)$ and $W_0^{s,p}(\Omega)$ are separable and moreover reflexive if $p \ne 1$.

Fourier Transform

In the middle of the eighteenth century, oscillations of a vibrating string were interpreted by Bernoulli as a limit case for the oscillation of $n$ point masses when $n$ tends to infinity, and Bernoulli introduced the novel idea of the superposition principle, by which the general oscillation of the string should decompose into a superposition of proper oscillations. This point of view triggered off a discussion as to whether or not an arbitrary function can be expanded as a trigonometric series. Other examples of expansions in orthogonal functions (this terminology actually only appears with Hilbert) had been found in the meantime in relation to oscillation problems and investigations on heat theory, but it was only in the nineteenth century, with the works of Fourier and Dirichlet, that the superposition problem was solved.

Separable Hilbert spaces can be equipped with a countable orthonormal system $\{e_n\}_{n \in \mathbb{Z}}$ ($\langle e_n, e_m \rangle_H = \delta_{mn}$ with $\langle \cdot, \cdot \rangle_H$ the scalar product on $H$) which is
complete, that is, any vector $u \in H$ can be expanded in this system in a unique way $u = \sum_{n \in \mathbb{Z}} \hat{u}_n e_n$ with Fourier coefficients $\hat{u}_n = \langle u, e_n \rangle$. The latter obey Parseval's relation $\sum_{n \in \mathbb{Z}} |\hat{u}_n|^2 = \|u\|^2$ (where $\|\cdot\|$ is the norm associated with $\langle \cdot, \cdot \rangle$), and the Fourier transform $u \mapsto (\hat{u}(n))_{n \in \mathbb{Z}}$ gives rise to an isometric isomorphism between the separable Hilbert space $H$ and the Hilbert space $l^2(\mathbb{Z})$ of square-summable sequences of complex numbers. In particular, the

Fourier transform maps a Gaussian function $x \mapsto e^{-(1/2) \lambda |x|^2}$ on $\mathbb{R}^n$, where $\lambda$ is a nonzero scalar, to another Gaussian function $\xi \mapsto e^{-(1/2) \lambda^{-1} |\xi|^2}$ (up to a nonzero multiplicative factor), a starting point for T-duality in string theory. More generally, the characteristic function

$$\hat{\mu}(\xi) := \int_H e^{i \langle x, \xi \rangle_H} \, \mu(dx)$$
continuous solutions $\varphi = (I - A)^{-1} f$ of the equation $f = (I - A)\varphi$ for $f \in C([0, 1])$, Fredholm in 1900 (Sur une classe d'équations fonctionnelles) studied the equation $f = (I - \lambda A)\varphi$, introducing a complex parameter $\lambda$. He proved what is since then called the Fredholm alternative, which states that either the equation $f = (I - \lambda A)\varphi$ has a unique solution for every $f \in C([0, 1])$ or the corresponding homogeneous equation $(I - \lambda A)\varphi = 0$ has nontrivial solutions. In modern language, it means that the resolvent $R(A, \lambda) = (A$

bounded. Unbounded operators arise in partial differential equations that involve differential operators such as the Laplacian $\Delta$ on an open subset $\Omega \subset \mathbb{R}^n$. The following equations provide fundamental examples of partial differential equations which arose over time from the study of various problems in mathematical physics with the works of Poisson, Fourier, and Cauchy:

$$\Delta u = 0 \quad \text{(Laplace equation)}$$
of $U_t$, $t \in \mathbb{R}$, which in a compact form reads $U_t = e^{itA}$. An important example in quantum mechanics is $U_t = e^{itH} U_0$, $t \in \mathbb{R}$, with $H$ a self-adjoint Hamiltonian, which solves the Schrödinger equation $\frac{d}{dt} u = iHu$. The Lie-Trotter formula, which has important applications for Feynman path integrals, expresses the unitary semigroup generated by $A + B$, where $A$, $B$, and $A + B$ are self-adjoint on their respective domains, as a strong limit

$$e^{it(A + B)} = \lim_{n \to \infty} \left( e^{itA/n} e^{itB/n} \right)^n$$

On the other hand, positive operators on a Hilbert space $(H, \langle \cdot, \cdot \rangle_H)$ (that is, $A$ self-adjoint and such that $\langle Au, u \rangle_H \ge 0$ $\forall u \in D(A)$) generate one-parameter semigroups $T_t = e^{-tA}$, $t \ge 0$. Hille and Yosida proved that on a Hilbert space, strongly continuous contraction (i.e., $\|T_t\| \le 1$ $\forall t > 0$) semigroups such that $T_0 = \mathrm{Id}$ are in one-to-one correspondence with densely defined positive operators $A : D(A) \subset H \to H$ that are maximal (i.e., $I + A$ is onto), obtained as (minus the) infinitesimal generators

$$-Au = \lim_{t \to 0} \frac{T_t u - u}{t}, \quad u \in D(A)$$

of the corresponding semigroups. Similarly, a positive densely defined self-adjoint operator $A$ on a Hilbert space $H$ gives rise to a densely defined closed symmetric sesquilinear form $(u, v) \mapsto \langle \sqrt{A} u, \sqrt{A} v \rangle_H$ (see next section for a definition of $\sqrt{A}$; $\langle \cdot, \cdot \rangle_H$ is the scalar product on $H$) and this map yields a one-to-one correspondence between operators and sesquilinear forms on $H$ with the aforementioned properties, one of the starting points for the theory of Dirichlet forms. To a probability measure $\mu$ on a separable Banach space $E$, one can associate a densely defined closed symmetric sesquilinear form (it is in fact a Dirichlet form) on a Hilbert space $H$

and defines a smoothing operator, an operator that maps Sobolev functions to smooth functions. In general, a pseudodifferential operator $A$ on an open subset $U$ of $\mathbb{R}^n$ with symbol $\sigma_A$ only has a distribution kernel

$$K_A(x, y) = \int_{\mathbb{R}^n} e^{i \langle x - y, \xi \rangle} \sigma_A(x, \xi) \, d\xi$$

The kernel of the inverse Laplacian $(\Delta + m^2)^{-1}$ on $\mathbb{R}^n$ (the non-negative real number $m^2$ stands for the mass), called Green's function on $\mathbb{R}^n$, plays an essential role in the theory of Feynman graphs.

Spectral Theory

Spectral theory is the study of the distribution of the values of the complex parameter $\lambda$ for which, given a linear operator $A$ on a normed space $E$, the operator $A - \lambda I$ has an inverse, and of the properties of this inverse when it exists, the resolvent $R(A, \lambda) = (A - \lambda I)^{-1}$ of $A$. The resolvent set $\rho(A)$ of $A$ is the set of complex numbers $\lambda$ for which $A - \lambda I$ is invertible with densely defined bounded inverse. The spectrum $\mathrm{Sp}(A)$ of $A$ is the complement in $\mathbb{C}$ of the resolvent set; it consists of a union of three disjoint sets: the set of all complex numbers $\lambda$ for which $A - \lambda I$ is not injective, called the point spectrum (such a $\lambda$ is an eigenvalue of $A$ with associated eigenfunction any $u \in D(A)$ such that $Au = \lambda u$); the set of points $\lambda$ for which $A - \lambda I$ has a densely defined unbounded inverse $R(A, \lambda)$, called the continuous spectrum; and the set of points $\lambda$ for which $A - \lambda I$ has a well-defined unbounded but not densely defined inverse $R(A, \lambda)$, called the residual spectrum.

A bounded operator has bounded spectrum and a self-adjoint operator $A$ acting on a Hilbert space has real spectrum and no residual spectrum since the range of $A - \lambda I$ is dense. As a consequence of the
Fredholm alternative, the spectrum of a compact operator consists only of point spectrum; it is countable, with 0 as its only possible accumulation point. A Hamiltonian of a quantum mechanical system can have both point and continuous spectra, but its point spectrum is of special interest because the corresponding eigenfunctions are stationary states of the system. As was first pointed out by Kac ("Can you hear the shape of a drum?"), the spectrum of an operator acting on functions can reflect the geometry of the space these functions are defined on, a starting point for many interesting and far-reaching questions in differential geometry.

A self-adjoint linear operator on a Hilbert space can be described in terms of a family of projections $E_\lambda$, $\lambda \in \mathbb{R}$, via the spectral representation

$$A = \int_{\mathrm{Sp}(A)} \lambda \, dE_\lambda$$

Given a Borel real-valued function $f$ on $\mathbb{R}$, the operator

$$f(A) = \int_{\mathrm{Sp}(A)} f(\lambda) \, dE_\lambda$$

yields another self-adjoint operator. A positive operator $A$ on a dense domain $D(A)$ of some Hilbert space $(H, \langle \cdot, \cdot \rangle_H)$ has non-negative spectrum and for any positive real number $t$, the map $\lambda \mapsto e^{-t\lambda}$ gives the associated bounded heat-operator

$$e^{-tA} = \int_{\mathrm{Sp}(A)} e^{-t\lambda} \, dE_\lambda$$

while the map $\lambda \mapsto \sqrt{\lambda}$ gives rise to a positive operator $\sqrt{A}$ such that $(\sqrt{A})^2 = A$.

The resolvent can also be used to define new operators

$$f(A) = \frac{1}{2 i \pi} \int_C f(\lambda) R(A, \lambda) \, d\lambda$$

from a linear operator via a Cauchy-type integral along a contour $C$ around the spectrum; this way one defines complex powers $A^z$ of (essentially self-adjoint) positive elliptic pseudodifferential operators which enter the definition of the zeta-function, $z \mapsto \zeta(A, z)$, of the operator $A$. The $\zeta$-function is a useful tool to extend the ordinary determinant to $\zeta$-determinants of self-adjoint elliptic operators, thereby providing an ansatz to give a meaning to partition functions in the path integral approach to quantum field theory.

Operator Algebras

Bounded linear operators on a Hilbert space $H$ form an algebra $L(H)$ closed for the operator norm with involution given by the adjoint operation $A \mapsto A^*$; it is a $C^*$-algebra, that is, an algebra $\mathcal{A}$ over $\mathbb{C}$ with a norm $\|\cdot\|$ and an involution $*$ such that $\mathcal{A}$ is closed for this norm and such that $\|ab\| \le \|a\| \|b\|$ and $\|a^* a\| = \|a\|^2$ for all $a, b \in \mathcal{A}$, and by the Gelfand-Naimark theorem, every $C^*$-algebra is isomorphic to a sub-$C^*$-algebra of some $L(H)$. The notion of spectrum extends from bounded operators to $C^*$-algebras; the spectrum $\mathrm{sp}(a)$ of an element $a$ in a $C^*$-algebra $\mathcal{A}$ is a (compact) set of complex numbers $\lambda$ such that $a - \lambda 1$ is not invertible. The notion of self-adjointness also extends ($a = a^*$), and just as a self-adjoint operator $B \in L(H)$ is non-negative (in which case its spectrum lies in $\mathbb{R}^+$) if and only if $B = A^* A$ for some bounded operator $A$, an element $b \in \mathcal{A}$ is said to be non-negative if and only if $b = a^* a$ for some $a \in \mathcal{A}$, in which case $\mathrm{sp}(b) \subset [0, \infty[$.

The algebra $C(X)$ of continuous functions $f : X \to \mathbb{C}$ vanishing at infinity on some locally compact Hausdorff space $X$, equipped with the supremum norm and the conjugation $f \mapsto \bar{f}$, is also a $C^*$-algebra and a prototype for abelian $C^*$-algebras, since Gelfand showed that every abelian $C^*$-algebra is isometrically isomorphic to $C(X)$, with $X$ compact if the algebra is unital. To a $C^*$-algebra $\mathcal{A}$, one can associate an abelian group $K_0(\mathcal{A})$ which is dual to the Grothendieck group $K^0(X)$ of isomorphism classes of vector bundles over a compact Hausdorff space $X$.

Compact operators on a Hilbert space $H$ form the only proper two-sided ideal $K(H)$ of the $C^*$-algebra $L(H)$ which is closed for the operator norm topology on $L(H)$. The quotient $L(H)/K(H)$ is called the Calkin algebra, after Calkin, who classified all two-sided ideals in $L(H)$ for a separable Hilbert space $H$; one can set up a one-to-one correspondence between such ideals and certain sequence spaces. Corresponding to the Banach space $l^1(\mathbb{Z})$ of complex-valued sequences $(u_n)$ such that $\sum_{n \in \mathbb{Z}} |u_n| < \infty$ is the $*$-ideal $\mathcal{I}_1(H)$ of trace-class operators. The trace $\mathrm{tr}(A) = \sum_{n \in \mathbb{Z}} \langle A e_n, e_n \rangle_H$ of a non-negative operator $A \in L(H)$ lies in $[0, +\infty]$ and is independent of the choice of the complete orthonormal basis $\{e_n, n \in \mathbb{Z}\}$ of $H$ equipped with the inner product $\langle \cdot, \cdot \rangle_H$. $\mathcal{I}_1(H)$ is the Banach space of bounded linear operators on $H$ such that $\|A\|_1 = \mathrm{tr}(|A|)$ is bounded. Given an (essentially self-adjoint) positive differential operator $D$ of order $d$ acting on smooth functions on a closed $n$-dimensional Riemannian manifold $M$, its complex power $D^{-z}$ is trace class on the space of $L^2$-functions on $M$ provided $\mathrm{Re}(z) > n/d$, and the corresponding trace $\mathrm{tr}(D^{-z})$ extends to a meromorphic function on the whole plane, the $\zeta$-function $\zeta(D, z)$, which is holomorphic at 0.
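The basis independence of the trace $\sum_n \langle A e_n, e_n \rangle$ can be checked numerically in finite dimensions. The following is only a minimal sketch under stated assumptions: the space is $\mathbb{R}^3$ rather than an infinite-dimensional Hilbert space, the matrix `A` and the rotation angle are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import math

def apply(A, v):
    """Matrix-vector product for a square matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def inner(u, v):
    """Real inner product <u, v> on R^3."""
    return sum(a * b for a, b in zip(u, v))

def trace_in_basis(A, basis):
    """Sum of <A e_n, e_n> over the vectors e_n of an orthonormal basis."""
    return sum(inner(apply(A, e), e) for e in basis)

def rotated_basis(angle):
    """Orthonormal basis of R^3 obtained by rotating e_1 and e_2 about e_3."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

# Illustrative (assumed) symmetric non-negative matrix on R^3.
A = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 0.0],
     [0.0, 0.0, 0.5]]

standard = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t_standard = trace_in_basis(A, standard)
t_rotated = trace_in_basis(A, rotated_basis(0.7))
print(t_standard, t_rotated)  # the two values agree up to rounding
```

In the rotated basis the two off-diagonal contributions cancel, so both sums reduce to the sum of the diagonal entries of `A`.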
96 Introductory Article: Minkowski Spacetime and Special Relativity
More generally, Banach spaces $l^p(\mathbb{Z})$, $1 \le p < \infty$, of complex-valued sequences $(u_n)_{n \in \mathbb{Z}}$ such that $\sum_{n \in \mathbb{Z}} |u_n|^p < \infty$ relate to Schatten ideals $\mathcal{I}_p(H)$, $1 \le p < \infty$, where $\mathcal{I}_p(H)$ is the Banach space of bounded linear operators on $H$ such that $\|A\|_p = (\mathrm{tr}(|A|^p))^{1/p}$ is bounded. Just as all $l^p$-sequences converge to 0, the Schatten ideals $\mathcal{I}_p(H)$ all lie in $K(H)$ and we have $\mathcal{I}_{p-1}(H) \subset \mathcal{I}_p(H) \subset K(H)$.

Compact operators and Schatten ideals are useful to extend index theory to a noncommutative context; a Fredholm module $(H, F)$ over an involutive algebra $\mathcal{A}$ is given by an involutive representation $\pi$ of $\mathcal{A}$ in a Hilbert space $H$ and a self-adjoint bounded linear operator $F$ on $H$ such that $F^2 = \mathrm{Id}_H$ and the operator brackets $[F, \pi(a)]$ are compact for all $a \in \mathcal{A}$. To a $p$-summable Fredholm module $(H, F)$, that is, $[F, \pi(a)] \in \mathcal{I}_p(H)$ for all $a \in \mathcal{A}$, one associates a representative of the Chern character $\mathrm{ch}^*(H, F)$ given by a cyclic cocycle on $\mathcal{A}$, which pairs up with K-theory to build an integer-valued index map on K-theory.

Schatten ideals are also useful to investigate the geometry of infinite-dimensional spaces such as loop groups, for which the Hilbert-Schmidt operators (operators in $\mathcal{I}_2(H)$ are also called Hilbert-Schmidt operators) are particularly useful. A Hölder-type inequality shows that the product of two Hilbert-Schmidt operators is trace-class. Moreover, for any two Hilbert-Schmidt operators $A$ and $B$, the cyclicity property $\mathrm{tr}(AB) = \mathrm{tr}(BA)$ holds, and the sesquilinear form $(A, B) \mapsto \mathrm{tr}(A B^*)$ makes $\mathcal{I}_2(H)$ a Hilbert space.

Further Reading

Adams R (1975) Sobolev Spaces. London: Academic Press.
Dunford N and Schwartz J (1971) Linear Operators. Part I. General Theory. Part II. Spectral Theory. Part III. Spectral Operators. New York: Wiley.
Hille E (1972) Methods in Classical and Functional Analysis. London: Academic Press and Addison-Wesley.
Kato T (1982) A Short Introduction to Perturbation Theory for Linear Operators. New York-Berlin: Springer.
Reed M and Simon B (1980) Methods of Modern Mathematical Physics, vols. I-IV, 2nd edn. New York: Academic Press.
Riesz F and Sz.-Nagy B (1968) Leçons d'analyse fonctionnelle. Paris: Gauthier-Villars; Budapest: Akadémiai Kiadó.
Rudin W (1994) Functional Analysis, 2nd edn. New York: International Series in Pure and Applied Mathematics.
Yosida K (1980) Functional Analysis, 6th edn. Die Grundlehren der Mathematischen Wissenschaften in Einzeldarstellungen, Band 132. Berlin-New York: Springer.
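[Figure: labels "Timelike", "Null", "Spacelike", and $C_N$.]

1. (orthogonality) $\Lambda^T \eta \Lambda = \eta$, where $T$ means transpose and

$$\eta = (\eta_{ab}) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}$$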
very particular type of observer. Specifically, our admissible observers preside over three-dimensional, right-handed, Cartesian spatial coordinate systems, relative to which photons always move along straight lines in any direction. With a single clock located at the origin, such an observer can determine the speed, $c$, of light in vacuo by the so-called Fizeau procedure (emit a photon from the origin when the clock there reads $t_1$, bounce it back from a mirror located at $(x^1, x^2, x^3)$, receive the photon at the origin again when the clock there reads $t_2$ and set $c = 2 \sqrt{(x^1)^2 + (x^2)^2 + (x^3)^2} / (t_2 - t_1)$). Now place an identical clock at each spatial point and synchronize them by emitting from the origin a spherical electromagnetic wave (photons in all directions) and setting the clock whose location is $(x^1, x^2, x^3)$ to read $\sqrt{(x^1)^2 + (x^2)^2 + (x^3)^2} / c$ at the instant the wave arrives. An observer now assigns to an event the three spatial coordinates of the location at which it occurred in his coordinate system as well as the time reading on the clock at that location at the instant the event occurred. We shall assume also that our admissible observers are inertial in the sense of Newtonian mechanics (the trajectory of a particle on which no forces act, when described in terms of the coordinates just introduced, is a point or a straight line traversed at constant speed). It is an experimental fact (and quite a remarkable one) that all of these admissible observers (whether or not they are in relative motion) agree on the numerical value of the speed of light in vacuo ($c \approx 3.00 \times 10^{10}$ cm s$^{-1}$). We shall exploit this fact at the outset to have all of our admissible observers measure time in units of distance by simply multiplying their time coordinates $t$ by $c$. The resulting time coordinate is denoted $x^4 = ct$. In these units all speeds are dimensionless and the speed of light in vacuo is 1.

In our mathematical model $\mathcal{M}$ of the world of events, this very subtle and complex notion of an admissible observer is fully identified with the conceptually very simple notion of an admissible basis $\{e_1, e_2, e_3, e_4\}$. If $x \in \mathcal{M}$ is an event and if we write $x = x^a e_a$, then $(x^1, x^2, x^3)$ are the spatial and $x^4$ is the time coordinate supplied for $x$ by the corresponding observer. If $\{\hat{e}_1, \hat{e}_2, \hat{e}_3, \hat{e}_4\}$ is another basis/observer related to $\{e_1, e_2, e_3, e_4\}$ by [1] and if we write $x = \hat{x}^a \hat{e}_a$, then

$$\hat{x}^a = \Lambda^a{}_b \, x^b, \quad a = 1, 2, 3, 4 \qquad [2]$$

Thus, Lorentz transformations relate the space and time coordinates supplied for any given event by two admissible observers. If $(\Lambda^a{}_b) \in \mathcal{R}$, then the two observers differ only in the orientation of their spatial coordinate axes. On the other hand, for any real number $\theta$ one can define an element $L(\theta)$ of $\mathcal{L}$ by

$$L(\theta) = \begin{pmatrix} \cosh\theta & 0 & 0 & -\sinh\theta \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ -\sinh\theta & 0 & 0 & \cosh\theta \end{pmatrix} \qquad [3]$$

and, if two admissible bases are related by this Lorentz transformation, then the coordinate transformation [2] becomes

$$\begin{aligned} \hat{x}^1 &= (\cosh\theta)\, x^1 - (\sinh\theta)\, x^4 \\ \hat{x}^2 &= x^2 \\ \hat{x}^3 &= x^3 \\ \hat{x}^4 &= -(\sinh\theta)\, x^1 + (\cosh\theta)\, x^4 \end{aligned} \qquad [4]$$

Letting $\beta = \tanh\theta$ (so that $-1 < \beta < 1$) and suppressing $\hat{x}^2 = x^2$ and $\hat{x}^3 = x^3$, one obtains

$$\begin{aligned} \hat{x}^1 &= \frac{1}{\sqrt{1 - \beta^2}}\, x^1 - \frac{\beta}{\sqrt{1 - \beta^2}}\, x^4 \\ \hat{x}^4 &= -\frac{\beta}{\sqrt{1 - \beta^2}}\, x^1 + \frac{1}{\sqrt{1 - \beta^2}}\, x^4 \end{aligned} \qquad [5]$$

This corresponds to two observers whose spatial axes are oriented as shown in Figure 2 with the hatted coordinate system moving along the common $x^1$-, $\hat{x}^1$-axis with speed $|\beta|$, to the right if $\beta > 0$ and to the left if $\beta < 0$.

We remark that, reverting to traditional time units, $\beta = v/c$, where $|v|$ is the relative speed of the two coordinate systems, and [5] becomes what is generally referred to as a Lorentz transformation in elementary expositions of special relativity, that is,

$$\begin{aligned} \hat{x}^1 &= \frac{x^1 - vt}{\sqrt{1 - v^2/c^2}} \\ \hat{t} &= \frac{t - (v/c^2)\, x^1}{\sqrt{1 - v^2/c^2}} \end{aligned} \qquad [6]$$

Figure 2 Observers in standard configuration.
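The relations [3] to [5] can be checked numerically. The sketch below is illustrative only: the value of `theta`, the sample event `x`, and all helper names are assumptions made for the example. It verifies the orthogonality condition $L^T \eta L = \eta$ for $\eta = \mathrm{diag}(1, 1, 1, -1)$, compares the matrix form [3] with the $\beta$ form [5], and confirms that the interval $(x^1)^2 + (x^2)^2 + (x^3)^2 - (x^4)^2$ is unchanged.

```python
import math

# Metric eta = diag(1, 1, 1, -1), with x^4 = ct as the time coordinate.
ETA = [[1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, -1.0]]

def boost(theta):
    """Special Lorentz transformation L(theta) as in equation [3]."""
    ch, sh = math.cosh(theta), math.sinh(theta)
    return [[ch, 0.0, 0.0, -sh],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [-sh, 0.0, 0.0, ch]]

def matmul(A, B):
    """Product of two 4x4 matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def apply(A, x):
    """Coordinate transformation [2]: x_hat^a = A^a_b x^b."""
    return [sum(A[i][k] * x[k] for k in range(4)) for i in range(4)]

def interval(x):
    """Lorentz inner product g(x, x) in an admissible basis."""
    return x[0] ** 2 + x[1] ** 2 + x[2] ** 2 - x[3] ** 2

theta = 0.5                      # assumed sample value
L = boost(theta)

# Orthogonality: L^T eta L should reproduce eta entry by entry.
LtEL = matmul(transpose(L), matmul(ETA, L))
orthogonality_error = max(abs(LtEL[i][j] - ETA[i][j])
                          for i in range(4) for j in range(4))

x = [1.0, 2.0, 3.0, 4.0]         # assumed sample event (x^1, x^2, x^3, x^4)
x_hat = apply(L, x)

# Same transformation written with beta = tanh(theta), as in equation [5].
beta = math.tanh(theta)
gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
x1_hat_eq5 = gamma * x[0] - beta * gamma * x[3]
x4_hat_eq5 = -beta * gamma * x[0] + gamma * x[3]

print(orthogonality_error, interval(x), interval(x_hat))
```

The agreement between `x_hat` and the two `*_eq5` values reflects the identities $\cosh\theta = 1/\sqrt{1 - \beta^2}$ and $\sinh\theta = \beta/\sqrt{1 - \beta^2}$.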
There is a sense in which, to understand the kinematic effects of special relativity, it is enough to restrict one's attention to the so-called special Lorentz transformations $L(\theta)$. Specifically, one can show (Naber 1992, theorem 1.3.5) that if $\Lambda \in \mathcal{L}$ is any Lorentz transformation, then there exists a real number $\theta$ and two rotations $R_1, R_2 \in \mathcal{R}$ such that $\Lambda = R_1 L(\theta) R_2$. Since $R_1$ and $R_2$ involve no relative motion, all of the kinematics is contained in $L(\theta)$. We shall explore these kinematic effects in more detail shortly.

Now suppose that $x$ and $x_0$ are two distinct events in $\mathcal{M}$ and consider the displacement vector $x - x_0$ from $x_0$ to $x$. If $\{e_1, e_2, e_3, e_4\}$ is an admissible basis and if we write $x = x^a e_a$ and $x_0 = x_0^a e_a$, then $x - x_0 = (x^a - x_0^a) e_a = \Delta x^a e_a$. If $x - x_0$ is null, then

$$(\Delta x^1)^2 + (\Delta x^2)^2 + (\Delta x^3)^2 = (\Delta x^4)^2$$

so the spatial separation of the two events is equal to the distance light would travel during the time lapse between the events. The same must be true in any other admissible basis since Lorentz transformations are the matrices of linear maps that preserve the Lorentz inner product. Consequently, all admissible observers agree that $x_0$ and $x$ are connectible by a photon. They even agree as to which of the two events is to be regarded as the emission of the photon and which is to be regarded as its reception since one can show (Naber 1992, theorem 1.3.3) that, when a vector is either timelike or null and nonzero, the sign of its fourth coordinate is the same in every admissible basis (because $\Lambda^4{}_4 \ge 1$). Thus, $\Delta x^4 = x^4 - x_0^4$ is either positive for all admissible observers ($x_0$ occurred before $x$) or negative for all admissible observers ($x_0$ occurred after $x$). Since photons move along straight lines in admissible coordinate systems we adopt the following terminology. If $x_0, x \in \mathcal{M}$ are such that $x - x_0$ is null, then the straight line in $\mathcal{M}$ containing $x_0$ and $x$ is called the world line of a photon in $\mathcal{M}$ and is to be thought of as the set of all events in the history of some particle of light that experiences both $x_0$ and $x$.

Let us now suppose instead that $x - x_0$ is timelike. Then, in any admissible basis,

$$(\Delta x^1)^2 + (\Delta x^2)^2 + (\Delta x^3)^2 < (\Delta x^4)^2$$

so the spatial separation of $x_0$ and $x$ is less than the distance light would travel during the time lapse between the events. In this case, one can prove (Naber 1992, section 1.4) that there exists an admissible basis $\{\hat{e}_1, \hat{e}_2, \hat{e}_3, \hat{e}_4\}$ in which $\Delta\hat{x}^1 = \Delta\hat{x}^2 = \Delta\hat{x}^3 = 0$, that is, there is an admissible observer for whom the two events occur at the same spatial location, one after the other. Thinking of this location as occupied by some material object (e.g., the observer's clock situated at that point) we find that the events $x_0$ and $x$ are both experienced by this material particle and that, moreover, $\sqrt{|g(x - x_0, x - x_0)|}$ is just the time lapse between the events recorded by a clock carried along by this material particle. To any other admissible observer this material particle appears free (not subject to forces) because it moves on a straight line with constant speed. This leads us to the following definitions. If $x_0, x \in \mathcal{M}$ are such that $x - x_0$ is timelike, then the straight line in $\mathcal{M}$ containing $x_0$ and $x$ is called the world line of a free material particle in $\mathcal{M}$ and $\sqrt{|g(x - x_0, x - x_0)|}$, usually written $\tau(x - x_0)$, or simply $\Delta\tau$, is the proper time separation of $x_0$ and $x$.

One can think of $\tau(x - x_0)$ as a sort of length for $x - x_0$ measured, however, by a clock carried along by a free material particle that experiences both $x_0$ and $x$. It is an odd sort of length, however, since it satisfies not the usual triangle inequality, but the following reversed version.

Reversed triangle inequality (Naber 1992, theorem 1.4.2) Let $x_0$, $x$ and $y$ be events in $\mathcal{M}$ for which $y - x$ and $x - x_0$ are timelike with the same time orientation. Then $y - x_0 = (y - x) + (x - x_0)$ is timelike and

$$\tau(y - x_0) \ge \tau(y - x) + \tau(x - x_0) \qquad [7]$$

with equality holding if and only if $y - x$ and $x - x_0$ are linearly dependent.

The sense of the inequality in [7] has interesting consequences about which we will have more to say shortly.

Finally, let us suppose that $x - x_0$ is spacelike. Then, in any admissible basis

$$(\Delta x^1)^2 + (\Delta x^2)^2 + (\Delta x^3)^2 > (\Delta x^4)^2$$

so the spatial separation of $x_0$ and $x$ is greater than the distance light could travel during the time lapse that separates them. There is clearly no admissible observer for whom the events occur at the same location. No free material particle (or even photon) can experience both $x_0$ and $x$. However, one can show (Naber 1992, section 1.5) that, given any real number $T$ (positive, negative, or zero), one can find an admissible basis $\{\hat{e}_1, \hat{e}_2, \hat{e}_3, \hat{e}_4\}$ in which $\Delta\hat{x}^4 = T$. Some admissible observers will judge the events simultaneous, some will assert that $x_0$ occurred before $x$, and others will reverse the order. Temporal order, cause and effect, have no meaning for such pairs of events. For those admissible observers for whom the events are simultaneous ($\Delta\hat{x}^4 = 0$), the quantity $\sqrt{g(x - x_0, x - x_0)}$ is the distance between them and for this reason this quantity is called the proper spatial separation of $x_0$ and $x$ (whenever $x - x_0$ is spacelike).
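The three cases above, and the reversed triangle inequality [7], can be illustrated with a short computation. This is a minimal sketch, not part of the original exposition: the sample events, the exact-zero test for the null case (safe here only because the coordinates are small integers), and the function names are all assumptions chosen for the example. Coordinates are ordered $(x^1, x^2, x^3, x^4)$ with $x^4 = ct$.

```python
import math

def g(u, v):
    """Lorentz inner product in an admissible basis, signature (+, +, +, -)."""
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2] - u[3] * v[3]

def displacement(x0, x):
    """Displacement vector x - x0."""
    return [a - b for a, b in zip(x, x0)]

def classify(x0, x):
    """Return 'timelike', 'null' or 'spacelike' for the displacement x - x0."""
    d = displacement(x0, x)
    q = g(d, d)
    if q < 0:
        return "timelike"
    if q == 0:
        return "null"
    return "spacelike"

def tau(x0, x):
    """Proper time separation sqrt(|g(x - x0, x - x0)|) for timelike x - x0."""
    d = displacement(x0, x)
    return math.sqrt(abs(g(d, d)))

# Assumed sample events with integer coordinates.
x0 = [0, 0, 0, 0]
x = [1, 0, 0, 2]       # x - x0 is timelike and future-directed
y = [0, 1, 0, 5]       # y - x is timelike and future-directed

kinds = (classify(x0, x), classify(x0, [3, 4, 0, 5]), classify(x0, [2, 0, 0, 1]))

# Reversed triangle inequality [7]: the direct path has the longest proper time.
direct = tau(x0, y)
via_x = tau(x, y) + tau(x0, x)
print(kinds, direct, via_x)
```

Here `direct` is $\sqrt{24}$ while `via_x` is $\sqrt{7} + \sqrt{3}$, so the inequality is strict, consistent with $y - x$ and $x - x_0$ being linearly independent.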
For any two events $x_0, x \in \mathcal{M}$, $g(x - x_0, x - x_0)$ is given in any admissible basis by $(\Delta x^1)^2 + (\Delta x^2)^2 + (\Delta x^3)^2 - (\Delta x^4)^2$ and is called the interval separating $x_0$ and $x$. It is the closest analog in Minkowskian geometry to the (squared) length in Euclidean geometry. It can, however, assume any real value depending on the physical relationship between the events $x_0$ and $x$. Historically, of course, it was the various physical interpretations of this interval that we have just described which led Minkowski (Einstein et al. 1958) to the introduction of the structure that bears his name.

Kinematic Effects

All of the well-known kinematic effects of special relativity (the addition of velocities formula, the relativity of simultaneity, time dilation, and length contraction) follow easily from what we have done. Because it eases visualization and because, as we mentioned earlier, it suffices to do so, we will limit our discussion to the special Lorentz transformations.

is unnecessary, but makes the pictures easier to draw). The $\hat{x}^1$-axis will be represented by the straight line $\hat{x}^4 = 0$ which, from [5], is given by $x^4 = \beta x^1$ (in Figure 3 we have assumed that $\beta > 0$). Similarly, the $\hat{x}^4$-axis is identified with the line $x^4 = (1/\beta) x^1$. Since Lorentz transformations leave the Lorentz inner product invariant, the hyperbolas $(x^1)^2 - (x^4)^2 = k$ coincide with $(\hat{x}^1)^2 - (\hat{x}^4)^2 = k$ and we calibrate the axes accordingly, for example, the branch of $(x^1)^2 - (x^4)^2 = 1$ with $x^1 > 0$ intersects the $x^1$-axis at the point $(x^1, x^4) = (1, 0)$ and intersects the $\hat{x}^1$-axis at the point $(\hat{x}^1, \hat{x}^4) = (1, 0)$. This necessitates a different scale on the hatted and unhatted axes, but one can show (Naber 1992, section 1.3) that, with this calibration, all coordinates can be obtained geometrically by projecting parallel to the opposite axis (e.g., the $x^4$- and $\hat{x}^4$-coordinates of an event result from projecting parallel to the $x^1$- and $\hat{x}^1$-axes, respectively). Thus, a line of simultaneity in the hatted (respectively, unhatted) coordinates is parallel to the $\hat{x}^1$- (respectively, $x^1$-) axis so that, in general, a pair of events lying on one will not lie on the other
Let 1 and 2 be two real numbers and consider (note, however, that these lines are really three-
the corresponding elements L(1 ) and L(2 ) of dimensional hyperplanes so what appears to be a
L defined by [3]. Sum formulas for sinh and point of intersection is actually a two-dimensional
cosh imply that L(1 )L(2 ) = L(1 2 ). Defining plane of agreement, any two events in which are
i = tanh i , i = 1, 2, and = tanh (1 2 ), the sum judged simultaneous by both observers).
formula for tanh then gives For any two events whatsoever the relationship
1 2 between the time lapse ^ x4 in the hatted coordinates
8 4
and the time lapse x in the unhatted coordinates is,
1 1 2
from [5],
The physical interpretation is simple. One has three
1
admissible observers whose spatial axes are related x4 p x1 p x4
^
in the manner shown in Figure 2. If the speed of the 1 2 1 2
second relative to the first is 1 and the speed of the so the two are generally not equal. Consider, in
third relative to the second is 2 , then the speed of particular, two events on the world line of a point
the third relative to the first is not 1 2 as a at rest in the unhatted coordinate system, for
Newtonian predisposition would lead one to expect,
but rather , given by [8]. This is the relativistic
addition of velocities formula.
We have seen already that, when the interval x4
x 4 (x 1)2 (x 4)2 = 1
between x0 and x is spacelike, the events will be
judged simultaneous by some admissible obser- Hatted line of simultaneity
vers, but not by others. Indeed, if x4 = 0
and the observers
p
are related by [5], then ^x4 = Unhatted line of simultaneity
2 1 1
(= 1 )x = ^ x , which will not be
zero unless = 0 and so there is no relative motion x 1
(^x1 cannot be zero since then ^ xa = 0 for
a = 1, 2, 3, 4 and x = x0 ). This phenomenon is (x 1, x 4) = (1, 0)
called the relativity of simultaneity and we now
construct a simple geometrical representation of it.
x1
Select two perpendicular lines in the plane to (x 1, x 4) = (1, 0)
represent the x1 - and x4 -axes (the Euclidean ortho-
gonality of the lines has no physical significance and Figure 3 Relativity of simultaneity.
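The addition of velocities formula [8] discussed in this section lends itself to a quick numerical check. The Python sketch below is an illustration only: the function names `add_velocities` and `add_via_rapidity` are our own, speeds are expressed as fractions of the speed of light, and both motions are assumed to be along the same spatial axis.

```python
import math


def add_velocities(beta1, beta2):
    """Relativistic sum of two collinear speeds, as in formula [8]."""
    return (beta1 + beta2) / (1.0 + beta1 * beta2)


def add_via_rapidity(beta1, beta2):
    """Same sum computed from beta = tanh(theta1 + theta2)."""
    return math.tanh(math.atanh(beta1) + math.atanh(beta2))


# Two boosts of 0.9 compose to a speed that is still below 1.
print(add_velocities(0.9, 0.9))
# The closed formula and the rapidity form agree up to rounding.
print(add_velocities(0.3, 0.6), add_via_rapidity(0.3, 0.6))
```

The rapidity form makes it evident why the composed speed never reaches 1: the parameters add, while tanh stays strictly below 1.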
complex and subtle and some of which are commonplace (a passenger in a smooth, quiet airplane traveling at constant groundspeed cannot "feel" his motion relative to the earth). It is a powerful guide for constructing the laws of relativistic physics, but even more fundamentally it prohibits us from regarding any particular admissible observer as having a privileged view of the universe. In particular, we are forbidden from attaching any objective significance to such questions as, "Were the two supernovae simultaneous?", "How long did the meson survive?", and "What is the distance between the Crab Nebula and Alpha Centauri?" This is severe, but one must deal with it.

Particles and 4-Momentum

If $I \subseteq \mathbb{R}$ is an interval, then a map $\alpha : I \to M$ is a curve in $M$. Relative to any admissible basis we can write

\[ \alpha(\xi) = x^a(\xi)\,e_a \]

for each $\xi \in I$. We shall assume that $\alpha$ is smooth in the sense that each $x^a(\xi)$, $a = 1, 2, 3, 4$, is infinitely differentiable ($C^\infty$) on $I$ and the velocity vector

\[ \alpha'(\xi) = \frac{dx^a}{d\xi}\,e_a \]

is nonzero for every $\xi \in I$ (we adopt the usual custom, in a vector space, of identifying the tangent space at each point with the vector space itself). This definition of smoothness clearly does not depend on the choice of admissible basis for $M$. The curve $\alpha$ is said to be spacelike, timelike, or null if

\[ g(\alpha'(\xi), \alpha'(\xi)) = \eta_{ab}\,\frac{dx^a}{d\xi}\,\frac{dx^b}{d\xi} \]

is positive, negative, or zero, respectively, for each $\xi \in I$. A timelike curve for which $\alpha'(\xi)$ is future directed for each $\xi \in I$ is called a timelike world line and its image is identified with the set of all events in the history of some (not necessarily free) point material particle. If $I = [\xi_0, \xi_1]$ and $\alpha : [\xi_0, \xi_1] \to M$ is a timelike world line, then the proper time length of $\alpha$ is defined by

\[ L(\alpha) = \int_{\xi_0}^{\xi_1} \sqrt{|g(\alpha'(\xi), \alpha'(\xi))|}\;d\xi = \int_{\xi_0}^{\xi_1} \sqrt{-\eta_{ab}\,\frac{dx^a}{d\xi}\,\frac{dx^b}{d\xi}}\;d\xi \]

and interpreted as the time lapse between the events $\alpha(\xi_0)$ and $\alpha(\xi_1)$ as recorded by a clock carried along by the particle whose world line is $\alpha$. This interpretation is easily motivated by writing out a Riemann sum approximation to the integral and appealing to our interpretation of the proper time separation $\Delta\tau = \sqrt{-\eta_{ab}\,\Delta x^a\,\Delta x^b}$. There are subtleties, however, both mathematical and physical (Naber 1992, section 1.4). The mathematical ones are addressed by the following result (which combines theorems 1.4.6 and 1.4.8 of Naber (1992)).

Theorem Let $x_0$ and $x$ be two events in $M$. Then $x - x_0$ is timelike and future directed if and only if there exists a timelike world line $\alpha : [\xi_0, \xi_1] \to M$ in $M$ with $\alpha(\xi_0) = x_0$ and $\alpha(\xi_1) = x$ and, in this case,

\[ L(\alpha) \le \tau(x - x_0) \qquad [9] \]

with equality holding if and only if $\alpha$ is a parametrization of a timelike straight line.

The inequality [9] asserts that if two material particles experience both $x_0$ and $x$, then the one that is free (and so can be regarded as at rest in some admissible coordinate system) has longer to wait for the occurrence of the second event (moving clocks run slow). For many years this basically obvious fact was christened "The Twin Paradox."

Just as a smooth curve in Euclidean space has an arc length parametrization, so a timelike world line has a proper time parametrization defined as follows. For each $\xi$ in $[\xi_0, \xi_1]$ let

\[ \tau = \tau(\xi) = \int_{\xi_0}^{\xi} \sqrt{|g(\alpha'(\zeta), \alpha'(\zeta))|}\;d\zeta \]

(the proper time length of $\alpha$ from $\alpha(\xi_0)$ to $\alpha(\xi)$). Then $\tau = \tau(\xi)$ has a smooth inverse $\xi = \xi(\tau)$ so $\alpha$ can be reparametrized by $\tau$. We will abuse our notation slightly and write

\[ \alpha(\tau) = x^a(\tau)\,e_a \]

The velocity vector with this parametrization is denoted

\[ U = U(\tau) = \frac{dx^a}{d\tau}\,e_a \]

called the 4-velocity of the world line and is the unit tangent vector field to $\alpha$, that is,

\[ U \cdot U = -1 \qquad [10] \]

for each $\tau$. An admissible observer is, of course, more likely to parametrize a world line by his own time coordinate $x^4$. Then

\[ \alpha'(x^4) = \frac{dx^1}{dx^4}\,e_1 + \frac{dx^2}{dx^4}\,e_2 + \frac{dx^3}{dx^4}\,e_3 + e_4 \]

so

\[ g(\alpha'(x^4), \alpha'(x^4)) = -1 + \|V\|^2 \]
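The proper time length and the inequality [9] can be explored numerically. The following Python sketch is an illustration under our own assumptions: the world lines are parametrized by the observer's time coordinate $x^4$ (so the integrand is $\sqrt{1 - \|V\|^2}$, as in the last formula above), the integral is approximated by a midpoint Riemann sum, and the names `proper_time_length`, `resting`, and `oscillating` are hypothetical.

```python
import math


def g(v, w):
    """Lorentz inner product with signature (+, +, +, -)."""
    return v[0] * w[0] + v[1] * w[1] + v[2] * w[2] - v[3] * w[3]


def proper_time_length(velocity, t0, t1, steps=10000):
    """Midpoint Riemann sum for the integral of sqrt(|g(alpha', alpha')|)."""
    h = (t1 - t0) / steps
    total = 0.0
    for k in range(steps):
        v = velocity(t0 + (k + 0.5) * h)
        total += math.sqrt(abs(g(v, v))) * h
    return total


def resting(t):
    """Velocity vector of a particle at rest: alpha(t) = (0, 0, 0, t)."""
    return (0.0, 0.0, 0.0, 1.0)


def oscillating(t):
    """Velocity vector of alpha(t) = (0.5 sin t, 0, 0, t); speed stays below 1."""
    return (0.5 * math.cos(t), 0.0, 0.0, 1.0)


# Both world lines join (0, 0, 0, 0) to (0, 0, 0, 2*pi).
T = 2.0 * math.pi
tau_rest = proper_time_length(resting, 0.0, T)
tau_moving = proper_time_length(oscillating, 0.0, T)
print(tau_rest, tau_moving)
```

The straight world line records the full coordinate time lapse, while the oscillating one records less, in agreement with [9].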
is a parametrization of the world line of a photon through $x_0$. Being null, $N$ can be written in any admissible basis as

\[ N = -(N \cdot e_4)(d + e_4) \qquad [19] \]

where

\[ d = \left[(N \cdot e_1)^2 + (N \cdot e_2)^2 + (N \cdot e_3)^2\right]^{-1/2} \left[(N \cdot e_1)\,e_1 + (N \cdot e_2)\,e_2 + (N \cdot e_3)\,e_3\right] \qquad [20] \]

is the direction vector of the world line in the corresponding spatial coordinate system. Now, by analogy with [16], we define a photon in $M$ to be a curve in $M$ of the form [18], take $N$ to be its 4-momentum and define the energy $E$ of the photon in the admissible basis $\{e_1, e_2, e_3, e_4\}$ by

\[ E = -N \cdot e_4 \qquad [21] \]

Then, by [19],

\[ N = E(d + e_4) \qquad [22] \]

The corresponding frequency $\nu$ and wavelength $\lambda$ are then defined by $\nu = E/h$ and $\lambda = 1/\nu$. In another admissible basis, one has $N = \hat E(\hat d + \hat e_4)$, where $\hat d$ and $\hat E$ are defined by the hatted versions of [20] and [21]. One can then show (Naber 1992, section 1.8) that

\[ \frac{\hat\nu}{\nu} = \frac{\hat E}{E} = \frac{1 - \beta\cos\theta}{\sqrt{1 - \beta^2}} = (1 - \beta\cos\theta) + \tfrac{1}{2}\beta^2(1 - \beta\cos\theta) + \cdots \qquad [23] \]

where $\beta$ is the relative speed of the two spatial coordinate systems and $\theta$ is the angle (in the unhatted spatial coordinate system) between the direction $d$ of the photon and the direction of motion of the hatted spatial coordinate system. Equation [23] is the formula for the relativistic Doppler effect with the first term in the series being the classical formula.

We conclude this section by examining a few simple interactions between particles of the sort modeled by our definitions, assuming only that 4-momentum is conserved in the interaction. For convenience, we will use the term free particle to refer to either a free material particle or a photon. If $A$ is a finite set of free particles, then each element of $A$ has a unique 4-momentum which is a future-directed timelike or null vector. The sum of any such collection of vectors is timelike and future directed, except when all of the vectors are null and parallel, in which case the sum is null and future directed (Naber 1992, lemma 1.4.3). We call this sum the total 4-momentum of $A$. Now we formulate a definition which is intended to model a finite set of free particles colliding at some event with a (perhaps new) set of free particles emerging from the collision (e.g., an electron and proton collide, with a neutron and neutrino emerging from the collision).

A contact interaction in $M$ is a triple $(A, x, \tilde A)$, where $A$ and $\tilde A$ are two finite sets of free particles, neither of which contains a pair of particles with linearly dependent 4-momenta (which would presumably be physically indistinguishable) and $x \in M$ is an event such that

1. $x$ is the terminal point of all of the particles in $A$ (i.e., for each world line $\alpha : [\xi_0, \xi_1] \to M$ of a particle in $A$, $\alpha(\xi_1) = x$);
2. $x$ is the initial point of all the particles in $\tilde A$, and
3. the total 4-momentum of $A$ equals the total 4-momentum of $\tilde A$.

Property (3) is called the conservation of 4-momentum. If $A$ consists of a single free particle, then $(A, x, \tilde A)$ is called a decay (e.g., a neutron decays into a proton, an electron and an antineutrino).

Consider, for example, an interaction $(A, x, \tilde A)$ for which $\tilde A$ consists of a single photon. The total 4-momentum of $\tilde A$ is null so the same must be true of $A$. Since the 4-momenta of the individual particles in $A$ are timelike or null and future directed their sum can be null only if they are, in fact, all null and parallel. Since $A$ cannot contain distinct photons with parallel 4-momenta, it must consist of a single photon which, by (3), must have the same 4-momentum as the photon in $\tilde A$. In essence, nothing happened at $x$. We conclude that no nontrivial interaction of the type modeled by our definition can result in a single photon and nothing else. Reversing the roles of $A$ and $\tilde A$ shows that, if 4-momentum is to be conserved, a photon cannot decay.

Next let us consider the decay of a single material particle into two material particles, for example, the spontaneous disintegration of an atom through $\alpha$-emission. Thus, we consider a contact interaction $(A, x, \tilde A)$ in which $A$ consists of a single free material particle of proper mass $m_0$ and $\tilde A$ consists of two free material particles with proper masses $m_1$ and $m_2$. Let $P_0$, $P_1$, and $P_2$ be the 4-momenta of the particles of proper mass $m_0$, $m_1$, and $m_2$, respectively. Then $P_0 = P_1 + P_2$. Appealing to the reversed triangle inequality, the fact that $P_1$ and $P_2$ are linearly independent and future directed, and [12] we conclude that

\[ m_0 > m_1 + m_2 \qquad [23] \]
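The mass inequality for a two-body decay can be illustrated numerically. The Python sketch below rests on assumptions that are ours rather than the text's: the 4-momentum of a material particle is taken to be $P = mU$ with $U = (1 - \|V\|^2)^{-1/2}(V + e_4)$, the proper mass is recovered as $\sqrt{-g(P, P)}$, and the names `four_momentum` and `proper_mass` are hypothetical.

```python
import math


def g(v, w):
    """Lorentz inner product with signature (+, +, +, -)."""
    return v[0] * w[0] + v[1] * w[1] + v[2] * w[2] - v[3] * w[3]


def four_momentum(m, V):
    """P = m * gamma * (V + e4) for proper mass m and 3-velocity V (|V| < 1)."""
    gamma = 1.0 / math.sqrt(1.0 - sum(c * c for c in V))
    return (m * gamma * V[0], m * gamma * V[1], m * gamma * V[2], m * gamma)


def proper_mass(P):
    """Proper mass of a future-directed timelike 4-momentum P."""
    return math.sqrt(-g(P, P))


# Decay products with proper masses 1 and 2 moving in opposite directions.
P1 = four_momentum(1.0, (0.6, 0.0, 0.0))
P2 = four_momentum(2.0, (-0.3, 0.0, 0.0))
# Conservation of 4-momentum fixes the 4-momentum of the initial particle.
P0 = tuple(a + b for a, b in zip(P1, P2))
m0 = proper_mass(P0)
print(m0, m0 > 1.0 + 2.0)
```

If the two products are given the same velocity, their 4-momenta become linearly dependent and the computed $m_0$ reduces to $m_1 + m_2$, which is the boundary case excluded by the definition of a contact interaction.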
The excess mass $m_0 - (m_1 + m_2)$ of the initial particle is regarded, via [17], as a measure of the amount of energy required to split $m_0$ into two pieces. Stated somewhat differently, when the two particles in $\tilde A$ were held together to form the single particle in $A$, the "binding energy" contributed to the mass of this latter particle.

Reversing the roles of $A$ and $\tilde A$ in the last example gives a contact interaction modelling an inelastic collision (two free material particles with masses $m_1$ and $m_2$ collide and coalesce to form a third of mass $m_0$). The inequality [23] remains true, of course, and a somewhat more detailed analysis (Naber 1992, section 1.8) yields an approximate formula for $m_0 - (m_1 + m_2)$ which can be compared (favorably) with the Newtonian formula for the loss in kinetic energy that results from the collision (energy which, classically, is viewed as taking the form of heat in the combined particle).

An analysis of the interaction in which both $A$ and $\tilde A$ consist of an electron and a photon yields (Naber 1992, section 1.8) a formula for the so-called Compton effect. Many more such examples of this sort are treated in great detail in Synge (1972, chapter VI, §14).

Charged Particles and Electromagnetic Fields

A charged particle in $M$ is a triple $(\alpha, m, q)$, where $(\alpha, m)$ is a material particle and $q$ is a nonzero real number called the charge of the particle. Charged particles do two things of interest to us. By their very presence they create electromagnetic fields and they also respond to the electromagnetic fields created by other charges.

Charged particles "respond" to an electromagnetic field by experiencing changes in 4-momentum. The quantitative nature of this response, that is, the equation of motion, is generally taken to be the so-called Lorentz 4-force law which expresses the proper time rate of change of the particle's 4-momentum at each point of the world line as a linear function of the 4-velocity. Thus, at each point $\alpha(\tau)$ of the world line

\[ \frac{dP(\tau)}{d\tau} = q\,\tilde F_{\alpha(\tau)}(U(\tau)) \qquad [24] \]

where $\tilde F_{\alpha(\tau)} : M \to M$ is a linear transformation determined, in each admissible coordinate system, by the classical electric $E$ and magnetic $B$ fields (here we are assuming that the contribution of $q$ to the ambient electromagnetic field is negligible, that is, $(\alpha, m, q)$ is a "test charge"). Let us write [24] more simply as

\[ \tilde F U = \frac{m}{q}\,\frac{dU}{d\tau} \qquad [25] \]

Dotting both sides of [25] with $U$ gives

\[ \tilde F U \cdot U = \frac{m}{q}\,\frac{dU}{d\tau} \cdot U = \frac{m}{2q}\,\frac{d}{d\tau}(U \cdot U) = \frac{m}{2q}\,\frac{d}{d\tau}(-1) = 0 \]

Since any future-directed timelike unit vector $u$ is the 4-velocity of some charged particle, we find that $\tilde F(u) \cdot u = 0$ for any such vector. Linearity then implies $\tilde F(v) \cdot v = 0$ for any timelike vector. Now, if $u$ and $v$ are timelike and future directed, then $u + v$ is timelike so $0 = \tilde F(u + v) \cdot (u + v) = \tilde F(u) \cdot v + u \cdot \tilde F(v)$ and therefore $\tilde F(u) \cdot v = -u \cdot \tilde F(v)$. But $M$ has a basis of future-directed timelike vectors so

\[ \tilde F x \cdot y = -x \cdot \tilde F y \qquad [26] \]

for all $x, y \in M$. Thus, at each point, the linear transformation $\tilde F$ must be skew-symmetric with respect to the Lorentz inner product. One could therefore model an electromagnetic field on $M$ by an assignment to each point of a skew-symmetric linear transformation whose job it is to assign to the 4-velocity of a charged particle whose world line passes through that point the change in 4-momentum that the particle should expect to experience because of the presence of the field. However, a slightly different perspective has proved more convenient. Notice that a skew-symmetric linear transformation $\tilde F : M \to M$ and the Lorentz inner product together determine a bilinear form $F : M \times M \to \mathbb{R}$ given by

\[ F(x, y) = \tilde F x \cdot y \]

which is also skew-symmetric ($F(y, x) = \tilde F(y) \cdot x = -F(x, y)$) and that, conversely, a skew-symmetric bilinear form uniquely determines a skew-symmetric linear transformation. Now, an assignment of a skew-symmetric bilinear form to each point of $M$ is nothing other than a 2-form on $M$ and it is in the language of forms that we choose to phrase classical electromagnetic theory (a concise introduction to this language is available, for example, in Spivak (1965, chapter 4)).

Nature imposes a certain restriction on which 2-forms can reasonably represent an electromagnetic field on $M$ (Maxwell's equations). To formulate these we introduce a source 1-form $J$ as follows: If
$x^1, x^2, x^3, x^4$ is any admissible coordinate system on $M$, then

\[ J = J_1\,dx^1 + J_2\,dx^2 + J_3\,dx^3 - \rho\,dx^4 \qquad [27] \]

where $\rho : M \to \mathbb{R}$ is a charge density function and $J = J_1 e_1 + J_2 e_2 + J_3 e_3$ is a current density vector field (these are to be regarded as the usual "smoothed out," pointwise versions of "charge per unit volume" and "charge flow per unit area per unit time" as measured by the corresponding admissible observer). Now, our formal definition is as follows: The electromagnetic field on $M$ determined by the source 1-form $J$ on $M$ is a 2-form $F$ on $M$ that satisfies Maxwell's equations

\[ dF = 0 \qquad [28] \]

and

\[ {}^{*}d\,{}^{*}F = J \qquad [29] \]

A few comments are in order here. We have chosen units in which not only the speed of light, but also various other constants that one often finds in Maxwell's equations (the dielectric constant $\epsilon_0$ and magnetic permeability $\mu_0$) are 1 and a factor of $4\pi$ in [29] is "normalized out." The ${}^{*}$ in [29] is the Hodge star operator determined by the Lorentz inner product and the chosen orientation of $M$. This is a natural isomorphism

\[ {}^{*} : \Lambda^p(M) \to \Lambda^{4-p}(M), \qquad p = 0, 1, 2, 3, 4 \]

On regions in which there are no charges, so that $J = 0$, [28] and [31] become the source free Maxwell equations

\[ dF = 0 \qquad [32] \]

and

\[ d\,{}^{*}F = 0 \qquad [33] \]

that is, both $F$ and ${}^{*}F$ are closed 2-forms.

Any 2-form $F$ on $M$ can be written in any admissible coordinate system as $F = \tfrac{1}{2}F_{ab}\,dx^a \wedge dx^b$ (summation convention!), where $(F_{ab})$ is the skew-symmetric matrix of components of $F$. In order to make contact with the notation generally employed in physics, we introduce the following names for these components:

\[ (F_{ab}) = \begin{pmatrix} 0 & B^3 & -B^2 & E^1 \\ -B^3 & 0 & B^1 & E^2 \\ B^2 & -B^1 & 0 & E^3 \\ -E^1 & -E^2 & -E^3 & 0 \end{pmatrix} \qquad [34] \]

Thus,

\[ F = E^1\,dx^1 \wedge dx^4 + E^2\,dx^2 \wedge dx^4 + E^3\,dx^3 \wedge dx^4 + B^3\,dx^1 \wedge dx^2 + B^2\,dx^3 \wedge dx^1 + B^1\,dx^2 \wedge dx^3 \qquad [35] \]

Computing ${}^{*}F$, $dF$, $d\,{}^{*}F$ and ${}^{*}d\,{}^{*}F$ and writing $E = E^1 e_1 + E^2 e_2 + E^3 e_3$ and $B = B^1 e_1 + B^2 e_2 + B^3 e_3$ one finds that $dF = 0$ is equivalent to

\[ \operatorname{div} B = 0 \qquad [36] \]
introduce constant magnetic fields in a bubble chamber so as to induce a particle of interest to follow a circular path. We show now how to measure the charge-to-mass ratio for such a particle. Taking $c = 0$ in [45] and computing $U(\tau)$, then using [11] to solve for the coordinate velocity vector $V$ of the particle gives

\[ V = \frac{abq}{m}\,\sqrt{1 - \|V\|^2}\,\left[\cos\!\left(\frac{bq}{m}\,\tau\right) e_1 - \sin\!\left(\frac{bq}{m}\,\tau\right) e_2\right] \]

From this one computes

\[ \|V\|^2 = \left(1 + \frac{m^2}{a^2 b^2 q^2}\right)^{-1} \]

(note that this is a constant). Solving this last equation for $q/m$ (and assuming $q > 0$ for convenience) one arrives at

\[ \frac{q}{m} = \frac{1}{a|b|}\,\frac{\|V\|}{\sqrt{1 - \|V\|^2}} \]

Since $a$, $b$, and $\|V\|$ are measurable, one obtains the desired charge-to-mass ratio.

To conclude we wish to briefly consider the existence and use of "potentials" for electromagnetic fields. Suppose $F$ is an electromagnetic field defined on some connected, open region $X$ in $M$. Then $F$ is a 2-form on $X$ which, by [28], is closed. Suppose also that the second de Rham cohomology $H^2(X; \mathbb{R})$ of $X$ is trivial (since $M$ is topologically $\mathbb{R}^4$ this will be the case, for example, when $X$ is all of $M$, or an open ball in $M$, or, more generally, an open "star-shaped" region in $M$). Then, by definition, every closed 2-form on $X$ is exact so, in particular, there exists a 1-form $A$ on $X$ satisfying

\[ F = dA \qquad [46] \]

In particular, such a 1-form $A$ always exists locally on a neighborhood of any point in $X$ for any $F$. Such an $A$ is not uniquely determined, however, because, if $A$ satisfies [46], then so does $A + df$ for any smooth real-valued function (0-form) $f$ on $X$ ($d^2 = 0$ implies $d(A + df) = dA + d^2 f = dA = F$). Any 1-form $A$ satisfying [46] is called a (gauge) potential for $F$. The replacement $A \to A + df$ for some $f$ is called a gauge transformation of the potential and the freedom to make such a replacement without altering [46] is called gauge freedom.

One can show that, given $F$, it is always possible to locally solve $dA = F$ for $A$ subject to an arbitrary specification of the 0-form ${}^{*}d\,{}^{*}A$. More precisely, if $F$ is any 2-form satisfying $dF = 0$ and $g$ is an arbitrary 0-form, then locally, on a neighborhood of any point, there exists a 1-form $A$ satisfying

\[ dA = F \quad\text{and}\quad {}^{*}d\,{}^{*}A = g \qquad [47] \]

(a more general result is proved in Parrott (1987, appendix 2) and a still more general one in section 2.9 of this same source). The usefulness of the second condition in [47] can be illustrated as follows. Suppose we are given some (physical) configuration of charges and currents (i.e., some source 1-form $J$) and we wish to find the corresponding electromagnetic field $F$. We must solve Maxwell's equations $dF = 0$ and ${}^{*}d\,{}^{*}F = J$ (subject to whatever boundary conditions are appropriate). Locally, at least, we may seek instead a corresponding potential $A$ (so that $F = dA$). Then the first of Maxwell's equations is automatically satisfied ($dF = d(dA) = 0$) and we need only solve ${}^{*}d\,{}^{*}(dA) = J$. To simplify the notation let us temporarily write $\delta = {}^{*}d\,{}^{*}$ and consider the operator $\Delta = d \circ \delta + \delta \circ d$ on forms (variously called the Laplace-Beltrami operator, Laplace-de Rham operator, or Hodge Laplacian on Minkowski spacetime). Then

\[ \Delta A = d(\delta A) + \delta(dA) = d({}^{*}d\,{}^{*}A) + {}^{*}d\,{}^{*}(dA) \qquad [48] \]

According to the result quoted above, we may narrow down our search by imposing the condition $\delta A = 0$, that is

\[ {}^{*}d\,{}^{*}A = 0 \qquad [49] \]

(this is generally referred to as imposing the Lorentz gauge). With this, [48] becomes $\Delta A = {}^{*}d\,{}^{*}(dA)$ and to satisfy the second Maxwell equation we must solve

\[ \Delta A = J \qquad [50] \]

Thus, we see that the problem of (locally) solving Maxwell's equations for a given source $J$ reduces to that of solving [49] and [50] for the potential $A$. To understand how this simplifies the problem, we note that a calculation in admissible coordinates shows that the operator $\Delta$ reduces to the componentwise d'Alembertian $\Box$, defined on real-valued functions by

\[ \Box = \frac{\partial^2}{\partial (x^1)^2} + \frac{\partial^2}{\partial (x^2)^2} + \frac{\partial^2}{\partial (x^3)^2} - \frac{\partial^2}{\partial (x^4)^2} \]

Thus, eqn [50] decouples into four scalar equations

\[ \Box A_a = J_a, \qquad a = 1, 2, 3, 4 \qquad [51] \]

each of which is the well-studied inhomogeneous wave equation.
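To get a feeling for the scalar wave equations in [51], one can approximate the d'Alembertian by finite differences and apply it to simple test functions. The Python sketch below is an illustration only: the function name `dalembertian`, the step size `h`, and the test functions are our own assumptions, and the sign pattern $(+,+,+,-)$ follows the signature used throughout.

```python
import math


def dalembertian(f, x, h=1e-3):
    """Central-difference approximation of the d'Alembertian of f at x.

    f takes a sequence (x1, x2, x3, x4); h is a hypothetical step size.
    """
    signs = (1.0, 1.0, 1.0, -1.0)
    total = 0.0
    for a in range(4):
        xp = list(x)
        xm = list(x)
        xp[a] += h
        xm[a] -= h
        total += signs[a] * (f(xp) - 2.0 * f(x) + f(xm)) / h ** 2
    return total


def travelling_wave(x):
    """A profile moving along the x1-axis at the speed of light."""
    return math.sin(x[0] - x[3])


def static_quadratic(x):
    """A time-independent function whose d'Alembertian is the constant 2."""
    return x[0] ** 2


point = (0.3, -0.2, 0.5, 0.7)
print(dalembertian(travelling_wave, point))   # close to 0 (source free case)
print(dalembertian(static_quadratic, point))  # close to 2 (constant source)
```

The first function solves the homogeneous equation, so it corresponds to a component $A_a$ in a region where $J_a = 0$; the second corresponds to a constant source component.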
Introductory Article: Quantum Mechanics 109
described on the assumption that light is distributed discontinuously in space and described by a finite number of quanta which move without being divided and which must be absorbed or emitted as a whole.

Notice that, for a wavelength of $8 \times 10^3$ Å, a 30 W lamp emits roughly $10^{20}$ photons s$^{-1}$; for macroscopic objects the discrete nature of light has no appreciable consequence.

Planck's postulate and energy conservation imply that in emitting and absorbing light the atoms of the various elements can lose or gain energy only by discrete amounts. Therefore, atoms as producers or absorbers of radiation are better described by a theory that assigns to each atom a (possibly infinite) discrete set of states which have a definite energy.

The old quantum theory of matter addresses precisely this question. Its main proponent is N Bohr (Bohr 1913, 1918). The new theory is entirely phenomenological (as is Planck's theory) and based on Rutherford's model and on three more postulates (Born 1924):

(i) The states of the atom are stable periodic orbits, as given by Newton's laws, of energy $E_n$, $n \in \mathbb{Z}^{+}$, given by $E_n = h\nu_n f(n)$, where $h$ is Planck's constant, $\nu_n$ is the frequency of the electron on that orbit, and $f(n)$ is for each atom a function approximately linear in $Z$ at least for small values of $Z$.
(ii) When radiation is emitted or absorbed, the atom makes a transition to a different state. The frequency of the radiation emitted or absorbed when making a transition is $\nu_{n,m} = h^{-1}|E_n - E_m|$.
(iii) For large values of $n$ and $m$ and small values of $(n - m)/(n + m)$ the prediction of the theory should agree with those of the classical theory of the interaction of matter with radiation.

Later, A Sommerfeld gave a different version of the first postulate, by requiring that the allowed orbits be those for which the classical action is an integer multiple of Planck's constant.

The old quantum theory met success when applied to simple systems (atoms with $Z < 5$) but it soon appeared evident that a new, radically different point of view was needed and a fresh start; the new theory was to contain few free parameters, and the role of postulate (iii) was now to fix the value of these parameters.

There were two (successful) attempts to construct a consistent theory; both required a more sharply defined mathematical formalism. The first one was sparked by W Heisenberg, and further important ideas and mathematical support came from M Born, P Jordan, W Pauli, P Dirac and, on the mathematical side, also by J von Neumann and H Weyl. This formulation maintains that one should only consider relations between observable quantities, described by elements that depend only on the initial and final states of the system; each state has an internal energy. By energy conservation, the difference between the energies must be proportional (with a universal constant) to the frequency of the radiation absorbed or emitted. This is enough to define the energy of the state of a single atom modulo an additive constant. The theory must also take into account the probability of transitions under the influence of an external electromagnetic field. We shall give some details later on, which will help to follow the basis of this approach.

The other attempt was originated by L de Broglie following early remarks by W H Bragg and M Brillouin. Instead of emphasizing the discrete nature of light, he stressed the possible wave nature of particles, using as a guide the Hamilton-Jacobi formulation of classical mechanics. This attempt was soon supported by the experiments of Davisson and Germer (1927) on the scattering of a beam of electrons from a crystal. These experiments showed that, while electrons are recorded as point particles, their distribution follows the law of the intensity for the diffraction of a (dispersive) wave. Moreover, the relation between momentum and frequency was, within experimental errors, the same as that obtained by Einstein for photons.

The theory started by de Broglie was soon placed in almost definitive form by E Schrödinger. In this approach one is naturally led to formulate and solve partial differential equations and the full development of the theory requires regularity results from the theory of functions.

Schrödinger soon realized that the relations which were found in the approach of Heisenberg could be easily (modulo technical details which we shall discuss later) obtained within the formalism he was advocating and indeed he gave a proof that the two formalisms were equivalent. This proof was later refined, from the mathematical point of view, by J von Neumann and G Mackey.

In fact, Schrödinger's approach has proved much more useful in the solution of most physical problems in the nonrelativistic domain, because it can rely on the developments and practical use of the theory of functions and of partial differential equations. Heisenberg's "algebraic" approach has therefore a lesser role in solving concrete problems in (nonrelativistic) QM.

If one considers processes in which the number of particles may change in time, one is forced to
introduce a Hilbert space that accommodates states with an arbitrarily large number of particles, as is the case in the theory of relativistic quantized fields or in quantum statistical mechanics; it is then more difficult to follow the line of Schrödinger, due to difficulties in handling spaces of functions of infinitely many variables. The approach of Heisenberg, based on the algebra of matrices, has a rather natural extension to suitable algebras of operators; the approach of Schrödinger, based on the description of a state as a (wave) function, encounters more difficulties since one must introduce functionals over spaces of functions and the description of dynamics does not have a simple form.

From this point of view, the generalization of Heisenberg's approach has led to much progress in the understanding of the structure of the resulting theory. Still some relevant results have been obtained in a Schrödinger representation. We shall not elaborate further on this point.

We shall end this introductory section with a short description of the emergence of the structure of QM in Heisenberg's and Schrödinger's approaches; this will provide a motivation for the axiom of QM which we shall introduce in the following section. For an extended analysis, see, for example, Jammer (1979).

The specific form that was postulated by de Broglie (1923) for the wave nature of a particle relies on the relation of geometrical optics with wave propagation and on the formulation of Hamiltonian mechanics as a sort of "wave front propagation" through the solution of the Hamilton-Jacobi equation and the introduction of group velocity.

By the analogy with electromagnetic waves, it is natural to associate with a free nonrelativistic particle of momentum $p$ and mass $m$ the plane wave

\[ \phi_p(x, t) = e^{i(p \cdot x - Et)/\hbar}, \qquad \hbar \equiv \frac{h}{2\pi}, \qquad E = \frac{p^2}{2m} \]

Schrödinger obtained the equation for a quantum particle in a field of conservative forces with potential $V(x)$ by considering an analogy with the propagation of an electromagnetic wave in a medium with refraction index $n(x, \omega)$ that varies slowly on the scale of the wavelength. Indeed, in this case the wave follows the laws of geometrical optics, and has therefore a "particle-like" behavior. If one denotes by $\hat u(x, \omega)$ the Fourier transform (with respect to time) of a generic component of the electric field and one assumes that the field be essentially monochromatic (so that the support of $\hat u(x, \omega)$ as a function of $\omega$ is in a very small neighborhood of $\omega_0$), one finds that $\hat u(x, \omega)$ is an approximate solution of the equation

\[ -\Delta \hat u(x, \omega) = \frac{\omega_0^2}{c^2}\,n^2(x, \omega)\,\hat u(x, \omega) \qquad [1] \]

Writing $\hat u(x, \omega) = A(x, \omega)\,e^{i(\omega/c)W(x, \omega)}$ the phase $W(x, \omega)$ satisfies, in the high-frequency limit, the eikonal equation $|\nabla W(x, \omega)|^2 = n^2(x, \omega)$. One can define for the solution a phase velocity $v_f$ and it turns out that $v_f = c/|\nabla W(x, \omega)|$.

On the other hand, classical mechanics can also be described by propagation of surfaces of constant value for the solution $W(x, t)$ of the Hamilton-Jacobi equation $H(x, \nabla W) = E$, with $H = p^2/2m + V(x)$. Recall that high frequency (the realm of geometric optics) corresponds to small distances. This analogy led Schrödinger (1926) to postulate that the dynamics satisfied by the waves associated with the particles was given by the (Schrödinger) equation

\[ i\hbar\,\frac{\partial \psi(x, t)}{\partial t} = -\frac{\hbar^2}{2m}\,\Delta_x \psi(x, t) + V(x)\,\psi(x, t) \qquad [2] \]

This wave was to describe the particle and its motion, but, being complex valued, it could not represent any measurable property. It is a mathematical property of the solutions of [2] that the quantity $\int |\psi(x, t)|^2\,d^3x$ is preserved in time. Furthermore, if one sets

\[ \rho(x, t) \equiv |\psi(x, t)|^2, \qquad j(x, t) \equiv -i\,\frac{\hbar}{2m}\left[\bar\psi(x, t)\,\nabla\psi(x, t) - \psi(x, t)\,\nabla\bar\psi(x, t)\right] \qquad [3] \]

one easily verifies the local conservation law

\[ \frac{\partial \rho}{\partial t} + \operatorname{div} j(x, t) = 0 \qquad [4] \]

These mathematical properties led to the statistical interpretation given by Max Born: in those experiments in which the position of the particles is measured, the integral of $|\psi(x, t)|^2$ over a region $\Omega$ of space gives the probability that at time $t$ the particle is localized in the region $\Omega$. Moreover, the current associated with a charged particle is given locally by $j(x, t)$ defined above.

Let us now briefly review Heisenberg's approach. At the heart of this approach are: empirical formulas for the intensities of emission and absorption of radiation (dispersion relations), Sommerfeld's quantum condition for the action and the vague statement "the analogue of the derivative for the discrete action variable is the corresponding finite difference quotient." And, most important, the remark that the correct description of atomic physics was through quantities associated with pairs of states, that is, (infinite) matrices and the
empirical fact that the frequency (or rather the wave number) $\omega_{k,j}$ of the radiation (emitted or absorbed) in the transition between the atomic levels $k$ and $j$ ($k \neq j$) satisfies the Ritz combination principle $\omega_{m,j} + \omega_{j,k} = \omega_{m,k}$. It is easy to see that any doubly indexed family satisfying this relation must have the form $\omega_{m,k} = E_m - E_k$ for suitable constants $E_j$.

It was empirically verified by Kramers that the dipole moment of an atom in an external monochromatic field with frequency $\nu$ was proportional to the field with a coefficient (of polarization)

$$ P = \frac{e^2}{4\pi m} \sum_i \left( \frac{f_i}{\nu_i^2 - \nu^2} - \frac{F_i}{\nu_i^2 - \nu^2} \right) \qquad [5] $$

where $e$, $m$ are the charge and the mass of the electron and $f_i$, $F_i$ are the probabilities that the frequency $\nu_i$ is emitted or absorbed.

A detailed analysis of the phenomenon of polarization in classical mechanics, with the clearly stated aim of presenting the results in a way that may give hints for the construction of a "New Mechanics," was made by Max Born (1924). He makes use of action-angle variables $\{J_i, \theta_i\}$, assuming that the atom can be considered as a collection of harmonic oscillators with frequencies $\nu_i$ coupled linearly to the electric field of frequency $\nu$.

In the dipole approximation one obtains the following result for the polarization $P$ (linear response in energy to the electric field):

$$ P = \sum_{(m\cdot\nu)>0} 2\,(m\cdot\nabla_J)\, \frac{|A(J)|^2\,(m\cdot\nu)}{(m\cdot\nu)^2 - \nu^2} \qquad [6] $$

where $\nu_k = \partial H/\partial J_k$ ($H$ is the interaction Hamiltonian) and $A(J)$ is a suitable matrix. In order to derive the new dynamics, having as a guide the correspondence principle, one has to compare this result with the Kramers dispersion relation, which we write (to make the comparison easier) in the form

$$ P = \frac{e^2}{4\pi m} \sum_{n,m:\;E_m > E_n} \left( \frac{f_{m,n}}{\nu_{n,m}^2 - \nu^2} - \frac{f_{n,m}}{\nu_{n,m}^2 - \nu^2} \right) \qquad [7] $$

Bohr's rule implies that $\nu(n+\tau, n) = (E(n+\tau) - E(n))/h$.

Born and Heisenberg noticed that, for $n$ sufficiently large and $k$ small, one can approximate the differential operator in [6] with the corresponding difference operator, with an error of the order of $k/n$. Therefore, [6] could be substituted by

$$ P = \frac{1}{h} \sum_{m_k>0} \left[ \frac{|A_{n+m,n}|^2}{\nu_{n+m}^2 - \nu^2} - \frac{|A_{n-m,n}|^2}{\nu_{n-m}^2 - \nu^2} \right] \qquad [8] $$

The conclusion Born and Heisenberg drew is that the matrix $A$ that takes the place of the momentum in the classical theory must be such that $|A_{n+m,n}|^2 = e^2 h\, m^{-1} f(n+m, n)$. In the same vein, considering the polarization in a static electric field, it is possible to find an expression for the matrix that takes the place of the coordinate $x$ in classical Hamiltonian theory.

In general, the new approach (matrix mechanics) associates matrices with some relevant classical observables (such as functions of position or momentum) with a time dependence that is derived from the empirical dispersion relations of Kramers, the correspondence principle, Bohr's rule, Sommerfeld's action principle, and first- (and second-) order perturbation theory for the interaction of an atom with an external electromagnetic field. It was soon clear to Born and Jordan (1925) that this dynamics took the form $i\hbar \dot A = AH - HA$ for a matrix $H$ that, for the case of the hydrogen atom, is obtained from the classical Hamiltonian with the prescription given for the coordinates $x$ and $p$. The relation $[\hat x_h, \hat p_k] = i\hbar\,\delta_{hk}\, I$ among the matrices $\hat x_k$ and $\hat p_k$ corresponding to position and momentum was also seen as plausible. One year later P Dirac (1926) pointed out the structural identity of this relation with the Poisson bracket of Hamiltonian dynamics, developed a "quantum algebra" and a "quantum differentiation," and proved that any $*$-derivation (derivation which preserves the adjoint) $\delta$ of the algebra $B_N$ of $N \times N$ matrices is inner, that is, is given by $\delta(a) = i[a, h]$ for a Hermitian matrix $h$. Much later this theorem was extended (with some assumptions) to the algebra of all bounded operators on a separable Hilbert space. Since the derivations are generators of a one-parameter continuous group of automorphisms, that is, of a dynamics, this result lent further strength to the ideas of Born and Heisenberg.

The algebraic structure introduced by Born, Jordan, and Heisenberg (1926) was used by Pauli (1927) to give a purely group-theoretical derivation of the spectrum of the hydrogen atom, following the lines of the derivation in symplectic mechanics of the SO(4) symmetry of the Coulomb system. This remarkable success gave much strength to the Heisenberg formulation of QM, which was soon recognized as an efficient instrument in the study of the atomic world.

The algebraic formulation was also instrumental in the description given by Pauli (1928) of the spin (a property of electrons empirically postulated by Goudsmit and Uhlenbeck to account for a hyperfine splitting of some emission lines) as an internal degree of freedom without reference to spatial coordinates and still connected with the
properties of the system under the group of spatial rotations. This description through matrices has a major role also in the formulation by Pauli of the exclusion principle (and its relation with Fermi–Dirac statistics), which gave further credit to Heisenberg's theory by helping to reproduce correctly the classification of the atoms.

These features may explain why the standard formulation of the axioms of QM given in the next section shows the influence of Heisenberg's approach. On the other hand, comparison with experiments is usually set in the framework of Schrödinger's approach. Posing the problems in terms of properties of the solution of the Schrödinger equation, one is led to a pragmatic use of the formalism, leaving aside difficulties of interpretation. This separation of the axioms from the practical use may be one of the reasons why a serious analysis of the axioms and of the problems that arise from them is apparently not a concern for most of the research in QM, even from the point of view of mathematical physics.

One should stress that both the approach of Born and Heisenberg and that of de Broglie and Schrödinger are rooted in a mixture of attention to the experimental data, deep understanding of the previous theory, bold analogies and approximations, and deep concern for the consistency of the new mechanics.

There is an essential difference between the starting points of the two approaches. In Heisenberg's approach, the atom has a priori no spatial structure; the description is entirely in terms of its properties under emission and absorption of light, and therefore its observable quantities are represented by matrices. Dynamics enters through the study of the interaction with the electromagnetic field, and some analogies with the classical theory of electrodynamics in an asymptotic regime (correspondence principle). In this way, as we have briefly indicated, one arrives at the special role of some matrices, which have a mutual relation similar to the relation of position and momentum in Hamiltonian theory. Following this analogy, it is possible to extend the theory beyond its original scope and consider phenomena in which the electrons are not bound to an atom.

In the approach of Schrödinger, on the other hand, particles and collections of particles are represented by spatial structures (waves). Spatial coordinates are therefore introduced a priori, and the position of a particle is related to the intensity of the corresponding wave (this was stressed by Born). Position and momentum are both basic measurable quantities as in classical mechanics. Physical interpretation forces the particle wave to be square integrable, and mathematics provides a limitation on the simultaneous localization in momentum and position leading to Heisenberg's uncertainty principle. Dynamics is obtained from a particle–wave duality and an analogy with the relativistic wave equation in the low-energy regime. The presence of bound states with quantized energies is seen as a consequence of the well-known fact that waves confined to a bounded spatial region have their wave number (and therefore energy) quantized.

Formal Structure

In this section we describe the formal mathematical structure that is commonly associated with QM. It constitutes a coherent mathematical theory, but the interpretation axiom it contains leads to conceptual difficulties.

We state the axioms in the form in which they were codified by J von Neumann (1966); they constitute a mathematically precise rendering of the formalism of Born, Heisenberg, and Jordan. The formalism of Schrödinger per se does not require general statements about the category of observables.

Axiom I

(i) Observables are represented by self-adjoint operators in a complex separable Hilbert space $\mathcal{H}$.
(ii) Every such operator represents an observable.

Remark Axiom I (ii) is introduced only for mathematical simplicity. There is no physical justification for part (ii). In principle, an observable must be connected to a procedure of measurement ("observation") and for most of the self-adjoint operators on $\mathcal{H}$ (e.g., in the Schrödinger representation for $i x_k (\partial/\partial x_h) x_k$) such a procedure has not yet been given.

Axiom II

(i) Pure states of the system are represented by normalized vectors in $\mathcal{H}$.
(ii) If a measurement of the observable $A$ is made on a system in the state represented by the element $\phi \in \mathcal{H}$, the average of the numerical values one obtains is $\langle \phi, A\phi \rangle$, a real number because $A$ is self-adjoint (we have denoted by $\langle \phi, \psi \rangle$ the scalar product in $\mathcal{H}$).

Remark Notice that Axiom II makes no statement about the outcome of a single measurement.

Using the natural complex structure of $\mathcal{B}(\mathcal{H})$, pure states can be extended as linear real functionals on $\mathcal{B}(\mathcal{H})$.
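Axiom II (ii) can be checked numerically in a finite-dimensional toy model. The following is a minimal sketch, assuming the Hilbert space is $\mathbb{C}^4$ and using NumPy; the names `expectation`, `mean`, and `mean_rephased` are illustrative choices and do not come from the article. It shows that $\langle \phi, A\phi \rangle$ is real when $A$ is self-adjoint, and that multiplying $\phi$ by a phase leaves the average unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4  # dimension of the toy Hilbert space C^n (an assumption of this sketch)

# Build a self-adjoint matrix A = (M + M^dagger) / 2 from a random complex M.
M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
A = (M + M.conj().T) / 2

# Normalized vector phi representing a pure state.
phi = rng.normal(size=n) + 1j * rng.normal(size=n)
phi = phi / np.linalg.norm(phi)


def expectation(op, vec):
    """Return <vec, op vec>; np.vdot conjugates its first argument."""
    return np.vdot(vec, op @ vec)


mean = expectation(A, phi)
mean_rephased = expectation(A, np.exp(0.7j) * phi)  # same state, different phase

print("imaginary part of <phi, A phi>:", abs(mean.imag))
print("change under a global phase:   ", abs(mean - mean_rephased))
```

Both printed quantities should vanish up to floating-point rounding; the sketch says nothing about single measurement outcomes, in line with the remark following Axiom II.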
One defines a state as any linear real positive functional on $\mathcal{B}(\mathcal{H})$ (all bounded operators on the separable Hilbert space $\mathcal{H}$) and says that a state is normal if it is continuous in the strong topology. It can be proved that a normal state can be decomposed into a convex combination of at most a denumerable set of pure states. With these definitions a state is pure iff it has no nontrivial decomposition. It is worth stressing that this statement is true only if the operators that correspond to observable quantities generate all of $\mathcal{B}(\mathcal{H})$; one refers to this condition by stating that there are no superselection rules.

By general results in the theory of the algebra $\mathcal{B}(\mathcal{H})$, a normal state is represented by a positive operator of trace class $\sigma$ through the formula $\sigma(A) = \mathrm{Tr}(\sigma A)$. Since a positive trace-class operator (usually referred to as density matrix in analogy with its classical counterpart) has eigenvalues $\lambda_k$ that are positive and sum up to 1, the decomposition of the normal state $\sigma$ takes the form $\sigma = \sum_k \lambda_k \Pi_k$, where $\Pi_k$ is the projection operator onto the $k$th eigenstate (counting multiplicity).

It is also convenient to know that if a sequence of normal states $\sigma_k$ on $\mathcal{B}(\mathcal{H})$ converges weakly (i.e., for each $A \in \mathcal{B}(\mathcal{H})$ the sequence $\sigma_k(A)$ converges) then the limit state is normal. This useful result is false in general for closed subalgebras of $\mathcal{B}(\mathcal{H})$, for example, for algebras that contain no minimal projections.

Note that no pure state is dispersion free with respect to all the observables (contrary to what happens in classical mechanics). Recall that the dispersion of the state $\sigma$ with respect to the observable $A$ is defined as $\Delta_\sigma(A) \equiv \sigma(A^2) - (\sigma(A))^2$.

The connection of the state with the outcome of a single measurement of an observable associated with an operator $A$ is given by the following axiom, which we shall formulate only for the case when the self-adjoint operator $A$ has only discrete spectrum. The generalization to the other case is straightforward but requires the use of the spectral projections of $A$.

Axiom III

(i) If $A$ has only discrete spectrum, the possible outcomes of a measurement of $A$ are its eigenvalues $\{a_k\}$.
(ii) If the state of the system immediately before the measurement is represented by the vector $\phi \in \mathcal{H}$, the probability that the outcome be $a_k$ is $\sum_h |\langle \phi, \psi^{A;k}_h \rangle|^2$, where the $\psi^{A;k}_h$ are a complete orthonormal set in the Hilbert space spanned by the eigenvectors of $A$ to the eigenvalue $a_k$.
(iii) If a system is in the pure state $\phi$ and one performs a measurement of the observable $A$ with outcome $a_j \in (b - \delta, b + \delta)$ for some $b, \delta \in \mathbb{R}$, then immediately after the measurement the system can be in any (not necessarily pure) state which lies in the convex hull of the pure states which are in the spectral subspace of the operator $A$ in the interval $\Delta_{b;\delta} \equiv (b - \delta, b + \delta)$.

Note Statements (ii) and (iii) can be extended without modification to the case in which the initial state is not a pure state, and is represented by a density matrix $\sigma$.

Remark 1 Axiom III makes sure that if one performs, immediately after the first, a further measurement of the same observable $A$, the outcome will still lie in the interval $\Delta_{b;\delta}$. This is needed to give some objectivity to the statement made about the outcome; notice that one must place the condition "immediately after" because the evolution may not leave invariant the spectral subspaces of $A$.

If the operator $A$ has, in the interval $\Delta_{b;\delta}$, only discrete (pure point) spectrum, one can express Axiom III in the following way: the outcome can be any state that can be represented by a convex affine superposition of the eigenstates of $A$ with eigenvalues contained in $\Delta_{b;\delta}$.

In the very special case when $A$ has only one eigenvalue in $\Delta_{b;\delta}$ and this eigenvalue is not degenerate, one can state Axiom III in the following form (commonly referred to as "reduction of the wave packet"): the system after the measurement is pure and is represented by an eigenstate of the operator $A$.

Remark 2 Notice that the third axiom makes a statement about the state of the system after the measurement is completed.

It follows from Axiom III that one can measure simultaneously only observables which are represented by self-adjoint operators that commute with each other (i.e., their spectral projections mutually commute). It follows from the spectral representation of the self-adjoint operators that a family $\{A_k\}$ of commuting operators can be considered (i.e., there is a representation in which they are) functions over a common measure space.

Axioms I–III give a mathematically consistent formulation of QM and allow a statistical description (and statistical prediction) of the outcome of the measurement of any observable. It is worth remarking that while the predictions will have only a statistical nature, the dynamical evolution of the observables (and by duality of the states) will be described by deterministic laws. The intrinsically statistical aspect of the predictions comes only from
the third postulate, which connects the mathematical content of the theory with the measurement process.

The third axiom, while crucial for the connection of the mathematical formalism with the experimental data, contains the seed of the conceptual difficulties which plague QM and have not been cured so far.

Indeed, the third axiom indicates that the process of measurement is described by laws that are intrinsically different from the laws that rule the evolution without measurement. This privileged role of the change of state caused by a measurement leads to serious conceptual difficulties, since the change is independent of whether or not the result is recorded by some observer; one should therefore have a way to distinguish between measurements and generic interactions with the environment.

A related problem originating from Axiom III is that the formulation of this axiom refers implicitly to the presence of a classical observer that certifies the outcomes of measurements and is allowed to make use of classical probability theory. This observer is therefore not subject to the laws of QM.

These two aspects of the conceptual difficulties have their common origin in the separation of the measuring device and of the measured systems into disjoint entities satisfying different laws. The difficulties in the theory of measurement have not yet received a satisfactory answer, but various attempts have been made, with varying degrees of success, and some of them are described briefly in the section "Interpretation problems." It appears therefore that QM in its present formulation is a refined and successful instrument for the description of the nonrelativistic phenomena at the Planck scale, but its internal consistency is still standing on shaky ground.

Returning to the axioms, it is worth remarking explicitly that according to Axiom II a state is a linear functional over the observables, but it is represented by a sesquilinear function on the complex Hilbert space $\mathcal{H}$. Since Axiom II states that any normalized element of $\mathcal{H}$ represents a state (and elements that differ only by a phase represent the same state), together with $\phi$ and $\psi$ also $\zeta \equiv a\phi + b\psi$, $|a|^2 + |b|^2 = 1$, represents a state (the superposition of $\phi$ and $\psi$; superposition principle).

But for an observable $A$, one has in general $\zeta(A) \neq |a|^2 \phi(A) + |b|^2 \psi(A)$, due to the cross-terms in the scalar product. The superposition principle is one of the characteristic features of QM. The superposition of the two pure states $\phi$ and $\psi$ has properties completely different from those of a statistical mixture of the same two states, defined by the density matrix $\sigma = |a|^2 \Pi_\phi + |b|^2 \Pi_\psi$, where we have denoted by $\Pi_\phi$ the orthogonal projection onto the normalized vector $\phi$. Therefore, the search for these interference terms is one of the means to verify the predictions of QM, and their smallness under given conditions is a sign of quasiclassical behavior of the system under study.

Strictly connected to superposition are entanglement and the partial trace operation. Suppose that one has two systems which when considered separately are described by vectors in two Hilbert spaces $\mathcal{H}_i$, $i = 1, 2$, and which have observables $A_i \in \mathcal{B}(\mathcal{H}_i)$. When we want to study their mutual interaction, it is natural to describe both of them in the Hilbert space $\mathcal{H}_1 \otimes \mathcal{H}_2$ and to consider the observables $A_1 \otimes I$ and $I \otimes A_2$.

When the systems interact, the interaction will not in general commute with the projection operator $\Pi_1$ onto $\mathcal{H}_1$. Therefore, even if the initial state is of the form $\phi_1 \otimes \phi_2$, $\phi_i \in \mathcal{H}_i$, the final state (after the interaction) is a vector $\xi \in \mathcal{H}_1 \otimes \mathcal{H}_2$ which cannot be written as $\xi = \zeta_1 \otimes \zeta_2$ with $\zeta_i \in \mathcal{H}_i$. It can be shown, however, that there always exist two orthonormal families of vectors $\phi_n \in \mathcal{H}_1$ and $\psi_n \in \mathcal{H}_2$ such that $\xi = \sum_n c_n\, \phi_n \otimes \psi_n$ for suitable $c_n \in \mathbb{C}$, $\sum_n |c_n|^2 = 1$ (this decomposition is not unique in general).

Recalling that $(\phi \otimes \psi)(A_1 \otimes I) = \phi(A_1)$, one can write

$$ \xi(A_1 \otimes I) = \sum_n |c_n|^2\, \phi_n(A_1) \equiv \sigma_1(A_1), \qquad \sigma_1 \equiv \sum_n |c_n|^2\, \Pi_{\phi_n} $$

The map $\Gamma_2 \colon \xi \to \sigma_1$ is called reduction (or also conditioning) with respect to $\mathcal{H}_2$; it is also called partial trace with respect to $\mathcal{H}_2$. The first notation reflects the analogy with conditioning in classical probability theory.

The map $\Gamma_2$ can be extended by linearity to a map from normal states (density matrices) on $\mathcal{B}(\mathcal{H}_1 \otimes \mathcal{H}_2)$ to normal states on $\mathcal{B}(\mathcal{H}_1)$ and gives rise to a positivity-preserving and trace-preserving map. One can in fact prove (Takesaki 1971) that any conditioning for normal states of a von Neumann algebra $\mathcal{M}$ is completely positive in the sense that it remains positive after tensorization of $\mathcal{M}$ with $\mathcal{B}(\mathcal{K})$, where $\mathcal{K}$ is an arbitrary Hilbert space.

It can also be proved that a partial converse is true, that is, that every completely positive trace-preserving map $\Phi$ on normal states of a von Neumann algebra $\mathcal{A} \subset \mathcal{B}(\mathcal{H})$ can be written, for a suitable choice of a larger Hilbert space $\mathcal{K}$ and partial isometries $V_k$, in the form (Kraus form) $\Phi(a) = \sum_k V_k^* a V_k$.
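The reduction (partial trace) and the decomposition of an entangled vector into two orthonormal families can be illustrated in finite dimension. The following is a minimal sketch, assuming $\mathcal{H}_1 = \mathbb{C}^2$ and $\mathcal{H}_2 = \mathbb{C}^3$ and using NumPy; the variable names and the particular observable `A1` are illustrative assumptions, not taken from the article. The vector is stored as a coefficient matrix `C`, the singular value decomposition of `C` supplies the two orthonormal families, and the reduced density matrix on $\mathcal{H}_1$ is `C @ C.conj().T`.

```python
import numpy as np

rng = np.random.default_rng(1)
d1, d2 = 2, 3  # dimensions of H1 and H2 (assumptions of this sketch)

# A normalized vector xi in H1 (x) H2, stored as a d1 x d2 coefficient matrix C,
# so that xi = sum_{i,j} C[i, j] e_i (x) f_j.
C = rng.normal(size=(d1, d2)) + 1j * rng.normal(size=(d1, d2))
C = C / np.linalg.norm(C)
xi = C.reshape(-1)  # row-major flattening matches the ordering used by np.kron

# Reduced density matrix on H1 (partial trace over H2).
sigma1 = C @ C.conj().T

# SVD: C = U diag(s) Vh, hence xi = sum_n s[n] * U[:, n] (x) Vh[n, :].
U, s, Vh = np.linalg.svd(C)
xi_rebuilt = sum(s[k] * np.kron(U[:, k], Vh[k, :]) for k in range(len(s)))

# An illustrative self-adjoint observable on H1.
A1 = np.array([[1.0, 2.0 - 1.0j], [2.0 + 1.0j, -0.5]])

# Expectation of A1 (x) I in the state xi versus Tr(sigma1 A1).
lhs = np.vdot(xi, np.kron(A1, np.eye(d2)) @ xi)
rhs = np.trace(sigma1 @ A1)

print("reconstruction error:", np.linalg.norm(xi - xi_rebuilt))
print("|<xi,(A1 x I) xi> - Tr(sigma1 A1)|:", abs(lhs - rhs))
print("eigenvalues of sigma1:", np.linalg.eigvalsh(sigma1), "squared singular values:", s**2)
```

In this sketch the squared singular values play the role of the weights $|c_n|^2$; they sum to 1 and coincide with the eigenvalues of the reduced density matrix.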
But it must be remarked that, if $U(t)$ is a one-parameter group of unitary operators on $\mathcal{H}_1 \otimes \mathcal{H}_2$ and $\sigma$ is a density matrix, the one-parameter family of maps $\Gamma(t) \colon \sigma \to \Gamma_2(U(t)\, \sigma\, U^*(t))$ does not, in general, have the semigroup property $\Gamma(t + s) = \Gamma(t) \cdot \Gamma(s)$, $s, t > 0$, and therefore there is in general no generator (of a reduced dynamics) associated with it. Only in special cases and under very strong hypotheses and approximations is there a reduced dynamics given by a semigroup (Markov property).

Since entanglement and (nontrivial) conditioning are marks of QM, and on the other side the Markov property described above is typical of conditioning in classical mechanics, it is natural to search for conditions and approximations under which the Markov property is recovered, and more generally under which the coherence properties characteristic of QM are suppressed (decoherence). We shall discuss briefly this problem in the section "Interpretation problems," devoted to the attempts to overcome the serious conceptual difficulties that descend from Axiom III.

It is seen from the remarks and definitions above that normal states (density matrices) play the role that in classical mechanics is attributed to measures over phase space, with the exception that pure states in QM do not correspond to Dirac measures (later on we shall discuss the possibility of describing a quantum-mechanical state with a function (Wigner function) on phase space).

In this correspondence, evaluation of an observable (a measurable function over phase space) over a state (a normalized, positive measure) is related to finding the (Hilbert space) trace of the product of an operator in $\mathcal{B}(\mathcal{H})$ with a density matrix. Notice that the trace operation shares some of the properties of the integral, in particular $\mathrm{tr}\, AB = \mathrm{tr}\, BA$ if $A$ is in trace class and $B \in \mathcal{B}(\mathcal{H})$ (cf. $g \in L^1$ and $f \in L^\infty$) and $\mathrm{tr}\, AB > 0$ if $A$ is a density matrix and $B$ is a positive operator. This suggests defining functions over the density matrices that correspond to quantities which are important in the theory of dynamical systems, in particular the entropy.

This is readily done if the Hilbert space is finite dimensional, and in the infinite-dimensional case if one takes as observables all Hermitian bounded operators. In quantum statistical mechanics one is led to consider an infinite collection of subsystems,

described above for a trace. Most of the definitions (e.g., of entropy) can be given in this enlarged context, but differences may occur, since in general $\mathcal{A}$ does not contain finite-dimensional projections, and therefore the trace function is not the trace commonly defined in a Hilbert space. We shall not describe further this very interesting and much developed theory, of major relevance in quantum statistical mechanics. For a thorough presentation see Ohya and Petz (1993).

The simplest and most-studied example is the case when each Hilbert space $\mathcal{H}_i$ is a complex two-dimensional space. The resulting system is constructed in analogy with the Ising model of classical statistical mechanics, but in contrast to that system it possesses, for each value of the index $i$, infinitely many pure states. The corresponding algebra of observables is a closed subalgebra of $(\mathbb{C}^2 \otimes \mathbb{C}^2)^{\otimes \mathbb{Z}}$ and generically does not contain any finite-dimensional projection.

This model, restricted to the case $(\mathbb{C}^2 \otimes \mathbb{C}^2)^{\otimes K}$, $K$ a finite integer, has become popular in the study of quantum information and quantum computation, in which case a normalized element of $\mathcal{H}_i$ is called a q-bit (in analogy with the bits of information in classical information theory). It is clear that the unit sphere in $(\mathbb{C}^2 \otimes \mathbb{C}^2)$ contains many more than four points, and this gives much more freedom for operations on the system. This is the basis of quantum computation and quantum information, a very interesting field which has received much attention in recent years.

Quantization and Dynamics

The evolution in nonrelativistic QM is described by the Schrödinger equation in the representation in which for an $N$-particle system the Hilbert space is $L^2(\mathbb{R}^{3N}) \otimes \mathbb{C}^k$, where $\mathbb{C}^k$ is a finite-dimensional space which accounts for the fact that some of the particles may have a spin content.

Apart from (often) inessential parameters, the Schrödinger equation for spin-0 particles can be written typically as

$$ i\hbar\, \frac{\partial \phi}{\partial t} = H\phi, \qquad \sum^{N} \ldots $$
If some particles have spin 1/2, the corresponding kinetic energy term should read $(i\hbar\, \sigma \cdot \nabla)^2$, where $\sigma_k$, $k = 1, 2, 3$, are the Pauli matrices, and one must add a term $W(x)$ which is a matrix field with values in $\mathbb{C}^k \otimes \mathbb{C}^k$ and takes into account the coupling between the spin degrees of freedom. Notice that the local operator $i\sigma \cdot \nabla$ is a square root of the Laplacian.

A relativistic extension of the Schrödinger equation for a free particle of mass $m \geq 0$ in dimension 3 was obtained by Dirac in a space of spinor-valued functions $\phi_k(x, t)$, $k = 0, 1, 2, 3$, which carries an irreducible representation of the Lorentz group. In analogy with the electromagnetic field, for which a linear partial differential equation (PDE) can be written using a four-dimensional representation of the Lorentz group, the relativistic Dirac equation is the linear PDE

$$ i \sum_{k=0}^{3} \gamma_k \frac{\partial \phi}{\partial x_k} = m\phi, \qquad x_0 \equiv ct $$

where the $\gamma_k$ generate the algebra of a representation of the Lorentz group. The operator $\sum_k (\partial/\partial x_k)\gamma_k$ is a local square root of the relativistically invariant d'Alembert operator $\partial^2/\partial x_0^2 - \Delta + m\,I$.

When one tries to introduce (relativistically invariant) local interactions, one faces the same problem as in classical mechanics, namely one must introduce relativistically covariant fields (e.g., the electromagnetic field), that is, systems with an infinite number of degrees of freedom. If this field is considered as external, one faces technical problems, which can be overcome in favorable cases. But if one tries to obtain a fully quantized theory (by also quantizing the field) the obstacles become insurmountable, due also to the nonuniqueness of the representation of the canonical commutation relations if these are taken as the basis of quantization, as in the finite-dimensional case.

In a favorable case (e.g., the interaction of a quantum particle with the quantized electromagnetic field) one can set up a perturbation scheme in a parameter $\alpha$ (the physical value of $\alpha$ in natural units is roughly 1/137). We shall come back later to perturbation schemes in the context of the Schrödinger operator; in the present case one has been able to find procedures (renormalization) by which the series in $\alpha$ that describe relevant physical quantities are well defined term by term. But even in this favorable case, where the sum of the first few terms of the series is in excellent agreement with the experimental data, one has reasons to believe that the series is not convergent, and one does not even know whether the series is asymptotic.

One is led to wonder whether the structure of fields (operator-valued elements in the dual of compactly supported smooth functions on classical spacetime), taken over in a simple way from the field structure of classical electromagnetism, is a valid instrument in the description of phenomena that take place at a scale incomparably smaller than the scale (atomic scale) at which we have reasons to believe that the formalisms of Schrödinger and Heisenberg provide a suitable model for the description of natural phenomena.

The phenomena related to the interaction of a nonrelativistic quantum particle with the quantized electromagnetic field take place at the atomic scale. These phenomena have been the subject of very intense research in theoretical physics, mostly within perturbation theory, and the analysis to the first few orders has led to very spectacular results (although there is at present no proof that the perturbation series are at least asymptotic).

In this field rigorous results are scarce, but recently some progress has been made, establishing, among other things, the existence of the ground state (a nontrivial result, because there is no gap separating the ground-state energy from the continuous part of the spectrum) and paving the way for the description of scattering phenomena; the latter result is again nontrivial because the photon field may lead to an anomalous infrared (long-range) behavior, much in the same way that the long-range Coulomb interaction requires a special treatment in nonrelativistic scattering theory.

This contribution to the Encyclopedia is meant to be an introduction to QM and therefore we shall limit ourselves to the basic structure of nonrelativistic theory, which deals with systems of a finite number of particles interacting among themselves and with external (classical) potential fields, leaving for more specialized contributions a discussion of more advanced items in QM and of the successes and failures of a relativistically invariant theory of interaction between quantum particles and quantized fields.

We shall return therefore to basics.

One may begin a section on dynamics in QM by discussing some properties of the solutions of the Schrödinger equation, in particular dispersive effects and the related scattering theory, the problem of bound states and resonances, the case of time-dependent perturbation and the ionization effect, the binding of atoms and molecules, the Rayleigh scattering, the Hall effect and other effects in nanophysics, the various multiscale and adiabatic limits, and in general all the physical problems that
have been successfully solved by Schrödinger's QM (as well as the very many interesting and unsolved problems).

We will consider briefly these issues and the approximation schemes that have been developed in order to derive explicit estimates for quantities of physical interest. Since there are very many excellent reviews of present-day research in QM (e.g., Araki and Ezawa (2004), Blanchard and Dell'Antonio (2004), Cycon et al. (1986), Hislop and Sigal (1996), Lieb (1990), Le Bris (2005), Simon (2002), and Schlag (2004)) we refer the reader to the more specialized contributions to this Encyclopedia for a detailed analysis and precise statements about the results.

We prefer to come back first to the foundations of the theory; we shall take the point of view of Heisenberg and start discussing the mapping properties of the algebra of observables and of the states. Since transition probabilities play an important role, we consider only transformations $\alpha$ which are such that, for any pair of pure states $\phi_1$ and $\phi_2$, one has $\langle \alpha(\phi_1), \alpha(\phi_2) \rangle = \langle \phi_1, \phi_2 \rangle$. We call these maps Wigner automorphisms.

A result of Wigner (see Weyl (1931)) states that if $\alpha$ is a Wigner automorphism then there exists a unique operator $U_\alpha$, either unitary or antiunitary, such that $\alpha(P) = U_\alpha^* P\, U_\alpha$ for all projection operators. If there is a one-parameter group of such automorphisms, the corresponding operators are all unitary (but they need not form a group).

A generalization of this result is due to Kadison. Denoting by $I_{1,+}$ the set of density matrices, a Kadison automorphism $\beta$ is, by definition, such that for all $\sigma_1, \sigma_2 \in I_{1,+}$ and all $0 < s < 1$ one has $\beta(s\sigma_1 + (1 - s)\sigma_2) = s\beta(\sigma_1) + (1 - s)\beta(\sigma_2)$. For Kadison automorphisms the same result holds as for Wigner's.

A similar result holds for automorphisms of the observables. Notice that the product of two Hermitian operators is not Hermitian in general, but Hermiticity is preserved under Jordan's product defined as $A \circ B \equiv (1/2)[AB + BA]$.

A Segal automorphism is, by definition, an automorphism of the Hermitian operators that preserves the Jordan product structure. A theorem of Segal states that $\gamma$ is a Segal automorphism if and only if there exist an orthogonal projector $E$, a unitary operator $U$ in $E\mathcal{H}$, and an antiunitary operator $V$ in $(I - E)\mathcal{H}$ such that $\gamma(A) = W A W^*$, where $W \equiv U \oplus V$.

We can study now in more detail the description of the dynamics in terms of automorphisms of Wigner or Kadison type when it refers to states and of Segal type when it refers to observables. We require that the evolution be continuous in suitable topologies. The strongest result refers to Wigner's case. One can prove that if a one-parameter group of Wigner automorphisms $\alpha_t$ is measurable in the weak topology (i.e., $(\alpha_t \sigma)(A)$ is measurable in $t$ for every choice of $A$ and $\sigma$) then it is possible to choose the $U(t)$ provided by Wigner's theorem in such a way that they form a group which is continuous in the strong topology. Similar results are obtained for the cases of Kadison and Segal automorphisms, but in both cases one has to assume continuity of the group in a stronger topology (the strong operator topology in the Segal case, the norm topology in Kadison's). Weak continuity is sufficient if the operator product is preserved (in this case one speaks of automorphisms of the algebra of bounded operators). The existence of the continuous group $U(t)$ defines a Hamiltonian evolution. One has indeed:

Theorem 1 (Stone). The map $t \to U(t)$, $t \in \mathbb{R}$, is a weakly continuous representation of $\mathbb{R}$ in the set of unitary operators in a Hilbert space $\mathcal{H}$ if and only if there exists a self-adjoint operator $H$ on (a dense set of) $\mathcal{H}$ such that $U(t) = e^{-itH}$ and therefore

$$ \phi \in D(H) \;\Rightarrow\; i\, \frac{dU(t)\phi}{dt} = H U(t)\phi \qquad [10] $$

The operator $H$ is called generator of the dynamics described by $U(t)$.

Note In Schrödinger's approach the operator described in Stone's theorem is called Hamiltonian, in analogy with the classical case. In the case of one particle of mass $m$ in $\mathbb{R}^3$ subject to a conservative force with potential energy $V(x)$ it has the following form, in units in which $\hbar = 1$:

$$ H = -\frac{1}{2m} \sum_k \frac{\partial^2}{\partial x_k^2} + V(x) \qquad [11] $$

If the potential $V$ depends on time, Stone's theorem is not directly applicable, but still the spectral properties of the self-adjoint operators $H_t$ and of the kernel of the group $\tau \to e^{-i\tau H_t}$ are essential to solve the (time-dependent) Schrödinger equation.

The semigroup $t \to e^{-tH_0}$ is usually a positivity-preserving semigroup of contractions and defines a Markov process; in favorable cases, the same is true of $t \to e^{-tH}$ (Feynman–Kac formula).

There is an analogous situation in the general theory of dynamical systems on a von Neumann algebra; in analogy with the case of elliptic operators, one defines as dissipation a map $\delta$ on a von Neumann algebra $\mathcal{M}$ which satisfies $\delta(a^* a) \geq a^* \delta(a) + \delta(a^*) a$ for all $a \in \mathcal{M}$. The positive dissipation is called completely positive if it remains positive after tensorization with $\mathcal{B}(\mathcal{K})$ for any
Hilbert space $\mathcal{K}$. Notice that according to this definition every *-derivation is a completely positive dissipation. For dissipations there is an analog of the theorem of Stinespring, and often a bounded dissipation can be written as

$$\Delta(a) = i[h, a] + \sum_k \left( V_k^* a V_k - \tfrac{1}{2}\{V_k^* V_k, a\} \right) \quad \text{for } a \in M$$

(the symbols $\{\cdot\,,\cdot\}$ denote the anticommutator).

In general terms, by quantization is meant the construction of a theory by deforming a commutative algebra of functions on a classical phase space $X$ in such a way that the dynamics of the quantum system can be derived from the prescription of deformation, usually by deforming the Poisson brackets if $X$ is a cotangent bundle $T^*M$ (Halbut 2002, Landsman 2002). We shall discuss only the Weyl quantization (Weyl 1931), which has its roots in Heisenberg's formulation of QM and refers to the case in which the configuration space is $\mathbb{R}^N$ or, with some variant (Floquet–Zak), the N-dimensional torus. We shall add a few remarks on the Wick (anti-Weyl) quantization. More general formulations are needed when one tries to quantize a classical system defined on the cotangent bundle of a generic variety, and even more so if it is defined on a generic symplectic manifold.

The Weyl quantization is a mathematically accurate rendering of the essential content of the procedure adopted by Born and Heisenberg to construct dynamics by finding operators which play the role of symplectic coordinates.

Consider a system with one degree of freedom. The first naive attempt would be to find operators $\hat q$, $\hat p$ that satisfy the relation

$$[\hat q, \hat p] \subset iI \qquad [12]$$

and to construct the Hamiltonian in analogy with the classical case. To play a similar role, the operators $\hat q$ and $\hat p$ must be self-adjoint and satisfy [12] at least in a weak sense. If both are bounded, [12] implies $e^{ib\hat p}\, \hat q\, e^{-ib\hat p} = \hat q + bI$ (the exponential is defined through a convergent series) and therefore the spectrum of $\hat q$ is the entire real line, a contradiction. Therefore, the inclusion sign in [12] is strict and we face domain problems; as a consequence, [12] has many inequivalent solutions (equivalence here means unitary equivalence).

Apart from pathological ones, defined on $L^2$-spaces over multiple coverings of $\mathbb{R}$, there are inequivalent solutions of [12] which are effectively used in QM.

The most common solution is on the Hilbert space $L^2(\mathbb{R})$ (with Lebesgue measure), with $\hat q$ defined as the essentially self-adjoint operator that acts on the smooth functions with compact support as multiplication by the coordinate $x$, and $\hat p$ defined similarly in Fourier space. This representation can be trivially generalized to construct operators $\hat q_k$ and $\hat p_k$ in $L^2(\mathbb{R}^N)$.

Another frequently used representation of [12] is on $L^2(S^1)$ (and, when generalized to N degrees of freedom, on $T^N$). In this representation, the operator $\hat p$ is defined by $c_k \to k c_k$ on functions $f(\theta) = \sum_{k=-M}^{N} c_k e^{ik\theta/2\pi}$, $0 \le M, N < \infty$. In this case the operator $\hat q$ is defined as multiplication by the angle coordinate $\theta$. It is easy to check that this representation is inequivalent to the previous one and that [12] is satisfied (as an identity) on the (dense) set of vectors which are in the domain both of $\hat p \hat q$ and of $\hat q \hat p$. But notice that the domain of essential self-adjointness of $\hat p$ is not left invariant by the action of $\hat q$ ($\theta f(\theta)$ is a function on $S^1$ only if $f(2\pi) = 0$).

We shall denote $\hat p$ in this representation by the symbol $\partial/\partial\theta_{\mathrm{per}}$ and refer to it as the Bloch representation. It can be modified by setting the action of $\hat p$ as $c_n \to (n + \alpha) c_n$, $0 < \alpha < 2\pi$, and this gives rise to the various Bloch–Zak and magnetic representations.

The Bloch representation can be extended to periodic functions on $\mathbb{R}^1$ by noticing that $L^2(\mathbb{R}) = L^2(S^1) \otimes l^2(\mathbb{N})$; similarly, the Bloch–Zak and the magnetic representations can be extended to $L^2(\mathbb{R}^N)$.

The difference between the representations can be seen more clearly if one considers the one-parameter groups of unitary operators generated by the canonical operators $\hat q$ and $\hat p$. In the Schrödinger representation on $L^2(\mathbb{R})$, these groups satisfy

$$U(a) V(b) = e^{-iab}\, V(b) U(a), \qquad U(a) \equiv e^{ia\hat q}, \quad V(b) \equiv e^{ib\hat p}$$

and therefore, setting $z = a + ib$ and $W(z) \equiv e^{-iab/2}\, V(b) U(a)$, one has

$$W(z) W(z') = e^{-i\omega(z, z')/2}\, W(z + z'), \qquad z \in \mathbb{C}, \quad \omega(z, z') \equiv \operatorname{Im}(\bar z z') \qquad [13]$$

The unitary operators $W(z)$ are therefore projective representations of the additive group $\mathbb{C}$. This generalizes immediately to the case of N degrees of freedom; the representation is now of the additive group $\mathbb{C}^N$ and $\omega$ is the standard symplectic form on $\mathbb{C}^N$.

In the Bloch representation, the unitaries $U(a) V(b) U^*(a) V^*(b)$ are not multiples of the identity, and have no particularly simple form. The map $\mathbb{C}^N \ni z \mapsto W(z)$ with the structure [13] is called a Weyl system; it plays a major role in QM. The following
theorem has therefore a major importance in the mathematical theory of QM.

Theorem 2 (von Neumann 1965). There exists only one, modulo unitary equivalence, irreducible representation of the Weyl system.

The proof of this theorem follows a general pattern in the theory of group representations. One introduces an algebra $W^{(N)}$ of operators

$$W_f \equiv \int f(z)\, W(z)\, dz, \qquad f \in L^1(\mathbb{C}^N)$$

called the Weyl algebra.

It is easy to see that $|W_f| = |f|_1$ and that $f \mapsto W_f$ is a linear isomorphism of algebras if one considers $W^{(N)}$ with its natural product structure and $L^1$ as a noncommutative algebra with product structure

$$(f \times g)(z) \equiv \int dz'\, f(z - z')\, g(z')\, \exp\left\{ -\frac{i}{2}\, \omega(z, z') \right\} \qquad [14]$$

So far the algebra $W^{(N)}$ is a concrete algebra of bounded operators on $L^2(\mathbb{R}^2)$. But it can also be considered an abstract $C^*$-algebra, which we still denote by $W^{(N)}$.

It is easy to see that, according to [14], if $f_0$ is chosen to be a suitable Gaussian, then $W_{f_0}$ is a projection operator which commutes with all the $W_f$'s. Moreover, $W_f W_g = \zeta(f, g)\, W_{f \times g}$ for a suitable phase factor $\zeta$. Considering the Gelfand–Neumark–Segal construction for the $C^*$-algebra $W^{(N)}$, one finds that these properties lead to a decomposition of any representation in cyclic irreducible equivalent ones, completing the proof of the theorem.

The Weyl system has a representation (equivalent to the Schrödinger one) in the space $L^2(\mathbb{R}^N, g)$, where $g$ is Gauss's measure. This allows an extension in which $\mathbb{C}^N$ is replaced by an infinite-dimensional Banach space equipped with a Gauss measure (weak distribution (Segal 1965, Gross 1972, Wiener 1938)). Uniqueness fails in this more general setting (uniqueness is strictly connected with the compactness of the unit ball in $\mathbb{C}^N$). Notice that in the Schrödinger representation (and, therefore, in any other representation) the Hamiltonian for the harmonic oscillator defines a positive self-adjoint operator

$$N \equiv \sum_{k=1}^{N} N_k, \qquad N_k = \frac{1}{2}\left( -\frac{\partial^2}{\partial x_k^2} + x_k^2 - 1 \right)$$

The spectrum of each of the commuting operators $N_k$ consists of the non-negative integers, and $N_k$ is therefore called the number operator for the kth degree of freedom. The operator $N_k$ can be written as $N_k = a_k^* a_k$, where $a_k = (1/\sqrt{2})(x_k + \partial/\partial x_k)$ and $a_k^*$ is the formal adjoint of $a_k$ in $L^2(\mathbb{R})$. One has $\| a_k (N_k + 1)^{-1/2} \| < \infty$. In the domain of $N$ these operators satisfy the following relations (canonical commutation relations)

$$[a_k, a_h^*] = \delta_{k,h}, \qquad [a_h, a_k] = 0, \qquad [N_k, a_h] = -a_h\, \delta_{h,k}, \qquad [N_h, a_k^*] = a_k^*\, \delta_{h,k} \qquad [15]$$

In view of the last two relations, the operator $a_k$ is called the annihilation operator (relative to the kth degree of freedom) and its formal adjoint is called the creation operator. The operators $a_k$ have as spectrum the entire complex plane, the operators $a_k^*$ have empty spectrum; the eigenvectors of $N_k$ are the Hermite polynomials in the variable $x_k$. The eigenvectors of $a_k$ (i.e., the solutions in $L^2(\mathbb{R})$ of the equation $a_k \phi_\lambda = \lambda \phi_\lambda$, $\lambda \in \mathbb{C}$) are called coherent states; they have a major role in the Bargmann–Fock–Segal quantization and in general in the semiclassical limit.

The operators $\{N_k\}$ generate a maximal abelian system and therefore the space $L^2(\mathbb{R}^N)$ has a natural representation as the symmetrized subspace of $\bigoplus_k (\mathbb{C}^N)^{\otimes k}$ (Fock representation). In this representation, a natural basis is given by the common eigenvectors $\phi_{\{n_k\}}$, $k = 1, \ldots, N$, of the operators $N_k$. A generic vector can be written as

$$\phi = \sum_{\{n_k\}} c_{\{n_k\}}\, \phi_{\{n_k\}}, \qquad \sum_{\{n_k\}} |c_{\{n_k\}}|^2 < \infty$$

and therefore can be represented by the sequence $c_{\{n_k\}}$. Notice that the creation operators do not create particles in $\mathbb{R}^N$ but rather act as a shift in the basis of the Hermite polynomials.

It is traditional to denote by $\Gamma(L^2(\mathbb{R}^N))$ the Fock representation (also called second quantization because for each degree of freedom the wave function is written in the quantized basis of the harmonic oscillator) and to denote by $\Gamma(A)$ the lift of a matrix $A \in B(\mathbb{C}^N)$. These notations are especially used if $\mathbb{C}^N$ is substituted with a Banach space $X$. This terminology was introduced by Segal in his work on quantization of the wave equation; it has been used ever since, mostly in a perturbative context.

In the theory of quantized fields, the space $\mathbb{C}^N$ is substituted with a Banach space, $X$, of functions. In this setting, second quantization (Segal 1965, Nelson 1974) considers the state $\phi_{\{n_k\}}$ as representing a configuration of the system in which there are precisely $n_k$ particles in the kth physical state (this presupposes having chosen a basis in the space of distributions on $\mathbb{R}^3$). There is no problem in doing this (Gross 1972) and one can choose for $X$ a suitable Sobolev space (which one depends on the Gaussian measure given in $X$) if one wants that the
generalization of the commutation relations [15] be of the form $[a(f), a^*(g)] = \langle f, g \rangle$ with a suitable scalar product $\langle \cdot\,, \cdot \rangle$ in $X$. The problem with quantization of relativistic fields is that, in order to ensure locality, one is forced to use a Sobolev space of negative index (depending on the dimension of physical space), and this gives rise to difficulties in the definition of the dynamics for nonlinear vector fields.

One should notice that in the work of Segal (1965), and then in Constructive field theory (Nelson 1974), the Fock representation is placed in a Schrödinger context exhibiting the relevant operators as acting on a space $L^2(X, g)$, where $X$ is a subspace of the space of Schwartz distributions on the physical space of the particles one wants to describe and $g$ is a suitably defined Gauss measure on $X$.

The Fock representation is related to the Bargmann–Fock–Segal representation (Bargmann 1967), a representation in a space of holomorphic functions on $\mathbb{C}^N$ square integrable with respect to a Gaussian measure. For its development, this representation relies on the properties of Toeplitz operators and on Tauberian estimates. It is much used in the study of the semiclassical limit and in the formulation of QM in systems for which the classical version has, for phase space, a manifold which is not a cotangent bundle (e.g., the 2-sphere).

Remark. The Fock representation associated with the Weyl system in the infinite-dimensional context can describe only particles obeying Bose–Einstein statistics; indeed, the states are qualified by their particle content for each element of the basis chosen and there is no possibility of identifying each particle in an N-particle state. This is obvious in the finite-dimensional case: the Hermite polynomial of order 2 cannot be seen as composed of two polynomials of order 1.

In the infinite-dimensional context, if one wants to treat particles which obey Fermi–Dirac statistics, one must rely on the Pauli exclusion principle (Pauli 1928), which states that two such particles cannot be in the same configuration; to ensure this, the wave function must be antisymmetric under permutation of the particle symbols. It is a matter of fact (and a theorem in relativistic quantum field theory, which follows in that theory from covariance, locality and positivity of the energy (Streater and Wightman 1964)) that particles with half-integer spin obey the Fermi–Dirac statistics. Therefore, to quantize such systems, one must introduce (commutation) relations different from those of Weyl. Since it must now be that $(a^*)^2 = 0$, due to antisymmetry, it is reasonable to introduce the following relations (canonical anticommutation relations):

$$\{a_k, a_h^*\} = \delta_{k,h}, \qquad \{a_h, a_k\} = 0, \qquad [N_k, a_h] = -a_h\, \delta_{h,k}, \qquad \{A, B\} \equiv AB + BA \qquad [16]$$

The Hilbert space is now $\bigotimes^N \mathcal{H}_2$, where $\mathcal{H}_2$ is a two-dimensional complex Hilbert space. Notice that $\mathcal{H}_2$ carries an irreducible two-dimensional representation of $su(2) \simeq o(3)$ (spin representation), so that this quantization associates spin 1/2 and antisymmetry.

The operators in [16] are all bounded (in fact bounded by 1 in norm). The Fock representation is constructed as in the case of Weyl (see Araki (1988)), with $n_k$ equal to 0 or 1 for each index $k$. The infinite-dimensional case is defined in the same way, and leads to inequivalent irreducible representations (Araki 1988); only in one of them is the number operator defined and bounded below. Some of these representations can be given a Schrödinger-like form, with the introduction of a gauge and an integration formalism based on a trace (Gross 1972). This system is much used in quantum statistical mechanics because it deals with bounded operators and can take advantage of strong results in the theory of $C^*$-algebras. In the finite-dimensional case (and occasionally also in the general case) it is used in quantum information (the space $\mathcal{H}_2$ is the space of a quantum bit).

Returning to the Weyl system, we now introduce the strictly related Wigner function, which plays an important role in the analysis of the semiclassical limit and in the discussion of some scaling limits, in particular the hydrodynamical limit and the Bose–Einstein condensation when $N \to \infty$.

The Wigner function $W_\psi$ for a pure state $\psi$ is a real-valued function on the phase space of the classical system which represents the state faithfully. It is defined as

$$W_\psi(x, \xi) = (2\pi)^{-n} \int_{\mathbb{R}^n} e^{-i(\xi, y)}\, \psi\!\left(x + \frac{y}{2}\right) \bar\psi\!\left(x - \frac{y}{2}\right) dy$$

The Wigner function is not positive in general (the only exceptions are those Gaussian states that satisfy $\Delta(x) \cdot \Delta(p) \ge \hbar$). But it has the interesting property that its marginals reproduce correctly the Born rule. In fact, one has $\int W_\psi(x, \xi)\, dx = |\hat\psi(\xi)|^2$. If the function $\psi(t, x)$, $x \in \mathbb{R}^n$, is a solution of the free Schrödinger equation $i\hbar\, \partial\psi/\partial t = -\hbar^2 \Delta\psi$, then its Wigner function satisfies the Liouville (transport) equation $\partial W_\psi/\partial t + \xi \cdot \nabla W_\psi = 0$.

The Wigner function is strictly linked with the Weyl quantization. This quantization associates with every function $\sigma(p, x)$ in a given regularity
class an operator $\sigma(D, x)$ (the Weyl symbol of the function $\sigma$) defined by

$$(\sigma(D, x) f, g) \equiv \int \sigma(\xi, x)\, W_{f,g}(\xi, x)\, d\xi\, dx, \qquad W_{f,g}(\xi, x) \equiv \int e^{-i(\xi, p)}\, f\!\left(x + \frac{p}{2}\right) \bar g\!\left(x - \frac{p}{2}\right) dp$$

It can be verified that the action of $F$ preserves the Schwartz classes $S$ and $S'$ and is unitary in $L^2(\mathbb{R}^{2N})$. Moreover, one has $\bar\sigma(D, x) = \sigma(D, x)^*$.

The relation between Weyl's quantization and Wigner functions can be readily seen from the natural duality between bounded operators and pure states:

$$\operatorname{tr}(\hat A \hat\rho) = \int a(p, q)\, \rho(p, q)\, dp\, dq, \qquad \rho(p, q) \equiv \int e^{i(p, q')}\, \rho(q', q)\, dq'$$

We give now a brief discussion of the general structure of a quantization, and apply it to the Weyl quantization. By quantization of a Hamiltonian system we mean a correspondence, parametrized by a small parameter $\hbar$, between classical observables (real functions on a phase space $\mathcal{F}$) and quantum observables (self-adjoint operators on a Hilbert space $\mathcal{H}$) with the property that the corresponding structures coincide in the limit $\hbar \to 0$ and the difference for $\hbar \ne 0$ can be estimated in a suitable topology.

This last requirement is important for the applications and, from this point of view, Weyl's quantization gives stronger results than the other formalisms of quantization.

We limit our analysis to the case $\mathcal{F} \equiv T^*X$, with $X \equiv \mathbb{R}^N$, and we make use of the realization of $\mathcal{H}$ as $L^2(\mathbb{R}^N)$.

Let $\{x_i\}$ be Cartesian coordinates in $\mathbb{R}^N$ and consider a correspondence $A \to \hat A$ that satisfies the following requirements:

1. $A \leftrightarrow \hat A$ is linear;
2. $x_k \leftrightarrow \hat x_k$, where $\hat x_k$ is multiplication by $x_k$;
3. $p_k \leftrightarrow -i\hbar\, \partial/\partial x_k$;
4. if $f$ is a continuous function in $\mathbb{R}^N$, one has $f(x) \leftrightarrow f(\hat x)$ and $\hat f(p) = (Ff)(\hat x)$, where $F$ denotes a Fourier transform;
5. $L_\zeta \leftrightarrow \hat L_\zeta$, $\zeta \equiv (\alpha, \beta)$, $\alpha, \beta \in \mathbb{R}^N$, where $L_\zeta$ is the generator of the translations in phase space in the direction $\zeta$ and $\hat L_\zeta$ is the generator of the one-parameter group $t \mapsto W(t\zeta)$ associated with $\zeta$ by the Weyl system.

Note that (1) and (4) imply (2) and (3) through a limit procedure.

Under the correspondence $A \leftrightarrow \hat A$, linear symplectic maps correspond to unitary transformations. This is not in general the case for nonlinear maps. One can prove that conditions (1)–(5) give a complete characterization of the map $A \leftrightarrow \hat A$. Moreover, the correspondence cannot be extended to other functions in phase space. Indeed, one has:

Theorem 3 (van Hove). Let $G$ be the class of $C^\infty$ functions on $\mathbb{R}^{2N}$ which are generators of global symplectic flows. For $g \in G$ let $\Phi_g(t)$ be the corresponding group. There cannot exist for every $g$ a correspondence $g \leftrightarrow \hat g$, with $\hat g$ self-adjoint, such that $\hat g(x, p) = g(\hat x, \hat p)$.

We described the Weyl quantization as a correspondence between functions in the Schwartz class $S$ and a class of bounded operators. Weyl's quantization can be extended to a much wider class of functions. Operators that can be so constructed are called Fourier integral operators. One uses the notation $\sigma(D, x)$.

We have the following useful theorems (Robert 1987):

Theorem 4. Let $l_1, \ldots, l_K$ be linear functions on $\mathbb{R}^N$ such that $\{l_i, l_k\} = 0$. Let $P$ be a polynomial and let $\sigma(\xi, x) \equiv P[l_1(\xi, x), \ldots, l_K(\xi, x)]$. Then
(i) $\sigma(D, x)$ maps $S$ in $L^2(\mathbb{R}^N)$ and is self-adjoint;
(ii) if $g$ is continuous, then $(g(\sigma))(D, x) = g(\sigma(D, x))$.

One proves that $\sigma(D, x)$ extends to a continuous map $S'(X) \to S'(X)$ and, moreover,

Theorem 5 (Calderón–Vaillancourt). If $\sigma_0 \equiv \sum_{|\alpha| + |\beta| \le 2N+1} |D_\xi^\alpha D_x^\beta \sigma|_\infty < \infty$, the norm of the operator $\sigma(D, x)$ is bounded by $\sigma_0$.

Any operator obtained from a suitable class of functions through Weyl's quantization is called a pseudodifferential operator. If $\sigma(q, p) = P(p)$, where $P$ is a polynomial, $\sigma(\hat p, \hat q)$ is a differential operator. Moreover, if $\sigma(p, x) \in L^2$ then $\sigma(D, x)$ is a Hilbert–Schmidt operator and

$$|\sigma(D, x)|_{HS} = (2\pi\hbar)^{-n/2} \left( \int |\sigma(z)|^2\, dz \right)^{1/2}, \qquad z = (p, x)$$

Pseudodifferential operators turn out to be very important, in particular in the quantum theory of molecules (Le Bris 2003), where adiabatic analysis and Peierls substitution rules force their use.

The next important problem in the theory of quantization is related to dynamics.

Let $\sigma$ be a quantization procedure and let $H(p, q)$ be a classical Hamiltonian on phase space. Let $A_t$ be
the evolution of a classical observable $A$ under the flow defined by $H$ and assume that $\sigma(A_t)$ is well defined for all $t$.

Is there a self-adjoint operator $\hat H$ such that $\sigma(A_t) = e^{it\hat H}\, \sigma(A)\, e^{-it\hat H}$? If so, can one estimate $|\hat H - \sigma(H)|$? Conversely, if the generator of the quantized flow is, by definition, $\hat H \equiv \sigma(H)$ (as is usually assumed), is it possible to give an estimate of the difference $|(\sigma(A_t) - (\sigma(A))_t)\phi|$ for a dense set of $\phi \in \mathcal{H}$, where $\hat A_t \equiv e^{it\hat H} \hat A\, e^{-it\hat H}$, or to estimate $|\tilde A_t - A_t|_\infty$, where $\tilde A_t$ is defined by $\sigma(\tilde A_t) = (\sigma(A))_t$? Is it possible to write an asymptotic series in $\hbar$ for the differences?

For the Weyl quantization some quantitative results have been obtained if one makes use of the semiclassical observables (Robert 1987). We shall not elaborate further on this point.

For completeness, we briefly mention another quantization procedure which is often used in mathematical physics.

Wick Quantization

This quantization assigns positive operators to positive functions, but does not preserve polynomial relations. It is strictly related to the Bargmann–Fock–Segal representation.

Call coherent state centered in the point $(y, \eta)$ of phase space the normalized solution of $(i\hat p + \hat x - i\eta - y)\, \phi_{y,\eta}(x) = 0$.

Wick's quantization of the classical observable $A$ is by definition the map $A \to \mathrm{Op}^W(A)$, where

$$\mathrm{Op}^W(A) \equiv (2\pi\hbar)^{-n} \int A(y, \eta)\, (\phi_{y,\eta}, \cdot)\, \phi_{y,\eta}\, dy\, d\eta$$

One can prove, either directly or going through Weyl's representation, that

1. if $A \ge 0$ then $\mathrm{Op}^W_\hbar(A) \ge 0$;
2. the Weyl symbol of the operator $\mathrm{Op}^W_\hbar(A)$ is $(\pi\hbar)^{-n} \int\!\!\int A(y, \eta)\, e^{-\frac{1}{\hbar}[(x - y)^2 + (\xi - \eta)^2]}\, dy\, d\eta$;
3. for every $A \in O(0)$ one has $\|\mathrm{Op}^W_\hbar(A) - \hat A\| = O(\hbar)$.

Wick's quantization associates with every vector $\phi \in \mathcal{H}$ a positive Radon measure $\mu_\phi$ in phase space, called the Husimi measure. It is defined by $\int A\, d\mu_\phi = (\mathrm{Op}^W_\hbar(A)\phi, \phi)$, $A \in S(z)$. Wick's quantization is less adapted to the treatment of nonrelativistic particles; in particular, Ehrenfest's rule does not apply, and the semiclassical propagation theorem has a more complicated formulation. It is very much used for the analysis in Fock space in the theory of quantized relativistic fields, where a special role is assigned to Wick ordering, according to which the polynomials in $\hat x_\hbar$ and $\hat p_\hbar$ are reordered in terms of creation and annihilation operators by placing all creation operators to the left.

We now come back to Schrödinger's equation and notice that it can be derived within Heisenberg's formalism and Weyl's quantization scheme from the Hamiltonian of an N-particle system in Hamiltonian mechanics (at least if one neglects spin, which has no classical analog).

Apart from (often) inessential parameters, the Schrödinger equation for N scalar particles in $\mathbb{R}^3$ can be written as

$$i\hbar\, \frac{\partial\phi}{\partial t} = \sum_{k=1}^{N} (i\hbar\nabla_k + A_k)^2 \phi + V\phi \equiv H\phi, \qquad \phi \in L^2(\mathbb{R}^{3N}) \qquad [17]$$

where $A_k$ are vector-valued functions (vector potentials) and $V = \sum V_k(x_k) + \sum V_{i,k}(x_i - x_k)$ are scalar-valued functions (scalar potentials) on $\mathbb{R}^3$.

Typical problems in Schrödinger's quantum mechanics are:

1. Self-adjointness of $H$, existence of bound states (discrete spectrum of the operator), their number and distribution, and, in general, the properties of the spectrum.
2. Existence, completeness, and continuity properties of the wave operators

$$W_\pm \equiv \operatorname{s-lim}_{t \to \pm\infty} e^{itH_0}\, e^{-itH} \qquad [18]$$

and the ensuing existence and properties of the S-matrix and of the scattering cross sections. In [18] $H_0$ is a suitable reference operator, usually $-\Delta$ (with periodic boundary conditions if the potentials are periodic in space), for which Schrödinger's equation can be somewhat analytically controlled.
3. Existence and properties of a semiclassical limit.

In [17] and [18] we have implicitly assumed that $H$ is time independent; very interesting problems arise when $H$ depends on time, in particular if it is periodic or quasiperiodic in time, giving rise to ionization phenomena. In the periodic case, one is helped by Floquet's theory, but even in this case many interesting problems are still unsolved.

If the potentials are sufficiently regular, the spectrum of $H$ consists of an absolutely continuous part (made up of several bands in the space-periodic case) and a discrete part, with few accumulation points.

On the contrary, if $V(x, \omega)$ is a measurable function on some probability space $\Omega$, with a suitable distribution (e.g., Gaussian), the spectrum may have totally different properties almost surely.
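The contrast between regular and random potentials can be illustrated numerically on a toy model. The sketch below is not taken from the article: it assumes a discrete one-dimensional analogue of the Schrödinger operator, $\psi_{n+1} + \psi_{n-1} + V_n \psi_n = E \psi_n$, with $V_n$ drawn independently and uniformly from an interval (an illustrative choice of distribution). It estimates the Lyapunov exponent, that is, the exponential growth rate of generic solutions of this recursion. The function name `lyapunov_exponent` and all parameter values are hypothetical. A rate close to zero is what one expects inside the band of the free operator; a strictly positive rate is the usual numerical signature of exponentially localized eigenfunctions.

```python
import math
import random


def lyapunov_exponent(energy, disorder, n_sites=20000, seed=0):
    """Estimate the growth rate of solutions of
    psi[n+1] + psi[n-1] + V[n] * psi[n] = energy * psi[n],
    with V[n] uniform in [-disorder/2, disorder/2] (illustrative assumption)."""
    rng = random.Random(seed)
    prev, curr = 0.0, 1.0  # arbitrary nonzero initial data, norm 1
    log_growth = 0.0
    for _ in range(n_sites):
        v = disorder * (rng.random() - 0.5)
        # One step of the recursion (a 2x2 transfer matrix applied to the pair).
        prev, curr = curr, (energy - v) * curr - prev
        # Renormalize at every step to avoid overflow, accumulating the log.
        norm = math.hypot(prev, curr)
        log_growth += math.log(norm)
        prev, curr = prev / norm, curr / norm
    return log_growth / n_sites


if __name__ == "__main__":
    print("free case     :", lyapunov_exponent(0.5, 0.0))
    print("random case   :", lyapunov_exponent(0.5, 3.0))
```

In the free case the solutions stay bounded for energies inside the band, so the estimate tends to zero as the number of sites grows; with disorder switched on the estimate stabilizes at a positive value.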
For example, in the case $N = 1$ (so that the terms $V_{i,j}$ are absent) in one and two spatial dimensions the spectrum is pure point and dense, with eigenfunctions which decrease at infinity exponentially fast (although not uniformly); as a consequence, the evolution group does not give rise to a dispersive motion. The same is true in three dimensions if the potential is sufficiently strong and the kinetic energy content of the initial state is sufficiently limited. This very interesting behavior is due roughly to the randomness of the barriers generated by the potential and is also present, to a large extent, for potentials quasiperiodic in space (Pastur and Figotin 1992).

In these as well as in most problems related to Schrödinger's equation, a crucial role is taken by the resolvent operator $(H - \lambda I)^{-1}$, where $\lambda$ is any complex number outside the spectrum of $H$; many of the results are obtained when the difference $(H - \lambda I)^{-1} - (H_0 - \lambda I)^{-1}$ is a compact operator.

Problems of type (1) and (2) are of great physical interest, and are of course common with theoretical physics and quantum chemistry (Le Bris 2003), although the instruments of investigation are somewhat different in mathematical physics. The semiclassical limit is often more of theoretical interest, but its analysis has relevance in quantum chemistry and its methods are very useful whenever it is convenient to use multiscale methods, as in the study of molecular spectra.

We start with a brief description of point (3); it provides a valid instrument in the description of quantum-mechanical systems at a scale where it is convenient to use units in which the physical constant $\hbar$ has a very small value ($\hbar \simeq 10^{-27}$ in CGS units). From Heisenberg's commutation relations, $[\hat x, \hat p] \subset i\hbar I$, it follows that the product of the dispersions (uncertainties) of the position and momentum variables is proportional to $\hbar$ and therefore at least one of these two quantities must have very large values (compared to $\hbar$). One considers usually the case in which these dispersions have comparable values, which are therefore very small, of the order of magnitude $\hbar^{1/2}$ (but very large as compared with $\hbar$).

In order to make connection with the Hamilton–Jacobi formalism of classical mechanics one can also consider the case in which the dispersion in momentum is of the order $\hbar$ (the WKB method).

The semiclassical limit takes advantage mathematically of the fact that the parameter $\hbar$ is very small in natural units, and performs an asymptotic analysis, in which the terms of lowest order are exactly described and the difference is estimated.

The problem one faces is that the Schrödinger equation becomes, in the mathematical limit $\hbar \to 0$, a very singular PDE (the coefficients of the differential terms go to zero in this limit). Dividing each term of the equation by $\hbar$ (because we do not want to change the scale of time) leads, in the case of one quantum particle in $\mathbb{R}^3$ in a potential field $V(x)$ (we treat, for simplicity, only this case), to the equation

$$i\, \frac{\partial\phi(x, t)}{\partial t} = -\hbar\, \Delta\phi(x, t) + \hbar^{-1} V(x)\, \phi(x, t) \qquad [19]$$

It is convenient therefore to rescale the spatial variables by a factor $\hbar^{1/2}$ (i.e., choose different units), setting $x = \sqrt{\hbar}\, X$, and look for solutions of [19] which remain regular in the limit $\hbar \to 0$ as functions of the rescaled variable $X$. One searches therefore for solutions that on the physical scale have support that becomes vanishingly small in the limit. It is therefore not surprising that, in the limit, these solutions may describe point particles; the main result of semiclassical analysis is that the coordinates of these particles obey Hamilton's laws of classical mechanics.

This can be roughly seen as follows (accurate estimates are needed to make this empirical analysis precise). Using multiscale analysis, one may write the solution in the form $\phi(X, x, t)$ and seek solutions which are smooth in $X$ and $x$. Both terms on the right-hand side of [19] contain contributions of order $-2$ and $-1$ in $\sqrt{\hbar}$, and in order to have regular solutions one must have cancellations between equally singular contributions. For this, one must perform an expansion to the second order of the potential (assumed at least twice differentiable) around a suitable trajectory $q(t)$, $q \in \mathbb{R}^3$, and choose this trajectory in such a way that the cancellations take place.

A formal analysis shows that this is achieved only if the trajectory chosen is precisely a solution of the classical Lagrange equations. Of course, a more refined analysis and good estimates are needed to make this argument precise, and to estimate the error that is made when one neglects in the resulting equation terms of order $\sqrt{\hbar}$; in favorable cases, for each chosen $T$ the error in the solution for most initial conditions of the type described is of order $\sqrt{\hbar}$ for $|t| < T$.

This semiclassical result is most easily visualized using the formalism of Wigner functions (the technical details, needed to make the formal arguments into a proof, take advantage of regularity estimates in the theory of functions).

In natural units, one defines

$$W^{\hbar}_{\psi}(x, \xi; t) \equiv \left( \frac{i}{2\pi\hbar} \right)^{N} W_\psi\!\left(x, \frac{\xi}{\hbar}; t\right)$$
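To make the concentration of the rescaled Wigner function concrete, the following sketch evaluates it numerically for a Gaussian wave packet of width $\sqrt{\hbar}$ and compares the result with a closed form. Several choices here are assumptions rather than the article's: the normalization $(2\pi\hbar)^{-1} \int e^{-i\xi y/\hbar}\, \psi(x + y/2)\, \bar\psi(x - y/2)\, dy$ in one dimension (a common convention, which may differ from the definition above by constant factors), the particular Gaussian $\psi(x) = (\pi\hbar)^{-1/4} e^{-x^2/2\hbar}$, and the function names. For this state a direct Gaussian integration gives $W = (\pi\hbar)^{-1} e^{-(x^2 + \xi^2)/\hbar}$, which concentrates at a single point of phase space as $\hbar \to 0$.

```python
import cmath
import math


def gaussian_state(x, hbar):
    """Normalized real Gaussian wave packet centered at 0 with width sqrt(hbar)."""
    return (math.pi * hbar) ** -0.25 * math.exp(-x * x / (2.0 * hbar))


def wigner(x, xi, hbar, n_steps=4000):
    """Trapezoidal approximation of
    (2*pi*hbar)^-1 * int exp(-i*xi*y/hbar) psi(x + y/2) conj(psi(x - y/2)) dy.
    The state is real, so the complex conjugate is omitted."""
    half_width = 12.0 * math.sqrt(hbar)  # integrand is negligible beyond this
    dy = 2.0 * half_width / n_steps
    total = 0.0 + 0.0j
    for k in range(n_steps + 1):
        y = -half_width + k * dy
        weight = 0.5 if k in (0, n_steps) else 1.0
        phase = cmath.exp(-1j * xi * y / hbar)
        total += weight * phase * gaussian_state(x + y / 2, hbar) * gaussian_state(x - y / 2, hbar)
    return (total * dy / (2.0 * math.pi * hbar)).real


def wigner_exact(x, xi, hbar):
    """Closed form for the Gaussian state above, under the same convention."""
    return math.exp(-(x * x + xi * xi) / hbar) / (math.pi * hbar)


if __name__ == "__main__":
    for hbar in (1.0, 0.1, 0.01):
        print(hbar, wigner(0.0, 0.0, hbar), wigner_exact(0.0, 0.0, hbar))
```

The peak value grows like $1/(\pi\hbar)$ while the total phase-space integral stays equal to one, which is the sense in which the function approaches a point mass.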
In terms of the Wigner function $W^\hbar$, the Schrödinger equation [19] takes the form

$$\frac{\partial f^\hbar}{\partial t} + \xi \cdot \nabla_x f^\hbar + K_\hbar * f^\hbar = 0, \qquad f^\hbar(t = 0) = f^\hbar_0 \qquad [20]$$

where

$$K_\hbar = \frac{i}{(2\pi)^N}\, e^{-i(\xi, y)}\, \hbar^{-1} \left[ V\!\left(x + \frac{\hbar y}{2}\right) - V\!\left(x - \frac{\hbar y}{2}\right) \right]$$

It can be proved (Robert 1987) that if the potential is sufficiently regular and if the initial datum converges in a suitable topology to a positive measure $f_0$, then, for all times, $W^{\hbar}_{\psi}(x, \xi; t)$ converges to a (weak) solution of the Liouville equation

$$\frac{\partial f}{\partial t} + \xi \cdot \nabla_x f - \nabla V(x) \cdot \nabla_\xi f = 0$$

This leads to the semiclassical limit if, for example, one considers a sequence of initial data $\phi_n$, where $\phi_n$ is a sequence of functions centered at $x_0$ with

success. We give here a very naive introduction to these problems and refer the reader to the more specialized contributions to this Encyclopedia for a rigorous analysis and exact statements.

Of course, most of the problems of physical interest are not exactly solvable, in the sense that rarely is the final result given explicitly in terms of simple functions. As a consequence, exact numerical results, to be compared with experimental data, are rarely obtained in physically relevant problems, and most often one has to rely on approximation schemes with (in favorable cases) precise estimates on the error.

Formal perturbation theory is the easiest of such schemes, but it seldom gives reliable results for physically interesting problems. One writes

$$H_\epsilon \equiv H + \epsilon V \qquad [21]$$

where $\epsilon$ is a small real parameter, and sets a formal scheme in case (1) by writing

$$\sum^{\infty} \ldots \qquad \sum^{\infty} \ldots$$
can a priori guess the presence of coordinates which have a rapid dependence on time (fast variables) and a complementary set of coordinates whose dependence on time is slow. This suggests that one can try an asymptotic analysis, often in connection with adiabatic techniques. One seldom deals with cases in which the hypotheses of elementary adiabatic theorems are satisfied, and one has to refine the analysis, mostly through subtle estimates which ensure the existence of quasi-invariant subspaces.

Asymptotic techniques and refined estimates are also needed to study the effective description of a system of $N$ interacting identical particles when $N$ becomes very large; for example, in statistical mechanics, one searches for results which are valid when $N \to \infty$.

The most spectacular results in this direction are the proof of stability of matter by E Lieb and collaborators, and the study of the phenomenon of Bose–Einstein condensation and the related Gross–Pitaevskii (nonlinear Schrödinger) equation. The experimental discovery of the state of matter corresponding to a Bose–Einstein condensate is clear evidence of the nonclassical behavior of matter even at a comparatively macroscopic size. From the point of view of mathematical physics, the ongoing research in this direction is very challenging.

One should also recognize the increasing role that research in QM is taking in applications, also in connection with the increasing success of nanotechnology. In this respect, from the point of view of mathematical physics, the study of nanostructures (quantum-mechanical systems constrained to very small regions of space or to lower-dimensional manifolds, such as sheets or graphs) is still in its infancy and will require refined mathematical techniques and most likely entirely new ideas.

Finally, one should stress the important role played by numerical analysis (Le Bris 2003) and especially computer simulations. In problems involving very many particles, present-day analytical techniques provide at most qualitative estimates and in favorable cases bounds on the value of the quantities of interest. Approximation schemes are not always applicable and often are not reliable. Hints for progress in the mathematical treatment of some relevant physical phenomena of interest in QM (mostly in condensed matter physics) may come from the ab initio analysis made by simulations on large computers; this may provide a qualitative and, to a certain extent, quantitative behavior of the solutions of Schrödinger's equation corresponding to typical initial conditions. In recent times the availability of more efficient computing tools has made computer simulation more reliable and more apt to concur with mathematical investigation to a fuller comprehension of QM.

Interpretation Problems

In this section we describe some of the conceptual problems that plague present-day QM and some of the attempts that have been made to cure these problems, either within its formalism or with an altogether different approach.

Approaches within the QM Formalism

We begin with the approaches from within. We have pointed out that the main obstacle in the measurement problem is the description of what occurs during an act of measurement. Axiom III claims that it must be seen as a destructive act, and the outcome is to some extent random. The final state of the system is one of the eigenstates of the observable, and the dependence on the initial state is only through an a priori probability assignment; the act of measurement is therefore not a causal one, contrary to the (continuous) causal reversible description of the interaction with the environment. One should be able to distinguish a priori the acts of measurement from a generic interaction.

There is a further difficulty. Due to the superposition principle, if a system $S$ on which we want to make a measurement of the property associated with the operator $A$ interacts with an instrument $I$ described by the operator $S$, the final state $\Psi$ of the combined system will be a coherent superposition of tensor products of (normalized) eigenstates of the two systems

$$\Psi = \sum_{n,m} c_{n,m}\, \phi^A_n \otimes \phi^S_m, \qquad \sum_{n,m} |c_{n,m}|^2 = 1 \qquad [23]$$

Measurement as described by Axiom III of QM claims that once the measurement is over, the measured system is, with probability $\sum_m |c_{n,m}|^2$, in the state $\phi^A_n$ and the instrument is in a state which carries the information about the final state of the system (after all, what one reads at the end is an indicator of the final state of the instrument).

It is therefore convenient to write $\Psi$ in the form

$$\Psi = \sum_n d_n\, \phi^A_n \otimes \zeta_n, \qquad \sum_n |d_n|^2 = 1 \qquad [24]$$

(this defines $\zeta_n$ if the spectrum of $A$ is pure point and nondegenerate). It is seen from [24] that, due to the reduction postulate, we know that the measured system is in the state $\phi^A_{n_0}$ if a measurement of an observable $T$ with nondegenerate spectrum,
eigenvectors $\{\zeta_n\}$, and eigenvalues $\{z_n\}$ gives the result $z_{n_0}$.

Along these lines, one does not solve the measurement problem (the outcome is still probabilistic) but at least one can find the reason why the measuring apparatus may be considered classical.

It is more convenient to go back to [23] and to assume that one is able to construct the measuring apparatus in such a way that one divides (roughly) its pure (microscopic) states into sets $\Sigma_n$ (each corresponding to a macroscopic state) which are (roughly) in one-to-one correspondence with the eigenstates of $A$. The sets $\Sigma_n$ contain a very large number, $N_n$, of elements, so that the sets $\Sigma_n$ need not be given with extreme precision. And the sets $\Sigma_n$ must be in a sense stable under small external perturbations.

It is clear from this rough description that the apparatus should contain a large number of small components and still its interaction with the small system $A$ should lead to a more or less sudden change of the sets $\Sigma_n$.

A concrete model of this mechanism has been proposed by K Hepp (1972) for the case when $A$ is a $2 \times 2$ matrix, and the measuring apparatus is made of a chain of $N$ spins, $N \to \infty$; the analysis was recently completed by Sewell (2005) with an estimate on the error which is made if $N$ is finite but large. This is a dynamical model, in which the observable $A$ (a spin) interacts with a chain of spins (moves over the spins) leaving the trace of its passage. It is this trace (final macroscopic state of the apparatus) which is measured and associated with the final state of $A$. The interaction is not instantaneous but may require a very short time, depending on the parameters used to describe the apparatus and the interaction.

We call decoherence the weakening of the superposition principle due to the interaction with the environment.

Two different models of decoherence have been analyzed in some detail; we shall call them the thermal-bath model and the scattering model; both are dynamical models and both point to a solution, to various extents, of the problem of the reduction to a final density matrix which commutes with the operator $A$ (and therefore to the suppression of the interference terms).

The thermal-bath model makes use of the Heisenberg representation and relies on results of the theory of $C^*$-algebras. This approach is closely linked with (quantum) statistical mechanics; its aim is to prove, after conditioning with respect to the degrees of freedom of the bath, that a special role emerges for a commuting set of operators of the measured system, and these are the observables that specify the outcome of the measurement in probabilistic terms.

The scattering approach relies on the Schrödinger approach to QM, and on results from the theory of scattering. This approach describes the interaction of the system $S$ (typically a heavy particle) with an environment made of a large number of light particles and seeks to describe the state of $S$ after the interaction when one does not have any information on the final state of the light particles. One seeks to prove that the reduced density matrix is (almost) diagonal in a given representation (typically the one given by the spatial coordinates). This defines the observable (typically, position) that can be measured and the probability of each outcome.

Both approaches rely on the loss of information in the process to cancel the effect of the superposition principle and to bring the measurement problem within the realm of classical probability theory. Neither of them provides a causal dependence of the result of the measurement on the initial state of the system.

We describe these attempts only very briefly.

In its most basic form, the scattering approach has as starting point the Schrödinger equation for a system of two particles, one of which has mass very much smaller than the other one. The heavy particle may be seen as representing the system on which a measurement is being made. The outline of the method of analysis (which in favorable cases can be made rigorous) (Joos and Zeh 1985, Tegmark 1993) is the following. One chooses units in which the mass of the heavy particle is 1, and one denotes by $\epsilon$ the mass of the light particle. If $x$ is the coordinate of the heavy particle and $y$ that of the light one, and if the initial state of the system is denoted by $\Psi_0(x, y)$, the solution of the equation for the system is (apart from inessential factors)

$$\Psi_t = \exp\{\, i\,(\Delta_x + \epsilon^{-1}\Delta_y - W(x) - V(x - y))\, t \,\}\, \Psi_0$$

Making use of center-of-mass and relative coordinates, one sees that when $\epsilon$ is very small one should be able to describe the system on two timescales, one fast (for the light particle) and one slow (for the heavy one) and, therefore, place oneself in a setting which may allow the use of adiabatic techniques. In this setting, for the measurement of the heavy particle (e.g., of its position) one may be allowed to consider the light particle in a scattering regime, and use the wave operator corresponding to a potential $V_x(y) \equiv V(y - x)$.

Taking the partial trace with respect to the degrees of freedom of the light particle (this
corresponds to no information on its final state) one finds, at least heuristically, that the state of the heavy particle is now described (due to the trace operation) by a density matrix $\rho$ for which in the coordinate representation the off-diagonal terms $\rho_{x,x'}$ are slightly suppressed by a factor $\zeta_{x,x'} = (W_x \phi, W_{x'} \phi)$ where $\phi$ represents the initial state of the light particle and $W_x$ is the wave operator for the motion of the light particle in the potential $V_x$.

One must assume that the function $\psi$ which represents the initial state of the heavy particle is sufficiently localized so that $\zeta_{x,x'} < 1$ for every $x' \neq x$ in its support.

If the environment is made of very many particles (their number $N(\epsilon)$ must be such that $\lim_{\epsilon \to 0} N(\epsilon) = \infty$) and the heavy particle can be supposed to have separate interactions with all of them, the off-diagonal elements of the density matrix tend to 0 as $\epsilon \to 0$ and the resulting density matrix tends to have the form $\rho(x, x') = \delta(x - x')\,\rho(x)$, $\rho(x) \geq 0$, $\int \rho(x)\,dx = 1$. If it can be supposed that all interactions take place within a time $T(\epsilon) \simeq \epsilon^{\gamma}$, $\gamma > 0$, one has $\rho(x) = |\psi(x)|^2$.

If the interactions are not independent, the analysis becomes much more involved since it has to be treated by many-body scattering theory; this suggests that the scattering approach can hardly be used in the context of the thermal-bath model. In any case, the selection of a preferred basis (the coordinate representation) depends on the fact that one is dealing with a scattering phenomenon. A few steps have been made toward a rigorous analysis (Teta 2004) but we are very far from a mathematically satisfactory answer.

The thermal-bath approach has been studied within the algebraic formulation of QM and stands on good mathematical ground (Alicki 2002, Blanchard et al. 2003, Sewell 2005). Its drawback is that it is difficult to associate the formal scheme with actual physical situations and it is difficult to give a realistic estimate of the decoherence time.

The thermal-bath approach attributes the decoherence effect to the practical impossibility of distinguishing between a vast majority of the pure states of the systems and the corresponding statistical mixtures. In this approach, the observables are represented by self-adjoint elements of a weakly closed subalgebra $\mathcal{M}$ of all bounded operators $B(H)$ on a Hilbert space $H$. This subalgebra may depend on the measuring apparatus (i.e., not all the apparatuses are fit to measure a set of observables).

A classical observable by definition commutes with all other observables and therefore must belong to the center of $\mathcal{M}$, which is isomorphic to a collection of functions on a probability space $M$. So the appearance of classical properties of a quantum system corresponds to the emergence of an algebra with nontrivial center. Since automorphic evolutions of an algebra preserve its center, this program can be achieved only if we admit the loss of quantum coherence, and this requires that the quantum systems we describe are open and interact with the environment, and moreover that the commutative algebra which emerges be stable for time evolution.

It may be shown that one must consider a quantum environment in the thermodynamic limit, that is, consider the interaction of the system to be measured with a thermal bath. A discussion of the possible emergence of classical observables and of the corresponding dynamics is given by Gell-Mann (1993). In all these approaches, the commutative subalgebra is selected by the specific form of the interaction; therefore, the measuring apparatus determines the algebra of classical observables.

On the experimental side, a number of very interesting results have been obtained, using very refined techniques; these experiments usually also determine the decoherence time. The experiments, both for the collision model (Hornberger et al. 2003) and for the thermal-bath model (Hackermueller et al. 2004), are done mostly with fullerene (a molecule which is heavy enough and is not deflected too much after a collision with a particle of the gas). They show a reasonable accordance with the (rough) theoretical conclusions.

The most refined experiments about decoherence are those connected with quantum optics (circularly polarized atoms in superconducting cavities). These are not related to the wave nature of the particles but in a sense to the wave nature of a photon as a single unit. The electromagnetic field is now regarded as an incoherent superposition of states with an arbitrarily large number of photons. Polarized photons can be produced one by one, and they retain their individuality and their polarization until each of them interacts with the environment (e.g., the boundary of the cavity or a particle of the gas). In a sense, these experimental results refer to a decoherence by collision theory. The experiments by Haroche (2003) prove that coherence may persist for a measurable interval of time and are the most controlled experiments on coherence so far.

Other Approaches

We end this section with a brief discussion of the problem of hidden variables and a presentation of an entirely different approach to QM, originated by
D Bohm (1952) and put recently on firm mathematical grounds by Duerr et al. (1999). The approach is radically different from the traditional one and it is not clear at present whether it can give a solution to the measurement problem and a description of all the phenomena which traditional QM accounts for. But it is very interesting from the point of view of the mathematics involved.

We have remarked that the formulation of QM that is summarized in the three axioms given earlier has many unsatisfactory aspects, mainly connected with the superposition principle (described in its extremal form by the Schrödinger's cat paradox) and with the problem of measurement which reveals, for example, through the Einstein–Podolsky–Rosen paradox, an intrinsic nonlocality if one maintains that objective properties can be attributed to systems which are far apart. From the very beginning of QM, attempts have been made to attribute these features to the presence of hidden variables; the statistical nature of the predictions of QM is, from this point of view, due to the incompleteness of the parameters used to describe the systems. The impossibility of matching the statistical predictions of QM (confirmed by experimental findings) with a local theory based on hidden variables and classical probability theory has been known for some time (Kochen and Specker 1967), also through the use of Bell inequalities (Bell 1964) among correlations of outcomes of separate measurements performed on entangled systems (mainly two photons or two spin-1/2 particles created in a suitable entangled state).

A proof of the intrinsic nonlocality of QM (in the above sense) was given by L Hardy (see Haroche (2003)).

While experimental results prove that one cannot substitute QM with a naive theory of hidden variables, more refined attempts may have success. We shall only discuss the approach of Bohm (following a previous attempt by de Broglie) as presented in Duerr et al. (1999). It is a dynamical theory in which representative points follow classical paths and their motion is governed by a time-dependent vector velocity field (in this sense, it is not Newtonian). In a sense, Bohmian mechanics is a minimal completion of QM if one wants to keep the position as primitive observable. To these primitive objects, Bohm's theory adds a complex-valued function $\psi$ (the guiding wave in Bohm's terminology) defined on the configuration space $Q$ of the particles. In the case of particles with spin, the function is spinor-valued. Dynamics is given by two equations: one for the coordinates of the particles and one for the guiding wave. If $x \equiv \{x_1, \dots, x_N\}$ describes the configuration of the points, the dynamics in a potential field $V(x)$ is described in the following way: for the wave by a nonrelativistic Schrödinger equation with potential $V$ and for the coordinates by the ordinary differential equation (ODE)

$$\dot{x}_k = \frac{\hbar}{m_k}\, \mathrm{Im}\, \frac{\nabla_k \psi}{\psi}(x), \qquad x_k \in \mathbb{R}^3$$

where $m_k$ is the mass of the $k$th particle.

Notice that the vector field is singular at the zeros of the wave function, therefore global existence and uniqueness must be proved. To see why Bohmian mechanics is empirically equivalent to QM, at least for measurement of position, notice that a density of points transported by this equation satisfies the continuity equation of QM. It follows that if one has at time zero a collection of points distributed with density $|\psi_0|^2$, the density at time $t$ will be $|\psi(t)|^2$ where $\psi(t)$ is the solution of the Schrödinger equation with initial datum $\psi_0$.

Bohm (1952) formulated the theory as a modification of Newton's laws (and in this form it has been widely used) through the introduction of a quantum potential $V_Q$. This was achieved by writing the wave function in its polar form $\psi = R e^{iS/\hbar}$ and writing the real part of the Schrödinger equation as a modified Hamilton–Jacobi equation. The version of Bohm's theory discussed in Duerr et al. (1999) introduces only the guiding wave function and the coordinates of the points, and puts the theory on firm mathematical grounds. Through an impressive series of mathematical results, these authors and their collaborators deal with the completeness of the velocity vector field, the asymptotic behavior of the points' trajectories (both for the scattering regime and for the trapped trajectories, which are shown to correspond to bound states in QM), with a rigorous analysis of the theorem on the flux across a surface (a cornerstone in scattering theory) and the detailed analysis of the two-slit experiment through a study of the interaction with the measuring apparatus. The theory is completely causal, both for the trajectories of the points and for the time development of the pilot wave, and can also accommodate points with spin. It leads to a mathematically precise formulation of the semiclassical limit, and it may also resolve the measurement problem by relating the pilot wave of the entire system to its approximate decomposition into an incoherent superposition of pilot waves associated with the particle and with the measuring apparatus (this would be the way to see the collapse of the wave function in QM). A weak point of this approach is the relation of the representative points with observable quantities.
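The polar-form computation mentioned in connection with the quantum potential can be sketched as follows for a single particle of mass $m$ (a simplifying assumption made only for this sketch; $R$, $S$ and $V_Q$ are the symbols of the text). Substituting $\psi = R e^{iS/\hbar}$ into the Schrödinger equation and separating real and imaginary parts should give:

```latex
\begin{align*}
i\hbar\,\partial_t \psi &= -\frac{\hbar^2}{2m}\,\Delta\psi + V\psi,
  \qquad \psi = R\,e^{iS/\hbar},\\
\text{imaginary part:}\quad
  & \partial_t R^2 + \nabla\cdot\Big(R^2\,\frac{\nabla S}{m}\Big) = 0,\\
\text{real part:}\quad
  & \partial_t S + \frac{|\nabla S|^2}{2m} + V + V_Q = 0,
  \qquad V_Q = -\frac{\hbar^2}{2m}\,\frac{\Delta R}{R},\\
\text{velocity field:}\quad
  & \dot{x} = \frac{\nabla S}{m} = \frac{\hbar}{m}\,\mathrm{Im}\,\frac{\nabla\psi}{\psi}.
\end{align*}
```

The imaginary part is the continuity equation for the density $R^2 = |\psi|^2$ transported by the velocity field $\nabla S / m$, while the real part has the form of a Hamilton–Jacobi equation with the additional term $V_Q$.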
Streater RF and Wightman AS (1964) PCT, Spin and Statistics and All That. New York–Amsterdam: Benjamin.
Takesaki M (1971) One parameter automorphism groups and states of operator algebras. Actes du Congrès International des Mathématiciens Nice, 1970, Tome 2, pp. 427–432. Paris: Gauthier Villars.
Teta A (2004) On a rigorous proof of the Joos–Zeh formula for decoherence in a two-body problem. Multiscale Methods in Quantum Mechanics, pp. 197–205. Trends in Mathematics. Boston: Birkhäuser.
Weyl H (1931) The Theory of Groups and Quantum Mechanics. New York: Dover.
Wiener N (1938) The homogeneous chaos. American Journal of Mathematics 60: 897–936.
Wigner EP (1952) Die Messung quantenmechanischer Operatoren. Zeitschrift für Physik 133: 101–108.
Yafaev DR (1992) Mathematical scattering theory. Translations of Mathematical Monographs. Providence, RI: American Mathematical Society.
Zeh HD (1970) On the interpretation of measurement in quantum theory. Foundations of Physics 1: 69–76.
Zurek WH (1982) Environment induced superselection rules. Physical Review D 26(3): 1862–1880.
Definition 4 A set equipped with a topology is called a topological space (with respect to the given topology). Elements of a topological space are sometimes called points.

Definition 5 Let $x \in X$. A neighborhood of $x$ is a subset of $X$ containing an open set which contains $x$.

Remark This seems a clumsy definition, but turns out to be more useful in the general case than restricting to open neighborhoods, which is often done.

Definition 6 A subcollection of open sets $\mathcal{B} \subset \mathcal{T}$ is called a basis for the topology $\mathcal{T}$ if every open set is a union of sets of $\mathcal{B}$.

Definition 7 A subcollection of open sets $\mathcal{S} \subset \mathcal{T}$ is called a sub-basis for the topology $\mathcal{T}$ if every open set is a union of finite intersections of sets of $\mathcal{S}$.

Definition 8 The closure $\bar{A}$ of a subset $A$ of $X$ is the smallest closed set containing $A$.

Definition 9 The interior $A^\circ$ of a subset $A$ of $X$ is the largest open set contained in $A$.

Remark It is sometimes useful to define the boundary of $A$ as the set $\bar{A} \setminus A^\circ = \{x \in \bar{A},\ x \notin A^\circ\}$.

Definition 10 Let $A$ be a subset of a topological space $X$. A point $x \in X$ is called a limit point of $A$ if every open set containing $x$ contains some point of $A$ other than $x$.

Definition 11 A subset $A$ of $X$ is said to be dense in $X$ if $\bar{A} = X$.

Definition 12 A topological space $X$ is called a Hausdorff space if for any two distinct points $x, y \in X$, there exist an open neighborhood $A$ of $x$ and an open neighborhood $B$ of $y$ such that $A$ and $B$ are disjoint (that is, $A \cap B = \emptyset$).

Remark and Examples

(i) This is looking more like what we expect. However, certain mildly non-Hausdorff spaces turn out to be quite useful, for example, in twistor theory. A pocket furnishes such an example. Explicitly, consider $X$ to be the subset of the real plane consisting of the interval $[-1, 1]$ on the $x$-axis, together with the interval $[0, 1]$ on the line $y = 1$, where the following pairs of points are identified: $(x, 0) \sim (x, 1)$, $0 < x \leq 1$. Then the two points $(0, 0)$ and $(0, 1)$ do not have any disjoint neighborhoods. Strictly speaking, one needs the notion of a quotient topology, introduced below.

(ii) For a more truly non-Hausdorff topology, consider the space of positive integers $\mathbb{N} = \{1, 2, 3, \dots\}$, and take as open sets the following: $\emptyset$, $\mathbb{N}$, and the sets $\{1, 2, \dots, n\}$ for each $n \in \mathbb{N}$. This space is neither Hausdorff nor compact (see later for definition of compactness).

Definition 13 Let $X$ and $Y$ be two topological spaces and let $f : X \to Y$ be a map from $X$ to $Y$. We say that $f$ is continuous if $f^{-1}(A)$ is open (in $X$) whenever $A$ is open (in $Y$).

Remark Continuity is the single most important concept here. In this general setting, it looks a little different from the $\epsilon$–$\delta$ definition, but this latter works only for metric spaces, which we shall come to shortly.

Definition 14 A map $f : X \to Y$ is a homeomorphism if it is a continuous bijective map such that its inverse $f^{-1}$ is also continuous.

Remark Homeomorphisms are the natural maps for topological spaces, in the sense that two homeomorphic spaces are indistinguishable from the point of view of topology. Topological invariants are properties of topological spaces which are preserved under homeomorphisms.

Definition 15 Let $B \subset A$. Then one can define the relative topology of $B$ by saying that a subset $C \subset B$ is open if and only if there exists an open set $D$ of $A$ such that $C = D \cap B$.

Definition 16 A subset $B \subset A$ equipped with the relative topology is called a subspace of the topological space $A$.

Remark Thus, if for subsets of the real line, we consider $A = [0, 3]$, $B = [0, 2]$, then $C = (1, 2]$ is open in $B$, in the relative topology induced by the usual topology of $\mathbb{R}$.

Definition 17 Given two topological spaces $X$ and $Y$, we can define a product topological space $Z = X \times Y$, where the set is the Cartesian product of the two sets $X$ and $Y$, and sets of the form $A \times B$, where $A$ is open in $X$ and $B$ is open in $Y$, form a basis for the topology.

Remark Note that the open sets of $X \times Y$ are not always of this product form ($A \times B$).

Definition 18 Suppose there is a partition of $X$ into disjoint subsets $A_\alpha$, $\alpha \in I$, for some index set $I$, or equivalently, there is defined on $X$ an equivalence relation $\sim$. Then one can define the quotient topology on the set of equivalence classes $\{A_\alpha, \alpha \in I\}$, usually denoted as the quotient space $X/\!\sim\ = Y$, as follows. Consider the map $\pi : X \to Y$, called the canonical projection, which maps the element $x \in X$ to its equivalence class $[x]$. Then a subset $U \subset Y$ is open if and only if $\pi^{-1}(U)$ is open.

Proposition 1 Let $\mathcal{T}$ be the quotient topology on the quotient space $Y$. Suppose $\mathcal{T}'$ is another
Introductory Article: Topology 133
topology on $Y$ such that the canonical projection is continuous, then $\mathcal{T}' \subset \mathcal{T}$.

Definition 19 An (open) cover $\{U_\alpha : \alpha \in I\}$ for $X$ is a collection of open sets $U_\alpha \subset X$ such that their union equals $X$. A subcover of this cover is then a subset of the collection which is itself a cover for $X$.

Definition 20 A topological space $X$ is said to be compact if every cover contains a finite subcover.

Remark So for a compact space, however one chooses to cover it, it is always sufficient to use a finite number of open subsets. This is one of the essential differences between an open interval (not compact) and a closed interval (compact). The former is in fact homeomorphic to the entire real line.

Definition 21 A topological space $X$ is said to be connected if it cannot be written as the union of two nonempty disjoint open sets.

Remark A useful equivalent definition is that any continuous map from $X$ to the two-point set $\{0, 1\}$, equipped with the discrete topology, cannot be surjective.

Definition 22 Given two points $x, y$ in a topological space $X$, a path from $x$ to $y$ is a continuous map $f : [0, 1] \to X$ such that $f(0) = x$, $f(1) = y$. We also say that such a path joins $x$ and $y$.

Definition 23 A topological space $X$ is path-connected if every two points in $X$ can be joined by a path lying entirely in $X$.

Proposition 2 A path-connected space is connected.

Proposition 3 A connected open subspace of $\mathbb{R}^n$ is path-connected.

Definition 24 Given a topological space $X$, define an equivalence relation by saying that $x \sim y$ if and only if $x$ and $y$ belong to the same connected subspace of $X$. Then the equivalence classes are called (connected) components of $X$.

Examples

(i) The Lie group $O(3)$ of $3 \times 3$ orthogonal matrices has two connected components. The identity connected component is $SO(3)$ and is a subgroup.

(ii) The proper orthochronous Lorentz transformations of Minkowski space form the identity component of the group of Lorentz transformations.

Metric Spaces

A special class of topological spaces plays an important role: metric spaces.

Definition 25 A metric space is a set $X$ together with a function $d : X \times X \to \mathbb{R}$ satisfying

(i) $d(x, y) \geq 0$,
(ii) $d(x, y) = 0 \Leftrightarrow x = y$,
(iii) $d(x, z) \leq d(x, y) + d(y, z)$ (triangle inequality),
(iv) $d(x, y) = d(y, x)$ (symmetry).

Remarks

(i) The function $d$ is called the metric, or distance function; $d(x, y)$ is the distance between the two points.

(ii) This concept of metric is what is generally known as Euclidean metric in mathematical physics. The distinguishing feature is the positive definiteness (and the triangle inequality). One can, and does, introduce indefinite metrics (for example, the Minkowski metric) with various signatures. But these metrics are not usually used to induce topologies in the spaces concerned.

Definition 26 Given a metric space $X$ and a point $x \in X$, we define the open ball centred at $x$ with radius $r$ (a positive real number) as

$$B_r(x) = \{y \in X : d(x, y) < r\}$$

Given a metric space $X$, we can immediately define a topology on it by taking all the open balls in $X$ as a basis. We say that this is the topology induced by the given metric. Then we can recover our usual definition of continuity.

Proposition 4 Let $f : X \to Y$ be a map from the metric space $X$ to the metric space $Y$. Then $f$ is continuous (with respect to the corresponding induced topologies) at $x \in X$ if and only if given any $\epsilon > 0$, $\exists\, \delta > 0$ such that $d(x, x') < \delta$ implies $d(f(x), f(x')) < \epsilon$.

Note that we do not bother to give two different symbols to the two metrics, as it is clear which spaces are involved. The proof is easily seen by taking the relevant balls as neighborhoods. Equally easy is the following:

Proposition 5 A metric space is Hausdorff.

Definition 27 A map $f : X \to Y$ of metric spaces is uniformly continuous if given any $\epsilon > 0$ there exists $\delta > 0$ such that for any $x_1, x_2 \in X$, $d(x_1, x_2) < \delta$ implies $d(f(x_1), f(x_2)) < \epsilon$.

Remark Note the difference between continuity and uniform continuity: the latter is stronger and requires the same $\delta$ for the whole space.

Definition 28 Two metrics $d_1$ and $d_2$ defined on $X$ are equivalent if there exist positive constants $a$ and $b$ such that for any two points $x, y \in X$ we have

$$a\, d_1(x, y) \leq d_2(x, y) \leq b\, d_1(x, y)$$
Remark This is clearly an equivalence relation. Definition 31 A metric space X is complete if every
Two equivalent metrics induce the same topology. Cauchy sequence in X converges to a limit in it.
Examples Examples
(i) Given a set X, we can define the discrete metric (i) The closed interval [0, 1] on the real line is
as follows: d0 (x, y) = 1 whenever x 6 y. This complete, whereas the open interval (0, 1) is
induces the discrete topology on X. This is quite not. For example, the Cauchy sequence
a convenient way of describing the discrete {1=n, n = 2, 3, . . . } has no limit in this open
topology. interval. (Considered as a sequence on the real
(ii) In R, the usual metric is d(x, y) = jx yj, and line, it has of course the limit point 0.)
the usual topology is the one induced by this. (ii) The spaces Rn are complete.
(iii) More generally, in Rn , we can define a metric (iii) The Hilbert space 2 consisting of all
for every $p \geq 1$ by
$$d_p(x, y) = \left( \sum_{k=1}^{n} |x_k - y_k|^p \right)^{1/p}$$
where $x = (x_1, x_2, \ldots, x_n)$, $y = (y_1, y_2, \ldots, y_n)$. In particular, for $p = 2$ we have the usual Euclidean metric, but the other cases are also useful. To continue the series, one can define
$$d_\infty(x, y) = \max_{1 \leq k \leq n} \{ |x_k - y_k| \}$$
All these metrics induce the same topology on $R^n$.
(iv) In a vector space V, say over the real or the complex field, a function $\| \cdot \| : V \to R$ is called a norm if it satisfies the following axioms:
(a) $\|x\| = 0$ if and only if $x = 0$,
(b) $\|\alpha x\| = |\alpha| \|x\|$, and
(c) $\|x + y\| \leq \|x\| + \|y\|$.
Then it is easy to see that a metric can be defined using the norm
$$d(x, y) = \|x - y\|$$
In many cases, for example, the metrics defined in example (iii) above, one can define the norm of a vector as just the distance of it from the origin. One obvious exception is the discrete metric.
A slightly more general concept is found to be useful for spaces of functions and operators: that of seminorms. A seminorm is one which satisfies the last two of the conditions, but not necessarily the first, for a norm, as listed above.

Definition 29 Given a metric space X, a sequence of points $\{x_1, x_2, \ldots\}$ is called a Cauchy sequence if, given any $\epsilon > 0$, there exists a positive integer N such that for any $k, \ell > N$ we have $d(x_k, x_\ell) < \epsilon$.

Definition 30 Given a sequence of points $\{x_1, x_2, \ldots\}$ in a metric space X, a point $x \in X$ is called a limit of the sequence if given any $\epsilon > 0$, there exists a positive integer N such that for any $n > N$ we have $d(x, x_n) < \epsilon$. We say that the sequence converges to x.

sequences of real numbers $\{x_1, x_2, \ldots\}$ such that $\sum_{k=1}^{\infty} x_k^2$ converges is complete with respect to the obvious metric which is a generalization to infinite dimension of $d_2$ above. For arbitrary $p \geq 1$, one can similarly define $\ell^p$, which are also complete and are hence Banach spaces.

Remarks Completeness is not a topological invariant. For example, the open interval $(-1, 1)$ and the whole real line are homeomorphic (with respect to the usual topologies) but the former is not complete while the latter is. The homeomorphism can conveniently be given in terms of the trigonometric function tangent.

Definition 32 A subset B of the metric space X is bounded if there exists a ball of radius R ($R > 0$) which contains it entirely.

Theorem 1 (Heine-Borel) Any closed bounded subset of $R^n$ is compact.

Remark The converse is also true. We have thus a nice characterization of compact subsets of $R^n$ as being closed and bounded.

Proposition 6 Any bounded sequence in $R^n$ has a convergent subsequence.

Definition 33 Consider a sequence $\{f_n\}$ of real-valued functions on a subset A (usually an interval) of R. We say that $\{f_n\}$ converges pointwise in A if the sequence of real numbers $\{f_n(x)\}$ converges for every $x \in A$. We can then define a function $f : A \to R$ by $f(x) = \lim_{n \to \infty} f_n(x)$, and write $f_n \to f$.

Definition 34 A sequence of functions $f_n : A \to R$, $A \subset R$ is said to converge uniformly to a function $f : A \to R$ if given any $\epsilon > 0$, there exists a positive integer N such that, for all x, $|f_n(x) - f(x)| < \epsilon$ whenever $n > N$.

Theorem 2 Let $f_n : (a, b) \to R$ be a sequence of functions continuous at the point $c \in (a, b)$, and suppose $f_n$ converges uniformly to f on $(a, b)$. Then f is continuous at c.
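The difference between pointwise convergence (Definition 33) and uniform convergence (Definition 34) can be illustrated numerically. The following is a minimal sketch, assuming the illustrative sequence $f_n(x) = x^n$ on $[0, 1]$; the function names `f_n`, `pointwise_limit`, `witness`, `gap_at_witness` and `sup_on_half_interval` are hypothetical and chosen only for this example. For every n the point $x_n = (1/2)^{1/n}$ lies in $[0, 1)$ and satisfies $|f_n(x_n) - f(x_n)| = 1/2$, so no single N can work for all x, whereas on the smaller interval $[0, 1/2]$ the largest deviation is $(1/2)^n$, which tends to zero.

```python
def f_n(n, x):
    """Illustrative sequence f_n(x) = x**n on [0, 1] (an assumption of this sketch)."""
    return x ** n


def pointwise_limit(x):
    """Pointwise limit of x**n on [0, 1]: 0 for x < 1 and 1 at x = 1."""
    return 1.0 if x == 1.0 else 0.0


def witness(n):
    """A point x_n in [0, 1) at which f_n takes the value 1/2."""
    return 0.5 ** (1.0 / n)


def gap_at_witness(n):
    """|f_n(x_n) - f(x_n)|; it stays near 1/2, so convergence on [0, 1] is not uniform."""
    x = witness(n)
    return abs(f_n(n, x) - pointwise_limit(x))


def sup_on_half_interval(n):
    """sup over [0, 1/2] of |f_n - f|; x**n is increasing, so the sup is attained at x = 1/2."""
    return f_n(n, 0.5)


if __name__ == "__main__":
    for n in (1, 5, 50, 500):
        print(n, witness(n), gap_at_witness(n), sup_on_half_interval(n))
```

A fixed grid of sample points would be misleading here, because the largest sampled value of $x^n$ below 1 eventually tends to zero; choosing the witness point as a function of n avoids that trap.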
Introductory Article: Topology 135
Remark and Example The pointwise limit of continuous functions need not be continuous, as can be shown by the following example: $f_n(x) = x^n$, $x \in [0, 1]$. We see that the limit function f is not continuous:
$$f(x) = \begin{cases} 0 & x \neq 1 \\ 1 & x = 1 \end{cases}$$

Definition 35 Let X be a metric space. A map $f : X \to X$ is a contraction if there exists $c < 1$ such that $d(f(x), f(y)) \leq c \, d(x, y)$ for all $x, y \in X$.

Theorem 3 (Banach) If X is a complete metric space and f is a contraction in X, then f has a unique fixed point $x \in X$, that is, $f(x) = x$.

Some Function and Operator Spaces

The spaces of functions and operators can be equipped with different topologies, given by various concepts of convergence and of norms (or sometimes seminorms), very often with different such concepts for the same space. As we saw earlier, a norm in a vector space gives rise to a metric, and hence to a topology. Similarly with the concept of convergence for sequences of functions and operators, as one then knows what the limit points, and hence closed sets, are.
But before we do that, let us introduce, in a slightly different context, a topology which is in some sense the natural one for the space of continuous maps from one space to another.

Definition 36 Consider a family F of maps from a topological space X to a topological space Y, and define $W(K, U) = \{f : f \in F, f(K) \subset U\}$. Then the family of all sets of the form $W(K, U)$ with K compact (in X) and U open (in Y) form a sub-basis for the compact open topology for F.

Consider a topological space X and sequences of functions $(f_n)$ on it. Let $D \subset X$. We can then define pointwise convergence and uniform convergence exactly as for functions on subsets of the real line.

Definition 37 Let X, D and $(f_n)$ be as above.
(i) The functions $f_n$ converge pointwise on D to a function f if the sequence of numbers $f_n(x) \to f(x)$, $\forall x \in D$.
(ii) The functions $f_n$ converge uniformly on D to a function f if given $\epsilon > 0$, there exists N such that for all $n > N$ we have $|f_n(x) - f(x)| < \epsilon$, $\forall x \in D$.

Next we consider the Lebesgue spaces $L^p$, that is, functions f defined on subsets of $R^n$, such that $|f(x)|^p$ is Lebesgue integrable, for real numbers $p \geq 1$. To define these spaces, we tacitly take equivalence classes of functions which are equal almost everywhere (that is, up to a null set), but very often we can take representatives of these classes and just deal with genuine functions instead. Note that of all $L^p$, only $L^2$ is a Hilbert space.

Definition 38 In the space $L^p$, we define its norm by
$$\|f\| = \left( \int |f(x)|^p \, dx \right)^{1/p}$$

Now we turn to general normed spaces, and operators on them.

Definition 39 Convergence in the norm is also called strong convergence. In other words, a sequence $(x_n)$ in a normed space X is said to converge strongly to x if
$$\lim_{n \to \infty} \|x_n - x\| = 0$$

Definition 40 A sequence $(x_n)$ in a normed space X is said to converge weakly to x if
$$\lim_{n \to \infty} f(x_n) = f(x)$$
for all bounded linear functionals f.

Consider the space $B(X, Y)$ of bounded linear operators T from X to Y. We can make this into a normed space by defining the following norm:
$$\|T\| = \sup_{x \in X, \ \|x\| = 1} \|Tx\|$$
Then we can define three different concepts of convergence on $B(X, Y)$. There are in fact more in current use in functional analysis.

Definition 41 Let X and Y be normed spaces and let $(T_n)$ be a sequence of operators $T_n \in B(X, Y)$.
(i) $(T_n)$ is uniformly convergent if it converges in the norm.
(ii) $(T_n)$ is strongly convergent if $(T_n x)$ converges strongly for every $x \in X$.
(iii) $(T_n)$ is weakly convergent if $(T_n x)$ converges weakly for every $x \in X$.

Remark Clearly we have: uniform convergence $\Rightarrow$ strong convergence $\Rightarrow$ weak convergence, and the limits are the same in all three cases. However, the converses are in general not true.

Homotopy Groups

The most elementary and obvious property of a topological space X is the number of connected components it has. The next such property, in a certain sense, is the number of holes X has. There
are higher analogues of these, called the homotopy groups, which are topological invariants, that is, they are invariant under homeomorphisms. They play important roles in many topological considerations in field theory and other topics of mathematical physics. The articles Topological Defects and Their Homotopy Classification and Electric-Magnetic Duality contain some examples.

Definition 42 Given a topological space X, the zeroth homotopy set, denoted $\pi_0(X)$, is the set of connected components of X. One sometimes writes $\pi_0(X) = 0$ if X is connected.

To define the fundamental group of X, or $\pi_1(X)$, we shall need the concept of closed loops, which we shall find useful in other ways too. For simplicity, we shall consider based loops (that is, loops passing through a fixed point in X). It seems that in most applications, these are the relevant ones. One could consider loops of various smoothness (when X is a manifold), but in view of applications to quantum field theory, we shall consider continuous loops, which are also the ones relevant for topology.

Definition 43 Given a topological space X and a point $x_0 \in X$, a (closed) (based) loop is a continuous function of the parametrized circle to X:
$$\xi : [0, 2\pi] \to X$$
satisfying $\xi(0) = \xi(2\pi) = x_0$.

Definition 44 Given a connected topological space X and a point $x_0 \in X$, the space of all closed based loops is called the (parametrized based) loop space of X, denoted $\Omega X$.

Remarks
(i) The loop space $\Omega X$ inherits the relative compact open topology from the space of continuous maps from the closed interval $[0, 2\pi]$ to X. It also has a natural base point: the constant function mapping all of $[0, 2\pi]$ to $x_0$. Hence it is easy to iterate the construction and define $\Omega^k X$, $k \geq 1$.
(ii) Here we have chosen to parametrize the circle by $[0, 2\pi]$, as is more natural if we think in terms of the phase angle. We could easily have chosen the unit interval $[0, 1]$ instead. This would perhaps harmonize better with our previous definition of paths and the definitions of homotopies below.

Proposition 7 The fundamental group of a topological space X, denoted $\pi_1(X)$, consists of classes of closed loops in X which cannot be continuously deformed into one another while preserving the base point.

Definition 45 A space X is called simply connected if $\pi_1(X)$ is trivial.

To define the higher homotopy groups, let us go into a little detail about homotopy.

Definition 46 Given two topological spaces X and Y, and maps
$$p, q : X \to Y$$
we say that h is a homotopy between the maps p, q if
$$h : X \times I \to Y$$
is a continuous map such that $h(x, 0) = p(x)$, $h(x, 1) = q(x)$, where I is the unit interval $[0, 1]$. In this case, we write $p \simeq q$.

Definition 47 A map $f : X \to Y$ is a homotopy equivalence if there exists a map $g : Y \to X$ such that $g \circ f \simeq \mathrm{id}_X$ and $f \circ g \simeq \mathrm{id}_Y$.

Remark This is an equivalence relation.

Definition 48 For a topological space X with base point $x_0$, we define $\pi_n(X)$, $n \geq 0$ as the set of homotopy equivalence classes of based maps from the n-sphere $S^n$ to X.

Remark This coincides with the previous definitions for $\pi_0$ and $\pi_1$.

There is a very nice relation between homotopy classes and loop spaces.

Proposition 8 $\pi_n(X) = \pi_{n-1}(\Omega X) = \cdots = \pi_0(\Omega^n X)$.

Remarks
(i) When we consider the gauge group G in a Yang-Mills theory, its fundamental group classifies the monopoles that can occur in the theory.
(ii) For $n \geq 1$, $\pi_n(X)$ is a group, the group action coming from the joining of two loops together to form a new loop. On the other hand, $\pi_0(X)$ in general is not a group. However, when X is a Lie group, then $\pi_0(X)$ inherits a group structure from X, because it can be identified with the quotient group of X by its identity-connected component. For example, the two components of O(3) can be identified with the two elements of the group $Z_2$, the component where the determinant equals 1 corresponding to 0 in $Z_2$ and the component where the determinant equals $-1$ corresponding to 1 in $Z_2$.
(iii) For $n \geq 2$, the group $\pi_n(X)$ is always abelian.
(iv) Examples of nonabelian $\pi_1$ are the fundamental groups of some Riemann surfaces.
(v) Since $\pi_1$ is not necessarily abelian, much of the direct-sum notation we use for the homotopy
groups should more correctly be written multiplicatively. However, in most literature in mathematical physics, the additive notation seems to be preferred.

Examples
(i) $\pi_n(X \times Y) = \pi_n(X) + \pi_n(Y)$, $n \geq 1$.
(ii) For the spheres, we have the following results:
$$\pi_i(S^n) = \begin{cases} 0 & \text{if } i < n \\ Z & \text{if } i = n \end{cases}$$
$$\pi_i(S^1) = 0 \quad \text{if } i > 1$$
$$\pi_{n+1}(S^n) = Z_2 \quad \text{if } n \geq 3$$
$$\pi_{n+2}(S^n) = Z_2 \quad \text{if } n \geq 2$$
$$\pi_6(S^3) = Z_{12}$$
(iii) From the theory of sphere bundles, we can deduce:
$$\pi_i(S^2) = \pi_{i-1}(S^1) + \pi_i(S^3) \quad \text{if } i \geq 2$$
$$\pi_i(S^4) = \pi_{i-1}(S^3) + \pi_i(S^7) \quad \text{if } i \geq 2$$
$$\pi_i(S^8) = \pi_{i-1}(S^7) + \pi_i(S^{15}) \quad \text{if } i \geq 2$$
and the first of these relations gives the following more succinct result:
$$\pi_i(S^3) = \pi_i(S^2) \quad \text{if } i \geq 3$$
(iv) A result of Serre says that all the homotopy groups of spheres are in fact finite except $\pi_n(S^n)$ and $\pi_{4n-1}(S^{2n})$, $n \geq 1$.

Definition 49 Given a connected space X, a map $\pi : B \to X$ is called a covering if (i) $\pi(B) = X$, and (ii) for each $x \in X$, there exists an open connected neighborhood V of x such that each component of $\pi^{-1}(V)$ is open in B, and $\pi$ restricted to each component is a homeomorphism. The space B is called a covering space.

Examples
(i) The real line R is a covering of the group U(1).
(ii) The group SU(2) is a double cover of the group SO(3).
(iii) The group SL(2, C) is a double cover of the Lorentz group SO(1, 3).
(iv) The group SU(2, 2) is a 4-fold cover of the conformal group in four dimensions. This local isomorphism is of great importance in twistor theory.

transitive, then we have the following nice result: coverings of X are in 1-1 correspondence with normal subgroups of $\pi_1(X)$.
(ii) Given a connected space X, there always exists a unique connected simply connected covering space $\tilde{X}$, called the universal covering space. Furthermore, $\tilde{X}$ covers all the other covering spaces of X. For the higher homotopy groups, one has
$$\pi_n(\tilde{X}) = \pi_n(X), \quad n \geq 2$$

One very important class of homotopy groups are those of Lie groups. To simplify matters, we shall consider only connected groups, that is, $\pi_0(G) = 0$. Also we shall deal mainly with the classical groups, and in particular, the orthogonal and unitary groups.

Proposition 9 Suppose that G is a connected Lie group.
(i) If G is compact and semi-simple, then $\pi_1(G)$ is finite. This implies that $\tilde{G}$ is still compact.
(ii) $\pi_2(G) = 0$.
(iii) For G compact, simple, and nonabelian, $\pi_3(G) = Z$.
(iv) For G compact, simply connected, and simple, $\pi_4(G) = 0$ or $Z_2$.

Examples
(i) $\pi_1(SU(n)) = 0$.
(ii) $\pi_1(SO(n)) = Z_2$ for $n \geq 3$.
(iii) Since the unitary groups U(n) are topologically the product of SU(n) with a circle $S^1$, their homotopy groups are easily computed using the product formula. We remind ourselves that U(1) is topologically a circle and SU(2) topologically $S^3$.
(iv) For $i \geq 2$, we have:
$$\pi_i(SO(3)) = \pi_i(SU(2))$$
$$\pi_i(SO(5)) = \pi_i(Sp(2))$$
$$\pi_i(SO(6)) = \pi_i(SU(4))$$

Just for interest, and to show the richness of the subject, some isomorphisms for homotopy groups are shown in Table 1 and some homotopy groups for low SU(n) and SO(n) are listed in Table 2.

Table 1 Some isomorphisms for homotopy groups
Isomorphism    Range
4 5 6 7 8 9 10
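The statement in the covering examples that SU(2) is a double cover of SO(3) can be checked numerically. The sketch below assumes the common convention $U = \cos(\theta/2)\,1 - i \sin(\theta/2)\, n \cdot \sigma$ for an element of SU(2) and the covering map $R_{ij} = \frac{1}{2} \mathrm{tr}(\sigma_i U \sigma_j U^\dagger)$; neither formula is stated in the text above, and the helper names (`su2_element`, `covering_map`, `negate`) are hypothetical. It shows that U and $-U$ have the same image, and that a rotation angle of $2\pi$ gives $U = -1$ while the image is the identity rotation.

```python
import math

# Pauli matrices sigma_1, sigma_2, sigma_3 as 2x2 nested lists.
PAULI = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]


def matmul(a, b):
    """Product of two 2x2 complex matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]


def su2_element(axis, angle):
    """U = cos(angle/2) 1 - i sin(angle/2) (n . sigma), with n the normalized axis.

    The sign convention is an assumption of this sketch.
    """
    norm = math.sqrt(sum(c * c for c in axis))
    nx, ny, nz = (c / norm for c in axis)
    c, s = math.cos(angle / 2), math.sin(angle / 2)
    return [[c - 1j * s * nz, (-1j * nx - ny) * s],
            [(-1j * nx + ny) * s, c + 1j * s * nz]]


def covering_map(u):
    """Image of U in SO(3): R_ij = (1/2) tr(sigma_i U sigma_j U^dagger)."""
    ud = dagger(u)
    rot = []
    for si in PAULI:
        row = []
        for sj in PAULI:
            m = matmul(si, matmul(u, matmul(sj, ud)))
            row.append(((m[0][0] + m[1][1]) / 2).real)
        rot.append(row)
    return rot


def negate(u):
    """The other preimage -U of the same rotation."""
    return [[-x for x in row] for row in u]


if __name__ == "__main__":
    u = su2_element((0.0, 0.0, 1.0), math.pi / 2)
    print(covering_map(u))          # rotation about the z-axis
    print(covering_map(negate(u)))  # same rotation: the map is two-to-one
```

With this convention, a quarter turn about the z-axis maps the x-direction to the y-direction, and the two printed matrices agree up to rounding.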
Appendix: A Mathematician's Basic Toolkit

The following is a drastically condensed list, most of which is what a mathematics undergraduate learns in the first few weeks. The rest is included for easy reference. These notations and concepts are used universally in mathematical writing. We have not endeavored to arrange the material in a logical order. Furthermore, given structures such as sets, groups, etc., one can usually define substructures such as subsets, subgroups, etc., in a straightforward manner. We shall therefore not spell this out.

Sets

$A \cup B = \{x : x \in A \text{ or } x \in B\}$    union
$A \cap B = \{x : x \in A \text{ and } x \in B\}$    intersection
$A \setminus B = \{x : x \in A \text{ and } x \notin B\}$    complement
$A \times B = \{(x, y) : x \in A, y \in B\}$    Cartesian product

Maps

1. A map or mapping $f : A \to B$ is an assignment of an element $f(x)$ of B for every $x \in A$.
2. A map $f : A \to B$ is injective if $f(x) = f(y) \Rightarrow x = y$. This is sometimes called a 1-1 map, a term to be avoided.
3. A map $f : A \to B$ is surjective if for every $y \in B$ there exists an $x \in A$ such that $y = f(x)$. This is sometimes called an onto map.
4. A map $f : A \to B$ is bijective if it is both surjective and injective. This is also sometimes called a 1-1 map, a term to be equally avoided.
5. For any map $f : A \to B$ and any subset $C \subset B$, the inverse image $f^{-1}(C) = \{x : f(x) \in C\} \subset A$ is always defined, although, of course, it can be empty. On the other hand, the map $f^{-1}$ is defined if and only if f is bijective.
6. A map from a set to either the real or complex numbers is usually called a function.
7. A map between vector spaces, and more particularly normed spaces (including Hilbert spaces), is called an operator. Most often, one considers linear operators.
8. An operator from a vector space to its field of scalars is called a functional. Again, one considers almost exclusively linear functionals.

Relations

1. A relation $\sim$ on a set A is a subset $R \subset A \times A$. We say that $x \sim y$ if $(x, y) \in R$.
2. We shall only be interested in equivalence relations. An equivalence relation is one satisfying, for all $x, y, z \in A$:
(a) $x \sim x$ (reflexive),
(b) $x \sim y \Rightarrow y \sim x$ (symmetric),
(c) $x \sim y$, $y \sim z \Rightarrow x \sim z$ (transitive).
3. If $\sim$ is an equivalence relation in A, then for each $x \in A$, we can define its equivalence class:
$$[x] = \{y \in A : y \sim x\}$$
It can be shown that equivalence classes are nonempty, any two equivalence classes are either equal or disjoint, and they together partition the set A. Subgroup equivalence classes are called cosets.
4. An element of an equivalence class is called a representative.

Groups

A group is a set G with a map, called multiplication or group law
$$G \times G \to G, \qquad (x, y) \mapsto xy$$
satisfying
Differentiate p2 = m20 with respect to , this yields for an arbitrary vector v. The contraction of a
2-form with a vector yields a 1-form.
dp0 It is easily seen that a 2-form can be expressed in
2p f 2m0 2 f v 0 6
dt terms of a polar vector and an axial vector: if it is to
be invariant with respect to parity transformations
or with
E Ex dx Ey dy Ez dz 14 and
and B is a 2-form in three dimensions, i1 ip gi1 j1 gip jp j1 jp 25
The Hodge dual has the property that to the scalar product ( , ). Whereas the differential
operator d maps p-forms into (p 1)-forms, the
dx1 ^ ^ dxp codifferential operator d maps p-forms into (p 1)-
g11 gpp sign forms.
The relation d2 = 0 leads to
dxp1 ^ ^ dxn 27
d 2 / dd / d2 0 35
where is a permutation of the indices (1, . . . , n), This fact plays an essential role in connection with
(1) < < (p), and (p 1) < < (n). We also the conservation laws.
have Finally, we want to obtain a coordinate expres-
sion for d . Indeed d = Div for
dxp1 ^ ^ dxn
@Kj
gp1p1 gnn 1pnp sign DivK 36
@xj
dx1 ^ ^ dxp 28 where K is the multi-index of the coefficients in
= K dxK , and K indicates that K = (k1 , . . . , kp ) is in
We therefore find that the application of the the order k1 < < kp . We will show that
Hodge dual to a p-form twice yields (, d ) = (, Div) for an arbitrary (p 1)-form
. It is a fact that
dx1 ^ ^dxp Z
g11 gpp sign dxp1 ^ ^ dxn ; d d; dI I 1
37
g11 gnn 1pnp dx1 ^ ^ dxp 29 Now we have the coordinate expressions
or d dL ^ dxL 38
pnp Ind g
1 1 Id 30 and (dxL )K = KL . It follows that
where Ind g is the number of times (1) occurs along jK @L L
the diagonal of g. dI dL ^ dxL I I 39
@xj K
Now let be a (p 1)-form, and a p-form.
Then d is an (n p 1)-form, and or
jK @K
d^ d ^ 1p1 ^ d dI I 40
@xj
p1 np1p1
d ^ 1 1 Here we use
Indg
1 ^ d
^ I IKL K L 41
np1 Indg
d ^ 1 1
where
^ d 31
8
We then have >
> 1 if (KL) is an even
>
>
Z >
> permutation of I
>
<
d; ; d d ^ 32 IKL 1 42
M > if (KL) is an odd
>
>
>
> permutation of I
with >
>
: 0 otherwise
d 1np1 1Ind g d 33
Use of the Leibniz rule yields
We are here using the scalar product of two p-forms Z Z
Z I jK @K I
dI 1 I 1
; : ^ 34 @xj
M Z @ jK I
I K
With the help of Stokes' theorem the last integral in 1
@xj
eqn [32] may be turned into a surface term at Z I
infinity, which vanishes for and with compact jK @
K I 1 43
support. d is the adjoint operator to d with respect @xj
144 Abelian and Nonabelian Gauge Theories Using Differential Forms
The first term corresponds to a surface integration We apply again the Hodge dual:
jK
and we can neglect it. We then have I I = jK from
the antisymmetry of , so that @Ex
d F div Edt curl Bx dx
Z @t
@jK
; d K 1 ; Div 44 @Ey
@xj curl By dy
@t
@Ez
curl Bz dz 53
@t
The Maxwell Equations
In Minkowski space the expression d equals the
The Maxwell equations become remarkably concise codifferential. Therefore, the equation d F = d
when expressed in terms of differential forms, namely F = j holds, with j given by j = (
, J), which is
equivalent to
dF 0; d F j 45
@E
where F is the field strength and j is the current div E
; curl B J 54
@t
density. We wish to demonstrate this. We use a
(3 1)-separation of the exterior derivative into a the inhomogeneous Maxwell equations.
timelike and a spacelike part:
@ Current Conservation
d d dt ^ 46
@t
The electromagnetic 4-current is
We then get
j
0 u
0 ;
0 v
; J 55
@B
dF dE ^ dt dB 0 47 where
is the charge density and J the current
@t
density. This corresponds to a 1-form
By comparing coefficients, we arrive at
j
dt Jx dx Jy dy Jz dz 56
@B
dE ; dB 0 48 The Hodge dual is j = 3 j2 ^ dt, with the 3-form
dt
3 =
dx ^ dy ^ dz, and the 2-form
In vector notation
j2 Jx dy ^ dz Jy dz ^ dx Jz dx ^ dy 57
@B
curl E ; div B 0 49
@t From the Maxwell equation d F = j, it follows
the usual form of the homogeneous Maxwell that
equations.
By direct application of the formula [27], one finds d 2 F d j 0 58
F ?B ^ dt ?E 50 that is
where ? means the Hodge dual in three space dj d3 j2 ^ dt d3 dj2 ^ dt
dimensions. One finds @
div J dt ^ dx ^ dy ^ dz
@t
@?E
dF d?E d?B ^ dt 51 @
@t div J 0 59
@t
Therefore,
This is the continuity equation. R
d F div Edx ^ dy ^ dz The total charge inside a volume V is Q = V
dV,
therefore
x @Ex
curl B dy ^ dz ^ dt Z Z
dt dQ d
dV J n dS 60
@Ey dt dt V
curl By dz ^ dx ^ dt @V
dt
where @V is the surface which encloses the
@Ez volume V, dS is the surface element, and n is the normal
curl Bz dx ^ dy ^ dt 52
dt vector to this surface. This is current conservation.
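The integral form of current conservation, eqn [60], says that the charge inside a volume changes only through the current crossing its boundary. A minimal one-dimensional discretization makes this visible. The sketch below is an illustration under stated assumptions, not part of the derivation above: the grid, the explicit update rule, and the names `advance`, `total_charge` and `outward_flux` are all hypothetical, and `rho` stands for the charge density. In one dimension the "surface" of the volume consists of its two end faces, so the net outward flux is the current at the right face minus the current at the left face.

```python
def advance(rho, flux, dx, dt):
    """One explicit step of d(rho)/dt + dJ/dx = 0 on a 1-D grid.

    rho[i]  : charge density in cell i (N cells of width dx)
    flux[i] : current J through the left face of cell i;
              flux[N] is the current through the right face of the last cell.
    """
    return [rho[i] - dt * (flux[i + 1] - flux[i]) / dx
            for i in range(len(rho))]


def total_charge(rho, dx):
    """Discrete version of Q = integral of rho over the volume."""
    return sum(rho) * dx


def outward_flux(flux):
    """Net current leaving the volume through its two end faces."""
    return flux[-1] - flux[0]


if __name__ == "__main__":
    rho = [1.0, 2.0, 0.5, 3.0]
    flux = [0.2, -0.1, 0.4, 0.0, 0.7]
    dx, dt = 0.5, 0.01
    new_rho = advance(rho, flux, dx, dt)
    change = total_charge(new_rho, dx) - total_charge(rho, dx)
    # The interior face currents cancel in pairs; only the boundary terms survive.
    print(change, -dt * outward_flux(flux))
```

The interior currents cancel in the sum over cells, which is the discrete counterpart of turning the volume integral of div J into a surface integral.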
The Gauge Potential of the form g = exp {i(x)}, with g an element of the
abelian gauge group G = U(1). The free action is
The Poincaré lemma tells us that dF = 0 implies
F = dA, with the 4-potential A: Z
S0 L 0 d 4 x 69
A dt A 61
and the vector potential A = Ax dx Ay dy Az dz. with
From
L0 i @ m 70
@
F E ^ dt B d dt ^ A
@t the Lagrange density. This action is not invariant
@A under gauge transformations:
d ^ dt dA dt ^ 62
@t
L0 ! L00 i @ m @ 71
it follows by comparing coefficients that
The undesired term can be compensated by the
@A introduction of a gauge potential ! in a covariant
E d ; B dA 63
@t derivative of ,
In vector notation this is
D d ! 72
@A
E grad ; B curl A 64 which has the desired transformation property
@t
D ! exp {i}D when besides the transformation
The 4-potential is determined up to a gauge function :
(x) ! exp {i(x)} (x) of the matter field the gauge
A0 A d 65 potential simultaneously transforms according to the
gauge transformation ! ! ! id. The new Lagrange
This gauge freedom has no influence on the density is
observable quantities E and B:
L i D m L0 i! x x 73
F0 dA0 dA d2 dA F 66
2 The substitution @ ! D is known to physicists;
The Laplace operator is 4 = (d d) = dd
with ! = iqA it is the ansatz of minimal coupling
d d, so when the 4-potential A fulfills the condition
for taking into account electromagnetic effects:
d A = 0, we have
@ ! @ iqA . The Lagrange density becomes in
4A d dA d F j 67 this notation L = L0 A J , where J = q .
The Lagrange density must now be completed by
the classical wave equation. The condition a kinetic term for the gauge potential and we get the
d A = 0 is called the Lorentz gauge condition. complete electromagnetic Lagrange density
This condition can always be fulfilled by using the
gauge freedom: d (A d) = 0 is fulfilled when L L0 A J 14 F F 74
d d = 4 = d A, where we have used the fact
that d = 0 for functions. That is to say, d A = 0 is with F = @ A @ A . In the action this corre-
fulfilled when is a solution of the inhomogeneous sponds to
wave equation. Z Z
1
S S0 A J vol4 F F vol4 75
M 4 M
Gauge Invariance
In quantum mechanics, the electron is described by a We get the field equations for the potential A by
wave function which is determined up to a free demanding that the variation of the action vanishes:
phase. Indeed, at every point in space this phase can Z Z
1 4
be chosen arbitrarily: SA
A J vol F F vol4 76
M 4 M
0
x ! x expfixg x
x ! 0 x x expfixg 68 We write now
Z
with the only condition being that (x) is a A J vol4 A; j 77
continuous function. The gauge transformation is M
with with
Aa iq!a 94
gx exp fxg 82
and
where g(x) is an element of the Lie group SU(2) and
is an element of the Lie algebra su(2). The Lie Ja Ta 95
algebra is a vector space, and its elements may be
In mathematical terminology ! is called a connec-
expanded in terms of a basis:
tion. The quantity A is the physicists gauge
x a xTa 83 potential. The connection is anti-Hermitian and the
gauge potential Hermitian. The gauge potential also
For su(2) the basis elements are traceless and anti- includes the coupling constant q. We will refer to
Hermitian (see below), they are conventionally both ! and A as the gauge potential, where the
expressed in terms of the Pauli matrices, relation between them is given by eqn [94].
We can write the gauge potential as A = Aa dx Ta
a
Ta 84 or, in the SU(2) case, as
2i
A A1 T1 A2 T2 A3 T3 96
with
where we see explicitly that it involves three vector
0 1 0 i
1 ; 2 fields, which couple to the electroweak currents [95]
1 0 i 0 with the single coupling constant q, and which will
85
1 0 become after symmetry breaking the three vector
3 bosons W , W , Z0 of the electroweak gauge theory.
0 1
Actually, a mix of the neutral gauge boson and the
They are conventionally normalized according to photon will combine to yield the Z0 boson, while the
orthogonal mixture gives rise to the electromagnetic
trTa Tb 12 ab 86 interaction, in an SU(2) U(1) theory. At this stage,
the gauge bosons are all massless, their masses are The Gauge Potential and the
generated by the Higgs mechanism. Field Strength
The generalization of the abelian relationship
between the gauge potential and the field strength,
Lie-Algebra-Valued p-Forms F = dA, is
; : Ta ; Tb a ^ b
98 A generalization of the gauge transformation of
A, that is, A0 = A d, is eqn [89]:
The Lie bracket in the algebra is
c !0 g1 !g g1 dg 110
Ta ; Tb fab Tc 99
a
A quantity with the transformation property
where fbc are the structure constants. It follows from
this that 0 g1 g 111
a
; Ta ; Tb ^ b Tb ; Ta a
^ b 100 is called a tensorial quantity. The gauge potential
! is according to this definition nontensorial.
or Nevertheless the field strength is tensorial. Indeed
; 1pq1 ; 101 0 dg1 !g dg1 ^ dg
when is a p-form and is a q-form. In the special 12 g1 !g g1 dg; g1 !g g1 dg
case that Ta is a matrix, also the product Ta Tb is dg1 ^ !g g1 d!g g1 ! ^ dg dg1 ^ dg
defined, and from this the product of two Lie- 12 g1 !; !g 12 g1 !g; g1 dg
algebra-valued p-forms 12 g1 dg; g1 !g 12 g1 dg; g1 dg
^ Ta a ^ Tb b
Ta Tb a ^ b
102 g1 g dg1 ^ !g g1 ! ^ dg dg1 ^ dg
g1 ! ^ dg g1 dg ^ g1 !g g1 dg ^ g1 dg
Now the Lie bracket is a commutator:
g1 g 112
Ta ; Tb Ta Tb Tb Ta 103
where we have used the derivation of the relation
and g1 g = Id to get
; Ta ; Tb a ^ b
dg1 g1 dg g1 113
Ta a ^ Tb b
1pq Tb b
^ Ta a In the abelian case, we had dF = 0. The non-
^ 1 pq
^ 104 abelian analog is
; ^ ^ 2 ^ 106 d ! ^ ^ ! 0 115
the Bianchi identity. It can also be written as The scalar product is invariant under the action of
G on G: for g 2 G
d ! ^ ^ ! d !; 0 116
h gXg1 ; gYg1 i tr gXYg1
because from eqn [104]
trX; Y hX; Yi 126
21
! ^ 1 ^ ! !; 117
or for X, Y, Z 2 G
The covariant derivative D is defined as
hetX Y etX ; etX ZetX i hY; Zi 127
D : d !; 118 We take the derivative of this equation with respect
for a tensorial quantity. The covariant derivative to t at the value t = 0 and get:
takes tensorial p-forms into tensorial (p 1)-forms: hX; Y; Zi hY; X; Zi 0 128
D0 0 dg1 g g1 !g g1 dg; g1 g We define an action of the algebra G on itself:
1 1 p 1 ad(X): G ! G
dg ^ g g dg 1 g ^ dg
g1 !g; g1 g g1 dg; g1 g adXY X; Y 129
p 1
g1 Dg dg1 ^ g 1 g ^ dg We can then formulate our conclusion as follows:
the action of G on itself is anti-Hermitian:
g1 dgg1 ^ g 1p g1 ^ dg
g1 Dg 119 hadXY; Z i hY; adXZi 130
or
We have thereby verified the transformation prop-
erty of eqn [91]. adXy adX 131
or
The YangMills Action
y T X
X X 121 The SU(2) YangMills action is, in analogy to the
is complex conjugation and XT means abelian case,
where X Z Z
transposition. 1 4 1
S 2 a a
F F vol 2 trF F vol4
For elements of the Lie algebra we can define a 4q M 2q M
scalar product (the Killing metric) Z
1
2 trF ^ F 133
2q M
hX; Yi : tr XY X X 122
We have included the trace in our definition of the
The scalar product is real:
scalar product:
Yi
X
Y
X X hX; Yi Z Z
hX; 123 I n
; : tr <I > vol tr ^ 134
M M
symmetric:
We then write eqn [133] as
hX; Yi trX; Y trY; X hY; Xi 124
S! 12 ; 135
and positive definite:
taking into account the relation between and the
hX; Xi X X X X jX j2 125 field strength F, and indicating the dependence on
the gauge potential. Since is tensorial the action is The first term in the last expression is
invariant. Z
Now we calculate the variation of $S[\omega]$ with
d !; !; d tr ! fd g vol4 145
respect to a variation of the gauge potential: M
We shall write these equations in terms of the fields already by Cartan (1923). A modern presentation of
i0 i differential forms and the manifolds on which they
F E; i 1; 2; 3 155 are defined is given in Abraham et al. (1983). A
recent treatment of electrodynamics in this approach
is Hehl and Obukhov (2003). Weyls argument is in
F12 B3 ; F31 B2 ; F12 B3 156 his paper of 1929.
where the E and B vectors may be thought of as Nonabelian gauge theories today explain the
electric and magnetic fields, even though they have electromagnetic, the strong and weak nuclear
Lie-algebra indices, Fi0 = (Fa )i0 Ta , etc. In the context of interactions. The original paper is that of Yang
the SU(3) theory, they are referred to as the chromo- and Mills (1954). Glashow, Salam, and Weinberg
electric and chromomagnetic fields, respectively. (1980) saw the way to apply it to the weak
The YangMills equations with = 0 are interactions by using spontaneous symmetry
breaking to generate the masses through the use
@i Fi0 iqAi ; Fi0 0 157 of the Higgs (1964) mechanism. tHooft and
with i = 1, 2, 3 a spatial index. In vector notation Veltman (1972) showed that the resulting quan-
this is tum field theory was renormalizable. The strong
interactions were recognized as the nonabelian
div E iqA E E A 158 gauge theory with gauge group SU(3) by Gell-
This is the analog of Gausss equation. Even though Mann (1972). For a modern treatment which puts
we started out without external sources, iq(A E nonabelian gauge theories in the context of
E A) plays the role of a charge density. The differential geometry, see Frankel (1987).
YangMills field E and the potential A combine to
See also: Dirac Fields in Gravitation and Nonabelian
act as a source for the YangMills field. This is an Gauge Theory; Electroweak Theory; Measure on Loop
essential feature of nonabelian gauge theories in Spaces; Nonperturbative and Topological Aspects of
which they differ from the abelian case, due to the Gauge Theory; Quantum Electrodynamics and its
fact that the commutator [A, E] is nonvanishing. Precision Tests.
Now consider the YangMills equations with a
spatial index = i:
@0 Fi0 @j Fij iqA0 ; Fi0 iqAj Fij 0 159 Further Reading
In vector notation this is Abraham A, Marsden J, and Ratiu T (1983) Manifolds, Tensor
Analysis, and Applications. MA: Addison-Wesley.
@E Cartan E (1923) On manifolds with an Affine Connection and the
curl B iqA0 E EA0 Theory of General Relativity. English translation of the French
@t
original 1923/1924 (Bibliopolis, Napoli 1986).
iqA B B A 160
Frankel T (1987) The Geometry of Physics, An Introduction.
replacing the AmpereMaxwell law. Note that there Cambridge University Press.
Gell-Mann M (1972) Quarks: developments in the quark theory
are two extra contributions to the current other of hadrons. Acta Physica Austriaca Suppl. IV: 733.
than the displacement current. Glashow SL (1980) Towards a unified theory: threads in a
The analogs of the laws of Faraday and of the tapestry. Reviews of Modern Physics 52: 539.
absence of magnetic monopoles are derived similarly Hehl FW and Obukhov YN (2003) Foundations of Classical
from the Bianchi identities. The results are Electrodynamics. Boston: Birkhauser.
Higgs PW (1964) Broken symmetries and the masses of gauge
@B bosons. Physical Review Letters 13: 508.
curl E iqfA E E A A0 B BA0 g 161 tHooft G and Veltman M (1972) Regularization and renorma-
@t
lization of gauge fields. Nuclear Physics B 44: 189.
and Poincare H (1953) Oeuvre. Paris: Gauthier-Villars.
Salam A (1980) Gauge unification of fundamental forces. Reviews
div B iqA B B A 162 of Modern Physics 52: 525.
Weinberg SM (1980) Conceptual foundations of the unified
theory of weak and electromagnetic interactions. Reviews of
Further Remarks Modern Physics 52: 515.
Weyl H (1929) Elektron und gravitation. Zeitschrift fuer Physik
The foundations of the mathematics of differential 56: 330.
forms were laid down by Poincare (1953). They Yang CN and Mills RL (1954) Construction of isotopic spin and
were applied to the description of electrodynamics isotopic gauge invariance. Physical Review 96: 191.
Abelian Higgs Vortices 151
Figure 1 Static, radially symmetric n-vortices: (a) the 1-vortex profile functions $\sigma(r)$ (solid curve) and $a(r)$ (dashed curve) for $\lambda = 2$, 1, and 1/2, left to right; (b) the magnetic field B; and (c) the energy density of n-vortices, n = 1 to 5, left to right, for $\lambda = 1$. [Each panel is plotted against r, for r from 0 to 8.]
Figure 2 The energy per unit winding $E_n/n$ of radially symmetric n-vortices for $\lambda = 1/2$, 1, and 2. [Plotted against n, for n from 1 to 4.]

points $x_1$, $x_2$ distance s apart. One interprets $x_1$, $x_2$ as the vortex positions. $E_{int}$ can only depend on their separation $s = |x_1 - x_2|$, by translation and rotation invariance. Figure 3 presents graphs of $E_{int}(s)$ generated by a lattice minimization algorithm. For $\lambda < 1$, vortices uniformly attract one another, so a vortex pair has least energy when coincident. For $\lambda > 1$, vortices uniformly repel, always lowering their energy by moving further apart. The graph for $\lambda = 1$ would be a horizontal line, $E_{int}(s) = 2$.
Figure 3 The 2-vortex interaction energy $E_{int}(s)$ as a function of vortex separation $s$ (solid curve), in comparison with its asymptotic form $E^{\infty}_{int}(s)$ (dashed curve) for (a) $\lambda = 1/2$ and (b) $\lambda = 2$.
which is uniformly attractive. It would be pleasing if $q_n$, $m_n$ could be deduced easily from $q$, $m$. One might guess $q_n = |n|q$, $m_n = nm$, in analogy with monopoles. Unfortunately, this is false: $q_n$, $m_n$ grow approximately exponentially with $|n|$.

Vortex Scattering

The AHM being Lorentz invariant, one can obtain time-dependent solutions wherein a single n-vortex travels at constant velocity, with speed $0 < v < 1$ and $E_{tot} = (1 - v^2)^{-1/2} E_n$, by Lorentz boosting the static solutions described above. Of more dynamical interest are solutions in which two or more vortices undergo relative motion. The simplest problem is vortex scattering. Two vortices, initially well separated, are propelled towards one another. In the center-of-mass (COM) frame they have, as $t \to -\infty$, equal speed $v$, and approach one another along parallel lines distance $b$ (the impact parameter) apart, see Figure 4. If $b = 0$, they approach head-on. Assuming they do not capture one another, they interact and, as $t \to \infty$, recede along parallel straight lines having been deflected through an angle $\Theta$ (the scattering angle). If scattering is elastic, the exit lines also lie $b$ apart and each vortex travels at speed $v$ as $t \to \infty$. The dependence of $\Theta$ on $v$, $b$, and $\lambda$ has been studied through lattice simulations by several authors, perhaps most comprehensively by Myers, Rebbi, and Strilka (1992). We shall now describe their results.

Figure 4 The geometry of vortex scattering.

Note first that vortex scattering is actually inelastic: vortices recede with speed $< v$ because some of their initial kinetic energy is dispersed by the collision as small-amplitude traveling waves (radiation). This energy loss can be as high as 80% in very fast collisions at small $b$. At small $v$ the energy loss is tiny, but can still have important consequences for type I vortices: if $v$ is very small, they start with only just enough energy to escape their mutual attraction. In undergoing a small $b$ collision they can lose enough of this energy to become trapped in an oscillating bound state. In this case they do not truly scatter and $\Theta$ is ill-defined. Myers et al. find that $v \gtrsim 0.2$ suffices to avoid capture when $\lambda = 1/2$. Since type I vortices attract, one might expect $\Theta$ to be always negative, indicating that the vortices deflect towards one another. In fact, as Figure 5a shows, this happens only for small $v$ and large $b$. Another naive expectation is that $\Theta = 0^\circ$ or $\Theta = 180^\circ$ when $b = 0$ (either vortices pass through one another or ricochet backwards in a head-on collision). In fact $\Theta = 90^\circ$, the only other possibility allowed by reflexion symmetry of the initial data. Figure 6 depicts snapshots of such a scattering process at modest $v$. The vortices deform each other as they get close until, at the moment of coincidence, they are close to the static 2-vortex ring. They then break apart along a line perpendicular to their line of approach. One may consider them to have exchanged half-vortices, so that each emergent vortex is a mixture of the incoming vortices. This rather surprising phenomenon was actually predicted by Ruback in advance of any numerical simulations and turns out to be a generic feature of planar topological solitons.

Consider now the type II case ($\lambda = 2$, Figure 5b). Here, $\Theta > 0$ for all $v$, $b$ as one expects of particles that repel each other. Head-on scattering is more interesting now since two regimes emerge: for $v > v_{crit} \approx 0.3$, one has the surprising $90^\circ$ scattering already described, while for $v < v_{crit}$ the vortices bounce backwards, $\Theta = 180^\circ$. This is easily explained. In order to undergo $90^\circ$ head-on scattering, the vortices must become coincident (otherwise reflexion symmetry is violated), hence must have initial energy at least $E_2$. For $v < v_{crit}$, where

\[ \frac{2E_1}{\sqrt{1 - v_{crit}^2}} = E_2 \tag{17} \]

they have too little energy, so come to a halt before coincidence, then recede from one another. The solution $v_{crit}$ of [17] depends on $\lambda$ and is plotted in Figure 7. For $v$ slightly above $v_{crit}$, we see that, in contrast to the type I case, $\Theta(b)$ is not monotonic: maximum deflection occurs at nonzero $b$.

The point vortex formalism yields a simple model of type II vortex scattering which is remarkably successful at small $v$. One writes down the Lagrangian for two identical (nonrelativistic) point particles of mass $E_1$ moving along trajectories $x_1(t)$, $x_2(t)$ under the influence of the repulsive potential $E^{\infty}_{int}$,

\[ L = \tfrac{1}{2} E_1 \left( |\dot{x}_1|^2 + |\dot{x}_2|^2 \right) - E^{\infty}_{int}(|x_1 - x_2|) \tag{18} \]

Energy and angular momentum conservation reduce $\Theta(v, b)$ to an integral over one variable ($s = |x_1 - x_2|$) which is easily computed numerically. To illustrate, Figure 5b shows the result for $\lambda = 2$, $v = 0.1$ in comparison with the lattice simulations of
Figure 5 The 2-vortex scattering angle $\Theta$ as a function of impact parameter $b$ for $v = 0.1$, 0.2, 0.3, 0.4, 0.5, and 0.9 (one plotting symbol per speed), as computed by Myers et al. (1992): (a) $\lambda = 1/2$; (b) $\lambda = 2$; (c) $\lambda = 1$. The dotted curves are merely guides to the eye. The solid curves in (b), (c) were computed using the point vortex model. Note that Myers et al. use different normalizations, so $b = \sqrt{2}\, b_{MRS}$ and $\lambda = \lambda_{MRS}/2$.
Myers et al. The agreement is almost perfect. For large $v$ the approximation breaks down not only because relativistic corrections become significant, but also because small $b$ collisions then probe the small $|x_1 - x_2|$ region where vortex core overlap effects become important. For the same reason, the point vortex model is less useful for type I scattering. Here there is no repulsion to keep the vortices well separated, so its validity is restricted to the small $v$, large $b$ regime.

Critical coupling is theoretically the most interesting regime, where most analytic progress has been made. Since $E_{int}$ and $E^{\infty}_{int}$ are both constant at critical coupling, one might expect vortex scattering to be trivial ($\Theta(v, b) \equiv 0$), but this is quite wrong, as shown in Figure 5c. In particular, $\Theta(v, 0) = 90^\circ$ for all $v$, just as in the large $v$ type I and type II cases. The point is that scalar attraction and magnetic repulsion of vortices are mediated by fields with different Lorentz transformation properties. While they cancel for static vortices, there is no reason to expect them to cancel for vortices in relative motion.

Figure 7 The critical velocity $v_{crit}$ for $90^\circ$ head-on scattering of type II vortices as a function of $\lambda$, as predicted by equation [17] (solid curve), in comparison with the results of Myers et al. (1992) (crosses).

Critical Coupling

The AHM with $\lambda = 1$ has many remarkable properties, at which we have so far only hinted. These all stem from Bogomol'nyi's crucial observation (Manton and Sutcliffe 2004, pp. 197–202) that the potential energy in this case can be rewritten as
Z ( 2
1 1 2
E B 1 jj
2 2
Z
jD1 iD2 j2 B d2 x i
dD 19
R2
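Looking back at the scattering discussion, equations [17] and [18] lend themselves to a small numerical experiment. The sketch below solves [17] for the critical speed and integrates the relative motion of the two point particles of [18] in the COM frame to estimate a deflection angle. It is an illustration only: the function names, the unit mass `E1 = 1`, and above all the stand-in potential $U(s) = g\,e^{-s}$ are assumptions made here, since the true asymptotic interaction energy is not reproduced in this passage.

```python
import math


def critical_speed(E1, E2):
    """Solve 2*E1/sqrt(1 - v**2) = E2 (equation [17]) for v.

    Returns 0.0 when E2 <= 2*E1, i.e. when two vortices at rest already
    carry enough energy to coincide.
    """
    ratio = 2.0 * E1 / E2
    if ratio >= 1.0:
        return 0.0
    return math.sqrt(1.0 - ratio * ratio)


def scattering_angle(v, b, E1=1.0, g=1.0, start=20.0, dt=0.01, max_steps=10**6):
    """Deflection angle in degrees for two point particles of mass E1.

    ASSUMPTION: the repulsive potential is the placeholder
    U(s) = g*exp(-s), not the true asymptotic vortex interaction energy.
    """
    mu = 0.5 * E1              # reduced mass of the relative motion
    x, y = -start, b           # relative position x1 - x2
    vx, vy = 2.0 * v, 0.0      # relative velocity: each particle has speed v

    def accel(px, py):
        s = math.hypot(px, py)
        f = g * math.exp(-s)   # magnitude of -dU/ds, pointing along +r
        return f * px / (s * mu), f * py / (s * mu)

    r_start = math.hypot(x, y)
    ax, ay = accel(x, y)
    for _ in range(max_steps):
        # one velocity Verlet step
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
        x += dt * vx
        y += dt * vy
        ax, ay = accel(x, y)
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
        # stop once the particles recede and are as far apart as at the start
        if x * vx + y * vy > 0.0 and math.hypot(x, y) > r_start:
            break
    else:
        raise RuntimeError("particles did not separate within max_steps")
    cos_theta = vx / math.hypot(vx, vy)   # the incoming direction is +x
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))
```

Read through [17], the quoted value $v_{crit} \approx 0.3$ for $\lambda = 2$ corresponds to $E_2/(2E_1) \approx 1.05$. With the repulsive stand-in potential, a slow head-on collision (`b = 0`) bounces straight back, mirroring the $180^\circ$ regime described for $v < v_{crit}$; the $90^\circ$ scattering of faster vortices requires coincident vortices, which is exactly where the text says the point particle picture breaks down.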
Taubes's theorem shows that this n-vortex is just one point, corresponding to the list $[0, 0, \ldots, 0]$, in a 2n-dimensional space of static multivortex solutions called the moduli space $M_n$. This space may be visualized as the flat, finite-dimensional valley bottom in $C_n$ on which $E$ attains its minimum value, $n$. Points in $M_n$ are in one-to-one correspondence with distinct unordered lists $[z_1, z_2, \ldots, z_n]$, which are themselves in one-to-one correspondence with points in $\mathbb{C}^n$, as follows. To each list, we assign the unique monic polynomial whose roots are $z_r$,

is approximately independent of $v$ for $v \lesssim 0.5$. Further, Stuart (1994) has proved that, for initial speeds of order $\epsilon$, $\epsilon$ small, the fields stay (pointwise) $\epsilon^2$ close to their geodesic approximant for times of order $\epsilon^{-1}$.

On symmetry grounds, two vortex dynamics in the COM frame reduces to geodesic motion in $M_2^0 \cong \mathbb{C}$, the subspace of centered 2-vortices ($a_1 = 0$, so $z_1 = -z_2$), with induced metric

\[ \gamma^0 = G(|a_0|)\, da_0\, d\bar{a}_0 \tag{25} \]

vortices remain well separated. (Note that $\gamma^{\infty}$ is not positive definite if any $|z_r - z_s|$ becomes too small.) The results are good, provided $v \lesssim 0.5$ and $b \gtrsim 3$ (see Figure 5c).

Other Developments

The (critically coupled) AHM on a compact physical space $\Sigma$ is of considerable theoretical and physical interest. Bradlow showed that $M_n(\Sigma)$ is empty unless $V = \mathrm{Area}(\Sigma) \geq 4\pi n$, so there is a limit to how many vortices a space of finite area can accommodate (Manton and Sutcliffe 2004, pp. 227–230). Manton has analyzed the thermodynamics of a gas of vortices by studying the statistical mechanics of geodesic flow on $M_n(\Sigma)$. In this context, spatial compactness is a technical device to allow nonzero vortex density $n/V$ for finite $n$, without confining the fields to a finite box, which would destroy the Bogomol'nyi properties. In the limit of interest, $n, V \to \infty$ with $n/V$ fixed, the thermodynamical properties turn out to depend on $\Sigma$ only through $V$, so $\Sigma = S^2$ and $\Sigma = T^2$ give equivalent results, for example. The equation of state of the gas is ($P$ = pressure, $T$ = temperature)

\[ P = \frac{nT}{V - 4\pi n} \tag{28} \]

which is similar, at low density $n/V$, to that of a gas of hard disks of area $2\pi$. The crucial step in deriving [28] is to find the volume of $M_n(\Sigma)$ which, despite there being no formula for $\gamma$, may be computed exactly by remarkable indirect arguments (Manton and Sutcliffe 2004, pp. 231–234).

The static AHM coincides with the Ginzburg–Landau model of superconductivity, which has precisely the same type I/II classification. Here the Higgs field represents the wave function of a condensate of Cooper pairs, usually (but not always) electrons. There has been a parallel development of the static model by condensed matter theorists, therefore; see Fossheim and Sudbo (2004), for example. In fact the vortex was actually first discovered by Abrikosov in the condensed matter context. One important difference is that type I superconductors do not support vortex solutions in an external magnetic field $B_{ext}$ because the critical $|B_{ext}|$ required to create a single vortex is greater than the critical $|B_{ext}|$ required to destroy the condensate completely ($\phi \equiv 0$). Type II superconductors do support vortices, and there are such superconductors with $\lambda \approx 1$, but the vortex dynamics we have described is not relevant to these systems. In this context there is an obvious preferred reference frame (the rest frame of the superconductor) so it is unsurprising that the Lorentz-invariant AHM is inappropriate. Insofar as vortices move at all, they seem to obey a first-order (in time) dynamical system, in contrast to the second-order AHM. Manton has devised a first-order system which may have relevance to superconductivity, by replacing $E_{kin}$ with a Chern–Simons–Schrödinger functional (Manton and Sutcliffe 2004, pp. 193–197). Rather than attracting or repelling, vortices now tend to orbit one another at constant separation. There is again a moduli space approximation to slow vortex dynamics for $\lambda \approx 1$, but it has a Hamiltonian-mechanical rather than Riemannian-geometric flavor.

Finally, an interesting simplification of the AHM, which arises, for example, as a phenomenological model of liquid helium-4, is obtained if we discard the gauge field $A$, or equivalently set the electric charge of $\phi$ to $e = 0$. There is now no type I/II classification, since $\lambda$ may be absorbed by rescaling. The resulting model, which has only global U(1) phase symmetry, supports n-vortices $\phi = \sigma(r) e^{in\theta}$ for all $n$, but these are not exponentially spatially localized,

\[ \sigma(r) = 1 - \frac{n^2}{\lambda r^2} - \frac{n^2 (8 + n^2)}{2 \lambda^2 r^4} + O(r^{-6}) \tag{29} \]

and cannot have finite $E$ by Derrick's theorem. They are unstable for $|n| > 1$, and 1-vortices uniformly repel one another. They can be given an interesting first-order dynamics (the Gross–Pitaevskii equation).

Abbreviations

$A$ electromagnetic gauge potential
$b$ impact parameter
$D$ gauge-covariant derivative
$E$ potential energy
$E_{kin}$ kinetic energy
$F$ electromagnetic field strength tensor
$L$ Lagrangian
$\mathcal{L}$ Lagrangian density
$S$ action
$\phi$ Higgs field
$\Theta$ scattering angle

See also: Fractional Quantum Hall Effect; Ginzburg–Landau Equation; High $T_c$ Superconductor Theory; Integrable Systems: Overview; Nonperturbative and Topological Aspects of Gauge Theory; Quantum Fields with Topological Defects; Solitons and Other Extended Field Configurations; Symmetry Breaking in Field Theory; Topological Defects and Their Homotopy Classification; Variational Techniques for Ginzburg–Landau Energies.
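As a brief numerical note on the equation of state [28] from Other Developments, the sketch below compares the vortex gas with an ideal gas and reads off the effective hard-disk area at low density. It assumes the equation of state in the form $P = nT/(V - 4\pi n)$ with units in which Boltzmann's constant is 1; the function names are illustrative choices, not notation from the article.

```python
import math


def vortex_gas_pressure(n, V, T):
    """Pressure P = n*T/(V - 4*pi*n) of equation [28], with k_B = 1 assumed."""
    free_area = V - 4.0 * math.pi * n
    if free_area <= 0.0:
        # Bradlow's bound: no n-vortex solutions unless V >= 4*pi*n
        raise ValueError("need V > 4*pi*n for a finite pressure")
    return n * T / free_area


def compressibility_factor(n, V):
    """P*V/(n*T); equal to 1 for an ideal gas, independent of T here."""
    return vortex_gas_pressure(n, V, 1.0) * V / n


def effective_disk_area(n, V):
    """Low-density estimate of the hard-disk area.

    Uses P*V/(n*T) ~ 1 + B2*(n/V) and the hard-disk relation
    B2 = 2 * (disk area).
    """
    b2 = (compressibility_factor(n, V) - 1.0) * V / n
    return 0.5 * b2
```

For example, `effective_disk_area(1.0, 1e6)` comes out close to $2\pi$, which is the sense in which [28] resembles a gas of hard disks of area $2\pi$ at low density.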
Adiabatic Piston
Ch Gruber, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
A Lesne, Université P.-M. Curie, Paris VI, Paris, France
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Macroscopic Problem

The adiabatic piston is an old problem of thermodynamics which has had a long and controversial history. It is the simplest example concerning the time evolution of an adiabatic wall, that is, a wall which does not conduct heat. The system consists of a gas in a cylinder divided by an adiabatic wall (the piston). Initially, the piston is held fixed by a clamp and the two gases are in thermal equilibrium characterized by $(p^\pm, T^\pm, N^\pm)$, where the index $-/+$ refers to the gas on the left/right side of the piston and $(p, T, N)$ denote the pressure, the temperature, and the number of particles (Figure 1). Since the piston is adiabatic, the whole system remains in equilibrium even if $T^- \neq T^+$. At time $t = 0$, the clamp is removed and the piston is let free to move without any friction in the cylinder. The question is to find the final state, that is, the final position $X_f$ of the piston and the parameters $(p_f^\pm, T_f^\pm)$ of the gases.

Figure 1 The adiabatic piston problem. The piston, of area $A$ and at position $X$ in a cylinder extending from 0 to $L$, separates a gas with $(N^-, p^-, T^-)$ on the left from a gas with $(N^+, p^+, T^+)$ on the right.

In the late 1950s, using the two laws of equilibrium thermodynamics (i.e., thermostatics), Landau and Lifshitz concluded that the adiabatic piston will evolve toward a final state where $p^-/T^- = p^+/T^+$. Later, Callen (1963) and others realized that the maximum entropy condition implies that the system will reach mechanical equilibrium where the pressures are equal, $p_f^- = p_f^+$; however, nothing could be said concerning the final position $X_f$ or the final temperatures $T_f^\pm$, which should depend explicitly on the viscosity of the fluids. It thus became a controversial problem since one was forced to accept that the two laws of thermostatics are not sufficient to predict the final state as soon as adiabatic movable walls are involved (see early references in Gruber (1999)).

Experimentally, the adiabatic piston was used already before 1924 to measure the ratio $c_p/c_v$ of the specific heats of gases. In 2000, new measurements showed that one has to distinguish between two regimes, corresponding to weak damping or strong damping, with very different properties. For example, for weak damping the frequency of oscillations corresponds to adiabatic oscillations, whereas for strong damping it corresponds to isothermal oscillations.

Microscopic Problem

The adiabatic piston was first considered from a microscopic point of view by Lebowitz who introduced in 1959 a simple model to study heat conduction. In this model, the gas consists of point particles of mass $m$ making purely elastic collisions on the wall of the cylinder and on the piston. Furthermore, the gas is very dilute so that the
equation of state $p = n k_B T$ is satisfied at equilibrium, where $n$ is the density of particles in the gas and $k_B$ the Boltzmann constant. The adiabatic piston is taken as a heavy particle of mass $M \gg m$ without any internal degree of freedom. Using this same model Feynman (1965) gave a qualitative analysis in Lectures in Physics. He argued intuitively but correctly that the system should converge first toward a state of mechanical equilibrium where $p^- = p^+$ and then very slowly toward thermal equilibrium. This approach toward thermal equilibrium is associated with the wiggles of the piston induced by the random collisions with the atoms of the gas. Of course, this stochastic behavior is not part of thermodynamics and the evolution beyond the mechanical equilibrium cannot appear in the macroscopical framework assuming that the piston does not conduct heat.

From a microscopical point of view, one is confronted with two different problems: the approach toward mechanical equilibrium in the absence of any a priori friction (where the entropy of both gases should increase) and, on a different timescale, the approach toward thermal equilibrium (where the entropy of one gas should decrease but the total entropy increase).

The conceptual difficulties of the problem beyond mechanical equilibrium come from the following intuitive reasoning. When the piston moves toward the hotter gas, the atoms of the hotter gas gain energy, whereas those of the cooler gas lose energy. When the piston moves toward the cooler side, it is the opposite. Since on an average the hotter side should cool down and the cold side should warm up, we are led to conclude that on an average the piston should move toward the colder side. On the other hand, from $p = n k_B T$, the piston should move toward the warmer side to maintain pressure balance.

In 1996, Crosignani, Di Porto, and Segev introduced a kinetic model to obtain equations describing the adiabatic approach toward mechanical equilibrium. Starting with the microscopical model introduced by Lebowitz, Gruber, Piasecki, and Frachebourg, later joined by Lesne and Pache, initiated in 1998 a systematic investigation of the adiabatic piston within the framework of statistical mechanics, together with a large number of numerical simulations. This analysis exploited the fact that $m/M$ is a very small parameter, which allows expansions in powers of $m/M$ (see Gruber and Piasecki (1999) and Gruber et al. (2003) and reference therein). An approach using dynamical system methods was then developed by Lebowitz et al. (2000) and Chernov et al. (2002). An extension to hard-disk particles was analyzed at the same time by Kestemont et al. (2000). Recently, several other authors have contributed to this subject.

The general picture which emerges from all the investigations is the following. For an infinite cylinder, starting with mechanical equilibrium $p^- = p^+ = p$, the piston evolves to a stationary stochastic state with nonzero velocity toward the warmer side

\[ \langle V \rangle = \frac{m}{M} \sqrt{\frac{\pi k_B}{8m}} \left( \sqrt{T^+} - \sqrt{T^-} \right) + o\!\left( \frac{m}{M} \right) \tag{1} \]

with relaxation time

\[ \tau = \frac{M}{A} \sqrt{\frac{\pi k_B}{8m}}\, \frac{1}{p} \left( \frac{1}{\sqrt{T^-}} + \frac{1}{\sqrt{T^+}} \right)^{-1} \tag{2} \]

where $M/A$ is the mass per unit area of the piston. In this state the piston has a temperature $T_P = \sqrt{T^- T^+}$ and there is a heat flux

\[ j_Q = \left( \sqrt{T^-} - \sqrt{T^+} \right) \frac{m}{M} \sqrt{\frac{8 k_B}{\pi m}}\, p + o\!\left( \frac{m}{M} \right) \tag{3} \]

For a finite cylinder and $p^- \neq p^+$, the evolution proceeds in four different stages. The first two are deterministic and adiabatic. They correspond to the thermodynamic evolution of the (macroscopic) adiabatic piston. The last two stages, which go beyond thermodynamics, are stochastic with heat transfer across the piston. More precisely:

1. In the first stage whose duration is the time needed for the shock wave to bounce back on the piston, the evolution corresponds to the case of the infinite cylinder (with $p^- \neq p^+$). If $R = Nm/M > 10$, the piston will be able to reach and maintain a constant velocity

\[ V = \sqrt{\frac{\pi k_B}{8m}}\, \sqrt{T^- T^+}\, \frac{p^- - p^+}{p^- \sqrt{T^+} + p^+ \sqrt{T^-}} + O\!\left( \frac{m}{M} \right) \quad \text{for } |p^- - p^+| \ll 1 \tag{4} \]

2. In the second stage the evolution toward mechanical equilibrium is either weakly or strongly damped depending on $R$. If $R < 1$, the evolution is very weakly damped, the dynamics takes place on a timescale $t' = \sqrt{R}\, t$, and the effect of the collisions on the piston is to introduce an external potential of the form $c_1/X^2 + c_2/(L - X)^2$. On the other hand, if $R > 4$, the evolution is strongly damped (with two oscillations only) and depends neither on $M$ nor on $R$.
3. After mechanical equilibrium has been reached, the third stage is a stochastic approach toward thermal equilibrium associated with heat transfer across the piston. This evolution is very slow and exhibits a scaling property with respect to $t' = mt/M$.
4. After thermal equilibrium has been reached ($T^- = T^+$, $p^- = p^+$), in a fourth stage the gas will evolve very slowly toward a state with Maxwellian distribution of velocities, induced by the collision with the stochastic piston.

The general conclusion is thus that a wall which is adiabatic when fixed will become a heat conductor under a stochastic motion. However, it should be stressed that the time required to reach thermal equilibrium will be several orders of magnitude larger than the age of the universe for a macroscopic piston, so such a wall could not reasonably be called a heat conductor. However, for mesoscopic systems, the effect of stochasticity may lead to very interesting properties, as shown by Van den Broeck et al. (2004) in their investigations of Brownian (or biological) motors.

Microscopical Model

The system consists of two fluids separated by an adiabatic piston inside a cylinder with x-axis, length $L$, and area $A$. The fluids are made of $N^\pm$ identical light particles of mass $m$. The piston is a heavy flat disk, without any internal degree of freedom, of mass $M \gg m$, orthogonal to the x-axis, and velocity parallel to this x-axis. If the piston is fixed at some position $X_0$, and if the two fluids are in thermal equilibrium characterized by $(p_0^\pm, T_0^\pm, N^\pm)$, then they will remain in equilibrium forever even if $T_0^- \neq T_0^+$: it is thus an adiabatic piston in the sense of thermodynamics. At a certain time $t = 0$, the piston is let free to move and the problem is to study the time evolution. To define the dynamics, we consider that the system is purely Hamiltonian, that is, the particles and the piston move without any friction according to the laws of mechanics. In particular, the collisions between the particles and the walls of the cylinder, or the piston, are purely elastic and the total energy of the system is conserved. In most studies, one considers that the particles are point particles making purely elastic collisions. Since the piston is bound to move only in the x-direction, the velocity components of the particles in the transverse directions play no role in this problem. Moreover, since there is no coupling between the components in the x- and transverse directions, one can simplify the model further by assuming that all probability distributions are independent of the transverse coordinates. We are thus led to a formally one-dimensional problem (except for normalizations). Therefore, in this review, we consider that the particles are noninteracting and all velocities are parallel to the x-axis.

From the collision law, if $v$ and $V$ denote the velocities of a particle and the piston before a collision, then under the collision on the piston:

\[ v \to v' = 2V - v + \alpha (v - V), \qquad V \to V' = V + \alpha (v - V) \tag{5} \]

where

\[ \alpha = \frac{2m}{M + m} \tag{6} \]

Similarly, under a collision of a particle with the boundary at $x = 0$ or $x = L$:

\[ v \to v' = -v \tag{7} \]

Let us mention that more general models have also been considered, for example, the case where the two fluids are made of point particles with different masses $m^\pm$, or two-dimensional models where the particles are hard disks. However, no significant differences appear in these more general models and we restrict this article to the simplest case.

One can study different situations: $L = \infty$, $L$ finite, and $L \to \infty$. Furthermore, taking first $M$ and $A$ finite, one can investigate several limits.

1. Thermodynamic limit for the piston only. In this limit, $L$ is fixed (finite or infinite) and $A \to \infty$, $M \to \infty$, keeping constant the initial densities $n^\pm$ of the fluid and the parameter

\[ \gamma = \frac{2mA}{M + m} = \alpha A \simeq 2m \frac{A}{M} \tag{8} \]

If $L$ is finite, this means that $N^\pm \to \infty$ while keeping constant the parameters

\[ R^\pm = \frac{m N^\pm}{M} = \frac{M^\pm_{gas}}{M} \tag{9} \]

2. Thermodynamic limit for the whole system, where $L \to \infty$ and $A \sim L^2$, $N^\pm \sim L^3$. In this limit, space and time variables are rescaled according to $x' = x/L$ and $t' = t/L$. This limit can be considered as a limiting case of (1) where $R^\pm \sim \sqrt{A} \to \infty$ (and time is scaled).
3. Continuum limit where $L$ and $M$ are fixed and $N^\pm \to \infty$, $m \to 0$ keeping $M^\pm_{gas}$ constant, that is, $R^\pm$ = const.

The case $L$ infinite and the limit (1) have been investigated using statistical mechanics (Liouville or
Boltzmanns equations). On the other hand, the where (v 0 , V 0 ) are given by eqn [5] and
limit (2) has been studied using dynamical system Z 1
methods, reducing first the system to a billiard in an v; V; t dX; P X; v ; X; V; t 14
surf
(N N 1)-dimensional polyhedron. The limit 1
(3) has been introduced to derive hydrodynamical
We thus have to solve eqns [12][13] with initial
equations for the fluids.
conditions
In this article, we present the approach based on
statistical mechanics. Although not as rigorous as (2) x; v; t 0 n
0 0 v x X0 x
on a mathematical level, it yields more informations
on the approach toward mechanical and thermal x; v; t 0 n
0 0 v L x x X0 15
equilibrium. Moreover, it indicates what are the V; t 0
V
open problems which should be mathematically
solved. In all investigations, advantage is taken of Using the fact that = 2m=(M m) 1, we can
the fact that m/M is very small and one introduces rewrite eqn [13] as a formal series in powers of :
the small parameter
p X1
1k k1 @ k e
m=M 1 10 @t V; t Fk1 V; t 16
k1
k! @V
Let us note that measures the ratio of thermal
Z 1
velocities for the piston and a fluid particle, whereas
2 measures the ratio of velocity changes during
~k V; t
F v Vk
surf v ; V; tdv
V
a collision. Z V
v Vk
surf v ; V; tdv 17
1
Starting Point: Exact Equations from which one obtains the equations for the
moments of the piston velocity:
Using the statistical point of view, the time evolution
is given by Liouvilles equation for the probability
1 dhV n i
distribution on the whole phase space for (N
dt
N 1) particles, with L, A, N , and M finite. Z
X n
n! 1
Initially (t 0), the piston is fixed at (X0 , V0 = 0) k1 ~k1 V; t
dV V nk F 18
and the fluids are in thermal equilibrium with k1
k!n k! 1
homogeneous densities n 0 , velocity distributions
0 (v) = 0 (v), and temperatures
However, we do not know the two-point correlation
Z 1 functions.
T0 m dv n
0 0 vv
2
11 If the length of the cylinder is infinite, the
1 condition M m implies that the probability for
Integrating out the irrelevant degrees of freedom, a particle to make more than one collision on the
the Liouvilles equation yields the equations for piston is negligible. Alternatively, one could choose
the distribution (x, v; t) of the right and left initial distributions 0 (v) which are zero for jvj <
particles: vmin , where vmin is taken such that the probability
of a recollision is strictly zero. Therefore, if L = 1,
@t x; v; t v@x x; v ; t I x; v ; t 12 one can consider that before a collision on the
The collision term I (x, v; t) is a functional of piston the particles are distributed with 0 (v) for
, P (X, v; X, V; t), the two-point correlation func- all t, and the two-point correlation functions
tion for a right (resp. left) particle at (x = X, v) and factorize, that is,
the piston at (X, V). Similarly, one obtains for the
surf v; V; t surf v; tV; t; if v > V
velocity distribution of the piston: 19
Z 1
surf v; V; t
surf v; tV; t; if v < V
@t V; t A V v V v 0 0
surf v ; V ; t
1 where for L = 1,
surf (v; t) = n0 0 (v) and thus the
conditions to obtain eqn [18] are satisfied.
v V surf v ; V; t dv
Z 1 If L is finite, one can show that the factorization
A V v v V 0 0
surf v ; V ; t
property (eqn [19]) is an exact relation in the
1
thermodynamic limit for the piston (A ! 1,
V v surf v ; V ; t dv 13 M=A = cte). For finite L and finite A, we introduce
Assumption 1 (Factorization condition). Before a from which one obtains equations for dr =dt. In
collision the two-point correlation functions have the particular, using the identities
factorization property (eqn [19]) to first order in .
r1; r; r2; r;
Under the factorization condition, we have F3 3F2 ; F2 2F0 29
~k V; t Fk V; tV; t
F 20 in [22] and [24], we have
with Z t
F2 V; t F2 V;
1
Fk V; t dvv Vk
surf v; t
X 2 r;
V F0 2r 30
Z V r 0
2 r!
dvv Vk
surf v; t
1
Fk V; t Fk V; t 21
d hE i
M F2 V; t V
and from eqn [18] dt A
X1
M d
F3 V; t 1 2r 3
hVi MhF2 V; ti 22 2 2 r 2 r!
A dt
r1;
F2 V; tr 31
M d 2
hV i MhVF2 V;ti hF3 V;ti 23
A dt Depending on the questions or approximations one
= hV i then from eqns [12] and [20], wants to study, either the distribution (V; t) or the
Introducing V
moments hV n it will be the interesting objects.
it follows that the (kinetic) energies satisfy
Finally, with the condition [19], one can take
h
d hE i eqn [12] for x 6 Xt and impose the boundary
M hF2 V; ti V
dt A conditions at x = Xt :
V; ti
hV VF 2
i
Xt ; v; t Xt ; v 0 ; t; if v < Vt
32
hF3 V; ti 24 Xt ; v; t Xt ; v 0 ; t; if v > Vt
2
which implies conservation of energy.
From the first law of thermodynamics, and similarly for x = 0 and x = L with v 0 = v.
Let us note that this factorization condition is of
d h E i 1 h P! i
the same nature as the molecular chaos assumption
PW PPQ! 25
dt A A introduced in kinetic theory, and with this condition
eqn [13] yields the Boltzmann equation for this
where PPW! and PPQ! denote the work- and model.
heat-power transmitted by the piston to the fluid, In the following, to obtain explicit results as a
we conclude from eqns [22] and [25] that the heat function of the initial temperatures T0 , we take
flux is Maxwellian distributions 0 (v) and initial condi-
1 P! h tions (p
0 , T0 , n0 ) such that the velocity of the piston
V; ti
PQ M hV VF remains small (i.e., jhVit j jhv i0 j).
2
A i
hF3 V; ti 26
2
Since 1, it is interesting to introduce the Distribution (V ; t) for the Infinite
irreducible moments Cylinder (L = 1)
p
ri To lowest order in = m=M, and assuming
r hV V 27 j1 p =p j is of order , one obtains from eqn [16]
= hVi ,
and the expansion around V the usual FokkerPlanck equation whose solution
t
gives
X
1
1 !
Fn V; t
Fnr; VV r
V 28
r! 1 1 V V t2
r0 0 V; t p exp 33
2 t 22 t
r 9
with m kB p p >
r 1 hVistat T T > =
kB p p M 8m if p p 38
Vt
p p
p p 1 et k p >
>
TT ;
8m T 2 B
T hV 2 istat hVistat
s M
A 8m p p
p p 34 Let us remark that we have established eqn [35]
M kB T T under the condition that j1 p =p j = O(), but as
p p
kB p p T p T we see in the next section, the stationary value Vstat
2 t TT p p 1 e2t obtained from eqn [36] remains valid whenever
M p T p T p
j(1 p =p )(1 T =T )j 1.
where we have dropped the index zero on the
variable T , n and used the equation of state
p = n kB T .
In conclusion, in the thermodynamic limit for the Moments hV n it : Thermodynamic Limit
piston (M ! 1, M=A fixed), eqn [33] shows that for the Piston
the evolution is deterministic, that is, (V; t) = General Equations: Adiabatic Evolution
(V V(t),
where the velocity V(t) of the piston
tends exponentially fast toward stationary value In the thermodynamic limit M ! 1, ! 0, = A
Vstat = V(1) with relaxation time = 1 . is fixed and eqn [16] reduces to
Let us note that for p = p , we have V(t) 0
@ ~
and the evolution [33] is identical to the @t V; t F2 V; t 39
@V
OrnsteinUhlenbeck process of thermalization of
the Brownian particle starting with zero velocity Integrating [39] with initial condition (V; t = 0) =
and friction coefficient . The analysis of [16] to
(V) yields
first order in yields then
" #
V;t
V Vt; that is; hV n it hVint 40
X3
V; t 1
ak tV Vt k
0 V; t 35 where
k0
d
where ak (t) can be explicitly calculated and a0 (t) = Vt F2 Vt; t; Vt 0 0 41
dt
2 (t)a2 (t) because of the normalization condition.
Moreover, a2 (t) (p p ), that is, a2 (t) = 0 if Moreover,
p = p . From [35], one obtains ~2 V; t F2 V; tV; t
r F 42
p
kB T T
hVit p p and
8m p T p T
n
; P X; v; X; V; t x; v; t
X Xt
p p 1 et
V Vt 43
p T p T
2
p p p p
8 p T p T 2 where dX(t)=dt = V(t), X(t = 0) = X0 .
In conclusion, as already mentioned, in this limit
1 2tet e2t
the factorization condition (eqn [19]) is an exact
m 1 relation. Let us note that
p p T p T surf (v; t) = surf (2V v; t) if
M TT v > V(t) (on the right) or v < V(t) (on the left). Let
p p! o us also remark that 2mF2 (V(t); t) represents the
p T p T
p p 1 et 2 36 effective pressure from the right/left exerted on the
p T p T piston. Moreover, since for any distribution
and
surf (v; t), the functions F2 (V; t) and F2 (V; t) are
r
monotonically decreasing, we can introduce the
m decomposition
2
hV it hVi2t 2
t 1 2
2 ta2 t 37
M
M
From eqn [36], we now conclude that for equal p surf 2mF
2 V; t ^
p
V; tV 44
A
pressures p = p , the piston will evolve stochasti-
cally to a stationary state with nonzero velocity where the static pressure at the surface is
toward the warmer side ^ (t) = p
p surf (V = 0; t) and the friction coefficients
(V; t) are strictly positive. The evolution [41] is its final velocity Vstat and one can solve eqn [12] to
thus of the form obtain the evolution of the fluids.
d A Finite Cylinder (L < 1, M = 1)
Vt p ^ VV
^ p 45
dt M
For finite L, introducing the average temperature in
It involves the difference of static pressure and the the fluids
friction coefficient (V) = (V) (V). Finally,
from eqn [12], we obtain the evolution of the 2hE it
Tav 50
(kinetic) energy per unit area for the fluids in the left kB N
and right compartments:
we have to solve [41] and [46], that is,
d < E > d A
2mF2 V; tV 46 Vt 2m F2 V; t F2 V; t
dt A dt M 51
Therefore, from [40] and [46], and the first law of d A
kB Tav 4m F2 V; tV
thermodynamics, we recover the conclusions dt N
obtained in the previous section, that is, in the where F2 (V; t) is a functional of
surf (v; t) which we
thermodynamic limit for the piston, the evolution decompose as
(eqns [41], [12], and [35]) is deterministic and
adiabatic (i.e., in [46] only work and no heat is
F2 V; t n ^ t M V; tV 52
^ tkB T
involved). A
then, assuming that to first order in , F1 (V = 0; t) is (recall that R = mN =M). For the case N = N to
the same function of T ^ (t) as for Maxwellian be considered in the simulations, eqn [64] implies
distributions, we have that the motion is weakly damped if
2s 3 "r r #2
A 4 8kB T
^ 3 Xf X
V m^n V 5 OV 2 57 R < Rmax 1 f 65
M m 2 L L
of the autonomous system [68] with Fk = Fk (V) observables. The initial conditions are set on the
shows that the piston evolves to a stationary state first-stage solution. The initial conditions of the
given by
with velocity V second regime match the asymptotic behavior of the
first-stage solution (matching condition).
F3 VF0 V 0
F2 V 69 The slaving principle is implemented by interpret-
4 F1 V
ing an evolution equation of the form
The temperature of the piston is da da
A; a; A O1 73
dt d
2 kB TP F3 V
70
M
4 F1 V as follows: it indicates that a is in fact a fast quantity
relaxing at short times ( ) toward a stationary
and the heat flux from the piston to the fluid is
state aeq () slaved to the slow evolution and
1 P! m2 F3 F1 F3 F1 determined by the condition
PQ 71
A 2M F1 F1 A; aeq 0 74
If we choose initial conditions such that jV(t)j 1
(at lowest order in , actually A[, aeq ()] = O()
for all t, and Maxwellian distributions (v), the
which prescribes the leading order of aeq ()); the
solutions V(t), 2 (t) coincide with the solutions
following-order terms can be arbitrarily fixed as
previously obtained (eqns [36] and [37]) and
long as only the first order of perturbation is
r
1 P! m 8kB implemented. Physically, such a condition arises to
P T T
express that an instantaneous mechanical equili-
A Q M m
brium takes place at each time of the slow
p p
p p 72 relaxation to thermal equilibrium.
p T p T
In conclusion, to first order in m=M, there is a heat
Equations for the fluctuation-induced evolution of
flux from the warm side to the cold one propor-
the system Following this procedure, we arrive at
tional to (T T ), induced by the stochastic
explicit expressions for the rescaled quantities (of order
motion of the piston. e = V=, e 2 = 2 =, and e = (p p )=:
O(1))V
Finite Cylinder ($L < \infty$, $M < \infty$)
Ve m AL F3 F1 F3 F1 O
3 E0 F1
Singular character of the perturbation approach
Whereas the leading order is actually the thermo- e
2m AL
F3 F1 F3 F1
dynamic behavior M = 1 in the first two stages of 2m 3 E0 75
the evolution (fast relaxation toward mechanical F3 F1
equilibrium), the fluctuations of order O() rule the O
4F1
slow relaxation toward thermal equilibrium. It is
thus obvious that a naive perturbation approach e 2 F3 O
4F1
cannot give access to both regimes. This difficulty
is reminiscent of the boundary-layer problems We then introduce a (dimensionless) rescaled posi-
encountered in hydrodynamics, and the perturbation tion for the piston
method to be used here is the exact temporal analog
1 X 1 1
of the matched perturbative expansion method 2 ; 76
developed for these boundary layers. The idea is to 2 L 2 2
implement two different perturbation approaches: which satisfies
1. one at short times, with time variable t describing
d 2A F1 F1
the fast dynamics ruling the fast relaxation kB T T 77
d 3E0 F1
toward mechanical equilibrium; and
2. one for longer times, with a rescaled time To discuss eqn [77], a third assumption has to be
variable = t. introduced.
The second perturbation approach above is supple- Assumption 3 (Maxwellian Identities). In the
mented with a slaving principle, expressing that at regime when V = O(), the relations between the
each time of the slow evolution, that is, at fixed , functionals F1 , F2 , and F3 are the same at lowest
the still present fast dynamics has reached a local order in as if the distributions surf (v; V; t) were
asymptotic state, slaved to the values of the slow Maxwellian in v:
$$F_1^\pm(V) = \mp\rho^\pm\sqrt{\frac{k_B T^\pm}{2\pi m}}, \qquad F_3^\pm(V) = \frac{2k_B T^\pm}{m}\,F_1^\pm(V) - V F_2^\pm(V)$$ [78]

Using these identities and the (dimensionless) rescaled time

$$s = \tau\,\frac{2}{3L}\sqrt{\frac{k_B}{m}}\sqrt{\frac{2\,(N^- T_0^- + N^+ T_0^+)}{\pi N}}$$ [79]

where $N = N^- + N^+$, we obtain a deterministic equation describing the piston motion (Gruber et al. 2003):

$$\frac{d\xi}{ds} = \sqrt{\frac{N}{2N^-}(1 - 2\xi)} - \sqrt{\frac{N}{2N^+}(1 + 2\xi)}, \qquad \xi(0) = \frac{1}{2} - \frac{X_{ad}}{L}$$ [80]

where $X_{ad}$ is the piston position at the end of the adiabatic regime (i.e., $X_f$, eqn [62]). The meaningful observables straightforwardly follow from the solution $\xi(s)$:

$$X(s) = L\left(\frac{1}{2} - \xi(s)\right), \qquad T^\pm(s) = \frac{N^- T_0^- + N^+ T_0^+}{2N^\pm}\,\bigl(1 \mp 2\xi(s)\bigr)$$ [81]

The first-order perturbation analysis using a single rescaled time $t_1 = \alpha t_0$ is valid in the regime when $V = O(\alpha)$ and it gives access to the relaxation toward thermal equilibrium up to a temperature difference $T^+ - T^- = O(\alpha)$. For the sake of technical completeness (rather than physical relevance, since the above first-order analysis is enough to get the observable, meaningful behavior), let us mention that the perturbation analysis can be carried over to higher orders; using further rescaled times $t_2 = \alpha^2 t_0, \ldots, t_n = \alpha^n t_0$, it would allow us to control the evolution up to a temperature difference $|T^+ - T^-| = O(\alpha^n)$; however, one could expect that the factorization condition does not hold at higher orders.

Numerical Simulations

As we have seen, the results were established under the condition that $m/M$ is a small parameter. Moreover, for finite systems ($L < \infty$, $M < \infty$), it was assumed that before collisions and to first order in $m/M$, the factorization and the average assumptions are satisfied. The numerical simulations are thus essential to check the validity of these assumptions, to determine the range of acceptable values $m/M$ for the perturbation expansion, to investigate the thermodynamic limit, and to guide the intuition.

In all simulations, we have taken $k_B = 1$, $m = 1$, $T^- = 1$ and usually $T^+ = 10$. For $L$ finite, we have taken $L = 60$, $X_0 = 10$, $A = 10^5$, and $N^- = N^+ = N/2$, that is, $p^- = R(M/A)(1/10)$ and $p^+ = 2p^-$. The number of particles $N$ was varied from a few hundred to one or several million; the mass $M$ of the piston from 1 to $10^5$. We give below some of the results which have been obtained for $L = \infty$ (Figures 2 and 3)
[Figure 2: (a) position $X(t)$ against $t$, with curves labeled $M = 5, 10, 15, 25, 50, 100$; (b) stationary velocity $V_{stat}$ against $M$.]
Figure 2 Evolution of the piston for $L = \infty$ and $p^- = p^+ = 1$ as observed in simulations (stochastic line in (a), dots in (b)) compared with prediction: (a) position $X(t)$ for $T^+ = 10$; and (b) stationary velocity for $T^+ = 10$ (continuous line) and $T^+ = 100$ (dotted line), as a function of $M$.
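The trend in Figure 2(b) can be illustrated with a minimal sketch. It assumes that, for equal pressures, the first-order stationary velocity of eqn [38] has the form $\langle V\rangle_{stat} = (m/M)\sqrt{\pi k_B/8m}\,(\sqrt{T^+} - \sqrt{T^-})$; the function name and defaults below are illustrative only, and since the expression is first order in $m/M$ it should not be trusted for small $M$.

```python
import math


def stationary_velocity(M, T_minus, T_plus, m=1.0, k_B=1.0):
    """Assumed first-order (in m/M) stationary velocity for p- = p+.

    A positive value means motion toward the warmer (right) side when T+ > T-.
    """
    prefactor = (m / M) * math.sqrt(math.pi * k_B / (8.0 * m))
    return prefactor * (math.sqrt(T_plus) - math.sqrt(T_minus))


# Simulation units quoted in the text: k_B = 1, m = 1, T- = 1, T+ = 10.
for M in (5, 10, 15, 25, 50, 100):
    print(M, stationary_velocity(M, T_minus=1.0, T_plus=10.0))
```

In this approximation the velocity falls off as $1/M$ and vanishes when the two temperatures coincide, which is the qualitative behavior discussed for equal pressures.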
[Figure 3: panels (a) and (b) plot the piston position $X(t)$ against $t$.]
Figure 3 Evolution of the piston for $L = \infty$, $M = 10^4$, and $p^- \neq p^+$ as observed in simulations (continuous line) compared with
predictions (dotted line): (a) p = 1, p = p p, from top to bottom p=p = 0.05, 0.1, 0.2, 1, 2, 3; and (b) p =
, p = 2
,
p=p = 1; X 0 =
X , t 0 =
t,
= 103 , 102 , 101 , 1, 10, 102 , 103 , 104 .
[Figure 4: three rows of panels; column (a) plots the position $X(t)$ against $t$ with the level $X_{ad}$ marked, column (b) plots the velocity $V(t)$ against $t$.]
Figure 4 Deterministic evolution toward mechanical equilibrium for $L < \infty$, $M = 10^5$: (a) position $X(t)$; one finds $X_{ad}^{sim} = 8.3$ whereas $X_{ad}^{th} = 8.42$ and (b) velocity $V(t)$; one finds $V^{sim} = 0.343$ whereas $V^{th} = 0.3433$. From top to bottom: $R = 12$: strong damping, independent of $R$ and $M$ for $R > 4$ and $M > 10^3$. $R = 2$: critical damping. $R = 0.1$: weak damping; damping coefficient increases with $R$ and $\omega_0 \sim \sqrt{R}$ for $R < 1$ but is independent of $M$ for $M > 10^3$.
[Figure 5: (a) $p_{av}$, $T_{av}^+(t)$ and $T_{av}^-(t)$ plotted against $t$; (b) $p_{surf}$, $T_{surf}^+(t)$ and $T_{surf}^-(t)$ plotted against $t$.]
Figure 5 Same conditions as Figure 4, $R = 12$: (a) average pressure and temperature in the fluid: $p_{av}^\pm(t) = 2E^\pm n^\pm/N^\pm$, $T_{av}^\pm = E^\pm/N^\pm k_B$ and (b) pressure and temperature at the surface of the piston. Prediction: $T_{ad}^- = 1.54$, $T_{ad}^+ = 9.46$, $p_{ad}^- = p_{ad}^+ = 2.2$. Simulations: $T_{ad}^- = 1.52$, $T_{ad}^+ = 9.48$, $p_{ad}^- = p_{ad}^+ = 2.2$.
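The predicted values quoted in the caption of Figure 5 can be cross-checked against each other with a short sketch. It is an illustration rather than the method of the article, and it assumes a one-dimensional ideal gas (energy $N k_B T/2$ per compartment, so that $N^-T^- + N^+T^+$ is conserved), equal particle numbers $N^- = N^+$, equal pressures at the end of the adiabatic stage, and $k_B = 1$. The initial left pressure $p_0^- = 1.2$ follows from $p^- = R(M/A)(1/10)$ with $R = 12$ and $M = A = 10^5$; the function name is hypothetical.

```python
def adiabatic_state(T_minus_ad, T0_minus=1.0, T0_plus=10.0,
                    L=60.0, X0=10.0, p0_minus=1.2):
    """Infer T+_ad, X_ad and p_ad from T-_ad, assuming N- = N+ and k_B = 1."""
    # Energy conservation for equal particle numbers: T- + T+ is constant.
    T_plus_ad = T0_minus + T0_plus - T_minus_ad
    # Mechanical equilibrium p- = p+ with N- = N+: T-/X = T+/(L - X).
    X_ad = L * T_minus_ad / (T_minus_ad + T_plus_ad)
    # (N-/A) k_B, read off from the initial left pressure p0- = (N-/A) k_B T0-/X0.
    density_scale = p0_minus * X0 / T0_minus
    p_ad = density_scale * T_minus_ad / X_ad
    return T_plus_ad, X_ad, p_ad


T_plus_ad, X_ad, p_ad = adiabatic_state(1.54)
print(T_plus_ad, X_ad, p_ad)
```

Starting from the predicted $T_{ad}^- = 1.54$, these assumptions give $T_{ad}^+$ close to 9.46, $X_{ad}$ close to 8.4 and $p_{ad}$ close to 2.2, in line with the values quoted in the captions of Figures 4 and 5.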
and for $L < \infty$ approach to mechanical equilibrium (Figures 4–6) and to thermal equilibrium (Figures 7 and 8).

Conclusions and Open Problems

In this article, the adiabatic piston has been investigated to first order in the small parameter $m/M$, but no attempt has been made to control the remainder terms. For an infinite cylinder, no other assumptions were necessary and the numerical simulations (Figures 2 and 3) are in perfect agreement with the theoretical prediction, in particular for the stationary velocity $V_{stat}$, the friction coefficient $\lambda(V)$, and the relaxation time.

For a finite cylinder ($L < \infty$) and in the thermodynamic limit ($M = \infty$), we were forced to introduce the average assumption to obtain a set of autonomous equations. As we have seen, when initially $p^- \neq p^+$, this limiting case also describes the evolution to lowest order during the first two stages, characterized by a time of the order $t_1 = L\sqrt{m/k_B T}$, where the evolution is adiabatic and deterministic. In the first stage, that is, before the shock wave bounces back on the piston, the simulations confirm the theoretical predictions. In particular, they show that if $R > 4$, the piston will be able to reach and maintain for some time the velocity $V_{stat}$, whereas this will not be the case for $R < 1$ (Figure 4b). In the second stage of the evolution, the simulations (Figure 4) exhibit damped oscillations toward mechanical equilibrium which are in very good agreement with the predictions for the final state $(X_{ad}, T_{ad}^\pm)$, the frequency of oscillations and the existence of weak and strong damping depending on $R < 1$ or $R > 4$. Moreover, the general behavior of the evolution observed in the simulations as a function of the parameters was as predicted. However, the damping coefficient of these oscillations is wrong by one or several orders of magnitude. To understand this discrepancy, we note that using the average assumption we have related the damping to the friction coefficient. However, the simulations clearly show that those two dissipative effects have totally different origins. Indeed, as one can see with $L = \infty$, friction is associated with the fact that the density of the gas in front and in the back of the piston is not the same as in the bulk, and this generates a shock wave that propagates in the fluid. For finite $L$, when $R > 4$, the stationary velocity $V_{stat}$ is reached and the effect of friction is
[Figure 6: velocity distribution in the left compartment plotted against velocity; (a) eight panels, (b) one panel.]
Figure 6 Velocity distribution in the left compartment. Same conditions as Figure 4, $R = 12$. Dotted line corresponds to Maxwellian with $T = 1.52$: (a) $t = 12, 24, 36, 48, 60, 92, 144, 240$ from top to bottom and (b) $t = 276$–$460$.
to transfer in this first stage more and more energy to the fluid on one side and vice versa on the other side. However, to stop the piston and reverse its motion, only a certain amount of the transferred energy is necessary and the rest remains as dissipated energy in the fluid, leading to a strong damping. On the other hand, for $R < 1$, the value $V_{stat}$ is never reached and all the energy transferred is necessary to revert the motion. In this case very little dissipation is involved and the damping will be very small. This indicates that the mechanism responsible for damping is associated with shock waves bouncing back and forth, and the average assumption, which corresponds to a homogeneity condition throughout the gas, cannot describe the situation. In fact, the simulations (Figure 5b) indicate that the average assumption does
[Figure 7: (a) position $X$ and (b) temperatures $T$ plotted against the rescaled time.]
Figure 7 Approach to thermal equilibrium, $N = 3 \times 10^4$. The smooth curves correspond to the predictions, the stochastic curves to simulations: (a) position $X(\tau)$, $\tau = \alpha t$, no visible difference for $M = 100, 200, 1000$ and (b) average temperatures $T^\pm(\tau)$, $\tau = \alpha t$, $M = 200$.
[Figure 8: normalized velocity distribution in the left compartment plotted against velocity, panels (a) and (b).]
Figure 8 Approach to thermal equilibrium from $T_{ad}^- = 1.54$ (dotted line in (a)) to $T_f = 5.5$ (heavy line in (b)). Velocity distribution function on the left for $M = 200$, $N = 5 \times 10^4$. (a) $\tau = \alpha t = 2, 4, 14, 48, 92, 144$ and (b) approach to Maxwellian distribution for $\tau > 445$.
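The slow relaxation shown in Figures 7 and 8 can be reproduced qualitatively with a small numerical sketch. It assumes that the deterministic equation [80] reads $d\xi/ds = \sqrt{(N/2N^-)(1-2\xi)} - \sqrt{(N/2N^+)(1+2\xi)}$ with $\xi = 1/2 - X/L$, and that the observables follow from eqn [81] as $X = L(1/2 - \xi)$ and $T^\pm = (N^-T_0^- + N^+T_0^+)(1 \mp 2\xi)/2N^\pm$. The function names, the Runge-Kutta integrator, the step size and the starting value $X_{ad} = 8.4$ are illustrative choices, not taken from the article.

```python
import math


def xi_rate(xi, N_minus, N_plus):
    """Assumed right-hand side of eqn [80] in the rescaled time s."""
    N = N_minus + N_plus
    left = math.sqrt(N / (2.0 * N_minus) * (1.0 - 2.0 * xi))
    right = math.sqrt(N / (2.0 * N_plus) * (1.0 + 2.0 * xi))
    return left - right


def observables(xi, L, N_minus, N_plus, T0_minus, T0_plus):
    """Position and temperatures from xi, following the assumed form of eqn [81]."""
    energy = N_minus * T0_minus + N_plus * T0_plus
    X = L * (0.5 - xi)
    T_minus = energy / (2.0 * N_minus) * (1.0 - 2.0 * xi)
    T_plus = energy / (2.0 * N_plus) * (1.0 + 2.0 * xi)
    return X, T_minus, T_plus


def relax(X_ad, L, N_minus, N_plus, s_max=20.0, ds=0.01):
    """Integrate xi(s) from xi(0) = 1/2 - X_ad/L with fourth-order Runge-Kutta."""
    xi = 0.5 - X_ad / L
    for _ in range(int(round(s_max / ds))):
        k1 = xi_rate(xi, N_minus, N_plus)
        k2 = xi_rate(xi + 0.5 * ds * k1, N_minus, N_plus)
        k3 = xi_rate(xi + 0.5 * ds * k2, N_minus, N_plus)
        k4 = xi_rate(xi + ds * k3, N_minus, N_plus)
        xi += ds * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return xi


# Simulation parameters quoted in the text: L = 60, N- = N+, T0- = 1, T0+ = 10.
params = dict(L=60.0, N_minus=1.0, N_plus=1.0, T0_minus=1.0, T0_plus=10.0)
start = observables(0.5 - 8.4 / 60.0, **params)
final = observables(relax(8.4, 60.0, 1.0, 1.0), **params)
print(start)
print(final)
```

Under these assumptions the piston drifts from $X_{ad} \approx 8.4$ to the middle of the cylinder while both temperatures converge to $T_f = 5.5$, the value quoted in the caption of Figure 8.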
not hold in this second stage. In conclusion, one is forced to admit that to describe correctly the adiabatic evolution, it is necessary to study the coupling between the motion of the piston and the hydrodynamic equations of the gas. Preliminary investigations have been initiated, but this is still one of the major open problems. Another problem would be to study the evolution in the case of interacting particles. However, investigations with hard disks suggest that no new effects should appear. To investigate adiabatic evolution, a simpler version of the adiabatic piston problem, without any controversy, has been introduced: this is the model of a standard piston with a constant force acting on it.

In the third stage, that is, the very slow approach to thermal equilibrium, another assumption was necessary, namely the factorization condition. The simulations (Figure 7) show a very good agreement with the prediction, and in particular the scaling property with $t' = t/M$ is perfectly verified. It appears that the small discrepancy between simulations and theoretical predictions could be due to the fact that, to compute explicitly the coefficients in the equations of motion, we have taken Maxwellian relations for the velocities of the gas particles, which is clearly not the case (Figure 8a).

The fourth stage of the evolution, that is, the approach to Maxwellian distributions (Figure 8b), is still another major open problem. Some preliminary studies have been conducted, where one investigates the stability and the evolution of the system when initially the two gases are in the same equilibrium state, but characterized by a distribution function which is not Maxwellian.

Finally, let us mention that the relation between the piston problem and the second law of thermodynamics is one more major problem. The question of entropy production out of equilibrium, and the validity of the second law, are still highly controversial. Again, preliminary results can be found in the literature. Among other things, this question has led to a model of heat conductivity in gases, which reproduces the correct behavior (Gruber and Lesne 2005).

See also: Billiards in Bounded Convex Domains; Boltzmann Equation (Classical and Quantum); Hamiltonian Fluid Dynamics; Multiscale Approaches; Nonequilibrium Statistical Mechanics (Stationary): Overview; Nonequilibrium Statistical Mechanics: Dynamical Systems Approach.

Further Reading

Callen HB (1963) Thermodynamics. New York: Wiley. (Appendix C. See also Callen HB (1985) Thermodynamics and Thermostatics, 2nd edn., pp. 51 and 53. New York: Wiley.)
Chernov N, Sinai YaG, and Lebowitz JL (2002) Scaling dynamic of a massive piston in a cube filled with ideal gas: exact results. Journal of Statistical Physics 109: 529–548.
Feynman RP (1965) Lectures in Physics I. New York: Addison-Wesley.
Gruber Ch (1999) Thermodynamics of systems with internal adiabatic constraints: time evolution of the adiabatic piston. European Journal of Physics 20: 259–266.
Gruber Ch and Lesne A (2005) Hamiltonian model of heat conductivity and Fourier law. Physica A 351: 358.
Gruber Ch, Pache S, and Lesne A (2003) Two-time-scale relaxation towards thermal equilibrium of the enigmatic piston. Journal of Statistical Physics 112: 1199–1228.
Gruber Ch and Piasecki J (1999) Stationary motion of the adiabatic piston. Physica A 268: 412–442.
Kestemont E, Van den Broeck C, and Malek Mansour M (2000) The adiabatic piston: and yet it moves. Europhysics Letters 49: 143.
Lebowitz JL, Piasecki J, and Sinai YaG (2000) Scaling dynamics of a massive piston in an ideal gas. In: Szasz D (ed.) Hard Ball Systems and the Lorentz Gas, Encyclopedia of Mathematical Sciences Series, vol. 101, pp. 217–227. Berlin: Springer.
Van den Broeck C, Meurs P, and Kawai R (2004) From Maxwell demon to Brownian motor. New Journal of Physics 7: 10.
AdS/CFT Correspondence
C P Herzog, University of California at Santa Barbara, Santa Barbara, CA, USA
I R Klebanov, Princeton University, Princeton, NJ, USA
© 2006 Elsevier Ltd. All rights reserved.

Introduction

The anti-de Sitter/conformal field theory (AdS/CFT) correspondence is a conjectured equivalence between a quantum field theory in $d$ spacetime dimensions with conformal scaling symmetry and a quantum theory of gravity in $(d+1)$-dimensional anti-de Sitter space. The most promising approaches to quantizing gravity involve superstring theories, which are most easily defined in 10 spacetime dimensions, or M-theory which is defined in 11 spacetime dimensions. Hence, the AdS/CFT correspondences based on superstrings typically involve backgrounds of the form $AdS_{d+1} \times Y_{9-d}$ while those based on M-theory involve backgrounds of the form $AdS_{d+1} \times Y_{10-d}$, where $Y$ are compact spaces.

The examples of the AdS/CFT correspondence discussed in this article are dualities between (super)conformal nonabelian gauge theories and superstrings on $AdS_5 \times Y_5$, where $Y_5$ is a five-dimensional Einstein space (i.e., a space whose Ricci tensor is proportional to the metric, $R_{ij} = 4g_{ij}$). In particular, the most basic (and maximally supersymmetric) such duality relates $\mathcal{N} = 4$ $SU(N)$ super Yang–Mills (SYM) and type IIB superstring in the curved background $AdS_5 \times S^5$.

There exist special limits where this duality is more tractable than in the general case. If we take the large-$N$ limit while keeping the 't Hooft coupling $\lambda = g_{YM}^2 N$ fixed ($g_{YM}$ is the Yang–Mills coupling

with string one-loop diagrams, by $N^0$, etc. This counting corresponds to the closed-string coupling constant of order $N^{-1}$. Thus, in the large-$N$ limit the gauge theory becomes planar, and the dual string theory becomes classical. For small $g_{YM}^2 N$, the gauge theory can be studied perturbatively; in this regime the dual string theory has not been very useful because the background becomes highly curved. The real power of the AdS/CFT duality, which already has made it a very useful tool, lies in the fact that, when the gauge theory becomes strongly coupled, the curvature in the dual description becomes small; therefore, classical supergravity provides a systematic starting point for approximating the string theory.

There is a strong motivation for an improved understanding of dualities of this type. In one direction, generalizations of this duality provide the tantalizing hope of a better understanding of quantum chromodynamics (QCD); QCD is a nonabelian gauge theory that describes the strong interactions of mesons, baryons, and glueballs, and has a conformal symmetry which is broken by quantum effects. In the other direction, AdS/CFT suggests that quantum gravity may be understandable as a gauge theory. Understanding the confinement of quarks and gluons that takes place in low-energy QCD and quantizing gravity are well acknowledged to be two of the most important outstanding problems of theoretical physics.

Some Geometrical Preliminaries

The $d$-dimensional sphere of radius $L$, $S^d$, may be defined by a constraint

$$\sum_{i=1}^{d+1} (X^i)^2 = L^2$$ [1]
The $d$-dimensional anti-de Sitter space, $AdS_d$, may be defined by a constraint

$$(X^0)^2 + (X^d)^2 - \sum_{i=1}^{d-1} (X^i)^2 = L^2$$ [2]

This constraint shows that the symmetry group of $AdS_d$ is $SO(2, d-1)$. $AdS_d$ is a negatively curved maximally symmetric space, that is, its curvature tensor is related to the metric by

$$R_{abcd} = -\frac{1}{L^2}\left[g_{ac}g_{bd} - g_{ad}g_{bc}\right]$$ [3]

Its metric may be written as

$$ds^2_{AdS} = L^2\left(-(y^2+1)\,dt^2 + \frac{dy^2}{y^2+1} + y^2\,d\Omega^2_{d-2}\right)$$ [4]

where the radial coordinate $y \in [0, \infty)$, and $t$ is defined on a circle of length $2\pi$. This space has closed timelike curves; to eliminate them, we will work with the universal covering space where $t \in (-\infty, \infty)$. The boundary of $AdS_d$, which plays an important role in the AdS/CFT correspondence, is located at infinite $y$. There exists a subspace of $AdS_d$ called the Poincaré wedge, with the metric

$$ds^2 = \frac{L^2}{z^2}\left(dz^2 - (dx^0)^2 + \sum_{i=1}^{d-2} (dx^i)^2\right)$$ [5]

where $z \in [0, \infty)$.

A Euclidean continuation of $AdS_d$ is the Lobachevsky space (hyperboloid), $L_d$. It is obtained by reversing the sign of $(X^d)^2$, $dt^2$, and $(dx^0)^2$ in [2], [4], and [5], respectively. After this Euclidean continuation, the metrics [4] and [5] become equivalent; both of them cover the entire $L_d$. Another equivalent way of writing the metric is

$$ds^2_L = L^2\left(d\rho^2 + \sinh^2\rho\; d\Omega^2_{d-1}\right)$$ [6]

which shows that the boundary at infinite $\rho$ has the topology of $S^{d-1}$. In terms of the Euclideanized metric [5], the boundary consists of the $R^{d-1}$ at $z = 0$, and a single point at $z = \infty$.

The Geometry of Dirichlet Branes

Our path toward formulating the $AdS_5/CFT_4$ correspondence requires introduction of Dirichlet branes, or D-branes for short. They are soliton-like "membranes" of various internal dimensionalities contained in type II superstring theories. A Dirichlet $p$-brane (or D$p$ brane) is a $(p+1)$-dimensional hyperplane in $(9+1)$-dimensional spacetime where strings are allowed to end. A D-brane is much like a topological defect: upon touching a D-brane, a closed string can open up and turn into an open string whose ends are free to move along the D-brane. For the endpoints of such a string the $p+1$ longitudinal coordinates satisfy the conventional free (Neumann) boundary conditions, while the $9-p$ coordinates transverse to the D$p$ brane have the fixed (Dirichlet) boundary conditions, hence the origin of the term "Dirichlet brane". The D$p$ brane preserves half of the bulk supersymmetries and carries an elementary unit of charge with respect to the $(p+1)$-form gauge potential from the Ramond–Ramond (RR) sector of type II superstring.

For this article, the most important property of D-branes is that they realize gauge theories on their world volume. The massless spectrum of open strings living on a D$p$ brane is that of a maximally supersymmetric $U(1)$ gauge theory in $p+1$ dimensions. The $9-p$ massless scalar fields present in this supermultiplet are the expected Goldstone modes associated with the transverse oscillations of the D$p$ brane, while the photons and fermions provide the unique supersymmetric completion. If we consider $N$ parallel D-branes, then there are $N^2$ different species of open strings because they can begin and end on any of the D-branes. $N^2$ is the dimension of the adjoint representation of $U(N)$, and indeed we find the maximally supersymmetric $U(N)$ gauge theory in this setting.

The relative separations of the D$p$ branes in the $9-p$ transverse dimensions are determined by the expectation values of the scalar fields. We will be interested in the case where all scalar expectation values vanish, so that the $N$ D$p$ branes are stacked on top of each other. If $N$ is large, then this stack is a heavy object embedded into a theory of closed strings which contains gravity. Naturally, this macroscopic object will curve space: it may be described by some classical metric and other background fields including the RR $(p+2)$-form field strength. Thus, we have two very different descriptions of the stack of D$p$ branes: one in terms of the $U(N)$ supersymmetric gauge theory on its world volume, and the other in terms of the classical RR charged $p$-brane background of the type II closed superstring theory. The relation between these two descriptions is at the heart of the connections between gauge fields and strings that are the subject of this article.

Coincident D3 Branes

Gauge theories in $3+1$ dimensions play an important role in physics, and as explained above, parallel D3 branes realize a $(3+1)$-dimensional $U(N)$ SYM
theory. Let us compare a stack of D3 branes with the RR-charged black 3-brane classical solution where the metric assumes the form

$$ds^2 = H^{-1/2}(r)\left[-f(r)\,(dx^0)^2 + (dx^i)^2\right] + H^{1/2}(r)\left[f^{-1}(r)\,dr^2 + r^2\,d\Omega_5^2\right]$$ [7]

where $i = 1, 2, 3$ and

$$H(r) = 1 + \frac{L^4}{r^4}, \qquad f(r) = 1 - \frac{r_0^4}{r^4}$$

The solution also contains an RR self-dual 5-form field strength

$$F = dx^0 \wedge dx^1 \wedge dx^2 \wedge dx^3 \wedge d(H^{-1}) - 4L^4\,\mathrm{vol}(S^5)$$ [8]

so that the Einstein equation of type IIB supergravity, $R_{\mu\nu} = F_{\mu\alpha\beta\gamma\delta}F_\nu{}^{\alpha\beta\gamma\delta}/96$, is satisfied.

In the extremal limit $r_0 \to 0$, the 3-brane metric becomes

$$ds^2 = \left(1 + \frac{L^4}{r^4}\right)^{-1/2}\left(-(dx^0)^2 + (dx^i)^2\right) + \left(1 + \frac{L^4}{r^4}\right)^{1/2}\left(dr^2 + r^2\,d\Omega_5^2\right)$$ [9]

Just like the stack of parallel, ground-state D3 branes, the extremal solution preserves 16 of the 32 supersymmetries present in the type IIB theory. Introducing $z = L^2/r$, one notes that the limiting form of [9] as $r \to 0$ factorizes into the direct product of two smooth spaces, the Poincaré wedge [5] of $AdS_5$, and $S^5$, with equal radii of curvature $L$. The 3-brane geometry may thus be viewed as a semi-infinite throat of radius $L$ which, for $r \gg L$, opens up into flat $(9+1)$-dimensional space. Thus, for $L$ much larger than the string length scale, $\sqrt{\alpha'}$, the entire 3-brane geometry has small curvatures everywhere and is appropriately described by the supergravity approximation to type IIB string theory.

The relation between $L$ and $\sqrt{\alpha'}$ may be found by equating the gravitational tension of the extremal 3-brane classical solution to $N$ times the tension of a single D3 brane:

$$\frac{2}{\kappa^2}\,L^4\,\mathrm{vol}(S^5) = N\,\frac{\sqrt{\pi}}{\kappa}$$ [10]

where $\mathrm{vol}(S^5) = \pi^3$ is the volume of a unit 5-sphere, and $\kappa = \sqrt{8\pi G}$ is the ten-dimensional gravitational constant. It follows that

$$L^4 = \frac{\kappa}{2\pi^{5/2}}\,N = g_{YM}^2 N \alpha'^2$$ [11]

where we used the standard relations $\kappa = 8\pi^{7/2} g_{st}\alpha'^2$ and $g_{YM}^2 = 4\pi g_{st}$ [10]. Thus, the size of the throat in string units is $\lambda^{1/4}$. This remarkable emergence of the 't Hooft coupling from gravitational considerations is at the heart of the success of the AdS/CFT correspondence. Moreover, the requirement $L \gg \sqrt{\alpha'}$ translates into $\lambda \gg 1$: the gravitational approach is valid when the 't Hooft coupling is very strong and the perturbative field-theoretic methods are not applicable.

Example: Thermal Gauge Theory from Near-Extremal D3 Branes

An important black hole observable is the Bekenstein–Hawking (BH) entropy, which is proportional to the area of the event horizon. For the 3-brane solution [7], the horizon is located at $r = r_0$. For $r_0 > 0$ the 3-brane carries some excess energy $E$ above its extremal value, and the BH entropy is also nonvanishing. The Hawking temperature is then defined by $T^{-1} = \partial S_{BH}/\partial E$.

Setting $r_0 \ll L$ in [7], we obtain a near-extremal 3-brane geometry, whose Hawking temperature is found to be $T = r_0/(\pi L^2)$. The eight-dimensional "area" of the horizon is

$$A_h = (r_0/L)^3\,V_3\,L^5\,\mathrm{vol}(S^5) = \pi^6 L^8 T^3 V_3$$ [12]

where $V_3$ is the spatial volume of the D3 brane (i.e., the volume of the $x^1, x^2, x^3$ coordinates). Therefore, the BH entropy is

$$S_{BH} = \frac{2\pi A_h}{\kappa^2} = \frac{\pi^2}{2}\,N^2 V_3 T^3$$ [13]

This gravitational entropy of a near-extremal 3-brane of Hawking temperature $T$ is to be identified with the entropy of $\mathcal{N} = 4$ supersymmetric $U(N)$ gauge theory (which lives on $N$ coincident D3 branes) heated up to the same temperature.

The entropy of a free $U(N)$ $\mathcal{N} = 4$ supermultiplet, which consists of the gauge field, $6N^2$ massless scalars, and $4N^2$ Weyl fermions, can be calculated using the standard statistical mechanics of a massless gas (the blackbody problem), and the answer is

$$S_0 = \frac{2\pi^2}{3}\,N^2 V_3 T^3$$ [14]

It is remarkable that the 3-brane geometry captures the $T^3$ scaling characteristic of a conformal field theory (CFT) (in a CFT this scaling is guaranteed by the extensivity of the entropy and the absence of dimensionful parameters). Also, the $N^2$ scaling indicates the presence of $O(N^2)$ unconfined degrees
of freedom, which is exactly what we expect in the $\mathcal{N} = 4$ supersymmetric $U(N)$ gauge theory. But what is the explanation of the relative factor of 3/4 between $S_{BH}$ and $S_0$? In fact, this factor is not a contradiction but rather a prediction about the strongly coupled $\mathcal{N} = 4$ SYM theory at finite temperature. As we argued above, the supergravity calculation of the BH entropy, [13], is relevant to the $\lambda \to \infty$ limit of the $\mathcal{N} = 4$ $SU(N)$ gauge theory, while the free-field calculation, [14], applies to the $\lambda \to 0$ limit. Thus, the relative factor of 3/4 is not a discrepancy: it relates two different limits of the theory. Indeed, on general field-theoretic grounds, we expect that in the 't Hooft large-$N$ limit, the entropy is given by

$$S = \frac{2\pi^2}{3}\,N^2 f(\lambda)\,V_3 T^3$$ [15]

The function $f$ is certainly not constant: perturbative calculations valid for small $\lambda = g_{YM}^2 N$ give

$$f(\lambda) = 1 - \frac{3}{2\pi^2}\,\lambda + \frac{3 + \sqrt{2}}{\pi^3}\,\lambda^{3/2} + \cdots$$ [16]

Thus, the BH entropy in supergravity, [13], is translated into the prediction that

$$\lim_{\lambda \to \infty} f(\lambda) = \frac{3}{4}$$ [17]

The Essentials of the AdS/CFT Correspondence

The AdS/CFT correspondence asserts a detailed map between the physics of type IIB string theory in the throat of the classical 3-brane geometry, that is, the region $r \ll L$, and the gauge theory living on a stack of D3 branes. As already noted, in this limit $r \ll L$, the extremal D3 brane geometry factors into a direct product of $AdS_5 \times S^5$. Moreover, the gauge theory on this stack of D3 branes is the maximally supersymmetric $\mathcal{N} = 4$ SYM.

Since the horizon of the near-extremal 3-brane lies in the region $r \ll L$, the entropy calculation could have been carried out directly in the throat limit, where $H(r)$ is replaced by $L^4/r^4$. Another way to motivate the identification of the gauge theory with the throat is to think about the absorption of massless particles. In the D-brane description, a particle incident from asymptotic infinity is con-

particle incident from the asymptotic (large $r$) region tunnels into the $r \ll L$ region and produces an excitation of the throat. The fact that the two different descriptions of the absorption process give identical cross sections supports the identification of excitations of $AdS_5 \times S^5$ with the excited states of the $\mathcal{N} = 4$ SYM theory.

Maldacena (1998) motivated this correspondence by thinking about the low-energy ($\alpha' \to 0$) limit of the string theory. On the D3 brane side, in this low-energy limit, the interaction between the D3 branes and the closed strings propagating in the bulk vanishes, leaving a pure $\mathcal{N} = 4$ SYM theory on the D3 branes decoupled from type IIB superstrings in flat space. Around the classical 3-brane solutions, there are two types of low-energy excitations. The first type propagate in the bulk region, $r \gg L$, and have a cross section for absorption by the throat which vanishes as the cube of their energy. The second type are localized in the throat, $r \ll L$, and find it harder to tunnel into the asymptotically flat region as their energy is taken smaller. Thus, both the D3 branes and the classical 3-brane solution have two decoupled components in the low-energy limit, and in both cases, one of these components is type IIB superstrings in flat space. Maldacena conjectured an equivalence between the other two components.

Immediate support for this identification comes from symmetry considerations. The isometry group of $AdS_5$ is $SO(2, 4)$, and this is also the conformal group in $3+1$ dimensions. In addition, we have the isometries of $S^5$ which form $SU(4) \sim SO(6)$. This group is identical to the R-symmetry of the $\mathcal{N} = 4$ SYM theory. After including the fermionic generators required by supersymmetry, the full isometry supergroup of the $AdS_5 \times S^5$ background is $SU(2, 2|4)$, which is identical to the $\mathcal{N} = 4$ superconformal symmetry. We will see that, in theories with reduced supersymmetry, the $S^5$ factor is replaced by other compact Einstein spaces $Y_5$, but $AdS_5$ is the "universal" factor present in the dual description of any large-$N$ CFT and makes the $SO(2, 4)$ conformal symmetry a geometric one.

The correspondence extends beyond the supergravity limit, and we must think of $AdS_5 \times Y_5$ as a background of string theory. Indeed, type IIB strings are dual to the electric flux lines in the gauge theory, providing a string-theoretic setup for calculating correlation functions of Wilson loops. Furthermore, if $N \to \infty$ while $g_{YM}^2 N$ is held fixed and finite, then there are string scale corrections to the supergravity
verted into an excitation of the stack of D-branes, limit (Maldacena 1998, Gubser et al. 1998, Witten
that is, into an excitation of the gauge theory on the 1998) which proceed in powers of
world volume. In the supergravity description, a 0 =L2 = (g2YM N)1=2 . For finite N, there are also
178 AdS/CFT Correspondence
string loop corrections in powers of κ²/L⁸ ~ N⁻². As expected, with N → ∞ we can take the classical limit of the string theory on AdS5 × Y5. However, in order to understand the large-N gauge theory with finite 't Hooft coupling, we should think of AdS5 × Y5 as the target space of a two-dimensional sigma model describing the classical string physics.

Correlation Functions and the Bulk/Boundary Correspondence

A basic premise of the AdS/CFT correspondence is the existence of a one-to-one map between gauge-invariant operators in the CFT and fields (or extended objects) in AdS. Gubser et al. (1998) and Witten (1998) formulated precise methods for calculating correlation functions of various operators in a CFT using its dual formulation. A physical motivation for these methods comes from earlier calculations of absorption by 3-branes. When a wave is absorbed, it tunnels from asymptotic infinity into the throat region, and then continues to propagate toward smaller r. Let us separate the 3-brane geometry into two regions: r ≳ L and r ≲ L. For r ≲ L the metric is approximately that of AdS5 × S5, while for r ≳ L it becomes very different and eventually approaches the flat metric. Signals coming in from large r (small z = L²/r) may be considered as disturbing the boundary of AdS5 at r ~ L, and then propagating into the bulk of AdS5. Discarding the r ≳ L part of the 3-brane metric, the gauge theory correlation functions are related to the response of the string theory to boundary conditions at r ~ L. It is therefore natural to identify the generating functional of correlation functions in the gauge theory with the string theory path integral subject to the boundary conditions that φ(x, z) = φ0(x) at z = L (at z = ∞ all fluctuations are required to vanish). In calculating correlation functions in a CFT, we will carry out the standard Euclidean continuation; then on the string theory side, we will work with L5, which is the Euclidean version of AdS5.

More explicitly, we identify a gauge theory quantity W with a string-theory quantity Z_string:

    W[\phi_0(x)] = Z_{\rm string}[\phi_0(x)]    [18]

W generates the connected Euclidean Green's functions of a gauge-theory operator O,

    W[\phi_0(x)] = \left\langle \exp \int d^4x\, \phi_0 O \right\rangle    [19]

Z_string is the string theory path integral calculated as a functional of φ0, the boundary condition on the field φ related to O by the AdS/CFT duality. In the large-N limit, the string theory becomes classical which implies

    Z_{\rm string} \sim e^{-I[\phi_0(x)]}    [20]

where I[φ0(x)] is the extremum of the classical string action calculated as a functional of φ0. If we are further interested in correlation functions at very large 't Hooft coupling, then the problem of extremizing the classical string action reduces to solving the equations of motion in type IIB supergravity whose form is known explicitly. A simple example of such a calculation is presented in the next subsection.

Our reasoning suggests that from the point of view of the metric [5], the boundary conditions are imposed not quite at z = 0, which is the true boundary of L5, but at some finite value z = ε. It does not matter which value it is since the metric [5] is unchanged by an overall rescaling of the coordinates (z, x); thus, such a rescaling can take z = L into z = ε for any ε. The physical meaning of this cutoff is that it acts as a UV regulator in the gauge theory. Indeed, the radial coordinate z is to be considered as the effective energy scale of the gauge theory, and decreasing z corresponds to increasing the energy. A safe method for performing calculations of correlation functions, therefore, is to keep the cutoff on the z-coordinate at intermediate stages and remove it only at the end.

Two-Point Functions and Operator Dimensions

In the following, we present a brief discussion of two-point functions of scalar operators in CFT_d. The corresponding field in L_{d+1} is a scalar field of mass m whose Euclidean action is proportional to

    \frac{1}{2} \int d^dx\, dz\, z^{-d+1} \left[ (\partial_z \phi)^2 + \sum_{a=1}^{d} (\partial_a \phi)^2 + \frac{m^2 L^2}{z^2} \phi^2 \right]    [21]

In calculating correlation functions of vertex operators from the AdS/CFT correspondence, the first problem is to reconstruct an on-shell field in L_{d+1} from its boundary behavior. The near-boundary, that is, small z, behavior of the classical solution is

    \phi(z, x) \to z^{d-\Delta} \left[ \phi_0(x) + O(z^2) \right] + z^{\Delta} \left[ A(x) + O(z^2) \right]    [22]

where Δ is one of the roots of

    \Delta(\Delta - d) = m^2 L^2    [23]
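As a small numerical illustration of [23], the sketch below solves the quadratic for the two candidate dimensions and tests whether m²L² lies in the window −d²/4 < m²L² < −d²/4 + 1, inside which either root is admissible. The function names and the sample mass values are illustrative choices, not part of the original article.

```python
import math

def operator_dimensions(m2L2, d=4):
    """Return (delta_minus, delta_plus), the two roots of delta*(delta - d) = m2L2."""
    disc = d * d / 4.0 + m2L2
    if disc < 0:
        raise ValueError("m^2 L^2 below -d^2/4: the roots are complex")
    root = math.sqrt(disc)
    return d / 2.0 - root, d / 2.0 + root

def both_roots_allowed(m2L2, d=4):
    """True when -d^2/4 < m^2 L^2 < -d^2/4 + 1."""
    return -d * d / 4.0 < m2L2 < -d * d / 4.0 + 1.0

if __name__ == "__main__":
    # Sample values of m^2 L^2 for d = 4 (hypothetical inputs for illustration).
    for m2L2 in (0.0, -3.0, -3.75, 5.0):
        lo, hi = operator_dimensions(m2L2)
        print(m2L2, lo, hi, both_roots_allowed(m2L2))
```

For d = 4 and m²L² = −15/4 the roots are 3/2 and 5/2, and the smaller one lies above the scalar unitarity bound (d − 2)/2 = 1.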
φ0(x) is regarded as a source in [19] that couples to the dual gauge-invariant operator O of dimension Δ, while A(x) is related to the expectation value,

    A(x) = \frac{1}{2\Delta - d} \langle O(x) \rangle    [24]

It is possible to regularize the Euclidean action to obtain the following value as a functional of the source:

    I[\phi_0(x)] = -\left(\Delta - \frac{d}{2}\right) \pi^{-d/2} \frac{\Gamma(\Delta)}{\Gamma(\Delta - d/2)} \int d^dx \int d^dx' \frac{\phi_0(x)\phi_0(x')}{|x - x'|^{2\Delta}}    [25]

Varying twice with respect to φ0, we find that the two-point function of the corresponding operator is

    \langle O(x) O(x') \rangle = \frac{(2\Delta - d)\Gamma(\Delta)}{\pi^{d/2}\Gamma(\Delta - d/2)} \frac{1}{|x - x'|^{2\Delta}}    [26]

Which of the two roots, Δ₊ or Δ₋, of [23]

    \Delta_\pm = \frac{d}{2} \pm \sqrt{\frac{d^2}{4} + m^2 L^2}    [27]

should we choose for the operator dimension? For positive m², Δ₊ is certainly the right choice: here the other root, Δ₋, is negative. However, it turns out that for

    -\frac{d^2}{4} < m^2 L^2 < -\frac{d^2}{4} + 1    [28]

both roots of [23] may be chosen. Thus, there are two possible CFTs corresponding to the same classical AdS action: in one of them the corresponding operator has dimension Δ₊, while in the other the dimension is Δ₋. We note that Δ₋ is bounded from below by (d − 2)/2, which is precisely the unitarity bound on dimensions of scalar operators in d-dimensional field theory! Thus, the ability to choose dimension Δ₋ is crucial for consistency of the AdS/CFT duality.

Whether string theory on AdS5 × Y5 contains fields with m² in the range [28] depends on Y5. The example discussed in the next section, Y5 = T^{1,1}, turns out to contain such fields, and the possibility of having dimension Δ₋, [27], is crucial for consistency of the AdS/CFT duality in that case. However, for Y5 = S5, which is dual to the N = 4 large-N SYM theory, there are no such fields and all scalar dimensions are given by Δ₊, [27].

The operators in the N = 4 large-N SYM theory naturally break up into two classes: those that correspond to the Kaluza-Klein states of supergravity and those that correspond to massive string states. Since the radius of the S5 is L, the masses of the Kaluza-Klein states are proportional to 1/L. Thus, the dimensions of the corresponding operators are independent of L and therefore also of λ. On the gauge-theory side, this independence is explained by the fact that the supersymmetry protects the dimensions of certain operators from being renormalized: they are completely determined by the representation under the superconformal symmetry. All families of the Kaluza-Klein states, which correspond to such protected operators, were classified long ago. Correlation functions of such operators in the strong 't Hooft coupling limit may be obtained from the dependence of the supergravity action on the boundary values of corresponding Kaluza-Klein fields, as in [19]. A variety of explicit calculations have been performed for two-, three-, and even four-point functions. The four-point functions are particularly interesting because their dependence on operator positions is not determined by the conformal invariance.

On the other hand, the masses of string excitations are m² = 4n/α′, where n is an integer. For the corresponding operators the formula [27] predicts that the dimensions do depend on the 't Hooft coupling and, in fact, blow up for large λ = g_YM² N as 2λ^{1/4}√n.

Calculation of Wilson Loops

The Wilson loop operator of a nonabelian gauge theory

    W(C) = {\rm tr}\left[ P \exp\left( i \oint_C A \right) \right]    [29]

involves the path-ordered integral of the gauge connection A along a contour C. For N = 4 SYM, one typically uses a generalization of this loop operator which incorporates other fields in the N = 4 multiplet, the adjoint scalars and fermions. Using a rectangular contour, we can calculate the quark-antiquark potential from the expectation value ⟨W(C)⟩. One thinks of the quarks located a distance L apart for a time T, yielding

    \langle W \rangle \sim e^{-T V(L)}    [30]

where V(L) is the potential.

According to Maldacena, and Rey and Yee, the AdS/CFT correspondence relates the Wilson loop expectation value to a sum over string world sheets ending on the boundary of L5 (z = 0) along the contour C:

    \langle W \rangle \sim \int e^{-S}    [31]
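Relation [30] is what one inverts in practice to read off the potential from rectangular loops. The sketch below assumes that, at large T, the expectation value behaves as c·exp(−T V(L)) with an unknown T-independent prefactor c, so that a ratio of two loops removes c. The prefactor, the function name, and the synthetic numbers are assumptions made for illustration only.

```python
import math

def potential_from_loops(w1, t1, w2, t2):
    """Estimate V(L) from <W> at two time extents t1 < t2.

    Assumes <W>(T) = c * exp(-T * V) with a T-independent prefactor c,
    which cancels in the ratio w2 / w1.
    """
    return -math.log(w2 / w1) / (t2 - t1)

if __name__ == "__main__":
    # Synthetic data (not a physical computation): V = 0.7, c = 3.0.
    v_true, c = 0.7, 3.0
    loop = lambda T: c * math.exp(-T * v_true)
    print(potential_from_loops(loop(5.0), 5.0, loop(8.0), 8.0))
```

Taking a ratio rather than a single logarithm is a deliberate choice: −ln⟨W⟩/T alone would be contaminated by ln c / T at finite T.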
The simplest examples of X are orbifolds C³/Γ, where Γ is a discrete subgroup of SO(6). Indeed, if Γ ⊂ SU(3), then N = 1 supersymmetry is preserved. The level surface of such an X is Y = S5/Γ. In this case, the product structure of the gauge theory can be motivated by thinking about image stacks of D3 branes from the action of Γ.

The next simplest example of a Calabi-Yau cone X is the conifold which may be described by the following equation in four complex variables:

    \sum_{a=1}^{4} z_a^2 = 0    [39]

Since this equation is symmetric under an overall rescaling of the coordinates, this space is a cone. The level surface Y of the conifold is a coset manifold T^{1,1} = (SU(2) × SU(2))/U(1). This space has the SO(4) ~ SU(2) × SU(2) symmetry which rotates the z's, and also the U(1) R-symmetry under z_a → e^{iθ} z_a. The metric on T^{1,1} is known explicitly; it assumes the form of an S¹ bundle over S² × S².

The supersymmetric field theory on the D3 branes probing the conifold singularity is SU(N) × SU(N) gauge theory coupled to two chiral superfields, A_i, in the (N, N̄) representation and two chiral superfields, B_j, in the (N̄, N) representation. The A's transform as a doublet under one of the global SU(2)'s, while the B's transform as a doublet under the other SU(2). Cancelation of the anomaly in the U(1) R-symmetry requires that the A's and the B's each have R-charge 1/2. For consistency of the duality, it is necessary that we add an exactly marginal superpotential which preserves the SU(2) × SU(2) × U(1)_R symmetry of the theory. Since a marginal superpotential has R-charge equal to 2 it must be quartic, and the symmetries fix it uniquely up to overall normalization:

    W = \epsilon^{ij} \epsilon^{kl}\, {\rm tr}\, A_i B_k A_j B_l    [40]

There are in fact infinite families of Calabi-Yau cones X, but there are two problems one faces in studying these generalized AdS/CFT correspondences. The first is geometric: the cones X are not all well understood and only for relatively few do we have explicit metrics. However, it is often possible to calculate important quantities such as the vol(Y) without knowing the metric. The second problem is gauge theoretic: although many techniques exist, there is no completely general procedure for constructing the gauge theory on a stack of D-branes at an arbitrary singularity.

Let us mention two important classes of Calabi-Yau cones X. The first class consists of cones over the so-called Y^{p,q} Sasaki-Einstein spaces. Here, p and q are integers with p ≥ q. Gauntlett et al. (2004) discovered metrics on all the Y^{p,q}, and the quiver gauge theories that live on the D-branes probing the singularity are now known. Making contact with the simpler examples discussed above, the Y^{p,0} are orbifolds of T^{1,1} while the Y^{p,p} are orbifolds of S5.

In the second class of cones X, a del Pezzo surface shrinks to zero size at the tip of the cone. A del Pezzo surface is an algebraic surface of complex dimension 2 with positive first Chern class. One simple del Pezzo surface is a complex projective space of dimension 2, P², which gives rise to the N = 1 preserving S5/Z3 orbifold. Another simple case is P¹ × P¹, which leads to T^{1,1}/Z2. The remaining del Pezzo surfaces B_k are P² blown up at k points, 1 ≤ k ≤ 8. The cone where B_1 shrinks to zero size has level surface Y^{2,1}. Gauge theories for all the del Pezzos have been constructed. Except for the three del Pezzos just discussed, and possibly also for B_6, metrics on the cones over these del Pezzos are not known. Nevertheless, it is known that for 3 ≤ k ≤ 8, the volume of the Sasaki-Einstein manifold Y associated with B_k is π³(9 − k)/27.

The Central Charge

The central charge provides one of the most amazing ways to check the generalized AdS/CFT correspondences. The central charge c and conformal anomaly a can be defined as coefficients of certain curvature invariants in the trace of the stress energy tensor of the conformal gauge theory:

    \langle T^\alpha_\alpha \rangle = -a E_4 - c I_4    [41]

(The curvature invariants E_4 and I_4 are quadratic in the Riemann tensor and vanish for Minkowski space.) As discussed above, correlators such as ⟨T^α_α⟩ can be calculated from supergravity, and one finds

    a = c = \frac{\pi^3 N^2}{4\, {\rm Vol}(Y)}    [42]

On the gauge-theory side of the correspondence, anomalies completely determine a and c:

    a = \frac{3}{32} \left( 3\, {\rm tr}\, R^3 - {\rm tr}\, R \right), \qquad c = \frac{1}{32} \left( 9\, {\rm tr}\, R^3 - 5\, {\rm tr}\, R \right)    [43]

The trace notation implies a sum over the R-charges of all of the fermions in the gauge theory. (From the geometric knowledge that a = c, we can conclude that tr R = 0.)

The R-charges can be determined using the principle of a-maximization. For a superconformal gauge theory, the R-charges of the fermions maximize a subject to the constraints that the
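The agreement between the geometric formula [42] and the anomaly formulas [43] can be checked by hand for the two simplest cases. The sketch below does this with exact rational arithmetic under several stated assumptions: leading large-N counting of fermions (N² per adjoint or bifundamental, dropping O(1) corrections), fermion R-charges equal to the superfield R-charge minus one, and the standard volumes Vol(S5) = π³ and Vol(T^{1,1}) = 16π³/27, which are quoted from the general literature rather than from this excerpt. All function names are illustrative.

```python
from fractions import Fraction as F

def central_charges(fermions):
    """Return (a, c) from the anomaly formulas [43].

    fermions: list of (multiplicity, R_charge) pairs, one per fermion species.
    """
    tr_r = sum(n * r for n, r in fermions)
    tr_r3 = sum(n * r ** 3 for n, r in fermions)
    a = F(3, 32) * (3 * tr_r3 - tr_r)
    c = F(1, 32) * (9 * tr_r3 - 5 * tr_r)
    return a, c

def conifold_fermions(n):
    # Large-N counting (assumption): 2 N^2 gauginos with R = 1, and
    # 4 N^2 fermions from A_i, B_j with R = 1/2 - 1 = -1/2.
    return [(2 * n * n, F(1)), (4 * n * n, F(-1, 2))]

def sym_fermions(n):
    # N = 4 SYM in N = 1 language (assumption): N^2 gauginos with R = 1,
    # and 3 N^2 adjoint chiral fermions with R = 2/3 - 1 = -1/3.
    return [(n * n, F(1)), (3 * n * n, F(-1, 3))]

def a_from_volume(n, vol_over_pi3):
    """a = pi^3 N^2 / (4 Vol(Y)), with Vol(Y) supplied in units of pi^3."""
    return F(n * n) / (4 * vol_over_pi3)

if __name__ == "__main__":
    n = 10
    print(central_charges(conifold_fermions(n)), a_from_volume(n, F(16, 27)))
    print(central_charges(sym_fermions(n)), a_from_volume(n, F(1)))
```

In both cases tr R vanishes at this order, so a = c, and the field-theory values 27N²/64 (conifold) and N²/4 (N = 4 SYM) coincide with the geometric ones.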
methods for attacking the world sheet approach to string theories in anti-de Sitter backgrounds with RR background fields turned on. When such methods are found, it is likely that the material presented here will have turned out to be just a tiny tip of a monumental iceberg of dualities between fields and strings.

Acknowledgments

The authors are very grateful to all their collaborators on gauge/string duality for their valuable input over many years. The research of I R Klebanov is supported in part by the National Science Foundation (NSF) grant no. PHY-0243680. The research of C P Herzog is supported in part by the NSF under grant no. PHY99-07949. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.

See also: Brane Construction of Gauge Theories; Branes and Black Hole Statistical Mechanics; Einstein Equations: Exact Solutions; Gauge Theories from Strings; Large-N and Topological Strings; Large-N Dualities; Mirror Symmetry: A Geometric Survey; Quantum Chromodynamics; Quantum Field Theory in Curved Spacetime; Superstring Theories.

Further Reading

Aharony O, Gubser SS, Maldacena JM, Ooguri H, and Oz Y (2000) Large N field theories, string theory and gravity. Physics Reports 323: 183 (arXiv:hep-th/9905111).
Benvenuti S, Franco S, Hanany A, Martelli D, and Sparks J (2005) An infinite family of superconformal quiver gauge theories with Sasaki-Einstein duals. JHEP 0506: 064 (arXiv:hep-th/0411264).
Bertolini M, Bigazzi F, and Cotrone AL (2004) New checks and subtleties for AdS/CFT and a-maximization. JHEP 0412: 024 (arXiv:hep-th/0411249).
Bigazzi F, Cotrone AL, Petrini M, and Zaffaroni A (2002) Supergravity duals of supersymmetric four dimensional gauge theories. Rivista del Nuovo Cimento 25N12: 1 (arXiv:hep-th/0303191).
D'Hoker E and Freedman DZ (2002) Supersymmetric gauge theories and the AdS/CFT correspondence, arXiv:hep-th/0201253.
Gauntlett J, Martelli D, Sparks J, and Waldram D (2004) Sasaki-Einstein metrics on S2 × S3. Advances in Theoretical and Mathematical Physics 8: 711 (arXiv:hep-th/0403002).
Gubser SS, Klebanov IR, and Polyakov AM (1998) Gauge theory correlators from noncritical string theory. Physics Letters B 428: 105 (arXiv:hep-th/9802109).
Herzog CP, Klebanov IR, and Ouyang P (2002) D-branes on the conifold and N = 1 gauge/gravity dualities, arXiv:hep-th/0205100.
Klebanov IR (2000) TASI lectures: introduction to the AdS/CFT correspondence, arXiv:hep-th/0009139.
Klebanov IR and Strassler MJ (2000) Supergravity and a confining gauge theory: Duality cascades and χSB-resolution of naked singularities. JHEP 0008: 052 (arXiv:hep-th/0007191).
Maldacena J (1998) The large N limit of superconformal field theories and supergravity. Advances in Theoretical and Mathematical Physics 2: 231 (arXiv:hep-th/9711200).
Maldacena JM (1998) Wilson loops in large N field theories. Physical Review Letters 80: 4859 (arXiv:hep-th/9803002).
Polchinski J (1998) String Theory. Cambridge: Cambridge University Press.
Polyakov AM (1999) The wall of the cave. International Journal of Modern Physics A 14: 645.
Rey SJ and Yee JT (2001) Macroscopic strings as heavy quarks in large N gauge theory and anti-de Sitter supergravity. European Physical Journal C 22: 379 (arXiv:hep-th/9803001).
Semenoff GW and Zarembo K (2002) Wilson loops in SYM theory: from weak to strong coupling. Nuclear Physics Proceedings Supplements 108: 106 (arXiv:hep-th/0202156).
Strassler MJ The duality cascade, TASI 2003 lectures, arXiv:hep-th/0505153.
Witten E (1998) Anti-de Sitter space and holography. Advances in Theoretical and Mathematical Physics 2: 253 (arXiv:hep-th/9802150).
Quantum Affine Algebras

Definition

A quantum affine algebra U_q(ĝ) is a quantization of the enveloping algebra U(ĝ) of an affine Lie algebra (Kac-Moody algebra) ĝ. So we start by introducing affine Lie algebras and their enveloping algebras before proceeding to give their quantizations.

Let g be a semisimple finite-dimensional Lie algebra over C of rank r with Cartan matrix (a_ij)_{i,j = 1,...,r}, symmetrizable via positive integers d_i, so that d_i a_ij is symmetric. In terms of the simple roots α_i, we have

    a_{ij} = 2\,\frac{\alpha_i \cdot \alpha_j}{|\alpha_i|^2} \quad \text{and} \quad d_i = \frac{|\alpha_i|^2}{2}

We can introduce an α_0 = −Σ_{i=1}^{r} n_i α_i in such a way that the extended Cartan matrix (a_ij)_{i,j = 0,...,r} is of affine type, that is, it is positive semidefinite of rank r. The integers n_i are referred to as Kac indices. Choosing −α_0 to be the highest root of g leads to an untwisted affine Kac-Moody algebra while choosing −α_0 to be the highest short root of g leads to a twisted affine Kac-Moody algebra.

One defines the affine Lie algebra ĝ corresponding to this affine Cartan matrix as the Lie algebra (over C) with generators H_i, E_i^± for i = 0, 1, ..., r and D with relations

    [H_i, E_j^\pm] = \pm a_{ij} E_j^\pm, \qquad [H_i, H_j] = 0
    [E_i^+, E_j^-] = \delta_{ij} H_i
    [D, H_i] = 0, \qquad [D, E_i^\pm] = \pm \delta_{i,0} E_i^\pm
    \sum_{k=0}^{1-a_{ij}} (-1)^k \binom{1-a_{ij}}{k} (E_i^\pm)^k E_j^\pm (E_i^\pm)^{1-a_{ij}-k} = 0, \quad i \neq j    [2]

The E_i^± are referred to as Chevalley generators and the last set of relations are known as Serre relations. The generator D is known as the canonical derivation. We will denote the algebra obtained by dropping the generator D by ĝ′.

In applications to physics, the affine Lie algebra ĝ often occurs in an isomorphic form as the loop Lie algebra g[z, z⁻¹] ⊕ C·c with Lie product (for untwisted ĝ)

    [X z^k, Y z^l] = [X, Y] z^{k+l} + k\,\delta_{k,-l} (X, Y)\, c, \quad \text{for } X, Y \in g;\ k, l \in Z    [3]

and c being the central element. (The factor k multiplying the central term makes the bracket antisymmetric.)

The universal enveloping algebra U(ĝ) of ĝ is the unital algebra over C with generators H_i, E_i^± for i = 0, 1, ..., r and D and with relations given by [2] where now [ , ] stands for the commutator instead of the Lie product.

To define the quantization of U(ĝ), one can either define U_h(ĝ) (Drinfeld 1985) as an algebra over the ring C[[h]] of formal power series over an indeterminate h or one can define U_q(ĝ) (Jimbo 1985) as an algebra over the field Q(q) of rational functions of q with coefficients in Q. We will present U_h(ĝ) first.

The quantum affine algebra U_h(ĝ) is the unital algebra over C[[h]] topologically generated by H_i, E_i^± for i = 0, 1, ..., r and D with relations

    [H_i, E_j^\pm] = \pm a_{ij} E_j^\pm, \qquad [H_i, H_j] = 0
    [E_i^+, E_j^-] = \delta_{ij}\, \frac{q_i^{H_i} - q_i^{-H_i}}{q_i - q_i^{-1}}
    [D, H_i] = 0, \qquad [D, E_i^\pm] = \pm \delta_{i,0} E_i^\pm
    \sum_{k=0}^{1-a_{ij}} (-1)^k \begin{bmatrix} 1-a_{ij} \\ k \end{bmatrix}_{q_i} (E_i^\pm)^k E_j^\pm (E_i^\pm)^{1-a_{ij}-k} = 0, \quad i \neq j    [4]

where q_i = q^{d_i} and q = e^h. The q-binomial coefficients are defined by

    [n]_q = \frac{q^n - q^{-n}}{q - q^{-1}}    [5]

    [n]_q! = [n]_q [n-1]_q \cdots [2]_q [1]_q    [6]

    \begin{bmatrix} m \\ n \end{bmatrix}_q = \frac{[m]_q!}{[n]_q!\,[m-n]_q!}    [7]

The quantum affine algebra U_h(ĝ) is a Hopf algebra with coproduct

    \Delta(D) = D \otimes 1 + 1 \otimes D
    \Delta(H_i) = H_i \otimes 1 + 1 \otimes H_i
    \Delta(E_i^\pm) = E_i^\pm \otimes q_i^{-H_i/2} + q_i^{H_i/2} \otimes E_i^\pm    [8]

antipode

    S(D) = -D, \qquad S(H_i) = -H_i, \qquad S(E_i^\pm) = -q_i^{\mp 1} E_i^\pm    [9]

and co-unit

    \epsilon(D) = \epsilon(H_i) = \epsilon(E_i^\pm) = 0    [10]

It is easy to see that the classical enveloping algebra U(ĝ) can be obtained from the above by setting h = 0, or more formally,

    U_h(\hat g) / h\,U_h(\hat g) = U(\hat g)

We can also define the quantum affine algebra U_q(ĝ) as the algebra over Q(q) with generators K_i, E_i^±, D for i = 0, 1, ..., r and relations that are
Affine Quantum Groups 185
obtained from the ones given above for U_h(ĝ) by setting

    q_i^{H_i/2} = K_i, \quad i = 0, \ldots, r    [11]

One can go further to an algebraic formulation over C in which q is a complex number (with some points including q = 0 not allowed). This has the advantage that it becomes possible to specialize, for example, to q a root of unity, where special phenomena occur.

Representations

For applications in physics, the finite-dimensional representations of U_h(ĝ′) are the most interesting. As will be explained in later sections, these occur, for example, as particle multiplets in 2D quantum field theory or as spin Hilbert spaces in quantum spin chains. In the next subsection, we will use them to derive matrix solutions to the Yang-Baxter equation.

While for a nonaffine quantum algebra U_h(g) the ring of representations is isomorphic to that of the classical enveloping algebra U(g) (because in fact the algebras are isomorphic, as Drinfeld has pointed out), the corresponding fact is no longer true for affine quantum groups, except in the case ĝ = a_n^{(1)} = ŝl_{n+1}.

For the classical enveloping algebras U(ĝ′), any finite-dimensional representation of U(g) also carries a finite-dimensional representation of U(ĝ′). In the quantum case, however, in general, an irreducible representation of U_h(ĝ′) reduces to a sum of representations of U_h(g).

To classify the finite-dimensional representations of U_h(ĝ′), it is necessary to use a different realization of U_h(ĝ′) that looks more like a quantization of the loop algebra realization [3] than the realization in terms of Chevalley generators. In terms of the generators in this alternative realization, which we do not give here because of its complexity, the finite-dimensional representations can be viewed as pseudo-highest-weight representations. There is a set of r fundamental representations V^a, a = 1, ..., r, each containing the corresponding U_h(g) fundamental representation as a component, from the tensor products of which all the other finite-dimensional representations may be constructed. The details can be found in Chari and Pressley (1994).

Given some representation ρ: U_h(ĝ′) → End(V), we can introduce a parameter λ with the help of the automorphism τ_λ of U_h(ĝ′) generated by D and given by

    \tau_\lambda(E_i^\pm) = \lambda^{\pm s_i} E_i^\pm, \qquad \tau_\lambda(H_i) = H_i, \qquad i = 0, \ldots, r    [12]

Different choices for the s_i correspond to different gradations. Commonly used are the homogeneous gradation, s_0 = 1, s_1 = ... = s_r = 0, and the principal gradation, s_0 = s_1 = ... = s_r = 1. We shall also need the spin gradation s_i = d_i^{-1}. The representations

    \rho_\lambda = \rho \circ \tau_\lambda

play an important role in applications to integrable models where λ is referred to as the (multiplicative) spectral parameter. In applications to particle scattering introduced in a later section, it is related to the rapidity of the particle. The generator D can be realized as an infinitesimal scaling operator on λ and thus plays the role of the Lorentz boost generator.

The tensor product representations ρ^a_λ ⊗ ρ^b_μ are irreducible generically but become reducible for certain values of λ/μ, a fact which again is important in applications (fusion procedure, particle-bound states).

R-Matrices

A Hopf algebra A is said to be almost cocommutative if there exists an invertible element R ∈ A ⊗ A such that

    R\,\Delta(x) = (\sigma \circ \Delta(x))\,R, \quad \text{for all } x \in A    [13]

where σ: x ⊗ y ↦ y ⊗ x exchanges the two factors in the coproduct. In a quasitriangular Hopf algebra, this element R satisfies

    (\Delta \otimes {\rm id})(R) = R_{13} R_{23}
    ({\rm id} \otimes \Delta)(R) = R_{13} R_{12}    [14]

and is known as the universal R-matrix (see Hopf Algebras and q-Deformation Quantum Groups). As a consequence of [13] and [14], it automatically satisfies the Yang-Baxter equation

    R_{12} R_{13} R_{23} = R_{23} R_{13} R_{12}    [15]

For technical reasons, to do with the infinite number of root vectors of ĝ, the quantum affine algebra U_h(ĝ) does not possess a universal R-matrix that is an element of U_h(ĝ) ⊗ U_h(ĝ). However, as pointed out by Drinfeld (1985), it possesses a pseudouniversal R-matrix R(λ) ∈ (U_h(ĝ′) ⊗ U_h(ĝ′))((λ)). The λ is related to the automorphism τ_λ defined in [12]. When using the homogeneous gradation, R(λ) is a formal power series in λ.

When the pseudouniversal R-matrix is evaluated in the tensor product of any two indecomposable finite-dimensional representations ρ_1 and ρ_2, one obtains a numerical R-matrix

    R^{12}(\lambda) = (\rho_1 \otimes \rho_2)\, R(\lambda)    [16]
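To make the Yang-Baxter equation [15] concrete for a numerical R-matrix with a multiplicative spectral parameter, the sketch below checks it for the familiar six-vertex (XXZ) R-matrix on C² ⊗ C², the textbook example associated with ŝl_2. The article does not write this matrix out, so the symmetric normalization a = xq − 1/(xq), b = x − 1/x, c = q − 1/q, the basis ordering, and all function names are assumptions chosen for illustration. Exact rational arithmetic is used so that equality can be tested without rounding.

```python
from fractions import Fraction

def six_vertex(x, q):
    """4x4 six-vertex R-matrix at multiplicative parameter x (basis ++, +-, -+, --)."""
    a = x * q - 1 / (x * q)
    b = x - 1 / x
    c = q - 1 / q
    return [[a, 0, 0, 0],
            [0, b, c, 0],
            [0, c, b, 0],
            [0, 0, 0, a]]

def embed(R, i, j):
    """Let the 4x4 matrix R act on tensor factors i < j of C^2 x C^2 x C^2 (8x8 result)."""
    spectator = 3 - i - j
    M = [[0] * 8 for _ in range(8)]
    for row in range(8):
        r = [(row >> (2 - k)) & 1 for k in range(3)]
        for col in range(8):
            c = [(col >> (2 - k)) & 1 for k in range(3)]
            if r[spectator] == c[spectator]:
                M[row][col] = R[2 * r[i] + r[j]][2 * c[i] + c[j]]
    return M

def matmul(A, B):
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def yang_baxter_holds(x, y, q):
    """Check R12(x/y) R13(x) R23(y) == R23(y) R13(x) R12(x/y) exactly."""
    r12 = embed(six_vertex(x / y, q), 0, 1)
    r13 = embed(six_vertex(x, q), 0, 2)
    r23 = embed(six_vertex(y, q), 1, 2)
    return matmul(matmul(r12, r13), r23) == matmul(matmul(r23, r13), r12)

if __name__ == "__main__":
    print(yang_baxter_holds(Fraction(3), Fraction(5, 2), Fraction(7, 3)))
```

The arguments x/y, x, y are the multiplicative counterparts of the differences u1 − u2, u1 − u3, u2 − u3 of additive parameters.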
The entries of these numerical R-matrices are rational functions of the multiplicative spectral parameter λ but when written in terms of the additive spectral parameter u = log(λ) they are trigonometric functions of u and satisfy the Yang-Baxter equation in the form given in [1]. The matrix

    \check R^{12}(\lambda) = \sigma \circ R^{12}(\lambda)

satisfies the intertwining relation

    \check R^{12}(\lambda/\mu)\,(\rho^1_\lambda \otimes \rho^2_\mu)(\Delta(x)) = (\rho^2_\mu \otimes \rho^1_\lambda)(\Delta(x))\,\check R^{12}(\lambda/\mu)    [17]

for any x ∈ U_h(ĝ′). It follows from the irreducibility of the tensor product representations that these R-matrices satisfy the Yang-Baxter equations

    ({\rm id} \otimes \check R^{23}(\mu/\nu))(\check R^{13}(\lambda/\nu) \otimes {\rm id})({\rm id} \otimes \check R^{12}(\lambda/\mu)) = (\check R^{12}(\lambda/\mu) \otimes {\rm id})({\rm id} \otimes \check R^{13}(\lambda/\nu))(\check R^{23}(\mu/\nu) \otimes {\rm id})    [18]

or, graphically,

[Figure: two braid diagrams, set equal to each other, each connecting V1, V2, V3 at the bottom to V3, V2, V1 at the top.]

Explicit formulas for the pseudouniversal R-matrices were found by Khoroshkin and Tolstoy. However, these are difficult to evaluate explicitly in specific representations so that in practice it is easiest to find the numerical R-matrices R^{ab}(λ) by solving the intertwining relation [17]. It should be stressed that solving the intertwining relation, which is a linear equation for the R-matrix, is much easier than directly solving the Yang-Baxter equation, a cubic equation.

Yangians

As remarked by Drinfeld (1986), for untwisted ĝ the quantum affine algebra U_h(ĝ′) degenerates as h → 0 into another quasipseudotriangular Hopf algebra, the Yangian Y(g) (Drinfeld 1985). It is associated with R-matrices which are rational functions of the additive spectral parameter u. Its representation ring coincides with that of U_h(ĝ′).

Consider a general presentation of a Lie algebra g, with generators I_a and structure constants f_abc, so that

    [I_a, I_b] = f_{abc} I_c, \qquad \Delta(I_a) = I_a \otimes 1 + 1 \otimes I_a

(with summation over repeated indices). The Yangian Y(g) is the algebra generated by these and a second set of generators J_a satisfying

    [I_a, J_b] = f_{abc} J_c, \qquad \Delta(J_a) = J_a \otimes 1 + 1 \otimes J_a + \frac{1}{2} f_{abc}\, I_c \otimes I_b

The requirement that Δ be a homomorphism imposes further relations:

    [J_a, [J_b, I_c]] - [I_a, [J_b, J_c]] = \alpha_{abcdeg} \{I_d, I_e, I_g\}

and

    [[J_a, J_b], [I_l, J_m]] + [[J_l, J_m], [I_a, J_b]] = (\alpha_{abcdeg} f_{lmc} + \alpha_{lmcdeg} f_{abc}) \{I_d, I_e, J_g\}

where

    \alpha_{abcdeg} = \frac{1}{24} f_{adi} f_{bej} f_{cgk} f_{ijk}, \qquad \{x_1, x_2, x_3\} = \sum_{i \neq j \neq k} x_i x_j x_k

When g = sl_2 the first of these is trivial, while for g ≠ sl_2 the first implies the second. The co-unit is ε(I_a) = ε(J_a) = 0; the antipode is s(I_a) = −I_a, s(J_a) = −J_a + (1/2) f_abc I_c I_b. The Yangian may be obtained from U_h(ĝ′) by expanding in powers of h. For the precise relationship, see Drinfeld (1985) and MacKay (2005). In the spin gradation, the automorphism [12] generated by D descends to Y(g) as I_a ↦ I_a, J_a ↦ J_a + u I_a.

There are two other realizations of Y(g). The first (see, for example, Molev 2003) defines Y(gl_n) directly from

    R(u - v)\, T_1(u)\, T_2(v) = T_2(v)\, T_1(u)\, R(u - v)

where T_1(u) = T(u) ⊗ id, T_2(v) = id ⊗ T(v), and

    T(u) = \sum_{i,j=1}^{n} t_{ij}(u) \otimes e_{ij}, \qquad t_{ij}(u) = \delta_{ij} + I_{ij} u^{-1} + J_{ij} u^{-2} + \cdots

where e_ij are the standard matrix units for gl_n. The rational R-matrix for the n-dimensional representation of gl_n is

    R(u - v) = 1 - \frac{P}{u - v}, \quad \text{where} \quad P = \sum_{i,j=1}^{n} e_{ij} \otimes e_{ji}

is the transposition operator. Y(gl_n) is then defined to be the algebra generated by I_ij, J_ij, and must be quotiented by the quantum determinant at its center to define Y(sl_n). The coproduct takes a particularly simple form,

    \Delta(t_{ij}(u)) = \sum_{k=1}^{n} t_{ik}(u) \otimes t_{kj}(u)
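The rational R-matrix R(u) = 1 − P/u is simple enough to build explicitly. The following sketch constructs the transposition operator P from matrix units, confirms that it swaps the two tensor factors, and checks the elementary consequence of P² = 1 that R(u) R(−u) = (1 − 1/u²)·1. The function names are illustrative, and exact fractions are used so that the identities can be tested with plain equality.

```python
from fractions import Fraction

def permutation_operator(n):
    """P = sum_{i,j} e_ij (x) e_ji on C^n (x) C^n, as an n^2 x n^2 matrix."""
    size = n * n
    P = [[0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            # e_ij (x) e_ji has its single nonzero entry at row (i, j), column (j, i).
            P[i * n + j][j * n + i] = 1
    return P

def yang_r_matrix(u, n):
    """Rational R-matrix R(u) = 1 - P/u for nonzero u."""
    P = permutation_operator(n)
    size = n * n
    return [[(1 if r == c else 0) - Fraction(P[r][c]) / u for c in range(size)]
            for r in range(size)]

def matmul(A, B):
    size = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(size)) for c in range(size)]
            for r in range(size)]

def tensor(v, w):
    """Components of v (x) w, ordered as index i * len(w) + j."""
    return [a * b for a in v for b in w]

def act(M, vec):
    return [sum(M[r][k] * vec[k] for k in range(len(vec))) for r in range(len(M))]

if __name__ == "__main__":
    n, u = 3, Fraction(5, 2)
    P = permutation_operator(n)
    print(act(P, tensor([1, 2, 3], [4, 5, 6])) == tensor([4, 5, 6], [1, 2, 3]))
    product = matmul(yang_r_matrix(u, n), yang_r_matrix(-u, n))
    print(product[0][0], product[1][1], product[0][1])
```

The second check shows that R(u) is invertible away from u = ±1, with inverse proportional to R(−u).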
Here we do not give explicitly the third realization, where R T() = T(1, 1; ) and T(x, y; ) =
y
namely Drinfelds new realization of Y(g ) (Drinfeld P exp( x L(; ) d). Taking the trace of this relation
1988), but we remark that it was in this presentation gives an infinity of charges in involution.
that Drinfeld found a correspondence between certain Quantization is problematic, owing to divergences
sets of polynomials and finite-dimensional irreducible in T. The QISM regularizes these by putting the
representations of Y(g ), thus classifying these (although model on a lattice of spacing , defining the lattice
not thereby deducing their dimension or constructing Lax operator to be
the action of Y(g )). As remarked earlier, the structure is
as in the earlier section: Y(g ) representations are in Ln Tn 1=2; n 1=2;
Z n1=2 !
general g -reducible, and there is a set of r fundamental
Y(g )-representations, containing the fundamental P exp L; d
n1=2
g -representations as components, from which all
other representations can be constructed.
The lattice monodromy matrix is then T() =
liml ! 1, m ! 1 Tlm where Tlm = Lm Lm1 . . . Ll1 ,
and its trace again yields an infinity of commuting
Origins in the Quantum charges, provided that there exists a quantum
Inverse-Scattering Method R-matrix R(1 , 2 ) such that
Quantum affine algebras for general ^g first appear in R1 ; 2 L1n 1 L2n 2
Drinfeld (1985, 1986) and Jimbo (1985, 1986), but
they have their origin in the quantum inverse- L2n 2 L1n 1 R1 ; 2 19
scattering method (QISM) of the St. Petersburg
c2 ) first where L1n (1 ) = Ln (1 ) id, L2n (2 ) = id Ln (2 ).
school, and the essential features of Uh (sl
That R solves the YangBaxter equation follows
appear in Kulish and Reshetikhin (1983). In this
from the equivalence of the two ways of intertwining
section, we explain how the quantization of the Lax-
Ln (1 ) Ln (2 ) Ln (3 ) with Ln (3 ) Ln (2 )
pair description of affine Toda theory led to the
Ln (1 ).
discovery of the Uh (^g ) coproduct, commutation
To compute Ln (), one uses the canonical, equal-
relations, and R-matrix. We use the normalizations
time commutation relations for the i and _ i . In
of Jimbo (1986), in which the Hi are rescaled so that
terms of the lattice fields
the Cartan matrix aij = i .j is symmetric.
We begin with the affine Toda field equations Z n1=2
pi;n _ i x dx
2X
r
m
aij j
0 :j j n1=2
@ @ i e ni e Z
n1=2 X
j1
qi;n e
=2aij j x dx
n1=2 j
an integrable model in R 11 of r real scalar fields
i (x, t) with a mass parameter m and coupling the only nontrivial relation is [pi, n , qj, n ] =
constant
. Equivalently, we may write (ih
=2)ij qj, n , and one finds
[@x Lx , @t Lt ] = 0 for the Lax pair
! !
X
X
X r
mX r Ln exp Hi pi;n exp Hj pj;n
Lx x; t Hi @t i e
=2aij j E
i Ei 2 i 4 j
2 i1 2 i;j1
"
m X
m X
=2a0j j
r
1 qi;n E
e E E i Ei
2 j1 0
0 2 i
Y n #
X r
mX r 1
Lt x; t Hi @x i e
=2aij j E qi;n i E 0 E0
i Ei
2 i1 2 i;j1 i
!
mX r
1
X
e
=2a0j j
E0 E0 exp Hj pj;n O2
2 j1 4 j
with arbitrary 2 C. The classical integrability of the the expression used by the St Petersburg school and
system is seen in the existence of r(, 0 ) such that by Jimbo. We now make the replacement
Hi =4 Hi =4
Ei 7! q Ei q , where q = exp(ih
2 =2), and
fT T0 g r; 0 ; T T0 compute the O() terms in [19], which reduce to
188 Affine Quantum Groups
Figure 1 (a) Graphical representation of a two-particle scattering process described by the S-matrix $S_{ab}(\theta)$. (b) At special values $\theta_{ab}^{c}$ of the relative spectral parameter, the two particles of types a and b form a bound state of type c.

is a consequence of the property [14] of the universal R-matrix with respect to the coproduct.
There is a famous no-go theorem due to Coleman and Mandula which states the impossibility of combining space-time and internal symmetries in any but a trivial way. Affine quantum group symmetry circumvents this no-go theorem. In fact, the derivation D is the infinitesimal two-dimensional Lorentz boost generator and the other symmetry
For example, the subalgebra B (^g ) of Uh (^g 0 ) algebra Y(g , h) generated by the Ii , ~Jp is, like B (^g ),
generated by a co-ideal subalgebra, (Y(g , h)) Y(g ) Y(g , h),
Hi =2 Hi
and again yields an intertwining relation for
Qi qi E
i Ei i qi 1; K-matrices. For g = sln and h = so n or sp 2n , Y(g , h)
i 0; . . . ; r 24 is the twisted Yangian described in Molev (2003).
All the constructions in earlier sections of this
is a boundary quantum group for certain choices of
review have analogs in the boundary setting. For
the parameters i 2 C[[h]]. It is a left co-ideal
more details see Delius and MacKay (2003) and
subalgebra of Uh (^g 0 ) because
MacKay (2005).
Qi Qi 1 qH g 0 B g
i Qi 2 Uh ^
i
^ 25
See also: Bethe Ansatz; Boundary Conformal Field
Intertwiners K() : V ! V= for some constant Theory; Classical r-Matrices, Lie Bialgebras, and Poisson
satisfying Lie Groups; Hopf Algebras and q-Deformation Quantum
Groups; RiemannHilbert Problem; Solitons and
^
K Q = QK; for all Q 2 B g 26 KacMoody Lie Algebras; YangBaxter Equations.
provide solutions of the reflection equation in the
form
12 id K1 R
21 = Further Reading
id K2 R
12 = id K1 Chari V and Pressley AN (1994) Quantum Groups. Cambridge:
R
Cambridge University Press.
R 21 id K2 27 Chari V and Pressley AN (1996) Yangians, integrable quantum
systems and Doreys rule. Communications in Mathematical
This can be extended to the case where the Physics 181: 265302.
boundary itself carries a representation W of B (^g ). Delius GW (1995) Exact S-matrices with affine quantum group
symmetry. Nuclear Physics B 451: 445465.
The boundary YangBaxter equation can be repre-
Delius GW and MacKay NJ (2003) Quantum group symmetry in
sented graphically as sine-Gordon and affine Toda field Theories on the Half-Line,
2
Communications in Mathematical Physics 233: 173190.
V 1/ 2
V 1/ Drinfeld V (1985) Hopf algebras and the quantum YangBaxter
1
V 1/ equation. Soviet Mathematics Doklady 32: 254258.
Drinfeld V (1986) Quantum Groups, Proc. Int. Cong. Math.
1
(Berkeley), pp. 798820.
V 1/ Drinfeld V (1988) A new realization of Yangians and quantized
= V 1
affine algebras. Soviet Mathematics Doklady 36: 212216.
Jimbo M (1985) A q-difference analogue of Ug and the Yang
V 1 Baxter equation. Letters in Mathematical Physics 10: 6369.
W Jimbo M (1986) Quantum R-matrix for the generalized Toda
V 2 V 2 W system. Communications in Mathematical Physics 102:
537547.
Another example is provided by twisted Yangians Jimbo M and Miwa T (1995) Algebraic Analysis of Solvable
where, when the Ia and Ja are constructed as Lattice Models. Providence, RI: American Mathematical
Society.
nonlocal charges in sigma models, it is found that Kulish PP and Reshetikhin NY (1983) Quantum linear problem
a boundary condition which preserves integrability for the sine-Gordon equation and higher representations.
leaves only the subset Journal of Soviet Mathematics 23: 2435.
MacKay NJ (2005) Introduction to Yangian symmetry in
Ii and ~Jp Jp 1 fpiq Ii Iq Iq Ii integrable field theory. International Journal of Modern
4
Physics (to appear).
conserved, where i labels the h-indices and p, q the Molev A (2003) Yangians and their applications. In: Hazewinkel
k-indices of a symmetric splitting g = h k. The M (ed.) Handbook of Algebra, vol. 3, pp. 907959. Elsevier.
AharonovBohm Effect 191
Aharonov-Bohm Effect
M Socolovsky, Universidad Nacional Autónoma de México, México DF, México
© 2006 Elsevier Ltd. All rights reserved.

Introduction
In classical electrodynamics, the interaction of charged particles with the electromagnetic field is local, through the pointlike coupling of the electric charge of the particles with the electric and magnetic fields, E and B, respectively. This is mathematically expressed by the Lorentz-force law. The scalar and vector potentials, $\varphi$ and A, which are the time and space components of the relativistic 4-potential $A_\mu$, are considered auxiliary quantities in terms of which the field strengths E and B, the observables, are expressed in a gauge-invariant manner. The homogeneous or first pair of Maxwell equations are a direct consequence of the definition of the field strengths in terms of $A_\mu$. The inhomogeneous or second pair of Maxwell equations, which involve the charges and currents present in the problem, are also usually written in terms of E and B; however, when writing them in terms of $A_\mu$, the number of degrees of freedom of the electromagnetic field is explicitly reduced from six to four; and finally, with two additional gauge transformations, one ends with the two physical degrees of freedom of the electromagnetic field.
In quantum mechanics, however, both the Schrödinger equation and the path-integral approaches for scalar and unpolarized charged particles in the presence of electromagnetic fields are written in terms of the potential and not of the field strengths. Even in the case of the Schrödinger-Pauli equation for spin-1/2 electrons with magnetic moment m interacting with a magnetic field B, one knows that the coupling m · B is the nonrelativistic limit of the Dirac equation, which depends on $A_\mu$ but not on E and B. Since gauge invariance also holds in the quantum domain, it was thought that A and $\varphi$ were mere auxiliary quantities, as in the classical case.
Aharonov and Bohm, in 1959, predicted a quantum interference effect due to the motion of charged particles in regions where B (E) vanishes but A ($\varphi$) does not, leading to a nonlocal gauge-invariant effect depending on the flux of the magnetic field in the inaccessible region, in the magnetic case, and on the difference of the integrals over time of time-varying potentials, in the electric case. (The magnetic effect was already noticed 10 years before by Ehrenberg and Siday in a paper on the refractive index of electrons.)
In the context of the Schrödinger equation, one can show that due to gauge invariance, if $\psi_0$ is a solution to the equation in the absence of an electromagnetic potential, then the product of $\psi_0(x)$ and a phase factor, namely the exponential of $i$ times (a constant multiple of) the integral of $A_\mu$ over a path joining an arbitrary reference point $x_0$ to $x$, is also a solution, if the integral is path independent. However, it is the path integral of Feynman which, in the formulas for propagators of charged particles in the presence of electromagnetic fields, clearly shows that the action of these fields on charged particles is nonlocal, and it is given by the celebrated nonintegrable (path-dependent) phase factor of Wu and Yang (1975). Moreover, this fact provides an additional proof of the nonlocal character of quantum mechanics: to surround fluxes, or to develop a potential difference, the particle has to travel simultaneously through at least two paths.
Thus, the fact that the Aharonov-Bohm (AB) effect was verified experimentally, by Chambers and others, demonstrates the necessity of introducing the (gauge-dependent) potential $A_\mu$ in describing the electromagnetic interactions of the quantum particle. This is widely regarded as the single most important piece of evidence for electromagnetism being a gauge theory. Moreover, it shows, to paraphrase Yang, that the field underdescribes the physical theory, while the potential overdescribes it, and it is the phase factor which describes it exactly.
The content of this article is essentially twofold. The first four sections are mainly physical, where we describe the magnetic AB effect using the Schrödinger equation and the Feynman path integral. The fifth section is geometrical and is the longest of the article. We describe the effect in the context of fiber bundles and connections, namely as a result of the coupling of the wave function (section of an associated bundle) to a nontrivial flat connection (non-pure gauge vector potential with zero magnetic field) in a trivial bundle (the AB bundle) with topologically nontrivial (non-simply-connected) base space. We discuss the moduli space of flat connections and the holonomy groups giving the phase shifts of the interference patterns. Finally, in the last section, we briefly comment on the nonabelian AB effect.

Electromagnetic Fields in Classical Physics
In classical physics, the motion of charged particles in the presence of electromagnetic fields is governed by the equation
The cancelation of the exponential factors shows of the figure (in direction z); outside of the solenoid,
that, under the condition of path independence, the magnetic field is zero. If the radius of the
there is no effect of the potential on the charged solenoid is R, a vector potential A that produces
particles. Another way to see this is by making a such field strength is given by
gauge transformation [7a][7c] with (x) = f (x),
(jBjr/2);
^ rR
which
Rx changes ! 0 and A ! A0 = A r Ax 21
0 0 (/2r);
^ r>R
x0 A(x ) dx = A A = 0.
The condition of path independence amounts, where = R2 jBj and is a unit vector in the
however, to the condition
R that no magnetic field is azimuthal direction. In fact,
present since, if A depends on , then for some R
pair of paths and 0 from (t, x) to (t0 , x0 ), 0 6 jBj^z; r R
R R R H R B = r Ax = 22
0; r > R
A 0 A = A 0 A = [(0 ) A = ds (r A),
where in the last equality we applied Stokes theorem Notice that at r = R, A is continuous but not
( is any surface with boundary [ ( 0 )), which continuously differentiable. Also, the ideal limit of
shows that B = r A must not vanish everywhere an infinitely long solenoid makes the problem two-
and has a nonzero flux through given by dimensional, that is, in the xy plane.
Z The probability amplitude for an electron emitted
= ds B 20 at the source S to arrive at the point P on the screen
, is given by the sum of two probability ampli-
The conclusion of this section is that the ansatz [19] for tudes, namely those corresponding to passing
solving [15] can only be applied in simply connected through the slits 1 and 2. The solenoid is assumed
regions with no magnetic field strength present. to be impenetrable to the electrons; mathematically,
this corresponds to a motion in a non-simply-
connected region. In the approximation for the
AharonovBohm Proposal path integral [16], in which one considers the
In 1959, Aharonov and Bohm proposed an experi- contribution of only two classes of paths, that is,
ment to test, in quantum mechanics, the coupling of the class fg represented by path I, and the class
electric charges to electromagnetic field strengths f 0 g represented by path II, if the wave function at
through a local interaction with the electromagnetic the source is S , then the wave function at P is
potential A , but not with the field strengths given by
themselves. However, as we saw before, no physical Z R
ijej=hc A
effect exists, that is, A can be gauged away, unless P ei=hS0 e
and with
Z
0 0
ei=hS0 = 2 27
P II = S 0
f 0 g
(Schulman 1971, Kobe 1979). As in [23],
and, in the last equality, we applied the extended
version of Stokes theorem (by Craven), to allow for P k0 = P ; k2Z 28
noncontinuously differentiable vector potentials;
There is a close relation between the AB effect
and the quantum of magnetic flux associated with
and the Dirac quantization condition (DQC) in the
the charge jej is defined by
presence of electric and magnetic charges: according
c
h to [25] (or [26]) the AB effect disappears when the
0 = 2 4:135 107 G cm2 24 flux equals n0 = 2n(hc=jej), n 2 Z, that is,
jej
p p when the condition
( = 2=jej = = 137 in the natural system jej = nhc 29
of units (n.s.u.) h = c = 1; is the fine structure
constant). Then the probability of finding the holds. But this is the DQC (Dirac 1931) when is
electron at P is proportional to the flux associated with a magnetic charge g :
(g) = (g=4r2 ) 4r2 = g, leading to jejg = nhc
2 0 2 0 2
j Pj =j P Ij j
P IIj (2n in the n.s.u.). This is precisely the condition for
2i=0 0 0
the Dirac string to be unobservable in quantum
2Re e P I P II 25
mechanics: to give no AB effect.
which exhibits an interference pattern shifted with
respect to that without the magnetic field: as B and
therefore change, dark and bright interference
Geometry of the AB Effect
fringes alternate periodically at the screen , with In this section we study the space of gauge classes of
period 0 . This is the magnetic AB effect, which has flat potentials outside the solenoid, which determine
been quantitatively verified in many experiments, the the AB effect; the topological structure of the AB
first one in 1960 by Chambers. The effect is: bundle; and the holonomy groups of the connec-
tions, which precisely give the phase shifts of the
1. gauge invariant, since B and therefore are
wave functions. We use the n.s.u. system; in parti-
gauge invariant; 1
cular, if [L] is the unit of length, then
p [A
p ] = [L] ,
2. nonlocal, since it depends on the magnetic field 0
[jej] = [L] , and 0 = 2=jej = = 137, where
inside the solenoid, where the electrons never
is the fine structure constant.
enter;
To synthesize, one can say that the abelian AB
3. quantum mechanical, since classically the charges
effect is a nonlocal gauge-invariant quantum effect
do not feel any force and therefore no effect
due to the coupling of the wave function (section of
would be expected in this limit; and
an associated bundle) to a nontrivial (non-exact) flat
4. topological, since the electrons necessarily move
(closed) connection in a trivial principal bundle with
in a non-simply-connected space.
a non-simply-connected base space. In the following
But perhaps the most important implication of the subsections, we will give a detailed explanation of
AB effect is a dramatic additional confirmation of these statements.
the nonlocal character of quantum mechanics: the
electron has to travel along the two paths (I and The AB Bundle
II) simultaneously; on the contrary, no flux would
The gauge group of electromagnetism is the abelian
be surrounded and then no shift of the (then
Lie group U(1) with Lie algebra (the tangent space at
nonexistent) interference fringes would be observed
the identity) u(1) = iR. In the limit of an infinitely
at the screen .
long and infinitesimally thin solenoid carrying the
Calculations in the path-integral approach includ-
magnetic flux , the space available to the electrons
ing the whole set of homotopy classes of paths
is the plane minus a point, that is, R2
, which is of
around the solenoid, indexed by an integer m, have
the same homotopy type as the circle S1 . Then the
been performed by several authors, leading to a
set of isomorphism classes of U(1) bundles over R 2
C0 = fA 2 1 R2
; u1; dA = 0g 31 Covariant Derivative, Parallel Transport,
and Holonomy
through
Let G be a matrix Lie group with Lie algebra g, B a
C0 G ! C0 ; A; f ! A f 1 df 32 differentiable manifold, : G ! P! B a principal
bundle, V a vector space, G V ! V an action,
where f 1 (x, y) = (f (x, y))1 . The moduli space and V : V ! P G V ! v
B the corresponding asso-
ciated vector bundle (V is trivial if is trivial). Call
C0 (V ) the sections of V , (TB)( (TP)) the sections
M0 fgauge equivalence classes
G of the tangent bundle of B(P), and eq (P, V) the set
of flat connections on AB g of functions : P ! V satisfying (pg) = g1 (p)
(equivariant functions from P to V). s 2 (V )
fA fA f 1 df; f 2 Gg; A 2 C0 g 33 induces s 2 eq (P, V) with s (p) = , where
s((p)) = [p, ] and 2 eq (P, V) induces s 2 (V )
is isomorphic to the circle S1 with length 1. This can with s (b) = [p, (p)], where p 2 1 (fbg). If H is a
be seen as follows: the de Rham cohomology of R 2
connection on , that is, a smooth assignment of a
with coefficients in iR in dimension 1 is (horizontal) vector subspace Hp of Tp P at each p of
P, algebraically determined by a smooth g-valued
1-form ! on P through Hp = ker(!p ), s 2 (V ),
1
HDR R2
; iR = fA0 DR ; 2 Rg
X 2 (TB), and X" 2 (TP) the horizontal lifting of
1
HDR S1 ; iR R 34 X by !, then X" (s ) 2 eq (P, V), and covariant
derivative of s with respect to ! in the direction of X is whose solution is the time-ordered exponential
defined by Z t !
gtg01 T exp
d !
r!X s : = sX"s 36a 0
X
1 Z t
If : 1 (U) ! U G is a local trivialization of , 1 1 m 1
d1 !
1
x , = 1, . . . , dim B are local coordinates on U, and m1 0
ei , i = 1, . . . , dim V is a basis of the local sections in Z 1
1
V (U), then the local expression of [36a] is 2
d2 !
2
0
Z
@ m1
r!XU @=@x si ei = X ij Aji si ej 36b m
dm !
m
38
@x 0
For c 2 (R2
;(x0 , y0 )), which turns n times nonabelian. Examples with YangMills and grav-
around the solenoid at (0, 0), eqn [40] gives itational fields are considered in the literature.
H H
n A in a
c" x0 ; y0 ; e c x ; y ; e
H 0 0 c
relation to classical field theory nor the influence of background fields can be properly treated.
Algebraic quantum field theory (AQFT; synonymously, local quantum physics), on the contrary, aims at emphasizing the concept of locality at every instance. As the nonlocal features of quantum physics occur at the level of states (entanglement), not at the level of observables, it is better not to base the theory on the Hilbert space of states but on the algebra of observables. Subsystems of a given system then simply correspond to subalgebras of a given algebra. The locality concept is abstractly encoded in a notion of independence of subsystems; two subsystems are independent if the algebra of observables which they generate is isomorphic to the tensor product of the algebras of the subsystems.
Spacetime can then, in the spirit of Leibniz, be considered as an ordering device for systems. So, one associates with regions of spacetime the algebras of observables which can be measured in the pertinent region, with the condition that the algebras of subregions of a given region can be identified with subalgebras of the algebra of the region.
Problems arise if one aims at a generally covariant approach in the spirit of general relativity. Then, in order to avoid pitfalls such as the hole problem, systems corresponding to isometric regions must be isomorphic. Since isomorphic regions may be embedded into different spacetimes, this amounts to a simultaneous treatment of all spacetimes of a suitable class. We will see that category theory furnishes such a description, where the objects are the systems and the morphisms the embeddings of a system as a subsystem of other systems.
States arise as secondary objects via Hilbert space representations, or directly as linear functionals on the algebras of observables which can be interpreted as expectation values and are, therefore, positive and normalized. It is crucial that inequivalent representations (sectors) can occur, and the analysis of the structure of the sectors is one of the big successes of AQFT. One can also study the particle interpretation of certain states as well as (equilibrium and nonequilibrium) thermodynamical properties.
The mathematical methods in AQFT are mainly taken from the theory of operator algebras, a field of mathematics which developed in close contact with mathematical physics, in particular with AQFT. Unfortunately, the most important field theories from the point of view of elementary particle physics, such as quantum electrodynamics or the standard model, could not yet be constructed beyond formal perturbation theory, with the annoying consequence that it seemed that the concepts of AQFT could not be applied to them. However, it has recently been shown that formal perturbation theory can be reshaped in the spirit of AQFT such that the algebras of observables of these models can be constructed as algebras of formal power series of Hilbert space operators. The price to pay is that the deep mathematics of operator algebras cannot be applied, but the crucial features of the algebraic approach can be used.
AQFT was originally proposed by Haag as a concept by which scattering of particles can be understood as a consequence of the principle of locality. It was then put into a mathematically precise form by Araki, Haag, and Kastler. After the analysis of particle scattering by Haag and Ruelle and the clarification of the relation to the Lehmann-Symanzik-Zimmermann (LSZ) formalism by Hepp, the structure of superselection sectors was studied first by Borchers and then in a fundamental series of papers by Doplicher, Haag, and Roberts (DHR) (see, e.g., Doplicher et al. (1971, 1974)) (soon after, Buchholz and Fredenhagen established the relation to particles), and finally Doplicher and Roberts uncovered the structure of superselection sectors as the dual of a compact group, thereby generalizing the Tannaka-Krein theorem of characterization of group duals.
With the advent of two-dimensional conformal field theory, new models were constructed and it was shown that the DHR analysis can be generalized to these models. Directly related to conformal theories is the algebraic approach to holography in anti-de Sitter (AdS) spacetime by Rehren.
The general framework of AQFT may be described as a covariant functor between two categories. The first one contains the information on local relations and is crucial for the interpretation. Its objects are topological spaces with additional structures (typically globally hyperbolic Lorentzian spaces, possibly spin bundles with connections, etc.), its morphisms being the structure-preserving embeddings. In the case of globally hyperbolic Lorentzian spacetimes, one requires that the embeddings are isometric and preserve the causal structure. The second category describes the algebraic structure of observables. In quantum physics the standard assumption is that one deals with the category of C*-algebras where the morphisms are unital embeddings. In classical physics, one looks instead at Poisson algebras, and in perturbative quantum field theory one admits algebras which possess nontrivial representations as formal power series of Hilbert space operators. It is the leading principle of AQFT that the functor $\mathscr{A}$ contains all physical information. In particular, two theories are equivalent if the corresponding functors are naturally equivalent.
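For orientation, the notion of natural equivalence used here can be spelled out as follows. This is the standard category-theoretic definition written as a sketch; the symbols $\mathscr{A}'$ (a second theory), $\alpha_\psi$, $\alpha'_\psi$ (the images of a morphism $\psi$ under the two functors), and $\beta_M$ are notation assumed for this illustration only.

```latex
% Sketch: natural equivalence of two functors \mathscr{A}, \mathscr{A}'
% from the locality category to the category of observable algebras.
% \alpha_\psi := \mathscr{A}(\psi), \alpha'_\psi := \mathscr{A}'(\psi);
% \beta_M is illustrative notation, not taken from the article.
\[
  \beta_M : \mathscr{A}(M) \longrightarrow \mathscr{A}'(M)
  \quad \text{an isomorphism for every object } M ,
\]
\[
  \beta_{M_2} \circ \alpha_\psi \;=\; \alpha'_\psi \circ \beta_{M_1}
  \quad \text{for every morphism } \psi : M_1 \to M_2 .
\]
```

In words: the identifications $\beta_M$ of the two algebras on each spacetime must be compatible with every embedding of one spacetime into another.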
200 Algebraic Approach to Quantum Field Theory
In the analysis of the functor $\mathscr{A}$, a crucial role is played by natural transformations from other functors on the locality category. For instance, a field A may be defined as a natural transformation from the category of test function spaces to the category of observable algebras via their functors related to the locality category.

Quantum Field Theories as Covariant Functors
The rigorous implementation of the generally covariant locality principle uses the language of category theory. The following two categories are used:
Loc: The class of objects obj(Loc) is formed by all (smooth) d-dimensional ($d \ge 2$ is held fixed), globally hyperbolic Lorentzian spacetimes M which are oriented and time oriented. Given any two such objects $M_1$ and $M_2$, the morphisms $\psi \in \hom_{\mathrm{Loc}}(M_1, M_2)$ are taken to be the isometric embeddings $\psi : M_1 \to M_2$ of $M_1$ into $M_2$ but with the following constraints:
(i) if $\gamma : [a, b] \to M_2$ is any causal curve and $\gamma(a), \gamma(b) \in \psi(M_1)$ then the whole curve must be in the image $\psi(M_1)$, that is, $\gamma(t) \in \psi(M_1)$ for all $t \in [a, b]$;
(ii) any morphism preserves orientation and time orientation of the embedded spacetime.
The composition is defined as the composition of maps, the unit element in $\hom_{\mathrm{Loc}}(M, M)$ is given by the identical embedding $\mathrm{id}_M : M \to M$ for any $M \in \mathrm{obj}(\mathrm{Loc})$.
Obs: The class of objects obj(Obs) is formed by all C*-algebras possessing unit elements, and the morphisms are faithful (injective) unit-preserving *-homomorphisms. The composition is again defined as the composition of maps, the unit element in $\hom_{\mathrm{Obs}}(\mathcal{A}, \mathcal{A})$ is for any $\mathcal{A} \in \mathrm{obj}(\mathrm{Obs})$ given by the identical map $\mathrm{id}_{\mathcal{A}} : A \mapsto A$, $A \in \mathcal{A}$.
The categories are chosen for definiteness. One may envisage changes according to particular needs, as, for instance, in perturbation theory where instead of C*-algebras general topological *-algebras are better suited. Or one may use von Neumann algebras, in case particular states are selected. On the other hand, one might consider for Loc bundles over spacetimes, or (in conformally invariant theories) admit conformal embeddings as morphisms. In case one is interested in spacetimes which are not globally hyperbolic, one could look at the globally hyperbolic subregions (where one needs to be careful about the causal convexity condition (i) above).
The concept of locally covariant quantum field theory is defined as follows.
Definition 1
(i) A locally covariant quantum field theory is a covariant functor $\mathscr{A}$ from Loc to Obs and (writing $\alpha_\psi$ for $\mathscr{A}(\psi)$) with the covariance properties
$$\alpha_{\psi'} \circ \alpha_{\psi} = \alpha_{\psi' \circ \psi}, \qquad \alpha_{\mathrm{id}_M} = \mathrm{id}_{\mathscr{A}(M)}$$
for all morphisms $\psi \in \hom_{\mathrm{Loc}}(M_1, M_2)$, all morphisms $\psi' \in \hom_{\mathrm{Loc}}(M_2, M_3)$, and all $M \in \mathrm{obj}(\mathrm{Loc})$.
(ii) A locally covariant quantum field theory described by a covariant functor $\mathscr{A}$ is called causal if the following holds: whenever there are morphisms $\psi_j \in \hom_{\mathrm{Loc}}(M_j, M)$, j = 1, 2, so that the sets $\psi_1(M_1)$ and $\psi_2(M_2)$ are causally separated in M, then one has
$$[\alpha_{\psi_1}(\mathscr{A}(M_1)), \alpha_{\psi_2}(\mathscr{A}(M_2))] = \{0\}$$
where the element-wise commutation makes sense in $\mathscr{A}(M)$.
(iii) One says that a locally covariant quantum field theory given by the functor $\mathscr{A}$ obeys the time-slice axiom if
$$\alpha_\psi(\mathscr{A}(M)) = \mathscr{A}(M')$$
holds for all $\psi \in \hom_{\mathrm{Loc}}(M, M')$ such that $\psi(M)$ contains a Cauchy surface for $M'$.
Thus, a quantum field theory is an assignment of C*-algebras to (all) globally hyperbolic spacetimes so that the algebras are identifiable when the spacetimes are isometric, in the indicated way. This is a precise description of the generally covariant locality principle.

The Traditional Approach
The traditional framework of AQFT, in the Araki-Haag-Kastler sense, on a fixed globally hyperbolic spacetime can be recovered from a locally covariant quantum field theory, that is, from a covariant functor $\mathscr{A}$ with the properties listed above.
Indeed, let M be an object in obj(Loc). K(M) denotes the set of all open subsets in M which are relatively compact and also contain, with each pair of points x and y, all g-causal curves in M connecting x and y (cf. condition (i) in the definition of Loc). $O \in K(M)$, endowed with the metric of M restricted to O and with the induced orientation and time orientation, is a member of obj(Loc), and the injection map $\iota_{M,O} : O \to M$, that is, the identical map restricted to O, is an element in $\hom_{\mathrm{Loc}}(O, M)$.
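The covariance properties in Definition 1(i) are the usual functor laws: composition of embeddings goes to composition of algebra maps, and identities go to identities. The following is a minimal toy sketch of these two laws only, not a construction of a quantum field theory: "regions" are finite sets of integers, morphisms are injective maps stored as dictionaries, and the toy functor sends a region to formal linear combinations of its points (a vector space rather than a C*-algebra). All names (`compose`, `identity`, `alpha`) are hypothetical and chosen for illustration.

```python
from typing import Callable, Dict, FrozenSet

Morphism = Dict[int, int]      # injective map between finite toy "regions"
Element = Dict[int, complex]   # formal linear combination: point -> coefficient


def compose(psi2: Morphism, psi1: Morphism) -> Morphism:
    """Return the composite map 'psi2 after psi1'."""
    return {x: psi2[y] for x, y in psi1.items()}


def identity(region: FrozenSet[int]) -> Morphism:
    """Identity embedding of a region into itself."""
    return {x: x for x in region}


def alpha(psi: Morphism) -> Callable[[Element], Element]:
    """Toy image of psi under the functor: push an element forward along psi."""
    if len(set(psi.values())) != len(psi):
        raise ValueError("morphisms must be injective")

    def push(a: Element) -> Element:
        return {psi[x]: c for x, c in a.items()}

    return push


# Three toy regions M1 -> M2 -> M3 and an element of the toy algebra of M1.
M1 = frozenset({0, 1})
psi = {0: 10, 1: 11}                      # M1 -> M2
psi_prime = {10: 100, 11: 101, 12: 102}   # M2 -> M3
a = {0: 2.0, 1: -1.0}

# alpha_{psi'} o alpha_{psi} applied to a, versus alpha_{psi' o psi} applied to a.
lhs = alpha(psi_prime)(alpha(psi)(a))
rhs = alpha(compose(psi_prime, psi))(a)
assert lhs == rhs

# alpha_{id_M} acts as the identity.
assert alpha(identity(M1))(a) == a
```

The design choice of pushing elements forward along injective maps mirrors the requirement that the morphisms of Obs be injective; causality and the time-slice axiom are not modeled here.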
With this notation, it is easy to prove the following assertion:
Theorem 1 Let $\mathscr{A}$ be a covariant functor with the above-stated properties, and define a map $K(M) \ni O \mapsto \mathcal{A}(O) \subset \mathscr{A}(M)$ by setting
$$\mathcal{A}(O) := \alpha_{M,O}(\mathscr{A}(O))$$
where $\alpha_{M,O}$ denotes the image under $\mathscr{A}$ of the injection map of O into M. Then the following statements hold:
(i) The map fulfills isotony, that is,
$$O_1 \subset O_2 \Rightarrow \mathcal{A}(O_1) \subset \mathcal{A}(O_2) \quad \text{for all } O_1, O_2 \in K(M)$$
(ii) If there exists a group G of isometric diffeomorphisms $\kappa : M \to M$ (so that $\kappa_* g = g$) preserving orientation and time orientation, then there is a representation $G \ni \kappa \mapsto \tilde{\alpha}_\kappa$ of G by C*-algebra automorphisms $\tilde{\alpha}_\kappa : \mathscr{A}(M) \to \mathscr{A}(M)$ such that
$$\tilde{\alpha}_\kappa(\mathcal{A}(O)) = \mathcal{A}(\kappa(O)), \quad O \in K(M)$$
(iii) If the theory given by $\mathscr{A}$ is additionally causal, then it holds that
$$[\mathcal{A}(O_1), \mathcal{A}(O_2)] = \{0\}$$
for all $O_1, O_2 \in K(M)$ with $O_1$ causally separated from $O_2$.
These properties are just the basic assumptions of the Araki-Haag-Kastler framework.

The Achievements of the Traditional Approach
In the Araki-Haag-Kastler approach in Minkowski spacetime M, many results have been obtained in the last 40 years, some of them also becoming a source of inspiration to mathematics. A description of the achievements can be organized in terms of a length-scale basis, from the small to the large. We assume in this section that the algebra $\mathscr{A}(M)$ is faithfully and irreducibly represented on a Hilbert space H, that the Poincaré transformations are unitarily implemented with positive energy, and that the subspace of Poincaré invariant vectors is one dimensional (uniqueness of the vacuum). Moreover, algebras corresponding to regions which are spacelike to a nonempty open region are assumed to be weakly closed (i.e., von Neumann algebras on H), and the condition of weak additivity is fulfilled, that is, for all $O \in K(M)$ the algebra generated from the algebras $\mathcal{A}(O + x)$, $x \in M$, is weakly dense in $\mathscr{A}(M)$.

Ultraviolet Structure and Idealized Localizations
This section deals with the problem of inspecting the theory at very small scales. In the limiting case, one is interested in idealized localizations, eventually the points of spacetimes. But the observable algebras are trivial at any point $x \in M$, namely
$$\bigcap_{O \ni x} \mathcal{A}(O) = \mathbb{C}1, \quad O \in K(M)$$
Hence, pointlike localized observables are necessarily singular. Actually, the Wightman formulation of quantum field theory is based on the use of distributions on spacetime with values in the algebra of observables (as a topological *-algebra). In spite of technical complications whose physical significance is unclear, this formalism is well suited for a discussion of the connection with the Euclidean theory, which allows, in fortunate cases, a treatment by path integrals; it is more directly related to models and admits, via the operator-product expansion, a study of the short-distance behavior. It is, therefore, an important question how the algebraic approach is related to the Wightman formalism. The reader is referred to the literature for exploring the results on this relation.
Whereas these results point to an essential equivalence of both formalisms, one needs in addition a criterion for the existence of sufficiently many Wightman fields associated with a given local net. Such a criterion can be given in terms of a compactness condition to be discussed in the next subsection. As a benefit, one derives an operator-product expansion which has to be assumed in the Wightman approach.
In the purely algebraic approach, the ultraviolet structure has been investigated by Buchholz and Verch. Small-scale properties of theories are studied with the help of the so-called scaling algebras whose elements can be described as orbits of observables under all possible renormalization group motions. There results a classification of theories in the scaling limit which can be grouped into three broad classes: theories for which the scaling limit is purely classical (commutative algebras), those for which the limit is essentially unique (stable ultraviolet fixed point) and not classical, and those for which this is not the case (unstable ultraviolet fixed point). This classification does not rely on perturbation expansions. It allows an intrinsic definition of confinement in terms of the so-called ultraparticles, that is, particles which are visible only in the scaling limit.

Phase-Space Analysis
As far as finite distances are concerned, there are two apparently competing principles, those of
nuclearity and modularity. The first one suggests that locally, after a cutoff in energy, one has a situation similar to that of old quantum mechanics, namely a finite number of states in a finite volume of phase space. Aiming at a precise formulation, Haag and Swieca introduced their notion of compactness, which Buchholz and Wichmann sharpened into that of nuclearity. The latter authors proposed that the set generated from the vacuum vector $\Omega$,

\[ \{ e^{-\beta H} A \Omega \mid A \in \mathcal{A}(O),\ \|A\| < 1 \} \]

$H$ denoting the generator of time translations (Hamiltonian), is nuclear for any $\beta > 0$, roughly stating that it is contained in the image of the unit ball under a trace class operator. The nuclear size $Z(\beta, O)$ of the set plays the role of the partition function of the model and has to satisfy certain bounds in the parameter $\beta$. The consequence of this constraint is the existence of product states, namely those normal states for which observables localized in two given spacelike separated regions are uncorrelated. A further consequence is the existence of thermal equilibrium states (KMS states) for all $\beta > 0$.

The second principle concerns the fact that, even locally, quantum field theory has infinitely many degrees of freedom. This becomes visible in the Reeh–Schlieder theorem, which states that every vector $\Phi$ which is in the range of $e^{-\beta H}$ for some $\beta > 0$ (in particular, the vacuum $\Omega$) is cyclic and separating for the algebras $\mathcal{A}(O)$, $O \in \mathcal{K}(M)$, that is, $\mathcal{A}(O)\Phi$ is dense in $\mathcal{H}$ ($\Phi$ is cyclic) and $A\Phi = 0$, $A \in \mathcal{A}(O)$ implies $A = 0$ ($\Phi$ is separating). The pair $(\mathcal{A}(O), \Omega)$ is then a von Neumann algebra in the so-called standard form. On such a pair, the Tomita–Takesaki theory can be applied, namely the densely defined operator

\[ S A \Omega = A^{*} \Omega, \qquad A \in \mathcal{A}(O) \]

is closable, and the polar decomposition of its closure $\bar{S} = J \Delta^{1/2}$ delivers an antiunitary involution $J$ (the modular conjugation) and a positive self-adjoint operator $\Delta$ (the modular operator) associated with the standard pair $(\mathcal{A}(O), \Omega)$. These operators have the properties

\[ J \mathcal{A}(O) J = \mathcal{A}(O)' \]

where the prime denotes the commutant, and

\[ \Delta^{it} \mathcal{A}(O) \Delta^{-it} = \mathcal{A}(O), \qquad t \in \mathbb{R} \]

The importance of this structure is based on the fact disclosed by Bisognano and Wichmann using Poincaré-covariant Wightman fields and local algebras generated by them, that for specific regions in Minkowski spacetime the modular operators have a geometrical meaning. Indeed, these authors showed for the pair $(\mathcal{A}(W), \Omega)$, where $W$ denotes the wedge region $W = \{ x \in M \mid |x^0| < x^1 \}$, that the associated modular unitary $\Delta^{it}$ is the Lorentz boost with velocity $\tanh(2\pi t)$ in the direction 1 and that the modular conjugation $J$ is the $CP_1T$ symmetry operator with parity $P_1$ the reflection with respect to the $x^1 = 0$ plane. Later, Borchers discovered that already on the purely algebraic level a corresponding structure exists. He proved that, given any standard pair $(\mathcal{A}, \Omega)$ and a one-parameter group of unitaries $\tau \mapsto U(\tau)$ acting on the Hilbert space $\mathcal{H}$ with a positive generator and such that $\Omega$ is invariant and $U(\tau) \mathcal{A} U(-\tau) \subset \mathcal{A}$, $\tau > 0$, then the associated modular operators $\Delta$ and $J$ fulfill the commutation relations

\[ \Delta^{it} U(\tau) \Delta^{-it} = U(e^{-2\pi t} \tau) \]
\[ J U(\tau) J = U(-\tau) \]

which are just the commutation relations between boosts and lightlike translations.

Surprisingly, there is a direct connection between the two concepts of nuclearity and modularity. Indeed, in the nuclearity condition, it is possible to replace the Hamiltonian operator by a specific function of the modular operator associated with a slightly larger region. Furthermore, under mild conditions, nuclearity and modularity together determine the structure of local algebras completely; they are isomorphic to the unique hyperfinite type III$_1$ von Neumann algebra.

Sectors, Symmetries, Statistics, and Particles

Large scales are appropriate for discussing global issues like superselection sectors, statistics and symmetries as far as large spacelike distances are concerned, and scattering theory, with the resulting notions of particles and infraparticles, as far as large timelike distances are concerned.

In purely massive theories, where the vacuum sector has a mass gap and the mass shells of the particles are isolated, a very satisfactory description of the multiparticle structure at large times can be given. Using the concept of almost local particle generators,

\[ \Psi = A(t) \Omega \]

where $\Psi$ is a single-particle state (i.e., an eigenstate of the mass operator), $A(t)$ is a family of almost local operators essentially localized in the kinematical region accessible from a given point by a motion with the velocities contained in the spectrum of $\Psi$, one obtains the multiparticle states as limits of products $A_1(t) \cdots A_n(t) \Omega$ for disjoint velocity supports. The corresponding closed subspaces are
invariant under Poincaré transformations and are unitarily equivalent to the Fock spaces of noninteracting particles.

For massless particles, no almost-local particle generators can be expected to exist. In even dimensions, however, one can exploit Huygens' principle to construct asymptotic particle generators which are in the commutant of the algebra of the forward or backward lightcone, respectively. Again, their products can be determined and multiparticle states obtained.

Much less well understood is the case of massive particles in a theory which also possesses massless particles. Here, in general, the corresponding states are not eigenstates of the mass operator. Since quantum electrodynamics (QED) as well as the standard model of elementary particles have this problem, the correct treatment of scattering in these models is still under discussion. One attempt at a correct treatment is based on the concept of the so-called particle weights, that is, unbounded positive functionals on a suitable algebra. This algebra is generated by positive almost-local operators annihilating the vacuum and interpreted as counters.

The structure at large spacelike scales may be analyzed by the theory of superselection sectors. The best-understood case is that of locally generated sectors which are the objects of the DHR theory. Starting from a distinguished representation $\pi_0$ (vacuum representation) which is assumed to fulfill the Haag duality,

\[ \pi_0(\mathcal{A}(O)) = \pi_0(\mathcal{A}(O'))' \]

for all double cones $O$, one may look at all representations which are equivalent to the vacuum representation if restricted to the observables localized in double cones in the spacelike complement of a given double cone. Such representations give rise to endomorphisms of the algebra of observables, and the product of endomorphisms can be interpreted as a product of sectors (fusion). In general, these representations violate the Haag duality, but there is a subclass of the so-called finite statistics sectors where the violation of Haag duality is small, in the sense that the nontrivial inclusion

\[ \pi(\mathcal{A}(O)) \subset \pi(\mathcal{A}(O'))' \]

has a finite Jones index. These sectors form (in at least three spacetime dimensions) a symmetric tensor category with some further properties which can be identified, in a generalization of the Tannaka–Krein theorem, as the dual of a unique compact group. This group plays the role of a global gauge group. The symmetry of the category is expressed in terms of a representation of the symmetric group. One may then enlarge the algebra of observables and obtain an algebra of operators which transform covariantly under the global gauge group and satisfy Bose or Fermi commutation relations for spacelike separation.

In two spacetime dimensions, one obtains instead braided tensor categories. They have been classified under additional conditions (conformal symmetry, central charge $c < 1$) in a remarkable work by Kawahigashi and Longo. Moreover, in their paper, one finds that by using completely new methods (Q-systems) a new model is unveiled, apparently inaccessible by methods used by others. To some extent, these categories can be interpreted as duals of generalized quantum groups.

The question arises whether all representations describing elementary particles are, in the massive case, DHR representations. One can show that in the case of a representation with an isolated mass shell there is an associated vacuum representation which becomes equivalent to the particle representation after restriction to observables localized spacelike to a given infinitely extended spacelike cone. This property is weaker than the DHR condition but allows, in four spacetime dimensions, the same construction of a global gauge group and of covariant fields with Bose and Fermi commutation relations, respectively, as the DHR condition. In three spacetime dimensions, however, one finds a braided tensor category, which has properties similar to those known from topological field theories in three dimensions.

The sector structure in massless theories is not well understood, due to the infrared problem. This is in particular true for QED.

Fields as Natural Transformations

In order to be able to interpret the theory in terms of measurements, one has to be able to compare observables associated with different regions of spacetime, or even different spacetimes. In the absence of nontrivial isometries, such a comparison can be made in terms of locally covariant fields. By definition, these are natural transformations from the functor of quantum field theory to another functor on the category of spacetimes Loc.

The standard case is the functor which associates with every spacetime $M$ its space $\mathcal{D}(M)$ of smooth compactly supported test functions. There, the morphisms are the pushforwards $\mathcal{D}\psi \equiv \psi_{*}$.

Definition 2 A locally covariant quantum field $\Phi$ is a natural transformation between the functors $\mathcal{D}$ and $\mathfrak{A}$, that is, for any object $M$ in obj(Loc) there exists a morphism $\Phi_M : \mathcal{D}(M) \to \mathfrak{A}(M)$ such that for
any pair of objects $M_1$ and $M_2$ and any morphism $\psi$ between them, the following diagram commutes:

\[
\begin{array}{ccc}
\mathcal{D}(M_1) & \xrightarrow{\ \Phi_{M_1}\ } & \mathfrak{A}(M_1) \\
\downarrow & & \downarrow \\
\mathcal{D}(M_2) & \xrightarrow{\ \Phi_{M_2}\ } & \mathfrak{A}(M_2)
\end{array}
\]

[...] Field Theory: Fundamental Concepts and Tools; Scattering in Relativistic Quantum Field Theory: The Analytic Program; Spin Foams; Symmetries in Quantum Field Theory: Algebraic Aspects; Symmetries in Quantum Field Theory of Lower Spacetime Dimensions; Tomita–Takesaki Modular Theory; Two-Dimensional Models; von Neumann Algebras: Introduction, Modular Theory and Classification Theory; von Neumann Algebras: Subfactor Theory.

Anomalies
S L Adler, Institute for Advanced Study, Princeton, NJ, USA
© 2006 Elsevier Ltd. All rights reserved.

Synopsis

Anomalies are the breaking of classical symmetries by quantum mechanical radiative corrections, which arise when the regularizations needed to evaluate small fermion loop Feynman diagrams conflict with a classical symmetry of the theory. They have important implications for a wide range of issues in quantum field theory, mathematical physics, and string theory.

Chiral Anomalies, Abelian and Nonabelian

Consider quantum electrodynamics, with the fermionic Lagrangian density

\[ \mathcal{L} = \bar{\psi} ( i \gamma \cdot \partial - e_0 \gamma \cdot B - m_0 ) \psi \tag{1a} \]

where $\bar{\psi} = \psi^{\dagger} \gamma^0$, $e_0$ and $m_0$ are the bare charge and mass, and $B_\mu$ is the electromagnetic gauge potential. (We reserve the notation $A$ for axial-vector quantities.) Under a chiral transformation

\[ \psi \to e^{i \alpha \gamma_5} \psi \tag{1b} \]

with constant $\alpha$, the kinetic term in eqn [1a] is invariant (because $\gamma_5$ commutes with $\gamma^0 \gamma_\mu$), whereas the mass term is not invariant. Therefore, naive application of Noether's theorem would lead one to expect that the axial-vector current

\[ j^5_\mu = \bar{\psi} \gamma_\mu \gamma_5 \psi \tag{1c} \]

obtained from the Lagrangian density by applying a chiral transformation with spatially varying $\alpha$, should have a divergence given by the change under chiral transformation of the mass term in eqn [1a]. Up to tree approximation, this is indeed true, but when one computes the AVV Feynman diagram with one axial-vector and two vector vertices (see Figure 1), and insists on conservation of the vector current $j_\mu = \bar{\psi} \gamma_\mu \psi$, one finds that to order $e_0^2$, the classical Noether theorem is modified to read

\[ \partial^\mu j^5_\mu(x) = 2 i m_0 j^5(x) + \frac{e_0^2}{16 \pi^2} F^{\xi\sigma}(x) F^{\tau\rho}(x) \epsilon_{\xi\sigma\tau\rho} \tag{2} \]

with $F^{\xi\sigma}(x) = \partial^\sigma B^\xi(x) - \partial^\xi B^\sigma(x)$ the electromagnetic field strength tensor. The second term in eqn [2], which would be unexpected from the application of the classical Noether theorem, is the abelian axial-vector anomaly (often called the Adler–Bell–Jackiw (or ABJ) anomaly after the seminal papers on the subject). Since vector current conservation, together with the axial-vector current anomaly, implies that the left- and right-handed chiral currents $j_\mu \mp j^5_\mu$ are also anomalous, the axial-vector anomaly is frequently called the chiral anomaly, and we shall use the terms interchangeably in this article.

Figure 1 The AVV triangle diagram responsible for the abelian chiral anomaly.

There are a number of different ways to understand why the extra term in eqn [2] appears. (1) Working through the formal Feynman diagrammatic Ward identity proof of the Noether theorem, one finds that there is a step where the closed fermion loop contributions are eliminated by a shift of the loop-integration variable. For Feynman diagrams that are convergent, this is not a problem, but the AVV diagram is linearly divergent. The linear divergence vanishes under symmetric integration, but the shift then produces a finite residue, which gives the anomaly. (2) If one defines the AVV diagram by Pauli–Villars regularization with regulator mass $M_0$ that is allowed to approach infinity at the end of the calculation, one finds a classical Noether theorem in the regulated theory,

\[ \partial^\mu j^5_\mu |_{m_0} - \partial^\mu j^5_\mu |_{M_0} = 2 i m_0 j^5 |_{m_0} - 2 i M_0 j^5 |_{M_0} \tag{3a} \]

with the subscripts $m_0$ and $M_0$ indicating that fermion loops are to be calculated with fermion mass $m_0$ and $M_0$, respectively. Taking the vacuum to two-photon matrix element of eqn [3a], one finds that the matrix element $\langle 0 | j^5 |_{M_0} | \gamma\gamma \rangle$, which is unambiguously computable after imposing vector-current conservation, falls off only as $M_0^{-1}$ as the regulator mass approaches infinity. Thus, the product of $2 i M_0$ with this matrix element has a finite limit, which gives the anomaly. (3) If the
the regularized theory as the regulator masses approach infinity, this result applies to the renormalized theory as well.

The above argument can be made precise, and extends to nongauge theories such as the $\sigma$-model as well. For both gauge theories and the $\sigma$-model, cancellation of radiative corrections to the anomaly coefficient has been explicitly demonstrated in fourth-order calculations. Nonperturbative demonstrations of anomaly nonrenormalization have also been given using the Callan–Symanzik equations. For example, in quantum electrodynamics, Zee, and Lowenstein and Schroer, showed that a factor $f$ that gives the ratio of the true anomaly to its one-loop value obeys the differential equation

\[ \left( m \frac{\partial}{\partial m} + \alpha \beta(\alpha) \frac{\partial}{\partial \alpha} \right) f = 0 \tag{5} \]

Since $f$ is dimensionless, it can have no dependence on the mass $m$, and since $\beta(\alpha)$ is nonzero this implies $\partial f / \partial \alpha = 0$. Thus, $f$ has no dependence on $\alpha$, and so $f = 1$.

Applications of Chiral Anomalies

Chiral anomalies have numerous applications in the standard model of particle physics and its extensions, and we describe here a few of the most important ones.

Neutral Pion Decay $\pi^0 \to \gamma\gamma$

As a result of the abelian chiral anomaly, the partially conserved axial-vector current (PCAC) equation relevant to neutral pion decay is modified to read

\[ \partial^\mu \mathcal{F}^5_{3\mu}(x) = \frac{f_\pi \mu_\pi^2}{\sqrt{2}} \phi_{\pi^0}(x) + S \frac{\alpha_0}{4\pi} F^{\xi\sigma}(x) F^{\tau\rho}(x) \epsilon_{\xi\sigma\tau\rho} \tag{6a} \]

with $\mu_\pi$ the pion mass, $f_\pi \simeq 131$ MeV the charged-pion decay constant, and $S$ a constant determined by the constituent fermion charges and axial-vector couplings. Taking the matrix element of eqn [6a] between the vacuum state and a two-photon state, and using the fact that the left-hand side has a kinematic zero (the Sutherland–Veltman theorem), one sees that the $\pi^0 \to \gamma\gamma$ amplitude $F$ is completely determined by the anomaly term, giving the formula

\[ F = -\frac{\alpha}{\pi} \, 2S \, \frac{\sqrt{2}}{f_\pi} \tag{6b} \]

For a single set of fractionally charged quarks, the amplitude $F$ is a factor of three too small to agree with experiment; for three fractionally charged quarks (or an equivalent Han–Nambu triplet), eqn [6b] gives the correct neutral pion decay rate. This calculation was one of the first pieces of evidence for the color degree of freedom of quarks.

Anomaly Cancellation in Gauge Theories

In quantum electrodynamics, the gauge particle (the photon) couples to the vector current, and so the anomalous conservation properties of the axial-vector current have no effect. The same statement holds for the gauge gluons in quantum chromodynamics, when treated in isolation from the other interactions. However, in the electroweak theory that embeds quantum electrodynamics in a theory of the weak force, the gauge particles (the W and Z intermediate bosons) couple to chiral currents, which are left- or right-handed linear combinations of the vector and axial-vector currents. In this case, the chiral anomaly leads to problems with the renormalizability of the theory, unless the anomalies cancel between different fermion species. Writing all fermions as left-handed, the condition for anomaly cancellation is

\[ \mathrm{tr}\{T_\alpha, T_\beta\} T_\gamma = \mathrm{tr}(T_\alpha T_\beta + T_\beta T_\alpha) T_\gamma = 0 \quad \text{for all } \alpha, \beta, \gamma \tag{7} \]

with $T_\alpha$ the coupling matrices of gauge bosons to left-handed fermions. These conditions are obeyed in the standard model, by virtue of three nontrivial sum rules on the fermion gauge couplings being satisfied (four sum rules, if one includes the gravitational contribution to the chiral anomaly given in eqn [4c], which also cancels in the standard model). Note that anomaly cancellation in the locally gauged currents of the standard model does not imply anomaly cancellation in global-flavor currents. Thus, the flavor axial-vector current anomaly that gives the $\pi^0 \to \gamma\gamma$ matrix element remains anomalous in the full electroweak theory. Anomaly cancellation imposes important constraints on the construction of grand unified models that combine the electroweak theory with quantum chromodynamics. For instance, in SU(5) the fermions are put into a $\bar{5}$ and 10 representation, which together, but not individually, are anomaly free. The larger unification groups SO(10) and E$_6$ satisfy eqn [7] for all representations, and so are automatically anomaly free.

Instanton Physics and the Theta Vacuum

The theory of anomalies is intimately tied to the physics associated with instanton classical Yang–Mills theory solutions. Since the instanton field
strength is self-dual, the nonvanishing instanton Euclidean action

\[ S_E = \int d^4x \, \frac{1}{4} F^a_{\mu\nu} F^a_{\mu\nu} = 8\pi^2 \tag{8a} \]

implies that the integral of the pseudoscalar density $F^a_{\mu\nu} F^a_{\lambda\sigma} \epsilon_{\mu\nu\lambda\sigma}$ over the instanton is also nonzero,

\[ \int d^4x \, F^a_{\mu\nu} F^a_{\lambda\sigma} \epsilon_{\mu\nu\lambda\sigma} = 64\pi^2 \tag{8b} \]

Referring back to eqn [4b], this means that the integral of the nonabelian chiral anomaly for fermions in the background field of an instanton is an integer, which in the Minkowski space continuation has the interpretation of a topological winding number change produced by the instanton tunneling solution. This fact has a number of profound consequences. Since a vacuum with a definite winding number $|\nu\rangle$ is unstable under instanton tunneling, careful analysis shows that the nonabelian vacuum that has correct clustering properties is a Fourier superposition

\[ |\theta\rangle = \sum_{\nu} e^{i \nu \theta} |\nu\rangle \tag{8c} \]

giving rise to the $\theta$-vacuum of quantum chromodynamics, and a host of issues associated with (the lack of) strong CP violation, the Peccei–Quinn mechanism, and axion physics. Also, the fact that the integral of eqn [8b] is nonzero means that the U(1) chiral symmetry of quantum chromodynamics is broken by instantons, which as shown by 't Hooft resolves the longstanding "U(1) problem" of strong interactions, that of explaining why the flavor singlet pseudoscalar meson $\eta'$ is not light, unlike its flavor octet partners.

Anomaly Matching Conditions

The anomaly structure of a theory, as shown by 't Hooft, leads to important constraints on the formation of massless composite bound states. Consider a theory with a set of left-handed fermions $\psi_{if}$, with $i$ a color index acted on by a nonabelian gauge force, and $f$ an ungauged family or flavor index. Suppose that the family multiplet structure is such that the global chiral symmetries associated with the flavor index have nonvanishing anomalies $\mathrm{tr}\{T_\alpha, T_\beta\} T_\gamma$. Then the 't Hooft condition asserts that if the color forces result in the formation of composite massless bound states of the original completely confined fermions, and if there is no spontaneous breaking of the original global flavor symmetries, then these bound states must contain left-handed spin-1/2 composites with a representation structure $S$ that has the same anomaly coefficient as that in the underlying theory. In other words, we must have

\[ \mathrm{tr}\{S_\alpha, S_\beta\} S_\gamma = \mathrm{tr}\{T_\alpha, T_\beta\} T_\gamma \tag{9} \]

To prove this, one adjoins to the theory a set of right-handed spectator fermions carrying the flavor index $f$, with the same flavor structure as the original set, but which are not acted on by the color force. These right-handed fermions cancel the original anomaly, making the underlying theory anomaly free at zero color coupling; since dynamics cannot spontaneously generate anomalies, the theory, when the color dynamics is turned on, must also have no global chiral anomalies. This implies that the bound-state spectrum must conspire to cancel the anomalies associated with the right-handed spectators; in other words, the bound-state anomaly structure must match that of the original fermions. This anomaly matching condition has found applications in the study of the possible compositeness of quarks and leptons. It has also been applied to the derivation of nonperturbative dynamical results in whole classes of supersymmetric theories, where the combined tools of holomorphicity, instanton physics, and anomaly matching have given incisive results.

Global Structure of Anomalies

We noted earlier that chiral anomalies are irreducible, in that they cannot be eliminated by adding a local polynomial counter-term to the action. However, anomalies can be described by a nonlocal effective action, obtained by integrating out the fermion field dynamics, and this point of view proves very useful in the nonabelian case. Starting with the abelian case for orientation, we note that if $A^\mu$ is an external axial-vector field, and we write an effective action $\Gamma[A]$, then the axial-vector current $j^5_\mu$ associated with $A^\mu$ is given (up to an overall constant) by the variational derivative expression

\[ j^5_\mu(x) = \frac{\delta \Gamma[A]}{\delta A^\mu(x)} \tag{10a} \]

and the abelian anomaly appears as the fact that the expression

\[ \partial^\mu j^5_\mu = X \Gamma[A] = G \neq 0, \qquad X = \partial^\mu \frac{\delta}{\delta A^\mu(x)} \tag{10b} \]

is nonvanishing even when the theory is classically chiral invariant. Turning now to the nonabelian case, the variational derivative appearing in eqns [10a] and [10b] must be replaced by an appropriate
covariant derivative. In terms of the internal-symmetry component fields $A^\mu_a$ and $V^\mu_a$ of the Yang–Mills potentials of eqn [4a], one introduces operators

\[
\begin{aligned}
X_a(x) &= \partial^\mu \frac{\delta}{\delta A^\mu_a(x)} + f_{abc} V^\mu_b \frac{\delta}{\delta A^\mu_c(x)} + f_{abc} A^\mu_b \frac{\delta}{\delta V^\mu_c(x)} \\
Y_a(x) &= \partial^\mu \frac{\delta}{\delta V^\mu_a(x)} + f_{abc} V^\mu_b \frac{\delta}{\delta V^\mu_c(x)} + f_{abc} A^\mu_b \frac{\delta}{\delta A^\mu_c(x)}
\end{aligned}
\tag{11a}
\]

with $f_{abc}$ the antisymmetric nonabelian group structure constants. The operators $X_a$ and $Y_a$ are easily seen to obey the commutation relations

[...] the consistency conditions. Subsequently, Witten gave a new construction of this local action, in terms of the integral of a fifth-rank antisymmetric tensor over a five-dimensional disk which has a four-dimensional space as its boundary. He also showed that requiring $e^{i\Gamma}$ to be independent of the choice of the spanning disk requires, in analogy with Dirac's quantization condition for monopole charge, the condition that the overall coefficient in the nonabelian anomaly be quantized in integer multiples. Comparison with the lowest-order triangle diagram shows that in the case of SU($N_c$) gauge theory, this integer is just the number of colors $N_c$. Thus, global considerations tightly constrain the nonabelian chiral anomaly structure, and dictate that up to an integer-proportionality constant, it must have the form given in eqns [4a] and [4b].
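
Alongside these global constraints, the local cancellation condition of eqn [7] can be checked by direct arithmetic in the hypercharge sector. The following is a minimal sketch, not part of the original article. It assumes the usual assignment of one standard-model generation written as left-handed Weyl fermions, with the hypercharge normalization $Q = T_3 + Y$, and it takes the "three nontrivial sum rules" to be the U(1)$^3$, SU(2)$^2$U(1), and SU(3)$^2$U(1) conditions, with the mixed gravitational condition as the fourth. The table `GENERATION` and the function `anomaly_sums` are hypothetical names introduced only for illustration.

```python
from fractions import Fraction

# One standard-model generation, all fields written as left-handed Weyl fermions.
# Columns: label, SU(3) multiplicity, SU(2) multiplicity, hypercharge Y.
# Assumed normalization (not fixed by the text): Q = T3 + Y.
GENERATION = [
    ("Q",   3, 2, Fraction(1, 6)),
    ("u^c", 3, 1, Fraction(-2, 3)),
    ("d^c", 3, 1, Fraction(1, 3)),
    ("L",   1, 2, Fraction(-1, 2)),
    ("e^c", 1, 1, Fraction(1, 1)),
]


def anomaly_sums(fermions):
    """Return the hypercharge sums that must vanish for anomaly cancellation."""
    return {
        # U(1)^3: every component of every multiplet contributes Y^3.
        "U(1)^3": sum(c * w * y**3 for _, c, w, y in fermions),
        # SU(2)^2 U(1): only weak doublets contribute, once per color.
        "SU(2)^2 U(1)": sum(c * y for _, c, w, y in fermions if w == 2),
        # SU(3)^2 U(1): only color triplets contribute, once per weak component.
        "SU(3)^2 U(1)": sum(w * y for _, c, w, y in fermions if c == 3),
        # Mixed gravitational-U(1): every component contributes Y.
        "grav^2 U(1)": sum(c * w * y for _, c, w, y in fermions),
    }


if __name__ == "__main__":
    for name, value in anomaly_sums(GENERATION).items():
        print(f"{name}: {value}")
```

All four sums vanish for the full generation. Restricting the table to the three quark multiplets leaves a U(1)$^3$ sum of $-3/4$, which illustrates that under these assumptions the cancellation requires quarks and leptons together.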
nonvanishing contribution to the right-hand side of eqn [13c], giving the lowest-order trace anomaly. Unlike the chiral anomaly, the trace anomaly is renormalized in higher orders of perturbation theory; heuristically, the reason is that whereas boson field regulators do not affect the chiral symmetry properties of a gauge theory (which are determined just by the fermionic terms in the Lagrangian), they do alter the energy–momentum tensor, since gravitation couples to all fields, including regulator fields. An analysis using the Callan–Symanzik equations shows, however, that the trace anomaly is computable to all orders in terms of various renormalization group functions of the coupling. For example, in abelian electrodynamics, defining $\beta(\alpha)$ and $\delta(\alpha)$ by $\beta(\alpha) = (m/\alpha) \partial\alpha/\partial m$ and $1 + \delta(\alpha) = (m/m_0) \partial m_0/\partial m$, the trace of the energy–momentum tensor is given to all orders by

\[ \theta^\mu_{\ \mu} = [1 + \delta(\alpha)] m_0 \bar{\psi}\psi + \frac{1}{4} \beta(\alpha) N[F_{\lambda\sigma} F^{\lambda\sigma}] + \cdots \tag{14} \]

with $N[\ ]$ specifying conditions that make the division into two terms in eqn [14] unique, and with the ellipsis $\cdots$ indicating terms that vanish by the equations of motion. A similar relation holds in the nonabelian case, again with the $\beta$ function appearing as the coefficient of the anomalous $\mathrm{tr}\, N[F_{\lambda\sigma} F^{\lambda\sigma}]$ term.

Just as in the chiral anomaly case, when spin-0, spin-1/2, or spin-1 fields propagate on a background spacetime, there are curvature-dependent contributions to the trace anomaly, in other words, gravitational anomalies. These typically take the form of complicated linear combinations of terms of the form $R^2$, $R_{\mu\nu} R^{\mu\nu}$, $R_{\mu\nu\lambda\sigma} R^{\mu\nu\lambda\sigma}$, $R_{,\mu}^{\ \ ;\mu}$, with coefficients depending on the matter fields involved.

In supersymmetric theories, the axial-vector current and the energy–momentum tensor are both components of the supercurrent, and so their anomalies imply the existence of corresponding supercurrent anomalies. The issue of how the nonrenormalization of chiral anomalies (which have a supercurrent generalization given by the Konishi anomaly), and the renormalization of trace anomalies, can coexist in supersymmetric theories originally engendered considerable confusion. This apparent puzzle is now understood in the context of a perturbatively exact expression for the $\beta$ function in supersymmetric field theories (the so-called NSVZ, for Novikov, Shifman, Vainshtein, and Zakharov, $\beta$ function). Supersymmetry anomalies can be used to infer the structure of effective actions in supersymmetric theories, and these in turn have important implications for possibilities for dynamical supersymmetry breaking. Anomalies may also play a role, through "anomaly mediation," in communicating supersymmetry breaking in "hidden sectors" of a theory, which do not contain the physical fields that we directly observe, to the physical sector containing the observed fields.

Further Anomaly Topics

The above discussion has focused on some of the principal features and applications of anomalies. There are further topics of interest in the physics and mathematics of anomalies that are discussed in detail in the references cited in the "Further Reading" section. We briefly describe a few of them here.

Anomalies in Other Spacetime Dimensions and in String Theory

The focus above has been on anomalies in four-dimensional spacetime, but anomalies of various types occur both in lower-dimensional quantum field theories (such as theories in two- and three-dimensional spacetimes) and in quantum field theories in higher-dimensional spacetimes (such as $N = 1$ supergravity in ten-dimensional spacetime). Anomalies also play an important role in the formulation and consistency of string theory. The bosonic string is consistent only in 26-dimensional spacetime, and the analogous supersymmetric string only in ten-dimensional spacetime, because in other dimensions both these theories violate Lorentz invariance after quantization. In the Polyakov path-integral formulation of these string theories, these special dimensions are associated with the cancellation of the Weyl anomaly, which is the relevant form of the trace anomaly discussed above. Yang–Mills, gravitational, and mixed Yang–Mills gravitational anomalies make an appearance both in $N = 1$ ten-dimensional supergravity and in superstring theory, and again special dimensions play a role. In these theories, only when the associated internal symmetry groups are either SO(32) or E$_8 \times$ E$_8$ is elimination of all anomalies possible, by cancellation of hexagon-diagram anomalies with anomalous tree diagrams involving exchange of a massless antisymmetric two-form field. This mechanism, due to Green and Schwarz, requires the factorization of a sixth-order trace invariant that appears in the hexagon anomaly in terms of lower-order invariants, as well as two numerical conditions on the adjoint representation generator structure, restricting the allowed gauge groups to the two noted above.

Covariant versus Consistent Anomalies; Descent Equations

The nonabelian anomaly of eqns [4a] and [4b] is called the "consistent anomaly," because it obeys the
Wess–Zumino consistency conditions of eqn [12c]. This anomaly, however, is not gauge covariant, as can be seen from the fact that it involves not only the Yang–Mills field strengths $F^{\mu\nu}_{V,A}$, but the potentials $V^\mu$, $A^\mu$ as well. It turns out to be possible, by adding appropriate polynomials to the currents, to transform the consistent anomaly to a form, called the "covariant anomaly," which is gauge covariant under gauge transformations of the potentials $V^\mu$, $A^\mu$. This anomaly, however, does not obey the Wess–Zumino consistency conditions, and cannot be obtained from variation of an effective action functional.

The consistent anomalies (but not the covariant anomalies) obey a remarkable set of relations, called the Stora–Zumino descent equations, which relate the abelian anomaly in $2n + 2$ spacetime dimensions to the nonabelian anomaly in $2n$ spacetime dimensions. This set of equations has been interpreted physically by Callan and Harvey as reflecting the fact that the Dirac equation has chiral zero modes in the presence of strings in $2n + 2$ dimensions and of domain walls in $2n + 1$ dimensions.

Anomalies and Fermion Doubling in Lattice Gauge Theories

A longstanding problem in lattice formulations of gauge field theories is that when fermions are introduced on the lattice, the process of discretization introduces an undesirable doubling of the fermion particle modes. In particular, when an attempt is made to put chiral gauge theories, such as the electroweak theory, on the lattice, one finds that the doublers eliminate the chiral anomalies, by cancellation between modes with positive and negative axial-vector charge. Thus, for a long time, it appeared doubtful whether chiral gauge theories could be simulated on the lattice. However, recent work has led to formulations of lattice fermions that use a mathematical analog of a domain wall to successfully incorporate chiral fermions and the chiral anomaly into lattice gauge theory calculations.

Relation of Anomalies to the Atiyah–Singer Index Theorem

The singlet ($\lambda^a_A = 1$) anomaly of eqn [4b] is closely related to the Atiyah–Singer index theorem. Specifically, the Euclidean spacetime integral of the singlet anomaly constructed from a gauge field can be shown to give the index of the related Dirac operator for a fermion moving in that background gauge field, where the index is defined as the difference between the numbers of right- and left-handed zero-eigenvalue normalizable solutions of the Dirac equation. Since the index is a topological invariant, this again implies that the Euclidean spacetime integral of the anomaly is a topological invariant, as noted above in our discussion of instanton-related applications of anomalies.

Retrospect

The wide range of implications of anomalies has surprised, even astonished, the founders of the subject. New anomaly applications have appeared within the last few years, and very likely the future will see continued growth of the area of quantum field theory concerned with the physics and mathematics of anomalies.

Acknowledgment

This work is supported, in part, by the Department of Energy under grant #DE-FG02-90ER40542.

See also: Bosons and Fermions in External Fields; BRST Quantization; Effective Field Theories; Gauge Theories from Strings; Gerbes in Quantum Field Theory; Index Theorems; Lagrangian Dispersion (Passive Scalar); Lattice Gauge Theory; Nonperturbative and Topological Aspects of Gauge Theory; Quantum Electrodynamics and Its Precision Tests; Quillen Determinant; Renormalization: General Theory; Seiberg–Witten Theory.

Further Reading

Adler SL (1969) Axial-vector vertex in spinor electrodynamics. Physical Review 177: 2426–2438.
Adler SL (1970) Perturbation theory anomalies. In: Deser S, Grisaru M, and Pendleton H (eds.) Lectures on Elementary Particles and Quantum Field Theory, vol. 1, pp. 3–164. Cambridge, MA: MIT Press.
Adler SL (2005) Anomalies to all orders. In: 't Hooft G (ed.) Fifty Years of Yang–Mills Theory, pp. 187–228. Singapore: World Scientific.
Adler SL and Bardeen WA (1969) Absence of higher order corrections in the anomalous axial-vector divergence equation. Physical Review 182: 1517–1536.
Bardeen W (1969) Anomalous Ward identities in spinor field theories. Physical Review 184: 1848–1859.
Bell JS and Jackiw R (1969) A PCAC puzzle: $\pi^0 \to \gamma\gamma$ in the $\sigma$-model. Nuovo Cimento A 60: 47–61.
Bertlmann RA (1996) Anomalies in Quantum Field Theory. Oxford: Clarendon.
De Azcarraga JA and Izquierdo JM (1995) Lie Groups, Lie Algebras, Cohomology and Some Applications in Physics, ch. 10. Cambridge: Cambridge University Press.
Fujikawa K and Suzuki H (2004) Path Integrals and Quantum Anomalies. Oxford: Oxford University Press.
Golterman M (2001) Lattice chiral gauge theories. Nuclear Physics Proceeding Supplements 94: 189–203.
Green MB, Schwarz JH, and Witten E (1987) Superstring Theory, vol. 2, sects. 13.3–13.5. Cambridge: Cambridge University Press.
Hasenfratz P (2005) Chiral symmetry on the lattice. In: 't Hooft G (ed.) Fifty Years of Yang–Mills Theory, pp. 377–398. Singapore: World Scientific.
212 Arithmetic Quantum Chaos
Jackiw R (1985) Field theoretic investigations in current algebra and topological investigations in quantum gauge theories. In: Treiman S, Jackiw R, Zumino B, and Witten E (eds.) Current Algebra and Anomalies. Singapore: World Scientific and Princeton: Princeton University Press.
Jackiw R (2005) Fifty years of Yang–Mills theory and our moments of triumph. In: 't Hooft G (ed.) Fifty Years of Yang–Mills Theory, pp. 229–251. Singapore: World Scientific.
Makeenko Y (2002) Methods of Contemporary Gauge Theory, ch. 3. Cambridge: Cambridge University Press.
Neuberger H (2000) Chiral fermions on the lattice. Nuclear Physics Proceeding Supplements 83: 67–76.
Polchinski J (1999) String Theory, vol. 1, sect. 3.4; vol. 2, sect. 12.2. Cambridge: Cambridge University Press.
Shifman M (1997) Non-perturbative dynamics in supersymmetric gauge theories. Progress in Particle and Nuclear Physics 39: 1–116.
van Nieuwenhuizen P (1988) Anomalies in Quantum Field Theory: Cancellation of Anomalies in d = 10 Supergravity. Leuven: Leuven University Press.
Volovik GE (2003) The Universe in a Helium Droplet, ch. 18. Oxford: Clarendon.
Weinberg S (1996) The Quantum Theory of Fields, Vol. II Modern Applications, ch. 22. Cambridge: Cambridge University Press.
Zee A (2003) Quantum Field Theory in a Nutshell, sect. IV.7. Princeton: Princeton University Press.
$$\mathrm{PSL}(2,\mathbb{Z}) = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \mathrm{PSL}(2,\mathbb{R}) : a, b, c, d \in \mathbb{Z} \right\} \Big/ \{\pm 1\}$$

A fundamental domain for the action of the modular group PSL(2, Z) on H is the set

$$F_{\mathrm{PSL}(2,\mathbb{Z})} = \left\{ z \in H : |z| > 1,\ -\tfrac12 < \operatorname{Re} z < \tfrac12 \right\} \qquad [9]$$

(see Figure 2). The modular group is generated by the translation

$$\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} : z \mapsto z + 1$$

and the inversion

$$\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} : z \mapsto -1/z$$

These generators identify sections of the boundary of $F_{\mathrm{PSL}(2,\mathbb{Z})}$. By gluing the fundamental domain along identified edges, we obtain a realization of the modular surface, a noncompact surface with one cusp at $z \to \infty$, and two conic singularities at $z = i$ and $z = 1/2 + i\sqrt{3}/2$.

An interesting example of a compact arithmetic surface is the "regular octagon", a hyperbolic surface of genus 2. Its fundamental domain is shown in Figure 3 as a subset of the Poincaré disk $D = \{z \in \mathbb{C} : |z| < 1\}$, which yields an alternative parametrization of the hyperbolic plane H. In these coordinates, the Riemannian line and volume element read

$$ds^2 = \frac{4(dx^2 + dy^2)}{(1 - x^2 - y^2)^2}, \qquad dA = \frac{4\,dx\,dy}{(1 - x^2 - y^2)^2} \qquad [10]$$

Figure 3 Fundamental domain of the regular octagon in the Poincaré disk.

The group of orientation-preserving isometries is now represented by $\mathrm{PSU}(1,1) = \mathrm{SU}(1,1)/\{\pm 1\}$, where

$$\mathrm{SU}(1,1) = \left\{ \begin{pmatrix} \alpha & \beta \\ \bar\beta & \bar\alpha \end{pmatrix} : \alpha, \beta \in \mathbb{C},\ |\alpha|^2 - |\beta|^2 = 1 \right\} \qquad [11]$$

acting on D as above via fractional linear transformations. The fundamental group of the regular octagon surface is the subgroup of all elements in PSU(1, 1) with coefficients of the form

$$\alpha = k + l\sqrt{2}, \qquad \beta = (m + n\sqrt{2})\sqrt{1 + \sqrt{2}} \qquad [12]$$

where $k, l, m, n \in \mathbb{Z}[i]$, that is, Gaussian integers of the form $k_1 + i k_2$, $k_1, k_2 \in \mathbb{Z}$. Note that not all choices of $k, l, m, n \in \mathbb{Z}[i]$ satisfy the condition $|\alpha|^2 - |\beta|^2 = 1$. Since all elements $\gamma \neq 1$ of $\Gamma$ act fix-point free on H, the surface $\Gamma\backslash H$ is smooth without conic singularities.

In the following, we will restrict our attention to a representative case, the modular surface with $\Gamma = \mathrm{PSL}(2,\mathbb{Z})$.
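The translation $z \mapsto z + 1$ and the inversion $z \mapsto -1/z$ described above are enough to move any point of the upper half-plane into the closure of the fundamental domain [9]. The following is a minimal Python sketch of that reduction. The function name, the tolerance, and the step limit are illustrative assumptions rather than part of the article, and points that land on the boundary are simply returned where the iteration stops.

```python
import math

def reduce_to_fundamental_domain(z, tol=1e-12, max_steps=10_000):
    """Map z (Im z > 0) into the closure of {|z| > 1, |Re z| < 1/2}.

    Only the two generators of the modular group are used:
    integer translations z -> z + k and the inversion z -> -1/z.
    """
    if z.imag <= 0:
        raise ValueError("z must lie in the upper half-plane")
    for _ in range(max_steps):
        # Translate by an integer so that -1/2 <= Re z < 1/2.
        z = complex(z.real - math.floor(z.real + 0.5), z.imag)
        if abs(z) >= 1 - tol:
            return z
        # Inversion; for |z| < 1 this strictly increases Im z.
        z = -1 / z
    raise RuntimeError("no convergence; z may be too close to the real axis")

w = reduce_to_fundamental_domain(complex(0.3, 0.1))
print(w)
```

Each inversion raises the imaginary part, which is why the loop terminates for any starting point with positive imaginary part (up to floating-point limits).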
Eigenvalue Statistics and Selberg Trace Formula

The statistical properties of the rescaled eigenvalues $X_j$ (cf. [4]) of the Laplacian can be characterized by their distribution in small intervals

$$N(x, L) := \#\{ j : x \le X_j \le x + L \} \qquad [13]$$

Bogomolny (1997)) suggest that the $X_j$ are asymptotically Poisson distributed:

Conjecture 1 For any bounded function $g : \mathbb{Z}_{\ge 0} \to \mathbb{C}$ we have

$$\frac{1}{X} \int_X^{2X} g(N(x, L))\,dx \to \sum_{k=0}^{\infty} g(k)\,\frac{L^k e^{-L}}{k!} \qquad [14]$$

as $X \to \infty$.

One may also consider larger intervals, where $L \to \infty$ as $X \to \infty$. In this case, the assumption on the independence of the $X_j$ predicts a central-limit theorem. Weyl's law [3] implies that the expectation value is asymptotically, for $X \to \infty$,

$$\frac{1}{X} \int_X^{2X} N(x, L)\,dx \sim L \qquad [15]$$

This asymptotics holds for any sequence of L bounded away from zero (e.g., L constant, or $L \to \infty$).

Define the variance by

$$\Sigma^2(X, L) = \frac{1}{X} \int_X^{2X} \big( N(x, L) - L \big)^2\,dx \qquad [16]$$

In view of the above conjecture, one expects $\Sigma^2(X, L) \sim L$ in the limit $X \to \infty$, $L/\sqrt{X} \to 0$ (the variance exhibits a less universal behavior in the range $L \gg \sqrt{X}$; the notation $A \ll B$ means there is a constant $c > 0$ such that $A \le cB$; cf. Sarnak (1995)), and a central-limit theorem for the fluctuations around the mean:

Conjecture 2 For any bounded function $g : \mathbb{R} \to \mathbb{C}$ we have

$$\frac{1}{X} \int_X^{2X} g\!\left( \frac{N(x, L) - L}{\sqrt{\Sigma^2(x, L)}} \right) dx \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(t)\, e^{-t^2/2}\,dt \qquad [17]$$

as $X, L \to \infty$, $L \ll X$.

The main tool in the attempts to prove the above conjectures has been the Selberg trace formula. It relates sums over eigenvalues of the Laplacians to sums over lengths of closed geodesics on the hyperbolic surface. The trace formula is in its simplest form in the case of compact hyperbolic surfaces; we have

$$\sum_{j=0}^{\infty} h(\rho_j) = \frac{\mathrm{Area}(M)}{4\pi} \int_{-\infty}^{\infty} h(\rho) \tanh(\pi\rho)\,\rho\,d\rho + \sum_{\gamma \in H} \sum_{n=1}^{\infty} \frac{\ell_\gamma\, g(n\ell_\gamma)}{2 \sinh(n\ell_\gamma/2)} \qquad [18]$$

where H is the set of all primitive oriented closed geodesics $\gamma$, and $\ell_\gamma$ their lengths. The quantity $\rho_j$ is related to the eigenvalue $\lambda_j$ by the equation $\lambda_j = \rho_j^2 + 1/4$. The trace formula [18] holds for a large class of even test functions h. For example, it is sufficient to assume that h is infinitely differentiable, and that the Fourier transform of h,

$$g(t) = \frac{1}{2\pi} \int_{\mathbb{R}} h(\rho)\, e^{-i\rho t}\,d\rho \qquad [19]$$

has compact support. The trace formula for noncompact surfaces has additional terms from the parabolic elements in the corresponding group, and includes also sums over the resonances of the continuous part of the spectrum. The noncompact modular surface behaves in many ways like a compact surface. In particular, Selberg showed that the number of eigenvalues embedded in the continuous spectrum satisfies the same Weyl law as in the compact case (Sarnak 2003).

Setting

$$h(\rho) = \chi_{[X, X+L]}\!\left( \frac{\mathrm{Area}(M)}{4\pi} \left( \rho^2 + \frac14 \right) \right) \qquad [20]$$

where $\chi_{[X, X+L]}$ is the characteristic function of the interval $[X, X + L]$, we may thus view $N(X, L)$ as the left-hand side of the trace formula. The above test function h is, however, not admissible, and requires appropriate smoothing. Luo and Sarnak (cf. Sarnak (2003)) developed an argument of this type to obtain a lower bound on the average number variance,

$$\frac{1}{L} \int_0^L \Sigma^2(X, L')\,dL' \gg \frac{\sqrt{X}}{(\log X)^2} \qquad [21]$$

in the regime $\sqrt{X}/\log X \ll L \ll \sqrt{X}$, which is consistent with the Poisson conjecture $\Sigma^2(X, L) \sim L$. Bogomolny, Leyvraz, and Schmit suggested a remarkable limiting formula for the two-point correlation function for the modular surface (cf. Bogomolny et al. (1997) and Bogomolny (2006)), based on an analysis of the correlations between multiplicities of lengths of closed geodesics. A rigorous analysis of the fluctuations of multiplicities is given by Peter (cf. Bogomolny (2006)). Rudnick (2005) has recently established a smoothed version of Conjecture 2 in the regime

$$\frac{\sqrt{X}}{L} \to \infty, \qquad \frac{\sqrt{X}}{L \log X} \to 0 \qquad [22]$$

where the characteristic function in [20] is replaced by a certain class of smooth test functions.

All of the above approaches use the Selberg trace formula, exploiting the particular properties of the
distribution of lengths of closed geodesics in arithmetic hyperbolic surfaces. These will be discussed in more detail in the next section, following the work of Bogomolny, Georgeot, Giannoni and Schmit, Bolte, and Luo and Sarnak (see Bogomolny et al. (1997) and Sarnak (1995) for references).

Distribution of Lengths of Closed Geodesics

The classical prime geodesic theorem asserts that the number $N(\ell)$ of primitive closed geodesics of length less than $\ell$ is asymptotically

$$N(\ell) \sim \frac{e^{\ell}}{\ell} \qquad [23]$$

One of the significant geometrical characteristics of arithmetic hyperbolic surfaces is that the number of closed geodesics with the same length $\ell$ grows exponentially with $\ell$. This phenomenon is most easily explained in the case of the modular surface, where the set of lengths $\ell$ appearing in the length spectrum is characterized by the condition

$$2 \cosh(\ell/2) = |\operatorname{tr} \gamma| \qquad [24]$$

where $\gamma$ runs over all elements in SL(2, Z) with $|\operatorname{tr} \gamma| > 2$. It is not hard to see that any integer $n > 2$ appears in the set $\{ |\operatorname{tr} \gamma| : \gamma \in \mathrm{SL}(2,\mathbb{Z}) \}$, and hence the set of distinct lengths of closed geodesics is

$$\mathcal{L} = \{ 2 \operatorname{arcosh}(n/2) : n = 3, 4, 5, \ldots \} \qquad [25]$$

Therefore, the number of distinct lengths less than $\ell$ is asymptotically (for large $\ell$)

$$N'(\ell) = \#(\mathcal{L} \cap [0, \ell]) \sim e^{\ell/2} \qquad [26]$$

Equations [26] and [23] say that on average the number of geodesics with the same lengths is at least $\asymp e^{\ell/2}/\ell$.

The prime geodesic theorem [23] holds equally for all hyperbolic surfaces with finite area, while [26] is specific to the modular surface. For general arithmetic surfaces, we have the upper bound

$$N'(\ell) \le c\, e^{\ell/2} \qquad [27]$$

for some constant $c > 0$ that may depend on the surface. Although one expects $N'(\ell)$ to be asymptotic to $(1/2) N(\ell)$ for generic surfaces (since most geodesics have a time-reversal partner which thus has the same length, and otherwise all lengths are distinct), there are examples of nonarithmetic Hecke triangles where numerical and heuristic arguments suggest $N'(\ell) \sim c_1 e^{c_2 \ell}/\ell$ for suitable constants $c_1 > 0$ and $0 < c_2 < 1/2$ (cf. Bogomolny (2006)). Hence exponential degeneracy in the length spectrum seems to occur in a weaker form also for nonarithmetic surfaces.

A further useful property of the length spectrum of arithmetic surfaces is the bounded clustering property: there is a constant C (again surface dependent) such that

$$\#(\mathcal{L} \cap [\ell, \ell + 1]) \le C \qquad [28]$$

for all $\ell$. This fact is evident in the case of the modular surface; the general case is proved by Luo and Sarnak (cf. Sarnak (1995)).

Quantum Unique Ergodicity

The unit tangent bundle of a hyperbolic surface $\Gamma\backslash H$ describes the physical phase space on which the classical dynamics takes place. A convenient parametrization of the unit tangent bundle is given by the quotient $\Gamma\backslash\mathrm{PSL}(2,\mathbb{R})$; this may be seen by means of the Iwasawa decomposition for an element $g \in \mathrm{PSL}(2,\mathbb{R})$,

$$g = \begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix} \begin{pmatrix} y^{1/2} & 0 \\ 0 & y^{-1/2} \end{pmatrix} \begin{pmatrix} \cos\theta/2 & \sin\theta/2 \\ -\sin\theta/2 & \cos\theta/2 \end{pmatrix} \qquad [29]$$

where $x + iy \in H$ represents the position of the particle in $\Gamma\backslash H$ in half-plane coordinates, and $\theta \in [0, 2\pi)$ the direction of its velocity. Multiplying the matrix [29] from the left by $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ and writing the result again in the Iwasawa form [29], one obtains the action

$$(z, \theta) \mapsto \left( \frac{az + b}{cz + d},\ \theta - 2 \arg(cz + d) \right) \qquad [30]$$

which represents precisely the geometric action of isometries on the unit tangent bundle.

The geodesic flow $\Phi^t$ on $\Gamma\backslash\mathrm{PSL}(2,\mathbb{R})$ is represented by the right translation

$$\Phi^t : \Gamma g \mapsto \Gamma g \begin{pmatrix} e^{t/2} & 0 \\ 0 & e^{-t/2} \end{pmatrix} \qquad [31]$$

The Haar measure $\mu$ on PSL(2, R) is thus trivially invariant under the geodesic flow. It is well known that $\mu$ is not the only invariant measure, that is, $\Phi^t$ is not uniquely ergodic, and that there is in fact an abundance of invariant measures. The simplest examples are those with uniform mass on one, or a countable collection of, closed geodesics.

To test the distribution of an eigenfunction $\varphi_j$ in phase space, one associates with a function
Lindenstrauss has proved this conjecture for compact arithmetic hyperbolic surfaces of congruence type (such as the second example in the section "Hyperbolic surfaces") for special bases of eigenfunctions, using ergodic-theoretic methods. These will be discussed in more detail in the next section. His results extend to the noncompact case, that is, to the modular surface where $\Gamma = \mathrm{PSL}(2,\mathbb{Z})$. Here he shows that any weak limit of subsequences of $\mu_j$ is of the form $c\mu$, where c is a constant with values in [0, 1]. One believes that c = 1, but with present techniques it cannot be ruled out that a proportion of the mass of the eigenfunction escapes into the noncompact cusp of the surface. For the modular surface, c = 1 can be proved under the assumption of the generalized Riemann hypothesis (see the section "Eigenfunctions and L-functions" and Sarnak (2003)). QUE also holds for the continuous part of the spectrum, which is furnished by the Eisenstein series $E(z, s)$, where $s = 1/2 + ir$ is the spectral parameter. Note that the measures associated with the matrix elements

$$\mu_r(a) = \langle \mathrm{Op}(a) E(\cdot, 1/2 + ir),\ E(\cdot, 1/2 + ir) \rangle \qquad [33]$$

are not probability measures but only Radon measures, since $E(z, s)$ is not square-integrable. Luo and Sarnak, and Jakobson have shown that

$$\lim_{r \to \infty} \frac{\mu_r(a)}{\mu_r(b)} = \frac{\mu(a)}{\mu(b)} \qquad [34]$$

for suitable test functions $a, b \in C^\infty(\Gamma\backslash\mathrm{PSL}(2,\mathbb{R}))$ (cf. Sarnak (2003)).

The set $M_n$ of matrices with integer coefficients and determinant n can be expressed as the disjoint union

$$M_n = \bigcup_{\substack{a, d = 1 \\ ad = n}}^{n} \ \bigcup_{b=0}^{d-1} \Gamma \begin{pmatrix} a & b \\ 0 & d \end{pmatrix} \qquad [36]$$

and hence the sum in [35] can be viewed as a sum over the cosets in this decomposition. We note the product formula

$$T_m T_n = \sum_{d \mid \gcd(m, n)} T_{mn/d^2} \qquad [37]$$

The Hecke operators are normal, form a commuting family, and in addition they commute with the Laplacian $\Delta$. In the following, we consider an orthonormal basis of eigenfunctions $\varphi_j$ of $\Delta$ that are simultaneously eigenfunctions of all Hecke operators. We will refer to such eigenfunctions as Hecke eigenfunctions. The above assumption is automatically satisfied, if the spectrum of $\Delta$ is simple (i.e., no eigenvalues coincide), a property conjectured by Cartier and supported by numerical computations. Lindenstrauss' work is based on the following two observations. Firstly, all quantum limits of Hecke eigenfunctions are geodesic-flow invariant measures of positive entropy, and secondly, the only such measure of positive entropy that is recurrent under Hecke correspondences is the Lebesgue measure.

The first property is proved by Bourgain and Lindenstrauss (2003) and refines arguments of Rudnick and Sarnak (1994) and Wolpert (2001) on the distribution of Hecke points (see Sarnak (2003) for
references to these papers). For a given point $z \in H$ the set of Hecke points is defined as

$$T_n(z) := M_n z \qquad [38]$$

For most primes, the set $T_{p^k}(z)$ comprises $(p + 1)p^{k-1}$ distinct points on $\Gamma\backslash H$. For each z, the Hecke operator $T_n$ may now be interpreted as the adjacency matrix for a finite graph embedded in $\Gamma\backslash H$, whose vertices are the Hecke points $T_n(z)$. Hecke eigenfunctions $\varphi_j$ with

$$T_n \varphi_j = \lambda_j(n)\, \varphi_j \qquad [39]$$

give rise to eigenfunctions of the adjacency matrix. Exploiting this fact, Bourgain and Lindenstrauss show that for a large set of integers n

$$|\varphi_j(z)|^2 \ll \sum_{w \in T_n(z)} |\varphi_j(w)|^2 \qquad [40]$$

holds, it is inessential for the proof of QUE due to the positive entropy of quantum limits discussed in the previous paragraph.

Eigenfunctions and L-Functions

An even eigenfunction $\varphi_j(z)$ for $\Gamma = \mathrm{SL}(2,\mathbb{Z})$ has the Fourier expansion

$$\varphi_j(z) = \sum_{n=1}^{\infty} a_j(n)\, y^{1/2} K_{i\rho_j}(2\pi n y) \cos(2\pi n x) \qquad [41]$$

We associate with $\varphi_j(z)$ the Dirichlet series

$$L(s, \varphi_j) = \sum_{n=1}^{\infty} a_j(n)\, n^{-s} \qquad [42]$$
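The coset decomposition [36] behind the Hecke points [38] is easy to enumerate explicitly. The Python sketch below lists the upper-triangular representatives with $ad = n$ and $0 \le b < d$, and applies them to a point z through $z \mapsto (az + b)/d$. The function names are illustrative assumptions, and the code lists points in H only: it does not test whether two of them coincide on the quotient $\Gamma\backslash H$.

```python
def hecke_coset_representatives(n):
    """Integer matrices (a, b, 0, d) with a*d = n and 0 <= b < d, as in [36]."""
    reps = []
    for d in range(1, n + 1):
        if n % d == 0:
            a = n // d
            reps.extend((a, b, 0, d) for b in range(d))
    return reps

def hecke_points(n, z):
    """Images of z under the representatives, acting by z -> (a*z + b) / d."""
    return [(a * z + b) / d for (a, b, _, d) in hecke_coset_representatives(n)]

reps = hecke_coset_representatives(5)
points = hecke_points(5, 1j)
print(len(reps), points)
```

The number of representatives is the sum of the divisors of n; for a prime p this is p + 1, in agreement with the count $(p + 1)p^{k-1}$ quoted above for k = 1.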
A quantization of A is a unitary operator $U_N(A)$ on $L^2(\mathbb{Z}/N\mathbb{Z})$ satisfying the equation

$$U_N(A)^{-1}\, \mathrm{Op}_N(f)\, U_N(A) = \mathrm{Op}_N(f \circ A) \qquad [59]$$

for all $f \in C^\infty(\mathbb{T}^2)$. There are explicit formulas for $U_N(A)$ when A is in the group

$$\left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \mathrm{SL}(2,\mathbb{Z}) : ab \equiv cd \equiv 0 \bmod 2 \right\} \qquad [60]$$

These may be viewed as analogs of the Shale–Weil or metaplectic representation for SL(2). For example, the quantization of

$$A = \begin{pmatrix} 2 & 1 \\ 3 & 2 \end{pmatrix} \qquad [61]$$

yields

$$U_N(A)\psi(Q) = N^{-1/2} \sum_{Q' \bmod N} \exp\!\left[ \frac{2\pi i}{N} \left( Q^2 - QQ' + Q'^2 \right) \right] \psi(Q') \qquad [62]$$

Graffi, and Isola (1995). That is, [65] holds for all $j = 1, \ldots, N$. Rudnick and Kurlberg, and more recently Gurevich and Hadani, have established results on the rate of convergence analogous to [49]. These results are unconditional. Gurevich and Hadani use methods from algebraic geometry based on those developed by Deligne in his proof of the Weil conjectures (an analog of the Riemann hypothesis for finite fields).

In the case of quantum-cat maps, there are values of N for which the number of coinciding eigenvalues can be large, a major difference to what is expected for the modular surface. Linear combinations of eigenstates with the same eigenvalue are as well eigenstates, and may lead to different quantum limits. Indeed, Faure, Nonnenmacher, and De Bievre (see De Bievre (to appear)) have shown that there are subsequences of values of N, so that, for all $f \in C^\infty(\mathbb{T}^2)$,

$$\langle \mathrm{Op}(f)\psi_j^N, \psi_j^N \rangle \to \frac12 \int_{\mathbb{T}^2} f(\xi)\,d\xi + \frac12 f(0) \qquad [66]$$
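As a sanity check on the explicit formula [62], the operator with kernel $N^{-1/2} \exp[(2\pi i/N)(Q^2 - QQ' + Q'^2)]$ can be built as an $N \times N$ matrix and tested numerically for unitarity. The sketch below assumes exactly this form of the kernel. The function names are illustrative, and the check covers unitarity only, not the intertwining relation [59].

```python
import cmath

def cat_map_propagator(N):
    """N x N matrix with entries N^{-1/2} exp[(2 pi i / N)(Q^2 - Q Q' + Q'^2)]."""
    return [
        [cmath.exp(2j * cmath.pi * (Q * Q - Q * Qp + Qp * Qp) / N) / N ** 0.5
         for Qp in range(N)]
        for Q in range(N)
    ]

def unitarity_defect(U):
    """Largest absolute entry of U U^dagger minus the identity."""
    N = len(U)
    worst = 0.0
    for i in range(N):
        for j in range(N):
            s = sum(U[i][k] * U[j][k].conjugate() for k in range(N))
            worst = max(worst, abs(s - (1.0 if i == j else 0.0)))
    return worst

print(unitarity_defect(cat_map_propagator(7)))
```

The off-diagonal entries of $U U^\dagger$ reduce to geometric sums over N-th roots of unity, which vanish, so the defect should be at the level of floating-point rounding for any N.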
Lindenstrauss E Rigidity of multi-parameter actions. Israel Journal of Mathematics (Furstenberg Special Volume) (to appear).
Marklof J (2006) Energy level statistics, lattice point problems and almost modular functions. In: Cartier PE, Julia B, Moussa P, and Vanhove P (eds.) Frontiers in Number Theory, Physics and Geometry on Random Matrices, Zeta Functions, and Dynamical Systems, Springer Lecture Notes. Les Houches.
Rudnick Z (2001) On quantum unique ergodicity for linear maps of the torus. In: European Congress of Mathematics, (Barcelona, 2000), Progr. Math., vol. 202, pp. 429–437. Basel: Birkhäuser.
Rudnick Z (2005) A central limit theorem for the spectrum of the modular group, Park city lectures. Annales Henri Poincaré 6: 863–883.
Sarnak P Arithmetic quantum chaos. The Schur lectures (1992) (Tel Aviv), Israel Math. Conf. Proc., 8, pp. 183–236. Bar-Ilan Univ., Ramat Gan, 1995.
Sarnak P (2003) Spectra of hyperbolic surfaces. Bulletin of the American Mathematical Society (N.S.) 40(4): 441–478.
This boundary would have to be interpreted as infinity for the spacetime because it takes infinitely long for the g-geodesics to get there.

We arrived at this idea of attaching a boundary by considering the metric structure only up to arbitrary scaling, that is, by looking at metrics which differ only by a factor. This is the conformal structure of the spacetime manifold in question. By considering the spacetime only from the point of view of its conformal structure we obtain a picture of the spacetime which is essentially finite but which leaves its causal properties unchanged, and hence in particular the properties of wave propagation. This is exactly what is needed for a rigorous treatment of radiation emitted by the system.

Infinity for Minkowski Spacetime

The above discussion suggests that we should consider the spacetime metric only up to scale, that is, to focus on the conformal structure of the spacetime in question. Since we are interested in systems which approach Minkowski spacetime at large distances from the source, it is illuminating to study Minkowski spacetime as a preliminary example. So consider the manifold $M = \mathbb{R}^4$ equipped with the flat metric

$$g = dt^2 - dr^2 - r^2\,d\sigma^2$$

where $d\sigma^2$ is the standard metric on the unit sphere $S^2$. We now introduce retarded and advanced time coordinates, which are adapted to the null cone and hence to the conformal structure of g by the definition

$$u = t - r, \qquad v = t + r$$

and obtain the metric in the form

$$g = du\,dv - \tfrac14 (v - u)^2\,d\sigma^2$$

Clearly, the metric is undefined at events with $\cos U = 0$ or $\cos V = 0$. These would correspond to events with $u = \pm\infty$ or $v = \pm\infty$ which do not lie in M. However, by defining the function

$$\Omega = 2 \cos U \cos V$$

we find that the metric $\hat g = \Omega^2 g$ with

$$\hat g = 4\,dU\,dV - \sin^2(V - U)\,d\sigma^2 \qquad [3]$$

is conformally equivalent to g and is regular for all values of U and V (keeping $V \ge U$). In fact, by defining the coordinates

$$T = V + U, \qquad R = V - U$$

this metric takes the form

$$\hat g = dT^2 - dR^2 - \sin^2 R\,d\sigma^2 \qquad [4]$$

the metric of the static Einstein universe E. Thus, we may regard the Minkowski spacetime as the part of the Einstein cylinder defined by restricting the coordinates T and R to the region $|T| + R < \pi$ as illustrated in Figure 1. Although M can be considered as being diffeomorphic to the shaded part in Figure 1, these two manifolds are not isometric. This is obvious from considering the properties of the events lying on
the boundary $\partial M$ of M in E. Fix a point P inside M and follow a null geodesic with respect to the metric $\hat g$ from P toward the future. It will intersect $\partial M$ after a finite amount of its affine parameter has elapsed. When we follow a null geodesic with respect to g from P in the same direction, we find that it does not reach $\partial M$ for any value of its affine parameter. Thus, the boundary is at infinity for the metric g but at a finite location with respect to the metric $\hat g$. When we consider all possible kinds of geodesics for the metric g we find that $\partial M$ consists of five qualitatively different pieces. The future pointing timelike geodesics all approach the point $i^+$ given by $(T, R) = (\pi, 0)$, while the past-pointing geodesics approach $i^-$ with coordinates $(-\pi, 0)$. All spacelike geodesics come arbitrarily close to a point $i^0$ with coordinates $(0, \pi)$ (located on the front of the cylinder in Figure 1). Null geodesics, however, are different. For any point $(T, \pi - |T|)$ with $T \ne 0$ on $\partial M$ there are g-null-geodesics which come arbitrarily close.

In this sense, we may regard $\partial M$ as consisting of limit points obtained by tracing g-geodesics for infinite values of their affine parameters. According to the causal character of the geodesics the set of their respective limit points is called future/past timelike infinity $i^\pm$, spacelike infinity $i^0$ or future/past null-infinity, denoted by $\mathscr{I}^\pm$. These two parts of null-infinity are three-dimensional regular submanifolds of the embedding manifold E, while the points $i^\pm$, $i^0$ are regular points in E in the sense that the metric $\hat g$ is regular there. This is not automatic, considering the fact that infinitely many geodesics converge to a single point. However, the flatness of Minkowski spacetime guarantees that the geodesics approach at just the appropriate rate for the limit points to be regular.

This example shows that the structure of the boundary is determined entirely by the metric g of Minkowski spacetime. If we had chosen a different function $\Omega' = \omega\Omega$ with $\omega > 0$ then we would not have obtained the Einstein cylinder but some different Lorentzian manifold $(M', g')$. Yet, the boundary of M in $M'$ would have had the same properties.

Asymptotically Flat Spacetimes

The physical idea of an isolated system is captured mathematically by an asymptotically flat spacetime. Since such a spacetime M is expected to approach Minkowski spacetime asymptotically, the asymptotic structure of M is also expected to be similar to that of Minkowski spacetime. This expectation is expressed in

Definition 1 A spacetime $(M, g_{ab})$ is called asymptotically simple if there exists a manifold-with-boundary $\hat M$ with metric $\hat g_{ab}$ and scalar field $\Omega$ on $\hat M$ and boundary $\mathscr{I} = \partial \hat M$ such that the following conditions hold:

1. M is the interior of $\hat M$: $M = \operatorname{int} \hat M$;
2. $\hat g_{ab} = \Omega^2 g_{ab}$ on M;
3. $\Omega$ and $\hat g_{ab}$ are smooth on all of $\hat M$;
4. $\Omega > 0$ on M; $\Omega = 0$, $\nabla_a \Omega \ne 0$ on $\mathscr{I}$; and
5. each null geodesic acquires both future and past endpoints on $\mathscr{I}$.

This definition formalizes the construction which was explicitly performed above, by which one attaches a regular (nonempty) boundary to a spacetime after suitably rescaling its metric. Asymptotically simple spacetimes are exactly those for which this process of conformal compactification is possible. The purpose of condition 5 is to exclude pathological cases. There are spacetimes which do not satisfy this condition (e.g., the Schwarzschild spacetime, where some of the null geodesics enter the event horizon and cannot escape to infinity). Yet, one would like to include them as being asymptotically simple in a sense, because they clearly describe isolated systems. For these cases, there exists the notion of weakly asymptotically simple spacetimes.

In order to arrive at asymptotically flat spacetimes, one needs to make certain assumptions about the behavior of the curvature near the boundary, thus:

Definition 2 An asymptotically simple spacetime is called asymptotically flat if its Ricci tensor Ric[g] vanishes in a neighborhood of $\mathscr{I}$.

Note that this definition imposes a rather strong restriction on the Ricci curvature; less restrictive assumptions are possible. This condition applies only near $\mathscr{I}$. Thus, it is possible to consider spacetimes which contain matter fields as long as these fields do not extend to infinity.

Other asymptotically simple spacetimes which are not asymptotically flat are the de Sitter and anti-de Sitter spacetimes which are solutions of the Einstein equations with nonvanishing cosmological constant $\lambda$.

It is a simple consequence of the definition that the boundary $\mathscr{I}$ is a regular three-dimensional hypersurface of the embedding spacetime $\hat M$ which is timelike, spacelike, or null depending on the sign of $\lambda$. In particular, for the Minkowski spacetime ($\lambda = 0$) the boundary is necessarily a null hypersurface, as noted above.

The requirement that the vacuum Einstein equations hold near $\mathscr{I}$ has several important
224 Asymptotic Structure and Conformal Infinity
consequences. First, $\mathscr{I}$ is a null hypersurface with the special property of being shear-free. This means that any cross section of a bundle of its null generators does not suffer any distortions when moved along the generators. Only expansion or contraction can occur. The global structure of $\mathscr{I}$ is the same as the one from the example above. Null infinity consists of two connected components, $\mathscr{I}^\pm$, each of which is diffeomorphic to $S^2 \times \mathbb{R}$. Thus, topologically, $\mathscr{I}^\pm$ are cylinders. The cone-like appearance as seen in Figure 1 is artificial. It depends on the particular conformal factor chosen for the conformal compactification. Furthermore, it is only in very exceptional cases that the metric $\hat g$ is regular at $i^0$ or $i^\pm$.

The most important consequence, however, concerns the conformal Weyl tensor $C^a{}_{bcd}$. This is the part of the full Riemann curvature tensor $R^a{}_{bcd}$ which is trace-free. It is invariant under conformal rescalings of the metric. Thus, on M, $C^a{}_{bcd} = \hat C^a{}_{bcd}$. When the vanishing of the Ricci tensor near $\mathscr{I}$ is assumed then it turns out that the Weyl tensor necessarily vanishes on $\mathscr{I}$. This is the ultimate justification for calling such manifolds asymptotically flat because the entire curvature vanishes on $\mathscr{I}$.

Some Consequences

There are several consequences of the existence of the conformal boundary $\mathscr{I}$. They all can be traced back to the fact that this boundary can be used to separate the geometric fields into a universal background field and dynamical fields which propagate on it. The background is given by the boundary points attached to an asymptotically flat spacetime which always form a three-dimensional null hypersurface $\mathscr{I}$ with two connected components (in the sequel, we restrict our attention to $\mathscr{I}^+$ only; $\mathscr{I}^-$ is treated similarly), each with the topology of a cylinder. And in each case, $\mathscr{I}$ is shear-free.

The BMS Group

Since the structure of null-infinity is universal over all asymptotically flat spacetimes, it is obvious that its symmetry group should also possess a universal meaning. This group, the so-called Bondi–Metzner–Sachs (BMS) group is in many respects similar to the Poincaré group, the symmetry group of Minkowski spacetime. It is the semidirect product of the Lorentz group with an abelian group which, however, is not the four-dimensional translation group but an infinite-dimensional group of supertranslations. This group is a normal subgroup, so the factor group is isomorphic to the Lorentz group.

In physical terms, the supertranslations arise because there are infinitely many directions from which observers at infinity (whose world lines coincide with the null generators of $\mathscr{I}$ in a certain limit) can observe the system and because each observer is free to choose its own origin of proper time u. The observers surrounding the system are not synchronized, because under the assumptions made there is no natural way to fix a unique common origin. Hence, a supertranslation is a shift of the parameter along each null generator of $\mathscr{I}$ corresponding to a change of origin for each individual observer. It can be given as a map $S^2 \to \mathbb{R}$. A choice of origin on each null generator of $\mathscr{I}$ is referred to as a cut of $\mathscr{I}$. It is a two-dimensional surface of spherical topology which intersects each null generator exactly once. It is an open question whether one can always synchronize the observers by imposing canonical conditions at $i^0$ or $i^\pm$, thereby reducing the BMS group to the smaller Poincaré group.

The supertranslations contain a unique four-dimensional normal subgroup. In Minkowski spacetime these special supertranslations are the ones which are induced by the translations of Minkowski spacetime in the following way. Take the future light cone of some event P and follow it out to $\mathscr{I}$, where its intersection defines an origin for each observer located there. Now consider the light cone of another event Q obtained from P by a translation in a spatial direction. Then the light emitted from Q will arrive at $\mathscr{I}$ earlier than that from P for observers in the direction of the translation, while it will be delayed for observers in the opposite direction. This change in arrival time defines a specific supertranslation. Similarly, for a translation in a temporal direction, the light from Q will arrive later than that from P for all observers. Thus, every translation in Minkowski spacetime defines a particular supertranslation on $\mathscr{I}$. These can be characterized in a different way, which is intrinsic to $\mathscr{I}$ and which can be used in the general case even though there will be no Killing vectors present in a general asymptotically flat spacetime. In an appropriate coordinate system, the asymptotic translations are given as linear combinations of the first four spherical harmonics $Y_{00}$, $Y_{10}$, $Y_{1\pm1}$. The space of asymptotic translations T is in a natural way isometric to Minkowski spacetime.

The Peeling Property

Now consider the Weyl tensor $C^a{}_{bcd}$ on $\hat M$. Since it vanishes on $\mathscr{I}$ where $\Omega = 0$ we may form the quotient

$$K^a{}_{bcd} = \Omega^{-1} C^a{}_{bcd}$$
which can be shown to be smooth on $\mathscr{I}$. The physical interpretation of this tensor field is based on the following properties. In source-free regions the field satisfies the spin-2 zero-rest-mass equation

$$\nabla_a K^a{}_{bcd} = 0$$

which is very similar to the Maxwell equations for the electromagnetic (spin-1) Faraday tensor. Thus, $K^a{}_{bcd}$ is interpreted as the gravitational field, which describes the gravitational waves contained inside the system. The zero-rest-mass equation for $K^a{}_{bcd}$ and the fact that the field is smooth on $\mathscr{I}$ imply that the Weyl tensor satisfies the peeling property. This is a characteristic conspiracy between the fall-off behavior of certain components of the Weyl tensor along outgoing null geodesics of the physical spacetime approaching $\mathscr{I}$, with respect to an affine parameter $s$ for $s \to \infty$, and their algebraic type. Symbolically, the Weyl tensor has the following behavior as $s \to \infty$ along the null geodesic:

$$C = \frac{[4]}{s} + \frac{[31]}{s^2} + \frac{[211]}{s^3} + \frac{[1111]}{s^4} + O(s^{-5}) \qquad [5]$$

where the numerator of each component indicates its Petrov type. The repeated principal null direction (PND) in the first three components and one of the PNDs in the fourth component are aligned with the tangent vector of the geodesic. This implies that the farthest reaching component of the Weyl tensor, which is $O(1/s)$, has the Petrov type of a radiation field. It is customary to combine the components which are $O(1/s^i)$ into one complex function and denote it by $\psi_{5-i}$. When expressed in terms of the field $K^a{}_{bcd}$ on $M$, this fall-off behavior implies that of all components of $K^a{}_{bcd}$ only $\psi_4$ does not necessarily vanish on $\mathscr{I}$.

In special cases like the Minkowski, Schwarzschild, Kerr, and more generally in all asymptotically flat stationary spacetimes, even $\psi_4$ vanishes on $\mathscr{I}$. For these reasons, $\psi_4$ is called the radiation field of the system, that is, that part of the gravitational field which can be registered by the observers at infinity. It describes the outgoing radiation which is being emitted by the system during its evolution.

The Bondi-Sachs Mass-Loss Formula

Gravitational waves carry away energy from the system. This is a consequence of the Bondi-Sachs mass-loss formula. The Bondi-Sachs energy-momentum is related to a weighted integral over a cut $C$,

$$P_C[W] = -\frac{1}{4\pi G} \int_C W \left[ \psi_2 + \sigma \dot{\bar\sigma} \right] d^2S \qquad [6]$$

The quantity in brackets, the mass aspect, is a combination of the scalar $\psi_2$, which in a sense measures the strength of the Coulomb-like part of the gravitational field on $\mathscr{I}$, and the complex quantity $\sigma$. In a so-called Bondi coordinate system, this quantity is related to the radiation field $\psi_4$ by the relation

$$\psi_4 = -\ddot{\bar\sigma}$$

the dot indicating differentiation with respect to the affine parameter along the null generators. Thus, $\sigma$ is essentially the second time integral of the radiation field. The mass aspect is integrated against a function $W$ which is an asymptotic translation, that is, a linear combination of the first four spherical harmonics. Thus, one can view the expression [6] as defining a linear map $\mathcal{T} \to \mathbb{R}$. Since $\mathcal{T}$ and $\mathbb{M}$ are isometric this defines a covector $P_a$ on $\mathbb{M}$, which can always be shown to be timelike, $P_a P^a \ge 0$. This positivity property, together with the fact that in the special cases of Schwarzschild and Kerr spacetimes the integral yields the mass parameters when evaluated for a time translation ($W = 1$), motivates the interpretation of $P_C$ as the energy-momentum 4-vector of the spacetime at the instant defined by the cut $C$. In particular, for $W = 1$ the integral gives the time component of $P_C$, the Bondi-Sachs energy $E$.

The interpretation of [6] as energy-momentum is strengthened by the fact that $P_C$ arises as dual to the translations, which is familiar from Lagrangian field theories where energy and momentum appear as generators for time and space translations. In fact, one can set up a Hamiltonian framework where the role of the Bondi-Sachs energy-momentum as generator of asymptotic translations is made explicit.

This point of view suggests that one should also be able to define a notion of angular momentum for asymptotically flat spacetimes because angular momentum arises as the generator of rotations, which can also be defined asymptotically. However, while there is a unique notion of translation on $\mathscr{I}$, this is not the case for rotations (and boosts). The reason is hidden in the structure of the BMS group where the Lorentz group appears naturally as a factor group but not as a unique subgroup. In physical terms, the angular momentum depends on an origin but there is no natural way to choose an origin on $\mathscr{I}$. This ambiguity in the choice of origin leads to several nonequivalent expressions for angular momentum in the literature.

Consider now two cuts $C$ and $C'$, with $C'$ later than $C$. Then we may compute the difference $\Delta E = E - E'$ of the Bondi-Sachs energies with respect to the two
cuts. It turns out that this difference can be expressed as an integral over the (three-dimensional) piece $\Sigma$ of $\mathscr{I}$ which is bounded by the two cuts (i.e., $\partial\Sigma = C' - C$):

$$E - E' = \frac{1}{4\pi G} \int_\Sigma \dot\sigma \, \dot{\bar\sigma} \, d^3V \qquad [7]$$

This result means that the Bondi-Sachs energy of the system decreases, since $E' < E$, and the rate of decrease is given by the (positive-definite) amount of gravitational radiation which leaves the system during the period defined by the two cuts.

It is necessary to point out that in this article the structure of null infinity has been postulated based on physical reasoning. The Einstein equations have been used only in a very weak sense, namely only in a neighborhood of $\mathscr{I}$. It is an entirely different question whether the field equations are compatible with this postulated structure. To answer it, one needs to show that there are global solutions of the Einstein equations which exhibit the postulated behavior in the asymptotic region. This question has been settled recently in the affirmative: there are many global spacetimes which are asymptotically flat in the sense described here.

This article has discussed the notion of null infinity, that is, of spacetimes which are asymptotically flat in lightlike directions. Spacetimes which are asymptotically flat in spacelike directions have not been covered. The latter is a notion which has been developed largely independently of null infinity since it is essentially a property of an initial data set and not of the entire four-dimensional spacetime. Ultimately, these two notions should coincide, in the sense that if one has an initial data set which is asymptotically flat in spatial directions in an appropriate sense then its Cauchy development will be an asymptotically flat spacetime. However, as of yet, it is not clear what the appropriate conditions should be because the structure of the gravitational field in the neighborhood of spacelike infinity $i^0$ is not sufficiently well understood so far.

See also: Black Hole Mechanics; Boundaries for Spacetimes; Canonical General Relativity; Einstein Equations: Exact Solutions; Einstein Equations: Initial Value Formulation; General Relativity: Overview; Gravitational Waves; Quantum Entropy; Spacetime Topology, Causal Structure and Singularities; Stability of Minkowski Space; Stationary Black Holes.

Further Reading

Ashtekar A (1987) Asymptotic Quantization. Naples: Bibliopolis.
Bondi H, van der Burg MGJ, and Metzner AWK (1962) Gravitational waves in general relativity VII. Waves from axi-symmetric isolated systems. Proceedings of the Royal Society of London, Series A 269: 21-52.
Frauendiener J (2004) Conformal infinity. Living Reviews in Relativity, vol. 3. http://relativity.livingreviews.org/Articles/lrr-2004-1/index.html.
Friedrich H (1992) Asymptotic structure of space-time. In: Janis AI and Porter JR (eds.) Recent Advances in General Relativity. Boston: Birkhäuser.
Friedrich H (1998a) Einstein's equation and conformal structure. In: Huggett SA, Mason LJ, Tod KP, Tsou SS, and Woodhouse NMJ (eds.) The Geometric Universe: Science, Geometry and the Work of Roger Penrose. Oxford: Oxford University Press.
Friedrich H (1998b) Gravitational fields near space-like and null infinity. Journal of Geometry and Physics 24: 83-163.
Geroch R (1977) Asymptotic structure of space-time. In: Esposito FP and Witten L (eds.) Asymptotic Structure of Space-Time. New York: Plenum.
Hawking S and Ellis GFR (1973) The Large Scale Structure of Space-Time. Cambridge: Cambridge University Press.
Penrose R (1965) Zero rest-mass fields including gravitation: asymptotic behaviour. Proceedings of the Royal Society of London, Series A 284: 159-203.
Penrose R (1968) Structure of space-time. In: DeWitt CM and Wheeler JA (eds.) Battelle Rencontres. New York: W. A. Benjamin.
Penrose R and Rindler W (1984, 1986) Spinors and Space-Time. Cambridge: Cambridge University Press.
Sachs RK (1962) Gravitational waves in general relativity VIII. Waves in asymptotically flat space-time. Proceedings of the Royal Society of London, Series A 270: 103-127.
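As a small numerical illustration of how expression [6] acts as a linear map on asymptotic translations, the following sketch evaluates $P_C[W]$ for the first four spherical harmonics on a unit-sphere cut. Everything specific here is an assumption made for illustration: the overall sign and the $1/(4\pi G)$ normalization of [6], the toy mass aspect (no radiation, $\sigma = 0$, and a constant Schwarzschild-like $\psi_2 = -GM$), the values of `G` and `M`, and the helper names `mass_aspect`, `sphere_integral` and `bondi_momentum`.

```python
import math

G = 1.0   # Newton's constant in illustrative units (assumption)
M = 2.5   # hypothetical mass parameter of the toy spacetime


def mass_aspect(theta, phi):
    """Toy mass aspect psi_2 + sigma * d(conj(sigma))/du on a cut.

    Assumption: no radiation (sigma = 0) and psi_2 = -G*M everywhere,
    a Schwarzschild-like value in the sign convention assumed here.
    """
    return -G * M


# The first four spherical harmonics as real functions (l = 0 and l = 1).
TRANSLATIONS = {
    "t": lambda th, ph: 1.0,
    "x": lambda th, ph: math.sin(th) * math.cos(ph),
    "y": lambda th, ph: math.sin(th) * math.sin(ph),
    "z": lambda th, ph: math.cos(th),
}


def sphere_integral(f, n_theta=100, n_phi=64):
    """Midpoint-rule integral of f(theta, phi) over the unit sphere."""
    d_th = math.pi / n_theta
    d_ph = 2.0 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        th = (i + 0.5) * d_th
        ring = sum(f(th, (j + 0.5) * d_ph) for j in range(n_phi))
        total += ring * math.sin(th) * d_th * d_ph
    return total


def bondi_momentum(aspect):
    """Evaluate P_C[W] = -(1/(4 pi G)) * integral of W * aspect for each W."""
    return {
        name: -sphere_integral(lambda th, ph, W=W: W(th, ph) * aspect(th, ph))
        / (4.0 * math.pi * G)
        for name, W in TRANSLATIONS.items()
    }


P = bondi_momentum(mass_aspect)
print(P)
```

For this angle-independent mass aspect the time component (W = 1) reproduces the mass parameter up to quadrature error, while the three $l = 1$ components integrate to zero, which is the static, momentum-free case described in the text.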
Averaging Methods

A I Neishtadt, Russian Academy of Sciences, Moscow, Russia

© 2006 Elsevier Ltd. All rights reserved.

fast oscillations. The most common field of applications of averaging methods is the analysis of the behavior of dynamical systems that differ from integrable systems by small perturbations.

$$\dot I = \varepsilon g(I, \varphi, \varepsilon), \qquad \dot\varphi = \omega(I) + \varepsilon f(I, \varphi, \varepsilon)$$
$$I = (I_1, \dots, I_n) \in \mathbb{R}^n, \qquad \varphi = (\varphi_1, \dots, \varphi_m) \in \mathbb{T}^m \ (\mathrm{mod}\ 2\pi), \qquad 0 < \varepsilon \ll 1 \qquad [1]$$

The small parameter $\varepsilon$ characterizes the amplitude of the perturbation. For $\varepsilon = 0$ one gets the unperturbed system. The equation $I = \mathrm{const}$ singles out an invariant $m$-dimensional torus of the unperturbed system. The motion on this torus is quasiperiodic with frequency vector $\omega(I)$; components of vector $I$ are called slow variables whereas components of vector $\varphi$ are called fast variables or phases. The right-hand sides of system [1] are $2\pi$-periodic with respect to all $\varphi_j$. It is assumed that they are smooth enough functions of all arguments. It is also assumed that components of the frequency vector are not linearly dependent over the ring of integer numbers identically with respect to $I$. System [1] is called a system with rotating phases.

In applications, one is often interested mainly in the behavior of slow variables. The averaging principle (or method) consists in replacing the system of perturbed equations [1] by the averaged system

$$\dot J = \varepsilon G(J), \qquad G(J) = (2\pi)^{-m} \oint_{\mathbb{T}^m} g(J, \varphi, 0) \, d\varphi \qquad [2]$$

for the purpose of providing an approximate description of the evolution of the slow variables over time intervals of order $1/\varepsilon$ or longer. Here, $d\varphi = d\varphi_1 \cdots d\varphi_m$. System [2] contains only slow variables and, therefore, is much simpler for investigation than system [1]. When passing from system [1] to system [2], one ignores the terms $g(I, \varphi, 0) - G(I)$ on the right-hand side of [1]. The averaging principle is based on the idea that these terms oscillate and lead only to small oscillations which are superimposed on the drift described by the averaged system. To justify the averaging principle, one should establish a relation between the behavior of the solutions of systems [1] and [2]. This problem is still far from being completely solved.

Another version of the averaging principle is used in the case when frequencies are approximately in resonance. This means that one or several relations of the form $(k, \omega) = 0$ are approximately valid with irreducible integer coefficient vectors $k \ne 0$; here, $(k, \omega)$ is the standard scalar product in $\mathbb{R}^m$. Let $\Gamma$ be a sublattice of the integer lattice $\mathbb{Z}^m$ generated by these vectors. Let $r = \operatorname{rank} \Gamma$ and $k^{(1)}, k^{(2)}, \dots, k^{(m)}$ be a basis in $\mathbb{Z}^m$, the first $r$ vectors of which belong to $\Gamma$. Instead of $\varphi$, one can introduce new variables:

$$\vartheta = (\vartheta_1, \dots, \vartheta_r) \in \mathbb{T}^r \ (\mathrm{mod}\ 2\pi), \qquad \chi = (\chi_1, \dots, \chi_{m-r}) \in \mathbb{T}^{m-r} \ (\mathrm{mod}\ 2\pi)$$
$$\vartheta_i = (k^{(i)}, \varphi), \qquad \chi_j = (k^{(r+j)}, \varphi)$$

Let $R$ be an $r \times m$ matrix whose rows are vectors $k^{(i)}$, $1 \le i \le r$. For an approximate description of the behavior of variables $I$, $\vartheta$, the averaging principle prescribes replacing system [1] by the system

$$\dot J = \varepsilon G_\Gamma(J, \vartheta), \qquad \dot\vartheta = R\omega(J) + \varepsilon R F_\Gamma(J, \vartheta)$$
$$G_\Gamma(J, \vartheta) = (2\pi)^{-(m-r)} \oint_{\mathbb{T}^{m-r}} g(J, \varphi, 0) \, d\chi \qquad [3]$$
$$F_\Gamma(J, \vartheta) = (2\pi)^{-(m-r)} \oint_{\mathbb{T}^{m-r}} f(J, \varphi, 0) \, d\chi$$

(one should express $g$, $f$ through $\vartheta$, $\chi$ and then integrate over $\chi$, $d\chi = d\chi_1 \cdots d\chi_{m-r}$). System [3] is called partially averaged system for resonances in $\Gamma$. Functions $G_\Gamma$, $F_\Gamma$ can be obtained from Fourier series expansions of functions $g$, $f$ for $\varepsilon = 0$ by throwing away harmonics $\exp(i(k, \varphi))$, $k \notin \Gamma$ (nonresonant harmonics). Passing from system [1] to system [3] is based on the idea that the ignored nonresonant harmonics oscillate fast and do not affect essentially the evolution of the slow variables.

Now let system [1] be a Hamiltonian system close to an integrable one. The Hamiltonian function has the form

$$H = H_0(p) + \varepsilon H_1(p, \varphi, y, x, \varepsilon)$$

where $\varphi$, $x$ are coordinates and $p$, $y$ are conjugated to them. The equations of motion have the same form as [1], with $I = (p, y, x)$:

$$\dot p = -\varepsilon \frac{\partial H_1}{\partial \varphi}, \qquad \dot y = -\varepsilon \frac{\partial H_1}{\partial x}, \qquad \dot x = \varepsilon \frac{\partial H_1}{\partial y}, \qquad \dot\varphi = \frac{\partial H_0}{\partial p} + \varepsilon \frac{\partial H_1}{\partial p} \qquad [4]$$

The averaging principle in the case when there are no resonant relations leads to the system

$$\dot p = 0, \qquad \dot y = -\varepsilon \frac{\partial \mathcal{H}_1}{\partial x}, \qquad \dot x = \varepsilon \frac{\partial \mathcal{H}_1}{\partial y}, \qquad \mathcal{H}_1 = (2\pi)^{-m} \oint_{\mathbb{T}^m} H_1(p, \varphi, y, x, 0) \, d\varphi \qquad [5]$$

Therefore, in this case there is no drift in $p$, and the behavior of $y$, $x$ is described by the Hamiltonian system, which contains $p$ as a parameter. Equations of motion of planets around the Sun can be reduced to the form [4]. The issue of the absence of the evolution of momenta $p$ is known in this problem as
the Lagrange-Laplace theorem, about the absence of the evolution of semimajor axes of planetary orbits.

Elimination of Fast Variables, Decoupling of Slow and Fast Motions

The basic role in the averaging method is played by the idea that the exact system can be in the principal approximation transformed into the averaged system by means of a transformation of variables close to the identical one. The extension of this idea is the idea that similar transformation of variables allows one to eliminate, up to an arbitrary degree of accuracy, the fast phases from the right-hand sides of the equations of perturbed motion and in this way decouple the slow motion from the fast one. For system [1], provided there are no resonant relations between frequencies, the elimination of fast variables is performed as follows. The desirable transformation of variables $(I, \varphi) \mapsto (J, \psi)$ is sought as a formal series

$$I = J + \varepsilon u_1(J, \psi) + \varepsilon^2 u_2(J, \psi) + \cdots$$
$$\varphi = \psi + \varepsilon v_1(J, \psi) + \varepsilon^2 v_2(J, \psi) + \cdots \qquad [6]$$

where functions $u_j$, $v_j$ are $2\pi$-periodic in $\psi$. The transformation [6] should be chosen in such a way that in the new variables the right-hand sides of equations of motion do not contain fast variables, that is, the equations of motion should have the form

$$\dot J = \varepsilon G_0(J) + \varepsilon^2 G_1(J) + \cdots$$
$$\dot\psi = \omega(J) + \varepsilon F_0(J) + \varepsilon^2 F_1(J) + \cdots \qquad [7]$$

Substituting [6] into [7], taking into account [1], and equating the terms of the same order in $\varepsilon$, we obtain the following set of relations:

$$G_0(J) = g(J, \psi, 0) - \frac{\partial u_1}{\partial \psi} \omega$$
$$F_0(J) = f(J, \psi, 0) + \frac{\partial \omega}{\partial J} u_1 - \frac{\partial v_1}{\partial \psi} \omega$$
$$G_i(J) = X_i(J, \psi) - \frac{\partial u_{i+1}}{\partial \psi} \omega \qquad [8]$$
$$F_i(J) = Y_i(J, \psi) + \frac{\partial \omega}{\partial J} u_{i+1} - \frac{\partial v_{i+1}}{\partial \psi} \omega, \qquad i \ge 1$$

The functions $X_i$, $Y_i$ are uniquely determined by the terms $u_1, v_1, \dots, u_i, v_i$ in expansion [6]. The first equation in [8] implies that

$$G_0(J) = g_0(J) = G(J)$$
$$u_1(J, \psi) = \sum_{k \ne 0} \frac{g_k}{i(k, \omega)} \exp(i(k, \psi)) + u_1^0(J) \qquad [9]$$

where $g_k$, $k \in \mathbb{Z}^m$, are Fourier coefficients of function $g$ at $\varepsilon = 0$, and $u_1^0$ is an arbitrary function of $J$. It is assumed that the denominators in [9] do not vanish, and that the series in [9] converges and determines a smooth function. In the same way, from the other equations in [8] one can sequentially determine $F_0$, $v_1$, ..., $G_i$, $u_{i+1}$, $F_i$, $v_{i+1}$, $i \ge 1$.

On truncating the series in [6] and [7] at the terms of order $\varepsilon^l$, we obtain a truncated system of the $l$th approximation. The equation for $J$ is decoupled from the other equations and can be solved separately. Then the behavior of $\psi$ is determined by means of quadrature. The behavior of original variable $I$ in this approximation is a slow drift (described by the equation for $J$), on which small oscillations (described by transformation of variables) are superimposed. The behavior of $\varphi$ can be represented as a rotation with slowly varying frequency, on which oscillations are also superimposed. For $l = 1$, the truncated system coincides with the averaged system [2].

If the sublattice $\Gamma \subset \mathbb{Z}^m$ specifying possible resonant relations is given, then in an analogous manner one can construct a formal transformation of variables $(I, \varphi) \mapsto (J, \psi)$ such that, in the new variables, the fast phase will appear on the right-hand sides of the differential equations for the new variables only in combinations $(k, \psi)$, with $k \in \Gamma$ (see, e.g., Arnold et al. (1988)). Again, on truncating the series on the right-hand sides of the differential equations for the new variables at the terms of order $\varepsilon^l$, we obtain a truncated system of the $l$th approximation. At $l = 1$, this truncated system coincides with the partially averaged system [3] (for some special choice of arbitrary functions that are contained in the formulas for transformation of variables). If the original system is a Hamiltonian system of the form [4], then the transformation of variables eliminating the fast phases from the right-hand sides of the differential equations can be chosen to be symplectic. The corresponding procedures are called Lindstedt method and Newcomb method (nonresonant case for $n = m$), Delaunay method (resonant case for $n = m$), and von Zeipel method (resonant case for $n \ge m$) (see Poincaré (1957) and Arnold et al. (1988)).

The calculation of high-order terms in the procedures of elimination of fast variables is rather cumbersome. There are versions of these procedures which are convenient for symbolic processors (especially for Hamiltonian systems, e.g., the Deprit-Hori method; Giacaglia 1972).

The averaging method consists in using the averaged system for the description of motion in the first approximation and the truncated systems
obtained by means of the procedures of elimination of fast variables in the higher approximations, together with the corresponding transformations of variables.

Justification of the Averaging Method

To justify the averaging method, one should establish conditions under which the deviation of the slow variables along the solutions of the exact system from the solutions of the averaged system with appropriate initial data on time intervals of order $1/\varepsilon$ or longer tends to 0 as $\varepsilon \to 0$. It is desirable to have estimates from above for these deviations. The estimates of deviations of the solutions of the exact system from the solutions of the truncated systems obtained by means of the procedure of elimination of fast phases are important as well. It can happen that there are "bad" initial data for which the slow component of the solution of the exact system deviates from the solution of the averaged system by a value of order 1 over time of order $1/\varepsilon$. In this case, one should have estimates from above for the measure of the set of such "bad" initial data; on the complementary set of initial data, one should have estimates from above for the deviation of slow variables along the solutions of the exact system from the solution of the averaged system. These problems are currently far from being completely solved. Some general results are described in the following.

Let functions $\omega$, $f$, $g$ on the right-hand side of system [1] be defined and bounded together with a sufficient number of derivatives in the domain $D\{I\} \times \mathbb{T}^m\{\varphi\} \times [0, \varepsilon_0]$. Let $J(t)$ be the solution of the averaged system [2] with initial condition $I_0 \in D$. Let $(I(t), \varphi(t))$ be the solution of the exact system [1] with initial conditions $(I_0, \varphi_0)$. So, $I(0) = J(0)$. It is assumed that the solution $J(t)$ is defined and stays at a positive distance from the boundary of $D$ on the time interval $0 \le t \le K/\varepsilon$, $K = \mathrm{const} > 0$.

If system [1] is a one-frequency system ($m = 1$), and the frequency $\omega$ does not vanish in $D$, then for $0 \le t \le K/\varepsilon$ the solution $(I(t), \varphi(t))$ is well defined, and $|I(t) - J(t)| < C\varepsilon$, $C = \mathrm{const} > 0$. For $\omega = 1$, this assertion was proved by P Fatou (1928) and, by a different method, by L I Mandelshtam and L D Papaleksi (1934). This was historically the first result on the justification of the averaging method (Mitropolskii 1971). There is a proof based on the elimination of fast variables (see, e.g., Arnold (1983)). For a one-frequency system, higher approximations of the procedure of elimination of fast variables allow the description of the dynamics with an accuracy of the order of any power in $\varepsilon$ on time intervals of order $1/\varepsilon$ (Bogolyubov and Mitropolskii 1961).

If system [1] is a multifrequency system ($m \ge 2$), but the vector of frequencies is constant and nonresonant, then for any $\rho > 0$ and small enough $\varepsilon < \varepsilon_0(\rho)$ it holds that $|I(t) - J(t)| < \rho$ for $0 \le t \le K/\varepsilon$ (Bogolyubov 1945, Bogolyubov and Mitropolskii 1961). If, in addition, the frequencies satisfy the Diophantine condition $|(k, \omega)| > \mathrm{const}\, |k|^{-\nu}$ for all $k \in \mathbb{Z}^m \setminus \{0\}$ and some $\nu > 0$, then one can choose $\rho = O(\varepsilon)$. In this case, higher approximations of the procedure of elimination of fast variables allow one to describe the dynamics with an accuracy of the order of any power in $\varepsilon$ on time intervals of order $1/\varepsilon$ (see, e.g., Arnold et al. (1988)).

If the system is a multifrequency system, and frequencies are not constant (but depend on the slow variables $I$), then due to the evolution of slow variables the frequencies themselves are evolving slowly. At certain time moments, they can satisfy certain resonant relations. One of the phenomena that can take place here is a capture into a resonance; this capture leads to a large deviation of the solutions of the exact and averaged systems. However, the general Anosov averaging theorem (Anosov 1960) implies that if the frequencies $\omega$ are nonresonant for almost all $I$, then for any $\rho > 0$, the inequality $|I(t) - J(t)| < \rho$ is satisfied for $0 \le t \le K/\varepsilon$ for all initial data outside a set $E(\rho, \varepsilon)$ whose measure tends to 0 as $\varepsilon \to 0$. In many cases, it turns out that $\operatorname{mes} E(\rho, \varepsilon) = O(\sqrt{\varepsilon}/\rho)$ (in particular, the sufficient condition for the last estimate is that $\operatorname{rank}(\partial\omega/\partial I) = m$) (Arnold et al. (1988)).

The knowledge about averaging in two-frequency systems ($m = 2$) on time intervals of order $1/\varepsilon$ is relatively more complete (see Arnold (1983), Arnold et al. (1988), and Lochak and Meunier (1988)). For Hamiltonian and reversible systems, the justification of the averaging method is a by-product of Kolmogorov-Arnold-Moser (KAM) theory. The KAM theory provides estimates of the difference between the solutions of the exact and averaged systems for the majority of initial data on the infinite time interval $-\infty < t < \infty$. For the remaining data this difference can grow because of Arnold diffusion, but, in general, very slowly. According to the Nekhoroshev theorem, this difference is small on time intervals whose length grows exponentially when the perturbation decays linearly (for an analytic Hamiltonian if the unperturbed Hamiltonian is a generic function, the so-called steep function).

Another aspect of justification of the averaging method is establishing relations between invariant manifolds of the exact and averaged systems. Consider, in particular, the case of a one-frequency
system and a multifrequency system with constant Diophantine frequencies. Suppose that the averaged system has an equilibrium such that real parts of all its eigenvalues are different from 0, or a limit cycle such that the absolute values of all but one of its multipliers are different from 1. Then the exact system has an invariant torus, respectively, $m$- or $(m + 1)$-dimensional, whose projection onto the space of the slow variables is $O(\varepsilon)$-close to the equilibrium (cycle) of the averaged system. This torus is stable or unstable together with the equilibrium (cycle) of the averaged system. For Hamiltonian and reversible systems, the problem of invariant manifolds is considered in the framework of the KAM theory.

Averaging in Bogolyubov's Systems

Systems in the standard form of Bogolyubov (1945) are of the form

$$\dot x = \varepsilon X(t, x, \varepsilon), \qquad x \in \mathbb{R}^p, \qquad 0 < \varepsilon \ll 1 \qquad [10]$$

It is assumed that the function $X$, besides the usual smoothness conditions, satisfies the condition of uniform average: the limit (time average)

$$X_0(x) = \lim_{T \to \infty} \frac{1}{T} \int_0^T X(t, x, 0) \, dt \qquad [11]$$

exists uniformly in $x$. The averaging principle of Bogolyubov consists of the replacement of the original system in standard form by the averaged system

$$\dot\xi = \varepsilon X_0(\xi) \qquad [12]$$

with a goal to provide an approximate description of the behavior of $x$. This approach generalizes the approach of the section "Averaging principle" for the case of constant frequencies ($\omega = \mathrm{const}$). Upon introducing in the given system with constant frequencies the deviation from uniform rotation $\zeta = \varphi - \omega t$ and denoting $x = (I, \zeta)$, we obtain a system in the standard form [10]. Here the condition of uniform average is fulfilled because $X(t, x, 0)$ is a quasiperiodic function of time $t$. The averaged system [12] for nonresonant frequencies coincides with the averaged system [2]; for resonant frequencies, it coincides with the partially averaged system [3] (one should only supply systems [2] and [3] with equations for some components of the vector $\varphi - \omega t$ that do not enter into the right-hand side of the averaged system).

The averaging principle of Bogolyubov is justified by three Bogolyubov theorems. According to the first theorem, if $\xi(t)$, $0 \le t \le K/\varepsilon$, is a solution of the averaged system, and $x(t)$ is a solution of the exact system with initial condition $x(0) = \xi(0)$, then for any $\rho > 0$ there exists $\varepsilon_0(\rho) > 0$ such that $|x(t) - \xi(t)| < \rho$ for $0 \le t \le K/\varepsilon$ and $0 < \varepsilon < \varepsilon_0(\rho)$. The second and the third Bogolyubov theorems describe the motion in the neighborhoods of equilibria and the limit cycles of the averaged system. In particular, if for an equilibrium real parts of all its eigenvalues are different from 0, or, for a limit cycle, the absolute values of all but one of its multipliers are different from 1, then the exact system has a solution which eternally stays near this equilibrium (cycle). The stability properties of this solution are the same as the stability properties of the corresponding equilibrium (cycle) of the averaged system.

For systems of the form [10] a procedure exists that, similarly to the procedure in the section "Elimination of fast variables, decoupling of slow and fast motions", allows us to eliminate time $t$ from the right-hand side of the system with an accuracy of the order of any power in $\varepsilon$ by means of a transformation of variables. (To perform this procedure, one should assume that the conditions of uniform average are satisfied for functions that arise in the process of constructing higher approximations in this procedure (Bogolyubov and Mitropolskii 1961).) In the first approximation, such a transformation of variables transforms the original system into the averaged one.

The condition of uniform average is very important for theory. If the limit in [11] exists, but convergence is nonuniform in $x$, then the time average $X_0$ could be, for example, a discontinuous function of $x$, and the averaged system would not be well defined.

Averaging in Slow-Fast Systems

Systems of the form [1] are particular cases of the systems of the form

$$\dot x = f(x, y, \varepsilon), \qquad \dot y = \varepsilon g(x, y, \varepsilon) \qquad [13]$$

which are called slow-fast systems (or systems with slow and fast motions, with slow and fast variables). The generalization of the approach of the section "Averaging principle" for these systems is the following averaging principle of Anosov (1960). In the system [13], let $x \in M$, $y \in \mathbb{R}^n$, where $M$ is a smooth compact $m$-dimensional manifold. At $\varepsilon = 0$, the system for fast variables $x$ contains slow variables $y$ as parameters. Assume that this system (which is called fast system) has a finite smooth
invariant measure $\mu_y$ and is ergodic for almost all values of $y$. Introduce the averaged system

$$\dot Y = \varepsilon G(Y), \qquad G(Y) = \frac{1}{\mu_Y(M)} \int_M g(x, Y, 0) \, d\mu_Y$$

According to the averaging principle, one should use the solution $Y(t)$ of the averaged system with initial condition $Y(0) = y(0)$ for approximate description of slow motion $y(t)$ in the original system. This averaging principle is justified by the following Anosov theorem [1]: for any positive $\rho$ the measure of the set $E(\rho, \varepsilon)$ of initial data (from a compact in the phase space) such that

$$\max_{0 \le t \le 1/\varepsilon} |y(t) - Y(t)| > \rho$$

tends to 0 as $\varepsilon \to 0$.

The particular case when the original system is a Hamiltonian system depending on slowly varying parameter $\lambda = \varepsilon t$, and for almost all values of $\lambda$ the motion of the system with $\lambda = \mathrm{const}$ is ergodic on almost all energy levels, is considered in Kasuga (1961).

For the case when the fast system has strong mixing properties, see Bakhtin (2004) and Kifer (2004).

For slow-fast systems, there is also a generalization of the approach of the previous section that uses time averaging and the condition of uniform average (Volosov 1962).

Applications of the Averaging Method

The averaging method is one of the most productive methods of perturbation theory, and its applications are immense. It is widely used in celestial mechanics and space flight dynamics for the description of the evolution of motions of celestial bodies, in plasma physics and theory of accelerators for description of motion of charged particles, and in radio engineering for the description of nonlinear oscillatory regimes. There are also applications in hydrodynamics, physics of lasers, optics, acoustics, etc. (see Arnold et al. (1988), Bogolyubov and Mitropolskii (1961), Lochak and Meunier (1988), Mitropolskii (1971), and Volosov (1962)).

See also: Central Manifolds, Normal Forms; Diagrammatic Techniques in Perturbation Theory; Hamiltonian Systems: Stability and Instability Theory; KAM Theory and Celestial Mechanics; Multiscale Approaches; Random Walks in Random Environments; Separatrix Splitting; Stability Problems in Celestial Mechanics; Stability Theory and KAM.

Further Reading

Anosov DV (1960) Averaging in systems of ordinary differential equations with rapidly oscillating solutions. Izvestiya Akademii Nauk SSSR, Ser. Mat. 24(5): 721-742 (Russian).
Arnold VI (1983) Geometrical Methods in the Theory of Ordinary Differential Equations. New York-Berlin: Springer.
Arnold VI, Kozlov VV, and Neishtadt AI (1988) Mathematical Aspects of Classical and Celestial Mechanics, Encyclopaedia of Mathematical Sciences, vol. 3. Berlin: Springer.
Bakhtin VI (2004) Cramér asymptotics in the averaging method for systems with fast hyperbolic motions. Proceedings of the Steklov Institute of Mathematics 244(1): 79.
Bogolyubov NN (1945) On some statistical methods in mathematical physics. Akad. Nauk USSR. Lvov (Russian).
Bogolyubov NN and Mitropolskii YuA (1961) Asymptotic Methods in the Theory of Nonlinear Oscillations. New York: Gordon and Breach.
Giacaglia GEO (1972) Perturbation Methods in Nonlinear Systems, Applied Mathematical Science, vol. 8. Berlin: Springer.
Kasuga T (1961) On the adiabatic theorem for the Hamiltonian system of differential equations in the classical mechanics I, II, III. Proceedings of the Japan Academy 37(7): 366-382.
Kevorkian J and Cole JD (1996) Multiple Scale and Singular Perturbations Methods, Applied Mathematical Sciences, vol. 114. New York: Springer.
Kifer Y (2004) Some recent advances in averaging. In: Modern Dynamical Systems and Applications, 403. Cambridge: Cambridge University Press.
Lochak P and Meunier P (1988) Multiphase Averaging for Classical Systems, Applied Mathematical Sciences, vol. 72. New York: Springer.
Mitropolskii YuA (1971) Averaging Method in Nonlinear Mechanics. Kiev: Naukova Dumka (Russian).
Poincaré H (1957) Les Méthodes Nouvelles de la Mécanique Céleste, vols. 1-3. New York: Dover.
Sanders JA and Verhulst F (1985) Averaging Methods in Nonlinear Dynamical Systems, Applied Mathematical Sciences, vol. 59. New York: Springer.
Volosov VM (1962) Averaging in systems of ordinary differential equations. Russian Mathematical Surveys 17(6): 1-126.
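The one-frequency estimate $|I(t) - J(t)| < C\varepsilon$ on $0 \le t \le K/\varepsilon$ discussed in this article can be checked numerically on a toy problem. The sketch below is only an illustration under stated assumptions: the perturbation $g(I, \varphi) = -I(1 + \cos\varphi)$, the choice $\omega = 1$, $f = 0$, the initial data, the step size, and the helper names `exact_rhs`, `rk4_step` and `max_deviation` are all hypothetical and not taken from the article. For this choice the averaged system [2] is $\dot J = -\varepsilon J$, so $J(t) = I_0 e^{-\varepsilon t}$ is known in closed form, while the exact system [1] is integrated with a classical Runge-Kutta scheme.

```python
import math


def g(I, phi):
    """Hypothetical perturbation; its phase average is G(J) = -J."""
    return -I * (1.0 + math.cos(phi))


def exact_rhs(I, phi, eps):
    """Right-hand side of system [1] with omega = 1 and f = 0 (assumed)."""
    return eps * g(I, phi), 1.0


def rk4_step(I, phi, eps, dt):
    """One classical fourth-order Runge-Kutta step for the exact system."""
    k1 = exact_rhs(I, phi, eps)
    k2 = exact_rhs(I + 0.5 * dt * k1[0], phi + 0.5 * dt * k1[1], eps)
    k3 = exact_rhs(I + 0.5 * dt * k2[0], phi + 0.5 * dt * k2[1], eps)
    k4 = exact_rhs(I + dt * k3[0], phi + dt * k3[1], eps)
    I_new = I + dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
    phi_new = phi + dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return I_new, phi_new


def max_deviation(eps, I0=1.0, phi0=0.0, K=1.0, dt=0.01):
    """Largest |I(t) - J(t)| on 0 <= t <= K/eps, with J(t) = I0*exp(-eps*t)."""
    n_steps = int(round(K / (eps * dt)))
    I, phi = I0, phi0
    worst = 0.0
    for step in range(1, n_steps + 1):
        I, phi = rk4_step(I, phi, eps, dt)
        J = I0 * math.exp(-eps * step * dt)  # solution of averaged system [2]
        worst = max(worst, abs(I - J))
    return worst


for eps in (0.02, 0.01, 0.005):
    dev = max_deviation(eps)
    print(f"eps = {eps:<6} max|I - J| = {dev:.5f}  ratio = {dev / eps:.3f}")
```

If the deviation is indeed $O(\varepsilon)$, the printed ratio should stay roughly constant as $\varepsilon$ is halved. For this particular toy system the exact solution is $I(t) = I_0 \exp(-\varepsilon(t + \sin(\varphi_0 + t) - \sin\varphi_0))$, so the deviation is bounded by $I_0(e^{2\varepsilon} - 1)$, which gives an independent check on the numerics.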
232 Axiomatic Approach to Topological Quantum Field Theory
Axioms of a TQFT

An $(n + 1)$-dimensional TQFT $(V, \tau)$ over a scalar field $k$ assigns to every closed oriented $n$-dimensional manifold $X$ a finite-dimensional vector space $V(X)$ over $k$ and assigns to every cobordism $(M, X, Y)$ a $k$-linear map

$$\tau(M) = \tau(M, X, Y) : V(X) \to V(Y)$$

Here a cobordism $(M, X, Y)$ between $X$ and $Y$ is a compact oriented $(n + 1)$-dimensional manifold $M$ endowed with a diffeomorphism $\partial M \cong \overline{X} \sqcup Y$ (the overline indicates the orientation reversal). All manifolds and cobordisms are supposed to be smooth. A TQFT must satisfy the following axioms.

1. Naturality Any orientation-preserving diffeomorphism of closed oriented $n$-dimensional manifolds $f : X \to X'$ induces an isomorphism $f_\sharp : V(X) \to V(X')$. For a diffeomorphism $g$ between the cobordisms $(M, X, Y)$ and $(M', X', Y')$, the following diagram is commutative:

$$\begin{array}{ccc} V(X) & \xrightarrow{\ (g|_X)_\sharp\ } & V(X') \\ \tau(M) \downarrow & & \downarrow \tau(M') \\ V(Y) & \xrightarrow{\ (g|_Y)_\sharp\ } & V(Y') \end{array}$$

$$\begin{array}{ccc} V(X \sqcup \emptyset) & \cong & V(X) \otimes k \\ \downarrow & & \downarrow \\ V(X) & \cong & V(X) \end{array}$$

Here $\otimes = \otimes_k$ is the tensor product over $k$. The vertical maps are respectively the ones induced by the obvious diffeomorphisms, and the standard isomorphisms of vector spaces.

5. Symmetry The isomorphism

$$V(X \sqcup Y) \cong V(Y \sqcup X)$$

induced by the obvious diffeomorphism corresponds to the standard isomorphism of vector spaces

$$V(X) \otimes V(Y) \cong V(Y) \otimes V(X)$$

Given a TQFT $(V, \tau)$, we obtain an action of the group of diffeomorphisms of a closed oriented $n$-dimensional manifold $X$ on the vector space $V(X)$. This action can be used to study this group.

An important feature of a TQFT $(V, \tau)$ is that it provides numerical invariants of compact oriented $(n + 1)$-dimensional manifolds without boundary. Indeed, such a manifold $M$ can be considered as a cobordism between two copies of $\emptyset$ so that $\tau(M) \in \operatorname{Hom}_k(k, k) = k$. Any compact oriented $(n + 1)$-dimensional manifold $M$ can be considered as a
cobordism between $\emptyset$ and $\partial M$; the TQFT assigns to this cobordism a vector $\tau(M)$ in $\mathrm{Hom}_k(k, V(\partial M)) = V(\partial M)$ called the vacuum vector.

The manifold $[0, 1] \times X$, considered as a cobordism from $\overline{X} \sqcup X$ to $\emptyset$, induces a nonsingular pairing

\[ V(\overline{X}) \otimes V(X) \to k \]

We obtain a functorial isomorphism $V(\overline{X}) = V(X)^* = \mathrm{Hom}_k(V(X), k)$.

We now outline definitions of several important classes of TQFTs.

If the scalar field $k$ has a conjugation and all the vector spaces $V(X)$ are equipped with natural nondegenerate Hermitian forms, then the TQFT $(V, \tau)$ is Hermitian. If $k = \mathbb{C}$ is the field of complex numbers and the Hermitian forms are positive definite, then the TQFT is unitary.

A TQFT $(V, \tau)$ is nondegenerate or cobordism generated if for any closed oriented $n$-dimensional manifold $X$, the vector space $V(X)$ is generated by the vacuum vectors derived as above from the manifolds bounded by $X$.

Fix a Dedekind domain $D \subset \mathbb{C}$. A TQFT $(V, \tau)$ over $\mathbb{C}$ is almost $D$-integral if it is nondegenerate and there is $d \in \mathbb{C}$ such that $d\,\tau(M) \in D$ for all $M$ with $\partial M = \emptyset$. Given an almost integral TQFT $(V, \tau)$ and a closed oriented $n$-dimensional manifold $X$, we define $S(X)$ to be the $D$-submodule of $V(X)$ generated by all the vacuum vectors. This module is preserved under the action of self-diffeomorphisms of $X$ and yields a finer arithmetic version of $V(X)$.

The notion of an $(n+1)$-dimensional TQFT over $k$ can be reformulated in the categorical language as a symmetric monoidal functor from the category of $n$-manifolds and $(n+1)$-cobordisms to the category of finite-dimensional vector spaces over $k$. The source category is called the $(n+1)$-dimensional cobordism category. Its objects are closed oriented $n$-dimensional manifolds. Its morphisms are cobordisms considered up to the following equivalence: cobordisms $(M, X, Y)$ and $(M', X, Y)$ are equivalent if there is a diffeomorphism $M \to M'$ compatible with the diffeomorphisms $\partial M \cong \overline{X} \sqcup Y \cong \partial M'$.

TQFTs in Low Dimensions

TQFTs in dimension $0 + 1 = 1$ are in one-to-one correspondence with finite-dimensional vector spaces. The correspondence goes by associating with a one-dimensional TQFT $(V, \tau)$ the vector space $V(\mathrm{pt})$ where pt is a point with positive orientation.

Let $(V, \tau)$ be a two-dimensional TQFT. The linear map $\tau$ associated with a pair of pants (a 2-disk with two holes considered as a cobordism between two circles $S^1 \sqcup S^1$ and one circle $S^1$) defines a commutative multiplication on the vector space $A = V(S^1)$. The 2-disk, considered as a cobordism between $S^1$ and $\emptyset$, induces a nondegenerate trace on the algebra $A$. This makes $A$ into a commutative Frobenius algebra (also called a symmetric algebra). This algebra completely determines the TQFT $(V, \tau)$. Moreover, this construction defines a one-to-one correspondence between equivalence classes of two-dimensional TQFTs and isomorphism classes of finite-dimensional commutative Frobenius algebras (Kock 2003).

The formalism of TQFTs was to a great extent motivated by the three-dimensional case, specifically, Witten's Chern–Simons TQFTs. A mathematical definition of these TQFTs was first given by Reshetikhin and Turaev using the theory of quantum groups. The Witten–Reshetikhin–Turaev three-dimensional TQFTs do not satisfy exactly the definition above: the naturality and the functoriality axioms only hold up to invertible scalar factors called framing anomalies. Such TQFTs are said to be projective. In order to get rid of the framing anomalies, one has to add extra structures on the three-dimensional cobordism category. Usually one endows surfaces $X$ with Lagrangians (maximal isotropic subspaces in $H_1(X; \mathbb{R})$). For 3-cobordisms, several competing but essentially equivalent additional structures are considered in the literature: 2-framings (Atiyah 1989), $p_1$-structures (Blanchet et al. 1995), numerical weights (K Walker, V Turaev).

Large families of three-dimensional TQFTs are obtained from the so-called modular categories. The latter are constructed from quantum groups at roots of unity or from the skein theory of links. See Quantum 3-Manifold Invariants.

Additional Structures

The axiomatic definition of a TQFT extends in various directions. In dimension 2 it is interesting to consider the so-called open–closed theories involving 1-manifolds formed by circles and intervals and two-dimensional cobordisms with boundary (G Moore, G Segal). In dimension 3 one often considers cobordisms including framed links and graphs whose components (resp. edges) are labeled with objects of a certain fixed category $\mathcal{C}$. In such a theory, surfaces are endowed with finite sets of points labeled with objects of $\mathcal{C}$ and enriched with tangent directions. In all dimensions one can study manifolds and cobordisms endowed with homotopy classes of mappings to a fixed space (homotopy quantum field theory, in the sense of Turaev). Additional structures on the tangent bundles, spin
234 Axiomatic Quantum Field Theory
structures, framings, etc., may also be considered provided the gluing is well defined.

See also: Braided and Modular Tensor Categories; Hopf Algebras and q-Deformation Quantum Groups; Indefinite Metric; Quantum 3-Manifold Invariants; Topological Gravity, Two-Dimensional; Topological Quantum Field Theory: Overview.

Further Reading

Atiyah M (1989) Topological quantum field theories. Publications Mathématiques de l'IHÉS 68: 175–186.
Bakalov B and Kirillov A Jr. (2001) Lectures on Tensor Categories and Modular Functors. University Lecture Series, vol. 21. Providence, RI: American Mathematical Society.
Blanchet C, Habegger N, Masbaum G, and Vogel P (1995) Topological quantum field theories derived from the Kauffman bracket. Topology 34: 883–927.
Kock J (2003) Frobenius Algebras and 2D Topological Quantum Field Theories. LMS Student Texts, vol. 59. Cambridge: Cambridge University Press.
Quinn F (1995) Lectures on axiomatic topological quantum field theory. In: Freed DS and Uhlenbeck KK (eds.) Geometry and Quantum Field Theory, pp. 325–453. IAS/Park City Mathematical Series, University of Texas, Austin: American Mathematical Society.
Segal G (1988) Two-dimensional conformal field theories and modular functors. In: Simon B, Truman A, and Davies IM (eds.) IXth International Congress on Mathematical Physics, pp. 22–37. Bristol: Adam Hilger Ltd.
Turaev V (1994) Quantum Invariants of Knots and 3-Manifolds. de Gruyter Studies in Mathematics, vol. 18. Berlin: Walter de Gruyter.
Witten E (1988) Topological quantum field theory. Communications in Mathematical Physics 117(3): 353–386.
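
The correspondence between two-dimensional TQFTs and commutative Frobenius algebras stated in this article can be illustrated numerically. The following is a minimal sketch, not part of the article: it assumes the standard consequence of the gluing axioms that the invariant of the closed surface of genus g is obtained by applying the trace to the g-th power of the "handle element" of the algebra. The function name `closed_surface_invariants`, the array layout, and the example algebra Q[x]/(x^2) with trace eps(1) = 0, eps(x) = 1 are all illustrative assumptions.

```python
import numpy as np

def closed_surface_invariants(mult, counit, max_genus):
    """Invariants of closed surfaces of genus 0..max_genus (illustrative helper).

    mult[i, j, k] is the coefficient of e_k in the product e_i * e_j,
    counit[k] is the trace eps(e_k), and e_0 is assumed to be the unit.
    """
    mult = np.asarray(mult, dtype=float)
    counit = np.asarray(counit, dtype=float)
    # Pairing eta_ij = eps(e_i e_j); it must be invertible (nondegenerate trace).
    pairing = np.einsum("ijk,k->ij", mult, counit)
    pairing_inv = np.linalg.inv(pairing)
    # Handle element h = sum_ij eta^{ij} e_i e_j (torus with two boundary circles).
    handle = np.einsum("ij,ijk->k", pairing_inv, mult)
    power = np.zeros(len(counit))
    power[0] = 1.0  # h^0 = unit (the disk read as a map k -> A)
    values = []
    for _ in range(max_genus + 1):
        values.append(float(counit @ power))  # eps(h^g)
        power = np.einsum("i,j,ijk->k", power, handle, mult)  # multiply by h
    return values

# Assumed example: A = Q[x]/(x^2), basis e_0 = 1, e_1 = x, trace eps(x) = 1.
mult = np.zeros((2, 2, 2))
mult[0, 0, 0] = 1.0  # 1 * 1 = 1
mult[0, 1, 1] = 1.0  # 1 * x = x
mult[1, 0, 1] = 1.0  # x * 1 = x   (x * x = 0)
counit = np.array([0.0, 1.0])
invariants = closed_surface_invariants(mult, counit, 2)
```

In this sketch the value for the torus equals the dimension of the algebra, as expected for any two-dimensional TQFT, since the torus invariant is the trace of the identity map on V(S^1).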
Quantum Field Theory in Curved Spacetime, Continuity as a distribution For all , 2 D, the
13
Symmetries in Quantum Field Theory of Lower linear functionals T, , on C1
0 (R ) defined by
Spacetime Dimensions, and Thermal Quantum Field
a
Theory). T;; : h; Fa i
Covariance There exist strongly continuous uni- (using the nuclear theorem) and
tary representations U and T of SL(2, C) and
wa11
aN a1 aN
N : h; F1 N i: 1
Spectrum condition The support of the Fourier and an antilinear involution by ( , ) : ( , ).
transform of each Wa11 aNN is contained in (V )N1 . This endows BI with the structure of a nonabelian
-algebra with unit element 1 = (1, ;) (Borchers
The uniqueness of the vacuum vector (up to a
algebra).
phase) is equivalent to the following condition.
If one defines F; (z) : z1, then w; (z) = z, and the
Cluster property For N 2, let x be a spacelike Wightman functions induce a C-linear functional !
vector in R13 , let L be a natural number < N, and on BI by
let and be tempered test functions on (R13 )L
13 NL ! ; : w 2
there is (up to unitary equivalence) a unique family $F_1, \ldots, F_m$ of Gårding–Wightman fields with $n_1, \ldots, n_m$ components such that eqn [1] holds.

The proof uses the GNS construction known from the theory of operator algebras. The Borchers algebra plays several roles. On the one hand, it is a linear space with an inner product. The Hilbert space $\mathcal{H}$ and the invariant space $D$ of the field theory are constructed from this structure. On the other hand, the Borchers algebra acts on itself as an algebra of linear operators by its own algebra multiplication. This is the structure the $*$-algebra of field operators is constructed from.

Results

The mathematical and structural analysis of quantum fields has improved the understanding of scattering theory in the different approaches mentioned above; see Bogoliubov et al. (1975) and the relevant articles in this encyclopedia. Apart from this, the following results deserve to be mentioned. Evidently, many others have to be omitted for practical reasons.

PCT Symmetry

An early famous result was Lüders's proof (1957) that all fields in the above setting exhibit PCT symmetry, that is, the symmetry under reflections in all space and time variables combined with a charge conjugation. This symmetry is exhibited by all particle reactions observed so far. The proof, like several of the main results, made extensive use of the fact that the N-point functions are boundary values of analytic functions due to the spectrum condition, and that a fundamental theorem by Bargmann, Hall, and Wightman (1957) yields invariant analytic extensions.

Reeh–Schlieder Theorem

For each field $F^a$ and each bounded open region $O \subset \mathbb{R}^{1+3}$, the vacuum vector is cyclic with respect to $F^a(O)$ (Reeh and Schlieder 1961). So excitations of the vacuum vector by field operators located in $O$ are not to be considered as state vectors of a particle localized in $O$, since they are not perpendicular to the excitations by field operators located outside $O$.

Unruh Effect and Modular $P_1CT$ Symmetry

In the 1970s, Bisognano and Wichmann (1975, 1976) discovered a surprising link of symmetries to the intrinsic algebraic structure of quantum fields, which is established by the Tomita–Takesaki modular theory (see Tomita–Takesaki Modular Theory). Namely, the unitary operators implementing the Lorentz boosts on the fields are elements of modular groups. This means that a uniformly accelerated observer perceives the vacuum as a thermal state with a temperature proportional to its acceleration, corresponding to the famous Unruh effect.

In addition, it was shown that $P_1CT$ symmetries (i.e., PCT combined with rotations by the angle $\pi$) are implemented by modular conjugations (modular $P_1CT$ symmetry). Modular $P_1CT$ symmetry is a consequence of the Unruh effect (Guido and Longo 1995).

Spin and Statistics

Immediately following Lüders's PCT theorem, the spin–statistics theorem was proved for the N-point functions of the Wightman setting (Lüders and Zumino 1958, Burgoyne 1958, Dell'Antonio 1961). This was a remarkable and widely acknowledged progress. But as remarked earlier, the confinement to finite-component fields, which is used in the proof, cannot be motivated by physical first principles (i.e., in a truly axiomatic fashion). The representation $D$ of SL(2, C) acting on the components, however, is forced to be finite dimensional by this assumption, and since the representations $D^a$ are objects of investigation, a considerable part of the result is assumed this way from the outset. Even more so, there are examples of fields with a wrong spin–statistics connection and infinitely many components.

This was one reason to continue working on the subject. At the beginning of the 1990s, it was found that the spin–statistics theorem can be derived from the symmetries discovered by Bisognano and Wichmann, and Unruh. Two approaches not referring to the number of internal degrees of freedom have been worked out: one assumes the Unruh effect (Guido and Longo 1995), the other modular $P_1CT$ symmetry (Kuckert 1995, 2005, Kuckert and Lorenzen 2005). The first approach has been generalized to conformal fields, the second to the case that the symmetry group's homogeneous part is not SL(2, C), but only SU(2).

Both approaches can be applied to infinite-component fields. They yield existence theorems; a distinguished representation is constructed from the modular symmetries, and this representation exhibits Pauli's spin–statistics connection. As mentioned before, nothing more can be expected at this level of generality. The line of argument works in both the algebraic and the Wightman setting.

A Dynamical Property of the Vacuum

One can derive the spectrum condition, the Bisognano–Wichmann symmetries/the Unruh effect, and
covariance from the condition that no (inertial or) uniformly accelerated observer can extract mechanical energy from the field in vacuo by means of a cyclic process (Kuckert 2002).

Interacting Fields

The examples of interacting quantum fields that fit into the above settings live in one or two spatial dimensions only, and their relevance for physics mainly consists in being such examples. This has contributed to some frustration and to doubts on whether one is not, in fact, proving theorems on pretty empty sets, or in other words, working on the most sophisticated theory of the free field.

The computations in quantum field theory are, like most of the computations in physics, perturbative. In order to be successful, they need to yield good agreement with experiment with reasonable computational efforts, that is, by evolution up to the second or third order. This asymptotic convergence is more important than convergence of the series as a whole.

There are low-dimensional examples of interacting Wightman fields (e.g., $(\phi^4)_2$; cf. the monograph by Glimm and Jaffe (1987)), and time will tell whether four-dimensional interacting Wightman fields exist. But there is no reason to expect convergence for general interacting fields; for example, QED does not fit into the Wightman framework.

The appropriate extension of the Wightman setting has been formulated by Epstein and Glaser (1973). It defines the S-matrix rather than the field itself as a (in general divergent) formal power series of operator-valued distributions.

The above results apply to this somewhat more modest setting as well, so the axiomatic approaches do help in understanding the known high-energy physics interactions. This even includes gauge theories (see Perturbative Renormalization Theory and BRST). The high-precision results of QED can be reproduced within this setting, and there occur no UV singularities: renormalization amounts to the need to extend distributions by fixing some parameters, that is, the renormalization constants. The infrared problem is circumvented by considering the S-matrix as a (position-dependent) distribution taking values in the unitary formal power series of distributions rather than as a single (global) unitary operator (or unitary power series).

Quantum Energy Inequalities

Energy densities of Wightman fields admit negative expectation values (Epstein, Glaser, and Jaffe 1965). This is in contrast to the positivity conditions that the energy–momentum tensors of classical general (and, hence, also special) relativity have to satisfy to ensure causality. But the conflict can be solved by smearing the densities out in space or time, as has first been realized by Ford (1991). The extent to which the energy density can become negative depends on the extent to which it is smeared out: more smearing means less violation of positivity, so the classical positivity conditions are restored at medium and large scales. There are many ways to make this principle concrete. Quantum energy inequalities hold for thermodynamically well-behaved quantum fields on causally well-behaved classical spacetime backgrounds.

Bibliographic Notes

Important monographs on axiomatic quantum field theory are those by Streater and Wightman (1964), Jost (1965), Bogoliubov et al. (1975), and Bogoliubov et al. (1990). Note that the books of Bogoliubov et al. differ in setup fundamentally and that neither replaces the other. For a lecture notes volume, see also Völkel (1977), and for a review article, see Streater (1975). A valuable discussion of the Wightman axioms can also be found in the second volume of the series by Reed and Simon (1970).

The first monograph on the algebraic approach to quantum field theory is due to Haag (1992), a more recent one has been written by Araki (1999).

Concerning the sufficient conditions for switching between the Gårding–Wightman and the algebraic approach, see Wollenberg (1988) and the Ph.D. thesis of Bostelmann (2000) and references given there. Dynamical and thermodynamical foundation of standard axioms, the Bisognano–Wichmann symmetries (Unruh effect), and the spin–statistics theorem, have been investigated by Kuckert (2002, 2005), see also the references given there for related work.

In different formulations and at differing degrees of mathematical sophistication, the causal approach to perturbation theory can be found in the monographs by Bogoliubov and Shirkov (1959), Scharf (1989, 2001), and Steinmann (2000). Two modern review articles have been written by Brunetti and Fredenhagen (2000) and by Dütsch and Fredenhagen (2004).

The reference original articles on the Euclidean axioms are those of Osterwalder and Schrader (1973, 1975). Note that the first one contains an error (cf. also Zinoviev (1995)). A monograph on Euclidean field theory and its relations to the other axiomatic settings of quantum field theory and to statistical mechanics is that by Glimm and Jaffe (1987).

A recent review on quantum energy inequalities is due to Fewster (2003).
\[ w_{,xy} = \sin w \tag{2} \]

where the subscript denotes partial derivative. Equation [2] is nowadays called the sine–Gordon (sG) equation. Bianchi (1879), Lie (1888, 1890, 1893), and Bäcklund (1874) introduced a transformation which allows one to pass from a solution of eqn [2] to a new solution, that is, from a surface of constant curvature to a new one. Starting from the work of Clairin (1903), this transformation has been referred to as Bäcklund transformation (BT). The BT for eqn [2] reads

\[ \tilde{w}_{,x} = w_{,x} + 2a \sin\frac{\tilde{w} + w}{2} \tag{3a} \]

\[ \tilde{w}_{,y} = -w_{,y} + \frac{2}{a} \sin\frac{\tilde{w} - w}{2} \tag{3b} \]

where $a$ is a nonzero constant parameter and $\tilde{w}$ is a different solution of eqn [2]. It is immediate to prove by appropriate differentiation of eqns [3] with respect to $y$ and $x$ that both $w$ and $\tilde{w}$ must satisfy eqn [2]. The BT [3] provides a denumerable set of exact solutions once a solution $w$ is known. Bianchi

we mean a BT from $w$ to $w'$ with parameter $a$.

For sG equation [2] a trivial solution is given, for example, by $w(x, y) = -\pi$. Then, from eqn [3a] we get

\[ \tilde{w}(x, y) = 2 \arcsin\frac{1 - e^{2[ax + \alpha(y)]}}{1 + e^{2[ax + \alpha(y)]}} \]

Introducing this result in eqn [3b], we get $\alpha_{,y} = -1/a$. So, the application of the BT [3] to sG equation gives the nontrivial solution

\[ \tilde{w} = 4 \arctan\frac{1 - e^{[ax - y/a]}}{1 + e^{[ax - y/a]}} \tag{6} \]

Clairin (1903) extended the results of Bäcklund to the case of a generic partial differential equation of second order,

\[ F(x, y, w, w_{,x}, w_{,y}, w_{,xx}, w_{,xy}, w_{,yy}) = 0 \tag{7} \]

by assuming that

\[ w_{,x} = f(w, \tilde{w}, \tilde{w}_{,x}, \tilde{w}_{,y}), \qquad w_{,y} = g(w, \tilde{w}, \tilde{w}_{,x}, \tilde{w}_{,y}) \tag{8} \]
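
As a hedged illustration of how a Bäcklund transformation generates a nontrivial solution from a trivial one, the sketch below checks a one-soliton-type solution of the sine–Gordon equation numerically. It is not taken from the article. It assumes the classical form of the transformation, `w1_x = w0_x + 2a sin((w1 + w0)/2)` and `w1_y = -w0_y + (2/a) sin((w1 - w0)/2)`, for the equation `w_xy = sin w`, together with the constant seed `w0 = -pi`; the variable names and sample points are illustrative.

```python
import sympy as sp

x, y = sp.symbols("x y", real=True)
a = sp.symbols("a", real=True, nonzero=True)

theta = a * x - y / a
w_seed = -sp.pi  # assumed trivial seed: sin(-pi) = 0, all derivatives vanish
w_new = 4 * sp.atan((1 - sp.exp(theta)) / (1 + sp.exp(theta)))

# Residuals of the sine-Gordon equation and of the two Backlund relations.
# The derivative terms of the constant seed are zero and therefore omitted.
residuals = {
    "sine-Gordon": sp.diff(w_new, x, y) - sp.sin(w_new),
    "BT, x part": sp.diff(w_new, x) - 2 * a * sp.sin((w_new + w_seed) / 2),
    "BT, y part": sp.diff(w_new, y) - (2 / a) * sp.sin((w_new - w_seed) / 2),
}

# Evaluate the exact symbolic residuals at a few arbitrary sample points.
sample_points = [
    {x: 0.3, y: -0.7, a: 1.3},
    {x: -1.1, y: 0.4, a: 0.6},
    {x: 2.0, y: 1.5, a: -0.8},
]

def max_residual(expr):
    return max(abs(complex(expr.subs(point).evalf())) for point in sample_points)

max_errors = {name: max_residual(expr) for name, expr in residuals.items()}
```

The check is numerical rather than symbolic by design: simplifying nested `atan` and `sin` expressions symbolically can be slow, while evaluating the exact derivatives at sample points is cheap and sufficient to catch sign errors.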
242 Backlund Transformations
If the compatibility of eqns [8]

\[ f_{,y} - g_{,x} = 0 \tag{9} \]

is identically satisfied by eqn [7] for the variable $\tilde{w}(x, y)$, then we say that eqns [8] are an auto-Bäcklund transformation for eqn [7]. In this case, eqns [8] transform a solution of eqn [7] into a new solution of the same equation. Thus, eqns [8] simplify the problem of finding solutions of eqn [7]. Given one solution $\tilde{w}(x, y)$ of eqn [7], the existence of a BT reduces the problem of integrating eqn [7] into that of solving two first-order ordinary differential equations. From this point of view, the Cauchy–Riemann relations

\[ w_{,x} = \tilde{w}_{,y}, \qquad w_{,y} = -\tilde{w}_{,x} \tag{10} \]

for the Laplace equation

\[ w_{,xx} + w_{,yy} = 0 \tag{11} \]

are a BT ante litteram (however, without a free parameter).

Consider the case when $\tilde{w}(x, y)$ satisfies a different partial differential equation,

\[ G(x, y, \tilde{w}, \tilde{w}_{,x}, \tilde{w}_{,y}, \tilde{w}_{,xx}, \tilde{w}_{,xy}, \tilde{w}_{,yy}) = 0 \tag{12} \]

In this case, one still has a BT, but not an auto-BT. The best-known cases are when $F_1 = w_{,y} + w_{,xxx} + w w_{,x}$ and $G_1 = \tilde{w}_{,y} + \tilde{w}_{,xxx} - \tilde{w}^2 \tilde{w}_{,x}$, and $F_2 = w_{,xy} - e^{w}$ and $G_2 = \tilde{w}_{,xy}$ (Lamb 1976). In the first case, the BT relates the Korteweg–de Vries (KdV) equation to the modified KdV equation and this transformation paved the way to the discovery of the complete integrability of the KdV equation by Gardner et al. (1967). In the second case, the BT relates the Liouville equation to the wave equation, and can be used to solve it completely. Due to the first example, often a non-auto-BT is denoted as Miura transformation.

One can now state an operative definition of BT, extending the results of Bäcklund and Clairin to more general equations.

Definition 1 Consider two partial differential equations of order $m_1$ and $m_2$:

\[ F_1(x, u, u^{(1)}, u^{(2)}, \ldots, u^{(m_1)}) = 0 \tag{13a} \]

\[ F_2(x, \tilde{u}, \tilde{u}^{(1)}, \tilde{u}^{(2)}, \ldots, \tilde{u}^{(m_2)}) = 0 \tag{13b} \]

where $x \in \mathbb{R}^n$ and $(u, \tilde{u}) \in \mathbb{C}^p$, and $u^{(k)}$ is the set of $k$-order derivatives of $u$. The set of $n$ equations

\[ G_j(x, u, u^{(1)}, \ldots, u^{(s_1)}, \tilde{u}, \tilde{u}^{(1)}, \ldots, \tilde{u}^{(s_2)}) = 0, \qquad j = 1, 2, \ldots, n \tag{14} \]

with $s_1 < m_1$ and $s_2 < m_2$, represents the BT of eqns [13] iff the compatibility of eqns [14] is identically satisfied on the solutions of eqns [13] and $G_j$ depends on a set of essential arbitrary constant parameters.

The Clairin formulation [8] and the classical BT for the sG [3] are clearly special subcases of this definition. When a solution of $F_1 = 0$ is known, a solution of $F_2 = 0$ is obtained by solving a set of lower-order partial differential equations. By a proper choice of the BT parameters, once a new solution is obtained by solving the BT [14], one can use the obtained solution as a starting point to construct another one, and so on. In this way, one can construct a whole ladder of solutions, a priori a denumerable set of solutions. This same construction has been applied also to the case of functional equations. In particular, it has been considered for the case of differential–difference and difference–difference equations both for finite (dynamical systems (Wojciechowski 1982)) and infinite lattices (Toda 1989).

In the case when $F_1$ and $F_2$ represent the same equation, $s_1 = s_2 = 1$ and the BTs $G_j = 0$ are linear in $u^{(1)}$, then Definition 1 is strictly related to the notion of nonclassical symmetry or conditional symmetry (Levi and Winternitz 1989, Olver 1993), an extension of the concept of Lie symmetry used to reduce and integrate a differential equation. In the case of the nonclassical symmetries, the known solution $\tilde{u}$ is included in the arbitrary $x$-dependent coefficients of the transformation. In this case, the BT is just a way to construct an explicit solution of the differential equation [7].

Definition 1 is often too general to be able to get explicit results. It is constructive for any partial differential equation, linear or nonlinear, but if one is not able to get a nontrivial BT this does not mean that a BT does not exist. As noted later, the existence of an auto-BT is associated to the existence of an infinity of symmetries, and this is a condition for the exact integrability of eqn [13] (Fokas 1980, Ibragimov and Shabat 1980). So, the existence of a BT is closely related to the integrability of eqn [13].

Bäcklund via Integrability

One can derive the BT from the integrability properties of eqn [13a]. Equation [13a] is said to be integrable if it can be written as the compatibility condition of an overdetermined system of linear partial differential equations for an auxiliary function $\psi$ depending on a free parameter belonging to the
complex $\mathbb{C}$ plane. The prototype of such a situation is given by the Lax pair for the KdV equation

\[ u_{,t} + u_{,xxx} - 6 u u_{,x} = 0 \tag{15} \]

introduced by Lax (1968):

\[ L\psi = k^2 \psi, \qquad L = -\partial_x^2 + u(x, t) \tag{16a} \]

\[ \psi_{,t} = -M\psi, \qquad M = 4\partial_{xxx} - 3\left(u\,\partial_x + \partial_x u\right) \tag{16b} \]

where $k$ is a free parameter and $\psi = \psi(x, t; k)$. As eqn [16a] is nothing else but the stationary Schrödinger equation, the function $\psi$ can be interpreted as a wave function, and $k^2$ is the spectral parameter corresponding to the potential $u(x, t)$. The condition for the existence of a solution of the overdetermined system of eqns [16] is given by the operator equation

\[ L_{,t} = [L, M] \tag{17} \]

the so-called Lax equation. In the case of asymptotically bounded potentials, eqn [16a] defines the spectrum uniquely. Introducing the following asymptotic boundary conditions for the wave function $\psi$,

\[ \psi(x, t; k) \to T(k, t)\, e^{-ikx} \quad (x \to -\infty), \qquad \psi(x, t; k) \to e^{-ikx} + R(k, t)\, e^{ikx} \quad (x \to +\infty) \tag{18} \]

where $R(k, t)$ and $T(k, t)$ are, respectively, the reflection and the transmission coefficient, the spectrum is defined in the complex plane of the variable $k$ by

\[ S[u] = \{ R(k, t),\ -\infty < k < \infty;\ p_n,\ c_n(t),\ n = 1, 2, \ldots, N \} \tag{19} \]

where $p_n$ are the bound state parameters corresponding to isolated singularities of the reflection coefficients on the imaginary positive $k$-axis corresponding to a solution $\psi_n(x, t; p_n)$ of the spectral problem vanishing for $x \to \pm\infty$ and such that

\[ \lim_{x \to +\infty} e^{p_n x}\, \psi_n(x, t; p_n) = 1 \tag{20} \]

and $c_n$ are some functions of $t$ related to the residues of $R(k, t)$ at the poles $p_n$. There is a one-to-one correspondence between the evolution of the potential $u(x, t)$ in eqn [15] and that of the spectrum $S[u]$ of the Schrödinger spectral problem [16a]. In particular, for the KdV, taking into account eqn [16b], the evolution of the reflection coefficient $R(k, t)$ is given by

\[ \frac{dR(k, t)}{dt} = 8ik^3 R(k, t) \tag{21} \]

In eqn [21] and henceforth, $d/dt$ denotes the total derivative with respect to $t$.

In the following, for the sake of the simplicity of exposition and for the concreteness of the presentation, all the results presented on the BT will be derived for the KdV equation. Similar results can be obtained and have been obtained in the literature for many classes of integrable partial differential equations in two and three dimensions and for differential–difference and difference–difference equations. For a partial review of the available recent literature on the subject, see Rogers and Shadwick (1982) and Coley et al. (2001).

A more general form of introducing the nonlinear partial differential equation as a compatibility of an overdetermined system of linear equations has been provided by Zakharov and Shabat (1979) with the dressing method (DM). In the DM, the differential equations [16] are substituted by a matrix system of linear equations

\[ \Psi_{,x} = U(u(x, t); k)\,\Psi \tag{22a} \]

\[ \Psi_{,t} = V(u(x, t); k)\,\Psi \tag{22b} \]

where $\Psi = \Psi(x, t; k)$ and $U$ and $V$ are matrix functions. The existence of a nonsingular solution of the system of linear equations [22] requires that the matrix functions $U$ and $V$ satisfy the equation

\[ U_{,t} - V_{,x} + [U, V] = 0 \tag{23} \]

often called zero-curvature condition. The KdV equation [15] in the DM is obtained by choosing

\[ U(u(x, t); k) = \begin{pmatrix} -ik & u(x, t) \\ 1 & ik \end{pmatrix}, \qquad V(u(x, t); k) = \begin{pmatrix} u_{,x} - 2iku - 4ik^3 & 2u(u + 2k^2) + 2iku_{,x} - u_{,xx} \\ 2u + 4k^2 & -u_{,x} + 2iku + 4ik^3 \end{pmatrix} \tag{24} \]

The existence of an auto-BT implies the existence of a differential equation (see Definition 1) which relates two solutions of the same nonlinear equation. The new solution $\tilde{u}(x, t)$ of eqn [15] will be associated to a different Lax operator and a different spectral problem (but of the same operational form)

\[ \tilde{L} = -\partial_{xx} + \tilde{u}(x, t) \tag{25a} \]

\[ \tilde{L}\tilde{\psi} = k^2 \tilde{\psi} \tag{25b} \]
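
The statement that the KdV equation arises as a zero-curvature condition can be checked symbolically. The sketch below is illustrative and not part of the article: it assumes the sign conventions `u_t + u_xxx - 6 u u_x = 0` for the KdV equation and `U_t - V_x + [U, V] = 0` for the zero-curvature condition, with the specific 2x2 matrices written in the code. Under these assumptions the curvature matrix should vanish except for one entry, which reproduces the KdV expression.

```python
import sympy as sp

x, t, k = sp.symbols("x t k")
u = sp.Function("u")(x, t)
ux, uxx = sp.diff(u, x), sp.diff(u, x, 2)
I = sp.I

# Assumed matrices of the linear system Psi_x = U Psi, Psi_t = V Psi.
U = sp.Matrix([[-I * k, u],
               [1, I * k]])
A = ux - 2 * I * k * u - 4 * I * k**3
V = sp.Matrix([[A, 2 * u * (u + 2 * k**2) + 2 * I * k * ux - uxx],
               [2 * u + 4 * k**2, -A]])

# Zero-curvature expression U_t - V_x + [U, V], expanded entry by entry.
curvature = (U.diff(t) - V.diff(x) + U * V - V * U).applyfunc(sp.expand)

# KdV expression in the assumed normalization.
kdv = sp.diff(u, t) + sp.diff(u, x, 3) - 6 * u * ux
kdv_mismatch = sp.expand(curvature[0, 1] - kdv)
```

All dependence on the spectral parameter `k` cancels in the expanded curvature, which is the reason the same matrices work for every value of `k`.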
The existence of a relation between the potentials $u(x, t)$ and $\tilde{u}(x, t)$ thus implies that there must be a $(u, \tilde{u}; k)$-dependent operator $D$ such that

\[ \tilde{\psi} = D\psi \tag{26} \]

The compatibility of eqns [16a], [25b], and [26] implies that $\tilde{L}D\psi = Dk^2\psi$, that is,

\[ \tilde{L}D = DL \tag{27} \]

Equation [27] is the auto-BT in the Lax formalism. If $L$ and $\tilde{L}$ are two different spectral problems related to two different nonlinear partial differential equations, then eqn [27] will provide a Miura transformation. In the DM, the requirement of the existence of a BT is given again by eqn [26] with $\tilde{\psi}$ and $\psi$ substituted by $\tilde{\Psi}$ and $\Psi$ and the operator $D$ substituted by a matrix function $D$. The BT in the DM is given by

\[ D_{,x} = U(\tilde{u}(x, t); k)\,D - D\,U(u(x, t); k) \tag{28a} \]

\[ D_{,t} = V(\tilde{u}(x, t); k)\,D - D\,V(u(x, t); k) \tag{28b} \]

In the particular case of the Hilbert–Riemann problem with zeros, providing the soliton solutions, the matrix $D$ can be expressed as a function of $\Psi$. In this way, one derives the Moutard or Darboux transformation (DT) (Moutard 1878, Levi et al. 1984), the most efficient way to get soliton solutions of the nonlinear partial differential equation.

Given a linear ordinary differential equation for the unknown $\psi$, depending on a set of arbitrary functions $u(x)$ and parameters $k$, the DT provides a discrete transformation which leaves the equation invariant. In the particular case of the KdV equation associated with the stationary Schrödinger spectral problem [16a], we have

\[ \tilde{u}(x, t) = u(x, t) - 2\left[\log F(x, t)\right]_{,xx} \tag{29a} \]

of the spectral problem, eqn [29a] provides a new solution of the KdV, while eqn [29b] gives a new solution of the spectral problem. This procedure can be carried out recursively and gives a ladder of explicit solutions for the KdV equation.

The DM is a particularly simple setting in which one can derive DTs. In fact, expressing the matrix $D$ in terms of $\Psi$, eqn [28a] gives a relation between the potentials of the type given by eqn [29a], while eqn [26] gives eqn [29b]. Depending on the form of the matrix $D$ in terms of $k$, one can introduce more parameters in the DT. The classical DT [29] depends on just one parameter; however, in the case of the Schrödinger spectral problem [16a], one can also have DTs depending on two parameters, a TDT.

A more general DT, which can provide solutions even when the initial solution is not bounded asymptotically, can be obtained for many equations and, in particular, also for the KdV equation. This is obtained in a particular limit of the TDT when the parameters coincide (Levi 1988) and it is often referred to as binary DT (Matveev and Salle 1991). The binary DT for the KdV is given by

\[ \tilde{u}(x, t) = u(x, t) - 2\left[\log F(x, t)\right]_{,xx} \tag{30a} \]

\[ \tilde{\psi}(x, t; k) = \frac{1}{k^2 - \lambda^2}\left\{\left[k^2 - \lambda^2 - \frac{F_{,xx}(x, t)}{2F(x, t)}\right]\psi(x, t; k) + \frac{F_{,x}(x, t)}{F(x, t)}\,\psi_{,x}(x, t; k)\right\} \tag{30b} \]

where $\lambda$ is a value of $k$ for which the function $\psi(x, t; k)$ is asymptotically bounded at $\infty$ and the function $F(x, t)$ is given by

\[ F(x, t) = 1 + \int_x^{\infty} \psi(y, t; \lambda)^2\, dy \tag{31} \]
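
As a hedged illustration of how a Darboux-type transformation produces a soliton, the sketch below applies the formula "new potential = old potential minus twice the second x-derivative of log F" to the zero potential. It is not taken from the article. It assumes the KdV normalization `u_t + u_xxx - 6 u u_x = 0`, the choice `F = 1 + integral from x to infinity of psi^2`, and a decaying exponential `psi = exp(-p x + 4 p^3 t)` as the eigenfunction of the free Schrödinger problem; the names `p`, `psi`, `F`, and the sample points are illustrative.

```python
import sympy as sp

x, t = sp.symbols("x t", real=True)
p = sp.symbols("p", positive=True)

# Seed potential u = 0. Assumed eigenfunction: -psi_xx = -(p^2) psi,
# with time dependence fixed by psi_t = -4 psi_xxx.
psi = sp.exp(-p * x + 4 * p**3 * t)

# F = 1 + integral_x^oo psi(y, t)^2 dy, written in closed form.
F = 1 + psi**2 / (2 * p)
u_new = -2 * sp.diff(sp.log(F), x, 2)

# Residual of the KdV equation in the assumed normalization.
kdv_residual = (sp.diff(u_new, t) + sp.diff(u_new, x, 3)
                - 6 * u_new * sp.diff(u_new, x))

# The same profile written as a sech^2 soliton moving with speed 4 p^2.
soliton = -2 * p**2 * sp.sech(p * (x - 4 * p**2 * t) + sp.log(2 * p) / 2) ** 2

sample_points = [
    {x: 0.2, t: 0.1, p: 0.7},
    {x: -1.5, t: 0.4, p: 1.1},
    {x: 2.3, t: -0.3, p: 0.5},
]

def max_abs(expr):
    return max(abs(complex(expr.subs(point).evalf())) for point in sample_points)

max_kdv_error = max_abs(kdv_residual)
max_soliton_error = max_abs(u_new - soliton)
```

Repeating the step with the new potential and a new eigenfunction would add a further soliton, which is the "ladder of explicit solutions" mentioned in the text; the sketch stops after the first rung.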
equations and the operators D which give the Backlund and Symmetries
admissible BT. A technique to do so is provided by
A symmetry of the nonlinear equation [15] is given
the so-called Lax technique introduced by Bruschi
by a flow commuting with it, that is, by an
and Ragnisco (1980ac). Using the Lax technique,
equation
we can easily obtain the nonlinear partial differ-
ential equations and BT associated with the Lax u; f u; ux ; ut ; . . . 37
operator [16a] both in the isospectral and non-
isospectral case (when k,t = 0 and when k,t 6 0) where is the group parameter, u = u(x, t; ), and the
and the corresponding evolution of the spectrum. derivative of [15] is zero on its set of solutions.
We have A group transformation is obtained by integrating it.
Usually this is possible only when eqn [37] is a
u;t f L; tux gL; txux 2u 33a quasilinear partial differential equation of the first
order. Taking into account the evolution of the
k;t kg4k2 ; t spectrum of the KdV equation [15], it is easy to
dRk; t 33b prove that its symmetries are given by
2ikf 4k2 ; tRk; t ( )
dt X
1 X
1
n n
u; n L 3 n tL u;x
F~
u u G 1 0 33c n0 n0
( )
X
1
2 2 n Ln xu;x 2u 38
~ t F4k 2ikG4k Rk; t
Rk; 33d n0
F4k2 2ikG4k2
where n and n are a set of constant parameters.
where the functions f, g, F, and G are entire For each choice of the parameters n and n ,
functions of their first argument and the recursive one gets a symmetry of the KdV equation [15].
operators L and are given by With eqn [38] one can associate the following
evolution of the reflection coefficient R(k, t; ):
Lf x f;xx x 4ux; tf x (
Z 1 X
1
dR
2u;x x; t f y dy 34a 2ik n 4k2 n
x d n0
)
f x f;xx x 2~ux; t ux; tf x X 1
2 n1
Z 1 3 n t4k R 39
n0
f y dy 34b
x
and of the spectral parameter k
u;x x; t u;x x; tf x ~
f x ~ ux; t ux; t X
1
Z 1 k; n 4k2 n k 40
uy; t uy; tf y dy
~ 34c n0
x
As (1/2)L 1 = xu,x 2u, one can add to the
In the limit when u ! u the operator ! L. A BT symmetries [38] the exceptional one (which has no
is obtained by choosing the functions F and G in spectral counterpart as u is not bounded
eqn [33c]. The simplest BT is obtained by setting asymptotically):
F = and G = 1:
u; 1 6tu;x 41
~ v v 12~
v;x v;x ~ v v 0 35
By a proper natural choice of the constant para-
with u(x, t) = v,x (x, t) and is the Backlund meters n and n , one can define two infinite series
parameter. By combining together BT of the form of symmetries. The first one is obtained by choosing
[35] with different parameters as in eqn [5], we get n = 0 and n =
n, m with m = 1, 2, . . . , 1 and can
the permutability theorem for the KdV BTs: be denoted as the isospectral series as k, = 0. This is
formed by commuting symmetries. The second one
1 2 v0 v
~ is given by n = 0 and n =
n, m with m = 1, 2, . . . , 1
v0 v
~ 36 and can be denoted as the nonisospectral series as
1 2 1/2v0 ~v
k, 6 0. The nonisospectral symmetries have a
Its proof is immediate from the point of view of the nonzero commutation relation among themselves
spectrum. and with the isospectral ones.
Except for a few Lie point symmetries (given by which is an integrable differentialdifference
eqn [41] and by choosing inside the series [38] those approximation to the KdV equation or
with different from zero only 0 or 0 or 1 ) they
are all generalized symmetries (Olver 1993). By wn 1; t;t wn; t;t
analyzing their spectrum, it is easy to prove that the wn 1; t wn; t
2a sin 46
choice [38] is such that they are all independent. For 2
the isospectral class, the evolution of the spectrum is a discrete integrable differentialdifference approxima-
simple and can be integrated to provide the group tion to the sG equation (Hirota 1977, Orfanidis 1978).
transformation of the spectrum As the nonlinear superposition formulas are
Rk; t; Rk; t purely algebraic relations involving potentials asso-
" ( )# ciated with integrable nonlinear partial differential
X
1
equations, one can interpret them as difference
2 n
exp 2ik n 4k 42
n0
difference equations. In the case of the sG equation
from eqn [7], we have
Let us now consider the simplest BT obtained by
wn1;m1 wn;m
choosing, in eqn [33c], F() = and G() = 1, where
is an arbitrary parameter. In the spectral space, this a1 a2 wn;m1 wn1;m
4 arctan1 tan 47
corresponds to the following change of the spectrum: a1 a2 4
where w(x, t) = wn, m , w(x, t) = wn1, m , w0 (x, t) =
~ t 2ik Rk; t
Rk; 43
2ik wn, m1 , and w0 (x, t) = wn1, m1 . In a similar manner,
from [36], one gets
Defining R(k, t) = R(k, t; ), eqn [42] is equal to
eqn [43] iff 1 2 vn1;m vn;m1
vn1;m1 vn;m 48
1 2 12 vn1;m vn;m1
2
n ; n 0; 1; . . . ; 1 44
2n1 2n 1 The continuous limit of eqn [47], obtained by setting
x = 1 n and y = 2 m and choosing
So we need an infinite number of symmetries to
be able to reconstruct the change of the spectrum a1 1 2
given by the BT. This shows that the existence of a BT a2 4
is strictly connected to the existence of an infinity of gives back eqn [2] (Rogers and Schief 1997). It is
symmetries which is a condition for the exact worth mentioning that one can also use known
integrability of the nonlinear partial differential nonlinear lattice equations to construct BT for
equation (Fokas 1980, Ibragimov and Shabat 1980). nonlinear partial differential equations (Levi 1981).
Wronskian technique. Journal of Mathematical Physics 18: 690–700.
Clairin J (1903) Sur quelques équations aux dérivées partielles du second ordre. Annales de la Faculté des Sciences de Toulouse pour les Sciences Mathématiques et les Sciences Physiques. Série 2 5: 437–458.
Coley A, Levi D, Milson R, Rogers C, and Winternitz P (eds.) (2001) Bäcklund and Darboux transformations. The Geometry of solitons. Proceedings of the AARMS-CRM Workshop, Halifax, NS, June 4–9, 1999. CRM Proceedings and Lecture Notes, vol. 29. Providence, RI: American Mathematical Society.
Faddeev LD and Takhtajan LA (1987) Hamiltonian Methods in the Theory of Solitons. Berlin: Springer.
Fokas AS (1980) A symmetry approach to exactly solvable evolution equations. Journal of Mathematical Physics 21: 1318–1325.
Gardner CS, Greene JM, Kruskal MD, and Miura RM (1967) Method for solving the Korteweg–de Vries equation. Physical Review Letters 19: 1095–1097.
Hirota R (1978) Nonlinear partial difference equations. III. Discrete sine-Gordon equation. Journal of the Physical Society of Japan 43: 2079–2086.
Ibragimov NH and Shabat AB (1980) Infinite Lie–Bäcklund algebras (in Russian). Funktsional. Anal. i Prilozhen 14: 79–80.
Lamb GL (1976) Bäcklund transformations at the turn of the century. In: Miura RM (ed.) Bäcklund Transformations, pp. 69–79. Berlin: Springer.
Lax PD (1968) Integrals of nonlinear equations of evolution and solitary waves. Communications in Pure and Applied Mathematics 21: 647–690.
Levi D (1981) Nonlinear differential difference equations as Bäcklund transformations. Journal of Physics A: Mathematical and General 14: 1083–1098.
Levi D (1988) On a new Darboux transformation for the construction of exact solutions of the Schroedinger equation. Inverse Problems 4: 165–172.
Levi D and Benguria R (1980) Bäcklund transformations and nonlinear differential difference equations. Proceedings of the National Academy of Science USA 77: 5025–5027.
Levi D and Winternitz P (1989) Non-classical symmetry reduction: example of the Boussinesq equation. Journal of Physics A: Mathematical and General 22: 2915–2924.
Levi D, Ragnisco O, and Sym A (1984) Dressing method vs. classical Darboux transformation. Il Nuovo Cimento 83B: 34–42.
Lie S (1888, 1890, 1893) Theorie der Transformationgruppen. Leipzig: B.G. Teubner.
Matveev VB and Salle LA (1991) Darboux Transformations and Solitons. Berlin: Springer.
Moutard Th-F (1878) Sur la construction des équations de la forme (1/z)(d²z/dxdy) = λ(x,y), qui admettent une intégrale générale explicite. Journal de l'École Polytechnique, Paris 28: 1–11.
Olver PJ (1993) Applications of Lie Groups to Differential Equations. New York: Springer.
Orfanidis SJ (1978) Discrete sine-Gordon equations. Physical Review D 18: 3822–3827.
Rogers C and Schief WK (1997) The classical Bäcklund transformation and integrable discretization of characteristic equations. Physics Letters A 232: 217–223.
Rogers C and Shadwick WF (1982) Bäcklund Transformations and Their Applications. New York: Academic Press.
Toda M (1989) Theory of Nonlinear Lattices. Berlin: Springer.
Wojciechowski S (1982) The analogue of the Bäcklund transformation for integrable many-body systems. Journal of Physics A: Mathematical and General 15: L653–L657.
Zaharov VE and Shabat AB (1979) Integration of the nonlinear equations of mathematical physics by the method of the inverse scattering problem. II (Russian). Funktsional Analiz i ego Prilozheniya 13: 13–22. (English translation: Functional Analysis and Applications 13: 166–173 (1980)).
Batalin–Vilkovisky Quantization
A C Hirshfeld, Universität Dortmund, Dortmund, Germany
© 2006 Elsevier Ltd. All rights reserved.

Introduction
The Batalin–Vilkovisky formalism for quantizing gauge theories has a long history of development. It begins with the Faddeev–Popov procedure for quantizing Yang–Mills theory, involving the Faddeev–Popov ghost fields (Faddeev and Popov 1967). It continued with the discovery of BRST symmetry by Becchi et al. (1976). Then Zinn-Justin (1975) introduced sources for these transformations, and a symmetric structure in the space of fields and sources in his study of renormalizability of these theories. Finally, Batalin and Vilkovisky (1981) systematized and generalized these developments. A more detailed account of this history can be found in Gomis et al. (1994), where many worked examples of the Batalin–Vilkovisky formalism are given. At the present time, it is the most general treatment available. Alexandrov, Kontsevich, Schwarz, and Zaboronsky (AKSZ 1997) have presented a geometric interpretation for the case in which the action is topologically invariant.

Structure of the Set of Gauge Transformations
Consider a system whose dynamics is governed by a classical action $S[\phi^i]$ which depends on the fields $\phi^i(x)$, $i = 1, \ldots, n$. We employ a compact notation in which the multi-index $i$ may denote the various fields involved, the discrete indices on which they depend, and the dependence on the spacetime variables as well. The generalized summation convention then means that a repeated index may denote not only a sum over discrete variables, but also integration over the spacetime variables. $\epsilon_i = \epsilon(\phi^i)$ denotes the
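Grassmann parity of the fields. Fields with $\epsilon_i = 0$ are called bosonic, with $\epsilon_i = 1$ fermionic. The graded commutation rule is

$$\phi^i(x)\,\phi^j(y) = (-1)^{\epsilon_i \epsilon_j}\,\phi^j(y)\,\phi^i(x) \qquad [1]$$

For a gauge theory the action is invariant under a set of gauge transformations with infinitesimal form

$$\delta\phi^i = R^i_\alpha\,\varepsilon^\alpha, \qquad \alpha = 1 \text{ or } 2 \text{ or } \ldots\ m \qquad [2]$$

The $\varepsilon^\alpha$ are the infinitesimal gauge parameters and $R^i_\alpha$ the generators of the gauge transformations. When $\epsilon_\alpha = \epsilon(\varepsilon^\alpha) = 0$ we have an ordinary symmetry, when $\epsilon_\alpha = 1$ the equation is characteristic of a supersymmetry. The Grassmann parity of $R^i_\alpha$ is $\epsilon(R^i_\alpha) = (\epsilon_i + \epsilon_\alpha) \pmod 2$.

A subscript after a comma denotes the right derivative with respect to the corresponding field, that is, the field is to be commuted to the far right and then dropped. The field equations may then be written as

$$S_{0,i} = 0 \qquad [3]$$

where $S_0$ is the classical action. Let $\Sigma$ denote the surface in the space of solutions where the field equations are satisfied:

$$S_{0,i}\big|_{\Sigma} = 0 \qquad [4]$$

If the gauge transformations are independent on-shell, that is,

$$\operatorname{rank} R^i_\alpha\big|_{\Sigma} = m \qquad [5]$$

the gauge theory is said to be irreducible. We assume here that this is the case. When it is not, the theory is reducible. For details of the treatment in that case, see Gomis, Paris, and Samuel (1994). The classical solutions are $\phi_0 \in \Sigma$.

The Noether identities are

$$S_{0,i}\,R^i_\alpha = 0 \qquad [6]$$

The general solution to the Noether identity is

Equations [8] and [10] lead to the following condition:

$$[\delta_1, \delta_2]\,\phi^i = \left(R^i_\gamma\,T^\gamma_{\alpha\beta} - S_{0,j}\,E^{ji}_{\alpha\beta}\right)\varepsilon_1^\beta\,\varepsilon_2^\alpha \qquad [11]$$

The tensors $T^\gamma_{\alpha\beta}$ are called the structure constants of the gauge algebra, although they depend, in general, on the fields of the theory. When $E^{ij}_{\alpha\beta} = 0$, the gauge algebra is said to be closed, otherwise it is open. Equation [11] defines a Lie algebra if the algebra is closed and the $T^\gamma_{\alpha\beta}$ are independent of the fields.

The gauge tensors have the following graded symmetry properties:

$$T^\gamma_{\alpha\beta} = -(-1)^{\epsilon_\alpha\epsilon_\beta}\,T^\gamma_{\beta\alpha}$$
$$E^{ij}_{\alpha\beta} = -(-1)^{\epsilon_i\epsilon_j}\,E^{ji}_{\alpha\beta} = -(-1)^{\epsilon_\alpha\epsilon_\beta}\,E^{ij}_{\beta\alpha} \qquad [12]$$

The Grassmann parities are

$$\epsilon(T^\gamma_{\alpha\beta}) = (\epsilon_\alpha + \epsilon_\beta + \epsilon_\gamma) \bmod 2 \qquad [13]$$

and

$$\epsilon(E^{ij}_{\alpha\beta}) = (\epsilon_i + \epsilon_j + \epsilon_\alpha + \epsilon_\beta) \bmod 2 \qquad [14]$$

Various restrictions are imposed by the Jacobi identity

$$\sum_{\text{cyclic}(123)} [\delta_1, [\delta_2, \delta_3]] = 0 \qquad [15]$$

These restrictions are

$$\sum_{\text{cyclic}(123)} \left(R^i_\delta\,A^\delta_{\alpha\beta\gamma} - S_{0,j}\,B^{ji}_{\alpha\beta\gamma}\right)\varepsilon_1^\gamma\,\varepsilon_2^\beta\,\varepsilon_3^\alpha = 0 \qquad [16]$$

where

$$3A^\delta_{\alpha\beta\gamma} \equiv \left(T^\delta_{\alpha\beta,k}\,R^k_\gamma - T^\delta_{\alpha\eta}\,T^\eta_{\beta\gamma}\right) + (-1)^{\epsilon_\alpha(\epsilon_\beta+\epsilon_\gamma)}\left(T^\delta_{\beta\gamma,k}\,R^k_\alpha - T^\delta_{\beta\eta}\,T^\eta_{\gamma\alpha}\right) + (-1)^{\epsilon_\gamma(\epsilon_\alpha+\epsilon_\beta)}\left(T^\delta_{\gamma\alpha,k}\,R^k_\beta - T^\delta_{\gamma\eta}\,T^\eta_{\alpha\beta}\right)$$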
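As a hedged illustration of the Jacobi restriction [16], consider the simplest situation: a closed algebra whose structure constants are field independent and whose gauge parameters are all bosonic. Assuming the combination $3A^\delta_{\alpha\beta\gamma}$ then reduces to minus the cyclic sum of $T^\delta_{\alpha\eta}T^\eta_{\beta\gamma}$ (derivative terms drop out and all sign factors equal one), its vanishing is the ordinary Jacobi identity of a Lie algebra. The sketch below checks this for su(2), taking $T^\gamma_{\alpha\beta} = \epsilon_{\alpha\beta\gamma}$; the function names are illustrative and not part of the formalism.

```python
from itertools import product

def levi_civita(a, b, c):
    """Totally antisymmetric symbol on the index values 0, 1, 2."""
    return (a - b) * (b - c) * (c - a) // 2

def T(gamma, alpha, beta):
    """Assumed structure constants T^gamma_{alpha beta} of su(2)."""
    return levi_civita(alpha, beta, gamma)

def three_A(delta, alpha, beta, gamma):
    """3 A^delta_{alpha beta gamma} for bosonic, field-independent T.

    Only the quadratic T*T terms survive; the T_{,k} R^k terms vanish
    because T does not depend on the fields (an assumption of this sketch).
    """
    total = 0
    cyclic = ((alpha, beta, gamma), (beta, gamma, alpha), (gamma, alpha, beta))
    for a, b, c in cyclic:
        total -= sum(T(delta, a, eta) * T(eta, b, c) for eta in range(3))
    return total

# Index combinations for which the Jacobi restriction would fail.
jacobi_violations = [idx for idx in product(range(3), repeat=4)
                     if three_A(*idx) != 0]

# Index combinations violating the antisymmetry T_{ab} = -T_{ba} of eqn [12].
antisymmetry_violations = [idx for idx in product(range(3), repeat=3)
                           if T(idx[0], idx[1], idx[2]) != -T(idx[0], idx[2], idx[1])]

print(len(jacobi_violations), len(antisymmetry_violations))
```

Both lists come out empty, so in this special case the restriction [16] holds with $A = 0$ and $B = 0$. Open or field-dependent algebras need the full expressions.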
and to replace the gauge parameters by ghost fields. For bosonic fields
One must then modify the graded symmetry proper-
ties of the gauge structure tensors according to @B @B
B; B 2 29
@A @A
2 4
T1 2 3 4 ... ! 1 T1 2 3 4 ... 18
for fermionic fields
The Noether identities then take the form
F; F 0 30
S0;i Ri C 0 19
and the structure relations [10] become and for any X
j ji X; X; X 0 31
2Ri;j R Ri T S0;j E C C 0 20
If one groups the fields and the antifields together
into the set
Introducing the Antifields za fA ; A g; a 1; . . . ; 2N 32
We incorporate the ghost fields into the field set
then the antibracket is seen to define a symplectic
A = {i , C }, where i = 1, . . . , n and = 1, . . . , m.
structure on the space of fields and antifields
Clearly A = 1, . . . , N, where N = n m. One then
further increases the set by introducing an antifield @r X ab @l Y
A for each field A . The Grassmann parity of the X; Y ! 33
@za @zb
antifields is
with
A A 1 mod 2 21
ab 0 BA
! 34
Each field is assigned a ghost number, with BA 0
ghi 0 The antifields can be thought of as conjugate
ghC 1 22 variables to the fields, since
A
gh A ghA 1 ; B BA 35
In the space of fields and antifields, the antibracket
is defined by
@r X @l Y @r X @l Y The Classical Master Equation
X; Y 23
@A @A @A @A Let S[A , A ] be a functional of the fields and
antifields with the dimension of an action, vanishing
where @r denotes the right, @l the left derivative. The ghost number and even Grassmann parity. The
antibracket is graded antisymmetric: equation
X; Y 1X 1Y 1 Y; X 24 @S @S
S; S 2 0 36
@A @A
It satisfies a graded Jacobi identity
is the classical master equation. Solutions of the
X; Y; Z 1X 1Y 1 classical master equation with suitable boundary
Y; Z; X 1Z 1X Y Z; X; Y 0 25 conditions turn out to be generating functionals for
the gauge structure of the theory. S is also the
It is a graded derivation starting point for the quantization. One denotes by
the subspace of stationary points of the action in
X; YZ X; YZ 1X Y X; ZY
26 the space of fields and antifields:
XY; Z XY; Z 1X Y YX; Z
a
@S
It has ghost number z
a0 37
@z
ghX; Y ghX ghY 1 27
Given a classical solution 0 of S0 one stationary
and Grassmann parity point is
X; Y X Y 1 mod 2 28 i i0 ; Ca 0; A 0 38
An action which satisfies the classical master We define a surface in functional space
equation has its own set of invariances:
A @
@S a ; A jA 46
R 0 39 @A
@za b
so that for any functional X[, ]
with
@
@l @r S Xj X ; 47
Rab !ac 40 @
@zc @zb
This equation implies To construct a gauge-fixing fermion of ghost
number 1, one must again introduce additional
Rac Rab
0 41 fields. The simplest choice utilizes a trivial pair
,
with
C
One says that Rab is invariant on-shell. A nilpotent
2N 2N matrix has rank N. Let r be the rank of 1;
C
the hessian of S at the stationary point: 48
ghC 1; gh
0
@l @r S
r rank a b
42 The fields C are the FaddeevPopov antighosts.
@z @z
Along with these fields we include the corresponding
We then have r N. The relevant solutions of the ,
. Adding the term
C
antifields C to the
classical master equation are those for which r = N. action S does not spoil its properties as a proper
In this case the number of independent gauge solution to the classical master equation, and one
invariances of the type in eqn [39] equals the number gets the nonminimal action
of antifields. When at a later stage the gauge is fixed,
Snon S
C 49
the nonphysical antifields are eliminated.
To ensure the correct classical limit, the proper The simplest possibility for is
solution must contain the classical action S0 in the
sense that
C 50
S A ; A
0 S0 i 43 where are the gauge-fixing conditions for the
A
fields . The gauge-fixed action is denoted by
The action S[A , A ] can be expanded in a series in
the antifields, while maintaining vanishing ghost S Snon j 51
number and even Grassmann parity:
Quantization is performed using the path integral
S; S0 i Ri C Ca 12 T
1 C C to calculate a correlation function X, with the
i j 1i 14 Eji 1 C C 44 constraint [45] implemented by a -function:
Z
@
When this is inserted into the classical master I X DD A
equation, one finds that this equation implies the @A
gauge structure of the classical theory. i
exp W; X; 52
h
Here W is the quantum action, which reduces to S in
Gauge Fixing and Quantization the limit h ! 0. An admissible leads to well-
defined propagators when the path integral is
Equation [39] shows that the action S still possesses
expressed as a perturbation series expansion.
gauge invariances, and hence is not yet suitable for
The results of a calculation should be independent
quantization via the path integral approach: a
of the gauge fixing. Consider the integrand in eqn
gauge-fixing procedure is necessary. In the Batalin
[52],
Vilkovisky approach the gauge is fixed, and the
antifields eliminated, by use of a gauge-fixing i
fermion which has Grassmann parity () = 1 I; exp W; X; 53
h
and gh[] = 1. It is a functional of the fields A
only; its relation to the antifields is Under an infinitesimal change in
Z
@ I X I X D I 54
A 45
@A
way that the functional S restricted to L is Considering the commutator of two gauge transfor-
Q-invariant. This invariance is BRST invariance. mations leads to (see eqns [8][11])
AKSZ apply these geometric constructions to obtain Z
in a natural way the action functionals of two- 2Pmi ;j Pnj Pji Pmn ;j Cm Cn 0
dimensional sigma-models (Witten 1998) and to ZM
show that the ChernSimons theory (Axelrod and 2Pjk i Dlj Pmk ;ij Am Pjl 70
Singer 1991) in BatalinVilkovisky formalism arises as M
a sigma-model with target space G, where G stands Dm kl j kl
i P ;m D X P ;ji Cl Ck 0
for a Lie algebra and denotes parity inversion.
The Jacobi identity is
Pij ;m Pmk Ci Cj Ck 0 71
The Poisson-Sigma Model
The quantization of the Poisson-sigma model was The fields and antifields of the model are
performed by Hirshfeld and Schwarzweller (2000)
A fAi ; Xi ; Ci g and A Ai ; Xi ; Ci 72
and by Cattaneo and Felder (2001). The Poisson-
sigma model is the simplest topological field theory The extended action is
in two dimensions. It is a field theory on a two- Z
dimensional world sheet without boundary (Schaller S Ai @ Xi Pij XAi Aj
M
and Strobl 1994). It involves a set of bosonic scalar
j 1
fields, which can be seen as a set of maps Ai Di Cj Xi Pji XCj Ci Pjk ;i XCj Ck
Xi : M ! N, where N is a Poisson manifold. In 2
addition, one has a 1-form A on the world sheet M 1 i j kl
A A P ;ij XCk Cl 73
which takes values in T (N), for x coordinates on M 4
we have A = Ai dxi ^ dXi . Its action is
Z The gauge-fixing conditions are taken to be of the
form i (A, X), so that the gauge fermion [50] becomes
S0 X; A Ai @ Xi Pij XAi Aj 64 i i (A, X). The antifields are then fixed to be
M =C
where is the antisymmetric tensor and is the j @j A; X
Ai C
volume form on M. The gauge transformations of @Ai
the model are
j @j A; X
Xi C 74
Xi Pij X"j ; Ai Di "j
j
65 @Xi
Ci 0
j j
where Di = @ i Pkj ,i Ak . The equations of motion i A; X
C i
are
The gauge-fixed action is
Dji Aj 0 66 Z
S Ai @ Xi Pij XAi Aj
and M
i ij
@ X P Aj D X 0 i
67 k @k A; X j k @k A; X Pij Cj
C Di Cj C
@Ai @Xi
The gauge algebra is given by
1 m @m A; X n @n A; X
C C Pkl ;ij X
"1 ; "2 Xi Pji Pmn ;j "1n "2m 4 @Ai @Aj
j
"1 ; "2 Ai Di Pmn ;j "1n "2m 68 Ck Cl
i i A; X 75
D Xj Pmn ;ji "1n "2m
Now consider different gauge conditions:
In our general notation the generators of the gauge
j 1. First, the Landau gauge for the gauge potential
transformations R are here Pij and Di . The gauge
tensors T and E are Pij ,k and Pmn ,ji . The higher- i = @ Ai , so that the gauge fermion becomes
order gauge tensors A and B vanish. =C i @ Ai . The antifields are fixed to be
The ghost fields are again denoted by Ci . The i
Ai @ C
Noether identities are then
Z Xi Ci 0 76
Dji Aj Pki D Xi Dki Ck 0 69 @ Ai
C i
M
for this gauge choice the gauge-fixed action is Notice that in the noncovariant gauges 2 and 3 the
Z action simplifies, in that the term which arose
S i @ Dj Cj
Ai @ Xi Pij XAi Aj C because of the nonclosed nature of the gauge algebra
i
M vanishes.
1 i @ C
j Pkl ;ij X
@ C
4 See also: BF Theories; BRST Quantization; Constrained
Systems; Graded Poisson Algebras; Operads;
Ck Cl i @ Ai 77 Perturbative Renormalization Theory and BRST;
Supermanifolds; Topological Sigma Models.
Translating this action into the notation of Cattaneo
and Felder, one sees that it is exactly the expression
they use to derive the perturbation series.
Further Reading
2. Now consider the temporal gauge i = A0i . The
gauge fermion is given by = C i A0i . The anti- Alexandrov M, Kontsevich M, Schwarz A, and Zaboronsky O
fields are fixed to (1997) Geometry of the Master Equation. International
Journal of Modern Physics A12: 14051430.
i
A0i C Axelrod S and Singer IM (1991) ChernSimons Perturbation
Theory, Proceedings of the XXth Conference on Differential
A1i 0 Geometric Methods in Physics, Baruch College/CUNY, NY.
78 (hep-th/9110056).
Xi Ci 0
Batalin IA and Vilkovisky GA (1977) Gauge algebra and
A0i
C quantization. Physics Letters 69B: 309312.
i
Becchi C, Rouet A, and Stora R (1976) Renormalization of gauge
The gauge-fixed action is theories. Annals of Physics (NY) 98: 287321.
Z Cattaneo AS and Felder G (2001) On the AKSZ formulation of
S Ai @ Xi Pij XAi Aj the PoissonSigma model. Letters of Mathematical Physics
M 56: 163179.
i Dj Cj i A0i Faddeev LD and Popov VN (1967) Feynman diagrams for the
C 0i 79 YangMills field. Physics Letters 25B: 2930.
Gomis J, Paris J, and Samuel S (1994) Antibracket Antifields and
3. Finally consider the SchwingerFock gauge gauge-theory quantization. Physics Reports 269: 1145.
i = x Ai . Then the antifields are fixed to be Hirshfeld AC and Schwarzweller T (2000) Path integral quantiza-
tion of the PoissonSigma model. Annals of Physics (Leipzig)
i
Ai x C 9: 83101.
Schaller P and Strobl T (1994) Poisson structure induced
Xi Ci 0 80 (topological) field theories. Modern Physics Letters A9:
x Ai
C 31293136.
i
Witten E (1988) Topological sigma models. Communications in
for this gauge choice the gauge-fixed action is Mathematical Physics 118: 411449.
Z Zinn-Justin J (1975) Renormalization of gauge theories. In:
Rollnik H and Dietz K (eds.) Trends in Elementary
S Ai @ Xi Pij XAi Aj Particle Physics, Lecture Notes in Physics, vol. 37. Berlin:
M
Springer.
C i x Dj Cj i @ Ai 81
i
Bethe Ansatz
M T Batchelor, Australian National University, Canberra, ACT, Australia
© 2006 Elsevier Ltd. All rights reserved.

Introduction
The Bethe ansatz is a particular form of wave function introduced in the diagonalization of the Heisenberg spin chain. It underpins the majority of exactly solved models in statistical mechanics and quantum field theory. At the heart of the Bethe ansatz is the way in which multibody interactions factor into two-body interactions. The Bethe ansatz is thus intimately entwined with the theory of integrability.

The way in which the Bethe ansatz works is best understood by working through an explicit hands-on example. The canonical example is the isotropic antiferromagnetic Heisenberg Hamiltonian

$$H = \sum_{i=1}^{L-1} h_{i,i+1} + h_{L,1}, \qquad h_{ij} = \tfrac{1}{2}\left(\boldsymbol{\sigma}_i \cdot \boldsymbol{\sigma}_j + 1\right) \qquad [1]$$
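The action of the Hamiltonian [1] on basis states can be sketched in a few lines. The sketch assumes that each bond term $\tfrac12(\boldsymbol{\sigma}_i\cdot\boldsymbol{\sigma}_j + 1)$ acts as the operator exchanging the spins at sites $i$ and $j$, which is a standard identity for Pauli matrices. Basis states are encoded as tuples of 0/1 entries, with 1 marking a flipped spin; the helper names `apply_hamiltonian` and `single_flip` are illustrative only.

```python
from collections import Counter

def apply_hamiltonian(state):
    """Apply H = sum of nearest-neighbour spin exchanges on a periodic chain.

    `state` is a tuple of 0/1 entries (1 = flipped spin). The result maps
    each basis state in H|state> to its integer coefficient.
    """
    L = len(state)
    result = Counter()
    for i in range(L):
        j = (i + 1) % L                  # bond (i, i+1); the last bond closes the ring
        swapped = list(state)
        swapped[i], swapped[j] = swapped[j], swapped[i]
        result[tuple(swapped)] += 1
    return result

def single_flip(L, x):
    """Basis state with one flipped spin at site x (sites numbered from 0)."""
    return tuple(1 if site == x else 0 for site in range(L))

L, x = 6, 2
image = apply_hamiltonian(single_flip(L, x))
for basis_state, coefficient in sorted(image.items()):
    print(basis_state, coefficient)
```

For a single flipped spin, the $L - 2$ bonds that do not touch site $x$ return the state unchanged, while the two bonds adjacent to $x$ move the flipped spin one site to the left or right.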
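where $|x\rangle$ is the state with an up spin at site $x$. The aim is to find the amplitudes $a(x)$. It is clear that

$$H|x\rangle = (L-2)\,|x\rangle + |x-1\rangle + |x+1\rangle \qquad [4]$$

in the bulk (away from either boundary). Insertion of [3] into $H\Psi = E\Psi$ gives

$$E\,a(x) = (L-2)\,a(x) + a(x-1) + a(x+1) \qquad [5]$$

Substitution of spin waves $a(x) = e^{ikx}$ gives $E = L - 2 + 2\cos k$.

$$a(x, y) = A_{12}\,e^{ik_1 x}e^{ik_2 y} + A_{21}\,e^{ik_2 x}e^{ik_1 y} \qquad [12]$$

Substitution of the ansatz [12] into [8] gives

$$E = L - 4 + 2\cos k_1 + 2\cos k_2 \qquad [13]$$

Substitution of [12] into [10] gives

$$\frac{A_{12}}{A_{21}} = -\,\frac{1 - 2e^{ik_1} + e^{i(k_1+k_2)}}{1 - 2e^{ik_2} + e^{i(k_1+k_2)}} \qquad [14]$$

The three relations [11], [12], and [14] give the Bethe equations

$$e^{ik_1 L} = \frac{A_{12}}{A_{21}} \quad\text{and}\quad e^{ik_2 L} = \frac{A_{21}}{A_{12}} \qquad [15]$$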
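The two-particle relations can be checked numerically. The sketch below rests on two assumptions that are not quoted verbatim in this section: that the bulk equation [8] reads $E\,a(x,y) = (L-4)\,a(x,y) + a(x\pm1,y) + a(x,y\pm1)$, and that the meeting condition [10] reads $2a(x,x+1) = a(x,x) + a(x+1,x+1)$, by analogy with the three-particle conditions. The values of $k_1$, $k_2$, $L$ and the function names are arbitrary illustrative choices.

```python
import cmath
import math

def amplitude_ratio(k1, k2):
    """A12/A21 as given by eqn [14]."""
    z1, z2 = cmath.exp(1j * k1), cmath.exp(1j * k2)
    return -(1 - 2 * z1 + z1 * z2) / (1 - 2 * z2 + z1 * z2)

def a(x, y, k1, k2, A12, A21):
    """Two-particle Bethe ansatz amplitude, eqn [12]."""
    return (A12 * cmath.exp(1j * (k1 * x + k2 * y))
            + A21 * cmath.exp(1j * (k2 * x + k1 * y)))

k1, k2, L = 0.7, 1.9, 10          # arbitrary test values, not Bethe roots
A21 = 1.0
A12 = amplitude_ratio(k1, k2) * A21
E = L - 4 + 2 * math.cos(k1) + 2 * math.cos(k2)   # eqn [13]

def amp(x, y):
    return a(x, y, k1, k2, A12, A21)

# Assumed bulk equation [8], tested at well-separated sites.
x, y = 2, 6
bulk_residual = abs(E * amp(x, y) - ((L - 4) * amp(x, y)
                    + amp(x - 1, y) + amp(x + 1, y)
                    + amp(x, y - 1) + amp(x, y + 1)))

# Assumed meeting condition [10], tested at neighbouring sites.
x = 3
meeting_residual = abs(2 * amp(x, x + 1) - amp(x, x) - amp(x + 1, x + 1))

print(bulk_residual, meeting_residual)
```

Both residuals vanish to rounding error for any $k_1$, $k_2$. The Bethe equations [15] are the extra conditions that single out the allowed values on a ring of length $L$.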
which are to be solved for $k_1$ and $k_2$. Note that $e^{i(k_1+k_2)L} = 1$.

Case 4: n = 3
The full power of the Bethe ansatz method becomes evident for three particles. Here

$$\Psi = \sum_{x<y<z} a(x, y, z)\,|x, y, z\rangle \qquad [16]$$

There are several cases to consider:

1. $y > x + 1$ and $z > y + 1$, where

$$E\,a(x, y, z) = (L-6)\,a(x, y, z) + a(x \pm 1, y, z) + a(x, y \pm 1, z) + a(x, y, z \pm 1) \qquad [17]$$

By $a(x \pm 1, y, z)$, we mean $a(x+1, y, z) + a(x-1, y, z)$, etc.

2. $y = x + 1$ and $z > y + 1$, with

Again, we must ensure that these equations are compatible. This involves comparison of the last three equations with [17]. The three equations to be satisfied are

$$2a(x, x+1, z) = a(x, x, z) + a(x+1, x+1, z) \qquad [21]$$

$$2a(x, y, y+1) = a(x, y, y) + a(x, y+1, y+1) \qquad [22]$$

$$4a(x, x+1, x+2) = a(x, x, x+2) + a(x, x+1, x+1) + a(x, x+2, x+2) + a(x+1, x+1, x+2) \qquad [23]$$

But note that setting $z = x + 2$ in [21] and $y = x + 1$ in [22] leads to [23] being automatically satisfied. We are thus left with only two equations [21] and [22]. Note the similarity between these two equations and the meeting condition [10] for the n = 2 case.

In this case the Bethe ansatz is

$$a(x, y, z) = A_{123}\,z_1^x z_2^y z_3^z + A_{132}\,z_1^x z_3^y z_2^z + A_{213}\,z_2^x z_1^y z_3^z + A_{231}\,z_2^x z_3^y z_1^z + A_{321}\,z_3^x z_2^y z_1^z + A_{312}\,z_3^x z_1^y z_2^z \qquad [24]$$

in which $z_j = e^{ik_j}$. This is a sum over the 3! permutations of the integers 1, 2, 3. Inserting this ansatz into [17] gives

$$E = L - 6 + 2(\cos k_1 + \cos k_2 + \cos k_3) \qquad [25]$$

To determine the $k_j$, it is convenient to define

$$s_{ij} = 1 - 2z_j + z_i z_j \qquad [26]$$

Substitution of [24] into the meeting conditions [21] and [22] then gives

$$s_{12}A_{123} + s_{21}A_{213} = s_{13}A_{132} + s_{31}A_{312} = s_{23}A_{231} + s_{32}A_{321} = 0 \qquad [27]$$

The boundary condition, $a(y, z, x + L) = a(x, y, z)$, gives

$$\left(z_1^L A_{321} - A_{132}\right) z_1^x z_3^y z_2^z + \left(z_2^L A_{312} - A_{231}\right) z_2^x z_3^y z_1^z + \left(z_1^L A_{231} - A_{123}\right) z_1^x z_2^y z_3^z$$
$$+ \left(z_3^L A_{213} - A_{321}\right) z_3^x z_2^y z_1^z + \left(z_2^L A_{132} - A_{213}\right) z_2^x z_1^y z_3^z + \left(z_3^L A_{123} - A_{312}\right) z_3^x z_1^y z_2^z = 0 \qquad [31]$$

This leads to the equations

$$z_1^L = \frac{A_{123}}{A_{231}} = \frac{A_{132}}{A_{321}} = \frac{s_{21}\,s_{31}}{s_{12}\,s_{13}}$$
$$z_2^L = \frac{A_{213}}{A_{132}} = \frac{A_{231}}{A_{312}} = \frac{s_{12}\,s_{32}}{s_{21}\,s_{23}} \qquad [32]$$
$$z_3^L = \frac{A_{321}}{A_{213}} = \frac{A_{312}}{A_{123}} = \frac{s_{13}\,s_{23}}{s_{31}\,s_{32}}$$

which can be solved for the Bethe roots $k_j$.
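The three-particle relations can be checked in the same spirit. The sketch below assumes one convenient closed form for the amplitudes: $A_P$ is the sign of the permutation $P$ times the product of $s_{ba}$ over all pairs in which $a$ stands to the left of $b$ in $P$. This normalization is an assumption of the sketch, chosen so that every adjacent exchange satisfies a relation of the type [27]; only amplitude ratios matter. The $k_j$ are arbitrary test values.

```python
import cmath
from itertools import permutations

def s(zi, zj):
    """s_ij = 1 - 2 z_j + z_i z_j, eqn [26]."""
    return 1 - 2 * zj + zi * zj

def sign(perm):
    """Parity of a permutation: +1 for even, -1 for odd."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def amplitude(perm, z):
    """Assumed closed form: sign(P) times s_{ba} for each a left of b in P."""
    A = complex(sign(perm))
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            a, b = perm[i], perm[j]
            A *= s(z[b], z[a])
    return A

def wavefunction(x, y, w, z, A):
    """Three-particle ansatz [24]; w plays the role of the third site."""
    return sum(A[p] * z[p[0]] ** x * z[p[1]] ** y * z[p[2]] ** w for p in A)

k = {1: 0.4, 2: 1.3, 3: 2.2}      # arbitrary test values, not Bethe roots
z = {j: cmath.exp(1j * kj) for j, kj in k.items()}
A = {p: amplitude(p, z) for p in permutations((1, 2, 3))}

# Meeting condition [21]: the first two particles are neighbours.
x, w = 2, 7
res21 = abs(2 * wavefunction(x, x + 1, w, z, A)
            - wavefunction(x, x, w, z, A) - wavefunction(x + 1, x + 1, w, z, A))

# Meeting condition [22]: the last two particles are neighbours.
x, y = 2, 5
res22 = abs(2 * wavefunction(x, y, y + 1, z, A)
            - wavefunction(x, y, y, z, A) - wavefunction(x, y + 1, y + 1, z, A))

# The first line of eqn [32]: two amplitude ratios and the s-ratio agree.
ratio_a = A[(1, 2, 3)] / A[(2, 3, 1)]
ratio_b = A[(1, 3, 2)] / A[(3, 2, 1)]
ratio_s = s(z[2], z[1]) * s(z[3], z[1]) / (s(z[1], z[2]) * s(z[1], z[3]))

print(res21, res22, abs(ratio_a - ratio_s), abs(ratio_b - ratio_s))
```

All four printed numbers are zero to rounding error. This is the factorization into two-body terms at work: the right-hand sides of [32] depend only on the pairwise quantities $s_{ij}$.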
Y
n
sp ;p Y
n
s;j XN
@2 X
zLp1 n1 1
or zLj n1 37 H 2c xi xj 44
s sj; @x2i
2 p1 ;p 1 i1 1i<jN
6j
uj 1/2i Y
N
kj k ic
eikj 39 expikj L
uj 1/2i 1
kj k ic
wave function [33] due to higher symmetries. This results in Bethe equations involving different types or colors of roots.

The exactly solved one-dimensional quantum spin chains may also be obtained from their two-dimensional classical counterparts, the vertex models. For example, the six-vertex model shares the same Bethe ansatz wave function and Bethe equations as the XXZ spin chain. The more general permutator Hamiltonians are related to multistate vertex models. One may also consider other spin-S models.

The discussion in this article has centered on what is known as the coordinate Bethe ansatz. Another formulation is the algebraic Bethe ansatz, which was developed for the systematic treatment of the higher-spin models. In this formulation, operators create the Bethe states by acting on a vacuum. The algebraic Bethe ansatz goes hand-in-hand with the quantum inverse-scattering method. In all of the exactly solved Bethe ansatz models, it is possible to derive quantities like the ground-state energy per site via the root density method, which assumes that the Bethe roots form a uniform distribution in the infinite-size limit. The thermodynamics of the Bethe ansatz solvable models may also be calculated in a systematic fashion.

Despite Bethe's early optimism, the Bethe ansatz has not been extended to higher-dimensional systems.

See also: Affine Quantum Groups; Eight Vertex and Hard Hexagon Models; Integrability and Quantum Field Theory; Integrable Systems: Overview; Quantum Spin Systems; Yang–Baxter Equations.

Further Reading
Baxter RJ (1983) Exactly Solved Models in Statistical Mechanics. London: Academic Press.
Baxter RJ (2003) Completeness of the Bethe ansatz for the six- and eight-vertex models. Journal of Statistical Physics 108: 1–48.
Bethe HA (1931) Zur Theorie der Metalle I. Eigenwerte und Eigenfunktionen der linearen Atomkette. Zeitschrift für Physik 71: 205–226.
Gaudin M (1967) Un Système à Une Dimension de Fermions en Interaction. Physics Letters A 24: 55–56.
Gaudin M (1983) La Fonction d'onde de Bethe. Paris: Masson.
Korepin VE, Izergin AG, and Bogoliubov NM (1993) Quantum Inverse Scattering Method and Correlation Functions. Cambridge: Cambridge University Press.
Lieb EH and Liniger W (1963) Exact analysis of an interacting Bose gas I. The general solution and the ground state. Physical Review 130: 1605–1616.
Mattis DC (1993) The Many-Body Problem: An Encyclopaedia of Exactly Solved Models in One-Dimension. Singapore: World Scientific.
McGuire JB (1964) Study of exactly soluble one-dimensional N-body problems. Journal of Mathematical Physics 5: 622–636.
Sutherland B (2004) Beautiful Models: 70 Years of Exactly Solved Quantum Many-Body Problems. Singapore: World Scientific.
Takahashi M (1999) Thermodynamics of One-Dimensional Solvable Models. Cambridge: Cambridge University Press.
Yang CN (1967) Some exact results for the many-body problem in one-dimension with repulsive Delta-function interaction. Physical Review Letters 19: 1312–1315.
BF Theories
M Blau, Université de Neuchâtel, Neuchâtel, Switzerland
© 2006 Elsevier Ltd. All rights reserved.

Introduction
BF theories are a class of gauge theories with a nontrivial metric-independent classical action. As such these theories are candidate topological field theories akin to the Chern–Simons theory in three dimensions, but in contrast to the Chern–Simons theory these exist and are well defined in arbitrary dimensions.

The name "BF theories" derives from the fact that, roughly (see [1] below and the subsequent discussion for a more precise description), the action of the BF theory takes the form $\int B \wedge F_A$ with $F_A$ the curvature of a connection $A$ and $B$ a Lagrange multiplier. The classical equations of motion imply that $A$ is flat, $F_A = 0$, and thus BF theories are topological gauge theories of flat connections.

Abelian BF theories and their relation to topological invariants (the Ray–Singer torsion) were originally discussed by Schwarz (1978, 1979). In the context of the topological field theory, nonabelian BF theories were introduced in Horowitz (1989) and Blau and Thompson (1989, 1991). Since then, BF theories have attracted a lot of attention as simple toy-models of (topological) gauge theories, and also because of their relationships with the Chern–Simons theory, the Yang–Mills theory, and gauge-theory formulations of gravity, as well as because of the rather rich and intricate structure of their quantum theories.

The purpose of this article is to provide an overview of these various features of BF theories. The standard reference for the basic classical and quantum properties of BF theories is Birmingham et al. (1991).
and the nature of the local symmetries of the BF theory depend strongly on the dimension $n$ of $M$, the structure and interpretation of the classical moduli space also depend on $n$.

For $n = 2$, by [5] the equation of motion [2] for $B \in \Omega^0(M, \mathfrak{g})$ says that $A$ is invariant under the infinitesimal gauge transformation generated by $B$. Thus if $A$ is irreducible, there are no nontrivial solutions for $B$ and, away from reducible flat connections, the classical moduli space is just the moduli space of flat connections on $P \to M$ over the surface $M$:

$$\mathcal{C}_{n=2} = \mathcal{M}_{\text{flat}}(P, G) \qquad [10]$$

This space may or may not be empty, depending on whether $P$ admits flat connections or not.

For $n = 3$, the equation of motion [2] for $B \in \Omega^1(M, \mathfrak{g})$ says that $B$ is a tangent vector to the space of flat connections at the flat connection $A$, in the sense that under the variation $\delta A = B$, one has

$$\delta F_A = d_A B = 0 \qquad [11]$$

The local $G$ gauge symmetry and the 1-form symmetry [6] now imply that the moduli space of classical solutions can be identified with the (co-)tangent bundle of the moduli space of flat connections on $P \to M$ over the 3-manifold $M$:

$$\mathcal{C}_{n=3} = T\mathcal{M}_{\text{flat}}(P, G) \qquad [12]$$

In higher dimensions there appears to be less geometrical structure associated with BF theories, and all that can be said in general is that the tangent space to $\mathcal{C}_n$ at a solution $(A, B)$ of the equations of motion [2] is the vector space:

$$T_{(A,B)}\,\mathcal{C}_n = H_A^1(M, \mathfrak{g}) \oplus H_A^{n-2}(M, \mathfrak{g}) \qquad [13]$$

where $H_A^k(M, \mathfrak{g})$ are the cohomology groups of the deformation complex

$$d_A : \Omega^{\bullet}(M, \mathfrak{g}) \to \Omega^{\bullet+1}(M, \mathfrak{g}) \qquad [14]$$

associated with the flat connection $A$, $F_A = (d_A)^2 = 0$.

When $M$ is topologically of the form $M = \Sigma \times \mathbb{R}$ (where one can think of $\mathbb{R}$ as time), one has

$$T_{(A,B)}\,\mathcal{C}_n = H_A^1(\Sigma, \mathfrak{g}) \oplus H_A^{n-2}(\Sigma, \mathfrak{g}) \qquad [15]$$

for example, the usual Yang–Mills action for nonabelian gauge fields

$$S_{YM} = \frac{1}{4g^2}\int_M \operatorname{tr}_G\, F_A \wedge \star F_A \qquad [17]$$

it does not require a metric (or the corresponding Hodge duality operator $\star$) for its formulation. This makes it a candidate action for a topological field theory, this term loosely referring to field theories which, in a suitable sense, do not depend on additional structures imposed on the underlying space(-time) manifold $M$, in this case a Riemannian structure.

To establish that BF theories are topological quantum field theories, one needs to show that the partition function (and correlation functions) of the quantized BF theory are also metric independent. This is not completely automatic as typically the metric enters in the gauge fixing of the local symmetries of the action which is required to make the quantum theory well defined. The usual lore is that since the metric only enters through the gauge fixing and since the quantum theory should be independent of the choice of gauge, it should also be metric independent. In the case of nonabelian BF theories, the complexity of their local symmetries complicates the analysis somewhat, but it can nevertheless be shown that BF theories indeed define topological field theories also at the quantum level.

Special Features of Abelian BF Theories
All the features of nonabelian BF theories discussed above are, of course, also valid when $G$ is abelian (with some obvious modifications and simplifications). However, when $G$ is abelian, a more general action than [1] is possible. Indeed, although there is no obvious higher p-form analog of nonabelian gauge fields, in the abelian case $G = U(1)$ or $G = \mathbb{R}$, and the condition $F_A \in \Omega^2(M, \mathbb{R})$ can be relaxed. In particular, one can consider the actions

$$S(n, p) = S[B_p, C_{n-p-1}] = \int_M B_p \wedge dC_{n-p-1} \qquad [18]$$
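As a worked illustration, the equations of motion and the simplest gauge invariance of the abelian action [18] can be obtained in a few lines. The sketch assumes that $M$ is closed (no boundary), that $B_p$ is a $p$-form and $C_{n-p-1}$ an $(n-p-1)$-form, and that the gauge parameter $\lambda_{p-1}$ is an arbitrary $(p-1)$-form introduced here for illustration.

```latex
% Variation of S(n,p) = \int_M B_p \wedge dC_{n-p-1} on a closed manifold M.
\begin{align*}
\delta S
  &= \int_M \delta B_p \wedge dC_{n-p-1} + \int_M B_p \wedge d\,\delta C_{n-p-1} \\
  % Leibniz rule: d(B \wedge \delta C) = dB \wedge \delta C + (-1)^p B \wedge d\,\delta C
  &= \int_M \delta B_p \wedge dC_{n-p-1}
     + (-1)^{p+1} \int_M dB_p \wedge \delta C_{n-p-1}
     + (-1)^{p} \int_M d\left(B_p \wedge \delta C_{n-p-1}\right) .
\end{align*}
% The last term vanishes by Stokes' theorem, so stationarity requires
\[
  dC_{n-p-1} = 0 , \qquad dB_p = 0 .
\]
% Invariance under a shift of B by an exact form:
\[
  B_p \to B_p + d\lambda_{p-1}
  \quad\Longrightarrow\quad
  \delta S = \int_M d\lambda_{p-1} \wedge dC_{n-p-1}
           = \int_M d\left(\lambda_{p-1} \wedge dC_{n-p-1}\right) = 0 .
\]
```

Shifting $C_{n-p-1}$ by an exact form leaves the action unchanged even more directly, because only $dC_{n-p-1}$ enters and $d^2 = 0$.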
the flat connection A, and it reduces to the abelian BF action [18] for $\mathfrak{g} = \mathbb{R}$.

The action is invariant under the (reducible) local symmetries

$$B_p \to B_p + d_A\lambda_{p-1}, \qquad C_{n-p-1} \to C_{n-p-1} + d_A\lambda'_{n-p-2} \tag{20}$$

The space of solutions to the equations of motion $d_A C = d_A B = 0$ modulo gauge symmetries is (cf. [13]) the finite-dimensional vector space

$$\mathcal{C}_{n,p} = H_A^{p}(M, \mathfrak{g}) \oplus H_A^{n-p-1}(M, \mathfrak{g}) \tag{21}$$

which is naturally symplectic for $M = \Sigma \times \mathbb{R}$.

Uses and Applications of Quantum Abelian BF Theories

Quantization of Abelian BF Theories and the Ray–Singer Torsion

We will now show that the partition function of the abelian BF theory (actually more generally that of the linearized nonabelian BF action [19]) is related to the Ray–Singer torsion of M. This requires some preparatory material on Gaussian path integrals, determinants, and gauge fixing that we present first.

In order to simplify the exposition, we assume that there are no harmonic modes, either because they have been gauged away or because the cohomology groups of $d_A$ are trivial, $H_A^{k}(M, \mathfrak{g}) = 0$, that is, the deformation complex [14] is acyclic.

Laplacians, determinants, and the Ray–Singer torsion. Choosing a Riemannian metric g (and Hodge duality operator $\star$) on M, the twisted Laplacian on p-forms is

$$\Delta_A^{(p)} = (d_A + d_A^{\star})^2 = d_A d_A^{\star} + d_A^{\star} d_A \tag{22}$$

where $d_A^{\star} = \pm\star d_A \star$ is the adjoint of $d_A$ with respect to the scalar product on p-forms defined by $\star$. This is an elliptic operator whose determinant can be defined, for example, by a $\zeta$-function regularization. Denoting the (nonzero) eigenvalues of $\Delta_A^{(p)}$ by $\lambda_k^{(p)}$, its $\zeta$-function is

$$\zeta^{(p)}(s) = \sum_k \left(\lambda_k^{(p)}\right)^{-s} \tag{23}$$

This converges for Re(s) sufficiently large and can be analytically continued to a meromorphic function of s analytic at s = 0, so that

$$\det \Delta_A^{(p)} := e^{-\zeta^{(p)\prime}(0)} \tag{24}$$

is well defined. The Ray–Singer torsion of (M, g) (with respect to the flat connection A) is then defined by

$$T_A[M] = \prod_{p=0}^{n} \left(\det \Delta_A^{(p)}\right)^{(-1)^{p}\, p/2} \tag{25}$$

Even though this definition depends strongly on the metric g on M, the Ray–Singer torsion has the remarkable property of being independent of g. The Ray–Singer torsion can be shown to be trivial (essentially = 1 modulo zero-mode contributions) in even dimensions, but is a nontrivial topological invariant in odd dimensions. Henceforth, we will suppress the dependence on M and denote the n-dimensional Ray–Singer torsion by $T_A(n)$.

Gaussian path integrals and determinants. The path integral for abelian BF theories is modeled on the usual formula for a $\delta$-function

$$\delta^{n}(x) = \frac{1}{(2\pi)^n} \int_{\mathbb{R}^n} d^n\lambda\; e^{i\lambda\cdot x} \tag{26}$$

from which one deduces the Gaussian integral formula

$$\frac{1}{(2\pi)^n} \int_{\mathbb{R}^n}\!\int_{\mathbb{R}^n} d^n\lambda\, d^n x\; e^{i\lambda\cdot Dx + iK\cdot x + i\lambda\cdot J} = \int_{\mathbb{R}^n} d^n x\; \delta^{n}(Dx + J)\, e^{iK\cdot x} = \frac{1}{\det D}\, e^{-iK\cdot D^{-1}J} \tag{27}$$

Here, we have assumed that the operator (matrix) D is invertible. The model that one uses in the path integral is that

$$(\det D)^{\mp 1} = \int d\Phi\, d\Psi\; e^{\,i\int_M \Psi \star D\Phi} \tag{28}$$

where $\Phi$ is a set of fields and the $\Psi$ are a set of dual fields with D again a nondegenerate operator. The inverse determinant arises for Grassmann even fields (as in [27]), while it is the determinant that appears for Grassmann odd fields.

Gauge fixing: the Faddeev–Popov trick. If the action [19], $S_A(n, p) = \int B_p\, d_A C_{n-p-1}$, were nondegenerate, its partition function could be defined directly by [28]. However, because of gauge invariance of the action, the kinetic term is degenerate and one needs to eliminate the gauge freedom to obtain an (at least formally) well-defined expression for the partition function. Concretely, this degeneracy can be seen by
BF Theories 261
recalling that, when there are no harmonic forms (as we have assumed), there is a unique orthogonal Hodge decomposition of a p-form $B_p \in \Omega^{p}(M, \mathfrak{g})$ into a sum of a $d_A$-exact and a $d_A$-coexact form:

$$B_p = d_A\alpha_{p-1} + d_A^{\star}\beta_{p+1} \tag{29}$$

(and likewise for C). Evidently, the exact (longitudinal) parts $d_A\alpha$ of B and C do not appear in the action, and these are precisely the gauge-dependent parts of B and C under the gauge transformation [20]. Gauge fixing amounts to imposing a condition $F(B_p) = 0$ on $B_p$ that determines the longitudinal part uniquely in terms of the transversal part $d_A^{\star}\beta$. A natural condition is

$$d_A\alpha_{p-1} = 0 \;\Leftrightarrow\; F(B_p) = d_A^{\star}B_p = 0 \tag{30}$$

A partition function that is independent of the gauge-fixing condition results from inserting "1" in the form of

$$1 = \int_{\mathcal{G}} dg\; \delta\!\left(F(B^{g})\right)\, \Delta_F(B) \tag{31}$$

into the functional integral (the Faddeev–Popov trick), where $\mathcal{G}$ is the gauge group. This defines the Faddeev–Popov determinant $\Delta_F$, and the functional properties of the delta functional imply that $\Delta_F$ is the determinant of the operator that one obtains upon gauge variation of F(B).

In the general case of reducible gauge symmetries, the nature of the gauge group is complicated and requires some more thought. In the irreducible case, however, that is, for p = 1, the Lie algebra of the gauge group can be identified with $\Omega^{0}(M, \mathfrak{g})$, and $\Delta_F$ is the determinant of the operator:

$$\frac{\delta F}{\delta B}\, d_A : \Omega^{0}(M, \mathfrak{g}) \to \Omega^{0}(M, \mathfrak{g}) \tag{32}$$

For [30], this is simply the Laplacian on 0-forms, and thus

$$\Delta_F = \det \Delta_A^{(0)} \tag{33}$$

The partition function. Following the finite-dimensional model, both the $\delta$-function implementing the gauge-fixing condition and the Faddeev–Popov determinant can be lifted into the exponential, the former by a Lagrange multiplier $\pi$ [26], a Grassmann even 0-form, and the latter by a pair of Grassmann odd 0-forms $c$ and $\bar{c}$ [28], the ghost and antighost fields, respectively. The sum of the classical action and these gauge-fixing and ghost terms defines the (BRST-invariant) quantum action $S_A^{q}(n, p)$, and the partition function is

$$Z_A(n, p) = \int d\Phi\; e^{\,iS_A^{q}(n, p)} \tag{34}$$

where $\Phi$ denotes collectively all the fields. Concretely, when n = 2 and p = 0 (or, equivalently, p = 1), the quantum action is

$$S_A^{q}(2, 0) = \int B_0\, d_A C_1 + \pi\, d_A \star C_1 + \bar{c} \star \Delta_A^{(0)} c \tag{35}$$

Likewise, for n = 3 and p = 1 (the only other case when the gauge symmetry is indeed irreducible), both $B_1$ and $C_1$ require separate gauge fixing, and the quantum action is

$$S_A^{q}(3, 1) = \int B_1\, d_A C_1 + \pi\, d_A \star C_1 + \bar{c} \star \Delta_A^{(0)} c + \pi'\, d_A \star B_1 + \bar{c}' \star \Delta_A^{(0)} c' \tag{36}$$

Formally, therefore, the two-dimensional partition function is

$$Z_A(2, 0) = \frac{\det \Delta_A^{(0)}}{\det D_A} \tag{37}$$

where $D_A$ is the operator:

$$D_A = \begin{pmatrix} \star d_A \\ \star d_A \star \end{pmatrix} : \Omega^{1}(M, \mathfrak{g}) \to \Omega^{0}(M, \mathfrak{g}) \oplus \Omega^{0}(M, \mathfrak{g}) \tag{38}$$

One can define the determinant of this operator as the square root of the determinant of the operator $D_A^{\star} D_A = \Delta_A^{(1)}$, and therefore the partition function

$$Z_A(2, 0) = \det \Delta_A^{(0)} \left(\det \Delta_A^{(1)}\right)^{-1/2} = T_A(2) \tag{39}$$

is equal to the two-dimensional Ray–Singer torsion [25]. In this case, it is easy to see directly that the even-dimensional Ray–Singer torsion is trivial, as one could have equally well defined the determinant of $D_A$ as the square root of the determinant of the operator $D_A D_A^{\star} = \Delta_A^{(0)} \oplus \Delta_A^{(0)}$, which implies $Z_A(2, 0) = 1$.

In three dimensions, the two pairs of ghosts each contribute a $\det \Delta_A^{(0)}$, and thus

$$Z_A(3, 1) = \frac{\left(\det \Delta_A^{(0)}\right)^{2}}{\det D_A} \tag{40}$$

where

$$D_A = \begin{pmatrix} \star d_A & d_A \\ d_A \star & 0 \end{pmatrix} : \Omega^{0}(M, \mathfrak{g}) \oplus \Omega^{1}(M, \mathfrak{g}) \to \Omega^{0}(M, \mathfrak{g}) \oplus \Omega^{1}(M, \mathfrak{g}) \tag{41}$$

is the operator acting on the fields $(B_1, C_1, \pi, \pi')$. As before, this operator can be diagonalized by squaring it, $D_A^{\star} D_A = \Delta_A^{(0)} \oplus \Delta_A^{(1)}$, and thus

$$Z_A(3, 1) = \left(\det \Delta_A^{(0)}\right)^{3/2} \left(\det \Delta_A^{(1)}\right)^{-1/2} = T_A(3)^{-1} \tag{42}$$
is again related to the (this time genuinely nontrivial) Ray–Singer torsion.

In spite of the complications caused by reducible gauge symmetries, it can be shown that all of the above generalizes to arbitrary n and p, with the result that (for n odd)

$$Z_A(n, p) = T_A(n)^{(-1)^{p}} \tag{43}$$

confirming the topological nature of BF theories.

In the nonabelian case, the situation is significantly more complicated because of the complexity of the classical moduli space, the (higher cohomology) zero modes, and the on-shell reducibility of the gauge symmetries. Nevertheless, ignoring all the zero modes except those of A, that is, except the moduli m of flat connections A(m), the result is similar to that in the abelian case, in that the partition function reduces to an integral over the moduli space of flat connections, with measure determined by the Ray–Singer torsion $T_{A(m)}$.

Linking Numbers as Observables of Abelian BF Theories

With the exception of p = 0, there are no interesting local observables (gauge-invariant functionals of the fields C and B) in the abelian BF theory, since the gauge-invariant field strengths dC and dB vanish by the equations of motion. (For p = 0, B is a gauge-invariant 0-form and hence B(x) is a good local observable.) However, as in the Chern–Simons and Yang–Mills theories, certain (weakly) nonlocal observables such as Wilson loops are also of interest. In the case at hand (eqn [18]), we have abelian Wilson surface operators

$$W_S[B] = \int_S B, \qquad W_{S'}[C] = \int_{S'} C \tag{44}$$

associated with p- and (n − p − 1)-dimensional submanifolds S and S' of M, respectively. These operators are gauge invariant, that is, invariant under the local symmetries [20] provided that $\partial S = \partial S' = 0$, so that S and S' represent homology cycles of M.

For $M = \mathbb{R}^n$, correlation functions of these operators are related to the topological linking number of S and S'. We choose $S = \partial\Sigma$ and $S' = \partial\Sigma'$ to be disjoint compact-oriented boundaries of oriented submanifolds $\Sigma$ and $\Sigma'$ of $\mathbb{R}^n$. We also introduce de Rham currents $\delta_\Sigma$ and $\delta_S$ (essentially distributional differential forms with $\delta$-function support on $\Sigma$ or S, respectively), characterized by the properties

$$\int_S \omega_p = \int_M \delta_S \wedge \omega_p, \qquad \int_\Sigma \omega_{p+1} = \int_M \delta_\Sigma \wedge \omega_{p+1} \tag{45}$$

for all $\omega_k \in \Omega^{k}(M, \mathbb{R})$ (and likewise for S' and $\Sigma'$).

Since the dimension of $\Sigma$ is equal to the codimension of $S' = \partial\Sigma'$, $\Sigma$ and S' will generically intersect transversally at isolated points, and we define the linking number of S and S' to be the intersection number of $\Sigma$ and S', expressed in terms of de Rham currents as

$$L(S, S') = \int_{S'} \delta_\Sigma = \int_M \delta_{S'} \wedge \delta_\Sigma \tag{46}$$

In terms of de Rham currents, the Wilson surface operators can be written as $W_S[B] = \int_M \delta_S \wedge B$, etc. Thus, the generating functional for correlation functions of Wilson surface operators

$$\left\langle e^{i\lambda W_S[B]}\, e^{i\mu W_{S'}[C]} \right\rangle = \int DC\, DB\; \exp\, i\left( \int_M B \wedge dC + \mu\int_{S'} C + \lambda\int_S B \right) \tag{47}$$

is simply a Gaussian path integral. Using the defining properties of de Rham currents, this can be formally evaluated (using [27]) to give

$$\left\langle e^{i\lambda W_S[B]}\, e^{i\mu W_{S'}[C]} \right\rangle = e^{\,i\lambda\mu L(S, S')} \tag{48}$$

As expected, correlation functions of these topological field theories encode topological information.

Uses and Applications of Classical Nonabelian BF Theories

Low-dimensional BF theories are closely related to other theories of interest, for example, the Yang–Mills theory, the Chern–Simons theory, and gravity. Here, we briefly review some of these relationships. In order to avoid the complexities of quantum nonabelian BF theories, we focus on their classical features. Brief suggestions for further reading are provided at the end of each subsection.

Relation with Yang–Mills Theory

In any dimension, the nonabelian BF action can be regarded as the zero-coupling limit $g^2 \to 0$ of the Yang–Mills theory since the Yang–Mills action [17] can be written in first-order form as

$$\frac{1}{4g^2} \int_M \mathrm{tr}_G\, F_A \wedge \star F_A \;\sim\; \int_M \mathrm{tr}_G \left( iB_{n-2} \wedge F_A + g^2 B_{n-2} \wedge \star B_{n-2} \right) \tag{49}$$

However, whereas for $n \geq 3$ the $B^2$-term breaks the p-form gauge invariance of the BF action (and thus liberates the physical Yang–Mills degrees of freedom), this limit is nonsingular in two dimensions where this p-form symmetry is absent and, indeed, both theories have zero physical degrees of freedom.
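The passage from the first-order form [49] to the Yang–Mills action can be illustrated in a one-variable toy model. The sketch below assumes a Euclidean weight $e^{-S}$ and replaces the fields B and $F_A$ by single real numbers `b` and `F`; both simplifications, and all function names, are illustrative assumptions and not part of the article. Integrating $e^{-(ibF + g^2 b^2)}$ over `b` should then reproduce $e^{-F^2/4g^2}$ up to the F-independent factor $\sqrt{\pi}/g$.

```python
import math

def first_order_integral(F, g, half_width=12.0, steps=4000):
    """Trapezoidal estimate of the integral over b of exp(-(i*b*F + g^2*b^2)).

    The imaginary part cancels by symmetry in b, so only cos(b*F) contributes.
    half_width must be large enough that exp(-(g*half_width)^2) is negligible.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        b = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.cos(b * F) * math.exp(-(g * b) ** 2)
    return total * h

def second_order_weight(F, g):
    """Closed form: sqrt(pi)/g * exp(-F^2 / (4 g^2)), the toy 'Yang-Mills' weight."""
    return math.sqrt(math.pi) / g * math.exp(-F * F / (4.0 * g * g))

F, g = 1.3, 0.7  # arbitrary sample values for the toy field strength and coupling
lhs = first_order_integral(F, g)
rhs = second_order_weight(F, g)
print(lhs, rhs)
```

The two printed numbers should agree to high accuracy, which is the finite-dimensional counterpart of eliminating B by completing the square in [49].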
A nonsingular BF-like zero coupling limit of the Yang–Mills theory for $n \geq 3$ can be obtained by introducing an auxiliary (Stückelberg) field $\eta \in \Omega^{n-3}(M, \mathfrak{g})$ which restores the p-form gauge invariance. The resulting BF–Yang–Mills action is

$$S_{BFYM} = \int_M \mathrm{tr}_G \left[ iB_{n-2} \wedge F_A + g^2 \left( B_{n-2} - \frac{1}{\sqrt{2}\,g}\, d_A\eta \right) \wedge \star \left( B_{n-2} - \frac{1}{\sqrt{2}\,g}\, d_A\eta \right) \right] \tag{50}$$

This action is not only invariant under ordinary G gauge transformations, but also under the p-form gauge symmetry $B \to B + d_A\lambda$ [6] provided that $\eta$ transforms as $\eta \to \eta + \sqrt{2}\,g\lambda$. Thus, this shift can be used to set $\eta$ to zero, upon which one recovers the first-order form of the Yang–Mills action. Moreover, in the zero-coupling limit all that survives is a standard (and nontopological) minimal coupling of $\eta$ to the BF action:

$$\lim_{g^2 \to 0} S_{BFYM} = \int_M \mathrm{tr}_G \left( iB_{n-2} \wedge F_A + \tfrac{1}{2}\, d_A\eta \wedge \star d_A\eta \right) \tag{51}$$

accounting for the correct number of degrees of freedom of the Yang–Mills theory (the (n − 3)-form $\eta$ being absent for n = 2).

Two-dimensional quantum BF and Yang–Mills theories have a variety of interesting topological properties. An account of some of them can be found in Blau and Thompson (1994) and Witten (1991). For a detailed discussion of the gauge symmetries and gauge fixing of the BFYM action, see Cattaneo et al. (1998).

Chern–Simons Theory, Gravity, and (Deformed) BF Theory

The Chern–Simons theory is a three-dimensional gauge theory. The Chern–Simons action for an H-connection C, H the gauge group, is

$$S_{CS}[C] = \int_M \mathrm{tr}_H \left( C \wedge dC + \tfrac{2}{3}\, C \wedge C \wedge C \right) \tag{52}$$

It is invariant under the infinitesimal gauge transformations $\delta C = d_C\lambda$, $\lambda \in \Omega^{0}(M, \mathfrak{h})$, and the gauge-invariant equation of motion is the flatness condition $F_C = 0$.

Now let H = TG be the tangent bundle group $TG \simeq G \times_s \mathfrak{g}$. This is a semidirect product group with G acting on $\mathfrak{g}$ via the adjoint and $\mathfrak{g}$ regarded as an abelian Lie algebra of translations. Thus, in terms of generators $(J_a, P_a)$, where the $J_a$ are generators of G, the commutation relations are $[J_a, J_b] = f_{ab}^{\;c} J_c$, $[J_a, P_b] = f_{ab}^{\;c} P_c$ and $[P_a, P_b] = 0$, and the curvature of the TG-connection $C = J_a A^a + P_a B^a$ is

$$F_C = J_a F_A^{a} + P_a\, d_A B^{a} \tag{53}$$

Thus, the equations of motion of the TG Chern–Simons theory are equivalent to the equations of motion [2] of the BF theory with gauge group G. This equivalence also holds at the level of the action:

$$\tfrac{1}{2}\, S_{CS}[C] = S_{BF}[A, B] \tag{54}$$

provided that one chooses the nondegenerate invariant scalar product to be

$$\mathrm{tr}_{TG}(J_a P_b) = \mathrm{tr}_G(J_a J_b), \qquad \mathrm{tr}_{TG}(J_a J_b) = \mathrm{tr}_{TG}(P_a P_b) = 0 \tag{55}$$

For G = SO(3), TG is the Euclidean group of isometries of $\mathbb{R}^3$ and for G = SO(2, 1), TG is the Poincaré group of isometries of the three-dimensional Minkowski space $\mathbb{R}^{2,1}$. For these gauge groups, the BF action takes the form of the three-dimensional (Euclidean or Lorentzian) Einstein–Hilbert action, with the interpretation of B = e as the dreibein and A = $\omega$ as the spin connection. The equations of motion for e and $\omega$ express the vanishing of the torsion and the Riemann tensor (equivalent to the vanishing of the Ricci tensor for n = 3), respectively. This Chern–Simons interpretation of three-dimensional gravity extends to gravity with a cosmological constant, with H the appropriate de Sitter or anti-de Sitter isometry group (SO(4), SO(3, 1), or SO(2, 2), depending on the signature and the sign of the cosmological constant). In terms of the BF interpretation, this corresponds to the simple topological deformation

$$S_{BF}[A, B] = \int_M \mathrm{tr}_G \left( B \wedge F_A + \tfrac{\Lambda}{3}\, B \wedge B \wedge B \right) \tag{56}$$

of the BF action, which has the deformed local symmetries (cf. [5] and [6])

$$\delta A = d_A\lambda + \Lambda\,[B, \lambda'], \qquad \delta B = [B, \lambda] + d_A\lambda' \tag{57}$$

A simple way to understand these symmetries is to note that the action can be written as the difference of two Chern–Simons actions:

$$S_{CS}[A + \sqrt{\Lambda}\, B] - S_{CS}[A - \sqrt{\Lambda}\, B] = 4\sqrt{\Lambda}\; S_{BF}[A, B] \tag{58}$$

whose evident standard local gauge symmetries $\delta(A \pm \sqrt{\Lambda}\, B) = d_{A \pm \sqrt{\Lambda} B}\, \lambda^{\pm}$ are equivalent to [57] for $\lambda^{\pm} = \lambda \pm \sqrt{\Lambda}\, \lambda'$.

A detailed account of three-dimensional classical and quantum gravity can be found in Carlip (1998).
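For G = SO(3) the commutation relations of TG quoted above can be checked directly in a matrix realization. The following sketch assumes the standard 4 × 4 representation of the Euclidean group of $\mathbb{R}^3$ (rotation generators in the upper-left 3 × 3 block, translation generators in the last column) and the basis normalization $f_{ab}^{\;c} = \epsilon_{abc}$; the helper names are hypothetical and chosen only for this illustration.

```python
def eps(a, b, c):
    """Levi-Civita symbol for indices in {0, 1, 2}."""
    return (a - b) * (b - c) * (c - a) // 2

def zeros():
    return [[0] * 4 for _ in range(4)]

def J(a):
    """Rotation generator: (J_a)_{bc} = -eps_{abc} in the upper-left 3x3 block."""
    m = zeros()
    for b in range(3):
        for c in range(3):
            m[b][c] = -eps(a, b, c)
    return m

def P(a):
    """Translation generator: a single 1 in row a of the last column."""
    m = zeros()
    m[a][3] = 1
    return m

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def commutator(x, y):
    xy, yx = matmul(x, y), matmul(y, x)
    return [[xy[i][j] - yx[i][j] for j in range(4)] for i in range(4)]

def combo(coeffs, gens):
    """Linear combination sum_c coeffs[c] * gens[c]."""
    return [[sum(coeffs[c] * gens[c][i][j] for c in range(3)) for j in range(4)]
            for i in range(4)]

Js = [J(a) for a in range(3)]
Ps = [P(a) for a in range(3)]

def relations_hold():
    """Check [J,J] = f J, [J,P] = f P and [P,P] = 0 with f_ab^c = eps_abc."""
    for a in range(3):
        for b in range(3):
            f = [eps(a, b, c) for c in range(3)]
            if commutator(Js[a], Js[b]) != combo(f, Js):
                return False
            if commutator(Js[a], Ps[b]) != combo(f, Ps):
                return False
            if commutator(Ps[a], Ps[b]) != zeros():
                return False
    return True

print(relations_hold())
```

Because all entries are integers, the comparison is exact; the same pattern would apply to SO(2, 1) after replacing the structure constants, which is not attempted here.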
Relation with Gravity

Theories of two-dimensional gravity and topological gravity also have a BF formulation (Blau and Thompson 1991, Birmingham et al. 1991) which resembles the Chern–Simons BF formulation of three-dimensional gravity described above, the natural gauge group now being SO(2, 1) or SO(3) or one of its contractions.

In the first-order (Palatini) formulation, the Einstein–Hilbert action for four-dimensional gravity can be written as

$$S_{EH} = \int \mathrm{tr}\, \left( e \wedge e \wedge F_\omega \right) \tag{59}$$

where e is the vierbein and $\omega$ is the spin connection. This action has the general form of a BF action with a constraint that $B = e \wedge e$ be a simple bi(co-)vector. Thus, four-dimensional general relativity can be regarded as a constrained BF theory. Although this constraint drastically changes the number of physical degrees of freedom (BF theory has zero degrees of freedom, while four-dimensional gravity has two), this is nevertheless a fruitful analogy which also lies at the heart of the spin-foam quantization approach to quantum gravity. This constrained BF description of gravity is also available for higher-dimensional gravity theories.

For further details, and references, see Freidel et al. (1999) and the review article (Baez 2000).

Knot and Generalized Knot Invariants

The known relationship between Wilson loop observables of the Chern–Simons theory with a compact gauge group and knot invariants (Witten 1989), and the interpretation of the three-dimensional BF theory as a Chern–Simons theory with a noncompact gauge group raise the question of the relation of observables of an n = 3 BF theory to knot invariants, and suggest the possibility of using an $n \geq 4$ BF theory to define higher-dimensional analogs of knot invariants. It turns out that an appropriate observable of n = 3 BF theory for G = SU(2) is related to the Alexander–Conway polynomial. The analysis of higher-dimensional BF theories requires the full power of the Batalin–Vilkovisky (BV) formalism. BV observables generalizing Wilson loops have been shown to give rise to cohomology classes on the space of imbedded curves. For a detailed discussion of these issues, see Cattaneo and Rossi (2001) and references therein. A relation between the algebra of generalized Wilson loops and string topology has been investigated in Cattaneo et al. (2003).

See also: Batalin–Vilkovisky Quantization; BRST Quantization; Chern–Simons Models: Rigorous Results; Gauge Theories From Strings; Knot Invariants and Quantum Gravity; Loop Quantum Gravity; Moduli Spaces: An Introduction; Nonperturbative and Topological Aspects of Gauge Theory; Schwarz-Type Topological Quantum Field Theory; Spin Foams; Topological Quantum Field Theory: Overview.

Further Reading

Baez J (2000) An introduction to spin foam models of quantum gravity and BF theory. Lecture Notes in Physics 543: 25–94.
Birmingham D, Blau M, Rakowski M, and Thompson G (1991) Topological field theory. Physics Reports 209: 129–340.
Blau M and Thompson G (1989) A new class of topological field theories and the Ray–Singer torsion. Physics Letters B 228: 64–68.
Blau M and Thompson G (1991) Topological gauge theories of antisymmetric tensor fields. Annals of Physics 205: 130–172.
Blau M and Thompson G (1994) Lectures on 2d gauge theories: topological aspects and path integral techniques. In: Gava E, Masiero A, Narain KS, Randjbar-Daemi S, and Shafi Q (eds.) Proceedings of the 1993 Trieste Summer School on High Energy Physics and Cosmology, pp. 175–244. Singapore: World Scientific.
Carlip S (1998) Quantum Gravity in 2+1 Dimensions. Cambridge: Cambridge University Press.
Cattaneo A and Rossi C (2001) Higher-dimensional BF theories in the Batalin–Vilkovisky formalism: the BV action and generalized Wilson loops. Communications in Mathematical Physics 221: 591–657.
Cattaneo A, Cotta-Ramusino P, Fucito F, Martellini M, Rinaldi M, et al. (1998) Four-dimensional Yang–Mills theory as a deformation of topological BF theory. Communications in Mathematical Physics 197: 571–621.
Cattaneo A, Pedrini P, and Fröhlich J (2003) Topological field theory interpretation of string topology. Communications in Mathematical Physics 240: 397–421.
Freidel L, Krasnov K, and Puzio R (1999) BF description of higher-dimensional gravity theories. Advances in Theoretical and Mathematical Physics 3: 1289–1324.
Horowitz GT (1989) Exactly soluble diffeomorphism invariant theories. Communications in Mathematical Physics 125: 417–437.
Schwarz AS (1978) The partition function of a degenerate quadratic functional and Ray–Singer invariants. Letters in Mathematical Physics 2: 247–252.
Schwarz AS (1979) The partition function of a degenerate functional. Communications in Mathematical Physics 67: 1–16.
Witten E (1989) Quantum field theory and the Jones polynomial. Communications in Mathematical Physics 121: 351–399.
Witten E (1991) On quantum gauge theories in two dimensions. Communications in Mathematical Physics 141: 153–209.
Bicrossproduct Hopf Algebras and Noncommutative Spacetime 265
C[SU2 ] in terms of the pi generators is an infinite which is of the form of Schrodingers equation with
series given by the CampbellBakerHausdorff series, respect to an auxiliary time variable and for a
and not the usual linear one (this is why the measure particle with mass 1=.
is not the Lebesgue one). The physical content here is The reader may ask what happens to the
in the plane waves themselves, one can use any other Euclidean group of translations and rotations in
momentum coordinates to parametrize them with the this context. From the above we find that
corresponding measure and coproduct. Differential U (poinc3 ) = C[SU2 ] U(su2 ), the semidirect pro-
operators on R3 are given by the action of elements of duct generated by translations @ i and usual rota-
C[SU2 ] and are diagonal on these plane waves, tions. This in turn is the quantum double D(U(su2 ))
of the classical enveloping algebra, and as such a
f: p f p p
quantum group with braiding etc. (see Hopf
which corresponds under Fourier transform simply Algebras and q-Deformation Quantum Groups).
to pointwise multiplication in C[SU2 ]. For example, This quantum double has been identified as part
the function 2 (tr 2) as a function on SU2 will of an effective theory in 2 1 quantum gravity in a
give a rotationally invariant wave operator which is Euclidean version based on ChernSimons theory
also invariant under inversion in the group. Its value with Lie algebra poinc3 and the spin space algebra
on plane waves is proposed as an effective theory for this. The
quotient of R3 by an allowed value of the quadratic
1 2
treip 1 2 cosjpj 1 Casimir x2 (which then makes it a matrix algebra)
2 is called a fuzzy sphere and appears as a world-
In the limit ! 0 this gives the usual wave operator volume algebra in certain string theories and
on R3 . reduced matrix models. The noncommutative dif-
It is also possible to put a differential graded ferential geometry that we have described is due to
algebra (DGA) structure of differential forms on this Batista and the author.
algebra, the natural one being 2. We take the same type of construction to
obtain the bicrossproduct model spacetime
2
dxi i ; xi xi i dxi algebra
dxi xj xj dxi iij k dxk iij R1;3
: t; xi ixi ; xi ; xj 0
where is the 2 2 identity matrix which, together These are the relations of a Lie algebra b (say) but
with the Pauli matrices i , completes the basis of again regarded as coordinates on a noncommutative
left-invariant 1-forms. The 1-form provides a spacetime. Here is a timescale which can be
natural time direction, even though there is no time written as a mass scale = 1= instead. We
coordinate, and the new parameter 6 0 appears as parametrize the plane waves as
the freedom to change its normalization. The partial 0
derivatives @ i are defined by p;p0 eipx eip t ; p;p0 p0 ;p00 pep0 p0 ;p0 p00
quantum group Fourier transform reduces to the in units where 1 is the usual speed of light. So
usual one but normal ordered, the prediction is that the speed of light depends
Z on energy. What is remarkable is that even if
0
F f d4 p f peipx eip t 1044 s (the Planck timescale), this prediction
R4 could in principle be tested, for example using -ray
(one can also Fourier transform with respect to the left- bursts. These are known in some cases to travel
invariant measure d4 p e3p on B ). The inverse is again
0
cosmological distances before arriving on Earth, and
given in terms of the usual inverse transform if we have a spread of energies from 0.1100 MeV.
specify general fields in R1,
3
by normal ordering of According to the above, the relative time delay t
usual functions, which we shall do. As before, the action on traveling distance L for frequencies correspond-
of elements of C[B ] defines differential operators on ing to p0 , p0 p0 is
R1, 3
and these act diagonally on plane waves.
We also have a natural DGA with L
t p0 1044 s 100 MeV 1010 y 1 ms
dxj x x dxj ; dtx x dt idx c
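The order-of-magnitude estimate for the time delay quoted above can be reproduced with a few lines of arithmetic. The sketch assumes the relation $\Delta t \sim \lambda\, \Delta p^0\, L/c$ with $\hbar = 1$, so that the energy spread must be converted to an angular frequency by dividing by $\hbar$; the function name and the choice of the Julian year are assumptions made for this illustration.

```python
HBAR_EV_S = 6.582119569e-16          # reduced Planck constant in eV*s
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, assumed for the estimate

def time_delay(lam_s, delta_energy_ev, travel_time_s):
    """Delay ~ lambda * (Delta E / hbar) * (L / c), all times in seconds."""
    delta_p0 = delta_energy_ev / HBAR_EV_S   # energy spread as a frequency in 1/s
    return lam_s * delta_p0 * travel_time_s

# Planck timescale 1e-44 s, 100 MeV energy spread, 1e10 years of travel (L/c).
delay = time_delay(1e-44, 100e6, 1e10 * SECONDS_PER_YEAR)
print(f"{delay:.1e} s")
```

With these inputs the delay comes out at a fraction of a millisecond, consistent with the "1 ms" order of magnitude stated in the text.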
generators there do not close among themselves but where p = i@ . The wave operator @ @ is
mix with momentum). therefore given by the action of p p and has value
3. The usual Heisenberg algebra of quantum k k as usual on plane waves. On the other hand,
mechanics is another possible noncommutative 0
k
undeformed product. The result may look different above was already proposed in Snyder (1947).
when the same (x) is expressed as a function of the Here
variables with the
product. In other words, the
x ; x
i2 M
k x eixk
eix:k ; p k x k k x p ; x
i
2 p p
; p ; p
0
so the entire Poincaré algebra is undeformed but the phase-space relations are deformed. Snyder also constructed the orbital angular momentum realization $M_{\mu\nu} = x_\mu p_\nu - x_\nu p_\mu$. This model is not a proposal for a noncommutative spacetime because the algebra does not even close among the $x_\mu$. Rather it is a proposal for mixing of position and Lorentz generators. On the other hand (which was the point of view in Snyder (1947)), in any representation of the Poincaré algebra, the $M_{\mu\nu}$ become operators and in some sense numerical. The rotational sector has discrete eigenvalues as usual, so to this extent the spacetime has been discretized. Although not fitting into the methods in this article, it is also of interest that the relations above were motivated by considering $p_\mu$ as coordinates projected from a 5D flat space to de Sitter space and $x_\mu$ as the 5-component of orbital angular momentum in the flat space.

To conclude this section, let us note that there are further models that we have not included for lack of space. One of them is a much-studied $\mathbb{R}_q^{1,3}$ in which t is central but the $x_i$ enjoy complicated q-relations best understood as q-deformed Hermitian matrices. One of the motivations in the theory was the result in Majid (1990) that q-deformation could be used to regularize infinities in quantum field theory as poles at q = 1. Another entire class is to use noncommutative geometry and quantum group methods on finite or discrete spaces. Unlike lattice theory where a finite lattice is viewed as approximation, these models are not approximations but exact noncommutative geometries valid even on a few points. The noncommutativity enters into the fact that finite differences are bilocal and hence naturally have different left and right multiplications by functions. Both aspects are mentioned briefly in the overview article (see Hopf Algebras and q-Deformation Quantum Groups). Also, on the experimental front, another large area that we have not had room to cover is the prediction of modified uncertainty relations both in spacetime and phase space (Kempf et al. 1995).

Moreover, for all of the models above, once one has a noncommutative differential calculus one may proceed to gauge theory etc., on noncommutative spacetimes, at least at the level where a connection is a noncommutative (anti-Hermitian) 1-form $\alpha$. Gauge transformations are invertible (unitary) elements u of the noncommutative coordinate algebra and the connection and curvature transform as

$$\alpha \to u^{-1}\alpha u + u^{-1}\, du$$

$$F(\alpha) = d\alpha + \alpha \wedge \alpha \to u^{-1} F(\alpha)\, u$$

The full extent of quantum bundles and gravity (see Quantum Group Differentials, Bundles and Gauge Theory) and quantum field theory is not always possible, although both have been done for covariant twist examples (for functorial reasons) and for small finite sets. For the first two models above, for example, it is not clear at the time of writing how to interpret scattering when the addition of momenta is nonabelian.

Matched Pair Equations

Although we have presented noncommutative spacetime first, the first actual application of quantum group methods to Planck-scale physics was the Planck-scale Hopf algebra obtained by a theory of bicrossproducts. Like the Snyder model, the intention here was to deform phase space itself, but since then bicrossproducts have had many further applications. The main ingredient here is the notion of a pair of groups (G, M), say, acting on each other as we explain now. The mathematics here goes back to the early 1910s in group theory, but also arose in mathematical physics as a toy version of Einstein's equation in the sense of compatibility between quantization and curvature (see the next section).

By definition, (G, M) are a matched pair of groups if there are left and right actions

$$M \xleftarrow{\;\triangleleft\;} M \times G \xrightarrow{\;\triangleright\;} G$$

of each group on the set of the other, such that

$$s \triangleleft e = s, \quad e \triangleright u = u, \quad s \triangleright e = e, \quad e \triangleleft u = e$$

$$(s \triangleleft u) \triangleleft v = s \triangleleft (uv), \qquad s \triangleright (t \triangleright u) = (st) \triangleright u$$

$$s \triangleright (uv) = (s \triangleright u)\,\big((s \triangleleft u) \triangleright v\big)$$

$$(st) \triangleleft u = \big(s \triangleleft (t \triangleright u)\big)\,(t \triangleleft u)$$

for all $u, v \in G$, $s, t \in M$. Here e denotes the relevant group unit element. As a first application of such data, one may make a double cross product group $G \bowtie M$ with product

$$(u, s)\cdot(v, t) = \big(u\,(s \triangleright v),\; (s \triangleleft v)\,t\big)$$

and with G, M as subgroups. Since it is built on the direct product space, the bigger group factorizes into these subgroups. Conversely, if X is a group factorization such that the product $G \times M \to X$ is bijective, each group acts on the other by actions $\triangleright$, $\triangleleft$ defined by $su = (s \triangleright u)(s \triangleleft u)$ for $u \in G$ and $s \in M$, where s, u are multiplied in X and the product is factorized as something in G and something in M. So finite group matched pairs are equivalent to group factorizations. In the Lie group context, the
Figure 2 Matched pair condition as a subdivision property.

The Planck-Scale Hopf Algebra

We consider a quantum algebra of observables H and ask when it is a Hopf algebra extending some classical position coordinate algebra C[M] and some possibly noncommutative momentum coordinate algebra $U(\mathfrak{g})$ in the form of a strict extension

$$C[M] \to H \to U(\mathfrak{g})$$

From the theory above this problem is governed by local solutions of the matched pair equations on (G, M). It requires that $H \cong C[M] \rtimes U(\mathfrak{g})$ as an algebra, that is,
the quantization of a particle moving on orbits in M a background curvature scale , and the correspond-
under some action of G (in an algebraic setting, or ing bicrossproduct C[p] C[x] is
one can use von Neumann or C -algebras etc.). And
it requires the classical phase space to be a p; x ih1 e x ; x x 1 1 x
x
nonabelian or curved group M g . This extends p p e 1 p; x p 0
to a coproduct on H which becomes the bicross- Sx x; Sp pe x
gravity (in the form of ) and both are required for a for i = 1, 2 and the usual additive ones for p3 , M3 .
nontrivial Hopf algebra. Moreover, the construction There is also an appropriate counit and antipode.
necessarily has a self-dual form and indeed the The deformed spheres under the nonlinear rotation
dually paired Hopf algebra is C[p] C[x] with new in Majid (1990) are constant values of the Casimir
parameters h0 = 1=h and 0 =
h if we take the for the above algebra. This is
standard pairing x, p across the two algebras. Hopf
2
algebra duality realized by the quantum group coshp3 1 p2 ep3
Fourier transform F takes one between the two 2
models. which from the group of motions point of view
generates the noncommutative Laplacian when
acting on R3 . The model here is a Euclidean
inhomogeneous one.
Bicrossproduct Poincare The four-dimensional (4D) version U(so1, 3 )
Quantum Groups C[B ] of this construction (Majid and Ruegg
Another example from the 1980s in the same family 1994) is again linked to Planck-scale predictions,
as the Planck-scale Hopf algebra is G = SU2 and this time as a generalized symmetry. In terms of
M = B , a nonabelian version of R3 with Lie algebra translation generators p , rotations Mi and boosts
b of the form Ni we have
x3 ; xi ixi ; xi ; xj 0 p ; p
0; Mi ; Mj iij k Mk
them. Here we mention just one partly infinite example of current interest.

Thus, the diffeomorphisms on the line $\mathbb{R}$ may be factorized into transformations of the form $ax + b$ and diffeomorphisms that fix the origin and have unit differential there. After a (logarithmic) change of generators to arrive at an algebraic picture, one has a bicrossproduct

$$H(1) = U(b) \bowtie H_1$$

where $b$ is now the two-dimensional (2D) Lie algebra with relations $[x, y] = x$ and $H_1$ is the algebra of polynomials in generators $\delta_n$ and a certain coalgebra as a model of the coordinate algebra of the group of diffeomorphisms that fix the origin with unit differential. The Hopf algebra $H(1)$ was introduced by Connes and Moscovici (1998), although not actually as a bicrossproduct (but motivated by the bicrossproduct theory), as part of a family $H(n)$ useful in cyclic cohomology computations. It has cross relations and coproduct determined by

$$[\delta_n, x] = \delta_{n+1}, \qquad [\delta_n, y] = n\,\delta_n,$$
$$\Delta x = x \otimes 1 + 1 \otimes x + \delta_1 \otimes y,$$
$$\Delta y = y \otimes 1 + 1 \otimes y$$

which we see has a semidirect product form where $\delta_n \triangleleft x = \delta_{n+1}$, $\delta_n \triangleleft y = n\,\delta_n$. The coalgebra is also a semidirect coproduct by means of a back-reaction of $H_1$ on $B$ (expressed as a coaction). From the bicrossproduct theory, we also have a dual model

$$C[B] \bowtie U(\mathrm{diff}_0)$$

where $\mathrm{diff}_0$ is the Lie algebra of the group of diffeomorphisms fixing the origin. As such it could be viewed as in the family of examples in the section "Bicrossproduct Poincaré quantum groups" but now with a 2D $B$. We also conclude from the bicrossproduct theory that this acts covariantly on $\mathbb{R}^2_\lambda = U(b)$ after introducing the scaling parameter $\lambda$.

Finally, the Hopf algebra $H(1)$ is also part of a family of bicrossproduct Hopf algebras built on rooted trees and related to bookkeeping of overlapping divergences in renormalizable quantum field theories (see Hopf Algebra Structure of Renormalizable Quantum Field Theory). While we have not had room to cover all bicrossproduct quantum groups of interest, it would appear that bicrossproducts are indeed intimately tied up with actual quantum physics.

See also: Classical r-Matrices, Lie Bialgebras, and Poisson Lie Groups; Hopf Algebra Structure of Renormalizable Quantum Field Theory; Hopf Algebras and q-Deformation Quantum Groups; Quantum Group Differentials, Bundles and Gauge Theory; Riemann-Hilbert Problem; von Neumann Algebras: Introduction, Modular Theory, and Classification Theory.

Further Reading

Amelino-Camelia G and Majid S (2000) Waves on noncommutative spacetime and gamma-ray bursts. International Journal of Modern Physics A 15: 4301-4323.
Beggs E and Majid S (2001) Poisson-Lie T-duality for quasitriangular Lie bialgebras. Communications in Mathematical Physics 220: 455-488.
Connes A and Moscovici H (1998) Hopf algebras, cyclic cohomology and the transverse index theory. Communications in Mathematical Physics 198: 199-246.
Kac GI and Paljutkin VG (1966) Finite ring groups. Transactions of the American Mathematical Society 15: 251-294.
Kempf A, Mangano G, and Mann RB (1995) Hilbert space representation of the minimal length uncertainty relation. Physical Review D 52: 1108-1118.
Klimcik C (1996) Poisson-Lie T-duality. Nuclear Physics B (Proc. Suppl.) 46: 116-121.
Lukierski J, Nowicki A, Ruegg H, and Tolstoy VN (1991) q-Deformation of Poincaré algebra. Physics Letters B 268: 331-338.
Majid S (1988) Hopf algebras for physics at the Planck scale. Journal of Classical and Quantum Gravity 5: 1587-1606.
Majid S (1990) Physics for algebraists: non-commutative and non-cocommutative Hopf algebras by a bicrossproduct construction. Journal of Algebra 130: 17-64.
Majid S (1990) Matched pairs of Lie groups associated to solutions of the Yang-Baxter equations. Pacific Journal of Mathematics 141: 311-332.
Majid S (1990) On q-regularization. International Journal of Modern Physics A 5: 4689-4696.
Majid S (1995) Foundations of Quantum Group Theory. Cambridge: Cambridge University Press.
Majid S (2000) Meaning of noncommutative geometry and the Planck-scale quantum group. Springer Lecture Notes in Physics 541: 227-276.
Majid S and Ruegg H (1994) Bicrossproduct structure of the $\kappa$-Poincaré group and non-commutative geometry. Physics Letters B 334: 348-354.
Oeckl R (2000) Untwisting noncommutative $\mathbb{R}^d$ and the equivalence of quantum field theories. Nuclear Physics B 581: 559-574.
Seiberg N and Witten E (1999) String theory and noncommutative geometry. Journal of High Energy Physics 9909: 032.
Snyder HS (1947) Quantized space-time. Physical Review D 67: 38-41.
Takeuchi M (1981) Matched pairs of groups and bismash products of Hopf algebras. Communications in Algebra 9: 841.
Bifurcation Theory 275
Bifurcation Theory
M Haragus, Université de Franche-Comté, Besançon, France
G Iooss, Institut Non Linéaire de Nice, Valbonne, France
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Consider the following equation:

$$F(X, \mu) = 0 \qquad [1]$$

where $X$ is the variable, $\mu$ is a parameter, and $X$, $\mu$, $F$ belong to appropriate (finite- or infinite-dimensional) spaces. The problem of bifurcation theory is to describe the singularities of the set of solutions

$$S = \{(X, \mu);\ (X, \mu) \text{ satisfies } F(X, \mu) = 0\}$$

The word bifurcation was introduced by H Poincaré (1885) in his study of equilibria of rotating liquid masses.

The simplest example is the study of the real roots $x$ of a quadratic polynomial

$$x^2 + bx + c = 0 \qquad [2]$$

where $\mu$ is represented by the pair of parameters $(b, c) \in \mathbb{R}^2$. As is well known, real roots are determined by the sign of

$$\Delta \overset{\text{def}}{=} b^2 - 4c$$

For $\Delta < 0$, there is no real solution of [2], while there are two solutions $x_\pm$ in the region $\Delta > 0$, which merge when the distance between the point $(b, c)$ and the parabola $\Delta = 0$ tends towards 0. It is then clear that a singularity occurs in the structure of the set of solutions of [2] at the crossing of the parabola $\Delta = 0$ or, in other words, a bifurcation occurs in the parameter space $(b, c)$ on the parabola $\Delta = 0$. A point $(\mu_0, x_0) \in \mathbb{R}^3$ is then called a bifurcation point if $\mu_0 = (b, c)$ satisfies $\Delta = 0$, and $x_0 = -b/2$.

In the theory of differential equations, $F(X, \mu)$ often represents a vector field. This study is then concerned with the existence of equilibrium solutions to the differential equation

$$\frac{dX}{dt} = F(X, \mu) \qquad [3]$$

and is therefore referred to as static bifurcation theory. In addition, dynamic bifurcation theory is concerned here with changes in the dynamic properties of the solutions of the differential equation as $\mu$ varies. A widely used way to characterize these changes is to say that the vector field $F(\cdot, \mu_0)$ is structurally stable if the sets of orbits of the differential equation are homeomorphic for $\mu$ close to $\mu_0$, with homeomorphisms which preserve the orientation of the orbits in time $t$. Then a bifurcation occurs at $\mu = \mu_0$ if $F(\cdot, \mu_0)$ is not structurally stable. It turns out that there is a close link between the stability properties of equilibrium solutions of the differential equation and the type of the bifurcation in static theory.

The tools developed in bifurcation theory are extensively used to solve concrete problems arising in physics and natural sciences. These problems may be modeled by ordinary or partial differential equations, integral equations, but also delay equations or iteration maps, and in all these cases the presence of parameters naturally leads to bifurcation phenomena. They can be regarded as problems of the form [1] or [3], in suitable function spaces, and bifurcation theory allows one to detect solutions and to describe their qualitative properties. During the last decades, a class of problems in which the use of bifurcation theory led to significant progress is concerned with nonlinear waves in partial differential equations, including hydrodynamic problems, nonlinear water waves, elasticity, but also pattern formation, front propagation, or spiral waves in reaction-diffusion type systems.

Examples in One and Two Dimensions

The most complete results in bifurcation theory are available in one and two dimensions. The study of static bifurcations in one dimension is concerned with scalar equations

$$f(x, \mu) = 0 \qquad [4]$$

where $x \in \mathbb{R}$, $\mu \in \mathbb{R}$, and the function $f$ is supposed to be regular enough with respect to $(x, \mu)$. When $f(x_0, \mu_0) = 0$ and the derivative of $f$ with respect to $x$ satisfies $\partial_x f(x_0, \mu_0) \neq 0$, the implicit function theorem gives a unique branch of solutions $x(\mu)$ for $\mu$ close to $\mu_0$, and shows the absence of bifurcation points near $(\mu_0, x_0)$. Bifurcation theory intervenes when

$$\partial_x f(x_0, \mu_0) = 0 \qquad [5]$$

and one cannot apply the implicit function theorem for solving with respect to $x$ near $x_0$. A complete description of the set of solutions near $(x_0, \mu_0)$ can be obtained by looking at the partial derivatives of $f$ with respect to $x$ and $\mu$.
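As a small numerical illustration of the quadratic example [2], the sketch below counts the real roots of $x^2 + bx + c = 0$ as the point $(b, c)$ crosses the parabola $b^2 - 4c = 0$. The function name `real_roots` and the sample parameter values are choices made for this illustration only; they do not come from the article.

```python
import math

def real_roots(b, c):
    """Real roots of x^2 + b*x + c = 0, sorted in ascending order."""
    disc = b * b - 4.0 * c          # the discriminant b^2 - 4c
    if disc < 0:
        return []                    # no real solution
    if disc == 0:
        return [-b / 2.0]            # the two roots have merged at x0 = -b/2
    s = math.sqrt(disc)
    return [(-b - s) / 2.0, (-b + s) / 2.0]

# Fix b = 2 and move c through the critical value c = b^2/4 = 1.
b = 2.0
for c in (2.0, 1.0, 0.99, 0.0):
    print(f"c = {c:5.2f}  roots = {real_roots(b, c)}")
```

With $b$ fixed, the two roots approach $x_0 = -b/2$ as $c$ tends to $b^2/4$ from below and disappear once $c$ exceeds that value.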
For example, if

$$\partial_\mu f(x_0, \mu_0) \neq 0,$$

it is possible to solve with respect to $\mu$ and obtain a regular solution $\mu(x)$ such that $\mu(x_0) = \mu_0$ and $f(x, \mu(x)) \equiv 0$. In addition, if the second order derivative

$$\partial_x^2 f(x_0, \mu_0) \neq 0$$

the picture of the solution set in the plane $(\mu, x)$, also called bifurcation diagram, shows a turning point with a fold opened to the left or to the right depending upon the sign of the product $\partial_\mu f(x_0, \mu_0)\, \partial_x^2 f(x_0, \mu_0)$; see Figure 1. Notice that here the bifurcation point $(\mu_0, x_0) \in \mathbb{R}^2$ corresponds to the appearance of a pair of solutions of [4] from nowhere. This is the simplest example of a one-sided bifurcation in which the bifurcating solutions exist for either $\mu > \mu_0$ or $\mu < \mu_0$.

A particularly interesting situation arises when the equation possesses a symmetry. For example, assume that in [4] the function $f$ is odd with respect to $x$. This implies that we always have the solution $x = 0$, for any value of the parameter $\mu$. Assume now that $f$ satisfies

$$\partial_x f(0, \mu_0) = 0 \qquad [6]$$

and that

$$\partial^2_{x\mu} f(0, \mu_0) \neq 0, \qquad \partial_x^3 f(0, \mu_0) \neq 0 \qquad [7]$$

Then the point $(\mu_0, 0)$ is a pitchfork bifurcation point, this denomination being related to the bifurcation diagram in the plane $(\mu, x)$; see Figure 2. Notice that here, the bifurcation point $(\mu_0, x_0) \in \mathbb{R}^2$ corresponds to the bifurcation from the origin of a pair of solutions exchanged by the symmetry $x \to -x$, in addition to the persistent trivial solution $x = 0$, which is invariant under the above symmetry. Such a bifurcation is also referred to as a symmetry-breaking bifurcation. Similar bifurcation diagrams are found when the equation [4] has a known branch of solutions $x(\mu)$ for $\mu$ close to $\mu_0$. This situation arises often in applications where usually this branch consists of trivial solutions $x(\mu) = 0$. Then at a bifurcation point $(\mu_0, x_0)$ a second branch of solutions appears forming either a one-sided bifurcation, or a two-sided bifurcation; see Figure 3.

We can now view $f$ as a vector field in the ordinary differential equation

$$\frac{dx}{dt} = f(x, \mu) \qquad [8]$$

and the study above corresponds to looking for equilibrium solutions of [8]. The stability of such a solution is determined by the sign of the derivative $\partial_x f(x, \mu)$ of $f$ at this equilibrium, and it is closely related to the type of the static bifurcation.

In the case of a turning point bifurcation, when $\partial_x^2 f(x_0, \mu_0) \neq 0$, the sign of $\partial_x f(x, \mu)$ is different for the two bifurcating solutions. This means that one solution is attracting (i.e., stable), the other one being repelling (i.e., unstable); see Figure 1. In the case of a pitchfork bifurcation as above, the stability of the trivial solution $x = 0$ changes when $\mu$ crosses $\mu_0$, and the stability of both bifurcating nonzero solutions is the opposite from the stability of the origin on the side of the bifurcation. The bifurcation

Figure 2 Supercritical pitchfork bifurcation in the case $\partial^2_{x\mu} f(0, \mu_0) > 0$ and $\partial_x^3 f(0, \mu_0) < 0$. The solid (dashed) lines indicate the branch of stable (unstable) solutions in the differential equation.
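The link between the pitchfork bifurcation and the stability of equilibria of [8] can be sketched numerically. The example below assumes the hypothetical odd vector field $f(x, \mu) = \mu x - x^3$, which is not taken from the article but satisfies the supercritical sign conditions ($\partial^2_{x\mu} f = 1 > 0$, $\partial_x^3 f = -6 < 0$) at $\mu_0 = 0$. Stability is read off from the sign of $\partial_x f$ at each equilibrium.

```python
import math

def dfdx(x, mu):
    """Derivative in x of the assumed model field f(x, mu) = mu*x - x**3."""
    return mu - 3.0 * x ** 2

def equilibria(mu):
    """Real solutions of mu*x - x**3 = 0."""
    if mu > 0:
        r = math.sqrt(mu)
        return [-r, 0.0, r]          # trivial branch plus the symmetric pair
    return [0.0]                     # only the trivial solution

def classify(mu):
    """Label each equilibrium by the sign of df/dx evaluated there."""
    labels = []
    for x in equilibria(mu):
        d = dfdx(x, mu)
        if d < 0:
            labels.append((x, "stable"))
        elif d > 0:
            labels.append((x, "unstable"))
        else:
            labels.append((x, "degenerate"))   # linearization is inconclusive
    return labels

for mu in (-1.0, 0.0, 1.0):
    print(mu, classify(mu))
```

For this model the origin is stable for $\mu < 0$ and unstable for $\mu > 0$, while the pair $x = \pm\sqrt{\mu}$ that appears for $\mu > 0$ is stable, in line with the exchange of stability described above.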
that $F(0, 0) = 0$, or, in other words, that one solution is known. The equation can be then written as

$$LX + G(X, \mu) = 0$$

in which $L = D_X F(0, 0)$ represents the differential of $F$ with respect to $X$ at $(0, 0)$, and is assumed to have a closed range. The implicit function theorem shows absence of bifurcation if $L$ has a bounded inverse, so that bifurcations are related to the existence of a nontrivial kernel of $L$. The Liapunov-Schmidt reduction then goes as follows.

Let $N(L)$ and $R(L)$ denote the kernel and the range of $L$, respectively, and consider continuous projections $P : \mathcal{X} \to N(L)$ and $Q : \mathcal{Y} \to R(L)$. Then there exists a bounded linear operator $B : R(L) \to (\mathrm{id} - P)\mathcal{X}$, the right inverse of $L$, satisfying $LB = \mathrm{id}$ on $R(L)$ and $BL = \mathrm{id} - P$ on $\mathcal{X}$. For $X \in \mathcal{X}$ one may write

$$X = X_0 + X_1, \qquad X_0 = PX, \qquad X_1 = (\mathrm{id} - P)X$$

and then by projecting with $\mathrm{id} - Q$ and $Q$ the equation becomes

$$(\mathrm{id} - Q)\, G(X_0 + X_1, \mu) = 0$$
$$X_1 + BQ\, G(X_0 + X_1, \mu) = 0$$

The implicit function theorem allows one to solve the second equation for $X_1 = \psi(X_0, \mu)$ in a neighborhood of the origin. Substitution into the first equation leads to the equation in $(\mathrm{id} - Q)\mathcal{Y}$ for $X_0$ in $P\mathcal{X}$,

$$(\mathrm{id} - Q)\, G(X_0 + \psi(X_0, \mu), \mu) = 0$$

also called bifurcation equation. This equation completely describes the set of solutions to [1] in a neighborhood of $(0, 0)$, and this problem is then posed in a space of dimension much smaller than the dimension of $\mathcal{X}$.

The basic principle of the Liapunov-Schmidt method was discovered and used independently by different authors. E Schmidt (1908) used this method for integral equations, while Liapunov used it to study the stability of the zero solution of nonlinear partial differential equations when the linear part has zero eigenvalues (1947), and later in 1960 for the bifurcation problem studied by Poincaré (1885). Working in a Banach space of $t$-periodic functions, the Liapunov-Schmidt method may be used to solve the Hopf bifurcation problem, as did Hopf himself in 1948.

The analog of this reduction procedure for the differential equation [3] is the center manifold reduction. Assuming that $F(0, 0) = 0$, we obtain the differential equation

$$\frac{dX}{dt} = LX + G(X, \mu)$$

Since dynamic bifurcations are related to the existence of purely imaginary spectral values of $L$, the kernel of $L$ alone is not enough to describe this situation. One has to consider the spectral space $\mathcal{Y}_c$ of $L$ associated to the purely imaginary spectrum of $L$. A spectral gap is needed between this part of the spectrum and the rest (always true in finite dimensions), so that the spectral projection $P$ onto $\mathcal{Y}_c$ is well defined. One writes

$$X = X_c + X_h, \qquad X_c = PX, \qquad X_h = (\mathrm{id} - P)X$$

and obtains the decomposed system

$$\frac{dX_c}{dt} = LX_c + P\, G(X_c + X_h, \mu),$$
$$\frac{dX_h}{dt} = LX_h + (\mathrm{id} - P)\, G(X_c + X_h, \mu)$$

The reduction procedure works provided the nonhomogeneous linear equation

$$\frac{dX_h}{dt} = LX_h + f(t)$$

possesses a unique solution in suitably chosen function spaces with weak exponential growth, such that one can then solve the second equation for $X_h = \phi(X_c)$ in a neighborhood of the origin in these function spaces. This property is always true in finite dimensions, but it has to be checked in infinite dimensions. Different results showing the solvability of this equation are available in both Banach and Hilbert spaces, relying upon additional conditions on the spectrum of $L$, decaying properties of the resolvent of $L$ on the imaginary axis, and regularity properties of the nonlinearity $G$. The map $\phi$ is then used to construct a map $\Psi : P\mathcal{X} \times \mathcal{M} \to (\mathrm{id} - P)\mathcal{X}$, defined in a neighborhood of the origin, which parametrizes a local center manifold invariant under the flow of the equation. The flow on this center manifold is governed by the reduced equation in $\mathcal{Y}_c$,

$$\frac{dX_c}{dt} = LX_c + P\, G(X_c + \Psi(X_c, \mu), \mu)$$

which completely describes the bifurcation problem. The first proofs of this result were given in finite dimensions by Pliss (1964) and Kelley (1967). Center manifolds in infinite dimensions have been studied in different settings determined by assumptions on the linear part $L$ and the nonlinear part $G$. One typical assumption in infinite dimensions is that the spectrum of $L$ contains only a finite number of purely imaginary eigenvalues, so that the reduced equation above is a differential equation in a finite-dimensional space.

These reduction methods work for a large class of problems and the advantage of such an approach is that one is left with a bifurcation problem in a lower-dimensional space. The methods involved in
solving this reduced bifurcation problem can be very different from one problem to another, and often make use of some additional structure in the problem, such as a gradient-like structure, Hamiltonian structure, or the presence of symmetries, which are preserved by the reduction procedure.

A powerful tool for the analysis of these reduced differential equations is provided by the normal form theory, which goes back to works of Poincaré (1885) and Birkhoff (1927). The idea is to use coordinate transformations to make the expression of the vector field as simple as possible. The transformed vector field is called normal form. There is an extensive literature on normal forms for vector fields in many different contexts, in both finite- and infinite-dimensional cases. Typically the classes of normal forms are characterized in terms of the linear part of the differential equation.

For differential equations of the form

$$\frac{dx}{dt} = Lx + g(x, \mu) \qquad [9]$$

in which $L$ is a matrix and $g$ a sufficiently regular map such that $g(0, 0) = 0$, $D_x g(0, 0) = 0$, as encountered in bifurcation theory, one possible characterization of normal forms makes use of the adjoint matrix $L^*$. Fixing any order $k \geq 2$, there exist polynomials $\Phi$ and $N$ of degree $k$ in $x$ with coefficients which are regular functions of $\mu$, and $\Phi(0, 0) = N(0, 0) = 0$, $D_x \Phi(0, 0) = D_x N(0, 0) = 0$, such that by the change of variables

$$x = y + \Phi(y, \mu)$$

the equation [9] is transformed into the normal form

$$\frac{dy}{dt} = Ly + N(y, \mu) + o(\|y\|^k) \qquad [10]$$

in which the polynomial $N$ is characterized through

$$N(e^{tL^*} y, \mu) = e^{tL^*} N(y, \mu)$$

for all $y$, $\mu$, and $t$, or, equivalently,

$$D_y N(y, \mu)\, L^* y = L^* N(y, \mu)$$

for all $y$ and $\mu$. This characterization allows one to determine the classes of possible normal forms for a given matrix $L$, and also provides an efficient way to compute the normal form for a given vector field $g$. As for the reduction methods, normal form transformations can be made to preserve the additional structure of the problem, such as Hamiltonian structure or symmetries.

As an example, consider a differential equation of the form [9] with $x \in \mathbb{R}^n$ and $\mu \in \mathbb{R}$, which supports a Hopf bifurcation so that $L$ has simple eigenvalues $\pm i\omega$, $\omega > 0$, and no other eigenvalues with zero real part. The center manifold reduction provides a two-dimensional reduced system with linear part having the simple eigenvalues $\pm i\omega$, for which it is convenient to write the normal form in complex variables

$$\frac{dA}{dt} = i\omega A + A\, Q(|A|^2, \mu) + o(|A|^{2k+2})$$

for $A(t) \in \mathbb{C}$, where $Q$ is a complex polynomial of degree $k$ in $|A|^2$ with $Q(0, 0) = 0$, or, equivalently, in polar coordinates $A = r e^{i\theta}$,

$$\frac{dr}{dt} = r\, Q_r(r^2, \mu) + o(r^{2k+2})$$
$$\frac{d\theta}{dt} = \omega + Q_i(r^2, \mu) + o(r^{2k+1})$$

$Q_r$ and $Q_i$ being the real and imaginary part of $Q$, respectively. The radial equation for $r$ truncated at order $2k + 1$ decouples and admits a pitchfork bifurcation. The bifurcating steady solutions of this equation then lead first to periodic solutions for the truncated system, which are then shown to persist for the full equation by a standard perturbation analysis.

A situation that occurs in a large class of problems is when the problem possesses a reversibility symmetry, which often comes from some reflection invariance in the physical space, that is, when the vector field $F(\cdot, \mu)$ anticommutes with a symmetry operator $S$. One of the simplest examples is the case of a differential equation [9] when the matrix $L$ has a double eigenvalue in 0, no other eigenvalues with zero real part, and a one-dimensional kernel which is invariant by $S$. In this case, the center manifold reduction provides a two-dimensional reduced reversible system, which can be put in the normal form

$$\frac{da}{dt} = b$$
$$\frac{db}{dt} = \mu - a^2 + o\big((|a| + |b|)^3\big)$$

which anticommutes with the symmetry $(a, b) \mapsto (a, -b)$. The above system undergoes a reversible Takens-Bogdanov bifurcation and has for $\mu > 0$ a phase portrait as in Figure 5. There are two equilibria, one a saddle, the other a center, and a family of periodic orbits with the zero-amplitude limit at the center equilibrium, and the infinite-period limit a homoclinic orbit, originating at the saddle point. In concrete problems the bounded orbits of such a reduced system determine the shape of physically interesting solutions of the full system of equations, such as, for example, in water-wave theory where solitary and periodic waves correspond to homoclinic and periodic orbits, respectively.
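The pitchfork structure of the truncated radial equation in the Hopf normal form can be checked with a short numerical experiment. The sketch below assumes the simplest cubic truncation with hypothetical coefficients, $dr/dt = r(\mu - r^2)$; the real part of $Q$ in a concrete problem would have problem-dependent coefficients. It integrates the radial equation with a classical fourth-order Runge-Kutta scheme.

```python
import math

def radial_rhs(r, mu):
    """Assumed truncated radial equation dr/dt = r*(mu - r**2)."""
    return r * (mu - r * r)

def integrate(r0, mu, t_end=50.0, dt=0.01):
    """Integrate the radial equation from r(0) = r0 with classical RK4."""
    r = r0
    for _ in range(int(round(t_end / dt))):
        k1 = radial_rhs(r, mu)
        k2 = radial_rhs(r + 0.5 * dt * k1, mu)
        k3 = radial_rhs(r + 0.5 * dt * k2, mu)
        k4 = radial_rhs(r + dt * k3, mu)
        r += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return r

for mu in (-0.25, 0.25):
    r_final = integrate(0.1, mu)
    print(f"mu = {mu:+.2f}: r(50) = {r_final:.6f}")
```

For $\mu < 0$ the amplitude decays to zero, while for $\mu > 0$ it settles on the nonzero steady state $r = \sqrt{\mu}$ of the radial equation, which corresponds to a periodic orbit of the truncated two-dimensional system.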
Ize J and Vignoli A (2003) Equivariant Degree Theory. de Gruyter Series in Nonlinear Analysis and Applications, vol. 8. Berlin: de Gruyter and Co.
Kielhöfer H (2004) Bifurcation Theory. An Introduction with Applications to PDEs, Applied Mathematical Sciences, vol. 156. New York: Springer.
Kuznetsov YA (2004) Elements of Applied Bifurcation Theory, 3rd edn. Applied Mathematical Sciences, vol. 112. New York: Springer.
Ruelle D (1989) Elements of Differentiable Dynamics and Bifurcation Theory. Boston MA: Academic Press.
Vanderbauwhede A (1989) Centre Manifolds, Normal Forms and Elementary Bifurcations. Dynamics Reported, Dynam. Report. Ser. Dynam. Systems Appl., vol. 2, pp. 89-169. Chichester: Wiley.
Vanderbauwhede A and Iooss G (1992) Center Manifold Theory in Infinite Dimensions. Dynamics Reported: Expositions in Dynamical Systems, vol. 1, pp. 125-163. Berlin: Springer.
Figure 1 The Taylor-Couette problem with the Taylor vortices.
Figure 3 Poiseuille flow with the trivial solution.
282 Bifurcations in Fluid Dynamics
Figure 4 The inclined-plane problem. The trivial Nusselt solution possesses a flat top surface and a parabolic flow profile.

Kolmogorov flow consists in finding the flow of a viscous incompressible fluid under the action of an external force parallel to the flow direction $x$ and varying periodically in the perpendicular $y$-direction. This gedankenexperiment was designed by Kolmogorov in 1958 as a simplified model for the Poiseuille flow problem in order to study the nature of turbulence. The trivial solution, which is called Kolmogorov flow, can become unstable via a long-wave instability along the flow direction.

The inclined-plane problem consists in finding the flow of a viscous liquid running down an inclined plane, cf. Figure 4. The trivial solution, the so-called Nusselt solution, becomes sideband-unstable if the inclination angle is increased. Then the dynamics is dominated by traveling pulse trains, although the individual pulses are unstable due to the long-wave instability of the flat surface. Time series taken from the motion of the individual pulses indicate the occurrence of chaos directly at the onset of instability.

There are other famous hydrodynamical stability problems, with arbitrarily complicated bifurcation scenarios.

Spectral Analysis of the Trivial Solution

All classical hydrodynamical stability problems are described by the Navier-Stokes equations

$$\partial_t U = \nu \Delta U - \nabla p - (U \cdot \nabla) U + f, \qquad 0 = \nabla \cdot U \qquad [1]$$

where $U = U(x, t) \in \mathbb{R}^d$ with $d = 2, 3$ is the velocity field, $p = p(x, t) \in \mathbb{R}$ the pressure field, $f$ some external forcing, and $\nu$ the dynamic viscosity. These equations are completed with boundary conditions. In case of Bénard's problem, the Navier-Stokes equations are coupled to a nonlinear heat equation.

By projecting $U$ onto the space of divergence-free vector fields and by taking the trivial solution as new origin, all problems from the previous section can be written as an evolutionary system

$$\partial_t U = \Lambda U + N(U)$$

where $U = 0$ corresponds to the trivial solution, where $\Lambda$ is a linear and $N(U) = O(\|U\|^2)$ for $U \to 0$ a nonlinear operator. Most of the examples from the previous section are semilinear, that is, from a functional analytic point of view, the nonlinear operator $N$ can be controlled in terms of the linear operator $\Lambda$.

Since the form of the bifurcating pattern is only slightly influenced by far away boundaries, that is, for instance, the upper and lower end of the rotating cylinders in the Taylor-Couette problem, the problems are considered from a theoretical point of view in unbounded domains, $\Omega = \mathbb{R}^d \times \Sigma$, with $\Sigma \subset \mathbb{R}^m$ the bounded cross section; that is, for instance, the Taylor-Couette problem is considered with two cylinders of infinite length. Then the eigenfunctions of the linear operator $\Lambda$ are given by Fourier modes, that is,

$$\Lambda\big(e^{ik \cdot x} \varphi_{k,n}(z)\big) = \lambda_n(k)\, e^{ik \cdot x} \varphi_{k,n}(z)$$

with $x \in \mathbb{R}^d$, $k \in \mathbb{R}^d$, $k \cdot x = \sum_{j=1}^{d} k_j x_j$, $z \in \Sigma$, $n \in \mathbb{N}$. If an external control parameter is changed so that the trivial solution becomes unstable, then, independent of the underlying physical problem, the surface $k \mapsto \mathrm{Re}\,\lambda_1(k)$ intersects the plane $\{\mathrm{Re}\,\lambda_1(k) = 0\}$. Generically, this happens first at a nonzero wave vector $k_c \neq 0$ (cf. Figure 5).

Examples for such an instability are the Taylor-Couette problem, Bénard's problem, or Poiseuille flow. Very often, due to some conserved quantity in the problem, we have $\mathrm{Re}\,\lambda_1(0) = 0$ for all values of the bifurcation parameter. Then, a so-called sideband instability can occur, cf. Figure 6.

Examples for such an instability are the Kolmogorov flow problem or the inclined plane problem. According to some symmetries in the problem, for instance, reflection along the cylinders in the Taylor-Couette problem or rotational symmetry in Bénard's problem, the curves in Figure 5 are double or rotational symmetric.

In case of $\Omega$ being spherically symmetric, we have

$$\Lambda\big(f_l(r)\, \varphi_{l,n}(z)\big) = \lambda_n(l)\, f_l(r)\, \varphi_{l,n}(z)$$

Figure 5 Real part of the spectrum in case of an instability at a wave number $k_c \neq 0$ (horizontal axis $k$; the remaining curves are labeled "Rest of spectrum"). Definition of the small bifurcation parameter $\varepsilon$.
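To make the picture of an instability at a nonzero wave number concrete, the following sketch uses a hypothetical model growth rate $\mathrm{Re}\,\lambda_1(k) = \varepsilon - (k^2 - k_c^2)^2$. This is a Swift-Hohenberg-type curve chosen purely for illustration and is not derived from the Navier-Stokes problems discussed here. The code scans a grid of wave numbers and reports the band where the growth rate is positive.

```python
def growth_rate(k, eps, kc=1.0):
    """Assumed model for Re(lambda_1(k)): maximal at k = kc, where it equals eps."""
    return eps - (k * k - kc * kc) ** 2

def unstable_band(eps, kc=1.0, kmax=3.0, n=3001):
    """Grid wave numbers in [0, kmax] at which the model growth rate is positive."""
    ks = [kmax * i / (n - 1) for i in range(n)]
    return [k for k in ks if growth_rate(k, eps, kc) > 0.0]

for eps in (-0.1, 0.01):
    band = unstable_band(eps)
    if band:
        print(f"eps = {eps:+.2f}: unstable for {min(band):.3f} <= k <= {max(band):.3f}")
    else:
        print(f"eps = {eps:+.2f}: all modes damped")
```

For $\varepsilon < 0$ every mode is damped. For small $\varepsilon > 0$ only a narrow band of wave numbers around $k_c$ becomes unstable, which is the situation sketched in Figure 5.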
Figure 7 Generically, a simple real eigenvalue or a pair of complex-conjugate eigenvalues cross the imaginary axis.

Figure 9 The dynamics of the Landau equation. Except for the origin, which corresponds to the Couette flow, all solutions converge towards the circle of fixed points, which corresponds to the family of Taylor vortices. The translation invariance of the Taylor-Couette problem is reflected by the rotational symmetry of the reduced system.
multiple scaling analysis is possible, that is, that the modulation equations still depend on $\varepsilon$.

Figure 11 Spectrum for the flow around an obstacle (axes Re and Im; continuous spectrum and discrete eigenvalues are marked).

Discussion

There is no satisfactory bifurcation analysis for situations where boundary layers play a role. The simplest problem is the flow around some obstacle. The difficulties are due to the fact that, because of the unbounded flow region, there is always continuous spectrum up to the imaginary axis. Discrete eigenvalues are created by the localized obstacle (cf. Figure 11). In such a situation, so far there is no mathematical bifurcation theory available.

See also: Bifurcation Theory; Dynamical Systems in Mathematical Physics: An Illustration from Water Waves; Leray-Schauder Theory and Mapping Degree; Multiscale Approaches; Newtonian Fluids and Thermohydraulics; Symmetry and Symmetry Breaking in Dynamical Systems; Turbulence Theories; Variational Methods in Turbulence.

Further Reading

Chandrasekhar S (1961) Hydrodynamic and Hydromagnetic Stability. Oxford: Clarendon.
Chang H-C and Demekhin EA (2002) Complex Wave Dynamics on Thin Films, Studies in Interface Science, vol. 14. Amsterdam: Elsevier.
Chossat P and Iooss G (1994) The Taylor-Couette Problem, Applied Mathematical Sciences, vol. 102. Springer.
Chow S-N and Hale J (1982) Methods of Bifurcation Theory, Grundlehren der Mathematischen Wissenschaften, vol. 251. Berlin: Springer.
Golubitsky M and Schaeffer DG (1985) Singularities and Groups in Bifurcation Theory I, Applied Mathematical Sciences, vol. 51. Berlin: Springer.
Golubitsky M, Stewart I, and Schaeffer DG (1988) Singularities and Groups in Bifurcation Theory II, Applied Mathematical Sciences, vol. 69. Berlin: Springer.
Haken H (1987) Advanced Synergetics. Berlin: Springer.
Henry D (1981) Geometric Theory of Semilinear Parabolic Equations, Lecture Notes in Mathematics, vol. 840. Berlin: Springer.
Mielke A (2002) The Ginzburg-Landau equation in its role as a modulation equation. In: Fiedler B (ed.) Handbook of Dynamical Systems II, pp. 759-834. Amsterdam: North-Holland.
Ruelle D and Takens F (1971) On the nature of turbulence. Communications in Mathematical Physics 20: 167-192.
Temam R (1988) Infinite-Dimensional Systems in Mechanics and Physics. Berlin: Springer.
The term oscillator has two meanings. A conservative oscillator is a plane vector field which displays an open set of periodic orbits. It is said to be isochronous if all orbits have the same period. A dissipative oscillator is a planar vector field which displays an attractive limit cycle (attractive periodic orbit).

We consider N dissipative oscillators:

$$\frac{dx_i}{dt} = f(x_i, y_i), \qquad \frac{dy_i}{dt} = g(x_i, y_i) \qquad [1]$$

where $i = 1, \ldots, m$.

The dynamical system obtained by considering the space of all the variables $(x_i, y_i)$, $i = 1, \ldots, m$, displays an invariant torus full of periodic orbits that we denote by $T^m(0)$.

Assume now that the N oscillators are weakly coupled:

$$\frac{dx_i}{dt} = f(x_i, y_i) + \epsilon F_i(x, y, \epsilon), \qquad \frac{dy_i}{dt} = g(x_i, y_i) + \epsilon G_i(x, y, \epsilon) \qquad [2]$$

where $\epsilon$ can be considered as small as we wish.

Definition The system [2] has a frequency locking if it displays a family of stable periodic orbits $\gamma_\epsilon$ for all values of $\epsilon$ small enough which tends to (in the sense of Hausdorff's topology) a periodic orbit of [1] contained in the periodic torus $T^m(0)$.

Assume now that [2] has a frequency locking associated with the periodic orbit $\gamma_\epsilon(t)$. Consider the projections $\gamma_\epsilon^i(t)$ of $\gamma_\epsilon(t)$ on the coordinate planes $(x_i, y_i)$, $i = 1, \ldots, m$. Assume that $\epsilon$ is small enough so that the projection belongs to the open set $S_i$ on which the amplitude-phase coordinates of the system [1] are defined. We can write the system [2], restricted to the open set $S = \prod_{i=1}^{m} S_i$, as

As the attractive singular points are structurally stable, it is enough to assume that the system

$$\frac{d\theta_i}{dt} = F_i(0, \theta, 0) \qquad [5]$$

displays an attractive singular point.

Periodic Orbits of Linear Systems

Consider the linear system

$$\frac{dx}{dt} = P(t)\,x + q(t) \qquad [6]$$

where $P$ is a continuous T-periodic matrix function and $q$ is a continuous T-periodic vector function, $x = (x_1, \ldots, x_n)$. Consider also the two associated homogeneous equations:

$$\frac{dx}{dt} = P(t)\,x \qquad [7a]$$

$$\frac{dx}{dt} = -P^*(t)\,x \qquad [7b]$$

where $P^*$ denotes the transpose of $P$.

The set of T-periodic solutions of [7b] is a vector space; $m$ denotes its dimension. Let $U_j(t)$, $j = 1, \ldots, m$, be a basis of this vector space. This basis is completed by adding $n - m$ solutions $U_j(t)$, $j = m + 1, \ldots, n$, to obtain a basis of $\mathbb{R}^n$. Let $U(t)$ be the matrix whose columns are these vectors; denote by $U_{ij}(t)$ the elements of this matrix.

With the change of variable $x = U^*(0)^{-1} y$, system [6] gets transformed into

$$\frac{dy}{dt} = Q(t)\,y + r(t) \qquad [8]$$

with $Q(t) = U^*(0) P(t) U^*(0)^{-1}$ and $r(t) = U^*(0) q(t)$. The matrix $V(t) = U^{-1}(0) U(t)$ is such that

$$\frac{dV}{dt} = -Q^*(t)\,V, \qquad V(0) = I$$
Bifurcations of Periodic Orbits 287
and the first $m$ column vectors of $V(t)$, denoted as $V_j(t)$, $j = 1, \ldots, m$, are T-periodic.

Let $X(t)$ be the fundamental solution defined by

$$\frac{dX}{dt} = Q(t)\,X, \qquad X(0) = I$$

then,

$$X^{-1}(t) = V^*(t)$$

The solution of [8] can be written as

$$y(t) = X(t)\,y(0) + X(t) \int_0^t X^{-1}(u)\, r(u)\, du \qquad [9]$$

This yields that T-periodic solutions of [8] have initial data $y(0)$ given by

$$\big(V^*(T) - I\big)\, y(0) = \int_0^T V^*(s)\, r(s)\, ds \qquad [10]$$

Conversely, given a solution $y(0)$ of [10], T-periodicity of $P$ and $q$ and uniqueness of solutions of a differential equation imply that $y(0)$ represents the initial data of a T-periodic solution of [8]. Hence, the T-periodic solutions of [8] are in one-to-one correspondence with the affine space defined by the solutions of [10]. The first $m$ rows of $V^*(T) - I$ are zero and its rank is exactly $n - m$. In the following, assume that the determinant formed by the $(n - m)$ last rows and last columns of $(V^*(T) - I)$ is not zero.

A necessary and sufficient condition so that [8] displays a T-periodic solution is

$$\int_0^T \sum_{j=1}^{n} V_{jk}(u)\, r_j(u)\, du = 0, \qquad k = 1, \ldots, m \qquad [11a]$$

$$\sum_{j=m+1}^{n} \big(V_{jk}(T) - \delta_{jk}\big)\, y_j(0) = \sum_{j=1}^{n} \int_0^T V_{jk}(s)\, r_j(s)\, ds, \qquad m + 1 \leq k \leq n \qquad [11b]$$

This yields the Fredholm alternative: if the $m$ conditions

$$\sum_{j=1}^{n} \int_0^T U_{jk}(s)\, q_j(s)\, ds = 0, \qquad k = 1, \ldots, m \qquad [12]$$

are satisfied, then [6] displays a family $x_\alpha(t)$ of T-periodic solutions depending on $m$ parameters $\alpha = (\alpha_1, \ldots, \alpha_m)$:

$$x_\alpha(t) = \alpha_1 \varphi_1(t) + \cdots + \alpha_m \varphi_m(t) + \bar{x}(t) \qquad [13]$$

where $\bar{x}(t)$ is a particular T-periodic solution and $\varphi_j(t)$ denote T-periodic independent solutions of [7a]. To be more specific, one can choose $\bar{x}(t)$ to be the unique solution of [6] such that $y(0)_k = 0$, $k = m + 1, \ldots, n$, and $\varphi_j(t)$ solutions of [7a], such that $y(0)_k = \delta_{jk}$. With these notations, $x_\alpha(t)$ is such that

$$y(0)_k = \alpha_k, \qquad k = 1, \ldots, m$$

and its other initial conditions $y(0)_k = \beta_k$, $k = m + 1, \ldots, n$, are fixed:

$$\beta_k = \beta_k^0$$

Malkin's Theorem for Quasilinear Systems

Consider now nonlinear systems with the perturbation:

$$\frac{dx}{dt} = P(t)\,x + q(t) + \epsilon f(x, t, \epsilon) \qquad [14]$$

where $f$ is $C^1$ and T-periodic in $t$.

Assume that the solutions $y(t, y(0), \epsilon)$ of [14] exist for all values of $t$, $0 \leq t \leq T$. The solutions define a differentiable function of their initial data $y(0)$. This is, for instance, true for perturbations of linear systems if $\epsilon$ is small enough.

Assume that $q$ satisfies the condition [12] and that there is a solution

$$\alpha^0 = (\alpha_1^0, \ldots, \alpha_m^0)$$

to the equations

$$\Xi_k(\alpha) = \sum_{j=1}^{n} \int_0^T U_{jk}(u)\, f_j(x_\alpha(u), u, 0)\, du = 0, \qquad k = 1, \ldots, m \qquad [15a]$$

so that

$$\frac{\partial \Xi_k}{\partial \alpha_j}\bigg|_{\alpha^0}, \qquad k = 1, \ldots, m;\ j = 1, \ldots, m \qquad [15b]$$

is invertible.

Proceed as in the previous section with the coordinate change $x = U^*(0)^{-1} y$. Equation [14] gets transformed into

$$\frac{dy}{dt} = Q(t)\,y + r(t) + \epsilon F(y, t, \epsilon) \qquad [16]$$

with $F = U^*(0)\, f(U^*(0)^{-1} y, t, \epsilon)$.

Solutions of [16] are uniquely determined by their initial data. We can understand the parameters $(\alpha, \beta)$ as coordinates on the space of solutions. With this viewpoint, for instance, the set of T-periodic solutions of [6] is an affine space of dimension $m$
288 Bifurcations of Periodic Orbits
given by the equations = 0 and is parametrized by displays an m-parameter family x (t) of T-periodic
the coordinates . In this space, we pick up a point orbits.
(which corresponds to a particular T-periodic solu- Assume that the solutions y(t, y(0), ) exist for all
tion of [6]): ( = 0 ). T-periodic solutions of [16] are 0 t T and define a differentiable mapping of the
in one-to-one correspondence with the solutions of initial data y(0). This is, for instance, the case if we
Xn Z T assume that the nonperturbed equation defines a
Ck ; ; Vjk sFj ys; ; ; ; s; ds 0; flow and if is small enough.
j1 0 Assume also that the different solutions x (t) are
k 1; . . . ; m 17a independent in the sense that the mapping
X 7! x t
Ck ; ; Vjk T I j
jm1;...;n is an immersion for any t. In other words, the m
n Z
X T vectors dx (t)=dj are independent.
Vjk srj s ds We linearize the solution along the family of
j1 0
periodic orbits:
n Z
X T
Vjk sFj ys; ; ; ; s;ds 0;
0 x x t
23
j1
Dfx x t;t
gx t;t; 0 F
;t; 24
[14] in this way: dt
X
m
Set, furthermore,
y0U 0 x0; x0 j j 0 x
0 18
j1
Pt Dfx x t; t; rt gx t; t; 0
Consider the determinant of the Jacobian matrix
and denote U(t) the fundamental solution of [7b]
of the mapping
described earlier.
; 7! C; ; 19
Theorem Assume that there is a solution
0
for = , k = k0 ,
k = m 1, . . . , n , = 0. This is
equal to the product of and the determinant of 01 ; . . . ; 0m
@ k
j 0 20 of the m equations:
@j
which is nonzero. n Z
X T
The implicit-function theorem shows that the k Ujk ugj x u; u; 0 du 0;
j1 0
differential equation [14] (and thus [16] as well)
has, for small enough, a unique T-periodic solution
k 1; . . . ; m 25a
which tends to x0 when tends to 0.
such that
Note first that the m conditions [25a] imply that Then, the solutions
(t) depend linearly on . We thus
the m equations, obtain that a priori p () are quadratic functions of :
d
p 1 ; . . . ; m
Dfx x0 t; t
gx0 t; t; 0 Z
dt 1X T
@ 2 fj @zk @zl
q r Ujp ds
display a family of T-periodic solutions which 2 qrkl 0 @zk @zl @q @r
depend on m parameters = (1 , . . . ,m ). From Z " !
X T
1 @ 2 fj @zk @zl
(13), one can write q Ujp
l
k
qkl 0 2 @zk @zl @q @q
t 1 1 t m m t
t 26 #
@gj @zk
is a particular T-periodic solution and ds 28
where
(t) @zk @q
the j (t) are independent T-periodic solutions
of (22a). where the dots represent quantities independent of .
We use then the expression
Lemma 1 A possible choice for the solutions j (t)
is @x (t)=@j j=0 . 2
d @ zj
We have already assumed that these vectors are dt @q @@r
independent. They are obviously T-periodic solu- X @ 2 fj @zk @zl X @fj @ 2 zk
tions to (22a).
In the following, we will assume that all other periodic kl
@zk @zl @q @r k
@zk @q @@r
solutions of (22a) are linear combinations of these.
As a consequence of what was proved in the This allows one to find the homogeneous quadratic
section on periodic orbits of linear systems, system part as
[24] displays a periodic solution (for small enough)
if there exists a solution XZ T
@ 2 fj @zk @zl
Ujp ds
0 jkl 0 @zk @zl @q @r
0
1 ; . . . ; m 2
XZ T d @ zj
Ujp s ds
to equations 0 ds @q @@r
j
n Z
X T XZ T @fj @ 2 zk
k Ujk sFj
s; s; 0 ds 0; Ujp s ds
j1 0 jk 0 @zk @q @r
k 1; . . . ; m
Integration by parts yields
such that XZ T
@ 2 fj @zk @zl
Ujp ds
@k jkl 0 @zk @zl @q @r
j 0; k 1; . . . m; j 1; . . . ; m
@j X Z T dUjp 2
@fj @ zk
Ujp s ds 0
is invertible. j 0 ds @z k @ q @r
2
and the coefficient dp XZ T @ fj @gj @zk
Ujp
ds
n Z
X T dq 0 @zk @zl l @zk @q
kl 0
p Ujp ugj x u; u; 0 du
j1 0
This achieves the proof of the theorem. In the special
We can write case of Hamiltonian systems, in the case of the
Z T perturbations of an isochronous system, the method
dp @Ujp @gj @zk explained is equivalent to Moser's averaging theory.
gj Ujp ds
dq 0 @q @zk @q The reader is referred to other articles in this
encyclopedia for a discussion of other aspects of
Note that
synchronization, frequency locking, and phase locking.
d
j X @fj
r gj zt; 0 ; 0 See also: Bifurcation Theory; Fractal Dimensions in
dt r
@zr
Dynamics; Integrable Systems: Overview; Isochronous
and we obtain Systems; Leray–Schauder Theory and Mapping Degree;
! Ljusternik–Schnirelman Theory; Singularity and
Z
dp T
@Ujpd
j X @fj Bifurcation Theory; Symmetry and Symmetry Breaking in
r Dynamical Systems; Synchronization of Chaos; Weakly
dq 0 @q ds r
@zr
Coupled Oscillators.
@gj @zk
Ujp ds
@zk @q
Further Reading
Integration by parts yields Hartman P (1964) Ordinary Differential Equations. New York:
Z T ! Wiley.
dp d @Ujp X @fj Malkin I (1952) Stability Theory of the Motion. Moscow–
j
r
dq 0 0 ds @q r
@zr Leningrad: Izdat. Gos.
Malkin I (1956) Some Problems in the Theory of Nonlinear
Z T
@gj @zk Oscillations. Gostekhisdat.
Ujp ds Moser J (1970) Regularization of Kepler's problem and the
0 @zk @q
averaging method on a manifold. Communications on Pure and
From the equation Applied Mathematics 23: 609–636.
Roseau M (1966) Vibrations non lineaires et theorie de la stabilite,
dUjp X @fk Springer Tracts in Natural Philosophy, vol. 8. Berlin: Springer.
U 0 Van der Pol B (1926) On relaxation-oscillations. Philosophical
dt @zj kp
k Magazine 3(7): 978–992.
Van der Pol B (1931) Oscillations sinusoïdales et de relaxation.
we deduce that L'Onde électrique 245–256.
X @fk @Ujp X @ 2 fk Van der Pol B and Van der Mark J (1927) Frequency
d @Ujp @zr demultiplication. Nature 120: 363–364.
Ukp
dt @q k
@zj @q k
@zj @zr @q Van der Pol B and Van der Mark J (1928) The heart beat
considered as a relaxation oscillation, and an electrical model
and thus this shows that of the heart. Philosophical Magazine 6(7): 763–775.
It was soon observed that the KdV equation can be seen as an infinite-dimensional Hamiltonian system with an infinite sequence of constants of motion in involution; the corresponding (commuting) vector fields are symmetries for the KdV equation, and form the so-called KdV hierarchy. In particular, Zakharov and Faddeev constructed action-angle variables for the KdV equation. These facts pointed out that the KdV equation is an infinite-dimensional analog of a classical integrable Hamiltonian system (Dubrovin et al. 2001), whose theory was developed during the nineteenth century by Liouville, Jacobi, and many others. Moreover, the infinite-dimensional case suggested methods (such as the existence of a Lax pair) which were also applied successfully to finite-dimensional cases such as the Toda lattices and the Calogero systems. More recently, after the discovery by Witten and Kontsevich of remarkable relations between the KdV hierarchy and matrix models of two-dimensional (2D) quantum gravity, there has been a renewed interest in the study of soliton equations in the community of theoretical physicists. We also mention that the classical versions of the extended $W_n$-algebras of 2D conformal field theory are the (second) Poisson structures of the Gelfand–Dickey hierarchies.

In this article we describe the so-called bi-Hamiltonian formulation of soliton equations. This approach to integrable systems springs from the observation, made by Magri at the end of the 1970s, that the KdV equation can be seen as a Hamiltonian system in two different ways. In the same circle of ideas, there were important works by Adler, Dorfman, Gelfand, Kupershmidt, Wilson, and many others. Thus, the concept of bi-Hamiltonian manifold, which constitutes the geometric setting for the study of bi-Hamiltonian systems, emerged. This notion and its applications to the theory of finite-dimensional integrable systems are discussed in Multi-Hamiltonian Systems.

In the first section of this article, we discuss the Hamiltonian form of soliton equations and, more generally, we present an important class of infinite-dimensional Poisson (also called Hamiltonian) structures, namely those of hydrodynamic type. Then we show how to use the bi-Hamiltonian properties of the KdV equation in order to construct its conserved quantities. We also recall that the KdV equation can be seen as an Euler equation on the dual of the Virasoro algebra. In the third section, we deal with other examples of integrable evolution equations admitting a bi-Hamiltonian representation, that is, the Boussinesq and the Camassa–Holm equations, and we consider the bi-Hamiltonian structures of hydrodynamic type.

Hamiltonian Methods in Soliton Theory

The most famous example of soliton equation is the KdV equation [1], where u is usually a periodic or rapidly decreasing real function. The choice of the coefficients in the equation has no special meaning, since they can be changed arbitrarily by rescaling x, t, and u. Right after the discovery of the inverse-scattering method for solving the Cauchy problem for the KdV equation, it was realized that this equation can be seen as an infinite-dimensional Hamiltonian system. Indeed, from a geometrical point of view, eqn [1] defines a vector field $X(u) = \frac{1}{4}(u_{xxx} - 6uu_x)$ on M, the infinite-dimensional vector space of $C^\infty$ functions from the unit circle $S^1$ to $\mathbb{R}$. (For the sake of simplicity, we consider only the periodic case; the integrals in this article are therefore understood to be taken on $S^1$.) The vector field X associated with the KdV equation is Hamiltonian, that is, it can be factorized as

\[ X(u) = -2\,\partial_x\left(-\tfrac{1}{8}\left(u_{xx} - 3u^2\right)\right) \]

where $dH = -\frac{1}{8}(u_{xx} - 3u^2)$ is the differential of the functional

\[ H(u) = \frac{1}{8}\int \left(u^3 + \tfrac{1}{2}u_x^2\right) dx \]

that is, the variational derivative $\delta h/\delta u$ of the density $h = \frac{1}{8}(u^3 + \frac{1}{2}u_x^2)$, and $P = -2\partial_x$ is a Poisson (or Hamiltonian) operator. This means that the corresponding composition law

\[ \{F, G\} = \int dF\, P\, dG\, dx = -2\int dF\,(dG)_x\, dx \qquad [2] \]

between functionals of u has the usual properties of the Poisson bracket, that is, it is $\mathbb{R}$-bilinear and skew-symmetric, and it fulfills the Leibniz rule and the Jacobi identity. In other words, (M, P) is an infinite-dimensional Poisson manifold. Using the Poisson bracket [2], eqn [1] can be written as

\[ u_t = \{u, H\} \qquad [3] \]

corresponding to the usual Hamilton equation in $\mathbb{R}^{2n}$

\[ \dot{z}_i = \{z_i, H\}, \quad i = 1, \dots, 2n \qquad [4] \]

up to the replacement of z with u, and of the discrete index i with the continuous index x. More precisely, in the expression $u_t = \{u, H\}$ the symbol u should be replaced by $u_x$ (in analogy with $z_i$), the functional assigning to the generic function $v \in M$ its value at a fixed point x, that is, $u_x : v \mapsto v(x)$. In
292 Bi-Hamiltonian Methods in Soliton Theory
these notations, the Poisson bracket [2] takes the form

\[ \{u_x, u_y\} = -2\,\delta'(x - y) \]

where the $\delta$-function is as usual defined as

\[ \int f(y)\,\delta(x - y)\, dy = f(x) \]

so that its derivatives are given by

\[ \int f(y)\,\delta^{(k)}(x - y)\, dy = f^{(k)}(x) \]

Another important example is given by the Boussinesq equation

\[ u_{tt} = -\tfrac{1}{3}\left(u_{xxxx} + 4u_x^2 + 4uu_{xx}\right) \qquad [5] \]

describing, like KdV, shallow water (soliton) waves in a nonlinear approximation. It can be obtained from the first-order (in time) system

\[ u^1_t = -\tfrac{2}{3}u^2u^2_x + u^1_{xx} - \tfrac{2}{3}u^2_{xxx}, \qquad u^2_t = 2u^1_x - u^2_{xx} \qquad [6] \]

by taking the derivative of its second equation with respect to t, plugging the result in the first one, and setting $u = u^2$. The system [6] is Hamiltonian, since it can be written as

\[ u^1_t = \left(\frac{\delta h}{\delta u^2}\right)_x, \qquad u^2_t = \left(\frac{\delta h}{\delta u^1}\right)_x \]

with $h = (u^1)^2 - \frac{1}{9}(u^2)^3 - u^1u^2_x + \frac{1}{3}(u^2_x)^2$, and

\[ \begin{pmatrix} 0 & \partial_x \\ \partial_x & 0 \end{pmatrix} \qquad [7] \]

is easily seen to be a Poisson operator. Thus, the Poisson manifold associated with the Boussinesq equation is the space of periodic $C^\infty$ functions with values in $\mathbb{R}^2$. More generally, one can consider the space $M^n$ of $C^\infty$ functions from the unit circle $S^1$ to $\mathbb{R}^n$. If $P^{ij}$, for $i, j = 1, \dots, n$, are the entries of a constant skew-symmetric matrix and $u^{i,x}$ assigns to the generic function $v \in M^n$ the value of its ith component at a fixed point x, then

\[ \{u^{i,x}, u^{j,y}\} = P^{ij}\,\delta(x - y) \]

defines a Poisson bracket on $M^n$. One can also let the $P^{ij}$ depend on the $u^k$ in such a way that they form the components of a Poisson tensor on $\mathbb{R}^n$. If $H = \int h\, dx$ is a functional on $M^n$ with density h, the associated Hamiltonian vector field gives rise to the following system of partial differential equations:

\[ u^i_t = \sum_{j=1}^{n} P^{ij}\,\frac{\delta h}{\delta u^j}, \quad i = 1, \dots, n \]

In particular, if $n = 2N$ and

\[ P^{ij} = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix} \]

then we have the Hamiltonian formulation of the field equations,

\[ q^i_t = \frac{\delta h}{\delta p_i}, \qquad p_{i\,t} = -\frac{\delta h}{\delta q^i}, \quad i = 1, \dots, N \]

Another important example of Poisson bracket on $M^n$ is given by

\[ \{u^{i,x}, u^{j,y}\} = g^{ij}\,\delta'(x - y) \qquad [8] \]

where $g^{ij}$ are the entries of a constant symmetric matrix. In this case, the Hamiltonian vector field associated with $H = \int h\, dx$ is given by

\[ u^i_t = \sum_{j=1}^{n} g^{ij}\,\partial_x \frac{\delta h}{\delta u^j}, \quad i = 1, \dots, n \qquad [9] \]

Notice that this vector field is zero if $H = \int u^k dx$, with $k = 1, \dots, n$. This amounts to saying that such an H is a Casimir function of the Poisson bracket [8], that is, that $\{H, F\} = 0$ for all functionals F. A simple example of this class (with n = 2) is given by the Poisson structure of the Boussinesq equation, corresponding to the choice $g^{11} = g^{22} = 0$ and $g^{12} = g^{21} = 1$. Suppose now that the matrix with entries $g^{ij}$ is invertible. Then they can be interpreted as the contravariant components of a flat pseudo-Riemannian metric in $\mathbb{R}^n$. A change of coordinates $(u^1, \dots, u^n) \mapsto (\bar{u}^1, \dots, \bar{u}^n)$ in $\mathbb{R}^n$ transforms the Poisson bracket [8] into

\[ \{\bar{u}^{i,x}, \bar{u}^{j,y}\} = g^{ij}(\bar{u})\,\delta'(x - y) + \Gamma^{ij}_k(\bar{u})\,\bar{u}^k_x\,\delta(x - y) \qquad [10] \]

where $g^{ij}(\bar{u})$ are the components of the metric in the new coordinates and the $\Gamma^{ij}_k$ are the contravariant Christoffel symbols, related to the usual Christoffel symbols by

\[ \Gamma^{ij}_k = -g^{il}\,\Gamma^{j}_{lk} \qquad [11] \]

Conversely, the expression [10] gives a Poisson bracket if the metric defined by $g^{ij}$ is flat and its Christoffel symbols are related to the $\Gamma^{ij}_k$ by [11]. These are the Poisson structures of hydrodynamic type introduced by Dubrovin and Novikov. We will consider them again later.

Bi-Hamiltonian Formulation of the KdV Equation

The KdV equation [1] has a lot of remarkable properties, such as the Lax representation and the existence of a $\tau$-function. In this section, we recall a geometrical feature of KdV, namely, the fact that it
has a second Hamiltonian structure, and we show that the integrability of KdV can be seen as a natural consequence of its double Hamiltonian representation. We have already seen that the KdV vector field $X(u) = \frac{1}{4}(u_{xxx} - 6uu_x)$ can be written as

\[ X(u) = P_0\, dH_2 \]

where $P_0 = -2\partial_x$ and

\[ H_2 = \frac{1}{8}\int \left(u^3 + \tfrac{1}{2}u_x^2\right) dx \]

But X admits another Hamiltonian representation:

\[ X(u) = P_1\, dH_1 \]

where $P_1 = -\frac{1}{2}\partial_{xxx} + 2u\partial_x + u_x$ and

\[ H_1 = -\frac{1}{4}\int u^2\, dx \]

The important point is that $P_1$ is also a Poisson operator. Moreover, it is compatible with $P_0$, that is, any linear combination of $P_0$ and $P_1$ is still a Poisson operator. Thus, the KdV equation is a bi-Hamiltonian system, that is, it can be seen in two different (but compatible) ways as a Hamiltonian system. Next, we will show how this property can be used to construct an infinite sequence of conserved quantities for the KdV equation, which are in involution with respect to the Poisson brackets $\{\cdot,\cdot\}_0$ and $\{\cdot,\cdot\}_1$ associated with $P_0$ and $P_1$. In particular, the phase space M of KdV is a bi-Hamiltonian manifold, that is, it has two different (but compatible) Poisson structures. Let us rename $X_1 = X$ the KdV vector field. Since $X = P_0\,dH_2 = P_1\,dH_1$, one is naturally led to consider the vector fields

\[ X_0 = P_0\, dH_1, \qquad X_2 = P_1\, dH_2 \]

Explicitly, $X_0(u) = u_x$ and $X_2(u) = \frac{1}{16}(u_{xxxxx} - 10uu_{xxx} - 20u_xu_{xx} + 30u^2u_x)$. One can check that these vector fields are also bi-Hamiltonian. Indeed, $X_0(u) = P_1\,dH_0$, with $H_0 = \int u\, dx$, and $X_2 = P_0\, dH_3$ with

\[ H_3 = -\frac{1}{64}\int \left(u_{xx}^2 + 10uu_x^2 + 5u^4\right) dx \]

The functional $H_0$ is a Casimir of $P_0$, that is, $P_0\,dH_0 = 0$, so that the iteration ends on this side, but it can be continued indefinitely from the other side, as shown below. For the time being, let us take for granted that there exists an infinite sequence $\{H_k\}_{k\ge 0}$ of functionals such that $P_1\,dH_k = P_0\,dH_{k+1}$; in other words,

\[ \{\cdot, H_k\}_1 = \{\cdot, H_{k+1}\}_0 \qquad [12] \]

Such relations are often called Lenard–Magri relations. Then the functionals $H_k$ are in involution with respect to both Poisson brackets. Indeed, for $k > j$, one has

\[ \{H_j, H_k\}_0 = \{H_j, H_{k-1}\}_1 = \{H_{j+1}, H_{k-1}\}_0 = \cdots = \{H_k, H_j\}_0 \]

so that $\{H_j, H_k\}_0 = 0$ for all $j, k \ge 0$, and therefore $\{H_j, H_k\}_1 = 0$ for all $j, k \ge 0$. Hence, these functionals are constants of motion (in involution) for the KdV equation. The Hamiltonian vector fields associated with them are symmetries for the KdV equation; the corresponding evolution equations are called higher-order KdV equations. The set of such equations is the well-known KdV hierarchy. We remark that the existence of a sequence of functionals $\{H_k\}_{k\ge 0}$, fulfilling the Lenard–Magri relations [12] and starting from a Casimir of $P_0$, is equivalent to the existence of a Casimir function $H(\lambda) = \sum_{k\ge 0} H_k\lambda^{-k}$ for the Poisson pencil $P_\lambda = P_1 - \lambda P_0$, where $\lambda$ is a real parameter. A straightforward way (due essentially to Miura, Gardner, and Kruskal) to determine such a Casimir function is to consider the (generalized) Miura map $h \mapsto u = h_x + h^2 - \lambda$. As shown by Kupershmidt and Wilson, it transforms the Poisson structure $\frac{1}{2}\partial_x$ (in the variable h) into the Poisson pencil $P_\lambda = -\frac{1}{2}\partial_{xxx} + 2(u + \lambda)\partial_x + u_x$. Given u, the Riccati equation

\[ h_x + h^2 = u + \lambda \qquad [13] \]

admits a unique solution with the asymptotic expansion $h = z + \sum_{k\ge 1} h_k z^{-k}$, where $z^2 = \lambda$. Moreover, the coefficients $h_k$ are differential polynomials in u (i.e., polynomials in u and its x-derivatives) that can be computed by recurrence. Thus, the generalized Miura map can be seen as an invertible transformation. Since the functional $h \mapsto \int h\, dx$ is a Casimir of the Poisson structure $\frac{1}{2}\partial_x$, it follows that if h(u) is the solution of the Riccati equation [13], then $u \mapsto \int h(u)\, dx$ is a Casimir of the Poisson pencil $P_\lambda$. More precisely, one has to introduce the functional $H(\lambda) = z\int h(u)\, dx$, which turns out to be a Laurent series in $\lambda$, because the even coefficients of h(u) are x-derivatives. This is the Casimir function we were looking for. Explicitly, one finds that the first terms of h(u) are

\[ h_1 = \tfrac{1}{2}u, \quad h_2 = -\tfrac{1}{4}u_x, \quad h_3 = \tfrac{1}{8}\left(u_{xx} - u^2\right), \quad h_4 = -\tfrac{1}{16}\left(u_{xxx} - 4uu_x\right), \]
\[ h_5 = \tfrac{1}{32}\left(u_{xxxx} - 6uu_{xx} - 5u_x^2 + 2u^3\right) \]

Obviously, $h_1$ is the density of a Casimir function of $P_0$, while $h_3$ and $h_5$ are (one-half of) the densities of the two Hamiltonians $H_1$ and $H_2$ of the KdV equation.
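The identities of this section can be checked mechanically. The following is a minimal sketch, assuming a hand-rolled representation of differential polynomials (the helper names `add`, `mul`, `D`, `var_der`, `P0`, `P1`, `hk` are illustrative and not taken from the article). It verifies the Lenard–Magri relations $P_1\,dH_k = P_0\,dH_{k+1}$ for $k = 0, 1, 2$ with the sign conventions $P_0 = -2\partial_x$, $P_1 = -\frac{1}{2}\partial_{xxx} + 2u\partial_x + u_x$, and runs the recurrence obtained by inserting $h = z + \sum_k h_k z^{-k}$ into the Riccati equation [13], namely $2h_{k+1} + (h_k)_x + \sum_{i+j=k} h_ih_j = 0$ with $h_1 = u/2$.

```python
from fractions import Fraction
from collections import defaultdict

# A differential polynomial in u is a dict {monomial: coefficient}.
# A monomial is a sorted tuple of derivative orders: (0, 0, 1) means u*u*u_x.

def clean(r):
    return {m: c for m, c in r.items() if c != 0}

def add(p, q, s=1):
    r = defaultdict(Fraction, p)
    for m, c in q.items():
        r[m] += s * c
    return clean(r)

def scale(c, p):
    return clean({m: Fraction(c) * v for m, v in p.items()})

def mul(p, q):
    r = defaultdict(Fraction)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            r[tuple(sorted(m1 + m2))] += c1 * c2
    return clean(r)

def D(p, n=1):
    """Total x-derivative applied n times (Leibniz rule on each monomial)."""
    for _ in range(n):
        r = defaultdict(Fraction)
        for m, c in p.items():
            for i in range(len(m)):
                r[tuple(sorted(m[:i] + (m[i] + 1,) + m[i + 1:]))] += c
        p = clean(r)
    return p

def var_der(h):
    """Variational derivative: sum over k of (-1)^k D^k (dh/du_k)."""
    out = {}
    for k in range(max((max(m) for m in h if m), default=0) + 1):
        dk = defaultdict(Fraction)
        for m, c in h.items():
            if k in m:
                rest = list(m)
                rest.remove(k)
                dk[tuple(rest)] += c * m.count(k)
        out = add(out, D(clean(dk), k), (-1) ** k)
    return out

u = {(0,): Fraction(1)}
ux = D(u)

def P0(f):                       # P0 = -2 d/dx
    return scale(-2, D(f))

def P1(f):                       # P1 = -(1/2) d^3/dx^3 + 2 u d/dx + u_x
    return add(add(scale(Fraction(-1, 2), D(f, 3)),
                   scale(2, mul(u, D(f)))), mul(ux, f))

u2, u3 = mul(u, u), mul(u, mul(u, u))
densities = [
    u,                                                                   # H0
    scale(Fraction(-1, 4), u2),                                          # H1
    scale(Fraction(1, 8), add(u3, scale(Fraction(1, 2), mul(ux, ux)))),  # H2
    scale(Fraction(-1, 64), add(add(mul(D(u, 2), D(u, 2)),
          scale(10, mul(u, mul(ux, ux)))), scale(5, mul(u2, u2)))),      # H3
]
dH = [var_der(d) for d in densities]
X = [P0(dH[k + 1]) for k in range(3)]            # X0, X1 (KdV), X2
assert P0(dH[0]) == {}                            # H0 is a Casimir of P0
assert all(P1(dH[k]) == X[k] for k in range(3))   # Lenard-Magri relations

# Riccati recurrence for the coefficients h_k of h = z + sum_k h_k z^(-k)
hk = {1: scale(Fraction(1, 2), u)}
for k in range(1, 5):
    conv = {}
    for i in range(1, k):
        conv = add(conv, mul(hk[i], hk[k - i]))
    hk[k + 1] = scale(Fraction(-1, 2), add(D(hk[k]), conv))

# 2*h3 and 2*h5 differ from the densities of H1 and H2 by x-derivatives,
# so their variational derivatives coincide with dH1 and dH2.
assert var_der(scale(2, hk[3])) == dH[1]
assert var_der(scale(2, hk[5])) == dH[2]
```

Exact rational arithmetic is used so that the comparisons are identities between differential polynomials rather than numerical coincidences; the same helpers can be iterated further to generate higher members of the hierarchy.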
We conclude this section showing that, as observed by Khesin and Ovsienko (Arnold and Khesin 1998), the bi-Hamiltonian structures of KdV have a clear Lie-algebraic origin. Indeed, the second Hamiltonian structure is the Lie–Poisson structure on the dual of the Virasoro algebra, while the first one can be obtained by freezing the second one at a suitable point. Let $\mathcal{X}(S^1)$ be the Lie algebra of vector fields on $S^1$. The Virasoro algebra is the vector space $\mathfrak{g} = \mathcal{X}(S^1) \oplus \mathbb{R}$ endowed with the Lie-algebra structure

\[ \left[\left(f(x)\frac{\partial}{\partial x}, a\right), \left(g(x)\frac{\partial}{\partial x}, b\right)\right] = \left(\left(f'(x)g(x) - g'(x)f(x)\right)\frac{\partial}{\partial x},\ \int f'(x)\,g''(x)\, dx\right) \qquad [14] \]

This is (up to rescaling) the second Poisson bracket of KdV. The KdV equation is therefore an Euler equation, that is, it can be obtained from the Euler equations for the rigid body by replacing the Lie algebra of the rotation group with the Virasoro algebra. To be more precise, the Hamiltonian vector field associated with $H_1(u, c) = \frac{1}{2}\left(\int u^2\, dx + c\right)$ is

\[ u_t + 3uu_x + cu_{xxx} = 0, \qquad c_t = 0 \]

If $c \neq 0$, this is (up to rescaling) the KdV equation [1]. For $c = 0$, we have the Burgers equation (also called dispersionless KdV equation), to be discussed again later on. The first Poisson bracket for the KdV hierarchy can be obtained by freezing the Lie–Poisson bracket at the point $(\frac{1}{2}\,dx \otimes dx, 0)$ of the dual of the Virasoro algebra. This means that instead of [16] one has to consider
Z x
A more complicated Poisson structure for this
system is ux my sinhy x dy
0
! Z 1
1 1
A 3@x4 3u2 @x2 9u1 @x 3u1x my cosh y x dy
P 18 2 sinh1=2 0 2
B 6@x3 6u2 @x 3u2x
The CamassaHolm equation is then bi-Hamiltonian
with with respect to the Poisson pair
A 2@x5 4u2 @x3 6u2x @x2 2u2 2 6u1x 6u2xx @x P1 @xxx @x ; P2 2m@x mx
3u1xx 2u2xxx 2u2 u2x
Indeed, it can be written as mt = P1 dH2 = P1 dH2 ,
and where
Z
B 3@x4 3u2 @x2 9u1 6u2x @x 6u1x 3u2xx 1
H1 u2 u2x dx
2
It can be obtained by means of the Drinfeld Z
1
Sokolov reduction (or also by means of a H2 u3 uu2x dx
2
bi-Hamiltonian reduction) from the LiePoisson
structure (modified with the cocycle @x ) on the Notice that the Poisson pair of the CamassaHolm
space of C1 maps from S1 to the Lie algebra of equation can be obtained from that of KdV by
3 3 traceless matrices. This is the reason why it is moving the cocycle @xxx from the second Poisson
a Poisson structure, compatible with [7]. The system structure to the first one. Indeed,
[6] can be written as
0 1
! Pa;b;c a@xxx b@x c2m@x mx
u1t h2 /u1
@ A P
a; b; c 2 R 20
u2 h2 /u2
t
is a family of pairwise compatible Poisson operators.
where h2 = (1=3)u1 is the density of a Casimir of the Moreover, we mention that Misioek has shown that
Poisson structure [7]. Thus, the Boussinesq equation also the CamassaHolm equation is an Euler equation
is a bi-Hamiltonian system and can be shown to on the dual of the Virasoro algebra. We conclude this
possess, like KdV, an infinite sequence of conserved article with a brief discussion concerning the so-called
quantities and symmetries, forming the Boussinesq bi-Hamiltonian structures of hydrodynamic type. They
hierarchy. The KdV and the Boussinesq hierarchy are play a relevant role in the theory of Frobenius
indeed particular examples of GelfandDickey hier- manifolds, that, in turn, have deep relations with
archies (Dickey 2003). They are hierarchies of many important topics in contemporary mathematics
systems of n equations with n unknown functions and physics, such as GromovWitten invariants and
and they are related, via the DrinfeldSokolov isomonodromic deformations. As we have seen in the
approach, to the Lie algebra sl(n 1). As shown by earlier section, a Poisson structure of hydrodynamic
Adler, Dickey, and Gelfand, these hierarchies have a type is given, on the space of C1 maps from S1 to (an
bi-Hamiltonian formulation. Also the generalized open set of) Rn , by
KdV equations, associated by Drinfeld and Sokolov
with an arbitrary affine KacMoody Lie algebra, are fui; x ; uj; y g gij u0 x y ijk uukx x y 21
bi-Hamiltonian (or are obtained as suitable reduc-
tions of bi-Hamiltonian systems). Let us consider where gij (u) are the contravariant components of
now the (dispersionless) CamassaHolm equation a (pseudo-)Riemannian flat metric and the ijk are
the (contravariant) Christoffel symbols of the
ut utxx 3uux 2ux uxx uuxxx 19 metric. If two Poisson structures of hydrodynamic
type are given, it can be shown that they are
which also describes shallow water waves, and compatible if and only if the two corresponding
possesses remarkable solutions called peakons, since metrics form a flat pencil. This means that their
they represent traveling waves with discontinuous linear combinations (with constant coefficients)
first derivative. In order to supply this equation with a are still flat (pseudo-)Riemannian metrics, and
(bi-)Hamiltonian structure, one has to perform the that the contravariant Christoffel symbols of the
change of variable m = u uxx , whose inverse, in the linear combinations are the linear combinations
space of period-1 functions, turns out to be given by of the contravariant Christoffel symbols of the
296 Billiards in Bounded Convex Domains
Figure 1 Billiard ball map.

inward vector $v_1$. Then, one has: $T(x, v) = (x_1, v_1)$. For a convex M, the map T is continuous. If M is n-dimensional, then the dimension of the phase space of the billiard ball map is $2n - 2$.

Equivalently, and more in the spirit of geometrical optics, one considers $\mathcal{L}$, the space of oriented geodesics (rays of light) that intersect the billiard table. This space of lines is in one-to-one correspondence with the phase space of the billiard ball map: to an inward unit vector (x, v) there corresponds the oriented line through x in the direction v (Figure 1).

The space of rays $\mathcal{L}$ carries a canonical symplectic structure, that is, a closed nondegenerate differential 2-form. In the Euclidean case, this symplectic structure $\omega$ is defined as follows. Given an oriented line $\ell$ in $\mathbb{R}^n$, let q be the unit vector along $\ell$ and p be the vector obtained by dropping the perpendicular from the origin to $\ell$. Then, $\omega = dp \wedge dq = \sum dp_i \wedge dq_i$. This construction identifies $\mathcal{L}$ with the cotangent bundle of the unit sphere: q is a unit vector and p is a (co)tangent vector at q, and $\omega$ identifies with the canonical symplectic structure of $T^*S^{n-1}$. In the general case of a Riemannian manifold M, the symplectic structure on the space of oriented geodesics is obtained from that on $T^*M$ by symplectic reduction.

One has an important result: the billiard ball map preserves the symplectic structure, $T^*(\omega) = \omega$. As a consequence, T is also measure preserving. In the planar case, one has the following explicit formula for this measure. Let t be an arc length parameter along the boundary of the billiard table and let $\alpha \in [0, \pi]$ be the angle made by the unit vector with this boundary. Then, $(\alpha, t)$ are coordinates in the phase space, identified with the cylinder, and the invariant measure is $\sin\alpha\, d\alpha\, dt$.

As a consequence, the total area of the phase space equals 2L, where L is the perimeter length of the boundary of the billiard table, and the mean free path equals $\pi A/L$, where A is the area of the billiard table. In the general n-dimensional case, the mean free path equals

\[ \frac{\operatorname{vol}(S^{n-1})\,\operatorname{vol}(M)}{\operatorname{vol}(B^{n-1})\,\operatorname{vol}(\partial M)} \]

where $S^{n-1}$ and $B^{n-1}$ are the unit sphere and the unit disk in Euclidean spaces.

Existence and Nonexistence of Caustics

Given a plane billiard table, a caustic is a curve inside the table such that if a segment of a billiard trajectory is tangent to this curve then so is each reflected segment. Caustics correspond to invariant circles of the billiard ball map (i.e., invariant curves that go around the phase cylinder): such an invariant circle is a one-parameter family of oriented lines, and the respective caustic is their envelope. An envelope may have cusp-like singularities, but if the boundary of the billiard table is a smooth curve with positive curvature then a caustic, sufficiently close to the boundary, is smooth and convex.

One can recover the table from a caustic by the following string construction. Let $\gamma$ be a caustic. Wrap a closed nonstretchable string around $\gamma$, pull it tight at a point and move this point around $\gamma$ to obtain a new curve $\Gamma$. Then, $\gamma$ is a caustic for the billiard inside $\Gamma$. Note that this construction has one parameter, the length of the string.

The following useful mirror equation relates various quantities depicted in Figure 2:

\[ \frac{1}{a} + \frac{1}{b} = \frac{2k}{\sin\alpha} \]

where k is the curvature of the boundary at the impact point.

Figure 2 String construction and mirror equation.

Do caustics exist for every convex billiard table? This is important to know, in particular, because the existence of a caustic implies that the billiard ball map is not ergodic. The answer is given by a theorem of Lazutkin: if the boundary of the billiard table is sufficiently smooth and its curvature never vanishes, then there exists a collection of smooth caustics in the vicinity of the billiard curve whose union has a positive area. Originally this theorem required 553 continuous derivatives; later this was reduced to six. This result uses the techniques of the KAM (Kolmogorov–Arnold–Moser) theory. The crucial fact is that, in appropriate coordinates, the billiard ball map is approximated, near the boundary of the phase cylinder, by the integrable map $(x, y) \mapsto (x + y, y)$.
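The planar formulas quoted earlier in this section (phase-space area 2L and mean free path $\pi A/L$) can be sanity-checked on a disk. The sketch below is an illustration under stated assumptions, not part of the article: it uses the elementary fact that in a disk of radius R a chord making angle $\alpha$ with the tangent has length $2R\sin\alpha$, and integrates against the invariant measure $\sin\alpha\, d\alpha\, dt$ with a midpoint rule. The function name `phase_space_stats` and the grid size are arbitrary choices.

```python
import math

def phase_space_stats(R=1.0, n=20000):
    """Midpoint-rule integration over alpha in [0, pi] for a disk of radius R.

    Returns (phase-space area, mean free path, 2L, pi*A/L).
    """
    L = 2 * math.pi * R            # perimeter of the table
    A = math.pi * R ** 2           # area of the table
    da = math.pi / n
    area = 0.0
    chord_integral = 0.0
    for i in range(n):
        alpha = (i + 0.5) * da
        # sin(alpha) d(alpha) dt, already integrated over t in [0, L]
        w = math.sin(alpha) * da * L
        area += w
        chord_integral += 2 * R * math.sin(alpha) * w
    return area, chord_integral / area, 2 * L, math.pi * A / L

area, mean_free_path, two_L, predicted = phase_space_stats()
print(area, two_L)
print(mean_free_path, predicted)
```

For the disk both comparisons can also be done by hand, since $\int_0^\pi \sin\alpha\, d\alpha = 2$ and $\int_0^\pi 2R\sin^2\alpha\, d\alpha = \pi R$; the numerical version is only meant to make the role of the weight $\sin\alpha$ explicit.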
On the other hand, by a theorem of Mather, if the curvature of a convex smooth billiard curve vanishes at some point, then this billiard ball map has no invariant circles. This result belongs to the well-developed theory of area-preserving twist maps of the cylinder, of which the billiard ball map is an example.

Integrable Billiards

Let a plane billiard table be an ellipse with foci $F_1$ and $F_2$. It has been known since antiquity that a billiard ball shot from $F_1$ reflects to $F_2$. A generalization of this optical property of the ellipse is the following theorem: a billiard trajectory inside an ellipse forever remains tangent to a fixed confocal conic. More precisely, if a segment of a billiard trajectory does not intersect the segment $F_1F_2$, then all the segments of this trajectory do not intersect $F_1F_2$ and are all tangent to the same ellipse with foci $F_1$ and $F_2$; and if a segment of a trajectory intersects $F_1F_2$, then all the segments of this trajectory intersect $F_1F_2$ and are all tangent to the same hyperbola with foci $F_1$ and $F_2$.

It follows that confocal ellipses are the caustics of the billiard inside an ellipse. In particular, a neighborhood of the boundary of such a billiard table is foliated by caustics. A long-standing conjecture, attributed to Birkhoff, asserts that if a neighborhood of a strictly convex smooth boundary of a billiard table is foliated by caustics, then this table is an ellipse. This conjecture remains open. The best result in this direction is a theorem of Bialy: if almost every phase point of the billiard ball map in a strictly convex billiard table belongs to an invariant circle, then the billiard table is a disk.

The multidimensional analogs of the optical properties of an ellipse are as follows. Consider an ellipsoid M in $\mathbb{R}^n$ given by the equation

\[ \frac{x_1^2}{a_1^2} + \frac{x_2^2}{a_2^2} + \cdots + \frac{x_n^2}{a_n^2} = 1 \qquad [1] \]

and define the confocal family of quadrics $M_\lambda$ by the equation

\[ \frac{x_1^2}{a_1^2 + \lambda} + \frac{x_2^2}{a_2^2 + \lambda} + \cdots + \frac{x_n^2}{a_n^2 + \lambda} = 1 \]

where $\lambda$ is a real parameter. The topological type of $M_\lambda$ changes as $\lambda$ passes the values $-a_i^2$.

One has the following theorem: a billiard trajectory inside M remains tangent to $(n - 1)$ fixed confocal quadrics. A similar and closely related result holds for the geodesic curves on M: the tangent lines to a fixed geodesic on M are tangent to $(n - 2)$ other fixed quadrics, confocal with M. For a triaxial ellipsoid, this theorem goes back to Jacobi.

Explicit formulas for the integrals of the billiard in an n-dimensional ellipsoid [1] are as follows. Let (x, v) be a phase point, a unit inward tangent vector whose foot point x lies on the boundary. The following functions are invariant under the billiard ball map:

\[ F_i(x, v) = v_i^2 + \sum_{j \neq i} \frac{(v_i x_j - v_j x_i)^2}{a_i^2 - a_j^2}, \quad i = 1, \dots, n \]

These functions are not independent: $F_1 + \cdots + F_n = 1$. In fact, the integrals $F_i$ Poisson-commute (with respect to the Poisson bracket associated with the symplectic structure in the phase space of the billiard ball map that was described above). According to the Arnold–Liouville theorem, this complete integrability of the billiard inside an ellipsoid implies that the phase space is foliated by invariant tori and, in appropriate coordinates, the map on each torus is a parallel translation.

Similar results on complete integrability hold for billiards inside quadrics in spaces of constant positive or negative curvature. The former is the intersection of a quadratic cone with the unit sphere, and the latter with the unit pseudosphere.

Periodic Orbits

Periodic billiard trajectories inside a planar billiard table correspond to inscribed polygons of extremal perimeter length. When counting periodic trajectories, one does not distinguish between polygons obtained from each other by cyclic permutation or reversing the order of the vertices. In other words, one counts the orbits of the dihedral group $D_n$ acting on n-periodic billiard polygons.

An additional topological characteristic of a periodic billiard trajectory is the rotation number, defined as follows. Assume that the boundary of a billiard table is parametrized by the unit circle and consider a polygon $(x_1, x_2, \dots, x_n)$ inscribed in the boundary. For all i, one has $x_{i+1} = x_i + t_i$ with $t_i \in (0, 1)$. Since the polygon is closed, $t_1 + \cdots + t_n \in \mathbb{Z}$. This integer, which takes values from 1 to $n - 1$, is called the rotation number of the polygon and denoted by $\rho$. Changing the orientation of a polygon replaces the rotation number $\rho$ by $n - \rho$. The leftmost 5-periodic trajectory in Figure 3 has $\rho = 1$ and the other three $\rho = 2$.

[Figure 3: four 5-periodic billiard trajectories, vertices numbered 1 to 5.]
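For the planar case n = 2, the invariance of the integrals $F_i$ of the billiard in an ellipsoid can be observed numerically. The following is a minimal sketch, assuming the ellipse $x_1^2/a^2 + x_2^2/b^2 = 1$ and the sign convention $a_i^2 - a_j^2$ in the denominators; the helper names `bounce` and `integrals`, the semi-axes, and the starting point are illustrative choices, not taken from the article.

```python
import math

def bounce(x, v, a, b):
    """One step of the billiard ball map in the ellipse x1^2/a^2 + x2^2/b^2 = 1.

    x is a boundary point and v an inward unit vector; returns the next
    impact point and the reflected (again inward) unit vector.
    """
    A = v[0] ** 2 / a ** 2 + v[1] ** 2 / b ** 2
    B = 2 * (x[0] * v[0] / a ** 2 + x[1] * v[1] / b ** 2)
    t = -B / A                              # nonzero root of A t^2 + B t = 0
    y = (x[0] + t * v[0], x[1] + t * v[1])
    n = (y[0] / a ** 2, y[1] / b ** 2)      # outward normal, not normalized
    s = 2 * (v[0] * n[0] + v[1] * n[1]) / (n[0] ** 2 + n[1] ** 2)
    return y, (v[0] - s * n[0], v[1] - s * n[1])

def integrals(x, v, a, b):
    """F_1 and F_2 for n = 2, with denominators a_i^2 - a_j^2."""
    m2 = (v[0] * x[1] - v[1] * x[0]) ** 2
    return (v[0] ** 2 + m2 / (a ** 2 - b ** 2),
            v[1] ** 2 + m2 / (b ** 2 - a ** 2))

a, b = 2.0, 1.0
s0 = 0.3
x = (a * math.cos(s0), b * math.sin(s0))
d = (0.5 - x[0], -0.2 - x[1])               # aim at an interior point
norm = math.hypot(d[0], d[1])
v = (d[0] / norm, d[1] / norm)

F_hist = [integrals(x, v, a, b)]
for _ in range(200):
    x, v = bounce(x, v, a, b)
    F_hist.append(integrals(x, v, a, b))

drift = max(abs(F[0] - F_hist[0][0]) for F in F_hist)
print(F_hist[0], drift)
```

In two dimensions $F_2$ is, up to the factor $-1/(a^2 - b^2)$, the product of the angular momenta of the ray about the two foci, which is the classical way to see why it survives each reflection; the run above only confirms this to floating-point accuracy along one orbit.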
The following theorem is due to Birkhoff: for every $n \ge 2$ and $\rho \le \lfloor (n - 1)/2 \rfloor$, coprime with n, there exist two geometrically distinct n-periodic billiard trajectories with the rotation number $\rho$. For example, there are at least two 2-periodic billiard trajectories inside every smooth oval: one is the diameter, the longest chord, and another one is of minimax type, similar to the minor axis of an ellipse.

In higher dimensions, lower bounds on the number of periodic billiard trajectories inside strictly convex domains with smooth boundaries were obtained only recently by Farber and the present author. Here is one of the results: for a generic billiard table in $\mathbb{R}^m$, the number of n-periodic trajectories is not less than $(n - 1)(m - 1)$. The proof consists in using Morse theory to estimate from below the number of critical points of the perimeter length function on the space of inscribed n-gons and its quotient space by the dihedral group $D_n$, and the main difficulty is in describing the topology of these spaces.

Returning to convex smooth planar billiards, the following conjecture has remained open for a long time:

$\Delta f = \lambda f$, $f|_{\partial M} = 0$. From the physical point of view, the eigenvalues $\lambda$ are the eigenfrequencies of the membrane M with a fixed boundary. Roughly speaking, one can recover the length spectrum from that of the Laplacian. More precisely, the following theorem of K Andersson and R Melrose holds:

\[ \sum_{\lambda_i \in \operatorname{spec} \Delta} \cos\left(t\sqrt{-\lambda_i}\right) \]

is a well-defined generalized function (distribution) of t, smooth away from the length spectrum. That is, if $l > 0$ belongs to the singular support of this distribution, then there exists either a closed billiard trajectory of length l, or a closed geodesic of length l in the boundary of the billiard table.

This relation between the Laplacian and the length spectrum is due to the fact that geometric optics is not a very accurate description of light. In wave optics, light is considered as electromagnetic waves, and geometric optics gives a realistic approximation only when the wavelength is small. This small-wave approximation is based on the assumption that the waves are locally almost harmonic, while their amplitudes change slowly from point to point. The substitution of such a function into the corresponding PDEs gives, in the first approximation, the equations of wave fronts, that is, of
the set of n-periodic points of the billiard ball map geometric optics.
has zero measure. This is easy for n = 2; for n = 3 Here is another spectral result concerning a
this is a theorem by M Rychlik. The motivation for smooth strictly convex plane domain, due to
this question comes from spectral geometry. In S Marvizi and R Melrose. Let Ln be the supremum
particular, according to a theorem of Ivrii, the and ln the infimum of the perimeters of simple
above conjecture implies the Weyl conjecture on billiard n-gons. Then,
the second term for the spectral asymptotics of the
Laplacian in a bounded domain with the Dirichlet lim nk Ln ln 0
n!1
or Neumann boundary conditions.
for any positive k. Furthermore, Ln has an asymp-
totic expansion, as n ! 1,
Length Spectrum X1
ci
The set of lengths of the closed trajectories in a Ln l
i1
n2i
convex billiard M is called the length spectrum of M.
There is a remarkable relation between the length where l is the length of the boundary of billiard table
spectrum and the spectrum of the Laplace operator and ci are constants, depending on the curvature of
in M with the Dirichlet boundary condition: the boundary.
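As a small illustration of the Marvizi and Melrose expansion, consider the unit disk, which is not treated explicitly in the text. There the simple billiard n-gons are the regular inscribed n-gons, so $L_n = l_n = 2n \sin(\pi/n)$, and a Taylor expansion suggests $l = 2\pi$, $c_1 = -\pi^3/3$ and $c_2 = \pi^5/60$. The sketch below checks this numerically; the function names are chosen here for illustration only.

```python
import math

# Assumption for this sketch: the billiard table is the unit disk, whose
# simple billiard n-gons are the regular inscribed n-gons (so L_n = l_n).

def regular_ngon_perimeter(n: int) -> float:
    """Perimeter of the regular n-gon inscribed in the unit circle."""
    return 2.0 * n * math.sin(math.pi / n)

BOUNDARY_LENGTH = 2.0 * math.pi   # l, the length of the boundary
C1 = -math.pi ** 3 / 3.0          # coefficient of 1/n^2 (from the Taylor series of sin)
C2 = math.pi ** 5 / 60.0          # coefficient of 1/n^4

def two_term_expansion(n: int) -> float:
    """Truncation l + c_1/n^2 + c_2/n^4 of the asymptotic series."""
    return BOUNDARY_LENGTH + C1 / n ** 2 + C2 / n ** 4

if __name__ == "__main__":
    for n in (5, 10, 20, 40, 80):
        exact = regular_ngon_perimeter(n)
        approx = two_term_expansion(n)
        # The remainder should shrink roughly like 1/n^6.
        print(f"n={n:3d}  L_n={exact:.12f}  truncation error={exact - approx:.3e}")
```

Only even powers of $1/n$ appear, in agreement with the form of the expansion quoted above; for a general convex table the constants $c_i$ depend on the curvature and are not given by these closed forms.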
300 Black Hole Mechanics
(see Hawking and Ellis (1973)). Thus, black holes can merge but cannot bifurcate. (By a time reversal, i.e., by replacing $\mathcal{I}^+$ with $\mathcal{I}^-$ and $I^-$ with $I^+$, one can define a white hole region $W$. However, here we will focus only on black holes.)

A spacetime $(M, g_{ab})$ is said to be stationary (i.e., time independent) if $g_{ab}$ admits a Killing field $t^a$ that represents an asymptotic time translation. By convention, $t^a$ is assumed to be unit at infinity. $(M, g_{ab})$ is said to be axisymmetric if $g_{ab}$ admits a Killing field $\phi^a$ generating an SO(2) isometry. By convention $\phi^a$ is normalized such that the affine length of its integral curves is $2\pi$. Stationary spacetimes with nontrivial $M \setminus I^-(\mathcal{I}^+)$ represent black holes which are in global equilibrium. In the Einstein–Maxwell theory in four dimensions, there exists a unique three-parameter family of stationary black hole solutions, generally parametrized by mass $m$, angular momentum $J$, and electric charge $Q$. This is the celebrated Kerr–Newman family. Therefore, in general relativity a great deal of work on black holes has focused on these solutions and perturbations thereof. The Kerr–Newman family is axisymmetric and furthermore, its metric has the property that the 2-flats spanned by the Killing fields $t^a$ and $\phi^a$ are orthogonal to a family of 2-surfaces. This property is called $t$–$\phi$ orthogonality. These features of Kerr–Newman spacetimes are widely used in black hole physics. Note however that uniqueness fails in higher dimensions, and also in the presence of nonabelian gauge fields or rings of perfect fluids around black holes in four dimensions. In mathematical physics, there is significant literature on the new stationary black hole solutions in Einstein–Yang–Mills–Higgs theories. These are called hairy black holes. Research on stationary black hole solutions with rings received a boost by a recent discovery that these black holes can violate the Kerr inequality $J \le G m^2$ between angular momentum $J$ and mass $m$.

A null 3-manifold $K$ in $M$ is said to be a Killing horizon if $g_{ab}$ admits a Killing field $K^a$ which is everywhere normal to $K$. On a Killing horizon, one can show that the acceleration of $K^a$ is proportional to $K^a$ itself:

$$K^a \nabla_a K^b = \kappa K^b \tag{1}$$

The proportionality function $\kappa$ is called surface gravity. We will show in the next section that if a mild energy condition holds on $K$, then $\kappa$ must be constant. Note that if we rescale $K^a$ via $K^a \to cK^a$, where $c$ is a constant, surface gravity also rescales as $\kappa \to c\kappa$.

In the Kerr–Newman family, the event horizon is a Killing horizon. More generally, if an axisymmetric, stationary black hole spacetime $(M, g_{ab})$ satisfies the $t$–$\phi$ orthogonality property, its event horizon $E$ is a Killing horizon. (Although one can envisage stationary black holes in which these additional symmetry conditions are not met, this possibility has been ignored in black hole mechanics on stationary spacetimes. Quasilocal horizons, discussed below, do not require any spacetime symmetries.) In these cases, the normalization freedom in $K^a$ is fixed by requiring that $K^a$ have the form

$$K^a = t^a + \Omega \phi^a \tag{2}$$

on the horizon, where $\Omega$ is a constant, called the angular velocity of the horizon. The resulting $\kappa$ is called the surface gravity of the black hole. It is remarkable that $\kappa$ is constant for all such black holes, even when their horizon is highly distorted (i.e., far from being spherically symmetric) either due to rotation or due to external matter fields. This is analogous to the fact that the temperature of a thermodynamical system in equilibrium is constant, independently of the details of the system. In analogy with thermodynamics, constancy of $\kappa$ is referred to as the zeroth law of black hole mechanics.

Next, let us consider an infinitesimal perturbation within the three-parameter Kerr–Newman family. A simple calculation shows that the changes in the Arnowitt–Deser–Misner (ADM) mass $m$, angular momentum $J$, and the total charge $Q$ of the spacetime and in the area $a$ of the horizon are constrained via

$$\delta m = \frac{\kappa}{8\pi G}\,\delta a + \Omega\,\delta J + \Phi\,\delta Q \tag{3}$$

where the coefficients $\kappa$, $\Omega$, $\Phi$ are black hole parameters, $\Phi = -A_a K^a$ being the electrostatic potential at the horizon. The last two terms, $\Omega\,\delta J$ and $\Phi\,\delta Q$, have the interpretation of work required to spin the black hole up by an amount $\delta J$ or to increase its charge by $\delta Q$. Therefore, [3] has a striking resemblance to the first law, $\delta E = T\,\delta S + \delta W$, of thermodynamics if (as the zeroth law suggests) $\kappa$ is made proportional to the temperature $T$, and the horizon area $a$ to the entropy $S$. Therefore, [3] and its generalizations discussed below are referred to as the first law of black hole mechanics.

In Kerr–Newman spacetimes, the only contribution to the stress–energy tensor comes from the Maxwell field. Bardeen et al. (1973) consider stationary black holes with matter such as perfect fluids in the exterior region and stationary perturbations thereof. Using Einstein's equations, they show that the form [3] of the first law does not change; the only modification is addition of certain matter terms on the right-hand side which can be
interpreted as the work $\delta W$ done on the total system. A generalization in another direction was made by Iyer and Wald (1994) using Noether currents. They allow nonstationary perturbations and, more importantly, drop the restriction to general relativity. Instead, they consider a wide class of diffeomorphism-invariant Lagrangian densities $L(g_{ab}, R_{abcd}, \nabla_a R_{bcde}, \ldots, \Phi^{\ldots}{}_{\ldots}, \nabla_a \Phi^{\ldots}{}_{\ldots}, \ldots)$ which depend on the metric $g_{ab}$, matter fields $\Phi^{\ldots}{}_{\ldots}$, and a finite number of derivatives of the Riemann tensor and matter fields. Finally, they restrict themselves to $\kappa \neq 0$. In this case, on the maximal analytic extension of the spacetime, the Killing field $K^a$ vanishes on a 2-sphere $S_o$ called the bifurcate horizon. Then, [3] is generalized to

$$\delta m = \frac{\kappa}{2\pi}\,\delta S_{\rm hor} + \delta W \tag{4}$$

Here $\delta W$ again represents work terms and $S_{\rm hor}$ is given by

$$S_{\rm hor} = -2\pi \oint_{S_o} \frac{\delta L}{\delta R_{abcd}}\, n_{ab}\, n_{cd} \tag{5}$$

where $n_{ab}$ is the binormal to $S_o$ (with $n_{ab} n^{ab} = -2$), and the functional derivative inside the integral is evaluated by formally viewing the Riemann tensor as a field independent of the metric. For the Einstein–Hilbert action, this yields $S_{\rm hor} = a/4G$ and one recovers [3].

These results are striking. However, the underlying assumptions have certain unsatisfactory aspects. First, although the laws are meant to refer just to black holes, one assumes that the entire spacetime is stationary. In thermodynamics, by contrast, one only assumes that the system under consideration is in equilibrium, not the whole universe. Second, in the first law, quantities $a$, $\kappa$, $\Omega$ are evaluated at the horizon while $M$, $J$ are evaluated at infinity and include contributions from possible matter fields outside the black hole. A more satisfactory law of black hole mechanics would involve attributes of the black hole alone. Finally, the notion of the event horizon is extremely global and teleological since it explicitly refers to $\mathcal{I}^+$. An event horizon may well be developing in the very room you are sitting in today in anticipation of a gravitational collapse in the center of our galaxy which may occur a billion years hence. This feature makes it impossible to generalize the first law to fully dynamical situations and relate the change in the event horizon area to the flux of energy and angular momentum falling across it. Indeed, one can construct explicit examples of dynamical black holes in which an event horizon $E$ forms and grows in the flat part of a spacetime where nothing happens physically. These considerations call for a replacement of $E$ by a quasilocal horizon which leads to a first law involving only horizon attributes, and which can grow only in response to the influx of energy. Such horizons are discussed in the next two sections.

Local Equilibrium

The key idea here is to drop the requirement that spacetime should admit a stationary Killing field and ask only that the intrinsic horizon geometry be time independent. Consider a null 3-surface $\Delta$ in a spacetime $(M, g_{ab})$ with a future-pointing normal field $\ell^a$. The pullback $q_{ab} := g_{\underleftarrow{ab}}$ of the spacetime metric to $\Delta$ is the intrinsic, degenerate metric of $\Delta$ with signature $0, +, +$. The first condition is that it be time independent, that is, $\mathcal{L}_\ell q_{ab} = 0$ on $\Delta$. Then by restriction, the spacetime derivative operator $\nabla$ induces a natural derivative operator $D$ on $\Delta$. While $D$ is compatible with $q_{ab}$, that is, $D_a q_{bc} = 0$, it is not uniquely determined by this property because $q_{ab}$ is degenerate. Thus, $D$ has extra information, not contained in $q_{ab}$. The pair $(q_{ab}, D)$ is said to determine the intrinsic geometry of the null surface $\Delta$. This notion leads to a natural definition of a horizon in local equilibrium. Let $\Delta$ be a null, three-dimensional submanifold of $(M, g_{ab})$ with topology $S \times \mathbb{R}$, where $S$ is compact and without boundary.

Definition 1 $\Delta$ is said to be an isolated horizon if it admits a null normal $\ell^a$ such that:
(i) $\mathcal{L}_\ell q_{ab} = 0$ and $[\mathcal{L}_\ell, D] = 0$ on $\Delta$ and
(ii) $-T^a{}_b \ell^b$ is a future pointing causal vector on $\Delta$.

One can show that, generically, this null normal field $\ell^a$ is unique up to rescalings by positive constants.

Both conditions are local to $\Delta$. In particular, $(M, g_{ab})$ is not required to be asymptotically flat and there is no longer any teleological feature. Since $\Delta$ is null and $\mathcal{L}_\ell q_{ab} = 0$, the area of any of its cross sections is the same, denoted by $a_\Delta$. As one would expect, one can show that there is no flux of gravitational radiation or matter across $\Delta$. This captures the idea that the black hole itself is in equilibrium. Condition (ii) is a rather weak energy condition which is satisfied by all matter fields normally considered in classical general relativity. The nontrivial condition is (i). It extracts from the notion of a Killing horizon just a tiny part that refers only to the intrinsic geometry of $\Delta$. As a result, every Killing horizon $K$ is, in particular, an isolated horizon. However, a spacetime with an isolated horizon can admit gravitational radiation and dynamical matter fields away from $\Delta$. In fact, as a family of Robinson–Trautman spacetimes illustrates,
gravitational radiation could even be present arbitrarily close to $\Delta$. Because of these possibilities, there are many nontrivial examples and the transition from event horizons of stationary spacetimes to isolated horizons represents a significant generalization of black hole mechanics. (In fact, the derivation of the zeroth and the first law requires slightly weaker assumptions, encoded in the notion of a weakly isolated horizon (Ashtekar et al. 2000, 2001).)

An immediate consequence of the requirement $\mathcal{L}_\ell q_{ab} = 0$ is that there exists a 1-form $\omega_a$ on $\Delta$ such that $D_a \ell^b = \omega_a \ell^b$. Following the definition of $\kappa$ on a Killing horizon, the surface gravity $\kappa_{(\ell)}$ of $(\Delta, \ell)$ is defined as $\kappa_{(\ell)} = \omega_a \ell^a$. Again, under $\ell^a \to c\ell^a$, we have $\kappa_{(c\ell)} = c\,\kappa_{(\ell)}$. Together with Einstein's equations, the two conditions of Definition 1 imply $\mathcal{L}_\ell \omega_a = 0$ and $\ell^a D_{[a} \omega_{b]} = 0$. The Cartan identity relating the Lie and exterior derivative now yields

$$D_a(\omega_b \ell^b) \equiv D_a \kappa_{(\ell)} = 0 \tag{6}$$

Thus, surface gravity is constant on every isolated horizon. This is the zeroth law, extended to horizons representing local equilibrium. In the presence of an electromagnetic field, Definition 1 and the field equations imply $\mathcal{L}_\ell F_{ab} = 0$ and $\ell^a F_{ab} = 0$. The first of these equations implies that one can always choose a gauge in which $\mathcal{L}_\ell A_a = 0$. By the Cartan identity it then follows that the electrostatic potential $\Phi_{(\ell)} := -A_a \ell^a$ is constant on the horizon. This is the Maxwell analog of the zeroth law.

In this setting, the first law is derived using a Hamiltonian framework (Ashtekar et al. 2000, 2001). For concreteness, let us assume that we are in the asymptotically flat situation and the only gauge field present is electromagnetic. One begins by restricting oneself to horizon geometries such that $\Delta$ admits a rotational vector field $\phi^a$ satisfying $\mathcal{L}_\phi q_{ab} = 0$. (In fact for black hole mechanics, it suffices to assume only that $\mathcal{L}_\phi \epsilon_{ab} = 0$, where $\epsilon_{ab}$ is the intrinsic area 2-form on $\Delta$. The same is true on dynamical horizons discussed in the next section.) One then constructs a phase space $\Gamma$ of gravitational and matter fields such that (1) $M$ admits an internal boundary $\Delta$ which is an isolated horizon; and (2) all fields satisfy asymptotically flat boundary conditions at infinity. Note that the horizon geometry is allowed to vary from one phase-space point to another; the pair $(q_{ab}, D)$ induced on $\Delta$ by the spacetime metric only has to satisfy Definition 1 and the condition $\mathcal{L}_\phi q_{ab} = 0$.

Let us begin with angular momentum. Fix a vector field $\phi^a$ on $M$ which coincides with the fixed $\phi^a$ on $\Delta$ and is an asymptotic rotational symmetry at infinity. (Note that $\phi^a$ is not restricted in any way in the bulk.) Lie derivatives of gravitational and matter fields along $\phi^a$ define a vector field $X_{(\phi)}$ on $\Gamma$. One shows that it is an infinitesimal canonical transformation, that is, satisfies $\mathcal{L}_{X_{(\phi)}} \boldsymbol{\Omega} = 0$, where $\boldsymbol{\Omega}$ is the symplectic structure on $\Gamma$. The Hamiltonian $H^{(\phi)}$ generating this canonical transformation is given by

$$H^{(\phi)} = J^{(\phi)}_\Delta - J^{(\phi)}_\infty, \qquad J^{(\phi)}_\Delta = -\frac{1}{8\pi G} \oint_S (\omega_a \phi^a)\,\epsilon - \frac{1}{4\pi} \oint_S (A_a \phi^a)\,{}^\star F \tag{7}$$

where $J^{(\phi)}_\infty$ is the ADM angular momentum at infinity, $S$ is any cross section of $\Delta$, and $\epsilon$ the area element thereon. The term $J^{(\phi)}_\Delta$ is independent of the choice of $S$ made in its evaluation and interpreted as the horizon angular momentum. It has numerous properties that support this interpretation. In particular, it yields the standard angular momentum expression in Kerr–Newman spacetimes.

To define horizon energy, one has to introduce a time-translation vector field $t^a$. At infinity, $t^a$ must tend to a unit time translation. On $\Delta$, it must be a symmetry of $q_{ab}$. Since $\ell^a$ and $\phi^a$ are both horizon symmetries, $t^a = c\,\ell^a + \Omega\,\phi^a$ on $\Delta$, for some constants $c$ and $\Omega$. However, unlike $\phi^a$, the restriction of $t^a$ to $\Delta$ cannot be fixed once and for all but must be allowed to vary from one phase-space point to another. In particular, on physical grounds, one expects $\Omega$ to be zero at a phase-space point representing a nonrotating black hole but nonzero at a point representing a rotating black hole. This freedom in the boundary value of $t^a$ introduces a qualitatively new element. The vector field $X_{(t)}$ on $\Gamma$ defined by the Lie derivatives of gravitational and matter fields does not, in general, satisfy $\mathcal{L}_{X_{(t)}} \boldsymbol{\Omega} = 0$; it need not be an infinitesimal canonical transformation. The necessary and sufficient condition is that $(\kappa_{(c\ell)}/8\pi G)\,\delta a_\Delta + \Omega\,\delta J_\Delta + \Phi_{(c\ell)}\,\delta Q_\Delta$ be an exact variation. That is, $X_{(t)}$ generates a Hamiltonian flow if and only if there exists a function $E^{(t)}_\Delta$ on $\Gamma$ such that

$$\delta E^{(t)}_\Delta = \frac{\kappa_{(c\ell)}}{8\pi G}\,\delta a_\Delta + \Omega\,\delta J_\Delta + \Phi_{(c\ell)}\,\delta Q_\Delta \tag{8}$$

This is precisely the first law. Thus, the framework provides a deeper insight into the origin of the first law: it is the necessary and sufficient condition for the evolution generated by $t^a$ to be Hamiltonian. Equation [8] is a genuine restriction on the choice of phase-space functions $c$ and $\Omega$, that is, of restrictions to $\Delta$ of evolution fields $t^a$. It is easy to verify that $M$ admits many such vector fields. Given one, the Hamiltonian $H^{(t)}$ generating the time evolution along $t^a$ takes the form

$$H^{(t)} = E^{(t)}_\infty - E^{(t)}_\Delta \tag{9}$$
reinforcing the interpretation of $E^{(t)}_\Delta$ as the horizon energy.

In general, there is a multitude of first laws, one for each vector field $t^a$, the evolution along which preserves the symplectic structure. In the Einstein–Maxwell theory, given any phase-space point, one can choose a canonical boundary value $t_o^a$ exploiting the uniqueness theorem. $E^{(t_o)}_\Delta$ is then called the horizon mass and denoted simply by $m_\Delta$. In the Kerr–Newman family, $H^{(t_o)}$ vanishes and $m_\Delta$ coincides with the ADM mass $m_\infty$. Similarly, if $\phi^a$ is chosen to be a global rotational Killing field, $J^{(\phi)}_\Delta$ equals $J^{(\phi)}_\infty$. However, in more general spacetimes where there are matter fields or gravitational radiation outside $\Delta$, these equalities do not hold; $m_\Delta$ and $J_\Delta$ represent quantities associated with the horizon alone while the ADM quantities represent the total mass and angular momentum in the spacetime, including contributions from matter fields and gravitational radiation in the exterior region. In the first law [8], only the contributions associated with the horizon appear.

When the uniqueness theorem fails, as, for example, in the Einstein–Yang–Mills–Higgs theory, first laws continue to hold but the horizon mass $m_\Delta$ becomes ambiguous. Interestingly, these ambiguities can be exploited to relate properties of hairy black holes with those of the corresponding solitons. (For a summary, see Ashtekar and Krishnan (2004).)

Dynamical Situations

A natural question now is whether there is an analog of the second law of thermodynamics. Using event horizons, Hawking showed that the answer is in the affirmative (see Hawking and Ellis (1973)). Let $(M, g_{ab})$ admit an event horizon $E$. Denote by $\ell^a$ a geodesic null normal to $E$. Its expansion is defined as $\theta_{(\ell)} := q^{ab} \nabla_a \ell_b$, where $q^{ab}$ is any inverse of the degenerate intrinsic metric $q_{ab}$ on $E$, and determines the rate of change of the area element of $E$ along $\ell^a$. Assuming that the null energy condition and Einstein's equations hold, the Raychaudhuri equation immediately implies that if $\theta_{(\ell)}$ were to become negative somewhere it would become infinite within a finite affine parameter. Hawking showed that, if there is a globally hyperbolic region containing $I^-(\mathcal{I}^+) \cup E$ (that is, if there are no naked singularities), this cannot happen, whence $\theta_{(\ell)} \ge 0$ on $E$. Hence, if a cross section $S_2$ of $E$ is to the future of a cross section $S_1$, we must have $a_{S_2} \ge a_{S_1}$. Thus, in any (i.e., not necessarily infinitesimal) dynamical process, the change $\Delta a$ in the horizon area is always non-negative. This result is known as the second law of black hole mechanics. As in the first law, the analog of entropy is the horizon area.

It is tempting to ask if there is a local physical process directly responsible for the growth of area. For event horizons, the answer is in the negative since they can grow in a flat portion of spacetime. However, one can introduce quasilocal horizons also in the dynamical situations and obtain the desired result (Ashtekar and Krishnan 2003). These constructions are strongly motivated by earlier ideas introduced by Hayward (1994).

Definition 2 A three-dimensional spacelike submanifold $H$ of $(M, g_{ab})$ is said to be a dynamical horizon if it admits a foliation by compact 2-manifolds $S$ (without boundary) such that:
(i) the expansion $\theta_{(\ell)}$ of one (future directed) null normal field $\ell^a$ to $S$ vanishes and the expansion of the other (future directed) null normal field, $n^a$, is negative; and
(ii) $-T^a{}_b \ell^b$ is a future pointing causal vector on $H$.

One can show that this foliation of $H$ is unique and that $S$ is either a 2-sphere or, under degenerate and physically over-restrictive conditions, a 2-torus. Each leaf $S$ is a marginally trapped surface and referred to as a cut of $H$. Unlike event horizons $E$, dynamical horizons $H$ are locally defined and do not display any teleological feature. In particular, they cannot lie in a flat portion of spacetime. Dynamical horizons commonly arise in numerical simulations of evolving black holes as world tubes of apparent horizons. As the black hole settles down, $H$ asymptotes to an isolated horizon $\Delta$, which tightly hugs the asymptotic future portion of the event horizon. However, during the dynamical phase, $H$ typically lies well inside $E$.

The two conditions in Definition 2 immediately imply that the area of cuts of $H$ increases monotonically along the outward direction defined by the projection of $\ell^a$ on $H$. Furthermore, this change turns out to be directly related to the flux of energy falling across $H$. Let $R$ denote the radius function on $H$ so that the area of any cut $S$ is given by $a_S = 4\pi R^2$. Let $N$ denote the norm of $\partial_a R$ and $\Delta H$, the portion of $H$ bounded by two cross sections $S_1$ and $S_2$. The appropriate energy turns out to be associated with the vector field $N\ell^a$, where $\ell^a$ is normalized such that its projection on $H$ is the unit normal $\hat{r}^a$ to the cuts $S$. In the generic and physically interesting case when $S$ is a 2-sphere, the Gauss and the Codazzi (i.e., constraint) equations imply

$$\frac{1}{2G}(R_2 - R_1) = \int_{\Delta H} T_{ab}\, N \ell^a \hat{\tau}^b \, d^3V + \frac{1}{16\pi G} \int_{\Delta H} N \left( \sigma_{ab} \sigma^{ab} + 2 \zeta_a \zeta^a \right) d^3V \tag{10}$$
Here $\hat{\tau}^a$ is the unit normal to $H$, $\sigma_{ab}$ the shear of $\ell^a$ (i.e., the tracefree part of $q_a{}^m q_b{}^n \nabla_m \ell_n$), and $\zeta^a = q^{ab} \hat{r}^c \nabla_c \ell_b$, where $q_{ab}$ is the projector onto the tangent space of the cuts $S$. The first integral on the right-hand side can be directly interpreted as the flux across $\Delta H$ of matter–energy (relative to the vector field $N\ell^a$). The second term is purely geometric and is interpreted as the flux of energy carried by gravitational waves across $\Delta H$. It has several properties which support this interpretation. Thus, not only does the second law of black hole mechanics hold for a dynamical horizon $H$, but the cause of the increase in the area can be directly traced to physical processes happening near $H$.

Another natural question is whether the first law [8] can be generalized to fully dynamical situations, where $\delta$ is replaced by a finite transition. Again, the answer is in the affirmative. We will outline the idea for the case when there are no gauge fields on $H$. As with isolated horizons, to have a well-defined notion of angular momentum, let us suppose that the intrinsic 3-metric on $H$ admits a rotational Killing field $\phi$. Then, the angular momentum associated with any cut $S$ is given by

$$J_S^{(\phi)} = -\frac{1}{8\pi G} \oint_S K_{ab}\, \phi^a \hat{r}^b \, d^2V \equiv \frac{1}{8\pi G} \oint_S j^{(\phi)} \, d^2V \tag{11}$$

where $K_{ab}$ is the extrinsic curvature of $H$ in $(M, g_{ab})$ and $j^{(\phi)}$ is interpreted as the angular momentum density. Now, in the Kerr family, the mass, surface gravity, and the angular velocity can be unambiguously expressed as well-defined functions $\bar{m}(a, J)$, $\bar{\kappa}(a, J)$, and $\bar{\Omega}(a, J)$ of the horizon area $a$ and angular momentum $J$. The idea is to use these expressions to associate mass, surface gravity, and angular velocity with each cut of $H$. Then, a surprising result is that the difference between the horizon masses associated with cuts $S_1$ and $S_2$ can be expressed as the integral of a locally defined flux across the portion $\Delta H$ of $H$ bounded by $S_1$ and $S_2$:

$$\bar{m}_2 - \bar{m}_1 = \frac{1}{8\pi G} \int_{\Delta H} \bar{\kappa}\, da + \frac{1}{8\pi G} \left\{ \oint_{S_2} \bar{\Omega}\, j^{(\phi)} \, d^2V - \oint_{S_1} \bar{\Omega}\, j^{(\phi)} \, d^2V - \int_{\bar{\Omega}_1}^{\bar{\Omega}_2} d\bar{\Omega} \oint_S j^{(\phi)} \, d^2V \right\} \tag{12}$$

If the cuts $S_2$ and $S_1$ are only infinitesimally separated, this expression reduces precisely to the standard first law involving infinitesimal variations. Therefore, [12] is an integral generalization of the first law.

Let us conclude with a general perspective. On the whole, in the passage from event horizons in stationary spacetimes to isolated horizons and then to dynamical horizons, one considers increasingly more realistic situations. In all the three cases, the analysis has been extended to allow the presence of a cosmological constant $\Lambda$. (The only significant change is that the topology of cuts $S$ of dynamical horizons is restricted to be $S^2$ if $\Lambda > 0$ and is completely unrestricted if $\Lambda < 0$.) In the first two frameworks, results have also been extended to higher dimensions. Since the notions of isolated and dynamical horizons make no reference to infinity, these frameworks can be used also in spatially compact spacetimes. The notion of an event horizon, by contrast, does not naturally extend to these spacetimes. On the other hand, the generalization [4] of the first law [3] is applicable to event horizons of stationary spacetimes in a wide class of theories while so far the isolated and dynamical horizon frameworks are tied to general relativity (coupled to matter satisfying rather weak energy conditions). From a mathematical physics perspective, extension to more general theories is an important open problem.

See also: Asymptotic Structure and Conformal Infinity; Branes and Black Hole Statistical Mechanics; Dirac Fields in Gravitation and Nonabelian Gauge Theory; Geometric Flows and the Penrose Inequality; Loop Quantum Gravity; Minimal Submanifolds; Quantum Field Theory in Curved Spacetime; Quantum Geometry and its Applications; Random Algebraic Geometry, Attractors and Flux Vacua; Shock Wave Refinement of the Friedman–Robertson–Walker Metric; Stationary Black Holes.

Further Reading

Ashtekar A, Beetle C, and Lewandowski J (2001) Mechanics of rotating black holes. Physical Review D 64: 044016 (gr-qc/0103026).
Ashtekar A, Fairhurst S, and Krishnan B (2000) Isolated horizons: Hamiltonian evolution and the first law. Physical Review D 62: 104025 (gr-qc/0005083).
Ashtekar A and Krishnan B (2003) Dynamical horizons and their properties. Physical Review D 68: 104030 (gr-qc/0308033).
Ashtekar A and Krishnan B (2004) Isolated and dynamical horizons and their applications. Living Reviews in Relativity 10: 1–78 (gr-qc/0407042).
Bardeen JW, Carter B, and Hawking SW (1973) The four laws of black hole mechanics. Communications in Mathematical Physics 31: 161.
DeWitt BS and DeWitt CM (eds.) (1972) Black Holes. Amsterdam: North-Holland.
Frolov VP and Novikov ID (1998) Black Hole Physics. Dordrecht: Kluwer.
Hawking SW and Ellis GFR (1973) Large Scale Structure of Space-Time. Cambridge: Cambridge University Press.
Hayward S (1994) General laws of black hole dynamics. Physical Review D 49: 6467–6474.
Iyer V and Wald RM (1994) Some properties of Noether charge and a proposal for dynamical black hole entropy. Physical Review D 50: 846–864.
Wald RM (1994) Quantum Field Theory in Curved Spacetime and Black Hole Thermodynamics. Chicago: University of Chicago Press.
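Worked example (Kerr family). The first law for the Kerr family, $\delta m = (\kappa/8\pi G)\,\delta a + \Omega\,\delta J$, can be checked numerically once the mass, surface gravity, and angular velocity are written as functions of the horizon area $a$ and angular momentum $J$. The sketch below is an illustration only: it assumes units $G = c = 1$ and uses the standard Kerr relations $m^2 = a/16\pi + 4\pi J^2/a$, $r_+ = a/(8\pi m)$, $\kappa = (r_+ - m)/(2 m r_+)$ and $\Omega = 4\pi J/(a\,m)$, which are not spelled out in the article; the function names are hypothetical.

```python
import math

# Assumptions for this sketch: units with G = c = 1, vanishing charge, and
# the standard Kerr relations quoted in the lead-in (not taken from the article).

def kerr_mass(area: float, J: float) -> float:
    """Mass as a function of horizon area and angular momentum."""
    return math.sqrt(area / (16.0 * math.pi) + 4.0 * math.pi * J ** 2 / area)

def kerr_surface_gravity(area: float, J: float) -> float:
    """Surface gravity kappa = (r_+ - m) / (2 m r_+), with r_+ = area / (8 pi m)."""
    m = kerr_mass(area, J)
    r_plus = area / (8.0 * math.pi * m)
    return (r_plus - m) / (2.0 * m * r_plus)

def kerr_angular_velocity(area: float, J: float) -> float:
    """Horizon angular velocity Omega = 4 pi J / (area * m)."""
    return 4.0 * math.pi * J / (area * kerr_mass(area, J))

def first_law_residuals(area: float, J: float, h: float = 1e-6):
    """Compare central-difference derivatives of m with kappa/(8 pi) and Omega."""
    dm_da = (kerr_mass(area + h, J) - kerr_mass(area - h, J)) / (2.0 * h)
    dm_dJ = (kerr_mass(area, J + h) - kerr_mass(area, J - h)) / (2.0 * h)
    return (dm_da - kerr_surface_gravity(area, J) / (8.0 * math.pi),
            dm_dJ - kerr_angular_velocity(area, J))

if __name__ == "__main__":
    # A sub-extremal example point (area > 8 pi J).
    res_a, res_J = first_law_residuals(50.0, 0.5)
    print(f"residual in dm/da: {res_a:.2e}, residual in dm/dJ: {res_J:.2e}")
```

Both residuals should vanish up to finite-difference error, which is the statement that $\kappa/8\pi$ and $\Omega$ are the partial derivatives of the mass with respect to $a$ and $J$.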
306 Boltzmann Equation (Classical and Quantum)
$$[\, f(x, v')\, f(x, v_1') - f(x, v)\, f(x, v_1) \,] \tag{2}$$

and

$$v' = v - n\,[\, n \cdot (v - v_1) \,], \qquad v_1' = v_1 + n\,[\, n \cdot (v - v_1) \,] \tag{3}$$

Moreover, $n$ (the impact parameter) is a unit vector and $S^2_+ = \{ n \mid n \cdot (v - v_1) \ge 0 \}$.

Note that $v'$, $v_1'$ are the outgoing velocities after a collision of two elastic balls with incoming velocities $v$ and $v_1$ and centers $x$ and $x + rn$, $r$ being the diameter of the spheres. Obviously, the collision takes place if $n \cdot (v - v_1) \ge 0$. Equations [3] are a consequence of the conservation of total energy, momentum, and angular momentum. Note also that $r$ does not enter in eqn [1] as a parameter.

The Boltzmann Heuristic Argument

Thus, we want to find an evolution equation for the quantity $f(x, v; t)$. The molecular system we are considering consists of $N$ identical particles of diameter $r$ in the whole space $\mathbb{R}^3$. We denote by $x_1, v_1, \ldots, x_N, v_N$ a state of the system, where $x_i$ and $v_i$ indicate the position and the velocity of the particle $i$. The particles cannot overlap (i.e., the centers of two particles cannot be at a distance smaller than the particle diameter $r$).

The particles are moving freely up to the first instant of contact, that is, the first time when two particles (say particles $i$ and $j$) arrive at a distance $r$. Then the pair interacts when an elastic collision occurs. This means that they change instantaneously
their velocities, according to the conservation of the energy and linear and angular momentum. More precisely, the velocities after a collision with incoming velocities $v$ and $v_1$ are those given by formula [3]. After the first collision, the system evolves by iterating the procedure. Here we neglect triple collisions because they are unlikely. The evolution equation for a tagged particle is then of the form

$$(\partial_t + v \cdot \nabla_x) f = \mathrm{Coll} \tag{7}$$

where Coll denotes the variation of $f$ due to the collisions. We have

$$\mathrm{Coll} = G - L \tag{8}$$

where $L$ and $G$ (the loss and gain terms, respectively) are the negative and positive contributions to the variation of $f$ due to the collisions. More precisely, $L\,dx\,dv\,dt$ is the probability of the test particle to disappear from the cell $dx\,dv$ of the phase space because of a collision in the time interval $(t, t + dt)$ and $G\,dx\,dv\,dt$ is the probability to appear in the same time interval for the same reason. Let us consider the sphere of center $x$ with radius $r$ and a point $x + rn$ over the surface, where $n$ denotes the generic unit vector. Consider also the cylinder with base area $dS = r^2\,dn$ and height $|V|\,dt$ along the direction of $V = v_2 - v$.

Then a given particle (say particle 2) with velocity $v_2$ can contribute to $L$ because it can collide with the test particle in the time $dt$, provided it is localized in the cylinder and if $V \cdot n \le 0$. Therefore, the contribution to $L$ due to the particle 2 is the probability of finding such a particle in the cylinder (conditioned to the presence of the first particle in $x$). This quantity is $f_2(x, v, x + nr, v_2)\, |(v_2 - v) \cdot n|\, r^2\, dn\, dv_2\, dt$, where $f_2$ is the joint distribution of two particles. Integrating in $dn$ and $dv_2$, we obtain that the total contribution to $L$ due to any predetermined particle is

$$r^2 \int dv_2 \int_{S^2_-} dn\, f_2(x, v, x + nr, v_2)\, |(v_2 - v) \cdot n| \tag{9}$$

where $S^2_-$ is the unit hemisphere $(v_2 - v) \cdot n < 0$. Finally, we obtain the total contribution multiplying by the total number of particles:

$$L = (N - 1)\, r^2 \int dv_2 \int_{S^2_-} dn\, f_2(x, v, x + nr, v_2)\, |(v_2 - v) \cdot n| \tag{10}$$

The gain term can be derived analogously by considering that we are looking at particles which have velocities $v$ and $v_2$ after the collisions so that we have to integrate over the hemisphere $S^2_+ = \{ (v_2 - v) \cdot n > 0 \}$:

$$G = (N - 1)\, r^2 \int dv_2 \int_{S^2_+} dn\, f_2(x, v, x + nr, v_2)\, |(v_2 - v) \cdot n| \tag{11}$$

Summing $G$ and $-L$, we get

$$\mathrm{Coll} = (N - 1)\, r^2 \int dv_2 \int dn\, f_2(x, v, x + nr, v_2)\, (v_2 - v) \cdot n \tag{12}$$

which, however, is not a very useful expression because the time derivative of $f$ is expressed in terms of another object, namely $f_2$. An evolution equation for $f_2$ will involve $f_3$, the joint distribution of three particles, and so on, until we include the total particle number $N$. Here the basic assumption of Boltzmann enters, namely that two given particles are uncorrelated if the gas is rarefied, namely

$$f_2(x, v, x_2, v_2) = f(x, v)\, f(x_2, v_2) \tag{13}$$

Condition [13], referred to as the propagation of chaos, seems contradictory at first sight: if two particles collide, correlations are created. Even though we could assume eqn [13] at some time, if the test particle collides with particle 2, such an equation cannot be satisfied anymore after the collision.

Before discussing the propagation of chaos hypothesis, we first analyze the size of the collision operator. We remark that, in practical situations for a rarefied gas, the combination $N r^3 \approx 10^{-4}\,\mathrm{cm}^3$ (i.e., the volume occupied by the particles) is very small, while $N r^2 = O(1)$. This implies that $G = O(1)$. Therefore, since we are dealing with a very large number of particles, we are tempted to perform the limit $N \to \infty$ and $r \to 0$ in such a way that $r^2 = O(N^{-1})$. As a consequence, the probability that two tagged particles collide (which is of the order of the surface of a ball, i.e., $O(r^2)$) is negligible. However, the probability that a given particle performs a collision with any one of the remaining $N - 1$ particles (which is $O(N r^2) = O(1)$) is not negligible. Therefore, condition [13] is referring to two preselected particles (say particles 1 and 2), so that it is not unreasonable to conceive that it holds in the limiting situation in which we are working.

However, we cannot insert [13] in [12] because this latter equation refers to instants before and after the collision and, if we know that a collision took place, we certainly cannot invoke eqn [13]. Hence, it is more convenient to assume eqn [13] in the loss term and work over the gain term to keep advantage
of the factorization property which will be assumed only before the collision.

Coming back to eqn [11] for the outgoing pair velocities $v$, $v_2$ (satisfying the condition $(v_2-v)\cdot n > 0$), we make use of the continuity property

$$f_2(x, v, x+nr, v_2) = f_2(x, v', x+nr, v_2') \qquad [14]$$

where the pair $v'$, $v_2'$ is pre-collisional. On $f_2$ expressed before the collision, we can reasonably apply condition [13] and obtain

$$G - L = (N-1)\,r^2\int dv_2\int_{S_-} dn\,(v-v_2)\cdot n\,\bigl[f(x, v')\,f(x-nr, v_2') - f(x, v)\,f(x+nr, v_2)\bigr] \qquad [15]$$

after a change $n\to -n$ in the gain term, so that both terms are integrated over the hemisphere $S_- = \{n \mid (v_2-v)\cdot n \le 0\}$. This transforms the pair $v'$, $v_2'$ from a pre-collisional to a post-collisional pair.

Finally, in the limit $N\to\infty$, $r\to 0$, $Nr^2 = \lambda^{-1}$, we find

$$(\partial_t + v\cdot\nabla_x) f = \lambda^{-1}\int dv_2\int_{S_-} dn\,(v-v_2)\cdot n\,\bigl[f(x, v')\,f(x, v_2') - f(x, v)\,f(x, v_2)\bigr] \qquad [16]$$

The parameter $\lambda$, called mean free path, represents, roughly speaking, the typical length a particle can cover without undergoing any collision. In eqns [1] and [2], we just chose $\lambda = 1$.

Equation [16] (or, equivalently, eqns [1] and [2]) is the Boltzmann equation for hard spheres. Such an equation has a statistical nature, and it is not equivalent to the Hamiltonian dynamics from which it has been derived. Indeed, the H-theorem shows that such an equation is not reversible in time, whereas reversibility is expected of any law of mechanics.

This concludes the heuristic preliminary analysis of the Boltzmann equation. We certainly know that the above arguments are delicate and require a more rigorous and deeper analysis. If we want the Boltzmann equation not to be a phenomenological model, derived by ad hoc assumptions and justified only by its practical relevance, but rather a consequence of a mechanical model, we must derive it rigorously. In particular, the propagation of chaos should be not a hypothesis but the statement of a theorem.

Beyond the Hard Spheres

The heuristic arguments we have developed so far can be extended to potentials other than that of the hard-sphere system. If the particles interact via a two-body interaction $V = V(r)$, the resulting Boltzmann equation is eqn [1], with

$$Q(f, f) = \int dv_1\int_{S^2} dn\, B(v-v_1; n)\,\bigl[f' f_1' - f f_1\bigr] \qquad [17]$$

where we are using the usual shorthand notation:

$$f' = f(x, v'),\quad f_1' = f(x, v_1'),\quad f = f(x, v),\quad f_1 = f(x, v_1) \qquad [18]$$

and $B = B(v-v_1; n)$ is a suitable function of the relative velocity $v-v_1$ and the impact parameter $n$, which is proportional to the cross section relative to the potential $V$. Another equivalent, sometimes more convenient, way to express eqn [17] is

$$Q(f, f) = \int dv_1\int dv'\int dv_1'\, W(v, v_1 \mid v', v_1')\,\bigl[f' f_1' - f f_1\bigr] \qquad [19]$$

with

$$W(v, v_1 \mid v', v_1') = w(v, v_1 \mid v', v_1')\,\delta(v + v_1 - v' - v_1')\,\delta\!\left(\tfrac{1}{2}\bigl(v^2 + v_1^2 - v'^2 - v_1'^2\bigr)\right) \qquad [20]$$

where $w$ is a suitable kernel. All the qualitative properties, such as the conservation laws and the H-theorem, are obviously still valid.

Consequences

The Boltzmann equation provoked a debate involving Loschmidt, Zermelo, and Poincaré, who outlined inconsistencies between the irreversibility of the equation and the reversible character of the Hamiltonian dynamics. Boltzmann argued the statistical nature of his equation, and his answer to the irreversibility paradox was that most of the configurations behave as expected by the thermodynamical laws. However, he did not have the probabilistic tools for formulating in a precise way the statements of which he had a precise intuition.

Grad (1949) stated clearly the limit $N\to\infty$, $r\to 0$, $Nr^2\to\mathrm{const.}$, where $N$ is the number of particles and $r$ is the diameter of the molecules, in which the Boltzmann equation is expected to hold. This limit is usually called the Boltzmann-Grad limit (BG limit in the sequel).

The problem of a rigorous derivation of the Boltzmann equation was an open and challenging problem for a long time. Lanford (1975) showed that, although for a very short time, the Boltzmann equation can be derived starting from the mechanical model of the hard-sphere system. The proof has a deep content but is relatively simple from a technical viewpoint.
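The Boltzmann-Grad scaling described above can be illustrated numerically. The following is a minimal sketch, not part of the original derivation: the function name `boltzmann_grad_scaling` and its parameter `inverse_mean_free_path` (standing for the constant value of $Nr^2$) are illustrative assumptions. It shows that when $r$ is tuned so that $Nr^2$ stays fixed, the occupied volume $Nr^3$ vanishes like $N^{-1/2}$.

```python
import math


def boltzmann_grad_scaling(n_particles, inverse_mean_free_path=1.0):
    """Return (r, N*r**2, N*r**3) with r chosen so that N*r**2 is held fixed.

    inverse_mean_free_path is an illustrative name for the constant N*r**2.
    """
    r = math.sqrt(inverse_mean_free_path / n_particles)
    return r, n_particles * r**2, n_particles * r**3


if __name__ == "__main__":
    # As N grows, N*r**2 stays constant while N*r**3 decays like N**(-1/2).
    for exponent in (6, 12, 18, 24):
        n = 10**exponent
        r, collision_scale, occupied_volume = boltzmann_grad_scaling(n)
        print(f"N=1e{exponent}: r={r:.3e}  N r^2={collision_scale:.3f}  "
              f"N r^3={occupied_volume:.3e}")
```

In this scaling the gas becomes infinitely dilute (the volume fraction goes to zero) while the collision rate per particle stays of order one, which is the regime in which the Boltzmann equation is expected to hold.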
Existence

The mathematical study of the Boltzmann equation starts with the problem of proving the existence of the solutions. One would like to be able to show that, for all (or at least for a physically significant family of) initial distributions (which are positive and summable functions) with finite momentum, energy, and entropy, there exists a unique solution to eqn [1] with the same mass, momentum, and energy as the initial distribution. Moreover, the entropy should decrease and the solution should approach the right Maxwellian as $t\to\infty$. The problem, in such a generality, is still unsolved, but several results in this direction have been achieved since the pioneering works due to Carleman (1933) for the homogeneous equation. Actually, there are satisfactory results for some special situations, such as homogeneous solutions (independent of $x$), or solutions close to the equilibrium, to the vacuum, or to homogeneous data. The most general result we have up to now is, unfortunately, not constructive. This is due to Di Perna and Lions (1989), who showed the existence of suitable weak solutions to eqn [1]. However, we still do not know whether such solutions, which preserve mass and momentum and satisfy the H-theorem, are unique and also preserve the energy.

Hydrodynamics

The derivation of hydrodynamical equations from the Boltzmann equation is a problem as old as the equation itself and, in fact, it goes back to Maxwell and Hilbert. Preliminary to the discussion of the hydrodynamic limit, we establish a few properties of the collision kernel.

It is a well-known fact that the only solution to the equation

$$Q(f, f) = 0 \qquad [21]$$

is a local Maxwellian, namely

$$f(x, v) := M(x, v) = \frac{\rho(x)}{(2\pi T(x))^{3/2}}\, e^{-|v-u(x)|^2/2T(x)} \qquad [22]$$

where the local parameters $\rho$, $u$, and $T$ satisfy the relations

$$\int M\,dv = \rho \qquad [23]$$

$$\int v\,M\,dv = \rho u \qquad [24]$$

$$\frac{1}{2}\int v^2 M\,dv = \frac{3}{2}\rho T + \frac{1}{2}\rho u^2 \qquad [25]$$

Moreover, the only solutions $h$ to the equation

$$\int h(v)\,Q(f, f)\,dv = 0 \qquad [26]$$

are the linear combinations of the quantities $(1, v, v^2)$, called collision invariants. The last property obviously corresponds to the mass, momentum, and energy conservation.

With this in mind, consider a change of variables in the Boltzmann equation [1], passing from microscopic to macroscopic variables, $x\to\varepsilon x$, $t\to\varepsilon t$. Here $\varepsilon$ is a small scale parameter expressing the ratio between the typical interparticle distances and the typical distances over which the macroscopic equations are varying. Such a change yields

$$(\partial_t + v\cdot\nabla_x) f_\varepsilon = \frac{1}{\varepsilon}\,Q(f_\varepsilon, f_\varepsilon) \qquad [27]$$

We need to allow the small parameter $\varepsilon$ (mean free path or the Knudsen number) to tend to zero. In order to eliminate the singularity on the right-hand side of [27], we multiply both sides by the collision invariants $v^\alpha$ with $\alpha = 0, 1, 2$, integrate in $v$, and obtain the five equations:

$$\int dv\, v^\alpha\,(\partial_t + v\cdot\nabla_x) f_\varepsilon = 0 \qquad [28]$$

On the other hand, if $f_\varepsilon$ converges to $f$ as $\varepsilon\to 0$, necessarily $Q(f, f) = 0$ and hence $f = M$. Therefore, we expect that in the limit $\varepsilon\to 0$,

$$\int dv\, v^\alpha\,(\partial_t + v\cdot\nabla_x) M = 0 \qquad [29]$$

Equation [29] fixes a relation among the fields $\rho$, $u$, $T$ as functions of $x$ and $t$. A standard computation gives us the Euler equations for a compressible gas

$$\partial_t\rho + \mathrm{div}(\rho u) = 0 \qquad [30]$$

$$\partial_t u + (u\cdot\nabla)u + \frac{1}{\rho}\nabla p = 0 \qquad [31]$$

$$\partial_t T + (u\cdot\nabla)T + \frac{2}{3}\,T\,\nabla\cdot u = 0 \qquad [32]$$

where the pressure $p$ is related to the density $\rho$ and the temperature $T$ by the perfect gas law

$$p = \rho T \qquad [33]$$
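The moment relations satisfied by the local Maxwellian [22] (mass [23], energy [25], and the corresponding momentum moment) can be checked numerically. The sketch below is an illustration under stated assumptions, not the article's own computation: the helper names `maxwellian` and `moments`, the grid half-width of 8 thermal widths, and the 65 points per axis are all choices made here for the example. It evaluates the three moments by a plain Riemann sum on a velocity grid centered at $u$.

```python
import numpy as np


def maxwellian(v, rho, u, T):
    """Local Maxwellian [22] evaluated at velocities v of shape (..., 3)."""
    sq = np.sum((v - u) ** 2, axis=-1)
    return rho / (2.0 * np.pi * T) ** 1.5 * np.exp(-sq / (2.0 * T))


def moments(rho, u, T, half_width=8.0, points=65):
    """Approximate the mass, momentum and energy moments of M by a Riemann sum.

    half_width (in units of sqrt(T)) and points are illustrative grid settings.
    """
    u = np.asarray(u, dtype=float)
    sigma = np.sqrt(T)
    axes = [np.linspace(c - half_width * sigma, c + half_width * sigma, points)
            for c in u]
    dv = np.prod([a[1] - a[0] for a in axes])      # volume of one grid cell
    vx, vy, vz = np.meshgrid(*axes, indexing="ij")
    v = np.stack([vx, vy, vz], axis=-1)
    M = maxwellian(v, rho, u, T)
    mass = M.sum() * dv                                   # compare with rho
    momentum = (v * M[..., None]).sum(axis=(0, 1, 2)) * dv  # compare with rho*u
    energy = 0.5 * (np.sum(v ** 2, axis=-1) * M).sum() * dv
    return mass, momentum, energy


if __name__ == "__main__":
    rho, u, T = 2.0, (0.3, -0.1, 0.5), 1.5
    mass, momentum, energy = moments(rho, u, T)
    print(mass, momentum, energy)
    print("expected energy:", 1.5 * rho * T + 0.5 * rho * np.dot(u, u))
```

Because the integrand is a Gaussian, the simple Riemann sum converges very quickly, so no special quadrature rule is needed for this check.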
Namely, he expressed a formal solution to eqn [27] in the form of a power series expansion:

$$f_\varepsilon = \sum_{j\ge 0} \varepsilon^j f_j \qquad [34]$$

where $f_0$ is the local Maxwellian, with the parameters $\rho$, $u$, $T$ satisfying the Euler equations. All the other coefficients $f_j$ of the expansion can be determined by recurrence, inverting suitable operators. However, the series is not expected to be convergent, so that the way to show the validity of the hydrodynamical limit rigorously is to truncate the expansion and to control the remainder. The first result in this direction was obtained by Caflisch (1980). However, this approach is based on the regularity of the solutions to the Euler equations, which is known to hold only for short times since shocks can be formed. How to approximate the shocks in terms of a kinetic description is still a difficult and open problem.

Note that the hydrodynamical picture of the Boltzmann equation just means that we are looking at the solutions of this equation at a suitable macroscopic scale. The rarefaction hypothesis underlying the Boltzmann description is reflected in the perfect gas law, which states that the particles, in the local thermal equilibrium, are free.

Stationary Problems

Stationary non-Maxwellian solutions to the Boltzmann equation should describe stationary nonequilibrium states exhibiting nontrivial flows. In spite of the physical relevance of these problems, not many complete mathematical results are, at the moment, available. Among them, there is the traveling-wave problem, which can be formulated in the following way. We look for a solution $f = f(x - ct, v)$, $f:\mathbb{R}\times\mathbb{R}^3\to\mathbb{R}^+$, constant in form but traveling with a constant velocity $c > 0$, to

$$(v_1 - c)\,f' = Q(f, f) \qquad [35]$$

where $v_1$ is the first component of $v$ and $f'$ denotes the spatial derivative of $f$. Equation [35] must be complemented by the boundary conditions $f\to M_\pm$ as $x\to\pm\infty$, where $M_\pm$ are the right and left Maxwellians, namely two prescribed equilibrium situations at infinity. The parameters (density, mean velocity, and temperature) of the Maxwellians, however, cannot be chosen arbitrarily. Indeed, the conservation of mass, momentum, and energy (which are properties of $Q$) implies the conservation (in $x$) of the fluxes of these quantities. Hence, we have to impose five equations that relate the upstream and the downstream values of the densities, mean velocities, and temperatures. Such relations are known in gas dynamics as the Rankine-Hugoniot conditions. A solution of this problem has been found by Caflisch and Nicolaenko (1983) in the case of a weak shock (namely, when $M_+$ and $M_-$ are close) by using Hilbert expansion techniques. More recently, Liu and Yu (2004) also established stability and positivity of this solution.

Quantum Kinetic Theory

Uehling and Uhlenbeck (1933) introduced the following kinetic equation for describing a large system of weakly interacting bosons or fermions:

$$(\partial_t + v\cdot\nabla_x) f = \int dv_1\int dv'\int dv_1'\, W(v, v_1 \mid v', v_1')\,\bigl\{(1\pm f)(1\pm f_1)\,f' f_1' - (1\pm f')(1\pm f_1')\,f f_1\bigr\} \qquad [36]$$

Here the $+$/$-$ signs stand for bosons/fermions, respectively, and

$$W(v, v_1 \mid v', v_1') = \bigl[\hat V(v'-v) \pm \hat V(v'-v_1)\bigr]^2\,\delta(v + v_1 - v' - v_1')\,\delta\!\left(\tfrac{1}{2}\bigl(v^2 + v_1^2 - v'^2 - v_1'^2\bigr)\right) \qquad [37]$$

Moreover,

$$\hat V(p) = \int dx\, e^{ip\cdot x}\,V(x) \qquad [38]$$

where $V$ is the interaction potential. Note that eqn [37] is the expression of the cross section of a quantum scattering in the Born approximation.

The unknown $f = f(x, v; t)$ in eqn [36] is the expected number of molecules falling in the unit (quantum) cell of the phase space. This function is proportional to the one-particle Wigner function, introduced by Wigner (1932) to handle kinetic problems in quantum mechanics, and defined as (setting $\hbar = 1$):

$$\frac{1}{(2\pi)^3}\int dy\, e^{iy\cdot v}\,\rho\!\left(x - \tfrac{1}{2}y;\, x + \tfrac{1}{2}y\right)$$

where $\rho(x; z)$ is the kernel of a one-particle density matrix. Basically, the Wigner function is an equivalent way to describe a state of a quantum system. For instance, eqn [40] below expresses the equilibrium distributions for bosons and fermions in terms of Wigner functions. In general, the Wigner functions, due to the uncertainty principle, are real but not necessarily positive; however, the integral with respect to $x$ and $v$ gives the probability
distributions of the velocity and the position, respectively. In the kinetic regime, in which we are interested, the scales are mesoscopic, namely the typical quantum oscillations are on a scale much smaller than the characteristic scales of the problem, so that we expect that $f$ should be a genuine probability distribution, since the Heisenberg principle does not play an essential role. However, the interaction occurs on a microscopic scale, so that we expect that the statistics play a role in addition to the quantum rules for the scattering.

In this framework, the entropy functional is

$$H(f) = \int dx\int dv\,\bigl[f(x, v)\log f(x, v) \mp (1\pm f(x, v))\log(1\pm f(x, v))\bigr] \qquad [39]$$

It is decreasing along the solutions to eqn [36] and it is also minimized (among the distributions with given mass, momentum, and energy) by the equilibria

$$M(v) = \frac{z}{e^{(\beta/2)|v-u|^2} \mp z} \qquad [40]$$

namely the Bose-Einstein and the Fermi-Dirac distributions, respectively. Here $\beta > 0$ and $z > 0$ are the inverse temperature and the activity, respectively. Note that, for the Bose-Einstein distribution, $z < 1$. This creates, in a sense, an inconsistency with eqn [36]. Indeed, assuming $u = 0$ and an initial distribution $f = f_0(v)$ with the density larger than the maximal density allowed by eqn [40], namely

$$\rho_c := \int dv\,\frac{1}{e^{(\beta/2)v^2} - 1} \qquad [41]$$

it cannot converge to any equilibrium. In order to overcome this difficulty related to the Bose condensation, one can enlarge the definition of the equilibria family by setting

$$M(v) = \frac{1}{e^{(\beta/2)v^2} - 1} + \mu\,\delta(v) \qquad [42]$$

to take care of the excess of mass by means of a condensate component. However, it is not clear whether eqn [36] can actually describe the Bose condensation, since its derivation from the Schrödinger equation requires, from the very beginning, the existence of bosonic quasifree states which can be constructed only if the density is moderate. Further analyses are certainly needed to clarify the situation. A rigorous derivation of the Uehling and Uhlenbeck equation is, up to now, far from being obtained even for short times; nevertheless, such an equation is extensively used in the applications.

Equation [36] concerns a weakly interacting gas of quantum particles. From a mathematical viewpoint, it is expected to be valid in the so-called weak-coupling limit, which consists in scaling space and time and the interaction potential as

$$x\to\varepsilon x,\quad t\to\varepsilon t,\quad V\to\sqrt{\varepsilon}\,V \qquad [43]$$

where $\varepsilon^{-1} = N^{1/3}$ is a parameter diverging when the number of particles $N$ tends to infinity.

We mention, incidentally, that under such a scaling, a classical system is described by a transport equation, called the Fokker-Planck-Landau equation, with a diffusion operator in the velocity space.

The BG limit considered for classical particle systems is different from that considered here for weakly interacting quantum systems. It is actually equivalent to rescaling space and time according to

$$x\to\varepsilon x,\quad t\to\varepsilon t \qquad [44]$$

leaving the interaction unscaled but, in order to control the total interaction, we make the density diverge gently as $\varepsilon^{-1} = N^{1/2}$. A quantum system under such a scaling is expected to be described by a Boltzmann equation [1] with the collision operator $Q$ computed with the full quantum cross section. Now we do not have any effect of the statistics because in this rarefaction limit these corrections disappear. On the other hand, the cross section is that arising from the analysis of the quantum scattering. Since we do not rescale the interaction, all the other terms in the Born expansion of the cross section play a role. This kind of Boltzmann equation is a good description of a rarefied gas in which quantum effects are not negligible.

See also: Adiabatic Piston; Evolution Equations: Linear and Nonlinear; Gravitational N-Body Problem (Classical); Interacting Particle Systems and Hydrodynamic Equations; Kinetic Equations; Multiscale Approaches; Nonequilibrium Statistical Mechanics: Dynamical Systems Approach; Quantum Dynamical Semigroups.

Further Reading

Balescu R (1978) Equilibrium and Nonequilibrium Statistical Mechanics. Moscow: Mir (distributed by Imported Publications, Chicago, Ill).
Caflisch RE (1980) The fluid dynamical limit of the nonlinear Boltzmann equation. Communications on Pure and Applied Mathematics 33: 651-666.
Caflisch RE and Nicolaenko B (1983) Shock waves and the Boltzmann equation. Nonlinear partial differential equations. Contemporary Mathematics 17: 35-44.
Carleman T (1933) Sur la théorie de l'équation intégro-différentielle de Boltzmann. Acta Mathematica 60: 91-146.
Cercignani C (1998) Ludwig Boltzmann. The Man Who Trusted Atoms. Oxford: Oxford University Press.
Cercignani C, Illner R, and Pulvirenti M (1994) The Mathematical Theory of Dilute Gases. Springer Series in Applied Mathematics, vol. 106. New York: Springer.
Di Perna RJ and Lions P-L (1989) On the Cauchy problem for the Boltzmann equations: Global existence and weak stability. Annals of Mathematics 130: 321-366.
Grad H (1949) On the kinetic theory of rarefied gases. Communications on Pure and Applied Mathematics 2: 331-407.
Hilbert D (1916) Begründung der Kinetischen Gastheorie. Mathematische Annalen 72: 331-407.
Lanford OE III (1975) Time evolution of large classical systems. In: Ehlers J, Hepp K, and Weidenmüller HA (eds.) Lecture Notes in Physics, vol. 38, pp. 1-111. Berlin: Springer.
Liu T-P and Yu S-H (2004) Boltzmann equation: micro-macro decompositions and positivity of shock profiles. Communications in Mathematical Physics 246(1): 133-179.
Spohn H (1994) Quantum kinetic equations. In: Fannes M, Maes C, and Verbeure A (eds.) On Three Levels: Micro, Meso and Macro Approaches in Physics. New York: Plenum.
Uehling EA and Uhlenbeck GE (1933) Transport phenomena in Einstein-Bose and Fermi-Dirac gases. I. Physical Review 43: 552-561.
Wigner EP (1932) On the quantum correction for thermodynamic equilibrium. Physical Review 40: 749-759.
Bose-Einstein Condensates
F Dalfovo, L P Pitaevskii, and S Stringari, Università di Trento, Povo, Italy
© 2006 Elsevier Ltd. All rights reserved.

Introduction

In 1924 the Indian physicist S N Bose introduced a new statistical method to derive the blackbody radiation law in terms of a gas of light quanta (photons). His work, together with the contemporary de Broglie's idea of matter-wave duality, led A Einstein to apply the same statistical approach to a gas of $N$ indistinguishable particles of mass $m$. An amazing result of his theory was the prediction that below some critical temperature a finite fraction of all the particles condense into the lowest-energy single-particle state. This phenomenon, named Bose-Einstein condensation (BEC), is a consequence of purely statistical effects. For several years, such a prediction received little attention, until 1938, when F London argued that BEC could be at the basis of the superfluid properties observed in liquid $^4$He below 2.17 K. A strong boost to the investigation of Bose-Einstein condensates was given in 1995 by the observation of BEC in dilute gases confined in magnetic traps and cooled down to temperatures of the order of a few nK. Differently from superfluid helium, these gases allow one to tune the relevant parameters (confining potential, particle density, interactions, etc.), so as to make them an ideal test-ground for concepts and theories on BEC.

What Is BEC?

In nature, particles have either integer or half-integer spin. Those having half-integer spin, like electrons, are called fermions and obey the Fermi-Dirac statistics; those having integer spin are called bosons and obey the Bose-Einstein statistics. Let us consider a system of $N$ bosons. In order to introduce the concept of BEC on a general ground, one can start with the definition of the one-body density matrix

$$n^{(1)}(r, r') = \langle\hat\Psi^\dagger(r)\,\hat\Psi(r')\rangle \qquad [1]$$

The quantities $\hat\Psi^\dagger(r)$ and $\hat\Psi(r)$ are the field operators which create and annihilate a particle at point $r$, respectively; they satisfy the bosonic commutation relations

$$[\hat\Psi(r), \hat\Psi^\dagger(r')] = \delta(r - r'),\qquad [\hat\Psi(r), \hat\Psi(r')] = 0 \qquad [2]$$

If the system is in a pure state described by the $N$-body wave function $\Psi(r_1, \ldots, r_N)$, then the average [1] is taken following the standard rules of quantum mechanics and the one-body density matrix can be written as

$$n^{(1)}(r, r') = N\int dr_2\cdots dr_N\,\Psi^*(r, r_2, \ldots, r_N)\,\Psi(r', r_2, \ldots, r_N) \qquad [3]$$

involving the integration over the $N-1$ variables $r_2, \ldots, r_N$. In the more general case of a statistical mixture of pure states, expression [3] must be averaged according to the probability for a system to occupy the different states.

Since $n^{(1)}(r, r') = (n^{(1)}(r', r))^*$, the quantity $n^{(1)}$, when regarded as a matrix function of its indices $r$ and $r'$, is Hermitian. It is therefore always possible to find a complete orthonormal basis of single-particle eigenfunctions, $\varphi_i(r)$, in terms of which the density matrix takes the diagonal form

$$n^{(1)}(r, r') = \sum_i n_i\,\varphi_i^*(r)\,\varphi_i(r') \qquad [4]$$

The real eigenvalues $n_i$ are subject to the normalization condition $\sum_i n_i = N$ and have the meaning of occupation numbers of the single-particle states $\varphi_i$. BEC occurs when one of these numbers (say, $n_0$) becomes macroscopic, that is, when $n_0\equiv N_0$ is a number of order $N$, all the others remaining of order 1.
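The diagonalization [4] can be tried out on a small discretized example. The sketch below is only an illustration: the function names `one_body_density_matrix` and `occupation_numbers`, the one-dimensional periodic grid, and the chosen occupations are assumptions made for the example, not part of the article. It builds $n^{(1)}$ from plane-wave orbitals with prescribed occupations, diagonalizes the corresponding integral operator, and reads off the occupation numbers and the ratio $n_0/N$.

```python
import numpy as np


def one_body_density_matrix(occupations, modes, grid_points=64, length=1.0):
    """Build n1(x, x') = sum_i n_i conj(phi_i(x)) phi_i(x') on a periodic 1D grid.

    The orbitals are plane waves phi_k(x) = exp(2*pi*i*k*x/L) / sqrt(L).
    """
    x = np.arange(grid_points) * (length / grid_points)
    n1 = np.zeros((grid_points, grid_points), dtype=complex)
    for n_i, k in zip(occupations, modes):
        phi = np.exp(2j * np.pi * k * x / length) / np.sqrt(length)
        n1 += n_i * np.outer(phi.conj(), phi)
    return x, n1


def occupation_numbers(n1, dx):
    """Eigenvalues of the integral operator with kernel n1, largest first.

    Multiplying by dx turns the integral over x' into a matrix product.
    """
    eigenvalues = np.linalg.eigvalsh(n1 * dx)   # Hermitian matrix, real eigenvalues
    return eigenvalues[::-1]


if __name__ == "__main__":
    # Hypothetical occupations: 900 particles in k = 0, the rest in excited modes.
    x, n1 = one_body_density_matrix([900.0, 60.0, 30.0, 10.0], [0, 1, -1, 2])
    occ = occupation_numbers(n1, x[1] - x[0])
    total = occ.sum()
    print("largest occupations:", np.round(occ[:4], 6))
    print("N =", round(total, 6), " n0/N =", round(occ[0] / total, 6))
```

In this toy state one eigenvalue is comparable to $N$ while the others are much smaller, which is the signature of macroscopic occupation described above.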
In this case eqn [4] can be conveniently rewritten in the form

$$n^{(1)}(r, r') = N_0\,\varphi_0^*(r)\,\varphi_0(r') + \sum_{i\ne 0} n_i\,\varphi_i^*(r)\,\varphi_i(r') \qquad [5]$$

and the state represented by $\varphi_0(r)$ is called Bose-Einstein condensate. This definition is rather general, since it applies to any macroscopic ($N\gg 1$) system of indistinguishable bosons independently of mutual interactions and external fields.

The one-body density matrix [1] contains information on important physical observables. By setting $r = r'$ one finds the diagonal density of the system

$$n(r) = n^{(1)}(r, r) = \langle\hat\Psi^\dagger(r)\,\hat\Psi(r)\rangle \qquad [6]$$

with $N = \int dr\, n(r)$. The off-diagonal components can instead be used to calculate the momentum distribution

$$n(p) = \langle\hat\Psi^\dagger(p)\,\hat\Psi(p)\rangle \qquad [7]$$

where $\hat\Psi(p) = (2\pi\hbar)^{-3/2}\int dr\,\hat\Psi(r)\exp[-ip\cdot r/\hbar]$ is the field operator in momentum representation. By inserting this expression for $\hat\Psi(p)$ into eqn [7] one finds

$$n(p) = \frac{1}{(2\pi\hbar)^3}\int dR\,ds\; n^{(1)}\!\left(R + \frac{s}{2},\, R - \frac{s}{2}\right) e^{ip\cdot s/\hbar} \qquad [8]$$

where $s = r - r'$ and $R = (r + r')/2$.

Let us consider a uniform system of $N$ particles in a volume $V$ and take the thermodynamic limit $N, V\to\infty$ with density $N/V$ kept fixed. The eigenfunctions of the density matrix are plane waves and the lowest-energy state has zero momentum, $p = 0$, and constant wave function $\varphi_0(r) = V^{-1/2}$. BEC in this state implies a macroscopic number of particles having zero momentum and constant density $N_0/V$. The density matrix only depends on $s = r - r'$ and can be written as

$$n^{(1)}(s) = \frac{N_0}{V} + \frac{1}{V}\sum_{p\ne 0} n_p\, e^{-ip\cdot s/\hbar} \qquad [9]$$

In the $s\to\infty$ limit, the sum on the right vanishes due to destructive interference between different plane waves, but the first term survives. One thus finds that, in the presence of BEC, the one-body density matrix tends to a constant finite value at large distances. This behavior is named off-diagonal long-range order, since it involves the off-diagonal components of the density matrix. Its counterpart in momentum space is the appearance of a singular term at $p = 0$:

$$n(p) = N_0\,\delta(p) + \sum_{p'\ne 0} n_{p'}\,\delta(p - p') \qquad [10]$$

The sum on the right is the number of noncondensed particles ($N - N_0$), and the quantity $N_0/N$ is called condensate fraction.

If the system is not uniform, the eigenfunctions of the density matrix are no longer plane waves but, provided $N$ is sufficiently large, the concept of BEC is still well defined, being associated with the occurrence of a macroscopic occupation of a single-particle eigenfunction $\varphi_0(r)$ of the density matrix. Thus, the condensed bosons can be described by means of the function $\Psi(r) = \sqrt{N_0}\,\varphi_0(r)$, which is a classical complex field playing the role of an order parameter. This is the analog of the classical limit of quantum electrodynamics, where the electromagnetic field replaces the microscopic description of photons. The function $\Psi$ may also depend on time and can be written as

$$\Psi(r, t) = |\Psi(r, t)|\,e^{iS(r, t)} \qquad [11]$$

Its modulus determines the contribution of the condensate to the diagonal density [6], while the phase $S$ is crucial in characterizing the coherence and superfluid properties of the system. The order parameter [11], also named macroscopic wave function or condensate wave function, is defined only up to a constant phase factor. One can always multiply this function by the numerical factor $e^{i\alpha}$ without changing any physical property. This reflects the gauge symmetry exhibited by all the physical equations of the problem. Making an explicit choice for the value of the order parameter, and hence for the phase, corresponds to a formal breaking of gauge symmetry.

BEC in Ideal Gases

Once we have defined what a Bose-Einstein condensate is, the next question is when such a condensation occurs in a given system. The ideal Bose gas provides the simplest example. So, let us consider a gas of noninteracting bosons described by the Hamiltonian $\hat H = \sum_i \hat H_i^{(1)}$, where the Schrödinger equation $\hat H^{(1)}\varphi_i(r) = \epsilon_i\,\varphi_i(r)$ gives the spectrum of single-particle wave functions and energies. One can define an occupation number $n_i$ as the number of particles in the state with energy $\epsilon_i$. Thus, any given state of the many-body system is specified by a set $\{n_i\}$. The mean occupation numbers, $\bar n_i$, can be calculated by using the standard rules of statistical mechanics. For instance, by considering a grand canonical ensemble at temperature $T$, one finds

$$\bar n_i = \{\exp[\beta(\epsilon_i - \mu)] - 1\}^{-1} \qquad [12]$$
with $\beta = 1/(k_B T)$. The chemical potential $\mu$ is fixed by the normalization condition $\sum_i \bar n_i = N$, where $N$ is the average number of particles in the gas. For $T\to\infty$ the chemical potential is negative and large. It increases monotonically when $T$ is lowered. Let us call $\epsilon_0$ the lowest single-particle level in the spectrum. If at some critical temperature $T_c$ the normalization condition can be satisfied with $\mu\to\epsilon_0$, then the occupation of the lowest state, $\bar n_0 = N_0$, becomes of order $N$ and BEC is realized. Below $T_c$ the normalization condition must be replaced with $N = N_0 + N_T$, where $N_T = \sum_{i\ne 0}\bar n_i$ is the number of particles out of the condensate, that is, the thermal component of the gas. Whether BEC occurs or not, and what the value of $T_c$ is, depends on the dimensionality of the system and the type of single-particle spectrum.

The simplest case is that of a gas confined in a cubic box of volume $V = L^3$ with periodic boundary conditions, where $\hat H^{(1)} = -(\hbar^2/2m)\nabla^2$. The eigenfunctions are plane waves $\varphi_p(r) = V^{-1/2}\exp[ip\cdot r/\hbar]$, with energy $\epsilon_p = p^2/2m$ and momentum $p = 2\pi\hbar n/L$. Here $n$ is a vector whose components $n_x$, $n_y$, $n_z$ are 0 or $\pm$ integers. The lowest eigenvalue has zero energy ($\epsilon_0 = 0$) and zero momentum. The mean occupation numbers are given by $\bar n_p = \{\exp[\beta(p^2/2m - \mu)] - 1\}^{-1}$. In the thermodynamic limit ($N, V\to\infty$ with $N/V$ kept constant), one can replace the sum $\sum_p$ with the integral $\int d\epsilon\,\rho(\epsilon)$, where $\rho(\epsilon) = (2\pi)^{-2}V(2m/\hbar^2)^{3/2}\sqrt{\epsilon}$ is the density of states. In this way, one can calculate the thermal component of the gas as a function of $T$, finding the critical temperature

$$k_B T_c = \frac{2\pi\hbar^2}{m}\left(\frac{N}{V\,\zeta(3/2)}\right)^{2/3} \qquad [13]$$

where $\zeta$ is the Riemann zeta function and $\zeta(3/2)\simeq 2.612$. For $T > T_c$, one has $\mu < 0$ and $N_T = N$. For $T < T_c$ one instead has $\mu = 0$, $N_T = N - N_0$ and

$$N_0(T) = N\left[1 - (T/T_c)^{3/2}\right] \qquad [14]$$

The critical temperature turns out to be fully determined by the density $N/V$ and by the mass of the constituents. These results were first obtained by A Einstein in his seminal paper and used by F London in the context of superfluid helium. We notice that the replacement of the sum with an integral in the above derivation is justified only if the thermal energy $k_B T$ is much larger than the energy spacing between single-particle levels, that is, if $k_B T\gg\hbar^2/2mV^{2/3}$. It is also worth noticing that the above expression for $T_c$ can be rewritten in a form equivalent to saying that BEC occurs when the mean distance between bosons is of the order of their de Broglie wavelength.

Another interesting case, which is relevant for the recent experiments with BEC in dilute gases confined in magnetic and/or optical traps, is that of an ideal gas subject to harmonic potentials. Let us consider, for simplicity, an isotropic external potential $V_{\rm ext}(r) = (1/2)m\omega_{\rm ho}^2 r^2$. The single-particle Hamiltonian is $\hat H^{(1)} = -(\hbar^2/2m)\nabla^2 + V_{\rm ext}(r)$ and its eigenvalues are $\epsilon_{n_x, n_y, n_z} = (n_x + n_y + n_z + 3/2)\hbar\omega_{\rm ho}$. The corresponding density of states is $\rho(\epsilon) = (1/2)(\hbar\omega_{\rm ho})^{-3}\epsilon^2$. A natural thermodynamic limit for this system is obtained by letting $N\to\infty$ and $\omega_{\rm ho}\to 0$, while keeping the product $N\omega_{\rm ho}^3$ constant. The condition for BEC to occur is that $\mu$ approaches the value $\epsilon_{000} = (3/2)\hbar\omega_{\rm ho}$ from below by cooling the gas down to $T_c$. Following the same procedure as for the uniform gas, one finds

$$k_B T_c = \hbar\omega_{\rm ho}\,(N/\zeta(3))^{1/3} = 0.94\,\hbar\omega_{\rm ho}\,N^{1/3} \qquad [15]$$

and

$$N_0(T) = N\left[1 - (T/T_c)^3\right] \qquad [16]$$

Notice that the condensate is not uniform in this case, since it corresponds to the lowest eigenfunction of the harmonic oscillator, which is a Gaussian of width $a_{\rm ho} = [\hbar/(m\omega_{\rm ho})]^{1/2}$. Correspondingly, the condensate in the momentum space is also a Gaussian, of width $a_{\rm ho}^{-1}$. This implies that, differently from the gas in a box, here the condensate can be seen both in coordinate and momentum space in the form of a narrow distribution emerging from a wider thermal component. Finally, results [15] and [16] remain valid even for anisotropic harmonic potentials, with trapping frequencies $\omega_x$, $\omega_y$, and $\omega_z$, provided the frequency $\omega_{\rm ho}$ is replaced by the geometric average $(\omega_x\omega_y\omega_z)^{1/3}$.

BEC in Interacting Gases

Actual condensates are made of interacting particles. The full many-body Hamiltonian is

$$\hat H = \int dr\,\hat\Psi^\dagger(r)\,\hat H_0\,\hat\Psi(r) + \frac{1}{2}\int dr'\,dr\,\hat\Psi^\dagger(r)\,\hat\Psi^\dagger(r')\,V(r - r')\,\hat\Psi(r')\,\hat\Psi(r) \qquad [17]$$

where $V(r - r')$ is the particle-particle interaction and $\hat H_0 = -(\hbar^2/2m)\nabla^2 + V_{\rm ext}(r)$. Differently from the case of ideal gases, $\hat H$ is no longer a sum of single-particle Hamiltonians. However, the general definitions given in the section "What Is BEC?" are still valid.
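As a numerical aside on the ideal-gas results [13]-[16] above, the critical temperatures and condensate fractions can be evaluated directly. The sketch below is a hedged illustration: the function names, the rounded zeta values, and the sample inputs (a helium-like number density and a trap with $10^6$ atoms at 100 Hz) are assumptions chosen for the example rather than values taken from the article, apart from $\zeta(3/2)\simeq 2.612$.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J s
K_B = 1.380649e-23       # Boltzmann constant, J / K
ZETA_3_2 = 2.612         # zeta(3/2), value quoted in the text
ZETA_3 = 1.202           # zeta(3), rounded (assumed here for the example)


def tc_uniform(number_density, mass):
    """Critical temperature [13] of an ideal Bose gas in a box, in kelvin."""
    return (2.0 * math.pi * HBAR**2 / mass) \
        * (number_density / ZETA_3_2) ** (2.0 / 3.0) / K_B


def tc_harmonic(n_atoms, omega_ho):
    """Critical temperature [15] in an isotropic harmonic trap, in kelvin."""
    return HBAR * omega_ho * (n_atoms / ZETA_3) ** (1.0 / 3.0) / K_B


def harmonic_prefactor():
    """Numerical factor zeta(3)**(-1/3) that appears as 0.94 in [15]."""
    return (1.0 / ZETA_3) ** (1.0 / 3.0)


def condensate_fraction(t_over_tc, exponent):
    """N0/N from [14] (exponent 1.5, box) or [16] (exponent 3, harmonic trap)."""
    if t_over_tc >= 1.0:
        return 0.0
    return 1.0 - t_over_tc**exponent


if __name__ == "__main__":
    # Illustrative inputs only: helium-4 mass and a liquid-helium-like density.
    print("box Tc [K]:", tc_uniform(2.18e28, 6.646e-27))
    # Illustrative trap: one million atoms, trap frequency 100 Hz.
    print("trap Tc [K]:", tc_harmonic(1.0e6, 2.0 * math.pi * 100.0))
    print("prefactor:", harmonic_prefactor())
    print("N0/N at T = Tc/2:", condensate_fraction(0.5, 1.5),
          condensate_fraction(0.5, 3.0))
```

The steeper exponent in [16] means that, at the same reduced temperature $T/T_c$, the trapped gas has a larger condensate fraction than the gas in a box.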
valid. In particular, one can write $n^{(1)}(\mathbf{r},\mathbf{r}') = \Psi^*(\mathbf{r})\Psi(\mathbf{r}') + \tilde n^{(1)}(\mathbf{r},\mathbf{r}')$, where $\Psi$ is the order parameter of the condensate ($\Psi^*(\mathbf{r})\Psi(\mathbf{r}')$ being of order $N$), while $\tilde n^{(1)}(\mathbf{r},\mathbf{r}')$ vanishes for large $|\mathbf{r}-\mathbf{r}'|$. This is equivalent to saying that the bosonic field operator splits in two parts,

$$\hat\Psi(\mathbf{r}) = \Psi(\mathbf{r}) + \delta\hat\Psi(\mathbf{r}) \qquad [18]$$

where the first term is a complex function and the second one is the field operator associated with the noncondensed particles. This decomposition is particularly useful when the depletion of the condensate, that is, the fraction of noncondensed particles, is small. This happens when the interaction is weak, but also for particles with arbitrary interaction, provided the gas is dilute. In this case, one can expand the many-body Hamiltonian by treating the operator $\delta\hat\Psi$ as a small quantity.

A suitable strategy consists in writing the Heisenberg equation for the evolution of the field operators, $i\hbar\,\partial_t \hat\Psi = [\hat\Psi, \hat H]$, using the many-body Hamiltonian [17]:

$$i\hbar\,\partial_t \hat\Psi(\mathbf{r},t) = \left[\hat H_0 + \int d\mathbf{r}'\, \hat\Psi^\dagger(\mathbf{r}',t) V(\mathbf{r}-\mathbf{r}') \hat\Psi(\mathbf{r}',t)\right] \hat\Psi(\mathbf{r},t) \qquad [19]$$

The zeroth order is thus obtained by replacing the operator $\hat\Psi$ with the classical field $\Psi$. In the integral containing the interaction $V(\mathbf{r}-\mathbf{r}')$, this replacement is, in general, a poor approximation when short distances $(\mathbf{r}-\mathbf{r}')$ are involved. In a dilute and cold gas, one can nevertheless obtain a proper expression for the interaction term by observing that, in this case, only binary collisions at low energy are relevant and these collisions are characterized by a single parameter, the s-wave scattering length, $a$, independently of the details of the two-body potential. This allows one to replace $V(\mathbf{r}-\mathbf{r}')$ in $\hat H$ with an effective interaction $V(\mathbf{r}-\mathbf{r}') = g\,\delta(\mathbf{r}-\mathbf{r}')$, where the coupling constant $g$ is given by $g = 4\pi\hbar^2 a/m$. The scattering length can be measured with several experimental techniques or calculated from the exact two-body potential. Using this pseudopotential and replacing the operator $\hat\Psi$ with the complex function $\Psi$ in the Heisenberg equation of motion, one gets

$$i\hbar\,\partial_t \Psi(\mathbf{r},t) = \left(-\frac{\hbar^2\nabla^2}{2m} + V_{\rm ext}(\mathbf{r}) + g|\Psi(\mathbf{r},t)|^2\right)\Psi(\mathbf{r},t) \qquad [20]$$

This is known as the Gross–Pitaevskii (GP) equation and it was first introduced in 1961. It has the form of a nonlinear Schrödinger equation, the nonlinearity coming from the mean-field term, proportional to $|\Psi|^2$. It has been derived assuming that $N$ is large while the fraction of noncondensed atoms is negligible. On the one hand, this means that quantum fluctuations of the field operator have to be small, which is true when $n|a|^3 \ll 1$, where $n$ is the particle density. In fact, one can show that, at $T = 0$, the quantum depletion of the condensate is proportional to $(n|a|^3)^{1/2}$. On the other hand, thermal fluctuations also have to be negligible and this means that the theory is limited to temperatures much lower than $T_c$. Within these limits, one can identify the total density with the condensate density.

The stationary solution of eqn [20] corresponds to the condensate wave function in the ground state. One can write $\Psi(\mathbf{r},t) = \Psi_0(\mathbf{r}) \exp(-i\mu t/\hbar)$, where $\mu$ is the chemical potential. Then the GP equation [20] becomes

$$\left(-\frac{\hbar^2\nabla^2}{2m} + V_{\rm ext}(\mathbf{r}) + g|\Psi_0(\mathbf{r})|^2\right)\Psi_0(\mathbf{r}) = \mu\,\Psi_0(\mathbf{r}) \qquad [21]$$

where $n(\mathbf{r}) = |\Psi_0(\mathbf{r})|^2$ is the particle density. The same equation can be obtained by minimizing the energy of the system written as a functional of the density:

$$E[n] = \int d\mathbf{r} \left[\frac{\hbar^2}{2m}\,|\nabla\sqrt{n}|^2 + n V_{\rm ext}(\mathbf{r}) + \frac{g n^2}{2}\right] \qquad [22]$$

The first term on the right corresponds to the quantum kinetic energy coming from the uncertainty principle; it is usually named "quantum pressure" and vanishes for uniform systems.

The next order in $\delta\hat\Psi$ gives the excited states of the condensate. In a uniform gas the ground-state order parameter, $\Psi_0$, is a constant and the first-order expansion of $\hat H$ was introduced by N Bogoliubov in 1947. In particular, he found an elegant way to diagonalize the Hamiltonian by using simple linear combinations of particle creation and annihilation operators. These are known as Bogoliubov's transformations and lie at the basis of the concept of quasiparticle, one of the most important concepts in quantum many-body theory.

A generalization of Bogoliubov's approach to the case of nonuniform condensates is obtained by considering small deviations around the ground state in the form

$$\Psi(\mathbf{r},t) = e^{-i\mu t/\hbar}\left[\Psi_0(\mathbf{r}) + u(\mathbf{r})\,e^{-i\omega t} + v^*(\mathbf{r})\,e^{i\omega t}\right] \qquad [23]$$

Inserting this expression into eqn [20] and keeping terms linear in the complex functions $u$ and $v$, one gets

$$\hbar\omega\, u(\mathbf{r}) = \left[\hat H_0 - \mu + 2g\Psi_0^2(\mathbf{r})\right] u(\mathbf{r}) + g\Psi_0^2(\mathbf{r})\, v(\mathbf{r}) \qquad [24]$$

$$-\hbar\omega\, v(\mathbf{r}) = \left[\hat H_0 - \mu + 2g\Psi_0^2(\mathbf{r})\right] v(\mathbf{r}) + g\Psi_0^2(\mathbf{r})\, u(\mathbf{r}) \qquad [25]$$
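As a minimal numerical sketch of eqns [24] and [25], consider the uniform gas. The assumptions here are ours, not the article's: $u$ and $v$ are taken as plane waves, so that $\hat H_0$ acts as the free-particle energy `eps_q` $= \hbar^2 q^2/2m$, and $\mu = gn$ with $\Psi_0^2 = n$. The two equations then reduce to a 2×2 eigenvalue problem for $\hbar\omega$. The function names and the dimensionless energy units are illustrative choices only.

```python
import cmath
import math


def bogoliubov_modes(eps_q, gn):
    """Eigenvalues hbar*omega of eqns [24]-[25] for a uniform gas.

    Assumes plane-wave u, v, so H0 -> eps_q, and mu = g*n, Psi0^2 = n.
    Energies are in arbitrary (but common) units.
    """
    a = eps_q + gn  # H0 - mu + 2 g n  ->  eps_q - gn + 2 gn
    b = gn          # g * Psi0^2
    # hbar*omega * (u, v)^T = [[a, b], [-b, -a]] (u, v)^T
    trace = a + (-a)
    det = a * (-a) - b * (-b)
    disc = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + disc) / 2.0, (trace - disc) / 2.0


def bogoliubov_closed_form(eps_q, gn):
    """Closed form obtained by eliminating v: sqrt(eps_q * (eps_q + 2 g n))."""
    return math.sqrt(eps_q * (eps_q + 2.0 * gn))


if __name__ == "__main__":
    gn = 1.5
    for eps_q in (0.01, 1.0, 100.0):
        plus, _minus = bogoliubov_modes(eps_q, gn)
        print(eps_q, plus.real, bogoliubov_closed_form(eps_q, gn))
```

The eigenvalues come in a $\pm$ pair, reflecting the symmetry of [24] and [25] under $(u, v, \omega) \to (v, u, -\omega)$. For repulsive interactions ($gn > 0$) they are real, and the positive one grows like $\sqrt{2gn\,\epsilon_q}$ at small `eps_q` and approaches `eps_q` at large `eps_q`.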
316 BoseEinstein Condensates
These coupled equations allow one to calculate the energies $\varepsilon = \hbar\omega$ of the excitations. They also give the so-called quasiparticle amplitudes $u$ and $v$, which obey the normalization condition

$$\int d\mathbf{r}\,\left[u_i^*(\mathbf{r})u_j(\mathbf{r}) - v_i^*(\mathbf{r})v_j(\mathbf{r})\right] = \delta_{ij}$$

In a uniform gas, $u$ and $v$ are plane waves and one recovers the famous Bogoliubov spectrum

$$\hbar\omega = \left[\frac{\hbar^2 q^2}{2m}\left(\frac{\hbar^2 q^2}{2m} + 2gn\right)\right]^{1/2} \qquad [26]$$

where $\mathbf{q}$ is the wave vector of the excitations. For large momenta the spectrum coincides with the free-particle energy $\hbar^2 q^2/2m$. At low momenta, it instead gives the phonon dispersion $\omega = cq$, where $c = [gn/m]^{1/2}$ is the Bogoliubov sound velocity. The transition between the two regimes occurs when the excitation wavelength is of the order of the healing length,

$$\xi = (8\pi n a)^{-1/2} = \frac{\hbar}{\sqrt{2}\,mc} \qquad [27]$$

which is an important length scale for superfluidity. When the order parameter is forced to vanish at some point (by an impurity, a wall, etc.), the healing length provides the typical distance over which it recovers its bulk value. In a nonuniform condensate the excitations are no longer plane waves but, at low energy, they still have a phonon-like character, in the sense that they involve a collective motion of the condensate.

The GP equation [20] is the starting point for an accurate mean-field description of BEC in dilute cold gases, which is rigorous at $T = 0$ and for $n|a|^3 \ll 1$. Static and dynamic properties of condensates in different geometries can be calculated by solving the GP equation numerically or using suitable approximate methods. The inclusion of effects beyond mean field is a highly nontrivial and interesting problem. A rather extreme case is represented by liquid $^4$He, which is a dense system where the interaction between atoms causes a large depletion of the condensate even at $T = 0$ ($N_0/N$ being less than 10%) and thus a full many-body treatment is required for its rigorous description. Nevertheless, even in this case, the general definitions of the section "What is BEC?" are still useful.

Superfluidity and Coherence

With the word superfluidity, one summarizes a complex of macroscopic phenomena occurring in quantum fluids under particular conditions: persistent currents, equilibrium states at rest in rotating vessels, viscousless motion, quantized vorticity, and others. These features can also be observed in BEC. The link between BEC and superfluidity is given by the phase of the order parameter [11]. To understand this point, let us consider a uniform system. If $\hat\Psi(\mathbf{r},t)$ is a solution of the Heisenberg equation [19] with $V_{\rm ext} = 0$, then

$$\hat\Psi'(\mathbf{r},t) = \hat\Psi(\mathbf{r}-\mathbf{v}t,\,t)\,\exp\left[\frac{i}{\hbar}\left(m\mathbf{v}\cdot\mathbf{r} - \frac{1}{2}mv^2 t\right)\right] \qquad [28]$$

where $\mathbf{v}$ is a constant vector, is also a solution. This equation gives the Galilean transformation of the field operator and also applies to its condensate component $\Psi$. At equilibrium, the ground-state order parameter is given by $\Psi_0 = \sqrt{n}\,\exp(-i\mu t/\hbar)$, where $n$ is a constant independent of $\mathbf{r}$. In a frame where the condensate moves with velocity $\mathbf{v}$, the order parameter instead takes the form $\Psi_0 = \sqrt{n}\,\exp(iS)$, with $S(\mathbf{r},t) = \hbar^{-1}[m\mathbf{v}\cdot\mathbf{r} - (mv^2/2 + \mu)t]$. The velocity of the condensate can thus be identified with the gradient of the phase $S$:

$$\mathbf{v}(\mathbf{r},t) = \frac{\hbar}{m}\nabla S(\mathbf{r},t) \qquad [29]$$

This definition is also valid for $\mathbf{v}$ varying slowly in space and time. The modulus of the order parameter plays a minor role in this definition and it is not necessary to assume the gas to be dilute and close to $T = 0$. Indeed, the relation [29] between the velocity field and the phase of the order parameter also applies in the presence of large quantum depletion, as in superfluid $^4$He, and at $T \neq 0$. In this case, $n$ should not be identified with the condensate density. Conversely, in dilute gases at $T = 0$, $n$ is the condensate density and the velocity [29] can be simply obtained by applying the usual definition of the current density operator, $\hat{\mathbf{j}}$, to the order parameter [11].

The velocity [29] describes a potential flow and corresponds to a collective motion of many particles occupying a single quantum state. Being equal to the gradient of a scalar function, it is irrotational ($\nabla \times \mathbf{v}_s = 0$) and satisfies the Onsager–Feynman quantization condition $\oint \mathbf{v}_s \cdot d\mathbf{l} = \kappa h/m$, with $\kappa$ a non-negative integer. These conditions are not satisfied by a classical fluid, where the hydrodynamic velocity field, $\mathbf{v}(\mathbf{r},t) = \mathbf{j}(\mathbf{r},t)/n(\mathbf{r},t)$, is the average over many different states and does not correspond to a potential flow.

By using the definition of the phase $S$ and velocity $\mathbf{v}$, together with particle conservation, one can show that the dynamics of a condensate, as far as macroscopic motions are concerned, is governed by the hydrodynamic equations of an irrotational
nonviscous fluid. Within the mean-field theory, this can be easily seen by rewriting the GP equation [20] in terms of the density $n = |\Psi|^2$ and the velocity [29]. Neglecting the quantum pressure term $\nabla^2\sqrt{n}$ (hence limiting the description to length scales larger than the healing length $\xi$), one gets

$$\frac{\partial}{\partial t} n + \nabla\cdot(\mathbf{v}n) = 0 \qquad [30]$$

and

$$m\frac{\partial}{\partial t}\mathbf{v} + \nabla\left(V_{\rm ext} + \mu(n) + \frac{mv^2}{2}\right) = 0 \qquad [31]$$

with the local chemical potential $\mu(n) = gn$. These equations have the typical structure of the dynamic equations of superfluids at zero temperature and can be viewed as the $T = 0$ case of the more general Landau two-fluid theory.

One of the most striking pieces of evidence of superfluidity is the observation of quantized vortices, that is, vortices obeying the Onsager–Feynman quantization condition. A vast literature is devoted to vortices in superfluid helium and, more recently, vortices have also been produced and studied in condensates of ultracold gases, including nice configurations of many vortices in regular triangular lattices, similar to the Abrikosov lattices in superconductors. Other phenomena, such as the reduction of the moment of inertia, the occurrence of Josephson tunneling through barriers, the existence of thresholds for dissipative processes (Landau criterion), and others, are typical subjects of intense investigation.

Another important consequence of the fact that BEC is described by an order parameter with a well-defined phase is the occurrence of coherence effects which, in other words, mean that condensates behave like matter waves. For instance, one can measure the phase difference between two condensates by means of interference. This can be done in coordinate space by confining two condensates in two potential minima, $a$ and $b$, at a distance $d$. Let us take $d$ along $z$ and assume that, at $t = 0$, the order parameter is given by the linear combination $\Psi(\mathbf{r}) = \Psi_a(\mathbf{r}) + \exp(i\phi)\Psi_b(\mathbf{r})$ with $\Psi_a$ and $\Psi_b$ real and without overlap. Then let us switch off the confining potentials so that the condensates expand and overlap. If the overlap occurs when the density is small enough to neglect interactions, the motion is ballistic and the phase of each condensate evolves as $S(\mathbf{r},t) \simeq mr^2/(2\hbar t)$, so that $\mathbf{v} = \mathbf{r}/t$. This implies a relative phase $\phi + S(x,y,z+d/2) - S(x,y,z-d/2) = \phi + mdz/\hbar t$. The total density $n = |\Psi|^2$ thus exhibits periodic modulations along $z$ with wavelength $ht/md$ (with $h = 2\pi\hbar$). This interference pattern has indeed been observed in condensates of ultracold atoms. In these systems it was also possible to measure the coherence length, that is, the distance $|\mathbf{r}-\mathbf{r}'|$ at which the one-body density matrix vanishes and the phase of the order parameter is no longer well defined. In most situations, the coherence length turns out to be of the order of, or larger than, the size of the condensates. However, interesting situations exist when the coherence length is shorter but the system still preserves some features of BEC (quasicondensates).

Final Remarks

Bose–Einstein condensates of ultracold atoms are easily manipulated by changing and tuning the external potentials. This means, for instance, that one can prepare condensates in different geometries, including very elongated (quasi-1D) or disk-shaped (quasi-2D) condensates. This is conceptually important, since BEC in lower dimensions is not as simple as in three dimensions: thermal and quantum fluctuations play a crucial role, superfluidity must be properly redefined, and very interesting limiting cases can be explored (Tonks–Girardeau regime, Luttinger liquid, etc.). Another possibility is to use laser beams to produce standing waves acting as an external periodic potential (optical lattice). Condensates in optical lattices behave as a sort of perfect crystal, whose properties are the analog of the dynamic and transport properties in solid-state physics, but with controllable spacing between sites, no defects, and tunable lattice geometry. One can investigate the role of phase coherence in the lattice, looking, for instance, at Josephson effects as in a chain of junctions. By tuning the lattice depth one can explore the transition from a superfluid phase to a Mott-insulator phase, which is a nice example of a quantum phase transition. Controlling cold atoms in optical lattices can be a good starting point for applications in quantum engineering, interferometry, and quantum information.

Another interesting aspect of BECs is that the key equation for their description in mean-field theory, namely the GP equation [20], is a nonlinear Schrödinger equation very similar to the ones commonly used, for instance, in nonlinear quantum optics. This opens interesting perspectives in exploiting the analogies between the two fields, such as the occurrence of dynamical and parametric instabilities, the possibility to create different types of solitons, and the occurrence of nonlinear processes like, for example, higher harmonic generation and mode mixing.

A relevant part of the current research also involves systems made of mixtures of different gases, Bose–Bose or Fermi–Bose, and many activities with ultracold atoms now involve fermionic gases, where BEC can
318 Bosons and Fermions in External Fields
also be realized by condensing molecules of fermionic pairs. An extremely active line of research now concerns the BCS–BEC crossover, which can be obtained in Fermi gases by tuning the scattering length (and hence the interaction) by means of Feshbach resonances.

Ten years after the first observation of BEC in ultracold gases, it is almost impossible to summarize all the research done in this field. A large amount of work has already been devoted to characterizing the condensates and several new lines have been opened. Rather detailed review articles and books are already available for the interested readers.

See also: Interacting Particle Systems and Hydrodynamic Equations; Quantum Phase Transitions; Quantum Statistical Mechanics: Overview; Renormalization: Statistical Mechanics and Condensed Matter; Superfluids; Variational Techniques for Ginzburg–Landau Energies.

Further Reading

Cornell EA and Wieman CE (2002) Nobel lecture: Bose–Einstein condensation in a dilute gas, the first 70 years and some recent experiments. Reviews of Modern Physics 74: 875.
Dalfovo F, Giorgini S, Pitaevskii LP, and Stringari S (1999) Theory of Bose–Einstein condensation in trapped gases. Reviews of Modern Physics 71: 463.
Griffin A, Snoke DW, and Stringari S (1995) Bose–Einstein Condensation. Cambridge: Cambridge University Press.
Huang K (1987) Statistical Mechanics, 2nd edn. New York: Wiley.
Inguscio M, Stringari S, and Wieman CE (1999) Bose–Einstein Condensation in Atomic Gases, Proceedings of the International School of Physics Enrico Fermi, Course CXL. Amsterdam: IOS Press.
Ketterle W (2002) Nobel lecture: when atoms behave as waves: Bose–Einstein condensation and the atom laser. Reviews of Modern Physics 74: 1131.
Landau LD and Lifshitz EM (1980) Statistical Physics, Part 1. Oxford: Pergamon Press.
Leggett AJ (2001) Bose–Einstein condensation in the alkali gases: some fundamental concepts. Reviews of Modern Physics 73: 307.
Lifshitz EM and Pitaevskii LP (1980) Statistical Physics, Part 2. Oxford: Pergamon Press.
Pethick CJ and Smith H (2002) Bose–Einstein Condensation in Dilute Gases. Cambridge: Cambridge University Press.
Pitaevskii LP and Stringari S (2003) Bose–Einstein Condensation. Oxford: Clarendon Press.
moving in three-dimensional space and interacting with external vector and scalar potentials $\mathbf{A}$ and $\phi$, respectively,

$$i\partial_t \psi = H\psi, \qquad H = \frac{1}{2m}(-i\nabla + e\mathbf{A})^2 - e\phi \qquad [1]$$

(we set $\hbar = c = 1$, $\partial_t = \partial/\partial t$, and $\psi$, $\phi$, and $\mathbf{A}$ can depend on the space and time variables $\mathbf{x} \in \mathbb{R}^3$ and $t \in \mathbb{R}$). This is a standard quantum-mechanical model, with $\psi$ the one-particle wave function allowing for the usual probabilistic interpretation. One interesting generalization to the relativistic regime is the Klein–Gordon equation

$$\left[(i\partial_t + e\phi)^2 - (-i\nabla + e\mathbf{A})^2 - m^2\right]\psi = 0 \qquad [2]$$

with a $\mathbb{C}$-valued function $\psi$. There is another important relativistic generalization, the Dirac equation

$$\left[(i\partial_t + e\phi) - (-i\nabla + e\mathbf{A})\cdot\boldsymbol{\alpha} - m\beta\right]\psi = 0 \qquad [3]$$

with $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \alpha_3)$ and $\beta$ Hermitian $4 \times 4$ matrices satisfying the relations

$$\alpha_i\alpha_j + \alpha_j\alpha_i = 2\delta_{ij}, \qquad \alpha_i\beta = -\beta\alpha_i, \qquad \beta^2 = 1 \qquad [4]$$

and a $\mathbb{C}^4$-valued function $\psi$ (we also write 1 for the identity). These two relativistic equations differ by the transformation properties of $\psi$ under Lorentz transformations: in [2] it transforms like a scalar and thus describes spin-0 particles, and it transforms like a spinor describing spin-1/2 particles in [3]. While these equations are natural relativistic generalizations of the Schrödinger equation, they no longer allow one to consistently interpret $\psi$ as one-particle wave functions. The physical reason is that, in a relativistic theory, high-energy processes can create particle–antiparticle pairs, and this makes the restriction to a fixed particle number inconsistent. This problem can be remedied by constructing a many-body model allowing for an arbitrary number of particles and antiparticles. The requirement that this many-body model should have a ground state is an important ingredient in this construction.

It is obviously of interest to formulate and study many-body models of nondistinguishable particles already in the nonrelativistic case. An important empirical fact is that such particles come in two kinds, bosons and fermions, distinguished by their exchange statistics (we ignore the interesting possibility of exotic statistics). For example, the fermion many-particle version of [1] for suitable $\phi$ and $\mathbf{A}$ is a useful model for electrons in a metal. An elegant method to go from the one- to the many-particle description is the formalism of second quantization: one promotes $\psi$ to a quantum field operator with certain (anti-)commutator relations, and this is a convenient way to construct the appropriate many-particle Hilbert space, Hamiltonian, etc. In the nonrelativistic case, this formalism can be regarded as an elegant reformulation of a pedestrian construction of a many-body quantum-mechanical model, which is useful since it provides convenient computational tools. However, this formalism naturally generalizes to the relativistic case where the one-particle model no longer has an acceptable physical interpretation, and one finds that one can nevertheless give a consistent physical interpretation to [2] and [3] provided that $\psi$ are interpreted as quantum field operators describing bosons and fermions. This particular exchange statistics of the relativistic particles is a special case of the spin-statistics theorem: integer-spin particles are bosons and half-integer spin particles are fermions. While many structural features of this formalism are present already in the simpler nonrelativistic models, the relativistic models add some nontrivial features typical for quantum field theories.

In the following, we discuss a precise mathematical formulation of the quantum field theory models described above. We emphasize the functorial nature of this construction, which makes manifest that it also applies to other situations, for example, where the bosons and fermions are also coupled to a gravitational background, are considered in spacetime dimensions other than 3+1, etc.

Second Quantization: Nonrelativistic Case

Consider a quantum system of nondistinguishable particles where the quantum-mechanical description of one such particle is known. In general, this one-particle description is given by a Hilbert space $h$ and one-particle observables and transformations which are self-adjoint and unitary operators on $h$, respectively. The most important observable is the Hamiltonian $H$. We will describe a general construction of the corresponding many-body system.

Example. As a motivating example we take the Hilbert space $h = L^2(\mathbb{R}^3)$ of square-integrable functions $f(\mathbf{x})$, $\mathbf{x} \in \mathbb{R}^3$, and the Hamiltonian $H$ in [1]. A specific example for a unitary operator on $h$ is the gauge transformation $(Uf)(\mathbf{x}) = \exp(i\chi(\mathbf{x}))f(\mathbf{x})$ with $\chi$ a smooth, real-valued function on $\mathbb{R}^3$.

In this example, the corresponding wave functions for $N$ identical such particles are the $L^2$-functions $f_N(\mathbf{x}_1, \ldots, \mathbf{x}_N)$, $\mathbf{x}_j \in \mathbb{R}^3$. It is obvious how to extend
one-particle observables and transformations to such $N$-particle states: for example, the $N$-particle Hamiltonian corresponding to $H$ in [1] is

$$H_N = \sum_{j=1}^{N}\left[\frac{1}{2m}\left(-i\nabla_{\mathbf{x}_j} + e\mathbf{A}(t,\mathbf{x}_j)\right)^2 - e\phi(t,\mathbf{x}_j)\right] \qquad [5]$$

and the $N$-particle gauge transformation $U_N$ is defined through multiplication with $\prod_{j=1}^{N}\exp(i\chi(\mathbf{x}_j))$. For systems of indistinguishable particles it is enough to restrict to wave functions which are even or odd under particle exchanges,

$$f_N(\mathbf{x}_1, \ldots, \mathbf{x}_j, \ldots, \mathbf{x}_k, \ldots, \mathbf{x}_N) = \pm f_N(\mathbf{x}_1, \ldots, \mathbf{x}_k, \ldots, \mathbf{x}_j, \ldots, \mathbf{x}_N) \qquad [6]$$

for all $1 \le j < k \le N$, with the upper and lower signs corresponding to bosons and fermions, respectively (this empirical fact is usually taken as a postulate in nonrelativistic many-body quantum physics). It is convenient to define the zero-particle Hilbert space as $\mathbb{C}$ (complex numbers) and to introduce a Hilbert space containing states with all possible particle numbers: this so-called Fock space contains all states

$$\begin{pmatrix} f_0 \\ f_1(\mathbf{x}_1) \\ f_2(\mathbf{x}_1,\mathbf{x}_2) \\ f_3(\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3) \\ \vdots \end{pmatrix} \qquad [7]$$

with $f_0 \in \mathbb{C}$. The definition of $H_N$ and $U_N$ then naturally extends to this Fock space; see below.

General Construction

The construction of Fock spaces and many-particle observables and transformations just outlined in a specific example is conceptually simple. An alternative, more efficient construction method is to use quantum fields, which we denote as $\psi(\mathbf{x})$ and $\psi^\dagger(\mathbf{x})$, $\mathbf{x} \in \mathbb{R}^3$. They can be fully characterized by the following (anti-)commutator relations:

$$[\psi(\mathbf{x}), \psi^\dagger(\mathbf{y})]_\mp = \delta^3(\mathbf{x}-\mathbf{y}), \qquad [\psi(\mathbf{x}), \psi(\mathbf{y})]_\mp = 0 \qquad [8]$$

where $[a,b]_\mp \equiv ab \mp ba$, with the commutator and anticommutator (upper and lower signs, respectively) corresponding to the boson and fermion case, respectively. It is convenient to smear these fields with one-particle wave functions and define

$$\psi(f) = \int_{\mathbb{R}^3} d^3x\, \overline{f(\mathbf{x})}\,\psi(\mathbf{x}), \qquad \psi^\dagger(f) = \int_{\mathbb{R}^3} d^3x\, \psi^\dagger(\mathbf{x}) f(\mathbf{x}) \qquad [9]$$

for all $f \in h$. Then the relations characterizing the field operators can be written as

$$[\psi(f), \psi^\dagger(g)]_\mp = (f,g), \qquad [\psi(f), \psi(g)]_\mp = 0 \qquad \forall f, g \in h \qquad [10]$$

where

$$(f,g) = \int_{\mathbb{R}^3} d^3x\, \overline{f(\mathbf{x})}\,g(\mathbf{x})$$

is the inner product in $h$. The Fock space $\mathcal{F}_\pm(h)$ can then be defined by postulating that it contains a normalized vector $\Omega$ called vacuum such that

$$\psi(f)\Omega = 0 \qquad \forall f \in h \qquad [11]$$

and that all $\psi^{(\dagger)}(f)$ are operators on $\mathcal{F}_\pm(h)$ such that $\psi^\dagger(f) = \psi(f)^*$, where $*$ is the Hilbert space adjoint. Indeed, from this we conclude that $\mathcal{F}_\pm(h)$, as a vector space, is generated by

$$f_1 \wedge f_2 \wedge \cdots \wedge f_N \equiv \psi^\dagger(f_1)\psi^\dagger(f_2)\cdots\psi^\dagger(f_N)\Omega \qquad [12]$$

with $f_j \in h$ and $N = 0, 1, 2, \ldots$, and that the Hilbert space inner product of such vectors is

$$\langle f_1 \wedge f_2 \wedge \cdots \wedge f_N,\; g_1 \wedge g_2 \wedge \cdots \wedge g_M\rangle = \delta_{N,M}\sum_{P \in S_N}(\pm 1)^{|P|}\prod_{j=1}^{N}(f_j, g_{Pj}) \qquad [13]$$

with $S_N$ the permutation group, with $(+1)^{|P|} = 1$ always, and $(-1)^{|P|} = +1$ and $-1$ for even and odd permutations, respectively. The many-body Hamiltonian $q(H)$ corresponding to the one-particle Hamiltonian $H$ can now be defined by the following relations:

$$q(H)\Omega = 0, \qquad [q(H), \psi^\dagger(f)] = \psi^\dagger(Hf) \qquad [14]$$

for all $f \in h$ such that $Hf$ is defined. Indeed, this implies that

$$q(H)\, f_1 \wedge f_2 \wedge \cdots \wedge f_N = \sum_{j=1}^{N} f_1 \wedge f_2 \wedge \cdots \wedge (Hf_j) \wedge \cdots \wedge f_N \qquad [15]$$

which defines a self-adjoint operator on $\mathcal{F}_\pm(h)$, and it is easy to check that this coincides with our down-to-earth definition of $H_N$ above. Similarly, the many-body transformation $Q(U)$ corresponding to a one-particle transformation $U$ can be defined as

$$Q(U)\Omega = \Omega, \qquad Q(U)\psi^\dagger(f) = \psi^\dagger(Uf)\,Q(U) \qquad [16]$$

for all $f \in h$, which implies that

$$Q(U)\, f_1 \wedge f_2 \wedge \cdots \wedge f_N = (Uf_1) \wedge (Uf_2) \wedge \cdots \wedge (Uf_N) \qquad [17]$$
and thus coincides with our previous definition of $U_N$.

While we presented the construction above for a particular example, it is important to note that it actually does not make reference to what the one-particle formalism actually is. For example, if we had a model of particles on a space $\mathcal{M}$ given by some nice manifold of any dimension and with $M$ internal degrees of freedom, we would take $h = L^2(\mathcal{M}) \otimes \mathbb{C}^M$ and replace [9] by

$$\psi(f) = \int_{\mathcal{M}} d\mu(x) \sum_{j=1}^{M} \overline{f_j(x)}\,\psi_j(x) \qquad [18]$$

and its Hermitian conjugate, with the measure $\mu$ on $\mathcal{M}$ defining the inner product in $h$,

$$(f,g) = \int_{\mathcal{M}} d\mu(x) \sum_{j} \overline{f_j(x)}\,g_j(x)$$

With that, all formulas after [9] hold true as they stand.

Given any one-particle Hilbert space $h$ with inner product $(\cdot,\cdot)$, observable $H$, and transformation $U$, the formulas above define the corresponding Fock spaces $\mathcal{F}_\pm(h)$ and many-body observable $q(H)$ and transformation $Q(U)$. It is also interesting to note that this construction has various beautiful general (functorial) properties: the set of one-particle observables has a natural Lie algebra structure with the Lie bracket given by the commutator (strictly speaking: $i$ times the commutator, but we drop the common factor $i$ for simplicity). The definitions above imply that

$$[q(A), q(B)] = q([A,B]) \qquad [19]$$

for one-particle observables $A$, $B$, that is, the above-mentioned Lie algebra structure is preserved under this map $q$. In a similar manner, the set of one-particle transformations has a natural group structure preserved by the map $Q$,

$$Q(U)Q(V) = Q(UV), \qquad Q(U)^{-1} = Q(U^{-1}) \qquad [20]$$

Moreover, if $A$ is self-adjoint, then $\exp(iA)$ is unitary, and one can show that

$$Q(\exp(iA)) = \exp(iq(A)) \qquad [21]$$

For later use, we note that, if $\{f_n\}_{n \in \mathbb{Z}}$ is some complete, orthonormal basis in $h$, then operators $A$ on $h$ can be represented by infinite matrices $(A_{mn})_{m,n \in \mathbb{Z}}$ with $A_{mn} = (f_m, Af_n)$, and

$$q(A) = \sum_{m,n} A_{mn}\,\psi_m^\dagger\psi_n \qquad [22]$$

where $\psi_n^{(\dagger)} = \psi^{(\dagger)}(f_n)$ obey

$$[\psi_m, \psi_n^\dagger]_\mp = \delta_{m,n}, \qquad [\psi_m, \psi_n]_\mp = 0 \qquad [23]$$

for all $m$, $n$. We also note that, in our definition of $q(A)$, we made a convenient choice of normalization, but there is no physical reason not to choose a different normalization and define

$$q'(A) = q(A) - b(A) \qquad [24]$$

where $b$ is some linear function mapping self-adjoint operators $A$ to real numbers. For example, one may wish to use another reference vector $\tilde\Omega$ instead of $\Omega$ in the Fock space, and then would choose $b(A) = \langle\tilde\Omega, q(A)\tilde\Omega\rangle$. Then the relations in [19] are changed to

$$[q'(A), q'(B)] = q'([A,B]) + S_0(A,B) \qquad [25]$$

where $S_0(A,B) = b([A,B])$. However, the $\mathbb{C}$-number term $S_0(A,B)$ in the relations [25] is trivial, since it can be removed by going back to $q(A)$.

Physical Interpretation

The Fock space $\mathcal{F}_\pm(h)$ is the direct sum of subspaces of states with different particle numbers $N$,

$$\mathcal{F}_\pm(h) = \bigoplus_{N=0}^{\infty} h_\pm^{(N)} \qquad [26]$$

where the zero-particle subspace $h_\pm^{(0)} = \mathbb{C}$ is generated by the vacuum $\Omega$, and $h_\pm^{(N)}$ is the $N$-particle subspace generated by the states $f_1 \wedge f_2 \wedge \cdots \wedge f_N$, $f_j \in h$. We note that

$$\mathcal{N} \equiv q(1) \qquad [27]$$

is the particle-number operator, $\mathcal{N}F_N = NF_N$ for all $F_N \in h_\pm^{(N)}$. The field operators obviously change the particle number: $\psi^\dagger(f)$ increases the particle number by one (maps $h_\pm^{(N)}$ to $h_\pm^{(N+1)}$), and $\psi(f)$ decreases it by one. Since every $f \in h$ can be interpreted as a one-particle state, it is natural to interpret $\psi^\dagger(f)$ and $\psi(f)$ as creation and annihilation operators, respectively: they create and annihilate one particle in the state $f \in h$. It is important to note that, in the fermion case, [10] implies that $\psi^\dagger(f)^2 = 0$, which is a mathematical formulation of the Pauli exclusion principle: it is not possible to have two fermions in the same one-particle state. In the boson case, there is no such restriction. Thus, even though the formalisms used to describe boson and fermion systems look very similar, they describe dramatically different physics.

Applications

In our example, the many-body Hamiltonian $\mathcal{H}_0 \equiv q(H)$ can also be written in the following suggestive form:

$$\mathcal{H}_0 = \int d^3x\, \psi^\dagger(\mathbf{x})\,(H\psi)(\mathbf{x}) \qquad [28]$$
and similar formulas hold true for other observables and other Hilbert spaces $h = L^2(\mathcal{M}) \otimes \mathbb{C}^n$. It is rather easy to solve the model defined by such a Hamiltonian: all necessary computations can be reduced to one-particle computations. For example, in the static case, where $\mathbf{A}$ and $\phi$ are time independent, a main quantity of interest in statistical physics is the free energy

$$E = -\beta^{-1}\log\left(\mathrm{tr}\left(\exp\left(-\beta[\mathcal{H}_0 - \mu\mathcal{N}]\right)\right)\right) \qquad [29]$$

Field Algebras and Quasifree Representations

In the previous section, we identified the field operators $\psi^{(\dagger)}(f)$ with particular Fock space operators. This is analogous to identifying the operators $p_j = -i\partial_{x_j}$ and $q_j = x_j$ on $L^2(\mathbb{R}^M)$ with the generators of the Heisenberg algebra, as usually done. (We recall: the Heisenberg algebra is the star algebra generated by $P_j$ and $Q_j$, $j = 1, 2, \ldots, M < \infty$, with the well-known relations
the Dirac case, which is somewhat simpler.

Fermions

One-particle formalism. Recalling that $i\partial_t$ is the energy operator, we define the Dirac Hamiltonian $D$ by rewriting [3] in the following form:

$$i\partial_t\psi = D\psi, \qquad D = (-i\nabla + e\mathbf{A})\cdot\boldsymbol{\alpha} + m\beta - e\phi \qquad [36]$$

This Dirac Hamiltonian is obviously a self-adjoint operator on the one-particle Hilbert space $h = L^2(\mathbb{R}^3) \otimes \mathbb{C}^4$, but, different from the Schrödinger Hamiltonian in [1], it is not bounded from below: for any $E_0 > -\infty$, one can find a state $f$ such that the energy expectation value $(f, Df)$ is less than $E_0$. This can be easily seen for the simplest case where the external potential vanishes, $\mathbf{A} = \phi = 0$. Then the eigenvalues of $D$ can be computed by Fourier transformation, and one finds

$$E = \pm\sqrt{\mathbf{p}^2 + m^2}, \qquad \mathbf{p} \in \mathbb{R}^3 \qquad [37]$$

Due to the negative energy eigenvalues we conclude that there is no ground state, and the Dirac Hamiltonian thus describes an unstable system, which is physically meaningless.

To summarize: an (unphysical) one-particle description of relativistic fermions is given by a Hilbert space $h$ together with a self-adjoint Hamiltonian $D$ unbounded from below. Other observables and transformations are given by self-adjoint and unitary operators on $h$, respectively.

with the so-called normal ordering prescription

$$:\!\hat\psi_m^\dagger\hat\psi_n\!: \;\equiv\; \hat\psi_m^\dagger\hat\psi_n - \langle\Omega, \hat\psi_m^\dagger\hat\psi_n\Omega\rangle \qquad [41]$$

where we made use of the freedom of normalization explained after [23] to eliminate unwanted additive constants. We get $\hat q(D) = \sum_{n \in \mathbb{Z}} |E_n|\,\hat\psi_n^\dagger\hat\psi_n$, which is manifestly a non-negative self-adjoint operator with $\Omega$ as ground state. We thus found a physical many-body description for our model. We can now define, for other one-particle observables,

$$\hat q(A) = \sum_{m,n \in \mathbb{Z}} A_{mn} :\!\hat\psi_m^\dagger\hat\psi_n\!: \qquad [42]$$

and, by straightforward computations, we obtain

$$[\hat q(A), \hat q(B)] = \hat q([A,B]) + S(A,B) \qquad [43]$$

where $S(A,B) = \sum_{m<0}\sum_{n \ge 0}(A_{mn}B_{nm} - B_{mn}A_{nm})$, that is,

$$S(A,B) = \mathrm{tr}(P_-AP_+BP_- - P_-BP_+AP_-) \qquad [44]$$

with $P_- = \sum_{n<0} f_n(f_n, \cdot)$ the projection onto the subspace spanned by the negative energy eigenvectors of $D$ and $P_+ = 1 - P_-$. One can show that $\hat q(A)$ is no longer defined for all operators but only if

$$P_-AP_+ \text{ and } P_+AP_- \text{ are Hilbert–Schmidt operators} \qquad [45]$$

(we recall that $a$ is a Hilbert–Schmidt operator if $\mathrm{tr}(a^*a) < \infty$). The $\mathbb{C}$-number term $S(A,B)$ in [43] is
nontrivial: it is no longer possible to remove it by a redefinition $\hat q'(A) = \hat q(A) - b(A)$. This Schwinger term is an example of an anomaly, and it has various interesting implications.

In a similar manner, one can construct the many-body transformations $\hat Q(U)$ of unitary operators $U$ on $h$ satisfying the same Hilbert–Schmidt condition as in [45], and one obtains

$$\hat Q(U)\hat Q(V) = \chi(U,V)\,\hat Q(UV) \qquad [46]$$

with interesting phase-valued functions $\chi$.

More generally, for any one-particle Hilbert space $h$ and Dirac Hamiltonian $D$, the physical

$$\Phi = \begin{pmatrix}\psi \\ \pi^\dagger\end{pmatrix}, \qquad K = \begin{pmatrix} C & i \\ -iB^2 & C \end{pmatrix} \qquad [48]$$

with

$$B^2 = (-i\nabla + e\mathbf{A})^2 + m^2, \qquad C = -e\phi \qquad [49]$$

Thus, one sees that the natural one-particle Hilbert space for the Klein–Gordon equation is $h = L^2(\mathbb{R}^3) \otimes \mathbb{C}^2$; here, and in the following, we identify $h$ with $h_0 \oplus h_0$, $h_0 = L^2(\mathbb{R}^3)$, and use a convenient $2 \times 2$ matrix notation naturally associated with that splitting. However, the one-particle Hamiltonian is not self-adjoint but rather obeys
operator, which resolves one problem of the one-particle related to conformal field theory (see, e.g., Kac and
theory. However, q(K) is not bounded from below, and Raina (1987) for a textbook presentation and Carey
thus P(0)
is not yet the physical representation. and Ruijsenaars (1987) for a detailed mathematical
The physical representation can be constructed account within the framework described by us).
using the operators It turns out that the mathematical framework
discussed in the previous section is sufficient for
1 B1=2 iB1=2 1 0
T p 1=2 ; F 53 constructing fully interacting quantum field theories,
2 B iB1=2 0 1
in particular YangMills gauge theories, in 1 1
(for simplicity, we restrict ourselves to the case C = 0 but not in higher dimensions. The reason is that, in
and B > 0; we use the calculus of self-adjoint operators 3 1 dimensions, the one-particle observables A of
here) with the following remarkable properties: interest do not obey the HilbertSchmidt condition
in [45] but only the weaker condition
T 1 JT
F
tra
an < 1; a P AP 56
B 0 54
TKT 1 ^
K
0 B with n = 2, and the natural analog of g2 in 3 1
One can check that dimensions thus seems to be the Lie algebra g2n of
operators satisfying this condition with n = 2. Various
^y f y
Tf ; ^f T 1 f 55 results on the representation theory of such Lie
algebras g2n>2 have been developed (see Mickelsson
is a quasifree representation P of A (h) with
(1989), where various interesting relations to infinite-
P = (1 F)=2. With that the construction of q ^ and
^ is very similar to the fermion case described dimensional geometry are also discussed).
Q
^ and F now As mentioned, the Schwinger term S(A,B) in [44] is
above (the crucial simplification is that K
an example of an anomaly. Mathematically, it is a
are diagonal). In particular, q^(K) is a non-negative
nontrivial 2-cocycle of the Lie algebra g2 , and analogs
operator with the ground state , and q ^(A) and
^ for the groups g2n>2 have been found. These cocycles
Q(U) are self-adjoint and unitary for every one-
provide a natural generalization of anomalies (in the
particle observable A and transformation U, respec-
meaning of particle physics) to operator algebras. They
tively. One also gets relations as in [43] and [46].
not only shed some interesting light on the latter, but
also provide a link to notions and results from
Related Topics of Recent Interest noncommutative geometry (see, e.g., Gracia-Bonda
et al. (2001)). We believe that this link can provide a
The impossibility to construct relativistic quantum- fruitful driving force and inspiration to find ways to
mechanical models played an important role in the deepen our understanding of quantum YangMills
early history of quantum field theory, as beautifully theories in 3 1 dimensions (Langmann 1996).
discussed in chapter 1 of Weinberg (1995).
The abstract formalism of quasifree representations See also: Anomalies; C*-Algebras and Their
of fermion and boson field algebras was developed in Classification; Dirac Fields in Gravitation and Nonabelian
many papers (see, e.g., Ruijsenaars (1977), Grosse and Gauge Theory; Dirac Operator and Dirac Field; Gerbes in
Langmann (1992), and Langmann (1994) for explicit Quantum Field Theory; Quantum Field Theory in Curved
results on Q ^ and ). A nice textbook presentation Spacetime; Quantum n-Body Problem; Superfluids;
with many references can be found in chapter 13 of Two-Dimensional Models.
Gracia-Bonda et al. (2001) (this chapter is rather self-
contained but mainly restricted to the fermion case).
Further Reading
Based on the ShaleStinespring theorem, there has
been considerable amount of work to investigate Carey AL and Ruijsenaars SNM (1987) On fermion gauge
whether the quasifree representations associated groups, current algebras and KacMoody algebras. Acta
Applicandae Mathematicae 10: 186.
with different external electromagnetic fields
DeWitt B (2003) The Global Approach to Quantum Field
1 , A1 and 2 , A2 are unitarily equivalent, if and Theory, International Series of Monographs on Physics, vols.
which time-dependent many-body Hamiltonians 1 and 2, p. 114. New York: Oxford University Press.
exist, etc. (see chapter 13 of Gracia-Bonda et al. Gracia-Bonda JM, Varilly JC, and Figueroa H (2001) Elements
(2001), and references therein). of Noncommutative Geometry, Birkhauser Advanced Texts:
The infinite-dimensional Lie algebra g2 of Hilbert Basel Textbooks. Boston: Birkhauser.
Grosse H and Langmann E (1992) A superversion of quasifree second
space operators satisfying the condition in [45] is an quantization. Journal of Mathematical Physics 33: 10321046.
interesting infinite-dimensional Lie algebra with a Kac VG and Raina AK (1987) Bombay Lectures on Highest
beautiful representation theory. This subject is closely Weight Representations of Infinite-Dimensional Lie Algebras,
326 Boundaries for Spacetimes
Advanced Series in Mathematical Physics, vol. 2. Teaneck: World Scientific Publishing.
Langmann E (1994) Cocycles for boson and fermion Bogoliubov transformations. Journal of Mathematical Physics 35: 96–112.
Langmann E (1996) Quantum gauge theories and noncommutative geometry. Acta Physica Polonica B 27: 2477–2496.
Mickelsson J (1989) Current Algebras and Groups, Plenum Monographs in Nonlinear Physics. New York: Plenum Press.
Rafelski J, Fulcher LP, and Klein A (1978) Fermions and bosons interacting with arbitrary strong external fields. Physics Reports 38: 227–361.
Reed M and Simon B (1975) Methods of Modern Mathematical Physics. II. Fourier Analysis, Self-Adjointness. New York: Academic Press.
Ruijsenaars SNM (1977) On Bogoliubov transformations for systems of relativistic charged particles. Journal of Mathematical Physics 18: 517–526.
Weinberg S (1995) The Quantum Theory of Fields, vol. I: Foundations. Cambridge: Cambridge University Press.
Let Rn denote Euclidean n-space, Sn the unit termed future-null infinity, and I is past-null infinity.
n-sphere, and Ln Minkowski n-space, that is, Rn with All spacelike geodesics come to i0 , spacelike infinity.
metric ds2 = dx21 dx2n1 dt2 (so Ln = For n = 2, this picture produces the familiar
Rn1 L1 ). The n-dimensional Einstein static space- diamond representation of L2 (Figure 3): as E2 is
time is the product spacetime En = Sn1 L1 . Con- easily unrolled into another copy of L2 (metric
sider Sn1 as embedded in Rn = Rn1 R1 . Then the
conformal embedding is : Ln ! En , expressed as
i+
: Rn1 L1 ! Sn1 L1 Rn1 R 1 L1 given
by (x, t) = ((x=jxj) sin , cos , ), where = tan1
(t jxj) tan1 (t jxj) and = tan1 (t jxj)
tan1 (t jxj). The boundary @ (Ln ) consists of the =
following: the points { = ; 0 < }, composed
of an Sn2 of null lines coming together at the point
i = (0, 1, ); a similar cone of null lines { = ;
< 0} with vertex at i = (0, 1, ); and a single
limit-point for both cones at i0 = (0, 1, 0). The > 0
null cone is called I (the letter is read scri for
script-I), its counterpart I (Figures 1 and 2). As all
future-directed timelike geodesics in Ln have i as an
endpoint in En , i is called future-timelike infinity;
similarly, i is past-timelike infinity. Every future-
directed null geodesic ends up on I , which is thus
+
E2
=0
i+ =
+
i0
Image of L2
i0
=0
=
i =
causal curve from x to y): for x 2 M and P, Q 2 of more concern is that the topology prescribed by
^
@(M), x
P for I (x) P, P
x for P I (x), and GKP is not what might be expected in even the
P
Q for P Q. simplest of cases, for example, Minkowski space: Ln
The intent is to have the elements of @(M)^ provide needs no identifications among boundary points (no
future endpoints for future-endless causal curves in matter whose identification procedure is followed).
M; in particular, we want two such curves, c1 and The GKP topology on Ln , restricted to @(L ^ n ), is not
n2 1
c2 , to be assigned the same future endpoint precisely that of a cone (S R with a point added), as is
when I [c1 ] = I [c2 ]. This is accomplished by the the case for I in the conformal embedding into En ;
simple expedient of defining the future endpoint of a ^ n ) (not including i )
but, instead, each null line in @(L
future-endless causal curve c to be P = I [c]. We do is an open set, and i has no neighborhood in @(L ^ n)
not have a topology on M ^ as yet, but it is worth save for the entire boundary. This is a topology
noting that if P is the assigned future endpoint of c, bearing no relation at all to that of any embedding.
then I (P) = I [c]; this is at least the correct causal
behavior for a putative future endpoint of c.
Future Causal Boundary
We can perform all the operations above in the
time-dual manner, obtaining the past causal bound- Construction An alternative approach, initiated by
ary @(M), consisting of terminal indecomposable Harris (1998), is to forego the full causal boundary
future sets (TIFs), and the past causal completion and concentrate only on M ^ and M separately. There
M
= @(M) [ M. The full causal boundary of M is an advantage to this in that the process of future
consists of the union of @(M)^
with @(M) with some causal completion that is to say, forming M ^ from
sort of identifications to be made. M can be made functorial in an appropriate
As an example of the need for identifications, category of chronological sets: a set X with a
consider M to be L2 with a closed timelike line relation which is transitive and antireflexive such
segment deleted, say M = L2 {(0, t) j 0 t 1}. that it possesses a countable subset S which is
^
For @(M), we have first the boundary elements at chronologically dense, that is, for any x, y 2 X,
infinity: the TIP i = M (the past of the positive time there is some s 2 S with x s y. Any strongly
axis) and the set of TIPs making up I (the pasts of causal spacetime M is a chronological set, as is M. ^
null lines going out to infinity in L2 ); and then, the The entire construction of the future causal bound-
boundary elements coming from the deleted points: ary works just as well for a chronological set. The
for each t with 0 < t 1, two IPs emanating from role of a timelike curve in a chronological set is
(0, t), that is, P t , the past of the null line going taken by a future chain: a sequence c = {xn } with
pastwards from (0, t) toward x > 0, and P t , the past xn xn1 for all n. For any future chain c, I [c] is an
of the null line going pastwards from (0, t) toward IP, and any IP can be so expressed; but unlike in
x < 0; and P0 , emanating from (0, 0), that is, the spacetimes, I (x) may or may not be an IP for x 2 X.
past of the negative time axis. Similarly, @(M) Then, X ^ is always future complete in the sense that
^ there is an element 2 X ^
consists of i , I , TIFs Ft and Ft emanating from for any future chain c in X,
(0, t) for 0 t < 1, and the TIF F1 emanating from with I () = I [c]: for instance, if the chain c lies in
(0, 1). We probably want to make at least the X but there is no x 2 X with I (x) = I [c], just let
following identifications for each t with 0 < t < 1, ^
= I [c], which is an element of @(X). This yields a
Pt Ft
and P
t Ft ; P1 F1 P1 ; and F0 functor of future completion from the category of
P0 F0 . This results in a two-sided replacement chronological sets to the category of future-complete
for the deleted segment; for some purposes, it might chronological sets, and the embedding X ! X ^ is a
be deemed desirable to identify the two sides as one, universal object in the sense of the category theory;
but a universal boundary is probably a good idea, this implies that it is categorically unique and is the
leaving further identifications as optional quotients minimal future-completion process.
of the universal object. However, it is crucial to have more than the
How best to define the appropriate identifications chronology relation operating in what is to be a
in general is a matter of some controversy. GKP boundary; topology of some sort is needed. This is
defined a somewhat complicated topology on accomplished by defining what might be called the
M ^
= @(M)
[ @(M) [ M, then used an identification future-chronological topology for any chronological
intended to result in a Hausdorff space. There are set including for M ^ when M is a strongly causal
significant problems with this approach in some spacetime. This topology is defined by means of a
outre spacetimes, as pointed out by Budic and Sachs limit-operator L ^ on sequences: if X is the chron-
(1974) and Szabados (1989), both of whom recom- ological set, then for any sequence of points = {xn }
mended a different set of identifications. But what is ^
in X, L() denotes a subset of X which is the set of
topology reveals; but it is still the case that @(M), ^ formed of TIPs and TIFs, plus any TIP or TIF that
apart from i , is fibered by R over @(Q). cannot be paired; this produces an appropriate set of
If Q is a warped product Q = (a, b) K for a ^
identifications within @(M)
[ @(M). The chronology
compact manifold K with metric dr2 e(r) h with h relation on M is extended to M
= @(M) [ M by treating
a metric on K, then one can calculate more precisely: each point x in M as the Szabados pair (I (x), I (x)) and
if, for instance, has a minimum in the interior of each unpaired IP P as (P, ;) and unpaired IF F as (;, F),
(a, b) and has suitable growth on either end, then and then defining (P, F) (P0 , F0 ) whenever
@(Q) represents two copies of K (one for each end of F \ P0 6 ;.
(a, b) K), the future-chronological topology is the The resulting chronological set is not necessarily
same as the function-space topology, and M ^ (apart either past- or future-distinguishing, but it is (past and
from i ) is a simple product of R with Q [ @(Q): future)-distinguishing. The topology they propose
^
@(M) is precisely a null cone over two copies of K. places endpoints in @(M) for all causal curves which
This applies, for instance, to exterior Schwarzschild, are endless in M, but there may be multiple future
where K = S2 ; the boundary at one end of exterior endpoints for a single future-endless curve. The
Schwarzschild is the usual I , and the boundary at topology need not be T1 : points can fail to be closed.
the other end is the null cone {r = 2m}, where For a product spacetime M = Q L1 , the MarolfRoss
exterior attaches to interior Schwarzschild. topology on M is always the function-space topology.
Calculations for the future-chronological topology As of this writing, there is active research by J L Flores
become much easier when @(M) ^ is purely spacelike, ^
to institute a MarolfRoss type of identification of @(M)
^
that is, no P 2 @(M) is contained in the past of any
with @(M) using a topology that partakes more of the
other element of M. ^ For instance, if M is conformal future- and past-chronological topologies.
to a multiwarped product, Q1 Qm (a, b)
with metric f1 (t)2 h1 fm (t)2 hm dt2 , where hi See also: Asymptotic Structure and Conformal Infinity;
is a Riemannian metric on Qi , then @(M) ^ will be Spacetime Topology, Causal Structure and Singularities.
purely spacelike if all theR Riemannian factors are
b
complete and for each i, b 1=fi (t) dt < 1; in that
^
case, @(M) Q, where Q = Q1 Qm and
^ Q (a, b). This applies, for instance, to inter- Further Reading
M
ior Schwarzschild, where Q1 = R 1 and Q2 = S2 , Budic R and Sachs RK (1974) Causal boundaries for general relativistic
yielding the topology of R 1 S2 for the Schwarzs- space-times. Journal of Mathematical Physics 15: 13021309.
Garca-Parrado A and Senovilla JMM (2003) Causal relationship:
child singularity.
a new tool for the causal characterization of Lorentzian
There is a categorical universality for spacelike manifolds. Classical and Quantum Gravity 20: 625664.
boundaries and the future-chronological topology. Geroch RP, Kronheimer EH, and Penrose R (1972) Ideal points
This means that any other reasonable way of in space-time. Proceedings of the Royal Society of London,
future-completing interior Schwarzschild must yield Series A 327: 545567.
Harris SG (1998) Universality of the future chronological
R1 S2 or a topological quotient of that for the
boundary. Journal of Mathematical Physics 39: 54275445.
singularity; and if the result is to be past-distinguishing, Harris SG (2000) Topology of the future chronological boundary:
R1 S2 is the only possibility. universality for spacelike boundaries. Classical and Quantum
Of course, all this can be done in the time-dual Gravity 17: 551603.
fashion, using the past-chronological topology on Harris SG (2001) Causal boundary for standard static spacetimes.
It would be desirable to combine the future and Nonlinear Analysis 47: 29712981 (Special Edition: Proceed-
M.
ings of the Third World Congress in Nonlinear Analysis).
past causal boundaries with a suitable topology as Harris SG (2004a) Boundaries on spacetimes: an outline. Classical
well as appropriate identifications. There has been and Quantum Gravity 359: 6585.
some work in that direction. Harris SG (2004b) Discrete group actions on spacetimes: causality
conditions and the causal boundary. Classical and Quantum
Gravity 21: 12091236.
Causal Boundary: Revisited Harris SG and Dray T (1990) The causal boundary of the trousers
space. Classical and Quantum Gravity 7: 149161.
Marolf and Ross (2003) have proposed an identification Hawking SW and Ellis GFR (1973) The Large Scale Structure of
of TIPs and TIFs that relies on the equivalence relation Space-Time. Cambridge: Cambridge University Press.
defined by Szabados. For an IP P and IF F, call (P, F) a Marolf D and Ross SF (2003) A new recipe for causal
completions. Classical and Quantum Gravity 20: 40854118.
Szabados pair if P I (x) for all x 2 F, P is maximal Schmidt BG (1972) Local completeness of the b-boundary.
among IPs for that property, and dually for F with Communications in Mathematical Physics 29: 4954.
respect to P. For instance, for any x 2 M, (I (x), I (x)) Scott SM and Szekeres P (1994) The abstract boundary a new
is a Szabados pair. The MarolfRoss version of the approach to singularities of manifolds. Journal of Geometry
causal boundary, @(M), consists of all Szabados pairs and Physics 13: 223253.
Boundary Conformal Field Theory 333
Szabados LB (1988) Causal boundary for strongly causal space-times. Classical and Quantum Gravity 5: 121–134.
Szabados LB (1989) Causal boundary for strongly causal space-times: II. Classical and Quantum Gravity 6: 77–91.
Wald RM (1984) General Relativity. Chicago: University of Chicago Press.
= 0. It should be
can then be developed into a mathematically noted that the vanishing of the trace of the stress
consistent theory. tensor for a scale invariant classical field theory does
highest-weight representation of $\mathcal{V}$. However, this is not necessarily irreducible. There may be null
vectors, which are linear combinations of states at a given level which are themselves annihilated by all
the $L_n$ with $n > 0$. They exist whenever $h$ takes a value from the Kac table:

$$h = h_{r,s} = \frac{(r(m+1) - sm)^2 - 1}{4m(m+1)} \qquad [14]$$

with the central charge parametrized as $c = 1 - 6/(m(m+1))$, and r, s are non-negative integers. These
null states should be projected out, giving an irreducible representation $\mathcal{V}_h$.

The full Hilbert space of the CFT is then

$$\mathcal{H} = \bigoplus_{h,\bar h} n_{h,\bar h}\, \mathcal{V}_h \otimes \overline{\mathcal{V}}_{\bar h} \qquad [15]$$

where the non-negative integers $n_{h,\bar h}$ specify how many distinct primary fields of weights
$(h, \bar h)$ there are in the CFT.

The consistency of the OPE [3] with the existence of null vectors leads to the fusion algebra of the
CFT. This applies separately to the holomorphic and antiholomorphic sectors, and determines how many
copies of $\mathcal{V}_c$ occur in the fusion of $\mathcal{V}_a$ and $\mathcal{V}_b$:

$$\mathcal{V}_a \odot \mathcal{V}_b = \sum_c N_{ab}^{c}\, \mathcal{V}_c \qquad [16]$$

where the $N_{ab}^{c}$ are non-negative integers.

A particularly important subset of all CFTs consists of the minimal models. These have rational
central charge $c = 1 - 6(p - q)^2/pq$, in which case the fusion algebra closes with a finite number of
possible values $1 \le r \le q$, $1 \le s \le p$ in the Kac formula [14]. For these models, the fusion
algebra takes the form

$$\mathcal{V}_{r_1,s_1} \odot \mathcal{V}_{r_2,s_2} = {\sum_{r=|r_1-r_2|}^{r_1+r_2-1}}' \; {\sum_{s=|s_1-s_2|}^{s_1+s_2-1}}' \; \mathcal{V}_{r,s} \qquad [17]$$

where the prime on the sums indicates that they are to be restricted to the allowed intervals of r and s.

There is an important theorem which states that the only unitary CFTs with $c < 1$ are the minimal
models with $p/q = (m+1)/m$, where m is an integer $\ge 3$.

Modular Invariance

The fusion algebra limits which values of $(h, \bar h)$ might appear in a consistent CFT, but not which
ones actually occur, that is, the values of the $n_{h,\bar h}$. This is answered by the requirement of
modular invariance on the torus. First consider the theory on an infinitely long cylinder, of unit
circumference. This is related to the (punctured) plane by the conformal mapping
$z \to (1/2\pi) \ln z \equiv t + ix$. The result is a QFT on the circle $0 \le x < 1$, in imaginary
time t. The generator of infinitesimal time translations is related to that for dilatations in the plane:

$$\hat H = 2\pi \hat D - \frac{\pi c}{6} = 2\pi\big(\hat L_0 + \hat{\bar L}_0\big) - \frac{\pi c}{6} \qquad [18]$$

where the last term comes from the Schwarzian derivative in [9]. Similarly, the generator of
translations in x, the total momentum operator, is $\hat P = 2\pi(\hat L_0 - \hat{\bar L}_0)$.

A general torus is, up to a scale transformation, a parallelogram with vertices $(0, 1, \tau, 1 + \tau)$
in the complex plane, with the opposite edges identified. We can make this by taking a cylinder of unit
circumference and length $\mathrm{Im}\,\tau$, twisting the ends by a relative amount $\mathrm{Re}\,\tau$,
and sewing them together. This means that the partition function of the CFT on the torus can be written as

$$Z(\tau, \bar\tau) = \mathrm{tr}\, e^{-(\mathrm{Im}\,\tau)\hat H + i(\mathrm{Re}\,\tau)\hat P} = \mathrm{tr}\, q^{\hat L_0 - c/24}\, \bar q^{\hat{\bar L}_0 - c/24} \qquad [19]$$

using the above expressions for $\hat H$ and $\hat P$ and introducing $q \equiv e^{2\pi i \tau}$.

Through the decomposition [15] of $\mathcal{H}$, the trace sum can be written as

$$Z(\tau, \bar\tau) = \sum_{h,\bar h} n_{h,\bar h}\, \chi_h(q)\, \chi_{\bar h}(\bar q) \qquad [20]$$

where

$$\chi_h(q) \equiv \mathrm{tr}_{\mathcal{V}_h}\, q^{\hat L_0 - c/24} = \sum_{N=0}^{\infty} d_h(N)\, q^{h - c/24 + N} \qquad [21]$$

is the character of the representation of highest weight h, which counts the degeneracy $d_h(N)$ at
level N. It is purely an algebraic property of the Virasoro algebra, and its explicit form is known in
many cases.

Figure 1 Two equivalent parametrizations of the same torus.

All of this would be less interesting were it not for the observation that the parametrization of the
torus through $\tau$ is not unique. In fact, the transformations $S: \tau \to -1/\tau$ and
$T: \tau \to \tau + 1$ give the same torus (see Figure 1). Together, these
operations generate the modular group SL(2, Z), and the partition function $Z(\tau, \bar\tau)$ should be
invariant under them. T-invariance is simply implemented by requiring that $h - \bar h$ is an integer, but
the S-invariance of the right-hand side of [20]
half plane. The conformal Ward identity, cf. [7], now reads
$$\Big\langle T(z) \prod_j \phi_j(z_j, \bar z_j) \Big\rangle$$
This is conformally related to the upper-half plane (with an insertion of bcc operators at 0 and
$\infty$ if $a \ne b$) by the mapping $z \to (1/\pi) \ln z$. The generator of infinitesimal translations
along the strip is

$$\hat H_{ab} = \pi \hat D - \frac{\pi c}{24} = \pi \hat L_0 - \frac{\pi c}{24} \qquad [27]$$

Thus, for the annulus,

$$Z_{ab}(\delta) = \mathrm{tr}\, e^{-\delta \hat H_{ab}} = \mathrm{tr}\, q^{\hat L_0 - c/24} \qquad [28]$$

with $q \equiv e^{-\pi\delta}$. As before, this can be decomposed into characters:

$$Z_{ab}(\delta) = \sum_h n^{h}_{ab}\, \chi_h(q) \qquad [29]$$

but note that now the expression is linear. The non-negative integers $n^{h}_{ab}$ give the operator
content with the boundary conditions (ab): the lowest value of h with $n^{h}_{ab} > 0$ gives the
conformal weight of the bcc operator, and the others give conformal weights of the other allowed primary
fields which may also sit at this point.

On the other hand, the annulus partition function may be viewed, up to an overall rescaling, as the
path integral for a CFT on a circle of unit circumference, being propagated for (imaginary) time
$\delta^{-1}$. From this point of view, the partition function is no longer a trace, but rather the
matrix element of $e^{-\hat H/\delta}$ between boundary states:

$$Z_{ab}(\delta) = \langle a | e^{-\hat H/\delta} | b \rangle \qquad [30]$$

$$|h\rangle\rangle \equiv \sum_{N=0}^{\infty} \sum_{j=1}^{d_h(N)} |h, N; j\rangle \otimes \overline{|h, N; j\rangle} \qquad [32]$$

These are called Ishibashi states. Matrix elements of the translation operator along the cylinder
between them are simple:

$$\langle\langle h' | e^{-\hat H/\delta} | h \rangle\rangle = \sum_{N'=0}^{\infty} \sum_{j'=1}^{d_{h'}(N')} \sum_{N=0}^{\infty} \sum_{j=1}^{d_h(N)} \langle h', N'; j'| \otimes \overline{\langle h', N'; j'|}\; e^{-(2\pi/\delta)(\hat L_0 + \hat{\bar L}_0 - c/12)}\; |h, N; j\rangle \otimes \overline{|h, N; j\rangle} \qquad [33]$$

$$= \delta_{h'h} \sum_{N=0}^{\infty} \sum_{j=1}^{d_h(N)} e^{-(4\pi/\delta)(h + N - c/24)} \qquad [34]$$

$$= \delta_{h'h}\, \chi_h\big(e^{-4\pi/\delta}\big) \qquad [35]$$

Note that the characters which appear are related to those in [29] by the modular transformation S.

The physical boundary states satisfying [29], sometimes called the Cardy states, are linear
combinations of the Ishibashi states:

$$|a\rangle = \sum_h \langle\langle h | a \rangle\, |h\rangle\rangle \qquad [36]$$

Equating the two different expressions [29] and [30] for $Z_{ab}$, and using the modular transformation law
[22] and the linear independence of the characters from which one finds the allowed boundary states
gives the (equivalent) conditions:
X ~ p1 j0ii p1 1 1 1
j0i 43
nhab Shh0 hajh0 iihhh0 jbi 37 2 2 2 2 16
1=4
h0
X +
1
~ 1 1 1 1
0
hajh0 iihhh0 jbi Shh nhab 38 1
p j0ii p 1=4 44
h 2 2 2 2 2 16
These are called the Cardy conditions. The require- +
ments that the right-hand side of [37] should give a 1
~ 1
non-negative integer, and that the right-hand side of j0ii 45
16 2
[38] should factorize in a and b, give highly
nontrivial constraints on the allowed boundary
states and their operator content. The nontrivial part of the fusion algebra of this
For the diagonal CFTs considered here (and for CFT is
the nondiagonal minimal models) a complete solu-
V 161 V 161 V 0 V 12 46
tion is possible. It can be shown that the elements Sh0
of S are all non-negative, so one may choose
~ = (Sh )1=2 . This defines a boundary state
hhhj0i V 1 V1 V 1 47
0 16 2 16
X 1=2
~
j0i Sh0 jhii 39 V1 V1 V0 48
2 2
h
from which can be read off the boundary operator
and a corresponding boundary condition such that
content
nh00 = h0 . Then, for each h0 6 0, one may define a
boundary state 1 1 1
nhh~ 1 n01~ 1~ n21~ 1~ n21~ ~
1
n116~ 1
~ 1 49
hhhjh~0 i Sh0 =Sh 1=2
h 0 40 16 16 16 16 16 16 2 16
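The Cardy condition [37] and the boundary states [39] and [40] can be checked numerically for the c = 1/2 example. The following is only a minimal sketch under stated assumptions: it takes the standard modular S-matrix of the critical Ising model on the labels (0, 1/2, 1/16), which is not written out in the text, and it uses the usual Kac-table ranges 1 ≤ r ≤ m − 1, 1 ≤ s ≤ m for m = 3. The function names are hypothetical and chosen for illustration.

```python
from fractions import Fraction
from math import sqrt


def kac_weight(m, r, s):
    """Kac-table weight h_{r,s} = ((r(m+1) - s m)^2 - 1) / (4 m (m+1)), cf. [14]."""
    return Fraction((r * (m + 1) - s * m) ** 2 - 1, 4 * m * (m + 1))


def central_charge(m):
    """Central charge c = 1 - 6 / (m (m+1))."""
    return 1 - Fraction(6, m * (m + 1))


# m = 3 is the c = 1/2 minimal model; the Kac table (assumed ranges) gives 0, 1/16, 1/2.
M = 3
H0, H16, H2 = Fraction(0), Fraction(1, 16), Fraction(1, 2)
WEIGHTS = sorted({kac_weight(M, r, s) for r in range(1, M) for s in range(1, M + 1)})

# Assumption: the standard Ising modular S-matrix (real and symmetric).
R = 1 / sqrt(2)
S = {
    H0: {H0: 0.5, H2: 0.5, H16: R},
    H2: {H0: 0.5, H2: 0.5, H16: -R},
    H16: {H0: R, H2: -R, H16: 0.0},
}


def cardy_coefficient(a, h):
    """Coefficient <<h|a~> = S_a^h / (S_0^h)^(1/2), following [39] and [40]."""
    return S[a][h] / sqrt(S[H0][h])


def operator_content(a, b):
    """Multiplicities n^h_ab from [37]: sum over h' of S_h^h' <a|h'>> <<h'|b>."""
    content = {}
    for h in WEIGHTS:
        value = sum(
            S[h][hp] * cardy_coefficient(a, hp) * cardy_coefficient(b, hp)
            for hp in WEIGHTS
        )
        nearest = round(value)
        # The Cardy condition requires a non-negative integer here.
        if abs(value - nearest) > 1e-9 or nearest < 0:
            raise ValueError(f"n^{h}_({a},{b}) = {value} is not a non-negative integer")
        content[h] = nearest
    return content


if __name__ == "__main__":
    print("c =", central_charge(M))
    for a in WEIGHTS:
        for b in WEIGHTS:
            print(f"a = {a}, b = {b}:", operator_content(a, b))
```

With these inputs, the boundary condition labelled 1/16 on both sides gives multiplicity one for h = 0 and h = 1/2 and zero for h = 1/16, in line with the fusion rule [46]; every other pair also returns non-negative integers, as the Cardy condition requires.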
where the first term is the usual extensive contribution. The other two pieces
$s_a \equiv \ln(\langle a | 0 \rangle)$ and $s_b \equiv \ln(\langle b | 0 \rangle)$ may be identified as
the boundary entropy associated with the corresponding boundary states. A similar definition may be made
in massive QFTs. It is an unproven but well-verified conjecture that the boundary entropy is a
nonincreasing function along boundary RG flows, and is stationary only for conformal boundary states.

Bulk–Boundary OPE

The boundary Ward identity [24] has the implication that, from the point of view of the dependence of
its correlators on $z_j$ and $\bar z_j$, a primary field $\phi_j(z_j, \bar z_j)$ may be thought of as the
product of two local fields which are holomorphic functions of $z_j$ and $\bar z_j$, respectively. These
will satisfy OPEs as $|z_j - \bar z_j| \to 0$, with the appearance of primary fields on the right-hand
side being governed by the fusion rules. These fields are localized on the real axis: they are the
boundary operators. There is therefore a kind of bulk–boundary OPE:

$$\phi_j(z_j, \bar z_j) = \sum_k d_{jk}\, (\mathrm{Im}\, z_j)^{-h_j - \bar h_j + h_k}\, \phi^{b}_k(\mathrm{Re}\, z_j) \qquad [52]$$

where the sum on the right-hand side is, in principle, over all the boundary fields consistent with the
boundary condition, and the coefficients $d_{jk}$ are analogous to the OPE coefficients in the bulk. As
before, they are nonvanishing only if allowed by the fusion algebra: a boundary field of conformal weight
$h_k$ is allowed only if $N^{h_k}_{h_j \bar h_j} > 0$.

For example, in the $c = \frac{1}{2}$ CFT, the bulk operator with $h = \bar h = \frac{1}{16}$ goes over
into the boundary operator with $h = 0$, or that with $h = \frac{1}{2}$, depending on the boundary
condition. The bulk operator with $h = \bar h = \frac{1}{2}$, however, can only go over into the identity
boundary operator with $h = 0$ (or a descendent thereof).

The fusion rules also apply to the boundary operators themselves. The consistency of these with
bulk–boundary and bulk–bulk fusion rules, as well as the modular properties of partition functions, was
examined by Lewellen.

Extended Algebras

CFTs may contain other conserved currents apart from the stress tensor, which generate algebras
(Kac–Moody, superconformal, W-algebras) which extend the Virasoro algebra. In BCFT, in addition to the
conformal boundary condition, it is possible (but not necessary) to impose further boundary conditions
relating the holomorphic and antiholomorphic parts of the other currents on the boundary. It is believed
that all rational CFTs can be obtained from Kac–Moody algebras via the coset construction. The
classification of boundary conditions from this point of view is fruitful and also important for
applications, but is beyond the scope of this article.

Stochastic Loewner Evolution

In recent years, there has emerged a deep connection between BCFT and conformally invariant measures on
curves in the plane which start at a boundary of a domain. These arise naturally in the continuum limit
of certain statistical mechanics models. The measure is constructed dynamically as the curve is extended,
using a sequence of random conformal mappings called stochastic Loewner evolution (SLE). In CFT, the
point where the curve begins can be viewed as the insertion of a boundary operator. The requirement that
certain quantities should be conserved in the mean under the stochastic process is then equivalent to
this operator having a null state at level two. Many of the standard results of CFT correspond to an
equivalent property of SLE.

Acknowledgments

This article was written while the author was a member of the Institute for Advanced Study. He thanks
the School of Mathematics and the School of Natural Sciences for their hospitality. The work was
supported by the Ellentuck Fund.

See also: Affine Quantum Groups; Eight Vertex and Hard Hexagon Models; Indefinite Metric; Operator
Product Expansion in Quantum Field Theory; Quantum Phase Transitions; Stochastic Loewner Evolutions;
String Field Theory; Superstring Theories; Symmetries in Quantum Field Theory: Algebraic Aspects;
Two-Dimensional Conformal Field Theory and Vertex Operator Algebras.

Further Reading

Affleck I (1997) Boundary condition changing operators in conformal field theory and condensed matter physics. Nuclear Physics B Proceedings Supplement 58: 35.
Cardy J (1984) Conformal invariance and surface critical behavior. Nuclear Physics B 240: 514–532.
Cardy J (1989) Boundary conditions, fusion rules and the Verlinde formula. Nuclear Physics B 324: 581.
di Francesco P, Mathieu P, and Sénéchal D (1999) Conformal Field Theory. New York: Springer.
Kager W and Nienhuis B (2004) A guide to stochastic Loewner evolution and its applications. Journal of Statistical Physics 115: 1149.
Lawler G (2005) Conformally Invariant Processes in the Plane. American Mathematical Society.
Lewellen DC (1992) Sewing constraints for conformal field theories on surfaces with boundaries. Nuclear Physics B 372: 654.
Petkova V and Zuber JB Conformal Boundary Conditions and What They Teach Us, Lectures given at the Summer School and Conference on Nonperturbative Quantum Field Theoretic
340 Boundary Control Method and Inverse Problems of Wave Propagation
Methods and their Applications, August 2000, Budapest, Hungary, hep-th/0103007.
Verlinde E (1988) Fusion rules and modular transformations in 2D conformal field theory. Nuclear Physics B 300: 360.
Werner W Random Planar Curves and Schramm–Loewner Evolutions, Springer Lecture Notes (to appear), math.PR/0303354.
hAir : fx 2 j dx; A
rg; r0
T
the hypersurfaces := {x 2 j d(x, ) = T}, T > 0
= T*
are equidistant to . In terms of the dynamics of
=T
the system, the value T
c
() T
T : minfT > 0 j hiT g max d ; ()
x
*
* T
=0
means the time needed for waves, moving from
with the unit speed, to fill . Figure 1 Manifold and pattern. (Data from Belishev (1997).)
Controllability
(closure in H H). By virtue of the relation
Open subsets and ! determine the LT Wbd T T
f = g Wbd f following from the wave
subspaces equation [1] and [6], the operator LT is interpreted
as Laplacian on waves filling the subdomain hiT .
F T : ff 2 F T j supp f 0; Tg
In the case T > T , one has hiT = , cl U T = H,
GT! : fh 2 GT j supp h ! 0; Tg and LT is a densely defined operator in H, satisfying
LT L. Using [7], one proves the equality LT = L.
of controls acting from and !, respectively. In view
This equality and representation [4] imply that
of hyperbolicity of the problem [1][3], the relation
Z r h i
supp u f ;h ; t hit [ h!it ; t0 6 r
Wvol h LT 1=2 sin r tLT 1=2 h;tdt 9
0
holds for f 2 F T and h 2 GT! . This means that the
waves propagate in with the speed = 1. for all r 0 and any fixed T > T .
for all 2 , 0, r 0. As a result, since any x 2 This sample is isometric to the original (, d) by
can be represented as x = x(, ), one attaches to every construction. Identifying properly the boundaries @
point of the manifold a family of expanding subspaces and , one turns (, d) into a canonical representa-
{W r(,) jr 0} built out of waves. As is seen from [20], tive of the class of equivalent manifolds possessing
the family is determined by the point x (not dependent the given inverse data.
on the representation x = x(,)); the subspaces which If the response operator R2T is given for a fixed
it consists of coincide with Hhxir . T < T , the above procedure produces the wave
Expressing the distance as copy of the submanifold (hiT , d). This locality in
time is an intrinsic feature and advantage of the BC
dx0 ; x00 2 inf fr > 0 j Hhx0 ir \ Hhx00 ir 6 f0gg method: longer time of observation on increases
in accordance with [20], one can represent the depth of penetration into .
In view of [21], one has d(x0 , x00 ) = d(x0 , x00 ), so defined on [0, T ] is called the image of y. The
that the metric space (, d) is an isometric copy amplitude formula represents the images of waves
of (, d) by construction. Thus, the correspondence initiated by boundary controls in the form
x 7! x (point 7! family) is an isometry and
satisfies the general principles (i)(iii) of u f ;0g
; T; T
lim Wbd T
I P Wbd f ; t
t!T 0
coordinatization. 0< <T
The manifold (, d) is the end product of the
wave coordinatization. It represents the original where I is the identity operator and P is the
manifold as a collection of infinitesimal sources projection in H onto clWbd
F . The formula is
interacting with each other via the waves which they derived by the ray method going back to
produce. J Hadamard, the derivation uses the controllability
[7].
Solving Inverse Problems Any model determines the right-hand side of the
T
last relation by the isometry: (Wbd ) (I P )
The motivation for the above coordinatization is T T T T
Wbd = (Wbd ) (I P )Wbd , where Wbd = UWbd T
, I is
that the wave copy can be reproduced via any
the identity operator, and P = UP U is the projec-
model. Namely, the external observer with the
tion in H~ onto cl Wbd F . This leads to the
d)
knowledge of or R2T (T > T ) can recover (, representation
up to isometry by the following procedure:
uf ;0g
; T; ~ T ~I P
lim W ~ W
~ T f ; t
1. Construct the model corresponding to the given t!T 0
bd bd
inverse data and determine the operators Wbd ,
0< <T 23
0 T by [13], [15]; then determine
T r
L, L , and Wvol by [14] or [16], [17]. and makes the amplitude formula a useful tool for
2. Replace on the right-hand side of [19] all solving the inverse problems. The external observer
operators W without tildes by the ones with can construct a model via inverse data and then
r
tildes, and get the subspaces W~ (,) = UW r(,) , visualize by [23] the wave images on the part T of
2 , 0, r 0. the pattern (see Figure 1). The collection of images
r g
3. Gather all nonzero families {W~ (,) jr 0} = : x in the uf ,0 corresponding to all possible controls f is rich
set = {x} and redenote the subspaces as enough for recovering the tensor g on T (i.e., the
r r
~
W x := W~ (,) 2 x; endow the set with the metric metric tensor in semigeodesic coordinates) and
r r
d(x0 , x00 ):= 2 inf{r > 0 j W~ x0 \ W~ x00 6 {0}} (see [22]), turning the pattern into an isometric copy of the
and get a sample (, d) of the wave copy (, d). submanifold (hiT , d). This variant of the method is
more appropriate if one needs to recover unknown coefficients of the wave equation in Ω; it can be realized in terms of numerical algorithms.

Extensions of the Method

Electromagnetic waves are also well suited for coordinatization and for constructing the wave copy $(\tilde\Omega, \tilde d)$. An appropriate version of the amplitude formula also exists for the system governed by the Maxwell equations (see Further Reading). At present (2004), the applicability of the BC method to three-dimensional inverse problems of elasticity theory is still an open question. The following hypothesis concerns the Lamé system: the wave coordinatization procedure (steps 1–3) using the elastic waves instead of the above $u^{f,0}$ gives rise to the copy of $\mathbb{R}^3$ endowed with the metric $|dx|^2/c_p^2$, where $c_p = \sqrt{(\lambda + 2\mu)/\rho}$ is the speed of the pressure waves.

The concept of model is used for solving inverse problems for the heat and Schrödinger equations (Avdonin and Belishev, 1995–2004), as well as for the problem of boundary data continuation (Belishev 2001, Kurylev and Lassas 2002). A variant of the BC method allows one to recover not only the manifold but also the Schrödinger type operators on it and/or the dissipative term in the scalar wave equation (Kurylev and Lassas 1993–2003).

An appropriate version of the amplitude formula solves the inverse problem for a one-dimensional two-velocity dynamical system which describes waves consisting of two modes propagating with different speeds and interacting with each other (Belishev, Blagoveschenskii, Ivanov, 1997–2000).

One more variant of coordinatization, going back to the first paper on the BC method, associates with points x ∈ Ω the Dirac measures $\delta_x$; then, their images $\tilde\delta_x$ are identified via suitable models. This variant solves inverse problems on graphs and the two-dimensional elliptic Calderon problem. The reader is referred to articles by the present author listed in Further Reading.

Within the scope of the method, one derives some natural analogs of the classical Gelfand–Levitan–Krein–Marchenko equations (Belishev, 1987–2001). Also, an appropriate analog solves the kinematic inverse problem for a class of two-dimensional manifolds (Pestov 2004).

There exists an abstract version of the approach, embedding the BC method into the framework of linear system theory (Belishev 2001). The method is also related to the problem of triangular factorization of operators (Belishev and Pushnitski 1996).

Numerical algorithms for solving two-dimensional spectral and dynamical inverse problems for the wave equation $\rho u_{tt} - \Delta u = 0$, which recover the variable density $\rho$, have been developed and tested (Filippov, Gotlib, Ivanov, 1994–1999).

See also: Dynamical Systems and Thermodynamics; Geophysical Dynamics; Inverse Problem in Classical Mechanics.

Further Reading

Belishev MI (1988) On an approach to multidimensional inverse problems for the wave equation. Soviet Mathematics. Doklady 36(3): 481–484.
Belishev MI (1996) Canonical model of a dynamical system with boundary control in the inverse problem of heat conductivity. St. Petersburg Mathematical Journal 7(6): 869–890.
Belishev MI (1997) Boundary control in reconstruction of manifolds and metrics. Inverse Problems 13(5): R1–R45.
Belishev MI (2001) Dynamical systems with boundary control: models and characterization of inverse data. Inverse Problems 17: 659–682.
Belishev MI (2002) How to see waves under the Earth surface (the BC-method for geophysicists). In: Kabanikhin SI and Romanov VG (eds.) Ill-Posed and Inverse Problems, pp. 67–84. Utrecht/Boston: VSP.
Belishev MI (2003) The Calderon problem for two-dimensional manifolds by the BC-method. SIAM Journal of Mathematical Analysis 35(1): 172–182.
Belishev MI (2004) Boundary spectral inverse problem on a class of graphs (trees) by the BC-method. Inverse Problems 20(3): 647–672.
Belishev MI and Glasman AK (2001) Dynamical inverse problem for the Maxwell system: recovering the velocity in the regular zone (the BC-method). St. Petersburg Mathematical Journal 12(2): 279–319.
Belishev MI and Gotlib VYu (1999) Dynamical variant of the BC-method: theory and numerical testing. Journal of Inverse and Ill-Posed Problems 7(3): 221–240.
Belishev MI, Isakov VM, Pestov LN, and Sharafutdinov VA (2000) On reconstruction of metrics from external electromagnetic measurements. Russian Academy of Sciences. Doklady. Mathematics 61(3): 353–356.
Belishev MI and Ivanov SA (2002) Characterization of data of dynamical inverse problem for two-velocity system. Journal of Mathematical Sciences 109(5): 1814–1834.
Belishev MI and Lasiecka I (2002) The dynamical Lame system: regularity of solutions, boundary controllability and boundary data continuation. ESAIM COCV 8: 143–167.
Katchalov A, Kurylev Y, and Lassas M (2001) Inverse Boundary Spectral Problems. Chapman and Hall/CRC Monographs and Surveys in Pure and Applied Mathematics, vol. 123. Boca Raton, FL: Chapman and Hall/CRC.
346 Boundary-Value Problems for Integrable Equations
Because of this simple fact, a straightforward application of the ideas of the inverse scattering transform immediately encounters one crucial difficulty. This transform method yields an integral representation of the solution which involves not only the given boundary conditions f(t), but also the other unknown boundary values: in our example for the NLS equation, the function q_x(0, t). The problem of characterizing these unknown boundary values has impeded progress in this direction for over thirty years.

On account of their physical significance, various boundary-value problems for the KdV equation have been considered, and classical PDE techniques (not specific to integrable models) have been used to establish existence and uniqueness results (Bona et al. 2001, Colin and Ghidaglia 2001, Colliander and Kenig 2001). These approaches, and in particular the approach of Colliander and Kenig, are quite general and possibly of wide applicability, and give global existence results in wide functional classes. However, they do not rely on integrability properties. Indeed, none of these results use the integrable structure of the equation in any fundamental or systematic way. However, the fact that these equations are integrable on the full line implies very special properties that should be exploited in the analysis, and it is natural to try to generalize the inverse scattering transform approach.

Such a generalization is sometimes directly possible. For example, it has been used for studying the problem on the half-line for the hyperbolic version of the sG equation [4a], which does not involve unknown boundary values (Fokas 2000, Pelloni). It has also been used to study some specific boundary-value problems for the NLS equation, for example, for homogeneous Dirichlet or Neumann conditions, when it is possible to use even or odd extensions of the problem to the full line (Ablowitz and Segur 1974), or more recently in Degasperis et al. (2001). In the latter case, however, the unknown boundary values are characterized through an integral Fredholm equation, which does not admit a unique solution. Some special cases of boundary-value problems for the KdV equation (Adler et al. 1997, Habibullin 1999) and elliptic sG (Sklyanin 1987) have also been studied via the inverse scattering transform. However, all the examples considered are nongeneric, and it has recently been shown (Fokas, in press) that the boundary conditions chosen fall in the special class of the so-called linearizable boundary conditions, for which the problem can be solved as if it were posed on the full line. One cannot hope to use similar methods to solve the problem with generic boundary conditions.

Recently, Fokas (2000) introduced a general methodology to extend the ideas of the inverse scattering transform to boundary-value problems. This methodology provides the tools to analyze boundary-value problems for integrable equations to a considerable degree of generality. We note as a side remark that linear PDEs are trivially integrable, in the sense of admitting a Lax pair (in this case the Lax pair can be found algorithmically, while the construction of the Lax pair associated with a nonlinear equation is by no means trivial). As a consequence of this remark, the extension of the inverse scattering transform also provides a method for solving boundary-value problems for a large variety of linear PDEs of mathematical physics.

What follows is a general description of the approach of Fokas, considering, for the sake of concreteness, the case of an integrable PDE in the two variables (x, t) which vary in the domain D (typically, for an evolution problem D = (0, ∞) × (0, T)). We assume that q(x, t) denotes the unique solution of a boundary-value problem posed for such an equation.

The method consists of the following steps.

1. Write the PDE as the compatibility condition of a Lax pair. This is a pair of linear ODEs for the function μ = μ(x, t, k) involving the solution q(x, t) of the PDE, the derivatives of this solution, and a complex parameter k, called the spectral parameter. This can be done algorithmically for linear PDEs, and in this case μ(x, t, k) is a scalar function. For nonlinear integrable PDEs, μ(x, t, k) is in general a matrix-valued function.
The equivalence of the PDE with a Lax pair can be reformulated in the language of differential forms, and in this language it is easier to describe the methodology in general. Assume then that W(x, t, k) is a differential 1-form expressed in terms of a function q(x, t) and its derivatives, and of a complex variable k, and one which is characterized by the property that dW = 0 if and only if q(x, t) satisfies the given PDE. The closure of the form W yields the two important consequences 2(a) and 2(b) below.
2. (a) Since the domain D under consideration is simply connected, the closed form W is also exact; hence, it is possible to find the particular 0-form μ(x, t, k) solving dμ = W. In particular, μ(x, t, k) can be chosen to be sectionally bounded with respect to k by solving either a Riemann–Hilbert problem or a d-bar problem in the complex spectral k plane, and the solution μ(x, t, k) is then expressed in terms of certain spectral functions depending on all the boundary values
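As an illustration of step 1 in the linear case, here is a hand-worked sketch for the linear Schrödinger equation $i q_t + q_{xx} = 0$. It is not taken from the article: the symbols $\mu$ and $W$ follow the usual notation of the Fokas method (an assumption wherever the extracted text lost them), and the closed form is written with an explicit integrating factor, which may differ from the normalization used elsewhere in the article.

```latex
% Lax pair (two linear ODEs for the scalar function \mu(x,t,k)):
\mu_x - ik\mu = q, \qquad \mu_t + ik^2\mu = i q_x - k q .

% Compatibility: differentiate the first equation in t and the second in x.
\mu_{xt} = ik\mu_t + q_t = k^3\mu + ik\,(i q_x - k q) + q_t ,
\qquad
\mu_{tx} = -ik^2\mu_x + i q_{xx} - k q_x = k^3\mu - ik^2 q + i q_{xx} - k q_x .

% Hence \mu_{xt} = \mu_{tx} if and only if q_t = i q_{xx},
% that is, i q_t + q_{xx} = 0.

% Equivalent differential-form statement:
W = e^{-ikx + ik^2 t}\bigl[\, q\,dx + (i q_x - k q)\,dt \,\bigr],
\qquad
dW = e^{-ikx + ik^2 t}\,\bigl(i q_{xx} - q_t\bigr)\,dx \wedge dt ,
\qquad
d\bigl(e^{-ikx + ik^2 t}\mu\bigr) = W .
```

The form $W$ is closed exactly when $q$ solves the PDE, which is the property the general method relies on. For a nonlinear equation the same pattern holds with a matrix-valued $\mu$, but the pair can no longer be written down by this simple matching of coefficients.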
x ik3 Qx; 0; k; 0 < x < 1; Im k 0 Theorem 1 Consider the boundary-value problem
for the NLS equation [1] determined by the conditions
0 [7]. Let a(k), b(k) be given by [8], and suppose that
x; k eikx o1 as x ! 1
1 there exists a function g1 (t) such that if A(k), B(k) are
0 q0 x defined by [9], then the global relation [8] holds.
Qx; 0; k Let M(x, t, k) be the solution of the 2 2
q0 x 0
RiemannHilbert problem with jump on the real
and imaginary axes given by
(3 and Q(x, t, k) are defined after eqns [5] and [6],
respectively). M (x, t, k) = M (x, t, k)J(x, t, k) with M = M in
Given q0 (x) and g0 (t) characterize g1 (t) by the the second and fourth quadrants of C, M = M in the
requirement that the spectral functions first and third quadrants of C, and J(x, t, k) is defined
2
{A(t, k), B(t, k)} satisfy the global relation in terms of a, b, A, B and the exponential eikx2ik t :
M = I O(1=k) as k ! 1 and has appropriate
2 ct; k
Bt; k RkAt; k e4ik t
residue conditions if there are poles
ak Then M(x,t,k) exists and is unique, and
8
bk
Rk ; t 2 0; T; k 2 D qx; t 2i lim kMx; t; k12
ak k!1
The result above relies on characterizing the unknown boundary value g1(t) a priori by requiring that the global relation hold. Recently, substantial progress has been made in this direction in the case of integrable nonlinear evolution equations, in particular of NLS. Namely, Fokas (in press) contains an effective description of the map assigning to each given q(x, 0) = q0(x) and g0(t) = q(0, t) a unique value for q_x(0, t) (called the Dirichlet to Neumann map) for the NLS, as well as for a version of the Korteweg–deVries and sG equations. We state below the relevant theorem for the case of the NLS equation.

Theorem 2 Let q(x, t) satisfy the NLS equation on the half-line 0 < x < ∞, t > 0 with the initial and boundary conditions [7]. Then g1(t) := q_x(0, t) is given by

\[
g_1(t) = \frac{g_0(t)}{\pi}\int_{\partial D} e^{-2ik^2 t}\,\bigl(\Phi_2(t,k) - \Phi_2(t,-k)\bigr)\,dk
+ \frac{4i}{\pi}\int_{\partial D} e^{-2ik^2 t}\,k\,R(k)\,\overline{\Phi_2(t,-\bar k)}\,dk
+ \frac{2i}{\pi}\int_{\partial D} e^{-2ik^2 t}\,\bigl(k\,[\Phi_1(t,k) - \Phi_1(t,-k)] + i g_0(t)\bigr)\,dk
\]

with Φ = (Φ1, Φ2) given by the solution of [10]. The Neumann datum g1(t) is unique and exists globally in t.

This result yields a rigorous proof of the global existence of the solution of boundary-value problems on the half-line for the NLS equation. Therefore, the assumption in Theorem 1 that a suitable function g1(t) exists can be dropped.

Generalizations and Summary of Results

Results analogous to the ones presented in the previous section can be phrased exclusively in terms of integral equations rather than in terms of Riemann–Hilbert problems, as done for example in Khruslov and Kotlyarov (2003). This is the point of view of the school of Gelfand and Marchenko, and in this setting the functions are given in the so-called Gelfand–Levitan–Marchenko representation. Results on boundary-value problems for the NLS equation using this representation have been obtained only under additional assumptions on the unknown part of the boundary values. It was only after the idea that the x- and t-parts of the spectral equations should be treated simultaneously that this approach yielded complete results. However, the Gelfand–Levitan–Marchenko representation yields a crucial simplification for deriving the explicit form of the Dirichlet to Neumann map and proving Theorem 2. This representation has now been derived for all equations [1]–[3], see Fokas (in press).

The analysis of the invariance properties of the global relation with respect to k also yields the characterization of all the boundary conditions for which the transform obtained to represent the solution linearizes. For these boundary conditions, called linearizable, the solution can be represented as effectively as for the Cauchy problem. For example, the linearizable boundary conditions for the NLS equation are given by any boundary values that satisfy

\[
g_0(t)\,\overline{g_1(t)} - \overline{g_0(t)}\,g_1(t) = 0 .
\]

An example of a boundary condition satisfying this constraint, encompassing also Dirichlet and Neumann homogeneous conditions, is q(0, t) − χ q_x(0, t) = 0, with χ a non-negative constant.

As mentioned at the beginning of the previous section, the approach described in general can be used to obtain results similar to those given for the NLS equation for many other integrable evolution equations, in particular, mKdV (Boutet de Monvel et al. 2004), sG, and KdV (Fokas 2002). The results obtained are essentially the same as for NLS, starting from the general form [5] of the Lax pair, and include the derivation of the solution representation, the complete characterization of linearizable boundary conditions, and the analysis of the Dirichlet to Neumann map.

The approach above can also be used for studying boundary-value problems posed on finite domains, for x ∈ [0, 1]. This has been done for a model for transient stimulated Raman scattering (Fokas and Menyuk 1999), for the sG equation in light-cone coordinates (Pelloni, in press), and for the NLS equation (Fokas and Its 2004). In this case also the method yields a representation of the solution which is suitable for asymptotic analysis. In this respect, the question of soliton generation from boundary data is of some importance, and has been recently considered by various authors (Fokas and Menyuk 1999, Boutet de Monvel and Kotlyarov 2003, Pelloni in press, Boutet de Monvel et al. 2004). The results are, however, still considered case by case, and no general framework for this problem has been identified yet. For problems on the half-line, solitons may be generated, but not necessarily in correspondence with the singularities that generate solitons for the full-line problem, even when the same singularities are present. For problems posed on finite domains, in some specific cases, at least for stimulated Raman scattering and the sG equation, it appears that the dominant asymptotic behavior is given by a similarity solution.
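Returning to the linearizable constraint for the NLS equation stated above, a quick numerical sanity check shows why a Robin-type condition with a real constant satisfies it. This is a minimal sketch under stated assumptions: the constraint is read as g0 times the conjugate of g1 minus the conjugate of g0 times g1, the Robin relation is taken as g0 = chi * g1 with chi real, and the names `linearizable_defect`, `chi`, `robin_defects` and `generic_defect` are hypothetical.

```python
import random


def linearizable_defect(g0: complex, g1: complex) -> complex:
    """Left-hand side of the constraint g0 * conj(g1) - conj(g0) * g1."""
    return g0 * g1.conjugate() - g0.conjugate() * g1


# Assumption: Robin condition q(0,t) = chi * q_x(0,t) with a real constant chi.
chi = 0.7
random.seed(0)
g1_samples = [complex(random.uniform(-1, 1), random.uniform(-1, 1))
              for _ in range(5)]
robin_defects = [abs(linearizable_defect(chi * g1, g1)) for g1 in g1_samples]

# A generic pair of boundary values does not satisfy the constraint.
generic_defect = abs(linearizable_defect(1 + 2j, 3 - 1j))

print(max(robin_defects), generic_defect)
```

The defect equals twice the imaginary part of g0 times the conjugate of g1, multiplied by i, so it vanishes exactly when the ratio of the two boundary values is real. The sign chosen in the Robin relation therefore does not affect the conclusion.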
Braided and Modular Tensor Categories 351
In conclusion, the extension of the inverse scattering transform given by Fokas provides the tool for analyzing boundary-value problems specific to nonlinear integrable equations. This tool relies, in an essential way, on the integrability structure of the problem, and yields a full characterization of the solution as well as uniqueness and existence results. The solution representation thus obtained is not always fully explicit, but it is always suitable for asymptotic analysis using standard techniques such as the recent nonlinearization of the classical steepest descent method.

See also: ∂̄ Approach to Integrable Systems; Integrable Discrete Systems; Integrable Systems and the Inverse Scattering Method; Integrable Systems: Overview; Nonlinear Schrödinger Equations; Riemann–Hilbert Methods in Integrable Systems; Separation of Variables for Differential Equations; Sine-Gordon Equation.

Further Reading

Ablowitz MJ and Segur HJ (1974) The inverse scattering transform: semi-infinite interval. Journal of Mathematical Physics 16: 1054.
Adler VE, Gurel B, Gurses M, and Habibullin IT (1997) Journal of Physics A 30: 3505.
Bona J, Sun S, and Zhang BY (2001) A non-homogeneous boundary value problem for the Korteweg–deVries equation. Transactions of the American Mathematical Society 354: 427–490.
Boutet de Monvel A, Fokas AS, and Shepelsky D (2004) The modified KdV equation on the half-line. Journal of the Institute of Mathematics of Jussieu 3: 139–164.
Boutet de Monvel A and Kotlyarov VP (2003) Generation of asymptotic solitons of the nonlinear Schrödinger equation by boundary data. Journal of Mathematical Physics 44: 3185–3215.
Colin T and Ghidaglia J-M (2001) An initial-boundary value problem for the Korteweg–deVries equation posed on a finite interval. Advanced Differential Equations 6(12): 1463–1492.
Colliander JE and Kenig CE (2001) The generalized Korteweg–deVries equation on the half line (http://arxiv.org/abs/math.AP/0111294).
Degasperis A, Manakov S, and Santini PM (2001) The nonlinear Schrödinger equation on the half line. JETP Letters 74(10): 481–485.
Fokas AS (2000) On the integrability of linear and nonlinear PDEs. Journal of Mathematical Physics 41: 4188.
Fokas AS (2002) Integrable nonlinear evolution equations on the half line. Communications in Mathematical Physics 230: 1–39.
Fokas AS (2005) A generalised Dirichlet to Neumann map for certain nonlinear evolution PDEs. Communications on Pure and Applied Mathematics 58: 639–670.
Fokas AS and Its AR (2004) The nonlinear Schrödinger equation on the interval. Journal of Physics A: Mathematical and General 37: 6091–6114.
Fokas AS and Menyuk CR (1999) Integrability and self-similarity in transient stimulated Raman scattering. Journal of Nonlinear Science 9: 1–31.
Gardner GS, Greene JM, Kruskal MD, and Miura RM (1967) Method for solving the Korteweg–de Vries equation. Physical Review Letters 19: 1095.
Habibullin IT (1999) KdV equation on a half-line with the zero boundary condition. Theoretical and Mathematical Fizika 119: 397.
Khruslov E and Kotlyarov VP (2003) Generation of asymptotic solitons in an integrable model of stimulated Raman scattering by periodic boundary data. Mat. Fiz. Anal. Geom. 10(3): 366–384.
Lax PD (1968) Integrals of nonlinear equations of evolution and solitary waves. Communications in Pure and Applied Mathematics 21: 467–490.
Pelloni B (2005) The asymptotic behaviour of the solution of boundary value problems for the Sine–Gordon equation on a finite interval. Journal of Nonlinear Mathematical Physics 12: 518–529.
Sklyanin EK (1987) Boundary conditions for integrable equations. Functional Analysis and its Applications 21: 86–87.
Zakharov VE and Shabat AB (1972) An exact theory of two-dimensional self-focusing and one-dimensional automodulation of waves in a nonlinear medium. Soviet Physics JETP 34: 62–78.
It means that a canonical braiding isomorphism c : X ⊗ Y → Y ⊗ X still exists, but it is not involutive any more, c² ≠ id. The braiding c satisfies the Yang–Baxter equation

\[
(c \otimes 1)(1 \otimes c)(c \otimes 1) = (1 \otimes c)(c \otimes 1)(1 \otimes c) : X \otimes Y \otimes Z \to Z \otimes Y \otimes X .
\]

realized as categories of modules over weak Hopf algebras, but we stress again that the monoidal product for such modules does not coincide with the tensor product of vector spaces. So, general features are better seen at the level of category theory, and we now start with precise definitions.
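The difference between an involutive symmetry and a genuine braiding that still satisfies the Yang–Baxter equation can be checked on 4×4 matrices. The sketch below is an illustration under assumptions, not part of the article: `flip` is the ordinary swap of tensor factors on a two-dimensional space, and `braid` is the well-known braid matrix of the fundamental representation of U_q(sl_2) in one common normalization, with an arbitrarily chosen real q. All function and variable names are hypothetical.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def kron(a, b):
    """Kronecker product: the matrix of the tensor product of a and b."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def close(a, b, tol=1e-12):
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def yang_baxter_holds(c):
    """Compare (c x 1)(1 x c)(c x 1) with (1 x c)(c x 1)(1 x c) on three factors."""
    one = identity(2)
    c12, c23 = kron(c, one), kron(one, c)
    lhs = matmul(matmul(c12, c23), c12)
    rhs = matmul(matmul(c23, c12), c23)
    return close(lhs, rhs)


q = 1.5                      # assumption: any real q other than +1 or -1
a = q - 1.0 / q
flip = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
braid = [[q, 0, 0, 0], [0, 0, 1, 0], [0, 1, a, 0], [0, 0, 0, q]]

flip_is_involutive = close(matmul(flip, flip), identity(4))
braid_is_involutive = close(matmul(braid, braid), identity(4))
print(yang_baxter_holds(flip), yang_baxter_holds(braid))
print(flip_is_involutive, braid_is_involutive)
```

Both matrices pass the Yang–Baxter check, but only the flip squares to the identity. Setting q = 1 makes `braid` coincide with `flip`, which is the symmetric case the text contrasts with.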
conformal field theories (RCFTs), integrable models commutes (the pentagon equation) and
of statistical mechanics and topological quantum
field theories (TQFTs). The common feature of Xl Y r 1
X
Y
aX;1;Y X 1 Y ! X Y ! X 1 Y
these categories is that they are semisimple abelian
with finite number of simple modules. In other
words, such a category C is equivalent to the category Definition 2 A monoidal functor (F, , f ) : (C, ) !
of finite-dimensional Cn = C C-modules for (D, ) is a functor F : C ! D, a functorial isomorph-
some n. However, not monoidally equivalent, the ism = X, Y : F(X) F(Y) ! F(X Y) 2 D, and an
monoidal structure can be rather involved. For isomorphism f : 1 ! F1 2 D such that
instance, from the Ising model one can obtain the 1
monoidal category with two simple objects I and X, FX FY FZ ! FX FY Z ! FX Y Z
which obey the monoidal law 1 1 = 1, 1 X = X a# #Fa
1 = X, X X = 1 X. Clearly, such relations cannot 1
be satisfied by finite-dimensional C-vector spaces 1 FX FY FZ ! FX Y Z ! FX Y Z
and X, if would mean the usual tensor product C
F1 FX ! F1 X FX F1 ! FX 1
of C-vector spaces. However, here means simply a
functor : C C ! C with certain properties. Cate- f 1 " # F l; 1f " #Fr
gories which come from RCFT, integrable models or 1 FX l FX FX 1 r FX
^
_ _
which induces an isomorphism jX, Y : Y X ! (X t
f = f
Y)_ , such that the above pairing coincides with
Y
_ _ 1j _ ev
X
X Y Y X ! X Y X Y ! 1 X
The equation
We have a monoidal self-equivalence of C,
coevY _ _
coevXY 1 ! Y Y Y 1 Y __ ; j2 : C; ; 1 ! C; ; 1; X 7! X__ ; f 7! f tt
Y _ X_ X Y
1coevX 1 j j1t 1
j2 X;Y X__ Y __ ! Y _ X_ _ ! X Y__
^
j 1
! X Y_ X Y
It is not always true that the two duals X_ and _ X
also holds. Similarly, there is an isomorphism are isomorphic. However, there are canonical
jX, Y : _ Y _ X ! _ (X Y). isomorphisms
Morphisms constructed from braidings and (co)-
evaluations are often described by tangles. The X ! _ X_ ; X ! _ X_
We may replace the category C with an equivalent one, These are isomorphisms of monoidal functors
such that the above isomorphisms become identity (see [1])
morphisms, and the functors _ and _ are inverse to
each other. We shall assume this to simplify notations. u21 : Id; c2 ! __ ; j2
Finally, we denote the iterated duals by X(n_) = X__ u21 : Id; c2 ! __ ; j2
(n times) and X(n_) = __ X (n times) for n 0.
In particular, this implies the commutativity of the
diagram
Braided Categories XY c2
XY
^
Here we review the definitions of the braiding u21 u21 # #u21
isomorphism and further derived isomorphisms. Sev- j2
eral basic relations between them are listed. Two X__ Y __ ! X Y__
important classes of examples of braided categories The square of the monoidal functor (__ , j2 ) is
are given by the categories of modules over quasitrian-
gular Hopf algebras and the categories of tangles. ____ ; j4 : C; ; 1 ! C; ; 1;
Definition 4 A braided category (C, c) is a monoidal X 7! X____ ; f 7! f tttt
category C equipped with a functorial isomorphism
where
c = cX, Y : X Y ! Y X the braiding, or the
commutativity isomorphism such that the two ____ ____ j2 __
jtt
__ __ 2 ____
j4X;Y X Y ! X Y ! X Y
hexagons commute,
X Y Z 1c
1 X Z Y ! X Z Y
a The natural isomorphism u40 = u21 u21 is, in fact, an
^
a # # c
1 1 isomorphism of monoidal functors u40 : (Id, id) !
c
1 a (____ , j4 ).
X Y Z ! Z X Y ! Z X Y
The following result can be used to simplify are very similar to those of usual Hopf algebras, for
notations: example, the antipode is antimultiplicative with
respect to the braiding (see, e.g., Majid (1993)).
Proposition 1 For any ribbon category C there exists
For Hopf algebras in rigid braided categories, there
a ribbon category D equivalent to C such that in it
exist integrals in a sense very much similar to the
(i) 1_ = 1; case of ordinary finite-dimensional Hopf algebras,
(ii) for any object X we have _ X = X_ , X__ = X, as shown by Bespalov et al. (2000).
and X = idX : X ! X__ X.
(iii) for any object X we have evX = ev0X_ : X
X_ ! 1, and coevX = coev0X_ : 1 ! X_ X.
Modular Categories
In the category C = H-mod, where H is a ribbon
Assume that a braided rigid monoidal category C is
Hopf algebra, the equation X_ = _ X is not neces-
equivalent as a category (with monoidal structure
sarily satisfied. Nevertheless, X_ is canonically
ignored) to the category of finite-dimensional mod-
isomorphic to _ X. The same holds in any ribbon
ules over a finite-dimensional algebra. In particular,
category. We identify these objects via = u20 :
_ C is abelian. Then there exists an object F in C,
X ! X_ . This allows us to use the right dual
equipped with a morphism iX : X X_ ! F for each
objects in place of the left ones. In that role, the
X 2 Ob C, such that the diagram
right duals are equipped with the left evaluation
and coevaluation, called flipped evaluation and f Y _
X Y_ Y Y_
^
coevaluation, respectively:
Xf t # #iY
_ X_ _ __ ev
e :X X
ev X X ! 1 _ iX
^
XX F
^
g :1
coev coev X__ X_ 1 X_ X X_
^
Let C be a braided monoidal category. A Hopf It turns out that the coend F is a Hopf algebra in
algebra H in C is an object H 2 Ob C together with the braided category C, when it is equipped with the
an associative multiplication m : H H ! H and an following operations. The comultiplication in F is
associative comultiplication : H ! H H, obeying uniquely determined by the equation
the bialgebra axiom
iX
m
X X_ !F ! F F
H H!H!H H
X X_ X 1 X_
H H H H H H
^
XcoevX_ X X_ X X_
^
HcH
HHHH
^
iX iX FF
^
mm
HH
^
The unit is given by the morphism and is universal between morphisms with such
property. By duality, the integral functional : F ! 1
i1
: 1 1 1_ ! F is also two sided. It satisfies
1
The diagram corresponding to the antipode F ! F F ! F 1 F
F : F ! F is given by
F ! 1 ! F
F
1
F ! F F ! 1 F F
X sXY sXZ
In particular, its restriction to S is a matrix sjS : S This proves the second formula. &
S ! C, denoted again by s = (sXY )X, Y2S by abuse of Proposition 2 (Criterion of modularity) In the
notation; here X and Y run over simple objects. above assumption of semisimplicity, the following
Notice that sXY = sYX , so the matrix s is symmetric. conditions are equivalent:
Let us consider the C-algebra Inv F = HomC (1, F). It has (i) C is modular (! is nondegenerate);
the basis X , X 2 S; hence, it is n-dimensional, where (ii) the matrix (sXY )X, Y2S is nondegenerate;
n = Card S. The form ! on F induces a bilinear form (iii) for any X 2 S its dimension dimq X does not
vanish, and there exist numbers 0Y , Y 2 S, such
! 0 : Inv F Inv F ! Hom1; F F Hom1;! 1 P
^
for an arbitrary object Z of C, where Z is the Multiplying both sides of [7] with 1 , we find
natural coaction. The equation
Y 1 dimq Y
X X X X The normalization is fixed by eqn [6], which we can
Y Y write as
= XY 3 Y
X Y
1 1 1 Y u20
Y2S
Y Y Y Y
follows from the properties of the two-sided integral X
21 dimq Y2
of the Hopf algebra F. Due to uniqueness of Y2S
integrals, is proportional to 1 . In eqn [3], X and
Y vary over S. The right-hand side is the identity Hence,
morphism if X = Y, and vanishes otherwise. Sub- !1
X 2
stituting the definition of Y , we rewrite the 2
1 dimq Y 8
equation as follows: Y2S
u02
~ Conjugation Properties
y = XY 4
From the Verlinde formula [2], we conclude that
the commutative C-algebra Inv F possesses
X Y X Y homomorphisms
For X = 1, we get X : Inv F ! C
Y ~Y 1Y idY : Y ! Y 5 Y 7! dimq X1 sXY sXY =sX1
If Y 6 1, then ~Y = 0. So [5] tells essentially that The matrix s is invertible, so that its columns cannot
be proportional. Hence, all X are different char-
1 ~1 id1 : 1 ! 1 6 acters. Their number is n = Card S = dimC F; hence,
Now return to [4] with X = Y. If we compose that there is an isomorphism of C-algebras
equation with coev : 1 ! Y _ Y, we obtain
: Inv F ! C C Cn
X X 7! 1 ; . . . ; n
Now we show that the dimensions dimq (Y) are
y . ~n = y
~
real numbers, so that 1 is also a real number. One
can introduce in Inv F an antilinear involution,
Y Y Y Y : Inv F ! Inv F; X X_
and a scalar (Hermitian) product
Y u02 Y X jY XY ; X; Y 2 S
7 Then Inv F becomes a finite-dimensional commu-
=
tative Hilbert algebra. Indeed,
Y Y X Y jZ dim HomX Y; Z
dim HomX; Y _ Z X jY Z
From the theory of finite-dimensional commutative
= dimqY Hilbert algebras, we know that idempotents in the
algebra Inv F are self-adjoint (only in that case the
scalar product can be positive definite). Hence, is
Y Y a -morphism, that is, X ( ) = X (). Therefore,
Braided and Modular Tensor Categories 359
$s_{XY^\vee}/s_{X1} = \overline{s_{XY}/s_{X1}}$. In the particular case of X = 1, we obtain

$$\dim_q Y = \dim_q Y^\vee = s_{1Y^\vee} = \overline{s_{1Y}} = \overline{\dim_q Y}$$

since $s_{11} = 1$. This proves that for any $Y \in \mathcal{C}$ its dimension $\dim_q(Y)$ is a real number.

It is natural to take for the normalization constant the positive root of the right-hand side of [8]. Positiveness fixes it uniquely.

Examples of Semisimple Modular Categories

In their original paper, Reshetikhin and Turaev (1991) use as algebraic input data the representation theory of the quantum deformation $U = U_q(\mathfrak{sl}_2)$ of the Lie algebra $\mathfrak{sl}(2, \mathbb{C})$, where q is a root of unity. They construct the invariant as a trace over U-equivariant morphisms, and prove the necessary modularity condition concerning the nondegeneracy of the braided pairing.

The general picture is drawn by Turaev (1994), where 3-manifold invariants and TQFTs are constructed from semisimple modular categories. He shows how to obtain the latter as quotients of certain subcategories of representations of a modular Hopf algebra by the ideal of trace-negligible morphisms.

Finkelberg (1996), based on results of Gelfand and Kazhdan, establishes (via the theory of Kazhdan and Lusztig) an equivalence between two modular categories. The first is the semisimple category $\mathcal{C}$ of integrable modules over an affine Lie algebra $\hat{\mathfrak{g}}$ of positive integer level k. The second is a certain subquotient of the category of $U_q(\mathfrak{g})$-modules for $q = \exp(\pi i m^{-1}/(k + h^\vee))$, where $m \in \{1, 2, 3\}$ and $h^\vee$ is the dual Coxeter number of $\mathfrak{g}$. Huang and Lepowsky (1999) describe the rigid braided structure of $\mathcal{C}$ using vertex operators. Bakalov and Kirillov (2001) use geometrical constructions to make $\mathcal{C}$ into a modular category, associated with the Wess–Zumino–Witten (WZW) model. They construct the corresponding WZW modular functor.

Modular Functor and TQFT

Modular categories give rise to a modular functor and a TQFT. The meanings of those differ from author to author, but the common features are the following. Such a TQFT is a functor from the category whose objects are smooth surfaces with additional structures and whose morphisms are three-dimensional manifolds with additional structures to the category of vector spaces. A modular functor is the restriction of such a TQFT to the subcategory whose morphisms are homeomorphisms of surfaces. One of the constructions due to Kerler and Lyubashenko (2001) takes a nonsemisimple modular category as an input and assigns to it a double TQFT functor, that is, a functor between double categories. The target is the 2-category of abelian categories.

See also: Axiomatic Approach to Topological Quantum Field Theory; Hopf Algebras and q-Deformation Quantum Groups; The Jones Polynomial; Knot Invariants and Quantum Gravity; Quantum 3-Manifold Invariants; Symmetries in Quantum Field Theory of Lower Spacetime Dimensions; Topological Quantum Field Theory: Overview; von Neumann Algebras: Introduction, Modular Theory, and Classification Theory; von Neumann Algebras: Subfactor Theory.

Further Reading

Bakalov B and Kirillov A Jr. (2001) Lectures on Tensor Categories and Modular Functors, University Lecture Series, vol. 21. Providence, RI: American Mathematical Society.
Bespalov Y, Kerler T, Lyubashenko VV, and Turaev VG (2000) Integrals for braided Hopf algebras. Journal of Pure and Applied Algebra 148(2): 113–164 (arXiv:math.QA/9709020).
Drinfeld VG (1987) Quantum groups. In: Gleason A (ed.) Proceedings of the International Congress of Mathematicians (Berkeley, 1986), vol. 1, pp. 798–820. Providence, RI: American Mathematical Society.
Drinfeld VG (1989a) Quasi-Hopf algebras. Algebra i Analiz 1(6): 114–148.
Drinfeld VG (1989b) Quasi-Hopf algebras and Knizhnik–Zamolodchikov equations. In: Problems of Modern Quantum Field Theory, pp. 1–13. Berlin–New York: Springer.
Finkelberg M (1996) An equivalence of fusion categories. Geometric and Functional Analysis 6(2): 249–267.
Huang Y-Z and Lepowsky J (1999) Intertwining operator algebras and vertex tensor categories for affine Lie algebras. Duke Mathematical Journal 99(1): 113–134 (arXiv:q-alg/9706028).
Joyal A and Street RH (1991) Tortile Yang–Baxter operators in tensor categories. Journal of Pure and Applied Algebra 71: 43–51.
Kerler T and Lyubashenko VV (2001) Non-Semisimple Topological Quantum Field Theories for 3-Manifolds with Corners, Lecture Notes in Mathematics, vol. 1765, vi + 379 pp. Heidelberg: Springer.
Mac Lane S (1971) Categories for the Working Mathematician, GTM, vol. 5. New York: Springer.
Majid S (1993) Braided groups. Journal of Pure and Applied Algebra 86(2): 187–221.
Majid S (1995) Foundations of Quantum Group Theory. Cambridge: Cambridge University Press.
Moore G and Seiberg N (1989) Classical and quantum conformal field theory. Communications in Mathematical Physics 123: 177–254.
Reshetikhin NY and Turaev VG (1991) Invariants of 3-manifolds via link polynomials and quantum groups. Inventiones Mathematicae 103(3): 547–597.
Turaev VG (1994) Quantum Invariants of Knots and 3-Manifolds, de Gruyter Stud. Math, vol. 18. Berlin–New York: Walter de Gruyter.
360 Brane Construction of Gauge Theories
Using the chosen boundary conditions, the varia- boundary terms, the total variation of the action
tion of the full action contains the boundary terms due to the shift X (, 0) = becomes
Z 0 Z
1
J
Sbound Ai Ai
I
@ i d S @ @ X d2
2 0
Z 1 Z
1
i @ Xi ; 0d @ X ; 0d 12
2 0 0 2 0 0
i The resulting momentum is
Xi ; 0 Xi 0; 0
2 0 Z
P
1
@ X ; 0d
0 J I
2 Ai Ai 10 2 0 0
Imposing the condition of its vanishing gives the On the bulk, the fields X satisfy the standard wave
physical interpretation for the normal components equation in two dimensions, so that the general
of the U(1) fields solution is the sum of a left-moving and a right-
moving part, X (, ) = XL ( ) XR ( ).
J I
Xi ; 0 Xi 0; 0 2 0 Ai Ai 11 Imposing the boundary conditions, one finds
Massless NS5
v
x6
x
D4
Massive
Dirichlet conditions allow the action of supersym- can try to consider the coexistence of more kinds of
metric transformations of the form
L QL
R QR , branes.
where QL and QR are the fermionic left and right One way to do this is to consider n parallel 4-branes
supercharge operators and
L ,
R are spinors satisfy- ending on an NS5 brane in type IIA string theory
ing the brane projection condition
L = 0 1 . . . (Figure 4), and then analyze the gauge theory restricted
p
R . Here are the ten-dimensional Dirac to the four-dimensional intersection (here the theory is
matrices and one refers to antibranes for the nonchiral as 0 . . . 9
L=R =
L=R ). What kind of
negative sign. branes can end on other kind of branes can be
Second, the gauge group can be converted into an established, starting from the fact that strings can end
SO(n) or an Sp(n=2) (for even n), adding an on a brane, and using the dualities tool (Giveon and
orientifold plane parallel to the branes. The orienti- Kutasov 1999).
fold plane acts on the orthogonal spacetime direc- Let us fix some conventions. We will indicate with
tions with a Z2 -action x = (x0 , x1 , x2 , x3 ) 2 R4 the coordinates on the inter-
section, so that (x; v) = (x; x4 , x5 ) 2 R6 define the NS5
Xi Xi 16 brane, and (x, x6 ), with x6 2 [0, 1), the 4-branes. Also
if Xi = 0 is the position of the orientifold. It further vI will indicate the position of the Ith 4-brane on the 5-
acts on the string world sheet as making it brane, and y = (x7 , x8 , x9 ) will collect the remaining
an unoriented string. The effect is to project out coordinates. Finally, we will indicate the product of -
some states from the spectra, thus reducing the matrices, corresponding to given directions, indicizing
gauge group. a simple with the respective coordinates. For
example v = 4 5 . With these conventions, the
brane projection conditions for D4 and NS5 branes,
Geometric Engineering of Gauge respectively, read
Theories from Branes
L x 6
R 17
To illustrate how brane construction of gauge
L x v
L ;
R x v
R 18
theories works, we will consider a particular con-
figuration of branes (Witten 1997). These projections reduce supersymmetry to N = 2.
We would like to obtain a four-dimensional U(n) After a short manipulation and using for example
gauge theory. A possibility could be to take n D3 antichirality of
R , it is easy to see that the first
branes in a type IIB string background. However, condition can be substituted by
such a model would contain too many supersymme-
L x y
R 19
tries: in ten dimensions, supersymmetries are gener-
ated by two 16-dimensional chiral spinors
L ,
R In other words, we could add a number of 6-branes
(0 . . . 9
L,R =
L,R ). From the four-dimensional in the (x, y) directions, without further reducing
point of view, each of them represents four four- supersymmetry. We will consider this possibility
dimensional spinors giving an N = 8 supersymmetric later.
theory. The projection condition, due to the branes, On the D4 branes there is an eventually broken
reduces the number of supersymmetries to four. U(n) gauge theory. Here the vector fields
Supersymmetry not being manifest in nature, it is A , = 0, 1, 2, 3, 6, and the scalar fields vI and y
desirable to have fewer supersymmetric gauge theo- live. The last ones are set to zero by the Dirichlet
ries at hand. Because different brane projection conditions, whereas vI measure the fluctuations of
conditions can further reduce supersymmetry, we the D3 brane positions over NS5. The O(2) group
let us sketch how the exact SeibergWitten solution where nL is the number of D4 branes ending on
can be obtained for the N = 2 model described in the the left-hand side of the NS5 brane, in the positions
previous section, in the simplest case without v()
Li , and similar for the R index, which refers to
matter.
Figure 8 N = 2 four-dimensional super Yang–Mills theory with $U(n_1) \times U(n_2)$ gauge group and matter. Strings crossing the central NS5 brane give matter in the $(n_1, n_2)$ representation.
Y
n
x
y NS5
w
s s1 s s2 v vi 0 24
Matter
D6 i1
x 10
t2 vn an2 vn2 a1 v a0 t 1 0 26
Figure 11 D4 branes become M5 membranes in M-theory. This realizes a quantum-mechanical correspondence
between the M5 membrane configurations described
the right-hand side. Here () refers to the th NS5 by the given polynomials, and the N = 2 super
brane, and k is an integration constant. YangMills vacua. But this is also the claimed
Because x6 is the real part of a holomorphic field, SeibergWitten curve. In particular, M-theory gives
whose imaginary part is compactified on a circle of a concrete physical meaning for the support Rie-
ray R10 , we then find mann surfaces of the SeibergWitten solutions.
To conclude, let us make some further comments.
X
nL
sv R10 log v vLi It is clear how the construction can be extended for
i1 involving more configurations, for example, with
X
nR more NS5 branes, or adding matter.
R10 log v vRi 22 Also, we have seen that the geometrical picture
i1 which branes give of gauge theories extends at the
This describes the quantum fluctuations of the NS5 quantum level.
brane as seen in M-theory. In particular, because of A similar construction can be made for the N = 1
the imaginary part of s, the ends of the D4 branes model, which also permits a full geometrical proof
appear as vortices on the NS5 brane. In place of s, it of the Seiberg duality at both classical and quantum
is now convenient to introduce a new field levels.
t := exp (s=R10 ) so that Finally, we should note that there are also
Q nR other methods, which work in spacetimes where extra
i1 v v Ri dimensions are compactified. There, the branes wrap
tv Q 23 around certain singular loci which contain information
nL
i1 v vLi about gauge symmetries (Lerche 1997).
Before continuing, let us look a bit again at the See also: AdS/CFT Correspondence; Compactification of
classical limit. In this case, a fixed value of v will Superstring Theory; Gauge Theories from Strings;
correspond to the position of a D4 brane, whereas a Noncommutative Geometry from Strings; SeibergWitten
fixed value of s will correspond to the fixed position Theory; Supergravity; Superstring Theories;
of an NS5 brane. The classical configuration is then Supersymmetric Particle Models.
Brane Worlds
R Maartens, Portsmouth University, Portsmouth, UK
© 2006 Elsevier Ltd. All rights reserved.

Introduction

At high enough energies, Einstein's classical theory of general relativity breaks down, and will be superseded by a quantum gravity theory. The singularities predicted by general relativity in gravitational collapse and in the hot big bang origin of the universe are thought to be artifacts of the classical nature of Einstein's theory, which will be removed by a quantum theory of gravity. Developing a quantum theory of gravity and a unified theory of all the forces and particles of nature are the two main goals of current work in fundamental physics. The problem is that general relativity and quantum field theory cannot simply be molded together. There is as yet no generally accepted (pre-)quantum gravity theory.

The quest for a quantum gravity theory has a long and thus far not very successful history. Many different lines of attack have been developed, each having a different way of dealing with the classical singularities that arise from point particles and smooth spacetime geometry. String theory does away with zero-dimensional point particles, and particles are modeled as different states of new fundamental objects, the one-dimensional strings. It turns out, however, that there is a price to pay: the number of spacetime dimensions must be greater than four for a consistent theory. When fermions are included, which leads to superstring theory, the required number of dimensions is ten: one time and nine space dimensions.

There are in fact five distinct (1+9)-dimensional superstring theories. In the mid-1990s, duality transformations were discovered that relate these superstring theories to each other and to the (1+10)-dimensional supergravity theory. This led to the conjecture that all of these theories arise as different limits of a single theory, which has come to be known as M theory. It was also discovered that extended objects of higher dimension than strings play a fundamental role in the theory. These objects are known as "branes" (from membranes), and the relation between them and strings leads to a new picture of how gravity and matter may be connected in the universe. Roughly speaking, open strings describe the particles of the nongravitational sector, and their ends are attached to branes, while closed strings, which describe the graviton and associated particles of the gravitational sector, can move freely in all dimensions.

Thus, the observable universe could be a (1+3)-surface, a "brane", embedded in a (1+3+d)-dimensional spacetime, the "bulk", with standard-model particles and fields trapped on the brane, while gravity is free to access the bulk. Brane-world models offer a phenomenological way to test some of the novel predictions and corrections to general relativity that are implied by M theory.

Higher-Dimensional Gravity

Brane worlds can be seen as reviving the original higher-dimensional ideas of Kaluza and Klein in the 1920s, but in a new context of quantum gravity. An important consequence of extra dimensions is that the four-dimensional Planck scale $M_p \equiv M_{(4)} = 1.2 \times 10^{19}$ GeV is no longer the fundamental energy scale of gravity. The fundamental scale is instead $M_{(4+d)}$. This can be seen from the modification of the gravitational potential. For an Einstein–Hilbert gravitational action,

$$S_{\rm gravity} = \frac{1}{2\kappa^2_{4+d}} \int d^4x\, d^dy\, \sqrt{-{}^{(4+d)}g}\, \left[ {}^{(4+d)}R - 2\Lambda_{4+d} \right] \qquad [1]$$

we have the higher-dimensional Einstein field equations,

$${}^{(4+d)}G_{AB} \equiv {}^{(4+d)}R_{AB} - \tfrac{1}{2}\, {}^{(4+d)}R\; {}^{(4+d)}g_{AB} = -\Lambda_{4+d}\, {}^{(4+d)}g_{AB} + \kappa^2_{4+d}\, {}^{(4+d)}T_{AB} \qquad [2]$$
where $x^A = (x^a, y^1, \ldots, y^d)$ and $\kappa^2_{4+d}$ is the gravitational coupling constant given by

$$\kappa^2_{4+d} = 8\pi G_{4+d} = \frac{8\pi}{M^{2+d}_{4+d}} \qquad [3]$$

The static weak field limit of the field equations leads to the (4+d)-dimensional Poisson equation, whose solution is the gravitational potential

$$V(r) \propto \frac{\kappa^2_{4+d}}{r^{1+d}} \qquad [4]$$

In the simplest scenario, we can assume a toroidal configuration for the d extra dimensions, with each compactified on the same length scale L. Then on scales $r \lesssim L$, the potential is (4+d)-dimensional, $V \sim r^{-(1+d)}$. By contrast, on scales large relative to L, where the extra dimensions do not contribute to variations in the potential, V behaves like a four-dimensional potential, $V \sim L^{-d} r^{-1}$. This means that the usual Planck scale becomes an effective coupling constant, describing gravity on scales much larger than the extra dimensions, and related to the fundamental scale via the volume of the extra dimensions:

$$M_p^2 \sim M^{2+d}_{4+d}\, L^d \qquad [5]$$

Large Extra Dimensions

If the extra-dimensional volume is significantly above the Planck scale, then the true fundamental scale $M_{4+d}$ can be much less than the effective scale $M_p$,

$$L^d \gg M_p^{-d} \;\Rightarrow\; M_{4+d} \ll M_p \qquad [6]$$

In this case, we understand the weakness of gravity as due to the fact that it spreads into extra dimensions, and only a part of it is felt in four dimensions.

A lower limit on $M_{4+d}$ is given by null results in table-top experiments to test for deviations from Newton's law in four dimensions, $V \propto r^{-1}$. These experiments currently probe submillimeter scales, and find no detectable deviation, so that

$$L \lesssim 10^{-1}\ {\rm mm} \sim (10^{-15}\ {\rm TeV})^{-1} \;\Rightarrow\; M_{4+d} \gtrsim 10^{(32-15d)/(d+2)}\ {\rm TeV} \qquad [7]$$

Stronger bounds can be derived from null results in particle accelerators in some brane-world models, or from constraints imposed by observations of supernovae or of light-element abundance.

Brane worlds, arising in the framework of string theory, thus incorporate the possibility that the fundamental scale is much less than the Planck scale felt in four dimensions. This emerges by virtue of the large size of the extra dimensions. It is not necessary for all extra dimensions to be of equal size for this mechanism to operate. There are string theory solutions (Hořava–Witten solutions) with two (1+9)-branes located at the boundaries of the bulk, at the endpoints of an $S^1/Z_2$ orbifold, that is, a circle folded on itself across a diameter. The orbifold extra dimension is the large one, whereas the other six extra dimensions on the branes are compactified on a very small scale, close to the fundamental scale, and their effect on the dynamics is felt through "moduli" fields, that is, five-dimensional scalar fields.

These solutions can be thought of as effectively five dimensional, with an extra dimension that can be large relative to the fundamental scale. They provide the basis for the Randall–Sundrum 1 (RS1) phenomenological models of five-dimensional gravity. The single-brane Randall–Sundrum 2 (RS2) models with infinite extra dimension arise when the orbifold radius tends to infinity. The RS models are not the only phenomenological realizations of M theory ideas. They were preceded by the brane-world models of Arkani-Hamed, Dimopoulos, and Dvali (ADD), which put forward the idea that a large volume for the compact extra dimensions would lower the fundamental Planck scale $M_{4+d}$. If $M_{4+d}$ is close to the electroweak scale, $M_{\rm ew}$, then this would address the long-standing "hierarchy" problem, that is, why there is such a large gap between $M_{\rm ew} \sim 1$ TeV and $M_p \sim 10^{16}$ TeV.

In the ADD models, more than one extra dimension is required for agreement with experiments, and there is "democracy" among the equivalent extra dimensions, which, in addition, are flat. By contrast, the RS models have a "preferred" extra dimension, with other extra dimensions treated as ignorable (i.e., stabilized except at energies near the fundamental scale). Furthermore, this extra dimension is curved or "warped" rather than flat: the bulk is a portion of anti-de Sitter (AdS$_5$) spacetime. The RS branes are $Z_2$-symmetric (mirror symmetry), and have a tension, which serves to counter the influence on the brane of the negative bulk cosmological constant. This also means that the self-gravity of the branes is incorporated in the RS models. The novel feature of the RS models compared to previous higher-dimensional models is that the observable three dimensions are protected from the large extra dimension (at low energies) by curvature (warping), rather than straightforward compactification.

The RS brane worlds provide phenomenological models that reflect at least some of the features of
M theory, and that bring exciting new geometric and particle physics ideas into play. The RS2 models also provide a framework for exploring holographic ideas that have emerged in M theory. Roughly speaking, holography suggests that higher-dimensional dynamics may be determined from a knowledge of the fields on a lower-dimensional boundary. The AdS/CFT correspondence is an example in which the classical dynamics of the higher-dimensional AdS gravitational field are equivalent to the quantum dynamics of a conformal field theory (CFT) on the boundary.

Kaluza–Klein Modes

The dilution of gravity via extra dimensions not only weakens gravity, it also broadens the range of graviton modes felt on the brane. The graviton is more than just the four-dimensional massless mode of four-dimensional gravity: other modes, with an effective mass on the brane, arise from the fact that the graviton is a (4+d)-dimensional massless particle. These extra modes on the brane are known as Kaluza–Klein (KK) modes of the graviton.

For simplicity, consider a flat brane with one flat extra dimension, compactified through the identification $y \leftrightarrow y + 2\pi n L$, where n = 0, 1, 2, . . . . The perturbative five-dimensional graviton is defined via

$${}^{(5)}\eta_{AB} \to {}^{(5)}\eta_{AB} + h_{AB} \qquad [8]$$

where ${}^{(5)}\eta_{AB}$ is the five-dimensional Minkowski metric and $h_{AB}$ is a small transverse traceless perturbation. Its amplitude can be Fourier expanded as

$$h(x^a, y) = \sum_n e^{iny/L}\, h_n(x^a) \qquad [9]$$

where $h_n$ are the amplitudes of the KK modes, that is, the effective four-dimensional modes of the five-dimensional graviton. To see that these KK modes are massive from the brane viewpoint, we start from the five-dimensional wave equation that the massless five-dimensional field h satisfies (in a suitable gauge):

$${}^{(5)}\Box h = 0 \;\Rightarrow\; \Box h + \partial_y^2 h = 0 \qquad [10]$$

It follows that the KK modes satisfy a four-dimensional Klein–Gordon equation with an effective four-dimensional mass, $m_n$:

$$\Box h_n = m_n^2\, h_n, \qquad m_n = \frac{n}{L} \qquad [11]$$

The massless mode, $h_0$, is the usual four-dimensional graviton mode. But there is a tower of massive modes, $L^{-1}, 2L^{-1}, \ldots$, which imprint the effect of the five-dimensional gravitational field on the four-dimensional brane. Compactness of the extra dimension leads to discreteness of the spectrum. For an infinite extra dimension, $L \to \infty$, the separation between the modes disappears and the tower forms a continuous spectrum.

Randall–Sundrum Brane Worlds

RS brane worlds do not rely on compactification to localize gravity at the brane, but on the curvature of the bulk. What prevents gravity from leaking into the extra dimension at low energies is a negative bulk cosmological constant,

$$\Lambda_5 = -\frac{6}{\ell^2} = -6\mu^2 \qquad [12]$$

where $\ell$ is the curvature radius of AdS$_5$ and $\mu$ is the corresponding energy scale. The bulk cosmological constant with its repulsive gravity effect acts to squeeze the gravitational potential closer to the brane. We can see this clearly in Gaussian normal coordinates $x^A = (x^\mu, y)$ based on the brane at y = 0, for which the metric takes the form

$${}^{(5)}ds^2 = dy^2 + e^{-2|y|/\ell}\, \eta_{\mu\nu}\, dx^\mu dx^\nu \qquad [13]$$

with $\eta_{\mu\nu}$ the Minkowski metric. The exponential warp factor reflects the confining role of the bulk cosmological constant. The $Z_2$-symmetry about the brane at y = 0 is incorporated via the $|y|$ term. In the bulk, this metric is a solution of the five-dimensional Einstein equations,

$${}^{(5)}G_{AB} = -\Lambda_5\, {}^{(5)}g_{AB} \qquad [14]$$

that is, ${}^{(5)}T_{AB} = 0$ in eqn [2]. The brane is a flat Minkowski spacetime, $g_{AB}(x^\alpha, 0) = \eta_{\mu\nu}\, \delta^\mu_A \delta^\nu_B$, with self-gravity in the form of brane tension.

The two RS models are distinguished as follows:

RS1 There are two branes in RS1, at y = 0 and y = L, with $Z_2$-symmetry identifications

$$y \leftrightarrow -y, \qquad y + L \leftrightarrow L - y \qquad [15]$$

The branes have equal and opposite tensions, $\pm\lambda$, where

$$\lambda = \frac{3 M_p^2}{4\pi \ell^2} \qquad [16]$$

The positive-tension "TeV" brane has fundamental scale $M_{(5)} \sim 1$ TeV. Because of the exponential
warping factor, the effective scale on the negative tension "Planck" brane at y = L is $M_p$. On the positive tension brane,

$$M_p^2 = M_5^3\, \ell \left[ 1 - e^{-2L/\ell} \right] \qquad [17]$$

So RS1 gives a new approach to the hierarchy problem. Because of the finite separation between the branes, the KK spectrum is discrete.

RS2 In RS2, there is only one, positive-tension, brane. This may be thought of as arising from sending the negative tension brane off to infinity, $L \to \infty$. Then the energy scales are related via

$$M_5^3 = \frac{M_p^2}{\ell} \qquad [18]$$

On the RS2 brane, the negative $\Lambda_5$ is offset by the positive brane tension $\lambda$. The fine-tuning in eqn [16] ensures that there is zero effective cosmological constant on the brane, so that the brane has the induced geometry of Minkowski spacetime. To see how gravity is localized at low energies, we consider the five-dimensional graviton perturbations of the metric:

$${}^{(5)}g_{AB} \to {}^{(5)}g_{AB} + h_{AB}, \qquad h_{Ay} = 0 = h^\mu{}_\mu = \partial_\nu h^{\mu\nu} \qquad [19]$$

We split the amplitude h into three-dimensional Fourier modes, and the linearized five-dimensional Einstein equations lead to the wave equation (y > 0)

$$e^{2y/\ell} \left[ \ddot{h} + k^2 h \right] = h'' - \frac{4}{\ell}\, h' \qquad [20]$$

Separability means we can write

$$h(t, y) = \sum_m \varphi_m(t)\, h_m(y) \qquad [21]$$

$$h_m(y) = e^{2y/\ell} \left[ B_m J_2\!\left(m\ell\, e^{y/\ell}\right) + C_m Y_2\!\left(m\ell\, e^{y/\ell}\right) \right] \qquad [27]$$

where $J_2$, $Y_2$ are Bessel functions.

The boundary condition for the perturbations is $h'(t, 0) = 0$, which implies

$$C_0 = 0, \qquad C_m = -\frac{J_1(m\ell)}{Y_1(m\ell)}\, B_m \qquad [28]$$

In the RS1 model, we have a further boundary condition, $h'(t, L) = 0$, which leads to a discrete eigenspectrum, namely the masses m that satisfy

$$J_1\!\left(m\ell\, e^{L/\ell}\right) Y_1(m\ell) - Y_1\!\left(m\ell\, e^{L/\ell}\right) J_1(m\ell) = 0 \qquad [29]$$

The zero mode is normalizable, since

$$\left| \int_0^\infty B_0\, e^{-2y/\ell}\, dy \right| < \infty \qquad [30]$$

Its contribution to the gravitational potential $V = \tfrac{1}{2} h_{00}$ gives the four-dimensional result, $V \propto r^{-1}$. The contribution of the massive KK modes sums to a correction of the four-dimensional potential. For $r \ll \ell$, one obtains

$$V(r) \approx \frac{GM\ell}{r^2} \qquad [31]$$

which simply reflects the fact that the potential becomes truly five dimensional on small scales. For $r \gg \ell$,

$$V(r) \approx \frac{GM}{r} \left( 1 + \frac{2\ell^2}{3r^2} \right) \qquad [32]$$

which gives the small correction to four-dimensional gravity at low energies from extra-dimensional effects.
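The scale relations quoted so far are easy to evaluate numerically. The following is a minimal sketch, not part of the original article: it evaluates the ADD bound of eqn [7], the RS tension and five-dimensional scale from eqns [16] and [18], and the large-distance factor of eqn [32]. The function names and the conversion constant for ħc are assumptions introduced here for illustration; the value of M_p is the one quoted in the text.

```python
import math

M_P_GEV = 1.2e19             # four-dimensional Planck scale quoted in the text (GeV)
GEV_INV_IN_MM = 1.97327e-13  # assumed conversion: 1 GeV^-1 expressed in millimetres (hbar*c)


def add_scale_bound_tev(d):
    """Lower bound on M_(4+d) in TeV from eqn [7], for d flat extra dimensions."""
    return 10.0 ** ((32.0 - 15.0 * d) / (d + 2.0))


def rs_scales(ell_mm):
    """Brane tension (GeV^4, eqn [16]) and M_5 (GeV, eqn [18]) for a curvature radius in mm."""
    ell = ell_mm / GEV_INV_IN_MM                         # curvature radius in GeV^-1
    tension = 3.0 * M_P_GEV**2 / (4.0 * math.pi * ell**2)
    m5 = (M_P_GEV**2 / ell) ** (1.0 / 3.0)
    return tension, m5


def rs2_potential_factor(r, ell):
    """Factor multiplying GM/r in eqn [32]; only meaningful for r much larger than ell."""
    return 1.0 + 2.0 * ell**2 / (3.0 * r**2)


if __name__ == "__main__":
    for d in (1, 2, 3, 6):
        print(f"d = {d}: M_(4+d) >~ {add_scale_bound_tev(d):.3g} TeV")
    tension, m5 = rs_scales(0.1)
    print(f"ell = 0.1 mm: lambda^(1/4) ~ {tension ** 0.25:.3g} GeV, M_5 ~ {m5:.3g} GeV")
    print(f"r = 10 ell: V / (GM/r) ~ {rs2_potential_factor(10.0, 1.0):.5f}")
```

Eliminating the curvature radius between eqns [16] and [18] gives the tension directly in terms of the five-dimensional scale, λ = 3M_5^6/(4πM_p^2), which is a convenient consistency check on the two helper functions.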
the brane. More precisely, the junction conditions across the brane are

$$g^+_{\mu\nu} - g^-_{\mu\nu} = 0 \qquad [33]$$

$$K^+_{\mu\nu} - K^-_{\mu\nu} = -\kappa_5^2 \left[ T^{\rm brane}_{\mu\nu} - \tfrac{1}{3}\, T^{\rm brane}\, g_{\mu\nu} \right] \qquad [34]$$

where

$$T^{\rm brane}_{\mu\nu} = T_{\mu\nu} - \lambda\, g_{\mu\nu} \qquad [35]$$

is the total energy–momentum tensor on the brane and $T^{\rm brane} = g^{\mu\nu} T^{\rm brane}_{\mu\nu}$. The $Z_2$-symmetry means that when approaching the brane from one side and going through it, one emerges into a bulk that looks the same, but with the normal reversed. This implies that

$$K^-_{\mu\nu} = -K^+_{\mu\nu} \qquad [36]$$

so that we can use the junction condition (eqn [34]) to determine the extrinsic curvature:

$$K_{\mu\nu} = -\tfrac{1}{2}\kappa_5^2 \left[ T_{\mu\nu} + \tfrac{1}{3} (\lambda - T)\, g_{\mu\nu} \right] \qquad [37]$$

where $T = T^\mu{}_\mu$, we have dropped the (+) and we evaluate quantities on the brane by taking the limit $y \to +0$.

Together with the Gauss–Codazzi equations, eqn [37] leads to the induced field equations on the brane:

$$G_{\mu\nu} = -\Lambda\, g_{\mu\nu} + \kappa^2\, T_{\mu\nu} + 6\,\frac{\kappa^2}{\lambda}\, S_{\mu\nu} - E_{\mu\nu} \qquad [38]$$

where

$$\kappa^2 \equiv \kappa_4^2 = \tfrac{1}{6}\, \lambda\, \kappa_5^4 \qquad [39]$$

$$\Lambda \equiv \Lambda_4 = \tfrac{1}{2} \left[ \Lambda_5 + \kappa^2 \lambda \right] \qquad [40]$$

$$S_{\mu\nu} = \tfrac{1}{12}\, T\, T_{\mu\nu} - \tfrac{1}{4}\, T_{\mu\alpha} T^\alpha{}_\nu + \tfrac{1}{24}\, g_{\mu\nu} \left[ 3\, T_{\alpha\beta} T^{\alpha\beta} - T^2 \right] \qquad [41]$$

and

$$E_{\mu\nu} = {}^{(5)}C_{ACBD}\, n^C n^D\, g_\mu{}^A g_\nu{}^B \qquad [42]$$

where $n^A$ is the unit normal to the brane and ${}^{(5)}C_{ACBD}$ is the Weyl tensor in the bulk.

The induced field equations [38] show two key modifications to the standard four-dimensional Einstein field equations arising from extra-dimensional effects.

$S_{\mu\nu} \sim (T_{\mu\nu})^2$ is the high-energy correction term, which is negligible for $\rho \ll \lambda$, but dominant for $\rho \gg \lambda$ (where $\rho$ is the energy density):

$$\frac{|\kappa^2 S_{\mu\nu}/\lambda|}{|\kappa^2 T_{\mu\nu}|} \sim \frac{|T_{\mu\nu}|}{\lambda} \sim \frac{\rho}{\lambda} \qquad [43]$$

$E_{\mu\nu}$, the projection of the bulk Weyl tensor on the brane, encodes corrections from KK or five-dimensional graviton effects. From the brane-observer viewpoint, the energy–momentum corrections in $S_{\mu\nu}$ are local, whereas the KK corrections in $E_{\mu\nu}$ are nonlocal, since they incorporate five-dimensional gravity wave modes. These nonlocal corrections cannot be determined purely from data on the brane. In the perturbative analysis of RS2 which leads to the corrections in the gravitational potential, eqn [32], the KK modes that generate this correction are responsible for a nonzero $E_{\mu\nu}$; this term is what carries the modification to the weak-field field equations.

The effective field equations are not a closed system. One needs to supplement them by five-dimensional equations governing $E_{\mu\nu}$, which are obtained from the five-dimensional Einstein equations.

Cosmological Dynamics

A (1+4)-dimensional spacetime with spatial 4-isotropy (four-dimensional spherical/plane/hyperbolic symmetry) has a natural splitting into hypersurfaces of symmetry, which are (1+3)-dimensional surfaces with 3-isotropy and 3-homogeneity, that is, Friedmann–Robertson–Walker (FRW) surfaces. In particular, the AdS$_5$ bulk of the RS2 brane world, which admits a foliation into Minkowski surfaces, also admits an FRW foliation since it is 4-isotropic. The generalization of AdS$_5$ that preserves 4-isotropy and solves the five-dimensional Einstein equation is Schwarzschild–AdS$_5$, and this bulk therefore admits an FRW foliation. It follows that an FRW cosmological brane world can be embedded in Schwarzschild–AdS$_5$ spacetime.

The black hole in the bulk is felt on the brane via the $E_{\mu\nu}$ term. The bulk black hole gives rise to "dark radiation" on the brane via its Coulomb effect. The FRW brane can be thought of as moving radially along the fifth dimension, with the junction conditions determining the velocity via the Friedmann equation. Thus, one can interpret the expansion of the universe as motion of the brane through the static bulk. In the special case of no black hole and no brane motion, the brane is empty and has Minkowski geometry, that is, the original RS2 brane world is recovered, in different coordinates.

An intriguing aspect of the cosmological metric is that five-dimensional gravitational wave signals can take "shortcuts" through the bulk in traveling
between points A and B on the brane. The travel time for such a graviton signal is less than the time taken for a photon signal (which is stuck to the brane) from A to B.

Cosmological dynamics on the brane are governed by the modified Friedmann equation:

$$H^2 = \frac{\kappa^2}{3}\, \rho \left( 1 + \frac{\rho}{2\lambda} \right) + \frac{m}{a^4} + \frac{1}{3}\Lambda - \frac{K}{a^2} \qquad [44]$$

where $H = \dot{a}/a$ is the Hubble expansion rate, a(t) is the scale factor, K is the curvature index, and m is the mass of the bulk black hole.

The $\rho^2/\lambda$ term is the high-energy term. When $\rho \gg \lambda$, in the early universe, then $H^2 \propto \rho^2$. This means that a given energy density produces a greater rate of expansion than it would in standard four-dimensional gravity. As a consequence, inflation in the early universe is modified in interesting ways, some of which may leave a signature in cosmological observations.

The $m/a^4$ term in eqn [44] is the "dark radiation", so called because it redshifts with expansion like ordinary radiation. But, unlike ordinary radiation, it is not a form of detectable matter, but the imprint on the brane of the gravitational field in the bulk (the Coulomb effect of the bulk black hole). This additional effective relativistic degree of freedom is constrained by nucleosynthesis in the early universe. Any extra radiative energy not thermally coupled to radiation affects the rate of production of light elements, and observed abundances place tight constraints on such extra energy. The dark radiation can be no more than 3% of the radiation energy density at nucleosynthesis:

$$\left. \frac{3m}{\kappa^2 a^4 \rho_{\rm rad}} \right|_{\rm nuc} \lesssim 0.03 \qquad [45]$$

The other modification to the Hubble rate is via the high-energy correction $\rho/\lambda$. In order to recover the observational successes of general relativity, the high-energy regime where significant deviations occur must take place before nucleosynthesis, that is, cosmological observations impose the lower limit

$$\lambda > (1\ {\rm MeV})^4 \;\Rightarrow\; M_5 > 10^4\ {\rm GeV} \qquad [46]$$

This is much weaker than the limit imposed by table-top experiments, which limit the curvature radius to $\ell \lesssim 0.2$ mm, leading to

$$\lambda > (100\ {\rm GeV})^4 \;\Rightarrow\; M_5 > 10^8\ {\rm GeV} \qquad [47]$$

The high-energy regime during radiation domination is short-lived. Since $\rho^2/\lambda$ decays as $a^{-8}$ during the radiation era, the correction $\rho/\lambda$ rapidly drops below one, and the universe enters the low-energy four-dimensional regime. However, traces of the high-energy era may be left in the perturbation spectra that leave an imprint in the cosmic microwave background radiation.

In conclusion, simple brane-world models of RS2 type provide a rich phenomenology for exploring some of the ideas that are emerging from M theory. The higher-dimensional degrees of freedom for the gravitational field, and the confinement of standard model fields to the visible brane, lead to a complex but fascinating interplay between gravity, particle physics, and geometry, which enlarges and enriches general relativity in the direction of a quantum gravity theory. High-precision astronomical data mean that cosmology is a potential laboratory for testing and constraining these brane worlds. The models predict extra-dimensional signatures in the cosmic microwave background and other observations, and these predictions can in principle be tested against data.

See also: String Theory: Phenomenology; Supergravity; Superstring Theories.

Further Reading

Brax P and van de Bruck C (2003) Cosmology and brane worlds: a review. Classical and Quantum Gravity 20: R201 (arXiv:hep-th/0303095).
Cavaglia M (2003) Black hole and brane production in TeV gravity: a review. International Journal of Modern Physics A18: 1843 (arXiv:hep-ph/0210296).
Langlois D (2003) Cosmology in a brane-universe. Astrophysics and Space Science 283: 469 (arXiv:astro-ph/0301022).
Maartens R (2004) Brane-world gravity. Living Reviews in Relativity 7: 7 (arXiv:gr-qc/0312059).
Quevedo F (2002) Lectures on string/brane cosmology. Classical and Quantum Gravity 19: 5721 (arXiv:hep-th/0210292).
Rubakov V (2001) Large and infinite extra dimensions. Physics-Uspekhi 44: 871 (arXiv:hep-ph/0104152).
Wands D (2002) String-inspired cosmology. Classical and Quantum Gravity 19: 3403 (arXiv:hep-th/0203107).
Branes and Black Hole Statistical Mechanics 373
level, there are, in addition, extended objects of other dimensionalities. Duality symmetries imply that these extended objects are as fundamental as the strings themselves. Such extended objects are also called branes. For an exhaustive account of branes in string theory, see Johnson (2003).

Like their counterparts in supergravity, branes in string theory are typically charged with respect to some gauge fields. While supergravity solutions are possible with any value of the charge, in string theory the brane charges have to be quantized. Multiple units of the minimum quantum of charge can appear as collections of branes each with unit charge or, alternatively, branes which wrap around compact cycles in space a multiple number of times.

D-Branes

The extended objects in string theory are described in terms of their collective excitations. These are best understood for the class of branes called D-branes in the type II theory, discovered by Polchinski. These are D1, D3, D5, and D7 branes in type IIB and D0, D2, D4, and D6 branes in type IIA theory. Dp branes are characterized by the fact that they couple to, and act as sources for, $(p+1)$-form gauge fields which belong to the Ramond–Ramond sector of the theory. Collective excitations of a $p$-dimensional extended object in field theory are expected to be described by waves on its $(p+1)$-dimensional world volume. The collective coordinate action would be a quantum field theory which has vectors, corresponding to longitudinal oscillations of the brane, and scalars which correspond to transverse oscillations. For D-branes in string theory, the theory of collective excitations is a string field theory of open strings whose endpoints lie on the brane. (This is the origin of the nomenclature "D-brane": an open string whose ends are constrained to lie on the brane has a world-sheet description in which the bosonic fields corresponding to transverse target space coordinates have Dirichlet boundary conditions.) The lowest-energy states of open superstrings are ordinary massless gauge fields and their supersymmetric partners so that the low-energy limit of the string field theory is a supersymmetric gauge theory.

The fact that the underlying theory is a string theory has an important consequence. For a system of $N$ parallel D-branes of the same type, one would have open strings which join different branes as well as the same brane. The low-energy theory then becomes a supersymmetric nonabelian gauge theory with gauge group $U(N)$. In a suitable gauge, the off-diagonal gauge fields and their supersymmetric partners (which include scalar fields in the adjoint representation) are the low-energy degrees of freedom of open strings which connect different branes.

The mass density or tension $T_p$ of a single Dp brane is given by

$$T_p = \frac{1}{g_s (2\pi)^p\, l_s^{p+1}} \qquad [16]$$

This couples to the $(p+1)$-form gauge field with a charge

$$\mu_p = g_s T_p \qquad [17]$$

and the Yang–Mills coupling constant for the collective theory on the brane world volume is given by

$$g^2_{{\rm YM},Dp} = (2\pi)^{p-2}\, g_s\, l_s^{p-3} \qquad [18]$$

The ground state of a single Dp brane is a BPS state which preserves 16 of the 32 supersymmetries of the original theory. One consequence of this is that two or more parallel Dp branes of the same type form a threshold bound state preserving the same supersymmetries, with no net force between them. As a result, the tension of $N$ parallel Dp branes is simply $NT_p$.

Branes of different dimensionalities can also form bound states. Of particular interest are configurations which can form threshold bound states which preserve some supersymmetries. For example, a set of $N_1$ parallel Dp branes can form a threshold bound state with a set of $N_2$ parallel D$(p+4)$ branes with all the $p$ branes lying entirely along the $(p+4)$-branes. This configuration is also a BPS saturated state preserving eight of the original supersymmetries and would have charges under both $(p+1)$-form and $(p+5)$-form gauge potentials. The BPS nature ensures that the total mass density is the sum of the individual mass densities.

NS Branes

The other extended objects in string theory are called NS branes since they couple to $p$-form gauge fields which arise from the Neveu–Schwarz/Neveu–Schwarz sector of the world-sheet theory. These are present in all the five string theories and appear in two types. The first is a macroscopic fundamental string which may be wound around a compact direction. The second is called a solitonic 5-brane. While the collective dynamics of a fundamental string is the standard world-sheet description of string theory, the description for the NS 5-brane is rather complicated and not known in full detail. The rest of this article deals exclusively with D-branes.
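The tension, charge, and coupling formulas [16]–[18] can be tabulated numerically. The following is a minimal Python sketch, assuming the normalizations $T_p = 1/(g_s (2\pi)^p l_s^{p+1})$, $\mu_p = g_s T_p$, and $g^2_{\rm YM} = (2\pi)^{p-2} g_s l_s^{p-3}$; the function names and the sample values of $g_s$ and $l_s$ are illustrative choices, not part of the article.

```python
import math


def dp_tension(p, g_s, l_s):
    """Tension T_p of a single Dp brane, assuming the normalization of [16]."""
    return 1.0 / (g_s * (2.0 * math.pi) ** p * l_s ** (p + 1))


def dp_charge(p, g_s, l_s):
    """Charge mu_p = g_s * T_p under the (p+1)-form gauge field, as in [17]."""
    return g_s * dp_tension(p, g_s, l_s)


def ym_coupling_squared(p, g_s, l_s):
    """Yang-Mills coupling g_YM^2 on the Dp world volume, as in [18]."""
    return (2.0 * math.pi) ** (p - 2) * g_s * l_s ** (p - 3)


def stack_tension(n_branes, p, g_s, l_s):
    """Tension of N parallel Dp branes; the BPS property makes it additive."""
    return n_branes * dp_tension(p, g_s, l_s)


if __name__ == "__main__":
    g_s, l_s = 0.1, 1.0  # hypothetical weak coupling, string length as the unit
    for p in range(0, 7):
        print(p, dp_tension(p, g_s, l_s), dp_charge(p, g_s, l_s),
              ym_coupling_squared(p, g_s, l_s))
```

Under these assumptions the product $T_p\, g^2_{\rm YM}$ equals $1/(4\pi^2 l_s^4)$ for every $p$ and every $g_s$, and for $p = 3$ the coupling $g^2_{\rm YM} = 2\pi g_s$ is dimensionless, which makes a convenient sanity check on the exponents.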
where the $\pm$ sign refers to fermions and bosons, respectively, and we have introduced left- and right-moving temperatures $T_L$, $T_R$. The physical temperature is

$$\frac{1}{T} = \frac{1}{2}\left(\frac{1}{T_L} + \frac{1}{T_R}\right) \qquad [20]$$

The extensive quantities, such as the energy $E$, momentum $P$, and entropy $S$, then become the sum of left- and right-moving pieces:

$$E = E_L + E_R; \quad P = P_L + P_R; \quad S = S_L + S_R \qquad [21]$$

and the distribution function [19] leads to the following thermodynamic relations:

$$T_i = \sqrt{\frac{8E_i}{\pi L f}} = \frac{4S_i}{\pi f L}, \quad i = L, R \qquad [22]$$

Since the total momentum $P = P_R + P_L = E_R - E_L$ is nonzero, the lowest-energy state is clearly the one in which all the particles move in the same direction, for example, right moving. This is a BPS state and corresponds to the extremal solution in supergravity. Then $E = E_R = P = P_R$. This approach to the black hole entropy was initiated by Das and Mathur (1996) and Callan and Maldacena (1996).

For our two-charge system, $f = 8$, $P = 2\pi Q_1/L$, and $L = 2\pi R Q_1 Q_5$. Using [22] we get

$$S^{\text{2-charge-II}}_{\rm micro} = 2\pi\sqrt{2Q_1Q_5} \qquad [23]$$

The key point, however, is that the two-charge solution has a singular horizon where the string frame curvature is large. Consequently, low-energy tree-level supergravity breaks down near the horizon and higher-derivative terms (e.g., higher powers of curvature) become important. This issue has been best studied for the fundamental heterotic string compactified on $T^6$. This is dual to the D1–D5 system in type IIB theory compactified on $K3 \times T^2$. The classical supergravity solution is then a singular black hole in four spacetime dimensions. In one of the first papers on the string-theoretic understanding of black hole thermodynamics, Sen (1995) showed that, for large $n_p$, $n_w$, string-loop effects are small near the horizon so that the only relevant corrections are higher-derivative terms coming from integrating out the massive modes of the string at tree level. Furthermore, a robust scaling argument shows that regardless of the detailed nature of the derivative corrections, the macroscopic entropy defined through the horizon area must be of the form $a\sqrt{n_p n_w}$, where $a$ is a pure number. Finally, one can define a "stretched horizon" as the surface where the curvature becomes of the order of the string scale and the area of the stretched horizon is indeed proportional to $\sqrt{n_p n_w}$. This result gives a strong indication that string theory provides a microscopic basis for black hole thermodynamics, although the coefficient $a$ cannot be determined without more detailed knowledge of higher-derivative terms.
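The relations [20]–[22] are easy to exercise numerically. The sketch below assumes the free-gas normalization $T_i = \sqrt{8E_i/(\pi L f)} = 4S_i/(\pi f L)$ for $f$ bosonic and $f$ fermionic species on a circle of length $L$, which is equivalent to $S_i = \sqrt{\pi f L E_i/2}$; the function names and the sample numbers are hypothetical.

```python
import math


def chiral_temperature(energy, length, flavors):
    """T_i from E_i, assuming T_i = sqrt(8 E_i / (pi L f)) as in [22]."""
    return math.sqrt(8.0 * energy / (math.pi * length * flavors))


def chiral_entropy(energy, length, flavors):
    """S_i from the second equality of [22], T_i = 4 S_i / (pi f L)."""
    t_i = chiral_temperature(energy, length, flavors)
    return math.pi * flavors * length * t_i / 4.0


def physical_temperature(t_left, t_right):
    """Physical temperature from [20]: 1/T = (1/T_L + 1/T_R) / 2."""
    if t_left == 0.0 or t_right == 0.0:
        return 0.0  # a purely chiral (extremal) state is at zero temperature
    return 2.0 / (1.0 / t_left + 1.0 / t_right)


if __name__ == "__main__":
    f, L = 8, 2.0 * math.pi * 3.0   # hypothetical: 8 flavors, circle of length 6*pi
    e_left, e_right = 0.5, 20.0     # hypothetical left- and right-moving energies
    t_l = chiral_temperature(e_left, L, f)
    t_r = chiral_temperature(e_right, L, f)
    total_entropy = chiral_entropy(e_left, L, f) + chiral_entropy(e_right, L, f)
    print("T_L, T_R, T:", t_l, t_r, physical_temperature(t_l, t_r))
    print("S = S_L + S_R:", total_entropy)
```

Two limits are worth noting under these assumptions: a state with $E_L = 0$ has $T = 0$, as expected for an extremal configuration, and for $T_R \gg T_L$ the physical temperature approaches $2T_L$.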
orbifold $(T^4)^{Q_1Q_5}/S(Q_1Q_5)$ or $(K3)^{Q_1Q_5}/S(Q_1Q_5)$ and is $4Q_1Q_5$ dimensional. Since any instanton configuration is independent of time $x^0$ and the $S^1$ direction $x^5$, the collective coordinate dynamics is a $(1+1)$-dimensional field theory which lives in the $(x^0, x^5)$ space. At low energies, this flows to a conformal field theory with a central charge $c = 6Q_1Q_5$ since there are $4Q_1Q_5$ bosons each contributing 1 to the central charge and an equal number of fermions each contributing 1/2. The BPS state with momentum $N/R$ is a purely right- or left-moving state in this conformal field theory which has a conformal weight $N$. From general principles of conformal invariance, the degeneracy of such states for large $N$ is given by Cardy's formula

$$d(N) \sim e^{2\pi\sqrt{cN/6}} \qquad [25]$$

so that the microscopic entropy is

$$S^{\text{3-charge}}_{\rm micro} = \log d(N) = 2\pi\sqrt{cN/6} \qquad [26]$$

Substituting the value of $c = 6Q_1Q_5$, this is in exact agreement with the Bekenstein–Hawking entropy of the classical solution given in [15].

Nonextremal Black Holes and Hawking Radiation

The BPS property of ground states of D-brane systems enables us to compute the degeneracy of microstates exactly in the regime of parameters where the state can be reliably described as a black hole solution in the low-energy theory. However, extremal black holes have vanishing temperature and do not radiate. To understand the microscopic origins of Hawking radiation, one has to go away from extremality. Such states are not supersymmetric and an extrapolation of weak-coupling calculations to strong coupling is not a priori justified. Nevertheless, it turns out that for small departures from extremality, weak-coupling results still reproduce semiclassical answers for entropy, temperature, and luminosity.

Near-Extremal Entropy

Nonextremal properties are best understood for the D1–D5–N system on $T^4 \times S^1$. In the orbifold limit, the conformal field theory which describes the low-energy dynamics is equivalent to a gas of strings which are wound around the $S^1$ and which can oscillate along the $T^4$. The total winding number is $k = Q_1Q_5$ and may be achieved by sets of strings which are multiply wound in various ways. As argued below, entropically the most favored configuration is a single long string wound around $Q_1Q_5$ times. Thus, the thermodynamics may be analyzed exactly along the lines of the fundamental string in the previous section. The thermodynamic relations are given by [22] with $f = 4$ and $L = 2\pi R Q_1Q_5$. The extremal state consists entirely of right movers and $E = E_R = N/R$. Substituting these values in [22] yields the correct formula for the microscopic entropy

$$S^{\text{3-charge}}_{\rm micro} = 2\pi\sqrt{Q_1Q_5N} \qquad [27]$$

The same expression follows if $f = 4Q_1Q_5$ and $L = 2\pi R$ corresponding to $Q_1Q_5$ singly wound strings. However, for statistical methods to hold, the entropy must be much larger than the number of flavors. The ratio of the entropy to the number of flavors is $S/f \sim \sqrt{N/Q_1Q_5}$ for multiple singly wound strings and is not guaranteed to be large when all of $Q_1$, $Q_5$, $N$ are large. On the other hand, this ratio is $S/f \sim \sqrt{Q_1Q_5N}$ for the long string. This shows that the long string is always entropically favored.

A departure from the extremal state is achieved by adding a left-moving momentum $2\pi n/L$ as well as a right-moving momentum $2\pi n/L$ to the extremal state, thus adding energy to the system but maintaining the total momentum. For the long string, this yields

$$S_R = 2\pi\sqrt{Q_1Q_5N + n}, \quad S_L = 2\pi\sqrt{n} \qquad [28]$$

For small departures from extremality, $n \ll N$, the expressions for the total entropy and temperature as a function of the excess energy $\Delta E = 2n/(RQ_1Q_5)$ agree exactly with the near-extremal Bekenstein–Hawking entropy and the Hawking temperature of the classical solution, as shown by Callan and Maldacena (1996) and by Horowitz and Strominger.

The necessity of the long string appears in another important physical consideration. For statistical mechanics to be valid, the specific heat of the system has to be larger than unity. This implies that for the case considered here the energy gap $\Delta E$ must be larger than $1/(RQ_1Q_5)$, which is precisely what the long string yields.

Hawking Radiation

A nonextremal state described above is unstable, since a left mover can annihilate a right mover into a closed-string mode which may leave the brane system and propagate to the asymptotic region. The resulting closed-string state will be in a thermal state whose temperature is the physical temperature of the initial state. This process is the microscopic
description of Hawking radiation. The decay rate is related to the absorption cross section of the corresponding mode by the principle of detailed balance, encoded in eqn [4].

From the point of view of the classical solution, the absorption cross section can be calculated by solving the linearized wave equation in the background geometry and calculating the ratio of the incident and reflected waves. It follows from these calculations that at low energies, absorption (and hence emission) are dominated by massless minimally coupled scalars. In fact, for any spherically symmetric black hole in any number of dimensions, there is a general theorem which ensures that the low-energy limit of this absorption cross section is exactly equal to the horizon area.

In the microscopic model for the three-charge black hole, this absorption cross section may be calculated by the usual rules of quantum mechanics. In the long-string limit and in the approximation that the modes on the long string form a dilute gas, the result has been derived by Das and Mathur (1996):

$$\sigma(\omega) = \frac{2\pi G_{10} Q_1 Q_5}{V}\,\omega\,\frac{e^{\omega/T} - 1}{\left(e^{\omega/2T_R} - 1\right)\left(e^{\omega/2T_L} - 1\right)} \qquad [29]$$

where $V$ is the volume of the $T^4$ and $T$ is the physical temperature given by [20]. For a near-extremal hole $T_R \gg T_L$, so that $T \approx 2T_L$. Then in the extreme low-energy limit $\omega \ll T_R$, so that the corresponding Bose factor may be approximated as $1/(e^{\omega/2T_R} - 1) \approx 2T_R/\omega$. The cross section [29] becomes

$$\sigma = \frac{4\pi Q_1 Q_5 G_{10} T_R}{V} = \frac{4 G_{10} S_R}{2\pi R V} = 4 G_5 S_{\rm extremal} = A_H \qquad [30]$$

where $G_5$ is the five-dimensional Newton's gravitational constant. We have used the relation [22] with $L = 2\pi R Q_1Q_5$ and $f = 4$. The fact that in the near-extremal limit $S_R$ is simply the extremal entropy and the fact that the extremal entropy reproduces the Bekenstein–Hawking formula has been used as well. Thus, the microscopic cross section exactly reproduces the semiclassical result at low energies. Even more remarkably, the full cross section [29] agrees with the semiclassical answer for the gray-body factor for parameters which correspond to the dilute-gas regime, as shown by Maldacena and Strominger.

It is rather surprising that the results for microscopic absorption cross section calculated at weak coupling agree with the semiclassical answers, since the relevant process involves states which are not supersymmetric and therefore a naive extrapolation to strong coupling is not a priori justified. There are strong indications, however, that low-energy nonrenormalization theorems are at work. This agreement has been established not only for black holes with finite-horizon areas, but also for other systems with no horizons (most significantly, a set of parallel 3-branes) and forms the basis for Maldacena's conjecture about AdS/CFT Correspondence (see AdS/CFT Correspondence).

Effects of Higher-Derivative Terms

The classical low-energy limit of string theory is supergravity. The effect of the massive modes of the string, as well as of string loops, is to add terms to the supergravity action which involve a higher number of spacetime derivatives, for example, terms containing higher powers of the curvature. In the presence of such terms, the Bekenstein–Hawking formula for black hole entropy [2] receives corrections which can be calculated in a systematic fashion. It turns out that for a class of extremal black holes, this corrected entropy as computed in the modified supergravity is also in exact agreement with a microscopic calculation.

One example of this agreement is provided by four-dimensional extremal black holes in type IIA string theory compactified on a Calabi–Yau manifold. These are obtained by wrapping D4 branes on three different 4-cycles on the Calabi–Yau and having in addition a number of D0 branes. Let $p^A$, $A = 1, \ldots, 3$ denote the three D4 charges and $q_0$ denote the D0 charge. The microscopic entropy of the BPS state can be computed by embedding this in M-theory:

$$S^{\text{CY black hole}}_{\rm micro} = 2\pi\sqrt{\frac{1}{6}\,|q_0|\left(C_{ABC}\,p^A p^B p^C + c_{2A}\,p^A\right)} \qquad [31]$$

where $C_{ABC}$ is the intersection number of the 4-cycles and $c_2$ denotes the second Chern class of the Calabi–Yau space. When all the charges $p^A$ are large, the term involving $c_2$ is subdominant. In this case, the result agrees with the Bekenstein–Hawking entropy of the corresponding classical solution. When the charges are not all large (so that the second term is appreciable), the curvatures of the supergravity solution become large at the horizon and higher-derivative corrections to the action cannot be ignored. In this particular case, it turns out that these higher-derivative corrections are string-loop corrections and can be computed using general properties of $N = 2$ supersymmetry, so that one can compute corrections to near-horizon geometry. Furthermore, one has to now modify the
expression for macroscopic entropy using the formalism of Wald. Putting these together, it is found that the macroscopic entropy following from the modified supergravity is in exact agreement with [31]. This subject is reviewed in Mohaupt (2000).

These methods have also been applied to the problem of two-charge black holes in heterotic string theory on $T^6$ or, equivalently, type IIA on $K3 \times T^2$ (Dabholkar 2004). Recall that in this case the horizon of the usual supergravity solution is singular. It has been found that leading-order higher-derivative corrections smooth out the horizon into an $AdS_2 \times S^2$ spacetime and the modified expression for the macroscopic entropy is again in exact agreement with the microscopic answer [23].

Geometry of Microstates

A satisfactory solution of the information-loss paradox requires a much more detailed understanding of black holes in string theory. The discussion above shows that black holes have microstates which may be described well in the weak-coupling regime. It is interesting to ask whether there is a description of these microstates in the strong-coupling regime in terms of the effective geometry perceived by suitable probes. This question has been answered for the two-charge system in great detail (see Mathur (2004)). It turns out that the D1–D5 microstates can be described by perfectly smooth metrics with no horizons, and they asymptote to the standard two-charge metric discussed above. The location of the erstwhile stretched horizon marks the point where the different microstates start differing from each other significantly. Since each such geometry does not have a horizon, neither does it have any entropy; this is consistent with their identification with nondegenerate microstates. Indeed, the number of such microstates correctly accounts for the microscopic entropy. Whether a similar picture holds for the three-charge system remains to be seen in detail, although there are some indications that this may be true. In this approach, it is not yet fully understood how a horizon emerges and why the entropy scales as the horizon area.

Outlook

One key feature of the understanding of black hole statistical mechanics from the dynamics of branes is the fact that a problem in gravity is mapped to a problem in a theory without gravity, for example, open-string field theory. In fact, the closed strings in the bulk are already contained in the spectrum of the open strings. This is a consequence of the basic duality between open strings and closed strings. Furthermore, the open-string theory lives in a lower-dimensional spacetime. This is a manifestation of the holographic principle. As argued by Maldacena, the presence of a horizon implies that the low-energy limit retains all the modes of the closed strings near the horizon, while it truncates the open-string theory to a gauge theory. Open–closed duality then reduces to gauge–string duality. This provides strong evidence that black holes obey the normal laws of quantum mechanics and hence their time evolution is unitary.

One of the most outstanding problems in the subject is a proper understanding of neutral black holes. Most of the quantitative results described above depend on supersymmetry, which allows extrapolation of weak-coupling answers to the strong-coupling domain. Some of these results can be extended to situations which have small departures from supersymmetry, for example, near-extremal black holes. States corresponding to neutral black holes are, however, far from supersymmetry and known calculational techniques fail. There are good reasons to expect, however, that the general philosophy, in particular the holographic principle, is still valid. Finally, so far string theory has been able to attack problems of eternal black holes. A satisfactory understanding of the information-loss problem requires an understanding of the dynamics of black hole formation and subsequent evaporation. Unfortunately, very little is known about this at the moment.

See also: AdS/CFT Correspondence; Black Hole Mechanics; Supergravity; Superstring Theories.

Glossary
ADM (Arnowitt–Deser–Misner) mass: Mass of a gravitational background which is asymptotically flat.
AdS$_n$ (anti-de Sitter space): A space (or spacetime) with constant negative curvature in $n$ dimensions.
BPS state (Bogomolny–Prasad–Sommerfield state): In a theory of extended supersymmetry, a state that is invariant under a nontrivial subalgebra of the full supersymmetry algebra. These states always carry conserved charges, and supersymmetry determines the mass exactly in terms of the charges.
Calabi–Yau space: Complex Kähler manifold with vanishing first Chern class.
Compactify (n. compactification): To consider a field or string theory in a spacetime some of whose spatial dimensions are compact.
Dirichlet boundary condition: The boundary condition which fixes the value of a field on the boundary.
Duality: Equivalence of systems which appear to be distinct. For string theories, such equivalences relate string theories on different spacetimes as well as theories with different coupling constants.
Einstein–Hilbert action: The standard action for gravity which leads to Einstein's equation, $S = (1/16\pi G)\int d^dx\,\sqrt{-g}\,R$, where $R$ is the Ricci scalar, $g$ denotes the determinant of the metric, and $G$ is Newton's gravitational constant.
Instanton: A classical solution of Euclidean field theory with finite action.
Kaluza–Klein gauge field: In a compactified theory, the gauge field which arises from the metric of the higher-dimensional theory.
K3: The unique Calabi–Yau manifold in four dimensions having an SU(2) holonomy.
Loop levels: In a Feynman diagram expansion of a field theory, terms which contribute in higher orders of the Planck constant $\hbar$.
Macroscopic entropy: Entropy associated with gravitational backgrounds via the Bekenstein–Hawking formula or its generalization.
Microscopic entropy: Entropy which follows from the degeneracy of states of a system via Boltzmann's relation.
Minimally coupled scalar: A scalar field whose equation of motion is the standard Klein–Gordon equation where the derivatives are covariant derivatives.
Neveu–Schwarz/Neveu–Schwarz states: In type I and II string theories, bosonic closed-string states whose left- and right-moving parts are bosonic.
No-hair theorem: A theorem in general relativity which states that black holes with nonsingular horizons are uniquely characterized by their mass, angular momenta, and charges which can couple to long-range gauge fields.
Orbifold: A coset space $M/G$ where $G$ is a group of discrete symmetries of a manifold $M$. If $G$ has a fixed point, the space is singular.
p-Form: A fully antisymmetric $p$-index tensor.
Ramond–Ramond states: In type I and II string theories, bosonic closed-string states whose left- and right-moving parts are fermionic.
Reissner–Nordström black hole: Black hole solution of general relativity with electric Maxwell charge.
$S^n$: $n$-Dimensional sphere.
Supergravity: Supersymmetric extension of general relativity.
Supersymmetry: A symmetry between bosons and fermions.
Threshold bound state: A bound state which is marginally bound, that is, the binding energy is zero.
Tree level: In a Feynman diagram expansion of a field theory, terms which contribute to lowest order of the Planck constant $\hbar$.
U(N): The group of $N \times N$ unitary matrices. If the determinant is unity, the subgroup is called SU(N).

Further Reading
Callan CG and Maldacena JM (1996) D-brane approach to black hole quantum mechanics. Nuclear Physics B 472: 591 (arXiv:hep-th/9602043).
Dabholkar A (2004) Exact counting of black hole microstates, arXiv:hep-th/0409148.
Das SR and Mathur SD (1996) Comparing decay rates for black holes and D-branes. Nuclear Physics B 478: 561 (arXiv:hep-th/9606185).
Das SR and Mathur SD (2001) The quantum physics of black holes: results from string theory. Annual Review of Nuclear and Particle Science 50: 153 (arXiv:gr-qc/0105063).
David JR, Mandal G, and Wadia SR (2002) Microscopic formulation of black holes in string theory. Physics Reports 369: 549 (arXiv:hep-th/0203048).
Dijkgraaf R, Moore GW, and Verlinde E (1996) Elliptic genera of symmetric products and second quantized strings. Communications in Mathematical Physics 185: 197 (arXiv:hep-th/9608096).
't Hooft G (1990) The black hole interpretation of string theory. Nuclear Physics B 335: 138.
Johnson C (2003) D-Branes. Cambridge: Cambridge University Press.
Maldacena JM (1996) Black holes in string theory, arXiv:hep-th/9607235.
Maldacena J, Strominger A, and Witten E (1997) Black hole entropy in M-theory. Journal of High Energy Physics 9712: 002 (arXiv:hep-th/9711053).
Mathur SD (2004) Where are the states of a black hole?, arXiv:hep-th/0401115.
Mohaupt T (2000) Black hole entropy, special geometry and strings. Fortschritte der Physik 49: 3 (arXiv:hep-th/0007195).
Sen A (1995) Extremal black holes and elementary string states. Modern Physics Letters A 10: 2081.
Strominger A and Vafa C (1996) Microscopic origin of the Bekenstein–Hawking entropy. Physics Letters B 379: 99 (arXiv:hep-th/9601029).
Susskind L (1993) Some speculations about black hole entropy in string theory, arXiv:hep-th/9309145.
Wald R (1994) Quantum Field Theory In Curved Space-Time and Black Hole Thermodynamics. Chicago, IL: University of Chicago Press.
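As a numerical cross-check of the microstate counting in this article, the following Python sketch evaluates Cardy's formula [25]–[26] with $c = 6Q_1Q_5$, the near-extremal long-string entropies [28], and the entropy per flavor for the long string versus $Q_1Q_5$ singly wound strings. It assumes $S = 2\pi\sqrt{cN/6}$, $S_R = 2\pi\sqrt{Q_1Q_5N + n}$, and $S_L = 2\pi\sqrt{n}$; the function names and the sample charges are illustrative, not taken from the article.

```python
import math


def cardy_entropy(central_charge, weight):
    """Leading large-N entropy log d(N) = 2*pi*sqrt(c N / 6), as in [26]."""
    return 2.0 * math.pi * math.sqrt(central_charge * weight / 6.0)


def three_charge_entropy(q1, q5, n_mom):
    """Cardy entropy with c = 6 Q1 Q5, which should reproduce [27]."""
    return cardy_entropy(6 * q1 * q5, n_mom)


def near_extremal_entropy(q1, q5, n_mom, n_excess):
    """Total entropy S_L + S_R of the long string, assuming the form of [28]."""
    s_right = 2.0 * math.pi * math.sqrt(q1 * q5 * n_mom + n_excess)
    s_left = 2.0 * math.pi * math.sqrt(n_excess)
    return s_left + s_right


def entropy_per_flavor(q1, q5, n_mom, long_string=True):
    """S/f with f = 4 (one long string) or f = 4 Q1 Q5 (singly wound strings)."""
    flavors = 4 if long_string else 4 * q1 * q5
    return three_charge_entropy(q1, q5, n_mom) / flavors


if __name__ == "__main__":
    q1, q5, n_mom = 10, 12, 50  # hypothetical charges
    print("extremal S:", three_charge_entropy(q1, q5, n_mom))
    print("near-extremal S (n = 3):", near_extremal_entropy(q1, q5, n_mom, 3))
    print("S/f, long string:", entropy_per_flavor(q1, q5, n_mom, True))
    print("S/f, singly wound:", entropy_per_flavor(q1, q5, n_mom, False))
```

With these assumptions the long string's entropy per flavor exceeds that of the singly wound configuration by exactly a factor $Q_1Q_5$, which is the quantitative content of the statement that the long string is entropically favored.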
Breaking Water Waves 383
Introduction

Watching the sea or a lake it is often possible to trace a wave as it propagates on the water's surface. One can roughly distinguish two types of breaking waves. All waves break while reaching the shore but certain waves break far from the shore. In the first case, the change in water depth or the presence of an obstacle (e.g., a rock) seems to cause wave breaking, while for certain waves within the second category, these factors appear not to be essential. It is a matter of observation that for many waves that break in the open water a drastic increase in their slope near breaking is noticeable. This leads us to the following mathematical definition: the wave profile gradually steepens as it propagates until it develops a point where the slope is vertical and the wave is said to have broken (Whitham 1980). Throughout this article, we are concerned with wave breaking that is not caused by a drastic change of the topography of the bottom; for a discussion of wave breaking at the beach we refer to Johnson (1997). The governing equations for water waves (see the next section) are too difficult to be dealt with in their full generality. Therefore, to gain some insight, one has to find simpler models that are more tractable mathematically. Investigating the properties of the model, certain predictions can be made. The conclusions reached will reflect reality only to some limited extent. The value of a model depends on the number and the degree of accuracy of physically useful deductions that can be made from it; the "truth" of the model is meaningless as all experiments contain inaccuracies and effects other than those accounted for (while deriving the model) cannot be totally excluded. We intend to discuss the way in which a recent model due to Camassa and Holm (1993) can lead to a better understanding of breaking water waves. Firstly we survey a few classical nonlinear partial differential equations that model the propagation of water waves over a flat bed (within the confines of the linear theory one cannot cope with the wave breaking phenomenon) and discuss their relevance to the study of breaking waves. We then analyze the breaking of waves within the context of the Camassa–Holm equation: existence of breaking waves, criteria that guarantee that a certain initial shape develops into a breaking wave, specific

The Governing Equations

The water waves that one typically sees propagating on the surface of the sea or on a lake are, as a matter of common experience, approximately two dimensional. That is, the motion is identical in any direction parallel to the crest line. To describe these waves, it suffices to consider a cross section of the flow that is perpendicular to the crest line. Choose Cartesian coordinates $(x, y)$ with the $y$-axis pointing vertically upwards and the $x$-axis being the direction of wave propagation, while the origin lies at the mean water level. Let $(u(t, x, y), v(t, x, y))$ be the velocity field of the flow, let $y = -d$ be the flat bed (for some fixed $d > 0$), and let $y = \eta(t, x)$ be the water's free surface. Homogeneity (constant density) is a physically reasonable assumption for gravity waves (Johnson 1997), and it implies the equation of mass conservation

$$u_x + v_y = 0 \qquad [1]$$

The inviscid setting is realistic since experimental evidence confirms that the length scales associated with an adjustment of the velocity distribution due to laminar viscosity or turbulent mixing are long compared to typical wavelengths. Under the assumption of inviscid flow the equation of motion is Euler's equation

$$\begin{aligned} u_t + u u_x + v u_y &= -P_x \\ v_t + u v_x + v v_y &= -P_y - g \end{aligned} \qquad [2]$$

where $P(t, x, y)$ denotes the pressure and $g$ is the gravitational constant of acceleration. The free surface decouples the motion of the water from that of the air so that (Johnson 1997) the dynamic boundary condition

$$P = P_0 \quad \text{on } y = \eta(t, x) \qquad [3]$$

must hold if we neglect surface tension, where $P_0$ is the (constant) atmospheric pressure. Moreover, since the same particles always form the free surface, we have the kinematic boundary condition

$$v = \eta_t + u\,\eta_x \quad \text{on } y = \eta(t, x) \qquad [4]$$

On the flat bed we have the kinematic boundary condition

$$v = 0 \quad \text{on } y = -d \qquad [5]$$
expressing the fact that the flow is tangent to the horizontal bed (or, equivalently, that water cannot penetrate the rigid bed). The governing equations for water waves are [1]-[5]. Other than the fact that they are highly nonlinear, a main difficulty in analyzing the governing equations lies in the fact that we deal with a free boundary problem: the free surface $y = \eta(t, x)$ is not specified a priori. In our discussion, we suppose that initially (at time $t = 0$), a disturbance of the flat surface of still water was created and we analyze the subsequent motion of the water. The balance between the restoring gravity force and the inertia of the system governs the evolution of the mass of water and our primary objective is the behavior of the free surface.

An important category of flows are those of zero vorticity, characterized by the additional assumption

$$u_y = v_x \qquad [6]$$

The vorticity of a flow, $\omega = u_y - v_x$, measures the local spin or rotation of a fluid element. In flows for which [6] holds the local whirl is completely absent and for this reason such flows are called irrotational. Relation [6] ensures the existence of a velocity potential, namely a function $\phi(t, x, y)$ defined up to a constant via

$$\phi_x = u, \qquad \phi_y = v$$

Notice that [1] ensures that $\phi$ is a harmonic function, that is, $(\partial_x^2 + \partial_y^2)\phi = 0$. In this way, the powerful methods of complex analysis become available for the study of irrotational flows. Thus, while most water flows are with vorticity, the study of irrotational flows can be defended mathematically on grounds of beauty. Concerning the physical relevance of irrotational water flows, experimental evidence indicates that for waves entering a region of still water the assumption of irrotational flow is realistic (Johnson 1997). Moreover, as a consequence of Kelvin's circulation theorem (Acheson 1990), a water flow that is irrotational initially has to be irrotational at all later times. It is thus reasonable to consider that water motions starting from rest will remain irrotational at later times.

Nonlinear Model Equations

Starting from the governing equations [1]-[6] one can derive a variety of model equations using the nondimensionalization and scaling approach: a suitable set of nondimensional variables is introduced, which, after scaling, leads to the appearance of parameters. The sizes and relative sizes of these parameters then govern the type of phenomenon that is of interest. An asymptotic expansion in one or several parameters yields an equation that is usually of significance in some region of space/time. The aim of this process is to obtain a simpler model that can be used to gain some understanding and to make some predictions for specific physical processes. This scaling method yields the Korteweg-de Vries (KdV) equation

$$\eta_t + \eta \eta_x + \eta_{xxx} = 0, \qquad t > 0, \; x \in \mathbb{R} \qquad [7]$$

as a model for the unidirectional propagation of shallow water waves over a flat bed (Johnson 1997). In [7] the function $\eta(t, x)$ represents the height of the water's free surface above the flat bed. We would like to emphasize that the shallow water regime does not refer to water of insignificant depth: it indicates that the typical wavelength is much larger than the typical depth (e.g., tidal waves are considered to be shallow water waves although they affect the motion of the deep sea). The KdV model admits the solitary wave solutions

$$\eta_c(t, x) = 3c \, \mathrm{sech}^2\!\left( \frac{\sqrt{c}}{2} (x - ct) \right), \qquad c > 0 \qquad [8]$$

For any fixed $c > 0$, the profile $\eta_c$ propagates without change of form at constant speed $c$ on the surface of the water, that is, it represents a traveling wave. Since the profiles [8] of the traveling waves drop rapidly to the undisturbed water level $\eta = 0$ ahead and behind the crest of the wave, the $\eta_c$ are called solitary waves. Notice that [8] shows that taller solitary waves travel faster. They have other special properties: an initial profile consisting of two solitary waves, with the taller one initially behind the smaller one, evolves in such a way that the taller wave catches up with the other; there is a period of complicated nonlinear interaction, but eventually both solitary waves emerge completely unscathed! This special type of nonlinear interaction (the superposition principle is not valid since KdV is a nonlinear equation), in which solitary waves regain their form upon collision, occurs only for special equations, in which case the solitary waves are called solitons. A further interesting property of the KdV model, relevant for the understanding of the interaction of solitons, is the fact that it is completely integrable (McKean 1998): there is a transformation which converts the equation into an infinite sequence of linear ordinary differential equations which can be trivially integrated. Moreover, the KdV solitons $\eta_c$ are stable: an initial profile that is close to the form of a soliton will evolve into a wave that at any later time has a form close to that of a soliton (Benjamin 1972). Despite all these intriguing features of the KdV model, for all initial profiles $x \mapsto \eta(0, x)$ within the Sobolev space $H^1(\mathbb{R})$ of square-integrable functions with a square-integrable distributional derivative, eqn [7] has a unique solution
defined for all times $t \ge 0$ (cf. Kenig et al. (1996)) so that the KdV model cannot be used to shed light on the wave breaking phenomenon.

Whitham (1980) suggested the equation

$$\eta_t + \eta \eta_x + \int_{\mathbb{R}} k(x - y) \, \eta_y(t, y) \, dy = 0 \qquad [9]$$

for the free surface profile $x \mapsto \eta(t, x)$, with the singular kernel

$$k(x) = \frac{1}{2\pi} \int_{\mathbb{R}} \left( \frac{\tanh \xi}{\xi} \right)^{1/2} e^{i \xi x} \, d\xi$$

to model wave breaking. It can be shown (see Constantin and Escher (1998) and references therein) that [9] describes wave breaking: there are smooth initial profiles $x \mapsto \eta(0, x)$ such that the resulting unique solution of [9] exists on a maximal time interval $[0, T)$ with

$$\sup_{(t,x) \in [0,T) \times \mathbb{R}} \{ |\eta(t, x)| \} < \infty$$

$$\inf_{x \in \mathbb{R}} \{ \eta_x(t, x) \} \to -\infty \quad \text{as } t \uparrow T$$

(the solution remains bounded but its slope becomes infinite in finite time). However, in contrast to the KdV model, eqn [9] is not integrable and does not possess soliton solutions. As emphasized by Whitham (1980), it is intriguing to find models for water waves which exhibit both soliton interaction and wave breaking. The Camassa-Holm equation

$$\eta_t - \eta_{txx} + 3 \eta \eta_x = 2 \eta_x \eta_{xx} + \eta \eta_{xxx} \qquad [10]$$

was first obtained by Fokas and Fuchssteiner (1981/82) as a nonlinear partial differential equation with infinitely many conservation laws. Camassa and Holm (1993) derived [10] as a model for shallow water waves, established that the equation possesses soliton solutions and found that it is formally integrable (for a discussion of the integrability issues we refer to Constantin (2001) and Lenells (2002)). Moreover, the solitons of [10] are stable (Constantin and Strauss 2000). An astonishing plenitude of structures is tied into the Camassa-Holm equation: [10] is a re-expression of geodesic flow on the diffeomorphism group (Constantin 2000, Kouranbaeva 1999), a property that can be used to show that the least action principle holds in the sense that there is a unique flow transforming a wave profile into a nearby profile within the class of flows that minimize the kinetic energy (see the discussion in Constantin (2000) and Constantin and Kolev (2003)). Interestingly, the Camassa-Holm equation also models wave breaking. More precisely (see the discussion in Constantin (2000)), for any initial data $x \mapsto \eta_0(x) = \eta(0, x)$ in $H^3(\mathbb{R})$ there is a unique solution of [10] defined on some maximal time interval $[0, T)$ and the solution stays uniformly bounded on $[0, T)$ with

$$\lim_{t \uparrow T} \Big\{ \inf_{x \in \mathbb{R}} \{ \eta_x(t, x) \} \, (T - t) \Big\} = -2 \quad \text{if } T < \infty$$

In addition to this, for a large class of initial data, there is precisely one point where the slope of the wave becomes infinite at breaking time (Constantin 2000): if $\eta_0 \not\equiv 0$ is odd and such that $\eta_0(x) - \eta_0''(x) \le 0$ for all $x \ge 0$, then the corresponding wave $t \mapsto [x \mapsto \eta(t, x)]$ will break in finite time $T < \infty$ and

$$\lim_{t \uparrow T} \eta_x(t, 0) = -\infty$$

whereas

$$|\eta_x(t, x)| \le K + K \, \frac{\cosh x}{|\sinh x|}, \qquad t \in [0, T), \; x \ne 0$$

for some constant $K > 0$. Thus, the Camassa-Holm model is an integrable infinite-dimensional Hamiltonian system with stable solitons and eqn [10] admits also breaking waves as local solutions (see Constantin and Escher (1998) and McKean (1998) and references therein for further results on wave breaking for the Camassa-Holm equation).

We conclude our discussion by pointing out that it is possible to continue solutions of the Camassa-Holm equation past the breaking time. For this purpose it is convenient to rewrite [10] as the nonlinear nonlocal conservation law

$$\eta_t + \eta \eta_x + \partial_x \left( \frac{1}{2} \int_{\mathbb{R}} e^{-|x - y|} \left( \eta^2 + \frac{1}{2} \eta_x^2 \right) dy \right) = 0 \qquad [11]$$

reminiscent to some extent of the form of [7] and [9] and obtained by formally applying the operator $(1 - \partial_x^2)^{-1}$ to [10] in view of the fact that

$$(1 - \partial_x^2)^{-1} f = P * f \quad \text{for } f \in L^2(\mathbb{R})$$

the kernel of the convolution being

$$P(x) = \tfrac{1}{2} e^{-|x|}, \qquad x \in \mathbb{R}$$

By introducing a new set of independent and dependent variables it is possible to resolve all singularities due to wave breaking in the sense that [11] is transformed into a semilinear system, the unique solution of which can be obtained as a fixed point of a contractive operator (Bressan and Constantin 2005). In terms of [11], a semigroup of global conservative solutions (in the sense that the total energy

$$\frac{1}{2} \int_{\mathbb{R}} (\eta^2 + \eta_x^2) \, dx$$
equals a constant, for almost every time), depending continuously on the initial data $\eta(0, \cdot) \in H^1(\mathbb{R})$, is thus constructed.

See also: Compressible Flows: Mathematical Theory; Dynamical Systems in Mathematical Physics: An Illustration from Water Waves; Integrable Systems: Overview; Interfaces and Multicomponent Fluids.

Further Reading

Acheson DJ (1990) Elementary Fluid Dynamics. New York: Oxford University Press.
Benjamin TB (1972) The stability of solitary waves. Proceedings of the Royal Society of London Series A 328: 153-183.
Bressan A and Constantin A (2005) Global conservative solutions of the Camassa-Holm equation, Preprints on Conservation Laws 2005-016 (www.math.ntnu.no/conservation/2005/016).
Camassa R and Holm DD (1993) A new integrable shallow water equation with peaked solitons. Physical Review Letters 71: 1661-1664.
Constantin A (2000) Existence of permanent and breaking waves for a shallow water equation: a geometric approach. Annales de l'Institut Fourier (Grenoble) 50: 321-362.
Constantin A (2001) On the scattering problem for the Camassa-Holm equation. Proceedings of the Royal Society of London Series A 457: 953-970.
Constantin A and Escher J (1998) Wave breaking for nonlinear nonlocal shallow water equations. Acta Mathematica 181: 229-243.
Constantin A and Kolev B (2003) Geodesic flow on the diffeomorphism group of the circle. Commentarii Mathematici Helvetici 78: 787-804.
Constantin A and Strauss WA (2000) Stability of peakons. Communications on Pure and Applied Mathematics 53: 603-610.
Fokas AS and Fuchssteiner B (1981/82) Symplectic structures, their Bäcklund transformations and hereditary symmetries. Physica D 4: 47-66.
Gesztesy F and Holden H (2003) Soliton Equations and their Algebro-Geometric Solutions. Cambridge: Cambridge University Press.
Johnson RS (1997) A Modern Introduction to the Mathematical Theory of Water Waves. Cambridge: Cambridge University Press.
Johnson RS (2002) Camassa-Holm, Korteweg-de Vries and related models for water waves. Journal of Fluid Mechanics 455: 63-82.
Kenig CE, Ponce G, and Vega LA (1996) A bilinear estimate with applications to the KdV equation. Journal of the American Mathematical Society 9: 573-603.
Kouranbaeva S (1999) The Camassa-Holm equation as a geodesic flow on the diffeomorphism group. Journal of Mathematical Physics 40: 857-868.
Lenells J (2002) The scattering approach for the Camassa-Holm equation. Journal of Nonlinear Mathematical Physics 9: 389-393.
McKean HP (1979) Integrable systems and algebraic curves. In: Global Analysis, Lecture Notes in Mathematics, vol. 755, pp. 83-200. Berlin: Springer.
McKean HP (1998) Breakdown of a shallow water equation. Asian Journal of Mathematics 2: 867-874.
Whitham GB (1980) Linear and Nonlinear Waves. New York: Wiley.
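
Two formulas of the Breaking Water Waves article lend themselves to a quick numerical sanity check. The sketch below is only an illustration under stated assumptions: it takes the KdV equation in the normalization $\eta_t + \eta\eta_x + \eta_{xxx} = 0$ with solitary wave $3c\,\mathrm{sech}^2(\tfrac{\sqrt{c}}{2}(x - ct))$, and the kernel $P(x) = \tfrac12 e^{-|x|}$ of the nonlocal Camassa-Holm form. All function names, step sizes and the Gaussian test function are hypothetical choices made here, not part of the article.

```python
import math

def eta_c(xi, c):
    """Solitary-wave profile 3c sech^2(sqrt(c)/2 * xi) in the travelling coordinate xi = x - c*t."""
    return 3.0 * c / math.cosh(0.5 * math.sqrt(c) * xi) ** 2

def kdv_residual(xi, c, h=1e-3):
    """For eta(t, x) = f(x - c t) the KdV equation reduces to -c f' + f f' + f''' = 0.
    The derivatives are approximated by central finite differences (assumed step h)."""
    f = lambda s: eta_c(s, c)
    d1 = (f(xi + h) - f(xi - h)) / (2.0 * h)
    d3 = (f(xi + 2 * h) - 2.0 * f(xi + h) + 2.0 * f(xi - h) - f(xi - 2 * h)) / (2.0 * h ** 3)
    return -c * d1 + f(xi) * d1 + d3

def max_kdv_residual(c, half_width=10.0, n=201):
    """Largest absolute residual on a uniform grid of n points in [-half_width, half_width]."""
    pts = [-half_width + 2.0 * half_width * i / (n - 1) for i in range(n)]
    return max(abs(kdv_residual(x, c)) for x in pts)

def P(x):
    """Convolution kernel P(x) = exp(-|x|) / 2."""
    return 0.5 * math.exp(-abs(x))

def convolve_P(f, x, h=0.01, half_width=12.0):
    """Trapezoidal approximation of (P * f)(x) = int P(y) f(x - y) dy.
    The grid is centred at y = 0 so that the kink of P sits on a node."""
    n = int(round(half_width / h))
    total = 0.0
    for i in range(-n, n + 1):
        weight = 0.5 if abs(i) == n else 1.0
        total += weight * P(i * h) * f(x - i * h)
    return total * h

def helmholtz_residual(f, x, H=0.05):
    """If g = P * f inverts (1 - d^2/dx^2), then g - g'' - f should vanish."""
    g = lambda s: convolve_P(f, s)
    g2 = (g(x + H) - 2.0 * g(x) + g(x - H)) / H ** 2
    return g(x) - g2 - f(x)

if __name__ == "__main__":
    for speed in (0.5, 1.0, 2.0):
        print("c =", speed, "amplitude =", eta_c(0.0, speed),
              "max KdV residual =", max_kdv_residual(speed))
    gauss = lambda x: math.exp(-0.5 * x * x)
    print("Helmholtz residual at x = 0.7:", helmholtz_residual(gauss, 0.7))
```

The residuals are small but not zero, because they include finite-difference and quadrature error. The amplitude $3c$ printed for each speed makes the remark that taller solitary waves travel faster concrete.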
BRST Quantization

M Henneaux, Université Libre de Bruxelles, Bruxelles, Belgium
© 2006 Elsevier Ltd. All rights reserved.

Introduction

The BRST symmetry was originally introduced in the seminal papers by Becchi et al. (1976) and Tyutin (1975) for Yang-Mills gauge theories as a tool for controlling the renormalization of the models in a consistent (gauge-independent) way. This symmetry was discovered as a residual symmetry of the gauge-fixed action. It was realized later that, in fact, the BRST construction is quite general, in the sense that it covers arbitrary gauge theories and not just Yang-Mills gauge models. Furthermore, it is intrinsic, in that no gauge choice is actually necessary to define it.

The purpose of this review is to explain the general, intrinsic features of the BRST formalism applicable to any gauge theory. The proper setting for discussing these issues is that of homological algebra (Stasheff (1998), and references therein). This article first explains the necessary algebraic material underlying the construction and then illustrates it in the cases of the Hamiltonian BRST formalism and the Lagrangian BRST formalism.

A Result from Homological Algebra

The main result of homological algebra needed in the BRST construction deals with a differential complex $C$ with two gradings. The first grading is an $\mathbb{N}$-degree and is called the resolution degree, or r-degree. The second grading is a $\mathbb{Z}$-degree and is called the total ghost number. It is denoted by gh. We assume that there are two odd derivations $\delta$ and $s_0$ that have the following properties:

$$r(\delta) = -1, \quad \mathrm{gh}(\delta) = 1; \qquad r(s_0) = 0, \quad \mathrm{gh}(s_0) = 1 \qquad [1]$$

and

$$\delta^2 = 0, \qquad \delta s_0 + s_0 \delta = 0, \qquad s_0^2 = -[\delta, s_1] \qquad [2]$$
for some derivation $s_1$ of r-degree 1 and ghost number 1. The bracket $[\,\cdot\,,\cdot\,]$ is the graded commutator; in this specific case, the anticommutator. We also assume that the homology of $\delta$ vanishes at nonzero values of the r-degree, both in the original complex $C$,

$$H_k(\delta, C) = 0, \qquad k > 0 \qquad [3]$$

(which is equivalent to $\delta a = 0$, $r(a) > 0 \Rightarrow a = \delta b$) and in the space of derivations,

$$[\delta, \alpha] = 0, \quad r(\alpha) \ne 0 \;\Rightarrow\; \alpha = [\delta, \beta] \qquad [4]$$

where $\alpha$ and $\beta$ are both derivations in $C$. The r-degree of a homogeneous linear operator $\alpha$ is defined through $r(\alpha(x)) = r(\alpha) + r(x)$ for any element $x \in C$ and is negative when $\alpha$ decreases the r-degree.

In $H_0(\delta, C)$, the (odd) derivation $s_0$ defines a differential. The cohomology of $s_0$ modulo $\delta$, denoted $H^k(s_0, H_0(\delta, C))$, is the cohomology of $s_0$ in $H_0(\delta, C)$. It is explicitly defined through the cocycle condition

$$s_0 a = \delta m \qquad [5]$$

with coboundaries of the form

$$s_0 b + \delta n \qquad [6]$$

The central result underlying the BRST construction is:

Theorem 1 Given the above setting, there exists an odd derivation $s$ in $C$ with the following properties:

$$s = \delta + s_0 + s_1 + \cdots \qquad [7]$$

$$r(s_k) = k, \qquad \mathrm{gh}(s_k) = 1 \qquad [8]$$

$$s^2 = 0 \qquad [9]$$

Furthermore, one has

$$H^k(s, C) \simeq H^k(s_0, H_0(\delta, C)) \qquad [10]$$

The proof is straightforward (see, e.g., Henneaux and Teitelboim (1992)). In particular, the proof of [10] is a standard spectral sequence argument with a sequence that collapses after the second step. It is interesting to note that, contrary to $s_0$, which is only a differential modulo $\delta$, $s$ is a true differential. The construction of $s$ provides a model for $H^k(s_0, H_0(\delta, C))$. The differential $s$ is not unique, but this does not affect the subsequent discussion.

In physical applications, the total ghost number is a derived quantity. The primary gradings are the resolution degree and the filtration degree called the pure ghost number and denoted pgh. It is an $\mathbb{N}$-degree and one has

$$\mathrm{gh} = \mathrm{pgh} - r \qquad [11]$$

The r-degree is known as the antighost or antifield number, depending on the context (see below). When $r(x) = 0$, one has $\mathrm{gh}(x) = \mathrm{pgh}(x)$. Since the pure ghost number is non-negative, this implies that

$$H^k(s, C) = 0, \qquad k < 0 \qquad [12]$$

A Geometric Application

Geometric Setting

Theorem 1 is relevant to the following situation. Consider a surface $\Sigma$ in a manifold $M$, defined by equations

$$f_a = 0 \qquad [13]$$

which may or may not be independent. (We assume for definiteness that the variables in $M$ are bosonic, that is, that $M$ is an ordinary manifold as opposed to a supermanifold. The graded case can be covered without difficulty by including appropriate sign factors at the relevant places.) Assume that $\Sigma$ is partitioned by orbits generated by vector fields $X_\alpha$ defined everywhere in $M$, tangent to $\Sigma$ and closing on $\Sigma$ in the Lie bracket,

$$[X_\alpha, X_\beta] = C^\gamma{}_{\alpha\beta} X_\gamma + \text{``more''} \qquad [14]$$

where "more" denotes terms that vanish on $\Sigma$. We assume, for simplicity, that the vector fields $X_\alpha$ are linearly independent on $\Sigma$, although this is not necessary. The formalism can be developed in the nonindependent case, but it then requires more variables. We are interested in the quotient space $\Sigma/O$ of the surface $\Sigma$ by the orbits. To guide the geometrical intuition, we shall assume that this quotient space is a smooth manifold (the fiber of the orbits, etc.), and we shall suggestively adopt notations adapted to this best possible case. The approach, being purely algebraic, is in fact more general. (Accordingly, the notations should be understood with a liberal mind.)

The aim here is to describe the algebra of observables, that is, the algebra $C^\infty(\Sigma/O)$ of functions on the quotient space $\Sigma/O$. The terminology "observables" anticipates the physical situation discussed below, where the orbits are the gauge orbits.

In order to describe algebraically the algebra of observables, one observes that this algebra is obtained
through a two-step procedure. First, one restricts the functions from $M$ to $\Sigma$. Second, one imposes the invariance condition along the orbits. To each of these steps corresponds a separate differential.

Longitudinal Complex

The longitudinal complex is associated with the second step. One can consider on $\Sigma$ an exterior derivative operator $D$ along the gauge orbits. This operator is defined on functions on $\Sigma$ as

$$D f = (X_\alpha f) \, C^\alpha \qquad [15]$$

where the 1-forms $C^\alpha$ dual to the $X_\alpha$'s are called ghosts. In the physical context, the form-degree is the pgh described earlier, and so $\mathrm{pgh}(C^\alpha) = 1$. The action of $D$ on the ghosts is given by

$$D C^\alpha = \tfrac{1}{2} C^\alpha{}_{\beta\gamma} \, C^\gamma C^\beta \qquad [16]$$

The longitudinal complex $L$ is the complex of exterior forms along the gauge orbits. In our representation used here, it is given by the space of polynomials in the ghosts $C^\alpha$ with coefficients that are functions on $\Sigma$. The exterior derivative $D$ is defined on this space by extending the formulas [15] and [16] so that it is an odd derivation. One clearly has (on $\Sigma$)

$$D^2 = 0 \qquad [17]$$

The functions on the quotient space $\Sigma/O$ are just the elements of the zeroth cohomological group $H^0(D, L)$,

$$H^0(D, L) = C^\infty(\Sigma/O) \qquad [18]$$

In general, $H^k(D, L) \ne 0$.

Koszul-Tate Differential

The Koszul-Tate differential $\delta$ implements the first step in the reduction procedure. More precisely, it provides an algebraic resolution of the algebra $C^\infty(\Sigma)$ of the smooth functions on the surface $\Sigma$. That algebra can be identified with the quotient algebra

$$C^\infty(\Sigma) = C^\infty(M)/N \qquad [19]$$

where $N$ is the ideal of functions that vanish on $\Sigma$. The Koszul-Tate complex $K$ is defined by adding one new generator for each equation $f_a = 0$ defining $\Sigma$, denoted $t_a$ and assigned r-degree 1. In the algebra $C^\infty(M) \otimes \wedge(t_a)$ (where $\wedge(t_a)$ is the exterior algebra on $t_a$), one defines $\delta$ through

$$\delta f = 0 \quad \forall f \in C^\infty(M), \qquad \delta t_a = f_a \qquad [20]$$

and extends it as an odd derivation. It is clear that $r(\delta) = -1$ and that $\delta^2 = 0$. Because the functions on $M$ are annihilated by $\delta$, they are clearly cycles at r-degree zero. Because the left-hand sides $f_a$ of the equations $f_a = 0$ are exact (equal to $\delta t_a$), the ideal $N$ coincides with the set of boundaries in degree zero. Thus,

$$H_0(\delta, K) = C^\infty(\Sigma) \qquad [21]$$

We see accordingly that $\delta$ successfully enforces the restriction to the surface $\Sigma$ through its homology in degree zero.

However, if the equations $f_a = 0$ are not independent, this is not the end of the story. Indeed, any identity $Z^a{}_A f_a = 0$ on the functions $f_a$ leads to a nontrivial cycle $Z^a{}_A t_a$ in r-degree 1, $\delta(Z^a{}_A t_a) = 0$. This is undesirable. To cure this drawback, one introduces further generators $t_A$ in r-degree 2, one for each identity $Z^a{}_A f_a = 0$, and defines

$$\delta t_A = Z^a{}_A t_a, \qquad r(t_A) = 2 \qquad [22]$$

in order to kill the unwanted cycles $Z^a{}_A t_a$. The Koszul complex $K$ is thus enlarged to contain these new (even) variables and redefined as

$$K = C^\infty(M) \otimes \wedge(t_a) \otimes S(t_A) \qquad [23]$$

where $S(t_A)$ is the symmetric algebra in $t_A$. The operator $\delta$ is extended to $K$ as an odd derivation. One has $\delta^2 = 0$ and the property [21] is unaffected by the inclusion of the new generators. Furthermore, by construction,

$$H_1(\delta, K) = 0 \qquad [24]$$

If there is no "identity on the identities", we shall assume that the process stops. Otherwise, one needs to introduce further generators in r-degree 3 and possibly higher. When all the appropriate variables are included, there is no homology at higher r-degree. Thus,

$$H_k(\delta, K) = 0, \qquad k > 0 \qquad [25]$$

Combining $\delta$ with $D$

We now turn to the problem of combining the Koszul-Tate complex with the longitudinal complex, so as to implement the full reduction. To that end, we define $C$ by adding the ghosts to $K$,

$$C = K \otimes \wedge(C^\alpha) \qquad [26]$$

We then extend the action of the Koszul-Tate differential in the simplest way which preserves all gradings, namely

$$\delta C^\alpha = 0 \qquad [27]$$
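
The nilpotency of the longitudinal derivative on the ghosts can be checked by hand in small cases. The following sketch is a toy illustration under stated assumptions: it keeps only polynomials in the ghosts with constant coefficients, takes the action on a ghost to be $D C^a = \tfrac12 f^a{}_{bc} C^c C^b$ for constant structure constants $f^a{}_{bc}$, and extends $D$ as an odd derivation acting from the left. The tiny Grassmann-algebra representation (a dict from sorted index tuples to coefficients) and all names are hypothetical. With the structure constants of so(3), which satisfy the Jacobi identity, $D^2$ vanishes on every ghost; with an antisymmetric bracket that violates the Jacobi identity it does not.

```python
from fractions import Fraction
from itertools import product

def normal_order(indices):
    """Sort ghost indices; return (sign, sorted tuple). The sign is 0 if an index repeats."""
    idx = list(indices)
    if len(set(idx)) != len(idx):
        return 0, ()
    sign = 1
    for i in range(len(idx)):
        for j in range(len(idx) - 1 - i):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign  # each transposition of odd generators flips the sign
    return sign, tuple(idx)

def add(x, y, scale=1):
    """Return x + scale * y, dropping zero coefficients."""
    out = dict(x)
    for mono, coeff in y.items():
        out[mono] = out.get(mono, 0) + scale * coeff
    return {m: c for m, c in out.items() if c != 0}

def mul(x, y):
    """Product of two elements of the exterior algebra on the ghosts."""
    out = {}
    for (mx, cx), (my, cy) in product(x.items(), y.items()):
        sign, mono = normal_order(mx + my)
        if sign:
            out[mono] = out.get(mono, 0) + sign * cx * cy
    return {m: c for m, c in out.items() if c != 0}

def ghost(a):
    """The generator C^a."""
    return {(a,): 1}

def D_ghost(a, f, dim):
    """D C^a = (1/2) f^a_{bc} C^c C^b, with f[(a, b, c)] = f^a_{bc}."""
    out = {}
    for b in range(dim):
        for c in range(dim):
            coeff = f.get((a, b, c), 0)
            if coeff:
                out = add(out, mul(ghost(c), ghost(b)), Fraction(coeff, 2))
    return out

def D(x, f, dim):
    """Extend D to ghost polynomials by the graded Leibniz rule (odd derivation)."""
    out = {}
    for mono, coeff in x.items():
        for j, a in enumerate(mono):
            term = mul(mul({mono[:j]: 1}, D_ghost(a, f, dim)), {mono[j + 1:]: 1})
            out = add(out, term, coeff * (-1) ** j)
    return out

def antisymmetric(brackets):
    """Build f^a_{bc} from {(b, c): {a: value}}, which lists [e_b, e_c] for b < c."""
    f = {}
    for (b, c), comps in brackets.items():
        for a, val in comps.items():
            f[(a, b, c)] = val
            f[(a, c, b)] = -val
    return f

# so(3): [e0, e1] = e2, [e1, e2] = e0, [e2, e0] = e1 (Jacobi identity holds).
so3 = antisymmetric({(0, 1): {2: 1}, (1, 2): {0: 1}, (0, 2): {1: -1}})
# A bracket violating Jacobi: [e0, e1] = e1, [e0, e2] = e2, [e1, e2] = e0.
not_lie = antisymmetric({(0, 1): {1: 1}, (0, 2): {2: 1}, (1, 2): {0: 1}})

if __name__ == "__main__":
    for a in range(3):
        print("so(3):   D^2 C^%d =" % a, D(D(ghost(a), so3, 3), so3, 3))
    print("not Lie: D^2 C^0 =", D(D(ghost(0), not_lie, 3), not_lie, 3))
```

Functions on the surface are deliberately left out here; including them would require modeling the vector fields as well, which this sketch does not attempt.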
It is clear that the homology of $\delta$ in $C$ is given by

$$H_0(\delta, C) = L, \qquad H_k(\delta, C) = 0 \quad (k > 0) \qquad [28]$$

One can also extend the longitudinal derivative $D$ to the whole complex $C$ because the vector fields $X_\alpha$ are defined throughout $M$ and so, the definitions [15] and [16] make sense in $C$. One defines the action of $D$ on the generators $t$ by requiring that

$$D \delta + \delta D = 0 \qquad [29]$$

This is easily verified to be possible. However, the (odd) derivation so obtained fails to be a differential in $C$ when the vector fields $X_\alpha$ do not close off the surface $\Sigma$. In that case, the gauge transformations are not integrable off $\Sigma$; one says that they form an "open algebra". One has then $D^2 = 0$ only on $\Sigma$, or, more precisely,

$$D^2 = -(\delta s_1 + s_1 \delta) \qquad [30]$$

for some (odd) derivation $s_1$ (that vanishes in the closed algebra case). But this situation is precisely the one discussed earlier, with the Koszul-Tate differential being indeed $\delta$, as anticipated by the notation, and the longitudinal differential $D$ playing the role of $s_0$ (the degrees also match). Applying the theorem discussed there, we can conclude:

Theorem 2 There exists a differential $s$ in $C$,

$$s = \delta + D + s_1 + \cdots, \qquad s^2 = 0 \qquad [31]$$

such that

$$H^0(s, C) = C^\infty(\Sigma/O) \qquad [32]$$

This is an immediate consequence of Theorem 1 and eqns [18] and [28]. The differential $s$ is known in the physical applications described below as the BRST differential.

As a first application of the above setting, we consider the Hamiltonian description of gauge systems. As already known, gauge systems are characterized in the Hamiltonian description by constraints and, for this reason, are called constrained Hamiltonian systems. Furthermore, the gauge transformations generate gauge orbits on the constraint surface and the physical observables are the functions on the quotient space of the constraint surface by the gauge orbits.

A further important feature arises in the Hamiltonian formalism: the gauge transformations are canonical transformations that are generated by the first-class constraints. Assuming that all the second-class constraints have been eliminated and that the bracket being used is the Dirac bracket, one sees that there is a vector field $X_\alpha$ for each constraint function $f_a$, $\alpha \leftrightarrow a$. (The functions $f_a$ are thus assumed to be independent since the vector fields $X_\alpha$ are assumed to be so. If not, further variables are needed, but the analysis proceeds along the same ideas.)

This implies, in turn, that there is a pairing between the ghosts $C^a$ associated with the longitudinal exterior derivative and the generators $t_a$ of the Koszul-Tate complex. This pairing enables one to extend the bracket structure defined on the phase space to the pairs $(C^a, t_a)$ by declaring that these are canonically conjugate. The variables $t_a$ are the momenta conjugate to the ghosts, $[t_a, C^b] = \delta_a{}^b$. Accordingly, the complex $C$ relevant to the Hamiltonian situation,

$$C = C^\infty(P) \otimes \wedge(C^a) \otimes \wedge(t_a) \qquad [33]$$

has a phase-space structure (here, $P \equiv M$ is the manifold obtained after eliminating the second-class constraints, equipped with the Dirac bracket). The space $C$ is known as the extended phase space. The r-degree is called antighost number in the Hamiltonian context.

By the general theorem described in the previous section, one knows that the cohomology at gh = 0 of the BRST differential is isomorphic to the algebra of the observables. Thus, there are two alternative ways to describe this physical algebra, either through reduction, by eliminating the redundant (gauge) variables, or cohomologically in an extended space containing additional variables, the ghosts, and their momenta.

There is an additional interesting feature of the BRST construction in the Hamiltonian case: the BRST transformation is a canonical transformation in the extended phase space, in the sense that

$$s F = [F, \Omega] \qquad [34]$$

for some BRST generator $\Omega$ of ghost number 1 ($F, \Omega \in C$). The nilpotency $s^2 = 0$ of the BRST differential is equivalent to

$$[\Omega, \Omega] = 0 \qquad [35]$$

That $s$ is canonically generated implies that the cohomological BRST groups come with a natural bracket structure: the Poisson bracket of the extended phase space passes on to the BRST cohomological groups. In particular, $H^0(s, C)$, equipped with this bracket structure, is isomorphic (as Poisson algebra) to the algebra of physical observables.
operator approach, which is based on the Hamiltonian formalism.

In the operator approach, all the variables, including the ghosts and the conjugate momenta, are realized as operators in a space endowed with a nonpositive-definite inner product (because of the ghosts and the gauge modes). Real dynamical variables become formally Hermitian operators. Ignoring anomalies, the BRST generator becomes an operator that fulfills the conditions

$$\Omega^\dagger = \Omega, \qquad \Omega^2 = 0 \qquad [41]$$

(which allows for nontrivial solutions $\Omega \ne 0$ because the inner product is not positive definite). The second relation is a consequence of the classical Poisson bracket relation $[\Omega, \Omega] = 0$ and the fact that the graded Poisson bracket of two odd objects becomes the anticommutator.

To remove the ghost and gauge redundancy, which has no physical content, one must impose a condition that selects physical states. The appropriate condition is motivated by the general cohomological result connecting the BRST cohomology with the algebra of physical observables. One imposes the condition

$$\Omega |\psi\rangle = 0 \qquad [42]$$

Because of [41], states of the form $\Omega|\chi\rangle$ are solutions of [42], but they have a vanishing inner product with any other physical states, including themselves. They are called null states. The physical states are given by the BRST state cohomology. The physical operators are given by the BRST operator cohomology at gh = 0 and induce a well-defined action in the state cohomology. In particular, the Hamiltonian, being gauge invariant in the original theory, is represented by a BRST cohomological class, so that the time evolution maps physical states on physical states. The whole scheme is (formally) consistent because exact BRST operators have vanishing matrix elements between states annihilated by the BRST operator $\Omega$, while null states $\Omega|\chi\rangle$ are such that $\langle\psi|A\Omega|\chi\rangle = 0$ whenever $A$ is a BRST-closed operator, $[A, \Omega] = 0$, and $|\psi\rangle$ a physical state. Problems may arise, however, if the classical relations $[\Omega, \Omega] = 0$ and $[H, \Omega] = 0$ are not satisfied in presence of extra terms of order $\hbar$, that is,

$$\Omega^2 \ne 0 \quad \text{or} \quad H\Omega - \Omega H \ne 0 \qquad [43]$$

exhaustive here. Some of its main successes are outlined here, with suggestions for Further reading.

Renormalization of Gauge Theories

First, there is the original context of perturbative renormalization and anomalies for gauge theories of the Yang-Mills type. The relevant cohomology here is the BRST cohomology in the space of local functionals involving the fields, the ghosts, and the antifields. The antifields are also known in this context as Zinn-Justin sources for the BRST variations of the fields and ghosts, since Zinn-Justin was the first to introduce them (with that meaning). Many authors have contributed to the full computation of the local BRST cohomology. A review is given in Barnich et al. (2000), where extensions to other theories are also indicated.

String Theory

Modern string theory would be inconceivable without the BRST formalism. This started with the pioneering paper by Kato and Ogawa (1983), where the critical dimension of the bosonic string was derived from the condition that $\Omega^2$ should vanish (quantum mechanically), and where it was shown that the string physical states could be identified with the state BRST cohomology. The reader is referred to excellent monographs on modern string theory (see Further reading).

Deformations of Gauge Models

The study of consistent deformations of a given gauge theory (i.e., the problem of introducing consistent couplings) is also efficiently dealt with in the BRST context. References to applications may be found in Henneaux (1998).

See also: Anomalies; Batalin-Vilkovisky Quantization; BF Theories; Constrained Systems; Functional Integration in Quantum Physics; Graded Poisson Algebras; Indefinite Metric; Perturbative Renormalization Theory and BRST; Quantum Chromodynamics; Quantum Field Theory: A Brief Introduction; Renormalization: General Theory; String Field Theory; Supermanifolds; Topological Sigma Models.
Further Reading

Batalin IA and Vilkovisky GA (1977) Relativistic S-matrix of dynamical systems with boson and fermion constraints. Physics Letters B69: 309.
Batalin IA and Vilkovisky GA (1981) Gauge algebra and quantization. Physics Letters B102: 27.
Becchi C, Rouet A, and Stora R (1976) Renormalization of gauge theories. Annals of Physics, NY 98: 287.
Fradkin ES and Vilkovisky GA (1975) Quantization of relativistic systems with constraints. Physics Letters B55: 224.
Green MB, Schwarz JH, and Witten E (1987) Superstring Theory, vols. 1 and 2. Cambridge: Cambridge University Press.
Henneaux M (1998) Consistent interactions between gauge fields: the cohomological approach. Contemporary Mathematics 219: 93.
Henneaux M and Teitelboim C (1992) Quantization of Gauge Systems. Princeton: Princeton University Press.
Kato M and Ogawa K (1983) Covariant quantization of strings based on BRS invariance. Nuclear Physics B212: 443.
Kugo T and Ojima I (1979) Local covariant operator formalism of nonabelian gauge theories and quark confinement problem. Progress of Theoretical Physics (Suppl.) 66: 1.
Polchinski J (1998) String Theory, vols. 1 and 2. Cambridge: Cambridge University Press.
Stasheff JD (1998) The (secret?) homological algebra of the Batalin-Vilkovisky approach. Contemporary Mathematics 219: 195.
Tyutin IV (1975) Gauge invariance in field theory and statistical physics in the operator formalism. Preprint Lebedev-75-39.
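
The operator conditions of the BRST article (a Hermitian, nilpotent generator; physical states annihilated by it; null states of the form "generator applied to something") can be made concrete in a finite-dimensional toy model. The sketch below is an assumption-laden illustration, not a model of any gauge theory: it uses a three-dimensional real space with the indefinite inner product $\langle u, v\rangle = u^T \eta v$, and the matrices ETA and OMEGA are hypothetical choices made only so that every property can be verified by inspection.

```python
from fractions import Fraction

# Indefinite inner product <u, v> = u^T ETA v (signature +, -, + after diagonalization).
ETA = [[0, 1, 0],
       [1, 0, 0],
       [0, 0, 1]]
# Toy BRST operator: maps e1 -> e0 and annihilates e0 and e2.
OMEGA = [[0, 1, 0],
         [0, 0, 0],
         [0, 0, 0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def apply(A, v):
    return [sum(A[i][k] * v[k] for k in range(len(v))) for i in range(len(A))]

def inner(u, v):
    w = apply(ETA, v)
    return sum(ui * wi for ui, wi in zip(u, w))

def adjoint(A):
    """Adjoint with respect to ETA for real matrices: ETA^{-1} A^T ETA (here ETA is its own inverse)."""
    return matmul(matmul(ETA, transpose(A)), ETA)

def rank(A):
    """Rank by exact Gauss-Jordan elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in A]
    r = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(len(M)):
            if i != r and M[i][col] != 0:
                factor = M[i][col] / M[r][col]
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

E0, E1, E2 = [1, 0, 0], [0, 1, 0], [0, 0, 1]
NULL_STATE = apply(OMEGA, E1)           # a state of the form OMEGA|chi>
DIM_KERNEL = len(OMEGA) - rank(OMEGA)   # number of independent physical states
DIM_COHOMOLOGY = DIM_KERNEL - rank(OMEGA)

if __name__ == "__main__":
    print("Hermitian:", adjoint(OMEGA) == OMEGA)
    print("nilpotent:", matmul(OMEGA, OMEGA) == [[0] * 3 for _ in range(3)])
    print("null state:", NULL_STATE, "norm:", inner(NULL_STATE, NULL_STATE))
    print("dim ker:", DIM_KERNEL, "dim cohomology:", DIM_COHOMOLOGY)
```

In this toy model the kernel is spanned by e0 and e2, the null state e0 has vanishing inner product with both, and the single cohomology class is represented by e2, which has positive norm. The vector e1 pairs nontrivially with the null state but is not annihilated by OMEGA, so it is not a physical state.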
C
C*-Algebras and their Classification
G A Elliott, University of Toronto, Toronto, Canada
© 2006 Elsevier Ltd. All rights reserved.

The study of algebras of Hilbert space operators, closed under the adjoint operation and in the weak operator topology, was begun by John von Neumann shortly after the discovery of quantum mechanics, and partly with the aim of understanding the monolithic ideas proposed by Heisenberg and Schrödinger.

Seventy-five years later, the theory of these algebras has become a monolith in its own right (see von Neumann Algebras: Introduction, Modular Theory and Classification Theory; von Neumann Algebras: Subfactor Theory), with more internal structure and with more external reference to physics and, as it turns out, to other areas of mathematics than could possibly have been imagined at the outset. (The most striking example of an application to mathematics is perhaps the discovery of the Jones knot polynomial (see The Jones Polynomial); note that this has also had repercussions for physics.)

Twenty-five years after the beginning of the theory of von Neumann algebras, as these algebras are now called, Gelfand and Naimark noticed that a second class of algebras of operators on a Hilbert space, closed under the adjoint operation, was worthy of study, namely those closed in the norm topology. Gelfand and Naimark made two important discoveries concerning this class of operator algebras, now called C*-algebras.

First, Gelfand and Naimark showed that, in the commutative case, at least when the C*-algebra is considered only up to isomorphism (with its identity as a concrete algebra of operators suppressed), the information contained in a C*-algebra is purely topological. More precisely, Gelfand and Naimark showed that the category of unital commutative C*-algebras, with unit-preserving algebra homomorphisms (these necessarily preserve the adjoint operation), is equivalent in a contravariant way (i.e., with reversal of arrows) to the category of compact Hausdorff spaces, with continuous maps. The compact space associated with a unital commutative C*-algebra under the Gelfand–Naimark correspondence may be viewed as the space of maximal proper ideals, with a natural topology (the hull-kernel, or Jacobson, topology), and is called the spectrum. This space may also be viewed as the set of (unital, linear, multiplicative) maps from the algebra into the complex numbers, in which case the topology is that of pointwise convergence.

Second, using this result, Gelfand and Naimark proved that arbitrary C*-algebras could be axiomatized in a simple way abstractly, as *-algebras (that is, as algebras over the complex numbers with a conjugate linear anti-automorphism of order 2) with certain special properties. It is now known that the only property that needs to be assumed is the existence of a (necessarily unique) Banach space norm related to the *-algebra structure by means of the so-called C*-algebra identity:

$$\|x^* x\| = \|x^*\|\,\|x\| \tag{1}$$

This is clearly related to and in fact implies the normed algebra inequality

$$\|xy\| \le \|x\|\,\|y\| \tag{2}$$

One reason that the Gelfand–Naimark axiomatization of C*-algebras is important is that it underlines how natural it is to consider a C*-algebra abstractly, i.e., independently of any particular representation. Indeed, while one of the fundamental phenomena of von Neumann algebra theory (discovered by Murray and von Neumann) is that, essentially (in rather a strong sense), there is only one way to represent a given von Neumann algebra on a Hilbert space (and there is even a canonical way, called the standard representation!), it is an equally fundamental phenomenon of C*-algebra theory that, except in extremely special cases, this is no longer true.

For instance, although the C*-algebra of compact operators on a given Hilbert space has, up to unitary equivalence, only a single irreducible representation, which is what underlies the fact, proved by von Neumann, referred to as the uniqueness of the
Heisenberg commutation relations for a quantum-mechanical system with finitely many degrees of freedom, as soon as one considers a physical system with infinitely many degrees of freedom, one finds that the naturally associated C*-algebra has infinitely many (indeed, uncountably many) unitary equivalence classes of irreducible representations, and it is impossible to parametrize these in any reasonable way.

This striking dichotomy presents itself also in other contexts, more elementary perhaps than the physics of infinitely many degrees of freedom. Consider the dynamical system consisting of a circle and a fixed rotation acting on it. If the rotation is of finite order (i.e., if the angle is a rational multiple of 2π) then the naturally associated C*-algebra is relatively easy to study. In the case of angle zero, it is the unital commutative C*-algebra with Gelfand–Naimark spectrum the torus. In the general case of a rational angle, the space of unitary equivalence classes of irreducible representations is still naturally parametrized by the torus. (And this is the same as the space of primitive ideals, the kernels of the irreducible representations, with the Jacobson topology.)

In the irrational case, the case of a rotation by an irrational multiple of 2π (still elementary from a geometrical point of view; note that the calendar is based on such a system!), the irreducible representations are no longer parametrized up to unitary equivalence by the torus and the space of primitive ideals consists of a single point: the C*-algebra is simple. (But it is decidedly not simple to study!)

This fundamental dichotomy in the classification of C*-algebras, conjectured by Gaarding and Wightman in the quantum-mechanical setting and by Mackey in the geometrical one, was established by Glimm. Glimm proved (in the setting of separability; most of his results were generalized later to the nonseparable case) that a large number of a priori different ways that a C*-algebra could behave well were in fact one and the same behavior: either all present for a given C*-algebra, or all catastrophically absent!

Some of the properties considered by Glimm, and shown to be equivalent (for a separable C*-algebra) were as follows. First of all, every representation of the C*-algebra on a Hilbert space should be of type I, i.e., should generate a von Neumann algebra of type I. (A von Neumann algebra was said by Murray and von Neumann to be of type I if it contained a minimal projection of central support one, i.e., a projection not contained in a proper direct summand and minimal with this property.) Second, in every irreducible representation (not necessarily injective) on a Hilbert space, the image of the C*-algebra should contain the compact operators. Third, any two irreducible representations with the same kernel should be unitarily equivalent. Fourth, it should be possible to parametrize the unitary equivalence classes of irreducible representations by a real number in a natural way (respecting the natural Borel structure introduced by Mackey).

The first of the equivalent properties listed above, that all representations of a C*-algebra should be of type I, suggested a name for the property: that the C*-algebra itself should be of type I. This property of a C*-algebra, identified by Glimm (or, rather, its opposite, which as mentioned above is much more common (just as irrational numbers are more common than rationals, or systems with infinitely many degrees of freedom are, at least in theory, much more common than those with finitely many degrees of freedom)), is a fundamental unifying principle of nature.

Besides commutative C*-algebras (as mentioned above, just another way of looking at topological spaces (compact Hausdorff spaces, that is)) and besides the C*-algebra associated to a rotation or to a physical system with infinitely many degrees of freedom, what are some of the naturally occurring examples of C*-algebras (of type I or not!)?

First, let us take a closer look at what arises from a system with infinitely many degrees of freedom, in the fermion case. As shown by Jordan and Wigner, one obtains what, as a C*-algebra, is very easy to describe, namely, just the infinite tensor product (in the category of unital C*-algebras) of copies of the algebra of 2 × 2 matrices over the complex numbers. As it happens, in work earlier than that referred to above, Glimm had considered such infinite tensor product C*-algebras, also allowing the components to be matrix algebras of order different from two. This raised a problem of classification for those C*-algebras, all of which were simple and not of type I. (The only simple unital C*-algebra of type I is a single matrix algebra, or a finite tensor product of matrix algebras!)

In a pioneering classification paper (the first paper on the classification of C*-algebras being perhaps that of Gelfand and Naimark, in which the commutative case was described), Glimm obtained the classification of infinite tensor products of matrix algebras, showing that it was a direct extension of the classification of finite tensor products, i.e., just of the matrix algebras themselves. As described later by Dixmier, Glimm's classification was as follows.

Given a sequence $n_1, n_2, \ldots$ of natural numbers (equal to one or more), form the infinite product in a natural way, just by keeping track of the total number of times each prime number appears in the
finite products $n_1 \cdots n_k$ (a multiplicity which may be either finite or infinite). Call such a formal infinite product a generalized integer or, perhaps, a supernatural number! Two (countably) infinite tensor products of matrix algebras are isomorphic (just as in the finite tensor product case) if and only if the corresponding supernatural numbers are equal.

In formulating Glimm's classification of infinite tensor products of matrix algebras in this way, Dixmier pointed out that each supernatural number determines a subgroup of the rational numbers (those with denominator dividing the supernatural number) and that every subgroup of the rational numbers containing the integers arises in this way. He then gave an alternative derivation of Glimm's theorem by recovering this subgroup of the rational numbers as a natural invariant of the algebra, namely, as the subgroup generated by the values on projections of the unique normalized trace. (By a trace is meant here a unitarily invariant positive linear functional.) This could even be interpreted as an alternative statement of Glimm's theorem.

Soon afterwards, Bratteli considered an extension of Glimm's class of C*-algebras, namely, the inductive limits of arbitrary sequences of finite-dimensional C*-algebras, and gave a classification of these algebras in terms of the embedding multiplicity data in the sequences. This was exactly analogous to the original classification of Glimm, but now vastly more complex, with the multiplicity data of the sequence encoded in what is now called a Bratteli diagram. (Note that a finite-dimensional C*-algebra is just a direct sum of matrix algebras over the complex numbers.) Bratteli diagrams have proved to be very important, and in particular have been shown by Putnam and others to be useful for the study of minimal homeomorphisms of the Cantor set.

Bratteli's extension of Glimm's tensor product classification was followed by a corresponding extension by the present author of Dixmier's approach to Glimm's result. It was no longer possible to express the appropriate data in terms of traces (even in the case of a unique normalized trace). Instead, the present author recalled the concept of equivalence of projections introduced by Murray and von Neumann forty years earlier, together with the fact, proved by Murray and von Neumann, that equivalence is compatible with addition of orthogonal projections. (Two projections in a *-algebra are equivalent if they are equal to $x^* x$ and $x x^*$ for some element $x$.) The resulting elementary invariant, the set of equivalence classes of projections with the operation of addition whenever defined (whenever the equivalence classes to be added have orthogonal representatives), which one might refer to as a local abelian semigroup, and which was used by Murray and von Neumann to divide von Neumann algebras into what they called types I, II, and III, was shown by the author to determine Bratteli's algebras up to isomorphism.

Bratteli called his algebras approximately finite-dimensional C*-algebras, or AF algebras. The author referred to his invariant simply as the range of the (abstract) dimension, and pointed out that this structure determined an enveloping ordered abelian group, which he called the dimension group. It was soon noticed that the dimension group was related to the K-group introduced by Grothendieck in algebraic geometry (see K-Theory), and by Atiyah and Hirzebruch (see K-Theory) in topology.

Grothendieck's K-group was defined for an arbitrary ring with unit, and Atiyah and Hirzebruch in effect considered the special case of the ring of continuous functions on a compact Hausdorff space (in other words, a commutative C*-algebra), in the process showing that the deep phenomenon of Bott periodicity could be expressed in terms of this invariant. The invariant itself (see below) is essentially the same as that of Murray and von Neumann. In the special case that the ring is an AF algebra, the K-group coincides with the dimension group. (The K-group has a natural ordered, or pre-ordered, structure, although this was often suppressed.)

Let us consider the definition of the K-group of a not necessarily unital C*-algebra; it is in this setting that the statement of Bott periodicity attains its simplest form.

First, in the unital case, one constructs the abelian local semigroup (addition just partially defined) of Murray–von Neumann equivalence classes of projections, as described above in the case of an AF algebra. Let us call this the dimension range. As stated above, for AF algebras this is all that needs to be done: the enveloping group of the dimension range is already the K-group. In the general case, one must repeat the construction for the algebra of 2 × 2 matrices over the given algebra, with the given algebra considered as embedded as the upper left-hand corner of the matrix algebra. The dimension range of the given algebra then maps naturally into (but not necessarily onto) the dimension range of the matrix algebra. One should then repeat this construction, doubling the order of the matrix algebra at every stage (or, alternatively, increasing it just by one). The enveloping group of the (algebraic) inductive limit of this sequence of local semigroups is then the K-group of the given algebra. (Alternatively, one may just consider immediately the *-algebra of all infinite matrices over the given
C*-algebra with only finitely many nonzero entries, and form the dimension range of this *-algebra and the enveloping group of this abelian local semigroup, now in fact a semigroup.)

In the case of a nonunital C*-algebra, one adjoins a unit (as may be done, for instance, by representing the C*-algebra faithfully on a Hilbert space, and showing that the C*-algebra obtained by adjoining the identity operator is independent of the representation; actually, one need only check that the *-algebra structure is unique, as the C*-algebra norm on a C*-algebra is always determined by the *-algebra structure). The K-group of the resulting unital C*-algebra then maps naturally into the K-group of the natural one-dimensional quotient, and the kernel of this map is, for reasons that will become clearer later, defined to be the K-group of the nonunital algebra.

Atiyah and Hirzebruch in fact referred to the K-group of the C*-algebra as $K_0$, the reason being that there is another very natural group to consider, namely, the K-group of the suspension of the C*-algebra. (The suspension, $SA$, of a C*-algebra $A$ is defined as the C*-algebra of all continuous functions from the real line $\mathbb{R}$ into $A$ which converge to zero at $\pm\infty$, with the pointwise *-algebra operations and the supremum norm. It may also be defined as the (unique) C*-algebra tensor product $A \otimes C_0(\mathbb{R})$, where $C_0(\mathbb{R})$ denotes the suspension of the C*-algebra $\mathbb{C}$ of complex numbers.) Denoting the $K_0$-group of the suspension of a given C*-algebra by $K_1$, one might expect this process to continue, but in fact it is periodic ($K_0, K_1, K_0, K_1, \ldots$). Bott periodicity states that there is a natural isomorphism of $K_2$ with $K_0$. (C*-algebras can also be defined with the field of real numbers as scalars, and in this case the period of Bott periodicity is eight.)

Another way of stating Bott periodicity, or, more precisely, of embedding it into the K-theory of C*-algebras, is as follows. Given a short exact sequence of C*-algebras,

$$0 \to J \to A \to A/J \to 0 \tag{3}$$

i.e., given a C*-algebra $A$ and a closed two-sided ideal $J$ (the quotient *-algebra is then a C*-algebra with the quotient norm; $A$ is sometimes referred to as an extension of $J$ by $A/J$), consider the natural short (not necessarily exact) sequences

$$K_0(J) \to K_0(A) \to K_0(A/J) \tag{4}$$

and

$$K_1(J) \to K_1(A) \to K_1(A/J) \tag{5}$$

($K_0$ and $K_1$ are functors!). There exist natural connecting maps $K_1(A/J) \to K_0(J)$ and $K_0(A/J) \to K_1(J)$, the first referred to as the index map, and the second (sometimes referred to as the odd-order index map) obtained from this immediately from Bott periodicity (as stated above), such that the periodic six-term sequence

$$\begin{array}{ccccc}
K_0(J) & \to & K_0(A) & \to & K_0(A/J) \\
\uparrow & & & & \downarrow \\
K_1(A/J) & \leftarrow & K_1(A) & \leftarrow & K_1(J)
\end{array}$$

is exact. (The periodicity stated above can also be recovered from this.)

Given that the functor $K_0$ classifies AF algebras, one might expect the functor $K_1$ to be useful for classification purposes also. In fact, this is the case. (Indeed, as shown by Brown, the $K_1$-functor is already important for the theory of AF algebras, in spite of, or even because of (!), the fact that the $K_1$-group of an AF algebra is zero.) Using the six-term exact sequence of Bott periodicity described above, corresponding to an extension of C*-algebras, together with results of the present author, Brown showed that any extension of one AF algebra by another is again an AF algebra.

A rather large class of simple unital C*-algebras has by now been classified by means of the invariants $K_0$ and $K_1$, together with the class of the unit in $K_0$, and the order (or pre-order) structure on $K_0$, and also taking into account the compact convex set of tracial states on the C*-algebra (a positive linear functional on a C*-algebra is called a trace if it has the same value on $x^* x$ and $x x^*$ for every element $x$, and a tracial state if it is a state, that is, has norm 1, or has value 1 on the unit in the case the algebra has a unit). In addition to the set of tracial states, together with its natural topology and convex structure, one should also keep track of the natural pairing between traces and $K_0$ (any trace on a unital C*-algebra has the same value on two equivalent projections, equal to $x^* x$ and $x x^*$ for some element $x$, and hence gives rise to an additive real-valued functional on $K_0$).

In terms of these invariants (which might, broadly speaking, be called K-theoretical), it has been possible to classify the simple unital C*-algebras (not of type I) arising as inductive limits (i.e., as the completions of increasing unions) of sequences of finite direct sums of matrix algebras over separable commutative C*-algebras, these assumed to have spectra of dimension at most three, on the one hand (work of the present author together with Guihua Gong and Liangqing Li, a culmination of earlier work of these authors together with a number of others), and, on the other hand, it has been possible (work of Kirchberg and Phillips, also based on the earlier work by a number of authors) to classify the
C*-algebra tensor products (in a natural sense) of these C*-algebras with what is called the Cuntz C*-algebra $\mathcal{O}_\infty$ (see below). In the first of these two cases, the compact convex set of tracial states (always a Choquet simplex) is an arbitrary (metrizable) such space.

In the second case, this space is empty (as it is for $\mathcal{O}_\infty$ in particular). In both cases, $K_0$ and $K_1$ are arbitrary countable abelian groups, with the proviso that $K_0$ is not the sum of a torsion group and a cyclic group. In the first case, the order structure on $K_0$, the class of the unit element, and the pairing of $K_0$ with the space of traces have certain special properties; as it turns out, these can be expressed in a simple way. (The class of the unit need only be positive and nonzero.) In the second case, the order structure on $K_0$ is degenerate (every element is positive) and the class of the unit can be arbitrary (including zero!).

Let us just note that the Cuntz C*-algebra $\mathcal{O}_\infty$ is the unital C*-algebra generated by an infinite sequence $s_1, s_2, \ldots$ of isometries with orthogonal ranges (in other words, elements $s_i$ such that $s_i^* s_i$ is the unit and $s_j^* s_i = 0$ if $j \ne i$). One need not require the C*-algebra to have the universal property with respect to these generators and relations as it is in fact unique (up to an isomorphism preserving these generators). In particular, this C*-algebra is simple. (If one considers a finite sequence of isometries with orthogonal ranges, and assumes in addition that the sum of these is the unit, one also obtains a simple C*-algebra, the Cuntz C*-algebra $\mathcal{O}_n$, $n = 2, 3, \ldots$.) The $K_0$-group and $K_1$-group of $\mathcal{O}_\infty$ are, respectively, $\mathbb{Z}$ and 0. (The $K_0$-group and $K_1$-group of $\mathcal{O}_n$ for $n = 2, 3, \ldots$ are, respectively, $\mathbb{Z}/(n-1)\mathbb{Z}$ and 0.)

Both classes of C*-algebras considered in the classification result stated above, although described in rather a concrete way (in terms of inductive limits and tensor products), can also be characterized axiomatically, in a way that makes it clear that they are, in fact, much more general than they seem. (These axiomatizations are due to Lin and to Kirchberg and Phillips. Typically, the abstract axioms are easier to establish in a given case than the inductive limit form described above.)

In view of this, and the fact that one of the axioms is a notion of amenability (the analogous property for C*-algebras of a notion that has also been considered for von Neumann algebras) and since amenable von Neumann algebras (on a separable Hilbert space) have been classified completely (in remarkable work of Connes, together with many others, starting with Murray and von Neumann and, one must also mention, ending with Haagerup, who settled a particularly stubborn case), it is natural to ask whether the K-theoretical invariants described above might be sufficient to classify all amenable separable C*-algebras, say, those which are simple and unital.

The work of Villadsen has shown that additional invariants must in fact be considered, if one is to deal with arbitrary amenable simple C*-algebras, and this has been confirmed in subsequent work of Rørdam and of Toms. (Villadsen's examples were obtained by removing the condition of low dimension on the spectra of the commutative C*-algebras appearing in the inductive limit decomposition considered above.) The very nature of these authors' work, however, has been to introduce additional invariants, all of which it seems natural to consider as, broadly speaking, K-theoretical. (And all of which, as it happens, are already familiar.)

The question of the classifiability, in terms of simple invariants (K-theoretical in nature, at least in the broad sense, and including the spectrum, which is indispensable in the nonsimple case), of all (separable) amenable C*-algebras would therefore still appear to be on the agenda.

Already, in any case, just like the analogous question for von Neumann algebras (now settled), this question would appear to have had a noticeable influence on the development of the subject, not least in underlining the importance of K-theoretical methods, which have proved to be pertinent both in connection with the index theory of differential operators on geometrical structures (from foliations to fractals) and in connection with questions in physics, related to quantum statistical mechanics (see e.g., Quantum Hall Effect), to quantum field theory (e.g., the standard model), and even to string theory and M-theory.

See also: Axiomatic Quantum Field Theory; Bosons and Fermions in External Fields; The Jones Polynomial; K-Theory; Positive Maps on C*-Algebras; Quantum Hall Effect; von Neumann Algebras: Introduction, Modular Theory, and Classification Theory; von Neumann Algebras: Subfactor Theory.

Further Reading

Davidson KR (1996) C*-Algebras by Example. Fields Institute Monographs, 6. Providence, RI: American Mathematical Society.
Dixmier J (1969) Les C*-Algèbres et leurs Représentations, 2nd edn. Paris: Gauthier-Villars.
Elliott GA (1995) The classification problem for amenable C*-algebras. In: Chatterji SD (ed.) Proceedings of the International Congress of Mathematicians, vols. 1, 2, pp. 922–932. (Zürich, 1994). Basel: Birkhäuser.
398 Calibrated Geometry and Special Lagrangian Submanifolds
Evans DE and Kawahigashi Y (1998) Quantum Symmetries on Operator Algebras. Oxford: Oxford University Press.
Fillmore PA (1996) A User's Guide to Operator Algebras. New York: Wiley.
Kadison RV and Ringrose J (1983–92) Fundamentals of the Theory of Operator Algebras (4 volumes). New York: Academic Press.
Lin H (2001) An Introduction to the Classification of Amenable C*-Algebras. Singapore: World Scientific.
Pedersen GK (1979) C*-Algebras and their Automorphism Groups, London Math. Soc. Monographs. London: Academic Press.
Rørdam M (2002) Classification of Nuclear, Simple C*-Algebras, Encyclopaedia of Mathematical Sciences, vol. 126, pp. 1–145. Berlin: Springer.
Sakai S (1971) C*-Algebras and W*-Algebras. Berlin: Springer.
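
As a concrete illustration of Glimm's classification by supernatural numbers, and of Dixmier's associated subgroup of the rationals, both described in the article above, the following Python sketch may be helpful. It is only a sketch under stated assumptions: an infinite sequence of matrix orders is encoded as a finite initial list followed by a finite list of orders that repeat forever, a supernatural number is encoded as a dictionary from primes to exponents (possibly infinite), and all function names are hypothetical, chosen here for illustration.

```python
import math
from collections import Counter
from fractions import Fraction


def prime_factors(n):
    """Return a Counter mapping each prime divisor of n to its multiplicity."""
    factors = Counter()
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] += 1
            n //= p
        p += 1
    if n > 1:
        factors[n] += 1
    return factors


def supernatural(finite_part, repeated_part=()):
    """Supernatural number of the sequence: finite_part, then repeated_part forever.

    Assumed encoding: a dict {prime: exponent}, exponent a positive int or math.inf.
    """
    exponents = Counter()
    for n in finite_part:
        exponents.update(prime_factors(n))  # add multiplicities prime by prime
    result = dict(exponents)
    for n in repeated_part:
        for p in prime_factors(n):
            result[p] = math.inf  # this prime occurs infinitely often
    return result


def isomorphic(seq_a, seq_b):
    """Glimm's criterion: the tensor products agree iff the supernatural numbers do."""
    return supernatural(*seq_a) == supernatural(*seq_b)


def in_dixmier_subgroup(q, sn):
    """True if the denominator of the rational q divides the supernatural number sn."""
    denominator = Fraction(q).denominator
    return all(sn.get(p, 0) >= e for p, e in prime_factors(denominator).items())


if __name__ == "__main__":
    fermion = ([], [2])  # 2 x 2 matrices repeated forever
    print(supernatural(*fermion))
    print(isomorphic(fermion, ([], [4])))
    print(isomorphic(fermion, ([], [3])))
    print(in_dixmier_subgroup(Fraction(5, 8), supernatural(*fermion)))
```

In this encoding the infinite tensor product of 2 × 2 matrices and that of 4 × 4 matrices receive the same supernatural number (the prime 2 with infinite multiplicity), while the tensor product of 3 × 3 matrices receives a different one; the associated subgroup of the rationals for the first consists of the fractions whose denominator is a power of 2.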
is reasonably large, say, if $\mathcal{F}_\varphi$ has small codimension in the family of all tangent $k$-planes $V$ on $M$. A maximally boring example is the $k$-form $\varphi = 0$, which is a calibration but has no calibrated tangent $k$-planes, so no $\varphi$-submanifolds.

Thus, most calibrations will have few or no $\varphi$-submanifolds, and only special calibrations with $\mathcal{F}_\varphi$ large will have interesting calibrated geometries. Now the field of Riemannian holonomy groups is a natural companion for calibrated geometry, because it gives a simple way to generate interesting calibrations which automatically have $\mathcal{F}_\varphi$ large.

Let $G \subset O(n)$ be a possible holonomy group of a Riemannian metric. In particular, we can take $G$ to be one of the holonomy groups $U(m)$, $SU(m)$, $Sp(m)$, $G_2$, or $Spin(7)$ from Berger's classification. Then $G$ acts on the $k$-forms $\Lambda^k(\mathbb{R}^n)^*$ on $\mathbb{R}^n$, so we can look for $G$-invariant $k$-forms on $\mathbb{R}^n$. Suppose $\varphi_0$ is a nonzero, $G$-invariant $k$-form on $\mathbb{R}^n$.

By rescaling $\varphi_0$ we can arrange that for each oriented $k$-plane $U \subset \mathbb{R}^n$, we have $\varphi_0|_U \le \mathrm{vol}_U$, and that $\varphi_0|_U = \mathrm{vol}_U$ for at least one such $U$. Let $H$ be the stabilizer subgroup of this $U$ in $G$. Then $\varphi_0|_{\gamma \cdot U} = \mathrm{vol}_{\gamma \cdot U}$ by $G$-invariance, so $\gamma \cdot U$ is a calibrated $k$-plane for all $\gamma \in G$. Thus, the family $\mathcal{F}_{\varphi_0}$ of $\varphi_0$-calibrated $k$-planes in $\mathbb{R}^n$ contains $G/H$, so it is reasonably large, and it is likely that the calibrated submanifolds will have an interesting geometry.

Now let $M$ be a manifold of dimension $n$, and $g$ a metric on $M$ with Levi-Civita connection $\nabla$ and holonomy group $G$. Then there is a $k$-form $\varphi$ on $M$ with $\nabla\varphi = 0$, corresponding to $\varphi_0$. Hence $d\varphi = 0$, and $\varphi$ is closed. Also, the condition $\varphi_0|_U \le \mathrm{vol}_U$ for all oriented $k$-planes $U$ in $\mathbb{R}^n$ implies that $\varphi|_V \le \mathrm{vol}_V$ for all oriented tangent $k$-planes $V$ in $M$. Thus, $\varphi$ is a calibration on $M$. The family $\mathcal{F}_\varphi$ of calibrated tangent $k$-planes on $M$ fibers over $M$ with fiber $\mathcal{F}_{\varphi_0}$; so, it is reasonably large.

This gives a general method for finding interesting calibrations on manifolds with reduced holonomy. Here are the most significant examples.

Let $G = U(m) \subset O(2m)$. Then $G$ preserves a 2-form $\omega_0$ on $\mathbb{R}^{2m}$. If $g$ is a metric on $M$ with holonomy $U(m)$, then $g$ is Kähler with complex structure $J$, and the 2-form $\omega$ on $M$ associated to $\omega_0$ is the Kähler form of $g$. One can show that $\omega$ is a calibration on $(M, g)$, and the calibrated submanifolds are exactly the holomorphic curves in $(M, J)$. More generally, $\omega^k/k!$ is a calibration on $M$ for $1 \le k \le m$, and the corresponding calibrated submanifolds are the complex $k$-dimensional submanifolds of $(M, J)$.

Let $G = SU(m) \subset O(2m)$. Then $G$ preserves a complex volume form $\Omega_0 = dz_1 \wedge \cdots \wedge dz_m$ on $\mathbb{C}^m$. Thus, a Calabi–Yau $m$-fold $(M, g)$ with $\mathrm{Hol}(g) = SU(m)$ has a holomorphic volume form $\Omega$. The real part $\mathrm{Re}\,\Omega$ is a calibration on $M$, and the corresponding calibrated submanifolds are called special Lagrangian submanifolds.

The group $G_2 \subset O(7)$ preserves a 3-form $\varphi_0$ and a 4-form $*\varphi_0$ on $\mathbb{R}^7$. Thus, a Riemannian 7-manifold $(M, g)$ with holonomy $G_2$ comes with a 3-form $\varphi$ and 4-form $*\varphi$, which are both calibrations. The corresponding calibrated submanifolds are called associative 3-folds and coassociative 4-folds.

The group $Spin(7) \subset O(8)$ preserves a 4-form $\Omega_0$ on $\mathbb{R}^8$. Thus a Riemannian 8-manifold $(M, g)$ with holonomy $Spin(7)$ has a 4-form $\Omega$, which is a calibration. The $\Omega$-submanifolds are called Cayley 4-folds.

It is an important general principle that to each calibration $\varphi$ on an $n$-manifold $(M, g)$ with special holonomy constructed in this way, there corresponds a constant calibration $\varphi_0$ on $\mathbb{R}^n$. Locally, $\varphi$-submanifolds in $M$ resemble the $\varphi_0$-submanifolds in $\mathbb{R}^n$, and have many of the same properties. Thus, to understand the calibrated submanifolds in a manifold with special holonomy, it is often a good idea to start by studying the corresponding calibrated submanifolds of $\mathbb{R}^n$.

In particular, singularities of $\varphi$-submanifolds in $M$ will be locally modeled on singularities of $\varphi_0$-submanifolds in $\mathbb{R}^n$. (In the sense of geometric measure theory, the tangent cone at a singular point of a $\varphi$-submanifold in $M$ is a conical $\varphi_0$-submanifold in $\mathbb{R}^n$.) So by studying singular $\varphi_0$-submanifolds in $\mathbb{R}^n$, we may understand the singular behavior of $\varphi$-submanifolds in $M$.

Special Lagrangian Geometry

We now focus on one class of calibrated submanifolds, special Lagrangian submanifolds in Calabi–Yau manifolds. Calabi–Yau 3-folds are used to make the spacetime vacuum in string theory, and special Lagrangian 3-folds are the classical versions of A-branes, or supersymmetric 3-cycles, in Calabi–Yau 3-folds. Special Lagrangian geometry aroused great interest amongst string theorists because of its role in the SYZ conjecture, providing a geometric basis for mirror symmetry of Calabi–Yau 3-folds.

Calabi–Yau Manifolds

Here is our definition of Calabi–Yau manifold. Readers are warned that there are several different definitions of Calabi–Yau manifolds in use in the literature. Ours is unusual in regarding $\Omega$ as part of the given structure.
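
Returning briefly to the first example above, the calibration inequality $\omega_0|_U \le \mathrm{vol}_U$ for the standard Kähler form can be checked numerically in the smallest interesting case, $\mathbb{R}^4 = \mathbb{C}^2$. The following Python sketch is an illustration only: the coordinate ordering $(x_1, y_1, x_2, y_2)$, the sign convention $\omega_0 = dx_1 \wedge dy_1 + dx_2 \wedge dy_2$, and all function names are assumptions made here. For an oriented 2-plane with oriented orthonormal basis $(e_1, e_2)$, the ratio $\omega_0|_U / \mathrm{vol}_U$ is just $\omega_0(e_1, e_2)$.

```python
import math
import random


def omega0(u, v):
    """Standard Kahler form dx1^dy1 + dx2^dy2 on R^4 = C^2, evaluated on (u, v)."""
    return u[0] * v[1] - u[1] * v[0] + u[2] * v[3] - u[3] * v[2]


def J(u):
    """Complex structure on R^4 = C^2 (multiplication by i in each factor)."""
    return (-u[1], u[0], -u[3], u[2])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_pair(u, v):
    """Gram-Schmidt: oriented orthonormal basis of the plane spanned by (u, v)."""
    norm_u = math.sqrt(dot(u, u))
    e1 = tuple(a / norm_u for a in u)
    c = dot(v, e1)
    w = tuple(b - c * a for a, b in zip(e1, v))
    norm_w = math.sqrt(dot(w, w))
    e2 = tuple(a / norm_w for a in w)
    return e1, e2


def calibration_ratio(u, v):
    """Value of omega0 restricted to span(u, v), divided by the volume form."""
    e1, e2 = orthonormal_pair(u, v)
    return omega0(e1, e2)


if __name__ == "__main__":
    rng = random.Random(0)
    ratios = []
    for _ in range(1000):
        u = [rng.gauss(0, 1) for _ in range(4)]
        v = [rng.gauss(0, 1) for _ in range(4)]
        ratios.append(calibration_ratio(u, v))
    print("largest ratio over random planes:", max(ratios))
    u = (0.3, -1.2, 2.0, 0.5)
    print("ratio on the complex line through u:", calibration_ratio(u, J(u)))
```

The ratio never exceeds 1 on random planes, and it equals 1 (up to rounding) on a plane spanned by $u$ and $Ju$, that is, on a complex line, which is consistent with the statement that the calibrated submanifolds for $\omega$ are the holomorphic curves.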
structure on $T^*N$. Let $\pi : T \to N$ be the obvious projection.

Under this identification, submanifolds $N'$ in $T \subset M$ which are $C^1$ close to $N$ are identified with the graphs of small smooth sections $\alpha$ of $T^*N$. That is, submanifolds $N'$ of $M$ close to $N$ are identified with 1-forms $\alpha$ on $N$. We need to know: which 1-forms $\alpha$ are identified with SL $m$-folds $N'$?

Now, $N'$ is special Lagrangian if $\omega|_{N'} \equiv \mathrm{Im}\,\Omega|_{N'} \equiv 0$. But $\pi|_{N'} : N' \to N$ is a diffeomorphism, so we can push $\omega|_{N'}$ and $\mathrm{Im}\,\Omega|_{N'}$ down to $N$, and regard them as functions of $\alpha$. Calculation shows that

$$\pi_*\bigl(\omega|_{N'}\bigr) = d\alpha \quad\text{and}\quad \pi_*\bigl(\mathrm{Im}\,\Omega|_{N'}\bigr) = F(\alpha, \nabla\alpha)$$

where $F$ is a nonlinear function of its arguments. Thus, the moduli space $\mathcal{M}_N$ is locally isomorphic to the set of small 1-forms $\alpha$ on $N$ such that $d\alpha \equiv 0$ and $F(\alpha, \nabla\alpha) \equiv 0$.

Now it turns out that $F$ satisfies $F(\alpha, \nabla\alpha) \approx d(*\alpha)$ when $\alpha$ is small. Therefore, $\mathcal{M}_N$ is locally approximately isomorphic to the vector space of 1-forms $\alpha$ with $d\alpha = d(*\alpha) = 0$. But by Hodge theory, this is isomorphic to the de Rham cohomology group $H^1(N, \mathbb{R})$, and is a manifold with dimension $b^1(N)$.

To carry out this last step rigorously requires some technical machinery: one must work with certain Banach spaces of sections of $T^*N$, $\Lambda^2 T^*N$ and $\Lambda^m T^*N$, use elliptic regularity results to prove that the map $\alpha \mapsto (d\alpha, F(\alpha, \nabla\alpha))$ has closed image in these Banach spaces, and then use the implicit function theorem for Banach spaces to show that the kernel of the map is what is expected.

Obstructions to Existence of Compact SL m-Folds

Let $\{(M, J_t, g_t, \Omega_t) : t \in (-\epsilon, \epsilon)\}$ be a smooth one-parameter family of Calabi–Yau $m$-folds. Suppose $N_0$ is an SL $m$-fold in $(M, J_0, g_0, \Omega_0)$. When can we extend $N_0$ to a smooth family of SL $m$-folds $N_t$ in $(M, J_t, g_t, \Omega_t)$ for $t \in (-\epsilon, \epsilon)$?

By Corollary 7, a necessary condition is that $[\omega_t|_{N_0}] = [\mathrm{Im}\,\Omega_t|_{N_0}] = 0$ for all $t$. Our next result shows that locally, this is also a sufficient condition.

Theorem 9 Let $\{(M, J_t, g_t, \Omega_t) : t \in (-\epsilon, \epsilon)\}$ be a smooth one-parameter family of Calabi–Yau $m$-folds, with Kähler forms $\omega_t$. Let $N_0$ be a compact SL $m$-fold in $(M, J_0, g_0, \Omega_0)$, and suppose that $[\omega_t|_{N_0}] = 0$ in $H^2(N_0, \mathbb{R})$ and $[\mathrm{Im}\,\Omega_t|_{N_0}] = 0$ in $H^m(N_0, \mathbb{R})$ for all $t \in (-\epsilon, \epsilon)$. Then $N_0$ extends to a smooth one-parameter family $\{N_t : t \in (-\delta, \delta)\}$, where $0 < \delta \le \epsilon$ and $N_t$ is a compact SL $m$-fold in $(M, J_t, g_t, \Omega_t)$.

This can be proved using similar techniques to Theorem 8. Note that the condition $[\mathrm{Im}\,\Omega_t|_{N_0}] = 0$ for all $t$ can be satisfied by choosing the phases of the $\Omega_t$ appropriately, and if the image of $H_2(N, \mathbb{Z})$ in $H_2(M, \mathbb{R})$ is zero, then the condition $[\omega|_N] = 0$ holds automatically.

Thus, the obstructions $[\omega_t|_{N_0}] = [\mathrm{Im}\,\Omega_t|_{N_0}] = 0$ in Theorem 9 are actually fairly mild restrictions, and SL $m$-folds should be considered as pretty stable under small deformations of the Calabi–Yau structure.

Remark The deformation and obstruction theory of compact SL $m$-folds are extremely well behaved compared to many other moduli space problems in differential geometry. In other geometric problems (such as the deformations of complex structures on a complex manifold, or pseudoholomorphic curves in an almost-complex manifold, or instantons on a Riemannian 4-manifold), the deformation theory often has the following general structure.

There are vector bundles $E$, $F$ over a compact manifold $M$, and an elliptic operator $P : C^\infty(E) \to C^\infty(F)$, usually first order. The kernel $\mathrm{Ker}\,P$ is the set of infinitesimal deformations, and the cokernel $\mathrm{Coker}\,P$ the set of obstructions. The actual moduli space $\mathcal{M}$ is locally the zeros of a nonlinear map $\Psi : \mathrm{Ker}\,P \to \mathrm{Coker}\,P$.

In a generic case, $\mathrm{Coker}\,P = 0$, and then the moduli space $\mathcal{M}$ is locally isomorphic to $\mathrm{Ker}\,P$, and so is locally a manifold with dimension $\mathrm{ind}(P)$. However, in nongeneric situations $\mathrm{Coker}\,P$ may be nonzero, and then the moduli space $\mathcal{M}$ may be singular, or have an unexpected dimension.

However, SL $m$-folds do not follow this pattern. Instead, the obstructions are topologically determined, and the moduli space is always smooth, with dimension given by a topological formula. This should be regarded as a minor mathematical miracle.

Mirror Symmetry and the SYZ Conjecture

Mirror symmetry is a mysterious relationship between pairs of Calabi–Yau 3-folds $M$, $\hat{M}$, arising from a branch of physics known as string theory, and leading to some very strange and exciting conjectures about Calabi–Yau 3-folds, many of which have been proved in special cases.

In the beginning (the 1980s), mirror symmetry seemed mathematically completely mysterious. But there are now two complementary conjectural theories, due to Kontsevich and Strominger–Yau–Zaslow, which explain mirror symmetry in a fairly mathematical way. Probably both are true, at some level. The second proposal, due to Strominger, Yau, and Zaslow (1996), is known as the SYZ conjecture. Here is an attempt to state it.
The SYZ conjecture. Suppose $M$ and $\hat{M}$ are mirror Calabi–Yau 3-folds. Then (under some additional conditions), there should exist a compact topological 3-manifold $B$ and surjective, continuous maps $f : M \to B$ and $\hat{f} : \hat{M} \to B$, such that

(i) There exists a dense open set $B_0 \subset B$, such that for each $b \in B_0$, the fibers $f^{-1}(b)$ and $\hat{f}^{-1}(b)$ are nonsingular special Lagrangian 3-tori $T^3$ in $M$ and $\hat{M}$. Furthermore, $f^{-1}(b)$ and $\hat{f}^{-1}(b)$ are in some sense dual to one another.
(ii) For each $b \in \Delta = B \setminus B_0$, the fibers $f^{-1}(b)$ and $\hat{f}^{-1}(b)$ are expected to be singular special Lagrangian 3-folds in $M$ and $\hat{M}$.

The fibrations $f$ and $\hat{f}$ are called special Lagrangian fibrations, and the set of singular fibers $\Delta$ is called the discriminant. In part (i), the nonsingular fibers of $f$ and $\hat{f}$ are supposed to be dual tori. What does this mean?

On the topological level, we can define duality between two tori $T$, $\hat{T}$ to be a choice of isomorphism $H^1(T, \mathbb{Z}) \cong H_1(\hat{T}, \mathbb{Z})$. We can also define duality between tori equipped with flat Riemannian metrics. Write $T = V/\Lambda$, where $V$ is a Euclidean vector space and $\Lambda$ a lattice in $V$. Then the dual torus $\hat{T}$ is defined to be $V^*/\Lambda^*$, where $V^*$ is the dual vector space and $\Lambda^*$ the dual lattice. However, there is no notion of duality between nonflat metrics on dual tori.

Strominger, Yau, and Zaslow argue only that their conjecture holds when $M$, $\hat{M}$ are close to the large complex structure limit. In this case, the diameters of the fibers $f^{-1}(b)$, $\hat{f}^{-1}(b)$ are expected to be small compared to the diameter of the base space $B$, and away from singularities of $f$, $\hat{f}$, the metrics on the nonsingular fibers are expected to be approximately flat. So, part (i) of the SYZ conjecture says that for $b \in B_0$, $f^{-1}(b)$ is approximately a flat Riemannian 3-torus, and $\hat{f}^{-1}(b)$ is approximately the dual flat Riemannian torus.

Mathematical research on the SYZ conjecture has followed two broad approaches. The first could be described as symplectic topological. For this, we treat $M$, $\hat{M}$ just as symplectic manifolds and $f$, $\hat{f}$ just as Lagrangian fibrations. We also suppose $B$ is a smooth 3-manifold and $f$, $\hat{f}$ are smooth maps. Under these simplifying assumptions, Mark Gross, Wei-Dong Ruan, and others have built up a beautiful, detailed picture of how dual SYZ fibrations work at the global topological level.

The second approach could be described as local geometric. Here, we try to take the special Lagrangian condition seriously from the outset, and focus on the local behavior of special Lagrangian submanifolds, and especially their singularities, rather than on global topological questions. In addition, we are interested in what fibrations of generic Calabi–Yau 3-folds might look like.

There is now a well-developed theory of SL $m$-folds with isolated singularities modeled on cones (Joyce 2003a). This is applied to SL fibrations and the SYZ conjecture in Joyce (2003a, b), leading to the tentative conclusions that for generic Calabi–Yau 3-folds $M$, special Lagrangian fibrations $f : M \to B$ will be only piecewise smooth, and have discriminants $\Delta$ of real codimension 1 in $B$, in contrast to smooth fibrations which have $\Delta$ of codimension 2. We also argue that for generic mirrors $M$, $\hat{M}$ and $f$, $\hat{f}$, the discriminants $\Delta$, $\hat{\Delta}$ cannot be homeomorphic and so do not coincide. This contradicts part (ii) above.

A better way to formulate the SYZ conjecture may be in terms of families of mirror Calabi–Yau 3-folds $M_t$, $\hat{M}_t$ and fibrations $f_t : M_t \to B$, $\hat{f}_t : \hat{M}_t \to B$ for $t \in (0, \epsilon)$ which approach the large complex structure limit as $t \to 0$. Then we could require the discriminants $\Delta_t$, $\hat{\Delta}_t$ of $f_t$, $\hat{f}_t$ to converge to some common, codimension 2 limit $\Delta_0$ as $t \to 0$.

It is an important, and difficult, open problem to construct examples of special Lagrangian fibrations of compact, holonomy SU(3) Calabi–Yau 3-folds. None are currently known.

See also: Minimal submanifolds; Mirror Symmetry: A Geometric Survey; Moduli Spaces: An Introduction; Riemannian Holonomy Groups and Exceptional Holonomy.

Further Reading

Gross M, Huybrechts D, and Joyce D (2003) Calabi–Yau Manifolds and Related Geometries, Universitext Series, Berlin: Springer.
Harvey R and Lawson HB (1982) Calibrated geometries. Acta Mathematica 148: 47–157.
Joyce DD (2000) Compact Manifolds with Special Holonomy. Oxford: Oxford University Press.
Joyce DD (2003a) Special Lagrangian submanifolds with isolated conical singularities. V. Survey and applications. Journal of Differential Geometry 63: 279–347, math.DG/0303272.
Joyce DD (2003b) Singularities of special Lagrangian fibrations and the SYZ conjecture. Communications in Analysis and Geometry 11: 859–907, math.DG/0011179.
Joyce DD (2003c) U(1)-invariant special Lagrangian 3-folds in $\mathbb{C}^3$ and special Lagrangian fibrations. Turkish Mathematical Journal 27: 99–114, math.DG/0206016.
McLean RC (1998) Deformations of calibrated submanifolds. Communications in Analysis and Geometry 6: 705–747.
Strominger A, Yau S-T, and Zaslow E (1996) Mirror symmetry is T-duality. Nuclear Physics B 479: 243–259, hep-th/9606040.
Calogero–Moser–Sutherland Systems of Nonrelativistic and Relativistic Type
to the commuting Hamiltonians (the Toda limit being a case in point) or to the joint eigenfunctions (as exemplified by the $\delta$-function system limit); it seems difficult to control both sets of quantities at once.

Starting from the spin type CMS systems, another kind of limit can be taken. Specifically, by freezing the particles at equilibrium positions, it is possible to arrive at integrable spin chains of Haldane–Shastry and Inozemtsev type.

At this point, it is expedient to insert a brief remark on finite-dimensional integrable systems. As the term suggests, one may expect that, with due effort, such systems can be "integrated", or, equivalently, "solved". But it should be noted that the latter terms (let alone the qualifier "due effort") have no unambiguous mathematical meaning. Certainly, "solving" involves obtaining explicit information on the action-angle map and joint eigenfunction transform at the classical and quantum level, resp., but a priori it is not at all clear how far one can proceed.

Focusing again on the CMS systems and their relatives, it should be stressed that, in many cases, one is still far removed from a complete solution, especially for the elliptic CMS systems. In this regard the previous remark serves not only as a caveat, but also to make clear why the various vantage points provided by different subfields in mathematics and physics are crucial: typically, they yield complementary insights and distinct representations for solutions, serving different purposes.

To be sure, in first approximation the mathematics involved at the classical and quantum level is symplectic geometry and Hilbert space theory, resp. In point of fact, however, far more ingredients have turned out to be quite natural and useful. On the classical level, these include the theory of groups, Lie algebras and symmetric spaces, linear algebra and spectral theory, Riemann surface theory, and more generally algebraic geometry.

On the quantum level, the viewpoint of harmonic analysis on symmetric spaces is particularly natural and fruitful for the nonrelativistic CMS systems and their arbitrary root-system versions, whereas quantum groups/algebras/symmetric spaces can be tied in with the relativistic systems and their versions for other root systems. (The $c \to \infty$ limit amounts to the $q \to 1$ limit in the quantum group picture.) As a matter of fact, the whole area of special functions and their $q$-analogs is intimately related to the quantum CMS type systems (cf. also the last section of this article). Finally, the occurrence of commuting analytic difference operators in the relativistic ($q \ne 1$) systems leads to largely uncharted territory in the intersection of the theory of Hilbert space eigenfunction expansions and the theory of linear analytic difference equations.

The study of the thermodynamics ($N \to \infty$ limit with temperature $\ge 0$ and density $> 0$ fixed) associated with the trigonometric and elliptic CMS systems and their spin cousins yields its own circle of problems. It was initiated by Sutherland three decades ago, and even though a host of results on partition functions, correlation functions, fractional statistics, strong–weak coupling duality, relations to Yangians, etc., have meanwhile been obtained, many questions are still open. This area also has links with random-matrix theory, but the input from this field is thus far limited to certain discrete couplings.

The above $N$-dimensional integrable systems are related to a great many infinite-dimensional integrable systems, both at the classical and at the quantum level. On the one hand, there are structural analogs that have been used to advantage in the study of CMS systems, including Lax pair and R-matrix formulations, zero-curvature representations, bi-Hamiltonian formalism, Bäcklund transformations, time discretizations, and tools such as Baker–Akhiezer functions, Bethe ansatz, separation of variables, and Baxter-type Q-operators.

On the other hand, there are striking physical similarities between various soliton field theories (a prominent one being the sine-Gordon field theory) and infinite soliton lattices (in particular several Toda type lattices), and the CMS systems for special parameter values. Particularly conspicuous are the ties between the classical CMS systems and the KP and two-dimensional Toda hierarchies. The latter relations actually extend beyond the solitons, including rational and theta function solutions.

CMS systems are relevant in various other contexts not yet mentioned. A prominent one among these is a class of supersymmetric gauge field theories. In this quantum context, the classical CMS systems have surfaced in the description of moduli spaces encoding the vacuum structure (Seiberg–Witten theory). Equally surprising, certain classical CMS systems (with internal degrees of freedom) have found a second application in a quantum context, namely in the description of quantum chaos (level repulsion).

We conclude this introduction by listing additional disparate subjects where connections with CMS type systems have been found. These include the theory of Sklyanin, affine Hecke, Kac–Moody, Virasoro and W-algebras, equations of Knizhnik–Zamolodchikov, Yang–Baxter, Witten–Dijkgraaf–Verlinde–Verlinde, and Painlevé type, Gaudin,
Hitchin, Wess–Zumino, matrix and quasi-exactly solvable models, Dunkl–Cherednik and Polychronakos operators, the quantum Hall effect and quantum transport, two-dimensional Yang–Mills theory, functional equations, integrable mappings, Huygens' principle, and the bispectral problem.

Classical Nonrelativistic CMS Systems

A system of $N$ nonrelativistic equal-mass $m$ particles on the line interacting via pair potentials can be described by a Hamiltonian

$$H = \frac{1}{2m}\sum_{j=1}^{N} p_j^2 + \sum_{1 \le j < k \le N} V(x_j - x_k), \qquad m > 0 \tag{1}$$

The CMS systems are defined by four distinct choices of pair potential. The simplest choice reads

$$V(x) = \frac{g^2}{m x^2}, \qquad g > 0 \qquad \text{(I)} \tag{2}$$

Hence, the coupling constant $g$ has dimension [action] (the product of [position] and [momentum]). This potential is clearly repulsive. Thus, each initial state in the phase space

$$\Omega = \{(x, p) \in \mathbb{R}^{2N} \mid x \in G\} \tag{3}$$

where $G$ is the configuration space

$$G = \{x \in \mathbb{R}^N \mid x_N < \cdots < x_1\} \tag{4}$$

is a scattering state.

The next level is given by the hyperbolic choice

$$V(x) = \frac{g^2 \nu^2}{m \sinh^2(\nu x)}, \qquad \nu > 0 \qquad \text{(II)} \tag{5}$$

Hence, $\nu$ has dimension $[\text{position}]^{-1}$, and the previous system arises by taking $\nu$ to 0. It is clear that [5] yields again a repulsive particle system, so that each state in $\Omega$ given by [3] is a scattering state.

The highest level in the hierarchy is the elliptic level, where

$$V(x) = \frac{g^2}{m}\,\wp(x; \omega, \omega'), \qquad \omega, -i\omega' > 0 \qquad \text{(IV)} \tag{6}$$

and $\wp(x; \omega, \omega')$ denotes the Weierstrass $\wp$-function with periods $2\omega$ and $2\omega'$. It is beyond the scope of this article to elaborate on the elliptic regime, even though it is of considerable interest. It reappears in later sections as the most general regime in which integrability holds true. Indeed, a prominent feature of the elliptic case [6] is that it can be specialized both to the hyperbolic case [5] and to the trigonometric case, given by

$$V(x) = \frac{g^2 \nu^2}{m \sin^2(\nu x)} \qquad \text{(III)} \tag{7}$$

To obtain the hyperbolic specialization, one should take $\omega' = i\pi/2\nu$ and send $\omega$ to $\infty$; then [6] reduces to [5] (up to an additive constant). Likewise, [7] results from [6] by choosing $\omega = \pi/2\nu$ and taking $-i\omega'$ to $\infty$.

The physical picture associated with the trigonometric and elliptic systems is quite different from that of the rational and hyperbolic ones. Of course, the potentials [7] and [6] are again repulsive, but now the internal motion is confined and oscillatory. More specifically, due to energy conservation the phase spaces

$$\Omega_{\mathrm{III}} = G_{\mathrm{III}} \times \mathbb{R}^N, \qquad G_{\mathrm{III}} = \{x_N < \cdots < x_1,\ x_1 - x_N < \pi/\nu\} \tag{8}$$

$$\Omega_{\mathrm{IV}} = G_{\mathrm{IV}} \times \mathbb{R}^N, \qquad G_{\mathrm{IV}} = \{x_N < \cdots < x_1,\ x_1 - x_N < 2\omega\} \tag{9}$$

are left invariant by the flow generated by the trigonometric and elliptic $N$-particle Hamiltonian, resp. Alternatively, one may interpret the trigonometric Hamiltonian as describing particles constrained to move on a circle and interacting via the inverse square potential [2]. In this picture, the quantities $2\nu x_1, \ldots, 2\nu x_N$ are viewed as angular positions on the circle, and one needs a suitable quotient of the phase space [8] by a discrete group action to describe a state of the system.

Turning to integrability aspects, we begin by noting that the total momentum Hamiltonian

$$P = \sum_{j=1}^{N} p_j \tag{10}$$

obviously Poisson commutes with the above defining Hamiltonians of the systems. For $N = 2$, therefore, integrability is plain. It is possible to write down explicitly the higher commuting Hamiltonians for $N > 2$ as well but, in the nonrelativistic setting, it is more illuminating to characterize them as the power traces or (equivalently) the symmetric functions of a so-called Lax matrix.

The Lax matrix is an $N \times N$ matrix-valued function on the phase space of the system. It plays a pivotal role not only for understanding integrability, but also for setting up an action-angle transformation. The latter issue is discussed again later. Here the more conspicuous features of the Lax matrix will be explained, focusing on the type II system for expository ease. Then one can choose

$$L_{jj} = p_j, \qquad L_{jk} = \frac{i g \nu}{\sinh \nu(x_j - x_k)}, \qquad j, k = 1, \ldots, N, \quad j \ne k \tag{11}$$

Thus, $L$ is Hermitean and we have

$$\mathrm{tr}\, L = P, \qquad \mathrm{tr}\, L^2 = 2mH \tag{12}$$
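The trace identities [12] and the Hermiticity of $L$ can be checked numerically at a sample phase-space point. The following is a minimal sketch in plain Python, assuming the type II Lax matrix [11] and the Hamiltonian [1] with the hyperbolic potential [5]. The helper names (`lax_matrix`, `hamiltonian`, `trace`, `matmul`) and the sample values of x, p, g, nu, and m are illustrative choices, not taken from the article.

```python
import math

def lax_matrix(x, p, g, nu):
    """Type II Lax matrix: L_jj = p_j, L_jk = i g nu / sinh(nu (x_j - x_k))."""
    n = len(x)
    return [[complex(p[j]) if j == k
             else 1j * g * nu / math.sinh(nu * (x[j] - x[k]))
             for k in range(n)] for j in range(n)]

def hamiltonian(x, p, g, nu, m):
    """H = sum_j p_j^2 / 2m + sum_{j<k} g^2 nu^2 / (m sinh^2(nu (x_j - x_k)))."""
    n = len(x)
    kinetic = sum(pj * pj for pj in p) / (2.0 * m)
    potential = sum(g * g * nu * nu / (m * math.sinh(nu * (x[j] - x[k])) ** 2)
                    for j in range(n) for k in range(j + 1, n))
    return kinetic + potential

def trace(a):
    """Sum of the diagonal entries of a square matrix (list of lists)."""
    return sum(a[j][j] for j in range(len(a)))

def matmul(a, b):
    """Product of two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[j][l] * b[l][k] for l in range(n)) for k in range(n)]
            for j in range(n)]

# Illustrative sample point with x_N < ... < x_1 (hypothetical values).
x = [1.3, 0.4, -0.9]
p = [0.7, -0.2, 1.1]
g, nu, m = 0.8, 1.5, 2.0

L = lax_matrix(x, p, g, nu)
H = hamiltonian(x, p, g, nu, m)
tr_L = trace(L)
tr_L2 = trace(matmul(L, L))

# Deviations from tr L = P, tr L^2 = 2mH, and from L being Hermitean.
err_tr1 = abs(tr_L - sum(p))
err_tr2 = abs(tr_L2 - 2.0 * m * H)
hermitian_defect = max(abs(L[j][k] - L[k][j].conjugate())
                       for j in range(len(x)) for k in range(len(x)))
print(err_tr1, err_tr2, hermitian_defect)
```

All three printed deviations should be at the level of floating-point rounding. The second identity rests on $L_{jk}L_{kj} = g^2\nu^2/\sinh^2\nu(x_j - x_k)$ for $j \ne k$, which is where the factor 2 in front of the pair sum comes from.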
(The rational Lax matrix results from [11] by taking $\nu \to 0$, and the trigonometric one by taking $\nu \to i\nu$. The elliptic Lax matrix has a similar structure, but it involves an extra spectral parameter.)

Although not obvious, it is true that all of the power traces

$$H_k = \frac{1}{k}\,\mathrm{tr}\, L^k, \qquad k = 1, \ldots, N \tag{13}$$

are in involution (i.e., Poisson commute). One way to understand this involves the so-called Lax pair equation associated with the Hamiltonian flow generated by $H = H_2/m$. This involves a second $N \times N$ matrix function given by

$$M_{jj} = -\sum_{l \ne j} \frac{i g \nu^2}{m \sinh^2 \nu(x_j - x_l)}, \qquad M_{jk} = \frac{i g \nu^2 \cosh \nu(x_j - x_k)}{m \sinh^2 \nu(x_j - x_k)}, \quad j \ne k \tag{14}$$

When the positions and momenta in $L$ and $M$ evolve according to the $H$-flow, one has

$$\dot{L}_t = [M_t, L_t] \tag{15}$$

where $[\cdot, \cdot]$ is the matrix commutator. (Indeed, [15] amounts to the Hamilton equations, as is readily checked.) Since $M$ is anti-Hermitean, it is not difficult to derive from this Lax pair equation that the flow is isospectral: $L_t$ is related to $L_0$ by a unitary transformation $L_t = U_t L_0 U_t^*$ obtained from $M_t$, so that the spectrum of $L_t$ is time independent.

This argument already shows the existence of $N$ conserved quantities under the $H$-flow, namely the $N$ eigenvalues of $L$. It is, however, simpler to work with either the power traces $H_k$ given by [13] or with the symmetric functions $S_k$ of $L$, given by

$$\det(1_N + \lambda L) = \sum_{k=0}^{N} \lambda^k S_k \tag{16}$$

These Hamiltonians depend only on the eigenvalues of $L$, so they are also conserved under the flow. Note that

$$S_1 = P, \qquad S_2 = \tfrac{1}{2}P^2 - mH \tag{17}$$

To see why these Hamiltonians are in involution, one can invoke the long-time asymptotics of the $H$-flow. It reads

$$p_j(t) \sim \hat{p}_j, \quad \hat{p}_N < \cdots < \hat{p}_1, \qquad x_j(t) \sim \hat{x}_j + t\hat{p}_j/m, \qquad j = 1, \ldots, N, \quad t \to \infty \tag{18}$$

Accordingly, one gets

$$L_t \sim \mathrm{diag}(\hat{p}_1, \ldots, \hat{p}_N) = L_\infty, \qquad t \to \infty \tag{19}$$

Since the time evolution is a canonical transformation and the Poisson brackets $\{H_k, H_l\}$ are time independent (by the Jacobi identity), it now readily follows from [19] that they vanish. (Indeed, $H_k$ and $H_l$ reduce to power traces of $L_\infty$, and the asymptotic momenta $\hat{p}_1, \ldots, \hat{p}_N$ Poisson commute.)

Quantum Nonrelativistic CMS Systems

The canonical quantization prescription

$$p_j \to -i\hbar\,\partial/\partial x_j, \qquad j = 1, \ldots, N \tag{20}$$

($\hbar$ being the Planck constant) gives rise to an unambiguous quantum Hamiltonian

$$H = -\frac{\hbar^2}{2m}\sum_{j=1}^{N} \partial_j^2 + \sum_{1 \le j < k \le N} V(x_j - x_k) \tag{21}$$

for any classical Hamiltonian [1]. Thus, the defining Hamiltonians of the above systems give rise to well-defined partial differential operators (PDOs), which act on suitable dense subspaces of the Hilbert space $L^2(G_\delta, dx)$, $\delta = \mathrm{I}, \ldots, \mathrm{IV}$, with $G_{\mathrm{I}}$ and $G_{\mathrm{II}}$ given by $G$ in [4], and $G_{\mathrm{III}}$, $G_{\mathrm{IV}}$ by [8] and [9], respectively.

We recall that there is no general result ensuring that a classically integrable system admits an integrable quantum version. More precisely, when one substitutes [20] in $N$ Poisson commuting Hamiltonians, it need not be true that they commute as quantum operators, even when no ordering ambiguities are present. For the power trace Hamiltonians such ambiguities do occur. (For example, [11] gives rise to a term in $H_3$ proportional to $p_1/\sinh^2 \nu(x_1 - x_2)$.) On the other hand, no noncommuting factors occur in the quantization of $S_1, \ldots, S_N$. To verify this, one need only note that $S_k$ equals the sum of all $k \times k$ principal minors of $L$, cf. [16]; choosing a diagonal element $p_j$ in a summand, one therefore has no dependence on $x_j$ in the remaining factors, hence no ordering ambiguity.

As a result, the prescription [20] yields $N$ unambiguous operators $S_k(x, -i\hbar\nabla)$, which are moreover formally self-adjoint on $L^2(G_\delta, dx)$ for each of the four cases $\delta = \mathrm{I}, \ldots, \mathrm{IV}$. Although by no means obvious, it is true that these operators do commute. Thus, integrability is preserved under quantization of the above systems. Now the power traces of a matrix can be expressed as polynomials in the symmetric functions (via the Newton
identities), so this yields an ordering ensuring that the quantized power traces commute as well.

Just as the action-angle transformation for a classically integrable system diagonalizes all of the Poisson commuting Hamiltonians at once (in the sense that the transformed Hamiltonians depend only on the action variables), one expects that there exists a unitary operator that transforms all of the commuting Hamiltonians to diagonal form. In the classical setting, the existence of this diagonalizing map follows (under suitable technical restrictions) from the Liouville–Arnold theorem, whereas in the quantum context the existence of such a joint eigenfunction transformation is a far more delicate issue. This problem is briefly discussed later again, noting here that the solutions obtained to date vary considerably in completeness and explicitness for the four regimes.

Classical Relativistic CMS Systems

The nonrelativistic spacetime symmetry group is the Galilei group. Its Lie algebra is represented by the time translation generator $H$ given by [1], space translation generator $P$ given by [10], and the Galilei boost generator

$$B = -m\sum_{j=1}^{N} x_j \tag{22}$$

More precisely, the Poisson brackets are given by

$$\{H, P\} = 0, \qquad \{H, B\} = P, \qquad \{P, B\} = Nm \tag{23}$$

so that the last bracket does not vanish (as is the case for the Galilei Lie algebra). This deviation is inconsequential, however, since the constant $Nm$ (central extension) yields trivial Hamilton equations.

The relativistic spacetime symmetry group (Poincaré group) yields a Lie algebra that differs from [23] only in $Nm$ being replaced by $H/c^2$, where $c$ is the speed of light. Clearly, the functions

$$H = mc^2\sum_{j=1}^{N} \cosh\left(\frac{p_j}{mc}\right), \qquad P = mc\sum_{j=1}^{N} \sinh\left(\frac{p_j}{mc}\right) \tag{24}$$

together with $B$ given by [22] give rise to these altered Poisson brackets. Physically, these three generators describe a system of $N$ relativistic free mass-$m$ particles in terms of their rapidities $p_j/mc$.

A natural ansatz to take interaction into account now reads

$$H = mc^2\sum_{j=1}^{N} \cosh\left(\frac{p_j}{mc}\right) V_j(x), \qquad P = mc\sum_{j=1}^{N} \sinh\left(\frac{p_j}{mc}\right) V_j(x), \qquad V_j(x) = \prod_{k \ne j} f(x_j - x_k) \tag{25}$$

Indeed, it is plain that this still entails

$$\{H, B\} = P, \qquad \{P, B\} = H/c^2 \tag{26}$$

But to obtain a relativistic particle system, the time and space translations must also commute. The corresponding requirement $\{H, P\} = 0$ yields a severe constraint on the pair potential function $f(x)$ in [25] whenever $N > 2$. (For $N = 2$, one gets $\{H, P\} = 0$ irrespective of the choice of $f$.)

As it turns out, the vanishing requirement is satisfied when

$$f^2(x) = a + b\,\wp(x) \tag{27}$$

where $a$, $b$ are constants and $\wp(x)$ is the Weierstrass function already encountered. Taking, for example, $a, b > 0$, one can take the positive square root of the right-hand side of [27]. This choice of $f(x)$ yields the defining Hamiltonian of the relativistic elliptic system (type IV). In the three degenerate cases, it is convenient to choose

$$f(x) = \begin{cases} \left(1 + g^2/m^2c^2x^2\right)^{1/2} & \text{(I)} \\ \left(1 + \sin^2(\nu g/mc)/\sinh^2 \nu x\right)^{1/2} & \text{(II)} \\ \left(1 + \sinh^2(\nu g/mc)/\sin^2 \nu x\right)^{1/2} & \text{(III)} \end{cases} \tag{28}$$

It is an elementary exercise to check that this implies

$$\lim_{c \to \infty}\left(H - Nmc^2\right) = H_{\mathrm{nr}}, \qquad \lim_{c \to \infty} P = P_{\mathrm{nr}} \tag{29}$$

where $H_{\mathrm{nr}}$ and $P_{\mathrm{nr}}$ are the above nonrelativistic time and space translation generators. Hence, the defining Hamiltonians of the relativistic systems reduce to their nonrelativistic counterparts in the limit $c \to \infty$.

The special character of the function [27] makes itself felt not only in ensuring Poincaré invariance, but also in entailing integrability. To begin with, note that the functions

$$S_{\pm N} = \exp\left(\pm\beta\sum_{j=1}^{N} p_j\right), \qquad \beta = 1/mc \tag{30}$$
commute with $H$ and $P$, so that integrability for $N = 3$ is plain. More generally, the Hamiltonians

$$S_{\pm l} = \sum_{\substack{I \subset \{1, \ldots, N\} \\ |I| = l}} \exp\left(\pm\beta\sum_{j \in I} p_j\right) \prod_{\substack{j \in I \\ k \notin I}} f(x_j - x_k) \tag{31}$$

$$C_{jk} = \exp(-\nu(x_j + x_k))\,\frac{\sinh(i\beta\nu g)}{\sinh \nu(x_j - x_k + i\beta g)} \tag{36}$$

In [35], $f(x)$ is the type II function given by [28]. The matrix $C$ arises from Cauchy's matrix $1/(w_j - z_k)$ via a suitable substitution, and Cauchy's identity

$$\det\left(\frac{1}{w_j - z_k}\right)_{j,k=1}^{N} = \prod_{j=1}^{N} \frac{1}{w_j - z_j} \prod_{1 \le j < k \le N} \frac{(w_j - w_k)(z_j - z_k)}{(w_j - z_k)(z_j - w_k)} \tag{37}$$

ensures that [34] yields the Hamiltonians $S_l$ of [31].

To conclude this section, we point out that the relation

$$L = 1_N + \beta L_{\mathrm{nr}} + O(\beta^2), \qquad \beta \to 0 \tag{38}$$

where $L_{\mathrm{nr}}$ denotes the nonrelativistic Lax matrix [11], can be used to deduce the involutivity of the nonrelativistic Hamiltonians from that of their relativistic counterparts.

Quantum Relativistic CMS Systems

When the canonical quantization prescription [20] is applied to the classical Hamiltonians [31] with $f(x) = 1$, one obtains commuting quantum operators whose action is exemplified by

$$\exp\left(\mp\frac{i\hbar}{mc}\frac{d}{dx}\right) F(x) = F\left(x \mp \frac{i\hbar}{mc}\right) \tag{39}$$

do commute. In the elliptic case [27], this factorization involves the Weierstrass $\sigma$-function, and commutativity can be encoded in a sequence of functional equations satisfied by the $\sigma$-function. For the type I–III systems the pertinent factorization of [28] is given by

$$f_\pm(x) = \begin{cases} \left(1 \pm i\beta g/x\right)^{1/2} & \text{(I)} \\ \left(\sinh \nu(x \pm i\beta g)/\sinh \nu x\right)^{1/2} & \text{(II)} \\ \left(\sin \nu(x \pm i\beta g)/\sin \nu x\right)^{1/2} & \text{(III)} \end{cases} \tag{41}$$

(Here one has $g > 0$, and the choice of square root is such that $f_\pm(x) \to 1$ for $g \downarrow 0$.)

The nonrelativistic limit $c \to \infty$ of the quantum Hamiltonians [33] can be determined by expanding $S_1$ and $S_{-1}$ in a power series in $\beta = 1/mc$. In this way, one obtains once more [29], except for a small, but crucial change in $H_{\mathrm{nr}}$: instead of the coupling constant dependence $g^2$ in the potential energy, one gets $g(g - \hbar)$. The extra term arises from the action of the term linear in $\beta$ in the expansion of the exponential on the term linear in $\beta$ in the expansion of the functions $f_\pm(x)$.

From the perspective of the nonrelativistic quantum CMS systems, the change $g^2 \to g(g - \hbar)$ appears ad hoc. As it transpires, however, the different
dependence on $g$ ensures that the eigenfunctions of $H_{\mathrm{nr}}$ depend on $g$ in a far simpler way. This will become clear shortly.

Action-Angle Transforms and Duality

Under certain technical assumptions, any integrable system given by $N$ independent Poisson commuting Hamiltonians $S_1(x, p), \ldots, S_N(x, p)$ on a $2N$-dimensional phase space admits local canonical transformations to action-angle variables. Like the spectral theorem on the quantum level, this structural result is of limited practical value. Indeed, just as the spectral theorem yields no concrete information concerning eigenfunctions, bound-state energies, scattering, etc., associated with a given self-adjoint Hamiltonian, the Liouville–Arnold theorem only yields general insight in the type of motion that can occur and the geometric character of the local maps (in terms of invariant tori).

To fully comprehend ("solve") a given integrable system, one should render the associated action-angle map as concrete as possible. For the CMS type systems, a complete solution to this problem has only been achieved for the systems of type I–III. The motion in the trigonometric systems is oscillatory, so that a closeup via the action-angle transform involves extensive geometric constructions. By contrast, the type I and II systems are scattering systems, and here the action-angle map can be tied in with the classical wave maps (Møller transformations).

We now sketch some salient features of the action-angle maps for systems of type I and II. In all cases the map (denoted $\Phi$) is a canonical transformation from the phase space $\Omega$ (eqn [3]) with 2-form $dx \wedge dp$ to the phase space

$$\hat{\Omega} = \{(\hat{x}, \hat{p}) \in \mathbb{R}^{2N} \mid \hat{p} \in G\} \tag{42}$$

with 2-form $d\hat{x} \wedge d\hat{p}$. Thus, the actions $\hat{p}_1, \ldots, \hat{p}_N$ vary over $G$ given by [4] and the angles $\hat{x}_1, \ldots, \hat{x}_N$ over $\mathbb{R}$. Consequently, $\hat{\Omega}$ amounts to $\Omega$ with $x$ and $p$ interchanged.

As should be the case, the transformed commuting Hamiltonians

$$\hat{S}_k = S_k \circ \Phi^{-1}, \qquad k = 1, \ldots, N \tag{43}$$

depend only on the action vector $\hat{p}$. To be specific, they arise from $S_k(x, p)$ by taking $g = 0$ (no interaction, hence no $x$ dependence) and substituting $p \to \hat{p}$. Indeed, the actions $\hat{p}_k$ are the $t \to \infty$ limits of the momenta $p_k(t)$, where the $t$ dependence refers to the defining Hamiltonian of the system.

As it happens, the Lax matrix $L$ is of decisive importance to concretize the action-angle map $\Phi$, and in particular to reveal its hidden duality properties. The starting point is a commutation relation of $L(x, p)$ with a diagonal matrix $A(x)$ given by

$$A(x) = \mathrm{diag}(d(x_1), \ldots, d(x_N)), \qquad d(y) = \begin{cases} y & \text{(I)} \\ \exp(2\nu y) & \text{(II)} \end{cases} \tag{44}$$

Obviously, the symmetric functions $D_k(x)$ of $A(x)$ yield an integrable system on $\Omega$, so the Hamiltonians

$$\hat{D}_k(\hat{x}, \hat{p}) = (D_k \circ \Phi^{-1})(\hat{x}, \hat{p}), \qquad k = 1, \ldots, N \tag{45}$$

yield an integrable system on the action-angle phase space $\hat{\Omega}$. The crux of the matter is now that these systems are familiar: they are also systems of type I and II!

To be specific, let us denote the dual systems just described by a caret, and the nonrelativistic/relativistic systems by a suffix nr/rel, resp. Then the duality properties alluded to are given by

$$\hat{\mathrm{I}}_{\mathrm{nr}} \simeq \mathrm{I}_{\mathrm{nr}}, \qquad \hat{\mathrm{I}}_{\mathrm{rel}} \simeq \mathrm{II}_{\mathrm{nr}}, \qquad \widehat{\mathrm{II}}_{\mathrm{nr}} \simeq \mathrm{I}_{\mathrm{rel}}, \qquad \widehat{\mathrm{II}}_{\mathrm{rel}} \simeq \mathrm{II}_{\mathrm{rel}} \tag{46}$$

and $\Phi^{-1}$ serves as the action-angle map for the dual systems.

In order to sketch why this state of affairs holds true for the $\mathrm{II}_{\mathrm{rel}}$ system, recall that its Lax matrix is given by [34]. From this, one readily checks the commutation relation

$$\coth(i\beta\nu g)\,[A, L] = 2e \otimes e - (AL + LA) \tag{47}$$

Since $L$ is Hermitean, there exists a unitary $U$ diagonalizing $L$. It can now be shown that the spectrum of $L$ is positive and nondegenerate, and that $U^*e$ has nonzero components. The gauge ambiguity in $U$ (given by a permutation matrix and diagonal phase matrix) can, therefore, be fixed by requiring

$$U^*LU = \mathrm{diag}(\exp(\beta\hat{p}_1), \ldots, \exp(\beta\hat{p}_N)), \qquad \hat{p}_N < \cdots < \hat{p}_1 \tag{48}$$

$$(U^*e)_j > 0, \qquad j = 1, \ldots, N \tag{49}$$

A suitable reparametrization of $U^*e$ then yields the angle vector $\hat{x}$.

As a consequence, $U^*AU$ becomes a function of $\hat{x}$ and $\hat{p}$. In detail, one finds

$$\hat{A}(\hat{x}, \hat{p}) = U^*AU = L(\beta/2, 2\nu; \hat{p}, \hat{x})^T \tag{50}$$

where $L(\nu, \beta; x, p)$ is given by [34] and $T$ denotes the transpose. Therefore, the dual Lax matrix $\hat{A} = U^*AU$ is essentially equal to $L$, explaining the self-duality $\widehat{\mathrm{II}}_{\mathrm{rel}} \simeq \mathrm{II}_{\mathrm{rel}}$ announced above.
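The commutation relation [47] can be verified entry by entry, since $A$ is diagonal. The following is a minimal sketch in plain Python under stated assumptions: the $\mathrm{II}_{\mathrm{rel}}$ Lax matrix is taken to have the Cauchy-type form $L_{jk} = e_j C_{jk} e_k$ with $C_{jk} = \exp(-\nu(x_j + x_k))\sinh(i\beta\nu g)/\sinh\nu(x_j - x_k + i\beta g)$, and $A = \mathrm{diag}(\exp(2\nu x_j))$. Both forms are hypotheses of this sketch. The components $e_j$ are arbitrary positive test numbers, because the relation holds for any vector $e$ once $L$ has this form; the function names and sample values are illustrative only.

```python
import cmath
import math

def lax_and_a(x, e, g, beta, nu):
    """Assumed forms: L_jk = e_j C_jk e_k (Cauchy type) and a_j = exp(2 nu x_j)."""
    n = len(x)
    s = cmath.sinh(1j * beta * nu * g)
    lax = [[e[j] * e[k] * math.exp(-nu * (x[j] + x[k])) * s
            / cmath.sinh(nu * (x[j] - x[k]) + 1j * beta * nu * g)
            for k in range(n)] for j in range(n)]
    a = [math.exp(2.0 * nu * xj) for xj in x]
    return lax, a

def relation_defect(x, e, g, beta, nu):
    """Largest entry of coth(i beta nu g) [A, L] - (2 e(x)e - (AL + LA)).

    For diagonal A: [A, L]_jk = (a_j - a_k) L_jk, (AL + LA)_jk = (a_j + a_k) L_jk.
    """
    lax, a = lax_and_a(x, e, g, beta, nu)
    z = 1j * beta * nu * g
    coth = cmath.cosh(z) / cmath.sinh(z)
    n = len(x)
    return max(abs(coth * (a[j] - a[k]) * lax[j][k]
                   - (2.0 * e[j] * e[k] - (a[j] + a[k]) * lax[j][k]))
               for j in range(n) for k in range(n))

# Hypothetical sample data; beta * nu * g must not be a multiple of pi.
x = [0.9, 0.2, -0.6]
e = [1.1, 0.7, 1.6]
g, beta, nu = 0.8, 0.5, 1.2

defect = relation_defect(x, e, g, beta, nu)
lax, _ = lax_and_a(x, e, g, beta, nu)
hermitian_defect = max(abs(lax[j][k] - lax[k][j].conjugate())
                       for j in range(len(x)) for k in range(len(x)))
print(defect, hermitian_defect)
```

Both printed numbers should be at rounding level. The underlying algebra is the identity $\sinh\nu(x_j - x_k + i\beta g) = \tfrac{1}{2}e^{-\nu(x_j + x_k)}\left(a_j q - a_k q^{-1}\right)$ with $a_j = e^{2\nu x_j}$ and $q = e^{i\beta\nu g}$, which turns the assumed $L$ into $L_{jk} = 2\sinh(i\beta\nu g)\,e_j e_k/(a_j q - a_k q^{-1})$.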
With the action-angle transform under explicit control, much more can be said about the solutions to Hamilton's equations for each of the commuting Hamiltonians, both as regards finite times and as regards long-time asymptotics (scattering). It is beyond the scope of this article to enlarge on this, but it is worth mentioning that the scattering reveals the solitonic character of the particles. Indeed, the set of asymptotic momenta $\hat{p}_1, \ldots, \hat{p}_N$ is conserved under the scattering and the asymptotic position shifts are factorized in terms of pair shifts. A quite remarkable feature of the type I systems is that the shifts actually vanish (billiard ball scattering).

existence of joint eigenfunctions has been shown, but also because in the relativistic case the unitarity of the type II and type IV eigenfunction transforms already breaks down for $N = 2$ when $g$ increases beyond a critical value, cf. [57] below. It is quite likely that this happens for $N > 2$ as well, but this is not readily apparent from the current fragmentary knowledge on joint eigenfunctions for $N > 2$.

The only two cases where the $g > 0$ joint eigenfunction transform is of an elementary nature are the $\mathrm{III}_{\mathrm{nr}}$ and $\mathrm{III}_{\mathrm{rel}}$ cases. Indeed, the joint eigenfunctions describing the internal motion are of the form

defining Hamiltonian is changed from $g^2$ to $g(g - \hbar)$ (a change already encountered above), Hermite's results apply to couplings $g = l\hbar$, $l = 2, 3, 4, \ldots$ His eigenfunctions have a structure that is nowadays referred to as the Bethe ansatz. For the same $g$ values and arbitrary $N$, $H_{\mathrm{nr}}$ eigenfunctions of Bethe ansatz type were found and studied by Felder and Varchenko, but even for these $g$ values much remains to be done to achieve a complete understanding of the type IV transform.

A quite different approach, due to Komori and Takemura, does yield rather detailed information on the type IV transform for arbitrary $g > 0$. The key feature of their strategy is to view the $\mathrm{IV}_{\mathrm{nr}}$ case as a perturbation of the $\mathrm{III}_{\mathrm{nr}}$ case. This entails, however, that the validity of their results is restricted to large imaginary period of the $\wp$-function.

For the $\mathrm{IV}_{\mathrm{rel}}$ system, there are only rather complete results on the type IV transform for $N = 2$. More specifically, the eigenfunction transform is known to be unitary for

$$g \in [0, \hbar + \pi/\beta\nu] \tag{57}$$

and a dense set in a corresponding parameter space. (For $g$ outside this interval, unitarity is violated.) The kernel of the type IV transform involves eigenfunctions of Bethe ansatz structure. For $g = l\hbar$, $l = 2, 3, \ldots$ and arbitrary $N$, Bethe ansatz type $H_{\mathrm{rel}}$ eigenfunctions were found by Billey, generalizing the Felder–Varchenko results mentioned above.

It remains to discuss the $\mathrm{I}_{\mathrm{rel}}$ and $\mathrm{II}_{\mathrm{rel}}$ systems. To this end, we first recall the classical dualities [46]. It is natural to expect that these dualities are still present at the quantum level. For the $\mathrm{I}_{\mathrm{nr}}$ case, this is readily confirmed: the transform is indeed invariant under interchange of $x$ and $p$. In fact, the $N = 2$

To conclude, we mention that the soliton scattering behavior at the classical level is preserved under quantization in all cases where this can be checked. That is, no new momenta are created in the scattering process and the S-matrix is factorized as a product of pair S-matrices. Moreover, for the type I cases, the S-matrix is a momentum-independent (but $g$-dependent) phase, as a quantum analog of the classical billiard ball scattering.

See also: Bethe Ansatz; Classical r-Matrices, Lie Bialgebras, and Poisson Lie Groups; Functional Equations and Integrable Systems; Integrable Discrete Systems; Integrable Systems and Algebraic Geometry; Integrable Systems in Random Matrix Theory; Integrable Systems: Overview; Isochronous Systems; Ordinary Special Functions; q-Special Functions; Quantum Calogero–Moser Systems; Seiberg–Witten Theory; Separation of Variables for Differential Equations; Sine-Gordon Equation; Toda Lattices.

Further Reading

Babelon O, Bernard D, and Talon M (2003) Introduction to Classical Integrable Systems. Cambridge: Cambridge University Press.
Calogero F (1971) Solution of the one-dimensional N-body problem with quadratic and/or inversely quadratic pair potentials. Journal of Mathematical Physics 12: 419–436.
Calogero F (2001) Classical Many-Body Problems Amenable to Exact Treatments. Berlin: Springer.
van Diejen JF and Vinet L (eds.) (2000) Calogero–Moser–Sutherland Models. Berlin: Springer.
Fock V, Gorsky A, Nekrasov N, and Rubtsov V (2000) Duality in integrable systems and gauge theories. Journal of High Energy Physics 7(28): 1–39.
Marshakov A (1999) Seiberg–Witten Theory and Integrable Systems. Singapore: World Scientific.
center-of-mass Hankel transform even depends only Moser J (1975) Three integrable Hamiltonian systems connected
with isospectral deformations. Advances in Mathematics
on (x1 x2 )(p1 p2 ), so that self-duality is manifest 16: 197220.
in this case. Olshanetsky MA and Perelomov AM (1981) Classical integrable
More generally, for N = 2 the expected dualities finite-dimensional systems related to Lie algebras. Physics
[46] are indeed present. The IInr 2 F1 transform Reports 71: 313400.
satisfies the Irel analytic difference equation in p1 Olshanetsky MA and Perelomov AM (1983) Quantum integrable
systems related to Lie algebras. Physics Reports 94: 313404.
p2 due to the contiguous relations obeyed by 2 F1 . The Ruijsenaars SNM (1987) Complete integrability of relativistic
IIrel transform is only unitary when g is restricted by CalogeroMoser systems and elliptic function identities.
[57], and it is indeed self-dual in the same sense as the Communications in Mathematical Physics 110: 191213.
action-angle map (Ruijsenaars). Ruijsenaars SNM (1999) Systems of CalogeroMoser type. In:
Turning finally to the case N > 2, the multi-variable Semenoff G and Vinet L (eds.) Proceedings of the 1994 Banff
Summer School Particles and Fields, pp. 251352. Berlin:
hypergeometric transform II does have the expected Springer.
duality property. More specifically, its inverse diag- Ruijsenaars SNM and Schneider H (1986) A new class of
onalizes the commuting Irel AOs (Chalykh). For IIrel integrable systems and its relation to solitons. Annals of
with N > 2 and g = l h, l = 2, 3, . . . , Chalykh also Physics (NY) 170: 370405.
finds elementary joint eigenfunctions with the Sutherland B (1972) Exact results for a quantum many-body
problem in one dimension II. Physical Review A
expected self-duality. To date, no Hilbert space results 5: 13721376.
for the N > 2 IIrel case have been obtained.
412 Canonical General Relativity
view of its application to GR. Dirac's theory is beautiful, finds vast applications, and it is still commonly taken as the basis to discuss Hamiltonian GR, although GR does not fit very naturally into Dirac's scheme. In the following, only the part of Dirac's theory relevant for GR is summarized.
Consider a Lagrangian system with Lagrangian variables $q^i$, with $i = 1, \ldots, n$. Call $v^i$ the corresponding velocities. Let the system be defined by the Lagrangian $L(q^i, v^i)$. The momenta are defined as functions of $q^i$ and $v^i$ by $p_i(q^i, v^i) = \partial L(q^i, v^i)/\partial v^i$. The canonical Hamiltonian $H(q^i, p_i) = v^i(q^i, p_i)\,p_i - L(q^i, v^i(q^i, p_i))$ (summation over repeated indices is understood) is obtained by inverting the function $p_i(q^i, v^i)$ and expressing the velocities as functions of the momenta, $v^i(q^i, p_i)$. The phase space $\Gamma_0$ is the space of the variables $(q^i, p_i)$. Infinitesimal time evolution is given by the vector field $V = v^i(q^i, p_i)\,\partial/\partial q^i + f_i(q^i, p_i)\,\partial/\partial p_i$, where velocities and forces are given by the Hamilton equations $v^i = \partial H/\partial p_i$ and $f_i = -\partial H/\partial q^i$.
More formally, the 2-form $\omega = dp_i \wedge dq^i$ endows $\Gamma_0$ with a symplectic structure. In the presence of such a structure, every function $A$ determines a vector field $V_A$, defined by $i_{V_A}\omega = dA$. By integrating this field, we have a flow in $\Gamma_0$, called the flow generated by $A$. Time evolution is the flow generated by the Hamiltonian. Given two functions $A$ and $B$, their Poisson brackets are defined by the function $\{A, B\} = V_A(B) = -V_B(A)$. Therefore, the time evolution of an observable $A$ satisfies $dA/dt = \{A, H\}$. A dynamical system is completely characterized by the set $(\Gamma_0, \omega, \mathcal{A}, H)$, where $\mathcal{A} = (A_1, \ldots, A_N)$ is the ensemble of the observables.
A constrained system, in the sense of Dirac, is a system for which the image of the function $v^i \to p_i(q^i, v^i)$ is smaller than $\mathbb{R}^n$. We can characterize the image $I$ of the map $(q^i, v^i) \to (q^i, p_i)$ with a set of equations on $\Gamma_0$
$$C(q^i, p_i) = 0 \qquad [1]$$
A constrained system is first class if the Poisson brackets of the constraints among themselves vanish weakly. Maxwell theory and GR are first-class constrained systems. In a first-class constrained system, the constraints generate flows that preserve $C$ and foliate it into orbits. The space $\Gamma_{\rm ph}$ of these orbits is called the physical phase space (see Figure 1).
This flow is interpreted as a gauge transformation, namely as a change of mathematical description of the same physical state. As first observed by Dirac, such an interpretation is necessary if we demand a deterministic physical evolution, for the following reason. A first-class constrained system is a system in which the time evolution $q^i(t)$ of the Lagrangian variables is not completely determined by the equations of motion. (The relation between constraints and underdetermination of the evolution is simple to understand. In a Lagrangian system, the number of equations of motion is equal to the number of Lagrangian variables. If one of these equations is a constraint (between the initial velocities and initial coordinates), then one evolution equation is missing.) To recover a deterministic physical evolution, we must interpret two mathematical states that can evolve from the same initial data as describing the same physical state. As shown by Dirac, the transformations generated by the constraints are precisely the ones that implement such an identification.
It follows that the physical states must be identified with the equivalence classes of the points of $C$ under the gauge transformations generated by the constraints, namely with the orbits of their flow. It is easy to show that (locally) there is a unique symplectic 2-form $\omega_{\rm ph}$ on $\Gamma_{\rm ph}$ such that its pullback to $C$ is equal to the pullback of $\omega$ to $C$ ($i^*\omega = \pi^*\omega_{\rm ph}$, see Figure 1). Physical observables $A_{\rm ph}$ are functions on $C$ that are gauge invariant, namely constant on
the orbits. That is, they are functions on $\Gamma_{\rm ph}$. The Hamiltonian is a physical observable. The dynamical system $(\Gamma_{\rm ph}, \omega_{\rm ph}, \mathcal{A}_{\rm ph}, H)$, where $\mathcal{A}_{\rm ph}$ is the ensemble of the physical observables, is a complete description of the physical system, called the gauge-invariant formulation, with no more constraints or gauges.
For instance, the phase space of Maxwell theory is coordinatized by the Maxwell potential $A_\mu(x)$, $\mu = 0, 1, 2, 3$, and its conjugate momentum $E^\mu(x)$. Since the time derivative of $A_0$ does not appear in the Maxwell action, the primary constraint is
$$E^0(x) = 0 \qquad [2]$$
The secondary constraint turns out to be the Gauss law,
$$\partial_a E^a(x) = 0 \qquad [3]$$
where $a = 1, 2, 3$. The first generates arbitrary transformations of $A_0$, while the second generates the time-independent gauge transformations $\delta A_a(x) = \partial_a \lambda(x)$. The pair $(A_0, E^0)$ can be dropped altogether, since it is formed by a pure gauge variable and a variable constrained to vanish. The (gauge-invariant) Hamiltonian is $H = (1/8\pi) \int d^3x\,(E_a E^a + B_a B^a)$, where $B^a = \epsilon^{abc}\partial_b A_c$ is the magnetic field and $E^a$ is easily recognized as the electric field. $E^a$ and $B^a$ are the physical observables.
General Structure of GR Constraints
GR fits into Dirac theory with a certain difficulty. Since the constraints are the generators of the gauge invariances, it is easy to determine their structure in GR. The gauge invariances of GR are given by the coordinate transformations $x^\mu \to x'^\mu = f^\mu(x)$, where $x = (\vec{x}, t)$. Accordingly, we have four primary constraints $\pi^\mu = 0$, analogous to [2], and four secondary constraints $C^\mu(x) = 0$, analogous to [3]. These are usually separated into the three momentum constraints
$$C_a(x) = 0 \qquad [4]$$
which generate fixed-time spatial coordinate transformations and the Hamiltonian constraint
$$C(x) = 0 \qquad [5]$$
which generates changes in the $t$ coordinate.
The metric $g_{\mu\nu}(x)$ that represents the gravitational field in Einstein's original formulation has ten independent components per point. Each first-class constraint indicates that one Lagrangian variable is a gauge degree of freedom. The physical degrees of freedom of GR are therefore $(10 - 4 - 4) = 2$ per point. In the linearized theory, these are the two degrees of freedom that describe the two polarizations of a gravitational wave of given momentum. Formulations of GR in which there are additional gauge invariances (such as Cartan's tetrad formulation, see below) have, accordingly, more constraints.
Since the Hamiltonian generates evolution in the Lagrangian evolution parameter $t$, and since such evolution can be obtained as a gauge transformation, it follows that the Hamiltonian is a constraint in GR. The vanishing of the Hamiltonian is a characteristic feature of general-relativistic systems.
The Hamiltonian structure of GR is therefore determined by its phase space and its constraints. The gauge-invariant formulation of the theory is given just by the set $(\Gamma_{\rm ph}, \omega_{\rm ph}, \mathcal{A}_{\rm ph})$ and no Hamiltonian. The physical interpretation of this structure is discussed in the last section.
ADM Formalism
In Einstein's formulation, the Lagrangian variable of GR is the metric field $g_{\mu\nu}(\vec{x}, t)$ (here we use the signature $[-, +, +, +]$). Arnowitt, Deser, and Misner have introduced the following change of variables:
$$q_{ab} = g_{ab}, \qquad N = 1/\sqrt{-g^{00}}, \qquad N^a = q^{ab} g_{b0} \qquad [6]$$
where $q^{ab}$ is the inverse of the three-dimensional metric $q_{ab}$, used henceforth to raise and lower space indices $a, b = 1, 2, 3$. This is equivalent to writing the invariant interval in the form
$$ds^2 = -N^2 dt^2 + q_{ab}\,(dx^a + N^a dt)(dx^b + N^b dt)$$
These variables have an interesting geometric interpretation. Consider a family of spacelike ("ADM") surfaces $\Sigma_t$ defined by $t$ = constant. $q_{ab}$ is the 3-metric induced on the surface. $N$ is called the lapse function and $N^a$ is called the shift function. Their geometrical interpretation is illustrated in Figure 2.
When written in terms of these variables, the action of GR takes the form
$$S[q_{ab}, N, N^a] = \int d^4x\, \sqrt{q}\, N\,\big(R + k_{ab} k^{ab} - k^2\big)$$
where $q = \det q_{ab}$ and $R$ are the determinant and the Ricci scalar of the metric $q_{ab}$;
$$k_{ab} = \frac{1}{2N}\big(\partial_t q_{ab} - D_a N_b - D_b N_a\big)$$
is the extrinsic curvature of the constant time surface; and $D_a$ is the covariant derivative of $q_{ab}$. This action is independent of the time derivatives of
$N$ and $N^a$. The conjugate momenta $\pi$ and $\pi_a$ of these quantities are therefore the primary constraints and the pairs $(\pi, N)$ and $(\pi_a, N^a)$ can be taken out of the phase space as for the pair $(E^0, A_0)$ in the Maxwell example. We can therefore take the 3-metric $q_{ab}(x)$ and its conjugate momentum $p^{ab}(x)$ as the canonical variables of GR. The momentum is related to the velocity $\partial_t q_{ab}$ by
$$p^{ab} = \sqrt{q}\,\big(k^{ab} - k\, q^{ab}\big)$$
where $k = k_{ab} q^{ab}$.
The secondary constraints [4] and [5] turn out to be
$$C_a = \sqrt{q}\, D_b\!\left(\frac{1}{\sqrt{q}}\, p^b{}_a\right) = 0 \qquad [7]$$
and
$$C = \frac{1}{\sqrt{q}}\left(p_{ab} p^{ab} - \frac{1}{2} p^2\right) - \sqrt{q}\, R = 0 \qquad [8]$$
where $p = p^{ab} q_{ab}$.
If the two fields $q_{ab}(x, t)$ and $p^{ab}(x, t)$ satisfy the Hamilton equations
$$\frac{\partial q_{ab}(x, t)}{\partial t} = \{q_{ab}(x, t), H(t)\} \qquad [9]$$
$$\frac{\partial p^{ab}(x, t)}{\partial t} = \{p^{ab}(x, t), H(t)\} \qquad [10]$$
where
$$H(t) = \int d^3x\, \big[N(x, t)\, C(q_{ab}(x, t), p^{ab}(x, t)) + N^a(x, t)\, C_a(q_{ab}(x, t), p^{ab}(x, t))\big]$$
with arbitrary functions $N(x, t)$, $N^a(x, t)$, then the metric $g_{\mu\nu}(x, t)$, defined from $q_{ab}$, $N$, $N^a$ by eqn [6], is the general solution of the vacuum Einstein equation Ricci$[g] = 0$. Therefore, these equations provide a Hamiltonian form of the Einstein field equation.
Figure 2 The geometrical interpretation of the lapse $N(x, t)$ and shift $N^a(x, t)$ fields. Two ADM surfaces, defined by the values $t$ and $t + dt$, are displayed. $N(x, t)\,dt$ is the proper length of the vector joining the two surfaces, normal to the first surface at $(x, t)$. This is the proper time elapsed between the two surfaces for an observer at rest on the first surface at $(x, t)$. The quantity $dx^a = N^a(x, t)\,dt$ is the shift (the displacement) between the endpoint of this vector and the point $(x, t + dt)$ having the same spatial coordinates as $(x, t)$.
Tetrad Formalism
The tetrad formalism, developed by Cartan, Weyl, and Schwinger, has definite advantages with respect to the metric formalism. It allows the coupling of fermion fields to GR and is, therefore, needed to couple the standard model to GR. In the tetrad formalism, the gravitational field is represented by four covariant fields $e^I_\mu(x)$, where $I, J, \ldots = 0, 1, 2, 3$ are flat Lorentz indices raised and lowered with the Minkowski metric $\eta_{IJ} = {\rm diag}[-1, 1, 1, 1]$. The relation with the metric formalism is given by
$$g_{\mu\nu} = \eta_{IJ}\, e^I_\mu e^J_\nu$$
In this formulation, GR has an additional local SO(3,1) gauge invariance, given by local Lorentz transformations on the $I$ indices. The corresponding canonical formalism is usually defined in a gauge in which $e^0_i = 0$, where $i, j, \ldots = 1, 2, 3$ are flat three-dimensional indices raised and lowered with the $\delta_{ij} = {\rm diag}[1, 1, 1]$. In this gauge, the Lorentz group is reduced to the local SO(3) group of spatial transformations, and the ADM variables are defined by
$$e^I_\mu(x) = \begin{pmatrix} N & N^i \\ 0 & e^i_a \end{pmatrix} \qquad [11]$$
where $N^i = e^i_a N^a$. This is equivalent to writing the invariant interval in the form
$$ds^2 = -N^2 dt^2 + (e_{ai}\, dx^a + N_i\, dt)(e^i_b\, dx^b + N^i\, dt)$$
The reduced canonical variables can be taken to be the field $e^i_a(x)$ that represents the triad of the ADM surface, and its conjugate momentum $p^a_i(x)$. Their relation with the three-dimensional metric variables is given by transforming internal indices into tangent indices with the triad field $e^i_a$ and its inverse $e^a_i$. In particular,
$$q_{ab} = \delta_{ij}\, e^i_a e^j_b \qquad [12]$$
$$p^{ab} = e^b_i\, p^{ai} \qquad [13]$$
Also, for later reference,
$$k^i_a = e^{ib} k_{ab} = \frac{2}{\det e}\left(p^i_a - \frac{1}{2}\, e^i_a\, p\right) \qquad [14]$$
where $p = e^i_a p^a_i$.
The momentum and Hamiltonian constraints are the same as in the ADM formulation, with $q_{ab}$ and $p^{ab}$ expressed in terms of the triad variables. The additional constraint that generates the internal rotations is
$$G_i = \epsilon_{ijk}\, e^j_a\, p^{ak} = 0 \qquad [15]$$
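The relations [6], [11], and [12] between the ADM variables, the triad, and the spacetime metric can be checked numerically. The following Python sketch is an illustration under stated assumptions, not part of the formalism itself: the function names `tetrad_from_adm` and `metric_from_tetrad`, the sample values of the lapse, shift, and triad at a single point, and the use of NumPy are all choices made here. It builds the tetrad in the gauge of eqn [11], forms the metric from the tetrad with the Minkowski metric diag[-1, 1, 1, 1], and compares the result with the ADM form of the invariant interval.

```python
import numpy as np

def tetrad_from_adm(lapse, shift_up, triad):
    """Tetrad e[I, mu] at one point, in the gauge of eqn [11].

    triad[i, a] holds e^i_a and shift_up[a] holds N^a (hypothetical layout).
    """
    e = np.zeros((4, 4))
    e[0, 0] = lapse                 # e^0_0 = N
    e[1:, 0] = triad @ shift_up     # e^i_0 = N^i = e^i_a N^a
    e[1:, 1:] = triad               # e^i_a; the entries e^0_a remain zero
    return e

def metric_from_tetrad(e):
    """g_{mu nu} = eta_{IJ} e^I_mu e^J_nu with eta = diag[-1, 1, 1, 1]."""
    eta = np.diag([-1.0, 1.0, 1.0, 1.0])
    return e.T @ eta @ e

# Arbitrary sample data at one spacetime point (assumed values).
rng = np.random.default_rng(0)
triad = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
lapse = 1.3
shift_up = np.array([0.2, -0.1, 0.05])

e = tetrad_from_adm(lapse, shift_up, triad)
g = metric_from_tetrad(e)

q = triad.T @ triad          # eqn [12]: q_ab = delta_ij e^i_a e^j_b
shift_down = q @ shift_up    # N_a = q_ab N^b
g_inv = np.linalg.inv(g)

# Recover the ADM variables from the metric, as in eqn [6].
lapse_from_g = 1.0 / np.sqrt(-g_inv[0, 0])       # N = 1 / sqrt(-g^00)
shift_from_g = np.linalg.solve(q, g[1:, 0])      # N^a = q^ab g_b0
```

With these sample values the spatial block of `g` coincides with `q`, the mixed components `g[0, 1:]` coincide with `shift_down`, and `g[0, 0]` equals $-N^2 + N_a N^a$, which is the content of the ADM form of $ds^2$.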
$(q_{ab}(x), p^{ab}(x))$ can take on arbitrary spacelike ADM surfaces embedded in a given solution of the Einstein equation. Motion along the orbit (which has dimension $4 \times \infty^3$) corresponds to arbitrary deformations of the surface.
Physical applications of classical GR deal with relations between partial observables. A partial observable is any variable physical quantity that can be measured, even if its value cannot be determined from the knowledge of the physical state. An example of partial observable in nonrelativistic mechanics is given precisely by the nonrelativistic time $t$. Partial observables are represented in GR as functions on $\Gamma_0$. A physical state in $\Gamma_{\rm ph}$ determines an orbit in $C$, and therefore a set of relations between partial observables (see Figure 1). That is, it determines the possible values that the partial observables can take "when and where" other partial observables have given values. All physical predictions of classical GR can be expressed in this form.
One of the partial observables can be selected to play the role of a physical clock time, and evolution can be expressed in terms of such clock time. In general, it is difficult if not impossible to find a clock time observable in terms of which evolution is a proper conventional Hamiltonian evolution. Matter couplings partially simplify the task. For instance, if the motion of planet Earth is coupled to GR, then proper time along this motion from a significant event on Earth, which is a partial observable, can be a convenient clock time. In pure gravity, the York time defined as the trace of the extrinsic curvature $T_Y = k$, on ADM surfaces where $k$ is spatially constant, has been extensively and effectively used as a clock time in formal analysis of the theory. A Hamiltonian that generates evolution in a given clock time $T$ can be formally obtained by solving the Hamiltonian constraint with respect to a momentum $P_T$ conjugate to $T$. Such reparametrizations of the relative evolution of the partial observables can be useful to analyze equations and to help intuition, but they are by no means necessary to have a well-defined interpretation of the theory.
Another possibility to introduce a preferred time flow is to consider asymptotically flat solutions of the field equations. In this case, one can define a nonvanishing Hamiltonian, given by a boundary integral at spatial infinity. This Hamiltonian generates evolution in an asymptotic Minkowski time. This choice is convenient for describing observations performed from a large distance on isolated gravitational systems. Many general-relativistic physical observations do not belong to this category.
Various other techniques to define a fully generally covariant canonical formalism have been explored. Among these: definitions of the physical symplectic structure directly on the space of the solutions of the field equations; generalization of the initial and final surfaces to boundaries of compact spacetime regions; construction of "evolving constants of motion", namely families of gauge-invariant observables depending on a clock time parameter; multisymplectic formalisms that treat space and time derivatives on a more equal footing; and others. Many of these techniques are attempts to overcome the unequal way in which time and space dependence are treated in the conventional Hamiltonian formalism.
GR has deeply modified our understanding of space and time. An extension of the canonical formalism of mechanics, compatible with such a modification, is needed, but consensus on the way (or even the possibility) of formulating a fully satisfactory general-relativistic extension of Hamiltonian mechanics is still lacking.
See also: Asymptotic Structure and Conformal Infinity; Constrained Systems; General Relativity: Overview; Loop Quantum Gravity; Quantum Cosmology; Quantum Geometry and its Applications; Spin Foams; Wheeler–De Witt Theory.
Further Reading
Arnowitt R, Deser S, and Misner CW (1962) The dynamics of general relativity. In: Witten L (ed.) Gravitation: An Introduction to Current Research, p. 227. New York: Wiley.
Ashtekar A (1991) Non-Perturbative Canonical Gravity. Singapore: World Scientific.
Bergmann P (1989) The canonical formulation of general relativistic theories: the early years, 1930–1959. In: Howard D and Stachel J (eds.) Einstein and the History of General Relativity. Boston: Birkhauser.
Dirac PAM (1950) Generalized Hamiltonian dynamics. Canadian Journal of Mathematical Physics 2: 129–148.
Dirac PAM (1958) The theory of gravitation in Hamiltonian form. Proceedings of the Royal Society of London, Series A 246: 333.
Dirac PAM (1964) Lectures on Quantum Mechanics. New York: Belfer Graduate School of Science, Yeshiva University.
Gotay MJ, Isenberg J, Marsden JE, and Montgomery R (1998) Momentum maps and classical relativistic fields. Part 1: Covariant field theory. Archives: physics/9801019.
Hanson A, Regge T, and Teitelboim C (1976) Constrained Hamiltonian Systems. Rome: Academia Nazionale dei Lincei.
Henneaux M and Teitelboim C (1972) Quantization of Gauge Systems. Princeton: Princeton University Press.
Isham CJ (1993) Canonical quantum gravity and the problem of time. In: Ibort LA and Rodriguez MA (eds.) Recent Problems in Mathematical Physics, Salamanca, Dordrecht: Kluwer Academic.
Lagrange JL (1808) Memories de la premiere classe des sciences mathematiques et physiques. Paris: Institute de France.
Rovelli C (2004) Quantum Gravity. Cambridge: Cambridge University Press.
Souriau JM (1969) Structure des Systemes Dynamics. Paris: Dunod.
418 Capacities Enhanced by Entanglement
that with probability $p$ completely randomizes the input but otherwise leaves the input invariant. For such channels, the maximum is achieved by choosing a maximally entangled state for $|\phi\rangle^{AA'}$, yielding
$$C_E(\mathcal{D}_p) = 2 \log_2 d - h_{d^2}\!\left(1 - p\,\frac{d^2 - 1}{d^2}\right) \qquad [7]$$
where for any $0 \le q \le 1$ and integer $r > 1$,
$$h_r(q) = -q \log_2 q - (1 - q) \log_2 \frac{1 - q}{r - 1} \qquad [8]$$
is the Shannon entropy of the distribution $(q, (1 - q)/(r - 1), \ldots, (1 - q)/(r - 1))$.
Entanglement assistance also simplifies the relationship between the classical and quantum
Figure 2 Circuit representation of the elements of an entanglement-assisted quantum code for the channel $\mathcal{N}$. $\mathcal{E}$ is Alice's encoding operation, which acts on both her input state and her half of the shared entanglement. Bob decodes using a quantum operation $\mathcal{D}$ acting on the output of the channel and his half of the shared entanglement.
This equivalence is a direct consequence of the existence of the teleportation and superdense coding protocols. When maximal entanglement is available, teleportation converts the ability to send classical data into the ability to send quantum data at half the classical rate. Conversely, by consuming
maximal entanglement, superdense coding converts the ability to send quantum data into the ability to send classical data at double the quantum rate.
Sketch of Proof
The proof of a capacity theorem can usually be broken into two parts, achievability and optimality. The achievability part demonstrates the existence of a sequence of codes reaching the prescribed rate while the optimality part shows that it is impossible to do better.
The main idea in the achievability proof can be understood by studying the special case where $\phi^{A'} = \pi^{A'}$. Let $d_{A'} = \dim A'$ and $\{U_j\}_{j=1}^{d_{A'}^{2n}}$ be a set of Weyl operators for $A'^n$. The relevant property of these operators is that averaging over them implements the constant map: for all density operators $\rho$,
$$\frac{1}{d_{A'}^{2n}} \sum_{j=1}^{d_{A'}^{2n}} U_j\, \rho\, U_j^\dagger = \pi^{A'^n} \qquad [11]$$
Consider the state $\sigma_j$ that arises if Alice acts with $U_j$ on the $A'^n$ half of a rank-$d_{A'}^n$ maximally entangled state $|\Phi\rangle^{AA'^n}$ and then sends the $A'^n$ half of the resulting state through $\mathcal{N}$. (Note that here $A'^n$ also plays the role of $\tilde{A}$.) The entropy of the resulting state is
$$H(\sigma_j) = H\big(\mathcal{N}\big((U_j \otimes I_{\tilde{B}})\, \Phi\, (U_j^\dagger \otimes I_{\tilde{B}})\big)\big) \qquad [12]$$
$$= H(\mathcal{N}(\Phi)) \qquad [13]$$
since $U_j$ does not change the local density operator on $A'^n$.
On the other hand, if Alice selects a value of $j$ from the uniform distribution, then the resulting average input state to the channel will be
$$\pi^{AA'^n} = \pi^{A} \otimes \pi^{A'^n} \qquad [14]$$
and the corresponding average output state will be $\mathcal{N}(\pi^{A'^n}) \otimes \pi^{A}$, which has entropy
$$H(\mathcal{N}(\pi^{A'^n})) + H(\pi^{A}) \qquad [15]$$
Therefore, the Holevo quantity of the ensemble of output states, defined as the entropy of the average state minus the average of the entropies of the individual output states, will be equal to
$$H(\pi^{A}) + H(\mathcal{N}(\pi^{A'^n})) - H(\mathcal{N}(\Phi^{AA'^n})) \qquad [16]$$
This is precisely the quantity $I(A; B)$ for the state $\mathcal{N}(\Phi^{AA'^n})$ since the channel $\mathcal{N}$ transforms the $A'^n$ system into $B$. Moreover, if Bob is given the $A$ part of the maximally entangled state, then this is the Holevo quantity of an ensemble of states that can be produced by Alice acting on half of a shared entangled state and then sending her half through the channel. Invoking the Holevo–Schumacher–Westmoreland (HSW) theorem for the classical capacity (Holevo 1998, Schumacher and Westmoreland 1997) therefore completes the proof; using coding, the Holevo quantity is an achievable communication rate.
The proof that eqn [5] is optimal involves a series of entropy manipulations similar to the optimality proofs for the unassisted classical and quantum capacities. From the point of view of quantum information, the truly unusual part of the proof is the demonstration that it is unnecessary to consider multiple copies of $\mathcal{N}$ (Cerf and Adami 1997). Specifically, let
$$f(\mathcal{N}) = \max_{\phi} I(A; B)_\sigma \qquad [17]$$
Techniques analogous to those used for the unassisted capacities yield the upper bound
$$C_E(\mathcal{N}) \le \lim_{n \to \infty} \frac{1}{n}\, f(\mathcal{N}^{\otimes n}) \qquad [18]$$
Unlike the unassisted case, however, a relatively easy argument shows that
$$f(\mathcal{N}_1 \otimes \mathcal{N}_2) = f(\mathcal{N}_1) + f(\mathcal{N}_2) \qquad [19]$$
(The analogous statement is an important conjecture for the classical capacity and is known to be false for the quantum capacity (DiVincenzo et al. 1998).) As a result, $C_E(\mathcal{N}) \le f(\mathcal{N})$, which is the optimality part of Theorem 1.
To see the origin of eqn [19], it will be helpful to invoke Stinespring's theorem to write $\mathcal{N}_j = {\rm tr}_{E_j} U_j^{B_j E_j}$, where $U_j : A'_j \to B_j E_j$ is an isometry. Fix a state $|\phi\rangle^{AA'_1 A'_2}$ and let $\sigma = (U_1 \otimes U_2)(\phi)$. Equation [19] follows from the fact that
$$I(A; B_1 B_2)_\sigma \le I(A B_2 E_2; B_1)_\sigma + I(A B_1 E_1; B_2)_\sigma \qquad [20]$$
Simply redefining $A$ to be $A B_2 E_2$ shows that the first term of the right-hand side is upper bounded by $f(\mathcal{N}_1)$. The second term, likewise, is upper bounded by $f(\mathcal{N}_2)$. Equation [20] is itself equivalent to the inequality
$$H(B_1 B_2 | E_1 E_2) + H(B_1 B_2) \le H(B_1 | E_1) + H(B_2 | E_2) + H(B_1) + H(B_2) \qquad [21]$$
The inequality $H(B_1 B_2) \le H(B_1) + H(B_2)$ holds by the subadditivity of the von Neumann entropy.
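The entropy inequalities above can be tested on a random example. The following Python sketch is only an illustration under stated assumptions: each of the systems $A$, $B_1$, $E_1$, $B_2$, $E_2$ is taken to be a single qubit, the purified output state is replaced by a randomly drawn pure state on the five systems, and the helper names `entropy`, `reduced`, `H`, and `mutual_information` are introduced here. It evaluates both sides of eqn [20] and of eqn [21] (written as a sum of conditional and unconditional entropies, which is what expanding the mutual informations gives for a pure global state) and checks subadditivity.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy of a density matrix, in bits."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]          # discard numerically zero eigenvalues
    return float(-np.sum(evals * np.log2(evals)))

def reduced(psi, dims, keep):
    """Reduced density matrix of the pure state psi on the subsystems in keep."""
    keep = sorted(keep)
    rest = [k for k in range(len(dims)) if k not in keep]
    tensor = psi.reshape(dims).transpose(keep + rest)
    d_keep = int(np.prod([dims[k] for k in keep]))
    m = tensor.reshape(d_keep, -1)
    return m @ m.conj().T                 # trace out the remaining subsystems

def H(psi, dims, keep):
    return entropy(reduced(psi, dims, keep))

def mutual_information(psi, dims, x, y):
    return H(psi, dims, x) + H(psi, dims, y) - H(psi, dims, x + y)

# Subsystem labels (assumed ordering) and a random pure state on five qubits.
A, B1, E1, B2, E2 = range(5)
dims = [2] * 5
rng = np.random.default_rng(1)
psi = rng.standard_normal(32) + 1j * rng.standard_normal(32)
psi /= np.linalg.norm(psi)

# Both sides of eqn [20].
lhs20 = mutual_information(psi, dims, [A], [B1, B2])
rhs20 = (mutual_information(psi, dims, [A, B2, E2], [B1])
         + mutual_information(psi, dims, [A, B1, E1], [B2]))

# Both sides of eqn [21], using H(X|Y) = H(XY) - H(Y).
cond_joint = H(psi, dims, [B1, B2, E1, E2]) - H(psi, dims, [E1, E2])
cond_1 = H(psi, dims, [B1, E1]) - H(psi, dims, [E1])
cond_2 = H(psi, dims, [B2, E2]) - H(psi, dims, [E2])
lhs21 = cond_joint + H(psi, dims, [B1, B2])
rhs21 = cond_1 + cond_2 + H(psi, dims, [B1]) + H(psi, dims, [B2])
```

Because the global state is pure, `lhs20` agrees with `lhs21` and `rhs20` agrees with `rhs21` up to rounding, which is the sense in which the two inequalities are equivalent. A random state only illustrates the inequalities; it does not replace the proof.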
Repeated applications of the strong subadditivity inequality, moreover, lead to the inequality
$$H(B_1 B_2 | E_1 E_2) \le H(B_1 | E_1) + H(B_2 | E_2) \qquad [22]$$
Together, they prove eqn [20] and, thence, eqn [19]. The intuitive meaning of this single-letterization is unclear, but regardless, it is interesting to note that the proof involved invoking a pair of purifying environment systems, $E_1$ and $E_2$, and studying the entropy relationships between the true outputs of the channel and the environments' share.
The Quantum Reverse Shannon Theorem
A strong argument can be made that the entanglement-assisted capacity of a quantum channel is the most important capacity of that channel and that all the other capacities are, in some sense, of less significance. The fact that it is unnecessary to distinguish between the classical and quantum entanglement-assisted capacities because they are related by a factor of 2 is a hint in that direction, as is the simple, single-letter formula for $C_E(\mathcal{N})$.
A more general argument can be made by considering the problem of having one channel simulate another. Indeed, the quantum capacity of a quantum channel is simply the optimal rate at which that channel can simulate the noiseless channel ${\rm id}_2$ on a single qubit. Likewise, the classical capacity of a quantum channel is its optimal rate for simulation of a qubit dephasing channel
$$\rho \mapsto |0\rangle\langle 0|\, \rho\, |0\rangle\langle 0| + |1\rangle\langle 1|\, \rho\, |1\rangle\langle 1| \qquad [23]$$
In this spirit, the fact that $C_E(\mathcal{N}) = 2 Q_E(\mathcal{N})$ can be re-expressed in the form
$$Q_E(\mathcal{N}) = \frac{C_E(\mathcal{N})}{C_E({\rm id}_2)} \qquad [24]$$
Equivalently, when entanglement is free, the optimal rate at which $\mathcal{N}$ can simulate a noiseless qubit channel is given by the ratio between the entanglement-assisted classical capacities of $\mathcal{N}$ and ${\rm id}_2$. The
decoding will likewise be a TPCP map $\mathcal{D} : B^m \tilde{B} \to B^n$ acting on $m$ copies of the output of the channel, and his half of the shared entanglement, $\tilde{B}$. This procedure is said to $\epsilon$-simulate $\mathcal{N}_2^{\otimes n}$ on $(\phi^{A'})^{\otimes n}$ if
$$F\Big(\mathcal{N}_2^{\otimes n}\big((\phi^{AA'})^{\otimes n}\big),\ (\mathcal{D} \circ \mathcal{N}_1^{\otimes m} \circ \mathcal{E})\big(\Phi^{\tilde{A}\tilde{B}} \otimes (\phi^{AA'})^{\otimes n}\big)\Big) \ge 1 - \epsilon \qquad [25]$$
where $F$ is the mixed state fidelity $F(\rho, \sigma) = \big({\rm tr} \sqrt{\rho^{1/2} \sigma \rho^{1/2}}\big)^2$. The entire procedure, illustrated in Figure 3, is said to be a $(2^{nS}, m, n, \epsilon)$ entanglement-assisted simulation of $\mathcal{N}_2$ by $\mathcal{N}_1$. A rate $R$, measured in copies of $\mathcal{N}_2$ per copy of $\mathcal{N}_1$, is said to be achievable for $\phi^{A'}$ if there exists a choice of $S \ge 0$ and a sequence of $(2^{nS}, m_n, n, \epsilon_n)$ entanglement-assisted simulations with $n/m_n \to R$ while $\epsilon_n \to 0$.
The quantum reverse Shannon theorem states that the entanglement-assisted capacity completely governs the achievable simulation rates.
Theorem 2 (Winter 2004, Bennett et al.). Given two channels $\mathcal{N}_1 : A' \to B$ and $\mathcal{N}_2 : A' \to B$, $R$ is an achievable simulation rate for $\mathcal{N}_2$ by $\mathcal{N}_1$ and all input states $\phi^{A'}$ if and only if
$$R \le \frac{C_E(\mathcal{N}_1)}{C_E(\mathcal{N}_2)} \qquad [26]$$
Note that the form of eqn [26] ensures that the simulation is asymptotically reversible: if a channel $\mathcal{N}_1$ is used to simulate $\mathcal{N}_2$ and the simulation is then used to simulate $\mathcal{N}_1$ again, then the overall rate becomes
$$\frac{C_E(\mathcal{N}_1)}{C_E(\mathcal{N}_2)} \cdot \frac{C_E(\mathcal{N}_2)}{C_E(\mathcal{N}_1)} = 1 \qquad [27]$$
Thus, in the presence of free entanglement and for a known input density operator of the form $(\phi^{A'})^{\otimes n}$, a single parameter, the entanglement-assisted classical capacity, suffices to completely characterize the asymptotic properties of a quantum channel.
Moreover, since two channels that are asymptotically equivalent without free entanglement will surely remain equivalent if free entanglement is permitted, eqn [26] gives essentially the only possible nontrivial, single-parameter asymptotic characterization of quantum channels. This is the sense in which the entanglement-assisted capacity should be regarded as the most important capacity of a quantum channel.
The proof of the quantum reverse Shannon theorem is quite involved, but some of its features can be understood without much work. First, note that by the optimality statement of the entanglement-assisted classical capacity, the desired simulation can exist only if eqn [26] holds. Otherwise, composing the simulation of $\mathcal{N}_2$ by $\mathcal{N}_1$ with a sequence of codes achieving $C_E(\mathcal{N}_2)$ would result in a sequence of codes beating the capacity formula for $\mathcal{N}_1$.
Similarly, note that one method to simulate a channel $\mathcal{N}_1$ using $\mathcal{N}_2$ is to first use $\mathcal{N}_2$ to simulate the noiseless channel and then use the simulated noiseless channel to simulate $\mathcal{N}_1$. Since the achievable rates for the first step are characterized by the entanglement-assisted capacity theorem, proving the achievability part of Theorem 2 reduces to finding protocols for simulating a general noisy quantum channel $\mathcal{N}_1$ by a noiseless one. That perhaps sounds like a strange goal, but nonetheless is the difficult part of the quantum reverse Shannon theorem.
It is likely that the quantum reverse Shannon theorem can be extended to cover other types of inputs than the known tensor power states $(\phi^{A'})^{\otimes n}$. The most desirable form of the theorem would be one valid for all possible input density operators on $A'^n$, providing a single simulation procedure dependent only on the channels and not the input state. It is known that without modifying the form of the free entanglement, this most ambitious form of the theorem fails, but it is conjectured that the full-strength theorem does hold provided very large amounts of entanglement are supplied in the form of the so-called "embezzling" states (van Dam and Hayden 2003).
Relationships between Protocols
There is another sense in which the entanglement-assisted capacity can be viewed as the fundamental capacity of a quantum channel: an efficient protocol for achieving the entanglement-assisted capacity can be converted into protocols achieving the unassisted quantum and classical capacities, or at least very close variants thereof.
An efficient protocol in this case refers to one that does not waste entanglement. Suppose that $\mathcal{N} : A' \to B$ can be written ${\rm tr}_E U^{BE}$ for some isometry $U^{BE}$. Let $|\phi\rangle^{AA'}$ be a pure state and $|\sigma\rangle^{ABE} = U^{BE} |\phi\rangle^{AA'}$ the corresponding purified channel output state. Careful analysis of the entanglement-assisted classical communication protocol achieving the rate $I(A; B)$ leads to an entanglement-assisted quantum communication protocol consuming entanglement at the rate $(1/2) I(A; E)$ ebits per use of $\mathcal{N}$ and yielding communication at the rate of $(1/2) I(A; B)$ qubits per use of $\mathcal{N}$. The protocol achieving this goal is known as the "father" (Devetak et al. 2004).
If the entanglement consumed in the father were actually supplied by quantum communication from Alice to Bob, then the net rate of quantum communication produced by the resulting protocol would be $(1/2) I(A; B) - (1/2) I(A; E)$ qubits from Alice to Bob, that is, the total produced minus the total consumed.
This quantity, how much more information $B$ has about $A$ than $E$ does, can be simplified using an interesting identity. Since $|\sigma\rangle^{ABE}$ is pure,
$$I(A; E) = H(A) + H(E) - H(AE) \qquad [28]$$
$$= H(A) + H(AB) - H(B) \qquad [29]$$
Expanding $I(A; B)$ and canceling terms then reveals that
$$\tfrac{1}{2} I(A; B) - \tfrac{1}{2} I(A; E) = -H(A|B) = I_c(A \rangle B) \qquad [30]$$
where the function $I_c$ is known as the coherent information. After optimizing over input states and multiple channel uses, this is precisely the formula for the unassisted quantum capacity of a quantum channel (Devetak 2005). Thus, the net rate of qubit communication for the protocol derived from the father exactly matches the rates necessary to achieve the unassisted quantum capacity. The only caveat is that the protocol derived from the father uses quantum communication catalytically, meaning that some communication needs to be invested in order to get a gain of $I_c(A \rangle B)$. For the unassisted quantum capacity, no investment is necessary. Nonetheless, detailed analysis of the situation reveals that the amount of catalytic communication required can be reduced to an amount sublinear in the number of channel uses, meaning the rate of required investment can be made arbitrarily small. In this sense, the father protocol essentially generates the optimal protocols for the unassisted quantum capacity.
Protocols achieving the unassisted classical capacity can be constructed in a similar way. In this case, one starts from an ensemble $\mathcal{E} = \{p_j, \mathcal{N}(\phi_j^{A'})\}$ of states generated by the channel. Achievability of
the unassisted classical capacity formula follows from achievability of rates of the form
$$\chi(\mathcal E) = H\Big(\sum_j p_j\, \mathcal N(\varphi_j^{A'})\Big) - \sum_j p_j\, H\big(\mathcal N(\varphi_j^{A'})\big) \qquad [31]$$
for arbitrary ensembles of output states. Consider the channel
$$\tilde{\mathcal N}(\rho) = \sum_j \langle j|\rho|j\rangle\, \mathcal N(\varphi_j) \qquad [32]$$
and input state $|\varphi\rangle^{AA'} = \sum_j \sqrt{p_j}\, |j\rangle^{A} |j\rangle^{A'}$. If $\sigma = \tilde{\mathcal N}(\varphi)$, then $I(A;B)$, evaluated on $\sigma$, is equal to $\chi(\mathcal E)$. Thus, there are protocols consuming entanglement that achieve the classical communications rate $\chi(\mathcal E)$ for the modified channel $\tilde{\mathcal N}$. Because the channel $\tilde{\mathcal N}$ includes an orthonormal measurement which destroys all entanglement between $A$ and $B$, however, it can be argued that any entanglement used in such a protocol could be replaced by shared randomness, which could then in turn be eliminated by a standard derandomization argument. The net result is a procedure for choosing rate $\chi(\mathcal E)$ codes for the channel $\mathcal N$ consisting of states of the form $\varphi_{j_1} \otimes \cdots \otimes \varphi_{j_n}$, which is the essence of the achievability proof for the unassisted classical capacity.
This may seem like an unnecessarily cumbersome and even circular approach to the unassisted classical capacity given that the proof sketched above for the entanglement-assisted classical capacity itself invokes the unassisted result in the form of the HSW theorem. The approach becomes more satisfying when one learns that simple and direct proofs of the father protocol exist that completely bypass the HSW theorem (Abeyesinghe et al. 2005). Thus, the entanglement-assisted communication protocols can be easily transformed into their unassisted analogs, confirming the central place of entanglement-assisted communication in quantum information theory.
Acknowledgments
The author is grateful to the inventors of the quantum reverse Shannon theorem for letting him discuss their results prior to their publication and to Jon Yard for a careful reading of the manuscript. This work has been supported by the Canadian Institute for Advanced Research, the Canada Research Chairs program, and Canada's NSERC.
See also: Capacity for Quantum Information; Channels in Quantum Information Theory; Entanglement; Finite Weyl Systems; Quantum Channels: Classical Capacity; Quantum Entropy.
Further Reading
Abeyesinghe A, Devetak I, Hayden P, and Winter A (2005) Fully quantum Slepian–Wolf (in preparation).
Bennett CH, Devetak I, Harrow AW, Shor PW, and Winter A (2005) The quantum Reverse Shannon Theorem (in preparation).
Bennett CH, Shor PW, Smolin JA, and Thapliyal AV (1999) Entanglement-assisted classical capacity of noisy quantum channels. Physical Review Letters 83: 3081 (arXiv.org:quant-ph/9904023).
Bennett CH, Shor PW, Smolin JA, and Thapliyal AV (2002) Entanglement-assisted capacity of a quantum channel and the reverse Shannon theorem. IEEE Transactions on Information Theory 48(10): 2637 (arXiv.org:quant-ph/0106052).
Cerf N and Adami C (1997) Von Neumann capacity of noisy quantum channels. Physical Review A 56: 3470 (arXiv.org:quant-ph/9609024).
Devetak I (2005) The private classical capacity and quantum capacity of a quantum channel. IEEE Transactions on Information Theory 51(1): 44 (arXiv.org:quant-ph/0304127).
Devetak I, Harrow AW, and Winter A (2004) A family of quantum protocols. Physical Review Letters 93: 230504 (arXiv.org:quant-ph/0308044).
DiVincenzo DP, Smolin JA, and Shor PW (1998) Quantum channel capacity of very noisy channels. Physical Review A 57: 830 (arXiv.org:quant-ph/9706061).
Holevo AS (1998) The capacity of the quantum channel with general signal states. IEEE Transactions on Information Theory 44: 269–273.
Schumacher B and Westmoreland MD (1997) Sending classical information via noisy quantum channels. Physical Review A 56: 131–138.
van Dam W and Hayden P (2003) Universal entanglement transformation without communication. Physical Review A 67: 060302 (arXiv.org:quant-ph/0201041).
Winter A (2004) Extrinsic and intrinsic data in quantum measurements: asymptotic convex decomposition of positive operator valued measures. Communications in Mathematical Physics 244(1): 157 (arXiv.org:quant-ph/0109050).
424 Capacity for Quantum Information
denotes the norm of linear operators, and $\|\rho\|_1 := \operatorname{tr}\sqrt{\rho^*\rho}$ is the trace norm on the space of trace-class operators $\mathcal B_*(\mathcal H)$.
We use base two logarithms throughout, and we write $\mathrm{ld}\, x := \log_2 x$ and $\exp_2 x := 2^x$.
Quantum Channel Capacity
The intuitive concept underlying quantum channel capacity is made rigorous in the following definition:
Definition 1 A positive number $R$ is called achievable rate for the quantum channel $T : A \to B$ with respect to the quantum channel $S : A' \to B'$ iff for any pair of integer sequences $(n_\nu)_{\nu\in\mathbb N}$ and $(m_\nu)_{\nu\in\mathbb N}$ with $\lim_{\nu\to\infty} n_\nu = \infty$ and $\limsup_{\nu\to\infty} m_\nu/n_\nu < R$ we have
$$\lim_{\nu\to\infty} \inf_{D,E} \big\| D\, T^{\otimes n_\nu} E - S^{\otimes m_\nu} \big\|_{cb} = 0 \qquad [1]$$
the infimum taken over all encoding channels $E$ and decoding channels $D$ with suitable domain and range. The channel capacity $Q(T,S)$ of $T$ with respect to $S$ is defined to be the supremum of all achievable rates. The quantum capacity is the special case $Q(T) := Q(T, \mathrm{id}_2)$, with $\mathrm{id}_2$ being the ideal qubit channel.
In this article, we mainly concentrate on channels between finite-dimensional systems. This is enough to bring out the basic ideas. Many of the concepts and results discussed here can be generalized to Gaussian channels, which play a central role as building blocks for quantum optical communication lines (Holevo and Werner 2001, Eisert and Wolf).
There is considerable freedom in the definition of quantum channel capacity, at least for ideal reference channels (Kretschmann and Werner 2004). In particular, the encoding channels $E$ in eqn [1] may always be restricted to isometric embeddings.
In addition, it is not necessary to check an infinite number of pairs of sequences $(n_\nu)_{\nu\in\mathbb N}$ and $(m_\nu)_{\nu\in\mathbb N}$ when testing a given rate $R$, as Definition 1 would suggest. Instead, it is enough to find one such pair which achieves the rate $R$ infinitely often, $\limsup_{\nu\to\infty} m_\nu/n_\nu = R$.
Without affecting the capacity, the cb-norm $\|T\|_{cb}$ may be replaced by the unstabilized operator norm $\|T\|$ or by fidelity measures, which are in general much easier to compute. In particular, one might choose the minimum fidelity,
$$F(T) := \min_{\|\psi\|=1} \langle\psi|\, T(|\psi\rangle\langle\psi|)\, |\psi\rangle \qquad [2]$$
or even the average fidelity,
$$\bar F(T) := \int \langle\psi|\, T(|\psi\rangle\langle\psi|)\, |\psi\rangle \, d\psi \qquad [3]$$
Unfortunately, this equivalence is restricted to capacities with noiseless reference channel $S = \mathrm{id}$. In the vicinity of other (nonideal) channels, equivalence of the stabilized and unstabilized error criteria may be lost. Of course, the comparison of channels is ultimately based on the comparison of a state to its image, and here the pure states are the worst case. Hence, the remarkable insensitivity of the quantum capacity to the choice of the error criterion stems from the observation that the comparison between an arbitrary state and a pure state is rather insensitive to the criterion used.
Instead of requiring the error quantity in eqn [1] to approach zero in the large block limit $\nu \to \infty$, one might feel tempted to impose that the errors vanish completely for some sufficiently large block length, since this is the standard setup in the theory of quantum error correction (see Quantum Error Correction and Fault Tolerance). While it is true that errors can always be assumed to vanish exponentially in eqn [1], requiring perfect correction may completely change the picture: if a channel has some small positive probability for depolarization, the same also holds for its tensor powers, and no such channel allows the perfect transmission of even one qubit. Hence, the capacity for perfect correction will vanish for such channels, while the standard capacity (in accordance with Definition 1) will be close to maximal, $Q(T) \approx 1$. The existence of perfect error-correcting codes thus gives lower bounds on the channel capacity, but is not required for a positive transfer rate.
In the other extreme, one might sometimes feel inclined to tolerate (small) finite errors in the transmission. For some $\varepsilon > 0$, we define $Q_\varepsilon(T)$ exactly like the quantum capacity in Definition 1, but require only that the error quantity in eqn [1] falls below $\varepsilon$ for some sufficiently large $\nu$. Obviously, $Q_\varepsilon(T) \ge Q(T)$ for any quantum channel $T$. We also have $\lim_{\varepsilon\to 0} Q_\varepsilon(T) = Q(T)$ (Kretschmann and Werner 2004). In the classical setting, even a strong converse is known: if $\varepsilon > 0$ is small enough, one cannot achieve bigger rates by allowing small errors, that is, $C_\varepsilon(T) = C(T)$. It is still undecided whether an analogous property holds for the quantum capacity $Q(T)$.
Related Capacities
This article is chiefly concerned with the quantum capacity of a quantum channel. A variety of other
capacities have been derived from Definition 1 by either amending the channel $S$ to be simulated, or allowing Alice and Bob to make use of additional resources. Their interrelations are reviewed in Bennett et al. (2004).
Much interest has been devoted to the hybrid problem of transmitting classical information undistorted over noisy quantum channels. The classical capacity $C(T)$ of a quantum channel $T$ is discussed in the article Quantum Channels: Classical Capacity of this Encyclopedia. It is obtained by choosing the ideal one-bit channel rather than the one-qubit channel as the standard of reference in Definition 1. Encoding channels $E$ and decoding channels $D$ are then restricted to preparations and measurements, respectively. Since a quantum channel can also be employed to send classical information, we have $C(T) \ge Q(T)$.
There are, obviously, examples in which this inequality is strict: the entanglement-breaking channel $T(\rho) = \sum_j \langle j|\rho|j\rangle\, |j\rangle\langle j|$ is composed of a measurement in the orthonormal basis $\{|j\rangle\}_j$, followed by a preparation of the corresponding basis states. It destroys all the entanglement between the sender and a reference system, implying $Q(T) = 0$. Yet all the basis states $|j\rangle$ are transmitted undistorted, which is enough to guarantee that $C(T) = 1$.
Definition 1 also applies to purely classical channels, and thus to the setting of Shannon's information theory. A classical channel $T$ between two $d$-level systems is completely specified by the $d \times d$ matrix $(T_{xy})_{x,y=1}^{d}$ of transition probabilities. For these channels the cb-norm difference is just (twice) the maximal error probability:
$$\|\mathrm{id} - T\|_{cb} = 2 \sup_x \{1 - T_{xx}\}$$
which is the standard error criterion for classical information transfer.
Dense coding and teleportation suggest that entanglement is a powerful resource for information transfer. It doubles the classical channel capacity of a noiseless channel, and it allows quantum information to be sent over purely classical channels. Surprisingly, the entanglement-assisted capacities are often simpler and better behaved than their unassisted counterparts. Unlike the classical and quantum capacities proper, they are relatively easy to calculate using finite optimization procedures, and there has recently been significant progress in understanding the simulation rates for nonideal channels in this scenario (see Capacities Enhanced by Entanglement).
The quantum channel capacity is unaffected by entanglement-breaking side channels. In particular, classical forward communication alone cannot enhance it. However, unlike in the purely classical case, both the quantum and classical channel capacity (but not the entanglement-assisted capacity) may increase under classical feedback.
Elementary Properties
The capacity of a composite channel $T_1 \circ T_2$ cannot be bigger than the capacity of the channel with the smallest bandwidth. This in turn suggests that simulating a concatenated channel is in general easier than simulating any of the individual channels. These relations are known as bottleneck inequalities:
$$Q(T_1 \circ T_2, S) \le \min\{Q(T_1, S),\, Q(T_2, S)\} \qquad [4]$$
$$Q(T, S_1 \circ S_2) \ge \max\{Q(T, S_1),\, Q(T, S_2)\} \qquad [5]$$
Instead of running $T_1$ and $T_2$ in succession, we may also run them in parallel. In this case, the capacity can be shown to be superadditive,
$$Q(T_1 \otimes T_2, S) \ge Q(T_1, S) + Q(T_2, S) \qquad [6]$$
For the standard ideal channels, we even have additivity. The same holds true if both $S$ and one of the channels $T_1$, $T_2$ are noiseless, the third channel being arbitrary. However, results on the activation of bound-entangled states seem to suggest that the inequality in eqn [6] may be strict for some channels (see Entanglement).
Finally, the two-step coding inequality tells us that by using an intermediate channel in the coding process we cannot increase the transmission rate:
$$Q(T_1, T_2) \ge Q(T_1, T_3)\, Q(T_3, T_2) \qquad [7]$$
Applying eqn [7] twice with $T_2 = \mathrm{id}$ and $T_3 = \mathrm{id}$ immediately yields upper and lower bounds on the channel capacity with nonideal reference channel,
$$\frac{Q(T_1)}{Q(T_2)} \ge Q(T_1, T_2) \ge Q(T_1)\, Q(\mathrm{id}, T_2) \qquad [8]$$
The evaluation of the lower bound in eqn [8] then requires efficient protocols for simulating a noisy channel $T_2$ with a noiseless resource.
There are special cases in which the quantum channel capacity can be evaluated relatively easily, the most relevant one being the noiseless channel $\mathrm{id}_n$, where by the subscript $n$ we denote the dimension of the underlying Hilbert space. In this case, we have
$$Q(\mathrm{id}_n, \mathrm{id}_m) = \frac{\mathrm{ld}\, n}{\mathrm{ld}\, m} \qquad [9]$$
The lower bound $Q(\mathrm{id}_n, \mathrm{id}_m) \ge \mathrm{ld}\, n/\mathrm{ld}\, m$ is immediate from counting dimensions. To establish the upper bound, we use the fact that a noiseless quantum channel cannot simulate itself with a rate
exceeding unity: $Q(\mathrm{id}_m, \mathrm{id}_m) \le 1$. This is just the upper bound we want to prove for the special case $n = m$, and it can be extended to the general case with the help of the two-step coding inequality [7]: $Q(\mathrm{id}_m, \mathrm{id}_n)\, Q(\mathrm{id}_n, \mathrm{id}_m) \le Q(\mathrm{id}_m, \mathrm{id}_m) \le 1$, implying $Q(\mathrm{id}_n, \mathrm{id}_m) \le 1/Q(\mathrm{id}_m, \mathrm{id}_n) \le \mathrm{ld}\, n/\mathrm{ld}\, m$, where in the last step we have applied the lower bound with the roles of $n$ and $m$ interchanged.
Combining eqn [9] with the two-step coding inequality [7], we see that for any channel $T$
$$Q(T, \mathrm{id}_n) = \frac{\mathrm{ld}\, m}{\mathrm{ld}\, n}\, Q(T, \mathrm{id}_m) \qquad [10]$$
which shows that quantum channel capacities relative to noiseless channels of different dimensionality only differ by a constant factor. Fixing the dimensionality of the reference channel then only corresponds to a choice of units. Conventionally, the ideal qubit channel $\mathrm{id}_2$ is chosen as a standard of reference, as in Definition 1 above, thereby fixing the unit bit.
The upper bound on the capacity of ideal channels can also be obtained from a general upper bound on quantum capacities (Holevo and Werner 2001), which has the virtue of being easily calculated in many situations. It involves the transposition map, which we denote by $\Theta$, defined as matrix transposition with respect to some fixed orthonormal basis. The transposition is positive but not completely positive, and thus does not describe a physical channel (see Channels in Quantum Information Theory). We have $\|\Theta\|_{cb} = d$ for a $d$-level system. For any channel $T$ and small $\varepsilon > 0$,
$$Q(T) \le Q_\varepsilon(T) \le \mathrm{ld}\, \|T\Theta\|_{cb} =: Q_\Theta(T) \qquad [11]$$
where $Q_\varepsilon$ is the finite error capacity introduced in the section "Quantum channel capacity."
The upper bound $Q_\Theta(T)$ has some remarkable properties, which make it a capacity-like quantity in its own right. For example, it is exactly additive,
$$Q_\Theta(S \otimes T) = Q_\Theta(S) + Q_\Theta(T) \qquad [12]$$
for any pair $S$, $T$ of channels, and it satisfies the bottleneck inequality:
$$Q_\Theta(ST) \le \min\{Q_\Theta(S),\, Q_\Theta(T)\}$$
Moreover, it coincides with the quantum capacity on ideal channels, $Q_\Theta(\mathrm{id}_n) = Q(\mathrm{id}_n) = \mathrm{ld}\, n$, and it vanishes whenever $T\Theta$ is completely positive. In particular, if $\mathrm{id} \otimes T$ maps any entangled state to a state with positive partial transpose, we have $Q_\Theta(T) = 0$.
State–Channel Duality
Quantum capacity is closely related to the distillable entanglement, which is the optimal rate $m/n$ at which $n$ copies of a given bipartite quantum state $\rho$ shared between Alice and Bob can be asymptotically converted into $m$ maximally entangled qubit pairs (see Entanglement). Similar to the quantum capacity, the definition involves the large block limit $n, m \to \infty$ and an optimization over all conceivable distillation protocols. These may consist of several rounds of local quantum operations and (forward or two-way) classical communication. The one-way and two-way distillable entanglement of $\rho$ will be denoted by $D_1(\rho)$ and $D_2(\rho)$, respectively.
Suppose that Alice and Bob are connected by a quantum channel $T$ and run such a one-way distillation protocol on (many copies of) the state $\rho_T := (T \otimes \mathrm{id})\,|\Omega\rangle\langle\Omega|$, where $|\Omega\rangle := (1/\sqrt{d_A}) \sum_i |i,i\rangle$ is maximally entangled on $\mathcal H_A \otimes \mathcal H_{A'}$. If the distillation yields maximally entangled qubits at positive rate $R$, Alice may apply the standard teleportation scheme to send arbitrary quantum states to Bob undistorted at that same rate $R$. Like the distillation protocol itself, teleportation requires classical forward communication, which however does not affect the channel capacity (cf. the section "Related capacities"). Thus, $Q(T) \ge D_1(\rho_T)$. If two-way distillation is allowed, we have $Q_2(T) \ge D_2(\rho_T)$ for the capacity $Q_2(T)$ assisted by two-way classical side communication.
Conversely, if Alice and Bob use a bipartite quantum state $\rho$ shared between them as a substitute for the maximally entangled state $|\Omega\rangle$ in the standard teleportation protocol, they will implement some noisy quantum channel $T_\rho$. If this channel allows quantum information to be transferred at nonvanishing rate $R$, Alice may share maximally entangled states with Bob at that same rate $R$. Consequently, $D_1(\rho) \ge Q(T_\rho)$ and $D_2(\rho) \ge Q_2(T_\rho)$.
These relations (Bennett et al. 1996) allow one to bound channel capacities in terms of distillable entanglement and vice versa. If the two maps $T \mapsto \rho_T$ and $\rho \mapsto T_\rho$ are mutually inverse, we even have $D_1(\rho) = Q(T_\rho)$ and $D_2(\rho) = Q_2(T_\rho)$. In this case, the duality $\rho \leftrightarrow T_\rho$ is the physical implementation of Jamiołkowski's isomorphism between bipartite states and channels (see Channels in Quantum Information Theory). This has been shown (Horodecki et al. 1999) to hold for isotropic states, which are invariant under the group of all $U \otimes \bar U$ transformations, where $\bar U$ is the complex conjugate of the unitary $U$. The corresponding channels are partly depolarizing.
In general, $T_{\rho_T} \ne T$. However, the so-called conclusive teleportation allows us to implement $T$ at least probabilistically, resulting in the relation
$$\frac{1}{d_A^2}\, Q(T) \le D_1(\rho_T) \le Q(T) \qquad [13]$$
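
The dual state $\rho_T = (T \otimes \mathrm{id})\,|\Omega\rangle\langle\Omega|$ and the positive-partial-transpose criterion for a vanishing transposition bound can be explored numerically. The following is a minimal sketch and not part of the original article. It assumes NumPy and a qubit depolarizing channel parametrized as $\rho \mapsto (1-p)\rho + p\,\mathbb{1}/2$, a parametrization chosen here for illustration. The helper names `depolarizing_kraus`, `dual_state`, `partial_transpose_first` and `min_pt_eigenvalue` are hypothetical. The sketch builds $\rho_T$ from Kraus operators and returns the smallest eigenvalue of its partial transpose; a non-negative value means that $T\Theta$ is completely positive, where $\Theta$ is the transposition map.

```python
import numpy as np

def depolarizing_kraus(p):
    """Kraus operators of the qubit channel rho -> (1-p) rho + p I/2 (assumed parametrization)."""
    I = np.eye(2, dtype=complex)
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
    Z = np.array([[1, 0], [0, -1]], dtype=complex)
    return [np.sqrt(1 - 3 * p / 4) * I] + [np.sqrt(p / 4) * P for P in (X, Y, Z)]

def dual_state(kraus):
    """rho_T = (T (x) id)|Omega><Omega| with |Omega> = d^{-1/2} sum_i |i,i>."""
    d_out, d_in = kraus[0].shape
    # Flattening the identity matrix gives the unnormalized vector sum_i |i,i>.
    omega = np.eye(d_in, dtype=complex).reshape(d_in * d_in) / np.sqrt(d_in)
    proj = np.outer(omega, omega.conj())
    rho = np.zeros((d_out * d_in, d_out * d_in), dtype=complex)
    for K in kraus:
        KI = np.kron(K, np.eye(d_in))  # the channel acts on the first tensor factor
        rho += KI @ proj @ KI.conj().T
    return rho

def partial_transpose_first(rho, d_out, d_ref):
    """Transpose only the first tensor factor of a bipartite matrix."""
    r = rho.reshape(d_out, d_ref, d_out, d_ref)
    return r.transpose(2, 1, 0, 3).reshape(d_out * d_ref, d_out * d_ref)

def min_pt_eigenvalue(p):
    """Smallest eigenvalue of the partial transpose of rho_T for the depolarizing channel."""
    rho = dual_state(depolarizing_kraus(p))
    return float(np.linalg.eigvalsh(partial_transpose_first(rho, 2, 2)).min())
```

For this one-parameter family the smallest eigenvalue works out to $(3p-2)/4$, so the dual state has a positive partial transpose exactly when $p \ge 2/3$; below that threshold the criterion gives no information.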
The duality [13] can be applied to show that both the unassisted and the two-way quantum capacities are continuous in any open set of channels having nonvanishing capacities (Horodecki and Nowakowski 2005).
Coding Theorems
Computing channel capacities straight from Definition 1 is a tricky business. It involves optimization in systems of asymptotically many tensor factors, and can only be performed in special cases, like the noiseless channels in the section "Elementary properties." Coding theorems aspire to reduce this problem to an optimization over a low-dimensional space. They usually come in two parts: the converse provides an upper bound on the channel capacity (typically in terms of some entropic expression), while the direct part consists of a coding scheme that attains this bound. By Shannon's celebrated coding theorem, the classical capacity of a classical noisy channel can be obtained from a maximization of the mutual information over all joint input–output distributions.
For the quantum channel capacity, the relevant entropic quantity is the coherent information,
$$I_c(T, \rho) := H(T\rho) - H\big((T \otimes \mathrm{id})\,|\psi_\rho\rangle\langle\psi_\rho|\big) \qquad [14]$$
where $H$ denotes the von Neumann entropy: $H(\rho) = -\operatorname{tr} \rho\, \mathrm{ld}\, \rho$, and $\psi_\rho \in \mathcal H_A \otimes \mathcal H_{A'}$ is a purification of the density operator $\rho \in A$. The coherent information does not increase under quantum operations, $I_c(S \circ T, \rho) \le I_c(T, \rho)$ for any quantum channel $S$ and state $\rho \in A$. This is the data processing inequality (Barnum et al. 1998), which shows that the regularized coherent information provides an upper bound on the quantum channel capacity: if Alice and Bob have a coding scheme for the channel $T$ with capacity $Q(T)$, $n$ channel uses allow them to share a maximally entangled state of size $\approx \exp_2 n\, Q(T)$. The coherent information of this state equals $\approx n\, Q(T)$, and was no larger prior to Bob's decoding.
Recently, Devetak (2005) developed a coding scheme to show that this bound is in fact attainable. Different proofs were outlined by Lloyd and Shor.
Theorem 1 For every quantum channel $T$,
$$Q(T) = \lim_{n\to\infty} \frac{1}{n} \max_\rho I_c(T^{\otimes n}, \rho) \qquad [15]$$
Unlike the classical or quantum mutual information, coherent information is strictly superadditive for some channels (DiVincenzo et al. 1998). Hence, taking the limit $n \to \infty$ in eqn [15] is indeed required, and in general the evaluation of the capacity formula [15] still demands the solution of asymptotically large variational problems. This should be contrasted with the entanglement-assisted capacities $C_E(T) = 2 Q_E(T)$ (where a simple nonregularized coding theorem is known to hold, see Capacities Enhanced by Entanglement) and the capacity for classical information $C(T)$ (where additivity is conjectured but not proved, see Quantum Channels: Classical Capacity). Even a maximization of the single-shot coherent information $I_c(T, \rho)$ appears to be a difficult optimization problem, since this quantity is neither convex nor concave and may have multiple local maxima (Shor 2003). Thus, even for simple-looking systems like the qubit depolarizing channel, so far we only have upper and lower bounds on the quantum channel capacity, but do not yet know how to compute its exact value.
We now sketch Devetak's proof of Theorem 1, assuming only some familiarity with Holevo–Schumacher–Westmoreland (HSW) random codes for the classical channel capacity (see Quantum Channels: Classical Capacity). It is easily seen from Stinespring's dilation theorem (see Channels in Quantum Information Theory) that a noiseless quantum channel provides perfect security against eavesdropping. This is one of the characteristic traits of quantum mechanics and lies at the heart of quantum cryptography. In his proof, Devetak showed a way to turn this around and upgrade coding schemes for private classical information to quantum channel codes.
The relation between quantum information transfer over a channel $T : A \to B$ and privacy against eavesdropping is best understood in terms of the companion channel $T_E : A \to E$. $T_E$ arises from a given Stinespring isometry $V : \mathcal H_A \to \mathcal H_B \otimes \mathcal H_E$ of $T \equiv T_B$ by interchanging the roles of the output system $B$ and the environment $E$:
$$T_B(\rho) = \operatorname{tr}_E V \rho V^* \qquad T_E(\rho) = \operatorname{tr}_B V \rho V^* \qquad [16]$$
The channel $T_E$ describes the information flow into the environment $E$, a system we assume to be under complete control of a potential eavesdropper, Eve say. The setup for private classical information transfer (including the definition of rates and capacity) is then exactly the same as for the classical channel capacity (see Quantum Channels: Classical Capacity), but the protocols now have to satisfy the additional requirement that $T_E$ releases (almost) no information to the environment. This can be achieved by randomizing over $E \approx \exp_2 n\, \chi(T_E, \{p_i, \rho_i\})$ code words of a standard HSW code of total size $\approx \exp_2 n\, \chi(T_B, \{p_i, \rho_i\})$, where $\{p_i, \rho_i\}$ is the quantum ensemble from which a set of random code words
$\{\varphi_{k,l}\}_{k=1,\,l=1}^{B,\,E}$ is generated. The appearance of the Holevo bound
$$\chi(T, \{p_i, \rho_i\}) := H\Big(\sum_i p_i\, T(\rho_i)\Big) - \sum_i p_i\, H\big(T(\rho_i)\big) \qquad [17]$$
in the dimension of both these code spaces can be understood from the size of the relevant typical subspaces (Devetak and Winter 2004).
The randomization guarantees that the remaining $B \approx \exp_2 n\,(\chi(T_B) - \chi(T_E))$ code words are almost indistinguishable to Eve:
$$\Big\| \frac{1}{E} \sum_{l=1}^{E} T_E^{\otimes n}\big(\varphi_{kl} - \varphi_{jl}\big) \Big\|_1 \le \varepsilon, \qquad \forall\, j, k = 1, \ldots, B \qquad [18]$$
The net transfer rate for private classical information is then $R \approx \chi(T_B) - \chi(T_E)$, which is just the total transfer rate for the channel Alice $\to$ Bob reduced by the transfer rate Alice $\to$ Eve.
Remarkably, if $\rho = \sum_i p_i\, |\psi_i\rangle\langle\psi_i|$ is a decomposition of $\rho \in A$ into pure states, the private transfer rate exactly equals the coherent information,
$$I_c(T_B, \rho) = H(T_B(\rho)) - H(T_E(\rho)) = \chi(T_B) - \chi(T_E) \qquad [19]$$
The so-called entropy exchange
$$H(T_E(\rho)) = H\big((T_B \otimes \mathrm{id})\,|\psi_\rho\rangle\langle\psi_\rho|\big)$$
quantifies the extent to which a formerly pure ancilla state becomes mixed via interaction with the signal states. Equation [19] then nicely reflects the intuition that for high-rate quantum information transfer the signal states should not entangle too much with the environment. In fact, for an almost noiseless channel the entropy exchange nearly vanishes, and the optimized coherent information almost attains the maximal value 1, while for nearly depolarizing channels we have $I_c(T_B, \rho) \approx -H(\rho) \le 0$.
So far, we have sketched a protocol for private classical information transfer. Devetak's coherentification allows one to pass from the transmission of classical messages to the transmission of coherent superpositions. This technique has also been applied to obtain entanglement distillation protocols from secret key distillation, and offers a unified view on the secret classical resources and their quantum counterparts (Devetak and Winter 2004, Devetak et al. 2004).
In order to transfer quantum information, Alice will only need to send one half of a maximally entangled state of dimensionality $\approx \exp_2 n\, I_c(T_B, \rho)$. As described in the previous section, teleportation then allows her to transfer arbitrary quantum states from a subspace of that size.
Given a set of pure state code words $\{|\varphi_{kl}\rangle\}_{k=1,\,l=1}^{B,\,E}$ of a private classical information protocol, for entanglement transfer Alice prepares the input state
$$|\Phi\rangle_{A'A} = \frac{1}{\sqrt{B}} \sum_{k=1}^{B} |k\rangle_{A'} \otimes \frac{1}{\sqrt{E}} \sum_{l=1}^{E} |\varphi_{kl}\rangle_{A} \qquad [20]$$
where $A'$ denotes a reference system that Alice keeps in her lab. On his share of the resulting output state $|\Phi'\rangle_{A'BE}$ Bob will then employ the corresponding measurement operators $\{M_{kl}\}_{k,l=1}^{B,\,E}$ to implement the coherent measurement
$$V_M\, |\varphi\rangle_B := \sum_{kl} \sqrt{M_{kl}}\; |\varphi\rangle_B \otimes |kl\rangle_{B_1 B_2}$$
which places the measurement outcomes into some reference system $B_1 B_2$. Any measurement which identifies the output with high probability only slightly disturbs the output state, and thus Bob's coherent measurement leaves the total system in an approximation of the state
$$|\Phi''\rangle = \frac{1}{\sqrt{BE}} \sum_{k=1,\,l=1}^{B,\,E} |k\rangle_{A'}\, |k\rangle_{B_1}\, |l\rangle_{B_2}\, |\varphi'_{kl}\rangle_{BE} \qquad [21]$$
in which Eve and Bob are still entangled. A completely depolarizing channel $T_E$ would directly yield a factorized output state $\rho_B \otimes \rho_E$ here. Although the randomization in eqn [18] does not necessarily result in complete depolarization, there is a controlled unitary operation which Bob may apply to effectively decouple Eve's system, resulting in the output state $(1/\sqrt{B}) \sum_k |kk\rangle_{A'B_1}$ decoupled from $E$, which is the maximally entangled state of size $B \approx \exp_2 n\, I_c(T_B, \rho)$ required for teleportation. The direct part of the capacity theorem then follows by applying the above coding scheme to large blocks and maximizing over (pure) input ensembles, concluding the proof.
Devetak's proof of the coding theorem seems to indicate that the private classical capacity $C_p(T)$ equals the quantum capacity $Q(T)$ for every quantum channel $T$. However, for the coherentification protocol, we have restricted the private coding schemes to pure state input ensembles, and thus we can only conclude that $Q(T) \le C_p(T)$. The existence of bound-entangled states with positive one-way distillable secret key rate (Horodecki et al. 2005) implies that this inequality can be strict. A general procedure does exist to retrieve (almost) all the information from the output of a noisy quantum channel that releases (almost) no information to the environment. But this requires a stronger form of privacy than eqn [18].
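
The single-letter coherent information $I_c(T_B, \rho) = H(T_B(\rho)) - H(T_E(\rho))$ of eqn [19] can be evaluated directly from a Kraus decomposition. The following is a minimal sketch and not part of the original article. It assumes NumPy, the Stinespring isometry $V = \sum_i K_i \otimes |i\rangle_E$ built from Kraus operators $K_i$, and a qubit depolarizing channel parametrized as $\rho \mapsto (1-p)\rho + p\,\mathbb{1}/2$. The function names `entropy`, `channel_outputs`, `coherent_information` and `depolarizing_kraus` are hypothetical.

```python
import numpy as np

def depolarizing_kraus(p):
    """Kraus operators of the qubit channel rho -> (1-p) rho + p I/2 (assumed parametrization)."""
    I = np.eye(2, dtype=complex)
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
    Z = np.array([[1, 0], [0, -1]], dtype=complex)
    return [np.sqrt(1 - 3 * p / 4) * I] + [np.sqrt(p / 4) * P for P in (X, Y, Z)]

def entropy(rho):
    """Von Neumann entropy H(rho) = -tr(rho ld rho), in bits."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]  # drop numerically zero eigenvalues (0 ld 0 = 0)
    return float(-(ev * np.log2(ev)).sum())

def channel_outputs(kraus, rho):
    """Return (T_B(rho), T_E(rho)) for the isometry V = sum_i K_i (x) |i>_E."""
    out_b = sum(K @ rho @ K.conj().T for K in kraus)
    # Tracing out B from V rho V^* leaves the matrix with entries tr(K_i rho K_j^*).
    out_e = np.array([[np.trace(Ki @ rho @ Kj.conj().T) for Kj in kraus]
                      for Ki in kraus])
    return out_b, out_e

def coherent_information(kraus, rho):
    """Coherent information H(T_B rho) - H(T_E rho), cf. eqn [19]."""
    out_b, out_e = channel_outputs(kraus, rho)
    return entropy(out_b) - entropy(out_e)
```

With the maximally mixed input `np.eye(2) / 2`, the noiseless case `p = 0` gives the maximal value 1, while the completely depolarizing case `p = 1` gives $-H(\rho) = -1$, in line with the limiting behavior described above. This evaluates a single channel use only; the regularization over $n$ in eqn [15] is not attempted.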
Quantum Channels with Memory
This article has so far been restricted to memoryless quantum channels, in which successive channel inputs are acted on independently. Messages of $n$ symbols are then processed by the tensor product channel $T^{\otimes n}$, as in Definition 1 and illustrated in Figure 1. In many real-world applications, the assumption of having uncorrelated noise cannot be justified, and memory effects need to be taken into account. For a quantum channel $T$ with register input $A$ and register output $B$, such effects are conveniently modeled (Bowen and Mancini 2004) by introducing an additional memory system $M$, so that now $T : M \otimes A \to B \otimes M$ is a completely positive and trace-preserving map with two input systems and two output systems. Long messages with $n$ signal states will then be processed by the concatenated channel $T_n : M \otimes A^{\otimes n} \to B^{\otimes n} \otimes M$. In such a concatenation, the memory system is passed on from one channel application to the next, and thus introduces (classical or quantum) correlations between consecutive register inputs.
Remarkably, this relatively simple model can be shown (Kretschmann and Werner 2005) to encompass every reasonable physical process: every stationary channel $S : A_\infty \to B_\infty$ which turns an infinite string of input states (on the quasilocal algebra $A_\infty$) into an infinite string of output states on $B_\infty$ and satisfies the causality constraint is in fact a concatenated memory channel. Causality here means that the outputs of the stationary channel $S$ at given time $t_0$ do not depend on inputs at times $t > t_0$. Figure 2 illustrates the structure theorem for causal stationary quantum channels. In general, it produces not only the memory channel $T$ with memory algebra $M$, but also a map $R$ describing the influence of input states in the remote past.
Intuitively, such a map is often not needed, because memory effects decrease in time: the memory channel $T$ is called forgetful if outputs at a large time $t$ depend only weakly on the memory initialization at time zero. In fact, memory effects can be shown to die out even exponentially. The set of these channels is open and dense in the set of quantum memory channels. Hence, generic memory channels are forgetful.
The capacity of memory channels is defined in complete analogy to the memoryless case, replacing the $n$-fold tensor product $T^{\otimes n}$ in Definition 1 by the $n$-fold concatenation $T_n$. The coding theorems for (private) classical and quantum information can then be extended from the memoryless case to the very important class of forgetful channels (Kretschmann and Werner 2005).
Nonforgetful channels call for universal coding schemes, which apply irrespective of the initialization of the input memory. Such schemes are presently known only for very special cases.
Acknowledgments
The author thanks the members of the quantum information group at TU Braunschweig for their careful reading of the manuscript and many helpful suggestions. He also gratefully acknowledges the funding from Deutsche Forschungsgemeinschaft (DFG).
See also: Capacities Enhanced by Entanglement; Channels in Quantum Information Theory; Entanglement; Positive Maps on C*-Algebras; Quantum Channels: Classical Capacity; Quantum Error Correction and Fault Tolerance; Source Coding in Quantum Information Theory.
Further Reading
Barnum H, Nielsen MA, and Schumacher B (1998) Information transmission through a noisy quantum channel. Physical Review A 57: 4153 (quant-ph/9702049).
Bennett CH, Devetak I, Shor PW, and Smolin JA (2004) Inequalities and separations among assisted capacities of quantum channels, quant-ph/0406086.
Bennett CH, DiVincenzo DP, Smolin JA, and Wootters WK (1996) Mixed-state entanglement and quantum error correction. Physical Review A 54: 3824 (quant-ph/9604024).
Bowen G and Mancini S (2004) Quantum channels with a finite memory. Physical Review A 69: 012306 (quant-ph/0305010).
Devetak I (2005) The private classical information capacity and quantum information capacity of a quantum channel. IEEE Transactions on Information Theory 51: 44 (quant-ph/0304127).
Devetak I, Harrow AW, and Winter A (2004) A family of quantum protocols. Physical Review Letters 93: 230504 (quant-ph/0308044).
Devetak I and Winter A (2004) Relating quantum privacy and quantum coherence: an operational approach. Physical Review Letters 93: 080501 (quant-ph/0307053).
DiVincenzo DP, Shor PW, and Smolin JA (1998) Quantum channel capacities of very noisy channels. Physical Review A 57: 830 (quant-ph/9706061).
Eisert J and Wolf MM Gaussian quantum channels. In Cerf N,
Figure 2 By the structure theorem, a causal automaton $S$ can be decomposed into a chain of concatenated memory channels $T$ plus some input initializer $R$. Evaluation with the partial trace tr
means that the corresponding output is ignored. Leuchs G, and Polzik E (eds.) Quantum Information with
Capillary Surfaces 431
Continuous Variables of Atoms and Light. London: Imperial Horodecki K, Pankowski L, Horodecki M, and Horodecki P
College Press (in preparation)(quant-ph/0505151). (2005) Low dimensional bound entanglement with one-way
Holevo AS and Werner RF (2001) Evaluating capacities of distillable cryptographic key, quant-ph/0506203.
bosonic Gaussian channels. Physical Review A 63: 032312 Kretschmann D and Werner RF (2004) Tema con variazioni:
(quant-ph/9912067). quantum channel capacity. New Journal of Physics 6: 26
Horodecki M, Horodecki P, and Horodecki R (1999) General (quant-ph/0311037).
teleportation channel, singlet fraction, and quasidistillation. Kretschmann D and Werner RF (2005) Quantum channels with
Physical Review A 60: 1888 (quant-ph/9807091). memory. Physical Review A 72: 062323 (quant-ph/0502106).
Horodecki P and Nowakowski ML (2005) Simple test for Shor PW (2003) Capacities of quantum channels and how to find
quantum channel capacity, quant-ph/0503070. them. Mathematical Programming 97: 311 (quant-ph/0304102).
Capillary Surfaces
R Finn, Stanford University, Stanford, CA, USA
© 2006 Elsevier Ltd. All rights reserved.

Historical and Conceptual Background

A capillary surface is the interface separating two fluids that lie adjacent to each other and do not mix. Examples of such surfaces are the upper surface of liquid partially filling a vertical cylinder (capillary tube), the surface of a liquid drop resting in equilibrium on a tabletop (sessile drop) and the surface of a liquid drop hanging from a ceiling (pendent drop); further instances are the surface of a falling raindrop, the bounding surface of the liquid in the fuel tank of a spaceship, and the interface formed by a fluid mass rotating within another fluid. This last example extends to the problem of rotating stars.

Interfaces separating fluids and solids share some of the physical attributes of capillary surfaces, and the study of wetted portions of rigid support surfaces becomes essential for describing global behavior of capillary configurations. However, some significant distinctions appear that change the formal structure of the problems, and must be accounted for in the theory.

Phenomena governed by capillarity pervade all of daily life, and most are so familiar as to escape special notice. By contrast, throughout the eighteenth century and presumably earlier, great attention centered on the rise of liquid in a narrow glass circular-cylindrical tube dipped vertically into a liquid reservoir (Figure 1); this striking event had a dramatic impact that confounded intuition. Clarification of the behavior became one of the major problems challenging the scientific world of the time, and was not achieved during that period. The term "capillary", adapted from the Latin capillus for hair, was applied to the phenomenon since it was observed only for tubes with very fine openings; the more general usage adopted in the definition above derives from the recognition of a class of phenomena with a common physical basis.

Figure 1 Capillary tube in infinite reservoir, in downward gravity field. (Labels: a, g, u0.)

The first recorded observations concerning capillarity seem due to Aristoteles c. 350 BC. He wrote that a broad flat body, even of heavy material, will float on water, however a narrow thin one such as a needle will always sink. Any reader with access to a needle and a glass of water will have little difficulty refuting the assertion. Remarkably, the error in reasoning seems not to have been pointed out for almost 2000 years, when Galileo addressed the problem in his Discorsi, about 1600. The only substantive studies till that time are apparently those of Leonardo da Vinci a hundred years earlier. Leonardo introduced reasoning close in spirit to that of current literature; however, the Calculus was not available to him, and he was not in a position to develop his ideas in quantitative ways.

Young's Contribution

The later discovery of the Calculus provided a driving impetus guiding many new studies during the eighteenth century. But despite the enormity of that weapon, it did not on its own suffice, and initial quantitative success had to await two initiatives
taken by Thomas Young in 1805. Young based his studies on the concept of surface tension that had been introduced by von Segner half a century earlier. Segner hypothesized that every curve on a fluid/fluid interface $S$ experiences on both its sides an orthogonal force $\sigma$ per unit length, which (for given temperature) depends only on the materials and is directed into the tangent planes on the respective sides. The presence of such forces can be indicated by simple experiments. They become clearly evident in the case of thin (soap) films spanning a frame, in which case there is an easily observed orthogonal pull on the frame, see the section "Dual interpretation of $\sigma$: distinction between fluids and solids".

Young made two basic conceptual contributions (Y1, Y2):

Y1. Relation of pressure jump across a free interface to mean curvature and surface tension.

Consider a piece of surface $S$ in the shape of a spherical bowl of radius $R$, separating two immiscible fluid media, as in Figure 2. In equilibrium, any pressure difference $\delta p$ across $S$ must be balanced by a tension $\sigma$ on its rim $\Gamma$. If $S$ projects to a disk of (small) radius $r$ on the plane tangent to $S$ at the symmetry point, we are led to

$$\pi r^2\,\delta p = 2\pi r\,\sigma \sin\vartheta \qquad [1]$$

where $\vartheta$ is inclination of $S$ at the rim, relative to the plane. We thus find at the base point

$$\delta p = 2\sigma\,\frac{d\sin\vartheta}{dr} = 2\sigma\,\frac{1}{R} \qquad [2]$$

Young then went on to consider a general $S$, without symmetry hypothesis. Letting $1/R_1$, $1/R_2$ denote the planar curvatures at a point in $S$ of two normal sections in orthogonal directions, he asserted that

$$\delta p = 2\sigma\,\frac{1}{2}\left(\frac{1}{R_1} + \frac{1}{R_2}\right) = 2\sigma H \qquad [3]$$

where $H$ is the mean curvature of $S$ at the point. Although Young provided no formal justification for this step, we can establish it with the aid of a general formula from differential geometry that was not known in his lifetime:

$$\int_S 2H\,\mathbf{N}\,dS = \oint_\Gamma \mathbf{n}\,ds \qquad [4]$$

where $\mathbf{N}$ is a unit normal on $S$, and $\mathbf{n}$ is unit conormal (as indicated in Figure 2) on $\Gamma$. Multiplying both sides of [4] by $\sigma$, the right-hand side becomes the net surface tension force on $S$. Since that must equal the net balancing pressure force, we obtain

$$\int_S (\delta p - 2\sigma H)\,\mathbf{N}\,dS = 0 \qquad [5]$$

Letting the diameter of $S$ tend to zero, the assertion follows.

Figure 2 Pressure change across fluid element, balanced by surface tension.

We emphasize here the implicit assumption above, that $\sigma$ is a constant depending only on the particular materials, and not on the shape of $S$. This author knows of no source in which that is clearly established, although experiments and experience provide some a posteriori justification. See the further comments under Y2, and later in sections "Gauss' contribution: the energy method" and "Dual interpretation of $\sigma$: distinction between fluids and solids".

Figure 3 Young diagram; balance of tangential forces. Residual normal force remains.

Y2. The capillary contact angle.

Young asserted that there are surface tensions for solid/fluid interfaces analogous to those just introduced, and again depending only on the materials. This assertion is erroneous, as was suggested in writings of Bikerman and of others, and more recently established in a definitive example by Finn. Using his premise, Young attempted to characterize the contact angle $\gamma$ made by the fluid surface with a rigid boundary, by requiring that the net tangential component of the three surface tension vectors vanish at the triple interface; this leads to the often employed but incorrect "Young diagram", see Figure 3, and the relation

$$\cos\gamma = \frac{\sigma_1 - \sigma_2}{\sigma_0} \qquad [6]$$
for $\cos\gamma$ in terms of the magnitudes of the three surface tensions. Young concluded that the contact angle depends only on the materials, and in no other way on the conditions of the problem. This basic assertion is by a fortuitous accident correct, as follows from the contribution by Gauss described below; it underlies all modern theory.

Using Y1 and Y2, Young produced the first verifiable prediction for the rise height $u_0$ in the circular capillary tube of Figure 1. He assumed the interface to be spherical, so that $H$ is constant and $a = \cos\gamma / H$. He assumed vanishing outside pressure. According to classic laws of hydrostatics, $\delta p = \rho g u_0 = 2\sigma H$ by Y1, where $\rho$ is fluid density; there follows the celebrated relation, presented entirely in words in his 1805 article:

$$u_0 \approx \frac{2\cos\gamma}{\kappa a}; \qquad \kappa = \frac{\rho g}{\sigma} \qquad [7]$$

Young scorned the mathematical method, and made a point of deriving and publishing his results on capillarity without use of any mathematical symbols. This personal idiosyncrasy causes his publications to be something of a challenge to read.

The Laplace Contribution

In 1806, Laplace published the first analytical expression for the mean curvature of a surface $u(x, y)$, and showed that the expression can be written as a divergence. He obtained the equation

$$\operatorname{div} Tu = 2H; \qquad Tu \equiv \frac{\nabla u}{\sqrt{1 + |\nabla u|^2}} \qquad [8]$$

Thus, if $H$ is known from geometrical or physical considerations, as it is for the capillary tube in the example just considered, one finds a second-order (nonlinear) equation for the surface height of any solution as a graph. The equation is elliptic for any function $u(x, y)$ inserted into the coefficients, however not uniformly so; the particular nonuniformity leads to some striking and unusual behavior of its solutions, as we shall see.

With the aid of [8], Laplace improved the Young estimate [7] to

$$u_0 \approx \frac{2\cos\gamma}{\kappa a} - \left(\frac{1}{\cos\gamma} - \frac{2}{3}\,\frac{1 - \sin^3\gamma}{\cos^3\gamma}\right) a \qquad [9]$$

Both Young and Laplace proposed their formulas for "narrow tubes", but neither gave any quantitative indication of what "narrow" should signify. Note that whenever $0 \le \gamma < \pi/2$, [9] becomes negative when the nondimensional "Bond Number" $B = \kappa a^2$ exceeds 8; since $u$ is known to be positive in the indicated range for $\gamma$, [9] provides no information in that case, whereas [7] is still of some value. Nevertheless, [9] is asymptotically exact and consists of the first two terms of the formal expansion in powers of $a$; that was first proved by D Siegel in 1980, almost 200 years following the discovery of the formulas. In 1968, P Concus extended the formal expansion for the height to the entire traverse $0 < r < a$. F Brulois (1981) and independently E Miersemann (1994) proved the expansion to be asymptotic to every order. Explicit bounds for the rise height above and below, making quantitative the notion of "narrow", were obtained by Finn.

Laplace supplied the first detailed mathematical investigations into the behavior of capillary surfaces, applying his ideas to many specific examples. His underlying motivation apparently derived at least partly from astronomical problems, and he published his contributions in two Supplements to the tenth volume of his Mécanique Céleste.

Gauss' Contribution: The Energy Method

Young and Laplace both based their reasonings on force-balance arguments, which at best were unclear and at worst conceptually wrong. In 1830, Gauss took up the problem anew from a variational point of view, using the Johann Bernoulli "principle of virtual work". To do so, he attempted to characterize both surface energies and bulk fluid energies in terms of postulated particle attractions and repulsions. In an astonishing 30 pages, he essentially introduced foundations of modern potential theory, of measure theory, and of thermodynamics. He ended up with elaborate expressions that could not readily be applied, and which at least to some extent he did not use. He asserted, for example, that the bulk internal energy would be proportional to volume, which for an incompressible fluid is constant under admissible deformations, and on that basis he ignored the bulk energy term completely. His procedures then led him, in an independent and more convincing way, to the identical equation and boundary condition that had been produced by his predecessors. It must, of course, be remarked that his justification for ignoring the bulk energy term would not be correct for a compressible liquid (see the section "Compressibility"), and it is open to some
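
The Young estimate [7] and the Laplace estimate [9] discussed in this section can be compared numerically. The following is a minimal sketch, not part of the original article: the function names are hypothetical, the angle gamma is taken in radians with 0 <= gamma < pi/2, and the physical values in the example (water in a glass tube) are illustrative assumptions only.

```python
import math


def young_rise(kappa: float, a: float, gamma: float) -> float:
    """Young's estimate [7]: u0 ~ 2 cos(gamma) / (kappa * a)."""
    return 2.0 * math.cos(gamma) / (kappa * a)


def laplace_correction(gamma: float) -> float:
    """Coefficient of a in [9]: 1/cos(g) - (2/3) (1 - sin^3 g) / cos^3 g.

    Requires 0 <= gamma < pi/2 so that cos(gamma) is nonzero.
    """
    c, s = math.cos(gamma), math.sin(gamma)
    return 1.0 / c - (2.0 / 3.0) * (1.0 - s**3) / c**3


def laplace_rise(kappa: float, a: float, gamma: float) -> float:
    """Laplace's estimate [9]: Young's term minus a correction linear in a."""
    return young_rise(kappa, a, gamma) - laplace_correction(gamma) * a


def bond_threshold(gamma: float) -> float:
    """Value of B = kappa * a**2 at which the right side of [9] changes sign."""
    return 2.0 * math.cos(gamma) / laplace_correction(gamma)


if __name__ == "__main__":
    # Assumed illustrative values (not from the article), SI units.
    sigma, rho, g = 0.072, 1000.0, 9.81   # surface tension, density, gravity
    kappa = rho * g / sigma               # capillarity constant of [7]
    a, gamma = 5.0e-4, 0.0                # tube radius, contact angle

    print("Bond number B      :", kappa * a**2)
    print("Young estimate [7]  :", young_rise(kappa, a, gamma))
    print("Laplace estimate [9]:", laplace_rise(kappa, a, gamma))
    print("Sign-change B at gamma = 0:", bond_threshold(gamma))
```

Setting [9] to zero and solving for B gives a threshold of 6 at gamma = 0 that increases toward 8 as gamma approaches pi/2, which agrees with the remark that [9] is negative for every admissible contact angle once B exceeds 8.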
Figure 10 Symmetry breaking in exotic container, g = 0. Below: calculated presumed global minimizer (spoon) and local minimizer (potato chip). Above: experiment on Mir: symmetric insertion of fluid (center); spoon (left); potato chip (right). This is a grayscale version of a color figure reproduced from Journal of Fluid Mechanics, 224: 383–394, (1991) with permission of Cambridge University Press.
Discontinuous Dependence II
any > 0, and thus the Miranda question has a Gerhardt (F and G) extended this condition, and
positive answer for that configuration. But if we showed in particular that solutions exist in general
replace 0 by a concentric disk " 1 of radius in piecewise smooth . This result contrasts with the
1 ", we find zero-gravity case [18] discussed in the section
( ) Existence p
questions
II, for which solutions fail to
2" cos
1 " exist when 1 L2 cos > 1 at a protruding corner
inf u x; sup u x;
" 1" (see the section Discon
" ptinuous
dependence I).
1 sin ! 1 sin However, in the cases 1 L2 cos > 1 studied
< 1 " 23 by F and G the solution u(x) is necessarily
cos cos
unbounded in the corner. This condition is equiva-
where ! = arccos(cos = sin ), and u" is the solution lent to < j =2j at the corner. Concus and Finn
of [22], [17] in " . Since does not appear on the showed that if j =2j in a neighborhood
right side of [23], there follows in particular that for of a corner with rectilinear sides, as indicated in
any " > 0, there holds Figure 11, then the solution u(x) satisfies
( )
2
lim inf u1 x; sup u" x; 1 24 jux;j < 26
!0 " "
the gravity field. For a container consisting of a existence theorem above can no longer be expected;
semi-infinite vertical cylinder, closed at the bottom, it is possible to give explicit examples of analytic
one obtains domains, and constant data , for which no solution
0 g of the problem exists. Thus, even in a large down-
div Tu u
g1 cos ! 32 ward gravity field, the solutions can emulate the
behavior of solutions of [18]. That can happen,
where ! is the angle between the upward directed however, only for data exceeding =2. The
surface normal and the vertical axis, and is to be condition [33] is again necessary for existence.
determined by a volume constraint. Athanassenas For eqn [34], cannot be eliminated by addition
and Finn proved that for a general smooth domain of a constant to the solution, and its determination
, prescribed , and prescribed fluid mass M subject creates a new level of difficulty toward solution of
to the restriction the physical existence question. Athanassenas and
M < 0 jj=
g 33 Finn proved unique existence of solutions of [35],
[17] for a capillary tube of general smooth section
there exists exactly one solution of [32] achieving dipped into an infinite liquid bath (which corre-
the boundary data . sponds to = 0), when 0 =2. If > =2 then
The condition [33] is necessary for existence with solutions do not always exist; it can happen that the
the prescribed mass. surface moves down to the bottom of the tube,
The methods used for this theorem do not permit regardless of the depth of immersion. Under a
regularity conditions to be relaxed to allow domains hypothesis of radial symmetry, Finn and Luli were
with corner points. An approximation procedure able to prove the existence of solutions with
yields an existence theorem for such cases, however prescribed mass in a semi-infinite cylinder closed at
the uniqueness proof then fails; it can be replaced by the bottom, in the range 0 < , and uniqueness
a weaker result, estimating the difference between if 0 =2. Note that in this case, values >
two eventual solutions: Let u, v, be solutions of [32] =2 are not excluded. For large enough mass, the
in a piecewise smooth domain , and suppose surface will always cover the base of the tube.
Tu Tv on = @ except at the corner points,
where no data are prescribed. Then
Closing Remarks
u v
=0 34
This brief survey is intended only as a general
throughout .
indication of the current state of the theory; much
Note that in this result, no growth condition is
material of interest could not be included. Nor have
imposed at the corner points. It can happen that
we addressed hysteresis effects on contact angle.
both u and v are unbounded at a corner point;
Detailed references to the material discussed and also
nevertheless, [34] holds uniformly over .
to further information can be found in the articles
The solutions of [32] emulate many of the
listed below. More recent publications can be located
characteristics of solutions of [16]. Notably, there is
by following links in MathSciNet or Zentralblatt.
again a dichotomy of behavior, depending on open-
ing angle 2 at a corner point, with all solutions
either bounded, or unbounded with growth like 1/r.
Acknowledgment
I owe a special debt of thanks to my colleague
The Equations II Paul Concus, who read the material in detail and
provided many effectual suggestions, leading to a
If in addition to taking account of the change of density
much-improved exposition.
with height, one accounts for the energy change due to
expansion or contraction of volume elements with See also: Compressible Flows: Mathematical Theory;
changing density, one is led to the equation Interfaces and Multicomponent Fluids; Newtonian Fluids
0
p0
gu and Thermohydraulics.
div Tu e 1
g1 cos ! 35 Further Reading
Here the changes from the incompressible case are References for text material and for further reading are cited in
much more significant than for [32]. In order to the expository articles:
ensure stable behavior of solutions, it seems appro- Finn R (2002a) Milan Journal of Mathematics 70: 123.
priate to impose the condition 0 >
p0 . The general Finn R (2002b) Mathematical Intelligencer 24: 2133.
446 Cauchy Problem for Burgers-Type Equations
The standard classical questions concerning From references one can deduce the following gene-
Cauchy problems [1], [3] and [2], [4], namely ral properties of Cauchy problems [1], [3] and [2], [4].
those relating to existence, unicity, regularity, and
Theorem 0 Under Assumption 1, we have:
conservation laws are well established (see Oleinik
(1959), and Serre (1999)). This section formulates (i) There exists a unique (weak) solution f(x, t), x 2
only those which are essential for the study R, t 2 R of the problem [1], [3]; this solution is
of asymptotic behavior of solutions f(x, t) and necessarily smooth for t > 0; besides, it satisfies
F(x, t), when t ! 1 or " ! 0, and of the relation the following conservation laws for t > 0:
between vanishing viscosity and difference scheme
f x; t ! ; x ! 1
approximations for inviscid Burgers type
equations. f x; t ! ; x ! 1
One can see that asymptotic behavior of solutions Z 1 Z 0
d
of [2], [4] when " ! 0 is not the same as the f x; t dx f x; t dx
dt 0 1
asymptotic behavior of [1], [3] when " ! 0, in Z
spite of fact that in the limiting case " = 0 both [1] ydy
and [2] look identical. It can be explained by the
fact that eqn [2] can be interpreted as a semidiscrete Moreover, if the initial value f(x, 0) is nonde-
approximation of the nonconservative (nonphysical) creasing as a function of x, then solution f(x, t)
equation is nondecreasing as a function of x for all t 0.
@F @F " @2F (ii) There exists a unique solution F(x, t) x 2 R, t 2
F F 2 R of the problem [2], [4]; this solution is
@t @x 2 @x
smooth for t > 0; besides, it satisfies the follow-
However, the problem [2], [4] can be naturally ing conservation laws for t > 0 and 2 [0, 1):
transformed into conservative (physical) initial pro-
blem. Indeed, the substitution Fk" "; t ! ; k ! 1
Z F Fk" "; t ! ; k ! 1
dy " #
f 1 Z Z Fk"";t
0 y
d X dy X0
dy
dt k1 Fk"";t y k1 y
(under condition of integrability of 1=(y)) trans-
forms [2] into the equation
Gelfands problem admits natural extension for N-wave has been obtained by Dafermos (1977)
eqn [2] with the initial conditions and Liu (1978).
For the case of a general (f ), in particular, for
Fx; 0 ; if x > x the case of nonincreasing (f ), we need the notion
8
Fx; 0 F0 x; if x 2 x ; x of shock profile. Following Serre (1999), three
definitions can be introduced.
Let us Rintroduce, for u 2 [ , ], the function
u Definition The initial problem [1], [3] (correspond-
(u) = (y)dy. Let the function (u), u 2
[ , ], be upper bound of the convex set ingly, [2], [4]) admits ( , )-shock profile ( < )
if there exists a traveling-wave solution of this equation,
fu; v: v u; u 2 ; g that is, of the form f = f(x ct) (correspondingly,
F = F(x Ct)), such that f(x) ! when x ! 1
By Assumption 1, the set s = {u 2 [ , ]:
(correspondingly, F(x) ! when x ! 1).
(u) < (u)} is the finite union of intervals,
s= ( , 0 ) [ (1 , 1 ) [ (L , ), where = 0 From the results of Gelfand (1959) and Oleinik
0 1 < 1 L L = . (1959), it follows that initial problem [1], [3] admits
Let us define the function f(x, t) by ( , )-shock profile iff
8 Z
< ; if x < t 1
^f x; t ^0 1 x=t; if t x t c ydy
:
; if x > t Z u
1
0 < ydy; 8u 2 ; 9
where in the case (u) l , u 2 (l , l ), l = 0, u
0
1, . . . , L; also, by definition, ( )( 1) (l ) = [l , l ]. From the results of Henkin and Polterovich
Theorem 1 (Gelfand) The solution f (x, t) of the (1991) and Belenky (1990), it follows that initial
problem [1], [7] for the case " = 0 and initial problem [2], [4] admits ( , )-shock profile iff
conditions f (x, 0) = , if x > 0, has the explicit Z
form: f (x, t) = f(x, t). 1 1 dy
C y
The analogous statement is valid also for the Z u
1 dy
problem [2], [8] if, in the construction above, one >
; 8u 2 ; 10
u u y
takes
Z u In the case " = 0, the equality in [9] and [10] is
dy
u called the RankineHugoniot condition, the inequal-
0 y ity in [9] and [10] is called the entropy condition (or
instead of (u), u 2 [ , ]. the GelfandOleinik condition).
The Gelfand problem for [1], [3] and [1], [7] with Definition For initial problem [1], [3] (correspond-
monotonic (f ) was solved by Iljin and Oleinik ingly, [2], [4]) admitting ( , )-shock profile and
(1960). In the case = , the solution of this for " = 0, we will call by shock waves the weak
problem follows from an earlier work of Lax (1957). solutions of [1], [3] (correspondingly, [2], [5], [4]) of
For the case of linear (f ), the solution of this problem the form
follows from an earlier work of Hopf (1950).
For semidiscrete initial problems [2], [4] and [2], ~f x ct ; if x ct
[8], the analog of the asymptotic results of Hopf and
IljinOleinik have been obtained and applied by ~ x Ct ;
F if x Ct
Henkin and Polterovich (1991).
The case of increasing (f ) has been studied in where c, C satisfy RankineHugoniot and entropy
detail. In this case, for both initial problems [1], [3] conditions [9], [10].
and [2], [4], there is uniform convergence of solutions Definition The ( , )-shock profile for [1] (cor-
f (x, t) and F(x, t) to the so-called rarefaction profile respondingly, for [2]) is called strict if in addition to
[9], [10] we have the Lax (1954) condition:
; x > t
gx=t 1
x=t; x 2 t; t < c < 11
t ! 1 (see Iljin and Oleinik (1960) and Henkin and correspondingly
and Polterovich (1991)). More precise result in
this case about convergence to the so-called < C < 12
The ( , )-shock profile for [1] or [2] is called The values of d0 and D0 are determined by
semicharacteristic if one of the inequalities in [11] or Z d0 Z 1
[12] is strict and the other is an equality. This profile f x; 0 dx f x; 0 dx 0
is called characteristic if both inequalities in [11] or 1 d0
Z D0 Z
[12] are equalities. 1
Fx; 0 dx Fx; 0 dx 0
One can check (Iljin and Oleinik 1960, Henkin and 1 D0
Polterovich 1991) that if in addition to Assumption 1
the function on [ , ] is nonconstant and
nonincreasing then eqn [1] (correspondingly, [2])
admits a strict ( , )-shock profile. Remarks
The main result of IljinOleinik (1960) for eqn [1] (i) The statements of Theorem 2 give a positive
and analogous statement of Henkin and Polterovich answer to Gelfands question for the case of
(1991) for eqn [2] can be presented as follows. initial problem [1], [3] and [2], [4], admitting
Theorem 2 strict shock profiles.
(ii) For linear (f ) = a bf , a > 0, a b > 0,
(i) Let the initial problem [1], [3] admit a strict b < 0, the traveling waves f, F for [1], [3] and
( , )-shock profile f. Let f (x, t), x 2 R, t 2 [2], [4] can be found explicitly:
R , be a solution of [1], [3]. Then there exists
d0 2 R
~f
sup jf x; t ~f x ct d0 j ! 0; t ! 1 13 1 expfpx ctg
x2R b b
c a ; p
The value of d0 is determined uniquely by relation 2 2"
Z 1 ~
F
ff x; 0 ~f x d0 g dx 0 1 expfPx Ctg
1
a b a b
(ii) Let the initial problem [2], [4] admit a strict C b ln
; P ln
a b " a b
( , )-shock profile F. Let F(x, t), x 2 R, t 2
R be a solution of [2], [4]. Then there exists where
continuous function D0 (), 2 [0, 1), such that
a b
~ Ct D0 fx="gj ! 0;
sup jFx; t Fx b b 1
a b
x2R 14
t!1 (iii) For initial problems [1], [7] and [2], [8], >
The function D0 (), 2 [0, 1], is determined , the asymptotic convergence statements
uniquely from relation [13][15] admit the precise asymptotic esti-
mates (see Iljin and Oleinik (1960) for [1], [7]:
X
1
~ D0 g 0
fFn; 0 Fn
k1 sup jf x; t ~f x ct d0 j Oet
x2R 16
where > 0; " > 0
Z A
dy ~<A
F ; F < A; F ~ Ct D0 fx="gj Oet
sup jFx; t Fx
F y
x2R
(iii) If in conditions (i) and (ii), we take " = 0 then > 0; " > 0 17
there exist d0 , D0 such that 8 > 0, we have
sup j Fx; tj ! 0; t!1 Theorem 2(i) is proved basing on the following
xCtD0 idea. Let f satisfy the initial problem [1], [3] and let
Remark For the cases of nonstrict shock profiles Asymptotic Behavior of Solutions of
(characteristic or semicharacteristic) the statements Generalized Burgers Equations
of Theorem 2 are not valid. The reason is that,
under initial conditions [3], [4] for any d0 and D0 , The main current interest and the main difficulty in
we have the study of Gelfands problem for generalized
Z 1 Burgers equations consist in the following question
ff x; o ~f x d0 gdx 1 formulated explicitly for initial problem [1], [3] by
1 Liu et al. (1998): In the Cauchy problem there is
the question of determining the location of viscous show, on the contrary, that characteristic shock
shock waves. A similar question and related profiles and, as a consequence, the behavior of
conjecture were formulated by Henkin and Potter- initial problems [1], [3] and [2], [4] as in Theorem
ovich (1999) for the initial problem [2], [4]. 4 are rather a rule than an exception.
For solving this problem, it is important to solve it (ii) The statement of Theorem 4(i) (and also of
first for the Burgers type equations admitting Theorem 5(i)) below) disprove the Gelfand hope
nonstrict shock profiles. that the main term of asymptotic (t ! 1) of
f (x, t), satisfying [1], [7], coincides with the
Theorem 4 (Henkin–Shananin–Tumanov).
solution of [1], [7] for = 0 with the same
(i) Let the initial problem [1], [3] admit the nonstrict initial condition. Indeed, in conditions of Theorem
( , )-shock profile [9] and f(x ct) be a 4, we have ( ) = c or ( ) = c, but 0 ( ) 6
corresponding traveling-wave solution. Let 0 ( ); then for any > 0 the traveling wave
f(x ct 0 ln t d0 ) for [1], [3], concentrated
0 6 0; if c
near the point x (t) = ct 0 ln t d0 , moves
0 6 0; if c away (t ! 1) from the shockwave for [1], [7] for
= 0, concentrated near the point x0 (t) = ct
Let f (x, t) be a solution of [1], [3]. Then there
o( ln t), where o( ln t)= ln t ! 0, t ! 1.
exist constants 0 and d0 such that
(iii) Theorem 4 (and also Theorem 5 below) also
sup jf x; t ~f x ct 0 ln t d0 j ! 0; t!1 illustrate another interesting phenomenon: for
x2R
the case 0 ( ) 6 0 ( ), one has asymptotic
where convergence of the solution of [1], [3] (corre-
spondingly of [2], [4]) to the traveling
0
8 wave f(x ct 0 ln t d0 ) (correspondingly
0
< 1= ;
> if > c F(x Ct 0 ln t D0 )), which does not
0
1= ; if c > satisfy eqn [1] or correspondingly eqn [2]. Such
>
:
1=0 1=0 ; if c a phenomenon was first discovered by Liu and
Yu (1997) in the special boundary-value pro-
(ii) Let the initial problem [2], [4] with = 1 admit the
blem for the classical Burgers equations, if
nonstrict ( , )-shock profile [10] and F(n Ct)
u(x, t) satisfies the following conditions:
be a corresponding traveling-wave solution. Let
0 6 0; if C if ut u ux uxx ; u0; t 1; u1; t 1;
x
0 6 0; if C ux; 0 th ; then
2
Let F(n, t) be a solution of [2], [4]. Let 1
jux; t th x ln1 tj ! 0; t ! 1; x 0
def 2
Fn; 0 Fn; 0 Fn 1; 0 0
The proof of Theorem 4 is based on the following
Then there exist constants 0 and D0 such that idea. Let f (x, t) satisfy [1], [3] and F(n, t) satisfy [2],
~ Ct 0 ln t D0 j ! 0;
sup jFn; t Fn [4]. Let f(x ct) be the traveling wave for [1], [3]
n2Z and F(n Ct) be the traveling wave for [2], [4].
t!1 Suppose that ( ) > c = C = ( ). Let dA (t) and
DA (t), A > 0 be functions such that
where
Z ctApt
0 ~
8
C=20 ; if > C p ff x; t f x ct dA tgdx 0 19
>
> ctA t
>
< C=20 ; if C >
and, correspondingly,
>
> C=21=0
>
:
1=0 ;
if C p
X t
CtA
~ Ct DA tg
fFk; t Fk
p
kCtA t
Remarks p p p
Ct A t Ct A tFCt A t 1; t
(i) One could think that nonstrict shock profiles p
~
FCt A t 1 Ct DA t 0
as in Theorem 4 can appear only in exceptional
cases. But Proposition 2 and Theorem 5 below 20
The relations [9], [20] can be called localized initial problems [1], [3] and [2], [4] and some
conservation law. The proof contains two difficult partial results which confirm this conjecture. To
parts. p simplify formulation we admit the following.
The first part consists in
pproving
that for A > 2 c
Assumption 2 Let (u) and (u) be upper bounds of
(correspondingly, A > 2 C) the following asymp-
the convex hulls for the graphs of
totics are valid:
Z u
ln t u ydy
dA t d0 o1; t ! 1
0
C ln t and
DA t D0 o1; t ! 1 Z
2 0
u
dy
21 u
y
where d0 , D0 are independent of A. respectively, with u 2 [ , ]. We suppose that
The second part gives the following convergence
statements: s fu 2 ; : u < ^ug
Z
x ; 0 [ 1 ; 1 [ L ;
sup ~
p p p ff y; t f y ct dA tg
x2ctA t;ctA t ctA t
where
dy ! 0; t!1
0 < 0 < 1 < 1 < < L < L
X n
sup fFk; t
p p or, correspondingly,
x2CtA t;CtA t kCtApt
^
~ Ct DA tg ! 0; t ! 1
Fk S fu 2 ; : u < ug
; b0 [ a1 ; b1 [ aM ;
The precise a priori estimates of local solutions of
[1], [2] play an important role in the proof. An where
example of such an estimate, also useful for further
results, is given below. a0 < b0 < a1 < b1 < < aM < bM
Proposition 1 Let, in eqn [2], C = (0) > 0, =
p In addition, we suppose that 0 (l ) 6 0, 0 (l ) 6
1, 0 0 (0) < 0 , x def
= (x Ct)= Ct . Let the func- 0, l = 0, 1, . . . , L or, correspondingly, 0 (am ) 6
tion F(x, t), defined in the domain 0 = {(x, t): a1 < 0, 0 (bm ) 6 0, m = 0, 1, . . . , M.
x < a2 }, a2 > 0, satisfy eqn [2],
Proposition 2 (Weinberger 1990, Henkin and
def
Fx; t Fx; t Fx 1; t 0 Polterovich 1999). Under Assumptions 1, 2, one has:
(i) If u 2 [ , ] n s and, correspondingly, u 2
jFx; tj p ; x; t 2 0 ; t t0
Ct [ , ] n S, then following functions are well
defined:
Then
8
> l ; if x < l t
B >
> 1
Fx; t ; x; t 2 0 ; t t0
x < x=t ; if l t x
Ct gl l1 t
where t >
>
>
: l1 ; if x > l1 t;
l 0; 1; . . . ; L
1 0
B B 0 a2 1 ln1 a2
d C and, correspondingly,
d minx a1 ; a2 x
8
> b ; if x < bm t
> m1
and B0 is an absolute constant.
x > < x=t; if bm t x
Gl am1 t
It is interesting to compare a priori estimate of t >
>
>
: am ; if x > am1 t;
Proposition 1 with some similar (but less precise) m 0; 1; . . . ; M
estimates in the theory of classical quasilinear
parabolic equations (Ladyzhenskaya et al. 1968). (ii) For any interval (l , l ) s and, correspond-
We will formulate now the general conjecture ingly, (am , bm ) S there exist traveling waves
concerning asymptotic behavior of solutions of fl (x cl t) for [1] with overfall (l , l ) and,
correspondingly, Fm (x Cm t) for [2] with over- (iii) For any solution F(n", t), n 2 Z, t 2 R , of initial
fall (am , bm ), where problem [2], [4], there exist shift-functions m (t):
Z l
1
m ln t O1 m t m ln t O1
cl ydy
l l l 0
l 0; 1; . . . ; L
m m < 1;
cl l ; l 0; . . . ; L 1
such that
cl l ; l 1; . . . ; L
~
sup jFn"; t Fn"; t; 0 ; 1 ; . . . ; M j ! 0;
and, correspondingly, n2Z
Z bm t!1
1 dy
C1
m
bm am am y (iv) Moreover, in (iii) one can take
Cm bm ; m 0; . . . ; M 1
m m
Cm am ; m 1; . . . ; M Cm
Conjecture (Henkin and Polterovich 1994, 1999, bm am
8
Henkin and Shananin 2004). Let > 1
>
> 0 ; if m 0 < M; a0 6 b0
>
> b m
>
>
~f x; t; 0 ; . . . ; L < 1 1
; if 0 < m < M
X
L L1
X X
L1 >
> 0 am 0 bm
x >
>
~f x c t " t g l >
> 1
l l l l >
: 0 ; if m M > 0; aM 6 bM
l0 l0
t l0 am
X
L
The main result confirming formulated conjec-
l ; L1
l1
tures is the following.
~
Fn"; t; 0 ; . . . ; M Theorem 5 (Henkin and Shananin). Conjecture
X
M X
M 1
n" (i) for L = 1 and corresponding conjecture (iii) for
~ m n" Cm t "m t
F Gm M = 1 are true, that is,for solution of initial problem
m0 m0
t [1], [3] there exist shift functions l (t) = O (ln t) such
X
M 1 X
M that for t ! 1 we have
bm am ; M1 8
m0 m1 < ~f 0 x c0 t "0 t; if x c0 t
f x; t7! 1 x=t; if c0 t x c1 t
Then under Assumptions 1, 2, the following state- :~
ments are valid: f 1 x c1 t "1 t; if x c1 t
(i) For any solution f (x, t), x 2 R, t 2 R , of ini- and for solution of initial problem [2], [4] there exist
tial problem [1], [3], there exist shift-functions l (t): shift functions m (t) = O(ln t) such that for t ! 1
we have
l ln t O1 l t l ln t O1 8
0 l l < 1; l 0; 1; . . . ; L > ~ n" C0 t "0 t; if n" C0 t
F
>
< 01
n"=t; if C0 t n"
such that Fn"; t7!
>
> C1 t
:~
F1 n" C1 t "1 t; if n" C1 t
sup jf x; t ~f x; t; 0 ; 1 ; . . . ; L j ! 0;
x2R
The proof of Theorem 5 is of the same nature as
t!1 the proof of Theorem 4.
(ii) Moreover, in (i) one can take Remarks
l l (i) The proof of stronger Conjectures (ii) and (iv)
"
for L = 1 or M = 1 are in preparation.
8
l l
> 1 (ii) The numerical results, Rykova and Spivak (pre-
>
> ; if l 0 < L; 0 6 0
>
> 0
print, 2004), confirm conjecture (iii) for M = 2.
>
< 1 l 1 (iii) The results of Weinberger (1990) and Henkin
0 ; if 0 < l < L
>
> 0
l l and Polterovich (1999) confirm convergence
>
> 1
>
> statements of Conjectures (i), (iii) for all L and
: 0 ; if l L > 0; L 6 L
l M, but only on the intervals of rarefaction
profiles: $x \in [\varphi(\beta_l)t, \varphi(\alpha_{l+1})t]$ or, correspondingly, $x \in [\varphi(b_m)t, \varphi(a_{m+1})t]$, $t > 0$.

The problem of finding the asymptotics ($t \to \infty$) of solutions of (viscous) conservation laws was originally posed not only for generalized Burgers equations but also for systems of conservation laws in one spatial variable (see Gelfand (1959)). In this direction many important results on the existence and asymptotic stability of viscous shock profiles (continuous and discrete) have been obtained and applied (see Benzoni-Gavage (2004), Lax (1973), Serre (1999), Zumbrun and Howard (1998) and references therein). Results of the type of Theorems 4 and 5 have not yet been obtained for systems of conservation laws.

It is also very interesting to study the asymptotic behavior of scalar (viscous) conservation laws in several spatial variables (continuous or discrete), building on the asymptotic properties of Burgers type equations. In this direction there have been several important results and problems (see Bauman and Phillips (1986), Henkin and Polterovich (1991), Hoff and Zumbrun (2000), Serre (1999), Weinberger (1990), and references therein).

Further Reading

Bauman P and Phillips D (1986b) Large-time behavior of solutions to a scalar conservation law in several space dimensions. Transactions of the American Mathematical Society 298: 401–419.
Belenky V (1990) Diagram of growth of a monotonic function and a problem of their reconstruction by the diagram. Preprint, CEMI Academy of Science, Moscow, 1–44 (in Russian).
Benzoni-Gavage S (2002a) Stability of semi-discrete shock profiles by means of an Evans function in infinite dimension. J. Dyn. Diff. Equations 14: 613–674.
Burgers JM (1940) Application of a model system to illustrate some points of the statistical theory of free turbulence. Proc. Acad. Sci. Amsterdam 43: 2–12.
Dafermos CM (1977) Characteristics in hyperbolic conservation laws. A study of structure and the asymptotic behavior of solutions. In: Knops RJ (ed.) Nonlinear Analysis and Mechanics: Heriot–Watt Symposium, vol. 17, pp. 1–58. Research Notes in Mathematics, London: Pitman.
Gelfand IM (1959) Some problems in the theory of quasilinear equations. Usp. Mat. Nauk 14: 87–158 (in Russian). ((1963) American Mathematical Society Translations 33).
Harten A, Hyman JM, and Lax PD (1976) On finite-difference approximations and entropy conditions for shocks. Communications on Pure and Applied Mathematics 29: 297–322.
Henkin GM and Polterovich VM (1991) Schumpeterian dynamics as a nonlinear wave theory. Journal of Mathematical Economics 20: 551–590.
Henkin GM and Polterovich VM (1999) A difference-differential analogue of the Burgers equation and some models of economic development. Discrete and Continuous Dynamical Systems 5: 697–728.
Henkin GM and Shananin AA (2004) Asymptotic behavior of solutions of the Cauchy problem for Burgers type equations. Journal de Mathématiques Pures et Appliquées 83: 1457–1500.
Henkin GM, Shananin AA, and Tumanov AE (2005) Estimates for solutions of Burgers type equations and some applications. Journal de Mathématiques Pures et Appliquées 84: 717–752.
Hoff D and Zumbrun K (2000) Asymptotic behavior of multidimensional viscous shock fronts. Indiana University Mathematics Journal 49: 427–474.
Hopf E (1950) The partial differential equation $u_t + u u_x = \mu u_{xx}$. Communications on Pure and Applied Mathematics 3: 201–230.
Iljin AM and Oleinik OA (1960) Asymptotic behavior of the solutions of the Cauchy problem for some quasilinear equations for large values of time. Mat. Sbornik 51: 191–216 (in Russian).
Ladyzhenskaya OA, Solonnikov VA, and Uralceva NN (1968) Linear and Quasilinear Equations of Parabolic Type. Amer. Math. Soc. Transl. Monogr. vol. 23. Providence, RI.
Landau LD and Lifschitz EM (1968) Fluid Mechanics. Elmsford, NY: Pergamon.
Lax PD (1954) Weak solutions of nonlinear hyperbolic equation and their numerical computation. Communications on Pure and Applied Mathematics 7: 159–193.
Lax PD (1957) Hyperbolic systems of conservation laws, II. Communications on Pure and Applied Mathematics 10: 537–566.
Lax PD (1973) Hyperbolic systems of conservation laws and the mathematical theory of shock waves. Conference Board of the Mathematical Science, Monograph 11. SIAM.
Levi D, Ragnisco O, and Bruschi M (1983) Continuous and discrete matrix Burgers hierarchies. Nuovo Cimento 74: 33–51.
Liu T-P (1978) Invariants and asymptotic behavior of solutions of a conservation law. Proceedings of the American Mathematical Society 71: 227–231.
Liu T-P, Matsumura A, and Nishihara K (1998) Behaviors of solutions for the Burgers equation with boundary corresponding to rarefaction waves. SIAM Journal on Mathematical Analysis 29: 293–308.
Liu T-P and Yu S-H (1997) Propagation of stationary viscous Burgers shock under the effect of boundary. Archive for Rational Mechanics and Analysis 139: 57–92.
Oleinik OA (1959) Uniqueness and stability of the generalized solution of the Cauchy problem for a quasi-linear equation. Usp. Mat. Nauk 14: 165–170. ((1963) American Mathematical Society Translations 33).
Serre D (1999) Systems of Conservation Laws, I. Cambridge: Cambridge University Press.
Serre D (2004) $L^1$-stability of nonlinear waves in scalar conservation laws. In: Dafermos C and Feireisl E (eds.) Handbook of Differential Equations, pp. 473–553. Elsevier.
Weinberger HF (1990) Long-time behavior for a regularized scalar conservation law in the absence of genuine nonlinearity. Annales de l'Institut Henri Poincaré (C) Analyse Non Linéaire.
Zumbrun K and Howard D (1998) Pointwise semigroup methods and stability of viscous shock waves. Indiana University Mathematics Journal 47: 63–185.
Cellular Automata
M Bruschi, Università di Roma "La Sapienza", Rome, Italy
F Musso, Università Roma Tre, Rome, Italy
© 2006 Elsevier Ltd. All rights reserved.

What is a Cellular Automaton?

Cellular automata (CAs) were first introduced by J von Neumann in his investigation of complexity, following an inspired suggestion by S Ulam. In the last 50 years they have been investigated and used in a number of fields, and so many different terminologies have been used by researchers that it is now difficult even to give a precise general definition of a CA. Thus, some definitions and approximations are in order.

First a broad definition:

1. there is a number of cells (boxes);
2. at any (discrete) time step, any cell can present itself in a certain state among a finite number of different states;
3. the state of any cell can change (evolve) from a time step to the subsequent time step; and
4. there is a rule (evolution law, EL) which determines this transition.

Note that the number of cells can be finite or infinite; the cells can be arranged on a line, on a surface, in the ordinary three-dimensional (3D) space, or possibly in a hyperspace (in any case, the cells can be numbered); the different states of a cell can be denoted by integer numbers but, in different contexts of application of CAs, different imaginative pictures have also been used (e.g., different colors, dead and living cells, number of balls in a box, etc.); the evolution of a CA proceeds in finite time steps (time is also discrete); the EL, provided that it is effective on any possible configuration of a given CA (computability), is otherwise completely arbitrary (indeed, there are not only deterministic and probabilistic ELs, but also those that evolve in time following a meta-EL, which in turn can be deterministic or probabilistic).

Consider some examples of CAs.

Example 1 (CA1) Consider a linear array of seven boxes (cells; one can number them c(i), i = 1, 2, ..., 7). Each box can be empty or it can contain a ball (so there are just two states for each cell). Given a configuration of this CA at time t, what happens at time t + 1 (EL)?

(i) the state of the first box c(1) never changes;
(ii) for each other box c(i), i = 2, 3, ..., 7:
(iia) if the box is empty and the box on its left is empty then put a ball in the box;
(iib) if there is a ball in the box and also there is a ball in the box on its left then empty the box.

An example of the evolution of such a rather trivial CA is given in Figure 1.

[Figure 1: the cells c(1), ..., c(7) shown at the times t = 0, 1, ..., 7.]
Figure 1 A seven time-step evolution of CA1 starting from a given ID (t = 0). Note that a stable configuration has been reached at t = 6.

A more precise notation can now be established. First, let us denote the state of a cell at time t by a state function, say S. According to point 2 of the broad definition above, the number of possible states is arbitrary but finite: denote this number by the positive integer M (M > 1). Then S takes values in the finite set $Z_M = \mathbb{Z}/M\mathbb{Z} = \{0, 1, 2, \ldots, M - 1\}$, a ring that is a field when M is prime (in plain words, we have denoted the M states for the CA by the first M non-negative integers). Different cells can be labeled with a progressive number: c(n), $n = n_1, n_1 + 1, \ldots, n_2 - 1, n_2$; possibly, in case of an infinite number of cells, one has $n_1 \to -\infty$ and/or $n_2 \to \infty$. In the case of $n_1 = -\infty$, $n_2 = \infty$, one speaks of a unidimensional CA. Of course, the field S depends on n as well as on time (remember that, for a CA, time is a discrete variable: t = 0, 1, 2, ...). The field S(n, t) describes completely the CA. If the EL is deterministic, then one can determine (compute) S(n, t) step by step for t > 0 from the initial configuration S(n, 0) (initial datum, ID). Consider only static ELs, namely those that do not change in time. A further distinction can be made: there are ELs such that the future state of the generic cell, S(n, t + 1), depends on the whole current configuration of the CA (these are called nonlocal ELs) and there are ELs for which S(n, t + 1) depends only on the current state of a finite number, say N, of cells (local ELs):

$\{S(n + k_i, t)\},\ i = 1, 2, \ldots, N;\ k_i \in \mathbb{Z} \;\Rightarrow\; S(n, t + 1)$   [1]

Note that, in principle, the set of cells that determine, according to the EL, the future state of the generic cell n, could depend on n, namely one can have N = N(n), as well as $k_i = k_i(n)$, i = 1, 2, ..., N(n) (see CA2 below). In any case, such a set of cells is called the interaction set (IS). Moreover, the distance from the cell n of the farthest cell in the IS is called the range R (of the interaction): $R = \max(|k_i|)$. If IS ≡ {c(n − R), c(n − R + 1), ..., c(n), ..., c(n + R − 1), c(n + R)}, then this IS is called a neighborhood of range R. It is, moreover, clear that, for a unidimensional CA, there exists at least one infinite subset of cells that have the same state. If there is only one such subset, then it is called the vacuum set and the state of its cells is called the vacuum state: let V denote the value of this state ($0 \le V < M$, $S(n, t) \to V$ as $n \to \pm\infty$).

Example 2 (CA2) An example of CA with n-dependent IS (M = 2, R = 3, V = 0). This is the EL: the cell c(n) changes its state (0 → 1, 1 → 0) iff

(i) n is even and at least one of the two cells on its left is not in the vacuum state;
(ii) n is odd and one or three of the three cells on its right are not in the vacuum state.

An example of the evolution of such a CA is given in Figure 2.

Usually, only a subclass of ELs is considered for which the phenomenon of vacuum excitation cannot occur. Namely, during the evolution of the CA, an infinite subset of the vacuum set cannot change its state in just one time step. In other words: if the set of cells starting from the first cell and ending with the last one for which S(n, t) ≠ V is called the population set (PS), then the PS is a finite set at each time.

Of course, one can easily devise an EL for which this is not true; nevertheless, the EL itself is still valid (computable), for instance,

Example 3 (CA3) This is a unidimensional CA, namely there are infinitely many cells on a line ($n \in \mathbb{Z}$). The cells have M states and V = 0; the EL reads: the state of each cell cycles in the set of available states

0 → 1, 1 → 2, ..., M − 2 → M − 1, M − 1 → 0

Note that the range R is zero and there is a vacuum excitation; nevertheless, the EL is effective.

Deterministic, static, and local ELs that do not give rise to vacuum excitation are called normal ELs (NELs).

Since M, N are finite for an NEL, one can give the NEL itself as a table, considering every possible configuration of the IS and specifying the outcome for each configuration (note that there are $M^N$ possible configurations).

Example 4 (CA4) $n \in \mathbb{Z}$, M = 2, V = 0, IS ≡ {c(n), c(n 1), c(n 2)}, N = 3, R = 2. The EL is:

S(n, t)       0 0 0 0 1 1 1 1
S(n 1, t)     0 0 1 1 0 0 1 1
S(n 2, t)     0 1 0 1 0 1 0 1        [2]
S(n, t + 1)   0 1 1 0 1 1 0 1

An example of the evolution of such a CA is given in Figure 3.

However, these NELs can also be given in an alternative representation (more useful in view of the extensions of the concept of CA itself, see below). Namely, an NEL can be given as a discrete-time EL for the state function S(n, t) in the finite ring $Z_M = \{0, 1, 2, \ldots, M - 1\}$.
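The evolution law of Example 1 (CA1) is simple enough to be checked with a few lines of code. The following Python sketch applies rules (i), (iia), and (iib) synchronously to all seven boxes. The function names `step_ca1` and `evolve` and the all-empty initial datum are assumptions made only for illustration; in particular, the ID below is not claimed to be the one shown in Figure 1.

```python
def step_ca1(cells):
    """One synchronous time step of CA1 (0 = empty box, 1 = box with a ball)."""
    new = list(cells)  # rule (i): the first box c(1) is copied unchanged
    for i in range(1, len(cells)):
        here, left = cells[i], cells[i - 1]  # states at time t
        if here == 0 and left == 0:
            new[i] = 1  # rule (iia): both empty, put a ball in the box
        elif here == 1 and left == 1:
            new[i] = 0  # rule (iib): both occupied, empty the box
    return new


def evolve(cells, steps):
    """Return the list of configurations at t = 0, 1, ..., steps."""
    history = [list(cells)]
    for _ in range(steps):
        history.append(step_ca1(history[-1]))
    return history


if __name__ == "__main__":
    # Hypothetical initial datum: seven empty boxes.
    for t, row in enumerate(evolve([0] * 7, 7)):
        print(f"t={t}", "".join(str(s) for s in row))
```

With this particular ID the configuration alternates empty and full boxes from t = 6 onward and no longer changes, which illustrates how a finite CA with a deterministic EL can settle into a stable configuration.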
Figure 4 Four hundred and sixty-one time steps of CA5, starting from a randomly chosen PS of 50 cells.

Extensions
The concept of a CA is so simple that many extensions of the above-sketched definition of a CA can be easily devised. A (nonexhaustive) survey of such extensions follows.
5
Figure 9 A class-3 CA: M = 5, V = 0, R = 2, EL: S(n, t 1) =
2S(n 1, t) S(n 1, t) S(n, t)(S(n 1, t) S(n 2, t))
S(n 1, t)S(n 1, t).
In this extension, the state function S(n, t) is Thus, in a sense, vector CAs are still usual CAs
considered as a vector, namely S(n, t) with a complicated EL.
(S1 (n, t), S2 (n, t), . . . SL (n, t)), L being a positive inte-
Example 6 (CA6) A two-component vector CA:
ger. Each component Sl (n, t)(l = 1, 2, . . . , L) takes
values in a finite field, say ZMl = {0, 1, 2, . . . , Ml M
S1 n; t 1 1 S1 n; tS1 n 1; t
1}, and evolves, according to some EL, interacting
with the other components. Of course, one can give M1 1S2 n 1; tS2 n; t c1 8
separately the time evolution for each component; M
S2 n; t 1 2 S1 n 1; tS2 n; t
however, it is also possible to give a global
representation of a vector CA, introducing a global S1 n; tS2 n 1; t c2 9
Figure 11 A class-4 CA. Note the interacting moving structures on the left and on the right; note also the apparently chaotic behavior in the center; M = 2, V = 0, R = 2, EL (mod 2): S(n, t 1) = S(n, t) S(n 1, t) S(n 1, t)S(n 2, t).

The global behavior of this CA can be expressed, for example, through the global state function

$\tilde{S}(n, t) = M_2 S_1(n, t) + S_2(n, t)$   [10]

Multidimensional CA

Up to now we have considered CAs with a finite number of cells (finite CAs) or with an infinite number of cells arranged on a line (unidimensional CAs). Now we consider CAs with cells arranged on a surface, usually a plane (bidimensional CAs), or on 3-space (tridimensional CAs), or even on a hyperspace (multidimensional CAs). In any case, if the number of cells is finite, the evolution of such CAs, according to an NEL, must end up in a final cycle: this is due to the finiteness of the phase space (thus, these CAs should be classified as class 1; however, note that, if the phase space is large enough, the dynamics of such CAs can still be very rich). Usually, one considers an infinite number of cells tessellating the whole s-space, s = 2, 3, ... (e.g., squares or hexagons on the plane, cubic cells in 3-space). The changes in the previous notation and definitions are plain: for example, for a bidimensional CA, the state function depends now on two discrete space variables ($S(n_1, n_2, t)$, $n_1 \in \mathbb{Z}$, $n_2 \in \mathbb{Z}$); furthermore, there is a greater freedom in choosing a neighborhood of range R. The two most-used neighborhoods of range 1 are shown below:

[Diagram: von Neumann neighborhood.]

[Diagram: Moore–Conway neighborhood.]

The most famous (and interesting) bidimensional CA is Life, introduced by J H Conway, which is discussed next.

Example 7 (CA Life; Moore–Conway neighborhood, V = 0, M = 2). A cell in the vacuum state 0 is called dead; a cell in the state 1 is called alive. The EL is as follows:

(i) If a cell is dead at time t, it comes alive at time t + 1 if and only if exactly three of its eight neighbors are alive at time t (reproduction).
(ii) If a cell is alive at time t, it dies at time t + 1 if and only if fewer than two (loneliness) or more than three (overcrowding) neighbors are alive at time t.

Clearly, this is a totalistic NEL. Now considering the explicit form of $\sigma$ (see [6]):

$\sigma(n_1, n_2, t) = -S(n_1, n_2, t) + \sum_{k_1 = -1}^{1} \sum_{k_2 = -1}^{1} S(n_1 + k_1, n_2 + k_2, t)$   [12]

the above EL can be simply expressed as:

$S(n_1, n_2, t + 1) = \delta_{3,\sigma} + \delta_{2,\sigma}\, S(n_1, n_2, t)$   [13]

where $\delta_{3,\sigma}$ is the Kronecker symbol.

Life is a class-4 CA; it exhibits a rich variety of interesting structures: stable structures, oscillators (periodic structures), gliders and ships (moving structures), emitters and absorbers (namely, structures that, after a time period, reconstitute themselves, but meanwhile have emitted or absorbed moving structures). These structures are essential to prove that Life can be used to construct a universal Turing machine (see below). One can get a rough idea of such richness from Figure 14.

As in the previous case of vector CA, one could object that multidimensional CAs are also not true extensions of the unidimensional CAs. Indeed, since the whole set of cells is still a countable set, one could number the cells with just one discrete space variable (say $n \in \mathbb{Z}$). For example, in the case of a square tessellation of the plane, we could enumerate the cells in the plane starting from the origin as follows:

[Diagram: an enumeration of the cells of the square lattice by integers n, winding outward from the cell 0 at the origin.]

Thus, any multidimensional CA could in principle be viewed as a unidimensional one. Of course, one has to pay a price for this: ISs and ELs that are simple for a multidimensional CA become cumbersome for its unidimensional version and vice versa.

Higher Time Derivatives

Up to now, we have considered CAs whose evolved state S(t + 1) depends only on the state S(t), namely the state of the CA itself at the previous time step. In other words the EL involves just the first (discrete) time derivative (1 CA). One can easily extend all the previous definitions to consider higher-order discrete time derivatives (K CA). Of course, the ID and the IS for such a CA involve the state of the CA at K subsequent time steps.

An example of a unidimensional 2 CA is given below.

Example 8 (CA7) M = 3, V = 0, R = 1. The EL is:

$S(n, t + 1) = S(n - 1, t) + S(n, t - 1) + S(n + 1, t) \pmod{3}$   [15]

An example of the evolution of such a CA is given in Figure 15.
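The totalistic form of the Life rule can be turned into code almost literally. The Python sketch below computes the neighbor sum of the eight Moore–Conway neighbors and then applies the Kronecker-symbol formula, in which a cell is alive at t + 1 when the sum equals 3, or when it equals 2 and the cell is already alive. The names `sigma` and `life_step`, and the use of a finite grid with periodic (wrap-around) boundaries in place of the infinite plane, are assumptions made only for this illustration.

```python
def sigma(grid, i, j):
    """Number of living cells among the eight neighbors of cell (i, j)."""
    rows, cols = len(grid), len(grid[0])
    total = -grid[i][j]  # the double sum below also counts the cell itself
    for k1 in (-1, 0, 1):
        for k2 in (-1, 0, 1):
            # Assumption: periodic boundaries stand in for the infinite plane.
            total += grid[(i + k1) % rows][(j + k2) % cols]
    return total


def life_step(grid):
    """One time step of Life: S(t+1) = delta(3, sigma) + delta(2, sigma) * S(t)."""
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            s = sigma(grid, i, j)
            new[i][j] = int(s == 3) + int(s == 2) * grid[i][j]
    return new


if __name__ == "__main__":
    # A "blinker": three living cells in a row, an oscillator of period 2.
    blinker = [[0] * 5 for _ in range(5)]
    for j in (1, 2, 3):
        blinker[2][j] = 1
    for row in life_step(blinker):
        print("".join(str(s) for s in row))
```

Since the two Kronecker symbols can never both equal 1, the result is always 0 or 1, so the state stays in the set {0, 1} without any explicit reduction.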
(a) (b)
where
j
i 1; 2; . . . ; N j ; ki 2 Z
18b
j 0; 1; 2; . . . ; K 2
~ M
Sn;~t 1 ~
Sn;~t K 1
M 1F ~ Sn kji ;~t j K 2 19
Of course, more complicated invertible ELs can be Example 11 (CA10) A 1.5 CA, M = 2, V = 0, R = 3.
devised. Invertible ELs can be also easily devised for The EL is:
filter CA, for example, if an NEL for a filter CA
reads 2
Sn; t 1 Sn; t Sn 3; t 1Sn 2; t 1
Sn 2; tSn 3; t
M
Sn; t 1 Sn; t Sn 2; t 1Sn 1; t 1
~j ; t 1 Sn 1; tSn 2; t 24
FSn ki ; t; Sn k 22
Note that this EL is of the form [22]; therefore, it
where ki and k ~j are positive integers is invertible (see Figure 18a). According to [23], the
~ and F is an arbitrary
(i = 1, 2, . . . , N; j = 1, 2, . . . , N) inverse EL reads:
(polynomial) function, then it is invertible and
the inverse NEL reads 2
~Sn; ~t 1 Sn; ~t ~Sn 3; ~t 1~Sn 2; ~t 1
M
~ 2; tSn
Sn ~ 3; t
~
Sn; ~t 1 ~
Sn; ~t M 1
~Sn 2; ~t 1~Sn 1; ~t 1
F~
Sn ki ; ~t 1; ~ ~j ; ~t
Sn k 23 ~Sn 1; ~t~Sn 2; ~t 25
Note that [22] is computable starting from This CA exhibits a very rich dynamics: any
n = 1, whereas [23] is computable starting from complex ID rapidly decays in a great variety of coherent
n = 1. particle-like structures, steady or moving to the right or
Figure 18 CA10: (a) 230 time-step evolution, then the inverse EL is applied for 230 further time steps in order to recover the initial
configuration. (b) Collisions between different kinds of particle-like coherent moving structures. The last collision (on the right) is
a solitonic one: the interaction produces just a phase shift, preserving number, shape, and velocities of the involved particles.
(c) Particles moving with different velocities and interacting in complex ways (solitonic collisions, particle creations and annihilations).
(d) A particle goes through a nonhomogeneous medium and undergoes refraction by the medium itself.
to the left with different velocities. The interactions between different particles may be solitonic (the particles emerge unchanged but shifted) or annihilation–creation phenomena can occur (see Figures 18a–d).

Applications of CAs

CAs as Universal Constructors and Turing Machines

In the 1950s, von Neumann, who contributed to the development of the first computer (ENIAC), decided to work out a mathematical theory of automata. Such a theory was aimed at answering the following question: is it possible to build an automaton that allows universal computation (i.e., it embodies a universal Turing machine) and, moreover, is able to build (in order of decreasing generality)

1. an arbitrary automaton (universal constructor);
2. a copy of itself (self-reproducing); and
3. an automaton that is itself a universal Turing machine (constructor)?

The last question von Neumann intended to address was whether, in the process of automata self-reproduction (if possible), a process of evolution could take place, that is, whether a simpler automaton could generate a more complex one.

In the beginning, the idea of von Neumann was to describe, using mathematical axioms, an automaton moving inside a warehouse and selecting various elementary spare parts (e.g., muscles, switches, rigid girders) and then assembling them into a new automaton. While this original idea was very realistic, it was also very difficult to pursue, so that von Neumann, following a suggestion by Ulam, decided to consider his questions in the more abstract framework of CAs.

The particular CA he considered is an infinite square CA with 29 possible states. The transition rule depends on the cell to update and on its north, east, south, and west neighbor cells (the von Neumann neighborhood). Among the 29 possible states there is one state that is quiescent (the vacuum state). von Neumann proved the existence of a configuration of 50 000 cells immersed in a sea of quiescent states that embodies a universal Turing machine and that is a universal constructor. An infinite one-dimensional tape is used to store a description of the automaton to build. The universal constructor reads the description on the tape, develops a constructing arm that builds the configuration described on the tape in an unoccupied part of the cellular space, makes a copy of the tape and finally attaches it to the newly built automaton and retracts the constructing arm. When the tape stores a description of the universal constructor itself, the automaton self-reproduces. The total size of the self-reproducing automaton amounts to 200 000 cells. (Some computer simulations of von Neumann's self-reproducing automaton are available on the web.)

Since von Neumann's CA is a very complex one, it led researchers to think that a CA able to simulate a universal Turing machine should also be quite complex. The perspective changed completely after the introduction of the CA Life. Conway was looking for a simple CA with a possibly rich dynamics; however, it was subsequently realized that Life was much more complicated than anyone could have thought. Finally, thanks to the development of faster computers that allowed visualization of the evolution of quite large populations and through the contribution of a large number of researchers, it was proved that a universal Turing machine could be embedded in Life.

The discovery that even a simple CA such as Life could incorporate a universal Turing machine led to the question whether it could be possible to build a universal Turing machine inside a simple one-dimensional CA. This is indeed the case: up to now, the simplest CA capable of universal computation is the W110 CA (see Figure 10), as proved recently by Cook after a conjecture formulated by Wolfram in 1985.

CAs for Computer Simulations

One of the major applications of CAs is the computer simulation of various dynamical processes. Even if CAs were not invented for this purpose, they possess peculiarities that make them particularly suitable for this task. The main advantage of using a CA for a dynamical simulation is its completely discrete nature, which allows exact simulations on a computer. Thus, any spurious effect due to rounding errors is ruled out. Another advantage is that the EL of a CA can be seen as a function between finite sets. For this reason, one can specify the EL through a lookup table (see [2]): then, when running the simulations, the computer has only to access the table instead of computing the function every time, which considerably shortens the computation time. Another great advantage of CAs in computer simulations is that, by their very nature (at least for local ELs), they can be implemented on parallel machines. These two concepts are at the basis of the dedicated computers for CA simulations developed by Toffoli, Margolus, and co-workers (CAM series). The possibility to use efficiently parallel computers for CA simulation could prove
CAs in Physics

Since Newton, physics has been described through differential equations and continuous functions. However, such a mathematical description is not fit for simulation on a computer, and some discretizations must be considered. First, one has to discretize space and time, passing from differential equations to (finite systems of) finite difference equations; second, one has to round off the values of the functions to store them in the memory of the computer. The main drawback of this procedure is that in chaotic systems such approximations can rapidly lead to great differences between the real and the simulated behavior. As already noticed, this problem does not appear in CAs. Thus, one would like to use this good characteristic of CAs in physical modeling, taking due account of the continuous nature of the physics involved. This requires attention and ingenuity in constructing reliable CA models for physical processes. For example, this goal has been achieved in the so-called lattice gas automata (LGAs).

LGAs are CA models for the microscopic dynamics of fluids and gases. The thermodynamic limit of these CAs yields the correct continuous functions for the macroscopic quantities (density, pressure, viscosity, etc.).

[Figure 20: panel (a); panel (b), with parts labeled "Collisions" and "Free flight".]
Figure 20 (a) An example of configuration for the HPP model. (b) Head-on collisions and three-particle collisions in the HPP model.

The first step toward LGAs was the discovery that the HPP model developed in the 1970s by Hardy, Pomeau and De Pazzis was in fact a CA. The HPP model describes the behavior of a fluid (or a gas) in a plane. The configuration space is given by a
Accordingly, only four bits nj (x, t), j = 1, 2, 3, 4, are nonlinear dynamical systems (nonlinear continuous
required to denote the presence (1) or the absence and discrete evolution equations, many-body pro-
(0) of a particle with velocity cj pointing vertex x at blems) could profitably be extended to find integr-
time t. The dynamical rule for HPP can be written in able CAs. Indeed, many such CAs have been found
the form that exhibit solitons and are endowed with non-
trivial conservation laws (of course, this is very
nj x cj ; t 1 nj x; t !j x; t 27 important in physical modeling). Moreover, the
above-cited similarity between certain CA behaviors
where term nj (x, t) on the right-hand side accounts and elementary particle physics phenomena suggests
for the free flight of particles, while !j (x, t) modifies that the fundamental structure of reality (at the Planck
the trajectories in the case of collisions. The !j are level) could indeed be that of a CA (cells of Plank
determined by the state of the system according to length, discrete time flow): attempts to construct this
the following rules: underlying CA physics have been pursued.
Jackson EA (1990) Perspectives of Nonlinear Dynamics. Cambridge: Cambridge University Press.
Toffoli T and Margolus N (1987) Cellular Automata Machines: A New Environment for Modeling. Cambridge: The MIT Press.
von Neumann J (1966) In: Burks AW (ed.) Theory of Self-Reproducing Automata. Urbana: University of Illinois Press.
Wolfram S (2002) A New Kind of Science. Champaign: Wolfram Media.
Central Manifolds, Normal Forms

applied to the Poincaré map of a periodic orbit). We concentrate on the simplification of the Taylor series. The general idea is to apply consecutive polynomial changes of variables; at each step we simplify terms of a degree higher than in the step before. The ideal simplification would be to put all higher-order terms to zero, which would (at least at the level of formal series) linearize the system. But as soon as there are resonances (see below), this is impossible: the planar system $2x\,\partial/\partial x + (y + x^2)\,\partial/\partial y$ cannot be formally linearized.

Setting

Let $X$ be a $C^{r+1}$ vector field defined on a neighborhood of $0 \in \mathbb{R}^n$, and denote $A = dX(0)$ (its linear approximation at 0). The Taylor expansion of $X$ at 0 takes the form

$X(x) = A x + \sum_{k=2}^{r} X_k(x) + O(|x|^{r+1})$

where $X_k \in H^k$, the space of vector fields whose components are homogeneous polynomials of degree $k$. The classical formal normal-form theorem is as follows. We define the operator $L_A$ on $H^k$ by putting $L_A h(x) = dh(x)\,A x - A\,h(x)$; one calls $L_A$ the homological operator. One checks that $L_A(H^k) \subset H^k$. One also denotes this by $\mathrm{ad}\,A(h)(x)$: see further in the Lie algebra setting. Let $R^k$ be the range of $L_A$, that is, $R^k = L_A(H^k)$. Let $G^k$ denote any complementary subspace to $R^k$ in $H^k$. The formal normal-form theorem states, under the above settings:

Theorem 3 (Chow et al. 1994, Dumortier 1991) There exists a composition of near identity changes of variables of the form

$x = y + \varphi_k(y)$   [1]

where the components of $\varphi_k$ are homogeneous polynomials of degree $k$, such that the vector field $X$ is transformed into

$Y(y) = A y + \sum_{k=2}^{r} g_k(y) + O(|y|^{r+1})$

where $g_k \in G^k$, $k = 2, \ldots, r$.

Sometimes this theorem is applied to the restriction of a vector field to its central manifold, for reasons explained in the last section. This is the reason why we did not assume $X$ to be $C^\infty$; in the latter case one can let $r \to \infty$ and obtain a normal form on the level of formal Taylor series (also called $\infty$-jets). Using a theorem of Borel, we infer the existence of a $C^\infty$ change of variables $\Phi$ such that the Taylor series of $\Phi_*(X)$ is $A y + \sum_{k=2}^{\infty} g_k(y)$. For practical computations, it is often appropriate to first simplify the linear part $A$ and to diagonalize it whenever possible. Hence, it is convenient to use a complexified setting and to use complex polynomials or power series. One can show that all involved changes of variables preserve the property of being a complex system coming from a real system, that is, at the final stage we can return to a real system (see, e.g., Arrowsmith and Place (1990) for a more precise mathematical description).

Hence, we can assume that $A$ is an upper triangular matrix. Let the eigenvalues be $\lambda_1, \ldots, \lambda_n$. It can be calculated that the eigenvalues of $L_A$, as an operator $H^k \to H^k$, are then the numbers $\langle \lambda, \alpha \rangle - \lambda_j$ where $\alpha \in \mathbb{N}^n$, $\sum_{j=1}^{n} \alpha_j = k$ and $1 \le j \le n$. Hence, if these are all nonzero then $R^k = H^k$, and then we have an ideal simplification, that is, all $g_k$ equal to zero. However, if such a number is zero, that is,

$\langle \lambda, \alpha \rangle - \lambda_j = 0$   [2]

it is called a resonance between the eigenvalues. In such a case, we have to choose a complementary space $G^k$. From linear algebra it follows that one can always choose

$G^k = \ker(L_{A^*})$   [3]

where $A^*$ is the adjoint operator. But this choice [3] is not unique and is, from the computational point of view, not always optimal, especially if there are nilpotent blocks. This fact has been exploited by many authors. A typical example is the case where $A = y\,\partial/\partial x$. On the other hand, if $A$ is semisimple we can choose the complementary space to be $\ker(L_A)$, so $L_A g_k = 0$; we can assume $A$ to be the (complex) diagonal$[\lambda_1, \ldots, \lambda_n]$. In that case we can be more explicit as follows. Let $e_j = \partial/\partial x_j$ denote the standard basis on $\mathbb{C}^n$. For a monomial $x^\alpha e_j$ one can calculate that

$L_A(x^\alpha e_j) = (\langle \lambda, \alpha \rangle - \lambda_j)\, x^\alpha e_j$   [4]

If the latter is zero, then the monomial is called resonant. This implies that the normal form can be chosen so that it only contains resonant monomials.

Putting a system into normal form not only simplifies the original system, it also gives more geometric insight on the Taylor series. To be more precise, suppose (for simplicity, this can be generalized (Dumortier 1997)) that $A$ is semisimple. One can calculate that the condition $L_A g_k = 0$ implies: $\exp(-At)\, g_k(\exp(At)\,x) = g_k(x)$ for all $t \in \mathbb{R}$. This means that $g_k$ is invariant for the one-parameter group $\exp(At)$. A typical example in the plane is: $A$ has eigenvalues $i$, $-i$. Note that the (only) resonances are $\langle (i, -i), (p+1, p) \rangle - i = 0$ and
$\langle (i, -i), (p, p+1) \rangle + i = 0$ for all $p \in \mathbb{N}$. We suppose that the original system was real, that is, on $\mathbb{R}^2$; we can choose linear coordinates such that for $z = x + iy$, $\bar z = x - iy$ the linear part is $A = \mathrm{diagonal}[i, -i]$. Applying the remarks above, we conclude that the normal form only contains the monomials $(z\bar z)^p z\,\partial/\partial z$ and $(z\bar z)^p \bar z\,\partial/\partial \bar z$. The geometric interpretation here is that these monomials are invariant for rotations around $(0, 0)$. This can also be seen on the real variant of this: the Taylor series of the (real) normalized system has the form $f(x^2 + y^2)(x\,\partial/\partial y - y\,\partial/\partial x) + g(x^2 + y^2)(x\,\partial/\partial x + y\,\partial/\partial y)$ and is invariant for rotations.

Warning: the dynamic behavior of a formal normal form in the central manifold can be very different from that of the original vector field, since we are only looking at the formal level. A trivial example is (take $f = g = 0$ in the foregoing example) $X(x, y) = (x\,\partial/\partial y - y\,\partial/\partial x) + \exp(-1/x^2)\,\partial/\partial x$, where orbits near $(0, 0)$ spiral to $(0, 0)$, whereas the normal form is just a linear rotation. This difference is due to the so-called flat terms, that is, the difference between the transformed vector field and a $C^\infty$-realization of its normalized Taylor series (or polynomial). In case of analyticity of $X$, one can ask for analyticity of the normalizing transformation $\Phi$. Generically, this is not the case in many situations. The precise meaning of this genericity condition is too elaborate to explain in this brief review article. We provide some suggestions for further reading in the next section. One could roughly say that, in the central manifold, the normal form has too much symmetry and is too poor to model more complicated dynamics of the system, which can be hidden in the flat terms. To quote Ilyashenko (1981): "In the theory of normal forms of analytic differential equations, divergence is the rule and convergence the exception ...".

In many applications, we want to preserve some extra structure, such as a symplectic structure, a volume form, some symmetry, reversibility, some projection etc.; the case of a projection is important since it includes vector fields depending on a parameter. Sometimes a superposition of these structures appears (e.g., a family of volume-preserving systems). We would like the normal-form procedure to respect this structure at each step. One can often formulate this in terms of vector fields belonging to some Lie subalgebra $L_0$. The idea is then to use changes of variables like [1], where $\varphi_k$ is then generated by a vector field in $L_0$. This will guarantee that all changes of variables are compatible with the extra structure. Unlike the general case where we could work with monomials as in [4], we will have to consider vector fields $h_k$ in $L_0$ whose components are homogeneous polynomials of degree $k$. If this can be done, one says that $L_0$ respects the grading by the homogeneous polynomials. In order to fix ideas, suppose that $L_0$ are the divergence-free planar vector fields. Note that a monomial $x^i y^j\,\partial/\partial x$ is not divergence free. We can instead use time mappings of homogeneous vector fields of the form $a(q+1)\,x^{p+1} y^q\,\partial/\partial x - a(p+1)\,x^p y^{q+1}\,\partial/\partial y$. Up to terms of higher order we can use the time-one map of $h_k$ instead of $x + h_k(x)$. In case one asks for a $C^\infty$-realization of the normalizing transformation, we need an extra assumption on the extra structure, that is, on $L_0$, called the Borel property: denote by $J_{\infty,0}$ the set of formal series such that each truncation is the Taylor polynomial of an element of $L_0$. The extra assumption is: each element of $J_{\infty,0}$ must be the Taylor series of a $C^\infty$ vector field in $L_0$. It can be proved (Broer 1981) that the following structures respect the grading and satisfy the Borel property: being an $r$-parameter family, respecting a volume form on $\mathbb{R}^n$, being a Hamiltonian vector field ($n$ even), and being reversible for a linear involution.

One could consider other types of grading of the Lie algebras involved. This method, using the framework of the so-called filtered Lie algebras, is explained and developed systematically in a more general and abstract context in Broer (1981).

In nonlocal bifurcations, such as near a homoclinic loop, for example, it is not enough to perform central manifold reduction near the singularity: a simplified smooth model in a full neighborhood of the singularity is often needed, for example, in order to compute Poincaré maps.

Let us start with the purely hyperbolic case (i.e., $\dim E^c = 0$). First we compute the formal normal form as above. If there are no resonances [2] then we can formally linearize the vector field $X$. If $X$ is $C^\infty$ then a classical theorem of Sternberg (1958) states that this linearization can be realized by a $C^\infty$ change of variables (i.e., no more flat terms remaining). In case there are resonances, we must allow nonlinear terms: the resonant monomials. In this case we can also reduce to this normal form by a $C^\infty$ change of variables. Using the same methods, it is also possible to reduce to a polynomial normal form, but this time using $C^k$ ($k < \infty$) changes of variables. More precisely, if $k$ is a given number and if we write the vector field as $X = X_N + R_N$, where $X_N$ is the Taylor polynomial up to order $N$ (which can be assumed to be in normal form) and where $R_N(x) = O(|x|^{N+1})$, then for $N$ sufficiently large there is a $C^k$ change of variables conjugating $X$ to $X_N$ near 0. The number $N$ depends on the spectrum of $A = dX(0)$. An elegant proof of these facts can be found in Ilyashenko and Yakovenko (1991). For the case when extra structure must be
preserved, see Bonckaert (1997), which also deals with the partially hyperbolic case ($\dim E^c \ge 1$). As already remarked above, the case of a parameter-dependent family can be regarded as a partially hyperbolic stationary point preserving this extra structure.

The question of an analytic normal form, also in the hyperbolic case, leads to convergence questions and calls upon the so-called small-divisor problems. The classical results are due to Poincaré and Siegel. Let us summarize them; they are formulated in the complex analytic setting:

Theorem 4
(i) If the convex hull of the spectrum of $A$ does not contain $0 \in \mathbb{C}$ then $X$ can locally be put into normal form by an analytic change of variables. Moreover, this normal form is polynomial.
(ii) If the spectrum $\{\lambda_1, \ldots, \lambda_n\}$ of $A$ satisfies the condition that there exist $C > 0$ and $\mu > 0$ such that for any $m \in \mathbb{N}^n$ with $|m| = \sum_j m_j \ge 2$:

$|\langle (\lambda_1, \ldots, \lambda_n), m \rangle - \lambda_j| \ge \dfrac{C}{|m|^{\mu}}$   [5]

for $1 \le j \le n$ then $X$ can be locally linearized by an analytic change of variables.

Note that case (i) contains the case where 0 is a hyperbolic source or sink. This case (i) in Theorem 4 can be extended if there are parameters: if $X$ depends analytically on a parameter $\varepsilon \in \mathbb{C}^p$ near $\varepsilon = 0$ then the change of variables is also analytic in $\varepsilon$; moreover, the normal form is then a polynomial in the space variables whose coefficients are analytically dependent on the parameter $\varepsilon$.

For case (ii) this is surely not the case, since the condition [5] is fragile: a small distortion of the parameter generically causes resonances, be it of a high order. To fix ideas, consider $n = 2$ and suppose $\lambda_1 < 0 < \lambda_2$. By a generic but arbitrarily small perturbation, we can have that the ratio of these eigenvalues becomes a negative rational number $-p/q$, which gives a resonance of the form [2] with $j = 1$ and $\alpha = (q + 1, p)$, so [5] is violated. So analytic linearization, or even a polynomial analytic normal form, is ungeneric for families of such hyperbolic stationary points. The search for analytic normal forms, that is, simplified models, for families is still under investigation. A first simplification is obtained via the stable and unstable manifold from Theorem 1, that is, the graphs of $\phi_{ss}$ and $\phi_{uu}$. When $X$ is analytic near 0 then these manifolds are also analytic. So, up to an analytic change of variables, we can assume that $E^s$ and $E^u$ are invariant, which gives a simplification of the expression of $X$. Moreover, there is analytic dependence on parameters.

For local diffeomorphisms there are completely similar theorems pertaining to all the cases considered above.

Concluding Remarks

The concept of central manifold can be extended to more general invariant sets (see Chow et al. (2000) and references therein). It can also be extended to the infinite-dimensional case and can be applied to partial differential equations (Vanderbauwhede and Iooss 1992).

Concerning the generic divergence of normalizing transformations, the reader is referred to Broer and Takens (1989), Bruno (1989), Ilyashenko (1981), and Ilyashenko and Pyartli (1991). Although the power series giving the normalizing transformation generally diverges, the study of the dynamics is often performed by truncating the normal form at a certain order. Recently, Iooss and Lombardi (2005) considered the question as to what an optimal truncation is. It is shown, in case $dX(0)$ is semisimple, that the order of the normal form can be optimized so that the remainder satisfies some estimate shrinking exponentially fast to zero as a function of the radius of the domain.

Concerning normal forms preserving the Hamiltonian structure, see Birkhoff (1966) and Siegel and Moser (1995) for a starting point; this is an extended subject on its own, sometimes called Birkhoff normal form, and it would require another review article.

Further simplifications of the normal form can sometimes be obtained by taking into account nonlinear terms (instead of just $A$) in order to obtain reductions of higher-order terms (see Gaeta (2002) and especially the references therein).

Applications of normal forms and central manifolds to bifurcation theory have been explained in Dumortier (1991).

See also: Averaging Methods; Bifurcation Theory; Dynamical Systems and Thermodynamics; Dynamical Systems in Mathematical Physics: An Illustration from Water Waves; Finite Group Symmetry Breaking; Korteweg–de Vries Equation and Other Modulation Equations; Multiscale Approaches; Normal Forms and Semiclassical Approximation; Symmetry and Symmetry Breaking in Dynamical Systems.

Further Reading

Arrowsmith D and Place C (1990) Dynamical Systems. Cambridge: Cambridge University Press.
Aulbach B (1992) One-dimensional center manifolds are $C^\infty$. Results in Mathematics 21: 3–11.
Birkhoff GD (1966) Dynamical Systems. With an addendum by Jürgen Moser. American Mathematical Society Colloquium Publications, vol. IX. Providence, RI: American Mathematical Society.
Bonckaert P (1997) Conjugacy of vector fields respecting additional properties. Journal of Dynamical and Control Systems 3: 419–432.
Bonckaert P (2000) Symmetric and reversible families of vector fields near a partially hyperbolic singularity. Ergodic Theory and Dynamical Systems 20: 1627–1638.
Broer H (1981) Formal normal forms for vector fields and some consequences for bifurcations in the volume preserving case. In: Dynamical Systems and Turbulence, Warwick 1980, vol. 898, Lecture Notes in Mathematics. New York: Springer.
Broer H and Takens F (1989) Formally symmetric normal forms and genericity. Dynamics Reported. A Series in Dynamical Systems and their Applications 2: 11–18.
Bruno AD (1989) Local Methods in Nonlinear Differential Equations. New York: Springer.
Chow S-N, Li C, and Wang D (1994) Normal Forms and Bifurcations of Planar Vector Fields. Cambridge: Cambridge University Press.
Chow S-N, Liu W, and Yi Y (2000) Center manifolds for invariant sets. Journal of Differential Equations 168: 355–385.
Dumortier F (1991) Local study of planar vector fields: singularities and their unfoldings. In: Van Groesen E and De Jager EM (eds.) Structures in Dynamics, Studies in Mathematical Physics, vol. 2, pp. 161–241. Amsterdam: Elsevier.
Gaeta G (2002) Poincaré normal and renormalized forms. Acta Applicandae Mathematicae 70(1–3): 113–131 (symmetry and perturbation theory).
Ilyashenko YS (1981) In the theory of normal forms of analytic differential equations violating the conditions of Bryuno divergence is the rule and convergence the exception. Moscow University Mathematical Bulletin 36(2): 11–18.
Ilyashenko YS and Pyartli AS (1986) Materialization of resonances and divergence of normalizing series for polynomial differential equations. Journal of Mathematical Sciences 32(3): 300–313.
Ilyashenko YS and Yakovenko SY (1991) Finitely smooth normal forms of local families of diffeomorphisms and vector fields. Russian Mathematical Surveys 46: 1–43.
Iooss G and Lombardi E (2005) Polynomial normal forms with exponentially small remainder for analytic vector fields. Journal of Differential Equations 212: 1–61.
Palis J and Takens F (1977) Topological equivalence of normally hyperbolic dynamical systems. Topology 16(4): 335–345.
Siegel CL and Moser JK (1971) Lectures on Celestial Mechanics (reprint 1995). Berlin: Springer.
Sternberg S (1958) On the structure of local homeomorphisms of Euclidean n-space. II. American Journal of Mathematics 80: 623–631.
Takens F (1971) Partially hyperbolic fixed points. Topology 10: 133–147.
Vanderbauwhede A (1989) Center manifolds, normal forms and elementary bifurcations. In: Kirchgraber U and Walther O (eds.) Dynamics Reported, vol. 2, pp. 89–169. New York: Wiley.
Vanderbauwhede A and Iooss G (1992) Center manifold theory in infinite dimensions. In: Jones CKRT et al. (eds.) Dynamics Reported, vol. 1, New Series, pp. 125–163. Berlin: Springer.

Channels in Quantum Information Theory
we have used the fact that an element $A \in \mathcal{B}(\mathcal{H}) \otimes \mathcal{C}(X)$ can be represented in a canonical way by a sequence $(A_x)_{x \in X}$ of operators on $\mathcal{H}$. The set of states will be denoted in the following by $\mathcal{S}(\mathcal{A})$ and the set of effects by $\mathcal{E}(\mathcal{A})$.

Completely Positive Maps

Our aim is now to get a mathematical object which can be used to describe a channel. To this end, consider two $C^*$-algebras, $\mathcal{A}$, $\mathcal{B}$, describing the input and output system, respectively, and an effect $A \in \mathcal{B}$ of the output system. If we invoke first a channel which transforms $\mathcal{A}$ systems into $\mathcal{B}$ systems, and measure $A$ afterwards on the output systems, we end up with a measurement of an effect $T(A)$ on the input systems. Hence, we get a map $T : \mathcal{E}(\mathcal{B}) \to \mathcal{E}(\mathcal{A})$ which completely describes the channel (note that the direction of the mapping arrow is reversed compared to the natural ordering of processing). Alternatively, we can look at the states and interpret a channel as a map $T^* : \mathcal{S}(\mathcal{A}) \to \mathcal{S}(\mathcal{B})$ which transforms $\mathcal{A}$ systems in the state $\rho \in \mathcal{S}(\mathcal{A})$ into $\mathcal{B}$ systems in the state $T^*(\rho)$. To distinguish between both maps, we can say that $T$ describes the channel in the Heisenberg picture and $T^*$ in the Schrödinger picture. On the level of the statistical interpretation, both points of view should coincide of course, that is, the probabilities $(T^*\rho)(A)$ and $\rho(TA)$ to get the result "yes" during an $A$ measurement on $\mathcal{B}$ systems in the state $T^*\rho$, respectively a $TA$ measurement on $\mathcal{A}$ systems in the state $\rho$, should be the same. Since $(T^*\rho)(A)$ is linear in $A$, we see immediately that $T$ must be an affine map, that is, $T(\lambda_1 A_1 + \lambda_2 A_2) = \lambda_1 T(A_1) + \lambda_2 T(A_2)$ for each convex linear combination $\lambda_1 A_1 + \lambda_2 A_2$ of effects in $\mathcal{B}$, and this in turn implies that $T$ can be extended naturally to a linear map, which we will identify in the following with the channel itself, that is, we say that $T$ is the channel.

Let us now change slightly our point of view and start with a linear operator $T : \mathcal{A} \to \mathcal{B}$. To be a channel, $T$ must map effects to effects, that is, $T$ has to be positive: $T(A) \ge 0$ $\forall A \ge 0$ and bounded from above by $\mathbb{1}$, that is, $T(\mathbb{1}) \le \mathbb{1}$. In addition, it is natural to require that two channels in parallel are again a channel. More precisely, if two channels $T : \mathcal{A}_1 \to \mathcal{B}_1$ and $S : \mathcal{A}_2 \to \mathcal{B}_2$ are given, we can consider the map $T \otimes S$ which associates to each $A \otimes B \in \mathcal{A}_1 \otimes \mathcal{A}_2$ the tensor product $T(A) \otimes S(B) \in \mathcal{B}_1 \otimes \mathcal{B}_2$. It is natural to assume that $T \otimes S$ is a channel which converts composite systems of type $\mathcal{A}_1 \otimes \mathcal{A}_2$ into $\mathcal{B}_1 \otimes \mathcal{B}_2$ systems. Hence, $S \otimes T$ should be positive as well.

Definition 1 Consider two observable algebras $\mathcal{A}$, $\mathcal{B}$ and a linear map $T : \mathcal{A} \to \mathcal{B} \subset \mathcal{B}(\mathcal{H})$.
(i) $T$ is called positive if $T(A) \ge 0$ holds for all positive $A \in \mathcal{A}$.
(ii) $T$ is called completely positive (CP) if $T \otimes \mathrm{Id} : \mathcal{A} \otimes \mathcal{B}(\mathbb{C}^n) \to \mathcal{B}(\mathcal{H}) \otimes \mathcal{B}(\mathbb{C}^n)$ is positive for all $n \in \mathbb{N}$. Here Id denotes the identity map on $\mathcal{B}(\mathbb{C}^n)$.
(iii) $T$ is called unital if $T(\mathbb{1}) = \mathbb{1}$ holds.

Consider now the map $T^* : \mathcal{B}^* \to \mathcal{A}^*$ which is dual to $T$, that is, $T^*\rho(A) = \rho(TA)$ for all $\rho \in \mathcal{B}^*$ and $A \in \mathcal{A}$. It is called the Schrödinger-picture representation of the channel $T$, since it maps states to states provided $T$ is unital. (Complete) positivity can be defined in the Schrödinger picture as in the Heisenberg picture, and we immediately see that $T$ is (completely) positive iff $T^*$ is.

It is natural to ask whether the distinction between positivity and complete positivity is really necessary, that is, whether there are positive maps which are not CP. If at least one of the algebras $\mathcal{A}$ or $\mathcal{B}$ is classical, the answer is no: each positive map is CP in this case. If both algebras are quantum however, complete positivity is not implied by positivity alone. The most prominent example for this fact is the transposition map.

If item (ii) holds only for a fixed $n \in \mathbb{N}$, the map $T$ is called $n$-positive. This is obviously a weaker condition than complete positivity. However, $n$-positivity implies $m$-positivity for all $m \le n$, and for $\mathcal{A} = \mathcal{B}(\mathbb{C}^d)$ complete positivity is implied by $n$-positivity, provided $n \ge d$ holds.

Let us consider now the question whether a channel should be unital or not. We have already mentioned that $T(\mathbb{1}) \le \mathbb{1}$ must hold since effects should be mapped to effects. If $T(\mathbb{1})$ is not equal to $\mathbb{1}$, we get $\rho(T\mathbb{1}) = T^*\rho(\mathbb{1}) < 1$ for the probability to measure the effect $\mathbb{1}$ on systems in the state $T^*\rho$, but this is impossible for channels which produce an output with certainty, because $\mathbb{1}$ is the effect which is always true. In other words, if a CP map is not unital, it describes a channel which sometimes produces no output at all and $T(\mathbb{1})$ is the effect which measures whether we have got an output. We will assume henceforth that channels are unital if nothing else is explicitly stated.

Quantum Channels

In this section we will discuss some basic properties of CP maps which transform quantum systems into quantum systems, in particular the Stinespring theorem, which constitutes the most important structural result. For a more detailed presentation, including generalizations to more general input/
output algebras the reader should consult the textbook by Paulsen (2002).

The Stinespring Theorem

Hence consider channels between quantum systems, i.e., $\mathcal{A} = \mathcal{B}(\mathcal{H}_1)$ and $\mathcal{B} = \mathcal{B}(\mathcal{H}_2)$. A fairly simple example (not necessarily unital) is given in terms of an operator $V : \mathcal{H}_1 \to \mathcal{H}_2$ by $\mathcal{B}(\mathcal{H}_1) \ni A \mapsto V A V^* \in \mathcal{B}(\mathcal{H}_2)$. A second example is the restriction to a subsystem, which is given in the Heisenberg picture by $\mathcal{B}(\mathcal{H}) \ni A \mapsto A \otimes \mathbb{1}_{\mathcal{K}} \in \mathcal{B}(\mathcal{H} \otimes \mathcal{K})$. Finally the composition $S \circ T = ST$ of two channels is again a channel. The following theorem says that each channel can be represented as a composition of these two examples [7].

Theorem 2 (Stinespring dilation theorem). Every completely positive map $T : \mathcal{B}(\mathcal{H}_1) \to \mathcal{B}(\mathcal{H}_2)$ has the form

$T(A) = V^*(A \otimes \mathbb{1}_{\mathcal{K}})V$   [1]

with an additional Hilbert space $\mathcal{K}$ and an operator $V : \mathcal{H}_2 \to \mathcal{H}_1 \otimes \mathcal{K}$. Both (i.e., $\mathcal{K}$ and $V$) can be chosen such that the span of all $(A \otimes \mathbb{1})V\phi$ with $A \in \mathcal{B}(\mathcal{H}_1)$ and $\phi \in \mathcal{H}_2$ is dense in $\mathcal{H}_1 \otimes \mathcal{K}$. This particular decomposition is unique (up to unitary equivalence) and is called the minimal decomposition.

By introducing a family $|\chi_j\rangle\langle\chi_j|$ of one-dimensional projectors with $\sum_j |\chi_j\rangle\langle\chi_j| = \mathbb{1}$, we can define the Kraus operators $\langle \psi, V_j \phi \rangle = \langle \psi \otimes \chi_j, V\phi \rangle$. In terms of these, we can rewrite eqn [1] in the following form (Kraus 1983):

Corollary 3 (Kraus form). Every CP map

This representation of a channel has a (seemingly) very nice physical interpretation, because we can look at eqn [3] as the unitary interaction of the system with an unobservable environment, which is initially in the state $\rho_0$. The problem, however, is that there is a great arbitrariness in the choice of $U$ and $\rho_0$. This is the weakness of the ancilla form compared to the Stinespring representation.

Finally, let us state a related result. It characterizes all decompositions of a given completely positive map into completely positive summands. By analogy with results for states on abelian algebras (i.e., probability measures), we will call it a Radon–Nikodym theorem (see Arveson (1969) for a proof).

Theorem 5 (Radon–Nikodym theorem). Let $T_x : \mathcal{B}(\mathcal{H}_1) \to \mathcal{B}(\mathcal{H}_2)$, $x \in X$ be a family of CP maps and let $V : \mathcal{H}_2 \to \mathcal{H}_1 \otimes \mathcal{K}$ be the Stinespring operator of $T = \sum_x T_x$; then there are uniquely determined positive operators $F_x$ in $\mathcal{B}(\mathcal{K})$ with $\sum_x F_x = \mathbb{1}$ and

$T_x(A) = V^*(A \otimes F_x)V$   [4]

The Jamiołkowski Isomorphism

The subject of this section is a relation between CP maps and states of bipartite systems, first discovered by Jamiołkowski (1972), and which is very useful in translating properties of bipartite systems into properties of positive maps and vice versa.

The idea is based on the following setup. Alice and Bob share a bipartite system in a maximally entangled state
Theorem 6 The map defined in eqns [7] and [6] is a linear isomorphism. The inverse map is given by

$\mathcal{B}(\mathcal{H} \otimes \mathcal{H}') \ni \rho \mapsto T_\rho \in \mathcal{L}$   [8]

with

he0 ; T e0 i d tr je0 ihe0 j T   [9]

where $e'_1, \ldots, e'_{d'} \in \mathcal{H}'$ denote an (arbitrary) orthonormal basis of $\mathcal{H}'$ and the transposition of $\rho$ is defined with respect to the basis $e_\alpha$, $\alpha = 1, \ldots, d$ used to define the state in [5].

From the definition of $R_T$ in eqn [6], it is obvious that $R_T$ is positive, if $T$ is CP. To see that the converse is also true is not as trivial (because a transposition is involved), but it requires only a short calculation, which is omitted here. Hence, we get:

Corollary 7 The operator $R_T$ is positive, iff the map $T$ is CP.

Examples

Let us return now to the general case (i.e., arbitrary input and output algebras) and discuss several examples.

The most prominent examples of covariant channels arise with $\mathcal{H}_1 = \mathcal{H}_2 = \mathbb{C}^d$, $G = U(d)$ and $\pi_1(U) = \pi_2(U) = U$. All channels of this type are of the form

$T(A) = (1 - \vartheta)A + \vartheta\, d^{-1}\,\mathrm{tr}(A)\,\mathbb{1}$ with $\vartheta \in [0, d^2/(d^2 - 1)]$   [11]

and are known as depolarizing channels. They often serve as a standard model for noise. Two particular cases are the ideal channel arising with $\vartheta = 0$, and the completely depolarizing channel ($\vartheta = 1$) which erases all information. If we choose $\pi_2(U) = \bar U$ (where the bar denotes complex conjugate) instead of $\pi_2(U) = U$, we get

$T(A) = \dfrac{\vartheta}{d + 1}\,[\mathrm{tr}(A)\,\mathbb{1} + A^T] + \dfrac{1 - \vartheta}{d - 1}\,[\mathrm{tr}(A)\,\mathbb{1} - A^T]$, $\vartheta \in [0, 1]$   [12]

If we map these channels to states of bipartite systems (using the Jamiołkowski isomorphism from the last section), we get isotropic states from eqn [11] and Werner states from [12].

Classical Channels
Observables tr Tx tr Tx 1 trT1 ex 18
Let us consider now a channel which transforms is (again) the probability to measure x 2 X on .
quantum information B(H) into classical information The instrument T can be expressed in terms of the
C(X). Since positivity and complete positivity are operations Tx by
again equivalent, we just have to look at a positive X
and unital map E : C(X) ! B(H). With the canonical TA f f xTx A 19
basis ex , x 2 X, of C(X), we get a family x
Ex = E(e
P x ), x 2 X, of positive operators Ex 2 B(H) Hence, we can identify T with the family Tx , x 2 X.
with x2X Ex = 1. Hence, the Ex form a positive Finally, we can consider the second marginal of T
operator valued (POV) measure, i.e., an observable. X
If, on the other hand, a POV measure Ex 2 B(H), x 2 BH 3 A 7! TA 1 Tx A 2 BK 20
X, is given, we can define a quantum-to-classical x2X
channel E : C(X) ! B(H) by It describes the operation we get if the outcome of
X the measurement is ignored.
Ef f xEx 15 The best-known example of an instrument is a von
x2X
NeumannLuders measurement associated with a PV
This shows that the observable Ex , x 2 X, and the measure given by family of projections Ex , x = 1,
channel E can be identified. . . . , d; for example, the eigenprojections of a self-
adjoint operator A 2 B(H). It is defined as the channel
Preparations
T : BH CX ! BH
Let us now exchange the role of C(X) and B(H); in with X f1; . . . ; dg and Tx A Ex AEx 21
other words, let us consider a channel R : B(H) !
1
C(X) with a classical input and a quantum output Hence, we get the final state tr(Ex ) Ex Ex if we
algebra. In the Schrodinger picture, we get a family of measure the value x 2 X on systems initially in the
density matrices x := R (x ) 2 B (H), x 2 X, where state this is well known from quantum mechanics.
$\delta_x \in C^*(X)$ denotes again the Dirac measure on $X$. Hence, we get a parameter-dependent preparation that can be used to encode the classical information $x \in X$ into the quantum information $\rho_x \in B^*(H)$.

Instruments

An observable describes only the statistics of measuring results, but does not contain information about the state of the system after the measurement. To get a description which fills this gap, we have to consider channels which operate on quantum systems and produce hybrid systems as output, that is, $T : B(H) \otimes C(X) \to B(K)$. Following Davies (1976), we will call such an object an instrument. From $T$ we can derive the subchannel

\[ C(X) \ni f \mapsto T(1 \otimes f) \in B(K) \qquad [16] \]

which is the observable measured by $T$, that is, $\mathrm{tr}(T(1 \otimes e_x)\rho)$ is the probability to measure $x \in X$ on systems in the state $\rho$. On the other hand, we get for each $x \in X$ a quantum channel (which is not unital)

\[ B(H) \ni A \mapsto T_x(A) = T(A \otimes e_x) \in B(K) \qquad [17] \]

It describes the operation performed by the instrument $T$ if $x \in X$ was measured. More precisely, if a measurement on systems in the state $\rho$ gives the result $x \in X$, we get (up to normalization) the state $T_x^*(\rho)$ after the measurement.

Parameter-Dependent Operations

Let us change now the role of $B(H) \otimes C(X)$ and $B(K)$; in other words, consider a channel $T : B(K) \to B(H) \otimes C(X)$ with hybrid input and quantum output. It describes a device which changes the state of a system depending on the additional classical information. As for an instrument, $T$ decomposes into a family of (unital!) channels $T_x : B(K) \to B(H)$ such that we get $T^*(\rho \otimes p) = \sum_x p_x T_x^*(\rho)$ in the Schrödinger picture. Physically, $T$ describes a parameter-dependent operation: depending on the classical information $x \in X$, the quantum information $\rho \in B^*(K)$ is transformed by the operation $T_x$.

Finally, we can consider a channel $T : B(H) \otimes C(X) \to B(K) \otimes C(Y)$ with hybrid input and output to get a parameter-dependent instrument: similarly to the above discussion, we can define a family of instruments $T_y : B(H) \otimes C(X) \to B(K)$, $y \in Y$, by the equation $T^*(\rho \otimes p) = \sum_y p_y T_y^*(\rho)$. Physically, $T$ describes the following device: it receives the classical information $y \in Y$ and a quantum system in the state $\rho \in B^*(K)$ as input. Depending on $y$, a measurement with the instrument $T_y$ is performed, which in turn produces the measuring value $x \in X$ and leaves the quantum system in the state (up to normalization) $T_{y,x}^*(\rho)$; with $T_{y,x}$ given as in eqn [17] by $T_{y,x}(A) = T_y(A \otimes e_x)$.
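As a concrete illustration of an instrument, the following minimal sketch uses a projective measurement of a single qubit in a fixed basis. This choice is an assumption made for illustration and is not taken from the text: the outcome set is X = {0, 1}, and each Schrödinger-picture branch is taken to be the unnormalized state P_x rho P_x, whose trace is the probability of the outcome x. The names `PROJECTORS`, `instrument_branch`, and `measure` are hypothetical.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def trace(a):
    return a[0][0] + a[1][1]


# Assumed example: projectors onto the two basis states (outcomes x = 0, 1).
PROJECTORS = {
    0: [[1, 0], [0, 0]],
    1: [[0, 0], [0, 1]],
}


def instrument_branch(rho, x):
    """Unnormalized post-measurement state P_x rho P_x for outcome x."""
    p = PROJECTORS[x]
    return matmul(matmul(p, rho), p)


def measure(rho):
    """Map each outcome x to (probability, normalized state or None)."""
    result = {}
    for x in PROJECTORS:
        branch = instrument_branch(rho, x)
        prob = trace(branch)
        state = None if prob == 0 else [[e / prob for e in row] for row in branch]
        result[x] = (prob, state)
    return result


# Equal superposition of the two basis states, written as a density matrix.
half = Fraction(1, 2)
rho_plus = [[half, half], [half, half]]
result = measure(rho_plus)
```

The trace of each branch plays the role of the outcome probability, and dividing by it performs the normalization mentioned in the text.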
Chaos and Attractors 477
See also: Capacities Enhanced by Entanglement; Capacity for Quantum Information; Entanglement; Optimal Cloning of Quantum States; Positive Maps on C*-Algebras; Quantum Channels: Classical Capacity; Quantum Dynamical Semigroups; Quantum Entropy; Quantum Spin Systems; Source Coding in Quantum Information Theory.

Further Reading

Arveson W (1969) Subalgebras of C*-algebras. Acta Mathematica 123: 141–224.
Davies EB (1976) Quantum Theory of Open Systems. London: Academic Press.
Jamiołkowski A (1972) Linear transformations which preserve trace and positive semidefiniteness of operators. Reports on Mathematical Physics 3: 275–278.
Keyl M and Werner RF (1999) Optimal cloning of pure states, testing single clones. Journal of Mathematical Physics 40: 3283–3299.
Kraus K (1983) States Effects and Operations. Berlin: Springer.
Paulsen VI (2002) Completely Bounded Maps and Dilations. Cambridge: Cambridge University Press.
Stinespring WF (1955) Positive functions on C*-algebras. Proceedings of the American Mathematical Society 6: 211–216.
and responsible for recurrence. Strange attractors are generated by dissipative dynamical systems, which satisfy the additional condition $\lambda_1 + \lambda_2 + \lambda_3 < 0$. For such attractors, $\epsilon_1 = \epsilon_2 = 1$ and $\epsilon_3 = \lambda_1/|\lambda_3|$ by the Kaplan–Yorke conjecture, so that $d_L = 2 + \epsilon_3 = 2 + \lambda_1/|\lambda_3|$.

A number of tools from classical topology have been exploited to probe the structure of strange attractors in three dimensions. These include the Gauss linking number, the Euler characteristic, the Poincaré–Hopf index theorem, and braid theory. More recent topological contributions include several definitions for entropy, the development of a theory for knot holders or braid holders (also called branched manifolds), the Birman–Williams theorem for these objects, and relative rotation rates, a topological index for individual periodic orbits and orbit pairs.

Three-dimensional strange attractors are remarkably well understood; those in higher dimensions are not. As a result, the description that follows is largely restricted to strange attractors with $d_L < 3$ that exist in $R^3$ or other three-dimensional manifolds (e.g., $R^2 \times S^1$). The obstacle to progress in higher dimensions is the lack of a higher-dimensional analog of the Gauss linking number for orbit pairs in $R^3$.

Overview

The program described below has two objectives:

1. classify the global topological structure of strange attractors in $R^3$; and
2. determine the perestroikas (changes) that such attractors can undergo as experimental conditions or control parameters change.

Four levels of structure are required to complete this program. Each is topological and discretely quantifiable. This provides a beautiful interaction between a rigidity of structure, demanded by topological constraints, and freedom within this rigidity. These four levels of structure are:

1. basis sets of orbits,
2. branched manifolds or knot holders,
3. bounding tori, and
4. embeddings of bounding tori.

Lyapunov exponents and squeezing occurs in the directions identified by the negative Lyapunov exponents. In $R^3$ there is one stretching direction and one squeezing direction.

A simple stretch-and-squeeze mechanism that nature appears to be very fond of is illustrated in Figure 1. In this illustration, a cube of initial conditions at (a) is advected by the flow in a short time to (b). During this process, the cube is deformed by being stretched ($\lambda_1 > 0$). It also shrinks in a transverse direction ($\lambda_3 < 0$). During the initial phase of this deformation, two nearby points typically separate exponentially in time. If they were to continue to separate exponentially for all times, the invariant set would not be bounded. Therefore, this separation cannot continue indefinitely, and in fact it must somehow reverse itself after some time because the motion is recurrent. The mechanism shown in Figure 1 involves folding, which begins between (b) and (c) and continues through to (d). Squeezing occurs where points from distant parts of the attractor approach each other exponentially, as at (d). Finally, the cube, shown deformed at (d), returns to the neighborhood of initial conditions (a). This process repeats itself and builds up the strange attractor. As can be inferred from this figure, the strange attractor constructed by the repetitive process is smooth in the expanding ($\lambda_1$) and flow ($\lambda_2 = 0$) directions but fractal in the squeezing ($\lambda_3$) direction. The attractor's fractal dimension is $\epsilon_1 + \epsilon_2 + \epsilon_3 = 2 + \epsilon_3 = 2 + \lambda_1/|\lambda_3|$.

[Figure 1: a cube of initial conditions shown at stages (a) to (d); labels: Stretch, Squeeze, Boundary layer.]

Figure 1 summarizes the boundedness and recurrence conditions that were introduced to define strange attractors, and illustrates one stretching and squeezing mechanism that occurs repetitively to build up the fractal structure of the strange attractor
and to organize all the (unstable) periodic orbits in it in a unique way. The particular mechanism shown in Figure 1 is called a stretch-and-fold mechanism. Other mechanisms involve stretch and roll, and tear and squeeze.

The stretch-and-squeeze mechanisms are well summarized by the cartoons shown in Figure 2. On the left, a cube of initial conditions (top) is deformed under the flow. The flow is downward. Stretching occurs in one direction (horizontal) and shrinking occurs in a transverse direction (perpendicular to the page). In the limit of extreme shrinking ($\lambda_3 \to -\infty$), the dynamics of the stretching part of the flow is represented by the two-dimensional surface shown on the bottom left. This surface fails to be a manifold because of the singularity, called a splitting point. This singularity represents an initial condition that flows to an unstable fixed point with at least one stable direction. On the right (squeezing), two distant cubes of initial conditions (top) in the flow are deformed and brought to each other's proximity under the flow (middle). In the limit of extreme dissipation, two two-dimensional surfaces representing inflows are joined at a branch line to a single surface representing an outflow. This surface fails to be a manifold because of the branch line, which is a singularity of a different kind. Points below the branch line in this representation of the flow (on the outflow side of the branch line) have two preimages above the branch line, one in each inflow sheet. This structure generates positive entropy.

[Figure 2: cartoons of the stretch (left) and squeeze (right) mechanisms; labels: Stretch, Shrink, Squeeze, Boundary layer, Flow.]

A beautiful theorem of Birman and Williams justifies the use of the two cartoons shown at the bottom of Figure 2 to characterize strange attractors in $R^3$. As preparation for the theorem, Birman and Williams introduced an important identification for the nongeneric or atypical points that are not sensitive to initial conditions

\[ x \sim y \quad \text{if} \quad |x(t) - y(t)| \xrightarrow{\,t \to \infty\,} 0 \qquad [2] \]

That is, two points in a strange attractor are identified if they have asymptotically the same future. In practice, this amounts to projecting the flow down along the stable ($\lambda_3 < 0$) direction onto a two-dimensional surface described by the stretching ($\lambda_1 > 0$) and the flow ($\lambda_2 = 0$) directions. This surface is not a manifold because of lower-dimensional singularities: splitting points and branch lines. The two-dimensional surface has many names, for example, knot holder (because it holds the periodic orbits that exist in abundance in strange attractors), braid holders, templates, branched manifolds. The flow, restricted to this surface, is called a semiflow. Under the semiflow, points in the branched manifold have a unique future but do not have a unique past. The degree of nonuniqueness is measured by the topological entropy of the dynamical system. The Birman–Williams theorem is:

Theorem Assume that a flow $\Phi_t$
(i) on $R^3$ is dissipative ($\lambda_1 > 0$, $\lambda_2 = 0$, $\lambda_3 < 0$ and $\lambda_1 + \lambda_2 + \lambda_3 < 0$);
(ii) generates a hyperbolic strange attractor (the eigenvectors of the local Lyapunov exponents $\lambda_1, \lambda_2, \lambda_3$ span everywhere on the attractor).
Then the projection [2] maps the strange attractor
unstable periodic orbits in the strange attractor is the same as the topological organization of all the unstable periodic orbits in the branched manifold. In fact, the branched manifold (knot holder) defines the topological organization of all the unstable periodic orbits that it supports. Topological organization is defined by the Gauss linking number and the relative rotation rates, another braid index.

The significance of this theorem is that strange attractors can be characterized, in fact classified, by their branched manifolds. Figure 3 shows a branched manifold for a figure-8 knot as well as the figure-8 knot itself (dark curve). If a constant current is sent through a conducting wire tied into the shape of a figure-8 knot, a discrete countable set of magnetic field lines will be closed. These closed field lines can be deformed onto the two-dimensional surface shown in Figure 3. Each of the eight branches of this branched manifold can be named. One way to do this specifies the two branch lines that are joined by the branch in the sense of the flow (e.g., (a) and () (but not (a)). Every closed field line can be labeled by a symbol sequence that is unique up to cyclic permutation. This symbol sequence provides a symbolic name for the orbit. For example, (a)()(b)(ba) is a period-4 orbit.

The structure of a branched manifold is determined in part by a transition matrix $T$. The matrix element $T_{ij}$ is 1 if the transition from branch $i$ to branch $j$ is allowed, 0 otherwise. The transition matrix for the figure-8 branched manifold is shown in Figure 3.

The Birman–Williams theorem is stronger than its statement suggests. More systems satisfy the statement of the theorem than satisfy its assumptions. The figure-8 knot, and its attendant magnetic field, is not dissipative; in fact, it is not even a dynamical system, yet the closed loops can be isotoped to the figure-8 knot holder. There are other ways in which the Birman–Williams theorem is stronger than its statement suggests.

It is apparent from Figure 3 that the figure-8 branched manifold can be built up Lego fashion from the two basic building blocks shown in Figure 2. This is more generally true. Every branched manifold can be built up, Lego fashion, from the stretch (with a splitting point singularity) and the squeeze (with a branch line singularity) building blocks, subject to the following two conditions:

1. outputs flow to inputs and
2. there are no free ends.
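The role of the transition matrix in naming closed orbits can be made concrete with a small sketch. It enumerates the symbol sequences of a given period that are allowed by a transition matrix, keeping one representative per class of cyclic permutations and discarding sequences that merely repeat a shorter block. The two matrices at the end are hypothetical examples (a fully connected two-branch manifold and one with a single forbidden transition), not the figure-8 matrix of Figure 3, and the function names are illustrative.

```python
from itertools import product


def is_admissible(word, T):
    """True if every transition in the closed word, including last to first, is allowed."""
    n = len(word)
    return all(T[word[i]][word[(i + 1) % n]] == 1 for i in range(n))


def canonical_rotation(word):
    """Smallest rotation of the word: one representative up to cyclic permutation."""
    return min(word[i:] + word[:i] for i in range(len(word)))


def is_primitive(word):
    """False if the word equals a nontrivial rotation of itself (a repeated shorter block)."""
    return all(word != word[d:] + word[:d] for d in range(1, len(word)))


def periodic_orbit_names(T, period):
    """Sorted list of cyclic symbol names for orbits of exactly this period."""
    names = set()
    for word in product(range(len(T)), repeat=period):
        if is_admissible(word, T) and is_primitive(word):
            names.add(canonical_rotation(word))
    return sorted(names)


# Hypothetical transition matrices for illustration only.
FULL_TWO_BRANCH = [[1, 1], [1, 1]]      # every transition allowed
RESTRICTED_TWO_BRANCH = [[1, 1], [1, 0]]  # branch 1 may not follow itself
```

For example, `periodic_orbit_names(FULL_TWO_BRANCH, 4)` lists the three period-4 names 0001, 0011, and 0111 as tuples, while the restricted matrix removes every name containing two consecutive 1s.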
Figure 4 The Rössler dynamical system. (a) Rössler equations: $dx/dt = -y - z$, $dy/dt = x + ay$, $dz/dt = b + z(x - c)$. (b) Time series $z(t)$ and $x(t)$ generated by these equations, and (c) projection of the strange attractor onto the $x$–$y$ plane. (d) Caricature of the flow and (e) knot holder derived directly from the caricature. Control parameter values $(a, b, c) = (2.0, 4.0, 0.398)$. The Topology of Chaos; R Gilmore and M Lefranc; Copyright 2002, Wiley. This material is used by permission of John Wiley & Sons, Inc.
Figure 5 (a) Lorenz equations: $dx/dt = -\sigma x + \sigma y$, $dy/dt = Rx - y - xz$, $dz/dt = -bz + xy$. (b) Time series $x(t)$ and $z(t)$ generated by these equations, and (c) projection of the strange attractor onto the $x$–$y$ plane. (d) Caricature of the flow and (e) knot holder derived directly from the caricature by rotating the right-hand lobe by $\pi$ radians. Control parameter values $(R, \sigma, b) = (26.0, 10.0, 8/3)$. The Topology of Chaos; R Gilmore and M Lefranc; Copyright 2002, Wiley. This material is used by permission of John Wiley & Sons, Inc.
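A time series such as the one in Figure 5 can be reproduced approximately by integrating the Lorenz equations numerically. The sketch below uses the parameter values quoted in the caption; the fourth-order Runge–Kutta scheme, the step size, the initial condition, and the function names are assumptions chosen for illustration and are not specified in the text.

```python
def lorenz(state, R=26.0, sigma=10.0, b=8.0 / 3.0):
    """Right-hand side of the Lorenz equations with the caption's parameter values."""
    x, y, z = state
    return (-sigma * x + sigma * y, R * x - y - x * z, -b * z + x * y)


def rk4_step(f, state, dt):
    """One classical fourth-order Runge-Kutta step (an assumed integrator)."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b2 + 2.0 * c + d)
        for s, a, b2, c, d in zip(state, k1, k2, k3, k4)
    )


def integrate(f, state, dt, steps):
    """Return the list of states visited, including the initial one."""
    trajectory = [state]
    for _ in range(steps):
        state = rk4_step(f, state, dt)
        trajectory.append(state)
    return trajectory


# Assumed initial condition and step size; the first part of the run is a transient.
trajectory = integrate(lorenz, (1.0, 1.0, 1.0), dt=0.01, steps=5000)
x_series = [p[0] for p in trajectory]
z_series = [p[2] for p in trajectory]
```

The lists `x_series` and `z_series` play the role of the scalar time series in panel (b); the motion stays bounded, as required of a strange attractor.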
description of two branched manifolds. Figure 7a shows the branched manifold that describes experimental data generated by many physical systems. The mechanism is a simple stretch-and-fold deformation with zero global torsion that generates a typical Smale horseshoe. There are two branches. The diagonal elements of the matrix identify the local torsion of the flow through the corresponding branch, measured in units of $\pi$. Branch 0 has no local torsion, and branch 1 shows a half-twist and has local torsion 1. The off-diagonal matrix elements are twice the linking number of the period-1 orbits in the corresponding pair of branches. Since the period-1 orbits in these two branches do not link, the off-diagonal matrix elements are 0. The period-1 orbits in the branches labeled 1 and 2 in Figure 7b have linking number 1, so the off-diagonal matrix elements are $T(1, 2) = T(2, 1) = 2 \times 1$. The array identifies the order (above, below) that the two branches are joined at the branch line, the smaller the value, the closer to the viewer. These two pieces of information, four integers in Figure 7a and eight in

Figure 6 Branched manifolds for four standard sets of equations: (a) Rössler equations, (b) periodically driven Duffing equations, (c) periodically driven van der Pol equations, and (d) Lorenz equations. The Topology of Chaos; R Gilmore and M Lefranc; Copyright 2002, Wiley. This material is used by permission of John Wiley & Sons, Inc.

Figure 7 Branched manifolds are described algebraically. The diagonal matrix elements describe the twist of each branch. The off-diagonal matrix elements are twice the linking number of the period-1 orbits in each of the two branches. The array describes the order in which the branches are connected at the branch line. (a) Smale horseshoe branched manifold. (b) Beginning of a gâteau roulé (jelly roll) branched manifold.

Table 1 Four sets of equations that generate strange attractors

Dynamical system   ODEs                                                Parameter values
Rössler            $\dot{x} = -y - z$                                  $(a, b, c) = (2.0, 4.0, 0.398)$
                   $\dot{y} = x + ay$
                   $\dot{z} = b + z(x - c)$
Duffing            $\dot{x} = y$                                       $(\delta, A, \omega) = (0.4, 0.4, 1.0)$
                   $\dot{y} = -\delta y - x^3 + x + A\sin(\omega t)$
van der Pol        $\dot{x} = by + (c - dy^2)x$                        $(b, c, d, A, \omega) = (0.7, 1.0, 10.0, 0.25, \pi/2)$
                   $\dot{y} = -x + A\sin(\omega t)$
Lorenz             $\dot{x} = -\sigma x + \sigma y$                    $(R, \sigma, b) = (26.0, 10.0, 8/3)$
                   $\dot{y} = Rx - y - xz$
                   $\dot{z} = -bz + xy$

Table 2 shows the number of orbits of period $p \le 20$ for the branched manifolds with two and three branches shown in Figure 7. The number of orbits of period $p$ grows exponentially with $p$, and the limit $h_T = \lim_{p \to \infty} \log(N(p))/p$ defines the topological entropy $h_T$ for the branched manifold. The limits are $\ln 2$ and $\ln 3$ for the branched manifolds with two and three branches, respectively. The linking numbers of orbits up to period 5 in the Smale horseshoe branched manifold are shown in Table 3, which identifies each of the orbits by its symbol sequence (e.g., 00111).
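The growth of N(p) and its limit can be checked with a short calculation. The sketch below assumes that every transition between branches is allowed, so that orbits of period p correspond to symbol sequences of length p that are not repetitions of a shorter block, counted up to cyclic permutation. Under that assumption the count is given by the standard Möbius-inversion (necklace) formula, which is a swapped-in textbook technique rather than the method used to build Table 2. The function names are illustrative.

```python
import math


def mobius(n):
    """Moebius function mu(n), computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # repeated prime factor
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result


def orbit_count(branches, period):
    """Number of orbits of exactly this period when all transitions are allowed."""
    total = sum(
        mobius(d) * branches ** (period // d)
        for d in range(1, period + 1)
        if period % d == 0
    )
    return total // period


def entropy_estimate(branches, period):
    """Finite-period estimate log(N(p)) / p of the topological entropy."""
    return math.log(orbit_count(branches, period)) / period
```

With two branches this gives 2, 1, 2, 3, and 6 orbits of periods 1 to 5, which matches the number of rows per period in Table 3. The estimate approaches ln 2 (or ln 3 for three branches) only slowly, because it differs from the limit by roughly log(p)/p.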
Table 3 Linking numbers of orbits to period 5 in the Smale horseshoe branched manifold with zero global torsion

Orbit  Symbols     0    1  2_1  3_1  3_1  4_1  4_2  4_2  5_1  5_1  5_2  5_2  5_3  5_3
0      0           0    0    0    0    0    0    0    0    0    0    0    0    0    0
1      1           0    0    1    1    1    2    1    1    2    2    2    2    1    1
2_1    01          0    1    1    2    2    3    2    2    4    4    3    3    2    2
3_1    011         0    1    2    2    3    4    3    3    5    5    5    5    3    3
3_1    001         0    1    2    3    2    4    3    3    5    5    4    4    3    3
4_1    0111        0    2    3    4    4    5    4    4    8    8    7    7    4    4
4_2    0011        0    1    2    3    3    4    3    4    5    5    5    5    4    4
4_2    0001        0    1    2    3    3    4    4    3    5    5    5    5    4    4
5_1    01111       0    2    4    5    5    8    5    5    8   10    9    9    5    5
5_1    01101       0    2    4    5    5    8    5    5   10    8    8    8    5    5
5_2    00111       0    2    3    5    4    7    5    5    9    8    6    7    5    5
5_2    00101       0    2    3    5    4    7    5    5    9    8    7    6    5    5
5_3    00011       0    1    2    3    3    4    4    4    5    5    5    5    4    5
5_3    00001       0    1    2    3    3    4    4    4    5    5    5    5    5    4
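Entries of a linking-number table can be estimated numerically for any pair of closed curves in three dimensions by discretizing the Gauss double integral (1/4π) ∮∮ (r_A − r_B) · (dr_A × dr_B) / |r_A − r_B|³. The sketch below is a minimal midpoint-rule version of that integral. The two test curves (a pair of interlocked circles and a pair of well-separated circles) are assumed examples, not orbits from the table, and all function names are illustrative.

```python
import math


def closed_curve(f, n):
    """Sample a closed curve f(t), t in [0, 2*pi), at n equally spaced points."""
    return [f(2.0 * math.pi * i / n) for i in range(n)]


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def segments(curve):
    """Midpoint and displacement vector of each polygon segment of a closed curve."""
    n = len(curve)
    out = []
    for i in range(n):
        p, q = curve[i], curve[(i + 1) % n]
        mid = tuple(0.5 * (a + b) for a, b in zip(p, q))
        step = tuple(b - a for a, b in zip(p, q))
        out.append((mid, step))
    return out


def gauss_linking(curve_a, curve_b):
    """Midpoint-rule estimate of the Gauss linking integral of two closed curves."""
    total = 0.0
    segs_b = segments(curve_b)
    for mid_a, d_a in segments(curve_a):
        for mid_b, d_b in segs_b:
            r = tuple(a - b for a, b in zip(mid_a, mid_b))
            total += dot(r, cross(d_a, d_b)) / dot(r, r) ** 1.5
    return total / (4.0 * math.pi)


# Assumed test curves: two interlocked unit circles, and one circle moved far away.
circle_a = closed_curve(lambda t: (math.cos(t), math.sin(t), 0.0), 120)
circle_b = closed_curve(lambda t: (1.0 + math.cos(t), 0.0, math.sin(t)), 120)
circle_far = closed_curve(lambda t: (5.0 + math.cos(t), 0.0, math.sin(t)), 120)

linked = gauss_linking(circle_a, circle_b)
unlinked = gauss_linking(circle_a, circle_far)
```

The estimate is a real number close to an integer; rounding it gives the linking number, whose sign depends on the orientations chosen for the two curves.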
Tables of linking numbers have been used successfully to identify mechanisms that nature uses to generate chaotic data. This analysis procedure is called topological analysis. Segments of data are identified that closely approximate unstable periodic orbits existing in the strange attractor. These data segments are then embedded in $R^3$. Each orbit is given a trial identification (symbol sequence). Their pairwise linking numbers are computed either by counting signed crossings or using the time-parametrized data segments and estimating the integers numerically using the Gauss linking integral

\[ \mathrm{Link}(A, B) = \frac{1}{4\pi} \oint\oint \frac{\bigl(r_A(t_1) - r_B(t_2)\bigr) \cdot dr_A(t_1) \times dr_B(t_2)}{|r_A(t_1) - r_B(t_2)|^3} \]

This table of experimental integers is compared with the table of linking numbers for orbits with the same symbolic name on a trial branched manifold. This procedure serves to identify the branched manifold and refine the symbolic identifications of the experimental orbits, if necessary. The procedure is vastly overdetermined. For example, the linking numbers of only three low-period orbits serve to identify the four pieces of information required to specify a branched manifold with two branches. Since six or more surrogate periodic orbits can typically be extracted from experimental data, providing $\binom{6}{2} = 15$ or more linking numbers, this topological analysis procedure has built-in self-consistency checks, unlike analysis procedures based on geometric and dynamical tools.

Basis Sets of Orbits

A branched manifold determines the topological organization of all the periodic orbits that it supports. Whenever a low-dimensional strange attractor is subjected to topological analysis, it is always the case that fewer periodic orbits are present and identified than are allowed by the branched manifold that classifies it. This is the case for strange attractors generated by experimental data as well as strange attractors generated by ODEs. The full spectrum occurs only in the hyperbolic limit, which has never been seen.

The orbits that are present are organized exactly as in the hyperbolic limit, that is, as determined by the underlying branched manifold. As control parameters change, the strange attractor undergoes perestroikas. New orbits are created and/or old orbits are annihilated in direct or inverse period-doubling and saddle–node bifurcations. The orbits that are present are always organized as determined by the branched manifold. Orbits are not created or annihilated independently of each other. Rather, there is a partial order (forcing order) involved in orbit creation and annihilation. This partial order is poorly understood for general branched manifolds. It is much better understood for the two-branch Smale horseshoe branched manifold.

The forcing diagram for this branched manifold is shown in Figure 8 for orbits up to period 8. It is typically the case that the existence of one orbit in a strange attractor forces the presence of a spectrum of additional orbits. Forcing is transitive, so if orbit $A$ forces orbit $B$ ($A \Rightarrow B$) and $B$ forces $C$, then $A$ forces $C$: if $A \Rightarrow B$ and $B \Rightarrow C$ then $A \Rightarrow C$. For this reason, it is sufficient to show only the first-order forcing in this figure. The orbits shown are labeled by their period and the order in which they are created in a particular highly dissipative limit of the dynamics: the logistic map (U-sequence order in Figure 8). For example, $5_2$ describes the second (pair) of period-5 orbits created in the
[Figure 8 panel content: (a) forcing of horseshoe orbits to period 8; vertical axis: entropy, from 0.45 to ln 2; horizontal axis: U-sequence order. (b) U-sequence order: 2_1 4_1 8_1 6_1 8_2 7_1 5_1 7_2 8_3 3_1 6_2 8_4 7_3 8_5 5_2 8_6 7_4 8_7 6_3 8_8 7_5 4_2 8_9 7_6 8_10 6_4 8_11 7_7 8_12 5_3 8_13 7_8 8_14 6_5 8_15 7_9 8_16.]

Figure 8 (a) Forcing diagram for orbits up to period 8 in the Smale horseshoe branched manifold. (b) The sequence (universal order) in which orbits are created in the highly dissipative limit, which is the logistic map. The Topology of Chaos; R Gilmore and M Lefranc; Copyright 2002, Wiley. This material is used by permission of John Wiley & Sons, Inc.
logistic map in the transition from simple, nonchaotic behavior to fully chaotic (hyperbolic) behavior.

The orbits in the forcing diagram are organized according to their one-dimensional entropy (horizontal axis, U-sequence order) and their two-dimensional entropy (vertical axis). Nonchaotic (laminar) behavior occurs at the lower left of this figure, where both entropies are zero. Fully chaotic behavior occurs at the upper right, where both entropies are $\ln 2$. As control parameters change, a dynamical system that can exhibit chaos generated by a stretch-and-fold mechanism follows a path in the forcing diagram from the lower left to the upper right. Each such path is a route to chaos. The Smale horseshoe mechanism exhibits many different routes to chaos: each follows a different path in the forcing diagram.

The state of a strange attractor at any stage in its route to chaos can be specified by a basis set of orbits. This is a set of orbits whose presence forces the existence of all other orbits that can concurrently be found in the attractor, up to any finite period. The basis set of orbits can be constructed algorithmically. The algorithm is as follows:

1. Write down all the orbits that are present in order of increasing two-dimensional entropy from left to right.
2. For orbits with the same two-dimensional entropy, order by increasing one-dimensional entropy.
3. Remove the highest (rightmost) orbit from this list, together with all the orbits that it forces. This is the first basis orbit.
4. Of the orbits remaining, again remove the rightmost and all the orbits that it forces. This is the second basis orbit.
5. Continue until all orbits have been removed.

For any finite period, the above algorithm terminates because there is only a finite number of orbits. For example, if the orbit $5_2$ is present as well as all orbits with lower one-dimensional entropy, the basis set is $8_7$R, $7_6$, $7_4$F, $8_6$F, $8_8$, $5_2$. As control parameters change, a strange attractor undergoes perestroikas that are quantitatively determined by changes in the basis sets of orbits.
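The five steps of the basis-set algorithm can be sketched as follows. The orbit names, entropy values, and forcing relations in the example are hypothetical placeholders, not data read from the forcing diagram; only first-order forcing is supplied, and the sketch closes it under transitivity before applying the steps.

```python
def transitive_closure(forces):
    """Map each orbit to the set of all orbits it forces, directly or indirectly."""
    closure = {}
    for orbit in forces:
        seen = set()
        stack = list(forces[orbit])
        while stack:
            nxt = stack.pop()
            if nxt not in seen:
                seen.add(nxt)
                stack.extend(forces.get(nxt, ()))
        closure[orbit] = seen
    return closure


def basis_set(orbits, forces):
    """Apply the list-and-remove procedure.

    orbits: dict name -> (two_dimensional_entropy, one_dimensional_entropy)
    forces: dict name -> iterable of orbits forced at first order
    """
    closure = transitive_closure(forces)
    # Steps 1 and 2: sort by two-dimensional entropy, ties broken by one-dimensional entropy.
    remaining = sorted(orbits, key=lambda name: orbits[name])
    basis = []
    # Steps 3 to 5: repeatedly take the rightmost orbit and drop everything it forces.
    while remaining:
        top = remaining.pop()
        basis.append(top)
        forced = closure.get(top, set())
        remaining = [name for name in remaining if name not in forced]
    return basis


# Hypothetical toy data: names, entropies, and forcing relations are made up.
toy_orbits = {"P": (0.0, 0.1), "Q": (0.0, 0.2), "R": (0.3, 0.3), "S": (0.5, 0.25)}
toy_forces = {"S": ["P"], "R": ["Q"], "Q": ["P"]}
toy_basis = basis_set(toy_orbits, toy_forces)
```

In the toy data, S is removed first together with P, then R is removed together with Q, so the basis set consists of S and R. The loop always terminates because each pass removes at least one orbit from a finite list.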
[Figure 9: three genus-8 canonical forms, shown left to right with dual labels ABCBDED / abbacca, ABCDCBE / abccbaa, and ABCBDBE / abbccaa.]
holes that have singularities. Heavy lines are used to show the location of the seven components of the global Poincaré surface of section for each of the three inequivalent genus-8 canonical forms shown in Figure 9. The structure of the flow is summarized by a transition matrix. For the canonical form shown in Figure 9c the transition matrix is

\[ T = \begin{bmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix} \]

where $T_{i,j} = 1$ if the flow can proceed directly from component $i$ to component $j$, 0 otherwise.

Bounding tori, dressed with flows, can be labeled. In fact, two dual labeling schemes are possible. Following the outer boundary in the direction of the flow, one encounters the $g - 1$ components of the global Poincaré surface of section sequentially, the interior holes without singularities at least once each, and the interior holes with singularities at least twice each. The canonical form (genus-$g$ torus dressed with a flow) on the genus-8 bounding torus shown in Figure 9a can be labeled by the sequence in which the holes without singularities are encountered (ABCBDED) or the order in which the holes with singularities are encountered (abbacca). Both sequences contain $g - 1$ symbols. These labels are unique up to cyclic permutation. Symbol sequences for canonical forms for bounding tori act in many ways like symbol sequences for periodic orbits on branched manifolds. Although there is a 1:1 correspondence between bounded closed two-dimensional surfaces in $R^3$ and genus $g$, the number of canonical forms grows rapidly with $g$, as shown in Table 4. In fact, the number, $N(g)$, grows exponentially and can even be assigned an entropy:

\[ \lim_{g \to \infty} \frac{\ln N(g)}{g - 1} = \ln 3 \qquad [5] \]

Table 4 Number of canonical bounding tori as a function of genus $g$

 g     N(g)      g     N(g)      g      N(g)
 3        1      9       15     15     2 211
 4        1     10       28     16     5 549
 5        2     11       67     17    14 290
 6        2     12      145     18    36 824
 7        5     13      368     19    96 347
 8        6     14      870     20   252 927

In some sense, canonical forms that constrain branched manifolds within them behave like branched manifolds that constrain periodic orbits on them.

Every strange attractor that has been studied in $R^3$ has been described by a canonical bounding torus that contains it. This classification is shown in Table 5.

Branched manifold perestroikas are constrained by bounding tori as follows. Each branch line of any branched manifold can be moved into one of the $g - 1$ components of the global Poincaré surface of section. Any branched manifold contained in a genus-$g$ bounding torus ($g \ge 3$) must have at least one branch between each pair of components of the global Poincaré surface of section between which the flow is allowed, as summarized by the canonical form's transition matrix. New branches can only be added in a way that is consistent with the canonical form's transition matrix, continuity requirements, and the no-intersection condition.
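The growth of the counts in Table 4 can be compared with the limit in eqn [5] by direct arithmetic. The sketch below tabulates two finite-genus estimates of the entropy from the table values: the quantity ln N(g)/(g − 1) appearing in eqn [5], and the logarithm of the ratio of successive counts. The second estimator is an addition made here for comparison and does not appear in the text.

```python
import math

# Values of N(g) transcribed from Table 4.
N = {
    3: 1, 4: 1, 5: 2, 6: 2, 7: 5, 8: 6,
    9: 15, 10: 28, 11: 67, 12: 145, 13: 368, 14: 870,
    15: 2211, 16: 5549, 17: 14290, 18: 36824, 19: 96347, 20: 252927,
}


def entropy_estimate(g):
    """Finite-genus value of ln N(g) / (g - 1), the quantity in eqn [5]."""
    return math.log(N[g]) / (g - 1)


def ratio_estimate(g):
    """Logarithm of N(g) / N(g - 1), a second estimate of the growth rate."""
    return math.log(N[g] / N[g - 1])


for g in (10, 15, 20):
    print(g, round(entropy_estimate(g), 3), round(ratio_estimate(g), 3))
```

Both estimates increase with the genus and remain below ln 3 over the tabulated range, which is consistent with a limit that is approached slowly from below.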
Table 5 All known strange attractors of dimension $d_L < 3$ are bounded by one of the standard dressed tori. Dual labels for the bounding tori depend on $g - 1$ symbols describing holes with or without singularities

In the simplest case, $g = 1$, a third branch can be added to a branched manifold with two branches only if its local torsion differs by $\pm 1$ from the adjacent branch. In addition, the ordering of the new branch must be consistent with the continuity and no-intersection (ODE uniqueness theorem) requirements.

Embeddings of Bounding Tori

The last level of topological structure needed for the classification of strange attractors in $R^3$ describes their embeddings in $R^3$. The classification using genus-$g$ bounding tori is intrinsic, that is, the canonical form shows how the flow looks from inside the torus. Strange attractors, and the tori that bound them, are actually embedded in $R^3$. For a complete classification, we must specify not only the canonical form but also how this form sits in $R^3$. This program has not yet been completed, but we illustrate it with the genus-1 bounding torus in Figure 10. Figure 10a shows the canonical form, and two different embeddings of it in $R^3$. The embedding on the left is unknotted. The embedding on the right is knotted like a figure-8 knot. Extrinsic embeddings of genus-1 tori are described by tame knots in $R^3$, and tame knots can be used as centerlines for extrinsically embedded genus-1 tori. Higher-genus ($g \ge 3$) canonical forms, intrinsic genus-$g$ tori dressed with a canonical flow, have a larger (but discrete) variety of extrinsic embeddings in $R^3$.

The Embedding Question

The mechanism that nature uses to generate chaotic behavior in physical systems is not directly observable, and must be deduced by examining the data that are generated. Typically, the data consist of a single scalar time series that is discretely recorded: $x_i$, $i = 1, 2, \ldots$. In order to exhibit a strange attractor, a mapping of the data into $R^N$ must also be constructed. If the attractor is low dimensional ($d_L < 3$), one can hope that a mapping into $R^3$ can be constructed that exhibits no self-intersections or other degeneracies. Such a map is called an embedding. Once an embedding in $R^3$ is available, a topological analysis can be carried out. The analysis reveals the mechanism that underlies the creation of the embedded strange attractor.

But how do you know that the mechanism that generates the observed, embedded strange attractor has anything to do with the mechanism nature used to generate the experimental data?

If the embedding is contained in a genus-1 bounding torus, then the topological mechanism that generates the data, as defined by some unknown branched manifold $BM_{EXP}$, and the topological mechanism that is identified from the embedded strange attractor $BM_{EMB}$, are identical up to three degrees of freedom: parity, global torsion, and the knot type. As a result, in this case (genus-1) a topological analysis of embedded data does reveal nature's hidden secrets.
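The text does not say how the mapping of a scalar time series into three dimensions is built. One common choice, assumed here purely for illustration, is a time-delay map that sends each sample to the triple of values taken at that time and at one and two delays later. The function name, the default dimension, and the delay are hypothetical, and such a map need not be an embedding for every delay.

```python
def delay_embedding(series, dimension=3, delay=1):
    """Map a scalar series x_i to points (x_i, x_{i+delay}, ..., x_{i+(dimension-1)*delay}).

    This is one assumed construction of a candidate embedding; whether the
    result is free of self-intersections has to be checked separately.
    """
    count = len(series) - (dimension - 1) * delay
    if count <= 0:
        raise ValueError("series too short for this dimension and delay")
    return [
        tuple(series[i + k * delay] for k in range(dimension))
        for i in range(count)
    ]


# Small illustrative series; real data would be a long recorded time series.
points = delay_embedding([0, 1, 2, 3, 4, 5], dimension=3, delay=2)
```

Each resulting point lies in three dimensions, so closed data segments mapped this way can be passed to a linking-number calculation as part of a topological analysis.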
Figure 10 (a) Canonical form for genus-1 bounding torus. Extrinsic embeddings of the torus into $R^3$ that are (b) unknotted and (c) knotted like the figure-8 knot.

Further Reading

Abraham R and Shaw CD (1992) Dynamics: The Geometry of Behavior, Studies in Nonlinearity, 2nd edn. Reading, MA: Addison-Wesley.
Eckmann J-P and Ruelle D (1985) Ergodic theory of chaos and strange attractors. Reviews of Modern Physics 57(3): 617–656.
Gilmore R (1998) Topological analysis of chaotic dynamical systems. Reviews of Modern Physics 70(4): 1455–1529.
Gilmore R and Lefranc M (2002) The Topology of Chaos, Alice in Stretch and Squeezeland. New York: Wiley.
Gilmore R and Letellier C (2006) The Symmetry of Chaos: Alice in the Land of Mirrors. Oxford: Oxford University Press.
Gilmore R and Pei X (2001) The topology and organization of unstable periodic orbits in Hodgkin–Huxley models of receptors with subthreshold oscillations. In: Moss F and Gielen S (eds.) Handbook of Biological Physics, Neuro-informatics, Neural Modeling, vol. 4, pp. 155–203. Amsterdam: North-Holland.
Ott E (1993) Chaos in Dynamical Systems. Cambridge: Cambridge University Press.
Solari HG, Natiello MA, and Mindlin GB (1996) Nonlinear Physics and Its Mathematical Tools. Bristol: IoP Publishing.
Tufillaro NB, Abbott T, and Reilly J (1992) An Experimental Approach to Nonlinear Dynamics and Chaos. Reading, MA: Addison-Wesley.
Characteristic Classes

P B Gilkey, University of Oregon, Eugene, OR, USA
R Ivanova, University of Hawaii Hilo, Hilo, HI, USA
S Nikcevic, SANU, Belgrade, Serbia and Montenegro

© 2006 Elsevier Ltd. All rights reserved.

Vector Bundles

Let $\mathrm{Vect}_k(M, F)$ be the set of isomorphism classes of real ($F = R$) or complex ($F = C$) vector bundles of rank $k$ over a smooth connected $m$-dimensional manifold $M$. Let

\[ \mathrm{Vect}(M, F) = \bigcup_k \mathrm{Vect}_k(M, F) \]

Principal Bundles Examples

Let $H$ be a Lie group. A fiber bundle

\[ \pi : P \to M \]

with fiber $H$ is said to be a principal bundle if there is a right action of $H$ on $P$ which acts transitively on the fibers, that is, if $P/H = M$. If $H$ is a closed subgroup of a Lie group $G$, then the natural projection $G \to G/H$ is a principal $H$ bundle over the homogeneous space $G/H$. Let $O(k)$ and $U(k)$ denote the orthogonal and unitary groups, respectively. Let $S^k$ denote the unit sphere in $R^{k+1}$. Then we have natural principal bundles:

\[ O(k) \subset O(k+1) \to S^k \]
\[ U(k) \subset U(k+1) \to S^{2k+1} \]

Let $RP^k$ and $CP^k$ denote the real and complex projective spaces of lines through the origin in $R^{k+1}$ and $C^{k+1}$, respectively. Let

\[ Z_2 = \{\pm \mathrm{Id}\} \subset O(k) \]
\[ S^1 = \{\lambda \cdot \mathrm{Id} : |\lambda| = 1\} \subset U(k) \]

One has $Z_2$ and $S^1$ principal bundles:

\[ Z_2 \to S^{k-1} \to RP^{k-1} \]
\[ S^1 \to S^{2k-1} \to CP^{k-1} \]

Frames

A frame $s := (s_1, \ldots, s_k)$ for $V \in \mathrm{Vect}_k(M, F)$ over an open set $O \subset M$ is a collection of $k$ smooth sections to $V|_O$ so that $\{s_1(P), \ldots, s_k(P)\}$ is a basis for the fiber $V_P$ of $V$ over any point $P \in O$. Given such a frame $s$, we can construct a local trivialization which identifies $O \times F^k$ with $V|_O$ by the mapping

\[ (P; \lambda_1, \ldots, \lambda_k) \to \lambda_1 s_1(P) + \cdots + \lambda_k s_k(P) \]

Conversely, given a local trivialization of $V$, we can take the coordinate frame

\[ s_i(P) = P \times (0, \ldots, 0, 1, 0, \ldots, 0) \]

Thus, frames and local trivializations of $V$ are equivalent notions.

Simple Covers

An open cover $\{O_\alpha\}$ of $M$, where $\alpha$ ranges over some indexing set $A$, is said to be a simple cover if any finite intersection $O_{\alpha_1} \cap \cdots \cap O_{\alpha_k}$ is either empty or contractible.

Simple covers always exist. Put a Riemannian metric on $M$. If $M$ is compact, then there exists a uniform $\delta > 0$ so that any geodesic ball of radius $\delta$ is geodesically convex. The intersection of geodesically convex sets is either geodesically convex (and hence contractible) or empty. Thus, covering $M$ by a finite number of balls of radius $\delta$ yields a simple cover. The argument is similar even if $M$ is not compact where an infinite number of geodesic balls is used and the radii are allowed to shrink near $\infty$.

Transition Cocycles

Let $\mathrm{Hom}(F, k)$ be the set of linear transformations of $F^k$ and let $GL(F, k) \subset \mathrm{Hom}(F, k)$ be the group of all invertible linear transformations.

Let $\{s_\alpha\}$ be frames for a vector bundle $V$ over some open cover $\{O_\alpha\}$ of $M$. On the intersection $O_\alpha \cap O_\beta$, one may express $s_\alpha = \psi_{\alpha\beta} s_\beta$, that is

\[ s_{\alpha,i}(P) = \sum_{1 \le j \le k} \psi^{j}_{\alpha\beta,i}(P)\, s_{\beta,j}(P) \]
488 Characteristic Classes
Gilmore R and Letellier C (2006) The Symmetry of Chaos Alice in the Land of Mirrors. Oxford: Oxford University Press.
Gilmore R and Pei X (2001) The topology and organization of unstable periodic orbits in Hodgkin-Huxley models of receptors with subthreshold oscillations. In: Moss F and Gielen S (eds.) Handbook of Biological Physics, Neuro-informatics, Neural Modeling, vol. 4, pp. 155-203. Amsterdam: North-Holland.
Ott E (1993) Chaos in Dynamical Systems. Cambridge: Cambridge University Press.
Solari HG, Natiello MA, and Mindlin GB (1996) Nonlinear Physics and Its Mathematical Tools. Bristol: IoP Publishing.
Tufillaro NB, Abbott T, and Reilly J (1992) An Experimental Approach to Nonlinear Dynamics and Chaos. Reading, MA: Addison-Wesley.
Characteristic Classes
P B Gilkey, University of Oregon, Eugene, OR, USA
R Ivanova, University of Hawaii Hilo, Hilo, HI, USA
S Nikcevic, SANU, Belgrade, Serbia and Montenegro
© 2006 Elsevier Ltd. All rights reserved.

Vector Bundles

Let $\mathrm{Vect}_k(M, F)$ be the set of isomorphism classes of real ($F = \mathbb{R}$) or complex ($F = \mathbb{C}$) vector bundles of rank $k$ over a smooth connected $m$-dimensional manifold $M$. Let

$$\mathrm{Vect}(M, F) = \bigcup_k \mathrm{Vect}_k(M, F)$$

Frames

A frame $s := (s_1, \ldots, s_k)$ for $V \in \mathrm{Vect}_k(M, F)$ over an open set $O \subset M$ is a collection of $k$ smooth sections to $V|_O$ so that $\{s_1(P), \ldots, s_k(P)\}$ is a basis for the fiber $V_P$ of $V$ over any point $P \in O$. Given such a frame $s$, we can construct a local trivialization which identifies $O \times F^k$ with $V|_O$ by the mapping

$$(P; \lambda_1, \ldots, \lambda_k) \to \lambda_1 s_1(P) + \cdots + \lambda_k s_k(P)$$

Conversely, given a local trivialization of $V$, we can take the coordinate frame

$$s_i(P) = P \times (0, \ldots, 0, 1, 0, \ldots, 0)$$

Thus, frames and local trivializations of $V$ are equivalent notions.
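Lifting the Structure Group

Let $\rho$ be a representation of a Lie group $H$ to $\mathrm{GL}(F, k)$. One says that the structure group of $V$ can be lifted to $H$ if there exist frames $\{s_\alpha\}$ for $V$, with transition functions $\psi_{\alpha\beta}$, and smooth maps $\phi_{\alpha\beta} : O_\alpha \cap O_\beta \to H$, so $\rho(\phi_{\alpha\beta}) = \psi_{\alpha\beta}$, where eqn [1] holds for $\phi$.

$$(\xi, \lambda) \sim (\xi \cdot h^{-1}, \rho(h)\lambda)$$

The associated vector bundle is then given by

$$P \times_\rho F^k := P \times F^k / \sim$$

Clearly, $\{\rho(\phi_{\alpha\beta})\}$ are the transition cocycles of the vector bundle $P \times_\rho F^k$.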
The map $V \to \mathrm{Rank}(V)$ extends to a surjective map from $KF(M)$ to $\mathbb{Z}$. We denote the associated ideal of virtual bundles of virtual rank 0 by

$$\widetilde{KF}(M) := \ker(\mathrm{Rank})$$

In the stable range, $V \to [V] - k[1]$ identifies

$$\mathrm{Vect}_k(M, \mathbb{R}) = \widetilde{K\mathbb{R}}(M) \ \text{ if } k > m, \qquad \mathrm{Vect}_k(M, \mathbb{C}) = \widetilde{K\mathbb{C}}(M) \ \text{ if } 2k > m \qquad [3]$$

These groups contain nontrivial torsion. Let $L$ be the nontrivial real line bundle over $RP^k$. Then

$$\widetilde{K\mathbb{R}}(RP^k) = \mathbb{Z} \cdot \{L - 1\} / 2^{\phi(k)} \mathbb{Z} \cdot \{L - 1\}$$

where $\phi(k)$ is the Adams number.

Classifying Spaces

Let $\mathrm{Gr}_k(F, n)$ be the Grassmannian of $k$-dimensional subspaces of $F^n$. By mapping a $k$-plane $\pi$ in $F^n$ to the corresponding orthogonal projection on $\pi$, we can identify $\mathrm{Gr}_k(F, n)$ with the set of orthogonal projections of rank $k$:

$$\{\pi \in \mathrm{Hom}(F^n) : \pi^2 = \pi,\ \pi^* = \pi,\ \mathrm{tr}(\pi) = k\}$$

There is a natural associated tautological $k$-plane bundle

$$V_k(F, n) \in \mathrm{Vect}_k(\mathrm{Gr}_k(F, n), F)$$

whose fiber over a $k$-plane $\pi$ is the $k$-plane itself:

$$V_k(F, n) := \{(\pi, x) \in \mathrm{Hom}(F^n) \times F^n : \pi x = x\}$$

Let $[M, \mathrm{Gr}_k(F, n)]$ denote the set of homotopy equivalence classes of smooth maps $f$ from $M$ to $\mathrm{Gr}_k(F, n)$. Since $[f_1] = [f_2]$ implies that $f_1^* V$ is isomorphic to $f_2^* V$, the association

$$f \to f^* V_k(F, n) \in \mathrm{Vect}_k(M, F)$$

induces a map

$$[M, \mathrm{Gr}_k(F, n)] \to \mathrm{Vect}_k(M, F)$$

This map defines a natural equivalence of functors in the stable range:

$$[M, \mathrm{Gr}_k(\mathbb{R}, k + \nu)] = \mathrm{Vect}_k(M, \mathbb{R}) \ \text{ for } \nu > m, \qquad [M, \mathrm{Gr}_k(\mathbb{C}, k + \nu)] = \mathrm{Vect}_k(M, \mathbb{C}) \ \text{ for } 2\nu > m \qquad [4]$$

The natural inclusion of $F^n$ in $F^{n+1}$ induces natural inclusions

$$\mathrm{Gr}_k(F, n) \subset \mathrm{Gr}_k(F, n+1), \qquad V_k(F, n) \subset V_k(F, n+1) \qquad [5]$$

Let $\mathrm{Gr}_k(F, \infty)$ and $V_k(F, \infty)$ be the direct limit spaces under these inclusions; these are the infinite-dimensional Grassmannians and classifying bundles, respectively. The topology on these spaces is the weak or inductive topology. The Grassmannians are called classifying spaces. The isomorphisms of eqn [4] are compatible with the inclusions of eqn [5] and we have

$$[M, \mathrm{Gr}_k(F, \infty)] = \mathrm{Vect}_k(M, F) \qquad [6]$$

Spaces with Finite Covering Dimension

A metric space $X$ is said to have a covering dimension at most $m$ if, given any open cover $\{U_\alpha\}$ of $X$, there exists a refinement $\{O_\beta\}$ of the cover so that any intersection of more than $m + 1$ of the $\{O_\beta\}$ is empty. For example, any manifold of dimension $m$ has covering dimension at most $m$. More generally, any $m$-dimensional cell complex has covering dimension at most $m$.

The isomorphisms of [2]-[4], and [6] continue to hold under the weaker assumption that $M$ is a metric space with covering dimension at most $m$.

Characteristic Classes of Vector Bundles

The Cohomology of $\mathrm{Gr}_k(F, \infty)$

The cohomology algebras of the Grassmannians are polynomial algebras on suitably chosen generators:

$$H^*(\mathrm{Gr}_k(\mathbb{R}, \infty); \mathbb{Z}_2) = \mathbb{Z}_2[sw_1, \ldots, sw_k], \qquad H^*(\mathrm{Gr}_k(\mathbb{C}, \infty); \mathbb{Z}) = \mathbb{Z}[c_1, \ldots, c_k] \qquad [7]$$

The Stiefel-Whitney Classes

Let $V \in \mathrm{Vect}_k(M, \mathbb{R})$. We use eqn [6] to find $\Psi : M \to \mathrm{Gr}_k(\mathbb{R}, \infty)$ which classifies $V$; the map $\Psi$ is uniquely determined up to homotopy and, using eqn [7], one sets

$$sw_i(V) := \Psi^* sw_i \in H^i(M; \mathbb{Z}_2)$$

The total Stiefel-Whitney class is then defined by

$$sw(V) = 1 + sw_1(V) + \cdots + sw_k(V)$$

The Stiefel-Whitney class has the properties:

1. If $f : X_1 \to X_2$, then $f^*(sw(V)) = sw(f^* V)$.
2. $sw(V \oplus W) = sw(V)\, sw(W)$.
3. If $L$ is the Mobius bundle over $S^1$, then $sw_1(L)$ generates $H^1(S^1; \mathbb{Z}_2) = \mathbb{Z}_2$.

The cohomology algebra of real projective space is a truncated polynomial algebra:

$$H^*(RP^k; \mathbb{Z}_2) = \mathbb{Z}_2[x] / (x^{k+1} = 0)$$
Since $TRP^k \oplus 1 = (k+1)L$, one has

$$sw(TRP^k) = (1 + x)^{k+1} = 1 + (k+1)x + \frac{(k+1)k}{2} x^2 + \cdots \qquad [8]$$

Orientability and Spin Structures

The Stiefel-Whitney classes have real geometric meaning. For example, $sw_1(V) = 0$ if and only if $V$ is orientable; if $sw_1(V) = 0$, then $sw_2(V) = 0$ if and only if $V$ admits a spin structure. With reference to the discussion on the tangent bundle of projective space, eqn [8] yields

$$sw_1(TRP^k) = \begin{cases} 0 & \text{if } k \equiv 1 \bmod 2 \\ x & \text{if } k \equiv 0 \bmod 2 \end{cases}$$

Thus, $RP^k$ is orientable if and only if $k$ is odd. Furthermore,

$$sw_2(TRP^k) = \begin{cases} 0 & \text{if } k \equiv 3 \bmod 4 \\ x^2 & \text{if } k \equiv 1 \bmod 4 \end{cases}$$

Thus, for $k > 1$, $TRP^k$ is spin if and only if $k \equiv 3 \bmod 4$.

Chern Classes

Let $V \in \mathrm{Vect}_k(M, \mathbb{C})$. We use eqn [6] to find $\Psi : M \to \mathrm{Gr}_k(\mathbb{C}, \infty)$ which classifies $V$; the map $\Psi$ is uniquely determined up to homotopy and, using eqn [7], one sets

$$c_i(V) := \Psi^* c_i \in H^{2i}(M; \mathbb{Z})$$

The total Chern class is then defined by

$$c(V) := 1 + c_1(V) + \cdots + c_k(V)$$

The Chern class has the properties:

1. If $f : X_1 \to X_2$, then $f^*(c(V)) = c(f^* V)$.
2. $c(V \oplus W) = c(V)\, c(W)$.
3. Let $L$ be the classifying line bundle over $S^2 = CP^1$. Then $\int_{S^2} c_1(L) = -1$.

The cohomology algebra of complex projective space also is a truncated polynomial algebra

$$H^*(CP^k; \mathbb{Z}) = \mathbb{Z}[x] / (x^{k+1})$$

where $x = -c_1(L)$ and $L$ is the complex classifying line bundle over $CP^k = \mathrm{Gr}_1(\mathbb{C}, k+1)$. If $T_c CP^k$ is the complex tangent bundle, then

$$c(T_c CP^k) = (1 + x)^{k+1}$$

The Pontrjagin Classes

Let $V$ be a real vector bundle over a topological space $X$ of rank $r = 2k$ or $r = 2k + 1$. The Pontrjagin classes $p_i(V) \in H^{4i}(X; \mathbb{Z})$ are characterized by the properties:

1. $p(V) = 1 + p_1(V) + \cdots + p_k(V)$.
2. If $f : X_1 \to X_2$, then $f^*(p(V)) = p(f^* V)$.
3. $p(V \oplus W) = p(V)\, p(W)$ mod elements of order 2.
4. $\int_{CP^2} p_1(TCP^2) = 3$.

We can complexify a real vector bundle $V$ to construct an associated complex vector bundle $V_{\mathbb{C}}$. We have

$$p_i(V) := (-1)^i c_{2i}(V_{\mathbb{C}})$$

Conversely, if $V$ is a complex vector bundle, we can construct an underlying real vector bundle $V_{\mathbb{R}}$ by forgetting the underlying complex structure. Modulo elements of order 2, the product $c(V)\, c(V^*)$ determines $p(V_{\mathbb{R}})$; with the sign convention above, this reads

$$\sum_i (-1)^i p_i(V_{\mathbb{R}}) = c(V)\, c(V^*)$$

Let $TCP^k$ be the real tangent bundle of complex projective space. Then

$$p(TCP^k) = (1 + x^2)^{k+1}$$

Line Bundles

Tensor product makes $\mathrm{Vect}_1(M, F)$ into an abelian group. One has natural equivalences of functors which are group homomorphisms:

$$sw_1 : \mathrm{Vect}_1(M, \mathbb{R}) \to H^1(M; \mathbb{Z}_2)$$
$$c_1 : \mathrm{Vect}_1(M, \mathbb{C}) \to H^2(M; \mathbb{Z})$$

A real line bundle $L$ is trivial if and only if it is orientable or, equivalently, if $sw_1(L)$ vanishes. A complex line bundle $L$ is trivial if and only if $c_1(L) = 0$. There are nontrivial vector bundles with vanishing Stiefel-Whitney classes of rank $k > 1$. For example, $sw_i(TS^k) = 0$ for $i > 0$ despite the fact that $TS^k$ is trivial if and only if $k = 1, 3, 7$.

Curvature and Characteristic Classes

de Rham Cohomology

We can replace the coefficient group $\mathbb{Z}$ by $\mathbb{C}$ at the cost of losing information concerning torsion. Thus, we may regard $p_i(V) \in H^{4i}(M; \mathbb{C})$ if $V$ is real or $c_i(V) \in H^{2i}(M; \mathbb{C})$ if $V$ is complex. Let $M$ be a smooth manifold. Let $C^\infty \Lambda^p M$ be the space of smooth $p$-forms and let

$$d : C^\infty \Lambda^p M \to C^\infty \Lambda^{p+1} M$$

be the exterior derivative. The de Rham cohomology groups are then defined by

$$H^p_{deR}(M) := \frac{\ker(d : C^\infty \Lambda^p M \to C^\infty \Lambda^{p+1} M)}{\mathrm{im}(d : C^\infty \Lambda^{p-1} M \to C^\infty \Lambda^p M)}$$

The de Rham theorem identifies the topological cohomology groups $H^p(M; \mathbb{C})$ with the de Rham cohomology groups $H^p_{deR}(M)$, which are given differential geometrically.

Given a connection on $V$, the Chern-Weil theory enables us to compute Pontrjagin and Chern classes in de Rham cohomology in terms of curvature.

Connections

Let $V$ be a vector bundle over $M$. A connection $\nabla$ on $V$ is a first-order partial differential operator $\nabla : C^\infty(V) \to C^\infty(T^*M \otimes V)$ satisfying the Leibniz rule $\nabla(fs) = df \otimes s + f \nabla s$. If $X$ is a tangent vector field, one sets $\nabla_X s := \langle X, \nabla s \rangle$, where $\langle \cdot, \cdot \rangle$ denotes the natural pairing between the tangent and cotangent spaces. This generalizes to the bundle setting the notion of a directional derivative and has the properties:

1. $\nabla_{fX} s = f \nabla_X s$.
2. $\nabla_X (fs) = X(f) s + f \nabla_X s$.
3. $\nabla_{X_1 + X_2} s = \nabla_{X_1} s + \nabla_{X_2} s$.
4. $\nabla_X (s_1 + s_2) = \nabla_X s_1 + \nabla_X s_2$.

The connection extends to $V$-valued forms, and the curvature is the composition $\Omega := \nabla^2$. This is not a second-order partial differential operator; it is a zeroth-order operator, that is,

$$\Omega(fs) = ddf \otimes s - df \wedge \nabla s + df \wedge \nabla s + f \nabla^2 s = f \Omega s$$

The curvature operator can also be computed locally. Let $(s_i)$ be a local frame. Expand

$$\nabla s_i = \sum_j \omega_i^j \otimes s_j$$

to define the connection 1-form $\omega$. One then has

$$\nabla^2 s_i = (d\omega_i^j - \omega_i^k \wedge \omega_k^j) \otimes s_j$$

and so

$$\Omega_i^j = d\omega_i^j - \omega_i^k \wedge \omega_k^j$$

If $\tilde{s}_i = g_i^j s_j$ is another local frame, we compute

$$\tilde{\omega} = dg\, g^{-1} + g \omega g^{-1} \quad \text{and} \quad \tilde{\Omega} = g \Omega g^{-1}$$

A connection is said to be Riemannian (unitary in the complex setting) if it is compatible with a fiber metric $(\cdot, \cdot)$ on $V$, that is, if $d(s_1, s_2) = (\nabla s_1, s_2) + (s_1, \nabla s_2)$. Such connections always exist and, relative to a local orthonormal frame, the curvature is skew-symmetric, that is,

$$\Omega + \Omega^* = 0$$

Thus, $\Omega$ can be regarded as a 2-form-valued element of the Lie algebra of the structure group, $O(V)$ in the real setting or $U(V)$ in the complex setting.

Let $V_P$ be the fiber of $V$ over a point $P \in M$. The inclusion $i : V \to \mathbb{R}^n$ defines the classifying map $f : P \to \mathrm{Gr}_k(\mathbb{R}, n)$ where we set

$$f(P) = i(V_P)$$
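Returning to eqn [8], the orientability and spin criteria for real projective space can be checked mechanically. The following is a minimal sketch, not part of the original article: it expands $(1 + x)^{k+1}$ modulo 2 in the truncated algebra $\mathbb{Z}_2[x]/(x^{k+1})$ and reads off $sw_1$ and $sw_2$. The function names are hypothetical and chosen only for illustration.

```python
from math import comb

def sw_coefficients(k):
    """Coefficients of sw(TRP^k) = (1 + x)^(k+1) mod 2, truncated by x^(k+1) = 0.

    Entry i is the coefficient of x^i, for 0 <= i <= k.
    """
    return [comb(k + 1, i) % 2 for i in range(k + 1)]

def is_orientable(k):
    """RP^k (k >= 1) is orientable exactly when sw_1 vanishes."""
    return sw_coefficients(k)[1] == 0

def is_spin(k):
    """TRP^k (k >= 1) is spin exactly when sw_1 and sw_2 both vanish."""
    sw = sw_coefficients(k)
    sw1 = sw[1]
    sw2 = sw[2] if k >= 2 else 0  # x^2 = 0 already holds on RP^1
    return sw1 == 0 and sw2 == 0

if __name__ == "__main__":
    for k in range(1, 9):
        print(k, is_orientable(k), is_spin(k))
```

The truncation matters only for $k = 1$: there $x^2 = 0$, so $RP^1 = S^1$ is spin even though $1 \equiv 1 \bmod 4$. For $k > 1$ the output agrees with the criteria stated above.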
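are even functions of $x$, so the ambiguity in the choice of sign in the eigenvalues plays no role. This defines characteristic classes

$$L_i(V) \in H^{4i}(M; \mathbb{C}) \quad \text{and} \quad \hat{A}_i(V) \in H^{4i}(M; \mathbb{C})$$

Summary of Formulas

We summarize below some of the formulas in terms of characteristic classes:

1. $c_1(\Omega) = \dfrac{\sqrt{-1}}{2\pi} \mathrm{tr}(\Omega)$,
2. $c_2(\Omega) = \dfrac{1}{8\pi^2} \{\mathrm{tr}(\Omega^2) - \mathrm{tr}(\Omega)^2\}$,
3. $p_1(\Omega) = -\dfrac{1}{8\pi^2} \mathrm{tr}(\Omega^2)$,
4. $ch(V) = \left(k + c_1 + \dfrac{c_1^2 - 2c_2}{2} + \cdots\right)(V)$,
5. $td(V) = \left(1 + \dfrac{c_1}{2} + \dfrac{c_1^2 + c_2}{12} + \dfrac{c_1 c_2}{24} + \cdots\right)(V)$,
6. $\hat{A}(V) = \left(1 - \dfrac{p_1}{24} + \dfrac{7p_1^2 - 4p_2}{5760} + \cdots\right)(V)$,
7. $L(V) = \left(1 + \dfrac{p_1}{3} + \dfrac{7p_2 - p_1^2}{45} + \cdots\right)(V)$,
8. $td(V \oplus W) = td(V)\, td(W)$,
9. $\hat{A}(V \oplus W) = \hat{A}(V)\, \hat{A}(W)$,
10. $L(V \oplus W) = L(V)\, L(W)$.

If $M$ is an even-dimensional manifold, let $e_m(M) := e_m(TM)$. If we reverse the local orientation of $M$, then $e_m(M)$ changes sign. Consequently, $e_m(M)$ is a measure rather than an $m$-form; we can use the Riemannian measure on $M$ to regard $e_m(M)$ as a scalar. Let $R_{ijkl}$ be the components of the curvature of the Levi-Civita connection with respect to some local orthonormal frame field; we adopt the convention that $R_{1221} = 1$ on the standard sphere $S^2$ in $\mathbb{R}^3$. If $\varepsilon^{I,J} := (e^I, e^J)$ is the totally antisymmetric tensor, then, with $m = 2n$,

$$e_{2n} := \sum_{I,J} \frac{\varepsilon^{I,J} R_{i_1 i_2 j_2 j_1} \cdots R_{i_{m-1} i_m j_m j_{m-1}}}{(8\pi)^n n!}$$

Let $R := R_{ijji}$ and $\rho_{ij} := R_{ikkj}$ be the scalar curvature and the Ricci tensor, respectively. Then

$$e_2 = \frac{1}{4\pi} R$$
$$e_4 = \frac{1}{32\pi^2} (R^2 - 4|\rho|^2 + |R|^2)$$

Characteristic Classes of Principal Bundles

Let $\mathfrak{g}$ be the Lie algebra of a compact Lie group $G$. Let $\pi : P \to M$ be a principal $G$ bundle over $M$. For $\xi \in P$, let

$$V_\xi := \ker(\pi_* : T_\xi P \to T_{\pi\xi} M) \quad \text{and} \quad H_\xi := V_\xi^\perp$$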
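Formula 7 of the summary can be sanity-checked against the Pontrjagin classes $p(TCP^k) = (1 + x^2)^{k+1}$ of complex projective space. The sketch below is an illustration only and makes two assumptions that are not stated in this section: that $x^k$ integrates to 1 over $CP^k$ for even $k$, and the Hirzebruch signature theorem, by which the top-degree part of $L$ integrates to the signature (which is 1 for $CP^2$ and $CP^4$). The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def pontrjagin_coefficients_cp(k):
    """Return [a_0, a_1, ...] with p_i(TCP^k) = a_i * x^(2i), from (1 + x^2)^(k+1)."""
    return [comb(k + 1, i) for i in range(k // 2 + 1)]

def l_genus_cp(k):
    """Integral over CP^k of the top-degree term of L, for k = 2 or k = 4.

    Uses L_1 = p_1 / 3 and L_2 = (7 p_2 - p_1^2) / 45 from the summary,
    together with the assumed normalization that x^k integrates to 1.
    """
    p = pontrjagin_coefficients_cp(k)
    if k == 2:
        return Fraction(p[1], 3)
    if k == 4:
        return Fraction(7 * p[2] - p[1] ** 2, 45)
    raise ValueError("only k = 2 and k = 4 are covered by the listed terms")

if __name__ == "__main__":
    print(l_genus_cp(2), l_genus_cp(4))
```

For $CP^2$ one has $p_1 = 3x^2$, consistent with property 4 of the Pontrjagin classes, and for $CP^4$ one has $p_1 = 5x^2$, $p_2 = 10x^4$; both give the value 1.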
metric on $P \times_\rho V$; the curvature of the connection $\nabla$ defined here agrees with the definition previously.

Let $Q(G)$ be the algebra of all polynomials on $\mathfrak{g}$ which are invariant under the adjoint action. If $Q \in Q(G)$, then $Q(\Omega)$ is well defined. One has $dQ(\Omega) = 0$. Furthermore, the de Rham cohomology class $Q(P) := [Q(\Omega)]$ is independent of the particular connection chosen. We have

$$Q(U(k)) = \mathbb{C}[c_1, \ldots, c_k]$$
$$Q(SU(k)) = \mathbb{C}[c_2, \ldots, c_k]$$
$$Q(O(2k)) = \mathbb{C}[p_1, \ldots, p_k]$$
$$Q(O(2k+1)) = \mathbb{C}[p_1, \ldots, p_k]$$
$$Q(SO(2k)) = \mathbb{C}[p_1, \ldots, p_k, e_k] / (e_k^2 = p_k)$$
$$Q(SO(2k+1)) = \mathbb{C}[p_1, \ldots, p_k]$$

Thus, for this category of groups, no new characteristic classes ensue. Since the invariants are Lie-algebra theoretic in nature,

$$Q(\mathrm{Spin}(k)) = Q(SO(k))$$

Other groups, of course, give rise to different characteristic rings of invariants.

Acknowledgments

Research of P Gilkey was partially supported by the MPI (Leipzig, Germany), that of R Ivanova by the UHH Seed Money Grant, and of S Nikcevic by MM 1646 (Serbia), DAAD (Germany), and Dierks von Zweck Stiftung (Essen, Germany).

See also: Cohomology Theories; Gerbes in Quantum Field theory; Instantons: Topological Aspects; K-Theory; Mathai-Quillen Formalism; Riemann Surfaces.

Further Reading

Besse AL (1987) Einstein manifolds. Ergebnisse der Mathematik und ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)], vol. 10. Berlin: Springer-Verlag.
Bott R and Tu LW (1982) Differential forms in algebraic topology. Graduate Texts in Mathematics, vol. 82. New York-Berlin: Springer-Verlag.
Chern S (1944) A simple intrinsic proof of the Gauss-Bonnet formula for closed Riemannian manifolds. Annals of Mathematics 45: 747-752.
Chern S (1945) On the curvatura integra in a Riemannian manifold. Annals of Mathematics 46: 674-684.
Conner PE and Floyd EE (1964) Differentiable periodic maps. Ergebnisse der Mathematik und ihrer Grenzgebiete, N.F., Band 33. New York: Academic Press; Berlin-Göttingen-Heidelberg: Springer-Verlag.
de Rham G (1950) Complexes à automorphismes et homéomorphie différentiable (French). Ann. Inst. Fourier Grenoble 2: 51-67.
Eguchi T, Gilkey PB, and Hanson AJ (1980) Gravitation, gauge theories and differential geometry. Physics Reports 66: 213-393.
Eilenberg S and Steenrod N (1952) Foundations of Algebraic Topology. Princeton, NJ: Princeton University Press.
Greub W, Halperin S, and Vanstone R (1972) Connections, Curvature, and Cohomology. Vol. I: De Rham Cohomology of Manifolds and Vector Bundles. Pure and Applied Mathematics, vol. 47. New York-London: Academic Press.
Hirzebruch F (1956) Neue topologische Methoden in der algebraischen Geometrie (German). Ergebnisse der Mathematik und ihrer Grenzgebiete (N.F.), Heft 9. Berlin-Göttingen-Heidelberg: Springer-Verlag.
Husemoller D (1966) Fibre Bundles. New York-London-Sydney: McGraw-Hill.
Karoubi M (1978) K-theory. An introduction. Grundlehren der Mathematischen Wissenschaften, Band 226. Berlin-New York: Springer-Verlag.
Kobayashi S (1987) Differential Geometry of Complex Vector Bundles. Publications of the Mathematical Society of Japan, 15. Kano Memorial Lectures, 5. Princeton, NJ: Princeton University Press; Tokyo: Iwanami Shoten.
Milnor JW and Stasheff JD (1974) Characteristic Classes. Annals of Mathematics Studies, No. 76. Princeton, NJ: Princeton University Press; Tokyo: University of Tokyo Press.
Steenrod NE (1962) Cohomology Operations. Lectures by NE Steenrod written and revised by DBA Epstein. Annals of Mathematics Studies, No. 50. Princeton, NJ: Princeton University Press.
Steenrod NE (1951) The Topology of Fibre Bundles. Princeton Mathematical Series, vol. 14. Princeton, NJ: Princeton University Press.
Stong RE (1968) Notes on Cobordism Theory. Mathematical Notes. Princeton, NJ: Princeton University Press; Tokyo: University of Tokyo Press.
Weyl H (1939) The Classical Groups. Their Invariants and Representations. Princeton, NJ: Princeton University Press.
Chern-Simons Models: Rigorous Results

Chern-Simons Functional Integrals

We shall describe here the typical Chern-Simons functional integral. For the purposes of this article, we will confine ourselves to a simpler setting rather than the most general possible one. In fact, we shall work with fields over three-dimensional Euclidean space $\mathbb{R}^3$ (instead of a general 3-manifold).

The typical Chern-Simons functional integral is of the form

$$\int_{\mathcal{A}} e^{i(k/4\pi) S_{CS}(A)}\, W_{C_1, R_1}(A) \cdots W_{C_n, R_n}(A)\, DA \qquad [1]$$

Our objective in this section will be to specify what the terms in this formal integral mean. Very briefly, the integration is with respect to a formal Lebesgue measure on $\mathcal{A}$, an infinite-dimensional space of geometric objects $A$ called connections over $\mathbb{R}^3$ with values in the Lie algebra $LG$ of a group $G$. In the first term in the integrand, in the exponent, $k$ is a real number, and $S_{CS}(A)$ is the Chern-Simons action for the connection $A$. Each term $W_{C_i, R_i}(A)$ is a Wilson loop observable, the trace in some representation $R_i$ of the holonomy of the connection $A$ around the loop $C_i$. The entire integral, formal though it may be, provides an invariant associated with the system of loops $C_1, \ldots, C_n$.

Let $G$ be a compact Lie group; for ease of exposition, let us take $G$ to be a closed, connected subgroup of $U(n)$. Thus, each element of $G$ is an $n \times n$ complex matrix $g$ with $g^* g = I$, the identity. The Lie algebra $LG$ consists of all $n \times n$ matrices $A$ which are skew-Hermitian, that is, satisfy $A^* = -A$, and for which $e^{tA} \in G$ for all real numbers $t$. On $LG$ there is a convenient inner product given by

$$\langle A, B \rangle = -\mathrm{tr}(AB)$$

This inner product is invariant under the conjugation action of the group $G$ on its Lie algebra $LG$.

By a connection over $\mathbb{R}^3$ we shall mean a $C^\infty$ 1-form with values in $LG$. The set of all connections is an affine (in our case, actually a linear) space $\mathcal{A}$. If $A \in \mathcal{A}$, then define

$$S_{CS}(A) = \int_{\mathbb{R}^3} \mathrm{tr}\left(A \wedge dA + \tfrac{2}{3} A \wedge A \wedge A\right) \qquad [2]$$

This is, up to constant multiple, the Chern-Simons action functional.

Let $A$ be a connection and consider a piecewise smooth path

$$C : [0, 1] \to \mathbb{R}^3$$

With this one can associate a $G$-valued path $[0, 1] \to G : t \mapsto g(t) \in G$ satisfying the differential equation

$$g'(t)\, g(t)^{-1} = -A(C'(t))$$

subject to the initial condition $g(0) = I$, the identity. The path $t \mapsto g(t)$ describes parallel transport along $C$ by the connection $A$. If $C$ is a loop then the final value $g(1)$ is the holonomy of $A$ around $C$. If $R$ is a representation of $G$ on some finite-dimensional vector space then the trace of $R(g(1))$ is the Wilson loop observable:

$$W_{C, R}(A) = \mathrm{tr}(R(g(1))) \qquad [3]$$

Thus, we have specified the meaning of the terms appearing in the formal integral [1], where $C_1, \ldots, C_n$ of eqn [1] form a link (a family of nonintersecting, imbedded loops) in $\mathbb{R}^3$ and $R_1, \ldots, R_n$ are finite-dimensional representations of $G$. Witten showed that, at least for suitable values of $k$, integrals of this form ought to produce topological invariants, which he identified, for the link.

The integral [1] is problematic for several reasons. First, there is no reasonable and useful analog of Lebesgue measure on an infinite-dimensional space. Even if one were to regularize this measure in some simple way, one would run into the problem that the measure would not live on the space of smooth connections, and so the integrand would become meaningless.

There are several different approaches to a mathematical interpretation of [1]. The approach that is often taken in practice is to simply ignore the analytical problem and define the value of the integral [1] to be what Witten's calculations have given. One approach, used, for instance, by Bar-Natan (1995), is to expand the integrand in a series and relate each individual integral in this expansion separately to topological invariants. Discrete approximation procedures to the continuum integral have also been explored. In the abelian case, infinite-dimensional oscillatory integral techniques have been used to understand the functional integral. Fröhlich and King (1989) showed the possibility of interpreting parallel transport using ideas from stochastic differential equations. Such an approach has been used successfully in the case of two-dimensional Yang-Mills theory, where the functional integral actually corresponds to integration with respect to a measure. In this article, we focus on a method of understanding the normalized Chern-Simons functional integral in terms of infinite-dimensional distribution theory and examining some ideas for understanding Wilson loop expectation values in this setting.

Infinite Dimensional Distributions

Let $(x_0, x_1, x_2)$ denote the usual coordinates on $\mathbb{R}^3$. Gauge symmetry, an issue which will not be examined here, may be used to simplify the problem of the Chern-Simons integral. In particular, one need only focus on connections which vanish in the $x_2$-direction, that is, connections of the form $A = A_0\, dx_0 + A_1\, dx_1$. For such $A$, the triple wedge-product term in the Chern-Simons action disappears, and we are left with the quadratic expression:

$$S_{CS}(A) = \int_{\mathbb{R}^3} \mathrm{tr}(A \wedge dA) \qquad [4]$$

This is good, since the functional integral now involves a quadratic exponent and so stands a good chance of rigorous realization, just as Gaussian measure can be given rigorous meaning in infinite dimensions. However, in the Chern-Simons situation, there is no hope of actually getting a measure, not even a complex measure.

The next best thing to a measure is a distribution or generalized function. A distribution over a space $Y$ is a continuous linear functional on a topological vector space of functions on $Y$. Thus, the objective is to realize the Chern-Simons functional integral as a continuous linear functional on some space of test functions over $\mathcal{A}$ (more precisely, on an extension of $\mathcal{A}$). Before turning to the specific case of the Chern-Simons integral, let us examine some elements of the theory of infinite-dimensional distributions, in as much as they are relevant to our needs.

Let us consider a Hilbert space $E_0$, and a positive Hilbert-Schmidt operator $T$ on $E_0$. For each integer $p \ge 0$, let $E_p = T^p(E_0)$, which is a Hilbert space with the inner product $\langle x, y \rangle_p = \langle T^{-p} x, T^{-p} y \rangle$. Then we have the chain of inclusions

$$E := \bigcap_{p \ge 1} E_p \subset \cdots \subset E_2 \subset E_1 \subset E_0 \qquad [5]$$

with each inclusion $E_{p+1} \to E_p$ being Hilbert-Schmidt. Let $E_{-p} = E_p'$ be the topological dual of $E_p$, the space of continuous linear functionals on $E_p$, and let $E'$ be the topological dual of $E$, where the latter is given the topology generated by all the norms $\|\cdot\|_p$. Then we have the inclusions

$$E_0 \simeq E_0' \subset E_{-1} \subset E_{-2} \subset \cdots \subset E' = \bigcup_{p \ge 0} E_{-p} \qquad [6]$$

For each $x \in E$ there is the evaluation map $\hat{x} : E' \to \mathbb{R} : \varphi \mapsto \varphi(x)$. A very special case of a general theorem of Minlos guarantees that on the dual $E'$ there is a measure $\mu$ on the sigma algebra generated by all the functions $\hat{x}$ such that each $\hat{x}$ is a Gaussian random variable of mean zero and variance $|x|_0^2$, that is,

$$\int_{E'} e^{it\hat{x}}\, d\mu = e^{-t^2 |x|_0^2 / 2}$$

for all $x \in E$ and $t \in \mathbb{R}$. This measure is the standard Gaussian measure on $E'$ for the infinite-dimensional nuclear space $E$.

The inner products $\langle \cdot, \cdot \rangle_p$ give rise to a nuclear space structure on function spaces over $E'$. Let $U$ be the algebra of functions on $E'$ generated by the exponentials $e^{\lambda \hat{x}}$, with $x$ running over $E$ and $\lambda$ over $\mathbb{C}$. For each $p \ge 0$, there is an inner product $\langle\langle \cdot, \cdot \rangle\rangle_p$ on $U$ such that

$$\left\langle\!\left\langle e^{\hat{x} - |x|_p^2/2},\ e^{\hat{y} - |y|_p^2/2} \right\rangle\!\right\rangle_p = e^{\langle x, y \rangle_p} \qquad [7]$$

For $p = 0$ the left-hand side coincides with the $L^2(\mu)$ inner product. Let $[E]_p$ be the Hilbert space completion of $U$ in the $\langle\langle \cdot, \cdot \rangle\rangle_p$ inner product. Then

$$\cdots \subset [E]_3 \subset [E]_2 \subset [E]_1 \subset [E]_0 = L^2(E', \mu) \qquad [8]$$

Let $[E] = \bigcap_{p \ge 0} [E]_p$, equipped with topology from all the norms $\|\cdot\|_p$, and $[E]'$ its topological dual. Elements of $[E]'$, being continuous linear functionals on the test function space $[E]$, are called distributions over $E'$, in the language of white-noise analysis.

A fundamental tool in the study of infinite-dimensional distributions is the S-transform. This generalizes the traditional Segal-Bargmann transform from the $L^2$-setting to the context of distributions. Let $E_c$ be the complexification of $E$. The inner product $\langle \cdot, \cdot \rangle_0$ on $E$ extends to a complex-bilinear pairing $E_c \times E_c \to \mathbb{C} : (z, w) \mapsto z \cdot w$. The evaluation pairing $E' \times E \to \mathbb{R}$ also extends naturally to the complexifications. For a distribution $\Phi$ belonging to $[E]'$, define a function $S\Phi$ on $E_c$ by

$$S\Phi(z) = \Phi(c_z)$$

for all $z \in E_c$. Here $c_z$ is the coherent state function on $E'$ given by $c_z(\varphi) = e^{\varphi(z) - (1/2) z \cdot z}$. A fundamental and useful result in white-noise analysis, due originally to Potthoff and Streit, specifies the range of the transform $S$ and allows reconstruction of a distribution $\Phi$ from the function $S\Phi$. Briefly, the range of $S$ consists of functions which are holomorphic, in an appropriate sense, and have at most quadratic exponential growth. In particular, this theorem implies that a function of the form $z \mapsto e^{a z \cdot z}$, for any constant $a$, is in the range of $S$.

Rigorous Realization of Chern-Simons Integrals

We return to the Chern-Simons context. As mentioned earlier, gauge symmetry may be invoked to reduce the space of connections to the smaller space:

$$E = X \oplus X \qquad [9]$$

where $X = \mathcal{S}(\mathbb{R}^3) \otimes LG$ is the space of rapidly decreasing functions with values in the Lie algebra $LG$. Let

$$T_1 = \left(-\frac{d^2}{dx^2} + \frac{x^2}{4}\right)^{-1}$$

as a linear operator on $L^2(\mathbb{R}^3)$, $T_2 = T_1 \otimes I$ the induced operator on $L^2(\mathbb{R}^3) \otimes LG$, and $T = T_2 \oplus T_2$. Then, as described in the preceding section, we have the space $E$ and its dual $E'$. There is then the standard Gaussian measure $\mu$ on $E'$, and the space $[E]'$ of distributions over $E'$.

The normalized Chern-Simons integral may be viewed as a linear functional

$$\Phi_{CS} : F \mapsto \frac{1}{N} \int_E e^{i(k/4\pi) S_{CS}(A)} F(A)\, DA \qquad [10]$$

where $N$ is a normalizing factor. Rigorous meaning can be given to this by first formally working out what the S-transform of $\Phi_{CS}$ ought to be. Calculation shows that $S\Phi_{CS}$ is indeed a holomorphic function on $E_c$ of quadratic growth. The Potthoff-Streit theorem then implies that $\Phi_{CS}$ does exist as a

by $\chi_\epsilon(x) = \epsilon^{-3} \chi(x/\epsilon)$. Next, for a smooth loop $[0, 1] \to l(t) = (l_0(t), l_1(t), l_2(t))$, let $l^\epsilon(t) = \chi_\epsilon(\cdot - l(t))$, the scaled bump function centered now at the path point $l(t)$. Now consider a generalized connection $A = (A_0, A_1) \in E'$. Set

$$B_A^{l,\epsilon}(t) = A_0(l^\epsilon(t))\, l_0'(t) + A_1(l^\epsilon(t))\, l_1'(t) \qquad [13]$$

The equation of parallel transport can be reformulated as a differential equation for a matrix-valued path $t \mapsto P_A^{l,\epsilon}(t)$ satisfying

$$\frac{d}{dt} P_A^{l,\epsilon}(t) + B_A^{l,\epsilon}(t)\, P_A^{l,\epsilon}(t) = 0 \qquad [14]$$

and the initial condition $P_A^{l,\epsilon}(0) = I$. With this smearing, one can consider functions of the form
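The parallel-transport equation $g'(t)\, g(t)^{-1} = -A(C'(t))$ and the Wilson loop observable [3] can be made concrete in the simplest abelian case. The sketch below is only an illustration under stated assumptions, not the construction used in the article: it takes $G = U(1)$, an assumed smooth connection $A = i (B/2)(-x_1\, dx_0 + x_0\, dx_1)$ with a made-up constant $B$, and the unit circle in the $(x_0, x_1)$-plane as the loop. All function names are hypothetical. For this choice Stokes' theorem gives the holonomy $e^{-i\pi B}$, which the numerical integration should reproduce.

```python
import cmath
import math

B = 0.7  # assumed constant field strength, chosen only for illustration

def connection(x, v):
    """Value of the u(1)-valued 1-form A = i*(B/2)*(-x1 dx0 + x0 dx1) on the vector v at x."""
    return 1j * (B / 2.0) * (-x[1] * v[0] + x[0] * v[1])

def loop(t):
    """Unit circle in the (x0, x1)-plane, traversed once for t in [0, 1]."""
    a = 2.0 * math.pi * t
    return (math.cos(a), math.sin(a), 0.0)

def loop_velocity(t):
    """Derivative C'(t) of the loop above."""
    a = 2.0 * math.pi * t
    return (-2.0 * math.pi * math.sin(a), 2.0 * math.pi * math.cos(a), 0.0)

def holonomy(steps=2000):
    """Integrate g'(t) = -A(C'(t)) g(t), g(0) = 1, with the classical RK4 scheme."""
    def rhs(t, g):
        return -connection(loop(t), loop_velocity(t)) * g

    g = 1.0 + 0.0j
    h = 1.0 / steps
    for n in range(steps):
        t = n * h
        k1 = rhs(t, g)
        k2 = rhs(t + h / 2, g + h / 2 * k1)
        k3 = rhs(t + h / 2, g + h / 2 * k2)
        k4 = rhs(t + h, g + h * k3)
        g += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return g

# In the defining one-dimensional representation the trace is the number itself.
wilson_loop = holonomy()

if __name__ == "__main__":
    print(wilson_loop, cmath.exp(-1j * math.pi * B))
```

In the non-abelian case the same scheme applies with $g$ a matrix and the product on the right-hand side a matrix product, but the closed-form comparison value is no longer available in general.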
Further Reading

Albeverio S, Hahn A, and Sengupta AN (2003) Chern-Simons theory, Hida distributions, and state models. Infinite Dimensional Analysis Quantum Probability and Related Topics 6: 65-81.
Albeverio S and Schäfer J (1994) Abelian Chern-Simons theory and linking numbers via oscillatory integrals. Journal of Mathematical Physics (N.Y.) 36 (suppl. 5): 2135-2169.
Albeverio S and Sengupta A (1997) A mathematical construction of the non-Abelian Chern-Simons functional integral. Communications in Mathematical Physics 186: 563-579.
Altschuler D and Freidel L (1997) Vassiliev Knot invariants and Chern-Simons perturbation theory to all orders. Communications in Mathematical Physics 187: 261-287.
Atiyah M (1990) The Geometry and Physics of Knot Polynomials. Cambridge: Cambridge University Press.
Bar-Natan D (1995) Perturbative Chern-Simons theory. Journal of Knot Theory and its Ramifications 4: 503.
Chern S-S and Simons J (1974) Characteristic forms and geometric invariants. Annals of Mathematics 99: 48-69.
Fröhlich J and King C (1989) The Chern-Simons theory and Knot polynomials. Communications in Mathematical Physics 126: 167-199.
Kondratiev Yu, Leukert P, Potthoff J, Streit L, and Westerkamp W (1996) Generalized functionals in Gaussian spaces: the characterization theorem revisited. Journal of Functional Analysis 141 (suppl. 2): 301-318.
Kuo H-H (1996) White Noise Distribution Theory. Boca Raton, FL: CRC Press.
Landsman NP, Pflaum M, and Schlichenmaier M (2001) Quantization of Singular Symplectic Quotients. Basel-Boston-Berlin: Birkhäuser.
Leukert P and Schäfer J (1996) A rigorous construction of Abelian Chern-Simons path integrals using White Noise analysis. Rev. Math. Phys. 8 (suppl. 3): 445-456.
Sen Samik, Sen Siddhartha, Sexton JC, and Adams DH (2000) Geometric discretization scheme applied to the Abelian Chern-Simons theory. Physical Review E 61: 3174-5185.
Simon B (1971) Distributions and their Hermite expansions. Journal of Mathematical Physics (N.Y.) 12: 140-148.
Witten E (1989) Quantum field theory and the Jones polynomial. Communications in Mathematical Physics 121: 351-399.
action of the group has a matrix fractional linear form: let

$$g = \begin{pmatrix} A & B \\ C & D \end{pmatrix}, \qquad A \in \mathrm{Mat}(k),\ D \in \mathrm{Mat}(n-k),\ B \in \mathrm{Mat}(k, n-k),\ C \in \mathrm{Mat}(n-k, k)$$

Then we have the transformation in inhomogeneous coordinates:

$$z \mapsto (A + zC)^{-1}(B + zD)$$

The condition $C = 0$ defines the parabolic subgroup which has affine action in inhomogeneous coordinates which is transitive in the coordinate chart. In such a way the Grassmannian is a compactification of $\mathbb{C}^{k(n-k)}$ (realized as a space of $k \times (n-k)$ matrices). If $n = 2k$, we can consider it as the compactification of the space of square matrices $z$ of order $k$ with the flat generalized conformal structure defined by translations of the isotropy cone $\{\det z = 0\}$.

There are similar constructions of flag manifolds for other classical groups. We will consider only the minimal flag manifolds. For $O(2k; \mathbb{C})$ we consider the isotropic Grassmannian $\mathrm{Gr}^I_{\mathbb{C}}(k; 2k)$ of isotropic $k$-subspaces relative to the symmetric form $I$. We take the matrix realization of $\mathrm{Gr}_{\mathbb{C}}(k; 2k)$, using Stiefel's homogeneous coordinates, and add the matrix equation

$$Z I Z^{\top} = 0$$

which is well defined in the homogeneous coordinates (compatible with the equivalency classes) and defines isotropic subspaces relative to $I$. This matrix cone is preserved by the subgroup $O(2k; \mathbb{C}) \subset GL(2k; \mathbb{C})$ corresponding to the matrix $I$. If we take the symmetric matrix

$$I = \begin{pmatrix} 0 & E_k \\ E_k & 0 \end{pmatrix}$$

then in inhomogeneous coordinates ($z$ is a square $k$-matrix) this equation is transformed into the condition that the matrix $z$ is skew-symmetric. So, in a natural sense, the isotropic Grassmannian is the compactification of the linear space of skew-symmetric matrices $\mathrm{Alt}(k) = \mathbb{C}^N$, $N = k(k-1)/2$.

A similar construction makes sense for the symplectic group: if we replace the symmetric form $I$ with the skew-symmetric form $J$, we obtain the equation of the matrix cone representing the Lagrangian Grassmannian $\mathrm{Gr}^L_{\mathbb{C}}(k; 2k)$ of Lagrangian subspaces in $2k$-dimensional linear symplectic space. If we were to choose $J$ as above, then in the (inhomogeneous) coordinate chart we obtain the condition that the matrix $z$ is symmetric. Thus, we have the (dense) coordinate chart on the Lagrangian Grassmannian $\mathbb{C}^N = \mathrm{Sym}(k)$, $N = k(k+1)/2$, the linear space of symmetric matrices.

There is one more type of minimal flag manifolds for the orthogonal group $SO(n; \mathbb{C})$: the quadric $Q$ in the projective space:

$$I(z) = z I z^{\top} = 0$$

where rows $z \in \mathbb{C}^n \setminus \{0\}$ represent, in homogeneous coordinates, points in $CP^{n-1}$. If $I = E_n$ we have the equation $(z_1)^2 + \cdots + (z_n)^2 = 0$. This quadric is the complex compact conformal flat manifold $CC^N$, $N = n - 2$; it is the compactification of $\mathbb{C}^N$ endowed with the flat conformal structure corresponding to the quadratic isotropic cone. The parabolic group is generated by linear conformal transformations and translations. On the quadric $Q$ the conformal structure is defined by intersections of tangent spaces with $Q$. Apparently, this structure is invariant relative to the natural action of $SO(n; \mathbb{C})$.

Classical Stein Manifolds

Such homogeneous complex manifolds $X = G/H$ have complex reductive isotropy subgroups $H$. Contrary to flag manifolds, which are compact, these manifolds are Stein ones and there are many holomorphic functions on them. The typical examples for $G = GL(n; \mathbb{C})$ are homogeneous spaces $S(k_1, \ldots, k_{r+1})$, $n = k_1 + \cdots + k_{r+1}$, for which the isotropy subgroups are block-diagonal matrices with the blocks of sizes $k_1, \ldots, k_{r+1}$. Then points of the manifold can be realized as generic sets of subspaces $L_j \subset \mathbb{C}^n$, $\dim L_j = k_j$, $1 \le j \le r + 1$ or, what is equivalent, generic sets of $(k_j - 1)$-dimensional planes in $CP^{n-1}$. Since the isotropy subgroup of such a homogeneous space is a subgroup of the parabolic subgroup $P(n_1, \ldots, n_r)$, $k_j = n_j - n_{j-1}$, we have the natural fibering $S(k_1, \ldots, k_{r+1}) \to F(n_1, \ldots, n_r)$ (it is simple to see this geometrically: the $i$th subspace of a flag in the base is the direct sum of first $i$ subspaces representing a point in the fiber). This is a convenient tool to apply complex analysis on $S$ to the compact manifold $F$ where there are no nontrivial holomorphic functions. Let us emphasize that such a connection exists only for special classes of classical Stein manifolds.

Let us pay special attention to the subclass of symmetric Stein manifolds. For such manifolds $X$, the isotropy subgroup $H$ is fixed relative to a holomorphic involutive automorphism of $G$. Complex semisimple Lie groups $G$ (including classical ones) are symmetric Stein manifolds relative to the action of their square $G \times G$ by left and right multiplications.
Classical Stein manifolds for $SL(n;\mathbb{C})$ considered above are symmetric if r = 1, and we have the manifold of pairs of subspaces of complementary dimensions intersecting only in {0}. The simplest example is the manifold of pairs of different points of the projective line $\mathbb{CP}^1$. Let us point out again that the transition to generic pairs of points transforms a compact complex manifold without nonconstant holomorphic functions into a Stein manifold with a large collection of holomorphic functions.
Some other examples of symmetric Stein manifolds are connected with classical geometry and linear algebra. The affine hyperboloid in $\mathbb{C}^n$,
\[ Q(z) = 1, \]
is a symmetric space for $G = O(n;\mathbb{C})$, $H = O(n-1;\mathbb{C})$. We can compare it with the projective quadric $Q(z) = 0$, which is a minimal flag manifold. Let us remark that there is a duality here: it is possible to interpret points of the hyperboloid of dimension n as generic hyperplane sections of the projective quadric of dimension n − 1.
The space X of complex symmetric matrices of order n with determinant 1 is symmetric for the group $SL(n;\mathbb{C})$, which acts by changes of variables in the corresponding quadratic forms:
\[ z \mapsto g^{\top} z g, \qquad g \in SL(n;\mathbb{C}). \]
The transitive action reflects the possibility of transforming such a form into a sum of squares. The isotropy subgroup is $SO(n;\mathbb{C})$.
The Stein symmetric manifold $X = SO(n;\mathbb{C})/S(O(k;\mathbb{C}) \times O(n-k;\mathbb{C}))$ is realized as the manifold of k-dimensional subspaces in $\mathbb{C}^n$ on which the restriction of the principal symmetric form I is nondegenerate.

Isomorphisms in Small Dimensions

Isomorphisms of classical groups in small dimensions produce isomorphisms of some classical homogeneous manifolds. Such isomorphisms were very important in the history of geometry; below are a few examples. We will consider local isomorphisms (up to a finite center). We have $SL(2;\mathbb{C}) \cong SO(3;\mathbb{C})$. Let us realize $\mathbb{C}^3$ as the space of symmetric matrices z of order 2. Then, as we remarked above, the two-dimensional submanifold X of matrices with determinant 1 is the symmetric Stein manifold for the group $SL(2;\mathbb{C})$. On the other hand, we can take det z as the quadratic symmetric form I in $\mathbb{C}^3$; then X is the hyperboloid for this form, and the action of $SL(2;\mathbb{C})$ on symmetric matrices gives the orthogonal transformations relative to this form I.
Similarly, we can interpret the local isomorphism $SO(4;\mathbb{C}) \cong SL(2;\mathbb{C}) \times SL(2;\mathbb{C})$. We realize $\mathbb{C}^4$ as the space of square matrices z of order 2 with the symmetric quadratic form $I(z, z) = \det(z)$. Then left and right multiplications of z by unimodular matrices ($z \mapsto uzv$, $u, v \in SL(2;\mathbb{C})$) induce orthogonal transformations for the form I, and any orthogonal transformation can be represented in such a form (one can see this by counting dimensions).
The local isomorphism $SL(4;\mathbb{C}) \cong SO(6;\mathbb{C})$ has a slightly more complicated nature. Let us consider the Grassmannian $Gr_{\mathbb{C}}(2; 4)$ of lines in the projective space $\mathbb{CP}^3$ with $2 \times 4$ matrices Z as matrix homogeneous coordinates. Let $p_{ij}$, $i < j$, be the minors of Z formed by the ith and jth columns. They are called Plücker coordinates on $Gr_{\mathbb{C}}(2; 4)$: the equivalency class of Z is defined by the sequence of six numbers $p = (p_{ij},\ 1 \le i < j \le 4) \ne (0, \dots, 0)$ up to a constant factor. Thus, we have an imbedding of $Gr_{\mathbb{C}}(2; 4)$ in the projective space $\mathbb{CP}^5$. The image will be the quadric
\[ p_{12}p_{34} - p_{13}p_{24} + p_{14}p_{23} = 0. \]
Thus, we have an isomorphism of two flag manifolds, and the action of $SL(4;\mathbb{C})$ on the Grassmannian is transformed into orthogonal transformations of the four-dimensional quadric in $\mathbb{CP}^5$. The Plücker coordinates can be defined for any Grassmannian, but in other cases they do not produce isomorphisms with other flag manifolds; nevertheless, they realize Grassmannians as intersections of quadrics in projective spaces.

Compact Classical Homogeneous Manifolds

Compact classical groups U(n), SU(n), O(n), SO(n), Sp(l) are maximal compact subgroups in the corresponding classical complex groups $GL(n;\mathbb{C})$, $SL(n;\mathbb{C})$, $O(n;\mathbb{C})$, $SO(n;\mathbb{C})$, $Sp(l;\mathbb{C})$. This condition defines them up to an isomorphism. They are fixed subgroups of some antiholomorphic involutive automorphisms. The unitary groups U(n) and SU(n) are the groups of unitary matrices ($g^{*}g = E$) and, correspondingly, of unitary matrices with determinant 1. As the compact orthogonal group we can take the intersection $U(n) \cap O(n;\mathbb{C})$. For the standard form I, it will be the group of real orthogonal matrices: $g^{\top}g = E$ (so the involution in $O(n;\mathbb{C})$ is the conjugation $g \mapsto \bar{g}$). Similarly, we can take $Sp(l) = SU(2l) \cap Sp(l;\mathbb{C})$ (then the involution is $g \mapsto -J\bar{g}J$).
Compact classical groups act on compact homogeneous Riemann manifolds. There are two mechanisms connecting compact and complex homogeneous manifolds. We observe the first possibility in the case of flag manifolds which are
compact. We considered them so far relative to the action of complex (noncompact) groups. It turns out that on the flag manifold F = G/P the maximal compact subgroup $U \subset G$ continues to be transitive, so we can consider flag manifolds also as being homogeneous with compact groups. Then F = U/C, where C is the centralizer of a torus in U. There is a Kähler metric on F, invariant relative to U. Thus, G is the group of all automorphisms of F as a complex manifold, but U is the group of its automorphisms as a Kähler manifold. This defines two sides of the geometry of flag manifolds: complex and Kähler. Flag manifolds are the only compact homogeneous Kähler manifolds with semisimple Lie groups (the class of all compact Kähler manifolds also contains locally flat compact manifolds, namely tori). In the example considered above we have $F(n_1, \dots, n_r) = SU(n)/S(U(k_0) \times \dots \times U(k_r))$. In the language of Stiefel (homogeneous) coordinates, we fix a positive Hermitian form in $\mathbb{C}^n$ and characterize subspaces by orthonormal bases. For r = 1 we have Grassmannians $Gr_{\mathbb{C}}(k; n)$, in particular the projective space $\mathbb{CP}^{n-1}$, which we consider relative to the action of the unitary groups. Relative to this action they are Hermitian symmetric spaces. In the case of minimal flag manifolds for other groups, the action of maximal compact subgroups also defines on them the structure of compact Hermitian symmetric spaces. Let us emphasize that relative to the noncompact groups G of biholomorphic automorphisms, the minimal flag manifolds (including the Grassmannians) are not symmetric.
In the case of homogeneous Stein manifolds X = G/H, the picture is different: the maximal compact subgroups have no open orbits. There are totally real orbits which are the compact forms of X: $X_{\mathbb{R}} = G_{\mathbb{R}}/H_{\mathbb{R}}$, where $G_{\mathbb{R}}$ and $H_{\mathbb{R}}$ are compact forms of G and H, respectively. This is the canonical embedding of compact homogeneous manifolds in their complexifications. The important special case is the embedding of compact symmetric manifolds in the Stein symmetric manifolds that are their complexifications.
For compact symmetric manifolds X = U/K the groups U, K are compact Lie groups, the elements of K are fixed by an involutive automorphism of U, and K contains the connected component of the subgroup of all fixed elements of this automorphism. This possibility to connect several symmetric manifolds with one involution is illustrated by the next example. The sphere $S^{n-1} \subset \mathbb{R}^n$ is the symmetric space SO(n)/SO(n − 1); the real projective space $\mathbb{RP}^{n-1}$ is SO(n)/O(n − 1). Here SO(n − 1) is the connected component of O(n − 1), and $S^{n-1}$ is a double covering of $\mathbb{RP}^{n-1}$. A few more examples: the real Grassmannian $Gr_{\mathbb{R}}(k; n)$ of k-subspaces in $\mathbb{R}^n$ can be defined as $SO(n)/S(O(k) \times O(n-k))$. This representation corresponds to the characterization of subspaces by orthonormal bases. The consideration of arbitrary bases defines the action of the larger group $GL(n;\mathbb{R})$ on $Gr_{\mathbb{R}}(k; n)$. Relative to this action, the real Grassmannian is not symmetric, since the isotropy subgroup is parabolic and is not involutive. Such a possibility to extend the group is typical for a class of compact symmetric manifolds called symmetric R-spaces. They are real forms of Hermitian compact symmetric manifolds (minimal flag manifolds). Let us also mention the compact symmetric spaces SU(n)/SO(n), which are the compact forms of the spaces of unimodular symmetric matrices and can be presented by the submanifolds of unitary matrices in them. Also, all compact Lie groups G are symmetric spaces relative to the action of $G \times G$.

Noncompact Riemannian Symmetric Manifolds

This class of symmetric manifolds has the strongest connections with classical mathematics. Let us consider noncompact real semisimple Lie groups, that is, real forms of complex semisimple Lie groups. They correspond to antiholomorphic involutions in complex groups.
Among the real forms of $SL(n;\mathbb{C})$ there are the real and quaternionic unimodular groups $SL(n;\mathbb{R})$, $SL(n;\mathbb{H})$ and the pseudounitary groups SU(p, q) of complex matrices preserving a Hermitian form H of signature (p, q). The complex orthogonal group has as real forms, in particular, the pseudoorthogonal groups SO(p, q) of real matrices preserving a quadratic form of signature (p, q).
Let G be a real simple Lie group and K be its maximal compact subgroup. Then X = G/K is a Riemann symmetric manifold of noncompact type; K is defined by an involutive automorphism of G. Therefore, in the irreducible situation there is a correspondence between noncompact Riemann symmetric manifolds and real simple noncompact Lie groups. K-orbits on X are parametrized by points of the orbit on X of a maximal abelian subgroup A, the Cartan subgroup of the symmetric space X. Its dimension l is an important invariant of X, its rank. The algebraic base for the geometry of X is the Iwasawa decomposition
\[ G = KAN, \]
where N is a maximal unipotent subgroup (in a natural sense compatible with A). Then the parabolic subgroup P = AN is transitive on X.
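As a concrete illustration of the Iwasawa decomposition, the following is a minimal sketch for the smallest case, assuming G = SL(2; R), K = SO(2), A the positive diagonal matrices of determinant 1, and N the upper unitriangular matrices. The function names `iwasawa_sl2r` and `matmul` are hypothetical helpers introduced only for this example; the factors are obtained by Gram-Schmidt orthogonalization of the columns of g.

```python
import math


def matmul(p, q):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def iwasawa_sl2r(g):
    """Return (k, a, n) with g = k a n for a real 2x2 matrix g of determinant 1.

    k is a rotation (element of SO(2)), a is diagonal with positive entries,
    n is upper triangular with ones on the diagonal.
    """
    (a11, a12), (a21, a22) = g
    if abs(a11 * a22 - a12 * a21 - 1.0) > 1e-9:
        raise ValueError("matrix must have determinant 1")
    rho = math.hypot(a11, a21)            # length of the first column
    k = [[a11 / rho, -a21 / rho],
         [a21 / rho, a11 / rho]]
    a = [[rho, 0.0],
         [0.0, 1.0 / rho]]
    n = [[1.0, (a11 * a12 + a21 * a22) / rho ** 2],
         [0.0, 1.0]]
    return k, a, n


if __name__ == "__main__":
    g = [[2.0, 1.0], [3.0, 2.0]]
    k, a, n = iwasawa_sl2r(g)
    print(matmul(matmul(k, a), n))        # reproduces g up to rounding
```

The rotation factor absorbs the direction of the first column of g, which is why the remaining factor an is upper triangular with positive diagonal.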
manifold X = G/K to be Hermitian is that K has a one-dimensional center. All Hermitian symmetric manifolds of noncompact type can be realized as bounded domains in $\mathbb{C}^n$ (but, of course, not all their holomorphic automorphisms extend to $\mathbb{C}^n$). In the case of classical manifolds, these domains are called Cartan's domains: Cartan gave their explicit matrix realizations.
The nature of the groups of holomorphic automorphisms of symmetric domains $X = G/K \subset \mathbb{C}^N$ is explained by Cartan's duality. Each such domain (Hermitian symmetric manifold of noncompact type) admits an embedding in a Hermitian symmetric manifold of compact type $X_{\mathbb{C}}$ such that the complexification $G_{\mathbb{C}}$ of G is the group of holomorphic automorphisms of $X_{\mathbb{C}}$ (correspondingly, X is an open G-orbit on $X_{\mathbb{C}}$). Moreover, X lies inside a (Zariski open) coordinate chart $\mathbb{C}^N$, which is an orbit of a parabolic subgroup.
The simplest example is the complex ball $\mathbb{C}B^n$ (complex hyperbolic space) imbedded in the complex projective space $\mathbb{CP}^n$. The affine chart $\mathbb{C}^n$ is the orbit of the parabolic subgroup of affine transformations. Let us consider more complicated examples.
Let $X_{\mathbb{C}}$ be the Grassmannian $Gr_{\mathbb{C}}(k; n)$, q = n − k; we will use matrix homogeneous coordinates Z ($k \times n$ matrices) for the description of the symmetric domain. Then $G_{\mathbb{C}} = SL(n;\mathbb{C})$. Let us take its real form G = SU(k; q), k + q = n. We fix a Hermitian form H of signature (k, q) and realize G as the group of matrices preserving H:
\[ gHg^{*} = H. \]
Then $X = X_{k,q} = SU(k, q)/S(U(k) \times U(q))$ can be realized as the domain in the Grassmannian
\[ ZHZ^{*} > 0, \]
so that this Hermitian matrix of order k must be positive. It is essential that this condition is invariant relative to multiplications of Z by nondegenerate matrices u on the left and, therefore, it is a well-defined condition in homogeneous coordinates.
Let us specify the choice of H:
\[ H_1 = \begin{pmatrix} E_k & 0 \\ 0 & -E_q \end{pmatrix}. \]
Then the corresponding domain $X_1$ is defined in inhomogeneous coordinates $Z = (E_k, z)$, $z \in \mathrm{Mat}(k, q)$, by the condition
\[ E_k - zz^{*} > 0. \]
This matrix ball lies completely in the coordinate chart $\mathbb{C}^{kq}$. Its rank is equal to min(k, q). Thus, we have the realization of this Hermitian symmetric space as a bounded domain in $\mathbb{C}^N$, N = kq. In the case k = 1, we have the usual (scalar) complex ball. Let us remark that the edge of the boundary (Shilov's boundary) is the compact symmetric space
\[ zz^{*} = E_k \]
with the group of automorphisms $S(U(k) \times U(q))$ (the isotropy subgroup of X). For k = q the edge coincides with the set of unitary matrices U(k).
Different forms H of signature (k, q) are linearly equivalent, and they correspond to different (biholomorphically equivalent) realizations of this Hermitian symmetric space. Let us, in the beginning, set k = q; the inhomogeneous matrix coordinates are square matrices of order k. Let us take the form
\[ H_2 = \begin{pmatrix} 0 & iE_k \\ -iE_k & 0 \end{pmatrix}. \]
Then, in inhomogeneous matrix coordinates, we have the domain $X_2$:
\[ \frac{1}{i}\,(z - z^{*}) > 0 \]
(complex matrices with positive skew-Hermitian parts). This domain (but not its boundary) lies in the chart. It has the structure of the tube domain $T = \mathbb{R}^n + iV$, $n = k^2$, corresponding to the symmetric cone V of positive Hermitian matrices (we take the space of Hermitian matrices as a real form of $\mathbb{C}^n$). The group of affine transformations of the tube domain,
\[ z \mapsto uzu^{*} + a, \qquad u \in GL(k;\mathbb{C}),\ a \in \mathrm{Herm}(k), \]
is transitive on $X_2$; it is the parabolic subgroup in SU(k, k).
The biholomorphic equivalency of the realizations of X corresponding to different H is induced by the equivalency of these forms. We have
\[ H_2 = \Lambda H_1 \Lambda^{*}, \qquad \Lambda = \frac{\sqrt{2}}{2}\begin{pmatrix} E_k & -iE_k \\ -iE_k & E_k \end{pmatrix}. \]
Then the transform $Z \mapsto Z\Lambda$ transforms $X_2$ into $X_1$. In inhomogeneous coordinates it is the fractional linear matrix transform
\[ z \mapsto i\,(z + iE_k)^{-1}(z - iE_k). \]
It is the matrix version of the classical Cayley transform. Similarly, we can write down the inverse transform.
If $q \ne k$, then there is also an analog of the tube realization. Let r = q − k > 0 and
\[ H_2 = \begin{pmatrix} 0 & iE_k & 0 \\ -iE_k & 0 & 0 \\ 0 & 0 & -E_r \end{pmatrix}. \]
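The change of realization described above can be checked numerically in the scalar case. The following is a minimal sketch, assuming k = q = 1 and the sign conventions $H_1 = \mathrm{diag}(1, -1)$, $H_2 = \begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix}$, $\Lambda = \frac{\sqrt{2}}{2}\begin{pmatrix} 1 & -i \\ -i & 1 \end{pmatrix}$ (signs are not legible in every copy of the text, so these are assumptions of the example). Under these conventions $X_2$ is the upper half-plane, $X_1$ is the unit disk, and the transform is $z \mapsto i(z - i)/(z + i)$. All function names are hypothetical.

```python
import math


def cayley(z):
    """Scalar (k = 1) case of z -> i (z + i)^(-1) (z - i)."""
    return 1j * (z - 1j) / (z + 1j)


def in_half_plane(z):
    """Membership in X_2 for k = q = 1: (1/i)(z - conj(z)) = 2 Im z > 0."""
    return ((z - z.conjugate()) / 1j).real > 0


def in_disk(z):
    """Membership in X_1 for k = q = 1: 1 - z conj(z) > 0."""
    return 1 - abs(z) ** 2 > 0


def matmul(p, q):
    """Product of two 2x2 complex matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(m):
    """Conjugate transpose of a 2x2 matrix."""
    return [[m[j][i].conjugate() for j in range(2)] for i in range(2)]


S = math.sqrt(2) / 2
LAM = [[S, -1j * S], [-1j * S, S]]       # assumed form of Lambda
H1 = [[1 + 0j, 0j], [0j, -1 + 0j]]
H2 = [[0j, 1j], [-1j, 0j]]


def conjugated_form():
    """Compute Lambda H_1 Lambda^*, which should reproduce H_2."""
    return matmul(matmul(LAM, H1), dagger(LAM))


if __name__ == "__main__":
    for z in (2 + 3j, -1 + 0.5j, 1j):
        print(z, in_half_plane(z), cayley(z), in_disk(cayley(z)))
    print(conjugated_form())
```

The point z = i, the natural base point of the half-plane, is sent to the center 0 of the disk.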
Let us represent the inhomogeneous coordinates as $z = (E_k, w, u)$, $w \in \mathrm{Mat}(k)$, $u \in \mathrm{Mat}(k, r)$. Then the domain $X_2$ is defined by the condition
\[ \frac{1}{i}\,(w - w^{*}) - uu^{*} > 0. \]
This is an example of Siegel domains of the second kind (Pyatetskii-Shapiro 1969). This domain has a transitive group of affine transformations:
\[ (w, u) \mapsto \left(w + a + i\,ub^{*} + \tfrac{i}{2}\,bb^{*},\ u + b\right), \qquad a \in \mathrm{Herm}(k),\ b \in \mathrm{Mat}(k, r), \]
\[ (w, u) \mapsto (cwc^{*},\ cu), \qquad c \in GL(k;\mathbb{C}). \]
This class of symmetric domains in Grassmannians is called Cartan's domains of the first class.
There are similar constructions for minimal flag manifolds (compact Hermitian symmetric spaces) with other groups. Let us consider the Lagrangian Grassmannian $GrL_{\mathbb{C}}(k; 2k)$ corresponding to the form J above. Here $G_{\mathbb{C}} = Sp(k;\mathbb{C})$. Its real form $G = Sp(k;\mathbb{R})$ can be realized as the subgroup of complex symplectic matrices preserving a Hermitian form H of signature (k, k). In other words, we intersect the domains from the last example with the Lagrangian Grassmannians. We consider the coordinate chart whose inhomogeneous coordinates are symmetric matrices $z \in \mathrm{Sym}(k)$. For $H_1$ we have the domain of symmetric matrices z with the condition
\[ E_k - zz^{*} > 0, \qquad z = z^{\top}. \]
This bounded realization is called Siegel's disk. For $H_2$ the real form is the group of real symplectic matrices and $X_2$ is the domain
\[ \Im z = \frac{1}{2i}\,(z - \bar{z}) > 0, \qquad z = z^{\top}, \]
of complex symmetric matrices with positive imaginary parts; it is called Siegel's half-plane. This is the third class of Cartan's domains. There are Siegel domains of the second kind connected with the cones of positive symmetric matrices; some of them are homogeneous, but they are never symmetric.
There are two more series of classical minimal flag manifolds: the isotropic Grassmannians and quadrics. They both contain the dual bounded symmetric domains (Cartan's domains of the second and fourth classes, correspondingly). Some of these domains in the isotropic Grassmannians admit realizations as tubes with the cone of positive Hermitian quaternionic matrices, and others as Siegel domains of the second kind corresponding to the same cones.
Symmetric domains in quadrics can be realized as tube domains with the Lorentzian (light) cones. The corresponding tubes are called the future (past) tube, depending on which light cone was taken. Let us consider this construction. The group of holomorphic automorphisms of these domains is G = SO(2; n), the conformal extension of the Lorentz group. To realize this group, let us fix a real symmetric matrix Q of signature (2, n); the group is the group of linear transformations preserving simultaneously the quadratic symmetric form and the Hermitian form with this matrix Q:
\[ g^{\top}Qg = Q, \qquad g^{*}Qg = Q. \]
The standard realization corresponds to the diagonal matrix Q with the diagonal (1, 1, −1, ..., −1). Cartan's domains of the fourth class are connected components of the manifold
\[ ZQZ^{\top} = 0, \qquad ZQZ^{*} > 0, \]
where the rows Z are homogeneous coordinates in the projective space $\mathbb{CP}^{n+1}$. In other words, we consider a domain on the quadric in the projective space (which is the complex flat conformal space $CC^n$). For the standard Q the domain will lie in the coordinate chart; thus it is the bounded realization. For the tube realization, we take
\[ Q = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & E_{1,n-1} \end{pmatrix}, \]
where $E_{1,n-1} = \mathrm{diag}(1, -1, \dots, -1)$ is the matrix of the form q below. Let $Z = (z_0, z_1, w_1, \dots, w_n)$, $w = u + iv$, $q(s, t) = s_1t_1 - s_2t_2 - \dots - s_nt_n$, and consider the affine chart $\mathbb{C}^{n+1} = \{z_0 = 1\}$. We have
\[ ZQZ^{\top} = 2z_1 + q(w, w) = 0, \qquad ZQZ^{*} = 2\,\Re z_1 + q(w, \bar{w}) > 0. \]
The first condition gives $2\,\Re z_1 = q(v, v) - q(u, u)$, and then the second condition gives the final description of the considered set in $\mathbb{C}^n_w$:
\[ q(v, v) = v_1^2 - v_2^2 - \dots - v_n^2 > 0, \qquad w = u + iv, \]
as the union of the future and the past tubes ($T_{\pm} = \{\pm v_1 > 0\}$). The edge $\mathbb{R}^n$ of these tubes (v = 0) has the structure of the Minkowski space corresponding to the form q. The parabolic subgroup is the affine conformal group of the Minkowski space. It includes the Poincaré group and is transitive on the tubes. The complete group of holomorphic automorphisms of the tubes, G = SO(2, n), is the group of all (not only affine) conformal transformations of the Minkowski space. The complete edge of these symmetric domains in the quadric $CC^n$ is the conformal compactification of the Minkowski space (a compact symmetric R-space with the compact group $S(O(2) \times O(n))$, on which the noncompact group SO(2, n) also acts).
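The passage from the two conditions on the quadric to the light-cone condition on v can be verified numerically. The following is a minimal sketch, assuming the chart $z_0 = 1$, the form $q(s, t) = s_1t_1 - s_2t_2 - \dots - s_nt_n$, and the sign conventions $2z_1 + q(w, w) = 0$ and $2\,\Re z_1 + q(w, \bar{w}) > 0$ (signs are not legible in every copy of the text, so these are assumptions of the example). Under these assumptions the Hermitian form restricted to the quadric equals $2q(v, v)$. The function names are hypothetical.

```python
def q(s, t):
    """Lorentzian bilinear form s1*t1 - s2*t2 - ... - sn*tn."""
    return s[0] * t[0] - sum(a * b for a, b in zip(s[1:], t[1:]))


def z1_on_quadric(w):
    """Solve the quadric equation 2*z1 + q(w, w) = 0 for z1."""
    return -q(w, w) / 2


def hermitian_form(w):
    """Value of 2*Re(z1) + q(w, conj(w)) at the quadric point over w."""
    w_bar = [c.conjugate() for c in w]
    return 2 * z1_on_quadric(w).real + q(w, w_bar).real


def imaginary_part(w):
    """The vector v in the splitting w = u + i v."""
    return [c.imag for c in w]


def in_future_tube(w):
    """Future tube: q(v, v) > 0 and v1 > 0."""
    v = imaginary_part(w)
    return q(v, v) > 0 and v[0] > 0


def in_past_tube(w):
    """Past tube: q(v, v) > 0 and v1 < 0."""
    v = imaginary_part(w)
    return q(v, v) > 0 and v[0] < 0


if __name__ == "__main__":
    w = [0.3 + 2j, 1 + 0.5j, -0.7 + 0.2j, 0.1 - 1j]
    v = imaginary_part(w)
    print(hermitian_form(w), 2 * q(v, v), in_future_tube(w))
```

Points whose imaginary part v is spacelike or lightlike fail the positivity condition and lie in neither tube.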
1983). The isomorphism above for the group $SL(2, \mathbb{H})$ also corresponds to Hopf's fibering of $\mathbb{CP}^3$ by complex lines over the sphere $S^4$, or to the isomorphism of $S^4$ and the quaternionic projective line $\mathbb{HP}^1$. In all these cases, isomorphisms of homogeneous manifolds intertwine the actions of locally isomorphic groups.

Pseudo-Riemann Symmetric Manifolds

We obtain the next broad class of homogeneous manifolds if we preserve the conditions that the group G is a real semisimple one and the isotropy subgroup H is involutive, but remove the restriction that H must be (maximal) compact. Such symmetric manifolds are often called semisimple pseudo-Riemann symmetric manifolds (since there are also pseudo-Riemann symmetric manifolds whose groups are not semisimple). This class of spaces contains symmetric Stein manifolds $X_{\mathbb{C}} = G_{\mathbb{C}}/H_{\mathbb{C}}$. Each semisimple symmetric manifold X = G/H admits a complexification as a symmetric Stein manifold. Each real semisimple Lie group G is symmetric relative to the group $G \times G$.
The simplest family of semisimple symmetric manifolds is the family of all hyperboloids of all signatures
\[ H_{p,q} = \{x_1^2 + \dots + x_p^2 - x_{p+1}^2 - \dots - x_n^2 = 1\} \]
with the groups SO(p, q). Their complexifications are complex hyperboloids. There are two types of Riemann manifolds in these families: compact ones (spheres) and noncompact ones (two-sheeted hyperboloids); all others are pseudo-Riemann.
The Cartan duality holds for pseudo-Hermitian symmetric manifolds: they are domains in compact Hermitian symmetric manifolds (minimal flag manifolds) $Z = G_{\mathbb{C}}/P_{\mathbb{C}}$. They are open orbits of real forms G of the groups of holomorphic automorphisms $G_{\mathbb{C}}$. We construct examples of such manifolds if we consider one of the above-described realizations of noncompact Hermitian symmetric manifolds (through matrix homogeneous coordinates) and replace the condition of positivity with the condition that the symmetric (Hermitian) matrix in the definition has a fixed nondegenerate signature (i, k − i). We can call such pseudo-Hermitian symmetric manifolds satellites of Hermitian ones. Correspondingly, we can consider nonconvex tubes, for example, the set T of symmetric matrices whose imaginary parts have the signature (i, n − i). This domain is linear homogeneous, but it is not symmetric; to obtain the symmetric manifold we need to extend the nonconvex tube by a manifold of smaller dimension (which plays the role of infinity).
There are pseudo-Hermitian symmetric manifolds which are not satellites of Hermitian ones. Let us give an interesting example. The group $SL(2p, \mathbb{R})$ has two open orbits on the Grassmannian $Gr_{\mathbb{C}}(p; 2p)$ which are both pseudo-Hermitian symmetric spaces. Let us consider as above the Stiefel coordinates $Z \in \mathrm{Mat}_{\mathbb{C}}(p, 2p)$ and let Z = X + iY. Then the orbits are defined by the conditions
\[ \det\begin{pmatrix} X \\ Y \end{pmatrix} \gtrless 0. \]
In the intersection with the coordinate chart $Z = (E, z)$, $z \in \mathrm{Mat}_{\mathbb{C}}(p)$, z = x + iy, we have the conditions
\[ \det y \gtrless 0. \]
Therefore, we obtain (nonconvex) tube domains in $\mathbb{C}^N = \mathrm{Mat}_{\mathbb{C}}(p)$, $N = p^2$, corresponding to the nonconvex homogeneous cones V of real matrices with positive (negative) determinants. These tubes do not coincide with the symmetric manifolds, which also include some sets of small dimension outside the coordinate chart (at infinity). There are other homogeneous nonconvex cones such that the corresponding tube domains are Zariski open parts of pseudo-Hermitian symmetric spaces (D'Atri and Gindikin 1993). Among these cones are the cones of nondegenerate skew-symmetric matrices and of skew-Hermitian quaternionic matrices. We again observe strong connections with classical mathematics. Not all pseudo-Hermitian symmetric manifolds admit such tube realizations of dense parts. Analysis in pseudo-Hermitian symmetric manifolds is very interesting: instead of holomorphic functions, we consider there $\bar\partial$-cohomology of some degree.
Geometric relations between different symmetric manifolds are usually important for analytic applications, since they can produce some nontrivial integral transformations. In a broad sense, such transforms are considered in integral geometry (Gelfand et al. 2003). An important example is the duality between some compact Hermitian symmetric manifolds (when points in one of them are interpreted as submanifolds in another one). The simplest example is the projective duality between dual copies of projective spaces or, more generally, the realization of points of Grassmannians as projective planes. Such a duality can induce a duality between orbits of real forms of groups. In a special case, it can be a duality between Hermitian and pseudo-Hermitian symmetric manifolds.
Here is one important example. Let us consider in the projective space $\mathbb{CP}^{2k-1}$ the domain D which in
homogeneous coordinates (rows $z = (z_0, z_1, \dots, z_n)$, n = 2k − 1) is defined by the inequality $zHz^{*} > 0$, where H is a Hermitian form of signature (k, k), for example,
\[ |z_0|^2 + \dots + |z_{k-1}|^2 - |z_k|^2 - \dots - |z_n|^2 > 0. \]
This domain is (k − 1)-pseudoconcave and it contains (k − 1)-dimensional complex compact cycles, namely (k − 1)-dimensional planes. The manifold of these planes is exactly the domain X in the Grassmannian $Gr_{\mathbb{C}}(k; 2k)$ (of projective (k − 1)-planes) which is the noncompact Hermitian symmetric space, the orbit of the group SU(k, k) (see above). This picture is the geometrical basis for a deep analytic construction. In the domain D the spaces of (k − 1)-dimensional $\bar\partial$-cohomology are infinite dimensional for some coefficients. Their integration over (k − 1)-planes (the Penrose transform) gives sections of the corresponding vector bundles on X. The images are described by differential equations, the generalized massless equations. The basic twistor theory corresponds to k = 2, when X is isomorphic to the four-dimensional future tube (see above).
Similar dual realizations of Hermitian symmetric manifolds exist only in special cases. The twistor realization of the four-dimensional future tube was possible since the Grassmannian $Gr_{\mathbb{C}}(2; 4)$ is isomorphic to the quadric in $\mathbb{CP}^5$. This does not work for future tubes of bigger dimensions, but there is another possibility (Gindikin 1998). Let the quadric $Q_{n-1} \subset \mathbb{CP}^n$ be defined in homogeneous coordinates by the equation
\[ \square(z) = z_0^2 - z_1^2 - \dots - z_n^2 = 0, \]
and let $\zeta \cdot z$ be the corresponding bilinear form. As already mentioned, the set of (nondegenerate) hyperplane sections
\[ \zeta \cdot z = 0, \qquad \zeta \in \mathbb{C}^{n+1},\ \square(\zeta) = 1, \]
of $Q_{n-1}$ is the corresponding hyperboloid $H_n$. Thus, we have the duality between a flag manifold (the quadric $Q_{n-1}$) and a symmetric Stein manifold (the hyperboloid $H_n$) with the same group $SO(n+1, \mathbb{C})$; they have different dimensions.
The group SO(1, n) has two orbits on $Q_{n-1}$: the real quadric $Q_{\mathbb{R}} = \{z \in Q_{n-1};\ \Im(z) = 0\}$ and its complement $X = Q_{n-1} \setminus Q_{\mathbb{R}}$. Hyperplane sections which do not intersect $Q_{\mathbb{R}}$ (lie in X) correspond to such $\zeta \in H_n$ that
\[ \square(\Re\zeta) > 0. \]
This set has two connected components $D_{\pm}$ which are biholomorphically equivalent to the future and past tubes $T_{\pm}$ of dimension n. Let us emphasize that their group of automorphisms is SO(2, n), in spite of the fact that this group acts neither on X nor on $H_n$. Such an extension of the symmetry group is a very interesting phenomenon. It happens for several other symmetric manifolds, but is not a general fact. This geometrical construction gives a possibility to construct a multidimensional version of the Penrose transform from (n − 2)-dimensional $\bar\partial$-cohomology with different coefficients into solutions of massless equations on the future (past) tubes.
The last duality is connected with a general geometrical construction. We mentioned that each of the Riemann symmetric manifolds X = G/K admits a canonical embedding in the symmetric Stein manifold $X_{\mathbb{C}} = G_{\mathbb{C}}/K_{\mathbb{C}}$. It turns out that X has in $X_{\mathbb{C}}$ a canonical Stein neighborhood, the complex crown $\Xi(X)$, such that many analytic objects on X can be holomorphically extended to the crown (Gindikin 2002). For example, all solutions of all invariant differential equations on X (which are elliptic) admit such a holomorphic extension. In the last example, D is the crown of the Riemann symmetric space which is defined, in $H_n$, by the condition $\Im(\zeta) = 0$, $\Re(\zeta_0) > 0$.
Symmetric manifolds are distinguished from most other homogeneous manifolds by a very rich geometry, which is a background for deep analytic considerations. There are several important nonsymmetric homogeneous manifolds. We already mentioned flag manifolds and Stein homogeneous manifolds with complex semisimple Lie groups, which can be nonsymmetric. Pseudo-Hermitian symmetric manifolds are open orbits of real groups on compact Hermitian symmetric spaces. It turns out that open orbits on other flag manifolds also produce interesting homogeneous manifolds. Let $F = G_{\mathbb{C}}/P_{\mathbb{C}}$ be a flag manifold. Flag domains are open orbits of a real form G on F. Of course, pseudo-Hermitian symmetric manifolds are a special case of this construction. Let us consider a simple example with $G_{\mathbb{C}} = SL(3;\mathbb{C})$ and P the triangular group. Then points of F are pairs {a point z and a line l passing through it}. Let G = SU(2; 1); it has two open orbits on $\mathbb{CP}^2$: the complex ball D and its complement $D^c$. On F, the group G has three open orbits (flag domains): in the first, $z \in D$ and l is arbitrary; in the second, $l \subset D^c$; in the third, $z \in D^c$ and l intersects D. They are all 1-pseudoconcave. All three discrete series of unitary representations of SU(2, 1) are realized in the one-dimensional $\bar\partial$-cohomology of these flag domains with coefficients in line bundles. For arbitrary semisimple Lie groups, all discrete series of representations can also be realized in $\bar\partial$-cohomology of flag domains. Crowns of Riemann symmetric spaces, which we just mentioned, parametrize cycles (complex compact submanifolds)
Classical r-Matrices, Lie Bialgebras, and Poisson Lie Groups 511
in flag domains. Some general version of the Penrose transform connects, through integration along cycles, cohomology in flag domains with holomorphic solutions of some differential equations in crowns (generalized massless equations).

See also: Combinatorics: Overview; Compact Groups and their Representations; Lie Groups: General Theory; Pseudo-Riemannian Nilpotent Lie Groups; Several Complex Variables: Compact Manifolds; Stability of Minkowski Space; Symmetry Classes in Random Matrix Theory; Twistor Theory: Some Applications; Twistors.

Further Reading

Akhiezer D (1990) Homogeneous complex manifolds. In: Gindikin S and Henkin G (eds.) Several Complex Variables IV, vol. 10, Encyclopaedia of Mathematical Science. New York: Springer.
D'Atri J and Gindikin S (1993) Siegel domain realization of pseudo-Hermitian symmetric manifolds. Geometriae Dedicata 46: 91–126.
Faraut J and Korányi A (1994) Analysis on Symmetric Cones. Oxford: Oxford University Press.
Gelfand I, Gindikin S, and Graev M (2003) Selected Topics in Integral Geometry. Providence, RI: American Mathematical Society.
Gindikin S (1983) The complex universe of Roger Penrose. Mathematical Intelligencer 5(1): 27–35.
Gindikin S (1998) SO(1; n)-twistors. Journal of Geometry and Physics 26: 26–36.
Gindikin S (2002) Some remarks on complex crowns of real symmetric spaces. Acta Mathematica Applicata 73(1–2): 95–101.
Helgason S (1978) Differential Geometry, Lie Groups and Symmetric Spaces. New York: Academic Press.
Helgason S (1994) Geometric Analysis on Symmetric Spaces. Providence, RI: American Mathematical Society.
Onishchik A and Vinberg E (1993) Lie Groups and Lie Algebras I. Foundations of Lie Theory. In: Onishchik A (ed.) Lie Groups and Lie Algebras. Encyclopaedia of Mathematical Sciences, vol. 20. New York: Springer.
Pyatetskii-Shapiro I (1969) Automorphic Functions and Geometry of Classical Domains. Amsterdam: Gordon and Breach.
Poisson bracket on G equips F(G) with the structure of a Poisson–Hopf algebra, that is
we conclude from eqn [4] that $\delta : \mathfrak{g} \to \mathfrak{g} \wedge \mathfrak{g}$ is a 1-cocycle on $\mathfrak{g}$, that is,

Classical r-Matrices and Special Classes of Lie Bialgebras

The general classification problem for Lie bialgebras is unfeasible (e.g., classification of abelian Lie bialgebras includes classification of all Lie algebras). In applications, one mainly deals with important special classes of Lie bialgebras, of which factorizable Lie bialgebras are probably the most important. In a sense, this class may be regarded as exhaustive, since (as explained below) any Lie bialgebra is canonically embedded into a factorizable one. Various other special classes discussed in the literature are coboundary bialgebras, triangular bialgebras, and quasitriangular bialgebras.
The Lie bialgebra $(\mathfrak{g}, \mathfrak{g}^{*}, \delta)$ is called a coboundary bialgebra if the cobracket is a trivial 1-cocycle on $\mathfrak{g}$, that is,
\[ \delta(X) = [X \otimes I + I \otimes X,\ r] \quad \text{for all } X \in \mathfrak{g}; \qquad [6] \]
the constant element $r \in \mathfrak{g} \wedge \mathfrak{g}$ is called the classical r-matrix. If $\mathfrak{g}$ is semisimple, $H^1(\mathfrak{g}, V) = 0$ for any $\mathfrak{g}$-module V by the classical Whitehead theorem, and hence all Lie bialgebra structures on $\mathfrak{g}$ are of coboundary type. The associated Lie bracket on $\mathfrak{g}^{*}$ is given by the formula
\[ [\xi, \xi']_{*} = \mathrm{ad}^{*}_{\mathfrak{g}}\, r\xi \cdot \xi' - \mathrm{ad}^{*}_{\mathfrak{g}}\, r\xi' \cdot \xi, \qquad [7] \]
where we identified $r \in \mathfrak{g} \wedge \mathfrak{g}$ with a skew-symmetric linear operator $r : \mathfrak{g}^{*} \to \mathfrak{g}$. The restrictions imposed on r by the Jacobi identity are formulated in terms of the so-called Yang–Baxter tensor $[[r, r]] \in \mathfrak{g} \wedge \mathfrak{g} \wedge \mathfrak{g}$, which is a quadratic expression in r. To define it, let us mark different factors in tensor products, for example, $\mathfrak{g} \otimes \mathfrak{g} \otimes \mathfrak{g}$, by fixed numbers 1, 2, 3, ... which indicate their place; for simplicity, we assume that $\mathfrak{g}$ is embedded in an associative algebra A with a unit. The embeddings are defined as
\[ i_{12},\ i_{23},\ i_{13} : \mathfrak{g} \otimes \mathfrak{g} \to A \otimes A \otimes A \]
by setting $i_{12}(X \otimes Y) = X \otimes Y \otimes I$, and similarly in other cases. For $a \in \mathfrak{g} \otimes \mathfrak{g}$, we put $i_{12}(a) = a_{12}$, etc. Set
\[ [[r, r]] = [r_{12}, r_{13}] + [r_{12}, r_{23}] + [r_{13}, r_{23}]. \qquad [8] \]
The commutators on the right-hand side are computed in the associative algebra $A \otimes A \otimes A$; it is easy to check that the result does not depend on the choice of the embedding $\mathfrak{g} \hookrightarrow A$.
Proposition 1 The Jacobi identity for $[\ ,\ ]_{*}$ is valid if and only if [[r, r]] is ad $\mathfrak{g}$-invariant, that is, if
\[ [X \otimes I \otimes I + I \otimes X \otimes I + I \otimes I \otimes X,\ [[r, r]]] = 0. \]
A coboundary Lie bialgebra with $[[r, r]] \in (\wedge^3 \mathfrak{g})^{\mathfrak{g}}$ is called quasitriangular; it is called triangular if r satisfies the classical Yang–Baxter equation [[r, r]] = 0. (Both terms come from another name of the classical Yang–Baxter equation, the classical triangle equation.)
When a Lie algebra $\mathfrak{g}$ admits a nondegenerate invariant inner product, the class of quasitriangular Lie bialgebra structures on $\mathfrak{g}$ admits an important specialization. Let $\mathfrak{g} \otimes \mathfrak{g}^{*} \simeq \mathfrak{g} \otimes \mathfrak{g}$ be the natural isomorphism induced by the inner product. Let $I \in \mathfrak{g} \otimes \mathfrak{g}^{*}$ be the canonical element; its image $t \in \mathfrak{g} \otimes \mathfrak{g}$ under this isomorphism is called the tensor Casimir element. Clearly, $t \in (S^2 \mathfrak{g})^{\mathfrak{g}}$ and, moreover, $[t_{12}, t_{23}] \in (\wedge^3 \mathfrak{g})^{\mathfrak{g}}$. When $\mathfrak{g}$ is semisimple, the mapping $(S^2 \mathfrak{g})^{\mathfrak{g}} \to (\wedge^3 \mathfrak{g})^{\mathfrak{g}} : s \mapsto [s_{12}, s_{23}]$ is an isomorphism; in particular, if $\mathfrak{g}$ is simple, both spaces are one dimensional and generated by a tensor Casimir (which is unique up to a scalar multiple). A Lie bialgebra $(\mathfrak{g}, r)$ is called factorizable if $r \in \mathfrak{g} \wedge \mathfrak{g}$ satisfies the modified classical Yang–Baxter equation
\[ [[r, r]] = c\,[t_{12}, t_{23}], \qquad c = \mathrm{const} \ne 0. \qquad [9] \]
The convenient normalization is c = 1/4 (it can be achieved by an appropriate normalization of r).
Instead of dealing with the modified Yang–Baxter equation, we may relax the antisymmetry condition imposed on r. Set $r_{\pm} = r \pm (1/2)\,t \in \mathfrak{g} \otimes \mathfrak{g}$. Since t is ad $\mathfrak{g}$-invariant, the symmetric part of $r_{\pm}$ drops out from the cobracket; on the other hand, one has $[[r_{\pm}, r_{\pm}]] = 0$. Regarding $r_{\pm}$ as linear operators, $r_{\pm} \in \mathrm{Hom}(\mathfrak{g}^{*}, \mathfrak{g})$, we get the following important result:
Proposition 2 Let $(\mathfrak{g}, \mathfrak{g}^{*})$ be a factorizable Lie bialgebra.
(i) The mappings $r_{\pm} \in \mathrm{Hom}(\mathfrak{g}^{*}, \mathfrak{g})$ are Lie algebra homomorphisms; moreover, $r_{+}^{*} = -r_{-}$.
(ii) The combined mapping
\[ i_r : \mathfrak{g}^{*} \to \mathfrak{g} \oplus \mathfrak{g} : X \mapsto (r_{+}X,\ r_{-}X) \]
is a Lie algebra embedding.
(iii) Any $X \in \mathfrak{g}$ admits a unique decomposition $X = X_{+} - X_{-}$ with $(X_{+}, X_{-}) \in \mathrm{Im}\, i_r$.
The additive decomposition in a factorizable Lie bialgebra gives rise to a multiplicative factorization problem in the associated Lie group. Namely, $i_r$ may be extended to a Lie group embedding $i_r : G^{*} \to G \times G$, and any $x \in G$ which is sufficiently close to the unit element admits a decomposition $x = x_{+}x_{-}^{-1}$ with $(x_{+}, x_{-}) \in \mathrm{Im}\, i_r$.
Any Lie bialgebra $(\mathfrak{g}, \mathfrak{g}^{*})$ admits a canonical
for all X 2 g embedding into a larger Lie bialgebra (called its
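double) which is already factorizable. Namely, set $\mathfrak{d} = \mathfrak{g} \oplus \mathfrak{g}^*$ as a linear space and equip it with the natural inner product,
\[ \langle\langle (X, F), (X', F') \rangle\rangle = \langle F, X' \rangle + \langle F', X \rangle \tag{10} \]

Theorem 2
(i) There exists a unique structure of the Lie algebra on $\mathfrak{d}$ such that: (a) $\mathfrak{g}, \mathfrak{g}^* \subset \mathfrak{d}$ are Lie subalgebras. (b) The inner product [10] is invariant.
(ii) Let $P_{\mathfrak{g}}$, $P_{\mathfrak{g}^*}$ be the projection operators onto $\mathfrak{g}, \mathfrak{g}^* \subset \mathfrak{d}$ parallel to the complementary subalgebra. Set $r_{\mathfrak{d}}^+ = P_{\mathfrak{g}}$, $r_{\mathfrak{d}}^- = -P_{\mathfrak{g}^*}$; then $(\mathfrak{d}, r_{\mathfrak{d}}^\pm)$ is a factorizable Lie bialgebra.
(iii) The inclusion map $(\mathfrak{g}, \mathfrak{g}^*) \hookrightarrow (\mathfrak{d}, \mathfrak{d}^*)$ is a homomorphism of Lie bialgebras and the dual inclusion map $(\mathfrak{g}^*, \mathfrak{g}) \hookrightarrow (\mathfrak{d}, \mathfrak{d}^*)$ is an antihomomorphism.

Conversely, let $\mathfrak{a}$ be a Lie algebra equipped with a nondegenerate invariant inner product, $\mathfrak{a}_\pm \subset \mathfrak{a}$ its Lie subalgebras such that (i) $\mathfrak{a}_\pm$ are isotropic with respect to the inner product, (ii) $\mathfrak{a} = \mathfrak{a}_+ \dotplus \mathfrak{a}_-$ as a linear space. The triple $(\mathfrak{a}, \mathfrak{a}_+, \mathfrak{a}_-)$ is called a Manin triple. Let $P_\pm$ be the projection operators onto $\mathfrak{a}_\pm$ in this decomposition. Set $r_\pm = \pm P_\pm$. Then $(\mathfrak{a}, r_\pm)$ is a factorizable Lie bialgebra; moreover, $\mathfrak{a}_+$ and $\mathfrak{a}_-$ are set into duality by the inner product in $\mathfrak{a}$ and inherit the structure of a Lie bialgebra, and $\mathfrak{a}$ is their double.

If $(\mathfrak{g}, \mathfrak{g}^*)$ is itself a factorizable Lie bialgebra, its double admits a simple explicit description. Set $\mathfrak{d} = \mathfrak{g} \oplus \mathfrak{g}$ (direct sum of Lie algebras); let us equip $\mathfrak{d}$ with the inner product
\[ \langle\langle (X, X'), (Y, Y') \rangle\rangle = \langle X, Y \rangle - \langle X', Y' \rangle \]
Let $\mathfrak{g}^\delta \subset \mathfrak{d}$ be the diagonal subalgebra; we identify $\mathfrak{g}^*$ with the embedded subalgebra $i_r(\mathfrak{g}^*) \subset \mathfrak{d}$.

Proposition 3
(i) $(\mathfrak{d}, \mathfrak{g}^\delta, i_r(\mathfrak{g}^*))$ is a Manin triple.
(ii) As a Lie algebra, $\mathfrak{d} = \mathfrak{g} \oplus \mathfrak{g}$ is isomorphic to the double of $\mathfrak{g}$.

Key examples of factorizable Lie bialgebras are associated with semisimple Lie algebras and their loop algebras.

1. Let $\mathfrak{k}$ be a compact semisimple Lie algebra, $\mathfrak{g} = \mathfrak{k}_{\mathbb{C}}$ its complexification regarded as a real Lie algebra, $\sigma \in \mathrm{Aut}\, \mathfrak{g}$ the Cartan involution which fixes $\mathfrak{k}$, and $\mathfrak{g} = \mathfrak{k} \oplus \mathfrak{p}$ the associated Cartan decomposition. Fix a real split Cartan subalgebra $\mathfrak{a} \subset \mathfrak{p}$ and the associated Iwasawa decomposition $\mathfrak{g} = \mathfrak{k} \dotplus \mathfrak{a} \dotplus \mathfrak{n}$; put $\mathfrak{s} = \mathfrak{a} \dotplus \mathfrak{n}$. Let $B$ be the complex Killing form on $\mathfrak{g}$; let us equip $\mathfrak{g}$ with the real inner product $(X, Y) = \mathrm{Im}\, B(X, Y)$; then $(\mathfrak{g}, \mathfrak{k}, \mathfrak{s})$ is a Manin triple. Hence, any compact semisimple Lie group $K$ carries a natural Poisson structure; its double $G = D(K)$ is the complex group $G = K_{\mathbb{C}}$ (regarded as a real Lie group). The associated factorization problem in $G$ is the Iwasawa decomposition $G = KAN$, which exists globally.

2. Let $\mathfrak{g}$ be a real split semisimple Lie algebra, $\mathfrak{h}$ its Cartan subalgebra, and $\Delta_+$ a system of positive roots. Fix an invariant inner product on $\mathfrak{g}$ which is positive on $\mathfrak{h}$, and let $\{e_\alpha;\ \alpha \in \pm\Delta_+\}$ be the root vectors normalized in such a way that $(e_\alpha, e_{-\alpha}) = 1$. Let
\[ \mathfrak{n}_\pm = \bigoplus_{\alpha \in \pm\Delta_+} \mathbb{R} \cdot e_\alpha \]
Fix an orthonormal basis $\{H_i\}$ in $\mathfrak{h}$; let $P_\pm$, $P_0$ be the projection operators onto $\mathfrak{n}_\pm$, $\mathfrak{h}$ in the Bruhat decomposition $\mathfrak{g} = \mathfrak{n}_+ \dotplus \mathfrak{h} \dotplus \mathfrak{n}_-$. The standard Lie bialgebra structure on $\mathfrak{g}$ is given by the r-matrices $r_\pm = \pm(P_\pm + \tfrac{1}{2} P_0)$. In tensor notation,
\[ r_\pm = \sum_{\alpha \in \Delta_+} e_\alpha \wedge e_{-\alpha} \pm \frac{1}{2} \sum_i H_i \otimes H_i \tag{11} \]
Let $\mathfrak{b}_\pm = \mathfrak{h} \dotplus \mathfrak{n}_\pm$ be the opposite Borel subalgebras; the inner product in $\mathfrak{g}$ sets them into duality, and $(\mathfrak{b}_+, \mathfrak{b}_-)$ is a Lie sub-bialgebra in $(\mathfrak{g}, \mathfrak{g}^*)$. Let $G$ be the connected, simply connected Lie group associated with $\mathfrak{g}$, $B_\pm = H N_\pm$ its opposite Borel subgroups which correspond to $\mathfrak{b}_\pm$. Let $p : B_\pm \to B_\pm / N_\pm \simeq H$ be the canonical projection. The associated factorization problem in $G$, $g = b_+ b_-^{-1}$, $(b_+, b_-) \in B_+ \times B_-$, $p(b_+) = p(b_-)^{-1}$, is closely related to the Bruhat decomposition; it is solvable for all $g$ in the open Bruhat cell $B_+ N_- \subset G$.

3. Let $L\mathfrak{g} = \mathfrak{g} \otimes \mathbb{C}((z))$ be the loop algebra of a finite-dimensional semisimple Lie algebra $\mathfrak{g}$; as usual, we denote the ring of formal Laurent series by $\mathbb{C}((z))$. Put $L\mathfrak{g}_+ = \mathfrak{g} \otimes \mathbb{C}[[z]]$, $L\mathfrak{g}_- = \mathfrak{g} \otimes z^{-1}\mathbb{C}[z^{-1}]$. Fix an invariant inner product on $\mathfrak{g}$ and equip $L\mathfrak{g}$ with the inner product
\[ \langle\langle X, Y \rangle\rangle = \mathrm{Res}_{z=0}\, \langle X(z), Y(z) \rangle\, dz \]
Then $(L\mathfrak{g}, L\mathfrak{g}_+, L\mathfrak{g}_-)$ is a Manin triple. The associated classical r-matrix is called the rational r-matrix; in tensor notation, it is represented by a singular kernel
\[ r(z, z') = \frac{t}{z - z'} \]
where $t \in \mathfrak{g} \otimes \mathfrak{g}$ is the tensor Casimir; this kernel is essentially the Cauchy kernel.

4. Let us assume that $\mathfrak{g} = \mathfrak{sl}(n)$; in this case, the loop algebra $L\mathfrak{g}$ admits a nontrivial decomposition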
associated with the so-called elliptic r-matrix. Set
\[ I_1 = \mathrm{diag}(1, \varepsilon, \ldots, \varepsilon^{n-1}), \qquad I_2 = \begin{pmatrix} 0 & 1 & \cdots & 0 \\ \vdots & \ddots & \ddots & \vdots \\ 0 & & \ddots & 1 \\ 1 & 0 & \cdots & 0 \end{pmatrix}, \qquad \varepsilon = e^{2\pi i / n} \tag{12} \]
Put $\mathbb{Z}_n^2 = \mathbb{Z}/n\mathbb{Z} \times \mathbb{Z}/n\mathbb{Z}$; for $a = (a_1, a_2) \in \mathbb{Z}_n^2$, set $I_a = I_1^{a_1} I_2^{a_2}$; the matrices $I_a$ define an irreducible projective representation of $\mathbb{Z}_n^2$ (they form the so-called finite Heisenberg group). Let us denote by $E = \mathbb{C}/(\mathbb{Z} + \tau\mathbb{Z})$ the elliptic curve of modulus $\tau$ and let $P \to E$ be the n-dimensional holomorphic vector bundle with flat connection and with monodromies given by
\[ z \mapsto z + 1 : h_1 = \mathrm{Ad}\, I_1; \qquad z \mapsto z + \tau : h_2 = \mathrm{Ad}\, I_2 \]
Let $G_E \subset L\mathfrak{g}$ be the subspace of Laurent expansions at zero of the global meromorphic sections of $P$ with a unique pole at $0 \in E$. Then $(L\mathfrak{g}, L\mathfrak{g}_+, G_E)$ is again a Manin triple. The associated classical r-matrix is the kernel of a singular integral operator which associates a meromorphic section of $P$ to its principal part at 0. Explicitly, it is given by
\[ r(z - z') = \frac{1}{n} \sum_{a,b=0}^{n-1} \zeta\!\left( \frac{z - z' - a - b\tau}{n} \right) (\mathrm{Ad}\, I_{a,b} \otimes I) \cdot t \]
where $\zeta$ is the Weierstrass zeta function.

5. Let $\mathfrak{g}$ be an arbitrary semisimple Lie algebra again. Let us equip the loop algebra $L\mathfrak{g}$ with the inner product
\[ \langle\langle X, Y \rangle\rangle_0 = \mathrm{Res}_{z=0}\, \langle X(z), Y(z) \rangle\, z^{-1} dz \]
Set $\mathfrak{N}_+ = \mathfrak{n}_+ \dotplus \mathfrak{g} \otimes z\mathbb{C}[[z]]$, $\mathfrak{N}_- = \mathfrak{n}_- \dotplus \mathfrak{g} \otimes z^{-1}\mathbb{C}[z^{-1}]$. We have $L\mathfrak{g} = \mathfrak{N}_+ \dotplus \mathfrak{h} \dotplus \mathfrak{N}_-$, where we identify $\mathfrak{h}, \mathfrak{n}_\pm \subset \mathfrak{g}$ with the corresponding subalgebras of constant loops in $L\mathfrak{g}$. Let $P_\pm$, $P_0$ be the projection operators onto $\mathfrak{N}_\pm$, $\mathfrak{h}$ in this decomposition and $r_\pm = \pm(P_\pm + (1/2) P_0)$. The classical r-matrices $r_\pm$ define on $L\mathfrak{g}$ the structure of a factorizable Lie bialgebra. The associated tensor kernels are called the trigonometric classical r-matrices.

Classical r-matrices described above are associated with factorization problems in the infinite-dimensional loop groups: matrix Riemann problems or matrix Cousin problems (in the elliptic case). Belavin and Drinfeld have given a complete classification of factorizable Lie bialgebra structures for semisimple Lie algebras; in the loop algebra case, the problem they solved consists of classification of all meromorphic solutions of the classical Yang–Baxter equation. In other words, we assume that the distribution kernel associated with the classical r-matrix is represented by a meromorphic function (of two complex variables). Up to an equivalence, any such solution depends only on one variable and belongs to the rational, trigonometric, or elliptic type (in the latter case, the underlying Lie algebra is necessarily $\mathfrak{sl}(n)$). Classification of solutions in the elliptic case is completely rigid; in the trigonometric case, the moduli space is finite dimensional and admits an explicit description. In the rational case, the classification is somewhat less explicit (it has been completed by A Stolin under some nondegeneracy condition). Contrary to the popular belief, there are many other structures of a factorizable Lie bialgebra on loop algebras, for which the associated r-matrices are given by more singular distribution kernels.

Poisson Lie Groups

If the tangent Lie bialgebra of a Poisson Lie group is of coboundary type, the cocycle $\eta$ is also trivial, $\eta(g) = r - \mathrm{Ad}\, g \otimes \mathrm{Ad}\, g \cdot r$. Hence, the Poisson bracket on $G$ is given by
\[ \{\varphi, \psi\} = \langle r, \nabla'\varphi \wedge \nabla'\psi \rangle - \langle r, \nabla\varphi \wedge \nabla\psi \rangle, \quad r \in \mathfrak{g} \wedge \mathfrak{g} \tag{13} \]
where $\nabla\varphi, \nabla'\varphi \in \mathfrak{g}^*$ are left and right differentials of $\varphi \in C^\infty(G)$. This is the so-called Sklyanin bracket. Let us assume that $G$ is a matrix group; its affine ring is generated by evaluation functions $\varphi_{ij}$ which assign to $L \in G$ its matrix coefficients, $\varphi_{ij}(L) = L_{ij}$. The Poisson bracket on $G$ is completely determined by its values on $\varphi_{ij}$. Explicitly, we get
\[ \{\varphi_{ij}, \varphi_{km}\}(L) = [r, L \otimes L]_{ikjm} \tag{14} \]
the commutator in the RHS is in $\mathrm{Mat}(n^2)$. By an abuse of language, evaluation functions and their values on a generic element $L \in G$ are denoted by the same letter; using tensor notation to suppress matrix indices, we get
\[ \{L_1, L_2\} = [r, L_1 L_2], \quad L_1 = L \otimes I, \quad L_2 = I \otimes L \tag{15} \]
In the case of loop algebras, these Poisson bracket relations take the form
\[ \{L_1(\lambda), L_2(\mu)\} = [r(\lambda, \mu), L_1(\lambda) L_2(\mu)] \]
Let us assume that $G$ is factorizable and the associated factorization problem is globally solvable. The Poisson bracket on the dual group $G^*$
$\simeq i_r(G^*) \subset G \times G$ may be characterized in terms of the matrix coefficients of $(h_+, h_-) = i_r(h)$, or of their quotient $h = h_+ h_-^{-1}$. Explicitly, we get
\[ \{h_1^\pm, h_2^\pm\} = [r, h_1^\pm h_2^\pm], \qquad \{h_1^+, h_2^-\} = [r_+, h_1^+ h_2^-] \tag{16} \]
\[ \{h_1, h_2\} = r h_1 h_2 + h_1 h_2 r - h_2 r_+ h_1 - h_1 r_- h_2, \qquad r = \tfrac{1}{2}(r_+ + r_-) \tag{17} \]

The key question in the geometry of Poisson groups consists in description of symplectic leaves in $G$, $G^*$. This question is already nontrivial when $G^*$ is abelian (and hence may be identified with the dual of the Lie algebra $\mathfrak{g} = \mathrm{Lie}(G)$). The Poisson bracket on $\mathfrak{g}^*$ is linear; this is the well-known Lie–Poisson (alias, Berezin–Kirillov–Kostant) bracket. Its symplectic leaves coincide with the orbits of the coadjoint representation of $G$ in $\mathfrak{g}^*$. The natural way to prove this fundamental result (which goes back to Lie) is to consider first the natural action of $G$ on the cotangent bundle $T^*G \simeq G \times \mathfrak{g}^*$; this action is Hamiltonian, and the coadjoint orbits arise as a result of Hamiltonian reduction associated with this action. The generalization of the theory of coadjoint orbits to the case of arbitrary Poisson groups starts with the notion of symplectic double, which is the nonlinear analog of the cotangent bundle.

Let $D$ be the double of $(G, G^*)$; assume for simplicity that $D = G \cdot G^*$ globally and hence the associated factorization problem is always solvable. Let $r_{\mathfrak{d}} = (1/2)(P_{\mathfrak{g}} - P_{\mathfrak{g}^*})$. Set
\[ \{\varphi, \psi\}_\pm = \langle r_{\mathfrak{d}} \nabla\varphi, \nabla\psi \rangle \pm \langle r_{\mathfrak{d}} \nabla'\varphi, \nabla'\psi \rangle \tag{18} \]
The bracket $\{\ ,\ \}_-$ is the usual Sklyanin bracket which defines the structure of a Poisson group on $D$, while $\{\ ,\ \}_+$ is nondegenerate and defines a symplectic structure on $D$. Let us denote the copies of $D$ equipped with the bracket $\{\ ,\ \}_\pm$ by $D_\pm$. The bracket on $D_+$ is not multiplicative, but it is covariant with respect to the action of $D_-$ by left and right translations; in other words, the natural mappings $D_- \times D_+ \to D_+$ and $D_+ \times D_- \to D_+$, associated with multiplication in $D$, preserve Poisson brackets. Since $G, G^* \subset D_-$ are Poisson subgroups, natural actions $G \times D_+ \to D_+$ and $G^* \times D_+ \to D_+$ by left and right translations are Poisson mappings. Consider the natural projections
\[ G^* \simeq D_+/G \xleftarrow{\ \pi\ } D_+ \xrightarrow{\ \pi'\ } G \backslash D_+ \simeq G^*, \qquad G \simeq D_+/G^* \xleftarrow{\ p\ } D_+ \xrightarrow{\ p'\ } G^* \backslash D_+ \simeq G \]
onto the space of left and right coset classes. It is easy to see that functions on $D_+$ which are constant on each projection fiber are closed with respect to the Poisson bracket. This means that the quotient spaces inherit the Poisson structure. Moreover, the maps $\pi, \pi'$ and $p, p'$ form the so-called dual pairs, that is, the algebras of functions which are constant on the fibers of $\pi$ and $\pi'$ (or of $p$ and $p'$) are mutual centralizers of one another in the big Poisson algebra $F(D_+)$. Since $D = G \cdot G^* = G^* \cdot G$, we have $G^* \simeq D/G$, $G \simeq D/G^*$; it is easy to check that the quotient Poisson structure induced on $G$, $G^*$ coincides with the original one. Applying the fundamental theorem on dual pairs of Poisson mappings (going back to S. Lie), we conclude that symplectic leaves in $G$ and $G^*$, respectively, coincide with the orbits of $G^*$ (respectively, $G$) in these quotient spaces. The actions $G \times G^* \to G^*$, $G^* \times G \to G$ are called dressing transformations. Unit elements in $G$ and $G^*$ are fixed points of dressing transformations; their linearizations at the tangent spaces $T_e G^* \simeq \mathfrak{g}^*$, $T_e G \simeq \mathfrak{g}$ coincide with the coadjoint actions of $G$ and $G^*$, respectively.

When $D \neq G \cdot G^*$ (i.e., the factorization problem in $D$ is not always solvable), dressing actions are still well defined as global transformations of the quotient spaces; in this case $G$, $G^*$ may be identified with open cells in $D/G^*$, $D/G$, respectively, which means that dressing action on $G$, $G^*$ is, in general, incomplete.

If the group $G$ is factorizable, symplectic leaves in the dual group $G^*$ admit a nice uniform description: since in this case $D = G \times G$ and $G \subset D$ is the diagonal subgroup, the quotient $D/G$ may be modeled on $G$ itself. The quotient Poisson bracket in this realization coincides with [17], while the dressing action coincides with conjugation in $G$ (and is independent of $r$). Hence, symplectic leaves in $D/G$ coincide with conjugacy classes in $G$; the equivalence of this model with $G^*$ (equipped with the bracket [16]) is provided by the factorization map. The description of symplectic leaves in $G$ is more subtle (and already crucially depends on the choice of $r$!); for semisimple Lie groups with the standard Poisson structure, it is related to the geometry of double Bruhat cells. For loop groups with rational, trigonometric, or elliptic r-matrices, dressing action is associated with auxiliary factorization problems in the loop group. Roughly speaking, symplectic leaves correspond to rational loops with prescribed singularities. Many important examples have been described in connection with integrable lattice systems, although a complete classification theorem is still not available. For $\mathfrak{g} = \mathfrak{sl}(2)$, the elliptic Manin triple described earlier leads to the Poisson structure on the group of elliptic loops with values in $SL(2)$; its simplest symplectic leaves (corresponding to loops with simple poles) are associated with a remarkable Poisson algebra, the Sklyanin algebra (with four generators and two Casimir functions), which admits an interesting explicit quantization.
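The relation between the modified and the ordinary classical Yang–Baxter equations, on which the brackets of this section rest, can be checked by direct computation in the smallest case. The following sketch is an illustration only. It assumes $\mathfrak{g} = \mathfrak{sl}(2)$ in its 2 × 2 matrix realization with inner product $\langle X, Y \rangle = \mathrm{tr}\, XY$, so that $t = e \otimes f + f \otimes e + \tfrac12 h \otimes h$, and it takes the skew r-matrix $r = \tfrac12 (e \otimes f - f \otimes e)$. The helper names (`lin`, `kron`, `embed`, `yang_baxter`) are hypothetical and belong to no library. With exact rational arithmetic, the script tests whether $r_\pm = r \pm \tfrac12 t$ satisfy $[[r_\pm, r_\pm]] = 0$ and whether $r$ satisfies $[[r, r]] = \tfrac14 [t_{12}, t_{23}]$.

```python
from fractions import Fraction as Fr

def mul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def lin(a, b, s=1):
    """Entrywise a + s*b."""
    return [[x + s * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def comm(a, b):
    """Commutator ab - ba in the associative matrix algebra."""
    return lin(mul(a, b), mul(b, a), -1)

def kron(a, b):
    """Kronecker product, realizing the tensor product of matrices."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

# sl(2) generators in the defining representation (assumed realization)
ONE = [[Fr(1), Fr(0)], [Fr(0), Fr(1)]]
e = [[Fr(0), Fr(1)], [Fr(0), Fr(0)]]
f = [[Fr(0), Fr(0)], [Fr(1), Fr(0)]]
h = [[Fr(1), Fr(0)], [Fr(0), Fr(-1)]]
ZERO8 = [[Fr(0)] * 8 for _ in range(8)]

def embed(terms, i, j):
    """Place a two-tensor, given as (coeff, a, b) triples, into slots i, j
    of the triple tensor product (slots are numbered 0, 1, 2)."""
    total = ZERO8
    for c, a, b in terms:
        slots = [ONE, ONE, ONE]
        slots[i], slots[j] = a, b
        total = lin(total, kron(kron(slots[0], slots[1]), slots[2]), c)
    return total

def yang_baxter(terms):
    """[[r, r]] = [r12, r13] + [r12, r23] + [r13, r23]."""
    r12, r13, r23 = embed(terms, 0, 1), embed(terms, 0, 2), embed(terms, 1, 2)
    return lin(lin(comm(r12, r13), comm(r12, r23)), comm(r13, r23))

half = Fr(1, 2)
t = [(Fr(1), e, f), (Fr(1), f, e), (half, h, h)]   # tensor Casimir for tr(XY)
r = [(half, e, f), (-half, f, e)]                  # skew-symmetric r-matrix
r_plus = r + [(half * c, a, b) for c, a, b in t]   # r + t/2
r_minus = r + [(-half * c, a, b) for c, a, b in t] # r - t/2

cyb_plus = yang_baxter(r_plus)
cyb_minus = yang_baxter(r_minus)
cyb_skew = yang_baxter(r)
rhs = [[Fr(1, 4) * x for x in row]
       for row in comm(embed(t, 0, 1), embed(t, 1, 2))]

print(cyb_plus == ZERO8, cyb_minus == ZERO8, cyb_skew == rhs)
```

Exact fractions are used instead of floating point so that the comparisons are true identities rather than numerical approximations; the same functions apply unchanged to any other two-tensor given as a list of `(coeff, a, b)` triples of 2 × 2 matrices.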
Dressing action is a nontrivial example of a Poisson group action. In general, such actions are not Hamiltonian in the usual sense; the appropriate generalization is provided by the notion of the nonabelian moment map. Let $G \times M \to M$ be an action of a Poisson group $G$ on a Poisson manifold $M$, and $\mathfrak{g} \to \mathrm{Vect}\, M$, $X \mapsto \hat{X}$, the associated homomorphism of Lie algebras. A mapping $\mu : M \to G^*$ is called the nonabelian moment map associated with this action, if for any $X \in \mathfrak{g}$ and $\varphi \in F(M)$, we have
\[ \hat{X} \varphi = \langle \mu^{-1} \{\mu, \varphi\}_M,\ X \rangle \]
In this case, $G \times M \to M$ is a fortiori a Poisson map. Both dressing actions $G \times G^* \to G^*$ and $G^* \times G \to G$ admit nonabelian moment maps, which are just the identity maps $\mu = \mathrm{id}_{G^*}$ and $\mu = \mathrm{id}_G$. For compact Poisson groups, the nonabelian moment map has good convexity properties, which generalize the convexity properties of the ordinary moment map for Hamiltonian group actions.

The general theory of homogeneous Poisson spaces has some peculiarities. Typically, the G-covariant Poisson structure on a given homogeneous space is not unique (when it exists); this is true already for principal homogeneous spaces (a simple example is provided by the symplectic double $D_+$). Let $G$ be a Poisson Lie group, $(\mathfrak{g}, \mathfrak{g}^*)$ its tangent Lie bialgebra, $\mathfrak{d}$ its double, $U$ its Lie subgroup, $\mathfrak{u} = \mathrm{Lie}\, U$. A subalgebra $\mathfrak{l} \subset \mathfrak{d}$ is called Lagrangian if it is maximal isotropic with respect to the canonical inner product in $\mathfrak{d}$. The general classification result, according to Drinfeld, asserts that there is a bijection between G-covariant Poisson structures on $G/U$ and the set of all Lagrangian subalgebras $\mathfrak{l} \subset \mathfrak{d}$ such that $\mathfrak{l} \cap \mathfrak{g} = \mathfrak{u}$. Various nontrivial examples arise, notably in the study of integrable systems. For instance, the geometric proof of the factorization theorem for lattice zero-curvature equations, which is stated in the following section, uses a different Poisson structure on the double (the so-called twisted symplectic double).

Applications to Integrable Systems

The definition of Poisson–Lie groups was motivated by key examples which arise in the theory of integrable systems. In applications, one often deals with nonlinear differential equations which may be written in the form of the so-called lattice zero-curvature equations
\[ \frac{dL_m}{dt} = L_m M_m - M_{m+1} L_m, \quad m \in \mathbb{Z} \tag{19} \]
where $L_m$, $M_m$ are matrices, possibly depending on an additional parameter (or, more generally, abstract linear operators). Equations [19] give the compatibility conditions for the auxiliary linear system
\[ \psi_{m+1} = L_m \psi_m, \qquad \frac{d\psi_m}{dt} = -M_m \psi_m, \quad m \in \mathbb{Z} \tag{20} \]
The use of finite-difference operators associated with a one-dimensional lattice, as in [20], is particularly well suited for the study of multiparticle lattice models. Let us assume that the potential $L_m$ in [20] is periodic, $L_{m+N} = L_m$; the period $N$ may be interpreted as the number of copies of an elementary system. It is natural to presume that Lax matrices $L_m$ in [19] are elements of a matrix Lie group $G$ (or of a loop group, if they depend on an extra parameter). The auxiliary linear problem [20] leads to a family of dynamical systems on $G^N$ which remain integrable for any $N$. Let $T : G^N \to G$ be the monodromy map which assigns to the set $L_1, \ldots, L_N$ of local Lax matrices their ordered product $T_L = L_N L_{N-1} \cdots L_1$. Let us assume that $G$ is equipped with the Sklyanin bracket associated with a factorizable r-matrix $r$. Then $T$ is a Poisson map. Let $I(G)$ be the algebra of central functions on $G$; for $\varphi \in I(G)$, set $H_\varphi = \varphi \circ T$. All functions $H_\varphi$, $\varphi \in I(G)$, are in involution with respect to the product Poisson bracket on $G^N$ and give rise to lattice zero-curvature equations of the same form as [19]; for a given $\varphi$, we may choose the M-matrix in either of the two forms:
\[ M_m^\pm = r_\pm\!\left( \psi_m \nabla\varphi(T_L)\, \psi_m^{-1} \right), \qquad \psi_m = \prod_{1 \le k \le m} L_k \]
where the product is ordered as in $T_L$. Let $L_m(t)$, $m = 1, \ldots, N$, be the integral curve of this equation which starts at $L_m^0$. The construction of this curve reduces to the factorization problem associated with the chosen r-matrix. Explicitly, we get
\[ L_m(t) = g_{m+1}(t)_+^{-1} L_m^0\, g_m(t)_+ = g_{m+1}(t)_-^{-1} L_m^0\, g_m(t)_- \]
where $(g_m(t)_+, g_m(t)_-)$ is the curve in $G^*$ which solves the factorization problem
\[ g_m(t)_+\, g_m(t)_-^{-1} = \psi_m^0 \exp\!\left( t \nabla\varphi(T(L^0)) \right) (\psi_m^0)^{-1}, \qquad \psi_m^0 = \psi_m(L^0) \]
This result exhibits the double role of the r-matrix. On the one hand, it serves to define the Poisson structure on $G^N$ which is adapted to the study of lattice zero-curvature equations; in particular, the dynamical flow associated with these equations is automatically confined to symplectic leaves in $G^N$. (In applications, $G$ is usually a loop group equipped with a factorizable r-matrix; despite the fact that $\dim G = \infty$, it admits plenty of finite-dimensional symplectic leaves.) In its second incarnation, the r-matrix serves to define the factorization problem which solves these zero-curvature equations. In the loop
group case, this is a matrix Riemann problem; its explicit solution is based on the study of the spectral curve associated with the monodromy matrix $T_L$ and uses the technique of algebraic geometry.

The monodromy map $T : G^N \to G$ may be regarded as a nonabelian moment map associated with an action of the dual Lie algebra $\mathfrak{g}^*$ on the phase space. This action actually extends to an action of the (local) Lie group $G^*$ which transforms solutions into solutions again. This is the prototype dressing action (originally defined by Zakharov and Shabat in their study of zero-curvature equations related to Riemann–Hilbert problems). Dressing provides an effective tool to produce new solutions of zero-curvature equations from the trivial ones; it was also the first nontrivial example of a Poisson group action.

See also: Affine Quantum Groups; Bicrossproduct Hopf Algebras and Noncommutative Spacetime; Bi-Hamiltonian Methods in Soliton Theory; Deformations of the Poisson Bracket on a Symplectic Manifold; Functional Equations and Integrable Systems; Hamiltonian Fluid Dynamics; Hopf Algebras and q-Deformation Quantum Groups; Integrable Systems and Recursion Operators on Symplectic and Jacobi Manifolds; Integrable Systems: Overview; Lie, Symplectic and Poisson Groupoids, and their Lie Algebroids; Multi-Hamiltonian Systems; Poisson Reduction; Recursion Operators in Classical Mechanics; Toda Lattices; Yang–Baxter Equations.

Further Reading

Babelon O, Bernard D, and Talon M (2003) Introduction to Classical Integrable Systems. Cambridge: Cambridge University Press.
Belavin AA and Drinfeld VG (1984) Triangle equations and simple Lie algebras. In: Mathematical Physics Reviews, vol. 4, Soviet Scientific Reviews Section C Mathematical Physics Reviews, pp. 93–165. Chur: Harwood Academic Publishers. Reprinted in 1998, Classic Reviews in Mathematics and Mathematical Physics, vol. 1. Amsterdam: Harwood Academic Publishers.
Chari V and Pressley A (1995) A Guide to Quantum Groups. Cambridge: Cambridge University Press.
Drinfeld VG (1987) Quantum groups. In: Proceedings of the International Congress of Mathematicians (Berkeley, Calif., 1986), vol. 1, pp. 798–820. Providence, RI: American Mathematical Society.
Etingof P and Schiffmann O (1998) Lectures on Quantum Groups. Boston: International Press.
Frenkel E, Reshetikhin N, and Semenov-Tian-Shansky MA (1998) Drinfeld–Sokolov reduction for difference operators and deformations of W-algebras. I. The case of Virasoro algebra. Communications in Mathematical Physics 192(3): 605–629.
Lu J-H (1991) Momentum mappings and reduction of Poisson actions. In: Symplectic Geometry, Groupoids, and Integrable Systems (Berkeley, CA, 1989), Mathematical Sciences Research Institute Publications, vol. 20, pp. 209–226. New York: Springer.
Lu J-H and Weinstein A (1990) Poisson–Lie groups, dressing transformations, and Bruhat decompositions. Journal of Differential Geometry 31(2): 501–526.
Reshetikhin N (2000) Characteristic systems on Poisson–Lie groups and their quantization. In: Integrable Systems: From Classical to Quantum (Montreal, QC, 1999), CRM Proceedings Lecture Notes, vol. 26, pp. 165–188. Providence, RI: American Mathematical Society.
Reshetikhin NY and Semenov-Tian-Shansky MA (1990) Central extensions of quantum current groups. Letters in Mathematical Physics 19(2): 133–142.
Reyman AG and Semenov-Tian-Shansky MA (1994) Group-theoretical methods in the theory of finite-dimensional integrable systems. In: Encyclopaedia of Mathematical Sciences, Dynamical Systems VII, ch. 2, vol. 16, pp. 116–225. Berlin: Springer.
Semenov-Tian-Shansky MA (1994) Lectures on R-matrices, Poisson–Lie groups and integrable systems. In: Babelon O, Cartier P, and Kosmann-Schwarzbach Y (eds.) Lectures on Integrable Systems (Sophia-Antipolis, 1991), pp. 269–317. River Edge: World Scientific.
Terng C-L and Uhlenbeck K (1998) Poisson actions and scattering theory for integrable systems. In: Surveys in Differential Geometry: Integrable Systems, pp. 315–402. Lectures on geometry and topology, sponsored by Lehigh University's Journal of Differential Geometry. A supplement to the Journal of Differential Geometry. Edited by Chuu Lian Terng and Karen Uhlenbeck. Surveys in Differential Geometry IV, Boston: International Press.

Clifford Algebras and Their Representations

Clifford algebras and spinors are implicit in Euclid's solution of the Pythagorean equation $x^2 - y^2 + z^2 = 0$, which is equivalent to
\[ \begin{pmatrix} y - x & z \\ z & y + x \end{pmatrix} = 2 \begin{pmatrix} p \\ q \end{pmatrix} \begin{pmatrix} p & q \end{pmatrix} \tag{1} \]
and gives $x = q^2 - p^2$, $y = p^2 + q^2$, $z = 2pq$. If the numbers appearing in [1] are real, then this equation can be interpreted as providing a representation of a vector $(x, y, z) \in \mathbb{R}^3$, null with respect to a quadratic form of signature (1, 2), as the square of a spinor $(p, q) \in \mathbb{R}^2$. The pure spinors of Cartan (1938) provide a generalization of this observation to

every $s \in S_1$ and $\omega \in S_2^*$. If $S_1$ and $S_2$ are complex vector spaces, then a map $f : S_1 \to S_2$ is said to be semilinear if it is $\mathbb{R}$-linear and $f(is) = -i f(s)$. The complex conjugate of a finite-dimensional complex vector space $S$ is the complex vector space $\bar{S}$ of all semilinear maps from $S^*$ to $\mathbb{C}$. There is a natural semilinear isomorphism (complex conjugation) $S \to \bar{S}$, $s \mapsto \bar{s}$, such that $\overline{\langle \omega, s \rangle} = \langle \bar{s}, \omega \rangle$ for every $\omega \in S^*$. The space $\bar{\bar{S}}$ can be identified with $S$ and then $\bar{\bar{s}} = s$. The spaces $(\bar{S})^*$ and $\overline{S^*}$ are identified. If $f : S_1 \to S_2$ is a complex-linear map, then there is the complex-conjugate map $\bar{f} : \bar{S}_1 \to \bar{S}_2$ given by $\bar{f}(\bar{s}) = \overline{f(s)}$ and the Hermitian conjugate map $f^\dagger \overset{\text{def}}{=} \bar{f}^* : \bar{S}_2^* \to \bar{S}_1^*$. A linear map $A : S \to \bar{S}^*$ such that $A^\dagger = A$ is said to
quadratic (resp., symplectic) if $B$ is symmetric (resp., antisymmetric and nonsingular). A quadratic space is characterized by its quadratic form $s \mapsto B(s, s)$. For $K = \mathbb{C}$, a Hermitian map $A : S \to \bar{S}^*$ defines a Hermitian scalar product $A(s, t) = \langle \bar{s}, A(t) \rangle$.

An orthogonal space is defined here as a quadratic space $(S, B)$ such that $B : S \to S^*$ is an isomorphism. The group of automorphisms of an orthogonal space is the orthogonal group $O(S, B)$. The group of automorphisms of a symplectic space is the symplectic group $Sp(S, B)$. The dimension of a symplectic space is even. If $S = K^{2n}$ is a symplectic space over $K = \mathbb{R}$ or $\mathbb{C}$, then its symplectic group is denoted by $Sp_{2n}(K)$. Two quaternionic symplectic groups appear in the list of spin groups of low-dimensional spaces:
\[ Sp_2(\mathbb{H}) = \{ a \in \mathbb{H}(2) \mid a^\dagger a = I \} \]
and
\[ Sp_{1,1}(\mathbb{H}) = \{ a \in \mathbb{H}(2) \mid a^\dagger \sigma_z a = \sigma_z \} \]
Here $a^\dagger$ denotes the matrix obtained from $a$ by transposition and quaternionic conjugation.

Contractions, frames, and orthogonality From now on, unless otherwise specified, $(V, g)$ is a quadratic space of dimension $m$. Let $\wedge V = \bigoplus_{p=0}^{m} \wedge^p V$ be its exterior (Grassmann) algebra. For every $v \in V$ and $w \in \wedge V$ there is the contraction $g(v) \lrcorner\, w$ characterized as follows. The map $V \times \wedge V \to \wedge V$, $(v, w) \mapsto g(v) \lrcorner\, w$, is bilinear; if $x \in \wedge^p V$, then $g(v) \lrcorner (x \wedge w) = (g(v) \lrcorner\, x) \wedge w + (-1)^p x \wedge (g(v) \lrcorner\, w)$ and $g(v) \lrcorner\, v = g(v, v)$.

A frame $(e_\mu)$ in a quadratic space $(V, g)$ is said to be a quadratic frame if $\mu \neq \nu$ implies $g(e_\mu, e_\nu) = 0$. For every subset $W$ of $V$ there is the orthogonal subspace $W^\perp$ containing all vectors that are orthogonal to every element of $W$.

If $(V, g)$ is a real orthogonal space, then there is an orthonormal frame $(e_\mu)$, $\mu = 1, \ldots, m$, in $V$ such that $k$ frame vectors have squares equal to $-1$, $l$ frame vectors have squares equal to $1$ and $k + l = m$. The pair $(k, l)$ is the signature of $g$. The quadratic form $g$ is said to be neutral if the orthogonal space $(V, g)$ admits two maximal totally null subspaces $W$ and $W'$ such that $V = W \oplus W'$. Such a space $V$ is 2n-dimensional, either complex or real with $g$ of signature $(n, n)$. A Lorentzian space has maximal totally null subspaces of dimension 1 and a Euclidean space, characterized by a definite quadratic form, has no null subspaces. The Minkowski space is a Lorentzian space of dimension 4.

If $(V, g)$ is a complex orthogonal space, then an orthonormal frame $(e_\mu)$, $\mu = 1, \ldots, m$, can be chosen in $V$ so that, defining $g_{\mu\nu} = g(e_\mu, e_\nu)$, one has $g_{\mu\mu} = (-1)^{\mu+1}$ and, if $\mu \neq \nu$, then $g_{\mu\nu} = 0$.

If $A : S \to \bar{S}^*$ is a Hermitian isomorphism, then there is a (pseudo)unitary frame $(e_\alpha)$ in $S$ such that the matrix $A_{\alpha\beta} = A(e_\alpha, e_\beta)$ is diagonal, has $p$ entries equal to $1$ and $q$ entries equal to $-1$ on the diagonal, $p + q = \dim S$. If $p = q$, then $A$ is said to be neutral. $A$ is definite if either $p$ or $q = 0$.

Algebras

Definitions An algebra over $K$ is a vector space $A$ over $K$ with a bilinear map $A \times A \to A$, $(a, b) \mapsto ab$, which is distributive with respect to addition. The algebra is associative if $(ab)c = a(bc)$ holds for all $a, b, c \in A$. It is commutative if $ab = ba$ for all $a, b \in A$. An element $1_A$ is the unit of $A$ if $1_A a = a 1_A = a$ holds for every $a \in A$.

From now on, unless otherwise specified, the bare word algebra denotes a finite-dimensional, associative algebra over $K = \mathbb{R}$ or $\mathbb{C}$, with a unit element. If $S$ is an N-dimensional vector space over $K$, then the set $\mathrm{End}\, S$ of all endomorphisms of $S$ is an $N^2$-dimensional algebra over $K$, the product being defined by composition; if $f, g \in \mathrm{End}\, S$, then one writes $fg$ instead of $f \circ g$; the unit of $\mathrm{End}\, S$ is the identity map $I$. By definition, homomorphisms of algebras map units into units. The map $K \to A$, $a \mapsto a 1_A$ is injective and one identifies $K$ with its image in $A$ by this map so that the unit can be represented by $1 \in K \subset A$. A set $B \subset A$ is said to generate $A$ if every element of $A$ can be represented as a linear combination of products of elements of $B$. For example, if $V$ is a vector space over $K$, then its tensor algebra
\[ T(V) = \bigoplus_{p=0}^{\infty} \otimes^p V \]
is an (infinite-dimensional) algebra over $K$ generated by $K \oplus V$. The algebra of all $N \times N$ matrices with entries in an algebra $A$ is denoted by $A(N)$. Its unit element is the unit matrix $I$. In particular, $\mathbb{R}(N)$, $\mathbb{C}(N)$, and $\mathbb{H}(N)$ are algebras over $\mathbb{R}$. The algebra $\mathbb{R}(2)$ is generated by the set $\{\sigma_x, \sigma_z\}$. As a vector space, the algebra $\mathbb{R}(2)$ is spanned by the set $\{I, \sigma_x, \varepsilon, \sigma_z\}$.

The direct sum $A \oplus B$ of the algebras $A$ and $B$ over $K$ is an algebra over $K$ such that its underlying vector space is $A \oplus B$ and the product is defined by $(a, b) \cdot (a', b') = (aa', bb')$ for every $a, a' \in A$ and $b, b' \in B$. Similarly, the product in the tensor product algebra $A \otimes_K B$ is defined by
\[ (a \otimes b) \cdot (a' \otimes b') = aa' \otimes bb' \tag{3} \]
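The statements about $\mathbb{R}(2)$ and the product rule [3] can be tried out on small matrices. The sketch below is only an illustration under stated assumptions: $\sigma_x$ and $\sigma_z$ are taken to be the usual real Pauli matrices, $\varepsilon$ is assumed to be the real matrix $\sigma_z \sigma_x$ (its definition lies outside this passage), and the tensor product of matrix algebras is realized by the Kronecker product. The helper names `expand` and `combine` are hypothetical.

```python
from fractions import Fraction as Fr

def mul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def kron(a, b):
    """Kronecker product, realizing a (x) b for matrices."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

I2 = [[1, 0], [0, 1]]
sx = [[0, 1], [1, 0]]
sz = [[1, 0], [0, -1]]
eps = mul(sz, sx)  # assumed convention: eps = sigma_z sigma_x

def expand(m):
    """Coefficients of an integer 2x2 matrix in the basis (I, sx, eps, sz)."""
    (a, b), (c, d) = m
    return (Fr(a + d, 2), Fr(b + c, 2), Fr(b - c, 2), Fr(a - d, 2))

def combine(coeffs):
    """Rebuild a matrix from its coefficients in the basis (I, sx, eps, sz)."""
    basis = (I2, sx, eps, sz)
    return [[sum(c * mat[i][j] for c, mat in zip(coeffs, basis))
             for j in range(2)] for i in range(2)]

# Every real 2x2 matrix is a combination of I, sx, eps, sz; since
# I = sx*sx and eps = sz*sx, the pair {sx, sz} generates R(2).
m = [[3, -1], [4, 2]]
coeffs = expand(m)
rebuilt = combine(coeffs)

# Product rule (a (x) b)(a' (x) b') = aa' (x) bb' on sample matrices
a, b = [[1, 2], [3, 4]], [[0, 1], [1, 1]]
a2, b2 = [[2, 0], [1, 1]], [[1, -1], [0, 3]]
lhs = mul(kron(a, b), kron(a2, b2))
rhs = kron(mul(a, a2), mul(b, b2))

print(rebuilt == m, lhs == rhs)
```

The decomposition in `expand` is just the solution of four linear equations for the four coefficients, which is why halves appear and exact fractions are used.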
For example, if $A$ is an algebra over $\mathbb{R}$, then the tensor product algebra $\mathbb{R}(N) \otimes_{\mathbb{R}} A$ is isomorphic to $A(N)$ and
\[ K(N) \otimes_K K(N') = K(NN') \tag{4} \]
for $K = \mathbb{R}$ or $\mathbb{C}$ and $N, N' \in \mathbb{N}$. There are isomorphisms of algebras over $\mathbb{R}$:
\[ \mathbb{C} \otimes_{\mathbb{R}} \mathbb{C} = \mathbb{C} \oplus \mathbb{C}, \qquad \mathbb{C} \otimes_{\mathbb{R}} \mathbb{H} = \mathbb{C}(2), \qquad \mathbb{H} \otimes_{\mathbb{R}} \mathbb{H} = \mathbb{R}(4) \tag{5} \]
An algebra over $\mathbb{R}$ can be complexified by complexifying its underlying vector space; it follows from [5] that $\mathbb{C}(2)$ is the complex algebra obtained by complexification of the real algebra $\mathbb{H}$.

The center of an algebra $A$ is the set
\[ Z(A) = \{ a \in A \mid ab = ba\ \ \forall\, b \in A \} \]
The center is a commutative subalgebra containing $K$. An algebra over $K$ is said to be central if its center coincides with $K$. The algebras $\mathbb{R}(N)$ and $\mathbb{H}(N)$ are central over $\mathbb{R}$. The algebra $\mathbb{C}(N)$ is central over $\mathbb{C}$, but not over $\mathbb{R}$.

Simplicity and representations Let $B_1$ and $B_2$ be subsets of the algebra $A$. Define $B_1 B_2 = \{ b_1 b_2 \mid b_1 \in B_1,\ b_2 \in B_2 \}$. A vector subspace $B$ of $A$ is said to be a left (resp., right) ideal of $A$ if $AB \subset B$ (resp., $BA \subset B$). A two-sided ideal or simply an ideal is a left and right ideal. An algebra $A \neq \{0\}$ is said to be simple if its only two-sided ideals are $\{0\}$ and $A$. For example, the algebras $\mathbb{R}(N)$ and $\mathbb{H}(N)$ are simple over $\mathbb{R}$; the algebra $\mathbb{C}(N)$ is simple when considered as an algebra over both $\mathbb{R}$ and $\mathbb{C}$; every associative, finite-dimensional simple algebra over $\mathbb{R}$ or $\mathbb{C}$ is isomorphic to one of them.

A representation of an algebra $A$ over $K$ in a vector space $S$ over $K$ is a homomorphism of algebras $\rho : A \to \mathrm{End}\, S$. If $\rho$ is injective, then the representation is said to be faithful. For example, the regular representation $\rho : A \to \mathrm{End}\, A$ of an algebra $A$, defined by $\rho(a) b = ab$ for all $a, b \in A$, is faithful. A vector subspace $T$ of the vector space $S$ carrying a representation $\rho$ of $A$ is said to be invariant for $\rho$ if $\rho(a) T \subset T$ for every $a \in A$; it is proper if distinct from both $\{0\}$ and $S$.

isomorphism, then the representations $\rho_1$ and $\rho_2$ are said to be equivalent, $\rho_1 \sim \rho_2$. The following two propositions are classical:

Proposition (A)
(i) An algebra over $K$ is simple if and only if it admits a faithful irreducible representation in a vector space over $K$. Such a representation is unique, up to equivalence.
(ii) The complexification of a central simple algebra over $\mathbb{R}$ is a central simple algebra over $\mathbb{C}$.

For real algebras, one often considers complex representations, that is, representations in complex vector spaces. Two such representations $\rho_1 : A \to \mathrm{End}\, S_1$ and $\rho_2 : A \to \mathrm{End}\, S_2$ are said to be complex equivalent if there is a complex isomorphism $F : S_1 \to S_2$ intertwining the representations; they are real equivalent if there is an isomorphism among the realifications of $S_1$ and $S_2$, intertwining the representations. For example, $\mathbb{C}$, considered as an algebra over $\mathbb{R}$, has two complex-inequivalent representations in $\mathbb{C}$: the identity representation and its complex conjugate. The realifications of these representations, given by $i \mapsto \varepsilon$ and $i \mapsto -\varepsilon$, respectively, are real equivalent: they are intertwined by $\sigma_z$. The real algebra $\mathbb{H}$, being central simple, has only one, up to complex equivalence, representation in $\mathbb{C}^2$: every such representation is equivalent to the one given by
\[ i \mapsto \sigma_x / \sqrt{-1}, \qquad j \mapsto \sigma_y / \sqrt{-1}, \qquad k \mapsto \sigma_z / \sqrt{-1} \]
This representation extends to an injective homomorphism of algebras $i : \mathbb{H}(N) \to \mathbb{C}(2N)$ which is used to define the quaternionic determinant of a matrix $a \in \mathbb{H}(N)$ as $\det_{\mathbb{H}} a = \det i(a)$, so that $\det_{\mathbb{H}}(a) \geq 0$ and $\det_{\mathbb{H}}(ab) = \det_{\mathbb{H}}(a) \det_{\mathbb{H}}(b)$ for every $a, b \in \mathbb{H}(N)$. In particular, if $q \in \mathbb{H}$ and $\lambda, \mu \in \mathbb{R}$, then $\det_{\mathbb{H}}(q) = \bar{q} q$ and
\[ \det\nolimits_{\mathbb{H}} \begin{pmatrix} \lambda & -\bar{q} \\ q & \mu \end{pmatrix} = (\bar{q} q + \lambda\mu)^2 \tag{6} \]
There are quaternionic unimodular groups $SL_N(\mathbb{H}) = \{ a \in \mathbb{H}(N) \mid \det_{\mathbb{H}} a = 1 \}$. For example, the group $SL_1(\mathbb{H})$ is isomorphic to $SU_2$ and $SL_2(\mathbb{H})$ is a noncompact, 15-dimensional Lie group, one of the spin groups in six dimensions.
For example, a left ideal of A is invariant for the
regular representation. Given an invariant subspace
T of one can reduce to T by forming the Antiautomorphisms and inner products An auto-
representation T : A ! End T, where T (a)s = (a)s morphism of an algebra A is a linear isomorphism :
for every a 2 A and s 2 T. A representation is A ! A such that (ab) = (a)(b). An invertible
irreducible if it has no proper invariant subspaces. element c 2 A defines an inner automorphism Ad(c) 2
A linear map F : S1 ! S2 is said to intertwine the GL(A), Ad(c)a = cac1 . Complex conjugation in C,
representations 1 : A ! End S1 and 2 : A ! End S2 if considered as an algebra over R, is an automorphism
F1 (a) = 2 (a)F holds for every a 2 A. If F is an that is not inner. An antiautomorphism of an
algebra $A$ is a linear isomorphism $\beta: A \to A$ such that $\beta(ab) = \beta(b)\beta(a)$ for all $a, b \in A$. An (anti)automorphism $\beta$ is involutive if $\beta^2 = \mathrm{id}$. For example, conjugation of quaternions defines an involutive antiautomorphism of $\mathbb{H}$.
Let $\rho: A \to \operatorname{End} S$ be a representation of an algebra with an involutive antiautomorphism $\beta$. There is then the contragredient representation $\check{\rho}: A \to \operatorname{End} S^*$ given by $\check{\rho}(a) = (\rho(\beta(a)))^*$. If, moreover, $A$ is central simple and $\rho$ is faithful irreducible, then there is an isomorphism $B: S \to S^*$ intertwining $\rho$ and $\check{\rho}$ which is either symmetric, $B^* = B$, or antisymmetric, $B^* = -B$. It defines on $S$ the structure of an inner-product space. This structure extends to $\operatorname{End} S$: there is a symmetric isomorphism $B \otimes B^{-1}: \operatorname{End} S \to (\operatorname{End} S)^* = \operatorname{End} S^*$ given, for every $f \in \operatorname{End} S$, by $(B \otimes B^{-1})(f) = BfB^{-1}$.
Let $K^\times = K \setminus \{0\}$ be the multiplicative group of the field $K$. Given a simple algebra $A$ with an involutive antiautomorphism $\beta$, one defines $N(a) = \beta(a)a$ and the group
$$G(\beta) = \{a \in A \mid N(a) \in K^\times\}$$

when one reduces the degree of every element mod 2. A graded isomorphism of graded algebras is an isomorphism that preserves the grading.
A $\mathbb{Z}_2$-grading of $A$ is characterized by the involutive automorphism $\alpha$ such that, if $a \in A^p$, then $\alpha(a) = (-1)^p a$. From now on, grading means $\mathbb{Z}_2$-grading unless otherwise specified. The elements of $A^0$ (resp., $A^1$) are said to be even (resp., odd). It is often convenient to denote the graded algebra as
$$A^0 \to A \qquad [7]$$
Given such an algebra over $K$ and $N \in \mathbb{N}$, one constructs the graded algebra $A^0(N) \to A(N)$. Two graded algebras over $K$, $A^0 \to A$ and $A'^0 \to A'$, are said to be of the same type if there are integers $N$ and $N'$ such that the algebras $A^0(N) \to A(N)$ and $A'^0(N') \to A'(N')$ are graded isomorphic. The property of being of the same type is an equivalence relation in the set of all graded algebras over $K$.
Given an algebra $A$, one constructs two canonical graded algebras as follows:
and $b \in A^q$, is given as the supercommutator $[a, b] = ab - (-1)^{pq} ba$.

Supercentrality and graded simplicity  A graded algebra $A$ over $K$ is supercentral if $Z(A) \cap A^0 = K$. The algebra $\mathbb{R} \to \mathbb{C}$ is supercentral, but the real ungraded algebra $\mathbb{C}$ is not central.
A subalgebra $B$ of a graded algebra $A$ is said to be a graded subalgebra if $B = (B \cap A^0) \oplus (B \cap A^1)$. A graded ideal of $A$ is an ideal that is a graded subalgebra. A graded algebra $A \neq \{0\}$ is said to be graded simple if it has no graded ideals other than $\{0\}$ and $A$. The double algebra of a simple algebra is graded simple, but not simple.

map such that $f(v)^2 = -g(v, v)1_A$ for every $v \in V$. There then exists a homomorphism $\hat{f}: \mathrm{C}\ell(V, g) \to A$ of algebras with units, an extension of $f$, so that $\hat{f}(v) = f(v)$ for every $v \in V$.
As a corollary, one obtains

Proposition (D)  If $f$ is an isometry of $(V, g)$ into $(W, h)$, then there is a homomorphism of algebras $\mathrm{C}\ell(f): \mathrm{C}\ell(V, g) \to \mathrm{C}\ell(W, h)$ extending $f$ so that there is the commutative diagram
$$\begin{array}{ccc} \mathrm{C}\ell(V, g) & \xrightarrow{\ \mathrm{C}\ell(f)\ } & \mathrm{C}\ell(W, h) \\ \uparrow & & \uparrow \\ V & \xrightarrow{\ \ f\ \ } & W \end{array}$$
0-form, then [10] shows that the Clifford and exterior multiplications coincide and $\mathrm{C}\ell(V, 0)$ is isomorphic, as an algebra, to the Grassmann algebra.

Complexification of Real Clifford Algebras
Proposition (F)  If $(V, g)$ is a real quadratic space, then the algebras $\mathbb{C} \otimes \mathrm{C}\ell(V, g)$ and $\mathrm{C}\ell(\mathbb{C} \otimes V, \mathbb{C} \otimes g)$ are isomorphic, as graded algebras over $\mathbb{C}$.
From now on, through the end of the article, one assumes that $(V, g)$ is an orthogonal space over $K = \mathbb{R}$ or $\mathbb{C}$.
The Clifford algebra associated with the orthogonal space $\mathbb{C}^m$ is denoted by $\mathrm{C}\ell_m$. The Clifford algebra associated with the orthogonal space $(\mathbb{R}^{k+l}, g)$, where $g$ is of signature $(k, l)$, is denoted by $\mathrm{C}\ell_{k,l}$, so that $\mathbb{C} \otimes \mathrm{C}\ell_{k,l} = \mathrm{C}\ell_{k+l}$.

Relations between Clifford Algebras in Spaces of Adjacent Dimensions
Consider an orthogonal space $(V, g)$ over $K$ and the one-dimensional orthogonal space $(K, h_1)$, having a unit vector $w \in K$, $h_1(w, w) = \varepsilon$, where $\varepsilon = 1$ or $-1$. The map $V \ni v \mapsto vw \in \mathrm{C}\ell^0(V \oplus K, g \oplus h_1)$ satisfies $(vw)^2 = -\varepsilon g(v, v)$ and extends to the isomorphism of algebras $\mathrm{C}\ell(V, \varepsilon g) \to \mathrm{C}\ell^0(V \oplus K, g \oplus h_1)$. This proves

Proposition (G)  There are isomorphisms of algebras: $\mathrm{C}\ell_m \to \mathrm{C}\ell^0_{m+1}$ and $\mathrm{C}\ell_{k,l} \to \mathrm{C}\ell^0_{k+1,l}$.

Consider the orthogonal space $(K^2, h)$ with a neutral $h$ such that, for $\lambda, \mu \in K$, one has $\langle (\lambda, \mu), h(\lambda, \mu) \rangle = \lambda\mu$. The map
$$K^2 \to K(2), \qquad (\lambda, \mu) \mapsto \begin{pmatrix} 0 & \lambda \\ -\mu & 0 \end{pmatrix}$$
has the Clifford property and establishes the isomorphisms represented by the horizontal arrows in the diagram
$$\begin{array}{ccc} \mathrm{C}\ell(K^2, h) & \to & K(2) \\ \uparrow & & \uparrow \\ \mathrm{C}\ell^0(K^2, h) & \to & K \oplus K \end{array} \qquad [11]$$
Proposition (H)  If $(K^2, h)$ is neutral and $(V, g)$ is over $K$, then the algebra $\mathrm{C}\ell(V \oplus K^2, g \oplus h)$ is isomorphic to the algebra $\mathrm{C}\ell(V, g) \otimes K(2)$. Specifically, there are isomorphisms
$$\mathrm{C}\ell_{k+1,l+1} = \mathrm{C}\ell_{k,l} \otimes \mathbb{R}(2), \qquad \mathrm{C}\ell_{m+2} = \mathrm{C}\ell_m \otimes \mathbb{C}(2) \qquad [12]$$

The Chevalley Theorem and the Brauer-Wall Group
If $(V, g)$ and $(W, h)$ are quadratic spaces over $K$, then their sum is the quadratic space $(V \oplus W, g \oplus h)$ characterized by $g \oplus h: V \oplus W \to V^* \oplus W^*$ so that $(g \oplus h)(v, w) = (g(v), h(w))$. By noting that the map
$$V \oplus W \ni (v, w) \mapsto v \otimes 1 + 1 \otimes w \in \mathrm{C}\ell(V, g) \mathbin{\hat{\otimes}} \mathrm{C}\ell(W, h)$$
has the Clifford property, Chevalley proved

Proposition (I)  The algebra $\mathrm{C}\ell(V \oplus W, g \oplus h)$ is isomorphic to the algebra $\mathrm{C}\ell(V, g) \mathbin{\hat{\otimes}} \mathrm{C}\ell(W, h)$.

The type of the (graded) algebra $\mathrm{C}\ell(V \oplus W, g \oplus h)$ depends only on the types of $\mathrm{C}\ell(V, g)$ and $\mathrm{C}\ell(W, h)$. The Chevalley theorem (I) shows that the set of types of Clifford algebras over $K$ forms an abelian group for a multiplication induced by the graded tensor product. The unit of this Brauer-Wall group of $K$ is the type of the algebra $\mathrm{C}\ell(K^2, h)$ described in [11]; for a full account with proofs, see Wall (1963).

The Volume Element and the Centers
Let $e = (e_\mu)$ be an orthonormal frame in $(V, g)$. The volume element associated with $e$ is
$$\eta = e_1 e_2 \cdots e_m$$
If $\eta'$ is the volume element associated with another orthonormal frame $e'$ in the same orthogonal space, then either $\eta' = \eta$ ($e$ and $e'$ are of the same orientation) or $\eta' = -\eta$ ($e$ and $e'$ are of opposite orientation). For $K = \mathbb{C}$, one has $\eta^2 = 1$; for $K = \mathbb{R}$ and $g$ of signature $(k, l)$ one has
$$\eta^2 = (-1)^{\frac{1}{2}(k-l)(k-l+1)} \qquad [13]$$
It is convenient to define $\iota \in \{1, i\}$ so that $\eta^2 = \iota^2$. For every $v \in V$ one has $v\eta = (-1)^{m+1}\eta v$. The structure of the centers of Clifford algebras is as follows:

Proposition (J)  If $m$ is even, then $Z(\mathrm{C}\ell(V, g)) = K$ and $Z(\mathrm{C}\ell^0(V, g)) = K \oplus K\eta$. If $m$ is odd, then $Z(\mathrm{C}\ell(V, g)) = K \oplus K\eta$ and $Z(\mathrm{C}\ell^0(V, g)) = K$. The graded algebra $\mathrm{C}\ell(V, g)$ is supercentral for every $m$.

The Structure of Clifford Algebras
The complex case  Using [4] one obtains from [11] and [12] the isomorphisms of algebras
$$\mathrm{C}\ell^0_{2n+1} = \mathrm{C}\ell_{2n} = \mathbb{C}(2^n) \qquad [14]$$
$$\mathrm{C}\ell_{2n+1} = \mathrm{C}\ell^0_{2n+2} = \mathbb{C}(2^n) \oplus \mathbb{C}(2^n) \qquad [15]$$
for $n = 0, 1, 2, \dots$ Therefore, there are only two types of complex Clifford algebras, represented by $\mathbb{C} \to \mathbb{C} \oplus \mathbb{C}$ and $\mathbb{C} \oplus \mathbb{C} \to \mathbb{C}(2)$: the Brauer-Wall group of $\mathbb{C}$ is $\mathbb{Z}_2$.
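The sign rule [13] for the square of the volume element and the commutation rule $v\eta = (-1)^{m+1}\eta v$ can be checked numerically on explicit matrices. The sketch below is only an illustration under a stated assumption: frame vectors are taken to satisfy $e_\mu^2 = -g(e_\mu, e_\mu)$, so that in signature $(k, l)$ the first $k$ vectors square to $-1$ and the last $l$ to $+1$. The helper names `frame` and `check` are hypothetical, and the matrices are a standard tensor-product construction from Pauli matrices, not a representation singled out by the article.

```python
import numpy as np
from functools import reduce

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def frame(k, l):
    """Anticommuting matrices e_1..e_m, m = k + l (assumed convention:
    the first k square to -1, the last l square to +1)."""
    m = k + l
    n = max(1, (m + 1) // 2)
    gens = []
    for j in range(n):
        # Each pair squares to +1 and anticommutes with all earlier generators.
        gens.append(reduce(np.kron, [SZ] * j + [SX] + [I2] * (n - j - 1)))
        gens.append(reduce(np.kron, [SZ] * j + [SY] + [I2] * (n - j - 1)))
    gens = gens[:m]
    return [1j * g if mu < k else g for mu, g in enumerate(gens)]


def check(k, l):
    """True if eta^2 and v*eta agree with the two sign rules quoted above."""
    e = frame(k, l)
    m = k + l
    eta = reduce(np.matmul, e)  # volume element e_1 e_2 ... e_m
    dim = eta.shape[0]
    sign = (-1) ** (((k - l) * (k - l + 1)) // 2)
    square_ok = np.allclose(eta @ eta, sign * np.eye(dim))
    commute_ok = all(
        np.allclose(v @ eta, (-1) ** (m + 1) * (eta @ v)) for v in e
    )
    return square_ok and commute_ok
```

Calling `check(k, l)` for small signatures, for instance all pairs with $1 \le k + l \le 6$, exercises both identities; under the opposite sign convention for $e_\mu^2$ the exponent in [13] would change, so the check also serves as a test of the assumed convention.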
The real case In view of proposition (I) and The spinorial clock is symmetric with respect to
C1, 1 = R(2), the algebra Ck, l is of the same type as the reflection in the vertical line through its center;
Ckl, 0 if k > l and of the same type as C0, lk this is a consequence of the isomorphism of algebras
if k < l. Since Ck, l ^ Cl, k = Ckl, kl , the type Ck, l2 = Cl, k R(2).
of Cl, k is the inverse of the type of Ck, l . The algebra Note that the abstract algebra Ck, l carries, in
C04, 0 ! C4, 0 is isomorphic to H H ! H(2): if general, less information than the Clifford algebra
x = (x1 , x2 , x3 , x4 ) 2 R4 C4, 0 , and q = ix1 jx2 defined in [8], which contains V as a distinguished
kx3 x4 2 H, then an isomorphism is obtained from vector subspace with the quadratic form
the Clifford map f , v 7! v2 = g(v, v). For example, the algebras C8, 0 ,
! C4, 4 , and C0, 8 are all graded isomorphic.
0 q
f x = 16
q 0 Theorem on Simplicity
irreducible representation
: C(V, g) ! End S in the Example One of the most used representations
:
2n-dimensional complex vector space S of Dirac C3, 1 ! C(4) is given by the Dirac matrices
spinors. The Dirac endomorphisms (matrices) are ! !
0 x 0 y
=
(e ). Put
(
) so that 2 = I: the matrix
1 = ;
2 =
generalizes the familiar
5 . The Dirac representation
x 0 y 0
restricted to C0 (V, g) decomposes into the sum
of two irreducible representations in the vector spaces
! ! 20
S
= fs 2 S j s =
sg 0 z 0 I
3 = ;
4 =
of Weyl (chiral) spinors. The elements of S are said z 0 I 0
to be of opposite chirality with respect to those of
S . The transpose defines a similar split of S . Change Conjugation and Majorana Spinors
The representations
and
are never complex-
equivalent, but they are real equivalent and Throughout this section and next, one assumes
faithful for K = R and (1=2)(k l) odd. K = R so that, given a representation : C(V, g) !
The representations
and
are both equiva- End S,one can form the complex- (charge) conjugate
lent to
. It is convenient to describe simultaneously representation : C(V, g) ! End S defined by
the properties of the transpositions of the Pauli and (a) = (a) and the Hermitian conjugate representa-
Dirac matrices; let be either the Pauli matrices tion y : C(V, g) ! End S , where y (a) = (a).
for V of dimension 2n 1 or the Dirac matrices for
V of dimension 2n. There is a complex isomorphism Even dimensions The representations
and
are
B : S ! S such that equivalent: there is an isomorphism C : S !
S such
that
= 1n B B1 18
= C
C1 21
n
In the case of the Dirac matrices, the factor (1) in The automorphism CC is in the commutant of
; it
[18] implies that this equation also holds for in is, therefore, proportional to I and, by a change of
place of . The isomorphism B preserves (resp., scale, one can achieve CC = I for k l 0 or
changes) the chirality of Weyl spinors for n even
6 mod 8 and CC I for k l 2 or 4 mod 8.
(resp., odd). Every matrix of the form B
1 . . .
p , The spinor sc C1s 2 S is the charge conjugate of
where s 2 S. If : V ! S is a solution of the Dirac equation
141 <
< p 2n 19
@ iqA = 0
(resp., ) if
2 = 1 (resp.,
2 = 1). There is an of dimension 2(m) , where (m) is the mth Radon
isomorphism C : S ! S such that Hurwitz number given by
m= 1 2 3 4 5 6 7 8
= 11=2kl1 C C1 22
m = 1 2 2 3 3 3 3 4
= I (resp., CC
and CC = I) for k l 1 or 7 mod 8
(resp., k l 3 or 5 mod 8). For k l 1 mod 8, the and (m 8) = (m) 4. The matrices 2 R(2(m) ),
restriction of the Pauli representation to C0k, l is real = 1, . . . , m, defining these representations satisfy
and the Pauli matrices are pure imaginary; for k l v v = 2v I
7 mod 8, the Pauli representations of Ck, l are both real
and so are the Pauli matrices. In both these cases there and can be chosen so as to be antisymmetric. In all
are PauliMajorana spinors. dimensions other than m 3 mod 4 the representa-
tions are faithful.
For m 2 and 4 mod 8 (resp., m 1, 3, and
Hermitian Scalar Products and Multivectors 5 mod 8) the representations are the realifications of
For m = k l odd and C as in [22], the map the corresponding Dirac (resp., Pauli) representations.
:S !
A = BC S intertwines the representations y In dimensions m 0 and 6 mod 8 (resp.,
and (resp., ) for k even (resp., odd), m 7 mod 8) the Dirac (resp., Pauli) representations
themselves are real.
y = 1k A A1
By rescaling of B, the map A can be made Inductive Construction
Hermitian. The corresponding Hermitian form
of Representations
s 7! A(s, s) is definite if and only if k or l = 0;
otherwise, it is neutral. An inductive construction of the Pauli
For m = k l even, the representations
y and
representations
are equivalent and one can define a Hermitian
isomorphism A : S !
S so that : Cn1;n ! R2n1 ; n = 1; 2; . . .
and of the Dirac representations
y = A
A1 23
: Cn;n ! R2n ; n = 1; 2; . . .
The isomorphism A0 = A intertwines the represen-
tations
y and
; it can also be made Hermitian is as follows.
by rescaling. The Hermitian form A(s, s) is definite 1. In dimension 1, put 1 = 1.
for k = 0 and A0 (s, s) is definite for l = 0; otherwise, 2. Given 2 R(2n1 ), = 1, . . . , 2n 1, define
these forms are neutral. For example, in the familiar !
representation [20], one has A =
4 , a neutral form. 0
For p = 0, 1, . . . , m = 2n, two spinors s and t 2 S
= for = 1; . . . ; 2n 1
0
define the p-vector with components
and
A1 ...p s; t = hs; A
1 . . .
p ti 24
!
0 I
where the indices are as in [19]. The Hermiticity of
2n =
A and [23] imply I 0
forms of other signatures. For example, in dimension fields on odd-dimensional spheres can be constructed
3, (1 , i2 , 3 ) are the Pauli matrices. In dimension 4, with the help of the representation described in
multiplying
2 by i one obtains the Dirac matrices for g proposition (M). Given a positive even integer N, let
of signature (1, 3), in the chiral representation: m be the largest integer such that N = 2(m) p, where
p is an odd integer. Consider the unit sphere
0 x 0 y
1 ;
2 SN1 = fx 2 RN j jjxjj = 1g of dimension N 1. For
x 0 y 0 v 2 Rm , put 0 (v) = (v) I, where I 2 R(p) is the
25
0 z 0 I unit matrix. Since (v) is antisymmetric, so is the
3 ;
4
z 0 I 0 matrix 0 (v) 2 R(N). Therefore, for every x 2 SN1 ,
the vector 0 (v)x is orthogonal to x. The map
To obtain the real Majorana representation one uses x 7! 0 (v)x defines a vector field on SN1 that
the following fact: vanishes nowhere unless v = 0 : the (N1)-sphere
Proposition (N) If the matrix C 2 R(2n ) is such admits a set of m tangent vector fields which are
that C2 = I and [21] holds, then the matrices linearly independent at every point. Using methods of
(I iC)
(I iC)1 , = 1, . . . , 2n, {\it are real}. algebraic topology, it has been shown that this
method gives the maximum number of linearly
For the matrices [25], one can take C =
1
3
4 to independent tangent vector fields on spheres.
obtain If m = 1, 3, or 7, then m 1 = 2(m) and, for these
! ! values of m, the sphere Sm is parallelizable. More-
0 x I 0
0
1 = ; 0
2 = over, one can then introduce in Rm1 the structure
x 0 0 I of an algebra Am as follows. Put 0 = I. If e0 2 Rm1
! ! is a unit vector and e = (e0 ), then (e0 , e1 , . . . , em )
0 z 0 I is anPorthonormal framePin Rm1 . The product of
30 = ;
40 = x= m m
= 0 x e and y = = 0 y e is defined to be
z 0 I 0
X
m
representation of $\mathrm{C}\ell(V, g)$ or $\mathbb{C} \otimes \mathrm{C}\ell(V, g)$ gives spinor representations of the spinor groups it contains.

Pin Groups
It is convenient to define a unit vector $v \in V \subset \mathrm{C}\ell(V, g)$ to be such that $v^2 = -1$ for $V$ complex and $v^2 = 1$ or $-1$ for $V$ real. The group $\mathrm{Pin}(V, g)$ is defined as the subgroup of $\mathrm{Cpin}(V, g)$ consisting of products of all finite sequences of unit vectors. Defining now the twisted adjoint representation $\widetilde{\operatorname{Ad}}$ by $\widetilde{\operatorname{Ad}}(a)v = \alpha(a)va^{-1}$, one obtains the exact sequence
$$1 \to \mathbb{Z}_2 \to \mathrm{Pin}(V, g) \xrightarrow{\ \widetilde{\operatorname{Ad}}\ } O(V, g) \to 1 \qquad [27]$$
If $\dim V$ is even, then the adjoint representation $\operatorname{Ad}(a)v = ava^{-1}$ also yields an exact sequence like [27]; if it is odd, then the image of $\operatorname{Ad}$ is $SO(V, g)$ and the kernel is the four-element group $\{1, -1, \eta, -\eta\}$.
Given an orthonormal frame $(e_\mu)$ in $(V, g)$ and $a \in \mathrm{Pin}(V, g)$, one defines the orthogonal matrix $R(a) = (R^\nu_\mu(a))$ by
$$\widetilde{\operatorname{Ad}}(a)e_\mu = e_\nu R^\nu_\mu(a) \qquad [28]$$
If $(V, g)$ is complex, then the algebras $\mathrm{C}\ell(V, g)$ and $\mathrm{C}\ell(V, -g)$ are isomorphic; this induces an isomorphism of the groups $\mathrm{Pin}(V, g)$ and $\mathrm{Pin}(V, -g)$. If $V = \mathbb{C}^m$, then this group is denoted by $\mathrm{Pin}_m(\mathbb{C})$. If $V = \mathbb{R}^{k+l}$ and $g$ is of signature $(k, l)$, then one writes $\mathrm{Pin}(V, g) = \mathrm{Pin}_{k,l}$. A similar notation is used for the spin groups, see below.

Spin Groups
The spin group $\mathrm{Spin}(V, g) = \mathrm{Pin}(V, g) \cap \mathrm{C}\ell^0(V, g)$ is generated by products of all sequences of an even number of unit vectors. Since the algebras $\mathrm{C}\ell^0(V, g)$ and $\mathrm{C}\ell^0(V, -g)$ are isomorphic, so are the groups $\mathrm{Spin}(V, g)$ and $\mathrm{Spin}(V, -g)$. Since $\alpha(a) = a$ for $a \in \mathrm{Spin}(V, g)$, the twisted adjoint representation reduces to the adjoint representation and yields the exact sequence
$$1 \to \mathbb{Z}_2 \to \mathrm{Spin}(V, g) \xrightarrow{\ \operatorname{Ad}\ } SO(V, g) \to 1 \qquad [29]$$
For $V = \mathbb{C}^m$, the spin group is denoted by $\mathrm{Spin}_m(\mathbb{C})$. Since $\mathrm{Spin}_m(\mathbb{C}) \subset G_1(\beta)$, the bilinear form $B$ is invariant with respect to the action of this group.

$u_1 \cdots u_{2p} v_1 \cdots v_{2q}$ such that $u_i^2 = 1$ and $v_j^2 = -1$. The connected groups $\mathrm{Spin}^0_{m,0}$ and $\mathrm{Spin}^0_{0,m}$ are isomorphic and denoted by $\mathrm{Spin}_m$. Since $\mathrm{Spin}^0_{k,l} \subset G_1(\beta)$, the Hermitian form $A$ and the bilinear form $B$ are invariant with respect to the action of this group. Moreover, for $k + l$ even, from [24] and [28] there follows the transformation law of multivectors formed from pairs of spinors,
$$A_{\mu_1 \dots \mu_p}(\gamma(a)s, \gamma(a)t) = A_{\nu_1 \dots \nu_p}(s, t)\, R^{\nu_1}_{\mu_1}(a^{-1}) \cdots R^{\nu_p}_{\mu_p}(a^{-1})$$
Consider $\mathrm{Spin}^0(V, g)$ and assume that either $V$ is complex of dimension $\geq 2$ or real with $k$ or $l \geq 2$. Then there are two unit orthogonal vectors $e_1, e_2 \in V$ such that $(e_1 e_2)^2 = -1$. The vector $u(t) = e_1 \cos t + e_2 \sin t$ is obtained from $e_1$ by rotation in the plane $\operatorname{span}\{e_1, e_2\}$ by the angle $t \in \mathbb{R}$. The curve $t \mapsto e_1 u(t)$, $0 \leq t \leq \pi$, connects the elements $1$ and $-1$ of $\mathrm{Spin}^0(V, g)$. Its image in $SO^0(V, g)$, that is, the curve $t \mapsto \operatorname{Ad}(e_1 u(t))$, $0 \leq t \leq \pi$, is closed: $\operatorname{Ad}(1) = \operatorname{Ad}(-1)$. This fact is often expressed by saying that a spinor undergoing a rotation by $2\pi$ changes sign. There is no homomorphism (not even a continuous map) $f: SO^0(V, g) \to \mathrm{Spin}^0(V, g)$ such that $\operatorname{Ad} \circ f = \mathrm{id}$.

Spin$^c$ Groups
For the purposes of physics, to describe charged fermions, and in the theory of the Seiberg-Witten invariants, one needs the $\mathrm{Spin}^c$ groups that are spinorial extensions of the real orthogonal groups by the group $U_1$ of phase factors. Assume $V$ to be real and $g$ of signature $(k, l)$ so that the sequence [29] can be written as
$$1 \to \mathbb{Z}_2 \to \mathrm{Spin}_{k,l} \to SO_{k,l} \to 1$$
Define the action of $\mathbb{Z}_2 = \{1, -1\}$ in $\mathrm{Spin}_{k,l} \times U_1$ so that $(-1)(a, z) = (-a, -z)$. The quotient $(\mathrm{Spin}_{k,l} \times U_1)/\mathbb{Z}_2 = \mathrm{Spin}^c_{k,l}$ yields the extensions
$$1 \to U_1 \to \mathrm{Spin}^c_{k,l} \to SO_{k,l} \to 1$$
and
$$1 \to \mathrm{Spin}_{k,l} \to \mathrm{Spin}^c_{k,l} \to U_1 \to 1$$
For example, $\mathrm{Spin}_3 = SU_2$ and $\mathrm{Spin}^c_3 = U_2$.
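The two-to-one nature of the sequence [29] and the sign change along the curve $t \mapsto e_1 u(t)$ can be made concrete in dimension 3, where $\mathrm{Spin}_3 = SU_2$. The following sketch is an illustration only: it assumes the frame is realised by the matrices $i\sigma_x, i\sigma_y, i\sigma_z$ (so that each $e_\mu$ squares to $-1$), and the function names `spin_curve` and `rotation` are hypothetical.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

# Assumed realisation of an orthonormal frame: anticommuting, each squaring to -1.
E = [1j * SX, 1j * SY, 1j * SZ]


def spin_curve(t):
    """The group element e1 u(t), where u(t) = e1 cos t + e2 sin t."""
    u = np.cos(t) * E[0] + np.sin(t) * E[1]
    return E[0] @ u


def rotation(a):
    """Matrix R with a e_mu a^{-1} = sum_nu e_nu R[nu, mu]."""
    a_inv = np.linalg.inv(a)
    R = np.empty((3, 3))
    for mu in range(3):
        image = a @ E[mu] @ a_inv
        for nu in range(3):
            # tr(e_nu e_lambda) = -2 delta, hence the factor -1/2.
            R[nu, mu] = (-0.5 * np.trace(E[nu] @ image)).real
    return R
```

With these definitions, `spin_curve(0.0)` and `spin_curve(np.pi)` are opposite multiples of the identity, while `rotation` returns the identity matrix at both endpoints; in between, `rotation(spin_curve(t))` is a rotation by the angle $2t$ in the plane of $e_1$ and $e_2$, and `rotation(a)` equals `rotation(-a)`.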
Consider the four-dimensional vector space (of twistors) $T$ over $K$, with a volume element $\mathrm{vol} \in \wedge^4 T$. The six-dimensional vector space $V = \wedge^2 T$ has a scalar product $g$ defined by $g(u, v)\,\mathrm{vol} = 2u \wedge v$ for $u, v \in V$. The quadratic form $g(u, u)$ is the Pfaffian, $\mathrm{Pf}(u)$. If $u \in V$ is represented by the corresponding isomorphism $T^* \to T$ and $a \in \operatorname{End} T$, then $\mathrm{Pf}(aua^*) = \det a\, \mathrm{Pf}(u)$. The last formula shows $\mathrm{Spin}^0(V, g) = SL(T)$, so that $\mathrm{Spin}_6(\mathbb{C}) = SL_4(\mathbb{C})$. For $K = \mathbb{R}$, the Pfaffian is of signature $(3, 3)$, so that $\mathrm{Spin}^0_{3,3} = SL_4(\mathbb{R})$. A non-null vector $v \in V$ defines a symplectic form on $T^*$. The five-dimensional vector space $v^\perp \subset V$ is invariant with respect to the symplectic group $Sp(T^*, v) = \mathrm{Spin}^0(v^\perp, \mathrm{Pf}|_{v^\perp})$. This shows that $\mathrm{Spin}_5(\mathbb{C}) = Sp_4(\mathbb{C})$ and $\mathrm{Spin}^0_{2,3} = Sp_4(\mathbb{R})$. Spin groups for other signatures in real dimensions 6 and 5 are obtained by considering appropriate real subspaces of $\mathbb{C}^6$ and $\mathbb{C}^5$, respectively. For example, [6] is used to show that $\mathrm{Spin}^0_{1,5} = SL_2(\mathbb{H})$.
Spin groups in dimensions 4 and lower are similarly obtained from the observation that $\det$ is a quadratic form on the four-dimensional space $K(2)$ and $\mathrm{C}\ell^0(K(2), \det) = K(2) \oplus K(2)$.
Several spin groups are listed below.

The complex spin groups
$$\mathrm{Spin}_2(\mathbb{C}) = \mathbb{C}^\times, \qquad \mathrm{Spin}_3(\mathbb{C}) = SL_2(\mathbb{C})$$
$$\mathrm{Spin}_4(\mathbb{C}) = SL_2(\mathbb{C}) \times SL_2(\mathbb{C})$$
$$\mathrm{Spin}_5(\mathbb{C}) = Sp_4(\mathbb{C})$$
$$\mathrm{Spin}_6(\mathbb{C}) = SL_4(\mathbb{C})$$

The real, compact spin groups
$$\mathrm{Spin}_2 = U_1, \qquad \mathrm{Spin}_3 = SU_2$$
$$\mathrm{Spin}_4 = SU_2 \times SU_2, \qquad \mathrm{Spin}_5 = Sp_2(\mathbb{H})$$
$$\mathrm{Spin}_6 = SU_4$$

The groups $\mathrm{Spin}^0_{k,l}$ for $1 \leq k \leq l$ and $k + l \leq 6$
$$\mathrm{Spin}^0_{1,1} = \mathbb{R}, \qquad \mathrm{Spin}^0_{1,2} = SL_2(\mathbb{R})$$
$$\mathrm{Spin}^0_{1,3} = SL_2(\mathbb{C})$$
$$\mathrm{Spin}^0_{2,2} = SL_2(\mathbb{R}) \times SL_2(\mathbb{R})$$
$$\mathrm{Spin}^0_{1,4} = Sp_{1,1}(\mathbb{H})$$
$$\mathrm{Spin}^0_{2,3} = Sp_4(\mathbb{R}), \qquad \mathrm{Spin}^0_{1,5} = SL_2(\mathbb{H})$$
$$\mathrm{Spin}^0_{2,4} = SU_{2,2}$$
$$\mathrm{Spin}^0_{3,3} = SL_4(\mathbb{R})$$

See also: Dirac Operator and Dirac Field; Index Theorems; Relativistic Wave Equations Including Higher Spin Fields; Spinors and Spin Coefficients; Twistors.

Further Reading
Adams JF (1981) Spin (8), triality, F4 and all that. In: Hawking SW and Roček M (eds.) Superspace and Supergravity. Cambridge: Cambridge University Press.
Atiyah MF, Bott R, and Shapiro A (1964) Clifford modules. Topology 3(suppl. 1): 3-38.
Baez JC (2002) The octonions. Bulletin of the American Mathematical Society 39: 145-205.
Brauer R and Weyl H (1935) Spinors in n dimensions. American Journal of Mathematics 57: 425-449.
Budinich P and Trautman A (1988) The Spinorial Chessboard, Trieste Notes in Physics. Berlin: Springer.
Cartan E (1938) Théorie des spineurs. Actualités Scientifiques et Industrielles, No. 643 et 701. Paris: Hermann (English transl.: The Theory of Spinors. Paris: Hermann, 1966).
Chevalley C (1954) The Algebraic Theory of Spinors. New York: Columbia University Press.
Clifford WK (1878) Applications of Grassmann's extensive algebra. American Journal of Mathematics 1: 350-358.
Clifford WK (1882) On the classification of geometric algebras. In: Tucker R (ed.) Mathematical Papers by William Kingdon Clifford, pp. 397-401. London: Macmillan.
Dirac PAM (1928) The quantum theory of the electron. Proceedings of the Royal Society of London A 117: 610-624.
Eckmann B (1942) Gruppentheoretischer Beweis des Satzes von Hurwitz-Radon über die Komposition quadratischer Formen. Commentarii Mathematici Helvetici 15: 358-366.
Karoubi M (1968) Algèbres de Clifford et K-théorie. Annales Scientifiques de l'École Normale Supérieure 4ème sér 1: 161-270.
Lipschitz RO (1886) Untersuchungen über die Summen von Quadraten. Berlin: Max Cohen und Sohn.
Lounesto P (2001) Clifford Algebras and Spinors, 2nd edn. London Math. Soc. Lecture Note Series, vol. 286. Cambridge: Cambridge University Press.
Pauli W (1927) Zur Quantenmechanik des magnetischen Elektrons. Z. Physik 43: 601-623.
Penrose R and MacCallum MAH (1973) Twistor theory: an approach to the quantisation of fields and space-time. Physics Report 6C(4): 241-316.
Porteous IR (1995) Clifford Algebras and the Classical Groups, Cambridge Studies in Advanced Mathematics, vol. 50. Cambridge: Cambridge University Press.
Postnikov MM (1986) Lie groups and Lie algebras. Mir: Moscow.
Sudbery A (1987) Division algebras, (pseudo)orthogonal groups and spinors. Journal of Physics A17: 939-955.
Trautman A (1997) Clifford and the 'square root' ideas. Contemporary Mathematics 203: 3-24.
Trautman A and Trautman K (1994) Generalized pure spinors. Journal of Geometry and Physics 15: 1-22.
Wall CTC (1963) Graded Brauer groups. Journal für die Reine und Angewandte Mathematik 213: 187-199.
Cluster Expansion
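R Kotecký, Charles University, Prague, Czech Republic, and the University of Warwick, UK
© 2006 Elsevier Ltd. All rights reserved.

$$Z(\beta, \lambda, V) = \sum_{N=0}^{\infty} \frac{z^N}{N!} \int_{\mathbb{R}^{3N} \times V^N} \frac{e^{-\beta H_N}}{h^{3N}} \prod d^3p_i \prod d^3r_i \qquad [1]$$
$$= \sum_{N=0}^{\infty} \frac{\lambda^N}{N!} \int_{V^N} e^{-\beta \sum_{i,j} \Phi(r_i - r_j)} \prod d^3r_i \qquad [4]$$
In the second expression we absorbed the factor resulting from the integration over impulses into the (configurational) activity $\lambda = (2\pi m/\beta h^2)^{3/2} z$. In particular, the pressure $p$ and the density $\rho$ are defined by the thermodynamic limits (with $V \to \infty$ in the sense of Van Hove)
$$p(\beta, \lambda) = \frac{1}{\beta} \lim_{V \to \infty} \frac{1}{|V|} \log Z(\beta, \lambda, V) \qquad [5]$$
$$\rho(\beta, \lambda) = \lim_{V \to \infty} \frac{1}{|V|}\, \lambda \frac{\partial}{\partial \lambda} \log Z(\beta, \lambda, V) \qquad [6]$$

$$[\dots]\ w(g)\, w(g) \qquad [12]$$
we can rewrite
$$Z(\beta, \lambda, V) = \sum_{N=0}^{\infty} \frac{\lambda^N}{N!} \sum_{\{g_l\}} \prod_{l} w(g_l) \qquad [13]$$
with the sum running over all disjoint collections $\{g_l\}$ of connected graphs with vertices in $\{1, \dots, N\}$. A straightforward exponential expansion can be used to show that, at least in the sense of formal power series,
$$\log Z(\beta, \lambda, V) = \sum_{n=1}^{\infty} \frac{\lambda^n}{n!} \sum_{g \in \mathcal{C}[n]} w(g) \qquad [14]$$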
where $\mathcal{C}[n]$ is the set of all connected graphs on $n$ vertices. Using $b_n^{(V)}$ to denote the coefficients
$$b_n^{(V)} = \frac{1}{|V|} \frac{1}{n!} \sum_{g \in \mathcal{C}[n]} w(g) \qquad [15]$$
and observing that the limits $\lim_{V \to \infty} (1/|V|)\, w(g)$ of cluster integrals exist, we get $b_n = \lim_{V \to \infty} b_n^{(V)}$. The convergence of Mayer series can be controlled directly by combinatorial estimates on the coefficients $b_n^{(V)}$. As a result, the diameter of convergence of the series [7] and [8] can be proved to be at least $(C(\beta)e^{2\beta B + 1})^{-1}$. A less direct proof is based on an employment of linear integral Kirkwood-Salsburg equations in a suitable Banach space of correlation functions.
Similar combinatorial methods are also available for the evaluation of coefficients of the virial expansion of pressure in powers of gas density,
$$\beta p(\beta, \rho) = \sum_{n=1}^{\infty} \beta_n \rho^n \qquad [16]$$
obtained by inverting [8] (notice that $b_1 = 1$) and inserting it into [7]. One gets $\beta_n = \lim_{V \to \infty} \beta_n^{(V)}$ with
$$\beta_n^{(V)} = \frac{1}{|V|} \frac{1}{n!} \sum_{g \in \mathcal{B}[n]} w(g) \qquad [17]$$
where $\mathcal{B}[n] \subset \mathcal{C}[n]$ is the set of all 2-connected graphs on $\{1, \dots, n\}$; namely, those graphs that cannot be split into disjoint subgraphs by erasing one vertex (and all adjacent edges). The diameter of convergence of the virial expansion turns out to be no less than $(C(\beta)e(e^{2\beta B} + 1))^{-1}$.

Abstract Polymer Models
An application of the ideas of Mayer expansions to lattice models is based on a reformulation of the partition function in terms of a polymer model, a formulation akin to [13] above. Namely, the partition function is rewritten as a sum over collections of pairwise compatible geometric objects (polymers). Most often, compatibility simply means their disjointness.
While the reformulation of a physical partition function in terms of a polymer model (including the definition of compatibility) depends on particularities of a given lattice model and on the considered region of parameters (high-temperature, low-temperature, large external fields, etc.), the essence and results of cluster expansion may be conveniently formulated in terms of an abstract polymer model.
Let $G = (V, E)$ be any (possibly infinite) countable graph and suppose that a map $w: V \to \mathbb{C}$ is given. Vertices $v \in V$ are called abstract polymers, with two abstract polymers connected by an edge in the graph $G$ called incompatible. We shall refer to $w(v)$ as the weight of the abstract polymer $v$. For any finite $W \subset V$, we consider the induced subgraph $G[W]$ of $G$ spanned by $W$ and define
$$Z_W(w) = \sum_{I \subset W} \prod_{v \in I} w(v) \qquad [18]$$
Here the sum runs over all collections $I$ of compatible abstract polymers or, in other words, the sum is over all independent sets $I$ of vertices in $W$ (no two vertices in $I$ are connected by an edge).
The partition function $Z_W(w)$ is an entire function in $w = \{w(v)\}_{v \in W} \in \mathbb{C}^{|W|}$ and $Z_W(0) = 1$. Hence, it is nonvanishing in some neighborhood of the origin $w = 0$ and its logarithm is, on this neighborhood, an analytic function yielding a convergent Taylor series
$$\log Z_W(w) = \sum_{X \in \mathcal{X}(W)} a_W(X)\, w^X \qquad [19]$$
Here, $\mathcal{X}(W)$ is the set of all multi-indices $X: W \to \{0, 1, \dots\}$ and $w^X = \prod_v w(v)^{X(v)}$. Inspecting the formula for $a_W(X)$ in terms of corresponding derivatives of $\log Z_W(w)$, it is easy to show that the Taylor coefficients $a_W(X)$ actually do not depend on $W$: $a_W(X) = a_{\operatorname{supp} X}(X)$, where $\operatorname{supp} X = \{v \in V: X(v) \neq 0\}$. As a result, one gets the existence of coefficients $a(X)$ such that
$$\log Z_W(w) = \sum_{X \in \mathcal{X}(W)} a(X)\, w^X \qquad [20]$$
for every finite $W \subset V$.
The coefficients $a(X)$ can be obtained explicitly. One can pass from [18] to [20] in a similar way as passing from [10] to [13]. The starting point is to replace the restriction to compatible collections of abstract polymers in the sum [18] by the factor $\prod_{v, v' \in W} (1 + F(v, v'))$ with
$$F(v, v') = \begin{cases} 0 & \text{if } v \text{ and } v' \text{ are compatible} \\ -1 & \text{otherwise } (v \text{ and } v' \text{ connected by an edge from } G) \end{cases} \qquad [21]$$
and to expand the product afterwards. The resulting formula is
$$a(X) = (X!)^{-1} \sum_{H \subset G(X)} (-1)^{|E(H)|} \qquad [22]$$
Here, $G(X)$ is the graph with $|X| = \sum |X(v)|$ vertices induced from $G[\operatorname{supp} X]$ by replacing each of its vertices $v$ by the complete graph on $|X(v)|$ vertices and $X!$ is the multifactorial $X! = \prod_{v \in \operatorname{supp} X} X(v)!$. The sum is over all connected subgraphs $H \subset G(X)$ spanned by the set of vertices of $G(X)$ and $|E(H)|$ is the number of edges of the graph $H$.
A useful property of the coefficients a(X) is their The restriction to compatible collections of polymers
alternating sign, can be actually relaxed. Namely, replacing [25] by
X Y Y
1jXj1 aX 0 23 ZW w wv Uv; v0 25
W 0 W v2W 0 v;v0 2W 0
More important than an explicit form of the
coefficients a(X) are the convergence criteria for the with U(v, v0 ) 2 [0, 1] (soft repulsive interaction), and
series [20]. One way to proceed is to find direct the condition [24] by
combinatorial bounds on the coefficients as expressed Y 1 rv0
by [22]. While doing so, one has to take into account the Rv rv 26
1 Uv; v0 rv0
cancelations arising in view of the presence of terms of v0 6v
opposite signs in [22]. Indeed, disregarding them would
lead to a failure since, as is easy to verify, the number of connected graphs on $|X|$ vertices is bounded from below by $2^{(|X|-1)(|X|-2)/2}$. An alternative approach is to prove the convergence of [20] on polydisks $D_{W,R} = \{w : |w(v)| \le R(v) \text{ for } v \in W\}$ by induction in $|W|$, once a proper condition on the set of radii $R = \{R(v);\ v \in V\}$ is formulated. The most natural condition for the inductive proof (leading at the same time to the strongest claim) turns out to be the Dobrushin condition:

There exists a function $r : V \to [0, 1)$ such that, for each $v \in V$,

$$R(v) \le r(v) \prod_{v' \in N(v)} \bigl(1 - r(v')\bigr) \qquad [24]$$

Here $N(v)$ is the set of vertices $v' \in V$ adjacent in graph $G$ to the vertex $v$.

Using $\mathcal{X}$ to denote the set of all multi-indices $X : V \to \{0, 1, \dots\}$ with finite $|X| = \sum_v |X(v)|$ and saying that $X \in \mathcal{X}$ is a cluster if the graph $G(\operatorname{supp} X)$ is connected, we can summarize the cluster expansion claim for an abstract polymer model in the following way:

Theorem (Cluster expansion). There exists a function $a : \mathcal{X} \to \mathbb{R}$ that is nonvanishing only on clusters, so that for any sequence of diameters $R$ satisfying the condition [24] with a sequence $\{r(v)\}$, the following holds true:

(i) For every finite $W \subset V$, and any contour weight $w \in D_{W,R}$, one has $Z_W(w) \ne 0$ and

$$\log Z_W(w) = \sum_{X \in \mathcal{X}(W)} a(X)\, w^X$$

(ii) $\sum_{X \in \mathcal{X} :\, \operatorname{supp} X \ni v} |a(X)|\, |w|^X \le -\log\bigl(1 - r(v)\bigr)$.

Notice that we have obtained not only the absolute convergence of the Taylor series of $\log Z_W$ in the closed polydisk $D_{W,R}$, but also the bound (ii) (uniform in $W$) on the sum over all terms containing a fixed vertex $v$. Such a bound turns out to be very useful in applications of cluster expansions. It yields, eventually, bounds on various error terms, avoiding the need for an explicit evaluation of the number of clusters of given size.

one can prove that the partition function $Z_W(w)$ does not vanish on the polydisk $D_{W,R}$, implying thus that the power series of $\log Z_W(w)$ converges absolutely on $D_{W,R}$.

Polymers that arise in typical applications are geometric objects endowed with a support in the considered lattice, say $\mathbb{Z}^d$, $d \ge 1$, and their weights satisfy the condition of translation invariance. Cluster expansions then yield an explicit power series for the pressure (resp. free energy) in the thermodynamic limit as well as its finite-volume approximation. To formulate it for an abstract polymer model, we assume that for each $x \in \mathbb{Z}^d$ an isomorphism $\tau_x : G \to G$ is given and that with each abstract polymer $v \in V$ a finite set $\Lambda(v) \subset \mathbb{Z}^d$ is associated so that $\Lambda(\tau_x(v)) = \Lambda(v) + x$ for every $v \in V$ and every $x \in \mathbb{Z}^d$. For any finite $W \subset V$ and any multi-index $X$, let $\Lambda(W) = \bigcup_{v \in W} \Lambda(v)$ and $\Lambda(X) = \Lambda(\operatorname{supp}(X))$. On the other hand, for any finite $\Lambda \subset \mathbb{Z}^d$, let $W(\Lambda) = \{v \in V : \Lambda(v) \subset \Lambda\}$. Assuming also that the weight $w : V \to \mathbb{C}$ is translation invariant, that is, $w(v) = w(\tau_x(v))$ for every $v \in V$ and every $x \in \mathbb{Z}^d$, we get an explicit expression for the pressure of the abstract polymer model in the thermodynamic limit

$$p = \lim_{\Lambda \to \infty} \frac{1}{|\Lambda|} \log Z_{W(\Lambda)}(w) = \sum_{X :\, \Lambda(X) \ni 0} \frac{a(X)\, w^X}{|\Lambda(X)|} \qquad [27]$$

In addition, the finite-volume approximation can be explicitly evaluated, yielding

$$\log Z_{W(\Lambda)}(w) = p\,|\Lambda| - \sum_{X :\, \Lambda(X) \cap \Lambda^c \ne \emptyset} \frac{|\Lambda(X) \cap \Lambda|}{|\Lambda(X)|}\, a(X)\, w^X \qquad [28]$$

Using the claim (ii), the second term can be bounded by const. $|\partial\Lambda|$.

Cluster Expansions for Lattice Models

There is a variety of applications of cluster expansions to lattice models. As noticed above, the first step is always to rewrite the model in terms of a polymer representation.
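Returning briefly to the abstract setting, the Dobrushin condition [24] and the bound (ii) can be illustrated on a toy model. The sketch below is only an illustration under stated assumptions: the graph is a 5-cycle, $r(v) = 0.2$ for every vertex, $N(v)$ is taken to exclude $v$ itself, and the weights are placed at the negative real point $w(v) = -R(v)$ of the polydisk. The helper names (`partition_function`, `dobrushin_radii`) are hypothetical. It uses the fact that, by (i), $\log Z_W - \log Z_{W \setminus \{v\}}$ is a sum over multi-indices whose support contains $v$, so (ii) bounds its modulus by $-\log(1 - r(v))$.

```python
import itertools
import math


def partition_function(vertices, edges, w):
    """Z_W(w): sum over subsets of W without adjacent pairs of the product of weights."""
    total = 0.0
    for k in range(len(vertices) + 1):
        for subset in itertools.combinations(vertices, k):
            chosen = set(subset)
            if any(a in chosen and b in chosen for a, b in edges):
                continue  # two incompatible (adjacent) polymers: skip
            total += math.prod(w[v] for v in subset)
    return total


def dobrushin_radii(vertices, edges, r):
    """Equality case of [24]: R(v) = r(v) * prod_{v' in N(v)} (1 - r(v'))."""
    nbrs = {v: set() for v in vertices}
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    return {v: r[v] * math.prod(1 - r[u] for u in nbrs[v]) for v in vertices}


# Toy model (assumption): the 5-cycle with r(v) = 0.2 everywhere.
V = list(range(5))
E = [(i, (i + 1) % 5) for i in range(5)]
r = {v: 0.2 for v in V}
R = dobrushin_radii(V, E, r)
w = {v: -R[v] for v in V}  # negative real point on the boundary of the polydisk

Z_full = partition_function(V, E, w)
Z_minus = partition_function(V[1:], [e for e in E if 0 not in e], w)  # vertex 0 removed
ratio = Z_full / Z_minus

print(Z_full, Z_minus, ratio)
print(abs(math.log(ratio)), "<=", -math.log(1 - r[0]))
```

For this choice the partition function stays strictly positive and the ratio $Z_W / Z_{W \setminus \{v\}}$ stays above $1 - r(v)$, as the bound (ii) requires.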
534 Cluster Expansion
High-Temperature Expansions yielding [34] (1 t > e2t for t < 1=2). To have w 2
DW, R (for any W) is, for R(B) = (e2 )jBj , sufficient
Let us illustrate this point in the simplest case of the Ising
to take 0 with tanh 0 = e2 .
model. Its partition function in volume Zd , with
As a consequence, for 0 we can use the
free boundary conditions and vanishing external field, is
8 9 cluster expansion theorem to obtain a convergent
>
< >
= power series in powers of tanh . In particular,
X X
Z exp x y 29 using (X) = [B2suppX (B), we get the pressure by
>
: x;y2 >
; the explicit formula
jxyj1
p
Using the identity
X aX X 37
ex y cosh x y sinh 30 log 2 d logcosh w
X:X3x
jXj
it can be rewritten in the form
X for any fixed x 2 Zd (by translation invariance of
Z 2jj cosh jBj tanh jBj 31 the contributing terms, the choice of x is irrelevant).
B The function p() is analytic on the region 0
Here, the sum runs over all subsets B of the set B() of since it is obtained as a uniformly absolutely
all bonds in (pairs of nearest-neighbor sites from ) convergent series of analytic terms ( tanh )jXj .
such that each site is contained in an even number of This type of high-temperature cluster expansion
bonds from B. Using (B) to denote the set of sites can be extended to a large class of models P with
contained in bonds from B, we say that B1 , B2 B() Boltzmann factor in the form exp { A UA (
)},
are disjoint if (B1 ) \ (B2 ) = ;. Splitting now B into a where
= (
x ; x 2 Zd ) is the configuration with
collection B = {B1 , . . . , Bk } of its connected components a priori on-site probability distribution (d
x ) and
called (high-temperature) polymers and using B() to UA , for any finite A Zd , are the multi-site
denote the set of all polymers in , we are getting interactions (depending only on (
x ; x 2 A)). Using
X Y the Mayer trick we can rewrite
Z 2jj cosh jBj tanh jBj 32 ( )
X Y
BB B2B
exp UA
1 fA
38
with the sum running over all collections B of mutually A A
2jj ( cosh )jB()j ZB() (w). under appropriate bounds on the interactions UA
To apply the cluster expansion theorem, we have to and for small enough, using (A) to denote the set
find a function r such that the right-hand side of [24] is [A2A A, we get,
positive and yields thus the radius of a polydisk of X
convergence. Taking r(B) = jBj with a suitable , we get jwAj 1 40
Y A:A 3 x
1 rB0 e2jBj 34
B0 2NB This assumption allows, as before in the case of the
2 jBj high-temperature Ising model, to apply the cluster
allowing to choose R(B) = r(B)e2jBj = (e ) .
expansion theorem yielding an explicit series expan-
Indeed, to verify [34] we just notice that the number
sion for the pressure.
of polymers of size n containing a fixed site is
bounded by n with a suitable constant . Thus,
X X
1 Correlations
0
jB j n n 1 35
Cluster expansions can be applied for evaluation of
B0 : B0 3x n1
decay of correlations. Let us consider, for the class
once is sufficiently small, and thus of models discussed above, the expectation
X Z Y
jBj jBj jBj 36 1
hi
eH
d
x 41
B0 2NB Z x2
P
with H (
) = A UA (
) and a function we extend AS () to AS = [ AS () and X S,A0 () to
depending only on variables
x on sites x from a X S,A0 = [ X S,A0 (). As a result, we have an explicit
finite set S Zd . expression for the limiting expectation hi in terms of
A convenient way of evaluating the expectation starts an absolutely convergent power series. This can be
with introduction of the modified partition function immediately applied to show that jhi hi j decay
exponentially in distance between S and the comple-
Z; Z Z; Z 1 hi 42
ment
P of . Indeed, it suffices to find a suitable bound on
X
Clearly, X ja(X)jjwj with the sum running over all clusters
X reaching from the set S to c . To this end one does not
d log Z;
hi 43 need to evaluate explicitly the P number of clusters of
d 0 given diameter diam(X)= A X(A) diam((A))=m
Thus, one may get an expression for the expectation with m dist(S,c ). The needed estimate is actually
hi , by forming a polymer representation of Z, () already contained in the condition (ii) from the cluster
and isolating terms linear in in the corresponding expansion theorem. It just suffices to choose a suitable
cluster expansion. For the first step, in the just cited k and assume that is small P enough to assure validity
high-temperature case with general multi-site inter- of (40) in a stronger form, A:(A)3x jw(A)jK(A)j 1,
actions, we first enlarge the original set A() of all yielding eventually
X
polymers in (consisting of connected collections c
jaXjjwjX KdistS; jSj
A = (A1 , . . . , Ak )) to W S () = A() [ AS (), where X : diamX distS; c
AS () is the set of all collections (A1 , . . . , Ak ) of X P
polymers such that each of them intersects the set S jaXjjwjX K XAjAj
for A = (A1 , . . . , Ak ) 2 AS (), we get Z, () The resulting claim can be readily generalized to one
exactly in the form [18], about the decay of the correlation h1 ; . ..; k i in
X Y terms of the shortest tree connecting supports
Z; w A 45 S1 , ... , Sk of the functions 1 , . .., k .
I W S A2I
Low-Temperature Expansions
As a result, we have
X Finally, in some models with symmetries, we can apply
log Z ; aXwX
46 cluster expansion also at low temperatures. Let us
X2XW S illustrate it again in the case of Ising model. This time,
we take the partition function Z () with plus
allowing easily to isolate terms linear in : namely,
boundary conditions. First, let us define for each
the terms with multi-indices X with supp X \ AS ()
nearest-neighbor bond hx, yi its dual as the (d 1)-
consisting of a single collection, say A0 , that occurs
dimensional closed unit hypercube orthogonal to the
with multiplicity one, X(A0 ) = 1. Explicitly, using
segment from x to y and bisecting it at its center. For a
X S;A0 fX 2 X W S : supp X \ AS given configuration , we consider the boundary of
fA0 g; XA0 1g 47 the regions of constant spins consisting of the union
@( ) of all hypercubes that are dual to nearest-
we get neighbor bonds hx, yi for which x 6 y . The contours
X X corresponding to are now defined as the connected
hi aXwX 48
A0 2AS X2X S;A0
components of @( ). Notice that, under the fixed
boundary condition, there is a one-to-one correspon-
It is easy to show that, for sufficiently small , the series dence between configurations and sets of
on the right-hand side is absolutely convergent even if mutually compatible (disconnected) contours in .
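The correspondence between configurations and contour collections under plus boundary conditions can be checked by brute force on a very small box. The following sketch makes several assumptions not fixed by the text: dimension $d = 2$, a $3 \times 3$ box, the value $\beta = 0.8$, and the energy $H = -\sum_{\langle x, y\rangle} \sigma_x \sigma_y$ summed over all bonds with at least one endpoint in the box. Instead of building the dual hypercubes it records, for each configuration, the set of "broken" bonds with $\sigma_x \ne \sigma_y$, whose duals make up the contours.

```python
import itertools
import math

L = 3        # side of the box (assumption)
beta = 0.8   # inverse temperature (assumption)

box = [(i, j) for i in range(L) for j in range(L)]
# All nearest-neighbour bonds with at least one endpoint in the box.
E = {frozenset({x, (x[0] + dx, x[1] + dy)})
     for x in box for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))}


def spin(sigma, x):
    """Spin at x; sites outside the box carry the plus boundary condition."""
    return sigma.get(x, 1)


z_direct = 0.0    # sum over configurations of exp(-beta * H)
z_contour = 0.0   # sum over configurations of exp(-2 * beta * number of broken bonds)
seen = set()      # distinct sets of broken bonds

for values in itertools.product((1, -1), repeat=len(box)):
    sigma = dict(zip(box, values))
    products = {b: math.prod(spin(sigma, x) for x in b) for b in E}
    broken = frozenset(b for b, s in products.items() if s == -1)
    energy = -sum(products.values())
    z_direct += math.exp(-beta * energy)
    z_contour += math.exp(-2 * beta * len(broken))
    seen.add(broken)

z_contour *= math.exp(beta * len(E))

print(len(E), len(seen), z_direct, z_contour)
```

Every configuration yields a different set of broken bonds, and the two ways of computing the partition function agree, which is the content of the contour representation with weights $e^{-2\beta|\gamma|}$.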
Observing that the number of faces in $\partial(\sigma_\Lambda)$ is just the sum of the areas $|\gamma|$ of the contours $\gamma \in \Gamma$, we get the polymer representation

$$Z^+_\Lambda(\beta) = e^{\beta |E(\Lambda)|} \sum_{\Gamma} \exp\Bigl(-2\beta \sum_{\gamma \in \Gamma} |\gamma|\Bigr) \qquad [51]$$

where the sum is over all collections $\Gamma$ of disjoint contours in $\Lambda$. Here $E(\Lambda)$ is the set of all bonds $\langle x, y\rangle$ with at least one endpoint $x, y$ in $\Lambda$.

The condition [24] with $r(\gamma) = \epsilon^{|\gamma|}$ yields a similar bound on the weights $w(\gamma) = e^{-2\beta|\gamma|}$ as in the high-temperature expansion. To verify it, for $\beta$ sufficiently large, boils down to the evaluation of the number of contours of size $n$ that contain a fixed site.

As a result, we can employ the cluster expansion theorem to get

$$\log Z^+_\Lambda(\beta) = \beta |E(\Lambda)| + \sum_{X \in \mathcal{X}(\mathcal{C}(\Lambda))} a(X)\, w^X \qquad [52]$$

with an explicit formula for the limit

$$\beta p(\beta) = \beta d + \sum_{X :\, A(X) \ni 0} \frac{a(X)\, w^X}{|A(X)|} \qquad [53]$$

Here, $A(X)$ is the set of sites attached to contours from $\operatorname{supp} X$,

$$A(X) = \bigcup_{\gamma \in \operatorname{supp} X} A(\gamma) \qquad [54]$$

with

$$A(\gamma) = \{x \in \mathbb{Z}^d \mid \operatorname{dist}(x, \gamma) \le 1/2\} \qquad [55]$$

As a consequence of the fact that [53] is, for large $\beta$, an absolutely convergent sum of analytic terms $a(X) w^X = a(X)\, e^{-2\beta \sum_\gamma X(\gamma) |\gamma|}$ (considered as functions of $\beta$), the function $\beta p(\beta)$ is, for large $\beta$, analytic in $\beta$.

The fact that one can explicitly express the difference $\log Z^+_\Lambda(\beta) - \beta |\Lambda| p(\beta)$ (cf. [28]) found numerous applications in situations where one needs an accurate evaluation of the influence of the boundary of the region $\Lambda$ on the partition function. One such example is a study of microscopic behavior of interfaces. The main idea is to use the explicit expression in the form

$$Z^+_\Lambda(\beta) = \exp\{\beta p(\beta) |\Lambda|\} \exp\Bigl\{-\sum_{X :\, A(X) \cap \Lambda^c \ne \emptyset} \frac{|A(X) \cap \Lambda|}{|A(X)|}\, a(X)\, w^X\Bigr\} = \exp\{\beta p(\beta) |\Lambda|\} \prod_{X :\, A(X) \cap \Lambda^c \ne \emptyset} (1 + f_X) \qquad [56]$$

Noticing that

$$f_X = \exp\Bigl\{-\frac{|A(X) \cap \Lambda|}{|A(X)|}\, a(X)\, w^X\Bigr\} - 1$$

does not vanish only if $A(X) \cap \Lambda \ne \emptyset$, we can expand the product to obtain decorations of the boundary $\partial\Lambda$ by clusters $f_X$. In the case of an interface these clusters can be incorporated into the weight of the interface, while on a fixed boundary they yield a wall free energy.

The possibility of the (low-temperature) polymer representation of the partition function in terms of contours is based on the $+ \leftrightarrow -$ symmetry of the Ising model. In the absence of such a symmetry, cluster expansions can still be used, but in the framework of Pirogov–Sinai theory (see Pirogov–Sinai Theory).

Bibliographical Notes

Cluster expansions originated from the works of Ursell, Yvon, Mayer, and others and were first studied in terms of formal power series. The combinatorial and enumeration problems considered in this framework were summarized in Uhlenbeck and Ford (1962). For related topics in modern language, see Bergeron et al. (1998). The convergence results for Mayer and virial expansions for dilute gas were first proved in the works of Penrose, Lebowitz, Groeneveld, and Ruelle (see Ruelle (1969) for a detailed survey). General polymer models on lattice were discussed by Gruber and Kunz (1971) (see also Simon (1993) for discussion of high-temperature and low-temperature cluster expansions of lattice models). Abstract polymer models were introduced in Kotecký and Preiss (1986). An elegant proof of a general claim presented by Dobrushin (1996) was further extended and summarized by Scott and Sokal (2005). We follow their reformulation of the Dobrushin condition. Cluster expansions with a view on applications in quantum field theory are reviewed in Brydges (1986).

See also: Phase Transitions in Continuous Systems; Pirogov–Sinai Theory; Wulff Droplets.

Further Reading

Bergeron F, Labelle G, and Leroux P (1998) Combinatorial Species and Tree-Like Structures, Coll. Encyclopaedia of Mathematics and Its Applications, vol. 67. Cambridge, MA: Cambridge University Press.
Brydges DC (1986) A short course on cluster expansions. In: Osterwalder K and Stora R (eds.) Critical Phenomena, Random Systems, Gauge Theories, pp. 129–183. Les Houches, Session XLIII, 1984. Amsterdam/New York: Elsevier.
Dobrushin RL (1996) Estimates of semi-invariants for the Ising model at low temperatures. In: Dobrushin RL, Minlos RA, Shubin MA, and Vershik AM (eds.) Topics in Statistical and Theoretical Physics, pp. 59–81. Providence, RI: American Mathematical Society.
Gruber C and Kunz H (1971) General properties of polymer systems. Communications in Mathematical Physics 22: 133–161.
Kotecký R and Preiss D (1986) Cluster expansion for abstract polymer models. Communications in Mathematical Physics 103: 491–498.
Ruelle D (1969) Statistical Mechanics: Rigorous Results, The Mathematical Physics Monograph Series. Reading, MA: Benjamin.
Scott AD and Sokal AD (2005) The repulsive lattice gas, the independent-set polynomial, and the Lovász local lemma. Journal of Statistical Physics 118: 1151–1261.
Simon B (1993) The Statistical Mechanics of Lattice Gases, Princeton Series in Physics, vol. 1. Princeton: Princeton University Press.
Uhlenbeck GE and Ford GW (1962) The theory of linear graphs with applications to the theory of the virial development of the properties of gases. In: de Boer J and Uhlenbeck GE (eds.) Studies in Statistical Mechanics, vol. I. Amsterdam: North-Holland.
Coherent States
S T Ali, Concordia University, Montreal, QC, Canada
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Very generally, a family of coherent states is a set of continuously labeled quantum states, with specific mathematical and physical properties, in terms of which arbitrary quantum states can be expressed as linear superpositions. Since coherent states are continuously labeled, they form overcomplete sets of vectors in the Hilbert space of states. Originally these states were introduced into physics by Schrödinger (1926), as a family of quantum states in terms of which the transition from quantum to classical mechanics could be conveniently studied. These states have the minimal uncertainty property, in the sense that they saturate the Heisenberg uncertainty relations. The name coherent state was applied when these states were rediscovered in the context of quantum optical radiation by Glauber, Klauder, and Sudarshan. It was demonstrated that in these states the correlation functions of the quantum optical field factorize as they do in classical optics, so that the optical field has a near-classical behavior, with the optical beam being coherent. In this article, we shall refer to these originally studied coherent states as canonical coherent states (CCS).

The canonical coherent states, apart from their use in quantum optics, have also been found to be extremely useful in computations in atomic and molecular physics, in quantum statistical mechanics, and in certain areas of mathematics and mathematical physics, including harmonic analysis, symplectic geometry, and quantization theory. Their wide applicability has prompted the search for other families of states sharing similar mathematical and physical properties. These other families of states are usually called generalized coherent states, even when there is no link to optical coherence in such studies.

Some Properties of CCS

In addition to the minimal uncertainty property, the canonical coherent states have a number of analytical and group-theoretical properties which are taken as starting points in looking for generalizations. We now define the canonical coherent states mathematically and enumerate a few of these properties.

Suppose that the vectors $|0\rangle, |1\rangle, \dots, |n\rangle, \dots$ correspond to quantum states of $0, 1, \dots, n, \dots$ excitons, respectively. The Hilbert space of these states, in which they form an orthonormal basis, is often known as Fock space. The canonical coherent states are then defined in terms of this basis, for each complex number $z$, by the analytic expansion:

$$|z\rangle = e^{-|z|^2/2} \sum_{n=0}^{\infty} \frac{z^n}{\sqrt{n!}}\, |n\rangle \qquad [1]$$

The states $|z\rangle$ are normalized to unity: $\langle z | z \rangle = 1$. They satisfy the formal eigenvalue equation

$$a|z\rangle = z|z\rangle \qquad [2]$$

where $a$ is the annihilation operator for excitons, which acts on the basis vectors (Fock states) $|n\rangle$ as follows:

$$a|n\rangle = \sqrt{n}\, |n-1\rangle \qquad [3]$$

Its adjoint $a^\dagger$ has the action

$$a^\dagger|n\rangle = \sqrt{n+1}\, |n+1\rangle \qquad [4]$$

and

$$[a, a^\dagger] = a a^\dagger - a^\dagger a = I \qquad [5]$$

$I$ being the identity operator on Fock space. Introducing the self-adjoint operators $Q$ and $P$, of position and momentum, respectively,

$$Q = \frac{a + a^\dagger}{\sqrt{2}}, \qquad P = \frac{a - a^\dagger}{i\sqrt{2}} \qquad [6]$$

it is possible to demonstrate the minimal uncertainty property referred to above (we take $\hbar = 1$):

$$\langle \Delta Q \rangle \langle \Delta P \rangle = \tfrac{1}{2} \qquad [7]$$

where for any observable $A$,

$$\langle \Delta A \rangle = \bigl[\langle z | A^2 | z \rangle - \langle z | A | z \rangle^2\bigr]^{1/2}$$

is its dispersion in the state $|z\rangle$.
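The normalization, the eigenvalue equation [2], and the minimal uncertainty relation [7] can be checked numerically in a truncated Fock space. The sketch below is illustrative only: the truncation dimension `N = 60`, the sample value `z = 0.7 + 0.4j`, and all function names are assumptions made for the example, and truncation is harmless here only because the coefficients in [1] decay very fast for small $|z|$.

```python
import math

N = 60            # truncation of Fock space (assumption)
z = 0.7 + 0.4j    # sample label (assumption)

# Coefficients of |z> in the Fock basis, eqn [1].
coeffs = [math.exp(-abs(z) ** 2 / 2) * z ** n / math.sqrt(math.factorial(n))
          for n in range(N)]


def lower(c):
    """Annihilation operator a, eqn [3], acting on a coefficient list."""
    return [math.sqrt(n + 1) * c[n + 1] for n in range(len(c) - 1)] + [0j]


def raise_(c):
    """Creation operator, eqn [4]; the component leaving the truncated space is dropped."""
    return [0j] + [math.sqrt(n + 1) * c[n] for n in range(len(c) - 1)]


def inner(u, v):
    """Scalar product <u|v>, antilinear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(u, v))


def Q(c):
    return [(x + y) / math.sqrt(2) for x, y in zip(lower(c), raise_(c))]


def P(c):
    return [(x - y) / (1j * math.sqrt(2)) for x, y in zip(lower(c), raise_(c))]


def dispersion(op, c):
    """Dispersion [<A^2> - <A>^2]^(1/2) of the operator op in the state c."""
    mean = inner(c, op(c)).real
    mean_sq = inner(c, op(op(c))).real
    return math.sqrt(mean_sq - mean ** 2)


norm = inner(coeffs, coeffs).real
eig_err = max(abs(x - z * y) for x, y in zip(lower(coeffs), coeffs))
product = dispersion(Q, coeffs) * dispersion(P, coeffs)

print(norm, eig_err, product)
```

Up to rounding and truncation errors, the norm is 1, the residual of $a|z\rangle - z|z\rangle$ vanishes, and the product of the two dispersions equals $1/2$.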
One can also prove the resolution of the identity,

$$\int_{\mathbb{C}} |z\rangle\langle z|\, \frac{dq\, dp}{2\pi} = I \qquad [8]$$

where $z = (1/\sqrt{2})(q + ip)$ has been written in terms of its real and imaginary parts $(1/\sqrt{2})q$ and $(1/\sqrt{2})p$, respectively. The above operator integral is to be understood in the weak sense, as will be explained later. Equation [8] incorporates the mathematical fact that the set of vectors $|z\rangle$ is overcomplete in the Hilbert space. Indeed, using [8] any vector $|\phi\rangle$ in the Hilbert space can be written as a linear (integral) superposition of these states:

$$|\phi\rangle = \int_{\mathbb{C}} \Phi(z)\, |z\rangle\, \frac{dq\, dp}{2\pi}$$

where $\Phi$ is the component function, $\Phi(z) = \langle z | \phi \rangle$. Thus, the coherent states $|z\rangle$ form a continuously labeled total set of vectors in the Hilbert space and, since this space is separable, they are an overcomplete set.

Analytic properties of the vectors $|z\rangle$ emerge when the scalar product $\langle \phi | z \rangle$ is taken with respect to an arbitrary vector $|\phi\rangle$ in Fock space. From [1] it is clear that

$$F(z) = \langle \phi | z \rangle = e^{-|z|^2/2} f(z)$$

where $f$ is an entire analytic function in the complex variable $z$. Moreover, the mapping $\phi \mapsto f$ is an isometric embedding of the Fock space onto the Hilbert space of analytic functions, with respect to the norm

$$\|f\| = \Bigl[\int_{\mathbb{C}} |f(z)|^2\, d\nu(z, \bar{z})\Bigr]^{1/2} \qquad [9]$$

defined by the measure $d\nu(z, \bar{z}) = (1/2\pi)\, e^{-|z|^2}\, dq\, dp$.

Group-theoretical properties of the CCS can be demonstrated by noting that

$$|n\rangle = \frac{(a^\dagger)^n}{\sqrt{n!}}\, |0\rangle \quad \text{and} \quad a|0\rangle = 0$$

using which [1] can be recast into the form

$$|z\rangle = e^{-|z|^2/2}\, e^{z a^\dagger} |0\rangle = U(z)|0\rangle, \qquad U(z) = e^{z a^\dagger - \bar{z} a} \qquad [10]$$

The vectors $|z\rangle$ and the unitary operator $U(z)$ can be reexpressed in terms of the real variables $q, p$ and the operators $Q, P$ as

$$|z\rangle = |q, p\rangle = U(q, p)|0\rangle, \qquad U(q, p) = e^{i(pQ - qP)} \qquad [11]$$

The operators $U(q, p)$ realize a (projective) unitary, irreducible representation of the Weyl–Heisenberg group, which is the group whose Lie algebra has the generators $Q$, $P$, and $I$, obeying the commutation relations $[Q, P] = iI$. The existence of the resolution of the identity [8] is the statement of the fact that this representation is square integrable (a notion which will be elaborated upon in the section Some examples), which gives us the next paradigm for building coherent states, namely by the action, on a fixed vector, of the unitary operators of a square-integrable representation of a locally compact group.

The above range of properties, which are enjoyed by the CCS, cannot all be expected to hold when looking for generalizations. It then becomes necessary to adopt one or other of these properties as the starting point and to proceed from there. In so doing, it is best first to set down a general definition of coherent states, involving a minimal mathematical structure. Motivated more by possible applications to physics, we do this in the following section.

General Definition

Let $\mathcal{H}$ be an abstract, separable Hilbert space over the complexes, $X$ a locally compact space and $d\nu$ a measure on $X$. Let $|x, i\rangle$ be a family of vectors in $\mathcal{H}$, defined for each $x$ in $X$ and $i = 1, 2, 3, \dots, N$, where $N$ is usually a finite integer, although it could also be infinite. We assume that this set of vectors possesses the following properties:

1. For each $i$, the mapping $x \mapsto |x, i\rangle$ is weakly continuous, that is, for each vector $|\phi\rangle$ in $\mathcal{H}$, the function $\Phi_i(x) = \langle x, i | \phi \rangle$ is continuous (in the topology of $X$).
2. For each $x$ in $X$, the vectors $|x, i\rangle$, $i = 1, 2, \dots, N$, are linearly independent.
3. The resolution of the identity

$$\sum_{i=1}^{N} \int_X |x, i\rangle\langle x, i|\, d\nu(x) = I_{\mathcal{H}} \qquad [12]$$

holds in the weak sense on the Hilbert space $\mathcal{H}$, that is, for any two vectors $|\phi\rangle, |\psi\rangle$ in $\mathcal{H}$, the following equality holds:

$$\sum_{i=1}^{N} \int_X \langle \phi | x, i \rangle \langle x, i | \psi \rangle\, d\nu(x) = \langle \phi | \psi \rangle$$

A set of vectors $|x, i\rangle$ satisfying the above three properties is called a family of generalized vector coherent states. In case $N = 1$, the set is called a family of generalized coherent states. Sometimes the resolution of the identity condition is replaced by a weaker
i
condition, with the vectors jx, ii simply forming a total defined by xx (y) = K(y, x)ei , is the image in H K of
set in H and the functions Fi (x) = hx, iji, as ji runs the generalized vector coherent state jx, ii, under the
i
through H , forming a reproducing kernel Hilbert above-mentioned isometry. The vectors xx span
space. Alternatively, the identity on the right-hand the space H K and for an arbitrary element Y of this
side of [12] could also be replaced by a bounded, Hilbert space, the reproducing property [16] of the
positive operator T with bounded inverse. In this case, kernel implies the relation
the term frame is also used for the family of general- Z
ized coherent states. For physical applications, how- Kx; yYydy Yx 17
ever, the resolution of the identity condition is always X
assumed to hold, although the measure d could be of Conversely, given any reproducing kernel Hilbert
a very general nature (possibly also singular). The space, with a kernel satisfying the relations [15] and
objective in all these cases is to ensure that an arbitrary [16], generalized coherent states can be constructed
vector ji be expressible as a linear (integral) as above in terms of this kernel. Mathematically,
combination of these vectors. Indeed, [12] is immedi- therefore, generalized coherent states are just the set
ately seen to imply that of vectors naturally defined by the kernel in a
XN Z reproducing kernel Hilbert space.
ji i xjx; iidx 13
i1 X
and enjoying the properties D being an open disk in the complex plane of radius
L,
P1 the n radius
p of convergence of the series
Kx; yij Ky; xji ; Kx; xii > 0 15 n=0 (z = xn !). (In the case of the CCS, L = 1.)
The measure d is generically of the form d d(r)
and (for z = rei ), where d is related to the xn ! through
N Z
X the moment condition
Kx; zi Kz; yj dz Kx; yij 16 Z L
X xn !
1 r 2n dr; n 0; 1; 2; . . . 20
2 0
If ei , i = 1, 2, . . . , N, are the vectors constituting the
canonical basis of CN , then for each x in X and This means that once the quantities xn ! are specified,
i
i = 1, 2, . . . , N, the vector-valued function xx on X, the measure d is to be determined by solving the
moment problem [20], which of course may not generalized coherent states arise from representa-
always have a solution. This puts a constraint on the tions of the group SU(1, 1) belonging to the discrete
type of sequences {xn } which may be used in the series, each irreducible representation being labeled
construction. by a specific value of the index . The associated
Once again, we see that for an arbitrary vector ji Hilbert space of functions, analytic on the unit disk,
in the Fock space, the function F(z) = h j zi, of the is a subspace of L2 (D, d ), with
complex variable z, is of the form F(z) =
N (jzj2 )1=2 f (z), where f is an analytic function on 1 r2 22
d z; z 2 1 r dr d
the domain D. The reproducing kernel associated to
these coherent states is z rei
where I (x) is the order- modified Bessel function independent of whether the left- or the right-invariant
of the first kind. These coherent states satisfy the measure is used, so we could just as well have used
resolution of the identity, the right-invariant measure.) A vector j i, satisfying
Z [35], is said to be admissible, and it can be shown
2
jzihzjK21 2rI21 2rr dr d I that the existence of one such vector guarantees the
C 31 existence of an entire dense set of such vectors in H .
z rei Moreover, if the group G is unimodular, that is, if the
left- and the right-invariant measures coincide, then
where again, K (x) is the order- modified Bessel the existence of one admissible vector implies that
function of the second kind. every vector in H is admissible. Given a square-
A nonanalytic extension of the expression [18] is integrable representation and an admissible vector
often used to define generalized coherent states j i, let us define the vectors
associated to physical Hamiltonians having pure
point spectra. These coherent states, known as 1
GazeauKlauder coherent states, are labeled by jgi p Ugj i 36
c
actionangle variables. SupposePthat we are given
the physical Hamiltonian H = 1 n = 0 En jnihnj, with for all g in the group G. These vectors are to be seen
E0 = 0, that is, it has the energy eigenvalues En and as the analogs of the canonical coherent states [11],
eigenvectors jni, which we assume to form an written there in terms of the representation of the
orthonormal basis for the Hilbert space of states H . WeylHeisenberg group. Next, it can be shown that
Let us write the eigenvalues as En = ! n by introdu- the resolution of the identity
cing a sequence of dimensionless quantities { n } Z
ordered as: 0 = 0 < 1 < 2 < . Then, for all J 0 jgihgjdg IH 37
and
2 R, the GazeauKlauder coherent states are G
defined as
holds on H . Thus, the vectors jgi constitute a family
X1 n=2 i n
If $\psi$ is a function in $L^2(\mathbb{R}, dx)$ such that its Fourier transform $\widehat{\psi}$ satisfies the condition

$$\int_{\mathbb{R}} \frac{|\widehat{\psi}(k)|^2}{|k|}\, dk < \infty \qquad [40]$$

then it can be shown to be an admissible vector, that is,

$$c(\psi) = \int_{G_{\mathrm{Aff}}} |\langle \psi | U(b, a) \psi \rangle|^2\, \frac{db\, da}{a^2} < \infty$$

Thus, following the general construction outlined above, the vectors

$$|b, a\rangle = \frac{1}{\sqrt{c(\psi)}}\, U(b, a)\psi, \qquad (b, a) \in G_{\mathrm{Aff}} \qquad [41]$$

define a family of generalized coherent states and one has the resolution of the identity

$$\int_{G_{\mathrm{Aff}}} |b, a\rangle\langle b, a|\, \frac{db\, da}{a^2} = I \qquad [42]$$

on $L^2(\mathbb{R}, dx)$.

In the signal-analysis literature a vector $\psi$ satisfying the admissibility condition [40] is called a mother wavelet and the generalized coherent states [41] are called wavelets. Signals are then identified with vectors $|\phi\rangle$ in $L^2(\mathbb{R}, dx)$ and the function

$$F(b, a) = \langle b, a | \phi \rangle \qquad [43]$$

is called the continuous wavelet transform of the signal $\phi$.

There exist alternative ways of constructing generalized coherent states using group representations. For example, the Perelomov method is based on the observation that the vector $|0\rangle$, appearing in the construction of the canonical coherent states in [10] and [11] using the representation of the Weyl–Heisenberg group, is invariant up to a phase, under the action of its center. Consequently, the coherent states $|z\rangle$, as written in [10], are labeled, not by elements of the group itself, but only by the points in the quotient space of the group by its (central) phase subgroup. Generally, let $G$ be a locally compact group and $U$ a unitary irreducible representation of it on the Hilbert space $\mathcal{H}$. We do not assume $U$ to be square integrable. We fix a vector $|\psi\rangle$ in $\mathcal{H}$, of unit norm, and denote by $H$ the subgroup of $G$ consisting of all elements $h$ for which

$$U(h)|\psi\rangle = e^{i\omega(h)}\, |\psi\rangle \qquad [44]$$

where $\omega$ is a real-valued function of $h$. Let $X = G/H$ be the left-coset space and $x$ an arbitrary element in $X$. Choosing a coset representative $g(x) \in G$, for each coset $x$, we define the vectors

$$|x\rangle = U(g(x))|\psi\rangle \qquad [45]$$

in $\mathcal{H}$. The dependence of these vectors on the specific choice of the coset representative $g(x)$ is only through a phase. Thus, if instead of $g(x)$ we took a different representative $g(x)' \in G$ for the same coset $x$, then since $g(x)' = g(x)h$ for some $h \in H$, in view of [44] we would have $U(g(x)')|\psi\rangle = e^{i\omega(h)}|x\rangle$. Hence, quantum mechanically, both $|x\rangle$ and $U(g(x)')|\psi\rangle$ represent the same physical state and in particular, the projection operator $|x\rangle\langle x|$ depends only on the coset. Vectors $|x\rangle$, defined in this manner, are called Gilmore–Perelomov coherent states. Since $U$ is assumed to be irreducible, the set of all these vectors as $x$ runs through $G/H$ is dense in $\mathcal{H}$. In this definition of generalized coherent states, no resolution of the identity is postulated. However, if $X$ carries an invariant measure $\mu$, under the natural action of $G$, and if the formal operator $B$ defined as

$$B = \int_X |x\rangle\langle x|\, d\mu(x)$$

is bounded, then it is necessarily a multiple of the identity and a resolution of the identity is again retrieved.

The Perelomov construction can be used to define coherent states for any locally compact group. On the other hand, there exist other constructions of generalized coherent states, using group representations, which generalize the notion of square integrability to homogeneous spaces of the group. Briefly, in this approach one starts with a unitary irreducible representation $U$ and attempts to find a vector $|\psi\rangle$, a subgroup $H$ and a section $\sigma : G/H \to G$ such that

$$\int_{G/H} |x\rangle\langle x|\, d\mu(x) = T \qquad [46]$$

where $|x\rangle = U(\sigma(x))|\psi\rangle$, $T$ is a bounded, positive operator with bounded inverse and $d\mu$ is a quasi-invariant measure on $X = G/H$. It is not assumed that $|\psi\rangle$ be invariant up to a phase under the action of $H$ and clearly, the best situation is when $T$ is a multiple of the identity. Although somewhat technical, this general construction is of enormous versatility for semidirect product groups of the type $\mathbb{R}^n \rtimes K$, where $K$ is a closed subgroup of $GL(n, \mathbb{R})$. Thus, it is useful for many physically important groups, such as the Poincaré or the Euclidean group, which do not have square-integrable representations in the sense of the earlier definition (see eqn [35]). The integral condition [46] ensures that any vector $|\phi\rangle$ in $\mathcal{H}$ can be written in terms of the $|x\rangle$. Indeed, it
is easy to see that one has the integral representation taking the combination Q iP, one obtains the
of a vector, minimal uncertainty states,
Z p
y 2 y
ji xjxi dx jz; i N z; 1=2 ewa =2 ez= 21wa j0i 50
X
x hxjT 1 i N (z, ) being a normalization constant and
w = (1 )=(1 ). The case = 1 does not lead
in terms of the generalized coherent states. to any solutions, while = 1 gives the canonical
The canonical coherent states satisfy the minimal coherent states [10]. For real 6 1 the above states
uncertainty relation [7]. It is possible to build are the well-known squeezed states of quantum
families of coherent states by generalizing from this optics.
condition. To do this, one typically starts with two Our final example is that of a family of vector
self-adjoint generators in the Lie algebra of a coherent states, which will be obtained essentially
particular group representation and then looks for by replacing the complex variable z in [18] by a
appropriate eigenvectors of a complex combination matrix variable. We choose the domain = C22
of these two generators. For two self-adjoint (all 2 2 complex matrices), equipped with the
operators B and C on a Hilbert space H , satisfying measure
the commutation relation [B, C] = iD and any
y
normalized vector in H , one can prove the y etrZ Z Y
2
functions in the variable Z y , with the matrix valued As already mentioned, generalized coherent states
kernel K : 7! C22 : are widely used in signal analysis. The wavelet
transform F(b, a) = hb, aji, introduced in [43], is a
2 X
X 1
KZ 0y ; Z Y ik Z 0y Y ik Z y y timefrequency transform, in which the parameter b
i1 k0 is identified with time and 1/a with frequency.
2 X
X Wavelet transforms are used extensively to analyze,
1
Z 0yk Z k
54 encode, and reconstruct signals arising in many
i1 k0
bk different branches of physics, engineering, seismo-
graphy, electronic data processing, etc. Similarly, the
Vector coherent states in H K are then naturally
canonical coherent states, as written in [11], give
associated to this kernel and are given by
rise to the transform F(q, p) = hq, p j i. Again, if q is
X2 X 1
jy Z k i j interpreted as time and p as frequency, then this is
jZ ; ii p jY k i just the windowed Fourier transform, also used
j1 k0 bk 55
extensively in signal processing. More general
0y 0y
that is; jZ ; iiZ KZ ; Z i wavelets, from higher-dimensional affine groups,
are used to analyze higher-dimensional signals,
for i = 1, 2 and all Z in . They satisfy the resolution while wavelet like transforms from other groups
of the identity have been used to study signals exhibiting different
X2 Z geometries. In particular, wavelet transforms from
jZ ; iihZ ; ijdZ ; Z y IH K 56 spherical geometries have been applied to the study
i1 of brain signals and to astrophysical data.
Our final example is taken from quantization
The expression for the jZ , ii in [55], involving the theory. A quantization technique is a method for
sum, should be compared to [18], of which it is a performing the transition from a given classical
direct analog. mechanical system to its quantum counterpart.
Many methods have been developed to accomplish
this and the use of coherent states is one of them.
Some Applications of Coherent States Suppose that we are given a family of coherent
states jxi in a Hilbert space H , where the set X from
Generalized coherent states have many applications
which x is taken is a classical phase space. This
in physics, signal analysis, and mathematics, of
means that X is a symplectic manifold with an
which we mention a few here. As an example of
associated 2-form !, which defines a Poisson
an application of deformed coherent states, we take
bracket on the set of observables of the classical
n system, which are real-valued functions on X. There
q qn 1=2
xn ; q>0 57 is a natural measure d!, defined on X by the 2-form
q q1
!. Let us assume that the coherent states jxi satisfy a
in the definition of these states in [18]. It is then easy resolution on the identity with respect to this
to see that the operators A and Ay , defined in [23], measure:
satisfy the q-deformed commutation relation
Z
y y N
AA qA A q 58 jxihxjd!x IH
X
where N is the usual number operator, which acts
on the Fock states as Njni = njni. Clearly, in the In this case, the coherent states may be used to
limit as q ! 1, these q-deformed coherent states go quantize the observables of the classical system in
over to the canonical coherent states, with the the following way: let f be a real-valued function on
operators A and Ay becoming the usual creation X, representing a classical observable and suppose
and annihilation operators a and ay , respectively. that the formal operator
The operators A and Ay and the commutation Z
relation [58] describe a system of q-deformed b
f f xjxihxjd!x 59
oscillators, which have been used to describe, for X
example, the vibrations of polyatomic molecules.
The potential energy between the atoms of such is well defined as a self-adjoint operator on H . Then
a molecule has anharmonic terms, leading to we may take the operator b f to be the quantized
a deformation of the usual oscillator algebra, observable corresponding to the classical observable
generated by the operators a and ay . f. Suppose that we have two such operators, b f and b
g,
corresponding to the two classical observables $f$ and $g$, which have the Poisson bracket $\{f, g\}$,
defined via the 2-form $\omega$. We then check if the quantization condition
\[ \widehat{\{f, g\}} = \frac{2\pi}{ih}\,[\hat{f}, \hat{g}] \qquad [60] \]
where $h$ is Planck's constant, is satisfied. Generally this will be the case for a certain number of
classical observables. This method of quantization has been most successfully used for manifolds $X$
which have a (complex) Kähler structure. Over such a manifold, one can define a Hilbert space of
analytic functions, which has a reproducing kernel and hence a naturally associated set of coherent
states. As a specific example, we take the case of canonical coherent states [11]. We can identify the
complex plane $\mathbb{C}$ with the phase space $\mathbb{R}^2$ of a free classical particle having a
single degree of freedom. The measure $d\omega$ in this case is just $(1/2\pi)\,dq\,dp$. If we now
quantize the classical observables $f(q, p) = q$ and $f(q, p) = p$, of position and momentum,
respectively, using the canonical coherent states, we obtain the two operators
\[ Q = \int_{\mathbb{R}^2} q\,|q, p\rangle\langle q, p|\,\frac{dq\,dp}{2\pi}, \qquad
   P = \int_{\mathbb{R}^2} p\,|q, p\rangle\langle q, p|\,\frac{dq\,dp}{2\pi} \qquad [61] \]
It can be verified that these two operators satisfy the canonical commutation relations
$[Q, P] = iI_{\mathfrak{H}}$, as required.

See also: Solitons and Kac–Moody Lie Algebras; Wavelets: Mathematical Theory.

Further Reading

Ali ST, Antoine J-P, and Gazeau J-P (2000) Coherent States, Wavelets and Their Generalizations. New York: Springer.
Ali ST and Engliš M (2005) Quantization methods: a guide for physicists and analysts. Reviews in Mathematical Physics 17: 391–490.
Brif C (1997) SU(2) and SU(1,1) algebra eigenstates: a unified analytic approach to coherent and intelligent states. International Journal of Theoretical Physics 36: 1651–1682.
Klauder JR and Sudarshan ECG (1968) Fundamentals of Quantum Optics. New York: Benjamin.
Klauder JR and Skagerstam BS (1985) Coherent States: Applications in Physics and Mathematical Physics. Singapore: World Scientific.
Perelomov AM (1986) Generalized Coherent States and their Applications. Berlin: Springer.
Schrödinger E (1926) Der stetige Übergang von der Mikro- zur Makromechanik. Naturwissenschaften 14: 664–666.
Sivakumar S (2000) Studies on nonlinear coherent states. Journal of Optics B: Quantum Semiclass. Opt. 2: R61–R75.
Zhang W-M, Feng DH, and Gilmore RG (1990) Coherent states: theory and some applications. Reviews of Modern Physics 62: 867–927.
Cohomology Theories
U Tillmann, University of Oxford, Oxford, UK
© 2006 Elsevier Ltd. All rights reserved.

Introduction

The origins of cohomology theory are found in topology and algebra at the beginning of the last
century but since then it has become a tool of nearly every branch of mathematics. It's a way of life!
Naturally, this article can only give a glimpse at the rich subject. We take here the point of view of
algebraic topology and discuss only the cohomology of spaces.
Cohomology reflects the global properties of a manifold, or more generally of a topological space.
It has two crucial properties: it only depends on the homotopy type of the space and is determined by
local data. The latter property makes it in general computable.
To illustrate the interplay between the local and global structure, consider the Euler characteristic
of a compact manifold; as will be explained below, cohomology is a refinement of the Euler
characteristic. For simplicity, assume that the manifold $M$ is a surface and that we have chosen a way
of dividing the surface into triangles. The Euler characteristic is then defined to be
\[ \chi(M) := F - E + V \]
where $F$ denotes the number of faces, $E$ the number of edges, and $V$ the number of vertices in the
triangulation. Remarkably, this number does not depend on the triangulation. Yet, this simple, easy to
compute number can already distinguish the different types of closed, oriented surfaces: for the sphere
we have $\chi = 2$, the torus $\chi = 0$, and in general for any surface $M_g$ of genus $g$
\[ \chi(M_g) = 2 - 2g \]
The Euler characteristic also tells us something about the geometry and analysis of the manifold. For
example, the total curvature of a surface is equal to $2\pi$ times its Euler characteristic. This is the
Gauss–Bonnet theorem and an analogous result holds in higher dimensions. Another striking result is the
Poincaré–Hopf theorem which equates the Euler characteristic with the total index of a vector field and
thus gives strong restrictions on what kind of vector fields can exist on a manifold. This interplay
between global analysis and topology has been one of the most exciting and fruitful research areas and
is most powerfully expressed in the celebrated Atiyah–Singer index theorem, which determines the
analytic index of an elliptic operator, such as the Dirac operator on a spin manifold, in terms of
cohomology classes.

Chain Complexes and Homology

There are several different geometric definitions of the cohomology of a topological space. All share

cycle, and we may define the quotient vector space ($R$-module), the $i$th-dimensional homology,
\[ H_i(C_*, \partial_*) := \frac{\ker \partial_i}{\operatorname{im} \partial_{i+1}} \qquad [3] \]
$(C_*, \partial_*)$ is exact if all its cycles are boundaries. Homology thus measures to what extent the
sequence [1] fails to be exact.

Simplicial Homology

A triangulation of a surface gives rise to its simplicial chain complex: Taking coefficients in
$\mathbb{Z}$, $C_2$, $C_1$, $C_0$ are the free abelian groups generated by the set of faces, edges, and
vertices, respectively; $C_i = \{0\}$ for $i \geq 3$. The map $\partial_2$ assigns to a triangle the sum
of its edges; $\partial_1$ maps an edge to the sum of its endpoints. If we are working with
$\mathbb{Z}_2$ coefficients, this defines for us a chain complex as [2] is clearly satisfied; in general,
one needs to keep track of the orientations of the triangles and edges and take sums with appropriate
signs (cf. [6] below). An easy calculation shows that for an oriented, closed surface $M_g$ of genus $g$,
we have
\[ H_0(M_g; \mathbb{Z}) = \mathbb{Z}, \quad H_1(M_g; \mathbb{Z}) = \mathbb{Z}^{2g}, \quad
   H_2(M_g; \mathbb{Z}) = \mathbb{Z}, \quad H_i(M_g; \mathbb{Z}) = 0 \ \text{for } i \geq 3 \qquad [4] \]
Note that the Euler characteristic can be recovered as the alternating sum of the rank of the homology
groups:
\[ \chi(M) = \sum_{i=0}^{\dim M} (-1)^i \operatorname{rk} H_i(M; \mathbb{Z}) \qquad [5] \]

\[ \partial_n(\sigma) := \sum_{i=0}^{n} (-1)^i\, \sigma|_{[v_0, \ldots, \hat{v}_i, \ldots, v_n]} \qquad [6] \]
One easily checks that the boundary of a boundary is zero, and hence $(S_*(X), \partial_*)$ defines a
chain complex. Its homology is by definition the singular homology $H_*(X; \mathbb{Z})$ of $X$. For any
simplicial space, the inclusion of the simplicial chains into the singular chains induces an isomorphism
of homology groups. In particular, this implies that the simplicial homology of a manifold, and hence
its Euler characteristic, do not depend on its triangulation.
If in the definition of simplicial and singular homology we take free $R$-modules (where $R$ may
also be a field) instead of free abelian groups, we get the homology $H_*(X; R)$ of $X$ with coefficients
in $R$. The universal-coefficient theorem describes the homology with arbitrary coefficients in terms of
the homology with integer coefficients. In particular, if $R$ is a field of characteristic zero,
\[ \dim H_n(X; R) = \operatorname{rk} H_n(X; \mathbb{Z}) \]

Basic Properties of Singular Homology

While simplicial homology (and the more efficient cellular homology which we will not discuss) is easier
to compute and easier to understand geometrically, singular homology lends itself more easily to
theoretical treatment.

1. Homotopy invariance. Any continuous map $f : X \to Y$ induces a map on homology
$f_* : H_*(X; R) \to H_*(Y; R)$ which only depends on the homotopy class of $f$.
In particular, a homotopy equivalence $f : X \to Y$ induces an isomorphism in homology. So, for example,
the inclusion of the circle $S^1$ into the punctured plane $\mathbb{C} \setminus \{0\}$ is a homotopy
equivalence, and thus
\[ H_i(\mathbb{C} \setminus \{0\}; R) \cong H_i(S^1; R) =
   \begin{cases} R & \text{for } i = 0, 1 \\ 0 & \text{for } i \geq 2 \end{cases} \]
For the one point space we have $H_0(\mathrm{pt}; R) = R$. Define reduced homology by
$\tilde{H}_*(X; R) := \ker(H_*(X; R) \to H_*(\mathrm{pt}; R))$.

2. Dimension axiom. $\tilde{H}_i(\mathrm{pt}; R) = 0$ for all $i$.
More generally, it follows immediately from the definition of simplicial homology that the homology of
any $n$-dimensional manifold is zero in dimensions larger than $n$.
We mentioned in the introduction that homology depends only on local data. This is made precise by the

3. Mayer–Vietoris theorem. Let $X = A \cup B$ be the union of two open subspaces. Then the following
sequence is exact:
\[ \cdots \to H_n(A \cap B; R) \to H_n(A; R) \oplus H_n(B; R) \to H_n(X; R)
   \xrightarrow{\ \partial\ } H_{n-1}(A \cap B; R) \to \cdots \to H_0(X; R) \to 0 \]
On the level of chains, the first map is induced by the diagonal inclusion, while the second map takes
the difference between the first and second summands. Finally, $\partial$ takes a cycle $c = a + b$ in
the chains of $X$ that can be expressed as the sum of a chain $a$ in $A$ and $b$ in $B$ to
$\partial c := \partial_n a = -\partial_n b$. For example, consider two cones, $A$ and $B$, on a space
$X$ and identify them at the base $X$ to define the suspension $\Sigma X$ of $X$. Then
$\Sigma X = A \cup B$ with $A, B \simeq \mathrm{pt}$ and $A \cap B \simeq X$. The boundary map $\partial$
is then an isomorphism:
\[ \tilde{H}_n(\Sigma X; R) \cong \tilde{H}_{n-1}(X; R) \quad \text{for all } n \geq 0 \qquad [7] \]
From this one can easily compute the homology of a sphere. First note that
\[ \tilde{H}_0(X; \mathbb{Z}) = \mathbb{Z}^{k-1} \]
where $k$ is the number of connected components in $X$. Also,
$S^n \simeq \Sigma S^{n-1} \simeq \cdots \simeq \Sigma^n S^0$. Thus, by [7],
\[ H_n(S^n; \mathbb{Z}) = \mathbb{Z} \quad \text{and} \quad \tilde{H}_*(S^n; \mathbb{Z}) = 0
   \ \text{for } * \neq n \qquad [8] \]
If $Y$ is a subspace of $X$, relative homology groups $H_*(X, Y; R)$ can be defined as the homology of
the quotient complex $S_*(X)/S_*(Y)$. When $Y$ has a good neighborhood in $X$ (i.e., it is a neighborhood
deformation retract in $X$), then, by the excision theorem,
\[ H_*(X, Y; R) \cong \tilde{H}_*(X/Y; R) \]
where $X/Y$ denotes the quotient space of $X$ with $Y$ identified to a point. There is a long exact
sequence
\[ \cdots \to H_n(Y; R) \to H_n(X; R) \to H_n(X, Y; R) \xrightarrow{\ \partial\ } H_{n-1}(Y; R)
   \to \cdots \to H_0(X, Y; R) \to 0 \]
This and the Mayer–Vietoris sequence give two ways of breaking up the problem of computing the homology
of a space into computing the homology of related spaces. An iteration of this process leads to the
powerful tool of spectral sequences (see Spectral Sequences).

Relation to Homotopy Groups

Let $\pi_1(X, x_0)$ denote the fundamental group of $X$ relative to the base point $x_0$. These are the
based homotopy classes of based maps from a circle to $X$.
\[ \text{If } X \text{ is connected, then } H_1(X; \mathbb{Z}) \text{ is the abelianization of }
   \pi_1(X, x_0) \qquad [9] \]
Indeed, every map from a (triangulated) sphere to $X$ defines a cycle and hence gives rise to a homology
class. This defines the Hurewicz map $h : \pi_*(X; x_0) \to H_*(X; \mathbb{Z})$. In general there is no
good description of its image. However, if $X$ is $k$-connected with $k \geq 1$, then $h$ induces an
isomorphism in dimension $k + 1$ and an epimorphism in dimension $k + 2$.
Though [9] indicates that homology cannot distinguish between all homotopy types, the fundamental group
is in a sense the only obstruction to this. A simple form of the Whitehead theorem states:
\[ H^i(C^*, \partial^*) := \frac{\ker \partial^*_{i+1}}{\operatorname{im} \partial^*_i} \qquad [10] \]
Evaluation $(\sigma, \varphi) \mapsto \varphi(\sigma)$ descends to a dual pairing
\[ H_n(C_*, \partial_*) \otimes_R H^n(C^*, \partial^*) \to R \]
and when $R$ is a field, this identifies the cohomology groups as the duals of the homology groups. More
generally, the universal-coefficient theorem relates the two. A simple version states: let
$(C_*, \partial_*)$ be a chain complex of free abelian groups (such as the simplicial or singular chain
complexes) with finitely generated homology groups. Then,
\[ H^i(C^*, \partial^*) \cong H_i^{\mathrm{free}}(C_*, \partial_*) \oplus
   H_{i-1}^{\mathrm{tor}}(C_*, \partial_*) \qquad [11] \]
where $H^{\mathrm{tor}}$ denotes the torsion subgroup of $H$ and $H^{\mathrm{free}}$ denotes the quotient
group $H/H^{\mathrm{tor}}$.

Singular Cohomology

The dual $S^*(X)$ of the singular chain complex of a space $X$ carries a natural pairing, the cup
product, $\cup : S^p(X) \otimes S^q(X) \to S^{p+q}(X)$ defined by
\[ (\varphi_1 \cup \varphi_2)(\sigma) := \varphi_1(\sigma|_{[v_0, \ldots, v_p]})\,
   \varphi_2(\sigma|_{[v_p, \ldots, v_{p+q}]}) \]
This descends to a multiplication on cohomology groups and makes
$H^*(X; R) := \bigoplus_{n \geq 0} H^n(X; R)$ into

Steenrod Algebra

The cup product on the chain level is homotopy commutative, but not commutative. Steenrod used this
defect to define operations
\[ Sq^i : H^n(X; \mathbb{Z}_2) \to H^{n+i}(X; \mathbb{Z}_2) \]
for all $i \geq 0$ which refine the cup-squaring operation: when $n = i$, then $Sq^n(x) = x \cup x$.
These are natural group homomorphisms which commute with suspension. Furthermore, they satisfy the
Cartan and Adem relations
\[ Sq^n(x \cup y) = \sum_{i+j=n} Sq^i(x) \cup Sq^j(y) \]
\[ Sq^i Sq^j = \sum_{k=0}^{[i/2]} \binom{j-k-1}{i-2k}\, Sq^{i+j-k} Sq^k \quad \text{for } i < 2j \]
The mod-2 Steenrod algebra $\mathcal{A}$ is then the free $\mathbb{Z}_2$-algebra generated by the
Steenrod squares $Sq^i$, $i \geq 0$, subject only to the Adem relations. With the help of Adem's
relations, Serre and Cartan found a $\mathbb{Z}_2$-basis for $\mathcal{A}$:
\[ \{ Sq^I := Sq^{i_1} \cdots Sq^{i_n} \mid i_j \geq 2 i_{j+1} \text{ for all } j \} \]
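To make the Adem relations and the Serre–Cartan basis condition concrete, here is a small Python sketch that expands a product $Sq^i Sq^j$ with $i < 2j$ into the terms $Sq^{i+j-k} Sq^k$ whose binomial coefficient is odd, and tests the admissibility condition $i_j \geq 2 i_{j+1}$. The function names `adem` and `is_admissible` and the convention of returning exponent pairs `(i + j - k, k)` are assumptions made for this illustration; a pair with second entry 0 stands for a single square, reading $Sq^0$ as the identity.

```python
from math import comb


def adem(i, j):
    """Expand Sq^i Sq^j (i < 2j) via the Adem relation, with coefficients mod 2.

    Returns the list of exponent pairs (i + j - k, k) whose coefficient is 1.
    """
    if not 0 < i < 2 * j:
        raise ValueError("the Adem relation applies only for 0 < i < 2j")
    terms = []
    for k in range(i // 2 + 1):
        # i < 2j and k <= i/2 imply j - k - 1 >= 0, so comb is well defined.
        if comb(j - k - 1, i - 2 * k) % 2 == 1:
            terms.append((i + j - k, k))
    return terms


def is_admissible(exponents):
    """Check the basis condition i_j >= 2 * i_{j+1} for a sequence (i_1, ..., i_n)."""
    return all(a >= 2 * b for a, b in zip(exponents, exponents[1:]))


examples = {(i, j): adem(i, j) for (i, j) in [(1, 1), (1, 2), (2, 2), (2, 3)]}
```

With these conventions the sketch gives $Sq^1 Sq^1 = 0$, $Sq^1 Sq^2 = Sq^3$, $Sq^2 Sq^2 = Sq^3 Sq^1$, and $Sq^2 Sq^3 = Sq^5 + Sq^4 Sq^1$; every term produced is admissible, which is how the relations reduce arbitrary products to the basis above.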
The Steenrod algebra is also a Hopf algebra with a commutative comultiplication
$\Delta : \mathcal{A} \to \mathcal{A} \otimes \mathcal{A}$ induced by
\[ \Delta(Sq^n) := \sum_{i+j=n} Sq^i \otimes Sq^j \]
The Cartan relation implies that the mod-2 cohomology of a space is compatible with the
comultiplication, that is, $H^*(X; \mathbb{Z}_2)$ is an algebra over the Hopf algebra $\mathcal{A}$.
There are odd primary analogs of the Steenrod algebra based on the reduced $p$th power operations
\[ P^i : H^n(X; \mathbb{Z}_p) \to H^{n+2i(p-1)}(X; \mathbb{Z}_p) \]
with similar properties to $\mathcal{A}$.
One of the most striking applications of the Steenrod algebra can be found in the work of Adams on the
vector fields on spheres problem: for each $n$, find the greatest number $k$, denoted $K(n)$, such that
there is a $k$-field on the $(n-1)$-sphere $S^{n-1}$. Recall that a $k$-field is an ordered set of $k$
pointwise linearly independent tangent vector fields. If we write $n$ in the form
$n = 2^{4a+b}(2s+1)$ with $0 \leq b < 4$, Adams proved that $K(n) = 2^b + 8a - 1$. In particular, when
$n$ is odd, $K(n) = 0$. We give an outline of the proof for this special case in the next section.
The failure of associativity of the cup product at the chain level gives rise to secondary operations,
the so-called Massey products.

Cohomology of Smooth Manifolds

A smooth manifold $M$ of dimension $n$ can be triangulated by smooth simplices
$\sigma : \Delta^n \to M$. If $M$ is compact, oriented, without boundary, the sum of these simplices
defines a homology cycle $[M]$, the fundamental class of $M$. The most remarkable property of the
cohomology of manifolds is that it satisfies Poincaré duality: taking cap product with $[M]$ defines an
isomorphism:
\[ D := [M] \cap \_ : H^k(M; \mathbb{Z}) \to H_{n-k}(M; \mathbb{Z}) \quad \text{for all } k \qquad [14] \]
In particular, for connected manifolds, $H^n(M; \mathbb{Z}) \cong \mathbb{Z}$; and every map
$f : M' \to M$ between oriented, compact closed manifolds of the same dimension has a degree:
$f^* : H^n(M; \mathbb{Z}) \to H^n(M'; \mathbb{Z})$ is multiplication by an integer $\deg(f)$, the degree
of $f$. For smooth maps, the degree is the number of points in the inverse image of a generic point
$p \in M$ counted with signs:
\[ \deg(f) = \sum_{p' \in f^{-1}(p)} \operatorname{sign}(p') \]
where $\operatorname{sign}(p')$ is $+1$ or $-1$ depending on whether $f$ is orientation preserving or
reversing in a neighborhood of $p'$. For example, a complex polynomial of degree $d$ defines a map of
the two-dimensional sphere to itself of degree $d$: a generic point has $d$ points in its inverse image
and the map is locally orientation preserving. On the other hand, a map of $S^{n-1}$ induced by a
reflection of $\mathbb{R}^n$ reverses orientation and has degree $-1$. Thus, as degrees multiply on
composing maps, the antipodal map $x \mapsto -x$ has degree $(-1)^n$. As an application we prove:

Every tangent vector field on an even-dimensional sphere $S^{n-1}$ has a zero.

Proof Assume $v(x)$ is a vector field which is nonzero for all $x \in S^{n-1}$. Then $x$ is perpendicular
to $v(x)$, and after rescaling, we may assume that $v(x)$ has length 1. The function
$F(x, t) = \cos(t)\,x + \sin(t)\,v(x)$ is a well-defined homotopy from the identity map ($t = 0$) to the
antipodal map ($t = \pi$). But this is impossible as homotopic maps induce the same map in (co)homology
and we have already seen that the degree of the identity map is 1 while the degree of the antipodal map
is $(-1)^n = -1$ when $n$ is odd.

It is well known that two self-maps of a sphere of any dimension are homotopic if and only if they have
the same degree, that is, $\pi_n(S^n) \cong \mathbb{Z}$ for $n \geq 1$.
When $M$ is not orientable, $[M]$ still defines a cycle in homology with $\mathbb{Z}_2$-coefficients,
and $[M] \cap \_$ defines an isomorphism between the cohomology and homology with $\mathbb{Z}_2$
coefficients.
As $[M]$ represents a homology class, so does every other closed (orientable) submanifold of $M$. It is
however not the case that every homology class can be represented by a submanifold or linear
combinations of such.
Cohomology is a contravariant functor. Poincaré duality however allows us to define, for any
$f : M' \to M$ between oriented, compact, closed manifolds of arbitrary dimensions, a transfer or
Umkehr map,
\[ f^! := D^{-1} f_* D' : H^*(M'; \mathbb{Z}) \to H^{*-c}(M; \mathbb{Z}) \]
which lowers the degree by $c = \dim M' - \dim M$. It satisfies the formula
\[ f^!(f^*(x) \cup y) = x \cup f^!(y) \]
for all $x \in H^*(M; \mathbb{Z})$ and $y \in H^*(M'; \mathbb{Z})$. When $f$ is a covering map then
$f^!$ can be defined on the chain level by
\[ f^!(x)(\sigma) := \sum_{f(\tilde{\sigma}) = \sigma} x(\tilde{\sigma}) \]
where $x \in C^*(M')$ and $\sigma \in C_*(M)$.
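Adams's formula $K(n) = 2^b + 8a - 1$ for the vector fields on spheres problem, quoted earlier in this section, is easy to evaluate mechanically. The Python sketch below is only an illustration of that arithmetic: the function name `max_vector_fields` is an assumption, and the code simply extracts the exponent $4a + b$ of the largest power of 2 dividing $n$.

```python
def max_vector_fields(n):
    """Return K(n) = 2**b + 8*a - 1, where n = 2**(4a+b) * (odd) and 0 <= b < 4.

    K(n) is the maximal number of pointwise linearly independent tangent
    vector fields on the sphere S^(n-1), as stated in the text.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    exponent = 0
    while n % 2 == 0:  # strip factors of 2 to find the exponent 4a + b
        n //= 2
        exponent += 1
    a, b = divmod(exponent, 4)
    return 2 ** b + 8 * a - 1


table = {n: max_vector_fields(n) for n in (1, 2, 3, 4, 8, 16, 32)}
```

For odd $n$ the exponent is 0, so $K(n) = 0$: this is the statement proved above that every tangent vector field on an even-dimensional sphere $S^{n-1}$ has a zero. The values for $n = 2, 4, 8$ are $1, 3, 7$, matching the spheres $S^1$, $S^3$, $S^7$.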
divergence. An easy exercise shows that $d^2 = 0$ and the $q$th de Rham cohomology of $\mathbb{R}^n$ is
the vector space
\[ H^q_{\mathrm{de\,R}}(\mathbb{R}^n) :=
   \frac{\ker d : \Omega^q(\mathbb{R}^n) \to \Omega^{q+1}(\mathbb{R}^n)}
        {\operatorname{im} d : \Omega^{q-1}(\mathbb{R}^n) \to \Omega^q(\mathbb{R}^n)} \]
More generally, the de Rham complex $\Omega^*(M)$ and its cohomology $H^*_{\mathrm{de\,R}}(M)$ can be
defined for any smooth manifold $M$.
Let $\sigma$ be a smooth, singular, real $(q+1)$-chain on $M$, and let $\omega \in \Omega^q(M)$. Stokes'
theorem then says
\[ \int_{\partial\sigma} \omega = \int_{\sigma} d\omega \]
and therefore integration defines a pairing between the $q$th singular homology and the $q$th de Rham
cohomology of $M$. This pairing is exact and thus de Rham cohomology is isomorphic to singular
cohomology with real coefficients:
\[ H^*_{\mathrm{de\,R}}(M) \cong H_*(M; \mathbb{R})^* \cong H^*(M; \mathbb{R}) \]
Let $\Omega^*_c(M)$ denote the subcomplex of compactly supported forms and $H^*_c(M)$ its cohomology.
Integration with respect to the first $i$ coordinates defines a map
\[ \Omega^*_c(\mathbb{R}^n) \to \Omega^{*-i}_c(\mathbb{R}^{n-i}) \]
which induces an isomorphism in cohomology; note in particular $H^n_c(\mathbb{R}^n) = \mathbb{R}$. More
generally, when $E \to M$ is an $i$-dimensional orientable, real vector bundle over a compact,
orientable manifold $M$, integration over the fiber gives the Thom isomorphism:
\[ H^*_c(E) \cong H^{*-i}_c(M) = H^{*-i}_{\mathrm{de\,R}}(M) \]
For orientable fiber bundles $F \to M' \xrightarrow{\ f\ } M$ with compact, orientable fiber $F$,
integration over the fiber provides another definition of the transfer map
\[ f^! : H^*_{\mathrm{de\,R}}(M') \to H^{*-i}_{\mathrm{de\,R}}(M) \]

With respect to this inner product $*$ is an isometry. Define the codifferential via
\[ \delta := (-1)^{nq+n+1} * d\, * : \Omega^q(M) \to \Omega^{q-1}(M) \]
and the Laplace–Beltrami operator via
\[ \Delta := \delta d + d \delta \]
The codifferential satisfies $\delta^2 = 0$ and is the adjoint of the differential. Indeed, for
$q$-forms $\omega$ and $(q+1)$-forms $\omega'$:
\[ \langle d\omega, \omega' \rangle = \langle \omega, \delta\omega' \rangle \qquad [15] \]
It follows easily that $\Delta$ is self-adjoint, and furthermore,
\[ \Delta\omega = 0 \quad \text{if and only if} \quad d\omega = 0 \ \text{and} \ \delta\omega = 0
   \qquad [16] \]
A form $\omega$ satisfying $\Delta\omega = 0$ is called harmonic. Let $\mathcal{H}^q$ denote the
subspace of all harmonic $q$-forms. It is not hard to prove the Hodge decomposition theorem:
\[ \Omega^q = \mathcal{H}^q \oplus \operatorname{im} d \oplus \operatorname{im} \delta \]
Furthermore, by adjointness [15], a form $\omega$ is closed only if it is orthogonal to
$\operatorname{im} \delta$. On calculating the de Rham cohomology we can also ignore the summand
$\operatorname{im} d$ and find that:

Each de Rham cohomology class on a compact oriented Riemannian manifold $M$ contains a unique harmonic
representative, that is, $H^q_{\mathrm{de\,R}}(M) \cong \mathcal{H}^q$.

Warning: This is an isomorphism of vector spaces and in general does not extend to an isomorphism of
algebras.

Examples

We list the cohomology of some important examples.
are stunted polynomial rings with $\deg(y) = 2$ and $\deg(z) = 4$.

Lie Groups

Let $G$ be a compact, connected Lie group of rank $l$, that is, the dimension of the maximal torus of
$G$ is $l$. Then,
\[ H^*(G; \mathbb{Q}) \cong \Lambda_{\mathbb{Q}}[a_{2d_1-1}, a_{2d_2-1}, \ldots, a_{2d_l-1}] \]
where $|a_i| = i$ and $d_1, \ldots, d_l$ are the fundamental degrees of $G$ which are known for all $G$.
Often this structure lifts to the integral cohomology. In particular we have:
\[ H^*_{\mathrm{free}}(SO(2k+1); \mathbb{Z}) \cong \Lambda_{\mathbb{Z}}[a_3, a_7, \ldots, a_{4k-1}] \]
\[ H^*_{\mathrm{free}}(SO(2k); \mathbb{Z}) \cong
   \Lambda_{\mathbb{Z}}[a_3, a_7, \ldots, a_{4k-5}, a_{2k-1}] \]
\[ H^*(U(k); \mathbb{Z}) \cong \Lambda_{\mathbb{Z}}[a_1, a_3, \ldots, a_{2k-1}] \]

with $d_i$ as above and $|x_i| = i$. In particular,
\[ H^*(BSO(2k+1); \mathbb{Z}[1/2]) \cong \mathbb{Z}[1/2][p_1, p_2, \ldots, p_k] \]
\[ H^*(BSO(2k); \mathbb{Z}[1/2]) \cong \mathbb{Z}[1/2][p_1, p_2, \ldots, p_{k-1}, e_k] \]
\[ H^*(BU(k); \mathbb{Z}) \cong \mathbb{Z}[c_1, c_2, \ldots, c_k] \]
where the Pontryagin, Euler, and Chern classes have degree $|p_i| = 4i$, $|e_k| = 2k$, and
$|c_i| = 2i$, respectively.

Moduli Spaces

Let $\mathcal{M}^n_g$ be the space of Riemann surfaces of genus $g$ with $n$ ordered, marked points.
There are naturally defined classes $\kappa_i$ and $e_1, \ldots, e_n$ of degree $2i$ and 2,
respectively. By Harer–Ivanov stability and the recent proof of the Mumford conjecture (Madsen–Weiss,
preprint 2004), there is an isomorphism up to degree $* < 3g/2$ of the rational cohomology of
$\mathcal{M}^n_g$ with
\[ \mathbb{Q}[\kappa_1, \kappa_2, \ldots] \otimes \mathbb{Q}[e_1, \ldots, e_n] \]
The rational cohomology vanishes in degrees $* > 4g - 5$ if $n = 0$, and $* > 4g - 4 + n$ if $n > 0$.
Though the stable part of the cohomology is now well understood, the structure of the unstable part, as
proposed by Faber (Viehweg 1999), remains conjectural.
and it follows that every generalized cohomology theory is represented by an infinite loop space
\[ E_0 \simeq \Omega E_1 \simeq \cdots \simeq \Omega^n E_n \simeq \cdots \]
Vice versa, any such infinite loop space gives rise to a generalized cohomology theory.
One may think of infinite loop spaces as the abelian groups up to homotopy in the strongest sense.
Indeed, ordinary cohomology with integer coefficients is represented by
\[ \mathbb{Z} \simeq \Omega S^1 \simeq \Omega^2 \mathbb{CP}^\infty \simeq \cdots \simeq
   \Omega^n K(n, \mathbb{Z}) \simeq \cdots \]
where by definition the Eilenberg–MacLane space $K(n, \mathbb{Z})$ has trivial homotopy groups for all
dimensions not equal to $n$ and $\pi_n K(n, \mathbb{Z}) = \mathbb{Z}$. Complex K-theory is represented
by
\[ \mathbb{Z} \times BU \simeq \Omega U \simeq \Omega^2 BU \simeq \Omega^3 U \simeq \cdots \]

example, the category of finite-dimensional, complex vector spaces and their isomorphisms gives rise to
$\mathbb{Z} \times BU$. To give another example, in quantum field theory, one considers the
$(d+1)$-dimensional cobordism category with objects the compact, oriented $d$-dimensional manifolds, and
their $(d+1)$-dimensional cobordisms as morphisms. Disjoint union of manifolds makes this category into
a symmetric monoidal category. The associated infinite loop space and hence generalized cohomology
theory has recently been identified as a $(d+1)$-dimensional slice of oriented cobordism theory
(Galatius et al. preprint 2005).

See also: Characteristic Classes; Equivariant Cohomology and the Cartan Model; Functional Equations and
Integrable Systems; Index Theorems; Intersection Theory; K-Theory; Moduli Spaces: An Introduction;
Riemann Surfaces; Spectral Sequences.
Combinatorics: Overview
C Krattenthaler, Universität Wien, Vienna, Austria
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Combinatorics is a vast field which enters particularly in a crucial way in statistical physics. There,
it is particularly the enumerative problems that are of importance. Therefore, in this article, we shall
mainly concentrate on the enumerative aspects of combinatorics. We first recall the basic terminology,
in particular the basic combinatorial objects and numbers, together with the simplest facts about them.
We then provide introductions into the most important techniques of enumeration: the generating function
technique, Redfield–Pólya theory, methods of solving functional equations of combinatorial origin,
methods of asymptotic enumeration, the theory of heaps, and the transfer matrix method. The subsequent
sections then discuss specific problem circles with relation to statistical physics more closely. We
discuss lattice path problems, explain Kasteleyn's method of enumerating perfect matchings and tilings,
present the fundamental theorems on nonintersecting paths, and provide an introduction into the research
field involving vicious walkers, plane partitions, rhombus tilings, alternating sign matrices,
six-vertex configurations, and fully packed loop configurations. Finally, we explain how one should
treat binomial and hypergeometric series, which frequently arise in enumeration problems.
\[ x(x+1) \cdots (x+n-1) = \sum_{k=0}^{n} (-1)^{n-k} s(n, k)\, x^k \]
or in form of the double (formal) power series
\[ \sum_{n,k \geq 0} s(n, k)\, x^k \frac{y^n}{n!} = (1+y)^x \]
A partition of a set is a collection of pairwise disjoint subsets the union of which is the complete
set. The subsets in the collection are called the blocks of the partition. The total number of
partitions of an $n$-element set is the Bell number $B_n$. These numbers are given by
\[ \sum_{n \geq 0} B_n \frac{x^n}{n!} = e^{e^x - 1} \]
The number of partitions of an $n$-element set into exactly $k$ blocks is the Stirling number of the
second kind, $S(n, k)$. These numbers are given by
\[ \sum_{n,k \geq 0} S(n, k)\, x^k \frac{y^n}{n!} = e^{x(e^y - 1)} \]
or, explicitly, by
\[ S(n, k) = \frac{1}{k!} \sum_{j=0}^{k} (-1)^{k-j} \binom{k}{j} j^n \]
Finally, if $p(n, k, m)$ denotes the number of partitions of $n$ into at most $k$ parts, all of which
are at most $m$, then
\[ \sum_{n \geq 0} p(n, k, m)\, x^n =
   \frac{(1 - x^{k+m})(1 - x^{k+m-1}) \cdots (1 - x^{m+1})}{(1 - x^k)(1 - x^{k-1}) \cdots (1 - x)} \]
The expression on the right-hand side is called the q-binomial coefficient, and is denoted by
$\left[ {k+m \atop k} \right]_x$.
Partitions are frequently encoded in terms of their Ferrers diagrams. The Ferrers diagram of a partition
$\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_\ell)$ is an array of cells with left-justified rows
and $\lambda_i$ cells in row $i$. For example, the diagram in Figure 1 is the Ferrers diagram of the
partition $(3, 3, 2)$.
A lattice path $P$ in $\mathbb{Z}^d$ (where $\mathbb{Z}$ denotes the set of integers) is a path in the
$d$-dimensional integer lattice $\mathbb{Z}^d$ which uses only points of the lattice, that is, it is a
sequence $(P_0, P_1, \ldots, P_l)$, where $P_i \in \mathbb{Z}^d$ for all $i$. The vectors
$\overrightarrow{P_0 P_1}, \overrightarrow{P_1 P_2}, \ldots, \overrightarrow{P_{l-1} P_l}$ are called the
steps of $P$. The number of steps, $l$, is called the length of $P$. Figure 2 shows a lattice path in
$\mathbb{Z}^2$ of length 11.
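The explicit formula for the Stirling numbers of the second kind, the Bell numbers, and the q-binomial generating function for $p(n, k, m)$ from this section can be cross-checked numerically. The Python sketch below is one way to do so under stated assumptions: the function names are invented for the illustration, the Bell numbers are obtained by summing $S(n, k)$ over $k$, and the q-binomial coefficient is built from the standard recurrence $\left[{n \atop k}\right]_x = \left[{n-1 \atop k-1}\right]_x + x^k \left[{n-1 \atop k}\right]_x$, which is not stated in the text.

```python
from math import comb, factorial


def stirling2(n, k):
    """S(n, k) = (1/k!) * sum_{j=0..k} (-1)^(k-j) * C(k, j) * j^n."""
    total = sum((-1) ** (k - j) * comb(k, j) * j ** n for j in range(k + 1))
    return total // factorial(k)


def bell(n):
    """Bell number: all set partitions of an n-set, grouped by number of blocks."""
    return sum(stirling2(n, k) for k in range(n + 1))


def q_binomial(n, k):
    """Coefficient list (lowest degree first) of the q-binomial [n choose k]_x."""
    if k < 0 or k > n:
        return [0]
    if k == 0 or k == n:
        return [1]
    low = q_binomial(n - 1, k - 1)
    high = [0] * k + q_binomial(n - 1, k)  # multiply by x^k
    size = max(len(low), len(high))
    low += [0] * (size - len(low))
    high += [0] * (size - len(high))
    return [a + b for a, b in zip(low, high)]


def bounded_partitions(n, k, m):
    """p(n, k, m): partitions of n into at most k parts, each part at most m."""
    if n == 0:
        return 1
    if k == 0 or m == 0:
        return 0
    # Choose the largest part, then partition the rest with parts no larger.
    return sum(bounded_partitions(n - part, k - 1, part)
               for part in range(1, min(m, n) + 1))


bell_numbers = [bell(n) for n in range(7)]
gauss_4_2 = q_binomial(4, 2)
```

The coefficient of $x^n$ in `q_binomial(k + m, k)` should equal `bounded_partitions(n, k, m)`; for $k = m = 2$ the polynomial is $1 + x + 2x^2 + x^3 + x^4$, the two partitions of 2 being $(2)$ and $(1, 1)$.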
The reader is referred to exercise 6.19 in Stanley (1999) for countless occurrences of the Catalan
numbers.
A Motzkin path is a lattice path in the integer plane $\mathbb{Z}^2$ consisting of up-steps $(1, 1)$,
level steps $(1, 0)$, and down-steps $(1, -1)$, which starts at the origin, never passes below the
$x$-axis, and ends on the $x$-axis. The path in Figure 2 is in fact a Motzkin path. The number of
Motzkin paths of length $n$ is the Motzkin number
\[ M_n = \sum_{k \geq 0} \frac{1}{k+1} \binom{2k}{k} \binom{n}{2k} \]
The generating function for these numbers is
\[ \sum_{n=0}^{\infty} M_n x^n = \frac{1 - x - \sqrt{1 - 2x - 3x^2}}{2x^2} \qquad [2] \]
The reader is referred to exercise 6.38 in Stanley (1999) for numerous occurrences of the Motzkin
numbers.
A Schröder path is a lattice path in the integer plane $\mathbb{Z}^2$ consisting of horizontal steps
$(1, 0)$ and

The generating function for these numbers is
\[ \sum_{n=0}^{\infty} S_n x^n = \frac{1 - x - \sqrt{1 - 6x + x^2}}{2x} \qquad [3] \]
The reader is referred to exercise 6.39 in Stanley (1999) for numerous occurrences of the Schröder
numbers.
There is another famous sequence of numbers which we have not touched on yet, the Fibonacci numbers
$F_n$. They are given by
\[ F_n = \frac{1}{\sqrt{5}} \left( \frac{1 + \sqrt{5}}{2} \right)^{n+1}
       - \frac{1}{\sqrt{5}} \left( \frac{1 - \sqrt{5}}{2} \right)^{n+1} \]
with generating function
\[ \sum_{n=0}^{\infty} F_n x^n = \frac{1}{1 - x - x^2} \qquad [4] \]
They also occur in numerous places. For example, the number $F_n$ counts all paths on the integers
$\mathbb{Z}$ from 0 to $n$ with steps $(1, 0)$ and $(2, 0)$.
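The closed forms for the Motzkin and Fibonacci numbers above can be checked against simple recurrences. The Python sketch below is illustrative only: the function names are assumptions, the Motzkin recurrence $M_n = M_{n-1} + \sum_{i=0}^{n-2} M_i M_{n-2-i}$ is the coefficient form of the functional equation $M(x) = 1 + xM(x) + x^2 M(x)^2$ satisfied by the generating function [2], and the Fibonacci check counts paths from 0 to $n$ by the length of their last step.

```python
from math import comb, sqrt


def motzkin_closed(n):
    """M_n = sum_k Catalan(k) * C(n, 2k), with Catalan(k) = C(2k, k) / (k + 1)."""
    return sum(comb(2 * k, k) // (k + 1) * comb(n, 2 * k)
               for k in range(n // 2 + 1))


def motzkin_recurrence(count):
    """First `count` Motzkin numbers from M(x) = 1 + x M(x) + x^2 M(x)^2."""
    m = [1]
    for n in range(1, count):
        m.append(m[n - 1] + sum(m[i] * m[n - 2 - i] for i in range(n - 1)))
    return m


def step_paths(n):
    """Number of paths from 0 to n on the integers using steps of length 1 and 2."""
    current, following = 1, 1  # counts for targets 0 and 1
    for _ in range(n):
        current, following = following, current + following
    return current


def fibonacci_closed(n):
    """F_n from the closed form with exponent n + 1 (floating point, then rounded)."""
    root5 = sqrt(5)
    value = (((1 + root5) / 2) ** (n + 1) - ((1 - root5) / 2) ** (n + 1)) / root5
    return round(value)


motzkin_numbers = [motzkin_closed(n) for n in range(7)]
fibonacci_numbers = [step_paths(n) for n in range(8)]
```

Because `fibonacci_closed` uses floating-point arithmetic, the comparison with `step_paths` is only reliable for moderate $n$; exact integer arithmetic would be needed for large indices.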
Figure 3 A Dyck path.

An undirected graph $G$ consists of vertices and edges. An edge is a two-element subset of the vertices,
which, however, is thought of as a line or curve connecting the two vertices. See Figure 5a for an
example. The usual notation for a graph $G$ is $G = (V, E)$, where $V$ is the set of vertices and $E$ is
the set of edges of $G$. A graph is planar if it is
(ordinary) generating function for $\mathcal{A}$ is the formal power series
\[ F_{\mathcal{A}}(x) = \sum_{a \in \mathcal{A}} x^{|a|} = \sum_{n=0}^{\infty} a_n x^n \]
contains. In order that A(B) contains only a finite If A and B are two sets of objects, one defines
number of objects of a given size, we must assume again several other sets of objects using them. The
that B contains no elements of size 0. If, in addition, union of A and B, written A [ B, has as a groundset
the atoms of any element a from A inherit an order the disjoint union of A and B, and the size of an
(e.g., if A is a set of binary trees, then the leaves of a element from A is its size in A, while the size of an
binary tree are ordered in a natural way from left element from B is its size in B. We have
to right), then we have
EA[B x EA x EB x 13
FAB
x FA FB x 8
To define the product of A and B, written A B,
However, this equation is not true in general. The we cannot simply take A B as a groundset, we
general formula comes out of RedfieldPolya theory must also say something about the labeling of the
(see [21] and [24]) and requires the notion of cycle objects. So, as a groundset we take all pairs (a, b)
index series. For example, if B is the set of connected with a 2 A and b 2 B, but labeled in all possible
(unlabeled) graphs, A is Sets, so that A(B) is the ways by 1, 2, . . . , jaj jbj such that the order of
set of all (connected and disconnected) graphs, then labels assigned to a respects the original order of
[8] is not true, but what is true is labels of a, and the same for b. The size of such an
element (a,b) is again the sum of the sizes of a (in A)
FSetsB exp FB x 12 FB x2 13 FB x3 9 and of b (in B). We have
This holds, in fact, for any set B of unlabeled objects. EAB x EA x EB x 14
(This is seen by combining [24], [17], and [21].)
Next we deal with the enumeration of labeled objects. Let A be a set of labeled objects, again, each object a with a certain size |a| which is a non-negative integer. Labeled means that each object of size n, by its structure, comes with n atoms (nodes) which are labeled 1, 2, ..., n. For example, A may be the set of all labeled graphs, where the size of a graph is the number of its vertices, and where the vertices are labeled 1, 2, ..., n. Again, we assume that there is only a finite number of objects from A of a given size. Let $a_n$ be the number of objects from A of size n. The exponential generating function for A is the formal power series
\[ E_A(x) = \sum_{a \in A} \frac{x^{|a|}}{|a|!} = \sum_{n \ge 0} a_n \frac{x^n}{n!} \]
Typical examples are Sets (the collection containing all labeled sets, that is, all objects of the form {1, 2, ..., n}, including the empty set), Permutations, Cycles (labeled cycles), with respective generating functions
\[ E_{\mathrm{Sets}}(x) = \exp(x) \tag{10} \]
\[ E_{\mathrm{Permutations}}(x) = \frac{1}{1-x} \tag{11} \]
\[ E_{\mathrm{Cycles}}(x) = \log\frac{1}{1-x} \tag{12} \]
or Trees (labeled trees). The explicit form of the generating function for Trees is discussed in the section Solving equations for generating functions: the Lagrange inversion formula and the kernel method.
Since, in the labeled world, objects come automatically with atoms, the substitution of two sets A and B of objects can now always be defined. The substitution of B in A, denoted by A(B), is the set of objects which arises by replacing the atoms of objects from A by objects from B in all possible ways, and labeling the substituted objects in all possible ways by 1, 2, ..., $\sum_b |b|$ (the sum being over the objects from B which were put in the places of the atoms) that are consistent with the original labelings of the objects from B. The size of an object from A(B) is the sum of the sizes of the objects from B that it contains. In order that A(B) contains only a finite number of objects of a given size, we must assume that B contains no elements of size 0. Then we have
\[ E_{A(B)}(x) = E_A(E_B(x)) \tag{15} \]
An example of a composition is
\[ \mathrm{Permutations} = \mathrm{Sets}(\mathrm{Cycles}) \]
Thus, from [15] we have
\[ E_{\mathrm{Permutations}}(x) = E_{\mathrm{Sets}}(E_{\mathrm{Cycles}}(x)) \]
corresponding to the identity
\[ \frac{1}{1-x} = \exp\left(\log\frac{1}{1-x}\right) \]
Another manifestation of the composition rule is, for example, the fact (which is sometimes called the exponential principle) that, if one takes the log of the partition function for some maps, the result is the partition function for the connected maps among them.
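As a small sanity check of the composition rule [15], the following sketch verifies the identity Permutations = Sets(Cycles) at the level of coefficients. It is only an illustration under stated assumptions: the helper name `exp_of_series` and the truncation order `N` are choices made here, not notation from the text, and the exponential of a series is computed with the standard recurrence obtained from E' = B'E.

```python
from fractions import Fraction
from math import factorial

N = 8  # truncation order, an arbitrary choice for this illustration


def exp_of_series(b, order):
    """Coefficients of exp(B(x)) up to x^order, for a series B with b[0] == 0.

    Differentiating E = exp(B) gives E' = B' * E, hence the recurrence
    n * e_n = sum_{k=1..n} k * b_k * e_{n-k}.
    """
    e = [Fraction(0)] * (order + 1)
    e[0] = Fraction(1)
    for n in range(1, order + 1):
        e[n] = sum(k * b[k] * e[n - k] for k in range(1, n + 1)) / n
    return e


# E_Cycles(x) = log(1/(1-x)) = sum_{n>=1} x^n / n,
# because there are (n-1)! labeled cycles on n atoms.
cycles = [Fraction(0)] + [Fraction(1, n) for n in range(1, N + 1)]

# E_Sets(E_Cycles(x)) should equal E_Permutations(x) = 1/(1-x).
perms = exp_of_series(cycles, N)

# Multiply the x^n/n! coefficient by n! to recover the number of objects of size n.
counts = [int(perms[n] * factorial(n)) for n in range(N + 1)]
print(counts)
```

Each entry of `counts` is the number of permutations of an n-element set, recovered purely from the generating function of cycles.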
All of the above can be generalized to a weighted setting. Namely, if A is a set of objects (labeled or unlabeled), and if w : A → R is a weight function from A into some ring R, then all of the above remains true, if we replace the definitions of $F_A(x)$ and $E_A(x)$ above by the weighted sums
\[ F_A(x) = \sum_{a \in A} w(a)\, x^{|a|} \]
our familiar families of objects, compact expressions are available:
\[ Z_{\mathrm{Sets}}(x_1, x_2, \ldots) = \exp\left(x_1 + \frac{x_2}{2} + \frac{x_3}{3} + \cdots\right) \tag{17} \]
\[ Z_{\mathrm{Permutations}}(x_1, x_2, \ldots) = \prod_{i=1}^{\infty} \frac{1}{1 - x_i} \tag{18} \]
\[ Z_{\mathrm{Cycles}}(x_1, x_2, \ldots) = \sum_{i=1}^{\infty} \frac{\varphi(i)}{i} \log\frac{1}{1 - x_i} \tag{19} \]
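To make [19] concrete, here is a hedged sketch that counts necklaces (cyclic arrangements of n pearls in k colors, up to rotation). It assumes the substitution $x_i = k x^i$ into $Z_{\mathrm{Cycles}}$, with $\varphi$ the Euler totient; extracting the coefficient of $x^n$ then gives $\frac{1}{n}\sum_{d \mid n} \varphi(d)\, k^{n/d}$. The function names are hypothetical, and the brute-force count is included only as an independent cross-check.

```python
from itertools import product
from math import gcd


def phi(m):
    """Euler totient: number of 1 <= j <= m with gcd(j, m) == 1."""
    return sum(1 for j in range(1, m + 1) if gcd(j, m) == 1)


def necklaces_formula(n, k):
    """Coefficient of x^n in Z_Cycles(k x, k x^2, k x^3, ...)."""
    total = sum(phi(d) * k ** (n // d) for d in range(1, n + 1) if n % d == 0)
    return total // n


def necklaces_brute(n, k):
    """Count colorings of a cycle of length n up to rotation by listing them."""
    seen = set()
    for word in product(range(k), repeat=n):
        # The lexicographically smallest rotation is a canonical representative.
        seen.add(min(word[i:] + word[:i] for i in range(n)))
    return len(seen)


for n in range(1, 8):
    print(n, necklaces_formula(n, 2), necklaces_brute(n, 2))
```

The two columns printed for each n agree, which is what the generating-function identity predicts for two colors.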
color from the set of colors C. The question that we pose is: how many different colored objects of a given size are there? In our example, if C consists of the two colors black and white, then we are asking the question of how many necklaces one can make out of n pearls that can be black or white. In terms of generating functions, we want to compute
\[ \tilde A(x) = \sum_{c} x^{|c|} \]
where the sum is over all colored objects c that one can obtain by coloring the objects from $\tilde A$.
The central result of Redfield–Pólya theory is that, if A is the set of labeled objects that one obtains from $\tilde A$ by labeling the objects of $\tilde A$ in all possible ways, then
\[ \tilde A(x) = Z_A(|C|x, |C|x^2, |C|x^3, \ldots) \]
There is again a weighted version. One allows the objects a from $\tilde A$ to have weight w(a) ∈ R. Moreover, one assumes a weight function f : C → R on the colors with values in the ring R. One defines the weight of a colored object obtained by coloring the atoms of a to be w(a) multiplied by the product of all f(c), where c ranges over all the colors of the atoms (including repetitions of colors). Let $\tilde A(w, f)$ denote the sum of all the weights of all colored objects obtained from $\tilde A$. Then
\[ \tilde A(w, f) = Z_A\left(\sum_{c \in C} f(c), \sum_{c \in C} f(c)^2, \sum_{c \in C} f(c)^3, \ldots\right) \]
We remark that these results cover also the case of enumeration of objects under a group action. This includes the enumeration of objects on which we impose certain symmetries. See Bergeron et al. (1998, appendix 1), de Bruijn (1981), and Stanley (1999, chapter 7) for more details. The enumeration of asymmetric objects is the subject of an ongoing research program (cf. Labelle and Lamathe (2004)).

Solving Equations for Generating Functions: The Lagrange Inversion Formula and the Kernel Method

In this section, we describe two methods to solve functional equations for generating functions. The Lagrange inversion makes it possible (in some situations) to find explicit expressions for the coefficients of an implicitly given series. The kernel method (and its extensions), on the other hand, is a powerful method to obtain an explicit expression for an implicitly given function. We refer the reader to Flajolet and Sedgewick (section VII.5 of the reference in the Further reading section) for further reading.
In many situations it will happen that, when we apply the methods from the last section, we end up with a functional equation for the generating function $f(x) = \sum_{n \ge 0} f_n x^n$ that we wanted to compute. For example, if $t_n$ denotes the number of labeled rooted trees with n nodes, and if we write $T(x) = \sum_{n \ge 1} t_n x^n/n!$, then, by applying a straightforward decomposition of a tree into its root and its set of subtrees attached to the root, we obtain the equation
\[ T(z) = z \exp(T(z)) \tag{25} \]
How does one solve such an equation? As a matter of fact, for T(z), there is no expression in terms of known functions. However, the Lagrange inversion formula enables one to find the coefficients $t_n/n!$ of T(z) explicitly. The theorem reads as follows.
Theorem. Let g(x) be a formal Laurent series containing only a finite number of negative powers of x, and let f(x) be a formal power series without constant term. If we expand g(x) in powers of f(x),
\[ g(x) = \sum_{k} c_k f^k(x) \tag{26} \]
then the coefficients $c_n$ are given by
\[ c_n = \frac{1}{n} [x^{-1}]\, g'(x) f^{-n}(x) \quad \text{for } n \ne 0 \tag{27} \]
or, alternatively, by
\[ c_n = [x^{-1}]\, g(x) f'(x) f^{-n-1}(x) \tag{28} \]
Here, $[x^n]h(x)$ denotes the coefficient of $x^n$ in the power series h(x).
With this theorem in hand, eqn [25] is easy to solve. We write it in the form
\[ T(x) \exp(-T(x)) = x \tag{29} \]
We want to know the coefficients in the expansion $T(x) = \sum_{n \ge 0} t_n x^n/n!$. Since, by [29], T(x) is the compositional inverse of $x \exp(-x)$, substitution of $x \exp(-x)$ instead of x gives
\[ x = \sum_{n \ge 0} \frac{t_n}{n!} \left(x \exp(-x)\right)^n \]
This equation is in the form [26] with $f(x) = x \exp(-x)$ and g(x) = x. Hence, by [27], we obtain
\[ \frac{t_n}{n!} = \frac{1}{n} [x^{-1}] \left(x \exp(-x)\right)^{-n} = \frac{1}{n} [x^{n-1}] \exp(nx) = \frac{n^{n-1}}{n!} \]
and, thus, $t_n = n^{n-1}$.
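The result of the Lagrange inversion can be cross-checked without the theorem, by solving [25] coefficient by coefficient. The sketch below is an illustration only: it iterates $T \mapsto x \exp(T)$ on truncated power series with exact rational arithmetic (each pass fixes one more coefficient) and compares with $n^{n-1}$. The function names `exp_series` and `rooted_tree_counts` are assumptions made for this example.

```python
from fractions import Fraction
from math import factorial


def exp_series(b, order):
    """Coefficients of exp(B(x)) up to x^order, for a series B with b[0] == 0.

    Uses n * e_n = sum_{k=1..n} k * b_k * e_{n-k}, which follows from E' = B' * E.
    """
    e = [Fraction(0)] * (order + 1)
    e[0] = Fraction(1)
    for n in range(1, order + 1):
        e[n] = sum(k * b[k] * e[n - k] for k in range(1, n + 1)) / n
    return e


def rooted_tree_counts(order):
    """Numbers t_0, ..., t_order obtained from the fixed point of T = x * exp(T)."""
    t = [Fraction(0)] * (order + 1)
    for _ in range(order):
        e = exp_series(t, order)
        # Multiplying by x shifts all coefficients up by one position.
        t = [Fraction(0)] + e[:order]
    # t[n] is the coefficient of x^n, that is t_n / n!.
    return [int(t[n] * factorial(n)) for n in range(order + 1)]


counts = rooted_tree_counts(7)
print(counts[1:])
print([n ** (n - 1) for n in range(1, 8)])
```

Both printed lists coincide, in agreement with $t_n = n^{n-1}$.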
The second method to solve functional equations which we explain in this section is the kernel method. We illustrate the method by an example. Let us consider the problem of counting Dyck paths of length 2n (see the section Basic combinatorial terminology). Rather than attempting to arrive at a solution of the problem directly, we consider the more general problem of counting the number $a_{n,k}$ of paths consisting of steps (1, 1) and (1, −1), which start at the origin, never drop below y = 0, have length n, and end at height k. We then form the bivariate generating function $F(u, x) = \sum_{n,k \ge 0} a_{n,k} x^n u^k$. We then have the functional equation
\[ F(u, x) = 1 + xuF(u, x) + \frac{x}{u}\left(F(u, x) - F(0, x)\right) \tag{30} \]
since a path can be empty (this explains the term 1), it can end by a step (1, 1) (this explains the term xuF(u, x)), or it can end by a step (1, −1). The latter can only happen if the path before that last step did not end at height 0. The generating function for these paths is F(u, x) − F(0, x), and this explains the third term in the eqn [30]. In fact, we may replace [30] by
\[ F(u, x) = 1 + xuF(u, x) + \frac{x}{u}\left(F(u, x) - F_1(x)\right) \tag{31} \]
because [31] implies that $F_1(x) = F(0, x)$.
The idea of the kernel method is to get rid of the unknown series F(u, x). This is possible because F(u, x) occurs linearly in [31], which can be rewritten as
\[ F(u, x)\left(1 - xu - \frac{x}{u}\right) = 1 - \frac{x}{u} F_1(x) \tag{32} \]
We simply equate the coefficient of F(u, x) in this equation to zero,
\[ 1 - xu - \frac{x}{u} = 0 \]
solve this for u,
\[ u = \frac{1 - \sqrt{1 - 4x^2}}{2x} \]
(the other solution for u makes no sense in [31]), and substitute this back in [32], to obtain
\[ F_1(x) = \frac{1 - \sqrt{1 - 4x^2}}{2x^2} \]
the familiar generating function [2] for the Catalan numbers. Now, by substituting this result in [31], we can even compute the full series F(u, x).
While this was certainly a complicated, and unusual, way to compute the Catalan numbers, this approach generalizes when one considers paths with different step sets (see section VII.5 of the Flajolet and Sedgewick reference in the Further reading section). In a more general situation, one has a functional equation
\[ P(F(u, x), F_1(x), \ldots, F_k(x), x, u) = 0 \tag{33} \]
where F(u, x) appears linearly, as well as the unknown series $F_1(x), \ldots, F_k(x)$, whereas x and u appear rationally. It is clear that one can apply the same technique, namely collecting all the terms involving F(u, x), equating the coefficient of F(u, x) to zero, solving for u and substituting back in [33]. If there is more than one function $F_i(x)$, then this will only give one equation for $F_i(x)$. However, when equating the coefficient of F(u, x), which was a polynomial equation, there can be more solutions. (That was actually also the case in our example, although only one solution could be used.) All these solutions can be substituted in [33] to give many more equations for $F_i(x)$. The kernel method will work if we have enough equations to determine the unknown functions $F_i(x)$ (see the Flajolet and Sedgewick reference, section VII.5 for further details).
In the variant of the obstinate kernel method, more equations are produced in more sophisticated ways. The method has been largely extended by Bousquet-Mélou and co-workers to cover equations of the form [33], where P is a polynomial such that eqn [33] determines all involved series uniquely. This extension covers in particular the so-called quadratic method due to Brown, which is of great significance in the work of Tutte on the enumeration of maps. We refer the reader to Bousquet-Mélou and Jehanne (2005) and the references given there for these extensions.

Extracting Asymptotic Information from Generating Functions

There is powerful machinery available to extract the asymptotic behavior of the coefficients of a power series out of analytic properties of the power series. We describe the corresponding methods, singularity analysis and the saddle point method, in this section. The survey by Odlyzko (1995) and the Flajolet and Sedgewick reference in Further reading are excellent sources for further reading, which, in particular, contain several other methods which we cannot cover here for reasons of limited space.
Let us suppose that we are interested in the asymptotic behavior of the sequence $(f_n)_{n \ge 0}$ of real (or complex) numbers as n tends to infinity. Let us suppose that the power series $f(z) = \sum_{n \ge 0} f_n z^n$ converges in some neighborhood of the origin. (If this series converges only at z = 0, then either one has to try to scale, that is, for example, look at the
power series $f(z) = \sum_{n \ge 0} f_n z^n/n!$ instead, or one must apply methods other than singularity analysis or the saddle point method. In the latter case, depending on the nature of the coefficients $f_n$, this may be the Euler–Maclaurin or the Poisson summation formulas, the Mellin transform technique, or other direct methods. The reader is referred to Odlyzko (1995) and the Flajolet and Sedgewick reference.) The idea is then to consider f(z) as a complex function in z (and extend the range of f beyond the disk of convergence about the origin), and to study the singularities of f(z). (The point at infinity can also be a singularity.) The upshot is that the singularities of f(z) with smallest modulus dictate the asymptotic behavior of the coefficients $f_n$. These singularities of smallest modulus are called the dominant singularities.
If there is an infinite number of dominant singularities, then one has to try the circle method. We refer the reader to Andrews (1976) and Ayoub (1963) for details of this method.
If there is a finite number of dominant singularities, then there can be again two different situations, depending on whether these are small or large singularities. Roughly speaking, a singularity is small if the function f(z) grows at most polynomially when z approaches the singularity, otherwise it is large. A typical example of a small singularity is z = 1/4 in $(1 - 4z)^{1/2}$, whereas a typical example of a large singularity is z = ∞ in exp(z) or z = 1 in exp(1/(1 − z)).
The method to apply for small singularities is the method of singularity analysis as developed by Flajolet and Odlyzko. (Singularity analysis implies Darboux's method, which occurs frequently in the literature, and, thus, supersedes it.) For the sake of simplicity, we consider first only the case of a unique dominant singularity. We shall address the issue of several dominant singularities shortly. Furthermore, we assume the singularity to be z = 1, again for the sake of simplicity of presentation. The general result can then be obtained by rescaling z.
The basic idea is the transfer principle:
\[ \text{If } f(z) = \sigma(z) + O(\tau(z)) \text{ as } z \to 1, \text{ then } f_n = \sigma_n + O(\tau_n) \text{ as } n \to \infty \tag{34} \]
where $\sigma(z) = \sum_{n \ge 0} \sigma_n z^n$ is a linear combination of standard functions of the form $(1 - z)^{-\alpha}$, or logarithmic variants, and $\tau(z) = \sum_{n \ge 0} \tau_n z^n$ also lies in the scale (see sections VI.3,4 of the Flajolet and Sedgewick reference for the exact statement). The expansion for f(z) in [34] is called the singular expansion of f(z). For the above-mentioned standard functions, we have
\[ [z^n]\, (1 - z)^{-\alpha} \left(\frac{1}{z} \log\frac{1}{1 - z}\right)^{\beta} \sim \frac{n^{\alpha - 1}}{\Gamma(\alpha)} (\log n)^{\beta} \left(1 + \frac{C_1}{1!} \frac{\beta}{\log n} + \frac{C_2}{2!} \frac{\beta(\beta - 1)}{(\log n)^2} + \cdots\right) \tag{35} \]
where $[z^n]g(z)$ denotes the coefficient of $z^n$ in g(z), and where
\[ C_k = \Gamma(\alpha) \left. \frac{d^k}{ds^k} \frac{1}{\Gamma(s)} \right|_{s = \alpha} \]
If $\alpha$ is a nonpositive integer, then this expansion has to be taken with care (cf. section VI.2 of the Flajolet and Sedgewick reference).
To see how this works, consider the example $f_n = \sum_{k=0}^{n} \binom{2k}{k}$. We have
\[ \sum_{n \ge 0} f_n z^n = \frac{1}{(1 - z)\sqrt{1 - 4z}} \]
The function on the right-hand side is meromorphic in all of C (where C denotes the complex numbers), with singularities at z = 1 and z = 1/4. The dominant singularity is z = 1/4. We determine the singular expansion of f(z) about z = 1/4,
\[ f(z) = \frac{4}{3} (1 - 4z)^{-1/2} - \frac{4}{9} (1 - 4z)^{1/2} + \frac{4}{27} (1 - 4z)^{3/2} + O\left((1 - 4z)^{5/2}\right) \]
(We stopped the expansion after three terms. The farther we go, the more terms we can compute of the asymptotic expansion for $f_n$.) Hence, we obtain
\[ f_n = 4^n \left( \frac{4}{3} \frac{n^{-1/2}}{\Gamma(1/2)} \left(1 - \frac{1}{8n} + \frac{1}{128n^2}\right) - \frac{4}{9} \frac{n^{-3/2}}{\Gamma(-1/2)} \left(1 + \frac{3}{8n}\right) + \frac{4}{27} \frac{n^{-5/2}}{\Gamma(-3/2)} + O\left(n^{-7/2}\right) \right) \]
\[ \phantom{f_n} = \frac{4^n}{\sqrt{\pi n}} \left( \frac{4}{3} + \frac{1}{18n} + \frac{59}{288n^2} + O\left(\frac{1}{n^3}\right) \right) \]
If there are several small dominant singularities (but only a finite number of them), then one simply applies the above procedure for all of them and, to obtain the desired asymptotic expansion, one adds up the corresponding contributions.
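A quick numerical experiment illustrates how well the singular expansion predicts the coefficients. The sketch below is only a sanity check under stated assumptions: it computes $f_n = \sum_{k=0}^{n} \binom{2k}{k}$ exactly, rescales by $\sqrt{\pi n}/4^n$, and compares with just the first two terms, $4/3 + 1/(18n)$, of the expansion; the function names are chosen here for illustration.

```python
from math import comb, pi, sqrt


def f(n):
    """Exact value of f_n = sum_{k=0..n} binomial(2k, k)."""
    return sum(comb(2 * k, k) for k in range(n + 1))


def scaled(n):
    """f_n * sqrt(pi * n) / 4^n, which should approach 4/3 as n grows."""
    # Dividing the two exact integers first keeps the float well within range.
    return (f(n) / 4 ** n) * sqrt(pi * n)


def two_term(n):
    """First two terms of the asymptotic expansion, after the same rescaling."""
    return 4 / 3 + 1 / (18 * n)


for n in (10, 100, 200):
    print(n, scaled(n), two_term(n), abs(scaled(n) - two_term(n)))
```

The discrepancy shrinks roughly like $1/n^2$, which is the size of the first neglected term.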
The method to apply for large singularities is the saddle point method. For the following considerations, we assume that f(z) is analytic in |z| < R ≤ ∞. At the heart of the saddle point method lies Cauchy's formula
\[ f_n = [z^n] f(z) = \frac{1}{2\pi i} \oint_{C} \frac{f(z)}{z^{n+1}}\, dz \tag{36} \]
for writing the nth coefficient in the power series expansion of f(z). Here, C is some simple closed contour around the origin that stays in the range |z| < R. The idea is to exploit the fact that we are free to deform the contour. The aim is to choose a contour such that the main contribution to the integral in [36] comes from a very tiny part of the contour, whereas the contribution of the rest is negligible. This will be possible if we put the contour through a saddle point of the integrand $f(z)/z^{n+1}$. Under suitable conditions, the main contribution will then come from the small passage of the path through the saddle point, and the contribution of the rest will be negligible.
In practice, the saddle point method is not always straightforward to apply, but has to be adapted to the specific properties of the function f(z) that we are encountering. We refer the reader to the corresponding chapters in the Flajolet and Sedgewick reference and Odlyzko (1995) for more details. There is one important exception though, namely the Hayman admissible functions. We will not reproduce the definition of Hayman admissibility because it is cumbersome (cf. section VIII.5 in the Flajolet and Sedgewick reference and definition 12.4 of Odlyzko (1995)). However, in many applications, it is not even necessary to go back to it because of the closure properties of Hayman admissible functions. Namely, it is known (cf. Odlyzko (1995), theorem 12.8) that exp(p(z)) is Hayman admissible in |z| < ∞ for any polynomial p(z) with real coefficients as long as the coefficients $a_n$ of the Taylor series of exp(p(z)) are positive for all sufficiently large n (thus, e.g., exp(z) is Hayman admissible), and it is known that, if f(z) and g(z) are Hayman admissible in |z| < R ≤ ∞, then exp(f(z)) and f(z)g(z) are also (thus, e.g., exp(exp(z) − 1) is Hayman admissible).
The central result of Hayman's theory is the following: if $f(z) = \sum_{n \ge 0} f_n z^n$ is Hayman admissible in |z| < R, then
\[ f_n \sim \frac{f(r_n)}{r_n^n \sqrt{2\pi b(r_n)}} \quad \text{as } n \to \infty \tag{37} \]
where $r_n$ is the unique solution for large n of the equation a(r) = n in $(R_0, R)$, with $a(r) = r f'(r)/f(r)$, $b(r) = r a'(r)$, and a suitably chosen constant $R_0 > 0$.
This result covers only the first term in the asymptotic expansion. There is an even more sophisticated theory due to Harris and Schoenfeld, which allows one to also find a complete asymptotic expansion. We refer the reader to section VIII.5 of the Flajolet and Sedgewick reference and Odlyzko (1995) for more details.
Methods for the asymptotic analysis of multivariable generating functions are also available (see the corresponding chapters in Flajolet and Sedgewick, Odlyzko (1995) and the recent important development surveyed in the Pemantle and Wilson reference listed in Further reading). We add that both the method of singularity analysis and Hayman's theory of admissible functions have been made largely automatic, and that this has been implemented in the Maple program gdev (see Further reading).

The Theory of Heaps

The theory of heaps, developed by Viennot, is a geometric rendering of the theory of the partial commutation monoid of Cartier and Foata, which is now most often called the Cartier–Foata monoid. Its importance stems from the fact that several objects which appear in statistical physics, such as Motzkin paths, animals, respectively polyominoes, or Lorentzian triangulations (see the Viennot and James reference in Further reading and the references therein), are in bijection with heaps.
Informally, a heap is what we would imagine. We take a collection of pieces, say B1, B2, ..., and put them one upon the other, sometimes also sideways, to form a heap, see Figure 6.
There, we imagine that pieces can only move vertically, so that the heap in Figure 6 would indeed form a stable arrangement. Note that we allow several copies of a piece to appear in a heap. (This means that they differ only by a vertical translation.) For example, in Figure 6 there appear two copies of B2. Under these assumptions, there are pieces which can move past each other, and others which cannot. For example, in Figure 6, we can move the piece B6 higher up, thus moving it higher than B1 if we wish. However, we cannot move B7 higher than B6,
Figure 6 A heap of pieces.
Figure 7 Monomers and dimers.
Figure 8 Two heaps of monomers and dimers.
Figure 10 Bijection between animals and heaps of dimers.
Figure 11 Bijection between animals and heaps of dimers.
by u omitted, and where #u denotes the row number of u and similarly for #v. A weighted version could also be developed in the same way, where we put a weight w(e) on each edge, and the weight of a walk is the product of the weights of all its edges.
In particular, the expression [41] is a rational function in x. Then, by the basic theorem on rational generating functions (cf. Stanley (1986), section 4.1), the number $w_n(u, v)$ can be expressed as a sum $\sum_{i=1}^{d} P_i(n) \lambda_i^n$, where the $\lambda_i$'s are the different roots of the polynomial $\det(xI_n - A(G))$, and $P_i(n)$ is a polynomial of degree less than the multiplicity of the root $\lambda_i$. (The $P_i(n)$'s depend on u and v, whereas the $\lambda_i$'s do not.) If there exists a unique root $\lambda_j$ with maximal modulus, then this implies that, asymptotically as n → ∞, $w_n(u, v) \sim P_j(n) \lambda_j^n$.

Lattice Paths

Recall from the section on basic combinatorial terminology that a lattice path P in $\mathbb{Z}^d$ is a path in the d-dimensional integer lattice $\mathbb{Z}^d$ which uses only points of the lattice, that is, it is a sequence $(P_0, P_1, \ldots, P_l)$, where $P_i \in \mathbb{Z}^d$ for all i. The vectors $\overrightarrow{P_0P_1}, \overrightarrow{P_1P_2}, \ldots, \overrightarrow{P_{l-1}P_l}$ are called the steps of P. The number of steps, l, is called the length of P.
The enumeration of lattice paths has always been an intensively studied topic in statistics, because of their importance in the study of random walks, of rank order statistics for nonparametric testing, and of queueing processes. The reader is referred to Feller (1957) and particularly Mohanty's (1979) book, which is a rich source for enumerative results on lattice paths, albeit in a statistical language. We review the most important results in this section. Most of these concern two-dimensional lattice paths, that is, the case d = 2.
To begin with, we consider paths in the integer plane $\mathbb{Z}^2$ consisting of horizontal and vertical unit steps in the positive direction. Clearly, the number of all (unrestricted) paths from the origin to (n, m) is the binomial coefficient $\binom{n+m}{n}$. By the reflection principle, which is commonly attributed to D. André (see, e.g., Comtet (1974) p. 22), it follows that the number of paths from the origin to (n, m) which do not pass above the line y = x + t, where m ≤ n + t, is given by
\[ \binom{n+m}{n} - \binom{n+m}{n+t+1} \tag{42} \]
Roughly, the reflection principle sets up a bijection between the paths from the origin to (n, m) which do pass above the line y = x + t and all paths from (−t − 1, t + 1) to (n, m), by reflecting the path portion between the origin and the last touching point on y = x + t + 1 in this latter line. Thus, the result of the enumeration problem is the number of all paths from (0, 0) to (n, m), which is given by the binomial coefficient $\binom{n+m}{n}$, minus the number of all paths from (−t − 1, t + 1) to (n, m), which is given by the binomial coefficient $\binom{n+m}{n+t+1}$, whence the formula [42].
If one considers more generally paths bounded by the line my = nx + t, no compact formula is known. It seems that the most conceptual way to approach this problem is through the so-called kernel method (see the section on solving equations for generating functions), which, in combination with the saddle point method, allows one also to obtain strong asymptotic results. There is one special instance, however, which has a nice formula. The number of all lattice paths from the origin to (n, m) which never pass above $x = \mu y$, where $\mu$ is a positive integer, is given by
\[ \frac{n - \mu m + 1}{n + m + 1} \binom{n+m+1}{m} \tag{43} \]
The most elegant way to prove this formula is by means of the cycle lemma of Dvoretzky and Motzkin (see Mohanty (1979), p. 9 where the cycle lemma occurs under the name of penetrating analysis).
Iteration of the reflection principle shows that the number of paths from the origin to (n, m) which stay between the lines y = x + t and y = x + s (being allowed to touch them), where t ≥ 0 ≥ s and n + t ≥ m ≥ n + s, is given by the finite (!) sum (see, e.g., Mohanty (1979), p. 6)
\[ \sum_{k \in \mathbb{Z}} \left[ \binom{n+m}{n - k(t - s + 2)} - \binom{n+m}{n - k(t - s + 2) + t + 1} \right] \tag{44} \]
The enumeration of lattice paths restricted to regions bounded by hyperplanes has also been considered for other regions, such as quadrants, octants, and rectangles, as well as in higher dimensions. A general result due to Gessel and Zeilberger, and Biane, independently, on the number of lattice paths in a chamber (alcove) of an (affine) reflection group (see Krattenthaler (2003) for the corresponding references and pointers to further results) shows how far one can go when one uses the reflection principle. In particular, this result covers [42] and [44], the enumeration of lattice paths in quadrants, octants, rectangles, and many other results that have
appeared (before and after) in the literature. We present a particularly elegant (and frequently occurring) special case. (In reflection group language, it corresponds to the reflection group of type $A_{n-1}$. See Humphreys (1990) for terminology and information on reflection groups.)
Let $A = (a_1, a_2, \ldots, a_d)$ and $E = (e_1, e_2, \ldots, e_d)$ be points in $\mathbb{Z}^d$ with $a_1 \ge a_2 \ge \cdots \ge a_d$ and $e_1 \ge e_2 \ge \cdots \ge e_d$. The number of all paths from A to E in the integer lattice $\mathbb{Z}^d$, which consist of positive unit steps and which stay in the region $x_1 \ge x_2 \ge \cdots \ge x_d$, equals
\[ \left( \sum_{i=1}^{d} (e_i - a_i) \right)! \; \det_{1 \le i,j \le d} \left( \frac{1}{(e_i - a_j - i + j)!} \right) \tag{45} \]
The counting problem of the theorem is equivalent to numerous other counting problems. It was originally formulated as an n-candidate ballot problem, but it is as well equivalent to counting the number of standard Young tableaux of a given shape. In the case that all $a_j$'s are equal, the determinant does in fact evaluate into a closed-form product. In Young tableaux theory, a particular way to write the result is known as the hook-length formula (see, e.g., Stanley (1999), corollary 7.21.6).
We return to lattice paths in the plane, mentioning some more closely related results. The first is a result of Mohanty (1979, section 4.2), which expresses the number of all lattice paths from the origin to (n, m) which touch the line y = x + t exactly r times, never crossing it, as the difference
\[ \binom{n+m-r}{n+t-1} - \binom{n+m-r}{n+t}, \quad r \ge 1 \tag{46} \]
Not forbidding that the paths cross the bounding line, we arrive at the problem of counting the lattice paths from the origin to (n, m), which cross the main diagonal y = x exactly r times, the answer being
\[ \begin{cases} \dfrac{m - n + 2r + 1}{m + n + 1} \dbinom{m+n+1}{n-r} & \text{if } m > n \\[2ex] \dfrac{2r + 2}{n} \dbinom{2n}{n-r-1} & \text{if } m = n \end{cases} \tag{47} \]
Next, we give the number of lattice paths from the origin to (n, n) which have 2r steps on one side of the line y = x, as
\[ \binom{2r}{r} \binom{2n - 2r}{n - r} \tag{48} \]
a result due to Sparre Andersen. We refer the reader to Mohanty (1979, chapter 3) for further results in this direction.
Enumerating lattice paths with a fixed number of maximal straight pieces (which correspond to runs) is intimately connected to another basic enumeration problem concerning lattice paths: the enumeration of lattice paths having a fixed number of turns. An effective way to attack the latter problem is by means of two-rowed arrays (see the survey article by Krattenthaler (1997), where in particular analogs of the reflection principle for two-rowed arrays are developed). These imply formulas for the number of lattice paths with fixed starting points and endpoints and a fixed number of north–east (respectively east–north) turns, for unrestricted paths, as well as for paths bounded by lines. (A north–east turn in a lattice path is a point where the direction changes from north to east. An east–north turn is defined analogously.) In particular, analogs of [42]–[44] are known when the number of north–east (respectively east–north) turns is fixed.
These formulas imply for example (see again Krattenthaler (1997, section 3.5)) that the number of lattice paths from the origin to (n, n) which never pass above the line y = x + t and have exactly 2r maximal straight pieces is given by
\[ 2\binom{n-1}{r-1}^{2} - \binom{n+t-1}{r-2}\binom{n-t-1}{r} - \binom{n+t-1}{r-1}\binom{n-t-1}{r-1} \tag{49} \]
with a similar result for the case of 2r + 1 maximal straight pieces. (If t = 0, the numbers in [49] become
\[ \frac{1}{n} \binom{n}{r} \binom{n}{r-1} \]
and they are known as the Narayana numbers.) Furthermore, they imply that the number of lattice paths from the origin to (n, n) which never pass above the line y = x + t and never below the line y = x − t and have exactly 2r maximal straight pieces is given by
\[ \sum_{k=-\infty}^{\infty} \left[ 2\binom{n+2kt-1}{r-k-1}\binom{n-2kt-1}{r+k-1} - \binom{n+2kt+t-1}{r-k-2}\binom{n-2kt-t-1}{r+k} - \binom{n+2kt+t-1}{r-k-1}\binom{n-2kt-t-1}{r+k-1} \right] \tag{50} \]
with a similar result for the case of 2r + 1 maximal straight pieces.
The most general boundary for lattice paths that one can imagine is the restriction that it stays
between two given (fixed) paths. Let us assume that the horizontal steps of the upper (fixed) path are at heights $a_1 \le a_2 \le \cdots \le a_n$, whereas the horizontal steps of the lower (fixed) path are at heights $b_1 \le b_2 \le \cdots \le b_n$, $a_i \ge b_i$, i = 1, 2, ..., n. Then the number of all paths from $(0, b_1)$ to $(n, a_n)$ satisfying the property that for all i = 1, 2, ..., n the height of the ith horizontal step is between $b_i$ and $a_i$ is given by the determinant
\[ \det_{1 \le i,j \le n} \binom{a_i - b_j + 1}{j - i + 1} \tag{51} \]
In the statistical literature, this formula is often known as Steck's formula, but it is actually a special case of a much more general theorem due to Kreweras. A generalization of [51] to higher-dimensional paths was given by Handa and Mohanty (see Mohanty (1979, section 2.4)).
Next, we consider three-step lattice paths in the integer plane $\mathbb{Z}^2$, that is, paths consisting of up-steps (1, 1), level steps (1, 0), and down-steps (1, −1). The particular problem that we are interested in is to count such three-step paths starting at (0, r) and ending at $(\ell, s)$, which do not pass below the x-axis and do not pass above the horizontal line y = K. Furthermore, we assign the weight 1 to an up-step, the weight $b_h$ to a level-step at height h, and the weight $\lambda_h$ to a down-step from height h to h − 1. The weight w(P) of a path P is defined as the product of the weights of all its steps. Then we have the following result, which can be obtained by the transfer matrix method described in the last section.
Define the sequence $(p_n(x))_{n \ge 0}$ of polynomials by
\[ x p_n(x) = p_{n+1}(x) + b_n p_n(x) + \lambda_n p_{n-1}(x) \quad \text{for } n \ge 1 \tag{52} \]
with initial conditions $p_0(x) = 1$ and $p_1(x) = x - b_0$. Furthermore, define $(Sp_n(x))_{n \ge 0}$ to be the sequence of polynomials which arises from the sequence $(p_n(x))$ by replacing $\lambda_i$ by $\lambda_{i+1}$ and $b_i$ by $b_{i+1}$, i = 0, 1, 2, ..., everywhere in the three-term recurrence [52] and in the initial conditions. Finally, given a polynomial p(x) of degree n, we denote the corresponding reciprocal polynomial $x^n p(1/x)$ by $p^*(x)$.
With the weight w defined as before, the generating function $\sum_{P} w(P) x^{\ell(P)}$, where the sum is over all three-step paths which start at (0, r), terminate at height s, do not pass below the x-axis, and do not pass above the line y = K, is given by
\[ \begin{cases} \dfrac{x^{s-r}\, p_r^*(x)\, S^{s+1} p_{K-s}^*(x)}{p_{K+1}^*(x)}, & r \le s \\[2ex] \lambda_r \cdots \lambda_{s+1} \dfrac{x^{r-s}\, p_s^*(x)\, S^{r+1} p_{K-r}^*(x)}{p_{K+1}^*(x)}, & r \ge s \end{cases} \tag{53} \]
The sequence of polynomials $(p_n(x))_{n \ge 0}$ is in fact a sequence of orthogonal polynomials (cf. Koekoek and Swarttouw (1998) and Szegő (1959)).
We remark that in the case that r = s = 0 there is also an elegant expression for the generating function due to Flajolet (see section V.2 of the Flajolet and Sedgewick reference in Further reading) in terms of a continued fraction.
In order to solve our problem, we just have to extract the coefficient of $x^{\ell}$ in [53]. By a partial fraction expansion, a formula of the type
\[ \sum_{m} c_m \rho_m^{\ell} \tag{54} \]
results, where the $\rho_m$'s are the zeroes of $p_{K+1}(x)$, and the $c_m$'s are some coefficients, only a finite number of them being nonzero.
It should be noted that, because of the many available parameters (the $b_n$'s and $\lambda_n$'s), by appropriate specializations one can also obtain numerous results about enumerating three-step paths according to various statistics, such as the number of touchings on the bounding lines, etc.
There are two important special cases in which a completely explicit solution in terms of elementary functions can be given.
The first case occurs for $b_i = 0$ and $\lambda_i = 1$ for all i. In this case, the polynomials $p_n(x)$ defined by the three-term recurrence [52] are Chebyshev polynomials of the second kind, $p_n(x) = U_n(x/2)$. (The Chebyshev polynomial of the second kind $U_n(x)$ is defined by $U_n(\cos t) = \sin((n+1)t)/\sin t$; see Koekoek and Swarttouw (1998) for almost exhaustive information on these polynomials and, more generally, on hypergeometric orthogonal polynomials.) The result which is then obtained from the general theorem (clearly, the zeros of $U_n(x)$ are $x = \cos(k\pi/(n+1))$, k = 1, 2, ..., n, and therefore the partial fraction expansion of [53] is easily determined) is that the number of lattice paths from (0, r) to $(\ell, s)$ with only up- and down-steps, which always stay between the x-axis and the line y = K, is given by (see also Feller (1957, chapter XIV, eqn [5.7]))
\[ \frac{2}{K+2} \sum_{k=1}^{K+1} \left( 2\cos\frac{\pi k}{K+2} \right)^{\ell} \sin\frac{\pi k (r+1)}{K+2} \, \sin\frac{\pi k (s+1)}{K+2} \tag{55} \]
a formula which goes back to Lagrange.
The second case occurs for $b_i = 1$ and $\lambda_i = 1$ for all i. In this case, the polynomials $p_n(x)$ defined by the three-term recurrence [52] are again Chebyshev polynomials of the second kind, $p_n(x) = U_n((x - 1)/2)$. The result which is then
obtained from the general theorem is that the number of three-step lattice paths from $(0, r)$ to $(\ell, s)$, which always stay between the $x$-axis and the line $y = K$, is given by

$$
\frac{2}{K+2}\sum_{k=1}^{K+1}\left(2\cos\frac{\pi k}{K+2}+1\right)^{\ell}\sin\frac{\pi k(r+1)}{K+2}\,\sin\frac{\pi k(s+1)}{K+2}.
\tag*{[56]}
$$

Perfect Matchings and Tilings

In this section we consider the problem of counting the perfect matchings of a graph. For an introduction to the problem and to methods for solving it, as well as for a report on recent developments, we refer the reader to Propp (1999).

Let $G = (V, E)$ be a finite loopless graph with vertex set $V$ and edge set $E$. A matching (also called 1-factor in graph theory) is a subset of the edges with the property that no two edges share a vertex. A matching is perfect if it covers all the vertices. Let $M(G)$ denote the number of perfect matchings of the graph $G$. More generally, we could assign a weight $w(e)$ to each edge $e$ of the graph and define the weight of a matching to be the product of the weights of all its edges. Let $M_w(G)$ denote the sum of the weights of all perfect matchings of the graph $G$.

Kasteleyn's method for determining $M(G)$, respectively $M_w(G)$, makes use of determinants and Pfaffians. Recall that the Pfaffian $\operatorname{Pf}(A)$ of a triangular array $A = (a_{i,j})_{1\le i<j\le 2n}$ is defined by

$$
\operatorname{Pf}(A)=\sum_{m}\operatorname{sgn} m\prod_{\{i,j\}\in m}a_{i,j},
\tag*{[57]}
$$

where the sum is over all perfect matchings $m$ of the complete graph on vertices $\{1, 2, \dots, 2n\}$, and where the product is over all edges $\{i, j\}$, $i < j$, of $m$. The sign $\operatorname{sgn} m$ of $m$ is $(-1)^{\#(\text{crossings of } m)}$, where a crossing is a pair $(\{i, j\}, \{k, l\})$ of edges such that $i < k < j < l$.

Usually, one extends the triangular array $A$ to a matrix by setting $a_{j,i} = -a_{i,j}$, $i < j$, and $a_{i,i} = 0$ for all $i$. Then, abusing notation, we identify the triangular array with the skew-symmetric matrix $A = (a_{i,j})_{1\le i,j\le 2n}$. The Pfaffian satisfies the following useful properties:

$$
\operatorname{Pf}(B^t A B)=\det(B)\,\operatorname{Pf}(A)
$$

and

$$
\operatorname{Pf}(A)^2=\det(A).
\tag*{[58]}
$$

The latter equality shows in particular that Pfaffians are very close to determinants. They do, in fact, generalize determinants since

$$
\operatorname{Pf}\begin{pmatrix}0 & B\\ -B^t & 0\end{pmatrix}=\pm\det(B)
\tag*{[59]}
$$

for any square matrix $B$.

Thus, given a graph with vertices $v_1, v_2, \dots, v_{2n}$, specializing $a_{i,j}$ to the weight of the edge between $v_i$ and $v_j$, if it exists, and setting $a_{i,j} = 0$ otherwise in the definition of the Pfaffian, we obtain almost $M_w(G)$; the only difference is that there could be signs in front of the individual terms of the sum, whereas in $M_w(G)$ the sign in front of each term must be $+$. (The object obtained by omitting the sign in [57] is called the Hafnian. Unfortunately, in contrast to the Pfaffian, it does not have any nice properties and it is therefore extremely difficult to compute.) Kasteleyn's idea is to circumvent this problem by orienting the edges of the graph, defining signed weights of the edges, in such a way that the Pfaffian of the array with signed weights produces exactly $M_w(G)$.

More precisely, given a (weighted) graph $G$ with vertices $v_1, v_2, \dots, v_{2n}$, we make it into an oriented (weighted) graph $\vec G$. That is, if there is an edge between $v_i$ and $v_j$, $e_{i,j}$ say, we orient it either from $v_i$ to $v_j$ or the other way. Now we define the signed adjacency matrix $A(\vec G)$ of $\vec G$ by letting its $(i, j)$-entry be $w(e_{i,j})$ if there is an edge from $v_i$ to $v_j$ oriented that way, $-w(e_{i,j})$ if there is an edge from $v_j$ to $v_i$ oriented that way, and $0$ if there is no edge between $v_i$ and $v_j$. Such an orientation is called Pfaffian if

$$
\operatorname{Pf}\big(A(\vec G)\big) = \pm M_w(G).
$$

Clearly, the question remains whether a Pfaffian orientation can be found for a given graph. In general, this is an open question. However, Kasteleyn shows that for planar graphs such a Pfaffian orientation can always be found. Moreover, he shows that any orientation of a planar graph which has the property that around any face bounded by $4k$ edges an odd number of edges is oriented in either direction and that around any face bounded by $4k + 2$ edges an even number of edges is oriented in either direction is Pfaffian.

For bipartite graphs (i.e., for graphs in which the set of vertices can be split into two disjoint sets such that every edge connects a vertex of one of these sets to a vertex of the other), the situation is even nicer. This is because for a bipartite graph $G$ in which both parts of the bipartition of the vertices are of the same size (otherwise, there is no perfect matching), any signed
adjacency matrix $A(\vec G)$ has the block form of the matrix on the left-hand side of [59] and, hence, the Pfaffian reduces to a determinant. More precisely, let $G$ be a bipartite graph with vertex set $V = U \cup W$, $U = \{u_1, u_2, \dots, u_n\}$ and $W = \{w_1, w_2, \dots, w_n\}$, with edges connecting some $u_i$ to some $w_j$. Given a Pfaffian orientation $\vec G$, we build the signed bipartite adjacency matrix $B(\vec G) = (b_{i,j})_{1\le i,j\le n}$ of $\vec G$ by setting $b_{i,j} = w(e_{i,j})$ if there is an edge from $u_i$ to $w_j$ oriented that way, $-w(e_{i,j})$ if there is an edge from $w_j$ to $u_i$ oriented that way, and $0$ if there is no edge between $u_i$ and $w_j$. Then we have

$$
\det B(\vec G) = \pm M_w(G).
$$

In particular, this holds for any bipartite planar graph. See Robertson et al. (1999) for a structural description of which (not necessarily planar) bipartite graphs admit a Pfaffian orientation.

Kasteleyn's construction in the planar case has been generalized to graphs on surfaces of any genus $g$ in Dolbilin et al. (1996), Galluccio and Loebl (1999), and Tesler (2000), independently. As predicted by Kasteleyn, the solution is in terms of a linear combination of $4^g$ Pfaffians.

With the help of his method, Kasteleyn computed the number of dimer coverings of an $m \times n$ rectangle. (A dimer is a $2 \times 1$ rectangle. Thus, this is equivalent to counting the number of perfect matchings on the $m \times n$ grid graph. The formula was independently found by Temperley and Fisher.) The result is

$$
\prod_{i=1}^{m}\prod_{j=1}^{n}\left|\,2\cos\frac{\pi i}{m+1}+2\sqrt{-1}\,\cos\frac{\pi j}{n+1}\right|^{1/2}.
$$

For even $m$ and $n$, the formula can be rewritten as

$$
\prod_{i=1}^{m/2}\prod_{j=1}^{n/2}\left(4\cos^2\frac{\pi i}{m+1}+4\cos^2\frac{\pi j}{n+1}\right).
$$

There is a similar rewriting if one of $m$ or $n$ is odd. (If both $m$ and $n$ are odd, there is no dimer covering.)

For further reading and references see Dimer Problems and Kuperberg (1998).

Nonintersecting Paths

Let $G = (V, E)$ be a directed acyclic graph with vertices $V$ and directed edges $E$. Furthermore, we are given a function $w$ which assigns a weight $w(x)$ to every vertex or edge $x$. Let us define the weight $w(P)$ of a walk $P$ in the graph by $\prod_e w(e) \prod_v w(v)$, where the first product is over all edges $e$ of the walk $P$ and the second product is over all vertices $v$ of $P$. We denote the set of all walks in $G$ from $u$ to $v$ by $\mathcal P(u \to v)$, and the set of all families $(P_1, P_2, \dots, P_n)$ of walks, where $P_i$ runs from $u_i$ to $v_i$, $i = 1, 2, \dots, n$, by $\mathcal P(\mathbf u \to \mathbf v)$, with $\mathbf u = (u_1, u_2, \dots, u_n)$ and $\mathbf v = (v_1, v_2, \dots, v_n)$. The symbol $\mathcal P^+(\mathbf u \to \mathbf v)$ stands for the set of all families $(P_1, P_2, \dots, P_n)$ in $\mathcal P(\mathbf u \to \mathbf v)$ with the additional property that no two walks share a vertex. We call such families of walk(er)s vicious walkers or, alternatively, nonintersecting paths. The weight $w(\mathbf P)$ of a family $\mathbf P = (P_1, P_2, \dots, P_n)$ of walks is defined as the product $\prod_{i=1}^{n} w(P_i)$ of all the weights of the walks in the family. Finally, given a set $M$ with weight function $w$, we write $GF(M; w)$ for the generating function $\sum_{x\in M} w(x)$.

We need two further notations before we are able to state the Lindström–Gessel–Viennot theorem. (For references and historical remarks, we refer the reader to footnote 5 in Krattenthaler (2005a).) As earlier, the symbol $S_n$ denotes the symmetric group on $n$ elements. Given a permutation $\sigma \in S_n$, we write $\mathbf u_\sigma$ for $(u_{\sigma(1)}, u_{\sigma(2)}, \dots, u_{\sigma(n)})$. Then

$$
\sum_{\sigma\in S_n}(\operatorname{sgn}\sigma)\,GF\big(\mathcal P^+(\mathbf u_\sigma\to\mathbf v);w\big)
=\det_{1\le i,j\le n}\Big(GF\big(\mathcal P(u_j\to v_i);w\big)\Big).
\tag*{[60]}
$$

Most often, this theorem is applied in the case where the only permutation for which vicious walks exist is the identity permutation, so that the sum on the left-hand side reduces to a single term that counts all families $(P_1, P_2, \dots, P_n)$ of vicious walks, the $i$th walk $P_i$ running from $u_i$ to $v_i$, $i = 1, 2, \dots, n$. This case occurs, for example, if for any pair of walks $(P, Q)$ with $P$ running from $u_a$ to $v_d$ and $Q$ running from $u_b$ to $v_c$, $a < b$ and $c < d$, it is true that $P$ and $Q$ must have a common vertex. Explicitly, in that case we have

$$
GF\big(\mathcal P^+(\mathbf u\to\mathbf v);w\big)=\det_{1\le i,j\le n}\Big(GF\big(\mathcal P(u_j\to v_i);w\big)\Big).
\tag*{[61]}
$$

If the starting points and/or the endpoints are not fixed, then the corresponding number is given by a Pfaffian, a result obtained by Okada and Stembridge (see Bressoud (1999) for references). For a set $A$ of starting points, let $\mathcal P^+(A \to \mathbf v)$ denote the set of all families $(P_1, P_2, \dots, P_{2n})$ of nonintersecting lattice paths, where $P_i$ runs from some point of $A$ to $v_i$, $i = 1, 2, \dots, 2n$. Furthermore, let us suppose that the elements of $A = \{u_1, u_2, \dots\}$ are ordered in such a way that for any pair of walks $(P, Q)$ with $P$ running from $u_a$ to $v_d$ and $Q$ running from $u_b$ to $v_c$, $a < b$ and $c < d$, it is true that $P$ and $Q$ must have a common vertex. (This is the same condition as the one which makes [61] valid, with the only difference that, here,
the number of $u_i$'s could be larger than the number of $v_i$'s.) Then,

$$
GF\big(\mathcal P^+(A\to\mathbf v);w\big)
=\operatorname*{Pf}_{1\le i,j\le 2n}\Big(\sum_{a<b}\Big(GF\big(\mathcal P(u_a\to v_i);w\big)\,GF\big(\mathcal P(u_b\to v_j);w\big)
-GF\big(\mathcal P(u_b\to v_i);w\big)\,GF\big(\mathcal P(u_a\to v_j);w\big)\Big)\Big).
\tag*{[62]}
$$

If the number of paths is odd, then one can use the same formula by adding an artificial point to the endpoints and to the set of starting points $A$. There is also a theorem by Okada and Stembridge which covers the case that starting points and endpoints vary. Refinements when the number of turns is fixed can be found in Krattenthaler (1997).

Vicious Walkers, Plane Partitions, Rhombus Tilings, and Fully Packed Loop Configurations

In this section we describe the interrelations between four frequently appearing objects in statistical mechanics and combinatorics: vicious walkers, plane partitions, rhombus tilings, and fully packed loop configurations.

Given a lattice, vicious walkers, as introduced by Fisher (1984), are particles which move on lattice sites in such a way that two particles never occupy the same lattice site. Models of vicious walkers have been the object of numerous studies from various points of view. Rather than attempt the impossible task of providing a complete overview of references, we refer the reader to the basic reference Fisher (1984) and to Krattenthaler (2005a) for further pointers to the literature.

Most of the known results apply for vicious walkers on the line. There are in fact two different models: in the random turns vicious walker model, $n$ walkers move on the integral points of the real line in such a way that at each tick of the clock exactly one walker moves to the right or to the left, whereas in the lock step vicious walker model $n$ walkers move on the integral points of the real line in such a way that at each tick of the clock each walker moves to the right or to the left.

The first model is equivalent to a model of one walker in $\mathbb Z^n$ ($\mathbb Z$ denoting the set of integers) which at each tick of the clock moves a positive or negative unit step in the direction of one of the coordinate axes, always staying in the wedge $x_1 > x_2 > \cdots > x_n$. This point of view was already put forward by Fisher (1984). However, this problem belongs to the problem of counting paths in chambers of reflection groups discussed in the section "Lattice paths".

The second model could also be realized as a single walker model (cf. Krattenthaler (2003)). However, most often it is realized as a model of $n$ paths in the plane consisting of steps $(1, 1)$ and $(1, -1)$ with the property that no two paths have a point in common. In this picture, the $x$-axis becomes the time line, the $k$th path doing an up-step $(1, 1)$ from $(t-1, y)$ to $(t, y+1)$ meaning that the $k$th particle moves to the left at time $t$, whereas the $k$th path doing a down-step $(1, -1)$ from $(t-1, y)$ to $(t, y-1)$ meaning that the $k$th particle moves to the right at time $t$.

The reader should consult Figure 14a for an example. (The labelings should be ignored at this point.) Clearly, what we encounter here is a particular instance of the nonintersecting paths of the last section. Therefore, for fixed starting points and endpoints, formula [61] applies, whereas if the starting points vary and the endpoints are fixed, it is formula [62] that applies.

At this point, the links to the other objects, semistandard tableaux and plane partitions (cf. Bressoud (1999)), emerge. A filling of the cells of the Ferrers diagram of $\lambda$ with elements of the set $\{1, 2, \dots\}$, which is weakly increasing along rows and strictly increasing along columns is called a (semistandard) tableau of shape $\lambda$. Figure 14b shows such a semistandard tableau of shape $(4, 3, 2)$. In fact, vicious walkers and semistandard tableaux are equivalent objects. To see this, first label down-steps by the $x$-coordinate of their endpoint, so that a step from $(a-1, b)$ to $(a, b-1)$ is labeled by $a$, see Figure 14a. Then, out of the labels of the $j$th path, form the $j$th column of the corresponding tableau,

Figure 14 (a) Vicious walkers. (b) A tableau.
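The determinant formula [61] for lock step vicious walkers can be spot-checked by brute force on a tiny instance. The following is a minimal sketch, not taken from the source: the helper names (`paths`, `count_nonintersecting`, `lgv_count`) and the chosen starting points and endpoints are illustrative assumptions, and all weights are set to 1 so that generating functions become plain counts.

```python
from itertools import product


def paths(start, end):
    """All walks with steps (1, +1) or (1, -1) from start to end, as tuples of vertices."""
    (t0, y0), (t1, y1) = start, end
    found = []
    for steps in product((1, -1), repeat=t1 - t0):
        y, vertices = y0, [(t0, y0)]
        for k, step in enumerate(steps, start=1):
            y += step
            vertices.append((t0 + k, y))
        if y == y1:
            found.append(tuple(vertices))
    return found


def count_nonintersecting(starts, ends):
    """Brute-force count of families (P_1, ..., P_n), P_i from starts[i] to ends[i], sharing no vertex."""
    count = 0
    for family in product(*(paths(u, v) for u, v in zip(starts, ends))):
        seen = set()
        disjoint = True
        for walk in family:
            if seen.intersection(walk):
                disjoint = False
                break
            seen.update(walk)
        if disjoint:
            count += 1
    return count


def det(matrix):
    """Determinant by Laplace expansion along the first row (fine for tiny integer matrices)."""
    if not matrix:
        return 1
    return sum(
        (-1) ** j * matrix[0][j] * det([row[:j] + row[j + 1:] for row in matrix[1:]])
        for j in range(len(matrix))
    )


def lgv_count(starts, ends):
    """Right-hand side of [61] with unit weights: entry (i, j) counts walks from u_j to v_i."""
    return det([[len(paths(u, v)) for u in starts] for v in ends])


if __name__ == "__main__":
    starts = [(0, 0), (0, 2), (0, 4)]
    ends = [(4, 0), (4, 2), (4, 4)]
    print(count_nonintersecting(starts, ends), lgv_count(starts, ends))
```

Because all starting heights have the same parity, two walks that cross must meet at a lattice point, which is the condition under which only the identity permutation contributes. The brute-force enumeration is exponential in the number of steps and is meant only as a check on small cases.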
Figure 15 (a) A semistandard tableau. (b) A plane partition.

Figure 17 (a) A rhombus tiling. (b) A family of nonintersecting paths.
to Andrews et al. (1999), Gasper and Rahman (2004), and Slater (1966).

Hypergeometric series can be characterized as series in which the quotient of the $(k+1)$st by the $k$th summand is a rational function in $k$. This is also the way to convert binomial sums into their hypergeometric form (respectively to see if this is possible; in most cases it is): form the quotient of the $(k+1)$st by the $k$th summand and read off the parameters $a_1, \dots, a_r$, $b_1, \dots, b_s$, and the argument $z$ from the factorization of the numerator and the denominator polynomials of the rational function, out of these form the corresponding hypergeometric series, and multiply the series by the summand for $k = 0$. This is, in fact, a completely routine task, and, indeed, computer algebra programs such as Maple and Mathematica do this automatically.

The reason why hypergeometric series are much more fundamental than the binomial sums themselves is that there are hundreds of ways to write the same sum using binomial coefficients and factorials, whereas there is just one hypergeometric form, that is, hypergeometric series are a kind of normal form for binomial sums. In particular, given a specific binomial sum, it is a hopeless enterprise to scan through all the identities available in the literature for this sum. There may be an identity for it, but perhaps written differently. On the contrary, given a specific hypergeometric series, the list of available identities which apply to this series is usually not large, and tables of such identities can be set up in a systematic way. This has been done (cf. Slater (1966); the most comprehensive table available to this date is contained in the manual of the Mathematica package HYP; see "Further reading"), and scanning through these tables is largely facilitated by the use of the Mathematica package HYP.

We give here some of the most important identities for hypergeometric series. Aside from the binomial theorem, the most important summation formulas are: the Gauss ${}_2F_1$-summation formula

$$
{}_2F_1\!\left[\begin{matrix}a,\,b\\ c\end{matrix};1\right]
=\frac{\Gamma(c)\,\Gamma(c-a-b)}{\Gamma(c-a)\,\Gamma(c-b)}
$$

provided $\Re(c-a-b) > 0$, the Pfaff–Saalschütz summation formula

$$
{}_3F_2\!\left[\begin{matrix}a,\,b,\,-n\\ c,\;1+a+b-c-n\end{matrix};1\right]
=\frac{(c-a)_n\,(c-b)_n}{(c)_n\,(c-a-b)_n}
$$

provided $n$ is a non-negative integer, and the Dougall summation formula

$$
{}_7F_6\!\left[\begin{matrix}a,\;a/2+1,\;b,\;c,\;d,\;1+2a-b-c-d+n,\;-n\\ a/2,\;1+a-b,\;1+a-c,\;1+a-d,\;-a+b+c+d-n,\;a+1+n\end{matrix};1\right]
=\frac{(1+a)_n\,(1+a-b-c)_n\,(1+a-b-d)_n\,(1+a-c-d)_n}{(1+a-b)_n\,(1+a-c)_n\,(1+a-d)_n\,(1+a-b-c-d)_n}
$$

provided $n$ is a non-negative integer.

Some of the most important transformation formulas are the Euler transformation formula

$$
{}_2F_1\!\left[\begin{matrix}a,\,b\\ c\end{matrix};z\right]
=(1-z)^{c-a-b}\;{}_2F_1\!\left[\begin{matrix}c-a,\,c-b\\ c\end{matrix};z\right]
$$

provided $|z| < 1$, the Kummer transformation formula

$$
{}_3F_2\!\left[\begin{matrix}a,\,b,\,c\\ d,\,e\end{matrix};1\right]
=\frac{\Gamma(e)\,\Gamma(d+e-a-b-c)}{\Gamma(e-a)\,\Gamma(d+e-b-c)}\;
{}_3F_2\!\left[\begin{matrix}a,\;d-b,\;d-c\\ d,\;d+e-b-c\end{matrix};1\right]
$$

provided both series converge, and the Whipple transformation formulas

$$
{}_4F_3\!\left[\begin{matrix}a,\,b,\,c,\,-n\\ e,\;f,\;1+a+b+c-e-f-n\end{matrix};1\right]
=\frac{(e-a)_n\,(f-a)_n}{(e)_n\,(f)_n}\;
{}_4F_3\!\left[\begin{matrix}-n,\;a,\;1+a+c-e-f-n,\;1+a+b-e-f-n\\ 1+a+b+c-e-f-n,\;1+a-e-n,\;1+a-f-n\end{matrix};1\right]
\tag*{[68]}
$$

where $n$ is a non-negative integer, and

$$
{}_7F_6\!\left[\begin{matrix}a,\;1+\tfrac a2,\;b,\;c,\;d,\;e,\;-n\\ \tfrac a2,\;1+a-b,\;1+a-c,\;1+a-d,\;1+a-e,\;1+a+n\end{matrix};1\right]
=\frac{(1+a)_n\,(1+a-d-e)_n}{(1+a-d)_n\,(1+a-e)_n}\;
{}_4F_3\!\left[\begin{matrix}1+a-b-c,\;d,\;e,\;-n\\ 1+a-b,\;1+a-c,\;-a+d+e-n\end{matrix};1\right]
\tag*{[69]}
$$

provided $n$ is a non-negative integer.
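The conversion recipe and the terminating summation formulas above lend themselves to an exact spot check. The sketch below is an illustration only, with hypothetical helper names (`poch`, `hyp_terminating`) and arbitrarily chosen rational parameters. It first treats the binomial sum $\sum_k \binom nk^2$, whose term ratio $(k-n)^2/(k+1)^2$ identifies it as ${}_2F_1[-n,-n;1;1]$, and then compares both sides of the Pfaff–Saalschütz formula in rational arithmetic.

```python
from fractions import Fraction
from math import comb


def poch(a, k):
    """Shifted factorial (a)_k = a (a+1) ... (a+k-1), with (a)_0 = 1."""
    result = Fraction(1)
    for i in range(k):
        result *= a + i
    return result


def hyp_terminating(upper, lower, z, n):
    """Sum of the terms k = 0..n of a pFq series; this is the full series
    when one of the upper parameters equals -n."""
    total = Fraction(0)
    for k in range(n + 1):
        term = Fraction(z) ** k / poch(Fraction(1), k)
        for a in upper:
            term *= poch(a, k)
        for b in lower:
            term /= poch(b, k)
        total += term
    return total


def saalschuetz_lhs(a, b, c, n):
    return hyp_terminating([a, b, Fraction(-n)], [c, 1 + a + b - c - n], 1, n)


def saalschuetz_rhs(a, b, c, n):
    return poch(c - a, n) * poch(c - b, n) / (poch(c, n) * poch(c - a - b, n))


if __name__ == "__main__":
    # Binomial sum in hypergeometric form: sum_k C(n,k)^2 = 2F1[-n, -n; 1; 1] = C(2n, n).
    for n in range(6):
        print(n, hyp_terminating([Fraction(-n), Fraction(-n)], [Fraction(1)], 1, n), comb(2 * n, n))
    # Pfaff-Saalschuetz with sample rational parameters (chosen arbitrarily).
    a, b, c = Fraction(1, 2), Fraction(1, 3), Fraction(5, 7)
    for n in range(6):
        print(n, saalschuetz_lhs(a, b, c, n) == saalschuetz_rhs(a, b, c, n))
```

Exact fractions are used instead of floating point so that an equality test is meaningful; the parameters must of course avoid zeros of the denominators.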
Since about 1990, automatic tools have been available for the verification of binomial and hypergeometric series identities. The book by Petkovšek et al. (1996) is an excellent introduction to these aspects. The philosophy is as follows. Suppose we are given a binomial or hypergeometric series $S(n) = \sum_k F(n, k)$. The Gosper–Zeilberger algorithm (see "Further reading"; cf. Petkovšek et al. (1996); a simplified version was presented in the reference Zeilberger in "Further reading") will find a linear recurrence

$$
A_0(n)S(n) + A_1(n)S(n+1) + \cdots + A_d(n)S(n+d) = C(n)
\tag*{[70]}
$$

for some $d$, where the coefficients $A_i(n)$ are polynomials in $n$, and where $C(n)$ is a certain function in $n$, with proof!

If, for example, we suspected that $S(n) = \mathrm{RHS}(n)$, where $\mathrm{RHS}(n)$ is some closed-form expression, then we just have to verify that $\mathrm{RHS}(n)$ satisfies the recurrence [70] and check $S(n) = \mathrm{RHS}(n)$ for sufficiently many initial values of $n$ to have a proof for the identity $S(n) = \mathrm{RHS}(n)$ for all $n$. On the other hand, if $\mathrm{RHS}(n)$ was a different sum, then we would apply the algorithm to find a recurrence for $\mathrm{RHS}(n)$. If it turns out to be the same recurrence then, again, a check of $S(n) = \mathrm{RHS}(n)$ for a few initial values will provide a full proof of $S(n) = \mathrm{RHS}(n)$ for all $n$.

Even in the case that we do not have a conjectured expression $\mathrm{RHS}(n)$, this is not the end of the story. Given a recurrence of the type [70], the Petkovšek algorithm (see "Further reading"; cf. Petkovšek et al. (1996)) is able to find a closed-form solution (where "closed form" has a precise meaning), respectively tell that there is no closed-form solution.

The fascinating point about both algorithms is that neither do we have to know what the algorithm does internally nor do we have to check that. For the Petkovšek algorithm, this is obvious anyway because, once the computer says that a certain expression is a solution of [70], it is a routine matter to check that. This is less obvious for the Gosper–Zeilberger algorithm. However, what the Gosper–Zeilberger algorithm does is, for a given sum $S(n) = \sum_k F(n, k)$, it finds polynomials $A_0(n), A_1(n), \dots, A_d(n)$ and an expression $G(n, k)$ (which is, in fact, a rational multiple of $F(n, k)$), such that

$$
A_0(n)F(n,k) + A_1(n)F(n+1,k) + \cdots + A_d(n)F(n+d,k) = G(n,k+1) - G(n,k)
\tag*{[71]}
$$

for some $d$. Because of the properties of $F(n, k)$ and $G(n, k)$, which are part of the theory, this is an identity which can be directly verified by clearing all common factors and checking the remaining identity between rational functions in $n$ and $k$. However, we may now sum both sides of [71] over $k$ to obtain a recurrence of the form [70].

Algorithms for multiple sums are also available (see "Further reading"). They follow ideas by Wilf and Zeilberger (1992) (of which a simplified version is presented in a Mohammed and Zeilberger preprint (see "Further reading")); however, they run into capacity problems more quickly. Schneider (2005) is currently developing a very promising new algorithmic approach to the automatic treatment of multisums. See q-Special Functions and Statistical Mechanics and Combinatorial Problems.

See also: Classical Groups and Homogeneous Spaces; Compact Groups and Their Representations; Dimer Problems; Growth Processes in Random Matrix Theory; Ordinary Special Functions; q-Special Functions; Saddle Point Problems; Statistical Mechanics and Combinatorial Problems.

Further Reading

http://algo.inria.fr This site includes, among its libraries, the Maple program gdev.
Andrews GE (1976) The Theory of Partitions, Encyclopedia of Mathematics and Its Applications, vol. 2. Reading: Addison–Wesley (reprinted by Cambridge University Press, Cambridge, 1998).
Andrews GE, Askey RA, and Roy R (1999) In: Rota GC (ed.) Special Functions, Encyclopedia of Mathematics and Its Applications, vol. 71. Cambridge: Cambridge University Press.
Ayoub R (1963) An Introduction to the Analytic Theory of Numbers. Mathematical Surveys, vol. 10. Providence, RI: American Mathematical Society.
Bergeron F, Labelle G, and Leroux P (1998) Combinatorial Species and Tree-Like Structures. Cambridge: Cambridge University Press.
Bousquet-Mélou M and Jehanne A (2005) Polynomial equations with one catalytic variable, algebraic series, and map enumeration. Preprint, arXiv:math.CO/0504018.
Bressoud DM (1999) Proofs and Confirmations: The Story of the Alternating Sign Matrix Conjecture. Cambridge: Cambridge University Press.
de Bruijn NG (1964) Pólya's theory of counting. In: Beckenbach EF (ed.) Applied Combinatorial Mathematics. New York: Wiley (reprinted by Krieger, Malabar, Florida, 1981).
Comtet L (1974) Advanced Combinatorics. Dordrecht: Reidel.
Dolbilin NP, Mishchenko AS, Shtanko MA, Shtogrin MI, and Zinoviev YuM (1996) Homological properties of dimer configurations for lattices on surfaces. Functional Analysis and its Application 30: 163–173.
Feller W (1957) An Introduction to Probability Theory and Its Applications, vol. 1, 2nd edn. New York: Wiley.
Fisher ME (1984) Walks, walls, wetting and melting. Journal of Statistical Physics 34: 667–729.
Flajolet P and Sedgewick R, Analytic Combinatorics, book project, available at http://algo.inria.fr.
Di Francesco P, Zinn-Justin P, and Zuber J-B (2004) Determinant formulae for some tiling problems and application to fully packed loops. Preprint, arXiv:math-ph/0410002.
Di Francesco P and Zinn-Justin P (2005) Quantum Knizhnik–Zamolodchikov equation, generalized Razumov–Stroganov sum rules and extended Joseph polynomials. Preprint, arXiv:math-ph/0508059.
576 Compact Groups and Their Representations
Galluccio A and Loebl M (1999) On the theory of Pfaffian orientations I. Perfect matchings and permanents. Electronic Journal of Combinatorics 6: Article #R6, 18 pp.
http://www.fmf.uni-lj.si Website of Faculty of Mathematics of University of Ljubljana. A Mathematica implementation by Marko Petkovšek is available here.
Gasper G and Rahman M (2004) Basic Hypergeometric Series, 2nd edn. Encyclopedia of Mathematics and Its Applications, vol. 96. Cambridge: Cambridge University Press.
de Gier J (2005) Loops, matchings and alternating-sign matrices. Discrete Mathematics 365–388.
Humphreys JE (1990) Reflection Groups and Coxeter Groups. Cambridge: Cambridge University Press.
Johansson K (2002) Non-intersecting paths, random tilings and random matrices. Probability Theory and Related Fields 123: 225–280.
Kenyon R (2003) An Introduction to the Dimer Model, Lecture Notes for a Short Course at the ICTP, 2002; arXiv:math.CO/0310326.
Koekoek R and Swarttouw RF, The Askey-scheme of hypergeometric orthogonal polynomials and its q-analogue, TU Delft, The Netherlands, 1998; on the www: http://aw.twi.tudelft.nl.
Krattenthaler C (1997) The enumeration of lattice paths with respect to their number of turns. In: Balakrishnan N (ed.) Advances in Combinatorial Methods and Applications to Probability and Statistics, pp. 29–58. Boston: Birkhäuser.
Krattenthaler C (2003) Asymptotics for random walks in alcoves of affine Weyl groups. Preprint, arXiv:math.CO/0301203.
Krattenthaler C (2005a) Watermelon configurations with wall interaction: exact and asymptotic results. Preprint, arXiv:math.CO/0506323.
Krattenthaler C (2005b) Advanced determinant calculus: a complement. Linear Algebra Applications 411: 68–166.
Krattenthaler C, Guttmann AJ, and Viennot XG (2000) Vicious walkers, friendly walkers and Young tableaux II: with a wall. Journal of Physics A: Mathematical and General 33: 8835–8866.
Kuperberg G (1998) An exploration of the permanent-determinant method. Electronic Journal of Combinatorics 5: Article #R46, 34 pp.
Labelle G and Lamathe C (2004) A shifted asymmetry index series. Advances in Applied Mathematics 32: 576–608.
Mohammed M and Zeilberger D (2005) Multi-variable Zeilberger and Almkvist–Zeilberger algorithms and the sharpening of Wilf–Zeilberger theory. Advanced Applications in Mathematics (to appear).
Mohanty SG (1979) Lattice Path Counting and Applications. New York: Academic Press.
Odlyzko AM (1995) Asymptotic enumeration methods. In: Graham RL, Grötschel M, and Lovász L (eds.) Handbook of Combinatorics, pp. 1063–1229. Amsterdam: Elsevier.
Pemantle R and Wilson MC, Twenty combinatorial examples of asymptotics derived from multivariate generating functions. Preprint, available at http://www.cs.auckland.ac.nz.
Petkovšek M, Wilf H, and Zeilberger D (1996) A = B. Wellesley: A K Peters.
http://www.mat.univie.ac.at Website of Faculty of Mathematics, University of Vienna. It provides the manual of the Mathematica package HYP.
Propp J (1999) Enumeration of matchings: problems and progress. In: Billera L, Björner A, Greene C, Simion R, and Stanley RP (eds.) New Perspectives in Algebraic Combinatorics, Mathematical Sciences Research Institute Publications, vol. 38, pp. 255–291. Cambridge: Cambridge University Press.
Razumov AV and Stroganov YG (2005) Enumeration of quarter-turn symmetric alternating-sign matrices of odd order. Preprint, arXiv:math-ph/0507003.
Robertson N, Seymour PD, and Thomas R (1999) Permanents, Pfaffian orientations, and even directed circuits. Annals of Mathematics 150(2): 929–975.
Schneider C (2005) A new Sigma approach to multi-summation. Advances in Applied Mathematics 34(4): 740–767.
Slater LJ (1966) Generalized Hypergeometric Functions. Cambridge: Cambridge University Press.
Stanley RP (1986) Enumerative Combinatorics. Pacific Grove, CA: Wadsworth & Brooks/Cole (reprinted by Cambridge University Press, Cambridge, 1998).
Stanley RP (1999) Enumerative Combinatorics, vol. 2. Cambridge: Cambridge University Press.
Szegő G (1959) Orthogonal Polynomials, American Mathematical Society Colloquium Publications, vol. 23. New York. Providence, RI: American Mathematical Society.
Tesler G (2000) Matchings in graphs on non-oriented surfaces. Journal of Combinatorial Theory Series B 78: 198–231.
http://www.risc.uni.linz.ac.at Website of RISC (Research Institute for Symbolic Computation). Mathematica implementations written by Peter Paule and Markus Schorn, and Axel Riese and Kurt Wegschaider are available here.
http://www.math.rutgers.edu Website of Department of Mathematics, Rutgers University. Computer implementations written by D Zeilberger are available here.
Viennot X and James W, Heaps of segments, q-Bessel functions in square lattice enumeration and applications in quantum gravity. Preprint.
Wilf HS and Zeilberger D (1992) An algorithmic proof theory for hypergeometric (ordinary and "q") multisum/integral identities. Inventiones Mathematicae 108: 575–633.
Zeilberger D (2005) Deconstructing the Zeilberger algorithm. Journal of Difference Equations and Applications 11: 851–856.
Examples of Compact Lie Groups

Examples of compact groups include

- finite groups,
- quotient groups $T^n = \mathbb R^n/\mathbb Z^n$, or more generally, $V/L$, where $V$ is a finite-dimensional real vector space and $L$ is a lattice in $V$, that is, a discrete subgroup generated by some basis in $V$; groups of this type are called tori; it is known that every commutative connected compact Lie group is a torus;
- unitary groups $U(n)$ and special unitary groups $SU(n)$, $n \ge 2$;
- orthogonal groups $O(n)$ and $SO(n)$, $n \ge 3$; and
- the groups $U(n, \mathbb H)$, $n \ge 1$, of unitary quaternionic transformations, which are isomorphic to $Sp(n) := Sp(n, \mathbb C) \cap SU(2n)$.

The proof of these results is based on the fact that the Killing form of $\mathfrak g$ is negative semidefinite.

Example 1 The group $U(n)$ contains as the center the subgroup $C$ of scalar matrices. The quotient group $U(n)/C$ is simple and isomorphic to $SU(n)/\mathbb Z_n$. The presentation of Theorem 1 in this case is

$$
U(n) = (T^1 \times SU(n))/\mathbb Z_n = (C \times SU(n))/(C \cap SU(n)).
$$

For the group $SO(4)$ the presentation is $(SU(2) \times SU(2))/\{\pm(1 \times 1)\}$.

This theorem effectively reduces the study of the structure of connected compact groups to the study of simply connected compact simple Lie groups.
Since the Lie algebra of a compact Lie group $G$ is reductive, we see that $G_{\mathbb C}$ must be reductive; if $G$ is semisimple or simple, then so is $G_{\mathbb C}$. The natural question is whether every complex reductive group can be obtained in this way. The following theorem gives a partial answer.

Theorem 3 Every connected complex semisimple Lie group $H$ has a compact real form: there is a compact real subgroup $K \subset H$ such that $H = K_{\mathbb C}$. Moreover, such a compact real form is unique up to conjugation.

Example 2
(i) The unitary group $U(n)$ is a compact real form of the group $GL(n, \mathbb C)$.
(ii) The orthogonal group $SO(n)$ is a compact real form of the group $SO(n, \mathbb C)$.
(iii) The group $Sp(n)$ is a compact real form of the group $Sp(n, \mathbb C)$.
(iv) The universal cover of $GL(n, \mathbb C)$ has no compact real form.

These results have a number of important applications. For example, they show that the study of representations of a semisimple complex group $H$ can be replaced by the study of representations of its compact form; in particular, every representation is completely reducible (this argument is known as Weyl's unitary trick).

Classification of Simple Compact Lie Groups

Theorem 1 essentially reduces such classification to classification of simply connected simple compact groups, and Theorems 2 and 3 reduce it to the classification of simple complex Lie algebras. Since the latter is well known, we get the following result.

Theorem 4 Let $G$ be a connected, simply connected simple compact Lie group. Then $\mathfrak g_{\mathbb C}$ must be a simple complex Lie algebra and thus can be described by a Dynkin diagram of one of the following types: $A_n$, $B_n$, $C_n$, $D_n$, $E_6$, $E_7$, $E_8$, $F_4$, $G_2$.
Conversely, for each Dynkin diagram in the above list, there exists a unique, up to isomorphism, simply connected simple compact Lie group whose Lie algebra is described by this Dynkin diagram.

For types $A_n, \dots, D_n$, the corresponding compact Lie groups are well-known classical groups shown in the table below:

$A_n$, $n \ge 1$    $B_n$, $n \ge 2$    $C_n$, $n \ge 3$    $D_n$, $n \ge 4$
$SU(n+1)$           $Spin(2n+1)$        $Sp(n)$             $Spin(2n)$

The restrictions on $n$ in this table are made to avoid repetitions which appear for small values of $n$. Namely, $A_1 = B_1 = C_1$, which gives $SU(2) = Spin(3) = Sp(1)$; $D_2 = A_1 \cup A_1$, which gives $Spin(4) = SU(2) \times SU(2)$; $B_2 = C_2$, which gives $Spin(5) = Sp(2)$; and $A_3 = D_3$, which gives $SU(4) = Spin(6)$. Other than that, all entries are distinct.

Exceptional groups $E_6, \dots, G_2$ also admit explicit geometric and algebraic descriptions which are related to the exceptional nonassociative algebra $\mathbb O$ of the so-called octonions (or Cayley numbers). For example, the compact group of type $G_2$ can be defined as a subgroup of $SO(7)$ which preserves an almost-complex structure on $S^6$. It can also be described as the subgroup of $GL(7, \mathbb R)$ which preserves one quadratic and one cubic form, or, finally, as a group of all automorphisms of $\mathbb O$.

Maximal Tori

Main Properties

In this section, $G$ is a compact connected Lie group.

Definition 2 A maximal torus in $G$ is a maximal connected commutative subgroup $T \subset G$.

The following theorem lists the main properties of maximal tori.

Theorem 5
(i) For every element $g \in G$, there exists a maximal torus $T \ni g$.
(ii) Any two maximal tori in $G$ are conjugate.
(iii) If $g \in G$ commutes with all elements of a maximal torus $T$, then $g \in T$.
(iv) A connected subgroup $H \subset G$ is a maximal torus iff the Lie algebra $\operatorname{Lie}(H)$ is a maximal abelian subalgebra in $\operatorname{Lie}(G)$.

Example 3 Let $G = U(n)$. Then the set $T$ of diagonal unitary matrices is a maximal torus in $G$; moreover, every maximal torus is of this form after a suitable unitary change of basis. In particular, this implies that every element in $G$ is conjugate to a diagonal matrix.

Example 4 Let $G = SO(3)$. Then the set $D$ of diagonal matrices is a maximal commutative subgroup in $G$, but not a torus. Here $D$ consists of four elements and is not connected.

Maximal Tori and Cartan Subalgebras

The study of maximal tori in compact Lie groups is closely related to the study of Cartan subalgebras in reductive complex Lie algebras. For the convenience of readers, we briefly recall the appropriate definitions
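here; details can be found in Serre (2001) or in Lie Groups: General Theory.

Definition 3 Let $\mathfrak{a}$ be a complex reductive Lie algebra. A Cartan subalgebra $\mathfrak{h} \subset \mathfrak{a}$ is a maximal commutative subalgebra consisting of semisimple elements.

Note that for general Lie algebras Cartan subalgebra is defined in a different way; however, for reductive algebras the definition given above is equivalent to the standard one.

A choice of a Cartan subalgebra gives rise to the so-called root decomposition: if $\mathfrak{h} \subset \mathfrak{a}$ is a Cartan subalgebra in a complex reductive Lie algebra, then we can write

\[ \mathfrak{a} = \mathfrak{h} \oplus \Big( \bigoplus_{\alpha \in R} \mathfrak{a}_\alpha \Big) \tag{1} \]

where

\[ \mathfrak{a}_\alpha = \{ x \in \mathfrak{a} \mid \operatorname{ad} h.x = \langle \alpha, h \rangle x \ \ \forall h \in \mathfrak{h} \}, \qquad R = \{ \alpha \in \mathfrak{h}^* - \{0\} \mid \mathfrak{a}_\alpha \neq 0 \} \subset \mathfrak{h}^* \]

The set $R$ is called the root system of $\mathfrak{a}$ with respect to Cartan subalgebra $\mathfrak{h}$; elements $\alpha \in R$ are called roots. We will also frequently use elements $\alpha^\vee \in \mathfrak{h}$ defined by $\langle \alpha^\vee, \beta \rangle = 2(\alpha, \beta)/(\alpha, \alpha)$, where $(\,,)$ is a nondegenerate invariant bilinear form on $\mathfrak{a}^*$ and $\langle\,,\rangle$ is the pairing between $\mathfrak{a}$ and $\mathfrak{a}^*$. It can be shown that so defined $\alpha^\vee$ does not depend on the choice of the form $(\,,)$.

Theorem 6 Let $G$ be a connected compact Lie group with Lie algebra $\mathfrak{g}$, and let $T \subset G$ be a

It follows from the definition of root system that we have inclusions

\[ Q \subset P \subset i\mathfrak{t}^*, \qquad Q^\vee \subset P^\vee \subset i\mathfrak{t} \tag{2} \]

Both $P$, $Q$ are lattices in $i\mathfrak{t}^*$; thus, the index $(P : Q)$ is finite. It can be computed explicitly: if $\alpha_i$ is a basis of the root system, then the fundamental weights $\omega_i$ defined by

\[ \langle \alpha_i^\vee, \omega_j \rangle = \delta_{ij} \]

form a basis of $P$. The simple roots $\alpha_i$ are related to fundamental weights $\omega_j$ by the Cartan matrix $A$: $\alpha_i = \sum_j A_{ij}\,\omega_j$. Therefore, $(P : Q) = (P^\vee : Q^\vee) = |\det A|$.

Definitions of $P$, $Q$, $P^\vee$, $Q^\vee$ also make sense when $\mathfrak{g}$ is reductive but not semisimple. However, in this case they are no longer lattices: $\operatorname{rk} Q < \dim \mathfrak{t}^*$, and $P$ is not discrete.

We can now give more precise information about the structure of the maximal torus.

Lemma 1 Let $T$ be a compact connected commutative Lie group, and $\mathfrak{t} = \operatorname{Lie}(T)$ its Lie algebra. Then the exponential map is surjective and preimage of unit is a lattice $L \subset \mathfrak{t}$. There is an isomorphism of Lie groups

\[ \exp \colon \mathfrak{t}/L \to T \]

In particular, $T \simeq \mathbb{R}^r/\mathbb{Z}^r = T^r$, $r = \dim T$.

Let $X(T) \subset i\mathfrak{t}^*$ be the lattice dual to $(2\pi i)^{-1} L$:

\[ X(T) = \{ \lambda \in i\mathfrak{t}^* \mid \langle \lambda, l \rangle \in 2\pi i\,\mathbb{Z} \ \ \forall l \in L \} \tag{3} \]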
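As a small numerical check of the index formula $(P : Q) = |\det A|$ stated above, the following sketch computes the determinant of a Cartan matrix exactly. The helper names (`det_int`, `cartan_matrix_A`, `index_of_root_lattice`) are illustrative and not part of the article; the matrices used are the standard Cartan matrices of types $A_n$ and $G_2$, written in one common convention.

```python
from fractions import Fraction


def det_int(matrix):
    """Exact determinant of an integer matrix via Gaussian elimination over Q."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        # Find a row with a nonzero entry in this column to use as pivot.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return int(det)


def cartan_matrix_A(n):
    """Cartan matrix of type A_n: 2 on the diagonal, -1 next to it."""
    return [[2 if i == j else -1 if abs(i - j) == 1 else 0
             for j in range(n)] for i in range(n)]


# Cartan matrix of type G_2 (one common orientation convention).
CARTAN_G2 = [[2, -1], [-3, 2]]


def index_of_root_lattice(cartan):
    """Index (P : Q) computed as |det A|."""
    return abs(det_int(cartan))
```

For type $A_n$ this gives $n + 1$, and for $G_2$ it gives 1, so in the latter case the root and weight lattices coincide.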
units in $i\mathfrak{t}$, we identify $i\mathfrak{t} \simeq \mathbb{R}^n$, which also allows us to identify $i\mathfrak{t}^* \simeq \mathbb{R}^n$. Under this identification,

\[ Q = \Big\{ (\lambda_1, \dots, \lambda_n) \ \Big|\ \lambda_i \in \mathbb{Z},\ \sum \lambda_i = 0 \Big\} \]
\[ P = \{ (\lambda_1, \dots, \lambda_n) \mid \lambda_i \in \mathbb{R},\ \lambda_i - \lambda_j \in \mathbb{Z} \} \]
\[ X(T) = \mathbb{Z}^n \]

Weyl group of $G$.

Since the Weyl group acts faithfully on $\mathfrak{t}$ and $\mathfrak{t}^*$, it is common to consider $W$ as a subgroup in $GL(\mathfrak{t}^*)$. It is known that $W$ is finite.

The Weyl group can also be defined in terms of Lie algebra $\mathfrak{g}$ and its complexification $\mathfrak{g}_{\mathbb{C}}$.

Theorem 8 The Weyl group coincides with the subgroup in $GL(i\mathfrak{t}^*)$ generated by reflections $s_\alpha \colon x \mapsto x - \frac{2(\alpha, x)}{(\alpha, \alpha)}\,\alpha$, $\alpha \in R$, where, as before, $(\,,)$ is a nondegenerate invariant bilinear form on $\mathfrak{g}^*$.

Theorem 9
(i) Two elements $t_1, t_2 \in T$ are conjugate in $G$ iff $t_2 = w(t_1)$ for some $w \in W$.
(ii) There exists a natural homeomorphism of quotient spaces $G/\operatorname{Ad} G \simeq T/W$, where $\operatorname{Ad} G$ stands for action of $G$ on itself by conjugation. (Note, however, that these quotient spaces are not manifolds: they have singularities.)
(iii) Let us call a function $f$ on $G$ central if $f(hgh^{-1}) = f(g)$ for any $g, h \in G$. Then the restriction map gives an isomorphism

\[ \{\text{continuous central functions on } G\} \simeq \{W\text{-invariant continuous functions on } T\} \]

Example 6 Let $G = U(n)$. The set of diagonal unitary matrices is a maximal torus, and the Weyl group is the symmetric group $S_n$ acting on diagonal matrices by permutations of entries. In this case, Theorem 9 shows that if $f(U)$ is a central function of a unitary matrix, then $f(U) = \tilde f(\lambda_1, \dots, \lambda_n)$, where $\lambda_i$ are eigenvalues of $U$ and $\tilde f$ is a symmetric function in $n$ variables.

\[ \mathfrak{g} \to \operatorname{End} V \]

Choosing a basis in $V$, we can write the operators $\pi(g)$ and $\pi_*(X)$ in matrix form and consider $\pi$ and $\pi_*$ as matrix-valued functions on $G$ and $\mathfrak{g}$. The diagram above means that

\[ \pi(\exp X) = e^{\pi_*(X)} \tag{4} \]

Recall that if $G$ is connected, simply connected, then every representation of $\mathfrak{g}$ can be uniquely lifted to a representation of $G$. Thus, classification of representations of connected simply connected Lie groups is equivalent to the classification of representations of Lie algebras.

Let $(\pi_1, V_1)$ and $(\pi_2, V_2)$ be two representations of the same group $G$. An operator $A \in \operatorname{Hom}(V_1, V_2)$ is called an intertwining operator, or simply an intertwiner, if $A \circ \pi_1(g) = \pi_2(g) \circ A$ for all $g \in G$. Two representations are called equivalent if they admit an invertible intertwiner. In this case, using an appropriate choice of bases, we can write $\pi_1$ and $\pi_2$ by the same matrix-valued function.

Let $(\pi, V)$ be a representation of $G$. If all operators $\pi(g)$, $g \in G$, preserve a subspace $V_1 \subset V$, then the restrictions $\pi_1(g) = \pi(g)|_{V_1}$ define a subrepresentation $(\pi_1, V_1)$ of $(\pi, V)$. In this case, the quotient space $V_2 = V/V_1$ also has a canonical structure of a representation, called the quotient representation.
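The statement of Example 6, that a central function on $U(n)$ depends only on the eigenvalues and is symmetric in them, can be checked numerically for $n = 2$. The sketch below is only an illustration: the sample function `central_f` and the particular matrices are arbitrary choices made here, not taken from the article.

```python
import cmath


def matmul(x, y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(x):
    """Conjugate transpose; for a unitary matrix this is the inverse."""
    return [[x[j][i].conjugate() for j in range(2)] for i in range(2)]


def trace(x):
    return x[0][0] + x[1][1]


def su2(a, b):
    """Element of SU(2) built from a, b with |a|^2 + |b|^2 = 1."""
    return [[a, b], [-b.conjugate(), a.conjugate()]]


def central_f(u):
    """Sample central function (an arbitrary choice): f(U) = tr U + tr U^2."""
    return trace(u) + trace(matmul(u, u))


# A point of the maximal torus of U(2) with eigenvalues l1, l2.
l1, l2 = cmath.exp(0.7j), cmath.exp(-1.9j)
t = [[l1, 0], [0, l2]]
# The Weyl group S_2 permutes the diagonal entries.
t_swapped = [[l2, 0], [0, l1]]

# Conjugate t by a unitary matrix h; f must not change.
h = su2(complex(0.6, 0.0), complex(0.0, 0.8))
conjugated = matmul(matmul(h, t), dagger(h))
```

Here `central_f(conjugated)` agrees with `central_f(t)` up to rounding, and both equal the symmetric expression $\lambda_1 + \lambda_2 + \lambda_1^2 + \lambda_2^2$.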
A representation $(\pi, V)$ is called reducible if it has a nontrivial (different from $V$ and $\{0\}$) subrepresentation. Otherwise it is called irreducible.

We call representation $(\pi, V)$ unitary if $V$ is a Hilbert space and all operators $\pi(g)$, $g \in G$, are unitary, that is, given by unitary matrices in any orthonormal basis. We use a short term unirrep for a unitary irreducible representation.

Main Theorems

The following simple but important result was one of the first discoveries in representation theory. It holds for representations of any group, not necessarily compact.

Theorem 10 (Schur lemma). Let $(\pi_i, V_i)$, $i = 1, 2$, be any two irreducible finite-dimensional representations of the same group $G$. Then any intertwiner $A \colon V_1 \to V_2$ is either invertible or zero.

Corollary 1 If $V$ is an irreducible f.d. representation, then any intertwiner $A \colon V \to V$ is scalar: $A = c \cdot \mathrm{id}$, $c \in \mathbb{C}$.

Corollary 2 Every irreducible representation of a commutative group is one dimensional.

The following theorem is one of the fundamental results of the representation theory of compact groups. Its proof is based on the technique of invariant integrals on a compact group, which will be discussed in the next section.

Theorem 11
(i) Any f.d. representation of a compact group is equivalent to a unitary representation.
(ii) Any f.d. representation is completely reducible: it can be decomposed into direct sum

\[ V \simeq \bigoplus n_i V_i \]

where $V_i$ are pairwise nonequivalent unirreps. Numbers $n_i \in \mathbb{Z}_+$ are called multiplicities.

Examples of Representations

The representation theory looks rather different for abelian (i.e., commutative) and nonabelian groups. Here we consider two simplest examples of both kinds.

Our first example is a one-dimensional compact connected Lie group. Topologically, it is a circle which we realize as a set $T \simeq U(1)$ of all complex numbers $t$ with absolute value 1.

Every unirrep of $T$ is one dimensional; thus, it is just a continuous multiplicative map $\pi$ of $T$ to itself. It is well known that every such map has the form

\[ \pi_k(t) = t^k \quad \text{for some } k \in \mathbb{Z} \]

The collection of all unirreps of $T$ is itself a group, called Pontrjagin dual of $T$ and denoted by $\hat T$. This group is isomorphic to $\mathbb{Z}$.

By Theorem 11, any f.d. representation $\pi$ of $T$ is equivalent to a direct sum of one-dimensional unirreps. So, an equivalence class of $\pi$ is defined by the multiplicity function $\mu$ on $\hat T = \mathbb{Z}$ taking non-negative values:

\[ \pi \simeq \sum_{k \in \mathbb{Z}} \mu(k)\, \pi_k \]

The many-dimensional case of compact connected abelian Lie group can be treated in a similar way. Let $T$ be a torus, that is, a connected abelian compact Lie group, $\mathfrak{t} = \operatorname{Lie}(T)$. Then every irreducible representation of $T$ is one dimensional and thus is defined by a group homomorphism $\chi \colon T \to T^1 = U(1)$. Such homomorphisms are called characters of $T$. One easily sees that such characters themselves form a group (Pontrjagin dual of $T$). If we denote by $L$ the kernel of the exponential map $\mathfrak{t} \to T$ (see Lemma 1), one easily sees that every character has a form

\[ \chi(\exp(t)) = e^{\langle t, \lambda \rangle}, \qquad t \in \mathfrak{t},\ \lambda \in X(T) \]

where $X(T) \subset i\mathfrak{t}^*$ is the lattice defined by [3]. Thus, we can identify the group of characters $\hat T$ with $X(T)$. In particular, this shows that $\hat T \simeq \mathbb{Z}^{\dim T}$.

The second example is the group $G = SU(2)$, the simplest connected, simply connected nonabelian compact Lie group. Topologically, $G$ is a three-dimensional sphere since the general element of $G$ is a matrix of the form

\[ g = \begin{pmatrix} a & b \\ -\bar b & \bar a \end{pmatrix}, \qquad a, b \in \mathbb{C},\ |a|^2 + |b|^2 = 1 \]

Let $V$ be two-dimensional complex vector space, realized by column vectors $\binom{u}{v}$. The group $G$ acts naturally on $V$. This action induces the representation $\Pi$ of $G$ in the space $S(V)$ of all polynomials in $u$, $v$. It is infinite dimensional, but has many f.d. subrepresentations. In particular, let $S^k(V)$, or simply $S^k$, be the space of all homogeneous polynomials of degree $k$. Clearly, $\dim S^k = k + 1$.

It turns out that the corresponding f.d. representations $(\Pi_k, S^k)$, $k \ge 0$, are irreducible, pairwise nonequivalent, and exhaust the set $\hat G$ of all unirreps.

Some particular cases are of special interest:

1. $k = 0$. The space $S^0$ consists of constant functions and $\Pi_0$ is the trivial one-dimensional representation: $\Pi_0(g) \equiv 1$.
2. $k = 1$. The space $S^1$ is identical to $V$ and $\Pi_1$ is just the tautological representation $\pi(g) \equiv g$.
3. $k = 2$. The space $S^2$ is spanned by monomials $u^2$, $uv$, $v^2$. The remarkable fact is that this
representation is equivalent to a real one. Namely, in the new basis

\[ x = \frac{u^2 + v^2}{2}, \qquad y = \frac{u^2 - v^2}{2i}, \qquad z = iuv \]

we have

0 1
! Rea2 b2 2Imab Imb2 a2
a b B C
2 @ 2Imab jaj2 jbj2 2Reab A
b a
Ima2 b2 2Reab Rea2 b2

This formula defines a homomorphism $\Pi_2 \colon SU(2) \to SO(3)$. It can be shown that this homomorphism is surjective, and its kernel is the subgroup $\{\pm 1\} \subset SU(2)$:

\[ 1 \to \{\pm 1\} \hookrightarrow SU(2) \xrightarrow{\ \Pi_2\ } SO(3) \to 1 \]

The simplest way to see it is to establish the equivalence of $\Pi_2$ with the adjoint representation of $G$ in $\mathfrak{g}$. The corresponding intertwiner is

k 2
S2 3 i u2 2iuv
2 i i
i v ! 2g
i i

Note that $SU(2)$ and $SO(3)$ are the only compact groups associated with the Lie algebra $\mathfrak{sl}(2, \mathbb{C})$.

The group $G$ contains the subgroup $H$ of diagonal matrices, isomorphic to $T^1$. Consider the restriction of $\Pi_n$ to $T^1$. It splits into the sum of unirreps $\pi_k$ as follows:

\[ \Pi_n\big|_{T^1} \simeq \bigoplus_{s = -n/2}^{n/2} \pi_{2s} \]

where $s$ runs over $-n/2, -n/2 + 1, \dots, n/2$.

Theorem 12 For every compact Lie group $G$, there exists a unique measure $dg$ on $G$, called Haar measure, which is invariant under left shifts $L_g \colon h \mapsto gh$ and satisfies $\int_G dg = 1$.
In addition, this measure is also invariant under right shifts $h \mapsto hg$ and under involution $h \mapsto h^{-1}$.

Invariance of the Haar measure implies that for every integrable function $f(g)$, we have

\[ \int_G f(g)\, dg = \int_G f(hg)\, dg = \int_G f(gh)\, dg = \int_G f(g^{-1})\, dg \]

For a finite group $G$, the integral with respect to the Haar measure is just averaging over the group:

\[ \int_G f(g)\, dg = \frac{1}{|G|} \sum_{g \in G} f(g) \]

For compact connected Lie groups, the Haar measure is given by a differential form of top degree which is invariant under right and left translations.

For a torus $T^n = \mathbb{R}^n/\mathbb{Z}^n$ with real coordinates $\theta_k \in \mathbb{R}/\mathbb{Z}$ or complex coordinates $t_k = e^{2\pi i \theta_k}$, the Haar measure is $d^n\theta := d\theta_1\, d\theta_2 \cdots d\theta_n$ or

\[ d^n t := \prod_{k=1}^{n} \frac{dt_k}{2\pi i\, t_k} \]

In particular, consider a central function $f$ (see Theorem 9). Since every conjugacy class contains elements of the maximal torus $T$ (see Theorem 5), such a function is determined by its values on $T$, and the integral of a central function can be reduced to
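For a finite group, the invariance properties of Theorem 12 can be verified directly, since the Haar integral is just the average over the group. The sketch below does this for the symmetric group $S_3$; the helper names and the test function `f` are illustrative choices made here, not part of the article.

```python
from itertools import permutations


def compose(p, q):
    """Product pq of permutations stored as tuples: (pq)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)


def inverse(p):
    """Inverse permutation."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def haar_integral(f, group):
    """Integral over a finite group with respect to the normalized Haar measure."""
    return sum(f(g) for g in group) / len(group)


# The symmetric group S_3 as permutations of (0, 1, 2).
S3 = list(permutations(range(3)))


def f(g):
    """An arbitrary, non-central test function on S_3."""
    return g[0] + 10 * g[1] + 100 * g[2]
```

Averaging `f` after a left shift, a right shift, or inversion of the argument gives the same value as averaging `f` itself, and the constant function 1 integrates to 1.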
$G$, we get an intertwining operator $\langle A \rangle = \int_G A(g)\, dg$. Comparing this fact with the Schur lemma, one obtains the following fundamental results.

Let $(\pi, V)$ be any unirrep of a compact group $G$. Choose any orthonormal basis $\{v_k,\ 1 \le k \le \dim V\}$ in $V$ and denote by $t^V_{kl}$, or $t^\pi_{kl}$, the function on $G$ defined by

\[ t^V_{kl}(g) = (\pi(g) v_l, v_k) \]

The functions $t^V_{kl}$ are called matrix elements of the unirrep $(\pi, V)$.

We introduce the Hilbert space $L^2(\hat G)$ as the space of matrix-valued functions on $\hat G$ whose value at a point $\pi \in \hat G$ belongs to $\operatorname{Mat}_{d(\pi)}(\mathbb{C})$. The norm is defined as

\[ \|F\|^2_{L^2(\hat G)} = \sum_{\pi \in \hat G} d(\pi)\, \operatorname{tr}\big(F(\pi) F(\pi)^*\big) \]

For a function $f$ on $G$ define its Fourier transform $\tilde f$ as a matrix-valued function on $\hat G$:

\[ \tilde f(\pi) = \int_G f(g^{-1})\, \pi(g)\, dg \]
It is obviously a central function on $G$.

Remark Traditionally, in representation theory the word character has two different meanings: (1) a multiplicative map from a group to $U(1)$, and (2) the trace of a representation operator $\pi(g)$. For one-dimensional representations both notions coincide.

From the orthogonality relations we get the following result.

Corollary The characters of unirreps of $G$ form an orthonormal basis in the subspace of central functions in $L^2(G, dg)$.

\[ f(g) = \sum_{\pi \in \hat G} d(\pi)\, \operatorname{tr}\big(\tilde f(\pi)\, \pi(g)\big) \]

(iv) The Fourier transform sends the convolution to the matrix multiplication:

\[ \widetilde{f_1 * f_2} = \tilde f_1 \cdot \tilde f_2 \]

where the convolution product is defined by

\[ (f_1 * f_2)(h) = \int_G f_1(hg)\, f_2(g^{-1})\, dg \]

Note the special case of the inversion formula for $g = e$:

\[ f(e) = \sum_{\pi \in \hat G} d(\pi)\, \operatorname{tr} \tilde f(\pi) \]
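The Fourier transform, the inversion formula, and the convolution property above can be tried out on the simplest compact group, the finite cyclic group $\mathbb{Z}_n$. This choice of group is an assumption made here for illustration: its unirreps are the one-dimensional maps $\pi_k(g) = e^{2\pi i k g/n}$, so $d(\pi) = 1$ and traces are just numbers, and the Haar integral is the average over the $n$ elements. Function names are hypothetical.

```python
import cmath


def unirrep(k, g, n):
    """One-dimensional unirrep pi_k of Z_n: g -> exp(2*pi*i*k*g/n)."""
    return cmath.exp(2j * cmath.pi * k * g / n)


def fourier(f, n):
    """f~(pi_k) = integral of f(g^{-1}) pi_k(g) dg, with dg = averaging."""
    return [sum(f[(-g) % n] * unirrep(k, g, n) for g in range(n)) / n
            for k in range(n)]


def inverse_fourier(ft, n):
    """f(g) = sum over k of d(pi_k) * tr(f~(pi_k) pi_k(g)), with d = 1."""
    return [sum(ft[k] * unirrep(k, g, n) for k in range(n)) for g in range(n)]


def convolve(f1, f2, n):
    """(f1 * f2)(h) = integral of f1(hg) f2(g^{-1}) dg (group written additively)."""
    return [sum(f1[(h + g) % n] * f2[(-g) % n] for g in range(n)) / n
            for h in range(n)]
```

With these conventions, `inverse_fourier(fourier(f, n), n)` recovers `f`, the transform of `convolve(f1, f2, n)` is the pointwise product of the two transforms, and the sum of all Fourier coefficients equals `f[0]`, the value at the identity.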
Theorem 19 (Weyl character formula). Let $\lambda \in X_+$. Then

\[ \chi_{L_\lambda} = \frac{A_{\lambda + \rho}}{A_\rho}, \qquad A_\mu = \sum_{w \in W} \varepsilon(w)\, e^{w(\mu)} \]

where, for $w \in W$, we denote $\varepsilon(w) = \det w$ considered as a linear map $\mathfrak{t}^* \to \mathfrak{t}^*$, and $\rho = (1/2) \sum_{\alpha \in R_+} \alpha$.

In particular, computing the value of the character at point $t = 0$ by L'Hôpital's rule, it is possible to deduce the following formula for the dimension of irreducible representations:

\[ \dim L_\lambda = \prod_{\alpha \in R_+} \frac{\langle \alpha^\vee, \lambda + \rho \rangle}{\langle \alpha^\vee, \rho \rangle} \tag{11} \]

Example 10 For $G = SU(2)$, tensor product multiplicities are given by

\[ \Pi_n \otimes \Pi_m = \bigoplus_l \Pi_l \]

where the sum is taken over all $l$ such that $|m - n| \le l \le m + n$, $m + n - l$ is even.

For $G = U(n)$, there is an algorithm for finding the tensor product multiplicities, formulated in the language of Young tableaux (Littlewood–Richardson rule). There are also tables and computer programs for computing these multiplicities; some of them are listed in the bibliography.
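The $SU(2)$ rule of Example 10 is easy to tabulate, and it can be cross-checked against the dimension formula [11]. The sketch below assumes the labelling used earlier in the article, where $\Pi_n$ acts on homogeneous polynomials of degree $n$; for $SU(2)$ there is a single positive root, with $\langle \alpha^\vee, \rho \rangle = 1$ and $\langle \alpha^\vee, \lambda \rangle = n$. Function names are illustrative.

```python
def tensor_product_su2(n, m):
    """Labels l of the unirreps in Pi_n (x) Pi_m:
    |m - n| <= l <= m + n with m + n - l even."""
    return [l for l in range(abs(m - n), m + n + 1) if (m + n - l) % 2 == 0]


def weyl_dim_su2(n):
    """Dimension formula [11] for SU(2): one positive root alpha,
    <alpha^vee, lambda + rho> = n + 1 and <alpha^vee, rho> = 1."""
    pairing_lambda_plus_rho = n + 1
    pairing_rho = 1
    return pairing_lambda_plus_rho // pairing_rho


def dimensions_match(n, m):
    """Check dim(Pi_n) * dim(Pi_m) = sum of dim(Pi_l) over the decomposition."""
    lhs = weyl_dim_su2(n) * weyl_dim_su2(m)
    rhs = sum(weyl_dim_su2(l) for l in tensor_product_su2(n, m))
    return lhs == rhs
```

For example, `tensor_product_su2(2, 3)` gives the labels 1, 3, 5, with dimensions 2 + 4 + 6 = 12 = 3 · 4.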
See also: Classical Groups and Homogeneous Spaces; Combinatorics: Overview; Equivariant Cohomology and the Cartan Model; Finite Group Symmetry Breaking; Lie Groups: General Theory; Ljusternik–Schnirelman Theory; Noncommutative Geometry and the Standard Model; Optimal Cloning of Quantum States; Ordinary Special Functions; Quasiperiodic Systems; Symmetry Classes in Random Matrix Theory.

Further Reading

Bump D (2004) Lie Groups. New York: Springer.
Bröcker T and tom Dieck T (1995) Representations of Compact Lie Groups, Graduate Texts in Mathematics, vol. 98. New York: Springer.
Fulton W and Harris J (1991) Representation Theory. New York: Springer.
Knapp A (2002) Lie Groups beyond an Introduction, 2nd edn. Boston: Birkhäuser.
LiE: A Computer algebra package for Lie group computations, available from http://young.sp2mi.univ-poitiers.fr
McKay WG, Patera J, and Rand DW (1990) Tables of Representations of Simple Lie Algebras, vol. I. Exceptional Simple Lie Algebras. Montreal: CRM.
Serre J-P (2001) Complex Semisimple Lie Algebras. Berlin: Springer.
Simon B (1996) Representations of Finite and Compact Groups. Providence, RI: American Mathematical Society.
Zelobenko DP (1973) Compact Lie Groups and Their Representations. Providence, RI: American Mathematical Society.

Compactification of Superstring Theory
The most general metric ansatz for a Poincaré invariant compactification is

\[ G_{IJ} = \begin{pmatrix} f\, \eta_{\mu\nu} & 0 \\ 0 & G_{ij} \end{pmatrix} \]

where the tangent space indices are $0 \le I < d + k = D$, $0 \le \mu < d$, and $1 \le i \le k$. Here $\eta_{\mu\nu}$ is the Minkowski metric, $G_{ij}$ is a metric on $K$, and $f$ is a real-valued function on $K$ called the warp factor.

As the simplest example, consider pure $D$-dimensional GR. In this case, Einstein's equations reduce to Ricci flatness of $G_{IJ}$. Given our metric ansatz, this requires $f$ to be constant, and the metric $G_{ij}$ on $K$ to be Ricci flat. Thus, any $K$ which admits such a metric, for example, the $k$-dimensional torus, will lead to a compactification.

Typically, if a manifold admits a Ricci-flat metric, it will not be unique; rather there will be a moduli space of such metrics. Physically, one then expects to find solutions in which the choice of Ricci-flat metric is slowly varying in $d$-dimensional spacetime. General arguments imply that such variations must be described by variations of $d$-dimensional fields, governed by an EFT. Given an explicit parametrization of the family of metrics, say $G_{ij}(t^\alpha)$ for some parameters $t^\alpha$, in principle the EFT could be computed explicitly by promoting the parameters to $d$-dimensional fields, substituting this parametrization into the $D$-dimensional action, and expanding in powers of the $d$-dimensional derivatives. In pure GR, we would find the four-dimensional effective Lagrangian

\[ \mathcal{L}_{\mathrm{EFT}} = \int d^k y\, \sqrt{\det G}\, R^{(4)} + \sqrt{\det G}\, G^{ik} G^{jl}\, \frac{\partial G_{ij}}{\partial t^\alpha} \frac{\partial G_{kl}}{\partial t^\beta}\, \partial_\mu t^\alpha\, \partial^\mu t^\beta \tag{1} \]

While this is easily evaluated for $K$ a symmetric space or torus, in general a direct computation of $\mathcal{L}_{\mathrm{EFT}}$ is impossible. This becomes especially clear when one learns that the Ricci-flat metrics $G_{ij}$ are not explicitly known for the examples of interest. Nevertheless, clever indirect methods have been found that give a great deal of information about $\mathcal{L}_{\mathrm{EFT}}$; this is much of the art of superstring compactification. However, in this section, let us ignore this point and continue as if we could do such computations explicitly.

Given a solution, one proceeds to consider its small perturbations, which satisfy the linearized equations of motion. If these include exponentially growing modes (often called tachyons), the solution is unstable. (Note that this criterion is modified for AdS compactifications.) The remaining perturbations can be divided into massless fields, corresponding to zero modes of the linearized equations of motion on $K$, and massive fields, the others. General results on eigenvalues of Laplacians imply that the masses of massive fields depend on the diameter of $K$ as $m \sim 1/\operatorname{diam}(K)$, so at energies far smaller than $m$, they cannot be excited (this is not universal; given strong negative curvature on $K$, or a rapidly varying warp factor, one can have perturbations of small nonzero mass). Thus, the massive fields can be integrated out, to leave an EFT with a finite number of fields. In the classical approximation, this simply means solving their equations of motion in terms of the massless fields, and using these solutions to eliminate them from the action. At leading order in an expansion around a solution, these fields are zero and this step is trivial; nevertheless, it is useful in making a systematic definition of the interaction terms in the EFT.

As we saw in pure GR, the configuration space parametrized by the massless fields in the EFT is the moduli space of compactifications obtained by deforming the original solution. Thus, from a mathematical point of view, low-energy EFT can be thought of as a sort of enhancement of the concept of moduli space, and a dictionary set up between mathematical and physical languages. To give its next entry, there is a natural physical metric on moduli space, defined by restriction from the metric on the configuration space of the theory $\mathcal{T}$; this becomes the sigma-model metric for the scalars in the EFT. Because the theories $\mathcal{T}$ arising from string theory are geometrically natural, this metric is also natural from a mathematical point of view, and one often finds that much is already known about it. For example, the somewhat fearsome two-derivative terms in eqn [1] are (perhaps) less so when one realizes that this is an explicit expression for the Weil–Petersson metric on the moduli space of Ricci-flat metrics. In any case, knowing this dictionary is essential for taking advantage of the literature.

Another important entry in this dictionary is that the automorphism group of a solution translates into the gauge group in the EFT. This can be either continuous, leading to the gauge symmetry of Maxwell and Yang–Mills theories, or discrete, leading to discrete gauge symmetry. For example, if the metric on $K$ has continuous isometry group $G$, the resulting EFT will have gauge symmetry $G$, as in the original example of Kaluza and Klein with $K \simeq S^1$ and $G \simeq U(1)$. Mathematically, these phenomena of enhanced symmetry are often treated using the languages of equivariant theories (cohomology, K-theory, etc.), stacks, and so on.
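The scaling $m \sim 1/\operatorname{diam}(K)$ and the notion of integrating out massive fields can be made concrete in the simplest case, which is an assumption chosen here for illustration: a massless scalar field on a circle of radius $R$. Its Fourier modes $e^{i n y/R}$ are eigenfunctions of the Laplacian with eigenvalue $n^2/R^2$, so in units with $\hbar = c = 1$ the mode $n$ appears in the lower-dimensional theory with mass $|n|/R$. The function names are hypothetical.

```python
def kk_masses(radius, n_max):
    """Masses m_n = |n| / R of the Fourier modes exp(i n y / R) of a massless
    scalar on a circle of radius R, in units with hbar = c = 1."""
    return {n: abs(n) / radius for n in range(-n_max, n_max + 1)}


def light_modes(radius, energy, n_max=50):
    """Mode numbers that can be excited at the given energy; the remaining
    modes are the massive fields to be integrated out."""
    return sorted(n for n, m in kk_masses(radius, n_max).items() if m <= energy)
```

For a small circle, say `light_modes(0.01, 1.0)`, only the zero mode `n = 0` survives at low energy, which is the finite set of massless fields the EFT retains.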
moduli, which are naturally complex, and Kähler moduli, which are not. However, in string compactification the latter are complexified to the periods of the 2-form $B + iJ$ integrated over a basis of $H_2(K, \mathbb{Z})$, where $J$ is the Kähler form and $B$ is the NS 2-form. In addition, there is a complex field pairing the dilaton (actually, an exponential of it) and the model-independent axion, the scalar dual in $d = 4$ to the 2-form $B_{\mu\nu}$. Finally, each complex modulus of the holomorphic bundle $E$ will lead to a chiral multiplet. Thus, the total number of massless uncharged chiral multiplets is $1 + h^{1,1}(K) + h^{2,1}(K) + \dim H^1(K, \operatorname{End}(E))$.

Massless charged matter will arise from zero modes of the gauge field and its supersymmetric partner spinor. It is slightly easier to discuss the spinor, and then appeal to supersymmetry to get the bosons. Decomposing the spinors of $SO(6)$ under $SU(3)$, one obtains $(0, p)$ forms, and the Dirac equation becomes the condition that these forms are harmonic. By the Hodge theorem, these are in one-to-one correspondence with classes in Dolbeault cohomology $H^{0,p}(K, V)$, for some bundle $V$. The bundle $V$ is obtained by decomposing the spinor into representations of the holonomy group of $E$. For $H = SU(3)$, the decomposition of the adjoint under the embedding of $SU(3) \times E_6$ in $E_8$,

\[ 248 = (8, 1) \oplus (1, 78) \oplus (3, 27) \oplus (\bar 3, \overline{27}) \]

implies that charged matter will form generations in the $27$, of number $\dim H^{0,1}(K, E)$, and antigenerations in the $\overline{27}$, of number $\dim H^{0,1}(K, \bar E) = \dim H^{0,2}(K, E)$. The difference in these numbers is determined by the Atiyah–Singer index theorem to be

\[ N_{\mathrm{gen}} \equiv N_{27} - N_{\overline{27}} = \frac{1}{2}\, c_3(E) \]

In the special case of $E \simeq TK$, these numbers are separately determined to be $N_{27} = b^{1,1}$ and $N_{\overline{27}} = b^{2,1}$, so their difference is $\chi(K)/2$, half the Euler number of $K$. In the real world, this number is $N_{\mathrm{gen}} = 3$, and matching this under our assumptions so far is very constraining.

Substituting these zero modes into the ten-dimensional Yang–Mills action and integrating, one can derive the $d = 4$ EFT. For example, the cubic terms in the superpotential, usually called Yukawa couplings after the corresponding fermion–boson interactions in the component Lagrangian, are obtained from the cubic product of zero modes

\[ \int_K \Omega \wedge \operatorname{tr}(\phi_1 \wedge \phi_2 \wedge \phi_3) \]

where $\Omega$ is the holomorphic 3-form, $\phi_i \in H^{0,1}(K, \operatorname{Rep} E)$ are the zero modes, and $\operatorname{tr}$ arises from decomposing the $E_8$ cubic group invariant.

Note the very important fact that this expression only depends on the cohomology classes of the $\phi_i$ (and $\Omega$). This means the Yukawa couplings can be computed without finding the explicit harmonic representatives, which is not possible (we do not even know the explicit metric). More generally, one expects to be able to explicitly compute the superpotential and all other holomorphic quantities in the effective Lagrangian solely from topological information (the Dolbeault cohomology ring, and its generalizations within topological string theory). On the other hand, computing the Kähler metric in an $N = 1$ EFT is usually out of reach as it would require having explicit normalized zero modes. Most results for this metric come from considering closely related compactifications with extended supersymmetry, and arguing that the breaking to $N = 1$ supersymmetry makes small corrections to this.

There are several generalizations of this construction. First, the necessary condition to solve eqn [5] is that the left-hand side be exact, which requires

\[ c_2(E) = c_2(TK) \tag{7} \]

This allows for a wide variety of $E$'s to be used, so that $N_{\mathrm{gen}} = 3$ can be attained with many more $K$'s. This class of models is often called (0, 2) compactification to denote the world-sheet supersymmetry of the heterotic string in these backgrounds. One can also use bundles with larger structure group; for example, $H = SL(4)$ leads to unbroken $SO(10) \times E_8$, and $H = SL(5)$ leads to unbroken $SU(5) \times E_8$.

The subsequent breaking of the grand unified group to the SM gauge group is typically done by choosing $K$ with nontrivial $\pi_1$, so that it admits a flat line bundle $W$ with nontrivial holonomy (usually called a Wilson line). One then uses the bundle $E \otimes W$ in the above discussion, to obtain the commutant of $H \times W$ as gauge group. For example, if $\pi_1(K) \simeq \mathbb{Z}_5$, one can use $W$ whose holonomy is an element of order 5 in $SU(5)$, to obtain as commutant the SM gauge group $SU(3) \times SU(2) \times U(1)$.

Another generalization is to take the 3-form $H \neq 0$. This discussion begins by noting that, for supersymmetry, we still require the existence of a unique spinor; however, it will no longer be covariantly constant in the Levi-Civita connection. One way to structure the problem is to note that the right-hand side of eqn [2] takes the form of a connection with torsion; the resulting equations have been discussed mathematically in (Li and Yau 2004).

Another recent approach to these compactifications (Gauntlett 2004) starts out by arguing that this spinor cannot vanish on $K$, so it defines a weak $SU(3)$ structure, a local reduction of the structure group of
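The counting rules quoted above for the special case $E \simeq TK$ reduce to arithmetic on Hodge numbers, which the following sketch collects. It uses the relation $\chi(K) = 2(h^{1,1} - h^{2,1})$ for a Calabi–Yau 3-fold, so that $N_{27} - N_{\overline{27}} = b^{1,1} - b^{2,1} = \chi(K)/2$ as stated in the text. The function names are illustrative, and the quintic 3-fold with $h^{1,1} = 1$, $h^{2,1} = 101$ is brought in here only as a familiar example, not taken from the article.

```python
def euler_number_cy3(h11, h21):
    """Euler number chi(K) = 2 * (h11 - h21) of a Calabi-Yau 3-fold."""
    return 2 * (h11 - h21)


def net_generations_tangent_bundle(h11, h21):
    """N_27 - N_27bar = b^{1,1} - b^{2,1} = chi(K)/2 for E = TK."""
    return h11 - h21


def uncharged_chiral_multiplets(h11, h21, bundle_moduli):
    """1 + h^{1,1}(K) + h^{2,1}(K) + dim H^1(K, End E); the last term is
    passed in by the caller as bundle_moduli."""
    return 1 + h11 + h21 + bundle_moduli


# Quintic 3-fold: h11 = 1, h21 = 101, so chi = -200.
QUINTIC = (1, 101)
```

For the quintic the net number is $-100$, far from 3 in magnitude, which illustrates why matching $N_{\mathrm{gen}} = 3$ with $E \simeq TK$ is so constraining.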
The other compactifications with $N_s = 16$ are M-theory on K3 and its further toroidal reductions, and IIb on K3. M-theory compactification to $d = 7$ is dual to heterotic on $T^3$, with the same moduli space and enhanced gauge symmetry. As we discuss at the end of the section Stringy and quantum corrections, the extra massless gauge bosons of enhanced gauge symmetry are M2 branes wrapped on 2-cycles with topology $S^2$. For such a cycle to have zero volume, the integral of the Kähler form and holomorphic 2-form over the cycle must vanish; expressing this in a basis for $H^2(K3, \mathbb{R})$ leads to exactly the same condition we discussed for enhanced gauge symmetry above. The final result is that all such K3 degenerations lead to one of the two-dimensional canonical singularities, of types A, D, or E, and the corresponding EFT phenomenon is the enhanced gauge symmetry of corresponding Dynkin type A, D, or E.

IIb on K3 is similar, but reducing the self-dual Ramond–Ramond (RR) 4-form potential on the 2-cycles leads to self-dual tensor multiplets instead of Maxwell theory. The moduli space is eqn [8] but with $n = 5$, not $n = 4$, incorporating periods of RR potentials and the $SL(2, \mathbb{Z})$ duality symmetry of IIb theory.

One may ask if the $N_s = 16$ I/HE/HO theories in $d = 8$ and $d = 9$ have similar duals. For $d = 8$, these are obtained by a pretty construction known as F-theory. Geometrically, the simplest definition of F-theory is to consider the special case of M-theory on an elliptically fibered Calabi–Yau, in the limit that the Kähler modulus of the fiber becomes small. One check of this claim for $d = 8$ is that the moduli space of elliptically fibered K3s agrees with eqn [8] with $n = 2$.

Another definition of F-theory is the particular case of IIb compactification using Dirichlet 7-branes, and orientifold 7-planes. This construction is T-dual to the type I theory on $T^2$, which provides its simplest string theory definition. As discussed in Polchinski (1999), one can think of the open strings giving rise to type I gauge symmetry as living on 32 Dirichlet 9-branes (or D9-branes) and an orientifold 9-plane. T-duality converts Dirichlet and orientifold $p$-branes to $(p - 1)$-branes; thus this relation follows by applying two T-dualities.

These compactifications can also be parametrized by elliptically fibered Calabi–Yaus, where $K$ is the base, and the branes correspond to singularities of the fibration. The relation between these two definitions follows fairly simply from the duality between M-theory on $T^2$, and IIb string on $S^1$. There is a partially understood generalization of this to $d = 9$.

Finally, these constructions admit further discrete choices, which break some of the gauge symmetry. The simplest to explain is in the toroidal compactification of I/HE/HO. The moduli space of theories we discussed uses flat connections on the torus which are continuously connected to the trivial connection, but in general the moduli space of flat connections has other components. The simplest example is the moduli space of flat $E_8 \times E_8$ connections on $S^1$, which has a second component in which the holonomy exchanges the two $E_8$'s. On $T^3$, there are connections for which the holonomies cannot be simultaneously diagonalized. This structure and the M-theory dual of these choices are discussed in (de Boer et al. 2001).

$N_s = 8$, $d < 6$

Again, the gravity multiplet is uniquely determined, so the most basic classification is by the gauge group $G$. The full low-energy EFT is determined by the matter content and action, and there are two types of matter multiplets. First, vector multiplets contain the Yang–Mills fields, fermions and $6 - d$ scalars; their action is determined by a prepotential which is a $G$-invariant function of the fields. Since the vector multiplets contain massless adjoint scalars, a generic vacuum in which these take nonzero distinct vacuum expectation values (VEVs) will have $U(1)^r$ gauge symmetry, the commutant of $G$ with a generic matrix (for $d < 5$, while there are several real scalars, the potential forces these to commute in a supersymmetric vacuum). Vacua with this type of gauge symmetry breaking, which does not reduce the rank of the gauge group, are usually referred to as on a Coulomb branch of the moduli space. To summarize, this sector can be specified by $n_V$, the number of vector multiplets, and the prepotential $\mathcal{F}$, a function of the $n_V$ VEVs which is cubic in $d = 5$, and holomorphic in $d = 4$.

Hypermultiplets contain scalars which parametrize a quaternionic Kähler manifold, and partner fermions. Thus, this sector is specified by a $4 n_H$ real dimensional quaternionic Kähler manifold. The $G$ action comes with triholomorphic moment maps; if nontrivial, VEVs in this sector can break gauge symmetry and reduce it in rank. Such vacua are usually referred to as on a Higgs branch.

The basic example of these compactifications is M-theory on a Calabi–Yau 3-fold (CY$_3$). Reduction of the 3-form leads to $h^{1,1}(K)$ vector multiplets, whose scalar components are the CY Kähler moduli. The CY complex structure moduli pair with periods of the 3-form to produce $h^{2,1}(K)$ hypermultiplets. Enhanced gauge symmetry then appears when the
CY$_3$ contains ADE singularities fibered over a curve, from the same mechanism involving wrapped M2 branes we discussed under $N_s = 16$. If degenerating curves lead to other singularities (e.g., the ODP or conifold), it is possible to obtain extremal transitions which translate physically into Coulomb–Higgs transitions. Finally, singularities in which surfaces degenerate lead to nontrivial fixed-point theories.

Reduction on $S^1$ leads to IIa on CY$_3$, with the spectrum above plus a universal hypermultiplet which includes the dilaton. Perhaps the most interesting new feature is the presence of world-sheet instantons, which correct the metric on vector multiplet moduli space. This metric satisfies the restrictions of special geometry and thus can be derived from a prepotential.

The same theory can be obtained by compactification of IIb theory on the mirror CY$_3$. Now vector multiplets are related to the complex structure moduli space, while hypermultiplets are related to Kähler moduli space. In this case, the prepotential derived from variation of complex structure receives no instanton corrections, as we discuss in the next section.

Finally, one can compactify the heterotic string on $K3 \times T^{6-d}$, but this theory follows from toroidal reduction of the $d = 6$ case we discuss next.

$N_s = 8$, $d = 6$

These supergravities are similar to $d < 6$, but there is a new type of matter multiplet, the self-dual tensor (in $d < 6$ this is dual to a vector multiplet). Since fermions in $d = 6$ are chiral, there is an anomaly cancellation condition relating the numbers of the three types of multiplets (Aspinwall 1996, section 6.6),

\[ n_H - n_V + 29\, n_T = 273 \tag{9} \]

One class of examples is the heterotic string compactified on K3. In the original perturbative constructions, to satisfy eqn [7], we need to choose a vector bundle with $c_2(V) = \chi(K3) = 24$. The resulting degrees of freedom are a single self-dual tensor multiplet and a rank-16 gauge group. More generally, one can introduce $N_{5B}$ heterotic 5-branes, which generalize eqn [7] to $c_2(E) + N_{5B} = c_2(TK)$. Since this brane carries a self-dual tensor multiplet, this series of models is parametrized by $n_T$. They are connected by transitions in which an $E_8$ instanton shrinks to zero size and becomes a 5-brane; the resulting decrease in the dimension of the moduli space of $E_8$ bundles on K3 agrees with eqn [9].

M-theory on an elliptically fibered CY$_3$ in the same general way we discussed under $N_s = 16$. The relation between F-theory and the heterotic string on K3 can be seen by lifting M-theory–heterotic duality; this suggests that the two constructions are dual only if the CY$_3$ is a K3 fibration as well. Since not all elliptically fibered CY$_3$'s are K3 fibered, the F-theory construction is more general.

We return to $d = 4$ and $N_s = 4$ in the final section. The cases of $N_s < 4$ which exist in $d \le 3$ are far less studied.

Stringy and Quantum Corrections

The $D$-dimensional low-energy effective supergravity actions on which we based our discussion so far are only approximations to the general story of string/M-theory compactification. However, if Planck's constant is small, $K$ is sufficiently large, and its curvature is small, then they are controlled approximations.

In M-theory, as in any theory of quantum gravity, corrections are controlled by the Planck scale parameter $M_P^{D-2}$, which sits in front of the Einstein term of the $D$-dimensional effective Lagrangian, and plays the role of $\hbar$. In general, this is different from the four-dimensional Planck scale, which satisfies $M_{P\,4}^2 = \operatorname{Vol}(K)\, M_P^{D-2}$. After taking the low-energy limit $E \ll M_P$, the remaining corrections are controlled by the dimensionless parameters $l_P/R$, where $R$ can be any characteristic length scale of the solution: a curvature radius, the length of a nontrivial cycle, and so on.

In string theory, one usually thinks of the corrections as a double series expansion in $g_s$, the dimensionless (closed) string coupling constant, and $\alpha'$, the inverse string tension parameter, of dimensions (length)$^2$. The ten-dimensional Planck scale is related to these parameters as $M_P^8 = 1/(g_s^2 (\alpha')^4)$, up to a constant factor that depends on conventions. Besides perturbative corrections, which have power-like dependence on these parameters, there can be world sheet and brane instanton corrections. For example, a string world sheet can wrap around a topologically nontrivial spacelike 2-cycle $\Sigma$ in $K$, leading to an instanton correction to the effective action which is suppressed as $\exp(-\operatorname{Vol}(\Sigma)/2\pi\alpha')$. More generally, any $p$-brane wrapping a $p$-cycle can produce a similar effect. As for which terms in the effective Lagrangian receive corrections, this depends largely on the number and symmetries of the fermion zero modes on the instanton world volumes.
Another class of examples is F-theory on an Let us start by discussing some cases in which one
elliptically fibered CY3 . These are related to can argue that these corrections are not present.
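As a rough numerical illustration of the two relations quoted in this section, the following minimal Python sketch evaluates the world-sheet instanton suppression factor $\exp(-\mathrm{Vol}(\Sigma)/2\pi\alpha')$ and the ten-dimensional Planck mass from $M_P^8 = 1/g_s^2(\alpha')^4$. The function names, the choice of units with $\alpha' = 1$, and the sample cycle volumes are assumptions made for illustration only; the convention-dependent constant in the Planck-scale relation is simply set to one.

```python
import math


def worldsheet_instanton_factor(cycle_volume, alpha_prime=1.0):
    """Suppression exp(-Vol(Sigma) / (2*pi*alpha')) for a world sheet wrapping a 2-cycle."""
    return math.exp(-cycle_volume / (2.0 * math.pi * alpha_prime))


def planck_mass_10d(g_s, alpha_prime=1.0):
    """Ten-dimensional Planck mass from M_P^8 = 1 / (g_s^2 * alpha'^4).

    The convention-dependent constant factor is set to 1 (an assumption).
    """
    return (1.0 / (g_s ** 2 * alpha_prime ** 4)) ** (1.0 / 8.0)


if __name__ == "__main__":
    # Hypothetical cycle volumes in units where alpha' = 1.
    for volume in (1.0, 10.0, 50.0, 100.0):
        print(volume, worldsheet_instanton_factor(volume))
    # Weaker string coupling raises the Planck mass in string units.
    for g_s in (1.0, 0.1, 0.01):
        print(g_s, planck_mass_10d(g_s))
```

The exponential dependence on the cycle volume is why such corrections are negligible when $K$ is large in string units, in line with the "controlled approximation" regime described above.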
Compactification of Superstring Theory 593
while the chiral matter consists of metric moduli of $K$, and fields corresponding to a basis for the Dolbeault cohomology group $H^{0,1}(K, \mathrm{Rep}\,E)$ where $\mathrm{Rep}\,E$ is the bundle $E$ embedded into an $E_8$ bundle and decomposed into $G$-reps.
There is a general (though somewhat formal) expression for the superpotential,
$$W = \int \Omega \wedge \mathrm{tr}\left(A\,\bar\partial A + \tfrac{2}{3}A^3\right) + \int \Omega \wedge H^{(3)} + W_{NP}$$ [10]
The first term is the holomorphic Chern–Simons action, whose variation enforces the $F^{0,2} = 0$ condition. The second is the flux superpotential, while the third term is the nonperturbative corrections. The best understood of these arise from supersymmetric gauge theory sectors. In some, but not all, cases, these can be understood as arising from gauge theoretic instantons, which can be shown to be dual to heterotic 5-branes wrapped on $K$. Heterotic world-sheet instantons can also contribute.
The HO theory is S-dual to the type I string, with the same gauge group, realized by open strings on Dirichlet 9-branes. This construction involves essentially the same data. The two classes of heterotic instantons are dual to D1- and D5-brane instantons, whose world-sheet theories are somewhat simpler.
If the CY$_3$ $K$ has a fibration by tori, by applying T-duality to the fibers along the lines discussed for tori under $N_s = 16$ above, one obtains various type II orientifold compactifications. On an elliptic fibration, double T-duality produces a IIb compactification with D7s and O7s. Using the relation between IIb theory on $T^2$ and F-theory on K3 fiberwise, one can also think of this as an F-theory compactification on a K3-fibered CY$_4$. More generally, one can compactify F theory on any elliptically fibered 4-fold to obtain $N = 1$. These theories have D3-instantons, the T-duals of both the type I D1- and D5-brane instantons.
The theory of mirror symmetry predicts that all CY$_3$s have $T^3$ fibration structures. Applying the corresponding triple T-duality, one obtains a IIa compactification on the mirror CY$_3$ $\tilde K$, with D6-branes and O6-planes. Supersymmetry requires these to wrap special Lagrangian cycles in $\tilde K$. As in all Dirichlet brane constructions, enhanced gauge symmetry arises from coincident branes wrapping the same cycle, and only the classical groups are visible in perturbation theory. Exceptional gauge symmetry arises as a strong coupling phenomenon of the sort described in the previous section. The superpotential can also be thought of as mirror to eqn [10], but now the first term is the sum of a real Chern–Simons action on the special Lagrangian cycles, with disk world-sheet instanton corrections, as studied in open string mirror symmetry. The gauge theory instantons are now D2-branes.
Using the duality relation between the IIa string and 11-dimensional M-theory, this construction can be lifted to a compactification of M-theory on a seven-dimensional manifold $L$, which is an $S^1$ fibration over $K$. The D6 and O6 planes arise from singularities in the $S^1$ fibration. Generically, $L$ can be smooth, and the only candidate in Table 1 for such an $N = 1$ compactification is a manifold with $G_2$ holonomy; therefore, $L$ must have such holonomy. Finally, both the IIa world-sheet instantons and the D2-brane instantons lift to membrane instantons in M-theory.
This construction implicitly demonstrates the existence of a large number of $G_2$ holonomy manifolds. Another way to arrive at these is to go back to the heterotic string on $K$, and apply the duality (again under $N_s = 16$) between heterotic on $T^3$ and M-theory on K3 to the $T^3$ fibration structure on $K$, to arrive at M-theory on a K3-fibered manifold of $G_2$ holonomy. Wrapping membranes on 2-cycles in these fibers, we can see enhanced gauge symmetry in this picture fairly directly. It is an illuminating exercise to work through its dual realizations in all of these constructions.
Our final construction uses the interpretation of the strong coupling limit of the HE theory as M-theory on a one-dimensional interval $I$, in which the two $E_8$ factors live on the two boundaries. Thus, our original starting point can also be interpreted as the heterotic string on $K \times I$. This construction is believed to be important physically as it allows generalizing a heterotic string tree-level relation between the gauge and gravitational couplings which is phenomenologically disfavored. One can relate it to a IIa orientifold as well, now with D8- and O8-branes.
These multiple relations are often referred to as the web of dualities. They lead to numerous relations between compactification manifolds, moduli spaces, superpotentials, and other properties of the EFTs, whose full power has only begun to be appreciated.
Suggestions for further reading
Original references for all but the most recent of these topics can be found in the following textbooks and proceedings. We have also referenced a few research articles which are good starting points for the more recent literature. There are far more reviews than we could reference here, and a partial listing of these appears at http://www.slac.stanford.edu/spires/reviews/
See also: Brane Construction of Gauge Theories; Random Algebraic Geometry, Attractors and Flux Vacua;
Compressible Flows: Mathematical Theory 595
String Theory: Phenomenology; Superstring Theories; Two-Dimensional Conformal Field Theory and Vertex Operator Algebras; Viscous Incompressible Fluids: Mathematical Theory.
Further Reading
Aharony O (2000) A brief review of little string theories. Classical and Quantum Gravity 17: 929–938.
Aspinwall PS (1996) K3 surfaces and string duality, 1996 preprint, arXiv:hep-th/9611137.
Bachas C et al. (eds.) (2002) Les Houches 2001: Unity from Duality: Gravity, Gauge Theory and Strings. Berlin: Springer.
de Boer J et al. (2002) Triples, fluxes, and strings. Advances in Theoretical and Mathematical Physics 4: 995.
Connes A and Gawedzki K (eds.) (1998) Les Houches 1995: Quantum Symmetries. Amsterdam: North-Holland.
Deligne P et al. (eds.) (1999) Quantum Fields and Strings: A Course for Mathematicians. Providence, RI: American Mathematical Society.
Douglas M et al. (eds.) (2004) Strings and Geometry: Proceedings of the 2002 Clay School. Providence, RI: American Mathematical Society.
Gauntlett J (2004) Branes, calibrations and supergravity. In: Douglas M et al. (eds.) Strings and Geometry, pp. 79–126. Providence, RI: American Mathematical Society.
Green MB, Schwarz JH, and Witten E (1987) Superstring Theory, 2 vols. Cambridge: Cambridge University Press.
Li J and Yau S-T (2004) The existence of supersymmetric string theory with torsion, 2004 preprint, arXiv:hep-th/0411136.
Polchinski J (1998) String Theory, 2 vols. Cambridge: Cambridge University Press.
Chen (2005), Dafermos (2005), Feireisl (2004), Lions (1986, 1988) or Liu (2000).
Inviscid Compressible Fluid Flows: Euler Equations
Solutions to the Euler equations [1]–[3] are generically discontinuous functions obeying the Clausius–Duhem inequality, the second law of thermodynamics:
$$\partial_t(\rho S) + \nabla_x\cdot(mS) \ge 0$$ [9]
in the sense of distributions. Such discontinuous solutions are called entropy solutions.
When a flow is isentropic, that is, entropy $S$ is a uniform constant $S_0$ in the flow, then the Euler equations for the flow take the simpler form:
The system above can be rewritten in Lagrangian coordinates:
$$\partial_t\tau - \partial_x v = 0, \qquad \partial_t v + \partial_x p = 0, \qquad \partial_t\left(e + v^2/2\right) + \partial_x(pv) = 0$$ [13]
with $v = m/\rho$, where the coordinates $(t, x)$ are the Lagrangian coordinates, which are different from the Eulerian coordinates for [12]; for simplicity of notation, we do not distinguish them. For the barotropic case, systems [12] and [13] reduce to
$$\partial_t\rho + \partial_x m = 0, \qquad \partial_t m + \partial_x\left(m^2/\rho + p\right) = 0$$ [14]
and
$$\partial_t\tau - \partial_x v = 0, \qquad \partial_t v + \partial_x p = 0$$ [15]
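The barotropic system [14] is the one for which the convergence of the Lax–Friedrichs scheme is cited later in the article. As a minimal sketch of what that scheme looks like, the Python code below advances [14] on a periodic grid. Several ingredients are assumptions made here for illustration and are not taken from the article: the pressure law $p = \kappa\rho^\gamma$ with $\kappa = 1$, $\gamma = 1.4$, the periodic boundary conditions, the smooth initial density, and all function names.

```python
import math


def flux(rho, m, kappa, gamma):
    """Flux of system [14]: (m, m^2/rho + p), with the assumed law p = kappa * rho**gamma."""
    return m, m * m / rho + kappa * rho ** gamma


def max_wave_speed(rho, m, kappa, gamma):
    """Largest |v| + c on the grid, where c = sqrt(p'(rho)) is the sound speed."""
    return max(abs(mi / ri) + math.sqrt(kappa * gamma * ri ** (gamma - 1.0))
               for ri, mi in zip(rho, m))


def lax_friedrichs_step(rho, m, dx, dt, kappa, gamma):
    """One Lax-Friedrichs step with periodic boundary conditions."""
    n = len(rho)
    lam = dt / dx
    new_rho, new_m = [0.0] * n, [0.0] * n
    for i in range(n):
        left, right = (i - 1) % n, (i + 1) % n
        f_left = flux(rho[left], m[left], kappa, gamma)
        f_right = flux(rho[right], m[right], kappa, gamma)
        # Average of the neighbours minus a centred flux difference.
        new_rho[i] = 0.5 * (rho[left] + rho[right]) - 0.5 * lam * (f_right[0] - f_left[0])
        new_m[i] = 0.5 * (m[left] + m[right]) - 0.5 * lam * (f_right[1] - f_left[1])
    return new_rho, new_m


def simulate(n=200, t_end=0.2, cfl=0.9, kappa=1.0, gamma=1.4):
    """Evolve a smooth periodic density bump at rest; returns (rho0, rho, m)."""
    dx = 1.0 / n
    rho0 = [1.0 + 0.5 * math.sin(2.0 * math.pi * (i + 0.5) * dx) for i in range(n)]
    rho, m = list(rho0), [0.0] * n
    t = 0.0
    while t_end - t > 1e-12:
        # Time step restricted by the CFL condition.
        dt = min(cfl * dx / max_wave_speed(rho, m, kappa, gamma), t_end - t)
        rho, m = lax_friedrichs_step(rho, m, dx, dt, kappa, gamma)
        t += dt
    return rho0, rho, m


if __name__ == "__main__":
    rho0, rho, m = simulate()
    print("mass drift:", abs(sum(rho) - sum(rho0)))
    print("min density:", min(rho))
```

Because the update is written in conservation form, total mass and momentum are preserved up to rounding on a periodic grid, and under the CFL restriction the density stays positive; this is a first-order scheme, so sharp features are smeared.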
Then, system [18] can be rewritten as the following time-dependent potential flow equation of second order:
by ignoring shock waves.
Local Well-Posedness for Classical Solutions
Consider the Cauchy problem for the Euler equations [1]–[3] with Cauchy data [8]:
Assume that $u_0: \mathbb{R}^d \to D$ is in $H^s \cap L^\infty$ with $s > d/2 + 1$. Then, for the Cauchy problem [1]–[3] and [8], there exists a finite time $T = T(\|u_0\|_s, \|u_0\|_{L^\infty}) \in (0, \infty)$ such that there is a unique, stable bounded classical solution $u \in C^1([0, T] \times \mathbb{R}^d)$ with $u(t, x) \in D$ for $(t, x) \in [0, T] \times \mathbb{R}^d$ and $u \in C([0, T]; H^s) \cap C^1([0, T]; H^{s-1})$. Moreover, the interval $[0, T)$ with $T < \infty$ is the maximal interval of the classical $H^s$ existence for [1]–[3] if and only if either $\|(u_t, \nabla_x u)\|_{L^\infty} \to \infty$ or $u(t, x)$ escapes every compact subset $K \Subset D$ as $t \to T$.
This local existence can be established by relying solely on the elementary linear existence theory for symmetric hyperbolic systems with smooth coefficients (cf. Majda (1984)), or by the abstract semigroup theory (Kato 1975).
Formation of Singularities
For the one-dimensional case, singularities include the development of shock waves and formation of vacuum states. For the multidimensional case, the situation is much more complicated: besides shock waves and vacuum states, singularities can also be generated from vortex sheets, focusing and breaking of waves, among others.
Consider the Cauchy problem of the Euler equations [1]–[3] in $\mathbb{R}^3$ for polytropic gases with smooth initial data:
which, roughly speaking, measure the entropy and the radial component of momentum. Then, if $(\rho, v, S)(t, x)$ is a $C^1$ solution of [1]–[3] and [21] for $0 < t < T$, and
$$P(0) \ge 0, \qquad F(0) > \alpha\,\bar\sigma R^4 \max_x \rho_0(x) \quad \text{with } \alpha = 16\pi/3$$ [23]
then the lifespan $T$ of the $C^1$ solution is finite (Sideris 1985).
To illustrate a way in which the conditions in [23] may be satisfied, consider the initial data: $\rho_0 = \bar\rho$, $S_0 = \bar S$. Then $P(0) = 0$, and [23] holds if
$$\int_{|x|<R} v_0(x)\cdot x\,dx > \alpha\,\bar\sigma R^4$$
Comparing both sides, one finds that the initial velocity must be supersonic in some region relative to the sound speed at infinity. The formation of a singularity (presumably a shock wave) is detected as the disturbance overtakes the wave front, forcing the front to propagate with supersonic speed.
Singularities are formed even without the condition of largeness, such as [23], being satisfied. For example, if $S_0(x) \ge \bar S$ and, for some $0 < R_0 < R$,
$$\int_{|x|>r} |x|^{-1}(|x| - r)^2\,(\rho_0(x) - \bar\rho)\,dx > 0, \qquad \int_{|x|>r} |x|^{-3}(|x|^2 - r^2)\,\rho_0(x)\,v_0(x)\cdot x\,dx \ge 0$$ [24]
for $R_0 < r < R$, then the lifespan $T$ of the $C^1$ solution of [1]–[3] and [21] is finite. The
$\nu(\xi) = (\nu_1, \ldots, \nu_d)(\xi)$ be a unit normal to $S_0$. Define the piecewise smooth initial values for respective domains $D_0^+$ and $D_0^-$ on either side of the hypersurface $S_0$ as
$$u_0(x) = \begin{cases} u_0^+(x), & x \in D_0^+ \\ u_0^-(x), & x \in D_0^- \end{cases}$$ [25]
It is assumed that the initial jump in [25] satisfies the Rankine–Hugoniot condition, that is, there is a smooth scalar function $\sigma(\xi)$ so that
$$-\sigma(\xi)\left(u_0^+(\xi) - u_0^-(\xi)\right) + \nu(\xi)\cdot\left(f(u_0^+(\xi)) - f(u_0^-(\xi))\right) = 0$$ [26]
and that $\sigma(\xi)$ does not define a characteristic direction, that is,
$$\sigma(\xi) \ne \lambda_i(u_0^\pm), \qquad \xi \in S_0, \quad 1 \le i \le n$$ [27]
where $\lambda_i$, $i = 1, \ldots, n$, are the eigenvalues of [4]. It is natural to require that $S(0) = S_0$.
Consider the Euler equations [1]–[3] in $\mathbb{R}^3$ for polytropic gases with piecewise smooth initial data:
$$(\rho, v, E)|_{t=0} = \begin{cases} (\rho_0^+, v_0^+, E_0^+)(x), & x \in D_0^+ \\ (\rho_0^-, v_0^-, E_0^-)(x), & x \in D_0^- \end{cases}$$ [28]
Assume that $S_0$ is a smooth compact surface in $\mathbb{R}^3$ and that $(\rho_0^\pm, v_0^\pm, E_0^\pm)(x)$ belongs to the uniform local
satisfied, the compatibility conditions are automatically guaranteed for a wide class of initial data. The idea of the proof is to use the existence of a strictly convex entropy and the symmetrization of [4]; the shock-front solutions are defined as the limit of a convergent classical iteration scheme based on a linearization by using the theory of linearized stability for shock fronts (Majda 1984). The uniform existence time of shock-front solutions in shock strength can be achieved (Metivier 1990).
Global Theory in $L^\infty$ for the Isentropic Euler Equations for $x \in \mathbb{R}$
Consider the Cauchy problem for [14] with initial data:
$$(\rho, m)|_{t=0} = (\rho_0, m_0)(x)$$ [31]
where $\rho_0$ and $m_0$ are in the physical region $\{(\rho, m) : \rho \ge 0, |m| \le C_0\rho\}$ for some $C_0 > 0$. System [14] is strictly hyperbolic at the states with $\rho > 0$, and strict hyperbolicity fails at the vacuum states $V := \{(\rho, m/\rho) : \rho = 0, |m/\rho| < \infty\}$. Then, we have:
1. There exists a global solution $(\rho, m)(t, x)$ of the Cauchy problem [14] and [31] satisfying
$$0 \le \rho(t, x) \le C, \qquad |m(t, x)| \le C\rho(t, x)$$ [32]
for some $C > 0$ depending only on $C_0$ and $\gamma$, and the entropy inequality
$$\partial_t\eta(\rho, m) + \partial_x q(\rho, m) \le 0$$ [33]
in the sense of distributions for any convex weak entropy–entropy flux pair $(\eta, q)$, that is,
$$\nabla q(\rho, m) = \nabla\eta(\rho, m)\,\nabla f(\rho, m)$$
with
$$\nabla^2\eta(\rho, m) \ge 0 \quad \text{and} \quad \eta|_V = 0$$
2. The solution operator $(\rho, m)(t, \cdot) = S_t(\rho_0, m_0)(\cdot)$, determined by (1), is compact in $L^1_{loc}(\mathbb{R})$ for $t > 0$;
3. Furthermore, if $(\rho_0, m_0)(x)$ is periodic with period $P$, then there exists a global periodic solution $(\rho, m)(t, x)$ with [32] such that $(\rho, m)(t, x)$ asymptotically decays to
$$\frac{1}{|P|}\int_P (\rho_0, m_0)(x)\,dx$$
in $L^1$.
The convergence of the Lax–Friedrichs scheme, the Godunov scheme, and the vanishing viscosity method for system [14] has also been established.
The results are based on a compensated compactness framework to replace the BV compactness framework. For a gas obeying the $\gamma$-law, the case $\gamma = (N + 2)/N$, $N \ge 5$ odd, was first studied by DiPerna (1983), and the case $1 < \gamma \le 5/3$ for usual gases was first solved by Chen (1986) and Ding-Chen-Luo (1985). The cases $\gamma \ge 3$ and $5/3 < \gamma < 3$ were treated by Lions–Perthame–Tadmor (1994) and Lions–Perthame–Souganidis (1996), respectively. The case of general pressure laws was solved by Chen–LeFloch (2000, 2003). All the results for entropy solutions to [14] in Eulerian coordinates can equivalently be presented as the corresponding results for entropy solutions to [15] in Lagrangian coordinates. The isothermal case $\gamma = 1$ was treated by Huang–Wang (2002).
Global Theory in BV for the Adiabatic Euler Equations for $x \in \mathbb{R}$
Consider the Euler equations [13] for polytropic gases with the Cauchy data:
$$(\tau, v, S)|_{t=0} = (\tau_0, v_0, S_0)(x)$$ [34]
Then we have (Liu 1977, Temple 1981, Chen and Wagner 2003):
Let $K \subset \{(\tau, v, S) : \tau > 0\}$ be a compact set in $\mathbb{R}_+ \times \mathbb{R}^2$, and let $N \ge 1$ be any constant. Then there exists a constant $C_0 = C_0(K, N)$, independent of $\gamma \in (1, 5/3]$, such that, for every initial data $(\tau_0, v_0, S_0) \in K$ with $TV_{\mathbb{R}}(\tau_0, v_0, S_0) \le N$, when $(\gamma - 1)N \le C_0$, the Cauchy problem [13] and [34] has a global entropy solution $(\tau, v, S)(t, x)$ which is bounded and satisfies
$$TV_{\mathbb{R}}(\tau, v, S)(t, \cdot) \le C\,TV_{\mathbb{R}}(\tau_0, v_0, S_0)$$
for some constant $C > 0$ independent of $\gamma$.
In particular, this result includes that for the barotropic case (Nishida 1968, Nishida–Smoller 1973, DiPerna 1973). Some efforts in the direction of relaxing the requirement of small total variation have been made. Some extensions to the initial-boundary value problems have also been made. In addition, an entropy solution in BV with periodic data or compact support decays when $t \to \infty$.
Furthermore, even for a general hyperbolic system [4] for $x \in \mathbb{R}$, we have:
If the initial data functions $u_0(x)$ and $v_0(x)$ have sufficiently small total variation and $u_0 - v_0 \in L^1(\mathbb{R})$, then, for the corresponding exact Glimm, or wave-front tracking, or vanishing viscosity solutions $u(t, x)$ and $v(t, x)$ of the Cauchy problem [4] and [8], there exists a constant $C > 0$ such that
$$\|u(t, \cdot) - v(t, \cdot)\|_{L^1(\mathbb{R})} \le C\,\|u_0 - v_0\|_{L^1(\mathbb{R})} \quad \text{for all } t > 0$$ [35]
An immediate consequence is that the whole sequence of the approximate solutions constructed by the Glimm (1965) scheme, as well as the wave-front tracking method and the vanishing viscosity method, converges to a unique entropy solution of [4] and [8] when the mesh size or the viscosity coefficient tends to zero. More detailed discussions and extensive references about the $L^1$-stability of BV entropy solutions and related topics can be found in Bressan (2000) and Dafermos (2000); also see Chen and Wang (2002). Furthermore, the Riemann solution is unique and asymptotically stable in the class of entropy solutions to [13] with large variation satisfying only one physical entropy inequality (Chen-Frid-Li 2002).
Multidimensional Steady Theory
The mathematical study of two-dimensional steady supersonic flows past wedges, whose vertex angles are less than the critical angle, dates back to the 1940s, since the stability of such flows is fundamental in applications (cf. Courant–Friedrichs (1948)). Local solutions around the wedge vertex were first constructed (Gu 1962, Schaeffer 1976, Li 1980).
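The "critical angle" for a wedge is not written out in this section. For a straight wedge in a uniform supersonic stream of a polytropic gas, the classical oblique-shock relation between the shock angle $\beta$, the flow deflection $\theta$ and the upstream Mach number $M$ gives a maximal deflection beyond which no attached shock exists. The sketch below, in Python, estimates that maximal deflection by a simple scan; it uses the standard textbook $\theta$-$\beta$-$M$ relation for the full Euler equations, which is brought in here as an assumption rather than taken from the article, and the function names are illustrative.

```python
import math


def deflection_angle(beta, mach, gamma=1.4):
    """Flow deflection theta (radians) behind an oblique shock of angle beta.

    Uses tan(theta) = 2 cot(beta) (M^2 sin^2(beta) - 1) / (M^2 (gamma + cos(2 beta)) + 2).
    """
    numerator = 2.0 * (mach ** 2 * math.sin(beta) ** 2 - 1.0) / math.tan(beta)
    denominator = mach ** 2 * (gamma + math.cos(2.0 * beta)) + 2.0
    return math.atan(numerator / denominator)


def critical_wedge_angle(mach, gamma=1.4, samples=20000):
    """Largest deflection (radians) for which an attached oblique shock exists."""
    mach_angle = math.asin(1.0 / mach)  # weakest shock: zero deflection
    best = 0.0
    for k in range(1, samples):
        beta = mach_angle + (0.5 * math.pi - mach_angle) * k / samples
        best = max(best, deflection_angle(beta, mach, gamma))
    return best


if __name__ == "__main__":
    for mach in (1.5, 2.0, 3.0):
        print(mach, math.degrees(critical_wedge_angle(mach)))
```

The deflection vanishes at both ends of the scan (Mach wave and normal shock), so the maximum is attained at an interior shock angle; in this model, wedge half-angles below that maximum admit an attached shock. The potential flow model has a slightly different shock polar, so the numbers here should be read as indicative only.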
Such global potential solutions were constructed when the wedge has some convexity, or is a small perturbation of the straight wedge with fast decay in the flow direction (Chen 2001, Chen-Xin-Yin 2002), or is piecewise smooth and a small perturbation of the straight wedge (Zhang 2003). For the two-dimensional steady supersonic flows governed by the full Euler equations past Lipschitz wedges, it has been shown (Chen-Zhang-Zhu 2005a) that, when the wedge vertex angle is less than the critical angle, the strong shock front emanating from the wedge vertex is nonlinearly stable in structure globally, although there may be many weak shocks and vortex sheets between the wedge boundary and the strong shock front, under the BV perturbation of the wedge so that the total variation of the tangent function along the wedge boundary is suitably small. This asserts that any supersonic shock for the wedge problem is nonlinearly stable.
A self-similar gas flow past an infinite cone in $\mathbb{R}^3$ with small vertex angle is also nonlinearly stable upon the BV perturbation of the obstacle (Lien-Liu 1999). The nonlinear stability is still open when the infinite cone in $\mathbb{R}^3$ has arbitrary vertex angle.
The stability issues of supersonic vortex sheets have been studied by classical linearized stability analysis, large-scale numerical simulations, and asymptotic analysis. In particular, the nonlinear development of instabilities of supersonic vortex sheets at high Mach number was predicted as time evolves (Woodward 1985, Artola-Majda 1989). In contrast with the prediction of evolution instability, steady supersonic vortex sheets, as time-asymptotics, are stable globally in structure, even under the BV perturbation of the Lipschitz walls, although there may be many weak shocks and supersonic vortex sheets away from the strong vortex sheet (Chen-Zhang-Zhu 2005b).
Transonic shock problems for steady fluid flows are important in applications (cf. Courant and Friedrichs (1948)). A program on the existence and stability of multidimensional transonic shocks has been initiated and three new analytical approaches have been developed (Chen-Feldman 2003, 2004). The transonic problems include the existence and stability of transonic shocks in the whole $\mathbb{R}^d$, the existence and stability of transonic flows past finite or infinite nozzles, the stability of transonic flows past infinite nonsmooth wedges, and the existence of regular shock reflection solutions. The first approach is an iteration scheme based on the nondegeneracy of the free boundary condition: the jump of the normal derivative of a solution across the free boundary has a strictly positive lower bound (Chen-Feldman 2003, 2004), which works for the nonlinear equations whose coefficients may depend on not only the solution itself but also the gradients of the solution. The second approach is a partial hodograph procedure, with which the existence and stability of multidimensional transonic shocks that are not nearly orthogonal to the flow direction can be handled (Chen-Feldman 2004): one of the main ingredients in this approach is to employ a partial hodograph transform to reduce the free boundary problem into a conormal boundary value problem for the corresponding nonlinear equations of divergence form and then develop techniques to solve the conormal boundary value problem. When the regularity of the steady perturbation is $C^{3,\alpha}$ or higher, the third approach is to employ the implicit function theorem to deal with the existence and stability problem. Another iteration approach, which works well for the two-dimensional equations whose coefficients depend only on the solution itself, has also been developed (Canic-Keyfitz-Lieberman 2000). Further longstanding open problems include the existence of global transonic flows past an airfoil or a smooth obstacle (Morawetz 1956–58, 1985).
Multidimensional Unsteady Problems
Now we present some multidimensional time-dependent problems with a simplifying feature that the data (domain and/or the initial data) coupled with the structure of the underlying equations obey certain geometric structure so that the multidimensional problems can be reduced to lower-dimensional problems with more complicated couplings. Different types of geometric structure call for different techniques.
The Euler equations for compressible fluids with geometric structure describe many important fluid flows, including spherically symmetric flows and self-similar flows. Such geometric flows are motivated by many physical problems such as shock diffractions, supernova formation in stellar dynamics, inertial confinement fusion, and underwater explosions. For the initial data with large amplitude having geometric structure, the required physical insight is: (1) whether the solution has the same geometric structure globally and (2) whether the solution blows up to infinity in a finite time. These questions are not easily understood in physical experiments and numerical simulations, especially for the blow-up, because of the limited capacity of available instruments and computers.
The first type of geometric structure is spherical symmetry. A criterion for $L^\infty$ Cauchy data functions of arbitrarily large amplitude was observed to guarantee the existence of spherically symmetric solutions in $L^\infty$ in the large for the isentropic flows, which model outgoing blast waves and large-time asymptotic solutions (Chen 1997). On the other hand, it is evident that the density blows up as $|x| \to 0$ in general, especially for the focusing case; the singularity at the origin makes the problem truly multidimensional due to the reflection of waves from infinity and their strengthening as they move radially inwards. One of the important open questions is to understand the order of singularity, $\rho(t, |x|) \sim |x|^{-\alpha}$, at the origin for bounded Cauchy data.
The second type of geometric structure is self-similarity, that is, the solutions with initial data functions that give rise to self-similar solutions, especially including Riemann solutions. Compressible flow equations in $\mathbb{R}^d$, $d \ge 2$, with one or more linearly degenerate modes of wave propagation have additional difficulties. In that case, the global flow is governed by a reduced (self-similar) system which is of composite (hyperbolic–elliptic) type in the subsonic region. The linearly degenerate waves give rise to one or more families of degenerate characteristics which remain real in the subsonic region. In some cases, the reduced equations couple an elliptic (degenerate elliptic) problem for the density with a hyperbolic (transport) equation for the vorticity.
An important prototype for both practical applications and the theory of multidimensional complex wave patterns is the problem of diffraction of a shock wave which is incident along an inclined ramp (see Glimm and Majda (1991)). When a plane shock hits a wedge head-on, a self-similar reflected shock moves outward as the original shock moves forward. The computational and asymptotic analysis shows that various patterns of reflected shocks may occur, including regular reflection and (simple, double, and complex) Mach reflections. The main part or the whole of the reflected shock is a transonic shock in the self-similar coordinates, for which the corresponding equation changes type from hyperbolic to elliptic across the shock. There are few rigorous mathematical results on the global existence and stability of shock reflection solutions and the transition among regular, simple Mach, double Mach, and complex Mach reflections for the potential flow equation [19] and the full Euler equations [1]–[3]. Some results were recently obtained for simplified models including the transonic small-disturbance equation near the reflection point and the pressure gradient equation when the wedge is close to a flat wall.
For the potential flow equation [19], a self-similar solution is a solution of the form: $\Phi = t\,\psi(y)$, $y = x/t$. Letting $\varphi(y) = -|y|^2/2 + \psi(y)$, then the system can be rewritten in the form of a second-order equation of mixed hyperbolic–elliptic type in $y \in \mathbb{R}^d$ by scaling:
$$\nabla_y\cdot\left(\rho(|\nabla_y\varphi|^2, \varphi)\,\nabla_y\varphi\right) + d\,\rho(|\nabla_y\varphi|^2, \varphi) = 0$$ [36]
with $\rho(q^2, z) = \left(1 - (\gamma - 1)(q^2 + 2z)/2\right)^{1/(\gamma - 1)}$. Equation [36] at $|\nabla_y\varphi| = q$ is hyperbolic (pseudosupersonic) if $\rho(q^2, z) + q\rho_q(q^2, z) < 0$ and elliptic (pseudosubsonic) if $\rho(q^2, z) + q\rho_q(q^2, z) > 0$. Under this framework, the nature of the shock reflection pattern has been explored for weak incident shocks (strength $b$) and small wedge angles $2\theta_w$ by a number of different scalings, a study of mixed equations, and matching asymptotics for the different scalings, where the parameter $\beta = c_1\theta_w^2/b(\gamma + 1)$ ranges from $0$ to $\infty$ and $c_1$ is the speed of sound behind the incident shock (Morawetz 1994). For $\beta > 2$, a regular reflection of both strong and weak kinds is possible as well as a Mach reflection; for $\beta < 1/2$, a Mach reflection occurs and the flow behind the reflection is subsonic and can be constructed in principle (with an elliptic problem) and matched; and for $1/2 < \beta < 2$, the flow behind a Mach reflection may be transonic, which is a solution of a nonlinear boundary-value problem of mixed type. The basic pattern of reflection has been shown to be an almost semicircular shock issuing, for a regular reflection, from the reflection point on the wedge and, for a Mach reflection, matched with a local interaction flow. Some related observations were also made (Keller-Blank 1951, Hunter-Keller 1984, Hunter 1988). It is important to establish rigorous proofs. Recently, a rigorous existence proof was established for global solutions to shock reflection by large-angle wedges in Chen and Feldman (2005).
Analytical Frameworks for Entropy Solutions
The recent great progress for entropy solutions for one-dimensional time-dependent Euler equations and two-dimensional steady Euler equations, based on BV, $L^1$, or even $L^\infty$ estimates, naturally raises the expectation that a similar approach may also be effective for the multidimensional Euler equations, or more generally, hyperbolic systems of conservation laws, especially,
$$\|u(t, \cdot)\|_{BV} \le C\,\|u_0\|_{BV}$$ [37]
Unfortunately, this is not the case. The necessary condition for [37] to hold for $p \ne 2$ (Rauch 1986) is
$$\nabla f_k(u)\,\nabla f_l(u) = \nabla f_l(u)\,\nabla f_k(u) \quad \text{for all } k, l = 1, 2, \ldots, d$$ [38]
The analysis suggests that only systems in which the commutativity relation [38] holds offer any hope for treatment in the framework of BV. This special case includes the scalar case $n = 1$ and the case of one space dimension $d = 1$. Beyond that, it contains very few systems of physical interest.
In this regard, it is important to identify effective analytical frameworks for studying entropy solutions of the multidimensional Euler equations [1]–[3], which are not in BV. Naturally, we want to approach the questions of existence, stability, uniqueness, and long-time behavior of entropy solutions with as much generality as possible. For this purpose, a theory of divergence-measure fields to construct such a global framework has been developed for studying entropy solutions (Chen-Frid 1999, 2000, Chen-Torres 2005, Chen-Torres-Ziemer 2005). For more details, see Chen (2005).
Viscous Compressible Fluid Flows: Navier–Stokes Equations
Compressible fluid flows that are viscous and conduct heat are governed by the following Navier–Stokes equations:
$$\partial_t\rho + \nabla_x\cdot m = 0, \qquad x \in \mathbb{R}^d$$ [39]
$$\partial_t m + \nabla_x\cdot\left(\frac{m \otimes m}{\rho}\right) + \nabla_x p = \nabla_x\cdot\mathbb{S}$$ [40]
$$\partial_t(\rho E) + \nabla_x\cdot\left(\frac{m}{\rho}(\rho E + p)\right) = \nabla_x\cdot\left(\frac{m}{\rho}\,\mathbb{S}\right) - \nabla_x\cdot q$$ [41]
Here, $\mathbb{S} = \mathbb{S}(\nabla_x v, \rho, \theta)$ is the viscous stress tensor, which is symmetric from the conservation of angular momentum, and $q$ is the heat flux. If the fluid is isotropic and the viscous tensor $\mathbb{S}$ is a linear function of $\nabla_x v$ and invariant under a change of reference frame (translation and rotation), then we deduce from elementary algebraic manipulations that necessarily
$$\mathbb{S} = \lambda(\rho, \theta)(\nabla_x\cdot v)\,I + 2\mu(\rho, \theta)\,D$$ [42]
which corresponds to the Newtonian fluids, where $D = (\nabla_x v + (\nabla_x v)^\top)/2$ is the deformation tensor and $\lambda$ and $\mu$ are the Lamé viscosity coefficients.
Furthermore, since the fluid is isotropic, we are led to the Fourier law:
$$q = -k(\rho, \theta, |\nabla_x\theta|)\,\nabla_x\theta$$
for a scalar function $k$ which, in most cases, is taken to be simply a function of $\rho$ and $\theta$, or even a constant called the thermal conduction coefficient. Again, system [39]–[41] is closed by the constitutive relations in [5]. The equation for entropy $S$ is
$$\partial_t(\rho S) + \nabla_x\cdot\left(mS + \frac{q}{\theta}\right) = \frac{\mathbb{S} : \nabla_x v}{\theta} - \frac{q\cdot\nabla_x\theta}{\theta^2}$$ [43]
The second law of thermodynamics indicates that the right-hand side of [43] should be non-negative, which yields the restriction:
$$k(\rho, \theta, |\nabla_x\theta|) \ge 0, \qquad \mu \ge 0, \qquad \lambda + 2\mu/d \ge 0$$
The case $\mu > 0$ and $\lambda + 2\mu/d > 0$ is the viscous case with heat conductivity $k > 0$. In particular, the kinetic theory indicates that the Stokes relationship should hold, namely $\lambda = -2\mu/d$ and the adiabatic exponent $\gamma = 5/3$ for monatomic gases.
In mathematical viscous fluid dynamics, an important model is the barotropic model for viscous fluids, that is, $p = p(\rho)$. Then, the specific energy $E$ can be taken in the form of $E = (1/2)|v|^2 + e(\rho)$ with $e'(\rho) = p(\rho)/\rho^2$. For classical solutions, the energy of a barotropic flow satisfies the equality:
$$\partial_t(\rho E) + \nabla_x\cdot\left((\rho E + p)v\right) = \nabla_x\cdot(\mathbb{S}v) - \mathbb{S} : \nabla_x v$$
which is now a direct consequence of [39] and [40].
The question of local existence of classical solutions to [39]–[41] for regular initial data was addressed by Nash (1962), where there is no indication whether or not these solutions exist for all times.
In the case of one space dimension, the well-posedness is largely settled. The basic result for the existence of classical solutions is that of Kazhikhov (1976); see Lions (1998) and Feireisl (2004) for extensive references. The discontinuous solutions have been constructed (Shelukhin 1979, Serre 1986, Hoff 1987, Chen-Hoff-Trivisa 2000).
For the Navier–Stokes equations in $\mathbb{R}^3$ with general equation of state, the global classical solutions for the Cauchy problem and various initial-boundary value problems whose initial data is small around a constant state have been
Compressible Flows: Mathematical Theory 603
constructed (Matsumura–Nishida 1980, 1983). The approach is to obtain a priori estimates via energy methods for extending the local solution or for a difference method globally. These results have been extended to the Cauchy problem or the initial-boundary value problems with small discontinuous initial data (Hoff 1997).

For the Navier–Stokes equations in $\mathbb{R}^d$ for barotropic flows with [11] and large initial data, the global existence of solutions containing vacuum for the Cauchy problem or various initial-boundary value problems was first established by Lions (1998) for $\gamma \ge 3/2$ if $d = 2$, $\gamma \ge 9/5$ if $d = 3$, and $\gamma > d/2$ if $d \ge 4$. The gap was closed by Feireisl–Novotný–Petzeltová (2001) for the full range $\gamma > d/2$. These results have been extended to the full Navier–Stokes equations describing the motion of a general compressible, viscous, and heat conducting fluid (see Feireisl (2004)). The physically relevant isothermal case, $\gamma = 1$, is completely open even if $d = 2$. The only large data existence result is that for radially symmetric data (Hoff 1992). The general case $\gamma \ge 1$ and $d = 3$ for radially symmetric data was solved only recently (Jiang–Zhang 2001).

The lower-bound estimate on the density is a delicate issue. Weak solutions containing vacuum for the isentropic viscous flows with constant viscosity are unstable in general (Hoff–Serre 1991). Hence, it is important to see whether vacuum will never develop if the initial data is away from vacuum; this has been shown for the one-dimensional case for large initial data and for the multidimensional case with small data. On the other hand, from the kinetic theory, if solutions contain vacuum, then the viscosity coefficients in the Navier–Stokes equations should depend on the density near vacuum; this indeed stabilizes the solutions for the one-dimensional case.

The stability of viscous shock waves has been studied for the one-dimensional case (see Liu (2000) and the references therein). The compressible–incompressible limits from the isentropic compressible to the incompressible Navier–Stokes equations, when the Mach number tends to zero, have been established for arbitrary weak solutions (Lions–Masmoudi 1998) and for smooth solutions and a class of initial data functions (Hoff 1998).

The inviscid limits from the Navier–Stokes equations to the Euler equations have been established as long as the solutions of the Euler equations are smooth, when the viscosity and heat conductivity coefficients tend to zero (Klainerman–Majda 1982). The problem is completely open for general entropy solutions, even in the one-dimensional case.

See also: Breaking Water Waves; Capillary Surfaces; Fluid Mechanics: Numerical Methods; Geophysical Dynamics; Incompressible Euler Equations: Mathematical Theory; Inviscid Flows; Magnetohydrodynamics; Newtonian Fluids and Thermohydraulics; Non-Newtonian Fluids; Partial Differential Equations: Some Examples; Stability of Flows; Viscous Incompressible Fluids: Mathematical Theory.

Further Reading

Bressan A (2000) Hyperbolic Systems of Conservation Laws: The One-Dimensional Cauchy Problem. Oxford: Oxford University Press.
Chen G-Q (2005) Euler equations and related hyperbolic conservation laws. In: Dafermos CM and Feireisl E (eds.) Handbook of Differential Equations II: Evolutionary Differential Equations, Chapter 1, pp. 1–104. Amsterdam: Elsevier.
Chen G-Q and Wang D (2002) The Cauchy problem for the Euler equations for compressible fluids. In: Friedlander S and Serre D (eds.) Handbook of Mathematical Fluid Dynamics, vol. 1, ch. 5, pp. 421–543. Amsterdam: Elsevier Science B.V.
Courant R and Friedrichs KO (1948) Supersonic Flow and Shock Waves. New York: Springer.
Dafermos CM (2005) Hyperbolic Conservation Laws in Continuum Physics (2nd edn). Berlin: Springer.
Feireisl E (2004) Dynamics of Viscous Compressible Fluids. Oxford: Oxford University Press.
Glimm J (1965) Solutions in the large for nonlinear hyperbolic system of equations. Communications on Pure and Applied Mathematics 18: 95–105.
Glimm J and Majda A (1991) Multidimensional Hyperbolic Problems and Computations. New York: Springer.
Lax PD (1973) Hyperbolic Systems of Conservation Laws and the Mathematical Theory of Shock Waves. Philadelphia: SIAM.
Lions PL (1996, 1998) Mathematical Topics in Fluid Mechanics, vols. 1–2. New York: Oxford University Press.
Liu T-P (2000) Hyperbolic and Viscous Conservation Laws, CBMS-NSF RCSAM, vol. 72. Philadelphia: SIAM.
Majda A (1984) Compressible Fluid Flow and Systems of Conservation Laws in Several Space Variables. New York: Springer.
604 Computational Methods in General Relativity: The Theory
weak gravity is in the locality of Earth. However, as befits anything of Einsteinian nature, the weakness of gravity is relative, so that at the surface of a neutron star, one would find

$$\frac{R_M}{R} \sim 0.4 \qquad [3]$$

while for black holes, one has

$$\frac{R_M}{R} \sim 1 \qquad [4]$$

In such circumstances, gravity is anything but weak! Furthermore, in situations where the matter–energy distribution has a highly time-dependent quadrupole moment, such as occurs naturally with a compact-binary system (i.e., a gravitationally bound two-body system, in which each of the bodies is either a black hole or a neutron star), the dynamics of the gravitational field, including, crucially, the dynamics of the radiative components of the gravitational field, can be expected to dominate the dynamics of the overall system, matter included. For scenarios such as these, it should come as no surprise that the solution of the combined gravitohydrodynamical system begs for numerical analysis.

In addition, both from the physical and mathematical perspectives, it is also natural to study the strong-field, dynamic regimes ($R \to R_M$ and/or $v \to c$, where $v$ is the typical speed characterizing internal bulk motion of the matter) of general relativity within the context of a variety of matter models. Typical processes addressed by these theoretical studies include the process of black hole formation, end-of-life events for various types of model stars, and, again, the interaction, including collisions, of gravitationally compact objects. Note that it is another hallmark of general relativity that highly dynamical spacetimes need not contain any matter; indeed, the interaction of two black holes (the natural analog of the Kepler problem in relativity) is a vacuum problem; that is, it is described by a solution of [1] with $T_{\mu\nu} = 0$.

Motivated in significant part by the large-scale efforts currently underway to directly detect gravitational radiation (gravitational waves), much of the contemporary work in numerical relativity is focused on precisely the problem of the late phases of compact-binary inspiral and merger. Such binaries are expected to be the most likely candidates for early detection by existing instruments such as TAMA, GEO, VIRGO, LIGO, and, more likely, by planned detectors including LIGO II and LISA (see, e.g., Hough and Rowan (2000)). Detailed and accurate predictions of expected waveforms from these events using the techniques of numerical relativity have the potential to substantially hasten the discovery process, on the basis of the general principle that if one knows what signal to look for, it is much easier to extract that signal from the experimental noise.

The computational task facing numerical relativists who study problems such as binary inspiral is formidable. In particular, such problems are intrinsically 3D, to use the CFD (computational fluid dynamics) nomenclature in which time dependence is always assumed. That is, the PDEs that must be solved govern functions, $F(t, x^k)$, that depend on all three spatial coordinates, $x^k$, as well as on time, $t$. Unfortunately, even a cursory description of 3D work in numerical relativity as it stands at this time is far beyond the scope of this article.

What follows, then, is an outline of a traditional approach to numerical relativity that underpins many of the calculations from the early years of the field (1970s and 1980s), most of which were carried out with simplifying restrictions to either spherical symmetry or axisymmetry. The mathematical development, which will hereafter be called the 3+1 approach to general relativity, has the advantage of using tensors and an associated tensor calculus that are reasonably intuitive for the physicist. This standard 3+1 approach is also sufficient in many instances (particularly those with symmetry) in the sense that it leads to well-posed sets of PDEs that can be discretized and then solved computationally in a convergent (stable) fashion. In addition, a thorough understanding of the 3+1 approach will be of significant help to the reader wishing to study any of the current literature in numerical relativity, including the 3D work.

However, the reader is strongly cautioned that the blind application of any of the equations that follow, especially in a 3D context, may well lead to ill-posed systems, numerical analysis of which is useless. Anyone specifically interested in using the methods of numerical relativity to generate discrete, approximate solutions to [1], particularly in the generic 3D case, is thus urged to first consult one of the comprehensive reviews of numerical relativity that continue to appear at fairly regular intervals (see, e.g., Lehner (2001), or Baumgarte and Shapiro (2003)). Most such references will also provide a useful overview of many of the most popular numerical techniques that are currently being used to discretize (convert to algebraic form) the Einstein equations, as well as the main algorithms that are used to solve the resulting discrete equations. These subjects are not
described below, not least since discussion of the available discretization techniques only makes sense in the context of PDEs of specific systems with specific boundary conditions, while there is only space here to describe the general mathematical setting for 3+1 numerical relativity.

of $t$ should nominally be infinite, both to the future as well as to the past; that is, the solution domain is

$$-\infty < t < \infty \qquad [6]$$

$$|X| \equiv \left( \delta_{ij} x^i x^j \right)^{1/2} < \infty \qquad [7]$$

$$D_i V^j = \partial_i V^j + \Gamma^j{}_{ik} V^k \qquad [12]$$

and

$$D_i W_j = \partial_i W_j - \Gamma^k{}_{ij} W_k \qquad [13]$$

respectively.

Given the Christoffel symbols, the components of the spatial Riemann tensor, denoted here $R_{ijk}{}^l$, are computed using

$$R_{ijk}{}^l = \partial_j \Gamma^l{}_{ik} - \partial_i \Gamma^l{}_{jk} + \Gamma^m{}_{ik} \Gamma^l{}_{mj} - \Gamma^m{}_{jk} \Gamma^l{}_{mi} \qquad [14]$$

Finally, the Ricci tensor, $R_{ij}$, and Ricci scalar, $R$, are defined in the usual fashion

As Figure 1 illustrates, a quick route to the 3+1 decomposition of the above expression, and thus of the tensor $g_{\mu\nu}$ itself, is based on an application of the four-dimensional Pythagorean theorem. In setting up the calculation, one naturally identifies four functions, the scalar lapse, $\alpha(t, x^k)$, and the vector shift, $\beta^i(t, x^k)$, that encode the full coordinate (gauge) freedom of the theory. That is, complete specification of the lapse and shift is equivalent to completely fixing the spacetime coordinate system.

In light of the above discussion, and again referring to Figure 1, one readily deduces the 3+1 decomposition of the spacetime line element:

$$ds^2 = -\alpha^2 dt^2 + \gamma_{ij} (dx^i + \beta^i dt)(dx^j + \beta^j dt) \qquad [18]$$
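As a concrete check of the decomposition [18], the following Python sketch assembles the covariant 4-metric from a lapse, a shift and a 3-metric, and confirms numerically that the standard 3+1 expressions for the inverse metric ($g^{00} = -\alpha^{-2}$, $g^{0i} = \alpha^{-2}\beta^i$, $g^{ij} = \gamma^{ij} - \alpha^{-2}\beta^i\beta^j$) really do invert it. The function names and the sample values of $\alpha$, $\beta^i$ and $\gamma_{ij}$ are illustrative assumptions, not taken from the text.

```python
# Sketch only: helper names and the sample lapse/shift/3-metric are hypothetical.

def inverse_3x3(m):
    """Inverse of a 3x3 matrix via the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[x / det for x in row] for row in adj]

def four_metric(alpha, beta_up, gamma):
    """Covariant g_{mu nu} read off from the 3+1 line element."""
    # Lower the shift index with the 3-metric: beta_i = gamma_ij beta^j.
    beta_dn = [sum(gamma[i][j] * beta_up[j] for j in range(3)) for i in range(3)]
    g = [[0.0] * 4 for _ in range(4)]
    g[0][0] = -alpha ** 2 + sum(beta_dn[i] * beta_up[i] for i in range(3))
    for i in range(3):
        g[0][i + 1] = g[i + 1][0] = beta_dn[i]
        for j in range(3):
            g[i + 1][j + 1] = gamma[i][j]
    return g

def inverse_four_metric(alpha, beta_up, gamma):
    """Contravariant g^{mu nu} from the closed-form 3+1 expressions."""
    gamma_inv = inverse_3x3(gamma)
    a2 = alpha ** 2
    ginv = [[0.0] * 4 for _ in range(4)]
    ginv[0][0] = -1.0 / a2
    for i in range(3):
        ginv[0][i + 1] = ginv[i + 1][0] = beta_up[i] / a2
        for j in range(3):
            ginv[i + 1][j + 1] = gamma_inv[i][j] - beta_up[i] * beta_up[j] / a2
    return ginv

def max_identity_error(alpha, beta_up, gamma):
    """Largest deviation of g_{mu k} g^{k nu} from the identity matrix."""
    g = four_metric(alpha, beta_up, gamma)
    ginv = inverse_four_metric(alpha, beta_up, gamma)
    err = 0.0
    for m in range(4):
        for n in range(4):
            prod = sum(g[m][k] * ginv[k][n] for k in range(4))
            err = max(err, abs(prod - (1.0 if m == n else 0.0)))
    return err

alpha = 1.3
beta_up = [0.2, -0.1, 0.05]
gamma = [[1.5, 0.1, 0.0], [0.1, 2.0, 0.3], [0.0, 0.3, 1.2]]
print(max_identity_error(alpha, beta_up, gamma))
```

The printed deviation should be at the level of floating-point rounding for any positive lapse and any invertible 3-metric.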
completely covariant, but otherwise arbitrary, spacetime tensor, $Q_{\mu\nu\cdots}$, constitute the components of a completely covariant spatial tensor.

A straightforward calculation, which provides a good exercise in the use of the 3+1 calculus, yields the following equally useful identifications for various pieces of the inverse spacetime metric, $g^{\mu\nu}$:

$$g^{00} = -\alpha^{-2} \qquad [23]$$

$$g^{0i} = g^{i0} = \alpha^{-2} \beta^i \qquad [24]$$

$$g^{ij} = \gamma^{ij} - \alpha^{-2} \beta^i \beta^j \qquad [25]$$

Since the Einstein field equations are equations with, loosely speaking, geometry on one side and matter on the other, tensors built from matter fields must also be decomposed. In particular, it is conventional to define tensors, $\rho$, $j_i$, and $S_{ij}$, that result from various projections of the spacetime stress–energy tensor, $T_{\mu\nu}$, onto the hypersurface:

$$\rho \equiv n^\mu n^\nu T_{\mu\nu} \qquad [26]$$

$$j_i \equiv -n^\mu T_{\mu i} \qquad [27]$$

$$S_{ij} \equiv T_{ij} \qquad [28]$$

For observers with 4-velocities $u^\mu$ equal to $n^\mu$, and only for those observers with $u^\mu = n^\mu$, the above quantities have the interpretation of the locally and instantaneously measured energy density, momentum density, and spatial stresses, respectively. As with the geometric quantities, all of the matter variables, $\rho$, $j_i$, and $S_{ij}$, defined in [26]–[28] are spatial tensors and thus have their indices (if any) raised and lowered with the 3-metric. Note that the identification $S_{ij} = T_{ij}$ is another illustration of the general result mentioned in the context of the previous identification of $\gamma_{ij}$ and $g_{ij}$.

Finally, observing that time parameters are naturally defined in terms of level surfaces (equipotential surfaces), it should be no surprise that the covariant components, $n_\mu$, of the hypersurface normal field,

$$n_\mu = (-\alpha, 0, 0, 0) \qquad [29]$$

are simpler than the components, $n^\mu$, of the normal itself,

$$n^\mu = (\alpha^{-1}, -\alpha^{-1} \beta^i) \qquad [30]$$

and, in fact, eqn [29] can also be deduced from a quick study of Figure 1.

In the 3+1 approach, in addition to the 3-metric, $\gamma_{ij}(t, x^k)$, and coordinate functions, $\alpha(t, x^i)$ and $\beta^i(t, x^i)$, it is convenient to introduce an additional rank-2 symmetric spatial tensor, $K_{ij}(t, x^k)$, known as the extrinsic curvature (or second fundamental form). This additional tensor is analogous to a time derivative of $\gamma_{ij}(t, x^k)$, or, from a Hamiltonian perspective, to a variable that is dynamically conjugate to $\gamma_{ij}(t, x^k)$.

As the name suggests, the extrinsic curvature describes the manner in which the slice $\Sigma(t)$ is embedded in the manifold (to be contrasted with $R_{ijk}{}^l$ defined by [14] which is, as mentioned previously, completely insensitive to the manner in which the hypersurface is embedded in $M$). Geometrically, $K_{ij}$ is computed by calculating the spacetime gradient of the normal covector field, $n_\mu$, and projecting the result on to the hypersurface,

$$K_{ij} = -\tfrac{1}{2} \left( \nabla_i n_j + \nabla_j n_i \right) \qquad [31]$$

where it must be stressed that $\nabla_\mu$ is the spacetime covariant derivative operator compatible with the 4-metric, $g_{\mu\nu}$; that is, $\nabla_\lambda g_{\mu\nu} = 0$. A straightforward tensor calculus calculation then yields the following, which can be viewed as a definition of the $K_{ij}$:

$$K_{ij} = \frac{1}{2\alpha} \left( -\partial_t \gamma_{ij} + D_i \beta_j + D_j \beta_i \right) \qquad [32]$$

Here, $D_i$ is the spatial covariant derivative, compatible with $\gamma_{ij}$ ($D_k \gamma_{ij} = 0$), that was defined previously. Observe that this equation can be easily solved for $\partial_t \gamma_{ij}$ (this will be done below), and thus, in the 3+1 approach it is [32] that is the origin of the evolution equations for the 3-metric components, $\gamma_{ij}$.

Einstein's Equations in 3+1 Form

The Constraint Equations

As is well known, as a result of the coordinate (gauge) invariance of the theory, general relativity is overdetermined in a sense completely analogous to the situation in electrodynamics with the Maxwell equations. One of the ways that this situation is manifested is via the existence of the constraint equations of general relativity. Briefly, starting from the naive view that the ten metric functions, $g_{\mu\nu}(t, x^i)$, that completely determine the spacetime geometry are all dynamical (that is, that they satisfy second-order-in-time equations of motion), one finds that the Einstein equations do not provide dynamical equations of motion for the lapse, $\alpha$, or the shift, $\beta^i$. Rather, four of the field equations [1] are equations of constraint for the true dynamical variables of the theory, $\{\gamma_{ij}, \partial_t \gamma_{ij}\}$, or, equivalently, $\{\gamma_{ij}, K^i{}_j\}$. Note that in the following, the mixed form, $K^i{}_j$, is at times used (again by convention) as the principal representation of the extrinsic curvature tensor (instead of $K_{ij}$ as previously, or $K^{ij}$).
their solutions as $X \equiv \sqrt{\delta_{ij} x^i x^j} \to \infty$ encodes the conserved mass and linear momentum (four numbers) that can be defined in asymptotically flat spacetimes.

In a general 3+1 coordinate system, and with an appropriate choice of variables, the constraints can be written as a set of quasilinear elliptic equations for four of the $\{\gamma_{ij}, K^i{}_j\}$ (or, more properly, for certain algebraic combinations of the $\{\gamma_{ij}, K^i{}_j\}$). Thus, especially for 2D and 3D calculations, the setting of initial data for the Cauchy problem in general relativity is itself a highly nontrivial mathematical and computational exercise. Readers wishing more details on this subject are directed to the comprehensive review by Cook (2000).

matter fields. Then determine the remaining four dynamical gravitational fields from the constraints [35] and [36]. This completes the initial data specification.

One must now choose a prescription for the kinematical (coordinate) functions, $\alpha$ and $\beta^i$, so that either explicitly or implicitly, they are completely fixed; for the case of implicit specification, this may well mean that the coordinate functions themselves will satisfy PDEs, which, furthermore, can be of essentially any type in practice (i.e., elliptic, hyperbolic, parabolic, . . .). Finally, with consistent initial data, $\{\gamma_{ij}(0, x^k), K^i{}_j(0, x^k)\}$ together with the matter fields at $t = 0$, in hand, and with a prescription for the coordinate functions, the evolution
equations [37] and [38] can be used to advance the dynamical variables forward or backward in time.

The above description is naive since, apart from a consistent mathematical specification, the most crucial issue in the solution of a time-dependent PDE as a Cauchy problem is that the problem be well posed. Roughly speaking, this means that solutions do not grow without bound (blow-up) without physical cause, and that small, smooth changes to initial data yield correspondingly small, smooth changes to the evolved data. In short, the Cauchy problem must be stable, and whether or not a particular subset of the equations displayed in this section yields a well-posed problem is a complicated and delicate issue, especially in the generic 3D case. The reader is thus again cautioned against blind application of any of the equations displayed in this article.

Boundary Conditions

In principle, because all spacelike hypersurfaces, $\Sigma(t)$, in a pure Cauchy evolution are edgeless (and provided that the initial data, $\{\gamma_{ij}(0, x^k), K^i{}_j(0, x^k)\}$ together with the matter fields at $t = 0$, is consistent with asymptotic flatness, or whatever other condition is appropriate given the topology of the $\Sigma(t)$), there are essentially no boundary conditions to be imposed on the dynamical variables, $\{\gamma_{ij}(t, x^k), K^i{}_j(t, x^k)\}$, during Cauchy evolution. Note that asymptotic flatness generally requires that

$$\lim_{X \to \infty} \gamma_{ij} = f_{ij} + O\left( \frac{1}{X} \right) \qquad [40]$$

and

$$\lim_{X \to \infty} K^i{}_j = O\left( \frac{1}{X^2} \right) \qquad [41]$$

where $X$ is defined by

$$X \equiv \sqrt{\delta_{ij} x^i x^j} \qquad [42]$$

as previously, and $f_{ij}$ is the flat 3-metric. Similarly, should the lapse, $\alpha$, and shift, $\beta^i$, be constrained by elliptic PDEs (as is frequently the case in practice), then the only natural place to set boundary conditions is at spatial infinity, and then, provided that the frame at spatial infinity is inertial, with coordinate time $t$ measuring proper time, one should have

$$\lim_{X \to \infty} \alpha = 1 + O\left( \frac{1}{X} \right) \qquad [43]$$

and

$$\lim_{X \to \infty} \beta^i = O\left( \frac{1}{X} \right) \qquad [44]$$

It is critical to note at this point, however, that in the vast bulk of past and current work in numerical relativity, including most of the ongoing work in 3D, the Einstein equations [1] have been solved, not as a pure Cauchy problem, but as a mixed initial-value/boundary-value problem (IBVP). That is, in the discretization process in which the continuum equations [1] are replaced with algebraic equations, the continuum domain [6]–[7] is typically replaced with a truncated spatial domain

$$|x^i| \le X^i_{\max} \qquad [45]$$

where the $X^i_{\max}$ are a priori specified constants (parameters of the computational solution) that define the extremities of the computational box. As one might expect, the theory underlying stability and well-posedness of IBVPs, especially for differential systems as complicated as [1], is even more involved than for the pure initial-value case, and is another very active area of research in both mathematical and numerical relativity (see, e.g., Friedrich and Nagy (1999)).

See also: Critical Phenomena in Gravitational Collapse; Einstein Equations: Initial Value Formulation; Fluid Mechanics: Numerical Methods; General Relativity: Overview; Geometric Analysis and General Relativity; Gravitational Waves; Hamiltonian Reduction of Einstein's Equations; Magnetohydrodynamics; Spacetime Topology, Causal Structure and Singularities; Symmetric Hyperbolic Systems and Shock Waves.

Further Reading

Baumgarte T and Shapiro SL (2003) Numerical relativity and compact binaries. Physics Reports 376: 41–131.
Cook G (2000) Initial data for numerical relativity. Living Reviews in Relativity 3: 5 (lrr-2000-5).
Font JA (2003) Numerical hydrodynamics in general relativity. Living Reviews in Relativity 6: 4 (lrr-2003-4).
Frauendiener J (2004) Conformal infinity. Living Reviews in Relativity 7: 1 (lrr-2004-1).
Friedrich H and Nagy G (1999) The initial boundary value problem for Einstein's vacuum field equation. Communications in Mathematical Physics 201: 619–655.
Hough J and Rowan S (2000) Gravitational wave detection by interferometry (ground and space). Living Reviews in Relativity 3: 3 (lrr-2000-3).
Lehner L (2001) Numerical relativity: a review. Classical and Quantum Gravity 18: R25–R86.
Misner CW, Thorne KS, and Wheeler JA (1973) Gravitation. San Francisco: W.H. Freeman.
Reula OA (1998) Hyperbolic methods for Einstein's equations. Living Reviews in Relativity 1: 3 (lrr-1998-3).
Winicour J (2001) Characteristic evolution and matching. Living Reviews in Relativity 4: 3 (lrr-2001-3).
Constrained Systems 611
Conformal Geometry see Two-dimensional Conformal Field Theory and Vertex Operator Algebras
Constrained Systems
M Henneaux, Université Libre de Bruxelles, Brussels, Belgium
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Consider a dynamical system with coordinates $q^i$ ($i = 1, \ldots, n$) and Lagrangian $L(q^i, \dot{q}^i)$ (field theory is formally covered by regarding the spatial coordinates as a continuous index). When going to the Hamiltonian formulation, it is usually assumed that the Legendre transformation between the velocities $\dot{q}^i$ and the momenta

$$p_i = \frac{\partial L}{\partial \dot{q}^i} \qquad [1]$$

can be inverted to yield the velocities as functions of the $q$'s and the $p$'s. This regular situation occurs for most systems appearing in standard classical mechanics and enables one to proceed to the Hamiltonian formulation of the theory without difficulty.

In field theory, however, the regular case is the exception rather than the rule. This is due to gauge invariance and first-order Lagrangians.

Gauge invariance  A system possesses gauge symmetries if it is invariant under transformations that involve arbitrary functions of time (gauge transformations). In that case, the solution of the equations of motion with given initial data is not unique, since it is always possible to perform a gauge transformation in the course of the evolution without changing the initial data. It is then clear that the Legendre transformation cannot be invertible, for if it were, one could rewrite the equations of motion in the standard canonical form $\dot{q}^i = \partial H/\partial p_i$, $\dot{p}_i = -\partial H/\partial q^i$. These canonical equations are in normal form and have a unique solution for given initial data, which would contradict the presence of a gauge symmetry.

A simple example that illustrates this phenomenon is given by the following model for three variables $q^1$, $q^2$, and $\lambda$, the Lagrangian of which reads

$$L = \tfrac{1}{2} \left[ (\dot{q}^1 - \lambda)^2 + (\dot{q}^2 - \lambda)^2 \right] \qquad [2]$$

This model is inspired by electromagnetism: the variables $q^1$ and $q^2$ play a role somewhat similar to that of the spatial components of the vector potential, while $\lambda$ corresponds to the temporal component. The Lagrangian is invariant under the gauge transformations

$$q^1 \to q^1 + \varepsilon, \qquad q^2 \to q^2 + \varepsilon, \qquad \lambda \to \lambda + \dot{\varepsilon} \qquad [3]$$

where $\varepsilon$ is an arbitrary function of time. The conjugate momenta are

$$p_1 = \dot{q}^1 - \lambda, \qquad p_2 = \dot{q}^2 - \lambda, \qquad p_\lambda = 0$$

One cannot invert the Legendre transformation since one cannot express the velocity $\dot{\lambda}$ in terms of the momenta.

First-order Lagrangians  Fermionic fields obey first-order equations. Their Lagrangian is linear in the derivatives, so that the conjugate momenta $p_i$ depend on the coordinates $q^i$ only. It is then clearly impossible to express the velocities in terms of the momenta through the Legendre transformation. More generally, any first-order Lagrangian with or without gauge symmetry leads to a noninvertible Legendre transformation.
A simple system that exhibits this feature is described by the Lagrangian

$$L = z^2 \dot{z}^1 - \tfrac{1}{2} (z^2)^2 \qquad [4]$$

for two bosonic degrees of freedom $(z^1, z^2)$. This is in fact the canonical form of the Lagrangian for a free particle in one dimension ($z^2$ is the momentum conjugate to the position $z^1$): the system is already in Hamiltonian form. There is no gauge invariance, but because the Lagrangian is first order, the Legendre transformation with [4] as starting point,

$$p_1 = z^2, \qquad p_2 = 0 \qquad [5]$$

is noninvertible for the velocities (which do not even appear in the formulas for the momenta).

Dirac showed how to develop the Hamiltonian formalism in the case when the Legendre transformation is not invertible. One can still reformulate the equations in phase space and write them in terms of brackets with the Hamiltonian, but a new major feature emerges, namely the canonical variables are no longer free. Rather, the permissible phase-space points are constrained to be on the so-called constraint surface. For this reason, systems for which the Legendre transformation is not invertible are also called constrained Hamiltonian systems. We shall adopt this terminology here.

The purpose of this article is to explain the main ideas underlying the Dirac method. To simplify the discussions and to focus on the features peculiar to the Dirac construction, we shall assume as a rule that all necessary smoothness conditions are fulfilled by the functions, surfaces, etc., appearing in the formalism. How to develop the analysis when some of the smoothness conditions are not fulfilled is of definite interest but goes beyond the scope of this review. We shall also assume, for definiteness, that all the variables are bosonic in order to avoid straightforward but somewhat cumbersome sign factors in the formulas.

General Theory

Dirac Algorithm

Primary constraints  When the Legendre transformation [1] cannot be inverted, the momenta $p_i$ do not span an $n$-dimensional space but are constrained by relations

$$\phi_m(q, p) = 0, \qquad m = 1, \ldots, M \qquad [6]$$

which follow from their definition. These equations reduce to identities when the momenta are replaced by their expression [1] in terms of the coordinates and the velocities. They are called primary constraints. We shall assume that the matrix

$$\frac{\partial (\phi_m)}{\partial (p_i, q^i)}$$

is everywhere of constant (maximum) rank $M$ on the phase-space surface defined by eqns [6], which is assumed to be smooth. This surface is of dimension $2n - M$.

Canonical Hamiltonian  The next step in the Dirac procedure is to define the canonical Hamiltonian $H$ through

$$H = \dot{q}^i p_i - L \qquad [7]$$

As shown by Dirac, $H$ can be re-expressed as a function $H(q, p)$ of the momenta and the coordinates, even when the Legendre transformation is not invertible: the canonical Hamiltonian $H$ depends on the velocities only through the $p_i$. Furthermore, the original equations of motion in Lagrangian form are equivalent to the Hamiltonian equations

$$\dot{q}^i = \frac{\partial H}{\partial p_i} + u^m \frac{\partial \phi_m}{\partial p_i} \qquad [8]$$

$$\dot{p}_i = -\frac{\partial H}{\partial q^i} - u^m \frac{\partial \phi_m}{\partial q^i} \qquad [9]$$

$$\phi_m(q, p) = 0 \qquad [10]$$

where the $u^m$ are parameters, some of which will be determined through the consistency algorithm to be discussed shortly. (In [7]–[9] and everywhere below, there is a summation over the repeated indices.)

Secondary constraints  The equations of motion [8] and [9] can be rewritten as

$$\dot{F} = [F, H] + u^m [F, \phi_m] \qquad [11]$$

where $F = F(q, p)$ is any function of the canonical variables. Here, the Poisson bracket is defined as usual by

$$[G, F] = \frac{\partial G}{\partial q^i} \frac{\partial F}{\partial p_i} - \frac{\partial G}{\partial p_i} \frac{\partial F}{\partial q^i} \qquad [12]$$

If one takes for $F$ one of the primary constraints $\phi_m$, one should get zero, $\dot{\phi}_m = 0$. This yields the consistency conditions

$$[\phi_m, H] + u^{m'} [\phi_m, \phi_{m'}] = 0 \qquad [13]$$

These conditions can imply further restrictions on the canonical variables and/or impose conditions on the
variables $u^m$. Any new relation $X(q, p) = 0$ on the canonical variables leads, in turn, to a further consistency condition $\dot{X} = [X, H] + u^{m'} [X, \phi_{m'}] = 0$, which can bring in either further restriction on the constraint surface or fix more variables $u^m$. Constraints that follow from the consistency algorithm are called secondary constraints. Finally, one is left with a certain number of secondary constraints, which are denoted by $\phi_k = 0$, $k = M + 1, \ldots, M + K$. We assume again that all the constraints (primary and secondary) define a smooth surface, called the constraint surface, and fulfill the condition that $\partial(\phi_k)/\partial(q^i, p_i)$ is of maximum rank $J = M + K$ on the constraint surface. (We also assume for simplicity that there is no branching in the consistency algorithm.)

Restrictions on the u's  Having a complete set of constraints

$$\phi_j = 0, \qquad j = 1, \ldots, M + K \equiv J \qquad [14]$$

we can now investigate more precisely the restrictions on the variables $u^m$. These read

$$[\phi_j, H] + u^m [\phi_j, \phi_m] \approx 0, \qquad j = 1, \ldots, J \qquad [15]$$

where the notation $\approx$ means "equal modulo the constraints". In [15], $m$ is summed from 1 to $M$. Equations [15] are a set of $J$ linear, inhomogeneous equations for the $u$'s, with coefficients that are functions of the canonical variables $q^i$, $p_i$. The general solution of this system is of the form

$$u^m = U^m + u^a V_a^m \qquad [16]$$

where $U^m$ is a particular solution and where the $V_a^m$ ($a = 1, \ldots, A$) provide a complete set of independent solutions of the homogeneous system

$$V_a^m [\phi_j, \phi_m] \approx 0 \qquad [17]$$

The coefficients $u^a$ ($a = 1, \ldots, A$) are completely arbitrary.

We thus see the emergence of another new feature in the theory, in addition to the appearance of constraints. It is that the general solution of the equations of motion may contain arbitrary functions of time (when $A \neq 0$), in agreement with the possible presence of a gauge symmetry.

First- and Second-Class Constraints

First- and second-class functions  A function $F(q, p)$ is called a first-class function if it generates a canonical transformation that maps the constraint surface on itself. Thus, $F(q, p)$ is first class if its Poisson brackets with all the constraints vanish weakly (i.e., are zero on the constraint surface),

$$[F, \phi_j] \approx 0, \qquad j = 1, \ldots, J \qquad [18]$$

A function is second class otherwise, that is, if there is at least one constraint $\phi_j$ such that $[F, \phi_j] \neq 0$ (not even weakly). Second-class functions generate canonical transformations that do not leave the constraint surface invariant. Since canonical transformations that map the constraint surface on itself form a group, the Poisson bracket of two first-class functions is itself a first-class function.

Because the system is constrained to lie on the constraint surface, the only allowed canonical transformations are those that are generated by first-class functions. The importance of the distinction between first-class and second-class functions stems from this elementary fact. Note, in particular, that the time evolution is generated (as it should) by a first-class generator since the equations of motion [11] can be rewritten as

$$\dot{F} \approx [F, H'] + u^a [F, V_a^m \phi_m] \qquad [19]$$

with

$$H' = H + U^m \phi_m \qquad [20]$$

One has both $[H', \phi_m] \approx 0$ and $[V_a^m \phi_m, \phi_j] \approx 0$.

Splitting of the constraints  One can separate the constraints between first-class and second-class constraints. This can be achieved by considering the matrix $C_{jj'}$ of the Poisson bracket of the constraints,

$$C_{jj'} = [\phi_j, \phi_{j'}], \qquad j, j' = 1, \ldots, J \qquad [21]$$

One has the following theorem due to Dirac.

Theorem 1  If $\det C_{jj'} \approx 0$, there exists at least one first-class constraint among the $\phi_j$.

Proof  Straightforward: if $\det C_{jj'} \approx 0$, one can find a nontrivial solution $\lambda^j$ of $\lambda^j C_{jj'} \approx 0$. The corresponding constraint $\lambda^j \phi_j$ is easily verified to be first class.

By redefining the constraints as $\phi_j \to \phi'_j = a_j{}^{j'} \phi_{j'}$ with $a_j{}^{j'}(q, p)$ invertible, one can bring the Poisson brackets of the constraints to the form

$$[\gamma_a, \gamma_b] \approx 0, \qquad [\gamma_a, \chi_\alpha] \approx 0, \qquad [\chi_\alpha, \chi_\beta] \approx C_{\alpha\beta} \qquad [22]$$

with $(\phi'_j) \equiv (\gamma_a, \chi_\alpha)$ and where the matrix $C_{\alpha\beta}$ is invertible. (We assume, for simplicity, throughout that the rank of the matrix $C_{jj'}$ is constant on the constraint surface (regular case).) In this representation, the constraints are completely split into first-class constraints ($\gamma_a$) and second-class
614 Constrained Systems
constraints ($\chi_\alpha$): there is no first-class constraint left among the $\chi_\alpha$'s, and the set $\{\gamma_a\}$ exhausts all the first-class constraints. Note that the index $a$ now runs over all (primary and secondary) first-class constraints.

This separation of the constraints into first-class and second-class constraints is quite important because, as already seen above, the first-class constraints generate admissible canonical transformations, while the second-class constraints do not.

For a bosonic system, the matrix $C_{\alpha\beta}$ is antisymmetric. As $C_{\alpha\beta}$ is invertible, this implies that the number of second-class constraints is even. In the fermionic case, $C_{\alpha\beta}$ is symmetric (in the fermionic sector) and, therefore, the number of second-class constraints can be even or odd.

First-class constraints and gauge symmetries. The first-class constraints not only map the constraint surface on itself, but generate, in fact, transformations that do not change the physical state of the system, that is, gauge transformations. Indeed, the presence of arbitrary functions in the solutions of the equations of motion indicates that the $q$'s and the $p$'s involve some redundancy and are not all physically distinct. Only those phase-space functions whose time evolution does not depend on the arbitrary functions $u^a$ are observables.

That the first-class constraints generate gauge transformations is rather clear in the case of the first-class primary constraints, since these appear explicitly in the generator of the time evolution multiplied by arbitrary functions. That it also holds for the first-class secondary constraints is known as the "Dirac conjecture". This conjecture can be proved under reasonable assumptions (see, e.g., Henneaux et al. 1990). The reason that the secondary first-class constraints also correspond to gauge transformations is that they appear in the brackets of the Hamiltonian with the primary first-class constraints. Thus, different choices of arbitrary functions $u^a$ in the dynamical equations of motion will lead to phase-space points that differ by a canonical transformation whose generator involves the secondary first-class constraints as well.

In any case, as noted below, one must identify the phase-space points in the same orbit generated by all the first-class constraints (primary and secondary) in order to get a reduced space with a symplectic structure (reduced phase space). For this reason, one postulates that the first-class constraints always generate gauge transformations, even for systems which are counterexamples to the Dirac conjecture (i.e., in that case, one defines the gauge transformations as being the transformations generated by the first-class constraints).

The extended Hamiltonian $H_E$ is defined to be the sum of the first-class Hamiltonian [20] and of all the first-class constraints $\gamma_a$ multiplied by an arbitrary Lagrange multiplier,

\[ H_E = H' + v^a \gamma_a \tag{23} \]

(with $a$ summed from 1 to $A$). It is the generator of the time evolution in which the complete gauge symmetry is fully displayed.

Elimination of second-class constraints: Dirac brackets. Second-class constraints do not generate permissible canonical transformations, since they do not map the constraint surface on itself. For this reason, it is convenient to eliminate them. This can consistently be done by using the Dirac brackets instead of the Poisson brackets. By definition, the Dirac bracket $[F, G]_D$ of two phase-space functions $F$ and $G$ is given by

\[ [F, G]_D = [F, G] - [F, \chi_\alpha]\, C^{\alpha\beta}\, [\chi_\beta, G] \tag{24} \]

where $C^{\alpha\beta}$ is the inverse to $C_{\alpha\beta}$,

\[ C^{\alpha\beta} C_{\beta\gamma} = \delta^\alpha{}_\gamma \]

(which exists since the $\chi$'s are second class). As shown by Dirac, the bracket [24] is indeed a bracket (antisymmetry, derivation property, and Jacobi identity). Furthermore, it fulfills the crucial property that the Dirac bracket of anything with any second-class constraint is zero,

\[ [F, \chi_\alpha]_D = 0 \qquad (F \text{ arbitrary}) \tag{25} \]

Thus, one can consistently eliminate the second-class constraints and replace the Poisson bracket by the Dirac bracket. Once this is done, one has fewer canonical variables and only first-class constraints remain (if any). It also follows from the definition that the Dirac bracket of two first-class functions is equal to their Poisson bracket.

Gauge conditions. One can push the reduction procedure further and eliminate the first-class constraints by means of gauge conditions. Gauge conditions $C_a = 0$ are conditions on the phase-space variables which do not follow from the Lagrangian and which have the property that they cut each gauge orbit once and only once. Since the gauge transformations are generated by the first-class constraints, this requirement is (locally) equivalent to

\[ [C_a, \gamma_b]\, \varepsilon^b \approx 0 \;\Rightarrow\; \varepsilon^b \approx 0 \tag{26} \]
That is, the constraints ($\gamma_a$, $C_b$) form together a second-class system: there is no first-class constraint left once the conditions $C_a = 0$ are included. One can then eliminate all the constraints and gauge conditions and introduce the corresponding Dirac bracket. For gauge-invariant functions, this Dirac bracket coincides with the original Poisson bracket.

The reduced phase space is the unconstrained space obtained after this reduction, equipped with the Dirac bracket. It has dimension $2n - s - 2A$, where $2n$ is the dimension of the original phase space, $s$ is the number of second-class constraints, and $A$ is the number of first-class constraints. In the bosonic case, this number is even (as it should be) because $s$ is even. One sees that first-class constraints "strike twice" since they need gauge conditions.

The observables of the theory are the reduced phase-space functions. They form a Poisson algebra, the relevant reduced phase-space bracket being the Dirac bracket associated with all the constraints and gauge conditions. The symplectic structure defined in the reduced phase space is nondegenerate because one has removed all the first-class constraints.

The definition of reduced phase space given above is useful in practice but has the conceptual drawback of relying on gauge conditions. This approach does not display clearly its intrinsic significance and, furthermore, in the case of the so-called Gribov problems (global obstructions to cutting each gauge orbit once and only once), may yield the incorrect expectation that the reduced phase space does not exist. We shall provide a more intrinsic definition below, which does not involve gauge conditions.

Examples

First example (see eqn [2]). There is here one primary constraint, namely $\pi_\lambda = 0$. The canonical Hamiltonian is $\tfrac{1}{2}((p_1)^2 + (p_2)^2) + \lambda (p_1 + p_2)$. The consistency algorithm yields the secondary constraint $p_1 + p_2 = 0$ and no condition on the $u$'s. The constraints are first class. They generate the gauge transformations $q^1 \to q^1 + \varepsilon$, $q^2 \to q^2 + \varepsilon$, and $\lambda \to \lambda + \eta$, which coincide with the Lagrangian gauge transformations if one identifies $\eta$ with $\dot\varepsilon$ ($\varepsilon$ and $\dot\varepsilon$ are, of course, independent at any given time). One can fix the gauge by means of the gauge conditions $\lambda = 0$, $q^1 + q^2 = 0$. The reduced phase space is two-dimensional and the observables can be identified with the functions of the gauge-invariant variables $\tfrac{1}{2}(q^1 - q^2)$ and $p_1 - p_2$, which are conjugate. Any other gauge condition leads to the same reduced phase space.

Second example (see eqn [4]). The primary constraints are $p_1 - z^2 = 0$ and $p_2 = 0$ and define a two-dimensional plane in the four-dimensional phase space $(z^1, z^2, p_1, p_2)$. The consistency algorithm forces $u^1 = z^2$ and $u^2 = 0$ and does not bring any further constraint. The constraints are second class since $[p_2, p_1 - z^2] = 1$. One can eliminate $p_1$ and $p_2$ through the constraints. The Dirac brackets of the remaining variables vanish, except $[z^1, z^2]_D = 1$. The reduced phase space is the space of the $z$'s, with $z^2$ conjugate to $z^1$. The Hamiltonian is the free-particle Hamiltonian, $H = \tfrac{1}{2}(z^2)^2$. Thus, one recovers the original description, which was already in Hamiltonian form. (The recognition that a system is already in first-order form often enables one to shortcut some aspects of the Dirac procedure by not introducing the unnecessary momenta which would in any case be eliminated in the end.)

Quantization

The phase space of physical interest is the reduced phase space and the physical algebra is the algebra of the observables. The quantization of the theory then amounts to quantizing the algebra of the observables. This can be achieved along two different lines:

1. Reduce then quantize: In this direct approach, one represents as quantum operators only the reduced phase-space functions. There is no operator associated with non-gauge-invariant functions.
2. Quantize then reduce: In this approach, one represents as quantum operators the bigger algebra of functions of all the phase-space variables. One must then take into account the constraints. The second-class constraints are enforced as operator equations, which is consistent with the correspondence rule that the commutator in the quantum theory is $i\hbar$ times the Dirac bracket,

\[ AB - BA = i\hbar\, [A, B]_D \tag{27} \]

(plus higher-order terms in $\hbar$). The first-class constraints are implemented in a more subtle way. It would be inconsistent to impose them as operator equations since in general $[\gamma_a, F]_D \neq 0$ (even in the Dirac bracket). What one does is to impose them as conditions on the physical states: these are defined as the states annihilated by the first-class constraints,

\[ \gamma_a |\psi\rangle = 0 \tag{28} \]

For simple systems, it is easy to verify that the two procedures are equivalent. There is yet another approach, in which one extends the system rather than reducing it. This is the Becchi–Rouet–Stora–Tyutin (BRST) approach, in which the new variables are called ghosts.

Geometric Description

We defined above first-class and second-class constraints through algebraic means. It turns out that these definitions also have a geometrical interpretation, which sheds considerable light on their nature.

The phase-space symplectic 2-form $\omega$ induces, by pullback, a 2-form $\omega_\Sigma$ on the constraint surface $\Sigma$. While $\omega$ is of maximal rank, this may not be the case for the induced $\omega_\Sigma$, which may be degenerate. In fact, the rank of $\omega_\Sigma$ fails to be equal to the maximum rank $2n - J$ (where $J$ is the total number of constraints) by precisely the number $A$ of first-class constraints.

Indeed, the Hamiltonian vector fields $X_a$ associated with the first-class constraints are tangent to the constraint surface and are null eigenvectors of $\omega_\Sigma$,

\[ \omega_\Sigma(X_a, Y) = 0 \qquad \forall\, Y \text{ tangent to } \Sigma \tag{29} \]

functions in $C^\infty(\Sigma)$, that is, to impose that they are constant along the gauge orbits $\mathcal{O}$. Assuming all necessary smoothness and regularity conditions to be fulfilled (i.e., that the orbits fiber $\Sigma$, which is, for instance, the case if the gauge orbits are the orbits of a free and proper group action), one may denote the algebra of observables as $C^\infty(\Sigma/\mathcal{O})$. This algebra is a Poisson algebra because the induced 2-form on the quotient space $\Sigma/\mathcal{O}$ is nondegenerate. The algebraic description of the observables underlies the BRST construction.

It is interesting to note that in the covariant approach to phase space, a similar two-step reduction procedure occurs. What plays the role of the constraint surface is the stationary surface in the space of all histories $q^i(t)$ of the dynamical variables. The gauge symmetry acts on this space and the reduced phase space is just the quotient space. One can establish the equivalence of the two descriptions (Barnich et al. 1991).

See also: Batalin–Vilkovisky Quantization; BRST Quantization; Canonical General Relativity; Operads; Perturbative Renormalization Theory and BRST; Quantum Dynamics in Loop Quantum Gravity; Quantum Field Theory: A Brief Introduction.
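The second example above lends itself to a quick cross-check of the Dirac bracket [24] and of property [25]. The following is only a minimal sketch under stated assumptions, not part of the original treatment: it assumes the two second-class constraints are $\chi_1 = p_1 - z^2$ and $\chi_2 = p_2$, restricts attention to functions that are linear in $(z^1, z^2, p_1, p_2)$ (so that all Poisson brackets are constants), and uses hypothetical helper names (`poisson`, `invert_2x2`, `dirac`) introduced purely for illustration.

```python
from fractions import Fraction

N_DOF = 2  # phase-space coordinates are ordered as (z1, z2, p1, p2)


def poisson(f, g):
    """Poisson bracket [f, g] of two linear functions.

    A linear function a1*z1 + a2*z2 + b1*p1 + b2*p2 is stored as the
    coefficient list [a1, a2, b1, b2]; the convention is [z^i, p_j] = delta^i_j.
    """
    return sum(f[i] * g[N_DOF + i] - f[N_DOF + i] * g[i] for i in range(N_DOF))


def invert_2x2(m):
    """Exact inverse of a 2x2 matrix (illustrative helper)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        # By Theorem 1, a vanishing determinant signals a first-class constraint.
        raise ValueError("constraint matrix is singular")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


def dirac(f, g, chis):
    """Dirac bracket [f, g]_D of eq. [24] for exactly two linear constraints."""
    c = [[poisson(x, y) for y in chis] for x in chis]  # matrix C_{alpha beta}
    c_inv = invert_2x2(c)                              # matrix C^{alpha beta}
    correction = sum(
        poisson(f, chis[a]) * c_inv[a][b] * poisson(chis[b], g)
        for a in range(2) for b in range(2)
    )
    return poisson(f, g) - correction


# Coordinate functions and the (assumed) constraints of the second example.
Z1, Z2 = [1, 0, 0, 0], [0, 1, 0, 0]
P1, P2 = [0, 0, 1, 0], [0, 0, 0, 1]
CHI = [[0, -1, 1, 0],   # chi_1 = p1 - z2
       [0, 0, 0, 1]]    # chi_2 = p2

if __name__ == "__main__":
    print("[chi_2, chi_1] =", poisson(CHI[1], CHI[0]))
    print("[z1, z2]_D     =", dirac(Z1, Z2, CHI))
```

With these assumptions the sketch reproduces $[p_2, p_1 - z^2] = 1$ and $[z^1, z^2]_D = 1$, and the Dirac bracket of each coordinate function with either constraint vanishes, as required by [25]. The dimension count is consistent as well: $2n - s - 2A = 4 - 2 - 0 = 2$.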
A simple estimate yields, if $\varepsilon \in (0, 1)$ is fixed and $c$ is suitably chosen,

\[ |C^{(N)}_{x,\eta}| \le c\, \gamma^{(d-2)N} e^{-m|x-\eta|}, \qquad |C^{(N)}_{x,\eta} - C^{(N)}_{x,\eta'}| \le c\, \gamma^{(d-2)N} \big(\gamma^{N} m |\eta - \eta'|\big)^{\varepsilon} \tag{4} \]

with $\gamma^{(d-2)N}$ interpreted as $N$ if $d = 2$.

The function

\[ \log \frac{Z_N(\Lambda, f)}{Z_N(\Lambda, 0)} \]

of $f$ defines a generating function of a probability distribution $P_{int}$ over the fields on $\Lambda$, which will be called the distribution with $\varphi^4$-interaction regularized on $\Lambda$ and at length scale $m^{-1}\gamma^{-N}$: the integral, in [1],

\[ V_N(\varphi^{(N)}) \overset{\text{def}}{=} -\int_\Lambda \Big( \lambda_N \varphi^{(N)4}_x + \mu_N \varphi^{(N)2}_x + \nu_N + f_x \varphi^{(N)}_x \Big)\, d^d x \tag{5} \]

will be called the interaction potential with external field $f$. The regularization is introduced to guarantee that the integral [1], $\int e^{V_N}\, dP_N$, is well defined if $\lambda_N > 0$. The moments of $P_{int}$ are the functional derivatives of this generating function: they are called Schwinger functions.

The problem [1] can now be made precise: it is to show the existence of $\lambda_N$, $\mu_N$, $\nu_N$ so that the limit

\[ \lim_{N \to \infty} \frac{Z_N(\Lambda, f)}{Z_N(\Lambda, 0)} \]

exists for all $f$ and is not Gaussian, that is, it is not the exponential of a quadratic form in $f$: which

the fields $\varphi^{(N)}_x$ sampled with distribution $P_N$ are rather singular objects. Their properties cannot be described by a single length scale: they are extremely large for large $N$, take independent values only beyond distances of order $m^{-1}$ but, at the same time, they look smooth only on the much smaller scale $m^{-1}\gamma^{-N}$. Their essential feature is that, for fixed $\varepsilon < 1$ (for example, $\varepsilon = 1/2$), with $P_N$-probability 1 there is $B > 0$ such that (interpreting $\gamma^{\frac{d-2}{2}N}$ as $N$ if $d = 2$)

\[ |\varphi^{(N)}_x| \le B\, \gamma^{\frac{d-2}{2}N}, \qquad |\varphi^{(N)}_x - \varphi^{(N)}_\eta| < B\, \gamma^{\frac{d-2}{2}N} \big(\gamma^{N} m |x - \eta|\big)^{\varepsilon/2} \tag{6} \]

and furthermore the probability of the relations in [6] will be $N$-independent, that is, the $\varphi^{(N)}_x$ are bounded and roughly of size $\gamma^{\frac{d-2}{2}N}$ as $N \to \infty$ and, on a very small length scale $m^{-1}\gamma^{-N}$, almost constant.

Substantial control on the field $\varphi^{(N)}_x$ statistically sampled with distribution $P_N$ can be obtained by decomposing it, through [3], into components of various scales: that is, as a sum of statistically mutually independent fields whose properties are entirely characterized by a single scale of length. This means that they have size of order 1 and are independent and smooth on the same length scale.

Assuming the side of $\Lambda$ to be an integer multiple of $m^{-1}$, let $Q_h$ be a pavement of $\Lambda$ into boxes of side $m^{-1}\gamma^{-h}$, imagined hierarchically arranged so that the boxes of $Q_h$ are exactly paved by those of $Q_{h+1}$.

Define $z^{(h)}_x$ to be the random field with propagator $C^{(h)}_{x,\eta}$ with Fourier transform
where jx hj is reinterpreted as the distance where the last product runs over all pairs = (x , h )
between x, h measured over the periodic box h of half-lines of G that are joined and connect two
(hence jx hj differs from the ordinary distance vertices labeled by points x , h : call line of G any
only if the latter is of the order of h L). The such pair. If the graph consists of the single vacuum
interpretation of [10] is that z(h)x
are essentially
bounded variables which, on scale m1 , are
essentially constant and furthermore beyond length
m1 are essentially independently distributed.
For more details, the reader is referred to Wilson Figure 1 The graph elements to representing (N)4
, (N)2 ,
(1970, 1972) and Gallavotti (1981, 1985). a constant (N) .
620 Constructive Quantum Field Theory
R
vertex its value will be N . The series for C(N)
ax
C(N)3
xh
(C(N)
hb
C(N)
xb
) dh. If d = 2, we only
(1=jj) log ZN (, f ) is then need to define N as the first term on the right-hand
Z side (RHS) of [14] and we can leave the subgraphs like
npr
1 X Y the second in Figure 2 as they are (without any
N WG x1 ; . . . ; xnpr dxj 12
jj G j1
renormalization).
Graphs without external lines are called vacuum
and the integral will be called the integrated graph graphs and there are a few such graphs which are
value. divergent. Namely, if d = 3, they are the first three
Suppose first that N = N = 0. Then if a graph G drawn in Figure 3; furthermore, if N is set to the
contains subgraphs like in Figure 2, the correspond- above nonzero value a new vacuum graph, the
ing respective contribution to the integral in [12] fourth in Figure 3, can be formed. Such graphs
(considering only the integrals over h and suitably contribute to the graph value, respectively, the terms
taking care of the combinatorial factors) is a factor in the sum
obtained by integrating over x the quantities Z
N2 4! N4 23 3!3 3
3Cx ;x 2 Cx x dx2
N
6Cax Cxx Cxb
N N 1 1 2 1 2 3!
Z
Z N2 N2 N2 N
42 3! 2 N 13
Cx x Cx x Cx x dx2 dx3 N Cx x 15
N3 N 1 2 2 3 3 1 1 1
or Cax Cxh Chb dh
2!
and diverge, respectively, as 2N , N , N, 2N if d = 3
which if d = 3 diverge as N ! 1 as or, respec- N while, if d = 2, only the first and the last (see [14])
tively, as N; the second factor does not diverge in diverge, like N 2 .
dimension d = 2 while the first still diverges as N. The Therefore, if we fix N as minus the quantity in
divergences arise from the fact that as x h ! 0 the [15] we can disregard graphs like those in Figure 3;
propagator behaves as jx hjN if d = 3 or as if d = 2 N can be defined to be the sum of the first
log jx hj if d = 2, all the way until saturation and last terms in [15].
occurs at distance jx hj m1 N : for this reason The formal series in and f thus obtained is called
the latter divergences are called ultraviolet the renormalized series for the field 4 in
divergences. dimension d = 2 or, respectively, d = 3. Note that
However, if we set N 6 0, then for every graph with the given definitions and choices of N , N the
containing a subgraph like those in Figure 2 there only graphs G that need to be considered to
is another one identical except that the points construct the expansion in and f are formed by
a, b are connected via a mass vertex, see Figure 1, the first and last graph elements in Figure 1, paying
with the vertex in x, by a line ax and a line xb; attention that the graphs in Figure 3 do not
the new graph value receives a contribution from contribute and, if d = 3, the graphs with subgraphs
the mass vertex inserted in x between a and b like the second in Figure 2 have to be computed with
simply given by a factor N . Therefore if we fix, the modification described.
for d = 3, In the next section, it will be shown that the
above are the only sources of divergences as N ! 1
N 42 3! 2 and therefore the problem of studying [1] is solved
N 6Cxx
2 at the level of formal power series by the subtraction
Z
N3 def N in [14]. This also shows that giving a meaning to the
Cxh dh 6Cxx N 14 series thus obtained is likely to be much easier if
d = 2 than if d = 3.
we can simply consider graphs which do not contain The coefficients of order k of the expansion in
any mass graph element and in which there are no of (1=jj) log ZN (, f ) can be ordered by the number
subgraphs like the first in Figure 2 while the subgraphs 2n of vertices
R representing Q external fields: and have
2n
Rlike(N)
the second in Figure 2 do not contribute a factor the form S(k) (
2n 1 x , . . . , x 2n ) i = 1 (fxi dx i ): the kernels
Cax C(N)3
xh
(N)
Chb dh but a renormalized factor (k)
S2n are the Schwinger functions of order 2n, see the
section Euclidean quantum fields.
1
1
1 2 2 1
3
Figure 2 Divergent subgraphs, if d = 3. If d = 2 only the first
diverges. Figure 3 Divergent vacuum graphs.
Remark If d = 4, the regularization at cutoff N in The distinctions between the cases d = 2, 3, 4, >4
[2] is not sufficient as in the subtraction procedure explain the terminology given to the 4 -scalar field
smoothness of the first derivatives of the field theories calling them super-renormalizable if
(N) is necessary, while the regularization [2] does d = 2, 3, renormalizable if d = 4 and nonrenormaliz-
not even imply [6], that is, not even Hölder able if d > 4. Since the (divergent) coefficients in the
continuity. A higher regularization (i.e., using a formal power series defining N , N , N , N are
N like the square of the N in [3]). Furthermore, called counter-terms, the 4 -scalar fields require
the subtractions discussed in the case d = 3 are not finitely many counter-terms (see [14]) in the super-
sufficient to generate a formal power series and renormalizable cases and infinitely many in the
many more subtractions are needed: for instance, renormalizable case. The nonrenormalizable cases
graphs with a subgraph like the one in Figure 4 (d > 4) cannot be treated in a way analogous to the
would give a contribution to the graph value which renormalizable ones.
is a factor For more details, the reader is referred
2 Z to Gallavotti (1985), Aizenman (1982), and
2 def 2 6 2 N2
N Cxh dh Fröhlich (1982).
2!
hold, so that the estimate [16] can be elaborated into For more details, the reader is referred to Hepp
Y (1966), Gallavotti (1985), sections 8 and 16.
I v hv hv0
v>r
18
def d2 d2 e
v d 4 dnv rv nv Asymptotic Freedom (d = 2, 3).
2 2
Heuristic Analysis
where hv0 = k = 0 if v is the first nontrivial node (i.e.,
v0 = root), and an estimate of the integral of the Finiteness to all orders of the perturbation expan-
absolute value of the graphs G with given tree sions is by no means sufficient to prove the existence
structure but different scale labels is proportional to of the ultraviolet limit for ZN (, f ) or for (1=jj)
{hv } I < 1 if (and only if) v > 0, 8v. log ZN (, f ): and a priori it might not even be
But there may be clusters v with only two necessary. For this purpose, the first step is to check
external lines nev = 2 and two graph vertices inside: uniform (upper and lower) boundedness of ZN (, f )
for which v = 0. However, this can happen only if as N ! 1.
d = 3 and in only one case: namely if the graph G The reason behind the validity of a bound
contains a subgraph of the second type in Figure 2 ejjE (, f ) ZN (, f ) ejjE (, f ) with E (, f ) cutoff
and the three intermediate lines form a cluster v of independent has been made very clear after the
scale hv while the other two lines are external to it: introduction of the renormalization group methods
hence on scale h0 > hv . In this case, one has to in field theory. The approach studies the integral
remember that the subtraction in the previous section ZN (, f ), recursively, decomposing the field (N) x
has led to a modification of the contribution of such a into its regular components z(h) x
, see [7], and
subgraph to the value of the graph (integrated over integrating first over z(N) , then over z(N1) and so on.
the position labels of the vertices). As discussed in the The idea emerges naturally if the potential VN in
previous section, the change amounts to replacing the [1] and [4] is written in terms of the normalized
def
variables Xx(N) N(d2)=2 (N)
0
(h0 ) (h0 )
propagator C(h h, b
)
by C h, b
C x, b
. x
, see [6]; here if d = 2
(d2)=2N
This improves, in [18], the estimate of the contribu- the factor is interpreted as N1=2 .
tion Rof the line joining h to b from being proportional The key remark is that as far as the integration
(hv )3 (h0 ) over the small-scale component z(N) is concerned the
to Cxh Chb dh to being proportional to
R (hv )3 (h0 ) 0 field X(N) is a sum of two fields of size of order 1
Cxh (Chb C(h )
) dh; and this changes the con- x
(statistically),
xb
0 R hv
tribution of the line hb from (d2)h to em jxhj
h0 1=2 (h0 ) N N N1
( jx hj) dh because C is regular on scale Xx zN x d2=2 Xx
0
h m1 , see [10] with " = 1=2.
Since x, h are in a cluster of higher scale hv this if d = 2 this becomes
0
means that the estimate is improved by (1=2)(hv h ) .
N 1 N N 11=2 N1
In terms of the final estimate, this means that v in Xx zN Xx
[18] can be improved to v = v 1=2 for the N 1=2 x N 1=2
clusters for which v = 0. Hence, the integrated and it can be considered to be smooth on scale m1 N
value of the graph G (after taking also into account (also statistically). Hence, approximately constant
the integration over the initially selected vertex x1 , and of size of order O(1) on the small cubes of
trivially giving a further factor jj by translation volume dN md of the pavement QN introduced
invariance), and summed over the possible scale before [7]; at the same time it can be considered to
labels is bounded proportionally to jj{hv } I < 1 take (statistically) independent values on different cubes
once the estimate of I is improved as described. of QN . This is suggested by the inequalities [8][10].
Note that the graphs contributing to the perturbation Therefore, it is natural to decompose the potential
series for (1=jj) log ZN (, f ) to order n are finitely VN , see [5], as a sum over the small cubes of volume
many because the number r of external vertices is r dN md of the pavement QN as (see [14] for the
2n 2 (since graphs must be connected). Hence, the definition of N , N ), taking henceforth m = 1,
perturbation series is finite to all orders in . X Z
N def N 4
The above is the renormalizability proof of the VN z Nd
2d2N Xx
scalar 4 -fields in dimension d = 2, 3. The theory is 2QN
where (d2)N is interpreted as N if d = 2. Hence, if divergent when the fields were not properly scaled,
d = 3 it is are in fact of the same order or much smaller than
the main 4 -term.
VN zN Therefore, the integration over z(N) can be, heur-
X Z
def N N 4 N 2 istically, performed by techniques well established
Xx N Xx
2QN in statistical mechanics (i.e., by straightforward
dx perturbation expansions): at least if the field
X(N1)
3 N
N fx 2N Xx 20 x
is smooth and bounded, as prescribed
jj by [6], with B = BN1 growing as a power of N.
where In this case, denoting symbolically the integration
over z(N) by P or by h. . .i, it can be expected that it
def should give
N 6cN 2 N N c0N ;
def Z
N 3c2N 2 N bN 3 N 2N b0N eVN dP zN eVj;N1 Rj;Njj 22
and cN , c0N , bN , b0N , computable from [15] and [14],
admit a limit as N ! 1. While if d = 2 it is where
R Vj; N1 is the Taylor expansion of
log eVN dP(z(N) ) in powers of (hence essentially
VN zN in the very small parameter (4d)N ) truncated at
X Z order j, that is,
def 2 2N N 4 N 2
N Xx N Xx
2QN V1;N1 hVN i1
dx " #2
32 N
2
hVN i hVN i2
N fx N Xx 21 V2;N1 hVN i
jj 2!
"
def 2
hVN i hVN i2
where N = 6cN and N = 3c2N and cN , compu- V3;N1 hVN i
2!
table from [13], admits a limit as N ! 1. #
The fields z(N) and X(N1) can be considered 2
hVN hVN i hVN i2 i hVN hVN
2
i hVN i2 3
; ...
constant over boxes 2 QN : z(N) x
= s , X(N1)
x
= x 3!
for x 2 and the s can be considered statistically 23
independent on the scale of the lattice QN .
j
Therefore, [20] and [21] show that integration over where [] denotes truncation to order j in ,
z(N) in the integral defining ZN (, f ) is not too and R(j, N) is a remainder (depending on (N1)
x
)
different from the computation of a partition func- which can be expected to be estimated, for d = 2, 3, by
tion of a lattice continuous spin model in which the
spins are s and, most important, interact extre- jRj; Nj Rj; N
mely weakly if N is large. In fact, the coupling def 4j
Cj BN N 2 4dN j1 dN 24
constants are of order of a power of jX(N1) j times
O( N ) if d = 3 (O(N 2 2N ) if d = 2), or of order for suitable constants Cj , that is, a remainder
O( N(d2)=2 max jfx j), no matter how large and f. estimated by the (j 1)th power of the coupling
This says that the smallest scale fields are times the number of boxes of scale N in . The
extremely weakly coupled. The fields X(N1) can be relations [22][24] resultR from a naive Taylor
regarded as external fields of size that will be called expansion (in of the log eVN dP(z(N) ), taking into
BN1 , of order 1 or even allowed to grow with a account that, in VN as a function of z(N) , the z(N) s
power of N, see [6]. Their presence in VN does not appear multiplied by quantities at most of size
affect the size of the couplings, as far as the analysis 4d N 2 B3N , by [20] and [21] if jX(N1) j BN1 ).
of the integral over z(N) is concerned, because the In a statistical mechanics model for a lattice spin
couplings remain exponentially small in N, see [20] system, such a calculation of ZN would lead to a
and [21], being at worst multiplied by a power of mean-field equation of state once the remainder was
BN1 , i.e., changed by a factor which is a power of N. neglected.
The smallness of the coupling at small scale is a The peculiarity of field theory is that a relation like
property called asymptotic freedom. Once fields [22] and [24] has to be applied again to Vj; N1 to
and coordinates are correctly scaled, the real size perform the integration over z(N1) and define Vj; N2
of the coupling becomes manifest, that is, it is and, then, again to Vj; N2 . . . . Therefore, it will be
extremely small and the addends in VN proportional essential to perform the integral in [22] to an order
to the counter-terms N , N , which looked (in ) high enough so that the bound R(j, N) can be
summed over N: this requires (see [24]) an explicit The relevant part in d = 2 is simply of the form
calculation of [23] pushed at least to order j = 1 if [21] with h replacing N: call it Vh(rel, 1) . If d = 3, it is
d = 2 or to order j = 3 if d = 3; furthermore it is also given by [20] with h replacing N plus, for h < N, a
necessary to check that the resulting Vj; N1 can still second nonlocal term
be interpreted as low-coupling spin model so that Z
2
[22] can be iterated with N 1 replacing N and then rel;2 def 4 3! 2 h 3 N 3
Vh Chh0 Chh0
with N 2 replacing N 1, . . . . 2! 2!
The first necessary check towards a proof of the 2
h h
h h0 dhdh0
discussed heuristic expectations is that, defining
recursively Vj; h from Vj, h1 for h = N 1, . . . , 1, 0
which is conveniently expressed in terms of a
by [23] with VN replaced by Vj; h1 and Vj; N1
nonlocal field
replaced by Vj; h , the couplings between the variables
z(h) do not become worse than those discussed in h h
h h0
the case h = N. Furthermore, the field (N1)
x
has a h def
Yhh0 1
high probability of satisfying [6], but fluctuations h jh h0 j4
are possible: hence the R-estimate has to be rel rel;1 rel;2
combined with another one dealing with the large as Vh Vh Vh with
fluctuations of X(N1)
x
which has to be shown to be rel;2 def
X Z h2 h
not worse. Vh 2 2h Yhh0 Ahh0
;0 2Q
0
For more details, the reader is referred to Gallavotti h
h0 = 1, . . . , h, then kX(h) k c Bh04 for some c so is N 2 2N (BN 4 ) < 0 and it overwhelmingly
that, by recursive PN application of Lemma 1, dominates on the remaining terms whose value is
ZN (, f ) eVj, 0 h = 1 R (j, h)jj . By the remark at the bounded by a similar expression with a smaller
end of the previous section, given j the lower bound power of N. Then if E c def= =E denotes the comple-
on E just described agrees with the perturbation ment in of a set E :
expansion of E = (1=jj) log ZN (, f ) truncated to Lemma 2 Let d = 2. Define Vh (Dch ) to be given by
order j (in ) up to an error bounded by
P the expression [22] with the integrals extending over
1
h = 1 R (j, h). j =Dh and define R(j, h 1) by [24]. Then
Z
Remark The problem solved by Lemma 1 is c c
usually referred to as the small-field problem, to eVh1 Dh1 dP zh1 eVh Dh R j;h1jj 28
contrast it with the large-field problem discussed
later. The proof of the lemma is a simple Taylor where jR (j, h 1j R (j, h 1 def
= R(j; h 1)
c0 B2 (h1)2
expansion in h if d = 3 or in h2 2h if d = 2 to c e with suitable c , c0 .
order j (in ). The constraint on z(h1) makes the Remark Lemma 2 is genuinely not perturbative
integrations over z(h1) , necessary to compute Vj; h and making essential use of the positivity of .
from Vj; h1 , not Gaussian. But the tail estimates [9], Below the analysis of the proof of the lemma, which
together with the Markov property of the distribu- consists essentially in its reduction to Lemma 1, is
tion of z(h) can be used to estimate the difference described in detail. It is perhaps the most interesting
with respect to the Gaussian unconstrained integra- part and the core of the theory of the proof that
tions of z(h1) : and the result is the addition of the truncating the expansion in of (1=jj) log ZN (, f )
small tail error changing R into R in [27]. The to order j gives as a result an estimate exact to order
estimate of the main part of the remainder R would j1 of (1=jj) log ZN (, f ).
be obvious if the fields z(h) were independent on
boxes of scale h : they are not independent but Let RN be the cubes 2 QN in which there is at
they are Markovian and the estimate can be done by least one point x where jz(N)
x
j BN 2 . By definition,
taking into account the Markov property. the region DN =DN1 is covered by RN .
Remark that in the region DN1 =RN the field
For more details, the reader is referred to Wilson X(N1) is large but zN is not large so that X(N) is still
(1970, 1972), Gallavotti (1978, 1981, 1985), and very large: this is so because the bounds set to define
Benfatto et al. (1978). the regions D and R are quite different being BN 4
and BN 2 , respectively. Hence, if a point is in DN1
and not in RN , then the field X(N) must be of the
Nonperturbative Renormalization: Large order
BN 3 . Therefore, by positivity of the (N)4
x
Fields, Ultraviolet Stability term (which dominates all other terms so that
The small-field estimates are not sufficient to obtain V (N) ((N)
x
) < 0 for x 2 DN [ (DN1 =RN )) we can
ultraviolet stability: to control the cases in which replace VN (DcN ) by V((DN [ (DN1 =RN ))c ), for the
jX(h) (h)
j > Bh4 for some x or some h, or jYxh j > Bh4 for purpose of obtaining an upper bound.
x
h
some jx hj < , a further idea is necessary and it Furthermore, modulo a suitable correction, it is
rests on making use of the assumption that > 0 possible to replace V((DN [ (DN1 =RN ))c ) by
which, in a sense to be determined, should suppress V((DN1 [ RN )c ): because the integrand in VN is
the contribution to the integral defining ZN (, f ) bounded below by
coming from very large values of the field. Assume b 2N N 2
also < 1 for the same reasons advanced in the
section Effective potentials and their scale if d = 2 (by b N if d = 3), for some b, so that the
(in)dependence. points in RN can at most lower V((DN [
Consider first d = 2. Let DN be the large-field (DN1 =RN ))c ) by bN 2 (4d)N #(RN ) if #RN is
region where jX(N)x
j > BN 4 and let VN (=DN ) be the number of boxes of QN in RN and V(x ) is
the integral defining the potential in [21] extended bounded below by its minimum: thus,
to the region =DN , complement of DN . This region
VDN1 [ RN c bN2 4dN #RN
is typically very irregular (and random as X itself is
random with distribution PN ). is an upper bound to V((DN [ (DN1 =RN ))c ).
An upper bound on the integral defining ZN (, f ) In the complement of DN1 [ RN , all fields are
is obtained by simply replacing eVN by eVN (=DN ) small; if X(N1) and RN are fixed this region is not
because in DN the first term in the integrand in [21] random (as a function of z(N) ) any more. Therefore,
if X(N1) , RN are fixed the integration over z(N) , quantity like b0 N 2 (4d)N (BN 4 )4 #(RN ) (because
conditioned to having z(N) fixed (and large) in the the reintroduction occurs in the region RN =DN1
region RN , is performed by means of the same which is covered by RN and in such points the field
argument necessary to prove Lemma 1 (essentially a Xx(N1) is not large, being bounded by B(N 1)4 );
Taylor expansion in (4d)N ). The large size of so that their contribution to the effective potential
z(N) in RN does not affect too much the result is still dominated by the 4 -term and therefore by
because on the boundary of RN the field z(N) is (4d)N times a power of BN 4 times the volume of
BN 2 (recalling that z(N) is continuous) and since RN (in units N , i.e., #RN ). All this is taken care
the variable z(N) is Markovian, the boundary effect of by suitably fixing c00 .
decays exponentially from the boundary @RN : it
Note that the sum over RN of [29] is
adds a quantity that can be shown to be bounded by
the number of boxes in RN on the boundary of RN , 0 2
N4 00
4dN N2 BN4 4 dN jj
hence by #RN , times b0 (N 1)2 (4d) (B(N 1)4 )4 1 c ec B ec
for some b0 .
The result of the integration over z(N) of (because contains jj dN0 cubes of QN ); hence, it is
c B2 N 2
VN ((DN [(DN1 =RN ))c ) c e
e conditioned to the large-field bounded above by e for suitably defined
values of z (N)
in RN leads to an upper bound on
c , c0 .
R V
e N P(dz(N) ) as The same argument can be repeated for Vj; h (Dch )
with any h if Vj; h (Dch ) is defined by the sum over s
X c
eVj;N1 DN1 Rj;Njj in Qh of the same integrals as those in [25] and [26]
RN with j =Dh replacing j in the integration domains.
Y 2 2
#RN Applying Lemma 1 recursively with j 1 (if
0 00
4dN N2 BN 4 4
c ec BN ec 29 d = 3 it would become necessary to take j 3), it
2RN follows that there exist N-independent upper and
lower P bounds E jj on log Z(, f ) of the form
where c, c0 , c00 are suitable constants: this is Vj; 0 1 c0 B2 h2
)jj for c , c0 > 0
h = 1 (R(j, h) c e
explained as follows. suitably chosen and -independent for < 1.
1. Taylor expansion (in ) of the integral By the remark at the end of Sec.6, given j, the
c 2 (4d )N
eVN ((DN1 [RN ) )bN #(RN )
(which, by cons- bounds just described agree with the perturbation
c
truction, is an upper bound on eVN (DN ) ) with expansion E(j, 0)jj Vj; 0 of log Z(, f ) truncated
respect to the field z(N) , conditioned to be fixed toP order j (in ) up to the remainders
and large in RN , would lead to an upper bound as 1 h = 1 R (j, h). Hence, if B is chosen proportional
to log 1 def = log (e 1 ), the upper and lower
c 0 00
BN 4 4 4dN #RN bounds coincide to order j in with the value
eVj; N1 DN1 [RN R j;Njjb
obtained by truncating to order j the perturbative
with R0 equal to [24] possibly with some C0j series.
replacing Cj . The second exponential on the RHS The latter remark is important as it implies
of [29] arises partly from the above correction not only that the bounds are finite (by the
b00 (BN 4 )4 (4d)N #(RN ) and partly from a section Perturbation theory) but also that the
contribution of similar form explained in (3) function (1=jj) log Z(, f ) is not quadratic in f:
below. already to order 1Rin it is quartic in f (containing a
2. Integration over the large conditioning fields term equal to ( Cx, 0 fx dx)4 ).
fixed in RN is controlled by the second estimate The latter property is important as it excludes
in [9] (the tail estimate): the first factors in that the result is a Gaussian generating function.
parentheses in [29] is the tail estimate just Thus, the outline of the proof of Lemma 2, which
mentioned, i.e., the probability that z(N) is large together with Lemma 1 forms the core of the
in the region RN . The second factor is only partly analysis of the ultraviolet stability for d = 2, is
explained in (1) above. completed.
3. Without further estimates, the bound [29] would If d = 3, more care is needed because (very mild)
contain Vj; N1 ((DN1 [ RN )c ) rather than smoothness, like the considered Holder continuity
c
Vj; N1 (DN1 ). Hence, there is the need to change with exponent 1/4, of z, X is necessary to obtain the
the potential Vj;N1 ((DN1 [ RN )c ) by reintrodu- key scale independence property discussed in earlier:
cing the contribution due to the fields in therefore, the natural measure of the size of z(h) and
RN =DN1 in order to reconstruct Vj; N1 (DcN1 ). X(h) in a box 2 Qh is no longer the maximum of
Reintroducing this part of the potential costs a jz(h)
x
j or of jX(h) x
j. The region Dh becomes more
involved as it has to consist of the points $x$ where $|X^{(h)}_x| > B h^4$ and of the pairs $\eta, \eta'$ where

$$|Y_{\eta,\eta'}| \equiv \frac{|X^{(h)}_{\eta} - X^{(h)}_{\eta'}|}{(\gamma^{h}\,|\eta - \eta'|)^{1/4}} > B h^4$$

i.e., it is not just a subset of $\Lambda$.

However, if d = 3, the relevant part also contains the negative term $V^{(\mathrm{rel},2)}$, see [25]: and since it dominates over all other terms which contain a Y-field (because their couplings [25] are smaller by about $\gamma^{-h}$), the argument given for d = 2 can be adapted to the new situation. Two regions $D^1_h$, $D^2_h$ will be defined: the first consists of all the points $x$ where $|X^{(h)}_x| > B h^4$ and the second of all the pairs $\eta, \eta'$ where $|Y^{(h)}_{\eta,\eta'}| > B h^4$. The region $R_h$ will be the collection of all $\Delta \in Q_h$ where $\|z^{(h)}\|_{\Delta} > B h^2$, see [8] with = 0. Then $V(D^c_h)$ will be defined as the sum of the integrals in [25] and [26] with the integrals over $x_i$ further restricted to $x_i \notin D^1_h$ and those over pairs $\eta_i, \eta'_i$ further restricted to $(\eta_i, \eta'_i) \notin D^2_h$. With the new settings, Lemma 2 can be proved also for d = 3 along the same lines as in the d = 2 case.

For more details, the reader is referred to Wilson (1970, 1972), Benfatto et al. (1978), and Gallavotti (1981).

Ultraviolet Limit, Infrared Behavior, and Other Applications

The results on the ultraviolet stability are nonperturbative, as no assumption is made on the size of $\lambda$ (the assumption $\lambda < 1$ has been imposed in the last two sections only to obtain simpler expressions for the $\lambda$-dependence of various constants): nevertheless the multiscale analysis has allowed us to use perturbative techniques (i.e., the Taylor expansion in Lemmata 1, 2) to find the solution. The latter procedure is the essence of the renormalization group methods: they aim at reducing a difficult multiscale problem to a sequence of simple single-scale problems. Of course, in most cases, it is difficult to implement the approach and the scalar quantum fields in dimensions 2, 3 are among the simplest examples. The analysis of the beta function and of the running couplings, which appear in essentially all renormalization group applications, does not play a role here (or, better, their role is so inessential that it has even been possible to avoid mentioning them). This makes the models somewhat special from the renormalization group viewpoint: the running couplings at length scale $h$, if introduced, would tend exponentially to 0 as $h \to \infty$; unlike what happens in the most interesting renormalization group applications in which they either tend to zero only as powers of $h$ or do not tend to zero at all.

The multiscale analysis method, i.e., the renormalization group method, in a form close to the one discussed here has been applied very often since its introduction in physics and it has led to the solution of several important problems. The following is not an exhaustive list and includes a few open questions.

1. The arguments just discussed imply, with minor extra work, that $Z_N(\lambda, f)$ as $N \to \infty$ not only admits uniform upper and lower bounds but also that the limit as $N \to \infty$ actually exists and is a $C^\infty$ function of $\lambda, f$. Its $\lambda$- and $f$-derivatives at $\lambda = 0$ and $f = 0$ are given by the formal perturbation calculation. In some cases, it is even possible to show that the formal series for $Z_N(\lambda, f)$ in powers of $\lambda$ is Borel summable.
2. The problem of removing the infrared cutoff (i.e., the limit $\Lambda \to \infty$) is in a sense more a problem of statistical mechanics. In fact, it can be solved for d = 2, 3 by a typical technique used in statistical mechanics, the cluster expansion. This is not intended to mean that it is technically an easy task: understanding its connection with the low-density expansions and the possibility of using such techniques has been a major achievement that is not discussed here.
3. The third problem mentioned in the introduction, that is, checking the axioms so that the theory could be interpreted as a quantum field theory, is a difficult problem which required important efforts to control and which is not analyzed here. An introduction to it can be its analysis in the d = 2 case.
4. Also the problem of keeping the ultraviolet cutoff and removing the infrared cutoff while the parameter $m^2$ in the propagator approaches 0 is a very interesting problem related to many questions in statistical mechanics at the critical point.
5. Field theory methods can be applied to various statistical mechanics problems away from criticality: particularly interesting is the theory of the neutral Coulomb gas and of the dipole gas in two dimensions.
6. The methods can be applied to Fermi systems in field theory as well as in equilibrium statistical mechanics. The understanding of the ground state in not exactly soluble models of spinless fermions in one dimension at small coupling is one of the results. And via the transfer matrix theory it has led to the understanding of nontrivial critical behavior in two-dimensional models that are not exactly soluble (like Ising next-nearest-neighbor or Ashkin–Teller model). Fermi systems are of particular interest also because in their analysis the large-fields problem is absent, but this great
technical advantage is somewhat offset by the anticommutation properties of the fermionic fields, which do not allow us to employ probabilistic techniques in the estimates.
7. An outstanding open problem is whether the scalar $\varphi^4$-theory is possible and nontrivial in dimension d = 4: this is a case of a renormalizable, not asymptotically free theory. The conjecture that many support is that the theory is necessarily trivial (i.e., the function $Z_N(\lambda, f)$ becomes necessarily a Gaussian in the limit $N \to \infty$). One of the main problems is the choice of the ultraviolet cut-off; unlike the d = 2, 3 cases in which the choice is a matter of convenience, it does not seem that the issue of triviality can be settled without a careful analysis of the choice and of the role of the ultraviolet cut-off.
8. Very interesting problems can be found in the study of highly symmetric quantum fields: gauge invariance presents serious difficulties to be studied (rigorously or even heuristically) because in its naive forms it is incompatible with regularizations. Rigorous treatments have been possible in some cases, and in a few cases it has been shown that the naive treatment is not only not rigorous but leads to incorrect results.
9. In connection with item (8) an outstanding problem is to understand relativistic pure gauge Higgs fields in dimension d = 4: the latter have been shown to be ultraviolet stable but the result has not been followed by the study of the infrared limit.
10. The classical gauge theory problem is quantum electrodynamics, QED, in dimension 4: it is a renormalizable theory (taking into account gauge invariance) and its perturbative series truncated after the first few orders gives results that can be directly confronted with experience, giving very accurate predictions. Nevertheless, the model is widely believed to be incomplete: in the sense that, if treated rigorously, the result would be a field describing free noninteracting assemblies of photons and electrons. It is believed that QED can make sense only if embedded in a model with more fields, representing other particles (e.g., the standard model), which would influence the behavior of the electromagnetic field by providing an effective ultraviolet cutoff high enough for not altering the predictions on the observations on the time and energy scales on which present (and, possibly, future over a long time span) experiments are performed. In dimension d = 3, QED is super-renormalizable, once the gauge symmetry is properly taken into account, and it can be studied with the techniques described above for the scalar fields in the corresponding dimension.

In general, constructive quantum field theory seems to be in a deep crisis: the few solutions that have been found concern very special problems and are very demanding technically; the results obtained have often not been considered to contribute appreciably to any progress. And many consider that the work dedicated to the subject is not worth the results that one can even hope to obtain. Therefore, in recent years, attempts have been made to follow other paths: an attitude that in the past usually did not lead, in general, to great achievements but that is always tempting and worth pursuing because the rare major progresses made in physics resulted precisely from such changes of attitude, leaving aside developments requiring work which was too technical and possibly hopeless: just to mention an important case, one can recall quantum mechanics, which disposed of all attempts at understanding the observed atomic levels quantization on the basis of refined developments of classical electromagnetism.

For more details, the reader is referred to Nelson (1966), Guerra (1972), Glimm et al. (1973), Glimm and Jaffe (1981), Simon (1974), Benfatto et al. (1978, 2003), Aizenman (1982), Gawedzky and Kupiainen (1983, 1985a, b), Balaban (1983), and Giuliani and Mastropietro (2005).

See also: Algebraic Approach to Quantum Field Theory; Axiomatic Quantum Field Theory; Euclidean Field Theory; Integrability and Quantum Field Theory; Perturbation Theory and its Techniques; Quantum Field Theory: A Brief Introduction; Scattering, Asymptotic Completeness and Bound States.

Further Reading

Aizenman M (1982) Geometric analysis of $\varphi^4$-fields and Ising models. Communications in Mathematical Physics 86: 1–48.
Balaban T (1983) (Higgs)$_{3,2}$ quantum fields in a finite volume. III. Renormalization. Communications in Mathematical Physics 88: 411–445.
Benfatto G, Cassandro M, Gallavotti G et al. (1978) Some probabilistic techniques in field theory. Communications in Mathematical Physics 59: 143–166.
Benfatto G, Cassandro M, Gallavotti G et al. (1980) Ultraviolet stability in Euclidean scalar field theories. Communications in Mathematical Physics 71: 95–130.
Benfatto G and Gallavotti G (1995) Renormalization Group, pp. 1–143. Princeton: Princeton University Press.
Benfatto G, Giuliani A, and Mastropietro V (2003) Low temperature analysis of two dimensional Fermi systems with symmetric Fermi surface. Annales Henri Poincaré 4: 137–193.
De Calan C and Rivasseau V (1981) Local existence of the Borel transform in Euclidean $\varphi^4_4$. Communications in Mathematical Physics 82: 69–100.
Fröhlich J (1982) On the triviality of $\lambda\varphi^4_d$ theories and the approach to the critical point in $d \ge 4$ dimensions. Nuclear Physics B 200: 281–296.
Gallavotti G (1978) Some aspects of renormalization problems in statistical mechanics. Memorie dell'Accademia dei Lincei 15: 23–59.
Gallavotti G (1981) Elliptic operators and Gaussian processes. In: Aspects Statistiques et Aspects Physiques des Processus Gaussiens, pp. 349–360. Colloques Internat. C.N.R.S, St. Flour. Publications du CNRS, Paris.
Gallavotti G (1985) Renormalization theory and ultraviolet stability via renormalization group methods. Reviews of Modern Physics 57: 471–569.
Gawedzky K and Kupiainen A (1983) Block spin renormalization group for dipole gas and $(\partial\varphi)^4$. Annals of Physics 147: 198–243.
Gawedzky K and Kupiainen A (1985a) Gross–Neveu model through convergent perturbation expansion. Communications in Mathematical Physics 102: 1–30.
Gawedzky K and Kupiainen A (1985b) Massless lattice $\varphi^4_4$ theory: rigorous control of a renormalizable asymptotically free model. Communications in Mathematical Physics 99: 197–252.
Giuliani A and Mastropietro V (2005) Anomalous universality in the anisotropic Ashkin–Teller model. Communications in Mathematical Physics 256: 681–735.
Glimm J, Jaffe A, and Spencer T (1973) In: Velo G and Wightman A (eds.) Constructive Field theory, Lecture Notes in Physics, vol. 25, pp. 132–242. New York: Springer.
Glimm J and Jaffe A (1981) Quantum Physics. Springer.
Guerra F (1972) Uniqueness of the vacuum energy density and Van Hove phenomena in the infinite volume limit for two-dimensional self-coupled Bose fields. Physical Review Letters 28: 1213–1215.
Hepp K (1966) Théorie de la renormalisation. Lecture Notes in Physics, vol. 2. Heidelberg: Springer.
Nelson E (1966) A quartic interaction in two dimensions. In: Goodman R and Segal I (eds.) Mathematical Theory of Elementary Particles, pp. 69–73. Cambridge: M.I.T.
Osterwalder K and Schrader R (1973) Axioms for Euclidean Green's functions. Communications in Mathematical Physics 31: 83–112.
Simon B (1974) The $P(\varphi)_2$ Euclidean (Quantum) Field Theory. Princeton: Princeton University Press.
Streater RF and Wightman AS (1964) PCT, Spin, Statistics and All That. Benjamin-Cummings (reprinted Princeton University Press, 2000).
Wightman AS and Gårding L (1965) Fields as operator-valued distributions in relativistic quantum theory. Arkiv för Fysik 28: 129–189.
Wilson KG (1970) Model of coupling constant renormalization. Physical Review D 2: 1438–1472.
Wilson KG (1972) Renormalization of a scalar field in strong coupling. Physical Review D 6: 419–426.
Contact Manifolds
J B Etnyre, University of Pennsylvania, Philadelphia, PA, USA
© 2006 Elsevier Ltd. All rights reserved.

(e.g., thermodynamics, fluid dynamics, holomorphic curves, and open book decompositions) are provided in the Further reading section.
$\xi_1$ on manifolds $M_0$ and $M_1$, respectively, are contactomorphic if there is a diffeomorphism $f : M_0 \to M_1$ such that $f_*(\xi_0) = \xi_1$. All contact structures are locally contactomorphic. In particular, we have the following theorem.

Theorem 1 (Darboux's Theorem). Suppose $\xi_i$ is a contact structure on the manifold $M_i$, $i = 0, 1$, and $M_0$ and $M_1$ have the same dimension. Given any points $p_0$ and $p_1$ in $M_0$ and $M_1$, respectively, there are neighborhoods $N_i$ of $p_i$ in $M_i$ and a contactomorphism from $(N_0, \xi_0|_{N_0})$ to $(N_1, \xi_1|_{N_1})$. Moreover, if $\alpha_i$ is a contact form for $\xi_i$ near $p_i$, then the contactomorphism can be chosen to pull $\alpha_1$ back to $\alpha_0$.

Thus, locally all contact structures (and contact forms!) look like the one given in Example 1 above. Furthermore, contact structures are "local in time." That is, compact deformations of contact structures do not produce new contact structures.

Theorem 2 (Gray's theorem). Let $M$ be an oriented $(2n+1)$-dimensional manifold and $\xi_t$, $t \in [0, 1]$, a family of contact structures on $M$ that agree off of some compact subset of $M$. Then there is a family of diffeomorphisms $\phi_t : M \to M$ such that $(\phi_t)_* \xi_t = \xi_0$.

In particular, on a compact manifold, all deformations of contact structures come from diffeomorphisms of the underlying manifold. The theorem is not true if the contact structures do not agree off of a compact set. For example, there is a one-parameter family of noncontactomorphic contact structures on $S^1 \times \mathbb{R}^2$.

Existence and Classification

The existence of contact structures on closed odd-dimensional manifolds is quite difficult. However, Gromov has shown that contact structures on open manifolds obey an h-principle. To explain this, we note that if $(M^{2n+1}, \xi)$ is a co-oriented contact manifold then the tangent bundle of $M$ can be written as $\xi \oplus \mathbb{R}$ and thus the structure group of $TM$ can be reduced to $U(n)$ (since $\xi$ has a conformal symplectic structure on it). Such a reduction of the structure group is called an almost contact structure on $M$. Clearly, a contact structure on $M$ induces an almost contact structure. If $M$ is an open manifold, Gromov proved that the inclusion of the space of co-oriented contact structures on $M$ into the space of almost contact structures on $M$ is a weak homotopy equivalence. In particular, if an open manifold meets the necessary algebraic condition for the existence of an almost contact structure, then the manifold has a co-oriented contact structure.

Lutz and Martinet proved a similar, but weaker, result for oriented closed 3-manifolds. More specifically, every closed oriented 3-manifold admits a co-oriented contact structure and in fact has at least one for every homotopy class of plane field. There has been much progress on classifying contact structures on 3-manifolds and here an interesting dichotomy has appeared. Contact structures break into one of two types: tight or overtwisted. Overtwisted contact structures obey an h-principle and are in general easy to understand. Tight contact structures have a more subtle, geometric nature. In higher dimensions there is much less known about the existence (or classification) of contact structures.

Relations with Symplectic Geometry

Let $(X, \omega)$ be a symplectic manifold. A vector field $v$ satisfying

$$L_v \omega = \omega \qquad [4]$$

(where $L_v \omega$ is the Lie derivative of $\omega$ in the direction of $v$) is called a symplectic dilation. A compact hypersurface $M$ in $(X, \omega)$ is said to have contact type if there exists a symplectic dilation $v$ in a neighborhood of $M$ that is transverse to $M$. Given a hypersurface $M$ in $(X, \omega)$, the characteristic line field $LM$ in the tangent bundle of $M$ is the symplectic complement of $TM$ in $TX$. (Since $M$ is codimension 1, it is coisotropic; thus, the symplectic complement lies in $TM$ and is one dimensional.)

Theorem 3 Let $M$ be a compact hypersurface in a symplectic manifold $(X, \omega)$ and denote the inclusion map $i : M \to X$. Then $M$ has contact type if and only if there exists a 1-form $\alpha$ on $M$ such that $d\alpha = i^*\omega$ and the form $\alpha$ is never zero on the characteristic line field.

If $M$ is a hypersurface of contact type, then the 1-form $\alpha$ is obtained by contracting the symplectic dilation $v$ into the symplectic form: $\alpha = \iota_v \omega$. It is easy to verify that the 1-form $\alpha$ is a contact form on $M$. Thus, a hypersurface of contact type in a symplectic manifold inherits a co-oriented contact structure.

Given a co-orientable contact manifold $(M, \xi)$, its symplectization $\mathrm{Symp}(M, \xi) = (X, \omega)$ is constructed as follows. The manifold $X = M \times (0, \infty)$, and given a global contact form $\alpha$ for $\xi$ the symplectic form is $\omega = d(t\alpha)$, where $t$ is the coordinate on $\mathbb{R}$. (The symplectization is also equivalently defined as $(M \times \mathbb{R}, d(e^t \alpha))$.)

Example 6 The symplectization of the standard contact structure on the unit cotangent bundle
(see Example 3) is the standard symplectic structure on the complement of the zero section in the cotangent bundle.

The symplectization is independent of the choice of contact form $\alpha$. To see this, fix a co-orientation for $\xi$ and note that the manifold $X$ can be identified (in many ways) with the sub-bundle of $T^*M$ whose fiber over $x \in M$ is

$$\{\beta \in T^*_x M : \beta(\xi_x) = 0 \text{ and } \beta > 0 \text{ on vectors positively transverse to } \xi_x\} \qquad [5]$$

and restricting $d\lambda$ to this subspace yields a symplectic form $\omega$, where $\lambda$ is the Liouville form on $T^*M$ defined in Example 2. A choice of contact form $\alpha$ fixes an identification of $X$ with the sub-bundle of $T^*M$ under which $d(t\alpha)$ is taken to $d\lambda$.

The vector field $v = t\,\partial/\partial t$ on $(X, \omega)$ is a symplectic dilation that is transverse to $M \times \{1\} \subset X$. Clearly, $\iota_v \omega|_{M \times \{1\}} = \alpha$. Thus, we see that any co-orientable contact manifold can be realized as a hypersurface of contact type in a symplectic manifold. In summary, we have the following theorem.

Theorem 4 If $(M, \xi)$ is a co-oriented contact manifold, then there is a symplectic manifold $\mathrm{Symp}(M, \xi)$ in which $M$ sits as a hypersurface of contact type. Moreover, any contact form $\alpha$ for $\xi$ gives an embedding of $M$ into $\mathrm{Symp}(M, \xi)$ that realizes $M$ as a hypersurface of contact type.

We also note that all the hypersurfaces of contact type in $(X, \omega)$ look locally, in $X$, like a contact manifold sitting inside its symplectization.

Theorem 5 Given a compact hypersurface $M$ of contact type in a symplectic manifold $(X, \omega)$ with the symplectic dilation given by $v$, there is a neighborhood of $M$ in $X$ symplectomorphic to a neighborhood of $M \times \{1\}$ in $\mathrm{Symp}(M, \xi)$ where the symplectization is identified with $M \times (0, \infty)$ using the contact form $\alpha = \iota_v \omega|_M$ and $\xi = \ker \alpha$.

The Reeb Vector Field and Riemannian Geometry

Let $(M, \xi)$ be a contact manifold. Associated to a contact form $\alpha$ for $\xi$ is the Reeb vector field $v_\alpha$. This is the unique vector field satisfying

$$\iota_{v_\alpha} \alpha = 1 \quad \text{and} \quad \iota_{v_\alpha} d\alpha = 0 \qquad [6]$$

One may readily check that $v_\alpha$ is transverse to the contact hyperplanes and the flow of $v_\alpha$ preserves $\xi$ (in fact, it preserves $\alpha$). These two conditions characterize Reeb vector fields; that is, a vector field $v$ is the Reeb vector field for some contact form for $\xi$ if and only if it is transverse to $\xi$ and its flow preserves $\xi$.

The fundamental question concerning a Reeb vector field asks if its flow has a (contractible) periodic orbit. A paraphrasing of the Weinstein conjecture asserts a positive answer to this question. Most progress on this conjecture has been made in dimension 3, where H Hofer has proved the existence of periodic orbits for all Reeb fields on $S^3$ and on 3-manifolds with essential spheres (i.e., embedded $S^2$s that do not bound a 3-ball in the manifold). Relations with Hamiltonian dynamics are discussed below.

Recall, from Example 3, that a Riemannian metric $g$ on a manifold $M$ provides an identification of the (oriented) projectivized cotangent bundle $P^*M$ with the unit cotangent bundle. Considered as a subset of $T^*M$, $P^*M$ inherits not only a contact structure but also a contact form (by restricting the Liouville form). Let $v$ be the associated Reeb vector field. The metric $g$ also provides an identification of the tangent and cotangent bundles of $M$. Thus, $P^*M$ may be considered as the unit tangent bundle of $M$. Let $w_g$ be the vector field on the unit tangent bundle generating the geodesic flow on $M$.

Theorem 6 The Reeb vector field $v$ is identified with the geodesic flow field $w_g$ when $P^*M$ is identified with the unit tangent space using the metric $g$.

Relations with Complex Geometry and Analysis

Let $X$ be a complex manifold with boundary and denote the induced complex structure on $TX$ by $J$. The complex tangencies $\xi$ to $M = \partial X$ are described by the equation $d\rho \circ J = 0$, where $\rho$ is a function defined in a neighborhood of the boundary such that 0 is a regular value and $\rho^{-1}(0) = M$. The form $L(v, w) = -d(d\rho \circ J)(v, Jw)$, for $v, w \in \xi$, is called the Levi form, and when $L(v, w)$ is positive (negative) definite, then $X$ is said to have strictly pseudoconvex (pseudoconcave) boundary. The hyperplane field $\xi$ will be a contact structure if and only if $d(d\rho \circ J)$ is a nondegenerate 2-form on $\xi$ (if and only if $L(v, w)$ is definite). A well-studied source of examples comes from Stein manifolds.

Example 7 Let $X$ be a complex manifold and again let $J$ denote the induced complex structure on $TX$. From a function $\phi : X \to \mathbb{R}$, we can define a 2-form $\omega = -d(d\phi \circ J)$ and a symmetric form $g(v, w) = \omega(v, Jw)$. If this symmetric form is positive definite, the function $\phi$ is called strictly plurisubharmonic. The manifold $X$ is a Stein manifold if $X$
admits a proper strictly plurisubharmonic function Weinsteins conjecture asserts a positive answer to
: X ! R. An important result says that X is Stein the questions: Does the Hamiltonian flow along a
if and only if it can be realized as a closed complex regular level set of contact type have a periodic
submanifold of C n . Clearly any noncritical level set orbit? Viterbo proved that the answer was yes if the
of gives a contact manifold. hypersurface is compact and in (R 2n , ! = d). Other
progress has been made by studying Reeb dynamics.
Contact manifolds also give rise to an interesting
class of differential operators. Specifically, a contact
structure on M defines a symbol-filtered algebra of pseudodifferential operators $\Psi_H^*(M)$, called the Heisenberg calculus. Operators in this algebra are modeled on smooth families of convolution operators on the Heisenberg group. An important class of operators of this type are the sum-of-squares operators. Locally, the highest-order part of such an operator takes the form

$$L = \sum_{j=1}^{2n} v_j^2 + i a v \qquad [7]$$

where $\{v_1, \ldots, v_{2n}\}$ is a local framing for the contact field and v is a Reeb vector field. This operator belongs to $\Psi_H^2(M)$ and is subelliptic for a outside a discrete set.

Hamiltonian Dynamics

Given a symplectic manifold $(X, \omega)$, a function $H : X \to R$ will be called a Hamiltonian. (Only autonomous Hamiltonians are discussed here.) The unique vector field satisfying

$$\iota_{v_H} \omega = -dH$$

is called the Hamiltonian vector field associated to H. Many problems in classical mechanics can be formulated in terms of studying the flow of $v_H$ for various H.

Example 8 If $(X, \omega) = (R^{2n}, d\lambda)$, where $\lambda$ is from Example 2, then the flow of the Hamiltonian vector field is given by

$$\dot q = \frac{\partial H}{\partial p}, \qquad \dot p = -\frac{\partial H}{\partial q}$$

A standard fact says that the flow of $v_H$ preserves the level sets of H.

Theorem 7 If M is a level set of H corresponding to a regular value and M is a hypersurface of contact type, then the trajectories of $v_H$ and of the Reeb vector field (associated to M in Theorem 3) agree.

Thus under suitable hypotheses, Hamiltonian dynamics is a reparametrization of Reeb dynamics. In particular, searching for periodic orbits in such a Hamiltonian system is equivalent to searching for periodic orbits in a Reeb flow. Thus in this context,

Geometric Optics

In this section, we study the propagation of light (or various other disturbances) in a medium (for the moment, we do not specify the properties of this medium). The medium will be given by a three-dimensional manifold M. Given a point p in M and t > 0, let $I_p(t)$ be the set of all points to which light can travel in time $\le t$. The wave front of p at time t is the boundary of this set and is denoted as $\Phi_p(t) = \partial I_p(t)$.

Theorem 8 (Huygens' principle). $\Phi_p(t + t')$ is the envelope of the wave fronts $\Phi_q(t')$ for all $q \in \Phi_p(t)$.

This is best understood in terms of contact geometry. Let $\pi : (T^*M \setminus \{0\}) \to P^*M$ be the natural projection (see Example 3) and let S be any smooth sub-bundle of $T^*M \setminus \{0\}$ that is transverse to the radial vector field in each fiber and for which $\pi|_S : S \to P^*M$ is a diffeomorphism. The restriction of the Liouville form to S gives a contact form $\alpha$ and a corresponding Reeb vector field v. Given a subset F of M with a well-defined tangent space at every point, set

$$L_F = \{ p \in S : \pi(p) \in F \text{ and } p(w) = 0 \text{ for all } w \in T_{\pi(p)} F \} \qquad [8]$$

The set $L_F$ is a Legendrian submanifold of S and is called the Legendrian lift of F. If L is a generic Legendrian submanifold in S, then $\pi(L)$ is called the front projection of L and $L_{\pi(L)} = L$. Given a Legendrian submanifold L, let $\Psi_t(L)$ be the Legendrian submanifold obtained from L by flowing along v for time t.

Example 9 Given a metric g on M, Fermat's principle says that light travels along geodesics. Thus, if S is the unit cotangent bundle, then using g to identify the geodesic flow with the Reeb flow one sees that light will travel along trajectories of the Reeb vector field. Given a point p in M, the Legendrian submanifold $L_p$ is a sphere sitting in $T_p^*M$. The Huygens principle follows from the observation that $\Phi_p(t) = \pi(\Psi_t(L_p))$.

Using the more general S discussed above, one can generalize this example to light traveling in a medium that is nonhomogeneous (i.e., the speed differs from point to point in M) and anisotropic (i.e., the speed differs depending on the direction of travel).
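The statement that the flow of the Hamiltonian vector field preserves the level sets of H can be checked numerically. The sketch below is only an illustration under stated assumptions: the pendulum Hamiltonian H(q, p) = p^2/2 - cos q, the Runge-Kutta integrator, the step count, and all function names are choices made here and do not come from the article.

```python
import math

def hamiltonian(q, p):
    # Assumed example: planar pendulum, H = p^2/2 - cos(q)
    return 0.5 * p * p - math.cos(q)

def vector_field(q, p):
    # Hamilton's equations: q' = dH/dp, p' = -dH/dq
    return p, -math.sin(q)

def rk4_step(q, p, h):
    # One classical fourth-order Runge-Kutta step of size h
    k1q, k1p = vector_field(q, p)
    k2q, k2p = vector_field(q + 0.5 * h * k1q, p + 0.5 * h * k1p)
    k3q, k3p = vector_field(q + 0.5 * h * k2q, p + 0.5 * h * k2p)
    k4q, k4p = vector_field(q + h * k3q, p + h * k3p)
    q += h * (k1q + 2 * k2q + 2 * k3q + k4q) / 6
    p += h * (k1p + 2 * k2p + 2 * k3p + k4p) / 6
    return q, p

def flow(q, p, t, steps=10_000):
    # Approximate the time-t flow of the Hamiltonian vector field
    h = t / steps
    for _ in range(steps):
        q, p = rk4_step(q, p, h)
    return q, p

q0, p0 = 1.0, 0.0
q1, p1 = flow(q0, p0, 10.0)
# H should be constant along the trajectory, up to integration error
energy_drift = abs(hamiltonian(q1, p1) - hamiltonian(q0, p0))
print(energy_drift)
```

The drift is not exactly zero because the integrator is not symplectic; it only shrinks with the step size.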
636 Control Problems in Mathematical Physics
See also: Hamiltonian Fluid Dynamics; Integrable Systems and Recursion Operators on Symplectic and Jacobi Manifolds; Minimax Principle in the Calculus of Variations.
It is easy to check that the feedback control $u(y_1, y_2) = -y_1 - y_2$ stabilizes the system asymptotically to the origin, that is, for every initial data $(\bar y_1, \bar y_2)$, the solution of the corresponding Cauchy problem satisfies $\lim_{t \to \infty} (y_1, y_2)(t) = (0, 0)$.

Another simple problem consists in driving the point to the origin with zero velocity in minimum time from given initial data. It is quite easy to see that the optimal strategy is to accelerate towards the

Various problems can be formulated in terms of reachable sets; for example, controllability requires that for every y the union of all R(t; y) as $t \to \infty$ includes the entire space. The dependence of R(t; y) on time t and on the set of controls U is also a subject of investigation: one may ask whether the same points in R(t; y) can be reached by using controls which are piecewise constant, or take values within some subsets of U.
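The stabilizing effect of the feedback mentioned above can be illustrated numerically. The following is a minimal sketch, and it rests on an assumption: the controlled system is not restated in this passage, so the code takes it to be a point mass, y1' = y2, y2' = u, with the feedback read as u = -y1 - y2. The integrator, the initial data and the function names are likewise illustrative choices.

```python
import math

def closed_loop(y1, y2):
    # Assumed dynamics y1' = y2, y2' = u with feedback u = -y1 - y2
    u = -y1 - y2
    return y2, u

def rk4_step(y1, y2, h):
    # One classical fourth-order Runge-Kutta step of size h
    a1, b1 = closed_loop(y1, y2)
    a2, b2 = closed_loop(y1 + 0.5 * h * a1, y2 + 0.5 * h * b1)
    a3, b3 = closed_loop(y1 + 0.5 * h * a2, y2 + 0.5 * h * b2)
    a4, b4 = closed_loop(y1 + h * a3, y2 + h * b3)
    y1 += h * (a1 + 2 * a2 + 2 * a3 + a4) / 6
    y2 += h * (b1 + 2 * b2 + 2 * b3 + b4) / 6
    return y1, y2

def simulate(y1, y2, t_end, steps):
    # Integrate the closed-loop system from (y1, y2) up to time t_end
    h = t_end / steps
    for _ in range(steps):
        y1, y2 = rk4_step(y1, y2, h)
    return y1, y2

initial = (3.0, -2.0)
final = simulate(*initial, t_end=40.0, steps=40_000)
initial_norm = math.hypot(*initial)
final_norm = math.hypot(*final)
print(initial_norm, final_norm)
```

Under this assumption the closed-loop matrix has eigenvalues with real part -1/2, so the distance to the origin decays roughly like exp(-t/2).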
Control of ODEs

For most proofs we refer to Agrachev and Sachkov (2004) and Sontag (1998).

Controllability

Consider first the case of a linear system:

$$\dot y = Ay + Bu, \qquad u \in U, \qquad y(0) = y_0 \qquad [8]$$

where $y, y_0 \in R^n$, $U \subset R^m$, A is an $n \times n$ matrix and B an $n \times m$ matrix. We have the following property of reachable sets:

Theorem 1 If U is compact convex then the reachable set R(t) for [8] is compact and convex.

A control system [8] is controllable if taking $U = R^m$ we have $R(t) = R^n$ for every t > 0. By linearity, this is equivalent to requiring the reachable set to be a neighborhood of the origin in case of bounded controls. Define the controllability matrix to be the $n \times nm$ matrix

$$C(A, B) = (B, AB, \ldots, A^{n-1}B)$$

Controllability is characterized by the following:

Theorem 2 (Kalman controllability theorem). The linear system [8] is controllable if and only if rank(C(A, B)) = n.

For linear systems, there exists a duality between controllability and observability in the sense of the following theorem:

Theorem 3 Consider the linear control system [8] and assume that we observe the variable z(y) = Cy for some $p \times n$ matrix C. Then, observability holds if and only if the linear system $\dot y = A^t y + C^t v$ is controllable.

There exists no characterization of controllability for nonlinear systems as for linear ones, but we have the linearization result:

Theorem 4 A nonlinear system is locally controllable if its linearization is. The converse is false.

There are many results for the important class of control-affine systems

$$\dot y = f_0(y) + \sum_{i=1}^{m} f_i(y) u_i \qquad [9]$$

where $f_0, \ldots, f_m$ are smooth vector fields on $R^n$ and $U = R^m$. In general, there exists no explicit representation for the trajectories of [9] in terms of integrals of the control, as happens for linear systems. Still, a rich mathematical theory has been developed applying techniques and ideas from differential geometry: the so-called geometric control theory. The main idea is that controllability (and properties of optimal trajectories) is determined by the Lie algebra generated by the vector fields $f_i$. For example:

Theorem 5 (Lie-algebraic rank condition). Let L be the Lie algebra generated by the vector fields $f_i$, i = 1, ..., m, and assume $f_0 = 0$. If L(y) is of dimension n at every point y then the system is controllable.

We refer to Agrachev and Sachkov (2004) and Jurdjevic (1997) for a general presentation of geometric control theory and give a simple example to show how Lie brackets characterize reachable directions.

Example 3 Consider the Brockett integrator

$$\dot y_1 = u_1, \qquad \dot y_2 = u_2, \qquad \dot y_3 = u_1 y_2 - u_2 y_1$$

Starting from the origin, using constant controls, we can move along curves tangent to the $y_1 y_2$ plane. However, let $f_1 = (1, 0, y_2)$ and $f_2 = (0, 1, -y_1)$ (fields corresponding to constant controls); then their Lie bracket is given by

$$[f_1, f_2](0) = (Df_2 \cdot f_1 - Df_1 \cdot f_2)(0) = (0, 0, -2)$$

Moving for time t first along the integral curve of $f_1$, then of $f_2$, then of $-f_1$, and finally of $-f_2$, we reach a point $t^2 [f_1, f_2](0) + o(t^2)$ along the vertical direction $y_3$. This corresponds to saying that the system satisfies the Lie-algebraic rank condition (LARC).

Optimal Control

The theory of optimal control has developed in three main directions:

1. Existence of optimal controls, under various assumptions on L, f, U. When the sets F(t, y) are convex, optimal solutions can be constructed following the direct method of Tonelli for the calculus of variations, that is, as limits of minimizing sequences: the two main ingredients are compactness and lower-semicontinuity. If convexity does not hold, existence is not guaranteed in general, but only in special cases.

2. Necessary conditions for the optimality of a control $u(\cdot)$. The major result in this direction is the celebrated Pontryagin maximum principle (PMP), which extends to control systems the Euler–Lagrange equation and the Weierstrass necessary conditions for a strong local minimum in the calculus of variations. Various extensions and other necessary conditions are now available (Agrachev and Sachkov 2004).

3. Sufficient conditions for optimality. The standard procedure resorts to embedding the optimal control problem in a family of problems, obtained by
varying the initial conditions. One defines the value function V by

$$V(t, \bar y) = \inf_u J(y, u)$$

where the inf is taken over the set of trajectories and controls satisfying $y(t) = \bar y$. Under suitable assumptions, V is the solution to a first-order Hamilton–Jacobi PDE. The lack of regularity of the value function V long presented a major obstacle to a rigorous mathematical analysis; it was resolved by the theory of viscosity solutions (Bardi and Capuzzo Dolcetta 1997). Another method consists in building an optimal synthesis, that is, a collection of trajectory–control pairs.

Pontryagin maximum principle Consider a general autonomous control system:

$$\dot y = f(y, u) \qquad [10]$$

where $y \in R^n$ and $u \in U$, a compact subset of $R^m$. We assume that f is regular enough to guarantee existence and uniqueness of trajectories for every $u(\cdot) \in \mathcal{U}$. For a fixed T > 0, an optimal control problem in Mayer form is given by

$$\min_{u \in \mathcal{U}} \psi(y(T, u)), \qquad y(0) = \bar y \qquad [11]$$

where $\psi$ is the final cost and $\bar y$ the initial condition. More generally, one can also consider the Lagrangian cost $\int L(y, u)\,dt$ and reduce to this case by adding a variable with $y_0(0) = 0$ and $\dot y_0 = L$.

The well-known PMP provides, under suitable assumptions, a necessary condition for optimality in terms of a lift of the candidate optimal trajectory to the cotangent bundle. For problems such as [11], PMP can be stated as follows:

Theorem 6 Let $u^*(\cdot)$ be a (bounded) admissible control whose corresponding trajectory $y^*(\cdot) = y(\cdot, u^*)$ is optimal. Call $p : [0, T] \to R^n$ the solution of the adjoint linear equation

$$\dot p(t) = -p(t) \cdot D_y f(y^*(t), u^*(t)), \qquad p(T) = -\nabla\psi(y^*(T)) \qquad [12]$$

Then the maximality condition

$$p(t) \cdot f(y^*(t), u^*(t)) = \max_{\omega \in U} p(t) \cdot f(y^*(t), \omega) \qquad [13]$$

holds for almost every time $t \in [0, T]$.

Notice that the conclusion of the theorem can be interpreted by saying that the pair $(y^*, p)$ satisfies the system:

$$\dot y = \frac{\partial H(y^*, p, u^*)}{\partial p}, \qquad \dot p = -\frac{\partial H(y^*, p, u^*)}{\partial y}$$

where $H(y, p, u) = \langle p, f(y, u) \rangle$. This is a pseudo-Hamiltonian system, since H also depends on $u^*$. Alternatively, one can define the maximized Hamiltonian

$$H(y, p) = \max_u \langle p, f(y, u) \rangle$$

but H may fail to be smooth. Another difficulty lies in the fact that an initial condition is given for y and a final condition is given for p.

The proof of PMP relies on a special type of variations, called needle variations, of a reference trajectory. Given a candidate optimal control $u^*$ and corresponding trajectory $y^*$, a time $\tau$ of approximate continuity for $f(y^*(\cdot), u^*(\cdot))$ and $\omega \in U$, a needle variation is a family of controls $u_\varepsilon$ obtained by replacing $u^*$ with $\omega$ on the interval $[\tau - \varepsilon, \tau]$. A needle variation gives rise to a variation v of the trajectory satisfying the variational equation

$$\dot v(t) = D_y f(y^*(t), u^*(t)) \cdot v(t) \qquad [14]$$

in the classical sense only after time $\tau$. Recently Piccoli and Sussmann (2000) introduced a setting in which needle and other variations happen to be differentiable.

One may also consider some final (or initial) constraint:

$$(T, y(T)) \in S \qquad [15]$$

where $S \subset R \times R^n$ (and T not fixed). In this case, the final condition for p is more complicated, as is the proof of PMP. It is interesting to note the many connections between PMP and the classical mechanics framework, well illustrated by Bloch (2003) and Jurdjevic (1997).

Value function and HJB equation In this section we consider the minimization problem

$$\inf_{u \in \mathcal{U}} \psi(T, y(T, u)) \qquad [16]$$

for the control system

$$\dot y = f(t, y, u), \qquad u(t) \in U \ \text{a.e.} \qquad [17]$$

subject to the terminal constraints [15], where $S \subset R^{n+1}$ is a closed target set.

Theorem 7 (PDE of dynamic programming). Assume that the value function V, for [15]–[17], is $C^1$ on some open set $\Omega \subset R \times R^n$, not intersecting the target set S. Then V satisfies the Hamilton–Jacobi equation

$$V_s(s, y) + \min_{\omega \in U} V_y(s, y) \cdot f(s, y, \omega) = 0 \qquad \forall (s, y) \in \Omega \qquad [18]$$

Equation [18] is called the Hamilton–Jacobi–Bellman (HJB) equation, after Richard Bellman. In general,
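Two of the finite-dimensional controllability statements from the section "Control of ODEs" lend themselves to a quick numerical check: the Kalman rank condition of Theorem 2 and the Lie bracket computation for the Brockett integrator of Example 3. The sketch below is illustrative only. The sample matrices, the helper names and the exact-rational rank routine are assumptions introduced here, and the sign conventions f1 = (1, 0, y2), f2 = (0, 1, -y1) and [f1, f2] = Df2·f1 - Df1·f2 are the ones assumed in this rendering of the example.

```python
from fractions import Fraction

def mat_mul(A, B):
    # Product of two matrices given as lists of rows
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def rank(M):
    # Rank by Gaussian elimination over exact rationals
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def controllability_matrix(A, B):
    # C(A, B) = (B, AB, ..., A^(n-1) B), blocks placed side by side
    n = len(A)
    blocks, block = [], B
    for _ in range(n):
        blocks.append(block)
        block = mat_mul(A, block)
    return [sum((list(blk[i]) for blk in blocks), []) for i in range(n)]

def flow_f1(y, s):
    # Exact flow of f1 = (1, 0, y2) for signed time s
    y1, y2, y3 = y
    return (y1 + s, y2, y3 + s * y2)

def flow_f2(y, s):
    # Exact flow of f2 = (0, 1, -y1) for signed time s
    y1, y2, y3 = y
    return (y1, y2 + s, y3 - s * y1)

def commutator_endpoint(t):
    # Follow f1, f2, -f1, -f2, each for time t, starting at the origin
    y = (0.0, 0.0, 0.0)
    for step, s in ((flow_f1, t), (flow_f2, t), (flow_f1, -t), (flow_f2, -t)):
        y = step(y, s)
    return y

# Assumed sample systems: a double integrator and a scalar-input identity system
rank_controllable = rank(controllability_matrix([[0, 1], [0, 0]], [[0], [1]]))
rank_uncontrollable = rank(controllability_matrix([[1, 0], [0, 1]], [[1], [1]]))
endpoint = commutator_endpoint(0.5)
print(rank_controllable, rank_uncontrollable, endpoint)
```

For the Brockett integrator the four-step motion ends exactly at t^2 (0, 0, -2), with no higher-order remainder, because each of the four flows is affine in the state.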
Then we say that the system [19] is observable at time T if there exists C(T) such that

$$E_0 \le C(T) \int_0^T |z_x(1, t)|^2 \, dt$$

which means that if we observe zero displacement on the right end for time T then the solution has zero energy and hence vanishes. In this case, the system is observable for every time $T \ge 2$: this is precisely the time taken by a wave to travel from the right end point to the left one and back.

Thanks to a duality as in the finite-dimensional case, observability of [19] is equivalent to null controllability for [5]–[7], that is, to the property that for all initial conditions $y_0, y_1$ there exists a control $u(\cdot)$ such that the corresponding solution satisfies $y(x, T) = y_t(x, T) = 0$. More precisely, the desired control is given by $u(t) = \tilde z_x(1, t)$, where $\tilde z$ is the solution of [19] minimizing the functional (over $L^2 \times H^{-1}$)

$$J(z(\cdot, 0), z_t(\cdot, 0)) = \frac{1}{2} \int_0^T |z_x(1, t)|^2 \, dt + \int y_0 \, z_t(\cdot, 0) \, dx - \int y_1 \, z(\cdot, 0) \, dx$$

One can check that this functional is continuous and convex, and the coercivity is granted by the observability of [19]; thus, a minimum exists by the direct method of Tonelli. This is an example of the method known as the Hilbert uniqueness method, introduced by Lions (1988).

In the multidimensional case, controllability can be characterized by imposing a condition on the region $\Gamma \subset \partial\Omega$ on which the control acts. More precisely, rays of geometric optics in $\Omega$ should intersect $\Gamma$ (Zuazua 2005).

If we consider the infinite-time horizon $T = \infty$ and introduce the functional

$$J = \int_0^\infty \|y\|^2 \, dt + N \int u^2 \, dt \, dx$$

then the optimal control is determined as follows. If (y, p) is a solution of the optimality system: [5]–[6] with y = 0 outside $\Gamma$ and

$$p_{tt} - \Delta p + y = 0; \qquad \partial_\nu p + N y = 0 \ \text{on } \Gamma$$

$$p = 0 \ \text{on } \partial\Omega$$

then u = y on $\Gamma$ (Lions 1988, Zuazua 2005).

Controllability via Return Method of Coron

As we saw in Theorem 4, a nonlinear system may be controllable even if its linearization is not. In this case, controllability can be proved by the return method of Coron, which consists in finding a trajectory y such that the following hold:

1. y(0) = y(T) = 0;
2. the linearized system around y is controllable.

Then by the implicit-function theorem, local controllability is granted, that is, there exists $\varepsilon > 0$ such that for every data $y_0, y_1$ of norm less than $\varepsilon$, there exists a control steering the system from $y_0$ to $y_1$ in time T.

This method does not give many advantages in the finite-dimensional case, but yields excellent results for PDE systems such as Euler, Navier–Stokes, Saint-Venant, and others (Coron 2002).

Control of Schrödinger Equation

Consider the issue of designing an efficient transfer of population between different atomic or molecular levels using laser pulses. The mathematical description consists in controlling the Schrödinger equation. Many results are available in the finite-dimensional case. Finite-dimensional closed quantum systems are in fact left-invariant control systems on SU(n), or on the corresponding Hilbert sphere $S^{2n-1} \subset C^n$, where n is the number of atomic or molecular levels, and powerful techniques of geometric control are available for both controllability and optimal control (Agrachev and Sachkov 2004, Boscain and Piccoli 2004, Jurdjevic 1997).

Recent papers consider the minimum-time problem with unbounded controls as well as minimization of the energy of transition. Boscain et al. (2002) have applied the techniques of sub-Riemannian geometry on Lie groups and of optimal synthesis on two-dimensional manifolds to the population transfer problem in a three-level quantum system driven by two external fields of arbitrary shape and frequency.

Although many results are available for finite-dimensional systems, only a few controllability properties have been proved for the Schrödinger equation as a PDE, and in particular no satisfactory global controllability results are available at the moment.

Further Reading

Agrachev A and Sachkov Y (2004) Control from a Geometric Perspective. Springer.
Bardi M and Capuzzo Dolcetta I (1997) Optimal Control and Viscosity Solutions of Hamilton–Jacobi–Bellman Equations. Boston: Birkhäuser.
Bloch AM (2003) Nonholonomic Mechanics and Control, with the collaboration of J. Baillieul, P. Crouch and J. Marsden, with scientific input from P. S. Krishnaprasad, R. M. Murray and D. Zenkov. New York: Springer.
Boscain U and Piccoli B (2004) Optimal Synthesis for Control Systems on 2-D Manifolds. Springer SMAI, vol. 43. Heidelberg: Springer.
642 Convex Analysis and Duality Methods
Boscain U, Chambrion T, and Gauthier J-P (2002) On the K + P problem for a three-level quantum system: optimality implies resonance. Journal of Dynamical and Control Systems 8: 547–572.
Bullo F and Lewis AD (2005) Geometric Control of Mechanical Systems. New York: Springer.
Coron JM (2002) Return method: some application to flow control. Mathematical Control Theory, Part 1, 2 (Trieste, 2001). In: Agrachev A (ed.) ICTP Lecture Notes, vol. VIII. Trieste: Abdus Salam Int. Cent. Theoret. Phys.
Fursikov AV and Imanuvilov O Yu (1996) Controllability of Evolution Equations. Lecture Notes Series, vol. 34. Seoul: Seoul National University.
Jurdjevic V (1997) Geometric Control Theory. Cambridge: Cambridge University Press.
Komornik V (1994) Exact Controllability and Stabilization. The Multiplier Method. Chichester: Wiley.
Lasiecka I and Triggiani R (2000) Control Theory for Partial Differential Equations: Continuous and Approximation Theories. Cambridge: Cambridge University Press.
Lions JL (1988) Exact controllability, stabilization and perturbations for distributed systems. SIAM Review 30: 1–68.
Piccoli B and Sussmann HJ (2000) Regular synthesis and sufficiency conditions for optimality. SIAM Journal on Control and Optimization 39: 359–410.
Sontag ED (1998) Mathematical Control Theory. New York: Springer.
Zuazua E (2005) Propagation, observation and control of waves approximated by finite difference methods. SIAM Review 47: 197–243.
f x 1 jxj2
( q
2 Duality Arguments
f y 1 jyj if jyj 1
1 otherwise Two Key Results
Theorem 11 Let X be a normed space and let convex l.s.c. function and let F 7! X be the convex
f : X ! [0, 1] be a convex and proper function; functional defined by
assume that f is continuous at 0, then
Au if u 2 DA
(i) f achieves its minimum on X Fu
1 otherwise
(ii) f (0) = f (0) = inf f
Proof Assume that there exists u0 2 D(A) such that is
continuous at Au0 . Then
(i) Let M be an upper bound of f on the ball {kxk
R}. Then (i) The Fenchel conjugate of F is given by
we may assume that pn converges weakly to is a possibly concentrated Radon measure sup-
some p. Since G(A) is a (weakly) closed subspace ported on . In general, the operator A : u 2
of X Y, we infer that (u, p) as the limit of C1 () L2 () 7! ru 2 L2 (; Rn ) is not closable
(un , pn ) still belongs to G(A). Thus, we conclude, and we need to come back to the general formula
thanks to the (weak) lower-semicontinuity of [3]. The general structure of G(A) has been given in
Bouchitte et al. (1997) and Bouchitte and Fragala
lim inf Jun lim pn p Ju (2002, 2003), namely
n n
&
where is an bounded open subset of R n , f : even if the dependence of f (x, u, z) with respect to u
Rn ! R is a convex integrand with quadratic growth is nonconvex. The idea consists in embedding the
(i.e., cjzj2 f (x, z) C(1 jzj2 for suitables C space BV() in the larger space BV( R) through
c > 0). Then X = L2 (), Y = L2 (; Rn ), the map u 7! 1u , where 1u is the characteristic
Z function defined on R by setting
Gv f x; vx dx
1u x; t : 1 if ux > t
0 otherwise
and A : u 2 C1 () 7! ru 2 L2 (; Rn ). It turns out
that A is closable and that the domain of A Then it is possible to show, under suitable
characterizes the Sobolev space W 1, 2 () on which conditions on the integrand f, that there exists
coincides with the distributional gradient
A a convex l.s.c., 1-homogeneous functional
operator. G : BV( R) ! R [ {1} such that F(u) = G(1u ).
The situation is more involved if we consider This functional G is constructed as in the Example
3 taking C to be a suitable convex subset of
Z C0 ( R). This nice new idea has been the key
Fu f x; ru d tool of the calibration method developed recently
(Alberti et al. 2003).
Convex Variational Problems in Duality

Finite-Dimensional Case

We sketch the duality scheme in two cases.

Linear programming Let $c \in R^n$, $b \in R^m$ and A an $m \times n$ matrix. We denote by $A^T$ the transpose matrix. We consider the linear program

$$(P) \qquad \inf\{(c|x) : x \ge 0,\ Ax \ge b\}$$

and its perturbed version ($p \in R^m$)

$$h(p) := \inf\{(c|x) : x \ge 0,\ Ax - p \ge b\}$$

An easy computation gives

$$\forall y \in R^m, \qquad h^*(y) = \begin{cases} -(b|y) & \text{if } A^T y - c \le 0,\ y \ge 0 \\ +\infty & \text{otherwise} \end{cases} \qquad [4]$$

Lemma 18 Assume that inf (P) is finite. Then:
(i) h is convex proper and l.s.c. at 0.
(ii) (P) has at least one solution.

Proof We introduce the $(m+1) \times (n+m)$ matrix B defined by

$$B := \begin{pmatrix} c^T & 0 \\ A & -I_m \end{pmatrix}$$

($I_m$ is the m-dimensional identity matrix). Denote by $\{b_1, b_2, \ldots, b_{n+m}\} \subset R^{m+1}$ the columns of B and by K the convex cone $K := \{\sum_{j=1}^{n+m} \lambda_j b_j : \lambda_j \ge 0\}$. By Farkas' lemma, this cone K is closed.

(i) Let $\beta := \liminf\{h(p) : p \to 0\}$. We have to prove that $\beta \ge h(0) = \inf (P)$. Let $\{p_\varepsilon\}$ be a sequence in $R^m$ such that $p_\varepsilon \to 0$ and $h(p_\varepsilon) \to \beta$. By the definition of h, we may choose $x_\varepsilon \ge 0$ such that $Ax_\varepsilon - p_\varepsilon \ge b$ and $(c|x_\varepsilon) \to \beta$. Then we see that the column vector $\tilde x_\varepsilon$ associated with $(x_\varepsilon, Ax_\varepsilon - p_\varepsilon - b)$ satisfies $B\tilde x_\varepsilon = ((c|x_\varepsilon), b + p_\varepsilon) \in K$. Since K is closed, the limit $(\beta, b)$ belongs to K, and there exists $\tilde x = (x, x_0)$ such that $x \ge 0$, $x_0 \ge 0$, $(c|x) = \beta$ and $Ax - x_0 = b$. It follows that x is admissible for (P) and then $\beta = (c|x) \ge h(0)$.

(ii) We repeat the proof of (i) choosing $p_\varepsilon = 0$ so that $\beta = \inf (P)$. □

Thanks to the assertion (i) in Lemma 18, we deduce from Theorem 10 that $\inf (P) = h(0) = h^{**}(0) = \sup(-h^*)$. Recalling [4], we therefore consider the dual problem:

$$(P^*) \qquad \sup\{b \cdot y : y \ge 0,\ A^T y - c \le 0\}$$

Theorem 19 The following assertions are equivalent:
(i) (P) has a solution.
(ii) $(P^*)$ has a solution.
(iii) There exists $(x_0, y_0) \in R^n_+ \times R^m_+$ such that $Ax_0 \ge b$, $A^T y_0 - c \le 0$.

In this case, we have $\min (P) = \max (P^*)$ and an admissible pair $(\bar x, \bar y)$ is optimal if and only if $c \cdot \bar x = b \cdot \bar y$ or, equivalently, satisfies the complementarity relations: $(A\bar x - b) \cdot \bar y = (A^T \bar y - c) \cdot \bar x = 0$.

Convex programming Let $f, g_1, \ldots, g_m : X \to R$ be convex l.s.c. functions and consider the optimization problem

$$(P) \qquad \inf\{f(x) : g_j(x) \le 0,\ j = 1, 2, \ldots, m\}$$

Here $X = R^n$ or any Banach space. As before, we introduce the value function

$$p \in R^m, \qquad h(p) := \inf\{f(x) : g_j(x) + p_j \le 0,\ j \in \{1, 2, \ldots, m\}\}$$

and compute its Fenchel conjugate:

$$\forall \lambda \in R^m, \qquad h^*(\lambda) = \begin{cases} -\inf_{x \in X} L(x, \lambda) & \text{if } \lambda \ge 0 \\ +\infty & \text{otherwise} \end{cases}$$

where $L(x, \lambda) := f(x) + \sum_i \lambda_i g_i(x)$ is the so-called Lagrangian. We notice that h is convex and that the equality $h(0) = h^{**}(0)$ is equivalent to the zero-duality gap relation

$$\inf_x \sup_{\lambda \ge 0} L(x, \lambda) = \sup_{\lambda \ge 0} \inf_x L(x, \lambda) \qquad [5]$$

Theorem 20 Assume that [5] holds. Then $\bar x$ is optimal for (P) if and only if there exist Lagrangian multipliers $\lambda_1, \lambda_2, \ldots, \lambda_m$ in $R_+$ such that

$$\bar x \in \operatorname{argmin}_X \Big( f + \sum_j \lambda_j g_j \Big), \qquad \lambda_j g_j(\bar x) = 0, \ \forall j$$

Notice that the existence of such a solution $\bar x$ is ensured if, for example, $X = R^n$ and if, for some k > 0, the function $f + k \sum_j g_j$ is coercive.
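The linear programming duality described in this section (weak duality, equality of optimal values, and the complementarity relations of Theorem 19) can be verified by hand on a tiny instance. The following sketch does so under explicit assumptions: the data A, b, c, the candidate vectors and the helper names are all invented here for illustration, and the primal and dual are taken in the forms inf{(c|x) : x >= 0, Ax >= b} and sup{b·y : y >= 0, A^T y - c <= 0}.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mat_vec(A, x):
    return [dot(row, x) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def primal_feasible(A, b, x):
    # x >= 0 and Ax >= b
    return all(xi >= 0 for xi in x) and all(r >= bi for r, bi in zip(mat_vec(A, x), b))

def dual_feasible(A, c, y):
    # y >= 0 and A^T y - c <= 0
    return all(yi >= 0 for yi in y) and all(
        r <= ci for r, ci in zip(mat_vec(transpose(A), y), c)
    )

def complementarity(A, b, c, x, y):
    # Returns ((Ax - b)·y, (A^T y - c)·x); both vanish at an optimal pair
    primal_slack = [r - bi for r, bi in zip(mat_vec(A, x), b)]
    dual_slack = [r - ci for r, ci in zip(mat_vec(transpose(A), y), c)]
    return dot(primal_slack, y), dot(dual_slack, x)

# Assumed sample data
A = [[1, 1], [1, 2]]
b = [3, 2]
c = [2, 3]

x_feas, y_feas = [4, 0], [1, 0]   # some admissible pair
x_bar, y_bar = [3, 0], [2, 0]     # candidate optimal pair

# Weak duality: every admissible pair satisfies b·y <= (c|x)
weak_gap = dot(c, x_feas) - dot(b, y_feas)
# Equal objective values certify optimality of (x_bar, y_bar)
optimal_gap = dot(c, x_bar) - dot(b, y_bar)
slack_products = complementarity(A, b, c, x_bar, y_bar)
print(weak_gap, optimal_gap, slack_products)
```

The candidate pair is not produced by a solver here; its optimality follows from weak duality once the two objective values coincide.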
XY
jx; z jtrzj2 jzj2
2 This formulation, where the infimum is achieved (as
, being the Lame constants). we minimize an l.s.c. functional on a compact set for
We apply Proposition 14 with X = W 1, 2 (; Rn ), the weak star topology), is already a relaxation of
2
Y = L2 (; Rnsym ), Au = e(u) and where we set the initial Monge mass transport problem,
8 R Z
< Rf u dx
> #
inf cx; Txdx: T
T X
u 1 g u dHn1 if u 0 on 0
>
: where the infimum is searched among all transports
1 otherwise
Z maps T : X 7! Y pushing forward on (i.e., such
v jx; v dx that (T 1 (B) = (B) for all Borel subset B Y).
This is equivalent to restricting the infimum in [6] to
After some computations, we may write the supre- the subclass {T } (, ), where
mum appearing in Proposition 14 as our dual Z
problem hT ; x; yi : x; Txdx
Z X
2
P sup j x; dx: 2 L2 ; Rnsym ; In order to find a dual problem for [6], we fix
2 P(Y) and consider the functional F : Mb (X) !
[0, 1) defined by
div f on ; n g on 1
Tc ; if 0; X 1
F
where j is the MoreauFenchel conjugate with 1 otherwise
respect to the second argument and n(x) denotes
the exterior unit normal on . The matrix-valued (Mb (X) denote the Banach space of (bounded)
map is called the stress tensor and j the stress signed Radon measures on X).
potential. Note that the boundary conditions for n Lemma 22 F is convex, weakly-star l.s.c. and
have to be understood in the sense of traces. proper. Its MoreauFenchel conjugate is given by
Z
Theorem 21 The problems (P) and (P ) have
solutions and we have the equality: inf(P) = sup (P ). 8 2 C X; F c ydy
0
Y
X Y
Proof The convexity property is obvious and the
properness follows from the fact that We will say that (, ) 2 F c is a pair of c-concave
Z conjugate functions if = c and c = (where
c
F cx; y dxdy symmetrically (x) := inf {c(x, y) (x): y 2 Y}).
XY Checking the latter condition amounts to verifying
Let n be such that n * (weakly star). We may that enjoys the so-called c-concavity property
assume that lim inf n F(n ) = limn F(n ) := is finite. cc = (in general, we have only cc , whereas
Then n and the associated optimal n are prob- ccc = c ). We refer for instance to Villani (2003) for
ability measures on X and on X Y, respectively. further details about this c-duality.
As X and Y are compact, possibly passing to a Now, by exploiting Theorem 10 and Lemma 22,
subsequence, we may assume that n * , and we obtain a very simple proof of Kantorovich
clearly we have 2 (, ). Since c(x, y) is l.s.c. duality theorem:
non-negative, we conclude that Theorem 23 The following duality formula holds:
Z
Z Z
lim inf Fn lim inf cx; yn dxdy
n n XY Tc ; sup d d : ; 2 F c
Z X Y
cx; y dxdy Moreover, the supremum in the right-hand side
XY
member is achieved by a pair (, ) of conjugate
F
c-concave functions such that, for any optimal in
Let us compute now F (). We have (y) = c(x, y), -a.e.
[6], there holds (x)
Z
Proof By Theorem 10 and Lemma 22, we have
F inf cx; ydxdy
XY
Z Tc ; F
Z Z
d: 2 PX; 2 ;
sup d c d: 2 C0 X
Z X
Z X Z
Y
inf cx; y xdxdy:
XY sup d d: ; 2 F c
X Y
2 ; Tc ;
Z
where the last inequality follows from the definition
c y dy of F c . Therefore, inf [6] = sup [7]. Furthermore, on
Y
the right-hand side of first equality, we increase the
To prove that the last inequality is actually an supremum by substituting with cc (recall that
equality, we observe that, for every y 2 Y and 2 ccc = c ). Thus,
C0 (X), the minimum of the l.s.c. function c( , y) Z Z
is attained on the compact set X and there exists a sup7
sup d c d: 2 C0 X;
Borel selection map S(y) such that c (y) = c(S(y), y) X Y
(S(y) for all y 2 Y. We obtain the desired equality by
choosing defined, for every test , by c-concave
Z Z
Take a maximizing sequence (n , cn ) of c-concave
x; ydxdy : Sy; ydy
XY Y conjugate functions. It is easy to check that {fn }
& is equicontinuous on X: this follows from the c-con-
cavity property and from the uniform continuity of
We observe that, for every 2 C0 (X), the func- c (observe that n (x1 ) n (x2 ) = cc cc
n (x1 ) n (x2 )
tion c introduced in Lemma 22 is continuous (use supY {c(x1 , ) c(x2 , )}). Then, by Ascolis theorem,
the uniform continuity of c) and therefore the pair possibly passing to subsequences, we may assume
(, c ) belong to the class that: n cn converges uniformly to some continuous
function where {cn } is a suitable sequence of
F c : ; 2 C0 X C0 Y:
reals. Then, one checks that is still c-concave
x y cx; yg and that (n cn )c = cn cn converges uniformly to
X
ality relation:
As it appears, Tc (, ) depends only on the differ-
0 inf6
sup7
ence f = , which belongs to the space M0 (X) of
Z signed measure on X with zero average. Defining
cx; y x
y dxdy N(f ) := Tc (f , f ) provides a seminorm (Kantoro-
XY
vich norm) on M0 (X) (it turns out that M0 (X) is
& not complete and that in general its completion is a
strict subspace of the dual of Lip(X)).
We will now specialize to the case where X is a
Remark 24 compact manifold equipped with a geodesic dis-
(i) In their discrete version (i.e., , are atomic tance. This will allow us to link the original problem
measures), problems [6] and [7] can be seen as to another primaldual formulation closer to that
particular linear programming problems (see the considered in the section Primaldual formulation
section Finite-dimensional case). in mechanics and yielding to a connection with
(ii) The case X = Y Rn and c(x, y) = (1=2)jx yj2 partial differential equations. As a model example,
let us assume that K = , where is a bounded
is important. In this case, the notion of c-concavity
is linked to convexity and the Fenchel transform connected open subset of Rn with a Lipschitz
boundary. Let be a compact subset (on
since, for every 2 C0 (X), one has
which the transport will have zero cost) and define
!
j j2 j j 2
c cx;y: inf H1 S n :
2 2
S Lipschitz curve joining x to y; S 9
Then if (,
c ) is a solution of [7], we find that where H1 denotes the one-dimensional Hausdorff
measure (length). It is easy to check that
jxj2
0 x : x
cx; y minf x; y; x; y; g
2
where (x, y) is the geodesic distance on (induced
is convex continuous and that the extremality by the Euclidean norm). Furthermore, the following
condition: (x)
c (y) = c(x, y) is equivalent to characterization holds:
Fenchel equality 0 (x) 0 (y) = (xjy). There-
fore, any optimal is supported in the graph u 2 Lip1 X () u 2 W 1;1 ;
of the subdifferential map @0 . In the case jruj 1 a.e. in ; u cte on 10
In the following, we assume that X = Y and that We will now derive a new dual problem for [11]
c(x, y) is a semidistance. As an immediate by using Proposition 14. To this aim, we consider
the sequence {p } does converge weakly-star to , Further Reading
the unique minimizer of the problem
Alberti G, Bouchitte G, and Dal Maso G (2003) The calibration
inffE : solution of 12
g method for the MumfordShah functional and free-disconti-
nuity problems. Calculus of Variations and Partial Differential
The general case, in particular when all optimal Equations 16(3): 299333.
Ambrosio L (2003) Lecture notes on optimal transport problems.
measures are singular, is open.
In: Mathematical Aspects of Evolving Interfaces (Funchal
Remark 29 Variational problems [11], [12] have 2000), Lecture Notes in Mathematics, vol. 1812, pp. 152.
Berlin: Springer.
important counterparts in the theory of elasticity
Borwein M and Lewis SA (2000) Convex Analysis and Nonlinear
and in optimal design problems (see Bouchitte and Optimization. Theory and Examples, CMS Series. Berlin:
Buttazo (2001)). They read, respectively, as Springer.
Z Bouchitte G and Buttazzo G (2001) Characterization of optimal
shapes and masses through MongeKantorovich equations.
max u df: u 2 \p>1 W 1;p ; R n ;
Journal of the European Mathematical Society 3: 139168.
Bouchitte G, Buttazzo G, and De Pascale L (2003) A p-Laplacian
rux 2 K a:e: on ; u 0 on approximation for some mass optimization problems. Journal
of Optimization Theory and Applications 118: 125.
Z Bouchitte G, Buttazzo G, and Seppecher P (1997) Energies with
min R n2 ;
0K : 2 Mb ; respect to a measure and applications to low dimensional
sym
structures. Calculus of Variations and Partial Differential
Equations 5: 3754.
div f on n Bouchitte G and Dal Maso G (1993) Integral representation and
relaxation of convex local functionals on BV. Annali della
2 Scuola Superiore di Pisa 20(4): 483533.
where K R nsym ) is a convex compact subset of Bouchitte G and Fragala I (2002) Variational theory of weak
symmetric second-order tensors associated with the geometric structures: the measure method and its applications.
elastic material, 0K () = sup { z: z 2 K} is convex Variational Methods for Discontinuous Structures, Ser.
positively R1-homogeneous and the functional on PNLDE, vol. 51, pp. 1940. Basel: Birkhauser.
Bouchitte G and Fragala I (2003) Second order energies on thin
measures 0K ( ) is intended in the sense given in
structures: variational theory and non-local effects. Journal of
[1]. A celebrated example is given by Michells Functional Analysis 204(1): 228267.
problem (Michell 1904) where n = 2 and K := {z 2 Bouchitte G and Valadier M (1988) Integral representation of
2
Rnsym , j(z)j 1}, (z) being the largest singular value convex functionals on a space of measures. Journal of
of z. The potential 0K is given by the nondifferenti- Functional Analysis 80: 398420.
Ekeland I and Temam R (1976) Analyse convexe et problemes
able convex function 0K () = 1 () 2 (), where the
variationnels. Paris: Dunod-Gauthier Villars.
i ()s are the singular values of . Evans LC (1997) Partial differential equations and Monge
Kantorovich mass transfer. In: Bott R, Jaffe A, Jerison D,
Unfortunately, it is not known if the vector
Lutsztig G, Singer I, and Yau JT (eds.) Current Developments
variational problem above can be linked to an in Mathematics, pp. 65126. Cambridge.
optimal transportation problem of the type [6], Michell AGM (1904) The limits of economy of material in frame
even if the analogous of equivalence [10] does exist structures. Philosophical Magazine and Journal of Science
in the Michells case, namely (for convex): 6: 589597.
Rockafellar RT (1970) Convex Analysis. Princeton: Princeton
eu 1 on University Press.
Villani C (2003) Topics in Optimal transportation, Graduate
() jux uyjx yj jx yj2 ; 8x; y studies in Mathematics, vol. 58. Providence, RI: AMS.
time derivative of the expansion (Ehlers 1961) can be written as

\[ 3\frac{\ddot S}{S} = 2(\omega^2 - \sigma^2) + a^b{}_{;b} - \frac{\kappa}{2}(\mu + 3p) + \Lambda \qquad [6] \]

where the representative length scale $S$ is defined by $\Theta = 3\dot S/S$. This is the basis of the fundamental singularity theorem: if in an expanding universe $\omega = 0 = a^b$ and the combined matter present satisfies [4], with $\Lambda \le 0$, then there was a singularity where $S \to 0$ a finite time $t_0 < 1/H_0$ ago, $H_0 = (\dot S/S)_0$ being the present value of the Hubble constant. The energy density will diverge there, so this is a spacetime singularity: an origin of physics, matter, and spacetime itself. However, the deduction does not follow if there is rotation or acceleration, which could conceivably avoid the singularity, so this result is by itself inconclusive for realistic cosmologies.

The vorticity obeys conservation laws analogous to those in Newtonian theory (Ehlers 1961). Vorticity-free solutions ($\omega = 0$) occur whenever the fluid flow lines are hypersurface-orthogonal in spacetime, that is, there exists a cosmic time function for the comoving observers, which will measure proper time along the flow lines if additionally the fluid flow is geodesic. The rate of change of shear is related to the conformal curvature (Weyl) tensor, which represents the free gravitational field, and which splits into an electric part $E_{ab}$ and a magnetic part $H_{ab}$ in close analogy with electromagnetic theory. Shear-free solutions ($\sigma = 0$) are very special because they strongly constrain the Weyl tensor; indeed if the flow is shear free and geodesic, then it either does not expand ($\Theta = 0$), or does not rotate ($\omega = 0$) (Ellis 1967). The set of cosmological observations associated with generic cosmological models has been characterized in power series form by Kristian and Sachs (1966), and that result has been extended to general models by Ellis et al. (1985).

The local regularity of the theory is expressed in existence and uniqueness theorems for the EFEs, provided the matter behavior is well defined through prescription of suitable equations of state (Hawking and Ellis 1973). However, in general the theory breaks down in the large, and this feature is specified by the Hawking–Penrose singularity theorems, predicting the existence of a geodesic incompleteness of spacetime under conditions applicable to realistic cosmological models satisfying the energy conditions given by eqns [3] and [4] (Hawking and Ellis 1973, Tipler et al. 1980). However, the conclusion does not follow if the energy conditions are not satisfied. Furthermore, the deduction follows only if the gravitational field equations remain valid to arbitrarily early times; but we would in fact expect that, at high enough energy densities, quantum gravity would take over from classical gravity, so whether or not there was indeed a singularity would depend on the nature of the as yet unknown theory of quantum gravity. The cash value of the singularity theorems then is the implication that, when the energy conditions are satisfied, one would indeed be involved in such a quantum gravity realm in the very early universe.

The Standard Friedmann–Lemaître Models

The standard models of cosmology are the Friedmann–Lemaître (FL) models with Robertson–Walker (RW) geometry: that is, they are exactly spatially homogeneous and locally isotropic, invariant under a $G_6$ of isometries (Robertson 1933, Ehlers 1961). They have a unique cosmic time function $t$, with space sections $\{t = \text{const.}\}$ of constant spatial curvature orthogonal to the uniquely preferred 4-velocity $u^a$. The fluid acceleration, vorticity, and shear all vanish, and all physical quantities depend only on the time coordinate $t$. They can be represented by a metric with scale factor $S(t)$:

\[ ds^2 = g_{ab}\,dx^a dx^b = -dt^2 + S^2(t)\{dr^2 + f^2(r)(d\theta^2 + \sin^2\theta\, d\phi^2)\} \qquad [7] \]

in comoving coordinates $(x^a) = (t, r, \theta, \phi)$, where $f(r) = \{\sin r, r, \sinh r\}$ if $\{k = +1, 0, -1\}$, and the matter is a perfect fluid with 4-velocity vector $u^a = dx^a/ds = \delta^a_0$. The curvature of the space sections $\{t = \text{const.}\}$ is $K = k/S^2$; these 3-spaces are necessarily closed (compact) if they are positively curved ($k = +1$), but may be open or closed in the flat ($k = 0$) and negatively curved ($k = -1$) cases, depending on their topology (Lachièze-Rey and Luminet 1995).

Matter obeys the conservation equation [5], whose outcome depends on the equation of state; for baryons $\mu = M/S^3$, whereas for radiation $\mu = M/S^4$, where $M$ is a constant. The dynamics of the models is governed by the Raychaudhuri equation

\[ 3\frac{\ddot S}{S} = -\frac{\kappa}{2}(\mu + 3p) + \Lambda \qquad [8] \]

which has the Friedmann equation

\[ \frac{3\dot S^2}{S^2} = \kappa\mu + \Lambda - \frac{3k}{S^2} \qquad [9] \]

as a first integral whenever $\dot S \ne 0$. Depending on the matter components present, one can qualitatively
Cosmology: Mathematical Aspects 655
characterize the dynamical behavior of these models (Robertson 1933) and find exact and approximate solutions to these equations as well as phase planes representing the relation of the different models to each other; for example, Ehlers and Rindler (1989) give the phase planes for models with noninteracting matter and radiation and an arbitrary cosmological constant. Universes with maxima or minima in $S(t)$ can only occur if $k = +1$; when $\Lambda = 0$, the universe recollapses in the future iff $k = +1$. Static solutions are possible only if $k = +1$ and (assuming [4]) $\Lambda > 0$. The simplest expanding solutions are the Einstein–de Sitter universes with $k = 0 = \Lambda$.

Equation [8] is a special case of [6], with corresponding implications: if the combined matter present satisfies [4], with $\Lambda \le 0$, then there must have been an initial singularity, or at least the universe must have emerged from a quantum gravity domain. The temperature would have been arbitrarily high in the past, so there was a hot big bang era in the early universe where matter and radiation were in equilibrium with each other at very high temperatures that rapidly fell as the universe expanded. Many physical processes took place then; in particular, nucleosynthesis of light elements took place at $\simeq 10^9$ K. Decoupling of matter and radiation took place at a temperature of $\simeq 4000$ K, followed by formation of stars and galaxies (see Peacock (1999) for a discussion of these physical processes). The black-body radiation emitted by the surface of last scattering at 4000 K is observed by us today as cosmic black-body radiation (CBR) at a temperature of 2.75 K.

One can determine observational relations for these models such as the magnitude–redshift relation for standard candles at recent times from the EFEs (Sandage 1961). The aim of observations is to determine the Hubble constant $H_0$, dimensionless deceleration parameter $q_0 = -(3/H_0^2)(\ddot S/S)_0$, and normalized density parameters $\Omega_{0i} = \kappa\mu_{0i}/3H_0^2$ for each component of matter present. The spatial curvature and the cosmological constant then follow from [6] and [9]; also the present scale factor $S_0$ is determined if $k \ne 0$. The universe is of positive spatial curvature ($k = +1$) iff $\Omega_0 \equiv \Omega_m + \Omega_\Lambda > 1$, where $\Omega_m \equiv \sum_i \Omega_{0i}$, $\Omega_\Lambda = \Lambda/3H_0^2$. Current observations indicate $\Omega_m \simeq 0.3$, $\Omega_\Lambda \simeq 0.7$, $\Omega_0 \simeq 1.02 \pm 0.02$. Because the nucleosynthesis results limit the baryon density to a very low value ($\Omega_{0b} \simeq 0.02$), which is about the same as the density of luminous matter, this indicates the dominant presence of both nonbaryonic dark matter and a repulsive force corresponding to either a cosmological constant or varying scalar field ("dark energy").

Crucial causal limitations occur because of the existence of particle horizons (Rindler 1956), the nature of which is most clear when represented in conformal diagrams (Hawking and Ellis 1973, Tipler et al. 1980). These result from the fact that light can only proceed a finite distance in the finite time since the origin of the universe, and imply that for a standard radiation-dominated hot-big-bang early universe, regions of larger than about $1^\circ$ angular size on the surface of last scattering, which emits the CBR, are causally disconnected: hence, no causal process since the start of the universe can account for the extreme isotropy of the CBR ($\Delta T/T \simeq 10^{-5}$ over the whole sky, once a dipole anisotropy $\Delta T/T \simeq 10^{-3}$ due to our local velocity relative to the cosmological rest frame is allowed for). This is the horizon problem, one of the driving forces behind the theory of inflation (Guth 1981): the idea that, in the very early universe, a slow-rolling scalar field led to a brief exponential expansion through at least 50 e-folds (during which time the spacetime was approximately de Sitter), thus smoothing the universe and solving the horizon problem (Guth 1981, Peacock 1999). This is possible because a scalar field can violate the energy condition [3] and so allows acceleration: $\ddot S > 0$.

Consequently, there are now many studies of the dynamics of FLRW solutions driven by scalar fields and the subsequent decay of these scalar fields into radiation. One interesting point is that one can obtain exact solutions of this kind for arbitrarily chosen evolutions $S(t)$, provided they satisfy a restriction on the magnitude of $\dot S^2$, by running the field equations backwards to determine the needed potential $V(\phi)$ (Ellis and Madsen 1991). The inflationary paradigm is dominant in present-day theoretical cosmology, but suffers from the problem that it is not in fact a well-defined theory, for there is no single accepted proposal for the physical nature of the effective scalar field underlying the supposed exponential expansion; rather there are numerous competing proposals. As the inflaton has not yet been identified, this theory is not yet soundly linked to well-established physics.

Approximate FL Solutions

The real universe is, of course, not exactly FL, and studies of structure formation depend on studies of solutions that are approximately FL models; they are realistic ("lumpy") universe models. These enable detailed studies of observable properties such as CBR anisotropies and gravitational lensing induced by matter inhomogeneities, and of the development of those inhomogeneities from quantum fluctuations in the very early universe that then get expanded to very large scales by inflation.
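As a rough numerical illustration of the FL relations discussed earlier in this section, the following Python sketch evaluates the sign of the spatial curvature from the density parameters and the age of the universe in units of the Hubble time $1/H_0$. It assumes pressure-free matter plus a cosmological constant (radiation is neglected), so that the Friedmann equation [9] divided by $3H_0^2$ becomes $(H/H_0)^2 = \Omega_m a^{-3} + (1 - \Omega_m - \Omega_\Lambda)a^{-2} + \Omega_\Lambda$ with $a = S/S_0$. The function names and the midpoint-rule quadrature are illustrative choices, not part of the article.

```python
import math

def curvature_sign(omega_m, omega_lambda, tol=1e-12):
    """Return k = +1, 0, -1 according to whether Omega_0 = Omega_m + Omega_Lambda
    is greater than, equal to, or less than 1."""
    omega_0 = omega_m + omega_lambda
    if omega_0 > 1.0 + tol:
        return 1
    if omega_0 < 1.0 - tol:
        return -1
    return 0

def hubble_ratio(a, omega_m, omega_lambda):
    """H(a)/H_0 for pressure-free matter plus Lambda (assumption: no radiation)."""
    omega_k = 1.0 - omega_m - omega_lambda  # curvature term, -k/(S_0 H_0)^2
    return math.sqrt(omega_m / a**3 + omega_k / a**2 + omega_lambda)

def age_in_hubble_times(omega_m, omega_lambda, steps=20000):
    """t_0 * H_0 = integral from a=0 to a=1 of da / (a H(a)/H_0), midpoint rule."""
    da = 1.0 / steps
    total = 0.0
    for i in range(steps):
        a = (i + 0.5) * da
        total += da / (a * hubble_ratio(a, omega_m, omega_lambda))
    return total

if __name__ == "__main__":
    print(curvature_sign(0.3, 0.72))       # Omega_0 = 1.02 > 1, so k = +1
    print(age_in_hubble_times(1.0, 0.0))   # Einstein-de Sitter case
    print(age_in_hubble_times(0.3, 0.7))   # matter plus Lambda
```

For the Einstein–de Sitter case the integral can be done by hand and gives $t_0 H_0 = 2/3$, consistent with the bound $t_0 < 1/H_0$ that holds when $\Lambda \le 0$; with $\Omega_m = 0.3$ and $\Omega_\Lambda = 0.7$ the age rises to roughly 0.96 Hubble times, because a positive $\Lambda$ relaxes that bound.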
The key problem here is that apart from the standard coordinate freedom allowed in general relativity, there is a serious gauge issue: the background FL model is not uniquely determined by the realistic universe model; however, the magnitudes of many perturbed quantities depend on how it is fitted into the lumpy model. For example, the density perturbation is determined pointwise by the equation

\[ \delta\mu(x^i) = \mu(x^i) - \bar\mu(x^i) \]

where $\bar\mu(x^i)$ is the background density. But by altering the correspondence between the background and realistic models (specifically, by the choice of surfaces $\bar\mu(x^i) = \text{const.}$ in the realistic model) one can assign that quantity any value, including zero (if one chooses $\bar\mu(x^i) = \mu(x^i)$). This is the gauge problem.

One can handle it by using standard variables and keeping close track of the gauge freedom at all times. However, one then ends up with higher-order equations than necessary because some of the perturbation modes present are pure gauge modes with no physical significance. Alternatively, one can fix the gauge by some unique specification of how the background model is fitted into the realistic model, but there is no agreement on a unique way to do this, and different choices give different answers. The preferable resolution is to use gauge-invariant variables, either coordinate based (Bardeen 1980) or covariant, based on the (1+3) covariant decomposition of spacetime quantities mentioned above (Ellis and Bruni 1989), in either case resulting in perturbation equations without gauge freedom and of order corresponding to the physical degrees of freedom. The key point in the latter approach is to choose covariant variables that vanish in the background spacetime; they are then automatically gauge invariant. Realistic structure formation studies carry out this process for a mixture of matter components with different average velocities, and extend to a kinetic theory description of the background radiation (see Ellis and van Elst (1999) and references therein). The outcome is a prediction of the CBR anisotropy power spectrum, determined by the inhomogeneities in the gravitational field and the motions of the matter components at decoupling (Sachs and Wolfe 1967). This spectrum can then be compared with observations and used in determining the values of the cosmological parameters mentioned above (see Peacock 1999).

One crucial issue is why it is reasonable to use a perturbed FL model for the observable region of the universe. The key argument is that this is plausible because of the high isotropy of all observations around us when averaged on a sufficiently large spatial scale, and particularly the very low anisotropy of the CBR. The Ehlers–Geren–Sachs (EGS) theorem (Ehlers et al. 1968) provides a sound basis for this argument: it shows that if freely propagating CBR (obeying the Liouville equation) is exactly isotropic in an expanding universe domain $U$, then the universe is exactly FL in that domain (i.e., it has exactly the RW spatially homogeneous and isotropic geometry there), the point being that any inhomogeneities in the matter distribution between us and the surface of last scattering will produce anisotropies in the CBR temperature we measure. But that result does not apply to the real universe, because the CBR is not exactly isotropic. The "almost EGS" theorem (Stoeger et al. 1995) shows that this result is stable: almost isotropic CBR in the domain $U$ implies that the universe is almost-FL in that domain. The application to the real universe comes by making a weak Copernican assumption: we assume we are not special, so all observers in $U$ (taken to be the visible part of the universe) will also see almost isotropic CBR, just as we do. The result then follows. A further argument for homogeneity of the universe comes from postulating uniform thermal histories (Bonnor and Ellis 1986), but that argument is yet to be completed and applied in a practical way.

Anisotropic and Inhomogeneous Models

The FL universes are geometrically extremely special. We wish further to understand the full range of possible universe models, their dynamical behaviors, and which of them might, at some epoch, realistically represent the real universe. This enables us to see how the approximate FL models fit into this wider set of possibilities, and under what circumstances they are attractors in this set of cosmologies.

Exact solutions are characterized by their spacetime symmetries. Symmetries are characterized by the dimension $t$ of the surfaces of homogeneity and the dimension $q$ of the isotropy group at a general point, together giving the dimension $r = t + q$ of the group of isometries $G_r$ (at special points, such as a center of symmetry, $t$ can decrease and $q$ increase but always so that $r$ stays unchanged). In the case of a cosmological model, because the 4-velocity $u^a$ is invariant under isotropies, the only possible dimensions for the isotropy group are $q = 3, 1, 0$; whereas the dimension $t$ of the surfaces of homogeneity can take any value from 4 to 0. This gives the basis for a classification of cosmological spacetimes (Ellis 1967, Ellis and van Elst 1999).

When $q = 3$, we have isotropic solutions (there are no preferred spatial directions) and it is then a theorem that they must be spatially homogeneous FL universes (Ehlers 1961). When $q = 1$, we
have locally rotationally symmetric (LRS) solutions, with precisely one preferred spacelike direction at a generic point (Ellis 1967). When $q = 0$, the solutions are anisotropic in that there can be no continuous group of rotations leaving the solution invariant; however, there can be discrete isotropies in some special cases.

When $t = 4$, we have spacetime homogeneous solutions, with all physical quantities constant; they cannot expand (by [5] and [3]). Nevertheless, two cases are of interest. For $q = 1$ ($r = 5$) we find the Gödel universe, rotating everywhere with constant vorticity, which illustrates important causal anomalies (Gödel 1949, Hawking and Ellis 1973). For $q = 3$ ($r = 6$), we find the Einstein static universe (Einstein 1917), the unique nonexpanding FL model with $k = +1$ and $\Lambda > 0$. It is of interest because it could possibly represent the asymptotic initial state of nonsingular inflationary universe models (Ellis et al. 2003). The higher-symmetry models (de Sitter and anti-de Sitter universes with higher-dimensional isotropy groups) are not included here because they do not obey the energy condition [3]; they are empty universes, which can be interesting asymptotic states but are not by themselves good cosmological models.

When $t = 3$, we have spatially homogeneous evolving universe models. For $q = 0$ ($r = 3$), there is a large family of Bianchi universes, spatially homogeneous but anisotropic, classified into nine types according to the structure constants of the Lie algebra of the three-dimensional symmetry group $G_3$. These can be orthogonal (the fluid flow is orthogonal to the surfaces of homogeneity) or tilted; the latter case can have fluid rotation or acceleration, but the former cannot. They exhibit a large variety of behaviors, including power-law, oscillatory, and nonscalar singularities (Tipler et al. 1980). A vexed question is whether truly chaotic behavior occurs in Bianchi IX models. The behavior of large families of these models has been characterized in dynamical systems terms (Wainwright and Ellis 1996), showing the intriguing way that higher-symmetry solutions provide a skeleton that guides the behavior of lower-symmetry solutions in the space of spacetimes. Many Bianchi models can be shown to isotropize at late times, particularly if viscosity is present; thus, they are asymptotic to the FL universes in the far future. In some cases, Bianchi models exhibit intermediate isotropization: they are much like FL models for a large part of their life, but are very different from them both at very early and very late stages of their evolution. These could be good models of the real universe. An important theorem by Wald (1983) shows that a cosmological constant will tend to isotropize Bianchi solutions at late times. This is an indication that inflation can succeed in making anisotropic early states resemble FL models at later times. Observational properties like element abundances and CBR anisotropy patterns can be worked out in these models (some of them develop a characteristic isolated "hot spot" in the CBR sky). For $q = 1$ ($r = 4$), we have spatially homogeneous LRS models, either Kantowski–Sachs or Bianchi universes, and again observations can be worked out in detail and phase planes developed showing their dynamical behavior, often isotropizing at late times. There are orthogonal and tilted cases, the latter possibly involving nonscalar singularities. For $q = 3$ ($r = 6$), we have the isotropic FL models, discussed above. Both the LRS and isotropic cases could be good models of the real universe.

When $t = 2$, we have inhomogeneous evolving models. This is a very large family, but the LRS ($q = 1$, $r = 3$) cases have been examined in detail; in the case of pressure-free matter, these are the Tolman–Bondi inhomogeneous models (Bondi 1947) that can be integrated exactly, and have been used for many interesting astrophysical and cosmological studies. Krasinski (1997) gives a very complete catalog of these and lower-symmetry inhomogeneous models and their uses in cosmology. A considerable challenge is the dynamical systems analysis for generic inhomogeneous models, needed to properly understand the early evolution of generic universe models (Uggla et al. 2003), and hence to determine what is generic behavior.

The Origin of the Universe

The issue underlying all this is what led to the initial conditions for the universe, for example, providing the starting conditions for inflation. There are many approaches to studying the quantum gravity phase of cosmology, including the Wheeler–de Witt equation, the path-integral approach, string cosmology, pre-big bang theory, brane cosmology, the ekpyrotic universe, the cyclic universe, and loop quantum gravity approaches. These lie beyond the purview of the present article, except to say that they are all based on unproven extrapolations of known physics. The physically possible paths will become clearer as the nature of quantum gravity is elucidated.

It is pertinent to note that there exist nonsingular realistic cosmological solutions, possible in the light of the violations of the energy condition enabled by the supposed scalar fields that underlie inflationary universe theory. These nonsingular solutions can even avoid the quantum gravity era (Ellis et al. 2003). However, they have very fine-tuned initial conditions, which is nowadays considered a disadvantage; but
658 Cotangent Bundle Reduction
there is no proof that whatever processes led to the existence of the universe preferred generic rather than fine-tuned conditions; this is a philosophical rather than physical assumption. It may well be that, as regards the start of the universe, the options are that either an initial singularity occurred, or the initial conditions were very finely tuned and allowed an infinitely existing universe. Investigation of whether this conjecture is in fact valid, and if so which is the best option, are intriguing open topics.

See also: Einstein Equations: Exact Solutions; Einstein–Cartan Theory; General Relativity: Experimental Tests; General Relativity: Overview; Gravitational Lensing; Lie Groups: General Theory; Newtonian Limit of General Relativity; Quantum Cosmology; Shock Wave Refinement of the Friedman–Robertson–Walker Metric; Spacetime Topology, Causal Structure and Singularities; String Theory: Phenomenology.

Further Reading

Bardeen JM (1980) Physical Review D 22: 1882.
Bondi H (1947) Monthly Notices of the Royal Astronomical Society 107: 410.
Bonnor WB and Ellis GFR (1986) Monthly Notices of the Royal Astronomical Society 218: 605.
Ehlers J (1961) Abh Mainz Akad Wiss u Lit (translated in Gen Rel Grav 25: 1225, 1993).
Ehlers J, Geren P, and Sachs RK (1968) Journal of Mathematical Physics 9: 1344.
Ehlers J and Rindler W (1989) Monthly Notices of the Royal Astronomical Society 238: 503.
Einstein A (1917) Sitz Ber Preuss Akad Wiss (translated in The Principle of Relativity, 1993). Dover.
Ellis GFR (1967) Journal of Mathematical Physics 8: 1171.
Ellis GFR (1971) In: Sachs RK (ed.) General Relativity and Cosmology, Proc. Int. School of Physics "Enrico Fermi", Course XLVII, p. 104. Academic Press.
Ellis GFR and Bruni M (1989) Physical Review D 40: 1804.
Ellis GFR and van Elst H (1999) In: Lachièze-Rey M (ed.) Theoretical and Observational Cosmology, vol. 541 [gr-qc/9812046], Nato Series C: Mathematical and Physical Sciences: Kluwer.
Ellis GFR and Madsen M (1991) Classical and Quantum Gravity 8: 667.
Ellis GFR, Murugan J, and Tsagas CG (2003) gr-qc/0307112.
Ellis GFR, Nel SD, Stoeger W, Maartens R, and Whitman AP (1985) Physics Reports 124(5 and 6): 315.
Gödel K (1949) Reviews of Modern Physics 21: 447.
Guth A (1981) Physical Review D 23: 347.
Hawking SW and Ellis GFR (1973) The Large Scale Structure of Space Time. Cambridge: Cambridge University Press.
Krasinski A (1997) Inhomogeneous Cosmological Models. Cambridge: Cambridge University Press.
Kristian J and Sachs RK (1966) The Astrophysical Journal 143: 379.
Lachièze-Rey M and Luminet JP (1995) Physics Reports 254: 135–214.
Robertson HP (1933) Reviews of Modern Physics 5: 62.
Peacock JA (1999) Cosmological Physics. Cambridge: Cambridge University Press.
Rindler W (1956) Monthly Notices of the Royal Astronomical Society 116: 662.
Sachs RK and Wolfe A (1967) Astrophysical Journal 147: 73.
Sandage A (1961) Astrophysical Journal 133: 355.
Stoeger W, Maartens R, and Ellis GFR (1995) Astrophysical Journal 443: 1.
Tipler FJ, Clarke CJS, and Ellis GFR (1980) In: Held A (ed.) General Relativity and Gravitation: One Hundred Years after the Birth of Albert Einstein, vol. 2, p. 97. Plenum.
Uggla C, van Elst H, Wainwright J, and Ellis GFR (2003) Physical Review D gr-qc/0304002 (to appear).
Wainwright J and Ellis GFR (eds.) (1996) The Dynamical Systems Approach to Cosmology. Cambridge: Cambridge University Press.
Wald RM (1983) Physical Review D 28: 2118.
infinitesimal generator vector field $\xi_Q$ of the $G$-action at $q \in Q$ (see Hamiltonian Group Actions and Symmetries and Conservation Laws). Throughout this article, it is assumed that the $G$-action on $Q$, and hence on $T^*Q$, is free and proper. Recall also that $((T^*Q)_\mu, (\Omega_Q)_\mu)$ denotes the reduced manifold at $\mu \in \mathfrak{g}^*$ (see Symmetry and Symplectic Reduction), where $(T^*Q)_\mu := J^{-1}(\mu)/G_\mu$ is the orbit space of the $G_\mu$-action on the momentum level manifold $J^{-1}(\mu)$ and $G_\mu := \{g \in G \mid \mathrm{Ad}^*_g \mu = \mu\}$ is the isotropy subgroup of the coadjoint representation of $G$ on $\mathfrak{g}^*$. The left-coadjoint representation of $g \in G$ on $\mu \in \mathfrak{g}^*$ is denoted by $\mathrm{Ad}^*_{g^{-1}}\mu$.

Cotangent bundle reduction at zero is already quite interesting and has many applications. Let $\rho : Q \to Q/G$ be the $G$-principal bundle projection defined by the proper free action of $G$ on $Q$, usually referred to as the shape space bundle. Zero is a regular value of $J$ and the map $\varphi_0 : ((T^*Q)_0, (\Omega_Q)_0) \to (T^*(Q/G), \Omega_{Q/G})$ given by $\varphi_0([\alpha_q])(T_q\rho(v_q)) := \alpha_q(v_q)$, where $\alpha_q \in J^{-1}(0)$, $[\alpha_q] \in (T^*Q)_0$, and $v_q \in T_qQ$, is a well-defined symplectic diffeomorphism.

This theorem generalizes in two nontrivial ways when one reduces at a nonzero value of $J$: an embedding and a fibration theorem.

Embedding Version of Cotangent Bundle Reduction

Let $\mu \in \mathfrak{g}^*$, $Q_\mu := Q/G_\mu$, $\rho_\mu : Q \to Q_\mu$ the projection onto the $G_\mu$-orbit space, $\mathfrak{g}_\mu := \{\xi \in \mathfrak{g} \mid \mathrm{ad}^*_\xi \mu = 0\}$ the Lie algebra of the coadjoint isotropy subgroup $G_\mu$, where $\mathrm{ad}_\xi \eta := [\xi, \eta]$ for any $\xi, \eta \in \mathfrak{g}$, $\mathrm{ad}^*_\xi : \mathfrak{g}^* \to \mathfrak{g}^*$ the dual map, $\mu' := \mu|_{\mathfrak{g}_\mu} \in \mathfrak{g}_\mu^*$ the restriction of $\mu$ to $\mathfrak{g}_\mu$, and $((T^*Q)_\mu, (\Omega_Q)_\mu)$ the reduced space at $\mu$. The induced $G_\mu$-action on $T^*Q$ admits the equivariant momentum map $J^\mu : T^*Q \to \mathfrak{g}_\mu^*$ given by $J^\mu(\alpha_q) = J(\alpha_q)|_{\mathfrak{g}_\mu}$. Assume there is a $G_\mu$-invariant 1-form $\alpha_\mu$ on $Q$ with values in $(J^\mu)^{-1}(\mu')$. Then there is a unique closed 2-form $\beta_\mu$ on $Q_\mu$ such that $\rho_\mu^*\beta_\mu = d\alpha_\mu$. Define the magnetic term $B_\mu := \pi_{Q_\mu}^*\beta_\mu$, where $\pi_{Q_\mu} : T^*Q_\mu \to Q_\mu$ is the cotangent bundle projection, which is a closed 2-form on $T^*Q_\mu$. Then the map $\varphi_\mu : ((T^*Q)_\mu, (\Omega_Q)_\mu) \to (T^*Q_\mu, \Omega_{Q_\mu} - B_\mu)$ given by $\varphi_\mu([\alpha_q])(T_q\rho_\mu(v_q)) := (\alpha_q - \alpha_\mu(q))(v_q)$, for $\alpha_q \in J^{-1}(\mu)$, $[\alpha_q] \in (T^*Q)_\mu$, and $v_q \in T_qQ$, is a symplectic embedding onto a submanifold of $T^*Q_\mu$ covering the base $Q_\mu$. The embedding $\varphi_\mu$ is a diffeomorphism onto $T^*Q_\mu$ if and only if $\mathfrak{g} = \mathfrak{g}_\mu$. If the 1-form $\alpha_\mu$ takes values in the smaller set $J^{-1}(\mu)$ then the image of $\varphi_\mu$ is the vector sub-bundle $[T\rho_\mu(VQ)]^\circ$ of $T^*Q_\mu$, where $VQ \subset TQ$ is the vertical vector sub-bundle consisting of vectors tangent to the $G$-orbits, that is, its fiber at $q \in Q$ equals $V_qQ = \{\xi_Q(q) \mid \xi \in \mathfrak{g}\}$, and $^\circ$ denotes the annihilator relative to the natural duality pairing between $TQ_\mu$ and $T^*Q_\mu$. Note that if $\mathfrak{g}$ is abelian or $\mu = 0$, the embedding $\varphi_\mu$ is always onto and thus the reduced space is again, topologically, a cotangent bundle.

It should be noted that there is a choice in this theorem, namely the 1-form $\alpha_\mu$. Whereas the reduced symplectic space $((T^*Q)_\mu, (\Omega_Q)_\mu)$ is intrinsic, the symplectic structure on the space $T^*Q_\mu$ depends on $\alpha_\mu$. The theorem above states that no matter how $\alpha_\mu$ is chosen, there is a symplectic diffeomorphism, which also depends on $\alpha_\mu$, of the reduced space onto a submanifold of $T^*Q_\mu$.

Connections

The 1-form $\alpha_\mu$ is usually obtained from a left connection on the principal bundle $\rho_\mu : Q \to Q/G_\mu$ or $\rho : Q \to Q/G$. A left connection 1-form $A \in \Omega^1(Q; \mathfrak{g})$ on the left principal $G$-bundle $\rho : Q \to Q/G$ is a Lie algebra-valued 1-form $A : TQ \to \mathfrak{g}$, where $\mathfrak{g}$ denotes the Lie algebra of $G$, satisfying the conditions $A(\xi_Q) = \xi$ for all $\xi \in \mathfrak{g}$ and $A(T_q\Phi_g(v)) = \mathrm{Ad}_g(A(v))$ for all $g \in G$ and $v \in T_qQ$, where $\mathrm{Ad}_g$ denotes the adjoint action of $G$ on $\mathfrak{g}$. The horizontal vector sub-bundle $HQ$ of the connection $A$ is defined as the kernel of $A$, that is, its fiber at $q \in Q$ is the subspace $H_q := \ker A(q)$. The map $v_q \mapsto \mathrm{ver}_q(v_q) := [A(q)(v_q)]_Q(q)$ is called the vertical projection, while the map $v_q \mapsto \mathrm{hor}_q(v_q) := v_q - \mathrm{ver}_q(v_q)$ is called the horizontal projection. Since for any vector $v_q \in T_qQ$ we have $v_q = \mathrm{ver}_q(v_q) + \mathrm{hor}_q(v_q)$, it follows that $TQ = HQ \oplus VQ$ and the maps $\mathrm{hor}_q : T_qQ \to H_qQ$ and $\mathrm{ver}_q : T_qQ \to V_qQ$ are projections onto the horizontal and vertical subspaces at every $q \in Q$.

Connections can be equivalently defined by the choice of a sub-bundle $HQ \subset TQ$ complementary to the vertical sub-bundle $VQ$ satisfying the following $G$-invariance property: $H_{g \cdot q}Q = T_q\Phi_g(H_qQ)$ for every $g \in G$ and $q \in Q$. The sub-bundle $HQ$ is called, as before, the horizontal sub-bundle and a connection 1-form $A$ is defined by setting $A(q)(\xi_Q(q) + u_q) = \xi$, for any $\xi \in \mathfrak{g}$ and $u_q \in H_qQ$.

The curvature of the connection $A$ is the Lie algebra-valued 2-form on $Q$ defined by $B(u_q, v_q) = dA(\mathrm{hor}_q(u_q), \mathrm{hor}_q(v_q))$. When one replaces vectors in the exterior derivative with their horizontal projections, then the result is called the exterior covariant derivative and the preceding formula for $B$ is often written as $B = DA$. Curvature measures the lack of integrability of the horizontal distribution, namely $B(u, v) = -A([\mathrm{hor}(u), \mathrm{hor}(v)])$ for any two vector fields $u$ and $v$ on $Q$. The Cartan structure equations state that $B(u, v) = dA(u, v) - [A(u), A(v)]$, where the bracket on the right hand side is the Lie bracket in $\mathfrak{g}$.
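The defining properties of a connection 1-form and the associated vertical and horizontal projections can be checked on a small concrete case. The Python sketch below is an illustration under stated assumptions, not taken from the article: it takes $Q$ to be the punctured plane, $G = S^1$ acting by rotations (a free and proper action), identifies the Lie algebra with the real line, and uses the hypothetical choice $A(q)(v) = (x v_y - y v_x)/(x^2 + y^2)$ for the connection. Since $G$ is abelian here, the adjoint action is trivial and equivariance reduces to invariance under rotations.

```python
import math

def infinitesimal_generator(xi, q):
    """xi_Q(q) for the rotation action of S^1 on the punctured plane."""
    x, y = q
    return (-xi * y, xi * x)

def connection(q, v):
    """Assumed connection 1-form A(q)(v) = (x*v_y - y*v_x) / |q|^2, valued in R."""
    x, y = q
    vx, vy = v
    return (x * vy - y * vx) / (x * x + y * y)

def ver(q, v):
    """Vertical projection: the generator of A(q)(v), evaluated at q."""
    return infinitesimal_generator(connection(q, v), q)

def hor(q, v):
    """Horizontal projection: v minus its vertical part."""
    wx, wy = ver(q, v)
    return (v[0] - wx, v[1] - wy)

def rotate(angle, p):
    """Action of the group element 'angle' on a point or a tangent vector."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

if __name__ == "__main__":
    q, v = (3.0, 4.0), (1.0, 2.0)
    print(connection(q, infinitesimal_generator(0.7, q)))  # recovers xi = 0.7
    print(ver(q, v), hor(q, v))                            # v = ver + hor
    print(connection(q, hor(q, v)))                        # horizontal part lies in ker A
```

In this example the horizontal space at each point is the radial direction, so the horizontal part of `v` is a multiple of `q`. The base $Q/G$ is one-dimensional, hence the curvature of this particular connection vanishes; a nontrivial curvature needs a base of dimension at least two.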
Step 3: With (t) 2 g determined in step 2, solve (c) Reconstruction of dynamics for simple
the nonautonomous differential equation g(t) _ = mechanical systems with symmetry. The case of
Te Lg(t) (t) with initial condition g(0) = e, where Lg simple mechanical systems with symmetry deserves
denotes left translation on G; this is the step that special attention since several steps in the recon-
involves quadratures and is the main obstacle struction method can be simplified. For simple
to finding explicit formulas. mechanical systems, the knowledge of the base
Step 4: The curve c(t) = g(t) d(t), with d(t) found integral curve q(t) suffices to determine the entire
in step 1 and g(t) found in step 3 is the integral integral curve on T Q. Indeed, if h = K V q is
curve of Xh with initial condition c(0) = q . the Hamiltonian, the Legendre transformation
Fh : T Q ! TQ determines the Lagrangian system
This method depends on the choice of the conne-
on TQ given by (uq ) = (1=2)kuq k2 V(uq ), for
ction A 2 1 ( J 1 (); g ). Here are several particular
$u_q \in T_q Q$. Lagrange's equations are second-order
cases when this procedure simplifies.
and thus the evolution of the velocities is given by
(a) One-dimensional coadjoint isotropy group. If
the time derivative q(t) _ of the base integral curve.
G = S1 or G = R, identify g with R via the map
Since Fh = (F)1 , the solution of the Hamiltonian
a 2 R $ a
2 g , where
2 g ,
6 0, is a generator of
system is given by F(q(t)). _ Using the explicit
g . Then a connection 1-form on the S1 (or R)
expression of the mechanical connection and the
principal bundle J 1 () ! (T Q) is the 1-form A on
notation given in the general procedure, the method
J 1 () given by A = (1=h,
i) , where is the
of reconstruction simplifies to the following steps.
pullback of the canonical 1-form 2 1 (T Q) to
To find the integral curve c(t) of the simple mecha-
the submanifold J 1 (). The curvature of this
nical system with G-symmetry h = K V Q on
connection is the 2-form on (T Q) given by
T Q with initial condition c(0) = q 2 Tq Q, know-
curv(A) = (1=h,
i)! , where ! is the reduced
ing the integral curve c (t) of the reduced Hamil-
symplectic form on (T Q) . In this case, the curve
tonian system on (T Q) given by the reduced
(t) 2 g in step 2 is given by (t) = [h](d(t)), where
Hamiltonian function h : (T Q) ! R with initial
2 X(T Q) is the Liouville vector field character-
condition c (0) = [q ] one proceeds in the follow-
ized by the property of being the unique vector field
ing manner. Recall the symplectic embedding
on T Q that satisfies the relation d (, ) = . In
: ((T Q) , (Q ) ) ! (T (Q=G ), Q=G B ). The
canonical coordinates (qi , pi ) on T Q, = pi @p @
.
i curve (c (t)) 2 T (Q=G ) is an integral curve of
(b) Induced connection. Any connection A 2
the Hamiltonian system on (T (Q=G ), Q=G B )
1 (Q; g ) on the left principal bundle Q ! Q=G
given by the function that is the sum of the kinetic
induces a connection A 2 1 ( J 1 (); g ) by A(q )
energy of the quotient Riemannian metric and the
(Vq ) := A(q)(Tq Q (Vq )), where q 2 Q, q 2 Tq Q, b . Let q (t) :=
quotient amended potential V
Vq 2 Tq (T Q), and Q : T Q ! Q is the cotangent
Q=G (c (t)) be the base integral curve of this system,
bundle projection. In this case, the curve (t) 2 g in
where Q=G : T (Q=G ) ! Q=G is the cotangent
step 2 is given by (t) = A(q(t))(Fh(d(t)), where
bundle projection.
q(t) := Q (d(t)) is the base integral curve and the
vector bundle morphism Fh : T Q ! TQ is the fiber
Step 1: Relative to the mechanical connection
derivative of h given by Amech 2 1 (Q; g ), horizontally lift q (t) 2 Q=G
to a curve qh (t) 2 Q passing through qh (0) = q.
d
Step 2: Determine (t) 2 g from the algebraic system
Fhq q : hq tq
dt t0 hh(t)Q (qh (t)), Q (qh (t))ii = h, i for all 2 g ,
where hh , ii is the G-invariant kinetic energy
for any q , q 2 Ta Q. Two particular instances of
Riemannian metric on Q. This implies that q_ h (0)
this situation are noteworthy.
and (0)Q (q) are the horizontal and vertical compo-
(b1) Assume that the Hamiltonian h is that of a nents of the vector ]q 2 Tq Q which is associated by
simple mechanical system with symmetry. the metric hh , ii to the initial condition q .
Choosing A to be the mechanical connection
Step 3: Solve g(t) _ = Te Lg(t) (t) in G with initial
Amech , the curve (t) 2 g in step 2 is given by condition g(0) = e.
(t) = Amech (q(t)) (hhd(t), ii).
Step 4: The curve q(t) := g(t) qh (t), with qh (t)
(b2) If Q = G is a Lie group, dim G = 1, and
is a and g(t) determined in steps 1 and 3, respectively,
generator of g , then the connection A 2 1 (G) is the base integral curve of the simple mechanical
can be chosen to equal A(g) := (1=h,
i) system with symmetry defined by the function h
Tg Rg1 (), where
is a generator of g and Rg satisfying q(0) = 0. The curve (Fh)1 (q(t)) _ 2 TQ
is right translation on G. is the integral curve of this system with initial
0 k
Q qh sk2 be the amended potential and V b 2 C1 (Q=G ) the
! induced function on the base. Let c : [0, T] ! T Q be
h;
i an integral curve of the system with Hamiltonian
q_ h t
Q qh t
k
Q qh sk2 h = K V Q and suppose that its projection
c : [0, T] ! (T Q) to the reduced space is a closed
(c2) The case of compact Lie groups. An obvious integral curve of the reduced system with Hamil-
situation when the differential equation in step 3 tonian h . The reconstruction phase associated to
can be solved is if (t) = for all t, where is a the loop c (t) is the group element g 2 G , satisfying
given element of g . Then the solution is the identity c(T) = g c(0). We shall present two
g(t) = exp(t). However, step 2 puts certain explicit formulas of the reconstruction phase for the
restrictions under this hypothesis, because it case when G = S1 . Let
2 g = R be a generator of
requires that hh(t)Q (qh (t)), Q (qh (t))ii = h, i the coadjoint isotropy algebra and write c(T) =
for any 2 g . This is satisfied if there is a exp(
) c(0); in this case, is identified with the
bilinear nondegenerate form ( , ) on g satisfy- reconstruction phase and, as we shall see in concrete
ing (
, ) = hh
Q (q), Q (q)ii for all q 2 Q and mechanical examples, it truly represents an angle.
, 2 g . This implies that ( , ) is positive If G = S1 , the G -principal bundle : J 1 () !
definite and invariant under the adjoint action (T Q) := J 1 ()=G admits two natural connec-
of G on g , so semisimple Lie algebras of tions: A = (1=
) 2 1 ( J 1 ()), where is the
noncompact type are excluded. If G is com- pullback of the canonical 1-form on the cotangent
pact, which ensures the existence of a positive bundle to the momentum level submanifold J 1 (),
adjoint invariant inner product on g , and and Q Amech 2 1 ( J 1 ()). There is no reason to
Q = G, this condition implies that the kinetic choose one connection over the other and thus there
energy metric is invariant under the adjoint are two natural formulas for the reconstruction
action. There are examples in which such phase in this case. Let c (t) be a periodic orbit of
conditions are natural, such as in Kaluza period T of the reduced system and denote also by
Klein theories. Thus, if G is a compact Lie h the value of the Hamiltonian function on it.
Assume that D is a two-dimensional surface in Casimir functions that are all smooth functions of
(T Q) whose boundary is the loop c (t). Since the kk2 , where 2 R 3 denotes the body angular
manifolds (T Q) and T (Q=S1 ) are diffeomorphic momentum.
(but not symplectomorphic), it makes sense to The Hamiltonian of the rigid body on the Lie
consider the base integral curve q (t) obtained by Poisson space T SO(3)=SO(3) R 3 is given by
projecting c (t) to the base Q=S1 , which is a closed
curve of period T. Denote by 1 21 22 23
h :
Z T 2 I1 I2 I3
b i : 1
hV b q t dt
V where I1 , I2 , I3 > 0 are the principal moments of
T 0
inertia of the body. Let I := diag(I1 , I2 , I3 ) denote the
the average of V b over the loop q (t). Let qh (t) 2 Q moment of inertia tensor diagonalized in a principal-
be the Amech -horizontal lift of q (t) to Q and let be axis body frame. The LiePoisson bracket on R3 is
the Amech -holonomy of the loop q (t) measured from given by {f , g}() = (rf () rg()) and the
q(0), the base point of c(0); its expression is given by equations of motion are = , where 2 R3 is
R
exp = exp( D B), where B is the curvature of the the body angular velocity given in terms of by
mechanical connection. Denote by ! the reduced i := =Ii , for i = 1, 2, 3, that is, = I1 . The
symplectic form on (T Q) . With these notations the trajectories of these equations are found by
phase is given by intersecting a family of homothetic energy ellipsoids
ZZ with the angular momentum concentric spheres. If
1 2h hV b iT
! I1 > I2 > I3 , one immediately sees that all orbits are
D
periodic with the exception of four centers (the two
Z T
ds possible rotations about the long and the short
2
3 moment of inertia axis of the body), two saddles
0 k
Q qh sk
(the two rotations about the middle moment of
The first terms in both formulas are the so-called inertia axis of the body), and four heteroclinic orbits
geometric phases because they carry only geometric connecting the two saddles.
information given by the connection, whereas the Suppose that (t) is a periodic orbit on the sphere
second terms are called the dynamic phases since S2kk with period T. After time T, by how much has
they encapsulate information directly linked to the the rigid body rotated in space? The answer to this
Hamiltonian. The expression of the total phase as a question follows directly from [3]. Taking
= =kk
sum of a geometric and a dynamic phase is not and the potential v 0 we get
intrinsic and is connection dependent. It can even
2h T
happen that one of these summands vanishes. We
shall consider now two concrete examples: the free kk
ZZ
rigid body and the heavy top. 2kIsk2 s Istr I
ds
D s Is2
Reconstruction Phases for the Free Rigid Body Z T
ds
kk3
The motion of the free rigid body is a geodesic with 0 s Is
respect to a left-invariant Riemannian metric on
where D is one of the two spherical caps on S2kk
SO(3) given by the moment of inertia of the body.
whose boundary is the periodic orbit (t), h is the
The phase space of the free rigid body motion is
value of the total energy on the solution (t), and
T SO(3) and a momentum map J : T SO(3) ! R 3 of
is the oriented solid angle, that is,
the lift of left translation to the cotangent bundle is Z Z
given by right translation to the identity element. 1 areaD
: ! ; jj
We have identified here so(3) with R3 by the kk D kk2
Lie algebra isomorphism x 2 (R3 , ) 7! x ^ 2 (so(3),
[ , ]), where x^(y) = x y, and so(3) with R3 by
the inner product on R 3 . The reduced manifold
Reconstruction Phases for the Heavy Top
J 1 ()=G is identified with the sphere S2kk in R3 of
radius kk with the symplectic form ! = dS=kk, The heavy top is a simple mechanical system with
where dS is the standard area form on S2kk and G symmetry S1 on T SO(3) whose Hamiltonian function
S1 is the group of rotations around the axis . These is given by h(h ) := (1=2)k]h k2 Mgk h, where
concentric spheres are the coadjoint orbits of the Lie h 2 SO(3), h 2 Th SO(3), k is the unit vector of the
Poisson space so(3) and represent the level sets of the spatial Oz axis (pointing in the direction opposite to
that of the gravity force), M 2 R is the total mass of the action. The leaves of this Poisson manifold are the
body, g 2 R is the value of the gravitational accelera- orbit reduced spaces J 1 (O )=G, where O g is
tion, the fixed point about which the body moves is the the coadjoint G-orbit through 2 g (see Symmetry
origin, and is the unit vector of the straight line and Symplectic Reduction). Is there an explicit
segment of length connecting the origin to the center formula for this reduced Poisson bracket on a
of mass of the body. This Hamiltonian is left invariant manifold diffeomorphic to (T Q)=G? It turns out
under rotations about the spatial Oz axis. A momen- that this question has two possible answers, once a
tum map induced by this S1 -action is given by connection on the principal bundle : Q ! Q=G is
J : T SO(3) ! R, J(h ) = Te Lh (h ) k; recall that introduced. The discussion below will also link to
Te Lh (h ) =: 2 R3 is the body angular momentum. the fibration version of cotangent bundle reduction.
The reduced space J 1 ()=S1 is generically the cotan- In order to present these answers, we review two
gent bundle of the unit sphere endowed with the bundle constructions. Let G act freely and properly
symplectic structure given by the sum of the canonical on the manifold P and consider the a (left) principal
form plus a magnetic term; equivalently, this is the G-bundle : P ! P=G := M. Let : N ! M be a
coadjoint orbit in the dual of the Euclidean Lie algebra surjective submersion. Then the pullback bundle
se(3) = R 3 R 3 given by O = {(, ) j = , : (n, p) 2 P~ := {(n, p) 2 N P j (p) = (n)} 7! n 2 N
kk2 = 1}. The projection map J 1 () ! O imple- over N is also a principal (left) G-bundle relative to
menting the symplectic diffeomorphism between the the action g (n, p) := (n, g p).
reduced space and the coadjoint orbit in se(3) is If there is a (left) G-action on a manifold V, then the
given by h 7! (, ) := (Te Lh (h ), h1 k). The orbit diagonal G-action g (p, v) = (g p, g v) on P V is
symplectic form ! on O has the expression also free and proper and one can form the asso-
! (, )(( x y, x), ( x0 y0 , ciated bundle P G V := (P V)=G which is a
x0 )) = (x x0 ) (x y0 x0 y) for any locally trivial fiber bundle E : [p, v] 2 E := P G
x, x0 , y, y0 2 R3 . The heavy-top equations = V 7! (p) 2 M over M with fibers diffeomorphic to
Mg , = are LiePoisson equations on V. Analogously, one can form the associated fiber
se(3) for the Hamiltonian h(, ) = (1=2) bundle E~ : E ~ := P
~ G V ! N. Summarizing, the
Mg and the LiePoisson bracket {f , g}(, ) = associated bundle E ~ =P
~ G V ! N is obtained
(r f r g) (r f r g r g r f ), from the principal bundle : P ! M, the surjective
where r and r denote the partial gradients. submersion : N ! M, and the G-manifold V by
Let ((t), (t)) be a periodic orbit of period T of pullback and association, in this order.
the heavy-top equations. After time T, by how much These operations can be reversed. First, form the
has the heavy top rotated in space? The answer is associated bundle E : E = P G V ! M and then
provided by [3]: pull it back by the surjective submersion : N ! M
ZZ Z T to N to get the pullback bundle ~E : E ~ ! N. The map
1 1 ~ ~
: P G V ! E defined by ([(n, p), v]) := (n, [p, v])
! 2h T 2Mg s ds
D 0 is an isomorphism of locally trivial fiber bundles.
ZZ
2kIsk2 s Istr I These general considerations will be used now to
ds realize the quotient Poisson manifold (T Q)=G in
D s Is2
Z T two different ways. Let Q be a manifold and G a Lie
ds group (with Lie algebra g ) acting freely and properly
0 s Is on it. Let A 2 1 (Q; g ) be a connection 1-form on
where D is the spherical cap on the unit sphere the left G-principal bundle : Q ! Q=G. Pull back
whose boundary is the closed curve (t) and D is a the G-bundle : Q ! Q=G by the cotangent bundle
two-dimensional submanifold of the orbit O projection Q=G : T (Q=G) ! Q=G to T (Q=G) to
obtain the G-principal bundle ~Q=G : ([q] , q) 2 Q ~ :=
bounded by the closed integral curve ((t), (t)).
The first terms in each summand represent the {([q] , q) j [q] = (q), q 2 Q} 7! [q] 2 T (Q=G). This
geometric phase and the second terms the dynamic bundle is isomorphic to the annihilator (VQ)
phase. T Q of the vertical bundle VQ := ker T TQ.
Next, form the coadjoint bundle S : S := Q ~ G
~
g ! T (Q=G) of Q, S (([q] , q), ) = [q] , that is,
Gauged Poisson Structures
the associated vector bundle to the G-principal
If the Lie group G acts freely and properly on a bundle Q ~ ! T (Q=G) given by the coadjoint repres-
smooth manifold Q, then (T Q)=G is a quotient entation of G on g . The connection-dependent map
Poisson manifold (see Poisson Reduction), where the A : S ! (T Q)=G defined by A ([([q] , q), ]) :=
quotient is taken relative to the (left) lifted cotangent [Tq ([q] ) A(q) ], where q 2 Q, q 2 Tq Q, and
2 g , is a vector bundle isomorphism over Q=G. by re W f (w)(v ) = df (w)(v , T(q, ) Qg (horq
A [q] [q]
The Sternberg space is the Poisson manifold (S, { , }S ), (T[q] Q=G (v[q] )), 0)) where Qg : Q g !
where { , }S is the pullback to S by A of the quotient Q G g = e g is the orbit map. The symbol r eW
A
Poisson bracket on (T Q)=G. signifies that this is a covariant derivative on the
Next, we proceed in the opposite order. Construct pullback bundle W induced by the covariant
first the coadjoint bundle ~g : [q, ] 2 e g := Q G derivative rA on the coadjoint bundle e g . This
g 7! [q] 2 Q=G associated to the principal bundle covariant derivative rA is induced on e g by the
: Q ! Q=G and then pull it back by the cotangent connection A.
bundle projection Q=G : T (Q=G) ! Q=G to
For f 2 C1 (W), we have dSA~ (f ) = (r e W f ) .
A
T (Q=G) to obtain the vector bundle W : W :=
To write the two gauged Poisson brackets on S and
{([q] , [q, ]) j Q=G ([q] ) = ~g ([q, ]) = [q]}, W ([q] ,
on W explicitly, we denote by ~g = Q G g the
[q, ]) = [q] over T (Q=G). Note that W = T
adjoint bundle of : Q ! Q=G, by Q=G the
(Q=G) e g and hence W is also a vector bundle over
canonical symplectic structure on T (Q=G), by
Q/G. Let HQ be the horizontal sub-bundle defined by
B 2 2 (Q; g ) the curvature of A, and by B the
the connection A; thus, TQ = HQ VQ, where
~g -valued 2-form B 2 2 (Q=G; ~g ) on the base Q=G
Hq Q := ker A(q). For each q 2 Q, the linear map
defined by B([q])(u[q] , v[q] ) = [q, B(q)(uq , vq )], for any
Tq jHq Q : Hq Q ! T[q] (Q=G) is an isomorphism. Let
uq , vq 2 Tq Q that satisfy Tq (uq ) = u[q] and
horq := (Tq jHq Q )1 : T[q] (Q=G) ! Hq Q Tq Q be
Tq (vq ) = v[q] . Note that both S and W are Lie
the horizontal lift operator induced by the connection
algebra bundles, that is, their fibers are Lie algebras
A. Thus, horq : Tq Q ! T[q]
(Q=G) is a linear surjective
and the fiberwise Lie bracket operation depends
map whose kernel is the annihilator (Hq Q) of the
smoothly on the base point. If f 2 C1 (S), denote by
horizontal space. The connection-dependent map ~ G g the usual fiber derivative of f.
f =
s 2 S = Q
A : (T Q)=G ! W defined by A ([q ]) := (horq
Similarly, if f 2 C1 (W) denote by
f =
w 2 W the
(q ), [q, J(q )]), where q 2 Q, q 2 Tq Q, and J : T
usual fiber derivative of f. Finally, ] : T
Q ! g is the momentum map of the lifted action,
(T (Q=G)) ! T(T (Q=G)) is the vector bundle iso-
h J(q ), i = q ((Q (q)) for 2 g , is a vector bundle
morphism induced by Q=G . The Poisson bracket of
isomorphism over Q/G and A A = . The Wein-
f , g 2 C1 (S) is given by
stein space is the Poisson manifold (W, { , }W ), where
{ , }W is the push-forward by A of the Poisson
ff ; ggS s Q=G q d SA~f s] ; d SA~gs]
bracket of (T Q)=G. In particular, : S ! W is a
connection independent Poisson diffeomorphism. The
f
g
Poisson brackets on S and on W are called gauged s; ;
s
s
Poisson brackets. They are expressed explicitly in terms D E
of various covariant derivatives induced on S and on v; Q=G Bq d SA~f s] ; d SA~gs]
W by the connection A 2 1 (Q; g ).
Recall that the connection A on the principal g . The Poisson bracket f , g 2
where v = [q, ] 2 e
bundle : Q ! Q=G naturally induces connections C1 (W) is given by
on pullback bundles and affine connections on W
associated vector bundles. Thus, both S and W ff ; ggW w Q=G q r e W gw]
e f w] ; r
A A
carry covariant derivatives induced by A. They are
f
g
given, according to general definitions, in the cases w; ;
w
w
under consideration, by: D W E
v; Q=G Bq r e f w] ; re W gw]
A A
If f 2 C1 (W), w = ([q] , [q, ]) 2 W, and v[q] 2 T[q] on T Q is by cotangent lift and on Q ~ g is
T (Q=G), then r e W f (w) 2 T T (Q=G) is defined
g (([q] , q), ) = (([q] , g q), Adg1 ). The pullback J A
A [q]
scaling. In an analogy with first- and second-order phase transitions in statistical mechanics, the critical phenomena with a finite mass at the black hole threshold are called type I, and the critical phenomena with power-law scaling of the mass are called type II.

At this point, we characterize the degree of rigor of the various parts of the theory that is summarized in this article. Critical phenomena were discovered in the numerical time evolution of generic asymptotically flat initial data. Numerical evolution of many elements of a specific one-parameter family, and fine-tuning to the black hole threshold along that family, showed self-similarity and mass scaling near the threshold. Doing this for a number of randomly chosen one-parameter families suggests that these phenomena, and in particular the echoing scale $\Delta$ and mass-scaling exponent $\gamma$, are universal between initial data within one model (e.g., the spherical scalar field). Numerical experiments, however, can only explore a finite-dimensional subspace of the infinite-dimensional space of initial data (phase space) of the field theory, and so cannot prove universality.

We go further by applying the theory of dynamical systems to general relativity. The arguments summarized in the next section would be difficult to make rigorous, as the dynamical system under consideration is infinite dimensional, but they suggest a focus on fixed points of the dynamical system and their linear perturbations. Even though the dynamical systems motivation is not mathematically rigorous, the linearized analysis itself is a well-defined problem that can be solved numerically to essentially arbitrary precision. This proves universality on a perturbative level, and provides numerical values of $\Delta$ and $\gamma$. A combination of the global dynamical systems analysis and perturbative analysis even predicts further critical exponents for black hole charge and angular momentum. Finally, critical phenomena have been discovered in a number of systems (different types of matter and symmetry restrictions), and this suggests that they may be generic for some large class of field theories (although details such as the numerical values of $\Delta$ and $\gamma$ do depend on the system), but there is no conclusive evidence for this at present.

The Dynamical Systems Picture

When we consider general relativity as an infinite-dimensional dynamical system, a solution curve is a spacetime. Points along the curve are Cauchy surfaces in the spacetime, which can be thought of as moments of time. An important difference between general relativity and other field theories is that the same spacetime can be sliced in many different ways, none of which is preferred. Therefore, to turn general relativity into a dynamical system, one has to fix a slicing (and in practice also coordinates on each slice). In the example of the spherically symmetric massless scalar field, using polar slicing and an area radial coordinate r, a point in phase space can be characterized by the two functions

    $Z = \left\{ \phi(r),\; r\,\frac{\partial\phi}{\partial t}(r) \right\}$    [5]

In spherical symmetry, there are no degrees of freedom in the gravitational field, and Cauchy data for the metric can be reconstructed from Z using the Einstein constraints.

The phase space consists of two halves: initial data whose time evolution always remains regular, and data which contain a black hole or form one during time evolution. The numerical evidence collected from individual one-parameter families of data suggests that the black hole threshold that separates the two is a smooth hypersurface. The mass-scaling law [1] can, therefore, be restated without explicit reference to one-parameter families. Let P be any function on phase space such that data sets with P > 0 form black holes, and data with P < 0 do not, and which is analytic in a neighborhood of the black hole threshold P = 0. The black hole mass as a function on phase space is then given by

    $M \simeq F(P)\, P^{\gamma}$    [6]

for P > 0, where F(P) > 0 is an analytic function.

Consider now the time evolution in this dynamical system, near the threshold (critical surface) between black hole formation and dispersion. A phase-space trajectory that starts out in a critical surface by definition never leaves it. A critical surface is, therefore, a dynamical system in its own right, with one dimension fewer. If it has an attracting fixed point, such a point is called a critical point. It is an attractor of codimension 1, and the critical surface is its basin of attraction. The fact that the critical solution is an attractor of codimension 1 is visible in its linear perturbations: it has an infinite number of decaying perturbation modes tangential to (and spanning) the critical surface, and a single growing mode not tangential to the critical surface.

Any trajectory beginning near the critical surface, but not necessarily near the critical point, moves almost parallel to the critical surface toward the critical point. As the phase point approaches the critical point, its movement parallel to the surface
670 Critical Phenomena in Gravitational Collapse
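The linearized picture of a critical point with a single growing mode can be illustrated with a toy model. The sketch below is not taken from the article. It assumes a two-dimensional linear flow with one growing mode of rate `lam0` transverse to the critical surface and one decaying mode of rate `mu` within it; the function names, the rates, and the threshold `eps` are hypothetical choices made only for illustration. Under these assumptions, a trajectory that starts a distance |p - p*| from the critical surface stays within `eps` of it for a time ln(eps/|p - p*|)/lam0, and the side on which it leaves depends only on the sign of p - p*.

```python
import math


def evolve(p_minus_pstar, y0, lam0, mu, t):
    """Exact solution of the toy linear flow dx/dt = lam0*x, dy/dt = -mu*y.

    x is the coordinate along the single growing mode (x = 0 is the
    critical surface); y is a coordinate within the critical surface.
    """
    x = p_minus_pstar * math.exp(lam0 * t)
    y = y0 * math.exp(-mu * t)
    return x, y


def time_near_critical_point(p_minus_pstar, lam0, eps=1e-2):
    """Time at which |x| reaches eps (assumes 0 < |p - p*| < eps)."""
    return math.log(eps / abs(p_minus_pstar)) / lam0


if __name__ == "__main__":
    lam0, mu = 2.0, 1.0  # assumed illustrative rates, not values from the article
    for dp in (1e-4, 1e-8, -1e-8):
        t_exit = time_near_critical_point(dp, lam0)
        x, y = evolve(dp, 1.0, lam0, mu, t_exit)
        # Which sign corresponds to black hole formation is a convention.
        side = "positive side" if x > 0 else "negative side"
        print(f"p - p* = {dp:+.0e}: exit time {t_exit:.3f}, "
              f"x = {x:+.3e}, y = {y:.3e} ({side})")
```

In this toy flow, the in-surface coordinate y has decayed by a factor exp(-mu * t_exit) by the time the trajectory leaves, so better fine-tuning erases more of the initial data apart from the sign and size of p - p*.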
slows down, while its distance and velocity out of the critical surface are still small. The phase point spends some time moving slowly near the critical point. Eventually, it moves away from the critical point in the direction of the growing mode, and ends up on an attracting fixed point.

[Figure 1: phase-space diagram. Labels: flat space fixed point; black hole threshold; critical point; one-parameter family of initial data with p < p*, p = p*, p > p*; black hole fixed point.]

Figure 1 The phase-space picture for the black hole threshold in the presence of a critical point. The arrow lines are time evolutions, corresponding to spacetimes. The line without an arrow is not a time evolution, but a one-parameter family of initial data that crosses the black hole threshold at p = p*. (Reproduced with permission from Gundlach C (2003) Critical phenomena in gravitational collapse. Physics Reports 376: 339–405.)

This is the origin of universality: any initial data set that is close to the black hole threshold (on either side) evolves to a spacetime that approximates the critical spacetime for some time. When it finally approaches either the dispersion fixed point or the black hole fixed point, it does so on a trajectory that appears to be coming from the critical point itself. All near-critical solutions pass through one of these two funnels. All details of the initial data have been forgotten, except for the distance from the black hole threshold: the closer the initial phase point is to the critical surface, the more closely the solution curve approaches the critical point, and the longer it will remain close to it.

In all systems that have been examined, the black hole threshold contains at least one critical point. A fixed point of the dynamical system represents a spacetime with an additional continuous symmetry that generic solutions do not have. If the critical spacetime is time independent in the usual sense, we have type I critical phenomena; if the symmetry is scale invariance, we have type II critical phenomena. The attractor within the critical surface may also be a limit cycle, rather than a fixed point. In spacetime terms this corresponds to a discrete symmetry (DSS rather than CSS in type II, or a pulsating critical solution, rather than a stationary one, in type I).

Self-Similarity and Mass Scaling

Type II critical phenomena occur where the critical solution is scale invariant (self-similar, CSS or DSS). Using suitable spacetime coordinates, a CSS solution can be characterized as independent of a time coordinate $\tau$ which is also a logarithmic scale. Similarly, a DSS solution can be characterized as periodic in $\tau$. For example, starting from the scale periodicity [3] in polar-radial coordinates, we replace r and t by new coordinates

    $x := \frac{r}{t_* - t}, \qquad \tau := -\ln\frac{t_* - t}{L}$    [7]

where the accumulation time $t_*$ and scale L must be matched to the one-parameter family under consideration. $\tau$ has been defined so that it increases as t increases and approaches $t_*$ from below. It is useful to think of r, t, and L as having dimension length in units c = G = 1, and of x and $\tau$ as dimensionless. Choptuik's observation, expressed in these coordinates, is that in any near-critical solution there is a spacetime region where the fields Z are well approximated by the critical solution, or

    $Z(x, \tau) \simeq Z_*(x, \tau)$    [8]

with

    $Z_*(x, \tau + \Delta) = Z_*(x, \tau)$    [9]

Note that the time parameter of the dynamical system must be chosen as $\tau$ if a CSS solution is to be a fixed point, or a DSS solution a cycle. More generally (going beyond spherical symmetry), on any self-similar spacetime one can introduce coordinates $x^\mu = (\tau, x^1, x^2, x^3)$ in which the metric is of the form

    $g_{\mu\nu} = e^{-2\tau}\, \bar g_{\mu\nu}$    [10]

and where $\bar g_{\mu\nu}$ is independent of $\tau$ for a CSS spacetime, and periodic in $\tau$ for a DSS spacetime. These coordinates are not unique.

The critical exponent $\gamma$ can be calculated from the linear perturbations of the critical solution. In order to keep the notation simple, the discussion will be restricted to a critical solution that is spherically symmetric and CSS, which is correct, for example, for perfect-fluid matter.

Let us assume that we have fine-tuned initial data close to the black hole threshold so that in a region the resulting spacetime is well approximated by the CSS critical solution. This part of the spacetime
corresponds to the section of the phase-space trajectory that lingers near the critical point. In this region, we can linearize around $Z_*$. As $Z_*$ does not depend on $\tau$, its linear perturbations can depend on $\tau$ only exponentially. Labeling the perturbation modes by i, a single mode perturbation is of the form

    $\delta Z = C_i\, e^{\lambda_i \tau}\, Z_i(x)$    [11]

In the near-critical regime, we can therefore approximate the solution as

    $Z(x, \tau) \simeq Z_*(x) + \sum_{i=0}^{\infty} C_i(p)\, e^{\lambda_i \tau}\, Z_i(x)$    [12]

The notation $C_i(p)$ is used because the perturbation amplitudes $C_i$ depend on the initial data, and hence on the parameter p that controls the initial data. If $Z_*$ is a critical solution, by definition there is exactly one $\lambda_i$ with positive real part (in fact, it is purely real), say $\lambda_0$. As $t \to t_*$ from below, which corresponds to $\tau \to \infty$, all other perturbations decay and can be neglected. By definition, the critical solution corresponds to $p = p_*$, and so we must have $C_0(p_*) = 0$. Linearizing around $p_*$, we obtain

    $Z(x, \tau) \simeq Z_*(x) + \frac{dC_0}{dp}\Big|_{p_*} (p - p_*)\, e^{\lambda_0 \tau}\, Z_0(x)$    [13]

in a region of the spacetime.

Now we extract Cauchy data at one particular value of $\tau$ within that region, namely at $\tau_p$ defined by

    $\frac{dC_0}{dp}\Big|_{p_*} |p - p_*|\, e^{\lambda_0 \tau_p} =: \epsilon$    [14]

where $\epsilon$ is an arbitrary small constant, so that

    $Z(x, \tau_p) \simeq Z_*(x) \pm \epsilon\, Z_0(x)$    [15]

where $\pm$ is the sign of $p - p_*$, left behind because by definition $\epsilon$ is positive. As $\tau$ increases from $\tau_p$, the growing perturbation becomes nonlinear and the approximation [13] breaks down. Then either a black hole forms (say for the positive sign), or the solution disperses (for the negative sign). We need not follow this nonlinear evolution in detail to find the black hole mass scaling in the former case: dimensional analysis is sufficient. Going back to coordinates t and r, we have

    $Z(r, t_p) \simeq Z_*\!\left(\frac{r}{L_p}\right) \pm \epsilon\, Z_0\!\left(\frac{r}{L_p}\right)$    [16]

where

    $L_p := L\, e^{-\tau_p}$    [17]

These Cauchy data at $t = t_p$ depend on the initial data at t = 0 only through the overall scale $L_p$, and through the sign in front of $\epsilon$. If the field equations themselves are scale invariant, or asymptotically scale invariant at scales $L_p$ and smaller, the black hole mass, which has dimensions of length in gravitational units, must be proportional to the initial data scale $L_p$, the only length scale that is present. Therefore,

    $M \propto L_p \propto (p - p_*)^{1/\lambda_0}$    [18]

and we have found the critical exponent to be $\gamma = 1/\lambda_0$.

The Analogy with Statistical Mechanics

The existence of a threshold where a qualitative change takes place, universality, scale invariance, and critical exponents suggest that there is a mathematical analogy between type II critical phenomena and critical phase transitions in statistical mechanics.

In equilibrium statistical mechanics, observable macroscopic quantities, such as the magnetization of a ferromagnetic material, are derived as statistical averages over microstates of the system. The expected value of an observable is

    $\langle A \rangle = \sum_{\text{microstates}} A(\text{microstate})\, e^{-H(\text{microstate},\, \mu)}$    [19]

The Hamiltonian H depends on the parameters $\mu$, which comprise the temperature, parameters characterizing the system such as interaction energies of the constituent molecules, and macroscopic forces such as the external magnetic field. The objective of statistical mechanics is to derive relations between the macroscopic quantities A and parameters $\mu$.

Phase transitions in thermodynamics are thresholds in the space of external forces at which the macroscopic observables A, or one of their derivatives, change discontinuously. In a ferromagnetic material at high temperatures, the magnetization m of the material (alignment of atomic spins) is determined by the external magnetic field B. At low temperatures, the material shows a spontaneous magnetization even at zero external field, which breaks rotational symmetry. With increasing temperature, the spontaneous magnetization m decreases and vanishes at the Curie temperature $T_*$ as

    $|m| \sim (T_* - T)^{\gamma}$    [20]

In the presence of a very weak external field, the spontaneous magnetization aligns itself with the external field B, while its strength is, to leading order, independent of B. The function m(B, T),
therefore, changes discontinuously at $B = 0$. The line $B = 0$ for $T < T_*$ is, therefore, a line of first-order phase transitions between the possible directions of the spontaneous magnetization (in a one-dimensional system, between $m$ up and $m$ down). This line ends at the critical point ($B = 0$, $T = T_*$) where the order parameter $|m|$ vanishes. The role of $B = 0$ as the critical value of $B$ is obscured by the fact that $B = 0$ is singled out by symmetry.

A critical phase transition involves scale-invariant physics. One sign of this is that fluctuations appear on a large range of length scales between the underlying atomic scale and the scale of the sample. In particular, the atomic scale, and any dimensionful parameters associated with that scale, must become irrelevant at the critical point. This can be taken as the starting point for obtaining properties of the system at the critical point.

One first defines a semigroup acting on microstates: the renormalization group. Its action is to group together a small number of particles as a single particle of a fictitious new system, using some averaging procedure. Alternatively, this can also be done in Fourier space. One then defines a dual action of the renormalization group on the space of Hamiltonians by demanding that the partition function is invariant under the renormalization group action:

$$ \sum_{\text{microstates}} e^{-H} = \sum_{\text{microstates}'} e^{-H'} \qquad [21] $$

The renormalized Hamiltonian $H'$ is in general more complicated than the original one, but it can be approximated by a fixed expression where only a finite number of parameters $\mu$ are adjusted. Fixed points of the renormalization group correspond to Hamiltonians with the parameters $\mu$ at their critical values. The critical value of any dimensional parameter must be zero (or infinity). Only dimensionless combinations can have nontrivial critical values.

The behavior of thermodynamical quantities at the critical point is in general not trivial to calculate. But the action of the renormalization group on length scales is given by its definition. The blowup of the correlation length $\xi$ at the critical point is, therefore, the easiest critical exponent to calculate.

We make contact with critical phenomena in gravitational collapse by considering the time evolution in coordinates $(\tau, x)$ as a renormalization group action. The calculation of the critical exponent for the black hole mass $M$ is the precise analog of the calculation of the critical exponent for the correlation length $\xi$, substituting $T_* - T$ for $p - p_*$, and taking into account that the $\tau$-evolution in critical collapse is toward smaller scales, while the renormalization group flow goes toward larger scales: therefore, $\xi$ diverges at the critical point, while $M$ vanishes.

We have shown above that the black hole mass is controlled by one global function $P$ on phase space. Clearly, $P$ is the gravity equivalent of $T_* - T$ in the ferromagnet. But it is tempting to speculate (Gundlach 2002) that there is also a gravity equivalent of the external magnetic field $B$, which gives rise to a second independent critical exponent. At least in some situations, the angular momentum of the initial data can play this role. Note that, like $B$, angular momentum is a vector, with a critical value that is zero because all other values break rotational symmetry. Furthermore, the final black hole can have nonvanishing angular momentum, which must depend on the angular momentum of the initial data. The former is analogous to the magnetization $m$, the latter to the external field $B$. It can be shown that this analogy holds perturbatively for small angular momentum. Future numerical simulations will show if it goes further.

Universality and Cosmic Censorship

Critical phenomena in gravitational collapse first generated interest because a complicated self-similar structure and dimensionless numbers $\Delta$ and $\gamma$ arise from generic initial data evolved by quite simple field equations. Another point of interest is the rather detailed analogy of phenomena in a deterministic field theory with critical phase transitions in statistical mechanics. But critical phenomena are important for general relativity mostly for a different reason.

Black holes are among the most important solutions of general relativity because of their universality: the black hole uniqueness theorems state that stable black holes are completely determined by their mass, angular momentum, and electric charge (the Kerr–Newman family of black holes). Perturbation theory shows that any perturbations of black holes from the Kerr–Newman solutions must be radiated away.

Critical solutions have a similar importance because they are generic intermediate states of the evolution that are also independent of the initial data. An important distinction is that critical solutions depend on the matter model, and are therefore less universal than black holes. However, critical phenomena in gravitational collapse seem to arise in axisymmetric vacuum spacetimes, and so are apparently not linked to the
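
The scaling argument of eqns [14] and [17] can be illustrated numerically. The following is a minimal sketch, not taken from the article: it solves [14] for $\tau_p$, evaluates $L_p$ from [17], and fits the slope of $\ln L_p$ against $\ln(p - p_*)$. The function names, the default values of `dC0_dp`, `eps` and `L`, and the value of `LAMBDA_0` are hypothetical and chosen only for illustration. If the black hole mass is proportional to $L_p$, as the dimensional analysis suggests, the fitted slope is the mass-scaling exponent, and it should come out as $1/\lambda_0$.

```python
import math

def tau_p(p, p_star, lam0, dC0_dp=1.0, eps=1e-3):
    """Solve eq. [14] for tau_p: |dC0/dp| * |p - p_star| * exp(lam0 * tau_p) = eps."""
    return math.log(eps / (abs(dC0_dp) * abs(p - p_star))) / lam0

def length_scale(p, p_star, lam0, L=1.0, dC0_dp=1.0, eps=1e-3):
    """Eq. [17]: L_p = L * exp(-tau_p)."""
    return L * math.exp(-tau_p(p, p_star, lam0, dC0_dp, eps))

def fitted_exponent(lam0, p_star=0.0, offsets=(1e-8, 1e-7, 1e-6, 1e-5, 1e-4)):
    """Least-squares slope of ln(L_p) against ln(p - p_star)."""
    xs = [math.log(d) for d in offsets]
    ys = [math.log(length_scale(p_star + d, p_star, lam0)) for d in offsets]
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den

LAMBDA_0 = 2.674  # hypothetical growth rate, chosen only for illustration
gamma = fitted_exponent(LAMBDA_0)
print(gamma, 1.0 / LAMBDA_0)
```

The arbitrary constants `eps`, `dC0_dp` and `L` shift $\ln L_p$ by a constant but do not affect the slope, which is why the exponent is universal while the overall mass scale is not.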
Current Algebra
G A Goldin, Rutgers University, Piscataway, NJ, USA
© 2006 Elsevier Ltd. All rights reserved.

Introduction

Certain commutation relations among the current density operators in quantum field theories define an infinite-dimensional Lie algebra. The original current algebra of Gell-Mann described weak and electromagnetic currents of the strongly interacting particles (hadrons), leading to the Adler–Weisberger formula and other important physical results. This helped inspire mathematical and quantum-theoretic developments such as the Sugawara model, light cone currents, Virasoro algebra, the mathematical theory of affine Kac–Moody algebras, and nonrelativistic current algebra in quantum and statistical physics. Lie algebras of local currents may be the infinitesimal representations of loop groups, local current groups or gauge groups, diffeomorphism groups, and their semidirect products or other extensions. Broadly construed, current algebra thus leads directly into the representation theory of infinite-dimensional groups and algebras. Applications have ranged across conformally invariant field theory, vertex operator algebras, exactly solvable lattice and continuum models in statistical physics, exotic particle statistics and q-commutation relations, hydrodynamics and quantized vortex motion. This brief survey describes but a few highlights.

Relativistic Local Current Algebra for Hadrons

To model superfluidity, Landau had proposed in 1941 a quantum hydrodynamics fundamentally based on local fluid densities and currents as (operator) dynamical variables. However, current algebra came into its own in theoretical physics with the ideas of Gell-Mann in the early 1960s. The basic concept, in the era just preceding quantum chromodynamics (QCD), was that even without knowing the Lagrangian governing hadron dynamics in detail, exact kinematical information (the local symmetry) could still be encoded in an algebra of currents. The local (vector and axial vector) current density operators, expressed where possible in terms of underlying quantized field operators in Hilbert space, were to form two octets of Lorentz 4-vectors, with each octet corresponding to the eight generators of the compact Lie group SU(3).

More specifically (Adler and Dashen 1968), let $F^\mu_a(x)$, $a = 1, 2, \ldots, 8$, $\mu = 0, 1, 2, 3$, be an octet of hadronic vector currents, where as usual $x = (x^\mu) = (x^0, \mathbf{x})$ denotes a point in four-dimensional spacetime. Likewise, introduce an axial vector octet $F^{5\mu}_a(x)$. Unless otherwise specified, we use natural units, where $\hbar = 1$ and $c = 1$. Define the corresponding charges $F_a$ and $F^5_a$ to be the space integrals of the time components of these currents, that is,

$$ F_a(x^0) = \int d^3x\, F^0_a(x^0, \mathbf{x}), \qquad F^5_a(x^0) = \int d^3x\, F^{50}_a(x^0, \mathbf{x}) \qquad [1] $$

where $d^3x = dx^1\, dx^2\, dx^3$. Then $F_1$, $F_2$, $F_3$ are the three components $I_1$, $I_2$, $I_3$ of the isotopic spin, and $Y = (2\sqrt{3}/3) F_8$ is the hypercharge. The usual electromagnetic current $J^\mu_{\rm em}(x^0, \mathbf{x})$ is given by

$$ J^\mu_{\rm em} = q \left( F^\mu_3 + \frac{\sqrt{3}}{3} F^\mu_8 \right) \qquad [2] $$

where $q$ is the unit elementary charge, and the total charge is given by $Q = \int d^3x\, J^0_{\rm em}(x^0, \mathbf{x}) = q(I_3 + Y/2)$. The hadronic part of the weak current entering an effective Lagrangian can be written as

$$ J^\mu_w = \left[ (F^\mu_1 - F^{5\mu}_1) + i (F^\mu_2 - F^{5\mu}_2) \right] \cos\theta_C + \left[ (F^\mu_4 - F^{5\mu}_4) + i (F^\mu_5 - F^{5\mu}_5) \right] \sin\theta_C \qquad [3] $$

where $\theta_C$ is the Cabibbo angle (determined experimentally to be approximately 0.27 rad). The terms with $F^\mu_1 - F^{5\mu}_1$ and $F^\mu_2 - F^{5\mu}_2$ are strangeness conserving, those with $F^\mu_4 - F^{5\mu}_4$ and $F^\mu_5 - F^{5\mu}_5$ are not.

The main current algebra hypothesis is that the time components $F^0_a$ and $F^{50}_a$ of these octets satisfy the equal-time commutation relations:

$$ [F^0_a(x^0, \mathbf{x}), F^0_b(y^0, \mathbf{y})]_{x^0 = y^0} = i\, \delta^{(3)}(\mathbf{x} - \mathbf{y}) \sum_d c_{abd} F^0_d(x^0, \mathbf{x}) $$

$$ [F^0_a(x^0, \mathbf{x}), F^{50}_b(y^0, \mathbf{y})]_{x^0 = y^0} = i\, \delta^{(3)}(\mathbf{x} - \mathbf{y}) \sum_d c_{abd} F^{50}_d(x^0, \mathbf{x}) \qquad [4] $$

$$ [F^{50}_a(x^0, \mathbf{x}), F^{50}_b(y^0, \mathbf{y})]_{x^0 = y^0} = i\, \delta^{(3)}(\mathbf{x} - \mathbf{y}) \sum_d c_{abd} F^0_d(x^0, \mathbf{x}) $$

where the $c_{abd}$ are structure constants of the Lie algebra of SU(3), antisymmetric in the indices. Since current commutators relate bilinear expressions to
linear ones, they fix the normalizations of the currents. The chiral currents $F^{L\mu}_a = \tfrac{1}{2}(F^\mu_a - F^{5\mu}_a)$ and $F^{R\mu}_a = \tfrac{1}{2}(F^\mu_a + F^{5\mu}_a)$ commute with each other, so that the local current algebra decomposes into two independent pieces.

The Dirac $\delta$-functions in eqns [4] require that $F^0_a$ and $F^{50}_a$ be interpreted as (unbounded) operator-valued distributions; while the fixed-time condition suggests these should make mathematical sense as three-dimensional distributions, with $x^0$ held constant. Such distributions may be modeled on the test-function space $\mathcal{D}$ of real-valued, compactly supported, $C^\infty$ functions on the spacelike hyperplane $\mathbb{R}^3$. For functions $f_a, f^5_a \in \mathcal{D}$, one has formally the smeared currents that are expected to be bona fide (unbounded) operators in Hilbert space; suppressing $x^0$, write

$$ F^0_a(f_a) = \int_{\mathbb{R}^3} d^3x\, f_a(\mathbf{x})\, F^0_a(x^0, \mathbf{x}), \qquad F^{50}_a(f^5_a) = \int_{\mathbb{R}^3} d^3x\, f^5_a(\mathbf{x})\, F^{50}_a(x^0, \mathbf{x}) \qquad [5] $$

Equations [4] then become

$$ [F^0_a(f_a), F^0_b(f_b)] = [F^{50}_a(f_a), F^{50}_b(f_b)] = i \sum_d F^0_d(c_{abd} f_a f_b) $$

$$ [F^0_a(f_a), F^{50}_b(f_b)] = i \sum_d F^{50}_d(c_{abd} f_a f_b) \qquad [6] $$

Let $g(\mathbf{x})$ be a $C^\infty$ map from $\mathbb{R}^3$ to the Lie algebra $\mathcal{G}$ of

beyond an experimental test of the algebra of charges to test the actual local current algebra. Here, the prediction pertained to structure functions in the deep inelastic scattering of neutrinos. This was elaborated by Bjorken to inelastic electron scattering. On the theoretical side, the study of the chiral current in perturbation theory led into the theory of anomalies. All these ideas were highly influential in subsequent theoretical work (Treiman et al. 1985, Mickelsson 1989).

It is a natural idea to try to extend eqns [4] or [6], which elegantly express the combined ideas of locality and symmetry, to an equal-time commutator algebra that would also include the space components of the local currents $F^k_a$, $k = 1, 2, 3$. One may write without difficulty the commutators of the charges in [1] with these space components:

$$ [F_a(x^0), F^k_b(x^0, \mathbf{x})] = [F^5_a(x^0), F^{5k}_b(x^0, \mathbf{x})] = i \sum_d c_{abd} F^k_d(x^0, \mathbf{x}) $$

$$ [F_a(x^0), F^{5k}_b(x^0, \mathbf{x})] = [F^5_a(x^0), F^k_b(x^0, \mathbf{x})] = i \sum_d c_{abd} F^{5k}_d(x^0, \mathbf{x}) \qquad [7] $$

But the commutator of the local time component with the local space component of the current cannot be merely the obvious extrapolation from eqns [4] and [7], that is, it cannot be
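
The SU(3) charge algebra underlying eqns [4] and the relation $Q = q(I_3 + Y/2)$ can be checked in a small matrix model. The sketch below is illustrative only and rests on an assumption not made in the article: the charges are represented by $F_a = \lambda_a/2$, with $\lambda_a$ the Gell-Mann matrices of the three-dimensional defining representation, whereas the article's charges act on the hadronic Hilbert space. The helper names (`matmul`, `commutator`, `structure_constant`, `closure_error`) are hypothetical. The code extracts the structure constants $c_{abd}$ from the commutators, measures how well $[F_a, F_b] = i \sum_d c_{abd} F_d$ closes, and evaluates $Y = (2\sqrt{3}/3)F_8$ and $Q/q = I_3 + Y/2$.

```python
import math

I = 1j
S3 = 1 / math.sqrt(3)

# Gell-Mann matrices lambda_1 ... lambda_8 (stored at indices 0 ... 7).
GELL_MANN = [
    [[0, 1, 0], [1, 0, 0], [0, 0, 0]],
    [[0, -I, 0], [I, 0, 0], [0, 0, 0]],
    [[1, 0, 0], [0, -1, 0], [0, 0, 0]],
    [[0, 0, 1], [0, 0, 0], [1, 0, 0]],
    [[0, 0, -I], [0, 0, 0], [I, 0, 0]],
    [[0, 0, 0], [0, 0, 1], [0, 1, 0]],
    [[0, 0, 0], [0, 0, -I], [0, I, 0]],
    [[S3, 0, 0], [0, S3, 0], [0, 0, -2 * S3]],
]
# Assumed matrix model of the charges: F_a = lambda_a / 2.
F = [[[x / 2 for x in row] for row in lam] for lam in GELL_MANN]

def matmul(A, B):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def commutator(A, B):
    """Matrix commutator [A, B] = AB - BA."""
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(3)] for i in range(3)]

def structure_constant(a, b, d):
    """c_abd = -2i tr([F_a, F_b] F_d), using tr(F_a F_b) = delta_ab / 2."""
    M = matmul(commutator(F[a], F[b]), F[d])
    return (-2j * sum(M[i][i] for i in range(3))).real

c = [[[structure_constant(a, b, d) for d in range(8)] for b in range(8)]
     for a in range(8)]

def closure_error(a, b):
    """Largest entry of [F_a, F_b] - i sum_d c_abd F_d."""
    lhs = commutator(F[a], F[b])
    return max(
        abs(lhs[i][j] - sum(1j * c[a][b][d] * F[d][i][j] for d in range(8)))
        for i in range(3) for j in range(3)
    )

# Hypercharge Y = (2*sqrt(3)/3) F_8 and charge Q/q = I_3 + Y/2 with I_3 = F_3.
Y = [[(2 * math.sqrt(3) / 3) * x for x in row] for row in F[7]]
Q_over_q = [[F[2][i][j] + Y[i][j] / 2 for j in range(3)] for i in range(3)]
charges = [Q_over_q[i][i] for i in range(3)]
print(charges)
```

In this model the diagonal of $Q/q$ gives the charges of the three basis states in units of $q$, and the computed $c_{abd}$ can be inspected directly for the antisymmetry stated after eqns [4].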
field-theoretic model. Furthermore, when the number of spacetime dimensions is greater than $1 + 1$, the c-number Schwinger terms turn out to be infinite. Hence, we do not obtain this way a bona fide infinite-dimensional, equal-time commutator algebra comprising all the components of the local currents.

Sugawara, Kac–Moody, and Virasoro Algebras

Since equations such as [4] and [6] are not explicitly dependent on how the currents are constructed from underlying canonical fields, one has the possibility of writing a theory entirely in terms of self-adjoint currents as the dynamical variables, bypassing the field operators entirely, and expressing a Hamiltonian operator directly in terms of such local currents. This is in the spirit of approaches to quantum field theory based on local algebras of observables. It suggests consideration of relativistic current algebras with finite c-number or operator Schwinger terms in $s + 1$ dimensions, $s \geq 1$. The Sugawara model, which is of this type, turned out to be one of the most influential of those

Related to the Sugawara current algebra, with $s = 1$ and the spatial dimension compactified, are affine Kac–Moody and Virasoro algebras (Goddard and Olive 1986, Kac 1990). Consider the infinite-dimensional Lie algebra $\mathrm{map}(S^1, \mathcal{G})$ of smooth functions from the circle to $\mathcal{G}$ under the pointwise bracket. This is also called a loop algebra. Referring to the basis $F_a$, define $T_a^{(m)}$ for integer $m$ to be the Fourier function $\theta \mapsto F_a \exp[i m \theta]$. The pointwise bracket in $\mathrm{map}(S^1, \mathcal{G})$ gives $[T_a^{(m)}, T_b^{(n)}] = i \sum_d c_{abd} T_d^{(m+n)}$ for these generators. The corresponding (untwisted) affine Kac–Moody algebra is a (uniquely defined, nontrivial) one-dimensional central extension of this loop algebra; that is, the new generator commutes with all elements of the Lie algebra and, in an irreducible representation, must be a multiple of the identity. In such a representation, the new bracket can be written as

$$ [T_a^{(m)}, T_b^{(n)}] = i \sum_d c_{abd} T_d^{(m+n)} + k\, m\, \delta_{ab}\, \delta_{m,-n}\, I \qquad [10] $$

where $k$ is a constant. Here, $T_a^{(m=0)}$ is again a representation of $\mathcal{G}$. Self-adjointness of the local currents in the representation imposes the condition
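
The loop-algebra bracket $[T_a^{(m)}, T_b^{(n)}] = i \sum_d c_{abd} T_d^{(m+n)}$, which is the part of [10] without the central term, can be verified pointwise on the circle. The following minimal sketch makes several assumptions for the sake of a small example: it takes the Lie algebra to be su(2) rather than a general $\mathcal{G}$, with basis $F_a = \sigma_a/2$ and $c_{abd} = \epsilon_{abd}$, and it takes the Fourier mode to be $F_a e^{im\theta}$ (the opposite sign in the exponent gives the same bracket). All function names are hypothetical. The central extension itself is not modeled here.

```python
import cmath

# Assumed example: su(2) with basis F_a = sigma_a / 2 and c_abd = epsilon_abd.
PAULI = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]
F = [[[x / 2 for x in row] for row in s] for s in PAULI]

def epsilon(a, b, d):
    """Levi-Civita symbol on indices 0, 1, 2."""
    return (a - b) * (b - d) * (d - a) / 2

def T(a, m, theta):
    """Fourier mode T_a^(m) evaluated at angle theta: F_a * exp(i m theta)."""
    phase = cmath.exp(1j * m * theta)
    return [[phase * x for x in row] for row in F[a]]

def bracket(A, B):
    """Matrix commutator of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] - B[i][k] * A[k][j] for k in range(2))
             for j in range(2)] for i in range(2)]

def loop_bracket_error(a, m, b, n, theta):
    """Largest entry of [T_a^(m), T_b^(n)] - i sum_d c_abd T_d^(m+n) at theta."""
    lhs = bracket(T(a, m, theta), T(b, n, theta))
    rhs = [[sum(1j * epsilon(a, b, d) * T(d, m + n, theta)[i][j] for d in range(3))
            for j in range(2)] for i in range(2)]
    return max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))

# Scan a few generators, mode numbers, and sample points on the circle.
worst = max(
    loop_bracket_error(a, m, b, n, theta)
    for a in range(3) for b in range(3)
    for m in (-2, 0, 3) for n in (-1, 2)
    for theta in (0.0, 0.7, 2.5)
)
print(worst)
```

Because the bracket is taken pointwise, the phases simply multiply, which is why the mode numbers add on the right-hand side.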
The Kac–Moody and Virasoro algebras, both modeled on $S^1$, may be combined to form a natural semidirect sum of Lie algebras, with the additional bracket

$$ [T_a^{(m)}, L^{(n)}] = m\, T_a^{(m+n)} \qquad [13] $$

Roughly speaking, the Kac–Moody generators correspond to Fourier transforms of charge densities on $S^1$, whereas the Virasoro generators correspond to Fourier transforms of infinitesimal motions in $S^1$. The central extensions provide the finite, c-number Schwinger terms. These structures have important application to light cone current algebra, conformally invariant quantum field theories in $(1 + 1)$-dimensional spacetime, the quantum theory of strings, exactly solvable models in statistical mechanics, and many other domains.

Of greatest physical importance, both in quantum field theory and statistical mechanics, are those irreducible, self-adjoint representations of the Virasoro algebra known as highest weight representations, where the spectrum of the operator $L^{(m=0)}$ is bounded below. In these applications, one represents a pair of Virasoro algebras by mutually commuting sets of operators $L^{(m)}$ and $\bar{L}^{(m)}$. In the quantum theory, for example, one takes the total energy $H \propto \bar{L}^{(0)} + L^{(0)}$, and the total momentum $P \propto \bar{L}^{(0)} - L^{(0)}$. In a highest weight representation, there is a unique eigenstate of $L^{(0)}$ having the lowest eigenvalue $h$; for this vacuum $|h\rangle$, $L^{(m)} |h\rangle = 0$, $m > 0$.

Friedan, Qiu, and Shenker showed in 1984 that highest weight representations are characterized by a class of specific, non-negative values of the central charge $c$ and, correspondingly, of $h$: either $c \geq 1$ (and $h \geq 0$) or $c = 1 - 6(\ell + 2)^{-1}(\ell + 3)^{-1}$, $\ell = 1, 2, 3, \ldots$ (and $h$ assumes a corresponding, specified set of values for each value of $\ell$). In a beautiful application to the study of the critical behavior of well-known statistical systems, in which the generator of dilations is proportional to $\bar{L}^{(0)} + L^{(0)}$, they discovered a direct correspondence with permitted values of the central charge; thus, $c = 1/2$ for the Ising model, $c = 7/10$ for the tricritical Ising model, $c = 4/5$ for the three-state Potts model, and $c = 6/7$ for the tricritical three-state Potts model.

Current Algebras and Groups

Local current algebras may be exponentiated to obtain corresponding infinite-dimensional topological groups (Pressley and Segal 1986, Mickelsson 1989, Kac 1990). Let $G$ be a Lie group whose Lie algebra is $\mathcal{G}$. The algebra $\mathrm{map}_0(M, \mathcal{G})$, consisting of smooth, compactly supported $\mathcal{G}$-valued functions on $M$ under the pointwise bracket, exponentiates to the local current group $\mathrm{Map}_0(M, G)$, consisting of smooth maps from $M$ to $G$ that are the identity outside a compact set in $M$, under the pointwise group operation. When $M$ is taken to be the four-dimensional spacetime manifold (rather than a spacelike hyperplane), the local current group modeled on $M$ is mathematically a gauge group for nonabelian gauge field theory.

Likewise, the algebra $\mathrm{vect}_0(M)$ exponentiates to the group $\mathrm{Diff}_0(M)$ of compactly supported $C^\infty$ diffeomorphisms of $M$ (under composition). The Kac–Moody and Virasoro algebras exponentiate to central extensions of the loop group $\mathrm{Map}(S^1, G)$ and the diffeomorphism group $\mathrm{Diff}(S^1)$, respectively. The semidirect sums of the Lie algebras are the infinitesimal generators of semidirect products of the groups.

Under appropriate technical conditions, self-adjoint representations of current algebras generate (and may be obtained from) continuous unitary representations of the corresponding groups. The needed technical conditions have to do with the existence of a dense set of analytic vectors belonging to a common, dense invariant domain of essential self-adjointness for the currents.

Nonrelativistic Current Algebra

In nonrelativistic local current algebra, Schwinger terms do not appear. In 1968, Dashen and Sharp defined (at fixed time $t$, suppressed in the present notation) a mass density $\rho(\mathbf{x}) = m\, \psi^*(\mathbf{x}) \psi(\mathbf{x})$ and a momentum density $\mathbf{J}(\mathbf{x}) = (\hbar/2i) \{ \psi^*(\mathbf{x}) \nabla \psi(\mathbf{x}) - [\nabla \psi^*(\mathbf{x})] \psi(\mathbf{x}) \}$, where $\psi$ is a second-quantized canonical field; here we keep $\hbar$ in the notation. The resulting equal-time algebra is the semidirect sum:

$$ [\rho(\mathbf{x}), \rho(\mathbf{y})] = 0 $$

$$ [\rho(\mathbf{x}), J_k(\mathbf{y})] = -i\hbar\, \frac{\partial}{\partial x^k} \left[ \delta^{(3)}(\mathbf{x} - \mathbf{y})\, \rho(\mathbf{x}) \right] \qquad [14] $$

$$ [J_k(\mathbf{x}), J_\ell(\mathbf{y})] = i\hbar\, \frac{\partial}{\partial y^k} \left[ \delta^{(3)}(\mathbf{x} - \mathbf{y})\, J_\ell(\mathbf{y}) \right] - i\hbar\, \frac{\partial}{\partial x^\ell} \left[ \delta^{(3)}(\mathbf{x} - \mathbf{y})\, J_k(\mathbf{x}) \right] $$

Since this current algebra is independent of whether $\psi$ obeys commutation or anticommutation relations, the information as to particle statistics (Bose or Fermi) is not encoded in the Lie algebra itself but in the choice of its representation (up to unitary equivalence). Again interpreting $\rho$ and $J_k$ as operator-valued distributions, define $\rho(f) = \int_{\mathbb{R}^3} d^3x\, f(\mathbf{x}) \rho(\mathbf{x})$ and $J(\mathbf{g}) = \int_{\mathbb{R}^3} d^3x \sum_{k=1}^{3} g^k(\mathbf{x}) J_k(\mathbf{x})$, where $f$ and the