Deep Learning and Physics
By Akinori Tanaka, Akio Tomiya and Koji Hashimoto
About this ebook
In recent years, machine learning, including deep learning, has begun to be used in various physics studies. Why is that? Is knowing physics useful in machine learning? Conversely, is knowing machine learning useful in physics?
This book is devoted to answering these questions. Starting from basic ideas of physics, neural networks are derived naturally, and you can learn the concepts of deep learning in the language of physics.
In fact, the foundations of machine learning can be traced back to physical concepts. Hamiltonians that determine physical systems characterize various machine learning structures, and the statistical physics given by Hamiltonians defines machine learning with neural networks. Furthermore, solving inverse problems in physics through machine learning and generalization essentially provides progress, and even revolutions, in physics. For these reasons, interdisciplinary research in machine learning and physics has been expanding dramatically in recent years.
This book is written for anyone who wants to learn, understand, and apply the relationship between deep learning/machine learning and physics. All that is needed to read this book are the basic concepts in physics: energy and Hamiltonians. The concepts of statistical mechanics and the bracket notation of quantum mechanics, which are explained in columns, are used to explain deep learning frameworks.
We encourage you to explore this new active field of machine learning and physics, with this book as a map of the continent to be explored.
Book preview
Deep Learning and Physics - Akinori Tanaka
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
A. Tanaka et al., Deep Learning and Physics, Mathematical Physics Studies, https://doi.org/10.1007/978-981-33-6108-9_1
1. Forewords: Machine Learning and Physics
Akinori Tanaka¹ , Akio Tomiya² and Koji Hashimoto³
(1) iTHEMS, RIKEN, Wako, Saitama, Japan
(2) Radiation Lab, RIKEN, Wako, Saitama, Japan
(3) Department of Physics, Osaka University, Toyonaka, Osaka, Japan
Abstract
What is the relationship between machine learning and physics? Let us first experience why machine learning and physics can be related. There is a concept that bridges physics and machine learning: information. Physics and information theory have been intertwined for a long time. Machine learning, too, is built on information theory: learning is about transmitting information, reorganizing relationships between pieces of information, and discovering information spontaneously. Therefore, machine learning needs an information theory that deals flexibly with amounts of information, and as a result machine learning is closely related to the framework of information theory. This chapter explores the relationship between physics, information theory, and machine learning, the core concepts in this book.
What is the relationship between machine learning and physics? We will take a closer look at that throughout this book, but first let us experience why machine learning and physics can be related. There is a concept that bridges physics and machine learning: information.
Physics and information theory have been intertwined for a long time, and the relationship is still being developed widely and deeply. Machine learning, too, is built on information theory: learning is about transmitting information, reorganizing relationships between pieces of information, and discovering information spontaneously. Therefore, machine learning and deep learning need an information theory that deals flexibly with amounts of information, and as a result machine learning is closely related to the framework of information theory.
As the reader can imagine from these observations, machine learning and physics should have a deep relationship, with information as the intermediate medium. One of the goals of this book is to clarify this bridge. Figure 1.1 shows a conceptual diagram.
Fig. 1.1
Physics, machine learning, and information. Do they form a triangle?
This chapter explores the relationship between physics, information theory, and machine learning, the core concepts in this book. Let us explain how the two parts of the book's title, Deep Learning and Physics, are related.
1.1 Introduction to Information Theory
Quantifying information
This chapter explores the relationship between physics and machine learning using information theory as a bridge. For that, we need to define exactly what "information" is. What is "information" in the first place? First, read the following two sentences [3]:
$$\displaystyle \begin{aligned} & \bullet\ \text{Humans teach gorillas how to add numbers}. {} \end{aligned} $$(1.1)
$$\displaystyle \begin{aligned} & \bullet\ \text{Gorillas teach humans how to add numbers}. {} \end{aligned} $$(1.2)
Which of these two sentences can be said to carry more information? Sentence (1.1) is not surprising, if true, because it is possible, regardless of whether the gorilla really understands addition or not. You may have actually heard such news. On the other hand, if (1.2) is true, it is quite surprising. One then has to revise one's view of the world and accept that there could be a gorilla that can teach humans. This may be said to be a large increase in the amount of information. In other words [4],
$$\displaystyle \begin{aligned} \text{Amount of information}=\text{Extent of surprise}. \end{aligned} $$(1.3)
Let us go ahead with this policy anyway. The greater the surprise, the less likely the event is to happen. In addition, if we require that the amount of information be additive (the information of independent events adds up), then with P(event) the probability that an event will occur, we have¹
$$\displaystyle \begin{aligned} \text{Amount of information} = - \log P(\text{event}) \, . {} \end{aligned} $$(1.4)
When the probability is low, the amount of information is large.
Average amount of information
Let us further assume that various events A 1, A 2, …, A W occur with probabilities p 1, p 2, …, p W, respectively. At this time, the amount of information of each event is $$-\log p_i $$ , and its expectation value
$$\displaystyle \begin{aligned} S_{\text{information}} = - \sum_{i=1}^W p_i \log p_i {} \end{aligned} $$(1.5)
is called information entropy.² Let us take a concrete example of what information entropy represents. Suppose there are W boxes, and let p i be the probability that the ith box contains a treasure. Of course, we want to predict which box contains the treasure as accurately as possible, but the predictability depends on the values of p i. For example, if we know that the treasure is always in the first box, it is easy to predict. The value of the information entropy for this case is zero:
$$\displaystyle \begin{aligned} p_i = \left\{ \begin{array}{ll} 1 & (i=1) \\ 0 & (\text{other than that})\\ \end{array} \right. \quad S_{\text{information}} = 0 . \end{aligned} $$(1.6)
On the other hand, if the probabilities are completely uniform (every box is equally likely),
$$\displaystyle \begin{aligned} p_i = \frac{1}{W} \qquad S_{\text{information}} = \log W . {} \end{aligned} $$(1.7)
For this case, even if we know the probabilities, it is difficult to predict because we do not know which box the treasure is in. This time, the information entropy has a large value, $$ \log W $$ . In other words, the more difficult it is to predict, the greater the information entropy. Therefore, the relation to what is commonly referred to as "information" is as follows:
$$\displaystyle \begin{aligned} \text{Information (that we already have)} = \text{Ease of prediction} = -\ \text{Information entropy} \, . \end{aligned} $$(1.8)
The "information" here is the information we already have, while the amount of information (1.4) is the information obtained from observing the event.
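As a quick numerical check of (1.6) and (1.7), here is a minimal Python sketch (our own illustration, not from the book; it assumes NumPy is available, and the box number W = 8 is an arbitrary choice) that computes the information entropy (1.5) for the two treasure-box cases:

```python
import numpy as np

def information_entropy(p):
    """S = -sum_i p_i log p_i, with the convention 0 * log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nonzero = p > 0
    return -np.sum(p[nonzero] * np.log(p[nonzero]))

W = 8  # number of treasure boxes (arbitrary choice for this example)

# Case (1.6): the treasure is always in the first box -> entropy 0
p_certain = np.zeros(W)
p_certain[0] = 1.0
print(information_entropy(p_certain))             # 0.0

# Case (1.7): completely uniform probabilities -> entropy log W
p_uniform = np.full(W, 1.0 / W)
print(information_entropy(p_uniform), np.log(W))  # both approximately 2.079
```

The certain case gives entropy 0 and the uniform case gives log W, in agreement with (1.6) and (1.7).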
1.2 Physics and Information Theory
There are many important concepts in physics, but no physicist would dispute that one of the most important is entropy. Entropy is an essential concept in the development of thermodynamics, and in statistical mechanics it is expressed as $$ S = k_{\mathrm {B}}\log W $$, where W is the number of microscopic states. The proportionality coefficient k B is the Boltzmann constant and can be set to 1 if an appropriate temperature unit is used. The entropy of the system in this case is
$$\displaystyle \begin{aligned} S = \log W \, . {} \end{aligned} $$(1.9)
In physics, the relationship with information theory is revealed through this entropy: S information of (1.7) is exactly the same formula as (1.9). In this way, various discussions of physical systems with many degrees of freedom, such as those of thermodynamics and statistical mechanics, have a companion information-theoretic interpretation.
By the way, most of the interest in research in modern physics (e.g., particle physics and condensed matter physics) is in many-body systems. This suggests that information theory plays an important role at the forefront of modern physics research. Here are two recent studies.
Black hole information loss problem
Research by J. Bekenstein and S. Hawking [6, 7] shows theoretically that black holes have entropy and radiate their mass outward as heat. To explain the intrinsic problem hidden in this radiation property of black holes, the following example is instructive: Assume that there is a spring, and fix it while stretched. If this is thrown into a black hole, the black hole will grow larger by the stored energy, and then emit that energy as heat with the aforementioned thermal radiation. But here is the problem. Work can be freely extracted from the energy of the spring before it is thrown, but the second law of thermodynamics limits the energy efficiency that can be extracted from the thermal radiation. In other words, even if there is only a single state before the throwing (that is, the entropy is zero), the entropy increases by the randomness after thermal radiation. An increase in entropy means that information has been lost (see (1.8)). This is known as the information loss problem and is one of the most important issues in modern physics for which no definitive solution has yet been obtained.³
Maxwell’s demon
Maxwell’s demon is a virtual devil that appears in a thought experiment and breaks the second law of thermodynamics, at least superficially; it was introduced by James Clerk Maxwell. Assume a box contains a gas at temperature T, and a partition plate with a small hole is inserted in the middle of this box. The hole is small enough to allow one gas molecule at a time to pass through. There is also a switch next to the hole, which can be pressed to close or open the hole. According to statistical mechanics, in a gas of molecular mass m and temperature T, molecules with speed v exist with probability proportional to $$ e^{-\frac {mv^2}{2 k_B T}}$$ . This means that gas molecules of various speeds are flying around: some molecules are fast, and some are slow. Assuming that a small devil is sitting near the hole in the partition plate, the devil lets only the fast molecules coming from the right go through the hole to the left, and only the slow molecules coming from the left go through the hole to the right.
As a result, relatively slow molecules remain on the right, and fast-moving molecules gather on the left. That is, if the right temperature is T R and the left temperature is T L, we end up with T R < T L. Using the ideal gas equation of state, p R < p L, so a net force F proportional to p L − p R pushes the partition toward the room on the right. If we allow the partition to slide and attach a string to it, this F can do work. There should be something strange in this story, because it can convert heat to work endlessly, which means that we have created a perpetual motion machine of the second kind. In recent years, information theory has been shown to be an effective way to resolve this paradox [8].
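To make the thought experiment a little more concrete, here is a minimal Monte Carlo sketch (our own illustration, not from the book; the idealized sorting rule, the NumPy-based setup, and all parameter values are our assumptions). It draws one-dimensional velocities according to the Boltzmann factor above and lets an idealized demon collect fast molecules on the left and slow ones on the right; the mean squared speeds then play the role of the two temperatures:

```python
import numpy as np

rng = np.random.default_rng(0)

# Units with m = k_B = 1; initial temperature T of the whole gas
T = 1.0
n = 100_000

# 1D velocities distributed according to the Boltzmann factor exp(-m v^2 / (2 k_B T))
v = rng.normal(loc=0.0, scale=np.sqrt(T), size=n)

# Idealized demon: molecules faster than the median speed end up on the left,
# slower ones on the right, regardless of where they started.
threshold = np.median(np.abs(v))
left = v[np.abs(v) > threshold]    # fast molecules
right = v[np.abs(v) <= threshold]  # slow molecules

# In 1D, <m v^2> = k_B T, so the mean squared speed measures the temperature.
print("T_left  ~", np.mean(left ** 2))   # > 1: hotter than the initial gas
print("T_right ~", np.mean(right ** 2))  # < 1: colder than the initial gas
```

With no work done on the gas, the left side ends up hotter than the right, which is exactly the superficial violation of the second law that information theory is needed to repair.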
In this way, relations with information theory have become more and more prominent in various aspects of theoretical physics. Wheeler, famous for his work on the theory of gravity, even coined the phrase "it from bit" (physical existence comes from information) [9]. In recent years, not only information theory based on ordinary probability theory, but also a field called quantum information theory, based on quantum mechanics with its interfering probabilities, has been actively studied, and various developments are reported daily. This is not covered in this book, but interested readers may want to consult [10] and other books.
1.3 Machine Learning and Information Theory
One way to mathematically formulate machine learning methods is based on probability theory. In fact, this book follows that format. One of the nice things about this approach is that we can still use various concepts of information theory, including entropy. The purpose of machine learning is to predict the unknown future from some experience, and when formulating this mathematically, as in (1.8), it is necessary to deal with a quantity measuring how difficult it is to predict things. However, the predictability described in (1.8) is based on the assumption that we know the probability p i of the occurrence of each event. In machine learning, it is also assumed that there exists a p i behind the phenomenon, but its value is not known. The following example illustrates that even in such cases, it is still very important to consider a concept similar to entropy.
Relative entropy and Sanov’s theorem
Here, let us briefly look at a typical method of machine learning which we study in this book.⁴ As before, let us assume that the possible events are A 1, A 2, …, A W, and that they occur with probabilities p 1, p 2, …, p W, respectively. If we actually knew the values of p i, we would be able to predict the future to some extent, with an accuracy set by the information entropy. However, in many cases p i is not known, and instead we only know the information of how many times each A i has actually occurred,
$$\displaystyle \begin{aligned} A_1:\ \#_1\ \text{times}, \quad A_2:\ \#_2\ \text{times}, \quad \dots, \quad A_W:\ \#_W\ \text{times} \, . {} \end{aligned} $$(1.10)
Here, each # i (the number sign) is a non-negative integer indicating the number of times A i occurred, and # without a subscript denotes the total number of observations, # = # 1 + # 2 + ⋯ + # W. Just as physics experiments cannot observe the theoretical equations themselves, we cannot directly observe p i here. Therefore, consider creating an estimated probability q i that is as close as possible to p i, and regard the problem of determining a good q i as machine learning. How should we determine the value of q i from the information (1.10) alone? One thing we can do is to evaluate the probability of observing exactly the data (1.10) under the assumed probabilities q i,
$$\displaystyle \begin{aligned} p(A_1\ \text{occurs}\ \#_1\ \text{times},\ A_2\ \text{occurs}\ \#_2\ \text{times},\ \dots,\ A_W\ \text{occurs}\ \#_W\ \text{times}) \, . {} \end{aligned} $$(1.11)
If we can calculate this, we should determine the q i that makes the probability (1.11) as large as possible (as close to 1 as possible). This idea is called maximum likelihood estimation. First, assume that each A i occurs with probability q i,
$$\displaystyle \begin{aligned} p(A_i \ \text{occurs}\ \#_i \ \text{times}) = q_i^{\#_i} \, .\end{aligned} $$(1.12)
Also, in this setup, we assume that the A i's can occur in any order. For example, [A 1, A 1, A 2] and [A 2, A 1, A 1] are counted as the same, and the number of such orderings should be accounted for in the probability calculation. This is the multinomial coefficient
$$\displaystyle \begin{aligned} \begin{pmatrix} \# \\ \#_1, \#_2, \dots, \#_W \end{pmatrix} =\frac{\# !}{\#_1 ! \#_2 ! \cdots \#_W!} \, .\end{aligned} $$(1.13)
Then we can write the probability as the product of these,
$$\displaystyle \begin{aligned} \text{(1.11)} = q_1^{\#_1} q_2^{\#_2} \dots q_W^{\#_W} \frac{\# !}{\#_1! \#_2! \dots \#_W! } \, . {} \end{aligned} $$(1.14)
Then, we should look for q i that makes this value as large as possible. In machine learning, q i is varied to actually increase the amount equivalent to (1.14) as much as possible.⁵
By the way, if the number of data is large (# ≈∞), by the law of large numbers,⁶ we have
$$\displaystyle \begin{aligned} \frac{\#_i}{\#} \approx p_i \quad \Leftrightarrow \quad \#_i \approx \# \cdot p_i \, . {} \end{aligned} $$(1.17)
Here, # i must also be a large value, so according to Stirling's formula (which is familiar in physics), keeping only the leading behavior (the omitted factors e^{−# i} cancel against the e^{−#} coming from #! in the next step), we find
$$\displaystyle \begin{aligned} \#_i ! \approx \#_i^{\#_i} \, . {} \end{aligned} $$(1.18)
By substituting (1.17) and (1.18) into (1.14), we can get an interesting quantity:
$$\displaystyle \begin{aligned} \text{(1.14)} &\approx q_1^{\# \cdot p_1 } q_2^{\# \cdot p_2 } \dots q_W^{\# \cdot p_W} \frac{\# !}{(\# \cdot p_1)! (\# \cdot p_2)! \dots (\# \cdot p_W)! } \\ &\approx q_1^{\# \cdot p_1 } q_2^{\# \cdot p_2 } \dots q_W^{\# \cdot p_W} \frac{\#^\#}{(\# \cdot p_1)^{\# \cdot p_1} (\# \cdot p_2)^{\# \cdot p_2} \dots (\# \cdot p_W)^{\# \cdot p_W} } \\ &= q_1^{\# \cdot p_1 } q_2^{\# \cdot p_2 } \dots q_W^{\# \cdot p_W} \frac{1}{p_1^{\# \cdot p_1} p_2^{\# \cdot p_2} \dots p_W^{\# \cdot p_W} } \\ &= \exp{\Big[ - \# \sum_{i=1}^W p_i \log \frac{p_i}{q_i} \Big] } \, . \end{aligned} $$(1.19)
The goal is to make this probability as close to 1 as possible, which means bringing $$ \sum _{i=1}^Wp_i \log \frac {p_i}{q_i} $$ close to zero. This quantity is called relative entropy, and it is known to be non-negative and zero only when p i = q i. Therefore, bringing the original goal (1.11) as close to 1 as possible corresponds to reducing the relative entropy. That is,
$$\displaystyle \begin{aligned} \text{Relative entropy} = \text{a measure of how close the prediction}\ q_i \ \text{is to the truth}\ p_i \, . \end{aligned} $$
The properties of the relative entropy will be explained later in this book. In information theory, the relative entropy is called the Kullback-Leibler divergence; it is important in machine learning, as can be seen here, and it also has many mathematically interesting properties. The fact that (1.11) is, for large #, governed by the Kullback-Leibler divergence as in (1.19) is a consequence of large deviation theory known as Sanov's theorem (I. N. Sanov) [12, 13].
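As a numerical illustration of the discussion above (our own sketch, not from the book; it assumes NumPy and SciPy are available, and the true distribution and candidate q i are arbitrary choices), the following snippet draws data (1.10) from a "true" distribution p i and compares, for several candidate q i, the negative log of the likelihood (1.14) per observation with the relative entropy computed from the empirical frequencies # i/#. The two columns nearly coincide, and both are minimized by the maximum likelihood choice q i = # i/#, in line with (1.19):

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(1)

p_true = np.array([0.5, 0.3, 0.2])   # true probabilities p_i (unknown in practice)
N = 10_000                            # total number of observations "#"
counts = rng.multinomial(N, p_true)   # the data (1.10): how many times each A_i occurred
p_hat = counts / N                    # empirical frequencies #_i / # (close to p_i for large #)

def log_likelihood(q):
    """log of (1.14): multinomial coefficient times prod_i q_i^{#_i}."""
    log_coeff = gammaln(N + 1) - np.sum(gammaln(counts + 1))
    return log_coeff + np.sum(counts * np.log(q))

def relative_entropy(p, q):
    """sum_i p_i log(p_i / q_i)."""
    return float(np.sum(p * np.log(p / q)))

# The likelihood (1.14) is largest where the relative entropy is smallest,
# and -log(likelihood)/# reproduces the relative entropy, as in (1.19).
for name, q in [("uniform", np.full(3, 1 / 3)),
                ("slightly off", np.array([0.4, 0.4, 0.2])),
                ("empirical (MLE)", p_hat)]:
    print(f"{name:15s}  -log L / # = {-log_likelihood(q) / N:.5f}"
          f"   KL(p_hat||q) = {relative_entropy(p_hat, q):.5f}")
```

Here the relative entropy is taken with respect to the empirical frequencies; by the law of large numbers these approach p i, which is the approximation used in (1.17).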
1.4 Machine Learning and Physics
So far, we have briefly described the relationship between physics and information theory, and the relationship between machine learning and information theory. Then, there may be a connection between machine learning and physics in some sense. The aforementioned Fig. 1.1 shows the concept: physics and information are connected, and information and machine learning are connected. Then, how can physics and machine learning be connected?
A thought experiment
Suppose we have a fairy here. A button and a clock are placed in front of the fairy, and every minute, the fairy chooses to press the button or not. If the fairy presses the button, the machine outputs 1. If the fairy does not press the button, the output is 0. The result of a special "fairy" is the following sequence:
(1.20)
The following figure shows the result of this monotonous job for about 4 hours, displayed in a matrix of 15 × 15:
[Figure: the 0/1 outputs of about 4 hours, arranged as a 15 × 15 matrix]
We let the fairy continue its work all night for five days. The result is:
[Figure: the resulting bitmap after five days of button presses]
It looks like a person's face.⁷ The special "fairy" had the goal of drawing this picture by pressing or not pressing the button. This may seem like a trivial task, but here we explain a bit more physically that it is not.
First, consider a random arrangement of 0s and 1s as the initial state; this does not lose any generality of the problem. At each time (x, y) (the coordinates correspond to the time), the fairy checks the state of the single bit there and decides, by operating the button, whether to flip it from 0/1 to 1/0 or leave it as it is. Yes, the reader now knows the identity of this special "fairy": it is Maxwell's demon.⁸
From this point of view, the act of "drawing a picture" is to select a meaningful combination (low entropy) from among the myriad of possibilities (high entropy) that can exist on the canvas. In other words, it is to make the information increase in some way. You may have heard news that artificial intelligence has painted pictures. Without fear of misunderstanding, we can say that, in the current context, this amounts to a successful creation of a Maxwell's demon.
The story of Maxwell's demon above was a thought experiment suggesting that machine learning and physics might be related; but can machine learning and physics, the subjects of this book, really be connected by a thick pipe?
In fact, even in the field of machine learning there are many problems that require a "physical sense", and there are many research results inspired by it. For example, in the early days of deep learning, a model based on a stochastic neural network (the Boltzmann machine) was used, and this model was inspired by statistical mechanics, in particular the Ising model, which deals with spin degrees of freedom. A neural network is an artificial network that simulates the connections between neurons in the brain, and the Hopfield model, which provides a mechanism for storing memory as a physical system, can be regarded as a type of Ising model. Focusing on these points, in this book we describe various settings of neural networks starting from statistical mechanics.
In Boltzmann machine learning and in Bayesian statistics, it is necessary to generate (or sample) (pseudo-)random numbers that follow some complicated probability distribution. Such sampling is also a popular method in numerical calculations in condensed matter physics and elementary particle physics, so there is something in common here.⁹
Sampling takes too long for complex models, so in recent years the Boltzmann machine has been superseded by other neural-network architectures and attracts less attention. However, its academic significance may yet be revealed, so it would be a good idea not only to follow the latest topics of deep learning, but also to return to the beginning and rethink the Boltzmann machine.
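For readers who want to see what the sampling mentioned above looks like in practice, here is a minimal Metropolis sketch (our own illustration, not from the book; the small one-dimensional Ising chain and all parameter values are arbitrary choices for the example). It generates spin configurations with probability proportional to the Boltzmann weight e^{-E/T}, the same kind of sampling used both for Boltzmann machines and in statistical-physics simulations:

```python
import numpy as np

rng = np.random.default_rng(42)

N, T, J = 20, 1.0, 1.0           # chain length, temperature (k_B = 1), coupling

def energy(s):
    """Ising energy E = -J * sum_i s_i s_{i+1} with periodic boundary conditions."""
    return -J * np.sum(s * np.roll(s, 1))

s = rng.choice([-1, 1], size=N)  # random initial spin configuration

samples = []
for step in range(20_000):
    i = rng.integers(N)                                     # pick a random spin
    dE = 2 * J * s[i] * (s[(i - 1) % N] + s[(i + 1) % N])   # energy change if flipped
    if dE <= 0 or rng.random() < np.exp(-dE / T):           # Metropolis acceptance rule
        s[i] *= -1
    if step % 100 == 0:
        samples.append(s.copy())

# Configurations now appear with probability ~ exp(-E/T); for example, the mean energy:
print("mean energy per spin:", np.mean([energy(c) for c in samples]) / N)
```

The same accept/reject logic, with the Ising energy replaced by the energy function of a Boltzmann machine or the action of a lattice field theory, is what the text refers to as the common ground between the two fields.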
Today, deep learning is simply a collection of various techniques that have been found to "work empirically" when applying deep neural networks to various machine learning schemes. However, the mind that senses whether a technique will "work empirically" has something close to the so-called "physical sense".¹⁰ For example, in recent years it has been found that a model called ResNet, which includes a "bypass" in the neural network, has a high capability. ResNet learns the "residual" of the deep neural network, rather than directly learning the "features" that an ordinary deep neural network would have acquired. The "features" are the quantities important for performing the desired task. In ResNet, the residuals are accumulated to express the features. This can be expressed as the relationship "differential = residual" and "integral = feature", and is reminiscent of an equation of motion and its solution.
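To make the "differential = residual, integral = feature" picture concrete, here is a minimal NumPy sketch of a residual block (our own illustration, not from the book; the toy map f, its tanh nonlinearity, and the random untrained weights are assumptions made purely for the example). Each block adds a small correction f(x) to its input, and stacking blocks accumulates these corrections, much like integrating small increments along a trajectory:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, depth = 4, 10

# One residual block maps x -> x + f(x), where f is a small (here untrained) map.
weights = [0.1 * rng.normal(size=(dim, dim)) for _ in range(depth)]

def residual_block(x, W):
    return x + np.tanh(W @ x)      # "feature" = input + "residual"

x = rng.normal(size=dim)           # input features
h = x
for W in weights:                  # stacking blocks accumulates the residuals,
    h = residual_block(h, W)       # analogous to integrating small increments

print("input :", np.round(x, 3))
print("output:", np.round(h, 3))   # = x plus the sum of all residual corrections
```

In a real ResNet the maps f are convolutional layers trained from data, but the additive structure shown here is what makes the analogy with differentials and integrals possible.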
In this sense, machine learning and physics seem to have a "certain" relationship. To actually attempt the translation, we need a corresponding dictionary such as Table 1.1. Returning to Fig. 1.1, and combining the many fragmentary relationships listed above, machine learning and deep learning methodologies can be viewed and constructed from a physical perspective.
Table 1.1
Terms and notations used in machine learning related fields and their counterparts in physics
With this in mind, this book is aimed at:
$$\displaystyle \begin{aligned} &\text{1. Understanding machine learning methods from a physics perspective} \\ &\text{2. Finding the link between physics problems and machine learning methods} \end{aligned} $$
In this book, the first half is an Introduction to Machine Learning/Deep Learning from the Perspective of Physics, and the second half is an Application to Physics. Although there are many topics which are not introduced in this book, we have tried to