Deep Learning and Physics
Ebook, 416 pages, 3 hours

About this ebook

What is deep learning for those who study physics? Is it completely different from physics? Or is it similar? 
In recent years, machine learning, including deep learning, has begun to be used in various physics studies. Why is that? Is knowing physics useful in machine learning? Conversely, is knowing machine learning useful in physics? 
This book is devoted to answering these questions. Starting from basic ideas of physics, neural networks are derived naturally, and you can learn the concepts of deep learning through the language of physics.
In fact, the foundation of machine learning can be attributed to physical concepts. Hamiltonians that determine physical systems characterize various machine learning structures. Statistical physics given by Hamiltonians defines machine learning by neural networks. Furthermore, solving inverse problems in physics through machine learning and generalization essentially provides progress and even revolutions in physics. For these reasons, in recent years interdisciplinary research in machine learning and physics has been expanding dramatically.
This book is written for anyone who wants to learn, understand, and apply the relationship between deep learning/machine learning and physics. All that is needed to read this book are the basic concepts in physics: energy and Hamiltonians. The concepts of statistical mechanics and the bracket notation of quantum mechanics, which are explained in columns, are used to explain deep learning frameworks.
We encourage you to explore this new active field of machine learning and physics, with this book as a map of the continent to be explored.
Language: English
Publisher: Springer
Release date: Feb 20, 2021
ISBN: 9789813361089
    Book preview

    Deep Learning and Physics - Akinori Tanaka

    © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021

    A. Tanaka et al., Deep Learning and Physics, Mathematical Physics Studies, https://doi.org/10.1007/978-981-33-6108-9_1

    1. Forewords: Machine Learning and Physics

    Akinori Tanaka¹, Akio Tomiya² and Koji Hashimoto³

    (1) iTHEMS, RIKEN, Wako, Saitama, Japan

    (2) Radiation Lab, RIKEN, Wako, Saitama, Japan

    (3) Department of Physics, Osaka University, Toyonaka, Osaka, Japan

    Abstract

    What is the relationship between machine learning and physics? First let us start by experiencing why machine learning and physics can be related. There is a concept that bridges physics and machine learning: information. Physics and information theory have been mutually involved for a long time. Also, machine learning is based on information theory. Learning is about passing information, recreating relationships between pieces of information, and finding information spontaneously. Therefore, machine learning needs an information theory that deals flexibly with the amount of information, and as a result, machine learning is closely related to the framework of information theory. This chapter explores the relationship between physics, information theory, and machine learning, the core concepts in this book.

    What is the relationship between machine learning and physics? We’ll take a closer look at that in this book, but first let us start by experiencing why machine learning and physics can be related. There is a concept that bridges physics and machine learning: information.

    Physics and information theory have been mutually involved for a long time, and the relationship is still being developed widely and deeply. Also, machine learning is based on information theory. Learning is about passing information, recreating relationships between pieces of information, and finding information spontaneously. Therefore, machine learning and deep learning need an information theory that deals flexibly with the amount of information, and as a result, machine learning is closely related to the framework of information theory.

    As the reader may infer from these points, machine learning and physics should be closely related, with information as the intermediary. One of the goals of this book is to clarify this firm bridge. Figure 1.1 shows a conceptual diagram.

    [Figure] Fig. 1.1: Physics, machine learning, and information. Do they form a triangle?

    This chapter explores the relationship between physics, information theory, and machine learning, the core concepts in this book. Let us explain how the two subjects in the title of this book, deep learning and physics, are related.

    1.1 Introduction to Information Theory

    Quantifying information

    This chapter explores the relationship between physics and machine learning using information theory as a bridge. For that, we need to define exactly what information is. What is information in the first place? First, read the following two sentences [3]:

    $$\displaystyle \begin{aligned} & \bullet\ \text{Humans teach gorillas how to add numbers}. {} \end{aligned} $$

    (1.1)

    $$\displaystyle \begin{aligned} & \bullet\ \text{Gorillas teach humans how to add numbers}. {} \end{aligned} $$

    (1.2)

    Which of these two sentences can be said to have more information? Sentence (1.1), if true, is not surprising, because it is possible regardless of whether the gorilla understands it or not. You may have actually heard such news. On the other hand, if (1.2) is true, it is quite surprising. One will then change one’s mind and accept that there could be a gorilla which can teach humans. This may be said to be an increase in the amount of information. In other words [4],

    $$\displaystyle \begin{aligned} \text{Amount of information}=\text{Extent of surprise}. \end{aligned} $$

    (1.3)

    Let us go ahead with this policy anyway. The greater the surprise, the less likely the event is to happen. In addition, if we require the amounts of information of independent events to add up, then with P(event) the probability that an event will occur, we have¹

    $$\displaystyle \begin{aligned} \text{Amount of information of event A} = - \log P(\text{event A}). {} \end{aligned} $$

    (1.4)

    When the probability is low, the amount of information is large.
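    As a minimal sketch of (1.4), the following Python snippet evaluates the amount of information for two events and checks the additivity mentioned above. The function name `information_content` and the two probabilities are assumptions chosen only for illustration; they do not come from the text.

```python
import math


def information_content(p: float) -> float:
    """Amount of information -log P of an event with probability p (natural log)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in the interval (0, 1]")
    return -math.log(p)


# Hypothetical probabilities, chosen only for illustration.
p_humans_teach = 0.1      # sentence (1.1): plausible
p_gorillas_teach = 1e-6   # sentence (1.2): very surprising

likely = information_content(p_humans_teach)
unlikely = information_content(p_gorillas_teach)

# For independent events P(A and B) = P(A) P(B), so the information adds up.
joint = information_content(p_humans_teach * p_gorillas_teach)

print(f"information of (1.1): {likely:.3f}")
print(f"information of (1.2): {unlikely:.3f}")
print(f"joint: {joint:.3f}  sum: {likely + unlikely:.3f}")
```

    The logarithm is what turns the product of probabilities into a sum of information amounts.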

    Average amount of information

    Let us further assume that various events A_1, A_2, …, A_W occur with probabilities p_1, p_2, …, p_W, respectively. In this case, the amount of information of each event is $$-\log p_i $$ , and its expectation value

    $$\displaystyle \begin{aligned} S_{\text{information}} = - \sum_{i=1}^W p_i \log p_i . {} \end{aligned} $$

    (1.5)

    is called information entropy.² Let us take a concrete example of what information entropy represents. Suppose there are W boxes, and let p_i be the probability that the ith box contains a treasure. Of course, we want to predict which box contains the treasure as accurately as possible, but the predictability depends on the value of p_i. For example, if we know that the treasure is always in the first box, it is easy to predict. The value of the information entropy for this case is zero:

    $$\displaystyle \begin{aligned} p_i = \left\{ \begin{array}{ll} 1 & (i=1) \\ 0 & (\text{other than that})\\ \end{array} \right. \quad S_{\text{information}} = 0 . \end{aligned} $$

    (1.6)

    On the other hand, if the probability is completely random,

    $$\displaystyle \begin{aligned} p_i = \frac{1}{W} \qquad S_{\text{information}} = \log W . {} \end{aligned} $$

    (1.7)

    For this case, even if we know the probability, it is difficult to predict because we do not know which box it is in. This time, the information entropy has a large value, $$ \log W $$ . In other words, the more difficult it is to predict, the greater the information entropy. Therefore, the relation to the commonly referred to information is as follows:

    $$\displaystyle \begin{aligned} \bullet \left\{ \begin{array}{ccccc} \text{Little ``information''} & \Leftrightarrow & \text{difficult to predict} & \Leftrightarrow & \text{large information entropy} \\ \text{A lot of ``information''} & \Leftrightarrow & \text{easy to predict} & \Leftrightarrow & \text{small information entropy} \end{array} \right. {} \end{aligned} $$

    (1.8)

    The information here is the information we already have, and the amount of information (1.4) is the information obtained from the event.
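    The two limiting cases (1.6) and (1.7) can be checked numerically. The sketch below assumes W = 8 boxes and adds one intermediate distribution (`biased`) that is not in the text; the function name `information_entropy` is likewise only illustrative.

```python
import math


def information_entropy(probs):
    """S = -sum_i p_i log p_i, using the convention 0 log 0 = 0."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log(p) for p in probs if p > 0.0)


W = 8  # number of boxes (an arbitrary choice for this example)

certain = [1.0] + [0.0] * (W - 1)    # treasure always in the first box, as in (1.6)
uniform = [1.0 / W] * W              # completely random, as in (1.7)
biased = [0.65] + [0.05] * (W - 1)   # hypothetical intermediate case

s_certain = information_entropy(certain)
s_uniform = information_entropy(uniform)
s_biased = information_entropy(biased)

print(f"certain: {s_certain:.4f}")
print(f"biased:  {s_biased:.4f}")
print(f"uniform: {s_uniform:.4f}  (log W = {math.log(W):.4f})")
```

    The intermediate case lands between the two extremes, in line with (1.8): the harder the prediction, the larger the information entropy.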

    1.2 Physics and Information Theory

    There are many important concepts in physics, but no physicist would dispute that one of the most important is entropy. Entropy is an essential concept in the development of thermodynamics, and in statistical mechanics it is expressed as $$ S = k_{\mathrm {B}}\log W $$ , where W is the number of microscopic states. The proportionality coefficient k_B is the Boltzmann constant and can be set to 1 if an appropriate temperature unit is used. The entropy of the system in this case is

    $$\displaystyle \begin{aligned} S = \log W \, . {} \end{aligned} $$

    (1.9)

    In physics, the relationship with information theory is revealed through this entropy: $$S_{\text{information}}$$ of (1.7) is exactly the same formula as (1.9). In this way, various discussions on physical systems dealing with multiple degrees of freedom, such as thermodynamics and statistical mechanics, have a companion information-theoretic interpretation.

    By the way, most of the interest in research in modern physics (e.g., particle physics and condensed matter physics) is in many-body systems. This suggests that information theory plays an important role at the forefront of modern physics research. Here are two recent studies.

    Black hole information loss problem

    Research by J. Bekenstein and S. Hawking [6, 7] shows theoretically that black holes have entropy and radiate their mass outward as heat. To explain the intrinsic problem hidden in this radiation property of black holes, the following example is instructive: Assume that there is a spring, and fix it while stretched. If this is thrown into a black hole, the black hole will grow larger by the stored energy, and then emit that energy as heat with the aforementioned thermal radiation. But here is the problem. Work can be freely extracted from the energy of the spring before it is thrown, but the second law of thermodynamics limits the energy efficiency that can be extracted from the thermal radiation. In other words, even if there is only a single state before the throwing (that is, the entropy is zero), the entropy increases by the randomness after thermal radiation. An increase in entropy means that information has been lost (see (1.8)). This is known as the information loss problem and is one of the most important issues in modern physics for which no definitive solution has yet been obtained.³

    Maxwell’s demon

    Maxwell’s demon is a hypothetical being that appears in a thought experiment introduced by James Clerk Maxwell and breaks the second law of thermodynamics, at least superficially. Assume a box contains a gas at temperature T, and a partition plate with a small hole is inserted in the middle of this box. The hole is just large enough to allow one gas molecule to pass through. There is also a switch next to the hole, which can be pressed to close or open the hole. According to statistical mechanics, in a gas with molecular mass m and temperature T, molecules with speed v exist with probability proportional to $$ e^{-\frac {mv^2}{2 k_B T}}$$ . This means that gas molecules of various speeds are flying around: some are fast and some are slow. Suppose a small demon sits near the hole in the partition plate and lets only the fast molecules coming from the right pass through the hole to the left, and only the slow molecules coming from the left pass through to the right. As a result, relatively slow molecules remain on the right, and fast-moving molecules gather on the left. That is, if the right temperature is T_R and the left temperature is T_L, then T_R < T_L. By the ideal gas equation of state, p_R < p_L, so the force F = p_L − p_R (per unit area of the partition) acts in the direction of the room on the right. If we allow the partition to slide and attach a string to it, then this F can do work. There should be something strange in this story, because it converts heat to work endlessly, which means that we have created a perpetual motion machine of the second kind. In recent years, information theory has been shown to be an effective way to fill this gap [8].
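    The sorting step of the thought experiment can be imitated numerically. The sketch below is a strong simplification and makes several assumptions not found in the text: velocities are one-dimensional, units are chosen so that k_B T/m = 1, and the demon sorts all molecules at once by comparing each speed with the median instead of operating a door over time. The name `demon_sort` is hypothetical.

```python
import random


def demon_sort(n_molecules=20000, kT_over_m=1.0, seed=0):
    """Split a 1D thermal gas into fast (left) and slow (right) halves.

    Returns (k_B T_L / m, k_B T_R / m), estimated from the mean of v^2.
    """
    rng = random.Random(seed)
    sigma = kT_over_m ** 0.5
    # A 1D velocity component with weight exp(-m v^2 / (2 k_B T)) is Gaussian.
    speeds = [abs(rng.gauss(0.0, sigma)) for _ in range(n_molecules)]

    # The demon's criterion (an assumption): faster than the median goes left.
    threshold = sorted(speeds)[n_molecules // 2]
    left = [v for v in speeds if v > threshold]
    right = [v for v in speeds if v <= threshold]

    # In 1D, <m v^2> = k_B T, so the mean of v^2 estimates k_B T / m.
    t_left = sum(v * v for v in left) / len(left)
    t_right = sum(v * v for v in right) / len(right)
    return t_left, t_right


t_left, t_right = demon_sort()
print(f"left  (fast molecules): k_B T_L / m = {t_left:.3f}")
print(f"right (slow molecules): k_B T_R / m = {t_right:.3f}")
```

    The sorted gas shows T_R < T_L without any work being done on it, which is exactly the apparent paradox; the sketch does not model the information the demon must acquire and erase.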

    In this way, relations with information theory have become more and more prominent in various aspects of theoretical physics. Wheeler, famous for his work on gravity theory, even says “it from bit” (physical existence comes from information) [9]. In recent years, not only information theory based on ordinary probability theory, but also research in a field called quantum information theory, based on quantum mechanics with interfering probabilities, has been actively pursued, and various developments are reported daily. This is not covered in this book, but interested readers may want to read [10] and other books.

    1.3 Machine Learning and Information Theory

    One way to mathematically formulate machine learning methods is based on probability theory. In fact, this book follows that format. One of the nice things about this method is that we can still use various concepts of information theory, including entropy. The purpose of machine learning is to predict the future unknown from some experience, and when formulating this mathematically, as in (1.8), it is necessary to deal with a quantity measuring the degree of difficulty in predicting things. However, the predictability described in (1.8) is based on the assumption that we know the probability p_i of the occurrence of the event. Even in machine learning, it is assumed that there exists p_i behind the phenomenon, while its value is not known. The following example illustrates that even in such cases, it is still very important to consider a concept similar to entropy.

    Relative entropy and Sanov’s theorem

    Here, let us briefly look at a typical method of machine learning which we study in this book.⁴ As before, let us assume that the possible events are A_1, A_2, …, A_W, and that they occur with probabilities p_1, p_2, …, p_W, respectively. If we actually know the value of p_i, we will be able to predict the future to some extent, with an accuracy at the level of the information entropy. However, in many cases, p_i is not known and instead we only know how many times A_i has actually occurred,

    $$\displaystyle \begin{aligned} \bullet \left\{ \begin{array}{l} A_1 : \#_1 \ \text{times}, A_2 : \#_2 \ \text{times}, \dots, A_W : \#_W \ \text{times}, \\ \# (= \sum_{i=1}^W \#_i) \ \text{times in total}. \end{array} \right. {}\end{aligned} $$

    (1.10)

    Here, # (the number sign) is an appropriate positive integer, indicating the number of times. Just as physics experiments cannot observe the theoretical equations themselves, we cannot directly observe p_i here. Therefore, consider creating an expected probability q_i that is as close as possible to p_i, and regard the problem of determining a good q_i as machine learning. How should we determine the value of q_i from the information (1.10) alone? One thing we can do is to evaluate

    $$\displaystyle \begin{aligned} \bullet \text{ Probability of obtaining information (1.10) assuming}\ q_i \ \text{is the true probability}. {}\end{aligned} $$

    (1.11)

    If we can calculate this, we need to determine the q_i that makes the probability (1.11) as large as possible (close to 1). This idea is called maximum likelihood estimation. First, assume that each A_i occurs with probability q_i,

    $$\displaystyle \begin{aligned} p(\text{probability of}\ A_i \ \text{occurring}\ \#_i \ \text{times}) = q_i^{\#_i} .\end{aligned} $$

    (1.12)

    Also, in this setup, we assume that the A_i can occur in any order. For example, [A_1, A_1, A_2] and [A_2, A_1, A_1] are counted as the same, and the number of such combinations should be accounted for in the probability calculation. This is the multinomial coefficient

    $$\displaystyle \begin{aligned} \begin{pmatrix} \# \\ \#_1, \#_2, \dots, \#_W \end{pmatrix} =\frac{\# !}{\#_1 ! \#_2 ! \cdots \#_W!} \, .\end{aligned} $$

    (1.13)

    Then we can write the probability as the product of these,

    $$\displaystyle \begin{aligned} \text{(1.11)} = q_1^{\#_1} q_2^{\#_2} \dots q_W^{\#_W} \frac{\# !}{\#_1! \#_2! \dots \#_W! } \, . {} \end{aligned} $$

    (1.14)

    Then, we should look for the q_i that makes this value as large as possible. In machine learning, q_i is varied to actually increase the quantity equivalent to (1.14) as much as possible.⁵
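    To make (1.14) concrete, the sketch below evaluates its logarithm for a small set of counts and a few candidate distributions q. The counts and the candidates are hypothetical, and the function name `log_likelihood` is only illustrative. It relies on the standard fact that the multinomial likelihood is maximized by the observed frequencies q_i = #_i / #.

```python
import math


def log_likelihood(counts, q):
    """Logarithm of (1.14): multinomial coefficient times prod_i q_i^{#_i}."""
    n = sum(counts)
    # log(#! / (#_1! ... #_W!)) via the log-gamma function, log(k!) = lgamma(k + 1)
    log_coeff = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_coeff + sum(c * math.log(qi) for c, qi in zip(counts, q) if c > 0)


counts = [5, 3, 2]   # hypothetical data: A_1 five times, A_2 three, A_3 twice
total = sum(counts)

q_freq = [c / total for c in counts]   # observed frequencies
q_uniform = [1.0 / 3.0] * 3            # a less informed guess
q_skewed = [0.1, 0.2, 0.7]             # a poor guess

for name, q in [("frequencies", q_freq), ("uniform", q_uniform), ("skewed", q_skewed)]:
    print(f"{name:12s} log-likelihood = {log_likelihood(counts, q):.4f}")
```

    Working with the logarithm avoids overflow in the factorials and does not change which q_i is best, because the logarithm is monotonic.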

    By the way, if the number of data is large (# → ∞), by the law of large numbers,⁶ we have

    $$\displaystyle \begin{aligned} \frac{\#_i}{\#} \approx p_i \quad \Leftrightarrow \quad \#_i \approx \# \cdot p_i \, . {} \end{aligned} $$

    (1.17)

    Here, #_i must also be a large value, so according to Stirling’s formula (which is familiar in physics), we find

    $$\displaystyle \begin{aligned} \#_i ! \approx \#_i^{\#_i} \, . {} \end{aligned} $$

    (1.18)

    By substituting (1.17) and (1.18) into (1.14), we can get an interesting quantity:

    $$\displaystyle \begin{aligned} \text{(1.14)} &\approx q_1^{\# \cdot p_1 } q_2^{\# \cdot p_2 } \dots q_W^{\# \cdot p_W} \frac{\# !}{(\# \cdot p_1)! (\# \cdot p_2)! \dots (\# \cdot p_W)! } \\ &\approx q_1^{\# \cdot p_1 } q_2^{\# \cdot p_2 } \dots q_W^{\# \cdot p_W} \frac{\#^\#}{(\# \cdot p_1)^{\# \cdot p_1} (\# \cdot p_2)^{\# \cdot p_2} \dots (\# \cdot p_W)^{\# \cdot p_W} } \\ &= q_1^{\# \cdot p_1 } q_2^{\# \cdot p_2 } \dots q_W^{\# \cdot p_W} \frac{1}{p_1^{\# \cdot p_1} p_2^{\# \cdot p_2} \dots p_W^{\# \cdot p_W} } \\ &= \exp{\Big[ - \# \sum_{i=1}^W p_i \log \frac{p_i}{q_i} \Big] } \, . \end{aligned} $$

    (1.19)
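    The last two equalities in (1.19) can be spelled out as follows. This is only a sketch of the intermediate algebra, which uses nothing beyond the normalization $$ \sum_{i=1}^W p_i = 1 $$ and the identity $$ x^a = e^{a \log x} $$ :

    $$\displaystyle \begin{aligned} \frac{\#^\#}{\prod_{i=1}^W (\# \cdot p_i)^{\# \cdot p_i}} &= \frac{\#^\#}{\#^{\# \sum_{i=1}^W p_i} \prod_{i=1}^W p_i^{\# \cdot p_i}} = \frac{1}{\prod_{i=1}^W p_i^{\# \cdot p_i}} \, , \\ \prod_{i=1}^W \Big( \frac{q_i}{p_i} \Big)^{\# \cdot p_i} &= \exp{\Big[ \# \sum_{i=1}^W p_i \log \frac{q_i}{p_i} \Big]} = \exp{\Big[ - \# \sum_{i=1}^W p_i \log \frac{p_i}{q_i} \Big]} \, . \end{aligned} $$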

    The goal is to make this probability as close to 1 as possible, which means to bring

    $$ \sum _{i=1}^Wp_i \log \frac {p_i}{q_i} $$

    close to zero. This quantity is called relative entropy, and is known to be zero only when p_i = q_i. Therefore, bringing the original goal (1.11) as close to 1 as possible corresponds to reducing the relative entropy. That is,

    $$\displaystyle \begin{aligned} \text{Relative entropy} = \text{Amount to measure how close the prediction}\ q_i \ \text{is to the truth}\ p_i \, . \end{aligned} $$

    The properties of the relative entropy will be explained later in this book. The relative entropy is called Kullback-Leibler divergence in information theory, and is important in machine learning as can be seen here, as well as having many mathematically interesting properties. The fact that (1.11) is approximately the same as the Kullback-Leibler divergence is one of the consequences from large deviation theory called Sanov’s (I. N. Sanov) theorem [12, 13].
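    The approximation leading to (1.19) can be checked numerically. The sketch below assumes two hypothetical distributions p and q, sets the counts to exactly #_i = # · p_i as in (1.17), and compares −(1/#) log of (1.14) with the relative entropy. The function names are illustrative only. Agreement is expected only in the exponential rate, since (1.18) drops sub-exponential factors.

```python
import math


def relative_entropy(p, q):
    """Kullback-Leibler divergence sum_i p_i log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def log_prob_of_counts(counts, q):
    """Logarithm of (1.14), evaluated exactly with log-gamma functions."""
    n = sum(counts)
    log_coeff = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_coeff + sum(c * math.log(qi) for c, qi in zip(counts, q) if c > 0)


p = [0.5, 0.3, 0.2]   # hypothetical "true" probabilities
q = [0.4, 0.4, 0.2]   # hypothetical model probabilities

kl = relative_entropy(p, q)
rates = {}
for n in (100, 10_000, 1_000_000):
    counts = [round(n * pi) for pi in p]   # typical counts, as in (1.17)
    rates[n] = -log_prob_of_counts(counts, q) / n
    print(f"# = {n:>9}: -(1/#) log(1.14) = {rates[n]:.6f}")

print(f"relative entropy        = {kl:.6f}")
```

    As # grows, the rate approaches the relative entropy, which is the content of the estimate (1.19).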

    1.4 Machine Learning and Physics

    So far, we have briefly described the relationship between physics and information theory, and the relationship between machine learning and information theory. Then, there may be a connection between machine learning and physics in some sense. The aforementioned Fig. 1.1 shows the concept: physics and information are connected, and information and machine learning are connected. Then, how can physics and machine learning be connected?

    A thought experiment

    Suppose we have a fairy here. A button and a clock are placed in front of the fairy, and every minute, the fairy chooses to press the button or not. If the fairy presses the button, the machine outputs 1. If the fairy does not press the button, the output is 0. The result of a special fairy is the following sequence:

    $$\displaystyle \begin{aligned} \{1, 0,0,0,0,0,0,0,0,1,1,0,0,1,1, \dots \} . \end{aligned} $$

    (1.20)

    The following figure shows the result of this monotonous job for about 4 hours, displayed in a matrix of 15 × 15:

    [Figure: the fairy’s outputs arranged in a 15 × 15 matrix]

    We let the fairy continue its work all night for five days. The result is:

    [Figure: the resulting image after five days of work]

    It looks like a person’s face.⁷ The special fairy had the goal of drawing this picture by pressing/not-pressing buttons. This may seem like a trivial task, but here we explain a bit more physically that it is not.

    First, consider a random arrangement of 0s and 1s as the initial state. With this we do not lose the generality of the problem. By operating the button, the fairy decides whether to replace 0/1 with 1/0 or leave it as it is at the position (x, y) (the coordinates correspond to the time). In other words, the fairy checks the state of the single bit at time (x, y) and decides whether or not to invert it. Yes, the reader now knows that the identity of this special fairy is Maxwell’s demon.

    From this point of view, the act of drawing a picture is to draw a meaningful combination (low entropy) from among the myriad of possibilities (high entropy) that can exist on the canvas. In other words, it is to make the information increase in some way. You may have heard news that artificial intelligence has painted pictures. Without fear of misunderstanding, we can say that it means a successful creation of a Maxwell’s demon, in the current context.
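    The bit-by-bit procedure described above can be sketched in a few lines. The function name `demon_draw` is hypothetical, and the target is just the opening of the sequence (1.20) rather than the full picture; the point is only that inspecting and conditionally inverting each bit turns any one of the 2^N random initial states into a single chosen state.

```python
import math
import random


def demon_draw(target, seed=0):
    """Start from random bits and invert each bit that differs from the target."""
    rng = random.Random(seed)
    canvas = [rng.randint(0, 1) for _ in target]   # random initial state
    flips = 0
    for k, wanted in enumerate(target):
        # The fairy inspects a single bit and decides whether to invert it.
        if canvas[k] != wanted:
            canvas[k] ^= 1
            flips += 1
    return canvas, flips


# The first 15 outputs listed in (1.20).
target = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1]

canvas, flips = demon_draw(target)
n_states = 2 ** len(target)

print(f"bits inverted: {flips} of {len(target)}")
print(f"possible canvases: {n_states}, entropy before = {math.log(n_states):.3f}")
print("entropy after = 0.000 (a single chosen state)")
```

    For the 15 × 15 picture the same loop would run over 225 bits, selecting one state out of 2^225.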

    The story of Maxwell’s demon above was a thought experiment that suggested that machine learning and physics might be related, but in fact, can the subject of this book, machine learning and physics, be connected by a thick pipe?

    In fact, even in the field of machine learning, there are many problems that often require physical sense and there are many research results inspired by it. For example, in the early days of deep learning, a model (Boltzmann machine) that used a stochastic neural network was used, and this model was inspired by statistical mechanics, especially the Ising model that deals with spin degrees of freedom. A neural network is an artificial network that simulates the connection of neurons in the brain, and the Hopfield model, which provides a mechanism of storing memory as a physical system, can be said to be a type of Ising model. Focusing on these points, in this book we describe various settings of neural networks from statistical mechanics.

    In Boltzmann machine learning and Bayesian statistics, it is necessary to generate (or sample) (pseudo-)random numbers that follow some complicated probability distribution. Such sampling is also a popular method in numerical calculations in condensed matter physics and elementary particle physics, so there is something in common here.
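    As one illustration of such sampling, the sketch below uses the Metropolis algorithm, a standard choice in computational physics. The text does not name a specific method, so this is a substitution made here for concreteness. It samples a three-state system with hypothetical energies (in units of k_B T) and compares the visit frequencies with the Boltzmann weights; the name `metropolis_sample` is illustrative.

```python
import math
import random


def metropolis_sample(energies, n_steps=200_000, seed=0):
    """Sample states with probability proportional to exp(-E_i); return visit frequencies."""
    rng = random.Random(seed)
    w = len(energies)
    state = 0
    visits = [0] * w
    for _ in range(n_steps):
        proposal = rng.randrange(w)   # symmetric proposal: any state, uniformly
        delta = energies[proposal] - energies[state]
        # Accept downhill moves always, uphill moves with probability exp(-delta).
        if delta <= 0 or rng.random() < math.exp(-delta):
            state = proposal
        visits[state] += 1
    return [v / n_steps for v in visits]


energies = [0.0, 1.0, 2.0]   # hypothetical energies in units of k_B T

freqs = metropolis_sample(energies)
z = sum(math.exp(-e) for e in energies)           # partition function
boltzmann = [math.exp(-e) / z for e in energies]  # exact probabilities

for i, (f, b) in enumerate(zip(freqs, boltzmann)):
    print(f"state {i}: sampled {f:.3f}, exact {b:.3f}")
```

    For three states the exact probabilities are trivial to compute; sampling becomes necessary when the number of states is too large to enumerate, as in spin systems or Boltzmann machines.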

    Sampling takes too long for complex models, so in recent years the Boltzmann machine has been replaced by neural networks and is no longer attracting much attention. However, its academic significance may yet be revealed, so it would be a good idea not only to follow the latest topics of deep learning, but also to return to the beginning and rethink the Boltzmann machine.

    Today, deep learning is simply a collection of various techniques that have been found to work empirically when applying deep neural networks to various machine learning schemes. However, the intuition that senses which techniques will work empirically is close to the so-called physical sense.¹⁰ For example, in recent years, it has become known that a model called ResNet, which includes a bypass in a neural network, has a high capability. ResNet learns the residual of the deep neural network, rather than directly learning the features that would have been acquired by ordinary deep neural networks. The features are the quantities important for performing the desired task. In ResNet, the residuals are accumulated to express the features. This can be expressed as the relationship “differential = residual” and “integral = feature”, and is reminiscent of an equation of motion and its solution.
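    The analogy between accumulated residuals and the solution of an equation of motion can be illustrated with a toy scalar example. This is not an actual ResNet: the residual function −h·x is a fixed, hypothetical choice rather than a learned layer, and `residual_stack` is an illustrative name. Each layer computes x + f(x), which is one Euler step of dx/dt = −x with step size h.

```python
import math


def residual_stack(x0, residual, n_layers):
    """Apply n_layers blocks of the form x -> x + residual(x) (a skip connection)."""
    x = x0
    for _ in range(n_layers):
        x = x + residual(x)   # the bypass adds the block's input to its output
    return x


n_layers = 100
h = 1.0 / n_layers   # step size, so the stack integrates from t = 0 to t = 1

# Residual = "differential": one Euler step of the equation of motion dx/dt = -x.
feature = residual_stack(1.0, lambda x: -h * x, n_layers)

# Integral = "feature": the exact solution x(1) = exp(-1).
exact = math.exp(-1.0)

print(f"accumulated residuals: {feature:.5f}")
print(f"exact solution:        {exact:.5f}")
```

    Increasing the number of layers while shrinking h brings the accumulated value closer to the exact solution, in the same way that a finer time step improves a numerical integration.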

    In this sense, machine learning and physics seem to have a certain relationship. To actually try the translation, we need to create a corresponding dictionary such as Table 1.1. Returning to Fig. 1.1, combined with the many fragmentary relationships listed above, machine learning and deep learning methodologies can be viewed and constructed from a physical perspective.

    Table 1.1

    Terms and notations used in machine learning related fields and their counterparts in physics

    With this in mind, this book is aimed at:

    $$\displaystyle \begin{aligned} &\text{1. Understanding machine learning methods from a physics perspective} \\ &\text{2. Finding the link between physics problems and machine learning methods} \end{aligned} $$

    In this book, the first half is Introduction to Machine Learning/Deep Learning from the Perspective of Physics, and the second half is Application to Physics. Although there are many topics which are not introduced in this book, we have tried to
