Python Data Science 2024 - Explo - Wilson, Stephen
By
Stephen Wilson
Copyright 2024 © by Stephen Wilson
All rights reserved.
No part of this book may be reproduced or
transmitted in any form or by any means,
electronic or mechanical, including photocopying,
recording or any information storage system.
TABLE OF CONTENTS
PART I: FUNDAMENTALS OF DEEP LEARNING
FUNDAMENTALS OF PROBABILITY
FORECASTING DEMAND
ARTIFICIAL INTELLIGENCE
DEEP LEARNING
FUNDAMENTALS OF STATISTICS
FUNDAMENTALS OF LINEAR ALGEBRA
MACHINE LEARNING AND DEEP LEARNING
FUNDAMENTALS OF MACHINE LEARNING
FUNDAMENTALS OF NEURAL NETWORKS AND DEEP LEARNING
A COMPUTER LIKE A BRAIN
THE ABILITY TO LEARN
HOW TO DO ERROR CORRECTION
CONTINUOUS IMPROVEMENT TO LEARN
DEEP TROUBLE
NEURAL NETWORK APPLICATIONS
WHAT CAN WE USE THESE SYSTEMS FOR?
THE FUTURE OF NEURAL NETWORKS
DEEP LEARNING PARAMETERS AND HYPER-PARAMETERS
WHAT ARE HYPER PARAMETERS?
TYPES OF HYPERPARAMETERS
CONTINUOUS HYPERPARAMETERS
MEDIAN STOPPING POLICY
HOW TO CONFIGURE THE EXPERIMENT
DEEP NEURAL NETWORKS LAYERS
DL MAPPING
DEEP NETWORKS AND SHALLOW NETWORKS
CHOOSE A DEEP NETWORK
DEEP BELIEF NETWORKS - DBNS
GENERATIVE ADVERSARIAL NETWORKS - GANS
RECURRENT NEURAL NETWORKS - RNNS
DEEP LEARNING ACTIVATION FUNCTIONS
NONLINEAR ACTIVATION FUNCTION
TYPES OF NONLINEAR ACTIVATION FUNCTIONS
CONVOLUTIONAL NEURAL NETWORK
FEATURES AND CNNS BENEFITS
NLP
PART II: DEEP LEARNING IN PRACTICE (IN JUPYTER NOTEBOOKS)
WHAT DOES PYTHON COST AND WHERE DO I GET IT?
HOW TO WRITE INTERACTIVELY
PYTHON DATA STRUCTURES
INSTALLING PYTHON
INSTALL PYTHON IN WINDOWS
INSTALL PYTHON: OS X
INSTALLING PYTHON IN LINUX
CONCLUSION
PART I: FUNDAMENTALS OF DEEP LEARNING
In recent years, the growing number of software technologies
shaped by three specialized topics, artificial intelligence,
machine learning and deep learning, has led to increasing interest
in these subjects.
Even though specialists working in these fields have not yet
reached a consensus on these terms, new ideas emerge every
day about these three subjects, which are sometimes used
interchangeably and often with distinct meanings.
Artificial intelligence, which enters the centre of our lives with
advancing technology, machine learning, which governs the decision
mechanisms of computers, and deep learning, which examines the data
analysis and algorithms underneath all of these, are becoming a
common field of study for many disciplines. Because artificial
intelligence is the older umbrella term, the studies referred to under
that name directly shape the course of technology.
We can now program devices with a wide range of processors using
over 100 programming languages. These devices execute the commands
they receive from us through their systems as soon as they receive
them. The biggest share in the advancement of technology belongs to
this programmability, which operates flawlessly. However, no matter
how much this capability develops, and no matter how deeply artificial
intelligence enters our lives, it is still defeated every time by the
human brain.
For example, a computer can easily understand and solve structures
consisting of repetitions, control statements or mathematical
operations, but current technology remains inadequate for ambiguous
expressions that require reasoning. This is where deep learning,
which is essential for teaching such a system how to learn, comes
into play.
Starting from artificial intelligence, the path runs from data
processing, algorithms and data sets to machine learning, then to
image processing and analysis, often called the highest point of
artificial intelligence, and finally to deep learning, which
encompasses all of these. For all these reasons and needs, deep
learning has become a popular topic in recent years.
FUNDAMENTALS OF PROBABILITY
Fundamentals of Probability: Exploring the Basics
Probability, a fundamental concept in mathematics and statistics,
provides a framework for understanding uncertainty and
randomness. Whether applied in games of chance, statistical
analysis, or real-world decision-making, probability serves as a
cornerstone in numerous fields. Let's delve into the key
fundamentals of probability to grasp its foundational principles.
The practice of using science in decision-making, which started
with probability theory, continued with the extraction of parameters
that summarize data. Sampling characteristics of the available data,
such as the mean and the variance, were used to test hypotheses with
probability distribution functions drawn from probability theory.
Techniques that are still used in many decision-making processes
emerged from this work, including what many of us know today as
A/B testing.
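To make these ideas concrete, here is a minimal sketch, not from the original text, that computes the mean and variance of two simulated groups and runs a rough A/B comparison with a permutation test; the group sizes and conversion rates are invented for illustration.

# Sketch: sample statistics and a simple A/B comparison (invented numbers).
import numpy as np

rng = np.random.default_rng(0)

# Simulated outcomes: 1 = conversion, 0 = no conversion
group_a = rng.binomial(1, 0.10, size=1000)   # variant A, assumed 10% rate
group_b = rng.binomial(1, 0.12, size=1000)   # variant B, assumed 12% rate

# Sampling characteristics: mean value and variance
print("mean A:", group_a.mean(), "variance A:", group_a.var())
print("mean B:", group_b.mean(), "variance B:", group_b.var())

# Permutation test: how often does a random relabelling produce a
# difference at least as large as the observed one?
observed = group_b.mean() - group_a.mean()
pooled = np.concatenate([group_a, group_b])
count = 0
for _ in range(5000):
    rng.shuffle(pooled)
    diff = pooled[1000:].mean() - pooled[:1000].mean()
    if diff >= observed:
        count += 1
print("approximate p-value:", count / 5000)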
1. Definition of Probability:
Probability is a number between 0 and 1 that measures how likely an event is to occur, where 0 means the event is impossible and 1 means it is certain.
2. Types of Probability:
Probability can be assigned classically (from equally likely outcomes), empirically (from observed frequencies) or subjectively (from expert judgment).
FORECASTING DEMAND
Forecasting demand in the realm of deep learning involves
predicting the need for resources, expertise, and technologies
associated with the rapidly evolving field of artificial intelligence (AI).
As we look ahead, several key factors will shape the demand for
deep learning applications and services:
Prediction is the basis for decision-making. Estimation becomes
possible by building a mathematical model that maps inputs to outputs
and then using this model in decision making. The first models were
simple and, naturally, linear, because they answered the problems of
the time: the inputs were added to and/or subtracted from each other
to produce the output. The regression method can therefore be viewed
as the problem of passing a straight line through the points in the
data. The method was, of course, extended to nonlinear functions, in
which multiplication and division are also used; in this way, we gain
the opportunity to pass a curve through the points at hand and make
our estimates correspondingly more accurate. Using a probability
distribution function in this process was necessary to model the
element of chance in the problem.
1. Industry Adoption:
The increasing adoption of deep learning across various industries,
including healthcare, finance, manufacturing, and technology, will
drive demand. Businesses will seek to harness the power of deep
learning for tasks such as image recognition, natural language
processing, and predictive analytics.
2. Technological Advancements:
Ongoing advancements in deep learning technologies, including
improvements in neural network architectures, algorithms, and
hardware accelerators, will fuel demand. As the field progresses,
businesses will seek to leverage the latest innovations to stay
competitive.
3. Data Growth and Complexity:
The ever-growing volume and complexity of data will contribute to
the demand for deep learning solutions. Businesses are likely to
invest in technologies that can extract meaningful insights from large
datasets, driving the need for sophisticated deep learning models.
7. AI in Healthcare:
The healthcare industry's increasing reliance on AI and deep
learning for tasks like medical imaging analysis, drug discovery, and
personalized medicine will contribute significantly to the demand.
This sector is likely to seek solutions that enhance patient care and
streamline operations.
9. Regulatory Landscape:
Evolving regulatory frameworks and ethical considerations
surrounding AI and deep learning will influence demand.
Organizations will need to comply with regulations while developing
and deploying deep learning solutions, leading to a demand for legal
and ethical expertise in AI.
ARTIFICIAL INTELLIGENCE
In the 1950s, academic circles engaged in artificial intelligence, while
dealing with algorithms for learning and problem solving of a
computer, were pondering on finding solutions without knowing the
distribution function of data at hand. Living things could make
decisions without knowing statistics. During these periods, two basic
methods came to the fore. Artificial neural networks and decision
trees. Neither of these methods was based on statistics. In artificial
neural networks, the structure of the neural cells was simulated, and
a layered network was formed. In this structure, the first layer is
the input layer and the last layer is the output layer; the layers
hidden between these two are the hidden layers. The period from the
late 1980s to the early 2000s is remembered as the golden age of the
field. At the end of this period, the complexity of the systems grew,
and the results could no longer be improved once the hidden layers
exceeded four or five. In a sense, progress slowed, even if it did not
stop. Although decision trees gave good results on certain problems,
they were applied only to a limited set of problems, because they did
not cope well algorithmically with the increase in data size.
Although some advanced nonlinear methods have been produced in
the world of statistics, no general progress has been made which
can be applied to the problems at hand. For advanced methods,
nonlinear methods related to time series can be examined.
Then came a new way of using multi-dimensional spaces.
At the point where Artificial Neural Networks (ANNs) remained in the
late 1990s, a so-called Support Vector Machine became a promising
method of dealing with the complexity that ANN could not cope with.
In this method, complex problems, space structures in mathematics
and functions that allow the transition between these spaces were
used to get rid of complexity. Of course, artificial neural networks
also required processing in multi-dimensional spaces, but there was
no transition between spaces. To handle this complexity, kernel
functions, somewhat like the portals between dimensions familiar from
science-fiction films, were introduced and used to resolve many
problems. The main reason is that a complex problem in space A, which
requires a nonlinear solution, can be handled as a linear problem in a
space B reached through a kernel function. In this way, it became
possible to use linear methods in that space. The reflection of this
in commercial life is that some products developed using SVM allow the
user to solve complex classification problems. As with artificial
neural networks, however, the method's inability to explain its
decisions to the user posed a serious obstacle to its spread.
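As an illustration of the kernel idea, here is a hedged sketch using scikit-learn's SVC to fit an RBF-kernel support vector machine on a toy problem that is not linearly separable in the original space; the data set and parameters are invented purely for demonstration.

# Sketch: an RBF-kernel SVM handles a problem that a linear kernel cannot.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=500, factor=0.3, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear").fit(X_train, y_train)
rbf_svm = SVC(kernel="rbf", gamma=2.0).fit(X_train, y_train)

print("linear kernel accuracy:", linear_svm.score(X_test, y_test))
print("RBF kernel accuracy:   ", rbf_svm.score(X_test, y_test))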
This was the general picture in the mid-2000s, but especially with the
widespread adoption of big data, these methods were far from meeting
the need.
Julia
What is being done in the software world to support these methods,
which require operations on very large matrices? These models are
powered by existing big data technologies, but new software libraries
have also been developed to deal with huge matrices. Even
domain-specific programming languages are being developed. One of
these is Julia, a candidate to become one of the languages whose name
we will hear more and more often.
In particular, we distinguish:
The norm 2, or Euclidean distance, which between two points x and y is
d(x, y) = sqrt((x1 - y1)^2 + ... + (xn - yn)^2).
Interpolation
Given a set of points, interpolating means determining the function
that passes through all these points. In general, we use polynomial
interpolation. This could serve as a regression method; however, it
is usually avoided because it very often leads to over-fitting.
Supervised learning with equations
To evaluate the performance of a classifier f, we first define a cost
function L that gives a measure of the error between the prediction
f(x) and the real value y.
In general, the cost function is chosen as the square of the
difference from the value to be predicted; this is the least squares
approximation: L(f(x), y) = (y - f(x))^2.
Linear regression
Simple regression - x is a scalar
Linear regression is a supervised learning method for estimating f
assuming f has a linear form: f(x) = a*x + b.
Thus, with vector notation, one can see that it is easy to generalize
most of the methods applied to scalars. However, the calculation time
matters more, because there are more coefficients to evaluate
(p + 1 coefficients instead of 2).
Multiple regression, matrix notation
In the literature, one often finds these equations put in matrix form
to obtain a condensed notation.
Let X be the matrix of the n vectors x and Y the matrix of the n
vectors y. Then A becomes a matrix of coefficients, and we can write
Y = XA, whose least-squares solution is A = (X^T X)^(-1) X^T Y.
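As a hedged sketch of this matrix form, the normal-equation solution can be computed directly with NumPy; the data below is synthetic and only meant to show the shapes involved.

# Sketch: multiple linear regression in matrix form (synthetic data).
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 3
X = rng.normal(size=(n, p))
X = np.hstack([np.ones((n, 1)), X])        # column of ones for the intercept
true_A = np.array([2.0, 0.5, -1.0, 3.0])   # p + 1 coefficients
Y = X @ true_A + rng.normal(scale=0.1, size=n)

# Least-squares solution A = (X^T X)^(-1) X^T Y
A = np.linalg.solve(X.T @ X, X.T @ Y)
print("estimated coefficients:", A)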
Troubleshooting
Machine learning research is intellectual discovery. We have
witnessed a huge acceleration in the last five years compared with
the previous 10-20 years. Machine learning feeds on samples: instead
of writing five hundred thousand lines of code, the system is allowed
to learn by observing the world. Google succeeded in image detection;
it finds the elements of pictures. When it sees a cake and a boy, it
knows it is a birthday. Today we can chat with Google. For example,
when we ask "Is there a Mexican restaurant in Istanbul?", we can see
the top ten Mexican restaurants.
DEEP LEARNING
Neural networks are the next big thing when it comes to heavy
computation and smart algorithms. Here is how they work and why they
are so remarkable.
If you follow technology news, you have probably encountered the
concept of neural networks (also called artificial neural networks).
In 2016, Google's AlphaGo neural network beat one of the best
professional Go players in the world in a 4-1 series. YouTube has
also announced that it will use neural networks to better understand
its video clips. Searching on YouTube can be frustrating, because
YouTube does not see a video the same way a person does. Recently,
Google filed a patent that could change that. Dozens of other stories
could be mentioned.
But what exactly is a neural network? How does it work? And why is it
so common in machine learning?
Output nodes work in the same way as hidden-layer nodes: they sum the
input from the hidden layer and, if a certain value is reached, they
fire and send specific signals. At the end of the process, the output
layer sends a set of signals indicating the result of the input.
While the network shown above is simple, deep neural networks can
have many hidden layers and a great many nodes.
Picture credit: Neural networks and deep learning by
In this case, batch_size takes one of the values [16, 32, 64, 128]
and number_of_hidden_layers takes one of the values [1, 2, 3, 4].
Advanced discrete hyperparameters can also be specified through a
distribution. The following distributions are supported:
CONTINUOUS HYPERPARAMETERS
Continuous hyperparameters are specified as a distribution over a
continuous range of values. The supported distributions are:
Random sampling
In random sampling, the hyperparameter values are randomly selected
from the specified search space. Random sampling allows the search
space to contain both discrete and continuous hyperparameters.
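As a rough, library-agnostic illustration, random sampling can be sketched in plain Python: discrete hyperparameters are drawn from a list of choices and continuous ones from a uniform range. The names and ranges below are only examples.

# Sketch: random sampling over a mixed search space (illustrative only).
import random

search_space = {
    "batch_size": [16, 32, 64, 128],            # discrete choice
    "number_of_hidden_layers": [1, 2, 3, 4],    # discrete choice
    "learning_rate": (0.0001, 0.1),             # continuous uniform range
}

def sample_configuration(space):
    config = {}
    for name, spec in space.items():
        if isinstance(spec, list):
            config[name] = random.choice(spec)
        else:
            low, high = spec
            config[name] = random.uniform(low, high)
    return config

for _ in range(3):
    print(sample_configuration(search_space))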
Grid sampling
Grid sampling performs a simple grid search over all possible values
of the defined search space. It can only be used with hyperparameters
specified with choice. For example, the following space has six
samples in total:
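The original listing is not reproduced in this text; as a stand-in, here is a hedged sketch of one such space, enumerated with itertools. The hyperparameter names follow the earlier example and the values are illustrative: two choices times three choices gives 2 x 3 = 6 samples.

# Sketch: grid sampling enumerates every combination of the choice values.
from itertools import product

grid = {
    "batch_size": [16, 32],
    "number_of_hidden_layers": [1, 2, 3],
}

samples = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(samples))   # 6
for sample in samples:
    print(sample)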
Bayesian sampling
Bayesian sampling is based on the Bayesian optimization algorithm and
makes intelligent decisions about the hyperparameter values to be
sampled next. A sample is chosen based on how the previous samples
performed, in such a way that the new sample improves the reported
primary metric.
When Bayesian sampling is used, the number of concurrent runs affects
the effectiveness of the tuning process. Normally, a smaller number
of concurrent runs leads to better sampling convergence, because the
lower degree of parallelism increases the number of runs that benefit
from previously completed runs.
Bayesian sampling only supports the choice and uniform distributions
in the search space.
Note
Bayesian sampling does not support any early termination policy (see
the specification of an early termination policy below). When using
Bayesian parameter sampling, set early_termination_policy = None, or
omit the early_termination_policy parameter.
Bandit policy
Bandit is an early termination policy based on a slack factor or
slack amount and an evaluation interval. The policy terminates early
those runs whose primary metric is not within the specified slack
factor or slack amount of the best-performing training run. This
policy takes the following configuration parameters:
slack_factor or slack_amount: the slack allowed with respect to the
best-performing training run. slack_factor specifies the allowed
slack as a ratio; slack_amount specifies it as an absolute amount
instead of a ratio.
For example, imagine that a Bandit policy is applied at interval 10.
Suppose that the best-performing run at interval 10 reports a primary
metric of 0.8, which is to be maximized. If the policy is specified
with a slack_factor of 0.2, any training run whose best metric at
interval 10 is less than 0.66 (0.8 / (1 + 0.2)) is terminated. If, on
the other hand, the policy is specified with a slack_amount of 0.2,
any training run whose best metric at interval 10 is less than
0.6 (0.8 - 0.2) is terminated.
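The threshold arithmetic can be written out directly; this small sketch simply recomputes the numbers from the example above.

# Sketch: Bandit policy thresholds from the example above.
best_metric = 0.8      # best primary metric reported at interval 10
slack_factor = 0.2
slack_amount = 0.2

# With slack_factor, runs below best / (1 + slack_factor) are terminated.
print("slack_factor threshold:", best_metric / (1 + slack_factor))   # ~0.667

# With slack_amount, runs below best - slack_amount are terminated.
print("slack_amount threshold:", best_metric - slack_amount)         # 0.6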
policy = None
Default policy
If no policy is specified, the hyperparameter tuning service lets all
training runs execute to completion.
Note
If you are looking for a conservative policy that provides savings
without terminating promising jobs, you can use a median stopping
policy with evaluation_interval = 1 and delay_evaluation = 5. This is
a conservative configuration that can provide savings of between 25%
and 35% without loss of the primary metric (according to our
evaluation data).
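To make the idea concrete, here is a rough, library-agnostic sketch of a median stopping rule: a run is cancelled if its best metric so far falls below the median of the running averages of all runs at the same evaluation interval. The function name and data layout are invented for illustration.

# Sketch of a median stopping rule (not a specific library's API).
from statistics import median

def should_stop(run_history, all_histories, interval, delay_evaluation=5):
    """run_history / all_histories: lists of primary-metric values per interval."""
    if interval < delay_evaluation:
        return False                      # grace period for every run
    best_so_far = max(run_history[:interval + 1])
    running_averages = [
        sum(h[:interval + 1]) / (interval + 1)
        for h in all_histories
        if len(h) > interval
    ]
    return best_so_far < median(running_averages)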
Resource allocation
Control the resource budget for the hyperparameter adjustment
experiment by specifying the maximum total number of training runs.
Optionally, specify the maximum duration for the hyperparameter
adjustment experiment.
max_total_runs: maximum total number of training executions that
will be created. Upper limit: for example, you can have fewer
executions if the hyperparameter space is finite and has fewer
samples. It must be a number between 1 and 1000.
max_duration_minutes: maximum duration, in minutes, of the
hyperparameter tuning experiment. The parameter is optional and, if
present, any runs still executing after this duration are
automatically cancelled.
Note
If both max_total_runs and max_duration_minutes are specified, the
hyperparameter tuning experiment ends when the first of these two
thresholds is reached.
Also, specify the maximum number of training series that will be
executed at the same time during the hyperparameter adjustment
search.
max_concurrent_runs: maximum number of runs executed simultaneously
at any given time. If it is not specified, all max_total_runs runs
are started in parallel. If specified, the value must be a number
between 1 and 100.
Note
The number of simultaneous executions is determined by the
resources available in the specified process destination. Therefore,
you must ensure that the process destination has the resources
available for the desired simultaneity.
Assign resources for the adjustment of hyperparameters:
max_total_runs=20,
max_concurrent_runs=4
In this code, a hyperparameter tuning experiment is configured to use
a maximum of 20 total runs, executing 4 configurations at a time.
HOW TO CONFIGURE THE EXPERIMENT
Configure the hyperparameter tuning experiment using the
hyperparameter search space, the early termination policy, the
primary metric, and the resource allocation described in the previous
sections. Also provide an estimator that will be invoked with the
sampled hyperparameters. The estimator describes the training script
to run, the resources per job (one or more GPUs) and the compute
target to use. Since the available resources determine the
concurrency of the hyperparameter tuning experiment, make sure that
the compute target specified in the estimator has sufficient
resources for the desired concurrency. (For more information on
estimators, see how to train models.)
Configure the hyperparameter adjustment experiment:
In this code, a table with details about the training runs for each
of the hyperparameter configurations is shown.
You can also observe the performance of each run as training
progresses.
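Since the configuration code itself is not reproduced in this text, the following library-agnostic sketch shows the overall shape of such an experiment: sample configurations, respect the run budget, call a training function, and collect the results in a simple table. All names here are hypothetical, and the training function is a stand-in.

# Hypothetical sketch of a hyperparameter tuning experiment loop.
import random

search_space = {
    "batch_size": [16, 32, 64, 128],
    "number_of_hidden_layers": [1, 2, 3, 4],
}

def train_and_evaluate(config):
    # Stand-in for the real training script; returns the primary metric.
    return random.random()

max_total_runs = 20
results = []
for run_id in range(max_total_runs):
    config = {name: random.choice(values) for name, values in search_space.items()}
    metric = train_and_evaluate(config)
    results.append({"run": run_id, **config, "primary_metric": metric})

# A simple results table, best runs first.
for row in sorted(results, key=lambda r: r["primary_metric"], reverse=True)[:5]:
    print(row)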
DL MAPPING
Neural networks are functions with inputs such as x1, x2, x3 ... that
are transformed into outputs such as z1, z2, z3, etc., through two
(shallow networks) or several (deep networks) intermediate
operations, also called layers.
The weights and biases change from one layer to another. "W" and
"v" are the weights or synapses of the layers of the neural networks.
The classic use case for deep learning is the problem of supervised
learning. Here, we have a large set of input records with a desired
set of outputs.
Backpropagation algorithm
Here we apply the backward propagation algorithm to obtain the
correct output prediction.
The most basic deep-learning data set is MNIST, a set of handwritten
digits.
We can train a deep convolutional neural network with Keras to
classify images of handwritten digits from this data set.
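Here is a minimal sketch of how such a model is commonly built with Keras; the architecture and hyperparameters below are illustrative choices, not the book's.

# Sketch: a small CNN trained on MNIST with Keras (illustrative settings).
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train[..., np.newaxis].astype("float32") / 255.0
x_test = x_test[..., np.newaxis].astype("float32") / 255.0

model = keras.Sequential([
    layers.Conv2D(32, kernel_size=3, activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D(pool_size=2),
    layers.Conv2D(64, kernel_size=3, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, batch_size=128, validation_split=0.1)
print("test accuracy:", model.evaluate(x_test, y_test, verbose=0)[1])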
The triggering or activation of a neural network classifier provides a
score. For instance, to classify patients as sick and healthy, we
consider parameters such as height, weight and body temperature,
blood pressure, etc.
A high score means that the patient is sick, and a low score means
that they are healthy.
Each node in the output and hidden layers has its classifiers. The
input layer takes inputs and passes its scores to the next hidden
layer for additional activation, and this continues until the output is
reached.
This input to output progress from left to right in the forward direction
is called forward propagation.
The credit assignment path (CAP) in a neural network is the series of
transformations from input to output. CAPs describe potentially
causal connections between input and output.
The depth of the CAP for a given feedforward neural network is the
number of hidden layers plus one, as the output layer is included.
For recurrent neural networks, where a signal can propagate through a
layer several times, the depth of the CAP is potentially unlimited.
Non-linearity
Let's start by briefly seeing the difference between a linear and
nonlinear activation function:
The ReLU function is defined by the formula f(x) = max(0, x). If the
input is negative, the output is 0, and if it is positive, the output
is x. This activation function greatly speeds up network convergence
and does not saturate.
But the ReLU function is not perfect. If the input value is negative,
the neuron stays inactive, so the weights are not updated, and the
network does not learn.
Leaky ReLU
Parametric ReLU
The idea of the Leaky ReLU function can be further expanded.
Instead of multiplying x by a constant term, we can multiply it by a
hyperparameter that seems to work better than the Leaky ReLU
function. This extension to the Leaky ReLU function is known as
parametric ReLU.
The parametric ReLU function is defined by the formula
f(x) = max(ax, x), where "a" is a learnable parameter. This gives
neurons the ability to choose which slope is best in the negative
region. With this capability, the parametric ReLU function can become
a classic ReLU function or a Leaky ReLU function.
It will most often be preferable to use the ReLU function; its two
other versions (Leaky ReLU and parametric ReLU) are often experiments
without real added value.
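The three functions discussed above can be written out in a few lines of NumPy, assuming their standard definitions; the sample values are arbitrary.

# Sketch: ReLU, Leaky ReLU and parametric ReLU (standard definitions).
import numpy as np

def relu(x):
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):       # alpha is a small fixed constant
    return np.where(x > 0, x, alpha * x)

def parametric_relu(x, a):           # "a" is learned during training
    return np.maximum(a * x, x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))                    # [0.  0.  0.  1.5]
print(leaky_relu(x))              # [-0.02  -0.005  0.  1.5]
print(parametric_relu(x, 0.25))   # [-0.5  -0.125  0.  1.5]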
Over the years, various functions have been used. In this section,
only the main ones have been mentioned. At present, there is still a
lot of research to be done to find an appropriate activation function
that allows the neural network to learn more efficiently and quickly.
CONVOLUTIONAL NEURAL NETWORK
Convolutional neural networks are directly inspired by the visual
cortex of vertebrates. A network of convolutional neurons is also
called a ConvNet (for "Convolutional Network") or a CNN (for
"Convolutional Neural Network").
We distinguish two parts: a first part, called the convolutional part
of the model, and a second part, called the classification part of
the model, which corresponds to an MLP (Multi-Layer Perceptron).
Features - features.
A CNN compares images fragment by fragment. The fragments it looks
for are called features.
The features correspond to pieces of the image.
By finding approximate features that are roughly similar in two
different images, the CNN is much better at detecting similarities
than a full image-to-image comparison would be.
Convolution
When presented with a new image, the CNN does not know exactly
whether the features will be present in the image or where they will
be, so it tries to locate them across the whole image, in every
position.
By calculating, over the whole image, whether a feature is present,
we perform filtering. The mathematics we use to do this is called
convolution, from which convolutional neural networks take their name.
To calculate the correspondence between a feature and a sub-part of
the image, it is enough to multiply each pixel of the feature by the
value that the same pixel has in the image. Then add the responses
and divide the result by the total number of pixels in the feature.
If both pixels are white (value 1), then 1 * 1 = 1. In every such
case, each matching pixel gives 1. Similarly, each mismatch gives -1.
If all the pixels in a characteristic match, then their addition and then
their division by the total number of pixels gives 1. In the same way,
if none of the pixels of the characteristic correspond to the sub-part
of the image, then the answer is -1.
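The matching computation just described fits in a few lines of NumPy; this small sketch uses a tiny invented feature just to show the arithmetic.

# Sketch: multiply pixel by pixel, sum the responses, divide by the
# number of pixels in the feature (1 = white, -1 = black).
import numpy as np

def match_score(feature, patch):
    return np.sum(feature * patch) / feature.size

feature = np.array([[ 1, -1],
                    [-1,  1]])

print(match_score(feature, feature))    # perfect match  -> 1.0
print(match_score(feature, -feature))   # every pixel differs -> -1.0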
The next step is to repeat the complete convolution process for each
of the other existing features. The result is a set of filtered images,
each image of which corresponds to a particular filter.
It is advisable to consider this set of convolution operations as a
single processing step: In convolutional neural networks, this is
called the convolutional layer, which suggests that there will
eventually be other layers added to them.
Although the principle is relatively simple and we can easily explain
our CNN on the back of a napkin, the number of additions,
multiplications and divisions adds up quickly. In mathematical terms,
it grows linearly with the number of pixels in the image and with the
number of pixels in each feature.
With so many factors, it is very easy to make this problem far more
complex. It is therefore not surprising that microprocessor
manufacturers now build chips specialized in the type of operations
required by CNNs.
Pooling layer
Pooling is a method of taking a large image and reducing its size
while preserving the most important information it contains. The
mathematics behind the concept of pooling is again not very
complex. Indeed, just drag a small window step by step on all parts
of the image and take the maximum value of this window at each
step. In practice, a window of 2 or 3 pixels is often used, with a
step (stride) of 2 pixels.
Pooling: reduce the stack of images
Choose a step (usually 2).
Browse your window through your filtered images.
From each window, take the maximum value.
After pooling, the image has only a quarter of its original pixels.
Because pooling keeps, at each step, the maximum value contained in
the window, it preserves the best features of that window. This means
it does not care exactly where in the window a feature was found.
The result is that the CNN can find whether a feature is in an image,
regardless of where it is. This helps in particular to solve the
problem that computers are hyper-literal.
Pooling is not a glamorous tool, but it is fundamental: without it,
the CNN would not produce the results we know.
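As a hedged sketch of the procedure described above, here is max pooling with a 2x2 window and a stride of 2 applied to a small invented image; the output has a quarter of the original pixels.

# Sketch: 2x2 max pooling with a step (stride) of 2.
import numpy as np

def max_pool(image, window=2, step=2):
    rows = (image.shape[0] - window) // step + 1
    cols = (image.shape[1] - window) // step + 1
    out = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            out[i, j] = image[i * step:i * step + window,
                              j * step:j * step + window].max()
    return out

image = np.array([[0.1, 0.9, 0.3, 0.2],
                  [0.4, 0.2, 0.8, 0.1],
                  [0.7, 0.1, 0.2, 0.6],
                  [0.3, 0.5, 0.4, 0.9]])

print(max_pool(image))   # [[0.9 0.8]
                         #  [0.7 0.9]]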
The result of a ReLU layer is the same size as what was input, with
just all the negative values removed.
The output of one layer becomes the input of the next.
Deep Learning
You will probably have noticed that what you give at the input to
each layer (i.e. 2 Dimensional arrays) is very similar to what you get
at the output (other 2 Dimensional arrays). For this reason, we can
add them one by one as we would with Legos.
The raw images are filtered, rectified and pooled to create a set of
shrunken, feature-filtered images. These can be filtered and shrunk
again and again. Each time, the features become larger and more
complex, and the images become more compact. This allows the lower
layers to represent simple aspects of the image, such as light edges
and dots. The upper layers represent much more complex aspects of the
image, such as shapes and patterns. These tend to be easily
recognizable. For example, in a CNN trained to recognize faces, the
upper layers represent patterns that are parts of a face (1).
Fully connected layers
CNNs have another arrow in their quivers. Indeed, the fully
connected layers take the high-level filtered images and translate
them into votes. In our example, we only have to decide between
two categories, X and O.
Fully connected layers are the main building blocks of traditional
neural networks. Instead of treating inputs as 2-dimensional arrays,
they are treated as a single list and treated identically. Each value
has its vote as to whether the image is an X or an O. However, the
process is not completely democratic. Some values are much better
than others at detecting when an image is an X, and others are much
better at detecting an O. These therefore have more voting power. This
vote is called the weight, or connection strength, between each value
and each category.
When a new image is shown to the CNN, it spreads through the
lower layers until it reaches the fully connected final layer. The
election then takes place. And the solution with the most votes wins
and is declared the category of the image.
The fully connected layers, like the other layers, can be added one
after the other because their output value (a list of votes) is very
similar to their input value (a list of values). In practice, several fully
connected layers are often added one after the other, with each
intermediate layer voting for imaginary "hidden" categories. Indeed, each
additional layer lets the network learn more complex combinations of
features that help improve decision-making.
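The voting picture above can be made concrete with a tiny NumPy sketch: each flattened value casts a weighted vote for each category, and the category with the most votes wins. The numbers are invented for illustration.

# Sketch: a fully connected layer as a weighted vote between two categories.
import numpy as np

values = np.array([0.9, 0.2, 0.7, 0.1])     # flattened high-level values

# One weight ("voting power") per value and per category (invented numbers).
weights = np.array([[ 0.8, -0.3],           # votes for [X, O]
                    [-0.2,  0.9],
                    [ 0.7, -0.1],
                    [-0.4,  0.6]])

votes = values @ weights
categories = ["X", "O"]
print(dict(zip(categories, votes)))
print("winner:", categories[int(np.argmax(votes))])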
Backpropagation
Hyperparameters
Unfortunately, not all aspects of CNNs are as intuitive to learn and
understand as what we have seen so far. There is still a long list of
parameters that must be set manually to allow the CNN to achieve
better results.
For each convolutional layer, how many characteristics should one
choose? How many pixels should be considered in each feature?
For each pooling layer, which window size should we choose? Which
step?
For each additional fully connected layer, how many hidden neurons
should one define?
In addition to these parameters, there are also higher-level
architectural elements to consider: how many layers of each type
should one include? In which order? Some models of deep learning
can have more than a hundred layers, which makes the number of
possibilities extremely important.
With so many possible combinations and permutations, only a small
fraction of the possible configurations have been tested so far. The
different designs of CNNs are generally motivated by the knowledge
accumulated by the scientific community, with occasional gaps
allowing surprising improvements in performance. And although we
have presented the main building blocks for the construction of
CNNs in a relatively simple and intuitive way, there are a large
number of variations and settings that have been tested and have
yielded very good results, such as new types of layers and more
complex ways of connecting layers in between.
Beyond the images
Although our example with Xs and Os involves the use of images,
CNNs can also be used to categorize other types of data.
The trick is: whatever type of data you are processing, transform that
data to make it look like an image.
For example, audio signals can be broken down into a set of smaller
pieces of shorter duration, and each of these pieces can be
decomposed into bands of low, medium, high or even finer frequencies.
This can be represented by a 2-dimensional array where each
column is a block of time, and each line is a frequency band. The
"pixels" of this false image that are close to each other are closely
related. CNNs work well in these cases.
The scientists were particularly creative. In a similar approach, they
have adapted textual data for natural language processing and even
chemical data for drug discovery.
A particular example of data that does not fit this type of format is
"customer data", where each row of a table represents a customer, and
each column represents information about that person, such as name,
address, email address, purchases and browsing history. In this
situation, the position of rows and columns does not matter: rows can
be rearranged and columns reordered without losing the meaning of the
data. On the other hand, rearranging the rows and columns of an image
makes it completely unusable.
A general rule: if your data remains as exploitable after exchanging
some columns between them, you probably cannot use the
convolutional neural networks. Nevertheless, if you can make your
problem look like looking for patterns in an image, CNNs can be
exactly what you need.
NLP
Application in the automatic processing of natural language
(NLP)
Instead of image pixels, most NLP tasks have sentences or documents
as inputs.
The idea for applying a CNN (or any other neural network) is to
transform words and documents into matrix form. Each row of the
matrix corresponds to a token, usually a word, but it can also be a
character. That is, each row is a vector that represents a word.
In general, these vectors are word embeddings (dense representations)
such as word2vec or GloVe.
For example, for a 20-word phrase using a 100-dimensional
embedding, we would have a 20 × 100 matrix as input. This is our
"image" for CNN.
Convolutional filtering in image processing consists of dragging a
window representing the feature on the image and calculating the
convolution product between the feature and each portion of the
scanned image. However, in NLP, we usually use filters that slide
over complete rows of the matrix (the words). Thus, the "width" of
the filters is generally the same as the width of the input matrix.
The height, or size of the region, may vary, but sliding windows of
2 to 5 words at a time are typical.
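A hedged NumPy sketch of the "sentence as image" idea: a 20-word sentence with 100-dimensional embeddings becomes a 20 x 100 matrix, and a filter spanning whole rows slides over 3 words at a time. The random values stand in for real embeddings and learned filter weights.

# Sketch: a convolution filter sliding over complete rows (words).
import numpy as np

sentence_length, embedding_dim = 20, 100
sentence = np.random.rand(sentence_length, embedding_dim)  # stand-in embeddings

region_size = 3                                  # the filter covers 3 words
conv_filter = np.random.rand(region_size, embedding_dim)

feature_map = np.array([
    np.sum(sentence[i:i + region_size] * conv_filter)
    for i in range(sentence_length - region_size + 1)
])
print(feature_map.shape)    # (18,) - one response per filter position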
PART II: DEEP LEARNING IN PRACTICE (IN
JUPYTER NOTEBOOKS)
Python is a powerful computer language that is extremely effective
for internet development and web-based applications.
Why in the world would you be involved in Python when there is
already a long line of really good old and well-functioning computer
languages to go to? As an example, let’s mention:
with Entry:
>>> from Tkinter import Entry
>>> child = Entry()
>>> child.pack()
or with Text (and a variety of other objects):
WHAT DOES PYTHON COST AND WHERE DO I GET IT?
Python is free and available for the vast majority of operating
systems. If you do not have the latest version, you can always
download it at http://www.python.org . Installing Python is very easy
if the language is not already installed on your computer. However,
this is usually the case if you use SuSE, RedHat or Mandrake.
Python is under very active development. In just the roughly three
months it has taken me to write this book, there have been two full
updates of the language (the released versions are 2.2 and 2.3). And
not only that: there are also updates to the latter without the
version name changing. Right now, 16 October 2003, version 2.3.2 is
the current one. In version 2.4, there should be quite a lot of
changes, so Python can do even more (e.g. processing 64-bit character
codes).
Guide to Python documentation
If you are completely new to programming, it can be difficult to
understand how the documentation of a programming language is
screwed up. But once you can figure it out, it is often and often much
easier to get help in the documentation of the language than having
to ask for help for a specific function on a mailing list.
Documentation for python can be found online at
http://www.python.org/doc/ . Here you can also find guides and
HowTo's for various topics. The documentation is divided into 7
categories: Module overview, tutorial, library reference, Macintosh
reference, language reference, language extensions and a Python /
C API. Explanation of the categories:
Immediate mode
From old DOS you may know the prompt C:\>. In Python, >>> is used as
the corresponding prompt, a signal that Python is in immediate, or as
it is also called, interactive mode. It is from here that you start
your Python programming, and it is certainly also where the mature
Python programmer returns, for example when testing new pieces of a
program.
Comments
When you program, from time to time you need to insert comments:
parts of the program that you and others can read but that the
computer language skips. In Python, the hash sign # (in Danish
sometimes called the garden gate) marks a comment. A comment begins
at the hash sign and continues to the end of the program line. Ex. 1:
>>> # Python skips comments
Comments are for you and not for the computer.
It is also possible to print the hash sign if it is part of a text
string. Ex. 2:
>>> print "Here we print a hash sign #"
In this example, there were NO comments. The hash sign here is part
of a text string, which Python should respond to rather than skip.
Note that the hash sign is printed here, which it was not in the
example where it served its proper purpose.
>>> 25/24
1
25 divided by 24 should give 1.0416666666666667, but Python
automatically rounds down to the nearest integer, which in this case
is 1.
>>> 25 / 24.0
1.0416666666666667
To get the decimal part returned (printed), it is necessary that at least
one of the numbers is a decimal number (a floating point number).
In Python, there is full support for floating-point numbers.
Operators with mixed-type operands convert the integer operand into a
floating-point number:
Variables
A variable is a name that points to one or more addresses in the
computer's memory (RAM) where data, for example numbers or text
strings, is stored. The variable must be given a name that must not
contain the Danish/Norwegian special characters ÆØÅ and æøå, although
some builds of version 2.3 have been able to use those characters.
Statement of variables
When the programmers had to make calculations on Dask, Denmark's
first computer, they had to think in zeros and ones. That is very
inconvenient for humans, so the numbers were replaced by words,
initially three-letter words. It was a great relief for the
programmers. Nowadays variables and other names can usually have up
to 256 letters or digits; the only requirement is that the first
character of a variable name must be a letter and must not be one of
the Danish/Norwegian special characters æøå and ÆØÅ. In Python, the
first character may also be an underscore, but that can be a really
bad choice, since predefined names etc. can easily be overwritten. In
Python, variables need not be declared; the declaration is automatic.
BUT NOTE: Python is case-sensitive, i.e. the variables a and A are
two different variables and will be interpreted as such. Of course,
the same applies to all other names (of lists, tuples, etc.).
# First x is assigned zero, then y is assigned x, and z is assigned y.
>>> x = y = z = 0
>>> x
0
>>> y
0
>>> z
0
NOTE: In Python, use == for equal to and = for assign. Thus, there is
a marked difference between x == y and x = y
Variables can be overwritten
Global variables can be overwritten:
Variable types
The type of a variable can be found with type:
>>> a = 25
>>> type(a)
<type 'int'>
>>> a = 12.4
>>> type(a)
<type 'float'>
>>> a = "rowan"
>>> type(a)
<type 'str'>
Variables address
The address of a variable in RAM can be found with id:
>>> a = 12.4
>>> id(a)
136095956
>>> a = 23
>>> id(a)
135591272
>>> a = "rowan"
>>> id(a)
1078892096
The following printout shows that the same letter gives the same
address:
>>> s = "Dette er en streng"
>>> for i in range(0, 18):
... print s[i], id(s[i])
...
D 1078892160
e 1076630176
t 1076645984
t 1076645984
e 1076630176
1078892192
e 1076630176
r 1076518048
1078892192
e 1076630176
n 1076614816
1078892192
s 1076613536
t 1076645984
r 1076518048
e 1076630176
n 1076614816
g 1076690208
>>> s = "AABBCCaabbcc"
>>> for i in range(0, len(s)):
... print s[i], id(s[i])
...
A 1078892448
A 1078892448
B 1078892384
B 1078892384
C 1078892608
C 1078892608
a 1076679360
a 1076679360
b 1076679584
b 1076679584
c 1076690176
c 1076690176
Variables in Python
The fact that Python is a fairly new computer language shows, among
other things, in the characteristics and use of variables. In
traditional computer languages such as C, C++, Pascal, Delphi, Visual
Basic, Java and others, variables must be strongly typed, i.e. they
must be defined to contain quite specific types of values, for
example integers or text strings, and only the types thus defined may
be stored in those variables. This is not the case with Python. Here,
the content of a variable is automatically moved to other addresses
in memory if a redefinition requires it, among other reasons because
no kind of variable takes up much space in memory.
Ex. 1:
>>> # Here 2 things happen: 1: the variable a is declared and
>>> # 2: it is also assigned the value "Text string", i.e. it is set
>>> # to point to the addresses in the computer's memory where
>>> # "Text string" is stored.
>>> a = "Text string"
>>> a
'Text string'
>>> # Now the same variable is assigned an integer (integer) as
value.
>>> a = 25
>>> print a
25
Ex. 2:
>>> a
'Text string 25'
Ex. 1:
>>> 2 == 3
0
>>> 2 == 2
1
Terms & Results
Now we can go ahead and test truth values in mathematical
expressions:
>>> a = 5
>>> b = 7
>>> a == 5
1
>>> a == 7
0
>>> b == 7
1
>>> b == 5
0
>>> a == 6 and b == 7
0
>>> a == 7 and b == 7
0
>>> a == 7 or b == 7
1
>>> a == 7 or b == 6
0
>>> not (a == 7 and b == 6)
1
$ chmod +x myscript.py
If you want to run a boot file more from the current directory, you can
write a global boot file using the code:
import os
filename = os.environ.get('PYTHONSTARTUP')
if filename and os.path.isfile(filename):
    execfile(filename)
>>> number = 17 ** 4
>>> if number < 50000:
...     print "The number is less than 50000"
... else:
...     print "The number is", number
...
The number is 83521
>>> def f(a):
...     return a * 1.25
...
>>> f(75)
93.75
Function call:
>>> def f1():
...     print f2()
...
>>> def f2():
...     print "Function f2 has been called."
...
>>> f1()
Function f2 has been called.
None
>>> list = [1, 2, 3, 4]
>>> def udvL(l=[]):  # the argument l is declared here as a local name
...     for i in range(5, 11):
...         l.append(i)
...         print l
...
Run result:
>>> udvL(list)
[1, 2, 3, 4, 5]
[1, 2, 3, 4, 5, 6]
[1, 2, 3, 4, 5, 6, 7]
[1, 2, 3, 4, 5, 6, 7, 8]
[1, 2, 3, 4, 5, 6, 7, 8, 9]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>>
Find prime numbers:
>>> for i in range(2, 10):
...     for j in range(2, i):
...         if i % j == 0:
...             print i, 'equals', j, '*', i / j
...             break
...     else:
...         print i, 'is a prime number'
...
2 is a prime number
3 is a prime number
4 equals 2 * 2
5 is a prime number
6 equals 2 * 3
7 is a prime number
8 equals 2 * 4
9 equals 3 * 3
>>>
Calculate factorials by recursion:
def faculty(number):
    if number <= 1:
        return 1
    else:
        return number * faculty(number - 1)  # recursive call

for i in range(1, 11):
    print "%2d! = %d" % (i, faculty(i))

jabot@linux:~> python fakultet.py
 1! = 1
 2! = 2
 3! = 6
 4! = 24
 5! = 120
 6! = 720
 7! = 5040
 8! = 40320
 9! = 362880
10! = 3628800
>>> i = 5
>>> def f (arg = i):
... print arg
...
>>> f ()
5
>>>
>>> f(1, "four", "five", "six", 7: "seven", 8: "eight", 9: "nine")
  File "<stdin>", line 1
    f(1, "four", "five", "six", 7: "seven", 8: "eight", 9: "nine")
                                 ^
SyntaxError: invalid syntax
>>> f(1, "four", "five", "six", {7: "seven", 8: "eight", 9: "nine"})
(1, ('four', 'five', 'six', {8: 'eight', 9: 'nine', 7: 'seven'}), {})
>>>
>>> def f(a, *b, **c):
...     for arg in b:
...         print arg
...
>>> f(1, "two", "three", "four")
two
three
four
>>>
Lists
A list is a variable with zero or more related slots. A list is
recognized by its square brackets; if you imagine those taken away,
you have a tuple, which is recognized by its commas.
list = []  # create an empty list
# insert items in the list
for index in range(1, 11):
    list += [index]
# list access via index (slot number)
print "Select items according to their index:"
print "List content:"
for i in range(len(list)):
    print "%6d %3d" % (i, list[i])
# update list items
print "List content before update:", list
list[0] = -100
list[-3] = "bornholmers"
print "List content after update:", list
List of lists (a nested list):
list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print "The elements in the list:"
for row in list:
    for element in row:
        print element,
    print
Lists can be nested (contain other lists), e.g.:
>>> list = [[]]
>>> list
[[]]
>>> list * 3
[[], [], []]
>>> list_1 = [2, 3]
>>> list_2 = [1, list_1, 4]
>>> len(list_2)
3
>>> list_2[1]
[2, 3]
>>> list_2[1][0]
2
>>> list_2[1].append('extrapost')
>>> list_2
[1, [2, 3, 'extrapost'], 4]
>>> list_1
[2, 3, 'extrapost']
>>> stack = [3, 4, 5]
>>> stack.append(6)
>>> stack.append(7)
>>> stack
[3, 4, 5, 6, 7]
>>> stack.pop()
7
>>> stack
[3, 4, 5, 6]
>>> stack.pop()
6
>>> stack.pop()
5
>>> stack
[3, 4]
Example:
Filter out the numbers divisible by 2 or 3:
Syntax: filter(function, sequence)
>>> def f(x): return x % 2 != 0 and x % 3 != 0
...
>>> filter(f, range(2, 25))
[5, 7, 11, 13, 17, 19, 23]
>>> t = 1,2,3,4,5
>>> t
(1, 2, 3, 4, 5)
>>>
That the separating commas matter can be seen here:
>>> notice = "Danger ahead!"
>>> len(notice)
13
>>> notice = "Danger ahead!",
>>> len(notice)
1
>>>
In the first case, "Danger ahead!" is a text string; in the latter
case, it is a tuple. The difference is the comma alone.
A tuple created from user data:
t = int(raw_input("Enter the current hour: "))
m = int(raw_input("Enter the current minute: "))
print "the_string[", start, ":", last, "] =", the_string[start:last]
print "the_tuple[", start, ":", last, "] =", the_tuple[start:last]
Tuples can be compared:
>>> (1, 2, 3) < (1, 2, 4)
True
>>> (1, 2, 3, 4) < (1, 2, 4)
True
>>> (2, 3, 4) <> (2.0, 3.0, 4.0)
False
>>> (1, 2) < (1, 2, -1)
True
>>> (1, 2, 3) == (1.0, 2.0, 3.0)
True
>>> (1, 2, 3, 4) <> (1, 2, 3, "p")
True
>>> (1, 2, ('aa', 'ab')) < (1, 2, ('abc', 'a'), 4)
True
>>>
# Access and update an existing dictionary
print "Rønne's postal code:", postal_codes["Rønne"]
postal_codes["Gudhjem"] = 3760
print "Gudhjem's postal code:", postal_codes["Gudhjem"]
>>> list = []
>>> for i in range(6):
...     list.append((str(i), i * i))
...
>>> list
[('0', 0), ('1', 1), ('2', 4), ('3', 9), ('4', 16), ('5', 25)]
>>> dict(list)
{'1': 1, '0': 0, '3': 9, '2': 4, '5': 25, '4': 16}
>>>
Escape sequences
\n  New line
\b  Backspace. Moves the cursor one place back
\a  System bell
\\  Backslash. Inserts a backslash
\"  Inserts double quotes
\'  Inserts single quotes
Number systems
In the computer language Forth, you can use about 70 different number
systems, depending on the language version. The number system we use
today is the 10-digit, or decimal, number system. It has not always
been like that the world over. About 4000 years ago, a base-60 system
was used in the area just northwest of the Persian Gulf. Let us just
look at the number 4320. It consists of 4 thousands, 3 hundreds, 2
tens and 0 ones, which most of us take as a matter of course, but is
it really so obvious? No, and as already hinted, it is not at all
that simple: the base-10 system is nonsense to any computer. A
computer can only work in the 2-digit (binary) system. This is
because the computer, being the electrical device it is, can only
respond to whether there is a current in a wire or not. If a current
flows, it has been decided to register the state with the digit 1; if
no current flows, it is registered with the digit 0.
2 is the base number of the binary system, just as 10 is the base of
the decimal system and 9 of the base-9 system. A number system always
has as many digits as its base number indicates, and the base number
usually gives the number system its name. In the binary system, there
are two digits (zero and one). In the decimal system, there are ten
digits (0 through 9 inclusive), etc.
Because all number systems have 0 as the lowest digit, the highest
digit is one less than the base of the number system. In the 2-digit
system, the largest digit thus becomes 2 - 1 = 1, and in the 10-digit
system 10 - 1 = 9, and so on. The rearmost digit (the one furthest to
the right) represents the digit times the base raised to the power
zero, and in general a digit in a number represents the digit times
the base raised to the power of its position. But let Python make it
a little clearer:
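The original example is missing here; as a stand-in, a short interactive sketch of what Python shows for the number 4320 used above:
>>> bin(4320)        # the decimal number 4320 written in the binary system
'0b1000011100000'
>>> int('1000011100000', 2)
4320
>>> hex(4320)
'0x10e0'
>>> 4*10**3 + 3*10**2 + 2*10**1 + 0*10**0
4320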
Prerequisite
You should know the basics of Python and object orientation and have
worked through the Exceptions exercise.
Stack
A Stack is a linear data structure that resembles, just as it sounds,
a stack or pile. Imagine a stack of plates where each plate
represents an object, a variable, or whatever it is that you store.
One usually uses a specific set of methods:
1. .push() (adds an element)
2. .pop() (removes and returns the top element)
3. .peek() (shows what is on top without changing the stack)
4. .is_empty() (returns True/False depending on whether the stack is
empty)
5. .size() (returns the number of elements in the stack)
A Stack can also be given a fixed number of places.
An implementation of a Stack can begin as follows:
class Stack:
    def __init__(self):
        self.items = []
Used on an empty stack, the methods behave like this:
>>> myList.pop()
'Empty list.'
>>> myList.size()
0
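The fragment above shows only the constructor; a complete sketch consistent with the listed methods and with the pop()/size() behaviour shown could look like this (one possible implementation, not the only one):

class Stack:
    """A simple stack built on Python's list."""

    def __init__(self):
        self.items = []

    def push(self, item):
        self.items.append(item)

    def pop(self):
        try:
            return self.items.pop()
        except IndexError:
            return "Empty list."

    def peek(self):
        return self.items[-1] if self.items else None

    def is_empty(self):
        return self.items == []

    def size(self):
        return len(self.items)

>>> myList = Stack()
>>> myList.push(4)
>>> myList.peek()
4
>>> myList.pop()
4
>>> myList.pop()
'Empty list.'
>>> myList.size()
0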
VisuAlgo Stack
Queue
A Queue is a linear data structure reminiscent of a stack. The
difference is that a queue is open at both ends. One end is used to
add elements and the other to remove elements.
class Queue:
    def __init__(self):
        self.items = []

    def is_empty(self):
        return self.items == []

    def enqueue(self, item):
        self.items.insert(0, item)

    def dequeue(self):
        try:
            return self.items.pop()
        except IndexError:
            return "Empty list."
Linked list
For both Queue and Stack we have used Python's built-in list to store
the values, so so far we have only made special versions of Python's
list. The idea now is not to rely on list but to build the entire
data structure ourselves. To succeed, we will use nodes and build our
own linked list. Queue and Stack are, likewise, just special versions
of the linked list.
We can picture a regular list/array as in the image below. The array
is stored in memory as one block, and within that block the values
are arranged one after the other. Then we do not need to know where
each value is, only where the array starts.
Array in memory
For a linked list of nodes, we cannot assume that the values are
adjacent to each other, but they are allocated to different places in
the memory. Therefore, each value in the linked list must be linked to
the next value. We use a Node class for that.
VisuAlgo Stack
Nod class
Let us look at what the code for the node class can look like. Very
little code is needed, as we only need one attribute for the data and
one for the next node.
class Node:
    """
    Node class
    """
    def __init__(self, data, next=None):
        """
        Initialize the object with the data and set next to None.
        """
        self.data = data
        self.next = next
We test using it in the python3 interpreter.
>>> head = Node(1)
>>> n2 = Node(32)
>>> n2.data
32
>>> head.next
>>> head.next = n2
>>> head.next
<__main__.Node object at 0x7453468745>
# Value of the first node
>>> head.data
1
# Value of the second node
>>> head.next.data
32
When we print head.next the first time, we get no output because the
value is None. After we assign n2 to head.next, head.next contains
our n2 object. Then we can write head.next.data to access the data,
32, in the n2 object.
Test it yourself: create one more node, n3, and assign it to n2.next.
Print n3's value via head.next.next.data and n2.next.data. In this
way, we can build a linked list.
Traverse nodes
Starting at a list's head and going through all the nodes and doing
something with them is called traversing a list. E.g. we need to
traverse a list if we want to print out all nodes values. We do this
most easily by creating a new variable that is used to point to the
list's head and use it to traverse the list.
current_node = number_list
while current_node != None:
    print(current_node.data)          # print the node's value
    current_node = current_node.next  # move to the next node
In the loop we move the pointer current_node to the next node, and in
this way we traverse the list without changing it; number_list still
contains the same structure.
number_list = Node(1)
Remove node
To remove a node, we need two variables while we walk through the
list: current_node and previous. current_node is used to find the
node to be deleted; previous should always point to the node before
current_node. When we have found the right node, we change previous
so that it points to the node after current_node. Finally, we delete
the node we want to remove with del.
Find the right nodes
Step 1, traverse the nodes so that current_node points to the one to
be deleted and previous points to the node before it.
Delete current_node
Step 3, delete current_node.
Add node
When we are adding a node, the order in which we do things is
crucial; otherwise, we can lose all the nodes that should come after
the new one. In the picture below, we start from the list [1, 2, 4]
and want to insert a new node, with the number 5 as its value,
between 2 and 4. The list should look like [1, 2, 5, 4] when it is
done.
Find the right node and create new
Step 1, traverse the nodes so that current_node points to the node
that should be before the new one.
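Putting the steps together, here is a small sketch using the Node class from above; the list values follow the [1, 2, 4] example, and the variable names are just illustrative.

# Sketch: insert a new node with value 5 between 2 and 4 in the list [1, 2, 4].
number_list = Node(1)
number_list.next = Node(2)
number_list.next.next = Node(4)

# Step 1: traverse so current_node points to the node before the new one.
current_node = number_list
while current_node.data != 2:
    current_node = current_node.next

# Step 2: create the new node and link it in; the order matters, otherwise
# the rest of the list (the node 4) would be lost.
new_node = Node(5)
new_node.next = current_node.next
current_node.next = new_node

# Print the list to check: 1 2 5 4
node = number_list
while node != None:
    print(node.data)
    node = node.next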
command-line
$ sudo zypper install python3
Make sure that the installation was successful by opening the
Terminal application and running the command python3:
command-line
$ python3 --version
Python 3.6.1
The version that you see may not be 3.6.1 - it will be whichever
version you installed.
If you have any doubts, or something went wrong, and you have no
idea what to do next, ask your coach! Sometimes things don't go
quite smoothly, so it's best to ask someone with more experience to
help.
Text editor
There are many different editors, and it comes down to personal
preference. Most Python programmers use complex but extremely
powerful IDEs (Integrated Development Environments), such as
PyCharm. However, they are probably not very suitable for
beginners; we offer equally powerful, but far simpler options.
Below is a list of our preferences, but you can also ask your coach
for advice - it will be easier to get help from him.
Gedit
Gedit is an open, free editor, available for all operating systems.
Download it here.
Sublime Text 3
Sublime Text is a very common text editor with a free trial. It is easy
to install and easy to use and is also available for all operating
systems.
Download it here.
Atom
Atom is the latest GitHub text editor. It is free, open, easy to install,
and easy to use. Available for Windows, OSX and Linux.
Why do we need a code editor?
You may ask - why install a separate code editing program if you can
use Word or Notepad?
First, the code should be stored in plain text, and the problem of
programs such as Word or TextEdit is that they do not save files in
this form, but use rich text (with formatting and fonts), for example,
RTF (Rich Text Format).
The second reason is that specialized editors provide many useful
features for programming, such as colour code highlighting
depending on its meaning and automatically closing quotes.
Later we will see all this in action. Soon you will start thinking about
your code editor as a proven favourite tool :)
CONCLUSION
Many deep learning systems were described as early as the 1980s (and
even earlier), but the results were unimpressive. It was advances in
the theory of artificial neural networks (pre-training of neural
networks using a special case of an undirected graphical model, the
so-called restricted Boltzmann machine) and the computing power of
the mid-2000s (first of all Nvidia graphics processors, and now
Google tensor processors) that made it possible to create complex
neural network architectures with sufficient performance, able to
solve a wide range of tasks that could not be solved effectively
before, for example in computer vision, machine translation and
speech recognition. The quality of these solutions is in many cases
now comparable to, and in some cases exceeds, the efficiency of
"protein" (human) experts.
Deep learning algorithms are contrasted with shallow learning
algorithms by the number of parameterized transformations encountered
by the signal propagating from the input layer to the output layer,
where a data processing unit that has learnable parameters, such as
weights or thresholds, is considered a parameterized transformation.
The chain of transformations from input to output is called the
credit assignment path (CAP). CAPs describe potential causal
relationships along the network from input to output, and the paths
in different branches may have different lengths. For a feedforward
neural network, the CAP depth is the same as the network depth and is
equal to the number of hidden layers plus one (the output layer is
also parameterized). For recurrent neural networks, in which the
signal can jump across layers, bypassing intermediate ones, the CAP
is potentially unlimited in length because of feedback. There is no
universally agreed threshold of depth dividing shallow learning from
deep learning, but it is usually considered that deep learning is
characterized by several non-linear layers (CAP > 2). Jürgen
Schmidhuber also highlights "very deep learning" when the CAP is
greater than 10.