
Python

Data Science 2024


Explore Data, Build Skills, and Make
Data-Driven Decisions in 30 Days
(Machine Learning and Data
Analysis for Beginners)

By
Stephen Wilson
Copyright 2024 © by Stephen Wilson
All rights reserved.
No part of this book may be reproduced or transmitted in any form or
by any means, electronic or mechanical, including photocopying,
recording, or any information storage system.
TABLE OF CONTENTS
PART I: FUNDAMENTALS OF DEEP LEARNING
FUNDAMENTALS OF PROBABILITY
FORECASTING DEMAND
ARTIFICIAL INTELLIGENCE
DEEP LEARNING
FUNDAMENTALS OF STATISTICS
FUNDAMENTALS OF LINEAR ALGEBRA
MACHINE LEARNING AND DEEP LEARNING
FUNDAMENTALS OF MACHINE LEARNING
FUNDAMENTALS OF NEURAL NETWORKS AND DEEP LEARNING
A COMPUTER LIKE A BRAIN
THE ABILITY TO LEARN
HOW TO DO ERROR CORRECTION
CONTINUOUS IMPROVEMENT TO LEARN
DEEP TROUBLE
NEURAL NETWORK APPLICATIONS
WHAT CAN WE USE THESE SYSTEMS FOR?
THE FUTURE OF NEURAL NETWORKS
DEEP LEARNING PARAMETERS AND HYPER-PARAMETERS
WHAT ARE HYPER PARAMETERS?
TYPES OF HYPERPARAMETERS
CONTINUOUS HYPERPARAMETERS
MEDIAN STOPPING DIRECTIVE
HOW TO MAKE CONFIGURATION OF THE EXPERIMENT
DEEP NEURAL NETWORKS LAYERS
DL MAPPING
DEEP NETWORKS AND SHALLOW NETWORKS
CHOOSE A DEEP NETWORK
DEEP BELIEF NETWORKS - DBNS
GENERATIVE ADVERSARIAL NETWORKS - GANS
RECURRENT NEURAL NETWORKS - RNNS
DEEP LEARNING ACTIVATION FUNCTIONS
NONLINEAR ACTIVATION FUNCTION
TYPES OF NONLINEAR ACTIVATION FUNCTIONS
CONVOLUTIONAL NEURAL NETWORK
FEATURES AND CNNS BENEFITS
NLP
PART II: DEEP LEARNING IN PRACTICE (IN JUPYTER NOTEBOOKS)
WHAT DOES PYTHON COST AND WHERE DO I GET IT?
HOW TO WRITE INTERACTIVELY
PYTHON DATA STRUCTURES
INSTALLING PYTHON
INSTALL PYTHON IN WINDOWS
INSTALL PYTHON: OS X
INSTALLING PYTHON IN LINUX
CONCLUSION
PART I: FUNDAMENTALS OF DEEP LEARNING
In recent years, the growing number of software technologies built
around three specialized topics, artificial intelligence, machine
learning and deep learning, has led to increasing interest in these
three subjects.
Even though the specialists working in these fields have not yet
reached a consensus on the terminology, new ideas about these three
subjects emerge every day; the terms are sometimes used
interchangeably and often with quite different meanings.
Artificial intelligence, which advancing technology has placed at the
centre of our lives, machine learning, which governs the decision
mechanisms of computers, and deep learning, which examines the data
analysis and algorithms underlying both, are becoming a common field
of study for many disciplines. Because artificial intelligence is the
older umbrella term, the work grouped under that name directly shapes
the course of technology.
We can now program devices equipped with processors of varying
capability using more than a hundred programming languages. These
devices execute the commands we give them through their systems as
soon as they receive them. The largest share in the advancement of
technology belongs to this programmability, which operates almost
flawlessly. Yet no matter how far this capability develops and how
deeply artificial intelligence enters our lives, it is still beaten
every time by the human brain.
A computer can, for example, easily understand and solve structures
built from repetition, control flow or mathematical operations, yet
current technology remains inadequate for uncertain expressions that
require reasoning. This is where deep learning, which is essential for
teaching such a system how to learn, comes into play.
For all these reasons and needs, the path that begins with artificial
intelligence, runs from data processing, algorithms and data sets to
machine learning, continues to image processing and analysis, often
called the high point of artificial intelligence, and culminates in
deep learning, which encompasses all of these, has become a popular
topic in recent years.
FUNDAMENTALS OF PROBABILITY
Fundamentals of Probability: Exploring the Basics
Probability, a fundamental concept in mathematics and statistics,
provides a framework for understanding uncertainty and
randomness. Whether applied in games of chance, statistical
analysis, or real-world decision-making, probability serves as a
cornerstone in numerous fields. Let's delve into the key
fundamentals of probability to grasp its foundational principles.
The process of benefiting from science in decision-making
processes that started with probability theory continued with the
extraction of certain parameters to summarize the data. Sampling
characteristics of the available data, such as mean value and
variance, were used to test the hypothesis with probability
distribution functions from probability theory. Today, techniques that
are still used in many decision-making processes have emerged,
including what some of us know as A/B testing.

1. Definition of Probability:

Probability is the measure of how likely an event is to happen.
Expressed as a value between 0 and 1, where 0 indicates impossibility
and 1 indicates certainty, probability quantifies the chance of an
outcome.

2. Sample Space and Events:

Sample Space (S): the set of all possible outcomes of an experiment or
scenario.
Event (E): a subset of the sample space, representing a specific
outcome or a combination of outcomes.

3. Types of Probability:

Classical Probability: based on equally likely outcomes in a sample
space.
Empirical Probability: determined through observation or
experimentation.
Subjective Probability: based on personal judgment or belief.

Probability models underpin many decision-making algorithms.
Understanding the fundamentals of probability equips us with a
powerful tool for reasoning and making informed decisions in the
face of uncertainty. As we explore more advanced concepts and
applications, the core principles outlined here will continue to serve
as the foundation for a deeper understanding of this critical
mathematical discipline.
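
To make the distinction between classical and empirical probability
concrete, here is a minimal Python sketch; the die-rolling scenario
and the variable names are illustrative assumptions, not taken from
the text above.

import random

# Classical probability: six equally likely outcomes in the sample space {1, ..., 6}
p_classical = 1 / 6

# Empirical probability: estimate P(roll == 6) by repeated observation
trials = 100_000
hits = sum(1 for _ in range(trials) if random.randint(1, 6) == 6)
p_empirical = hits / trials

print(f"classical: {p_classical:.4f}  empirical: {p_empirical:.4f}")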

FORECASTING DEMAND
Forecasting demand in the realm of deep learning involves
predicting the need for resources, expertise, and technologies
associated with the rapidly evolving field of artificial intelligence (AI).
As we look ahead, several key factors will shape the demand for
deep learning applications and services:
The basis of decision-making is prediction. Estimation becomes
possible by building a mathematical model that produces output(s) from
inputs, and then using that model in decision making. The first models
were simple and naturally linear, because they answered the problems
of the day: the inputs were added to and/or subtracted from one
another to produce the output. The regression method can therefore be
seen as the problem of passing a straight line through the points in
the data. There are, of course, nonlinear variants as well, in which
multiplication and division are also used to move to more general
functions. In this way we get the chance to pass a curve through the
points at hand and make our estimate correspondingly more accurate.
Using a probability distribution function in this process was
necessary to model the element of chance in the problem.
1. Industry Adoption:
The increasing adoption of deep learning across various industries,
including healthcare, finance, manufacturing, and technology, will
drive demand. Businesses will seek to harness the power of deep
learning for tasks such as image recognition, natural language
processing, and predictive analytics.

2. Technological Advancements:
Ongoing advancements in deep learning technologies, including
improvements in neural network architectures, algorithms, and
hardware accelerators, will fuel demand. As the field progresses,
businesses will seek to leverage the latest innovations to stay
competitive.
3. Data Growth and Complexity:
The ever-growing volume and complexity of data will contribute to
the demand for deep learning solutions. Businesses are likely to
invest in technologies that can extract meaningful insights from large
datasets, driving the need for sophisticated deep learning models.

4. Autonomous Systems and Robotics:


The rise of autonomous systems and robotics, particularly in
industries like autonomous vehicles and logistics, will lead to
increased demand for deep learning algorithms. These systems rely
on deep learning for perception, decision-making, and adapting to
dynamic environments.

5. Customization and Specialization:


As businesses seek more tailored solutions, there will be a growing
demand for specialized deep-learning models. Customized
applications, fine-tuned for specific industries or use cases, will drive
the need for expertise in model development and deployment.

6. Edge Computing Integration:


The integration of deep learning with edge computing devices will
become more prevalent. This trend will lead to a demand for
solutions that can efficiently run deep learning models on edge
devices, enabling real-time processing and decision-making.

7. AI in Healthcare:
The healthcare industry's increasing reliance on AI and deep
learning for tasks like medical imaging analysis, drug discovery, and
personalized medicine will contribute significantly to the demand.
This sector is likely to seek solutions that enhance patient care and
streamline operations.

8. Skill Development and Talent Acquisition:


The demand for skilled professionals in deep learning will continue to
grow. Organizations will invest in training and acquiring talent to build
and maintain robust deep-learning capabilities, addressing the
shortage of skilled professionals in the field.

9. Regulatory Landscape:
Evolving regulatory frameworks and ethical considerations
surrounding AI and deep learning will influence demand.
Organizations will need to comply with regulations while developing
and deploying deep learning solutions, leading to a demand for legal
and ethical expertise in AI.

10. Global Economic Trends:


Economic factors, geopolitical shifts, and global trends will impact
the demand for deep learning solutions. Economic growth,
investment climate, and government policies can influence the pace
of adoption in different regions.

Forecasting demand in deep learning requires a nuanced
understanding of technological advancements, industry dynamics,
and societal factors. As businesses increasingly recognize the
transformative potential of deep learning, the demand for AI-driven
solutions and expertise is poised to experience sustained growth in
the coming years. Organizations that proactively embrace and adapt
to these trends will be well-positioned to navigate the evolving
landscape of deep learning demand.

ARTIFICIAL INTELLIGENCE
In the 1950s, academic circles engaged in artificial intelligence, while
dealing with algorithms for learning and problem solving of a
computer, were pondering on finding solutions without knowing the
distribution function of data at hand. Living things could make
decisions without knowing statistics. During those years two basic
methods came to the fore: artificial neural networks and decision
trees. Neither was based on statistics. In artificial neural networks,
the structure of nerve cells was simulated and a layered network was
formed. In this structure, the layer receiving the data is called the
input layer and the layer producing the result is called the output
layer; the layers hidden between these two are the hidden layers. The
period from the late 1980s to the early 2000s is remembered as the
golden age. At the end of this period the complexity of the systems
increased, and the results could not be improved once the hidden
layers exceeded four or five. In a sense, progress slowed down
economically even if it did not stop. Although decision trees give
good results on certain problems, they have been applied only to a
limited range of problems, because they do not scale well
algorithmically as the data size grows.
Although some advanced nonlinear methods have been produced in
the world of statistics, no general progress has been made which
can be applied to the problems at hand. For advanced methods,
nonlinear methods related to time series can be examined.
Then came a new way of using multi-dimensional spaces. At the point
where Artificial Neural Networks (ANNs) had stalled in the late 1990s,
the so-called Support Vector Machine (SVM) became a promising method
for dealing with the complexity that ANNs could not cope with. In this
method, space structures from mathematics and the functions that allow
transitions between those spaces were used to tame complex problems.
Of course, artificial neural networks also required processing in
multi-dimensional spaces, but there was no transition between spaces.
Alongside these rather involved mathematical methods, kernel
functions, somewhat like the doorways used to pass between dimensions
in science-fiction films, were introduced and applied to solve many
problems. The main reason is that a complex problem in space A, which
requires a nonlinear solution, can be handled as a linear problem in a
space B reached through kernel functions. In this way it became
possible to use linear methods in that space. The reflection of this
in commercial life is that some products developed with SVMs allow the
user to solve complex classification problems. As with artificial
neural networks, however, the method's inability to explain its
decisions to the user was a serious obstacle to its spread.
This was the general picture in the mid-2000s, but especially as the
wave of big data spread, these methods were far from meeting the
need.

Random Matrix Theory


At this very moment, a mathematical method as old as the invention of
artificial neural networks was remembered by a large audience. It was
already well known, having been used in many fields from solid-state
physics to chemistry. This method, called Random Matrix Theory, is
used to model the universality that scientists have discovered in
complex systems. Nobel Prize winner Eugene Wigner is reported to have
said:
"all complex, correlated systems exhibit universality, from a crystal
lattice to the Internet."
DEEP LEARNING
It came to be applied to a wide range of fields, from modelling the
energy levels of atoms to the spectral analysis of chemical substances
and the modelling of clusters on the Internet. During these
applications, the old problem of artificial neural networks failing
once they had more than about five hidden layers was also revisited.
Ultimately this, too, was a complex problem based on matrices. In this
way, successive breakthroughs in speech recognition, natural language
processing and image recognition emerged. The result is called Deep
Learning. It can be said that the name refers to the removal of the
obstacle to increasing the number of hidden layers in artificial
neural networks. Of course, as in the artificial neural network era of
the 1990s, it is possible to raise great expectations and deliver only
small solutions, but what has been achieved so far is very promising.

Central Limit Theorem


Just as the central limit theorem made the normal distribution one of
the basic modelling tools of the scientific world in the early 20th
century, the circular distribution of eigenvalues from random matrix
theory seems to be having a similar effect.

Julia
What is being done in the software world to support these methods,
which require operations on very large matrices? The models are
powered by existing big data technologies, but new software libraries
have also been developed to deal with huge matrices. Even
domain-specific programming languages are being developed. One of
these is Julia, a candidate to become one of the languages whose name
we will hear more and more often.

The age of scientists


It has already been said that the coming era will belong to data
scientists and that science will become more prominent. The best
scientists in this field are currently being snapped up by companies
such as Google, Microsoft and IBM, and the mathematical models they
produce have recently found their way into products that feature voice
and image processing.
To compete in such a world, we are at the beginning of a century in
which scientists working in applied mathematics will collaborate ever
more closely with companies.
FUNDAMENTALS OF STATISTICS
Statistical learning theory is a framework for machine learning that
draws on the fields of statistics and functional analysis. It deals
with the problem of finding a predictive function based on data, and
it has led to applications in areas such as computer vision, speech
recognition, bioinformatics and baseball.
The goals of learning are prediction and understanding. Learning
spans several categories, including supervised learning, unsupervised
learning, online learning and reinforcement learning. From the
viewpoint of statistical learning theory, supervised learning is the
best understood. Supervised learning involves learning from a training
set of data: each element of the training set is an input-output pair,
where the input maps to an output. The learning problem is to infer
the function that maps between the input and the output, so that the
learned function can be used to predict the output for future inputs.
Depending on the type of output, supervised learning problems are
either regression problems or classification problems. If the output
takes a continuous range of values, it is a regression problem. Using
Ohm's Law as an example, a regression could be made with voltage as
the input and current as the output; the regression would discover the
functional relationship between voltage and current.
FUNDAMENTALS OF LINEAR ALGEBRA
A function f, or model as we have called it previously, is a
mechanism that transforms an input datum x into a result y.
A function is characterized by an equation of the type:
y = f(x)
If a datum x consists of a single component or dimension, we speak of
a scalar; if it consists of several components, as follows:
x = (x_1, x_2, ..., x_p)
it is then called a vector.


The set of all the vectors x that can be constructed is called a
space. In a space there exists at least one subset of vectors that can
be combined by addition or scalar multiplication to obtain the whole
space; this subset of vectors is called a basis.
The matrix is the generalization of the concept of vector: it is a
vector of vectors, or a 2-dimensional array. It is possible to add and
multiply matrices with each other, vectors with each other, and
matrices with vectors. Matrices are extremely useful notationally
because they make it possible to write the equations of functions in a
very compact way.
We can define the norm of a vector x as the distance of this vector
from the origin. It is noted as follows:
||x||
In particular, we distinguish:
The norm 2, or Euclidean distance:
||x||_2 = sqrt(x_1^2 + x_2^2 + ... + x_p^2)
The norm 1, or sum of absolute values:
||x||_1 = |x_1| + |x_2| + ... + |x_p|
The notion of the norm is also generalizable to matrices, which we do
not detail here.

Interpolation
Given a history of points, interpolating means determining the
function that passes through all these points. In general, we use
polynomial interpolation. This could serve as a regression method;
however, it is avoided because it very often leads to over-fitting
(over-learning).
Supervised learning with equations
To evaluate the performance of a classifier f, we first define a cost
function L that will give a measure of the error between the
prediction f (x) and the real value y.
In general, the cost function is chosen as the square of the
difference from the result to be predicted; this is the least squares
approximation:
L(f(x), y) = (f(x) - y)^2
Then, we define an error function R, which measures the sum of the
differences between the real value y and the prediction f(x):
R = (1/n) * sum_{i=1..n} L(f(x_i), y_i)
where n is the number of elements in the history.
So, doing a regression means discovering the function f that
minimizes the error:
f* = argmin_f R(f)


How to get the minimum of a function?
It is important to specify that f is a function whose form is known
(for example, a linear function of the type y = a * x + b). To
determine f, it is enough to determine its parameters (in the case of
the linear function, the parameters are a and b). These parameters are
obtained by minimizing the error function R.
In most cases there are exact methods for minimizing R, but they
require computing operations that are very expensive for a machine. In
general it is faster to estimate the parameter values approximately by
performing several trials on candidate values. Since the number of
values the parameters can take is infinite, these trials must be
carried out cleverly. For this, we use optimization methods, the best
known being the gradient method.

Linear regression
Simple regression - x is a scalar
Linear regression is a supervised learning method for estimating f
assuming f has a linear form:
f(x) = a * x + b
Determining f then consists in finding a and b while minimizing the
function R.
As previously explained, it is possible to solve this equation using the
gradient method.
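
As an illustration, here is a short, self-contained Python sketch of
simple linear regression fitted by the gradient method; the toy data,
learning rate and iteration count are assumptions chosen for the
example, not values from the text.

import numpy as np

# Toy data: y is roughly 2*x + 1 plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, size=100)

a, b = 0.0, 0.0   # parameters of f(x) = a*x + b
lr = 0.01         # gradient step size
for _ in range(5000):
    err = a * x + b - y
    # gradients of R = mean((f(x) - y)^2) with respect to a and b
    grad_a = 2 * np.mean(err * x)
    grad_b = 2 * np.mean(err)
    a -= lr * grad_a
    b -= lr * grad_b

print(a, b)  # should end up close to 2 and 1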
Multiple regression - x is a vector
In that case, f takes the form:
f(x) = a_0 + a_1 * x_1 + a_2 * x_2 + ... + a_p * x_p
The optimization problem then becomes that of finding the
coefficients a_0, ..., a_p that minimize R.
Thus, with the vector notation, one can see that it is easy to
generalize most of the methods applied to scalars. However, the
computation time is longer because there are more coefficients to
evaluate (p + 1 coefficients instead of 2).
Multiple regressions, matrix notation
In the literature, these equations are often put in matrix form to
obtain a more condensed notation.
Let X be the matrix of the n vectors x and Y the matrix of the n
vectors y. Then A becomes a matrix of coefficients, and we can write:
Y ≈ X A
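
As a hedged illustration of this matrix form, the following NumPy
sketch solves the least-squares problem Y ≈ X A directly; the
generated data and the true coefficients are assumptions made for the
example.

import numpy as np

# X: n x p matrix of inputs, Y: vector of n targets
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
true_A = np.array([1.5, -2.0, 0.5])
Y = X @ true_A + rng.normal(scale=0.1, size=200)

# np.linalg.lstsq minimizes ||Y - X A||^2 over A
A, residuals, rank, _ = np.linalg.lstsq(X, Y, rcond=None)
print(A)  # close to [1.5, -2.0, 0.5]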

A brief aside on the possibilities of regression


Nonlinear regressions
There are many regression models more complex than linear
regression; we must choose the right type of regression for the right
problem: polynomial regression, splines, time series, logistic
regression, kernel regression.
Dimension reduction
As with PCA in unsupervised learning, it is possible to use
regression methods for dimension reduction.
When the number of characteristics is large, it is interesting to
determine which characteristics influence the result. For this, we use
regularization methods. The principle of regularization is to correct
the error R with a term that depends on the parameters, pushing many
of them toward 0.
One of the regularization strategies is referred to as Lasso
regression:
R_lasso = R + lambda * (|a_1| + |a_2| + ... + |a_p|)
This method reduces the number of dimensions by cancelling many of
the coefficients in A. It should be noted that the parameter lambda
has to be determined empirically.
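
For a concrete feel of this, here is a brief scikit-learn sketch of
Lasso regression; the synthetic data, the choice of alpha (the
regularization strength) and the variable names are assumptions for
illustration only.

import numpy as np
from sklearn.linear_model import Lasso

# 10 input dimensions, but only two of them actually influence y
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 10))
y = 3.0 * X[:, 0] - 1.5 * X[:, 4] + rng.normal(scale=0.1, size=300)

# alpha plays the role of the regularization parameter (chosen empirically)
model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_)  # most coefficients are driven to exactly 0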
This concludes our explanation of regressions, and with it this
overview of supervised learning.

MACHINE LEARNING AND DEEP LEARNING


Deep learning is a subset of machine learning. Machine learning is a
subset of artificial intelligence. In other words, all deep learning
algorithms are machine learning algorithms, but many machine learning
algorithms do not use deep learning.
Deep learning refers specifically to a class of algorithms called
neural networks, and technically only to "deep" neural networks (more
on that in a second). The first neural network was developed in 1949,
but it was not very useful; from the 1970s until around 2010,
traditional forms of AI would consistently outperform
neural-network-based models.
These non-learning AI types include rule-based algorithms (think of an
extremely complex array of if/else blocks); heuristic-based AIs such
as A* search; constraint-satisfaction algorithms such as arc
consistency; tree-search algorithms such as minimax (used by the
famous Deep Blue chess AI); and more.
Two things held machine learning, and especially deep learning, back
from success: the lack of large data sets and the lack of
computational power. In 2018 there is a surplus of data, and anyone
with an AWS account and a credit card can access a distributed
supercomputer. Thanks to this new availability of data and computing
power, machine learning, and especially deep learning, has taken the
AI world by storm.
You should know that there are other categories of machine learning,
such as unsupervised learning and reinforcement learning, but here you
will learn about a subset of machine learning called supervised
learning.
Supervised learning algorithms work by forcing the machine to guess
repeatedly. Specifically, we (the humans) have it make predictions
about data for which we already know the right answer. This is called
labeled data: the label is whatever we want the machine to predict.
Here's an example: let's say we wanted to set up an algorithm to
predict whether or not an individual will default on their mortgage.
We gather a group of people, some of whom defaulted on their mortgages
and some of whom did not. We take the data about these people, feed it
into the machine learning algorithm, ask it to make a guess for each
person, and once it has predicted, we tell it what the answer actually
is. Right or wrong, the machine learning algorithm adjusts how it
makes its predictions.
We repeat this process many times, and through the miracle of
mathematics our machine's predictions get better. Because the
improvement is relatively slow, we need a lot of data to train these
algorithms.
Machine learning algorithms such as linear regression, support
vector machines and decision trees all "learn" in different ways, but
they apply the same process: make a prediction, receive a correction,
and adjust the prediction mechanism based on that correction. At a
high level, it is much like how a person learns.
Recall that deep learning is a subset of machine learning that
focuses on a specific class of machine learning algorithms called
neural networks. Neural networks were originally inspired by the way
human intelligence works: individual "neurons" receive "signals" from
other neurons and in turn send "signals" to other neurons. Each neuron
transforms the incoming "signals" in some way and eventually produces
an output signal. If everything went well, that signal represents the
correct prediction!
This is a useful mental model, but computers are not biological
brains. They have no neurons, synapses or the other biological
machinery of a brain. Because the biological model breaks down,
researchers and scientists use graph theory to model artificial neural
networks, describing them not as "artificial brains" but as complex
graphs with powerful properties.
Through the lens of graph theory, a neural network is a series of
layers of connected nodes; each node represents a "neuron" and each
connection represents a "synapse".
Different types of networks have different connections. The simplest
kind of deep learning model is a deep neural network: a graph made of
a series of fully connected layers. Each node in a given layer has an
edge to every node in the next layer, and each of these edges is
assigned its own weight. The whole sequence of layers is the "brain":
if the weights on all these edges are set correctly, the network
produces good predictions.
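
To make the graph picture concrete, here is a minimal NumPy sketch of
a forward pass through a small fully connected network; the layer
sizes, the ReLU activation and the random weights are illustrative
assumptions, not details from the text.

import numpy as np

def relu(z):
    return np.maximum(0, z)

# A tiny fully connected network: 3 inputs -> 4 hidden nodes -> 2 outputs.
# Each weight matrix holds the edge weights between two layers.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 2)), np.zeros(2)

x = np.array([0.5, -1.2, 3.0])   # input signals
hidden = relu(x @ W1 + b1)       # each hidden node sums its weighted inputs
output = hidden @ W2 + b2        # the output layer does the same
print(output)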
Studying deep learning is largely about how to create different
versions of these graphs, how to adjust the connection weights until
the system works, and how to convince ourselves the machine is doing
something like thinking. The mechanics that make deep learning work,
such as gradient descent and backpropagation, combine ideas from
several mathematical disciplines, so understanding neural networks
requires some math background.

Background Information - A Bit of Everything:


Given how easy libraries like PyTorch and TensorFlow make things, you
don't need a lot of math: a subset of topics from linear algebra,
calculus, probability, statistics and graph theory is enough.
Getting this background at a university would take about five courses:
calculus 1, 2 and 3, linear algebra, and computer science 101.
Fortunately, you don't need all of that material. Based on what I've
seen so far, if you want to get into neural networks yourself, I
recommend the following.
From linear algebra, you need to know the dot product and the matrix
product (in particular, the rules for multiplying matrices of
different dimensions). You don't have to do these things quickly by
hand, but you should be comfortable enough to work through small
examples on a whiteboard or on paper.
You should also feel comfortable working with multidimensional spaces;
deep learning uses high-dimensional vectors all the time.

FUNDAMENTALS OF MACHINE LEARNING


Computer science has changed a lot in recent years, and the concept
of artificial intelligence now comes up frequently. Artificial
intelligence is an effort to build a machine that learns from its
environment, from mistakes and from people. Although there are many
different branches in this field (pattern recognition, artificial
neural networks, reinforcement learning, statistical inference,
probabilistic machine learning, supervised and unsupervised learning),
we still do not know the right way. Which technique will lead us to a
better world?
Probably no single approach will solve each and every problem. How we
produce intelligent systems from different techniques and combinations
of them also depends on trends in technology in general. Not many
years ago our computers were not fast; perhaps we will say the same
again in the future. Spreading a specific problem across a massive
number of machines (as in blockchain-style systems) is a very
important strategy, and it is accelerating research. What was
considered a dream 20-30 years ago is part of everyday life today.
Examples: ABS (anti-lock brakes), autopilot for aircraft, Google
search suggestions, Maps, and email spam detection.
It is very difficult to predict the long-term future. Nobody can do that.

Troubleshooting
Machine learning research is intellectual discovery. We have
witnessed a huge acceleration in the last five years compared with the
previous 10-20 years. Machine learning feeds on examples: instead of
writing five hundred thousand lines of code, the system is allowed to
learn by observing the world. Google succeeded at image detection; it
finds the elements of a picture, and when it sees a cake and a boy it
knows it is a birthday. Today we can chat with Google. For example,
when we ask "Is there a Mexican restaurant in Istanbul?", we can see
the top 10 Mexican restaurants.

The business world


Automating work with machine learning makes it much more efficient.
The companies that benefit most from the blessings of artificial
intelligence are knowledge-based companies. Machines can meet the
needs of almost every sector in areas such as fraud detection, data
sampling, data mining, workplace safety and translation. Say you have
a task that has been done hundreds of times and records that show how
it was done: you can use machines to do that work. Machine learning is
perfect for automating existing processes or making them more
efficient.
FUNDAMENTALS OF NEURAL NETWORKS AND

DEEP LEARNING
Neural networks are the next big thing when it comes to heavy
computation and smart algorithms. Here is how they work and why they
are so amazing.
If you follow technology news, you have probably encountered the
concept of neural networks (also called artificial neural networks).
In 2016, Google's AlphaGo neural network beat one of the best
professional Go players in the world in a 4-1 series. YouTube has also
announced that it will use neural networks to better understand its
videos. Searching on YouTube can be frustrating, because YouTube does
not see a video the same way a person does; a recently filed Google
patent could change that. Dozens of other stories could be mentioned.
But what exactly is a neural network? How does it work? And why is it
so popular in machine learning?

A COMPUTER LIKE A BRAIN


Modern neuroscientists often talk about the brain as a kind of
computer. Neural networks attempt to do the opposite: build a computer
that functions like a brain.
Of course, we have only a cursory understanding of the brain's
extremely complex functions, but by creating a simplified simulation
of how the brain processes data, we can build a type of computer that
works very differently from a regular one.
Computer processors handle information serially ("in order"): they
perform many operations on a set of data, one at a time. Parallel
processing ("processing several streams at once") speeds this up
considerably by using multiple processors in tandem.
An artificial neural network (so called to distinguish it from the
actual neural networks in the brain) has a fundamentally different
structure: it is massively interconnected. This makes it possible to
process data very quickly, learn from that data, and update its
internal structure to improve performance.
The high level of interconnection, however, has some astonishing
effects. For example, neural networks are very good at recognizing
subtle patterns in data.

THE ABILITY TO LEARN


The ability of a neural network to learn is its greatest strength.
With traditional computing architectures, a programmer must develop an
algorithm that tells the computer what to do with incoming data to
ensure that it outputs the correct answer.
An input-output response may be as simple as "when the A key is
pressed, 'A' appears on the screen," or as complex as elaborate
statistical analysis. Neural networks, on the other hand, do not need
that kind of algorithm: through their learning mechanisms they can
essentially design algorithms of their own. You may not realize it,
but machine learning is already all around you, and it can exert a
surprising degree of influence over your life.
It is important to note that since neural networks are programs
written on machines that use standard serial processing hardware,
current technology still sets limits. Building a hardware variant of a
neural network is another problem altogether.

From Neurons to Nodes


Now that we have laid the foundation for how neural networks work,
we can start looking at some of the details. The basic construction of
an artificial neural network looks like this:

Each of the circles is called a "node", and it simulates a single
neuron. On the left are input nodes, in the middle are hidden nodes,
and on the right are output nodes.
In very basic terms, the input nodes accept input values, which may be
a binary 1 or 0, part of an RGB colour value, the status of a chess
piece or something else. These nodes represent the information flowing
into the network.
Each input node is linked to several hidden nodes (sometimes to every
hidden node, sometimes to a subset). Input nodes take the information
they are given and pass it on to the hidden layer.
For example, an input node may send a signal ("fire", in the language
of neurology) if it receives a 1 and remain dormant if it receives a
zero. Each hidden node has a threshold value: if its summed inputs
reach a certain value, it fires.
From synapses to connections
Each connection, corresponding to an anatomical synapse, is also
given a specific weight, which allows the network to place more
emphasis on the action of a particular node. Here's an example:

As you can see, the weight of connection B is higher than that of
connections A and C. Let us say that hidden node 4 will only fire if
it receives a total input of 2 or higher. This means that if 1 or 3
fires on its own, node 4 will not be triggered, but 1 and 3 together
would trigger it. Node 2 can also trigger the node on its own via
connection B.
Let's take the weather as a practical example. Say you design a
simple neural network to determine whether a winter storm warning
should be issued.
Using the connections and weights above, node 4 can fire only if the
temperature is below 0 F and the wind is over 30 MPH, or it will fire
if there is more than a 70 per cent chance of snow. The temperature is
fed into node 1, the wind into node 3 and the probability of snow into
node 2. Node 4 can now take all of these into account when deciding
which signal to send to the output layer.
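
The tiny Python sketch below plays out this winter-storm node with
made-up thresholds and weights; the function name and the exact
numbers beyond those stated above are assumptions for illustration.

# Toy version of the winter-storm node described above
def node_fires(temperature_f, wind_mph, snow_chance):
    # Input nodes fire (send 1) when their condition is met
    node1 = 1 if temperature_f < 0 else 0     # temperature input
    node3 = 1 if wind_mph > 30 else 0         # wind input
    node2 = 1 if snow_chance > 0.70 else 0    # chance-of-snow input

    # Connection weights: B (from node 2) is heavier than A and C
    weight_a, weight_b, weight_c = 1.0, 2.0, 1.0
    total = node1 * weight_a + node2 * weight_b + node3 * weight_c

    # Hidden node 4 fires only if its summed input reaches the threshold of 2
    return total >= 2

print(node_fires(temperature_f=-5, wind_mph=35, snow_chance=0.10))  # True
print(node_fires(temperature_f=20, wind_mph=10, snow_chance=0.80))  # True
print(node_fires(temperature_f=20, wind_mph=40, snow_chance=0.10))  # False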
Better than simple logic
Of course, this particular function could easily be implemented with
simple logic gates. But more complex neural networks, such as the one
below, can carry out much more complex operations.

Output nodes work in the same way as hidden nodes: they sum the input
from the hidden layer and, if the total reaches a certain value, they
fire and send specific signals onward. At the end of the process, the
output layer sends out a set of signals indicating the result of the
input.
While the network shown above is simple, deep neural networks can have
many hidden layers and hundreds or thousands of nodes.
Picture credit: Neural networks and deep learning by

HOW TO DO ERROR CORRECTION


The process so far is relatively simple, but where neural networks
shine is in learning. Most neural networks use a process called
backpropagation, which sends signals backward through the network.
Before programmers deploy a neural network, they run it through a
training phase in which it receives a set of inputs with known
results. For example, a programmer can teach a neural network to
recognize images. The input might be a picture of a car, and the
correct output would be the word "car".
The programmer gives the image as input and watches what comes out of
the output nodes. If the network responds with "aircraft", the
programmer tells it that the answer is incorrect.
The network then makes adjustments to its connections, changing the
weights of the different links between nodes. This action is guided by
a specific learning algorithm added to the network. The network keeps
adjusting its connection weights until it produces the correct
output.
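
The sketch below is a deliberately stripped-down Python version of
this guess-correct-adjust loop, using a single artificial neuron and a
simple weight-update rule rather than full backpropagation; the data
and learning rate are invented for the illustration.

import numpy as np

# Labeled training data: the "known results" of the training phase
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
labels = (X[:, 0] + 2 * X[:, 1] > 0).astype(float)

w = np.zeros(2)   # connection weights, initially uninformative
lr = 0.1          # how strongly each correction adjusts the weights
for _ in range(200):
    for x, y_true in zip(X, labels):
        guess = 1.0 if x @ w > 0 else 0.0   # the network's prediction
        error = y_true - guess              # the programmer's correction
        w += lr * error * x                 # adjust the connection weights

print(w)  # the weights now roughly encode the decision rule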
This is a simplification, but neural networks can perform very
complicated operations using similar principles.

CONTINUOUS IMPROVEMENT TO LEARN


Even after training, backpropagation continues, and this is where
neural networks get really interesting. They keep learning as they are
used, integrating new information and tweaking the weights of
different connections, becoming ever more efficient and effective at
the task they were designed for.
This can be as simple as image recognition or as complex as playing
Go.
In this way, neural networks are always changing and improving. And
this can have surprising effects, resulting in networks that
prioritize things a programmer would not have prioritized.
In addition to the process described above, which is called supervised
learning, there is another method: unsupervised learning.
In this setting, neural networks take an input and try to recreate it
exactly in their output, using backpropagation to update their
connections. This may sound like a fruitless exercise, but in this way
networks learn to extract useful features and generalize them to
improve their models.
DEEP TROUBLE
Backpropagation is a very productive way to teach neural networks...
when they are just a few layers deep. As the number of hidden layers
rises, the effectiveness of backpropagation decreases. This is a
problem for deep networks: with backpropagation alone they are often
no more effective than simple networks.
Researchers have come up with several solutions to this problem, the
specifics of which are quite complex and beyond the scope of this
introductory part. What many of these solutions try to do, simply put,
is reduce the network's complexity by training it to "compress" the
data.

To do this, the network learns to extract a smaller number of
identifying features from the input, eventually becoming more
efficient in its calculations. The network makes generalizations and
abstractions, much the way people learn.
After this learning, the network can prune nodes and connections that
it considers less important. This makes the network more efficient,
and learning becomes easier.
NEURAL NETWORK APPLICATIONS
So neural networks simulate how the brain learns by using multiple
layers of nodes (input, hidden and output), and they can learn in both
supervised and unsupervised settings. Complex networks can form
abstractions and generalize, making them more capable and better able
to learn.
WHAT CAN WE USE THESE SYSTEMS FOR?
In theory, we can apply neural networks to almost anything. And you
have certainly used them without realizing it. They are very common in
speech and image recognition, for example, because they can learn to
pick out the specific characteristics that sounds or images have in
common.
So when you ask Siri where the nearest gas station is, your iPhone
runs your speech through a neural network to figure out what you are
saying. There may also be a separate neural network that learns to
predict the range of things you are likely to request.

Self-driving vehicles can use neural networks to process visual data,
allowing them to follow road rules and avoid collisions. Robots of all
kinds can benefit from neural networks that help them learn to
complete tasks effectively. Computers can learn to play games like
chess, Go and Atari classics. If you have ever talked to a chatbot,
there is a chance it was a neural network providing the answers.
Internet search can benefit from neural networks because the highly
parallel processing model can churn through a lot of data quickly. A
neural network can also learn your habits in order to customise your
search results or predict what you are about to search for. This kind
of prediction model is very valuable to marketers (and to anyone else
who has to predict complex human behaviour).
Image recognition, optical character recognition, stock market
prediction, route finding, big data processing, medical cost analysis,
sales forecasting, video game AI... the possibilities are almost
endless. The ability of neural networks to learn patterns, make
generalizations and successfully predict behaviour makes them valuable
in countless situations.

THE FUTURE OF NEURAL NETWORKS


Neural networks have advanced from simple models to very complex
learning simulations. They are in our phones and tablets, and they run
many of the web services we use.
But neural networks, because of their similarity (in a much simplified
way) to the human brain, are among the most fascinating of these
technologies. As we continue to develop and refine the models, there
is no telling what they will be able to do.
DEEP LEARNING PARAMETERS AND HYPER-
PARAMETERS
WHAT ARE HYPER PARAMETERS?
The hyper parameters are adjustable parameters that are chosen to
train a model and that govern the training process itself. For
example, to train a deep neural network, you must decide the
number of hidden layers in the network and the number of nodes in
each layer before training the model. These values usually remain
constant during the training process.
In deep learning or machine learning scenarios, model performance
depends to a large extent on the hyperparameter values selected. The
goal of hyperparameter exploration is to search across different
hyperparameter configurations until you find the one that results in
optimal performance. Typically, exploring hyperparameters is laborious
manual work, since the search space is very large and evaluating each
configuration can be expensive.
Azure Machine Learning lets you automate hyperparameter sweeping
efficiently, saving you considerable time and resources. You specify
the range of hyperparameter values and a maximum number of training
runs. The system then automatically launches several simultaneous runs
with different parameter configurations and looks for the
configuration that yields the best performance, as measured by the
metric you have chosen. Low-performing training runs are automatically
terminated early, reducing wasted compute resources; those resources
are used instead to explore other hyperparameter configurations.

Definition of the search space


The hyperparameters are tuned automatically by sweeping the range of
values defined for each hyperparameter.
TYPES OF HYPERPARAMETERS
Each hyper parameter can be discrete or continuous.
Discrete hyper parameters
Discrete hyperparameters are specified with a choice among discrete
values. choice can be:

one or more comma-separated values;
a range object;
any arbitrary list object.
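
As a sketch of what such a definition might look like, assuming the
legacy Azure Machine Learning Python SDK (v1) and its choice helper;
the two parameter names below are the ones discussed next.

from azureml.train.hyperdrive import choice

param_space = {
    "batch_size": choice(16, 32, 64, 128),
    "number_of_hidden_layers": choice(range(1, 5)),
}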

In this case, batch_size takes one of the values [16, 32, 64, 128]
and number_of_hidden_layers takes one of the values [1, 2, 3, 4].
Advanced discrete hyperparameters can also be specified through a
distribution. The following distributions are supported:

quniform(low, high, q): returns a value like
round(uniform(low, high) / q) * q
qloguniform(low, high, q): returns a value like
round(exp(uniform(low, high)) / q) * q
qnormal(mu, sigma, q): returns a value like
round(normal(mu, sigma) / q) * q
qlognormal(mu, sigma, q): returns a value like
round(exp(normal(mu, sigma)) / q) * q

CONTINUOUS HYPERPARAMETERS
Continuous hyperparameters are specified as a distribution over a
continuous range of values. The supported distributions are:

uniform(low, high): returns a value uniformly distributed between low
and high.
loguniform(low, high): returns a value drawn according to
exp(uniform(low, high)), so that the logarithm of the returned value
is uniformly distributed.
normal(mu, sigma): returns a real value that is normally distributed
with mean mu and standard deviation sigma.
lognormal(mu, sigma): returns a value drawn according to
exp(normal(mu, sigma)), so that the logarithm of the returned value is
normally distributed.
The following is an example of parameter space definition:
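
Here is a minimal sketch of such a definition, assuming the legacy
Azure ML SDK v1 helpers and random sampling (described in the next
section); the two parameter names match the description that follows.

from azureml.train.hyperdrive import RandomParameterSampling, normal, uniform

param_sampling = RandomParameterSampling({
    "learning_rate": normal(10, 3),          # normal distribution, mu=10, sigma=3
    "keep_probability": uniform(0.05, 0.1),  # uniform between 0.05 and 0.1
})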

This code describes a search space with two parameters: learning_rate
and keep_probability. learning_rate has a normal distribution with a
mean value of 10 and a standard deviation of 3; keep_probability has a
uniform distribution with a minimum value of 0.05 and a maximum value
of 0.1.

Sampling the hyper parameter space


You also specify the parameter sampling method to use over the
hyperparameter space when defining it. The Azure Machine Learning
service supports random sampling, grid sampling and Bayesian
sampling.

Random sampling
In random sampling, hyperparameter values are selected at random from
the specified search space. Random sampling allows the search space to
include both discrete and continuous hyperparameters.

Grid sampling
Grid sampling performs a simple grid search over all possible values
in the defined search space. It can only be used with hyperparameters
specified with choice. For example, the following space has six
samples in total:
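
A hedged sketch of such a grid, again assuming the legacy Azure ML
SDK v1; the two parameters and their values are illustrative and give
3 x 2 = 6 configurations.

from azureml.train.hyperdrive import GridParameterSampling, choice

param_sampling = GridParameterSampling({
    "number_of_hidden_layers": choice(1, 2, 3),
    "batch_size": choice(16, 32),
})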

Bayesian sampling
Bayesian sampling is based on the Bayesian optimization algorithm and
makes intelligent choices about which hyperparameter values to sample
next. A sample is chosen based on how previous samples performed, so
that each new sample improves the reported primary metric.
When Bayesian sampling is used, the number of concurrent runs affects
the effectiveness of the tuning process. Normally, a smaller number of
concurrent runs can lead to better sampling convergence, since the
lower degree of parallelism increases the number of runs that benefit
from previously completed runs.
Bayesian sampling supports only choice and uniform distributions in
the search space.

Note
Bayesian sampling does not support any early termination policy (see
the specification of an early termination directive). When using
Bayesian parameter sampling, set early_termination_policy = None or
leave out the early_termination_policy parameter.

Specification of the main metric


Specify the main metric that you want the hyperparameter tuning
experiment to optimize. The main metric is evaluated in each training
run, and runs with poor performance (where the main metric does not
meet the criteria set by the early termination policy) are terminated.
In addition to the name of the main metric, you also specify the goal
of the optimization: whether to maximize or minimize the main metric.

primary_metric_name: name of the main metric to optimize. The name of
the main metric must exactly match the name of the metric recorded by
the training script. See Register metrics for hyperparameter
adjustment.
primary_metric_goal: it can be PrimaryMetricGoal.MAXIMIZE or
PrimaryMetricGoal.MINIMIZE, and it determines whether the main metric
will be maximized or minimized when evaluating the runs.
For example, to optimize the runs to maximize "accuracy", be sure to
log that value in the training script.
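
A minimal sketch of how these two settings might look when passed to
the tuning configuration (see HyperDriveConfig later in this chapter),
assuming the legacy Azure ML SDK v1; the metric name "accuracy" is
illustrative and must match whatever the training script logs.

from azureml.train.hyperdrive import PrimaryMetricGoal

primary_metric_name = "accuracy"
primary_metric_goal = PrimaryMetricGoal.MAXIMIZE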

Register metrics for the adjustment of hyperparameters


The training script for the model must record the relevant metrics
during model training. When configuring the hyperparameters
setting, you must specify the main metric that will be used to
evaluate the performance of the executions. (See specification of the
main metric to optimize). In the training script, you must record this
metric so that it is available for the hyper-parameter adjustment
process.
Record the metrics of the training script using the following sample
code snippet:
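
The snippet below is a hedged reconstruction of such a logging call,
assuming the legacy Azure ML SDK v1 Run API; the placeholder value of
val_accuracy stands in for whatever the real training loop computes.

from azureml.core.run import Run

run = Run.get_context()
val_accuracy = 0.93                      # placeholder: computed by the training loop
run.log("accuracy", float(val_accuracy)) # name must match the configured main metric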

The training script calculates val_accuracy and logs it as
"accuracy", which is used as the main metric. Each time the metric is
logged, the hyperparameter tuning service receives it. It is up to the
developer of the model to determine how often this metric is
reported.

Specification of an early termination directive


Low-performing runs can be ended automatically with the help of an
early termination policy. Early termination reduces wasted resources,
which are instead used to explore other parameter configurations.
When using an early termination policy, you can configure the
following parameters, which control when the policy is applied:

evaluation_interval: how often the policy is applied. Each time the
training script logs the main metric counts as one interval. An
evaluation_interval of 1 therefore applies the policy every time the
training script reports the main metric, while an evaluation_interval
of 2 applies it every other time. If not specified, evaluation_interval
defaults to 1.
delay_evaluation: delays the first evaluation of the policy by a
specified number of intervals. It is an optional parameter that lets
all configurations run for an initial minimum number of intervals,
which prevents the premature termination of training runs. If
specified, the policy is applied at every multiple of
evaluation_interval that is greater than or equal to delay_evaluation.
The Azure Machine Learning service supports the following early
termination directives.

Bandit Directive
Bandit is a termination policy based on a slack factor or slack
amount plus an evaluation interval. The policy terminates early any
runs whose main metric is not within the specified slack factor or
slack amount of the best-performing training run. It takes the
following configuration parameters:
slack_factor or slack_amount: the slack allowed with respect to the
best-performing training run. slack_factor specifies the allowed slack
as a ratio; slack_amount specifies it as an absolute amount instead of
a ratio.
For example, suppose a Bandit policy is applied at interval 10, and
the best-performing run at interval 10 reports a main metric of 0.8,
with the goal of maximizing it. If the policy was specified with a
slack_factor of 0.2, any training run whose best metric at interval 10
is less than 0.66 (0.8 / (1 + slack_factor)) will be terminated. If,
instead, the policy was specified with a slack_amount of 0.2, any
training run whose best metric at interval 10 is less than 0.6
(0.8 - slack_amount) will be terminated.

evaluation_interval: the frequency with which the policy is applied
(optional parameter).
delay_evaluation: delays the first evaluation of the policy by a
specified number of intervals (optional parameter).
from azureml.train.hyperdrive import BanditPolicy
early_termination_policy = BanditPolicy(slack_factor = 0.1,
evaluation_interval=1, delay_evaluation=5)
In this example, the early termination policy is applied at every
interval at which metrics are reported, starting at evaluation
interval 5. Any run whose best metric is less than 1 / (1 + 0.1), that
is, roughly 91%, of the best-performing run's metric will be
terminated.

MEDIAN STOPPING DIRECTIVE


Median stopping is an early termination policy based on running
averages of the main metric reported by the runs. This policy
calculates running averages across all training runs and terminates
the runs whose performance is worse than the median of those running
averages. It takes the following configuration parameters:

evaluation_interval: the frequency with which the policy is applied
(optional parameter).
delay_evaluation: delays the first evaluation of the policy by a
specified number of intervals (optional parameter).
from azureml.train.hyperdrive import MedianStoppingPolicy
early_termination_policy =
MedianStoppingPolicy(evaluation_interval=1, delay_evaluation=5)

In this example, the early termination policy is applied at every
interval, starting at evaluation interval 5. A run is terminated at
interval 5 if its best primary metric is worse than the median of the
running averages over intervals 1 to 5 across all training runs.

Truncation selection directive


Truncation selection cancels a given percentage of the
lowest-performing runs at each evaluation interval. Runs are compared
based on their performance on the main metric, and the lowest X% are
terminated. This policy takes the following configuration parameters:

truncation_percentage: the percentage of lowest-performing runs to
terminate at each evaluation interval. Specify an integer between 1
and 99.
evaluation_interval: the frequency with which the policy is applied
(optional parameter).
delay_evaluation: delays the first evaluation of the policy by a
specified number of intervals (optional parameter).

from azureml.train.hyperdrive import TruncationSelectionPolicy


early_termination_policy =
TruncationSelectionPolicy(evaluation_interval=1,
truncation_percentage=20, delay_evaluation=5)

In this example, the early termination policy is applied at every
interval, starting at evaluation interval 5. A run is terminated at
interval 5 if its performance at that interval falls within the lowest
20% of all runs at interval 5.
Without termination directive
If you want all training runs to run to completion, set the policy to
None. The effect is that no early termination policy is applied.

Policy=None
Default policy
If no policy is specified, the hyperparameter tuning service will
allow all training runs to execute to completion.
Note
If you are looking for a conservative policy that provides savings
without terminating promising jobs, you can use a median stopping
policy with evaluation_interval 1 and delay_evaluation 5. This is a
conservative configuration that can provide savings of between 25%
and 35% with no loss on the main metric (according to our evaluation
data).

Resource allocation
Control the resource budget for the hyperparameter adjustment
experiment by specifying the maximum total number of training runs.
Optionally, specify the maximum duration for the hyperparameter
adjustment experiment.
max_total_runs: maximum total number of training runs that will be created. This is an upper limit; for example, there may be fewer runs if the hyperparameter space is finite and has fewer samples. It must be a number between 1 and 1000.
max_duration_minutes: maximum duration in minutes of the hyperparameter adjustment experiment. This parameter is optional and, if present, any runs still executing after this duration are automatically cancelled.
Note
If both max_total_runs and max_duration_minutes are specified, the hyperparameter adjustment experiment ends when the first of these two thresholds is reached.
Also, specify the maximum number of training series that will be
executed at the same time during the hyperparameter adjustment
search.
max_concurrent_runs: maximum number of runs that can execute simultaneously at any given time. If not specified, all max_total_runs executions will start in parallel. If specified, the value must be a number between 1 and 100.
Note
The number of simultaneous executions is determined by the
resources available in the specified process destination. Therefore,
you must ensure that the process destination has the resources
available for the desired simultaneity.
Assign resources for the adjustment of hyperparameters:

max_total_runs=20,
max_concurrent_runs=4
In this code, an adjustment experiment of hyperparameters is
configured to use a maximum of 20 total executions, in such a way
that 4 configurations are executed at the same time.
HOW TO MAKE CONFIGURATION OF THE

EXPERIMENT
Configure the hyperparameter adjustment experiment using the
hyperparameter search space defined, the early termination
directives, the main metrics, and the allocation of resources that
have already been seen in the previous sections. Also, provide an estimator that will be invoked with the sampled hyperparameters. The estimator describes the training script to be executed, the resources per job (one or more GPUs) and the process destination to be used. Since the available resources determine the concurrency of the hyperparameter adjustment experiment, make sure that the process destination specified in the estimator has sufficient resources for the desired concurrency. (For more information on estimators, see how to train models.)
Configure the hyperparameter adjustment experiment:
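For illustration, a configuration along these lines can be written with the azureml-sdk HyperDriveConfig class. The search space, the metric name and the estimator variable below are placeholders of mine, and exact argument names may differ between SDK versions:

from azureml.train.hyperdrive import (HyperDriveConfig, RandomParameterSampling,
                                      BanditPolicy, PrimaryMetricGoal, uniform, choice)

# 'estimator' is assumed to be an azureml Estimator created earlier for the training script.
param_sampling = RandomParameterSampling({
    "learning_rate": uniform(0.0001, 0.1),   # illustrative hyperparameters
    "batch_size": choice(16, 32, 64),
})

hyperdrive_config = HyperDriveConfig(
    estimator=estimator,                                   # training script + compute target
    hyperparameter_sampling=param_sampling,                # search space defined above
    policy=BanditPolicy(slack_factor=0.1, evaluation_interval=1, delay_evaluation=5),
    primary_metric_name="accuracy",                        # metric logged by the training script
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=20,
    max_concurrent_runs=4)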

Sending the experiment


After defining the hyperparameter adjustment settings, send an
experiment:
experiment_name is the name you want to assign to your hyperparameter adjustment experiment, and workspace is the workspace in which you want to create the experiment (for more information about experiments, see How does the Azure Machine Learning service work?).
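As a sketch, assuming the azureml-sdk Experiment class and the hyperdrive_config object from the previous step:

from azureml.core.experiment import Experiment

# 'workspace', 'experiment_name' and 'hyperdrive_config' come from the earlier steps
experiment = Experiment(workspace, experiment_name)
hyperdrive_run = experiment.submit(hyperdrive_config)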
Display of the experiment
The Azure Machine Learning SDK provides a Notebook widget that
displays the progress of the training series. The following code
fragment displays all the hyperparameter adjustment runs in one
place, a Jupyter Notebook:
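A sketch of such a call, assuming the azureml-sdk RunDetails widget and the hyperdrive_run handle returned by experiment.submit():

from azureml.widgets import RunDetails

# Display an interactive table and charts of all child runs inside the notebook
RunDetails(hyperdrive_run).show()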

In this code, a table with details about the training series of each of
the hyperparameter configurations is shown.
You can also observe the performance of each of the executions as
the training progresses.

Also, you can visually identify the correlation between performance


and values of individual hyper parameters through a parallel
coordinate plot.
Identification of the best model
Once all the hyperparameter adjustment series have been
completed, identify the configuration with the best performance and
the corresponding hyperparameter values:
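A sketch of this step, assuming the hyperdrive_run handle from earlier; the exact structure returned by get_details() may vary between SDK versions:

# Retrieve the best-performing child run once the sweep has finished
best_run = hyperdrive_run.get_best_run_by_primary_metric()
best_run_metrics = best_run.get_metrics()
parameter_values = best_run.get_details()['runDefinition']['arguments']

print('Best run id:', best_run.id)
print('Metrics:', best_run_metrics)
print('Hyperparameters:', parameter_values)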
DEEP NEURAL NETWORKS LAYERS
A deep neural network (DNN) is an artificial neural network (ANN) with several hidden layers between the input and output layers. As with shallow ANNs, DNNs can model complex non-linear relationships.
The foremost goal of a neural network is to receive a set of inputs, perform progressively complex calculations on them and produce an output that solves real-world problems such as classification. We simply feed data to the neural network.
We have sequential data entry, exit and flow in a deep network.

Neural networks are generally used in supervised learning and reinforcement learning problems. These networks are based on a set of interconnected layers.
In deep learning, the number of hidden layers, mostly non-linear, can be large; say about 1000 layers.
We mainly use the gradient descent method to optimize the network
and minimize the loss function.
We can use ImageNet, a repository of millions of digital images to
classify a set of data like cats and dogs. DL networks are
increasingly used for dynamic images, apart from static images, and
for time series and text analysis.
Training data sets are a vital part of deep learning models, and backpropagation is the essential algorithm for training DL models.
DL deals with training large neural networks with complex input and output transformations.
One example of DL is mapping a photo to the name of the person(s) in the photo, as social networks do; describing an image with a phrase is another recent DL application.

DL MAPPING
Neural networks are functions that have inputs like x1, x2, x3, ... that are transformed into outputs like z1, z2, z3, etc. through two (shallow networks) or several intermediate operations, also called layers (deep networks).
The weights and biases change from one layer to another; "w" and "v" are the weights, or synapses, of the layers of the neural networks.
The classic use case of deep learning is the supervised learning problem, where we have a large set of data inputs with a desired set of outputs.
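As a minimal sketch of this input-to-output mapping (the sizes and values here are illustrative, not from the book), the following NumPy code pushes three inputs through one hidden layer with weights W and one output layer with weights v:

import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# Toy example: 3 inputs -> 4 hidden units -> 2 outputs.
# W, v (weights) and b1, b2 (biases) would normally be learned; here they are random.
rng = np.random.default_rng(0)
x = np.array([0.5, -1.2, 3.0])          # inputs x1, x2, x3
W = rng.normal(size=(4, 3))             # weights of the hidden layer
b1 = np.zeros(4)
v = rng.normal(size=(2, 4))             # weights of the output layer
b2 = np.zeros(2)

h = relu(W @ x + b1)                    # intermediate (hidden) layer
z = v @ h + b2                          # outputs z1, z2
print(z)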

Backpropagation algorithm
Here we apply the backward propagation algorithm to obtain the
correct output prediction.
The most basic deep-learning data set is MNIST, a set of handwritten digits.
We can train a deep convolutional neural network with Keras to classify images of handwritten digits from this data set.
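For instance, a small tf.keras model for this task might look as follows (an illustrative sketch, not code from the book):

import tensorflow as tf
from tensorflow.keras import layers, models

# Load the MNIST handwritten-digit data set (28x28 grayscale images, 10 classes)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0     # add channel dimension, scale to [0, 1]
x_test = x_test[..., None] / 255.0

model = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, batch_size=128,
          validation_data=(x_test, y_test))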
The triggering or activation of a neural network classifier provides a
score. For instance, to classify patients as sick and healthy, we
consider parameters such as height, weight and body temperature,
blood pressure, etc.
A high score means that the patient is sick, and a low score means
that they are healthy.
Each node in the output and hidden layers has its classifiers. The
input layer takes inputs and passes its scores to the next hidden
layer for additional activation, and this continues until the output is
reached.
This input to output progress from left to right in the forward direction
is called forward propagation.
The credit assignment path (CAP) in a neural network is the series of transformations that lead from input to output. CAPs describe potentially causal connections between input and output.
The depth of the CAP for a given feed-forward neural network is the number of hidden layers plus one, as the output layer is included. For recurrent neural networks, where a signal can propagate through a layer several times, the CAP depth is potentially unlimited.

DEEP NETWORKS AND SHALLOW NETWORKS


There is no precise depth threshold that divides shallow learning from deep learning, but it is generally accepted that deep learning, which has multiple non-linear layers, requires a CAP greater than two.
The basic node in a neural network is a perceptron, which mimics a neuron in a biological neural network. Then we have the Multilayer Perceptron, or MLP. A set of weights and biases modifies each set of inputs: each edge has its own weight, and each node has its own bias.
The accuracy of the prediction of a neural network depends on its
weights and biases.
The process of improving the accuracy of the neural network is
called training. The output of a forward propagation network is
compared to the value that is known to be accurate.
The cost function, or loss function, is the difference between the generated output and the actual output.
The goal of training is to make the cost of training as small as
possible through millions of training examples. To do this, the
network adjusts the weights and biases until the prediction matches
the correct result.
Once well trained, a neural network has the potential to make an
accurate prediction at all times.
When the pattern becomes complex, and you want your computer to
recognize them, you must opt for neural networks. In such complex
pattern scenarios, the neural network outperforms all other
competing algorithms.
Computers have proven to be good at performing repetitive
calculations and following detailed instructions, but they have not
been as good at recognizing complex patterns.
If the problem involves recognizing simple patterns, a support vector machine (SVM) or a logistic regression classifier can do the job well, but as the complexity of the pattern increases, there is no choice but to turn to deep neural networks.
Therefore, for complex patterns such as a human face, shallow neural networks fail, and we have no alternative but to use deep neural networks with more layers. Deep networks can do their job by dividing complex patterns into simpler ones.
For example, on the human face, edges would be used to detect
parts such as lips, nose, eyes, ears, and so on and then recombine
them to form a human face.
The accuracy of the correct prediction has become so precise that
recently, in a Google Pattern Recognition Challenge, a deep network
exceeds that of a human.
This idea of a network of layered perceptrons has existed for some
time. In this area, deep networks mimic the human brain. But one
disadvantage of this is that they take a lot of time to train.
However, recent high-performance GPUs have been able to train
such deep networks in less than a week; while the fast CPU could
have taken weeks or perhaps months to do the same.

CHOOSE A DEEP NETWORK


How do we choose a deep network? We have to determine whether we are building a classifier or trying to find patterns in the data, and whether we are going to use unsupervised learning. To extract patterns from an unlabeled data set, we use a restricted Boltzmann machine or an autoencoder.
Consider the following points when choosing a deep network:
For text processing, sentiment analysis, analysis and recognition of
the named entity, we use a recurrent network or recursive neural
tensor network or RNTN;
For any language model that works at the character level, we use
the recurrent network.
For image recognition, we use the deep belief network DBN or
convolutional network.
For the recognition of objects, we use an RNTN or a convolutional
network.
For voice recognition, we use the recurring network.
In general, deep belief networks and multilayer perceptrons with rectified linear units (ReLUs) are good options for classification.
For the analysis of time series, it is always recommended to use a
recurrent network.
Neural networks have existed for more than 50 years, but only now have they been given more importance. The reason is that they are difficult to train: when we try to train them with a method called backpropagation, we run into a problem called vanishing or exploding gradients. When that happens, training takes longer and accuracy takes a back seat.
When we train on a data set, we constantly calculate the cost function, which is the difference between the predicted output and the actual output over a set of labelled training data. The cost function is decreased by adjusting the values of the weights and biases until the lowest value is obtained. The training method uses a gradient, which is the rate at which the cost changes with respect to a change in the weight or bias values.
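As a toy illustration of this idea (my own sketch, not from the book), the loop below minimizes a mean-squared-error cost for a one-weight, one-bias model by following the gradient:

import numpy as np

# Toy supervised problem: fit y = w*x + b to labelled data by gradient descent.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])        # true relation: y = 2x + 1

w, b, lr = 0.0, 0.0, 0.05                 # initial weight, bias and learning rate
for step in range(500):
    y_pred = w * x + b
    error = y_pred - y
    cost = np.mean(error ** 2)            # cost (loss) function
    grad_w = 2 * np.mean(error * x)       # rate of change of the cost w.r.t. w
    grad_b = 2 * np.mean(error)           # rate of change of the cost w.r.t. b
    w -= lr * grad_w                      # adjust parameters against the gradient
    b -= lr * grad_b

print(round(w, 3), round(b, 3), round(cost, 6))   # approaches 2.0, 1.0 and a cost near 0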
Restricted Boltzmann Machines or Autoencoders - RBM
In 2006, a breakthrough was achieved in addressing the problem of vanishing gradients. Geoff Hinton devised a novel strategy that led to the development of the Restricted Boltzmann Machine (RBM), a shallow two-layer network.
The first layer is the visible layer and the second layer is the hidden layer. Each node in the visible layer is connected to each node in the hidden layer. The network is called restricted because two nodes within the same layer are not allowed to share a connection.
Autoencoders are networks that encode data as vectors. They create a hidden, or compressed, representation of the raw data. These vectors are useful for dimensionality reduction: the vector compresses the raw data into a smaller number of essential dimensions. Autoencoders are combined with decoders, which allow the input data to be reconstructed from its hidden representation.
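A minimal encoder/decoder sketch in Keras (the sizes and layer choices are illustrative assumptions of mine, not the book's):

import tensorflow as tf
from tensorflow.keras import layers, models

# Encoder compresses a 784-dim input (a flattened 28x28 image) into 32 dimensions;
# the decoder reconstructs the input from that hidden representation.
encoder = models.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(32, activation="relu"),
])
decoder = models.Sequential([
    layers.Input(shape=(32,)),
    layers.Dense(784, activation="sigmoid"),
])
autoencoder = models.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")

(x_train, _), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784) / 255.0
autoencoder.fit(x_train, x_train, epochs=1, batch_size=256)   # target = input

codes = encoder.predict(x_train[:5])      # 32-dimensional compressed vectors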
An RBM is the mathematical equivalent of a bidirectional translator. A forward pass takes the inputs and translates them into a set of numbers that encodes them; a backward pass takes this set of numbers and translates it back into reconstructed inputs. A well-trained network performs this reconstruction with a high degree of accuracy.

DEEP BELIEF NETWORKS - DBNS


Deep belief networks (DBNs) are made by combining RBM and
introducing an intelligent training method. We have a new design
that finally solves the problem of the disappearance of the gradient.
Geoff Hinton created the RBMs and also Deep Belief Nets as an
alternative to backward propagation.
A DBN is comparable in structure to an MLP (multilayer perceptron), but very different when it comes to training. It is the training that allows DBNs to outperform their shallow counterparts.
A DBN can be viewed as a stack of RBMs in which the hidden layer of one RBM is the visible layer of the RBM above it. The first RBM is trained to reconstruct its input as precisely as possible.
The hidden layer of the first RBM is taken as the visible layer of the second RBM, and the second RBM is trained using the outputs of the first RBM. This process is iterated until every layer in the network is trained.
In a DBN, each RBM learns the entire input. A DBN works globally by fine-tuning the entire input in succession as the model slowly improves, like a camera lens slowly coming into focus on an image. A stack of RBMs outperforms a single RBM just as a multilayer perceptron (MLP) outperforms a single perceptron.
At this stage, the RBMs have detected patterns inherent in the data, but without any names or labels. To finish training the DBN, we have to attach labels to the patterns and fine-tune the network with supervised learning.
We require only a very small set of labelled samples so that features and patterns can be associated with a name. This small set of labelled data is used for training, and it can be very small compared to the original data set.
The weights and biases are slightly modified, resulting in small
change in the network's perception of the patterns and, often, a
small increase in overall accuracy.
The training can also be achieved in a reasonable time using GPUs
that offer very accurate results compared to shallow networks, and
we also see a solution to eliminating the gradient problem.
ANTAGONISTIC GENE NETWORKS - GANS
Generative adversarial networks are deep neural networks made up of two networks pitted one against the other, hence the name "adversarial".
GANs were introduced in a paper published by researchers at the University of Montreal in 2014. Yann LeCun, an artificial intelligence expert at Facebook, referring to GANs, called adversarial training "the most interesting idea in the last 10 years in ML".
The potential of GANs is enormous, since the networks learn to imitate any data distribution. GANs can be taught to build parallel worlds strikingly similar to ours in any domain: images, music, speech, prose. In a way, they are robot artists, and their output is quite impressive.
In a GAN, a neural network, known as the generator, generates new
instances of data, while the other, the discriminator, evaluates them
to determine their authenticity.
Let's say we are trying to generate handwritten numbers like those
found in the MNIST data set, which is taken from the real world. The
work of the discriminator, when an instance of the true MNIST data
set is shown, is to identify them as authentic.

Now study the following steps of the GAN -


The generator network receives information in the form of random
numbers and returns an image.
This generated image is provided as input to the discriminating
network together with a stream of images taken from the actual data
set.
The discriminator takes both real and false images and returns
probabilities, a number between 0 and 1, where 1 represents a
prediction of authenticity and 0 represents false.
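The steps above can be sketched in code. The following is a deliberately simplified tf.keras GAN on MNIST; the architecture sizes, optimizers and training loop are my own illustrative assumptions, not the book's:

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

latent_dim = 64

# Generator: random numbers in, flattened 28x28 "image" out
generator = models.Sequential([
    layers.Input(shape=(latent_dim,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(784, activation="sigmoid"),
])

# Discriminator: image in, probability of being authentic out
discriminator = models.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# Combined model used to train the generator; the discriminator is frozen here
discriminator.trainable = False
gan = models.Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")

(x_train, _), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784) / 255.0

batch = 128
for step in range(100):                           # a few toy training steps
    # 1. Train the discriminator on real images (label 1) and generated fakes (label 0)
    real = x_train[np.random.randint(0, len(x_train), batch)]
    noise = np.random.normal(size=(batch, latent_dim))
    fake = generator.predict(noise, verbose=0)
    discriminator.train_on_batch(real, np.ones((batch, 1)))
    discriminator.train_on_batch(fake, np.zeros((batch, 1)))
    # 2. Train the generator so that the discriminator labels its fakes as authentic
    noise = np.random.normal(size=(batch, latent_dim))
    gan.train_on_batch(noise, np.ones((batch, 1)))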
RECURRENT NEURAL NETWORKS - RNNS
RNNs are neural networks in which information can flow in any direction. These networks are used for applications such as language modeling and natural language processing (NLP).
The primary idea underlying the RNN is to use sequential information. In a usual neural network, it is assumed that all inputs and outputs are independent of each other. But if we want to predict the next word in a sentence, we have to know which words came before it.
RNNs are called recurrent because they repeat the same task for every element of a sequence, and the output depends on the previous calculations. It can therefore be said that an RNN has a "memory" that stores information about what has been computed so far. In theory, RNNs can use information from very long sequences, but in practice they can only look back a few steps.
Recurrent neural networks
Long short-term memory networks (LSTMs) are the most commonly used RNNs.
Along with convolutional neural networks, RNNs have been used as parts of a model to generate descriptions of untagged images. It is remarkable how well this appears to work.
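As an illustrative sketch (a toy task of my own, not from the book), a small Keras LSTM can learn to predict the next value of a sequence from the previous ten values:

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Toy sequence task: given 10 previous values of a sine wave, predict the next one.
t = np.arange(0, 200, 0.1)
series = np.sin(t)
window = 10
X = np.array([series[i:i + window] for i in range(len(series) - window)])[..., None]
y = series[window:]

model = models.Sequential([
    layers.Input(shape=(window, 1)),
    layers.LSTM(32),              # the LSTM keeps a "memory" over the sequence
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, batch_size=64, verbose=0)

print(model.predict(X[:1], verbose=0))   # prediction for the next value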

Deep convolutional neural networks - CNN


If we increase the number of layers in a neural network to make it deeper, we raise the complexity of the network and can model more difficult functions. However, the number of weights and biases grows exponentially, and learning such difficult problems may become impossible for ordinary neural networks. This leads to a solution: the convolutional neural network.
CNNs are widely used in computer vision; they have also been applied in acoustic modelling for automatic speech recognition.
The idea behind convolutional neural networks is that of a "moving filter" that passes across the image. This moving filter, or convolution, is applied to a certain neighbourhood of nodes that, for example, can be pixels, where the filter applied is 0.5 times the node value.
The renowned researcher Yann LeCun was a pioneer of convolutional neural networks. Facebook uses these networks in its facial recognition software. CNNs have been the answer of choice for artificial vision projects. There are many layers in a convolutional network. In the ImageNet challenge, a machine was able to beat a human at object recognition in 2015.
In short, convolutional neural networks (CNNs) are multilayer neural networks. The layers sometimes number 17 or more, and the input data is assumed to be images.
Convolutional neuronal networks
CNN's drastically reduce the number of parameters that must be
adjusted. Therefore, CNN efficiently handles the high dimensionality
of raw images.

DEEP LEARNING ACTIVATION FUNCTIONS


This section covers activation functions in deep learning and explains their role. It also shows the importance of choosing the activation function used to drive a neural network.
First, before defining precisely what an activation function is, let's talk about its origin to understand its role better. The activation function was inspired by the "action potential", an electrical phenomenon between two biological neurons.
Let's start with a little reminder from biology class on the composition of a biological neuron: a neuron has a cell body, an axon that allows it to send messages to other neurons, and dendrites that allow it to receive signals from other neurons.
Now that you're up to date, let's explain what the action potential is. In the picture below, the orange rectangle shows the place where the two neurons communicate. The neuron receives signals from other neurons through the dendrites. The weight associated with a dendrite, called the synaptic weight, is multiplied by the incoming signal. The dendrite signals are accumulated in the cell body and, if the resulting signal strength exceeds a certain threshold, the neuron transmits the message along the axon. Otherwise, the signal is killed by the neuron and does not spread further. The action potential is, therefore, the variation in signal strength that indicates whether the communication should take place or not.

The activation function decides whether or not to transmit the signal.

In this case, it is a simple function with only one parameter: the threshold. Now, when we learn something new, the threshold and the connection strength (called the synaptic weight) of some neurons change. This creates new connections between neurons, allowing the brain to learn new things.
Now let's discuss how it all works with a network of artificial neurons: the incoming values of a neuron (x1, x2, x3, ..., xn) are multiplied by the weights associated with them (w1, w2, w3, ..., wn), the counterpart of the synaptic weights. We then sum these products and finally add the bias (the counterpart of the threshold). The image below shows the calculation formula.
The function Z(x) corresponds to the pre-activation, that is to say, the step preceding the activation. Then the activation function intervenes: the result z of the pre-activation function Z(x) is interpreted by an activation function A(z), producing a result y.

The purpose of the activation is to transform the signal so as to obtain an output value from complex transformations between the inputs. To do this, the activation function must be non-linear. It is this non-linearity that makes such transformations possible. We will see why.

Non-linearity
Let's start by briefly seeing the difference between a linear and
nonlinear activation function:

Linear activation function


It is a simple function of the form: f (x) = ax or f (x) = x. The input
goes to the output without a big modification or any modification. We
remain here in a situation of proportionality.

NONLINEAR ACTIVATION FUNCTION


Proportionality situations are usually special cases. The nonlinear
functions make it possible to separate the nonlinearly separable
data. They are the most used activation functions. A non-linear
equation governs the correspondence between the inputs and the
output.

The use of nonlinear activation functions is simply essential for the


simple reason that linear functions only work with a single layer of
the neuron. Because beyond a layer of neurons, the recurrent
application of the same linear activation function will no longer have
any impact on the result. In other words, to solve complex problems,
the use of nonlinear functions is mandatory.

Let's see this with examples:


Look at the picture below; we have two types of data represented in
blue and orange. The goal is to separate them; the use of a linear
activation function is enough. The red line represents the decision
boundary resulting from the activation function. The decision
boundary results from the logistic regression performed by the
function, i.e. the classification of the data. In this case, the data are
well separated, and we thus reach a success rate of 100% from a
simple linear function.

Let's continue with another situation, we always have two types of


data, but this time classified differently. Since logistic regression is
linear, it is impossible to place the decision boundary in such a way
as to perfectly separate the data represented in blue from those in
orange. The success rate when classifying the data is only about
50% (image below). We are here in a case yet very simple, the two
types of data are easily distinguished, but a linear activation function
cannot suffice.
Thus, to properly separate these data, one will need a nonlinear
activation function exerting a further logistic regression. The image
below represents the same situation as the previous one, but this
time we used a nonlinear activation function. Our two types of data
are now perfectly separated (image below).

Another critical point is that the data processed by neurons can reach surprisingly large values. With a linear function, which does not bound the output, the values transmitted from neuron to neuron can become ever larger and more computationally expensive. To remedy this, non-linear activation functions reduce the output value of a neuron, most often to a simple probability.
TYPES OF NONLINEAR ACTIVATION FUNCTIONS
Sigmoid
The primary purpose of the sigmoid function is to squash the input value into the range 0 to 1, expressing it as a probability. If the input value is a very large positive number, the function converts it to a probability of 1; conversely, if the input value is a very large negative number, the function converts it to a probability of 0. On the other hand, the equation of the curve is such that only small input values noticeably affect the variation of the output values.
The image below represents the sigmoid function:

The Sigmoid function has several defaults:


It is not centred on zero, that is to say, that negative inputs can
generate positive outputs.
Being rather flat, it has a rather weak influence on the neurons
compared to other activations functions. The result is often very
close to 0 or 1, causing the saturation of some neurons.
It is expensive in terms of calculation because it includes the
exponential function.
tanh
The Tanh function is also called "hyperbolic tangent".
This function is comparable to the Sigmoid function. The difference
with the Sigmoid function is that the Tanh function produces a result
between -1 and 1. The Tanh function is in general preferable to the
Sigmoid function because it is centred on zero. Large negative
inputs tend to -1 and large positive inputs tend to 1.
Apart from this advantage, the Tanh function has the same other
disadvantages as the Sigmoid function.
Relu
To solve the saturation problem of the two preceding functions (Sigmoid and Tanh), there is the ReLU (Rectified Linear Unit) function. This function is the most used.

The ReLU function is defined by the formula f(x) = max(0, x). If the input is negative, the output is 0, and if it is positive, the output is x. This activation function greatly speeds up network convergence and does not saturate.
But the ReLU function is not perfect. If the input value is negative, the neuron remains inactive, so the weights are not updated, and the network does not learn.
Leaky ReLU

The Leaky ReLU function is defined by the formula f(x) = max(0.1x, x). The Leaky ReLU function tries to correct the ReLU function when the input is negative: the idea is that when the input is negative, the function has a small positive slope of 0.1. This somewhat eliminates the inactivity problem of the ReLU function for negative values, but the results obtained with it are not always consistent. It still retains the characteristics of a ReLU activation function, that is to say, it is computationally efficient, converges much more quickly and does not saturate in the positive region.

Parametric ReLU
The idea of the Leaky ReLU function can be extended further. Instead of multiplying x by a constant term, we can multiply it by a hyperparameter, which seems to work better than the Leaky ReLU function. This extension of the Leaky ReLU function is known as parametric ReLU.
The parametric ReLU function is defined by the formula f(x) = max(ax, x), where "a" is a hyperparameter. This gives neurons the ability to choose which slope is best in the negative region. With this capability, the parametric ReLU function can become a classic ReLU function or a Leaky ReLU function.
It will be most often preferable to use the function ReLU, its two
other versions (Leaky ReLU and parametric ReLU) are experiments
without real added value.
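For reference, here is a small NumPy sketch (my own, not the book's) of the activation functions discussed above:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))        # squashes values into (0, 1)

def tanh(x):
    return np.tanh(x)                       # squashes values into (-1, 1), zero-centred

def relu(x):
    return np.maximum(0.0, x)               # 0 for negative inputs, identity otherwise

def leaky_relu(x):
    return np.maximum(0.1 * x, x)           # small slope of 0.1 in the negative region

def parametric_relu(x, a):
    return np.maximum(a * x, x)             # slope 'a' is a tunable hyperparameter

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
for f in (sigmoid, tanh, relu, leaky_relu):
    print(f.__name__, np.round(f(x), 3))
print("parametric_relu", np.round(parametric_relu(x, 0.25), 3))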
Over the years, various functions have been used. In this section,
only the main ones have been mentioned. At present, there is still a
lot of research to be done to find an appropriate activation function
that allows the neural network to learn more efficiently and quickly.
CONVOLUTIONAL NEURAL NETWORK
Convolutional neural networks are directly inspired by the visual cortex of vertebrates. A network of convolutional neurons is also called a ConvNet (for "Convolutional Network") or CNN (for "Convolutional Neural Network").
We distinguish two parts: a first part, called the convolutional part of the model, and a second part, called the classification part of the model, which corresponds to an MLP (Multilayer Perceptron) model.

It is a network of multilayer neurons; more precisely, it is a deep network composed of multiple layers which, in general, are organized in blocks (CONV → RELU → POOL).
There are four types of layers in a convolutional neural network: the convolution layer, the pooling layer, the ReLU correction layer and the fully-connected layer.

The convolution layer


The convolution layer is the key component of convolutional neural networks and is always at least their first layer.
Its purpose is to detect the presence of a set of features in the images received as input. For this, we perform convolutional filtering: the principle is to "drag" a window representing the feature across the image and calculate the convolution product between the feature and each portion of the scanned image. A feature is then viewed as a filter: the two terms are equivalent in this context.
Three hyperparameters make it possible to size the volume of the convolution layer.
Depth of the layer: the number of convolution kernels (or the number of neurons associated with the same receptive field).
The stride controls the overlap of the receptive fields: the smaller the stride, the more the receptive fields overlap and the larger the output volume.
The margin (at 0), or zero padding: sometimes it is convenient to place zeros at the border of the input volume. The size of this zero padding is the third hyperparameter. This margin makes it possible to control the spatial dimension of the output volume; in particular, it is sometimes desirable to keep the same area as that of the input volume.

Pool Layer: Pooling


This kind of layer is frequently placed between two convolution layers: it receives a number of feature maps as input and applies the pooling operation to each of them.
The pooling (or sub-sampling) operation consists of reducing the size of the images while preserving their important characteristics. For this, the image is cut into regular cells, and the maximum value is kept within each cell. In practice, small square cells are generally used to avoid losing too much information. The most frequent choices are adjacent cells of 2 × 2 pixels that do not overlap, or cells of 3 × 3 pixels spaced from each other by a stride of 2 pixels (which therefore overlap). The same number of feature maps is output as was input, but they are much smaller.
The pooling layer reduces the number of parameters and calculations in the network. This improves the efficiency of the network and avoids overfitting.
It is common to periodically insert a pooling layer between successive Conv layers in a ConvNet architecture. Its function is to progressively reduce the spatial size of the representation in order to reduce the number of parameters and calculations in the network, and thus also to control overfitting. The pooling layer operates independently on each depth slice of the input and resizes it spatially, using the MAX operation.
Thus, the pooling layer makes the network less sensitive to the position of features: the fact that a feature is a little higher or lower, or even that it has a slightly different orientation, should not cause a radical change in the classification of the image.
The ReLU layer
To improve the efficiency of the processing, a layer that applies a mathematical function (an activation function) to the output signals is interleaved between the processing layers. In this context we find ReLU (Rectified Linear Units), which refers to the non-linear function defined by ReLU(x) = max(0, x).
The ReLU correction layer therefore replaces all negative values received as inputs with zeros. It plays the role of the activation function.
Often, the ReLU correction is preferable, but there are other forms:
The correction by the hyperbolic tangent: f(x) = tanh(x).
The correction by the saturating hyperbolic tangent: f(x) = |tanh(x)|.
The correction by the sigmoid function: f(x) = (1 + e^(-x))^(-1).
Fully-connected layer
The fully-connected layer is always the last layer of a neural network, convolutional or not - so it is not characteristic of a CNN.
This kind of layer receives an input vector and produces a new output vector. For this, it applies a linear combination and, optionally, an activation function to the values received at the input.
The fully-connected layer classifies the input image of the network: it returns a vector of size N, where N is the number of classes in our image classification problem. Each element of the vector indicates the probability that the input image belongs to a given class.
For example, if the problem consists of distinguishing cats from dogs, the final vector will be of size 2: the first element (respectively, the second) gives the probability of belonging to the class "cat" (respectively "dog"). Thus, the vector [0.9, 0.1] means that the image has a 90% chance of representing a cat.
Each value of the input array "votes" in favor of a class. The votes do not all have the same importance: the layer gives them weights that depend on the position of the element in the array and on the class.
To calculate the probabilities, the fully-connected layer therefore multiplies each input element by a weight, sums the results, and then applies an activation function (logistic if N = 2, softmax if N > 2):
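As a small numerical sketch of this step (the input values and weights below are made up for illustration, not taken from the book):

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))       # subtract the max for numerical stability
    return e / e.sum()

# Hypothetical high-level values coming out of the last pooling layer
x = np.array([0.9, 0.65, 0.45, 0.87])

# One weight per (input value, class) pair: columns = classes "X" and "O"
W = np.array([[ 1.2, -0.4],
              [ 0.3,  0.9],
              [-0.7,  1.1],
              [ 1.0, -0.2]])
b = np.array([0.1, -0.1])

scores = x @ W + b                  # weighted "votes" for each class
probs = softmax(scores)             # probabilities summing to 1
print(dict(zip(["X", "O"], np.round(probs, 3))))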
It is possible to replace fully connected CNN layers with
convolutional layers, which makes it fully convolutional. All-
convolutional network is a good idea. I invite you to discover it.
To begin
To help us understand how a convolutional neural network operates, we will use a simplified example and try to determine whether an image represents an X or an O.
Our CNN has only one task to perform: each time we present it with an image, it must decide whether that image represents an X or an O. It assumes that, in each case, it can only be one or the other.

For a computer, an image is nothing more than a 2-dimensional array of pixels (a kind of large chessboard), with each cell containing a specific number. In our example, a pixel with the value 1 is a white pixel, and -1 is black.

Features
A CNN compares images fragment by fragment. The fragments it looks for are called features. The features correspond to pieces of the image.
By finding approximate features that are roughly similar in two different images, the CNN is much better at detecting similarities than it would be with a full image-to-image comparison.
Each feature is like a mini-image, i.e. a small 2-dimensional array of values. The features bring together the most common aspects of the images. In the case of the image showing an X, the features defining the two diagonals and their intersection represent the most common features of an X. These features probably correspond to the arms and the centre of any image of an X.

Convolution
When presented with a new image, the CNN does not know exactly whether the features will be present in the image or where they will be, so it looks for them across the whole image, at every position.
By calculating, over the whole image, whether a feature is present, we perform filtering. The mathematics we use to do this is called convolution, from which convolutional neural networks take their name.
To calculate the match between a feature and a sub-part of the image, it is sufficient to multiply each pixel of the feature by the value that this same pixel has in the image, then add the responses and divide the result by the total number of pixels in the feature. If both pixels are white (value 1), then 1 * 1 = 1. In all cases, each matching pixel gives 1; similarly, each mismatch gives -1.
If all the pixels in a feature match, then adding them up and dividing by the total number of pixels gives 1. In the same way, if none of the pixels of the feature matches the sub-part of the image, the answer is -1.

To complete a convolution, one repeats this process, aligning the feature with every sub-part of the image. We then take the result of each convolution and create a new 2-dimensional array. This map of matches is also a filtered version of the original image: it is a map showing where the feature was found in the image.
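The following NumPy sketch (a toy illustration of mine, not the book's code) computes such a match map for a small diagonal feature on a 9 × 9 image of an X:

import numpy as np

# 9x9 toy image of an X made of 1 (white) and -1 (black) pixels
img = -np.ones((9, 9))
for i in range(9):
    img[i, i] = 1          # main diagonal
    img[i, 8 - i] = 1      # anti-diagonal

# 3x3 feature: a small piece of diagonal
feature = np.array([[ 1, -1, -1],
                    [-1,  1, -1],
                    [-1, -1,  1]])

h, w = feature.shape
match_map = np.zeros((img.shape[0] - h + 1, img.shape[1] - w + 1))
for i in range(match_map.shape[0]):
    for j in range(match_map.shape[1]):
        patch = img[i:i + h, j:j + w]
        # multiply pixel by pixel, sum, then divide by the number of pixels
        match_map[i, j] = np.sum(patch * feature) / feature.size

print(np.round(match_map, 2))   # values near 1 = feature found at that position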

Values close to 1 indicate a strong match, values close to -1 indicate a strong match for the photographic negative of the feature, and values close to 0 indicate no match of any kind.
An image becomes a stack of filtered images.

The next step is to repeat the complete convolution process for each
of the other existing features. The result is a set of filtered images,
each image of which corresponds to a particular filter.
It is advisable to consider this set of convolution operations as a
single processing step: In convolutional neural networks, this is
called the convolutional layer, which suggests that there will
eventually be other layers added to them.
Although the principle is relatively simple and we can easily explain
our CNN on the back of a towel, the number of additions,
multiplications and divisions can quickly accumulate. In
mathematical terms, they increase linearly with the number of pixels
in the image, with the number of pixels of each characteristic.
With so numerous factors, it is very easy to make this problem
infinitely more complex. It is therefore not surprising that
microprocessor manufacturers now manufacture specialized chips in
the type of operations required by CNNs.

Pooling layer
Pooling is a method of taking a large image and reducing its size while preserving the most important information it contains. The mathematics behind pooling is, again, not very complex: just slide a small window step by step over all parts of the image and take the maximum value of this window at each step. In practice, a window of 2 or 3 pixels and a step of 2 pixels are often used.
Pooling: reduce the stack of images
Choose a step (usually 2).
Slide your window across your filtered images.
From each window, take the maximum value.

After pooling, the image has only a quarter of its starting pixels. Because at each step it keeps the maximum value contained in the window, it preserves the best features of that window. This means it does not care exactly where in the window the feature was found.
The result is that the CNN can find whether a feature is in an image, regardless of where it is. This helps in particular to solve the problem that computers are hyper-literal.
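A minimal max-pooling sketch in NumPy (illustrative only; the filtered values below are made up):

import numpy as np

def max_pool(image, window=2, step=2):
    """Slide a window over the image and keep the maximum value of each window."""
    h = (image.shape[0] - window) // step + 1
    w = (image.shape[1] - window) // step + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = image[i * step:i * step + window,
                              j * step:j * step + window].max()
    return out

filtered = np.array([[ 0.9, -0.1,  0.2, 0.5],
                     [ 0.1,  0.3, -0.4, 0.8],
                     [-0.2,  0.7,  1.0, 0.0],
                     [ 0.4, -0.6,  0.2, 0.1]])
print(max_pool(filtered))   # 2x2 result: a quarter of the original pixels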

In the end, a pooling layer is simply a pooling process on an image


or collection of images. The output will have the same number of
images, but each image will have a lower number of pixels. This will
reduce the computational burden. For example, it is transforming an
8-megapixel image into a 2-megapixel image, which will make life
much easier for the rest of the operations to be performed later.
Rectified Linear Units (ReLU)
Normalization
Change everything negative to zero.

An important element in the whole process is the Rectified Linear Unit, or ReLU. The mathematics behind this concept is, once again, quite simple: every time there is a negative value in a pixel, it is replaced by a 0. The CNN is thus kept healthy (mathematically speaking) by preventing the learned values from getting stuck around 0 or exploding toward infinity.
It is not a glamorous tool, but it is fundamental: without it, the CNN would not produce the results we know it can.
The result of a ReLU layer is the same size as its input, just with all the negative values removed.
The output of one layer becomes the input of the next.

Deep Learning

You will probably have noticed that what you give as input to each layer (i.e. 2-dimensional arrays) is very similar to what you get at the output (other 2-dimensional arrays). For this reason, we can stack them one on top of another as we would with Legos.
The raw images are filtered, rectified and pooled to create a set of shrunk, feature-filtered images for each input image. These can be filtered and shrunk again and again. Each time, the features become larger and more complex, and the images become more compact. This allows the lower layers to represent simple aspects of the image, such as light edges and dots. The upper layers represent much more complex aspects of the image, such as shapes and patterns, which tend to be easily recognizable. For example, in a CNN trained to recognize faces, the upper layers represent patterns and parts that make up a face (1).
Fully connected layers
CNNs have another arrow in their quiver. The fully connected layers take the high-level filtered images and translate them into votes. In our example, we only have to decide between two categories, X and O.
Fully connected layers are the main building blocks of traditional neural networks. Instead of treating the inputs as 2-dimensional arrays, they are treated as a single list, and all values are handled identically. Each value gets its own vote as to whether the image is an X or an O. However, the process is not completely democratic: some values are much better at detecting when an image is an X than others, and others are much better at detecting an O, so they have more voting power than the rest. This vote is called the weight, or the strength of the connection, between each value and each category.
When a new image is shown to the CNN, it spreads through the
lower layers until it reaches the fully connected final layer. The
election then takes place. And the solution with the most votes wins
and is declared the category of the image.
The fully connected layers, like the other layers, can be added one
after the other because their output value (a list of votes) is very
similar to their input value (a list of values). In practice, several fully
connected layers are often added one after the other, with each
intermediate layer voting for ghost "hidden" categories. Indeed, each
additional layer lets the network learn more complex combinations of
features that help improve decision-making.

Backpropagation

Even though our analysis of CNNs is progressing well, something is still missing: where do the features come from? And how do we determine the weights of our fully connected layers? If everything had to be chosen by hand, CNNs would be much less popular than they are. Fortunately, a very important concept does this work for us: backpropagation.
To be able to use backpropagation, we must have a collection of
images for which we already know the category to select. This
means that some charitable (and patient) souls have analyzed
thousands of images upstream and have associated each of them
with their respective categories, X or O.
These images are then used with a CNN that has not yet been trained, which means that each pixel of each feature and each weight of each fully connected layer is initialized with a random value. After that, we start feeding images to the CNN, one by one.
For each image analyzed by the CNN, a vote is obtained. The number of classification errors we make then tells us about the quality of our features and weights. The features and weights can then be adjusted to reduce the error. Each value is increased or decreased, and the new error value of the network is recalculated each time. Whatever adjustment is made, if the error decreases, the adjustment is kept.
After doing this for each pixel of each feature of each convolutional layer, and for each weight in each fully connected layer, the new weights give an answer that works slightly better for that image.
This process is then repeated with each of the other images that
have a label.
Elements that appear only in rare images are quickly forgotten, but
patterns that are found regularly in a large number of images are
retained by the characteristics and weight of the connections. If you
have enough labelled images, these values ​stabilize at a set of
values ​that work quite well for a variety of cases.
As is certainly apparent, backpropagation is another expensive
calculation step and another motivation for specialized computer
hardware.

Hyperparameters
Unfortunately, not all aspects of CNN's are as intuitive to learn and
understand as we have seen so far. Thus, there is always a long list
of parameters that must be set manually to allow CNN to have better
results.
For each convolutional layer, how many characteristics should one
choose? How many pixels should be considered in each feature?
For each pooling layer, which window size should we choose? Which step?
For each additional fully connected layer, how many hidden neurons
should one define?
Furthermore, to these parameters, there are also other architectural
elements of a higher level to consider: How many layers of each type
should one include? In which order? Some models of deep learning
can have more than a hundred layers, which makes the number of
possibilities extremely important.
With so many possible combinations and permutations, only a small
fraction of the possible configurations have been tested so far. The
different designs of CNNs are generally motivated by the knowledge
accumulated by the scientific community, with occasional gaps
allowing surprising improvements in performance. And although we
have presented the main building blocks for the construction of
CNNs in a relatively simple and intuitive way, there are a large
number of variations and settings that have been tested and have
yielded very good results, such as new types of layers and more
complex ways of connecting layers in between.
Beyond the images
Although our example with Xs and Os involves the use of images,
CNNs can also be used to categorize other types of data.
The trick is: whatever type of data you are processing, transform that
data to make it look like an image.
For example, the audio signals can be broken down into a set of
smaller pieces of shorter duration and decompose each of these
pieces into a band of low, medium, high or even thinner frequencies.
This can be represented by a 2-dimensional array where each
column is a block of time, and each line is a frequency band. The
"pixels" of this false image that are close to each other are closely
related. CNNs work well in these cases.
The scientists were particularly creative. In a similar approach, they
have adapted textual data for natural language processing and even
chemical data for drug discovery.
A particular example of data that does not fit this type of format is all
that is "customer data", where each row of a table represents a
customer, and each column represents information about that
person, such as its name, his address, his email address, his
purchases and his browsing history. In this situation, the location of
rows and columns does not matter. Lines can be re-arranged, and
columns can be reordered without losing the importance of the data.
On the other hand, rearranging the rows and columns of an image
makes it completely intractable.
A general rule: if your data remains as exploitable after exchanging
some columns between them, you probably cannot use the
convolutional neural networks. Nevertheless, if you can make your
problem look like looking for patterns in an image, CNNs can be
exactly what you need.

FEATURES AND CNNS BENEFITS


A fundamental advantage of convolutional networks is the use of a single shared weight for the signals entering all the neurons of the same convolution kernel. This approach reduces the memory footprint, improves performance and allows translation-invariant processing. This is the main advantage of the convolutional neural network over the multilayer perceptron, which treats each neuron independently and therefore assigns a different weight to each incoming signal.
When the input volume varies in time (video or sound), it becomes
interesting to add a parameter along the time scale in the setting of
neurons. In this case, we will speak of a neural network with a time
delay (TDNN).
This means that the network is responsible for changing its filters by
itself (learning without supervision), which is not the case for other,
more traditional algorithms. The absence of initial parameterization
and human intervention is a major asset of CNN.

Difference between Comparative Convolutional Neural


Networks and Multilayer Perceptron
Although effective for image processing, Multilayer Perceptrons (MLPs) have difficulty handling large images, due to the exponential growth in the number of connections with the size of the image, as each neuron is "fully connected" to each of the neurons of the previous and next layers. Convolutional neural networks, whose principle is inspired by that of the vertebrate visual cortex, limit this growth. For a deep network such as AlexNet, for example, more than 90% of the parameters to be learned belong to the 3 deepest "fully connected" layers, and the rest relates to the (5) convolutional layers.
Figure: one layer of a CNN in 3 dimensions (green = input volume, blue = receptive field volume, grey = CNN layer, circles = independent artificial neurons).
For example, if we take an image size 32 × 32 × 3 (32 wide, 32 high,
3 colour channels), a single fully connected neuron in the first hidden
layer of the MLP would have 3,072 entries (32 * 32 * 3). A 200 × 200
image would thus result in treating 120,000 entries per neuron
which, multiplied by the number of neurons, becomes enormous.
Convolutional neural networks aim to limit the number of inputs while preserving the strong "spatially local" correlation of natural images. In contrast to MLPs, CNNs have the following distinguishing features:
Local connectivity: thanks to the receptive field, which limits the number of neuron inputs while maintaining the MLP architecture, convolutional neural networks ensure that the "filters" produce the strongest response to a spatially localized input pattern, which leads to a parsimonious representation of the input. Such a representation takes up less space in memory. Also, since the number of parameters to be estimated is reduced, their (statistical) estimation is more robust for a fixed data volume (compared to an MLP).
Shared weights: in convolutional neural networks, the filtering parameters of a neuron (for a given receptive field) are identical for all the other neurons of the same kernel (which process all the other receptive fields of the image). These settings (weight vector and bias) are defined in a "feature map".
Invariance to translation: as all the neurons of the same kernel (filter) are identical, the pattern detected by this kernel is independent of its spatial location in the image.
Together, these properties allow convolutional neural networks to achieve greater robustness in parameter estimation on learning problems since, for a fixed training corpus size, the amount of data per parameter is greater. Weight sharing also greatly reduces the number of free parameters to learn, and thus the memory requirements for running the network. The reduced memory footprint makes it possible to train larger networks, which are often more powerful.

NLP
Application in the automatic processing of natural language
(NLP)
Instead of image pixels, most NLP tasks have sentences or documents as inputs.
The idea for applying a CNN (or any neural network) is to transform words and documents into matrix form. Each row of the matrix corresponds to a token, usually a word, but it can also be a character. That is, each row is a vector that represents a word.
In general, these vectors are word embeddings (dense, low-dimensional representations) such as word2vec or GloVe.
For example, for a 20-word sentence using a 100-dimensional embedding, we would have a 20 × 100 matrix as input. This is our "image" for the CNN.
Convolutional filtering in image processing consists of dragging a window representing the feature over the image and calculating the convolution product between the feature and each portion of the scanned image. However, in NLP, we usually use filters that slide over complete rows of the matrix (the words). Thus, the "width" of the filters is generally the same as the width of the input matrix. The height, or region size, may vary, but sliding windows of 2 to 5 words at a time are typical.
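As an illustrative Keras sketch of this setup (the vocabulary size, filter count and sentiment-style output are assumptions of mine, not the book's):

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

vocab_size, seq_len, embed_dim = 5000, 20, 100   # 20-word sentences, 100-dim embeddings

model = models.Sequential([
    layers.Input(shape=(seq_len,)),
    layers.Embedding(vocab_size, embed_dim),              # each row of the 20x100 "image" is a word vector
    layers.Conv1D(64, kernel_size=3, activation="relu"),  # filter spans 3 words x the full embedding width
    layers.GlobalMaxPooling1D(),
    layers.Dense(1, activation="sigmoid"),                # e.g. positive vs negative sentiment
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Dummy data just to show the expected shapes
X = np.random.randint(0, vocab_size, size=(32, seq_len))
y = np.random.randint(0, 2, size=(32,))
model.fit(X, y, epochs=1, verbose=0)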
PART II: DEEP LEARNING IN PRACTICE (IN
JUPYTER NOTEBOOKS)
Python is a powerful computer language that is extremely effective
for internet development and web-based applications.
Why in the world would you bother with Python when there is already a long line of really good, old and well-functioning computer languages to choose from? As an example, let's mention:
Python is available for all well-known operating systems, including Linux, Mac and Windows.
Python programs are easy to move from one operating system to another.
Python is very easy to learn - also for beginners in the use of computer languages in general.
Python combines remarkable strength with a very clear syntax (read: a way of doing things).
In Python, you can learn object-oriented programming (programming
and use of windows, buttons and the like) without understanding
many of the complex details beforehand. Later you can then, if
desired, familiarize yourself with the technique behind the
construction.
Python is a beginner-friendly computer language that automatically
controls many of the complex details behind the scene. It allows you
to use the power of the overall things in your project without having
to dig in depth for every step. Here, too, let us anticipate the events a
bit and see the following example, which may also interest the
experienced Java, C ++ or Delphi programmer.
>>> from Tkinter import Label
>>> widget = Label ()
There should probably have been one more program line in the example shown; in this case it would be widget.pack(). The pack() function puts the object's components in place on the output unit (the screen). But in this case there is nothing to put in place, so the two program lines are sufficient. You can create the same window

with Entry:
>>> from Tkinter import Entry
>>> child = Entry ()
>>> child.pack ()
or with Text (and a variety of other objects):

>>> from Tkinter import Text


>>> child = Text ()
>>> child.pack ()

>>> from Tkinter import *
>>> child = Frame()
>>> child.pack(expand=YES, fill=BOTH)

Already notice the difference between two things. First, what you probably expected to be from Tkinter import Frame has become from Tkinter import * (here this makes it possible to use the constants as parameter values in pack). Second, the pack() call now takes two parameters (arguments). This gives the Frame heir, child, the opportunity to expand (with expand = YES), in both the horizontal and the vertical direction - in other words, to be able to fill the whole screen.
The Frame object is a container into which all other graphic objects, such as text boxes, buttons, labels and many others, can be inserted. There are none of them in this example, so Frame().pack(expand = YES, fill = BOTH) works the same way as when you click the leftmost button in the top right corner of the window: it makes the window collapse to its smallest possible size. expand = YES allows the frame (the container) to be expanded as needed, but there is no need for that here, since no other components are inserted into the container. If, later, one or more such objects are inserted into the frame (the Frame), it will be expanded as needed, as long as there is room for the objects in the given screen area. If not, the objects will simply not be displayed.

Figure 1-2. Example of windows
Of course, the two windows have the same open/close options etc.
that you know from, e.g., Internet browsers in Windows. This is what
object-oriented programming means in practice: all windows, buttons
and other common objects are pre-programmed, so you can use them
directly and, of course, customize them to your needs at the given
moment. In C++ you need to write many program lines to achieve the
same. It is also very easy to integrate Python and C++ with one
another.
One of the most prominent features of OOP (object-oriented
programming) is reuse. The three examples shown already demonstrate
that Python has excellent properties in this area, for the code for a
fully finished window cannot possibly be written in the 2-3 program
lines that each example contains; many times that amount of code is
needed. If Python did not possess the ability to reuse code, the very
large program needed to create the fully finished window would have
to be developed every time the window was to be used - that is, three
times just for the examples shown. The result would be huge programs,
almost unusable due to their enormous size. The code for the examples
shown, and for several others, is contained in the Widget class:
Label, Entry, Text and Frame are all subclasses of Widget.
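As a small illustration of this kind of reuse, here is a minimal sketch in the same Python 2 Tkinter style as the book's examples; the class name GreetingLabel is made up for the illustration, and the subclass inherits everything Label can do and only adds what is new:

from Tkinter import Label, Frame, BOTH, YES

class GreetingLabel(Label):
    """A Label that always greets; everything else is inherited from Label."""
    def __init__(self, master=None, name="world"):
        Label.__init__(self, master, text="Hello, " + name)

root = Frame()
root.pack(expand=YES, fill=BOTH)
GreetingLabel(root, name="Bornholm").pack()
root.mainloop()                      # start the event loop so the window stays open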
Python is the backbone of Jython. This means that a major or minor
part of a Jython program is, and will continue to be, Python. In
Jython, it is often easier to develop Java applets than it is in Java
itself. Applets can be used as integrated parts of the HTML, XHTML
and XML languages for websites.
In Linux, people often refer to the entire software package, with all
the programs mentioned above, as Linux. Strictly speaking, that is
wrong: Linux is the kernel (the central, controlling program). In
Python, there is also a core. It is small, as it should be. Around
this controlling unit there is a very large library. That means that
much of what you need has already been developed and tested for you.
Your task will be to write the code that combines the components of
the library and to develop new properties as needed, but you can also
greatly expand Python yourself with your own functions (methods),
classes, modules, etc.
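Extending Python with a module of your own is nothing more than saving functions in a file and importing it; a minimal sketch, where the file name greetings.py and the function are invented for the illustration:

# File: greetings.py - a tiny home-made module
def greet(name):
    """Return a greeting for the given name."""
    return "Hello, " + name

Once the file is saved in the working directory, it can be used like any library module:

>>> import greetings
>>> greetings.greet("Stephen")
'Hello, Stephen'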
WHAT DOES PYTHON COST AND WHERE DO I GET IT?
Python is free and available for the vast majority of operating
systems. If you do not have the latest version, you can always
download it at http://www.python.org . Installing Python is very easy
if the language is not already installed on your computer; however,
it usually is if you use SuSE, RedHat or Mandrake.
Python is being developed very actively. In just the roughly three
months it has taken to write this book, there have been two full
updates of the language (the released versions are 2.2 and 2.3). And
not only that: there are also updates to the latter without changing
the version name. Right now, October 16, 2003, version 2.3.2 is the
current one. In version 2.4, there should be quite a lot of changes,
so Python can do even more (e.g. processing 64-bit character codes).
Guide to the Python documentation
If you are completely new to programming, it can be difficult to
understand how the documentation of a programming language is
organized. But once you can figure it out, it is often much easier to
get help in the documentation of the language than to ask for help
about a specific function on a mailing list.
The documentation for Python can be found online at
http://www.python.org/doc/ . Here you can also find guides and
HowTos for various topics. The documentation is divided into 7
categories: module overview, tutorial, library reference, Macintosh
reference, language reference, language extensions and a Python/C
API. Explanation of the categories:

Module overview - An overview of all the modules that come with a
release of Python, with documentation for each module.
Tutorial - A comprehensive tutorial on the most basic use of Python.
A good place to start if you can already program but just don't know
Python.
Library reference - A review of the most used libraries in Python,
with more thorough examples of use than you find in the module
overview.
Macintosh reference - A review of the libraries specific to
Macintosh.
Language reference - A review of the language's usage and structure
(syntax, functions, classes, etc.)
Extensions and embedding - A guide on how to extend Python with
C/C++, embed Python in other applications, etc.
Python/C API - A review of the C API for Python that lets C
programmers use Python in their applications.
GUI and IDLE - what are they?
GUI is an abbreviation of Graphical User Interface and denotes the
programs that create the visible windows, buttons, menus, etc. They
allow the user, by clicking and so on, to call the functions
associated with those elements, the purpose of which can be to send a
text to a text box, retrieve a file or send one to external storage
such as a hard drive, draw graphics and much more. The code for the
GUI components is found, among other places, in Python's Tk library.
That the windows come from the Tk library can be seen by the word Tk
at the top of the basic window.

>>> from Tkinter import Label
>>> childAfLabel = Label()

Since, as mentioned, there is a basic model of the GUI components in
Tk, the module Tkinter can ask Tk to make the code for the
construction of the basic window available for the construction of a
copy of Label(), which in this case I have chosen to call
childAfLabel, because like other children it has an origin from which
it inherits: the mother, a term that is also used in other books on
computer languages. In English, the mother is called the parent. In
Python, the terms main class or base class are also used.
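The parent/child relationship is ordinary class inheritance, so a minimal sketch with two invented classes may make the terminology clearer:

class Parent:
    def greeting(self):
        return "Hello from the parent"

class Child(Parent):           # Child inherits everything Parent has
    pass

print(Child().greeting())      # the child uses the method it inherited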

Immediate mode
From the old DOS days you may know the prompt C:\>. In Python, >>> is
used as the corresponding ready message: a message that Python is in
immediate, or as it is also called, interactive mode. It is from here
that you start your Python programming, and it is certainly also
where the mature Python programmer returns to, for example when
testing new pieces of a program.

Comments
When you program, you need from time to time to insert comments -
parts of the program that you and others can read, but that the
computer language skips. In Python, the hash sign # marks a comment.
A comment begins at the hash sign and continues to the end of the
program line. Ex. 1:
>>> # Python runs over comments
Comments are for you and not for the computer.
It is also possible to print the hash sign if it is part of a text
string. Ex. 2:
>>> print "Here we print a hash sign #"
Here we print a hash sign #
In this example, there were NO comments. The hash sign is here part
of a text string, and such a string Python should respond to, not
bypass. Note that the hash sign is printed here, which it was not in
the example where it served its proper purpose.

Comments in triple quotes

Strings can also be enclosed in 3 quotes (single or double). Ex. 1:

>>> print """
... Name: The cliff island Bornholm
... Population: 44,500
... """

Name: The cliff island Bornholm
Population: 44,500

Ex. 2:

>>> def Function():
...     """Inserted text"""
...     return "Normally 1 character set is used"
...
>>> Function()
'Normally 1 character set is used'

Ex. 3: Normal text content for everyday printing.

Ex. 4:

>>> def inactive():
...     """Be passive, but document it.
...     No, nothing really happens."""
...     pass
...
>>> inactive()

Note that three quotes (single or double, as you wish) make Python
skip everything between the first (opening) quotes and the last ones,
which you can also see from the fact that the line beginning with
"No" does not need the otherwise required block indentation.
Arithmetic operators
Addition + ex: 3 + 4 = 7
Subtraction - ex: 3 - 4 = -1
Multiplication * ex: 3 * 4 = 12
Exponentiation ** ex: 3 ** 4 = 81
Division / ex: 3 / 4 = 0 (integer division rounds down to the nearest whole number)
Division // ex: 3 // 4 = 0 (rounds down to the nearest whole number)
Modulus % ex: 3 % 4 = 3 (the remainder of the division)
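A quick interactive session confirms the table above (in the Python 2 style used in this chapter, where / between two integers rounds down):

>>> 3 + 4
7
>>> 3 - 4
-1
>>> 3 * 4
12
>>> 3 ** 4
81
>>> 3 / 4
0
>>> 3 // 4
0
>>> 3 % 4
3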
DANGER: Beware of integer divisions

>>> 25 / 24
1

25 divided by 24 should give 1.0416666666666667, but Python
automatically rounds down to the nearest integer, which in this case
is 1.
>>> 25 / 24.0
1.0416666666666667
To get the decimal part returned (printed), at least one of the
numbers must be a decimal number (a floating point number).
In Python, there is full support for floating point numbers.
Operators with operands of mixed type convert the integer operand
into a floating point number (in English: float):

>>> 3 * 3.75 / 1.5
7.5
>>> 7.0 / 2
3.5
>>> print "simple example " * 3
simple example simple example simple example

In traditional computer languages, this cannot be done so easily.
There you would need something like:

print "simple example " + "simple example " + "simple example "

Variables
A variable is a name that points to an address in the computer's
memory (RAM) where data - for example numbers or text strings - is
placed. The variable must be given a name, which must not contain the
Danish/Norwegian special characters ÆØÅ and æøå (although some builds
of version 2.3 have been able to use those characters).

Declaration of variables
When the programmers had to make calculations on Dask, Denmark's
first computer, they had to think in zeros and ones. That is very
inconvenient for humans, so the numbers were replaced by words -
initially 3-letter words. It was a great relief for the programmers.
Today, variable and other names can usually have up to 256 letters or
digits; the first character of a variable name must be a letter and
must not be one of the Danish/Norwegian special characters æøå and
ÆØÅ. In Python, the first character may also be an underscore, but
that can be a really bad idea, since predefined names etc. can easily
be overwritten. In Python, variables need not be declared; the
declaration is automatic. BUT NOTE: Python is case-sensitive, i.e.
the variables a and A are two different variables and will be
interpreted as such. Of course, the same applies to all other names
(of lists, tuples, etc.)
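A short session illustrates the case-sensitivity just mentioned (the variable names are of course arbitrary):

>>> a = 1
>>> A = 2
>>> a
1
>>> A
2
>>> _hidden = 3        # legal, but easy to confuse with predefined names
>>> _hidden
3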

>>> # First, x is assigned zero, then y is assigned the value of x
>>> # and z the value of y.
>>> x = y = z = 0
>>> x
0
>>> y
0
>>> z
0
NOTE: In Python, use == for "equal to" and = for assignment. Thus,
there is a marked difference between x == y and x = y.
Variables can be overwritten
Global variables can be overwritten:

>>> e = 1000          # global variable
>>> def nix():
...     global e
...     e = 25
...
>>> e
1000
>>> nix()
>>> e
25
Global names can be several kinds of things, so beware:

>>> list = [1,2,3,4,5]
>>> def function(l):
...     global list
...     list.append(12345)
...
>>> list
[1, 2, 3, 4, 5]
>>> function(list)
>>> list
[1, 2, 3, 4, 5, 12345]
>>>
Initially, the list [1,2,3,4,5] is created. The call shows that the
global statement makes the function work on the list defined outside
the function, and that the function then appends a further element
(12345) to it. In other words, a name declared global inside a
function refers to the variable defined outside the function.

Variable types
The type of a variable can be found with type:
>>> a = 25
>>> type(a)
<type 'int'>
>>> a = 12.4
>>> type(a)
<type 'float'>
>>> a = "rowan"
>>> type(a)
<type 'str'>

Variable addresses
The address of a variable in RAM can be found with id:
>>> a = 12.4
>>> id (a)
136095956
>>> a = 23
>>> id (a)
135591272
>>> a = "rowan"
>>> id (a)
1078892096

The following printout shows that the same letter gives the same
address (the string spells "Dette er en streng", Danish for "This is
a string"):
>>> s = "Dette er en streng"
>>> for i in range(0, 18):
...     print s[i], id(s[i])
...
D 1078892160
e 1076630176
t 1076645984
t 1076645984
e 1076630176
1078892192
e 1076630176
r 1076518048
1078892192
e 1076630176
n 1076614816
1078892192
s 1076613536
t 1076645984
r 1076518048
e 1076630176
n 1076614816
g 1076690208

>>> s = "AABBCCaabbcc"
>>> for i in range (0, len (s)):
... print s [i], id (s [i])
...
A 1078892448
A 1078892448
B 1078892384
B 1078892384
C 1078892608
C 1078892608
a 1076679360
a 1076679360
b 1076679584
b 1076679584
c 1076690176
c 1076690176

Variables in Python
Python is a fairly new computer language, and this shows in, among
other things, the characteristics and use of variables. In
traditional computer languages such as C, C++, Pascal, Delphi, Visual
Basic, Java and others, variables must be what is called strongly
typed, i.e. they must be defined to contain quite specific types of
values, for example integers or text strings, and only the types thus
defined may be stored in those variables. This is not the case with
Python. Here, the content of a variable is automatically moved to
other addresses in memory if a redefinition makes it necessary, among
other reasons because not all kinds of values take up the same amount
of space in memory.
Ex. 1:
>>> # Here 2 things happen: 1: the variable a is declared and
>>> # 2: it is also assigned the value "Text string", i.e. it is set
>>> # to point to the addresses in the computer's memory where
>>> # "Text string" is stored.
>>> a = "Text string"
>>> a
'Text string'
>>> # Now the same variable is assigned an integer as its value.
>>> a = 25
>>> print a
25

Ex. 2:
>>> a = "Text string " + str(25)
>>> a
'Text string 25'

Python is not strongly typed like C, Pascal and probably the majority
of other computer languages. That a language is strongly typed means
that a variable can only contain the type of value it was created
(defined) to contain. In short, C, Pascal and the other languages
keep track of which addresses in the computer's RAM the variables are
allowed to point to.
Boolean expressions / variables
Boolean expressions / variables can take / be assigned 2 values and
only 2. The 2 values are true and false. In all computer languages,
the 2 values ought to be represented by 0 for false and 1 for true,
but unfortunately this is not always the case. Fortunately, Python
uses the logical convention: 0 for false and 1 for true, as the
following example demonstrates:

Ex. 1:
>>> 2 == 3
0
>>> 2 == 2
1
Expression & Result

true and true & true
true and false & false
false and true & false
false and false & false
not true & false
not false & true
true or true & true
true or false & true
false or true & true
false or false & false

Ex. 2: We begin by letting a variable point to an address in memory
where the value 5 is stored. For convenience, we choose to give the
variable the bland name a. That was a rather laborious way to
describe what happened; once you know what happens, and so that the
whole thing does not become absurd and mechanical, we can simply say
that we assign the variable a the value 5, the variable b the value 7
and the variable c the value 9. One assigns a variable a value, e.g.
a = 25 or b = 15 + 10, with the assignment sign =, and one tests
whether the value a variable points to satisfies a condition, e.g.
contains the same numeric value, with a comparison such as a == b.

Now we can go ahead and test the truth values of some expressions:
>>> a = 5
>>> b = 7
>>> c = 9
>>> a == 5
1
>>> a == 7
0
>>> b == 7
1
>>> b == 5
0
>>> a == 6 and b == 7
0
>>> a == 7 and b == 7
0
>>> a == 5 and b == 7
1
>>> a == 7 or b == 7
1
>>> a == 7 or b == 6
0
>>> not (a == 7 and b == 6)
1

HOW TO WRITE INTERACTIVELY

In interactive mode, numbers can be entered with 2 different
functions: raw_input, which receives numbers as if they were text,
and input, which receives numbers as numbers. With the former, the
received value will often have to be converted into a number;
otherwise it may give unfortunate results, e.g.:
>>> numbers = raw_input("Write a number: ")
Write a number: 730
>>> numbers * 4
'730730730730'
>>>
It goes much better with:
>>> number = input("Write a number: ")
Write a number: 730
>>> number
730

integer = raw_input("Write an integer: ")
integer = int(integer)
if integer < 0:
    print "%d is less than zero" % integer
else:
    print "%d is greater than or equal to zero" % integer

>>> # From text to number
>>> integer = raw_input("Write an integer: ")
Write an integer: 25
>>> "The entered number was " + integer
'The entered number was 25'
>>> # From number to text
>>> integer = input("Write an integer: ")
Write an integer: 25
>>> # ... and converted to a string
>>> "The entered number was " + str(integer)
'The entered number was 25'
Scripts
If and when you need to develop larger projects in Python, you will
need to save your Python code as individual files that you can use
over and over again. There are several text editors for Linux to
choose from. Using an editor is not as difficult as it may seem at
first, which you can easily see for yourself: if you have written an
example in immediate mode, mark the entire text and copy it all into
the editor you want to use. When you have the code inside the editor,
let the editor do a find-and-replace and remove all
>>>
prompts. If you have retained the code with the associated block
indentations etc., then what you now have is a script (Python's name
for a program). The script can be saved under a legal name with the
type designation py, e.g. mitScript.py. Then you can run the program
just as often as you want.
Moreover, it is very easy to expand and change. This is done by
retrieving it in the editor again and writing on. Of course, you are
not supposed to start your projects in interactive mode each time and
then move them out into the editor and continue there. Larger
projects are, as indicated, troublesome to handle in interactive
mode; for those you write it all in the editor and save it there.
In Linux, Python scripts can be made directly executable by inserting

#!/usr/bin/env python

as the first line of the program. "#!" (hash sign and exclamation
mark) MUST be the first two characters in the script. The script can
be given executable mode, or permission, with the command:

$ chmod +x myscript.py
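Putting the pieces together, a complete executable script might look like this minimal sketch (the file name and the message are invented for the example):

#!/usr/bin/env python
# File: hello.py - run with ./hello.py after chmod +x hello.py
print("Hello from a Python script")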

It is possible to use character encodings other than ASCII in Python
source texts. The most reliable way to do this is to insert another
special comment line immediately after the line already shown, of the
form # -*- coding: encoding-name -*- , for example:

# -*- coding: iso-8859-1 -*-

If you want to run an extra startup file from the current directory,
you can extend the global startup file with code such as:

if os.path.isfile('.pythonrc.py'):
    execfile('.pythonrc.py')

If you want to use the startup file in a program, the program must
further contain the following code (adapted to your program as
needed):

import os
filename = os.environ.get('PYTHONSTARTUP')
if filename and os.path.isfile(filename):
    execfile(filename)

If, elif and else

Here is an example of how one can test for different conditions and
get one's functions to do different things.
>>> def findMax(a, b, c):
...     max = a
...     if b > max:
...         max = b
...     if c > max:
...         max = c
...     return max
...
>>> findMax(37, 2 * 19, 43)
43

>>> number = 17 ** 4
>>> if number < 50000:
...     print "The number is less than 50000"
... else:
...     print "The number is", number
...
The number is 83521

The if statement is perhaps the best-known statement of them all:

>>> x = int(raw_input("Write an integer: "))
>>> if x < 0:
...     x = 0
...     print 'Negative number changed to zero'
... elif x == 0:
...     print 'Zero'
... elif x == 1:
...     print 'One'
... else:
...     print 'Greater than 1'
...
There may be zero or more elif parts, and the else part can be
included or omitted as desired. The keyword "elif" is an abbreviation
of "else if" and is practical because it makes it possible to avoid
excessive indentation (deeply nested blocks take up an enormous
amount of width).

>>> import random
>>> f1 = f2 = f3 = f4 = f5 = f6 = 0
>>> for throw in range(1, 10001):        # 10,000 dice throws
...     outcome = random.randrange(1, 7)
...     if outcome == 1:
...         f1 += 1
...     elif outcome == 2:
...         f2 += 1
...     elif outcome == 3:
...         f3 += 1
...     elif outcome == 4:
...         f4 += 1
...     elif outcome == 5:
...         f5 += 1
...     else:
...         f6 += 1
...
print "Outcome:"
print "Number of ones:", f1
print "Number of twos:", f2
print "Number of threes:", f3
print "Number of fours:", f4
print "Number of fives:", f5
print "Number of sixes:", f6

A dice game with two dice (the player wins on 7 or 11 in the first
throw, loses on 2, 3 or 12, and otherwise keeps throwing until either
the points are repeated or a 7 is thrown):

import random

def throwWith2():
    dice1 = random.randrange(1, 7)
    dice2 = random.randrange(1, 7)
    sumTotal = dice1 + dice2
    print "Player threw %d + %d = %d" % (dice1, dice2, sumTotal)
    return sumTotal

sum = throwWith2()                  # first throw
if sum == 7 or sum == 11:           # won on the first throw
    gameStatus = "WINS"
elif sum == 2 or sum == 3 or sum == 12:
    gameStatus = "LOSES"
else:                               # remembers the points
    gameStatus = "CONTINUE"
    myPoints = sum
    print "Player's points are", myPoints
while gameStatus == "CONTINUE":     # continue the game
    sum = throwWith2()
    if sum == myPoints:             # won by throwing the points again
        gameStatus = "WINS"
    elif sum == 7:                  # lost by throwing the sum 7
        gameStatus = "LOSES"
if gameStatus == "WINS":
    print "The player won"
else:
    print "The player lost"

Break and continue

>>> for i in range(1, 101):
...     print i,
...     if i == 6:
...         break
...
1 2 3 4 5 6
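The section heading also mentions continue, which skips to the next round of the loop instead of leaving it; a small sketch in the same style:

>>> for i in range(1, 11):
...     if i % 2 == 0:        # skip the even numbers
...         continue
...     print i,
...
1 3 5 7 9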

Function definition and call

During the development of a program, an appropriate way to divide the
program may be needed, so that we only have to deal with part of the
program at a time. We may want to put the more trivial routines into
a corner of the program so that they do not disturb the rest of the
program writing. Here, functions - or, as they are also often called,
methods - are a familiar help in most computer languages, if not all
of the present ones. In the original BASIC, the concept was not
known, which is why "GOTO" was used in plenty. Pascal and Delphi
have, in addition to functions, a relatively similar kind of "package
collector" called procedures. Depending on the computer language, a
function may be required to return a value; in languages such as
Pascal, Delphi, Visual Basic and Python, a function can return a
value but does not necessarily need to. A Python function that
returns nothing therefore corresponds to what C++ programmers would
probably call a procedure. But let us now write a Python function of
each kind - first a function that returns a value (i.e. the "real"
function).
When you are finished with the definition, press the Enter key twice,
partly to tell Python that the definition is complete and partly to
get back to the interactive mode prompt
>>>
Function definition:
>>> def function():
...     a = 4 * 5
...     return a
...
Function call:
>>> function()
20
As you can see above, the keyword def is used to tell Python that
what comes next is a function definition. After the keyword follows
the function name, which can be any legal name, as for variables.
After the function name follows a parameter list, which, as here, can
be empty; it is surrounded by round brackets and is followed by the
colon that you will forget many times before it becomes routine - it
tells Python that what comes next is an associated block (here the
function body). Python responds by moving the following program lines
one tab width to the right. If you copy code from, for example, an
editor into Python, it is necessary to get said indentations along,
if need be by inserting them yourself with the keyboard's tab key.
There are often several blocks in one function; if so, the program
lines belonging to the inner block must be moved another tab stop to
the right.
The return value can also be included in, for example, another
calculation:
>>> print function() * 7
140
Function with a parameter (sometimes called an argument):

>>> def f(a):
...     return a * 1.25
...
>>> f(75)
93.75

Function with more than one parameter:

>>> def functionName(s1, s2):
...     return s1 + s2

Function call:

>>> functionName("string1 ", "string2.")
'string1 string2.'

Same function, but with numerical values:

>>> functionName(10, 20.5)
30.5

Custom function calling another custom function:

>>> def f1():
...     print f2()
...
>>> def f2():
...     print "Function f2 has been called."
...
>>> f1()
Function f2 has been called.
None

Custom function calling a predefined function:

>>> import sys
>>> def quit():
...     sys.exit()
...
>>> quit()
jabot@linux:~>

>>> m = input("Write this month's number: ")
Write this month's number: 12
>>> monthNames = ["January", "February", "March", "April", "May", "June", "July",
...               "August", "September", "October", "November", "December"]
>>> if 1 <= m <= 12:
...     print "This month's name is", monthNames[m - 1]
...
This month's name is December


A list can be received as an argument in a function:

>>> list = []                    # create empty global list
>>> def ul(l):                   # the argument l is a local name here
...     list.append(l)           # expands the global list
...
>>> ul(1)
>>> ul("Hasle")
>>> ul(2)
>>> ul("Nyker")

Driving result:
>>> list
[1, 'Hasle', 2, 'Nyker']

>>> list = [1, 2, 3, 4]
>>> def udvL(l=[]):              # the argument l is a local name here
...     for i in range(5, 11):
...         l.append(i)
...         print l
...

Driving result:
>>> udvL(list)
[1, 2, 3, 4, 5]
[1, 2, 3, 4, 5, 6]
[1, 2, 3, 4, 5, 6, 7]
[1, 2, 3, 4, 5, 6, 7, 8]
[1, 2, 3, 4, 5, 6, 7, 8, 9]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>>

Functions can have default parameter values:

def kR(length=1, width=1, height=1):
    return length * width * height

print "Pre-selected box volume:", kR()
print "Volume of box with length 10 is:", kR(10)
print "Volume of box with height 10 is:", kR(1, 1, 10)
print "Volume of box with length and height 10 is:", kR(10, 1, 10)
print "Volume of box of length 12, width 4 and height 10 is:", kR(12, 4, 10)

Find prime numbers:
>>> for i in range(2, 10):
...     for j in range(2, i):
...         if i % j == 0:
...             print i, 'equals', j, '*', i / j
...             break
...     else:
...         print i, 'is a prime number'
...
2 is a prime number
3 is a prime number
4 equals 2 * 2
5 is a prime number
6 equals 2 * 3
7 is a prime number
8 equals 2 * 4
9 equals 3 * 3
>>>

Calculate factorial by recursion:
def faculty(number):
    if number <= 1:
        return 1
    else:
        return number * faculty(number - 1)    # recursive call

for i in range(1, 11):
    print "%2d! = %d" % (i, faculty(i))

jabot@linux:~> python fakultet.py
 1! = 1
 2! = 2
 3! = 6
 4! = 24
 5! = 120
 6! = 720
 7! = 5040
 8! = 40320
 9! = 362880
10! = 3628800

>>> def F ():


>>> F ()
...
>>> Extract
23
>>>

def OK(prompt, round=4, answer='Answer yes or no!'):
    while True:
        ok = raw_input(prompt)
        if ok in ('YES', 'Yes', 'yes'): return 1
        if ok in ('NO', 'No', 'no'): return 0
        round = round - 1
        if round < 0: raise IOError, 'difficult user'
        print answer

It can be called with, for example:
OK('Do you want to exit? ')
and:
OK('Is the file closed? ', 2)

>>> i = 5
>>> def f(arg=i):
...     print arg
...
>>> f()
5
>>>

>>> def f(a, L=[]):
...     L.append(a)
...     return L
...
A driving result:
>>> for i in range(10):
...     print f(i)
...
[0]
[0, 1]
[0, 1, 2]
[0, 1, 2, 3]
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4, 5]
[0, 1, 2, 3, 4, 5, 6]
[0, 1, 2, 3, 4, 5, 6, 7]
[0, 1, 2, 3, 4, 5, 6, 7, 8]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>>

def f(a, L=None):
    if L is None:
        L = []
    L.append(a)
    return L

>>> for i in range(10):
...     print f(i),
...
[0] [1] [2] [3] [4] [5] [6] [7] [8] [9]
A function without an explicit return value returns None:
>>> def f():
...     return
...
>>> print f()
None
>>>

Keyword arguments. You can choose to give your function several
predefined and pre-assigned keyword parameters. Such keywords should
not be confused with the variables defined elsewhere in Python. Let
us see an example:

>>> def function(computer, quality, operatingSystem="Linux", computerLanguage="Python"):
...     print "A", quality, computer, "uses", operatingSystem
...
>>> function("PC", "well equipped")
A well equipped PC uses Linux
>>>
The arguments computer and quality have no default values, so they
must be assigned a value in the function call. The keyword arguments
need not be, as they already have values assigned. The same goes for
the keyword computerLanguage.
Like other variables, the variable quality can receive an empty or
unexpected value; otherwise put, you may get slightly illogical
results like:

>>> function(0, 0)
A 0 0 uses Linux
>>>
>>> function("PC", "")
A  PC uses Linux
>>>
>>> function("", "well equipped")
A well equipped  uses Linux
>>>
>>> function("PC", "fairly", operatingSystem="Windows")
A fairly PC uses Windows
>>>

When a function's last parameter is of the form **name (with 2
leading stars), it can receive a dictionary whose keys are not in the
parameter list. It can be combined with a formal parameter of the
form *name (with 1 leading star). *name must come before **name:
>>> def f(variant, *arguments, **keywords):
...     pass
...
>>>
Function calls of this type are widely used in Python, so it is
important to understand the effect. It is most practical to stick to
English names, as these are the ones you will normally see:

>>> def f(programName, *arguments, **keywords):
...     return programName, arguments, keywords
...
>>> f("withProgram.py")
('withProgram.py', (), {})
>>>

As you can see, a tuple containing 1: the argument, 2: a tuple and 3:
a dictionary is returned. If you want to read the external file
withProgram.py, you can use the following call:
>>> f("withProgram.py", "r")
('withProgram.py', ('r',), {})
>>>

>>> f("withProgram.py", "r", "r2", "r3")
('withProgram.py', ('r', 'r2', 'r3'), {})
>>>
>>> f("withProgram.py", "r", "r2", "r3", fiscalYear=2003)
('withProgram.py', ('r', 'r2', 'r3'), {'fiscalYear': 2003})
>>>

>>> f("withProgram.py", "r", "r2", "r3", accountingMonth="January")
('withProgram.py', ('r', 'r2', 'r3'), {'accountingMonth': 'January'})
>>>
>>> f("withProgram.py", "r", "r2", "r3", month="February")
('withProgram.py', ('r', 'r2', 'r3'), {'month': 'February'})
>>>
>>> f("withProgram.py", "r", "r2", "r3", md="January", md2="February")
('withProgram.py', ('r', 'r2', 'r3'), {'md2': 'February', 'md': 'January'})
>>>

The argument list does not have to be as long as a school essay:
>>> def f(v, *a, **k):
...     return v, a, k
...
>>> f("Vareart", "bike", name="Christiania")
('Vareart', ('bike',), {'name': 'Christiania'})
>>>
>>> f("Vareart", "bike", name2="Christiania")
('Vareart', ('bike',), {'name2': 'Christiania'})
>>>
>>> def f(v, *a, **k):
...     print a, k
...
>>> f("Vareart", "bike", name="Christiania")
('bike',) {'name': 'Christiania'}

>>> def f(a, *b, **c):
...     c = {1: "one", 2: "two", 3: "three"}
...     return a, b, c
...
>>> f(1, 2)
(1, (2,), {1: 'one', 2: 'two', 3: 'three'})
>>>

Note: the dictionary requires the curly brackets:
>>> def f(a, *b, **c):
...     return a, b, c
...
>>> f(1, "four", "five", "six", 7: "seven", 8: "eight", 9: "nine")
  File "<stdin>", line 1
SyntaxError: invalid syntax
>>> f(1, "four", "five", "six", {7: "seven", 8: "eight", 9: "nine"})
(1, ('four', 'five', 'six', {8: 'eight', 9: 'nine', 7: 'seven'}), {})
>>>

>>> def f(a, *b, **c):
...     for arg in b:
...         print arg
...
>>> f(1, "two", "three", "four")
two
three
four
>>>

The * form can also be used the other way round, to unpack a sequence
into arguments:
>>> args = [1, 10]
>>> range(*args)
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>>
>>> args = [3, 6]
>>> range(*args)
[3, 4, 5]

Lambda creates a small anonymous function behind the scenes:

>>> def fraud(n):
...     return lambda x: x + n
...
>>> f = fraud(1.25)
>>> f(100)
101.25
>>>
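Lambdas are most often used where a small throw-away function is expected as an argument; a brief sketch with invented data, in the Python 2 used throughout this book (where map and filter return lists):

>>> prices = [100, 250, 75]
>>> map(lambda x: x * 1.25, prices)       # add 25 percent to every price
[125.0, 312.5, 93.75]
>>> filter(lambda x: x > 100, prices)     # keep only the prices above 100
[250]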

Lists
A list is a variable with zero or more related places. A list is
recognized by its square brackets. If you think of them as taken
away, you have a tuple, which is recognized by its commas.

The list and its options:

>>> # Create an empty list
>>> list = []
>>> # expand the list with an additional item
>>> list.append("Ypnasted")
>>> # The extension can also be done as follows:
>>> list[len(list):] = "Teglkaas"
>>> list[len(list):] = ["Helligpeder"]
>>> list
['Ypnasted', 'T', 'e', 'g', 'l', 'k', 'a', 'a', 's', 'Helligpeder']
>>> # The extension can also be done as follows:
>>> list.extend(["Vang", "Hammeren"])
>>> list
['Ypnasted', 'T', 'e', 'g', 'l', 'k', 'a', 'a', 's', 'Helligpeder', 'Vang', 'Hammeren']
>>> # The list can be reversed:
>>> list.reverse()
>>> list
['Hammeren', 'Vang', 'Helligpeder', 's', 'a', 'a', 'k', 'l', 'g', 'e', 'T', 'Ypnasted']
>>> # removes the last item from the list:
>>> list.pop()
'Ypnasted'
>>> list
['Hammeren', 'Vang', 'Helligpeder', 's', 'a', 'a', 'k', 'l', 'g', 'e', 'T']
>>> for i in range(0, 8):
...     list.pop()
...
'T'
'e'
'g'
'l'
'k'
'a'
'a'
's'
>>> list
['Hammeren', 'Vang', 'Helligpeder']
>>> # sorts the list
>>> list.sort()
>>> list
['Hammeren', 'Helligpeder', 'Vang']
>>> # removes a specified item from the list:
>>> list.remove("Helligpeder")
>>> list
['Hammeren', 'Vang']
>>> # returns the given item's index (place) in the list
>>> list.index("Hammeren")
0
>>> list.index("Vang")
1
>>> # finds the list's biggest item:
>>> max(list)
'Vang'
>>> # finds the list's smallest item:
>>> min(list)
'Hammeren'
>>> # pairs the list's items with another sequence
>>> zip(list, [1, 2])
[('Hammeren', 1), ('Vang', 2)]
>>> zip(list, [8, 7])
[('Hammeren', 8), ('Vang', 7)]
>>> # repeats the list:
>>> lists = list * 2
>>> lists
['Hammeren', 'Vang', 'Hammeren', 'Vang']
>>> # repeats the paired list:
>>> lists = zip(list, [1, 2]) * 2
>>> lists
[('Hammeren', 1), ('Vang', 2), ('Hammeren', 1), ('Vang', 2)]
>>>
Returning larger or smaller parts of a list is called slicing.
Slicing out a given part of a list is done by inserting start and end
indexes in the square brackets known from the list:
>>> l = [1, 2, 3, 3, 4, 5, 6, 7, 8, 9]
>>> print l[0]        # return the list's first item
1
>>> print l[0:4]      # return the list's first 4 elements
[1, 2, 3, 3]
>>> print l[3:6]      # return the list's 4th to 6th element
[3, 4, 5]
>>> print l[-4]       # return the list's 4th item from the end
6
>>> print l[-1]       # return the list's last item
9
>>>

# Create a list with the even numbers <= 20
# Search the list for an integer
list = range(0, 21, 2)
# list content: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
print list
shouldFind = int(raw_input("Enter integer <= 20: "))
if shouldFind in list:
    print "Found at index:", list.index(shouldFind)
else:
    print "The element was not found"

Enter integer <= 20: 12
Found at index: 6
Enter integer <= 20: 13
The element was not found
The list and the tuple have several similarities:
>>> l = []
>>> t = ()
>>> t = 1, 2, 3, 5
>>> t
(1, 2, 3, 5)
>>> l = t
>>> l
(1, 2, 3, 5)
>>> l[1]
2
>>> t[1]
2
>>> l = [6, 7, 8, 9]
>>> l
[6, 7, 8, 9]
>>> t = l
>>> t
[6, 7, 8, 9]

list = []                           # create empty list
# insert items in the list
for index in range(1, 11):
    list += [index]

print "The contents of the list:", list

print                               # inserts an empty line

for element in list:
    print element,

print

# List access via index (place number)
print "Select items according to their index:"
print "List content:"

for i in range(len(list)):
    print "%6d %3d" % (i, list[i])

print "Updating list items ..."
print "List content before the update:", list
list[0] = -100
list[-3] = "Bornholmers"
print "List content after the update:", list

Driving result:
jabot@linux:~> python applied_liste.py
The contents of the list: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

1 2 3 4 5 6 7 8 9 10

Select items according to their index:
List content:
     0   1
     1   2
     2   3
     3   4
     4   5
     5   6
     6   7
     7   8
     8   9
     9  10
Updating list items ...
List content before the update: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
List content after the update: [-100, 2, 3, 4, 5, 6, 7, 'Bornholmers', 9, 10]

list = []                           # creates an empty list

# insert 10 integers via user entries
print "Write 10 integers:"

for i in range(10):
    newElement = int(raw_input("Write whole number %d: " % (i + 1)))
    list += [newElement]

Lists in a list:
list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

print "The items in the list:"
for row in list:
    for element in row:
        print element,
    print

Driving result:
python row.py
The items in the list:
1 2 3
4 5 6
7 8 9

>>> list = range(11)
>>> list
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Lists can be nested (contain other lists), e.g.:
>>> list = [[]]
>>> list
[[]]
>>> list * 3
[[], [], []]

>>> list_1 = [2, 3]
>>> list_2 = [1, list_1, 4]
>>> len(list_2)
3
>>> list_2[1]
[2, 3]
>>> list_2[1][0]
2
>>> list_2[1].append('extrapost')
>>> list_2
[1, [2, 3, 'extrapost'], 4]
>>> list_1
[2, 3, 'extrapost']
>>> stack = [3, 4, 5]
>>> stack.append(6)
>>> stack.append(7)
>>> stack
[3, 4, 5, 6, 7]
>>> stack.pop()
7
>>> stack
[3, 4, 5, 6]
>>> stack.pop()
6
>>> stack.pop()
5
>>> stack
[3, 4]

>>> list = ["Ugleenge", "Sæne", "Bækkely", "Stampen"]
>>> for i, v in enumerate(list):
...     print i, v
...
0 Ugleenge
1 Sæne
2 Bækkely
3 Stampen
>>>
There can also be a loop over two or more simultaneous sequences:
>>> person = ['Ole', '120', 'nature']
>>> answer = ['name:', 'age:', 'hobby:']
>>> for i, j in zip(person, answer):
...     print j, i
...
name: Ole
age: 120
hobby: nature
>>>

Lists can be compared:

>>> [1, 2, 3] < [1, 2, 4]
True
>>>
>>> [1, 2, 3, 4] < [1, 2, 4]
True
>>>
>>> [2, 3, 4] <> [2.0, 3.0, 4.0]
False
>>>
>>> [1, 2] < [1, 2, -1]
True
>>>
Lists used as stacks.
In the computer language FORTH, the term stack is used a lot. There
one usually compares a stack with a stack of plates; new plates are,
under normal conditions, placed on top of the stack. In FORTH you
usually work with the concept LIFO (last in, first out), which, as
the words say, means that you take the top plate - the last placed
element - first. Corresponding to LIFO, FORTH also uses FIFO. The
abbreviation FIFO stands for first in, first out. Transferred to the
plate stack, it means that the bottom plate in the stack is the one
the waiter uses first. I have briefly explained the principle from
FORTH because it thereby becomes easier to understand how Python can
do something similar. In Python, such a FIFO structure is called a
queue, and you can get both the FIFO and the LIFO behaviour with
append and pop. This can be done as follows:

>>> queue = ["Arnager", "Dueodde", "Nexø", "Svaneke"]
>>> queue.pop(0)
'Arnager'
>>> queue.pop(2)
'Svaneke'
>>> queue.append("Gudhjem")
>>> queue.pop(2)
'Gudhjem'
>>> queue.append("Tejn")
>>> queue.pop(2)
'Tejn'
>>>
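To make the difference explicit, here is a small sketch that uses the same list first as a LIFO stack (append/pop) and then as a FIFO queue (append/pop(0)); the town names are just sample data:

>>> towns = ["Arnager", "Dueodde", "Nexø"]
>>> towns.append("Svaneke")     # LIFO: the last one in ...
>>> towns.pop()                 # ... is the first one out
'Svaneke'
>>> towns.append("Svaneke")     # FIFO: append at the end ...
>>> towns.pop(0)                # ... while the first one in leaves first
'Arnager'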

Functional programming tools

In Python, three predefined functions are very useful for lists:
filter(), map() and reduce().

Syntax: filter(function, sequence) returns, if possible, a sequence
of the same type as the sequence parameter. The returned sequence
consists of the values for which function(element) is true.

Example: compute some primes:
Syntax: filter(function, sequence)
>>> def f(x): return x % 2 != 0 and x % 3 != 0
...
>>> filter(f, range(2, 25))
[5, 7, 11, 13, 17, 19, 23]

Syntax: map(function, sequence)

Calls function(element) for each of the sequence's elements and
returns a list containing the return values.
Example: calculate cube numbers:
>>> def cubic(x): return x * x * x
...
>>> map(cubic, range(1, 11))
[1, 8, 27, 64, 125, 216, 343, 512, 729, 1000]
>>>
Map can also take several sequences:
>>> sequence = range(8)
>>> def square(x): return x * x
...
>>> map(None, sequence, map(square, sequence))
[(0, 0), (1, 1), (2, 4), (3, 9), (4, 16), (5, 25), (6, 36), (7, 49)]
>>>
Syntax: reduce(function, sequence) combines the sequence's elements
two at a time, for example into a running sum:
>>> 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10
55
>>>
>>> def sumTotal(x, y): return x + y
...
>>> reduce(sumTotal, range(1, 11))
55
>>>
>>> reduce(sumTotal, range(11))
55
>>>
List comprehensions offer a compact way to build new lists:
>>> names = ['London ', 'Paris ', ' New York City', 'Gudhjem ']
>>> [city.strip() for city in names]
['London', 'Paris', 'New York City', 'Gudhjem']
>>>
>>> boys = ["Ole ", "Per ", " Sofus", "Nikolai "]
>>> [name.strip() for name in boys]
['Ole', 'Per', 'Sofus', 'Nikolai']
>>>
>>> straight = [2, 4, 6]
>>> [3 * x for x in straight]
[6, 12, 18]
>>>
>>> [3 * x for x in straight if x > 3]
[12, 18]
>>>
>>> [3 * x for x in straight if x < 2]
[]
>>>
>>> [[x, x ** 2] for x in straight]
[[2, 4], [4, 16], [6, 36]]
>>>
>>> straight = [2, 4, 6]
>>> mixed = [4, 3, -9]
>>> [x * y for x in straight for y in mixed]
[8, 6, -18, 16, 12, -36, 24, 18, -54]
>>>
>>> [x + y for x in straight for y in mixed]
[6, 5, -7, 8, 7, -5, 10, 9, -3]
>>>
>>> [straight[i] * mixed[i] for i in range(len(straight))]
[8, 12, -54]
>>>
>>> [str(round(355 / 113.0, i)) for i in range(1, 6)]
['3.1', '3.14', '3.142', '3.1416', '3.14159']
>>>
>>> [x ** 3 for x in range(5)]
[0, 1, 8, 27, 64]
>>>
Tuples consist of a number of values separated by commas. It is said
that tuples are recognized by their commas, while lists are
recognized by their square brackets. Ex.:

>>> t = 1, 2, 3, 4, 5
>>> t
(1, 2, 3, 4, 5)
>>>
That the separating commas matter is seen here:
>>> notice = "Danger ahead!"
>>> len(notice)
13
>>> notice = "Danger ahead!",
>>> len(notice)
1
>>>
In the first case, "Danger ahead!" is a text string; in the latter
case, it is a tuple with one element. The difference is the trailing
comma alone.
The tuple and its use, created from user data:
t = int(raw_input("Write the current hour: "))
m = int(raw_input("Write the current minute: "))
s = int(raw_input("Write the current second: "))

rightNow = t, m, s              # create a tuple

print "The tuple's content is:", rightNow
print "Number of seconds since midnight:", (rightNow[0] * 3600 + rightNow[1] * 60 + rightNow[2])

Driving result:
jabot@linux:~> python applied_tuple.py
Write the current hour: 11
Write the current minute: 17
Write the current second: 13
The tuple's content is: (11, 17, 13)
Number of seconds since midnight: 40633
String, list and tuple usage:
theString = "abcdefghij"
theTuple = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
theList = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X"]

print "string:", theString
print "tuple:", theTuple
print "list:", theList

start = int(raw_input("Select starting point: "))
last = int(raw_input("Select end point: "))

print "theString[", start, ":", last, "] =", theString[start:last]
print "theTuple[", start, ":", last, "] =", theTuple[start:last]
print "theList[", start, ":", last, "] =", theList[start:last]

Driving result:
string: abcdefghij
tuple: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
list: ['I', 'II', 'III', 'IV', 'V', 'VI', 'VII', 'VIII', 'IX', 'X']
Select starting point: 0
Select end point: 6
theString[ 0 : 6 ] = abcdef
theTuple[ 0 : 6 ] = (1, 2, 3, 4, 5, 6)
theList[ 0 : 6 ] = ['I', 'II', 'III', 'IV', 'V', 'VI']
jabot@linux:~>
Tuples can be packed and indexed:
>>> t = 12345, 54321, 'hi!'
>>> t[0]
12345
>>> t
(12345, 54321, 'hi!')

>>> # Tuples can be nested:
>>> u = t, (1, 2, 3, 4, 5)
>>> u
((12345, 54321, 'hi!'), (1, 2, 3, 4, 5))

Tuples can be compared:
>>> (1, 2, 3) < (1, 2, 4)
True
>>>
>>> (1, 2, 3, 4) < (1, 2, 4)
True
>>>
>>> (2, 3, 4) <> (2.0, 3.0, 4.0)
False
>>>
>>> (1, 2) < (1, 2, -1)
True
>>>
>>> (1, 2, 3) == (1.0, 2.0, 3.0)
True
>>>
>>> (1, 2, 3.4) <> (1, 2, 3, "p")
True
>>>
>>> (1, 2, ('aa', 'ab')) < (1, 2, ('abc', 'a'), 4)
True
>>>

Comparison of lists and tuples

>>> [1, 2, 3] < (1, 2, 4)
True
>>>
>>> [1, 2, 3, 4] < (1, 2, 4)
True
>>>
>>> [2, 3, 4] <> (2.0, 3.0, 4.0)
True
>>>
>>> [1, 2] < (1, 2, -1)
True
>>>
>>> [1, 2, 3] == (1.0, 2.0, 3.0)
False
>>>
>>> [1, 2, 3.4] <> (1, 2, 3, "p")
True
>>>
>>> [1, 2, ['aa', 'ab']] < (1, 2, ('abc', 'a'), 4)
True
>>>
In Python 2, a list never equals a tuple, and an ordering comparison
between a list and a tuple is decided by the type names, so a list
always compares as "less than" a tuple.

dictionary = {}                 # creates an empty dictionary
print "Dictionary content:", dictionary

postalCodes = {"Rønne": 3700, "Allinge": 3770, "Gudhjem": 3780, "Nexø": 3730}
print "\nAll inserted postal codes:", postalCodes

# Access to, and correction of, an existing entry
print "Rønne's postal code:", postalCodes["Rønne"]
postalCodes["Gudhjem"] = 3760
print "Gudhjem's real postal code:", postalCodes["Gudhjem"]

# add a postal code
postalCodes["Aakirkeby"] = 3720
print "The dictionary's postal codes after the correction:"
print postalCodes

# delete an entry from the dictionary
del postalCodes["Allinge"]
print "The dictionary's current content:"
print postalCodes

Driving result:
Dictionary content: {}

All inserted postal codes: {'Nexø': 3730, 'Gudhjem': 3780, 'Rønne': 3700, 'Allinge': 3770}

Rønne's postal code: 3700
Gudhjem's real postal code: 3760

The dictionary's postal codes after the correction: {'Nexø': 3730, 'Aakirkeby': 3720, 'Gudhjem': 3760, 'Rønne': 3700, 'Allinge': 3770}

The dictionary's current content: {'Nexø': 3730, 'Aakirkeby': 3720, 'Gudhjem': 3760, 'Rønne': 3700}
Here is an example of using a dictionary:
>>> postcode = {'Nyker': 3700, 'Hasle': 3790}
>>> postcode['Gudhjem'] = 3760
>>> postcode
{'Hasle': 3790, 'Nyker': 3700, 'Gudhjem': 3760}
>>>
>>> del postcode["Hasle"]
>>> postcode
{'Nyker': 3700, 'Gudhjem': 3760}
>>>
>>> postcode['Muleby'] = 3700
>>> postcode
{'Muleby': 3700, 'Nyker': 3700, 'Gudhjem': 3760}
>>>
>>> postcode.keys()
['Muleby', 'Nyker', 'Gudhjem']
>>>
>>> postcode.has_key("Hasle")
0
>>>
>>> postcode.has_key("Gudhjem")
1
>>>
The constructor dict() creates dictionaries directly from a list
whose elements are tuples:
>>> list = [('Muleby', 3700), ('Nyker', 3700), ('Gudhjem', 3760)]
>>> dict(list)
{'Nyker': 3700, 'Muleby': 3700, 'Gudhjem': 3760}
>>>
or:
>>> dict([('Muleby', 3700), ('Nyker', 3700), ('Gudhjem', 3760)])
{'Nyker': 3700, 'Muleby': 3700, 'Gudhjem': 3760}
>>>

>>> list = []
>>> for i in range(6):
...     list.append((str(i), i * i))
...
>>> list
[('0', 0), ('1', 1), ('2', 4), ('3', 9), ('4', 16), ('5', 25)]
>>> dict(list)
{'1': 1, '0': 0, '3': 9, '2': 4, '5': 25, '4': 16}
>>>

Loop techniques:
>>> t = "Ugleenge", "Sæne", "Bækkely", "Stampen"
>>> t
('Ugleenge', 'Sæne', 'Bækkely', 'Stampen')
>>>
>>> wordlist = {"Ugleenge": 1, "Sæne": 2, "Bækkely": 3, "Stampen": 4}
>>> for i in wordlist.items():
...     print i
...
('Sæne', 2)
('Bækkely', 3)
('Ugleenge', 1)
('Stampen', 4)
>>>

calendar = {1: 'January', 2: 'February', 3: 'March', 4: 'April', 5: 'May', 6: 'June',
            7: 'July', 8: 'August', 9: 'September', 10: 'October', 11: 'November', 12: 'December'}

print "Dictionary content:", calendar.items()

print "The dictionary's indexes (keys) are:", calendar.keys()

print "The dictionary's elements (values) are:", calendar.values()

print "A for loop retrieves the dictionary's elements:"
for entry in calendar.keys():
    print "calendar[", entry, "] =", calendar[entry]

Driving result:
jabot@linux:~> python calendar.py
Dictionary content: [(1, 'January'), (2, 'February'), (3, 'March'), (4, 'April'), (5, 'May'), (6, 'June'), (7, 'July'), (8, 'August'), (9, 'September'), (10, 'October'), (11, 'November'), (12, 'December')]
The dictionary's indexes (keys) are: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
The dictionary's elements (values) are: ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
A for loop retrieves the dictionary's elements:
calendar[ 1 ] = January
calendar[ 2 ] = February
calendar[ 3 ] = March
calendar[ 4 ] = April
calendar[ 5 ] = May
calendar[ 6 ] = June
calendar[ 7 ] = July
calendar[ 8 ] = August
calendar[ 9 ] = September
calendar[ 10 ] = October
calendar[ 11 ] = November
calendar[ 12 ] = December
jabot@linux:~>

Escape sequences

\n   New line
\b   Backspace. Moves the cursor 1 place back
\a   System bell
\t   Tab. Inserts a tab stop
\"   Inserts double quotes
\'   Inserts single quotes

Number systems
In the computer language Forth, you can use about 70 different number
systems, depending on the language version. The number system we use
today is the 10-digit system, or the decimal number system. This has
not always been so the world over. About 4000 years ago, the 60-based
system was used in the area just northwest of the Persian Gulf. Let's
just look at the number 4320. It consists of 4 thousands, 3 hundreds,
2 tens and 0 ones, which most people take as a matter of course, but
is it really so obvious? No - as already hinted, it is not obvious at
all, and the decimal system is nonsense to any computer. A computer
can only work in the 2-digit (binary) system. This is because the
computer, being the electrical device it is, can only respond to
whether there is a current in a wire or not. If a current flows, it
has been decided to register the state with the digit 1. If no
current flows, it is registered with the digit 0.
2 is the base of the binary system, just as 10 is the base of the
decimal system. The number of digits in a number system always equals
the base of the number system, and the base is usually what gives the
number system its name. In the binary system, there are two digits
(zero and one). In the decimal system, there are ten digits (from 0
through 9 inclusive), etc.
Because all number systems have 0 as the lowest digit, the highest
digit is one less than the base of the number system. In the binary
system, the largest digit thus becomes 2 - 1 = 1, and in the decimal
system 10 - 1 = 9, and so on. The rightmost digit of a number counts
the digit times the base to the power of zero, the next digit the
digit times the base to the power of one, and so on. But let Python
make it a little clearer:

Hexadecimal and octal numbers

In Python, a hexadecimal number is written with a leading zero
followed by an x and then the hexadecimal digits, e.g. 0x49,
corresponding to 9 + 4 * 16 = 73 in our normal decimal system. I will
not go deeper into the octal number system (the one with base 8), but
only briefly mention that in it, 7 is the largest digit and the x is
omitted, so an octal number is written with just a leading zero. This
means that 0111 corresponds to 1 one, 1 eight and 1 sixty-four
(8 ** 2), i.e. 1 + 8 + 64 = 73 in the decimal number system. It may
be difficult to grasp at first, so let's look at a few examples:
First, an example from the decimal number system:
>>> for i in range(0, 9):
...     print "10 to the power", i, "is:", 10 ** i
...
10 to the power 0 is: 1
10 to the power 1 is: 10
10 to the power 2 is: 100
10 to the power 3 is: 1000
10 to the power 4 is: 10000
10 to the power 5 is: 100000
10 to the power 6 is: 1000000
10 to the power 7 is: 10000000
10 to the power 8 is: 100000000
>>>

The binary number system:

>>> for i in range(0, 9):
...     print "2 to the power", i, "is:", 2 ** i
...
2 to the power 0 is: 1
2 to the power 1 is: 2
2 to the power 2 is: 4
2 to the power 3 is: 8
2 to the power 4 is: 16
2 to the power 5 is: 32
2 to the power 6 is: 64
2 to the power 7 is: 128
2 to the power 8 is: 256
>>>
The octal number system:
>>> for i in range(0, 9):
...     print "8 to the power", i, "is:", 8 ** i
...
8 to the power 0 is: 1
8 to the power 1 is: 8
8 to the power 2 is: 64
8 to the power 3 is: 512
8 to the power 4 is: 4096
8 to the power 5 is: 32768
8 to the power 6 is: 262144
8 to the power 7 is: 2097152
8 to the power 8 is: 16777216
>>>
The hexadecimal number system:
>>> for i in range(0, 9):
...     print "16 to the power", i, "is:", 16 ** i
...
16 to the power 0 is: 1
16 to the power 1 is: 16
16 to the power 2 is: 256
16 to the power 3 is: 4096
16 to the power 4 is: 65536
16 to the power 5 is: 1048576
16 to the power 6 is: 16777216
16 to the power 7 is: 268435456
16 to the power 8 is: 4294967296
>>>
Print hexadecimal numbers:
>>> for i in range(0, 17):
...     print hex(i)
...
0x0
0x1
0x2
0x3
0x4
0x5
0x6
0x7
0x8
0x9
0xa
0xb
0xc
0xd
0xe
0xf
0x10
Print octal numbers:
>>> for i in range(0, 17):
...     print oct(i)
...
0
01
02
03
04
05
06
07
010
011
012
013
014
015
016
017
020
From octal and hexadecimal numbers to decimal
>>> a = 0x49
>>> a
73
>>> a = 0111
>>> a
73
>>> # whether "big" or "small" is used
>>> # letters are irrelevant
>>> a = 0xa5
>>> a
165
>>> b = 0XB2
>>> b
178
From decimal to octal and hexadecimal numbers

>>> oct(73)
'0111'
>>> hex(73)
'0x49'

Hexadecimal numbers in text strings
In text strings, the leading zero is replaced by a backslash.
>>> "\x41", chr(65)      # 1 + 4 * 16 = 65
('A', 'A')
>>> "\x61", chr(97)      # 1 + 6 * 16 = 97
('a', 'a')

It would be easiest to say that the function chr is old-fashioned,
that from Python 2.4 it will be gone completely, and that only the
unichr function should be used. But if we go into depth, it is not
that simple. The explanation is that the chr function in its original
design was solely intended for characters (digits, letters, etc.)
whose numerical value was below 129; even Python 2.2 returned an
error message if the character's numerical value was greater than
128. From character number 129 and up the character table, the
function unichr should be used. In short, it covers a larger
character table; how big is somewhat uncertain, for from Python's
version 2.4 it is huge, since it switches to using 64-bit codes.
There must have been many considerations in the Python community
about how the chr function should work in the future, because even
within the same version of the language there are and have been
deviations. In the version I am using right now (a Python 2.3), the
chr and unichr functions return character number 156 as follows:
>>> chr(156)
'\x9c'
>>> unichr(156)
u'\x9c'
In both cases, the return value is completely in order. The u in
front of the quotation mark in unichr's return value tells us that
the string is a Unicode string, which what chr returns of course is
not. Because the return value here was okay, there may be good reason
to assume that Python has "cheated" a bit, so that in the version I
am using right now, the same code is used in chr as in the unichr
function; but as indicated, that is not always the case, which you
can verify by using a Python version lower than 2.3.

l = []                           # here an empty list is created
for i in range(65, 91):
    l.append(chr(i))             # chr converts a number to a character
...
>>> l
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
The number 32 is equal to the distance between the numerical value of
a "capital" letter and the numerical value of the corresponding
"small" letter.

>>> def uni():
...     number = input("Write an integer between -1 and 256: ")
...     print number, unichr(number)
...
>>> uni()
Write an integer between -1 and 256: 255
255 ÿ
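The distance of 32 between capital and small letters mentioned a little earlier is easy to check with ord and chr:

>>> ord('A'), ord('a')
(65, 97)
>>> ord('a') - ord('A')
32
>>> chr(ord('B') + 32)
'b'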

PYTHON DATA STRUCTURES

In programming, a 'data structure' is a structure for organizing
data. The choice of data structure is important, since different
structures have different significance for performance and behave
differently depending on which algorithms are planned to be used. A
data structure is an abstract description, unlike a 'data type'. A
data type may be, for example, Integer, String or Boolean; it has a
definite meaning, while a data structure describes something more
general, such as a list or an array.
There are many different data structures in different categories.
Another structure category that has a large place in programming is
the "tree". Trees are a little more complex than, for example, a
Stack. Many data structures are already built into programming
languages (e.g. the list in Python), and some ready-made modules and
libraries have the structures implemented and ready. Nevertheless, it
is important to have an insight into how they work "on the inside".

Here, three different data structures are covered:

Stack (linear data structure)
Queue (linear data structure)
Linked list (linear data structure)

Prerequisite
You should know the basics of Python and object orientation, and have
seen how exceptions are handled.
Stack
A Stack is a linear data structure that resembles, just as it sounds, a
stack or pile. Imagine a stack of plates where each plate represents
an object, a variable, or whatever it is you store.
One usually uses a specific set of methods:
1. .push() (adds an element)
2. .pop() (removes and returns the top element)
3. .peek() (shows what is on top without changing the stack)
4. .is_empty() (returns True/False depending on whether the stack is empty)
5. .size() (returns the number of elements in the stack)
A stack with a specified number of places.
An implementation of a Stack can look as follows:

class Stack:
    def __init__(self):
        self.items = []

    def is_empty(self):
        return self.items == []

    def push(self, item):
        self.items.append(item)

    def pop(self):
        try:
            return self.items.pop()
        except IndexError:
            return "Empty list."

    def peek(self):
        return self.items[len(self.items) - 1]

    def size(self):
        return len(self.items)
Working with the stack can look like this:

>>> myList = Stack()
>>> myList.pop()
'Empty list.'
>>> myList.size()
0
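
A slightly fuller session (my own sketch, using the same Stack class) shows push, peek and pop working together:

>>> myList = Stack()
>>> myList.push("Tiger")
>>> myList.push("Lion")
>>> myList.peek()
'Lion'
>>> myList.pop()
'Lion'
>>> myList.size()
1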
Queue
A Queue is a linear data structure reminiscent of a stack. The
difference is that a queue is open at both ends: one end is used to
add elements and the other to remove them.

The methods used are usually:

1. .enqueue() (adds an element)
2. .dequeue() (removes and returns the element at the front)
3. .peek() (shows what is at the front without changing the queue)
4. .is_empty() (returns True/False depending on whether the queue is empty)
5. .size() (returns the number of elements in the queue)

A queue without a specified number of places.

An implementation of a Queue may look like this:

class Queue:
    def __init__(self):
        self.items = []

    def is_empty(self):
        return self.items == []

    def enqueue(self, item):
        self.items.insert(0, item)

    def dequeue(self):
        try:
            return self.items.pop()
        except IndexError:
            return "Empty list."

    def peek(self):
        return self.items[len(self.items) - 1]

    def size(self):
        return len(self.items)

Working with a Queue (assuming the class above is saved in a file
called queue.py):

>>> from queue import Queue
>>> myList = Queue()
>>> myList.is_empty()
True
>>> myList.enqueue("Tiger")
>>> myList.enqueue("Lion")
>>> myList.enqueue("Moose")
>>> myList.is_empty()
False
>>> myList.dequeue()
'Tiger'
>>> myList.peek()
'Lion'
>>> myList.enqueue("Godzilla")
>>> myList.dequeue()
'Lion'
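
A side note on the design: inserting at index 0 of a Python list is an O(n) operation, so for real code a double-ended queue from the standard library is usually a better fit. A minimal sketch (my own, not part of the text) using collections.deque:

from collections import deque

q = deque()
q.append("Tiger")     # enqueue at the back
q.append("Lion")
print(q.popleft())    # dequeue from the front, prints Tiger
print(q[0])           # peek at the new front, prints Lion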

Linked list
For both Queue and Stack we have used Python's built-in list to
store the values, so so far we have only made special versions of
Python's list. The idea now is to not use the built-in list at all but to
build the entire data structure ourselves. To do that, we will use
nodes and build our own linked list. Queue and Stack are really just
special versions of a linked list.
We can imagine a regular list/array as in the picture below. The array
is stored in memory as one contiguous block, and in that block the
values are arranged one after the other. We therefore do not need to
know where each value is, only where the array starts.

Array in memory
For a linked list of nodes, we cannot assume that the values are
adjacent to each other; they are allocated to different places in
memory. Therefore, each value in the linked list must be linked to the
next value. We use a Node class for that.

A linked list of nodes

The simplest version of a Node class contains only two attributes:
one for holding the data and one for keeping track of the next node.
We call the first node in a non-empty list the head of the list. To
access the nodes in a linked list, there must be a reference to the
head. From the head, we can reach the remaining nodes by following
the pointers in the nodes. The last node's next attribute contains
None to show that it is the last node.
The list in the picture is called a singly linked list. There are also
doubly linked lists, where each node is linked to the node before and
after it.
Another common kind is the circular linked list. In a circular linked
list, the last node is linked back to the first.

Circular singly linked list

There is of course also the circular doubly linked list, and linked lists
can be sorted or unsorted. A sorted list keeps its elements in order
automatically: when new data is added, the list is searched to find
the right place for it. An unsorted list simply adds the value, usually
at the end of the list.
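
As a small aside (a sketch of my own, not one of the text's exercises), the doubly linked variant mentioned above only needs one extra attribute on the node, a pointer to the previous node:

class DoublyNode:
    def __init__(self, data, prev=None, next=None):
        self.data = data
        self.prev = prev   # pointer to the previous node
        self.next = next   # pointer to the next node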

Node class
Now let us look at how the code for the Node class can look. Very
little code is needed, as we only need one attribute for the data and
one for the next node.

class Node:
    """
    Node class
    """
    def __init__(self, data, next=None):
        """
        Initialize the object with the data and set next to None.
        """
        self.data = data
        self.next = next
We try it out in the python3 interpreter.

>>> head = Node(1)
>>> head
<__main__.Node object at 0x743545132>
>>> head.data
1

We have created a Node object and looked at it and its data. Now we
are going to create one more object and connect our head to the
new one.

>>> n2 = Node(32)
>>> n2.data
32
>>> head.next

>>> head.next = n2
>>> head.next
<__main__.Node object at 0x7453468745>
# Value of the first node
>>> head.data
1
# Value of the second node
>>> head.next.data
32

When we print head.next the first time we get no output, because the
value is None. After we have assigned n2 to head.next, head.next
contains our n2 object. Then we can write head.next.data to access
the data, 32, in the n2 object.
Try it yourself: create one more node, n3, and assign it to n2.next.
Print n3's value via head.next.next.data and via n2.next.data. In this
way we can build up a linked list.
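
If you want to check your answer, one possible way (a sketch continuing the session above; the value 7 is just an arbitrary choice) looks like this:

>>> n3 = Node(7)
>>> n2.next = n3
>>> head.next.next.data
7
>>> n2.next.data
7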

Traverse nodes
Starting at a list's head, going through all the nodes and doing
something with each of them is called traversing the list. For
example, we need to traverse a list if we want to print all the node
values. The easiest way is to create a new variable that points to the
list's head and use it to walk through the list.

number_list = Node(3)        # Create the first node, the head
temp = Node(2)               # Create a node and assign it to a temp variable
number_list.next = temp      # Link the head to the second node
temp.next = Node(4)          # Create and link the third node

Above, we create a simple list, number_list, that contains [3, 2, 4].
To print all the values we use a while loop.

current_node = number_list
while current_node is not None:
    print(current_node.data)              # Print the node's value
    current_node = current_node.next      # Move to the next node

In the loop we move the pointer current_node to the next node, and
in this way we traverse the list without changing it. number_list still
contains the same structure.
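
The same traversal pattern works for anything we want to do with every node. For instance, a small helper (my own sketch) that counts the nodes in a list:

def length(head):
    # Count the nodes by traversing from the head
    count = 0
    current_node = head
    while current_node is not None:
        count += 1
        current_node = current_node.next
    return count

print(length(number_list))   # 3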

Get value with index

Now we are going to get a value at a given index.

def get(index, head):
    # Get value by index
    if head is not None:
        current_node = head
        counter = 0
        pass
    else:
        pass
        # Raise exception for index out of bounds

We start by checking whether the first node is None, that is, whether
the list is empty. If it is not, we create the variable current_node,
which we will use to keep track of the node we are on while we
traverse, and counter, which keeps track of which index we are on. If
the list is empty, head == None, we know that whatever the index is
it will be out of bounds, and we will then raise an exception.
Now we add the loop that goes through the list.

def get(index, head):
    # Get value by index
    if head is not None:
        current_node = head
        counter = 0
        while current_node is not None and counter < index:
            current_node = current_node.next
            counter += 1
    else:
        pass
        # raise exception for index out of bounds

After the while loop we need to check whether it stopped because
the index was too high or because we found the index. If we are on
the right index we return the value, otherwise we raise an
index-out-of-bounds error.
This is one way to write the code that looks through a list for an
index; there are many other ways to write it. Try to find some of them.
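
For reference, one complete version could look like the sketch below (one way of many); it returns the value when the index exists and raises an IndexError otherwise:

def get(index, head):
    # Get value by index
    if head is not None:
        current_node = head
        counter = 0
        while current_node is not None and counter < index:
            current_node = current_node.next
            counter += 1
        if current_node is not None:
            return current_node.data
    # Empty list, or the index is past the end of the list
    raise IndexError("index out of bounds")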

We also test the function, this time on a list containing [1, 2, 3, 4].

number_list = Node(1)
number_list.next = Node(2)
number_list.next.next = Node(3)
number_list.next.next.next = Node(4)

print(get(0, number_list))
# 1
print(get(3, number_list))
# 4
print(get(5, number_list))
# IndexError
Now we have code to get values from a linked list, and we have also
seen how we can connect several nodes. But how do we remove a
node from the list?

Remove node
To remove a node, we need two variables while we traverse the list:
current_node and previous. current_node is used to traverse to the
node that is to be deleted, and previous should always point to the
node before current_node. When we have found the right node, we
change previous so that it points to the node after current_node.
Finally, we delete the node we want to remove with del.
Find the right nodes
Step 1: traverse the nodes so that current_node points to the node to
be deleted and previous points to the node before it.

Linking previous.next to current_node.next

Step 2: assign current_node.next to previous.next.

Delete current_node
Step 3: delete current_node.
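
Put together, a remove function could look roughly like the sketch below (my own, with the assumption that we remove the first node whose value matches and that the head itself is not the one being removed; that would need an extra case):

def remove(data, head):
    # Remove the first node after the head whose value equals data
    previous = head
    current_node = head.next
    while current_node is not None:
        if current_node.data == data:
            previous.next = current_node.next   # Step 2: link past the node
            del current_node                    # Step 3: drop our reference to it
            return head
        previous = current_node
        current_node = current_node.next
    return head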

Add node
When we are adding a node, the order in which we do things is
crucial; otherwise we can lose all the nodes that should come after
the new one. In the picture below we start from the list [1, 2, 4] and
want to insert a new node, with the number 5 as its value, between 2
and 4. The list should look like [1, 2, 5, 4] when it is done.
Find the right node and create the new one
Step 1: traverse the nodes so that current_node points to the node
that should come before the new one.

Assign current_node.next to new_node.next

Step 2: assign current_node.next to new_node.next, so that both
point to the same node.

Assign new_node to current_node.next

Step 3: assign new_node to current_node.next. Now we have a
complete list again. If we do things in this order, we do not need to
worry about losing any nodes. Think about what would have
happened if we had skipped step 2 and instead directly assigned the
new node to current_node.next.
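
As code, the three steps might look like this small sketch (assuming current_node already points to the node that should come before the new one, here the node containing 2):

new_node = Node(5)                   # Create the new node
new_node.next = current_node.next    # Step 2: point the new node at the rest of the list
current_node.next = new_node         # Step 3: link the new node in after current_node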
As mentioned, there are many kinds of data structures; for an
overview, you can look at the List of data structures. Here we have
covered three of the most common ones.
INSTALLING PYTHON
Python came into being in the late 1980s, and one of its main goals
is to be readable for people (and not just machines!). Because of this
it looks simple, but do not worry - Python is also a very powerful
language!
Django is written in Python, so we need Python to do anything with
Django. Let's start by installing it! We want you to install the latest
version of Python 3, so if you already have an earlier version you will
need to update it. If you already have version 3.4 or higher installed,
that should be fine.

INSTALL PYTHON IN WINDOWS

First, check which version of Windows you have on your computer -
32-bit or 64-bit. This is shown in the "System type" line on the
System Information page. To get there, try one of these ways:
Open the Control Panel from the Windows menu, go to "System and
Security", then to "System".
Press the Windows key, then go to Settings > System > About.
You can download Python for Windows from the official website:
https://www.python.org/downloads/windows/. Follow the link "Latest
Python 3 Release - Python xxx". If you have the 64-bit variant of
Windows installed, download the Windows x86-64 executable
installer. If not, download the Windows x86 executable installer. After
downloading the installer, launch it (double click) and follow the
instructions.
Pay attention to the setup wizard screen called "Setup": you need to
scroll down and select the "Add Python 3.6 to PATH" option as
shown in the figure (it might look different depending on the version
you install):
When the installation is finished, you may see a suggestion to learn
more about Python or about the version you installed. Close this
window - you will learn much more in this guide!

Note: if you are using an older version of Windows (7, Vista or older)
and installing Python 3.6.x ends with an error message, you can try:
either installing all available Windows updates and then installing
Python 3.6 again;
or installing an earlier version of Python, for example 3.4.6.
If you had to install an earlier version of Python, the installation
screen might look a little different from the one shown above. Do not
forget to scroll down to the line "Add python.exe to Path", click the
button to the left of it and select "Will be installed on local hard
drive":
INSTALL PYTHON: OS X
Note: before installing Python on OS X, check that your Mac's
settings allow you to install packages that were not downloaded from
the App Store. Go to System Preferences (in the Applications folder),
click "Security & Privacy" and select the "General" tab. If the option
"Allow apps downloaded from:" is set to "App Store", change it to
"App Store and identified developers".
Follow the link
https://www.python.org/downloads/release/python-361/ and
download the Python distribution:
Download the Mac OS X 64-bit/32-bit installer file,
Double-click on python-3.6.1-macosx10.6.pkg to launch the installer.

INSTALLING PYTHON IN LINUX


It is likely that you already have Python installed. To check this (and
see which version of the language you have), open the console and
enter the following command:
command-line
$ python3 --version
Python 3.6.1
If you have another version of Python installed, at least 3.4.0 (for
example 3.6.0), then there is no need to update. If Python is not
installed, or you want to use a different version of the language, you
can install it as follows:

Installing Python: Debian or Ubuntu


Enter this command in the console:
command-line
$ sudo apt install python3

Install Python: Fedora


Use the following command in the console:
command-line
$ sudo dnf install python3
If you have an old version of Fedora, you may get a "command not
found" error. In this case, use yum instead.
Install Python: openSUSE
Use the following command in the console:

command-line
$ sudo zypper install python3
Make sure that the installation was successful by opening the
Terminal application and running the command python3:

command-line
$ python3 --version
Python 3.6.1
The version that you see may differ from 3.6.1 - it will be the version
that you installed.

NOTE: if you are using Windows and received an error message
saying that python3 was not found, try typing python (without the 3)
and check whether it is Python version 3.4.0 or higher.

If you have any doubts, or something went wrong, and you have no
idea what to do next, ask your coach! Sometimes things don't go
quite smoothly, so it's best to ask someone with more experience to
help.
Text editor
There are many different editors, and it comes down to personal
preference. Most Python programmers use complex but extremely
powerful IDEs (Integrated Development Environments), such as
PyCharm. However, they are probably not very suitable for
beginners; we offer equally powerful, but far simpler options.
Below is a list of our preferences, but you can also ask your coach
for advice - that way it will be easier to get help.

Gedit
Gedit is an open, free editor, available for all operating systems.

Download it here.

Sublime Text 3
Sublime Text is a very common text editor with a free trial. It is easy
to install and easy to use and is also available for all operating
systems.

Download it here.
Atom
Atom is a text editor created by GitHub. It is free, open source, easy
to install and easy to use. Available for Windows, OS X and Linux.
Why do we need a code editor?
You may ask - why install a separate code editing program if you can
use Word or Notepad?

First, code must be stored as plain text, and the problem with
programs such as Word or TextEdit is that they do not save files in
this form but use rich text (with formatting and fonts), for example
RTF (Rich Text Format).
The second reason is that specialized editors provide many useful
features for programming, such as highlighting code with colours
according to its meaning and automatically closing quotes.
Later we will see all this in action. Soon you will start thinking of your
code editor as a trusted favourite tool :)
CONCLUSION
Many deep learning systems were described as early as the 1980s
(and even earlier), but the results at the time were unimpressive.
Advances in the theory of artificial neural networks (in particular,
pre-training networks with the restricted Boltzmann machine, a
special case of an undirected graphical model) and the computing
power that became available in the mid-2000s (above all Nvidia
graphics processors, and now Google's tensor processing units)
made it possible to build complex neural network architectures with
enough performance to solve a wide range of tasks that could not be
solved effectively before, for example in computer vision, machine
translation and speech recognition. In many of these cases the
quality of the solutions is now comparable to, and sometimes
exceeds, that of human experts.
Deep learning algorithms are contrasted with shallow learning
algorithms by the number of parameterized transformations a signal
encounters as it propagates from the input layer to the output layer,
where a data processing unit with learnable parameters, such as
weights or thresholds, counts as a parameterized transformation.
The chain of transformations from input to output is called the credit
assignment path (CAP). CAPs describe potential causal
relationships through the network from input to output, and the paths
in different branches may have different lengths. For a feedforward
neural network, the CAP depth is the same as the network depth and
equals the number of hidden layers plus one (the output layer is also
parameterized). For recurrent neural networks, in which the signal
can pass through a layer more than once, the CAP is potentially
unlimited in length because of the feedback connections. There is no
universally agreed threshold of depth that divides shallow learning
from deep learning, but it is usually considered that deep learning
involves several non-linear layers (CAP > 2). Jürgen Schmidhuber
also highlights "very deep learning" when the CAP is greater than 10.
