ML Unit - 1
Definition:
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
Examples
A handwriting recognition learning problem (classifying handwritten words within images)
A robot driving learning problem
A machine learning system builds prediction models: it learns from previous data and predicts the output for new data whenever it receives it. The more data available, the better the model that can be built, and hence the more accurate the predicted output.
1. Supervised Learning
In supervised learning, sample labeled data are provided to the machine learning
system for training, and the system then predicts the output based on the training data.
After the training and processing are done, we test the model with sample data to see if it
can accurately predict the output.
The objective of supervised learning is to map input data to output data. Supervised learning depends on supervision: it is analogous to a student learning under the guidance of a teacher. Spam filtering is an example of supervised learning. Supervised learning can be further divided into two categories of algorithms:
o Classification
o Regression
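As an illustration, here is a minimal supervised-learning sketch in Python. It assumes the scikit-learn library and its bundled Iris dataset; the choice of classifier and split ratio are ours, not prescribed by the text.

# Minimal supervised learning sketch: learn from labeled data, predict unseen data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)                      # features and labels (supervision)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000)              # a simple classification model
model.fit(X_train, y_train)                            # training on labeled examples
print("Test accuracy:", model.score(X_test, y_test))   # testing with held-out samples

After training, the model is tested on held-out samples exactly as described above.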
2. Unsupervised Learning
The training is provided to the machine with the set of data that has not been labeled,
classified, or categorized, and the algorithm needs to act on that data without any
supervision. The goal of unsupervised learning is to restructure the input data into new
features or a group of objects with similar patterns.
We don't have a predetermined result; the machine tries to find useful insights from large amounts of data. It can be further classified into two categories of algorithms:
o Clustering
o Association
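For contrast, here is a minimal unsupervised clustering sketch (again assuming scikit-learn; the toy points are our own): no labels are given, and the algorithm must find the grouping on its own.

# Minimal unsupervised learning sketch: cluster unlabeled points.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],                  # unlabeled data: no target values
              [10, 2], [10, 4], [10, 0]])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)                                  # cluster assignments found without supervision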
3. Reinforcement Learning
In reinforcement learning, the agent interacts with the environment and explores it, receiving a reward for each good action and a penalty for each bad one. The agent learns automatically from this feedback and improves its performance. The goal of the agent is to accumulate the most reward points. A robotic dog that automatically learns the movement of its limbs is an example of reinforcement learning.
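A toy sketch of this reward-driven loop, using tabular Q-learning on a hypothetical 5-state corridor (the environment, rewards, and hyperparameters are all our own illustrative choices):

# Toy reinforcement learning sketch: tabular Q-learning on a 5-state corridor.
# The agent earns a reward only on reaching the rightmost state.
import random

n_states, n_actions = 5, 2                             # actions: 0 = left, 1 = right
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.1, 0.9, 0.2                  # learning rate, discount, exploration

for episode in range(500):
    s = 0
    while s != n_states - 1:
        if random.random() < epsilon:                  # explore the environment
            a = random.randrange(n_actions)
        else:                                          # exploit what was learned so far
            a = max(range(n_actions), key=lambda i: Q[s][i])
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0     # reward points drive learning
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])   # Q-learning update
        s = s_next

print([round(max(q), 2) for q in Q])                   # learned values grow toward the goal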
4. APPLICATIONS OF ML:
1. Image Recognition:
A popular use case is Facebook's automatic friend-tagging suggestion, which is based on the Facebook project named "Deep Face" and is responsible for face recognition and person identification in pictures.
2. Speech Recognition
While using Google, we get an option of "Search by voice"; this comes under speech recognition and is a popular application of machine learning.
Speech recognition is a process of converting voice instructions into text, and it is also
known as "Speech to text", or "Computer speech recognition." At present, machine
learning algorithms are widely used by various applications of speech recognition. Google
assistant, Siri, and Alexa are using speech recognition technology to follow the voice
instructions.
3. Traffic prediction:
If we want to visit a new place, we take the help of Google Maps, which shows us the correct path with the shortest route and predicts the traffic conditions. It predicts traffic using:
o Real-time location of the vehicle from the Google Maps app and sensors
o Average time taken on past days at the same time of day
Everyone who uses Google Maps is helping to make the app better: it takes information from users and sends it back to its database to improve performance.
4. Product recommendations:
Google understands user interest using various machine learning algorithms and suggests products matching the customer's interest.
5. Self-driving cars:
Machine learning plays a significant role in self-driving cars.
6. Email Spam and Malware Filtering:
Common spam filters include:
o Content filters
o Header filters
o General blacklist filters
o Rules-based filters
o Permission filters
7. Virtual Personal Assistants:
These assistants record our voice instructions, send them to a server on the cloud, decode them using ML algorithms, and act accordingly.
8. Online Fraud Detection:
Machine learning is making our online transactions safe and secure by detecting fraudulent transactions. Whenever we perform an online transaction, there are various ways a fraudulent transaction can take place, such as fake accounts, fake IDs, and money being stolen in the middle of a transaction. To detect this, a feed-forward neural network helps us by checking whether a transaction is genuine or fraudulent.
9. Stock Market Trading:
Machine learning is widely used in stock market trading. In the stock market there is always a risk of ups and downs in shares, so machine learning's long short-term memory (LSTM) neural network is used for the prediction of stock market trends.
10. Medical Diagnosis:
In medical science, machine learning is used for disease diagnosis. With it, medical technology is growing very fast and is able to build 3D models that can predict the exact position of lesions in the brain.
11. Automatic Language Translation:
Nowadays, if we visit a new place and are not aware of the language, it is not a problem at all: machine learning helps us by converting the text into languages we know. Google's GNMT (Google Neural Machine Translation) provides this feature; it is a neural machine translation system that translates text into our familiar language, and this is known as automatic translation.
Well-Posed Learning Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
A problem can be regarded as a well-posed learning problem if it has three traits:
Task
Performance Measure
Experience
The final design of a checkers learning system can be described by four modules:
The Performance System takes a new problem (an initial board state) as input and produces a trace of the game played as output, solving the given performance task (here, playing checkers) by using the learned target function(s).
The Critic takes as input the history or trace of the game and produces as output a set of training examples of the target function.
The Generalizer takes as input the training examples and produces an output hypothesis that is its estimate of the target function.
The Experiment Generator takes as input the current hypothesis (the currently learned function) and outputs a new problem (i.e., an initial board state) for the Performance System to explore. Its role is to pick new practice problems that will maximize the learning rate of the overall system.
To get a successful learning system, it should be designed properly; several steps may be followed to obtain a correct and efficient design.
Certain examples that efficiently define the well-posed learning problem are:
1. To better filter emails as spam or not
Task – Classifying emails as spam or not
Performance Measure – The fraction of emails accurately classified as spam or not
spam
Experience – Observing you label emails as spam or not spam
2. A checkers learning problem
Task – Playing checkers game
Performance Measure – percent of games won against opponents
Experience – playing practice games against itself
3. Handwriting Recognition Problem
Task – recognizing and classifying handwritten words within images
Performance Measure – percent of words accurately classified
Experience – a database of handwritten words with given classifications
4. A Robot Driving Problem
Task – driving on public four-lane highways using vision sensors
Performance Measure – average distance traveled before an error
Experience – a sequence of images and steering commands recorded while observing a human driver
5. Fruit Prediction Problem
Task – recognizing different fruits
Performance Measure – able to correctly predict the maximum variety of fruits
Experience – training the machine with large datasets of fruit images
6. Face Recognition Problem
Task – predicting different types of faces
Performance Measure – able to correctly predict the maximum number of face types
Experience – training the machine with large datasets of different face images
7. Automatic Translation of Documents
Task – translating the language used in a document to another language
Performance Measure – able to convert text from one language to the other efficiently
Experience – training the machine with a large dataset of documents in different languages
DESIGNING A LEARNING SYSTEM IN MACHINE LEARNING:
Step 1- Choosing the Training Experience:
The very first and most important task is to choose the training data or training experience that will be fed to the machine learning algorithm. It is important to note that the data or experience we feed to the algorithm has a significant impact on the success or failure of the model, so the training data or experience should be chosen wisely.
Below are the attributes of the training experience that impact the success or failure of the model:
The training experience should be able to provide direct or indirect feedback regarding choices.
For example: while playing chess, the training data will provide feedback to the learner, such as "instead of this move, if that move is chosen, the chances of success increase."
The second important attribute is the degree to which the learner controls the sequence of training examples.
For example: when training data is first fed to the machine, accuracy is very low, but as it gains experience by playing again and again against itself or an opponent, the algorithm receives feedback and controls the chess game accordingly.
The third important attribute is how well the training experience represents the distribution of examples over which the final performance will be measured.
For example: a machine learning algorithm gains experience by going through a number of different cases and examples. The more examples it passes through, the more experience it gains and the better its performance becomes.
Step 2- Choosing the Target Function:
The next important step is choosing the target function. According to the knowledge fed to the algorithm, the machine learning system will choose a NextMove function that describes which legal moves should be taken.
For example: while playing chess with an opponent, when the opponent plays, the machine learning algorithm decides which of the possible legal moves to take in order to succeed.
Step 3- Choosing a Representation for the Target Function:
When the algorithm knows all the possible legal moves, the next step is to choose a representation for the target function, e.g., linear equations, a hierarchical graph representation, tabular form, etc. Using this representation, the NextMove function picks, out of the possible moves, the move that provides the highest success rate.
For example: while playing chess, if the machine has 4 possible moves, it will choose the optimized move that gives it the best chance of success.
Step 4- Choosing Function Approximation Algorithm:
An optimized move cannot be chosen using the training data alone. The training data has to go through a set of examples; through these examples the learner approximates which steps should be chosen, and the machine then provides feedback on them.
For example: when the training data of playing chess is fed to the algorithm, the algorithm will at first fail or succeed, and from that failure or success it will estimate, for the next move, which step should be chosen and what its success rate is.
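To make steps 2-4 concrete, here is a sketch in the spirit of Mitchell's checkers design: the target function is represented as a linear combination of board features, V̂(b) = w0 + w1·x1 + … + w6·x6, and the weights are adjusted with the LMS (least mean squares) rule. The feature values below are random stand-ins, not a real board encoding.

# Sketch: linear representation of the target function plus the LMS update rule
#   wi <- wi + lr * (V_train(b) - V_hat(b)) * xi
import random

def v_hat(weights, features):
    # V_hat(b) = w0 + w1*x1 + ... + wn*xn
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

def lms_update(weights, features, v_train, lr=0.01):
    error = v_train - v_hat(weights, features)
    weights[0] += lr * error                           # bias term (x0 = 1)
    for i, x in enumerate(features):
        weights[i + 1] += lr * error * x
    return weights

weights = [0.0] * 7                                    # w0..w6
board_features = [random.randint(0, 12) for _ in range(6)]  # stand-in x1..x6
weights = lms_update(weights, board_features, v_train=100)  # 100 = value of a winning position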
Step 5- The Final Design:
The final design is created at last, when the system has gone through a number of examples, failures and successes, and correct and incorrect decisions, and knows what the next step should be.
Example: Deep Blue, an ML-based intelligent computer, won a chess match against the chess expert Garry Kasparov and became the first computer to beat a human chess expert.
PERSPECTIVES AND ISSUES IN MACHINE LEARNING:
The major issue that arises while using machine learning algorithms is the lack of quality as well as quantity of data. Although data plays a central role in the processing of machine learning algorithms, many data scientists report that insufficient data, noisy data, and unclean data severely hamper machine learning algorithms.
For example:
A simple task may require thousands of samples, while an advanced task such as speech or image recognition may need millions of samples.
o Noisy data – responsible for inaccurate predictions and reduced accuracy in classification tasks.
o Incorrect data – responsible for faulty results in machine learning models; incorrect data thus reduces the accuracy of the results.
Overfitting:
Overfitting is one of the most common issues faced by machine learning engineers and data scientists. Whenever a machine learning model is trained on a huge amount of data, it starts capturing the noise and inaccuracies in the training data set, which negatively affects the performance of the model. Consider a simple example where the training set contains 1000 mangoes, 1000 apples, 1000 bananas, and 5000 papayas: with such imbalanced data, the model's predictions are biased toward papaya.
We can overcome overfitting by using simpler linear and parametric algorithms in machine learning models.
Underfitting:
Underfitting is just the opposite of overfitting. Whenever a machine learning model is trained on too little data, it gives incomplete and inaccurate predictions and destroys the accuracy of the model. Underfitting occurs when our model is too simple to capture the underlying structure of the data. It generally happens when we have limited data in the data set and we try to build a linear model with non-linear data.
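To make the contrast concrete, here is a small sketch assuming only numpy (the data set and polynomial degrees are our own illustrative choices): a degree-1 line underfits the curved data, while a very high degree starts chasing the noise.

# Sketch: under- and overfitting with polynomials of increasing degree.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)   # non-linear data plus noise

for degree in (1, 3, 9):
    coeffs = np.polyfit(x, y, degree)                    # fit a polynomial of this degree
    train_error = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(degree, round(train_error, 4))                 # training error keeps falling,
                                                         # but high degrees generalize poorly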
6. Customer Segmentation
7. Complexity of the Machine Learning Process:
The machine learning process is very complex, which is another major issue faced by machine learning engineers and data scientists. Machine learning and artificial intelligence are new technologies, still in an experimental phase and continuously changing over time. Much of the work proceeds by trial and error, so the probability of error is higher than expected. The process also includes analyzing the data, removing data bias, training the data, applying complex mathematical calculations, and so on.
8. Data Bias
Data bias is also a big challenge in machine learning. These errors exist when certain elements of the dataset are weighted more heavily or given more importance than others. Biased data leads to inaccurate results, skewed outcomes, and other analytical errors. We can resolve this by determining where the data is actually biased in the dataset and then taking the necessary steps to reduce the bias.
Commonly used machine learning algorithms include:
o Linear Regression
o Logistic Regression
o Decision Tree
o Bayes Theorem and Naïve Bayes Classification
o Support Vector Machine (SVM) Algorithm
o K-Nearest Neighbor (KNN) Algorithm
o K-Means
o Gradient Boosting algorithms
o Dimensionality Reduction Algorithms
o Random Forest
CONCEPT LEARNING:
Examples of concepts:
1. Bird
2. Stressful situation
3. Cold, hot
Now please note that, on a more technical level we can think of a concept as a Boolean
function! As we know, every function has a domain (i.e., the inputs) and a range (i.e., the
outputs)! So can you think of the domain and range of a Boolean function that represents
a concept such as the concept of bird?
The domain is the set of all instances (animals), and the range is {true, false}: the function is true for birds and false for anything else (tigers, ants, …).
“A machine learning algorithm tries to infer the general definition of some concept,
through some training examples that are labeled as members or non-members of that
particular concept”
The whole idea is to estimate the true underlying Boolean function (i.e., concept),
which can successfully fit the training examples perfectly and spit out the right output.
This means that whether the label for a training example is positive (i.e., a member of the concept) or negative, we would like our learned function to classify it correctly.
Let’s say the task is to learn the following target concept:
What is a hypothesis?
How do we represent a hypothesis to a machine learning algorithm?
We can simply define a hypothesis as a vector of constraints on the attributes. In our example below, a hypothesis consists of 3 constraints. A constraint on a given attribute can take different forms:
o Totally free of constraint (denoted with a question mark "?"): we place no constraint on the attribute, as it doesn't play an important role in learning the concept of play in this hypothesis.
o Totally constrained (denoted "∅"): no value of the attribute is acceptable.
o A single required value for the attribute: specified by the exact value required for that particular attribute.
The goal is to find a hypothesis h that agrees with the target concept c on every instance x: h(x) = c(x)!
CONCEPT LEARNING AS SEARCH: FIND-S ALGORITHM
Inner working of Find-S algorithm
Initialization − The algorithm starts with the most specific hypothesis, denoted as h. This initial hypothesis is the most restrictive concept and covers no positive examples. It may be represented as h = <∅, ∅, ..., ∅>, where ∅ denotes that no value is acceptable for that attribute.
Iterative Process − The algorithm iterates through each training example and refines
the hypothesis based on whether the example is positive or negative.
o For each positive training example (an example labeled as the target class), the
algorithm updates the hypothesis by generalizing it to include the attributes of
the example. The hypothesis becomes more general as it covers more positive
examples.
o For each negative training example (an example labeled as a non-target class),
the algorithm ignores it as the hypothesis should not cover negative examples.
The hypothesis remains unchanged for negative examples.
Generalization − After processing all the training examples, the algorithm produces
a final hypothesis that covers all positive examples while excluding negative
examples. This final hypothesis represents the generalized concept that the algorithm
has learned from the training data.
Suppose, we have a dataset of animals with two attributes: "has fur" and "makes sound."
Each animal is labeled as either a dog or a cat. Here is a sample training dataset.
To apply the Find-S algorithm, we start with the most specific hypothesis, denoted as h,
which initially represents the most restrictive concept. In our example, the initial
hypothesis would be h = <∅, ∅>, indicating that no specific animal matches the concept.
For each positive training example (an example labeled as the target class), we update
the hypothesis h to include the attributes of that example. In our case, the positive
training examples are dogs. Therefore, h would be updated to h = <Yes, Yes>.
For each negative training example (an example labeled as a non-target class), we
ignore it as the hypothesis h should not cover those examples. In our case, the
negative training examples are cats, and since h already covers dogs, we don't need
to update the hypothesis.
After processing all the training examples, we obtain a generalized hypothesis that
covers all positive training examples and excludes negative examples. In our
example, the final hypothesis h = <Yes, Yes> accurately represents the concept of a
dog.
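The example above can be reproduced with a few lines of Python. This is a hedged sketch: the actual training table was shown as a figure, so the rows below are a plausible reconstruction consistent with the text ("dog" is the target class).

# Sketch of Find-S on the fur/sound example (attribute order: has fur, makes sound).
def find_s(examples):
    h = None                                           # the most specific hypothesis <∅, ∅>
    for attributes, label in examples:
        if label != "dog":
            continue                                   # negative examples are ignored
        if h is None:
            h = list(attributes)                       # first positive example: copy it
        else:
            h = [hv if hv == av else "?"               # minimally generalize mismatches
                 for hv, av in zip(h, attributes)]
    return h

data = [(("Yes", "Yes"), "dog"),
        (("Yes", "Yes"), "dog"),
        (("Yes", "No"), "cat"),
        (("No", "Yes"), "cat")]
print(find_s(data))                                    # -> ['Yes', 'Yes']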
Terms Used:
Concept learning: the learning task of the machine (learning from training data).
General Hypothesis: does not specify the features required to learn the concept (the most general hypothesis).
G = {'?', '?', '?', '?', …}: one '?' per attribute.
Specific Hypothesis: specifies the features required to learn the concept (specific features).
S = {'∅', '∅', '∅', …}: the number of ∅ values depends on the number of attributes.
Version Space: an intermediate between the general hypothesis and the specific hypothesis. It contains not just one hypothesis but the set of all possible hypotheses consistent with the training data set.
Initially : G = [[?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?],
[?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?]]
S = [Null, Null, Null, Null, Null, Null]
Advantages of CEA over Find-S:
1. Improved accuracy: CEA considers both positive and negative examples to generate the hypothesis, which can result in higher accuracy when dealing with noisy or incomplete data.
2. Flexibility: CEA can handle more complex classification tasks, such as those with
multiple classes or non-linear decision boundaries.
3. More efficient: CEA reduces the number of hypotheses by generating a set of general
hypotheses and then eliminating them one by one. This can result in faster processing
and improved efficiency.
Disadvantages of CEA:
1. More complex: CEA is a more complex algorithm than Find-S, which may make it more difficult for beginners or those without a strong background in machine learning to use and understand.
2. Higher memory requirements: CEA requires more memory to store the set of
hypotheses and boundaries, which may make it less suitable for memory-constrained
environments.
3. Slower processing for large datasets: CEA may become slower for larger datasets
due to the increased number of hypotheses generated.
4. Higher potential for overfitting: The increased complexity of CEA may make it more
prone to overfitting on the training data, especially if the dataset is small or has a
high degree of noise.
1.1 Version Spaces
Definition (Version space). The version space VS_{H,D} is the set of all hypotheses in H consistent with the training examples D: each such hypothesis is complete (it covers all positive examples) and consistent (it covers no negative examples).
1.7.1 Representation
The Candidate-Elimination algorithm finds all describable hypotheses that are consistent with the observed training examples. In order to define this algorithm precisely, we begin with a few basic definitions. First, we say that a hypothesis is consistent with the training examples if it correctly classifies these examples.
Version space representation theorem:
VS_{H,D} = { h ∈ H | (∃s ∈ S)(∃g ∈ G) such that g ≥_g h ≥_g s }
where x ≥_g y means x is more general than or equal to y.
To Prove:
1. Every h satisfying the right-hand side of the above expression is in VS_{H,D}.
2. Every member of VS_{H,D} satisfies the right-hand side of the expression.
Sketch of proof:
1. Let g, h, s be arbitrary members of G, H, S respectively, with g ≥_g h ≥_g s. By the definition of S, s must be satisfied by all positive examples in D. Because h ≥_g s, h must also be satisfied by all positive examples in D. By the definition of G, g cannot be satisfied by any negative example in D, and because g ≥_g h, h cannot be satisfied by any negative example in D. Because h is satisfied by all positive examples in D and by no negative examples in D, h is consistent with D, and therefore h is a member of VS_{H,D}.
2. This can be proven by assuming some h in VS_{H,D} that does not satisfy the right-hand side of the expression, and then showing that this leads to a contradiction.
1.7.3 CANDIDATE-ELIMINATION Learning Algorithm
Initialize G to the set of maximally general hypotheses in H.
Initialize S to the set of maximally specific hypotheses in H.
For each training example d, do:
• If d is a positive example
• Remove from G any hypothesis inconsistent with d
• For each hypothesis s in S that is not consistent with d
• Remove s from S
• Add to S all minimal generalizations h of s such that h is consistent with d, and some member of G is more general than h
• Remove from S any hypothesis that is more general than another hypothesis in S
• If d is a negative example
• Remove from S any hypothesis inconsistent with d
• For each hypothesis g in G that is not consistent with d
• Remove g from G
• Add to G all minimal specializations h of g such that h is consistent with d, and some member of S is more specific than h
• Remove from G any hypothesis that is less general than another hypothesis in G
Example:
Initialize the S boundary set to contain the most specific (least general) hypothesis:
S0 = ⟨∅, ∅, ∅, ∅, ∅, ∅⟩
Given that there are six attributes that could be specified to specialize G2, why are there only three new hypotheses in G3? For example, the hypothesis h = ⟨?, ?, Normal, ?, ?, ?⟩ is a minimal specialization of G2 that correctly labels the new example as a negative example, but it is not included in G3. The reason this hypothesis is excluded is that it is inconsistent with the previously encountered positive examples.
Consider the fourth training example.
Example 2:
Time Weather Temp Company Humidity Wind Goes
Morning Sunny Warm Yes Mild Strong Yes
Evening Rainy Cold No Mild Normal No
Morning Sunny Moderate Yes Normal Normal Yes
Evening Sunny Cold Yes High Strong Yes
Attributes:
Time, weather, temp, company, humidity, wind
Target: goes for walk or not
+ve yes -ve no
Specific hypothesis (S)
General hypothesis (G)
S0 = [Null, Null, Null, Null, Null, Null]
G0 = [?, ?, ?, ?, ?, ?]
Step 1:
X0 = ⟨morning, sunny, warm, yes, mild, strong⟩ {+ve}
Update the specific hypothesis:
S1 = {morning, sunny, warm, yes, mild, strong}
G1 = [?, ?, ?, ?, ?, ?]
Step 2:
X1 = ⟨evening, rainy, cold, no, mild, normal⟩ {−ve instance}
Update the general hypothesis:
S2 = {morning, sunny, warm, yes, mild, strong} (S1 and S2 are the same)
G2 = {morning, ?, ?, ?, ?, ?}, {?, sunny, ?, ?, ?, ?}, {?, ?, warm, ?, ?, ?}, {?, ?, ?, yes, ?, ?}, {?, ?, ?, ?, ?, strong}
Step 3:
X2 = ⟨morning, sunny, moderate, yes, normal, normal⟩ {+ve instance, so update the specific hypothesis}
S3 = {morning, sunny, ?, yes, ?, ?}
G3 = {morning, ?, ?, ?, ?, ?}, {?, sunny, ?, ?, ?, ?}, {?, ?, ?, yes, ?, ?}
Step 4:
X3 = ⟨evening, sunny, cold, yes, high, strong⟩ {+ve, update the specific hypothesis}
S4 = {?, sunny, ?, yes, ?, ?}
G4 = {?, sunny, ?, ?, ?, ?}, {?, ?, ?, yes, ?, ?}
Final result:
Specific hypothesis = {?, sunny, ?, yes, ?, ?}
General hypothesis = {?, sunny, ?, ?, ?, ?}, {?, ?, ?, yes, ?, ?}
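The trace above can be reproduced programmatically. Below is a simplified Python sketch of Candidate Elimination for conjunctive hypotheses; it omits the boundary-pruning steps of the full algorithm, which this small example does not need, and uses '0' to stand for ∅.

# Simplified Candidate Elimination sketch reproducing Example 2.
def consistent(h, x):
    # A hypothesis matches an instance if every constraint is '?' or equal.
    return all(hv == '?' or hv == xv for hv, xv in zip(h, x))

def candidate_elimination(examples):
    n = len(examples[0][0])
    S = ['0'] * n                                      # maximally specific boundary
    G = [['?'] * n]                                    # maximally general boundary
    for x, label in examples:
        if label == 'yes':                             # positive: generalize S, prune G
            G = [g for g in G if consistent(g, x)]
            S = list(x) if S == ['0'] * n else [
                sv if sv == xv else '?' for sv, xv in zip(S, x)]
        else:                                          # negative: specialize G
            new_G = []
            for g in G:
                if not consistent(g, x):
                    new_G.append(g)
                    continue
                for i in range(n):                     # minimal specializations excluding x
                    if g[i] == '?' and S[i] not in ('?', x[i]):
                        h = list(g)
                        h[i] = S[i]
                        new_G.append(h)
            G = new_G
    return S, G

data = [(('morning', 'sunny', 'warm', 'yes', 'mild', 'strong'), 'yes'),
        (('evening', 'rainy', 'cold', 'no', 'mild', 'normal'), 'no'),
        (('morning', 'sunny', 'moderate', 'yes', 'normal', 'normal'), 'yes'),
        (('evening', 'sunny', 'cold', 'yes', 'high', 'strong'), 'yes')]
S, G = candidate_elimination(data)
print(S)   # ['?', 'sunny', '?', 'yes', '?', '?']  (matches S4)
print(G)   # [['?', 'sunny', '?', '?', '?', '?'], ['?', '?', '?', 'yes', '?', '?']]  (matches G4)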
DECISION TREE:
A decision tree is a type of supervised learning algorithm that is
commonly used in machine learning to model and predict outcomes based on
input data.
It is a tree-like structure where each internal node tests an attribute, each branch corresponds to an attribute value, and each leaf node represents the final decision or prediction. The decision tree algorithm falls under the category of supervised learning. Decision trees can be used to solve both regression and classification problems.
Decision Tree Terminologies:
Root Node: The root node is the topmost node of a decision tree; it represents the initial choice or feature from which the tree branches.
Internal Nodes (Decision Nodes): Nodes in the tree whose branching is determined by the values of particular attributes. These nodes have branches leading to other nodes.
Branches (Edges): Links between nodes that show how decisions are
made in response to particular circumstances.
Parent Node: A node that is split into child nodes. The original node from
which a split originates.
Decision Criterion: The rule or condition used to determine how the data
should be split at a decision node. It involves comparing feature values
against a threshold.
Decision trees use the tree representation to solve the problem: each leaf node corresponds to a class label, and attributes are represented on the internal nodes of the tree. We can represent any Boolean function on discrete attributes using a decision tree.
Below are some assumptions that we made while using the decision tree:
At the beginning, we consider the whole training set as the root.
Feature values are preferred to be categorical. If the values are continuous
then they are discretized prior to building the model.
On the basis of attribute values, records are distributed recursively.
We use statistical methods for ordering attributes as the root or internal nodes.
A decision tree works on the Sum of Products form, which is also known as Disjunctive Normal Form. (The accompanying figure, not reproduced here, predicts the use of computers in people's daily lives.) In a decision tree, the major challenge is identifying the attribute for the root node at each level. This process is known as attribute selection. We have two popular attribute selection measures:
1. Information Gain
2. Gini Index
Information Gain:
When we use a node in a decision tree to partition the training instances into
smaller subsets the entropy changes. Information gain is a measure of this
change in entropy.
Suppose S is a set of instances, A is an attribute, Sv is the subset of S for which attribute A has value v, and Values(A) is the set of all possible values of A. Then:
Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|Sv| / |S|) · Entropy(Sv)
Entropy is the measure of uncertainty of a random variable; it characterizes the impurity of an arbitrary collection of examples. The higher the entropy, the greater the information content. For a collection S containing positive and negative examples:
Entropy(S) = −p₊ log₂ p₊ − p₋ log₂ p₋
where p₊ is the proportion of positive examples in S and p₋ is the proportion of negative examples.
Example:
For the set X = {a, a, a, b, b, b, b, b}:
Total instances: 8; instances of b: 5; instances of a: 3
Entropy(X) = −(3/8) log₂(3/8) − (5/8) log₂(5/8) ≈ 0.954
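The same number follows from a few lines of Python (only the standard library is needed):

# Entropy of X = {a,a,a,b,b,b,b,b} computed directly from the definition.
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

print(entropy([3, 5]))   # ≈ 0.954 bits for 3 a's and 5 b's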
ID3 ALGORITHM:
The ID3 algorithm builds the decision tree top-down, starting with the root node, which represents the entire dataset. At each node, it selects the attribute that provides the most information gain about the target variable.
The attribute with the highest information gain is the one that best
separates the data points into different categories.
Information gain:
Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|Sv| / |S|) · Entropy(Sv)
What are the steps in the ID3 algorithm?
1. Determine the entropy of the overall dataset using the class distribution.
2. For each attribute, compute the information gain obtained by splitting on it.
3. Select the attribute with the highest information gain as the decision node and partition the dataset by its values.
4. Iteratively apply all the above steps to the partitions to build the decision tree structure.
EXAMPLE (the standard PlayTennis data set: 14 examples, 9 positive and 5 negative):
Calculate Entropy(S):
Entropy(S) = −p₊ log₂ p₊ − p₋ log₂ p₋
where p₊ is the proportion of positive examples in S and p₋ is the proportion of negative examples in S.
Entropy(S) = −(9/14) log₂(9/14) − (5/14) log₂(5/14) = 0.940
STEP 1:
Attribute: outlook
Values(outlook) = sunny, overcast, rain
1. S_sunny [2+, 3−]
Entropy(S_sunny) = −(2/5) log₂(2/5) − (3/5) log₂(3/5) = 0.971
2. S_overcast [4+, 0−]
Entropy(S_overcast) = −(4/4) log₂(4/4) − (0/4) log₂(0/4) = 0
3. S_rain [3+, 2−]
Entropy(S_rain) = −(3/5) log₂(3/5) − (2/5) log₂(2/5) = 0.971
Gain(S, outlook) = Entropy(S) − (5/14)·Entropy(S_sunny) − (4/14)·Entropy(S_overcast) − (5/14)·Entropy(S_rain)
= 0.940 − (5/14)(0.971) − (4/14)(0) − (5/14)(0.971)
= 0.246
STEP 2:
Attribute: temperature
Values(temp) = hot, mild, cool
1. S_hot [2+, 2−]
Entropy(S_hot) = −(2/4) log₂(2/4) − (2/4) log₂(2/4) = 1
2. S_mild [4+, 2−]
Entropy(S_mild) = −(4/6) log₂(4/6) − (2/6) log₂(2/6) = 0.9183
3. S_cool [3+, 1−]
Entropy(S_cool) = −(3/4) log₂(3/4) − (1/4) log₂(1/4) = 0.8113
Gain(S, temp) = 0.940 − (4/14)(1) − (6/14)(0.9183) − (4/14)(0.8113) = 0.029
STEP 3:
Attribute: humidity
Values(humidity) = high, normal
1. S_high [3+, 4−]
Entropy(S_high) = −(3/7) log₂(3/7) − (4/7) log₂(4/7) = 0.9852
2. S_normal [6+, 1−]
Entropy(S_normal) = −(6/7) log₂(6/7) − (1/7) log₂(1/7) = 0.5916
Gain(S, humidity) = 0.940 − (7/14)(0.9852) − (7/14)(0.5916) = 0.152
STEP 4:
Attribute: wind
Values(wind) = strong, weak
1. S_strong [3+, 3−]
Entropy(S_strong) = −(3/6) log₂(3/6) − (3/6) log₂(3/6) = 1
2. S_weak [6+, 2−]
Entropy(S_weak) = −(6/8) log₂(6/8) − (2/8) log₂(2/8) = 0.8113
Gain(S, wind) = 0.940 − (6/14)(1) − (8/14)(0.8113) = 0.048
Since outlook has the highest information gain (0.246), it is selected as the root node. The overcast branch is pure (entropy 0), so it becomes a leaf labeled "yes"; the sunny and rain branches are expanded further.
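The arithmetic above can be double-checked with a short script working from the (positive, negative) counts per attribute value; the counts below are those of the 14-example data set used here.

# Verify the information-gain computations from class counts.
from math import log2

def entropy(p, n):
    total = p + n
    return -sum(c / total * log2(c / total) for c in (p, n) if c > 0)

def gain(splits):                        # splits: one (positive, negative) pair per value
    P = sum(p for p, _ in splits)
    N = sum(n for _, n in splits)
    return entropy(P, N) - sum((p + n) / (P + N) * entropy(p, n) for p, n in splits)

print(round(gain([(2, 3), (4, 0), (3, 2)]), 3))   # outlook     -> 0.247 (0.246 above, rounding)
print(round(gain([(2, 2), (4, 2), (3, 1)]), 3))   # temperature -> 0.029
print(round(gain([(3, 4), (6, 1)]), 3))           # humidity    -> 0.152
print(round(gain([(3, 3), (6, 2)]), 3))           # wind        -> 0.048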
Now expand the sunny branch, where S_sunny = [2+, 3−] and Entropy(S_sunny) = 0.97.
STEP 1:
Attribute: temperature
Values(temp) = hot, mild, cool
1. S_hot [0+, 2−]: Entropy(S_hot) = 0
2. S_mild [1+, 1−]: Entropy(S_mild) = 1
3. S_cool [1+, 0−]: Entropy(S_cool) = 0
Gain(S_sunny, temp) = Entropy(S_sunny) − (2/5)·Entropy(S_hot) − (2/5)·Entropy(S_mild) − (1/5)·Entropy(S_cool)
= 0.97 − (2/5)(0) − (2/5)(1) − (1/5)(0)
= 0.570
STEP 2:
Attribute: humidity
Values(humidity) = high, normal
1. S_high [0+, 3−]: Entropy(S_high) = 0
2. S_normal [2+, 0−]: Entropy(S_normal) = 0
Gain(S_sunny, humidity) = 0.97 − (3/5)(0) − (2/5)(0) = 0.97
STEP 3:
Attribute: wind
Values(wind) = strong, weak
1. S_strong [1+, 1−]: Entropy(S_strong) = 1
2. S_weak [1+, 2−]: Entropy(S_weak) = −(1/3) log₂(1/3) − (2/3) log₂(2/3) = 0.9183
Gain(S_sunny, wind) = Entropy(S_sunny) − (2/5)·Entropy(S_strong) − (3/5)·Entropy(S_weak)
= 0.97 − (2/5)(1) − (3/5)(0.9183)
= 0.0192
Summary for the sunny branch:
Gain(S_sunny, temp) = 0.570
Gain(S_sunny, humidity) = 0.97
Gain(S_sunny, wind) = 0.0192
Humidity has the highest gain, so it becomes the decision node under the sunny branch; both of its branches are pure (high → no, normal → yes).
Advantages of the Decision Tree:
1. It is simple to understand, as it follows the same process a human follows while making a decision in real life.
2. It can be very useful for solving decision-related problems.
3. It helps to think about all the possible outcomes for a problem.
4. There is less requirement of data cleaning compared to other algorithms.
Disadvantages of the Decision Tree:
1. The decision tree contains lots of layers, which makes it complex.
2. It may have an overfitting issue, which can be resolved using the Random Forest algorithm.
3. For more class labels, the computational complexity of the decision tree may
increase.
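As a sketch of that remedy (assuming scikit-learn and its Iris data, as in the earlier examples; the hyperparameters are illustrative), a random forest averages many decision trees to curb a single tree's overfitting:

# Sketch: a random forest as an ensemble remedy for decision-tree overfitting.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(forest, X, y, cv=5).mean())   # mean accuracy over 5 folds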