Ch13. Decision Tree: KH Wong
Decision Tree
KH Wong
4 buses
3 cars
3 trains
Total 10 samples
https://www.saedsayad.com/decision_tree.htm
Method 1) Split metric: Entropy(Parent) = entropy at the top level
Entropy(parent) = -Σ_i p_i log_2(p_i)
• Prob(bus) =4/10=0.4
• Prob(car) =3/10=0.3
• Prob(train)=3/10=0.3
– Entropy(parent)= -0.4*log_2(0.4)- 0.3*log_2(0.3)-0.3*log_2(0.3)
=1.571
– (note:log_2 is log base 2.)
• Another example: if P(bus)=1, P(car)=0, P(train)=0
– Entropy = -1*log_2(1) - 0*log_2(0) - 0*log_2(0) = 0 (by convention, 0*log_2(0) = 0)
– Entropy = 0, so the node is very pure; the impurity is 0
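Below is a minimal Python sketch that reproduces the two entropy calculations above; the helper name entropy_bits is my own, not from the slides:

import math

def entropy_bits(probs):
    # Entropy in bits: -sum(p*log2(p)); zero probabilities contribute nothing by convention
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy_bits([0.4, 0.3, 0.3]))  # ~1.571, the bus/car/train example
print(entropy_bits([1.0, 0.0, 0.0]))  # 0.0, a pure node (impurity 0)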
Method 2) Split metric: Gini index
• Prob(bus) =4/10=0.4
• Prob(car) =3/10=0.3
• Prob(train)=3/10=0.3
– Gini index =1-(0.4*0.4+0.3*0.3+0.3*0.3)=____ ?
• Another example: if the class contains only buses, P(bus)=1, P(car)=0, P(train)=0
– Gini Impurity index= ____?
– Impurity is____ ?
• Prob(bus) =4/10=0.4
• Prob(car)=3/10=0.3
• Prob(train)=3/10=0.3
• Gini index =1-(0.4*0.4+0.3*0.3+0.3*0.3)= 0.66
• Another example: if the class contains only buses, P(bus)=1, P(car)=0, P(train)=0
– Gini Impurity index= 1-1*1-0*0-0*0=0
– Impurity is 0
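A matching sketch for the Gini impurity (the helper name gini_impurity is my own):

def gini_impurity(probs):
    # Gini impurity: 1 - sum(p^2); 0 means a pure node
    return 1 - sum(p * p for p in probs)

print(gini_impurity([0.4, 0.3, 0.3]))  # 0.66
print(gini_impurity([1.0, 0.0, 0.0]))  # 0.0, impurity is 0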
Exercise 3: Train
• If the first 2 rows are not bus but train, find entropy and Gini index
• Prob(bus) =2/10=0.2
• Prob(car)=3/10=0.3
• Prob(train)=5/10=0.5
• Entropy =_______________________________?
• Gini index =_____________________________?
Entropy = -Σ_i p_i log_2(p_i),   Gini_index = 1 - Σ_i p_i^2
ANSWER 3: Train
• If the first 2 rows are not bus but train, find entropy and Gini index
• Prob(bus) =2/10=0.2
• Prob(car)=3/10=0.3
• Prob(train)=5/10=0.5
• Entropy = -0.2*log_2(0.2) - 0.3*log_2(0.3) - 0.5*log_2(0.5) = 1.485
• Gini index =1-(0.2*0.2+0.3*0.3+0.5*0.5)= 0.62
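These answers can be checked with the helper functions sketched earlier (entropy_bits and gini_impurity, assumed names):

print(entropy_bits([0.2, 0.3, 0.5]))   # ~1.485
print(gini_impurity([0.2, 0.3, 0.5]))  # 0.62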
Method 3) Split metric: Variance reduction
https://www.casact.org/education/specsem/f2005/handouts/cart.ppt
Example 1: Design a decision tree
Day Outlook Temp. Humidity Wind Decision
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
3 Overcast Hot High Weak Yes
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
7 Overcast Cool Normal Strong Yes
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
10 Rain Mild Normal Weak Yes
11 Sunny Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes
14 Rain Mild High Strong No
https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Gini index or Information gain approach
• Gini index
• The Gini index is a metric for classification tasks in CART.
• Gini_index = 1 - Σ_i (p_i^2), for i = 1 to the number of classes
– Note: (two approaches will be shown in the following slides)
– Method 1:
– Gini index: split using the attribute whose Gini (impurity) index is the lowest: Gini_index = 1 - Σ_i p_i^2
– Or
– Method 2:
– Information gain (based on entropy): split using the attribute whose information gain is the highest:
• Information gain (IG) = Entropy(parent) - weighted entropy of the children
• IG = [-Σ_i p_parent,i log_2(p_parent,i)] - Σ_children w_child [-Σ_i p_child,i log_2(p_child,i)]
Outlook    Yes  No  Number of instances
Sunny      2    3   5
Overcast   4    0   4
Rain       3    2   5
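A small sketch (helper name gini_from_counts is my own) of the weighted Gini index for the Outlook feature, using the yes/no counts in the table above and the Method 1 rule described earlier:

def gini_from_counts(counts):
    # Gini impurity of a node from raw class counts
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

# (yes, no) counts per Outlook value, taken from the table above
outlook = {"Sunny": (2, 3), "Overcast": (4, 0), "Rain": (3, 2)}
n = sum(sum(c) for c in outlook.values())  # 14 instances in total
gini_outlook = sum(sum(c) / n * gini_from_counts(c) for c in outlook.values())
print(gini_outlook)  # ~0.343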
Exercise 4:Temperature
GINI index approach
Similarly, temperature is a nominal feature and it can take 3 different values: Cool, Hot and
Mild. Let's summarize the decisions for the temperature feature.
Gini(Temp=Hot) = __________________________________?
Gini(Temp=Cool) __________________________________?
Gini(Temp=Mild) __________________________________?
We’ll calculate the weighted sum of the Gini index for the temperature feature
Gini(Temp) =____________________________________________?
Information gain by entropy approach: Overall decision: yes=9, no=5
Parent entropy= -(9/14)*log_2(9/14)-(5/14)*log_2(5/14)=0.94 (same as last page)
Temeprature is a feature. It can be Hot, cool, mild
Weighted_entropy(Temp=Hot) =(4/14)*( -(2/4)*log_2(2/4)- (2/4)*log_2(2/4))=0.286
Weighted_entropy(Temp=Cool) =(4/14)*( -(3/4)*log_2(3/4)- (1/4)*log_2(1/4))=0.232
Weighted_entropy(Temp=Mild) =(6/14)*( -(4/6)*log_2(4/6)- (2/6)*log_2(2/6))=0.394
Information_gain_for_temperature = Parent entropy - Weighted_entropy(Temp=Hot) - Weighted_entropy(Temp=Cool) - Weighted_entropy(Temp=Mild) = 0.94 - 0.286 - 0.232 - 0.394 = 0.028
Temperature  Yes  No  Number of instances
Hot 2 2 4
Cool 3 1 4
Mild 4 2 6
ANSWER 4:Temperature
GINI index approach
Similarly, temperature is a nominal feature and it can take 3 different values: Cool, Hot and
Mild. Let's summarize the decisions for the temperature feature.
Gini(Temp=Hot) = 1-(2/4)^2- (2/4)^2 = 0.5
Gini(Temp=Cool) = 1-(3/4)^2-(1/4)^2 = 0.375
Gini(Temp=Mild) = 1-(4/6)^2-(2/6)^2 = 0.445
We’ll calculate the weighted sum of the Gini index for the temperature feature
Gini(Temp) =(4/14) *0.5 +(4/14)*0.375+(6/14)*0.445= 0.439
Information gain by entropy approach: Overall decision: yes=9, no=5
Parent entropy= -(9/14)*log_2(9/14)-(5/14)*log_2(5/14)=0.94 (same as last page)
Temperature is a feature. It can be Hot, Cool or Mild.
Weighted_entropy(Temp=Hot) =(4/14)*( -(2/4)*log_2(2/4)- (2/4)*log_2(2/4))=0.286
Weighted_entropy(Temp=Cool) =(4/14)*( -(3/4)*log_2(3/4)- (1/4)*log_2(1/4))=0.232
Weighted_entropy(Temp=Mild) =(6/14)*( -(4/6)*log_2(4/6)- (2/6)*log_2(2/6))=0.394
Information_gain_for_temperature = Parent entropy - Weighted_entropy(Temp=Hot) - Weighted_entropy(Temp=Cool) - Weighted_entropy(Temp=Mild) = 0.94 - 0.286 - 0.232 - 0.394 = 0.028
Temperature  Yes  No  Number of instances
Hot 2 2 4
Cool 3 1 4
Mild 4 2 6
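The temperature figures can be re-derived with a counts-based entropy helper (names are my own); the small difference from 0.028 comes from rounding the intermediate entropies on the slide:

import math

def entropy_from_counts(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

temp = {"Hot": (2, 2), "Cool": (3, 1), "Mild": (4, 2)}  # (yes, no) per value
parent = entropy_from_counts((9, 5))                    # ~0.940
weighted = sum(sum(c) / 14 * entropy_from_counts(c) for c in temp.values())
print(parent - weighted)                                # ~0.029 information gain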
Humidity
GINI index approach
Humidity is a binary class feature. It can be high or normal.
Gini(Humidity=High) = 1 – (3/7)^2 – (4/7)^2 = 1 – 0.184 – 0.327 = 0.489
Gini(Humidity=Normal) = 1 – (6/7)^2 – (1/7)^2 = 1 – 0.735 – 0.020 = 0.245
The weighted sum for the humidity feature is calculated next:
Gini(Humidity) = (7/14) x 0.489 + (7/14) x 0.245 = 0.367
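Using the counts-based helper sketched earlier (gini_from_counts, an assumed name), the humidity figures check out:

print(gini_from_counts((3, 4)))  # Gini(Humidity=High)   ~0.49
print(gini_from_counts((6, 1)))  # Gini(Humidity=Normal) ~0.245
print(7/14 * gini_from_counts((3, 4)) + 7/14 * gini_from_counts((6, 1)))  # ~0.367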
All decisions are "yes", so the branch for "Overcast" is complete.
You might notice that the sub-dataset in the overcast branch contains only "yes" decisions. This means the overcast branch ends here as a leaf.
Top of the tree
Feature        Gini index    Information gain by entropy
Temperature    0.2           0.57
Humidity       0 (lowest)    0.97 (highest)
Wind           0.466         0.019
Result
• Both results agree with each other.
• Humidity is picked as the second-level node.
(Decision-tree diagram: root attribute "Weather" with branches Sunny, Cloudy and Rainy, each leading to yes/no outcomes.)
• Step 1: if the root is attribute "weather" and the branch is "Sunny", find the split metric (M_sunny)
• Step 2: if the root is attribute "weather" and the branch is "Cloudy", find the split metric (M_cloudy)
• Step 3: if the root is attribute "weather" and the branch is "Rainy", find the split metric (M_rainy)
For step 1: find M_sunny
• G1=Gini=1- ((N1y/M1)^2+(N1n/M1)^2)
• = 1-((0/2)^2+(2/2)^2)=0
• Metric_sunny=G1 or E1
Gini_index = 1 - Σ_i p_i^2
For step 2: find M_cloudy and Weight_cloudy
• N=Number of samples=9
• M2=Number of cloudy cases=4
• W2=Weight_cloudy=M2/N=4/9
• G2=Gini=1- ((N2y/M2)^2+(N2n/M2)^2)
• = 1-((2/4)^2+(2/4)^2)=0.5
• Metric_cloudy=G2 or E2
Gini_index = 1 - Σ_i p_i^2
For step 3: find M_rainy and Weight_rainy
• N=Number of samples=9
• M3=Number of rainy cases=3
• W3=Weight_rainy=M3/N=3/9
• G3=Gini=1- ((N3y/M3)^2+(N3n/M3)^2)
• = 1-((1/3)^2+(2/3)^2)=0.444
• Metric_rainy=G3 or E3
Gini_index = 1 - Σ_i p_i^2
For the "driving" attribute: find M_driving and Weight_driving
• N=Number of samples=9
• M5=Number of driving cases=9
• W5=Weight_driving=M5/N=9/9
• G5=Gini=1- ((N5y/M5)^2+(N5n/M5)^2)
• = 1-((3/9)^2+(6/9)^2)=0.444
• Metric_driving=G5 or E5
Gini_index = 1 - Σ_i p_i^2
Step6: metric for driving
• driving_split_metric= driving_sunny*M_yes+
driving_cloudy*M_no
• driving_split_metric_Gini=
W5*G5=(9/9)*0.444= 0.444
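As a sketch of how the per-branch metrics would be combined, assuming the same weighted-sum rule used for driving above (W5*G5), the weather attribute's metric is the weighted sum of G1, G2 and G3 from steps 1-3; the attribute with the lower weighted Gini would be preferred for the split:

# weather: branch weights (2/9, 4/9, 3/9) and the Gini values from steps 1-3
weather_metric = (2/9) * 0.0 + (4/9) * 0.5 + (3/9) * 0.444
# driving: a single branch covering all 9 samples (step 6)
driving_metric = (9/9) * 0.444
print(weather_metric, driving_metric)  # ~0.370 vs 0.444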
(Tree diagram: the "Driving" split leads to leaves "umbrella", "no umbrella" and "not sure".)
• One leaf holds a mixed, tiny sample (2 of one class, 1 of the other; 3 samples in total). It cannot be resolved, but the sample is too small, so we can ignore it.
https://stackoverflow.com/questions/19993139/can-splitting-attribute-appear-many-times-in-decision-tree
Selection
http://dni-institute.in/blogs/cart-algorithm-for-decision-tree/
http://people.revoledu.com/kardi/tutorial/DecisionTree/how-decision-tree-algorithm-work.htm
Overfitting
References: https://www.investopedia.com/terms/o/overfitting.asp
https://en.wikipedia.org/wiki/Pruning_(decision_trees)
http://mlwiki.org/index.php/Cost-Complexity_Pruning
Post-pruning using Error estimation
Defining terms
• For the whole dataset: use about 70% for training data; 30% for testing (for pruning and cross-validation). http://mlwiki.org/index.php/Cross-Validation
• Choose examples for training/testing sets randomly
• Training data is used to construct the decision tree (will be pruned)
• Testing data is used for pruning
• f= Error on training data
• N= number of instances covered by the leaves
• z = z-score of a normal distribution, set by the chosen confidence level. https://en.wikipedia.org/wiki/Standard_normal_table
• e = estimated error on unseen (test) data, calculated from f, N and z
For f = 5/14, it means 5 of the 14 samples are misclassified.
Note: (6/14)*0.47 + (2/14)*0.72 + (6/14)*0.47 = 0.5057
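A hedged sketch of the error estimate e computed from f, N and z, assuming the C4.5-style pessimistic (upper-bound) formula that these definitions suggest; the z value (~0.69, i.e. a 25% confidence level) and the per-leaf error rates (2/6, 1/2, 2/6) are assumptions chosen to reproduce the 0.47/0.72/0.47 figures in the note above:

import math

def estimated_error(f, N, z=0.69):
    # Upper confidence bound on the true error rate, given observed error rate f over N instances
    return (f + z*z / (2*N) + z * math.sqrt(f/N - f*f/N + z*z / (4*N*N))) / (1 + z*z / N)

leaf_errors = [estimated_error(f, N) for f, N in [(2/6, 6), (1/2, 2), (2/6, 6)]]
print([round(e, 2) for e in leaf_errors])   # ~[0.47, 0.72, 0.47]
print(estimated_error(5/14, 14))            # parent-node estimate for f = 5/14; if it is below
                                            # the weighted leaf sum (0.5057), the subtree is pruned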
https://www.rapidtables.com/math/probability/normal_distribution.html
Conclusion
• We studied how to build a decision tree
• We learned the method of splitting using Gini
index
• We learned the method of splitting using
information gain by entropy
• We learned the idea of pruning to improve
classification and solve the overfitting problem
Iris dataset example
http://scikit-learn.org/stable/auto_examples/tree/plot_iris.html#sphx-glr-auto-examples-tree-plot-iris-py
https://machinelearningmastery.com/implement-decision-tree-algorithm-scratch-python/

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Parameters
n_classes = 3
plot_colors = "ryb"
plot_step = 0.02

# Load the iris data
iris = load_iris()

for pairidx, pair in enumerate([[0, 1], [0, 2], [0, 3],
                                [1, 2], [1, 3], [2, 3]]):
    # We only take the two corresponding features
    X = iris.data[:, pair]
    y = iris.target

    # Train
    clf = DecisionTreeClassifier().fit(X, y)

    # Plot the decision boundary on a mesh grid over the two features
    plt.subplot(2, 3, pairidx + 1)
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, plot_step),
                         np.arange(y_min, y_max, plot_step))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    cs = plt.contourf(xx, yy, Z, cmap=plt.cm.RdYlBu)
    plt.xlabel(iris.feature_names[pair[0]])
    plt.ylabel(iris.feature_names[pair[1]])

plt.show()
MATLAB code
%Information gain for each feature of the Example 1 dataset
parent_en=entropy_cal([9,5]) %overall decision: 9 yes, 5 no

%humidity-------------------
en1=entropy_cal([3,4]) %high: 3 yes, 4 no
en2=entropy_cal([6,1]) %normal: 6 yes, 1 no
Information_gain(1)=parent_en-(7/14)*en1-(7/14)*en2
clear en1 en2

%outlook------------------
en1=entropy_cal([3,2]) %sunny (5 instances)
en2=entropy_cal([4,0]) %overcast (4 instances)
en3=entropy_cal([2,3]) %rain (5 instances)
Information_gain(2)=parent_en-(5/14)*en1-(4/14)*en2-(5/14)*en3
clear en1 en2 en3

%wind -------------------------
en1=entropy_cal([6,2]) %weak: 6 yes, 2 no
en2=entropy_cal([3,3]) %strong: 3 yes, 3 no
Information_gain(3)=parent_en-(8/14)*en1-(6/14)*en2
clear en1 en2

%temperature -------------------------
en1=entropy_cal([2,2]) %hot: 2 yes, 2 no
en2=entropy_cal([3,1]) %cool: 3 yes, 1 no
en3=entropy_cal([4,2]) %mild: 4 yes, 2 no
Information_gain(4)=parent_en-(4/14)*en1-(4/14)*en2-(6/14)*en3
clear en1 en2 en3
Information_gain
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [en]=entropy_cal(e)
n=length(e);
base=sum(e);

%% probability of the elements in the input
for i=1:n
    p(i)=e(i)/base;
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
temp=0;
for i=1:n
    if p(i)~=0 %skip zero probabilities to avoid -inf
        temp=p(i)*log2(p(i))+temp;
    end
end
en=-temp;
• Root node ?
If attribute X=Raining