Ch13. Decision Tree: KH Wong



We will learn: the Classification and Regression Tree (CART), also called the Decision Tree

• CART (Classification and Regression Trees)
– uses the Gini index (for classification) as its metric.
• Other methods: ID3 (Iterative Dichotomiser 3)
– uses the entropy function and information gain as metrics.
• References:
• https://medium.com/deep-math-machine-learning-ai/chapter-4-decision-trees-algorithms-b93975f7a1f1
• https://machinelearningmastery.com/classification-and-regression-trees-for-machine-learning/



To build the tree you need training data

• It is a supervised learning algorithm, so you should have enough labelled data for training.
• Divide the whole dataset (100%) into:
– Training set (60%): for training your classifier
– Validation set (10%): for tuning the parameters
– Test set (30%): for testing the performance of your classifier



CART can perform classification or regression functions
• So when should we use classification and when regression?
• Classification trees: outputs are class symbols, not real numbers, e.g. high, medium, low.
• Regression trees: outputs are target variables (real numbers), e.g. 1.234, 5.678. (Not covered in this lecture.)
• A good example can be found at https://sefiks.com/2018/08/28/a-step-by-step-regression-decision-tree-example/
Classification tree approaches
• Well-known decision tree algorithms are ID3 and CART.
– ID3 uses information gain
• https://en.wikipedia.org/wiki/ID3_algorithm
– CART uses Gini index
• https://en.wikipedia.org/wiki/Decision_tree_learning
• Information gain & Gini index will be discussed



How to read a decision tree diagram
• https://www.python-course.eu/Decision_Trees.php



Common terms used with decision trees
• Root Node: represents the entire population or sample; it further gets divided into two or more homogeneous sets.
• Splitting: the process of dividing a node into two or more sub-nodes.
• Decision Node: a sub-node that splits into further sub-nodes is called a decision node.
• Leaf / Terminal Node: nodes that do not split are called leaf or terminal nodes.
• Pruning: removing sub-nodes of a decision node; it is the opposite process of splitting.
• Branch / Sub-Tree: a sub-section of the entire tree is called a branch or sub-tree.
• Parent and Child Node: a node that is divided into sub-nodes is called the parent node of those sub-nodes, and the sub-nodes are its children.
Reference: https://medium.com/greyatom/decision-trees-a-simple-way-to-visualize-a-decision-dc506a403aeb
CART Model Representation
• CART is a binary tree.
• Each root node represents a single input variable (x) and a split point on that variable (assuming the variable is numeric).
• The leaf nodes of the tree contain an output variable (y) which is used to make a prediction.
• Given a dataset with two inputs (x), height in centimeters and weight in kilograms, and the output of sex as male or female, here is an example of a binary decision tree (completely fictitious, for demonstration purposes only).
• (Figure: the root node tests an attribute/variable; each leaf node gives the class-variable prediction.)
https://machinelearningmastery.com/classification-and-regression-trees-for-machine-learning/
A simple example of a decision tree
• Use height and weight to guess the sex of a person.
Code:
If Height > 180 cm Then Male
If Height <= 180 cm AND Weight > 80 kg Then Male
If Height <= 180 cm AND Weight <= 80 kg Then Female
Make Predictions With CART Models
• The decision tree splits the input space into rectangles (when p = 2 input variables) or some kind of hyper-rectangles with more inputs.
• Testing whether a person is male or not:
Height > 180 cm: No
Weight > 80 kg: No
Therefore: Female
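For illustration only, here is a minimal Python sketch of the fictitious height/weight rules above (the function name predict_sex is mine, not from the slides):

def predict_sex(height_cm, weight_kg):
    # Toy decision tree from this example: two decision nodes, three leaves
    if height_cm > 180:        # first split on Height
        return "Male"
    if weight_kg > 80:         # second split on Weight
        return "Male"
    return "Female"

print(predict_sex(173, 79))    # prints "Female", matching the walk-through above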
CMSC5707, Ch13. Exercise 1
Decision Tree
• Why is it a binary tree?
– Answer: ____________________
• How many nodes and leaves?
– Answer: ________________
• Male or Female if
– 183 cm, 77 kg? ANS: ______
– 173 cm, 79 kg? ANS: ______
– 177 cm, 85 kg? ANS: ______



CMSC5707, Ch13. Answer 1
Decision Tree
• Why is it a binary tree?
– Answer: each decision node splits into exactly two branches.
• How many nodes and leaves?
– Answer: nodes: 2, leaves: 3.
• Male or Female if
– 183 cm, 77 kg? ANS: Male
– 173 cm, 79 kg? ANS: Female
– 177 cm, 85 kg? ANS: Male



How to create a CART
• Greedy Splitting: grow the tree.
• Stopping Criterion: stop when the number of samples in a leaf is small enough.
• Pruning The Tree: remove unnecessary leaves to
– make it more efficient and
– solve overfitting problems.



Greedy Splitting
• During the process of growing the tree, you need to grow the leaves from a node by splitting.
• You need a metric to evaluate whether a split is good or not, e.g. you can use one of the following splitting methods:
– Method 1: Gini Index: split using the attribute for which the Gini (impurity) index is the lowest. Gini_index = 1 - Σ_i p_i^2
– Or
– Method 2: Information gain (based on entropy) is the highest:
• Information gain (IG) = Entropy(parent) - Σ_children w_child * Entropy(child), where Entropy = -Σ_i p_i*log_2(p_i)



Example: data input

4 buses

3 cars

3 trains

Total 10 samples
https://www.saedsayad.com/decision_tree.htm
Method 1) Split metric: Entropy(parent) = entropy at the top level
Entropy(parent) = -Σ_i p_i * log_2(p_i)

• Prob(bus) = 4/10 = 0.4
• Prob(car) = 3/10 = 0.3
• Prob(train) = 3/10 = 0.3
– Entropy(parent) = -0.4*log_2(0.4) - 0.3*log_2(0.3) - 0.3*log_2(0.3) = 1.571
– (note: log_2 is log base 2.)
• Another example: if P(bus)=1, P(car)=0, P(train)=0
– Entropy = -1*log_2(1) - 0*log_2(0) - 0*log_2(0) = 0 (by convention 0*log_2(0) = 0; in code you can use a small value such as 0.00001 to avoid log(0))
– Entropy = 0: it is very pure; impurity is 0.



Exercise 2
Method 2) Split metric: Gini (impurity) index
Gini_index = 1 - Σ_i p_i^2

• Prob(bus) =4/10=0.4
• Prob(car) =3/10=0.3
• Prob(train)=3/10=0.3
– Gini index =1-(0.4*0.4+0.3*0.3+0.3*0.3)=____ ?
• Another example if the class has only bus: if P(bus)=1,
P(car)=0, P(train)=0
– Gini Impurity index= ____?
– Impurity is____ ?



Answer2
2) Split metric: Gini (impurity) index
Gini_index = 1 - Σ_i p_i^2

• Prob(bus) =4/10=0.4
• Prob(car)=3/10=0.3
• Prob(train)=3/10=0.3
• Gini index =1-(0.4*0.4+0.3*0.3+0.3*0.3)= 0.66
• Another example if the class has only bus: if P(bus)=1,
P(car)=0, P(train)=0
– Gini Impurity index= 1-1*1-0*0-0*0=0
– Impurity is 0

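As a quick sanity check of these two metrics, here is a small Python sketch (the helper names entropy and gini_index are mine, not from the slides):

import math

def entropy(probs):
    # Entropy = -sum(p * log2(p)), using the convention 0*log2(0) = 0
    return -sum(p * math.log2(p) for p in probs if p > 0)

def gini_index(probs):
    # Gini index = 1 - sum(p^2)
    return 1 - sum(p * p for p in probs)

probs = [4/10, 3/10, 3/10]          # bus, car, train
print(round(entropy(probs), 3))     # 1.571
print(round(gini_index(probs), 2))  # 0.66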


Exercise 3. Entropy = -Σ_i p_i*log_2(p_i), Gini_index = 1 - Σ_i p_i^2

(The first 2 rows of the table are now Train instead of Bus.)

• If the first 2 rows are not bus but train, find the entropy and Gini index.
• Prob(bus) = 2/10 = 0.2
• Prob(car) = 3/10 = 0.3
• Prob(train) = 5/10 = 0.5
• Entropy = _______________________________?
• Gini index = _____________________________?
ANSWER 3. Entropy = -Σ_i p_i*log_2(p_i), Gini_index = 1 - Σ_i p_i^2

(The first 2 rows of the table are now Train instead of Bus.)

• If the first 2 rows are not bus but train, find the entropy and Gini index.
• Prob(bus) = 2/10 = 0.2
• Prob(car) = 3/10 = 0.3
• Prob(train) = 5/10 = 0.5
• Entropy = -0.2*log_2(0.2) - 0.3*log_2(0.3) - 0.5*log_2(0.5) = 1.485
• Gini index = 1 - (0.2*0.2 + 0.3*0.3 + 0.5*0.5) = 0.62
Method 3) Split metrics : Variance reduction

• Introduced in CART,[3] variance reduction is often employed in cases where the target variable is continuous (regression tree), meaning that use of many other metrics would first require discretization before being applied. The variance reduction of a node N is defined as the total reduction of the variance of the target variable x due to the split at this node.
• Details will not be discussed here.
https://en.wikipedia.org/wiki/Decision_tree_learning
Splitting procedure: Recursive Partitioning
Algorithm for CART
• Take all of your training data.
• Consider all possible values of all variables.
• Select the variable/value (X = t1) (e.g. X1 = Height) that produces the greatest "separation" (or maximum homogeneity, i.e. the least impurity within each of the new parts, meaning the lowest Gini index) in the target.
• (X = t1) is called a "split".
• If X < t1 (e.g. Height < 180 cm), then send the data to the "left"; otherwise, send the data point to the "right".
• Now repeat the same process on these two "nodes".
• You get a "tree".
• Note: CART only uses binary splits.

https://www.casact.org/education/specsem/f2005/handouts/cart.ppt
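A minimal sketch of this recursive partitioning procedure in Python (all function and variable names here are mine, not from the slides; it assumes numeric features and binary splits, and uses the lowest weighted Gini index as the split criterion):

from collections import Counter

def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    # Try every (feature, threshold) pair; keep the one whose two parts
    # have the lowest weighted Gini index (greatest homogeneity).
    best = None
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best          # (weighted Gini, feature index, threshold) or None

def build_tree(rows, labels, min_samples=2):
    if len(set(labels)) == 1 or len(rows) < min_samples:
        return Counter(labels).most_common(1)[0][0]      # leaf: majority class
    split = best_split(rows, labels)
    if split is None:                                    # nothing useful to split on
        return Counter(labels).most_common(1)[0][0]
    _, j, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return {"feature": j, "threshold": t,
            "left": build_tree([r for r, _ in left], [y for _, y in left], min_samples),
            "right": build_tree([r for r, _ in right], [y for _, y in right], min_samples)}

# Toy data in the spirit of the earlier height/weight example
X = [[170, 60], [175, 85], [185, 82], [190, 90], [165, 55]]
y = ["Female", "Male", "Male", "Male", "Female"]
print(build_tree(X, y))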
Example1: Design a decision tree
Day Outlook Temp. Humidity Wind Decision
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
3 Overcast Hot High Weak Yes
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
7 Overcast Cool Normal Strong Yes
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
10 Rain Mild Normal Weak Yes
11 Sunny Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes

14 Rain Mild High Strong No
https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Gini index or Information gain approach
• Gini index
• The Gini index is a metric for classification tasks in CART.
• Gini_index = 1 - Σ p_i^2 for i = 1 to the number of classes
– Note: two approaches will be shown in the following slides.
– Method 1: Gini Index: split using the attribute for which the Gini (impurity) index is the lowest. Gini_index = 1 - Σ_i p_i^2
– Or
– Method 2: Information gain (based on entropy) is the highest:
• Information gain (IG) = Entropy(parent) - Σ_children w_child * Entropy(child), where Entropy = -Σ_i p_i*log_2(p_i)



Outlook

GINI index approach

Outlook is a nominal feature. It can be sunny, overcast or rain. I will summarize the final decisions for the outlook feature.
Gini(Outlook=Sunny) = 1 - (2/5)^2 - (3/5)^2 = 0.48
Gini(Outlook=Overcast) = 1 - (4/4)^2 - (0/4)^2 = 0
Gini(Outlook=Rain) = 1 - (3/5)^2 - (2/5)^2 = 0.48
Then we calculate the weighted sum of Gini indexes for the outlook feature:
Gini(Outlook) = (5/14) * 0.48 + (4/14) * 0 + (5/14) * 0.48 = 0.343

Information gain by entropy approach: overall decision: yes=9, no=5
Parent entropy = -(9/14)*log_2(9/14) - (5/14)*log_2(5/14) = 0.94
Weighted_entropy(Outlook=Sunny) = (5/14)*( -(2/5)*log_2(2/5) - (3/5)*log_2(3/5)) = 0.347
Weighted_entropy(Outlook=Overcast) = (4/14)*( -(4/4)*log_2(4/4) - (0/4)*log_2(0/4)) = 0
Weighted_entropy(Outlook=Rain) = (5/14)*( -(3/5)*log_2(3/5) - (2/5)*log_2(2/5)) = 0.347
Information_gain_for_outlook = Parent entropy - Weighted_entropy(Outlook=Sunny) - Weighted_entropy(Outlook=Overcast) - Weighted_entropy(Outlook=Rain) = 0.94 - 0.347 - 0 - 0.347 = 0.246

Outlook Yes No Number of instances

Sunny 2 3 5

Overcast 4 0 4

Rain 3 2 5
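The same numbers can be reproduced with a short Python sketch (the helper names and the counts dictionary are mine; the (yes, no) counts per Outlook value are taken from the table above):

import math

def entropy(counts):
    total = sum(counts)
    return -sum(c/total * math.log2(c/total) for c in counts if c > 0)

def gini(counts):
    total = sum(counts)
    return 1 - sum((c/total) ** 2 for c in counts)

# (yes, no) counts for each value of Outlook
outlook = {"Sunny": (2, 3), "Overcast": (4, 0), "Rain": (3, 2)}
n = sum(sum(c) for c in outlook.values())                    # 14 instances

weighted_gini = sum(sum(c)/n * gini(c) for c in outlook.values())
parent_entropy = entropy((9, 5))                             # 9 yes, 5 no overall
info_gain = parent_entropy - sum(sum(c)/n * entropy(c) for c in outlook.values())

print(round(weighted_gini, 3), round(info_gain, 3))          # 0.343 0.247 (the slides round to 0.246)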
Exercise 4:Temperature
GINI index approach
Similarly, temperature is a nominal feature and it could have 3 different values: Cool, Hot and
Mild. Let’s summarize decisions for temperature feature.
Gini(Temp=Hot) = __________________________________?
Gini(Temp=Cool) __________________________________?
Gini(Temp=Mild) __________________________________?
We’ll calculate weighted sum of gini index for temperature feature
Gini(Temp) =____________________________________________?
Information gain by entropy approach: Overall decision: yes=9, no=5
Parent entropy= -(9/14)*log_2(9/14)-(5/14)*log_2(5/14)=0.94 (same as last page)
Temperature is a nominal feature. It can be Hot, Cool or Mild.
Weighted_entropy(Temp=Hot) = (4/14)*( -(2/4)*log_2(2/4) - (2/4)*log_2(2/4)) = 0.286
Weighted_entropy(Temp=Cool) = (4/14)*( -(3/4)*log_2(3/4) - (1/4)*log_2(1/4)) = 0.232
Weighted_entropy(Temp=Mild) = (6/14)*( -(4/6)*log_2(4/6) - (2/6)*log_2(2/6)) = 0.394
Information_gain_for_temperature = Parent entropy - Weighted_entropy(Temp=Hot) - Weighted_entropy(Temp=Cool) - Weighted_entropy(Temp=Mild) = 0.94 - 0.286 - 0.232 - 0.394 = 0.028
Temperature Yes No Number of instances
Hot 2 2 4
Cool 3 1 4
Mild 4 2 6
ANSWER 4:Temperature
GINI index approach
Similarly, temperature is a nominal feature and it could have 3 different values: Cool, Hot and
Mild. Let’s summarize decisions for temperature feature.
Gini(Temp=Hot) = 1 - (2/4)^2 - (2/4)^2 = 0.5
Gini(Temp=Cool) = 1 - (3/4)^2 - (1/4)^2 = 0.375
Gini(Temp=Mild) = 1 - (4/6)^2 - (2/6)^2 = 0.445
We calculate the weighted sum of the Gini index for the temperature feature:
Gini(Temp) = (4/14)*0.5 + (4/14)*0.375 + (6/14)*0.445 = 0.439
Information gain by entropy approach: overall decision: yes=9, no=5
Parent entropy = -(9/14)*log_2(9/14) - (5/14)*log_2(5/14) = 0.94 (same as last page)
Temperature is a nominal feature. It can be Hot, Cool or Mild.
Weighted_entropy(Temp=Hot) = (4/14)*( -(2/4)*log_2(2/4) - (2/4)*log_2(2/4)) = 0.286
Weighted_entropy(Temp=Cool) = (4/14)*( -(3/4)*log_2(3/4) - (1/4)*log_2(1/4)) = 0.232
Weighted_entropy(Temp=Mild) = (6/14)*( -(4/6)*log_2(4/6) - (2/6)*log_2(2/6)) = 0.394
Information_gain_for_temperature = Parent entropy - Weighted_entropy(Temp=Hot) - Weighted_entropy(Temp=Cool) - Weighted_entropy(Temp=Mild) = 0.94 - 0.286 - 0.232 - 0.394 = 0.028
Temperature Yes No Number of instances
Hot 2 2 4
Cool 3 1 4
Mild 4 2 6
Humidity
GINI index approach
Humidity is a binary class feature. It can be high or normal.
Gini(Humidity=High) = 1 – (3/7)^2 – (4/7)^2 = 1 – 0.183 – 0.326 = 0.489
Gini(Humidity=Normal) = 1 – (6/7)^2 – (1/7)^2 = 1 – 0.734 – 0.02 = 0.244
Weighted sum for humidity feature will be calculated next
Gini(Humidity) = (7/14) x 0.489 + (7/14) x 0.244 = 0.367

Information gain by entropy approach: Overall decision: yes=9, no=5


Parent entropy= -(9/14)*log_2(9/14)-(5/14)*log_2(5/14)=0.94 (same as last page)
Humidity is a feature. It can be High or Normal
Weighted_entropy(Humidity=high) =(7/14)*( -(3/7)*log_2(3/7)- (4/7)*log_2(4/7))=0.492
Weighted_entropy(Humidity=Normal) =(7/14)*( -(6/7)*log_2(6/7)- (1/7)*log_2(1/7))=0.296
Information_gain_for_humidity= Parent entropy- Weighted_entropy(Humidity=high) -
Weighted_entropy(Humidity=Normal) =0.94- 0.492- 0.296= 0.152

Humidity Yes No Number of instances


High 3 4 7
Normal 6 1 7
GINI index approach
Exercise 5: Wind
Wind is a binary class similar to humidity. It can be weak and strong.
Gini(Wind=Weak) = 1-(6/8)^2- (2/8)^2 =0.375
Gini(Wind=Strong) = 1-(3/6)^2-(3/6)^2 = 0.5
Gini(Wind) = (8/14) * 0.375 + (6/14) * 0.5 = 0.428
Information gain by entropy approach: Overall decision: yes=9, no=5
Parent entropy= _________________________________________?
Weighted_entropy(wind=weak) =____________________________?
Weighted_entropy(wind=strong) =___________________________?

Information_gain_for_wind = Parent entropy - Weighted_entropy(wind=weak) - Weighted_entropy(wind=strong) = ____________________?

Wind Yes No Number of instances
Weak 6 2 8
Strong 3 3 6
GINI index approach Answer 5: Wind
Wind is a binary class similar to humidity. It can be weak and strong.
Gini(Wind=Weak) = 1-(6/8)^2- (2/8)^2 =0.375
Gini(Wind=Strong) = 1-(3/6)^2-(3/6)^2 = 0.5
Gini(Wind) = (8/14) * 0.375 + (6/14) * 0.5 = 0.428
Information gain by entropy approach: Overall decision: yes=9, no=5
Parent entropy= -(9/14)*log_2(9/14)-(5/14)*log_2(5/14)=0.94 (same as last
page)
Weighted_entropy(wind=weak) =(8/14)*( -(6/8)*log_2(6/8)-
(2/8)*log_2(2/8))=0.464
Weighted_entropy(wind=strong) =(6/14)*( -(3/6)*log_2(3/6)-
(3/6)*log_2(3/6))=0.428
Information_gain_for_wind = Parent entropy - Weighted_entropy(wind=weak) - Weighted_entropy(wind=strong) = 0.94 - 0.464 - 0.428 = 0.048

Wind Yes No Number of instances
Weak 6 2 8
Strong 3 3 6
Question 6: Time to decide
Use either the Gini index or information gain.
Question: choose which feature is used as the top node.
– Method 1: Gini Index: split using the attribute for which the Gini (impurity) index is the lowest. Gini_index = 1 - Σ_i p_i^2
– Or
– Method 2: Information gain (based on entropy) is the highest:
• Information gain (IG) = Entropy(parent) - Σ_children w_child * Entropy(child)

Feature Method 1: Gini index Method 2: Information gain by entropy
Outlook 0.342 0.246
Temperature 0.439 0.028
Humidity 0.367 0.152
Wind 0.428 0.048
Answer 6: Time to decide
Use either the Gini index or information gain.
– Method 1: Gini Index: split using the attribute for which the Gini (impurity) index is the lowest. Gini_index = 1 - Σ_i p_i^2
– Or
– Method 2: Information gain (based on entropy) is the highest:
• Information gain (IG) = Entropy(parent) - Σ_children w_child * Entropy(child)
Answer: both methods agree with each other.

Feature Method 1: Gini index Method 2: Information gain by entropy
Outlook (picked as top node) 0.342 (lowest) 0.246 (highest)
Temperature 0.439 0.028
Humidity 0.367 0.152
Wind 0.428 0.048
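The selection step itself is tiny; here is a sketch using the values tabulated above (the dictionary literals are mine):

gini = {"Outlook": 0.342, "Temperature": 0.439, "Humidity": 0.367, "Wind": 0.428}
info_gain = {"Outlook": 0.246, "Temperature": 0.028, "Humidity": 0.152, "Wind": 0.048}

top_by_gini = min(gini, key=gini.get)          # lowest Gini index wins
top_by_ig = max(info_gain, key=info_gain.get)  # highest information gain wins
print(top_by_gini, top_by_ig)                  # Outlook Outlook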
Top of the tree
Time to decide: put the outlook decision at the top of the tree.

All
“Decision=yes”,
so the branch
for “Overcast” is
over
You might realize that the sub-dataset in the overcast leaf has only "yes" decisions. This means that the overcast branch is finished.
Top of the tree



• We will apply the same principles to the sub-datasets in the following steps. Focus on the sub-dataset for the sunny outlook. We need to find the Gini index scores for the temperature, humidity and wind features respectively.
Total population under Outlook=Sunny is 5: yes=2, no=3.
Day Outlook Temp. Humidity Wind Decision
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
11 Sunny Mild Normal Strong Yes



Gini of temperature for sunny outlook
Gini approach
Gini(Outlook=Sunny & Temp=Hot) = 1-(0/2)^2-(2/2)^2 = 0
Gini(Outlook=Sunny & Temp=Cool) =1-(1/1)^2-(0/1)^2 = 0
Gini(Outlook=Sunny & Temp=Mild) = 1-(1/2)^2-(1/2)^2 = 0.5
Gini(Outlook=Sunny & Temp)=(2/5)*0+(1/5)*0+(2/5)*0.5 = 0.2
Information gain by entropy : Total population under Outlook_sunny=5, yes=2,no=3
Parent entropy_Outlook_sunny= -(2/5)*log_2(2/5)-(3/5)*log_2(3/5) = 0.97
Weighted_entropy(Outlook=Sunny and Temp=Hot) = (2/5)*( -(0/2)*log_2(0/2) - (2/2)*log_2(2/2)) = 0
Weighted_entropy(Outlook=Sunny and Temp=Cool) = (1/5)*( -(1/1)*log_2(1/1) - (0/1)*log_2(0/1)) = 0
Weighted_entropy(Outlook=Sunny and Temp=Mild) = (2/5)*( -(1/2)*log_2(1/2) - (1/2)*log_2(1/2)) = 0.4
Information_gain_for_temperature_given_sunny = Parent entropy_Outlook_sunny - Weighted_entropy(Outlook=Sunny and Temp=Hot) - Weighted_entropy(Outlook=Sunny and Temp=Cool) - Weighted_entropy(Outlook=Sunny and Temp=Mild) = 0.97 - 0 - 0 - 0.4 = 0.57

Temperature Yes No Number of instances


Hot 0 2 2
Cool 1 0 1
Mild 1 1 2
Gini of humidity for sunny outlook
• Gini approach
• Gini(Outlook=Sunny and Humidity=High) = 1-(0/3)^2-(3/3)^2 = 0
• Gini(Outlook=Sunny and Humidity=Normal) = 1-(2/2)^2-(0/2)^2 = 0
• Gini(Outlook=Sunny and Humidity) = (3/5)*0 + (2/5)*0 = 0
Weighted_entropy(Outlook=Sunny and Humidity=High) = (3/5)*( -(0/3)*log_2(0/3) - (3/3)*log_2(3/3)) = 0
Weighted_entropy(Outlook=Sunny and Humidity=Normal) = (2/5)*( -(2/2)*log_2(2/2) - (0/2)*log_2(0/2)) = 0
Information_gain_for_humidity_given_sunny = Parent entropy_Outlook_sunny - Weighted_entropy(Outlook=Sunny and Humidity=High) - Weighted_entropy(Outlook=Sunny and Humidity=Normal) = 0.97 - 0 - 0 = 0.97

Humidity Yes No Number of instances


High 0 3 3
Normal 2 0 2



Gini of wind for sunny outlook
Gini Approach
Gini(Outlook=Sunny and Wind=Weak) = 1-(1/3)^2-(2/3)^2 =0.445
Gini(Outlook=Sunny and Wind=Strong) = 1-(1/2)^2-(1/2)^2 = 0.5
Gini(Outlook=Sunny and Wind) = (3/5)*0.445 + (2/5)*0.5 = 0.467
Weighted_entropy(Outlook=Sunny and Wind=Weak) = (3/5)*( -(1/3)*log_2(1/3) - (2/3)*log_2(2/3)) = 0.551
Weighted_entropy(Outlook=Sunny and Wind=Strong) = (2/5)*( -(1/2)*log_2(1/2) - (1/2)*log_2(1/2)) = 0.4
Information_gain_for_wind_given_sunny = Parent entropy_Outlook_sunny - Weighted_entropy(Outlook=Sunny and Wind=Weak) - Weighted_entropy(Outlook=Sunny and Wind=Strong) = 0.97 - 0.551 - 0.4 = 0.019

Wind Yes No Number of instances


Weak 1 2 3
Strong 1 1 2
Decision for sunny outlook
• We have calculated the Gini index scores for each feature when the outlook is sunny. The winner is humidity because it has the lowest value. We put a humidity check at the extension of the sunny outlook branch.
• Split using the attribute for which the Gini (impurity) index is the lowest: Gini_index = 1 - Σ_i p_i^2
• Or split using the attribute for which the information gain (based on entropy) is the highest:
– Information gain (IG) = Entropy(parent) - Σ_children w_child * Entropy(child)
• Both results agree with each other: humidity is picked as the second-level node.

Feature Gini index Information gain by entropy
Temperature 0.2 0.57
Humidity 0 (the lowest) 0.97 (the highest)
Wind 0.466 0.019
Result
• Both results agree with each other: humidity is picked as the second-level node.
• When humidity is "High", the decision is purely "No"; when humidity is "Normal", the decision is purely "Yes".
As seen, decision is always “no” for high humidity and sunny
outlook. On the other hand, decision will always be “yes” for
normal humidity and sunny outlook. This branch is over.



Now we will work on the Rain branch

Now, we need to focus on rain outlook.


We’ll calculate Gini index scores for
temperature, humidity and wind features when
outlook is rain.
Day Outlook Temp. Humidity Wind Decision
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
10 Rain Mild Normal Weak Yes
14 Rain Mild High Strong No



Gini of temperature for rain outlook
Gini Approach
Gini(Outlook=Rain and Temp.=Cool) = 1-(1/2)^2-(1/2)^2 = 0.5
Gini(Outlook=Rain and Temp.=Mild) = 1-(2/3)^2-(1/3)^2 = 0.444
Gini(Outlook=Rain and Temp.) = (2/5)*0.5 + (3/5)*0.444 = 0.466

Information gain by entropy : Total population under Outlook_rain=5, yes=3,no=2


Parent entropy_Outlook_rain= -(3/5)*log_2(3/5)-(2/5)*log_2(2/5) = 0.97
Weighted_entropy(Outlook=Rain and Temp=Cool) = (2/5)*( -(1/2)*log_2(1/2) - (1/2)*log_2(1/2)) = 0.4
Weighted_entropy(Outlook=Rain and Temp=Mild) = (3/5)*( -(2/3)*log_2(2/3) - (1/3)*log_2(1/3)) = 0.551
Information_gain_for_temperature_given_rain = Parent entropy_Outlook_rain - Weighted_entropy(Outlook=Rain and Temp=Cool) - Weighted_entropy(Outlook=Rain and Temp=Mild) = 0.97 - 0.4 - 0.551 = 0.019

Temperature Yes No Number of instances


Cool 1 1 2
Mild 2 1 3



Gini of Humidity for rain outlook
Gini approach
(From the rain sub-dataset above, Humidity=High has 1 yes and 1 no; Humidity=Normal has 2 yes and 1 no.)
Gini(Outlook=Rain and humidity=High) = 1 - (1/2)^2 - (1/2)^2 = 0.5
Gini(Outlook=Rain and humidity=Normal) = 1 - (2/3)^2 - (1/3)^2 = 0.444
Gini(Outlook=Rain and humidity) = (2/5)*0.5 + (3/5)*0.444 = 0.466

Weighted_entropy(Outlook=Rain and Humidity=High) = (2/5)*( -(1/2)*log_2(1/2) - (1/2)*log_2(1/2)) = 0.4
Weighted_entropy(Outlook=Rain and Humidity=Normal) = (3/5)*( -(2/3)*log_2(2/3) - (1/3)*log_2(1/3)) = 0.551
Information_gain_for_humidity_given_rain = Parent entropy_Outlook_rain - Weighted_entropy(Outlook=Rain and Humidity=High) - Weighted_entropy(Outlook=Rain and Humidity=Normal) = 0.97 - 0.4 - 0.551 = 0.019

Humidity Yes No Number of instances
High 1 1 2
Normal 2 1 3


Gini of wind for rain outlook
Gini approach
Gini(Outlook=Rain and Wind=Weak) = 1 - (3/3)^2 - (0/3)^2 = 0
Gini(Outlook=Rain and Wind=Strong) = 1 - (0/2)^2 - (2/2)^2 = 0
Gini(Outlook=Rain and Wind) = (3/5)*0 + (2/5)*0 = 0
Weighted_entropy(Outlook=Rain and Wind=Weak) = (3/5)*( -(3/3)*log_2(3/3) - (0/3)*log_2(0/3)) = 0
Weighted_entropy(Outlook=Rain and Wind=Strong) = (2/5)*( -(0/2)*log_2(0/2) - (2/2)*log_2(2/2)) = 0
Information_gain_for_wind_given_rain = Parent entropy_Outlook_rain - Weighted_entropy(Outlook=Rain and Wind=Weak) - Weighted_entropy(Outlook=Rain and Wind=Strong) = 0.97 - 0 - 0 = 0.97

Wind Yes No Number of instances


Weak 3 0 3
Strong 0 2 2
Decision for rain outlook
The winner is the wind feature for the rain outlook because it has the minimum Gini index score among the features.
• Put the wind feature on the rain outlook branch and monitor the new sub-datasets.
• Split using the attribute for which the Gini (impurity) index is the lowest: Gini_index = 1 - Σ_i p_i^2
• Or
• Information gain (based on entropy) is the highest:
– Information gain (IG) = Entropy(parent) - Σ_children w_child * Entropy(child)

Feature Gini index Information gain by entropy
Temperature 0.466 0.019
Humidity 0.466 0.019
Wind 0 (the lowest) 0.97 (the highest; pick this)
Put the wind feature on the rain outlook branch and monitor the new sub-datasets. You can repeat the calculation to find the complete solution.
• You will see that the sub-datasets for weak and strong wind under the rain outlook are already pure (weak: all "yes"; strong: all "no"), so this branch needs no further splitting.

Sub-datasets for weak and strong wind under the rain outlook.
Final result
• As seen, decision is always “yes” when “wind” is
“weak”. On the other hand, decision is always “no” if
“wind” is “strong”. This means that this branch is
over.



Example 2

Design a tree to find out whether an umbrella is needed



An example: design a tree to find out whether an umbrella is needed
• Coding: Weather: 1 Sunny, 2 Cloudy, 3 Rainy; Driving: 1 Yes, 2 No; Class=Umbrella: 1 Yes, 2 No
• Data (Weather, Driving, Umbrella):
• 1 1 2
• 1 2 2
• 2 1 2
• 3 1 2
• 2 2 1
• 3 1 2
• 3 2 1
• 2 2 2
• 2 2 1
The first question is to choose the root attribute. You have two choices for the root attribute:
1) Weather
2) Driving



How to build the tree
• First question: you have 2 choices.
• 1) Root is attribute "Weather": the branches are
– Sunny or not: find metric M_sunny
– Cloudy or not: find metric M_cloudy
– Rainy or not: find metric M_rainy
– Total weather_split_metric = weight_sunny*M_sunny + weight_cloudy*M_cloudy + weight_rainy*M_rainy
– (If this is smaller, pick "weather" as the root.)
OR
• 2) Root is attribute "Driving": the branches are Yes (umbrella) and No (umbrella)
– Find metric M_drive
– Total split_metric_drive = weight_drive * M_drive
– Note weight_drive = 1, since it is the only choice
– (If this is smaller, pick "driving" as the root.)
• We will describe the procedure using 7 steps.
Steps to develop the tree
If the root is attribute "weather":
• Step 1: the branch is "Sunny?" (yes/no); find the split metric (M_sunny).
• Step 2: the branch is "Cloudy?" (yes/no); find the split metric (M_cloudy).
• Step 3: the branch is "Rainy?" (yes/no); find the split metric (M_rainy).



Step 1: Find M_sunny and Weight_sunny (branch: Weather = Sunny?)
• N = number of samples = 9
• M1 = number of sunny cases = 2
• W1 = Weight_sunny = M1/N = 2/9
• N1y = number of Umbrella=Yes when sunny = 0
• N1n = number of Umbrella=No when sunny = 2
• G1 = Gini = 1 - ((N1y/M1)^2 + (N1n/M1)^2) = 1 - ((0/2)^2 + (2/2)^2) = 0
• Metric_sunny = G1 (or E1 if entropy is used)
Gini_index = 1 - Σ_i p_i^2
Step 2: Find M_cloudy and Weight_cloudy (branch: Weather = Cloudy?)
• N = number of samples = 9
• M2 = number of cloudy cases = 4
• W2 = Weight_cloudy = M2/N = 4/9
• N2y = number of Umbrella=Yes when cloudy = 2
• N2n = number of Umbrella=No when cloudy = 2
• G2 = Gini = 1 - ((N2y/M2)^2 + (N2n/M2)^2) = 1 - ((2/4)^2 + (2/4)^2) = 0.5
• Metric_cloudy = G2 (or E2 if entropy is used)
Gini_index = 1 - Σ_i p_i^2
Step 3: Find M_rainy and Weight_rainy (branch: Weather = Rainy?)
• N = number of samples = 9
• M3 = number of rainy cases = 3
• W3 = Weight_rainy = M3/N = 3/9
• N3y = number of Umbrella=Yes when rainy = 1
• N3n = number of Umbrella=No when rainy = 2
• G3 = Gini = 1 - ((N3y/M3)^2 + (N3n/M3)^2) = 1 - ((1/3)^2 + (2/3)^2) = 0.444
• Metric_rainy = G3 (or E3 if entropy is used)
Gini_index = 1 - Σ_i p_i^2



Step4: metric for weather
• weather_split_metric=
weight_sunny*M_sunny+
weight_cloudy*M_cloudy+
weight_rainy*M_rainy
• weather_split_metric_Gini=
W1*G1+W2*G2+W3*G3
• =(2/9)*0+(4/9)*0.5+(3/9)*0.44= 0.3689



Step 5: Find M_driving and Weight_driving (root: Driving? yes/no)
• N = number of samples = 9
• M5 = number of cases under the driving attribute = 9
• W5 = Weight_driving = M5/N = 9/9
• N5y = number of Umbrella=Yes = 3
• N5n = number of Umbrella=No = 6
• G5 = Gini = 1 - ((N5y/M5)^2 + (N5n/M5)^2) = 1 - ((3/9)^2 + (6/9)^2) = 0.444
• Metric_driving = G5 (or E5 if entropy is used)
Gini_index = 1 - Σ_i p_i^2
Step6: metric for driving
• driving_split_metric = weight_drive * M_drive
• driving_split_metric_Gini = W5*G5 = (9/9)*0.444 = 0.444



Step7 make decision
• Decide which is suitable to be the root (weather or driving)
• Compare
• weather_split_metric_Gini= 0.3689
• driving_split_metric_Gini= 0.444
• Choose the lower score, so weather is selected as the root.
• See more examples at https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
• We can repeat this procedure for the development of the leaves (subtrees).
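A short Python sketch (names are mine) that reproduces Steps 1 to 7 on the 9-row umbrella data above. Note that it follows the slides' procedure, where the Driving attribute is scored as a single group of all 9 samples with weight 1:

from collections import Counter

# (Weather, Driving, Umbrella) rows, coded as in the table above
data = [(1,1,2),(1,2,2),(2,1,2),(3,1,2),(2,2,1),(3,1,2),(3,2,1),(2,2,2),(2,2,1)]
N = len(data)                                    # 9 samples

def gini(labels):
    n = len(labels)
    return 1 - sum((c/n) ** 2 for c in Counter(labels).values())

# Steps 1-4: weighted Gini when the root asks Sunny? / Cloudy? / Rainy?
weather_metric = 0.0
for w in (1, 2, 3):                              # 1=Sunny, 2=Cloudy, 3=Rainy
    labels = [row[2] for row in data if row[0] == w]
    weather_metric += len(labels) / N * gini(labels)

# Steps 5-6: the slides treat Driving as a single group of all 9 samples (weight 1)
driving_metric = gini([row[2] for row in data])

# Step 7: pick the attribute with the lower metric as the root
print(round(weather_metric, 4), round(driving_metric, 4))  # ~0.3704 (slides: 0.3689 after rounding) and 0.4444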
To continue
• Coding: Weather: 1 Sunny, 2 Cloudy, 3 Rainy; Driving: 1 Yes, 2 No; Class=Umbrella: 1 Yes, 2 No
• Data (Weather, Driving, Umbrella):
• 1 1 2
• 1 2 2
• 2 1 2
• 3 1 2
• 2 2 1
• 3 1 2
• 3 2 1
• 2 2 2
• 2 2 1
To continue the construction of the tree: should driving be put under 1) Sunny, 2) Cloudy, or 3) Rainy?



Step 7 (continued): Gini_index = 1 - Σ_i p_i^2

• If weather = sunny and the leaf = driving:
• Only 2 cases, so weight = 2/9, and umbrella_yes = 0, umbrella_no = 2
• Gini_a = (2/9)*(1-(0/2)^2-(2/2)^2) = 0

• If weather = cloudy and the leaf = driving:
• Only 4 cases, so weight = 4/9, and umbrella_yes = 2, umbrella_no = 2
• Gini_b = (4/9)*(1-(2/4)^2-(2/4)^2) = 0.222

• If weather = rainy and the leaf = driving:
• Only 3 cases, so weight = 3/9, and umbrella_yes = 1, umbrella_no = 2
• Gini_c = (3/9)*(1-(1/3)^2-(2/3)^2) = 0.148

• Since Gini_a is the smallest, driving should be placed under sunny.
• We can continue in a similar way to the previous approaches; see the blue solid lines and red dotted lines in the figure.



The final result
• Root = weather, with branches Sunny, Cloudy and Rainy; a Driving test is placed under each branch.
• Sunny: no umbrella, whether driving or not.
• Rainy: driving leads to no umbrella; not driving leads to umbrella (N = 1).
• Cloudy and not driving: umbrella yes = 2, no = 1 (sample size is 3); this leaf cannot be fully resolved, but the sample is too small, so we can ignore it.
https://stackoverflow.com/questions/19993139/can-splitting-attribute-appear-many-times-in-decision-tree


Exercise 7:
Information gain using entropy, example: a decision tree to determine whether a person can complete a marathon or not
• Total 30 students
• Target(complete marathon):yes=16, no=14
• Bodymass:
– Heavy (13 in total): 1 yes, 12 no
– Fit (17 in total): 13 yes, 4 no
• Exercise(habit)
– Daily (total 8):7 yes, 1 no
– Weekly (total 10): 4 yes, 6 no
– Occasionally (total 12): 5 yes, 7 no
• To build the tree, we first need to select bodymass or habit as the top node; the calculation follows.
https://towardsdatascience.com/entropy-how-decision-trees-make-decisions-2946b9c18c8
Answer exercise 7: Calculation1
• Parent entropy :yes=16, no=14
• Test1: Entropy_parent
• =-(14/30)*log_2(14/30)-(16/30)*log_2(16/30)
• =0.997

• Total population =30


• Test2: entropy for bodymass
• Heavy (1 yes, 12 no), total 13
• Fit (13 yes, 4 no), total 17
• Entropy_bodymass_heavy = -(1/13)*log_2(1/13) - (12/13)*log_2(12/13) = 0.391
• Weighted_Entropy_bodymass_heavy = (total_bodymass_heavy_population/total_population) * Entropy_bodymass_heavy = (13/30)*0.391
• Entropy_bodymass_fit = -(13/17)*log_2(13/17) - (4/17)*log_2(4/17) = 0.787
• Weighted_entropy_for_bodymass = (13/30)*0.391 + (17/30)*0.787 = 0.615
• Information_gain_if_bodymass_is_top_node = Entropy_parent - Weighted_entropy_for_bodymass = 0.997 - 0.615 = 0.382



• Calculation 2: Exercise (habit)
Test3: entropy for habit (total population = 30)
• Habit_Daily (total 8): 7 yes, 1 no
• Entropy_habit_daily = -(7/8)*log_2(7/8) - (1/8)*log_2(1/8) = 0.544
• Weighted_Entropy_habit_daily = (total_habit_daily/total_population) * Entropy_habit_daily = (8/30)*0.544 = 0.145

• Habit_Weekly (total 10): 4 yes, 6 no
• Entropy_habit_weekly = -(4/10)*log_2(4/10) - (6/10)*log_2(6/10) = 0.971
• Weighted_Entropy_habit_weekly = (total_habit_weekly/total_population) * Entropy_habit_weekly = (10/30)*0.971 = 0.324

• Habit_Occasionally (total 12): 5 yes, 7 no
• Entropy_habit_occasionally = -(5/12)*log_2(5/12) - (7/12)*log_2(7/12) = 0.98
• Weighted_Entropy_habit_occasionally = (total_habit_occasionally/total_population) * Entropy_habit_occasionally = (12/30)*0.98 = 0.392
Selection
• Pick as the top node the attribute for which the information gain (based on entropy) is the highest:
– Information gain = Entropy(parent) - Σ_children w_child * Entropy(child), where Entropy = -Σ_i p_i*log_2(p_i)
• Information_gain_if_bodymass_is_top_node = Entropy_parent - Weighted_entropy_for_bodymass = 0.997 - 0.615 = 0.382
• Information_gain_if_habit_is_top_node = Entropy_parent - Weighted_Entropy_habit_daily - Weighted_Entropy_habit_weekly - Weighted_Entropy_habit_occasionally = 0.997 - (0.145 + 0.324 + 0.392) = 0.997 - 0.861 = 0.136
• Conclusion
• Bodymass is picked as the top node because its information gain is larger.
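A compact Python check of Exercise 7, using the (yes, no) counts from the slides (the function names are mine):

import math

def entropy_from_counts(yes, no):
    total = yes + no
    return -sum(c/total * math.log2(c/total) for c in (yes, no) if c > 0)

parent = entropy_from_counts(16, 14)           # 0.997

# bodymass: Heavy (1 yes, 12 no), Fit (13 yes, 4 no)
bodymass = [(1, 12), (13, 4)]
# exercise habit: Daily (7 yes, 1 no), Weekly (4 yes, 6 no), Occasionally (5 yes, 7 no)
habit = [(7, 1), (4, 6), (5, 7)]

def info_gain(groups, parent_entropy, n=30):
    child = sum((y + no)/n * entropy_from_counts(y, no) for y, no in groups)
    return parent_entropy - child

print(round(info_gain(bodymass, parent), 3))   # ~0.382, so bodymass is the top node
print(round(info_gain(habit, parent), 3))      # ~0.136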


Decision tree
• (Figure: Bodymass is the root node; its Heavy and Fit branches lead to interior nodes that test the exercise habit, and the outcomes are the leaf nodes.)


Exercise 8 (student exercise, no answer given)
• Temperature Humidity Weather Drive/walk Class=Umbrella
• ----------- -------- ------- ---------- ----------
• 1 Low 1 Low 1 Sunny 1 Drive 1 Yes
• 2 Medium 2 Medium 2 Cloudy 2 Walk 2 No
• 3 High 3 High 3 Rain
• ----------- -------- ------- ---------- ----------
• 1 1 1 1 2
• 1 2 1 2 1
• 2 2 1 1 2
• 2 1 1 2 1
• 1 2 1 2 1
• 1 1 2 1 2
• 2 2 2 1 2
• 2 2 3 2 2
• 3 3 3 2 1
• 3 3 3 1 2

http://dni-institute.in/blogs/cart-algorithm-for-decision-tree/
http://people.revoledu.com/kardi/tutorial/DecisionTree/how-decision-tree-algorithm-work.htm
Overfitting

Problem and solution



Overfitting problem and solution
• Problem: Your trained model only works for training
data but will fail when handling new or unseen data
• Solution: use error estimation to prune (remove some
leaves) the decision tree to avoid overfitting.
• One approach is Post-pruning using Error estimation

References: https://www.investopedia.com/terms/o/overfitting.asp



Pruning methods
• Idea: Remove leaves that contribute little or
cause overfitting.
• The original tree is T; it has a subtree Tt2. We prune Tt2, and the pruned tree is shown below.
(Figure: tree T, subtree Tt2, pruned tree)

https://en.wikipedia.org/wiki/Pruning_(decision_trees)

http://mlwiki.org/index.php/Cost-Complexity_Pruning
Post-pruning using Error estimation
Defining terms
• For the whole dataset: use about 70% for training data and 30% for testing (for pruning and cross-validation use), see http://mlwiki.org/index.php/Cross-Validation
• Choose examples for the training/testing sets randomly.
• Training data is used to construct the decision tree (which will be pruned).
• Testing data is used for pruning.
• f = error on training data
• N = number of instances covered by the leaves
• z = score of a normal distribution, see https://en.wikipedia.org/wiki/Standard_normal_table
• e = estimated error on unseen data (calculated from f, N and z)



http://www.saedsayad.com/decision_tree_overfitting.htm
Post-pruning by Error estimation example
• For f = 5/14, it means 5 samples out of 14 are misclassified.
• The error estimate at the parent node is 0.46, and since the combined error estimate for its children (0.51) increases with the split, we do not want to keep the children, so the subtree is pruned.
• In this example we set z to 0.69 (see the normal distribution curve), which is equal to a confidence level of 75%.
• (The children in the example have training error fractions f = 1/(1+1) and f = 2/(4+2).)
• Note: the children's combined estimate is (6/14)*0.47 + (2/14)*0.72 + (6/14)*0.47 = 0.5057, i.e. about 0.51.
https://www.rapidtables.com/math/probability/normal_distribution.html
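The slides do not reproduce the error-estimation formula itself. A common choice, and apparently the one used on the referenced page, is the C4.5-style pessimistic (upper-confidence-bound) estimate sketched below in Python; treat the exact formula as an assumption on my part:

import math

def pessimistic_error(f, n, z=0.69):
    # Upper confidence bound on the true error rate, given the observed
    # training error rate f on n instances and normal-distribution score z.
    # (Assumption: this is the estimate used on the referenced page.)
    num = f + z*z/(2*n) + z*math.sqrt(f/n - f*f/n + z*z/(4*n*n))
    return num / (1 + z*z/n)

# One of the child leaves in the example: 2 errors out of 6 instances
print(round(pessimistic_error(2/6, 6), 2))   # 0.47, as in the example
print(round(pessimistic_error(1/2, 2), 2))   # 0.72, as in the example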
Conclusion
• We studied how to build a decision tree
• We learned the method of splitting using Gini
index
• We learned the method of splitting using
information gain by entropy
• We learned the idea of pruning to improve classification and solve the overfitting problem.



References
• http://people.revoledu.com/kardi/tutorial/DecisionTree/how-decision-tree-algorithm-work.htm
• https://onlinecourses.science.psu.edu/stat857/node/60/
• https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Appendix



Example using sklearn
• https://github.com/alameenkhader/spam_classifier
• Using sklearn:

from sklearn import tree
# You may hard-code your data as below, or import csv and fetch your data from a .csv file
# Assume we have a two-dimensional feature space with two classes we would like to distinguish
dataTable = [[2,9],[4,10],[5,7],[8,3],[9,1]]
dataLabels = ["Class A","Class A","Class B","Class B","Class B"]
# Declare our classifier
trained_classifier = tree.DecisionTreeClassifier()
# Train our classifier with the data we have
trained_classifier = trained_classifier.fit(dataTable, dataLabels)
# We are done with training, so it is time to test it!
someDataOutOfTrainingSet = [[10,2]]
label = trained_classifier.predict(someDataOutOfTrainingSet)
# Show the prediction of the trained classifier for the data point [10,2]
print(label[0])



Iris test using sklearn; this will generate a dt.dot file

import numpy as np
from sklearn import datasets
from sklearn import tree

# Load iris
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Build decision tree classifier
dt = tree.DecisionTreeClassifier(criterion='entropy')
dt.fit(X, y)
dotfile = open("dt.dot", 'w')
tree.export_graphviz(dt, out_file=dotfile, feature_names=iris.feature_names)
dotfile.close()



Iris dataset decision-surface example, from http://scikit-learn.org/stable/auto_examples/tree/plot_iris.html#sphx-glr-auto-examples-tree-plot-iris-py

print(__doc__)

import numpy as np
import matplotlib.pyplot as plt

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Parameters
n_classes = 3
plot_colors = "ryb"
plot_step = 0.02

# Load data
iris = load_iris()

for pairidx, pair in enumerate([[0, 1], [0, 2], [0, 3],
                                [1, 2], [1, 3], [2, 3]]):
    # We only take the two corresponding features
    X = iris.data[:, pair]
    y = iris.target

    # Train
    clf = DecisionTreeClassifier().fit(X, y)

    # Plot the decision boundary
    plt.subplot(2, 3, pairidx + 1)

    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, plot_step),
                         np.arange(y_min, y_max, plot_step))
    plt.tight_layout(h_pad=0.5, w_pad=0.5, pad=2.5)

    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    cs = plt.contourf(xx, yy, Z, cmap=plt.cm.RdYlBu)

    plt.xlabel(iris.feature_names[pair[0]])
    plt.ylabel(iris.feature_names[pair[1]])

    # Plot the training points
    for i, color in zip(range(n_classes), plot_colors):
        idx = np.where(y == i)
        plt.scatter(X[idx, 0], X[idx, 1], c=color, label=iris.target_names[i],
                    cmap=plt.cm.RdYlBu, edgecolor='black', s=15)

plt.suptitle("Decision surface of a decision tree using paired features")
plt.legend(loc='lower right', borderpad=0, handletextpad=0)
plt.axis("tight")
plt.show()


A working implementation in pure python

• https://machinelearningmastery.com/implement-decision-tree-algorithm-scratch-python/



Code (MATLAB):

function tt4
clear
parent_en=entropy_cal([9,5])

%humidity-------------------
en1=entropy_cal([3,4])
en2=entropy_cal([6,1])
Information_gain(1)=parent_en-(7/14)*en1-(7/14)*en2
clear en1 en2

%outlook------------------
en1=entropy_cal([3,2])
en2=entropy_cal([4,0])
en3=entropy_cal([2,3])
Information_gain(2)=parent_en-(5/14)*en1-(4/14)*en2-(5/14)*en3
clear en1 en2 en3

%wind -------------------------
en1=entropy_cal([6,2])
en2=entropy_cal([3,3])
Information_gain(3)=parent_en-(8/14)*en1-(6/14)*en2
clear en1 en2

%temperature -------------------------
en1=entropy_cal([2,2]) %hot: 2 yes, 2 no
en2=entropy_cal([3,1]) %cool: 3 yes, 1 no
en3=entropy_cal([4,2]) %mild: 4 yes, 2 no
Information_gain(4)=parent_en-(4/14)*en1-(4/14)*en2-(6/14)*en3
clear en1 en2 en3
Information_gain
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [en]=entropy_cal(e)
%entropy of a vector of class counts, e.g. entropy_cal([9,5])
n=length(e);
base=sum(e);

%% probability of the elements in the input
for i=1:n
  p(i)=e(i)/base;
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
temp=0;
for i=1:n
  if p(i)==0 %to avoid the problem of -inf; 0*log2(0) is taken as 0
    temp=temp;
  else
    temp=p(i)*log2(p(i))+temp;
  end
end
en=-temp;


A tree showing nodes, branches, leaves, attributes and target classes
• (Figure: the root node tests an attribute, e.g. "X = Raining?". Each branch (Yes/No) leads either to another decision node, e.g. "X = sunny?", "Z = driving?", "Y = stay outdoor?", or to a leaf node whose target class is "umbrella" or "no umbrella".)


https://www-users.cs.umn.edu/~kumar001/dmbook/ch4.pdf
MATLAB DEMO
• https://www.mathworks.com/help/stats/examples/classification.html

