Ch13. Decision Tree: KH Wong
Decision Tree
KH Wong
4 buses
3 cars
3 trains
Total 10 samples
https://www.saedsayad.com/decision_tree.htm
Method 1) Split metric: Entropy(Parent) = entropy at the top level
Entropy(parent) = -Σ_i p_i log_2(p_i)
• Prob(bus) =4/10=0.4
• Prob(car) =3/10=0.3
• Prob(train)=3/10=0.3
– Entropy(parent)= -0.4*log_2(0.4)- 0.3*log_2(0.3)-0.3*log_2(0.3)
=1.571
– (note:log_2 is log base 2.)
• Another example: if P(bus)=1, P(car)=0, P(train)=0
– Entropy = -1*log_2(1) - 0*log_2(0) - 0*log_2(0) = 0 (by convention, 0*log_2(0) = 0)
– Entropy = 0, so the node is very pure; the impurity is 0
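Below is a minimal Python sketch that reproduces the two entropy calculations above; the helper name entropy_bits is my own, not from the slides:

import math

def entropy_bits(probs):
    # Entropy in bits: -sum(p*log2(p)); zero probabilities contribute nothing by convention
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy_bits([0.4, 0.3, 0.3]))  # ~1.571, the bus/car/train example
print(entropy_bits([1.0, 0.0, 0.0]))  # 0.0, a pure node (impurity 0)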
Method 2) Split metric: Gini index
• Prob(bus) =4/10=0.4
• Prob(car) =3/10=0.3
• Prob(train)=3/10=0.3
– Gini index =1-(0.4*0.4+0.3*0.3+0.3*0.3)=____ ?
• Another example: if the class contains only buses, P(bus)=1, P(car)=0, P(train)=0
– Gini Impurity index= ____?
– Impurity is____ ?
• Prob(bus) =4/10=0.4
• Prob(car)=3/10=0.3
• Prob(train)=3/10=0.3
• Gini index =1-(0.4*0.4+0.3*0.3+0.3*0.3)= 0.66
• Another example: if the class contains only buses, P(bus)=1, P(car)=0, P(train)=0
– Gini Impurity index= 1-1*1-0*0-0*0=0
– Impurity is 0
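A matching sketch for the Gini impurity (the helper name gini_impurity is my own):

def gini_impurity(probs):
    # Gini impurity: 1 - sum(p^2); 0 means a pure node
    return 1 - sum(p * p for p in probs)

print(gini_impurity([0.4, 0.3, 0.3]))  # 0.66
print(gini_impurity([1.0, 0.0, 0.0]))  # 0.0, impurity is 0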
Exercise 3: Train
• If the first 2 rows are not bus but train, find entropy and Gini index
• Prob(bus) =2/10=0.2
• Prob(car)=3/10=0.3
• Prob(train)=5/10=0.5
• Entropy =_______________________________?
• Gini index =_____________________________?
Entropy = -Σ_i p_i log_2(p_i),   Gini_index = 1 - Σ_i p_i^2
ANSWER 3: Train
• If the first 2 rows are not bus but train, find entropy and Gini index
• Prob(bus) =2/10=0.2
• Prob(car)=3/10=0.3
• Prob(train)=5/10=0.5
• Entropy = -0.2*log_2(0.2) - 0.3*log_2(0.3) - 0.5*log_2(0.5) = 1.485
• Gini index =1-(0.2*0.2+0.3*0.3+0.5*0.5)= 0.62
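These answers can be checked with the helper functions sketched earlier (entropy_bits and gini_impurity, assumed names):

print(entropy_bits([0.2, 0.3, 0.5]))   # ~1.485
print(gini_impurity([0.2, 0.3, 0.5]))  # 0.62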
Method 3) Split metric: Variance reduction
https://www.casact.org/education/specsem/f2005/handouts/cart.ppt
Example 1: Design a decision tree
Day Outlook Temp. Humidity Wind Decision
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
3 Overcast Hot High Weak Yes
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
7 Overcast Cool Normal Strong Yes
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
10 Rain Mild Normal Weak Yes
11 Sunny Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes
14 Rain Mild High Strong No
https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
Gini index or Information gain approach
• Gini index
• The Gini index is a metric for classification tasks in CART.
• Gini_index = 1 - Σ_i (p_i^2), for i = 1 to the number of classes
– Note: (two approaches will be shown in the following slides)
– Method 1:
– Gini index: split using the attribute whose Gini (impurity) index is the lowest: Gini_index = 1 - Σ_i p_i^2
– Or
– Method 2:
– Information gain (based on entropy): split using the attribute whose information gain is the highest:
• Information gain (IG) = Entropy(parent) - weighted entropy of the children
• IG = [-Σ_i p_parent,i log_2(p_parent,i)] - Σ_children w_child [-Σ_i p_child,i log_2(p_child,i)]
Outlook    Yes  No  Number of instances
Sunny      2    3   5
Overcast   4    0   4
Rain       3    2   5
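A small sketch (helper name gini_from_counts is my own) of the weighted Gini index for the Outlook feature, using the yes/no counts in the table above and the Method 1 rule described earlier:

def gini_from_counts(counts):
    # Gini impurity of a node from raw class counts
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

# (yes, no) counts per Outlook value, taken from the table above
outlook = {"Sunny": (2, 3), "Overcast": (4, 0), "Rain": (3, 2)}
n = sum(sum(c) for c in outlook.values())  # 14 instances in total
gini_outlook = sum(sum(c) / n * gini_from_counts(c) for c in outlook.values())
print(gini_outlook)  # ~0.343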
Exercise 4:Temperature
GINI index approach
Similarly, temperature is a nominal feature and it can take 3 different values: Cool, Hot and
Mild. Let's summarize the decisions for the temperature feature.
Gini(Temp=Hot) = __________________________________?
Gini(Temp=Cool) __________________________________?
Gini(Temp=Mild) __________________________________?
We’ll calculate the weighted sum of the Gini index for the temperature feature
Gini(Temp) =____________________________________________?
Information gain by entropy approach: Overall decision: yes=9, no=5
Parent entropy= -(9/14)*log_2(9/14)-(5/14)*log_2(5/14)=0.94 (same as last page)
Temeprature is a feature. It can be Hot, cool, mild
Weighted_entropy(Temp=Hot) =(4/14)*( -(2/4)*log_2(2/4)- (2/4)*log_2(2/4))=0.286
Weighted_entropy(Temp=Cool) =(4/14)*( -(3/4)*log_2(3/4)- (1/4)*log_2(1/4))=0.232
Weighted_entropy(Temp=Mild) =(6/14)*( -(4/6)*log_2(4/6)- (2/6)*log_2(2/6))=0.394
Information_gain_for_temperature = Parent entropy - Weighted_entropy(Temp=Hot) - Weighted_entropy(Temp=Cool) - Weighted_entropy(Temp=Mild) = 0.94 - 0.286 - 0.232 - 0.394 = 0.028
Temperature  Yes  No  Number of instances
Hot 2 2 4
Cool 3 1 4
Mild 4 2 6
ANSWER 4:Temperature
GINI index approach
Similarly, temperature is a nominal feature and it can take 3 different values: Cool, Hot and
Mild. Let's summarize the decisions for the temperature feature.
Gini(Temp=Hot) = 1-(2/4)^2- (2/4)^2 = 0.5
Gini(Temp=Cool) = 1-(3/4)^2-(1/4)^2 = 0.375
Gini(Temp=Mild) = 1-(4/6)^2-(2/6)^2 = 0.445
We’ll calculate the weighted sum of the Gini index for the temperature feature
Gini(Temp) =(4/14) *0.5 +(4/14)*0.375+(6/14)*0.445= 0.439
Information gain by entropy approach: Overall decision: yes=9, no=5
Parent entropy= -(9/14)*log_2(9/14)-(5/14)*log_2(5/14)=0.94 (same as last page)
Temperature is a feature. It can be Hot, Cool or Mild.
Weighted_entropy(Temp=Hot) =(4/14)*( -(2/4)*log_2(2/4)- (2/4)*log_2(2/4))=0.286
Weighted_entropy(Temp=Cool) =(4/14)*( -(3/4)*log_2(3/4)- (1/4)*log_2(1/4))=0.232
Weighted_entropy(Temp=Mild) =(6/14)*( -(4/6)*log_2(4/6)- (2/6)*log_2(2/6))=0.394
Information_gain_for_temperature = Parent entropy - Weighted_entropy(Temp=Hot) - Weighted_entropy(Temp=Cool) - Weighted_entropy(Temp=Mild) = 0.94 - 0.286 - 0.232 - 0.394 = 0.028
Temperature  Yes  No  Number of instances
Hot 2 2 4
Cool 3 1 4
Mild 4 2 6
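The temperature figures can be re-derived with a counts-based entropy helper (names are my own); the small difference from 0.028 comes from rounding the intermediate entropies on the slide:

import math

def entropy_from_counts(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

temp = {"Hot": (2, 2), "Cool": (3, 1), "Mild": (4, 2)}  # (yes, no) per value
parent = entropy_from_counts((9, 5))                    # ~0.940
weighted = sum(sum(c) / 14 * entropy_from_counts(c) for c in temp.values())
print(parent - weighted)                                # ~0.029 information gain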
Humidity
GINI index approach
Humidity is a binary class feature. It can be high or normal.
Gini(Humidity=High) = 1 – (3/7)^2 – (4/7)^2 = 1 – 0.184 – 0.327 = 0.489
Gini(Humidity=Normal) = 1 – (6/7)^2 – (1/7)^2 = 1 – 0.735 – 0.020 = 0.245
The weighted sum for the humidity feature is calculated next:
Gini(Humidity) = (7/14) x 0.489 + (7/14) x 0.245 = 0.367
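Using the counts-based helper sketched earlier (gini_from_counts, an assumed name), the humidity figures check out:

print(gini_from_counts((3, 4)))  # Gini(Humidity=High)   ~0.49
print(gini_from_counts((6, 1)))  # Gini(Humidity=Normal) ~0.245
print(7/14 * gini_from_counts((3, 4)) + 7/14 * gini_from_counts((6, 1)))  # ~0.367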
All decisions are "yes", so the branch for "Overcast" is complete.
You might notice that the sub-dataset in the overcast branch contains only "yes" decisions. This means the overcast branch ends here as a leaf.
Top of the tree
Feature        Gini index    Information gain by entropy
Temperature    0.2           0.57
Humidity       0 (lowest)    0.97 (highest)
Wind           0.466         0.019
Result
• Both results agree with each other.
• Humidity is picked as the second-level node.
(Decision-tree diagram: root attribute "Weather" with branches Sunny, Cloudy and Rainy, each leading to yes/no outcomes.)
• Step 1: if the root is attribute "weather" and the branch is "Sunny", find the split metric (M_sunny)
• Step 2: if the root is attribute "weather" and the branch is "Cloudy", find the split metric (M_cloudy)
• Step 3: if the root is attribute "weather" and the branch is "Rainy", find the split metric (M_rainy)
For step 1: find M_sunny
• G1=Gini=1- ((N1y/M1)^2+(N1n/M1)^2)
• = 1-((0/2)^2+(2/2)^2)=0
• Metric_sunny=G1 or E1
Gini_index = 1 - Σ_i p_i^2
For step 2: find M_cloudy and Weight_cloudy
• N=Number of samples=9
• M2=Number of cloudy cases=4
• W2=Weight_cloudy=M2/N=4/9
• G2=Gini=1- ((N2y/M2)^2+(N2n/M2)^2)
• = 1-((2/4)^2+(2/4)^2)=0.5
• Metric_cloudy=G2 or E2
Gini_index = 1 - Σ_i p_i^2
For step 3: find M_rainy and Weight_rainy
• N=Number of samples=9
• M3=Number of rainy cases=3
• W3=Weight_rainy=M3/N=3/9
• G3=Gini=1- ((N3y/M3)^2+(N3n/M3)^2)
• = 1-((1/3)^2+(2/3)^2)=0.444
• Metric_rainy=G3 or E3
Gini_index = 1 - Σ_i p_i^2
For the "driving" attribute: find M_driving and Weight_driving
• N=Number of samples=9
• M5=Number of driving cases=9
• W5=Weight_driving=M5/N=9/9
• G5=Gini=1- ((N5y/M5)^2+(N5n/M5)^2)
• = 1-((3/9)^2+(6/9)^2)=0.444
• Metric_driving=G5 or E5
Gini_index = 1 - Σ_i p_i^2
Step6: metric for driving
• driving_split_metric= driving_sunny*M_yes+
driving_cloudy*M_no
• driving_split_metric_Gini=
W5*G5=(9/9)*0.444= 0.444
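As a sketch of how the per-branch metrics would be combined, assuming the same weighted-sum rule used for driving above (W5*G5), the weather attribute's metric is the weighted sum of G1, G2 and G3 from steps 1-3; the attribute with the lower weighted Gini would be preferred for the split:

# weather: branch weights (2/9, 4/9, 3/9) and the Gini values from steps 1-3
weather_metric = (2/9) * 0.0 + (4/9) * 0.5 + (3/9) * 0.444
# driving: a single branch covering all 9 samples (step 6)
driving_metric = (9/9) * 0.444
print(weather_metric, driving_metric)  # ~0.370 vs 0.444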
(Tree diagram: the "Driving" split leads to leaves "umbrella", "no umbrella" and "not sure".)
• One leaf holds a mixed, tiny sample (2 of one class, 1 of the other; 3 samples in total). It cannot be resolved, but the sample is too small, so we can ignore it.
https://stackoverflow.com/questions/19993139/can-splitting-attribute-appear-many-times-in-decision-tree
Selection
http://dni-institute.in/blogs/cart-algorithm-for-decision-tree/
http://people.revoledu.com/kardi/tutorial/DecisionTree/how-decision-tree-algorithm-work.htm
Overfitting
References: https://www.investopedia.com/terms/o/overfitting.asp
https://en.wikipedia.org/wiki/Pruning_(decision_trees)
http://mlwiki.org/index.php/Cost-Complexity_Pruning
Post-pruning using Error estimation
Defining terms
• For the whole dataset: use about 70% for training data; 30% for testing (for pruning and cross-validation). http://mlwiki.org/index.php/Cross-Validation
• Choose examples for training/testing sets randomly
• Training data is used to construct the decision tree (will be pruned)
• Testing data is used for pruning
• f= Error on training data
• N= number of instances covered by the leaves
• z = z-score of a normal distribution, set by the chosen confidence level. https://en.wikipedia.org/wiki/Standard_normal_table
• e = estimated error on unseen (test) data, calculated from f, N and z
For f = 5/14, it means 5 of the 14 samples are misclassified.
Note: (6/14)*0.47 + (2/14)*0.72 + (6/14)*0.47 = 0.5057
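A hedged sketch of the error estimate e computed from f, N and z, assuming the C4.5-style pessimistic (upper-bound) formula that these definitions suggest; the z value (~0.69, i.e. a 25% confidence level) and the per-leaf error rates (2/6, 1/2, 2/6) are assumptions chosen to reproduce the 0.47/0.72/0.47 figures in the note above:

import math

def estimated_error(f, N, z=0.69):
    # Upper confidence bound on the true error rate, given observed error rate f over N instances
    return (f + z*z / (2*N) + z * math.sqrt(f/N - f*f/N + z*z / (4*N*N))) / (1 + z*z / N)

leaf_errors = [estimated_error(f, N) for f, N in [(2/6, 6), (1/2, 2), (2/6, 6)]]
print([round(e, 2) for e in leaf_errors])   # ~[0.47, 0.72, 0.47]
print(estimated_error(5/14, 14))            # parent-node estimate for f = 5/14; if it is below
                                            # the weighted leaf sum (0.5057), the subtree is pruned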
https://www.rapidtables.com/math/probability/normal_distribution.html
Conclusion
• We studied how to build a decision tree
• We learned the method of splitting using Gini
index
• We learned the method of splitting using
information gain by entropy
• We learned the idea of pruning to improve
classification and solve the overfitting problem
Iris dataset example
http://scikit-learn.org/stable/auto_examples/tree/plot_iris.html#sphx-glr-auto-examples-tree-plot-iris-py
https://machinelearningmastery.com/implement-decision-tree-algorithm-scratch-python/

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Parameters
n_classes = 3
plot_colors = "ryb"
plot_step = 0.02

# Load the iris data
iris = load_iris()

for pairidx, pair in enumerate([[0, 1], [0, 2], [0, 3],
                                [1, 2], [1, 3], [2, 3]]):
    # We only take the two corresponding features
    X = iris.data[:, pair]
    y = iris.target

    # Train
    clf = DecisionTreeClassifier().fit(X, y)

    # Plot the decision boundary on a mesh grid over the two features
    plt.subplot(2, 3, pairidx + 1)
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, plot_step),
                         np.arange(y_min, y_max, plot_step))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    cs = plt.contourf(xx, yy, Z, cmap=plt.cm.RdYlBu)
    plt.xlabel(iris.feature_names[pair[0]])
    plt.ylabel(iris.feature_names[pair[1]])

plt.show()
MATLAB code
%Information gain for each feature of the Example 1 dataset
parent_en=entropy_cal([9,5]) %overall decision: 9 yes, 5 no

%humidity-------------------
en1=entropy_cal([3,4]) %high: 3 yes, 4 no
en2=entropy_cal([6,1]) %normal: 6 yes, 1 no
Information_gain(1)=parent_en-(7/14)*en1-(7/14)*en2
clear en1 en2

%outlook------------------
en1=entropy_cal([3,2]) %sunny (5 instances)
en2=entropy_cal([4,0]) %overcast (4 instances)
en3=entropy_cal([2,3]) %rain (5 instances)
Information_gain(2)=parent_en-(5/14)*en1-(4/14)*en2-(5/14)*en3
clear en1 en2 en3

%wind -------------------------
en1=entropy_cal([6,2]) %weak: 6 yes, 2 no
en2=entropy_cal([3,3]) %strong: 3 yes, 3 no
Information_gain(3)=parent_en-(8/14)*en1-(6/14)*en2
clear en1 en2

%temperature -------------------------
en1=entropy_cal([2,2]) %hot: 2 yes, 2 no
en2=entropy_cal([3,1]) %cool: 3 yes, 1 no
en3=entropy_cal([4,2]) %mild: 4 yes, 2 no
Information_gain(4)=parent_en-(4/14)*en1-(4/14)*en2-(6/14)*en3
clear en1 en2 en3
Information_gain
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [en]=entropy_cal(e)
n=length(e);
base=sum(e);

%% probability of the elements in the input
for i=1:n
    p(i)=e(i)/base;
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
temp=0;
for i=1:n
    if p(i)~=0 %skip zero probabilities to avoid -inf
        temp=p(i)*log2(p(i))+temp;
    end
end
en=-temp;
• Root node ?
If attribute X=Raining