Decision Trees
• Decision-tree classification
Decision Trees
• Decision tree
– A flow-chart-like tree structure
– Internal node denotes a test on an attribute
– Branch represents an outcome of the test
– Leaf nodes represent class labels or class distribution
• Decision tree generation consists of two phases
– Tree construction
• At start, all the training examples are at the root
• Partition examples recursively based on selected attributes
– Tree pruning
• Identify and remove branches that reflect noise or outliers
• Use of a decision tree: classifying an unknown sample
– Test the attribute values of the sample against the decision tree, following the matching branch at each internal node until a leaf is reached (see the sketch below)
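
The traversal in the last bullet can be made concrete with a minimal Python sketch. The Node class, the attribute name "age", and the toy tree are illustrative assumptions, not taken from the slides.

# Minimal sketch: classify a sample by walking an already-built decision tree.
# The Node class and the toy "age" tree are illustrative, not from the slides.

class Node:
    def __init__(self, attribute=None, children=None, label=None):
        self.attribute = attribute      # attribute tested at an internal node
        self.children = children or {}  # maps a test outcome to a child node
        self.label = label              # class label stored at a leaf

def classify(node, sample):
    """Follow the branch matching the sample's attribute value until a leaf."""
    while node.label is None:
        node = node.children[sample[node.attribute]]
    return node.label

# Toy tree: the root tests "age" and each branch ends in a class label.
tree = Node(attribute="age", children={
    "<=30": Node(label="no"),
    "31..40": Node(label="yes"),
    ">40": Node(label="yes"),
})
print(classify(tree, {"age": "31..40"}))  # -> yes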
Training
Output: a decision tree for the training data
[Figure: decision tree whose root tests "age?", with branches <=30, 30..40, and >40 leading to "yes"/"no" leaves]
Constructing decision trees
• Exponentially many decision trees can be constructed from a given set of attributes
Classification-Error(t) = 1 − max_i p(i|t)
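
As a small illustration of the formula above, and of entropy (used later in this deck) together with the commonly used Gini index, the following snippet evaluates the impurity of a single node from its class counts; the counts are made up.

import math

# Illustrative only: impurity of one node given its class counts (made up).

def impurities(counts):
    n = sum(counts)
    p = [c / n for c in counts]
    error = 1 - max(p)                                    # classification error
    gini = 1 - sum(pi ** 2 for pi in p)                   # Gini index
    entropy = -sum(pi * math.log2(pi) for pi in p if pi)  # entropy
    return error, gini, entropy

print(impurities([3, 3]))  # maximally impure node: (0.5, 0.5, 1.0)
print(impurities([6, 0]))  # pure node: all three measures are 0 (entropy may print as -0.0)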
Range of impurity measures
Impurity measures
• In general, the different impurity measures are consistent with one another
• Gain of a test condition: compare the impurity of the parent node with the weighted impurity of the child nodes

  Δ = I(parent) − Σ_{j=1..k} (N(v_j) / N) · I(v_j)

  where k is the number of children, N is the number of records at the parent, and N(v_j) is the number of records routed to child v_j
• Maximizing the gain is equivalent to minimizing the weighted average impurity of the child nodes
• If I() = Entropy(), then Δ_info is called the information gain
Computing gain: example
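
The example figure for this slide is not reproduced here. As a stand-in, the following hypothetical computation (made-up class counts) applies the gain formula from the previous slide with entropy as the impurity measure.

import math

# Hypothetical worked example: entropy-based gain for a parent with
# 10 '+' and 10 '-' records split into children (8+, 2-) and (2+, 8-).

def entropy(counts):
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

parent = [10, 10]
children = [[8, 2], [2, 8]]

n_parent = sum(parent)
weighted_children = sum(sum(c) / n_parent * entropy(c) for c in children)
gain = entropy(parent) - weighted_children  # Δ_info = I(parent) − Σ N(v_j)/N · I(v_j)
print(round(gain, 3))  # ≈ 0.278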
Is minimizing impurity / maximizing Δ enough?
• Impurity measures favor attributes with a large number of values
• SplitInfo = −Σ_{i=1..k} p(v_i) log(p(v_i)), where k is the total number of splits
• If each split receives the same number of records, SplitInfo = log k
• The gain ratio divides the gain by SplitInfo: GainRatio = Δ_info / SplitInfo
• A large number of splits → large SplitInfo → small gain ratio (see the sketch below)
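
A minimal sketch of the effect described above, assuming a fixed hypothetical gain of 0.4 for two candidate splits; the gain value and the partition sizes are made up.

import math

# Illustrative only: SplitInfo and gain ratio for splits into k partitions.

def split_info(partition_sizes):
    n = sum(partition_sizes)
    return -sum(s / n * math.log2(s / n) for s in partition_sizes if s)

def gain_ratio(gain, partition_sizes):
    return gain / split_info(partition_sizes)

print(gain_ratio(0.4, [50, 50]))             # 2-way split: SplitInfo = 1.0, ratio = 0.4
print(round(gain_ratio(0.4, [10] * 10), 3))  # 10-way split: SplitInfo = log2(10) ≈ 3.32, ratio ≈ 0.12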
Constructing decision trees (pseudocode)
GenDecTree(Sample S, Features F)
1. If stopping_condition(S, F) = true then
   a. leaf = createNode()
   b. leaf.label = Classify(S)
   c. return leaf
2. root = createNode()
3. root.test_condition = findBestSplit(S, F)
4. V = {v | v is a possible outcome of root.test_condition}
5. for each value v ∈ V:
   a. S_v = {s | root.test_condition(s) = v and s ∈ S}
   b. child = GenDecTree(S_v, F)
   c. add child as a descendant of root and label the edge (root → child) as v
6. return root
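
Below is a runnable Python sketch of the pseudocode above. The helpers stopping_condition, classify, and find_best_split mirror the pseudocode's names, but the dictionary-based record and node representations and the entropy-based gain criterion are assumptions, not part of the slides.

import math
from collections import Counter

def entropy(samples):
    n = len(samples)
    return -sum(c / n * math.log2(c / n)
                for c in Counter(s["class"] for s in samples).values())

def stopping_condition(samples, features):
    # Stop when the node is pure or no features remain to split on.
    return len({s["class"] for s in samples}) == 1 or not features

def classify(samples):
    # Majority class of the records reaching this leaf.
    return Counter(s["class"] for s in samples).most_common(1)[0][0]

def find_best_split(samples, features):
    # Pick the feature whose multi-way split maximizes the information gain.
    def gain(f):
        groups = Counter(s[f] for s in samples)
        weighted = sum(cnt / len(samples) *
                       entropy([s for s in samples if s[f] == v])
                       for v, cnt in groups.items())
        return entropy(samples) - weighted
    return max(features, key=gain)

def gen_dec_tree(samples, features):
    if stopping_condition(samples, features):
        return {"label": classify(samples)}                  # leaf node
    best = find_best_split(samples, features)
    root = {"test": best, "children": {}}
    for v in {s[best] for s in samples}:                     # one branch per outcome
        s_v = [s for s in samples if s[best] == v]
        root["children"][v] = gen_dec_tree(s_v, [f for f in features if f != best])
    return root

# Toy usage with made-up records:
data = [
    {"outlook": "sunny", "windy": "yes", "class": "no"},
    {"outlook": "sunny", "windy": "no",  "class": "yes"},
    {"outlook": "rain",  "windy": "yes", "class": "no"},
    {"outlook": "rain",  "windy": "no",  "class": "yes"},
]
print(gen_dec_tree(data, ["outlook", "windy"]))  # splits on "windy", both children pure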
Stopping criteria for tree induction
• Stop expanding a node when all the records belong to the same class
• Early termination
Advantages of decision trees
• Inexpensive to construct
• Extremely fast at classifying unknown records
• Easy to interpret for small-sized trees
• Accuracy is comparable to other classification techniques for many simple data sets
Example: C4.5 algorithm
• Simple depth-first construction
• Uses the gain ratio (normalized information gain) as the splitting criterion
• Sorts continuous attributes at each node (see the sketch below)
• Needs the entire dataset to fit in memory
• Unsuitable for large datasets
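
The following sketch, which is not C4.5's actual implementation, shows why continuous attributes are sorted at each node: candidate thresholds are the midpoints between consecutive sorted values, and each candidate is scored. The routine name best_threshold and the choice of weighted Gini impurity as the score are assumptions for illustration.

# Illustrative only: pick a binary split threshold for a continuous attribute.

def best_threshold(values, labels):
    """Return the threshold minimizing the weighted Gini impurity of the split."""
    def gini(lbls):
        n = len(lbls)
        return 1 - sum((lbls.count(c) / n) ** 2 for c in set(lbls)) if n else 0.0

    pairs = sorted(zip(values, labels))          # sort once per node
    best = (float("inf"), None)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                             # no boundary between equal values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs if v <= thr]
        right = [l for v, l in pairs if v > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        best = min(best, (score, thr))
    return best[1]

# Made-up ages and class labels; the best cut falls between 30 and 35.
print(best_threshold([25, 30, 35, 45, 50], ["no", "no", "yes", "yes", "yes"]))  # 32.5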
[Figure: an oblique decision boundary x + y < 1 separating Class = + from Class = −]
[Figure: circular points satisfy 0.5 ≤ sqrt(x1² + x2²) ≤ 1; triangular points satisfy sqrt(x1² + x2²) > 1 or sqrt(x1² + x2²) < 0.5]