FACTOID # 120: Nepal’s flag isn’t square or rectangular. It’s a double triangle.
 
 Home   Encyclopedia   Statistics   Countries A-Z   Flags   Maps   Education   Forum   FAQ   About 
 
WHAT'S NEW
RECENT ARTICLES
More Recent Articles »
 

Encyclopedia > Alternating decision tree

An Alternating Decision Tree (ADTree) is a machine learning method for classification. The ADTree data structure and algorithm are a generalization of decision trees and have connections to boosting. ADTrees were introduced by Yoav Freund and Llew Mason[1]. As a broad subfield of artificial intelligence, machine learning is concerned with the design and development of algorithms and techniques that allow computers to learn. At a general level, there are two types of learning: inductive, and deductive. ... A binary tree, a simple type of branching linked data structure. ... In mathematics, computing, linguistics, and related disciplines, an algorithm is a finite list of well-defined instructions for accomplishing some task that, given an initial state, will terminate in a defined end-state. ... In operations research, specifically in decision analysis, a decision tree is a decision support tool that uses a graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. ... Boosting is a machine learning meta-algorithm for performing supervised learning. ... Yoav Freund is a researcher and professor at the University of California, San Diego who mainly works on machine learning, probability theory and related fields and applications. ...

Contents

Motivation

Original boosting algorithms typically combined either decision stumps or decision trees. Boosting decision stumps creates a set of T weighted weak hypotheses (where T is the number of boosting iterations), which can be visualized as a set for reasonable values of T. Boosting decision trees could result in a final combined classifier with thousands (or millions) of nodes for modest values of T. Both of these scenarios produce final classifiers in which it is either difficult to visualize correlations or difficult to visualize at all. Alternating decision trees provide a method for visualizing decision stumps in an ordered and logical way to demonstrate correlations. In doing so, they simultaneously generalize decision trees and can be used to essentially grow boosted decision trees in parallel. A Decision Stump is a weak machine learning model consisting of a Decision Tree with only a single branch. ... In operations research, specifically in decision analysis, a decision tree is a decision support tool that uses a graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. ... A Decision Stump is a weak machine learning model consisting of a Decision Tree with only a single branch. ...


Description of the structure

The alternating decision tree structure consists of two components: decision nodes and prediction nodes. Decision nodes specify a predicate condition. Prediction nodes specify a value to add to the score based on the result of the decision node. Each decision node can be seen as a conjunction between a precondition (the decision node was reached) and the condition specified in the decision node.


Perhaps the easiest way to understand the interaction of decision and prediction nodes is through an example.The following example is taken from JBoost performing boosting for 6 iterations on the spambase dataset (available from the UCI Machine Learning Repository). Positive examples indicate that the message is spam and negative examples are not spam. During each iteration, a single node is added to the ADTree. The ADTree determined by the learning algorithm implemented in JBoost is:

The tree construction algorithm is described below in the Description of the algorithm section. We now show how to interpret the tree once it has been constructed. We focus on one specific instance: Image File history File links Size of this preview: 800 × 331 pixelsFull resolution (1121 × 464 pixel, file size: 38 KB, MIME type: image/png) Author: Aaron Arvey I, the copyright holder of this work, hereby release it into the public domain. ...

An instance to be classified
Feature Value
char_freq_bang 0.08
word_freq_hp 0.4
capital_run_length_longest 4
char_freq_dollar 0
word_freq_remove 0.9
word_freq_george 0
Other features ...

For this instance, we obtain a score that determines the classification of the instance. This score not only acts as a classification, but also as a measure of confidence. The actual order that the ADTree nodes are evaluated will likely be different then the order in which they were created. That is, the node from iteration 4 can be evaluated before the node from iteration 1. There are constraints to this (e.g. node from iteration 2 must be evaluated before the node from iteration 5). In general, either breadth-first or depth-first evaluation will yield the correct interpretation.


The following table shows how the score is created (progressive score) for our above example instance:

Score for the above instance
Iteration 0 1 2 3 4 5 6
Instance values N/A .08 < .052 = n .4 < .195 = n 0 < .01 = y 0 < 0.005 = y N/A .9 < .225 = n
Prediction -0.093 0.74 -1.446 -0.38 0.176 0 1.66
Progressive Score -0.093 0.647 -0.799 -1.179 -1.003 -1.003 0.657


There are a few observations that we should make

  • The final classification of the example is positive (0.657), meaning that the example is considered to be spam.
  • All nodes at depth 1 have their predicate evaluated and one of their prediction nodes contributes to the score. Thus a tree with depth 1 is the equivalent of boosted decision stumps.
  • If a decision node is not reached (the node from iteration 5 in the above example) then the node's predicate and subsequent prediction nodes will not be evaluated.

Description of the algorithm

The alternating decision tree learning algorithm is described in the original paper[1]. The general idea involves a few main concepts:

  • The root decision node is always TRUE or FALSE
  • The tree is grown iteratively. The total number of iterations is generally decided prior to starting the algorithm.
  • Each decision node (c2) is selected by the algorithm based on how well it discriminates between positive and negative examples.
  • Once a decision node is created, the prediction node is determined by how good the decision node discriminates.


Before the algorithm, we first define some notation. Let c be a predicate, then

  • W + (c) is the weight of all positively labeled examples that satisfy c
  • W (c) is the weight of all negatively labeled examples that satisfy c
  • W(c) is the weight of all examples that satisfy c
  • We call c a precondition when it is a conjunction of previous base conditions and negations of previous base conditions


The exact algorithm is:


INPUT: m examples and labels (x_1,y_1),ldots,(x_m,y_m)


Set the weight of all examples to W1(xj) = 1 / m


Set the margin of all examples to r1(xj) = 0


The root decision node is always c = TRUE, with a single prediction node


a=frac{1}{2}textrm{ln}frac{W_+(T)}{W_-(T)}


For t = 1, ldots, T do:

  • Let c_1 in P_t be a precondition (that is, the node being created can be reached via c1) and c2 be a condition (the new node). Then each decision node (c2) is selected by the algorithm based on how well it discriminates between positive and negative examples. The original ADTree algorithm minimizes the criterion 2 left( sqrt{W_+(c_1wedge c_2) W_-(c_1 wedge c_2)} + sqrt{W_+(c_1wedge neg c_2) W_-(c_1 wedge neg c_2)} right) +W(neg c_2) .
  • Once a decision node is created, the prediction nodes are determined by a=frac{1}{2}textrm{ln}frac{W_+(c_1wedge c_2)}{W_-(c_1 wedge c_2)} and b=frac{1}{2}textrm{ln}frac{W_+(c_1wedge neg c_2)}{W_-(c_1 wedge neg c_2)}
  • Add the conditions c_1 wedge c_2 and c_1 wedge neg c_2 to the set of possible preconditions Pt + 1
  • Update the weights: W_{t+1}(x_j) = W_{t}(x_j)e^{r_t(x_j)y_j}

Empirical Results

Figure 6 in the original paper[1] demonstrates that ADTrees are typically as robust as boosted decision trees and boosted decision stumps. In decision theory (for example risk management), a decision tree is a graph of decisions and their possible consequences, (including resource costs and risks) used to create a plan to reach a goal. ... A Decision Stump is a weak machine learning model consisting of a Decision Tree with only a single branch. ...


References

  1. ^ a b c Yoav Freund and Llew Mason. The Alternating Decision Tree Algorithm. Proceedings of the 16th International Conference on Machine Learning, pages 124-133 (1999)

 

COMMENTARY     


Share your thoughts, questions and commentary here
Your name
Your location
Your comments
Please enter the 5-letter protection code


Lesson Plans | Student Area | Student FAQ | Reviews | Press Releases |  Feeds | Contact
The Wikipedia article included on this page is licensed under the GFDL.
Images may be subject to relevant owners' copyright.
All other elements are (c) copyright NationMaster.com 2003-5. All Rights Reserved.
Usage implies agreement with terms.