Decision Trees Let's now examine the decision tree model, a well-liked categorization technique. A decision tree is a hierarchical structure used in machine learning to make judgments andpredictions. You will be able to describe how a decision tree is used for classification,describe the process of creating a decision tree for classification, and analyze how adecision tree arrives at a classification decision by the time you finish reading this article. The idea behind decision trees for classification is to split the data into subsets, where each subset belongs to only one class. This is accomplished by dividing the input space into pureregions. That is, regions with samples from only one class. With real data, completely puresubsets may not be possible, so the goal is to divide the data into subsets that are as pureas possible. That is, each subset contains as many samples as possible of a single class.Graphically, this is equivalent to dividing the input space into regions that are as pure aspossible. Boundaries separating these regions are called decision boundaries, and thedecision tree model makes classification decisions based on these decision boundaries. A decision tree is a hierarchical structure with nodes and directed edges. The node at the top is called a root node. The nodes at the bottom are called leaf nodes. Nodes that areneither the root node or the leaf nodes are called internal nodes. The root and internal nodeshave test conditions. Each leaf node has a class label associated with it. A classificationdecision is made by traversing the decision tree, starting with the root node. At each node,the answer to the test condition determines which branch to traverse to. When a leaf node isreached, the category at the leaf node determines the classification decision. The depth of a node is the number of edges from the root node to that node. The depth of the root node is zero. The depth of a decision tree is the number of edges in the longest pathfrom the root node to the leaf node. The size of a decision tree is the number of nodes in thetree. To build a decision tree, we start with all samples at a single node, the root node, and add additional nodes when the data is split into subsets. At a high level, constructing a decisiontree consists of the following steps: 1. Start with all samples at a node.2. Partition the samples into subsets based on the input variables.3. Create subsets of records that are purest. That is, each subset contains as many samples as possible belonging to just one class. Another way to say this is that thesubsets should be homogenous, or as pure as possible. 4. Repeatedly partition data into successively pure subsets until stopping criteria are satisfied. An algorithm for constructing a decision tree model is referred to as an induction algorithm, so you may hear the term tree induction used to describe the process of building a decisiontree. Note that at each split, the induction algorithm only considers the best way to split theparticular portion of the data. This is referred to as a greedy approach. Greedy algorithms
solve a subset of the problem at the time and are a necessary approach when solving theentire problem is not feasible. And by feasible, I mean computable in a reasonable amountof time or space. Using a greedy algorithm is necessary for decision trees. It is not feasibleto determine the best tree given a dataset, so the tree has to be built in a piecemeal fashionby determining the best split at each node. Then, let's take a closer look at the construction of a decision tree. The first step is to select the best attribute to split the data. We use a statistical measure such as information gain,gain ratio, or Gini index to determine the best attribute. Information gain is a measure of thereduction in entropy achieved.