
Decision Tree Algorithm in Python Machine Learning

1. Decision Tree Principles

The decision tree is a tree structure in which the attributes of the samples are used as nodes and the values of the attributes are used as branches.
The root node of the decision tree is the attribute with the greatest information content among all samples. Each intermediate node is the attribute with the greatest information content within the subset of samples contained in the subtree rooted at that node. The leaf nodes of the decision tree are the class values of the samples. A decision tree is a form of knowledge representation: it is a high-level generalization of all the sample data. It can accurately identify the class of every training sample and can also effectively identify the class of new samples.

The basic idea of the decision tree algorithm ID3 is:

First, find the most discriminative attribute and divide the samples into multiple subsets; then, within each subset, choose the most discriminative attribute for further division, continuing until every subset contains only data of the same class. Finally, a decision tree is obtained.

J. R. Quinlan's work introduces the concept of information gain from information theory as the measure of attribute discriminability, and designs a recursive algorithm for constructing decision trees.

It is easier to understand with an example:

For the climate classification problem, the attributes are:
Weather (A1): values are Clear, Cloudy, Rain
Temperature (A2): values are Cold, Moderate, Hot
Humidity (A3): values are High, Normal
Wind (A4): values are Windy, No wind

Each example belongs to one category; in this problem there are only two categories, P and N. Examples of class P and class N are called positive examples and negative examples, respectively. Putting some known positive and negative examples together gives the training set.
The ID3 algorithm generates a decision tree that correctly classifies each example in the training set, as shown in the figure below.

The leaves of the decision tree are category names, that is, P or N. The other nodes are made up of the attributes of the examples, and each distinct value of an attribute corresponds to a branch.
To classify an example, start from the root of the tree, test the attribute at the current node using the example's value, and follow the matching branch down to the next node. Repeat this test at each node until a leaf is reached; the example is then judged to belong to the category marked on that leaf.
Now let's use the figure to judge a specific example.
The climate description on the morning of a certain day is:
Weather: Cloudy
Temperature: Cold
Humidity: Normal
Wind: No wind

Which class does it belong to? From the figure, it can be determined that the sample belongs to class P.

ID3 constructs such a decision tree from the training set in the table. In fact, more than one decision tree can correctly classify the training set; the ID3 algorithm tends to generate a decision tree with relatively few nodes.

The ID3 algorithm:

     1. For the current example set, calculate the information gain of each attribute;
     2. Select the attribute Ak with the highest information gain;
     3. Put examples that have the same value of Ak into the same subset; the number of subsets equals the number of values of Ak;
     4. For subsets that contain both positive and negative examples, call the tree-building algorithm recursively;
     5. If a subset contains only positive examples or only negative examples, mark the corresponding branch P or N and return to the calling point.

Generally, when it comes to trees, recursion is often used. 

For specific calculations of the climate classification problem, we have:
1. Calculation of information entropy, where S is the set of examples and P(ui) is the probability that category ui appears:

H(S) = -Σi P(ui) log2 P(ui)

|S| denotes the total number of examples in the example set S, and |ui| denotes the number of examples in category ui. For 9 positive examples and 5 negative examples, we have:
P(u1) = 9/14
P(u2) = 5/14
H(S) = (9/14)log2(14/9) + (5/14)log2(14/5) = 0.94 bit
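
As a quick numerical check (a small sketch, not part of the original article), the entropy value above can be reproduced in a few lines of Python:

from math import log

p_pos = 9.0 / 14   # P(u1), probability of a positive example
p_neg = 5.0 / 14   # P(u2), probability of a negative example
H_S = -(p_pos * log(p_pos, 2) + p_neg * log(p_neg, 2))
print(H_S)  # about 0.940 bit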

2. Calculation of information gain:

Gain(S, A) = H(S) - Σ v∈Value(A) (|Sv| / |S|) H(Sv)

Here A is an attribute, Value(A) is the set of values of attribute A, v is one value of A, Sv is the set of examples in S for which A equals v, and |Sv| is the number of examples in Sv.

Take attribute A1 (weather) as an example. According to the information gain formula, the information gain of attribute A1 is computed as follows:

S = [9+, 5-]        // the original example set contains 14 examples: 9 positive, 5 negative
S_sunny = [2+, 3-]  // 5 examples have A1 = Clear: 2 positive, 3 negative
S_cloud = [4+, 0-]  // 4 examples have A1 = Cloudy: 4 positive, 0 negative
S_rain = [3+, 2-]   // 5 examples have A1 = Rain: 3 positive, 2 negative

Therefore
Gain(S, A1) = H(S) - (5/14)H(S_sunny) - (4/14)H(S_cloud) - (5/14)H(S_rain)
            = 0.94 - (5/14)(0.971) - (4/14)(0) - (5/14)(0.971) ≈ 0.246 bit
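
The same gain can be verified numerically with a small sketch (the entropy helper below is an ad-hoc stand-in for the calcShannonEnt function defined in the next section):

from math import log

def entropy(pos, neg):
    # entropy of a subset containing pos positive and neg negative examples
    total = float(pos + neg)
    result = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            result -= p * log(p, 2)
    return result

H_S = entropy(9, 5)  # 0.940, as computed above
remainder = (5/14.0) * entropy(2, 3) + (4/14.0) * entropy(4, 0) + (5/14.0) * entropy(3, 2)
print(H_S - remainder)  # about 0.246, the information gain of the weather attribute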

3. The result

Attribute A1 (weather) has the highest information gain of the four attributes, so it is selected as the root node.

4. Building the root and leaves of the decision tree

The ID3 algorithm selects the attribute 'weather', which has the highest information gain, as the root. The 14 examples are split on the 3 values of weather into 3 branches, and the 3 corresponding subsets are:

All examples in S2 belong to class P, so the corresponding branch is marked P. The other two subsets contain both positive and negative examples, so the tree-building algorithm is called recursively on them.

5. Recursive tree construction

The ID3 algorithm is called recursively on subsets S1 and S3, and the information gain of each attribute is computed within each subset.
(1) For S1, the humidity attribute has the largest information gain, so it becomes the root node of this branch. Branching further, the examples with high humidity are all class N, and that branch is marked N; the examples with normal humidity are all class P, and that branch is marked P.
(2) For S3, the wind attribute has the largest information gain, so it becomes the root node of this branch. Branching further, when there is wind all examples are class N, and that branch is marked N; when there is no wind all examples are class P, and that branch is marked P.

2. Python Implementation of Decision Tree Classification

This code follows the example in Chapter 3 of Machine Learning in Action and has been tested and works correctly.
1. Function to compute the Shannon entropy of a given dataset:

from math import log

def calcShannonEnt(dataSet):
    # calculate the Shannon entropy of the dataset
    numEntries = len(dataSet)
    labelCounts = {}
    for featVec in dataSet:  # count the occurrences of each class label
        currentLabel = featVec[-1]
        if currentLabel not in labelCounts.keys():
            labelCounts[currentLabel] = 0
        labelCounts[currentLabel] += 1
    shannonEnt = 0.0
    for key in labelCounts:
        prob = float(labelCounts[key]) / numEntries
        shannonEnt -= prob * log(prob, 2)  # accumulate -p*log2(p)
    return shannonEnt
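
As a quick check (a sketch, assuming the functions above are saved in a module named trees, as in the console examples later in this article), the entropy of a small labeled dataset can be printed directly:

myDat = [[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 1, 'no']]
print(trees.calcShannonEnt(myDat))  # about 0.971: two 'yes' labels and three 'no' labels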

2. Function to create the sample dataset:

def createDataSet():
    # simple example dataset from Machine Learning in Action:
    # features are 'no surfacing' and 'flippers', the last column is the class label
    dataSet = [[1, 1, 'yes'],
               [1, 1, 'yes'],
               [1, 0, 'no'],
               [0, 1, 'no'],
               [0, 1, 'no']]
    labels = ['no surfacing', 'flippers']
    return dataSet, labels

3. Function to split the dataset according to a given feature and value:

def splitDataSet(dataSet, axis, value):
    retDataSet = []
    for featVec in dataSet:
        if featVec[axis] == value:  # keep examples whose feature at 'axis' equals 'value'
            reducedFeatVec = featVec[:axis]          # features before 'axis'
            reducedFeatVec.extend(featVec[axis+1:])  # features after 'axis'
            retDataSet.append(reducedFeatVec)
    return retDataSet
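
For example (again assuming the module is named trees), splitting on the first feature with value 1 keeps only the matching rows and removes that column:

myDat, labels = trees.createDataSet()
print(trees.splitDataSet(myDat, 0, 1))  # [[1, 'yes'], [1, 'yes'], [0, 'no']]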

4. Function to choose the best feature to split on:

def chooseBestFeatureToSplit(dataSet):
    numFeatures = len(dataSet[0]) - 1        # the last column is the class label
    baseEntropy = calcShannonEnt(dataSet)
    bestInfoGain = 0.0; bestFeature = -1
    for i in range(numFeatures):
        featList = [example[i] for example in dataSet]
        uniqueVals = set(featList)           # the distinct values of feature i
        newEntropy = 0.0
        for value in uniqueVals:
            subDataSet = splitDataSet(dataSet, i, value)
            prob = len(subDataSet) / float(len(dataSet))
            newEntropy += prob * calcShannonEnt(subDataSet)
        infoGain = baseEntropy - newEntropy  # information gain of feature i
        if infoGain > bestInfoGain:
            bestInfoGain = infoGain
            bestFeature = i
    return bestFeature
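
On the sample dataset the first feature yields the larger information gain, so the function returns index 0 (a quick check, assuming the module trees):

myDat, labels = trees.createDataSet()
print(trees.chooseBestFeatureToSplit(myDat))  # 0, i.e. the 'no surfacing' feature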

5. Function to recursively create the tree

Function to find the name of the classification with the most occurrences

import operator

def majorityCnt(classList):
    # return the class label that appears most often in classList
    classCount = {}
    for vote in classList:
        if vote not in classCount.keys(): classCount[vote] = 0
        classCount[vote] += 1
    sortedClassCount = sorted(classCount.iteritems(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]
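
A tiny usage check (a sketch): with three votes, the majority label wins.

print(trees.majorityCnt(['yes', 'no', 'no']))  # 'no'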

Code for creating the tree function

def createTree(dataSet, labels):
    classList = [example[-1] for example in dataSet]
    # all examples have the same class, so stop splitting
    if classList.count(classList[0]) == len(classList):
        return classList[0]
    # all features have been used up, so return the majority class
    if len(dataSet[0]) == 1:
        return majorityCnt(classList)
    bestFeat = chooseBestFeatureToSplit(dataSet)
    bestFeatLabel = labels[bestFeat]
    myTree = {bestFeatLabel: {}}
    del(labels[bestFeat])
    # get all the values of the chosen feature
    featValues = [example[bestFeat] for example in dataSet]
    uniqueVals = set(featValues)
    for value in uniqueVals:
        subLabels = labels[:]  # copy labels so recursive calls do not modify this level's list
        myTree[bestFeatLabel][value] = createTree(splitDataSet(dataSet, bestFeat, value), subLabels)
    return myTree

Then, enter the following commands at the Python prompt:

myDat, labels = trees.createDataSet() 
myTree = trees.createTree(myDat, labels) 
print myTree 

Result is:
{'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}

6. Function for classifying with the decision tree in practice:

def classify(inputTree, featLabels, testVec):
    firstStr = inputTree.keys()[0]          # the feature tested at this node
    secondDict = inputTree[firstStr]
    featIndex = featLabels.index(firstStr)  # index of that feature in the test vector
    for key in secondDict.keys():
        if testVec[featIndex] == key:
            if type(secondDict[key]).__name__ == 'dict':
                classLabel = classify(secondDict[key], featLabels, testVec)
            else: classLabel = secondDict[key]
    return classLabel

At the Python command prompt, enter (note: createTree deletes the chosen features from labels in place, so if the call fails, recreate the list first with myDat, labels = trees.createDataSet()):
trees.classify(myTree, labels, [1, 0])

Get the result:
'no'
Congratulations, you did it!

That's all for this article. I hope it is helpful for everyone's study, and I also hope everyone will continue to support the Yelling Tutorial.

