
Python implementation of decision tree classification algorithm

This article shares the specific code for implementing the decision tree classification algorithm in Python, for your reference. The details are as follows.

1, Overview

A decision tree is a widely used classification algorithm.

Compared with the Bayesian algorithm, the advantage of a decision tree is that its construction process does not require any domain knowledge or parameter settings.

In practical applications, decision trees are more suitable for exploratory knowledge discovery.

2, Algorithm Idea

In simple terms, the idea of decision tree classification is similar to finding a partner. Imagine that a girl's mother wants to introduce a boyfriend to her daughter, and the following conversation takes place:

      Daughter: How old is he?

      Mother: 26.

      Daughter: Is he handsome?

      Mother: He looks quite handsome.

      Daughter: Does he earn a lot?

      Mother: Not very high, just average.

      Daughter: Is he a civil servant?

      Mother: Yes, he works at the tax bureau.

      Daughter: Alright, I'll go and meet him.

The girl's decision-making process is a typical decision tree classification.

Essence: categorize men into two categories, 'see' and 'not see', based on age, appearance, income, and whether they are civil servants.

Suppose the girl's requirements for a man are: under 30 years old, average or above in appearance, and either a high earner or a civil servant with an average or above income. The girl's decision logic can then be represented by the following diagram.

The above diagram fully expresses the girl's strategy for deciding whether to meet a date, where:

◊ The green nodes represent judgment conditions

◊ The orange nodes represent decision results

◊ The arrows indicate the decision paths under different judgment conditions

The red arrow in the diagram indicates the decision-making process of the girl in the above example. 

This diagram can basically be regarded as a decision tree. It is only 'basically', because the judgment conditions in the diagram are not quantified (such as high, medium, and low income), so it cannot be regarded as a decision tree in the strict sense. If all the conditions are quantified, it becomes a real decision tree.
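To make this concrete, the girl's decision rule can be sketched as a few nested conditions in Python. This is only an illustrative sketch of the rule described above; the function name will_meet and the 'low'/'medium'/'high' encodings are assumptions, not part of the original example.

# Illustrative sketch of the girl's decision rule (hypothetical helper, not from the original example)
def will_meet(age, looks, income, is_civil_servant):
    """Decide 'see' (True) or 'not see' (False); looks and income are 'low', 'medium' or 'high'."""
    if age >= 30:                      # not under 30 -> not see
        return False
    if looks == 'low':                 # below-average appearance -> not see
        return False
    if income == 'high':               # high income -> see
        return True
    # otherwise: see only a civil servant with average-or-above income
    return is_civil_servant and income != 'low'

# The conversation above: 26 years old, handsome, average income, civil servant
print(will_meet(26, 'high', 'medium', True))   # True -> "Alright, I'll go and meet him."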

The key to the decision tree classification algorithm is to construct the best decision tree based on 'prior data', and use it to predict the category of unknown data.

Decision tree: a tree structure (it can be a binary tree or a non-binary tree). Each non-leaf node represents a test on a feature attribute, each branch represents the output of that feature attribute over a certain value range, and each leaf node stores a category. To make a decision with a decision tree, start from the root node, test the corresponding feature attribute of the item to be classified, select the output branch according to its value, and repeat until a leaf node is reached; the category stored in that leaf node is the decision result.
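As a concrete illustration of this structure and of the root-to-leaf decision process, here is a minimal Python sketch; the nested-dictionary representation, the feature names, and the classify function are assumptions made purely for illustration.

# A minimal sketch (assumed representation): a non-leaf node is a dict
# {feature_to_test: {value: subtree, ...}}, and a leaf node is simply a category label.
tree = {'color': {'red': {'size': {'large': 'good', 'small': 'good'}},
                  'green': 'bad'}}

def classify(node, sample):
    """Start at the root, test the node's feature attribute, follow the branch matching
    the sample's value, and repeat until a leaf node is reached; the leaf is the decision."""
    while isinstance(node, dict):
        feature = next(iter(node))                 # the feature attribute tested at this node
        node = node[feature][sample[feature]]      # the branch chosen by this sample's value
    return node

print(classify(tree, {'color': 'red', 'size': 'small'}))    # -> 'good'
print(classify(tree, {'color': 'green', 'size': 'large'}))  # -> 'bad'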

3, Decision Tree Construction

Suppose we have the following data samples for judging whether apples are good:

Sample    Red (A0)    Large (A1)    Good apple

0         1           1             1

1         1           0             1

2         0           1             0

3         0           0             0

The samples have 2 attributes: A0 indicates whether the apple is red, and A1 indicates whether the apple is large. Suppose we need to build a decision tree that automatically judges whether an apple is good, based on these data samples.
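For use in the calculations below, the sample table could be written in Python as follows (the variable names samples and labels are assumptions, chosen just for the sketches in this article):

# The 4 apple samples from the table above: each tuple is (A0 = red?, A1 = large?)
samples = [
    (1, 1),  # sample 0: red, large      -> good apple
    (1, 0),  # sample 1: red, not large  -> good apple
    (0, 1),  # sample 2: not red, large  -> bad apple
    (0, 0),  # sample 3: not red, small  -> bad apple
]
labels = [1, 1, 0, 0]  # 1 = good apple, 0 = bad apple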

Since the data in this example has only 2 attributes, we can enumerate all the decision trees that can be constructed; there are only 2 of them, as shown in the following figure:

Obviously, the decision tree on the left, which uses A0 (red) as the division basis, is better than the one on the right, which uses A1 (large) as the division basis.

Of course, this is an intuitive judgment. Intuition, however, is obviously not suitable for translating into a program, so a quantitative evaluation is needed to assess the performance of these two trees.

The quantitative evaluation method used here is to calculate the information entropy gain of each division scenario:

If, after dividing the data by a selected attribute, the information entropy decreases the most, then that attribute is the optimal division choice.

The basis for selecting the division attribute (i.e., for constructing the decision tree):

In simple terms, entropy is the degree of 'disorder' and 'chaos'.

Let us understand this through calculation:

1, The entropy of the original sample data:

Total number of samples: 4

Good apples: 2

Bad apples: 2

Entropy: -(1/2 * log(1/2) + 1/2 * log(1/2)) = 1 (logarithm base 2)

An information entropy of 1 means the current state is the most chaotic and disordered.
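This calculation is easy to verify with a few lines of Python. The sketch below assumes base-2 logarithms, consistent with the value of 1 obtained above; the helper name entropy is an assumption.

from math import log2

def entropy(labels):
    """Information entropy of a list of class labels (base-2 logarithm;
    classes that do not appear contribute nothing, i.e. 0 * log(0) = 0)."""
    total = len(labels)
    result = 0.0
    for cls in set(labels):
        p = labels.count(cls) / total
        result -= p * log2(p)
    return result

# Original sample data: 2 good apples and 2 bad apples
print(entropy([1, 1, 0, 0]))   # 1.0 -> the most chaotic, disordered state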

2, Entropy gain calculation for the division results of the two decision trees

Tree 1 first divides by A0. The information entropy of each child node is calculated as follows:

The leaf node containing samples 0 and 1 has 2 positive examples and 0 negative examples. Its information entropy is: e1 = -(2/2 * log(2/2) + 0/2 * log(0/2)) = 0 (taking 0 * log(0) = 0 by convention).

The leaf node containing samples 2 and 3 has 0 positive examples and 2 negative examples. Its information entropy is: e2 = -(0/2 * log(0/2) + 2/2 * log(2/2)) = 0.

Therefore, the information entropy after dividing by A0 is the weighted sum of the child nodes' entropies, weighted by their sample proportions: E = e1*2/4 + e2*2/4 = 0.

The information entropy gain of dividing by A0 is G(S, A0) = S - E = 1 - 0 = 1.

In fact, a leaf node of a decision tree means all of its samples belong to the same category, so its information entropy is always 0.

Tree 2 first divides by A1. The information entropy of each child node is calculated as follows:

The child node containing samples 0 and 2 has 1 positive example and 1 negative example. Its information entropy is: e1 = -(1/2 * log(1/2) + 1/2 * log(1/2)) = 1.

The child node containing samples 1 and 3 has 1 positive example and 1 negative example. Its information entropy is: e2 = -(1/2 * log(1/2) + 1/2 * log(1/2)) = 1.

Therefore, the information entropy after dividing by A1 is the weighted sum of the child nodes' entropies: E = e1*2/4 + e2*2/4 = 1. In other words, dividing is no better than not dividing at all!

The information entropy gain of dividing by A1 is G(S, A1) = S - E = 1 - 1 = 0.

Therefore, before each division, we only need to find the division with the largest information entropy gain.

The information entropy gain of dividing by A0 first is 1, which is greater than the gain of dividing by A1 first, so dividing by A0 first is the optimal choice!
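Both gains can be checked with a short script. This is a minimal sketch under the same assumptions as above (base-2 logarithm; the helper names entropy and info_gain are assumptions):

from math import log2

def entropy(labels):
    """Information entropy of a list of class labels, using log base 2."""
    total = len(labels)
    return -sum((labels.count(c) / total) * log2(labels.count(c) / total)
                for c in set(labels))

def info_gain(samples, labels, attr):
    """Information entropy gain of dividing the samples by one attribute:
    G(S, A) = entropy(S) - weighted sum of the child nodes' entropies."""
    total = len(labels)
    split_entropy = 0.0
    for value in set(s[attr] for s in samples):
        subset = [lab for s, lab in zip(samples, labels) if s[attr] == value]
        split_entropy += len(subset) / total * entropy(subset)
    return entropy(labels) - split_entropy

samples = [(1, 1), (1, 0), (0, 1), (0, 0)]   # (A0 = red?, A1 = large?)
labels = [1, 1, 0, 0]                        # 1 = good apple, 0 = bad apple
print(info_gain(samples, labels, 0))   # G(S, A0) = 1.0
print(info_gain(samples, labels, 1))   # G(S, A1) = 0.0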

4, Algorithm Guiding Principle

With each division by a decision attribute, the data becomes less and less disordered, that is, the information entropy becomes smaller and smaller.

5, Algorithm Implementation

Organize the attributes in the data.

Compare the information entropy gains of dividing the data by each attribute, choose the attribute with the largest gain as the first division basis, then continue in the same way to choose the attribute for the next division, and so on.
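Putting these steps together, a compact recursive construction in the spirit of the procedure described above (ID3-style: at each node, choose the attribute with the largest information entropy gain) might look like the sketch below. It is an illustrative version, not the article's original code, and all names are assumptions.

from math import log2
from collections import Counter

def entropy(labels):
    """Information entropy of a list of class labels, using log base 2."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def info_gain(samples, labels, attr):
    """Entropy gain of dividing the samples by the attribute at index attr."""
    total = len(labels)
    split_entropy = 0.0
    for value in set(s[attr] for s in samples):
        subset = [lab for s, lab in zip(samples, labels) if s[attr] == value]
        split_entropy += len(subset) / total * entropy(subset)
    return entropy(labels) - split_entropy

def build_tree(samples, labels, attrs):
    """Recursively build a decision tree: pick the attribute with the largest
    information entropy gain as the division basis, split on it, and recurse."""
    if len(set(labels)) == 1:            # all samples in one category -> leaf node
        return labels[0]
    if not attrs:                        # no attributes left -> majority category
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(samples, labels, a))
    node = {best: {}}
    remaining = [a for a in attrs if a != best]
    for value in set(s[best] for s in samples):
        sub_samples = [s for s in samples if s[best] == value]
        sub_labels = [lab for s, lab in zip(samples, labels) if s[best] == value]
        node[best][value] = build_tree(sub_samples, sub_labels, remaining)
    return node

samples = [(1, 1), (1, 0), (0, 1), (0, 0)]   # (A0 = red?, A1 = large?)
labels = [1, 1, 0, 0]                        # 1 = good apple, 0 = bad apple
print(build_tree(samples, labels, [0, 1]))   # e.g. {0: {0: 0, 1: 1}} -> divide by A0 first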

That is all for this article. I hope it is helpful to your learning, and please continue to support the 呐喊 tutorial.

