
A Python Description of the Machine Learning Logistic Regression Algorithm

This article introduces the logistic regression algorithm in machine learning, which we use to classify data. Logistic regression is a supervised learning algorithm: it needs to learn from the sample space, and it is suitable for numerical and nominal data. For example, we may need to judge, based on the sizes of the (numerical) feature values of the input data, whether the data belongs to a certain class or not.

1. Sample Data

In our example, we have some sample data such as the following:

The sample data has 3 feature values: X0, X1, and X2.

Of these 3 feature values, we use X1 and X2 to determine whether the data meets the requirements: data that meets the requirements is labeled 1, and data that does not is labeled 0.

Sample data for classification is stored in an array
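
For reference, each line of testSet.txt holds the two feature values X1 and X2 followed by the class label, separated by whitespace. The first few lines, reconstructed from the printed output below, look like this:

-0.017612	14.053064	0
-1.395634	4.662541	1
-0.752157	6.53862	0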

In the logRegres.py file, we write the following function to prepare the data and print it out for observation:

#coding=utf-8
from numpy import *

def loadDataSet():
    dataMat = []; labelMat = []
    fr = open('testSet.txt')
    for line in fr.readlines():
        lineArr = line.strip().split()
        # X0 is fixed at 1.0; X1 and X2 come from the first two columns of the file
        dataMat.append([1.0, float(lineArr[0]), float(lineArr[1])])
        # the third column is the class label (0 or 1)
        labelMat.append(int(lineArr[2]))
    return dataMat,labelMat

if __name__=='__main__':
    dataMat,labelMat=loadDataSet()
    print 'dataMat:\n',dataMat
    print 'labelMat:\n',labelMat

Let's take a look at this data sample:

dataMat:
[[1.0, -0.017612, 14.053064], [1.0, -1.395634, 4.662541], [1.0, -0.752157, 6.53862], [1.0, -1.322371, 7.152853], [1.0, 0.423363, 11.054677], [1.0, 0.406704, 7.067335], [1.0, 0.667394, 12.741452], [1.0, -2.46015, 6.866805], [1.0, 0.569411, 9.548755], [1.0, -0.026632, 10.427743], [1.0, 0.850433, 6.920334], [1.0, 1.347183, 13.1755], [1.0, 1.176813, 3.16702], [1.0, -1.781871, 9.097953], [1.0, -0.566606, 5.749003], [1.0, 0.931635, 1.589505], [1.0, -0.024205, 6.151823], [1.0, -0.036453, 2.690988], [1.0, -0.196949, 0.444165], [1.0, 1.014459, 5.754399], [1.0, 1.985298, 3.230619], [1.0, -1.693453, -0.55754], [1.0, -0.576525, 11.778922], [1.0, -0.346811, -1.67873], [1.0, -2.124484, 2.672471], [1.0, 1.217916, 9.597015], [1.0, -0.733928, 9.098687], [1.0, -3.642001, -1.618087], [1.0, 0.315985, 3.523953], [1.0, 1.416614, 9.619232], [1.0, -0.386323, 3.989286], [1.0, 0.556921, 8.294984], [1.0, 1.224863, 11.58736], [1.0, -1.347803, -2.406051], [1.0, 1.196604, 4.951851], [1.0, 0.275221, 9.543647], [1.0, 0.470575, 9.332488], [1.0, -1.889567, 9.542662], [1.0, -1.527893, 12.150579], [1.0, -1.185247, 11.309318], [1.0, -0.445678, 3.297303], [1.0, 1.042222, 6.105155], [1.0, -0.618787, 10.320986], [1.0, 1.152083, 0.548467], [1.0, 0.828534, 2.676045], [1.0, -1.237728, 10.549033], [1.0, -0.683565, -2.166125], [1.0, 0.229456, 5.921938], [1.0, -0.959885, 11.555336], [1.0, 0.492911, 10.993324], [1.0, 0.184992, 8.721488], [1.0, -0.355715, 10.325976], [1.0, -0.397822, 8.058397], [1.0, 0.824839, 13.730343], [1.0, 1.507278, 5.027866], [1.0, 0.099671, 6.835839], [1.0, -0.344008, 10.717485], [1.0, 1.785928, 7.718645], [1.0, -0.918801, 11.560217], [1.0, -0.364009, 4.7473], [1.0, -0.841722, 4.119083], [1.0, 0.490426, 1.960539], [1.0, -0.007194, 9.075792], [1.0, 0.356107, 12.447863], [1.0, 0.342578, 12.281162], [1.0, -0.810823, -1.466018], [1.0, 2.530777, 6.476801], [1.0, 1.296683, 11.607559], [1.0, 0.475487, 12.040035], [1.0, -0.783277, 11.009725], [1.0, 0.074798, 11.02365], [1.0, -1.337472, 0.468339], [1.0, -0.102781, 13.763651], [1.0, -0.147324, 2.874846], [1.0, 0.518389, 9.887035], [1.0, 1.015399, 7.571882], [1.0, -1.658086, -0.027255], [1.0, 1.319944, 2.171228], [1.0, 2.056216, 5.019981], [1.0, -0.851633, 4.375691], [1.0, -1.510047, 6.061992], [1.0, -1.076637, -3.181888], [1.0, 1.821096, 10.28399], [1.0, 3.01015, 8.401766], [1.0, -1.099458, 1.688274], [1.0, -0.834872, -1.733869], [1.0, -0.846637, 3.849075], [1.0, 1.400102, 12.628781], [1.0, 1.752842, 5.468166], [1.0, 0.078557, 0.059736], [1.0, 0.089392, -0.7153], [1.0, 1.825662, 12.693808], [1.0, 0.197445, 9.744638], [1.0, 0.126117, 0.922311], [1.0, -0.679797, 1.22053], [1.0, 0.677983, 2.556666], [1.0, 0.761349, 10.693862], [1.0, -2.168791, 0.143632], [1.0, 1.38861, 9.341997], [1.0, 0.317029, 14.739025]
labelMat:
[0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0]

The first column of the sample data dataMat, that is, our feature value X0, is all 1. This needs to be kept in mind when calculating the regression coefficients later. There are 100 sample records in total, and correspondingly 100 classification results.

So, the problem we have now is:
We need to find the relationship between the feature values in the sample space and the classification results, and design a function that, given a set of feature values as input, automatically classifies the input data according to that relationship, so that the result is either 1 or 0.

2. Sigmoid Function

To solve the problem mentioned in the previous section, we first introduce the Sigmoid function:

σ(z) = 1 / (1 + e^(−z))

This function has the following characteristics:

When z = 0, the value of the function is 0.5
As z continuously increases, the value of the function approaches 1
As z continuously decreases, the value of the function approaches 0
Let's take a look at the curve chart of the function:

If we substitute our 3 feature values X0, X1, and X2 into the function, we get a computed result. This result will be close to one of our classification results (it is a value between 0 and 1). If the result is close to 0, we consider the data classified as 0; if the result is close to 1, we consider it classified as 1.

How do we substitute the feature values into the function? In fact, they can simply be added, because as z continuously increases or decreases, the value of the function approaches 1 or 0. We let z = x0 + x1 + x2.

However, in reality there will be errors between our computed results and the actual classification values; the results may even be completely wrong. To correct this, we define a regression coefficient for each of the 3 feature values X0, X1, and X2 of the sample space, namely w0, w1, and w2, so as to reduce the error. That is, z = w0·x0 + w1·x1 + w2·x2.
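
The listings below all call a sigmoid helper that never appears in the original code. Here is a minimal definition for logRegres.py, together with a small illustration of computing z for one sample; the coefficient values used are made up purely for illustration:

from numpy import exp

def sigmoid(inX):
    # works on scalars as well as NumPy arrays/matrices
    return 1.0/(1+exp(-inX))

if __name__=='__main__':
    # hypothetical coefficients w and the first sample record [X0, X1, X2]
    w = [0.5, 1.0, -0.5]
    x = [1.0, -0.017612, 14.053064]
    z = w[0]*x[0] + w[1]*x[1] + w[2]*x[2]
    print 'sigmoid(z) =', sigmoid(z)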

It is not difficult to imagine that the values of this set of regression coefficients w determine the accuracy, even the correctness, of our computed results. That is to say, this set of w values reflects the classification rules of the sample space.
Then, when we input a set of data from outside the samples, together with the correct regression coefficients w, we can obtain a classification result that conforms to the classification rules of the sample space.
The question arises again: how do we obtain such a set of regression coefficients w?

3. Gradient Ascent Method

The gradient ascent method iteratively recalculates the parameter values along the direction of the gradient of the function, in order to find the parameter values that maximize the function. The iterative formula is as follows:

w := w + α·∇σ(w)

Here, α is the step length and ∇σ(w) is the gradient of the function σ(w). For the derivation of the gradient, please refer to the relevant references; the author's mathematical ability is limited, so no derivation is given here.

Finally, we can obtain the calculation formula for the gradient:

∇σ(w) = xi·[yi − σ(xi, wk)]

Then, the iterative formula is as follows:

w(k+1) = wk + α·xi·[yi − σ(xi, wk)]

Formula description:

w(k+1) is the regression coefficient result for the X feature items in this iteration
wk is the regression coefficient result for the X feature items from the previous iteration
α is the step size moved in the direction of the gradient in each iteration
xi is the i-th element of the X feature items
yi is the classification result of the i-th record in the sample
σ(xi, wk) is the classification result of the i-th sample record, computed with the sigmoid function using wk as the regression coefficients
[yi − σ(xi, wk)] is the error between the actual classification value of the i-th sample record and the classification value computed by the sigmoid function with wk as the regression coefficients
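
In matrix form over all 100 samples at once, which is exactly what the next listing computes with matrix operations, the same update can be written as follows (X is the 100×3 sample matrix and y the 100×1 label vector):

w(k+1) = wk + α·Xᵀ·(y − σ(X·wk))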

Now that we have the formula for calculating the regression coefficient, let's implement a function in the logRegres.py file to calculate the regression coefficient of the sample space and print our results:

def gradAscent(dataMatIn, classLabels):
    dataMatrix = mat(dataMatIn)             # 100 rows, 3 columns
    labelMat = mat(classLabels).transpose() # 100 rows, 1 column
    print 'labelMat shape: rowNum=',shape(labelMat)[0],'colNum=',shape(labelMat)[1]
    rowNum,colNum = shape(dataMatrix)
    alpha = 0.001                           # step size
    maxCycles = 500                         # number of iterations
    weights = ones((colNum,1))              # 3 rows, 1 column
    for k in range(maxCycles):              # heavy on matrix operations
        h = sigmoid(dataMatrix*weights)     # 100 rows, 1 column
        error = (labelMat - h)              # vector subtraction
        weights = weights + alpha * dataMatrix.transpose() * error # 3 rows, 1 column
    return weights

if __name__=='__main__':
    dataMat,labelMat=loadDataSet()
    weights=gradAscent(dataMat,labelMat)
    print 'Regression coefficient:\n',weights

Print Result:

Regression coefficient:
[[ 4.12414349]
 [ 0.48007329]
 [-0.6168482 ]]

In order to verify the accuracy of the regression coefficients we have computed, let's look at the scatter plot of the sample space together with the fitted line given by the regression coefficients. The decision boundary is where the sigmoid value is 0.5, that is, where z(x1,x2) = w0 + w1·x1 + w2·x2 = 0, so we draw the line x2 = (−w0 − w1·x1)/w2 in the coordinate system, and plot the sample points using the X1 and X2 values as the abscissa and ordinate. The code is as follows:

def plotBestFit(weights):
    import matplotlib.pyplot as plt
    dataMat,labelMat=loadDataSet()
    dataArr = array(dataMat)
    n = shape(dataArr)[0]
    xcord1 = []; ycord1 = []
    xcord2 = []; ycord2 = []
    for i in range(n):
        if int(labelMat[i])== 1:
            xcord1.append(dataArr[i,1]); ycord1.append(dataArr[i,2])
        else:
            xcord2.append(dataArr[i,1]); ycord2.append(dataArr[i,2])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
    ax.scatter(xcord2, ycord2, s=30, c='green')
    x = arange(-3.0, 3.0, 0.1)
    # the fitted boundary w0 + w1*x1 + w2*x2 = 0, solved for x2
    y = (-weights[0]-weights[1]*x)/weights[2]
    y = y.transpose()
    ax.plot(x, y)
    plt.xlabel('X1'); plt.ylabel('X2')
    plt.show()

if __name__=='__main__':
    dataMat,labelMat=loadDataSet()
    weights=gradAscent(dataMat,labelMat)
    print 'Regression coefficient:\n',weights
    plotBestFit(weights)

After running, we get the following image:

From the plot we can see that our regression coefficient algorithm is fairly accurate: the fitted line divides the sample data into two parts and conforms to the classification rules of the samples.

Next, let's implement a classifier and test this classifier:

def classify0(targetData,weights):
    # compute the sigmoid value for one record and threshold it at 0.5
    v = sigmoid(float(mat(targetData)*weights))
    if v > 0.5:
        return 1.0
    else:
        return 0.0

def testClassify0():
    dataMat,labelMat=loadDataSet()
    examPercent=0.7                 # use 70% of the records for training
    row,col=shape(dataMat)
    exam=[]
    exam_label=[]
    test=[]
    test_label=[]
    for i in range(row):
        if i < row*examPercent:
            exam.append(dataMat[i])
            exam_label.append(labelMat[i])
        else:
            test.append(dataMat[i])
            test_label.append(labelMat[i])
    weights=gradAscent(exam,exam_label)
    errCnt=0
    trow,tcol=shape(test)
    for i in range(trow):
        v=int(classify0(test[i],weights))
        if v != int(test_label[i]):
            errCnt += 1
            print 'Calculated value:', v, ' Original value:', test_label[i]
    # float division, otherwise Python 2 integer division always truncates to 0
    print 'Error rate:', float(errCnt)/trow

if __name__=='__main__':
    testClassify0()

The implementation of the classifier is very simple. We use the first 70 records of the sample data as our training samples and calculate the regression coefficients from them. Then we use the classifier to classify the remaining 30 records and compare the results with the sample labels. Finally, we print the error rate. We can see that the error rate is 0: almost perfect! We can modify the proportion of training samples in the original sample space and run the test several times. The conclusion is that the accuracy of our algorithm is quite good!
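
As a quick sketch of using the classifier on data from outside the sample space, we can train on the full data set and then classify a single new point. The feature values of the new point below are made up purely for illustration:

if __name__=='__main__':
    dataMat,labelMat=loadDataSet()
    weights=gradAscent(dataMat,labelMat)
    # [X0, X1, X2] of a hypothetical new record; X0 is always 1.0
    print 'class of new point:', classify0([1.0, 0.5, 9.0], weights)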

So, is the problem solved here? It seems something is still missing. Let's examine our method of calculating the regression coefficients in detail. It is not difficult to notice that in this process we perform matrix multiplication with the matrix composed of all the sample data. That is to say, in order to calculate the regression coefficients, we traverse the entire sample data set in every iteration.

Our problem arises again: the sample data in our example has only 100 records. If thousands of sample records were processed, the computational cost of our regression coefficient function would rise sharply. Let's look at how to optimize this algorithm.

4. Optimizing the Gradient Ascent Algorithm: the Stochastic Gradient Ascent Method

Having understood the iterative formula for calculating the regression coefficients and implemented the program above, we now make the following improvement to the method of calculating the regression coefficients:

def stocGradAscent0(dataMatrix, classLabels):
    m,n = shape(dataMatrix)
    alpha = 0.01
    weights = ones((n,1))      # initialize to all ones, n rows 1 column
    for i in range(m):
        # use a single sample per update instead of the whole matrix
        h = sigmoid(float(mat(dataMatrix[i])*weights))
        error = classLabels[i] - h
        weights = weights + alpha * error * mat(dataMatrix[i]).transpose()
    return weights

When calculating the regression coefficient in each iteration, only one sample point in the sample space is used. Let's take a look at the accuracy of this algorithm through a scatter plot of sample points and a fitted curve generated by the program:

It is not difficult to see that it differs quite a bit from the previous algorithm. The reason is that the previous result was computed with 500 iterations over the whole data set, while the latter went through only 100 single-sample updates. The point to explain here is that the regression coefficients tend to converge as the number of iterations increases, and the convergence process fluctuates. In short, the more iterations, the closer to the desired values, but because the sample data is non-linear, there will also be some error in the process. For the specific relationship between the regression coefficients and the number of iterations, please refer to textbooks such as the description in 'Machine Learning in Action'; it will not be introduced in detail here.
Here, we only introduce how to improve our algorithm so that it converges quickly and fluctuates less. The method is as follows:

In each iteration, a randomly selected sample point is used to update the regression coefficients
The iteration step size decreases continuously as the number of iterations increases, but never becomes equal to 0

Improve the code and print the fitted curve and sample scatter plot:

def stocGradAscent1(dataMatrix, classLabels, numIter=150):
    m,n = shape(dataMatrix)
    weights = ones((n,1))      # initialize to all ones
    for j in range(numIter):
        dataIndex = range(m)   # indices of samples not yet used in this pass
        for i in range(m):
            # alpha decreases with iteration but never reaches 0 because of the constant
            alpha = 4/(1.0+j+i)+0.0001
            # pick a random, not-yet-used sample for this update
            randIndex = int(random.uniform(0,len(dataIndex)))
            sampleIndex = dataIndex[randIndex]
            h = sigmoid(float(mat(dataMatrix[sampleIndex])*weights))
            error = classLabels[sampleIndex] - h
            weights = weights + alpha * error * mat(dataMatrix[sampleIndex]).transpose()
            del(dataIndex[randIndex])
    return weights

if __name__=='__main__':
    dataMat,labelMat=loadDataSet()
    #weights=stocGradAscent0(dataMat,labelMat)
    weights=stocGradAscent1(dataMat,labelMat)
    plotBestFit(weights)

The scatter plot and fitted curve for the default 150 iterations:

It is not difficult to see that the accuracy is very close to the first algorithm!

Summary

The logistic regression algorithm mainly uses the Sigmoid function to classify data, and the key to the accuracy of classification depends on the regression coefficients calculated from the sample space. We use the gradient ascent method to calculate the regression coefficients and use the stochastic gradient ascent method to improve the performance of the algorithm.

This is the full content of this article on describing the machine learning logistic regression algorithm in Python. I hope it is helpful to everyone. Interested readers can continue to refer to other topics on Python and algorithms on this site. If there are any deficiencies, feel free to leave a comment. Thank you, friends, for your support of this site!

