Writing Logistic Regression in Python

The process of fitting the data with a straight line is called regression. The idea of logistic regression classification is: establish a regression formula for the classification boundary line based on existing data.
The formula is expressed as:

I. Gradient Ascent Method

All data participate in the calculation in each iteration.

for Loop Times:
　　　　　　　 Training

The code is as follows:

import numpy as np
import matplotlib.pyplot as plt
def loadData():
 labelVec = []
 dataMat = []
 with open('testSet.txt') as f:
  for line in f.readlines():
   dataMat.append([1.0,line.strip().split()[0],line.strip().split()[1])
   labelVec.append(line.strip().split()[2])
 return dataMat,labelVec
def Sigmoid(inX):
 return 1/(1+np.exp(-inX))
def trainLR(dataMat,labelVec):
 dataMatrix = np.mat(dataMat).astype(np.float64)
 lableMatrix = np.mat(labelVec).T.astype(np.float64)
 m,n = dataMatrix.shape
 w = np.ones((n,1))
 alpha = 0.001
 for i in range(500):
  predict = Sigmoid(dataMatrix*w)
  error = predict-lableMatrix
  w = w - alpha*dataMatrix.T*error
 return w
def plotBestFit(wei,data,label):
 if type(wei).__name__ == 'ndarray':
  weights = wei
 else:
  weights = wei.getA()
 fig = plt.figure(0)
 ax = fig.add_subplot(111)
 xxx = np.arange(-3,3,0.1)
 yyy = - weights[0]/weights[2] - weights[1]/weights[2]*xxx
 ax.plot(xxx,yyy)
 cord1 = []
 cord0 = []
 for i in range(len(label)):
  if label[i] == 1:
   cord1.append(data[i][1:3])
  else:
   cord0.append(data[i][1:3])
 cord1 = np.array(cord1)
 cord0 = np.array(cord0)
 ax.scatter(cord1[:,0],cord1[:,1],c='red')
 ax.scatter(cord0[:,0],cord0[:,1],c='green')
 plt.show()
if __name__ == "__main__":
 data,label = loadData()
 data = np.array(data).astype(np.float64)
 label = [int(item) for item in label]
 weight = trainLR(data,label)
 plotBestFit(weight,data,label)

II. Stochastic Gradient Ascent Method

1.Adjust the learning parameters with the number of iterations, which can alleviate high-frequency fluctuations of parameters.
2.Randomly select samples to update regression parameters, which can reduce periodic fluctuations.

for Loop Times:
　　　 for Sample Quantity:
　　　　　　　 Update the learning rate
　　　　　　　 Randomly select samples
　　　　　　　 Training
　　　　　　　 Remove the sample from the sample set

The code is as follows:

import numpy as np
import matplotlib.pyplot as plt
def loadData():
 labelVec = []
 dataMat = []
 with open('testSet.txt') as f:
  for line in f.readlines():
   dataMat.append([1.0,line.strip().split()[0],line.strip().split()[1])
   labelVec.append(line.strip().split()[2])
 return dataMat,labelVec
def Sigmoid(inX):
 return 1/(1+np.exp(-inX))
def plotBestFit(wei,data,label):
 if type(wei).__name__ == 'ndarray':
  weights = wei
 else:
  weights = wei.getA()
 fig = plt.figure(0)
 ax = fig.add_subplot(111)
 xxx = np.arange(-3,3,0.1)
 yyy = - weights[0]/weights[2] - weights[1]/weights[2]*xxx
 ax.plot(xxx,yyy)
 cord1 = []
 cord0 = []
 for i in range(len(label)):
  if label[i] == 1:
   cord1.append(data[i][1:3])
  else:
   cord0.append(data[i][1:3])
 cord1 = np.array(cord1)
 cord0 = np.array(cord0)
 ax.scatter(cord1[:,0],cord1[:,1],c='red')
 ax.scatter(cord0[:,0],cord0[:,1],c='green')
 plt.show()
def stocGradAscent(dataMat,labelVec,trainLoop):
 m,n = np.shape(dataMat)
 w = np.ones((n,1))
 for j in range(trainLoop):
  dataIndex = range(m)
  for i in range(m):
   alpha = 4/(i+j+1) + 0.01
   randIndex = int(np.random.uniform(0,len(dataIndex)))
   predict = Sigmoid(np.dot(dataMat[dataIndex[randIndex]],w))
   error = predict - labelVec[dataIndex[randIndex]]
   w = w - alpha*error*dataMat[dataIndex[randIndex]].reshape(n,1)
   np.delete(dataIndex,randIndex,0)
 return w
if __name__ == "__main__":
 data,label = loadData()
 data = np.array(data).astype(np.float64)
 label = [int(item) for item in label]
 weight = stocGradAscent(data,label,300) 
 plotBestFit(weight,data,label)

Chapter 3: Programming Skills

1. string extraction

Remove ' \n', ' \r', ' \t', ' ' from the string and divide by the space character.

string.strip().split()

2. type judgment

if type(secondTree[value]).__name__ == 'dict':

3. multiplication

The multiplication of two matrix types of numpy results in a matrix

c = a*b
c
Out[66]: matrix([ 6.830482])

The multiplication of two vectors of vector types results in a two-dimensional array

b
Out[80]: 
array([ 1.],
  [ 1.],
  [ 1.]])
a
Out[81]: array([1, 2, 3])
a*b
Out[82]: 
array([ 1,. 2,. 3.],
  [ 1,. 2,. 3.],
  [ 1,. 2,. 3.]])
b*a
Out[83]: 
array([ 1,. 2,. 3.],
  [ 1,. 2,. 3.],
  [ 1,. 2,. 3.]])

That's all for this article. Hope it will be helpful to everyone's learning and also hope everyone will support the Shouting Tutorial more.

Statement: The content of this article is from the Internet, owned by the original author. The content is contributed and uploaded by Internet users spontaneously. This website does not own the copyright, has not been manually edited, and does not assume any relevant legal liability. If you find any content suspected of copyright infringement, please send an email to: notice#w3Please report via email to codebox.com (replace # with @ when sending email) and provide relevant evidence. Once verified, this site will immediately delete the infringing content.

Basic Tutorial