Fitting data with a straight line is called regression. The idea behind logistic regression classification is to establish a regression formula for the classification boundary based on the existing data.
The formula is expressed as:

    σ(z) = 1 / (1 + e^(-z)),   where z = wᵀx

σ is the sigmoid function applied to the weighted sum of the inputs; it appears as Sigmoid() in the code below.
I. Gradient Ascent Method
All of the data participates in the calculation in each iteration:

for loop times:
    train (update the weights on the full data set)
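Each pass therefore moves the weights along the full-batch gradient of the log-likelihood; written out, the update is

    w ← w + α · Xᵀ(y − σ(Xw))

which is exactly the w = w - alpha*dataMatrix.T*error line in the code below, since error = σ(Xw) − y.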
The code is as follows:
import numpy as np
import matplotlib.pyplot as plt

def loadData():
    labelVec = []
    dataMat = []
    with open('testSet.txt') as f:
        for line in f.readlines():
            fields = line.strip().split()
            # prepend 1.0 as the constant (bias) feature
            dataMat.append([1.0, fields[0], fields[1]])
            labelVec.append(fields[2])
    return dataMat, labelVec

def Sigmoid(inX):
    return 1 / (1 + np.exp(-inX))

def trainLR(dataMat, labelVec):
    dataMatrix = np.mat(dataMat).astype(np.float64)
    labelMatrix = np.mat(labelVec).T.astype(np.float64)
    m, n = dataMatrix.shape
    w = np.ones((n, 1))
    alpha = 0.001          # fixed learning rate
    for i in range(500):   # full-batch iterations
        predict = Sigmoid(dataMatrix * w)
        error = predict - labelMatrix
        w = w - alpha * dataMatrix.T * error
    return w

def plotBestFit(wei, data, label):
    if type(wei).__name__ == 'ndarray':
        weights = wei
    else:
        weights = wei.getA()
    fig = plt.figure(0)
    ax = fig.add_subplot(111)
    xxx = np.arange(-3, 3, 0.1)
    # decision boundary: w0 + w1*x + w2*y = 0
    yyy = -weights[0] / weights[2] - weights[1] / weights[2] * xxx
    ax.plot(xxx, yyy)
    cord1 = []
    cord0 = []
    for i in range(len(label)):
        if label[i] == 1:
            cord1.append(data[i][1:3])
        else:
            cord0.append(data[i][1:3])
    cord1 = np.array(cord1)
    cord0 = np.array(cord0)
    ax.scatter(cord1[:, 0], cord1[:, 1], c='red')
    ax.scatter(cord0[:, 0], cord0[:, 1], c='green')
    plt.show()

if __name__ == "__main__":
    data, label = loadData()
    data = np.array(data).astype(np.float64)
    label = [int(item) for item in label]
    weight = trainLR(data, label)
    plotBestFit(weight, data, label)
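The script assumes a data file testSet.txt with one sample per line: two whitespace-separated feature values followed by a 0/1 class label. The file itself is not reproduced in the article; a few lines might look like this (illustrative values only):

    -0.5   14.05   0
     1.2    7.31   1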
II. Stochastic Gradient Ascent Method
1. Decay the learning rate as the iterations proceed, which alleviates high-frequency fluctuations of the parameters.
2. Randomly select samples to update the regression parameters, which reduces periodic fluctuations.
for loop times:
    for sample quantity:
        update the learning rate
        randomly select a sample
        train (update the weights on that sample)
        remove the sample from the sample set
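In the code below these two tricks appear as the step-size schedule

    α = 4 / (1 + j + i) + 0.01

(which decays with the outer iteration j and the inner index i but never reaches zero, thanks to the 0.01 floor) and a per-sample update of the form

    w ← w − α · (σ(xᵀw) − y) · x

where x is the randomly chosen sample and y its label.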
The code is as follows:
import numpy as np
import matplotlib.pyplot as plt

def loadData():
    labelVec = []
    dataMat = []
    with open('testSet.txt') as f:
        for line in f.readlines():
            fields = line.strip().split()
            # prepend 1.0 as the constant (bias) feature
            dataMat.append([1.0, fields[0], fields[1]])
            labelVec.append(fields[2])
    return dataMat, labelVec

def Sigmoid(inX):
    return 1 / (1 + np.exp(-inX))

def plotBestFit(wei, data, label):
    if type(wei).__name__ == 'ndarray':
        weights = wei
    else:
        weights = wei.getA()
    fig = plt.figure(0)
    ax = fig.add_subplot(111)
    xxx = np.arange(-3, 3, 0.1)
    # decision boundary: w0 + w1*x + w2*y = 0
    yyy = -weights[0] / weights[2] - weights[1] / weights[2] * xxx
    ax.plot(xxx, yyy)
    cord1 = []
    cord0 = []
    for i in range(len(label)):
        if label[i] == 1:
            cord1.append(data[i][1:3])
        else:
            cord0.append(data[i][1:3])
    cord1 = np.array(cord1)
    cord0 = np.array(cord0)
    ax.scatter(cord1[:, 0], cord1[:, 1], c='red')
    ax.scatter(cord0[:, 0], cord0[:, 1], c='green')
    plt.show()

def stocGradAscent(dataMat, labelVec, trainLoop):
    m, n = np.shape(dataMat)
    w = np.ones((n, 1))
    for j in range(trainLoop):
        dataIndex = list(range(m))
        for i in range(m):
            # decay the learning rate, with a 0.01 floor
            alpha = 4 / (1 + j + i) + 0.01
            # pick a random sample from those not yet used this pass
            randIndex = int(np.random.uniform(0, len(dataIndex)))
            predict = Sigmoid(np.dot(dataMat[dataIndex[randIndex]], w))
            error = predict - labelVec[dataIndex[randIndex]]
            w = w - alpha * error * dataMat[dataIndex[randIndex]].reshape(n, 1)
            # remove the used sample so it is not picked again this pass
            del dataIndex[randIndex]
    return w

if __name__ == "__main__":
    data, label = loadData()
    data = np.array(data).astype(np.float64)
    label = [int(item) for item in label]
    weight = stocGradAscent(data, label, 300)
    plotBestFit(weight, data, label)
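Note that both scripts train and plot but never use the learned weights to classify a new point. A minimal sketch of that last step (the classify helper below is an addition, not part of the original code) thresholds the sigmoid output at 0.5:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def classify(x, w):
    # x is a feature vector including the leading 1.0 bias term;
    # w is the learned (n, 1) weight vector returned by the trainers above
    return 1 if sigmoid(np.dot(x, w)).item() > 0.5 else 0

# e.g.: classify(np.array([1.0, 0.5, -1.2]), weight)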
III. Programming Skills
1. String extraction

Strip '\n', '\r', '\t', and space characters from both ends of the string, then split it on whitespace:
string.strip().split()
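Applied to a typical line from the data file (illustrative values), this yields a list of field strings:

line = '  -0.5\t14.05\t0 \n'
line.strip().split()    # -> ['-0.5', '14.05', '0']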
2. Type checking
if type(secondTree[value]).__name__ == 'dict':
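A small self-contained demonstration (the nested dict below is a made-up stand-in for a tree node): type(x).__name__ compares the class name as a string, while isinstance is the more idiomatic spelling of the same test.

secondTree = {'value': {'left': 0, 'right': 1}}
value = 'value'
if type(secondTree[value]).__name__ == 'dict':
    print('this branch is a subtree')      # printed
if isinstance(secondTree[value], dict):    # equivalent, more idiomatic
    print('this branch is a subtree')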
3. Multiplication

Multiplying two numpy matrix objects with * performs true matrix multiplication and returns a matrix:
c = a*b
c
Out[66]: matrix([[ 6.830482]])
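The definitions of a and b are not shown in the original session; as an illustration with made-up values, a 1x3 matrix times a 3x1 matrix yields a 1x1 matrix:

import numpy as np
a = np.mat([[1.0, 2.0, 3.0]])       # 1x3 matrix
b = np.mat([[1.0], [2.0], [3.0]])   # 3x1 matrix
a * b                               # matrix([[14.]])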
Multiplying two ndarray vectors with * is element-wise with broadcasting, so a column vector times a 1-D vector yields a two-dimensional array:
b
Out[80]:
array([[ 1.],
       [ 1.],
       [ 1.]])

a
Out[81]: array([1, 2, 3])

a*b
Out[82]:
array([[ 1.,  2.,  3.],
       [ 1.,  2.,  3.],
       [ 1.,  2.,  3.]])

b*a
Out[83]:
array([[ 1.,  2.,  3.],
       [ 1.,  2.,  3.],
       [ 1.,  2.,  3.]])
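To get a true matrix/inner product from ndarrays instead of broadcasting, use np.dot (or the @ operator):

import numpy as np
a = np.array([1, 2, 3])
b = np.ones((3, 1))
np.dot(a, b)    # array([ 6.]) -- inner product, not broadcasting
a @ b           # same result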