Chapter 1: Algorithm Overview
For classification we want a function that accepts an input and predicts its category. The sigmoid function from mathematics serves this purpose. Its expression is

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

and its graph is an S-shaped curve that rises from 0 toward 1, crossing 0.5 at x = 0.
It can be clearly seen that when the input x is less than 0, the function value is less than 0.5 and the predicted class is 0; when the input x is greater than 0, the function value is greater than 0.5 and the predicted class is 1.
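As a quick illustration (a minimal sketch, separate from the article's code below), evaluating the sigmoid at a few points shows the 0.5 threshold at x = 0:

import numpy as np

def sigmoid(x):
    # Logistic function: maps any real number into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    # Predict class 1 when the output exceeds the 0.5 threshold
    label = 1 if sigmoid(x) > 0.5 else 0
    print("x = %5.1f  sigmoid(x) = %.4f  class = %d" % (x, sigmoid(x), label))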
1.1 Representation of the Prediction Function
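The prediction function applies the sigmoid to a linear combination of the input features. Writing θ for the weight vector (the weights array in the code below) and x for a feature vector with a leading constant 1 for the intercept, the standard form, consistent with the implementation in Chapter 2, is:

$$h_\theta(x) = \sigma(\theta^{\mathsf T} x) = \frac{1}{1 + e^{-\theta^{\mathsf T} x}}$$

The output is read as the probability $P(y = 1 \mid x; \theta)$; class 1 is predicted when it exceeds 0.5.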
1.2 Parameter Solution
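The weights are fitted by maximizing the log-likelihood of the training labels with gradient ascent. For m training examples, the objective and its gradient are:

$$\ell(\theta) = \sum_{i=1}^{m}\Bigl[\,y_i \log h_\theta(x_i) + (1 - y_i)\log\bigl(1 - h_\theta(x_i)\bigr)\Bigr]$$

$$\nabla_\theta \ell(\theta) = X^{\mathsf T}(y - h), \qquad \theta \leftarrow \theta + \alpha\, X^{\mathsf T}(y - h)$$

where α is the step size and h is the vector of predictions. This update is exactly the line weights = weights + alpha*dataMatrix.transpose()*error in the gradAscent function below.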
Chapter 2: Code Implementation
The sigmoid function computes the corresponding function value. gradAscent implements batch gradient ascent, meaning every example in the dataset is used in each iteration. stoGradAscent0 is stochastic gradient ascent: each update uses only a single example, which greatly reduces the cost per update. stoGradAscent1 improves on stochastic gradient ascent in two ways: the step size alpha decreases as the iterations proceed, and the example used for each update is selected at random.
from numpy import *
import matplotlib.pyplot as plt

def loadDataSet():
    # Each line of testSet.txt: x1 <tab> x2 <tab> label
    dataMat = []
    labelMat = []
    fr = open('testSet.txt')
    for line in fr.readlines():
        lineArr = line.strip('\n').split('\t')
        # Prepend a constant 1.0 so weights[0] acts as the intercept
        dataMat.append([1.0, float(lineArr[0]), float(lineArr[1])])
        labelMat.append(int(lineArr[2]))
    fr.close()
    return dataMat, labelMat

def sigmoid(inX):
    return 1.0 / (1 + exp(-inX))

def gradAscent(dataMatIn, classLabels):
    # Batch gradient ascent: every example contributes to each update
    dataMatrix = mat(dataMatIn)
    labelMat = mat(classLabels).transpose()
    m, n = shape(dataMatrix)
    alpha = 0.001
    maxCycles = 500
    weights = ones((n, 1))
    errors = []
    for k in range(maxCycles):
        h = sigmoid(dataMatrix * weights)
        error = labelMat - h
        errors.append(sum(error))
        weights = weights + alpha * dataMatrix.transpose() * error
    return weights, errors

def stoGradAscent0(dataMatIn, classLabels):
    # Stochastic gradient ascent: one example per update, one pass over the data
    m, n = shape(dataMatIn)
    alpha = 0.01
    weights = ones(n)
    for i in range(m):
        h = sigmoid(sum(dataMatIn[i] * weights))
        error = classLabels[i] - h
        weights = weights + alpha * error * dataMatIn[i]
    return weights

def stoGradAscent1(dataMatrix, classLabels, numIter=150):
    # Improved stochastic gradient ascent: decaying alpha and random example order
    m, n = shape(dataMatrix)
    weights = ones(n)
    for j in range(numIter):
        dataIndex = list(range(m))
        for i in range(m):
            alpha = 4 / (1.0 + j + i) + 0.01  # step size decreases over time
            # Draw a random example without replacement within this pass
            randIndex = int(random.uniform(0, len(dataIndex)))
            h = sigmoid(sum(dataMatrix[dataIndex[randIndex]] * weights))
            error = classLabels[dataIndex[randIndex]] - h
            weights = weights + alpha * error * dataMatrix[dataIndex[randIndex]]
            del(dataIndex[randIndex])
    return weights

def plotError(errs):
    k = len(errs)
    x = range(1, k + 1)
    plt.plot(x, errs, 'g--')
    plt.show()

def plotBestFit(wei):
    # Scatter the two classes and draw the decision boundary w0 + w1*x1 + w2*x2 = 0
    weights = wei.getA()
    dataMat, labelMat = loadDataSet()
    dataArr = array(dataMat)
    n = shape(dataArr)[0]
    xcord1 = []; ycord1 = []
    xcord2 = []; ycord2 = []
    for i in range(n):
        if int(labelMat[i]) == 1:
            xcord1.append(dataArr[i, 1])
            ycord1.append(dataArr[i, 2])
        else:
            xcord2.append(dataArr[i, 1])
            ycord2.append(dataArr[i, 2])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
    ax.scatter(xcord2, ycord2, s=30, c='green')
    x = arange(-3.0, 3.0, 0.1)
    y = (-weights[0] - weights[1] * x) / weights[2]
    ax.plot(x, y)
    plt.xlabel('x1')
    plt.ylabel('x2')
    plt.show()

def classifyVector(inX, weights):
    prob = sigmoid(sum(inX * weights))
    if prob > 0.5:
        return 1.0
    else:
        return 0.0

def colicTest(ftr, fte, numIter):
    # Train on ftr, evaluate the error rate on fte; each line holds 21 features plus a label
    frTrain = open(ftr)
    frTest = open(fte)
    trainingSet = []
    trainingLabels = []
    for line in frTrain.readlines():
        currLine = line.strip('\n').split('\t')
        lineArr = []
        for i in range(21):
            lineArr.append(float(currLine[i]))
        trainingSet.append(lineArr)
        trainingLabels.append(float(currLine[21]))
    frTrain.close()
    trainWeights = stoGradAscent1(array(trainingSet), trainingLabels, numIter)
    errorCount = 0
    numTestVec = 0.0
    for line in frTest.readlines():
        numTestVec += 1.0
        currLine = line.strip('\n').split('\t')
        lineArr = []
        for i in range(21):
            lineArr.append(float(currLine[i]))
        # Parse via float first so labels like "1" and "1.0" both work
        if int(classifyVector(array(lineArr), trainWeights)) != int(float(currLine[21])):
            errorCount += 1
    frTest.close()
    errorRate = float(errorCount) / numTestVec
    return errorRate

def multiTest(ftr, fte, numT, numIter):
    errors = []
    for k in range(numT):
        error = colicTest(ftr, fte, numIter)
        errors.append(error)
    print("There are " + str(len(errors)) + " tests with " + str(numIter) + " iterations in all!")
    for i in range(numT):
        print("The " + str(i + 1) + "th" + " testError is: " + str(errors[i]))
    print("Average testError: ", float(sum(errors)) / len(errors))

'''
data, labels = loadDataSet()
weights0 = stoGradAscent0(array(data), labels)
weights, errors = gradAscent(data, labels)
weights1 = stoGradAscent1(array(data), labels, 500)
print(weights)
plotBestFit(weights)
print(weights0)
weights00 = []
for w in weights0:
    weights00.append([w])
plotBestFit(mat(weights00))
print(weights1)
weights11 = []
for w in weights1:
    weights11.append([w])
plotBestFit(mat(weights11))
'''

multiTest(r"horseColicTraining.txt", r"horseColicTest.txt", 10, 500)
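For reference, loadDataSet expects testSet.txt to hold one sample per line: two tab-separated feature values followed by an integer class label. The data files themselves are not reproduced in this article; the lines below only illustrate the expected format:

-0.017612	14.053064	0
-1.395634	4.662541	1
0.406704	7.067335	1

Similarly, horseColicTraining.txt and horseColicTest.txt each hold 21 tab-separated feature values plus a final label per line.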
Summary
That concludes this article on a classic machine learning algorithm: a full code explanation of logistic regression in Python. We hope it is helpful to everyone. Interested readers can also refer to these articles on this site:
Implementation of the k-Means Clustering Algorithm in Python in Detail
Python Implementation of Particle Swarm Optimization (PSO) in Detail
Python Implementation of Ant Colony Algorithm in Detail
If there are any shortcomings, please leave a message to point them out. Thank you for your support of this site!