
NumPy Statistical Functions

numpy.amin() and numpy.amax()

numpy.amin() is used to calculate the minimum value of the elements along the specified axis in the array.
numpy.amax() is used to calculate the maximum value of the elements along the specified axis in the array.

 import numpy as np 
 a = np.array([[3,7,5],[8,4,3],[2,4,9]]) 
 print('Our array is:', a)
 print('Call the amin() function:', np.amin(a,1))
 print('Call the amin() function again:', np.amin(a, 0))
 print('Call the amax() function:', np.amax(a))
 print('Call the amax() function again:', np.amax(a, axis=0))

The output result is:

 Our array is: [[3 7 5]
  [8 4 3]
  [2 4 9]]
 Call the amin() function: [3 3 2]
 Call the amin() function again: [2 4 3]
 Call the amax() function: 9
 Call the amax() function again: [8 7 9]
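
As a small cross-check of the axis convention, the sketch below repeats the reductions on the same array and compares them with the ndarray methods min() and max(). The keepdims argument does not appear in the example above and is included only as an illustration.

```python
import numpy as np

a = np.array([[3, 7, 5], [8, 4, 3], [2, 4, 9]])

# axis=0 collapses the rows, giving one result per column
col_min = np.amin(a, axis=0)
# axis=1 collapses the columns, giving one result per row
row_min = np.amin(a, axis=1)

# The array methods return the same values as the functions
same_as_method = bool((col_min == a.min(axis=0)).all())

# keepdims=True keeps the reduced axis with length 1
row_max = np.amax(a, axis=1, keepdims=True)

print(col_min)         # [2 4 3]
print(row_min)         # [3 3 2]
print(same_as_method)  # True
print(row_max.shape)   # (3, 1)
```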

numpy.ptp()

The numpy.ptp() function calculates the range of the elements in the array, that is, the difference between the maximum and minimum values (maximum value - minimum value).

 import numpy as np 
 a = np.array([[3,7,5],[8,4,3],[2,4,9]]) 
 print('Call the ptp() function:', np.ptp(a))
 print('Along axis 1 Call the ptp() function:', np.ptp(a, axis=1))
 print('Call the ptp() function along axis 0:', np.ptp(a, axis=0))

The output result is:

 Call the ptp() function: 7
 Along axis 1 Call the ptp() function: [4 5 7]
 Call the ptp() function along axis 0: [6 3 6]
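
To make the "maximum value - minimum value" definition concrete, the following minimal sketch computes the same row-wise result two ways. The variable names are chosen here for illustration only.

```python
import numpy as np

a = np.array([[3, 7, 5], [8, 4, 3], [2, 4, 9]])

# Peak-to-peak along each row, computed with ptp()
ptp_rows = np.ptp(a, axis=1)
# The same quantity computed by hand as max - min
manual_rows = a.max(axis=1) - a.min(axis=1)

print(ptp_rows)     # [4 5 7]
print(manual_rows)  # [4 5 7]
```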

numpy.percentile()

A percentile is a statistical measure: it is the value below which a given percentage of the observations fall. The numpy.percentile() function accepts the following parameters.

numpy.percentile(a, q, axis)

Parameter description:

a: the input array
q: the percentile to compute, between 0 and 100
axis: the axis along which to compute the percentile

First, a precise definition of a percentile:

The p-th percentile is a value such that at least p% of the data items are less than or equal to it, and at least (100-p)% of the data items are greater than or equal to it.
For example, college entrance examination scores are often reported as percentiles. Suppose a candidate's raw score in the Chinese section of the entrance examination is 54. On its own, this score says little about how the candidate performed relative to the other students who took the same exam. However, if the raw score of 54 corresponds exactly to the 70th percentile, we know that approximately 70% of the students scored lower than this candidate and about 30% scored higher.

Here, p = 70.

 import numpy as np 
 a = np.array([[10, 7, 4], [3, 2, 1]])
 # The 50th percentile, which is the median of all elements of a
 print('Calling the percentile() function:', np.percentile(a, 50)) 
 # axis=0: compute the percentile down each column
 print(np.percentile(a, 50, axis=0)) 
 # axis=1: compute the percentile across each row
 print(np.percentile(a, 50, axis=1)) 
 # keepdims=True keeps the reduced axis, so the result stays two-dimensional
 print(np.percentile(a, 50, axis=1, keepdims=True))

The output result is:

 Calling the percentile() function: 3.5
 [6.5 4.5 2.5]
 [7. 2.]
 [[7.]
  [2.]]
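
When the requested percentile falls between two data points, numpy.percentile() interpolates linearly between them by default. The sketch below reproduces that calculation by hand; percentile_linear is a hypothetical helper written for this illustration, not a NumPy function.

```python
import math
import numpy as np

def percentile_linear(values, q):
    """Hypothetical helper: q-th percentile by linear interpolation."""
    s = sorted(values)
    h = (len(s) - 1) * q / 100.0   # fractional index into the sorted data
    lo = math.floor(h)             # index of the data point at or below h
    hi = min(lo + 1, len(s) - 1)   # index of the next data point
    return s[lo] + (h - lo) * (s[hi] - s[lo])

data = [10, 7, 4, 3, 2, 1]
manual = percentile_linear(data, 50)

print(manual)                   # 3.5
print(np.percentile(data, 50))  # 3.5
```

For q = 50 the sorted data is [1, 2, 3, 4, 7, 10], the fractional index is 2.5, and the result lies halfway between 3 and 4.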

numpy.median()

The numpy.median() function is used to calculate the median (the middle value of the sorted elements) of the array a.

 import numpy as np 
 a = np.array([[30,65,70],[80,95,10],[50,90,60]]) 
 print("Call the 'median()' function:", np.median(a))
 print("Call the 'median()' function along axis 0:", np.median(a, axis = 0))
 print("Along axis 1 Call the 'median()' function:", np.median(a, axis = 1))

The output result is:

 Call the 'median()' function: 65.0
 Call the 'median()' function along axis 0: [50. 90. 60.]
 Along axis 1 Call the 'median()' function: [65. 80. 60.]
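
The example above only uses rows and columns with an odd number of elements. As a brief illustration with made-up values, the sketch below shows that for an even number of elements the median is the mean of the two middle values, which matches the 50th percentile.

```python
import numpy as np

odd = np.array([30, 65, 70])
even = np.array([30, 65, 70, 80])

median_odd = np.median(odd)    # the single middle value
median_even = np.median(even)  # mean of the two middle values, 65 and 70

print(median_odd)               # 65.0
print(median_even)              # 67.5
print(np.percentile(even, 50))  # 67.5
```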

numpy.mean()

The numpy.mean() function returns the arithmetic mean of the elements in the array. If an axis is provided, the mean is calculated along that axis.
The arithmetic mean is the sum of the elements along the axis divided by the number of elements.

 import numpy as np 
 a = np.array([[1,2,3],[3,4,5],[4,5,6]]) 
 print("Call the 'mean()' function:", np.mean(a))
 print("Call the 'mean()' function along axis 0:", np.mean(a, axis = 0))
 print("Along axis 1 Call the 'mean()' function:", np.mean(a, axis = 1))

The output result is:

 Call the 'mean()' function: 3.6666666666666665
 Call the 'mean()' function along axis 0: [2.66666667 3.66666667 4.66666667]
 Along axis 1 Call the 'mean()' function: [2. 4. 5.]
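
The definition "sum divided by the number of elements" can be checked directly. The following sketch, with variable names chosen for illustration, computes the mean by hand for the whole array and for each column.

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 5], [4, 5, 6]])

# Mean of all elements: total sum (33) divided by element count (9)
manual_all = a.sum() / a.size
# Mean of each column: column sums divided by the number of rows
manual_cols = a.sum(axis=0) / a.shape[0]

print(np.isclose(manual_all, np.mean(a)))            # True
print(np.allclose(manual_cols, np.mean(a, axis=0)))  # True
```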

numpy.average()

The numpy.average() function calculates the weighted average of the elements in the array based on the weights given in another array.
This function can accept an axis parameter. If no axis is specified, the array is flattened.
The weighted average multiplies each value by its corresponding weight, sums the products, and then divides by the sum of the weights.
Consider the array [1,2,3,4] with the corresponding weights [4,3,2,1]. The weighted average is the sum of the products of corresponding elements divided by the sum of the weights:

Weighted average = (1*4 + 2*3 + 3*2 + 4*1) / (4 + 3 + 2 + 1) = 20 / 10 = 2.0

 import numpy as np 
 a = np.array([1,2,3,4]) 
 # When no weights are specified, average() is equivalent to mean()
 print("Call the 'average()' function:", np.average(a))
 wts = np.array([4,3,2,1]) 
 print("Again call the 'average()' function:", np.average(a, weights = wts))
 # If the 'returned' parameter is set to True, the sum of weights is returned along with the average
 print('Sum of weights:', np.average([1,2,3,4], weights = [4,3,2,1], returned = True))

The output result is:

 Call the 'average()' function: 2.5
 Again call the 'average()' function: 2.0
 Sum of weights: (2.0, 10.0)
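
The weighted-average arithmetic can also be written out with plain array operations. This is a minimal sketch using the same values and weights, with the name manual chosen for illustration.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
wts = np.array([4, 3, 2, 1])

# Sum of value * weight (20) divided by the sum of the weights (10)
manual = (a * wts).sum() / wts.sum()

print(manual)                      # 2.0
print(np.average(a, weights=wts))  # 2.0
```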

In multi-dimensional arrays, you can specify the axis for calculation.

 import numpy as np 
 a = np.arange(6).reshape(3,2) 
 wt = np.array([3,5]) 
 print ('Modified array:', np.average(a, axis = 1, weights = wt))
 print ('Modified array:', np.average(a, axis = 1, weights = wt, returned = True))

The output result is:

 Modified array: [0.625 2.625 4.625]
 Modified array: (array([0.625, 2.625, 4.625]), array([8., 8., 8.]))
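
With axis = 1, the two weights are applied to the two elements of every row. The sketch below reproduces the row-wise result by hand. It relies on NumPy broadcasting to multiply each row by the weights, and the name manual is used only for illustration.

```python
import numpy as np

a = np.arange(6).reshape(3, 2)   # [[0 1], [2 3], [4 5]]
wt = np.array([3, 5])

# Each row is multiplied element-wise by wt, summed, and divided by 3 + 5
manual = (a * wt).sum(axis=1) / wt.sum()

print(manual)                              # [0.625 2.625 4.625]
print(np.average(a, axis=1, weights=wt))   # [0.625 2.625 4.625]
```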

Standard Deviation

The standard deviation measures how widely the values in a data set are dispersed around their mean.
The standard deviation is the square root of the variance.
The standard deviation formula is as follows:

 std = sqrt(mean((x - x.mean())**2))

If the array is [1,2,3,4], then its mean is 2.5. The squared differences from the mean are therefore [2.25, 0.25, 0.25, 2.25]. Their sum is 5, so their mean is 5/4, and the standard deviation is sqrt(5/4), which is 1.1180339887498949.

import numpy as np 
print (np.std([1,2,3,4]))

The output result is:

1.1180339887498949
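
The formula can be followed step by step in code. This is a minimal sketch of the calculation described above, with intermediate variable names chosen for illustration.

```python
import numpy as np

x = np.array([1, 2, 3, 4])

deviations = x - x.mean()             # differences from the mean 2.5
squared = deviations ** 2             # [2.25 0.25 0.25 2.25]
manual_std = np.sqrt(squared.mean())  # sqrt(5 / 4)

print(squared)                             # [2.25 0.25 0.25 2.25]
print(np.isclose(manual_std, np.std(x)))   # True
```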

Variance

The variance is the mean of the squared differences between each value and the mean of all values, that is, mean((x - x.mean())**2). By default, numpy.var() divides by the number of elements N, which gives the population variance.
In other words, the standard deviation is the square root of the variance.

import numpy as np 
print (np.var([1,2,3,4]))

The output result is:

1.25
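
By default numpy.var() divides by the number of elements N. The ddof argument, which is not used in the example above, changes the divisor to N - ddof, so ddof=1 gives the usual unbiased sample variance. The sketch below compares the two and confirms that the variance is the square of the standard deviation.

```python
import numpy as np

x = np.array([1, 2, 3, 4])

population_var = np.var(x)       # divides by N:     5 / 4
sample_var = np.var(x, ddof=1)   # divides by N - 1: 5 / 3

print(population_var)                              # 1.25
print(np.isclose(sample_var, 5 / 3))               # True
print(np.isclose(np.std(x) ** 2, population_var))  # True
```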