NumPy provides two basic kinds of objects: ndarray and ufunc. "Ufunc" is short for universal function, a function that operates on each element of an array.
Many ufuncs are implemented in C, so they compute very quickly.
Ufuncs are also more flexible than the functions in Python's math module. The math functions generally accept only scalars, whereas NumPy functions also accept vectors and matrices. Operating on whole vectors or matrices avoids explicit loops, which is very important in machine learning and deep learning.
Ufuncs are how NumPy implements vectorization, which is much faster than iterating over elements in Python.
They also support broadcasting and provide methods such as reduce() and accumulate(), which are very helpful for calculations.
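A minimal sketch of broadcasting and the reduce()/accumulate() methods, using np.add as the example ufunc. The arrays are illustrative values chosen here, not taken from the text.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
m = np.array([[1, 2, 3],
              [4, 5, 6]])

# Broadcasting: the scalar 10 is stretched to match the shape of `a`
print(np.add(a, 10))

# Broadcasting across dimensions: the row [10, 20, 30] is added to every row of `m`
print(np.add(m, np.array([10, 20, 30])))

# reduce(): apply the ufunc repeatedly along an axis (here, the sum of all elements)
print(np.add.reduce(a))

# accumulate(): keep every intermediate result (a running sum)
print(np.add.accumulate(a))

# Other ufuncs have the same methods, e.g. a running product
print(np.multiply.accumulate(a))
```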
Ufuncs also accept additional keyword arguments, such as:
- where: a boolean array or condition that selects the elements the operation is applied to; elsewhere the output array keeps its previous values.
- dtype: the data type of the returned elements.
- out: an output array into which the result is written.
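A short sketch of the three keyword arguments on np.sqrt and np.add. The input arrays are illustrative assumptions. Note that `where` is combined with a preallocated `out` array: without `out`, the positions where the condition is False would contain uninitialized values.

```python
import numpy as np

a = np.array([1.0, 4.0, 9.0, 16.0])

# out: write the result into a preallocated array instead of allocating a new one
result = np.zeros_like(a)
np.sqrt(a, out=result)            # result now holds 1, 2, 3, 4

# where: compute only where the condition is True; other slots of `out` are left alone
masked = np.full_like(a, -1.0)
np.sqrt(a, out=masked, where=a > 5)   # masked now holds -1, -1, 3, 4

# dtype: request the type used for the computation and the returned elements
ints = np.array([1, 2, 3])
summed = np.add(ints, ints, dtype=np.float64)

print(result, masked, summed, summed.dtype)
```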
Commonly used NumPy functions (only some of them, such as sqrt(), sin(), cos(), abs(), log(), log2() and exp(), are ufuncs; the others are regular NumPy functions):

| Function | Usage |
|---|---|
| sqrt() | Element-wise square root |
| sin(), cos() | Trigonometric functions |
| abs() | Element-wise absolute value |
| dot() | Dot product / matrix multiplication |
| log(), log10(), log2() | Logarithms (natural, base 10, base 2) |
| exp() | Exponential function |
| cumsum(), cumprod() | Cumulative sum and product |
| sum() | Sum of array elements |
| mean() | Mean |
| median() | Median |
| std() | Standard deviation |
| var() | Variance |
| corrcoef() | Correlation coefficients |
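A quick sketch calling several functions from the table on a small illustrative array (the values are chosen here, not taken from the text). The isinstance() checks at the end show the distinction between real ufuncs and ordinary functions: np.sqrt is an instance of np.ufunc, while np.mean is not.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])

roots = np.sqrt(data)                 # element-wise square root
running_sum = np.cumsum(data)         # 1, 3, 6, 10
running_prod = np.cumprod(data)       # 1, 2, 6, 24
avg, mid = np.mean(data), np.median(data)        # both 2.5
spread, variance = np.std(data), np.var(data)    # population statistics (ddof=0)
corr = np.corrcoef(data, 2 * data)    # 2x2 matrix; entries are 1.0 up to rounding

# Distinguish true ufuncs from ordinary NumPy functions
print(isinstance(np.sqrt, np.ufunc))  # True
print(isinstance(np.mean, np.ufunc))  # False
print(running_sum, running_prod, avg, mid, variance)
```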
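```python
import time
import math
import numpy as np

# time.clock() was removed in Python 3.8; time.perf_counter() is used instead
x = [i * 0.001 for i in range(1000000)]

# Python loop: call math.sin on one element at a time
start = time.perf_counter()
for i, t in enumerate(x):
    x[i] = math.sin(t)
print("math.sin:", time.perf_counter() - start)

# Vectorized: a single np.sin ufunc call on the whole array
x = np.array([i * 0.001 for i in range(1000000)])
start = time.perf_counter()
np.sin(x)
print("numpy.sin:", time.perf_counter() - start)
```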
Running Result:
math.sin: 0.5169950000000005
numpy.sin: 0.05381199999999886
As the timings show, numpy.sin is nearly 10 times faster than the math.sin loop.
Converting iterative statements to vector-based operations is called vectorization.
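Vectorized code is faster because the per-element loop runs in compiled C code rather than in the Python interpreter, and modern CPUs are optimized for this kind of operation.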
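To make the idea concrete, here is a small sketch that evaluates the same polynomial, 3*x**2 + 2*x + 1, twice: once with an explicit Python loop and once as a single array expression built from ufuncs. The polynomial and the array are illustrative assumptions, not from the text.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 5)

# Iterative version: one Python-level computation per element
loop_result = np.empty_like(x)
for i in range(len(x)):
    loop_result[i] = 3 * x[i] ** 2 + 2 * x[i] + 1

# Vectorized version: the whole array is handled by ufuncs
# (multiply, power, add) whose inner loops run in compiled code
vec_result = 3 * x ** 2 + 2 * x + 1

print(np.allclose(loop_result, vec_result))  # True
```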
Example: add the corresponding elements of two lists:
list 1: [1, 2, 3, 4]
list 2: [4, 5, 6, 7]
One approach is to iterate over both lists and add each pair of elements.
Without ufuncs, we can use Python's built-in zip() function:
```python
x = [1, 2, 3, 4]
y = [4, 5, 6, 7]
z = []
# Pair up corresponding elements and add them one by one
for i, j in zip(x, y):
    z.append(i + j)
print(z)
```
Running Result:
[5, 7, 9, 11]
NumPy provides a ufunc, add(x, y), that produces the same result without an explicit loop:
```python
import numpy as np

x = [1, 2, 3, 4]
y = [4, 5, 6, 7]
# np.add converts the lists to arrays and adds them element-wise
z = np.add(x, y)
print(z)
```
Running Result:
[ 5  7  9 11]
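As a further illustration, a sketch showing that for NumPy arrays the + operator is handled by the same np.add ufunc, and that the result is an ndarray; tolist() converts it back to a plain Python list if one is needed.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([4, 5, 6, 7])

z1 = np.add(x, y)   # explicit ufunc call
z2 = x + y          # operator form, dispatched to np.add for ndarrays

print(np.array_equal(z1, z2))  # True
print(type(z1).__name__)       # ndarray
print(z1.tolist())             # [5, 7, 9, 11]
```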
Making full use of NumPy's built-in functions to vectorize computation can greatly improve execution speed. Many of these functions use SIMD instructions internally where the CPU supports them. As the following example shows, the vectorized version is much faster than computing with a loop. A GPU could speed things up even more, but NumPy does not support GPUs.
See the following code:
```python
import time
import numpy as np

x1 = np.random.rand(1000000)
x2 = np.random.rand(1000000)

# Using a loop to calculate the vector dot product
tic = time.process_time()
dot = 0
for i in range(len(x1)):
    dot += x1[i] * x2[i]
toc = time.process_time()
print("dot = " + str(dot) + "\nfor loop-----Calculation Time = " + str(1000 * (toc - tic)) + "ms")

# Using the NumPy function to calculate the dot product
tic = time.process_time()
dot = np.dot(x1, x2)
toc = time.process_time()
print("dot = " + str(dot) + "\nVector Version---- Calculation Time = " + str(1000 * (toc - tic)) + "ms")
```
Running Result:
dot = 250215.601995
for loop-----Calculation Time = 798.3389819999998ms
dot = 250215.601995
Vector Version---- Calculation Time = 1.885051999999554ms
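Because the loop and np.dot add the terms in a different order, their results may differ in the last few digits, so it is safer to compare them with a tolerance. Single process_time() measurements are also noisy. Below is a sketch of a more careful comparison, assuming the standard-library timeit module and a smaller array (100,000 elements, chosen here so the loop stays quick); the helper name loop_dot is hypothetical, and the printed timings will vary by machine.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.random(100_000)
x2 = rng.random(100_000)

def loop_dot(a, b):
    # Hypothetical helper: explicit Python loop, one multiply-add per iteration
    total = 0.0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total

# Summation order differs, so compare with a tolerance rather than ==
print(np.isclose(loop_dot(x1, x2), np.dot(x1, x2)))

# timeit runs each callable `number` times and returns the total seconds
t_loop = timeit.timeit(lambda: loop_dot(x1, x2), number=3)
t_vec = timeit.timeit(lambda: np.dot(x1, x2), number=3)
print(f"loop:   {t_loop / 3 * 1000:.2f} ms per call")
print(f"np.dot: {t_vec / 3 * 1000:.4f} ms per call")
```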