Pandas Basic Methods Examples
So far, we have learned about the three Pandas data structures and how to create them. Because of its importance in real-time data processing, we will mainly focus on the DataFrame object, and also discuss a few other data structures.
Attribute/Method | Description |
axes | Returns a list of the row axis labels. |
dtype | Returns the dtype of the object. |
empty | Returns True if the Series is empty. |
ndim | Returns the number of dimensions of the underlying data; by definition, 1 for a Series. |
size | Returns the number of elements in the underlying data. |
values | Returns the Series as an ndarray. |
head() | Returns the first n rows. |
tail() | Returns the last n rows. |
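Let's now create a Series and look at each attribute and method in the table above.
import pandas as pd
import numpy as np

# Create a Series of 4 random numbers
# (np.random.randn is unseeded, so your values will differ)
s = pd.Series(np.random.randn(4))
print(s)
Running Result:
0    0.967853
1   -0.148368
2   -1.395906
3   -1.758394
dtype: float64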
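The table above also lists dtype, which none of the examples in this section demonstrates. A minimal sketch: since np.random.randn produces floating-point numbers, the Series should report float64.

```python
import pandas as pd
import numpy as np

# Create a Series of 4 random floats (values differ on every run)
s = pd.Series(np.random.randn(4))

# dtype reports the data type shared by all values in the Series
print("The dtype of the object is:")
print(s.dtype)  # float64
```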
The axes attribute returns the list of the Series' row axis labels.
import pandas as pd
import numpy as np

# Create a Series of 4 random numbers
s = pd.Series(np.random.randn(4))
print("The axes are:")
print(s.axes)
Running Result:
The axes are:
[RangeIndex(start=0, stop=4, step=1)]
The result above is a compact representation of the labels 0 to 3, i.e. [0, 1, 2, 3]; the stop value 4 is not included.
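To see the labels that the compact RangeIndex stands for, you can expand it into a plain list. A short sketch; s.axes[0] is the single index of the Series.

```python
import pandas as pd
import numpy as np

s = pd.Series(np.random.randn(4))

# A Series has exactly one axis, so s.axes holds a single Index object
index = s.axes[0]

# Expanding the RangeIndex shows the actual labels (stop=4 is excluded)
labels = list(index)
print(labels)  # [0, 1, 2, 3]
```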
The empty attribute returns a boolean value indicating whether the object is empty; True means the object is empty.
import pandas as pd
import numpy as np

# Create a Series of 4 random numbers
s = pd.Series(np.random.randn(4))
print("Is the object empty?")
print(s.empty)
Running Result:
Is the object empty?
False
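The example above only shows the False case. A small sketch of when empty is True: it depends on the number of elements, not on their values, so a Series holding only NaN is not considered empty.

```python
import pandas as pd
import numpy as np

empty_s = pd.Series(dtype="float64")  # no elements at all
nan_s = pd.Series([np.nan, np.nan])   # two elements, both missing

print(empty_s.empty)  # True
print(nan_s.empty)    # False: NaN values still count as elements
```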
The ndim attribute returns the number of dimensions of the object. By definition, a Series is a 1D data structure, so it returns 1.
import pandas as pd
import numpy as np

# Create a Series of 4 random numbers
s = pd.Series(np.random.randn(4))
print(s)
print("The dimensions of the object:")
print(s.ndim)
Running Result:
0    0.175898
1    0.166197
2   -0.609712
3   -1.377000
dtype: float64
The dimensions of the object:
1
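ndim always equals the length of the object's shape tuple. A quick check for a Series, whose shape attribute (not listed in the Series table, but available on every Series) is a one-element tuple:

```python
import pandas as pd
import numpy as np

s = pd.Series(np.random.randn(4))

# A Series' shape is a one-element tuple, so ndim == len(shape) == 1
print(s.shape)       # (4,)
print(len(s.shape))  # 1
print(s.ndim)        # 1
```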
The size attribute returns the size (length) of the Series.
import pandas as pd
import numpy as np

# Create a Series of 2 random numbers
s = pd.Series(np.random.randn(2))
print(s)
print("The size of the object:")
print(s.size)
Running Result:
0    3.078058
1   -1.207803
dtype: float64
The size of the object:
2
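size counts every element, including missing values. If you only want the non-missing values, Series.count() is the usual alternative. A minimal comparison, using a hypothetical Series with one NaN:

```python
import pandas as pd
import numpy as np

s = pd.Series([1.5, np.nan, 3.0])

print(s.size)     # 3: all elements, NaN included
print(len(s))     # 3: same as size for a Series
print(s.count())  # 2: non-missing values only
```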
The values attribute returns the actual data in the Series as an array.
import pandas as pd
import numpy as np

# Create a Series of 4 random numbers
s = pd.Series(np.random.randn(4))
print(s)
print("The actual data series is:")
print(s.values)
Running Result:
0    1.787373
1   -0.605159
2    0.180477
3   -0.140922
dtype: float64
The actual data series is:
[ 1.78737302 -0.60515881  0.18047664 -0.1409218 ]
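The pandas documentation recommends Series.to_numpy() (or Series.array) over .values in newer code. A sketch showing that, for a float Series like this one, both give the same NumPy array:

```python
import pandas as pd
import numpy as np

s = pd.Series(np.random.randn(4))

arr_values = s.values     # attribute used in the example above
arr_numpy = s.to_numpy()  # explicit method form

print(type(arr_numpy))                        # <class 'numpy.ndarray'>
print(np.array_equal(arr_values, arr_numpy))  # True
```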
To view a small sample of a Series or DataFrame object, use the head() and tail() methods.
head() returns the first n rows (observe the index values). By default, 5 elements are returned, but you can pass a custom number.
import pandas as pd
import numpy as np

# Create a Series of 4 random numbers
s = pd.Series(np.random.randn(4))
print("The original series is:")
print(s)
print("The first two rows of the data series:")
print(s.head(2))
Running Result:
The original series is:
0    0.720876
1   -0.765898
2    0.479221
3   -0.139547
dtype: float64
The first two rows of the data series:
0    0.720876
1   -0.765898
dtype: float64
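With only four elements, the default of 5 is not visible above. A sketch using a longer Series of the integers 0 to 9; head() also accepts a negative n, which returns all elements except the last |n|.

```python
import pandas as pd

s = pd.Series(range(10))  # values 0..9

print(s.head().tolist())    # [0, 1, 2, 3, 4]: default n=5
print(s.head(3).tolist())   # [0, 1, 2]
print(s.head(-3).tolist())  # [0, 1, 2, 3, 4, 5, 6]: all but the last 3
```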
tail() returns the last n rows (observe the index values). By default, 5 elements are returned, but you can pass a custom number.
import pandas as pd
import numpy as np

# Create a Series of 4 random numbers
s = pd.Series(np.random.randn(4))
print("The original series is:")
print(s)
print("The last two rows of the data series are:")
print(s.tail(2))
Running Result:
The original series is:
0   -0.655091
1   -0.881407
2   -0.608592
3   -2.341413
dtype: float64
The last two rows of the data series are:
2   -0.608592
3   -2.341413
dtype: float64
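tail() keeps the original index labels of the rows it returns, which is why the output above starts at label 2. A sketch with hypothetical letter labels; a negative n returns all elements except the first |n|.

```python
import pandas as pd

s = pd.Series(range(10), index=list("abcdefghij"))

last = s.tail(3)
print(last.index.tolist())  # ['h', 'i', 'j']: original labels are preserved
print(s.tail().tolist())    # [5, 6, 7, 8, 9]: default n=5
print(s.tail(-7).tolist())  # [7, 8, 9]: all but the first 7
```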
Let's now look at the basic functionality of the DataFrame. The table below lists the important attributes and methods that support it.
Attribute/Method | Description |
T | Transposes rows and columns. |
axes | Returns a list with the row axis labels and the column axis labels as its only members. |
dtypes | Returns the dtypes in this object. |
empty | True if the DataFrame is entirely empty (no items), i.e. if any of the axes has a length of 0. |
ndim | Number of axes / array dimensions. |
shape | Returns a tuple representing the dimensions of the DataFrame. |
size | Number of elements in the DataFrame. |
values | NumPy representation of the DataFrame. |
head() | Returns the first n rows. |
tail() | Returns the last n rows. |
Let's now create a DataFrame and see how each of the attributes and methods above works.
import pandas as pd
import numpy as np

# Create a dictionary of Series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}

# Create a DataFrame
df = pd.DataFrame(d)
print("Our DataFrame is:")
print(df)
Running Result:
Our DataFrame is:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
2  Ricky   25    3.98
3    Vin   23    2.56
4  Steve   30    3.20
5  Smith   29    4.60
6   Jack   23    3.80
The T attribute returns the transpose of the DataFrame: rows and columns are swapped.
import pandas as pd
import numpy as np

# Create a dictionary of Series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}

# Create a DataFrame
df = pd.DataFrame(d)
print("The transpose of the DataFrame is:")
print(df.T)
Running Result:
The transpose of the DataFrame is:
           0      1      2     3      4      5     6
Name     Tom  James  Ricky   Vin  Steve  Smith  Jack
Age       25     26     25    23     30     29    23
Rating  4.23   3.24   3.98  2.56    3.2    4.6   3.8
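Because each column of the transpose mixes strings and numbers, pandas stores the transposed columns with the generic object dtype. A sketch using a smaller, hypothetical DataFrame with 3 rows and 2 columns so the swap is easy to see:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'James', 'Ricky'],
                   'Age': [25, 26, 25]})
t = df.T

print(df.shape)  # (3, 2)
print(t.shape)   # (2, 3): rows and columns swapped
# Each transposed column holds a string and an integer, so its dtype is object
print(all(dtype == object for dtype in t.dtypes))  # True
```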
The axes attribute returns a list of the row labels and the column labels.
import pandas as pd
import numpy as np

# Create a dictionary of Series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}

# Create a DataFrame
df = pd.DataFrame(d)
print("The row labels and column labels are:")
print(df.axes)
Running Result:
The row labels and column labels are:
[RangeIndex(start=0, stop=7, step=1), Index(['Name', 'Age', 'Rating'], dtype='object')]
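df.axes is a two-element list, so it can be unpacked; its members are the same labels exposed as df.index and df.columns. A minimal sketch with a hypothetical two-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'James'], 'Age': [25, 26]})

rows, cols = df.axes  # [row labels, column labels]
print(rows.equals(df.index))    # True
print(cols.equals(df.columns))  # True
print(list(cols))               # ['Name', 'Age']
```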
The dtypes attribute returns the data type of each column.
import pandas as pd
import numpy as np

# Create a dictionary of Series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}

# Create a DataFrame
df = pd.DataFrame(d)
print("The data types of each column are as follows:")
print(df.dtypes)
Running Result:
The data types of each column are as follows:
Name       object
Age         int64
Rating    float64
dtype: object
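dtypes is itself a Series indexed by column name, so you can look up a single column's type. The related DataFrame.select_dtypes() method uses the same information to pick columns by type; a short sketch with a hypothetical two-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'James'],
                   'Age': [25, 26],
                   'Rating': [4.23, 3.24]})

print(df.dtypes['Rating'])  # float64
# Keep only the numeric columns (integers and floats)
numeric = df.select_dtypes(include='number')
print(numeric.columns.tolist())  # ['Age', 'Rating']
```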
The empty attribute returns a boolean value indicating whether the object is empty; True means the object is empty.
import pandas as pd
import numpy as np

# Create a dictionary of Series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}

# Create a DataFrame
df = pd.DataFrame(d)
print("Is the object empty?")
print(df.empty)
Running Result:
Is the object empty?
False
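A DataFrame is empty when any of its axes has length 0, as the table above states; missing values alone do not make it empty. A sketch following the pattern used in the pandas documentation for DataFrame.empty:

```python
import pandas as pd
import numpy as np

no_rows = pd.DataFrame(columns=['Name', 'Age'])  # columns but zero rows
all_nan = pd.DataFrame({'Name': [np.nan], 'Age': [np.nan]})

print(no_rows.empty)           # True: the row axis has length 0
print(all_nan.empty)           # False: one row of NaN still counts
print(all_nan.dropna().empty)  # True: dropping the all-NaN row leaves 0 rows
```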
The ndim attribute returns the number of dimensions of the object. By definition, a DataFrame is a 2D object, so it returns 2.
import pandas as pd
import numpy as np

# Create a dictionary of Series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}

# Create a DataFrame
df = pd.DataFrame(d)
print("Our object is:")
print(df)
print("The dimension of the object is:")
print(df.ndim)
Running Result:
Our object is:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
2  Ricky   25    3.98
3    Vin   23    2.56
4  Steve   30    3.20
5  Smith   29    4.60
6   Jack   23    3.80
The dimension of the object is:
2
The shape attribute returns a tuple representing the dimensions of the DataFrame: in the tuple (a, b), a is the number of rows and b is the number of columns.
import pandas as pd
import numpy as np

# Create a dictionary of Series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}

# Create a DataFrame
df = pd.DataFrame(d)
print("Our object is:")
print(df)
print("The shape of the object is:")
print(df.shape)
Running Result:
Our object is:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
2  Ricky   25    3.98
3    Vin   23    2.56
4  Steve   30    3.20
5  Smith   29    4.60
6   Jack   23    3.80
The shape of the object is:
(7, 3)
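Because shape is a tuple, it can be unpacked into separate row and column counts, which match len(df) and len(df.columns). A minimal sketch with a hypothetical three-row, two-column DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'James', 'Ricky'],
                   'Age': [25, 26, 25]})

rows, cols = df.shape
print(rows, cols)               # 3 2
print(rows == len(df))          # True
print(cols == len(df.columns))  # True
```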
The size attribute returns the number of elements in the DataFrame.
import pandas as pd
import numpy as np

# Create a dictionary of Series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}

# Create a DataFrame
df = pd.DataFrame(d)
print("Our object is:")
print(df)
print("The total number of elements in our object is:")
print(df.size)
Running Result:
Our object is:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
2  Ricky   25    3.98
3    Vin   23    2.56
4  Steve   30    3.20
5  Smith   29    4.60
6   Jack   23    3.80
The total number of elements in our object is:
21
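size is the product of the two numbers in shape (7 rows times 3 columns gives 21 above), and it counts missing cells too; DataFrame.count() counts only non-missing values per column. A sketch with a hypothetical DataFrame containing one missing name:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'James', None],
                   'Age': [25, 26, 25]})

rows, cols = df.shape
print(df.size)                 # 6
print(df.size == rows * cols)  # True
print(int(df.count().sum()))   # 5: non-missing cells only
```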
The values attribute returns the actual data in the DataFrame as an ndarray.
import pandas as pd
import numpy as np

# Create a dictionary of Series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}

# Create a DataFrame
df = pd.DataFrame(d)
print("Our object is:")
print(df)
print("The actual data in our data frame is:")
print(df.values)
Running Result:
Our object is:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
2  Ricky   25    3.98
3    Vin   23    2.56
4  Steve   30    3.20
5  Smith   29    4.60
6   Jack   23    3.80
The actual data in our data frame is:
[['Tom' 25 4.23]
 ['James' 26 3.24]
 ['Ricky' 25 3.98]
 ['Vin' 23 2.56]
 ['Steve' 30 3.2]
 ['Smith' 29 4.6]
 ['Jack' 23 3.8]]
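Because the columns have different types (strings, integers, floats), the array above has the generic object dtype. Selecting only the numeric columns yields a numeric array instead. A sketch using to_numpy(), which the pandas documentation recommends over .values, on a hypothetical two-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'James'],
                   'Age': [25, 26],
                   'Rating': [4.23, 3.24]})

print(df.to_numpy().dtype)                     # object: mixed column types
print(df[['Age', 'Rating']].to_numpy().dtype)  # float64: int64 is upcast to float64
```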
To view a small sample of a DataFrame object, use the head() and tail() methods. head() returns the first n rows (observe the index values). By default, 5 rows are returned, but you can pass a custom number.
import pandas as pd
import numpy as np

# Create a dictionary of Series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}

# Create a DataFrame
df = pd.DataFrame(d)
print("Our data frame is:")
print(df)
print("The first two rows of the data frame are:")
print(df.head(2))
Running Result:
Our data frame is:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
2  Ricky   25    3.98
3    Vin   23    2.56
4  Steve   30    3.20
5  Smith   29    4.60
6   Jack   23    3.80
The first two rows of the data frame are:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
tail() returns the last n rows (observe the index values). By default, 5 rows are returned, but you can pass a custom number.
import pandas as pd
import numpy as np

# Create a dictionary of Series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}

# Create a DataFrame
df = pd.DataFrame(d)
print("Our data frame is:")
print(df)
print("The last two rows of the data frame are:")
print(df.tail(2))
Running Result:
Our data frame is:
    Name  Age  Rating
0    Tom   25    4.23
1  James   26    3.24
2  Ricky   25    3.98
3    Vin   23    2.56
4  Steve   30    3.20
5  Smith   29    4.60
6   Jack   23    3.80
The last two rows of the data frame are:
    Name  Age  Rating
5  Smith   29     4.6
6   Jack   23     3.8
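head(n) and tail(n) select rows by position, so for a positive n they match slicing with iloc. A sketch using a hypothetical 7-row, single-column DataFrame, which also shows the default of 5 rows:

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 26, 25, 23, 30, 29, 23]})

print(len(df.head()))                   # 5: default n=5
print(df.head(2).equals(df.iloc[:2]))   # True
print(df.tail(2).equals(df.iloc[-2:]))  # True
print(df.tail(2).index.tolist())        # [5, 6]: original labels kept
```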