Pandas index and data query operation examples
In this chapter, we will discuss how to slice and dice data and obtain subsets of Pandas objects.
The Python and NumPy indexing operator "[]" and attribute operator "." provide quick and easy access to Pandas data structures across a wide range of use cases. However, because the type of the data being accessed is not known in advance, using these standard operators directly has some optimization limits. For production code, we recommend the optimized pandas data access methods introduced in this chapter.
Pandas has offered three types of multi-axis indexing, listed in the table below. All three are indexers and are used with square brackets rather than called like functions.
Index | Description |
.loc[] | Label based |
.iloc[] | Integer position based |
.ix[] | Both label and integer based |
Pandas provides several methods for purely label-based indexing. When slicing by label, both the start and the end boundary are included. Integers are valid labels, but they refer to the label, not the position.
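The difference between labels and positions can be illustrated with a minimal sketch. The DataFrame below, with its integer labels 10 to 40 and its column name x, is a hypothetical example and not part of the original tutorial.

```python
import pandas as pd

# Hypothetical frame whose integer labels (10..40) differ from the positions (0..3)
df = pd.DataFrame({"x": [1, 2, 3, 4]}, index=[10, 20, 30, 40])

# .loc slices by label and includes both endpoints: labels 10, 20 and 30
label_slice = df.loc[10:30]

# .iloc slices by position and excludes the end: positions 0 and 1
position_slice = df.iloc[0:2]

# 20 is looked up as a label here, not as a position
value = df.loc[20, "x"]

print(label_slice)
print(position_slice)
print(value)
```

With these labels, df.iloc[20] would raise an IndexError, because the frame has no position 20.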
.loc[] supports several access methods, for example: a single scalar label, a list of labels, a slice object, and a Boolean array.
.loc[] takes two single/list/range operators separated by a comma. The first selects rows, the second selects columns.
# Import the pandas library and alias it as pd
import pandas as pd
import numpy as np

# The values are random, so the numbers differ on every run
df = pd.DataFrame(np.random.randn(8, 4),
   index = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'], columns = ['A', 'B', 'C', 'D'])

# Select all rows for a specific column
print(df.loc[:,'A'])
Running Results:
a    0.391548
b   -0.070649
c   -0.317212
d   -2.162406
e    2.202797
f    0.613709
g    1.050559
h    1.122680
Name: A, dtype: float64
# Import the pandas library and alias it as pd
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4),
   index = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'], columns = ['A', 'B', 'C', 'D'])

# Select all rows for multiple columns, passed as a list
print(df.loc[:,['A','C']])
Running Results:
          A         C
a  0.391548  0.745623
b -0.070649  1.620406
c -0.317212  1.448365
d -2.162406 -0.873557
e  2.202797  0.528067
f  0.613709  0.286414
g  1.050559  0.216526
h  1.122680 -1.621420
# Import the pandas library and alias it as pd
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4),
   index = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'], columns = ['A', 'B', 'C', 'D'])

# Select several rows for multiple columns, both passed as lists
print(df.loc[['a','b','f','h'],['A','C']])
Running Results:
          A         C
a  0.391548  0.745623
b -0.070649  1.620406
f  0.613709  0.286414
h  1.122680 -1.621420
# Import the pandas library and alias it as pd
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4),
   index = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'], columns = ['A', 'B', 'C', 'D'])

# Select a range of rows for all columns (both 'a' and 'h' are included)
print(df.loc['a':'h'])
Running Results:
          A         B         C         D
a  0.391548 -0.224297  0.745623  0.054301
b -0.070649 -0.880130  1.620406  1.419743
c -0.317212 -1.929698  1.448365  0.616899
d -2.162406  0.614256 -0.873557  1.093958
e  2.202797 -2.315915  0.528067  0.612482
f  0.613709 -0.157674  0.286414 -0.500517
g  1.050559 -2.272099  0.216526  0.928449
h  1.122680  0.324368 -1.621420 -0.741470
# Import the pandas library and alias it as pd
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4),
   index = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'], columns = ['A', 'B', 'C', 'D'])

# Build a Boolean array by comparing the values in row 'a' with 0
print(df.loc['a'] > 0)
Running Results:
A    False
B     True
C    False
D    False
Name: a, dtype: bool
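The comparison above only produces a Boolean array. As a minimal sketch of how such an array can be passed back to .loc[] to filter rows or columns, the following uses a small hypothetical DataFrame with fixed values (the data and the variable names are assumptions made for illustration).

```python
import pandas as pd

# Hypothetical fixed data so the result is reproducible
df = pd.DataFrame(
    {"A": [1, -2, 3, -4], "B": [-5, 6, -7, 8]},
    index=["a", "b", "c", "d"],
)

# Boolean Series aligned with the row index
mask = df["A"] > 0

# Keep only the rows where the mask is True
positive_rows = df.loc[mask]

# Combine a Boolean row selector with a column label
positive_b = df.loc[mask, "B"]

# A Boolean Series aligned with the columns selects columns instead
positive_cols = df.loc[:, df.loc["a"] > 0]

print(positive_rows)
print(positive_b)
print(positive_cols)
```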
Pandas provides several methods for purely integer-based indexing. As in Python and NumPy, positions are 0-based.
The various access methods are: an integer, a list of integers, and a range of values.
# Import the pandas library and alias it as pd
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])

# Select the first four rows for all columns
print(df.iloc[:4])
Running Results:
          A         B         C         D
0  0.699435  0.256239 -1.270702 -0.645195
1 -0.685354  0.890791 -0.813012  0.631615
2 -0.783192 -0.531378  0.025070  0.230806
3  0.539042 -1.284314  0.826977 -0.026251
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])

# Integer slicing (the end position is excluded)
print(df.iloc[:4])
print(df.iloc[1:5, 2:4])
Running Results:
          A         B         C         D
0  0.699435  0.256239 -1.270702 -0.645195
1 -0.685354  0.890791 -0.813012  0.631615
2 -0.783192 -0.531378  0.025070  0.230806
3  0.539042 -1.284314  0.826977 -0.026251

          C         D
1 -0.813012  0.631615
2  0.025070  0.230806
3  0.826977 -0.026251
4  1.423332  1.130568
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])

# Select by lists of positions and by position ranges
print(df.iloc[[1, 3, 5], [1, 3]])
print(df.iloc[1:3, :])
print(df.iloc[:,1:3])
Running Results:
          B         D
1  0.890791  0.631615
3 -1.284314 -0.026251
5 -0.512888 -0.518930

          A         B         C         D
1 -0.685354  0.890791 -0.813012  0.631615
2 -0.783192 -0.531378  0.025070  0.230806

          B         C
0  0.256239 -1.270702
1  0.890791 -0.813012
2 -0.531378  0.025070
3 -1.284314  0.826977
4 -0.460729  1.423332
5 -0.512888  0.581409
6 -1.204853  0.098060
7 -0.947857  0.641358
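The examples above cover slices and lists of positions. The remaining access method, a single integer, can be sketched as follows. The DataFrame is a hypothetical one filled with np.arange so that the results are reproducible.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x4 frame holding the integers 0..11
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=["A", "B", "C", "D"])

# A single integer selects one row and returns it as a Series
row = df.iloc[1]

# Two integers (row position, column position) return a single value
cell = df.iloc[1, 2]

# Negative positions count from the end, as in Python lists
last_row = df.iloc[-1]

print(row)
print(cell)
print(last_row)
```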
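In addition to the pure label-based and integer-based methods, Pandas also provided a hybrid method for selecting and subsetting objects, the .ix[] operator. It was deprecated in pandas 0.20.0 and removed in pandas 1.0.0, so the .ix[] examples in this chapter run only on older versions; on current versions, use .loc[] or .iloc[] instead.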
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])

# Integer slicing
# The index holds integers, so .ix slices by label and includes label 4
print(df.ix[:4])
Running Results:
          A         B         C         D
0  0.699435  0.256239 -1.270702 -0.645195
1 -0.685354  0.890791 -0.813012  0.631615
2 -0.783192 -0.531378  0.025070  0.230806
3  0.539042 -1.284314  0.826977 -0.026251
4 -1.044209 -0.460729  1.423332  1.130568
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])

# Index slicing
print(df.ix[:, 'A'])
Running Results:
0    0.699435
1   -0.685354
2   -0.783192
3    0.539042
4   -1.044209
5   -1.415411
6    1.062095
7    0.994204
Name: A, dtype: float64
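On pandas versions where .ix[] is not available, the same mix of positional rows and labelled columns can be expressed with .loc[] or .iloc[]. The following is a minimal sketch under that assumption; the np.arange data and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical 8x4 frame holding the integers 0..31
df = pd.DataFrame(np.arange(32).reshape(8, 4), columns=["A", "B", "C", "D"])

# Option 1: translate row positions into labels, then use .loc
by_loc = df.loc[df.index[:4], "A"]

# Option 2: translate the column label into a position, then use .iloc
by_iloc = df.iloc[:4, df.columns.get_loc("A")]

# Several column labels can be translated at once with get_indexer
two_cols = df.iloc[:4, df.columns.get_indexer(["A", "C"])]

print(by_loc)
print(by_iloc)
print(two_cols)
```

Both options select the first four rows of column A. Which one reads better depends on whether the surrounding code mostly works with labels or with positions.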
Values are retrieved from a Pandas object with multi-axis indexing using the following notation:
Object | Indexer | Return Type |
Series | s.loc[indexer] | Scalar Value |
DataFrame | df.loc[row_index,col_index] | Series Object |
Panel | p.loc[item_index,major_index, minor_index] | p.loc[item_index,major_index, minor_index] |
.iloc[] and .ix[] accept the same indexing options and produce the same return types.
Let's now see how each operation can be performed on a DataFrame object. We will start with the basic indexing operator '[]':
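The return type depends on how many axes are reduced to a single label. As a minimal sketch with a hypothetical Series and DataFrame (names and data are assumptions), the following checks what .loc[] returns in the common cases.

```python
import numpy as np
import pandas as pd

# Hypothetical labelled objects with fixed values
s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
df = pd.DataFrame(
    np.arange(6).reshape(3, 2), index=["a", "b", "c"], columns=["A", "B"]
)

scalar_from_series = s.loc["a"]        # single label on a Series: scalar
row_from_frame = df.loc["a"]           # single row label: Series
column_from_frame = df.loc[:, "A"]     # single column label: Series
scalar_from_frame = df.loc["a", "A"]   # row and column label: scalar
sub_frame = df.loc[["a", "b"], ["A"]]  # lists on both axes: DataFrame

print(type(row_from_frame).__name__)
print(type(sub_frame).__name__)
```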
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])

print(df['A'])
Running Results:
0   -0.478893
1    0.391931
2    0.336825
3   -1.055102
4   -0.165218
5   -0.328641
6    0.567721
7   -0.759399
Name: A, dtype: float64
We can pass a list of values to [] to select those columns:
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])

print(df[['A','B']])
Running Results:
          A         B
0 -0.478893 -0.606311
1  0.391931 -0.949025
2  0.336825  0.093717
3 -1.055102 -0.012944
4 -0.165218  1.550310
5 -0.328641 -0.226363
6  0.567721 -0.312585
7 -0.759399 -0.372696
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])

# A slice inside [] selects rows; 2:2 is an empty range, so no rows are returned
print(df[2:2])
Running Results:
Empty DataFrame
Columns: [A, B, C, D]
Index: []
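The empty result above follows from how [] treats a slice: it selects rows, with the end excluded. A minimal sketch with a hypothetical np.arange DataFrame compares the empty slice with a non-empty one.

```python
import numpy as np
import pandas as pd

# Hypothetical 8x4 frame holding the integers 0..31
df = pd.DataFrame(np.arange(32).reshape(8, 4), columns=["A", "B", "C", "D"])

# Start and stop are equal, so the slice contains no rows
empty = df[2:2]

# Rows at positions 2 and 3 (the end, 4, is excluded)
rows = df[2:4]

print(empty)
print(rows)
```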
You can use the attribute operator “.” to select columns.
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])

print(df.A)
Running Results:
0   -0.478893
1    0.391931
2    0.336825
3   -1.055102
4   -0.165218
5   -0.328641
6    0.567721
7   -0.759399
Name: A, dtype: float64
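Attribute access is convenient, but it only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute. The sketch below uses hypothetical column names (count and "my col") to show where [] is needed instead.

```python
import pandas as pd

# Hypothetical frame: one plain name, one that clashes with a method,
# and one that is not a valid Python identifier
df = pd.DataFrame({"A": [1, 2], "count": [3, 4], "my col": [5, 6]})

# Works: 'A' is a valid identifier and not an existing attribute
col_a = df.A

# df.count is the DataFrame.count method, not the 'count' column
count_is_method = callable(df.count)

# The [] operator reaches every column regardless of its name
count_col = df["count"]
spaced_col = df["my col"]

print(col_a.tolist())
print(count_is_method)
print(count_col.tolist())
print(spaced_col.tolist())
```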