
Indexing and Data Querying in Pandas

Pandas index and data query operation examples

In this chapter, we discuss how to slice and dice data and obtain subsets of Pandas objects.
The Python and NumPy index operator "[]" and attribute operator "." give quick and easy access to Pandas data structures across a wide range of use cases. However, because the type of the data to be accessed is not known in advance, these standard operators have some optimization limits. For production code, we recommend the optimized Pandas data access methods introduced in this chapter.
Pandas has offered three types of multi-axis indexing, listed in the table below. The third, .ix(), is no longer available in pandas 1.0 and later.

Indexer     Description
.loc()      Based on label
.iloc()     Based on integer
.ix()       Based on label and integer
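As a minimal sketch of the difference between the first two indexers, the snippet below reads the same cell once by label and once by position. The frame contents (np.arange values) and the variable names are illustrative assumptions, not part of the original examples.

```python
import numpy as np
import pandas as pd

# Illustrative data: deterministic values instead of random ones
df = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    index=['a', 'b', 'c'],
    columns=['A', 'B', 'C', 'D'],
)

by_label = df.loc['b', 'C']     # row labelled 'b', column labelled 'C'
by_position = df.iloc[1, 2]     # second row, third column (0-based)

print(by_label, by_position)    # both expressions read the same cell
```

Label-based access keeps working if the rows are reordered, while position-based access always follows the current layout.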

.loc()

Pandas provides several ways to do purely label-based indexing. Unlike ordinary Python slices, a label slice includes both its start and end bounds. Integers are valid labels, but they refer to the label and not to the position.
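The two points about label slicing and integer labels can be sketched as follows. The Series and its integer labels (10, 20, 30, 40) are made up for illustration.

```python
import pandas as pd

# Illustrative Series whose integer labels differ from the positions
s = pd.Series(['w', 'x', 'y', 'z'], index=[10, 20, 30, 40])

label_slice = s.loc[10:30]      # labels 10, 20 and 30: the end label is included
position_slice = s.iloc[0:2]    # positions 0 and 1: the end position is excluded

print(list(label_slice))
print(list(position_slice))

# 0 is a valid position here, but it is not a label in this index
try:
    s.loc[0]
except KeyError:
    print("label 0 not found")
```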

.loc() accepts several kinds of input, for example:

A single scalar label
A list of labels
A slice object
A Boolean array

loc takes two single/list/range operators separated by ','. The first indicates the rows and the second indicates the columns.
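A short sketch of the four input kinds, each used with a row selector followed by a column selector. The deterministic np.arange data is an assumption chosen so the results are easy to check by eye.

```python
import numpy as np
import pandas as pd

# Illustrative data: column A holds 0, 4, 8, ..., 28
df = pd.DataFrame(
    np.arange(32).reshape(8, 4),
    index=list('abcdefgh'),
    columns=list('ABCD'),
)

scalar = df.loc['a', 'A']                   # single row label, single column label
by_list = df.loc[['a', 'c'], ['A', 'D']]    # lists of labels
by_slice = df.loc['b':'d', 'B':'C']         # label slices, both ends included
by_mask = df.loc[df['A'] > 16, :]           # Boolean array over the rows

print(scalar)
print(by_list)
print(by_slice)
print(by_mask)
```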

Instance 1

# Import the pandas library and alias it as pd
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4),
    index = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'], columns = ['A', 'B', 'C', 'D'])
# Select all rows for a specific column
print(df.loc[:,'A'])

Running Results:

a    0.391548
b   -0.070649
c   -0.317212
d   -2.162406
e    2.202797
f    0.613709
g    1.050559
h    1.122680
Name: A, dtype: float64

Because the data is random, the numbers differ on every run.

Instance 2

# Import the pandas library and alias it as pd
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4),
    index = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'], columns = ['A', 'B', 'C', 'D'])
# Select all rows for multiple columns, passed as a list
print(df.loc[:,['A','C']])

Running Results:

           A          C
a   0.391548   0.745623
b  -0.070649   1.620406
c  -0.317212   1.448365
d  -2.162406  -0.873557
e   2.202797   0.528067
f   0.613709   0.286414
g   1.050559   0.216526
h   1.122680  -1.621420

Instance 3

# Import the pandas library and alias it as pd
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4),
    index = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'], columns = ['A', 'B', 'C', 'D'])
# Select a few rows for multiple columns, both passed as lists
print(df.loc[['a','b','f','h'],['A','C']])

Running Results:

           A          C
a   0.391548   0.745623
b  -0.070649   1.620406
f   0.613709   0.286414
h   1.122680  -1.621420

Instance 4

# Import the pandas library and alias it as pd
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4),
    index = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'], columns = ['A', 'B', 'C', 'D'])
# Select a range of rows for all columns
print(df.loc['a':'h'])

Running Results:

           A          B          C          D
a   0.391548  -0.224297   0.745623   0.054301
b  -0.070649  -0.880130   1.620406   1.419743
c  -0.317212  -1.929698   1.448365   0.616899
d  -2.162406   0.614256  -0.873557   1.093958
e   2.202797  -2.315915   0.528067   0.612482
f   0.613709  -0.157674   0.286414  -0.500517
g   1.050559  -2.272099   0.216526   0.928449
h   1.122680   0.324368  -1.621420  -0.741470

Instance 5

# Import the pandas library and alias it as pd
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4),
    index = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'], columns = ['A', 'B', 'C', 'D'])
# Build a Boolean array: which values in row 'a' are greater than 0
print(df.loc['a'] > 0)

Running Results:

A    False
B     True
C    False
D    False
Name: a, dtype: bool
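The comparison above only prints the Boolean array. As a sketch of how such an array is then used to retrieve values, the snippet below passes it back to .loc. The deterministic data (row 'a' holds -2, -1, 0, 1) and the variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data: row 'a' holds -2, -1, 0, 1
df = pd.DataFrame(
    np.arange(32).reshape(8, 4) - 2,
    index=list('abcdefgh'),
    columns=list('ABCD'),
)

col_mask = df.loc['a'] > 0             # Boolean Series indexed by column label
positive_cols = df.loc[:, col_mask]    # keep the columns where row 'a' is positive

row_mask = df['A'] > 10                # Boolean Series indexed by row label
large_rows = df.loc[row_mask]          # keep the rows where column A exceeds 10

print(positive_cols)
print(large_rows)
```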

.iloc()

Pandas provides several ways to do purely integer-based indexing. As in Python and NumPy, positions are 0-based.
The available access methods are:

An integer
A list of integers
A range of values
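The three access methods can be sketched on a small frame as follows. The np.arange data and the variable names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Illustrative data: values 0..31 laid out row by row
df = pd.DataFrame(np.arange(32).reshape(8, 4), columns=['A', 'B', 'C', 'D'])

first_row = df.iloc[0]         # a single integer returns that row as a Series
picked = df.iloc[[0, 2, 4]]    # a list of integers returns those rows
middle = df.iloc[1:3]          # a slice excludes the end position: rows 1 and 2
last_row = df.iloc[-1]         # negative positions count from the end
tail = df.iloc[5:100]          # a slice running past the end is trimmed, not an error

print(middle)
print(tail)
```

Unlike label slices with .loc(), integer slices with .iloc() follow the usual Python rule and leave out the end position.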

Instance 1

# Import the pandas library and alias it as pd
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])
# Select the first four rows, all columns
print(df.iloc[:4])

Running Results:

           A          B          C          D
0   0.699435   0.256239  -1.270702  -0.645195
1  -0.685354   0.890791  -0.813012   0.631615
2  -0.783192  -0.531378   0.025070   0.230806
3   0.539042  -1.284314   0.826977  -0.026251

Instance 2

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])
# Integer slicing
print(df.iloc[:4])
print(df.iloc[1:5, 2:4])

Running Results:

           A          B          C          D
0   0.699435   0.256239  -1.270702  -0.645195
1  -0.685354   0.890791  -0.813012   0.631615
2  -0.783192  -0.531378   0.025070   0.230806
3   0.539042  -1.284314   0.826977  -0.026251
           C          D
1  -0.813012   0.631615
2   0.025070   0.230806
3   0.826977  -0.026251
4   1.423332   1.130568

Instance 3

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])
# Slicing with lists of positions and with ranges
print(df.iloc[[1, 3, 5], [1, 3]])
print(df.iloc[1:3, :])
print(df.iloc[:,1:3])

Running Results:

           B          D
1   0.890791   0.631615
3  -1.284314  -0.026251
5  -0.512888  -0.518930
           A          B          C          D
1  -0.685354   0.890791  -0.813012   0.631615
2  -0.783192  -0.531378   0.025070   0.230806
           B          C
0   0.256239  -1.270702
1   0.890791  -0.813012
2  -0.531378   0.025070
3  -1.284314   0.826977
4  -0.460729   1.423332
5  -0.512888   0.581409
6  -1.204853   0.098060
7  -0.947857   0.641358

.ix()

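In addition to the purely label-based and integer-based methods, Pandas provided a hybrid method for selecting and subsetting objects: the .ix() operator. It was deprecated in pandas 0.20 and removed in pandas 1.0, so the examples in this section run only on older versions. In current code, use .loc() or .iloc() instead.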

Instance 1

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])
# Integer slicing
print(df.ix[:4])

Running Results:

           A          B          C          D
0   0.699435   0.256239  -1.270702  -0.645195
1  -0.685354   0.890791  -0.813012   0.631615
2  -0.783192  -0.531378   0.025070   0.230806
3   0.539042  -1.284314   0.826977  -0.026251

Instance 2

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])
# Index slicing
print(df.ix[:, 'A'])

Running Results:

0   0.699435
1  -0.685354
2  -0.783192
3   0.539042
4  -1.044209
5  -1.415411
6   1.062095
7   0.994204
Name: A, dtype: float64
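Because .ix is not available in pandas 1.0 and later, a mixed selection (rows by position, column by label) has to be written with .iloc or .loc. The sketch below shows two ways to do this; it relies on Index.get_loc and on slicing df.index, and the frame itself is an illustrative assumption.

```python
import numpy as np
import pandas as pd

# Illustrative data with string row labels
df = pd.DataFrame(
    np.arange(32).reshape(8, 4),
    index=list('abcdefgh'),
    columns=['A', 'B', 'C', 'D'],
)

# Rows by position, column by label: convert the label to a position
via_iloc = df.iloc[:4, df.columns.get_loc('A')]

# Same selection: convert the positions to labels
via_loc = df.loc[df.index[:4], 'A']

print(via_iloc.equals(via_loc))
```

Writing the conversion out makes it explicit which axis is addressed by label and which by position.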

Symbol usage

Values are retrieved from Pandas objects with multi-axis indexing using the following notation:

Object      Indexer                                        Return Type
Series      s.loc[indexer]                                 Scalar value
DataFrame   df.loc[row_index, col_index]                   Series object
Panel       p.loc[item_index, major_index, minor_index]

.iloc() and .ix() apply the same indexing options and return values. Panel has been removed from pandas (version 0.25 onward), so only the Series and DataFrame rows apply to current releases.
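The return type depends on the kind of indexer that is passed, not only on the object. A small sketch, using made-up data, of what the common combinations return:

```python
import numpy as np
import pandas as pd

# Illustrative objects
s = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])
df = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    index=['a', 'b', 'c'],
    columns=['A', 'B', 'C', 'D'],
)

cell_from_series = s.loc['a']    # single label on a Series: scalar
row = df.loc['a']                # single row label: Series
column = df.loc[:, 'A']          # single column label: Series
cell = df.loc['a', 'A']          # row label and column label: scalar
block = df.loc[['a'], ['A']]     # lists of labels: DataFrame, even for one cell

for value in (cell_from_series, row, column, cell, block):
    print(type(value).__name__)
```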

Let's see how each operation works on a DataFrame object, starting with the basic index operator '[]'.

Instance 1

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])
print(df['A'])

Running Results:

0  -0.478893
1   0.391931
2   0.336825
3  -1.055102
4  -0.165218
5  -0.328641
6   0.567721
7  -0.759399
Name: A, dtype: float64

We can pass a list of values to [] to select those columns.

Instance 2

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])
print(df[['A','B']])

Running Results:

         A           B
0  -0.478893   -0.606311
1   0.391931   -0.949025
2   0.336825    0.093717
3  -1.055102   -0.012944
4  -0.165218    1.550310
5  -0.328641   -0.226363
6   0.567721   -0.312585
7  -0.759399   -0.372696

Instance 3

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])
# A slice inside [] selects rows; 2:2 is an empty range
print(df[2:2])

Running Results:

Empty DataFrame
Columns: [A, B, C, D]
Index: []
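The result above is empty because the slice starts and stops at the same position. A minimal sketch of row slicing with [] on illustrative data, showing the empty case next to non-empty ones:

```python
import numpy as np
import pandas as pd

# Illustrative data with the default integer index 0..7
df = pd.DataFrame(np.arange(32).reshape(8, 4), columns=['A', 'B', 'C', 'D'])

empty = df[2:2]       # start equals stop, so no rows are selected
two_rows = df[2:4]    # rows at positions 2 and 3
head = df[:3]         # first three rows

print(empty.empty)
print(len(two_rows), len(head))
```

Note that a string or a list inside [] selects columns, while a slice selects rows.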

Attribute Access

You can use the attribute operator “.” to select columns.

Instance

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])
print(df.A)

Running Results:

0   -0.478893
1    0.391931
2    0.336825
3   -1.055102
4   -0.165218
5   -0.328641
6    0.567721
7   -0.759399
Name: A, dtype: float64
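Attribute access only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method. The sketch below uses hypothetical column names ('count' and 'unit price') chosen to show both limits.

```python
import numpy as np
import pandas as pd

# Hypothetical column names chosen to show the limits of attribute access
df = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    columns=['A', 'count', 'unit price'],
)

same = df.A.equals(df['A'])     # valid identifier, no clash: both forms agree
method = df.count               # 'count' resolves to the DataFrame.count method
count_col = df['count']         # brackets reach the column named 'count'
price_col = df['unit price']    # names containing spaces need brackets

print(same)
print(callable(method))
print(list(price_col))
```

For that reason, bracket access is the safer choice when column names are not known in advance.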