Indexing and Data Querying in Pandas

Examples of indexing and data query operations in Pandas

In this chapter, we will discuss how to slice and dice data and obtain subsets of Pandas objects.
The Python and NumPy indexing operator "[]" and attribute operator "." provide quick and easy access to Pandas data structures across a wide range of use cases. However, because the type of the data being accessed is not known in advance, using these standard operators directly has some optimization limits. For production code, we recommend the optimized Pandas data access methods introduced in this chapter.
Pandas supports three types of multi-axis indexing, listed in the table below. The .ix() indexer was deprecated in pandas 0.20.0 and removed in 1.0.0, so current releases offer only .loc() and .iloc().

Index	Description
.loc()	Based on label
.iloc()	Based on integer
.ix()	Based on label and integer
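
To make the difference between label-based and integer-based access concrete, here is a minimal sketch. The frame, its row labels "x", "y", "z" and the variable names are illustrative assumptions, not part of the original examples. It reads the same cell once by label and once by position.

```python
import numpy as np
import pandas as pd

# Small deterministic frame (hypothetical data): values 0..11, string row labels.
df = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    index=["x", "y", "z"],
    columns=["A", "B", "C", "D"],
)

by_label = df.loc["y", "C"]     # row labelled "y", column labelled "C"
by_position = df.iloc[1, 2]     # second row, third column (0-based)

print(by_label, by_position)    # both expressions address the same cell
```

Which accessor to use depends on whether the code knows the names of the rows and columns or only where they sit.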

.loc()

Pandas provides several methods for purely label-based indexing. When slicing with labels, both the start and the end boundary are included. Integers are valid labels, but they refer to the label and not to the position.
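
The point about integer labels is easy to miss, so the following sketch may help. It assumes a small Series whose integer labels (10, 20, 30, 40) deliberately differ from the positions. The data is invented for illustration.

```python
import pandas as pd

# Integer labels that deliberately differ from the positions 0, 1, 2, 3.
s = pd.Series([100, 200, 300, 400], index=[10, 20, 30, 40])

by_label = s.loc[20]          # the label 20, not the 21st element
label_slice = s.loc[20:30]    # label slice: both 20 and 30 are included
position_slice = s.iloc[1:2]  # position slice: the end position is excluded

print(by_label)
print(label_slice)
print(position_slice)
```

Here the label slice returns two elements while the position slice with "adjacent" bounds returns only one.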

.loc() supports several access methods, for example:

A single scalar label
A list of labels
A slice object
A Boolean array

.loc takes two single/list/range operators separated by a comma. The first one selects the rows and the second one selects the columns.
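
As a compact overview of the four access methods, the sketch below applies each of them to one small frame. The data and the variable names (scalar, listed, sliced, masked) are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical 4 x 4 frame holding the values 0..15.
df = pd.DataFrame(
    np.arange(16).reshape(4, 4),
    index=["a", "b", "c", "d"],
    columns=["A", "B", "C", "D"],
)

scalar = df.loc["b", "A"]                 # single label on each axis -> one value
listed = df.loc[["a", "d"], ["A", "C"]]   # label lists -> 2 x 2 DataFrame
sliced = df.loc["b":"c", "B":"D"]         # label slices, both ends included
masked = df.loc[df["A"] > 4, :]           # Boolean array over the rows

print(scalar)
print(listed)
print(sliced)
print(masked)
```

In every call the first operator addresses rows and the second addresses columns.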

Instance 1

Example

# Import the pandas library and alias it as pd
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4),
                  index = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'], columns = ['A', 'B', 'C', 'D'])
# Select all rows for a specific column
print(df.loc[:, 'A'])

Running Results:

a    0.391548
b   -0.070649
c   -0.317212
d   -2.162406
e    2.202797
f    0.613709
g    1.050559
h    1.122680
Name: A, dtype: float64

Instance 2

Example

# Import the pandas library and alias it as pd
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4),
                  index = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'], columns = ['A', 'B', 'C', 'D'])
# Select all rows for multiple columns, passed as a list
print(df.loc[:, ['A', 'C']])

Running Results:

          A         C
a  0.391548  0.745623
b -0.070649  1.620406
c -0.317212  1.448365
d -2.162406 -0.873557
e  2.202797  0.528067
f  0.613709  0.286414
g  1.050559  0.216526
h  1.122680 -1.621420

Instance 3

Example

# Import the pandas library and alias it as pd
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4),
                  index = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'], columns = ['A', 'B', 'C', 'D'])
# Select a few rows for multiple columns, both passed as lists
print(df.loc[['a', 'b', 'f', 'h'], ['A', 'C']])

Running Results:

          A         C
a  0.391548  0.745623
b -0.070649  1.620406
f  0.613709  0.286414
h  1.122680 -1.621420

Instance 4

Example

# Import the pandas library and alias it as pd
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4),
                  index = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'], columns = ['A', 'B', 'C', 'D'])
# Select a range of rows for all columns
print(df.loc['a':'h'])

Running Results:

          A         B         C         D
a  0.391548 -0.224297  0.745623  0.054301
b -0.070649 -0.880130  1.620406  1.419743
c -0.317212 -1.929698  1.448365  0.616899
d -2.162406  0.614256 -0.873557  1.093958
e  2.202797 -2.315915  0.528067  0.612482
f  0.613709 -0.157674  0.286414 -0.500517
g  1.050559 -2.272099  0.216526  0.928449
h  1.122680  0.324368 -1.621420 -0.741470

Instance 5

Example

# Import the pandas library and alias it as pd
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4),
                  index = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'], columns = ['A', 'B', 'C', 'D'])
# Build a Boolean array (for retrieving values) by comparing row 'a' with 0
print(df.loc['a'] > 0)

Running Results:

A    False
B     True
C    False
D    False
Name: a, dtype: bool
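
The example above only prints the Boolean array. One plausible next step, sketched here with invented data and hypothetical variable names, is to pass such an array back to .loc so that it actually filters columns or rows.

```python
import pandas as pd

# Hypothetical fixed values so the result does not depend on random data.
df = pd.DataFrame(
    [[1.5, -2.0, 0.5], [-1.0, 3.0, -0.5], [2.5, 1.0, -4.0]],
    index=["a", "b", "c"],
    columns=["A", "B", "C"],
)

col_mask = df.loc["a"] > 0            # Boolean Series indexed by column label
positive_cols = df.loc[:, col_mask]   # keep columns where row "a" is positive

row_mask = df["A"] > 0                # Boolean Series indexed by row label
positive_rows = df.loc[row_mask]      # keep rows where column "A" is positive

print(positive_cols)
print(positive_rows)
```

A mask built from a row lines up with the columns, and a mask built from a column lines up with the rows, so each goes in the matching position of .loc.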

.iloc()

Pandas provides several methods for purely integer-based indexing. As in Python and NumPy, positions are 0-based.
The access methods are:

An integer
A list of integers
A range of values
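
A short sketch of the three access methods on one frame follows. The data is an assumed 5 x 4 grid of the values 0..19, and the last line additionally shows that negative positions count from the end, as they do for Python lists.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the default integer index 0..4.
df = pd.DataFrame(np.arange(20).reshape(5, 4), columns=["A", "B", "C", "D"])

first_row = df.iloc[0]             # single integer -> first row as a Series
picked = df.iloc[[0, 2], [1, 3]]   # integer lists -> rows 0 and 2, columns 1 and 3
block = df.iloc[1:3, 0:2]          # ranges exclude the end -> rows 1-2, columns 0-1
last_row = df.iloc[-1]             # negative integers count from the end

print(first_row)
print(picked)
print(block)
print(last_row)
```

Unlike label slices with .loc, the end of an .iloc range is not included.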

Instance 1

Example

# Import the pandas library and alias it as pd
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])
# Select the first four rows of all columns
print(df.iloc[:4])

Running Results:

          A         B         C         D
0  0.699435  0.256239 -1.270702 -0.645195
1 -0.685354  0.890791 -0.813012  0.631615
2 -0.783192 -0.531378  0.025070  0.230806
3  0.539042 -1.284314  0.826977 -0.026251

Instance 2

Example

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])
# Integer slicing
print(df.iloc[:4])
print(df.iloc[1:5, 2:4])

Running Results:

          A         B         C         D
0  0.699435  0.256239 -1.270702 -0.645195
1 -0.685354  0.890791 -0.813012  0.631615
2 -0.783192 -0.531378  0.025070  0.230806
3  0.539042 -1.284314  0.826977 -0.026251
          C         D
1 -0.813012  0.631615
2  0.025070  0.230806
3  0.826977 -0.026251
4  1.423332  1.130568

Instance 3

Example

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])
# Select with lists of integers and with integer ranges
print(df.iloc[[1, 3, 5], [1, 3]])
print(df.iloc[1:3, :])
print(df.iloc[:, 1:3])

Running Results:

          B         D
1  0.890791  0.631615
3 -1.284314 -0.026251
5 -0.512888 -0.518930
          A         B         C         D
1 -0.685354  0.890791 -0.813012  0.631615
2 -0.783192 -0.531378  0.025070  0.230806
          B         C
0  0.256239 -1.270702
1  0.890791 -0.813012
2 -0.531378  0.025070
3 -1.284314  0.826977
4 -0.460729  1.423332
5 -0.512888  0.581409
6 -1.204853  0.098060
7 -0.947857  0.641358

.ix()

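In addition to the purely label-based and purely integer-based methods, older versions of Pandas provided a hybrid method for selecting and subsetting objects: the .ix() operator. It is no longer available from pandas 1.0.0 onward, so the examples below run only on earlier versions.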

Instance 1

Example

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])
# Integer slicing; on an integer index .ix treats :4 as labels, so label 4 is included
print(df.ix[:4])

Running Results:

          A         B         C         D
0  0.699435  0.256239 -1.270702 -0.645195
1 -0.685354  0.890791 -0.813012  0.631615
2 -0.783192 -0.531378  0.025070  0.230806
3  0.539042 -1.284314  0.826977 -0.026251
4 -1.044209 -0.460729  1.423332  1.130568

Instance 2

Example

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])
# Index slicing
print(df.ix[:, 'A'])

Running Results:

0    0.699435
1   -0.685354
2   -0.783192
3    0.539042
4   -1.044209
5   -1.415411
6    1.062095
7    0.994204
Name: A, dtype: float64
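
Because .ix is unavailable from pandas 1.0 onward, code that relied on it has to state explicitly whether it means labels or positions. The sketch below shows one possible way to write the two readings of a selection such as "the first rows of column A". The data and variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical 8 x 4 frame with the default integer index 0..7.
df = pd.DataFrame(np.arange(32).reshape(8, 4), columns=["A", "B", "C", "D"])

# Label-based: on the default integer index, the slice :4 means labels 0..4.
by_label = df.loc[:4, "A"]

# Position-based: the first four rows of column "A", whatever the labels are.
by_position = df.iloc[:4, df.columns.get_loc("A")]

print(by_label)
print(by_position)
```

The label-based version returns five values and the position-based version returns four, which is exactly the kind of ambiguity .ix left implicit.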

Symbol usage

Values are retrieved from Pandas objects using multi-axis indexing with the following symbols:

Object	Indexer	Return Type
Series	s.loc[indexer]	Scalar value
DataFrame	df.loc[row_index,col_index]	Series object
Panel	p.loc[item_index,major_index,minor_index]	DataFrame object

The return types shown assume a single label on the first axis and every label on the remaining axes. Panel was deprecated in pandas 0.20.0 and removed in 0.25.0.

.iloc() and .ix() take the same indexers and return the same types.
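
The return type depends on what kind of indexer is passed on each axis. The sketch below, using invented data, checks the type returned for a few common combinations on a Series and a DataFrame. Panel is left out because it no longer exists in current pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical objects sharing the row labels "a", "b", "c".
s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
df = pd.DataFrame(np.arange(6).reshape(3, 2), index=["a", "b", "c"], columns=["A", "B"])

value = s.loc["a"]         # single label on a Series -> scalar
row = df.loc["a"]          # single row label -> Series holding that row
cell = df.loc["a", "A"]    # single label on both axes -> scalar
frame = df.loc[["a"], :]   # a list, even of length one -> DataFrame

for result in (value, row, cell, frame):
    print(type(result))
```

Passing a list instead of a single label is a common way to keep the result two-dimensional.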

Let's see how to perform each operation on a DataFrame object. We will use the basic indexing operator "[]".

Instance 1

Example

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])
# Select a single column by its label
print(df['A'])

Running Results:

0   -0.478893
1    0.391931
2    0.336825
3   -1.055102
4   -0.165218
5   -0.328641
6    0.567721
7   -0.759399
Name: A, dtype: float64

We can pass a list of values to [] to select those columns.

Instance 2

Example

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])
# Select several columns by passing a list of labels
print(df[['A', 'B']])

Running Results:

          A         B
0 -0.478893 -0.606311
1  0.391931 -0.949025
2  0.336825  0.093717
3 -1.055102 -0.012944
4 -0.165218  1.550310
5 -0.328641 -0.226363
6  0.567721 -0.312585
7 -0.759399 -0.372696

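Instance 3

Example

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])
# A slice inside [] selects rows; 2:2 starts and stops at the same position, so it is empty
print(df[2:2])

Running Results:

Empty DataFrame
Columns: [A, B, C, D]
Index: []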
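
The three instances above show that "[]" behaves differently depending on what it receives. As a summary, this sketch runs each form against one small frame. The data and variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical 4 x 3 frame holding the values 0..11.
df = pd.DataFrame(np.arange(12).reshape(4, 3), columns=["A", "B", "C"])

col = df["A"]            # a column label -> Series
cols = df[["A", "C"]]    # a list of column labels -> DataFrame
rows = df[1:3]           # a slice -> rows at positions 1 and 2
empty = df[2:2]          # start equals stop -> no rows, columns are kept

print(col)
print(cols)
print(rows)
print(empty)
```

A label or a list of labels addresses columns, while a slice addresses rows, which is one reason .loc and .iloc are often easier to read in longer code.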

Attribute Access

You can use the attribute operator “.” to select columns.

Instance

Example

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])
# Select column 'A' through attribute access
print(df.A)

Running Results:

0   -0.478893
1    0.391931
2    0.336825
3   -1.055102
4   -0.165218
5   -0.328641
6    0.567721
7   -0.759399
Name: A, dtype: float64
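
Attribute access only works when the column name is a valid Python identifier that does not collide with an existing DataFrame attribute. The sketch below illustrates this with invented column names ("count" and "unit price" are assumptions chosen to trigger the two cases).

```python
import pandas as pd

# Hypothetical frame: one ordinary name, one that clashes with a method,
# and one that is not a valid identifier.
df = pd.DataFrame({"A": [1, 2], "count": [10, 20], "unit price": [0.5, 1.5]})

print(df.A.tolist())               # "A" is a valid identifier, so df.A works
print(callable(df.count))          # df.count is the DataFrame method, not the column
print(df["count"].tolist())        # brackets reach the column named "count"
print(df["unit price"].tolist())   # names containing spaces need brackets
```

For this reason, bracket access is the safer choice whenever column names come from external data.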
