
Reindexing in Pandas

Reindexing changes the row labels and column labels of a DataFrame. To reindex means to conform the data to a given set of labels along a particular axis.

Reindexing lets you perform several operations, such as:

Reorder existing data to match a new set of labels.

Insert missing value (NA) markers at positions where no data existed for a label.
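
Both operations can be seen in a minimal sketch on a small Series. The data and the names s and result are illustrative assumptions, not part of the original tutorial:

```python
import pandas as pd

# Illustrative data: three values labelled 'a', 'b', 'c'.
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# Reorder the existing labels and ask for a label 'd' that does not exist.
# Existing labels keep their values; 'd' receives a missing-value marker (NaN).
result = s.reindex(['c', 'a', 'd'])
print(result)
```

Because NaN is a floating-point value, the integer data is converted to float in the result.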

Example

    import pandas as pd
    import numpy as np

    N = 20
    df = pd.DataFrame({
        'A': pd.date_range(start='2016-01-01', periods=N, freq='D'),
        'x': np.linspace(0, stop=N-1, num=N),
        'y': np.random.rand(N),
        'C': np.random.choice(['Low', 'Medium', 'High'], N).tolist(),
        'D': np.random.normal(100, 10, size=N).tolist()
    })
    # Reindex the DataFrame: keep rows 0, 2, 5 and columns A, C, B.
    # Column 'B' does not exist in df, so it is filled with NaN.
    df_reindexed = df.reindex(index=[0, 2, 5], columns=['A', 'C', 'B'])
    print(df_reindexed)

Running Results:

           A     C   B
0 2016-01-01   Low NaN
2 2016-01-03  High NaN
5 2016-01-06   Low NaN

Reindex to align with other objects

You may want to take an object and reindex its axes so that they carry the same labels as another object. The following example illustrates this.

Example

    import pandas as pd
    import numpy as np

    df1 = pd.DataFrame(np.random.randn(10, 3), columns=['col1', 'col2', 'col3'])
    df2 = pd.DataFrame(np.random.randn(7, 3), columns=['col1', 'col2', 'col3'])
    # Give df1 the same row and column labels as df2
    df1 = df1.reindex_like(df2)
    print(df1)

Running Results:

       col1      col2      col3
0 -2.467652 -1.211687 -0.391761
1 -0.287396  0.522350  0.562512
2 -0.255409 -0.483250  1.866258
3 -1.150467 -0.646493 -0.222462
4  0.152768 -2.056643  1.877233
5 -1.155997  1.528719 -1.343719
6 -1.015606 -1.245936 -0.295275

Here, the df1 DataFrame is changed and reindexed to match df2, so only its first seven rows remain. The column names must match; otherwise, NaN is filled in for the entire column.
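
The effect of mismatched column names can be sketched as follows. The frames left and other and their values are hypothetical, chosen only to make the result easy to read:

```python
import pandas as pd

# Illustrative frames: 'other' has a column 'col3' that 'left' lacks.
left = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
other = pd.DataFrame({'col1': [0, 0], 'col3': [0, 0]})

# 'left' takes on the row and column labels of 'other'.
# 'col1' keeps its data for rows 0 and 1, 'col2' is dropped,
# and 'col3' has no source data, so the whole column is NaN.
aligned = left.reindex_like(other)
print(aligned)
```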

Fill when reindexing

The reindex() method accepts an optional method parameter that specifies how to fill missing entries. It takes the following values:

pad/ffill − Fill values forward (propagate the last valid value)

bfill/backfill − Fill values backward (use the next valid value)

nearest − Fill from the nearest index value
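
The three fill methods listed above can be compared side by side in a short sketch. The Series s and the labels in new_index are illustrative assumptions; note that these fill methods require a monotonically increasing or decreasing index:

```python
import pandas as pd

# Illustrative data: values known only at labels 0, 5 and 10.
s = pd.Series([1.0, 2.0, 3.0], index=[0, 5, 10])
new_index = [0, 2, 4, 6, 10]

# ffill: each new label takes the value of the closest earlier label.
ffilled = s.reindex(new_index, method='ffill')
# bfill: each new label takes the value of the closest later label.
bfilled = s.reindex(new_index, method='bfill')
# nearest: each new label takes the value of the closest label in either direction.
nearest = s.reindex(new_index, method='nearest')

print(pd.DataFrame({'ffill': ffilled, 'bfill': bfilled, 'nearest': nearest}))
```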

Example

    import pandas as pd
    import numpy as np

    df1 = pd.DataFrame(np.random.randn(6, 3), columns=['col1', 'col2', 'col3'])
    df2 = pd.DataFrame(np.random.randn(2, 3), columns=['col1', 'col2', 'col3'])
    # Rows that do not exist in df2 are filled with NaN
    print(df2.reindex_like(df1))
    # Fill NaN with the previous value
    print("DataFrame with forward fill:")
    print(df2.reindex_like(df1, method='ffill'))

Running Results:

       col1      col2      col3
0  1.311620 -0.707176  0.599863
1 -0.423455 -0.700265  1.133371
2       NaN       NaN       NaN
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN
DataFrame with forward fill:
       col1      col2      col3
0  1.311620 -0.707176  0.599863
1 -0.423455 -0.700265  1.133371
2 -0.423455 -0.700265  1.133371
3 -0.423455 -0.700265  1.133371
4 -0.423455 -0.700265  1.133371
5 -0.423455 -0.700265  1.133371

The last four rows are filled with the values from row 1, the last row that has data.

Fill limit when reindexing

The limit parameter provides additional control over filling when reindexing. It specifies the maximum number of consecutive missing entries to fill. The following example illustrates this:

Example

    import pandas as pd
    import numpy as np

    df1 = pd.DataFrame(np.random.randn(6, 3), columns=['col1', 'col2', 'col3'])
    df2 = pd.DataFrame(np.random.randn(2, 3), columns=['col1', 'col2', 'col3'])
    # Rows that do not exist in df2 are filled with NaN
    print(df2.reindex_like(df1))
    # Fill NaN with the previous value, for at most 1 consecutive row
    print("DataFrame with forward fill limited to 1:")
    print(df2.reindex_like(df1, method='ffill', limit=1))

Running Results:

       col1      col2      col3
0  0.247784  2.128727  0.702576
1 -0.055713 -0.021732 -0.174577
2       NaN       NaN       NaN
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN
DataFrame with forward fill limited to 1:
       col1      col2      col3
0  0.247784  2.128727  0.702576
1 -0.055713 -0.021732 -0.174577
2 -0.055713 -0.021732 -0.174577
3       NaN       NaN       NaN
4       NaN       NaN       NaN
5       NaN       NaN       NaN

Note that only row 2 is filled, using the values from the preceding row 1. The remaining rows (3 to 5) are left as NaN.
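
To see how the result changes with the value of limit, the following sketch uses fixed data instead of random numbers. The Series s, the list target, and the names one and two are illustrative assumptions:

```python
import pandas as pd

# Illustrative data: values known only at labels 0 and 1.
s = pd.Series([1.0, 2.0], index=[0, 1])
target = [0, 1, 2, 3, 4, 5]

# limit=1: only the first missing label after label 1 is filled.
one = s.reindex(target, method='ffill', limit=1)
# limit=2: the first two missing labels after label 1 are filled.
two = s.reindex(target, method='ffill', limit=2)

print(pd.DataFrame({'limit=1': one, 'limit=2': two}))
```

With limit=1, labels 3 to 5 stay NaN; with limit=2, only labels 4 and 5 do.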

Renaming

The rename() method lets you relabel an axis based on a mapping (a dictionary or a Series) or an arbitrary function.
The following example illustrates this:

Example

    import pandas as pd
    import numpy as np

    df1 = pd.DataFrame(np.random.randn(6, 3), columns=['col1', 'col2', 'col3'])
    print(df1)
    print("After renaming rows and columns:")
    # Labels that are not in the mapping (col3, rows 3 to 5) are left unchanged
    print(df1.rename(columns={'col1': 'c1', 'col2': 'c2'},
                     index={0: 'apple', 1: 'banana', 2: 'durian'}))

Running Results:

       col1      col2      col3
0  0.486791  0.105759  1.540122
1 -0.990237  1.007885 -0.217896
2 -0.483855 -1.645027 -1.194113
3 -0.122316  0.566277 -0.366028
4 -0.231524 -0.721172 -0.112007
5  0.438810  0.000225  0.435479
After renaming rows and columns:
              c1        c2      col3
apple   0.486791  0.105759  1.540122
banana -0.990237  1.007885 -0.217896
durian -0.483855 -1.645027 -1.194113
3      -0.122316  0.566277 -0.366028
4      -0.231524 -0.721172 -0.112007
5       0.438810  0.000225  0.435479
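
The example above uses dictionaries. Since rename() also accepts a function, here is a minimal sketch of that form; the frame df and the two renaming functions are illustrative assumptions, not part of the original tutorial:

```python
import pandas as pd

# Illustrative data with default integer row labels.
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# The function is applied to every label on the given axis:
# str.upper to each column name, the lambda to each row label.
renamed = df.rename(columns=str.upper, index=lambda i: f'row{i}')
print(renamed)
```

rename() returns a new object here; df keeps its original labels.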
