
Reindexing in Pandas

Reindexing changes the row labels and column labels of a DataFrame. To reindex means to conform the data to a given set of labels along a particular axis.

Several operations can be accomplished through reindexing, such as:

Reorder the existing data to match a new set of labels.

Insert missing value (NA) markers at label positions where no data exists for the label.

Example:

 import pandas as pd
 import numpy as np
 N = 20
 df = pd.DataFrame({
     'A': pd.date_range(start='2016-01-01', periods=N, freq='D'),
     'x': np.linspace(0, stop=N-1, num=N),
     'y': np.random.rand(N),
     'C': np.random.choice(['Low','Medium','High'], N).tolist(),
     'D': np.random.normal(100, 10, size=(N)).tolist()
 })
 # Reindex the DataFrame: keep rows 0, 2 and 5 and columns 'A', 'C' and 'B'.
 # Column 'B' does not exist in df, so it is filled with NaN.
 df_reindexed = df.reindex(index=[0, 2, 5], columns=['A', 'C', 'B'])
 print(df_reindexed)

Running Results:

           A    C     B
0  2016-01-01  Low   NaN
2  2016-01-03  High  NaN
5  2016-01-06  Low   NaN
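
Because the example above uses random data, its output changes from run to run. As a minimal sketch of the same two effects (reordering existing labels and inserting NaN for labels that do not exist), the following uses a small fixed DataFrame. The values and the missing row label 9 are illustrative assumptions, not part of the original example.

```python
import pandas as pd

# Small, fixed DataFrame so the result is reproducible (illustrative values).
df = pd.DataFrame(
    {'A': [1, 2, 3, 4], 'C': ['w', 'x', 'y', 'z']},
    index=[0, 1, 2, 3],
)

# Ask for rows 3 and 0 (in that order) plus row 9, which does not exist,
# and for column 'B', which does not exist either.
result = df.reindex(index=[3, 0, 9], columns=['A', 'C', 'B'])
print(result)

# reindex returns a new object; the original DataFrame is unchanged.
print(df.shape)
```

Rows 3 and 0 keep their data in the requested order, while row 9 and column 'B' consist entirely of NaN.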

Reindex to align with other objects

You may want to take an object and reindex its axes so that they are labeled the same as those of another object. The following example illustrates this.

Example

 import pandas as pd
 import numpy as np
 df1 = pd.DataFrame(np.random.randn(10,3),columns=['col1','col2','col3'])
 df2 = pd.DataFrame(np.random.randn(7,3),columns=['col1','col2','col3'])
 df1 = df1.reindex_like(df2)
 print(df1)

Running Results:

         col1         col2         col3
0    -2.467652    -1.211687    -0.391761
1    -0.287396     0.522350     0.562512
2    -0.255409    -0.483250     1.866258
3    -1.150467    -0.646493    -0.222462
4     0.152768    -2.056643     1.877233
5    -1.155997     1.528719    -1.343719
6    -1.015606    -1.245936    -0.295275

Here, the df1 DataFrame is reindexed like df2, so only the rows whose labels appear in df2 (0 through 6) are kept. The column names should match; otherwise, NaN is filled in for the entire unmatched column.
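
To make the column-name caveat concrete, here is a minimal sketch with fixed values. The frames left and other and the column name colX are hypothetical, chosen only to show a mismatch.

```python
import pandas as pd

left = pd.DataFrame({'col1': [1.0, 2.0, 3.0], 'col2': [4.0, 5.0, 6.0]})
# 'other' has fewer rows and a column name ('colX') that 'left' lacks.
other = pd.DataFrame({'col1': [0.0, 0.0], 'colX': [0.0, 0.0]})

# Take the row and column labels from 'other', but the data from 'left'.
aligned = left.reindex_like(other)
print(aligned)
```

The result has the two rows and two columns of other: col1 carries over the values from left, col2 is dropped, and colX is entirely NaN because left has no column with that name.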

Fill when reindexing

reindex() takes an optional parameter, method, which selects a filling method from the following values:

pad/ffill − Fill values forward

bfill/backfill − Fill values backward

nearest − Fill from the nearest index value

Example

 import pandas as pd
 import numpy as np
 df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
 df2 = pd.DataFrame(np.random.randn(2,3),columns=['col1','col2','col3'])
 # Rows that do not exist in df2 are filled with NaN
 print(df2.reindex_like(df1))
 # Fill NaN with the previous value
 print("DataFrame with forward fill:")
 print(df2.reindex_like(df1, method='ffill'))

Running Results:

         col1        col2       col3
0    1.311620   -0.707176   0.599863
1   -0.423455   -0.700265   1.133371
2         NaN         NaN        NaN
3         NaN         NaN        NaN
4         NaN         NaN        NaN
5         NaN         NaN        NaN
DataFrame with forward fill:
         col1        col2        col3
0    1.311620   -0.707176    0.599863
1   -0.423455   -0.700265    1.133371
2   -0.423455   -0.700265    1.133371
3   -0.423455   -0.700265    1.133371
4   -0.423455   -0.700265    1.133371
5   -0.423455   -0.700265    1.133371

Note that the last four rows are filled forward from the row at index 1.
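
The example above exercises only ffill. As a minimal sketch of how the three method values differ, the following reindexes a small hand-made DataFrame with each of them. The frame, its values, and the name new_index are illustrative assumptions. Note that method only applies when the original index is monotonically increasing or decreasing.

```python
import pandas as pd

# Known values at labels 0, 3 and 6 (illustrative data).
df = pd.DataFrame({'val': [10.0, 20.0, 30.0]}, index=[0, 3, 6])
new_index = range(8)  # labels 0..7; most of them are missing from df

forward = df.reindex(new_index, method='ffill')    # take the previous known value
backward = df.reindex(new_index, method='bfill')   # take the next known value
closest = df.reindex(new_index, method='nearest')  # take the closest known label

# Show the three results side by side, one column per method.
print(pd.concat(
    {'ffill': forward['val'], 'bfill': backward['val'], 'nearest': closest['val']},
    axis=1,
))
```

With bfill, label 7 stays NaN because no later value exists to fill from; ffill and nearest both use the value at label 6 there.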

Fill limit when reindexing

The limit parameter provides additional control over filling when reindexing. It specifies the maximum number of consecutive missing rows to fill. Consider the following example:

Example

 import pandas as pd
 import numpy as np
  
 df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
 df2 = pd.DataFrame(np.random.randn(2,3),columns=['col1','col2','col3'])
 # Rows that do not exist in df2 are filled with NaN
 print(df2.reindex_like(df1))
 # Fill NaN with the previous value, for at most one consecutive row
 print("DataFrame with forward fill limited to 1:")
 print(df2.reindex_like(df1,method='ffill',limit=1))

Running Results:

         col1        col2        col3
0    0.247784    2.128727    0.702576
1   -0.055713   -0.021732   -0.174577
2         NaN         NaN         NaN
3         NaN         NaN         NaN
4         NaN         NaN         NaN
5         NaN         NaN         NaN
DataFrame with forward fill limited to 1:
         col1        col2        col3
0    0.247784    2.128727    0.702576
1   -0.055713   -0.021732   -0.174577
2   -0.055713   -0.021732   -0.174577
3         NaN         NaN         NaN
4         NaN         NaN         NaN
5         NaN         NaN         NaN

Note that only the row at index 2 is filled from the preceding row at index 1. The remaining rows are left as NaN.
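
As a minimal sketch of how the limit value changes the result, the following compares limit=1 and limit=2 on fixed data. It uses reindex() directly rather than reindex_like(); the frame small and its values are illustrative assumptions.

```python
import pandas as pd

small = pd.DataFrame({'col1': [1.0, 2.0]})  # default index: 0, 1

# Forward-fill at most 1, then at most 2, consecutive missing rows.
limit_one = small.reindex(range(6), method='ffill', limit=1)
limit_two = small.reindex(range(6), method='ffill', limit=2)

print(limit_one)
print(limit_two)

# Count the rows that remain unfilled in each case.
print(limit_one['col1'].isna().sum(), limit_two['col1'].isna().sum())
```

With limit=1 only the row at index 2 is filled; with limit=2 the rows at index 2 and 3 are filled, and everything after that stays NaN.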

Renaming

The rename() method lets you relabel an axis based on a mapping (a dictionary or a Series) or an arbitrary function.
Consider the following example:

 import pandas as pd
 import numpy as np
 df1 = pd.DataFrame(np.random.randn(6,3),columns=['col1','col2','col3'])
 print(df1)
 print("After renaming rows and columns:")
 # Labels that are not named in the mappings are left unchanged
 print(df1.rename(columns={'col1' : 'c1', 'col2' : 'c2'},
                  index={0 : 'apple', 1 : 'banana', 2 : 'durian'}))

Running Results:

         col1        col2        col3
0    0.486791    0.105759    1.540122
1   -0.990237    1.007885   -0.217896
2   -0.483855   -1.645027   -1.194113
3   -0.122316    0.566277   -0.366028
4   -0.231524   -0.721172   -0.112007
5    0.438810    0.000225    0.435479
After renaming rows and columns:
                c1          c2        col3
apple     0.486791    0.105759    1.540122
banana   -0.990237    1.007885   -0.217896
durian   -0.483855   -1.645027   -1.194113
3        -0.122316    0.566277   -0.366028
4        -0.231524   -0.721172   -0.112007
5         0.438810    0.000225    0.435479
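
The example above uses dictionaries as the mapping. As a minimal sketch of the function form of rename(), the following passes a callable for each axis. The frame, its values, and the "row" prefix are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# A function is applied to every label on the given axis.
renamed = df.rename(columns=str.upper, index=lambda i: f"row{i}")
print(renamed)

# rename returns a new object by default; the original labels are untouched.
print(list(df.columns))
```

Unlike a dictionary, which changes only the labels it names, a function is applied to every label on the axis.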