
Sorting in Pandas

There are two ways to sort in Pandas:

- Sort by label
- Sort by actual value

Let's look at the following example.

 import pandas as pd
 import numpy as np

 # 10 rows of random numbers, with shuffled row labels and columns in reverse order
 unsorted_df = pd.DataFrame(np.random.randn(10, 2),
                            index=[1, 4, 6, 2, 3, 5, 9, 8, 0, 7],
                            columns=['col2', 'col1'])
 print(unsorted_df)

Running Result:

    col2       col1
1  -2.063177   0.537527
4   0.142932  -0.684884
6   0.012667  -0.389340
2  -0.548797   1.848743
3  -1.044160   0.837381
5   0.385605   1.300185
9   1.031425  -1.002967
8  -0.407374  -0.435142
0   2.237453  -1.067139
7  -1.445831  -1.701035

In unsorted_df, neither the labels nor the values are sorted. (The values come from np.random.randn, so they change on every run.) Let's see how to sort them.
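You can confirm that the labels are out of order by checking the is_monotonic_increasing property of the row index and of the column index. This sketch also seeds NumPy's generator with np.random.default_rng so the data is identical on every run; the seed value 0 is an arbitrary assumption, not part of the original example.

```python
import numpy as np
import pandas as pd

# Seeded generator (seed 0 is arbitrary) so every run produces the same data
rng = np.random.default_rng(0)
unsorted_df = pd.DataFrame(rng.standard_normal((10, 2)),
                           index=[1, 4, 6, 2, 3, 5, 9, 8, 0, 7],
                           columns=['col2', 'col1'])

# Neither the row labels nor the column labels are in ascending order
print(unsorted_df.index.is_monotonic_increasing)    # False
print(unsorted_df.columns.is_monotonic_increasing)  # False
```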

Sort by label

The sort_index() method sorts a DataFrame by its labels. The axis parameter chooses between row and column labels, and the ascending parameter sets the order. By default, it sorts the row labels in ascending order.

 import pandas as pd
 import numpy as np

 unsorted_df = pd.DataFrame(np.random.randn(10, 2),
                            index=[1, 4, 6, 2, 3, 5, 9, 8, 0, 7],
                            columns=['col2', 'col1'])
 # Defaults: sort by row label (axis=0) in ascending order
 sorted_df = unsorted_df.sort_index()
 print(sorted_df)

Running Result:

     col2        col1
0    0.668444   -0.061294
1    0.584890   -0.291048
2   -1.222295    0.166609
3   -0.157915   -0.388659
4    1.302561    0.851385
5   -1.289321   -1.551250
6   -0.202951    0.954300
7   -0.581378    0.622958
8   -1.699509    0.510373
9    0.825697    0.374463
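With fixed data the result is easier to check. In this sketch (the small DataFrame is made up for illustration), sort_index() returns a new DataFrame ordered by row label and leaves the original untouched.

```python
import pandas as pd

# Small illustrative DataFrame with shuffled integer row labels
df = pd.DataFrame({'col2': [10, 40, 20, 30]}, index=[1, 4, 2, 3])

sorted_df = df.sort_index()  # ascending by row label (axis=0)

print(list(sorted_df.index))    # [1, 2, 3, 4]
print(list(sorted_df['col2']))  # [10, 20, 30, 40]
print(list(df.index))           # [1, 4, 2, 3]  (the original is unchanged)
```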

Sorting order

Passing a boolean to the ascending parameter controls the sort order: True (the default) sorts in ascending order and False in descending order. The following example sorts the row labels in descending order.

 import pandas as pd
 import numpy as np

 unsorted_df = pd.DataFrame(np.random.randn(10, 2),
                            index=[1, 4, 6, 2, 3, 5, 9, 8, 0, 7],
                            columns=['col2', 'col1'])
 # ascending=False reverses the order of the row labels
 sorted_df = unsorted_df.sort_index(ascending=False)
 print(sorted_df)

Running Result:

     col2        col1
9    0.825697    0.374463
8   -1.699509    0.510373
7   -0.581378    0.622958
6   -0.202951    0.954300
5   -1.289321   -1.551250
4    1.302561    0.851385
3   -0.157915   -0.388659
2   -1.222295    0.166609
1    0.584890   -0.291048
0    0.668444   -0.061294
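When the labels are unique, sorting with ascending=False gives the same result as an ascending sort followed by reversing the rows. A quick check on a made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'value': [0.5, -1.2, 3.3, 0.0]}, index=[2, 0, 3, 1])

desc = df.sort_index(ascending=False)
print(list(desc.index))  # [3, 2, 1, 0]

# Same result as sorting ascending and then reversing the row order
print(desc.equals(df.sort_index().iloc[::-1]))  # True
```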

Sort by column

The axis parameter selects which labels to sort: axis=0 (the default) sorts by the row labels, and axis=1 sorts by the column labels. The following example sorts the columns.

 import pandas as pd
 import numpy as np

 unsorted_df = pd.DataFrame(np.random.randn(10, 2),
                            index=[1, 4, 6, 2, 3, 5, 9, 8, 0, 7],
                            columns=['col2', 'col1'])

 # axis=1 sorts the column labels; the row order is left as it was
 sorted_df = unsorted_df.sort_index(axis=1)
 print(sorted_df)

Running Result:

     col1        col2
1   -0.291048    0.584890
4    0.851385    1.302561
6    0.954300   -0.202951
2    0.166609   -1.222295
3   -0.388659   -0.157915
5   -1.551250   -1.289321
9    0.374463    0.825697
8    0.510373   -1.699509
0   -0.061294    0.668444
7    0.622958   -0.581378
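The axis parameter also accepts the names 'index' and 'columns', and it combines with ascending. In this sketch the column names b, c and a are hypothetical; note that each column's values move with its label.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3]], columns=['b', 'c', 'a'])

# axis='columns' is equivalent to axis=1
by_name = df.sort_index(axis='columns')
print(list(by_name.columns))    # ['a', 'b', 'c']
print(by_name.iloc[0].tolist())  # [3, 1, 2]  (values follow their columns)

# Column labels in descending order
desc = df.sort_index(axis=1, ascending=False)
print(list(desc.columns))  # ['c', 'b', 'a']
```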

Sorting by Value

Just as sort_index() sorts by label, sort_values() sorts by value. Its by parameter takes the name of the column whose values determine the order of the rows.

 import pandas as pd

 unsorted_df = pd.DataFrame({'col1': [2, 1, 1, 1], 'col2': [1, 3, 2, 4]})
 # Sort the rows by the values in col1
 sorted_df = unsorted_df.sort_values(by='col1')
 print(sorted_df)

Running Result:

   col1  col2
1    1    3
2    1    2
3    1    4
0    2    1

Note that the col1 values are now sorted, and each row's col2 value and index label move together with its col1 value. That is why col2 and the index do not look sorted.
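One way to see that rows stay intact is to compare the set of (index, col1, col2) tuples before and after sorting: only the order of the rows changes, never their contents.

```python
import pandas as pd

unsorted_df = pd.DataFrame({'col1': [2, 1, 1, 1], 'col2': [1, 3, 2, 4]})
sorted_df = unsorted_df.sort_values(by='col1')

# itertuples(name=None) yields plain tuples: (index label, col1, col2)
before = set(unsorted_df.itertuples(name=None))
after = set(sorted_df.itertuples(name=None))

print(before == after)                            # True
print(sorted_df['col1'].is_monotonic_increasing)  # True
```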

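The by parameter also accepts a list of column names. Rows are sorted by the first column, and rows that tie on it are then sorted by the next column.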

 import pandas as pd

 unsorted_df = pd.DataFrame({'col1': [2, 1, 1, 1], 'col2': [1, 3, 2, 4]})
 # Sort by col1 first, then break ties using col2
 sorted_df = unsorted_df.sort_values(by=['col1', 'col2'])
 print(sorted_df)

Running Result:

  col1 col2
2   1   2
1   1   3
3   1   4
0   2   1
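When by is a list, ascending can also be a list of the same length, giving each column its own direction. This sketch sorts col1 in ascending order and breaks ties on col2 in descending order.

```python
import pandas as pd

unsorted_df = pd.DataFrame({'col1': [2, 1, 1, 1], 'col2': [1, 3, 2, 4]})

# col1 ascending; rows tied on col1 are ordered by col2 descending
sorted_df = unsorted_df.sort_values(by=['col1', 'col2'], ascending=[True, False])

print(list(sorted_df.index))   # [3, 1, 2, 0]
print(list(sorted_df['col2']))  # [4, 3, 2, 1]
```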

Sorting Algorithms

sort_values() lets you choose the sorting algorithm through its kind parameter: 'quicksort' (the default), 'mergesort', or 'heapsort'.

 import pandas as pd

 unsorted_df = pd.DataFrame({'col1': [2, 1, 1, 1], 'col2': [1, 3, 2, 4]})
 # kind selects the sorting algorithm used for the single sort column
 sorted_df = unsorted_df.sort_values(by='col1', kind='mergesort')
 print(sorted_df)

Running Result:

  col1 col2
1    1    3
2    1    2
3    1    4
0    2    1
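The practical difference between the algorithms is stability: 'mergesort' (and 'stable', which newer pandas versions also accept) keeps rows with equal keys in their original order, while 'quicksort' and 'heapsort' make no such guarantee. According to the pandas documentation, kind only takes effect when sorting on a single column. This sketch checks that the rows tied at col1 == 1 keep their original order 1, 2, 3 under mergesort.

```python
import pandas as pd

unsorted_df = pd.DataFrame({'col1': [2, 1, 1, 1], 'col2': [1, 3, 2, 4]})

# mergesort is stable: rows that tie on col1 keep their original relative order
stable = unsorted_df.sort_values(by='col1', kind='mergesort')

print(list(stable.index))    # [1, 2, 3, 0]
print(list(stable['col2']))  # [3, 2, 4, 1]
```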