Pandas join operation example

Pandas has a comprehensive set of high-performance, in-memory join operations that closely resemble the joins of SQL-based relational databases.
Pandas provides a single function, merge, as the entry point for all standard database join operations between DataFrame objects:

    pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None,
             left_index=False, right_index=False, sort=False)

Here, we use the following parameters:

left − A DataFrame object.
right − Another DataFrame object.
on − The column name(s) to join on. They must be found in both the left and right DataFrame objects.
left_on − The columns of the left DataFrame to use as keys. They can be column names or arrays of length equal to the length of the DataFrame.
right_on − The columns of the right DataFrame to use as keys. They can be column names or arrays of length equal to the length of the DataFrame.
left_index − If True, the index (row labels) of the left DataFrame is used as its join key. If the DataFrame has a MultiIndex (hierarchical), the number of levels must match the number of join keys in the right DataFrame.
right_index − The same usage as left_index, for the right DataFrame.
how − One of 'left', 'right', 'outer', 'inner'. The default is 'inner'. Each method is described below.
sort − Sort the result DataFrame by the join keys in lexicographical order. In current pandas releases the default is False; leaving sorting off in many cases greatly improves performance.
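The parameter list above can be made concrete with a minimal sketch. It assumes two small made-up frames, students and scores, whose key columns have different names (student_id and sid are hypothetical, not part of the tutorial's data), and shows left_on/right_on as well as right_index.

```python
import pandas as pd

# Hypothetical frames: the key column has a different name on each side.
students = pd.DataFrame({'student_id': [1, 2, 3],
                         'Name': ['Alex', 'Amy', 'Allen']})
scores = pd.DataFrame({'sid': [2, 3, 4],
                       'score': [88, 92, 75]})

# left_on / right_on name the key column of each side separately.
by_columns = pd.merge(students, scores, left_on='student_id', right_on='sid')

# right_index=True uses the row labels of the right frame as its join key.
scores_indexed = scores.set_index('sid')
by_index = pd.merge(students, scores_indexed,
                    left_on='student_id', right_index=True)

print(by_columns)
print(by_index)
```

Both calls use the default how='inner', so only the students with ids 2 and 3 appear in the results.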

Now let's create two different DataFrames and perform merge operations on them.

Example

    # import the pandas library
    import pandas as pd

    # two DataFrames that share the columns id, Name and subject_id
    left = pd.DataFrame({
        'id': [1, 2, 3, 4, 5],
        'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
        'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})
    right = pd.DataFrame({
        'id': [1, 2, 3, 4, 5],
        'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
        'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})
    print(left)
    print(right)

The running result is as follows (column order, and for some join types row order, can differ between pandas versions):

     Name  id  subject_id
0    Alex   1        sub1
1     Amy   2        sub2
2   Allen   3        sub4
3   Alice   4        sub6
4  Ayoung   5        sub5
    Name  id  subject_id
0  Billy   1        sub2
1  Brian   2        sub4
2   Bran   3        sub3
3  Bryce   4        sub6
4  Betty   5        sub5

Merging two dataframes on a single key

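Example

    import pandas as pd
    left = pd.DataFrame({
        'id': [1, 2, 3, 4, 5],
        'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
        'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})
    right = pd.DataFrame({
        'id': [1, 2, 3, 4, 5],
        'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
        'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})
    # join on the shared 'id' column; other shared columns get _x/_y suffixes
    print(pd.merge(left, right, on='id'))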

The running result is as follows:

   Name_x  id  subject_id_x  Name_y  subject_id_y
0    Alex   1          sub1   Billy          sub2
1     Amy   2          sub2   Brian          sub4
2   Allen   3          sub4    Bran          sub3
3   Alice   4          sub6   Bryce          sub6
4  Ayoung   5          sub5   Betty          sub5
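The _x and _y suffixes in the result above are the defaults that merge applies to non-key columns present in both frames. As a small sketch, the suffixes parameter of pd.merge lets you choose more descriptive ones; the names '_left' and '_right' and the two-row frames here are only illustrative.

```python
import pandas as pd

# Shortened, illustrative versions of the frames used in this tutorial.
left = pd.DataFrame({'id': [1, 2], 'Name': ['Alex', 'Amy']})
right = pd.DataFrame({'id': [1, 2], 'Name': ['Billy', 'Brian']})

# suffixes replaces the default ('_x', '_y') pair for overlapping columns.
merged = pd.merge(left, right, on='id', suffixes=('_left', '_right'))
print(merged.columns.tolist())
```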

Merging two dataframes on multiple keys

Example

    import pandas as pd
    left = pd.DataFrame({
        'id': [1, 2, 3, 4, 5],
        'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
        'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})
    right = pd.DataFrame({
        'id': [1, 2, 3, 4, 5],
        'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
        'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})
    # a row matches only if both 'id' and 'subject_id' agree
    print(pd.merge(left, right, on=['id', 'subject_id']))

The running result is as follows:

   Name_x  id  subject_id  Name_y
0   Alice   4        sub6   Bryce
1  Ayoung   5        sub5   Betty

Merging using the 'how' parameter

The 'how' parameter of merge specifies which keys are included in the result table. If a key combination does not appear in the left or the right table, the corresponding values in the joined table are NA.

Here is a summary of the 'how' options and their SQL equivalents:

Merge Method	SQL Equivalent	Description
left	LEFT OUTER JOIN	Use keys from the left object
right	RIGHT OUTER JOIN	Use keys from the right object
outer	FULL OUTER JOIN	Use the union of keys
inner	INNER JOIN	Use the intersection of keys
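To make the table above concrete, here is a minimal sketch that counts the rows each join type returns. It assumes two hypothetical two-row frames that overlap on a single key. It also uses the indicator parameter of pd.merge, which this tutorial does not otherwise cover, to show which side each key came from.

```python
import pandas as pd

# Hypothetical two-row frames that overlap only on 'sub2'.
left = pd.DataFrame({'subject_id': ['sub1', 'sub2'], 'Name': ['Alex', 'Amy']})
right = pd.DataFrame({'subject_id': ['sub2', 'sub3'], 'Name': ['Billy', 'Bran']})

# Number of result rows for each join type.
row_counts = {how: len(pd.merge(left, right, on='subject_id', how=how))
              for how in ('left', 'right', 'outer', 'inner')}
print(row_counts)

# indicator=True adds a _merge column recording where each key was found.
outer = pd.merge(left, right, on='subject_id', how='outer', indicator=True)
origin = dict(zip(outer['subject_id'], outer['_merge'].astype(str)))
print(origin)
```

The outer join is the only one that returns all three keys; the inner join returns just the shared key 'sub2'.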

Left Join

Example

    import pandas as pd
    left = pd.DataFrame({
        'id': [1, 2, 3, 4, 5],
        'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
        'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})
    right = pd.DataFrame({
        'id': [1, 2, 3, 4, 5],
        'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
        'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})
    # keep every row of left; unmatched right-hand columns become NaN
    print(pd.merge(left, right, on='subject_id', how='left'))

The running result is as follows:

   Name_x  id_x  subject_id  Name_y  id_y
0    Alex     1        sub1     NaN   NaN
1     Amy     2        sub2   Billy   1.0
2   Allen     3        sub4   Brian   2.0
3   Alice     4        sub6   Bryce   4.0
4  Ayoung     5        sub5   Betty   5.0

Right Join

Example

    import pandas as pd
    left = pd.DataFrame({
        'id': [1, 2, 3, 4, 5],
        'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
        'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})
    right = pd.DataFrame({
        'id': [1, 2, 3, 4, 5],
        'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
        'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})
    # keep every row of right; unmatched left-hand columns become NaN
    print(pd.merge(left, right, on='subject_id', how='right'))

The running result is as follows:

   Name_x  id_x  subject_id  Name_y  id_y
0     Amy   2.0        sub2   Billy     1
1   Allen   3.0        sub4   Brian     2
2   Alice   4.0        sub6   Bryce     4
3  Ayoung   5.0        sub5   Betty     5
4     NaN   NaN        sub3    Bran     3

Outer Join

Example

    import pandas as pd
    left = pd.DataFrame({
        'id': [1, 2, 3, 4, 5],
        'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
        'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})
    right = pd.DataFrame({
        'id': [1, 2, 3, 4, 5],
        'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
        'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})
    # keep the union of keys from both frames
    print(pd.merge(left, right, how='outer', on='subject_id'))

The running result is as follows:

   Name_x  id_x  subject_id  Name_y  id_y
0    Alex   1.0        sub1     NaN   NaN
1     Amy   2.0        sub2   Billy   1.0
2   Allen   3.0        sub4   Brian   2.0
3   Alice   4.0        sub6   Bryce   4.0
4  Ayoung   5.0        sub5   Betty   5.0
5     NaN   NaN        sub3    Bran   3.0

Inner Join

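An inner join keeps only the keys that appear in both DataFrames. Note that the related DataFrame.join method joins on the index and is called on one of the two objects, which it treats as the left side. Therefore, a.join(b) is not equal to b.join(a).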

Example

    import pandas as pd
    left = pd.DataFrame({
        'id': [1, 2, 3, 4, 5],
        'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
        'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})
    right = pd.DataFrame({
        'id': [1, 2, 3, 4, 5],
        'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
        'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})
    # keep only the keys present in both frames
    print(pd.merge(left, right, on='subject_id', how='inner'))

The running result is as follows:

   Name_x  id_x  subject_id  Name_y  id_y
0     Amy     2        sub2   Billy     1
1   Allen     3        sub4   Brian     2
2   Alice     4        sub6   Bryce     4
3  Ayoung     5        sub5   Betty     5
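The remark earlier in this section that a.join(b) is not equal to b.join(a) can be sketched as follows. The frames a and b, their column names, and their index labels are hypothetical. The sketch relies on DataFrame.join defaulting to a left join on the index.

```python
import pandas as pd

# Hypothetical frames whose indexes overlap only on 'k2'.
a = pd.DataFrame({'x': [1, 2, 3]}, index=['k1', 'k2', 'k3'])
b = pd.DataFrame({'y': [10, 20]}, index=['k2', 'k4'])

# join is a left join on the index by default, so the caller sets the rows.
a_join_b = a.join(b)      # keeps the index of a: k1, k2, k3
b_join_a = b.join(a)      # keeps the index of b: k2, k4

# An inner join on the index keeps only the shared label.
inner = a.join(b, how='inner')

print(a_join_b)
print(b_join_a)
print(inner)
```

Because the calling object supplies the rows to keep, swapping a and b changes both the set of rows and the column order of the result.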
