SQL Operations in Pandas

Specific examples of Pandas SQL operations

Since many potential Pandas users are familiar with SQL, this page aims to provide some examples to illustrate how to perform various SQL operations using Pandas.

Example

　import pandas as pd
　url = 'https://raw.github.com/pandasdev/
　pandas/master/pandas/tests/data/tips.csv
　tips=pd.read_csv(url)
　print(tips.head())

The running result is as follows:

　　　　　　total_bill    tip    sex    smoker    day    time    size
0　　　　　　　　16.99　　1.01　　　Female    No    Sun    Dinner　　　　　　2
1　　　　　　　　10.34　　1.66　　　　　Male    No    Sun    Dinner　　　　　　3
2　　　　　　　　21.01　　3.50    Male    No    Sun    Dinner　　　　　　3
3　　　　　　　　23.68　　3.31　　　　　Male    No    Sun    Dinner　　　　　　2
4　　　　　　　　24.59　　3.61　　　Female    No    Sun    Dinner　　　　　　4

query

In SQL, selection is done using a comma-separated list of the columns you have selected (or using*to select all columns):

　SELECT total_bill, tip, smoker, time
　from tips
　LIMIT　5;

Column selection can be completed by passing a list of column names to DataFrame using Pandas:

　tips[['total_bill', 'tip', 'smoker', 'time']].head(5)

Let's look at a complete example:

Example

　import pandas as pd
　url = 'https://raw.github.com/pandasdev/
　pandas/master/pandas/tests/data/tips.csv
　　
　tips=pd.read_csv(url)
　print(tips[['total_bill', 'tip', 'smoker', 'time']].head(5)

The running result is as follows:

　　　total_bill    tip    smoker    time
0　　　　　　　16.99　　1.01　　　　　　No    Dinner
1　　　　　　　10.34　　1.66　　　　　　No    Dinner
2　　　　　　　21.01　　3.50    No    Dinner
3　　　　　　　23.68　　3.31　　　　　　No    Dinner
4　　　　　　　24.59　　3.61　　　　　　No    Dinner

Calling a DataFrame without a column name list will display all columns (similar to the comma-separated list of columns in SQL's*).

WHERE clause query

Filtering in SQL is done through the WHERE clause.

　SELECT　*　from tips WHERE time = 'Dinner' LIMIT　5;

DataFrame can be filtered in many ways. The most intuitive method is to use boolean indexing.

　tips[tips['time'] == 'Dinner'].head(5)

Let's look at a complete example

Example

　import pandas as pd
　url = 'https://raw.github.com/pandasdev/
　pandas/master/pandas/tests/data/tips.csv
　tips=pd.read_csv(url)
　print(tips[tips['time'] == 'Dinner']).head(5)

The running result is as follows:

　　　　　total_bill    tip    sex    smoker    day    time    size
0　　　　　　　16.99　　1.01　　　Female    No    Sun    Dinner　　　　2
1　　　　　　　10.34　　1.66　　　　　Male    No    Sun    Dinner　　　　3
2　　　　　　　21.01　　3.50    Male    No    Sun    Dinner　　　　3
3　　　　　　　23.68　　3.31　　　　　Male    No    Sun    Dinner　　　　2
4　　　　　　　24.59　　3.61　　　Female    No    Sun    Dinner　　　　4

The above statement will generate a series of True / Pass a False object to DataFrame and return all rows with True.

GroupBy grouping

This operation retrieves the number of records in each group of the entire dataset. For example, query the gender grouping and count:

　SELECT sex, count(*)
　from tips
　GROUP BY sex;

The operation in Pandas is as follows:

　tips.groupby('sex').size()

Let's look at a complete example

Example

　import pandas as pd
　url = 'https://raw.github.com/pandasdev/
　pandas/master/pandas/tests/data/tips.csv
　tips=pd.read_csv(url)
　print tips.groupby('sex').size()

The running result is as follows:

　sex
　Female　87
　Male　157
　dtype: int64

Query the number of rows

SQL uses LIMIT to return N rows:

　SELECT　*　from tips
　LIMIT　5　;

Operations in Pandas are as follows:

　tips.head(5)

Let's look at a complete example

Example

　import pandas as pd
　url = 'https://raw.github.com/pandas-dev/pandas/master/pandas/tests/data/tips.csv
　tips=pd.read_csv(url)
　tips = tips[['smoker', 'day', 'time']].head(5)
　print tips

The running result is as follows:

　　　　　smoker day time
0 No Sun Dinner
1　　　　　　No Sun Dinner
2　　　　　　No Sun Dinner
3　　　　　　No Sun Dinner
4　　　　　　No Sun Dinner

Points to Note in Pandas

Pandas Tutorial

SQL Operations in Pandas

query

WHERE clause query

GroupBy grouping

Query the number of rows