English | 简体中文 | 繁體中文 | Русский язык | Français | Español | Português | Deutsch | 日本語 | 한국어 | Italiano | بالعربية
Specific examples of Pandas SQL operations
Since many potential Pandas users are familiar with SQL, this page aims to provide some examples to illustrate how to perform various SQL operations using Pandas.
import pandas as pd url = 'https://raw.github.com/pandasdev/ pandas/master/pandas/tests/data/tips.csv tips=pd.read_csv(url) print(tips.head())
The running result is as follows:
total_bill tip sex smoker day time size 0 16.99 1.01 Female No Sun Dinner 2 1 10.34 1.66 Male No Sun Dinner 3 2 21.01 3.50 Male No Sun Dinner 3 3 23.68 3.31 Male No Sun Dinner 2 4 24.59 3.61 Female No Sun Dinner 4
In SQL, selection is done using a comma-separated list of the columns you have selected (or using*to select all columns):
SELECT total_bill, tip, smoker, time from tips LIMIT 5;
Column selection can be completed by passing a list of column names to DataFrame using Pandas:
tips[['total_bill', 'tip', 'smoker', 'time']].head(5)
Let's look at a complete example:
import pandas as pd url = 'https://raw.github.com/pandasdev/ pandas/master/pandas/tests/data/tips.csv tips=pd.read_csv(url) print(tips[['total_bill', 'tip', 'smoker', 'time']].head(5)
The running result is as follows:
total_bill tip smoker time 0 16.99 1.01 No Dinner 1 10.34 1.66 No Dinner 2 21.01 3.50 No Dinner 3 23.68 3.31 No Dinner 4 24.59 3.61 No Dinner
Calling a DataFrame without a column name list will display all columns (similar to the comma-separated list of columns in SQL's*).
Filtering in SQL is done through the WHERE clause.
SELECT * from tips WHERE time = 'Dinner' LIMIT 5;
DataFrame can be filtered in many ways. The most intuitive method is to use boolean indexing.
tips[tips['time'] == 'Dinner'].head(5)
Let's look at a complete example
import pandas as pd url = 'https://raw.github.com/pandasdev/ pandas/master/pandas/tests/data/tips.csv tips=pd.read_csv(url) print(tips[tips['time'] == 'Dinner']).head(5)
The running result is as follows:
total_bill tip sex smoker day time size 0 16.99 1.01 Female No Sun Dinner 2 1 10.34 1.66 Male No Sun Dinner 3 2 21.01 3.50 Male No Sun Dinner 3 3 23.68 3.31 Male No Sun Dinner 2 4 24.59 3.61 Female No Sun Dinner 4
The above statement will generate a series of True / Pass a False object to DataFrame and return all rows with True.
This operation retrieves the number of records in each group of the entire dataset. For example, query the gender grouping and count:
SELECT sex, count(*) from tips GROUP BY sex;
The operation in Pandas is as follows:
tips.groupby('sex').size()
Let's look at a complete example
import pandas as pd url = 'https://raw.github.com/pandasdev/ pandas/master/pandas/tests/data/tips.csv tips=pd.read_csv(url) print tips.groupby('sex').size()
The running result is as follows:
sex Female 87 Male 157 dtype: int64
SQL uses LIMIT to return N rows:
SELECT * from tips LIMIT 5 ;
Operations in Pandas are as follows:
tips.head(5)
Let's look at a complete example
import pandas as pd url = 'https://raw.github.com/pandas-dev/pandas/master/pandas/tests/data/tips.csv tips=pd.read_csv(url) tips = tips[['smoker', 'day', 'time']].head(5) print tips
The running result is as follows:
smoker day time 0 No Sun Dinner 1 No Sun Dinner 2 No Sun Dinner 3 No Sun Dinner 4 No Sun Dinner