
IO Operations in Pandas

Pandas IO operation example

The two main functions for reading text files in Pandas are read_csv() and read_table(). Both use the same parsing code to convert tabular data into a DataFrame object:

    pandas.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer',
        names=None, index_col=None, usecols=None, ...)

    pandas.read_table(filepath_or_buffer, sep='\t', delimiter=None, header='infer',
        names=None, index_col=None, usecols=None, ...)
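The two signatures differ mainly in the default separator. The following minimal sketch shows this by reading a small tab-separated sample held in memory with both functions. The sample text and the use of io.StringIO in place of a real file are assumptions for illustration.

```python
import io

import pandas as pd

# A small tab-separated sample held in memory instead of a file on disk.
tsv_text = "S.No\tName\tAge\n1\tTom\t28\n2\tLee\t32\n"

# read_table defaults to sep='\t'; read_csv needs the separator spelled out.
df_table = pd.read_table(io.StringIO(tsv_text))
df_csv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(df_table.equals(df_csv))  # True
print(df_table)
```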

Save this data as temp.csv and perform operations on it.

    S.No,Name,Age,City,Salary
    1,Tom,28,Toronto,20000
    2,Lee,32,HongKong,3000
    3,Steven,43,Bay Area,8300
    4,Ram,38,Hyderabad,3900
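If you prefer not to create the file by hand, a sketch like the one below writes the same rows to temp.csv in the current working directory and reads them back. Using pathlib for the write step is simply one convenient choice.

```python
from pathlib import Path

import pandas as pd

# The sample rows from above, written out verbatim so later examples can read them.
csv_text = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
"""

Path("temp.csv").write_text(csv_text)

df = pd.read_csv("temp.csv")
print(df.shape)  # (4, 5)
```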

read_csv

read_csv reads data from a CSV file and creates a DataFrame object.

Example

    import pandas as pd

    df = pd.read_csv("temp.csv")
    print(df)

The running results are as follows:

   S.No    Name  Age       City  Salary
0     1     Tom   28    Toronto   20000
1     2     Lee   32   HongKong    3000
2     3  Steven   43   Bay Area    8300
3     4     Ram   38  Hyderabad    3900
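The filepath_or_buffer argument also accepts a file-like object. The minimal sketch below feeds the same rows through io.StringIO in place of temp.csv, so it runs without a file. It confirms that the first line becomes the column labels and that a default 0-based index is added.

```python
import io

import pandas as pd

csv_text = (
    "S.No,Name,Age,City,Salary\n"
    "1,Tom,28,Toronto,20000\n"
    "2,Lee,32,HongKong,3000\n"
    "3,Steven,43,Bay Area,8300\n"
    "4,Ram,38,Hyderabad,3900\n"
)

# filepath_or_buffer accepts any file-like object, not only a path on disk.
df = pd.read_csv(io.StringIO(csv_text))

# The first line became the column labels; a default 0-based index was added.
print(list(df.columns))  # ['S.No', 'Name', 'Age', 'City', 'Salary']
print(df["Salary"].sum())  # 35200
```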

Custom Indexing

Use index_col to specify a column in the CSV file to use as the index, instead of the default 0-based index.

Example

    import pandas as pd

    df = pd.read_csv("temp.csv", index_col=['S.No'])
    print(df)

The running results are as follows:

        Name  Age       City  Salary
S.No
1        Tom   28    Toronto   20000
2        Lee   32   HongKong    3000
3     Steven   43   Bay Area    8300
4        Ram   38  Hyderabad    3900
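With S.No as the index, rows are looked up by their S.No label. The small sketch below uses .loc for this. The in-memory io.StringIO buffer is an assumption that keeps the snippet self-contained.

```python
import io

import pandas as pd

csv_text = (
    "S.No,Name,Age,City,Salary\n"
    "1,Tom,28,Toronto,20000\n"
    "2,Lee,32,HongKong,3000\n"
    "3,Steven,43,Bay Area,8300\n"
    "4,Ram,38,Hyderabad,3900\n"
)

df = pd.read_csv(io.StringIO(csv_text), index_col="S.No")

# Rows are now addressed by their S.No label rather than by position.
print(df.loc[3, "Name"])  # Steven
print(df.index.name)      # S.No
```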

Converter

Column dtypes can be passed as a dict to the dtype parameter, mapping column names to types.

Example

    import numpy as np
    import pandas as pd

    df = pd.read_csv("temp.csv", dtype={'Salary': np.float64})
    print(df.dtypes)

The running results are as follows:

S.No        int64
Name       object
Age         int64
City       object
Salary    float64
dtype: object

By default, pandas would infer the Salary column as int64. It is float64 here because we explicitly requested that type. Printing the DataFrame with print(df) therefore shows the salaries as floats:

   S.No    Name  Age       City   Salary
0     1     Tom   28    Toronto  20000.0
1     2     Lee   32   HongKong   3000.0
2     3  Steven   43   Bay Area   8300.0
3     4     Ram   38  Hyderabad   3900.0
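Besides dtype, read_csv has a converters parameter. It maps column names to functions that are applied to each raw value while parsing. The sketch below uses it to upper-case the City column and turn Salary into floats. The choice of functions is purely illustrative, and io.StringIO stands in for temp.csv.

```python
import io

import pandas as pd

csv_text = (
    "S.No,Name,Age,City,Salary\n"
    "1,Tom,28,Toronto,20000\n"
    "2,Lee,32,HongKong,3000\n"
    "3,Steven,43,Bay Area,8300\n"
    "4,Ram,38,Hyderabad,3900\n"
)

# Each converter receives the raw string from the file for every row.
df = pd.read_csv(
    io.StringIO(csv_text),
    converters={"City": str.upper, "Salary": float},
)

print(df["City"].tolist())    # ['TORONTO', 'HONGKONG', 'BAY AREA', 'HYDERABAD']
print(df["Salary"].tolist())  # [20000.0, 3000.0, 8300.0, 3900.0]
```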

Header Names

Use the names parameter to specify the header names.

Example

    import pandas as pd

    df = pd.read_csv("temp.csv", names=['a', 'b', 'c', 'd', 'e'])
    print(df)

The running results are as follows:

      a       b    c          d       e
0  S.No    Name  Age       City  Salary
1     1     Tom   28    Toronto   20000
2     2     Lee   32   HongKong    3000
3     3  Steven   43   Bay Area    8300
4     4     Ram   38  Hyderabad    3900
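Because the original header line is read as data here, each column mixes a label with numbers. As a result, the numeric values are kept as text. The following sketch checks this on an in-memory copy of the sample, which is assumed in place of temp.csv.

```python
import io

import pandas as pd

csv_text = (
    "S.No,Name,Age,City,Salary\n"
    "1,Tom,28,Toronto,20000\n"
    "2,Lee,32,HongKong,3000\n"
    "3,Steven,43,Bay Area,8300\n"
    "4,Ram,38,Hyderabad,3900\n"
)

# names without header: every line, including the file's header line, is data.
df = pd.read_csv(io.StringIO(csv_text), names=["a", "b", "c", "d", "e"])

print(len(df))          # 5
print(df.loc[0, "a"])   # S.No
# "1" is a string, not an int, because column a also holds the text "S.No".
print(repr(df.loc[1, "a"]))  # '1'
```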

Note that the custom names become the column labels, but the header row from the file has not been removed: it is read as the first data row. To replace the file's header with the custom names, use the header parameter.

If the header is not in the first row, pass its row number to header; the rows above it are skipped.

Example

    import pandas as pd

    df = pd.read_csv("temp.csv", names=['a', 'b', 'c', 'd', 'e'], header=0)
    print(df)

The running results are as follows:

   a       b   c          d      e
0  1     Tom  28    Toronto  20000
1  2     Lee  32   HongKong   3000
2  3  Steven  43   Bay Area   8300
3  4     Ram  38  Hyderabad   3900
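To illustrate a header that is not on the first line, the sketch below reads a hypothetical export with a title line above the real column labels. The title text "Employee report" is invented for the example. Passing header=1 takes the second line as the header and discards the line above it.

```python
import io

import pandas as pd

# Hypothetical file: a title line sits above the real header row.
csv_text = (
    "Employee report\n"
    "S.No,Name,Age,City,Salary\n"
    "1,Tom,28,Toronto,20000\n"
    "2,Lee,32,HongKong,3000\n"
)

# header=1 uses line 1 (0-based) as column labels; line 0 is skipped.
df = pd.read_csv(io.StringIO(csv_text), header=1)

print(list(df.columns))  # ['S.No', 'Name', 'Age', 'City', 'Salary']
print(len(df))           # 2
```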

skiprows

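skiprows skips the given number of lines at the start of the file. The header line counts, so skiprows=2 drops the header and the first data row, and the next line (Lee's row) becomes the header.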

Example

    import pandas as pd

    df = pd.read_csv("temp.csv", skiprows=2)
    print(df)

The running results are as follows:

   2     Lee  32   HongKong  3000
0  3  Steven  43   Bay Area  8300
1  4     Ram  38  Hyderabad  3900
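skiprows also accepts a list of 0-based line numbers or a callable. Either form lets you keep the header while dropping specific data rows. The sketch below shows both on an in-memory copy of the sample; the particular rows skipped are arbitrary choices for illustration.

```python
import io

import pandas as pd

csv_text = (
    "S.No,Name,Age,City,Salary\n"
    "1,Tom,28,Toronto,20000\n"
    "2,Lee,32,HongKong,3000\n"
    "3,Steven,43,Bay Area,8300\n"
    "4,Ram,38,Hyderabad,3900\n"
)

# A list skips exactly those lines (line 0 is the header, so it is kept).
df = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 2])
print(df["Name"].tolist())  # ['Steven', 'Ram']

# A callable gets each line number and returns True for lines to skip:
# here, every even-numbered line after the header.
df_odd = pd.read_csv(io.StringIO(csv_text), skiprows=lambda i: i > 0 and i % 2 == 0)
print(df_odd["Name"].tolist())  # ['Tom', 'Steven']
```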
