Pandas IO operation example
The two main functions for reading text files in Pandas are read_csv() and read_table(). They both use the same parsing code to intelligently convert table data into DataFrame objects:
pandas.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, ...)
pandas.read_table(filepath_or_buffer, sep='\t', delimiter=None, header='infer', names=None, index_col=None, usecols=None, ...)
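As a minimal sketch of how the two functions relate, the snippet below parses the same small table once as comma-separated and once as tab-separated text. The in-memory strings and the use of io.StringIO in place of a file are assumptions made for illustration; they are not part of the original example.

```python
import io

import pandas as pd

# Made-up data: the same table, comma-separated and tab-separated.
csv_text = "Name,Age\nTom,28\nLee,32\n"
tsv_text = "Name\tAge\nTom\t28\nLee\t32\n"

df_csv = pd.read_csv(io.StringIO(csv_text))                # default sep=','
df_tsv = pd.read_table(io.StringIO(tsv_text))              # default sep='\t'
df_csv_tab = pd.read_csv(io.StringIO(tsv_text), sep="\t")  # read_csv with a tab separator

# All three calls produce the same DataFrame.
print(df_csv.equals(df_tsv))
print(df_tsv.equals(df_csv_tab))
```

In practice the two functions differ only in the default value of sep, so either one can read either format when sep is passed explicitly.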
Save this data as temp.csv and perform operations on it.
S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,Hong Kong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
read_csv reads data from a CSV file and creates a DataFrame object.
import pandas as pd

df = pd.read_csv("temp.csv")
print(df)
The running results are as follows:
   S.No    Name  Age       City  Salary
0     1     Tom   28    Toronto   20000
1     2     Lee   32  Hong Kong    3000
2     3  Steven   43   Bay Area    8300
3     4     Ram   38  Hyderabad    3900
Pass index_col to use a column from the CSV file as the index instead of the default integer index.
import pandas as pd

df = pd.read_csv("temp.csv", index_col=['S.No'])
print(df)
The running results are as follows:
        Name  Age       City  Salary
S.No
1        Tom   28    Toronto   20000
2        Lee   32  Hong Kong    3000
3     Steven   43   Bay Area    8300
4        Ram   38  Hyderabad    3900
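The signature above also lists usecols, which the examples here do not exercise. The following is a small sketch, assuming the same data held in an in-memory string rather than temp.csv, of combining usecols with index_col so that only selected columns are loaded and one of them becomes the index.

```python
import io

import pandas as pd

# Same content as temp.csv, kept in a string so the sketch is self-contained.
csv_text = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,Hong Kong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
"""

# Load only three columns and use S.No as the index.
df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["S.No", "Name", "Salary"],
    index_col="S.No",
)

print(df.columns.tolist())  # the index column is not listed among the columns
print(df.index.name)
```

Note that the column named in index_col has to be included in usecols, otherwise it is not available to become the index.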
The dtype of individual columns can be passed as a dict that maps column names to types.
import numpy as np
import pandas as pd

df = pd.read_csv("temp.csv", dtype={'Salary': np.float64})
print(df.dtypes)
The running results are as follows:
S.No        int64
Name       object
Age         int64
City       object
Salary    float64
dtype: object
By default, the Salary column is read as int64. Because its dtype was explicitly set to float64, the values are shown as floats when the DataFrame is printed:
   S.No    Name  Age       City   Salary
0     1     Tom   28    Toronto  20000.0
1     2     Lee   32  Hong Kong   3000.0
2     3  Steven   43   Bay Area   8300.0
3     4     Ram   38  Hyderabad   3900.0
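To make the effect of dtype concrete, here is a short sketch that compares setting the type while reading with converting it afterwards via astype. It assumes the same data in an in-memory string; the variable names at_read and after_read are hypothetical.

```python
import io

import pandas as pd

# Same content as temp.csv, kept in a string so the sketch is self-contained.
csv_text = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,Hong Kong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
"""

# Option 1: ask the parser for float64 while reading.
at_read = pd.read_csv(io.StringIO(csv_text), dtype={"Salary": "float64"})

# Option 2: read with inferred types, then convert the column.
after_read = pd.read_csv(io.StringIO(csv_text)).astype({"Salary": "float64"})

print(at_read["Salary"].dtype)
print(at_read.equals(after_read))
```

For this data both routes give the same DataFrame. Passing dtype at read time avoids building the column with the inferred type first.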
Use the names parameter to specify the header names.
import pandas as pd

df = pd.read_csv("temp.csv", names=['a', 'b', 'c', 'd', 'e'])
print(df)
The running results are as follows:
      a       b    c          d       e
0  S.No    Name  Age       City  Salary
1     1     Tom   28    Toronto   20000
2     2     Lee   32  Hong Kong    3000
3     3  Steven   43   Bay Area    8300
4     4     Ram   38  Hyderabad    3900
Note that the custom names are used as the column labels, but the header row in the file has not been removed: it now appears as the first data row. Use the header parameter to remove it.
If the header is not in the first row, pass its row number to header; the rows before it are skipped. Here the header is in the first row, so header=0 is passed and the custom names replace it.
import pandas as pd

df = pd.read_csv("temp.csv", names=['a', 'b', 'c', 'd', 'e'], header=0)
print(df)
The running results are as follows:
   a       b   c          d      e
0  1     Tom  28    Toronto  20000
1  2     Lee  32  Hong Kong   3000
2  3  Steven  43   Bay Area   8300
3  4     Ram  38  Hyderabad   3900
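The case where the header is not in the first row can be sketched as follows. The input text, including its leading "exported report" line, is made up for illustration, and io.StringIO stands in for a file.

```python
import io

import pandas as pd

# Hypothetical file: one title line precedes the real header row.
raw = "exported report\nS.No,Name,Age\n1,Tom,28\n2,Lee,32\n"

# header=1 says the header is on line index 1; the line before it is skipped.
df = pd.read_csv(io.StringIO(raw), header=1)
print(df.columns.tolist())

# Combined with names, the header row at that position is replaced.
renamed = pd.read_csv(io.StringIO(raw), names=["a", "b", "c"], header=1)
print(renamed.columns.tolist())
print(len(renamed))
```

Row numbers passed to header are zero-based, so header=1 refers to the second line of the file.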
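skiprows skips the specified number of lines at the start of the file. The header line counts as one of them, so with skiprows=2 the row for Lee is taken as the header.
import pandas as pd

df = pd.read_csv("temp.csv", skiprows=2)
print(df)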
The running results are as follows:
   2     Lee  32  Hong Kong  3000
0  3  Steven  43   Bay Area  8300
1  4     Ram  38  Hyderabad  3900
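When the goal is to drop data rows while keeping the header, skiprows can also be given a list of zero-based line indices instead of a count. The sketch below assumes the same data in an in-memory string rather than temp.csv.

```python
import io

import pandas as pd

# Same content as temp.csv, kept in a string so the sketch is self-contained.
csv_text = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,Hong Kong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
"""

# Skip only line index 1 (the row for Tom); line 0 is still read as the header.
without_first = pd.read_csv(io.StringIO(csv_text), skiprows=[1])
print(without_first)

# Skip line indices 1 and 2 (Tom and Lee), again keeping the header.
without_two = pd.read_csv(io.StringIO(csv_text), skiprows=range(1, 3))
print(without_two["Name"].tolist())
```

An integer counts lines from the top of the file, header included, whereas a list names the exact lines to drop.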