
IO Operations in Pandas

Pandas IO operation example

The two main functions for reading text files in Pandas are read_csv() and read_table(). They both use the same parsing code to intelligently convert table data into DataFrame objects:

 pandas.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer',
                 names=None, index_col=None, usecols=None, ...)
 pandas.read_table(filepath_or_buffer, sep='\t', delimiter=None, header='infer',
                   names=None, index_col=None, usecols=None, ...)
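As a minimal sketch of how the two functions relate, the snippet below parses the same small table twice, once comma-separated and once tab-separated. The data and variable names are made up for illustration, and io.StringIO stands in for a file on disk.

```python
import io
import pandas as pd

# Hypothetical in-memory tables standing in for files on disk.
comma_text = "x,y\n1,2\n3,4\n"
tab_text = "x\ty\n1\t2\n3\t4\n"

df_csv = pd.read_csv(io.StringIO(comma_text))    # default separator is ','
df_tab = pd.read_table(io.StringIO(tab_text))    # default separator is '\t'

# Only the default separator differs, so both calls yield the same DataFrame.
print(df_csv.equals(df_tab))
```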

Save the following data as temp.csv; the examples in this section read from that file.

 S.No,Name,Age,City,Salary
 1,Tom,28,Toronto,20000
 2,Lee,32,Hong Kong,3000
 3,Steven,43,Bay Area,8300
 4,Ram,38,Hyderabad,3900
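If you would rather create the file from Python than from a text editor, one possible way is sketched below. It assumes the current working directory is writable and uses the city spelling shown in the printed results.

```python
from pathlib import Path

# Sample data from this section, written to temp.csv in the current directory.
csv_text = (
    "S.No,Name,Age,City,Salary\n"
    "1,Tom,28,Toronto,20000\n"
    "2,Lee,32,Hong Kong,3000\n"
    "3,Steven,43,Bay Area,8300\n"
    "4,Ram,38,Hyderabad,3900\n"
)
Path("temp.csv").write_text(csv_text, encoding="utf-8")
```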

read_csv

read_csv reads data from a CSV file and creates a DataFrame object.

 import pandas as pd
 df = pd.read_csv("temp.csv")
 print(df)

The running results are as follows:

   S.No    Name  Age       City  Salary
0     1     Tom   28    Toronto   20000
1     2     Lee   32  Hong Kong    3000
2     3  Steven   43   Bay Area    8300
3     4     Ram   38  Hyderabad    3900

Custom Indexing

The index_col parameter specifies which column of the CSV file to use as the row index.

 import pandas as pd
 df = pd.read_csv("temp.csv", index_col=['S.No'])
 print(df)

The running results are as follows:

        Name  Age       City  Salary
S.No
1        Tom   28    Toronto   20000
2        Lee   32  Hong Kong    3000
3     Steven   43   Bay Area    8300
4        Ram   38  Hyderabad    3900
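Once a column is the index, rows can be looked up by its values with .loc. The sketch below illustrates this under the assumption that the sample data is held in an in-memory string (io.StringIO) rather than read from temp.csv, so it runs on its own.

```python
import io
import pandas as pd

# Same sample data as temp.csv, kept in memory so the snippet is self-contained.
csv_text = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,Hong Kong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
"""

df = pd.read_csv(io.StringIO(csv_text), index_col="S.No")

# .loc selects by index label, so 3 refers to S.No 3, not to position 3.
row = df.loc[3]
print(row["Name"], row["City"])
```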

Converter

The dtype parameter accepts a dict that maps column names to types; those columns are converted to the given type as the file is read.

 import numpy as np
 import pandas as pd
 df = pd.read_csv("temp.csv", dtype={'Salary': np.float64})
 print(df.dtypes)

The running results are as follows:

S.No        int64
Name       object
Age         int64
City       object
Salary    float64
dtype: object

By default, the Salary column would be read as int64, but it is reported as float64 because we explicitly requested that type. Printing the DataFrame shows the values as floats:

   S.No    Name  Age       City   Salary
0     1     Tom   28    Toronto  20000.0
1     2     Lee   32  Hong Kong   3000.0
2     3  Steven   43   Bay Area   8300.0
3     4     Ram   38  Hyderabad   3900.0
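Besides dtype, read_csv also has a converters parameter, a dict that maps column names to functions applied to each raw string value. The sketch below combines the two on different columns; upper-casing City is only an illustrative choice, and the data is held in memory so the snippet runs on its own.

```python
import io
import pandas as pd

# Same sample data as temp.csv, kept in memory so the snippet is self-contained.
csv_text = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,Hong Kong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"Salary": "float64"},      # cast Salary while parsing
    converters={"City": str.upper},   # function called on each raw City string
)
print(df[["City", "Salary"]])
```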

Header Names

Use the names parameter to specify the header names.

 import pandas as pd
 df = pd.read_csv("temp.csv", names=['a', 'b', 'c', 'd', 'e'])
 print(df)

The running results are as follows:

      a       b    c          d       e
0  S.No    Name  Age       City  Salary
1     1     Tom   28    Toronto   20000
2     2     Lee   32  Hong Kong    3000
3     3  Steven   43   Bay Area    8300
4     4     Ram   38  Hyderabad    3900

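Note that the custom names are used as the column labels, but the header row from the file has not been removed; it now appears as the first data row. Use the header parameter to remove it.

The header parameter gives the row number of the header in the file. With header=0, the first row is treated as the header and is replaced by the custom names. If the header is not in the first row, pass its row number to header; the rows before it are skipped.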

 import pandas as pd
 df = pd.read_csv("temp.csv", names=['a', 'b', 'c', 'd', 'e'], header=0)
 print(df)

The running results are as follows:

   a       b   c          d      e
0  1     Tom  28    Toronto  20000
1  2     Lee  32  Hong Kong   3000
2  3  Steven  43   Bay Area   8300
3  4     Ram  38  Hyderabad   3900
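To illustrate a header that is not in the first row, the sketch below uses a made-up file whose first two lines are notes, so the column names sit at row number 2 (counting from 0). The note lines are an assumption for the sake of the example and are not part of temp.csv.

```python
import io
import pandas as pd

# Hypothetical file: two note lines precede the real header row.
csv_text = """exported by hand
do not edit
S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,Hong Kong,3000
"""

# header=2 uses the third line as column names and ignores the lines above it.
df = pd.read_csv(io.StringIO(csv_text), header=2)
print(df)
```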

skiprows

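skiprows skips the specified number of rows at the start of the file. The header row counts as one of them, so the first row that remains is used as the header.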

 import pandas as pd
 df = pd.read_csv("temp.csv", skiprows=2)
 print(df)

The running results are as follows:

   2     Lee  32  Hong Kong  3000
0  3  Steven  43   Bay Area  8300
1  4     Ram  38  Hyderabad  3900
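skiprows also accepts a list of row numbers (counting from 0), which makes it possible to keep the header while dropping specific data rows. A minimal sketch, with the sample data held in memory so it runs on its own:

```python
import io
import pandas as pd

# Same sample data as temp.csv, kept in memory so the snippet is self-contained.
csv_text = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,Hong Kong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
"""

# Row 0 is the header; rows 1 and 2 (Tom and Lee) are skipped.
df = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 2])
print(df)
```

Because row 0 is not in the list, the original column names are preserved.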