Pandas DataFrame

Basic operations of Pandas DataFrame

A DataFrame is a two-dimensional data structure: its data is aligned in a tabular form across rows and columns.

DataFrame features

- Columns can be of different types
- Size is mutable
- Axes (rows and columns) are labeled
- Arithmetic operations can be performed on rows and columns
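The four features above can be seen in one small table. This is a minimal sketch; the columns `name`, `age` and `score` and the row labels `s1` to `s3` are made-up sample data, not part of the tutorial.

```python
import pandas as pd

# Columns of different types: strings, integers and floats side by side
df = pd.DataFrame(
    {"name": ["Ann", "Ben", "Cid"], "age": [20, 22, 21], "score": [88.5, 92.0, 75.0]},
    index=["s1", "s2", "s3"],  # labeled row axis
)
print(df.dtypes)

# Size is mutable: add a new column in place
df["passed"] = df["score"] >= 80

# Labeled axes: look up one cell by row label and column label
print(df.loc["s2", "name"])

# Arithmetic on a whole column at once
df["age_next_year"] = df["age"] + 1
print(df)
```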

Structure

pandas.Series

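The structure of a Series is as follows: a one-dimensional array of values, each paired with an index label.

[Figure: structure of a Series]

Let's assume we are creating a DataFrame using student data.

[Figure: DataFrame of student data]

We can regard a DataFrame as the pandas counterpart of an SQL table or spreadsheet data.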
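The following minimal sketch builds a Series and a small DataFrame of student data. The student names, the column names `Age` and `Marks`, and the values are made-up assumptions for illustration. It also shows that each DataFrame column is itself a Series sharing the row index.

```python
import pandas as pd

# A Series: one-dimensional values plus an index of labels
ages = pd.Series([15, 16, 15], index=["Amy", "Bob", "Cal"], name="Age")
print(ages)

# A DataFrame of student data: every column shares the same row index
students = pd.DataFrame(
    {"Age": [15, 16, 15], "Marks": [78.0, 85.5, 91.0]},
    index=["Amy", "Bob", "Cal"],
)
print(students)

# Selecting one column gives back a Series with the same values and index
print(students["Age"].equals(ages))
```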

pandas.DataFrame

A pandas DataFrame can be created with the following constructor:

    pandas.DataFrame(data, index, columns, dtype, copy)

Parameter description:

- data: the input data, which can take various forms such as ndarray, Series, mapping, list, dict, constants, or another DataFrame.
- index: row labels (optional). If no index is passed, the default is np.arange(n).
- columns: column labels (optional). The default is np.arange(n), used only when the data itself provides no column labels.
- dtype: the data type to apply to the columns (optional). If omitted, the types are inferred from the data.
- copy: whether to copy the input data. The default behavior depends on the input type and the pandas version (older releases defaulted to False).
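A minimal sketch of how `index`, `columns` and `dtype` affect the result, assuming small made-up data (`x`, `y`, `r1` to `r3` are illustrative names only). Note that for dict input, `columns` selects among the existing keys.

```python
import numpy as np
import pandas as pd

data = {"x": [1, 2, 3], "y": [4, 5, 6]}

# No index passed: the row labels default to 0..n-1
df_default = pd.DataFrame(data)
print(df_default.index.tolist())  # [0, 1, 2]

# Explicit row labels, a subset of the dict keys, and one dtype for all columns
df_custom = pd.DataFrame(data, index=["r1", "r2", "r3"], columns=["y"], dtype="float64")
print(df_custom)

# 2-D ndarray input has no column labels, so columns default to 0..n-1
df_array = pd.DataFrame(np.zeros((2, 3)))
print(df_array.columns.tolist())  # [0, 1, 2]
```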

Create DataFrame

A pandas DataFrame can be created from various inputs:

- Lists
- dict
- Series
- NumPy ndarrays
- Another DataFrame

In the subsequent sections of this chapter, we will see how to create DataFrames using these inputs.
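The list above also names NumPy ndarrays and another DataFrame as inputs. A minimal sketch of both, using arbitrary sample values:

```python
import numpy as np
import pandas as pd

# From a 2-D NumPy array: the rows and columns follow the array's shape
arr = np.array([[1, 2], [3, 4], [5, 6]])
df_from_array = pd.DataFrame(arr, columns=["a", "b"])
print(df_from_array)

# From another DataFrame: passing columns keeps only the listed columns
df_subset = pd.DataFrame(df_from_array, columns=["b"])
print(df_subset)
```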

Create an empty DataFrame

The most basic DataFrame you can create is an empty DataFrame.

Example

    # author by: www.oldtoolbag.com
    # Import pandas and alias it as pd
    import pandas as pd
    df = pd.DataFrame()
    print(df)

Running Results:

    Empty DataFrame
    Columns: []
    Index: []
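A minimal sketch of checking whether a DataFrame is empty and then filling one column by column; the column name `name` is an arbitrary example.

```python
import pandas as pd

empty = pd.DataFrame()
print(empty.empty)  # True
print(empty.shape)  # (0, 0)

# An empty DataFrame can still be filled column by column afterwards
filled = pd.DataFrame()
filled["name"] = ["Alex", "Bob"]
print(filled.shape)  # (2, 1)
```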

Create DataFrame from Lists

Example

    # Import pandas and alias it as pd
    import pandas as pd
    data = [1, 2, 3, 4, 5]
    df = pd.DataFrame(data)
    print(df)

Running Results:

       0
    0  1
    1  2
    2  3
    3  4
    4  5

Example

    # Import pandas and alias it as pd
    import pandas as pd
    data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
    df = pd.DataFrame(data, columns=['Name', 'Age'])
    print(df)

Running Results:

         Name  Age
    0    Alex   10
    1     Bob   12
    2  Clarke   13

Example

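    # Import pandas and alias it as pd
    import pandas as pd
    data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
    df = pd.DataFrame(data, columns=['Name', 'Age'])
    # Convert only the Age column to float; Name stays as strings
    df = df.astype({'Age': float})
    print(df)

Running Results:

         Name   Age
    0    Alex  10.0
    1     Bob  12.0
    2  Clarke  13.0

Note: astype changes the type of the Age column to float. Passing dtype=float to the constructor instead tries to convert every column, including Name: older pandas versions fell back to leaving Name as strings, while pandas 2.x raises an error because 'Alex' cannot be converted to float.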
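When every column is numeric, the dtype parameter can be passed straight to the constructor. A minimal sketch, assuming a hypothetical Grade column added for illustration:

```python
import pandas as pd

# All values are numeric, so dtype=float can be applied to every column
scores = pd.DataFrame([[10, 3], [12, 4], [13, 5]], columns=["Age", "Grade"], dtype=float)
print(scores.dtypes)
print(scores)
```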

Create a DataFrame from a dict of ndarrays / lists

All the ndarrays (or lists) must be the same length. If an index is passed, its length must equal the length of the arrays.
If no index is passed, the default index is range(n), where n is the array length.
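The length rules above are enforced: the constructor raises ValueError when they are broken. A minimal sketch; `try_build` is a hypothetical helper written only for this illustration.

```python
import pandas as pd


def try_build(data, index=None):
    """Return the error message if the constructor rejects the input, else None."""
    try:
        pd.DataFrame(data, index=index)
    except ValueError as err:
        return str(err)
    return None


# Lists of different lengths are rejected
print(try_build({"Name": ["Tom", "Jack", "Steve"], "Age": [28, 34]}))
# An index whose length differs from the arrays is rejected
print(try_build({"Age": [28, 34]}, index=["a", "b", "c"]))
# Matching lengths succeed, so no message is returned
print(try_build({"Age": [28, 34]}, index=["a", "b"]))
```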

Example

    # Import pandas and alias it as pd
    import pandas as pd
    data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
    df = pd.DataFrame(data)
    print(df)

Running Results:

        Name  Age
    0    Tom   28
    1   Jack   34
    2  Steve   29
    3  Ricky   42

Note: The values 0, 1, 2, 3 are the default index labels, assigned to the rows using range(n).

Now let's create a DataFrame with a custom index by passing an array of labels.

Example

    # Import pandas and alias it as pd
    import pandas as pd
    data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
    df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])
    print(df)

Running Results:

            Name  Age
    rank1    Tom   28
    rank2   Jack   34
    rank3  Steve   29
    rank4  Ricky   42

Note: The index parameter assigns a label to each row.

Creating DataFrame from list of dictionaries

A list of dictionaries can be passed as input data to create a DataFrame. By default, dictionary keys are used as column names.
The following example demonstrates how to create a DataFrame by passing a list of dictionaries.

Example

    # Import pandas and alias it as pd
    import pandas as pd
    data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
    df = pd.DataFrame(data)
    print(df)

Running Results:

       a   b     c
    0  1   2   NaN
    1  5  10  20.0

Note: NaN (Not a Number) is filled in wherever a value is missing.
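To locate the filled-in NaN values, isna() marks each missing cell. A minimal sketch reusing the same list of dictionaries:

```python
import pandas as pd

data = [{"a": 1, "b": 2}, {"a": 5, "b": 10, "c": 20}]
df = pd.DataFrame(data)

# True marks the cells that were filled with NaN
print(df.isna())

# Count the missing values in each column
print(df.isna().sum())
```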

The following example demonstrates how to create a DataFrame by passing a list of dictionaries and row indices.

Example

    # Import pandas and alias it as pd
    import pandas as pd
    data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
    df = pd.DataFrame(data, index=['first', 'second'])
    print(df)

Running Results:

            a   b     c
    first   1   2   NaN
    second  5  10  20.0

The following example demonstrates how to create a DataFrame by passing a list of dictionaries together with row labels and column labels.

Example

    # Import pandas and alias it as pd
    import pandas as pd
    data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
    # Two column labels, both of which are dictionary keys
    df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])
    # Two column labels, one of which ('b1') is not a dictionary key
    df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
    print(df1)
    print(df2)

Running Results:

    # df1 output
            a   b
    first   1   2
    second  5  10
    # df2 output
            a   b1
    first   1  NaN
    second  5  NaN

Note: df2 is created with a column label ('b1') that is not a dictionary key, so that column is filled with NaN. df1 is created with column labels that match the dictionary keys, so no NaN appears.

Creating a DataFrame from a dict of Series

A dictionary of Series can be passed to form a DataFrame. The resulting index is the union of the indices of all the Series passed.

Example

    # Import pandas and alias it as pd
    import pandas as pd
    d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
         'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
    df = pd.DataFrame(d)
    print(df)

Running Results:

       one  two
    a  1.0    1
    b  2.0    2
    c  3.0    3
    d  NaN    4

The first Series has no label 'd', so NaN is filled in for label 'd' in the 'one' column.
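The union rule can be checked directly, and passing an explicit index restricts the rows instead. A minimal sketch reusing the same two Series:

```python
import pandas as pd

d = {
    "one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
    "two": pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"]),
}

# Without an index, the row labels are the union of both Series indices
df = pd.DataFrame(d)
print(df.index.tolist())  # ['a', 'b', 'c', 'd']

# With an explicit index, each Series is aligned to those labels only
df_ab = pd.DataFrame(d, index=["a", "b"])
print(df_ab)
```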
Now let's understand column selection, addition, and deletion through an example.

Column query

We will select a column from the DataFrame to understand this.

Example

    # Import pandas and alias it as pd
    import pandas as pd
    d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
         'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
    df = pd.DataFrame(d)
    print(df['one'])

Running Results:

    a    1.0
    b    2.0
    c    3.0
    d    NaN
    Name: one, dtype: float64
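Selecting with a single label returns a Series, while a list of labels returns a DataFrame. A minimal sketch with small made-up columns:

```python
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0], "two": [1, 2], "three": [10, 20]}, index=["a", "b"])

# A single column label returns a Series
print(type(df["one"]).__name__)  # Series

# A list of labels returns a DataFrame with the columns in the requested order
subset = df[["three", "one"]]
print(subset)
```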

Column addition

We will understand this by adding a new column to the existing DataFrame.

Example

    # Import pandas and alias it as pd
    import pandas as pd
    d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
         'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
    df = pd.DataFrame(d)
    # Add a new column to an existing DataFrame object by passing a new Series
    print("Add a new column by passing as Series:")
    df['three'] = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
    print(df)
    print("Add a new column using existing columns in DataFrame:")
    df['four'] = df['one'] + df['three']
    print(df)

Running Results:

    Add a new column by passing as Series:
       one  two  three
    a  1.0    1   10.0
    b  2.0    2   20.0
    c  3.0    3   30.0
    d  NaN    4    NaN
    Add a new column using existing columns in DataFrame:
       one  two  three  four
    a  1.0    1   10.0  11.0
    b  2.0    2   20.0  22.0
    c  3.0    3   30.0  33.0
    d  NaN    4    NaN   NaN
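Two other ways to add columns: a scalar is broadcast to every row, and assign() returns a new DataFrame without modifying the original. A minimal sketch; the column names `flag` and `total` are arbitrary examples.

```python
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [1, 2, 3]}, index=["a", "b", "c"])

# A scalar value is broadcast to every row
df["flag"] = 0

# assign() returns a new DataFrame and leaves df unchanged
df2 = df.assign(total=df["one"] + df["two"])
print(df2)
print("total" in df.columns)  # False
```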

Column deletion

Columns can be deleted or popped; let's understand how with an example.

Example

    # Import pandas and alias it as pd
    import pandas as pd
    d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
         'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
         'three': pd.Series([10, 20, 30], index=['a', 'b', 'c'])}
    df = pd.DataFrame(d)
    print("Our dataframe is:")
    print(df)
    # Delete a column with the del statement
    print("Deleting the first column using del function:")
    del df['one']
    print(df)
    # Delete a column with the pop method
    print("Deleting another column using POP function:")
    df.pop('two')
    print(df)

Running Results:

    Our dataframe is:
       one  two  three
    a  1.0    1   10.0
    b  2.0    2   20.0
    c  3.0    3   30.0
    d  NaN    4    NaN
    Deleting the first column using del function:
       two  three
    a    1   10.0
    b    2   20.0
    c    3   30.0
    d    4    NaN
    Deleting another column using POP function:
       three
    a   10.0
    b   20.0
    c   30.0
    d    NaN
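The difference between pop() and drop() matters in practice: pop() modifies the DataFrame in place and returns the removed column, while drop(columns=...) returns a new DataFrame. A minimal sketch with made-up values:

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2], "two": [3, 4], "three": [5, 6]})

# pop() removes the column in place and returns it as a Series
two = df.pop("two")
print(two.tolist())  # [3, 4]

# drop(columns=...) returns a new DataFrame; df keeps its columns
smaller = df.drop(columns=["one"])
print(df.columns.tolist())       # ['one', 'three']
print(smaller.columns.tolist())  # ['three']
```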

Row query, addition, and deletion

Now, let's understand the concept of row selection, addition, and deletion through an example. Let's start with the concept of selection.

Query by label

Rows can be selected by passing a row label to the loc indexer.

Example

    # Import pandas and alias it as pd
    import pandas as pd
    d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
         'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
    df = pd.DataFrame(d)
    print(df.loc['b'])

Running Results:

    one    2.0
    two    2.0
    Name: b, dtype: float64

The result is a Series whose labels are the DataFrame's column names; the name of the Series is the row label used to retrieve it.
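loc also accepts a list of row labels, or a row label and a column label together. A minimal sketch on a DataFrame equivalent to the one above:

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0, None], "two": [1, 2, 3, 4]},
    index=["a", "b", "c", "d"],
)

# A list of row labels returns a DataFrame
print(df.loc[["a", "c"]])

# A (row label, column label) pair returns a single value
print(df.loc["b", "two"])  # 2
```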

Query by integer position

Rows can be selected by passing an integer position to the iloc indexer.

Example

    # Import pandas and alias it as pd
    import pandas as pd
    d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
         'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
    df = pd.DataFrame(d)
    print(df.iloc[2])

Running Results:

    one    3.0
    two    3.0
    Name: c, dtype: float64
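iloc follows Python's positional rules, so negative positions and (row, column) position pairs work as well. A minimal sketch on a DataFrame equivalent to the one above:

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0, None], "two": [1, 2, 3, 4]},
    index=["a", "b", "c", "d"],
)

print(df.iloc[-1])      # the last row, labeled 'd'
print(df.iloc[0:2, 1])  # first two rows of the second column
print(df.iloc[2, 0])    # 3.0
```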

Row slicing

You can use the ':' operator to select multiple rows.

Example

    # Import pandas and alias it as pd
    import pandas as pd
    d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
         'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
    df = pd.DataFrame(d)
    print(df[2:4])

Running Results:

       one  two
    c  3.0    3
    d  NaN    4
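Positional slices exclude the end position, like Python lists, while label slices with loc include both endpoints. A minimal sketch on a DataFrame equivalent to the one above:

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0, None], "two": [1, 2, 3, 4]},
    index=["a", "b", "c", "d"],
)

# Position-based slices stop before the end position
print(df[2:4].index.tolist())          # ['c', 'd']
print(df.iloc[1:3].index.tolist())     # ['b', 'c']

# Label-based slices with loc include the end label
print(df.loc["b":"c"].index.tolist())  # ['b', 'c']
```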

Add Rows

Use the concat function to add new rows to a DataFrame; the rows are appended at the end. Older pandas versions also offered DataFrame.append for this, but it was deprecated in pandas 1.4 and removed in 2.0.

Example

    # Import pandas and alias it as pd
    import pandas as pd
    df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
    df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])
    df = pd.concat([df, df2])
    print(df)

Running Results:

       a  b
    0  1  2
    1  3  4
    0  5  6
    1  7  8
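If duplicate row labels are not wanted, concat can renumber the rows, and a single row can be added with loc under an unused label. A minimal sketch with the same sample values plus a made-up extra row:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=["a", "b"])

# ignore_index=True renumbers the rows 0..n-1 instead of keeping duplicates
combined = pd.concat([df, df2], ignore_index=True)
print(combined)

# Assigning to an unused label with loc appends one row
combined.loc[len(combined)] = [9, 10]
print(combined.index.tolist())  # [0, 1, 2, 3, 4]
```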

Delete Rows

Use drop with an index label to delete rows from a DataFrame. If the label is duplicated, all rows with that label are deleted.
In the example above, the labels are duplicated. Let's drop one label and see how many rows are deleted.

Example

    # Import pandas and alias it as pd
    import pandas as pd
    df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
    df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])
    df = pd.concat([df, df2])
    # Drop the rows with label 0
    df = df.drop(0)
    print(df)

Running Results:

       a  b
    1  3  4
    1  7  8

In the above example, two rows were deleted because both had the label 0.
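To delete just one of several rows that share a label, a common approach is to make the labels unique first with reset_index and then drop by the new labels. A minimal sketch with the same sample values:

```python
import pandas as pd

df = pd.concat([
    pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"]),
    pd.DataFrame([[5, 6], [7, 8]], columns=["a", "b"]),
])

# Dropping a duplicated label removes every row that carries it
print(df.drop(0))

# reset_index(drop=True) renumbers the rows 0..n-1 so each label is unique
unique = df.reset_index(drop=True)
remaining = unique.drop(index=[0, 3])
print(remaining)
```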
