
Pandas DataFrame

   Basic operations of Pandas DataFrame

DataFrame is a two-dimensional data structure, that is, data is aligned in a tabular form across rows and columns.

DataFrame functions

Columns can be of different types
Size is mutable
Labeled axes (rows and columns)
Arithmetic operations can be performed on rows and columns
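
The features listed above can be seen in a small sketch. The column names and values here are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data: columns of different types (text, int, float)
df = pd.DataFrame(
    {"name": ["Ann", "Ben"], "age": [30, 25], "score": [88.5, 92.0]},
    index=["r1", "r2"],  # labeled row axis
)

print(df.dtypes)  # one dtype per column

# The size can change: new columns are added to the existing object
df["bonus"] = 5
# Arithmetic between columns is aligned on the row labels
df["total"] = df["score"] + df["bonus"]
print(df)
```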

Structure

pandas.Series

The structure of Series is as follows:

Let's assume we are creating a DataFrame using student data.

We can regard it as a representation of SQL table or spreadsheet data.

pandas.DataFrame

A pandas DataFrame can be created using the following constructor:

 pandas.DataFrame(data, index, columns, dtype, copy)

Parameter Description:

data: The input data. It can take various forms, such as an ndarray, Series, map, list, dict, constant, or another DataFrame.
index: The row labels. Optional; if no index is passed, the default is np.arange(n).
columns: The column labels. Optional; if no column labels are passed, the default is np.arange(n).
dtype: The data type of each column.
copy: Whether to copy the input data. The default is given here as False.
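
As a minimal sketch of how these parameters fit together, the following call passes all five explicitly and compares the result with the defaults. The array contents and labels are hypothetical.

```python
import numpy as np
import pandas as pd

values = np.array([[1, 2], [3, 4], [5, 6]])  # hypothetical input data

df = pd.DataFrame(
    data=values,
    index=["x", "y", "z"],   # row labels
    columns=["a", "b"],      # column labels
    dtype=float,             # convert every column to float
    copy=True,               # work on a copy of the input array
)
print(df)

# Without index and columns, both axes get default integer labels
default_df = pd.DataFrame(values)
print(list(default_df.index), list(default_df.columns))
```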

Create DataFrame

A pandas DataFrame can be created from various inputs:

Lists
dict
Series
NumPy ndarrays
Another DataFrame

In the subsequent sections of this chapter, we will see how to create DataFrames using these inputs.

Create an empty DataFrame

The most basic DataFrame that can be created is an empty DataFrame.

 # Filename: pandas.py
 # author by: www.oldtoolbag.com 
 # Import pandas dependency package and alias
 import pandas as pd
 df = pd.DataFrame()
 print(df)

Running Results:

 Empty DataFrame
 Columns: []
 Index: []

Create DataFrame from Lists

 # Import pandas dependency package and alias
 import pandas as pd
 data = [1,2,3,4,5]
 df = pd.DataFrame(data)
 print(df)

Running Results:

    0
 0  1
 1  2
 2  3
 3  4
 4  5

A list of lists can also be passed, with the column names supplied through the columns parameter:

 # Import pandas dependency package and alias
 import pandas as pd
 data = [['Alex',10],['Bob',12],['Clarke',13]]
 df = pd.DataFrame(data,columns=['Name','Age'])
 print(df)

Running Results:

       Name     Age
 0     Alex     10
 1     Bob      12
 2     Clarke   13

The dtype parameter can be passed to request a data type:

 # Import pandas dependency package and alias
 import pandas as pd
 data = [['Alex',10],['Bob',12],['Clarke',13]]
 df = pd.DataFrame(data,columns=['Name','Age'],dtype=float)
 print(df)

Running Results:

 
      Name   Age
 0    Alex   10.0
 1    Bob    12.0
 2    Clarke 13.0
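Note: The dtype parameter changes the type of the Age column to float. Recent pandas versions may instead raise an error for this call, because the Name column cannot be converted to float.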
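
If the constructor call above fails on your pandas version, one alternative is to build the DataFrame first and then convert only the Age column. This is a sketch using astype with a column-to-type mapping, not the approach the original example takes.

```python
import pandas as pd

data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Convert only the Age column; Name is left untouched
df = df.astype({'Age': float})
print(df)
print(df.dtypes)
```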

Create a DataFrame from a dict of ndarrays / lists

The lengths of all ndarrays must be the same. If an index is passed, the length of the index should be equal to the length of the array.
If no index is passed, the default index will be range(n), where n is the length of the array.

 # Import pandas dependency package and alias
 import pandas as pd
 data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
 df = pd.DataFrame(data)
 print(df)

Running Results:

 
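     Name  Age
 0    Tom   28
 1   Jack   34
 2  Steve   29
 3  Ricky   42
Note: The values 0, 1, 2, 3 are the default index assigned to each row using range(n). Recent pandas versions keep the columns in the order of the dictionary keys; older versions sorted them alphabetically (Age, Name).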

Now let's create an indexed DataFrame using arrays.

 # Import pandas dependency package and alias
 import pandas as pd
 data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
 df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])
 print(df)

Running Results:

 
         Name  Age
 rank1    Tom   28
 rank2   Jack   34
 rank3  Steve   29
 rank4  Ricky   42
Note: The index parameter assigns an index to each row.
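
The examples in this section use lists. The same rules apply to NumPy ndarrays, including the requirement that all arrays have the same length. The sketch below uses a hypothetical Height column to show both the working case and the error raised for arrays of unequal length.

```python
import numpy as np
import pandas as pd

ages = np.array([28, 34, 29, 42])
heights = np.array([1.70, 1.82, 1.65, 1.77])  # hypothetical values

# Arrays of equal length, with an explicit index of the same length
df = pd.DataFrame({'Age': ages, 'Height': heights},
                  index=['rank1', 'rank2', 'rank3', 'rank4'])
print(df)

# Arrays of unequal length are rejected with a ValueError
try:
    pd.DataFrame({'Age': ages, 'Height': heights[:3]})
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
print(type(mismatch_error).__name__)
```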

Creating DataFrame from list of dictionaries

A list of dictionaries can be passed as input data to create a DataFrame. By default, dictionary keys are used as column names.
The following example demonstrates how to create a DataFrame by passing a list of dictionaries.

 # Import pandas dependency package and alias
 import pandas as pd
 data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
 df = pd.DataFrame(data)
 print(df)

Running Results:

    a   b     c
 0  1   2   NaN
 1  5  10  20.0
Note: NaN (Not a Number) is filled in wherever a value is missing.
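
A short sketch of how the missing values can be inspected and replaced. Filling with 0 is only an illustrative choice, not something the example above requires.

```python
import pandas as pd

data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)

print(df.isna())   # True wherever a value is missing
print(df.dtypes)   # column 'c' is float64 because NaN is a float value

# Replace missing values with 0 (illustrative choice); returns a new DataFrame
filled = df.fillna(0)
print(filled)
```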

The following example demonstrates how to create a DataFrame by passing a list of dictionaries and row indices.

 # Import pandas dependency package and alias
 import pandas as pd
 data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
 df = pd.DataFrame(data, index=['first', 'second'])
 print(df)

Running Results:

         a   b     c
 first   1   2   NaN
 second  5  10  20.0

The following example demonstrates how to create a DataFrame from a list of dictionaries, row indices, and column indices.

 # Import pandas dependency package and alias
 import pandas as pd
 data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
 # There are two column indices, values are the same as the dictionary keys
 df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])
 # There are two column indices, one of which ('b1') is not a dictionary key
 df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
 print(df1)
 print(df2)

Running Results:

 #df1 output
         a   b
 first   1   2
 second  5  10
 #df2 output
         a  b1
 first   1 NaN
 second  5 NaN
Note: df2 is created with a column index ('b1') that is not a dictionary key, so NaN is filled in for that column. df1 is created with column indices that match the dictionary keys, so no NaN is added.

Creating DataFrame from a dict of Series

A dictionary of Series can be passed to form a DataFrame. The resulting index is the union of all the passed Series indices.

 # Import pandas dependency package and alias
 import pandas as pd
 d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
 df = pd.DataFrame(d)
 print(df)

Running Results:

   one two
 a 1.0 1
 b 2.0 2
 c 3.0 3
 d NaN 4

The first Series has no label 'd', so in the result the value for label 'd' in column one is NaN.
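
The index union described above can be checked directly. This is a minimal sketch; the variable names one, two and union_index are introduced only for illustration.

```python
import pandas as pd

one = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
two = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
df = pd.DataFrame({'one': one, 'two': two})

# The DataFrame index is the union of the two Series indices
union_index = one.index.union(two.index)
print(list(df.index))
print(df.index.equals(union_index))

# Label 'd' exists only in the second Series, so column 'one' holds NaN there
print(df.loc['d'])
```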
Now let's understand column selection, addition, and deletion through an example.

Column query

We will select a column from the DataFrame to understand this.

 # Import pandas dependency package and alias
 import pandas as pd
 d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
 df = pd.DataFrame(d)
 print(df['one'])

Running Results:

 a    1.0
 b    2.0
 c    3.0
 d    NaN
 Name: one, dtype: float64
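
Selecting with a single label returns a Series, as shown above. As a small sketch, passing a list of labels returns a DataFrame instead, which is useful for selecting several columns at once.

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

single = df['one']            # one label -> Series
subset = df[['one', 'two']]   # list of labels -> DataFrame

print(type(single).__name__)
print(type(subset).__name__)
print(subset)
```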

Column addition

We will understand this by adding a new column to the existing DataFrame.

 # Import pandas dependency package and alias
 import pandas as pd
 d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
 df = pd.DataFrame(d)
 # Add a new column to an existing DataFrame object by passing a new series
 print("Add a new column by passing as Series:")
 df['three']=pd.Series([10,20,30],index=['a','b','c'])
 print(df)
 print("Add a new column using existing columns in DataFrame:")
 df['four']=df['one']+df['three']
 print(df)

Running Results:

 Add a new column by passing as Series:
 one two three
 a 1.0 1 10.0
 b 2.0 2 20.0
 c 3.0 3 30.0
 d NaN 4 NaN
 Add a new column using existing columns in DataFrame:
 one two three four
 a 1.0 1 10.0 11.0
 b 2.0 2 20.0 22.0
 c 3.0 3 30.0 33.0
 d NaN 4 NaN NaN
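
Besides item assignment, pandas offers other ways to add columns. The sketch below uses assign, which returns a new DataFrame and leaves the original unchanged, and insert, which places a column at a chosen position. The column name 'zero' is hypothetical.

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# assign returns a new DataFrame; df itself is not modified
df2 = df.assign(three=df['one'] * 10)

# insert modifies df2 in place and puts the column at position 0
df2.insert(0, 'zero', 0)

print(list(df.columns))
print(df2)
```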

Column deletion

Columns can be deleted or popped; let's understand how with an example.

 # Import pandas dependency package and alias
 import pandas as pd
 d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
    'three' : pd.Series([10,20,30], index=['a','b','c'])}
 df = pd.DataFrame(d)
 print("Our dataframe is:")
 print(df)
 # using the del statement
 print("Deleting the first column using del function:")
 del df['one']
 print(df)
 # using the pop method
 print("Deleting another column using POP function:")
 df.pop('two')
 print(df)

Running Results:

 Our dataframe is:
    one  two  three
 a  1.0    1   10.0
 b  2.0    2   20.0
 c  3.0    3   30.0
 d  NaN    4    NaN
 Deleting the first column using del function:
    two  three
 a    1   10.0
 b    2   20.0
 c    3   30.0
 d    4    NaN
 Deleting another column using POP function:
   three
 a 10.0
 b 20.0
 c 30.0
 d NaN
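
Two details are worth a short sketch: pop returns the removed column as a Series, and drop with the columns parameter returns a new DataFrame without modifying the original. The data here is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2], 'two': [3, 4], 'three': [5, 6]})

# pop removes the column in place and returns it as a Series
popped = df.pop('two')
print(popped.tolist())

# drop returns a new DataFrame; df keeps its remaining columns
trimmed = df.drop(columns=['one'])
print(list(df.columns))
print(list(trimmed.columns))
```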

Row query, addition, and deletion

Now, let's understand the concept of row selection, addition, and deletion through an example. Let's start with the concept of selection.

Query by label

Rows can be selected by passing a row label to the loc indexer.

 # Import pandas dependency package and alias
 import pandas as pd
 d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
 df = pd.DataFrame(d)
 print(df.loc['b'])

Running Results:

 
 one    2.0
 two    2.0
 Name: b, dtype: float64

The result is a Series whose labels are the DataFrame's column names. The name of the Series is the row label that was used to retrieve it.
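
loc also accepts a list of labels, or a row label together with a column label. A minimal sketch:

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

rows = df.loc[['a', 'c']]   # list of row labels -> DataFrame
cell = df.loc['b', 'two']   # row label and column label -> single value

print(rows)
print(cell)
```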

Query by integer position

Rows can be selected by passing an integer position to the iloc indexer.

 # Import pandas dependency package and alias
 import pandas as pd
 d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
 df = pd.DataFrame(d)
 print(df.iloc[2])

Running Results:

 
 one    3.0
 two    3.0
 Name: c, dtype: float64
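
iloc follows ordinary Python positional rules, so negative positions and pairs of row and column positions also work. A short sketch:

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

last = df.iloc[-1]        # last row, as a Series
first_two = df.iloc[0:2]  # rows at positions 0 and 1
cell = df.iloc[1, 0]      # row position 1, column position 0

print(last.name)
print(first_two)
print(cell)
```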

Row slicing

You can use the ':' operator to select multiple rows.

 # Import pandas dependency package and alias
 import pandas as pd
 d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
 df = pd.DataFrame(d)
 print(df[2:4])

Running Results:

 
     one two
 c 3.0 3
 d NaN 4
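
Slices can be written by position or by label, and the two differ in how they treat the end point. The sketch below contrasts iloc, where the stop position is excluded, with loc, where the stop label is included.

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

by_position = df.iloc[1:3]    # positions 1 and 2; position 3 is excluded
by_label = df.loc['b':'c']    # labels 'b' through 'c'; 'c' is included

print(by_position)
print(by_label)
```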

Add Rows

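New rows can be added to a DataFrame by appending the rows of another DataFrame at the end. Older pandas versions provided the DataFrame.append method for this; it was deprecated in pandas 1.4 and removed in pandas 2.0, so the example here uses pd.concat.

 # Import pandas dependency package and alias
 import pandas as pd
 df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
 df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])
 # pd.concat replaces df.append(df2), which was removed in pandas 2.0
 df = pd.concat([df, df2])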
 print(df)

Running Results:

 
     a b
 0 1 2
 1 3 4
 0 5 6
 1 7 8
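
The row labels 0 and 1 appear twice in the result because each input keeps its own index. As a sketch, passing ignore_index=True to pd.concat renumbers the rows instead. The name combined is introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])

# ignore_index=True discards the original labels and assigns 0..n-1
combined = pd.concat([df, df2], ignore_index=True)
print(combined)
```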

Delete Rows

Use index labels to delete (drop) rows from a DataFrame. If a label is duplicated, every row with that label is dropped.
If you observe, in the above example, the labels are duplicated. Let's drop a label and see how many rows are dropped.

 # Import pandas dependency package and alias
 import pandas as pd
 df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
 df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])
 # pd.concat replaces df.append(df2), which was removed in pandas 2.0
 df = pd.concat([df, df2])
 # Drop rows with label 0
 df = df.drop(0)
 print(df)

Running Results:

 
     a b
 1 3 4
 1 7 8

In the above example, two rows were dropped because both carried the same label 0.
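
If only one of the rows should be removed, one option is to make the labels unique first. The sketch below uses reset_index(drop=True) for that; the names stacked, unique and one_removed are hypothetical.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])
stacked = pd.concat([df, df2])   # index is 0, 1, 0, 1

# Dropping label 0 removes both rows that carry it
dropped = stacked.drop(0)
print(len(dropped))

# Renumber the rows so that every label is unique, then drop a single row
unique = stacked.reset_index(drop=True)
one_removed = unique.drop(0)
print(one_removed)
```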