Basic operations of Pandas DataFrame
DataFrame is a two-dimensional data structure, that is, data is aligned in a tabular form across rows and columns.
Features of a DataFrame:
- Columns can be of different types
- Size is mutable
- Axes are labeled (rows and columns)
- Arithmetic operations can be performed on rows and columns
To picture the structure of a DataFrame, let's assume we are creating one from student data.
We can regard it as a representation of a SQL table or of spreadsheet data.
A pandas DataFrame can be created with the following constructor:
pandas.DataFrame(data, index, columns, dtype, copy)
Parameter description:
- data: The input data. It can take various forms, such as an ndarray, Series, map, list, dict, constant, or another DataFrame.
- index: The row labels of the resulting frame. Optional; defaults to np.arange(n) if no index is passed.
- columns: The column labels. Optional; defaults to np.arange(n) if no column labels are passed.
- dtype: The data type of each column.
- copy: Whether to copy the input data. The default given here is False.
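The parameters described above can be tried together in a minimal sketch. The array contents and the labels `r1`..`r3`, `x`, and `y` are hypothetical values chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, chosen only to exercise each parameter.
values = np.array([[1, 2], [3, 4], [5, 6]])

df = pd.DataFrame(
    data=values,
    index=["r1", "r2", "r3"],  # row labels
    columns=["x", "y"],        # column labels
    dtype=float,               # cast every column to float64
)
print(df)

# Without index/columns, both axes fall back to 0..n-1.
default_df = pd.DataFrame(values)
print(list(default_df.index), list(default_df.columns))
```

Passing explicit labels is optional, but it usually makes later selection by label easier to read.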
A pandas DataFrame can be created from various inputs:
- Lists
- dict
- Series
- NumPy ndarrays
- Another DataFrame
In the subsequent sections of this chapter, we will see how to create DataFrames using these inputs.
The most basic DataFrame that can be created is an empty DataFrame.
# Filename: pandas.py
# author by: www.oldtoolbag.com
# Import the pandas package under its usual alias
import pandas as pd
df = pd.DataFrame()
print(df)
Running Results:
Empty DataFrame
Columns: []
Index: []
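An empty frame can also be detected in code rather than by reading the printed output. The sketch below uses the `empty` and `shape` attributes. The names `empty_df` and `filled_df` and the sample names are hypothetical.

```python
import pandas as pd

empty_df = pd.DataFrame()
print(empty_df.empty)  # True when the frame holds no data
print(empty_df.shape)  # (rows, columns)

# An empty frame can be filled later by assigning a column.
filled_df = empty_df.copy()
filled_df["Name"] = ["Alex", "Bob"]
print(filled_df.empty)
```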
A DataFrame can be created from a single list or from a list of lists.
# Import the pandas package under its usual alias
import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print(df)
Running Results:
   0
0  1
1  2
2  3
3  4
4  5
# Import the pandas package under its usual alias
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
print(df)
Running Results:
     Name  Age
0    Alex   10
1     Bob   12
2  Clarke   13
# Import the pandas package under its usual alias
import pandas as pd
data = [['Alex',10],['Bob',12],['Clarke',13]]
# Convert only the numeric column to float: passing dtype=float for the
# whole frame can raise an error in recent pandas versions, because the
# 'Name' column holds strings
df = pd.DataFrame(data,columns=['Name','Age']).astype({'Age': float})
print(df)
Running Results:
     Name   Age
0    Alex  10.0
1     Bob  12.0
2  Clarke  13.0
When a DataFrame is created from a dict of ndarrays or lists, all the arrays must have the same length. If an index is passed, its length must equal the length of the arrays.
If no index is passed, the default index will be range(n), where n is the length of the array.
# Import the pandas package under its usual alias
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
print(df)
Running Results:
    Name  Age
0    Tom   28
1   Jack   34
2  Steve   29
3  Ricky   42
Now let's create an indexed DataFrame from the same arrays.
# Import the pandas package under its usual alias
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])
print(df)
Running Results:
        Name  Age
rank1    Tom   28
rank2   Jack   34
rank3  Steve   29
rank4  Ricky   42
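The length rule for dict input can be checked directly. The sketch below assumes that pandas signals both kinds of mismatch with a `ValueError`. The dicts `bad` and `good` and their values are hypothetical.

```python
import pandas as pd

# Hypothetical data: 'Age' has one value fewer than 'Name'.
bad = {"Name": ["Tom", "Jack", "Steve"], "Age": [28, 34]}
try:
    pd.DataFrame(bad)
    length_error = None
except ValueError as exc:
    length_error = exc
print("Unequal array lengths:", length_error)

# The arrays agree with each other here, but the index has three labels.
good = {"Name": ["Tom", "Jack"], "Age": [28, 34]}
try:
    pd.DataFrame(good, index=["rank1", "rank2", "rank3"])
    index_error = None
except ValueError as exc:
    index_error = exc
print("Index length mismatch:", index_error)
```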
A list of dictionaries can be passed as input data to create a DataFrame. By default, dictionary keys are used as column names.
The following example demonstrates how to create a DataFrame by passing a list of dictionaries.
# Import the pandas package under its usual alias
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
print(df)
Running Results:
   a   b     c
0  1   2   NaN
1  5  10  20.0
The following example demonstrates how to create a DataFrame by passing a list of dictionaries and row indices.
# Import the pandas package under its usual alias
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data, index=['first', 'second'])
print(df)
Running Results:
        a   b     c
first   1   2   NaN
second  5  10  20.0
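A key that is missing from one of the dictionaries shows up as NaN, which also turns that column into floats. The sketch below inspects this and then fills the gap. The fill value 0 is a hypothetical choice, not something the tutorial prescribes.

```python
import pandas as pd

records = [{"a": 1, "b": 2}, {"a": 5, "b": 10, "c": 20}]
df = pd.DataFrame(records)

# 'c' is absent from the first dictionary, so that cell is NaN
# and the whole column is stored as float64.
print(df["c"].isna().tolist())
print(df.dtypes)

# Hypothetical cleanup: fill the gap with 0 and restore an integer column.
filled = df.fillna({"c": 0}).astype({"c": int})
print(filled)
```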
The following example demonstrates how to create a DataFrame from a list of dictionaries, row indices, and column indices.
# Import the pandas package under its usual alias
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
# Two column labels, both matching dictionary keys
df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])
# Two column labels, one of which ('b1') is not a dictionary key
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
print(df1)
print(df2)
Running Results:
#df1 output
        a   b
first   1   2
second  5  10
#df2 output
        a  b1
first   1 NaN
second  5 NaN
Note that df2 was created with a column label, 'b1', that is not a dictionary key, so that column is filled with NaN.
A dict of Series can be passed to form a DataFrame. The resulting index is the union of the indices of all the Series passed.
# Import the pandas package under its usual alias
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df)
Running Results:
   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4
The first Series has no label 'd', so in the result the value for label 'd' in column 'one' is NaN.
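The index union can be seen more clearly when neither Series contains all the labels. In this sketch the labels of `two` are shifted on purpose. The data is hypothetical.

```python
import pandas as pd

one = pd.Series([1, 2, 3], index=["a", "b", "c"])
# Hypothetical labels: 'two' lacks 'a' and adds 'd' and 'e'.
two = pd.Series([1, 2, 3, 4], index=["b", "c", "d", "e"])

df = pd.DataFrame({"one": one, "two": two})
print(df)

# Count the NaN cells that the alignment introduced in each column.
print(df.isna().sum())
```

Each column keeps its own values and receives NaN wherever its Series had no entry for a label in the union.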
Now let's understand column selection, addition, and deletion through an example.
We will select a column from the DataFrame to understand this.
# Import the pandas package under its usual alias
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df['one'])
Running Results:
a    1.0
b    2.0
c    3.0
d    NaN
Name: one, dtype: float64
We will understand this by adding a new column to the existing DataFrame.
# Import the pandas package under its usual alias
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
# Add a new column to an existing DataFrame object by passing a new Series
print("Add a new column by passing as Series:")
df['three']=pd.Series([10,20,30],index=['a','b','c'])
print(df)
print("Add a new column using existing columns in DataFrame:")
df['four']=df['one']+df['three']
print(df)
Running Results:
Add a new column by passing as Series:
   one  two  three
a  1.0    1   10.0
b  2.0    2   20.0
c  3.0    3   30.0
d  NaN    4    NaN
Add a new column using existing columns in DataFrame:
   one  two  three  four
a  1.0    1   10.0  11.0
b  2.0    2   20.0  22.0
c  3.0    3   30.0  33.0
d  NaN    4    NaN   NaN
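Assigning to `df['name']` changes the frame in place. When the original should stay untouched, `DataFrame.assign` is one alternative that returns a new frame. The sketch below uses hypothetical data and is not the approach taken in the tutorial's own example.

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0], "three": [10.0, 20.0, 30.0]},
    index=["a", "b", "c"],
)

# assign() returns a copy with the extra column; df itself is unchanged.
with_sum = df.assign(four=df["one"] + df["three"])
print(with_sum)
print(list(df.columns))
```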
Columns can be deleted or popped; let's understand how with an example.
# Import the pandas package under its usual alias
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
   'three' : pd.Series([10,20,30], index=['a','b','c'])}
df = pd.DataFrame(d)
print("Our dataframe is:")
print(df)
# using the del statement
print("Deleting the first column using del:")
del df['one']
print(df)
# using the pop method
print("Deleting another column using pop:")
df.pop('two')
print(df)
Running Results:
Our dataframe is:
   one  two  three
a  1.0    1   10.0
b  2.0    2   20.0
c  3.0    3   30.0
d  NaN    4    NaN
Deleting the first column using del:
   two  three
a    1   10.0
b    2   20.0
c    3   30.0
d    4    NaN
Deleting another column using pop:
   three
a   10.0
b   20.0
c   30.0
d    NaN
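The two ways of removing a column differ in what they return: `pop` hands back the removed column, while `del` returns nothing. A third option, `drop(columns=...)`, leaves the original frame untouched. The sketch below uses hypothetical data to show the difference.

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2], "two": [3, 4], "three": [5, 6]})

# pop() removes the column in place and returns it as a Series.
popped = df.pop("two")
print(popped.tolist())

# drop() returns a new frame; df still has 'one' and 'three'.
trimmed = df.drop(columns=["one"])
print(list(df.columns), list(trimmed.columns))
```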
Now, let's understand the concept of row selection, addition, and deletion through an example. Let's start with the concept of selection.
Rows can be selected by passing a row label to the loc indexer.
# Import the pandas package under its usual alias
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df.loc['b'])
Running Results:
one    2.0
two    2.0
Name: b, dtype: float64
The result is a Series whose labels are the column names of the DataFrame, and whose name is the row label used to retrieve it.
Rows can be selected by passing an integer position to the iloc indexer.
# Import the pandas package under its usual alias
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df.iloc[2])
Running Results:
one    3.0
two    3.0
Name: c, dtype: float64
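The difference between label-based and position-based selection is easiest to see when the row labels are themselves integers. In this sketch the labels 10, 20, 30 and the `score` column are hypothetical.

```python
import pandas as pd

# Hypothetical frame whose row labels are integers that are not positions.
df = pd.DataFrame({"score": [90, 80, 70]}, index=[10, 20, 30])

by_label = df.loc[20]     # the row whose label is 20
by_position = df.iloc[2]  # the third row, whatever its label

print(by_label["score"], by_position["score"])
```

Here `df.loc[2]` would raise a `KeyError`, since no row carries the label 2.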
You can use the ':' operator to select multiple rows.
# Import the pandas package under its usual alias
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
   'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df[2:4])
Running Results:
   one  two
c  3.0    3
d  NaN    4
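Slices behave differently depending on whether they use positions or labels: a positional slice excludes its end point, while a label slice through `loc` includes it. The sketch below uses a hypothetical one-column frame.

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3, 4]}, index=["a", "b", "c", "d"])

# Positional slice: rows at positions 1 and 2; position 3 is excluded.
by_position = df.iloc[1:3]

# Label slice: both end labels are included.
by_label = df.loc["b":"d"]

print(list(by_position.index))
print(list(by_label.index))
```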
Use pd.concat to add new rows to a DataFrame; the new rows are appended at the end. (Older tutorials use the DataFrame.append method for this, but it was removed in pandas 2.0.)
# Import the pandas package under its usual alias
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])
# Stack the rows of df2 below the rows of df
df = pd.concat([df, df2])
print(df)
Running Results:
   a  b
0  1  2
1  3  4
0  5  6
1  7  8
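The result above keeps the original row labels, so 0 and 1 each appear twice. When unique labels are preferred, `pd.concat` (the replacement for `DataFrame.append` in pandas 2.0 and later) accepts `ignore_index=True`. This is a minimal sketch, and the name `combined` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=["a", "b"])

# ignore_index=True discards the old labels and renumbers the rows 0..n-1.
combined = pd.concat([df, df2], ignore_index=True)
print(combined)
```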
Use index labels to delete or drop rows from a DataFrame. If a label is duplicated, all rows with that label are dropped.
Notice that in the example above the labels are duplicated. Let's drop one label and see how many rows are removed.
# Import the pandas package under its usual alias
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])
# Stack the rows of df2 below the rows of df (labels 0 and 1 repeat)
df = pd.concat([df, df2])
# Drop rows with label 0
df = df.drop(0)
print(df)
Running Results:
   a  b
1  3  4
1  7  8
In the above example, two rows were dropped because both carried the same label 0.
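If only one of the rows sharing a label should go, one option is to renumber the rows first so that every label is unique. The sketch below uses `reset_index(drop=True)` for that. The variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=["a", "b"])
stacked = pd.concat([df, df2])  # row labels are 0, 1, 0, 1

# Dropping by the duplicated label removes both matching rows.
dropped_by_label = stacked.drop(0)

# Renumbering first makes each label unique, so only one row is removed.
renumbered = stacked.reset_index(drop=True)
dropped_one = renumbered.drop(0)

print(len(dropped_by_label), len(dropped_one))
```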