
Pandas Data Structures

Pandas has three commonly used data structures:

Series
DataFrame
Panel

These data structures are built on top of NumPy arrays, so vectorized operations on them run fast.
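As a minimal sketch of what "built on top of NumPy arrays" means in practice, the snippet below pulls the underlying array out of a small numeric Series and applies element-wise arithmetic without a Python loop. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 23, 56])

# The values of a numeric Series can be exposed as a NumPy ndarray.
values = s.to_numpy()
print(type(values))

# Arithmetic is applied element-wise, with no explicit Python loop.
doubled = s * 2
print(doubled.tolist())
```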

Python, NumPy, and Pandas comparison

Python

list: Python's built-in data type, mainly used for one-dimensional data; simple functionality, low efficiency
dict: Python's built-in data type, multi-dimensional key-value pairs; low efficiency

NumPy

ndarray: NumPy's basic data type, holding a single data type
Focuses on data structure, operations, and dimensions (the relationships within the data)

Pandas

Series: a one-dimensional data type, similar to a 1-D ndarray with an index
DataFrame: a two-dimensional, table-like data type, similar to a 2-D ndarray with row and column indexes
Focuses on the relationship between data and index (practical data applications)

From the perspective of practicality, functionality, and operability: list < ndarray < Series/DataFrame

In data standardization and analysis work, ndarrays serve as a necessary supplement, and most data should use Pandas data types.
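The comparison above can be sketched with one small dataset held in each of the three containers. The prices and fruit labels are made-up illustration data, not part of the original text.

```python
import numpy as np
import pandas as pd

prices_list = [1.5, 2.0, 3.25]
prices_array = np.array(prices_list)
prices_series = pd.Series(prices_list, index=["apple", "pear", "plum"])

# list: element-wise arithmetic needs an explicit loop or comprehension.
list_doubled = [p * 2 for p in prices_list]

# ndarray: arithmetic is vectorized, but elements are addressed by position.
array_doubled = prices_array * 2

# Series: vectorized like an ndarray, and elements can be addressed by label.
series_doubled = prices_series * 2
print(series_doubled["pear"])
```

The Series carries its index through the computation, which is the "relationship between data and index" that the ndarray alone does not track.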

The best way to consider these data structures is that high-dimensional data structures are containers for low-dimensional data structures. For example, DataFrame is a container for Series, and Panel is a container for DataFrame.

Data structure    Dimension    Description
Series            1            Stores one-dimensional, sequence-like data
DataFrame         2            A more complex data structure that stores two-dimensional, tabular data
Panel             3            General 3D labeled array of variable size
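To illustrate the "container" idea for the two lower-dimensional structures, the sketch below selects one column from a small DataFrame and checks that it comes back as a Series sharing the DataFrame's row index. The sample values are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"Age": [32, 28], "Rating": [3.45, 4.6]},
    index=["Steve", "Lia"],
)

# Selecting a single column returns a Series.
age = df["Age"]
print(type(age).__name__)

# The Series shares the row index of the DataFrame it came from.
print(age.index.equals(df.index))
```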

Building and processing two-dimensional arrays is tedious, and when writing functions, users need to keep the orientation of the dataset in mind. Pandas data structures reduce that effort.
For example, for tabular data (DataFrame), it is more helpful to think semantically of the index (the rows) and the columns than of axis 0 and axis 1.
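A short sketch of the difference between thinking in labels and thinking in axis numbers follows. The region and quarter labels are invented for the example; `axis="index"` and `axis="columns"` are the named equivalents of `axis=0` and `axis=1`.

```python
import pandas as pd

df = pd.DataFrame(
    {"q1": [10, 20], "q2": [30, 40]},
    index=["north", "south"],
)

# Label-based access names the row and the column directly.
cell = df.loc["south", "q2"]

# Named axes can replace the numeric axis arguments.
col_totals = df.sum(axis="index")    # same as axis=0: one total per column
row_totals = df.sum(axis="columns")  # same as axis=1: one total per row

print(cell)
print(col_totals.to_dict())
print(row_totals.to_dict())
```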

Variability

All Pandas data structures are variable in value (their contents can be changed). All of them except Series are also variable in size; Series is size-invariant.
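The following sketch shows both kinds of variability under simple assumptions: a value in a Series is overwritten in place while its length stays the same, and a DataFrame grows by one column. The sample data is illustrative.

```python
import pandas as pd

# Value variability: overwrite an element of a Series in place.
s = pd.Series([10, 23, 56])
s.iloc[0] = 99
print(s.tolist(), len(s))

# Size variability: assigning a new column enlarges the DataFrame.
df = pd.DataFrame({"Name": ["Steve", "Lia"]})
shape_before = df.shape
df["Age"] = [32, 28]
print(shape_before, df.shape)
```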

Note: DataFrame is widely used and is one of the most important data structures. Panel is used much less, and it was removed from pandas in version 0.25.0.

Series

Series is a one-dimensional array-like structure with uniform data. For example, the following series is a collection of the integers 10, 23, 56, ...

10 23 56 17 52 61 73 90 26 72

Key Points

Same data type
Size is invariant
Data values are variable
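As a minimal sketch, the integer series shown above can be built with the `pd.Series` constructor. When no index is supplied, pandas assigns the default positions 0 through 9.

```python
import pandas as pd

s = pd.Series([10, 23, 56, 17, 52, 61, 73, 90, 26, 72])

# All elements share one integer dtype.
print(s.dtype)

# One dimension, ten elements.
print(s.shape)

# Positional access to the third element.
third = s.iloc[2]
print(third)
```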

Data Frames

DataFrame is a two-dimensional array with heterogeneous data. For example,

Name     Age    Gender    Rating
Steve    32     Male      3.45
Lia      28     Female    4.6
Vin      45     Male      3.9
Katie    38     Female    2.78

The table above holds data for an organization's sales team along with each member's overall performance rating. The data is laid out in rows and columns, where each column represents an attribute and each row represents a person.

Data Type of the Column

Column    Type
Name      String
Age       Integer
Gender    String
Rating    Float

Key Points

Heterogeneous data
Size is variable
Data is variable
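The sales-team table can be sketched as a DataFrame built from a dict of columns. This is one of several possible constructions, and the exact dtype reported for the string columns depends on the pandas version (commonly `object`).

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Steve", "Lia", "Vin", "Katie"],
        "Age": [32, 28, 45, 38],
        "Gender": ["Male", "Female", "Male", "Female"],
        "Rating": [3.45, 4.6, 3.9, 2.78],
    }
)

# Each column keeps its own dtype, which is what makes the data heterogeneous.
print(df.dtypes)

# Four rows (people) by four columns (attributes).
print(df.shape)
```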

Panel

A Panel is a three-dimensional data structure with heterogeneous data. It is difficult to represent a panel graphically. However, a panel can be described as a container for DataFrames.

Key Points

Heterogeneous data
Size is variable
Data is variable
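Because `Panel` was removed from pandas in version 0.25.0, the sketch below does not use it. It only approximates the "container for DataFrames" idea with a plain dict of DataFrames, then stacks them with `pd.concat` so the dict keys become an outer index level. The quarter labels and ratings are invented for illustration.

```python
import pandas as pd

q1 = pd.DataFrame({"Age": [32, 28], "Rating": [3.45, 4.6]}, index=["Steve", "Lia"])
q2 = pd.DataFrame({"Age": [32, 28], "Rating": [3.6, 4.4]}, index=["Steve", "Lia"])

# A dict maps an item label to each two-dimensional DataFrame.
frames = {"Q1": q1, "Q2": q2}

# Concatenating a dict uses its keys as the outer level of a MultiIndex.
stacked = pd.concat(frames)

# Three coordinates locate one value: item, row, column.
lia_q2_rating = stacked.loc[("Q2", "Lia"), "Rating"]
print(lia_q2_rating)
```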