English | 简体中文 | 繁體中文 | Русский язык | Français | Español | Português | Deutsch | 日本語 | 한국어 | Italiano | بالعربية
Pandas has three commonly used data structures
Series DataFrame Panel
These data structures are built on top of Numpy arrays, which means they run very fast.
list: Python's built-in data type, mainly used for one-dimensional, simple functionality, low efficiency Dict: Python's built-in data type, multi-dimensional key-value pairs, low efficiency
ndarray: Numpy basic data type, single data type Focus on data structure/Operation/Dimension (relationship between data)
Series:1A dimension, similar to that with an index1ndarray DataFrame:2A table-like data type, similar to that with row/column indexes2ndarray, a one-dimensional data type, focuses on the relationship between data and index (actual data application)
From the perspective of practicality, functionality, and operability: list < ndarray < Series/DataFrame
In data standardization and analysis work, ndarray arrays serve as a necessary supplement, and most data should use Pandas data types
The best way to consider these data structures is that high-dimensional data structures are containers for low-dimensional data structures. For example, DataFrame is a container for Series, and Panel is a container for DataFrame.
Data structure | Dimension | Description |
Series | 1 | Used to store one-dimensional data of a sequence |
Data Frames | 2 | DataFrame, as a more complex data structure, is used to store multi-dimensional data |
Panel | 3 | General3D label, an array of variable size. |
Establishing and processing two-dimensional arrays is a tedious task, and when writing functions, users need to consider the orientation of the dataset. However, using Pandas data structures can reduce the user's effort.
For example, for table data (DataFrame), considering the index (row) and column semantically is more important than considering axis 0 and axis1is more helpful above.
All Pandas data structures are variable in value (can be changed), except for Series, the sizes of others are variable. Series is size-invariant.
Note -DataFrame is widely used and is one of the most important data structures. Panels are used much less.
Series is a one-dimensional array-like structure with uniform data. For example, the following series is an integer10,23,56Set...
10 | 23 | 56 | 17 | 52 | 61 | 73 | 90 | 26 | 72 |
Series is a one-dimensional array-like structure with uniform data. For example, the following series is an integer10,23,56Set...
Same Type Data Size is Invariant Variable Data Values
DataFrame is a two-dimensional array with heterogeneous data. For example,
Name | Age | Gender | Rating |
Steve | 32 | Male | 3.45 |
Lia | 28 | Female | 4.6 |
Vin | 45 | Male | 3.9 |
Katie | 38 | Female | 2.78 |
The table above represents the data of the sales team of the organization and its overall performance rating, with data represented by rows and columns, where each column represents an attribute and each row represents a person.
Column | Type |
Name | String |
Age | Integer |
Gender | String |
Rating | Float |
Heterogeneous Data Size is Invariant Data is Variable
A Panel is a three-dimensional data structure with heterogeneous data. It is difficult to represent a panel graphically. However, a panel can be described as a container for DataFrames.
Heterogeneous Data Size is Variable Data is Variable