
Pandas Data Structures

Pandas has three commonly used data structures: Series, DataFrame, and Panel.

These data structures are built on top of NumPy arrays, so operations on whole columns or arrays run as fast vectorized code rather than Python-level loops.

Python, NumPy, and Pandas comparison

Python

list: Python's built-in data type, mainly used for one-dimensional data; simple functionality, low efficiency
dict: Python's built-in data type for key-value pairs, which can be nested to hold multi-dimensional data; low efficiency

NumPy

ndarray: NumPy's basic data type; all elements share a single data type
Focuses on data structure, operations, and dimensions (the relationship between data)

Pandas

Series: a one-dimensional data type, similar to a 1-D ndarray with an index
DataFrame: a two-dimensional, table-like data type, similar to a 2-D ndarray with row and column indexes
Both focus on the relationship between data and index (practical data applications)

From the perspective of practicality, functionality, and operability: list < ndarray < Series/DataFrame

In data standardization and analysis work, ndarrays serve as a necessary supplement, but most data should be held in Pandas data types.
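The comparison above can be made concrete with a small sketch. The values and the labels "a", "b", "c" are chosen only for illustration; the point is that the same doubling operation needs an explicit loop for a list, is vectorized for an ndarray, and additionally keeps labels for a Series.

```python
import numpy as np
import pandas as pd

values = [10, 23, 56]

# Plain list: no element-wise arithmetic, so a loop/comprehension is needed.
doubled_list = [v * 2 for v in values]

# ndarray: a single dtype for all elements, vectorized arithmetic.
arr = np.array(values)
doubled_arr = arr * 2

# Series: ndarray-like values plus an index of labels (illustrative labels).
s = pd.Series(values, index=["a", "b", "c"])
doubled_s = s * 2

print(doubled_list)           # [20, 46, 112]
print(doubled_arr.tolist())   # [20, 46, 112]
print(doubled_s["b"])         # 46
```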

The best way to consider these data structures is that high-dimensional data structures are containers for low-dimensional data structures. For example, DataFrame is a container for Series, and Panel is a container for DataFrame.
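As a minimal sketch of the container idea, two Series can be combined into a DataFrame, and selecting a column returns a Series again. The names and numbers here are small sample values used only for illustration.

```python
import pandas as pd

ages = pd.Series([32, 28], index=["Steve", "Lia"], name="Age")
ratings = pd.Series([3.45, 4.6], index=["Steve", "Lia"], name="Rating")

# Combine the two Series side by side; each one becomes a column.
df = pd.concat([ages, ratings], axis=1)

# Selecting a single column gives back a Series.
col = df["Age"]
print(type(col).__name__)   # Series
print(col["Lia"])           # 28
```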

Data structure	Dimension	Description
Series	1	One-dimensional labeled array of homogeneous data
DataFrame	2	Two-dimensional labeled table whose columns may hold different data types
Panel	3	General three-dimensional labeled array of variable size

Building and processing arrays of two or more dimensions is tedious, because users must keep track of the orientation of the dataset when writing functions. Using Pandas data structures reduces this effort.
For example, with tabular data (DataFrame), it is more helpful semantically to think of the index (the rows) and the columns than of axis 0 and axis 1.
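The following sketch illustrates the difference, assuming a small sample table. Label-based access names the row and column directly, and reductions accept the axis names "index" and "columns" as aliases for 0 and 1.

```python
import pandas as pd

df = pd.DataFrame(
    {"Age": [32, 28, 45], "Rating": [3.45, 4.6, 3.9]},
    index=["Steve", "Lia", "Vin"],
)

# Label-based access reads naturally: row "Lia", column "Age".
lia_age = df.loc["Lia", "Age"]

# Axis names can be spelled out instead of using 0 and 1.
col_means = df.mean(axis="index")    # same as axis=0: one value per column
row_sums = df.sum(axis="columns")    # same as axis=1: one value per row

print(lia_age)            # 28
print(col_means["Age"])   # 35.0
```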

Variability

All Pandas data structures are variable in value (their values can be changed in place). Their sizes are also variable, except for Series, which is size-invariant.

Note: DataFrame is widely used and is one of the most important data structures. Panel is used much less.
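A short sketch of what changing values and sizes looks like in practice, using made-up sample data: values are overwritten in place in both a Series and a DataFrame, and the DataFrame grows by one column.

```python
import pandas as pd

s = pd.Series([10, 23, 56])
s.iloc[0] = 99                 # change a value in place

df = pd.DataFrame({"Name": ["Steve", "Lia"], "Age": [32, 28]})
df.loc[0, "Age"] = 33          # change a value in place
df["Rating"] = [3.45, 4.6]     # grow the DataFrame by one column

print(s.tolist())              # [99, 23, 56]
print(df.shape)                # (2, 3)
```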

Series

Series is a one-dimensional array-like structure with homogeneous data. For example, the following series is a collection of the integers 10, 23, 56, ...

10	23	56	17	52	61	73	90	26	72

Key Points

Homogeneous data (all values share one type)
Size is invariant
Data values are variable
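As a minimal sketch, the integer series shown above can be built from a plain list. The default integer index (0 to 9) is assigned automatically because no index is passed.

```python
import pandas as pd

s = pd.Series([10, 23, 56, 17, 52, 61, 73, 90, 26, 72])

# All elements share one integer dtype.
print(pd.api.types.is_integer_dtype(s.dtype))   # True
print(len(s))                                   # 10
print(s.iloc[2])                                # 56
```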

DataFrame

DataFrame is a two-dimensional array with heterogeneous data. For example:

Name	Age	Gender	Rating
Steve	32	Male	3.45
Lia	28	Female	4.6
Vin	45	Male	3.9
Katie	38	Female	2.78

The table above shows data for an organization's sales team together with each member's overall performance rating. Each row represents a person and each column represents an attribute.

Data Types of the Columns

Column	Type
Name	String
Age	Integer
Gender	String
Rating	Float
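The sales-team table can be reproduced as a DataFrame in a few lines. This is only a sketch of one way to build it (from a dict of columns); the exact dtype reported for the string columns depends on the pandas version.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Steve", "Lia", "Vin", "Katie"],
        "Age": [32, 28, 45, 38],
        "Gender": ["Male", "Female", "Male", "Female"],
        "Rating": [3.45, 4.6, 3.9, 2.78],
    }
)

# One dtype per column: integers for Age, floats for Rating, and
# "object" or a dedicated string dtype for Name and Gender.
print(df.dtypes)
print(df.shape)   # (4, 4)
```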

Key Points

Heterogeneous data
Size is variable
Data values are variable

Panel

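A Panel is a three-dimensional data structure with heterogeneous data. It is difficult to represent a panel graphically, but it can be described as a container for DataFrames. Note that Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so it is only available in older versions.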

Key Points

Heterogeneous data
Size is variable
Data values are variable
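Because recent pandas releases no longer include Panel, the "container for DataFrames" idea can be sketched with a substitute technique: stacking several DataFrames under keys with pd.concat, which produces a DataFrame with a two-level row index. This is not Panel itself. The keys "Q1" and "Q2" and the numbers below are hypothetical values made up for illustration.

```python
import pandas as pd

# Two same-shaped DataFrames (hypothetical sample data).
q1 = pd.DataFrame({"Age": [32, 28], "Rating": [3.45, 4.6]}, index=["Steve", "Lia"])
q2 = pd.DataFrame({"Age": [32, 28], "Rating": [3.6, 4.4]}, index=["Steve", "Lia"])

# Stack them under the keys "Q1" and "Q2"; the result has a two-level
# row index of (key, original row label).
stacked = pd.concat({"Q1": q1, "Q2": q2})

# Selecting one key gives back the corresponding DataFrame.
print(stacked.loc["Q2"].equals(q2))   # True
print(stacked.index.nlevels)          # 2
```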
