Pandas concatenation examples
Pandas provides various functions that make it easy to combine Series and DataFrame objects (and, in old pandas versions, Panel objects).
pd.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False)
objs − A sequence or mapping of Series or DataFrame objects (Panel objects in old pandas versions).
axis − {0, 1, ...}, default 0. The axis to concatenate along.
join − {'inner', 'outer'}, default 'outer'. How to handle the indexes on the other axis: 'outer' takes the union, 'inner' takes the intersection.
ignore_index − Boolean, default False. If True, the index values on the concatenation axis are not used, and the resulting axis is labelled 0, ..., n-1.
join_axes − A list of Index objects to use for the other (n-1) axes instead of performing the inner/outer set logic. This parameter was removed in pandas 1.0.
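The difference between the two join modes is easiest to see when the inputs do not share all of their columns. The following is a minimal sketch; the frames `left` and `right` are hypothetical and are not part of the examples in this tutorial.

```python
import pandas as pd

# Hypothetical frames for illustration; they share only the 'Name' column.
left = pd.DataFrame({'Name': ['Alex', 'Amy'], 'Marks_scored': [98, 90]})
right = pd.DataFrame({'Name': ['Billy', 'Brian'], 'subject_id': ['sub2', 'sub4']})

# join='outer' (the default) keeps the union of the columns
# and fills the missing cells with NaN.
outer = pd.concat([left, right], join='outer', ignore_index=True)

# join='inner' keeps only the columns present in every input.
inner = pd.concat([left, right], join='inner', ignore_index=True)

print(outer)
print(inner)
```

Because ignore_index=True is passed, both results are labelled 0 to 3 instead of repeating the original row labels.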
The concat function does all of the work of concatenating along an axis. Let's create some objects and concatenate them.
import pandas as pd

one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]},
    index=[1, 2, 3, 4, 5])
two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]},
    index=[1, 2, 3, 4, 5])

# Stack the rows of `two` below the rows of `one`
print(pd.concat([one, two]))
The running results are as follows:
   Marks_scored    Name subject_id
1            98    Alex       sub1
2            90     Amy       sub2
3            87   Allen       sub4
4            69   Alice       sub6
5            78  Ayoung       sub5
1            89   Billy       sub2
2            80   Brian       sub4
3            79    Bran       sub3
4            97   Bryce       sub6
5            88   Betty       sub5
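To label each piece of the result with the object it came from, pass the keys argument-
import pandas as pd

one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]},
    index=[1, 2, 3, 4, 5])
two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]},
    index=[1, 2, 3, 4, 5])

# Rows from `one` are labelled 'x', rows from `two` are labelled 'y'
print(pd.concat([one, two], keys=['x', 'y']))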
The running results are as follows:
     Marks_scored    Name subject_id
x 1            98    Alex       sub1
  2            90     Amy       sub2
  3            87   Allen       sub4
  4            69   Alice       sub6
  5            78  Ayoung       sub5
y 1            89   Billy       sub2
  2            80   Brian       sub4
  3            79    Bran       sub3
  4            97   Bryce       sub6
  5            88   Betty       sub5
The index values of the result are duplicated; each one is repeated. If the resulting object should have its own index, set ignore_index to True.
import pandas as pd

one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]},
    index=[1, 2, 3, 4, 5])
two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]},
    index=[1, 2, 3, 4, 5])

# ignore_index=True renumbers the rows from 0
print(pd.concat([one, two], keys=['x', 'y'], ignore_index=True))
The running results are as follows:
   Marks_scored    Name subject_id
0            98    Alex       sub1
1            90     Amy       sub2
2            87   Allen       sub4
3            69   Alice       sub6
4            78  Ayoung       sub5
5            89   Billy       sub2
6            80   Brian       sub4
7            79    Bran       sub3
8            97   Bryce       sub6
9            88   Betty       sub5
Note that the index has changed completely and the keys have been discarded as well.
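The interaction between keys and ignore_index can be checked directly. The sketch below uses shortened, hypothetical versions of the two frames (two rows each) and assumes nothing beyond pd.concat and label-based selection with .loc.

```python
import pandas as pd

# Shortened stand-ins for the tutorial's frames (assumption: two rows each).
one = pd.DataFrame({'Name': ['Alex', 'Amy'], 'Marks_scored': [98, 90]}, index=[1, 2])
two = pd.DataFrame({'Name': ['Billy', 'Brian'], 'Marks_scored': [89, 80]}, index=[1, 2])

# With keys alone, each piece is labelled in an outer index level,
# so the rows of one input can be selected again by its key.
keyed = pd.concat([one, two], keys=['x', 'y'])
print(keyed.loc['y'])

# With ignore_index=True the keys are dropped and the rows are renumbered.
renumbered = pd.concat([one, two], keys=['x', 'y'], ignore_index=True)
print(renumbered.index.tolist())
```

Use keys when the origin of each row matters later, and ignore_index when only a clean running index is needed.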
If the two objects are combined along axis=1, new columns are added.
import pandas as pd

one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]},
    index=[1, 2, 3, 4, 5])
two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]},
    index=[1, 2, 3, 4, 5])

# Place the columns of `two` to the right of the columns of `one`
print(pd.concat([one, two], axis=1))
The running results are as follows:
   Marks_scored    Name subject_id  Marks_scored   Name subject_id
1            98    Alex       sub1            89  Billy       sub2
2            90     Amy       sub2            80  Brian       sub4
3            87   Allen       sub4            79   Bran       sub3
4            69   Alice       sub6            97  Bryce       sub6
5            78  Ayoung       sub5            88  Betty       sub5
Using append to concatenate
A useful shortcut to concat is the append instance method on Series and DataFrame, available in pandas versions before 2.0. These methods actually predate concat. They concatenate along axis=0, i.e., the index-
import pandas as pd

one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]},
    index=[1, 2, 3, 4, 5])
two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]},
    index=[1, 2, 3, 4, 5])

# Append the rows of `two` to `one`
print(one.append(two))
The running results are as follows:
   Marks_scored    Name subject_id
1            98    Alex       sub1
2            90     Amy       sub2
3            87   Allen       sub4
4            69   Alice       sub6
5            78  Ayoung       sub5
1            89   Billy       sub2
2            80   Brian       sub4
3            79    Bran       sub3
4            97   Bryce       sub6
5            88   Betty       sub5
The append method can also take multiple objects-
import pandas as pd

one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5'],
    'Marks_scored': [98, 90, 87, 69, 78]},
    index=[1, 2, 3, 4, 5])
two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5'],
    'Marks_scored': [89, 80, 79, 97, 88]},
    index=[1, 2, 3, 4, 5])

# Append several objects in one call
print(one.append([two, one, two]))
The running results are as follows:
   Marks_scored    Name subject_id
1            98    Alex       sub1
2            90     Amy       sub2
3            87   Allen       sub4
4            69   Alice       sub6
5            78  Ayoung       sub5
1            89   Billy       sub2
2            80   Brian       sub4
3            79    Bran       sub3
4            97   Bryce       sub6
5            88   Betty       sub5
1            98    Alex       sub1
2            90     Amy       sub2
3            87   Allen       sub4
4            69   Alice       sub6
5            78  Ayoung       sub5
1            89   Billy       sub2
2            80   Brian       sub4
3            79    Bran       sub3
4            97   Bryce       sub6
5            88   Betty       sub5
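DataFrame.append and Series.append were deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions the append calls above raise an AttributeError. A minimal sketch of the equivalent pd.concat calls follows; the frames are shortened, hypothetical stand-ins for `one` and `two`.

```python
import pandas as pd

# Shortened stand-ins for the tutorial's frames (assumption: two rows each).
one = pd.DataFrame({'Name': ['Alex', 'Amy'], 'Marks_scored': [98, 90]}, index=[1, 2])
two = pd.DataFrame({'Name': ['Billy', 'Brian'], 'Marks_scored': [89, 80]}, index=[1, 2])

# Equivalent of one.append(two)
pair = pd.concat([one, two])

# Equivalent of one.append([two, one, two])
many = pd.concat([one, two, one, two])

print(pair)
print(many)
```

As with append, the original row labels are kept, so the index of the result contains repeated values unless ignore_index=True is passed.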
Pandas provides powerful tools for handling time series data, especially in the financial field. When dealing with time series data, we often need to:
Generate a sequence of times
Convert a time series to a different frequency
Pandas provides a relatively compact and self-contained set of tools for these tasks.
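Both tasks can be sketched in a few lines. The example below is an illustration only: the dates and values are made up, and asfreq and resample are just two of the ways pandas can change the frequency of a series.

```python
import pandas as pd

# Generate a daily sequence of six timestamps and attach some values.
idx = pd.date_range('2017-01-01', periods=6, freq='D')
daily = pd.Series(range(6), index=idx)

# Convert to a lower frequency by keeping every second day...
every_other = daily.asfreq('2D')

# ...or by aggregating the values into three-day bins.
three_day_sum = daily.resample('3D').sum()

print(every_other)
print(three_day_sum)
```

asfreq only selects the values that fall on the new grid, while resample groups all values into bins and applies an aggregation.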
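datetime.now() from the Python standard library returns the current date and time (the pd.datetime alias has been deprecated since pandas 1.0).
from datetime import datetime

print(datetime.now())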
The running results are as follows:
2017-05-11 06:10:13.393147
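The current time can also be obtained as a pandas object. A small sketch using pd.Timestamp.now, with an optional time zone:

```python
import pandas as pd

# Naive timestamp in local time
now = pd.Timestamp.now()

# Timezone-aware timestamp in UTC
now_utc = pd.Timestamp.now(tz='UTC')

print(now)
print(now_utc)
```

The result is a Timestamp, so it can be compared and combined with the other time series objects described below.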
Timestamp data is the most basic type of time series data that associates values with time points. For pandas objects, this means using time points. Let's take an example-
import pandas as pd

print(pd.Timestamp('2017-03-01'))
The running results are as follows:
2017-03-01 00:00:00
You can also convert integer or floating-point epoch times. The default unit is nanoseconds, since that is how timestamps are stored. However, epoch times are often stored in another unit, which can be specified with the unit argument. Here is an example-
import pandas as pd

print(pd.Timestamp(1587687255, unit='s'))
The running results are as follows:
2020-04-24 00:14:15
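The effect of the unit argument can be seen by passing the same instant in different units, and by leaving the unit out. This is a minimal sketch that reuses the epoch value from the example above.

```python
import pandas as pd

# The same instant expressed in seconds and in milliseconds
seconds = pd.Timestamp(1587687255, unit='s')
millis = pd.Timestamp(1587687255000, unit='ms')

# Without a unit, the integer is read as nanoseconds since the epoch,
# which lands about 1.59 seconds after 1970-01-01.
default_ns = pd.Timestamp(1587687255)

print(seconds)
print(millis)
print(default_ns)
```

Forgetting the unit is a common reason for timestamps that unexpectedly fall in 1970.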
date_range generates a sequence of times at a fixed frequency-
import pandas as pd

print(pd.date_range("11:00", "13:30", freq="30min").time)
The running results are as follows:
[datetime.time(11, 0) datetime.time(11, 30) datetime.time(12, 0) datetime.time(12, 30) datetime.time(13, 0) datetime.time(13, 30)]
Changing freq changes the step between the generated times-
import pandas as pd

print(pd.date_range("11:00", "13:30", freq="H").time)
The running results are as follows:
[datetime.time(11, 0) datetime.time(12, 0) datetime.time(13, 0)]
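Instead of an end point, date_range also accepts a number of periods. The sketch below assumes a 15-minute step, which is not used elsewhere in this tutorial, to show how the start, the count and the frequency together determine the sequence.

```python
import datetime
import pandas as pd

# Four timestamps starting at 11:00 today, 15 minutes apart
quarter_hours = pd.date_range('11:00', periods=4, freq='15min')

# Keep only the time-of-day part of each timestamp
times = list(quarter_hours.time)
print(times)
```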
To convert a Series or a list-like object of date-like values (for example strings, epoch numbers, or a mixture), use the to_datetime function. When passed a Series, it returns a Series with the same index; when passed a list-like object, it returns a DatetimeIndex. See the following example-
import pandas as pd

print(pd.to_datetime(pd.Series(['Jul 31, 2009', '2010-01-10', None])))
The running results are as follows:
0   2009-07-31
1   2010-01-10
2          NaT
dtype: datetime64[ns]
NaT stands for Not a Time (the datetime equivalent of NaN).
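Like NaN, NaT does not compare equal to anything, including itself, so missing timestamps are usually detected with isna. A short sketch with made-up input:

```python
import pandas as pd

dates = pd.to_datetime(pd.Series(['2009-07-31', None]))

# Equality checks against NaT are always False, as with NaN
nat_equals_itself = (pd.NaT == pd.NaT)

# isna marks the missing entries; dropna removes them
mask = dates.isna()
cleaned = dates.dropna()

print(nat_equals_itself)
print(mask.tolist())
print(cleaned)
```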
Let's take another example.
import pandas as pd

print(pd.to_datetime(['2005/11/23', '2010.12.31', None]))
The running results are as follows:
DatetimeIndex(['2005-11-23', '2010-12-31', 'NaT'], dtype='datetime64[ns]', freq=None)
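Real input often contains values that cannot be parsed, or dates in a day-first layout. The following sketch, with made-up input strings, shows two to_datetime options that help in those cases: errors='coerce', which turns unparseable values into NaT, and format, which states the expected layout explicitly.

```python
import pandas as pd

# Unparseable entries become NaT instead of raising an error
raw = pd.Series(['2009-07-31', 'not a date', None])
parsed = pd.to_datetime(raw, errors='coerce')

# An explicit format avoids ambiguity between day-first and month-first dates
explicit = pd.to_datetime(['23/11/2005', '31/12/2010'], format='%d/%m/%Y')

print(parsed)
print(explicit)
```

Stating the format is also a reasonable choice when all values share one layout, because nothing has to be guessed per value.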