
Concatenation in Pandas

Examples of Pandas concatenation operations

Pandas provides various functions for easily combining Series and DataFrame objects (and, in older releases, Panel objects).

 pd.concat(objs,axis=0,join='outer',join_axes=None,
 ignore_index=False)

objs − A sequence or mapping of Series or DataFrame objects (or Panel objects in older releases).

axis − {0, 1, ...}, default 0. The axis to concatenate along.

join − {'inner', 'outer'}, default 'outer'. How to handle the indexes on the other axis: 'outer' takes their union, 'inner' takes their intersection.

ignore_index − Boolean, default False. If True, the index values on the concatenation axis are not used, and the resulting axis is labelled 0, ..., n-1.

join_axes − A list of Index objects. Specific indexes to use for the other (n-1) axes instead of performing the inner/outer set logic. This parameter was removed in pandas 1.0.
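
The effect of join and ignore_index is easiest to see when the inputs do not share all of their columns. The following is a minimal sketch; the frames left and right and their column names are hypothetical and are not part of the examples in this tutorial.

```python
import pandas as pd

# Hypothetical frames that share only the "b" column.
left = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
right = pd.DataFrame({"b": [5, 6], "c": [7, 8]})

# join="outer" (the default) keeps the union of the columns; gaps become NaN.
outer = pd.concat([left, right], join="outer")

# join="inner" keeps only the columns present in every input.
inner = pd.concat([left, right], join="inner")

# ignore_index=True relabels the concatenation axis as 0, ..., n-1.
renumbered = pd.concat([left, right], join="inner", ignore_index=True)

print(outer)
print(inner)
print(renumbered)
```

With the default index, outer and inner both repeat the labels 0 and 1, while renumbered runs from 0 to 3.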

Concatenating objects

The concat function does all the work of concatenating along an axis. Let's create two DataFrames and concatenate them.

 import pandas as pd
 one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id':['sub1','sub2','sub4','sub6','sub5'],
    'Marks_scored':[98,90,87,69,78]},
    index=[1,2,3,4,5])
 two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id':['sub2','sub4','sub3','sub6','sub5'],
    'Marks_scored':[89,80,79,97,88]},
    index=[1,2,3,4,5])
 print(pd.concat([one,two]))

The running results are as follows:

    Marks_scored   Name    subject_id
1           98     Alex          sub1
2           90      Amy          sub2
3           87    Allen          sub4
4           69    Alice          sub6
5           78   Ayoung          sub5
1           89    Billy          sub2
2           80    Brian          sub4
3           79     Bran          sub3
4           97    Bryce          sub6
5           88    Betty          sub5

To associate a specific key with each of the concatenated pieces, use the keys argument:

 import pandas as pd
 one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id':['sub1','sub2','sub4','sub6','sub5'],
    'Marks_scored':[98,90,87,69,78]},
    index=[1,2,3,4,5])
 two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id':['sub2','sub4','sub3','sub6','sub5'],
    'Marks_scored':[89,80,79,97,88]},
    index=[1,2,3,4,5])
 print(pd.concat([one,two],keys=['x','y']))

The running results are as follows:

     Marks_scored    Name  subject_id
x 1            98    Alex        sub1
  2            90     Amy        sub2
  3            87   Allen        sub4
  4            69   Alice        sub6
  5            78  Ayoung        sub5
y 1            89   Billy        sub2
  2            80   Brian        sub4
  3            79    Bran        sub3
  4            97   Bryce        sub6
  5            88   Betty        sub5

The index labels of the result are duplicated: each label from 1 to 5 appears twice. If the resulting object should have its own index instead, set ignore_index to True, as in pd.concat([one,two],keys=['x','y'],ignore_index=True):

 import pandas as pd
 one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id':['sub1','sub2','sub4','sub6','sub5'],
    'Marks_scored':[98,90,87,69,78]},
    index=[1,2,3,4,5])
 two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id':['sub2','sub4','sub3','sub6','sub5'],
    'Marks_scored':[89,80,79,97,88]},
    index=[1,2,3,4,5])
 print(pd.concat([one,two],keys=['x','y'],ignore_index=True))

The running results are as follows:

    Marks_scored   Name    subject_id
0           98     Alex          sub1
1           90      Amy          sub2
2           87    Allen          sub4
3           69    Alice          sub6
4           78   Ayoung          sub5
5           89    Billy          sub2
6           80    Brian          sub4
7           79     Bran          sub3
8           97    Bryce          sub6
9           88    Betty          sub5

Note that the index is completely changed, and the keys are also overwritten.
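
The difference between keys and ignore_index can also be checked programmatically. The sketch below uses shortened, two-row versions of the one and two frames (an assumption made to keep the example small) and shows that keys adds an outer index level that can be used to select a piece back out, while ignore_index discards it.

```python
import pandas as pd

# Shortened versions of the tutorial's frames (two rows each, for brevity).
one = pd.DataFrame({"Name": ["Alex", "Amy"], "Marks_scored": [98, 90]}, index=[1, 2])
two = pd.DataFrame({"Name": ["Billy", "Brian"], "Marks_scored": [89, 80]}, index=[1, 2])

# keys adds an outer index level, so each original piece can be selected by its key.
keyed = pd.concat([one, two], keys=["x", "y"])
print(keyed.loc["y"])

# With ignore_index=True the keys are discarded and the rows are relabelled 0..n-1.
flat = pd.concat([one, two], keys=["x", "y"], ignore_index=True)
print(flat.index.tolist())
```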

If two objects are concatenated along axis=1, the columns of the second object are added as new columns, as in pd.concat([one,two],axis=1):

 import pandas as pd
 one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id':['sub1','sub2','sub4','sub6','sub5'],
    'Marks_scored':[98,90,87,69,78]},
    index=[1,2,3,4,5])
 two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id':['sub2','sub4','sub3','sub6','sub5'],
    'Marks_scored':[89,80,79,97,88]},
    index=[1,2,3,4,5])
 print(pd.concat([one,two],axis=1))

The running results are as follows:

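   Marks_scored    Name subject_id  Marks_scored   Name subject_id
1            98    Alex       sub1            89  Billy       sub2
2            90     Amy       sub2            80  Brian       sub4
3            87   Allen       sub4            79   Bran       sub3
4            69   Alice       sub6            97  Bryce       sub6
5            78  Ayoung       sub5            88  Betty       sub5

Using append to concatenate

A useful shortcut to concat is the append instance method on Series and DataFrame. These methods actually predate concat. They concatenate along axis=0, that is, along the index, as in one.append(two). Note that append was deprecated in pandas 1.4 and removed in pandas 2.0, so the append examples in this section run only on older versions.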

 import pandas as pd
 one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id':['sub1','sub2','sub4','sub6','sub5'],
    'Marks_scored':[98,90,87,69,78]},
    index=[1,2,3,4,5])
 two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id':['sub2','sub4','sub3','sub6','sub5'],
    'Marks_scored':[89,80,79,97,88]},
    index=[1,2,3,4,5])
 print(one.append(two))

The running results are as follows:

    Marks_scored   Name    subject_id
1           98     Alex          sub1
2           90      Amy          sub2
3           87    Allen          sub4
4           69    Alice          sub6
5           78   Ayoung          sub5
1           89    Billy          sub2
2           80    Brian          sub4
3           79     Bran          sub3
4           97    Bryce          sub6
5           88    Betty          sub5

The append method can also take multiple objects:

 import pandas as pd
 one = pd.DataFrame({
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id':['sub1','sub2','sub4','sub6','sub5'],
    'Marks_scored':[98,90,87,69,78]},
    index=[1,2,3,4,5])
 two = pd.DataFrame({
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id':['sub2','sub4','sub3','sub6','sub5'],
    'Marks_scored':[89,80,79,97,88]},
    index=[1,2,3,4,5])
 print(one.append([two,one,two]))

The running results are as follows:

    Marks_scored   Name    subject_id
1           98     Alex          sub1
2           90      Amy          sub2
3           87    Allen          sub4
4           69    Alice          sub6
5           78   Ayoung          sub5
1           89    Billy          sub2
2           80    Brian          sub4
3           79     Bran          sub3
4           97    Bryce          sub6
5           88    Betty          sub5
1           98     Alex          sub1
2           90      Amy          sub2
3           87    Allen          sub4
4           69    Alice          sub6
5           78   Ayoung          sub5
1           89    Billy          sub2
2           80    Brian          sub4
3           79     Bran          sub3
4           97    Bryce          sub6
5           88    Betty          sub5
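
Because DataFrame.append is no longer available in pandas 2.0 and later, the same results can be obtained there with pd.concat. The following is a minimal sketch, again using shortened two-row versions of one and two as an assumption for brevity.

```python
import pandas as pd

# Shortened versions of the tutorial's frames (two rows each, for brevity).
one = pd.DataFrame({"Name": ["Alex", "Amy"], "Marks_scored": [98, 90]}, index=[1, 2])
two = pd.DataFrame({"Name": ["Billy", "Brian"], "Marks_scored": [89, 80]}, index=[1, 2])

# Counterpart of one.append(two)
pair = pd.concat([one, two])

# Counterpart of one.append([two, one, two])
many = pd.concat([one, two, one, two])

print(pair)
print(many)
```

As with append, the original index labels are kept, so they repeat once per input frame.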

Time series

Pandas provides powerful tools for working with time series data, which is especially common in finance. When working with time series data, we frequently need to:

Generate sequences of times

Convert a time series to a different frequency

Pandas provides a relatively compact and self-contained set of tools for these tasks.

Get the current time

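Timestamp.now() returns the current date and time. Older pandas releases also exposed this as pd.datetime.now(), which was removed in pandas 2.0.

 import pandas as pd
 # pd.datetime.now() was removed in pandas 2.0; pd.Timestamp.now() works on old and new versions
 print(pd.Timestamp.now())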

The running results are as follows:

2017-05-11 06:10:13.393147

Create a timestamp

Timestamped data is the most basic type of time series data: it associates values with points in time. For pandas objects, this means using points in time. For example:

import pandas as pd
print(pd.Timestamp('2017-03-01'))

The running results are as follows:

2017-03-01 00:00:00

You can also convert integer or floating-point epoch times. The default unit for these is nanoseconds (since that is how Timestamps are stored). However, epochs are often stored in another unit, which can be specified with the unit argument. For example:

import pandas as pd
print(pd.Timestamp(1587687255,unit='s'))

The running results are as follows:

 2020-04-24 00:14:15
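
To make the role of the unit argument concrete, the sketch below builds the same instant from seconds, milliseconds, and nanoseconds. The variable names are illustrative only.

```python
import pandas as pd

seconds = 1587687255

# The same instant expressed in three different units.
from_s = pd.Timestamp(seconds, unit="s")
from_ms = pd.Timestamp(seconds * 1000, unit="ms")
from_ns = pd.Timestamp(seconds * 10**9)  # no unit given: interpreted as nanoseconds

print(from_s, from_ms, from_ns)
```

All three values compare equal, which shows that unit only changes how the number is interpreted, not the resulting point in time.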

Create a range of times

import pandas as pd
print(pd.date_range("11:00", "13:30", freq="30min").time)

The running results are as follows:

 [datetime.time(11, 0) datetime.time(11, 30) datetime.time(12, 0)
 datetime.time(12, 30) datetime.time(13, 0) datetime.time(13, 30)]

Change Time Frequency

import pandas as pd
print(pd.date_range("11:00", "13:30", freq="H").time)

The running results are as follows:

[datetime.time(11, 0) datetime.time(12, 0) datetime.time(13, 0)]
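
The two examples above differ only in the freq argument. As a small sketch of how the frequency determines the number of points, the code below pins the range to a fixed date (the date 2017-03-01 is an arbitrary assumption; the examples above use times on the current day) and also shows the periods argument as an alternative to an end time.

```python
import pandas as pd

start, end = "2017-03-01 11:00", "2017-03-01 13:30"

# Same start and end, different step sizes.
half_hours = pd.date_range(start, end, freq="30min")
hourly = pd.date_range(start, end, freq="60min")

# Instead of an end time, ask for a fixed number of points.
by_count = pd.date_range(start, periods=3, freq="60min")

print(len(half_hours), len(hourly))
print(by_count)
```

The hourly range stops at 13:00 because the next step, 14:00, would fall after the requested end.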

Convert to Timestamp

To convert a Series or a list-like object of date-like values (such as strings, epochs, or a mixture), use the to_datetime function. When passed a Series, it returns a Series with the same index, while a list-like object is converted to a DatetimeIndex. For example:

import pandas as pd
# The two strings use different date formats; pandas 2.0 and later may need format='mixed' here
print(pd.to_datetime(pd.Series(['Jul 31, 2009','2010-01-10', None])))

The running results are as follows:

 0 2009-07-31
 1 2010-01-10
 2 NaT
 dtype: datetime64[ns]

NaT means Not a Time (the datetime equivalent of NaN).

Let's take another example.

import pandas as pd
# The two strings use different separators; pandas 2.0 and later may need format='mixed' here
print(pd.to_datetime(['2005/11/23', '2010.12.31', None]))

The running results are as follows:

DatetimeIndex(['2005-11-23', '2010-12-31', 'NaT'], dtype='datetime64[ns]', freq=None)
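
Missing values are not the only source of NaT. As a minimal sketch, the errors="coerce" option of to_datetime turns strings that cannot be parsed into NaT instead of raising an exception. The input list here is made up for illustration.

```python
import pandas as pd

# One valid date, one unparseable string, and one missing value.
raw = ["2010-01-10", "not a date", None]

# errors="coerce" replaces anything that cannot be parsed with NaT.
parsed = pd.to_datetime(raw, errors="coerce")

print(parsed)
print(parsed.isna())
```

Checking isna() afterwards is a simple way to find which inputs failed to parse.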