
Categorical Data in Pandas

Examples of working with categorical data in Pandas

In practice, data often contains text columns with repeated values. Features such as gender, country/region, and codes are always repetitive. These are examples of categorical data.
Categorical variables can take on only a limited, and usually fixed, number of possible values. Besides having a fixed set of values, categorical data may also have an order, but numerical operations (such as addition or division) cannot be performed on it. Categorical is a Pandas data type.

The categorical data type is very useful in the following cases:

A string variable that contains only a few different values. Converting such a string variable to a categorical variable will save some memory.

The lexical order of a variable differs from its logical order ("one", "two", "three"). By converting to a categorical and specifying an order on the categories, sorting and min/max use the logical order instead of the lexical (alphabetical) order.

As a signal to other Python libraries that this column should be treated as a categorical variable (for example, to use suitable statistical methods or plot types).
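A minimal sketch of the first two cases, using made-up data (the colour strings and the "one"/"two"/"three" words are illustrative only). It compares the memory footprint of a plain string column with its categorical equivalent, then sorts the same words alphabetically and by a declared logical order. Exact byte counts depend on the pandas version and platform, so none are shown.

```python
import pandas as pd

# Hypothetical data: 30,000 rows that repeat just three distinct strings.
values = ["red", "green", "blue"] * 10_000
as_object = pd.Series(values)                # plain string column
as_category = as_object.astype("category")   # same values, categorical dtype

# deep=True also counts the memory held by the string objects themselves.
print(as_object.memory_usage(deep=True))
print(as_category.memory_usage(deep=True))   # much smaller: small integer codes + 3 categories

# Lexical vs. logical order.
words = pd.Series(["three", "one", "two"])
print(words.sort_values().tolist())          # ['one', 'three', 'two'] (alphabetical)

order = pd.CategoricalDtype(["one", "two", "three"], ordered=True)
ordered = words.astype(order)
print(ordered.sort_values().tolist())        # ['one', 'two', 'three'] (logical)
print(ordered.max())                         # three
```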

Object creation

Categorical objects can be created in various ways. The following describes different methods:
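Besides the pd.Categorical constructor covered next, a categorical Series can also be created by passing dtype="category" to the Series constructor or by calling astype("category") on an existing Series. A small sketch comparing the three routes on the same sample values:

```python
import pandas as pd

# 1. Pass dtype="category" to the Series constructor.
s1 = pd.Series(["a", "b", "c", "a"], dtype="category")

# 2. Convert an existing Series with astype("category").
s2 = pd.Series(["a", "b", "c", "a"]).astype("category")

# 3. Wrap a pd.Categorical object in a Series.
s3 = pd.Series(pd.Categorical(["a", "b", "c", "a"]))

for s in (s1, s2, s3):
    # All three print: category ['a', 'b', 'c']
    print(s.dtype, list(s.cat.categories))
```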

pd.Categorical

Using the standard Pandas categorical constructor, we can create a categorical object.

pandas.Categorical(values, categories=None, ordered=None)

Let's look at an example:

Example

　import pandas as pd
　cat = pd.Categorical(['a', 'b', 'c', 'a', 'b', 'c'])
　print(cat)

The output is as follows:

　[a, b, c, a, b, c]
　Categories (3, object): [a, b, c]

Let's look at another example:

Example

　import pandas as pd
　cat = pd.Categorical(['a', 'b', 'c', 'a', 'b', 'c', 'd'], ['c', 'b', 'a'])
　print(cat)

The output is as follows:

　[a, b, c, a, b, c, NaN]
　Categories (3, object): [c, b, a]

Here, the second argument specifies the categories. Any value that is not among the categories (here 'd') is replaced by NaN.
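Internally, a Categorical stores each value as an integer code that points into its categories, which is why a value outside the categories cannot be stored and shows up as NaN. The codes attribute exposes this; the sketch below reuses the example above.

```python
import pandas as pd

cat = pd.Categorical(['a', 'b', 'c', 'a', 'b', 'c', 'd'], ['c', 'b', 'a'])

# Each code is the position of the value in cat.categories (['c', 'b', 'a']).
# 'd' is not a category, so it gets the code -1, which pandas displays as NaN.
print(cat.codes.tolist())   # [2, 1, 0, 2, 1, 0, -1]
print(cat.isna().tolist())  # [False, False, False, False, False, False, True]
```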
Now, let's look at the following example:

Example

　import pandas as pd
　cat = pd.Categorical(['a', 'b', 'c', 'a', 'b', 'c', 'd'], ['c', 'b', 'a'], ordered=True)
　print(cat)

The output is as follows:

　[a, b, c, a, b, c, NaN]
　Categories (3, object): [c < b < a]

Logically, this order means a is greater than b and b is greater than c.
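Because the categories are ordered, comparisons, sorting, and min()/max() follow the declared order c < b < a rather than alphabetical order. A short sketch (the NaN-producing 'd' is left out so the results stay simple):

```python
import pandas as pd

cat = pd.Categorical(['a', 'b', 'c', 'a', 'b', 'c'], ['c', 'b', 'a'], ordered=True)

# min/max use the category order, not the alphabet.
print(cat.min(), cat.max())                    # c a

# Sorting also follows c < b < a.
print(pd.Series(cat).sort_values().tolist())   # ['c', 'c', 'b', 'b', 'a', 'a']

# Element-wise comparison with a scalar that is one of the categories.
print((cat > 'b').tolist())                    # [True, False, False, True, False, False]
```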

Description

Calling .describe() on categorical data produces output similar to that of a Series or DataFrame of strings.

Example

　import pandas as pd
　import numpy as np
　cat = pd.Categorical(["a", "c", "c", np.nan], categories=["b", "a", "c"])
　df = pd.DataFrame({"cat": cat, "s": ["a", "c", "c", np.nan]})
　print(df.describe())
　print(df["cat"].describe())

The output is as follows:

　       cat  s
　count    3  3
　unique   2  2
　top      c  c
　freq     2  2
　count     3
　unique    2
　top       c
　freq      2
　Name: cat, dtype: object
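Note that describe() counts only the values that actually occur, so the unused category b does not appear. value_counts() on a categorical Series, by contrast, lists every category, including unused ones with a count of 0. A sketch using the same data:

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(["a", "c", "c", np.nan], categories=["b", "a", "c"])

# value_counts() reports every category; NaN is dropped by default.
counts = pd.Series(cat).value_counts()
print(counts)   # c: 2, a: 1, b: 0
```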

Getting the properties of a categorical

The categories attribute returns the categories of the object: obj.categories on a Categorical, or obj.cat.categories on a categorical Series.

Example

　import pandas as pd
　import numpy as np
　s = pd.Categorical(["a", "c", "c", np.nan], categories=["b", "a", "c"])
　print(s.categories)

The output is as follows:

　Index(['b', 'a', 'c'], dtype='object')

The ordered attribute (obj.ordered on a Categorical, or obj.cat.ordered on a Series) tells whether the categories have a logical order.

Example

　import pandas as pd
　import numpy as np
　cat = pd.Categorical(["a", "c", "c", np.nan], categories=["b", "a", "c"])
　print(cat.ordered)

The output is as follows:

　False

It returns False because no order was specified; ordered defaults to False.
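When the categorical sits inside a Series, the same properties are reached through the .cat accessor, and as_ordered() returns an ordered copy without modifying the original. A brief sketch, reusing the data from the examples above:

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(["a", "c", "c", np.nan], categories=["b", "a", "c"])
s = pd.Series(cat)

# On a Series, the properties live under the .cat accessor.
print(list(s.cat.categories))   # ['b', 'a', 'c']
print(s.cat.ordered)            # False

# as_ordered() returns a copy whose categories are ordered as listed: b < a < c.
ordered_cat = cat.as_ordered()
print(ordered_cat.ordered)      # True
print(cat.ordered)              # False (the original is unchanged)
```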

Rename category

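Categories are renamed with the Series.cat.rename_categories() method, which returns a new Series. Older tutorials rename them by assigning a new value to the series.cat.categories attribute, but recent pandas versions no longer support setting that attribute directly.

Example

　import pandas as pd
　s = pd.Series(["a","b","c","a"], dtype="category")
　# rename_categories returns a new Series, so assign the result back to s
　s = s.cat.rename_categories(["Group %s" % g for g in s.cat.categories])
　print(s.cat.categories)

The output is as follows:

　Index(['Group a', 'Group b', 'Group c'], dtype='object')

The original categories [a, b, c] are replaced by the new names, in the same order.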
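rename_categories() also accepts a dict that maps old names to new ones, leaving unlisted categories untouched. In this sketch the new name "x" is arbitrary:

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")

# Only 'a' is renamed; 'b' and 'c' keep their names and positions.
renamed = s.cat.rename_categories({"a": "x"})
print(list(renamed.cat.categories))   # ['x', 'b', 'c']
print(renamed.tolist())               # ['x', 'b', 'c', 'x']
```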

Append new category

The Categorical.add_categories() method can be used to append new categories.

Example

　import pandas as pd
　s = pd.Series(["a","b","c","a"], dtype="category")
　s = s.cat.add_categories([4])
　print(s.cat.categories)

The output is as follows:

　Index(['a', 'b', 'c', 4], dtype='object')
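A categorical Series only accepts values that are already among its categories, so add_categories() is typically needed before assigning a new value. The exact exception type has varied across pandas versions (recent versions raise TypeError), so this sketch catches both TypeError and ValueError; the value "d" is just an example.

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")

# Assigning a value that is not a category is rejected.
rejected = False
try:
    s.iloc[0] = "d"
except (TypeError, ValueError) as exc:
    rejected = True
    print("rejected:", exc)

# After adding 'd' as a category, the same assignment works.
s = s.cat.add_categories(["d"])
s.iloc[0] = "d"
print(s.tolist())   # ['d', 'b', 'c', 'a']
```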

Remove category

The Categorical.remove_categories() method can be used to remove unnecessary categories.

Example

　import pandas as pd
　s = pd.Series(["a","b","c","a"], dtype="category")
　print("Original object:")
　print(s)
　print("After removal:")
　# remove_categories returns a new Series; values of the removed category become NaN
　print(s.cat.remove_categories("a"))

The output is as follows:

　Original object:
　0    a
　1    b
　2    c
　3    a
　dtype: category
　Categories (3, object): [a, b, c]
　After removal:
　0    NaN
　1      b
　2      c
　3    NaN
　dtype: category
　Categories (2, object): [b, c]
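remove_categories() turns the removed values into NaN. To drop only the categories that no value uses, remove_unused_categories() leaves the data unchanged. A sketch, where "d" is a hypothetical extra category added just for the demonstration:

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")
s = s.cat.add_categories(["d"])        # 'd' is now a category, but no value uses it

trimmed = s.cat.remove_unused_categories()
print(list(trimmed.cat.categories))    # ['a', 'b', 'c']
print(trimmed.tolist())                # ['a', 'b', 'c', 'a'] (values unchanged)
```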

Categorical Data Comparison

There are three cases where categorical data can be compared with other objects:

Equality comparisons (== and !=) with a list-like object (list, Series, array, ...) of the same length as the categorical data.

All comparisons (==, !=, >, >=, < and <=) with another categorical Series, when ordered == True and the categories are the same.

All comparisons between categorical data and scalars.

See the following example:

Example

　import pandas as pd
　# The categories= and ordered= keywords of astype() were removed from pandas;
　# a CategoricalDtype carries that information instead.
　cat_type = pd.CategoricalDtype(categories=[1, 2, 3], ordered=True)
　cat = pd.Series([1, 2, 3]).astype(cat_type)
　cat1 = pd.Series([2, 2, 2]).astype(cat_type)
　print(cat > cat1)

The output is as follows:

　0    False
　1    False
　2     True
　dtype: bool
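The example above covers the second case. The first and third cases, plus the error raised when two ordered categoricals have different categories, can be sketched as follows (the list [1, 2, 2] and the extra category 4 are arbitrary choices):

```python
import pandas as pd

cat_type = pd.CategoricalDtype(categories=[1, 2, 3], ordered=True)
cat = pd.Series([1, 2, 3]).astype(cat_type)

# Case 1: equality against a list-like of the same length.
print((cat == [1, 2, 2]).tolist())   # [True, True, False]

# Case 3: comparison against a scalar (ordering comparisons need ordered=True).
print((cat > 1).tolist())            # [False, True, True]

# Categoricals with different categories cannot be compared.
other = pd.Series([1, 2, 3]).astype(pd.CategoricalDtype([1, 2, 3, 4], ordered=True))
mismatch_raised = False
try:
    cat > other
except TypeError as exc:
    mismatch_raised = True
    print("TypeError:", exc)
```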
