Text Processing in Pandas

Pandas text processing operation examples

In this chapter, we will discuss string operations on a basic Series / Index. In the following chapters, we will learn how to apply these string functions to DataFrames.

Pandas provides a set of string functions that make it easy to manipulate string data. Most importantly, these functions skip missing (NaN) values: a NaN in the input stays NaN in the result instead of raising an error.

Almost all of these methods correspond to Python's built-in string methods (see: https://docs.python.org/3/library/stdtypes.html#string-methods). To use them, access the strings of a Series or Index through its .str accessor, then perform the operation.
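
As a minimal sketch of the .str accessor described above, the snippet below applies the same method to a Series and to an Index. The sample values and the variable names (`names`, `idx`) are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: a Series with one missing value, and an Index
names = pd.Series(['Tom', np.nan, 'SteveSmith'])
idx = pd.Index(['a', 'B', 'c'])

# String methods are reached through .str; the NaN entry stays NaN
upper_names = names.str.upper()
upper_idx = idx.str.upper()

print(upper_names)
print(upper_idx)
```

The call never touches the missing entry, so no explicit NaN check is needed before applying a string method.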

Let's look at what each operation does.

Method	Description
lower()	Convert the strings in the Series/Index to lowercase.
upper()	Convert the strings in the Series/Index to uppercase.
len()	Compute the length of each string.
strip()	Remove whitespace (including newline characters) from both sides of each string in the Series/Index.
split(' ')	Split each string on the given pattern.
cat(sep=' ')	Concatenate the Series/Index elements with the given separator.
get_dummies()	Return a DataFrame of one-hot encoded values.
contains(pattern)	Return True for each element that contains the pattern, otherwise False.
replace(a,b)	Replace occurrences of a with b.
repeat(value)	Repeat each element the specified number of times.
count(pattern)	Return the number of times the pattern appears in each element.
startswith(pattern)	Return True if the element in the Series/Index starts with the pattern.
endswith(pattern)	Return True if the element in the Series/Index ends with the pattern.
find(pattern)	Return the position of the first occurrence of the pattern.
findall(pattern)	Return a list of all occurrences of the pattern.
swapcase()	Swap the case of each character (lowercase to uppercase and vice versa).
islower()	Check whether all characters in each string in the Series/Index are lowercase. Returns a boolean value.
isupper()	Check whether all characters in each string in the Series/Index are uppercase. Returns a boolean value.
isnumeric()	Check whether all characters in each string in the Series/Index are numeric. Returns a boolean value.

Let's create a Series to see how all the above functions work.

Example

    import pandas as pd
    import numpy as np

    # Sample Series mixing names, a missing value, and a numeric string
    s = pd.Series(['Tom', 'William Rick', 'John', 'Alber@t', np.nan, '1234', 'SteveSmith'])
    print(s)

Running Result:

    0             Tom
    1    William Rick
    2            John
    3         Alber@t
    4             NaN
    5            1234
    6      SteveSmith
    dtype: object

lower()

Example

    import pandas as pd
    import numpy as np

    s = pd.Series(['Tom', 'William Rick', 'John', 'Alber@t', np.nan, '1234', 'SteveSmith'])
    # Lowercase every string; the NaN entry is left as NaN
    print(s.str.lower())

Running Result:

    0             tom
    1    william rick
    2            john
    3         alber@t
    4             NaN
    5            1234
    6      stevesmith
    dtype: object

upper()

Example

    import pandas as pd
    import numpy as np

    s = pd.Series(['Tom', 'William Rick', 'John', 'Alber@t', np.nan, '1234', 'SteveSmith'])
    # Uppercase every string; the NaN entry is left as NaN
    print(s.str.upper())

Running Result:

    0             TOM
    1    WILLIAM RICK
    2            JOHN
    3         ALBER@T
    4             NaN
    5            1234
    6      STEVESMITH
    dtype: object

len()

Example

    import pandas as pd
    import numpy as np

    s = pd.Series(['Tom', 'William Rick', 'John', 'Alber@t', np.nan, '1234', 'SteveSmith'])
    # Length of each string; the result is float because of the NaN entry
    print(s.str.len())

Running Result:

    0     3.0
    1    12.0
    2     4.0
    3     7.0
    4     NaN
    5     4.0
    6    10.0
    dtype: float64
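
The lengths above come back as floats because a plain integer column cannot hold NaN. If integer lengths are preferred, one option is pandas' nullable `Int64` dtype. This is a sketch under the assumption that the Series has object dtype (set explicitly here); the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Object dtype is requested explicitly so the behaviour matches the example above
s = pd.Series(['Tom', np.nan, 'John'], dtype=object)

lengths = s.str.len()              # float64, with NaN for the missing entry
as_int = lengths.astype('Int64')   # nullable integers, with <NA> for the missing entry

print(lengths.dtype)
print(as_int)
```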

strip()

Example

    import pandas as pd

    # 'Tom ' has a trailing space and '  William Rick' has leading spaces
    s = pd.Series(['Tom ', '  William Rick', 'John', 'Alber@t'])
    print(s)
    print("After Stripping:")
    print(s.str.strip())

Running Result:

    0              Tom 
    1      William Rick
    2              John
    3           Alber@t
    dtype: object
    After Stripping:
    0             Tom
    1    William Rick
    2            John
    3         Alber@t
    dtype: object

split(pattern)

Example

    import pandas as pd

    s = pd.Series(['Tom ', '  William Rick', 'John', 'Alber@t'])
    print(s)
    print('Split Pattern:')
    # Splitting on a single space keeps empty strings for leading/trailing spaces
    print(s.str.split(' '))

Running Result:

    0              Tom 
    1      William Rick
    2              John
    3           Alber@t
    dtype: object
    Split Pattern:
    0                [Tom, ]
    1    [, , William, Rick]
    2                 [John]
    3              [Alber@t]
    dtype: object
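
The split result is a Series of lists. As a hedged sketch of two common follow-ups, the snippet below spreads the pieces into columns with `expand=True` and picks the first piece with `.str[0]`. The name 'John Smith' and the variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical two-word names so every row splits into two pieces
s = pd.Series(['William Rick', 'John Smith'])

# expand=True returns a DataFrame with one column per piece
parts = s.str.split(' ', expand=True)

# Without expand, each element is a list; .str[0] takes the first item of each
first = s.str.split(' ').str[0]

print(parts)
print(first)
```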

cat(sep=pattern)

Example

    import pandas as pd

    s = pd.Series(['Tom ', '  William Rick', 'John', 'Alber@t'])
    # Join all elements into one string, separated by '_'
    print(s.str.cat(sep='_'))

Running Result:

    Tom _  William Rick_John_Alber@t

get_dummies()

Example

    import pandas as pd

    s = pd.Series(['Tom ', '  William Rick', 'John', 'Alber@t'])
    # One indicator column per distinct string
    print(s.str.get_dummies())

Running Result:

         William Rick  Alber@t  John  Tom 
    0               0        0     0     1
    1               1        0     0     0
    2               0        0     1     0
    3               0        1     0     0
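
get_dummies() can also split each string on a separator before encoding, which is where it tends to be most useful. The sketch below uses invented tag strings and passes `sep='|'` explicitly.

```python
import pandas as pd

# Hypothetical data: each element holds one or more tags separated by '|'
tags = pd.Series(['a|b', 'a', 'b|c'])

# Each distinct tag becomes a column; 1 marks that the tag is present in the row
dummies = tags.str.get_dummies(sep='|')

print(dummies)
```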

contains(pattern)

Example

    import pandas as pd

    s = pd.Series(['Tom ', '  William Rick', 'John', 'Alber@t'])
    # True where the string contains a space
    print(s.str.contains(' '))

Running Result:

    0     True
    1     True
    2    False
    3    False
    dtype: bool
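
A boolean result like the one above is often used to filter rows. The following is a sketch with invented sample values: `na=False` makes missing entries count as "no match" so the mask is usable for indexing, and `regex=False` treats the pattern as literal text.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value
s = pd.Series(['Tom', 'William Rick', np.nan, 'Rick Jones'])

# na=False turns the NaN entry into False instead of leaving it missing
mask = s.str.contains('Rick', na=False)
matches = s[mask]

# regex=False matches '.' as a literal dot rather than "any character"
has_dot = pd.Series(['a.b', 'ab']).str.contains('.', regex=False)

print(matches)
print(has_dot)
```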

replace(a,b)

Example

    import pandas as pd

    s = pd.Series(['Tom ', '  William Rick', 'John', 'Alber@t'])
    print(s)
    print('After replacing @ with $:')
    print(s.str.replace('@', '$'))

Running Result:

    0              Tom 
    1      William Rick
    2              John
    3           Alber@t
    dtype: object
    After replacing @ with $:
    0              Tom 
    1      William Rick
    2              John
    3           Alber$t
    dtype: object
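
Whether the first argument of replace() is read as a regular expression depends on the `regex` parameter, and its default has differed between pandas versions, so passing it explicitly is the safer habit. A small sketch with invented sample values:

```python
import pandas as pd

# Hypothetical sample data containing characters that are special in regexes
s = pd.Series(['Alber@t', 'a.b.c'])

# Literal replacement: only real dots are replaced
literal = s.str.replace('.', '-', regex=False)

# Regex replacement: the character class matches either '@' or '.'
pattern = s.str.replace(r'[@.]', '_', regex=True)

print(literal)
print(pattern)
```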

repeat(value)

Example

    import pandas as pd

    s = pd.Series(['Tom ', '  William Rick', 'John', 'Alber@t'])
    # Repeat each string twice
    print(s.str.repeat(2))

Running Result:

    0                        Tom Tom 
    1      William Rick  William Rick
    2                        JohnJohn
    3                  Alber@tAlber@t
    dtype: object

count(pattern)

Example

    import pandas as pd

    s = pd.Series(['Tom ', '  William Rick', 'John', 'Alber@t'])
    print("The number of 'm' in each string:")
    print(s.str.count('m'))

Running Result:

    The number of 'm' in each string:
    0    1
    1    1
    2    0
    3    0
    dtype: int64

startswith(pattern)

Example

    import pandas as pd

    s = pd.Series(['Tom ', '  William Rick', 'John', 'Alber@t'])
    print("Strings that start with 'T':")
    print(s.str.startswith('T'))

Running Result:

    Strings that start with 'T':
    0     True
    1    False
    2    False
    3    False
    dtype: bool

endswith(pattern)

Example

    import pandas as pd

    s = pd.Series(['Tom ', '  William Rick', 'John', 'Alber@t'])
    print("Strings that end with 't':")
    print(s.str.endswith('t'))

Running Result:

    Strings that end with 't':
    0    False
    1    False
    2    False
    3     True
    dtype: bool

find(pattern)

Example

    import pandas as pd

    s = pd.Series(['Tom ', '  William Rick', 'John', 'Albert'])
    # Position of the first 'e' in each string, or -1 if there is none
    print(s.str.find('e'))

Running Result:

    0   -1
    1   -1
    2   -1
    3    3
    dtype: int64

"-1" indicates that no match was found in the element.

findall(pattern)

Example

    import pandas as pd

    s = pd.Series(['Tom ', '  William Rick', 'John', 'Albert'])
    # List of every occurrence of 'e' in each string
    print(s.str.findall('e'))

Running Result:

    0     []
    1     []
    2     []
    3    [e]
    dtype: object

An empty list ([]) indicates that no match was found in the element.

swapcase()

Example

    import pandas as pd

    s = pd.Series(['Tom', 'William Rick', 'John', 'Alber@t'])
    # Swap uppercase and lowercase letters; other characters are unchanged
    print(s.str.swapcase())

Running Result:

    0             tOM
    1    wILLIAM rICK
    2            jOHN
    3         aLBER@T
    dtype: object

islower()

Example

    import pandas as pd

    s = pd.Series(['Tom', 'William Rick', 'John', 'Albert'])
    # Every name contains an uppercase letter, so all results are False
    print(s.str.islower())

Running Result:

    0    False
    1    False
    2    False
    3    False
    dtype: bool

isupper()

Example

    import pandas as pd

    s = pd.Series(['Tom', 'William Rick', 'John', 'Albert'])
    # Every name contains a lowercase letter, so all results are False
    print(s.str.isupper())

Running Result:

    0    False
    1    False
    2    False
    3    False
    dtype: bool

isnumeric()

Example

    import pandas as pd

    s = pd.Series(['Tom', 'William Rick', 'John', 'Albert'])
    # None of the names consists only of numeric characters
    print(s.str.isnumeric())

Running Result:

    0    False
    1    False
    2    False
    3    False
    dtype: bool
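
Since the three checks above all return False for the sample names, a sketch with more varied input may make the behaviour easier to see. The sample values here are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data: mixed case, digits, a decimal, all-lower, all-upper
s = pd.Series(['Tom', '1234', '12.5', 'tom', 'TOM'])

numeric = s.str.isnumeric()   # True only when every character is numeric
lower = s.str.islower()       # True only when all cased characters are lowercase
upper = s.str.isupper()       # True only when all cased characters are uppercase

print(numeric)
print(lower)
print(upper)
```

Note that '12.5' is not reported as numeric because the dot is not a numeric character, and '1234' is neither lower nor upper because it has no cased characters.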
