I. How to split a string containing multiple delimiters?
Actual case
We need to split a string into fields, but the string contains several different delimiters, for example:
s = 'asd;aad|dasd|dasd,sdasd|asd,,Adas|sdasd;Asdasd,d|asd'
Here <,>, <;>, <|>, and <\t> are all delimiters. How should we handle them?
Solution
Calling the split() method repeatedly, processing one delimiter at a time
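# Runs on Python 2 and Python 3
def mySplit(s, ds):
    res = [s]
    for d in ds:
        t = []
        # split every current fragment on delimiter d and collect the pieces
        for x in res:
            t.extend(x.split(d))
        res = t
    # drop the empty strings produced by adjacent delimiters
    return [x for x in res if x]

s = 'asd;aad|dasd|dasd,sdasd|asd,,Adas|sdasd;Asdasd,d|asd'
result = mySplit(s, ';,|\t')
print(result)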
C:\Users\Administrator>C:\Python\Python27\python.exe E:\python-intensive-training\s2.py
['asd', 'aad', 'dasd', 'dasd', 'sdasd', 'asd', 'Adas', 'sdasd', 'Asdasd', 'd', 'asd']
Using the re.split() method of regular expressions to split on all delimiters in one pass
>>> import re
>>> re.split('[,;\t|]+', 'asd;aad|dasd|dasd,sdasd|asd,,Adas|sdasd;Asdasd,d|asd')
['asd', 'aad', 'dasd', 'dasd', 'sdasd', 'asd', 'Adas', 'sdasd', 'Asdasd', 'd', 'asd']
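When the delimiter set may include characters that are special inside a regular expression (such as ']', '^', or '-'), the character class can be built with re.escape() rather than written by hand. The following is a minimal Python 3 sketch; the helper name split_on_any is hypothetical and does not appear in the original article.

```python
import re

def split_on_any(text, delimiters):
    """Split text on any run of the given delimiter characters."""
    # re.escape() neutralizes characters such as ']' or '^' that would
    # otherwise change the meaning of the character class.
    pattern = '[' + re.escape(delimiters) + ']+'
    # Drop the empty strings that appear when text starts or ends
    # with a delimiter.
    return [piece for piece in re.split(pattern, text) if piece]

s = 'asd;aad|dasd|dasd,sdasd|asd,,Adas|sdasd;Asdasd,d|asd'
print(split_on_any(s, ';,|\t'))
```

Because the pattern ends with '+', a run of adjacent delimiters such as ',,' is treated as a single separator.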
II. How to determine if a string a starts with or ends with string b?
Actual case
If a directory has the following files:
quicksort.c graph.py heap.java install.sh stack.cpp ......
Now we need to give executable permission to the files ending with .sh and .py
Solution
Using the startswith() and endswith() methods of strings
>>> import os, stat
>>> os.listdir('./')
['heap.java', 'quicksort.c', 'stack.cpp', 'install.sh', 'graph.py']
>>> [name for name in os.listdir('./') if name.endswith(('.sh','.py'))]
['install.sh', 'graph.py']
>>> os.chmod('install.sh', os.stat('install.sh').st_mode | stat.S_IXUSR)
[root@iZ28i253je0Z t]# ls -l install.sh
-rwxr--r-- 1 root root 0 Sep 15 18:13 install.sh
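The session above changes only install.sh. To cover every matching file, the same endswith() test and chmod() call can be wrapped in a loop. This is a sketch under assumptions: the function name make_executable is hypothetical, and it assumes a POSIX system where the owner-execute bit is meaningful.

```python
import os
import stat

def make_executable(directory, suffixes=('.sh', '.py')):
    """Add the owner-execute bit to files whose names end with one of suffixes."""
    suffixes = tuple(suffixes)  # endswith() accepts a tuple, not a list
    changed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # One endswith() call checks all suffixes at once.
        if name.endswith(suffixes) and os.path.isfile(path):
            mode = os.stat(path).st_mode
            # OR-ing keeps the existing permission bits and adds S_IXUSR.
            os.chmod(path, mode | stat.S_IXUSR)
            changed.append(name)
    return changed
```

Calling make_executable('./') would return the names of the files it changed, sorted alphabetically.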
III. How to adjust the format of text in a string?
Actual case
A log file of some software, where the date format is yyyy-mm-dd:
2016-09-15 18:27:26 statu unpacked python3-pip:all
2016-09-15 19:27:26 status half-configured python3-pip:all
2016-09-15 20:27:26 status installed python3-pip:all
2016-09-15 21:27:26 configure asdasdasdas:all python3-pip:all
We need to change the dates to the US format mm/dd/yyyy, for example 2016-09-15 --> 09/15/2016. How should this be handled?
Solution
Using the re.sub() method of regular expressions to perform string replacement
Use capture groups in the pattern to capture each part of the date, then reference those groups in the replacement string in the new order, either by number or by name.
>>> log = '2016-09-15 18:27:26 statu unpacked python3-pip:all'
>>> import re
# Referencing groups by number
>>> re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\2/\3/\1', log)
'09/15/2016 18:27:26 statu unpacked python3-pip:all'
# Referencing groups by name
>>> re.sub(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})', r'\g<month>/\g<day>/\g<year>', log)
'09/15/2016 18:27:26 statu unpacked python3-pip:all'
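Because re.sub() replaces every match, the same pattern can convert a whole multi-line log in one call. The sketch below precompiles the named-group pattern; the names DATE_RE and to_us_dates are hypothetical, and the sample log text is illustrative only.

```python
import re

# Named groups keep the replacement readable and independent of group order.
DATE_RE = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')

def to_us_dates(text):
    """Rewrite every yyyy-mm-dd date in text as mm/dd/yyyy."""
    return DATE_RE.sub(r'\g<month>/\g<day>/\g<year>', text)

log = (
    '2016-09-15 19:27:26 status half-configured python3-pip:all\n'
    '2016-09-15 20:27:26 status installed python3-pip:all\n'
)
print(to_us_dates(log))
```

Compiling the pattern once is a small convenience when the function is called for many lines or files.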
IV. How to concatenate multiple small strings into a large string?
Actual case
When designing a network program, we defined a custom protocol on top of UDP that passes a series of parameters to the server in a fixed order:
hwDetect: "<0112>"
gxDepthBits: "<32>"
gxResolution: "<1024x768>"
gxRefresh: "<60>"
fullAlpha: "<1>"
lodDist: "<100.0>"
DistCull: "<500.0>"
In the program, we collect each parameter in order into a list:
["<0112>","<32>","<1024x768>","<60>","<1>","<100.0>","<500.0>"]
Finally, we need to concatenate all the parameters into a data packet for transmission:
"<0112><32><1024x768><60><1><100.0><500.0>"
Solution
Iterate through the list, using the + operator to concatenate each string in turn
>>> result = ''
>>> for n in ["<0112>","<32>","<1024x768>","<60>","<1>","<100.0>","<500.0>"]:
...     result += n
...
>>> result
'<0112><32><1024x768><60><1><100.0><500.0>'
Using the str.join() method, which concatenates all strings in the list in a single call and is faster than repeated +, because each + creates a new intermediate string
>>> result = ''.join(["<0112>","<32>","<1024x768>","<60>","<1>","<100.0>","<500.0>"])
>>> result
'<0112><32><1024x768><60><1><100.0><500.0>'
If the list contains numbers, you can use a generator expression to convert each item to a string first:
>>> hello = [222, 'sd', 232, '2e', 0.2]
>>> ''.join(str(x) for x in hello)
'222sd2322e0.2'
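In Python 3, a socket sends bytes rather than str, so the joined packet would also need to be encoded before transmission. The sketch below combines the join and the conversion; the name build_packet is hypothetical, and the ASCII encoding is an assumption about the protocol, not something the article specifies.

```python
def build_packet(params):
    """Join parameters of any type into one string and encode it as bytes."""
    # str() converts numbers; join() assembles the result in one call.
    text = ''.join(str(p) for p in params)
    # Assumption: the protocol is plain ASCII. Change the encoding if not.
    return text.encode('ascii')

params = ["<0112>", "<32>", "<1024x768>", "<60>", "<1>", "<100.0>", "<500.0>"]
print(build_packet(params))
```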
V. How to align strings to the left, right, and center?
Actual case
A dictionary stores a series of attribute values:
{
    'ip': '127.0.0.1',
    'blog': 'www.anshengme.com',
    'title': 'Hello world',
    'port': '80'
}
How to output the content in the following format in the program?
ip    : 127.0.0.1
blog  : www.anshengme.com
title : Hello world
port  : 80
Solution
Use str.ljust(), str.rjust(), and str.center() to align the text to the left, right, and center
>>> info = {'ip':'127.0.0.1','blog': 'www.anshengme.com','title': 'Hello world','port': '80'}
# Get the maximum length of keys in the dictionary
>>> max(map(len, info.keys()))
5
>>> w = max(map(len, info.keys()))
>>> for k in info:
...     print(k.ljust(w), ':', info[k])
...
# The obtained result
port  : 80
blog  : www.anshengme.com
ip    : 127.0.0.1
title : Hello world
Using the format() function, passing a format specification such as '<20', '>20', or '^20' for left, right, and center alignment
>>> for k in info:
...     print(format(k, '^' + str(w)), ':', info[k])
...
port  : 80
blog  : www.anshengme.com
 ip   : 127.0.0.1
title : Hello world
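The same alignment codes also work inside f-strings (Python 3.6 and later), where the width can be supplied from a variable. A minimal sketch follows; the function name format_table and its align parameter are hypothetical.

```python
def format_table(info, align='<'):
    """Return 'key : value' lines with keys padded to the longest key.

    align is '<' (left), '>' (right) or '^' (center).
    """
    width = max(len(key) for key in info)
    # The nested braces insert the alignment code and width into the format spec.
    return [f'{key:{align}{width}} : {value}' for key, value in info.items()]

info = {'ip': '127.0.0.1', 'blog': 'www.anshengme.com',
        'title': 'Hello world', 'port': '80'}
for line in format_table(info):
    print(line)
```

Returning a list of lines instead of printing directly makes the helper easier to reuse, for example when writing to a file.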
VI. How to remove unnecessary characters from a string?
Actual case
Filter out extra whitespace characters from user input: [email protected]
Filter out extra whitespace characters in the text edited under Windows: hello word\r\n
Remove Unicode combining marks (tone marks) from the text: u'ni\u0301 ha\u030co, chi\u0304 fa\u0300n'
Solution
The strip(), lstrip(), rstrip() methods of string remove characters at both ends of the string
>>> email = ' [email protected] '
>>> email.strip()
'[email protected]'
>>> email.lstrip()
'[email protected] '
>>> email.rstrip()
' [email protected]'
To delete a character at a fixed position, you can slice the string and concatenate the pieces with +
>>> s = 'abc:123'  # example input; the character at index 3 is removed
>>> s[:3] + s[4:]
'abc123'
The replace() method of string or the regular expression re.sub() can remove characters at any position
>>> s = '\tabc\t123\txyz'
>>> s.replace('\t', '')
'abc123xyz'
Use re.sub() to delete several different characters at once
>>> import re
>>> s = '\tabc\t123\txyz\ropq\r'  # example input
>>> re.sub('[\t\r]', '', s)
'abc123xyzopq'
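The translate() method of strings can map characters to other characters and can delete several different characters at the same time
# Python 2: string.maketrans() builds a character-to-character mapping table
>>> import string
>>> s = 'abc123xyz'
>>> s.translate(string.maketrans('abcxyz','xyzabc'))
'xyz123abc'
# Python 2: with None as the table, the second argument lists the characters to delete
>>> s = '\rasd\t23\bAds'
>>> s.translate(None, '\r\t\b')
'asd23Ads'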
# python2.7
>>> i = u'ni\u0301 ha\u030co, chi\u0304 fa\u0300n'
>>> i
u'ni\u0301 ha\u030co, chi\u0304 fa\u0300n'
# Mapping a code point to None deletes that character
>>> i.translate(dict.fromkeys([0x0301, 0x030c, 0x0304, 0x0300]))
u'ni hao, chi fan'
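The translate() calls above use Python 2 signatures. In Python 3, string.maketrans() no longer exists and str.translate() takes a single table argument, so a rough equivalent could look like the sketch below. The function names are hypothetical; using unicodedata.combining() to detect marks is a swapped-in technique that avoids listing each code point by hand.

```python
import unicodedata

def delete_chars(text, unwanted):
    """Remove every character in unwanted from text."""
    # The third argument of str.maketrans() lists characters to delete.
    return text.translate(str.maketrans('', '', unwanted))

def strip_combining_marks(text):
    """Remove Unicode combining marks such as tone marks."""
    # Decompose first so precomposed letters (e.g. U+00ED) expose their marks.
    decomposed = unicodedata.normalize('NFD', text)
    # combining() returns 0 for characters that are not combining marks.
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

print(delete_chars('\rasd\t23\bAds', '\r\t\b'))
print(strip_combining_marks('ni\u0301 ha\u030co, chi\u0304 fa\u0300n'))
```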
Summary
This article summarized common string handling techniques in Python, presenting each one as a practical case, a solution, and an example. It is intended as a reference for anyone learning or using Python.