This article covers the following topics one by one:
1. Main function of the program
2. Implementation process
3. Class definition
4. Using a generator to dynamically update each object and return it
5. Using strip to remove unnecessary characters
6. Using re.match to match strings
7. Using time.strptime to convert a string to a time object
8. Complete code
The main function of the program
Suppose there is a table-like document that stores user information. The first row holds the attribute names, separated by commas (,). From the second row on, each row holds the corresponding value of each attribute and represents one user. How can we read this document and output one user object per line?
In addition, there are 4 small requirements:
Each document is very large. If all the generated objects are stored in a list and returned at once, memory will run out, so the program may hold only the object generated for the current line.
Some values take a form such as +000000001.24. The + and the leading 0s need to be removed to extract 1.24.
The document contains times, in a form such as 2013-10-29 or 2013/10/29 2:23:56. Such strings need to be converted to a time type.
There are many such documents, and each has different attributes: for example, one holds user information and another holds call records. Therefore, the specific attributes in the class must be generated dynamically from the first line of the document.
Implementation process
1. Class definition
Because the attributes are added dynamically, the attribute-value pairs are added dynamically as well. The class therefore needs two member functions, updateAttributes() and updatePairs(). It uses the list __attributes to store the attribute names and the dictionary attrilist to store the attribute-to-value mapping. __init__() is the constructor. The two leading underscores in __attributes mark it as a private variable: Python mangles the name, so it cannot be accessed by that name from outside the class. Create an instance with a = UserInfo(); no parameters are needed.
class UserInfo(object):
    'Class to restore UserInformation'
    def __init__(self):
        self.attrilist = {}
        self.__attributes = []

    def updateAttributes(self, attributes):
        self.__attributes = attributes

    def updatePairs(self, values):
        for i in range(len(values)):
            self.attrilist[self.__attributes[i]] = values[i]
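As a rough illustration of how the two member functions work together, the sketch below repeats the class so that it runs on its own (Python 3). The header and data values are made up for the example, and the last two lines show the effect of the double underscore.

```python
class UserInfo(object):
    """Same class as in the article, repeated so the example is self-contained."""

    def __init__(self):
        self.attrilist = {}
        self.__attributes = []

    def updateAttributes(self, attributes):
        self.__attributes = attributes

    def updatePairs(self, values):
        for i in range(len(values)):
            self.attrilist[self.__attributes[i]] = values[i]


a = UserInfo()
a.updateAttributes(['name', 'balance'])   # hypothetical header row
a.updatePairs(['Alice', '1.24'])          # hypothetical data row
print(a.attrilist['name'], a.attrilist['balance'])

# Name mangling: inside the class, __attributes is stored as
# _UserInfo__attributes, so the plain name does not exist on the instance.
print(hasattr(a, '__attributes'))    # False
print(a._UserInfo__attributes)       # ['name', 'balance']
```

The mangled name is still reachable, so the double underscore is a convention that discourages outside access rather than a strict access control.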
2. Dynamically update each object and return it using a generator
A generator is a function that is set up once and then runs step by step, producing one result per step. An ordinary function hands back its result with return, whereas a generator uses yield, and the next step resumes from the position right after the yield. For example, the Fibonacci sequence can be written either as an ordinary function or as a generator. First, the ordinary function:
def fib(max):
    n, a, b = 0, 0, 1
    while n < max:
        print(b)
        a, b = b, a + b
        n = n + 1
    return 'done'
We calculate the first 6 numbers of the sequence:
>>> fib(6)
1
1
2
3
5
8
'done'
To turn this into a generator, simply change print(b) to yield b. For example:
def fib(max):
    n, a, b = 0, 0, 1
    while n < max:
        yield b
        a, b = b, a + b
        n = n + 1
Usage:
>>> f = fib(6)
>>> f
<generator object fib at 0x104feaaa0>
>>> for i in f:
...     print(i)
...
1
1
2
3
5
8
>>>
We can see that the generator fib is itself an object. Each time execution reaches yield, it pauses and returns a result; the next step continues from the line of code after yield. A generator can also be advanced by hand: with generator.next() in Python 2, or with the built-in next(generator), which works in Python 2.6 and later, including Python 3.
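To make the pause-and-resume behaviour concrete, here is a small Python 3 sketch that steps through the same fib generator by hand with the built-in next(). When the generator has nothing left to yield, next() raises StopIteration, which a for loop normally handles for you.

```python
def fib(max):
    n, a, b = 0, 0, 1
    while n < max:
        yield b
        a, b = b, a + b
        n = n + 1


f = fib(3)
first = next(f)     # runs up to the first yield
second = next(f)    # resumes after the yield, loops once more
third = next(f)

try:
    next(f)         # the while loop has ended, nothing left to yield
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, third, exhausted)   # 1 1 2 True
```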
In my program, the code for the generator part is as follows:
import io

def ObjectGenerator(maxlinenum):
    filename = '/home/thinkit/Documents/usr_info/USER.csv'
    attributes = []
    linenum = 1
    a = UserInfo()
    # io.open decodes the gb2312 file while reading (works in Python 2 and 3)
    file = io.open(filename, encoding='gb2312')
    while linenum < maxlinenum:
        values = []
        line = file.readline()
        if line == '':  # readline() returns '' at the end of the file
            print('reading fail! Please check filename!')
            break
        str_list = line.split(',')
        for item in str_list:
            item = item.strip()
            item = item.strip('\"')
            item = item.strip('\'')
            item = item.lstrip('+0')  # drop the leading + and 0s only
            item = catchTime(item)
            if linenum == 1:
                attributes.append(item)
            else:
                values.append(item)
        if linenum == 1:
            a.updateAttributes(attributes)
        else:
            a.updatePairs(values)
            yield a.attrilist  # yield a instead to get the UserInfo object itself
        linenum = linenum + 1
    file.close()
Here, a = UserInfo() is an instance of the class UserInfo. Because the document is gb2312-encoded, the code above decodes it with that encoding. Since the first line holds the attributes, updateAttributes() stores the attribute list in UserInfo; for each of the following lines, updatePairs() reads the attribute-value pairs into a dictionary. P.S. A Python dictionary is equivalent to a map in other languages.
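The one-row-at-a-time idea can be tried without the real file. The following Python 3 sketch is a simplified stand-in, not the article's ObjectGenerator: the name row_generator and the sample text are assumptions made for illustration, and it reads from an in-memory io.StringIO instead of a gb2312 file.

```python
import io


def row_generator(lines):
    """Yield one {attribute: value} dict per data row (hypothetical helper)."""
    attributes = None
    for line in lines:
        values = [item.strip() for item in line.split(',')]
        if attributes is None:
            attributes = values      # first line: attribute names
        else:
            yield dict(zip(attributes, values))


# Made-up sample document: a header line followed by two rows.
sample = io.StringIO(u"name,city\nAlice,Paris\nBob,Rome\n")
rows = row_generator(sample)
print(next(rows)['name'])   # Alice
print(next(rows)['name'])   # Bob
```

One design difference is worth noting: this sketch builds a fresh dict for every row, whereas ObjectGenerator yields the same a.attrilist dictionary each time and overwrites its values. A caller that wants to keep earlier rows from ObjectGenerator would need to copy each dictionary before asking for the next one.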
3. Use strip to remove unnecessary characters
From the above code, we can see that str.strip(chars) removes characters from the front and back of str. chars is a set of characters, not a regular expression: every leading and trailing character that appears in chars is removed. lstrip() and rstrip() do the same at the left or right end only. As above:
item=item.strip()       # Remove whitespace such as spaces, \t and \n from both ends
item=item.strip('\"')   # Remove double quotes from both ends
item=item.strip('\'')   # Remove single quotes from both ends
item=item.lstrip('+0')  # Remove the leading + and any number of leading 0s. Note that strip('+0*') would also remove trailing 0s, and the * in it is a literal character, not a wildcard
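The difference between a character set and a pattern is easy to check. The Python 3 sketch below compares strip, lstrip and a regular-expression alternative on made-up values; strip_plus_zeros is a hypothetical helper, not part of the article's code.

```python
import re

raw = '+000000001.20'

# strip() treats its argument as a set of characters and works on both ends,
# so the trailing 0 is lost as well.
print(raw.strip('+0*'))    # 1.2

# lstrip() only touches the left end.
print(raw.lstrip('+0'))    # 1.20


def strip_plus_zeros(text):
    """Hypothetical helper: drop a leading + and zeros, keeping one digit
    in front of the decimal point."""
    return re.sub(r'^\+0*(?=\d)', '', text)


print(strip_plus_zeros('+000000001.24'))   # 1.24
print(strip_plus_zeros('+000.50'))         # 0.50
print('+000.50'.lstrip('+0'))              # .50
```

The last two lines show a case where lstrip('+0') removes more than intended, which is where a pattern with a lookahead may be the safer choice.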
4. Use re.match to match strings
Function syntax:
re.match(pattern, string, flags=0)
Function parameter description:
Parameter    Description
pattern      The regular expression to match
string       The string to be matched
flags        Flag bits that control how the regular expression is matched, such as case-insensitive matching or multi-line matching
If the match is successful, the re.match method returns a match object, otherwise it returns None.
>>> import re
>>> s='2015-09-18'
>>> matchObj=re.match(r'\d{4}-\d{2}-\d{2}', s, flags= 0)
>>> print matchObj
<_sre.SRE_Match object at 0x7f3525480f38>
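One detail that matters for the date check is that re.match only tries to match at the beginning of the string and does not require the match to reach the end. The Python 3 sketch below shows this next to re.search and re.fullmatch; the sample strings are made up.

```python
import re

pattern = r'\d{4}-\d{2}-\d{2}'

# Matches at the start; text after the date is ignored.
m = re.match(pattern, '2015-09-18 extra')
print(m.group())                                        # 2015-09-18

# No match, because the date is not at the start of the string.
print(re.match(pattern, 'date: 2015-09-18'))            # None

# re.search scans the whole string for the first match.
print(re.search(pattern, 'date: 2015-09-18').group())   # 2015-09-18

# re.fullmatch requires the entire string to match the pattern.
print(re.fullmatch(pattern, '2015-09-18 extra'))        # None
```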
5. Use time.strptime to convert a string to a time object
In the time module, time.strptime(str, format) parses str according to format and returns a time object (a struct_time). Common format codes include:
%y two-digit year (00-99)
%Y four-digit year (0000-9999)
%m month (01-12)
%d day of the month (01-31)
%H hour on the 24-hour clock (00-23)
%I hour on the 12-hour clock (01-12)
%M minutes (00-59)
%S seconds (00-59)
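As a quick check of these format codes, the Python 3 sketch below parses one of the date forms mentioned earlier and reads the fields back from the resulting struct_time. It also shows that a string which does not fit the format raises ValueError.

```python
import time

t = time.strptime('2013/10/29 2:23:56', '%Y/%m/%d %H:%M:%S')
print(t.tm_year, t.tm_mon, t.tm_mday)    # 2013 10 29
print(t.tm_hour, t.tm_min, t.tm_sec)     # 2 23 56

# strftime goes the other way, from a struct_time back to a string.
print(time.strftime('%Y-%m-%d', t))      # 2013-10-29

# A string that does not fit the format is rejected.
try:
    time.strptime('2013-10-29', '%Y/%m/%d')
    parsed = True
except ValueError:
    parsed = False
print(parsed)                            # False
```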
In addition, the re module is used to check with regular expressions whether a string is in a common time format, such as YYYY/MM/DD H:M:S or YYYY-MM-DD.
In the code above, the function catchTime checks whether item is a time string and, if so, converts it to a time object.
The code is as follows:
import time
import re

def catchTime(item):
    # check if it's time
    matchObj = re.match(r'\d{4}-\d{2}-\d{2}', item, flags=0)
    if matchObj is not None:
        item = time.strptime(item, '%Y-%m-%d')
        return item
    else:
        matchObj = re.match(r'\d{4}/\d{2}/\d{2}\s\d+:\d+:\d+', item, flags=0)
        if matchObj is not None:
            item = time.strptime(item, '%Y/%m/%d %H:%M:%S')
        return item  # unchanged if neither pattern matched
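If more date forms need to be supported later, the same check can be written in a table-driven way. The Python 3 sketch below is one possible variant and not the article's function: the names TIME_FORMATS and catch_time are assumptions, it uses re.fullmatch instead of re.match so that trailing text is not accepted, and it returns the original string when strptime rejects the value.

```python
import re
import time

# (pattern, format) pairs for the two date forms discussed in the article.
TIME_FORMATS = [
    (r'\d{4}-\d{2}-\d{2}', '%Y-%m-%d'),
    (r'\d{4}/\d{2}/\d{2}\s\d+:\d+:\d+', '%Y/%m/%d %H:%M:%S'),
]


def catch_time(item):
    """Return a struct_time if item looks like a known date form, else item."""
    for pattern, fmt in TIME_FORMATS:
        if re.fullmatch(pattern, item):
            try:
                return time.strptime(item, fmt)
            except ValueError:
                return item      # right shape, but not a valid date
    return item


print(catch_time('1.24'))                   # 1.24
print(catch_time('2013-10-29').tm_mday)     # 29
print(catch_time('2013-13-45'))             # 2013-13-45
```

Adding another date form then only means adding one more pair to the list.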
Complete code:
import collections
import io
import time
import re

class UserInfo(object):
    'Class to restore UserInformation'
    def __init__(self):
        self.attrilist = collections.OrderedDict()  # ordered
        self.__attributes = []

    def updateAttributes(self, attributes):
        self.__attributes = attributes

    def updatePairs(self, values):
        for i in range(len(values)):
            self.attrilist[self.__attributes[i]] = values[i]

def catchTime(item):
    # check if it's time
    matchObj = re.match(r'\d{4}-\d{2}-\d{2}', item, flags=0)
    if matchObj is not None:
        item = time.strptime(item, '%Y-%m-%d')
        return item
    else:
        matchObj = re.match(r'\d{4}/\d{2}/\d{2}\s\d+:\d+:\d+', item, flags=0)
        if matchObj is not None:
            item = time.strptime(item, '%Y/%m/%d %H:%M:%S')
        return item  # unchanged if neither pattern matched

def ObjectGenerator(maxlinenum):
    filename = '/home/thinkit/Documents/usr_info/USER.csv'
    attributes = []
    linenum = 1
    a = UserInfo()
    # io.open decodes the gb2312 file while reading (works in Python 2 and 3)
    file = io.open(filename, encoding='gb2312')
    while linenum < maxlinenum:
        values = []
        line = file.readline()
        if line == '':  # readline() returns '' at the end of the file
            print('reading fail! Please check filename!')
            break
        str_list = line.split(',')
        for item in str_list:
            item = item.strip()
            item = item.strip('\"')
            item = item.strip('\'')
            item = item.lstrip('+0')  # drop the leading + and 0s only
            item = catchTime(item)
            if linenum == 1:
                attributes.append(item)
            else:
                values.append(item)
        if linenum == 1:
            a.updateAttributes(attributes)
        else:
            a.updatePairs(values)
            yield a.attrilist  # yield a instead to get the UserInfo object itself
        linenum = linenum + 1
    file.close()

if __name__ == '__main__':
    for n in ObjectGenerator(10):
        print(n)  # Output the dictionary to see if it is correct
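Note that because the loop condition is linenum < maxlinenum, ObjectGenerator(10) reads at most 9 lines, that is, the header plus 8 data rows. An alternative way to cap the number of rows, shown here only as a sketch, is to let the generator run freely and limit it from the caller with itertools.islice. The count_rows generator below is a made-up stand-in for a row generator.

```python
import itertools


def count_rows():
    """Hypothetical endless row generator, used only for this illustration."""
    n = 0
    while True:
        n += 1
        yield {'row': n}


# Take the first three rows; the generator is not advanced any further.
first_three = list(itertools.islice(count_rows(), 3))
print(first_three)   # [{'row': 1}, {'row': 2}, {'row': 3}]
```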
Summary
That's all for this article. I hope it helps with your study or work. If you have any questions, please leave a message. Thank you for your support of Yell Tutorial.