This article shows how to use Python to parse an XML file into HTML files. My environment is Python 2.7 installed on Windows, and the development tool I use is Wing IDE 6.0.
1. First, here is a simple XML file I designed, which serves as the source file for parsing.
Below is the content of the file website.xml:
<website>
  <page name="index" title="[#1#]">
    <h1>welcome to</h1>
    <p>this is a moment</p>
    <ul>
      <li><a href="shouting.html" rel="external nofollow">Shouting</a></li>
    </ul>
  </page>
  <page name="shouting" title="[#2#]">
    <h1>My name is likeyou</h1>
  </page>
</website>
Explanation: each page element stands for one HTML file. There are two pages here, so two HTML files are generated: index.html and shouting.html. In index.html, the <a> link points to shouting.html, so clicking it displays the content of shouting.html.
2. Python code that implements the parsing (xmltest.py)
#!D:\Python27\python.exe
# -*- coding: utf-8 -*-
from xml.sax import parse
from xml.sax.handler import ContentHandler

class PageCreate(ContentHandler):
    # True while the parser is inside a <page> element
    pagethrough = False

    def startElement(self, name, attrs):
        if name == 'page':
            # A new page starts: open <name>.html and write the HTML header
            self.pagethrough = True
            self.out = open(attrs['name'] + '.html', 'w')
            self.out.write('<html>\n<head>\n')
            self.out.write('<title>%s</title>\n' % (attrs['title']))
            self.out.write('</head>\n<body>\n')
        elif self.pagethrough:
            # Any other tag inside a page is copied with its attributes
            self.out.write('<')
            self.out.write(name)
            for str, val in attrs.items():
                self.out.write(' %s="%s"' % (str, val))
            self.out.write('>')

    def endElement(self, name):
        if name == 'page':
            # The page ends: write the HTML footer and close the file
            self.out.write('</body>\n</html>')
            self.pagethrough = False
            self.out.close()
        if self.pagethrough:
            # End tag of an element inside the page
            self.out.write('<')
            self.out.write('/' + name)
            self.out.write('>')

    def characters(self, content):
        # Text between tags is copied only while inside a page
        if self.pagethrough:
            self.out.write(content)

parse('D:\\pyproject\\file\\website.xml', PageCreate())
Code Explanation:
The code calls the parse function from xml.sax and passes it an instance of a handler class that inherits from ContentHandler and overrides the startElement, endElement, and characters methods. The startElement method is called whenever a start tag is found in the XML file, such as <a> or <h1>. The pagethrough variable records whether the current element is inside a page tag: True means it is inside a page tag and therefore belongs to the current page. This flag is needed because xml.sax only reports tags as it meets them and does not keep track of which page you are currently in.
The rest of startElement is easy to follow. When a page tag starts, it writes the opening HTML tags such as <html>, <head>, and <body>. Note that attrs stores the attributes of the tag. For example, the two <page> tags carry name="index" and name="shouting", so attrs['name'] gives "index" or "shouting", which is used as the filename of the HTML file. Similarly, the href=... in <a> is read from attrs: the loop places each attribute name and value in the variables str and val and writes them into the file through write.
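To make the role of attrs more concrete, here is a minimal sketch that only records the tag name and attributes of each start tag. The class name AttrCollector and the title value "Home" are assumptions made for illustration (the titles in website.xml are not shown in the original), and parseString is used instead of parse so that the snippet needs no file on disk.

```python
from xml.sax import parseString
from xml.sax.handler import ContentHandler


class AttrCollector(ContentHandler):
    """Hypothetical helper: records (tag name, attributes) for each start tag."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.seen = []

    def startElement(self, name, attrs):
        # attrs behaves like a read-only mapping; copy it into a plain dict
        self.seen.append((name, dict(attrs.items())))


# Shortened sample; the title "Home" is an assumed placeholder value
SAMPLE = (b'<website><page name="index" title="Home">'
          b'<a href="shouting.html">Shouting</a></page></website>')

collector = AttrCollector()
parseString(SAMPLE, collector)

for name, attributes in collector.seen:
    print("%s %r" % (name, attributes))
```

The page entry carries the name and title attributes, and the a entry carries href, which is the data that PageCreate writes back out in its loop over attrs.items().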
The endElement method is called when the parser reaches an end tag such as </h1>. If the end tag is </page>, the current page is finished, so the closing </body> and </html> tags are written and the file is closed. Otherwise it is the end tag of an element inside the page and is copied to the output as is.
The characters method receives the text found between a start tag and an end tag and writes it to the current page file.
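The order in which the three callbacks fire can be seen with a small tracing handler. This is only a sketch: EventTracer is a hypothetical name, and the input is a shortened fragment of the shouting page. One caveat worth knowing is that a SAX parser may deliver the text of a single element in several characters calls, so the sketch joins the pieces rather than assuming one call per text node.

```python
from xml.sax import parseString
from xml.sax.handler import ContentHandler


class EventTracer(ContentHandler):
    """Hypothetical helper: logs every SAX callback in the order it fires."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(('start', name))

    def endElement(self, name):
        self.events.append(('end', name))

    def characters(self, content):
        self.events.append(('chars', content))


tracer = EventTracer()
parseString(b'<page name="shouting"><h1>My name is likeyou</h1></page>', tracer)

# Tag events in document order
tag_events = [event for event in tracer.events if event[0] != 'chars']
# Text may arrive in more than one chunk, so join the chunks
text = ''.join(data for kind, data in tracer.events if kind == 'chars')

print(tag_events)
print(text)
```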
After running the Python code, two HTML files are generated in the current working directory (the script's own directory if you run it from there): index.html and shouting.html. Open index.html and you will see a link called 'Shouting'. Click it to open shouting.html.
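If you would like to check the result without a D:\pyproject path, the following sketch runs the same idea against an in-memory copy of the XML and writes into a temporary directory. It is a variant, not the original script: the class name PageWriter, the directory argument, and the title values "Home" and "Shouting" are assumptions introduced here, and the open file handle doubles as the "inside a page" flag in place of pagethrough.

```python
import os
import tempfile
from xml.sax import parseString
from xml.sax.handler import ContentHandler

# In-memory stand-in for website.xml; the titles are assumed placeholder values
WEBSITE_XML = b"""<website>
  <page name="index" title="Home">
    <h1>welcome to</h1>
    <ul><li><a href="shouting.html">Shouting</a></li></ul>
  </page>
  <page name="shouting" title="Shouting">
    <h1>My name is likeyou</h1>
  </page>
</website>"""


class PageWriter(ContentHandler):
    """Hypothetical variant of PageCreate that writes into a chosen directory."""

    def __init__(self, directory):
        ContentHandler.__init__(self)
        self.directory = directory
        self.out = None  # open file while inside a <page>, otherwise None

    def startElement(self, name, attrs):
        if name == 'page':
            path = os.path.join(self.directory, attrs['name'] + '.html')
            self.out = open(path, 'w')
            self.out.write('<html>\n<head>\n<title>%s</title>\n</head>\n<body>\n'
                           % attrs['title'])
        elif self.out is not None:
            self.out.write('<' + name)
            for key, val in attrs.items():
                self.out.write(' %s="%s"' % (key, val))
            self.out.write('>')

    def endElement(self, name):
        if name == 'page':
            self.out.write('</body>\n</html>\n')
            self.out.close()
            self.out = None
        elif self.out is not None:
            self.out.write('</%s>' % name)

    def characters(self, content):
        if self.out is not None:
            self.out.write(content)


output_dir = tempfile.mkdtemp()
parseString(WEBSITE_XML, PageWriter(output_dir))

generated = sorted(os.listdir(output_dir))
print(generated)

with open(os.path.join(output_dir, 'index.html')) as f:
    index_html = f.read()
print(index_html)
```

Listing the temporary directory should show one file per page element, and the printed index.html should contain the link to shouting.html.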
That is all there is to this method of using Python to parse XML files into HTML files. I hope it serves as a useful reference.