
How to Extract Hyperlinks from Web Pages in Python

The simplest approach is to fetch the target web page first, then extract the hyperlinks by matching the href attribute of <a> tags with a regular expression.

The code is as follows:

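import re
import urllib.request  # Python 3 replacement for urllib2

url = 'http://www.sunbloger.com/'
# Fetch the page; the with block closes the connection automatically.
with urllib.request.urlopen(url) as con:
    # read() returns bytes; decode to str (UTF-8 is assumed here).
    doc = con.read().decode('utf-8', errors='replace')
# Matches only double-quoted http:// URLs made of letters, digits, dots and slashes.
links = re.findall(r'href="(http://[a-zA-Z0-9./]+)"', doc)
for a in links:
    print(a)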
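
The regular expression above only catches double-quoted http:// links built from letters, digits, dots and slashes, so it skips https links, relative links, and URLs containing hyphens or query strings. As a possible alternative (not the method this article uses), the sketch below collects href values with the standard library's html.parser module and resolves relative links with urllib.parse.urljoin. The names LinkCollector and extract_links and the sample HTML are hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it is fed."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != 'a':
            return
        for name, value in attrs:
            if name == 'href' and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all <a href> targets in html as absolute URLs."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


# Made-up sample markup, so the example runs without network access.
sample = '''<p><a href="/about">About</a>
<a class="x" href='https://example.com/a-b?q=1'>External</a>
<a name="top">No href here</a></p>'''

for link in extract_links(sample, 'http://www.sunbloger.com/'):
    print(link)
```

To use it on a live page, pass the decoded page text (doc in the listing above) and the page URL to extract_links. A parser copes with single-quoted attributes and attribute order, which a single regular expression handles poorly.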

Summary

That is all for this article. I hope it is helpful for your study or work. If you have any questions, feel free to leave a comment.
