
Python crawler: crawling Baidu Images by keyword

Tools used: Python 2.7

Scrapy framework

Sublime Text 3

I. Set up Python (Windows version)

 1. Install Python 2.7, then type python in cmd. If the interactive prompt (>>>) appears, the installation succeeded.

 2. Install the Scrapy framework by running this on the command line: pip install Scrapy

A successful installation ends with a message that begins with "Successfully installed".

Installation can fail in several ways, for example:

Solution:

Other errors can be looked up on Baidu.
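
A quick way to confirm both steps from inside Python is to try importing the package. The sketch below is a minimal check and not part of the original article; the helper name is_installed is an illustrative assumption, and the code is written to run on both Python 2.7 and Python 3.

```python
import sys


def is_installed(module_name):
    """Return True if `module_name` can be imported by this interpreter."""
    try:
        __import__(module_name)
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    # Show which interpreter is running and whether Scrapy is available to it.
    print("Python version: %s" % sys.version.split()[0])
    print("Scrapy installed: %s" % is_installed("scrapy"))
```

If the second line prints False, pip most likely installed Scrapy for a different Python installation than the one that cmd starts.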

II. Start programming.

1. Crawl static websites without anti-crawling measures, such as Baidu Tieba or Douban Books.

For example, a post in the 'Desktop Bar' forum: https://tieba.baidu.com/p/2460150866?red_tag=3569129009

The Python code is as follows:

Code comments: The code imports two modules, urllib and re, and defines two functions. The first function fetches the full HTML of the target page. The second function finds the target images in that page, iterates over them, and saves each one under a sequential file name starting from 0.
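
The two functions described above can be sketched roughly as follows. This is a reconstruction under assumptions, not the article's original listing: the regular expression, the function names, and the extra find_images helper (split out so the matching can be tried without a network connection) are all illustrative. The import fallback lets the same code run on Python 2.7 and Python 3.

```python
import re

try:
    from urllib.request import urlopen, urlretrieve  # Python 3
except ImportError:
    from urllib import urlopen, urlretrieve  # Python 2.7, as used in the article

# Assumed pattern: capture the src attribute of <img> tags that point to .jpg files.
IMG_PATTERN = re.compile(r'<img[^>]+src="([^"]+?\.jpg)"')


def get_html(url):
    """First function: fetch the whole target page and return its HTML text."""
    response = urlopen(url)
    try:
        return response.read().decode("utf-8", "ignore")
    finally:
        response.close()


def find_images(html):
    """Return the .jpg image URLs found in `html`, in page order."""
    return IMG_PATTERN.findall(html)


def save_images(html):
    """Second function: download each image as 0.jpg, 1.jpg, ... in the current directory."""
    saved = []
    for index, image_url in enumerate(find_images(html)):
        filename = "%d.jpg" % index
        urlretrieve(image_url, filename)
        saved.append(filename)
    return saved


def main():
    # Needs network access; the page layout may have changed since the article was written.
    page = get_html("https://tieba.baidu.com/p/2460150866?red_tag=3569129009")
    print(save_images(page))
```

Calling main() runs the crawl. Because the file names are relative, the images land in the directory the script is started from.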

Note: key points of the re module:
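
The article's own notes on re are not reproduced here. As a stand-in, the short sketch below illustrates three features that image crawlers of this kind typically rely on: re.compile, findall with a capture group, and the difference between greedy and non-greedy matching. The sample HTML string is made up for illustration.

```python
import re

html = '<img src="a.jpg"><img src="b.jpg">'

# Greedy: .+ runs to the last closing quote, swallowing both tags in one match.
greedy = re.findall(r'src="(.+)"', html)
print(greedy)    # ['a.jpg"><img src="b.jpg']

# Non-greedy: .+? stops at the first closing quote, giving one match per tag.
lazy = re.findall(r'src="(.+?)"', html)
print(lazy)      # ['a.jpg', 'b.jpg']

# re.compile builds the pattern once so it can be reused across many pages.
# With a single capture group, findall returns only the captured text.
pattern = re.compile(r'src="(.+?\.jpg)"')
compiled = pattern.findall(html)
print(compiled)  # ['a.jpg', 'b.jpg']
```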

Result of crawling the pictures:

By default, the pictures are saved in the same directory as the .py script (more precisely, the directory the script is run from).

2. Crawl websites with anti-crawling measures, such as Baidu Images.

For example, a keyword search for 'meme pack': https://image.baidu.com/search/index?tn=baiduimage&ct=201326592&lm=-1&cl=2&ie=gbk&word=%B1%ED%C7%E9%B0%FC&fr=ala&ori_query=%E8%A1%A8%E6%83%85%E5%8C%85&ala=0&alatpl=sp&pos=0&hs=2&xthttps=111111
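
In the URL above, ie=gbk declares the encoding and word carries the keyword, percent-encoded as GBK bytes. The sketch below shows one way to build such a URL from an arbitrary keyword. The function name build_search_url is illustrative, and keeping only the tn, ie, and word parameters is an assumption; the live site may require more of the original query string.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2.7, as used in the article

SEARCH_BASE = "https://image.baidu.com/search/index"


def build_search_url(keyword):
    """Return a search URL whose `word` parameter is `keyword` encoded as GBK."""
    encoded = quote(keyword.encode("gbk"))
    return "%s?tn=baiduimage&ie=gbk&word=%s" % (SEARCH_BASE, encoded)


# The Chinese keyword behind 'meme pack', written with escapes so the file stays ASCII.
meme_pack = u"\u8868\u60c5\u5305"
print(build_search_url(meme_pack))
```

The encoded keyword comes out as %B1%ED%C7%E9%B0%FC, matching the word value in the article's URL.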

The pictures load as the page scrolls, so the crawler first fetches only the initial 30 pictures.

The code is as follows:

Code comments: The code imports 4 modules; the os module is used to specify the save path. The first two functions are the same as above. The third function uses an if statement and try/except exception handling.
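
The third function might look roughly like the sketch below. It is a reconstruction under assumptions rather than the article's listing: the "objURL" pattern reflects how Baidu's result pages have commonly embedded full-size image addresses and may no longer match, and the names find_image_urls and download_images, the limit of 30, and the fetch parameter (added so the logic can be tried without network access) are illustrative.

```python
import os
import re

try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2.7, as used in the article

# Assumed pattern for image addresses embedded in the search result page.
OBJ_URL_PATTERN = re.compile(r'"objURL":"(.*?)",', re.S)


def find_image_urls(html):
    """Return the image URLs embedded in the search result page."""
    return OBJ_URL_PATTERN.findall(html)


def download_images(urls, save_dir, limit=30, fetch=None):
    """Save up to `limit` images into `save_dir`, skipping any that fail."""
    fetch = fetch or urlretrieve
    # if statement: create the save directory only when it is missing.
    if not os.path.exists(save_dir):
        os.makedirs(save_dir)
    saved = []
    for index, url in enumerate(urls[:limit]):
        path = os.path.join(save_dir, "%d.jpg" % index)
        # try/except: one dead link should not stop the whole crawl.
        try:
            fetch(url, path)
        except (IOError, OSError, ValueError) as error:
            print("Skipping %s: %s" % (url, error))
            continue
        saved.append(path)
    return saved
```

A typical call would be download_images(find_image_urls(html), "pictures"), where html is the text of the search result page. Sites with anti-crawling measures may also reject requests that lack browser-like headers, which this sketch does not handle.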

The crawling process is as follows:

Crawling results:

Note: When writing Python code, pay attention to indentation, and do not mix tabs and spaces, which easily causes errors.
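
A rough way to spot the problem before running a script is to scan the indentation of each line. The helper names below are illustrative, and the check is line-based only, so it can also flag whitespace inside multi-line strings.

```python
def indentation_report(source):
    """Return (tab_lines, space_lines): 1-based line numbers indented with each character."""
    tab_lines, space_lines = [], []
    for number, line in enumerate(source.splitlines(), 1):
        # Leading whitespace is whatever lstrip removes from the front of the line.
        indent = line[:len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            tab_lines.append(number)
        if " " in indent:
            space_lines.append(number)
    return tab_lines, space_lines


def mixes_tabs_and_spaces(source):
    """Return True if the source uses both tabs and spaces for indentation."""
    tab_lines, space_lines = indentation_report(source)
    return bool(tab_lines) and bool(space_lines)


mixed_example = "if True:\n\tx = 1\n    y = 2\n"
print(indentation_report(mixed_example))     # ([2], [3])
print(mixes_tabs_and_spaces(mixed_example))  # True
```

Python 2 can also report inconsistent indentation itself when started with the -tt option, and Python 3 raises TabError for it.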

