This article describes a Python 3 implementation of concurrent validation of proxy pool addresses. It is shared here for reference:
#encoding=utf-8
#author: walker
#date: 2016-04-14
#summary: Use a thread pool (concurrent.futures) to verify the validity of proxies

import os, sys, time
import traceback
import requests
from concurrent import futures

cur_dir_fullpath = os.path.dirname(os.path.abspath(__file__))

Headers = {
    'Accept': '*/*',
    'User-Agent': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E)',
}

# Check the validity of a single proxy
# If valid, return the proxy; otherwise, return an empty string
def Check(desturl, proxy, feature):
    proxies = {'http': 'http://' + proxy}
    r = None  # declare up front so the finally block can test it
    exMsg = None
    try:
        r = requests.get(url=desturl, headers=Headers, proxies=proxies, timeout=3)
    except Exception:
        exMsg = '* ' + traceback.format_exc()
        #print(exMsg)
    finally:
        # A Response is falsy for 4xx/5xx codes, so compare against None explicitly
        if r is not None:
            r.close()
    if exMsg:
        return ''
    if r.status_code != 200:
        return ''
    if r.text.find(feature) < 0:
        return ''
    return proxy

# Input: raw proxy pool (set/list); returns the list of valid proxies
def GetValidProxyPool(rawProxyPool, desturl, feature):
    validProxyList = list()  # valid proxy list
    pool = futures.ThreadPoolExecutor(8)
    futureList = list()
    for proxy in rawProxyPool:
        futureList.append(pool.submit(Check, desturl, proxy, feature))
    print('\n submit done, waiting for responses\n')
    for future in futures.as_completed(futureList):
        proxy = future.result()
        print('proxy:' + proxy)
        if proxy:  # valid proxy
            validProxyList.append(proxy)
    print('validProxyList size:' + str(len(validProxyList)))
    return validProxyList

# Obtain the original proxy pool
def GetRawProxyPool():
    rawProxyPool = set()
    # Obtain the original proxy pool in some way...
    return rawProxyPool

if __name__ == "__main__":
    rawProxyPool = GetRawProxyPool()
    desturl = 'http://...'  # Target address that needs to be accessed through a proxy
    feature = 'xxx'  # Feature code of the target web page
    validProxyPool = GetValidProxyPool(rawProxyPool, desturl, feature)
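The submit / as_completed pattern used in GetValidProxyPool can be tried without any network access. The following is only a minimal sketch under stated assumptions: fake_check and filter_valid are hypothetical names that do not appear in the original script, the addresses are placeholders from the documentation-only 192.0.2.0/24 range, and the real Check function would take the place of fake_check. It also shows one possible variation, a with-block that shuts the pool down when the work is finished.

```python
from concurrent import futures


def fake_check(proxy, good_proxies):
    """Hypothetical stand-in for Check(): return the proxy if 'valid', else ''."""
    return proxy if proxy in good_proxies else ''


def filter_valid(raw_pool, check, max_workers=8):
    """Run check(proxy) for every proxy in a thread pool and keep truthy results."""
    valid = []
    # The with-block shuts the pool down once all submitted work has finished.
    with futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        future_list = [pool.submit(check, proxy) for proxy in raw_pool]
        # as_completed yields futures in completion order, not submission order.
        for future in futures.as_completed(future_list):
            try:
                result = future.result()
            except Exception:
                # A checker that raises is treated as an invalid proxy.
                continue
            if result:
                valid.append(result)
    return valid


if __name__ == '__main__':
    # Placeholder addresses, assumed for illustration only.
    raw = ['192.0.2.1:8080', '192.0.2.2:3128', '192.0.2.3:80']
    good = {'192.0.2.1:8080', '192.0.2.3:80'}
    print(sorted(filter_valid(raw, lambda p: fake_check(p, good))))
```

Because results arrive in completion order, the returned list should be sorted (or collected into a set) if a stable order matters.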
I hope this article is helpful for your Python programming.