
Concurrently Verifying Proxy Pool Addresses in Python 3

This article shows one way to verify the addresses in a proxy pool concurrently in Python 3. It is shared for reference; the code follows:

#encoding=utf-8
#author: walker
#date: 2016-04-14
#summary: Using a concurrent pool (thread pool) to verify the validity of proxies
import os, sys, time
import traceback
import requests
from concurrent import futures
cur_dir_fullpath = os.path.dirname(os.path.abspath(__file__))
# Request headers sent with every check
Headers = {
      'Accept': '*/*',
      'User-Agent': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E)',
    }
# Check the validity of a single proxy
# If valid, return the proxy; otherwise, return an empty string
def Check(desturl, proxy, feature):
  proxies = {'http': 'http://' + proxy}
  r = None # declare
  exMsg = None
  try:
    r = requests.get(url=desturl, headers=Headers, proxies=proxies, timeout=3)
  except Exception:
    exMsg = '* ' + traceback.format_exc()
    #print(exMsg)
  finally:
    # Compare against None: a Response is falsy for 4xx/5xx status codes
    if r is not None:
      r.close()
  if exMsg:
    return ''
  if r.status_code != 200:
    return ''
  if r.text.find(feature) < 0:
    return ''
  return proxy
# Input: raw proxy collection (set/list); returns the list of valid proxies
def GetValidProxyPool(rawProxyPool, desturl, feature):
  validProxyList = list()  # valid proxy list
  futureList = list()
  # The with-block shuts the thread pool down once all checks have finished
  with futures.ThreadPoolExecutor(8) as pool:
    for proxy in rawProxyPool:
      futureList.append(pool.submit(Check, desturl, proxy, feature))
    print('\n submit done, waiting for responses\n')
    # as_completed yields each future as soon as its check finishes
    for future in futures.as_completed(futureList):
      proxy = future.result()
      print('proxy: ' + proxy)
      if proxy: # valid proxy
        validProxyList.append(proxy)
  print('validProxyList size: ' + str(len(validProxyList)))
  return validProxyList
# Obtain the raw (unverified) proxy pool
def GetRawProxyPool():
  rawProxyPool = set()
  # Obtain the raw proxy pool in some way...
  return rawProxyPool
if __name__ == "__main__":
  rawProxyPool = GetRawProxyPool()
  desturl = 'http://...'    # Target address that needs to be accessed through a proxy
  feature = 'xxx'    # String expected to appear in the target page
  validProxyPool = GetValidProxyPool(rawProxyPool, desturl, feature)
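
The submit / as_completed pattern used in GetValidProxyPool can be tried without any network access. The following is only a minimal sketch: fake_check and filter_valid are hypothetical names that are not part of the original script, and fake_check stands in for Check by treating a proxy as valid when it appears in a known set.

```python
import time
from concurrent import futures

# Hypothetical stand-in for Check: no network access, a proxy counts as
# "valid" only if it is listed in good_proxies.
def fake_check(proxy, good_proxies):
    time.sleep(0.01)  # simulate a little network latency
    return proxy if proxy in good_proxies else ''

# Hypothetical counterpart of GetValidProxyPool built on fake_check.
def filter_valid(raw_pool, good_proxies, max_workers=8):
    valid = []
    # The with-block shuts the pool down once all tasks have finished.
    with futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        future_list = [pool.submit(fake_check, p, good_proxies) for p in raw_pool]
        # as_completed yields futures in completion order, not submission order.
        for future in futures.as_completed(future_list):
            proxy = future.result()
            if proxy:  # an empty string means the check failed
                valid.append(proxy)
    return valid

if __name__ == '__main__':
    raw = ['10.0.0.1:8080', '10.0.0.2:3128', '10.0.0.3:80']  # made-up addresses
    good = {'10.0.0.1:8080', '10.0.0.3:80'}
    print(sorted(filter_valid(raw, good)))
```

Because results arrive in completion order, the returned list is not guaranteed to match the order of the input; sort it or use a set if order matters.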


I hope this article is helpful for your Python programming.

