Example code: recognizing a captcha and logging in with Python + selenium

For work, I need to automate logging in to a website that requires a captcha. I had looked into captcha recognition before, but I could never get hold of the captcha image I actually needed. I came back to the problem this Friday, and yesterday I finally solved it.

Let's get to the main topic:

Python version: 3.4.3

Required libraries and tools: PIL (Pillow), selenium, and the tesseract command-line OCR program

Here is the code:

# coding: utf-8
import subprocess
import time

from PIL import Image
from PIL import ImageOps
from selenium import webdriver

def cleanImage(imagePath):
    # Open the image as grayscale so the threshold yields pure black/white
    image = Image.open(imagePath).convert("L")
    # Binarize: pixels darker than 143 become black, all others white
    image = image.point(lambda x: 0 if x < 143 else 255)
    # Pad with a white border so characters do not touch the edge
    borderImage = ImageOps.expand(image, border=20, fill='white')
    borderImage.save(imagePath)

def getAuthCode(driver, url="http://localhost/"):
    captchaUrl = url + "common/random"
    driver.get(captchaUrl)
    time.sleep(0.5)
    driver.save_screenshot("captcha.jpg")  # Screenshot the page that shows only the captcha
    # urlretrieve(captchaUrl, "captcha.jpg")
    time.sleep(0.5)
    cleanImage("captcha.jpg")
    # Run tesseract; it writes the recognized text to captcha.txt
    p = subprocess.Popen(["tesseract", "captcha.jpg", "captcha"],
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    p.communicate()
    with open("captcha.txt", "r") as f:
        # Strip spaces and newlines from the OCR output
        captchaResponse = f.read().replace(" ", "").replace("\n", "")
    print("Captcha solution attempt: " + captchaResponse)
    if len(captchaResponse) == 4:
        return captchaResponse
    else:
        return False

def withoutCookieLogin(url="http://org.cfu666.com/"):
    driver = webdriver.Chrome()
    driver.maximize_window()
    driver.get(url)
    while True:
        authCode = getAuthCode(driver, url)
        if authCode:
            driver.back()
            # The source lines below were cut off after the XPath. The clear() and
            # send_keys() calls and the placeholder values are assumed completions.
            driver.find_element_by_xpath("//input[@id='orgCode' and @name='orgCode']").clear()
            driver.find_element_by_xpath("//input[@id='orgCode' and @name='orgCode']").send_keys("your-org-code")
            driver.find_element_by_xpath("//input[@id='account' and @name='username']").clear()
            driver.find_element_by_xpath("//input[@id='account' and @name='username']").send_keys("your-username")
            driver.find_element_by_xpath("//input[@type='password' and @name='password']").clear()
            driver.find_element_by_xpath("//input[@type='password' and @name='password']").send_keys("your-password")
            driver.find_element_by_xpath("//input[@type='text' and @name='authCode']").send_keys(authCode)
            driver.find_element_by_xpath("//button[@type='submit']").click()
            try:
                time.sleep(3)
                # This menu link only exists after a successful login
                driver.find_element_by_xpath("//*[@id='side-menu']/li[2]/ul/li/a").click()
                return driver
            except Exception:
                print("authCode Error:", authCode)
                driver.refresh()


driver = withoutCookieLogin("http://localhost/")
driver.get("http://localhost/enterprise/add/")
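
The whitespace clean-up and length check at the end of getAuthCode can be pulled out and tried on their own, without a browser or tesseract. The following is only a minimal sketch: the helper name normalize_captcha and its expected_length parameter are hypothetical and not part of the original code. It removes every whitespace character rather than only spaces and newlines, on the assumption that OCR output may also contain tabs or a trailing form feed.

```python
import re


def normalize_captcha(raw_text, expected_length=4):
    """Strip all whitespace from OCR output and check its length.

    Hypothetical helper: returns the cleaned string, or None when the
    length does not match expected_length.
    """
    cleaned = re.sub(r"\s+", "", raw_text)
    if len(cleaned) == expected_length:
        return cleaned
    return None


print(normalize_captcha(" a1 B2\n"))  # a1B2
print(normalize_captcha("a1\n"))      # None
```

Returning None instead of False is a design choice of this sketch; both are falsy, so a caller written as "if authCode:" behaves the same way.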

How to get the captcha we need

I ran into a lot of pitfalls while trying to get hold of the captcha, and I read a lot of articles. Most of them explain how to recognize a captcha, but not how to capture the exact captcha image that the page is showing at that moment.

My approach is:

1. Use selenium to open the login page (url1).

2. Find the captcha image's address (url2) with Inspect Element. (The simplest way is actually to right-click the image and open it in a new tab.)

3. From the url1 page, navigate to url2, then take a screenshot of the captcha page and save it.

4. After recognizing the captcha string, click the browser's back button to return to the url1 login page.

5. Enter the login information and the captcha.

6. Click login.

7. Check the page shown after logging in to determine whether the login succeeded; if it did not, repeat steps 1-7.
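
The retry logic in step 7 can be sketched separately from selenium. This is an illustrative sketch only: login_with_retries, read_captcha, submit_login and max_attempts are hypothetical names, the two callables stand in for the browser steps, and the attempt limit is an assumption added here (the code above loops until it succeeds).

```python
def login_with_retries(read_captcha, submit_login, max_attempts=5):
    """Repeat the read-captcha / submit cycle until a login succeeds.

    read_captcha: callable returning the captcha string, or a falsy value
                  when recognition failed.
    submit_login: callable taking the captcha string, returning True on success.
    Returns the number of the successful attempt, or None if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        code = read_captcha()
        if not code:
            continue  # nothing usable was recognized; fetch a fresh captcha
        if submit_login(code):
            return attempt
    return None


# Stand-ins for the selenium steps, used only to demonstrate the loop.
fake_reads = iter([False, "ab12", "cd34"])
result = login_with_retries(
    read_captcha=lambda: next(fake_reads),
    submit_login=lambda code: code == "cd34",
)
print(result)  # 3
```

Capping the number of attempts keeps a script from retrying indefinitely when the captcha style changes and recognition stops working.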

To protect the company's information, the page used in this article is a service I set up locally. I also tested this way of obtaining the captcha on the registration page of Juejin Online, and it worked there too. (The image clean-up used here only handles captchas whose background is speckled with noise pixels. If the captcha has horizontal lines drawn across it, additional processing is required.)
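
As an illustration of the note in parentheses, here is a rough sketch in plain Python that works on a rectangular grid of grayscale values instead of a real image. binarize applies the same 143 cut-off as cleanImage. remove_thin_horizontal_lines is a hypothetical heuristic, not something from the original code: it only erases lines that are exactly one pixel thick, and it would also erase one-pixel-thick horizontal strokes that belong to a character.

```python
THRESHOLD = 143  # same cut-off as cleanImage


def binarize(pixels, threshold=THRESHOLD):
    """Map each grayscale value to 0 (black) or 255 (white)."""
    return [[0 if value < threshold else 255 for value in row] for row in pixels]


def remove_thin_horizontal_lines(binary):
    """Whiten black pixels whose neighbours directly above and below are white.

    Assumes a rectangular grid. Pixels outside the grid count as white.
    """
    height = len(binary)
    cleaned = [row[:] for row in binary]
    for y in range(height):
        for x in range(len(binary[y])):
            above = binary[y - 1][x] if y > 0 else 255
            below = binary[y + 1][x] if y < height - 1 else 255
            if binary[y][x] == 0 and above == 255 and below == 255:
                cleaned[y][x] = 255
    return cleaned


# A vertical character stroke (column 2) crossed by a dark line (row 1).
gray = [
    [200, 200, 30, 200],
    [90, 90, 30, 90],
    [200, 200, 30, 200],
]
cleaned = remove_thin_horizontal_lines(binarize(gray))
for row in cleaned:
    print(row)
```

In this example the crossing line is removed and the vertical stroke survives, so every row comes out as [255, 255, 0, 255]. Thicker lines or lines drawn at an angle would need a different approach.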

That's all for this article. I hope it helps with your learning.
