5

Overview

I am using a proxy network and want to configure it with Selenium on Python. I have seen many post use the HOST:PORT method, but proxy networks uses the "URL method" ofhttp://USER:PASSWORD@PROXY:PORT

SeleniumWire

I found SeleniumWire to be a way to connect the "URL method" of proxy networks to a Selenium Scraper. See basic SeleniumWire configuration:

from seleniumwire import webdriver options = { 'proxy': { 'http': 'http://USER:PASSWORD@PROXY:PORT', 'https': 'http://USER:PASSWORD@PROXY:PORT' }, } driver = webdriver.Chrome(seleniumwire_options=options) driver.get("https://some_url.com") 

This correctly adds and cycles a proxy to the driver, however on many websites the scraper is quickly blocked by CloudFlare. This blocking is something that does not happen when running on Local IP. After searching through SeleniumWire's GitHub Repository Issues, I found that this is caused by TLS fingerprinting and that there is no current solution to this issue.

Selenium Options

I tried to configure proxies the conventional selenium way:

from selenium import webdriver options = webdriver.ChromeOptions() options.add_argument("--proxy-server=http://USER:PASSWORD@PROXY:PORT") driver = webdriver.Chrome(options=options) driver.get("https://some_url.com") 

A browser instance does open but fails because of a network error. Browser instance does not load in established URL.

Docker Configuration

The end result of this configuration would be running python code within a docker container that is within a Lambda function. Don't know whether or not that introduces a new level of abstraction or not.

Summary

What other resources can I use to correctly configure my Selenium scraper to use the "URL method" of IP cycling?

Versions

  • python 3.9
  • selenium 3.141.0
  • docker 20.10.11

Support Tickets

Github: https://github.com/SeleniumHQ/selenium/issues/10605

ChromeDriver: https://bugs.chromium.org/p/chromedriver/issues/detail?id=4118

    2 Answers 2

    1

    Selenium Extension:

    A proxy network, or "URL" proxy, can be configured with Selenium as an extension. Create the following JS script and JSON file:

    JS script ("background.js")

    var config = { mode: "fixed_servers", rules: { singleProxy: { scheme: "http", host: "<PROXY>", port: parseInt(<PORT>) }, bypassList: ["foobar.com"] } }; chrome.proxy.settings.set({value: config, scope: "regular"}, function() {}); function callbackFn(details) { return { authCredentials: { username: "<USER>", password: "<PASSWORD>" } }; } chrome.webRequest.onAuthRequired.addListener( callbackFn, {urls: ["<all_urls>"]}, ['blocking'] ); 

    JSON File ("manifest.json")

    { "version": "1.0.0", "manifest_version": 2, "name": "Chrome Proxy", "permissions": [ "proxy", "tabs", "unlimitedStorage", "storage", "<all_urls>", "webRequest", "webRequestBlocking" ], "background": { "scripts": ["background.js"] }, "minimum_chrome_version":"22.0.0" } 

    Zip background.js and manifest.json as proxy.zip and write the following:

    from selenium import webdriver options = webdriver.ChromeOptions() options.add_extension("proxy.zip") driver = webdriver.Chrome(options=options) driver.get("https://whatismyipaddress.com/") 
    2
    • 1
      May I know how to modify it if my proxy url does't need a username and password ?
      – Rain.Wei
      CommentedJul 17, 2022 at 13:53
    • This is for proxy networks specifically, if you simply have a host:port endpoint it's much easier to configure. There is a lot of support for that: stackoverflow.com/questions/17082425/…CommentedAug 23, 2022 at 14:36
    0

    try to use this

    options.add_argument("--ignore-certificate-errors") 
    1
    • 2
      This answer was reviewed in the Low Quality Queue. Here are some guidelines for How do I write a good answer?. Code only answers are not considered good answers, and are likely to be downvoted and/or deleted because they are less useful to a community of learners. It's only obvious to you. Explain what it does, and how it's different / better than existing 8 month old accepted answer. From ReviewCommentedAug 25, 2022 at 17:16

    Start asking to get answers

    Find the answer to your question by asking.

    Ask question

    Explore related questions

    See similar questions with these tags.