I need to download a large number of Excel files (estimated: 500 - 1000) from sellercentral.amazon.de. Downloading them manually is not an option, as every download takes several clicks before the Excel file pops up.

Since Amazon cannot provide me with a simple XML file describing its structure, I decided to automate this on my own. The first thing that came to mind was Selenium and Firefox.

The Problem:

A login to Seller Central is required, as well as two-factor authentication (2FA). Once I have logged in, I can open another tab, enter sellercentral.amazon.de and am instantly logged in. I can even open another instance of the browser and be instantly logged in there too. They might be using session cookies. The target URL to "scrape" is https://sellercentral.amazon.de/listing/download?ref=ag_dnldinv_apvu_newapvu .

But when I open the URL from my Python script with Selenium WebDriver, a new instance of the browser is launched in which I am not logged in, even though there are instances of Firefox running at the same time in which I am logged in. So I guess the instances launched by Selenium are somehow different.

What I've tried:

I tried adding a time delay after the first .get() (which opens the site) so that I could log in manually, and then calling .get() again. With this, the script runs seemingly forever.

    from selenium import webdriver
    import time

    browser = webdriver.Firefox()

    # Wait for website to fire onload event
    browser.get("https://sellercentral.amazon.de/listing/download?ref=ag_dnldinv_apvu_newapvu")
    time.sleep(30000)
    browser.get("https://sellercentral.amazon.de/listing/download?ref=ag_dnldinv_apvu_newapvu")
    elements = browser.find_elements_by_tag_name("browse-node-component")
    print(str(elements))

What am I looking for?

I need a solution that uses the two-factor authentication token from Google Authenticator.

I want Selenium to open as a tab in the existing instance of the Firefox browser, where I will already have logged in beforehand. That way no login should be required and the "scraping" and downloading can be done. If there's no direct way, maybe someone can come up with a workaround?

I know Selenium cannot download the files itself, as the popups are no longer part of the browser. I'll fix that when I get there.
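
For the download step, one possible workaround (an assumption, not tested against Seller Central) is to tell Firefox to save Excel files straight to disk without showing a dialog. A minimal sketch follows; `firefox_download_prefs`, `make_driver` and the download directory are hypothetical names, while the three preference keys are standard Firefox settings. Whether the files are actually served with one of these MIME types is also an assumption.

```python
# MIME types commonly used for .xls and .xlsx files (assumed, not confirmed for Seller Central).
EXCEL_MIME_TYPES = (
    "application/vnd.ms-excel",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
)


def firefox_download_prefs(download_dir):
    """Build Firefox preferences that save Excel files without a prompt."""
    return {
        # 2 means "use the custom directory given in browser.download.dir".
        "browser.download.folderList": 2,
        "browser.download.dir": download_dir,
        # Comma-separated MIME types that are saved without asking.
        "browser.helperApps.neverAsk.saveToDisk": ",".join(EXCEL_MIME_TYPES),
    }


def make_driver(download_dir):
    """Start Firefox with the preferences above (requires Selenium and geckodriver)."""
    # Imported here so the preference helper can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    for name, value in firefox_download_prefs(download_dir).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)
```

With this, `make_driver("/tmp/amazon_reports")` would return a driver whose downloads land in that (hypothetical) directory.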

Important side note: Firefox is not a given! I'll gladly accept a solution for any browser.

  • Welcome to SO. First of all, Selenium will start its own browser instance (by default without any add-ons). Alternatively, you can open the browser, navigate to the page and enter the credentials using Selenium, then use JavaScript to open a new tab with the URL and continue with your next steps. Let me know if you need sample code...
    – supputuri
    Commented Apr 26, 2019 at 15:23
  • @supputuri: I'd love to get some sample code. About entering the credentials: username and password are obvious, but for 2FA an SMS is used. How would I be able to pass the value to Selenium to enter it?
    – Lino
    Commented Apr 26, 2019 at 15:31
  • Is there a way to change the 2FA to email so that you can read the token from the email? If not, you might have to consider a 'ringcentral'-style option to redirect the SMS and then read the 2FA code from it via an API.
    – supputuri
    Commented Apr 26, 2019 at 15:42
  • @supputuri: I just checked, but sadly not. There are only two options to receive the security code: SMS or an authentication app. I've just set up Google Authenticator. Maybe I can get access to that? I take it from your comments that it is easier to go through the login with Selenium than to try to open the Selenium instance inside the existing browser, right?
    – Lino
    Commented Apr 26, 2019 at 15:56
  • We can retrieve the 2FA token from Google Authenticator and use that in your login.
    – supputuri
    Commented Apr 26, 2019 at 16:05

1 Answer

Here is code that reads the Google Authenticator token and uses it in the login. It uses JavaScript to open the new tab. Install the pyotp package before running the test code.

pip install pyotp

Test code:

    from pyotp import TOTP
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    driver.get("https://sellercentral.amazon.de/listing/download?ref=ag_dnldinv_apvu_newapvu")
    wait = WebDriverWait(driver, 10)

    # enter the email
    email = wait.until(EC.presence_of_element_located((By.XPATH, "//input[@name='email']")))
    email.send_keys("email goes here")

    # enter the password
    # (find_element(By.XPATH, ...) replaces find_element_by_xpath, which Selenium 4 removed)
    driver.find_element(By.XPATH, "//input[@name='password']").send_keys("password goes here")

    # click on the sign-in button
    driver.find_element(By.XPATH, "//input[@id='signInSubmit']").click()

    # wait for the 2FA field to display
    authField = wait.until(EC.presence_of_element_located((By.XPATH, "xpath goes here")))

    # get the token from Google Authenticator
    totp = TOTP("secret goes here")
    token = totp.now()
    print(token)

    # enter the token in the UI
    authField.send_keys(token)

    # click on the button to complete 2FA
    driver.find_element(By.XPATH, "xpath of the button goes here").click()

    # now open a new tab
    driver.execute_script("""window.open("https://sellercentral.amazon.de/listing/download?ref=ag_dnldinv_apvu_newapvu")""")

    # continue with your logic from here
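
To make the difference between the long-lived secret and the short-lived token concrete, here is a minimal standard-library sketch of roughly what a TOTP library computes (RFC 6238 with HMAC-SHA1, 6 digits and a 30-second step, which are also pyotp's defaults). The function name `totp_now` is hypothetical, and the secret shown is the public RFC 6238 test key in base32, not a real account secret.

```python
import base64
import hashlib
import hmac
import struct
import time


def totp_now(secret_b32, for_time=None, digits=6, step=30):
    """Return the TOTP code for a base32 secret (RFC 6238, HMAC-SHA1)."""
    if for_time is None:
        for_time = time.time()
    # Authenticator secrets are base32 text; normalise and restore any stripped padding.
    cleaned = secret_b32.replace(" ", "").upper()
    cleaned += "=" * (-len(cleaned) % 8)
    key = base64.b32decode(cleaned)
    # Count of whole time steps since the Unix epoch, as a big-endian 64-bit integer.
    counter = struct.pack(">Q", int(for_time) // step)
    digest = hmac.new(key, counter, hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte selects an offset.
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


if __name__ == "__main__":
    # Placeholder secret: base32 of the RFC 6238 test key "12345678901234567890".
    print(totp_now("GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ"))
```

The secret never changes, while the returned code depends on the current 30-second window, which is why the script can compute a fresh token at login time without anyone reading it off a phone.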
  • Thanks! I'll get it tested in about 1.5h.
    – Lino
    Commented Apr 26, 2019 at 16:43
  • Filling out the form works great so far. However, what is meant by totp = TOTP("secret goes here")? Is it a token for my Google Authenticator? Or simply the token for Amazon (which is only valid for 30 seconds)?
    – Lino
    Commented Apr 26, 2019 at 18:34
  • The Google Authenticator secret, not the token. Click on 'settings>export/import' in your Google Authenticator. Open the downloaded file and you will find the secret in there. Replace "secret goes here" with the secret from the file. The code will then generate the current token from that secret as part of execution. You don't need to worry about the 2FA token any more.
    – supputuri
    Commented Apr 26, 2019 at 18:44
  • Ty, you're a life saver, I'm telling you. I had already set up Google Authenticator when I set up my Amazon account (with a barcode back then). I couldn't find a way to get the key from back then, so I went through the whole process of setting up 2FA for Seller Central again, but this time with the secret instead of the barcode, aaaand IT WORKS! Thanks a lot. For future visitors: you might need to change By.xpath to By.XPATH. In addition, for the password fill-in, just do passWord.send_keys("your pw"), no need for the driver.find_element_... I will also edit the title, since the solution is different.
    – Lino
    Commented Apr 26, 2019 at 19:36
  • @supputuri I am facing the same 2FA issue. Would you be able to help me? You can invite me to a chat here.
    – Marx Babu
    Commented May 19, 2020 at 16:43
