Hello World,

I am new to Python and am trying to scrape a JavaScript-rendered page: https://search.gleif.org/#/search/

Below is the result I get from my code (using requests):

    <!DOCTYPE html>
    <html>
    <head><meta charset="utf-8"/>
    <meta content="width=device-width,initial-scale=1" name="viewport"/>
    <title>LEI Search 2.0</title>
    <link href="/static/icons/favicon.ico" rel="shortcut icon" type="image/x-icon"/>
    <link href="https://fonts.googleapis.com/css?family=Open+Sans:200,300,400,600,700,900&amp;subset=cyrillic,cyrillic-ext,greek,greek-ext,latin-ext,vietnamese" rel="stylesheet"/>
    <link href="/static/css/main.045139db483277222eb714c1ff8c54f2.css" rel="stylesheet"/></head>
    <body>
    <div id="app"></div>
    <script src="/static/js/manifest.2ae2e69a05c33dfc65f8.js" type="text/javascript"></script>
    <script src="/static/js/vendor.6bd9028998d5ca3bb72f.js" type="text/javascript"></script>
    <script src="/static/js/main.5da23c5198041f0ec5af.js" type="text/javascript"></script>
    </body>
    </html>

The question: instead of retrieving the page skeleton above, which only contains script tags such as

    <script src="/static/js/manifest.2ae2e69a05c33dfc65f8.js" type="text/javascript"></script>

I would like to retrieve the content of the table so that I can store it.

[Image: the table that I want to scrape]

  • What exactly do you want to find?
    Commented Nov 10, 2019 at 22:51
  • So the question is how to set proxy auth in Selenium? You can google that and find some workarounds for Selenium's limitations.
    Commented Nov 10, 2019 at 23:46
  • @pguardiario The question is how to get the table content instead of the JavaScript script tags. Do you have any hints?
    – A2N15
    Commented Nov 24, 2019 at 10:25
  • @SuperStormer I want to scrape the table, but instead I'm getting the JavaScript script tags. Do you have any idea how to deal with that?
    – A2N15
    Commented Nov 24, 2019 at 10:28
  • You would use Selenium for that.
    Commented Nov 25, 2019 at 0:56

1 Answer

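The following code uses Selenium's Python bindings. It opens each results page in a real browser, waits for the JavaScript to render the table, and then reads the cells.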

    import time

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    country = []
    legal_name = []
    lei = []

    driver = webdriver.Chrome()
    driver.implicitly_wait(5)

    # Walk through the paginated results, 50 rows per page.
    for page in range(1, 30395):
        driver.get('https://search.gleif.org/#/search/fulltextFilterId=LEIREC_FULLTEXT&currentPage=' + str(page) + '&perPage=50&expertMode=false#results-section')
        time.sleep(5)  # give the JavaScript time to render the table
        # Collect the link content of every cell in each of the three columns.
        country += [el.get_attribute('innerHTML') for el in driver.find_elements(By.XPATH, '//*[@class="table-cell country"]/a')]
        legal_name += [el.get_attribute('innerHTML') for el in driver.find_elements(By.XPATH, '//*[@class="table-cell legal-name"]/a')]
        lei += [el.get_attribute('innerHTML') for el in driver.find_elements(By.XPATH, '//*[@class="table-cell lei"]/a')]
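Since the goal is to store the table, here is a minimal sketch of one way to write the three lists to a CSV file once the loop has finished. The function name `save_rows` and the column headers are my own choices for illustration, not anything the site or Selenium defines.

```python
import csv


def save_rows(path, country, legal_name, lei):
    """Write three parallel lists to a CSV file, one record per row.

    Hypothetical helper: the name and the header labels are illustrative.
    zip() stops at the shortest list, so a page where one column failed
    to load would silently drop trailing records.
    """
    rows = list(zip(country, legal_name, lei))
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["country", "legal_name", "lei"])  # header row
        writer.writerows(rows)
    return len(rows)  # number of data rows written
```

After the scraping loop you could call it as `save_rows("lei.csv", country, legal_name, lei)`. Writing the file once per page instead (in append mode) may be safer for a run this long, because a crash would then not lose everything collected so far.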

Logging in (replace the locators below with the ones used by the login form on your page):

    from selenium.webdriver.common.by import By

    driver.find_element(By.ID, "UserName").send_keys("xxxx")
    driver.find_element(By.NAME, "Password").send_keys("yyyy")
    driver.find_element(By.CLASS_NAME, "loginButton").click()

Get page content

    print(driver.page_source)
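If you would rather work from the rendered HTML string than from live elements, the sketch below pulls the link text out of the table cells using only the standard library. The class names are taken from the XPath expressions above. That the page keeps this exact markup is an assumption, and `CellLinkParser` and `extract_cells` are hypothetical names used for illustration. Note that it returns the text inside each link, which can differ from `innerHTML` when the link contains nested tags.

```python
from html.parser import HTMLParser


class CellLinkParser(HTMLParser):
    """Collect the text of <a> elements found inside elements whose
    class attribute is exactly `cell_class` (hypothetical helper)."""

    def __init__(self, cell_class):
        super().__init__()
        self.cell_class = cell_class
        self.values = []
        self._cell_tag = None   # tag name of the matching cell we are in
        self._link_text = None  # text chunks while inside an <a>, else None

    def handle_starttag(self, tag, attrs):
        if self._cell_tag is None:
            # Outside a cell: check whether this element opens one.
            if dict(attrs).get("class") == self.cell_class:
                self._cell_tag = tag
        elif tag == "a" and self._link_text is None:
            # Inside a cell: start capturing the link text.
            self._link_text = []

    def handle_data(self, data):
        if self._link_text is not None:
            self._link_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._link_text is not None:
            self.values.append("".join(self._link_text).strip())
            self._link_text = None
        elif tag == self._cell_tag and self._link_text is None:
            # Simplification: assumes the cell does not nest another
            # element with the same tag name.
            self._cell_tag = None


def extract_cells(html, cell_class):
    """Return the link texts for every cell with the given class."""
    parser = CellLinkParser(cell_class)
    parser.feed(html)
    parser.close()
    return parser.values
```

With a live driver this would be used as `extract_cells(driver.page_source, "table-cell lei")`, after the page has had time to render.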

  • Thank you very much, it works :) I just have two more questions: how can I set my username and password (Firefox asks me for authentication each time)? Furthermore, how can I display just the content of the result (like request.content)?
    – A2N15
    Commented Nov 25, 2019 at 15:38
  • @Annis15 Edited the answer to cover your two questions.
    Commented Nov 26, 2019 at 10:01
  • Thanks again for the page content. However, when I add your find_element_by_id code, nothing happens (with my username and password). I tried changing "UserName" to "Username:" (which is how the Firefox popup displays it), but still nothing. The first question is solved, though; right now I am just trying to optimize the script.
    – A2N15
    Commented Nov 26, 2019 at 15:16
  • @Annis15 Could you please provide the URL of the page you're trying to scrape, so I can give you the exact code for the login?
    Commented Nov 27, 2019 at 6:29
  • The page is the same one that you scraped: driver.get('search.gleif.org/#/search/…)
    – A2N15
    Commented Nov 27, 2019 at 9:14
