I'm trying to do some scraping for educational purposes. I just started and am fairly new to Python.

In Selenium, I am trying to scrape a product listing page: take each product's name, price, shipping price, and sale count, and store them in a dictionary to be written to a text file for further use.

My problem is that this website shows 60 items per page, and each price is split across 4 elements: "$", "56", ".", and "32" (the cents). So when I loop over them, it either gives me "16" as the price, or it gives me the individual product names with the price of each one looking like:

    Name: productname, Price: 15
    Name: productname2, Price: "."

So it's splitting every price up into separate elements.

    import sys
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    import time


    def InitializeSearch():
        productnnn = []
        productppp = []
        driver = webdriver.Firefox()
        driver.get("https://www.aliexpress.us/w/wholesale-legos.html?spm=a2g0o.detail.search.0")
        driver.maximize_window()
        driver.execute_script("window.scrollTo(0, 1000)")
        time.sleep(3)
        driver.execute_script("window.scrollTo(0, 2000)")
        time.sleep(3)
        driver.execute_script("window.scrollTo(0, 3000)")
        time.sleep(3)
        driver.execute_script("window.scrollTo(0, 4000)")
        time.sleep(3)
        driver.execute_script("window.scrollTo(0, 4500)")
        time.sleep(3)
        productname = driver.find_elements(By.XPATH, "//h3[@class='kc_j0']")  ## text is questionable
        productprice = driver.find_elements(By.XPATH, "//span[@style='font-size:20px;decimal_point:.;comma_style:,;currency-symbol:$;show-decimal:true;symbol_position:left']")
        productsalecount = driver.find_elements(By.XPATH, "//span[@class='kc_jv']")
        productshippingfee = driver.find_elements(By.XPATH, "//span[@class='ml_a1 ml_mn']")
        ##### This is where the code needs to go #####


    InitializeSearch()

The ##### comment above marks where I was putting the code. I have tried quite a few different methods, including:

    for n in productprice:
        pricedict = {}
        pricedict["Price"] = (n.text)  ### the .text is required as driver returns a web elem
        print(pricedict)

I also tried nesting this inside an identical loop over the product names, and nesting them all in another loop that counts through productname.

So basically, how do I take the driver elements I have here, cycle through all 60 of them, and put everything into a dictionary that I can later append to a text file, even though each price is split into 4 elements?

Sidenote: the driver returns a web element. I can call encode(errors=ignore) on it and I get a bytes object (the string starting with b').

When I decode it, it turns back into a web element unless I add ascii(encodedstring.decode(errors=ignore)).

How do I convert this in Selenium to a regular old string object and not a web element?

TL;DR: I want to combine the results of my driver's find_elements calls into one clean dictionary for each individual item.

    for n[0:3] in productprice:
        dictionary["price"] = (n.text)

I'm expecting the values from the HTML/JavaScript to be cleanly laid out in a dictionary for each individual item.

  • Scrolling and delaying like that is unlikely to be a reliable approach.
    Commented Apr 9 at 6:57
  • The driver always gives you a web element, so you can call find on that element to search for details inside it, pass it to execute_script, or send keyboard/mouse events to it. This lets you first find the rows in a table and then search for the values in the cells/columns of each row.
    – furas
    Commented Apr 9 at 11:22
  • If you use find_elements (with the s at the end), it gives you a list of all matching elements (or a list with one element, or an empty list), so you may need a for-loop to work with every element separately and get the text or another value from each one. If you only need the first element, you can use find_element (without the s).
    – furas
    Commented Apr 9 at 11:25

1 Answer


You can select the parent div element instead; its .text property gives you the full price text as one string.

Also, your selectors are going to fail because the class prefixes are dynamically generated, so they will frequently fail to find the elements.

In the updated script below, I've used chromedriver instead of Firefox and changed the selectors to match on the stable parts of the class names, so that the elements are found consistently.

If you want to exclude the $ sign from the price, you can strip it when you format the value later.
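As a minimal sketch of that later formatting step, assuming the parent element's text arrives as a single string such as "$56.32" (the helper name `parse_price` is made up for this example and is not part of Selenium):

```python
import re


def parse_price(text):
    """Turn a price string such as "$56.32" into a float, or None if no number is found."""
    # Drop thousands separators, then take the first integer or decimal number.
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    return float(match.group()) if match else None


# The same helper works if you only have the four separate fragments:
# joining them first rebuilds the full price string.
fragments = ["$", "56", ".", "32"]
price = parse_price("".join(fragments))
print(price)
```

Returning None for text without a number (for example a "free shipping" label) is a choice made for this sketch; raising an error instead would be just as reasonable.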

    import time

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By


    def InitializeSearch():
        chrome_options = Options()
        chrome_options.add_argument("--start-maximized")
        chrome_options.add_argument("--headless")
        # Set user agent
        chrome_options.add_argument("--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/135.0.0.0 Safari/537.36")
        # Pass the options by keyword; the first positional parameter differs between Selenium versions
        driver = webdriver.Chrome(options=chrome_options)
        driver.get("https://www.aliexpress.us/w/wholesale-legos.html?spm=a2g0o.detail.search.0")

        # Scroll to the footer once so the rest of the page gets a chance to load
        driver.execute_script("arguments[0].scrollIntoView();", driver.find_element(By.CSS_SELECTOR, "div.footer-copywrite"))
        time.sleep(1)

        card_items = driver.find_elements(By.CSS_SELECTOR, "a.search-card-item")
        productname = driver.find_elements(By.CSS_SELECTOR, "a.search-card-item h3")
        # The parent div of the price fragments; its .text holds the whole price
        productprice = driver.find_elements(By.CSS_SELECTOR, "a.search-card-item div[class$=_k1]")
        productsalecount = driver.find_elements(By.CSS_SELECTOR, "a.search-card-item span[class$=_jv]")
        # Unchanged from the question; these class names are generated and may stop matching
        productshippingfee = driver.find_elements(By.XPATH, "//span[@class='ml_a1 ml_mn']")

        for n in productprice:
            pricedict = {}
            pricedict["Price"] = n.text  # .text returns the element's visible text as a plain str
            print(pricedict)

        driver.quit()


    InitializeSearch()
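To get one dictionary per product rather than a separate print for each price, one option is to read `.text` from every element (it is already a plain Python `str`) and zip the lists together. The sketch below works on plain lists of strings so that it runs without a browser. `build_records` and `save_records` are names invented for this example, the sample values are placeholders rather than scraped data, and it assumes the four lists line up one-to-one, which will not hold if some cards have no shipping fee or sale count.

```python
import json


def build_records(names, prices, sales, shipping):
    """Combine parallel lists of strings into one dictionary per product."""
    records = []
    # zip stops at the shortest list, so misaligned lists silently drop items.
    for name, price, sold, fee in zip(names, prices, sales, shipping):
        records.append({"Name": name, "Price": price, "Sales": sold, "Shipping": fee})
    return records


def save_records(records, path):
    """Write one JSON object per line so the file is easy to read back later."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")


# Placeholder values standing in for the scraped text.
records = build_records(
    ["productname", "productname2"],
    ["$56.32", "$15.00"],
    ["120 sold", "8 sold"],
    ["Free shipping", "+Shipping: $2.99"],
)
for record in records:
    print(record)
```

In the Selenium script, the inputs would be built with list comprehensions such as `[e.text for e in productname]`, and that has to happen before `driver.quit()`, since the elements are no longer readable once the browser is closed. Calling `save_records(records, "products.txt")` afterwards writes the file; the file name is only an example.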
