I'm trying to do some web scraping for educational purposes. I just started and am fairly new to Python.
In Selenium, I am trying to scrape a product page: take each product's name, price, shipping price, and sale count, and put them all into a dictionary to be written to a text file for further use.
My problem is that on this website there are 60 items per page, and the price is split into 4 parts: "$", "56", ".", and "32" (cents). So when I use the loop, it either gives me "16" as the price, or it gives me the individual product names with a price for each one like this:
Name: productname, Price: 15
Name: productname2, Price: "."
So it's splitting each price into separate values.
import sys
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time

def InitializeSearch():
    productnnn = []
    productppp = []
    driver = webdriver.Firefox()
    driver.get("https://www.aliexpress.us/w/wholesale-legos.html?spm=a2g0o.detail.search.0")
    driver.maximize_window()
    driver.execute_script("window.scrollTo(0, 1000)")
    time.sleep(3)
    driver.execute_script("window.scrollTo(0, 2000)")
    time.sleep(3)
    driver.execute_script("window.scrollTo(0, 3000)")
    time.sleep(3)
    driver.execute_script("window.scrollTo(0, 4000)")
    time.sleep(3)
    driver.execute_script("window.scrollTo(0, 4500)")
    time.sleep(3)
    productname = driver.find_elements(By.XPATH, "//h3[@class='kc_j0']")  ##text is questionable
    productprice = driver.find_elements(By.XPATH, "//span[@style='font-size:20px;decimal_point:.;comma_style:,;currency-symbol:$;show-decimal:true;symbol_position:left']")
    productsalecount = driver.find_elements(By.XPATH, "//span[@class='kc_jv']")
    productshippingfee = driver.find_elements(By.XPATH, "//span[@class='ml_a1 ml_mn']")
    #####This is where the code needs to go#####

InitializeSearch()
The line marked with ##### above is where I was putting the code. I have tried quite a few different methods, including:
for n in productprice:
    pricedict = {}
    pricedict["Price"] = (n.text)  ###the .text is required as driver returns a web elem
    print(pricedict)
I also tried nesting this inside an identical loop for the product name, and nesting them all inside another loop that counts through productname.
So basically, how do I take the driver elements I have here, cycle through all 60 of them, and put everything into a dictionary that I can later append to a text file, even though the price is split into 4 parts?
Sidenote: the driver returns a web element. I can call encode(errors=ignore) and get a bytes object (the string starting with b'), but when I decode it, it turns back into a web element unless I add ascii(encodedstring.decode(errors=ignore)).
How do I convert this in Selenium to a regular string object and not a web element?
TL;DR: How do I make the results of my driver's find_elements calls combine cleanly into one dictionary per item?
for n[0:3] in productprice:
    dictionary["price"] = (n.text)
I am expecting the values from the HTML/JavaScript to be laid out cleanly in a dictionary for each individual item.
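For the split-price symptom specifically, one possible workaround is to glue the fragment texts back together and then cut the result into whole prices. This is only a minimal sketch. It assumes that each fragment's .text has already been collected into a list of plain strings in page order and that every price starts with "$". The names join_price_fragments and PRICE_RE are made up for this example.

```python
import re

# Assumed price shape: "$", digits (optionally with commas), optional ".cents".
PRICE_RE = re.compile(r"\$\d[\d,]*(?:\.\d+)?")


def join_price_fragments(fragments):
    """Turn ['$', '56', '.', '32', ...] into ['$56.32', ...]."""
    # Strip stray whitespace from each fragment, then concatenate in order.
    joined = "".join(part.strip() for part in fragments)
    # Each "$" starts a new price, so the pattern splits the run back up.
    return PRICE_RE.findall(joined)


print(join_price_fragments(["$", "56", ".", "32", "$", "15", ".", "99"]))
# → ['$56.32', '$15.99']
```

With the lists from the question this could be called as join_price_fragments([e.text for e in productprice]). It only lines up with the product names if every product contributes exactly one price, so locating one parent element per product and reading the price inside it is usually the more robust route.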
Selenium returns a web element so that you can use find_element or find_elements to search for details inside this element, run JavaScript on this element (for example by passing it to driver.execute_script), or send keyboard/mouse events to this element. This lets you first find the rows in a table and later search for the values in the cells/columns of each row.
If you use find_elements (with the character s at the end), it should give a list with all matching elements (or a list with one element, or an empty list), and you may need a for-loop to work with every element separately and get the text or another value from each element. If you need only the first element, you can use find_element (without the character s).
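The "find the row first, then search inside it" idea could look roughly like the sketch below. It is not a tested AliExpress scraper: scrape_cards and FIELDS are names invented here, the class names are copied from the question only for illustration (generated class names like these tend to change), and the price locator is a purely hypothetical placeholder. The string "xpath" is the value of Selenium's By.XPATH, used directly so that the sketch needs no imports.

```python
XPATH = "xpath"  # same string as selenium.webdriver.common.by.By.XPATH

# Relative locators: the leading "." keeps each search inside one product card.
# All of these are assumptions about the page, not verified selectors.
FIELDS = {
    "Name": ".//h3[@class='kc_j0']",
    "Price": ".//div[@class='price-box']",  # hypothetical parent of the price parts
    "Sales": ".//span[@class='kc_jv']",
    "Shipping": ".//span[@class='ml_a1 ml_mn']",
}


def scrape_cards(cards, fields=FIELDS):
    """Build one dict per product card from a list of web elements."""
    records = []
    for card in cards:
        record = {}
        for key, locator in fields.items():
            # find_elements returns an empty list when nothing matches,
            # so a missing field (e.g. no shipping line) does not raise.
            matches = card.find_elements(XPATH, locator)
            # .text is already a regular Python str; no encode/decode is needed.
            record[key] = matches[0].text.strip() if matches else None
        records.append(record)
    return records
```

Here cards would be whatever driver.find_elements returns for a locator that matches one whole product tile. Reading .text on a parent element generally yields the visible text of its children together, which is why pointing the price locator at the container of the four price parts, rather than at the parts themselves, may give the whole price in one string.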