I've written a Python script to grab the different quote categories from a web page. I used "grequests" in my scraper so it could issue asynchronous HTTP requests and finish quickly. The scraper runs flawlessly and collects the data it should. However, when it comes to performance, I'm not sure it is optimal. Any suggestions to make it better would be highly appreciated.
import grequests
from lxml import html

main_link = "http://quotes.toscrape.com/"

def toscrape_scraper(item_link):
    storage = [item_link]  # Depositing link as a list
    response = (grequests.get(req) for req in storage)
    for req in grequests.map(response):  # Sending requests
        tree = html.fromstring(req.text)
        for titles in tree.cssselect("span.tag-item a.tag"):
            grabbing_docs(main_link + titles.attrib['href'])

def grabbing_docs(base_link):
    vault = [base_link]  # Storing links as a list
    res = (grequests.get(req) for req in vault)
    for hreq in grequests.map(res):  # Sending requests
        root = html.fromstring(hreq.text)
        for soups in root.cssselect("div.quote"):
            quote = soups.cssselect("span.text")[0].text
            author = soups.cssselect("small.author")[0].text
            print(quote, author)
        next_page = root.cssselect("li.next a")[0].attrib['href'] if root.cssselect("li.next a") else ""
        if next_page:
            page_link = main_link + next_page
            grabbing_docs(page_link)  # Reusing the newly collected paginated links

toscrape_scraper(main_link)
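One thing worth noting about performance: both functions wrap a single URL in a one-element list before calling `grequests.map`, so only one request is ever in flight at a time and the "asynchronous" machinery buys nothing. To actually benefit, collect every URL first and map the whole batch in one call. A minimal sketch of that restructuring is below; `build_tag_urls` and `fetch_all` are hypothetical helper names I've introduced for illustration, not part of the original script:

```python
from urllib.parse import urljoin

MAIN_LINK = "http://quotes.toscrape.com/"

def build_tag_urls(hrefs, base=MAIN_LINK):
    """Turn a batch of hrefs into absolute URLs.

    urljoin handles both relative hrefs ("/tag/love/") and already-absolute
    ones correctly, unlike the plain string concatenation in the original
    (main_link + href can produce a double slash).
    """
    return [urljoin(base, h) for h in hrefs]

def fetch_all(urls):
    """Fetch a whole batch of URLs concurrently.

    A single grequests.map call over the full batch is what actually gives
    concurrency; mapping one-element lists fetches serially.
    """
    import grequests  # imported lazily so the pure helper above stays testable
    return grequests.map(grequests.get(u) for u in urls)
```

With this shape, `toscrape_scraper` would first gather every tag href from the front page, then hand the whole list of absolute URLs to `fetch_all` at once instead of calling `grabbing_docs` per link.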
Comment: Could `grabbing_docs(link)` be called more than once with the same link in the current implementation? If that were to happen, you could enter a cycle.
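The cycle concern raised here is usually handled with a visited set: record every URL before fetching it and skip any URL seen before. A minimal, network-free sketch of the idea; `get_next_urls` is a hypothetical callable standing in for the real fetch-and-parse step (grequests plus lxml in the scraper):

```python
from collections import deque

def crawl(start_url, get_next_urls):
    """Breadth-first traversal that never visits the same URL twice.

    get_next_urls(url) is assumed to return the list of links found on
    that page; injecting it keeps the traversal logic testable offline.
    """
    visited = set()
    order = []                      # URLs in the order they were fetched
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in visited:
            continue                # already fetched: this breaks any cycle
        visited.add(url)
        order.append(url)
        queue.extend(get_next_urls(url))
    return order
```

Even if two pages link to each other, each URL is fetched exactly once, so the recursion-induced cycle the comment warns about cannot occur.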