
I'm trying to understand how to work with aiohttp and asyncio. The code below retrieves all websites in urls and prints out the "size" of each response.

  • Is the error handling within the fetch method correct?
  • Is it possible to remove the result of a specific url from results in case of an exception - making return (url, '') unnecessary?
  • Is there a better way than ssl=False to deal with a potential ssl.SSLCertVerificationError? (One possible approach is sketched after the code below.)
  • Any additional advice on how I can improve my code quality is highly appreciated.
    import asyncio
    import aiohttp


    async def fetch(session, url):
        try:
            async with session.get(url, ssl=False) as response:
                return url, await response.text()
        except aiohttp.client_exceptions.ClientConnectorError as e:
            print(e)
            return (url, '')


    async def main():
        tasks = []
        urls = [
            'http://www.python.org',
            'http://www.jython.org',
            'http://www.pypy.org'
        ]
        async with aiohttp.ClientSession() as session:
            while urls:
                tasks.append(fetch(session, urls.pop()))
            results = await asyncio.gather(*tasks)
            [print(f'{url}: {len(result)}') for url, result in results]


    if __name__ == '__main__':
        loop = asyncio.get_event_loop()
        loop.run_until_complete(main())
        loop.close()
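On the ssl=False question: rather than disabling verification entirely, one option is to pass an explicit SSLContext built from a trusted CA bundle. A minimal sketch, assuming the third-party certifi package is installed (any CA bundle path would work in its place):

    import ssl

    import aiohttp
    import certifi

    # Build the context once and reuse it; certifi ships Mozilla's CA
    # bundle, so verification can succeed without being turned off.
    SSL_CONTEXT = ssl.create_default_context(cafile=certifi.where())


    async def fetch(session, url):
        async with session.get(url, ssl=SSL_CONTEXT) as response:
            return url, await response.text()

This keeps certificate checking enabled while working around environments whose system CA store is missing or outdated.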

    Update

    • Is there a way to add tasks to the list from within the "loop", e.g. to add new URLs while scraping a website and finding new subdomains to scrape? (A possible approach is sketched below.)
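One common pattern for this is to replace the fixed task list with an asyncio.Queue that a small pool of workers consumes, so newly discovered URLs can be enqueued mid-crawl. A minimal sketch; the extract_links helper is hypothetical and only marks where discovered URLs would be enqueued:

    import asyncio
    import aiohttp


    async def worker(session, queue, results):
        while True:
            url = await queue.get()
            try:
                async with session.get(url, ssl=False) as response:
                    page = await response.text()
                results[url] = len(page)
                # Hypothetical helper: enqueue URLs found on this page.
                # for link in extract_links(page):
                #     queue.put_nowait(link)
            except aiohttp.ClientError as e:
                print(e)
            finally:
                queue.task_done()


    async def main():
        queue = asyncio.Queue()
        for url in ['http://www.python.org', 'http://www.jython.org']:
            queue.put_nowait(url)
        results = {}
        async with aiohttp.ClientSession() as session:
            workers = [asyncio.ensure_future(worker(session, queue, results))
                       for _ in range(3)]
            await queue.join()   # done once every enqueued URL is processed
            for w in workers:    # workers loop forever, so stop them here
                w.cancel()
            await asyncio.gather(*workers, return_exceptions=True)
        for url, size in results.items():
            print(f'{url}: {size}')

queue.join() only returns once task_done() has been called for every enqueued item, so "the crawl is finished" stays well defined even while workers keep adding URLs. It runs with the same loop.run_until_complete(main()) entry point as above.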

      1 Answer

      tasks = []
      while urls:
          tasks.append(fetch(session, urls.pop()))

      can be largely simplified to

      tasks = [fetch(session, url) for url in urls] 
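      This version also leaves urls untouched, whereas the while/urls.pop() loop empties the list and creates the tasks in reverse order.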

      Is it possible to remove the result of a specific url from results in case of an exception - making return (url, '') unnecessary?

      Yes, somewhat. asyncio.gather accepts a return_exceptions parameter. Set it to True so that a single exception does not fail the whole gather call; the exceptions come back as part of the results, and you filter them out afterwards:

      import asyncio
      import aiohttp


      async def fetch(session, url):
          async with session.get(url, ssl=False) as response:
              return await response.text()


      async def main():
          urls = [
              'http://www.python.org',
              'http://www.jython.org',
              'http://www.pypy.org'
          ]
          async with aiohttp.ClientSession() as session:
              tasks = [fetch(session, url) for url in urls]
              # return_exceptions=True delivers failures as exception
              # objects in the results instead of raising here
              results = await asyncio.gather(*tasks, return_exceptions=True)
              for url, result in zip(urls, results):
                  if not isinstance(result, Exception):
                      print(f'{url}: {len(result)}')
                  else:
                      print(f'{url} FAILED')


      if __name__ == '__main__':
          loop = asyncio.get_event_loop()
          loop.run_until_complete(main())
          loop.close()
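      As a side note on code quality: since Python 3.7, the manual loop handling at the bottom can be replaced by asyncio.run, which creates, runs, and closes the event loop for you:

      if __name__ == '__main__':
          asyncio.run(main())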
      • Not sure if your code example is meant to catch exceptions or if you just wanted to point out how to deal with exceptions in the results list. I only manage to get working code if I put try... except within the fetch method. If I return e in the except block, your code works. – Commented Jul 24, 2018 at 19:46
