
On the last test, the code below takes approximately 10 seconds to download and then print the data from 10 URLs. I want to speed this up as much as possible, as I plan to expand this later and use the scraped data as live data in a GUI.

The display_value() function accounts for 95% of the runtime, which seems like an awful lot considering how little it does. I suspect it's due to how I've written the function call, but I'm out of ideas.

import datetime


def live_indices():
    """Acquire stock value from Yahoo Finance using stock symbol as key.

    Then assign the relevant variable to the respective value, i.e. 'GSPC'
    equates to the value keyed to 'GSPC' stock indices_price_value.
    """
    import sys
    import requests
    import bs4

    start_time = datetime.datetime.now()  # Use to time how long the function takes to complete
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36'}
    all_indices_values = {}
    symbols = ['GSPC', 'DJI', 'IXIC', 'FTSE', 'NSEI', 'FCHI', 'N225', 'GDAXI',
               'IMOEX.ME', '000001.SS']
    for ticker in symbols:
        url = f'https://uk.finance.yahoo.com/lookup/all?s={ticker}'
        tmp_res = requests.get(url, headers=headers)
        tmp_res.raise_for_status()
        soup = bs4.BeautifulSoup(tmp_res.text, 'html.parser')
        indices_price_value = soup.select('#Main tbody>tr td')[2].text
        all_indices_values[ticker] = indices_price_value
    end_time = datetime.datetime.now() - start_time
    sys.stdout.write(f'DONE - time taken = {end_time}'.upper())
    return all_indices_values


def display_value(live_indices):
    print(live_indices['GSPC'])
    print(live_indices['DJI'])
    print(live_indices['IXIC'])
    print(live_indices['FTSE'])
    print(live_indices['NSEI'])
    print(live_indices['FCHI'])
    print(live_indices['N225'])
    print(live_indices['GDAXI'])
    print(live_indices['IMOEX.ME'])
    print(live_indices['000001.SS'])


display_value(live_indices())
Welcome to Code Review! I changed the title so that it describes what the code does per site goals: "State what your code does in your title, not your main concerns about it." Feel free to edit and give it a different title if there is something more appropriate. — Commented Sep 21, 2022 at 16:11

1 Answer


I think your profiling is potentially misleading. Neither 10 prints nor 10 dictionary lookups should take upwards of 10 seconds. Maybe try profiling again with something like:

def display_values(live_indices):
    # SYMBOLS is the module-level list of tickers from the question.
    for key in SYMBOLS:
        print(live_indices[key])


all_indices_values = live_indices()
display_values(all_indices_values)

Now, to actually speed up the requests you will have to use parallelism. You are currently doing sequential requests, which understandably takes forever since you have to wait for each scrape to finish before starting the next one.
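Here is a minimal sketch of that using concurrent.futures.ThreadPoolExecutor from the standard library, keeping your scraping logic unchanged; fetch_price is a hypothetical helper name, and HEADERS/SYMBOLS are your constants hoisted to module level:

import concurrent.futures

import bs4
import requests

HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36'}
SYMBOLS = ['GSPC', 'DJI', 'IXIC', 'FTSE', 'NSEI', 'FCHI', 'N225', 'GDAXI',
           'IMOEX.ME', '000001.SS']


def fetch_price(ticker):
    # Same per-ticker scrape as the original, factored into a function.
    url = f'https://uk.finance.yahoo.com/lookup/all?s={ticker}'
    res = requests.get(url, headers=HEADERS)
    res.raise_for_status()
    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    return ticker, soup.select('#Main tbody>tr td')[2].text


def live_indices():
    # All ten requests run concurrently, so the total time is roughly
    # the slowest single request rather than the sum of all of them.
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        return dict(executor.map(fetch_price, SYMBOLS))

For ten I/O-bound requests, threads are usually enough; you don't need asyncio for this.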

You can probably also look for an API instead of scraping web pages for the data; that should decrease the payload by quite a bit.
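For example, the third-party yfinance package wraps Yahoo's own (unofficial, and occasionally changing) data endpoints. A sketch, assuming pip install yfinance; note that Yahoo writes index symbols with a caret prefix:

import yfinance as yf

# Index symbols on Yahoo are prefixed with '^', e.g. '^GSPC' for the S&P 500.
tickers = ['^GSPC', '^DJI', '^IXIC', '^FTSE', '^N225']
data = yf.download(tickers, period='1d', interval='1m')
print(data['Close'].tail(1))  # latest close price per index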

If you do not already know: there are modules for both profiling (cProfile) and timing (timeit) built into the standard library - you don't need to time things manually.
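A sketch of both, which would replace the manual datetime arithmetic in live_indices, assuming your two functions are defined at module level:

import cProfile
import timeit

# Per-function breakdown of where the time actually goes.
cProfile.run('display_value(live_indices())')

# Wall-clock time for one call; number=1 because every run hits the network.
print(timeit.timeit(live_indices, number=1))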

