
I am not very experienced with coding, but I am creating a customtkinter application where a user can input a specific type of HTML file that contains diagnostic addresses and various information attributed to each address. The script parses the file and returns the selected addresses and information as a dictionary for further use.

The code works, but the HTML files range from ~10,000 to ~70,000 lines, and the larger ones take over a minute to process. I know there are inefficiencies in my code, so I am trying to reduce the waiting time while the script runs. I figure my biggest bottlenecks are the repeated creation of DataFrames and the nested for loop afterwards. I have considered creating only one DataFrame and iterating through it, but I am unsure of the impact it would make.
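If the goal is to avoid repeating work per address, one option is to walk the parsed document once and record, for every ECU header, the table that belongs to it; the per-address step then becomes a dictionary lookup. This is a minimal sketch, not the asker's code: `index_ecu_tables` and `HEADER_RE` are hypothetical names, and it assumes each header is a `<p>` whose text looks like `ECU: <address> ...` (the address being the first token after `ECU: `) and that the wanted table is the second `<table>` after the header, mirroring the `[1]` index in the question's code.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical header format: "<p>ECU: <address> ...</p>"; group 1 captures the address
HEADER_RE = re.compile(r"ECU:\s*(\S+)")


def index_ecu_tables(soup, table_offset=1):
    """Return {address: <table> Tag}, built in a single pass over the ECU headers.

    table_offset=1 picks the second table after each header (an assumption
    mirroring pd.read_html(...)[1] in the question).
    """
    index = {}
    for header in soup.find_all("p", string=HEADER_RE):
        match = HEADER_RE.search(header.get_text())
        # limit stops the search once enough tables are found, instead of
        # collecting every table up to the end of the document
        tables = header.find_all_next("table", limit=table_offset + 1)
        if match and len(tables) > table_offset:
            index[match.group(1)] = tables[table_offset]
    return index


html = """
<p>ECU: 0001 Engine</p><table><tr><td>a</td></tr></table><table><tr><td>engine data</td></tr></table>
<p>ECU: 0002 Gearbox</p><table><tr><td>b</td></tr></table><table><tr><td>gearbox data</td></tr></table>
"""
tables = index_ecu_tables(BeautifulSoup(html, "html.parser"))
print(tables["0002"].get_text())  # gearbox data
```

Inside the loop over addresses, `tables.get(address)` would then stand in for the per-address `find`/`find_all_next` calls, returning `None` for an address without a header. Note that it matches the address as an exact token, whereas the question's regex matches any header that merely contains `ECU: <address>`.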

How can I write this in a way that improves the runtime?

Here is the function:

```python
# Clear list of previous selections
fv.info_values_to_add.clear()
# Read user's info selections
filter_info_selections()
# Open and read protocol file
with open(file_name, 'r') as file:
    contents = file.read()
# Global variable to used in export functions
global Length_Of_Info_1
Length_Of_Info_1 = len(fv.info_values_to_add)
# Create a soup object from the protocol
parsed_protocol = BeautifulSoup(contents, "html.parser")
for address, address_value in fv.protocol_values_1.items():
    # String to be used to find the correct section of the html
    string_address = "ECU: " + (address)
    try:
        # Find the header for the parsed address
        table = parsed_protocol.find('p', string = re.compile(string_address))
        # Select the correct table for the information
        data_table = table.find_all_next('table')
        # Create a dataframe from the table
        data_frame = pd.read_html(io.StringIO(str(data_table)))[1]
        # Clean data frame columns and values
        df_clean = data_frame.drop(columns=2, axis=1)
        # Save selected data to variables to be used
        sw_version = df_clean.iloc[1,1]
        hw_part_number = df_clean.iloc[2,1]
        hw_version = df_clean.iloc[3,1]
        vehicle_vin = df_clean.iloc[20,1]
        fazit_id = df_clean.iloc[21,1]
        coding = df_clean.iloc[7,1]
        vw_part_number = df_clean.iloc[0,1]
        # List to store variables to be added to the fv.protocol_values_1 dictionary
        temp_list = []
        # Iterate through the info list and add the selected variables
        for key in fv.info_values_to_add:
            if key == "Software Version":
                temp_list.append(sw_version)
            elif key == "Hardware part number":
                temp_list.append(hw_part_number)
            elif key == "Hardware Version":
                temp_list.append(hw_version)
            elif key == "Fazit ID":
                temp_list.append(fazit_id)
            elif key == "VIN Number":
                temp_list.append(vehicle_vin)
            elif key == "Coding":
                temp_list.append(coding)
            elif key == "VW part number":
                temp_list.append(vw_part_number)
            else:
                pass
        # Add values to the address in the dictionary
        fv.protocol_values_1[address] = temp_list
```
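Before restructuring, it can help to time the suspected hot spot in isolation. This sketch builds a made-up document with many header/table pairs and times the question's `find_all_next('table')` pattern against chaining `find_next('table')`, assuming tables are not nested; `build_doc` and the two lookup functions are hypothetical. Without a `limit`, `find_all_next` scans to the end of the document for every header, so the total work grows roughly with the square of the number of sections. Timings depend on the machine, so none are shown.

```python
import timeit

from bs4 import BeautifulSoup


def build_doc(sections):
    """Synthetic document: one header paragraph followed by two small tables per section."""
    parts = []
    for i in range(sections):
        parts.append(f"<p>ECU: {i:04d}</p>")
        parts.append("<table><tr><td>summary</td></tr></table>")
        parts.append(f"<table><tr><td>data {i}</td></tr></table>")
    return "".join(parts)


def second_table_all_next(header):
    # Question's pattern: collect every later table, then pick index 1
    return header.find_all_next("table")[1]


def second_table_find_next(header):
    # Stops as soon as the second table is reached (assumes no nested tables)
    return header.find_next("table").find_next("table")


soup = BeautifulSoup(build_doc(300), "html.parser")
headers = soup.find_all("p")

for func in (second_table_all_next, second_table_find_next):
    seconds = timeit.timeit(lambda: [func(h) for h in headers], number=1)
    print(f"{func.__name__}: {seconds:.3f} s")
```

This only measures the search. For each address, the question's code additionally serializes every one of those later tables with `str()` and re-parses them all with `pd.read_html`, which multiplies the cost further.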
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – CommunityBot, Apr 16 at 9:00

1 Answer


In case anybody reads this: through some more googling and AI help, I found `line_profiler`, used it to benchmark the function, and was able to identify the bottlenecks. Most of the time was spent in `find_all_next`, so I reworked how the table is located to use `find_next` instead. I also switched the BeautifulSoup parser from `"html.parser"` to `"lxml"`. Finally, I rewrote the code for the DataFrame variables as:

```python
# Extract values once using a dictionary for clarity
value_map = {
    "Software Version": df_clean.iloc[1, 1],
    "Hardware part number": df_clean.iloc[2, 1],
    "Hardware Version": df_clean.iloc[3, 1],
    "VIN Number": df_clean.iloc[20, 1],
    "Fazit ID": df_clean.iloc[21, 1],
    "Coding": df_clean.iloc[7, 1],
    "VW part number": df_clean.iloc[0, 1],
}
# Use list comprehension for speed and readability
temp_list = [value_map[key] for key in fv.info_values_to_add if key in value_map]
```
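A small variation on the dictionary approach stores only the row positions, so the label-to-row mapping lives in one place and only the selected cells are read from the DataFrame. The row numbers come from the `iloc` calls in the code; `ROW_FOR_LABEL`, `select_values`, and the 22-row sample table are hypothetical.

```python
import pandas as pd

# Row positions taken from the iloc calls; column 1 holds the value
ROW_FOR_LABEL = {
    "VW part number": 0,
    "Software Version": 1,
    "Hardware part number": 2,
    "Hardware Version": 3,
    "Coding": 7,
    "VIN Number": 20,
    "Fazit ID": 21,
}


def select_values(df_clean, selected_labels):
    """Return the values for the selected labels in the user's order, skipping unknown labels."""
    return [
        df_clean.iloc[ROW_FOR_LABEL[label], 1]
        for label in selected_labels
        if label in ROW_FOR_LABEL
    ]


# Hypothetical 22-row table: column 0 = field name, column 1 = value
sample = pd.DataFrame(
    {0: [f"field {i}" for i in range(22)], 1: [f"value {i}" for i in range(22)]}
)
print(select_values(sample, ["Coding", "Unknown", "Software Version"]))
# ['value 7', 'value 1']
```

The order of the result follows `fv.info_values_to_add`, as in the original comprehension, so the export code that relies on `Length_Of_Info_1` sees the same layout.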

These changes brought the runtime for the larger HTMLs down from over a minute to ~10 seconds.
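The `find_next` and `lxml` changes can be sketched roughly as follows. This is an illustration under assumptions, not the code from this answer: `read_ecu_table` is a hypothetical helper, the wanted data is assumed to be the second `<table>` after the header (matching the question's `[1]`) with no nested tables, the `lxml` parser is used only when that package is installed, and the DataFrame is built from cell text instead of `pd.read_html`, which keeps the example free of extra dependencies but may not treat `colspan`/`rowspan` the same way.

```python
import re

import pandas as pd
from bs4 import BeautifulSoup

try:
    import lxml  # noqa: F401  (only checking that the faster parser is available)
    PARSER = "lxml"
except ImportError:
    PARSER = "html.parser"


def read_ecu_table(soup, address):
    """Return the second table after the 'ECU: <address>' header as a DataFrame, or None."""
    # re.escape guards against regex metacharacters in the address
    header = soup.find("p", string=re.compile("ECU: " + re.escape(address)))
    if header is None:
        return None
    # find_next stops at the first match instead of collecting every later table
    first_table = header.find_next("table")
    data_table = first_table.find_next("table") if first_table else None
    if data_table is None:
        return None
    rows = [
        [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        for tr in data_table.find_all("tr")
    ]
    return pd.DataFrame(rows)


html = (
    "<p>ECU: 0001</p>"
    "<table><tr><td>summary</td></tr></table>"
    "<table><tr><td>Software Version</td><td>0042</td></tr>"
    "<tr><td>Hardware Version</td><td>H07</td></tr></table>"
)
soup = BeautifulSoup(html, PARSER)  # build once, outside any per-address loop
df = read_ecu_table(soup, "0001")
print(df.iloc[0, 1])  # 0042
```

The soup is still built once; only the header search and the short `find_next` walk run per address.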
