I am not very experienced with coding, but I am creating a CustomTkinter application where a user can input a specific type of HTML file that contains diagnostic addresses and various information attributed to each address. The script parses the file and returns the selected addresses and their information as a dictionary for further use.
The code works, but the HTML files range from roughly 10,000 to 70,000 lines, and the larger ones take over a minute to process. I know there are inefficiencies in my code, so I am trying to find ways to reduce the waiting time. I suspect the biggest bottlenecks are the repeated creation of DataFrames and the nested `for` loop afterwards. I have considered creating only one DataFrame and iterating through it, but I am unsure what impact that would have.
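Before restructuring anything, it may be worth confirming where the time actually goes. Below is a minimal sketch using only the standard-library `cProfile` and `pstats` modules. It assumes the parsing code is wrapped in a function you can call directly; `profile_call` is a hypothetical helper name.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func(*args, **kwargs) under cProfile.

    Returns (func's return value, a text report of the `top` entries
    sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()
```

For example, `result, report = profile_call(parse_protocol, "large_protocol.html")` followed by `print(report)`, where `parse_protocol` stands in for whatever the real function is called. Entries with a large cumulative time (for instance `find_all_next` or `read_html`) would be the first candidates to optimize.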
How can I rewrite this to improve the runtime?
Here is the function:
```python
# Imports this code relies on
import io
import re

import pandas as pd
from bs4 import BeautifulSoup

# fv, filter_info_selections and file_name come from the rest of the application

# Clear list of previous selections
fv.info_values_to_add.clear()
# Read user's info selections
filter_info_selections()

# Open and read protocol file
with open(file_name, 'r') as file:
    contents = file.read()

# Global variable to be used in export functions
global Length_Of_Info_1
Length_Of_Info_1 = len(fv.info_values_to_add)

# Create a soup object from the protocol
parsed_protocol = BeautifulSoup(contents, "html.parser")

for address, address_value in fv.protocol_values_1.items():
    # String used to find the correct section of the HTML
    string_address = "ECU: " + address
    try:
        # Find the header for the parsed address
        table = parsed_protocol.find('p', string=re.compile(string_address))
        # Select the correct table for the information
        data_table = table.find_all_next('table')
        # Create a dataframe from the table
        data_frame = pd.read_html(io.StringIO(str(data_table)))[1]
        # Clean data frame columns and values
        df_clean = data_frame.drop(columns=2)

        # Save selected data to variables to be used
        sw_version = df_clean.iloc[1, 1]
        hw_part_number = df_clean.iloc[2, 1]
        hw_version = df_clean.iloc[3, 1]
        vehicle_vin = df_clean.iloc[20, 1]
        fazit_id = df_clean.iloc[21, 1]
        coding = df_clean.iloc[7, 1]
        vw_part_number = df_clean.iloc[0, 1]

        # List to store variables to be added to the fv.protocol_values_1 dictionary
        temp_list = []
        # Iterate through the info list and add the selected variables
        for key in fv.info_values_to_add:
            if key == "Software Version":
                temp_list.append(sw_version)
            elif key == "Hardware part number":
                temp_list.append(hw_part_number)
            elif key == "Hardware Version":
                temp_list.append(hw_version)
            elif key == "Fazit ID":
                temp_list.append(fazit_id)
            elif key == "VIN Number":
                temp_list.append(vehicle_vin)
            elif key == "Coding":
                temp_list.append(coding)
            elif key == "VW part number":
                temp_list.append(vw_part_number)

        # Add values to the address in the dictionary
        fv.protocol_values_1[address] = temp_list
    except AttributeError:
        # Assumed handler (the original except clause is not shown):
        # find() returns None when no header matches, so find_all_next fails
        continue
```
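One cost worth checking: `table.find_all_next('table')` returns every `<table>` from the header to the end of the file, and `pd.read_html` then parses all of them even though only index `[1]` is kept. Because this happens once per address, the total work grows roughly with the number of addresses times the number of tables, which would fit the larger files being disproportionately slow. Within BeautifulSoup, `table.find_all_next('table', limit=2)` stops after two tables, and `pd.read_html(io.StringIO(str(data_table[1])))[0]` then parses only the one that is needed; this is untested against the real protocol files.

The sketch below takes a different route: a single pass over the file with the standard-library `html.parser`, plus a dictionary in place of the `if`/`elif` chain. `EcuTableParser`, `FIELD_ROWS` and `extract_selected` are hypothetical names. It assumes the data table is the second `<table>` after a `<p>` containing `ECU: <address>` (mirroring `[1]` in the question), that tables are not nested and use no `colspan`/`rowspan`, and that each needed row has at least two cells. `pd.read_html` copes with more layouts than this does.

```python
from html.parser import HTMLParser

# Row indices copied from the df_clean.iloc[row, 1] calls (value = cell 1).
FIELD_ROWS = {
    "VW part number": 0,
    "Software Version": 1,
    "Hardware part number": 2,
    "Hardware Version": 3,
    "Coding": 7,
    "VIN Number": 20,
    "Fazit ID": 21,
}


class EcuTableParser(HTMLParser):
    """After each <p> containing 'ECU: ', keep rows of table #table_offset."""

    def __init__(self, table_offset=1):
        super().__init__(convert_charrefs=True)
        self.table_offset = table_offset
        self.tables = {}       # header text -> list of rows (lists of cell text)
        self._p_text = None    # text pieces of the open <p>, if any
        self._header = None    # header whose following tables are being counted
        self._tables_seen = 0
        self._target = None    # row list currently being filled
        self._row = None
        self._cell = None

    def _close_p(self):
        if self._p_text is not None:
            text = "".join(self._p_text).strip()
            self._p_text = None
            if "ECU: " in text:
                self._header = text
                self._tables_seen = 0

    def _close_cell(self):
        if self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row is not None:
            self._target.append(self._row)
            self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._close_p()
            self._p_text = []
        elif tag == "table":
            self._close_p()  # a <table> implicitly ends an unclosed <p>
            if self._header is not None:
                if self._tables_seen == self.table_offset:
                    self._target = self.tables.setdefault(self._header, [])
                self._tables_seen += 1
        elif tag == "tr" and self._target is not None:
            self._close_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_data(self, data):
        if self._p_text is not None:
            self._p_text.append(data)
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "p":
            self._close_p()
        elif tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr" and self._target is not None:
            self._close_row()
        elif tag == "table" and self._target is not None:
            self._close_row()
            self._target = None
            self._header = None  # ignore further tables until the next header


def extract_selected(html_text, addresses, selected_fields, table_offset=1):
    """Return {address: [values in selected_fields order]}; unknown fields skipped."""
    parser = EcuTableParser(table_offset)
    parser.feed(html_text)
    parser.close()
    result = {}
    for address in addresses:
        needle = "ECU: " + address
        rows = next((r for h, r in parser.tables.items() if needle in h), None)
        if rows is None:
            continue  # header not found; handle this however the app needs
        result[address] = [rows[FIELD_ROWS[f]][1]
                           for f in selected_fields if f in FIELD_ROWS]
    return result
```

In the application this could be called as `fv.protocol_values_1.update(extract_selected(contents, list(fv.protocol_values_1), fv.info_values_to_add))`, which avoids building a BeautifulSoup tree and any DataFrames. If you keep BeautifulSoup instead, passing `"lxml"` rather than `"html.parser"` is usually faster, provided the `lxml` package is installed.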