Can anyone help me optimise these three functions? I profiled and timed my original Python file and found that most of the calls and most of the running time came from these three functions.
The three functions are part of a text normaliser used for text processing. The full Python file is available if anyone wants to look at the whole script. Thank you.
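For reference, the kind of per-function call/time breakdown I mean can be produced with cProfile; here is a minimal sketch, where the profiled function and sample data are placeholders rather than anything from the real normaliser:

    import cProfile
    import pstats

    def strip_token(token):
        # Placeholder for one of the hot functions being profiled.
        return token.strip(',.;!?:"\'')

    def run():
        for _ in range(100000):
            strip_token('"hello,')

    # Dump the stats to a file, then print the ten most expensive calls.
    cProfile.run('run()', 'stats.out')
    pstats.Stats('stats.out').sort_stats('cumulative').print_stats(10)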
    def __rstrip(self, token):
        for i in range(5):
            if len(token):
                if token[-1] in [',', '.', ';', '!', '?', ':', '"']:
                    token = token[:-1]
                else:
                    break
        return token

    def __lstrip(self, token):
        for i in range(5):
            if len(token):
                if token[0] in [',', '.', ';', '!', '?', ':', '"', '\'']:
                    token = token[1:]
                else:
                    break
        return token

    def __generate_results(self, original, normalised):
        words = []
        for t in normalised:
            if len(t[0]):
                words.append(t[0])

        text = ' '.join(words)

        tokens = []
        if len(original):
            for t in original:
                idx = t[1]
                words = []
                for t2 in normalised:
                    if idx == t2[1]:
                        words.append(t2[0])

                display_text = self.__rstrip(t[0])
                display_text = self.__lstrip(display_text)

                tokens.append((t[0], words, display_text))
        else:
            tokens.append(('', '', ''))

        return text, tokens
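To show what the two strip helpers do in isolation, here is a standalone sketch (module-level copies without self; the behaviour is intended to match the methods above, but the function names are mine):

    # Each helper removes at most 5 punctuation characters from one end of the token.
    def rstrip_token(token):
        for _ in range(5):
            if token and token[-1] in ',.;!?:"':
                token = token[:-1]
            else:
                break
        return token

    def lstrip_token(token):
        for _ in range(5):
            if token and token[0] in ',.;!?:"\'':
                token = token[1:]
            else:
                break
        return token

    print(rstrip_token('word,,,,,,'))  # 'word,'  (stops after removing 5 characters)
    print(lstrip_token('"hello'))      # 'hello'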
Comments from other users:

What is the purpose of the __lstrip and __rstrip functions? Is the goal of these functions to remove the first/last 5 symbols of a token? Because it seems like a weird use case.

How long is token? If these sequences are large, your current stripping methods could be quite inefficient (each slice creates a new object); but if they are tiny, that might not be a major issue. Or, how large is normalised and what type of collection is it? If it's a large list/tuple, repeated membership checks might be costly, but if it's small, or if it's a dict/set, then perhaps that's not the source of trouble. Currently, your question is too abstract.
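For example, if normalised is a large list of (word, index)-style tuples, the nested loop in __generate_results rescans it once for every original token; a sketch of grouping it by index in a single pass instead (the helper name and sample data here are illustrative, not from the original script):

    from collections import defaultdict

    def group_by_index(normalised):
        # One pass: map each index to the list of words that share it,
        # so later lookups are a dict access instead of a scan over the list.
        grouped = defaultdict(list)
        for item in normalised:
            grouped[item[1]].append(item[0])
        return grouped

    # Made-up data in the (word, idx) shape the question's code appears to use:
    sample = [('hello', 0), ('world', 1), ('again', 1)]
    print(group_by_index(sample)[1])  # ['world', 'again']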