python snippet optimization explanation

Question

I'm attempting optimize and figure out why for input n a function I wrote f(n) appears to never finish executing when n > 26.

TL;DR I need to figure out how to find the complexity of various python instructions that involve concatenating an empty string and dictionary lookups all within a while loop. The rest appears to be O(1) or negligible. Read below for proof that I've RTFM'd extensively and need help grasping this

f(n) returns a random alphabetical string of length n:

def func(n, prefix=None): if prefix is None: empty = "" else: empty = prefix while (len(empty) < n): c = padder[random.randint(0,len(padder) -1)] if empty.find(c) == -1: str = empty+c empty = str return empty

I disassemble the Python bytecode

>>>dis.dis(func(8)) 0 BREAK_LOOP 1 POP_BLOCK 2 PRINT_NEWLINE 3 INPLACE_POWER 4 RETURN_VALUE 5 PRINT_ITEM 6 <69> 7 IMPORT_STAR

However, when n > 26 it hangs. Why is the following is faster? (note: replace x with desired length of string)

print(''.join(random.choice(string.ascii_uppercase) for i in range(x)))

When I disassemble it like above, it appears to execute in O(n).

So far, I came to the conclusion that I should break down each operation, find their complexities, and that would determine why it hangs at n > 26. I found a cheatsheet for the time complexity of common python operations

However, it doesnt cover SETUP_LOOP, LOAD_CONST vs LOAD_FAST, etc. and other python instructions

Can someone please explain why my first script is slower than the second and at least point me to some good reference material? I google dork'd

intext:"python" intext:"time complexity" intext:"instruction"

Which was light on a cheatsheet. I found a good explanation but it was somewhat hard to follow

due to lack of rep I couldnt add that I read the below link also docs.python.org/3.1/library/dis.html — grepNstepN, CommentedMar 18, 2016 at 16:34
If it's going from executing relatively quickly at n = 26 to hanging at n > 26, that's probably a bug, not a case of the runtime growing too asymptotically fast. — DylanSp, CommentedMar 18, 2016 at 16:38
dis.dis(func(8)) doesn't do what you think it does. It executed func(8) and then interprets the resulting string as bytecode. The result is meaningless and in no way related to the bytecode of func. To disassembe func, just write dis.dis(func) — user7043, CommentedMar 18, 2016 at 16:42
Are you looking for an explanation of why your script isn't working (which is explicitly off-topic here) or an explanation of why the second script is faster (which is)? — user53141, CommentedMar 18, 2016 at 16:42
The reason it hangs for N > 26 is because you are checking for unique characters from the alphabet. (Assuming padder contains the letters of the alphabet. You don't show it here. — user53141, CommentedMar 18, 2016 at 18:25

user53141user53141 · Accepted Answer · 2016-03-18 18:41:11Z

There is a significant difference in your two code snippets that causes them to perform differently.

Your first version create a string containing random unique letters from the source set. Your second version merely creates a string containing random letters from the set. So 'AABDEEE' would be generated for the second, but not the first.

So the second version's algorithm is:

while number of letters less than goal select a letter at random add it to the result

Your second is:

while number of letters less than goal do select a letter at random while letter is already in goal add it to the result

Think about what happens as you select letters, assuming the source set is 26 letters.

First choice, you will always get one that works. Second choice, there is a 1/26 chance you have to re-pick. Third choice, there is a 2/26 chance you have to re-pick. . . Twenty-fifth choice, there is a 24/26 chance you have to re-pick.

Thus if you ask for a string that is 25 characters long, with 26 choices, then it will take 13 picks to find one that works. If you do the math, that means it has to do 71 calls to random.randint to fill a 25 character string. In contrast, the second version requires only 25 calls to random.choice to generate a 25 character string (with duplicates.)

if uniqueness is a requirement, then what you really want is an algorithm that uses some sort of shuffle. And in fact, the python random module has something that does exactly what I think you are trying to do:

 ''.join(random.sample(string.ascii_uppercase, 25))

If you want to do it by hand, the algorithm is likely something like:

 s = ascii letters result = "" for i in 1 to 25 i = random int c = s[i] result += c delete s[i]

(Translation to python left as an exercise for the reader.)

Furthermore, by your reasoning, the first algorithm can't produce a longer output than the string it's picking from. Assuming padder is 26 characters long, for n > 26, the first algorithm can't possibly produce a unique choice past the 26th character. — DylanSp, CommentedMar 18, 2016 at 18:54
OOH now what you said sunk in: it'll only get a unique string with no repeating chars so its limited to < 26 (unless i go to unicode lol) thanks — grepNstepN, CommentedMar 18, 2016 at 19:58

Stack Exchange Network

python snippet optimization explanation

1 Answer 1

Hot Network Questions

python snippet optimization explanation

1 Answer 1

Related

Hot Network Questions