0

I'm attempting optimize and figure out why for input n a function I wrote f(n) appears to never finish executing when n > 26.

TL;DR I need to figure out how to find the complexity of various python instructions that involve concatenating an empty string and dictionary lookups all within a while loop. The rest appears to be O(1) or negligible. Read below for proof that I've RTFM'd extensively and need help grasping this


f(n) returns a random alphabetical string of length n:

def func(n, prefix=None): if prefix is None: empty = "" else: empty = prefix while (len(empty) < n): c = padder[random.randint(0,len(padder) -1)] if empty.find(c) == -1: str = empty+c empty = str return empty 

I disassemble the Python bytecode

>>>dis.dis(func(8)) 0 BREAK_LOOP 1 POP_BLOCK 2 PRINT_NEWLINE 3 INPLACE_POWER 4 RETURN_VALUE 5 PRINT_ITEM 6 <69> 7 IMPORT_STAR 

However, when n > 26 it hangs. Why is the following is faster? (note: replace x with desired length of string)

print(''.join(random.choice(string.ascii_uppercase) for i in range(x))) 

When I disassemble it like above, it appears to execute in O(n).

So far, I came to the conclusion that I should break down each operation, find their complexities, and that would determine why it hangs at n > 26. I found a cheatsheet for the time complexity of common python operations

However, it doesnt cover SETUP_LOOP, LOAD_CONST vs LOAD_FAST, etc. and other python instructions

Can someone please explain why my first script is slower than the second and at least point me to some good reference material? I google dork'd

intext:"python" intext:"time complexity" intext:"instruction" 

Which was light on a cheatsheet. I found a good explanation but it was somewhat hard to follow

6
  • due to lack of rep I couldnt add that I read the below link also docs.python.org/3.1/library/dis.htmlCommentedMar 18, 2016 at 16:34
  • 1
    If it's going from executing relatively quickly at n = 26 to hanging at n > 26, that's probably a bug, not a case of the runtime growing too asymptotically fast.
    – DylanSp
    CommentedMar 18, 2016 at 16:38
  • dis.dis(func(8)) doesn't do what you think it does. It executed func(8) and then interprets the resulting string as bytecode. The result is meaningless and in no way related to the bytecode of func. To disassembe func, just write dis.dis(func)
    – user7043
    CommentedMar 18, 2016 at 16:42
  • 1
    Are you looking for an explanation of why your script isn't working (which is explicitly off-topic here) or an explanation of why the second script is faster (which is)?
    – user53141
    CommentedMar 18, 2016 at 16:42
  • 2
    The reason it hangs for N > 26 is because you are checking for unique characters from the alphabet. (Assuming padder contains the letters of the alphabet. You don't show it here.
    – user53141
    CommentedMar 18, 2016 at 18:25

1 Answer 1

1

There is a significant difference in your two code snippets that causes them to perform differently.

Your first version create a string containing random unique letters from the source set. Your second version merely creates a string containing random letters from the set. So 'AABDEEE' would be generated for the second, but not the first.

So the second version's algorithm is:

while number of letters less than goal select a letter at random add it to the result 

Your second is:

while number of letters less than goal do select a letter at random while letter is already in goal add it to the result 

Think about what happens as you select letters, assuming the source set is 26 letters.

First choice, you will always get one that works. Second choice, there is a 1/26 chance you have to re-pick. Third choice, there is a 2/26 chance you have to re-pick. . . Twenty-fifth choice, there is a 24/26 chance you have to re-pick.

Thus if you ask for a string that is 25 characters long, with 26 choices, then it will take 13 picks to find one that works. If you do the math, that means it has to do 71 calls to random.randint to fill a 25 character string. In contrast, the second version requires only 25 calls to random.choice to generate a 25 character string (with duplicates.)

if uniqueness is a requirement, then what you really want is an algorithm that uses some sort of shuffle. And in fact, the python random module has something that does exactly what I think you are trying to do:

 ''.join(random.sample(string.ascii_uppercase, 25)) 

If you want to do it by hand, the algorithm is likely something like:

 s = ascii letters result = "" for i in 1 to 25 i = random int c = s[i] result += c delete s[i] 

(Translation to python left as an exercise for the reader.)

2
  • Furthermore, by your reasoning, the first algorithm can't produce a longer output than the string it's picking from. Assuming padder is 26 characters long, for n > 26, the first algorithm can't possibly produce a unique choice past the 26th character.
    – DylanSp
    CommentedMar 18, 2016 at 18:54
  • OOH now what you said sunk in: it'll only get a unique string with no repeating chars so its limited to < 26 (unless i go to unicode lol) thanksCommentedMar 18, 2016 at 19:58

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.