11
\$\begingroup\$

I took my code from this answer by me, I found it useful, so I developed it further, and here is the result.

This script generates cryptographically secure passwords. The characters used by the passwords are chosen from the 94 printable ASCII characters that aren't blank characters or control characters.

The characters are split into 4 categories: 26 lowercase letters, 26 UPPERCASE letters, 10 digits and 32 punctuations. Each password must contain UPPERCASE letters, lowercase letters, and digits, with optional punctuations.

The length of the passwords is 8 to 94 inclusive if special characters are allowed, or 8 to 62 if they aren't. The passwords can contain unique characters only and also duplicates, whether passwords will contain duplicate characters is controlled by an optional argument. And the count of characters from each of the categories in the passwords are approximately the same.


Code

import random import secrets from functools import reduce from operator import iconcat from string import ascii_lowercase, ascii_uppercase, digits, punctuation from typing import Generator, List CHARSETS = ( (ascii_lowercase, ascii_uppercase, digits), (ascii_lowercase, ascii_uppercase, punctuation, digits) ) COUNTS = ( ((3, 36), (2, 10)), ((4, 68), (3, 42), (2, 10)) ) def choose_charset(charset: str, number: int, unique: bool = False) -> List[str]: if not unique: return [secrets.choice(charset) for _ in range(number)] charset = list(charset) choices = [] for _ in range(number): choice = secrets.choice(charset) choices.append(choice) charset.remove(choice) return choices def split(number: int, symbol: bool = False) -> Generator[int, None, None]: for a, b in COUNTS[symbol]: n = max( 2 + secrets.randbelow( number // a - 1 ), number - b ) number -= n yield n if number > 10: raise ValueError('remaining number is too big') yield number def generate_password(length: int, unique: bool = False, symbol: bool = True) -> str: if not 8 <= length <= (62, 94)[symbol]: raise ValueError( 'argument `length` should be an `int` between 8 and 94, or between 8 and 62 if symbols are not allowed') password = reduce( iconcat, map( choose_charset, CHARSETS[symbol], split(length, symbol), [unique]*4 ), [] ) random.shuffle(password) return ''.join(password) if __name__ == '__main__': import argparse parser = argparse.ArgumentParser( prog='Password_Generator', description='This program generates cryptographically secure passwords.' ) parser.add_argument( 'length', type=int, help='length of the passwords to be generated' ) unique_parser = parser.add_mutually_exclusive_group(required=False) unique_parser.add_argument( '-U', '--unique', dest='unique', action='store_true', help='specifies passwords should contain unique characters only' ) unique_parser.add_argument( '-NU', '--no-unique', dest='unique', action='store_false', help='specifies passwords should not contain unique characters only' ) parser.set_defaults(unique=False) symbol_parser = parser.add_mutually_exclusive_group(required=False) symbol_parser.add_argument( '-S', '--symbol', dest='symbol', action='store_true', help='specifies passwords should contain special characters' ) symbol_parser.add_argument( '-NS', '--no-symbol', dest='symbol', action='store_false', help='specifies passwords should not contain special characters' ) parser.set_defaults(symbol=True) parser.add_argument( '-C', '--count', type=int, default=1, help='specifies the number of passwords to generate, default 1' ) namespace = parser.parse_args() length, unique, symbol, count = [ getattr(namespace, name) for name in ('length', 'unique', 'symbol', 'count')] for _ in range(count): print(generate_password(length, unique, symbol)) 

Help message

PS C:\Users\Xeni> D:\MyScript\password_generator.py -h usage: Password_Generator [-h] [-U | -NU] [-S | -NS] [-C COUNT] length This program generates cryptographically secure passwords. positional arguments: length length of the passwords to be generated options: -h, --help show this help message and exit -U, --unique specifies passwords should contain unique characters only -NU, --no-unique specifies passwords should not contain unique characters only -S, --symbol specifies passwords should contain special characters -NS, --no-symbol specifies passwords should not contain special characters -C COUNT, --count COUNT specifies the number of passwords to generate, default 1 

Example usage

PS C:\Users\Xeni> D:\MyScript\password_generator.py 16 -C 8 1o71NV8{rt*.3l4W L533*8X1m$9`!w6R 9#0<W4~ZuK#"51Vd V9y93Y^2B34Go71/ 50]9o1]E1&ap6HU# ~42pL"46T'7633zd 6Zw1a9z"F16H7~3Z 2Ab7S<r82o7_KN49 PS C:\Users\Xeni> D:\MyScript\password_generator.py 16 -U -C 8 xt7)kG48O9KW\^0] Pz02ac51>8_UY43g 6[kYF?0H98Oq3'21 Fp50N>P47n+369A1 9v743R0%V5\2s186 Dbr1%\0a}486TQF; K?l1Cz;92Wn7|Z54 1a@7]4yBsCx52Y0G PS C:\Users\Xeni> D:\MyScript\password_generator.py 16 -U -NS -C 8 c0YgQ9C54k86Ff3m Dwd3W412059ujm8H 17pWJvhyC9b82460 Qs52409Az673R18Z 89R2Pk6z401nN73H 7gDcO30yxG54d86K i62s1kQC89IuBb45 iw02Zn96Ez1xQY8N 

As the program became sufficiently complex I wanted it to be reviewed, in particular this is the first time I have ever used argparse. I would like my code to be more concise and efficient.

\$\endgroup\$
2
  • 3
    \$\begingroup\$You passed up the opportunity to import typer. You elected to not write any """docstrings""". The identifiers a, b are not well chosen. There's no unit tests, and since you rely on global PRNG state there won't be. Dup suppression is an odd choice, as it reduces entropy. Choosing to suppress punctuation near start / end of password would make good UX sense, in order to make double-click copy-n-paste a little easier.\$\endgroup\$
    – J_H
    CommentedMay 18, 2023 at 19:43
  • 4
    \$\begingroup\$Please do not edit the question, especially the code, after an answer has been posted. Changing the question may cause answer invalidation. Everyone needs to be able to see what the reviewer was referring to. What to do after the question has been answered.\$\endgroup\$
    – Doryx
    CommentedMay 19, 2023 at 15:30

3 Answers 3

18
\$\begingroup\$

Put all code in functions. Code in functions can be tested, experimented with, and reorganized much more easily than code floating around at the top level. Floating code introduces the temptation to rely on global variables, which can create substantial headaches even at moderate program sizes. Putting everything in functions forces you to think clearly about how the behavior and data flow is organized. It's a time-tested practice worth embracing.

Invest a lot more energy in code clarity and readability. I started experimenting with this code expecting to write a typical review suggesting improvements on various details, but I was quickly overwhelmed by the complexity and opacity of the code. A few examples. (1) What is COUNTS? The constant name is generic, there are no comments to provide clues, the data structure is is almost (but not quite) parallel to CHARSETS, and the numbers themselves are not evidently self-explanatory. (2) What is the split() function's purpose and behavior? Again, the name is generic and a bit puzzling (what is being split?). There are no docstrings or comments explaining the intent. Its logic is driven by the mystery numbers in COUNTS, using completely abstract variable names of a and b. After experimenting with the code, I get the sense that the job of split() is to determine how many of each category of characters (lower, upper, digits, punctation) we will select to build a password, but nothing in the code aids me in reaching that determination. I had to figure that out via experimentation, printing, and close reading. (3) What is choose_charset()? Are we really choosing a character set, meaning selecting among the CHARSETS data? It doesn't look like that. Is number in this function the same as number in split()? And how do those generic concepts compare to password length? The naming scheme is both generic and seemingly inconsistent. (4) The algorithm in generate_password() is a head-scratcher. I would confidently wager than less than 1 in 1000 experienced Python programmers have ever used iconcat -- which doesn't necessarily make it a bad thing to use, but it does convey the terrain you're walking on. After a fair bit of study, I managed to figure out what it's doing, but you are really making your reader work hard! Consider the following comparison. Even without improving the rest of the code's naming and readability, we can clarify password generation a lot just by doing the normal Python thing rather than by doing something super-clever.

# Original code: reduce, iconcat?!?, map, choose_charset -- huh? password = reduce( iconcat, map( choose_charset, CHARSETS[symbol], split(length, symbol), [unique]*4 ), [] ) # Revised code: ok, there is a tuple of "splits" (not sure what that means, # but it's the same size as CHARSETS, so I sort of see the connection). Oh, # and then we just build the password from the characters coming out of # choose_charset(). I sort of get it now. splits = tuple(split(length, symbol)) password = [ char for charset, s in zip(CHARSETS[symbol], splits) for char in choose_charset(charset, s, unique) ] 

Does the code work correctly? When I see this kind of complexity, my suspicions are raised. How do I know the code is doing the right thing? I decided to do some quick analysis. Thanks to your good decision to use a command-line argument parser, I quickly added a new behavior to be triggered by --analysis. It takes a bunch of passwords and counts up how often each category of characters is observed. Here's the new function, and a sketch of how I wove it into the existing code.

import sys from collections import Counter def main(args): parser = argparse.ArgumentParser(...) ... opts = parser.parse_args(args) ... if opts.analysis: passwords = [ generate_password(opts.length, opts.unique, opts.symbol) for _ in range(count) ] analyze_passwords(opts, passwords) else: ... def analyze_passwords(opts, passwords): csets = dict( lower = ascii_lowercase, upper = ascii_uppercase, punct = punctuation, digit = digits, ) c = Counter( next(name for name, chars in csets.items() if c in chars) for p in passwords for c in p ) for k in csets: print(k, round(c[k] / opts.count, 3)) if __name__ == '__main__': main(sys.argv[1:]) 

I ran the analysis across various password lengths (8, 16, 35, 70), with and without --unique. The results suggest that you might want to reconsider your approach. As password length increases, digits and especially punctuation become over-represented, while lowercase letters are under-represented. Also, it seems that digits are always being selected in unique mode (or otherwise somehow capped at 10 appearances), even when the user did not specify that option on the command line.

Category | 8 | 16 | 35 | 70 # Non-unique. ---------------------------------------- lower | 2.0 | 3.0 | 5.006 | 9.548 upper | 2.0 | 2.983 | 5.869 | 18.794 punct | 2.0 | 3.386 | 14.127 | 31.658 digit | 2.0 | 6.632 | 9.998 | 10.0 Category | 8 | 16 | 35 | 70 # Unique. ----------------------------------------- lower | 2.0 | 2.997 | 5.007 | 9.559 upper | 2.0 | 3.0 | 5.857 | 18.77 punct | 2.0 | 3.398 | 14.138 | 31.672 digit | 2.0 | 6.606 | 9.998 | 10.0 

A key in the fight against complexity: well-focused data objects. After thinking about your program a while I decided to abandon any effort at line-by-line or even function-by-function advice. I would suggest a complete rethink. Rather than improving the split() function, for example, I would suggest that you figure out a way not to need it at all. One way to start is by creating a data object to represent a CharacterSet and the behaviors that could be tied to it when generating passwords. Such an object should know its universe of characters (lower, upper, digits, punctuation). It should know how to randomly emit another character, subject to any unique constraint. And it should know whether the next request to get another character will fail due to that constraint.

class CharacterSet: def __init__(self, chars, unique = False): self.chars = chars self.unique = bool(unique) self.remaining = list(chars) random.shuffle(self.remaining) def next_char(self): if self.unique: # Because we shuffled, we can just pop from the end. return self.remaining.pop() else: return secrets.choice(self.chars) @property def is_empty(self): return self.unique and not self.remaining 

With that very simple data-object in place, the code to generate a single password in a way that respects the unique constraint and (as much as possible) selects evenly from the different character categories might look like the following:

from itertools import cycle def create_password(length, unique = False, include_punct = False): # Create the CharacterSet instances. categories = (ascii_lowercase, ascii_uppercase, digits, punctuation) csets = [ CharacterSet(chars, unique) for chars in categories ] if not include_punct: csets.pop() random.shuffle(csets) # Build a password by just cycling though the shuffled # CharacterSet instances, getting their next character. password = [] csets = cycle(csets) for _ in range(length): while True: cset = next(csets) if not cset.is_empty: break password.append(cset.next_char()) random.shuffle(password) return ''.join(password) 

Next steps: reconsider even distributions. When passwords generated by the code above are run through the analyze_passwords() function, the passwords end up having a very even distribution across the character categories (again, subject to unique constraint). But you might want to reconsider having such an even distribution. My advice would be to start with the minimum counts desired for each category (eg, 2 characters from each category). You can build that password in the manner shown above. Then, to get the remaining characters needed to achieve full password length, draw from a universal CharacterSet that is created using all of the remaining characters from the category-specific CharacterSet instances.

\$\endgroup\$
0
    15
    \$\begingroup\$

    I feel like you came up with an over-engineered solution that, ultimately, fails to meet expectations.

    Generating a password is a task with well-defined use-cases, that we encounter almost daily.

    Most requirement come from security considerations:

    1. use a cryptographically secure source of randomness
    2. eliminate all identifiable patterns in generated passwords
    3. don't impose a limit on password length
    4. use as wide an alphabet/character set as possible

    Your algorithm fails on almost all accounts.

    After picking characters from different character sets, these are shuffled with random.shuffle, which is not crypto-grade randomness, violating the first requirement.

    The algorithm over-represents digits and punctuation, as noted in FMc's great answer, violating the requirement number 2.

    The -U option also introduces patterns, against requirement 2.

    If the -NU option is selected, the length of the password is limited by the size of the character set for no apparent reason. You might think the length is good enough, but in fact I can't generate a password with my go-to settings (no special chars, 64-length, for good enough entropy and not having to deal with each website's different limitations for special chars). Requirement 3 failed.

    The options allow either to include all printable special characters, or none of them. In fact, in the real world, you are usually limited to a subset of some special chars, but you can't use that with your solution. Requirement 4 failed.


    Now, this set of requirements is not actually hard to meet. In fact, taking inspiration from the secrets module documentation's Recipes and best practices, it's quite simple to come up with a simple solution:

    import secrets def generate_password(length, alphabet): return ''.join(secrets.choice(alphabet) for _ in range(length)) 

    This allows meeting all four requirements. Now, add on top some well-designed command-line switches with sensible defaults, and we actually have a usable password generator.

    \$\endgroup\$
    1
    • 1
      \$\begingroup\$Thank you for mentioning that limiting the number of occurrences of any particular character is wrong as it decreases the number of potential passwords -- and does so drastically. As an example, imagine an 8 digits PIN code where digits have to be unique. There are... 45 distinct sets of digits that can be used. Shuffling them (8!) leads to 1.8 millions combinations. That's 50x less than the 100 millions combinations without the uniqueness requirement.\$\endgroup\$CommentedMay 22, 2023 at 9:10
    7
    \$\begingroup\$

    Another perspective: adopting legacy* code

    I agree with the other answers and I would never use this code to generate passwords, since (1) it's not cryptographically secure and (2) it's highly opaque, and I don't want opaque code near security systems. But it also has many typical features of a lot of 'legacy' code out there which we do want to keep using. So let's think about adopting it as an exercise.

    When you're tempted to write stuff like this:

    COUNTS = ( ((3, 36), (2, 10)), ((4, 68), (3, 42), (2, 10)) ) 

    Reach for collections.namedtuple.

    What are these two numbers? What roles do they play? I don't know, and since the algorithm here isn't cryptographically secure anyway I decided not to figure it out. But you could have told me straight out by declaring a datatype:

    from collections import namedtuple Count = namedtuple("Count", "first_thing,second_thing") COUNTS = ((Count(3, 36), Count(2,10),) 

    The best thing about this is that namedtuples are tuples, so you can bolt this on later. You don't have to use property access (Count.first_thing). Later on, you can go through and convert the access, but for now just declaring the type will get you instant clarity---and your IDE/editor will probably start hinting everything. Later on we can declare a proper class if we need to.

    I often do this immediately when I start reading legacy code: it's one of the lowest-cost ways to start injecting some clarity.

    Of course you have to tell me what first_thing and second_thing are (or I'd have to read your code much more closely than I have time for atm).

    Hang on, though, Counts is a tuple of tuples. (Lists of lists or tuples of tuples are usually a sign that we're not using datastructures very well.) What does it correlate to? Oh, probably CHARSETS. Well then, we should declare that correlation explicitly:

    CHARSETS = { "printable": ..., "all": ..., } COUNTS = { "printable": Count(...), "all": Count(...), } 

    This modification isn't zero-cost, but we can still change a few lines---for now---and replace CHARSETS[x] with CHARSETS["descriptive_x"].

    The next step, if I was adopting legacy code, would probably be to combine counts and charsets into one datatype: they probably belong together, although I got lost in the algorithm so I'm not sure. Doing so is going to require another tweak, but this particular tweak---moving from index-based lookup in a bunch of lists to looking up one object and then accessing its properties---usually removes about 1/3 of the cruft in one swoop, because usually we're spending a lot of time generating a datastructure at runtime, and if we present the data in the right package to begin with we can just focus on the logic. Incidentally this isn't a high-level language thing: plenty of refactoring in C consists of replacing bunches of arrays with a single array (or even a linked list) of a well designed struct.

    I'll stop there as I'd have to get into the particular code and I don't think it's worth it from the cryptographic POV.

    *legacy code = any code older than about 5 minutes. I'm sure there are plenty of excellent programmers who get everything right first time, but your own code often becomes 'legacy' with a few days.

    \$\endgroup\$

      Start asking to get answers

      Find the answer to your question by asking.

      Ask question

      Explore related questions

      See similar questions with these tags.