This is a Python script that generates ngrams using a set of rules, stored in a dictionary, describing which letters can follow a given letter.
The output is then preprocessed by another script and filtered further, using an API of sorts, by the number of words containing each ngram. The result will be used in pseudoword generation.
This is the generation part:

    from string import ascii_lowercase
    import sys

    LETTERS = set(ascii_lowercase)
    VOWELS = set('aeiouy')
    CONSONANTS = LETTERS - VOWELS

    BASETAILS = {
        'a': CONSONANTS,
        'b': 'bjlr',
        'c': 'chjklr',
        'd': 'dgjw',
        'e': CONSONANTS,
        'f': 'fjlr',
        'g': 'ghjlrw',
        'h': '',
        'i': CONSONANTS,
        'j': '',
        'k': 'hklrvw',
        'l': 'l',
        'm': 'cm',
        'n': 'gn',
        'o': CONSONANTS,
        'p': 'fhlprst',
        'q': '',
        'r': 'hrw',
        's': 'chjklmnpqstw',
        't': 'hjrstw',
        'u': CONSONANTS,
        'v': 'lv',
        'w': 'hr',
        'x': 'h',
        'y': 'sv',
        'z': 'hlvw'
    }

    tails = dict()

    for i in ascii_lowercase:
        v = BASETAILS[i]
        if type(v) == set:
            v = ''.join(sorted(v))
        tails.update({i: ''.join(sorted('aeiou' + v))})

    def makechain(invar, target, depth=0):
        depth += 1
        if type(invar) == str:
            invar = set(invar)
        chain = invar.copy()
        if depth == target:
            return sorted(chain)
        else:
            for i in invar:
                for j in tails[i[-1]]:
                    chain.add(i + j)
            return makechain(chain, target, depth)

    if __name__ == '__main__':
        invar = sys.argv[1]
        target = int(sys.argv[2])
        if invar in globals():
            invar = eval(invar)
        print(*makechain(invar, target), sep='\n')

I want to ask about the makechain function. I used sets because the results can contain duplicates if I use lists (although the result could be cast to a set afterwards). I used a nested for loop and a recursive function to simulate a variable number of nested for loops.
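To illustrate the duplicates issue, here is a minimal sketch of a list-based variant. The names TOY_TAILS and makechain_list are made up for this example, and the rule table is a tiny stand-in for the real tails dictionary. As far as I can tell, the duplicates appear because every pass extends all strings already in the chain, including the shorter ones that were extended in an earlier pass.

```python
# Hypothetical toy rule table, much smaller than the real `tails` dict.
TOY_TAILS = {'a': 'b', 'b': 'a'}

def makechain_list(seed, target, depth=0):
    """List-based variant of makechain, for comparison only."""
    depth += 1
    chain = list(seed)
    if depth == target:
        return chain
    for item in seed:
        # Shorter strings from earlier passes are extended again here,
        # so their extensions are appended a second time.
        for letter in TOY_TAILS[item[-1]]:
            chain.append(item + letter)
    return makechain_list(chain, target, depth)

result = makechain_list(['a', 'b'], 3)
print(result)
print(len(result), len(set(result)))
```

With this toy table the list ends up with 8 entries but only 6 distinct strings, because 'ab' and 'ba' are each generated twice.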
For example, makechain(LETTERS, 4) is equivalent to:

    chain = set()

    for a in LETTERS:
        chain.add(a)

    for a in LETTERS:
        for b in tails[a]:
            chain.add(a + b)

    for a in LETTERS:
        for b in tails[a]:
            for c in tails[b]:
                chain.add(a + b + c)

    for a in LETTERS:
        for b in tails[a]:
            for c in tails[b]:
                for d in tails[c]:
                    chain.add(a + b + c + d)

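As a sanity check of that equivalence, here is a small sketch that compares the recursive version against hand-written loops for a target of 3. It assumes a toy rule table (TOY_TAILS, made up for this example) and passes the table in as an extra rules argument, which the real makechain does not have.

```python
# Hypothetical toy rule table standing in for the real `tails` dict.
TOY_TAILS = {'a': 'ab', 'b': 'a'}

def makechain(invar, target, rules, depth=0):
    """Same logic as the script's makechain, with the rule table passed in."""
    depth += 1
    if isinstance(invar, str):
        invar = set(invar)
    chain = invar.copy()
    if depth == target:
        return sorted(chain)
    for item in invar:
        for letter in rules[item[-1]]:
            chain.add(item + letter)
    return makechain(chain, target, rules, depth)

def nested_loops(letters, rules):
    """Hand-written loops producing every allowed string of length 1 to 3."""
    chain = set()
    for a in letters:
        chain.add(a)
    for a in letters:
        for b in rules[a]:
            chain.add(a + b)
    for a in letters:
        for b in rules[a]:
            for c in rules[b]:
                chain.add(a + b + c)
    return sorted(chain)

recursive = makechain('ab', 3, TOY_TAILS)
looped = nested_loops('ab', TOY_TAILS)
print(recursive == looped)
print(recursive)
```

Both versions give the same 10 strings for this toy table.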
Obviously, makechain(LETTERS, 4) is much better than the hand-written nested for loops because it is much more flexible: the depth is a parameter rather than something fixed in the code.
I want to know: is there any way I can use a function from itertools instead of the nested for loop to generate the same results more efficiently? I have been thinking about itertools.product and itertools.combinations, but I just can't figure out how to do it.
Any help will be appreciated.