
Given a string representation of data, I want to extract the information into its corresponding object.

However, if the string has "|" separators, these should be treated as options, and one of them should be picked at random.

If the string contains numbers written as a range, such as "1-10", a random value within that range should be chosen. The numerical datatype (int or float) should also be preserved.

For example:

"(1-3,1,1)" returns either (1, 1, 1), (2, 1, 1) or (3, 1, 1)

"(0.2-0.4,1,1)" returns either (0.2, 1, 1), (0.3, 1, 1) or (0.4, 1, 1)

"foo|bar|foobar" returns either "foo", "bar" or "foobar"

"[1-2,1,2]|foo|bar|[1,8-10,99]" could return any of:

"foo","bar", [1, 1, 2], [2, 1, 2], [1, 8, 99], [1, 9, 99] or [1, 10, 99]

This is what I have, and it works well. But I can't help thinking it could be achieved in a more concise way. Let me know what I could have done better.

    import re
    import random
    import ast


    def randomize_by_pipe(st_value):
        """
        Used to split strings with the pipe character and randomly choose and option.

        :param: st_value - (str)
        """
        if not st_value is None:
            st_arr = st_value.split("|")
            random.shuffle(st_arr)
            return st_arr[0]
        else:
            return st_value


    def randomise_range(text):
        if text is None:
            return text
        else:
            matches = re.findall("\d*\.*\d*-{1}\d*\.*\d*",text)
            for match in matches:
                startingPos = 0
                position = text.find(match, startingPos)
                while True:
                    position = text.find(match, startingPos)
                    if position > -1:
                        txt = text[position:position+len(match)]
                        txt = rand_no_from_string(txt)
                        new_text = text[0:position+len(match)].replace(match,str(txt))
                        text = new_text + text[position+len(match):]
                    else:
                        break
            try:
                return ast.literal_eval(text)
            except ValueError:
                return text


    def rand_no_from_string(txt):
        is_int = False
        txt_arr = txt.split("-")
        num_arr = [float(x) for x in txt_arr]
        if int(num_arr[0]) == num_arr[0]:
            mul = 1
            is_int = True
        else:
            #new section to deal with the decimals
            mul = 10 ** len(str(num_arr[0]).split(".")[1])
            num_arr = [x*mul for x in num_arr]
        if num_arr[0] > num_arr[1]:
            num_arr[1], num_arr[0] = num_arr[0], num_arr[1]
        val = random.randint(num_arr[0],num_arr[1])/mul
        return int(val) if is_int else val

Run with:

    text="(108-100,0.25-0.75,100)|Foo|Bar|[123,234,234-250]"
    randomise_range(randomize_by_pipe(text))
  • So which function handles strings like "(0.2-0.4,1,1)" and "[1-2,1,2]|foo|bar|[1,8-10,99]"? None of the three you showed seems to be able to. (Commented Oct 15, 2020 at 13:37)
  • @superbrain It works fine for me. Have you seen the "Run with" section? (Commented Oct 15, 2020 at 13:47)
  • Oops, I actually did manage to miss that. So is this how to always run it? Then I think there should be a function to do that. Also, I just tried text = "(0.2-0.4,1,1)", which you say returns either (0.2, 1, 1), (0.3, 1, 1) or (0.4, 1, 1), and it didn't work. I got (0.324, 1, 1) instead. (Commented Oct 15, 2020 at 13:57)
  • @superbrain You are correct. I will have to make an adjustment that takes the number of decimals in the float into account. It should work as follows: 0.2-0.4 would only produce 0.2, 0.3 or 0.4, and 0.20-0.22 would produce 0.20, 0.21 or 0.22, and so on. (Commented Oct 15, 2020 at 14:13)
  • @superbrain I've tweaked it now. (Commented Oct 15, 2020 at 14:48)

2 Answers


Type hinting

Instead of having docstrings declare the types of function parameters, why not go with type hinting?
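As a rough sketch of what that could look like for randomize_by_pipe, keeping the original logic unchanged (the Optional[str] annotation is an assumption based on the None check in the question's code):

```python
import random
from typing import Optional


def randomize_by_pipe(st_value: Optional[str]) -> Optional[str]:
    """Split on the pipe character and return one option at random."""
    if st_value is None:
        return st_value
    st_arr = st_value.split("|")
    random.shuffle(st_arr)
    return st_arr[0]
```

The docstring no longer needs to repeat the parameter type, and tools such as mypy can check callers against the signature.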

Complexity

Your code currently has too many moving parts. You define two different functions to parse the data, and they have to be called in a chain. A single parsing function should do this.

Let the parser take the text, split it on the pipe character first, and then handle the numerical ranges.

Selection from a list

Your randomize_by_pipe shuffles the list, and selects the 0th value. You can instead let random.choice do the job.
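A minimal sketch of that change (pick_option is a hypothetical name used only for illustration):

```python
import random


def pick_option(text: str) -> str:
    # random.choice returns one element of the sequence at random,
    # so there is no need to shuffle the whole list first
    return random.choice(text.split("|"))


print(pick_option("foo|bar|foobar"))
```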

Range parsing

I think range parsing can be improved a little. How about the following flow:

  1. Remove [ and ] from the given text.
  2. Split on ,.
  3. For each section of the split, try parsing it as a float (or int, depending on your dataset).
  4. If the conversion raises an error, let rand_no_from_string produce a value.
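The steps above could be sketched roughly as follows. This is only an illustration under some assumptions: parse_sequence and to_number are hypothetical names, the rand_no_from_string here is a simplified integer-only stand-in for the helper in the question, and int is tried before float so that whole numbers keep their type.

```python
import random


def rand_no_from_string(txt: str) -> int:
    # Simplified stand-in for the question's helper: integer ranges only
    low, high = sorted(int(part) for part in txt.split("-"))
    return random.randint(low, high)


def to_number(section: str):
    # Try int first so "1" stays an int; fall back to float for "0.5"
    try:
        return int(section)
    except ValueError:
        return float(section)


def parse_sequence(text: str) -> list:
    values = []
    # Steps 1 and 2: drop the brackets, then split on commas
    for section in text.strip("[]").split(","):
        try:
            # Step 3: a plain number parses directly
            values.append(to_number(section))
        except ValueError:
            # Step 4: anything else is treated as a range
            values.append(rand_no_from_string(section))
    return values


print(parse_sequence("[1,8-10,99]"))
```

The same idea extends to tuples by stripping "(" and ")" as well.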

Regex

You have a regex, but you're not making full or elegant use of it. Instead of collecting plain matches, you can capture groups and operate on those groups. The pattern itself can also be tightened a little:

\d+(?:\.\d+)?-\d+(?:\.\d+)? 
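To illustrate the grouping idea, here is a small sketch that wraps each endpoint of the pattern in a capturing group (the RANGE name and the added parentheses are assumptions for illustration). Unlike the pattern in the question, where everything except the hyphen is optional, this one requires digits on both sides, so a lone "-" is not treated as a range.

```python
import re

# The pattern above, with each endpoint wrapped in a capturing group
RANGE = re.compile(r"(\d+(?:\.\d+)?)-(\d+(?:\.\d+)?)")

text = "(108-100,0.25-0.75,100)"
for match in RANGE.finditer(text):
    low, high = match.groups()
    print(match.group(0), "->", low, high)
# prints:
# 108-100 -> 108 100
# 0.25-0.75 -> 0.25 0.75
```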

A rewrite, for example:

    from re import sub, Match
    from random import choice, randint


    def randomise_range(match: Match):
        given_range = match.group(0).split("-")
        low, high = map(float, given_range)
        if low > high:
            low, high = high, low
        if low.is_integer():
            return str(randint(int(low), int(high)))
        multiplier = 10 ** len(given_range[0].split(".")[-1])
        low = int(low * multiplier)
        high = int(high * multiplier)
        return str(randint(low, high) / multiplier)


    def extract_range(text: str = None):
        if not text:
            return text
        return sub(r"\d+(?:\.\d+)?-\d+(?:\.\d+)?", randomise_range, text)


    def parse(text: str = None):
        if not text:
            return text
        selection = choice(text.split("|"))
        if selection[0] in ('[', '('):
            return extract_range(selection)
        return selection


    if __name__ == "__main__":
        examples = (
            "(1-3,1,1)",
            "(0.2-0.4,1,1)",
            "foo|bar|foobar",
            "(108-100,0.25-0.75,100)|Foo|Bar|[123,234,234-250]",
            "[1-2,1,2]|foo|bar|[1,8-10,99]",
        )
        for text in examples:
            print(parse(text))
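One detail that may be worth checking in the scaling step: multiplying a float by a power of ten is not always exact, so int() can land one below the intended bound. A minimal sketch of the effect, assuming rounding to the nearest integer is acceptable (scale_bound is a hypothetical helper, not part of the rewrite above):

```python
def scale_bound(value: float, decimals: int) -> int:
    # round() picks the nearest integer, whereas int() truncates towards zero
    return round(value * 10 ** decimals)


print(int(0.29 * 100))       # 28, because 0.29 * 100 is slightly below 29
print(scale_bound(0.29, 2))  # 29
```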
  • I hate regex, and I can't get what you've given me to work. Can you give me an example of how I can group them? For some reason regex just goes over my head. It's just so alien to my brain. Maybe you have a good resource for it that I can read up on? I know it's powerful and I should learn it. (Commented Oct 15, 2020 at 17:57)
  • Click the link; regex101 provides a detailed explanation of the expression. (Commented Oct 15, 2020 at 18:19)
  • You are amazing. Jeez, that website's good. (Commented Oct 15, 2020 at 18:29)
  • I can't believe it even generates the Python code for you 😮 (Commented Oct 15, 2020 at 18:31)
  • @LewisMorris there is also debuggex.com :) (Commented Oct 15, 2020 at 18:40)

Here's an implementation whose main difference from yours, and from the accepted answer, is that it separates parsing from execution. It's unclear whether this is important for you, but it's generally good design, and it is likely faster to re-execute once parsed:

    import re
    from numbers import Real
    from random import randint, choice
    from typing import Union, Callable


    class Pattern:
        chunk_pat = re.compile(
            r'([^|]+)'  # group: within a chunk, at least one non-pipe character
            r'(?:'      # non-capturing group for termination character
            r'\||$'     # pipe, or end of string
            r')'        # end of termination group
        )

        option_pat = re.compile(
            r'([^,]+)'  # at least one non-comma character in an option
            r'(?:'      # non-capturing group for termination character
            r',|$'      # comma, or end of string
            r')'        # end of termination group
        )

        range_pat = re.compile(
            r'^'        # start
            r'('
            r'[0-9.]+'  # first number group
            r')-('
            r'[0-9.]+'  # second number group
            r')'
            r'$'        # end
        )

        def __init__(self, pattern: str):
            chunk_strs = Pattern.chunk_pat.finditer(pattern)
            self.tree = tuple(
                self.parse_chunk(chunk[1])
                for chunk in chunk_strs
            )

        @staticmethod
        def choose_in_group(group: tuple) -> tuple:
            for option in group:
                if isinstance(option, Callable):
                    yield option()
                else:
                    yield option

        def choose(self) -> Union[str, tuple]:
            group = choice(self.tree)
            if isinstance(group, tuple):
                return tuple(self.choose_in_group(group))
            return group

        @staticmethod
        def precis_parse(as_str: str) -> (Real, int):
            if '.' in as_str:
                return float(as_str), len(as_str.rsplit('.', 1)[-1])
            return int(as_str), 0

        @classmethod
        def make_choose(cls, start: Real, end: Real, precis: int):
            if precis:
                factor = 10**precis
                start = int(start * factor)
                end = int(end * factor)

                def choose():
                    return randint(start, end) / factor
            else:
                def choose():
                    return randint(start, end)

            return choose

        @classmethod
        def parse_options(cls, options: str):
            for option in cls.option_pat.finditer(options):
                range_match = cls.range_pat.match(option[1])
                if range_match:
                    start_str, end_str = range_match.groups()
                    start, start_n = cls.precis_parse(start_str)
                    end, end_n = cls.precis_parse(end_str)
                    yield cls.make_choose(start, end, max(start_n, end_n))
                else:
                    # Fall back to one raw string
                    yield option[1]

        @classmethod
        def parse_chunk(cls, chunk: str):
            if (
                chunk[0] == '(' and chunk[-1] == ')' or
                chunk[0] == '[' and chunk[-1] == ']'
            ):
                return tuple(cls.parse_options(chunk[1:-1]))

            # Fall back to returning the raw string
            return chunk


    def test():
        p = Pattern('foo|(bar,3-4,50,6.3-7,92-99)')
        for _ in range(20):
            print(p.choose())


    if __name__ == '__main__':
        test()