The following code is a module that evaluates boolean expressions, which are entered through an unauthorized web API. The exposed function of the library is `evaluate()`. Its positional argument `string` is a Python-style boolean expression that is entered by the user and thus untrusted (so it is potentially any string). I implemented some basic checks to mitigate arbitrary code execution. Most of the security is delegated to the callback function that determines the truth value of user-defined tokens. I'd appreciate a review with a focus on security, but general feedback is welcome too.

    """Boolean evaluation."""

    from typing import Callable, Iterator


    __all__ = ['SecurityError', 'evaluate']


    PARENTHESES = frozenset({'(', ')'})
    KEYWORDS = frozenset({'and', 'or', 'not'})


    class SecurityError(Exception):
        """Indicates a possible security breach
        in parsing boolean statements.
        """


    def bool_val(token: str, callback: Callable[[str], bool]) -> str:
        """Evaluates the given statement into a boolean value."""

        callback_result = callback(token)

        if isinstance(callback_result, bool):
            return str(callback_result)

        raise SecurityError('Callback method did not return a boolean value.')


    def tokenize(word: str) -> Iterator[str]:
        """Yields tokens of a string."""

        for index, char in enumerate(word):
            if char in PARENTHESES:
                yield word[:index]
                yield char
                yield from tokenize(word[index+1:])
                break
        else:
            yield word


    def boolexpr(string: str, callback: Callable[[str], bool]) -> Iterator[str]:
        """Yields boolean expression elements for python."""

        for word in string.strip().split():
            for token in filter(None, tokenize(word)):
                if token in KEYWORDS or token in PARENTHESES:
                    yield token
                else:
                    yield bool_val(token, callback)


    def evaluate(
            string: str, *,
            callback: Callable[[str], bool] = lambda s: s.casefold() == 'true'
    ) -> bool:
        """Safely evaluates a boolean string."""

        return bool(eval(' '.join(boolexpr(string, callback))))

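For comparison, the final `eval()` call could in principle be replaced by walking the parsed syntax tree and accepting only boolean constants, `not`, `and` and `or`. The sketch below is not part of the module above; `eval_bool_ast` and `_walk` are hypothetical names, and it assumes Python 3.8 or later, where `True` and `False` parse as `ast.Constant`.

```python
import ast


def _walk(node: ast.AST) -> bool:
    """Reduce a whitelisted syntax tree node to a bool."""
    # Plain True / False literals.
    if isinstance(node, ast.Constant) and isinstance(node.value, bool):
        return node.value
    # "not <expr>"
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return not _walk(node.operand)
    # "<expr> and <expr> ..." or "<expr> or <expr> ..."
    if isinstance(node, ast.BoolOp):
        values = [_walk(value) for value in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    # Anything else (calls, names, arithmetic, ...) is rejected.
    raise ValueError(f'Unsupported expression element: {type(node).__name__}')


def eval_bool_ast(expression: str) -> bool:
    """Evaluate a string of True/False, and/or/not and parentheses."""
    # Malformed input raises SyntaxError here; nothing is executed.
    tree = ast.parse(expression.strip(), mode='eval')
    return _walk(tree.body)
```

With this approach a string such as `__import__("os")` is parsed but never executed, because a call node is not on the whitelist and raises `ValueError`.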
Update
This library is used in one place only, namely a real estate filtering API from which web applications retrieve real estate listings stored in a database. The web applications filter the retrieved listings with boolean expressions that are parsed by this library. The actual matching is done in the callback function passed to `evaluate()`: it matches common filtering properties against the respective listing, and `evaluate()` then determines whether the requested combination of properties is met. For example, `id in [1, 2, 3] and (sales_type == "rent" or price < 250000)` is, depending on the listing, reduced to `True and ( False or True )`, which is then passed to `eval()`.