3
\$\begingroup\$

In recent times, I've been using the following algorithm repeatedly in different languages (Java, Python, ...) to search and replace strings. I don't like this implementation very much, it's very procedural and not very descriptive, but I have yet to come up with a better one. How could the implementation be improved?

from typing import Callable import re def find_and_replace(pattern: str, text: str, replace: Callable[[re.Match], str]) -> str: last_end = 0 output = [] for match in pattern.finditer(text): text_before = text[last_end:match.start()] if text_before: output.append(text_before) replacement = replace(match) output.append(replacement) last_end = end text_remaining = text[last_end:] if text_remaining: output.append(text_remaining) return ''.join(output) 
\$\endgroup\$
8
  • \$\begingroup\$Where does PATTERN come from? And are you missing any imports? (re perhaps?)\$\endgroup\$CommentedJan 13, 2024 at 10:10
  • \$\begingroup\$Your question is hard to answer, because "improve" and "very procedural and not very descriptive" are rather opinions. What exactly would you expect from improved code? That said, PATTERN should obviously be a parameter and the checks to not append empty strings to output are redundant. Apart from that, the implementation is easy to follow, so nothing I would obsess about.\$\endgroup\$
    – uli
    CommentedJan 13, 2024 at 10:24
  • \$\begingroup\$@TobySpeight I left out imports for simplicity. Of course, re and typing.Callable need to be imported. And PATTERN should be a parameter, as @uli correctly pointed out. I changed the code accordingly.\$\endgroup\$CommentedJan 13, 2024 at 11:04
  • 1
    \$\begingroup\$This won't run. end is undefined. Don't omit important code for the sake of simplicity.\$\endgroup\$CommentedJan 13, 2024 at 14:35
  • 1
    \$\begingroup\$Please read the "What should I not do?" part of the site guidelines. Edits after a review are subject to being rolled back, so folks can see the code the review applies to.\$\endgroup\$
    – J_H
    CommentedJan 14, 2024 at 3:27

2 Answers 2

2
\$\begingroup\$

wrong signature

def find_and_replace(pattern: str, ... 

You intended pattern: re.Pattern,

Recommend you routinely use mypy or similar type checker if you choose to add optional type annotations.

It is common knowledge that "Comments lie!". Often they're initially accurate, and then the code evolves with edits while the comments remain mired in the past, out of sync. So at best we will "mostly believe" the comments that appear.

A type signature offers stronger promises, since we expect that lint nonsense was cleaned up during routine edit-debug and CI/CD cycles. Posting code for review which doesn't meet that expectation is problematic.

wrong code

 last_end = end 

You intended to assign ... = match.end(). I cannot imagine how one could "test" this source code without noticing a fatal NameError.

missing documentation

The review context merely mentioned that this will "search and replace strings", similar to what the identifier promises. It's unclear how the proposed functionality would differ from what the str or re libraries offer, though the signature does offer a hint. The hint is not enough, and we are not even offered so much as a motivating example of the kinds of Turing machines we might usefully pass in.

missing test suite

This submission contained no doctest, no automated tests of any kind. One reason we write tests is to exercise the code, for example to discover a NameError. Another reason is to educate engineers who might want to call into this library routine.

Here is a test that might usefully have been included in the OP.

import unittest class FindAndReplaceTest(unittest.TestCase): def test_find_and_replace(self) -> None: pattern = re.compile(r"\w+") text = "... Hello, world!" def replace(match: re.Match) -> str: return "<" + str(match.group(0).upper()) + ">" self.assertEqual( "... <HELLO>, <WORLD>!", find_and_replace(pattern, text, replace), ) 

superfluous tests

 if text_before: output.append(text_before) ... if text_remaining: output.append(text_remaining) 

There's no need to test for non-empty, as empty string is the additive identity when catenating. The ''.join(output) result won't be altered in the slightest if we tack on a hundred empty entries with:

 output += [""] * 100 

So as a minor cleanup, simply elide the ifs.

\$\endgroup\$
    1
    \$\begingroup\$

    Delete all of this, and wherever it was called, replace with stock re.sub():

    If repl is a function, it is called for every non-overlapping occurrence of pattern. The function takes a single Match argument, and returns the replacement string.

    \$\endgroup\$

      Start asking to get answers

      Find the answer to your question by asking.

      Ask question

      Explore related questions

      See similar questions with these tags.