7
\$\begingroup\$

Question

Write a program which removes specific characters from a string.
The first argument is a path to a file. The file contains the source strings and the characters that need to be scrubbed. Each source string and characters you need to scrub are delimited by comma.

Input sample:

how are you, abc
hello world, def

Output sample:

how re you
hllo worl

I have used regex to eliminate the characters.

public class Main{ public static void main(String[] args) throws IOException { File file = new File(args[0]); BufferedReader br = new BufferedReader(new FileReader(file)); String s; while ((s = br.readLine()) != null) { s = s.trim(); String arr[]=s.split(",\\s"); String pat="([^"+arr[1]+"])"; Pattern p=Pattern.compile(pat); Matcher m=p.matcher(arr[0]); while(m.find()) { System.out.print(m.group(0)); } System.out.println(); } br.close(); } } 

The above code passes all the test cases.
Is this the most optimized code or can it be improved further?

\$\endgroup\$

    5 Answers 5

    4
    \$\begingroup\$

    You can extract out the requirement into its own method:

    private static String removeCharacters(String input, String excludedLetters) { // ... } 

    And since you have to split from a single String first, you can use a wrapper method in addition to the one above:

    private static String removeCharacters(String line) { String[] parts = // ... return removeCharacters(parts[0], parts[1]); } 

    If you are on Java 8, you can use a combination of try-with-resources and Files.lines() to Streamline the processing (pun intended):

    try (Stream<String> lines = Files.lines(Paths.get(/* ... */))) { lines.map(Main::removeCharacters).forEach(System.out::println); } 
    \$\endgroup\$
      4
      \$\begingroup\$

      Here are some tips for you (primarily regarding performance):

      • use a method like outputRemovedChars to do the task.
      • regular expressions are slow (except splitting on single characters). So use a simpler way for splitting each line into the text and the characters to be removed.
      • don't call System.out.print for every character: it's slow!
      • use built-in functions to read a file

      A more cleaner example implementation could be like this:

      public static void main(final String[] args) throws IOException { final String filename = args[0]; final List<String> lines = Files.readAllLines(Paths.get(filename), Charset.defaultCharset()); outputRemovedChars(lines); } private static void outputRemovedChars(final List<String> lines) { for (final String line : lines) { System.out.println(removeChars(line)); } } private static String removeChars(final String line) { final int sepPos = line.indexOf(','); final String text = line.substring(0, sepPos); final String charsToRemove = line.substring(sepPos + 2); return text.replaceAll("[" + charsToRemove + "]", ""); } 

      I didn't care about handling errors regarding the input. I assume, that the input has always this structure.

      \$\endgroup\$
      1
      • \$\begingroup\$It's worth noting that String.replaceAll() is implemented using regex.\$\endgroup\$CommentedJul 28, 2015 at 16:36
      3
      \$\begingroup\$

      The code looks okay, and it passed the test, so it is good enough as far as the test is concerned. That said, the code is a bit fragile on input, and it can be further optimised.

      Fragile: Pattern creation

      Because you create a pattern from an input string, you can end up with an invalid pattern, or a pattern that doesn't work as advertised. A line of int[] x = { 15 };, ][; will trip up this program. This doesn't appear to be a problem in the tests your code was submitted to, so it may be valid to assume input is alphanumerical us-ascii.

      Performance

      I've found two alternatives to perform better (with my own randomised data-set(1); your mileage may vary):

      • String.replaceAll basically does what you've implemented, but has the potential to eliminate some copying along the way. (Where String.substring used to be a view on the base string, it is implemented in Oracle JRE 8 as a copy operation.) This change reduced average execution time by about 40%.

      • BitSet performed even better. By putting the chars to be erased into a bitset and checking against that, execution time went down to about 33%. I was a little sloppy and gimmicky and used streams instead of for-looping, but it should stay in the same order.

      I'm sure there are ways to really go 88mph on the code, but that may start to affect readability.


      (1) 20 000 lines, 100 chars per line, 8 chars per pattern. To limit disk churning, I kept the file open in Notepad++, which should help keep the data in OS cache.

      \$\endgroup\$
        3
        \$\begingroup\$

        The first thing you can do to improve your code is move the logic outside of the main method. The section of your code where the string is scrubbed can be moved into a static method that takes two strings, a source string and list of characters to be removed, then returns the scrubbed String. Notice that the method returns a String rather than printing it; this is because it is good practice to keep I/O and program logic separate.

        public static String removeCharacters(String sourceString,String characters){ String pat="([^"+characters+"])"; Pattern p=Pattern.compile(pat); Matcher m=p.matcher(sourceString); StringBuilder scrubbedBuilder = new StringBuilder(); while(m.find()){ scrubbedBuilder.append(m.group(0))); } return scrubbedBuilder.toString(); } 

        Now that the removeCharacters logic has been abstracted from the main method, it's much simpler to modify it's implementation without worrying about how it will effect the rest of the code.

        public static String removeCharacters(String sourceString,String characters){ String regex = "["+characters+"]"; return sourceString.replaceAll(regex,""); } 

        This code is almost exactly the same as the code you provided except a built in method of the String class is being used instead of manually managing the regex.


        Finally, the main method needs to be modified to accommodate the abstraction of the removeCharacters method. At the same time it can be modified to utilize Java7's try-with-resource and a few variables can be given more specific names.

        public static void main(String[] args) { String filePath = args[0]; File file = new File(filePath); try(BufferedReader br = new BufferedReader(new FileReader(file));){ String inputLine; while ((inputLine = br.readLine()) != null) { inputLine = inputLine.trim(); String[] splitInput=inputLine.split(",\\s"); System.out.println(removeCharacters(splitInput[0],splitInput[1])); } } catch (IOException ioe){ System.err.println("Error occurred while attempting to read file " + filePath); } } 

        Here's the final version of the code.

        import java.io.BufferedReader; import java.io.FileReader; import java.io.File; import java.io.IOException; public class Remove{ public static String removeCharacters(String sourceString,String characters){ String regex = "["+characters+"]"; return sourceString.replaceAll(regex,""); } public static void main(String[] args) { String filePath = args[0]; File file = new File(filePath); try(BufferedReader br = new BufferedReader(new FileReader(file));){ String inputLine; while ((inputLine = br.readLine()) != null) { inputLine = inputLine.trim(); String[] splitInput=inputLine.split(",\\s"); System.out.println(removeCharacters(splitInput[0],splitInput[1])); } } catch (IOException ioe){ System.err.println("Error occurred while attempting to read file " + filePath); } } } 

        The only other changes here are the name of the class and the imports. Instead of Main the class is named Remove and I've explicitly listed all the imported classes.

        \$\endgroup\$
          1
          \$\begingroup\$

          Just adding to other answers. As already said

          Because you create a pattern from an input string, you can end up with an invalid pattern, or a pattern that doesn't work as advertised

          I guess, Pattern.quote wouldn't work inside brackets, so I manually escape every character having special meaning there.

          private static final SPECIAL_IN_BRACES = "-^\\]&|"; private Pattern anyOf(String characters) { StringBuilder result = new StringBuilder(); result.append("["); for (int i=0; i<characters.length; ++i) { char c = characters.charAt(i); if (SPECIAL_IN_BRACES.indexOf(c) > -1) { result.append('\\'); } result.append(c); } result.append("]"); return Pattern.compile(result); } 

          This should be tested with all ASCII chars (other don't have a special meaning in regexes).


          My preferred way would be to use Guava CharMatcher as it's specialized class designed for such tasks. Then

          CharMatcher.anyOf(arr[1]).removeAll(arr[0]); 

          is all you need. The CharMatcher can be optimized by using .precomputed(), but that's not worth it for a single use (a normal version uses binary search while a precomputed one uses a BitSet, which may be a big difference, however, the precomputation may be costly).

          \$\endgroup\$

            Start asking to get answers

            Find the answer to your question by asking.

            Ask question

            Explore related questions

            See similar questions with these tags.