3
\$\begingroup\$

I've got an interview coming up for which the following is a prospective question:

Given 2 strings, for instance:

  1. John Lennon
  2. Paul ON

delete every character from string 1 which corresponds with any character in string 2 (case exclusive).

So following the anticipated manipulations, the output in our case would be:

Jhe

I've devised the following program for the same:

 BufferedReader br = new BufferedReader(new InputStreamReader(System.in)); System.out.printf("Enter 2 strings%n"); String s1,s2,s3; s1=br.readLine(); s2=br.readLine(); s3="["; for(int j=0;j<s2.length();j++) { s3+=Character.toLowerCase(s2.charAt(j)); s3+=Character.toUpperCase(s2.charAt(j)); } s3+="]"; Pattern p = Pattern.compile(s3); Matcher m = p.matcher(s1); int[] a = new int[10]; int k=0; while(m.find()) { a[k]=m.start(); k++; } int j=0; for(int i=0;i<k;i++) { System.out.print(s1.substring(j, a[i])); j=a[i]+1; } System.out.print(s1.substring(j)); 

This works seamlessly, but my predicament is that these folks don't want you to just get a bloody veracious output, they want a highly optimized output i.e. spatial and temporal requirements as to your program should to be as minimum as can be. And mine is nowhere near the epitome of the optimal solution

If anybody feels that they can alter my code or come up with something better that'd be optimal, I'd be much obliged!

\$\endgroup\$
2
  • \$\begingroup\$It breaks when the input contains a closing bracket or a backslash. Also when it starts with a caret. I may have missed some...\$\endgroup\$CommentedSep 15, 2014 at 18:18
  • \$\begingroup\$I've never heard of "case exclusive". From the example, I infer that you mean case insensitive. Unless I'm not aware of "in/exclusive" being idiomatic; I'd suggest using the more common "in/sensitive".\$\endgroup\$
    – Flater
    CommentedApr 16, 2018 at 11:10

7 Answers 7

5
\$\begingroup\$

I'd use an object to hold the compiled version of the process for maximum efficiency. This would allow user to create a stripping tool and re-use it for maximum throughput.

Also you have the opportunity to create Strippers. :)

public class Test { /** * Use a Stripper to encode the stripping process. */ private static class Stripper { /** * Boolean lookup for maximimum performance. * * Could use BitSet to save a little space. * * Length 256 - assumes ASCII. */ private final boolean[] strip = new boolean[256]; // Compiles the strip string. public Stripper(CharSequence strip) { for (int i = 0; i < strip.length(); i++) { char ch = strip.charAt(i); // Both lowercase and uppercase. this.strip[Character.toLowerCase(ch)] = true; this.strip[Character.toUpperCase(ch)] = true; } } /** * Strips all characters from s that were in the `strip` parameter of * the constructor. * * Note the use of `CharSequence` to avoid premature `toString`. * * @param s - The source of the data to strip. * @return The stripped version of s */ public CharSequence strip(CharSequence s) { StringBuilder stripped = new StringBuilder(); for (int i = 0; i < s.length(); i++) { char ch = s.charAt(i); if (!strip[ch]) { stripped.append(ch); } } return stripped; } } public static final String strip(String from, String strip) { return new Stripper(strip).strip(from).toString(); } public void test() { System.out.println("Stripped: " + strip("John Lennon", "Paul ON")); } public static void main(String args[]) { try { new Test().test(); } catch (Throwable ex) { ex.printStackTrace(System.err); } } } 

Correctly prints:

Jhe

This algorithm should have an \$O(n)\$ construction complexity and an \$O(n)\$ processing complexity.

Please forgive the I'd do it a completely different way - this answer was posted before it was moved to Code Review.

Added

It seems a bit of narrative would be useful.

Obviously the desired process comes in two parts, first the setup and second the stripping of characters from the original string. This code performs the setup during the construction of the object by building an array of 256 booleans, one for each possible 8-bit character code. Each boolean holds a true value if the character at this position should be stripped from the string.

This ensures that the absolute minimum needs to be done at process time. All that is required is to take one pass through the string, looking up each character in the boolean array. If we find true then the character is discarded, a false means keep it.

\$\endgroup\$
7
  • \$\begingroup\$While I exquisitely appreciate you trying to help me with my predicament, I simply lack the insight to understand Strippers as of now. It's a new concept, I'll have to run some research to develop a keen understanding of it. Though i could perceive what you are doing via your program, it's rather brilliant, i couldn't run it on netbeans. My first guess was: this is a Local Class but i was wrong. I'm only used to running classes which have a main method embedded. How to get this baby to run?\$\endgroup\$CommentedSep 15, 2014 at 19:39
  • \$\begingroup\$@RiteshBhakre - I've added some more narrative and made the whole object into a single test class. Trace it all through and I hope you understand how it is working.\$\endgroup\$CommentedSep 15, 2014 at 20:32
  • \$\begingroup\$You say O(n) but in reality there's two variables, say n and m. It has n construction and m processing.\$\endgroup\$
    – corsiKa
    CommentedSep 15, 2014 at 21:52
  • 1
    \$\begingroup\$@corsiKa - If you look at my bio you will see that I have vowed never to get into any discussion about big-oh.\$\endgroup\$CommentedSep 15, 2014 at 22:05
  • 2
    \$\begingroup\$That's a very interesting perspective. I've asked the experts at cs.stackexchange to weigh in. I personally disagree with that perspective, so let's see what they say. Feel free to follow along: cs.stackexchange.com/q/30036/15285\$\endgroup\$
    – corsiKa
    CommentedSep 15, 2014 at 23:08
3
\$\begingroup\$

You could

  • put the chars in the to-be-deleted into a char array and sort them. This would allow you to look up characters via a binary search.
  • Use a StringBuilder to create the string of characters to keep.

This would be O(N) for space and O(N * log N) for computation time. You can reduce it to O(N) for time by using a bitset of the characters to remove instead of using a binary search.

\$\endgroup\$
1
  • \$\begingroup\$It would be interesting to compare the sorted array approach to a HashSet. `\$\endgroup\$CommentedApr 16, 2018 at 6:53
2
\$\begingroup\$

First of all:

Given 2 strings, for instance:

I would interpret this as the 2 strings are really given:

  • No need to prompt the user to enter them
  • No need to read them from anywhere

They are just given to my method and I just have to implement the task and return a string.

I don't know if it counts as cheating to use regular expressions. If that's ok then this essentially one-line solution works for the example strings that were given:

public class RemoveCharsTest { private String strip(String first, String second) { return Pattern.compile("[" + second + "]", Pattern.CASE_INSENSITIVE).matcher(first).replaceAll(""); } @Test public void testExample() { assertEquals("Jhe", strip("John Lennon", "Paul ON")); } } 
\$\endgroup\$
    1
    \$\begingroup\$

    You this code

    String s1,s2,s3; s1=br.readLine(); s2=br.readLine(); s3="["; for(int j=0;j<s2.length();j++) { s3+=Character.toLowerCase(s2.charAt(j)); s3+=Character.toUpperCase(s2.charAt(j)); } s3+="]"; 

    Could be optimized like + and concat are not the same under the hood https://stackoverflow.com/questions/47605/string-concatenation-concat-vs-operator

    String s1,s2,s3; s1=br.readLine(); s2=br.readLine(); s3="["; s3.concat(s2.toLowerCase()); s3.concat(s2.toUpperCase()); s3+="]"; 
    \$\endgroup\$
      1
      \$\begingroup\$

      One thing to note is that you are currently accepting user input without verifying it. BufferedReader.readLine() could potentially return null or throw and IOException. Since you're doing the same thing twice in a row, I would put this sort of thing in a separate function(see below). Also, I've put the final keyword into my code below, but I'll explain that later.

      private String getStringFromUser(BufferedReader in) { final StringBuilder result = new StringBuilder(""); try { final String s = in.readLine(); if (null != s) // if you append null, your result will be "null" { result.append(s); } } catch (IOException e) { e.printStackTrace(); } return result.toString(); } 

      So now your s1 and s2 initializations look like this:

      final String s1 = getStringFromUser(br); final String s2 = getStringFromuser(br); 

      Most importantly, this ensures that s1 and s2 aren't null, so you don't have to worry about NullPointerExceptions later on. Note, it's OK to define s1 and s2 as String here, because we don't intend to change their value.

      Then, there's s3. I would make that a StringBuilder, like so:

      final StringBuilder s3 = new StringBuilder("["); 

      I don't want to rewrite the entirety of your code, because I think this would be good practice, but I think the first step would be getting a working version with the changes shown above(specifically, the StringBuilder).

      On final: final means that the variable will not be reassigned later. That is, it will always be the same object that you create at its initialization. This helps prevent bugs later on. In the given example, it's all fairly straight forward, but in a larger/more complex algorithm you might accidentally type s1 = ..., which would drastically change how things are processed from that point forward. By declaring s1 as final, you'll receive compiler errors if you ever try to assign to it later on. It also makes sense for user inputs. By ensuring the user inputs don't change, you can always go back and reference them.

      Others have critiqued time complexity, so the only note I have is that your int array has a maximum size of 10. For large input strings, this would quickly be filled. It works for your small example, but if you tried putting the text of a book in s1 and a sentence for s2, your output could be hundreds or thousands of letters.

      The main points I wanted to make were to:

      • Check user input

      • Use a StringBuilder to avoid creating/destroying Strings all the time.

      • Introduce the final keyword. It's a good contract that says "Don't change this. Or else!"

      \$\endgroup\$
        0
        \$\begingroup\$

        Here is my idea. I like to keep things simple and short:

        • Add every character of 2nd string into a HashSet (to only get unique values). We may want to handle the case where we have different capitalizations (such as O and o).
        • For each character c in that HashSet, call str1 = str1.replaceAll(c, "");.
        \$\endgroup\$
        1
        • \$\begingroup\$+1 for HashSet, -1 for repeatedly calling replaceAll which is terrible for performance. Just iterate over the inputSting once, check if the character is in the set and adding it to the outputStringBuilder if it is not.\$\endgroup\$CommentedApr 16, 2018 at 6:53
        0
        \$\begingroup\$

        I've made some alterations in my code, used StringBuilder class as was proposed. This is how it looks now:

        BufferedReader br = new BufferedReader(new InputStreamReader(System.in)); System.out.printf("Enter 2 strings%n"); String s2,s3; StringBuilder sob = new StringBuilder(br.readLine()); s2=br.readLine(); s3="["; s3+=s2.toLowerCase()+s2.toUpperCase()+"]"; Pattern p = Pattern.compile(s3); Matcher m = p.matcher(sob.toString()); while(m.find()) { sob.replace(m.start(), m.end(),"0"); } for(int i=0;i<sob.length();i++) { if(sob.charAt(i)!='0') System.out.print(sob.charAt(i)); } 

        I tried commenting this on one of the answers posted above but apparently it's a tad long by 115 characters, eh :P I'd be ecstatically thankful if the sagacious savants were to review this and let me know what they think, more optimization if possible let me know =)

        \$\endgroup\$

          Start asking to get answers

          Find the answer to your question by asking.

          Ask question

          Explore related questions

          See similar questions with these tags.