Parse string using character value for Unicode characters

Question

I added the getCValue method to reduce the cyclomatic complexity, but it is still too high. How can I reduce it further? Could this code be rewritten using regular expressions, and if so, how?

    public final class UnicodeEscapeUtil {

        public static String unicodeUnescape(String unicodeEscapedMessage) {
            if (unicodeEscapedMessage == null || "".equals(unicodeEscapedMessage.trim())) {
                return "";
            }
            char[] inputCharArray = unicodeEscapedMessage.trim().toCharArray();
            int lengthOfInput = inputCharArray.length;
            char[] outputCharArray = new char[lengthOfInput];
            int lengthOfOutput = 0;
            int index = 0;
            while (index < lengthOfInput) {
                char c = inputCharArray[index++];
                if (c == '\\') {
                    c = inputCharArray[index++];
                    if (c == 'u') {
                        int value = 0;
                        for (int i = 0; i < 4; i++) {
                            c = inputCharArray[index++];
                            value = getCValue(value, c);
                        }
                        outputCharArray[lengthOfOutput++] = (char) value;
                    } else {
                        if (c == 't') {
                            c = '\t';
                        } else if (c == 'r') {
                            c = '\r';
                        } else if (c == 'n') {
                            c = '\n';
                        } else if (c == 'f') {
                            c = '\f';
                        } else {
                            //log
                        }
                        outputCharArray[lengthOfOutput++] = c;
                    }
                } else {
                    outputCharArray[lengthOfOutput++] = c;
                }
            }
            return new String(outputCharArray, 0, lengthOfOutput);
        }

After extracting the getCValue function, the cyclomatic complexity is reduced: 22 against the 15 allowed.

    private static int getCValue(int value, int c) {
        switch (c) {
            case '0': case '1': case '2': case '3': case '4':
            case '5': case '6': case '7': case '8': case '9':
                value = (value << 4) + c - '0';
                break;
            case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
                value = (value << 4) + 10 + c - 'a';
                break;
            case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
                value = (value << 4) + 10 + c - 'A';
                break;
            default:
                throw new IllegalArgumentException("Malformed \\uxxxx encoding.");
        }
        return value;
    }

Please briefly describe the purpose of the code and what it is supposed to do. — TorbenPutkonen, Commented May 17, 2022 at 5:32
It manipulates and checks string expressions in a method named UnicodeEscapedSourceConverter. — stromboli, Commented May 17, 2022 at 6:05
Welcome to Code Review! I changed the title so that it describes what the code does per site goals: "State what your code does in your title, not your main concerns about it.". Feel free to edit and give it a different title if there is something more appropriate. — Sᴀᴍ Onᴇᴌᴀ, Commented May 17, 2022 at 6:51

TorbenPutkonen · Accepted Answer · 2022-05-19 04:39:37Z

So... since you already did it once, why are you not simply extracting more of the code blocks into their own methods?

    while (index < lengthOfInput) {
        char c = inputCharArray[index++];
        if (c == '\\') {
            readEscapedCharacter(...);
        } else {
            outputCharArray[lengthOfOutput++] = c;
        }
    }

    private static void readEscapedCharacter(...) {
        char c = inputCharArray[index++];
        switch (c) {
            case 'u':
                readUnicodeEscape(...);
                break;
            case 't':
                ...
        }
    }

This does require you to pay more attention to the data structures you are using, but you should be doing that anyway. Use StringReader and StringWriter instead of trying to manage the input and output arrays manually.
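As a rough sketch of what that could look like (not the original poster's code): the class name UnicodeEscapeSketch and the helper names are made up for illustration, null handling and trimming are left to the caller, and Character.digit stands in for getCValue. Throwing on a dangling backslash or a truncated escape is also an assumption; the original would fail with an ArrayIndexOutOfBoundsException there.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;

// Hypothetical class name; a sketch of the "extract more methods" idea.
final class UnicodeEscapeSketch {

    private UnicodeEscapeSketch() {
    }

    // Copies the input to the output, decoding backslash escapes on the way.
    static String unicodeUnescape(String input) {
        StringWriter out = new StringWriter();
        try (Reader in = new StringReader(input)) {
            int c;
            while ((c = in.read()) != -1) {
                if (c == '\\') {
                    out.write(readEscapedCharacter(in));
                } else {
                    out.write(c);
                }
            }
        } catch (IOException e) {
            // A StringReader does not fail in practice, but read() declares it.
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Called after a backslash has been consumed.
    private static char readEscapedCharacter(Reader in) throws IOException {
        int c = in.read();
        switch (c) {
            case 'u': return readUnicodeEscape(in);
            case 't': return '\t';
            case 'r': return '\r';
            case 'n': return '\n';
            case 'f': return '\f';
            case -1:
                // Assumption: treat a trailing backslash as an error.
                throw new IllegalArgumentException("Dangling backslash at end of input.");
            default:
                // Unknown escape: keep the character, as the original does.
                return (char) c;
        }
    }

    // Reads exactly four hex digits and returns the char they encode.
    private static char readUnicodeEscape(Reader in) throws IOException {
        int value = 0;
        for (int i = 0; i < 4; i++) {
            int c = in.read();
            int digit = (c == -1) ? -1 : Character.digit((char) c, 16);
            if (digit < 0) {
                throw new IllegalArgumentException("Malformed \\uxxxx encoding.");
            }
            value = (value << 4) + digit;
        }
        return (char) value;
    }
}
```

Each method now has only a handful of branches, and the reader keeps track of the position, so no index has to be passed around or returned.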

The input sanity check contains a few surprises: it converts a null input into a non-null result. Less surprising options would be to return null for null, or to reject null input using Objects.requireNonNull(unicodeEscapedMessage). If you return null for null, then this may be the odd case in code review where returning an Optional would be the right choice. An Optional would make the next step more convenient for the caller.
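The two options might look roughly like this. The class and method names are hypothetical, and decode is only an identity placeholder standing in for the real unescaping logic.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical names; only the null-handling policy is the point here.
final class NullHandlingSketch {

    private NullHandlingSketch() {
    }

    // Option 1: reject null input immediately with a clear message.
    static String unescapeStrict(String input) {
        Objects.requireNonNull(input, "input must not be null");
        return decode(input);
    }

    // Option 2: map null to an empty Optional instead of returning null.
    static Optional<String> unescapeOptional(String input) {
        return Optional.ofNullable(input).map(NullHandlingSketch::decode);
    }

    // Placeholder for the actual unescaping; returns the input unchanged.
    private static String decode(String input) {
        return input;
    }
}
```

With the Optional variant, a caller that wants the old behavior can write it at the call site, for example `unescapeOptional(raw).map(String::trim).orElse("")`.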

That the method also does input trimming is quite surprising and limits the reusability of the method a lot. If the caller does not want the semantic content of input changed, then they have to write their own. It's better just to leave the trimming to the caller.

Peter Csala · Accepted Answer · 2022-05-20 13:21:52Z

If you can use Java 14+ (in Java 12 and 13 this was only a preview feature), then you can take advantage of the arrow-form switch that came with switch expressions.
It can greatly shorten the getCValue method's implementation:

    private static int getCValue(int value, int c) {
        value = (value << 4) + c;
        switch (c) {
            case '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' -> value -= '0';
            case 'a', 'b', 'c', 'd', 'e', 'f' -> value += 10 - 'a';
            case 'A', 'B', 'C', 'D', 'E', 'F' -> value += 10 - 'A';
            default -> throw new IllegalArgumentException("Malformed \\uxxxx encoding.");
        }
        return value;
    }

I'm not sure what the cyclomatic complexity of this method would be in this case.
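If the goal is fewer branches rather than shorter syntax, one possible alternative (not part of the code above) is to let the standard library's Character.digit do the classification, which leaves a single decision point. The class name HexDigitSketch is made up. One caveat: Character.digit also accepts some non-ASCII digits, such as fullwidth forms, so this is slightly more lenient than the explicit switch.

```java
// Hypothetical class name; same signature as getCValue in the question.
final class HexDigitSketch {

    private HexDigitSketch() {
    }

    // Shifts the accumulated value by one hex digit and adds the digit for c.
    static int getCValue(int value, int c) {
        // Returns 0..15 for a hex digit, or -1 if c is not one.
        int digit = Character.digit(c, 16);
        if (digit < 0) {
            throw new IllegalArgumentException("Malformed \\uxxxx encoding.");
        }
        return (value << 4) + digit;
    }
}
```

This works on any Java version, since it does not rely on the newer switch syntax.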

It makes the code shorter and cleaner but doesn't change the cyclomatic complexity. — TorbenPutkonen, Commented May 23, 2022 at 4:10
