7
\$\begingroup\$

I have written a C function to parse an IP address that should match the regular expression ^\d+.\d+.\d+.\d+ and nothing else. Additionally all bytes are limited to 0-255.

I tried to catch all invalid inputs, but I am unsure if there are any edge cases I missed or if it could be simplified.

The function is intended to be used in an embedded system. Therefore I don't want to use external libraries (or anything in stdio.h) and code size is most important.

I am not really looking for stylistic advice, but rather a functional review.

// Returns the number of characters parsed or 0 if not successful unsigned int StringToIp(uint8_t addr[4], const char* ipSrc, const size_t ipSrcLen) { char* ipPtr = ipSrc; int16_t current = -1; uint8_t addrIdx = 0; for (int i = 0; i < ipSrcLen, addrIdx < 4; i++) { char c = *ipPtr++; if (c == '.') { if (current == -1) return 0; // 2 consecutive dots or dot at beginning addr[addrIdx++] = current; current = -1; } else { uint8_t digit = c - '0'; if (digit >= 10) { // Invalid character if (addrIdx == 3 && current >= 0) { // Invalid character at the end is treated as the end of the address addr[addrIdx++] = current; ipPtr--; break; } } if (current == -1) { // first digit of current byte current = digit; } else { // further digits current = current * 10 + digit; if (current > 255) return 0; // current byte greater than 255 => invalid } } } if (addrIdx == 3 && current >= 0) { // write last byte (necessary if at end if string) addr[addrIdx++] = current; } if (addrIdx == 4) { // valid IP address return ipPtr - ipSrc; } else { // invalid IP address return 0; } } 

I also wrote the inverse function. Here I am quite sure I got all cases, but the divisions are suboptimal, because the microcontroller has no dedicated divider. Which means extra code for the divisions has to be generated and this bloats the code size and reduces the speed (which is somewhat less important to me).

Is there any why to avoid the divisions?

unsigned int IpToString(char ipDst[4*3+3+1], const uint8_t addr[4]) { char* ipPtr = ipDst; for (int i = 0; i < 4; i++) { uint8_t current = addr[i]; uint8_t h = current / 100; current -= h * 100; uint8_t d = current / 10; current -= d * 10; uint8_t o = current; if (h > 0) *ipPtr++ = '0' + h; if (h > 0 || d > 0) *ipPtr++ = '0' + d; *ipPtr++ = '0' + o; if (i < 3) *ipPtr++ = '.'; } *ipPtr = '\0'; return ipPtr - ipBuffer; } 
\$\endgroup\$
12
  • 2
    \$\begingroup\$For your second function, since you know the range will be 0-255, why not use a series of if statements? Compare with 200, 100, and then a binary search of 10's digits.\$\endgroup\$
    – aghast
    CommentedNov 8, 2019 at 15:31
  • \$\begingroup\$In the future, it is best to provide test cases as part of the code for review so that we can properly review the code.\$\endgroup\$
    – pacmaninbw
    CommentedNov 8, 2019 at 16:59
  • \$\begingroup\$Rather than write your own, you could duplicate implementations of inet_* from arpa/inet.h, for example in muslinet_aton.c and inet_ntoa.c. (Keep in mind to respect musl's license.)\$\endgroup\$
    – esote
    CommentedNov 8, 2019 at 18:18
  • \$\begingroup\$How should StringToIp(..., "2.0004.6.8", ...) function? (4 digits in a byte)\$\endgroup\$
    – chux
    CommentedNov 8, 2019 at 22:57
  • \$\begingroup\$@chux-ReinstateMonica At the moment it basically ignores the leading zeros which is fine I think, because it still represents a valid IP.\$\endgroup\$CommentedNov 8, 2019 at 23:06

2 Answers 2

4
\$\begingroup\$

Potential bugs

I just spent a few minutes reviewing this, so I'm not sure these are bugs, but you should have a look at them:

  1. The comma operator is not the same as &&. You have this code:

    for (int i = 0; i < ipSrcLen, addrIdx < 4; i++) { 

    but I think you meant to use && instead of ,. I'm not sure it matters because I'm not sure what the chances are that you will get a mismatch between the string contents and the length, but it's a possible source of error.

  2. Incomplete error checking. In the "invalid digit" section of your loop, you have this check:

    if (addrIdx == 3 && current >= 0) { 

    but you don't do anything if that condition is not true. If addrIdx is 0, 1, or 2 you don't handle the bad digit, but instead fall through. I think you need to catch those cases and fail gracefully.

IpToString

I mentioned this in the comments, but you know the range of values is small. So there's no reason not to replace your divisions with either a series of if statements or a lookup table.

Unless you're writing a router or something, I don't expect the lookup table would pay for itself, so the if statements seem to be the way to go. Something like this:

need_tens_0 = FALSE // hundreds digit if number >= 100: need_tens_0 = TRUE if number >= 200: *ptr++ = '2' number -= 200 else: *ptr++ = '1' number -= 100 // tens digit (binary search) if number >= 50: if number >= 80: if number >= 90: *ptr++ = '9' number -= 90 else: *ptr++ = '8' number -= 80 // ... buncha cases omitted ... else if number >= 10: else: // "6" could be 56 or 106 or 216. Check if we need to insert a padding 0 if need_tens_0: *ptr++ = '0' // Ones digit *ptr++ = '0' + number 
\$\endgroup\$
4
  • \$\begingroup\$I think you are right about the comma operator. The compiler output is quite different if I use && instead and does not change if I remove i < ipSrcLen,. So it seems i < ipSrcLen was being ignored completely. Thank you very much.\$\endgroup\$CommentedNov 8, 2019 at 16:18
  • \$\begingroup\$The second point is not true though, because anything else than addrIdx >= 3 should be treated as an error and return 0, which it does. I ended up checking the hundreds with if and the tens with division, because that yielded the smallest code size. On a different architecture the binary search may be more efficient but not on mine.\$\endgroup\$CommentedNov 8, 2019 at 16:23
  • \$\begingroup\$If you hadn't answered this question, it might have been closed due to Lack of Concrete Context.\$\endgroup\$
    – pacmaninbw
    CommentedNov 8, 2019 at 16:55
  • \$\begingroup\$There is no need to replace a division by an integer constant with "more efficient" code, as the compiler will turn the division into a multiplication anyway, at all reasonable optimization levels.\$\endgroup\$CommentedFeb 17, 2024 at 8:37
3
\$\begingroup\$

Your compiler should be smart enough to replace the division by multiplication. Here's an experiment I did with CLang on x86_64:

#include <inttypes.h> uint8_t div8_100(uint8_t a) { return a / 100; } uint8_t div8_10(uint8_t a) { return a / 10; } 

clang -O3 -Wall -Weverything -S div8.c

div8_100: movzbl %cl, %eax leal (%rax,%rax,4), %ecx leal (%rax,%rcx,8), %eax shrl $12, %eax retq div8_10: movzbl %cl, %eax imull $205, %eax, %eax shrl $11, %eax retq 

This is fairly efficient code, and it is not using any exotic opcodes, therefore I expect it to be available on every architecture.

In summary, as long as you divide by integer literals, there is no reason for the compiler to call any external division function.

\$\endgroup\$
1
  • \$\begingroup\$I did not know that. Very good to know. That may be the reason why I did not get any benefit from replacing / 10 by @Austin Hastings proposed `if`` statements.\$\endgroup\$CommentedNov 15, 2019 at 8:03

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.