The first thing to improve the readability of the code is to define proper data types. Having an unsigned char *
is confusing, both to humans and to the compiler.
#include <stdint.h> typedef struct { uint8_t part[4]; } ip4_address;
Next, I found your code hard to follow because of the forever loop and the many if clauses. You do have a point though because you mention '0' and '9' and '.' only once in the code, which is good.
One thing I don't like is that you write unfinished parsing results to the returned IPv4 address. For example, when parsing 123.456
, the returned IPv4 address would be 123.45.?.?
. It's just the unnecessary memory access that I don't like. I prefer code that keeps the intermediate values in local variables and only writes them back when it is completely calculated.
I find it much simpler to read if the code is grouped by things from the application domain, which in this case is numbers and dots. I would write it like this:
#include <stdbool.h> #include <stdint.h> #include <stdio.h> typedef struct { uint8_t part[4]; } ip4_address; bool is_valid_ip4(const char *str, ip4_address *addr) { if (!str) return false; for (int i = 0; i < 4; i++) { if (!(*str >= '1' && *str <= '9')) return false; unsigned n = *str++ - '0'; if (*str >= '0' && *str <= '9') { n = 10 * n + *str++ - '0'; if (*str >= '0' && *str <= '9') { n = 10 * n + *str++ - '0'; if (n > 255) return false; } } addr->part[i] = n; if (i == 3) return *str == '\0'; if (*str++ != '.') return false; } return false; } int main(int argc, char **argv) { ip4_address addr; if (1 < argc && is_valid_ip4(argv[1], &addr)) { printf("%d.%d.%d.%d\n", addr.part[0], addr.part[1], addr.part[2], addr.part[3]); } else return 1; }
An IPv4 address, in the usual syntax, consists of 4 numbers. This is expressed by the for
loop that contains exactly 4 iterations.
An IPv4 address must start with a number. Therefore the upper part of the for
loop is concerned with parsing and validating this number. The lower part handles the dots.
Organizing the code like this produces fewer jumps when stepping through it, and as a result, the execution position is simple to follow. Inside the for
loop, it just goes straight from the top to the bottom. In your code, on the other hand, it jumps around much more often, which makes the code hard to follow.
Sure, my code repeats the character constants more often and is therefore more susceptible to typos, but finding these is the job of the test suite.
I took care of only defining the minimum necessary variables since keeping track of changing variable values is difficult.
As in your code, I explicitly test against the digits 0 to 9 instead of calling isdigit
, just in case the code runs with an Arabic execution character set that defines some more digits. IPv4 addresses, according to the RFC, are written in ASCII digits.
I didn't write any comments in my code because I think that its structure is so simple and follows common practice that there is nothing surprising to it. By contrast, you wrote many comments, most of which are redundant.
inet_pton
and friends?\$\endgroup\$IPv4Address
struct, or something to that effect, and rename this function to something likeIPv4Address_new()
\$\endgroup\$