5
\$\begingroup\$

I implemented a key:value format file parser in C, or something that comes close to it as there is no such thing as dynamically creating structs in C.

Assumptions: we are creating a C program that needs some config options (e.g. the number K of threads in a concurrent server that handles requests using a pool of K worker threads), and we want to have them read from a text file. The file contains a pair key:value on each line. In our program, we know in advance the name of the parameters and their type.

fileparser.h

#ifndef FILE_PARSER_H #define FILE_PARSER_H typedef struct _parser Parser; Parser* parseFile(char* filename, char* delim); void getValueFor(Parser* p, char* key, char* dest, char* defaultVal); int testError(Parser* p); void destroyParser(Parser* p); #endif 

fileparser.c

#include "fileparser.h" #include <string.h> #include <stdio.h> #include <stdlib.h> #include "../scerrhand.h" #define MAX_FILENAME_LEN 100 #define MAX_DELIM_LEN 1 #define MAX_KEY_LEN 50 #define MAX_VAL_LEN 200 struct _pair { char key[MAX_KEY_LEN + 1]; char val[MAX_VAL_LEN + 1]; struct _pair* nextPtr; }; typedef struct _parser { char filename[MAX_FILENAME_LEN + 1]; char delim[MAX_DELIM_LEN + 1]; struct _pair* hPtr; int err; } Parser; static int cmpKeys(char* k1, char* k2) { return strcmp(k1, k2); } static void _push(struct _pair** hPtr, struct _pair* newPair) { newPair->nextPtr = *hPtr; *hPtr = newPair; } static int _parse(Parser* p, FILE* fp) { char buf[BUFSIZ + 1]; struct _pair* kvPair; size_t i = 0; while (i++, fgets(buf, BUFSIZ, fp)) { char* savePtr, * token; kvPair = calloc(sizeof(struct _pair), 1); if (!kvPair) { return -1; // out of memory } // get key token = strtok_r(buf, p->delim, &savePtr); strncpy(kvPair->key, token, strlen(token)); // remove any trailing spaces after the key name kvPair->key[strcspn(kvPair->key, " ")] = '\0'; // get value token = strtok_r(NULL, p->delim, &savePtr); if (!token) { // delim missing: invalid syntax return i; } strncpy(kvPair->val, token, strlen(token)); // remove trailing newline kvPair->val[strcspn(kvPair->val, "\n")] = '\0'; _push(&p->hPtr, kvPair); } return 0; } Parser* parseFile(char* filename, char* delim) { FILE* fp; Parser* p = calloc(sizeof(Parser), 1); if (!p) { return NULL; } strncpy(p->filename, filename, MAX_FILENAME_LEN); strncpy(p->delim, delim, MAX_DELIM_LEN); SYSCALL_OR_DIE_NULL(fp = fopen(filename, "r")); // parse file and capture error code p->err = _parse(p, fp); fclose(fp); return p; } int testError(Parser* p) { return p->err; } void getValueFor(Parser* p, char* key, char* dest, char* defaultVal) { struct _pair* currPtr = p->hPtr, * prevPtr = NULL, ** target = NULL; // address of the node to remove after consuming its value while (currPtr && cmpKeys(key, currPtr->key)) { prevPtr = currPtr; currPtr = currPtr->nextPtr; } // copy value if key is found; otherwise copy default value strncpy(dest, currPtr ? currPtr->val : defaultVal, MAX_VAL_LEN); // pop node out of list if (currPtr) { target = prevPtr ? &prevPtr->nextPtr : &p->hPtr; *target = currPtr->nextPtr; free(currPtr); } } void destroyParser(Parser* p) { struct _pair* tmp; while (p->hPtr) { tmp = p->hPtr; p->hPtr = p->hPtr->nextPtr; free(tmp); } free(p); p = NULL; } 

scerrhand.h

#ifndef SC_ERR_HAND_H #define SC_ERR_HAND_H #include <stdlib.h> // makes a system call and exits if return value is not zero #define SYSCALL_OR_DIE_NZ(s) if(s) { puts("System call failed with nonzero status"); exit(EXIT_FAILURE); } // makes a system call and exits if return value is -1 #define SYSCALL_OR_DIE_NEG_ONE(s) if((s) == -1) { puts("System call failed with status -1"); exit(EXIT_FAILURE); } // makes a system call and exits if return value is NULL #define SYSCALL_OR_DIE_NULL(s) if((s) == NULL) { puts("System call returned NULL"); exit(EXIT_FAILURE); } #endif 

I'd like some feedback on design, implementation, and whatnot.

\$\endgroup\$
2
  • 3
    \$\begingroup\$SYSCALL_OR_DIE seems like a good bumper sticker\$\endgroup\$CommentedApr 23, 2021 at 2:53
  • \$\begingroup\$@Reinderien LOL, did I correctly decipher your comment.\$\endgroup\$
    – pacmaninbw
    CommentedApr 23, 2021 at 13:45

2 Answers 2

3
\$\begingroup\$

Naming

C reserves identifiers beginning with _ followed by any letter for the implementation, so they should not be defined by user programs.

I can't quite remember whether leading _ is permitted in struct tags (that's a different namespace to object and function names), because it's simpler to just avoid using leading _ anywhere, and never have to know the exact details of what's allowed.

Exiting

I'm not a fan of exiting the whole program from within a utility function (particularly if we're creating a library - that can really hamstring the caller). We should include <stdio.h> from scerrhand.h so that the header is self-reliant.

Error messages should go to stderr, not stdout. Given that we're using macros here, we have an opportunity to be more informative about exactly what failed (perhaps only in debug builds), using #s in the substitutions. For example:

#define SYSCALL_OR_DIE_NZ(s) \ if (s) { \ fprintf(stderr, "%s failed with nonzero status", #s); \ exit(EXIT_FAILURE); \ } 

We really want to wrap those in the do { … } while (0) idiom, so that they can be used safely in compound statements (e.g. if (c) SYSCALL_OR_DIE(something); else SYSCALL_OR_DIE(something_else);).

Testing

It's a real shame you haven't shown us the unit-tests for this class. I think there are a few cases that are missing from the tests, as we'll see shortly.

Pointless function

This function seems to have no value:

static int cmpKeys(char* k1, char* k2) { return strcmp(k1, k2); } 

We should change the caller to invoke strcmp() directly.

Parse function

This loop control:

size_t i = 0; while (i++, fgets(buf, BUFSIZ, fp)) { 

has an initialisation, a test and a step, so it reads more naturally as a for loop:

for (size_t i = 1; fgets(buf, BUFSIZ, fp); ++i) { 

I'm not sure why we're using size_t for i but return it as int - it would be better if those two types were made to agree.

As the code stands, we silently split any input lines that are longer than BUFSIZ. I think that should be made more robust, with tests to cover.

Also, why did we create buf with BUFSIZE+1 elements, and never use the last one? (fgets() will write up to BUFSIZE characters, including the terminating null).

 kvPair = calloc(sizeof(struct _pair), 1); 

I don't see why this variable needs the entire function as its scope. We can declare and initialise together, for safer code:

 struct _pair *kvPair = calloc(sizeof *kvPair, 1); 

Do we really need the memory to be zeroed? I think not, as we immediately write valid data there. So use plain malloc() instead.

We have completely missed the point of strncpy() here:

 strncpy(kvPair->key, token, strlen(token)); 

This is just a less efficient way to write strcpy(kvPair->key, token), except that doesn't write the null terminator. The point of the length argument is to tell the function how much it may safely write, but we're claiming that kvPair->key has enough space for the token regardless of its length. In this case, we want

 strncpy(kvPair->key, token, sizeof kvPair->key); kvPair->key[sizeof kvPair->key - 1] = '\0'; 

We could do with improved tests of over-length keys, and consider whether we could do better than truncating them.

strcspn() is overkill when there's only a single character we're looking for - use strchr() instead.

Using strtok_r() to get the value will truncate any value that happens to contain the separator string (again, add tests for this, as well as for values longer than MAX_VAL_LEN).

The early return when value is empty omits free(kvPair), which is a memory leak. Execute the unit tests under Valgrind from time to time.

Value getter

It surprised me that getting a value is a destructive operation. I also don't see why the key and defaultVal arguments need to be modifiable - I expected const char* for those.

Given the suggested use cases, it would be helpful to have versions that obtain numeric values directly (e.g. as int, unsigned and/or double), rather than forcing the user to deal with strings.

File interface

The problems in parseFile() are mostly similar to those in _parse().

Destructor

The scope of tmp can be reduced to within the loop.

The assignment to p at the end is a dead write, as the variable goes out of scope immediately after.

There's really no need to keep reassigning p->hPtr within the loop - we can just use a local:

void destroyParser(Parser *parser) { if (!parser) { return; } struct _pair *p = parser->hPtr; while (p) { struct _pair *tmp = p; p = p->nextPtr; free(tmp); } free(parser); } 
\$\endgroup\$
5
  • \$\begingroup\$Thank you! I didn't know about the underscore thing. As per the wrapping the system calls in my macro(s) inside of a do...while(0) loop, would wrapping them inside of a set of curly brackets have the same effect?\$\endgroup\$CommentedApr 23, 2021 at 20:41
  • \$\begingroup\$The do { … } while (0) idiom is more comfortable for the user of the macro than just braces, because it can be followed by semicolon in contexts where an empty statement wouldn't be allowed..\$\endgroup\$CommentedApr 24, 2021 at 6:25
  • \$\begingroup\$Definitely helpful. A few comments: as far as dealing with strings that are larger than BUFSIZ, how would you deal with that? Do you think I could do a check on the string's length with strlen and throw some kind of error/reallocate a bigger buffer? For the use of calloc: I'd initially used malloc, but I was getting some Conditional jump depends on uninitialized memory from valgrind which apparently only went away when I started using calloc instead.\$\endgroup\$CommentedApr 24, 2021 at 11:11
  • \$\begingroup\$A first step would be to at least report the problem to the caller. You'd need to use the return value from fgets() and see if it read a newline - if not, the line was too long for the buffer, and we should read and discard to the next newline, and return a suitable error code. From there, I'd work up towards having code that can adapt and store longer keys and values.\$\endgroup\$CommentedApr 24, 2021 at 17:05
  • 1
    \$\begingroup\$The problem reported by Valgrind (glad to know you're using the tools well), is because you told strncpy to only write the first strlen() characters of the strings - specifically not copying the final null character. Once you correct that, you'll be Valgrind-clean without calloc().\$\endgroup\$CommentedApr 24, 2021 at 17:07
5
\$\begingroup\$

General Observations

It would be much easier to review this code if there was either a unit test or example code to see how this code should be used.

It also isn't clear why the code deletes the key value pair within the function getValueFor(). The function name does not indicate that it is deleting the pair as well as getting the value for the key.

Error Reporting

There are a few of issues with the SYSCALL_OR_DIE_* macros. The first is that you are hiding the calls to exit(), which can be bad for maintenance.

The second is that the error messages are not very meaningful to the person running the program, for instance I would prefer to see something like Can't open file ... for input rather than System call returned NULL. One I know how to fix as a user, the other I have no clue what it means. If this code were in production and the user was getting these error messages the user would need support to know how to correct what was causing the problem.

The third issue is that the error messages are going to stdout rather than stderr where error messages of this nature should go. If the output of the program is being redirected to a file than the user won't know the program failed until the file is examined.

Maintaining Memory Allocation

A fairly common usage of calloc() is to use what the pointer points to in the sizeof() call because this reduces the amount of editing needed if a change is necessary, for example in the following code from static int _parse(Parser* p, FILE* fp) 2 lines would need to be changed if the type of struct was changed:

 struct _pair* kvPair; // change needed here size_t i = 0; while (i++, fgets(buf, BUFSIZ, fp)) { char* savePtr, * token; kvPair = calloc(sizeof(struct _pair), 1); // change needed here if (!kvPair) { return -1; // out of memory } 

The following code is easier to maintain:

 struct _pair* kvPair; // Only this line changes size_t i = 0; while (i++, fgets(buf, BUFSIZ, fp)) { char* savePtr, * token; kvPair = calloc(sizeof(*kvPair), 1); // No need to change this code if (!kvPair) { return -1; // out of memory } 

Unnecessary Wrapping of Function Calls

It isn't clear why the following function has been added or used since it doesn't change the value returned and doesn't really improve readability:

static int cmpKeys(char* k1, char* k2) { return strcmp(k1, k2); } 

If the function was returning a boolean value rather than an int it might make sense but just returning the value of strcmp() doesn't add any value.

A good compiler may just inline this function, otherwise the code is adding the cost of an additional function call.

\$\endgroup\$
2
  • \$\begingroup\$Thank you. I'll include a unit test soon. @popping the elements from the list upon reading the value: I agree that the function name doesn't make it clear. My rationale was that knowing that the list gets smaller as you consume the values would make the use of a list vs say a hash table more acceptable from a time complexity standpoint. In retrospect, that's pretty silly. @errors: do you think that having the macro accept a string and print that with perror would work better?\$\endgroup\$CommentedApr 23, 2021 at 20:38
  • \$\begingroup\$@wrapping the compare function: I did that in case I want to support compare functions supplied externally, so I wouldn't have to go back into the value that calls it and edit it there. I agree it's pretty useless as it is now\$\endgroup\$CommentedApr 23, 2021 at 20:39

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.