5
\$\begingroup\$

Update: link to corresponding release

The code presented here is tagged as release version v0.1.0 on Github for easy reference.

Introduction

bump is a command-line utility that I use to "bump" up the semantic version numbers present in any text file. It is written entirely in C and doesn't use malloc anywhere. Here's a table describing how the various bump levels work:

| Bump | Release | Result on 3.2.6 | |-------|--------------------|-----------------| | Patch | x.y.z -> x.y.(z+1) | 3.2.7 | | Mnior | x.y.z -> x.(y+1).0 | 3.3.0 | | Major | x.y.z -> (x+1).0.0 | 4.0.0 | 

The CLI can be used in two ways:

  1. Interactive mode - when the user simply types bump and does not provide any command-line arguments. In this case the program will prompt the user for input.
  2. Silent mode - when the user passes in the command line arguments like bump -i version.h and the program verifies the argument, deduces the default value for the other arguments and proceeds silently.

The full source code is available on Github under the MIT License.

Overview of the code

Here's how the algorithm works:

  1. Read the input from the file line-by-line.
  2. Process every line character-by-character.
  3. If we read a digit, start a greedy search for the x.y.z pattern.
  4. Proceed until no characters left to read.

The main logic is contained in bump.h and bump.c.

bump.h

#ifndef BUMP_H #define BUMP_H #include <stdio.h> #include <stdlib.h> typedef struct version_struct { size_t major; size_t minor; size_t patch; } Version; typedef struct line_state_struct { const char *input; char *output; size_t input_index; size_t output_index; size_t limit; } LineState; typedef struct file_state_struct { FILE *input; FILE *output; size_t limit; const char *bump_level; } FileState; char *initialize_version(Version *version, size_t major, size_t minor, size_t patch); char *initialize_line_state(LineState *state, const char *input, char *output, size_t limit); char *initialize_file_state(FileState *state, const char *input_path, const char *output_path, const char *bump_level, size_t limit); char *bump_major(Version *version); char *bump_minor(Version *version); char *bump_patch(Version *version); char *convert_to_string(Version *version, char *output_buffer, size_t *length); char *process_line(LineState *state, const char *bump_level); char *process_file(FileState *state); #endif//BUMP_H 

bump.c

#include <bump/bump.h> #include <bump/fileutil.h> #include <ctype.h> #include <stdbool.h> #include <stdint.h> #include <stdio.h> #include <string.h> char *initialize_version(Version *version, const size_t major, const size_t minor, const size_t patch) { if (!version) { return "Empty pointer received."; } version->major = major; version->minor = minor; version->patch = patch; return NULL; } static char *sanity_check(Version *version) { if (!version) { return "Empty pointer received."; } if (version->major == SIZE_MAX) { return "Major version number is too big."; } if (version->minor == SIZE_MAX) { return "Minor version number is too big."; } if (version->patch == SIZE_MAX) { return "Patch version number is too big."; } return NULL; } char *bump_major(Version *version) { char *message = sanity_check(version); if (message) { return message; } version->major++; version->minor = 0; version->patch = 0; return NULL; } char *bump_minor(Version *version) { char *message = sanity_check(version); if (message) { return message; } version->minor++; version->patch = 0; return NULL; } char *bump_patch(Version *version) { char *message = sanity_check(version); if (message) { return message; } version->patch++; return NULL; } char *convert_to_string(Version *version, char *output_buffer, size_t *length) { if (!version) { return "Empty version pointer value."; } if (!output_buffer) { return "Empty output buffer pointer."; } *length = 0; int count = sprintf(output_buffer, "%zu.%zu.%zu", version->major, version->minor, version->patch); if (count < 0) { return "Error occurred while trying to write to string buffer."; } *length = (size_t) count; return NULL; } char *initialize_line_state(LineState *state, const char *input, char *output, const size_t limit) { if (!state) { return "Empty pointer received for state"; } if (!input) { return "Input buffer is null"; } if (!output) { return "Output buffer is null"; } state->input = input; state->output = output; state->input_index = 0; state->output_index = 0; // The minimum length needed for a version string is 5. // If we have a string length limit smaller than that, we clamp the limit to 0. state->limit = limit < 5 ? 0 : limit; return NULL; } static void keep_going(LineState *state) { if (state->input_index == state->limit) { return; } state->output[state->output_index] = state->input[state->input_index]; state->input_index++; state->output_index++; } static size_t extract_decimal_number(LineState *state) { char c; size_t value = 0; while (state->input_index < state->limit && isdigit(c = state->input[state->input_index])) { value *= 10; value += c - '0'; state->input_index++; } return value; } char *process_line(LineState *state, const char *bump_level) { if (!state) { return "Null value received for state"; } if (state->limit == 0 || state->input_index == state->limit) { return NULL; } while (state->input_index < state->limit) { char c = state->input[state->input_index]; if (isdigit(c)) { // Start a greedy search for the pattern of "x.y.z" where x, y, and z are decimal numbers size_t major = extract_decimal_number(state); c = state->input[state->input_index]; if (c != '.') { // We have a normal number. Dump it to the output and proceed normally. int chars_printed = sprintf(state->output + state->output_index, "%zu", major); if (chars_printed < 1) { return "Error occurred while trying to write to output buffer"; } state->output_index += chars_printed; keep_going(state); continue; } state->input_index++; if (state->input_index == state->limit) { return NULL; } c = state->input[state->input_index]; if (!isdigit(c)) { // We have a x. followed by a non-digit int chars_printed = sprintf(state->output + state->output_index, "%zu.", major); if (chars_printed < 1) { return "Error occurred while trying to write to output buffer"; } state->output_index += chars_printed; keep_going(state); continue; } size_t minor = extract_decimal_number(state); if (state->input_index == state->limit) { return NULL; } c = state->input[state->input_index]; if (c != '.') { // We have an input of the form x.y only. No period follows the y int chars_printed = sprintf(state->output + state->output_index, "%zu.%zu", major, minor); if (chars_printed < 1) { return "Error occurred while trying to write to output buffer"; } state->output_index += chars_printed; keep_going(state); continue; } state->input_index++; c = state->input[state->input_index]; if (!isdigit(c)) { // We have a x.y. followed by a non-digit int chars_printed = sprintf(state->output + state->output_index, "%zu.%zu.", major, minor); if (chars_printed < 1) { return "Error occurred while trying to write to output buffer"; } state->output_index += chars_printed; keep_going(state); continue; } size_t patch = extract_decimal_number(state); c = state->input[state->input_index]; if (c == '.') { // We have x.y.z. which is invalid. int chars_printed = sprintf(state->output + state->output_index, "%zu.%zu.%zu", major, minor, patch); if (chars_printed < 1) { return "Error occurred while trying to write to output buffer"; } state->output_index += chars_printed; keep_going(state); continue; } // We now have all three numbers. Version version = {0}; initialize_version(&version, major, minor, patch); if (strcmp(bump_level, "major") == 0) { bump_major(&version); } else if (strcmp(bump_level, "minor") == 0) { bump_minor(&version); } else if (strcmp(bump_level, "patch") == 0) { bump_patch(&version); } else { return "Invalid bump level"; } size_t version_len; char *error = convert_to_string(&version, state->output + state->output_index, &version_len); state->output_index += version_len; if (error) { return error; } if (state->input_index < state->limit) { strcpy(state->output + state->output_index, state->input + state->input_index); } // We are done so we exit early return NULL; } else { keep_going(state); } } return NULL; } char *initialize_file_state(FileState *state, const char *input_path, const char *output_path, const char *bump_level, const size_t limit) { if (!state) { return "Null pointer received for FileState"; } if (!input_path) { return "Empty file path provided for input"; } if (!output_path) { return "Empty file path provided for output"; } if (!bump_level) { return "Invalid value received for bump level"; } if (!file_is_valid(input_path, "r")) { return "Cannot open input file for reading"; } if (!file_is_valid(output_path, "w")) { return "Cannot open output file for writing"; } state->input = fopen(input_path, "r"); state->output = fopen(output_path, "w"); state->bump_level = bump_level; state->limit = limit; return NULL; } char *process_file(FileState *state) { if (!state) { return "File state is null"; } char input_buffer[state->limit + 1]; char output_buffer[state->limit + 1]; size_t len; bool keep_going = true; while (1) { char *error; error = read_line(state->input, input_buffer, &len, state->limit); if (error) { keep_going = false; } memset(output_buffer, 0, state->limit); LineState line_state = {0}; error = initialize_line_state(&line_state, input_buffer, output_buffer, state->limit); if (error) { return error; } while (line_state.input_index < len) { error = process_line(&line_state, state->bump_level); if (error) { return error; } } if (keep_going) { fprintf(state->output, "%s\n", output_buffer); return NULL; } else { fprintf(state->output, "%s", output_buffer); fclose(state->input); fclose(state->output); return "End of file reached"; } } } 

fileutil.h

#ifndef BUMP_FILEUTIL_H #define BUMP_FILEUTIL_H #include <stdio.h> #include <stdbool.h> bool file_is_valid(const char *input_path, const char *mode); char *read_line(FILE *input, char *buffer, size_t *length, size_t limit); #endif//BUMP_FILEUTIL_H 

fileutil.c

#include <bump/fileutil.h> #include <memory.h> #include <stdbool.h> #include <stdio.h> bool file_is_valid(const char *input_path, const char *mode) { FILE *input_file = fopen(input_path, mode); bool result = input_file != NULL; if (result) { fclose(input_file); } return result; } static char *validate(FILE *input, const char *buffer) { if (!input) { return "Empty pointer for input file."; } if (!buffer) { return "Storage buffer is empty."; } return NULL; } char *read_line(FILE *input, char *buffer, size_t *length, size_t limit) { char *error = validate(input, buffer); if (error) { return error; } int ch; *length = 0; memset(buffer, 0, limit); while (*length < limit && (ch = fgetc(input)) != '\n' && ch != EOF) { buffer[*length] = (char) ch; (*length)++; } buffer[*length] = '\0'; return ch == EOF ? "End of file reached" : NULL; } 

main.c

This file handles all the cases where we need to process the arguments provided or we need to obtain inputs from the user. These are essential for the executable, but not the library, so that's why I've separated the components this way.

#include <bump/bump.h> #include <bump/fileutil.h> #include <bump/version.h> #include <ctype.h> #include <stdbool.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #define MAX_LINE_LENGTH 511 #define INCORRECT_USAGE "Incorrect usage. Type bump --help for more information." #define INTERMEDIATE_FILE "intermediate.temp" /** * Convert the characters stored in the source string to lowercase and store * them in the destination buffer. Additionally, store the number of characters * processed in the len pointer. * * @param destination The buffer to store the lowercase string in. * @param source The source string to convert to lowercase. * @param len The size_t pointer to store the number of characters processed. */ static void convert_to_lowercase_and_store_length(char *destination, const char *source, size_t *len) { *len = 0; for (char *p = (char *) source; *p; p++, (*len)++) { destination[*len] = (char) tolower(*p); } } static void store_file_name_in(const char *prompt, char *file_name_buffer, bool for_input) { bool we_have_file = false; while (true) { printf("%s", prompt); char buffer[MAX_LINE_LENGTH + 1]; size_t len = 0; char *error = read_line(stdin, buffer, &len, MAX_LINE_LENGTH); if (error) { printf("Could not read line. The following error occurred: %s\nTry again.\n", error); continue; } we_have_file = file_is_valid(buffer, for_input ? "r" : "w"); if (we_have_file) { memcpy(file_name_buffer, buffer, len + 1); break; } printf("Could not location file with path: \"%s\"\nTry again.\n", buffer); } } static void store_bump_level_in(char *bump_level_buffer) { bool we_have_bump = false; while (true) { printf("Enter bump level (major/minor/[patch] or M/m/[p]): "); char buffer[MAX_LINE_LENGTH + 1]; size_t len = 0; char *error = read_line(stdin, buffer, &len, MAX_LINE_LENGTH); if (error) { printf("Could not read line. The following error occurred: %s\nTry again.\n", error); continue; } if (len == 0) { strcpy(bump_level_buffer, "patch"); len = strlen("patch"); we_have_bump = true; } else if (len == 1) { bool major = buffer[0] == 'M'; bool minor = buffer[0] == 'm'; bool patch = buffer[0] == 'p'; if (major || minor || patch) { we_have_bump = true; } if (major) strcpy(bump_level_buffer, "major"); if (minor) strcpy(bump_level_buffer, "minor"); if (patch) strcpy(bump_level_buffer, "patch"); } else { if (strcmp(buffer, "major") == 0 || strcmp(buffer, "minor") == 0 || strcmp(buffer, "patch") == 0) { we_have_bump = true; } memcpy(bump_level_buffer, buffer, len + 1); } if (we_have_bump) { break; } printf("Could not identify bump level; use major/minor/patch or M/m/p.\nTry again.\n"); } } static bool read_confirmation() { while (true) { printf("Modify the input file? ([yes]/no or [y]/n): "); char buffer[MAX_LINE_LENGTH + 1]; size_t len = 0; char *error = read_line(stdin, buffer, &len, MAX_LINE_LENGTH); if (error) { printf("Could not read line. The following error occurred: %s\nTry again.\n", error); continue; } convert_to_lowercase_and_store_length(buffer, buffer, &len); if (len == 0) { return true; } else if (len == 1) { if (buffer[0] == 'y') { return true; } if (buffer[0] == 'n') { return false; } } else { if (strcmp(buffer, "yes") == 0) { return true; } if (strcmp(buffer, "no") == 0) { return false; } } printf("Could not understand input.\nTry again.\n"); } } static void print_help_message() { const char *help_message = "Bump semantic version value\n" "===========================\n" "Version : " BUMP_VERSION "\n" "Author : Subhomoy Haldar (@hungrybluedev)\n" "Details : A command-line utility that bumps up the semantic version\n" " number of all instances it detects in a file. It can work\n" " in-place and bump the patch, minor, or major version number.\n\n" "Usage :\n" "1. bump\n" " Starts the program in interactive mode.\n" "2. bump [--help|-h]\n" " Displays this help message.\n" "3. bump [--version|-v]\n" " Shows the current version of this program.\n" "4. bump [--input|-i] path/to/file.txt [[--level|-l] [major|minor|patch]]? \\\n" " [[--output|-o] path/to/output_file.txt]?\n" " Performs the processing on the file path provided if it exists.\n\n" " The level switch and value is optional. The values allowed are:\n" " a. patch or p - a.b.c -> a.b.(c + 1)\n" " b. minor or m - a.b.c -> a.(b + 1).0\n" " c. major or M - a.b.c -> (a + 1).0.0\n\n" " The default level is \"patch\".\n\n" " The output switch and value pair is also optional. By default\n" " the result is stored in-place, in the input file."; printf("%s\n", help_message); } static char *process_input_path_value(char *input_file_name, bool *we_have_input_path, const char *file_path) { if (*we_have_input_path) { return "Repeated input file switch."; } if (!file_is_valid(file_path, "r")) { return "The input file path provided is not valid."; } strcpy(input_file_name, file_path); *we_have_input_path = true; return NULL; } static char *process_output_path_value(char *output_file_name, bool *we_have_output_path, const char *file_path) { if (*we_have_output_path) { return "Repeated output file switch."; } if (!file_is_valid(file_path, "w")) { return "The output file path provided is not valid."; } strcpy(output_file_name, file_path); *we_have_output_path = true; return NULL; } static char *process_single_switch(const char *switch_value) { char command[MAX_LINE_LENGTH] = {0}; size_t len; convert_to_lowercase_and_store_length(command, switch_value, &len); if (len == 2) { // We have an abbreviated switch. It must begin with a '-' if (command[0] != '-') { return INCORRECT_USAGE; } switch (command[1]) { case 'h': print_help_message(); return NULL; case 'v': printf("Bump version : %s\n", BUMP_VERSION); return NULL; default: return INCORRECT_USAGE; } } if (len == 6 && strcmp(command, "--help") == 0) { print_help_message(); return NULL; } if (len == 9 && strcmp(command, "--version") == 0) { printf("Bump version : %s\n", BUMP_VERSION); return NULL; } // If we reached here, we must have had no matching switch return INCORRECT_USAGE; } static char *process_bump_value(char *bump_level, bool *we_have_bump_value, const char *bump_level_argument) { char buffer[MAX_LINE_LENGTH + 1]; size_t value_len; convert_to_lowercase_and_store_length(buffer, bump_level_argument, &value_len); if (*we_have_bump_value) { return "Repeated patch level switch."; } if (value_len == 1) { switch (bump_level_argument[0]) { case 'p': // No need to copy, it is already the default. break; case 'm': strcpy(bump_level, "minor"); break; case 'M': strcpy(bump_level, "major"); break; default: return "Unrecognised patch value. Valid levels are major(M), minor(m), or patch(p)."; } } else if (value_len == 5) { if (strcmp(bump_level_argument, "major") == 0) { strcpy(bump_level, "major"); } else if (strcmp(bump_level_argument, "minor") == 0) { strcpy(bump_level, "minor"); } else if (strcmp(bump_level_argument, "patch") == 0) { // No need to copy } else { return "Unrecognised patch value. Valid levels are major(M), minor(m), or patch(p)."; } } *we_have_bump_value = true; return NULL; } int main(int argc, char const *argv[]) { char input_file_name[MAX_LINE_LENGTH + 1] = {0}; char output_file_name[MAX_LINE_LENGTH + 1] = {0}; char bump_level[MAX_LINE_LENGTH + 1] = {0}; char *error; // Process command-line arguments if (argc == 1) { // There are no arguments. We need to obtain the values required through user input. store_file_name_in("Enter file name to process : ", input_file_name, true); store_bump_level_in(bump_level); if (read_confirmation()) { strcpy(output_file_name, input_file_name); } else { store_file_name_in("Enter output file name : ", output_file_name, false); } } else if (argc == 2) { // Help and version commands. // In all cases, the program will never execute code afterwards outside this block. error = process_single_switch(argv[1]); if (error) { puts(error); return EXIT_FAILURE; } else { return EXIT_SUCCESS; } } else if (argc % 2 == 0) { // The pairs are mismatched puts(INCORRECT_USAGE); return EXIT_FAILURE; } else { // The parameters should be present in pairs. // The count must be odd because the name of the executable is the first argument. bool we_have_input_path = false; bool we_have_bump_value = false; bool we_have_output_path = false; strcpy(bump_level, "patch"); // Iterate pair wise. We do not care about the order in which // the pairs appear. We are only interested in the values, and // if they are valid for the switch given. for (size_t index = 1; index < argc; index += 2) { // The first of the pair should be a switch if (argv[index][0] != '-') { puts(INCORRECT_USAGE); return EXIT_FAILURE; } size_t arg_len = strlen(argv[index]); if (arg_len == 2) { // Abbreviated switch switch (argv[index][1]) { case 'i': error = process_input_path_value(input_file_name, &we_have_input_path, argv[index + 1]); if (error) { puts(error); return EXIT_FAILURE; } break; case 'l': error = process_bump_value(bump_level, &we_have_bump_value, argv[index + 1]); if (error) { puts(error); return EXIT_FAILURE; } break; case 'o': error = process_output_path_value(output_file_name, &we_have_output_path, argv[index + 1]); if (error) { puts(error); return EXIT_FAILURE; } break; default: puts(INCORRECT_USAGE); return EXIT_FAILURE; } } else { // Switch is not abbreviated if (strcmp(argv[index], "--input") == 0) { error = process_input_path_value(input_file_name, &we_have_input_path, argv[index + 1]); if (error) { puts(error); return EXIT_FAILURE; } } else if (strcmp(argv[index], "--level") == 0) { error = process_bump_value(bump_level, &we_have_bump_value, argv[index + 1]); if (error) { puts(error); return EXIT_FAILURE; } } else if (strcmp(argv[index], "--output") == 0) { error = process_output_path_value(output_file_name, &we_have_output_path, argv[index + 1]); if (error) { puts(error); return EXIT_FAILURE; } } } } if (!we_have_input_path) { puts("Input file not specified."); return EXIT_FAILURE; } if (!we_have_output_path) { strcpy(output_file_name, input_file_name); } } bool inplace = strcmp(input_file_name, output_file_name) == 0; if (inplace) { // Check if we can create the file for writing if (!file_is_valid(INTERMEDIATE_FILE, "w")) { puts("Could not create temporary file for writing."); return EXIT_FAILURE; } // Mark the output as the temporary. strcpy(output_file_name, INTERMEDIATE_FILE); } FileState state = {0}; initialize_file_state(&state, input_file_name, output_file_name, bump_level, MAX_LINE_LENGTH); while (!process_file(&state)) ; if (inplace) { int c; FILE *temp = fopen(INTERMEDIATE_FILE, "r"); FILE *dump = fopen(input_file_name, "w"); while ((c = fgetc(temp)) != EOF) { fputc(c, dump); } fclose(temp); fclose(dump); remove(INTERMEDIATE_FILE); } return EXIT_SUCCESS; } 

Unit tests

I use µnit for unit testing. These are the tests that I have in place for CI/CD checks on Github

#include <stdio.h> #include <bump/bump.h> #include <bump/version.h> #include <munit.h> /* * * UNIT TEST FUNCTIONS * =================== * * All unit tests are of the form: * * MunitResult <test_name>(const MunitParameter params[], * void *user_data_or_fixture) { * // perform tests. * // use the munit_assert_... macros. * return MUNIT_OK; * } * * It is necessary for the unit test functions to be added to * the `test` array. */ MunitResult bump_patch_0_0_1() { Version version = {0}; munit_assert_null(initialize_version(&version, 0, 0, 1)); bump_patch(&version); munit_assert(version.major == 0); munit_assert(version.minor == 0); munit_assert(version.patch == 2); return MUNIT_OK; } MunitResult bump_patch_0_0_15() { Version version = {0}; munit_assert_null(initialize_version(&version, 0, 0, 15)); bump_patch(&version); munit_assert(version.major == 0); munit_assert(version.minor == 0); munit_assert(version.patch == 16); return MUNIT_OK; } MunitResult bump_minor_0_0_23() { Version version = {0}; munit_assert_null(initialize_version(&version, 0, 0, 23)); bump_minor(&version); munit_assert(version.major == 0); munit_assert(version.minor == 1); munit_assert(version.patch == 0); return MUNIT_OK; } MunitResult bump_major_0_2_8() { Version version = {0}; munit_assert_null(initialize_version(&version, 0, 2, 8)); bump_major(&version); munit_assert(version.major == 1); munit_assert(version.minor == 0); munit_assert(version.patch == 0); return MUNIT_OK; } MunitResult convert_0_0_1() { char line[16]; Version version = {0}; munit_assert_null(initialize_version(&version, 0, 0, 1)); size_t len; munit_assert_null(convert_to_string(&version, line, &len)); munit_assert_string_equal(line, "0.0.1"); munit_assert(len == 5); return MUNIT_OK; } MunitResult convert_4_23_56() { char line[16]; Version version = {0}; munit_assert_null(initialize_version(&version, 4, 23, 56)); size_t len; munit_assert_null(convert_to_string(&version, line, &len)); munit_assert_string_equal(line, "4.23.56"); munit_assert(len == 7); return MUNIT_OK; } MunitResult process_line_0_1_1() { char line[10] = {0}; const char *input_line = "v0.1.1"; LineState state = {0}; initialize_line_state(&state, input_line, line, strlen(input_line)); munit_assert_null(process_line(&state, "patch")); munit_assert_string_equal(line, "v0.1.2"); munit_assert_size(state.input_index, ==, 6); munit_assert_size(state.output_index, ==, 6); return MUNIT_OK; } MunitResult process_line_1_1_51() { char line[40] = {0}; const char *input_line = "2.5 is a number 1.1.51 is the version"; LineState state = {0}; initialize_line_state(&state, input_line, line, strlen(input_line)); munit_assert_null(process_line(&state, "minor")); munit_assert_string_equal(line, "2.5 is a number 1.2.0 is the version"); munit_assert_size(state.input_index, ==, 22); munit_assert_size(state.output_index, ==, 21); return MUNIT_OK; } MunitResult process_two() { char line[50] = {0}; char buffer[50] = {0}; const char *input_line = "First we (12) have 1.6.84, then we have 8.16.3!"; LineState state = {0}; initialize_line_state(&state, input_line, line, strlen(input_line)); munit_assert_null(process_line(&state, "patch")); munit_assert_string_equal(line, "First we (12) have 1.6.85, then we have 8.16.3!"); munit_assert_size(state.input_index, ==, 25); munit_assert_size(state.output_index, ==, 25); strcpy(buffer, line); munit_assert_null(process_line(&state, "major")); munit_assert_string_equal(line, "First we (12) have 1.6.85, then we have 9.0.0!"); munit_assert_size(state.input_index, ==, 46); munit_assert_size(state.output_index, ==, 45); return MUNIT_OK; } MunitResult process_test_cases() { // Things to update: // 1. the `count` variable - stores the number of test cases // 2. the `input_lines` array - stores the input lines // 3. the `expected_lines` array - stores 3 types of outputs for each input line. The order is patch, minor, major const size_t count = 5;// <- update this const size_t max_line_width = 256; char line[max_line_width]; char copy[max_line_width]; const char *bump_levels[] = {"patch", "minor", "major"}; const char *input_lines[] = { "<modelVersion>4.0.0</modelVersion>", "#define VERSION \"0.3.56\"", "2.5 is a number but 1.0.7 is a version", "8.0. is not a version", "Let's put one at the end 9.", }; // ^ // +-- Add new input lines at the end const char *expected_lines[] = { "<modelVersion>4.0.1</modelVersion>", "<modelVersion>4.1.0</modelVersion>", "<modelVersion>5.0.0</modelVersion>", "#define VERSION \"0.3.57\"", "#define VERSION \"0.4.0\"", "#define VERSION \"1.0.0\"", "2.5 is a number but 1.0.8 is a version", "2.5 is a number but 1.1.0 is a version", "2.5 is a number but 2.0.0 is a version", "8.0. is not a version", "8.0. is not a version", "8.0. is not a version", "Let's put one at the end 9.", "Let's put one at the end 9.", "Let's put one at the end 9.", }; // ^ // +-- Add the three variations of the outputs at the end. Remember, the order is // patch, minor, major for (size_t index = 0; index < count; ++index) { strcpy(line, input_lines[index]); for (size_t bump_index = 0; bump_index < 3; ++bump_index) { memset(copy, 0, max_line_width); LineState state = {0}; munit_assert_null(initialize_line_state(&state, line, copy, max_line_width)); while (state.input_index < state.limit) { process_line(&state, bump_levels[bump_index]); } munit_assert_string_equal(expected_lines[index * 3 + bump_index], copy); } } return MUNIT_OK; } /* * MUNIT TEST CONFIGURATION * ======================== * * Boilerplate code for the munit testing library. * The last NULL test item acts as a sentinel. */ MunitTest tests[] = { {"/bump_patch_0_0_1", bump_patch_0_0_1, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/bump_patch_0_0_15", bump_patch_0_0_15, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/bump_minor_0_0_23", bump_minor_0_0_23, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/bump_major_0_2_8", bump_major_0_2_8, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/convert_0_0_1", convert_0_0_1, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/convert_4_23_56", convert_4_23_56, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/process_line_0_1_1", process_line_0_1_1, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/process_line_1_1_51", process_line_1_1_51, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/process_two", process_two, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/process_test_cases", process_test_cases, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {NULL, NULL, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}}; static const MunitSuite suite = {"/bump-test-suite", tests, NULL, 1, MUNIT_SUITE_OPTION_NONE}; /* * MAIN FUNCTION * ============= */ int main(int argc, char *argv[]) { printf("Bump project version: %s\n\n", BUMP_VERSION); return munit_suite_main(&suite, NULL, argc, argv); } 

Specific concerns

  1. The process_line function seems a bit repetitive but that's the best I could think of. Any suggestions to improve this?
  2. Are there any potential security issues? Does anything seem like blasphemy?
  3. What could I do to improve the project in general?

Edit: Agreement with G. Sliepen

My main intention was to make a working prototype and get some eyeballs on this project as soon as possible. This is one of the reasons why I didn't jump the gun and release it as version 1.0.0. It's still at 0.0.8.

I will create a follow-up post addressing (some of) the following issues:

  1. Writing error messages to stderr. This was an oversight on my part.
  2. Checking the error messages returned by my functions. Overconfidence and tiredness. Will fix.
  3. Avoiding TOCTOU. I wasn't aware of this. I will consider it from now on.
  4. Writing to buffers blindly. I will use the size safe variants of the functions.
  5. Check for overflows when reading in values. I will prefer using strtoull and other related functions properly.
  6. About hard-coding file lengths: this I mean to change eventually. I want the interactive input file length to be limited, while the actual files being operated on can have any arbitrary file length.
  7. Processing files in chunks instead of byte-by-byte. This is just a prototype and I will eventually work on improving the performance. I will take care of this as well.
  8. More rigorous error-checking in general. Won't hurt; will help a lot.

There are a few reasons why I do not want to use third-party libraries:

  1. While I agree that using a regular expression library will make my code easier to read, I doubt it will be faster. Also, I'm learning about Automata Theory - this is practice for my future projects.
  2. Finding compatible licenses is difficult. Best to avoid controversy and write everything myself. Less hoops to jump through if something needs fixing.
  3. I know I'm re-inventing the wheel. I've been told this several times. But it's the only way I've ever learned a lot really quickly.
  4. I try to avoid using malloc.
  5. I'm writing this for all the three major OS's: Windows, MacOS and Linux.

I'm open to more suggestions and recommendations. I take all relevant feedback into account.

\$\endgroup\$
1
  • \$\begingroup\$Compiled regular expressions will often beat hand-rolled parsers. It's not that hard to find MIT and BSD licensed libraries, you really should not try to implement everything yourself. While it's great to do that to learn things, it's not great for maintainability of your software projects. And consider the art of finding and using libraries also something worth learning ;) Avoiding unnecessarymalloc() calls is great, but don't make this a religion. And it's great that you try to write cross-platform code!\$\endgroup\$CommentedOct 24, 2020 at 17:18

1 Answer 1

4
\$\begingroup\$

There is quite a lot to remark. In general, I think you spend too much time implementing things by hand, when the functionality is also found in the standard library or other commonly available libraries. Furthermore, your program does not do proper error checking, possibly causing silent data corruption. Here is a list of issues I found:

Typedefs

When creating a struct and typedef'ing, I would ensure the names are the same. There is nothing to gain from making the struct's name snake_case with a _struct at the end, but typedef it to something PascalCase. Just write:

typedef struct Version { ... } Version; 

Don't return error messages

Instead of returning a string to indicate an error, I strongly recommend you return a bool instead to signal success, and either print the error message right where it happens, or possibly store it in a global variable.

Note that you are also casting away the constness of the string literals by having the return values be char *. Consider using the -Wwrite-strings option if you are using Clang or GCC to have it warn about this.

If you want to keep your code short you can still write one-liners for both methods mentioned:

extern const char *errmsg; ... bool some_function(...) { ... if (/* error condition */) return fprintf(stderr, "An error occurred!\n"), false; if (/* some other error */) return errmsg = "Another error occurred!", false; else return true; } 

Write error messages to stderr

Don't write error messages to the standard output, as this can cause undesired behaviour if, for example, the regular output is also written to standard output, or if it is redirected somewhere else. Consider for example the following, slightly contrived example:

bump -i input.txt -l minor -o /dev/stdout >output.txt 

Actually check the return value of your functions

intialize_file_state() returns a value indicating whether an error occured or not, but you ignore it when calling that function from main().

Avoid TOCTOU bugs

You first call file_is_valid() to check if a file can be opened, and then call the real fopen(). Something can happen between the check if the file is valid and the second call to fopen(), so that latter can still fail. The proper way is to not use file_is_valid() at all, but do the error checking on the other calls to fopen().

Never write to buffers you don't know the size of

In convert_to_string(), you are happily writing to output_buffer without knowing its size. Ensure you pass the length of the output buffer as a parameter to the function. You can reuse length for that: have the caller set it to the size of the output buffer. Use snprintf() instead of sprintf() to ensure you don't write past the end of the output buffer. You also need to check the return value for both negative values and values larger than the size of the output buffer, since that will indicate that not every character could be written.

Check for numbers larger than can fit in a size_t

In extract_decimal_numbers(), you will happily read in a number with hundreds of digits, overflowing value in the process. It is better to avoid doing this kind of parsing yourself, and instead use strtoul(), paying close attention to its return value to check if it signals an overflow.

Consider using regular expressions to parse the input

The task of finding a version number string in a line of text is easier if you can use regular expressions. On POSIX systems, you can use regcomp(), regexec() and related functions to do this. Alternatively, you can use an external library, such as PCRE. This will greatly simplify your code, and will likely be much more efficient.

Avoid hardcoding line lengths

Are you absolutely sure no file will have lines longer than 511 bytes? What about 1023 bytes? There is no way to tell. Do you want your program to be responsible for inadvertently cutting of parts of long lines? Either exit with an error message and non-zero exit code if you couldn't handle the input correctly, or ensure your program handles arbitrary line lengths in some way. There are various ways to do the latter:

  • Resize the line buffers when you detect they are not large enough.
  • Keep a small buffer, but after processing the contents of the buffer, read in the next part of the line in the buffer and process that. Of course, you need to handle the case of a version number crossing the boundary of the buffer correctly.
  • Use mmap() to map the whole file into memory. This might not be possible for multi-gigabyte files on a 32-bit platform though.

Use getopt() to parse command-line arguments

On POSIX systems, you can use getopt() and likely also getopt_long() to parse command-line arguments. This is vastly simpler than hand-rolling your own argument parser. If you are compiling for Windows, try to find a library implementing getopt() and use that.

Make process_file() process the whole file

Why does process_file() only process a single line? It already has a while-loop, just don't call return NULL after processing each line. Also, when you have processed the whole file, that is expected behaviour and not an error, so you shouldn't return an error in that case. With that changed, you no longer need to call it from inside a while-loop in main().

Avoid processing files byte by byte

There are several places where you read one byte at a time from a file. That is quite inefficient. Even though the data is cached, the overhead of the function calls will slow down your program unnecessarily. This is especially the case when copying the intermediate file back to the original file. Try to use a large buffer (4 kilobytes is a nice value for various reasons), and use fread() and fwrite() to read a whole buffer at a time.

Also check whether writing the output succeeded

While you have some error checking when reading the input file, there is no error checking done while writing to the output file after you have opened it. Check the return value of fprintf() when writing a line to the output file, and also check the return value of fclose()!

\$\endgroup\$
4
  • \$\begingroup\$Thank you for taking the time to post a detailed answer! I agree with your answer for the most part. I wanted to ask a few questions: Why should the struct name and the typedef name be the same? Why is it bad to return char * (or more appropriately const char *) instead of the bool and global variable combination you mentioned? And besides having decades of field experience, are there any books or whitepapers you recommend for people in academia (like me) to learn the best practices for C programming? (I will update my post to acknowledge the other points you have mentioned shortly.)\$\endgroup\$CommentedOct 24, 2020 at 16:25
  • \$\begingroup\$But why would you make the struct and typedef name different? There is no point in adding possible confusion. The problem with returning an error message is that the error message is just for human consumption. Error codes, booleans and so on are easier for the program to deal with internally. With strings, there is also possible confusion, like what if a function returns "", or "Success" instead of NULL? You'll also note that the standard library and many other libraries never return error strings.\$\endgroup\$CommentedOct 24, 2020 at 17:04
  • 1
    \$\begingroup\$My own decades of field experience, a large part of them in academia as well, have told me that there is no single source of knowledge that will teach you everything. As for books I've only read K&R's C 2nd edition, while a great book it's not exactly the best source for software engineering practices. Maybe someone else can give better advice. But keep programming, look at how other software is written, and send your code for reviews :)\$\endgroup\$CommentedOct 24, 2020 at 17:09
  • \$\begingroup\$I suppose the bool approach does seem more robust. Need to do some aggressive refactoring now. I'll inform you when I write a follow up to this post.\$\endgroup\$CommentedOct 24, 2020 at 18:51

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.