7
\$\begingroup\$

I am a C++ dev trying to solve problems in C to practice it, specifically the LeetCode Merge Strings Alternately problem:

You are given two strings word1 and word2. Merge the strings by adding letters in alternating order, starting with word1. If a string is longer than the other, append the additional letters onto the end of the merged string.

Return the merged string.

Here is my code. I am looking for critiques from experienced folks out there.

char *mergeAlternately(char *word1, char *word2) { int len = strlen(word1) + strlen(word2); bool skip_p1 = false; bool skip_p2 = false; char *result = (char *)malloc(sizeof(char) * (strlen(word1) + strlen(word2) + 1)); char *p1 = word1; char *p2 = word2; for (int i;;) { if (*p1 == '\0') { skip_p1 = 1; } if (*p2 == '\0') { skip_p2 = 1; } if (skip_p1 && skip_p2) { result[i] = '\0'; return result; } else if (skip_p1) { result[i++] = *p2; p2++; } else if (skip_p2) { result[i++] = *p1; p1++; } else { result[i++] = *p1; p1++; result[i++] = *p2; p2++; } } return "SHOULD NEVER REACH HERE"; } int main() { char *word1 = "abc"; char *word2 = "pqr"; char *result = mergeAlternately(word1, word2); printf("Result: %s\n", result); return 0; } 
\$\endgroup\$
4
  • 1
    \$\begingroup\$How well does this code perform on the Leetcode website when submitted? Have you turned up the warning sensitivity when you've tried compiling this code?\$\endgroup\$
    – Fe2O3
    CommentedFeb 25 at 8:25
  • \$\begingroup\$@Fe2O3 speed it beats 100% at 0ms memory it is at 8.22MB Beats 3.43%\$\endgroup\$
    – ijklr
    CommentedFeb 26 at 7:19
  • 1
    \$\begingroup\$"speed it beats..." Okay... Just that the variable i is not initialised, so it's something of a stroke of good luck, I guess, that this even runs... Not a good idea to use uninitialised variables...\$\endgroup\$
    – Fe2O3
    CommentedFeb 26 at 8:25
  • \$\begingroup\$yeah it was bad code.. it got lucky that it was 0. I don't know why it's doing so bad in memory consumption.\$\endgroup\$
    – ijklr
    CommentedFeb 26 at 16:47

5 Answers 5

8
\$\begingroup\$

We are missing includes of headers <stdlib.h> (for malloc()) and <string.h> (for strlen()). Without these, compilation will assume they both return int, which may cause Undefined Behaviour.

Although bool is now part of the core language, we should consider including <stdbool.h> for compatibility with compilers that don't yet support C23.


The function needs a good comment to tell its users that it returns a pointer to memory that they need to free(). Unlike many other languages, memory management in C is very manual, and it's vitally important to know who owns memory resources such as this at any given point.


We don't intend to modify the contents of the strings that are passed as arguments, so we should declare them const char* (or equivalently, char const* - the order isn't significant).


Also, memory allocation with malloc() can fail - a sensible way to deal with that is to return null, and pass the handling on to the caller. Note also that the sizeof operator measures in units of char - so sizeof (char) is tautologically 1.

A better allocation would look like this:

 size_t const len = strlen(word1) + strlen(word2); char *const result = malloc(len + 1); if (!result) { return result; } 

As an aside, note that char *const result means we won't change result to point anywhere else. But we're free to modify the chars that it points to.


The skip_p1 and skip_p2 variables are clear in their purpose. Since they are of bool type, it's clearer to assign true rather than 1:

 if (*p1 == '\0') { skip_p1 = true; } if (*p2 == '\0') { skip_p2 = true; } 

We could replace these by declaring the variables within the for loop:

 for (int i;;) { const bool skip_p1 = (*p1 == '\0'); const bool skip_p2 = (*p2 == '\0'); 

Although this code shouldn't be reachable, it's problematic on two counts:

 return "SHOULD NEVER REACH HERE"; 

Since this string wasn't allocated by malloc(), it's an error for the caller to free() it. And because string literals may be stored in read-only memory, we shouldn't use them to initialise char* - only char const*.

I might include <assert.h> and replace with

 assert(!"unreachable"); free(result); return NULL; 

That said, I'm happy to trust my compiler's warnings and omit that part entirely. If there's a way it can be reached, I'll get a warning that we fail to return a value in that path.


As to the algorithm, it's reasonably clear. One thing we might want to consider is that when one of the strings ends, then we can just strcpy() the other into the output (even if the other string has also ended, as that's a good way to add the terminating null character).


Modified code

Taking all the above into account, we get something that looks very different from the original. But I hope it's educational.

#include <stdlib.h> #include <string.h> /* * Create a new string formed by alternating the letters of s1 and s2. * The returned string comes from malloc(); caller must call free(). * Returns a null pointer if allocation fails. */ char *mergeAlternately(char const *s1, char const *s2) { size_t const len = strlen(s1) + strlen(s2); char *const result = malloc(len + 1); if (!result) { return result; } char *out = result; for (;;) { if (*s1 == '\0') { strcpy(out, s2); return result; } if (*s2 == '\0') { strcpy(out, s1); return result; } *out++ = *s1++; *out++ = *s2++; } } 

It's always helpful to have a test program that illustrates how to use the function correctly:

#include <stdio.h> int main() { char *merged = mergeAlternately("AAA", "bbbbb"); if (!merged) { fprintf(stderr, "Allocation failure\n"); return EXIT_FAILURE; } printf("Merged result: %s\n", merged); free(merged); } 
\$\endgroup\$
7
  • \$\begingroup\$Thanks. btw I think declaring the skip_p1 and skip_p2 inside the for loop wouldn't work. because once set, their values won't retain for the next iterations.\$\endgroup\$
    – ijklr
    CommentedFeb 26 at 7:25
  • \$\begingroup\$Question: do you know the difference between "size_t const" vs "const size_t"?\$\endgroup\$
    – ijklr
    CommentedFeb 26 at 7:38
  • 2
    \$\begingroup\$"btw I think declaring the skip_p1 and skip_p2 inside the for loop wouldn't work. because once set, their values won't retain for the next iterations" <- By the time you get to where one of either variable is true, there are no next iterations, just strcpy and return.\$\endgroup\$CommentedFeb 27 at 12:39
  • 1
    \$\begingroup\$Only for the pedantic: strlen(s1) + strlen(s2) may overflow.\$\endgroup\$
    – chux
    CommentedFeb 27 at 14:15
  • 1
    \$\begingroup\$@chux, a code style that my work colleagues like is to compare using equality for numbers, characters, etc, and to compare using implicit boolean conversion for pointers (and, in C++, objects such as streams that convert to bool). I'm growing into it...\$\endgroup\$CommentedFeb 27 at 17:31
10
\$\begingroup\$

Let us compare it with a straight simplistic solution in your style.

char *mergeAlternately(char *word1, char *word2) { int len = strlen(word1) + strlen(word2); char *result = (char *)malloc(sizeof(char) * (strlen(word1) + strlen(word2) + 1)); char *p1 = word1; char *p2 = word2; char *p = result; while (*p1 || *p2) { if (*p1) { *p = *p1; p++; p1++; } if (*p2) { *p = *p2; p++; p2++; } } *p = '\0'; return result; } 

So due to the missing skip variables*p1 and *p2 are repeated twice.

The if *p1 and if *p2 are separately executed, not in an if-else if-else chain. This safes an else and repetititions of copying *p1 and *p2. Also you were almost forced to terminate the merged string inside the loop.

Now you need no artifact return "..."; which might be unreachable, but a code style checker might see a non-freeable result.

The loop condition is telling "end reached of p1 resp. p2."

I have replaced usage of an index i in result with a pointer p - just as you did for word1 and word2.

Personally I am not a friend of the alternative style:

if (*p1) { *p++ = *p1++; } 

What really could be considered:

When one of the words is very short, then on reaching one end you could just copy the other and exit the loop.

for (;;) { if (!*p1) { strcpy(p, p2); break; } if (!*p2) { strcpy(p, p1); break; } *p = *p1; p++; p1++; *p = *p2; p++; p2++; } return result; 

Merge "a" with "zzz.....zzz" and the second solution will be considerably faster.

It also does away with providing a NUL terminator oneself and if statements.

This also teaches a lesson on premature optimization as the skip... variables are no longer needed. Best style is to declare variables near their first usage: the code becomes better readable and modifiable.

\$\endgroup\$
7
  • \$\begingroup\$You might want to add this C and C++ reference to your library\$\endgroup\$
    – pacmaninbw
    CommentedFeb 25 at 13:50
  • \$\begingroup\$Do people read that C reference by itself like a book, or only read it when looking up something\$\endgroup\$
    – ijklr
    CommentedFeb 26 at 7:23
  • 1
    \$\begingroup\$@ijklr personally I am anyway occupying with programming langiage, but I have encountered some people that indeed read language specifications to grasp the entire language. For some languages like C++ the specification is as fragmentary as is the language. For a language like Algol68 it is a dream, though at that time, 1968, it was top. Still is.\$\endgroup\$CommentedFeb 26 at 7:42
  • \$\begingroup\$Sorry, I meant to add that comment to the question, not your answer. @ijklr The user with the currently accepted answer reads the C++ standard. I use it as a reference.\$\endgroup\$
    – pacmaninbw
    CommentedFeb 26 at 12:46
  • 1
    \$\begingroup\$@pacmaninbw Okay, thanks anyway, and nice answer.\$\endgroup\$CommentedFeb 26 at 12:56
8
\$\begingroup\$

General Observations

The code compiles as long as you add the missing header files. There is one warning message when compiled with the -Wall -Wextra -pedantic flags. The variable len is assigned a value but never used.

There is insufficient testing, only one test where the strings are both of the same length. Testing should be done in a loop, tests should include a test where either string is empty, and where both strings are empty.

This program is leaking memory badly: the allocated string is never freed.

The boolean variables are not needed, so stdbool.h doesn't need to be included.

For optimal performance, use a pointer for the result string rather than using an index.

Test the input strings before using strlen() or malloc(). If either string is empty return the string that has a value; there is no need to malloc() in this case. If both strings are empty there is also no reason to allocate memory.

There is no reason to cast the result of the call to malloc().

Test for Possible Memory Allocation Errors

In modern high-level languages such as C++, memory allocation errors throw an exception that the programmer can catch. This is not the case in the C programming language. While it is rare in modern computers because there is so much memory, memory allocation can fail, especially if the code is working in a limited memory application such as an embedded control system. In the C programming language when memory allocation fails, the functions malloc(), calloc() and realloc() return null. Referencing any memory address through a null pointer results in undefined behavior (UB).

Possible unknown behavior in this case can be a memory page error (in Unix this would be called a Segmentation Violation), corrupted data in the program, and in very old computers it could even cause the computer to reboot (corruption of the stack pointer).

To prevent this undefined behavior a best practice is to always follow the memory allocation statement with a test that the pointer that was returned is not null.

Convention When Using Memory Allocation in C

When using malloc(), calloc() or realloc() in C, a common convention is to use sizeof *PTR rather sizeof (PTR_TYPE); this makes the code easier to maintain and less error-prone, since less editing is required if the type of the pointer changes.

 char *result = malloc(sizeof(*result) * (strlen(word1) + strlen(word2) + 1)); 

Incorrect Type Used for Size

The variables len and i should both be of type size_t. This is the type that strlen() returns and that malloc() expects as input.

\$\endgroup\$
6
  • 1
    \$\begingroup\$“if either string is empty return the string that has a value, there is no need to malloc() in this case” – But then the caller can not know if the returned memory needs to be released or not.\$\endgroup\$
    – Martin R
    CommentedFeb 25 at 16:45
  • 1
    \$\begingroup\$I suggest you add -Wwrite-strings to your usual warning arguments - that detects a problem here.\$\endgroup\$CommentedFeb 25 at 18:13
  • \$\begingroup\$Thanks for the feedbacks! These are all great suggestions\$\endgroup\$
    – ijklr
    CommentedFeb 26 at 7:24
  • \$\begingroup\$Is it typical to have this pattern? type_name * abc = malloc(sizeof(*typename) * N);\$\endgroup\$
    – ijklr
    CommentedFeb 26 at 7:28
  • \$\begingroup\$@ijklr Using sizeof(*variable) means that you can change the type without changing the rest of the statement so that is generally preferred because it is easier to maintain. I learned that here on this website when my code was reviewed.\$\endgroup\$
    – pacmaninbw
    CommentedFeb 26 at 12:53
4
\$\begingroup\$

The previous answers cover the major review points. Here are some minor style suggestions.


When working on programming challenges like this, the focus is on correct functionality and efficiency, and rightly so. However, coding style is usually an afterthought.

Documentation

Add a comment to the top of the code to summarize its purpose. Something like the text in the question is appropriate.

Indentation

Code that only uses 2 spaces per indent level is hard to read. Using 4 spaces per level is much easier to read.

Consistency

In one place you set the boolean variable to the value false:

bool skip_p1 = false; 

then later set it to an integer:

skip_p1 = 1; 

Let's go for a little consistency and use:

skip_p1 = true; 

Naming

The variable name result is a bit generic. Perhaps something like merged_word would be more specific.

\$\endgroup\$
3
  • 1
    \$\begingroup\$I think result is fine. It's shorter than merged_word for a start, and is absolutely clear that it's the return value of the function. Just an opinion, of course - there's no defined right and wrong there!\$\endgroup\$CommentedFeb 25 at 18:16
  • \$\begingroup\$@TobySpeight: Agreed... I have no strong preference either way.\$\endgroup\$
    – toolic
    CommentedFeb 25 at 18:23
  • 1
    \$\begingroup\$Appreciate the coding style suggestions.\$\endgroup\$
    – ijklr
    CommentedFeb 26 at 7:29
4
\$\begingroup\$

Many good review points already.

Yet I thought code should take advantage that it knows the lengths and so can reduce testing in the loop.

Keeping track of the lengths also affords an overflow check and simplified appending of the longer string.

#include <stdint.h> #include <stdlib.h> #include <string.h> char *mergeAlternately_alt(const char *word1, const char *word2) { size_t len1 = strlen(word1); size_t len2 = strlen(word2); if (len1 + 1 > SIZE_MAX - len2) { // Check for 2 _big_ strings return NULL; } char *merged = malloc(len1 + len2 + 1); if (merged == NULL) { // Allocation failure? return NULL; } char *dest = merged; size_t len_min = (len1 < len2) ? len1 : len2; for (size_t i = 0; i < len_min; i++) { *dest++ = *word1++; *dest++ = *word2++; } // Append the rest of the longer word (string). strcpy(dest, (len1 > len2) ? word1 : word2); return merged; } 
\$\endgroup\$
11
  • \$\begingroup\$You could reduce the loop further by ensuring that word1 isn't longer than word2. If it is, then write its first char and swap the words. Then do the loop without i and stop it when word1 ends. (Though I have no idea whether that's faster.)\$\endgroup\$CommentedFeb 27 at 22:27
  • \$\begingroup\$@no comment Interesting. Consider posting your idea as an answer.\$\endgroup\$
    – chux
    CommentedFeb 27 at 22:36
  • \$\begingroup\$I don't think I will. It's a potential optimization, so I wouldn't want to post it as an answer without a benchmark, and I'm not competent enough with C to do that properly.\$\endgroup\$CommentedFeb 27 at 22:44
  • \$\begingroup\$@nocomment Ok, then please explain further "You could reduce the loop further by ensuring that word1 isn't longer than word2" with more detail. Since this answer's loop only iterates for the minimum of len1 and len2, how does also "ensuring that word1 isn't longer than word2" help?\$\endgroup\$
    – chux
    CommentedFeb 27 at 22:51
  • 2
    \$\begingroup\$@nocomment: Yes, I'd certainly hope compilers would auto-vectorize this with x86 punpcklbw or ARM zip or uzp. That's what makes this answer 8x to 16x faster than the others for medium-sized strings that fit in L1d or L2 cache: compilers will only auto-vectorize loops whose trip-count can be calculated ahead of the first iteration. (We need the size ahead of the first iteration so even manually vectorizing we need two passes over the inputs, first for strlen then for copying, unless we over-allocate and shrink, or we rely on a good realloc to use e.g. Linux mremap)\$\endgroup\$CommentedFeb 28 at 1:54

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.