3
\$\begingroup\$

This code is an arithmetic parser as is the code in a previous question of mine. However this parser handles floating point arguments and mathematical functions, and handles them without needing to define a new function for every precedence level. The component to parse function names is the topic of a different previous question of mine.

The code uses a recursive approach to parse strings of mathematical expressions. u is short for unary, which handles individual terms, such as individual numbers, functions, or parenthesized expressions. b is short for binary, which handles the binary operators. Associativity and precedence is respected. The first term is evaluated. Then, any operators operating on it are checked. If they are found, the right operand of the operator is recursively evaluated by passing the precedence operator to the function. The function only evaluates until an operator of precedence lower or equal (for right-to-left associativity) or lower (for left-to-right associativity) precedence is encountered. As such, the recursion ends, and the previous recursion level handles the rest. The workings are complicated but I am trying my best to explain them.

As always, if my code is needlessly non-portable, or could use general improvement advice, corrections or comments would be appreciated.

#include <stdio.h> #include <stdlib.h> #include <string.h> #include <ctype.h> #include <math.h> static size_t l; static int c(const void *const restrict a, const void *const restrict b) { const unsigned char *const sa = *(const unsigned char *const *)a, *const sb = *(const unsigned char *const *)b; const int cmp = memcmp(sa, sb, l = strlen((const char *)sb)); return cmp ? cmp : isalpha(sa[l]) ? sa[l]-sb[l] : 0; } static double (*func(const char **const str))(double) { static const char *const s[] = {"abs", "acos", "acosh", "asin", "asinh", "atan", "atanh", "cbrt", "ceil", "cos", "cosh", "erf", "erfc", "exp", "expb", "floor", "gamma", "lgamma", "lb", "ln", "log", "round", "sin", "sinh", "sqrt", "tan", "tanh", "trunc"}; static double (*const f[])(double) = {fabs, acos, acosh, asin, asinh, atan, atanh, cbrt, ceil, cos, cosh, erf, erfc, exp, exp2, floor, tgamma, lgamma, log2, log, log10, round, sin, sinh, sqrt, tan, tanh, trunc}; const char *const *const r = bsearch(str, s, sizeof(s)/sizeof(*s), sizeof(*s), c); if (r) { *str += l; return f[r-s]; } return NULL; } static double u(const char **); static double b(const char **const str, const unsigned l) { *str += l != 0; // `l` will only be nonzero if recursively called by `b`, in which case there is an operator character to skip over for (double r = u(str);;) { while (isspace(**str)) ++*str; switch (**str) { case '^': if (l <= 3) // Using `<=` instead of `<` gives `^` right-to-left associativity r = pow(r, b(str, 3)); else break; continue; case '*': if (l < 2) r *= b(str, 2); else break; continue; case '/': if (l < 2) r /= b(str, 2); else break; continue; case '%': if (l < 2) r = fmod(r, b(str, 2)); else break; continue; case '+': if (l < 1) r += b(str, 1); else break; continue; case '-': if (l < 1) r -= b(str, 1); else break; continue; } return r; } } static double u(const char **const str) { while (isspace(**str)) ++*str; if (isdigit(**str) || **str == '.') return strtod(*str, (char **)str); if (isalpha(**str)) { double (*f)(double) = func(str); if (f) return f(u(str)); } else switch (*(*str)++) { case '+': return +u(str); case '-': return -u(str); case '(': { register const double r = b(str, 0); if (*(*str)++ == ')') return r; } } return NAN; } #include <errno.h> int main(const int argc, const char *const *argv) { while (*++argv) if (printf("%g\n", b(&(const char *){*argv}, 0)) < 0) return EXIT_FAILURE; return EXIT_SUCCESS; } 
\$\endgroup\$
9
  • 3
    \$\begingroup\$Thank you for explaining that u / b is short for unary / binary; that is invaluable documentation for the current code. But it should be superfluous. Just declare static double unary and be done with it. In a related vein, sometimes l denotes length and sometimes a precedence level. Add two or so characters to at least one of those identifiers already. Consider stack allocating it rather than using static, so it goes out of scope. Don't compilers mostly ignore "register" advice these days? It's too bad we don't have a bunch of pairs that look like ("abs", fabs), ...\$\endgroup\$
    – J_H
    CommentedFeb 28, 2023 at 22:04
  • 4
    \$\begingroup\$int main(const int argc, const char *const *argv) { while (*++argv) ====> I suggest looking at your last review.\$\endgroup\$CommentedMar 1, 2023 at 0:48
  • 1
    \$\begingroup\$@chux-ReinstateMonica The only side effect of that is . is parsed as 0. c() is used as the comparison function for bsearch().\$\endgroup\$
    – CPlus
    CommentedMar 1, 2023 at 1:05
  • 4
    \$\begingroup\$You have linked previous reviews, but don't seem to have learnt from them. Why should anyone volunteer to review your code if you ignore most suggestions?\$\endgroup\$CommentedMar 1, 2023 at 16:26
  • 1
    \$\begingroup\$...more specifically, your code invokes undefined behaviour (would likely result in a segmentation violation signal) because you didn't check if argv was valid before dereferencing it. It could easily be rendered NULL. And what do you suppose the register keyword does?\$\endgroup\$CommentedMar 1, 2023 at 17:02

3 Answers 3

5
\$\begingroup\$

Avoid global variables for this task

The key problem, in this case, is the added difficultly in proving code correctness.

Alternative:

In func(), instead of *str += l;, use *str += strlen(r);

Note: use of l as a global and l as a local variable decreases code clarity.

I'd re-write it further:

#define SL(s) (s), sizeof (s) - 1 struct { const char *name; unsigned len; double (*const func)(double); } pairs[] = { { SL("abs"), abs}, { SL("acosh"), acosh}, ... }; #undef SL ... and then `bsearch()` on `pairs` and avoid all `strlen()` calls. 

Formatting

First, use an auto-formatter.

Target your audience. On this site, the width of your code is about 310% the width of the window. Adjust your auto formatting to stay within 120%, or even 100%.

Plan for the future

Move the parsing code to another file, say dparse.c. Form dparse.h and main.c.

That way your good work can now readily get used on larger applications.

Detect no conversion

Do not assume a convertible string must begin with a digit/'.'. Other possibilities include , (other locale), i, I, n, N, ....
strtod() also has "In other than the "C" locale, additional locale-specific subject sequence forms may be accepted."

The only side effect of that is . is parsed as 0 is not wholly correct.

Better to let strtod() determine what is convertible.

 // if (isdigit(**str) || **str == '.') // return strtod(*str, (char **)str); char *endptr; double val = strtod(*str, &endptr); if (*str == endptr) { // Handle no conversion } else { *str = endptr; return val; } 

Code then likely needs to test for +, - first.

\$\endgroup\$
10
  • \$\begingroup\$'In func(), instead of *str += l;, use *str += strlen(r);.' I would rather not recalculate the strlen unnecessarily. A quick and dirty fix was to use a global variable. A new solution was to manually add the length to the string pointer from within the c function upon returning 0. 'Better to let strtod() determine what is convertible.' The digit() and == '.' was another quick and dirty fix. Thanks for the alternative suggestion that I had not yet thought of.\$\endgroup\$
    – CPlus
    CommentedMar 1, 2023 at 21:21
  • \$\begingroup\$But your new solution that avoids all strlen calls might be even better.\$\endgroup\$
    – CPlus
    CommentedMar 1, 2023 at 21:31
  • 1
    \$\begingroup\$I dislike the macro that expands s twice. But it's used in a very small scope, and with an #undef immediately following, I would grudgingly pass that.\$\endgroup\$CommentedMar 2, 2023 at 11:18
  • 1
    \$\begingroup\$@user16217248, Macros that expand their arguments more than once are a serious bug risk, because when arguments with side-effects get substituted, those side-effects are repeated despite appearing just once in the unexpanded source (standard example: MACRO(++i)). That's why I dislike them, and strive to avoid them whenever possible. In this particular case, the benefit is probably worthwhile, but adding #undef SL as soon as reasonably possible would be a help to reviewers and future maintainers, as that keeps the danger clearly isolated.\$\endgroup\$CommentedMar 2, 2023 at 15:35
  • 1
    \$\begingroup\$@TobySpeight Had I taken more time, It think a macro (better named too) to initialize all members would be cleanest: #define PAIRS_INIT(s, f) (s), sizeof (s) - 1, (f) ... { PAIRS_INIT("abs", abs)}, ... #undef PAIRS_INIT.\$\endgroup\$
    – chux
    CommentedMar 2, 2023 at 17:58
7
\$\begingroup\$

General Observations

I'm going to address coding style here, since the code is almost unreadable.

If you are interested in getting your code reviewed you may want to pay attention to what the reviewers say. In an earlier review @chux-ReinstateMonica wrote this:

Non-standard main() signature

Drop the const.

Some compilers may whine. Note: "The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program," C17dr § 5.1.2.2.1 2.

There is really no point in reviewing your code if you aren't going to use our suggestions.

Readability

The code is very hard to follow.

Vertical Spacing

The code needs vertical white space to improve readability. Generally logic blocks should be separated by blank lines, functions should definitely be separated by blank lines.

This was mentioned by @chux-ReinstateMonica in a previous review.

Line Length

Limit the length of any lines so that the reader does not lose track of the line they are on. Old screens were limited to 80 characters of width, it isn't necessary to stick to 80 characters, but it would be best to keep it under 100 characters. Having to do horizontal scrolling while reading code is bad.

Variable and Function Names

The use of single character or 2 character variable names reduces the readability of the code. If u stands for unary use unary because it is clearer and doesn't need a paragraph to explain it.

static size_t l; 

The variable l above is global to the file, a single character global variable is definitely a bad practice because finding it will be very difficult.

Late Include File

Why isn't the include for errno.h up with the other include files at the top of the file? Why is errno.h included at all since it doesn't seem to be used.

\$\endgroup\$
15
  • 5
    \$\begingroup\$And of course, l is one of the worst variable names to use at any time, due to its visual similarity with 1.\$\endgroup\$CommentedMar 1, 2023 at 16:21
  • 1
    \$\begingroup\$@TobySpeight Absolutely!\$\endgroup\$
    – pacmaninbw
    CommentedMar 1, 2023 at 16:58
  • 1
    \$\begingroup\$@TobySpeight No l object names is a good idea, especially around Christmas ;-).\$\endgroup\$
    – chux
    CommentedMar 1, 2023 at 18:03
  • 1
    \$\begingroup\$Code's use of const is overdone here in function definitions and reduces clarity, especially on short functions. It is never needed in function declarations. On long functions, it has some merit, yet it use falls into a style issue. Style issues are best coded against group's coding standard - so I tend to avoid those holy wars. I suspect if you coded all this with and without const parameters and inquired what most folks like, they would choose without. IAC, best to leave main() alone - it is special (read C spec 5.1.2.2.1 Program startup) & no need to be any different here.\$\endgroup\$
    – chux
    CommentedMar 1, 2023 at 22:09
  • 3
    \$\begingroup\$@user16217248 Should you desire to post another iteration, take your time (at least a week, maybe month) and reflect on all the feedback in this and previous posts/answers.\$\endgroup\$
    – chux
    CommentedMar 1, 2023 at 22:15
7
\$\begingroup\$

Don't write clever code:

if (printf("%g\n", b(&(const char *){*argv}, 0)) < 0) 

You know you’re brilliant, but maybe you’d like to understand what you did 2 weeks from now.

Similarly, this expression return cmp ? cmp : isalpha(sa[l]) ? sa[l]-sb[l] : 0; seems to say: Read my code, and tremble.

Yes, you should write concise code, but not to the point where it hurts readability. I'd rather opt for an if/else here.

Don’t put multiple statements on a single line unless you have something to hide:

const unsigned char *const sa = *(const unsigned char *const *)a, *const sb = *(const unsigned char *const *)b; 

Multiple assignments, either. The above snippet is better as:

const unsigned char *const sa = *(const unsigned char *const *)a; const unsigned char *const sb = *(const unsigned char *const *)b; 

Use descriptive function/variable names:

"u is short for unary, which handles individual terms, such as individual numbers, functions, or parenthesized expressions. b is short for binary."

Names must be recognizable and quickly distinguishable. The above comment wouldn't be required if the functions used self-descriptive names. parse_unary() is self-descriptive, u() is not. parse_binary() is self-descriptive, b() is not.

Aside: The follow-up failed to show any significant improvements. I do not see any, or most of, the changes @chux suggested in this answer. People would be more likely to review your code if you'd take their suggestions into consideration.

And I appreciate you for using EXIT_SUCCESS and EXIT_FAILURE instead of magic numbers.

\$\endgroup\$
4
  • \$\begingroup\$The 'clever' code seems the most straightforward way to me to write If printf fails, exit with failure status. Maybe I should have added a comment explaining this line.\$\endgroup\$
    – CPlus
    CommentedMar 1, 2023 at 19:07
  • 3
    \$\begingroup\$As an observer, I agree that if (printf(…) < 0) is natural. However, embedding the call to b() within the printf() is too dense for my taste (especially with that cast which wouldn't be necessary if you used a standard signature for main()).\$\endgroup\$CommentedMar 2, 2023 at 7:58
  • \$\begingroup\$@TobySpeight It's not a cast, it's a compound literal, to create an anonymous copy in case modifying the values of argv is problematic for whatever reason.\$\endgroup\$
    – CPlus
    CommentedMar 4, 2023 at 4:13
  • 1
    \$\begingroup\$"Complexity is more apparent to readers than writers. If you write a piece of code and it seems simple to you, but other people think it is complex, then it is complex. “Obvious” is in the mind of the reader: it’s easier to notice that someone else’s code is nonobvious than to see problems with your own code. If someone reading your code says it’s not obvious, then it’s not obvious, no matter how clear it may seem to you." — A Philosophy of Software Design, John Ousterhout\$\endgroup\$CommentedMar 6, 2023 at 8:57

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.