Operator-precedence calculator in C

Question

Recently, I've been wanting to do some larger C projects than I'm used to, since I really like the language. So, as a first step, I decided to implement a nice calculator. My end goal is implementing continued-fraction arithmetic, since I've never really seen that used anywhere before, but currently, it just does int operations - and I thought it'd be nice to know what I can do better before I actually do more.

The interface is pretty simple: You type in expressions to calculate, and it prints the result, like so (user input is indented):

```
    1 + 1
2
    114 - 10 + (_2 ** 3 + 1) * 5
69
    * 42
2898
    :binary
101101010010
    :exit
```

So far, it supports plus, minus, times, divide and remainder (same symbols as in C), powers with **, negation with _, as well as parentheses.
Additionally, you can use the previous result by leaving out a number at the start of a line, and display it in binary or hex using a command starting with :.

```c
#include <stdio.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LINE_LENGTH 256
#define EVAL_STACK_SIZE 256

typedef enum {
    RES_GOOD,
    RES_ERROR,
    RES_EXIT
} EvalRes;

typedef struct {
    enum {
        PLUS = 0, MINUS = 1, MULT = 2, DIV = 3, MOD = 4, EXP = 5, NEG = 6,
        INT = 7, OPEN = 8, CLOSE = 9, END = 10, NONE = 11
    } type;
    union {
        int num;
    };
} Token;

#ifdef DEBUG
char *TOKEN_NAMES[] = {"PLUS", "MINUS", "MULT", "DIV", "MOD", "EXP", "NEG",
                       "INT", "OPEN", "CLOSE", "END", "::"};
#endif

//                    +  -  *  /  %  ** _
int  precedence[]  = {0, 0, 1, 1, 1, 2, 3};
bool right_assoc[] = {0, 0, 0, 0, 0, 1, 0};

bool is_op(Token t) {
    return t.type < INT;
}

char get(char **s) {
    return **s;
}
char next(char **s) {
    return **s ? *(*s)++ : 0;
}
void unget(char **s) {
    (*s)--;
}

bool is_blank(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

// return next token, advance *line until position directly after
Token next_token(char **line) {
    while(is_blank(get(line)))
        next(line);
    Token token;
    char c = next(line);
    if(c == '+')
        token = (Token) {PLUS};
    else if(c == '-')
        token = (Token) {MINUS};
    else if(c == '*') {
        if(get(line) == '*') {
            next(line);
            token = (Token) {EXP};
        }
        else
            token = (Token) {MULT};
    }
    else if(c == '/')
        token = (Token) {DIV};
    else if(c == '%')
        token = (Token) {MOD};
    else if(c == '_')
        token = (Token) {NEG};
    else if(c >= '0' && c <= '9') {
        // just return 1 place in line, then parse from there
        unget(line);
        int num = 0;
        while((c = get(line)) >= '0' && c <= '9') {
            num = 10 * num + (c - '0');
            next(line);
        }
        token = (Token) {INT, {num}};
    }
    else if(c == '(')
        token = (Token) {OPEN};
    else if(c == ')')
        token = (Token) {CLOSE};
    else if(!c)
        return (Token) {END};
    else
        return (Token) {NONE};
    return token;
}

// push `t` onto `*stack`
void push(Token t, Token **stack) {
    *(++*stack) = t;
}
// pop token off `*stack`
Token pull(Token **stack) {
    return *((*stack)--);
}
// get depth'th token off `*stack`; depth=0 is TOS
Token peek(int depth, Token **stack) {
    return (*stack)[-depth];
}

#ifdef DEBUG
void print_stack(Token *stack) {
    int ix = 0;
    while(stack[-ix].type != NONE)
        ix++;
    for(int i = ix; i >= 0; i--)
        printf("%s ", TOKEN_NAMES[stack[-i].type]);
    printf("\n");
}
#endif

// signal value; use as `base_op` in reduce() to reduce all
const int ALL_OP = -111111;

// returns true if reduced successfully, false on error
// this is where the actual computation takes place
bool reduce(Token **stack, int base_op) {
    if(peek(0, stack).type != INT)
        return false;
    int n = pull(stack).num;
    while(is_op(peek(0, stack))) {
        Token op = pull(stack);
        // reduce all with higher precedence
        // and ones with same precedence unless right-associative
        if(base_op != ALL_OP && (
            precedence[op.type] < precedence[base_op] ||
            (op.type == base_op && right_assoc[op.type])
        )) {
            push(op, stack);
            break;
        }
        // evaluate ... _ n
        if(op.type == NEG)
            n = -n;
        // evaluate ... m (op) n
        else {
            Token t = pull(stack);
            if(t.type != INT)
                return false;
            int m = t.num;
            switch(op.type) {
                case PLUS: n = m + n; break;
                case MINUS: n = m - n; break;
                case MULT: n = m * n; break;
                case DIV: n = m / n; break;
                case MOD: n = m % n; break;
                case EXP: {
                    int pow = n;
                    n = 1;
                    while(pow-- > 0)
                        n *= m;
                    break;
                }
                default: return false;
            }
        }
    }
    push((Token) {INT, {n}}, stack);
    return true;
}

EvalRes eval(char *line) {
    // previous answer
    static int ans = 0;
#ifdef DEBUG
    printf("%zu: \"%.*s\"\n", strlen(line), (int)strlen(line), line);
#endif
    // no input -> no output
    if(*line == 0)
        return RES_GOOD;
    // commands
    if(*line == ':') {
#define lineis(s) !strncmp(line, s, sizeof(s))
        if(lineis(":exit") || lineis(":quit"))
            return RES_EXIT;
        if(lineis(":ans"))
            printf("%d\n", ans);
        // print previous answer in hex signedly
        if(lineis(":hex"))
            printf("%s%x\n", ans < 0 ? "-" : "", ans < 0 ? -ans : ans);
        // print previous answer in binary; sadly, no formatting option for that
        if(lineis(":binary")) {
            int n = ans;
            if(n < 0) {
                printf("-");
                n = -n;
            }
            int digits = 1;
            while(n >= 1 << digits)
                digits++;
            while(--digits + 1)
                printf("%d", n >> digits & 1);
            printf("\n");
        }
        return RES_GOOD;
    }
    Token stack_array[EVAL_STACK_SIZE] = {{NONE}, {INT, {ans}}};
    // stack for tokens; the pointer always points to the top element
    Token *stack = stack_array + 1;
#ifdef DEBUG
    print_stack(stack);
#endif
    bool finished = false;
    while(!finished) {
        Token token = next_token(&line);
#ifdef DEBUG
        printf("Token %s\n", TOKEN_NAMES[token.type]);
#endif
        switch(token.type) {
            // +,- reduce +,-,*,/,%,**,_, then push self
            case PLUS: case MINUS:
            // *,/,% reduce *,/,%,**,_, then push self
            case MULT: case DIV: case MOD:
            // ** reduces _, then pushes self
            case EXP:
                if(!reduce(&stack, token.type))
                    return RES_ERROR;
                push(token, &stack);
                break;
            // negation, int, open parens are just pushed
            case NEG: case INT: case OPEN:
                push(token, &stack);
                break;
            // closing paren finished evaluating its sub-expression, pulls open paren off stack
            case CLOSE:
                if(!reduce(&stack, ALL_OP))
                    return RES_ERROR;
                if(peek(1, &stack).type != OPEN)
                    return RES_ERROR;
                stack[-1] = stack[0];
                stack--;
                break;
            case END:
                finished = true;
                break;
            case NONE:
                return RES_ERROR;
        }
#ifdef DEBUG
        print_stack(stack);
#endif
    }
    if(!reduce(&stack, ALL_OP))
        return RES_ERROR;
    // print result, and make it the new previous answer
    int result = pull(&stack).num;
    printf("%d\n", result);
    ans = result;
    return RES_GOOD;
}

int main() {
    char line[MAX_LINE_LENGTH];
    while(true) {
        printf(" ");
        fgets(line, MAX_LINE_LENGTH, stdin);
        // remove all blanks (notably, \n and \r) from the end
        size_t len = strlen(line);
        while(is_blank(line[len - 1]))
            line[--len] = 0;
        EvalRes res = eval(line);
        if(res == RES_EXIT)
            break;
        if(res == RES_ERROR)
            printf("Invalid Input\n");
    }
}
```

Instead of having separate parsing and evaluation stages, lines are tokenized and evaluated on-the-fly using a stack machine.

On start, the stack contains a sentinel - in lieu of tracking the stack height separately to avoid underflow - and the previous result. Then, for every token:

if it's a number, it's pushed on the stack
if it's an operation, then first it tries to evaluate any existing expression on the stack - for example, if the stack contains tokens .. 1 + 2 * 3 and the next token is a +, then it will reduce this to .. 1 + 6, then .. 7, before pushing the + to get .. 7 +. Here, we also achieve operator precedence by not reducing operations with lower precedence than the current one - for example, if the stack contains .. 1 + 2, then a following * will not reduce this to .. 3. All this is implemented in reduce() above.
if it's an open paren, it's just pushed, and acts like the sentinel value at the bottom for any following operations; if it's a closing paren, then it does a reduce, before removing the open paren it was preceded by.

In the end, the result of the evaluation (after one more reduce) is what's on top of the stack. I know that this means something like 1 2 3 4 5 is valid input with result 5, but for now I'm deciding that's a feature :)
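The shift/reduce scheme described above can be seen in isolation in the following stripped-down sketch. It is not the question's code: as simplifying assumptions it handles only non-negative integers with `+` and `*`, and has no parentheses, sentinel, previous answer or bounds checking. The names `Item`, `prec`, `reduce_to` and `eval_simple` are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical, simplified stack item: either a number or an operator. */
typedef struct { bool is_op; int val; char op; } Item;

static int prec(char op) { return op == '*' ? 1 : 0; }

/* Reduce "... m op n" on top of the stack while the operator below the top
   number binds at least as tightly as `min_prec` (left-associative). */
static void reduce_to(Item *st, int *top, int min_prec) {
    while (*top >= 2 && st[*top - 1].is_op && prec(st[*top - 1].op) >= min_prec) {
        int n = st[*top].val, m = st[*top - 2].val;
        char op = st[*top - 1].op;
        *top -= 2;                          /* st[*top] is now the number m */
        st[*top].val = op == '*' ? m * n : m + n;
    }
}

/* Numbers are pushed; an operator first reduces, then is pushed itself. */
int eval_simple(const char *s) {
    Item st[64];                            /* no overflow check (sketch only) */
    int top = -1;
    for (; *s; s++) {
        if (*s >= '0' && *s <= '9') {
            int v = 0;
            while (*s >= '0' && *s <= '9')
                v = 10 * v + (*s++ - '0');
            s--;                            /* the loop's s++ moves past the number */
            st[++top] = (Item){false, v, 0};
        } else if (*s == '+' || *s == '*') {
            reduce_to(st, &top, prec(*s));
            st[++top] = (Item){true, 0, *s};
        }                                   /* anything else (e.g. blanks) is skipped */
    }
    reduce_to(st, &top, 0);                 /* final reduce, as at end of line */
    return st[top].val;
}
```

For `"1+2*3+4"` the stack goes `1 +`, then `1 + 2 *`, then `1 + 2 * 3`; the second `+` reduces that to `7` before being pushed, and the final reduce gives `11`. Like the original, input such as `"1 2"` simply leaves the last number on top.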

G. Sliepen · Accepted Answer · 2024-08-11 17:29:35Z

Organize your data

I see one struct; the rest of your data is stored in several arrays, and pointers to pointers are passed around to access them. Try to organize your data even more. For example, information about the tokens, like name, precedence and associativity, could be grouped in a struct:

```c
typedef struct {
    const char* name;
    int precedence;
    bool right_assoc;
} TokenInfo;

static const TokenInfo token_infos[] = {
    [PLUS] = {"PLUS", 0, false},
    [MINUS] = {"MINUS", 0, false},
    …
};
```

I would also create a struct TokenStack and a struct Line, so we can pass the stack and line via a single pointer. These structs could hold not only the actual data, but also members that indicate where the top of the stack is and what the current position in the line is.
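A minimal sketch of such structs, assuming a fixed-capacity stack and a trimmed-down stand-in for the question's `Token` type. `TokenStack`, `Line`, `TokenType`, `STACK_CAPACITY`, `stack_push`, `stack_pop`, `line_peek` and `line_next` are hypothetical names. Because the stack tracks its own height, push and pop can report overflow and underflow instead of relying on a sentinel.

```c
#include <stdbool.h>
#include <stddef.h>

/* Trimmed-down stand-in for the question's Token type. */
typedef enum { TOK_INT, TOK_PLUS, TOK_NONE } TokenType;
typedef struct { TokenType type; int num; } Token;

#define STACK_CAPACITY 256

/* The stack owns its storage and knows its own height. */
typedef struct {
    Token items[STACK_CAPACITY];
    size_t height;
} TokenStack;

/* The input text plus the current read position. */
typedef struct {
    const char *text;
    size_t pos;
} Line;

bool stack_push(TokenStack *s, Token t) {
    if (s->height == STACK_CAPACITY)
        return false;               /* full: report instead of overflowing */
    s->items[s->height++] = t;
    return true;
}

bool stack_pop(TokenStack *s, Token *out) {
    if (s->height == 0)
        return false;               /* empty: no sentinel needed */
    *out = s->items[--s->height];
    return true;
}

/* Look at the current character without consuming it. */
char line_peek(const Line *l) { return l->text[l->pos]; }

/* Consume the current character; stays put at the terminating '\0'. */
char line_next(Line *l) {
    char c = l->text[l->pos];
    if (c != '\0')
        l->pos++;
    return c;
}
```

Every function now takes a single pointer to the thing it operates on, and the bounds live next to the data they protect.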

Avoid using overly generic names

Avoid using names that are very generic, like get() and next(). In a larger program, this increases the chance of name conflicts.

Make more use of the standard library

There is a lot of functionality in the standard library that you can use instead of reinventing the wheel. For example, instead of is_blank(), use isspace(). Instead of treating a string buffer as a stream, consider passing a FILE* to eval() and having it use getc(), ungetc() and so on. This also works on string buffers if you use the POSIX fmemopen() function, although that is not standard C.
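A sketch of how the number-reading part of next_token() could look when reading from a `FILE*` with `getc()`, `ungetc()` and `isspace()`. `read_number` is a hypothetical helper that only handles non-negative decimal integers. Only one character is ever pushed back at a time, which is all that `ungetc()` guarantees.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>

/* Skips leading whitespace, then reads a non-negative decimal integer.
   Returns false (leaving the offending character unread) if no digit follows. */
bool read_number(FILE *in, int *out) {
    int c;
    while ((c = getc(in)) != EOF && isspace(c))
        ;                              /* getc() yields unsigned char or EOF, safe for isspace() */
    if (c == EOF || !isdigit(c)) {
        if (c != EOF)
            ungetc(c, in);
        return false;
    }
    int n = 0;
    do {
        n = 10 * n + (c - '0');
    } while ((c = getc(in)) != EOF && isdigit(c));
    if (c != EOF)
        ungetc(c, in);                 /* push back the first non-digit for the next token */
    *out = n;
    return true;
}
```

Since fmemopen() is POSIX-only, the standard C `tmpfile()` is one portable (if clumsy) way to turn a string into a stream for testing: after writing `"  42+7"` to it and calling `rewind()`, the first call yields 42 and leaves `+` as the next character.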

Consider implementing a Pratt parser

Your parser works, but adding more operations will increase the complexity. You also have to manually manage a stack. Consider implementing a Pratt parser instead; it will result in cleaner and more easily extensible code. In particular, it handles the difference between prefix and infix operators very well, which brings me to the following:
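For comparison, a compact sketch of a Pratt parser covering the same operators, with `-` serving as both prefix negation and infix subtraction. It evaluates directly on `int`, as the question's code does. `Parser`, `PREFIX_BP`, `skip_spaces`, `infix_bp`, `apply`, `parse_expr` and `calc` are hypothetical names; the binding powers are chosen to mirror the question's precedence table (so `-2 ** 3` is `(-2) ** 3`); the previous-answer feature is left out; and overflow is not checked, as in the original. Division by zero is reported as an error.

```c
#include <ctype.h>
#include <stdbool.h>

/* Higher binding power binds tighter; prefix '-' beats '**' like '_' does. */
#define PREFIX_BP 4

typedef struct {
    const char *s;   /* current position in the input */
    bool error;      /* set on any syntax or arithmetic error */
} Parser;

static void skip_spaces(Parser *p) {
    while (isspace((unsigned char)*p->s))
        p->s++;
}

/* Binding power of the infix operator at `s` (0 if none); also reports its
   length in characters and whether it is right-associative. */
static int infix_bp(const char *s, int *len, bool *right_assoc) {
    *len = (s[0] == '*' && s[1] == '*') ? 2 : 1;
    *right_assoc = (*len == 2);
    if (*len == 2) return 3;                           /* ** */
    if (*s == '*' || *s == '/' || *s == '%') return 2;
    if (*s == '+' || *s == '-') return 1;
    return 0;
}

static int apply(char op, int len, int m, int n, bool *error) {
    if (len == 2) {                       /* "**"; negative exponents give 1 */
        int r = 1;
        while (n-- > 0) r *= m;
        return r;
    }
    if (op == '+') return m + n;
    if (op == '-') return m - n;
    if (op == '*') return m * n;
    if (n == 0) { *error = true; return 0; }    /* '/' or '%' by zero */
    return op == '/' ? m / n : m % n;
}

/* Parses and evaluates an expression, folding in infix operators only while
   they bind tighter than `min_bp`. */
static int parse_expr(Parser *p, int min_bp) {
    int lhs;
    skip_spaces(p);
    /* Prefix position: a number, a parenthesized expression or unary '-'. */
    if (isdigit((unsigned char)*p->s)) {
        lhs = 0;
        while (isdigit((unsigned char)*p->s))
            lhs = 10 * lhs + (*p->s++ - '0');
    } else if (*p->s == '(') {
        p->s++;
        lhs = parse_expr(p, 0);
        skip_spaces(p);
        if (*p->s != ')') { p->error = true; return 0; }
        p->s++;
    } else if (*p->s == '-') {
        p->s++;
        lhs = -parse_expr(p, PREFIX_BP);
    } else {
        p->error = true;
        return 0;
    }
    /* Infix position: a '-' seen here is subtraction. */
    while (!p->error) {
        skip_spaces(p);
        int len;
        bool right;
        int bp = infix_bp(p->s, &len, &right);
        if (bp <= min_bp)            /* also stops at ')', end of input, junk */
            break;
        char op = *p->s;
        p->s += len;
        /* Right-associative operators let an equal-power operator continue. */
        int rhs = parse_expr(p, right ? bp - 1 : bp);
        lhs = apply(op, len, lhs, rhs, &p->error);
    }
    return lhs;
}

/* Evaluates a whole line; false on a syntax error or trailing input. */
bool calc(const char *line, int *result) {
    Parser p = {line, false};
    int value = parse_expr(&p, 0);
    skip_spaces(&p);
    if (p.error || *p.s != '\0')
        return false;
    *result = value;
    return true;
}
```

The prefix/infix distinction falls out of the structure: a `-` seen where an operand is expected goes through the prefix branch, and one seen after an operand goes through the infix loop. With this sketch, `calc("114 - 10 + (-2 ** 3 + 1) * 5", &r)` sets `r` to 69, and adding an operator is mostly a matter of giving it a binding power.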

Negation should be done with `-`

It's very annoying to have to use _ as opposed to - to negate a number. No mainstream programming language needs that. You should be able to modify your code to handle that case, by examining the top of the stack when handling MINUS: if it's empty or the top is not an INT, treat it as a NEG instead.
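A sketch of that rule against the question's own token layout. `classify_minus` is a hypothetical helper, and the anonymous union is flattened to a plain `int` for brevity. In `eval()`, it could be applied right after reading a token, as `token = classify_minus(token, peek(0, &stack));`, before the `switch`.

```c
/* Same token kinds as in the question (union flattened for brevity). */
typedef struct {
    enum { PLUS, MINUS, MULT, DIV, MOD, EXP, NEG, INT, OPEN, CLOSE, END, NONE } type;
    int num;
} Token;

/* A '-' is binary subtraction only if it directly follows a complete operand,
   i.e. the stack top is an INT (a number or a reduced parenthesized group);
   after an operator, an opening paren or the sentinel it becomes a negation. */
Token classify_minus(Token token, Token top) {
    if (token.type == MINUS && top.type != INT)
        token.type = NEG;
    return token;
}
```

Since the question's stack always starts with the previous answer as an `INT`, a `-` at the very start of a line would still be treated as subtraction under this rule.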

Avoid hardcoded buffer sizes

Hardcoding the line length to 256 seems fine for the simple example expressions you are testing, but if this is going to be used in production, there will come a time when someone uses it for an even larger expression. In the best case your current code might crash; in the worst case it will keep running but produce unexpected results, because it is reading and writing out of bounds.

Either avoid hardcoding buffer sizes, and make them resize when necessary, or at the very least check whether you are about to go past the end of a buffer, in which case you should handle this situation gracefully; for example by printing an error message to stderr and exiting with EXIT_FAILURE.
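A sketch of the "resize when necessary" option for the input line: characters are read with `getc()` into a buffer that doubles with `realloc()` as needed, and on allocation failure the function prints to `stderr` and exits with `EXIT_FAILURE`. `read_line` is a hypothetical name; the caller owns and must `free()` the result, and `NULL` means end of input.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads one line of any length from `in`, without the trailing '\n'.
   Returns a malloc'd string, or NULL at end of input. */
char *read_line(FILE *in) {
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL) {
        fprintf(stderr, "out of memory\n");
        exit(EXIT_FAILURE);
    }
    int c;
    while ((c = getc(in)) != EOF && c != '\n') {
        if (len + 1 == cap) {             /* keep room for the terminator */
            cap *= 2;
            char *bigger = realloc(buf, cap);
            if (bigger == NULL) {
                free(buf);
                fprintf(stderr, "out of memory\n");
                exit(EXIT_FAILURE);
            }
            buf = bigger;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) {           /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

Returning `NULL` also gives the caller a clear end-of-input signal, which the `fgets()` call in `main()` currently ignores.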

Note that this is one of the drawbacks of C. If you like the language in general but would like to avoid most of the memory issues, consider using C++ in combination with the standard containers.

I agree that unless you have a specific need for C, C++ will be better. You can even make classes that can natively work with arithmetic operations like negative -. — qwr, Commented Aug 12, 2024 at 14:18
If you use - for both unary negation and binary subtraction, then there's an ambiguity with - at the start of a line: does it mean unary negation, or is it subtracting something from the previous result? — Daniel Schepler, Commented Aug 12, 2024 at 16:52
@DanielSchepler True, although that is of course a choice from pushing the result of the previous expression back on the stack. That makes a lot of sense with an RPN calculator, but for an infix calculator it's much less useful. There are other ways to solve this, for example by having a token represent the result of the previous expression. This would solve the ambiguity and be more flexible as well. Although technically there is no ambiguity; if you want it to work like the current code, it would be treated as infix, if you don't you just wrap a negative number in parentheses. — G. Sliepen, Commented Aug 12, 2024 at 17:48
You make many good points, thank you! Just three things I disagree/"disagree" with: (A) The point about standard library is good, but I'm not on a POSIX system, so I'm reluctant to pass a FILE * instead of char * (B) As someone already mentioned, - is ambiguous when in first position. And doing it the way I did it is not an arbitrary decision - it's how the calculator I'm using at home works. That also has two separate symbols for unary - and binary -, and I decided to use _ as the closest thing to - (inspired by J) (C) I know C++ as well, I just like doing things in C sometimes :) — Cecilia, Commented Aug 13, 2024 at 23:13
