9

I'm analyzing a Fortran IV program written for a PDP-10. I was briefly acquainted with the language but had never studied it. Now I have a couple Fortran reference manuals (IV and 77), and I'm reading that spaces (blanks) are (in most cases) meaningless and even unnecessary.

DECsystem-10 FORTRAN IV Programmer's Reference Manual:

Except for alphanumeric data within a FORMAT statement, DATA statement, or literal constant, blanks (spaces) and TABS are ignored and may be used freely for appearance purposes. Thus the following statements are equivalent.

...
END FILE 2
ENDFILE2

And a Fortran 77 handbook says spaces are ignored even inside a variable name: FOO BAR is the same as FOOBAR.

How do old Fortran parsers, given this lexical flexibility, resolve ambiguities when recognizing keywords and identifiers?

In most programming languages, identifiers may not match a keyword. For example, you may not name your variable IF. But this Fortran flexibility could lead to ambiguities even if an identifier merely had a prefix that matches a keyword.

Consider these statements:

 GOTOP(X, Y) = Y - X
 GOBOT(X, Y) = X + Y
 ASSIGN 456 TO P
 GO TO P (123, 456)

The last of those statements is an error. It's either an assigned go to statement missing a comma, or it's an attempt to invoke the arithmetic statement function GOTOP that fails to store the result.

Are there language rules that guarantee the parser would see it one way and not the other? Do the parsers backtrack?

Clarification: A couple of people have pointed out that since neither interpretation forms a valid statement, there's no ambiguity. At the statement level, that's true. My question, however, was intended to ask about the ambiguity inherent in the lexical analysis.

Given a statement that begins with G O T O P, once the scanner reads the second O, does it

  1. decide that the line begins with a GOTO keyword and treat the P as the beginning of the next token?

  2. keep reading until it encounters a delimiter that cannot possibly be part of the same token [maximal munch]?

  3. look ahead to determine whether it's found the end of a token (perhaps after checking prefixes of variable names and other identifiers encountered so far)?

  4. try one interpretation, and, if it doesn't lead to a valid statement, backtrack and try the other?

  5. something else?

4
  • None of your statements is ambiguous. Three only have one interpretation, one is an error. Have you tried to construct an actual ambiguous statement? It's possible, but unlikely to be done by accident.
    – John Doty
    Commentedyesterday
  • 1
    @JohnDoty: I've added a clarification. I wasn't intending to ask about ambiguity at the statement level but at the lexical analysis level. Somehow, the scanner has to decide whether GOTOP is a GOTO keyword followed by an identifier named P or whether it's just an identifier named GOTOP.
    Commented yesterday
  • Fortran predates the concept of lexical analysis. Compilers were more holistic.
    – John Doty
    Commentedyesterday
  • See my answer - the compiler recognizes 'GOTO' after eliminating the only case where it could be a programmer-assigned identifier, the 'arithmetic statement'.
    – dave
    Commentedyesterday

6 Answers

8

There are no ambiguities to resolve. You just ignore spaces when reading source code, except in the special case of a Hollerith constant.
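As a rough illustration of "ignore spaces, except in a Hollerith constant", here is a minimal Python sketch. The function name `strip_blanks` and the rule for spotting an `nH` count are assumptions made for this example, not taken from any real compiler; it also assumes the statement label has already been removed from the text.

```python
def strip_blanks(stmt: str) -> str:
    """Remove blanks and tabs, but copy nH Hollerith text verbatim."""
    out = []
    i = 0
    while i < len(stmt):
        ch = stmt[i]
        if ch in " \t":
            i += 1
            continue
        if ch == "H":
            # Find the run of digits that ends just before this H.
            j = len(out)
            while j > 0 and out[j - 1].isdigit():
                j -= 1
            digits = "".join(out[j:])
            # Assumed heuristic: "5H" is a Hollerith count only if the digits
            # do not continue a name, so the H in DO5H=1,10 is left alone.
            part_of_name = j > 0 and out[j - 1].isalpha()
            if digits and not part_of_name:
                count = int(digits)
                out.append("H")
                # Copy the next `count` characters unchanged, blanks included.
                out.extend(stmt[i + 1 : i + 1 + count])
                i += 1 + count
                continue
        out.append(ch)
        i += 1
    return "".join(out)


print(strip_blanks("END FILE 2"))             # ENDFILE2
print(strip_blanks("GO TO P (123, 456)"))     # GOTOP(123,456)
print(strip_blanks("FORMAT (5HA B C, I 3)"))  # FORMAT(5HA B C,I3)
```

A real compiler would decide what counts as a Hollerith constant from the statement type rather than from a character-level guess like this one.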

Your last example GOTOP(123,456) is not an arithmetic statement. An 'expression' is not any kind of statement. (Chapter 3).

It is also not a valid assigned goto. The compiler, I suppose, could attempt to guess what particular mistake you'd made, but it is still an erroneous construction.

FWIW, I used to write FORTRAN IV programs on 80-column cards that only permitted use of even-numbered columns. Though we did have a supply of cards pre-punched with 'C' in column 1.


According to this specification, FORTRAN II first identified an arithmetic statement by seeing (1) an equal sign not within parentheses, and (2) not followed by a comma not within parentheses [else it would be a DO statement].

If not an arithmetic statement, the 'beginning' of the statement is looked up in a dictionary in order to classify it.

The FORTRAN II code posted in a comment by @texdr.aft shows that the compiler accumulates a 'token' one non-blank character at a time, until it can find a match in its dictionary. It thus recognizes 'GOTO' as a keyword, prior to further analysis.

Per that, and assuming FORTRAN IV was broadly similar, your error case would be 'an incorrect assigned goto' rather than 'an incorrect arithmetic statement'. But it hardly seems to matter what sort of wrong it was, given it was wrong.
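The two-step classification described above can be sketched in Python. This is only a model of the stated rule, not a transcription of the FORTRAN II source: the `KEYWORDS` set is a deliberately tiny, hypothetical dictionary, the name `classify` is made up, and blank removal here ignores Hollerith text.

```python
# Hypothetical, deliberately tiny keyword dictionary for illustration.
KEYWORDS = {"ASSIGN", "CONTINUE", "DO", "FORMAT", "GOTO", "IF", "RETURN", "STOP"}


def classify(stmt: str) -> str:
    s = "".join(stmt.split())  # drop blanks; Hollerith text is not handled here

    # Step 1: look for '=' outside parentheses, not followed by a ','
    # outside parentheses (which would indicate a DO statement).
    depth = 0
    seen_equals = False
    comma_after_equals = False
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif depth == 0 and ch == "=":
            seen_equals = True
        elif depth == 0 and ch == "," and seen_equals:
            comma_after_equals = True
    if seen_equals and not comma_after_equals:
        return "ARITHMETIC"

    # Step 2: accumulate non-blank characters until the prefix is in the dictionary.
    prefix = ""
    for ch in s:
        prefix += ch
        if prefix in KEYWORDS:
            return prefix
    return "UNRECOGNIZED"


print(classify("GOTOP(X, Y) = Y - X"))  # ARITHMETIC
print(classify("GO TO P (123, 456)"))   # GOTO
print(classify("DO 10 I = 1, 100"))     # DO
print(classify("FORMAT(X5H)=(1.)"))     # ARITHMETIC
```

The rule as stated is for FORTRAN II. A FORTRAN IV logical IF with an embedded assignment also has an equal sign outside parentheses, so a FORTRAN IV compiler would need further checks that this sketch omits.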

10
  • 3
    There is the infamous FORMAT(X5H)=(1.), an ambiguous but improbable statement.
    – John Doty
    Commentedyesterday
  • True. Per the FORTRAN II logic, that would be classified as an arithmetic statement.
    – dave
    Commentedyesterday
  • 1
    Here are pointers to the source code for FORTRAN I/II: texdraft.github.io/fortran/fortran-10.html#C0190-definition (subroutine for getting the next nonblank); texdraft.github.io/fortran/fortran-10.html#CB000-definition (part of the logic for statement classification)
    – texdr.aft
    Commentedyesterday
  • @JohnDoty Out of 3 FORTRAN compilers I can try, two treat it as a FORMAT statement if it is labeled, but behave differently when it is not (one prints a warning as for any other unlabeled FORMAT statement, the other ignores the statement silently), and the third compiler compiles the statement as an assignment statement. If it is labeled and a PRINT statement refers to it, garbage in the output and I/O errors result.
    – Leo B.
    Commentedyesterday
  • Why garbage? It was legal in FORTRAN IV to have a FORMAT with only Hollerith and blank conversions, e.g. to print titles. So once you've decided it was a FORMAT and not an arithmetic statement, it should work. (I am not disbelieving you, I just don't get why that bug happens.) What compiler?
    – dave
    Commentedyesterday
4

Early Fortran compilers made many passes. The one I used to know well (IBM 1130) made a couple of dozen passes. The passes were often specialists in particular statement types. An early pass classified statements, so later passes that handled the details would know which statements they should work on.

4
  • Thus it would be necessary to hypothetically decide early on whether the case in point was an arithmetic statement or an assigned goto. But since it's neither, I assume it was easily classifiable as 'not a valid statement form'.
    – dave
    Commentedyesterday
  • It had some difficulty identifying valid statements. RETURN X would crash the machine!
    – John Doty
    Commentedyesterday
  • Seems a bit extreme. Is that crash at compile time or run time?
    – dave
    Commentedyesterday
  • @dave Compile time. The pass that was supposed to swallow the entire RETURN statement would leave the extra token, and that completely derailed the following pass.
    – John Doty
    Commentedyesterday
4

And hence, the spacecraft was lost

Except, not because it was caught in bench testing. But, the first law of journalism is never let the facts get in the way of a good story.

So either I am passing on a favor or seeking an outlet for revenge. The line of FORTRAN code

DO 10 I = 1.100 

apparently did indeed occur. The compiler apparently did what it was supposed to do, ignoring white space and implicitly declaring and then assigning DO10I. Remarkably, the program was actually tested before use. The error was detected, thereby saving a rocket but ruining a future factoid. The demise of the rocket has been widely reported and has become part of programmer folklore.
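To make the reading concrete, here is a small Python sketch of how the line comes out once blanks are dropped and the implicit typing rule (names starting with I through N are INTEGER, everything else REAL) is applied. The helper names are invented for this example, and it assumes the right-hand side is a plain numeric literal.

```python
def implicit_type(name: str) -> str:
    # Fortran's default implicit typing: I..N are INTEGER, the rest REAL.
    return "INTEGER" if name[0] in "IJKLMN" else "REAL"


def read_as_assignment(stmt: str):
    s = "".join(stmt.split())           # blanks are not significant
    name, _, value = s.partition("=")   # text before and after the '='
    return name, implicit_type(name), float(value)


print(read_as_assignment("DO 10 I = 1.100"))  # ('DO10I', 'REAL', 1.1)
```

With a comma instead of the period, the text after the equal sign would no longer be a single constant, and the statement would be a DO loop.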

2
  • I made that typo more than a few times myself. Keep in mind that the characters printed at the top of the punch cards were very blurry, often with "." and "," looking identical.
    Commented 19 hours ago
  • I believe "not" needs a comma afterward to communicate what you're going for. And then maybe not a comma before? (Tough, as you're trying to capture a colloquial expression.) As is it reads to me as "the spacecraft was lost. The reason for the loss was not bench-testing, but rather something else caused the loss."
    – nitsua60
    Commented4 hours ago
1

Fortran now allows two source formats.

"Fixed form" is the name for the Fortran I style, where blanks are not significant. It is still allowed, and it still has to be unambiguous, even with all the new statements added over the decades.

In "free form", blanks are significant, the column restrictions are loosened, and there is a new way to specify continuation lines.

A few years ago, I was compiling IBM's ECAP from about 1968 using modern gfortran.

    1

    Fortran allowed spaces everywhere and then ignored them.

    This was less confusing than one might think, as generally people wrote code without any spaces at all in order to fit as much as possible into 80 column cards.

    While Döstädning in 2021, I found this deck of punch cards. It was written in June 1968, when I was invited to stay for a week at the University of Waterloo as a Junior Math Contest munchkin.

    The output is a 60-line, 132-column table of millimetre to fractional inch conversions. E.g. "590 23 15/64" means 590 mm = 23 15/64 inches. I think I wrote it to give to my father (who actually might have made use of it).

    Notice the use of self-documenting variable names and the way white-space is used to make the flow of control more visually obvious. Also notice that it was necessary to increase the allowed CPU time to 10 seconds, even though this was running on a multi-million dollar computer.

    $JOB WATFOR 02410237,KP=29,TIME=10 RAY BUTTERWORTH
          INTEGERN(10)/10*64/,L(10),IJK(10),NQ(10)
          PRINT6
          DO2ID=1,59
          K=0
          DO5I=ID,590,59
          K=K+1
          X=320.*I/127.
          IJK(K)=I
          L(K)=X
          IF(X-L(K).GE..5)L(K)=L(K)+1
          NQ(K)=L(K)/N(K)
          L(K)=L(K)-N(K)*NQ(K)
          DO4KFG=1,6
          IF(L(K)/2*2.NE.L(K))GOTO5
          L(K)=L(K)/2
        4 N(K)=N(K)/2
        5 CONTINUE
          PRINT3,(IJK(LP),NQ(LP),L(LP),N(LP),LP=1,10)
          DO2I=1,10
        2 N(I)=64
          PRINT6
        3 FORMAT(' ',10(I4,2I3,'/',I2))
        6 FORMAT('1',10(2X,'MM INCHES'))
          STOP
          END
    $ENTRY

    University of Waterloo's "Red Room"

      0

      The WATFIV compiler only allows statements starting with FORMAT( to be actual FORMAT statements.

      There are no actual ambiguities, but some cases are harder to parse. (Note that for run-time formats, which are required to be in an array, FORMAT is the most obvious name for the array.)

      Fortran predates pretty much any understanding of parsing theory.

      For the case above: FORMAT(X5H)=(1.)

      The X format descriptor needs digits before it.

      But yes, the harder-to-parse cases are FORMAT statements with the H format descriptor. There are supposed to be commas between format descriptors, though some compilers make them optional.

      7
      • Good point about 'nX' being needed, I had forgotten that.
        – dave
        Commentedyesterday
      • Please do not write multiple answers to the same question. Try to edit them into one.
        – Raffzahn
        Commented22 hours ago
      • 1
        Whether the X descriptor needed a count before it depended on the dialect. Some Fortrans defaulted to "1" if no count was given.
        – John Doty
        Commented20 hours ago
      • @JohnDoty For example, docs.oracle.com/cd/E19957-01/805-4939/z40007437a2e/index.html says so.
        – Leo B.
        Commented19 hours ago
      • @LeoB. Fortran 77? There were many earlier dialects.
        – John Doty
        Commented19 hours ago
