2

I am trying to delete all comments from a VHDL file with sed and a regular expression.

VHDL comments start with --; the rest of the line after that is the comment.

My first approach was: sed -i 's/--.*//g' file.vhdl

This deletes all comments, but the file can also contain assignments with don't-care values, which are written with the symbol - . Assignments like sig1 <= "11--000" are therefore affected as well. In addition, assignments can be concatenations such as sig1 <= "0--" & "--1". Is there a good regex that covers all these cases? Perhaps one that matches from the end of the line, since an assignment has to end with a ; ?

A test file which covers all the cases:

    -- comment start of line
    architecture beh of ent_name is
        signal sig1 : std_logic_vector(6 downto 0); -- comment end of line
    begin
        proc: process (sensitivity)
        begin
            sig1 <= "0--11-1"; -- another comment
            sig1 <= "0--11--";
            sig1 <= "00--" & "--1"; -- yet another
            sig1 <= "00--" & "--1";
        end process proc;
    end beh;

Thanks!

  • Out of interest, what is your reason for deleting comments?
    Commented Oct 18, 2017 at 11:03
  • The files are user-submitted and are automatically checked for certain keywords. For example, the students have to use predefined entities, so I check for the occurrence of the entity name, and I don't want them to trick the system by writing the name in a comment. Conversely, if I prohibit the wait statement and someone writes a comment containing wait, the file would be wrongly rejected.
    – MartinM
    Commented Oct 18, 2017 at 11:07
  • Oh, nice idea. In case it matters, your test code does not cover the case where there is a double quote inside a comment.
    Commented Oct 18, 2017 at 11:26
  • What about VHDL-2008 block comments? :-)
    – Matthew
    Commented Oct 18, 2017 at 11:40
  • char <= '"'; -- Assign a " to char
    – lasplund
    Commented Oct 18, 2017 at 16:26

3 Answers

3

Using a parser would be a better solution.

If you can't use one, exclude what you don't want from the pattern. Here that means no double quote between the -- and the end of the line:

--[^"]*?$ 

This certainly doesn't cover all cases, but in your example it should work.
Demo here.
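As a rough illustration, the pattern can be tried on lines from the question's test file with Python's re module. This is only a sketch: the helper name strip_comment is made up for the example, and it assumes each line is passed in without its trailing newline.

```python
import re

# Pattern from this answer: "--" followed by no double quote up to end of line.
COMMENT = re.compile(r'--[^"]*?$')

def strip_comment(line: str) -> str:
    """Drop a trailing single-line comment from one line (hypothetical helper)."""
    return COMMENT.sub("", line).rstrip()

examples = [
    '-- comment start of line',
    'sig1 <= "0--11-1"; -- another comment',
    'sig1 <= "0--11--";',
    'sig1 <= "00--" & "--1"; -- yet another',
    'x <= y; -- say "hi"',  # a double quote inside the comment defeats the pattern
]

for line in examples:
    print(repr(line), "->", repr(strip_comment(line)))
```

The last example is left untouched, which is the limitation raised in the comments on the question: any comment that contains a double quote survives.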

  • The possibility of a " in a comment is exactly why a code parser would definitely be a better solution. Even with .NET balancing groups or PCRE recursive constructs we can't be sure of parsing code accurately; regexes are not meant for such tasks (you got my upvote for the point).
    Commented Oct 19, 2017 at 4:11
2

Quoting IEEE 1076-2008:

15.9 Comments

A comment is either a single-line comment or a delimited comment. A single-line comment starts with two adjacent hyphens and extends up to the end of the line. A delimited comment starts with a solidus (slash) character immediately followed by an asterisk character and extends up to the first subsequent occurrence of an asterisk character immediately followed by a solidus character.

An occurrence of two adjacent hyphens within a delimited comment is not interpreted as the start of a single-line comment. Similarly, an occurrence of a solidus character immediately followed by an asterisk character within a single-line comment is not interpreted as the start of a delimited comment. Moreover, an occurrence of a solidus character immediately followed by an asterisk character within a delimited comment is not interpreted as the start of a nested delimited comment.

A single-line comment can appear on any line of a VHDL description and may contain any character except the format effectors vertical tab, carriage return, line feed, and form feed. A delimited comment can start on any line of a VHDL description and may finish on the same line or any subsequent line. The presence or absence of comments has no influence on whether a description is legal or illegal. Furthermore, comments do not influence the execution of a simulation module; their sole purpose is to enlighten the human reader.

Examples:

    -- The last sentence above echoes the Algol 68 report.
    end; -- Processing of LINE is complete.
    ----------- The first two hyphens start the comment.
    /* A long comment may be written
       on several consecutive lines */
    x := 1; /* Comments /* do not nest */

NOTE 1—Horizontal tabulation can be used in comments, after the starting characters, and is equivalent to one or more spaces (SPACE characters) (see 15.3).

NOTE 2—Comments may contain characters that, according to 15.2, are non-printing characters. Implementations may interpret the characters of a comment as members of ISO/IEC 8859-1:1998, or of any other character set; for example, an implementation may interpret multiple consecutive characters within a comment as single characters of a multi-byte character set.

Given this, it seems impossible to achieve your goal with a regular expression alone, because deciding whether -- starts a comment requires parsing the text that precedes it. You will likely need a VHDL parser that handles the language specifics. You could look into the prettyprint code that Stack Overflow uses; it seems to detect comments quite well.
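Short of a full parser, a small hand-written scanner that recognises string and character literals before comments already covers the cases raised in this thread. The following is a minimal sketch under stated assumptions, not a complete VHDL lexer: the function names are invented for the example, only line feed is treated as the end of a single-line comment, a delimited comment is replaced by one space so that neighbouring tokens stay separate, and extended identifiers (\...\) are not handled.

```python
def _after_name(text: str, i: int) -> bool:
    """True if the tick at text[i] follows a name or ')', i.e. an attribute tick."""
    return i > 0 and (text[i - 1].isalnum() or text[i - 1] in "_)]")

def strip_vhdl_comments(text: str) -> str:
    """Remove -- and /* */ comments, leaving string and character literals intact."""
    out = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == '"':
            # String literal: copy through the closing quote (or to end of line).
            j = i + 1
            while j < n and text[j] not in '"\n':
                j += 1
            if j < n and text[j] == '"':
                j += 1
            out.append(text[i:j])
            i = j
        elif (ch == "'" and i + 2 < n and text[i + 2] == "'"
              and text[i + 1] != "\n" and not _after_name(text, i)):
            # Character literal such as '"' or '-': copy all three characters.
            out.append(text[i:i + 3])
            i += 3
        elif text.startswith("--", i):
            # Single-line comment: skip to the line feed, which is kept.
            j = text.find("\n", i)
            i = n if j == -1 else j
        elif text.startswith("/*", i):
            # Delimited comment: skip to the first "*/"; such comments do not nest.
            j = text.find("*/", i + 2)
            i = n if j == -1 else j + 2
            out.append(" ")
        else:
            out.append(ch)
            i += 1
    return "".join(out)

sample = (
    '-- comment start of line\n'
    'sig1 <= "00--" & "--1"; -- yet another\n'
    "char <= '\"'; -- Assign a \" to char\n"
    'x := 1; /* Comments /* do not\nnest */ y := 2;\n'
)
print(strip_vhdl_comments(sample))
```

Because literals are consumed before the comment checks, a -- inside "00--" never reaches the comment branch, and a " inside a comment is never mistaken for the start of a string.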

  • Comments are lexical elements typically discarded as not affecting the meaning of a VHDL specification. Historically there are pragmas implemented as comments, intended to be supplanted by -2008 tool directives. Lexical analyzers are a complete ordered set of regular expression analyzers capable of detecting all valid lexical elements. Pretty printers or syntax highlighters typically don't provide a complete set, without which you may depend on style conventions.
    – user1155120
    Commented Oct 18, 2017 at 21:11
  • "What is syntax highlighting and how does it work?" for all Stack Exchange Q&A sites leads us to lang-vhdl.js, which implements an incomplete lexical analyzer. Note that strings are evaluated before comments. The REs' evaluation order is defined by the standard.
    – user1155120
    Commented Oct 18, 2017 at 22:47
  • If you look closely, the Prettify syntax highlighter used here is susceptible to highlighting errors because it's not complete. See the answer talking about Issue Report IR1045 here. It's an example of why you should really have a complete lexical analyzer.
    – user1155120
    Commented Oct 18, 2017 at 22:58
-1
s/((?:[^"]|"[^"]*")*)\s*--.*/$1/
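The idea behind this substitution is to capture everything before the comment as a run of non-quote characters and complete string literals, then keep only that capture. As written it is unanchored and greedy, so the match can begin inside a string literal. The sketch below, which uses Python's re module rather than the Perl-style syntax above, compares it with an anchored, lazy variant; that variant and both function names are assumptions made for illustration, not part of the answer.

```python
import re

# The substitution from this answer, with plain ASCII quotes and hyphens.
AS_POSTED = re.compile(r'((?:[^"]|"[^"]*")*)\s*--.*')

# Assumed variant: anchored at line start, lazy so the first "--" outside a string wins.
ANCHORED = re.compile(r'^((?:[^"]|"[^"]*")*?)\s*--.*$')

def strip_as_posted(line: str) -> str:
    return AS_POSTED.sub(r"\1", line)

def strip_anchored(line: str) -> str:
    return ANCHORED.sub(r"\1", line)

examples = [
    'sig1 <= "0--11--";',                       # no comment; as-posted form damages the string
    'sig1 <= "00--" & "--1"; -- yet another',
    'x; -- a -- b',
    "char <= '\"'; -- Assign a \" to char",     # character literal still defeats both
]

for line in examples:
    print(repr(strip_as_posted(line)), "|", repr(strip_anchored(line)))
```

Even the anchored variant leaves the last line unchanged, because the " inside the character literal is read as the start of a string. That is the same limitation the other answers point out.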
  • Thank you for contributing to the Stack Overflow community. This may be a correct answer, but it’d be really useful to provide additional explanation of your code so developers can understand your reasoning. This is especially useful for new developers who aren’t as familiar with the syntax or struggling to understand the concepts. Would you kindly edit your answer to include additional details for the benefit of the community?
    Commented 4 hours ago
