Edit inside an HTML tag with ed(1)

Question

Consider my humble hello.html file, edited with mighty ed:

    $ ed hello.html
    28
    ,p
    <title>Hello world!</title>

What's your general approach to edit inside that title HTML tag (bonus if you can edit inside any HTML tag)?

I tried a regular expression that matches inside the tag:

    s/>.*/>My new title/p
    <title>My new title
    u
    .
    <title>Hello world!</title>

But, sadly, you can see that I chopped my tag (and it would be way too much work to type out that </title> bit every time!).

For further education, I browsed through Software Tools in Pascal to page 174 (see https://archive.org/details/softwaretoolsinp00kern/page/174/mode/1up?view=theater) and discovered the & special character, which helpfully reaches the middle of the sentence:

    s/world/& again/p
    <title>Hello world again!</title>

But, that's not quite right, since I want to substitute the middle, not just reach the middle.

For a quick change, I would do something like s/>[^>]*</>My new title</, but you'd have to provide some representative input and output for us to say what might work. — muru, Commented Feb 6, 2024 at 6:29

Stéphane Chazelas · Accepted Answer · 2024-02-06 07:35:07Z

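You can use [^<] instead of . so that the pattern matches any character other than < rather than any character at all. The match then stops just before </title>, which leaves the closing tag intact.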

    28
    ed> ,n
    1	<title>Hello world!</title>
    ed> s/>[^<]*/>new title/
    ed> ,n
    1	<title>new title</title>
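The same substitution can also be run without an interactive session. The following is only a sketch: set_title is a hypothetical helper name, the temporary directory is there purely for the demo, and the new title is assumed to contain no /, &, backslash or newline, since those are special in the replacement part of ed's s command.

```shell
# Hypothetical helper: replace the text that follows the first '>' on line 1.
# Assumes $2 has no '/', '&', backslash or newline, which are special in
# the replacement part of ed's s command.
set_title() {
    # -s silences the byte counts; w saves the buffer and q quits.
    printf '%s\n' "1s/>[^<]*/>$2/" w q | ed -s "$1"
}

# Demo on a throwaway copy of hello.html.
demo_dir=$(mktemp -d)
printf '%s\n' '<title>Hello world!</title>' > "$demo_dir/hello.html"
set_title "$demo_dir/hello.html" 'My new title'
cat "$demo_dir/hello.html"
```

With ed installed, the final cat should show <title>My new title</title>. Because [^<]* stops at the first <, the closing tag never has to be retyped.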

Another approach could be to insert newlines before and after each < or > so the thing you want to change is on its own line which you can change with c:

    28
    ed> ,n
    1	<title>Hello world!</title>
    ed> s/[<>]/\
    &\
    /g
    ed> ,n
    1
    2	<
    3	title
    4	>
    5	Hello world!
    6	<
    7	/title
    8	>
    9
    ed> 5c
    new title
    .
    ed> ,n
    1
    2	<
    3	title
    4	>
    5	new title
    6	<
    7	/title
    8	>
    9
    ed> 1,9j
    ed> ,n
    1	<title>new title</title>
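As a rough sketch, the split, change and join steps can be fed to ed from a here-document. The temporary directory and file are assumptions made for the demo, and 1,$j is used in place of 1,9j so the join does not depend on counting the lines first.

```shell
# Demo on a throwaway copy of hello.html.
demo_dir=$(mktemp -d)
printf '%s\n' '<title>Hello world!</title>' > "$demo_dir/hello.html"

# The quoted 'EOF' keeps the shell from touching the backslashes and the $.
# Steps: split at every < and >, change line 5, join everything, save, quit.
ed -s "$demo_dir/hello.html" <<'EOF'
1s/[<>]/\
&\
/g
5c
new title
.
1,$j
w
q
EOF

cat "$demo_dir/hello.html"
```

Line 5 holds the title text only because the file is the single-line example above. For other input, the line to change would have to be located first, for example by printing the buffer with ,n.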

Excellent! That >[^<]* regex is very creative and straightforward. I also like that you only need to repeat the front > instead of both the > and the <, as in the unix.stackexchange.com/questions/768628/… comment. — mbigras, Commented Feb 11, 2024 at 1:45

Chris Davies · Accepted Answer · 2024-02-06 10:41:50Z

A better way is to use an HTML-aware parser and use that to edit content. My preferred tool is xmlstarlet because although it's an XML parser/editor it can also handle HTML:

Create a sample page:

    cat >my.html <<'EOF'
    <html>
    <title>Hello world!</title>
    <body><p>Thank you for reading my page</p></body>
    </html>
    EOF

Replace Hello world! with Hello everyone!:

    xmlstarlet format --html my.html 2>/dev/null |
        xmlstarlet edit --omit-decl --update '//title' --value 'Hello everyone!'

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html>
      <head>
        <title>Hello everyone!</title>
      </head>
      <body>
        <p>Thank you for reading my page</p>
      </body>
    </html>

Output is written to stdout, and the usual approach here is to write it to a temporary file and then replace the original. This isn't perfect but it's probably acceptable:

    file=my.html
    (
        [ "${file#/}" = "$file" ] && file="./$file"
        xmlstarlet format --html "$file" 2>/dev/null |
            xmlstarlet edit --omit-decl --update '//title' --value 'Hello everyone!' >"$file.tmp" &&
            cp -p -- "$file" "$file.old" &&
            mv -f -- "$file.tmp" "$file"
    )

Note that if $file starts with - you will get errors from xmlstarlet and you cannot use -- to separate it from true options. What we do here is to check whether the file name is absolute, and if not then we prepend ./. You can omit the cp line if you do not need to save a copy of the original content.

xmlstarlet is great, with good documentation, but I find it hard to learn. — reinierpost, Commented Feb 6, 2024 at 11:38
@reinierpost, there are a number of posts here on U&L and others on StackOverflow that show examples of its use. It is fiddly and the documentation isn't always as clear as it could be (I think it lacks real-world examples) but it can help particularly when you cannot guarantee the shape of the file structure — Chris Davies, Commented Feb 6, 2024 at 11:46

Mark McKinstry · Accepted Answer · 2024-02-06 06:38:51Z

You shouldn't use a regex to parse HTML. See https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags

If you want to do it with ed, the command below does it for the HTML tag you give it, though it might be better to use sed. It works because s accepts any character as its delimiter: it doesn't have to be s/old/new/, it can be s|old|new| or s!old!new!. With | as the delimiter, the / in </title> needs no escaping.

    $ ed hello.html
    28
    ,p
    <title>Hello world!</title>
    s|<title>.*</title>|<title>foo</title>|
    ,p
    <title>foo</title>

The / characters may be uniformly replaced by any other single character within any given s command. The / character (or whatever other character is used in its stead) can appear in the regexp or replacement only if it is preceded by a \ character.

From https://www.gnu.org/software/sed/manual/html_node/The-_0022s_0022-Command.html
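Since sed is suggested above, here is a minimal sketch of the same substitution as a sed filter. retitle is a hypothetical helper name, and its argument is assumed to contain no |, &, backslash or newline.

```shell
# Hypothetical helper: rewrite the contents of <title>...</title> on stdin.
# '|' is the s delimiter, so the '/' in </title> needs no escaping.
retitle() {
    # $1 is the new title; assumed free of '|', '&', backslash and newline.
    sed "s|<title>.*</title>|<title>$1</title>|"
}

printf '%s\n' '<title>Hello world!</title>' | retitle foo
```

This prints <title>foo</title>. Note that .* is greedy: if a line held two title elements, the match would run from the first <title> to the last </title>.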
