Replace hyphen(s) with underscore(s) within angle brackets in a file

Question

I know how to replace a string in a file, e.g., replace every hyphen with an underscore.

However, I wish to replace all hyphens with underscores only in text that falls within angle brackets in a given file.

For example, the file below:

<charset-params> <input-charset> <resource-path>/*</resource-path> <java-charset-name>UTF-8</java-charset-name> </input-charset> </charset-params>

should change to:

<charset_params> <input_charset> <resource_path>/*</resource_path> <java_charset_name>UTF-8</java_charset_name> </input_charset> </charset_params>

Note that UTF-8 is unchanged because it is not within angle brackets. How can I do this?

G-Man Says 'Reinstate Monica' · Accepted Answer · 2022-03-16 06:48:04Z

Do

sed ': loop; s/\(<[^>]*\)-\([^>]*>\)/\1_\2/g; t loop'

The s/\(<[^>]*\)-\([^>]*>\)/\1_\2/g looks for a <, a bunch of (zero or more) characters that aren’t >, a hyphen (-), another bunch of characters that aren’t >, and finally a >. It replaces the match with the part before the -, an _, and the part after the -. The g flag causes it to do multiple substitutions in one pass, but only one per <…>, because each match consumes the whole <…>. So, for example,

<the-quick><brown-fox> <jumps-over> upside-down <the-lazy-dog>

will change to

<the_quick><brown_fox> <jumps_over> upside-down <the-lazy_dog>

Note that every <…> word that contained hyphen(s) was changed, but the one that had two hyphens (<the-lazy-dog>) had only its second - changed. The t loop says: if any substitution(s) were made, go back to the loop label and try to find some more.
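To see the effect of the loop, here is a minimal sketch that runs the same substitution with and without it. The function names single_pass and with_loop are hypothetical, chosen only for this illustration. The script is split across several -e options on the assumption that this is more portable than the one-line form with semicolons, since some sed implementations read a label up to the end of the line.

```shell
#!/bin/sh
# Hypothetical helper names; the sed expression is the one discussed above.

# One pass only: s///g changes at most one hyphen per <...>.
single_pass() {
    sed 's/\(<[^>]*\)-\([^>]*>\)/\1_\2/g'
}

# With the loop: repeat the substitution until a pass changes nothing.
with_loop() {
    sed -e ':loop' -e 's/\(<[^>]*\)-\([^>]*>\)/\1_\2/g' -e 't loop'
}

sample='<the-quick><brown-fox> <jumps-over> upside-down <the-lazy-dog>'
printf '%s\n' "$sample" | single_pass
printf '%s\n' "$sample" | with_loop
```

The first command leaves <the-lazy_dog> half converted, while the second prints <the_lazy_dog>. In both cases upside-down is untouched because it is not inside angle brackets.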

Stéphane Chazelas · Accepted Answer · 2022-03-16 08:32:58Z

It's easier with perl:

perl -pe 's{<.*?>}{$& =~ y/-/_/r}ge' < your-file

Or, to edit the file in place:

perl -i -pe 's{<.*?>}{$& =~ y/-/_/r}ge' your-file

Bit of a tangent, but what if one wanted a prefix for each replacement, e.g. "prefix_"? (malthe, commented Jul 18, 2023 at 9:14)

Moises Najar · Accepted Answer · 2022-03-16 07:26:46Z

Using your sample in a file:

<charset-params> <input-charset> <resource-path>/*</resource-path> <java-charset-name>UTF-8</java-charset-name> </input-charset> </charset-params>

The following awk script treats ">" as the end of each stanza to be processed and "<" as its beginning. Each line is first split on ">", and each resulting piece is then split on "<" into the array part, so that part[1] holds the text outside the brackets and part[2] holds the text inside them. gsub performs the substitution on part[2] only, and the script then puts back the separators and the untouched text from outside the brackets:

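awk '
{
    numrec = split($0, regs, ">")
    for (i = 1; i < numrec; ++i) {
        # part[1]: text before "<"; part[2]: text inside the brackets
        split(regs[i], part, "<")
        gsub("-", "_", part[2])
        res = sprintf("%s%s", res, part[1] "<" part[2] ">")
    }
    # regs[numrec] is whatever follows the last ">"; append it so it is not dropped
    print res regs[numrec]
    res = ""
}' entraunder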

with the following result:

<charset_params> <input_charset> <resource_path>/*</resource_path> <java_charset_name>UTF-8</java_charset_name> </input_charset> </charset_params>

HTH

Ed Morton · Accepted Answer · 2022-03-16 20:14:35Z

Using GNU awk for the 3rd arg to match() and gensub():

$ awk '{
    while ( match($0,/(.*)(<[^>]*-[^>]*>)(.*)/,a) ) {
        $0 = a[1] gensub(/-/,"_","g",a[2]) a[3]
    }
    print
}' file
<charset_params> <input_charset> <resource_path>/*</resource_path> <java_charset_name>UTF-8</java_charset_name> </input_charset> </charset_params>

Using any awk in any shell on every Unix box:

$ awk '{
    while ( match($0,/<[^>]*-[^>]*>/) ) {
        tgt = substr($0,RSTART,RLENGTH)
        gsub(/-/,"_",tgt)
        $0 = substr($0,1,RSTART-1) tgt substr($0,RSTART+RLENGTH)
    }
    print
}' file
<charset_params> <input_charset> <resource_path>/*</resource_path> <java_charset_name>UTF-8</java_charset_name> </input_charset> </charset_params>
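Standard awk has no in-place option, so to change the file itself the output has to go through a temporary file. The following is only a sketch of one way to do that: fix_tags is a hypothetical wrapper name, and it assumes mktemp is available (it is widespread but not required by POSIX). The awk program is the portable one above.

```shell
#!/bin/sh
# Hypothetical wrapper: fix_tags FILE rewrites FILE so that hyphens inside
# <...> become underscores, using the portable awk loop shown above.
fix_tags() {
    file=$1
    tmp=$(mktemp) || return 1    # assumption: mktemp exists on this system
    awk '{
        while ( match($0, /<[^>]*-[^>]*>/) ) {
            tgt = substr($0, RSTART, RLENGTH)
            gsub(/-/, "_", tgt)
            $0 = substr($0, 1, RSTART-1) tgt substr($0, RSTART+RLENGTH)
        }
        print
    }' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Writing back with cat rather than mv keeps the original file's permissions and hard links, and the original is only overwritten if awk succeeded.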

Kusalananda · Accepted Answer · 2022-04-11 19:17:56Z

Using xq, a command-line XML parser that is part of the yq tool package from https://kislyuk.github.io/yq/ (which is a wrapper around the well-known JSON processor jq):

xq -x ' walk( if type == "object" then with_entries(.key |= gsub("-"; "_")) else . end )' file.xml

This recursively walks over the whole structure of the given XML file, and if the current thing is an object, it substitutes all dashes with underscores in all keys found in that object.

Example:

$ cat file.xml
<charset-params> <input-charset> <resource-path>/*</resource-path> <java-charset-name>UTF-8</java-charset-name> </input-charset> <something/> </charset-params>

$ xq -x 'walk(if type == "object" then with_entries(.key|=gsub("-";"_")) else . end)' file.xml
<charset_params> <input_charset> <resource_path>/*</resource_path> <java_charset_name>UTF-8</java_charset_name> </input_charset> <something></something> </charset_params>

The xq tool can perform in-place edits by using the -i or --in-place options.
