
I know how to replace a string in a file, e.g., how to replace hyphens with underscores.

However, I want to replace hyphens with underscores only in text that falls within angle brackets in a given file.

For example, the below file:

<charset-params> <input-charset> <resource-path>/*</resource-path> <java-charset-name>UTF-8</java-charset-name> </input-charset> </charset-params> 

should change to:

<charset_params> <input_charset> <resource_path>/*</resource_path> <java_charset_name>UTF-8</java_charset_name> </input_charset> </charset_params> 

Note that UTF-8 is unchanged because it is not within angle brackets.  How can I do this?

    5 Answers


    Do

    sed ': loop; s/\(<[^>]*\)-\([^>]*>\)/\1_\2/g; t loop' 

    The s/\(<[^>]*\)-\([^>]*>\)/\1_\2/g looks for a <, a bunch of (zero or more) characters that aren’t >, a hyphen (-), another bunch of characters that aren’t >, and finally a >.  It replaces the match with the part before the -, an _, and the part after the -.  The g flag makes it perform multiple substitutions on a line, but a single pass can replace only one hyphen per <> group.  So, for example, in a single pass,

    <the-quick><brown-fox> <jumps-over> upside-down <the-lazy-dog> 

    will change to

    <the_quick><brown_fox> <jumps_over> upside-down <the-lazy_dog> 

    Note that every <> word that contained hyphen(s) was changed, but the one that had two hyphens (<the-lazy-dog>) had only its second - changed, because the first [^>]* is greedy and matches up to the last hyphen.  The t loop says: if any substitution(s) were made, go back and try to find some more.
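The difference between a single pass and the loop can be checked directly. The sketch below is only illustrative: the variable names are made up for this example, and the loop is written with separate -e options rather than one semicolon-separated script, which should also be accepted by sed implementations other than GNU sed.

```shell
line='<the-quick><brown-fox> <jumps-over> upside-down <the-lazy-dog>'

# One pass: each <...> group loses at most one hyphen
# (the last one, because the first [^>]* is greedy).
single_pass=$(printf '%s\n' "$line" |
    sed 's/\(<[^>]*\)-\([^>]*>\)/\1_\2/g')

# With the loop: "t loop" jumps back to ":loop" whenever the
# substitution changed something, so it repeats until no hyphen is left
# inside any <...> group.
looped=$(printf '%s\n' "$line" |
    sed -e ':loop' -e 's/\(<[^>]*\)-\([^>]*>\)/\1_\2/g' -e 't loop')

printf '%s\n' "$single_pass" "$looped"
```

The first line printed still contains <the-lazy_dog>; the second has <the_lazy_dog>, and upside-down is untouched in both.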


      It's easier with perl:

      perl -pe 's{<.*?>}{$& =~ y/-/_/r}ge' < your-file 

      Or, to edit the file in place:

      perl -i -pe 's{<.*?>}{$& =~ y/-/_/r}ge' your-file 

      • Bit of a tangent, but what if one wanted a prefix for each replacement, e.g. "prefix_"?
        – malthe, commented Jul 18, 2023 at 9:14

      Using your sample in a file:

      <charset-params> <input-charset> <resource-path>/*</resource-path> <java-charset-name>UTF-8</java-charset-name> </input-charset> </charset-params> 

      The following awk script treats ">" as the end of each stanza to be processed and "<" as its beginning. It splits each line on ">" and then splits each piece on "<": the text before the "<" (outside the stanza) ends up in part[1], and the text after it (inside the stanza) ends up in part[2], where gsub performs the substitution. The script then reassembles the line, restoring the separators and the text outside the stanzas:

      awk '{
          numrec = split($0, regs, ">")
          for (i = 1; i < numrec; ++i) {
              split(regs[i], part, "<")
              gsub("-", "_", part[2])   # hyphens inside <...> only
              res = sprintf("%s%s", res, part[1] "<" part[2] ">")
          }
          print res regs[numrec]        # keep text after the last ">"
          res = ""
      }' entraunder 

      with the following result:

      <charset_params> <input_charset> <resource_path>/*</resource_path> <java_charset_name>UTF-8</java_charset_name> </input_charset> </charset_params> 

      HTH


        Using GNU awk for the 3rd arg to match() and gensub():

        $ awk '{ while ( match($0,/(.*)(<[^>]*-[^>]*>)(.*)/,a) ) { $0 = a[1] gensub(/-/,"_","g",a[2]) a[3] } print }' file
        <charset_params> <input_charset> <resource_path>/*</resource_path> <java_charset_name>UTF-8</java_charset_name> </input_charset> </charset_params> 

        Using any awk in any shell on every Unix box:

        $ awk '{
            while ( match($0,/<[^>]*-[^>]*>/) ) {
                tgt = substr($0,RSTART,RLENGTH)
                gsub(/-/,"_",tgt)
                $0 = substr($0,1,RSTART-1) tgt substr($0,RSTART+RLENGTH)
            }
            print
        }' file
        <charset_params> <input_charset> <resource_path>/*</resource_path> <java_charset_name>UTF-8</java_charset_name> </input_charset> </charset_params> 

          Using xq, a command-line XML processor that is part of the yq tool package from https://kislyuk.github.io/yq/ (it is a wrapper around the well-known JSON processor jq):

          xq -x ' walk( if type == "object" then with_entries(.key |= gsub("-"; "_")) else . end )' file.xml 

          This recursively walks over the whole structure of the given XML file, and if the current thing is an object, it substitutes all dashes with underscores in all keys found in that object.

          Example:

          $ cat file.xml
          <charset-params> <input-charset> <resource-path>/*</resource-path> <java-charset-name>UTF-8</java-charset-name> </input-charset> <something/> </charset-params> 
          $ xq -x 'walk(if type == "object" then with_entries(.key|=gsub("-";"_")) else . end)' file.xml
          <charset_params> <input_charset> <resource_path>/*</resource_path> <java_charset_name>UTF-8</java_charset_name> </input_charset> <something></something> </charset_params> 

          The xq tool can perform in-place edits by using the -i or --in-place options.
