
I have many XML files like the one below, in which I would like to replace a string with a new string. I cannot seem to get the sed command to work on the XML files.

<form version="1.1" theme="dark"> <label>Forcepoint DLP Dashboard - LongTerm</label> <description>Activity for those with Long-Term Exceptions</description> <fieldset submitButton="false" autoRun="false"> <input type="time" token="TimeFrame" searchWhenChanged="true"> <label>Timeframe</label> <default> <earliest>-48h@h</earliest> <latest>now</latest> </default> </input> </fieldset> <row> <panel> <html> <p>Macros In Use:</p> <p>`ForcepointApprovedUSB` = Known Approved USB Devices</p> <p>`ForcepointKnownCDDVD` = Known CD/DVD Drives</p> <p>`ForcepointKnownMultiFunction` = Known Multi-Function Devices</p> </html> </panel> </row> <row> <panel> <title>Exception Info</title> <table> <search> <query>index=restricted_security sourcetype=forcepoint | rex field=_raw "(.*act=(?&lt;Action&gt;.*?)\s.*)" | rex field=_raw "(.*duser=(?&lt;Device&gt;.*?)(:\s\d|;|\sfname=).*)" | rex field=_raw "(.*duser=.*?;\s(?&lt;Serial&gt;.*?)\sfname=)" | rex field=_raw "(.*fname=(?&lt;Filename&gt;.*?)\smsg=.*)" | rex field=_raw "(.*fname=.:\\\(?&lt;RawFilename&gt;.*)(?:\s-\s.*)\smsg=.*)" | rex field=_raw "(.*suser=(?&lt;Name&gt;.*)\scat=.*)" | rex field=_raw "(.*loginName=.*\\\\(?&lt;Username&gt;.*)\ssourceIp=.*)" | rex field=_raw "(.*sourceIp=(?&lt;IP&gt;.*)\sseverityType=.*)" | rex field=_raw "(.*sourceHost=(?&lt;Source&gt;.*)\sproductVersion=.*)" | rex field=_raw "(.*sourceServiceName=(?&lt;AlertType&gt;.*)\sanalyzedBy=.*)" | eval Username=lower(Username) | eval Action=if(isnull(Action),"-",Action) | eval Serial=if(isnull(Serial),"-",Serial) | eval EnumDeviceType=case( (`ForcepointApprovedUSB`),"ApprovedUSB", (`ForcepointKnownCDDVD`),"CDDVD", (`ForcepointKnownMultiFunction`),"MultiFunction", AlertType="Endpoint Applications" AND Device="Bluetooth","Bluetooth", AlertType="Endpoint Removable Media" AND Device="Windows Portable Device (WPD)","WPD", AlertType="Endpoint Removable Media" AND Device!="Windows Portable Device (WPD)" AND NOT (`ForcepointApprovedUSB`) AND NOT (`ForcepointKnownCDDVD`) AND 
NOT (`ForcepointKnownMultiFunction`),"UnApprovedUSB") | join type=inner Username [ search index=restricted_security sourcetype=dlp_lt | rename UserID as Username | eval Check = "Yes" | fields Username,Check,Justification,Type,ExpireDate ] | where isnotnull(EnumDeviceType) AND Check="Yes" | eval Time=strftime(_time, "%B %d, %Y %H:%M %Z") | dedup Username | table Time Username Name Justification Type ExpireDate | sort Name</query> <earliest>$TimeFrame.earliest$</earliest> <latest>$TimeFrame.latest$</latest> </search> <option name="drilldown">none</option> <option name="refresh.display">progressbar</option> </table> </panel> </row> <row> <panel> <title>Transfers By Those With Long-Term Exceptions</title> <table> <search> <query>index=restricted_security sourcetype=forcepoint | rex field=_raw "(.*act=(?&lt;Action&gt;.*?)\s.*)" | rex field=_raw "(.*duser=(?&lt;Device&gt;.*?)(:\s\d|;|\sfname=).*)" | rex field=_raw "(.*duser=.*?;\s(?&lt;Serial&gt;.*?)\sfname=)" | rex field=_raw "(.*fname=(?&lt;Filename&gt;.*?)\smsg=.*)" | rex field=_raw "(.*fname=.:\\\(?&lt;RawFilename&gt;.*)(?:\s-\s.*)\smsg=.*)" | rex field=_raw "(.*suser=(?&lt;Name&gt;.*)\scat=.*)" | rex field=_raw "(.*loginName=.*\\\\(?&lt;Username&gt;.*)\ssourceIp=.*)" | rex field=_raw "(.*sourceIp=(?&lt;IP&gt;.*)\sseverityType=.*)" | rex field=_raw "(.*sourceHost=(?&lt;Source&gt;.*)\sproductVersion=.*)" | rex field=_raw "(.*sourceServiceName=(?&lt;AlertType&gt;.*)\sanalyzedBy=.*)" | eval Username=lower(Username) | eval Action=if(isnull(Action),"-",Action) | eval Serial=if(isnull(Serial),"-",Serial) | eval EnumDeviceType=case( (`ForcepointApprovedUSB`),"ApprovedUSB", (`ForcepointKnownCDDVD`),"CDDVD", (`ForcepointKnownMultiFunction`),"MultiFunction", AlertType="Endpoint Applications" AND Device="Bluetooth","Bluetooth", AlertType="Endpoint Removable Media" AND Device="Windows Portable Device (WPD)","WPD", AlertType="Endpoint Removable Media" AND Device!="Windows Portable Device (WPD)" AND NOT (`ForcepointApprovedUSB`) 
AND NOT (`ForcepointKnownCDDVD`) AND NOT (`ForcepointKnownMultiFunction`),"UnApprovedUSB") | join type=inner Username [ search index=restricted_emn_security sourcetype=dlp_lt | rename UserID as Username | eval Check = "Yes" | dedup Username | fields Username, Check ] | where isnotnull(EnumDeviceType) AND Check="Yes" | eval Time=strftime(_time, "%B %d, %Y %H:%M %Z") | table Time Username Name Action Source Filename Device Serial EnumDeviceType | sort -Time</query> <earliest>$TimeFrame.earliest$</earliest> <latest>$TimeFrame.latest$</latest> </search> <option name="count">30</option> <option name="drilldown">none</option> </table> </panel> </row> </form> 

The pattern I would like to replace is

index=restricted_security sourcetype=forcepoint 

with

index=newname sourcetype=forcepoint 

So any occurrence of

index=restricted_security sourcetype=forcepoint 

should be replaced with the new value.

The XML files also contain other combinations, such as

index=restricted_security sourcetype=someother value, index=someindex sourcetype=forcepoint 

etc., but those don't need to be replaced.
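To make the matching rule concrete, here is a minimal Python sketch (not part of the original question) of which strings should be rewritten: only `index=restricted_security` followed by whitespace and then exactly `sourcetype=forcepoint`. The helper name `should_rewrite` and the regex are illustrative assumptions; `\s+` accepts either a space or a line break between the two terms, and the trailing `\b` rejects longer sourcetype names.

```python
import re

# Hypothetical rule: exact index, any run of whitespace (space or newline), exact sourcetype.
# The trailing \b stops e.g. "sourcetype=forcepoint_other" from matching.
PAIR = re.compile(r"\bindex=restricted_security\s+sourcetype=forcepoint\b")


def should_rewrite(text: str) -> bool:
    """Return True if text contains the exact index/sourcetype pair."""
    return PAIR.search(text) is not None


examples = [
    "index=restricted_security sourcetype=forcepoint",   # should change
    "index=restricted_security\nsourcetype=forcepoint",  # should change (split over two lines)
    "index=restricted_security sourcetype=dlp_lt",       # other sourcetype: leave alone
    "index=someindex sourcetype=forcepoint",             # other index: leave alone
]
for ex in examples:
    print(repr(ex), should_rewrite(ex))
```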

I have tried many patterns like the one below, with many combinations of sed options, but none of them seem to work:

sed 's/index=restricted_security\s\nsourcetype=forcepoint/index=restricted_security sourcetype=forcepoint/g' 

Can someone please point out how to get this replacement to work?

  • Looks like a task for Python, Ruby or Perl. (commented Oct 10, 2024 at 14:56)
  • Shouldn't the replacement text be /index=newname ...? If it is /index=restricted_security ..., it is the same as the text you want to change. (commented Oct 10, 2024 at 15:06)
  • sed (like many *nix utilities) is designed to process input one line at a time. sed DOES support a hold buffer and other tricks, but that is advanced usage, can be very brittle, AND creates a maintenance nightmare. GNU sed does support reading the whole file into the buffer, but then you'll need to get it installed in your production environment (assuming this is a real project), and many organizations won't allow such installations. Processing the whole file also requires superior regex skills. Learn to use Python (see below), or, as mentioned above, xmlstarlet and others. (shellter, commented Oct 10, 2024 at 15:37)
  • Don't attempt to process XML using non-XML-aware tools. Use XPath, XSLT, or XQuery for this kind of job (or a tool such as xmlstarlet, mentioned below, which is based on XPath). (commented Oct 10, 2024 at 18:43)
  • At this point it's sort-of obligatory to post a link to wise words on the topic in another StackOverflow answer: stackoverflow.com/questions/1732348/… (commented Oct 11, 2024 at 10:22)
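The line-at-a-time point in the comments explains why the sed attempt in the question never matches: each line is handed to the pattern separately, so no single line ever contains both terms or the newline between them. The Python sketch below (an illustration with `re`, not sed itself) contrasts applying one substitution per line with applying it to the whole text at once, which is roughly what GNU sed's `-z` enables. The sample string is a shortened, assumed version of the real `<query>` layout, and `per_line`/`whole_buffer` are hypothetical names.

```python
import re

# Shortened stand-in for a <query> whose two terms sit on separate lines (assumed layout).
sample = "<query>index=restricted_security\nsourcetype=forcepoint\n| eval Username=lower(Username)</query>\n"

pattern = re.compile(r"\b(index=)restricted_security(\s+sourcetype=forcepoint)\b")
replacement = r"\1newname\2"


def per_line(text: str) -> str:
    """Mimic default sed: apply the substitution to each line on its own."""
    return "".join(pattern.sub(replacement, line) for line in text.splitlines(keepends=True))


def whole_buffer(text: str) -> str:
    """Mimic GNU sed -z on a file without NUL bytes: substitute across the whole text."""
    return pattern.sub(replacement, text)


print(per_line(sample) == sample)
print(whole_buffer(sample))
```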

3 Answers


Using Python's lxml:

from lxml import etree

file_path = '/tmp/file.xml'
tree = etree.parse(file_path)
root = tree.getroot()

# Text content of every <query> inside <table><search>
xpath_expression = '//table/search/query/text()'
text_nodes = root.xpath(xpath_expression)

for text_node in text_nodes:
    lines = text_node.splitlines()
    # Only touch queries whose first two lines are the exact index/sourcetype pair
    if (len(lines) > 1 and lines[0].strip() == 'index=restricted_security'
            and lines[1].split()[:1] == ['sourcetype=forcepoint']):
        lines[0] = 'index=NEW_NAME'
        # The "smart strings" returned by text() know their parent element
        text_node.getparent().text = '\n'.join(lines)

tree.write(file_path, pretty_print=True, xml_declaration=True, encoding='UTF-8')

The script edits the file in place.
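If lxml isn't available, roughly the same XML-aware idea can be sketched with the standard library's `xml.etree.ElementTree`. This version visits every `<query>` element and applies a regex (borrowed from the pattern the GNU sed answer uses) instead of comparing whole lines. The function name `rename_index` and the inline sample are illustrative assumptions. ElementTree drops XML comments by default and re-serializes the markup, so diff its output before overwriting real dashboards.

```python
import re
import xml.etree.ElementTree as ET

# Exact index, any whitespace (including a newline), exact sourcetype.
PAIR = re.compile(r"\b(index=)restricted_security(\s+sourcetype=forcepoint)\b")


def rename_index(xml_text: str, new_index: str) -> str:
    """Return xml_text with the index renamed inside every <query> element."""
    root = ET.fromstring(xml_text)
    for query in root.iter("query"):
        if query.text:
            # A function replacement avoids backslash surprises if new_index contains "\".
            query.text = PAIR.sub(lambda m: m.group(1) + new_index + m.group(2), query.text)
    return ET.tostring(root, encoding="unicode")


sample = """<form><row><panel><table><search>
<query>index=restricted_security
sourcetype=forcepoint | rex field=_raw "(?&lt;Action&gt;.*)"</query>
</search></table></panel></row>
<row><panel><table><search>
<query>index=restricted_security sourcetype=dlp_lt</query>
</search></table></panel></row></form>"""

print(rename_index(sample, "newname"))
```

For files on disk, the same loop can run on `ET.parse(path).getroot()` followed by `tree.write(path, encoding="UTF-8")`.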


Using xmlstarlet from the shell, with two calls to the utility:

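#!/bin/sh
# Select one <query> at a time (the sample file has two)
q='(//table/search/query)[1]'
# -T: output as text rather than XML, so characters like < are not escaped
xmlstarlet sel -T -t -v "$q" file.xml > /tmp/temp.txt
head -n 1 /tmp/temp.txt | grep -qx 'index=restricted_security' || exit 0
xmlstarlet ed -u "$q" -v "index=NEW_NAME $(awk 'NR>1' /tmp/temp.txt)" file.xml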

You can add the -L switch to xmlstarlet ed if you need to edit the file in place.

You can even edit /tmp/temp.txt with sed if needed (after the first xmlstarlet call it holds plain text, not XML):

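#!/bin/sh
q='(//table/search/query)[1]'
xmlstarlet sel -T -t -v "$q" file.xml > /tmp/temp.txt
# Anchored, so "index=restricted_security sourcetype=dlp_lt" in the subsearch is left alone
sed -i 's/^index=restricted_security$/index=NEW_NAME/' /tmp/temp.txt
# $(cat ...) instead of the bash-only $(< ...), since this runs under /bin/sh
xmlstarlet ed -u "$q" -v "$(cat /tmp/temp.txt)" file.xml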

Using GNU sed for -z, -E, the \s shorthand for whitespace, and the word boundaries \< and \>:

$ sed -Ez 's/\<(index=)restricted_security(\s+sourcetype=forcepoint)\>/\1newname\2/g' file > o1

$ diff file o1
28c28
< <query>index=restricted_security
---
> <query>index=newname
81c81
< <query>index=restricted_security
---
> <query>index=newname

Or, if you want the two strings joined onto a single line (it's not clear from your question which you want):

$ sed -Ez 's/\<(index=)restricted_security\s+(sourcetype=forcepoint)\>/\1newname \2/g' file > o1

$ diff file o1
28,29c28
< <query>index=restricted_security
< sourcetype=forcepoint
---
> <query>index=newname sourcetype=forcepoint
81,82c80
< <query>index=restricted_security
< sourcetype=forcepoint
---
> <query>index=newname sourcetype=forcepoint
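Since the question mentions many XML files, here is a hedged Python sketch of applying the second (single-line) substitution above to every `*.xml` file in one directory and rewriting changed files in place, roughly what running `sed -Ez -i` on each file would do. The `*.xml` glob, the single-directory layout, and the function name `rewrite_dir` are assumptions. Like the sed answer, this is plain text substitution, not XML-aware, so keep backups or use version control.

```python
import re
import tempfile
from pathlib import Path

# Second sed variant: collapse the whitespace between the two terms to a single space.
PAIR = re.compile(r"\b(index=)restricted_security\s+(sourcetype=forcepoint)\b")


def rewrite_dir(directory: Path, new_index: str = "newname") -> list[Path]:
    """Rewrite matching *.xml files in directory in place; return the files that changed."""
    changed = []
    for path in sorted(directory.glob("*.xml")):
        text = path.read_text(encoding="utf-8")
        new_text = PAIR.sub(lambda m: f"{m.group(1)}{new_index} {m.group(2)}", text)
        if new_text != text:
            path.write_text(new_text, encoding="utf-8")
            changed.append(path)
    return changed


# Usage example on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    (d / "a.xml").write_text("<query>index=restricted_security\nsourcetype=forcepoint</query>", encoding="utf-8")
    (d / "b.xml").write_text("<query>index=someindex sourcetype=forcepoint</query>", encoding="utf-8")
    print([p.name for p in rewrite_dir(d)])
    print((d / "a.xml").read_text(encoding="utf-8"))
```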
