Let's say I have the following example string:

<ETH0_IP><![CDATA[10.0.100.10]]></ETH0_IP> 

I would like to extract the first number and the IP in the following format:

0 10.0.100.10 

I know how to extract the first number (sed 's@^[^0-9]*\([0-9]\+\).*@\1@') and the IPs (grep -oE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'), but only one at a time, and I was wondering whether I can achieve both in just one line.

    3 Answers

    If you want all (integer) numbers and all IP(v4) addresses, add an alternation to the regex with grep:

    ... | grep -oE '[0-9]+|[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' 

    This would print the values one per line, and would of course also catch the 0 from the ETH0 at the end.
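As a rough illustration of the alternation approach, the sketch below runs it on the sample line from the question and then joins the first two matches onto one line. The function name `extract_numbers` and the `head`/`paste` post-processing are assumptions added here, not part of the answer; they rely on the wanted values being the first two matches.

```shell
# Sample line taken from the question.
line='<ETH0_IP><![CDATA[10.0.100.10]]></ETH0_IP>'

# Hypothetical helper: print every dotted quad or plain integer, one per line.
extract_numbers() {
    grep -oE '[0-9]{1,3}(\.[0-9]{1,3}){3}|[0-9]+'
}

# All matches, one per line: 0, 10.0.100.10, and the 0 from the closing tag.
printf '%s\n' "$line" | extract_numbers

# Keep only the first two matches and join them with a space.
printf '%s\n' "$line" | extract_numbers | head -n 2 | paste -sd ' ' -
```

The last command prints `0 10.0.100.10` for this input; lines with additional numbers before the IP would need a stricter pattern.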


    If you want just the number and the IP from input lines that contain a similar structure as the above (and no others) you could use e.g. sed:

    ... | sed -nEe 's,.*<ETH([0-9]+)_IP><!\[CDATA\[([0-9.]+)\]\]></ETH[0-9]+_IP>.*,\1 \2,p' 

    \1 and \2 correspond to the first and second groups in parentheses; I matched the IP with just [0-9.]+ here, for both clarity and laziness.


    or similarly in Perl:

    ... | perl -ne 'print "$1 $2\n" if m,<ETH([0-9]+)_IP><!\[CDATA\[([0-9.]+)\]\]></ETH[0-9]+_IP>,' 
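To show how the structure-specific match treats other lines, here is a small sketch that wraps the sed command above in a function and feeds it several lines. The function name `extract_eth_ip` and the `HOSTNAME` and `ETH1_IP` sample lines are made up for illustration; only the first line comes from the question.

```shell
# Hypothetical wrapper around the sed command from the answer.
extract_eth_ip() {
    sed -nE 's,.*<ETH([0-9]+)_IP><!\[CDATA\[([0-9.]+)\]\]></ETH[0-9]+_IP>.*,\1 \2,p'
}

# Lines 2 and 3 are invented sample data.
printf '%s\n' \
    '<ETH0_IP><![CDATA[10.0.100.10]]></ETH0_IP>' \
    '<HOSTNAME><![CDATA[node1]]></HOSTNAME>' \
    '<ETH1_IP><![CDATA[192.168.1.5]]></ETH1_IP>' | extract_eth_ip
# prints:
# 0 10.0.100.10
# 1 192.168.1.5
```

Because of `-n` together with the `p` flag, lines that do not match the pattern (the `HOSTNAME` line here) produce no output at all.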
      Replace every character that is neither a digit nor "." with a space, and then print the first and second columns:

      echo '<ETH0_IP><![CDATA[10.0.100.10]]></ETH0_IP>' | sed -re 's;[^0-9.]; ;g' | awk '{print $1,$2}' 

      output:

      0 10.0.100.10 

      ps: you have to make it more sophisticated if you have "." elsewhere and not only in the IPs.
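One possible way to make this more robust is sketched below: instead of trusting column positions, awk scans the fields and picks the first plain integer and the first field shaped like a dotted quad. The function name `first_number_and_ip` and the `note="v1.2"` attribute in the sample are hypothetical, added only to show a stray "." outside the IP.

```shell
# Hypothetical helper: first plain integer and first dotted quad on each line.
first_number_and_ip() {
    sed -E 's/[^0-9.]+/ /g' | awk '{
        num = ""; ip = ""
        for (i = 1; i <= NF; i++) {
            # Four dot-separated digit groups: treat as the IP.
            if (ip == "" && $i ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) ip = $i
            # Digits only: treat as the interface number.
            else if (num == "" && $i ~ /^[0-9]+$/) num = $i
        }
        if (num != "" && ip != "") print num, ip
    }'
}

# The note attribute is invented; column-based printing would give "0 1.2" here.
echo '<ETH0_IP note="v1.2"><![CDATA[10.0.100.10]]></ETH0_IP>' | first_number_and_ip
# prints: 0 10.0.100.10
```

This still does not check that each octet is in the range 0-255, so it remains a shape check, not a validation.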

        Using xq (from https://kislyuk.github.io/yq/), and assuming that the input is literally the single XML node from the question:

        xq -r 'to_entries[] | [ (.key|ltrimstr("ETH")|rtrimstr("_IP")), .value ] | @tsv' file.xml 

        This converts the XML document into JSON, and then extracts the remainder from the tag name by stripping off ETH from the start and _IP from the end. The IP address is also extracted and the two resulting values are outputted as a tab-delimited list.

        The ltrimstr() and rtrimstr() calls could possibly be replaced by gsub("[^[:digit:]]"; "") or gsub("\\D"; ""), which would delete all non-digits from the tag name.

        The intermediate JSON document would look like

        { "ETH0_IP": "10.0.100.10" } 

        ... and the final output would be

        0 10.0.100.10 
