How to get all numbers AND IPs from string using regex?

Question

Let's say I have the following example string:

<ETH0_IP><![CDATA[10.0.100.10]]></ETH0_IP>

I would like to extract the first number and the IP in the following format:

0 10.0.100.10

I do know how to extract the first number (sed 's@^[^0-255]*\([0-255]\+\).*@\1@') and the IPs (grep -oE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'), but only one at a time, and I was wondering whether I can achieve both in a single line.

ilkkachu · Accepted Answer · 2022-07-06 16:30:45Z

If you want all (integer) numbers and all IP(v4) addresses, add an alternation to the regex with grep:

... | grep -oE '[0-9]+|[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'

This would print the values one per line, and would of course also catch the 0 from the ETH0 in the closing tag at the end.
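As a rough sketch of how that alternation could produce the format from the question, the first two matches can be joined onto one line. The function name numbers_and_ips is made up for illustration, and the approach assumes that the first integer in the line comes before the IP and that grep picks the longest alternative at each position (so the address is not split into separate numbers):

```shell
# Hypothetical helper: print every integer and IPv4-looking token, one per line.
numbers_and_ips() {
    grep -oE '[0-9]+|[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'
}

sample='<ETH0_IP><![CDATA[10.0.100.10]]></ETH0_IP>'

# Keep only the first two matches and join them with a single space.
printf '%s\n' "$sample" | numbers_and_ips | head -n 2 | paste -sd ' ' -
```

For the sample string this should give 0 10.0.100.10; the trailing 0 from the closing tag is dropped by head.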

If you want just the number and the IP from input lines that contain a similar structure as the above (and no others) you could use e.g. sed:

... | sed -nEe 's,.*<ETH([0-9]+)_IP><!\[CDATA\[([0-9.]+)\]\]></ETH[0-9]+_IP>.*,\1 \2,p'

\1 and \2 correspond to the first and second groups in parentheses, and I matched the IP with just [0-9.]+ here for both clarity and laziness.

or similarly in Perl:

... | perl -ne 'print "$1 $2\n" if m,<ETH([0-9]+)_IP><!\[CDATA\[([0-9.]+)\]\]></ETH[0-9]+_IP>,'
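If the loose [0-9.]+ match is a concern, a somewhat stricter variant of the sed command might look like the sketch below. It requires four dot-separated groups of one to three digits, but still does not check that each group is at most 255. The function name eth_ip_pairs and the extra input lines are invented for the example. Because the pattern now contains {1,3}, the s command uses @ as its delimiter instead of a comma.

```shell
# Hypothetical helper: print "<interface number> <IPv4>" for ETHn_IP lines only.
eth_ip_pairs() {
    sed -nE 's@.*<ETH([0-9]+)_IP><!\[CDATA\[([0-9]{1,3}(\.[0-9]{1,3}){3})\]\]></ETH[0-9]+_IP>.*@\1 \2@p'
}

# Made-up input: two valid lines, one unrelated tag, one malformed address.
eth_ip_pairs <<'EOF'
<ETH0_IP><![CDATA[10.0.100.10]]></ETH0_IP>
<ETH1_IP><![CDATA[192.168.1.5]]></ETH1_IP>
<ETH1_MASK><![CDATA[255.255.255.0]]></ETH1_MASK>
<ETH2_IP><![CDATA[1.2.3]]></ETH2_IP>
EOF
```

Only the first two input lines should produce output; the other two do not match the pattern and are skipped because of -n.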

redseven · Accepted Answer · 2022-07-06 16:05:34Z

Replace all characters that are neither digits nor "." with spaces, and then print the first and second columns:

echo '<ETH0_IP><![CDATA[10.0.100.10]]></ETH0_IP>' | \
  sed -re 's;[^0-9.]; ;g' | \
  awk '{print $1,$2}'

output:

0 10.0.100.10

PS: you have to make it more sophisticated if "." can appear elsewhere and not only in the IPs.
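One possible way to make it more tolerant of stray dots, sketched under the assumption that the wanted number is the first all-digit field and the wanted address is the first field with four dot-separated digit groups, is to let awk test each field instead of trusting the column positions. The function name first_number_and_ip and the desc attribute in the second test line are made up for the example:

```shell
# Hypothetical helper: print the first plain integer and the first
# IPv4-looking field of each line, ignoring other dotted fragments.
first_number_and_ip() {
    sed -E 's/[^0-9.]+/ /g' |
    awk '{
        num = ""; ip = ""
        for (i = 1; i <= NF; i++) {
            if (num == "" && $i ~ /^[0-9]+$/)
                num = $i            # first field made of digits only
            else if (ip == "" && $i ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/)
                ip = $i             # first field shaped like a.b.c.d
        }
        if (num != "" && ip != "") print num, ip
    }'
}

printf '%s\n' '<ETH0_IP><![CDATA[10.0.100.10]]></ETH0_IP>' | first_number_and_ip
printf '%s\n' '<ETH0_IP desc="v1.2."><![CDATA[10.0.100.10]]></ETH0_IP>' | first_number_and_ip
```

In the second line the fragment 1.2. ends up in its own field but matches neither test, so it is ignored.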

Kusalananda · Accepted Answer · 2022-07-06 16:49:39Z

Using xq (from https://kislyuk.github.io/yq/), and assuming that the input is literally the single XML node from the question:

xq -r 'to_entries[] | [ (.key|ltrimstr("ETH")|rtrimstr("_IP")), .value ] | @tsv' file.xml

This converts the XML document into JSON, and then extracts the remainder from the tag name by stripping off ETH from the start and _IP from the end. The IP address is also extracted and the two resulting values are outputted as a tab-delimited list.

The ltrimstr() and rtrimstr() calls could possibly be replaced by gsub("[^[:digit:]]"; "") or gsub("\\D"; ""), which would delete all non-digits from the tag name.

The intermediate JSON document would look like

{ "ETH0_IP": "10.0.100.10" }

... and the final output would be

0 10.0.100.10
