
I want to sort the contents of file.txt by date. The date to sort by is in the fourth table-data cell (<td></td>) of each row.

For example, the content of file.txt:

<tr><td>some_name_here</td><td>number_code_here</td><td>2018Mar31</td><td>2018Mar01</td></tr>
<tr><td>some_name_here</td><td>number_code_here</td><td>2018Mar31</td><td>2017Jan31</td></tr>
<tr><td>some_name_here</td><td>number_code_here</td><td>2018Mar31</td><td>2018Apr02</td></tr>
<tr><td>some_name_here</td><td>number_code_here</td><td>2018Mar31</td><td>2017Dec25</td></tr>

The desired output is below. How can I do this?

<tr><td>some_name_here</td><td>number_code_here</td><td>2018Mar31</td><td>2017Jan31</td></tr>
<tr><td>some_name_here</td><td>number_code_here</td><td>2018Mar31</td><td>2017Dec25</td></tr>
<tr><td>some_name_here</td><td>number_code_here</td><td>2018Mar31</td><td>2018Mar01</td></tr>
<tr><td>some_name_here</td><td>number_code_here</td><td>2018Mar31</td><td>2018Apr02</td></tr>

I've been using the sort command, but it's not working:

cat file.txt 2> /dev/null | sort -t'>' -k9n -k9.4M -k9.7n
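Why this attempt fails becomes visible once you print how `-t'>'` numbers the fields of a row. Below is a minimal sketch, assuming a POSIX awk; `show_fields` is a hypothetical helper name, and the sample row is the first one from file.txt. With a single-character `-F'>'`, awk splits on every `>` the same way `sort -t'>'` does, so the printed numbers match sort's `-k` field numbers.

```shell
#!/bin/sh
# Print each '>'-separated field with its number.
# A single-character FS splits on every '>', like `sort -t'>'`,
# so these numbers correspond to sort's -k field numbers.
# show_fields is a hypothetical helper name for illustration.
show_fields() {
  awk -F'>' '{ for (i = 1; i <= NF; i++) printf "%d: %s\n", i, $i }'
}

# First sample row from file.txt
echo '<tr><td>some_name_here</td><td>number_code_here</td><td>2018Mar31</td><td>2018Mar01</td></tr>' | show_fields
```

For this row, field 9 is `2018Mar01</td`: the year is characters 1-4, the month name starts at character 5 (not 4) and the day at character 8 (not 7). Only the year key (`-k9n`) lands on the right characters; `-k9.4M` and `-k9.7n` never start on a valid month or day, and because none of the keys has an end position, each one extends to the end of the line. When all keys tie, GNU sort falls back to comparing whole lines, which for these sample rows puts the months in alphabetical order (Apr before Mar, Dec before Jan).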

EDIT: I found this reference, but it still doesn't work correctly: https://stackoverflow.com/a/16060031/7842707

  • This isn't really text processing, so much as XML or HTML processing. – agc, Apr 3, 2018 at 6:46
  • I'm creating an HTML file to send as an email, so when my recipient receives it, the data shows up as an HTML table. – Apr 3, 2018 at 6:51
  • Don't parse HTML with awk; try a real parser. – Apr 3, 2018 at 7:09
  • Actually, file.txt is just a text file, not the final HTML file. After sorting it, I'll redirect the output to an HTML file. As you can see, there's no html or table tag in it. – Apr 3, 2018 at 7:11

1 Answer


If each <tr> row is on a separate line, here is an awk + sort solution:

awk -F'[<>]' '{ print $(NF-4), $0 }' file.txt | sort -k1,1n -k1.5M | cut -d' ' -f2- 
  • -F'[<>]' - treat both < and > as field separators
  • $(NF-4) - the value of the last <td> cell in each row (e.g. 2017Jan31); it is printed in front of the line to serve as the sort key
  • -k1,1n - sort numerically by the 1st field, i.e. by year
  • -k1.5M - then sort by month name, starting at the 5th character of the key (e.g. Jan in 2017Jan31)
  • cut -d' ' -f2- - remove the auxiliary sort key (the 1st field) and the space after it
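To try the pipeline end to end, its commands can be wrapped in a small function and fed the sample rows. This is a minimal sketch, assuming GNU coreutils `sort` (for `-M`) and one row per line; `sort_by_last_td` is a hypothetical helper name, and the `LC_ALL=C` prefix is an added assumption so that `-M` recognizes the English abbreviations Jan to Dec regardless of the user's locale.

```shell
#!/bin/sh
# Decorate-sort-undecorate with the answer's commands.
# Assumptions: one <tr>...</tr> row per line, the date is in the
# last <td> cell, dates look like YYYYMonDD (e.g. 2017Jan31).
# LC_ALL=C is an added assumption so -M uses English month names.
# sort_by_last_td is a hypothetical helper name for illustration.
# Reads the files given as arguments, or stdin if there are none.
sort_by_last_td() {
  awk -F'[<>]' '{ print $(NF-4), $0 }' "$@" |
    LC_ALL=C sort -k1,1n -k1.5M |
    cut -d' ' -f2-
}

# Sample rows from the question, in their original order
sort_by_last_td <<'EOF'
<tr><td>some_name_here</td><td>number_code_here</td><td>2018Mar31</td><td>2018Mar01</td></tr>
<tr><td>some_name_here</td><td>number_code_here</td><td>2018Mar31</td><td>2017Jan31</td></tr>
<tr><td>some_name_here</td><td>number_code_here</td><td>2018Mar31</td><td>2018Apr02</td></tr>
<tr><td>some_name_here</td><td>number_code_here</td><td>2018Mar31</td><td>2017Dec25</td></tr>
EOF
```

No `-k` option here sorts by day. For rows with the same year and month, GNU sort's last-resort whole-line comparison starts with the zero-padded key (e.g. `2018Mar01` vs `2018Mar31`), so days should still come out in order.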

The output:

<tr><td>some_name_here</td><td>number_code_here</td><td>2018Mar31</td><td>2017Jan31</td></tr>
<tr><td>some_name_here</td><td>number_code_here</td><td>2018Mar31</td><td>2017Dec25</td></tr>
<tr><td>some_name_here</td><td>number_code_here</td><td>2018Mar31</td><td>2018Mar01</td></tr>
<tr><td>some_name_here</td><td>number_code_here</td><td>2018Mar31</td><td>2018Apr02</td></tr>
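The sort-only approach from the question can also work once the character positions are corrected and every key is bounded. With `-t'>'`, field 9 of each row looks like `2018Mar01</td`, so the year is characters 1-4, the month name 5-7 and the day 8-9. This is a minimal sketch, assuming GNU sort and exactly the row layout shown in the question (four `<td>` cells, no `>` inside cell text); `sort_rows` is a hypothetical helper name, and `LC_ALL=C` is an added assumption so `-M` recognizes English month abbreviations.

```shell
#!/bin/sh
# Sort-only variant with bounded character keys on field 9.
# Field 9 (with -t'>') is e.g. "2018Mar01</td":
#   chars 1-4 = year, 5-7 = month name, 8-9 = day.
# Assumes the question's row layout; LC_ALL=C is an added
# assumption for -M. sort_rows is a hypothetical helper name.
sort_rows() {
  LC_ALL=C sort -t'>' -k9.1,9.4n -k9.5,9.7M -k9.8,9.9n "$@"
}

# Sample rows from the question, in their original order
sort_rows <<'EOF'
<tr><td>some_name_here</td><td>number_code_here</td><td>2018Mar31</td><td>2018Mar01</td></tr>
<tr><td>some_name_here</td><td>number_code_here</td><td>2018Mar31</td><td>2017Jan31</td></tr>
<tr><td>some_name_here</td><td>number_code_here</td><td>2018Mar31</td><td>2018Apr02</td></tr>
<tr><td>some_name_here</td><td>number_code_here</td><td>2018Mar31</td><td>2017Dec25</td></tr>
EOF
```

Because each key has an explicit end (`9.4`, `9.7`, `9.9`), none of them spills into the rest of the line. `sort` can also read the file directly, so `sort_rows file.txt` works without `cat`.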