
I have a file called session.log. It contains some text data: the 1st column is a serial number, the 2nd column a username, the 3rd column the last logged-in date, and the 4th column a result.

A user may have logged in multiple times. I need to find the last logged-in date for each unique user, so I wrote a shell script. The output should display the line with the latest logged-in date (3rd column) for each unique user (2nd column), and it should not contain the same username multiple times.

 $ cat session.log
 1 u1 2018-05-19 pass
 2 u2 2018-06-15 pass
 3 u3 2018-05-18 pass
 4 u4 2018-05-17 pass
 5 u2 2018-05-14 pass
 6 u4 2018-07-11 pass
 7 u1 2018-05-16 pass
 8 u3 2018-05-13 pass
 9 u1 2018-08-12 pass
 10 u1 2018-05-10 pass

What I tried:

 ( awk {print} session.log | cut -f2 -d' ' | sort | uniq ) > store.txt
 for line in $(cat "store.txt")
 do
     touch "$line.txt"
     grep "$line" session.log > "$line.txt"
     temp=$(sort -k3 "$line.txt" | awk 'END{print}')
     echo $temp > "$line.txt"
     cat "$line.txt"
 done

Output

 $ ./sort.sh
 9 u1 2018-08-12 pass
 2 u2 2018-06-15 pass
 3 u3 2018-05-18 pass
 6 u4 2018-07-11 pass

The shell script works for the date formats yyyy-mm-dd and yyyy/mm/dd. Is there better code to do the same work? How would this be done using awk?

EDIT:

 $ cat sort.sh
 ( awk {print} session.log | cut -f2 -d' ' | sort | uniq ) > store.txt
 for line in $(cat "store.txt")
 do
     #touch "$line.txt"
     grep "$line" session.log > "$line.txt"
     echo $(sort -k3 "$line.txt" | awk 'END{print}')
     #temp=$(sort -k3 "$line.txt" | awk 'END{print}')
     #echo $temp > "$line.txt"
     #cat "$line.txt"
 done
 rm -f store.txt

    1 Answer
    $ sort -k 3,3r session.log | awk '!seen[$2]++ { print }'
    9 u1 2018-08-12 pass
    6 u4 2018-07-11 pass
    2 u2 2018-06-15 pass
    3 u3 2018-05-18 pass

    (The { print } can be removed completely; I'm only including it to show what happens. The default action is to print the entire input record when the condition is true.)

    This sorts the file that you have, by decreasing dates (the most recent date first). The awk program reads the sorted data and prints out the first entry found for each user. The seen variable is an associative array/hash that uses the usernames as keys. If its value is zero for a user, then that user has not yet been seen, and so the line from the file is printed.
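    To see the `!seen[$2]++` condition in action, here is a minimal sketch on a tiny inline dataset (the file mini.log below is illustrative data, not your actual session.log):

    ```shell
    # Tiny illustrative log: u1 appears twice, u2 once.
    cat > mini.log <<'EOF'
    1 u1 2018-05-19 pass
    2 u2 2018-06-15 pass
    3 u1 2018-08-12 pass
    EOF

    # Sort newest-first on the date column, then keep only the first
    # (i.e. most recent) line seen for each username in column 2.
    sort -k 3,3r mini.log | awk '!seen[$2]++'
    ```

    The first line for u1 after sorting is the 2018-08-12 entry, so only that line survives; the older 2018-05-19 entry increments seen["u1"] past zero and is skipped.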


    Your code with my annotations:

    # get list of unique usernames from log:
    ( awk {print} session.log | cut -f2 -d' ' | sort | uniq ) > store.txt
    # shorter equivalent:
    # cut -d ' ' -f2 <session.log | sort -u >store.txt

    # loop over the usernames:
    for line in $(cat "store.txt")
    do
        # get entries related to this user:
        grep "$line" session.log > "$line.txt"

        # echo the last entry:
        echo $(sort -k3 "$line.txt" | awk 'END{print}')

        # shorter equivalent of both of the above commands:
        # awk -v user="$line" '$2 == user { print }' session.log | sort -k3,3 | tail -n 1
    done
    rm -f store.txt

    So an alternative approach based on your shell loop:

    cut -d ' ' -f2 <session.log | sort -u |
    while read username; do
        awk -v user="$username" '$2 == user { print }' session.log |
        sort -k 3,3 | tail -n 1
    done

    Again, the { print } may be left out from the awk script above.

    This still sorts a subset of the session log once for each user, which is wasteful.

    Pre-sorting the log by decreasing dates:

    sort -k3,3r -o sorted.log session.log

    cut -d ' ' -f2 <session.log | sort -u |
    while read username; do
        awk -v user="$username" '$2 == user { print; exit }' sorted.log
    done

    rm sorted.log

    This still requires us to scan the log from the top once per user. The natural improvement is to let awk keep track of which users have and have not been seen, which gives you my answer at the top.
