
I have a file called session.log. It contains some text data: the 1st column is a serial number, the 2nd column a username, the 3rd column the last logged-in date, and the 4th column a result.

A user may have logged in multiple times. I need to find the last logged-in date for each unique user, so I wrote a shell script. The output should display the line with the latest logged-in date (3rd column) for each unique user (2nd column), and it should not contain the same username multiple times.

 $ cat session.log
 1 u1 2018-05-19 pass
 2 u2 2018-06-15 pass
 3 u3 2018-05-18 pass
 4 u4 2018-05-17 pass
 5 u2 2018-05-14 pass
 6 u4 2018-07-11 pass
 7 u1 2018-05-16 pass
 8 u3 2018-05-13 pass
 9 u1 2018-08-12 pass
 10 u1 2018-05-10 pass

What I tried:

 ( awk {print} session.log | cut -f2 -d' ' | sort | uniq ) > store.txt
 for line in $(cat "store.txt")
 do
     touch "$line.txt"
     grep "$line" session.log > "$line.txt"
     temp=$(sort -k3 "$line.txt" | awk 'END{print}')
     echo $temp > "$line.txt"
     cat "$line.txt"
 done

Output

 $ ./sort.sh
 9 u1 2018-08-12 pass
 2 u2 2018-06-15 pass
 3 u3 2018-05-18 pass
 6 u4 2018-07-11 pass

The shell script works for the date formats yyyy-mm-dd and yyyy/mm/dd. Is there better code to do the same work? How would this be done using awk?

EDIT:

 $ cat sort.sh
 ( awk {print} session.log | cut -f2 -d' ' | sort | uniq ) > store.txt
 for line in $(cat "store.txt")
 do
     #touch "$line.txt"
     grep "$line" session.log > "$line.txt"
     echo $(sort -k3 "$line.txt" | awk 'END{print}')
     #temp=$(sort -k3 "$line.txt" | awk 'END{print}')
     #echo $temp > "$line.txt"
     #cat "$line.txt"
 done
 rm -f store.txt

    1 Answer
    $ sort -k 3,3r session.log | awk '!seen[$2]++ { print }'
    9 u1 2018-08-12 pass
    6 u4 2018-07-11 pass
    2 u2 2018-06-15 pass
    3 u3 2018-05-18 pass

    (The { print } can be removed completely; I'm only including it to show what happens. The default action is to print the entire input record when the condition is true.)

    This sorts the file that you have, by decreasing dates (the most recent date first). The awk program reads the sorted data and prints out the first entry found for each user. The seen variable is an associative array/hash that uses the usernames as keys. If its value is zero for a user, then that user has not yet been seen, and so the line from the file is printed.
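    To see the `!seen[$2]++` condition in action, here is a minimal sketch on a tiny inline dataset (the file mini.log below is illustrative data, not your actual session.log):

    ```shell
    # Tiny illustrative log: u1 appears twice, u2 once.
    cat > mini.log <<'EOF'
    1 u1 2018-05-19 pass
    2 u2 2018-06-15 pass
    3 u1 2018-08-12 pass
    EOF

    # Sort newest-first on the date column, then keep only the first
    # (i.e. most recent) line seen for each username in column 2.
    sort -k 3,3r mini.log | awk '!seen[$2]++'
    ```

    The first line for u1 after sorting is the 2018-08-12 entry, so only that line survives; the older 2018-05-19 entry increments seen["u1"] past zero and is skipped.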


    Your code with my annotations:

    # get list of unique usernames from log:
    ( awk {print} session.log | cut -f2 -d' ' | sort | uniq ) > store.txt
    # shorter equivalent:
    # cut -d ' ' -f2 <session.log | sort -u >store.txt

    # loop over the usernames:
    for line in $(cat "store.txt")
    do
        # get entries related to this user:
        grep "$line" session.log > "$line.txt"

        # echo the last entry:
        echo $(sort -k3 "$line.txt" | awk 'END{print}')

        # shorter equivalent of both of the above commands:
        # awk -v user="$line" '$2 == user { print }' session.log | sort -k3,3 | tail -n 1
    done
    rm -f store.txt

    So an alternative approach based on your shell loop:

    cut -d ' ' -f2 <session.log | sort -u |
    while read username; do
        awk -v user="$username" '$2 == user { print }' session.log |
        sort -k 3,3 | tail -n 1
    done

    Again, the { print } may be left out from the awk script above.

    This still sorts a subset of the session log once for each user, which is wasteful.

    Pre-sorting the log by decreasing dates:

    sort -k3,3r -o sorted.log session.log

    cut -d ' ' -f2 <session.log | sort -u |
    while read username; do
        awk -v user="$username" '$2 == user { print; exit }' sorted.log
    done

    rm sorted.log

    This still requires us to scan the log from the top once per user. The natural improvement is to let awk keep track of which users have and have not been seen, which gives you my answer at the top.
