
I have the following csv file:

```
"V1","V2","V3","V4","V5","V6","V7","V8","V9","V10","Class"
65,Female,0.7,0.1,187,16,18,6.8,3.3,0.9,1
62,Male,10.9,5.5,699,64,100,7.5,3.2,0.74,1
62,Male,7.3,4.1,490,60,68,7,3.3,0.89,1
58,Male,1,0.4,182,14,20,6.8,3.4,1,1
72,Male,3.9,2,195,27,59,7.3,2.4,0.4,1
46,Male,1.8,0.7,208,19,14,7.6,4.4,1.3,1
```

I am only interested in the columns V1:age, V2:sex, V8:grade1, V9:grade2.

I would like to create a bash script that outputs the rows where V9 is equal to 3, sorted by sex with the Female rows shown first.

I am a complete beginner with bash scripts. Although I know how to obtain this output from the shell, this is all I could come up with as a bash script:

```bash
#!/usr/bin/env bash
INPUT=./phpOJxGL9.csv
OLDIFS=$IFS
IFS=','
[ ! -f $INPUT ] && { echo "$INPUT file not found"; exit 99; }
echo Grade2 = 3
echo Age Sex Grade2 Grade1
echo '************************'
while read V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
do
    if [ $V9 -eg "3" ]; then
        cut -d',' -f1,2,8,9 | sort -k2 -t','
    fi
done < $INPUT
IFS=$OLDIFS
```

The output should look something like this:

[Image: expected output]

Can anyone help?

  • A shell script is pretty much just a sequence of commands. So if you can solve your problem at the command line, you can solve it in a shell script by using those same commands. Does this help? (commented Oct 30, 2021 at 19:02)
  • You might be better off with a text-processing language like Perl, awk or Python. This task is very easy in Python using the pandas library, and that can be used in a script. – moo (commented Oct 30, 2021 at 19:06)
  • Refine your problem description. "V9 is equal to 3" doesn't match any of your example data. This is difficult in bash, but easy in languages designed for text handling, like Perl or Python. But see the man pages for cut, paste and bash. (commented Oct 30, 2021 at 20:11)
  • I do know that Python is a better language for manipulating CSV files, but I would like to learn bash scripting. I have managed to get the result by adding shell commands to this script. Thank you @roaima (: The only part I am missing is filtering the output based on a column value. The if statement does not work; is it the correct way to output only the rows for which the V9 column value is 3? – Olaola (commented Oct 31, 2021 at 8:38)
  • The script you posted is incomplete: it just ends in mid-air. The line `if [$V9 -eg 3]; then` is junk: the variable should be double-quoted, there must be a space on either side of each square bracket, the equality operator is `-eq`, none of your data is equal to 3, and the shell does not handle real (non-integer) numbers anyway. This cannot be the script you are running. Pass all scripts through shellcheck.net before running them. (commented Oct 31, 2021 at 9:03)
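The point about real numbers is easy to check: `[ "$V9" -eq 3 ]` only accepts integers, so with a value such as 3.3 it reports an "integer expression expected" error instead of simply being false. A minimal sketch of one workaround, assuming numeric (not string) equality is what you want, hands the comparison to awk; the helper name `num_eq` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: exit status 0 if $1 and $2 are numerically equal.
# awk converts both values to numbers, so "3" and "3.0" compare equal,
# while "3.3" and "3" do not.
num_eq() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 == b + 0) }'
}

# For contrast, the test builtin only handles integers:
#   [ "3.3" -eq 3 ]   # prints "integer expression expected", non-zero status
```

Inside the read loop this would become `if num_eq "$V9" 3; then echo "$V1,$V2,$V8,$V9"; fi`. Starting one awk process per line is slow on large files, which is one reason to filter the whole file in a single awk call or to compare V9 as a string, as the answers do.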

2 Answers


Your own bash script is a good start, but using suitable tools can make life easier. Here is an example. Your sample input doesn't have any rows with V9=3, so I have used V9>=3 just to demonstrate the command:

```
tail -n+2 your-input | awk -F, '($9>=3){print $1, $2, $8, $9}' | sort -k2 | awk 'OFS="," {print $1,$2,$3,$4}'
```

which outputs:

```
65,Female,6.8,3.3
58,Male,6.8,3.4
62,Male,7,3.3
62,Male,7.5,3.2
46,Male,7.6,4.4
```

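Explanation: `tail -n+2` simply removes the header line.

The first awk keeps only the rows with V9>=3 and prints columns 1, 2, 8 and 9 separated by spaces. Without a `-t` option, sort splits fields on blanks, so `sort -k2` then sorts on everything from the sex column onward. (sort can also split on commas directly with `-t,`, which would make the round trip through spaces unnecessary.)

The second awk replaces the spaces with commas again.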
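As a variation on this pipeline, sort can split on commas itself with `-t,`, so the data can stay comma-separated the whole way and the second awk is no longer needed. The sketch below keeps the V9>=3 demonstration filter and assumes no quoted field contains a comma; `-k2,2` restricts the sort key to the sex column, and since "Female" sorts before "Male" the female rows come first. The function name `filter_and_sort` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rows of file $1 with V9 >= $2, reduced to V1,V2,V8,V9
# and sorted on the sex column only, so "Female" rows precede "Male" rows.
# Assumes plain comma-separated data (no commas inside quoted fields).
filter_and_sort() {
    local file=$1 min=$2
    # Skip the header, filter numerically on column 9, print 4 columns with commas.
    tail -n +2 "$file" |
        awk -F, -v OFS=, -v min="$min" '$9 >= min + 0 { print $1, $2, $8, $9 }' |
        sort -t, -k2,2
}
```

For example, `filter_and_sort your-input 3`. For the exact condition in the question, change `>=` to `==`. Rows of the same sex are then ordered by sort's fallback comparison of the whole line, so their order can differ from the output above.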


Your script is almost finished. The only thing left is to check, in an if condition, whether V9 is equal to 3. To show the female data first, I'd suggest putting the while loop in a function that takes the gender as its first argument, and then running the function once for each gender.

```bash
#!/usr/bin/env bash
INPUT=phpOJxGL9.csv
OLDIFS=$IFS
IFS=','
[ ! -f $INPUT ] && { echo "$INPUT file not found"; exit 99; }

# Print the rows of $INPUT whose sex is $1 and whose V9 is "3"
function readCsv {
    while read V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
    do
        requiredGender="$1"
        if [[ "$V2" == "$requiredGender" ]]
        then
            if [[ "$V9" == "3" ]]
            then
                echo "$V1,$V2,$V8,$V9"
            fi
        fi
    done < $INPUT
}

echo Grade2 = 3
echo Age Sex Grade2 Grade1
echo '************************'
echo
echo "Women"
echo "--------------"
readCsv "Female"
echo
echo "Men"
echo "--------------"
readCsv "Male"
IFS=$OLDIFS
```

You'll have to make the script executable in order to run it:

```
chmod +x script.sh
./script.sh
```

Keep in mind that the CSV file you provided above doesn't contain a single row where V9 is equal to 3, so running the script above wouldn't output any data. I added these two sample rows:

```
50,Female,,,,,,1,3,,
50,Male,,,,,,1,3,,
```

and this is the script's output:

```
Grade2 = 3
Age Sex Grade2 Grade1
************************

Women
--------------
50,Female,1,3

Men
--------------
50,Male,1,3
```
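If you'd like to keep this pure bash but read the file only once, one alternative is to collect the matching rows into two arrays during a single loop and print the female array first. This is a sketch, not the answer's code: it keeps the string comparison of V9 against "3", sets IFS only for the `read` command instead of globally, and assumes the same plain CSV layout. The function name `read_csv_once` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: one pass over the file given as $1, printing the rows
# whose V9 is the string "3" (as V1,V2,V8,V9), female rows before male rows.
read_csv_once() {
    local file=$1
    local -a female=() male=()
    local V1 V2 V3 V4 V5 V6 V7 V8 V9 rest
    # IFS=, applies only to this read; -r keeps backslashes literal;
    # "rest" absorbs V10, Class and anything after them.
    # The || test also handles a last line without a trailing newline.
    while IFS=, read -r V1 V2 V3 V4 V5 V6 V7 V8 V9 rest || [[ -n $V1 ]]; do
        [[ $V9 == 3 ]] || continue
        case $V2 in
            Female) female+=("$V1,$V2,$V8,$V9") ;;
            Male)   male+=("$V1,$V2,$V8,$V9") ;;
        esac
    done < "$file"

    echo "Women"
    echo "--------------"
    # Guard: printf with an empty array would still print one empty line.
    if ((${#female[@]})); then printf '%s\n' "${female[@]}"; fi
    echo
    echo "Men"
    echo "--------------"
    if ((${#male[@]})); then printf '%s\n' "${male[@]}"; fi
}
```

Usage would be `read_csv_once phpOJxGL9.csv`; with the two sample rows added, it prints the same Women and Men sections as above, without the three header lines.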
