
I am trying to get data from a file that is like this:

    6 6 1 0
    0.1166667E+02 0.4826611E-09 0.4826611E-09 0.3004786E-09 0.5000000E-15
    1.000000000000000E-004
    CAR
    system-001
    10.51965443 -34.96542345 301 1.95329810 1.00000000
    -15.558 0.1631E+01 0.1597E+02
    -15.407 0.1661E+02 0.1779E+02
    -15.255 0.4253E+01 0.1990E+02
    -15.104 0.0000E+00 0.2000E+02
    -14.952 0.0000E+00 0.2000E+02
    -3.884 0.0000E+00 0.2000E+02
    -3.732 0.0000E+00 0.2000E+02
    -3.581 0.0000E+00 0.2000E+02
    -3.429 0.0000E+00 0.2000E+02
    -3.277 0.8214E-03 0.2000E+02
    -3.126 0.3543E+00 0.2002E+02
    1.726 0.1019E+01 0.4386E+02
    1.877 0.5581E+00 0.4399E+02
    2.029 0.0000E+00 0.4400E+02
    2.181 0.0000E+00 0.4400E+02
    2.332 0.0000E+00 0.4400E+02
    2.484 0.0000E+00 0.4400E+02
    2.636 0.0000E+00 0.4400E+02
    2.787 0.0000E+00 0.4400E+02
    2.939 0.0000E+00 0.4400E+02
    3.090 0.0000E+00 0.4400E+02
    3.242 0.0000E+00 0.4400E+02
    3.394 0.0000E+00 0.4400E+02
    3.545 0.0000E+00 0.4400E+02
    3.697 0.0000E+00 0.4400E+02
    3.849 0.0000E+00 0.4400E+02
    4.000 0.0000E+00 0.4400E+02
    4.152 0.6271E-01 0.4400E+02
    4.303 0.4520E+01 0.4433E+02
    4.455 0.5040E+01 0.4511E+02

I always want to take the fourth column of line 6 (1.95329810 in this case) and then find the closest value to it in the first column of the following lines (1.877 in this case). That value is only a reference point: once it is found, I want to extract the first column of the next line whose second column is non-zero (4.152).

So I would like to get 1.95329810 and 4.152 as output, so that I can subtract them and get:

band_gap=4.152-$fermi_energy 

Following @DopeGhoti's answer, I used his code with an if statement:

    #!/bin/bash
    fermi_energy=$(awk 'NR==6 {printf $4}' DOSCAR-62.4902421.st)
    awk -f go.awk DOSCAR-62.4902421.st

Where the go.awk file is:

    BEGIN {
        test=0
    }
    NF == 3 && test == 0 && $2 != "0.0000E+00" {
        keptvalue=$1
    }
    NF == 3 && test == 0 && $2 == "0.0000E+00" {
        #print keptvalue
        test=1
    }
    NF == 3 && test == 1 && $2 != "0.0000E+00" {
        if ( sqrt(($fermi_energy-$1)**2) < 0.5 ) {
            print $1
            test=0
        }
    }

But I think that it is not the right way to use bash variables inside an awk script.
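One common way to hand a shell value to awk is the `-v` option, which assigns it to an awk variable before the program runs. The following is only a minimal sketch under assumptions: it uses a cut-down copy of the data written to a temporary file, and the names `sample`, `fe` and `near` are illustrative, as is reusing the 0.5 window from the script above.

```shell
#!/bin/bash
# Hypothetical cut-down sample standing in for the real DOSCAR file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
6 6 1 0
0.1166667E+02 0.4826611E-09 0.4826611E-09 0.3004786E-09 0.5000000E-15
1.000000000000000E-004
CAR
system-001
10.51965443 -34.96542345 301 1.95329810 1.00000000
1.726 0.1019E+01 0.4386E+02
1.877 0.5581E+00 0.4399E+02
2.029 0.0000E+00 0.4400E+02
4.152 0.6271E-01 0.4400E+02
EOF

# Fourth field of line 6, stored in a shell variable.
fermi_energy=$(awk 'NR==6 { print $4 }' "$sample")

# -v copies the shell value into an awk variable called fe.
# Inside the awk program it is written fe, without a dollar sign.
near=$(awk -v fe="$fermi_energy" '
    NR > 6 {
        d = fe - $1
        if (d < 0) d = -d          # absolute difference
        if (d < 0.5) print $1      # rows within 0.5 of the Fermi energy
    }' "$sample")

rm -f "$sample"
echo "$fermi_energy"
echo "$near"
```

Inside a single-quoted awk program, `$fermi_energy` is not the shell variable: awk reads it as a field reference through its own (unset) variable `fermi_energy`, which evaluates to `$0`, the whole line.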

P.S. In case you are wondering, the data represents a calculation of the density of states of the electrons of an oxide. The first column is the electron energy, and the second is the number of electrons at that energy level. Therefore, by finding the next non-'0.0000E+00' value after the level closest to the Fermi energy, we can calculate the energy required to make the electrons jump and conduct electricity. (Metals have zero band gap, so they do not need any energy input to conduct electricity.)

  • 1
    How do you define "nearest"? Isn't 2.029 nearer than 1.877 to 1.95329810, in absolute terms? Commented Feb 28, 2018 at 0:44
  • @steeldriver you are right, but as you may have read, I do not need to consider that value, because it has '0.0000E+00' in its second column. Commented Feb 28, 2018 at 2:39

2 Answers

3

The answer below makes a number of changes to your technique.

  1. Do it all in a single awk program instead of two. You can do that because your second run only deals with lines after line 6.

  2. Properly assign your fermi_energy value from line 6.

  3. There is no longer any need to check for NF==3, because all lines after line 6 fulfill that criterion.

  4. Eliminate the variable test, and instead keep a running tab of the minimum difference between fermi_energy and $1. For that, we will create a variable min which initially holds a ridiculously large value, so that the first comparison is guaranteed to replace it. We'll also assign understandable names to the other variables, and print only one result, after testing all lines of the file.

  5. Replace your computationally heavy absolute value test (the square root of a square) with a computationally light comparison against zero.

  6. Note that awk supports floating point scientific notation. For example, in a printf command, one may use the format %E. As usual, see the man page or your favorite search engine for more.

  7. All this done with no understanding of particle physics, so I may have got something incorrect. Pardon. If so, I hope at least this puts you on the right track.

    awk '
      BEGIN { min=1000 ; jump_energy="INIT" }
      function abs(v) {return v < 0 ? -v : v}
      NR==6 {fermi_energy=$4}
      NR>6 {
        if (jump_energy != 0) {
          this_diff=abs(fermi_energy-$1)
          if (this_diff < min) {
            min=this_diff
            energy_level=0
            jump_energy=0
            getline
          }
        }
        if (jump_energy == 0 && $2 != "0.0000E+00") {
          energy_level=$1
          jump_energy=$2
        }
      }
      END {
        printf " Fermi Energy: %f\n Energy Level: %f\n Jump Energy: %E\n",
          fermi_energy, energy_level, jump_energy
      }'
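To see what the `%f` and `%E` specifiers from item 6 do to the values in this file, here is a small sketch that feeds one row from the question through them. The variable names `line` and `formatted` are only illustrative.

```shell
# One data row copied from the question.
line='4.152 0.6271E-01 0.4400E+02'

# %f prints a fixed-point number, %E prints scientific notation.
formatted=$(printf '%s\n' "$line" | awk '{ printf "%f %E\n", $1, $2 }')

echo "$formatted"
```

awk normalises the scientific notation, so `0.6271E-01` comes out as `6.271000E-02`.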
  • 1
    @JoshuaSalazar - yes! So sorry. I had the if statement syntax wrong; I thought I could nest address ranges, like I remember in sed. Commented Feb 28, 2018 at 3:54
  • 1
    @JoshuaSalazar - I would like to, but I don't know enough about what literature is out there and current to be the best person to make a recommendation. Also, the code has been updated. Commented Feb 28, 2018 at 4:08
  • 1
    @JoshuaSalazar - yes, corrected in cross-talk already. See update. My other error was using the incorrect format specifiers: %s is for strings, and we want float and scientific notation. My bad, I should have originally tested with the actual data instead of just coding the answer. Note how awk adjusts the scientific notation of the jump energy so that the first digit is never zero. Commented Feb 28, 2018 at 4:19
  • 1
    @JoshuaSalazar - Hmm, my results are: Fermi Energy: 1.953298 Energy Level: 4.152000 Jump Energy: 6.271000E-02 Commented Feb 28, 2018 at 4:46
  • 1
    @JoshuaSalazar : I think I see a difference. My version has ` NR>6 {` with the brace on the same line, not on a dedicated separate line. Try that, please. Commented Feb 28, 2018 at 4:54
0
awk 'NR == 6 { fe = $4 } NR > 6 && $1 > fe && $2 > 0 { print fe, $1; exit }' file 

For the given input data in file this will produce

1.95329810 4.152 

The awk script ignores the first five lines of input. At line six, it picks out the fourth field and assigns it to the variable fe (short for "Fermi energy").

The code then assumes that the values in column one are increasing. At the first line whose first-column value exceeds the value stored in fe and whose second column is non-zero, it prints fe and that first-column value, then exits.
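If only the difference is wanted (the band_gap from the question), the same test can print `$1 - fe` directly. This is a sketch under assumptions: it runs against a cut-down copy of the data in a temporary file, and the `sample` name and the three-decimal `%.3f` format are illustrative choices.

```shell
#!/bin/bash
# Hypothetical cut-down sample standing in for the real input file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
6 6 1 0
0.1166667E+02 0.4826611E-09 0.4826611E-09 0.3004786E-09 0.5000000E-15
1.000000000000000E-004
CAR
system-001
10.51965443 -34.96542345 301 1.95329810 1.00000000
1.726 0.1019E+01 0.4386E+02
1.877 0.5581E+00 0.4399E+02
2.029 0.0000E+00 0.4400E+02
4.000 0.0000E+00 0.4400E+02
4.152 0.6271E-01 0.4400E+02
4.303 0.4520E+01 0.4433E+02
EOF

# Same conditions as the one-liner above, but print the difference
# between the matching first-column value and the Fermi energy.
band_gap=$(awk '
    NR == 6 { fe = $4 }
    NR > 6 && $1 > fe && $2 > 0 { printf "%.3f\n", $1 - fe; exit }
' "$sample")

rm -f "$sample"
echo "band_gap=$band_gap"
```

Doing the subtraction inside awk avoids a separate floating-point step in bash, which has no native floating-point arithmetic.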

Unfortunately, I don't fully understand your longer code segment as there is no explanation of what you actually want it to do.
