I am new to sed and would like to know how to replace a pattern with a different variable every time

I have a txt file that looks like this:

@K3KFV:1:1109:11598:25872
@K3KFV:1:2101:22577:15247
@K3KFV:1:1110:13477:13178
@K3KFV:1:2113:23585:6859
... (etc)

In total there are 200 different lines. In addition I have another file:

ASF356_KB822565.1:1065516-1065795
TAGGTCAAGCCCTCGGTCTATTAGTATTGGTCAGCTTAATACATTGCTGCACTTACACCT
CCAACCTATCTACCTTGTTGTCTTCAAGGGACCTTACTCACTTGCGTTTTGGGATATCTT
ASF356_KB822565.1:1065796-1066075
CGGATAGGGACCGAACTGTCTCACGACGTTCTGAACCCAGCTCGCGTACCGCTTTAATGG
GCGAACAGCCCAACCCTTGGGACCTACTTCAGCCCCAGGATGCGATGAGCCGACATCGAG
ASF356_KB822565.1:1066076-1066355
CCTTTTGCCTTTACACTCTTTGAATGGTTTCCAATCATTCTGAGGTGACCTTCGAGCGCC
TCCGTTACTCTTTTGGAGGCGACCGCCCCAGTCAAACTGCCCGCCTGACATTGTCCTTCA

which also contains 200 instances of "ASF....."

What I want is to replace the line containing "ASF..." with one from "@K3KFV:....." so in the end it will look like:

@K3KFV:1:1109:11598:25872
TAGGTCAAGCCCTCGGTCTATTAGTATTGGTCAGCTTAATACATTGCTGCACTTACACCT
CCAACCTATCTACCTTGTTGTCTTCAAGGGACCTTACTCACTTGCGTTTTGGGATATCTT
@K3KFV:1:2101:22577:15247
CGGATAGGGACCGAACTGTCTCACGACGTTCTGAACCCAGCTCGCGTACCGCTTTAATGG
GCGAACAGCCCAACCCTTGGGACCTACTTCAGCCCCAGGATGCGATGAGCCGACATCGAG
@K3KFV:1:1110:13477:13178
CCTTTTGCCTTTACACTCTTTGAATGGTTTCCAATCATTCTGAGGTGACCTTCGAGCGCC
TCCGTTACTCTTTTGGAGGCGACCGCCCCAGTCAAACTGCCCGCCTGACATTGTCCTTCA

This is the shell script I have so far:

input="K3KFVfile.txt"
while IFS= read -r title
do
  sed '/ASF/c'$title'' ASF_file
done < "$input"

But instead of giving me 200 lines of @K3KFV... I get 40000 because each ASF line gets replaced with every single one of the @K3KFV lines.
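
A scaled-down sketch of what seems to be happening, using made-up file names and contents (`titles.txt`, `records.txt`) and a plain `s///` substitution standing in for the `c` command. Every pass of the loop makes sed print the whole records file again, so 3 titles and 3 headers give 9 title lines instead of 3:

```shell
# Hypothetical miniature inputs: three titles and three records.
tmp=$(mktemp -d)
printf '%s\n' '@T1' '@T2' '@T3' > "$tmp/titles.txt"
printf '%s\n' 'ASF_a' 'AAAA' 'ASF_b' 'CCCC' 'ASF_c' 'GGGG' > "$tmp/records.txt"

# Same shape as the loop above: one full sed pass over the file per title.
# s/// is an assumption here, standing in for the one-line c command.
looped=$(while IFS= read -r title; do
    sed "s/^ASF.*/$title/" "$tmp/records.txt"
done < "$tmp/titles.txt")

# Count the title lines that came out: 3 passes x 3 headers per pass.
header_count=$(printf '%s\n' "$looped" | grep -c '^@')
echo "title lines printed: $header_count"

rm -rf "$tmp"
```

With 200 titles and 200 headers the same arithmetic gives the 40000 lines I am seeing.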

Is there a way to use sed to replace a pattern using a variable only once before moving on? Is sed the correct command to use in this case?

  • I added the blank lines because of formatting. Thanks for the suggestion, I'll edit it for clarification.
    – Jasmine
    Commented Mar 16, 2022 at 18:41
  • Thanks! Much clearer.
    Commented Mar 16, 2022 at 18:49

3 Answers

If you have the GNU implementation of sed, you could use the (uppercase) R command - one of its Commands Specific to GNU sed - to read and insert a single line of the first file each time it matches a line starting with ASF in the second. Then delete the matched line:

$ sed '/^ASF/{
R K3KFVfile.txt
d
}' ASF_file
@K3KFV:1:1109:11598:25872
TAGGTCAAGCCCTCGGTCTATTAGTATTGGTCAGCTTAATACATTGCTGCACTTACACCT
CCAACCTATCTACCTTGTTGTCTTCAAGGGACCTTACTCACTTGCGTTTTGGGATATCTT
@K3KFV:1:2101:22577:15247
CGGATAGGGACCGAACTGTCTCACGACGTTCTGAACCCAGCTCGCGTACCGCTTTAATGG
GCGAACAGCCCAACCCTTGGGACCTACTTCAGCCCCAGGATGCGATGAGCCGACATCGAG
@K3KFV:1:1110:13477:13178
CCTTTTGCCTTTACACTCTTTGAATGGTTTCCAATCATTCTGAGGTGACCTTCGAGCGCC
TCCGTTACTCTTTTGGAGGCGACCGCCCCAGTCAAACTGCCCGCCTGACATTGTCCTTCA

You can write it as a one-liner if you prefer:

sed -e '/^ASF/{R K3KFVfile.txt' -e 'd}' ASF_file 
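
To see the R/d pairing on something small, here is a minimal sketch that assumes GNU sed and uses made-up two-record inputs (`titles.txt` and `records.txt` are hypothetical names, not the files from the question). Each match pulls exactly one further line from the titles file, which is what keeps the two files in step:

```shell
# Hypothetical miniature inputs.
tmp=$(mktemp -d)
printf '%s\n' '@T1' '@T2' > "$tmp/titles.txt"
printf '%s\n' 'ASF_a' 'AAAA' 'ASF_b' 'CCCC' > "$tmp/records.txt"

# GNU sed only: R queues the next unread line of titles.txt for output
# at the end of the cycle, and d then drops the matched ASF line.
r_out=$(sed -e "/^ASF/{R $tmp/titles.txt" -e 'd}' "$tmp/records.txt")
printf '%s\n' "$r_out"

rm -rf "$tmp"
```

This should print `@T1`, `AAAA`, `@T2`, `CCCC`, one per line.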

Alternatively you could consider using awk:

awk 'NR==FNR{K[FNR] = $0; next} /^ASF/{$0 = K[++n]} 1' K3KFVfile.txt ASF_file 
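
The same awk logic spelled out with comments, as a sketch on made-up miniature inputs (the file names and the `title` array name are illustrative only). The first file is loaded into an array, then each ASF header in the second file is swapped for the next stored entry:

```shell
# Hypothetical miniature inputs.
tmp=$(mktemp -d)
printf '%s\n' '@T1' '@T2' > "$tmp/titles.txt"
printf '%s\n' 'ASF_a' 'AAAA' 'ASF_b' 'CCCC' > "$tmp/records.txt"

awk_out=$(awk '
    # First file: NR (overall count) equals FNR (per-file count),
    # so remember each title by its line number and skip to the next line.
    NR == FNR { title[FNR] = $0; next }
    # Second file: replace each ASF header with the next stored title.
    /^ASF/ { $0 = title[++n] }
    # Print every line of the second file, replaced or not.
    { print }
' "$tmp/titles.txt" "$tmp/records.txt")
printf '%s\n' "$awk_out"

rm -rf "$tmp"
```

Note that if the titles file has fewer lines than there are ASF headers, the extra headers would be replaced by empty lines, since the array lookup then yields an empty string.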

This answer is a bit of a riff on @steeldriver's.

If the blank lines in the ASF_file are truly empty (no whitespace), then this awk command would work:

awk '
    NR == FNR {x[FNR] = $0; next}
    {$1 = x[FNR]; print}
' K3KFVfile.txt RS='' ORS='\n\n' FS='\n' OFS='\n' ASF_file

The variable assignments sit between the two file names, so awk applies them only after the 1st file has been read and before it starts reading the 2nd. They control how the records and fields are determined in that 2nd file. I'm usually not a fan of this style, but it works well here. This GNU awk version is a little tidier:

gawk '
    NR == FNR {x[FNR] = $0; next}
    ENDFILE {RS = ""; ORS = "\n\n"; FS = OFS = "\n"}
    {$1 = x[FNR]; print}
' K3KFVfile.txt ASF_file
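
A small sketch of the paragraph-mode idea, on made-up inputs whose records are separated by one truly empty line (the file names and contents are hypothetical). With `RS` empty, each blank-line-separated block is one record, and with `FS` set to a newline the header is field 1 of that record:

```shell
# Hypothetical miniature inputs; records.txt has an empty line between records.
tmp=$(mktemp -d)
printf '%s\n' '@T1' '@T2' > "$tmp/titles.txt"
printf '%s\n' 'ASF_a' 'AAAA' 'TTTT' '' 'ASF_b' 'CCCC' 'GGGG' > "$tmp/records.txt"

para_out=$(awk '
    # First file, read line by line: store the titles.
    NR == FNR { title[FNR] = $0; next }
    # Second file, read block by block: FNR is now the block number,
    # and $1 is the first line of the block (the ASF header).
    { $1 = title[FNR]; print }
' "$tmp/titles.txt" RS='' FS='\n' OFS='\n' ORS='\n\n' "$tmp/records.txt")
printf '%s\n' "$para_out"

rm -rf "$tmp"
```

This should print the two blocks with `@T1` and `@T2` as their first lines and an empty line between them. If the separator lines contain spaces or tabs, the blocks would not split this way.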

Using awk:

awk '/^ASF/ {getline < "K3KFVfile.txt"};1' ASF_file

Same thing in Perl:

perl -pe 's/^ASF.*/<STDIN>/se' ASF_file < K3KFVfile.txt

Using POSIXly sed:

sed -n '
  /\n/bh
  1{
    :k3
    H;1h;n
    /^@K3KFV/bk3
  }
  /^ASF/g
  P;/\n.*\n/D
  s/.*\n//;th
  d;:h
  h
' K3KFVfile.txt ASF_file

Using a list comprehension in Python:

python3 -c 'import sys
a,b = sys.argv[1:]
with open(a) as f, open(b) as g:
  print(*[next(f) if l.startswith("ASF") else l for l in g],sep="",end="")
' K3KFVfile.txt ASF_file

Output:

@K3KFV:1:1109:11598:25872
TAGGTCAAGCCCTCGGTCTATTAGTATTGGTCAGCTTAATACATTGCTGCACTTACACCT
CCAACCTATCTACCTTGTTGTCTTCAAGGGACCTTACTCACTTGCGTTTTGGGATATCTT
@K3KFV:1:2101:22577:15247
CGGATAGGGACCGAACTGTCTCACGACGTTCTGAACCCAGCTCGCGTACCGCTTTAATGG
GCGAACAGCCCAACCCTTGGGACCTACTTCAGCCCCAGGATGCGATGAGCCGACATCGAG
@K3KFV:1:1110:13477:13178
CCTTTTGCCTTTACACTCTTTGAATGGTTTCCAATCATTCTGAGGTGACCTTCGAGCGCC
TCCGTTACTCTTTTGGAGGCGACCGCCCCAGTCAAACTGCCCGCCTGACATTGTCCTTCA
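
One caveat about the awk `getline` form: when the titles file runs out, `getline` returns 0 and leaves the record untouched, so any remaining ASF headers would pass through unchanged. A hedged variant that checks the return value, sketched on made-up miniature inputs (the file names and the `titles` and `line` variable names are illustrative):

```shell
# Hypothetical miniature inputs.
tmp=$(mktemp -d)
printf '%s\n' '@T1' '@T2' > "$tmp/titles.txt"
printf '%s\n' 'ASF_a' 'AAAA' 'ASF_b' 'CCCC' > "$tmp/records.txt"

getline_out=$(awk -v titles="$tmp/titles.txt" '
    /^ASF/ {
        # getline returns 1 on success, 0 at end of file, -1 on error.
        if ((getline line < titles) > 0)
            $0 = line
        else {
            print "ran out of titles at line " FNR > "/dev/stderr"
            exit 1
        }
    }
    { print }
' "$tmp/records.txt")
printf '%s\n' "$getline_out"

rm -rf "$tmp"
```

Whether to stop, warn, or keep the original header in that situation is a design choice. Stopping is used here only because a mismatch in counts probably means the inputs are wrong.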
