I am trying to automate running a Python command with arguments on multiple files (one at a time), writing the output to directories that have the same name as the input file but without the extension.

The files are FASTA files containing the proteins from bacterial genomes. The Python routine is designed to extract the proteins with certain properties and print various outputs in the directory /myruns/test. When I run it on a single genome, I replace test with the name of that genome.

Example:

The FASTA file XYZ.faa will be taken as input and the output files will be placed in /myruns/XYZ.

Manually it works very well, but I want to run it in batch over multiple *.faa files and create directories with their corresponding names; otherwise the last file processed will overwrite the output of the previous one.

So far I have made the following script:

    #!/bin/sh
    for filename in *.faa ; do
        python predict_genome.py \
            --fasta_path /Users/mvalvano/DeepSecE/myruns/${filename} \
            --model_location /Users/mvalvano/DeepSecE/model/checkpoint.pt \
            --data_dir data \
            --out_dir myruns/test --save_attn --no_cuda
    done
    exit 0

This script works, and the output files are saved in a test directory that is specified in the --out_dir argument. My question is: how can I replace test in the --out_dir argument with something that names the directory after the input file? I have tried a few options, but they do not seem to work.

Thanks Mike

  • Please add a small example of the input files you have and how the resulting output files and directories should be named.
    – Bodo
    Commented Aug 30, 2024 at 17:01
  • Maybe minor, but your script is running /bin/sh while the question is tagged bash. Do you need a script strictly compatible with /bin/sh?
    – doneal24
    Commented Aug 30, 2024 at 17:05
  • Does replacing myruns/test with myruns/$(basename -s .faa "$filename") do what you need? Are whitespace or special characters possible in the file names? As others have said, without knowing exactly what you want and tried, and without an example of the input and desired output, we're guessing here.
    – doneal24
    Commented Aug 30, 2024 at 17:11
  • Always double quote variables (or strings containing variables).
    Commented Aug 30, 2024 at 17:21
  • I tried to fix the formatting. Your directory specifications are not clear. /myruns/test would be an absolute directory, but in your script you wrote myruns/test, which is relative to the current working directory. That is not necessarily the same as the location of your script and not necessarily /Users/mvalvano/DeepSecE/myruns. I suggest adding an example that makes your directory structure and the location of all files and scripts clear.
    – Bodo
    Commented Sep 2, 2024 at 11:55

2 Answers

Replace test with "${filename%.faa}" to get the name of the file with .faa removed. You should also quote "${filename}" to avoid problems in case of filenames with spaces.

    #!/bin/sh
    for filename in *.faa ; do
        python predict_genome.py \
            --fasta_path /Users/mvalvano/DeepSecE/myruns/"${filename}" \
            --model_location /Users/mvalvano/DeepSecE/model/checkpoint.pt \
            --data_dir data \
            --out_dir myruns/"${filename%.faa}" \
            --save_attn --no_cuda
    done
    exit 0

With input files

    bar.faa
    foo.faa

the script will run

    python predict_genome.py --fasta_path /Users/mvalvano/DeepSecE/myruns/bar.faa --model_location /Users/mvalvano/DeepSecE/model/checkpoint.pt --data_dir data --out_dir myruns/bar --save_attn --no_cuda
    python predict_genome.py --fasta_path /Users/mvalvano/DeepSecE/myruns/foo.faa --model_location /Users/mvalvano/DeepSecE/model/checkpoint.pt --data_dir data --out_dir myruns/foo --save_attn --no_cuda
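To see what the "${filename%.faa}" expansion does in isolation, here is a minimal sketch. The helper name strip_faa is made up for illustration; only the ${var%suffix} expansion itself is standard POSIX sh.

```shell
#!/bin/sh
# strip_faa is a hypothetical helper, not part of the original script.
# ${1%.faa} removes a trailing ".faa" from the first argument, if present.
strip_faa() {
    printf '%s\n' "${1%.faa}"
}

strip_faa "XYZ.faa"          # prints: XYZ
strip_faa "my genome.faa"    # prints: my genome (the quotes keep the space intact)
strip_faa "notes.txt"        # prints: notes.txt (no .faa suffix, nothing removed)
```

Because the expansion is done by the shell itself, no external command such as basename is needed.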

Possible problems with this script:

Since you specify --fasta_path /Users/mvalvano/DeepSecE/myruns/"${filename}", your script will only work without error if the current directory is /Users/mvalvano/DeepSecE/myruns/ or if this directory contains at least the same set of *.faa files as the current directory. (*.faa will expand to the file names in the current directory.)

When /Users/mvalvano/DeepSecE/myruns/ is the current directory, the argument --out_dir myruns/foo might expect or create a directory /Users/mvalvano/DeepSecE/myruns/myruns/foo with double myruns.
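One way to sidestep both problems while staying in /bin/sh is to build every path from a single absolute base directory and create the output directory before calling Python. The following is only a sketch under assumptions: that predict_genome.py and the data directory live directly under /Users/mvalvano/DeepSecE, and that the program may not create --out_dir itself. The helper out_dir_for is a made-up name.

```shell
#!/bin/sh
# Sketch only. topdir and the layout under it are assumptions taken from the
# paths in the question; adjust them to the real installation.
topdir=/Users/mvalvano/DeepSecE

# out_dir_for (hypothetical helper): print the output directory for one input
# file, i.e. the same absolute path with the trailing ".faa" removed.
out_dir_for() {
    printf '%s\n' "${1%.faa}"
}

for fasta in "$topdir"/myruns/*.faa; do
    # If nothing matches, sh leaves the pattern unexpanded; skip that case.
    [ -e "$fasta" ] || continue

    outdir=$(out_dir_for "$fasta")

    # Create the directory up front in case the Python program does not.
    mkdir -p -- "$outdir" || continue

    python "$topdir/predict_genome.py" \
        --fasta_path "$fasta" \
        --model_location "$topdir/model/checkpoint.pt" \
        --data_dir "$topdir/data" \
        --out_dir "$outdir" \
        --save_attn --no_cuda
done
```

With this layout the script can be started from any working directory, and the output for XYZ.faa lands in /Users/mvalvano/DeepSecE/myruns/XYZ.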

  • Thank you, but this did not work. The output directory cannot be created and the program fails.
    – Mike
    Commented Aug 30, 2024 at 18:20
  • @mike If the script does not work, copy and paste the error message. Your requirements and the existing files and directories are not clear in your question.
    – Bodo
    Commented Sep 2, 2024 at 11:56

Maybe it would make more sense to write it as:

    #! /bin/zsh -
    topdir=/Users/mvalvano/DeepSecE
    ret=0
    for filename in $topdir/myruns/*.faa(N); do
      outdir=$filename:r
      mkdir -p -- $outdir &&
        python -- $topdir/predict_genome.py \
          --fasta_path $filename \
          --model_location $topdir/model/checkpoint.pt \
          --data_dir $topdir/data \
          --out_dir $outdir \
          --save_attn --no_cuda || ret=$?
    done
    exit $ret

Here we use only absolute paths, which removes any doubt about what relative paths are relative to.

(Here I switch to zsh, since /Users suggests macOS, for its :r (root name) modifier (from csh), its N (nullglob) glob qualifier, and to remove the need to quote all expansions.)
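For readers who have to stay with plain sh, the two zsh features can be approximated as in the sketch below. The function names root_name and count_faa are invented for illustration. Note that "${1%.*}" is only a rough stand-in for :r: it strips everything from the last dot onward, so it misbehaves when a directory name contains a dot and the file itself has no extension.

```shell
#!/bin/sh
# root_name (hypothetical): rough POSIX sh stand-in for zsh's $var:r.
# Removes the shortest trailing ".something" from the argument.
root_name() {
    printf '%s\n' "${1%.*}"
}

# count_faa (hypothetical): count the *.faa files in directory $1.
# The [ -e ] test plays the role of zsh's (N) qualifier: when the glob
# matches nothing, sh passes the literal pattern through, and we skip it.
count_faa() {
    n=0
    for f in "$1"/*.faa; do
        [ -e "$f" ] || continue
        n=$((n + 1))
    done
    printf '%s\n' "$n"
}
```

In sh, unlike zsh, every expansion here is double-quoted so that file names containing spaces are passed through unchanged.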

  • Thank you Stéphane; your modified script works like a charm. Much appreciated!
    – Mike
    Commented Aug 31, 2024 at 9:16
  • If one of the answers here solved your issue, @Mike, please take a moment and accept it by clicking on the checkmark on the left. That is the best way to express your thanks on the Stack Exchange sites.
    – terdon
    Commented Aug 31, 2024 at 11:23
