For-loop - appending to arrays with iterator in the array name

Question

I have the following problem. I have an array arr with some values. I want to sort each value into one of a set of different, already declared, arrays earr$j, i.e. arr[0] into earr1, arr[1] into earr2 and, in general, arr[j-1] into earr$j. (Later, I will append the elements of similar arrs as the next elements of the target earr$j arrays.) I have tried doing so with the following snippet of code (which is part of a larger piece of code):

    for j in $(seq 1 $number_of_elements); do earr$j+=(${arr[j-1]}); done

I have been told (see my post "https://unix.stackexchange.com/questions/675454/for-loop-and-appending-over-list-of-arrays") it looks as though I intend to create a 2-D array (which Bash does not support). I stress that this is not my intention, regardless of what the result of my poor use of Bash syntax may suggest. I am reposting this as my old post really described the issue poorly.

Given you're using the array assignment syntax (with the parens), earrN must be arrays themselves, and if we look at a set of variables named foo1, foo2, etc. that does look a lot like an array in itself. Though you're right, it's not a 2D-array, but an array of arrays, since it doesn't need to be rectangular. — ilkkachu, Commented Oct 30, 2021 at 22:32
Use perl. Or python. Or awk. Or any language that isn't shell. Shell is a terrible language for doing data processing in. Perl, for example, supports arrays-of-arrays (AoA), arrays-of-hashes (AoH), as well as HoAs and HoHs, and multi-level data structures based on them nested as many levels deep as you need. It will be much easier, and your code will run much faster. — cas, Commented Oct 31, 2021 at 10:22
BTW, one of the benefits of arrays in shell implementations that support them (bash, ksh, zsh, etc.) is that you don't have to do that ugly and human-error-prone variable indirection (like earr$j) any more, so why even do that? IMO you probably need to rethink your data structure from the ground up. And, as I said above, use a language that's actually suited to processing data instead of a language that's suited to co-ordinating the execution of other programs to do the data-processing work. — cas, Commented Oct 31, 2021 at 10:25
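If you do stay in bash, one way to avoid building variable names for eval is a nameref (declare -n), available in bash 4.3 and later. This was not suggested in the thread; it is a minimal sketch assuming bash 4.3 or newer, with made-up sample values in arr, that follows the question's mapping of arr[j-1] into earr$j.

```shell
#!/usr/bin/env bash
# Sketch: append arr[j-1] to earr$j through a nameref instead of eval.
# Assumes bash >= 4.3 (declare -n). The sample values are hypothetical.
arr=( 'first value' 'second' 'third' )
earr1=()
earr2=()
earr3=()

for (( j = 1; j <= ${#arr[@]}; j++ )); do
  # Make "target" an alias for the array whose name is earr$j.
  declare -n target="earr$j"
  # Appending through the alias appends to earr$j itself.
  target+=( "${arr[j-1]}" )
  # Remove the alias (not the array it points to) before the next iteration.
  unset -n target
done

declare -p earr1 earr2 earr3
```

Unlike eval, the nameref never parses the element values as shell code; only the name earr$j has to be a valid variable name, and bash rejects the declaration if it is not.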
I thank both of you for answering. I'll give it all some thought. It's odd, however, isn't it, that I cannot just access the value in the j-th element of an array and append it to another array (which happens to have j in its name)? — UserAthos, Commented Oct 31, 2021 at 12:05
No, that will give you a syntax error: the shell only recognizes an assignment when the part before the = (or +=) is a literal variable name. It won't construct a name from the fixed string earr and the value of $j, so earr$j+=(...) is not treated as an assignment and the ( that follows is a syntax error. Try eval "earr$j+=(1)" instead. Note that eval is potentially dangerous and should be used with caution: it tells your shell to evaluate the string and execute it as code. eval has its uses, but most of those uses are awkward attempts to work around a deficiency in the shell. Have I suggested using a better language yet? :-) — cas, Commented Oct 31, 2021 at 13:31
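Applied to the loop from the question, the eval approach might look like the following minimal sketch. The values in arr are made up, and the second round stands in for the "later, append similar arrs" step described in the question. Only the name earr$j is spliced into the string; ${arr[j-1]} stays inside single quotes so that eval itself expands it as data rather than pasting its text into the code.

```shell
#!/usr/bin/env bash
# Sketch: the question's loop made to work with eval (sample data is hypothetical).
arr=( 10 20 30 )
number_of_elements=${#arr[@]}
earr1=()
earr2=()
earr3=()

# First round: arr[j-1] goes into earr$j.
for j in $(seq 1 "$number_of_elements"); do
  # For j=1, eval runs:  earr1+=( "${arr[j-1]}" )
  eval 'earr'"$j"'+=( "${arr[j-1]}" )'
done

# Later round: a similar array is appended element-wise to the same targets.
arr=( 11 21 31 )
for j in $(seq 1 "$number_of_elements"); do
  eval 'earr'"$j"'+=( "${arr[j-1]}" )'
done

declare -p earr1 earr2 earr3
```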

Stéphane Chazelas · Accepted Answer · 2021-10-31 21:25:23Z

To answer the question literally, here, that's typically a job for eval:

    for i in "${!arr[@]}"; do
      eval '
        earr'"$i"'+=( "${arr[i]}" )
      '
    done

eval is dangerous, but safe if used properly. A good approach to limit the risk of mistakes is to quote everything with single quotes except the parts that definitely need to undergo some expansion, and to make sure the part that is not within single quotes (here $i, which is in double quotes instead and will be expanded to the contents of the i variable) is fully under your control. In this case, we know $i will contain only digits, so that's not random data that eval would evaluate as shell code (compare with ${arr[i]}, which you definitely don't want to leave out of the single quotes).
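To make the quoting point concrete, here is a small sketch comparing the single-quoted form with a version that leaves ${arr[i]} outside the single quotes. The sample values and the target names good$i and bad$i are hypothetical; as in the answer, the index itself is used in the name, so arr[0] goes into good0.

```shell
#!/usr/bin/env bash
# Sketch: why ${arr[i]} must stay inside the single quotes given to eval.
# Sample values are made up; the second one looks like shell code.
arr=( 'a b' '$(echo pwned)' )

for i in "${!arr[@]}"; do
  # Safe: eval receives  good0+=( "${arr[i]}" )  and expands the element as data.
  eval 'good'"$i"'+=( "${arr[i]}" )'
  # Unsafe: the element's text is pasted into the code that eval parses,
  # so 'a b' is word-split and $(echo pwned) is executed.
  eval "bad$i+=( ${arr[i]} )"
done

declare -p good0 good1 bad0 bad1
```

With the safe form each target holds exactly one element, identical to the original value; with the unsafe form bad0 ends up with two elements and bad1 holds the output of the command substitution.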

I still don't see why you'd say 2D arrays are not appropriate. In ksh93 (bash copied most of its syntax from ksh93, but didn't copy multidimensional arrays), you'd do:

    for i in "${!arr[@]}"; do
      earr[i]+=( "${arr[i]}" )
    done
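bash has no equivalent of the ksh93 multidimensional syntax above. If you have to stay in bash, one commonly used workaround (not part of this answer) is a single associative array whose keys encode both indices, plus a per-row length counter. A minimal sketch, assuming bash 4 or newer; the helper name append and the sample data are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: emulate ksh93's earr[i]+=( ... ) in bash >= 4 with one
# associative array keyed "row,column" and a per-row length counter.
declare -A earr=()      # earr[i,n] holds element n of "row" i
declare -a earr_len=()  # earr_len[i] is the number of elements in row i

append() {  # usage: append <row> <value>   (hypothetical helper)
  local row=$1 value=$2
  local n=${earr_len[row]:-0}
  earr[$row,$n]=$value
  earr_len[row]=$(( n + 1 ))
}

arr=( 10 20 30 )
for i in "${!arr[@]}"; do append "$i" "${arr[i]}"; done
arr=( 11 21 31 )
for i in "${!arr[@]}"; do append "$i" "${arr[i]}"; done

# Read row 1 back into an ordinary indexed array.
row1=()
for (( n = 0; n < earr_len[1]; n++ )); do row1+=( "${earr[1,$n]}" ); done
declare -p row1
```

Rows can have different lengths, which matches the "array of arrays, not necessarily rectangular" shape discussed in the comments on the question.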

In any case, unless there's a specific reason why you need to use a shell, I agree with @cas that it sounds like you'd be better off using a proper programming language such as perl or python.

Hi Stéphane. Thank you too for your answer. I have managed to solve my issue (more or less) using the eval command as suggested by @cas. I will try out your suggestion as well, which is a tad different (although I expect the same result). I fully agree with both of you that I might try using a different language in the future. — UserAthos, Commented Oct 31, 2021 at 21:04

cas · Accepted Answer · 2021-10-31 12:59:13Z

Here's an example of how to do what you described using perl and a Hash-of-Array-of-Arrays (HoAoA) data structure.

To help in understanding this, the following man pages will be useful: perldata (perl data types), perldsc (data structures), perllol (lol = lists of lists), perlref (references) and perlreftut (tutorial for references). You can also get details on specific perl functions with the perldoc command - e.g. perldoc -f opendir or perldoc -f grep.

Note that the sort and grep used in the script are built-in perl functions. They are not the sort and grep command-line tools; you can call those from perl if you want to (with backticks or qx quoting, the system() function, the open() function to open a pipe, and several other ways). Use perldoc for details on all of these and more.

    $ cat HoAoA.pl
    #!/usr/bin/perl

    use strict;
    use warnings;
    use Data::Dump qw(dd);

    # $h is a ref to Hash-of-Array-of-Arrays (HoAoA).
    #
    # This will be a data structure with the directory names
    # (Folder1, Folder2, Folder3) as the hash keys of the top-level
    # hash. Each element of that hash will be an array where the
    # indexes are the line numbers of the data.txt files in each
    # of those directories. The data in these second-level arrays
    # will be an array containing the three values in each line of
    # data.txt: $$h{directory}[line number][element]
    my $h;

    # get the directory name from the first command line arg, default to ./
    my $dir = shift // './';

    # get a list of subdirectories that contain 'data.txt',
    # excluding . and .. (readdir returns names relative to $dir)
    opendir(my $dh, "$dir") || die "Couldn't open directory $dir: $!\n";
    my @dirs = sort grep { $_ !~ /^\.+$/ && -d "$dir/$_" && -f "$dir/$_/data.txt" } readdir($dh);
    closedir($dh);

    dd \@dirs;  # Data::Dump's dd function is great for showing what's in an array
    print "\n";

    foreach my $d (@dirs) {
      my $f = "$dir/$d/data.txt";
      open(my $fh, "<", $f) || die "Couldn't open file $f: $!\n";
      my $lc = 0;  # line counter
      while (<$fh>) {
        chomp;  # strip trailing newline char at end-of-line
        my @row = split /\s*,\s*/;  # assume simple comma-delimited values
        push @{ $$h{$d}[$lc++] }, @row;
      }
      close($fh);
    }

    # dd is even better for showing complex structured data
    dd $h;
    print "\n";

    # show how to access individual elements, e.g. by changing the
    # zeroth element of line 0 of 'Folder1' to 999.
    $$h{'Folder1'}[0][0] = 999;
    dd $h;
    print "\n";

    # show how to print the data without using Data::Dump
    # a loop like this can also be used to process the data.
    # You could also process the data in the main loop above
    # as the data is being read in.
    foreach my $d (sort keys %{ $h }) {
      # `foreach my $d (@dirs)` would work too
      print "$d/data.txt:\n";
      foreach my $lc (keys @{ $$h{$d} }) {
        print "  line $lc: ", join("\t", @{ $$h{$d}[$lc] }), "\n";
      }
      print "\n";
    }

Note: the above is written to process simple comma-delimited data files. For actual CSV, with all its quirks and complications (like multi-line double-quoted fields with embedded commas), use the Text::CSV module. This is a third-party library module that isn't included with the core perl distribution. On Debian and related distros you can install this with apt-get install libtext-csv-perl libtext-csv-xs-perl. Other distros probably have similar package names. Or you can install it with cpan (a tool to install and manage library modules that IS included with perl core).

Also note: the above script uses Data::Dump, a third-party module which is useful for dumping structured data. Unfortunately, it's not included in the perl core library. On Debian and related distros: apt-get install libdata-dump-perl. Other distros will have a similar package name. And, as a last resort, you can install it with cpan.

Anyway, with the following folder structure and data.txt files:

    $ tail */data.txt
    ==> Folder1/data.txt <==
    1,2,3
    4,5,6
    7,8,9

    ==> Folder2/data.txt <==
    7,8,9
    4,5,6
    1,2,3

    ==> Folder3/data.txt <==
    9,8,7
    6,5,4
    3,2,1

running the HoAoA.pl script produces the following output:

    $ ./HoAoA.pl
    ["Folder1", "Folder2", "Folder3"]

    {
      Folder1 => [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
      Folder2 => [[7, 8, 9], [4, 5, 6], [1, 2, 3]],
      Folder3 => [[9, 8, 7], [6, 5, 4], [3, 2, 1]],
    }

    {
      Folder1 => [[999, 2, 3], [4, 5, 6], [7, 8, 9]],
      Folder2 => [[7, 8, 9], [4, 5, 6], [1, 2, 3]],
      Folder3 => [[9, 8, 7], [6, 5, 4], [3, 2, 1]],
    }

    Folder1/data.txt:
      line 0: 999 2 3
      line 1: 4 5 6
      line 2: 7 8 9

    Folder2/data.txt:
      line 0: 7 8 9
      line 1: 4 5 6
      line 2: 1 2 3

    Folder3/data.txt:
      line 0: 9 8 7
      line 1: 6 5 4
      line 2: 3 2 1
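To reproduce this example, the three sample data.txt files shown in the tail output above can be recreated with a few lines of shell. This is only a convenience sketch: it writes into the current directory and overwrites any existing Folder1..Folder3/data.txt files.

```shell
#!/usr/bin/env bash
# Sketch: recreate the sample Folder1..Folder3/data.txt files from the answer
# in the current directory (existing files with these names are overwritten).
set -e
mkdir -p Folder1 Folder2 Folder3
printf '%s\n' 1,2,3 4,5,6 7,8,9 > Folder1/data.txt
printf '%s\n' 7,8,9 4,5,6 1,2,3 > Folder2/data.txt
printf '%s\n' 9,8,7 6,5,4 3,2,1 > Folder3/data.txt
tail Folder*/data.txt
```

After that, running ./HoAoA.pl from the same directory should read these three files.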

BTW, the above doesn't really need Data::Dump: if it's not installed on your cluster, and you can't get your admins to install it for you, and you can't install it on the compute nodes with cpan, then just remove it (the use line and the dd calls) from the script. Data::Dump is just a convenient pretty-printing tool that does essentially the same thing as the last output loop in the script does anyway. — cas, Commented Oct 31, 2021 at 12:39
Awesome, thank you! As I said in my comments above, whether I can use Perl or not does not entirely depend on me. I suppose it will be possible, though, and your suggestion may come in handy. Thanks again for your time! — UserAthos, Commented Oct 31, 2021 at 12:42
perl is probably installed on your cluster, especially if it's running linux...core perl, at least, if not the cpan modules. Otherwise, python almost certainly will be; python is a very popular language among scientific computing and HPC users. Depending on how you need to process the data in your data.txt files, python may be a better choice than perl. Also, if you have lots of python users nearby that you can ask for help, that would also make it a better choice. — cas, Commented Oct 31, 2021 at 12:56
