Here is a simple script that fetches https://unix.stackexchange.com/ with curl and stores the result in an array; that part works fine.
```bash
#!/usr/local/bin/bash
[ -f pgtoscrap ] && { rm pgtoscrap; };
curl -o pgtoscrap https://unix.stackexchange.com/;
declare -a arr;
fileName="pgtoscrap";
exec 10<&0
exec < $fileName
let count=0

while read LINE; do
    arr[$count]=$LINE
    ((count++))
done
exec 0<10 10<&-
```
But each time I run this script, I get an error about a wrong file descriptor:
```
./shcrap
./shcrap: line 14: 10: No such file or directory
```
I think I don't understand how to use the `exec` command in a loop correctly. Can someone explain?
-- Update: after implementing `mapfile` (Bash 4), it became much simpler --
```bash
#!/usr/local/bin/bash
## Pass a parameter as e.g. ./linkscrapping.bash https://unix.stackexchange.com/

# Read the page into an array, one line per element; process substitution
# replaces the exec redirections.
mapfile -t arr < <(curl -s "$1")

# First pass: keep the matched <a ...</a> part of each line that has one.
regex="<a[[:print:]]*<\/a>"
ELEMENTS=${#arr[@]}
firstline=0
for ((i = 0; i < ELEMENTS; i++)); do
    if [[ ${arr[i]} =~ $regex ]]; then
        if ((firstline < 1)); then
            echo "${BASH_REMATCH[0]}" > scrapped    # first match truncates the file
            firstline=1
        else
            echo "${BASH_REMATCH[0]}" >> scrapped
        fi
    fi
done

# Second pass: pull the href value out of each saved anchor.
pg2scrap="scrapped"
mapfile -t arr2 < "$pg2scrap"
regex="href=[\"\'][0-9a-zA-Z\:\/\.]+"
ELEMENTS2=${#arr2[@]}
line2=0
for ((i = 0; i < ELEMENTS2; i++)); do
    if [[ ${arr2[i]} =~ $regex ]]; then
        # if/else instead of "&& { ...; } || { ...; }": with the latter,
        # (( line2++ )) returns non-zero when line2 is 0, so the || branch ran too.
        if ((line2 < 1)); then
            echo "${BASH_REMATCH[0]#href=\"}" > links
            line2=1
        else
            echo "${BASH_REMATCH[0]#href=\"}" >> links
        fi
    fi
done
cat links
```
You don't need the `exec` at the end at all, unless you are sourcing the script or reusing this code in a function, because the file descriptor redirections don't affect the calling process from which the file descriptors are inherited, and because the file descriptors will be automatically closed when the shell process exits.
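
For the case where the restore step is wanted (a sourced script or a function), here is a minimal sketch of the save/redirect/restore pattern from the question. In the original, `exec 0<10` asks the shell to open a file named `10` for reading, which is where "10: No such file or directory" comes from; duplicating descriptor 10 needs the ampersand, `0<&10`. The temporary file and its three sample lines are assumptions made up for this example so that it runs without network access.

```bash
#!/usr/bin/env bash
# Sketch only: the input file and its contents are invented for illustration.
fileName=$(mktemp)
printf '%s\n' 'first line' 'second line' 'third line' > "$fileName"

declare -a arr
count=0

exec 10<&0            # save the current stdin as fd 10
exec < "$fileName"    # stdin now reads from the file

while IFS= read -r LINE; do
    arr[count]=$LINE
    count=$((count + 1))
done

exec 0<&10 10<&-      # restore stdin from fd 10 (note the &), then close fd 10

rm -f -- "$fileName"
printf 'read %d lines\n' "${#arr[@]}"    # prints "read 3 lines"
```

If the redirection is only needed for the loop itself, writing `done < "$fileName"` attaches the file to the loop alone, so nothing has to be saved or restored.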