array1=($(find /etc -mindepth 1 -maxdepth 1 -type d))
is wrong, as it performs split+glob on the output of find to get the list. The output of find without -print0 can't be post-processed reliably anyway, since file names may contain newline characters. The correct syntax in bash (4.4+) would be:
readarray -td '' array1 < <(find /etc -mindepth 1 -maxdepth 1 -type d -print0)
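To see why split+glob is a problem, here is a minimal bash 4.4+ sketch that runs both forms on a scratch directory. The directory names are made up purely for the demonstration, and the scratch path comes from mktemp -d:

```shell
# Sketch (bash 4.4+): split+glob on find's output vs. NUL-delimited reading.
# The scratch directory and its entries are invented for the demonstration.
demo=$(mktemp -d) || exit 1
mkdir -- "$demo/plain" "$demo/with space" "$demo/with"$'\n'"newline"

# Wrong: the unquoted $(...) is split on blanks and newlines (and globbed).
split_array=($(find "$demo" -mindepth 1 -maxdepth 1 -type d))

# Right: find terminates each path with a NUL byte, readarray -d '' splits on
# NUL, and -t strips that delimiter from each element.
readarray -td '' nul_array < <(find "$demo" -mindepth 1 -maxdepth 1 -type d -print0)

printf 'split+glob: %d elements\n' "${#split_array[@]}"
printf 'readarray:  %d elements\n' "${#nul_array[@]}"

rm -rf -- "$demo"
```

With those names, the split+glob version ends up with 5 elements (assuming the mktemp path itself contains no blanks), while readarray gets the 3 directories.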
Or in zsh:
array1=(/etc/*(ND/))
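For comparison, a rough bash counterpart of that zsh glob could look like the sketch below. The list_subdirs name is hypothetical. nullglob stands in for the N qualifier and dotglob for D, while the trailing / in the pattern keeps only directories. Unlike zsh's / qualifier, it also matches symbolic links to directories, so it is an approximation and not an exact equivalent.

```shell
# Hypothetical helper (bash 4.4+ for readarray -d): approximates zsh's "$dir"/*(ND/).
# nullglob ~ N, dotglob ~ D; the trailing / keeps directories, but unlike zsh's
# / qualifier it also keeps symlinks to directories.
list_subdirs() (
  # Subshell body: the shopt changes don't leak into the caller.
  shopt -s nullglob dotglob
  local -a entries
  entries=("$1"/*/)
  entries=("${entries[@]%/}")    # drop the trailing / added by the pattern
  # NUL-delimited output so any file name survives; nothing at all if no match.
  if (( ${#entries[@]} )); then
    printf '%s\0' "${entries[@]}"
  fi
)

readarray -td '' array1 < <(list_subdirs /etc)
printf '%s\n' "${array1[@]}"
```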
In:
echo $var | wc -c
you're counting the number of bytes in the output of echo. That's not the number of bytes in $var, for several reasons:
- you forgot to quote $var, so it's subject to split+glob;
- echo does some transformations: some implementations expand \x escape sequences, and some treat values like -n as options;
- finally, echo appends a newline character to its output (-n can skip that with some echo implementations).
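A minimal bash sketch of those distortions follows. The two functions are hypothetical helpers used only for illustration. bash's builtin echo doesn't expand backslash escapes by default, so only the split+glob, -n and newline effects show up here. Some wc implementations pad the count with blanks.

```shell
# Hypothetical helpers showing what echo does to a value before wc -c sees it.
count_via_echo_unquoted() { echo $1 | wc -c; }   # unquoted on purpose: split+glob
count_via_echo_quoted()   { echo "$1" | wc -c; }

var='foo    bar'                  # 10 bytes (4 spaces in the middle)
count_via_echo_unquoted "$var"    # 8: echo gets "foo" and "bar", prints "foo bar" and a newline
count_via_echo_quoted "$var"      # 11: the 10 bytes plus echo's newline
count_via_echo_quoted -n          # 0: bash's echo takes -n as an option and prints nothing
```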
Here, to use wc to count the bytes, you'd do:
printf %s "$var" | wc -c
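Wrapped in a function, that could look like the following sketch. byte_count is a hypothetical name. printf %s outputs its argument unchanged, with no splitting, no option parsing, no escape processing and no added newline. The arithmetic expansion only normalises the blank padding some wc implementations add.

```shell
# Hypothetical helper: byte length of a string via printf %s | wc -c.
byte_count() {
  local n
  n=$(printf %s "$1" | wc -c)
  # Some wc implementations pad the number with blanks; $(( )) strips them.
  printf '%s\n' "$(( n ))"
}

byte_count 'foo    bar'   # 10
byte_count -n             # 2
byte_count $'a\nb\n'      # 4: embedded and trailing newlines are counted too
```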
In bash, ${#var} expands to the number of characters in the variable¹. For it to be the number of bytes, you can fix the locale to C first, as a separate command:
LC_ALL=C
echo "${#var}"
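Setting LC_ALL that way changes the locale for the rest of the shell session. A minimal sketch that keeps the change local is to run it in a subshell. byte_length is a hypothetical name, and the example string is "€20" written as its UTF-8 bytes.

```shell
# Hypothetical helper: byte length of a string using ${#...} in the C locale.
# The body runs in a subshell, so the caller's locale is left untouched.
byte_length() (
  LC_ALL=C                 # separate command: in effect before ${#1} is expanded
  printf '%s\n' "${#1}"
)

euro=$'\xe2\x82\xac20'     # "€20" encoded in UTF-8: 5 bytes
byte_length "$euro"        # 5, whatever the caller's locale
```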
To get the sum of the lengths in bytes of all the elements of an array, you could concatenate them and then get the length of the resulting string:
printf %s "${array[@]}" | wc -c
Or:
IFS=
concat="${array[*]}"
LC_ALL=C
echo "${#concat}"
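The same idea as a self-contained bash sketch: total_bytes is a hypothetical name, and it joins its arguments with "$*" (equivalent to "${array[*]}" for the positional parameters). IFS and LC_ALL are both set inside a subshell body so neither leaks out. The array is the one used in the zsh example further down.

```shell
# Hypothetical helper: total bytes in all the arguments, by joining them with
# an empty IFS and taking ${#...} of the result in the C locale.
total_bytes() (
  IFS=                     # "$*" joins with the first char of IFS: nothing here
  concat="$*"
  LC_ALL=C
  printf '%s\n' "${#concat}"
)

a=($'\xe2\x82\xac20' '$25' $'\xa420')
total_bytes "${a[@]}"            # 11
printf %s "${a[@]}" | wc -c      # also 11 (possibly blank-padded)
```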
With zsh, you could do:
() {
  set -o localoptions +o multibyte
  echo ${#${(j[])array}}
}
Where the j[sep] parameter expansion flag is used to join the elements of the array instead of using "${array[*]}", which uses the global $IFS. Instead of fixing the locale to C, we can just disable the multibyte option to get character ≍ byte (which we do here locally in an anonymous function).
Note that to see the difference between bytes and characters, you need a locale that uses a multibyte encoding as its charmap (such as UTF-8, GB18030, BIG5...) and characters encoded on more than one byte. A character like a is typically encoded on one byte, so you won't see a difference with it. €, for instance, is encoded on 3 bytes in UTF-8 and on one byte in ISO8859-15.
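You can check that with iconv. This sketch assumes an iconv that accepts the UTF-8 and ISO-8859-15 encoding names (glibc's and GNU libiconv's do), and uses od to dump the raw bytes.

```shell
# Sketch: the same character, €, encoded in two charsets.
euro_utf8=$'\xe2\x82\xac'    # € as its UTF-8 bytes

printf %s "$euro_utf8" | od -An -tx1                                  # bytes e2 82 ac
printf %s "$euro_utf8" | iconv -f UTF-8 -t ISO-8859-15 | od -An -tx1  # byte a4

printf %s "$euro_utf8" | wc -c                                        # 3
printf %s "$euro_utf8" | iconv -f UTF-8 -t ISO-8859-15 | wc -c        # 1
```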
An example (here from zsh):
$ a=($'\xe2\x82\xac20' '$25' $'\xa420')
$ locale charmap
UTF-8
$ typeset -p a
typeset -a a=( €20 '$25' $'\M-$20' )
$ printf %s "${a[@]}" | wc -c
11
$ printf %s "${a[@]}" | wc -m
8
$ echo ${#${(j[])a}}
9
$ (){set -o localoptions +o multibyte; echo ${#${(j[])a}}}
11
And if I switch to a locale where the charmap is ISO8859-15:
$ locale charmap
ISO-8859-15
$ a=($'\xe2\x82\xac20' '$25' $'\xa420')
$ typeset -p a
typeset -a a=( â¬20 '$25' €20 )
$ printf %s "${a[@]}" | wc -c
11
$ printf %s "${a[@]}" | wc -m
11
$ echo ${#${(j[])a}}
11
$ (){set -o localoptions +o multibyte; echo ${#${(j[])a}}}
11
ISO8859-15 is a single-byte character encoding, so character ≍ byte there.
¹ similar to what wc -m does, except that bash (or zsh) will also count bytes that can't be decoded into a character as one character each.
When you use echo, you're adding a newline. If you want the actual size in bytes, use echo -n to avoid adding a newline. This is why an "empty" variable gives 1 when you use echo, and a single character gives 2.
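A small bash sketch of that off-by-one, with a hypothetical echo_counts helper printing the wc -c count for echo and for echo -n. As noted earlier, -n is not honoured by every echo implementation, and even bash's echo -n still treats a value like -n as an option, which the last test case shows.

```shell
# Hypothetical helper: wc -c counts for echo and echo -n of one value (bash).
echo_counts() {
  local with_nl without_nl
  with_nl=$(( $(echo "$1" | wc -c) ))
  without_nl=$(( $(echo -n "$1" | wc -c) ))
  printf '%s %s\n' "$with_nl" "$without_nl"
}

echo_counts ''     # 1 0
echo_counts x      # 2 1
echo_counts abc    # 4 3
```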