
For example, if I build an array:

$ array1=($(find /etc -mindepth 1 -maxdepth 1 -type d))
$ echo ${#array1[@]}
105
$ for each in ${array1[@]}; do echo $each ; done
/etc/alternatives
/etc/apache2
/etc/apparmor
/etc/apparmor.d
...

and so on and so forth, you get the idea.

Is there a way to use bash to print the size in bytes, not the length? Or do I just have to do the math manually, based on a count of characters?

I've tried something like this:

$ var2="a"
$ echo $var2 | wc -c
2
$ echo $var2
a

but then this:

$ var2=""
$ echo $var2 | wc -c
1
$ echo $var2

It seems that the mere existence of a blank variable is 1 byte, but a single letter is also counted as 2 characters.

$ echo a | wc -c
2
$ echo a | wc -m
2
$ echo aa | wc -c
3
$ echo aa | wc -m
3

It looks like a newline is one byte and each character is one byte. It seems difficult to take an array, count the newlines, count the characters, then do the math. Am I overthinking this, or is there a utility that already gives me an accurate number?

  • For an array, are you looking for the size of the memory allocated to store the array in bytes, or the total byte length of all strings in the array? Also, keep in mind that some character encodings have characters that take up multiple bytes (though wc -c counts the bytes, wc -m characters). By using echo you're adding a newline. If you want the actual size in bytes, use echo -n to avoid adding a newline. This is why an "empty" variable gives 1 when you use echo and a single character gives 2.
    – frabjous
    Commented May 9, 2022 at 22:51
  • I suppose it's more about the bytes allocated in memory. The reason is that I'm building some arrays based on large directories and I was trying to be aware of how many bytes of memory this could take. I've got a file server with something like 150,000 folders per year over about 6 years, with each folder containing about 10 files. Building arrays of the full paths of the filenames is (my best estimate) 68 MB of text.
    Commented May 9, 2022 at 23:35
  • Here's a Stack Overflow question that may be relevant, but are you sure you want to handle a job like that in bash?
    – frabjous
    Commented May 10, 2022 at 0:40

1 Answer

array1=($(find /etc -mindepth 1 -maxdepth 1 -type d)) 

Is wrong as it performs split+glob on the output of find to get the list (and the output of find without -print0 is not post-processable anyway). The correct syntax in bash (4.4+) would be:

readarray -td '' array1 < <(find /etc -mindepth 1 -maxdepth 1 -type d -print0) 
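The difference between the two forms can be seen with a minimal sketch like the one below. The scratch directory and the names split_glob and safe are hypothetical, introduced only for this illustration; it assumes bash 4.4+ and a find that supports -print0.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory, used only for this demonstration.
tmp=$(mktemp -d)
mkdir "$tmp/one" "$tmp/two words"

# Unquoted command substitution: the output of find is split on
# whitespace, and each resulting word is then subject to globbing.
split_glob=($(find "$tmp" -mindepth 1 -maxdepth 1 -type d))

# NUL-delimited read: each path becomes exactly one array element.
readarray -td '' safe < <(find "$tmp" -mindepth 1 -maxdepth 1 -type d -print0)

echo "split+glob: ${#split_glob[@]} elements"
echo "readarray:  ${#safe[@]} elements"

rm -rf -- "$tmp"
```

With two directories, one of which has a space in its name, the readarray form should report 2 elements while the split+glob form reports more, because "two words" is broken apart.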

Or in zsh:

array1=(/etc/*(ND/)) 

In echo $var | wc -c

You're counting the number of bytes in the output of echo. That's not the number of bytes in $var for several reasons:

  • you forgot to quote $var so it's subject to split+glob
  • echo does some transformations. Some implementations expand \x escape sequences, some treat values like -n as options
  • finally, echo appends a newline character to the output (-n can skip that with some echo implementations).

Here, to use wc to count the bytes, you'd do:

printf %s "$var" | wc -c 
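As a rough illustration of the newline effect, the sketch below compares the two pipelines for a one-letter value and for an empty value. The variable names with_echo and with_printf are made up for this example, and the $(( ... )) wrapper is only there to strip the padding that some wc implementations put around the number.

```shell
#!/usr/bin/env bash
for var in 'a' ''; do
  # echo adds a trailing newline, which wc -c counts as one more byte.
  with_echo=$(( $(echo "$var" | wc -c) ))
  # printf %s writes the value as-is, with nothing appended.
  with_printf=$(( $(printf %s "$var" | wc -c) ))
  printf 'value=[%s] echo|wc=%d printf|wc=%d\n' "$var" "$with_echo" "$with_printf"
done
```

For 'a' this gives 2 versus 1, and for the empty value 1 versus 0, which matches the off-by-one seen in the question.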

In bash, ${#var} expands to the number of characters in the variable¹. For it to be the number of bytes, you can fix the locale to C:

LC_ALL=C
echo "${#var}"
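One way to apply this without changing the locale of the rest of the script is to do it in a subshell. The sketch below is only an illustration under that assumption; byte_length is a hypothetical helper name, not a bash builtin.

```shell
#!/usr/bin/env bash
# byte_length is a hypothetical helper for this example.
byte_length() {
  # The subshell keeps the LC_ALL change from leaking into the caller.
  (
    LC_ALL=C
    printf '%s\n' "${#1}"
  )
}

var=$'\xe2\x82\xac20'   # the bytes of "€20" when encoded in UTF-8

# Depends on the locale the script runs in (3 in a UTF-8 locale).
echo "characters (current locale): ${#var}"
# Always counts bytes, whatever the caller's locale is.
echo "bytes: $(byte_length "$var")"
```

Note that LC_ALL=C has to be assigned before ${#1} is expanded; putting it as a prefix on the same command line as the expansion would be too late.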

To get the sum of the lengths in bytes of all the elements of an array, you could concatenate them and then get the length of the resulting string:

printf %s "${array[@]}" | wc -c 

Or:

IFS=
concat="${array[*]}"
LC_ALL=C
echo "${#concat}"
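Both approaches can be wrapped up and cross-checked as in the sketch below. array_bytes is a hypothetical function name, and the sample array reuses byte sequences chosen for illustration; the subshell is an added precaution so that IFS and LC_ALL are not modified for the caller.

```shell
#!/usr/bin/env bash
# array_bytes is a hypothetical helper: it prints the total number of
# bytes in all of its arguments.
array_bytes() {
  (
    IFS=            # empty IFS: "$*" joins the arguments with no separator
    concat="$*"
    LC_ALL=C        # from here on, ${#...} counts bytes
    printf '%s\n' "${#concat}"
  )
}

a=($'\xe2\x82\xac20' '$25' $'\xa420')

total=$(array_bytes "${a[@]}")
via_wc=$(( $(printf %s "${a[@]}" | wc -c) ))

echo "array_bytes: $total"
echo "printf | wc -c: $via_wc"
```

Both figures only cover the bytes of the strings themselves. They say nothing about whatever per-element overhead bash adds when it stores the array in memory, so treat them as a lower bound if memory use is the concern.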

With zsh, you could do:

() {
  set -o localoptions +o multibyte
  echo ${#${(j[])array}}
}

Here the j[sep] parameter expansion flag joins the elements of the array, instead of "${array[*]}", which uses the global $IFS. Instead of fixing the locale to C, we can just disable the multibyte option so that character == byte (which we do here locally, in an anonymous function).

Note that to see the difference between byte and character, you need a locale that uses a multibyte encoding as its charmap (such as UTF-8, GB18030, BIG5...) and characters encoded on more than one byte. a is typically encoded on one byte, so you won't see a difference. € is encoded on 3 bytes in UTF-8 and one byte in ISO8859-15 for instance.
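The byte counts for the euro sign can be checked from bash independently of the current locale by writing the encoded bytes explicitly. This is a small sketch; the variable names are made up, and the octal escapes are the UTF-8 sequence e2 82 ac and the ISO8859-15 byte a4.

```shell
#!/usr/bin/env bash
# The euro sign as encoded in UTF-8: bytes e2 82 ac (octal 342 202 254).
euro_utf8=$(( $(printf '\342\202\254' | wc -c) ))
# The same character as encoded in ISO8859-15: byte a4 (octal 244).
euro_latin9=$(( $(printf '\244' | wc -c) ))

echo "UTF-8: $euro_utf8 byte(s), ISO8859-15: $euro_latin9 byte(s)"
```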

An example (here from zsh):

$ a=($'\xe2\x82\xac20' '$25' $'\xa420')
$ locale charmap
UTF-8
$ typeset -p a
typeset -a a=( €20 '$25' $'\M-$20' )
$ printf %s "${a[@]}" | wc -c
11
$ printf %s "${a[@]}" | wc -m
8
$ echo ${#${(j[])a}}
9
$ (){set -o localoptions +o multibyte; echo ${#${(j[])a}}}
11

And if I switch to a locale where the charmap is ISO8859-15:

$ locale charmap
ISO-8859-15
$ a=($'\xe2\x82\xac20' '$25' $'\xa420')
$ typeset -p a
typeset -a a=( â¬20 '$25' €20 )
$ printf %s "${a[@]}" | wc -c
11
$ printf %s "${a[@]}" | wc -m
11
$ echo ${#${(j[])a}}
11
$ (){set -o localoptions +o multibyte; echo ${#${(j[])a}}}
11

ISO8859-15 is a single-byte character encoding, so character == byte there.


¹ similar to what wc -m does except that bash (or zsh) will also count bytes that can't be decoded into a character as one character each.
