I'm trying to find a simple way to just test an array for duplicate values. It would be nice, but not completely necessary, to be able to identify the specific lines that have duplicates, but the important point is simply being able to see that there's a duplicate.

I have an array, $key_array, which contains some numbers:

# echo ${key_array[@]}
1 2 3 4 3 3

This array could hold an arbitrary number of values, some of which could be duplicates of others. They will be integers only. (Numbers beginning with a 0, such as 03, should not make it into the array at all, but in the off-chance that it happens, catching 3 and 03 as duplicates of each other would be better than treating them as different numbers.)

I need to determine if any of these numbers are duplicates. I was thinking this could be done with an exit code if nothing else. What I was after was something like this:

if $(some command); then
    echo "Array contains duplicates."
    exit 1
fi
$(commands to run after duplicate check)

The idea being in the end that the script informs the user and exits if there are duplicates (not super important to identify where the duplicates are, just telling the user to check for duplicates is enough), or if there aren't any duplicates, it proceeds and runs a bunch of other stuff.

How would I best accomplish this?

  • Should 3 be considered a duplicate of 03? Is the array of integers only? Commented Aug 25, 2020 at 16:16
  • @Quasímodo It's integers only. Numbers beginning with a 0, such as 03, should not make it into the array at all, but in the off-chance that it happens, catching them as duplicates would be better than treating them as different numbers.
    – Kefka
    Commented Aug 25, 2020 at 16:21
  • $() in if $(some command); then is not necessary. The if construct by default takes a list (a sequence of one or more pipelines) and executes them, testing their exit status. Thus, if some command; then is sufficient. Commented Apr 17, 2022 at 20:15

4 Answers

5

In the zsh shell:

array=(1 2 3 4 3 3)
if (($#array != ${#${(u)array}})); then
  print -u2 array contains duplicates
  exit 1
fi

Where ${(u)array} expands to the unique elements of the array, so we're just comparing the number of elements with the number of unique elements.

The bash shell doesn't have an equivalent, but as its arrays can't contain NUL bytes anyway, if you're on a GNU system, you could do something like:

readarray -td '' dups < <(
  (( ${#array[@]} == 0 )) ||
    printf '%s\0' "${array[@]}" |
      LC_ALL=C sort -z |
      LC_ALL=C uniq -zd
)
if ((${#dups[@]} > 0)); then
  echo >&2 "array has duplicates:"
  printf >&2 ' - "%s"\n' "${dups[@]}"
  exit 1
fi

In both of those, elements are considered duplicates if they are byte-for-byte identical, not if their numeric values, if any, are the same (1, 01, 0x1, 1e0, 2-1, $'1\n', ' 1' are all considered different).
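If elements should instead be compared by numeric value, one option is to normalize each element before the sort/uniq step. The following is only a sketch of that idea: `numeric_dups` is a hypothetical helper name, and it assumes every element is a non-negative decimal integer written with digits only (no sign, no spaces, no other bases).

```shell
#!/bin/bash
# Hypothetical helper: prints each numeric value that occurs more than once.
# Assumption: every argument consists of decimal digits only.
numeric_dups () {
    local n
    for n in "$@"; do
        # 10# forces base 10, so 010 is read as ten rather than octal eight.
        printf '%d\n' "$((10#$n))"
    done | LC_ALL=C sort -n | LC_ALL=C uniq -d
}

key_array=(1 2 03 4 3 010 10)
dups=$(numeric_dups "${key_array[@]}")
if [ -n "$dups" ]; then
    echo "Array contains duplicates." >&2
fi
```

Because the normalized values never contain newlines, plain line-based sort and uniq are enough here and the GNU-specific -z option is not needed.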

    2

    Assuming arr contains only integers and that zero padded numbers should be considered duplicates (e.g., 01 is a duplicate of 1), we can use a second array to keep the values already "seen" when parsing each element of the first array arr.

    #!/bin/bash
    arr=(1 2 3 4 3 3)
    seen=()
    for i in "${arr[@]}"; do
        # Remove padding zeroes, if any
        i=$((10#$i))
        # If element of arr is not in seen, add it as a key to seen
        if [ -z "${seen[i]}" ]; then
            seen[i]=1
        else
            echo "Array contains a duplicate."
            break
        fi
    done
    • Note that that assumes array elements are decimal integers (there's also the question of whether 01 and 1 should be deemed duplicates of each other). Commented Aug 25, 2020 at 15:55
    • Since Bash 4.0 you may use an associative array to not have subscripts treated as arithmetic expressions. (That wouldn't properly treat them as "numbers" either, of course).
      – fra-san
      Commented Aug 25, 2020 at 16:24
    • Note that since bash interprets numbers with leading 0s as octal, 010 will be considered the same as 8, not 10. Commented Aug 25, 2020 at 16:43
    • Note that the i=$((10#$i)) workaround doesn't work for negative numbers, where 10#-010 is interpreted as 10#0 - 010, so 0 - 8, so -8. Commented Aug 25, 2020 at 18:17
    • @ilkkachu, with the caveat that bash associative arrays don't support empty strings in their keys (which can be worked around by adding a prefix for instance). Commented Aug 25, 2020 at 18:18
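The associative-array idea from the comments could look roughly like the sketch below. It requires Bash 4.0 or later; `has_dup_strings` is a hypothetical name, and the "x" prefix is just one way to keep keys non-empty. Elements are compared as strings, so 3 and 03 count as different here.

```shell
#!/bin/bash
# Requires Bash 4.0+ (associative arrays).
# Hypothetical helper: returns 0 if any two arguments are identical strings,
# and 1 otherwise.
has_dup_strings () {
    local -A seen=()
    local elem
    for elem in "$@"; do
        # The "x" prefix keeps the key non-empty even when elem is "".
        if [[ -n ${seen["x$elem"]+set} ]]; then
            return 0
        fi
        seen["x$elem"]=1
    done
    return 1
}

key_array=(1 2 3 4 3 3)
if has_dup_strings "${key_array[@]}"; then
    echo "Array contains duplicates." >&2
fi
```

Returning a status from a function, rather than calling exit inside the loop, leaves the decision about what to do with duplicates to the caller.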
    1

    If you need it to work in Bash 3.X, you could use uniq:

    IFS=$'\n' sort <<<"${key_array[*]}" | uniq -d; unset IFS

    This prints every duplicated element of the array, and nothing else.

    Description

    1. IFS=$'\n' sets the internal field separator to a newline character, so that "${key_array[*]}" expands to one line per array element.
    2. <<< is a here string that feeds the expansion of "${key_array[*]}" into the standard input of sort.
    3. sort, well, sorts.
    4. uniq -d outputs "...a single copy of each line that is repeated in the input." (from man uniq).
    5. unset IFS is just good business, and resets IFS back to its default.
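To turn a sort and uniq -d pipeline into a yes-or-no test that if can use, one possibility is to check whether uniq -d printed anything. The sketch below is a variation, not the pipeline above: it feeds the elements with printf instead of a here string, so it does not depend on IFS at all. `has_dup_lines` is a hypothetical name, and the helper assumes elements are non-empty and contain no newlines, which holds for integers.

```shell
#!/bin/bash
# Hypothetical helper that avoids Bash 4 features.
# Assumption: no argument is empty or contains a newline.
# Returns 0 if some argument occurs more than once, non-zero otherwise.
has_dup_lines () {
    [ "$#" -gt 0 ] || return 1
    # printf repeats its format for every argument: one element per line.
    # grep -q . succeeds only if uniq -d printed at least one non-empty line.
    printf '%s\n' "$@" | LC_ALL=C sort | LC_ALL=C uniq -d | grep -q .
}

key_array=(1 2 3 4 3 3)
if has_dup_lines "${key_array[@]}"; then
    echo "Array contains duplicates." >&2
fi
```

Setting LC_ALL=C for sort and uniq makes both compare bytes rather than apply locale collation rules.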
    • (1) You should explain exactly why you believe that the Bash version matters.  (2) This is very similar to the Bash part of Stéphane Chazelas’s answer.  (2b) As his answer foreshadows, sort and uniq may produce unexpected results in some locales.  (3) You say “unset IFS is just good business …”  I disagree.  This snippet may be used in a large script where IFS has been changed to something non-standard.  … (Cont’d) Commented Apr 14, 2022 at 5:11
    • (Cont’d) …  (See also this.)  (3b) And worst of all, you don’t need to reset IFS.  Your command modifies IFS only for the scope of the sort command.  (4) The question asks for a test that can produce a yes-or-no answer in a script.  How does this produce a yes-or-no answer in a script? … … … (5) P.S. Thanks for including the explanation. Commented Apr 14, 2022 at 5:11
    0

    Assuming that your key_array array only ever contains whole numbers (non-negative integers), we may use the fact that ordinary arrays are sparse in the bash shell. The following code loops over the array of keys, instantiating elements in a regular array until it finds a key that it has already processed:

    key_array=( '09' 1 2 3 4 3 3 '04' '001' '07' )

    has_dupes () (
        unset -v a
        for key do
            ${a[10#$key]+'return'}    # execute "return" if a[10#$key] is set
            a[10#$key]=               # set a[10#$key] to empty string
        done
        return 1
    )

    if has_dupes "${key_array[@]}"; then
        echo 'array has dupes'
    else
        echo 'array has no dupes'
    fi

    This introduces a utility function, has_dupes, that takes a list of whole numbers, and returns zero if there is a duplicate value in the list and non-zero if there are no duplicated values.

    The standard parameter expansion ${variable+word} is used to insert the word return if a[10#$key] was previously set. When return is substituted, it terminates the function's execution and returns a zero exit status to the caller signifying that we found a duplicate value. The index 10#$key means "the value $key interpreted as a base 10 integer" and allows us to equate keys like 03 and 3.
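The two building blocks described above can be tried in isolation. This is only an illustrative sketch: the word `found` and the variable names are arbitrary choices, not part of the answer's code.

```shell
#!/bin/bash
unset -v a
a[3]=                         # index 3 is now set, though its value is empty

set_result=${a[3]+found}      # element is set, so this expands to "found"
unset_result=${a[4]+found}    # element is unset, so this expands to nothing

# 10# forces base 10, so zero-padded keys map to the same index.
padded=$((10#03))
plain=$((10#3))

printf 'set: "%s", unset: "%s", 03 -> %s, 3 -> %s\n' \
    "$set_result" "$unset_result" "$padded" "$plain"
```

The `+` form tests whether the element is set at all, which is why assigning an empty string is enough to mark a key as seen.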
