Merge duplicate keys in associative array BASH

Question

I've got an array that contains duplicate items, e.g.

THE_LIST=( "'item1' 'data1 data2'" "'item1' 'data2 data3'" "'item2' 'data4'" )

Based on the above, I want to create an associative array that would assign itemN as key and dataN as value.

My code iterates over the list, and assigns key => value like this (the additional function is shortened, as it performs some additional jobs on the list):

function get_items(){
    KEY=$1
    VALUES=()
    shift $2
    for VALUE in "$@"; do
        VALUES[${#VALUES[@]}]="$VALUE"
    done
}

declare -A THE_LIST
for ((LISTID=0; LISTID<${#THE_LIST[@]}; LISTID++)); do
    eval "LISTED_ITEM=(${THE_LIST[$LISTID]})"
    get_items "${LISTED_ITEM[@]}"
    THE_LIST=([$KEY]="${VALUES[@]}")
done

When I print the array, I get something like:

item1: data1 data2
item1: data2 data3
item2: data4

but instead, I want to get:

item1: data1 data2 data3
item2: data4

I cannot find a way to merge the duplicate keys and also remove the duplicate values for each key.

What would be the approach here?

UPDATE

The actual code is:

THE_LIST=(
"'item1' 'data1 data2'"
"'item1' 'data2 data3'"
"'item2' 'data4'"
)

function get_backup_locations () {
    B_HOST="$2"
    B_DIRS=()
    B_DIR=()
    shift 2
    for B_ITEM in "$@"; do
        case "$B_ITEM" in
            -*) B_FLAGS[${#B_FLAGS[@]}]="$B_ITEM" ;;
            *)  B_DIRS[${#B_DIRS[@]}]="$B_ITEM" ;;
        esac
    done
    for ((B_IDX=0; B_IDX<${#B_DIRS[@]}; B_IDX++)); do
        B_DIR=${B_DIRS[$B_IDX]}
        ...do stuff here...
    done
}

function get_items () {
    for ((LOCIDY=0; LOCIDY<${#LOCATIONS[@]}; LOCIDY++)); do
        eval "LOCATION=(${LOCATIONS[$LOCIDY]})"
        get_backup_locations "${LOCATION[@]}"
        THE_LIST=([$B_HOST]="${B_DIR[@]}")
    done | sort | uniq
}

When I print the array with:

for i in "${!THE_LIST[@]}"; do
    echo "$i : ${THE_LIST[$i]}"
done

I get:

item1: data1 data2
item1: data2 data3
item2: data4

Your code as given won't work at all - THE_LIST is already a normal array, so you can't redeclare it as an associative array, and even if you could, you're overwriting it each time in the loop with THE_LIST=([$KEY]="${VALUES[@]}"). - muru, Commented Jun 13, 2019 at 7:34
@muru, so, by what you're saying, I cannot convert an array into associative array, or just not this way? - Bart, Commented Jun 13, 2019 at 7:41
I'm saying that the code has no relation to the output that you say you're getting. - muru, Commented Jun 13, 2019 at 7:42
This is not helping your question, but have you taken a look at python? Complex stuff like this is often easy as hell in python. - Panki, Commented Jun 13, 2019 at 7:52
@Panki, yes, Python or perl might be better approach here, however, I'm adding additional feature to an existing bash script, thus the pain.. it's simply too large to rewrite the whole thing in time. if I don't find a way, I may just as well use another language for the task. - Bart, Commented Jun 13, 2019 at 7:54
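
The first comment above points at two separate problems: a name that already holds an indexed array cannot be redeclared with declare -A, and an assignment of the form THE_LIST=([$KEY]=...) replaces the whole array on every pass of the loop. A minimal sketch of the alternative follows, assuming simplified input without the nested quoting of the real list. The names pairs and merged are hypothetical and do not come from the question.

```shell
#!/bin/bash
# Hypothetical names: "pairs" is the indexed input, "merged" the associative result.
pairs=( "item1 data1" "item1 data2" "item2 data4" )
declare -A merged            # a new name; the indexed array is left untouched

for pair in "${pairs[@]}"; do
    key=${pair%% *}          # text before the first space
    val=${pair#* }           # text after the first space
    # Writing merged=([$key]="$val") here would replace the whole array.
    # Appending to one element keeps the keys and values stored earlier.
    merged[$key]+="${merged[$key]:+ }$val"
done

declare -p merged
```

This merges repeated keys but does not remove repeated values; that needs a separate check per key and value.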

muru · Accepted Answer · 2019-06-13 08:50:00Z

If the keys and values are guaranteed to be purely alphanumerical, something like this might work:

declare -A output
make_list() {
  local IFS=" "
  declare -A keys                # variables declared in a function are local by default
  for i in "${THE_LIST[@]}"
  do
    i=${i//\'/}                  # since everything is alphanumeric, the quotes are useless
    declare -a keyvals=($i)      # split the entry, filename expansion isn't a problem
    key="${keyvals[0]}"          # get the first value as the key
    keys["$key"]=1               # and save it in keys
    for val in "${keyvals[@]:1}"
    do                           # for each value
      declare -A "$key[$val]=1"  # use it as the index to an array.
    done                         # Duplicates just get reset.
  done
  for key in "${!keys[@]}"
  do                             # for each key
    declare -n arr="$key"        # get the corresponding array
    output["$key"]="${!arr[*]}"  # and the keys from that array, deduplicated
  done
}
make_list
declare -p output  # print the output to check

With the example input, I get this output:

declare -A output=([item1]="data3 data2 data1" [item2]="data4" )

The data items are out of order, but deduplicated.

Might be best to use Python with the csv module instead.

that does the job after some adjustments as bash version on target machine doesn't support namerefs declaration. thanks for pointers! - Bart, Commented Jun 13, 2019 at 9:36
I was too fast being cheerful. this indeed works on a newer bash, but what I thought was a workaround, didn't work out well in the end. I'll most probably rewrite the script, worst case, put an RFC to update bash on a server ;) - Bart, Commented Jun 13, 2019 at 10:46
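
Namerefs (declare -n) were added in bash 4.3, while associative arrays go back to bash 4.0, so the same idea can be sketched without namerefs. The version below is only a sketch under the same assumption as the answer (keys and values contain no whitespace or quote characters), and it has not been checked against the asker's target machine. The names merged and seen are hypothetical.

```shell
#!/bin/bash
# Sketch: assumes keys and values contain no whitespace or quote characters.
THE_LIST=( "'item1' 'data1 data2'" "'item1' 'data2 data3'" "'item2' 'data4'" )

declare -A merged   # key -> space-separated unique values (hypothetical name)
declare -A seen     # "key|value" -> 1 for pairs already stored (hypothetical name)

for entry in "${THE_LIST[@]}"; do
    entry=${entry//\'/}               # drop the single quotes
    read -r -a fields <<< "$entry"    # split on whitespace without filename expansion
    key=${fields[0]}
    for val in "${fields[@]:1}"; do
        pair="$key|$val"
        if [[ -z ${seen[$pair]} ]]; then
            seen[$pair]=1             # remember this key/value combination
            merged[$key]+="${merged[$key]:+ }$val"
        fi
    done
done

for key in "${!merged[@]}"; do
    echo "$key: ${merged[$key]}"
done
```

Because values are appended the first time they are seen, their order within a key follows the input. The order in which the keys are printed is not guaranteed.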

m0dular · Accepted Answer · 2019-06-21 05:53:05Z

If there is no whitespace in any of the values, this solution might work. Use awk associative arrays to build up declare -A commands.

#!/bin/bash

THE_LIST=(
"'item1' 'data1 data2'"
"'item1' 'data2 data3'"
"'item2' 'data4'"
)

eval "$(\
  for i in "${THE_LIST[@]}"; do
    row=($(eval echo $i))
    echo "${row[@]}"
  done | awk '{
    for (i=2; i<=NF; i++)
      if (seen[$1] !~ $i) {
        seen[$1]=seen[$1]$i" "
      }
  }
  END {
    for (s in seen)
      print "declare -A new_list["s"]=\""seen[s]
  }' | sed 's/[[:space:]]*$/"/'
)"

for i in "${!new_list[@]}"; do
  echo "$i: ${new_list[$i]}"
done

This prints:

item2: data4
item1: data1 data2 data3

The order of the values is preserved, but the keys are reordered. I couldn't figure out how to trim the trailing whitespace of an array entry in awk so I just used sed to replace it with a quote, but it's already a total hack to begin with.
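
Two details of the awk approach can be tightened. The test seen[$1] !~ $i is a regular-expression match, so a value such as data1 would be treated as already present once data10 has been stored. The trailing space can also be removed inside awk with sub(). The following is a sketch of one way to do both, still assuming no whitespace, quotes, or tabs in keys and values. It reads awk's output in a loop instead of passing generated code to eval; the awk array name merged is hypothetical.

```shell
#!/bin/bash
# Sketch: assumes keys and values contain no whitespace, quotes, or tabs.
THE_LIST=( "'item1' 'data1 data2'" "'item1' 'data2 data3'" "'item2' 'data4'" )

declare -A new_list

# awk prints one "key<TAB>values" line per key; the loop reads those lines back,
# so no generated code has to be evaluated.
while IFS=$'\t' read -r key values; do
    new_list[$key]=$values
done < <(
    printf '%s\n' "${THE_LIST[@]}" | tr -d "'" | awk '
    {
        for (i = 2; i <= NF; i++)
            if (!(($1, $i) in seen)) {   # exact (key, value) lookup, not a regex match
                seen[$1, $i] = 1
                merged[$1] = merged[$1] $i " "
            }
    }
    END {
        for (k in merged) {
            sub(/ +$/, "", merged[k])    # trim the trailing space inside awk
            print k "\t" merged[k]
        }
    }'
)

for i in "${!new_list[@]}"; do
    echo "$i: ${new_list[$i]}"
done
```

As in the original answer, the order of the values is preserved and the order of the keys is not.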
