PHP Group associative Array duplicates and make subarrays of different values

Question

I have this type of table:

$this->data = [ 0 => [ 'reference' => 'ABC123', 'title' => 'lorem', 'parent' => 12, ], 1 => [ 'reference' => 'ABC456', 'title' => 'ipsum', 'parent' => 42, ], 2 => [ 'reference' => 'ABC789', 'title' => 'dolor', 'parent' => 36, ], 3 => [ 'reference' => 'ABC123', 'title' => 'lorem', 'parent' => 18, ], 4 => [ 'reference' => 'ABC789', 'title' => 'dolor', 'parent' => 26, ] ];

And I made this script to remove duplicates with keeping different known keys as subarrays:

// Get number of duplicates for each 'reference' of $this->data $occurences = array_count_values(array_column($this->data, 'reference')); // Array to put index to remove of $this->data $to_unset = []; foreach($occurences as $reference => $count) { // If item unique, continue if($count == 1) continue; // Get all indexes of 'reference' inside $this->data $indexes = array_keys(array_column($this->data, 'reference'), $reference); // Keep first index of occurence $first_index = $indexes[0]; // Remove it from indexes unset($indexes[0]); // Keep the first occurence $model = $this->data[$first_index]; // Convert different known keys as array $model['parent'] = [$model['parent']]; foreach($indexes as $index){ // Group 'parent' in same array array_push($model['parent'], $this->data[$index]['parent']); // Put index to remove then array_push($to_unset, $index); } // Replace the first item occurence by model $this->data[$first_index] = $model; } // Remove all occurences $this->data = array_diff_key($this->data, array_flip($to_unset)); // Reindex $this->data = array_values($this->data);

To get this:

$array = [ 0 => [ 'reference' => 'ABC123', 'title' => 'lorem', 'parent' => [12, 18], ], 1 => [ 'reference' => 'ABC456', 'title' => 'ipsum', 'parent' => 42, ], 2 => [ 'reference' => 'ABC789', 'title' => 'dolor', 'parent' => [36, 26], ] ];

But my script is very slow (20 seconds for +13k items), how can I improve it ?

For a given reference, if present multiple times, can it have different titles? — Cid, CommentedSep 6, 2021 at 15:35
What I need is to remove the duplicate data by grouping by the key "reference" and grouping the possible different values in subarrays ;-) — aurepito, CommentedSep 7, 2021 at 6:24
Same basic technique as stackoverflow.com/a/68885107/2943403 (don't mind the revenge vote on my answer -- it's Stack Overflow) — mickmackusa, CommentedSep 7, 2021 at 22:36
@Aur what is the source of this input data? Perhaps it can be restructured in an earlier layer. Or maybe I should ask why you are doing this. Is this an XY Problem? Why do you need all 13000 rows of data? — mickmackusa, CommentedSep 8, 2021 at 21:25
@mickmackusa This input data comes from a data interface which unfortunately I cannot optimize further — aurepito, CommentedSep 9, 2021 at 16:46

Sᴀᴍ Onᴇᴌᴀ · Accepted Answer · 2021-11-29 21:51:33Z

Broad review

It seems this code is part of a class method, yet it is difficult to know anything about the method or class other than the fact that it has a data member.

Consider following PSR-1 and PSR-12.

Indentation is not always consistent - e.g. some lines are indented with two spaces and others appear to have four spaces yet at the same indentation level. For example:

 // Keep the first occurence $model = $this->data[$first_index]; // Convert different known keys as array $model['parent'] = [$model['parent']];

From PSR-12 Section 2. General:

2.4 Indenting
Code MUST use an indent of 4 spaces for each indent level, and MUST NOT use tabs for indenting.

There is a loose equality comparison at the start of the loop:

if($count == 1) continue;

Since both operands in the condition should be integers strict equality comparison (i.e. ===) can be used. It is a good habit to use strict equality operators whenever possible.

Possible optimizations

My initial suggestion was going to be to make a new array with the reference values as the keys and have the values be arrays from the parent values of the original data, however that may likely require a lot more memory than is acceptable.

If memory allows, it could be simplified like this:

$trimmedData = []; foreach ($this->data as $model) { // reference already exists in final array if (isset($trimmedData[$model['reference']])) { $target = &$trimmedData[$model['reference']]; // ensure parent list is an array if (!is_array($target['parent'])) { $target['parent'] = [$target['parent']]; } $target['parent'][] = $model['parent']; } else { //set the index using the reference value $trimmedData[$model['reference']] = $model; } } // reindex, remove index by reference $this->data = array_values($trimmedData);

I noticed that there is a call to array_column($this->data, 'reference') on two lines:

$occurences = array_count_values(array_column($this->data, 'reference'));

as well as this line within the foreach loop:

$indexes = array_keys(array_column($this->data, 'reference'), $reference);

For the sample data those inner calls to get the values in the reference column yield the same value - that could be stored in a variable before the loop and used in those two lines, which would eliminate the call on each iteration of the loop where the count of that value occurs more than once.

Instead of storing $first_index and then calling unset on that index, you could consider doing that in one line using array_splice() though I'm not sure if performance would be better or worse.

Instead of:

// Keep first index of occurence $first_index = $indexes[0]; // Remove it from indexes unset($indexes[0]);

use array_splice():

$first_index = array_splice($indexes, 0, 1)[0];

Thanks ! array_reduce method is better, but still too long (~19 seconds now) — aurepito, CommentedSep 8, 2021 at 14:03
In hindsight I should have mentioned using array_reduce() would most likely be slower because it is executing an additional function for each iteration - perhaps that wasn't worth mentioning — Sᴀᴍ Onᴇᴌᴀ, CommentedSep 8, 2021 at 15:14

Stack Exchange Network

PHP Group associative Array duplicates and make subarrays of different values

1 Answer 1

Broad review

2.4 Indenting

Possible optimizations

Hot Network Questions

PHP Group associative Array duplicates and make subarrays of different values

1 Answer 1

Broad review

2.4 Indenting

Possible optimizations

Related

Hot Network Questions