Vectorized approach for generating random numbers with specified count and sum

Question

I'm looking to optimize my code by avoiding the use of a for loop, in order to improve performance when working with large data frames controlled by N_H (e.g., N_H = 700000). The for loop in the original code randomly allocates a given number of resources among individuals in the D_H table, based on their species group (A or B), as defined in the species column.

For each patch (patch_ID), the code first determines how many individuals of each group are present, which is stored in the variables N_A (for group A) and N_B (for group B). These values are calculated using the n() function within each group of individuals, allowing the code to know how many individuals of each species exist in each patch.

Then, for each row in the D_T table, the code generates random integer vectors for each group of individuals (A or B) based on the resources they should receive. These resources are specified by the columns s_aL_A and s_aL_B, which represent the total sum of resources to be allocated for the individuals in group A and group B, respectively. If the number of individuals of a species in a given patch is non-zero (i.e., N_A or N_B is not zero), a function called generateIntegers() is used to generate an integer vector whose sum equals the value of s_aL_A or s_aL_B, respectively. These values represent the distribution of resources among the individuals in each group.

Once these vectors are generated, the resources are assigned to the individuals in the D_H table. This is done by matching the patch using the patch_ID column and then assigning the computed resources for each species using the indices of the individuals in each group. For example, the values generated for s_aL_A will be assigned to individuals in group A in the given patch, and similarly for s_aL_B and individuals in group B.

Here is the code to build the data frames D_H and D_T, and the function generateIntegers (no need to be optimized):

set.seed(1, kind="Mersenne-Twister", normal.kind="Inversion") generateIntegers <- function(size, sum){ ## Randomly generate integers v <- sort(c(0, sample(0:sum, size = size - 1, replace = TRUE), sum)) ## print(v) ## Compute the differences between consecutive integers dv <- diff(v) return(dv) } N_H <- 15 D_H <- data.frame(ID = seq(1, N_H, by = 1), patch_ID = sample(1:10, size = N_H, replace = TRUE), s_aL = NA, i_aL = NA) %>% dplyr::mutate(species = c(rep("A", round(N_H*0.6)), rep("B", round(N_H*0.4)))) %>% dplyr::mutate(keep_species = species) %>% dplyr::group_by(species, patch_ID) %>% dplyr::mutate(N_H_in_patches = n()) %>% tidyr::pivot_wider(names_from = species, values_from = N_H_in_patches, values_fill = 0) %>% dplyr::rename(dplyr::any_of(c(species = "keep_species", N_A = "A", N_B = "B"))) print(D_H) D_T <- data.frame(patch_ID = unique(D_H$patch_ID), s_aL_A = NA, s_aL_B = NA) %>% dplyr::left_join(unique(D_H[, c("species", "patch_ID", "N_A", "N_B")]), by = "patch_ID") %>% dplyr::mutate(s_aL_A = sample(1:10, dim(.)[1], replace = TRUE)) %>% dplyr::mutate(s_aL_B = sample(1:10, dim(.)[1], replace = TRUE)) %>% dplyr::mutate(i_aL_A = sample(1:10, dim(.)[1], replace = TRUE)) %>% dplyr::mutate(i_aL_B = sample(1:10, dim(.)[1], replace = TRUE)) print(D_T)

Here is the code to be optimized. It's an example with N_H = 15.

for(i in 1:dim(D_T)[1]){ ## print(i) s_aL_A <- ifelse(D_T[i, c("N_A")] != 0, list(generateIntegers(size = D_T[i, c("N_A")], sum = D_T[i, c("s_aL_A")])), NA)[[1]] s_aL_B <- ifelse(D_T[i, c("N_B")] != 0, list(generateIntegers(size = D_T[i, c("N_B")], sum = D_T[i, c("s_aL_B")])), NA)[[1]] i_aL_A <- ifelse(D_T[i, c("N_A")] != 0, list(generateIntegers(size = D_T[i, c("N_A")], sum = D_T[i, c("i_aL_A")])), NA)[[1]] i_aL_B <- ifelse(D_T[i, c("N_B")] != 0, list(generateIntegers(size = D_T[i, c("N_B")], sum = D_T[i, c("i_aL_B")])), NA)[[1]] if(all(!is.na(s_aL_A))){D_H[D_H$patch_ID == D_T$patch_ID[i] & D_H$species == "A", c("s_aL"), drop = TRUE] <- s_aL_A} if(all(!is.na(s_aL_B))){D_H[D_H$patch_ID == D_T$patch_ID[i] & D_H$species == "B", c("s_aL"), drop = TRUE] <- s_aL_B} if(all(!is.na(i_aL_A))){D_H[D_H$patch_ID == D_T$patch_ID[i] & D_H$species == "A", c("i_aL"), drop = TRUE] <- i_aL_A} if(all(!is.na(i_aL_B))){D_H[D_H$patch_ID == D_T$patch_ID[i] & D_H$species == "B", c("i_aL"), drop = TRUE] <- i_aL_B} }

Welcome to Code Review! Please edit your question so that the title describes the purpose of the code, rather than its mechanism. We really need to understand the motivational context to give good reviews. It's best to describe what value this code provides to its user. — Toby Speight, CommentedApr 6 at 7:14
First, I would put the check whether N_A=0 into the generateInteger function. This wouldn't change the speed as it is the same number of operation but makes the code shorter and easier. Next the rows are independent of each other. So instead of a for loop you could write a function that takes i as input. You can then evaluate that function at multiple values in parallel. Something vector based would be even better but I don't see a way to do that right now. — quarague, CommentedApr 8 at 11:26

Stack Exchange Network

Vectorized approach for generating random numbers with specified count and sum

0

Hot Network Questions

Vectorized approach for generating random numbers with specified count and sum

0

Know someone who can answer? Share a link to this question via email, Twitter, or Facebook.

Related

Hot Network Questions