
I'm solving a linear programming problem that is made up of many small sub-problems coupled by some shared constraints, so they all have to be solved together.

Each individual unit has constraints unique to that unit, plus global constraints that are influenced by all units. I've implemented this by structuring the constraint matrix like:

+-------+-----------------------+-----------------------+-----+-----------------------+-----------------------+
|       | Constraint_set1       | Constraint_set2       | ... | Constraint_setN       | Global Constraints    |
+-------+-----------------------+-----------------------+-----+-----------------------+-----------------------+
| Unit1 | Component_Constraints | 0                     | ... | 0                     | Component_Constraints |
| Unit2 | 0                     | Component_Constraints | ... | 0                     | Component_Constraints |
| ...   | ...                   | ...                   | ... | 0                     | Component_Constraints |
| UnitN | 0                     | 0                     | 0   | Component_Constraints | Component_Constraints |
+-------+-----------------------+-----------------------+-----+-----------------------+-----------------------+

I've taken the reasoning for this structure from here: https://webgol.dinfo.unifi.it/OptimizationModels/MultiPeriodModels.html

This serves as the A matrix in Ax <= b.
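
As an illustration of that layout, the block structure can be assembled with scipy.sparse. This is a minimal sketch with made-up block shapes; per_unit and per_unit_global are placeholder names, not from the real problem:

import numpy as np
import scipy.sparse

num_units = 4
rng = np.random.default_rng(0)

# Placeholder blocks: per_unit[i] stands in for Unit i's Component_Constraints,
# per_unit_global[i] for Unit i's rows of the Global Constraints block.
per_unit = [rng.random((3, 2)) for _ in range(num_units)]
per_unit_global = [rng.random((3, 1)) for _ in range(num_units)]

A = scipy.sparse.hstack([
    scipy.sparse.block_diag(per_unit),                   # Constraint_set1 ... Constraint_setN
    scipy.sparse.coo_array(np.vstack(per_unit_global)),  # Global Constraints block
])
print(A.shape)  # (12, 9): 4 units x 3 rows each; 4 x 2 + 1 columns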

The following code adds further percentage limits on each of the components in each unit, to be appended to the matrix above.

Setup code:

import numpy as np
import pandas as pd
from IPython.display import display

# %%
num_of_comp_id_cols = 2  # Name and Group, but more possible
components_str = """Name,Group,OtherVals
Component0,ComponentGroup0,4
Component2,ComponentGroup1,2"""

limits_string = """Name,Component,Value
Item1,Component1,0.2
Group1,Component2,0.1
Item4,ComponentGroup1,0.2
Group2,ComponentGroup0,0.9"""

# grouped components and grouped units
num_of_unit_id_cols = 2  # Name and Group, but more possible
units_string = """Name,Group,OtherVals
Item1,Group1,3
Item2,Group2,1
Item3,Group1,4
Item4,Group2,1"""

components = pd.DataFrame(
    [x.split(',') for x in components_str.split('\n')[1:]],
    columns=components_str.split('\n')[0].split(','),
)
num_components = len(components)

limits = pd.DataFrame(
    [x.split(',') for x in limits_string.split('\n')[1:]],
    columns=limits_string.split('\n')[0].split(','),
)
num_limits = len(limits)
limits['Value'] = limits['Value'].astype(np.float32)

units = pd.DataFrame(
    [x.split(',') for x in units_string.split('\n')[1:]],
    columns=units_string.split('\n')[0].split(','),
)
num_units = len(units)

That is the data loaded and set up; the main code is:

# This is per limit:
results = []
limits_T = limits.T
for limit in limits_T:  # Iterate over each limit in limits
    # Find which units the limit applies to, they could be referenced by name
    # or by group
    checks = np.tile(limits_T[limit].Name, (num_units, num_of_unit_id_cols))
    impacted_units = units[['Name', 'Group']] == checks
    impacted_units = impacted_units.any(axis=1)
    # Impacted units is now 1xlen bool of which units are selected for
    # current limit

    # Each unit needs to be in a separate column as in the linear programming
    # problem they are independent in this aspect.
    part_eye = impacted_units.values * np.eye(num_units)
    # Remove the 0 columns:
    part_eye = part_eye[:, ~(part_eye == 0).all(axis=0)]
    # The above two steps do something similar to block_diag, but not quite..

    # Now building the limits for each unit, for percentage limits a linear
    # programming array can be built with [-a, -a, -a, -a+1].dot(X) <= 0 to
    # constrain the last element to be less than a% of the total. (Similarly
    # for groups [-a+1, -a, -a, -a+1] would constrain the sum of the first and
    # last to be <= a%)
    checks2 = np.tile(limits_T[limit].Component, (num_components, num_of_comp_id_cols))
    impacted_components = components[['Name', 'Group']] == checks2
    impacted_components = impacted_components.any(axis=1)
    percentage_vec = np.ones((num_components, 1)) * -limits_T[limit].Value
    percentage_vec = percentage_vec + impacted_components.values.astype(int)[:, None]
    # vec now built

    # use this limit vector for each of the impacted units:
    single_limit_result = np.kron(part_eye, percentage_vec)
    results.append(single_limit_result)

result = np.concatenate(results, axis=1)
# result
array([[-0.2, -0.1, -0. , -0. ,  0. ,  0. ],
       [-0.2,  0.9,  0. ,  0. , -0. , -0. ],
       [-0. , -0. , -0. , -0. ,  0.1,  0. ],
       [-0. ,  0. ,  0. ,  0. , -0.9, -0. ],
       [-0. , -0. , -0.1, -0. ,  0. ,  0. ],
       [-0. ,  0. ,  0.9,  0. , -0. , -0. ],
       [-0. , -0. , -0. , -0.2,  0. ,  0.1],
       [-0. ,  0. ,  0. ,  0.8, -0. , -0.9]])
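
As a quick sanity check of the percentage trick used above: a row like [-a, -a, -a, -a+1] gives row.dot(x) = -a*x.sum() + x[-1], so row.dot(x) <= 0 is exactly the condition x[-1] <= a * x.sum(). An illustrative check only:

import numpy as np

a = 0.2
x = np.array([3.0, 4.0, 2.0, 1.0])
row = np.full(4, -a)
row[-1] += 1  # row is now [-0.2, -0.2, -0.2, 0.8]
assert (row @ x <= 0) == (x[-1] <= a * x.sum())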

This works and gives the expected results, but the looping over the limits is not elegant. Any feedback on the code in general would be great, as would any comments on whether there is a more elegant (and hopefully quicker at scale) way to build the same matrix.

For a more elegant approach, I currently have the following, which calculates the impacted units for all limits at once:

(units[['Name', 'Group'] * num_limits] == np.tile(
    np.repeat(limits['Name'], num_of_unit_id_cols), (num_units, 1))
).astype(int)

But I don't know of a good way to do something akin to block_diag from here and get each component constraint into its own column. In the for loop this is achieved by multiplying by the identity matrix, but I don't see how that would work here.
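
One loop-free construction I can imagine (a sketch only; U, C and P are scratch names, and I haven't measured it at scale) is to build the boolean masks and the percentage vectors for all limits at once, then scatter whole columns into a COO matrix. U.nonzero() enumerates the (limit, unit) pairs limit-major, which matches the column order produced by the loop above:

import numpy as np
import scipy.sparse

# U[l, j]: does limit l apply to unit j (matched by name or by group)?
U = (
    (units['Name'].values[None, :] == limits['Name'].values[:, None])
    | (units['Group'].values[None, :] == limits['Name'].values[:, None])
)
# C[l, c]: does limit l name component c (or its group)?
C = (
    (components['Name'].values[None, :] == limits['Component'].values[:, None])
    | (components['Group'].values[None, :] == limits['Component'].values[:, None])
)
P = C.astype(float) - limits['Value'].values[:, None]  # per-limit percentage vectors

l_idx, j_idx = U.nonzero()  # one output column per (limit, unit) pair
cols = np.repeat(np.arange(l_idx.size), num_components)
rows = (j_idx[:, None] * num_components + np.arange(num_components)).ravel()
result2 = scipy.sparse.coo_array(
    (P[l_idx].ravel(), (rows, cols)),
    shape=(num_units * num_components, l_idx.size),
)
assert np.allclose(result2.toarray(), result)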

  • There's a lot going on here, but to your credit, your example code is executable. Good job. – Commented Dec 21, 2024 at 14:17
  • Which LP solver are you feeding this to? scipy.optimize.milp? – Commented Dec 21, 2024 at 21:58
  • Currently using scipy.linprog, which as I understand it is essentially milp (nothing currently is integer constrained, though). – AReubens, Commented Dec 22, 2024 at 0:35
  • Use milp. linprog has fewer API conveniences. – Commented Dec 22, 2024 at 0:36

1 Answer


Don't x.split(','). If you want to consume CSV data from a string literal, use StringIO and call read_csv.

You say

the looping over the limits is not elegant

There may be a way to vectorise the whole thing, but that's going to be a lot of work and the payback is dubious. If the whole thing scales up massively, then maybe it would be a different story; but for now I demonstrate a similar loop.

Don't for limit in limits_T. Call iterrows on the (non-transposed) dataframe.

Don't np.tile. In context you have a row of the dataframe; indexing into it gives you a scalar, so you can simply broadcast that scalar through the == operator.

Currently part_eye is dense. I don't think that's a very good idea; your data are sparse, and there's a chance (you haven't stated) that these matrices are directly consumable as scipy.sparse arrays if fed into scipy.optimize.milp. The sparse representation for part_eye can use the direct CSC (data, indices, indptr) representation with little effort.
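
To make that concrete, a toy triplet (my illustration: units 1 and 3 selected out of 4):

import numpy as np
import scipy.sparse

part_eye = scipy.sparse.csc_array(
    (
        np.ones(2),           # data: one 1.0 per selected unit
        np.array([1, 3]),     # indices: the row of each column's nonzero
        np.array([0, 1, 2]),  # indptr: each column holds exactly one entry
    ),
    shape=(4, 2),
)
print(part_eye.toarray())
# [[0. 0.]
#  [1. 0.]
#  [0. 0.]
#  [0. 1.]]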

percentage_vec is dense, so I leave it as-is, but don't multiply on a ones(). Instead, make appropriate use of np.full().

Don't index on None. Use the self-describing np.newaxis.

For single_limit_result, you can still use a Kronecker product, but you should switch to the sparse version.

The following produces the same output:

import io

import numpy as np
import pandas as pd
import scipy.sparse

num_of_comp_id_cols = 2  # Name and Group, but more possible
num_of_unit_id_cols = 2  # Name and Group, but more possible

with io.StringIO("""Name,Group,OtherVals
Component0,ComponentGroup0,4
Component2,ComponentGroup1,2""") as f:
    components = pd.read_csv(f)
num_components = len(components)

with io.StringIO("""Name,Component,Value
Item1,Component1,0.2
Group1,Component2,0.1
Item4,ComponentGroup1,0.2
Group2,ComponentGroup0,0.9""") as f:
    limits = pd.read_csv(f)
num_limits = len(limits)

with io.StringIO("""Name,Group,OtherVals
Item1,Group1,3
Item2,Group2,1
Item3,Group1,4
Item4,Group2,1""") as f:
    units = pd.read_csv(f)
num_units = len(units)

# This is per limit:
results = []
for limit_index, limit_row in limits.iterrows():  # Iterate over each limit in limits
    # series of boolean, with units index
    impacted_units = (units[['Name', 'Group']] == limit_row['Name']).any(axis=1)
    i, = impacted_units.values.nonzero()
    part_eye = scipy.sparse.csc_array(  # signature 5
        (
            np.ones(i.size),        # data
            i,                      # indices
            np.arange(1 + i.size),  # indptr
        ),
        shape=(num_units, i.size),
    )

    # Now building the limits for each unit, for percentage limits a linear
    # programming array can be built with [-a, -a, -a, -a+1].dot(X) <= 0 to
    # constrain the last element to be less than a% of the total. (Similarly
    # for groups [-a+1, -a, -a, -a+1] would constrain the sum of the first and
    # last to be <= a%)
    impacted_components = (components[['Name', 'Group']] == limit_row['Component']).any(axis=1)
    percentage_vec = (
        np.full(shape=(num_components, 1), fill_value=-limit_row['Value'])
        + impacted_components.values.astype(int)[:, np.newaxis]
    )

    # use this limit vector for each of the impacted units:
    single_limit_result = scipy.sparse.kron(A=part_eye, B=percentage_vec, format='csc')
    results.append(single_limit_result)

result = scipy.sparse.hstack(results)

assert np.allclose(
    result.toarray(),
    [
        [-0.2, -0.1,  0. ,  0. ,  0. ,  0. ],
        [-0.2,  0.9,  0. ,  0. ,  0. ,  0. ],
        [ 0. ,  0. ,  0. ,  0. ,  0.1,  0. ],
        [ 0. ,  0. ,  0. ,  0. , -0.9,  0. ],
        [ 0. ,  0. , -0.1,  0. ,  0. ,  0. ],
        [ 0. ,  0. ,  0.9,  0. ,  0. ,  0. ],
        [ 0. ,  0. ,  0. , -0.2,  0. ,  0.1],
        [ 0. ,  0. ,  0. ,  0.8,  0. , -0.9],
    ],
    rtol=0, atol=1e-7,
)
