Parse YAML file with nested parameters as a Python class object

Question

I would like to use a YAML file to store parameters used by computational models developed in Python. An example of such a file is below:

params.yaml

    reactor:
        diameter_inner: 2.89 cm
        temperature: 773 kelvin
        gas_mass_flow: 1.89 kg/s

    biomass:
        diameter: 2.5 mm                # mean Sauter diameter (1)
        density: 540 kg/m^3             # source unknown
        sphericity: 0.89 unitless       # assumed value
        thermal_conductivity: 1.4 W/mK  # based on value for pine (2)

    catalyst:
        density: 1200 kg/m^3            # from MSDS sheet
        sphericity: 0.65 unitless       # assumed value
        diameters: [[86.1, 124, 159.03, 201], microns]  # sieve screen diameters
        surface_areas:
            values:
                - 12.9
                - 15
                - 18
                - 24.01
                - 31.8
                - 38.51
                - 42.6
            units: square micron

Parameters for the Python model are organized based on the type of computations they apply to. For example, parameters used by the reactor model are listed in the reactor section. Units are important for the calculations so the YAML file needs to convey that information too.

I'm using the PyYAML package to read the YAML file into a Python dictionary. To allow easier access to the nested parameters, I use an intermediate Python class to parse the dictionary values into class attributes. The class attributes are then used to obtain the values associated with the parameters. Below is an example of how I envision using the approach for a much larger project:

params.py

    import yaml


    class Reactor:

        def __init__(self, rdict):
            self.diameter_inner = float(rdict['diameter_inner'].split()[0])
            self.temperature = float(rdict['temperature'].split()[0])
            self.gas_mass_flow = float(rdict['gas_mass_flow'].split()[0])


    class Biomass:

        def __init__(self, bdict):
            self.diameter = float(bdict['diameter'].split()[0])
            self.density = float(bdict['density'].split()[0])
            self.sphericity = float(bdict['sphericity'].split()[0])


    class Catalyst:

        def __init__(self, cdict):
            self.diameters = cdict['diameters'][0]
            self.density = float(cdict['density'].split()[0])
            self.sphericity = float(cdict['sphericity'].split()[0])
            self.surface_areas = cdict['surface_areas']['values']


    class Parameters:

        def __init__(self, file):

            with open(file, 'r') as f:
                params = yaml.safe_load(f)

            # reactor parameters
            rdict = params['reactor']
            self.reactor = Reactor(rdict)

            # biomass parameters
            bdict = params['biomass']
            self.biomass = Biomass(bdict)

            # catalyst parameters
            cdict = params['catalyst']
            self.catalyst = Catalyst(cdict)

example.py

    from params import Parameters

    pm = Parameters('params.yaml')

    # reactor
    d_inner = pm.reactor.diameter_inner
    temp = pm.reactor.temperature
    mf_gas = pm.reactor.gas_mass_flow

    # biomass
    d_bio = pm.biomass.diameter
    rho_bio = pm.biomass.density

    # catalyst
    rho_cat = pm.catalyst.density
    sp_cat = pm.catalyst.sphericity
    d_cat = pm.catalyst.diameters
    sa_cat = pm.catalyst.surface_areas

    print('\n--- Reactor Parameters ---')
    print(f'd_inner = {d_inner}')
    print(f'temp = {temp}')
    print(f'mf_gas = {mf_gas}')

    print('\n--- Biomass Parameters ---')
    print(f'd_bio = {d_bio}')
    print(f'rho_bio = {rho_bio}')

    print('\n--- Catalyst Parameters ---')
    print(f'rho_cat = {rho_cat}')
    print(f'sp_cat = {sp_cat}')
    print(f'd_cat = {d_cat}')
    print(f'sa_cat = {sa_cat}')

This approach works, but every parameter added to the YAML file requires additional code in the corresponding class. I could use the dictionary returned by PyYAML directly, but I find it easier and cleaner to get the parameter values through a class interface.

So I would like to know if there is a better approach that I should use to parse the YAML file? Or should I organize the YAML file with a different structure to more easily parse it?

Maarten Fabré · Accepted Answer · 2018-04-18 14:05:21Z

You could use a nested parser that walks the dictionary recursively and uses pint to do the unit parsing:

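    import yaml
    from pint import UnitRegistry, UndefinedUnitError

    UNITS = UnitRegistry()


    def nested_parser(params: dict):
        """Yield (key, value) pairs with unit strings converted to pint quantities."""
        for key, value in params.items():
            if isinstance(value, str):
                # e.g. '2.89 cm'; keep the plain string if pint rejects the unit
                try:
                    value = UNITS.Quantity(value)
                except UndefinedUnitError:
                    pass
                yield key, value
            elif isinstance(value, dict):
                if value.keys() == {'values', 'units'}:
                    # e.g. surface_areas: a list of magnitudes sharing one unit
                    yield key, [i * UNITS(value['units']) for i in value['values']]
                else:
                    # a nested section such as 'reactor'
                    yield key, dict(nested_parser(value))
            elif isinstance(value, list):
                # e.g. diameters: [[86.1, 124, ...], unit]
                values, unit = value
                yield key, [i * UNITS(unit) for i in values]


    # params is an open file object or a string of YAML
    dict(nested_parser(yaml.safe_load(params)))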

    {'reactor': {'diameter_inner': <Quantity(2.89, 'centimeter')>,
                 'temperature': <Quantity(773, 'kelvin')>,
                 'gas_mass_flow': <Quantity(1.89, 'kilogram / second')>},
     'biomass': {'diameter': <Quantity(2.5, 'millimeter')>,
                 'density': <Quantity(540.0, 'kilogram / meter ** 3')>,
                 'sphericity': <Quantity(0.89, 'dimensionless')>,
                 'thermal_conductivity': <Quantity(1.4, 'watt / millikelvin')>},
     'catalyst': {'density': <Quantity(1200.0, 'kilogram / meter ** 3')>,
                  'sphericity': <Quantity(0.65, 'dimensionless')>,
                  'diameters': [<Quantity(86.1, 'micrometer')>,
                                <Quantity(124, 'micrometer')>,
                                <Quantity(159.03, 'micrometer')>,
                                <Quantity(201, 'micrometer')>],
                  'surface_areas': [<Quantity(12.9, 'micrometer ** 2')>,
                                    <Quantity(15, 'micrometer ** 2')>,
                                    <Quantity(18, 'micrometer ** 2')>,
                                    <Quantity(24.01, 'micrometer ** 2')>,
                                    <Quantity(31.8, 'micrometer ** 2')>,
                                    <Quantity(38.51, 'micrometer ** 2')>,
                                    <Quantity(42.6, 'micrometer ** 2')>]}}

You might need to make your units understandable for pint. For me that just meant changing three spellings in the YAML file: microns to µm, square micron to µm², and unitless to dimensionless.
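If editing the YAML file is not an option, the same renaming could be done in code before the strings reach pint. The following is only a sketch: the alias table `UNIT_ALIASES` and the helper `normalize_units` are hypothetical names, and the target spellings are the ones assumed to be acceptable to pint.

```python
import re

# Hypothetical alias table: spellings used in params.yaml mapped to spellings
# that pint is assumed to accept. Extend it to match your own file.
UNIT_ALIASES = {
    "square micron": "micrometer ** 2",
    "microns": "micrometer",
    "unitless": "dimensionless",
}

# Longest aliases first, so that a multi-word alias wins over a shorter one.
_PATTERN = re.compile(
    "|".join(
        rf"\b{re.escape(alias)}\b"
        for alias in sorted(UNIT_ALIASES, key=len, reverse=True)
    )
)


def normalize_units(text: str) -> str:
    """Replace every known alias in ``text`` with its canonical spelling."""
    return _PATTERN.sub(lambda match: UNIT_ALIASES[match.group(0)], text)


print(normalize_units("0.89 unitless"))
print(normalize_units("square micron"))
```

Strings without a known alias, such as `2.5 mm`, pass through unchanged, so the helper could be applied to every unit string before it is handed to the parser.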

using this

statically

    configuration = dict(nested_parser(yaml.safe_load(params)))

    # reactor
    reactor_config = configuration['reactor']
    d_inner = reactor_config['diameter_inner']
    temp = reactor_config['temperature']
    mf_gas = reactor_config['gas_mass_flow']

    print('\n--- Reactor Parameters ---')
    print(f'd_inner = {d_inner}')
    print(f'temp = {temp}')
    print(f'mf_gas = {mf_gas}')

dynamically

    for part, parameters in nested_parser(yaml.safe_load(params)):
        print(f'--- {part} Parameters ---')
        for parameter, value in parameters.items():
            print(f'{parameter} = {value}')
        print('\n')

You can check out the pint documentation on string formatting to format the units the way you want.

My next step is to incorporate Pint, so thank you for the example. Can you also comment on how to use your approach in a Python script? In my example I use the class objects in params.py to read the YAML dictionary and assign the values to attributes. Then I refer to those classes in the example.py script. Would this approach work with pint? Or is there a different approach I should use? — wigging, Commented Apr 17, 2018 at 16:29
pint works with this approach. The values of the attributes are now instances of pint.Quantity, so the handling of string methods and so on will change, but fundamentally these quantities are no different from floats and ints. You can rework your classes to accept the dict of parameters and use setattr to set the attributes dynamically. Note that using a class only to hold the parameter values is a bit overkill; a dict will suffice for that purpose. — Maarten Fabré, Commented Apr 18, 2018 at 7:17
I agree that using a class is overkill. Can you provide an example of how to get the values from the dictionary? I'm thinking that a function like get_value('density') would work, but how would I define which density? — wigging, Commented Apr 18, 2018 at 12:14
One more question. In yaml.safe_load(params), what is params? Is it a string representing the path to the YAML file? — wigging, Commented Apr 19, 2018 at 0:40
nested_parser takes any dict structured like the one in the YAML file, so params can be the open file or a YAML string. — Maarten Fabré, Commented Apr 19, 2018 at 19:50
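The two ideas raised in these comments, setting attributes dynamically with setattr and looking up "which density" by name, could be sketched as follows. This is an illustration only: `Section` and the dotted-path form of `get_value` are hypothetical, and the `config` dict is a stand-in for the parsed YAML, with plain floats in place of pint quantities so the sketch has no dependencies.

```python
from functools import reduce


class Section:
    """Expose the keys of a (possibly nested) dict as attributes."""

    def __init__(self, values: dict):
        for key, value in values.items():
            # Nested dicts become nested Section objects.
            if isinstance(value, dict):
                value = Section(value)
            setattr(self, key, value)


def get_value(config: dict, path: str):
    """Look up a value by dotted path, e.g. 'catalyst.density'."""
    return reduce(lambda node, key: node[key], path.split("."), config)


# Stand-in for the dict built from the YAML file (assumed shape).
config = {
    "biomass": {"density": 540.0, "sphericity": 0.89},
    "catalyst": {"density": 1200.0, "sphericity": 0.65},
}

pm = Section(config)
print(pm.catalyst.density)                    # attribute access
print(get_value(config, "biomass.density"))   # dotted-path access
```

With either form, adding a parameter to the YAML file requires no change to the Python code, and the section name in the path or attribute chain says which density is meant.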

l0b0 · Accepted Answer · 2018-04-17 01:57:49Z

If you split the configuration fields into magnitude and unit (as you've already done for surface_areas) you won't have to split and parse them in code.
If you then convert your configuration to JSON you won't need to convert strings to numbers. JSON strings must be quoted, and numbers must be unquoted, so the json module will simply do those conversions for you.
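The first suggestion also works without leaving YAML. As a rough sketch, assuming every scalar parameter is stored as a small mapping with the hypothetical keys `value` and `units`, the loaded data already separates numbers from unit strings, so no splitting or float conversion is needed:

```python
# What yaml.safe_load would be expected to return for a file laid out as
#
#   reactor:
#     diameter_inner: {value: 2.89, units: cm}
#     temperature: {value: 773, units: kelvin}
#
# The layout and the key names 'value' and 'units' are assumptions.
params = {
    "reactor": {
        "diameter_inner": {"value": 2.89, "units": "cm"},
        "temperature": {"value": 773, "units": "kelvin"},
    }
}


def magnitudes(section: dict) -> dict:
    """Return {name: numeric value} for one section, dropping the units."""
    return {name: entry["value"] for name, entry in section.items()}


def units(section: dict) -> dict:
    """Return {name: unit string} for one section."""
    return {name: entry["units"] for name, entry in section.items()}


reactor = params["reactor"]
print(magnitudes(reactor))
print(units(reactor))
```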

Other than that:

Configuration handling should be separate from building other objects - that way it's easy to use your code whether the configuration comes from a file or from command-line parameters.
Accessing properties two levels deep (such as pm.biomass.diameter) violates the Law of Demeter. You could, for example, write an as_parameter_list method for each class that returns a representation like f'rho_cat = {rho_cat}' for each of its parameters.

I'm not interested in using JSON for the parameters file because it does not support comments. I plan to use comments to add more information about certain parameters. I also feel that the YAML format is more readable than JSON. Can you provide an example of the configuration handling you mentioned? — wigging, Commented Apr 17, 2018 at 2:13
Nothing open source off the top of my head, but I bet any large project that allows configuration either via files or via command-line arguments does this. — l0b0, Commented Apr 17, 2018 at 2:16
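No example was given in the thread, so the following is only one possible reading of the advice: the model object takes plain values, and separate functions build it from a parsed file section or from command-line arguments. The names `reactor_from_mapping` and `reactor_from_args` are hypothetical; `as_parameter_list` follows the name suggested in the answer.

```python
import argparse
from dataclasses import dataclass, fields


@dataclass
class Reactor:
    """Model object: knows nothing about YAML or the command line."""
    diameter_inner: float
    temperature: float
    gas_mass_flow: float

    def as_parameter_list(self) -> list:
        """Return one 'name = value' string per field, for printing."""
        return [f"{f.name} = {getattr(self, f.name)}" for f in fields(self)]


def reactor_from_mapping(section: dict) -> Reactor:
    """Build a Reactor from an already-parsed mapping (e.g. a YAML section)."""
    return Reactor(**{f.name: float(section[f.name]) for f in fields(Reactor)})


def reactor_from_args(argv: list) -> Reactor:
    """Build a Reactor from command-line style arguments."""
    parser = argparse.ArgumentParser()
    for f in fields(Reactor):
        parser.add_argument(f"--{f.name}", type=float, required=True)
    return Reactor(**vars(parser.parse_args(argv)))


# The same object can come from either source.
from_file = reactor_from_mapping(
    {"diameter_inner": 2.89, "temperature": 773, "gas_mass_flow": 1.89}
)
from_cli = reactor_from_args(
    ["--diameter_inner", "2.89", "--temperature", "773", "--gas_mass_flow", "1.89"]
)
print("\n".join(from_file.as_parameter_list()))
```

Because `Reactor` only receives numbers, the calling script can print `as_parameter_list()` instead of reaching into nested attributes, and swapping the configuration source does not touch the model code.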
