My solution was to first convert the XML to JSON with an xml_to_json function.
Then I defined:

    -- converts xml from data in spreadsheet templates table into a table via json
    CREATE OR REPLACE FUNCTION public.xml_to_table(xml)
        RETURNS TABLE(sheetname text, attributename text, attributevalue text)
        LANGUAGE 'sql'
        COST 100
        VOLATILE PARALLEL UNSAFE
        ROWS 1000
    AS $BODY$
    -- from the records returned by the subquery below this returns records with columns SheetName,
    -- attribute name (address) and attribute value e.g.:
    --   Sheet1  E54  3
    --   Sheet1  G23  1.1
    --   Sheet1  N87  0
    --   Sheet2  W32  thing
    --   ...
    select e.sheetname,
           jsonb_object_keys(e.attr) as attributename,
           e.attr ->> jsonb_object_keys(e.attr) as attributevalue
    from (
        -- removes the rows with null for the list of attributes from the results from the subquery under this, and
        -- separates each attribute to its own row e.g.:
        --   Sheet1  {"E54": "3"}
        --   Sheet1  {"G23": "1.1"}
        --   Sheet1  {"N87": "0"}
        --   Sheet2  {"W32": "thing"}
        --   ...
        select d.sheetname,
               jsonb_array_elements(d.exceldata) as attr
        from (
            -- separates each line from the subquery under this into records containing columns for
            -- sheetname and the attribute list;
            -- this can handle xml having more than one element at the Addresses level
            -- (e.g. it can handle a NamedeCells element alongside Addresses)
            --   Sheet1  [{"E54": "3"}, ...
            --   Sheet2  [{"W32": "thing"}]
            --   ...
            select b.sheetname,
                   b.records -> jsonb_object_keys(b.records) -> 'attr' as exceldata
            from (
                -- separates each line from the subquery under this into records with columns for sheetname and
                -- a row for the JSON for each of Addresses e.g.:
                --   Sheet1  {Addresses: {attr: ...
                --   Sheet2  {Addresses: {attr: ...
                --   ...
                select jsonb_object_keys(a.sheetjson) as sheetname,
                       jsonb_array_elements((a.sheetjson -> jsonb_object_keys(a.sheetjson) -> 'childs')) as records
                from (
                    -- separates the supplied xml into json records for each sheet e.g.:
                    --   {Sheet1: {attr: ...
                    --   {Sheet2: {attr: ...
                    --   ...
                    select jsonb_array_elements(xml_to_json($1) -> 'Sheets' -> 'childs') as sheetjson
                ) as a
            ) as b
        ) as d
    ) as e;
    $BODY$;

This can be called with:

    select * from xml_to_table('<Sheets> <Sheet1> <Addresses E54="3" G23="1.1" N87="0"/> </Sheet1> <Sheet2> <Addresses W32="thing"/> </Sheet2> </Sheets>');

to produce:

sheetname | attributename | attributevalue |
---|---|---|
Sheet1 | E54 | 3 |
Sheet1 | G23 | 1.1 |
Sheet1 | N87 | 0 |
Sheet2 | W32 | thing |
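
To see how the two outermost selects unnest the attributes, here is a minimal sketch that runs the same pattern on a jsonb literal, so it does not need xml_to_json. The shape of the literal (one array of single-key objects per sheet) is an assumption taken from the comments in the function, not from the actual output of xml_to_json, and the CTE name `d` is only there to mirror the subquery alias.

```sql
-- Assumed intermediate data: one row per sheet, each holding an array
-- of single-key objects (the shape described in the function comments).
with d(sheetname, exceldata) as (
    values
        ('Sheet1', '[{"E54": "3"}, {"G23": "1.1"}, {"N87": "0"}]'::jsonb),
        ('Sheet2', '[{"W32": "thing"}]'::jsonb)
)
select e.sheetname,
       -- each element has exactly one key, so this yields one row per element
       jsonb_object_keys(e.attr) as attributename,
       -- look the value up as text under that same key
       e.attr ->> jsonb_object_keys(e.attr) as attributevalue
from (
    -- one row per array element, e.g. Sheet1 {"E54": "3"}
    select d.sheetname,
           jsonb_array_elements(d.exceldata) as attr
    from d
) as e;
```

On PostgreSQL 10 or later this should give one row per attribute, matching the rows in the table above, though the row order is not guaranteed without an `order by`.
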

This function is not as general as I'd like, but it will suffice for my data clean-up needs.
Any comments welcome.