How to extract indeterminate attributes from xml in a postgres db?

Question

I've inherited a legacy DB in which Excel data is stored as XML in a text column of a table in a Postgres DB. A value from that column might look like:

<Sheets>
  <Sheet1>
    <Addresses E54="3" G23="1.1" N87="0"/>
  </Sheet1>
  <Sheet2>
    <Addresses W32="thing"/>
  </Sheet2>
</Sheets>

I know I can pick out the values of specific addresses with

select xpath( '//Addresses/@E54', cast(ssd.data as xml)) from spreadsheetdata ssd

but I have no idea how many distinct addresses exist.

What I'm hoping to do is produce a table looking like:

sheet	address	value
Sheet1	E54	"3"
Sheet1	G23	"1.1"
Sheet1	N87	"0"
Sheet2	W32	"thing"
...

How do I do that?

Gareth Vaughan · Accepted Answer · 2024-08-07 23:14:37Z

My solution was to first convert the XML to JSON with an xml_to_json function (a user-defined function, not a PostgreSQL built-in).

Then I defined:

-- converts xml from data in spreadsheet templates table into a table via json
CREATE OR REPLACE FUNCTION public.xml_to_table(xml)
    RETURNS TABLE(sheetname text, attributename text, attributevalue text)
    LANGUAGE 'sql'
    COST 100
    VOLATILE PARALLEL UNSAFE
    ROWS 1000
AS $BODY$
-- from the records returned by the subquery below this returns records with columns SheetName,
-- attribute name (address) and attribute value e.g.:
--   Sheet1 E54 3
--   Sheet1 G23 1.1
--   Sheet1 N87 0
--   Sheet2 W32 thing
--   ...
select e.sheetname,
       jsonb_object_keys(e.attr) as attributename,
       e.attr ->> jsonb_object_keys(e.attr) as attributevalue
from (
    -- removes the rows with null for the list of attributes from the results from the subquery
    -- under this, and separates each attribute to its own row e.g.:
    --   Sheet1 {"E54": "3"}
    --   Sheet1 {"G23": "1.1"}
    --   Sheet1 {"N87": "0"}
    --   Sheet2 {"W32": "thing"}
    --   ...
    select d.sheetname,
           jsonb_array_elements(d.exceldata) as attr
    from (
        -- separates each line from the subquery under this into records containing columns for
        -- sheetname and the array of attributes;
        -- this can handle xml having more than one element at the addresses level
        -- (e.g. it can handle a NamedeCells element alongside Addresses)
        --   Sheet1 [{"E54": "3"}, ...
        --   Sheet2 [{"W32": "thing"}]
        --   ...
        select b.sheetname,
               b.records -> jsonb_object_keys(b.records) -> 'attr' as exceldata
        from (
            -- separates each line from the subquery under this into records with columns for
            -- sheetname and a row for the JSON for each of Addresses e.g.:
            --   Sheet1 {Addresses: {attr: ...
            --   Sheet2 {Addresses: {attr: ...
            --   ...
            select jsonb_object_keys(a.sheetjson) as sheetname,
                   jsonb_array_elements((a.sheetjson -> jsonb_object_keys(a.sheetjson) -> 'childs')) as records
            from (
                -- separates the supplied xml into json records for each sheet e.g.:
                --   {Sheet1: {attr: ...
                --   {Sheet2: {attr: ...
                --   ...
                select jsonb_array_elements(xml_to_json($1) -> 'Sheets' -> 'childs') as sheetjson
            ) as a
        ) as b
    ) as d
) as e;
$BODY$;

This can be called with

select * from xml_to_table('<Sheets> <Sheet1> <Addresses E54="3" G23="1.1" N87="0"/> </Sheet1> <Sheet2> <Addresses W32="thing"/> </Sheet2> </Sheets>')

to produce

sheetname	attributename	attributevalue
Sheet1	E54	3
Sheet1	G23	1.1
Sheet1	N87	0
Sheet2	W32	thing
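
The call above passes a literal. To run the function over every row of the table from the question, a lateral join is one option. This is a minimal sketch that assumes the table and column names used in the question (spreadsheetdata, data) and that every value in the column parses as well-formed XML:

```sql
-- Apply xml_to_table to each stored document.
-- spreadsheetdata / data are the names used in the question.
SELECT t.sheetname,
       t.attributename,
       t.attributevalue
FROM spreadsheetdata AS ssd
CROSS JOIN LATERAL xml_to_table(CAST(ssd.data AS xml)) AS t;
```

If the table has a key column, selecting it alongside t.* would show which stored document each address came from.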

This function is not as general as I'd like, but it will suffice for my data clean-up needs.

Any comments welcome.
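
For comparison, here is an alternative sketch that is not part of the original answer and avoids the JSON conversion by using only the built-in xpath() function. It relies on xpath() returning a single-element array when the XPath expression yields a scalar (as with count(), name() and string()), and on the attributes sitting on an Addresses element directly under each sheet element, as in the sample. It is untested against the real data, so treat it as a starting point:

```sql
-- Sketch only: list every attribute of every Addresses element with xpath() alone.
-- spreadsheetdata / data are the names used in the question.
SELECT
    (xpath('name(/*)', s.sheet))[1]::text AS sheet,
    (xpath('name((/*/Addresses/@*)[' || i.n || '])', s.sheet))[1]::text AS address,
    (xpath('string((/*/Addresses/@*)[' || i.n || '])', s.sheet))[1]::text AS value
FROM spreadsheetdata AS ssd
-- one row per child element of <Sheets>; each result is itself a small XML document
CROSS JOIN LATERAL unnest(xpath('/Sheets/*', CAST(ssd.data AS xml))) AS s(sheet)
-- one row per attribute position within that sheet
CROSS JOIN LATERAL generate_series(
    1,
    (xpath('count(/*/Addresses/@*)', s.sheet))[1]::text::int
) AS i(n);
```

The positional index is needed because xpath('//Addresses/@*', ...) on its own returns only the attribute values, not their names. Sheets without any Addresses attributes produce no rows, since generate_series(1, 0) is empty.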
