I have a class of (flat) objects that are going to be passed around between three different parties, each possibly running a different software stack.

    my_object:
        item: "string_a"
        money:
            amount: 99.9
            currency: "EUR"
        metadata: 5
        origin: "https://www.example.com/path"

These objects will need to be signed and the signatures will need to be verifiable.
In order for the signed hash to be stable from one party to another, there has to be an exact canonical string representation.
EDIT:
For example, both the following are valid string representations of the above object, but they'll have different hashes.

    <my_object xmlns="https://www.TBD.com">
        <item>string_a</item>
        <money>
            <amount>99.9</amount>
            <currency>EUR</currency>
        </money>
        <metadata>5</metadata>
        <origin>https://www.example.com/path</origin>
    </my_object>

    <my_object xmlns="https://www.TBD.com">
        <item>string_a</item>
        <money><amount>99.9</amount><currency>EUR</currency></money>
        <metadata>5.0</metadata>
        <origin>https://www.example.com/path</origin>
    </my_object>

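To make the mismatch concrete, here is a minimal Python sketch (standard library only) that parses both strings and hashes them. The helper names `sha512_hex` and `parsed_values` are made up for illustration, and comparing the numbers as `float` is only an assumption for this demo.

```python
import hashlib
import xml.etree.ElementTree as ET

NS = {"t": "https://www.TBD.com"}

# The two serializations from above, written on one line each.
doc_a = (
    '<my_object xmlns="https://www.TBD.com"> <item>string_a</item> '
    '<money> <amount>99.9</amount> <currency>EUR</currency> </money> '
    '<metadata>5</metadata> <origin>https://www.example.com/path</origin> </my_object>'
)
doc_b = (
    '<my_object xmlns="https://www.TBD.com"> <item>string_a</item> '
    '<money><amount>99.9</amount><currency>EUR</currency></money> '
    '<metadata>5.0</metadata> <origin>https://www.example.com/path</origin> </my_object>'
)


def sha512_hex(text: str) -> str:
    """Hash the exact UTF-8 bytes of the string, whitespace included."""
    return hashlib.sha512(text.encode("utf-8")).hexdigest()


def parsed_values(xml_text: str) -> tuple:
    """Extract the five values a receiving party would actually use."""
    root = ET.fromstring(xml_text)
    return (
        root.findtext("t:item", namespaces=NS),
        float(root.findtext("t:money/t:amount", namespaces=NS)),
        root.findtext("t:money/t:currency", namespaces=NS),
        float(root.findtext("t:metadata", namespaces=NS)),
        root.findtext("t:origin", namespaces=NS),
    )


# Same values after parsing, but different bytes and therefore different hashes.
print(parsed_values(doc_a) == parsed_values(doc_b))
print(sha512_hex(doc_a) == sha512_hex(doc_b))
```

The first comparison succeeds and the second does not, which is exactly the situation a signature scheme over the raw string has to avoid.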
All parties will be parsing the objects, so it's the values that need to be verifiable in the signature process. In the above example, the data that needs to be signed is `{"string_a", 99.9, "EUR", 5, "https://www.example.com/path"}`.
Of course there must be no ambiguity about which value goes with which key; for example, it must be clear that 5 is the `metadata` and not the `amount`.
That said, it seems reasonable to treat the original serialization of the data as a data value in itself, and to parse the contained data out of it as needed. This would sidestep the above problem of needing a byte-perfect canonical version.
END EDIT
I think it's a good idea to define the "outer" signed object in XML.

    <signed_object xmlns="https://www.TBD.com">
        <data>string representation of the inner data object</data>
        <signature>RSA signature of the SHA512 hash of the value in `data`</signature>
    </signed_object>

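As a rough sketch of how that envelope could be produced and checked, here is a standard-library Python version. Note the big assumption: the `sign` function below uses HMAC-SHA512 with a shared demo key purely as a stand-in, because RSA is not in the standard library; a real implementation would swap in an RSA signature over the same SHA-512 hash. The names `wrap`, `unwrap_and_verify` and `DEMO_KEY` are hypothetical.

```python
import base64
import hashlib
import hmac
import xml.etree.ElementTree as ET

NAMESPACE = "https://www.TBD.com"
DEMO_KEY = b"demo-shared-secret"  # stand-in only; not how RSA keys work


def sign(data: str) -> str:
    """Stand-in signature: HMAC over the SHA-512 hash of the exact data string."""
    digest = hashlib.sha512(data.encode("utf-8")).digest()
    tag = hmac.new(DEMO_KEY, digest, hashlib.sha512).digest()
    return base64.b64encode(tag).decode("ascii")


def wrap(data: str) -> str:
    """Build the outer signed_object; ElementTree escapes the inner string."""
    root = ET.Element("signed_object", {"xmlns": NAMESPACE})
    ET.SubElement(root, "data").text = data
    ET.SubElement(root, "signature").text = sign(data)
    return ET.tostring(root, encoding="unicode")


def unwrap_and_verify(envelope: str) -> str:
    """Return the inner string untouched, or raise if the signature is wrong."""
    root = ET.fromstring(envelope)
    ns = {"t": NAMESPACE}
    data = root.findtext("t:data", namespaces=ns)
    signature = root.findtext("t:signature", namespaces=ns)
    # Verify against the string as received; never rebuild it from parsed values.
    if not hmac.compare_digest(sign(data), signature):
        raise ValueError("signature mismatch")
    return data


inner = "<my_object><item>string_a</item><metadata>5</metadata></my_object>"
envelope = wrap(inner)
print(envelope)
print(unwrap_and_verify(envelope) == inner)
```

The point of the sketch is that the verifier hashes the `data` string exactly as received and only parses it afterwards. One caveat I'm aware of: XML parsers normalize line endings, so an inner string containing carriage returns may not survive the round trip byte for byte.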
Clearly I have a variety of options for what should go in that inner `<data>` tag.
- An escaped XML string would work.
    - We know all the parties have XML parsing set up.
    - We could help with verification by providing an XSD.
    - The relatively large size of an XML string isn't a big deal because this will all get compressed.
    - On the other hand, it would be a pain for a human to look at an XML-escaped XML string and understand what was going on.
    - Also, a naive implementer might try to rebuild the XML from the contained data, get a different hash, and decline a valid signature.
- JSON and YAML might be slightly better than XML for human readability, but would have the same problem that "equivalent" objects could have different hashes.
- A delimited string (with commas, pipes, etc.) would be more human-readable, and would clearly be string data in itself.
    - Taking that idea even further, we could provide a canonical regex with capture groups for unambiguously validating and parsing these strings.
- And finally, we could decide not to worry about having a canonical string version of each such object, and instead have a really well-defined process for serializing the data for hashing.
    - This would look the best.
    - This would be the easiest to screw up.
EDIT:
Expanding on that last option (because it's the one I've actually seen done in the wild):
- Simply concatenating the values in an explicit order probably won't work: what if `item` ends with numeric characters?
- Concatenating the keys and values in an explicit order would probably work; the only problems I can think of are ambiguities in how to represent numbers (`5` vs `5.0`), or possibly conversions between UTF-8/UTF-32/ASCII.
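Here is a small Python sketch of what I mean by that second bullet. Everything in it is an assumption on my part rather than an agreed format: the flattened field names, the `key=value` lines joined by newlines, the rule that numbers are written without trailing zeros or exponents, and the choice of UTF-8.

```python
import hashlib
from decimal import Decimal

# Hypothetical fixed field order, with the nested money fields flattened.
FIELD_ORDER = ("item", "amount", "currency", "metadata", "origin")
NUMERIC_FIELDS = {"amount", "metadata"}

# Why bare value concatenation is ambiguous: two different objects collide.
assert "item9" + "9.9" == "item" + "99.9"


def canonical_number(value) -> str:
    """Write 5, 5.0 and '5.00' all as '5'; never use exponent notation."""
    return format(Decimal(str(value)).normalize(), "f")


def canonical_bytes(obj: dict) -> bytes:
    """Serialize keys and values in the fixed order as UTF-8 'key=value' lines."""
    lines = []
    for key in FIELD_ORDER:
        value = obj[key]
        text = canonical_number(value) if key in NUMERIC_FIELDS else str(value)
        if "\n" in text:
            # A newline inside a value would blur the line boundaries.
            raise ValueError(f"newline not allowed in field {key!r}")
        lines.append(f"{key}={text}")
    return "\n".join(lines).encode("utf-8")


def digest(obj: dict) -> str:
    return hashlib.sha512(canonical_bytes(obj)).hexdigest()


a = {"item": "string_a", "amount": 99.9, "currency": "EUR",
     "metadata": 5, "origin": "https://www.example.com/path"}
b = {"item": "string_a", "amount": "99.90", "currency": "EUR",
     "metadata": "5.0", "origin": "https://www.example.com/path"}
print(digest(a) == digest(b))
```

Every party would have to implement `canonical_number` and the encoding step identically, which is where I expect the cross-platform bugs to come from.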
I like the idea of defining the string format using a regex. Is that a bad idea for any reason?
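To show what I have in mind, here is a sketch in Python for a pipe-delimited format. The field layout `item|amount|currency|metadata|origin`, the three-letter currency rule, and the `https://` restriction on `origin` are all assumptions made up for this example.

```python
import re

# A number with no leading zeros and no trailing zeros in the fraction,
# so each numeric value has exactly one accepted spelling (5, never 5.0).
NUMBER = r"(?:0|[1-9][0-9]*)(?:\.[0-9]*[1-9])?"

CANONICAL = re.compile(
    r"(?P<item>[^|\n]+)\|"
    rf"(?P<amount>{NUMBER})\|"
    r"(?P<currency>[A-Z]{3})\|"
    rf"(?P<metadata>{NUMBER})\|"
    r"(?P<origin>https://[^|\s]+)"
)


def parse(line: str):
    """Return the captured fields, or None if the string is not canonical."""
    match = CANONICAL.fullmatch(line)
    return match.groupdict() if match else None


print(parse("string_a|99.9|EUR|5|https://www.example.com/path"))
print(parse("string_a|99.9|EUR|5.0|https://www.example.com/path"))
```

The second call is rejected because `5.0` is not the canonical spelling of `5`. The regex only pins down the byte form if it admits one spelling per value, and in this sketch `item` can never contain a pipe, so an escaping rule would be needed if that matters.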
EDIT: I'll be asking other people to implement parts of this system, on a variety of platforms. What system will be the easiest and/or most reliable to implement, in general?
"string_a"
,99.
,"EUR"
, etc.? Both are valid use cases: The first case allows you to verify an object by comparing its serialization string, e.g., the XML, against its hash, which can be done without actually parsing the XML. The second case allows you to verify the object only after deserialization, which means that this is completely independent from the serialization format and you can use (and even mix) XML, JSON, etc.