Data format for binary data transfer

Question

I have to make an oscilloscope. Ideally, the oscilloscope connects to a probe. This probe should be another program, and could get the data locally or from a network. The data are numbers, floats and/or integers, and they change with time. The number of channels may vary, and not all channels need to have the same frequency.

Now, sometimes the frequency can be 1 kHz, so using JSON is not OK because it wastes a lot of resources, since it is text-based. I'm looking for a binary data format so that the data transfer is faster.

So the first part of the question is: Do you know of a format to transfer binary data like this? It should have some kind of header, and then just numbers. My guess is that somebody must have had this problem before and there should be some kind of library to use this.

Then comes the second part of the question: What protocol should we use to transfer the data?

Maybe using JSON with a base 64 encoder for the data over standard HTTP calls could work, but for a 1 kHz signal that doesn't make sense. I wonder what people do to solve this.

Protobuf? Also, why would you use HTTP? If they are on the same network, TCP works fine (or even something lower level if it's a 1-1 connection). - Ordous, Commented May 12, 2015 at 16:29
You are right, TCP is fine. I only wrote HTTP because it is what I've worked with before. I'll take a look at protobuf. - cauchi, Commented May 12, 2015 at 18:49
I'm doing the GUI with PySide (the Python bindings for Qt). The probe can be done in Python or C, or whatever comes in handy really. - cauchi, Commented May 12, 2015 at 20:12
If you are worried about size and have the processing power to spare, you can use Huffman compression or zlib to make your data smaller. You can also set up your protocol such that your most common message types are encoded with fewer bits. You can also have "short" versions of commands: if a command takes a uint32 parameter that is usually less than 16, you could have a short version of the same command that takes a 4-bit parameter. - Alan Wolfe, Commented May 13, 2015 at 3:40

shawty · Accepted Answer · 2015-05-12 22:06:33Z

You might also want to consider looking up the subject of "BER-TLV":

http://en.wikipedia.org/wiki/Type-length-value

This is the data stream format used by the world's financial processing network, and it is the backbone behind Chip & PIN terminals, ATMs and many other banking systems.

The best part about it is it's very simple to use.

The T,L,V part stands for

Tag
Length
Value

The Tag is a numerical identifier that specifies the type of message. In your case, if you're just making use of the format, you can most likely define your own tags, but if you were actually handling real financial data you'd have to use the tag numbers defined in the standard.

The Length part is again simple: it specifies how many bytes of data the Value part contains. As with any value, the more bytes you allocate to the length field, the longer your payload can be.

The Value part is then just the payload for your actual message, and how you format the bytes for this is entirely up to you.
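To make the three parts concrete, here is a minimal Python sketch of a simplified TLV message. It is not full BER-TLV: I'm assuming a fixed 1-byte tag and a 2-byte big-endian length, and the tag number `TAG_SAMPLES` and the payload layout (a channel id followed by float32 samples) are made up for illustration.

```python
import struct

# Hypothetical tag meaning "samples from one channel"; not from any standard.
TAG_SAMPLES = 0x01

# Assumed simplified header: 1-byte tag + 2-byte big-endian length (not full BER-TLV).
HEADER = struct.Struct(">BH")


def encode_tlv(tag: int, value: bytes) -> bytes:
    """Prefix the value bytes with their tag and length."""
    return HEADER.pack(tag, len(value)) + value


def decode_tlv(buf: bytes, offset: int = 0):
    """Return (tag, value, next_offset) for the message starting at offset."""
    tag, length = HEADER.unpack_from(buf, offset)
    start = offset + HEADER.size
    end = start + length
    if end > len(buf):
        raise ValueError("truncated TLV message")
    return tag, buf[start:end], end


# The payload layout is up to you: here, a channel id and three float32 samples.
payload = struct.pack("<B3f", 2, 0.5, 1.0, -1.5)
message = encode_tlv(TAG_SAMPLES, payload)

tag, value, _ = decode_tlv(message)
channel, *samples = struct.unpack("<B3f", value)
```

With this layout, the overhead is 3 bytes per message, and a 2-byte length caps the payload at 65535 bytes.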

Because it's such a simple protocol to decode and encode (you can do it entirely using byte arrays), you can easily send it over UDP using a very small, very fast packet size.

It's also suitable for streaming: multiple messages can be sent back to back over a continuous connection that stays open, which is quite ideal if you decide you must use TCP.
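Since TCP delivers a byte stream rather than whole messages, the receiver has to buffer bytes until a complete message has arrived. A rough sketch of that, again assuming a simplified 1-byte tag and 2-byte big-endian length rather than real BER-TLV (the class name `TlvStreamParser` is just illustrative):

```python
import struct

# Assumed simplified header: 1-byte tag + 2-byte big-endian length.
HEADER = struct.Struct(">BH")


class TlvStreamParser:
    """Collects received bytes and yields complete (tag, value) messages."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk: bytes):
        """Add received bytes; return the list of messages completed so far."""
        self._buf += chunk
        messages = []
        while len(self._buf) >= HEADER.size:
            tag, length = HEADER.unpack_from(self._buf, 0)
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # value not fully received yet; wait for more bytes
            messages.append((tag, bytes(self._buf[HEADER.size:end])))
            del self._buf[:end]  # drop the consumed message
        return messages


# Two messages sent back to back, arriving split at an arbitrary point.
stream = HEADER.pack(1, 2) + b"ab" + HEADER.pack(2, 3) + b"xyz"
parser = TlvStreamParser()
first = parser.feed(stream[:4])   # header plus one byte of the first value
rest = parser.feed(stream[4:])    # remainder of the stream
```

In a real program, `feed` would be called with whatever `socket.recv` returns.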

It would in theory work well over HTTP too using a WebSocket, but I've never actually tried that, so I couldn't comment on how well.

As for libraries supporting it, last time I looked "Craig's Utility Library"

https://github.com/JaCraig/Craig-s-Utility-Library

had very good support for TLV-based structures, as do many of the smart card libraries that are floating around (TLV is used on a lot of cards too).

If TLV is not your thing, then I'd definitely back up what others have said and take a good close look at "Protobuf".

Update

I don't know what language you're working in, or even if what I'm about to post will be of any use to you :-) , but here goes anyway.

This is a TLV decoder (note it has no encoding capabilities, but it should be easy to reverse it) that I wrote back in 2008(ish) in C#. It was designed for decoding BER-TLV packets coming off a smart card in a payment terminal, but it might serve as a starting point for you to hack into a more useful shape.

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;

    namespace Card_Analyzer
    {
      public class tlv
      {
        public int tag = 0;
        public int length = 0;
        public byte tagClass = 0;
        public byte constructed = 0;
        public List<byte> data = new List<byte>();
      }

      public class tlvparser
      {
        // List of found TLV structures
        public List<tlv> tlvList = new List<tlv>();

        // Constructor
        public tlvparser(byte[] data)
        {
          if (data != null)
          {
            this.doParse(data);
          }
        }

        // Main parsing function
        public void doParse(byte[] data)
        {
          int fulltag = 0;
          int tlvlen = 0;
          int dptr = 0;

          while (dptr < data.Length)
          {
            byte temp = data[dptr];
            int iclass = temp & 0xC0; // Top two bits: tag class
            int dobj = temp & 0x20;   // Bit 6: constructed flag
            int tag = temp & 0x1F;    // Low five bits: tag number

            if (tag >= 31) // Using extracted vars, decide if tag is a 2 byte tag
            {
              fulltag = (temp << 8) + data[dptr + 1];
              tlvlen = data[dptr + 2];
              dptr += 3;
            }
            else
            {
              fulltag = temp;
              tlvlen = data[dptr + 1];
              dptr += 2;
            } // End if tag 16 bit

            if ((tlvlen & 0x80) != 0)
            {
              // Long form: the low 7 bits give the number of length bytes
              // that follow (the indefinite form, 0x80, is not supported)
              int numLenBytes = tlvlen & 0x7F;
              tlvlen = 0;
              for (int n = 0; n < numLenBytes && dptr < data.Length; n++)
              {
                tlvlen = (tlvlen << 8) + data[dptr++];
              }
            }

            tlv myTlv = new tlv();
            myTlv.tag = fulltag;
            myTlv.length = tlvlen;
            myTlv.tagClass = Convert.ToByte(iclass >> 6);
            myTlv.constructed = Convert.ToByte(dobj >> 5);

            // Copy the value bytes, stopping early if the input is truncated
            for (int i = 0; i < tlvlen; i++)
            {
              if (dptr < data.Length)
                myTlv.data.Add(data[dptr++]);
            }

            // Constructed objects contain nested TLVs, so parse them too
            if (myTlv.constructed == 1)
              this.doParse(myTlv.data.ToArray());

            tlvList.Add(myTlv);
          } // End main while loop
        } // End doParse
      } // End class tlvparser
    } // End namespace

If it's of no use, then feel free to just ignore it.

TLV allows for very compact data encoding. One problem I see with TLV is that it requires knowing the length in advance. It is not possible to write the response to the socket while it is being generated; you need to collect it and then generate the response. That might prevent some optimizations, and as I understand it performance is critical. PS: or you have to send small independent blocks. - Florian F, Commented Feb 17, 2021 at 10:00
Indeed, if you don't know the length in advance, then it may not be an appropriate transmission protocol to use. However, you need to remember two things: 1) TLV can be streamed, so you can easily make a guesstimate and fix your packet size at a value that is larger than you need; 2) TLV is realistically only recommended for small payloads anyway, as it was originally invented for use in financial services environments to send payment data around a bank's merchant system. The golden rule always applies: "correct tool for the job", which is something you only find after looking at all the suggestions and weighing up the pros and cons. - shawty, Commented Feb 17, 2021 at 12:32

heltonbiker · Accepted Answer · 2015-05-12 17:12:10Z

There is a file format called EDF, intended to serialize a stream of multi-channel, multi-sample-rate data. Using a file structure as an example is good because it is a stream, just like a network stream or any other byte stream between the probe and the oscilloscope.

The idea of the EDF format is to multiplex the various data channels into data "blocks", where each block contains a given amount of time from each channel, and blocks are saved serially into the file.

The file itself contains a file header with info about the file and the channels, then N channel headers containing info about each channel, and then you have only raw data. So, if hypothetically you lose header data, you could not (at least easily) recompose the signals.

Finally, the numeric part (the raw data) of this file format is simple because it is rigid: every value is of type Int16 (short integer, two bytes, little-endian).
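As a rough illustration of the block idea (this is only EDF-like, not a reader or writer for real EDF files), here is a Python sketch that multiplexes two channels with different sample rates into one block of little-endian Int16 values. The per-block sample counts in `SAMPLES_PER_BLOCK` are made-up numbers that, in EDF, would come from the channel headers.

```python
import struct

# Hypothetical per-block sample counts: channel 0 contributes 4 samples per
# block and channel 1 contributes 2 (i.e. it runs at half the sample rate).
SAMPLES_PER_BLOCK = [4, 2]


def pack_block(channels):
    """Concatenate each channel's samples for one time slice as little-endian int16."""
    parts = []
    for count, samples in zip(SAMPLES_PER_BLOCK, channels):
        if len(samples) != count:
            raise ValueError("wrong number of samples for channel")
        parts.append(struct.pack(f"<{count}h", *samples))
    return b"".join(parts)


def unpack_block(block):
    """Split one block back into per-channel sample lists."""
    channels = []
    offset = 0
    for count in SAMPLES_PER_BLOCK:
        channels.append(list(struct.unpack_from(f"<{count}h", block, offset)))
        offset += 2 * count  # each int16 sample is two bytes
    return channels


block = pack_block([[10, -20, 30, -40], [1000, -1000]])
decoded = unpack_block(block)
```

Because both sides know the sample counts, the block carries no per-sample metadata at all: six samples take exactly twelve bytes.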

Now let's see how this file format could be useful in the context of network transfer:

Network transfer assumes you have a session, this session is configured (that is, you have a contract between sender and receiver), and the session configuration does not change unexpectedly. This would mean you can get info from your probe (or conversely, set it up), and this config info should be enough for the receiver to know the structure of each data-frame. So, you would have two phases: one to configure the data-transfer (sending the "header" data between probe and oscilloscope), and the data-transfer itself (serially sending data-frames with a given format).
Now for the data-frame, it is (as usual) a byte stream/array with a given length, and it should encode the information somehow. Usually, you use something similar to Type-Length-Value, where "Type" is a field of a predefined length describing "what" is coming, "Length" is another field describing how many bytes the following Value field will have, and finally "Value" is the data itself.
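A minimal Python sketch of those two phases, under my own assumptions: the session config is sent once as JSON (the text overhead is only paid at setup), and both sides derive the same fixed binary frame layout from it. The config keys (`channels`, `type`, `samples_per_frame`) and the helper `frame_struct` are hypothetical names, not part of any standard.

```python
import json
import struct

# Hypothetical session config, sent once before streaming starts.
# "type" uses struct format characters: "f" = float32, "h" = int16.
config = {
    "channels": [
        {"name": "ch0", "type": "f", "samples_per_frame": 2},
        {"name": "ch1", "type": "h", "samples_per_frame": 1},
    ]
}


def frame_struct(cfg):
    """Build the fixed little-endian frame layout described by the config."""
    fmt = "<" + "".join(
        f"{ch['samples_per_frame']}{ch['type']}" for ch in cfg["channels"]
    )
    return struct.Struct(fmt)


# Phase 1: handshake. The probe sends the config; the receiver parses it.
handshake = json.dumps(config).encode("utf-8")
receiver_cfg = json.loads(handshake.decode("utf-8"))

# Phase 2: streaming. Frames carry only raw values in the agreed layout.
sender = frame_struct(config)
receiver = frame_struct(receiver_cfg)
frame = sender.pack(0.25, 0.75, -7)
values = receiver.unpack(frame)
```

If the channel set changes mid-session, the simplest option with this scheme is to send a new config and rebuild the frame layout on both sides.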

From your question, configurable variables seem to be:

Number of channels;
Sampling rate for a given channel;
Numeric type: int, float, double;

I didn't quite understand what you mean by "binary" format, but I believe you mean it as opposed to text (ASCII, UTF, etc.). Mind that every stream-based transfer is composed of sequential bytes, and there are standard ways to represent numeric values as an array of bytes (float has 4 bytes, double has 8 bytes, UInt16 (aka unsigned short) has 2 bytes, etc.). Also, every language has its own library to convert between typed numbers in memory and bytes in a stream.

Hope this helps, and leave a comment if you feel like it, because I am currently working with exactly this kind of custom binary file format for multi-channel data acquisition in my current project, and we surely could exchange some thoughts about it.
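In Python, for instance, the standard library's `struct` module does this conversion. A small sketch (the sample value and the little-endian byte order are arbitrary choices of mine):

```python
import struct

value = 3.14159

# "<" selects little-endian with no padding; f = float32, d = float64, H = uint16.
as_float32 = struct.pack("<f", value)
as_float64 = struct.pack("<d", value)
as_uint16 = struct.pack("<H", 54321)

# Converting back: unpack always returns a tuple, even for a single value.
from_float32 = struct.unpack("<f", as_float32)[0]
from_float64 = struct.unpack("<d", as_float64)[0]
from_uint16 = struct.unpack("<H", as_uint16)[0]
```

Note that a Python float is a double, so the float64 round trip is exact while the float32 one loses some precision.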

The header is way too big for streaming data in an oscilloscope. EDF looks like the SEG data formats used in geophysics. Have you heard of the TLV stream data format that @shawty talks about? It looks simple and easy to implement. - cauchi, Commented May 12, 2015 at 18:51
The idea would be not to stream the header, but only to use its content for "handshaking" between oscilloscope and probe before streaming starts. Then, when you stream, you stream only the raw part, composed of a sequence of frames, each frame containing the appropriate amount of samples from each probe. - heltonbiker, Commented May 12, 2015 at 19:31
What if the number of channels changes? In my case it could happen. It would be the equivalent of connecting another signal to the probe. Do you redo the handshake? - cauchi, Commented May 12, 2015 at 20:14
EDF is pretty miserable, because of its limits on record sizes and other things. Reasonable for 1992, not so much now. However, it's a reasonable example to look at as a starting point. - whatsisname, Commented May 12, 2015 at 20:30
EDF is just mentioned as a starting point; of course its useful ideas can be applied without its rather artificial limitations. @jbcolmenares, anything that changes between consecutive frames should be encoded in each frame, I think, so I would suggest you create a field in your frame to describe the sensor configuration for that frame. Alternatively, you could have something like segments: you have a handshake, then write the start of a segment (with some parameters), then a datastream, then an end-of-segment marker, then another start of segment, and so on. - heltonbiker, Commented May 13, 2015 at 3:36

user53141 · Accepted Answer · 2015-05-12 16:33:21Z

Base 64 encoding is used for sending binary data over a text channel. It is exactly the opposite of what you want. It will merely require encoding/decoding code and make your messages larger.
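To put a number on "larger", here is a quick Python check. The 1000 float32 samples are arbitrary test data of mine; the ratio follows from base 64 emitting 4 output characters for every 3 input bytes.

```python
import base64
import struct

# One second of a single 1 kHz channel as float32: 1000 samples, 4 bytes each.
samples = [i * 0.001 for i in range(1000)]
raw = struct.pack("<1000f", *samples)

# Base 64 turns every 3 bytes into 4 ASCII characters (plus padding).
encoded = base64.b64encode(raw)

overhead = len(encoded) / len(raw)  # roughly 1.33
```

So the same data costs about a third more bytes on the wire, before counting the JSON around it.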

It is hard to make a recommendation without knowing exactly what data you need to send. You might want to look at Protocol Buffers as it is designed to be compact.

capnproto looks amazing. Would you expand the idea in an answer? From what I understand it is a format, but it also has an RPC server/client. From what I've read/understood/googled, I'm choosing between the TLV format with TCP as the protocol, or capnproto with its own RPC server. - cauchi, Commented May 14, 2015 at 8:51
