I'm thinking about the design of an IoT project where the devices don't have a standard application layer, but rather a thin custom application layer built directly on TCP sockets. What I mean by this is that each device wakes up every 15 minutes or so, opens a socket to the server, sends its data in, waits for any incoming data, and then goes back to sleep / powers off.

On the software side, Node.js is being used with net.createServer to create a socket server and handle incoming data on a single-threaded event loop. When a connection comes in, the bytes are read and translated into numeric values, which are then published to an MQTT topic for downstream processing.
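For reference, the ingest path looks roughly like the sketch below (using the mqtt npm package; the topic name and the byte layout are just illustrative, not my actual protocol):

```js
const net = require('net');
const mqtt = require('mqtt');

const mqttClient = mqtt.connect('mqtt://localhost:1883');

const server = net.createServer((socket) => {
  socket.on('data', (payload) => {
    // Illustrative decode, assuming the whole report arrives in one small chunk:
    // a 4-byte device ID followed by a 16-bit reading.
    const deviceId = payload.readUInt32BE(0);
    const reading = payload.readUInt16BE(4);
    mqttClient.publish(`devices/${deviceId}/telemetry`, JSON.stringify({ reading }));
  });

  socket.on('error', (err) => console.error('socket error', err));
});

server.listen(9000);
```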

What I'm mulling over is the other half of this data exchange, in which bytes are written to the device upon a connection, i.e. sending a command to the device. Since the device only talks over these short-lived TCP connections, the software has to queue any asynchronous outbound data and hold it while waiting for the device to connect. Previously (in a C++ implementation) I would hold this data in an array and write it to the device when it made a connection. That proved to be a poor design, since the software crashes from time to time and loses any outgoing data queued for the device.

Designing something new, is it good practice to handle this in a database? Meaning the asynchronous data is written to the database, and at connection time the software simply reads the database for any pending data. If there are thousands of devices reporting in, that's a lot of reads against the database, plus writes back for confirmation purposes. Alternatively, I was thinking of using MQTT retained messages: when a device reports in, the software subscribes to that device's topic and reads any outgoing data (roughly as in the sketch below). However, I'm not sure how well this scales, and the speed at which the software can subscribe to a topic on every connection seems like it would be a bottleneck. Any ideas on which one to go with, or alternatives to these two designs?
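To make the MQTT option concrete, this is roughly what I have in mind (topic names and the clear-on-delivery step are just illustrative, not a settled design):

```js
const mqtt = require('mqtt');
const client = mqtt.connect('mqtt://localhost:1883');

// Upstream: queue a command for a device. Publishing with retain: true means the
// broker keeps the last message on the topic even if the socket server restarts.
function queueCommand(deviceId, commandBytes) {
  client.publish(`devices/${deviceId}/outbound`, commandBytes, { retain: true, qos: 1 });
}

// In the socket server: when a device connects, subscribe to its outbound topic,
// forward whatever retained command is there, then clear it with an empty retained
// publish so it isn't delivered twice.
function deliverPending(deviceId, socket) {
  const topic = `devices/${deviceId}/outbound`;
  const onMessage = (msgTopic, payload) => {
    if (msgTopic !== topic || payload.length === 0) return;
    socket.write(payload);
    client.publish(topic, '', { retain: true }); // empty retained payload clears the topic
    client.unsubscribe(topic);
    client.removeListener('message', onMessage);
  };
  client.on('message', onMessage);
  client.subscribe(topic);
}
```

The concern is the subscribe/unsubscribe round trip on every device connection, which is exactly the potential bottleneck mentioned above.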

    1 Answer

    This really depends on how many devices you're trying to support, how often a transaction occurs with a given device, and how much data is involved per device. Based on that, reasonable solutions range from:

    1. Using a file per device to persist the data waiting to be sent to the device
    2. Using a clustered regular database holding the data for each device
    3. Using high-transaction, high-scale queuing software

    It also matters whether or not you will need to cluster your Node.js server to handle the load.

    If you don't yet know whether you'll need to go really high scale, or how much load that scale translates into, then I'd be inclined to build an abstract class for the store (a class that could have any of the storage mechanisms mentioned above behind it) and initially put a "file per device" implementation behind that abstract store API. That "file per device" implementation can probably be built pretty quickly; you just need a persistent ID for each device that you can use in a filename.
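    As a rough illustration (not tested code, and the file layout and method names are just one way to shape it), the store could look something like this:

```js
const fs = require('fs').promises;
const path = require('path');

// Abstract store API: callers only ever see enqueue() and takeAll(), so the
// file-backed implementation can later be swapped for a database or queue.
class OutboundStore {
  async enqueue(deviceId, data) { throw new Error('not implemented'); }
  async takeAll(deviceId) { throw new Error('not implemented'); }
}

class FilePerDeviceStore extends OutboundStore {
  constructor(dir) {
    super();
    this.dir = dir;
  }

  fileFor(deviceId) {
    return path.join(this.dir, `${deviceId}.bin`);
  }

  // Append pending bytes to the device's file so they survive a server crash.
  async enqueue(deviceId, data) {
    await fs.appendFile(this.fileFor(deviceId), data);
  }

  // Read and remove whatever is waiting; called when the device connects.
  // Note: the read/unlink pair is not atomic, so concurrent writers would need
  // locking or a single write-through API in front of this.
  async takeAll(deviceId) {
    const file = this.fileFor(deviceId);
    try {
      const data = await fs.readFile(file);
      await fs.unlink(file);
      return data;
    } catch (err) {
      if (err.code === 'ENOENT') return null; // nothing queued for this device
      throw err;
    }
  }
}
```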

    Then see how well that works, how it scales, and how it performs. You only move to one of the more expensive solutions (to acquire, code, deploy and maintain) when you know you really need it. This builds the opportunity to scale further into your architecture without over-investing in it now.

    • Thanks for the response; conservatively I'm estimating 10,000 devices. As for the amount of data getting written, it's pretty small, on the order of 60 bytes. At most, transmission to the device would happen once per 15 minutes (when the device wakes up and sends data to the socket); realistically, though, it's maybe once or twice per day. What kind of "high transaction, high scale queuing software" are you referring to? Can you point to an existing service?
      – sam
      Commented May 18, 2021 at 18:42
    • @sam - Twice a day for 10,000 devices means 20,000 transactions/day. Spread over 12 hours, that's about 28 transactions per minute, or roughly one transaction every 2 seconds. I would think you can handle that with a single server and the file-per-device implementation, with an opportunity to monitor performance as you grow and decide when to scale up. Even the file-based system can be scaled up by clustering your Node.js server and moving to shared file storage like a NAS.
      – jfriend00
      Commented May 18, 2021 at 18:49
    • Personally I'd be wary of any sort of straight file storage where the data can come in from multiple sources. A very basic database (SQL or not) seems like a better choice.
      – jaskij
      Commented May 18, 2021 at 23:06
    • @JanDorniak - That is indeed something to factor in. But it's also certainly solvable with a file-system implementation, given the simple needs of this system (for example, one could force incoming data to go through an API that uses transient file locking). It really depends on whether the developer already has a database ready for the job that they know would be as scalable as the file-system approach at the expected transaction level. Either can certainly work.
      – jfriend00
      Commented May 18, 2021 at 23:33
    • @jfriend00 file locking can run into time-of-check vs. time-of-use issues and isn't as trivial as it might seem to an inexperienced developer. It's the developer's choice, but personally I'd gladly pay the overhead of using, say, Postgres, just to know all those pesky details are taken care of.
      – jaskij
      Commented May 18, 2021 at 23:43
