Previously, we told you how we arrived at our streaming charts UI for the User Portal using D3.js. In this post, we will dig into how SAMI uses WebSockets to handle data streaming from device to browser, and the different parts that make this work.
The amount of data that we can transfer from SAMI to the browser depends mostly on how many devices the user has connected to his/her account, and on how many of them are currently sending data to the platform. With this in mind, and with Simband as our first device, we knew we could get a very high volume of data that had to be transmitted in short periods of time. The periods were so short that making AJAX calls to poll the server wasn’t an option at all: an AJAX call would have taken longer than the time we had to process all the data received. Given the technology available today, the next possible solution was obvious: WebSockets.
What is a WebSocket? Chances are that if you are in the web industry, you already know. If not, here is a little extract from WebSocket.org:
As cutting-edge technology goes, the WebSocket API is still a proposed standard, but it has already been implemented in several modern web browsers. Once we knew that WebSockets was the way to go, we set up support for the following browsers:
- Chrome 39+
- Safari 7.1+ for OS X
- Firefox 34+ for desktop
We decided to use the WebSocket API without a wrapper library: we knew that the modern browsers we were targeting already implemented the API, and we didn’t need a fallback for older browsers. Our code for managing the socket connection to our backend is simple, so a native WebSocket, with our callback functions attached to each of the events the socket fires (open, message, error, and close), did the job.
We did implement a small object to wrap the socket, in case we later needed several sockets open at the same time or decided to adopt a wrapper library. We ended up using this object to detect when the socket closed, so we could open a new one and keep the data flowing. More on this below.
Once the communication channel to SAMI was finished and data started arriving through the socket, we knew the easy part was done. We had the data, but we had to ask: “What do we do with it?” The main goal was data visualization: giving the user a way to see the data that his/her devices are tracking and that SAMI is storing. Even now, the way we visualize this data keeps evolving.
We won’t go into how SAMI receives and stores the messages, or how it structures the data it receives. The focus of this article is how we deal with that data in the browser.
This is the basic flow for receiving the data:
The socket handler is the object we mentioned earlier. It wraps the plain WebSocket and monitors it, so that we can open a new one if the current one is closed for any reason.
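To make the reconnect idea concrete, here is a minimal TypeScript sketch of such a socket handler. It is not SAMI’s code: the `SocketHandler` class, the `SocketLike` interface, the one-second reconnect delay, and the injectable `schedule` function are all assumptions made for illustration. It relies only on standard WebSocket behavior: a closed socket cannot be reopened, so the handler builds a new one from a factory.

```typescript
// Hypothetical sketch: the members of the browser WebSocket this handler uses.
// Event parameters are typed loosely so a real `WebSocket` fits the interface.
interface SocketLike {
  onmessage: ((ev: any) => void) | null;
  onclose: ((ev: any) => void) | null;
  close(): void;
}

type Scheduler = (fn: () => void, delayMs: number) => void;

class SocketHandler {
  private socket: SocketLike | null = null;
  private stopped = true;
  private readonly createSocket: () => SocketLike;
  private readonly onData: (data: unknown) => void;
  private readonly reconnectDelayMs: number;
  private readonly schedule: Scheduler;

  constructor(
    createSocket: () => SocketLike, // in a browser: () => new WebSocket(url)
    onData: (data: unknown) => void,
    reconnectDelayMs = 1000, // assumed delay; tune for your backend
    schedule: Scheduler = (fn, ms) => { setTimeout(fn, ms); },
  ) {
    this.createSocket = createSocket;
    this.onData = onData;
    this.reconnectDelayMs = reconnectDelayMs;
    this.schedule = schedule;
  }

  start(): void {
    this.stopped = false;
    this.connect();
  }

  // Intentional close: no reconnection follows.
  stop(): void {
    this.stopped = true;
    this.socket?.close();
    this.socket = null;
  }

  private connect(): void {
    const socket = this.createSocket();
    this.socket = socket;
    // Forward every payload to whatever processes the stream next.
    socket.onmessage = (ev) => this.onData(ev.data);
    socket.onclose = () => {
      // A closed WebSocket cannot be reopened, so schedule a brand-new one,
      // unless we closed it ourselves or it has already been replaced.
      if (this.stopped || this.socket !== socket) return;
      this.schedule(() => {
        if (!this.stopped) this.connect();
      }, this.reconnectDelayMs);
    };
  }
}
```

In a browser, the factory would typically be `() => new WebSocket(url)`. Passing the scheduler in, instead of calling `setTimeout` directly, only exists so the reconnect logic can be exercised without a real server; a production version would probably also back off between repeated failures.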
The black box in this flow is where we process the data we receive. Its internals are diagrammed below:
I will talk about each of the parts in this diagram, and hopefully explain why we did it this way.
We explained the purpose of the socket handler above.
Once the socket is connected, we receive data from all of the user’s devices that are currently sending data. Let’s say that the user has 2 devices streaming data, DeviceA and DeviceB. We need to split the data by device, then process and store each device’s data separately.
The filter handler is in charge of making this split. As first designed, it split the data per device, however many devices the user had: with 10 devices sending data, we had to split the stream 10 ways. Then we realized that several of those devices could share the same device type, and that things could be simplified by splitting per device type instead of per device.
Why per device type? Because of the structure of the data that we receive.
There isn’t a fixed list of the devices SAMI can handle. In fact, that is one of its strengths: there is no limit. A device can be any client that sends data to the platform, following certain guidelines we call the Manifest. With that in mind, we can’t expect all data to arrive with the same structure. Some devices might follow a common pattern, but not all of them will. So our assumption was: “different device types => different data structures”.
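As an illustration of splitting per device type, the sketch below groups a batch of incoming messages by a device-type identifier. The `DeviceMessage` shape and its field names (`deviceTypeId`, `deviceId`, `ts`, `data`) are hypothetical; this post doesn’t describe SAMI’s actual message format.

```typescript
// Hypothetical message shape; field names are assumptions, not SAMI's format.
interface DeviceMessage {
  deviceTypeId: string; // which device type produced the message
  deviceId: string; // which individual device produced it
  ts: number; // timestamp in milliseconds
  data: Record<string, unknown>; // payload whose structure depends on the type
}

// Groups messages by device type, so each group can be handed to the
// formatter that understands that type's data structure.
function splitByDeviceType(messages: DeviceMessage[]): Map<string, DeviceMessage[]> {
  const groups = new Map<string, DeviceMessage[]>();
  for (const msg of messages) {
    const group = groups.get(msg.deviceTypeId);
    if (group) {
      group.push(msg);
    } else {
      groups.set(msg.deviceTypeId, [msg]);
    }
  }
  return groups;
}
```

Because the grouping key is the device type, two devices of the same type end up in the same group; separating them again by device happens in a later stage.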
To achieve our main goal, data visualization, we had to create charts, so our next step was to get (x, y) pairs of values from the data received from SAMI. That is the purpose of the data formatters: they dig into the structure of the data we receive, find the values we care about for charting, and extract them into one or more (x, y) tuples.
In the diagram above, you can see different data formatters. The general data formatter attempts to extract the data by following the Manifest provided by the creator of the device type. But sometimes (and this happened with Simband) the Manifest is too complex, or handles data in a way specific to that device type. For these cases, we created specific data formatters, which follow the guidelines provided by the device’s creators.
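A minimal sketch of the formatter idea, assuming a registry keyed by device type: a specific formatter is used when one is registered, and the general one otherwise. The general formatter here is a deliberate simplification (it emits one point per top-level numeric field instead of reading a Manifest), and the `burst-sensor` type with its `samples`/`intervalMs` payload is invented to show why a specific formatter can be needed.

```typescript
// Hypothetical message shape (field names are assumptions).
interface DeviceMessage {
  deviceTypeId: string;
  deviceId: string;
  ts: number; // timestamp in milliseconds
  data: Record<string, unknown>;
}

// One chartable (x, y) point, tagged with its device and source field.
interface Point {
  deviceId: string;
  field: string;
  x: number; // timestamp
  y: number; // value
}

type Formatter = (msg: DeviceMessage) => Point[];

// Simplified general formatter: one point per top-level numeric field.
const generalFormatter: Formatter = (msg) => {
  const points: Point[] = [];
  for (const [field, value] of Object.entries(msg.data)) {
    if (typeof value === "number") {
      points.push({ deviceId: msg.deviceId, field, x: msg.ts, y: value });
    }
  }
  return points;
};

// Specific formatter for an invented device type that sends a burst of evenly
// spaced samples in one message: { samples: number[], intervalMs: number }.
const burstFormatter: Formatter = (msg) => {
  const { samples, intervalMs } = msg.data;
  if (!Array.isArray(samples) || typeof intervalMs !== "number") return [];
  const points: Point[] = [];
  for (let i = 0; i < samples.length; i++) {
    const y = samples[i];
    if (typeof y === "number") {
      // Sample i was taken i * intervalMs after the message timestamp.
      points.push({ deviceId: msg.deviceId, field: "samples", x: msg.ts + i * intervalMs, y });
    }
  }
  return points;
};

// Device types that need special handling; all others use the general formatter.
const specificFormatters = new Map<string, Formatter>([["burst-sensor", burstFormatter]]);

function formatMessage(msg: DeviceMessage): Point[] {
  const formatter = specificFormatters.get(msg.deviceTypeId) ?? generalFormatter;
  return formatter(msg);
}
```

Keeping the choice of formatter in a single lookup means that supporting a new, unusually structured device type only requires registering one more function.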
Data Storage Handler
Once the data we care about has been pulled from the structure received via the socket, we end up with one or more (x, y) pairs of values. Next step: find out which values belong to which device. Device, not device type. For instance, you could have 2 different Simbands (don’t ask me why): both belong to the same device type, but each is a unique device sending its own data, and that data must not be mixed.
This is the task of the Data Storage Handler. It sends each pair of values to the Data Storage of the device it belongs to. It doesn’t check the values, nor does it sort them; it just routes the data to the right place.
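The routing step can be sketched as a map from device ID to that device’s storage, created the first time the device shows up. `DataStorageHandler`, `PointSink`, and the point shape are illustrative names, not SAMI’s API; as described above, the handler neither validates nor sorts anything.

```typescript
// Hypothetical point shape: an (x, y) pair tagged with its device.
interface Point {
  deviceId: string;
  x: number;
  y: number;
}

// Anything that can accept points; sorting is the storage's job, not ours.
interface PointSink {
  add(point: Point): void;
}

class DataStorageHandler {
  private readonly storages = new Map<string, PointSink>();
  private readonly createStorage: () => PointSink;

  constructor(createStorage: () => PointSink) {
    this.createStorage = createStorage;
  }

  // Routes each point to its device's storage, creating the storage on first
  // sight. Points are passed through unchanged and in arrival order.
  route(points: Point[]): void {
    for (const point of points) {
      let storage = this.storages.get(point.deviceId);
      if (!storage) {
        storage = this.createStorage();
        this.storages.set(point.deviceId, storage);
      }
      storage.add(point);
    }
  }

  storageFor(deviceId: string): PointSink | undefined {
    return this.storages.get(deviceId);
  }
}
```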
Once the data is formatted into (x, y) pairs, it has to be stored somewhere. That somewhere is a Data Storage, which has three main tasks: sort the data, store the values, and output them when requested.
Sort them? Why? Well, SAMI outputs the data through the socket, but it doesn’t order values within small time windows (roughly under one second). If we didn’t sort them, they would be stored in order of arrival instead of the order in which the devices recorded them. After sorting, the values are stored in an array. This array is the central store of information for each device: it contains everything streamed through the socket to the application since the user opened the web app. We pull the pairs of values for our charts from this array. You can see the (timestamp, value) tuples in the screenshot below, which shows the first 5 values requested each time we repaint the chart.
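One way to keep each device’s array sorted as values trickle in is binary-search insertion, sketched below in TypeScript. The `DataStorage` class and its `between` method (returning the points inside a time window, as a chart repaint might request) are assumptions for illustration; the post doesn’t say which sorting strategy SAMI’s Data Storage uses.

```typescript
// One (timestamp, value) tuple.
interface Point {
  x: number; // timestamp
  y: number; // value
}

class DataStorage {
  private readonly points: Point[] = [];

  // Binary search: index of the first point for which `isAfter` is true.
  // `isAfter` must be false for a prefix of the sorted array, true afterwards.
  private firstIndex(isAfter: (p: Point) => boolean): number {
    let lo = 0;
    let hi = this.points.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (isAfter(this.points[mid])) hi = mid;
      else lo = mid + 1;
    }
    return lo;
  }

  // Inserts in timestamp order; ties go after existing equal timestamps,
  // so values with the same timestamp keep their arrival order.
  add(point: Point): void {
    const i = this.firstIndex((p) => p.x > point.x);
    this.points.splice(i, 0, point);
  }

  // Returns the points with from <= x <= to, already sorted.
  between(from: number, to: number): Point[] {
    const start = this.firstIndex((p) => p.x >= from);
    const end = this.firstIndex((p) => p.x > to);
    return this.points.slice(start, end);
  }

  get size(): number {
    return this.points.length;
  }
}
```

Since values are only out of order within small windows, most insertions land at or near the end of the array. The binary search keeps the lookup at O(log n), but `splice` can still shift elements, so a very long history might call for a different structure.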
This is just an overview of how things work from the socket to the chart. Each part has its own history and continues to evolve. Most of our features started as a simple proof of concept. Once we found that a feature was possible, we built on it, improving it and replacing what we had with more robust code that fulfilled the new requirements.
Performance is our main concern, so we keep working to improve it. We redo things when we find out that we did something wrong, or that there is a better way to get it done.
In the future, we might write about the histories behind each of these parts, our headaches and our stumbles.
Top image: Petter Duvander