Cloud First IoT with Syft




Introduction

Syft Technologies is a leading scientific equipment manufacturer specializing in chemical analysis. For over a decade, Syft has developed a line of cutting-edge spectrometers for this purpose.


Challenge

Syft’s customers and internal teams run the spectrometers for days at a time, often chaining together multiple chemical analysis runs. Before working with Calculated Systems, these teams manually collected results from each individual instrument and processed them by hand. This process produced inconsistencies, and work could be delayed by hours unless an operator was immediately available. That was acceptable with a limited number of instruments but did not scale as the number of spectrometers grew. Both Syft and their customers needed a scalable, reliable way of processing test results.

An additional challenge was the future need for the solution to be isolated from the public-facing internet. Several of Syft’s key customers maintain an air gap between the spectrometers and any external network. This requirement disqualified many cutting-edge cloud solutions and required an innovative set of tools.





Solution

Syft had six months to develop a highly scalable, production-grade system that would enable them to collect and interpret test results. With the aid of Calculated Systems, the architecture was broken down into a sequence of core stages.


  • Device + Capture - Data generated onboard the scientific instruments needed to be collected and transmitted. This could be achieved through either an agent architecture or a polling system. Running Apache NiFi agents on each instrument let us set up secure agents that transmit either to the cloud or to a dedicated on-premise box while keeping the approach cloud agnostic. This also makes for a highly secure architecture, because the machines only open outbound connections and never accept hazardous inbound requests.

  • Data Gateway + Initial Landing - The decentralized incoming data needed to be transmitted and collected into a central repository that feeds all later stages, such as exploration and visualization. This stage doubled as a decoupling buffer between collection and processing in case the rate of collection outpaced the ability to interpret and parse the data. In Syft’s case, Apache NiFi’s internal buffering capabilities were sufficient, although some companies may wish to explore using Kafka.

  • Processing Data - Raw log data coming in needed to be transformed and, in some cases, enriched with metadata, such as foreign database keys or other test information.

  • Storage - This segment refers both to raw data storage and to normalized database storage. It is best practice to store raw data exactly as it is collected alongside a usable, referenceable format.

  • Exploration - A suite of tools was needed to explore the data and discover new relationships within it. This work was done by subject matter experts and data engineers, and the tools involved typically require programming.

  • Visualize - This capability provides repeatable access to answers for questions already defined with the exploration tools. This stage covers tools that produce graphics and are at most spreadsheet-level in difficulty and complexity.

  • Act - Once data was parsed and interpreted, automated actions could be taken. These could range from something as simple as an email alerting users to anomalous conditions to the adjustment of machine parameters.
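
The middle stages above (buffering, processing, and acting) can be sketched in a few lines of Python. This is only an illustration of the pattern, not Syft's implementation, which is built on Apache NiFi flows. The record fields, the `RUN_METADATA` lookup table, and the `ALERT_THRESHOLD_PPB` value are all hypothetical, since the instrument log format is not described here.

```python
import json
import queue
from dataclasses import dataclass, asdict
from typing import Dict, List, Tuple


@dataclass
class Reading:
    """Hypothetical shape of one parsed instrument result."""
    instrument_id: str
    run_id: str
    compound: str
    concentration_ppb: float


# Hypothetical metadata that would normally live in a database.
RUN_METADATA: Dict[str, Dict[str, str]] = {
    "run-001": {"operator": "alice", "sample_type": "ambient air"},
}

ALERT_THRESHOLD_PPB = 50.0  # assumed value, for illustration only


def parse(raw_line: str) -> Reading:
    """Processing: turn one raw JSON log line into a typed record."""
    fields = json.loads(raw_line)
    return Reading(
        instrument_id=fields["instrument_id"],
        run_id=fields["run_id"],
        compound=fields["compound"],
        concentration_ppb=float(fields["concentration_ppb"]),
    )


def enrich(reading: Reading) -> dict:
    """Processing: attach run metadata when it is known."""
    record = asdict(reading)
    record.update(RUN_METADATA.get(reading.run_id, {}))
    return record


def act(record: dict) -> List[str]:
    """Act: produce an alert message for anomalous readings."""
    if record["concentration_ppb"] > ALERT_THRESHOLD_PPB:
        return [
            f"ALERT {record['instrument_id']}: {record['compound']} "
            f"at {record['concentration_ppb']} ppb"
        ]
    return []


def drain(buffer: "queue.Queue[str]") -> Tuple[List[dict], List[str]]:
    """Consume everything currently buffered, at the processor's own pace."""
    records: List[dict] = []
    alerts: List[str] = []
    while True:
        try:
            raw_line = buffer.get_nowait()
        except queue.Empty:
            break
        record = enrich(parse(raw_line))
        records.append(record)
        alerts.extend(act(record))
    return records, alerts


# The bounded queue plays the role of the decoupling buffer:
# collection can run ahead of processing until the queue fills.
buffer: "queue.Queue[str]" = queue.Queue(maxsize=1000)
buffer.put(json.dumps({"instrument_id": "spec-01", "run_id": "run-001",
                       "compound": "benzene", "concentration_ppb": 72.5}))
buffer.put(json.dumps({"instrument_id": "spec-02", "run_id": "run-002",
                       "compound": "toluene", "concentration_ppb": 3.1}))

records, alerts = drain(buffer)
print(records)
print(alerts)
```

The queue stands in for the buffering that NiFi provided internally in Syft's case. A pipeline with heavier or burstier load might place Kafka at the same point.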

Results

Syft is now able to collect, analyze, and interpret scientific results within seconds of a test completing, instead of waiting for hours or even days. This enables operators to track instrument health as well as quickly access the results of an individual experiment.


The onboard Apache NiFi agents securely transmit to a cloud Apache NiFi instance every time a new result is available. This data is automatically transferred to and stored in a PostgreSQL database and made available for visualization through both JDBC and Metabase. Calculated Systems was able to accelerate Syft’s development so that a prototype flow was functional within 90 days. Over the following months the system was hardened to include encryption, high availability, and error handling.
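
As a rough sketch of the storage pattern described above, the following Python keeps each raw payload exactly as received and writes a normalized row next to it, then runs the kind of summary query a visualization tool might issue. The table names, column names, and sample values are hypothetical, and Python's built-in `sqlite3` module stands in for PostgreSQL so the sketch runs without a database server.

```python
import json
import sqlite3

# In-memory SQLite stands in for the PostgreSQL database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE raw_results (
        id          INTEGER PRIMARY KEY,
        received_at TEXT NOT NULL,
        payload     TEXT NOT NULL      -- stored exactly as collected
    );
    CREATE TABLE results (
        id                INTEGER PRIMARY KEY,
        raw_id            INTEGER NOT NULL REFERENCES raw_results(id),
        instrument_id     TEXT NOT NULL,
        compound          TEXT NOT NULL,
        concentration_ppb REAL NOT NULL
    );
    """
)


def store(db: sqlite3.Connection, received_at: str, payload: str) -> int:
    """Store the untouched payload, then a normalized row that points to it."""
    cur = db.execute(
        "INSERT INTO raw_results (received_at, payload) VALUES (?, ?)",
        (received_at, payload),
    )
    raw_id = cur.lastrowid
    fields = json.loads(payload)
    db.execute(
        "INSERT INTO results (raw_id, instrument_id, compound, concentration_ppb) "
        "VALUES (?, ?, ?, ?)",
        (raw_id, fields["instrument_id"], fields["compound"],
         float(fields["concentration_ppb"])),
    )
    db.commit()
    return raw_id


# Hypothetical sample payloads.
payloads = [
    '{"instrument_id": "spec-01", "compound": "benzene", "concentration_ppb": 10.0}',
    '{"instrument_id": "spec-01", "compound": "benzene", "concentration_ppb": 20.0}',
    '{"instrument_id": "spec-02", "compound": "toluene", "concentration_ppb": 72.5}',
]
raw_ids = [store(conn, "2020-01-01T00:00:00Z", p) for p in payloads]

# A dashboard-style query: result count and mean concentration per instrument.
summary = conn.execute(
    "SELECT instrument_id, COUNT(*), AVG(concentration_ppb) "
    "FROM results GROUP BY instrument_id ORDER BY instrument_id"
).fetchall()
print(summary)
```

Keeping the raw payload in its own table means the normalized rows can be rebuilt later if the parsing logic changes.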


Conclusion

Calculated Systems helped Syft build a real-time data pipeline to the cloud for reliable processing of test results from customers’ spectrometers. A working prototype was functional within 90 days, and the hardened production pipeline eliminated the inconsistencies and delays of manually processing chemical analysis runs. Now operating at production scale, Syft can access results in seconds instead of hours or days.


About Calculated Systems

Calculated Systems accelerates time to market for new innovations. With cloud automation tools, deep industry expertise, and experience productionalizing workloads, development cycles are cut down to a fraction of their normal time.

The ability to quickly develop IoT systems decreases the risk companies face in long development cycles. Learn how you can get started with our 90 days to IoT program here: https://www.calculatedsystems.com/90-days-to-iot