Architect for change because you can’t keep up with the IoT by hand-coding

Focus on data cleansing, prep, and quality when facing the higher volume, wider variety, and faster velocity of data generated by the Internet of Things.

“In an IoT future, the adage of ‘garbage in, garbage out’ will be magnified.”

—Alan Lundberg, principal marketing manager, Emerging Products, at Informatica

Three Informatica executives sat down to discuss the challenges of collecting and deciphering machine data from the Internet of Things (IoT): Amrish Thakkar, senior product manager; Alan Lundberg, principal marketing manager, Emerging Products; and David Lyle, R&D vice president in the Office of the CTO.

What are the biggest challenges for developers who need to build the infrastructures and applications that collect data from the IoT?

Thakkar: The IoT is a very distributed system. Unlike transactional or traditional data sources, the IoT generates data continuously, whether you do something or not. Every few seconds, or faster, there will be a new data event. And there is an enormous variety of data. You need to make sense of all of it to get a holistic picture.

What new technologies must developers learn in order to collect data from the IoT?

Lundberg: New technologies like Hadoop, MapReduce, Kafka, and Flume distill information out of massive data sets. In an IoT future, the adage of “garbage in, garbage out” will be magnified. It will be even more imperative that developers understand data cleansing, data prep, and data quality techniques.

Lyle: There are also all sorts of techniques from the last 10 years that are still useful in the IoT world. When devices are pumping out data continuously, one technique from the past that we can use is to filter out identical messages, which reduces the amount of data flowing into your systems.
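As a rough illustration of filtering out identical messages, the sketch below drops a reading when it repeats the previous payload from the same device. The message shape (a device ID plus a payload string), the device names, and the function name drop_repeats are assumptions made for this example; they are not taken from any Informatica product or from the interview.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical message shape: (device_id, payload).
Reading = Tuple[str, str]


def drop_repeats(readings: Iterable[Reading]) -> Iterator[Reading]:
    """Yield a reading only when its payload differs from the last
    payload seen for the same device."""
    last_payload: Dict[str, str] = {}
    for device_id, payload in readings:
        if last_payload.get(device_id) == payload:
            continue  # identical to this device's previous message
        last_payload[device_id] = payload
        yield device_id, payload


# Made-up sample stream for illustration only.
stream = [
    ("thermostat-1", "temp=21.5"),
    ("thermostat-1", "temp=21.5"),  # repeat, dropped
    ("thermostat-2", "temp=19.0"),
    ("thermostat-1", "temp=21.5"),  # still unchanged for thermostat-1, dropped
    ("thermostat-1", "temp=22.0"),
]
filtered = list(drop_repeats(stream))
```

Here five incoming messages shrink to three. A real deployment would also have to decide how long a device may stay silent before an unchanged value is forwarded again, which this sketch does not address.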

Flume, Kafka, and MapReduce are all code, and that code is not mapped or supported by any accompanying metadata. So, in effect, you will be creating a hand-coded hairball as part of your effort with the IoT. You can create something, and it might work, but let’s learn from a few decades of software development. How can you develop something quickly, architect it for change, and manage it in a scalable fashion?

It sounds like a new data collection architecture is necessary to harness big data in real time. What should developers look for in this architecture?

Thakkar: You need to look for a product that helps you collect data from any source and deliver it to any target. It should let you focus on the most valuable thing: analyzing the data.

Lundberg: You should also think about an architecture that allows you to collect data in a way that lets you respond more rapidly to the business. Business users are being driven, in turn, by their management to answer top- and bottom-line questions about customer churn, sentiment analysis, win/loss ratios, and contextual marketing data. With a more modern architecture, you can create applications that alert you to events with limited windows of opportunity to respond, such as those indicating customer abandonment or potential fraud.

You may not know what IoT data you want to measure in the future. You need to build in flexibility to avoid hand-coding everything from scratch with every new device or sensor. You need an architecture that helps take care of that complexity and makes your applications more adaptable and future-proof.
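One way to picture that flexibility is to describe each device format as data rather than as code, so that supporting a new device means adding a mapping entry instead of writing a new parser. The sketch below is a minimal illustration under that assumption; the device types, field names, and the normalize function are all hypothetical.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical mapping per device type:
# target field -> (source field, converter function).
FieldMap = Dict[str, Tuple[str, Callable[[Any], Any]]]

MAPPINGS: Dict[str, FieldMap] = {
    "thermostat_v1": {
        "device_id": ("id", str),
        "temperature_c": ("temp", float),
    },
    "thermostat_v2": {
        "device_id": ("serial", str),
        # Assumed for illustration: v2 reports Fahrenheit, so convert to Celsius.
        "temperature_c": ("temp_f", lambda f: round((float(f) - 32) * 5 / 9, 2)),
    },
}


def normalize(device_type: str, raw: Dict[str, Any]) -> Dict[str, Any]:
    """Translate a raw device record into the common schema by
    applying the mapping registered for its device type."""
    mapping = MAPPINGS[device_type]
    return {
        target: convert(raw[source])
        for target, (source, convert) in mapping.items()
    }


v1 = normalize("thermostat_v1", {"id": 17, "temp": "21.5"})
v2 = normalize("thermostat_v2", {"serial": "A-9", "temp_f": 68})
```

Both records come out in the same shape (device_id and temperature_c), and a third device type would need only a new entry in MAPPINGS rather than new transformation code.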

As IoT data streams nonstop into your enterprise, you need to be ready to collect and transform it in real time. Only then can it provide the insights that will add value to the business.

