With all the excitement around big data, it is easy to believe that Hadoop can solve all your data-related problems. Yet despite the hype, a Hadoop deployment that stands apart from your other systems is likely to become just another data silo. It’s important to consider Hadoop within the context of a holistic big data strategy.
Hadoop is a powerful technology, but it is just one component of the big data technology landscape. Hadoop is designed for specific data types and workloads. For example, it is a very cost-effective technology for staging large amounts of raw data, both structured and unstructured, which can then be refined and prepared for analytics. Hadoop can also help you avoid costly upgrades of existing proprietary databases and data warehouse appliances when their capacity is being consumed too quickly by raw, unused data and extract-load-transform (ELT) processing.
However, unless Hadoop is integrated with the rest of the data management infrastructure, it quickly becomes another island of data, adding complexity to your enterprise IT environment. The first aspect of this integration is the ability to bridge Hadoop with other data processing and analytics systems. For example, preprocessing of large volumes of raw data may occur in Hadoop, where it can be implemented cost-effectively. The resulting data may then flow to a different system outside of Hadoop that is better suited to the particular type of analytics the business requires.
The second aspect of integration, skills integration, is even more important, and more difficult. In most early Hadoop deployments, organizations resorted to time-consuming hand coding for data processing, despite high costs and downstream maintenance headaches. They did this because no Hadoop tools were available that drew on existing skill sets. Instead, Hadoop projects required specialist skills: writing MapReduce jobs, typically in Java, or scripting in Hive's query language (HiveQL) and Pig Latin.
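To give a sense of what hand coding in the MapReduce model involves, here is a minimal sketch of a word count written as separate map, shuffle, and reduce steps. It is plain Python that runs on a single machine, not Hadoop API code; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions. In a real Hadoop job the framework performs the shuffle and distributes the work, and the developer also has to handle job configuration, input formats, and deployment.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group values by key; a MapReduce framework does this between phases."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Chain the three steps for a small in-memory input."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop data"]))
```

Even this trivial task has to be split into key/value steps by hand, which is the kind of effort that made early Hadoop projects dependent on specialist programmers.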
Informatica optimizes data processing across all your systems and platforms, both Hadoop and non-Hadoop, with a no-code development environment built on the Informatica Vibe virtual data machine (VDM). Vibe makes it possible for data integration developers to graphically design data integration mappings once and then deploy those mappings anywhere, whether virtually or not, on traditional data movement platforms or on Hadoop. Vibe makes developers five times more productive, without ever having to learn how to program for Hadoop. With Vibe, every Informatica developer is now a Hadoop developer.
Vibe provides another key long-term benefit. The big data ecosystem is evolving extremely rapidly, with new distributions, new languages, and new technologies emerging almost weekly. There is no way to predict where the technology will be in a few months, much less a few years. With its “Map Once. Deploy Anywhere.” capabilities, Vibe insulates you from the constant changes in Hadoop and other big data technologies. Whenever you choose to deploy a new technology, Vibe lets you reuse your logic without recoding.
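The general idea behind designing logic once and running it on different engines can be sketched as follows. This is a conceptual illustration only, not Vibe's API or implementation: `Mapping`, `run_in_memory`, and `run_partitioned` are hypothetical names, and the "partitioned" engine merely simulates cluster-style execution on one machine. The point is that the transformation is described independently of how it is executed.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable

Record = dict  # one row of data, e.g. {"id": 1, "amount": 5}


@dataclass(frozen=True)
class Mapping:
    """Engine-independent description of a transformation (hypothetical)."""
    keep: Callable[[Record], bool]         # filter condition
    transform: Callable[[Record], Record]  # per-record transformation


def run_in_memory(mapping: Mapping, records: Iterable[Record]) -> list[Record]:
    """Hypothetical 'traditional' engine: a single pass over the input."""
    return [mapping.transform(r) for r in records if mapping.keep(r)]


def run_partitioned(mapping: Mapping, records: Iterable[Record],
                    partitions: int = 2) -> list[Record]:
    """Hypothetical 'cluster-style' engine: split the input into partitions,
    process each one independently, then concatenate the results."""
    rows = list(records)
    chunks = [rows[i::partitions] for i in range(partitions)]
    out: list[Record] = []
    for chunk in chunks:
        out.extend(run_in_memory(mapping, chunk))
    return out


# The mapping is defined once...
positive_in_cents = Mapping(
    keep=lambda r: r["amount"] > 0,
    transform=lambda r: {**r, "amount_cents": r["amount"] * 100},
)

if __name__ == "__main__":
    data = [{"id": 1, "amount": 5}, {"id": 2, "amount": 0}, {"id": 3, "amount": 7}]
    # ...and executed unchanged on either engine.
    print(run_in_memory(positive_in_cents, data))
    print(run_partitioned(positive_in_cents, data))
```

Because the mapping holds no engine-specific code, adding a further execution engine in this sketch would mean writing one more `run_*` function and leaving every existing mapping untouched.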
Informatica PowerCenter Big Data Edition, powered by Informatica Vibe, provides all the capabilities you need to successfully build and deploy data integration on Hadoop. Right now. And with Vibe, you can be confident that you will be prepared for whatever the future of big data may bring.