What is Apache Hadoop?

Unleash the Power of Hadoop with Informatica Big Data Integration Capabilities

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. Apache Hadoop is designed to scale up from single servers to thousands of machines, each offering local computation and storage. The Hadoop ecosystem includes subprojects such as the Hadoop Distributed File System (HDFS), Hadoop MapReduce, Hive, Pig, and HBase.
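The "simple programming model" mentioned above is MapReduce: you write a map function and a reduce function, and the framework handles distributing the work and grouping the intermediate results. The following minimal sketch simulates the map, shuffle, and reduce steps of a word count in a single Python process. It is not the Hadoop API (real jobs use Hadoop's Java MapReduce classes or Hadoop Streaming); the function names are hypothetical, and the `shuffle` function stands in for the grouping that Hadoop performs across nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator


# Map step: emit a (word, 1) pair for every word in one input record.
def map_words(line: str) -> Iterator[tuple[str, int]]:
    for word in line.lower().split():
        yield word, 1


# Shuffle step: group intermediate values by key. In Hadoop, the framework
# does this between the map and reduce tasks; here it is a local dict.
def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


# Reduce step: combine all values collected for a single key.
def reduce_counts(key: str, values: list[int]) -> tuple[str, int]:
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    intermediate = (pair for line in lines for pair in map_words(line))
    grouped = shuffle(intermediate)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data on Hadoop", "Hadoop stores big data"]
    print(word_count(sample))
```

Because each map call sees only one record and each reduce call sees only one key, both steps can run independently on different machines, which is what lets the same program scale from one server to many.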

Informatica provides powerful Big Data integration capabilities that empower your organization to leverage the following strengths of Hadoop:

  • Complex data analytics. Data that is complex and diverse in source and format, including videos, images, text, real-time feeds, device and sensor data, scientific data, and call detail records (CDRs), is a good candidate for Hadoop. For this reason, many organizations use Hadoop for sentiment analysis or fraud analysis that combines transaction data with text and other data.
  • Store large amounts of data. Hadoop stores data in its original form, without the conversion into a predefined format that is common in a traditional data warehouse, so no data is lost in translation. When you need to accommodate new data sources and formats but aren't ready to decide which ones to keep, Hadoop is often a good framework because it gives data analysts the flexibility to choose how and when to analyze the data.
  • Scaling through distributed processing. Although Hadoop is used for large-scale processing of petabytes of data, many organizations using Hadoop run data processing tasks at terabyte scale. As a Big Data processing platform, Hadoop can scale as your organization's demand changes. Its MapReduce framework abstracts the complexity of running distributed, shared-nothing data processing functions across multiple nodes in the cluster, making it easier to benefit from scaling.
  • Cost advantage. Hadoop is open source software that runs on commodity hardware, and you can add or retire nodes based on your project demand. This cost advantage is leading many organizations to add Hadoop clusters to their data processing portfolio rather than replace their existing investments. The ability to store and process data cost-effectively in Hadoop is helping organizations harness more data for projects that previously did not make business sense.
  • Power of Open Source community. Hadoop is supported by an active and growing global network of developers, and Hadoop's subprojects are backed by some of the world's largest Web-centric companies, such as Yahoo and Facebook. Organizations that choose Hadoop often consider this community, which shares best practices and contributes enhancements and fixes to the software and documentation, a key reason to adopt Hadoop.
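The point about storing data without first converting it into a fixed format (often called schema-on-read) can be illustrated with a minimal Python sketch. It assumes JSON records held in a plain Python list as a stand-in for files in HDFS; `RAW_RECORDS` and `read_with_schema` are hypothetical names, not part of Hadoop or Informatica. Each reader applies its own schema at read time, so the stored records never have to be reshaped to fit a single structure up front.

```python
import json
from typing import Any, Callable, Iterable, Iterator

# Hypothetical raw records, kept exactly as they arrived.
# The records do not share a fixed structure: fields may be missing or extra.
RAW_RECORDS = [
    '{"id": 1, "amount": "19.99", "channel": "web"}',
    '{"id": 2, "amount": "5.00"}',
    '{"id": 3, "amount": "42.50", "channel": "store", "note": "promo"}',
]


def read_with_schema(
    raw_lines: Iterable[str], schema: dict[str, Callable[[Any], Any]]
) -> Iterator[dict[str, Any]]:
    """Parse each raw record and apply the reader's schema at read time.

    `schema` maps a field name to a conversion function. Fields missing
    from a record are returned as None instead of causing a failure.
    """
    for line in raw_lines:
        record = json.loads(line)
        row: dict[str, Any] = {}
        for field, convert in schema.items():
            value = record.get(field)
            row[field] = convert(value) if value is not None else None
        yield row


if __name__ == "__main__":
    # One analyst reads only the amounts, converted to numbers.
    amounts = [r["amount"] for r in read_with_schema(RAW_RECORDS, {"amount": float})]
    print(sum(amounts))
    # Another analyst later reads a different projection of the same raw data.
    for row in read_with_schema(RAW_RECORDS, {"id": int, "channel": str}):
        print(row)
```

The design choice here is to tolerate missing fields rather than reject records, which mirrors the idea of deferring decisions about structure until the data is actually analyzed.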

Informatica gives you a single platform to access, integrate, and manage any type of data on Hadoop and interoperate with any processing platform. With Big Data integration capabilities on Hadoop, your IT organization can:

  • Deliver the full value of Big Data
  • Reduce the cost and risk of processing vast quantities of data
  • Uncover a complete view of your organization

One of Informatica's key capabilities for Hadoop is PowerExchange for Hadoop.

PowerExchange for Hadoop provides native, high-performance connectivity to the Hadoop Distributed File System (HDFS). It enables IT organizations to take advantage of Hadoop's processing power using their existing IT infrastructure and resources. With PowerExchange for Hadoop, IT organizations can bring any and all enterprise data into Hadoop for integrating and processing Big Data, and can easily move data into and out of Hadoop in batch or real time using universal connectivity to all data.

With PowerExchange for Hadoop, IT organizations can:

  • Use Hadoop to efficiently and cost-effectively integrate and process Big Data, delivering a more complete and trusted view of the business
  • Engage Hadoop to deliver Big Data projects faster with universal data access and a fully integrated development environment
  • Minimize the risk of Big Data processing with support for a variety of Hadoop distributions