Big data has the potential to transform business, improve lives, and change our world — but your company needs to be able to unleash its potential while minimizing the risk associated with new technologies. Informatica Big Data Edition provides a safe, efficient way to integrate all types of data on Hadoop at any scale without having to learn Hadoop.

Increase Productivity up to Five Times

Early adopters of Hadoop and other big data technologies had no choice but to hand-code using Java or scripting languages such as Pig or Hive. Informatica Big Data Edition saves your developers time and effort with a visual development environment, reusable business rules and mapplets, efficient collaboration tools, and flexible deployment models that let them work up to five times faster and more efficiently.

Integrate All Data at Scale

Transaction data stored in databases and flat file formats is only a fraction of the ever-growing volume of data that today's businesses need to integrate. Informatica provides hundreds of high-speed connectors and pre-built transformations so you can integrate data on Hadoop from cloud and mobile applications, social media, sensor devices, and more.

Readily Staff Big Data Projects

Hiring and retaining big data experts can be a time-consuming and costly challenge, but it doesn't have to be. Your data scientists and analysts likely spend only 20 percent of their time on data analysis and the rest on the tedious mechanics of data integration such as accessing, parsing, and managing data. With Informatica Big Data Edition, no specialized coding is required to scale performance on distributed computing platforms like Hadoop. With more than 100,000 trained Informatica developers worldwide, the resources and skills you need for big data projects are always within reach.

Adopting New Technologies Just Got Safer

The big data ecosystem is evolving rapidly, but you shouldn't have to worry about betting on the wrong technology or running into problems with a change or upgrade. Informatica Big Data Edition is powered by Informatica Vibe™ so you can "Map Once, Deploy Anywhere," knowing that even as technologies change you can run data integration jobs without having to rebuild data processing flows.

Informatica PowerCenter Big Data Edition is highly scalable, high-performance enterprise data integration software. Its visual development environment lets developers build ETL data flows that run natively on Hadoop — without having to learn Hadoop.

Up to Five Times More Productivity

Increase developer productivity on Hadoop up to five times over hand-coding through the visual Informatica development environment. Easily reuse data flows and collaborate with other developers and analysts in a common integrated development environment (IDE).

Universal Data Access

Your IT team can access all types of big transaction data, including RDBMS, OLTP, OLAP, ERP, CRM, mainframe, cloud, and others. You can also access all types of big interaction data, including social media data, log files, machine sensor data, Web sites, blogs, documents, emails, and other unstructured or multi-structured data.

High-Speed Data Ingestion and Extraction

You can access, load, transform, and extract big data between source and target systems or directly into Hadoop, HBase, or your data warehouse. High-performance connectivity through native APIs to source and target systems with parallel processing ensures high-speed data ingestion and extraction.

Real-Time Data Collection and Streaming

Collect massive amounts of machine, log file, and sensor data and stream it in real time over both local and wide area networks directly into Hadoop. A highly available, easy-to-use, many-to-many configurable topology ensures fast, reliable connections between all sources and targets.

Unlimited Scalability

Your IT organization can process all types of data at any scale—from terabytes to petabytes—with no specialized coding on distributed computing platforms such as Hadoop.

Optimized Performance for Lowest Cost

Based on data volumes, data type, latency requirements, and available hardware, Informatica Big Data Edition deploys big data processing on the highest performance and most cost-effective data processing platforms. You get the most out of your current investments and capacity whether you deploy data processing on SMP machines, traditional grid clusters, distributed computing platforms such as Hadoop, or data warehouse appliances.

Data Integration (ETL) on Hadoop

This edition provides an extensive library of prebuilt transformation capabilities on Hadoop, including data type conversions and string manipulations, high-performance cache-enabled lookups, filters, joiners, sorters, routers, aggregations, and many more. Your IT team can rapidly develop data flows on Hadoop using a codeless graphical development environment to increase productivity and promote reuse.
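As a rough illustration of what a filter, lookup, and aggregation data flow does, here is a minimal sketch in plain Python. It is not Informatica's API or generated code; the function name `run_flow`, the field names, and the sample rows are all hypothetical.

```python
from collections import defaultdict

def run_flow(orders, customers):
    """Filter -> lookup -> aggregate over in-memory rows (hypothetical data)."""
    # Lookup cache: load the reference table once, keyed by customer id.
    region_by_customer = {c["id"]: c["region"] for c in customers}
    totals = defaultdict(float)
    for order in orders:
        # Filter: drop rows with non-positive amounts.
        if order["amount"] <= 0:
            continue
        # Lookup: enrich the row with its region; unmatched rows go to "UNKNOWN".
        region = region_by_customer.get(order["customer_id"], "UNKNOWN")
        # Aggregate: sum amounts per region.
        totals[region] += order["amount"]
    return dict(totals)

orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 2, "amount": 5.0},
    {"customer_id": 1, "amount": -3.0},
    {"customer_id": 9, "amount": 7.5},
]
customers = [{"id": 1, "region": "EMEA"}, {"id": 2, "region": "APAC"}]
print(run_flow(orders, customers))
```

In a visual tool, each commented step would correspond to one transformation on the canvas rather than a line of code.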

Data Profiling on Hadoop

Data on Hadoop can be profiled through the Informatica developer tool and a browser-based analyst tool. This makes it easy for developers, analysts, and data scientists to understand the data, identify data quality issues earlier, collaborate on data flow specifications, and validate mapping transformation and rules logic.
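To make the idea of column profiling concrete, the sketch below computes a few basic statistics for a single column in plain Python. It is only an assumption about the kind of summary a profile contains, not the output of the Informatica tools; `profile_column` and the choice to treat empty strings as nulls are hypothetical.

```python
def profile_column(values):
    """Return simple profile statistics for one column of values."""
    # Treat None and empty strings as missing (an assumption for this sketch).
    non_null = [v for v in values if v is not None and v != ""]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }

print(profile_column(["a", "b", None, "a", ""]))
```

A high null count or an unexpectedly low distinct count is the kind of signal that points to a data quality issue early.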

Data Quality on Hadoop

Cleanse, match, and standardize data of any type and volume natively on Hadoop to deliver authoritative and trustworthy data. Use an extensive set of prebuilt data quality rules or create your own using the visual development environment. Execute address validation to parse, cleanse, standardize, and enrich global address data.

Complex Data Parsing on Hadoop

This edition makes it easy to access and parse complex, multi-structured, unstructured, and industry standard data such as Web logs, JSON, XML, and machine device data. Optional parsers are offered for market data and industry standards such as FIX, SWIFT, ACORD, HL7, HIPAA, and EDI.

End-to-End Data Lineage

To ensure trust and regulatory compliance, data analysts and business users can view complete end-to-end data lineage. This visual data lineage includes a detailed history of all data movement and transformations (in Hadoop and traditional systems), from target applications all the way back to original source systems. Business/IT collaboration and search are enhanced with a business glossary of common business terms that relate to data objects and their corresponding data lineage.

Data Discovery on Hadoop

Automate the discovery of data domains and relationships on Hadoop. For example, discover customer- and product-related datasets or sensitive data such as Social Security numbers and credit card numbers so that you can mask the data for compliance.
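The discover-then-mask idea can be sketched in a few lines of plain Python. This is a simplified illustration, not how Informatica implements domain discovery or masking: the patterns `SSN_RE` and `CARD_RE`, the helper names, and the masking format are assumptions, and the Luhn checksum is used here only to reduce false positives on card-like digit runs.

```python
import re

# Assumed patterns: U.S. SSN as ddd-dd-dddd; cards as 13-19 digits
# optionally separated by spaces or hyphens.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")

def luhn_ok(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def mask_sensitive(text):
    """Mask SSNs fully and Luhn-valid card numbers except the last four digits."""
    text = SSN_RE.sub("***-**-****", text)

    def mask_card(match):
        digits = re.sub(r"\D", "", match.group())
        if not luhn_ok(digits):
            return match.group()  # not a plausible card number; leave as-is
        return "*" * (len(digits) - 4) + digits[-4:]

    return CARD_RE.sub(mask_card, text)

print(mask_sensitive("SSN 123-45-6789, card 4111 1111 1111 1111"))
```

Real discovery also has to consider column names, value distributions, and regional formats, which a pair of regular expressions cannot capture.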

Natural Language Processing on Hadoop

Using natural language processing and a list of keywords or phrases, entities related to your customers and products can be easily extracted and classified from unstructured data such as emails, social media data, and documents. You can enrich master data with insights into customer behavior or product information such as competitive pricing.

Design Once and Deploy Faster

The Hadoop ecosystem is changing rapidly, with new innovations continuously emerging in the open-source community. The Big Data Edition builds on top of the open-source Hadoop framework and preserves all the transformation logic in your data pipelines. This means developers can design once, without any specialized knowledge of Hadoop concepts and languages, and easily deploy data pipelines without having to rebuild each time Hadoop changes. As a result, Hadoop innovations are implemented faster with less impact and risk to production systems.

Mixed Workflows

Your IT team can easily coordinate, schedule, monitor, and manage all interrelated processes and workflows across traditional and Hadoop environments to simplify operations and enable drilling down into individual Hadoop jobs while meeting SLAs.

High Availability

This edition provides 24x7 high availability with seamless failover, flexible recovery, and connection resilience. When it comes time to develop new products and services using big data insights, you can rest assured that they are scalable and available 24x7 for mission-critical operations.

Deployment Flexibility

Key Benefits of Informatica Big Data Edition

Helps you bring innovative products and services to market faster and improve business operations

Lowers big data project costs while handling growing data volumes and complexity

Expands Hadoop adoption across the enterprise to realize performance and cost benefits

Represents proven data integration and data quality software, so the risk of adopting new technologies is minimized

Compare these Informatica Big Data Editions to select the one that’s right for you.

Editions: Standard, Governance

Data Integration Transforms on Hadoop
Data Quality Transforms on Hadoop
Data Profiling on Hadoop: Column, Rule, Join Validation, Mapping Generation from Profile, Midstream, Comparative Profiling, and Scorecarding
Complex Data Parsing (Big Data Parser): Restricted to logs, XML, JSON, custom/proprietary data formats
End-to-End Data Lineage: Restricted to supporting Big Data Edition
Business Glossary: Restricted to supporting Big Data Edition
Natural Language Processing (NLP) Transforms on Hadoop
Address Validation Transforms on Hadoop
Data Domain Discovery on Hadoop
Identity Matching on Hadoop: Includes 1 Identity Resolution Country Population selected at time of purchase
Data Masking Transforms on Hadoop (Limited)
Real-Time Data Collection and Streaming (Vibe Data Stream): Restricted to HDFS targets and 100 GB daily source data volume (Standard and Governance)
High-Speed Data Ingestion
Database Connectivity
Hadoop Connectivity
HBase Connectivity
Social Media Connectivity: Unlimited Datatypes (Standard and Governance)
Ten (10) Informatica Data Analyst Named Users
Support (included with subscription license only): 8 x 5 (Standard), 24 x 7 (Governance)