Informatica PowerCenter Big Data Edition is highly scalable, high-performance enterprise data integration software. Its visual development environment lets developers build ETL data flows that run natively on Hadoop without having to learn Hadoop.
Up to Five Times More Productivity
Increase developer productivity on Hadoop by up to five times compared with hand-coding, using the visual Informatica development environment. A common integrated development environment (IDE) makes it easy to reuse data flows and to collaborate with other developers and analysts.
Universal Data Access
Your IT team can access all types of big transaction data, including RDBMS, OLTP, OLAP, ERP, CRM, mainframe, cloud, and others. You can also access all types of big interaction data, including social media data, log files, machine sensor data, Web sites, blogs, documents, emails, and other unstructured or multi-structured data.
High-Speed Data Ingestion and Extraction
You can access, transform, and move big data between source and target systems, or load it directly into Hadoop, HBase, or your data warehouse. High-performance connectivity to source and target systems through native APIs, combined with parallel processing, ensures high-speed data ingestion and extraction.
Real-Time Data Collection and Streaming
Collect massive amounts of machine, log file, and sensor data and stream it in real time over both local and wide area networks directly into Hadoop. A highly available, easy-to-use, configurable many-to-many topology ensures fast, reliable connections between all sources and targets.
Your IT organization can process all types of data at any scale, from terabytes to petabytes, with no specialized coding on distributed computing platforms such as Hadoop.
Optimized Performance for Lowest Cost
Depending on data volumes, data type, latency requirements, and available hardware, Informatica Big Data Edition deploys big data processing on the highest-performance, most cost-effective data processing platforms. You get the most out of your current investments and capacity whether you deploy data processing on SMP machines, traditional grid clusters, distributed computing platforms such as Hadoop, or data warehouse appliances.
Data Integration (ETL) on Hadoop
This edition provides an extensive library of prebuilt transformation capabilities on Hadoop, including data type conversions and string manipulations, high-performance cache-enabled lookups, filters, joiners, sorters, routers, aggregations, and many more. Your IT team can rapidly develop data flows on Hadoop using a codeless graphical development environment to increase productivity and promote reuse.
Data Profiling on Hadoop
Data on Hadoop can be profiled through the Informatica developer tool and a browser-based analyst tool. This makes it easy for developers, analysts, and data scientists to understand the data, identify data quality issues earlier, collaborate on data flow specifications, and validate mapping transformation and rules logic.
Data Quality on Hadoop
Cleanse, match, and standardize data of any type and volume natively on Hadoop to deliver authoritative and trustworthy data. Use an extensive set of prebuilt data quality rules or create your own using the visual development environment. Execute address validation to parse, cleanse, standardize, and enrich global address data.
Complex Data Parsing on Hadoop
This edition makes it easy to access and parse complex, multi-structured, unstructured, and industry-standard data such as Web logs, JSON, XML, and machine device data. Optional parsers are offered for market data and industry standards such as FIX, SWIFT, ACORD, HL7, HIPAA, and EDI.
End-to-End Data Lineage
To ensure trust and regulatory compliance, data analysts and business users can view complete end-to-end data lineage. This visual data lineage includes a detailed history of all data movement and transformations (in Hadoop and traditional systems), from target applications all the way back to the original source systems. A business glossary of common business terms, linked to data objects and their corresponding data lineage, enhances search and collaboration between business and IT.
Data Discovery on Hadoop
Automate the discovery of data domains and relationships on Hadoop. For example, discover customer- and product-related datasets, or sensitive data such as Social Security numbers and credit card numbers, so that you can mask that data for compliance.
Natural Language Processing on Hadoop
Using natural language processing and a list of keywords or phrases, you can easily extract and classify entities related to your customers and products from unstructured data such as emails, social media data, and documents. You can then enrich master data with insights into customer behavior or product information such as competitive pricing.
Design Once and Deploy Faster
The Hadoop ecosystem is changing rapidly, with new innovations continuously emerging from the open-source community. The Big Data Edition builds on top of the open-source Hadoop framework and preserves all the transformation logic in your data pipelines. This means developers can design once, without any specialized knowledge of Hadoop concepts and languages, and easily deploy data pipelines without having to rebuild them each time Hadoop changes. As a result, you can adopt Hadoop innovations faster and with less impact and risk to production systems.
Your IT team can easily coordinate, schedule, monitor, and manage all interrelated processes and workflows across traditional and Hadoop environments. This simplifies operations, helps you meet SLAs, and lets you drill down into individual Hadoop jobs.
This edition provides 24x7 high availability with seamless failover, flexible recovery, and connection resilience. When it comes time to develop new products and services using big data insights, you can rest assured that they are scalable and available 24x7 for mission-critical operations.
Key Benefits of Informatica Big Data Edition
Helps you bring innovative products and services to market faster and improve business operations
Lowers big data project costs while handling growing data volumes and complexity
Expands Hadoop adoption across the enterprise to realize performance and cost benefits
Minimizes the risk of adopting new technologies by building on proven data integration and data quality software