It's common to hear Big Data defined through the "3 Vs" of Volume, Variety, and Velocity. Experienced data practitioners, however, add a fourth "V": Veracity. Informatica Data Quality Big Data Edition helps you manage all four "Vs" in situations where truthful, accurate data is business-critical.

Make Big Data Fit for Use

Social media data used for sentiment analysis can tolerate some uncertainty. Customer transaction data used for financial reporting, fraud detection, or regulatory compliance cannot. With pre-built data quality rules and ETL transformations that integrate, cleanse, and match natively on Hadoop, Informatica Data Quality Big Data Edition makes your Big Data fit for use, so your business users and data scientists can trust the results of Big Data analytics.

Move Quickly From Data Prep to Analysis

Your data scientists spend up to 80 percent of their time on data integration and data quality. Help them spend their time analyzing Big Data rather than preparing it. Informatica Data Quality Big Data Edition includes Informatica PowerCenter Big Data Edition, which creates comprehensive data flows that easily access, integrate, cleanse, and match data, all on Hadoop.

Facilitate Collaboration on Big Data Projects

Informatica Data Quality Big Data Edition provides intuitive role-based collaboration tools to bring together your business subject matter experts, data scientists and analysts, ETL developers, application developers, and data stewards in a coherent, cohesive Big Data project team. Business users can easily find and profile data; data stewards can identify and fix data quality issues; and data analysts can define data flow specifications that automatically generate mapping logic so developers can quickly provision data integration and data quality transformations, all on Hadoop.

Informatica Data Quality Big Data Edition, powered by Vibe™ virtual data machine, delivers authoritative and trustworthy data of any type and volume using pre-built data quality rules and ETL transformations processed natively on Hadoop. It transforms the way your business works by instilling trust in all your data, for all your needs, at all times.

Up to Five Times the Productivity

  • Increase developer productivity on Hadoop up to five times over hand-coding through the visual Informatica development environment
  • Easily reuse data flows and collaborate with other developers and analysts with a common integrated development environment (IDE)

Universal Data Access

  • Access all types of big transaction data, including RDBMS, OLTP, OLAP, ERP, CRM, mainframe, cloud, and more
  • Access all types of big interaction data, including social media data, log files, machine sensor data, Web sites, blogs, documents, emails, and other unstructured or multi-structured data

High-Speed Data Ingestion and Extraction

  • Access, load, transform, and extract Big Data between source and target systems or directly into Hadoop, HBase, or your data warehouse
  • Ensure high-speed data ingestion and extraction with high-performance connectivity through native APIs to source and target systems with parallel processing

Unlimited Scalability

  • Process all types of data on distributed computing platforms such as Hadoop at any scale, from terabytes to petabytes, with no specialized coding

Optimized Performance for Lowest Cost

  • Deploy Big Data processing on the highest-performance, most cost-effective data processing platforms for your data volumes, data type, latency requirements, and available hardware
  • Optimize your current investments and capacity whether you deploy data processing on SMP machines, traditional grid clusters, distributed computing platforms like Hadoop, or data warehouse appliances

Data Quality on Hadoop

  • Cleanse and standardize data natively on Hadoop
  • Use an extensive set of prebuilt data quality rules or create your own using the visual development environment
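In the product, data quality rules are configured in the visual development environment rather than hand-coded. Conceptually, though, a cleansing and standardization rule behaves like a function applied to every record; the sketch below, in plain Python with an illustrative abbreviation table, shows the kind of logic such a rule encapsulates.

```python
import re

# Conceptual sketch only: Informatica rules are built visually, not coded.
# The abbreviation table here is an illustrative assumption.
STREET_ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD"}

def standardize_address(raw):
    """Trim, uppercase, collapse whitespace, and standardize street types."""
    tokens = re.sub(r"\s+", " ", raw.strip().upper()).split(" ")
    return " ".join(STREET_ABBREVIATIONS.get(t, t) for t in tokens)

print(standardize_address("  123  Main   street "))  # → 123 MAIN ST
```

Run natively on Hadoop, a rule like this would be applied in parallel across every node holding a slice of the data.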

ETL on Hadoop

  • Leverage an extensive library of prebuilt transformation capabilities on Hadoop, including data type conversions and string manipulations, high-performance cache-enabled lookups, filters, joiners, sorters, routers, aggregations, and many more
  • Rapidly develop data flows on Hadoop using a codeless graphical development environment that increases productivity and promotes reuse
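The transformation types listed above are drag-and-drop components in the graphical designer, not code. As a purely illustrative sketch (with made-up sample data), a filter, a cached lookup, and an aggregation chained together amount to logic like this:

```python
# Illustrative sketch of filter -> lookup -> aggregate; in the product
# these are visual transformations, and the sample data is invented.
orders = [
    {"cust_id": 1, "amount": 120.0},
    {"cust_id": 2, "amount": 35.5},
    {"cust_id": 1, "amount": 80.0},
]
customers = {1: "US", 2: "DE"}  # stands in for a cache-enabled lookup

totals = {}
for order in orders:
    if order["amount"] > 50:                    # filter transformation
        country = customers[order["cust_id"]]   # lookup transformation
        totals[country] = totals.get(country, 0.0) + order["amount"]  # aggregator

print(totals)  # → {'US': 200.0}
```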

Data Domain Discovery on Hadoop

  • Discover what type of data you have on Hadoop
  • Identify sensitive personally identifiable information (PII) that should be masked and secured for regulatory compliance
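The product ships prebuilt data domain definitions; as a simplified, hypothetical illustration of what domain discovery and masking involve, a domain can be thought of as a pattern tested against sampled column values (the regexes below are assumptions, not the product's definitions):

```python
import re

# Hypothetical sketch of data domain discovery and masking; these
# patterns are illustrative assumptions, not Informatica's domain rules.
DOMAINS = {
    "EMAIL": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$"),
    "SSN": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}

def discover_domain(values):
    """Return the first domain matching every sampled value, if any."""
    for name, pattern in DOMAINS.items():
        if all(pattern.match(v) for v in values):
            return name
    return None

def mask(value):
    """Mask all but the last four characters."""
    return "*" * (len(value) - 4) + value[-4:]

column = ["123-45-6789", "987-65-4321"]
print(discover_domain(column))  # → SSN
print(mask(column[0]))          # → *******6789
```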

Profiling on Hadoop

  • Profile data on Hadoop through the Informatica developer tool and a browser-based analyst tool to better understand the data, identify data quality issues earlier, collaborate on data flow specifications, and validate mapping transformation and rules logic
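The profiling statistics surfaced by the developer and analyst tools are the familiar per-column measures. A minimal plain-Python sketch (with invented sample data) of what a column profile contains:

```python
# Minimal sketch of column profiling; the product computes these
# statistics through its tools, and the sample data is invented.
def profile_column(values):
    """Compute basic profile statistics for one column."""
    non_null = [v for v in values if v is not None]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }

ages = [34, None, 28, 34, 51]
print(profile_column(ages))
# → {'count': 5, 'nulls': 1, 'distinct': 3, 'min': 28, 'max': 51}
```

Null counts and distinct-value counts like these are what let analysts spot data quality issues before a data flow goes into production.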

Data Matching on Hadoop

  • Match customer and product data at scale on Hadoop to remove duplicates and incomplete records
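Production matching uses probabilistic scoring across many fields; as a deliberately simplified illustration of the idea, not the product's algorithm, records can be grouped under a normalized match key and deduplicated:

```python
# Conceptual sketch of match-key deduplication. Real matching on Hadoop
# uses probabilistic scoring; this normalized-key approach and the
# sample records are illustrative assumptions.
def match_key(record):
    """Build a normalized key from company name and zip code."""
    name = "".join(c for c in record["name"].lower() if c.isalnum())
    return (name, record["zip"])

def deduplicate(records):
    """Keep the first record seen for each match key."""
    seen, unique = set(), []
    for r in records:
        key = match_key(r)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

customers = [
    {"name": "Acme Corp.", "zip": "94105"},
    {"name": "ACME CORP", "zip": "94105"},  # duplicate after normalization
    {"name": "Globex", "zip": "10001"},
]
print(len(deduplicate(customers)))  # → 2
```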

Design Once and Deploy Anywhere

  • Design the ETL process once, without any specialized knowledge of Hadoop concepts and languages, and easily deploy data flows on Hadoop or traditional systems

Complex Data Parsing on Hadoop

  • Access and parse complex, multi-structured, unstructured, and industry standard data such as Web logs, JSON, XML, and machine device data
  • Add optional prebuilt parsers for market data and industry standards like FIX, SWIFT, ACORD, HL7, HIPAA, and EDI (licensed separately)
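As one concrete example of the semi-structured sources listed above, a Web server access log in Common Log Format can be parsed into named fields. The product's parsers are prebuilt; the regex below is only an illustrative sketch:

```python
import re

# Illustrative sketch of parsing one Common Log Format line into
# structured fields; this regex is a demonstration assumption, not
# the product's prebuilt parser.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+)'
)

line = '192.168.1.5 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
record = LOG_PATTERN.match(line).groupdict()
print(record["path"], record["status"])  # → /index.html 200
```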

Entity Extraction and Data Classification on Hadoop

  • Use natural language processing (NLP) and a list of keywords or phrases to extract and classify entities related to your customers and products from unstructured data such as emails, social media data, and documents
  • Enrich master data with insights into customer behavior or product information such as competitive pricing
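The product combines NLP with keyword and phrase lists; as a bare-bones, hypothetical illustration of keyword-driven classification (with made-up category lists), the core idea looks like this:

```python
# Simplified sketch of keyword-based text classification; the category
# keyword lists are invented for illustration and the product's NLP
# goes well beyond simple word overlap.
CATEGORIES = {
    "complaint": {"refund", "broken", "disappointed"},
    "praise": {"love", "great", "excellent"},
}

def classify(text):
    """Return the category whose keywords appear most often in the text."""
    words = set(text.lower().split())
    scores = {cat: len(words & kws) for cat, kws in CATEGORIES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unclassified"

print(classify("I love this great product"))  # → praise
print(classify("refund now"))                 # → complaint
```

Classifications like these are what feed the enrichment of master data with customer-behavior insight.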

Mixed Workflows

  • Easily coordinate, schedule, monitor, and manage all interrelated processes and workflows across traditional and Hadoop environments to simplify operations and meet SLAs
  • Drill down into individual Hadoop jobs

High Availability

  • Protect current and future Big Data projects with 24x7 high availability and scalability with seamless failover, flexible recovery, and connection resilience