Informatica Big Data Parser is a data parsing transformation environment optimized for Hadoop. This easy-to-use, codeless parsing software processes any file format natively on Hadoop with scale and efficiency. It provides out-of-the-box parsers to address the variety and complexity of data sources, including logs, industry standards, documents, and binary or hierarchical data.

Access and Parse Complex Data

Informatica Big Data Parser enables access to the most difficult data and file formats in Hadoop. It reduces the time and cost of developing data handlers by 70 percent while streamlining management of industry standards, binary documents, and hierarchical data. A sparse transformation pattern is Big Data Parser's core strength:

  • Big Data Parser understands a format in its entirety—binary, hierarchy, text, and standards.
  • Big Data Parser skips uninteresting structures based on their physical or logical attributes.
  • Big Data Parser reaches deep into data hierarchies with a single instruction.
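
To make the sparse pattern concrete, here is a minimal Python sketch of the idea: a single path expression reaches deep into a nested record and ignores everything that does not match. This is an illustration only, not the Big Data Parser engine or its API. The extract function, the dotted path syntax with "*" wildcards, and the sample message are all hypothetical.

```python
from typing import Any, Iterator, List


def _walk(node: Any, keys: List[str]) -> Iterator[Any]:
    """Follow the remaining path keys from node, yielding every match."""
    if not keys:
        yield node
        return
    head, rest = keys[0], keys[1:]
    if head == "*":
        # Wildcard: descend into every list item or every dict value.
        if isinstance(node, dict):
            children = list(node.values())
        elif isinstance(node, list):
            children = node
        else:
            children = []
        for child in children:
            yield from _walk(child, rest)
    elif isinstance(node, dict) and head in node:
        yield from _walk(node[head], rest)
    # Any structure that does not match the path is silently skipped.


def extract(node: Any, path: str) -> Iterator[Any]:
    """Yield every value reached by a dotted path such as 'a.b.*.c'."""
    yield from _walk(node, path.split(".") if path else [])


# Hypothetical sample message: only the SKUs are of interest.
message = {
    "header": {"sender": "ACME", "version": "1.0"},
    "body": {
        "orders": [
            {"id": 1, "lines": [{"sku": "A-1", "qty": 2}, {"sku": "B-7", "qty": 1}]},
            {"id": 2, "lines": [{"sku": "C-3", "qty": 5}]},
        ]
    },
}

skus = list(extract(message, "body.orders.*.lines.*.sku"))
print(skus)  # ['A-1', 'B-7', 'C-3']
```

The header, order IDs, and quantities are never touched by the caller: the one path instruction names only the data that matters.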

Agile Development for Data Parsing

Big Data Parser supplies a unique agile development environment for data parsing and transformation. Your IT organization can view data samples within Big Data Parser Studio and understand their structure and layout through a set of integrated tools. You can then experiment by extracting desired data elements into a target Hadoop format. Finally, you can test your Big Data Parser transformation against other similar data samples. All of these activities take place within Big Data Parser Studio and can be completed within minutes.

Informatica Big Data Parser Key Features

Wide Support for Multiple Industry Standards

It can be a challenge to process industry-specific data formats such as:

  • EDI for manufacturing
  • FIX, SWIFT, NACHA, and SEPA for financial services payments
  • ACORD for insurance
  • ASN.1 for telecommunications
  • HL7 and HIPAA for healthcare

Typically defined by industry groups or government organizations, these formats are continually evolving. Most standards have at least one new version per year, requiring any multiyear big data analytics initiative to support multiple versions and variations. With its broad set of libraries, versions, and messages, Informatica Big Data Parser offers wide support for multiple industry standards, including regular updates of new and existing standards soon after their release. This enables the current process to support new formats as they become available.

Extraction from Binary Documents into Hadoop

Your organization stores huge amounts of data in documents, such as legal files and contracts in Microsoft Word and Adobe PDF, and financial reports and forecasts in Microsoft Excel. Informatica Big Data Parser offers out-of-the-box support for these binary documents so you can process and extract relevant data from them into Hadoop.

Data Processing from Deep Hierarchical Structures

Formats such as XML and JSON increase the complexity of hierarchical data. The ability to effectively process data from a deep hierarchy and support advanced schema and structures is required to successfully process the complex data in these formats. Informatica Big Data Parser features native support for XML and JSON as well as an optimized approach to extracting data from hierarchical structures.

Specifications-Driven Transformation Engine for Defining Logs

Informatica Big Data Parser uses a patented transformation engine to define log specifications, including hierarchical, delimited, and positional logs. These specifications can also be leveraged to parse and extract data from a variety of logs: web logs, call detail record logs, mainframe logs, and proprietary logs.
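
As a rough illustration of what "specification-driven" means, the Python sketch below describes a delimited log and a positional log as data and parses both with the same function. It is a simplified stand-in, not the patented engine or any Informatica API. The class names, field names, and sample lines are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class DelimitedSpec:
    """Hypothetical spec: fields separated by a delimiter, in order."""
    delimiter: str
    fields: Tuple[str, ...]


@dataclass(frozen=True)
class PositionalSpec:
    """Hypothetical spec: (name, start, end) column ranges, end exclusive."""
    fields: Tuple[Tuple[str, int, int], ...]


LogSpec = Union[DelimitedSpec, PositionalSpec]


def parse_line(line: str, spec: LogSpec) -> Dict[str, str]:
    """Parse one log line according to its specification."""
    if isinstance(spec, DelimitedSpec):
        values = (value.strip() for value in line.split(spec.delimiter))
        return dict(zip(spec.fields, values))
    if isinstance(spec, PositionalSpec):
        return {name: line[start:end].strip() for name, start, end in spec.fields}
    raise TypeError(f"unsupported specification: {type(spec).__name__}")


# A delimited web log line (made-up layout).
web_spec = DelimitedSpec("|", ("timestamp", "status", "path"))
web_record = parse_line("2014-01-15T10:00:00|200|/index.html", web_spec)

# A fixed-width call detail record (made-up layout).
cdr_spec = PositionalSpec((("caller", 0, 10), ("callee", 10, 20), ("seconds", 20, 25)))
cdr_record = parse_line("5551230001555123000200042", cdr_spec)

print(web_record)
print(cdr_record)
```

Supporting a new log layout in this sketch means writing a new specification, not new parsing code.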

Elastic Scaling

The Big Data Parser runtime is designed to support Hadoop seamlessly at any scale, so a transformation can be developed on an individual machine, tested on a few nodes, and then run on a massive computer cluster. Regardless of data file format or size, Big Data Parser can process it and scale with the topology of the available Hadoop cluster.

Informatica Big Data Parser Key Benefits

Streamline Development. Big Data Parser's example-based transformation capability dramatically increases productivity. Users view a data sample in original and text formats, which allows continuous development of the parser or data handler and provides instant feedback without the need to compile and deploy.

Increase Productivity and Deploy Flexibly. Informatica Big Data Parser speeds development on Hadoop by up to five times by supplying pre-built parsers for many industry standards. The Big Data Parser engine is accessible to the Hadoop developer in a simple call, enabling the parsing of any data format inside Hadoop.

Rapidly Abstract Data. Advanced big data analytics scenarios depend on the ability to process data from multiple sources. Informatica Big Data Parser provides a visual development environment to parse and transform these structured and semistructured formats rapidly into a usable, canonical, and flattened format. With Big Data Parser, Hadoop developers can use a single transformation engine, instead of multiple coded data handlers, to develop a single program agnostic to the data variation.
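
For readers unfamiliar with the term, "flattened" here means turning a nested record into a single level of named columns. The short Python sketch below shows one common way to do that. It is not Big Data Parser output; the flatten function, the dotted key convention, and the sample record are assumptions chosen for illustration.

```python
from typing import Any, Dict


def flatten(node: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts and lists into one dict keyed by dotted paths."""
    flat: Dict[str, Any] = {}
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten(value, child))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            flat.update(flatten(value, f"{prefix}[{index}]"))
    else:
        # Leaf value: record it under the path built so far.
        flat[prefix] = node
    return flat


# Hypothetical hierarchical record.
record = {"patient": {"id": "P1", "visits": [{"code": "A"}, {"code": "B"}]}}
row = flatten(record)
print(row)
# {'patient.id': 'P1', 'patient.visits[0].code': 'A', 'patient.visits[1].code': 'B'}
```

Once every source is reduced to rows like this, downstream Hadoop jobs can work with one flat layout regardless of how the original data was structured.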