Informatica HParser is a data parsing transformation environment optimized for Hadoop. This easy-to-use, codeless parsing software processes any file format natively on Hadoop with scale and efficiency. It provides out-of-the-box parsers to address the variety and complexity of data sources, including logs, industry standards, documents, and binary or hierarchical data.

Access and Parse Complex Data

Informatica HParser enables access to the most difficult data and file formats in Hadoop. It reduces the time and cost of developing data handlers by 70 percent while streamlining management of industry standards, binary documents, and hierarchical data. A sparse transformation pattern is HParser's core strength:

  • HParser understands a format in its entirety—binary, hierarchy, text, and standards.
  • HParser software easily skips non-interesting structures based on their physical or logical attributes.
  • HParser reaches deep into data hierarchies with a single instruction.
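To make the sparse pattern concrete, here is a minimal Python sketch of the general idea: name only the path you care about and let everything else be skipped. This is not HParser code or its API (HParser itself is codeless); the `extract` helper, the dotted path syntax, and the sample order record are all hypothetical and exist only for illustration.

```python
import json

def extract(node, path):
    """Follow a dotted path through nested dicts and lists.

    A '*' segment fans out over every item of a list. Anything that
    does not match the path is skipped rather than parsed.
    (Hypothetical helper for illustration, not an HParser function.)
    """
    results = [node]
    for key in path.split("."):
        next_results = []
        for item in results:
            if key == "*" and isinstance(item, list):
                next_results.extend(item)
            elif isinstance(item, dict) and key in item:
                next_results.append(item[key])
            # Any other structure is not of interest and is dropped.
        results = next_results
    return results

# Illustrative sample: only the SKUs matter; "audit" and "meta" are noise.
sample = json.loads("""
{"order": {"id": 17, "audit": {"ignored": true},
  "lines": [{"sku": "A1", "meta": {"x": 1}}, {"sku": "B2"}]}}
""")

skus = extract(sample, "order.lines.*.sku")
print(skus)
```

A single path expression stands in for the "single instruction" described above: the caller never has to describe the `audit` or `meta` branches in order to ignore them.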

Agile Development for Data Parsing

HParser provides a unique agile development environment for data parsing and transformation. Your IT organization can view data samples within HParser Studio and understand their structure and layout through a set of integrated tools. You can then experiment by extracting desired data elements into a target Hadoop format. Finally, you can test your HParser transformation against other similar data samples. All of these activities take place within HParser Studio and can be completed within minutes.

Informatica HParser Key Features

Wide Support for Multiple Industry Standards

It can be a challenge to process industry-specific data formats like:

  • EDI for manufacturing
  • FIX, SWIFT, NACHA, and SEPA for financial services payments
  • ACORD for insurance
  • ASN.1 for telecommunications
  • HL7 and HIPAA for healthcare

Typically defined by industry groups or government organizations, these formats are continually evolving. Most standards have at least one new version per year, requiring any multiyear Big Data analytics initiative to support multiple versions and variations. With its broad set of libraries, versions, and messages, Informatica HParser offers wide support for multiple industry standards, including regular updates of new and existing standards soon after their release. This lets your existing parsing processes support new formats as they become available.

Extraction from Binary Documents into Hadoop

Your organization stores huge amounts of data in documents, such as legal files and contracts in Microsoft Word and PDF and financial reports and forecasts in Microsoft Excel. Informatica HParser offers out-of-the-box support for these binary documents so you can process and extract relevant data from them into Hadoop.

Data Processing from Deep Hierarchical Structures

Formats such as XML and JSON increase the complexity of hierarchical data. The ability to effectively process data from a deep hierarchy and support advanced schema and structures is required to successfully process the complex data in these formats. Informatica HParser features native support for XML and JSON as well as an optimized approach to extracting data from hierarchical structures.

Specifications-Driven Transformation Engine for Defining Logs

Informatica HParser uses a patented transformation engine to define log specifications, including hierarchical, delimited, and positional logs. These specifications can also be leveraged to parse and extract data from a variety of logs--web logs, call detail records (CDR) logs, mainframe logs, and proprietary logs.
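As a rough illustration of what "specification-driven" means, the Python sketch below describes a delimited log and a positional log as data and lets one generic function parse both. It does not reflect HParser's patented engine or its specification format; `DelimitedSpec`, `PositionalSpec`, `parse_line`, the field names, and the sample records are all assumptions made up for this example.

```python
from dataclasses import dataclass

@dataclass
class DelimitedSpec:
    """Hypothetical spec: fields separated by a delimiter, in order."""
    delimiter: str
    fields: list

@dataclass
class PositionalSpec:
    """Hypothetical spec: fixed-width fields as (name, start, end)."""
    fields: list

def parse_line(line, spec):
    """Parse one log line according to the given specification."""
    line = line.rstrip("\n")
    if isinstance(spec, DelimitedSpec):
        return dict(zip(spec.fields, line.split(spec.delimiter)))
    if isinstance(spec, PositionalSpec):
        return {name: line[start:end].strip()
                for name, start, end in spec.fields}
    raise TypeError(f"unsupported spec: {type(spec).__name__}")

# Illustrative web log layout (delimited).
web_spec = DelimitedSpec(delimiter="|",
                         fields=["ip", "method", "path", "status"])
# Illustrative call detail record layout (positional).
cdr_spec = PositionalSpec(fields=[("caller", 0, 10),
                                  ("callee", 10, 20),
                                  ("seconds", 20, 25)])

web_row = parse_line("10.0.0.1|GET|/index.html|200", web_spec)
cdr_line = "5551230000" + "5559870000" + "00042"
cdr_row = parse_line(cdr_line, cdr_spec)
print(web_row)
print(cdr_row)
```

Because the layout lives in the specification rather than in the parsing code, supporting a new log type in this sketch means writing a new spec, not a new parser.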

Elastic Scaling

HParser runtime is designed to support Hadoop seamlessly at any scale — even with algorithms developed on individual machines, tested with a few nodes, and then run on massive computer clusters. Regardless of data file format or size, HParser can process it and scale with the topology of the available Hadoop cluster.

Informatica HParser Key Benefits

Streamline Development. HParser’s example-based transformation capability dramatically increases productivity. Users view a data sample in original and text formats, which allows continuous development of the parser or data handler and provides instant feedback without the need to compile and deploy.

Increase Productivity and Deployment Flexibility. Informatica HParser speeds development on Hadoop by up to five times by providing pre-built parsers for many industry standards. The HParser engine is accessible to Hadoop developers through a simple call, enabling the parsing of any data format inside Hadoop.

Rapidly Abstract Data. Advanced Big Data analytics scenarios depend on the ability to process data from multiple sources. Informatica HParser provides a visual development environment to parse and transform these structured and semistructured formats rapidly into a usable, canonical, and flattened format. With HParser, Hadoop developers can use a single transformation engine, instead of multiple coded data handlers, to develop a single program agnostic to the data variation.
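For readers unfamiliar with the term, the following Python sketch shows one common way a hierarchical record can be turned into a flattened format. It is an illustrative assumption about what flattening looks like in general, not a description of HParser's output; the `flatten` helper, the dotted key naming, and the sample record are hypothetical.

```python
def flatten(node, prefix=""):
    """Turn nested dicts and lists into one flat dict with dotted keys.

    (Hypothetical helper for illustration, not an HParser function.)
    """
    flat = {}
    if isinstance(node, dict):
        for key, value in node.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            flat.update(flatten(value, f"{prefix}{index}."))
    else:
        # Leaf value: drop the trailing dot from the accumulated prefix.
        flat[prefix[:-1]] = node
    return flat

# Illustrative nested record.
record = {"patient": {"id": "P-1",
                      "visits": [{"code": "A"}, {"code": "B"}]}}

flat = flatten(record)
print(flat)
```

Once every source is reduced to the same flat key and value shape, downstream analytics code in this sketch no longer needs to know which original format a record came from.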