Informatica HParser Key Features
Wide Support for Multiple Industry Standards
It can be a challenge to process industry-specific data formats like:
- EDI for manufacturing
- FIX, SWIFT, NACHA, and SEPA for financial services payments
- ACORD for insurance
- ASN.1 for telecommunications
- HL7 and HIPAA for healthcare
Typically defined by industry groups or government organizations, these formats are continually evolving. Most standards have at least one new version per year, so any multiyear Big Data analytics initiative must support multiple versions and variations. With its broad set of libraries, versions, and messages, Informatica HParser offers wide support for multiple industry standards, including regular updates for new and existing standards soon after their release. This lets existing processing jobs support new formats as they become available.
Extraction from Binary Documents into Hadoop
Your organization stores huge amounts of data in documents, such as legal files and contracts in Microsoft Word and PDF and financial reports and forecasts in Microsoft Excel. Informatica HParser offers out-of-the-box support for these binary documents so you can process and extract relevant data from them into Hadoop.
Data Processing from Deep Hierarchical Structures
Formats such as XML and JSON increase the complexity of hierarchical data. The ability to effectively process data from a deep hierarchy and support advanced schema and structures is required to successfully process the complex data in these formats. Informatica HParser features native support for XML and JSON as well as an optimized approach to extracting data from hierarchical structures.
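To make the idea of extracting data from a deep hierarchy concrete, here is a minimal Python sketch that flattens a nested JSON document into path/value pairs. It is an illustration only and does not use or represent HParser's engine or API; the `flatten` function, the dotted-path naming scheme, and the sample order record are all assumptions made for this example.

```python
import json
from typing import Any, Dict


def flatten(node: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts and lists into one dict keyed by path strings.

    Hypothetical helper for illustration; not part of HParser.
    Empty dicts and empty lists contribute no entries.
    """
    flat: Dict[str, Any] = {}
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            flat.update(flatten(value, path))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            flat.update(flatten(value, f"{prefix}[{index}]"))
    else:
        # Leaf value: record it under the path accumulated so far.
        flat[prefix] = node
    return flat


# Made-up sample document with two levels of nesting plus a list.
sample = json.loads(
    '{"order": {"id": 7, "items": ['
    '{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}]}}'
)
flat_record = flatten(sample)
for path, value in flat_record.items():
    print(path, "=", value)
```

Each leaf ends up under a key such as `order.items[0].sku`, which is one simple way to turn a hierarchy into a flat record that downstream jobs can consume.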
Specifications-Driven Transformation Engine for Defining Logs
Informatica HParser uses a patented transformation engine to define log specifications, including hierarchical, delimited, and positional logs. These specifications can also be leveraged to parse and extract data from a variety of logs: web logs, call detail record (CDR) logs, mainframe logs, and proprietary logs.
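The general idea of specification-driven parsing, where the layout of a log is declared as data and a generic routine applies it, can be sketched in a few lines of Python. This is not HParser's specification format or API; `DelimitedSpec`, `PositionalSpec`, the field names, and the sample lines are hypothetical and exist only to illustrate the delimited and positional cases.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DelimitedSpec:
    """Hypothetical spec: fields separated by a delimiter, in order."""
    delimiter: str
    fields: List[str]


@dataclass
class PositionalSpec:
    """Hypothetical spec: (name, start, end) column offsets, end exclusive."""
    fields: List[Tuple[str, int, int]]


def parse_delimited(line: str, spec: DelimitedSpec) -> Dict[str, str]:
    values = line.rstrip("\n").split(spec.delimiter)
    if len(values) != len(spec.fields):
        raise ValueError(
            f"expected {len(spec.fields)} fields, got {len(values)}"
        )
    return dict(zip(spec.fields, values))


def parse_positional(line: str, spec: PositionalSpec) -> Dict[str, str]:
    # Slice each fixed-width column and trim its padding.
    return {name: line[start:end].strip() for name, start, end in spec.fields}


# Made-up CDR-style delimited record.
cdr_spec = DelimitedSpec(delimiter="|", fields=["caller", "callee", "seconds"])
cdr_record = parse_delimited("5551234|5559876|42", cdr_spec)

# Made-up mainframe-style fixed-width record.
ledger_spec = PositionalSpec(
    fields=[("account", 0, 8), ("code", 8, 11), ("amount", 11, 20)]
)
ledger_record = parse_positional("ACC00042DEP   150.00", ledger_spec)

print(cdr_record)
print(ledger_record)
```

Because the layout lives in the spec objects rather than in the parsing code, supporting a new log variant in this sketch means adding a spec, not writing another parser.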
The HParser runtime is designed to support Hadoop seamlessly at any scale, including algorithms developed on individual machines, tested on a few nodes, and then run on massive computer clusters. Regardless of data file format or size, HParser can process it and scale with the topology of the available Hadoop cluster.
Informatica HParser Key Benefits
Streamline Development. HParser’s example-based transformation capability dramatically increases productivity. Users view a data sample in original and text formats, which allows continuous development of the parser or data handler and provides instant feedback without the need to compile and deploy.
Increase Productivity and Deployment Flexibility. Informatica HParser speeds development on Hadoop by up to five times by providing pre-built parsers for many industry standards. The HParser engine is accessible to the Hadoop developer through a simple call, enabling the parsing of any data format inside Hadoop.
Rapidly Abstract Data. Advanced Big Data analytics scenarios depend on the ability to process data from multiple sources. Informatica HParser provides a visual development environment to parse and transform these structured and semistructured formats rapidly into a usable, canonical, and flattened format. With HParser, Hadoop developers can use a single transformation engine, instead of multiple coded data handlers, to develop a single program agnostic to the data variation.