Informatica Big Data Parser Key Features
Wide Support for Multiple Industry Standards
It can be a challenge to process industry-specific data formats such as:
- EDI for manufacturing
- FIX, SWIFT, NACHA, and SEPA for financial services payments
- ACORD for insurance
- ASN.1 for telecommunications
- HL7 and HIPAA for healthcare
Typically defined by industry groups or government organizations, these formats are continually evolving. Most standards have at least one new version per year, requiring any multiyear big data analytics initiative to support multiple versions and variations. With its broad set of libraries, versions, and messages, Informatica Big Data Parser offers wide support for multiple industry standards, including regular updates for new and existing standards soon after their release. As a result, existing processes can support new formats as they become available.
Extraction from Binary Documents into Hadoop
Your organization stores huge amounts of data in documents, such as legal files and contracts in Microsoft Word and Adobe PDF, and financial reports and forecasts in Microsoft Excel. Informatica Big Data Parser offers out-of-the-box support for these binary documents so you can process and extract relevant data from them into Hadoop.
Data Processing from Deep Hierarchical Structures
Formats such as XML and JSON increase the complexity of hierarchical data. The ability to effectively process data from a deep hierarchy and support advanced schema and structures is required to successfully process the complex data in these formats. Informatica Big Data Parser features native support for XML and JSON as well as an optimized approach to extracting data from hierarchical structures.
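To make the idea of extracting data from a deep hierarchy concrete, here is a minimal Python sketch that flattens a nested JSON document into path/value pairs. It is an illustration only and does not represent how Informatica Big Data Parser works internally; the `flatten` helper, the dotted-path naming, and the sample order document are all assumptions made for this example.

```python
import json


def flatten(node, prefix=""):
    """Yield (path, value) pairs for every leaf in a nested dict/list structure.

    Hypothetical helper for illustration; not part of any Informatica API.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            # Extend the path with the key, using a dot as separator.
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            # Repeating elements are addressed by position.
            yield from flatten(value, f"{prefix}[{index}]")
    else:
        # A scalar leaf: emit the accumulated path and its value.
        yield prefix, node


# Made-up sample document with two levels of nesting and a repeating group.
document = json.loads(
    '{"order": {"id": 7, "items": ['
    '{"sku": "A1", "qty": 2}, {"sku": "B4", "qty": 1}]}}'
)

flat = dict(flatten(document))
for path, value in flat.items():
    print(path, "=", value)
```

Each leaf ends up addressable by a single key such as `order.items[1].sku`, which is the kind of flattened shape that downstream analytics jobs typically expect.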
Specifications-Driven Transformation Engine for Defining Logs
Informatica Big Data Parser uses a patented transformation engine to define log specifications, including hierarchical, delimited, and positional logs. These specifications can also be leveraged to parse and extract data from a variety of logs: Web logs, call detail record logs, mainframe logs, and proprietary logs.
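The general idea of specification-driven parsing can be sketched in a few lines of Python: the log layout is described as data, and one generic routine interprets it. This is only an illustration of the concept, not the product's patented engine or its specification format; the spec dictionaries, field names, and sample log lines below are hypothetical, and hierarchical logs are left out for brevity.

```python
def parse_line(line, spec):
    """Parse one log line according to a declarative spec (hypothetical format)."""
    if spec["kind"] == "delimited":
        values = line.split(spec["delimiter"])
        if len(values) != len(spec["fields"]):
            raise ValueError(f"expected {len(spec['fields'])} fields, got {len(values)}")
        return dict(zip(spec["fields"], (value.strip() for value in values)))
    if spec["kind"] == "positional":
        # Each field is (name, start, end) with a half-open character range.
        return {name: line[start:end].strip() for name, start, end in spec["fields"]}
    raise ValueError(f"unsupported spec kind: {spec['kind']}")


# Assumed layout for a pipe-delimited call detail record.
CDR_SPEC = {
    "kind": "delimited",
    "delimiter": "|",
    "fields": ["caller", "callee", "duration_s"],
}

# Assumed layout for a fixed-width mainframe record.
MAINFRAME_SPEC = {
    "kind": "positional",
    "fields": [("account", 0, 8), ("code", 8, 11), ("amount", 11, 20)],
}

cdr = parse_line("15550100|15550199|42", CDR_SPEC)
record = parse_line("ACCT0001DEP   125.50", MAINFRAME_SPEC)
print(cdr)
print(record)
```

Because the layout lives in the spec rather than in the code, supporting a new log variant in this sketch means adding another dictionary, not another parser.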
The Big Data Parser runtime is designed to support Hadoop seamlessly at any scale, even with algorithms developed on individual machines, tested with a few nodes, and then run on massive computer clusters. Regardless of data file format or size, Big Data Parser can process it and scale with the topology of the available Hadoop cluster.
Informatica Big Data Parser Key Benefits
Streamline Development. Big Data Parser's example-based transformation capability dramatically increases productivity. Users view a data sample in original and text formats, which allows continuous development of the parser or data handler and provides instant feedback without the need to compile and deploy.
Increase Productivity and Deployment Flexibility. Informatica Big Data Parser speeds development on Hadoop by up to five times by supplying pre-built parsers for many industry standards. The Big Data Parser engine is accessible to the Hadoop developer in a simple call, enabling the parsing of any data format inside Hadoop.
Rapidly Abstract Data. Advanced big data analytics scenarios depend on the ability to process data from multiple sources. Informatica Big Data Parser provides a visual development environment to parse and transform structured and semistructured formats from these sources rapidly into a usable, canonical, and flattened format. With Big Data Parser, Hadoop developers can use a single transformation engine, instead of multiple coded data handlers, to develop a single program that is agnostic to data variations.