Big Data Management Q&A

Recently, I had the good fortune of presenting a deep-dive webinar on Informatica Big Data Management, and we had nearly 1000 attendees! It was awesome to see such huge interest in Big Data Management. However, I wasn't able to answer all the questions in the time we had, so I’ve gone through the list and answered the most common ones here on this blog. Here’s the link to the replay, followed by the top questions and answers from the session. Feel free to reach out to your local Informatica account manager if you’d like clarification or more details about the new features of Big Data Management.

Informatica Big Data Management Deep Dive and Demo

Questions & Answers:
1) Which version of Informatica’s Big Data solutions should I use to leverage Blaze or the Smart Executor?
Informatica’s Blaze engine and Smart Executor were released in November 2015 with Informatica Big Data Management (v10). The product is available in Enterprise and Advanced editions.
2) Do I, as a customer, have to be aware of YARN, Blaze, MapReduce, etc., or does Informatica hide all this complexity?
Informatica’s Smart Executor uses built-in optimization and your cluster configuration to determine the best processing engine. This layer of abstraction removes the need for specialized development for specific engines, which ultimately enables speed, flexibility, and repeatability.
3) Can you clarify what you mean by polyglot engine?
The Smart Executor is the “polyglot engine”: it understands different processing engines and their programming frameworks. It has been designed to figure out the best place to execute a job, whether that is an execution engine on Hadoop, database pushdown, or the native DTM engine. All of this enables speed, flexibility, and repeatability for organizations.
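To make the idea of a polyglot engine concrete, here is a minimal sketch of cost-based engine selection. It is only an illustration of the general pattern and is not how the Smart Executor is implemented. The engine names, the `Job` fields, and every cost number below are hypothetical assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Job:
    """Hypothetical job description; not an Informatica data structure."""
    rows: int
    source_is_database: bool


CostFn = Callable[[Job], float]

# Made-up cost models, one per execution target. The numbers are
# placeholders chosen only so the example has something to compare.
COST_MODELS: Dict[str, CostFn] = {
    # Native engine: no startup cost, but cost grows fastest with volume.
    "native_dtm": lambda job: job.rows * 1.0,
    # Pushdown: cheap when the data already lives in a database, else impossible.
    "database_pushdown": lambda job: (
        job.rows * 0.1 if job.source_is_database else float("inf")
    ),
    # Hadoop: fixed startup overhead, then cheap per row.
    "hadoop": lambda job: 1_000_000 + job.rows * 0.01,
}


def choose_engine(job: Job, available: List[str],
                  cost_models: Dict[str, CostFn] = COST_MODELS) -> str:
    """Return the available engine with the lowest estimated cost."""
    candidates = [name for name in available if name in cost_models]
    if not candidates:
        raise ValueError("no supported engine is available")
    return min(candidates, key=lambda name: cost_models[name](job))


if __name__ == "__main__":
    engines = ["native_dtm", "database_pushdown", "hadoop"]
    print(choose_engine(Job(rows=1_000, source_is_database=True), engines))
    print(choose_engine(Job(rows=1_000_000_000, source_is_database=False), engines))
```

The point of the sketch is that the mapping developer only describes the job; which engine runs it is decided separately, from the list of engines the environment actually offers.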
4) What are the unique benefits of Informatica Blaze for customers?
Informatica Blaze is a purpose-built engine for Hadoop that overcomes the functionality and processing gaps of other engines in the Hadoop ecosystem to consistently deliver maximum performance for customers. No single engine will ever be optimal for all complex batch ETL processing use cases. Each engine (MapReduce, Hive on Tez, and Spark) has its benefits in the ecosystem. Blaze complements these engines by optimizing data processing across 100+ pre-built advanced data integration, data quality, parsing, and masking transformations.
5) Does the Smart Executor work on all engines or just on Blaze?
It is not limited to Blaze. The execution plan is designed for all engines integrated with the Smart Executor.
6) Many use cases around Big Data Management apply to customer, product, and vendor data. What about providing insight into IT operational data?
Informatica Big Data Management is used today across many complex Big Data use cases in any industry. Whether the data domain is customers, products, and so on, or the data sources are social media, sensors, weblogs, relational data, or others, Informatica Big Data Management can help turn raw data into trusted data assets quickly and flexibly.
7) Does Blaze run on Amazon Web Services or Azure, or does it have to be on the same servers as our Informatica servers?
Informatica Big Data Management (v10) has been certified for on-premises Hadoop clusters and for cloud-based Hadoop clusters on platforms like Azure and Amazon Web Services.
8) Is Blaze faster than Spark?
There isn’t an independent benchmark available, but we did publish our internal performance findings. You can read more here.
9) Is Live Data Map the evolution of Metadata Manager?
Metadata Manager is a data governance solution for use cases like impact analysis. Live Data Map, on the other hand, is designed as a universal metadata catalog service. It catalogs data from Hadoop, EDW, BI tools, and ETL tools (PowerCenter), making it easy for data analysts and data scientists to explore all the data assets in the enterprise through a 360-degree relationship view of those assets and a detailed data lineage view.
10) Are dynamic templates limited to Data Warehouse Optimization use cases for Hadoop?
Data Warehouse Optimization for Hadoop is typically the first step of any company’s journey into Big Data, but dynamic templates are relevant for enhancing developer productivity across a variety of use cases. The best way to use dynamic mappings is to identify the design patterns you see in your data, for any use case (Hadoop or not) and at any stage of the data (ETL) processing requirement.
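As a rough illustration of what "design pattern" means here, the sketch below defines one rule once and applies it to records with different schemas. It is plain Python and does not use any Informatica API. The helper `make_template` and the sample data are hypothetical names invented for this example.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def make_template(applies_to: Callable[[Any], bool],
                  rule: Callable[[Any], Any]) -> Callable[[List[Record]], List[Record]]:
    """Build a reusable mapping that applies `rule` to every matching value.

    The template never names a column, so the same mapping can run against
    any schema. That is the pattern-based idea, reduced to a few lines.
    """
    def run(records: List[Record]) -> List[Record]:
        return [
            {col: rule(val) if applies_to(val) else val for col, val in rec.items()}
            for rec in records
        ]
    return run


# One pattern: trim whitespace from every string column, whatever it is called.
trim_strings = make_template(lambda v: isinstance(v, str), str.strip)

if __name__ == "__main__":
    customers = [{"id": 1, "name": " Ada "}]
    products = [{"sku": " A1", "price": 9.5, "label": "x "}]
    print(trim_strings(customers))
    print(trim_strings(products))
```

Here the customer and product records share no column names, yet a single definition covers both, which is where the developer productivity gain comes from.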