The Value of Cloud Data Management: A Q&A with AWS’s Rahul Pathak

Last Published: Dec 23, 2021
Jitesh Ghai

Chief Product Officer

Informatica and Amazon Web Services are longstanding partners with more than a decade of supporting joint customers on their cloud journeys. Jitesh Ghai, Senior Vice President and General Manager, Data Management at Informatica, sat down virtually with Rahul Pathak, General Manager for Analytics at AWS, to get his perspective on what’s top of mind for customers and emerging data trends.





Jitesh Ghai: What is AWS seeing as the prevailing trends in our shared customer base?

Rahul Pathak: We’re seeing an explosion in the volume of data that customers are dealing with, as well as tremendous growth in the types of data sources.

We've traveled from this world of tabular data to one where we've got a lot of semi-structured data. And typically, we've seen about a ten-times volume increase every five years. So just in the time AWS and Informatica have been working together, that's a hundred times increase in data for our joint customers.

We're also seeing an acceleration in migrations to the cloud. Our customers are both optimizing for agility, and trying to find ways to deal with data at scale. That means automation. And then we also see that they want to simultaneously optimize their businesses—to get a better understanding of how their businesses are operating—and to make sure they’re optimizing spend and mapping that to their usage. The elasticity in the cloud also helps with that.
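As a rough check on the arithmetic behind the "hundred times" figure, the sketch below compounds a tenfold increase every five years. It assumes the growth rate is constant and treats the partnership as exactly ten years (the article says "more than a decade"). The function name `growth_factor` and its parameters are illustrative only and come from neither speaker.

```python
def growth_factor(years: float,
                  factor_per_period: float = 10.0,
                  period_years: float = 5.0) -> float:
    """Return the compound growth multiple after `years`.

    Assumes data volume multiplies by `factor_per_period` once every
    `period_years`, which is a simplification of the trend described above.
    """
    return factor_per_period ** (years / period_years)


if __name__ == "__main__":
    # One five-year period: a tenfold increase.
    print(growth_factor(5))
    # Two five-year periods (about a decade): ten times ten.
    print(growth_factor(10))
```

Under these assumptions, two five-year periods give 10 × 10 = 100, which matches the figure quoted above. A partnership longer than ten years would imply a larger multiple.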


Jitesh Ghai: I couldn’t agree more. The economic benefits of the cloud—the resiliency, flexibility, and agility—make it an exciting space, and data and analytics are undergoing tremendous innovation as well. First there were data marts and data warehouses, then data lakes that became swamps with Hadoop. Then came along EMR, Hive, and Spark, until finally customers said, "You know what? I don't want to deal with the operational burden. Let's move to the cloud."

So here’s another question, dealing with the current pandemic: what has AWS seen happening over the last few months, with everybody working remotely and businesses continuing operations?

Rahul Pathak: I think we’re all adjusting. But we've seen a lot of unpredictability. Some businesses are having to adjust operations downwards, because of the economic fallout. Yet others are experiencing a massive amount of growth. Still, across the board, there's a desire to start using data to make better decisions. And I think architectures have evolved so organizations are looking at how they can integrate what used to be siloed data so they get a complete view of their businesses and their customers. As a result, we're seeing a lot more integration across cloud data warehousing and data lakes, with customers wanting to access and use all that data. In the meantime, they’re generating more and more data. So there's an ever-increasing amount to make sense of, and to quickly convert into a form that they can use to drive business value.

Jitesh Ghai: To that point, as you mentioned earlier, automation is fundamental to operating this architecture. But automation is equally needed to scale and curate this exponentially ever-growing volume of data that organizations are trying to harness and derive value from. So how is automation playing out within the data and analytics stack?

Rahul Pathak: Automation is a huge deal on multiple levels, especially with the massive amounts of data coming in. We've long passed the point where it's practical for it to be curated by hand. Our customers are looking for machine learning, automation, and intelligence to make sense of the data coming in: to clean it up, classify it, enrich it, and mask or protect it. Given the extreme volumes involved, you really want to automate the process.

This is translating to having more and more automation and machine learning at the analytical layer. This in turn is driving the rise of machine learning and predictive applications. But all of it must be built on a foundation of trusted data that's well governed and managed and ready to use. Trusted data is the starting point for anything that you might want to do downstream.


Jitesh Ghai: At Informatica, when we look at this from a data management standpoint, whether for data warehouses, data lakes, or lakehouses, we believe that there are three pillars, or three legs of a stool, needed to successfully scale to enterprise workloads and systems of record. These pillars are intelligent and automated metadata management, data integration, and data quality.

So how does AWS think about the data management side of things? Informatica certainly brings a lot to the table, and then there are AWS’ own market-leading analytics services.

Rahul Pathak: We share a similar perspective. AWS has had a really long partnership with Informatica and our customers have benefited from the innovation that each of us has achieved in our respective areas. And what we've seen is, as you mentioned, that managing metadata and making sure data from multiple sources is well integrated and of high quality are critical aspects of building a solid foundation. And then, from our perspective, we're focused on giving customers the right analytics tools for the right jobs to help them make sense of all of this data.

So whether that's offering Amazon Redshift in data warehousing for its ability to query Amazon S3 data lakes, services like EMR, or our data catalogs, it's a very synergistic relationship that we have with Informatica. Working together, we're able to get data into a form that makes it easy for customers to then build analytical applications on top of the services that we offer.

Jitesh Ghai: What do you see as the next set of challenges that our customers will face? And what are AWS’ opportunities for innovation in light of these challenges?

Rahul Pathak: A few things will always stay the same. I think customers are looking for scalability, security, operational reliability, and we're always going to be investing in those things. In addition to that, customers are increasingly looking for solutions and services that are easy to use and operate. We're seeing a lot of interest from customers in serverless technologies and in automated systems that scale up and down to handle variable traffic loads. I think that will continue.

Jitesh Ghai: Agreed. AWS Lambda has been leading the charge with its serverless capabilities. We’re equally seeing elastic compute capabilities, whether on infrastructure-as-a-service (IaaS), platform-as-a-service (PaaS), or analytics services, being used within the data management side of things in particular, so that our customers get the complete economic benefits of an end-to-end cloud-native stack, whether from the business intelligence (BI), the data science, the data engine, or the persistence of processing across the stack. And certainly that's where Informatica has been innovating with our cloud-native capabilities.

At this point, we've been partners with AWS for more than a decade, and we have a rich set of common enterprise customers that have benefited from building cloud data warehouses and cloud data lakes using our solutions together. Perhaps we could talk about a few of those?

Rahul Pathak: Absolutely. And I'd say as a partner, it's been terrific to see Informatica transforming, and bringing all of the benefits that you've had from your long experience in the data space into a cloud-native environment. Our joint customers have certainly benefitted.

There are organizations like the Community Technology Alliance, which has been focused on integrating analytics and data across multiple silos to reduce homelessness. It’s an amazing initiative that we wholeheartedly support and are delighted to be part of. It's been great to see them use our technologies together to bring data into Amazon Redshift and then use its visualization layer to help drive outcomes for their communities.

Jitesh Ghai: Yes, a project like that really hits home. The Community Technology Alliance was able to reduce homelessness in our community by 75%. A terrific example of what you can do with trusted data-driven insights.

We're also seeing a lot of that within our customer base in the public sector and within the healthcare space with COVID-19. Now more than ever, healthcare providers have to make critical decisions on a foundation of trusted data. We saw healthcare providers governing data on a foundation of metadata by identifying data sets on ventilator supplies or critical COVID-19 patient lists. It was eye-opening.

Another example that you had mentioned at AWS re:Invent was Sysco Foods. Could you talk about that?

Rahul Pathak: Sure. Sysco, one of the largest food distributors, was looking to completely modernize its infrastructure. We’re seeing a lot of this, by the way. Whether it's migrating from legacy systems, or, as in Sysco's case, moving core data off a mainframe, companies are working to get an end-to-end picture of how they're operating their businesses. For Sysco, this involved first working with Informatica to get its data movement and integration processes operating, and making sure that the trusted data that it was using on-premises was cleanly transferred to AWS.

And then on AWS, Sysco really went all the way in modernizing. It ended up using a combination of Amazon S3 for its data lakes, Amazon Athena for serverless queries, and then Amazon Redshift for its most performance-sensitive data warehousing workloads. It was a great end-to-end story and a terrific example of a large-scale enterprise transforming itself and its analytics infrastructure.

Jitesh Ghai: Yes. I couldn’t agree more. Now more than ever, ensuring that the food supply chain functions optimally is good for all of us.

Data is a foundational element of that. Now, I'm a data person. You're obviously a data person. We're passionate about this. But it’s encouraging to see the real-world outcomes that data and analytics are delivering in a cloud-native world. The resilience. The flexibility. And, above all, the trusted insights.

Rahul Pathak: Jitesh, I completely agree. It's been great to see how the things that we're passionate about can drive positive outcomes, especially in these times.

Jitesh Ghai: So a healthy supply chain is critical, and healthy communities are critical. So why not talk about fitness as well? Equinox is another shared customer of ours that has modernized by moving to a cloud-first strategy, leveraging Informatica’s Intelligent Cloud Services as well as AWS's analytics services. Perhaps you could share some of the success Equinox is having with our combined offerings.

Rahul Pathak: This is another great example of a customer migrating from a legacy data warehouse system to Amazon Redshift with Informatica handling the data management and data quality processes. What Equinox did: it captured the data coming off its fitness equipment to build personalized environments for people looking to get into better shape and in general be healthier. And with Informatica being able to manage Equinox’s data processes, connect to its numerous data sources, and then get that data into a form where it was clean, curated, and ready for further analytics, Equinox was able to use Amazon Redshift for high-performance analytics to drive better experiences for its customers.

With our joint solutions, Equinox was able to get a 60% to 70% improvement in its data loading time and meet all of its service level agreements (SLAs) for real-time reporting. This ultimately helped Equinox drive better outcomes for the business.

Jitesh Ghai: Healthy communities, healthy food supply chains, healthy people. We're not quite solving world peace, but I think those are the fundamentals of getting there.

Rahul, thank you again for the continued partnership, the continued innovation together, and the continued shared successes we're delivering to our enterprise customers. We’re excited to continue bringing organizations to the cloud and modernizing them with an end-to-end cloud-native stack from Informatica and AWS.

This conversation originally aired as the series premiere of the Intelligent Data Summit for Cloud Data Warehouses, Data Lakes, and Lakehouses. All sessions from the summit are now available on demand.

First Published: Aug 19, 2020