At a Monday morning scrum meeting, Sam, VP of Data Management, discusses plans to modernize the current on-premises data infrastructure into a cloud-based data warehouse and data lake. This IT-led modernization project aims to deliver scalability, elasticity, agility, and better performance. Part of the project plan includes a data ingestion and processing framework to process and store data in a cloud lakehouse. Sam asks if such a solution can be built, operationalized, and supported for the new initiative.
Tim, the lead data engineer, takes on the challenge and commits to having a data ingestion and processing framework up and running in the cloud within one week. Feeling confident, Tim and his four developers get down to hand coding a solution for the five data sources identified for this pilot.
The first data source is relatively easy to work with, as the data is well structured. Tim moves on to the next datasets, and that’s when trouble starts. The data comes from an OLAP system; however, the base tables supporting that OLAP system are derived from a CRM system, a mainframe system, and an MDM system. Tim’s code needs to connect to, parse, integrate, and cleanse these varied data types, which are siloed in different systems. What’s more, Tim is not even sure how to maintain the quality of the data. And, as the lead engineer, Tim has a new dilemma: he needs to figure out where and how the code will scale to process the large volume of data.
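To make the problem concrete, here is a minimal sketch of the kind of per-source hand coding the team faces. The source formats, field names, and cleansing rules below are hypothetical, chosen only to illustrate how every siloed system demands its own parser and quality rules before records can be integrated:

```python
# Hypothetical hand-coded ingestion: each source system needs its own
# parser and cleansing logic, and this effort multiplies per system.
import csv
import io

def parse_crm_csv(text):
    """CRM export (assumed format): comma-separated with a header row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

def parse_mainframe_fixed_width(text):
    """Mainframe extract (assumed layout): fixed-width columns,
    id in positions 0-4, name in positions 5-24."""
    records = []
    for line in text.splitlines():
        records.append({"id": line[0:5].strip(), "name": line[5:25].strip()})
    return records

def cleanse(records):
    """Basic quality rules: drop records missing an id, normalize names.
    Hand-maintaining rule sets like this across 70-plus systems is the
    maintenance burden Sam later identifies."""
    return [
        {**r, "name": r["name"].title()}
        for r in records
        if r.get("id")
    ]

# Toy inputs standing in for two of the siloed systems.
crm_data = "id,name\n101,alice smith\n,orphan row\n"
mainframe_data = "102  bob jones\n103  carol wu"

combined = (
    cleanse(parse_crm_csv(crm_data))
    + cleanse(parse_mainframe_fixed_width(mainframe_data))
)
```

Even in this toy form, each new source adds a parser, a field mapping, and cleansing rules that must be written, tested, and maintained by hand, which is why the approach strains a five-person team.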
And now the deliverable due date doesn’t seem achievable, potentially setting the team back months or more.
The above scenario fits any IT organization supporting a cloud-based AI or analytics initiative that tasks its technical developers with designing and developing a hand-coded data integration solution.
After several months of project delays, Sam jots down a few lessons learned, based on the team’s initial attempts to hand code a data integration solution.
These are only a few of the points Sam learned from the first phase of this project. Beyond them, Sam thinks about reusability: the team built a solution for one specific system, so how will it handle the other systems within the enterprise? The team would have to customize a solution for 70-plus systems. Sam also manages the operations team, which raises a second question: how will that team deploy and maintain 70-plus hand-coded solutions?
Nine months after kicking off the initial modernization project, Sam realizes he needs to re-evaluate his cloud data management strategy. With the benefit of hindsight, he recognizes that this strategy needs to improve productivity, reduce manual work, and increase efficiency through automation and scale. Sam outlines the requirements for a cloud data management solution that meets these goals.
Over time, Sam realizes that his team’s hand-coded data integration solution leads to higher costs, especially in enterprise environments. He wants to avoid the mistakes that prevent organizations from implementing successful cloud analytics initiatives. Like many IT leaders, Sam looks to de-risk cloud modernization projects by avoiding hand coding and limited point solutions while lowering cost.
A year after that first scrum meeting, Sam has invested in an intelligent, automated data integration solution. The solution bridges the gap between on-premises and cloud deployments and accelerates time to value. Sam’s team now has native access and visibility into various data types, enabling them to standardize across data variety and to validate, verify, and process the data without rebuilding everything from scratch. The result? The company can now reap the benefits of a successful cloud lakehouse implementation.