Building a Cloud-Native Architecture: 3 Keys For Success
How can you create an effective cloud-native architecture? You first need to understand the principles that underlie its design. Here’s a quick crash course on what cloud-native is, why a cloud-native architecture is important and three key principles for designing an application architecture that drives business value, scalability and cost-effectiveness.
Cloud-Native Architecture: What It Is and Why It’s Important
According to the Cloud Native Computing Foundation, “Cloud-native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure and declarative APIs exemplify this approach.”
By taking a cloud-native approach, organizations can design, build, modernize and operate applications and services that capitalize on the cost, speed, agility, flexibility, resilience and scalability benefits of the cloud.
At its core, cloud-native architecture is a design methodology for developing and improving system architectures specifically for the cloud. A cloud-native system frees development teams from the constraints of the traditional stack while expanding business value and accelerating time to market.
Three Principles for Cloud-Native Architecture
While there are a variety of cloud-specific principles to consider, it’s important to understand three core principles — XOps, serverless and consumption-based pricing — and how they benefit your organization.
1. XOps: What it is, what it does and how to use it
XOps is an umbrella term that refers to the generalized operations of all IT disciplines and responsibilities. This includes BizOps, MarketingOps, DevOps, AIOps, MLOps and DataOps. The goal of XOps is to achieve efficiencies and economies of scale using DevOps best practices. It ensures reliability, reusability and repeatability by enabling automation while reducing technology and process duplication.
DataOps is the application of continuous delivery and DevOps to data analytics. It draws on principles from DevOps, agile development and statistical process control to shorten the cycle time of extracting value from data analytics.
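To make the idea concrete, here is a minimal, hypothetical sketch of the kind of automated data test a DataOps team might run in continuous integration before a dataset is promoted downstream. It is not an Informatica feature, and the column names and rules are assumptions for illustration only.

```python
# A hypothetical DataOps-style data test that a CI job could run before
# publishing an inventory extract. Column names and rules are assumptions.
import pandas as pd

def validate_inventory_extract(df: pd.DataFrame) -> list[str]:
    """Return a list of data-quality violations; an empty list means the check passes."""
    problems = []
    required_columns = {"store_id", "sku", "quantity_on_hand", "snapshot_date"}
    missing = required_columns - set(df.columns)
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
        return problems
    if df["quantity_on_hand"].lt(0).any():
        problems.append("negative inventory quantities found")
    if df.duplicated(subset=["store_id", "sku", "snapshot_date"]).any():
        problems.append("duplicate store/sku/date rows found")
    return problems

if __name__ == "__main__":
    sample = pd.DataFrame(
        {
            "store_id": [101, 101],
            "sku": ["CONSOLE-X", "CONSOLE-X"],
            "quantity_on_hand": [25, -3],
            "snapshot_date": ["2023-11-20", "2023-11-21"],
        }
    )
    violations = validate_inventory_extract(sample)
    # In a CI pipeline, any reported violation would fail the build
    # and keep untrusted data from reaching analytics consumers.
    print(violations or "all checks passed")
```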
How to operationalize XOps to drive business value: Retail use case
Let's say you lead the data and analytics practice in a multinational retail company. During the holiday season, discount offers across all your North American stores trigger a sudden surge in demand for video game consoles. But your retail stores can’t meet the demand, resulting in lost customers and revenue to competitors.
To address the situation, you’re asked to accurately forecast demand for the consoles so the retail stores have the right amount of stock on hand. To do this, your team needs to build machine learning models that automate demand forecasts. But the data infrastructure between different stores and external suppliers is siloed, which makes it difficult for developers, data scientists and data analysts to provide trusted insights. So how can you dissolve data silos and enable end-to-end visibility of your inventory data? Your team needs to employ DataOps principles to create an agile, seamless collaboration model.
Now, let's see how you can improve inventory management with Informatica's built-in DataOps capabilities for continuous integration, delivery and operations, which automate the timely delivery of trusted data.
To begin, your developers need to understand the attributes and patterns of point-of-sale (POS) and inventory data from various sources. These include your enterprise resource planning (ERP) software, customer relationship management (CRM) tools and software-as-a-service (SaaS) applications.
But how do you discover and track the transformation of this data so that it’s consumable for analytics?
By discovering data assets with the Google-like search capability in the Informatica Enterprise Data Catalog, your development teams get a holistic view of POS and inventory data. They see all the data attributes like customer name, date, time, quantity, frequency, stock inventory, stock changes and more.
After you’ve discovered your data assets, you need to ingest them from the store’s Oracle databases and the supplier’s mainframe systems into Snowflake, which is your company’s unified analytics platform. The wizard-based approach of Informatica Cloud Mass Ingestion lets you ingest terabytes of data from Oracle databases into Snowflake. It automatically captures changes in the source data in real time and replicates them into Snowflake.
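For readers who want to see what change-data-capture replication boils down to, here is a hypothetical sketch that applies captured change records to a Snowflake target with a MERGE statement. This is an illustration of the pattern, not how Informatica Cloud Mass Ingestion is implemented; the table names and connection details are assumptions, and it requires the snowflake-connector-python package.

```python
# A hypothetical sketch of applying captured changes to a Snowflake target table.
# Table names and credentials are placeholders.
import snowflake.connector

MERGE_SQL = """
MERGE INTO inventory AS tgt
USING inventory_changes AS src
  ON tgt.store_id = src.store_id AND tgt.sku = src.sku
WHEN MATCHED THEN UPDATE SET tgt.quantity_on_hand = src.quantity_on_hand
WHEN NOT MATCHED THEN INSERT (store_id, sku, quantity_on_hand)
  VALUES (src.store_id, src.sku, src.quantity_on_hand)
"""

def apply_captured_changes() -> None:
    conn = snowflake.connector.connect(
        account="your_account",      # placeholder connection details
        user="your_user",
        password="your_password",
        warehouse="ANALYTICS_WH",
        database="RETAIL",
        schema="PUBLIC",
    )
    try:
        # Fold the staged change records into the target table.
        conn.cursor().execute(MERGE_SQL)
    finally:
        conn.close()
```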
Once the data is ingested, your data engineers integrate and process the data by building a codeless data pipeline with Informatica Cloud Data Integration. Data stewards then use Cloud Data Quality to profile, cleanse and transform the data for analytics consumption. The data scientists are now able to use clean, trusted data to build machine learning models.
After the machine learning models have been built, data engineers operationalize them by deploying them using a Python transformation in the Cloud Data Integration interface.
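As a rough illustration, the snippet below sketches the kind of forecasting logic such a Python step might wrap: fit a model on historical sales and score upcoming weeks. The feature names and the choice of a simple linear regression are assumptions for the example, not the retailer's actual model.

```python
# A minimal, hypothetical demand-forecasting sketch.
# Feature names and model choice are assumptions.
import pandas as pd
from sklearn.linear_model import LinearRegression

def train_demand_model(history: pd.DataFrame) -> LinearRegression:
    """Fit a simple regression of weekly console sales on discount depth and week number."""
    features = history[["discount_pct", "week_of_year"]]
    target = history["units_sold"]
    return LinearRegression().fit(features, target)

def forecast(model: LinearRegression, upcoming: pd.DataFrame) -> pd.Series:
    """Score upcoming weeks; in a pipeline this output would feed replenishment planning."""
    predictions = model.predict(upcoming[["discount_pct", "week_of_year"]])
    return pd.Series(predictions, index=upcoming.index, name="forecast_units")
```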
Finally, Informatica Operational Insights lets you continuously monitor the delivered data pipelines to identify job failures and make sure the machine learning models stay functional.
Implementing DataOps with Informatica has allowed your stores to accurately forecast demand. This results in higher sales, increased store revenue, and happy customers.
2. Serverless development: Helping you do more with less
Serverless is a cloud-native development and deployment model that lets developers build, test and deploy data and applications without having to manage servers. There are still servers in serverless, but they’re abstracted away from development. A cloud provider handles the routine work of provisioning, maintaining and scaling the server infrastructure.
Once deployed, a serverless cloud-native app responds to demand and automatically scales up and down as needed. Serverless offerings from public cloud providers are usually metered on demand through an event-driven execution model. As a result, when a serverless function is sitting idle, it doesn’t cost you anything.
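To picture the event-driven model, here is a minimal sketch of a serverless function using the AWS Lambda handler convention: it runs only when an event arrives, and an idle function costs nothing. The event shape and the order-total logic are assumptions for illustration.

```python
# A minimal sketch of an event-driven serverless function (AWS Lambda handler style).
# The event payload shape is an assumption.
import json

def handler(event, context):
    """Triggered by an event, e.g., an HTTP request or an object landing in storage."""
    order = json.loads(event.get("body", "{}"))
    # Do a small unit of work per event; the platform scales the number
    # of concurrent executions up and down with demand.
    total = sum(
        item.get("price", 0) * item.get("quantity", 0)
        for item in order.get("items", [])
    )
    return {"statusCode": 200, "body": json.dumps({"order_total": total})}
```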
A real-world example of serverless development: Food delivery
Let’s say you lead the data engineering and analytics development teams for a fast-growing food delivery startup. Your company just signed a deal with a major credit card company. It lets cardholders get free food delivery on your app. You expect a huge spike in transactions, web traffic and data volume in the next few days.
Your personalized recommendation engine suggests dishes and restaurants based on past orders and user preferences. However, it was built on an older MapReduce framework that can’t scale. To deliver personalized, real-time recommendations, your data science team creates a ranking algorithm based on user preferences and demographic similarity, with a geolocation pre-filter that only presents users with restaurants within their delivery range.
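The sketch below shows one way such a flow could look: a geolocation pre-filter keeps only restaurants within delivery range, then the remaining candidates are ranked by a blend of preference match and demographic similarity. The data shapes and scoring weights are assumptions, not the startup's actual algorithm.

```python
# A hypothetical geo pre-filter plus ranking step. Weights and fields are assumptions.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def recommend(user, restaurants, max_km=5.0, top_n=10):
    # Pre-filter: drop anything outside the user's delivery range.
    in_range = [
        r for r in restaurants
        if haversine_km(user["lat"], user["lon"], r["lat"], r["lon"]) <= max_km
    ]
    # Rank: weighted blend of cuisine-preference match and demographic similarity.
    def score(r):
        preference = user["cuisine_prefs"].get(r["cuisine"], 0.0)
        similarity = r.get("demographic_similarity", 0.0)
        return 0.7 * preference + 0.3 * similarity
    return sorted(in_range, key=score, reverse=True)[:top_n]
```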
To take this machine learning model to production quickly, you need to run it as part of your data engineering pipeline, which was primarily built for a batch-type pattern. There are key questions to consider:
- How do you run batch loads and real-time recommendation engines in the same data pipeline?
- How much data volume should you plan for?
- Do you have enough processing power to generate recommendations in real time for new users, while avoiding cost overruns?
Not to worry, Informatica can help. With our Advanced Serverless offering, you can operationalize your machine learning models and leverage your existing data pipelines while the Informatica Spark-based engine processes terabytes of data in a few minutes.
To set up Informatica serverless, your IT administrators just need to provide your cloud infrastructure and network details in the Informatica user interface (UI). Alternatively, you can implement infrastructure-as-code using our ready-to-use CloudFormation templates on AWS. This allows you to provision all necessary resources in a secure, repeatable manner within a couple of minutes.
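For teams that prefer to script the provisioning step, here is a minimal sketch of launching a CloudFormation stack with boto3 instead of clicking through a console. The template URL, stack name and parameter values are placeholders, not Informatica's actual template.

```python
# A minimal infrastructure-as-code sketch using boto3's CloudFormation API.
# Template URL and parameter values are placeholders.
import boto3

cloudformation = boto3.client("cloudformation", region_name="us-east-1")

response = cloudformation.create_stack(
    StackName="serverless-runtime-environment",
    TemplateURL="https://example-bucket.s3.amazonaws.com/serverless-template.yaml",  # placeholder
    Parameters=[
        {"ParameterKey": "VpcId", "ParameterValue": "vpc-0123456789abcdef0"},        # placeholder
        {"ParameterKey": "SubnetId", "ParameterValue": "subnet-0123456789abcdef0"},  # placeholder
    ],
    Capabilities=["CAPABILITY_NAMED_IAM"],  # required if the template creates IAM roles
)

# Block until the stack and every resource it declares are fully provisioned.
cloudformation.get_waiter("stack_create_complete").wait(StackName=response["StackId"])
```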
Once set up, you run your data pipelines and operationalize your machine learning models on demand, in response to an event or on a pre-set schedule. And the best part? You don’t need to change any of the Informatica mappings. Just repoint your runtime to use the serverless environment.
On the first day of free delivery, you saw a more than 10-fold increase in traffic. Thankfully, Informatica automatically scaled processing capacity for your needs. And when the traffic subsided, Informatica shut down two of the clusters to save you from unnecessary compute costs.
Spark processing jobs also take less time than what you’ve seen with other Spark clusters; in some cases they run as much as 65% faster. This is because the artificial intelligence and machine learning (AI/ML) engine, CLAIRE™, automatically tunes clusters at runtime to get the best performance.
Moreover, the metering dashboard lets you monitor service usage in real time. So not only do you keep compute costs lower, you can also keep a close eye on them as they accrue.
3. Consumption-based pricing: Grow in the cloud at a pace that makes sense for your business
What is consumption-based pricing? It’s a model where you’re charged according to how much of the service you consume, such as the amount of data you store or the computing power a task requires.
In a cloud-native world, this is the most flexible pricing option, aligning with your needs as a customer. You pay only for what you use. The return on investment (ROI) of this model is obvious: quick time-to-value without a lot of commitment.
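A toy calculation makes the model concrete: the bill is driven entirely by what you actually consumed in the period, with no fixed capacity to pay for. The unit rates below are made-up numbers, not any vendor's price list.

```python
# A toy illustration of consumption-based billing. Rates are invented for the example.
COMPUTE_RATE_PER_HOUR = 0.40   # hypothetical $ per compute hour
STORAGE_RATE_PER_GB = 0.02     # hypothetical $ per GB-month

def monthly_bill(compute_hours: float, storage_gb: float) -> float:
    return compute_hours * COMPUTE_RATE_PER_HOUR + storage_gb * STORAGE_RATE_PER_GB

# A quiet month costs little; a busy month costs more.
print(monthly_bill(compute_hours=120, storage_gb=500))   # 58.0
print(monthly_bill(compute_hours=900, storage_gb=2000))  # 400.0
```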
Consumption-based pricing: One company’s model
Let’s say you lead the data engineering and analytics group at a multinational insurance company. Your organization has numerous systems for managing data, including information on customers’ insurance policies, claims, sales and opportunities. Salesforce CRM is used to maintain customer records. Snowflake is the platform for reporting and analytics. Azure Data Lake is used for deploying machine learning models to detect fraudulent claims.
But business has scaled and use cases have evolved. Your existing integration platform hasn’t been able to keep up with diverse integration needs. Nor has it been able to control costs. Your data integration pipelines are taking much longer to process data and they require more resources. These challenges have led to cost overruns and delays in reporting. Not to mention, your existing integration platform gives limited visibility into resource consumption and lacks any capability to limit it. To make things worse, your company is already incurring expensive data transfer charges for processing data in Snowflake and Azure Data Lake.
You know your organization needs a cloud data integration solution: one that addresses diverse integration use cases and controls resource consumption while supporting multi-cloud ecosystems.
Cloud Data Integration by Informatica can help you operationalize your use cases quickly with the flexibility of its optimization engine. Plus, it gives you complete cost governance.
First, you can transfer your weekly insurance sales and opportunity data from Salesforce to Snowflake using a simple four-step wizard. This is powered by the extract, transform, load (ETL) execution engine, which efficiently transforms the data and quickly loads it. Once the data is in the Snowflake landing zone, it’s curated, aggregated and moved to the enterprise data warehouse zone. Using the ELT execution engine with advanced pushdown capability, you can save on data transfer charges and achieve up to 50 times faster performance than your existing ETL integration.
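To see what pushdown means in practice, here is a hypothetical sketch: instead of pulling rows out of the warehouse to transform them, the transformation is expressed as SQL and executed where the data already lives, so no data leaves Snowflake. The table and column names are assumptions, and the snippet is not Informatica's implementation; it requires snowflake-connector-python.

```python
# A hypothetical ELT pushdown sketch: the aggregation runs inside Snowflake.
# Table, column and connection names are placeholders.
import snowflake.connector

PUSHDOWN_SQL = """
CREATE OR REPLACE TABLE dwh.weekly_sales_summary AS
SELECT region,
       product_line,
       DATE_TRUNC('week', close_date) AS sales_week,
       SUM(opportunity_amount)        AS total_sales
FROM landing.salesforce_opportunities
GROUP BY region, product_line, sales_week
"""

conn = snowflake.connector.connect(
    account="your_account", user="your_user", password="your_password",
    warehouse="ANALYTICS_WH", database="INSURANCE_DWH", schema="DWH",
)
try:
    conn.cursor().execute(PUSHDOWN_SQL)  # no rows are transferred out of the warehouse
finally:
    conn.close()
```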
Finally, using the Informatica Spark-based engine running on Kubernetes, your company can clean and transform the historical insurance claims data stored in your Azure Data Lake. From there, you can use it for AI/ML use cases to detect fraudulent claims.
The volume of data processing will fluctuate due to sudden increases or decreases in the number of claims. When this happens, the AI-powered CLAIRE engine provides auto-scaling and auto-tuning capabilities, letting you choose the option that best suits your workload. Additionally, the Informatica Spark engine helps you detect fraudulent claims up to 10 times faster than before. For each of these use cases, the Informatica Optimization Engine helps limit compute hours and data transfer costs by sending the processing to the fastest, most cost-effective option.
Lastly, Informatica Operational Insights gives you an aggregate view of all your operations and resource consumption analytics so you can adjust resources to meet demand. The Informatica Processing Unit dashboard lets you track and monitor all your usage and costs through a single pane of glass. The built-in capabilities for cost optimization and governance let you optimize your IT budget and boost ROI by paying only for what you use with our consumption-based pricing.
Back-to-Basics Webinar Series: Data Integration
Ready to turn your data into actionable insights and business value by connecting all your disparate data sources? Register for the on-demand “Back-to-basics webinar series: Data integration.” You’ll get a better understanding of data integration, why it’s critical to analytics and business initiatives and how it’s evolving with changing business requirements, technologies and frameworks.