Big Data in Banking: Use Cases in 2020 and Beyond

Last Published: Aug 05, 2021
Peter Ku

VP, Chief Industry Strategist – Financial Services

What is the Role of Big Data in Banking?

Big data is defined by four main characteristics: volume, velocity, variety, and veracity.

The global financial services industry generates massive amounts of structured and unstructured data every day by processing hundreds of billions of financial transactions as well as through interactions such as email, audio and video communications, call logs, weblogs, and mentions on social media.

One significant driver of this data explosion is an increase in global payment volumes, fueled by e-commerce and mobile payments. No one knows yet how the global COVID-19 pandemic and ensuing economic downturn will affect the global payments market, but it was previously predicted to reach $2 trillion by the end of 2025, with a compound annual growth rate of 7.83%. E-commerce also continues to grow dramatically, especially at a time when consumers are being warned to do as little in-person shopping as possible. And ATM usage, paperless mortgage processing and closing, peer-to-peer payments through apps like Venmo, and other mobile and remote digital banking services are growing increasingly popular.
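To make the growth figure concrete, the sketch below applies the standard compound annual growth rate formula, end = start × (1 + rate)^years. The forecast's base year is not stated here, so the five-year horizon is purely an assumption for illustration, and the function names are hypothetical.

```python
def project_value(start: float, rate: float, years: int) -> float:
    """Grow `start` at a constant annual `rate` for `years` years."""
    return start * (1.0 + rate) ** years


def implied_start(end: float, rate: float, years: int) -> float:
    """Invert the CAGR formula to find the starting value implied by `end`."""
    return end / (1.0 + rate) ** years


if __name__ == "__main__":
    CAGR = 0.0783       # 7.83%, from the forecast quoted above
    TARGET = 2.0        # $2 trillion by the end of 2025
    ASSUMED_YEARS = 5   # assumption: the base year of the forecast is not given

    base = implied_start(TARGET, CAGR, ASSUMED_YEARS)
    print(f"Implied starting market size: ${base:.2f} trillion")
```

Changing `ASSUMED_YEARS` shows how sensitive the implied starting point is to the length of the forecast window.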

What Are the Benefits of Big Data in Banking?

With digital, social, and mobile technologies becoming a standard platform for information access and commerce, the amount of data banks are producing and consuming is nothing short of staggering. Managing all of this data presents business and IT challenges, but it also creates opportunities for banks to grow their business, combat fraud, and improve operational efficiencies.

By applying analytic solutions powered by the cloud, AI, machine learning, and natural language processing technology, banks can leverage their data for previously impossible levels of insight into every aspect of their business. They will be able to understand what happened in the past, why it happened, and what is likely to happen next. And eventually, this analysis will allow them to answer the next logical question: “What do you (and your customers) want to happen?”

What Are Some Big Data Use Cases in Banking?

The use cases for big data in banking are the same as they were when banks first realized they could use their huge data stores to generate actionable insights: detecting fraud, streamlining and optimizing transaction processing, improving customer understanding, optimizing trade execution, and ultimately, competing in a crowded market by delivering superior customer experience. However, as banks gather ever more data, the resulting insights and customer experiences become more accurate and meaningful. Here are a few examples:

  • Western Union offers an omnichannel approach that tailors and personalizes customer experiences by processing more than 29 transactions a second and integrating all that data into a single platform for statistical modeling and predictive analysis.
  • An Eastern European bank that opened without brick-and-mortar locations in the late 2000s, offering credit cards and other banking services entirely online, is staying ahead of the online offerings of its older and more established competitors by using big data analytics to assess and respond to credit applications in near real time, a consumer-pleasing feature that has boosted conversion rates for certain upsell campaigns tenfold.

How are Banks Addressing New Big Data Challenges?

Although the use cases for big data in banking remain the same, the challenges have shifted as data engineering technology evolves.

Speed is of the Essence

Shifting from traditional data warehousing to running Hadoop with its massively parallel engine on commodity hardware allowed banks to cut the length of time it took to extract insights from their data from three months to a day or less. Adopting cloud-based data processing reduced that timeframe still further. However, banks still tend to process data in monthly batches, which means they may not spot a trend for 30 days or more.

Apache Spark is one potential solution to this problem. Like Hadoop, it’s an open source big data analytics engine, but it’s faster, more scalable, and easier to use. It can also be deployed natively in cloud environments to capture and analyze streaming data in real time, for more timely and accurate responses to business questions.
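The difference between monthly batch processing and streaming analysis can be illustrated with a small, framework-free sketch. The code below is plain Python rather than Spark, and the class name, window size, and threshold are all hypothetical. It only shows the idea of updating a rolling aggregate as each event arrives, so that an unusual spike is flagged immediately instead of at the next batch run.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple


class RollingSpikeDetector:
    """Flags an amount that exceeds `factor` times the rolling average."""

    def __init__(self, window: int = 5, factor: float = 3.0) -> None:
        # Only the most recent `window` amounts are kept in memory.
        self.window: Deque[float] = deque(maxlen=window)
        self.factor = factor

    def observe(self, amount: float) -> bool:
        """Process one event; return True if it looks like a spike."""
        is_spike = False
        # Compare only once the window is full, so the average is meaningful.
        if len(self.window) == self.window.maxlen:
            average = sum(self.window) / len(self.window)
            is_spike = amount > self.factor * average
        self.window.append(amount)
        return is_spike


def detect_spikes(
    amounts: Iterable[float], window: int = 5, factor: float = 3.0
) -> List[Tuple[int, float]]:
    """Return (position, amount) for every event flagged as it arrives."""
    detector = RollingSpikeDetector(window, factor)
    return [(i, a) for i, a in enumerate(amounts) if detector.observe(a)]


if __name__ == "__main__":
    # Illustrative, made-up transaction amounts.
    stream = [100, 110, 95, 105, 100, 102, 900, 98]
    print(detect_spikes(stream))
```

In a real deployment, a streaming engine would supply the events and manage state across machines; the per-event update shown here is the part that a monthly batch job cannot provide.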

Big Data Needs to be Managed and Governed

Hadoop and Spark can move large volumes and varieties of data into a data lake so it can then be pushed to an on-premises or cloud data warehouse where business users can access it. However, they can’t guarantee that data is fit for use. Neither Hadoop nor Spark performs data management or data governance natively, so they can’t help business users understand what they have, what it means, or how it’s used. Nor do they provide data lineage so users can see all the transformations their data has undergone on its journey from source systems to analytic tools across the enterprise.
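To illustrate what data lineage means in practice, the sketch below records each transformation applied to a dataset alongside the data itself. It is a hypothetical, minimal illustration in plain Python. The `TrackedDataset` class, the step names, and the source system name are assumptions, not part of Hadoop, Spark, or any governance product.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TrackedDataset:
    rows: List[dict]
    source: str
    lineage: List[str] = field(default_factory=list)

    def apply(
        self, step_name: str, fn: Callable[[List[dict]], List[dict]]
    ) -> "TrackedDataset":
        """Return a new dataset with `fn` applied and the step recorded."""
        return TrackedDataset(
            rows=fn(self.rows),
            source=self.source,
            lineage=self.lineage + [step_name],
        )


def drop_missing_amounts(rows: List[dict]) -> List[dict]:
    """Remove rows whose amount is absent."""
    return [r for r in rows if r.get("amount") is not None]


def to_cents(rows: List[dict]) -> List[dict]:
    """Convert amounts from currency units to integer cents."""
    return [{**r, "amount": round(r["amount"] * 100)} for r in rows]


if __name__ == "__main__":
    raw = TrackedDataset(
        rows=[{"id": 1, "amount": 12.5}, {"id": 2, "amount": None}],
        source="core_banking.transactions",  # hypothetical source system name
    )
    clean = raw.apply("drop_missing_amounts", drop_missing_amounts).apply(
        "to_cents", to_cents
    )
    print(clean.source, "->", " -> ".join(clean.lineage))
    print(clean.rows)
```

Because each step returns a new object with an extended lineage list, a user inspecting the final dataset can see where it came from and every transformation it passed through.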

Technology is Modular and Commoditized

Banks have tackled the costs, skills gaps, and infrastructure management issues of big data analytics by shifting data processing from on-premises hardware to the cloud or hosted colocation facilities. However, when a local credit union and a multinational bank have the same access to AWS, Microsoft Azure, or a managed service provider, the ability to process large volumes of data is no longer a competitive differentiator.

Banks must be able to move faster to transform their data into intelligent insights, and then put those insights into action to improve customer service, connect customers to information and products when and where they’re needed most, and protect sensitive data and customer accounts from threats.

Improve Data-Driven Decision Making in Banking with Intelligent Big Data Management

When everyone has vast amounts of data, what matters most is how you can use it. Informatica ensures that you can capture all kinds of data from all kinds of sources, then facilitates governance and consumption, including identifying and masking sensitive data and distributing it according to your access rules.

Your users can be confident that the data they’re analyzing is clean, correct, compliant, relevant, and secure. And when you implement automated decision-making driven by AI, machine learning, and natural language processing, Informatica’s real-time, scalable processing ensures that all your processes have trusted, governed, secure data, too. So you know that however your bank’s decisions get made, they’re driven by good data.

For more on real-world applications of AI for banking and financial services, check out our webinar.

First Published: Apr 12, 2020