Amr Awadallah:
Win big with big data

Use the vision of a big data pioneer to understand why you need to “walk before you run” to build the foundation of an enterprise data hub.  

“Businesses will succeed if they adopt a data-driven future and truly make data part of their infrastructure.”

—Amr Awadallah, co-founder and CTO of Cloudera

Big data tools and methodologies can be a powerful way to drive business value, says Amr Awadallah, Ph.D., CTO and co-founder of Cloudera. Cloudera provides an Apache Hadoop-based platform that lets an organization work with both structured and unstructured data. Using this platform, organizations can build an enterprise data hub: a central place to mix and match structured and unstructured data for various workloads. Making this a reality requires starting with a defined strategy.

Awadallah spoke to Potential at Work about how to pilot big data projects and evangelize the benefits those first steps deliver. Doing so helps build the business case for using the full power of an enterprise data hub.

What makes a good pilot for big data that could be turned into a project for the enterprise?
Awadallah: The key is to start with a clearly defined business process and apply the new technology there. Different businesses will have different process-improvement opportunities that will make a pilot succeed for them.

One of the best starting points for adopting big data technology is improving operational efficiency around the data pipeline. Operational efficiency is not necessarily about doing anything new. It's about doing what you're doing today, but doing it faster, cheaper, and better.

As data volumes and types increase, extracting, transforming, and loading (ETL) that data into formats the business can use simply takes too long. This often delays the business decisions that depend on that data being available quickly. When organizations implement an enterprise data hub, one of the immediate benefits is the ability to accelerate ETL processes, which can improve ETL service level agreements from hours to minutes. The best part is that it will cost a fraction of what people are paying right now to scale the transformations inside a data warehouse. Once the business sponsors see these operational efficiencies become real, support for the enterprise data hub will grow.

Let’s look at it from the opposite perspective. What would make a pilot unsuccessful?
Awadallah: Simply put: if you try to boil the ocean, if you try to run before you walk. An enterprise data hub has all the promise and all the potential to transform the business. But it is not “pixie dust” that you can throw at a problem and have money come out at the other end. It never works like that. Hadoop is a very powerful technology, but it is also new. You need to get comfortable with it. If you try to do everything at the same time and don’t have a clearly defined business use case, you will fail.

How do you convince people of the need to transform their data warehouse infrastructure to now include Hadoop?
Awadallah: There are two angles to consider. One is making people comfortable with the technology. That means explaining how the enterprise data hub complements an existing data warehouse by providing one secure, low-cost location for managing both structured and unstructured data as an active archive. The second is making people aware that this really is a major shift. If you miss this shift, your business is at risk of not succeeding, period. Businesses will succeed if they adopt a data-driven future and truly make data part of their infrastructure. If you don't, you will be left behind by businesses that do make the shift to embrace all their data.

Hear more from Awadallah and Informatica Chief Architect Sanjay Krishnamurthy on these topics in Cloudera and Informatica’s Big Data + Hadoop Leadership Video Series.
