Say "Hello World" to Hadoop

Understand the inherent risks and complexities that an innovative solution can bring to already complex big data efforts.

The "arrival" of big data has been heralded with both excitement and trepidation. Organizations that successfully tackle the volume, velocity, and variety of big data with energy and creativity will have an undeniable competitive advantage. Companies hampered by problems with data integration and analysis will be besieged by cost, complexity, and risk issues.

You can take a leadership role by helping guide the business through the complexities of big data projects. Avoid potential pitfalls by using caution when introducing unproven, albeit innovative, technologies such as Hadoop into the mix.

Big data hype

You no doubt face daily challenges integrating unstructured data from disparate sources, such as social media and mobile devices, and would welcome a solution. Hadoop is a significant innovation: its support for all data types, its open-source roots, and the media buzz surrounding it are alluring.

“Most organizations are still in the early stages, and few have thought through an enterprise approach or realized the profound impact that big data will have on their infrastructure, organizations and industries,”[1] says Doug Laney, research vice president at Gartner.

Start small

Hadoop will challenge even the most innovative developers. Training and certification programs are scarce, and best practices have yet to be established. Adding Hadoop to both your skill set and your existing environment will require education and planning.

But fear not: you are not alone. Many developers are looking for a “Hello World” introduction to Hadoop and big data. For the same reasons you profile and cleanse data before integrating it, you need to understand and evaluate Hadoop before introducing it to your extended environment. Otherwise, you may be introducing unnecessary risks and complexity.

Hadoop can certainly be used to analyze huge volumes of data. It does not manage metadata, however, so it will not enforce consistency for you; your data must already be clean, consistent, and reliable when it arrives. Hadoop also lacks built-in logging and audit trails, administration and security features, and operational visibility and tuning assistance. That’s where your skills with time-tested models such as ETL (extract, transform, load) will be valuable.

Your experience puts you in the perfect position to take advantage of the best Hadoop has to offer now and to grow your skills as the technology matures. For reassurance and specifics, see "7 reasons a seasoned developer is an asset to any big data project."

Find out more about how to brace yourself for big data by reading the white paper, "Preparing for the Big Data Journey."