Get 3 steps closer to big data gold

NoSQL will not replace all your tools, but it will be a valuable addition to your toolbox.


“Organizations are unwilling to part with big data that may prove valuable later on. NoSQL lets you economically store all of the data you used to throw away, in case you need it in the future.”

NoSQL has been extolled as the solution to big data storage problems. Relational (SQL) databases struggle with unstructured big data because of storage limitations and rigid schemas. Proponents applaud NoSQL’s ability to store seemingly boundless amounts of data and to handle changing data schemas. But NoSQL is not a replacement for existing analytic technologies; it is an additional tool for asking new questions of new types of data.

Traditionally, you have had to choose which data was important to your organization at a given time and throw away the rest. But organizations are now unwilling to part with big data that may prove valuable later on. NoSQL lets you economically store all of your data in case you need it in the future.

Inflexibility has also been an issue. It is difficult to change schemas with typical database architectures. If something changes unexpectedly or you need to reorder your data patterns, a system can break and the problems can cascade downstream. A NoSQL database lets you ingest data regardless of the schema.
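To make the schema flexibility concrete, here is a minimal sketch in plain Python standing in for any document-oriented NoSQL store. The field names (“sensor_id,” “temp_c,” and so on) are hypothetical; the point is that records with different shapes ingest without a predefined schema, and structure is interpreted at read time.

```python
# Minimal sketch of schemaless ingestion: any dict-shaped record is accepted.
# Field names are hypothetical examples, not from a specific product.
store = []

def ingest(document):
    """Accept any record; no fixed schema is enforced at write time."""
    store.append(document)

# Records from different sources, with different shapes, all ingest cleanly.
ingest({"sensor_id": 1, "temp_c": 21.4})
ingest({"sensor_id": 2, "temp_c": 19.8, "firmware": "2.1"})  # extra field
ingest({"event": "login", "user": "alice"})                  # different shape entirely

# "Schema on read": interpret structure at query time instead of ingest time.
readings = [d["temp_c"] for d in store if "temp_c" in d]
print(readings)  # [21.4, 19.8]
```

A rigid relational schema would have rejected (or forced a migration for) the second and third records; here they simply land in the store and are filtered when a question is asked.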

Follow these three steps to get closer to extracting value from big data:

  1. Understand the data. Typically, you have a huge amount of data that you may or may not care about. When you have data from different sources, you still have to understand how the datasets relate. The “schema on read” movement does not mean that you can ignore how datasets relate to each other. Instead, you must identify and resolve the nesting or join relationships between two or more documents or datasets. At that point, the attributes critical for analysis should be standardized enough to yield valid initial results.
  2. Process the data. Next, analyze your data using a system with the processing power of a NoSQL database or Hadoop. Sift through your integrated data and extract value from it. As quickly as possible after you ingest and relate the relevant data, determine its analytic usefulness. You do not want to waste time structuring, cleaning, and preparing data if data scientists do not find the data to be useful. If the data does not answer your questions, skip step 3 and return to step 1 with new datasets. Keep the existing data, however, because it may be useful later.
  3. Transform the data. If you find the data useful, prepare it for additional processing and usability by a wider analytic audience than just data scientists. While you can hand code the necessary transformation, standardization, and cleansing required, this approach is typically slower and not sustainable. Instead use a tool that can handle different data sources, including complex data and data from NoSQL databases. Then transform it into useful information that humans can read.
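The three steps can be sketched end to end. This is a minimal illustration in plain Python, with hypothetical dataset and field names: relate two datasets through a join key (step 1), process the joined data and check its analytic usefulness (step 2), and transform the result into a readable form for a wider audience (step 3).

```python
# Step 1 - understand: identify the key relating two hypothetical datasets.
customers = [{"cust_id": 1, "name": "Acme"}, {"cust_id": 2, "name": "Globex"}]
orders = [
    {"cust_id": 1, "total": 250.0},
    {"cust_id": 1, "total": 120.0},
    {"cust_id": 3, "total": 75.0},  # no matching customer; dropped by the join
]

# Step 2 - process: join on the shared key, then assess analytic usefulness.
by_id = {c["cust_id"]: c for c in customers}
joined = [{**by_id[o["cust_id"]], **o} for o in orders if o["cust_id"] in by_id]
if not joined:
    # The data answers no questions: keep it stored and return to step 1.
    raise SystemExit("no usable records; try new datasets")

# Step 3 - transform: flatten into a human-readable report for non-specialists.
report = [f'{r["name"]}: ${r["total"]:.2f}' for r in joined]
print(report)  # ['Acme: $250.00', 'Acme: $120.00']
```

In practice a transformation tool would replace the hand-coded join and formatting, as the article notes, but the flow is the same: relate, assess, and only then invest in full transformation.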

It is not unlike mining for gold. You have to process tons of earth to find even an ounce of gold. As long as you have a powerful engine to process the dirt and can continue to creatively come up with relevant business questions, you will keep finding gold.

For more information on the steps required to extract value from big data, see “Preparing for the Big Data Journey: A Strategic Roadmap to Maximizing Your Return from Big Data.”
