The Paleo Diet: Unstructured Data for the Enterprise CEO
Shomit Ghose works with ONSET Ventures.
The original Big Data dates from mankind’s Paleolithic age: speech, pictures and writing. The data gained from sight (text, images) and sound (language, music) remain the essential media of human communication today. Unfortunately, the data conveyed in speech, text and pictures falls into the category of “unstructured” data: it has no defined structure, unlike, for example, numeric data that can be easily mapped into a database and interpreted. For enterprises, the Paleo Diet (that is, accessing and unlocking the insights contained in unstructured data) presents the key strategic opportunity in the field of Big Data.
As we swim in ever greater oceans of Big Data, IDC has found that 90 percent of this digital information is unstructured content. For the enterprise, this unstructured data takes the form of social media posts, call center notes, email, images, video, Web content, sensor and mobile data, warranties, contracts, sounds, shapes, ads, click-streams, Office documents, X-rays, MRIs, doctors’ notes, real estate listings and annual reports. Needless to say, the rows and columns of a traditional structured database are completely unsuited to organizing and making sense of unstructured data.
If properly harnessed and combined with structured data, unstructured data promises to deliver to enterprises deep, 360-degree views of customers. Unstructured data is a powerful resource for applications like audience clustering, predictive marketing and sentiment analysis. Essentially, while the stream of structured (transactional) data readily explains what is happening at the moment, the stream of unstructured data can yield insights into what is going to happen, or why something happened.
To date, structured data has been the basis of enterprise analytics because it is relatively easy to interpret: structured data is primarily numeric, repeatable in type, and predictable in timing and treatment. Unstructured data is far more challenging. Not only is the data volume vastly greater, but unstructured data has (by definition!) no inherent format or repeatability, and brings with it an extremely unfavorable signal-to-noise ratio. Further, securing unstructured data is an added challenge (think regulatory risk) given that its content cannot be known a priori.
In addition, different industries have different levels of reliance on unstructured data, and different departments within the same company may rely on entirely different sources of it: marketing on social media; engineering on design documents; customer support on call notes; finance on emails; sales on contracts; and HR on employee reviews. For an enterprise, making sense of unstructured data can seem a daunting undertaking.
So, how does a C-level executive manage an unstructured data initiative within their company? Despite the apparent complexities of bringing unstructured data into the enterprise, the recipe for embarking on a corporate Paleo Diet is rather straightforward:
We continue to flood the Internet with many trillions of gigabytes of data annually, overwhelmingly through the unstructured data of text, images and video. The opportunity for enterprises to gain insights from this, the largest and oldest class of business data, is immense. With a thoughtful strategy, and thanks to enabling technologies such as Hadoop, enterprises are now able to drive predictive analytics from the patterns and connections hidden within the 90 percent of Big Data that is unstructured. A comprehensive strategy that marries unstructured and structured data promises to fully and finally deliver the benefits of Big Data to the enterprise.
Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena.