Unstructured content – mostly text, but also images and audio/video content – is everywhere. It's in emails, legal documents and corporate records. It's on Web pages and in PDFs and RSS feeds.
And while the focus of most data analytics initiatives to date has been on unlocking the value of structured data, many organizations are now turning their attention to content analytics – and for good reason.
By many accounts, unstructured data comprises 80% of all enterprise information. Much of that content includes customer feedback in the form of emails, consumer reviews, survey results and more. That's an awful lot of valuable information to just leave lying around.
Like the field itself, the term unstructured content analytics is still evolving. But in general, it refers to the process of extracting information from unstructured data heretofore locked away in emails, documents and other files; applying at least a minimal layer of structure to it in the form of metadata and data definitions; and then analyzing and displaying the data to help business users make better decisions.
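The three steps described above (extract, add structure, analyze) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample emails, the `to_record` helper and the choice of metadata fields are hypothetical and not taken from any particular content analytics product.

```python
from collections import Counter
from email import message_from_string

# Hypothetical raw emails standing in for unstructured content.
RAW_EMAILS = [
    "From: ann@example.com\nSubject: Late delivery\n\nMy order arrived a week late.",
    "From: bob@example.com\nSubject: Great support\n\nThe support team fixed my issue quickly.",
    "From: ann@example.com\nSubject: Refund request\n\nPlease refund my late order.",
]


def to_record(raw: str) -> dict:
    """Extract the text and attach a minimal layer of metadata to it."""
    msg = message_from_string(raw)   # step 1: extract from the raw file
    body = msg.get_payload().strip()
    return {                         # step 2: apply structure as metadata
        "sender": msg["From"],
        "subject": msg["Subject"],
        "body": body,
        "word_count": len(body.split()),
    }


records = [to_record(raw) for raw in RAW_EMAILS]

# Step 3: analyze the now-structured records, here by counting messages per sender.
messages_per_sender = Counter(record["sender"] for record in records)

for sender, count in messages_per_sender.most_common():
    print(f"{sender}: {count} message(s)")
```

Once each item carries fields such as sender and word count, it can be filtered, grouped and charted like any other structured data.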
Since text is the most abundant form of unstructured data – and the easiest to collect and process – it's not surprising that text analytics is the best-known type of content analytics technology. Text analytics tools are adept at detecting and understanding patterns in written text, much the way a human would, according to Sue Feldman, an analyst at Framingham, Mass.-based IDC.
Looking for 'building blocks of meaning' via content analytics
Text analytics software identifies and extracts the "building blocks of meaning – people, places and things and their relationships to each other," Feldman wrote in a recent report. "Traditional text analytics stores these elements in a specialized structured text index or repository. From there, the contents of a large collection of text can be mined for patterns, trends and relationships."
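A rough sketch of the idea Feldman describes, assuming a simple dictionary lookup in place of real entity recognition: the `GAZETTEER`, the sample documents and the `build_entity_index` helper are all hypothetical, and commercial tools use far more sophisticated linguistic techniques.

```python
from collections import defaultdict

# Hypothetical lookup table of known entities and their types.
GAZETTEER = {
    "Jane Smith": "person",
    "Boston": "place",
    "Acme": "organization",
}

# Hypothetical document collection keyed by document ID.
DOCUMENTS = {
    1: "Jane Smith visited the Acme store in Boston.",
    2: "Acme opened a second store last week.",
}


def build_entity_index(documents: dict, gazetteer: dict) -> dict:
    """Map each (entity type, entity name) pair to the documents mentioning it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for name, kind in gazetteer.items():
            if name in text:
                index[(kind, name)].add(doc_id)
    return dict(index)


index = build_entity_index(DOCUMENTS, GAZETTEER)

# Entities that share a document are treated as related to each other.
for (kind, name), doc_ids in sorted(index.items()):
    print(f"{kind:12} {name:12} appears in documents {sorted(doc_ids)}")
```

With the entities stored in a structured index like this, the collection can be queried for patterns, such as which people and organizations appear together most often.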
The technology is often used by marketers to help ascertain the "voice of the customer," or VoC, said Gareth Herschel, an analyst at Gartner Inc. in Stamford, Conn. VoC initiatives aim to discern customers' wants and needs as well as their feelings about a company, brand or product. With VoC information in hand, companies can better tailor their marketing campaigns in an effort to achieve better sales results.
Here's how it works: Customers leave feedback on products and services in any number of places. They might tweet about a new fabric softener they just tried, or write reviews about the performance of their car on a consumer review site, or leave comments about a new laptop in a designated area on the vendor's website.
What those actions all have in common is that they create content items made up of free-flowing text. It would be impossible for humans to trawl through each item, condense the findings and display them in compelling visuals – some companies have hundreds of thousands, even millions, of customers. That's where text mining and analytics tools go to work.
Content analytics takes the measure of customer sentiment
There are many flavors of text analytics as applied to VoC projects, but a common type is called sentiment analytics. The goal here is to determine the sentiment – positive, negative or neutral – of individual pieces of content or to find customer sentiment trends in larger numbers of items.
For example, text analytics tools are used to determine the sentiment of tweets by examining and understanding keywords. When comments are identified as negative – perhaps people tweeted about their disappointment with a new coffee maker, using terms like "unacceptable" or "defective" – marketing workers can reach out to customers and try to resolve the problems.
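The keyword approach can be illustrated with a minimal Python sketch. The term lists, the sample tweets and the `classify_sentiment` function are assumptions made for illustration; production tools rely on much larger lexicons and on language understanding that goes well beyond counting words.

```python
import re
from collections import Counter

# Hypothetical keyword lists; a real sentiment lexicon would be far larger.
NEGATIVE_TERMS = {"unacceptable", "defective", "disappointed", "broken"}
POSITIVE_TERMS = {"love", "great", "excellent", "perfect"}


def classify_sentiment(text: str) -> str:
    """Label a piece of text positive, negative or neutral by keyword counts."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE_TERMS for word in words)
    negatives = sum(word in NEGATIVE_TERMS for word in words)
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "neutral"


# Hypothetical tweets about a new coffee maker.
tweets = [
    "This coffee maker is defective and totally unacceptable.",
    "I love my new coffee maker, it works great!",
    "The coffee maker arrived on Tuesday.",
]

labels = [classify_sentiment(tweet) for tweet in tweets]

# Aggregate the individual labels to see the overall sentiment trend.
trend = Counter(labels)
print(trend)
```

Items labeled negative could then be routed to marketing or support staff for follow-up, while the aggregate counts show how sentiment is trending across many items.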
Increasingly, customers are turning to social media outlets like Twitter to vent their frustrations with or sing the praises of new products and services, making them a breeding ground of unstructured content ripe for analysis, according to James Kobielus, an analyst at Cambridge, Mass.-based Forrester Research Inc.
"Social media is a huge source of unstructured content and semi-structured content," Kobielus said. He added that the abundance of customer feedback via social media and the increasing capabilities of unstructured data analytics software are even making more formal efforts to garner customer feedback – surveys, for example – moot. "All of that information is already there for the taking" on the Web, Kobielus said.
ABOUT THE AUTHOR
Jeff Kelly is a freelance writer.