

How analytics engines could -- finally -- relieve enterprise search pain

Analytics engines could take what was once the province of e-discovery and make effective enterprise search a reality across all company information.

Over the past few years, the search market has shifted considerably. After emerging a decade ago with promise, the market stalled. Slowly, the focus moved from the productivity side of the work equation to examining information after the fact through e-discovery technologies.

But today, the pendulum is swinging back. E-discovery technologies are now shifting from a reactive to a proactive approach to solving problems, enabling search to evolve with more visible progress than we have seen in the past.

Unfulfilled promises

When the enterprise search market took off, the tools promised to be the cure-all for our information sprawl problems. Google entered the fray with an enterprise search appliance. But even Google couldn't make a dent in a limited market.

Google's failure highlighted the real problem with enterprise search: It attempts to be a comprehensive search tool for the information within an enterprise, but it has never been on par with external search engines like Google. When indexing the Internet, Google tracks links, headings, authors and other elements to judge how much weight a webpage deserves in the rankings. This information is metadata about those webpages that augments their text.

But the backbone of the equation is flawed. In the average enterprise content management (ECM) system, metadata is poor and incomplete. It barely exists on network file shares. People just don't spend time on complex classification schemes. As a result, content classification is inconsistent and unreliable.

These realities affected search technologies. Search engines that ship with ECM systems are typically tuned to understand the metadata models that the system automatically tracks. Centralized enterprise search systems cannot understand the metadata schema intricacies of every system, so they are less accurate than the native search in each system. Federated search tools tried to step into the gap, but they couldn't solve the problem of how to rank results when each system scores relevancy differently. The same piece of information could score as an 80% match in one system and as a 50% match in another system.
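To make the ranking problem concrete, here is a minimal sketch of one common workaround, reciprocal rank fusion, which merges result lists by rank position and ignores each system's raw score. It is an illustration only, not something the article attributes to any federated search product. The file names, scores and the constant `k=60` are assumptions chosen for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked lists using rank positions only, not raw scores."""
    fused = defaultdict(float)
    for results in result_lists:
        # Order each system's hits by that system's own score, best first.
        ranked = sorted(results, key=lambda hit: hit[1], reverse=True)
        for rank, (doc_id, _score) in enumerate(ranked, start=1):
            # A hit contributes more the higher it ranks in its own list.
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused value first; ties broken alphabetically for stability.
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))


# Hypothetical results: the same document scores 0.80 in one system
# and 0.50 in the other, so the raw scores are not comparable.
ecm_hits = [("policy.docx", 0.80), ("memo.docx", 0.75), ("handbook.pdf", 0.60)]
share_hits = [("handbook.pdf", 0.55), ("policy.docx", 0.50), ("notes.txt", 0.20)]

merged = reciprocal_rank_fusion([ecm_hits, share_hits])
print(merged)
```

Rank-based merging sidesteps the mismatch in scoring scales, but it cannot make up for poor metadata in the underlying systems, which is the deeper problem described here.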

So, the search market stagnated, while the e-discovery market began to take off. In an area where the ROI of improving the discovery process was clear, innovation grew. E-discovery vendors focused on finding documents that were similar to one another. While the software needs to be trained, the results are impressive. Accuracy rates in the 90% range easily surpass human classification efforts.

The question became, how can predictive coding technology be applied to the front end of the information governance spectrum to improve enterprise search? Bringing more effective search to information governance could transform content management -- because now people might actually be able to find and access the content they are supposed to save to these systems.

Coming full circle

In 2014, e-discovery vendors began applying text analytics engines to the classification problem. The idea was straightforward. If we have 1,000 documents related to a company's human resources policies, how many of the remaining 1 million have to do with those policies? Now apply this to any concept or classification category and use it throughout the organization.
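The seed-document idea can be sketched in a few lines. The example below is a deliberately simplified stand-in for a commercial text analytics engine: it builds a term-frequency profile from a handful of known examples and keeps the unclassified documents whose cosine similarity to that profile clears a threshold. The sample texts, the file names and the `threshold=0.3` value are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Count lowercase alphabetic tokens (no stop-word removal, for brevity)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_similar(seed_texts, candidates, threshold=0.3):
    """Return names of candidate documents that resemble the seed set."""
    profile = Counter()
    for text in seed_texts:
        profile.update(term_vector(text))  # summed counts act as a centroid
    return [
        name
        for name, text in candidates.items()
        if cosine(profile, term_vector(text)) >= threshold
    ]


# Hypothetical seed set standing in for the 1,000 known HR policy documents.
seeds = [
    "Employees accrue vacation leave each month under the leave policy.",
    "The leave policy covers sick leave and parental leave for employees.",
]
# Hypothetical stand-ins for the remaining, unclassified documents.
unclassified = {
    "hr_faq.txt": "How much vacation leave do employees accrue under the policy?",
    "q3_sales.txt": "Quarterly sales revenue grew in the northern region.",
}

matches = find_similar(seeds, unclassified)
print(matches)
```

A production engine would be trained and tuned on far more examples, as noted earlier, but the shape of the question is the same: given documents known to belong to a category, which of the rest look like them?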

There are two key differences between what e-discovery vendors have traditionally done and this proactive approach. The first is that analytics engines don't sort content into just two buckets, as e-discovery tools do when the goal is to identify documents that are deemed responsive. Instead, analytics engines identify documents that fall into each category and apply the respective metadata tags to the documents.

Second, people don't use these engines to search for content. The engines apply metadata to documents to allow search engines to find the correct information when people search for it. Text analytics provides the correct metadata to finally make search work within the enterprise.
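Both differences can be shown in a small sketch. It assumes hypothetical category profiles expressed as keyword sets, where a real engine would learn them from example documents. Each document may receive several tags rather than landing in one of two buckets, and the tags are stored as metadata that a separate search step can filter on. The category names, keywords, file names and the `min_hits` cutoff are invented for illustration.

```python
import re

# Hypothetical category profiles; a real engine would learn these from examples.
CATEGORY_TERMS = {
    "hr-policy": {"leave", "vacation", "benefits", "employees"},
    "finance": {"invoice", "budget", "revenue", "expenses"},
}


def tokens(text):
    """Set of lowercase alphabetic tokens in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def tag_document(text, min_hits=2):
    """Return every category whose profile shares at least min_hits terms."""
    words = tokens(text)
    return sorted(
        category
        for category, terms in CATEGORY_TERMS.items()
        if len(words & terms) >= min_hits
    )


def build_index(documents):
    """Tagging step: attach category metadata to each document."""
    return {
        name: {"text": text, "tags": tag_document(text)}
        for name, text in documents.items()
    }


def search(index, query, tag=None):
    """Search step: keyword match, optionally narrowed by a metadata tag."""
    wanted = tokens(query)
    return sorted(
        name
        for name, record in index.items()
        if wanted & tokens(record["text"]) and (tag is None or tag in record["tags"])
    )


documents = {
    "benefits_budget.txt": "Budget for employees benefits and vacation leave expenses.",
    "invoice_march.txt": "Invoice for March office expenses.",
    "lunch.txt": "Team lunch is on Friday.",
}

index = build_index(documents)
print(index["benefits_budget.txt"]["tags"])
print(search(index, "expenses"))
print(search(index, "expenses", tag="hr-policy"))
```

The tagging step and the search step are kept separate on purpose: the analytics engine's job ends once the metadata is applied, and the search engine simply benefits from having reliable tags to filter on.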

This approach is worth money. Microsoft saw this promise and signaled its intention to buy Equivio, an e-discovery vendor. In 2014, Equivio launched a proactive classification tool for records managers. Microsoft had already bought the FAST search engine and successfully integrated it into its SharePoint platform, so its offer for Equivio showed how much the differing functions of the two types of engines matter.

More e-discovery vendors will make the same shift in strategy that Equivio has made, both because they see the vast possibilities in the front end of information governance and because the e-discovery market is saturated.

Over the next several months, expect to hear vendors and analysts talking about how analytics engines on the front end can drive success in terms of content accessibility, organization and usability. We are already hearing about text analytics for email. Once it is applied to the vast, unwashed chaos of the average ECM repository, there will be much rejoicing.
