Effective enterprise search tools are a vital component of any information governance strategy, but users are increasingly dissatisfied with technology that's cumbersome, work-intensive and slow.
In part one of this series, Jonathan Bordoli explains how Web search engines, like Google, have established high expectations for enterprise search performance. His clients want a "Google-like search experience," but there are major differences between the technologies, and those differences make it challenging to bring enterprise search features on par with top Web search engines.
For example, Google uses standardized search "facets" that classify and help provide context for content across the Web, but enterprise search facets should be much more specific, and pegged to a specifically calibrated taxonomy that's based on the vocabulary for a given line of business.
Additionally, Web crawlers have the easier task of sorting structured websites, which are often classified with help from search engine optimization (SEO) specialists, whose job is to ensure that search engines can find a given piece of content. Conversely, enterprise content is often unstructured and stored by employees who may not grasp the business value of making content easy to find and access.
In the second and final part of this series, Bordoli outlines the unique needs for enterprise search features, and discusses some of the emerging technologies that could affect efforts to bring enterprise search on par with top Web search engines.
A better alternative?
My clients often ask for a Google-like search experience. But perhaps they really should be asking for something better -- something that will deliver improved performance, while also fitting the unique needs of the enterprise:
- Findability of content against a foundation of a well-defined strategic taxonomy scheme that drives faceted search.
- Findability of content that is locked up in documents (not Web pages).
- Findability of content that transcends documents and is stored within line-of-business applications.
- Findability of content holistically, whether it resides in enterprise content or third-party content sources, through effective federation.
Clearly, improving findability is key. The good news is that enterprise search is arguably better now than ever before. But the playing field is also changing, as search vendors strive to keep up with these emerging trends:
Cloud, and specifically PaaS (platform as a service). Increasingly, organizations won’t install software on-premises or even within cloud IaaS (infrastructure as a service). Rather, they will want to switch on a service for enterprise search, just as they can switch on SQL or Hadoop storage as a service. The challenge here is that few organizations will be totally in the cloud, with hybrid being the likely architecture of choice.
Big data and IoT (Internet of Things). Increasingly, organizations will want to see content (unstructured documents) and data (structured, whether in relational or Hadoop-style stores) treated holistically, with content findable across the spectrum, irrespective of where it is stored or what type it is. IoT projects are driving high volumes of telemetry data from machines (trains, planes, automobiles) and from other connected devices that are now becoming pervasive, such as fitness trackers.
Rich media. The growing abundance of rich media, such as video and audio, will need to be made findable through the search function.
Knowledge graphs. Popularized by Google, but also an aspect of graph databases, a knowledge graph models relationships between entities (nodes) such as people, accounts and businesses. Nodes are linked (related) to other nodes by edges, both incoming and outgoing, and nodes also have properties (attributes): a person might have the properties name, age and eye color, while an account might have the properties account number and account sector. The point of the knowledge graph is to add meaning to content.
In the Google context, a search for Leonardo da Vinci [where a node=person] can show he was a Renaissance painter [attribute] who produced paintings [node=painting]. Of course, a search on any of the paintings he produced, e.g., the Mona Lisa [node=painting], will in turn relate back to him as the artist. The knowledge graph starts to add meaning to the underlying content -- an implemented vision of the concept of the semantic Web as first described by Tim Berners-Lee in 2001.
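The node, edge and property structure described above can be sketched in a few lines of Python. This is only an illustrative in-memory model, not how Google or any graph database implements it; the class name `KnowledgeGraph`, the edge label `painted` and the node identifiers are assumptions made up for the example.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Tiny in-memory graph: nodes carry a type and properties, edges carry a label."""

    def __init__(self):
        self.nodes = {}                     # node id -> {"type": ..., "props": {...}}
        self.out_edges = defaultdict(list)  # node id -> [(label, target id)]
        self.in_edges = defaultdict(list)   # node id -> [(label, source id)]

    def add_node(self, node_id, node_type, **props):
        self.nodes[node_id] = {"type": node_type, "props": props}

    def add_edge(self, source, label, target):
        # Record the edge in both directions so it can be followed from either node.
        self.out_edges[source].append((label, target))
        self.in_edges[target].append((label, source))

    def related(self, node_id):
        """Return (direction, label, other node id) for every edge touching node_id."""
        outgoing = [("out", label, t) for label, t in self.out_edges.get(node_id, [])]
        incoming = [("in", label, s) for label, s in self.in_edges.get(node_id, [])]
        return outgoing + incoming


# Hypothetical data mirroring the Leonardo da Vinci example in the text.
graph = KnowledgeGraph()
graph.add_node("leonardo", "person",
               name="Leonardo da Vinci", occupation="Renaissance painter")
graph.add_node("mona_lisa", "painting", title="Mona Lisa")
graph.add_edge("leonardo", "painted", "mona_lisa")

print(graph.related("leonardo"))   # [('out', 'painted', 'mona_lisa')]
print(graph.related("mona_lisa"))  # [('in', 'painted', 'leonardo')]
```

Because each edge is stored in both directions, a search that lands on the painting can surface the painter, and vice versa, which is the "added meaning" the knowledge graph contributes.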
Machine learning. Machine learning is becoming commoditized and core to making things smarter. In our case, smarter search could mean enterprise search tools that predict likely searches based on past activity, using machine learning and predictive analysis techniques to do so.
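As a rough illustration of predicting likely searches from past activity, the sketch below ranks previously seen queries by how often they were issued. It is a simple frequency baseline, not a trained machine learning model; a real system would presumably learn from much richer signals. The function name `suggest` and the sample query history are hypothetical.

```python
from collections import Counter


def suggest(past_queries, prefix, limit=3):
    """Suggest past queries starting with prefix, most frequent first."""
    # Normalize case and whitespace so "Expense policy" and "expense policy" count together.
    counts = Counter(q.strip().lower() for q in past_queries)
    prefix = prefix.strip().lower()
    matches = [(q, n) for q, n in counts.items() if q.startswith(prefix)]
    # Sort by descending frequency, then alphabetically to break ties.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in matches[:limit]]


# Hypothetical search history for one user or team.
history = [
    "expense policy",
    "expense report template",
    "Expense policy",
    "travel policy",
    "expense policy",
]

print(suggest(history, "exp"))  # ['expense policy', 'expense report template']
```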
The goal of delivering a smart and centralized enterprise search experience is also complicated by the need to integrate multiple content repositories into query results.
The enterprise search engine historically sits outside the content and content containers, as an installable piece of technology that crawls and analyzes content from discrete repositories. Federation technology offers the veneer of a single search interface, but there is potential for significant hidden pitfalls. The first is security context: without a uniform user identity, each repository needs to work out for itself what content the viewer is allowed to see when searching across applications, or the results could be incomplete. Additionally, key context will not be available unless all content sources share the same taxonomy. When those issues are in play, federated search is more like looking at a page where several separate searches were run against different sources, with the results shown in different areas of the screen. The outcome is search results that are not easily interwoven.
Perhaps the core components for the next generation of enterprise search engines need to be baked into the underlying technology of existing and emerging technology stacks, such as the relational database layer, the media services layer, the Hadoop storage layer and so on -- providing a cooperating search federation across the places where content actually resides. If all content sources were managed by the same security model and all shared the same taxonomy, then federation should allow interleaving of results and correct security trimming. But the difficult part would be creating the holistic single search experience across these federated services layers.
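To make interleaving and security trimming concrete, here is a minimal sketch under two strong assumptions, both of which are exactly what the paragraph above says is hard to achieve in practice: every source uses the same group-based security model, and relevance scores are comparable across sources. The `Hit` type, the source names and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    source: str                # repository the result came from
    title: str
    score: float               # ASSUMPTION: scores are comparable across sources
    allowed_groups: frozenset  # ASSUMPTION: one shared group-based security model


def security_trim(hits, user_groups):
    """Keep only the hits that at least one of the user's groups may see."""
    return [h for h in hits if h.allowed_groups & user_groups]


def federated_search(results_by_source, user_groups):
    """Trim each source's results, then interleave them into one ranked list."""
    merged = []
    for hits in results_by_source.values():
        merged.extend(security_trim(hits, user_groups))
    return sorted(merged, key=lambda h: h.score, reverse=True)


# Hypothetical per-source results for a single query.
results_by_source = {
    "file_share": [
        Hit("file_share", "Q3 sales forecast", 0.92, frozenset({"sales"})),
        Hit("file_share", "HR salary bands", 0.88, frozenset({"hr"})),
    ],
    "crm": [
        Hit("crm", "Account plan: Acme", 0.90, frozenset({"sales", "support"})),
    ],
}

ranked = federated_search(results_by_source, frozenset({"sales"}))
print([(h.source, h.title) for h in ranked])
# [('file_share', 'Q3 sales forecast'), ('crm', 'Account plan: Acme')]
```

The CRM result lands between the two file-share results by score, and the HR document is dropped for a user who belongs only to the sales group. Without shared scoring and a shared security model, neither step can be done this simply, which is why federated results so often end up in separate panels.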
In the meantime, users should focus on developing and refining a taxonomy -- this is the content classification scheme that will make existing enterprise search technologies more productive and deliver greater ROI by producing more relevant search results.
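As a small illustration of how a taxonomy can drive faceted search, the sketch below checks document tags against a controlled vocabulary, counts documents per facet value and filters by a selected facet. The facet names, terms and documents are invented for the example; a real taxonomy would be based on the vocabulary of the line of business in question.

```python
from collections import Counter

# Hypothetical controlled vocabulary: facet name -> allowed terms.
TAXONOMY = {
    "document_type": {"contract", "invoice", "policy"},
    "region": {"emea", "apac", "americas"},
}

# Hypothetical documents, each tagged with taxonomy terms.
DOCUMENTS = [
    {"title": "Supplier agreement", "tags": {"document_type": "contract", "region": "emea"}},
    {"title": "Travel policy", "tags": {"document_type": "policy", "region": "emea"}},
    {"title": "March invoice", "tags": {"document_type": "invoice", "region": "apac"}},
    {"title": "Reseller agreement", "tags": {"document_type": "contract", "region": "apac"}},
]


def is_valid(doc):
    """True if every tag on the document uses a facet and term from the taxonomy."""
    return all(
        facet in TAXONOMY and term in TAXONOMY[facet]
        for facet, term in doc["tags"].items()
    )


def facet_counts(docs, facet):
    """Count documents per term of one facet, as shown beside search results."""
    return Counter(d["tags"][facet] for d in docs if facet in d["tags"])


def filter_by(docs, **selected):
    """Keep documents whose tags match every selected facet value."""
    return [d for d in docs if all(d["tags"].get(f) == v for f, v in selected.items())]


print(all(is_valid(d) for d in DOCUMENTS))  # True
print(facet_counts(DOCUMENTS, "document_type")["contract"])  # 2
print([d["title"] for d in filter_by(DOCUMENTS, document_type="contract", region="emea")])
# ['Supplier agreement']
```

Validation is the step that keeps facets useful: a document tagged with a term outside the vocabulary would never appear under any facet value a user could select.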