What is search?
Everyone knows what search is, right? On the public Web, it used to be Yahoo; perhaps now it is Google or Bing; and if you have a good memory, you will recall Lycos, Alta Vista and others.
What about inside the enterprise? Enterprise search vendors such as HP Autonomy, Google, Oracle, IBM and Microsoft SharePoint spring to mind. Search is also buried within many services and applications, such as Facebook or Netflix. Inside the enterprise, search is a part of operating systems such as Windows, but it's also within many common applications such as Microsoft Outlook, Word and Excel. Search is pervasive, and it is most often perceived as a capability in which someone tries to find something by entering search terms.
A new paradigm for search is emerging: search-based applications, which allow a new form of information delivery. In these applications, search is invisible to the user and works behind the scenes. Here we explore how search-based applications can be architected with common technologies and why they are important in the information revolution.
What is a search-based application?
Search-based application (SBA) architectures differ markedly from traditional, database-centered application architectures. In database-centric architectures, content is derived from direct database queries. In SBAs, a search engine platform aggregates and delivers information from structured and unstructured content sources through a unified interface.
Content sources range from simple to complex. Simple content is tagged and classified according to a predetermined classification scheme or involves data in a relational form, and hence, structure is inherent. Complex content is highly unstructured, and structure must be determined through semantic analysis. Examples of complex, unstructured content sources include wikis, blogs, websites and video. Less complex but nonetheless unstructured content sources include documents and forms.
No matter how many types of content are used, the search-based application platform creates a single, structured data layer where information is decoupled from its original source but retains its original characteristics. The SBA provides a query model against that single, structured data layer. The net result is that information can be surfaced holistically and related across the breadth of content in ways that are otherwise impossible.
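To make the idea of a single, structured data layer concrete, here is a minimal Python sketch. It is an illustration only, not how any particular search platform works: the `UnifiedIndex` class, its method names, and the sample "crm" and "wiki" sources are all hypothetical, and a plain substring match stands in for a real search engine's indexing and semantic analysis.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedItem:
    """One entry in the unified layer; it remembers where it came from."""
    source: str      # name of the originating system (hypothetical)
    source_id: str   # identifier of the record inside that system
    text: str        # searchable text
    fields: dict = field(default_factory=dict)  # original structured attributes


class UnifiedIndex:
    """Toy stand-in for the single data layer an SBA platform builds."""

    def __init__(self):
        self._items = []

    def add_structured(self, source, rows, id_key, text_keys):
        # Structured rows keep their original columns as fields.
        for row in rows:
            text = " ".join(str(row[k]) for k in text_keys)
            self._items.append(
                IndexedItem(source, str(row[id_key]), text, dict(row))
            )

    def add_unstructured(self, source, documents):
        # Unstructured documents contribute text only; a real platform
        # would derive fields through semantic analysis at this point.
        for doc_id, body in documents.items():
            self._items.append(IndexedItem(source, doc_id, body))

    def query(self, term, **filters):
        # One query model over every source: substring match on text,
        # plus optional exact-match filters on structured fields.
        needle = term.lower()
        return [
            item
            for item in self._items
            if needle in item.text.lower()
            and all(item.fields.get(k) == v for k, v in filters.items())
        ]


# Illustrative data; the sources and records are made up.
index = UnifiedIndex()
index.add_structured(
    "crm",
    [
        {"id": 1, "name": "Acme Corp", "region": "EMEA"},
        {"id": 2, "name": "Globex", "region": "APAC"},
    ],
    id_key="id",
    text_keys=["name"],
)
index.add_unstructured(
    "wiki", {"page-7": "Meeting notes: Acme Corp renewal is due in March."}
)

hits = index.query("acme")
print([(h.source, h.source_id) for h in hits])
```

A single call to `query` returns matches from both the database-style source and the wiki page, and each hit still carries its source name and original fields. That mirrors the point above: the information is decoupled from its source yet keeps its original characteristics.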
During the 1980s, relational database technologies defined information access based on structured data; enterprise content management technologies emerged during the early 1990s to define information access to unstructured content. During the 2000s, search technologies matured, and now SBA services have emerged to provide a hybrid model. New application architectures have been fueled by the exponential growth of content sources, together with the desire for a 360-degree view of information.
Search-based applications in real life
Let's consider two real-life uses of search-based applications and how they combine structured and unstructured data.
A medical research tool allows researchers to view map overlays of patients with particular diseases and correlate them with other information, such as proximity to chemical plants. To achieve this, the search-based architecture would consume structured content from medical databases such as Epic, as well as unstructured content extracted through semantic analysis of physicians' patient notes.
A Web-based apartment finder portal allows people to search for an apartment to rent or buy. To add value, the portal would assemble a full range of real estate options, including the number of bedrooms and bathrooms; distance to public schools and transport hubs, such as an airport; and price ranges. To achieve this, the search-based architecture would consume content on apartments from multiple real estate sources, use semantic analysis to normalize descriptions of apartments and data about the apartments, and consume geographic data sources to calculate distances of apartments to key points of interest.
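The apartment finder scenario can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the field aliases, the `Listing` shape, the two sample sources, and the coordinates are all invented, and a simple key mapping stands in for the semantic analysis a real platform would use to normalize listings. Distances use the standard haversine great-circle formula with a mean Earth radius of 6,371 km.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Hypothetical aliases: different sources name the same attribute differently.
FIELD_ALIASES = {
    "bedrooms": ("bedrooms", "beds", "br"),
    "monthly_rent": ("monthly_rent", "rent", "price"),
}


@dataclass
class Listing:
    source: str
    bedrooms: int
    monthly_rent: float
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def normalize(source, raw):
    """Map a source-specific record onto the shared Listing shape."""
    def pick(name):
        for key in FIELD_ALIASES[name]:
            if key in raw:
                return raw[key]
        raise KeyError(name)

    return Listing(source, int(pick("bedrooms")), float(pick("monthly_rent")),
                   float(raw["lat"]), float(raw["lon"]))


def find_near(listings, poi, max_km, max_rent, min_bedrooms):
    """Return (listing, distance_km) pairs that meet every criterion,
    nearest first."""
    results = []
    for item in listings:
        dist = haversine_km(item.lat, item.lon, poi[0], poi[1])
        if (dist <= max_km and item.monthly_rent <= max_rent
                and item.bedrooms >= min_bedrooms):
            results.append((item, dist))
    return sorted(results, key=lambda pair: pair[1])


# Illustrative records from two made-up sources with different field names.
listings = [
    normalize("agency_a", {"beds": 2, "rent": 1800, "lat": 41.98, "lon": -87.90}),
    normalize("agency_b", {"br": "3", "price": "2400", "lat": 41.8781, "lon": -87.6298}),
]
airport = (41.9742, -87.9073)  # hypothetical point of interest

matches = find_near(listings, airport, max_km=5, max_rent=2500, min_bedrooms=2)
for listing, dist in matches:
    print(listing.source, round(dist, 1), "km")
```

With these sample values only the first listing falls within 5 km of the point of interest; the second is roughly 25 km away and is filtered out, even though it meets the price and bedroom criteria.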
Search-based applications are gaining currency as a way for companies to aggregate different data sources into a much more coherent picture. Companies that exploit these applications combine structured and unstructured data to provide a basis for better-informed decision making.
About the author:
Jonathan Bordoli is a senior manager in Hitachi Consulting's Microsoft Platform Practice. Based in Chicago, he is responsible for architecting and delivering a broad range of solutions based on the Microsoft technology stack. Bordoli has more than 25 years of experience defining, designing and managing technology delivery across the full project lifecycle, with particular emphasis on delivery and adoption of content, collaboration, and process improvement solutions.