Powering a more efficient enterprise search engine
Taking voluminous amounts of data and turning it into actionable information has become the holy grail for many companies today. And that makes people like Sid Probstein, chief technology officer at enterprise search company Attivio Inc., a modern-day hero of sorts -- or at least a change agent.
Companies like Attivio, based in Newton, Mass., have elevated enterprise search beyond "flat" document or term search to enable companies to make data-driven business decisions. Attivio's Active Intelligence Engine searches structured and unstructured data -- that means databases, spreadsheets, documents, PDFs, information in SharePoint, emails, blogs and social media, as well as other sources. The technology pairs the simplicity of the search engine user interface with the relationships and complexity captured in databases. With the two combined, Attivio's search engine can crunch data quickly and address complex questions, like understanding what a certain group of customers who are active on Twitter have said about a company over the past year and which strategies to use to sell them a new product in the future.
Applications can be built on top of the engine to enable complex querying and data analysis. For today's data-driven enterprises, it's critical to bring easy-to-digest data to the executive suite. But it's no small feat. Probstein brings to bear two decades of experience contending with the successes – and the flaws – of enterprise search technology.
Probstein is a founding member of Attivio, which launched in 2007. Previously, he worked at other leading search technology companies. He was vice president of technology at Fast Search & Transfer, an enterprise search company that is now part of Microsoft. Before joining Fast Search & Transfer in 2002, he was a vice president of engineering at Northern Light Technology, an early-era search engine that was noted for delivering highly relevant results in the days before Google's ubiquity.
SearchContentManagement sat down with Probstein to discuss traditional enterprise search and how Attivio's technology differs.
You have quite a history in enterprise search. Tell me about your time at Fast Search & Transfer and Northern Light Technology.
Sid Probstein: Northern Light was a relatively well-regarded, unique search engine that was large scale, but it had business-model challenges in trying to sell content. It was a great opportunity to learn about search because there were so few search engines out there. I was able to bring that experience to Fast.
Fast was initially a hardware solution, and its core technology was a good, highly scalable general search engine. But then, one of the big decisions that was made was to focus on the enterprise, not the consumer, because there was a real opportunity to disrupt [that market]. We took that product from nothing to $150 million in revenue in five years. It had a blue-chip set of customers. Everyone was using that technology.
What is the problem enterprise search tries to solve?
Probstein: Everyone wants a search interface; we hate SQL [Structured Query Language] interfaces, we don't love BI [business intelligence]; it's a necessary evil we have put up with. We want the search model to work, but it has to work on all of our data. And what Google does on the Web doesn't work behind the firewall. [So, neither search engines nor databases on their own solve today's business analytics dilemmas.]
All interesting business questions start with structured data, not unstructured data. In this era of being data-driven, the questions are driven from structured data. We observe an effect: A transaction occurs, for example. The problem is that we split the what from the why. The why part is in unstructured data. A contract doesn't come through, a payment doesn't get made, and you need to know why. The companies that are really good at [analytics] are bringing in the why. Deep in that is the mystery of intent.
How does Attivio differ from traditional enterprise search engines?
Probstein: You need structured data and unstructured data together. You can't really build a great search app without mastering structured data.
You get only so far with search, but as soon as you have to bring in the business logic, the business details, the numbers, the transactions, old search engines don't do it. It's all about unifying information so you can bring the analytics to it and build apps on top of it. That is the search part.
So, Attivio is focused on unifying information. The end goal of big data is to make some insight that is actionable. But how do you do analytics of any kind if you can't get the data to it?
Today, in the technology world, there is a real split between these two. If you're going to do structured data, it's the old database stack; if you're going to do unstructured data, it's search, or maybe NoSQL [not only SQL] for semistructured, middle data. But no one is making it easy to put information together without modeling -- which is the downside of a database -- and without rationalizing it into some weird format, which might be something like an XML database. It can't be something that doesn't deal with structured data.
How does Attivio's architecture work?
Probstein: We can pull data a lot of different ways. We can pull your data out of Office 365 in the cloud, or pull the data out of your locally installed versions of SharePoint. We have no ETLs [extract, transform and load], so it's fast: The data begins to flow into our engine right away.
After we bring in that data from sources like Exchange, SharePoint, Documentum, custom databases -- you name it -- we bring the data together. Structured data is kept in structure -- if there is a relational database model, we keep it. There is no flattening that data into customer records, which is what you would do on the search side. We recreate tables just like a database does. We can do joins between them, just as a database does, but we're also a search engine. You can do full-text search on any table or all tables, and when you're doing that full-text search, you can use the relationships between tables just as you would in a relational database. You can't do that with a search engine, and you can't do that with a database.
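To make the idea of full-text search that also follows table relationships concrete, here is a minimal sketch in Python. It is not Attivio's implementation: the `customers` and `tickets` tables, their fields, and the `build_index` and `search_with_join` helpers are all hypothetical, and a real engine would index far more than whitespace-separated words.

```python
from collections import defaultdict

# Hypothetical sample data: two related tables kept in their original structure.
customers = [
    {"id": 1, "name": "Acme Corp", "region": "East"},
    {"id": 2, "name": "Globex", "region": "West"},
]
tickets = [
    {"id": 10, "customer_id": 1, "note": "Payment failed after contract renewal"},
    {"id": 11, "customer_id": 2, "note": "Login page slow"},
    {"id": 12, "customer_id": 1, "note": "Contract question about pricing"},
]


def build_index(rows, text_field):
    """Map each lowercase token to the ids of the rows whose text contains it."""
    index = defaultdict(set)
    for row in rows:
        for token in row[text_field].lower().split():
            index[token].add(row["id"])
    return index


def search_with_join(term, region):
    """Full-text match on tickets, then join to customers and filter by region."""
    ticket_index = build_index(tickets, "note")
    matching_ids = ticket_index.get(term.lower(), set())
    customers_by_id = {c["id"]: c for c in customers}
    results = []
    for ticket in tickets:
        if ticket["id"] in matching_ids:
            # The join: follow the foreign key from the ticket to its customer.
            customer = customers_by_id[ticket["customer_id"]]
            if customer["region"] == region:
                results.append((customer["name"], ticket["note"]))
    return results


print(search_with_join("contract", "East"))
```

The point of the sketch is that the text match and the relational filter happen in one query: the search term narrows the tickets, and the foreign key carries the result over to a structured attribute of the customer.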
With search engines, there is no concept that one page is linked to another. So, we added a graph engine, a mathematical graph, where there are nodes and links between them. We use the graph to link the results in query A to all possible results in query B. So, it is incredibly fast.
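The graph idea can be sketched the same way. The Python below is an illustration under stated assumptions, not Attivio's graph engine: the node names, the `edges` list, and the `build_graph` and `link_results` functions are hypothetical. It shows why precomputed links make it cheap to connect the results of one query to the results of another: each lookup is a set intersection rather than a comparison of every pair.

```python
from collections import defaultdict

# Hypothetical links between records, e.g. a customer record linked to documents.
edges = [
    ("customer:1", "doc:a"),
    ("customer:1", "doc:b"),
    ("customer:2", "doc:c"),
]


def build_graph(edge_list):
    """Build an adjacency map from each source node to its linked target nodes."""
    graph = defaultdict(set)
    for source, target in edge_list:
        graph[source].add(target)
    return graph


def link_results(graph, results_a, results_b):
    """Return (a, b) pairs where a result of query A links to a result of query B."""
    allowed = set(results_b)
    pairs = []
    for a in results_a:
        # Intersect this node's neighbors with query B's results.
        for b in sorted(graph.get(a, set()) & allowed):
            pairs.append((a, b))
    return pairs


graph = build_graph(edges)
print(link_results(graph, ["customer:1", "customer:2"], ["doc:b", "doc:c", "doc:z"]))
```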
Do you need analyst expertise or is the technology user-friendly?
Probstein: It's very user-friendly. It's a portal; multiple users log in. Security is pervasive in everything we do, so each user may have a different view of what they see, especially for internal data.
One of the great things about search is that you don't have those ugly pick lists, typical in the reporting world, where you check a few boxes and, if things are opposed to one another, you get no results. Our customers like search interfaces. Search is more sophisticated; it has things like type-ahead, auto-complete and instant search. If you develop something too complex, customers can't get the value.
Let me give an example. One of our customers is a bank. One group manages cash payments -- very important. They have a tight SLA [service-level agreement] on them, about 15 minutes for any outages. When we first met them, they were 50% over their SLA; so, when they would have an error, it would take 27 minutes to fix it -- that's much too long.
They learned that the average sysadmin who would solve this problem has to go to up to nine silos of data to figure out what the problem is. There are release notes for the application, ticket systems, wikis, a SharePoint site where sysadmins have left their notes and thoughts, knowledge bases provided by the vendors -- there's a lot of different stuff. These sysadmins are going to each one, doing searches. Two to three minutes per silo, nine silos, up to 27 minutes. They looked at Splunk -- but that solves only one silo. Splunk is great technology, but it solves one silo.
When we integrated the nine silos of data -- it took us two weeks, and we built a search application on top -- we cut the time to go to those sources; it now takes three minutes to search those nine sources. Now, on average, 80% of the problems are solved with only one search. It's not a simple search, but the sophistication is behind the check box. The user doesn't have to deal with it. This is the impact I want to emphasize.
They are able to change the way they hire. For the past two years, they haven't hired general sysadmins, who are expensive and require a lot of care and feeding. Now they hire kids out of college. They put them in front of this interface. When a ticket comes in, the interface automatically identifies the related content across all sources. Now, the sysadmins are happier, the company is happier, and these folks are learning all about an environment that they don't really need to be trained on.