EMC introduces data classification for files

EMC has pulled together components of Legato, Documentum and Smarts to create a data classification product for files.

EMC Corp. will introduce a new product this week for data classification of files that stitches together components from its acquisitions of Legato, Documentum and Smarts and aims to help users get a better handle on managing unstructured data.

Two years in the making, Infoscape combines three acquired technologies: the discovery engine from Smarts, which finds all file servers and EMC storage on a network without the use of agents; content and metadata analysis from Documentum; and data movement functionality from Legato. It also uses indexing and search software from Fast Search & Transfer Inc., a company EMC hasn't acquired yet. EMC claims that, combined, these features will allow users to classify information based on how important it is, move it to the appropriate tier of storage according to predetermined policies and manage its retention for compliance purposes.

"They've added the brains to ILM [information lifecycle management]," said Arun Taneja, founder and consulting analyst of the Taneja Group. "Users don't know what is where today or where the sensitivities lie, they are at the mercy of the gods … [Infoscape] is a step in the right direction, out of this chaos."

However, he cautions that data classification is a "nontrivial" exercise that could take days or weeks, depending on how many files are being indexed and whether the product is indexing files based on their content or, more simply, their metadata. EMC notes that full-text or content-based indexing will affect performance and should be run during off-peak hours. Taneja advises users to add specific servers "a few at a time" to gauge the impact on performance.

Brian Babineau, senior analyst with the Enterprise Strategy Group, adds that the index itself can consume a lot of storage capacity, "which is the downside," but "the more you feed it, the better the benefits." Taneja estimates that getting from where we are today to the point where the whole enterprise has a handle on its data is a three-to-five-year process.

Today Infoscape supports only files, but eventually it's likely to have plug-ins for email and structured data. The next release, expected in the second half of 2007, will support encryption. EMC is planning a partner program around the product to extend its functionality. For example, an independent software vendor (ISV) in the legal realm could write to the Infoscape application programming interface (API) to provide metadata specific to a certain kind of law. EMC has written 25 initial taxonomies, or lists of keywords, for various verticals, including the healthcare, manufacturing and automotive industries.
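To make the idea of keyword taxonomies driving storage policy concrete, here is a minimal sketch in Python. It is not Infoscape's API or EMC's method: the taxonomy keywords, tier names, retention periods and the function names `classify` and `policy_for` are all hypothetical, chosen only to illustrate how a list of keywords per vertical could map a file's text to a tier and a retention period.

```python
import re

# Hypothetical taxonomies: category name -> keywords.
# Illustrative only, not EMC's actual taxonomies.
TAXONOMIES = {
    "healthcare": {"patient", "diagnosis", "prescription"},
    "automotive": {"recall", "chassis", "warranty"},
}

# Hypothetical policy: category -> (storage tier, retention in years).
POLICIES = {
    "healthcare": ("tier1", 7),
    "automotive": ("tier2", 3),
}
DEFAULT_POLICY = ("tier3", 1)


def classify(text):
    """Return the category matching the most distinct keywords, or None."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    best, best_hits = None, 0
    for category, keywords in TAXONOMIES.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best, best_hits = category, hits
    return best


def policy_for(text):
    """Look up the tier and retention period for a file's text content."""
    return POLICIES.get(classify(text), DEFAULT_POLICY)
```

A file that matches no taxonomy falls through to the default policy, which mirrors the point analysts make in this article: the tool is only as useful as the policies defined for it.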

Service and pricing

The company has also created a service called EMC Information Management Strategy Service to help users define policies and procedures; after all, products like Infoscape are only as good as the policies the user defines for them. Infoscape costs $125,000 for the first 10 terabytes (TB) of data to be classified, and then it is priced per terabyte on a declining scale as more data is classified.

Mike Fisch, director of storage and networking with The Clipper Group, said he expects EMC to tie in more data protection features over time, such as remote replication, continuous data protection (CDP) and security. He said the hardest part to tackle with data classification is not the technology but the human factor. "Getting all the right people together from the right departments to figure out the policies is the difficult part." However, he said he thinks that the threat of noncompliance and enormous fines is starting to light a fire at many companies and will prompt them to look at Infoscape.

EMC has a jump on its competition here. Neither IBM nor Hewlett-Packard Co. (HP) has a product that offers this kind of functionality yet. And while Network Appliance Inc. (NetApp) has an OEM deal with Kazeon Systems Inc., it hasn't made big strides here. The market is largely in the hands of startups right now, including Abrevity Inc., Arkivio Inc., Copernic Technologies Inc., Index Engines Inc., Kazeon, KOM Networks Inc., Njini Inc., Scentric Inc., Seven Ten Storage Software Inc. and StoredIQ Corp.

In general, analysts seem to agree that this is the first significant product from EMC that combines elements from some of its key acquisitions. "They've bought a lot of software players … it's one thing to massage a user interface, but it's more substantial to bring the pieces together," Fisch said.

EMC says it has several beta customers testing the software, but none were available by press time.

