

Are text analytics tools the future of records management?

In recent years, text analytics tools have simplified the time-consuming and onerous work of records management.

Transaction documents are a crucial byproduct of business, and records management is all about ensuring proper stewardship of those resources for compliance and operational planning. Whether paper or digital, records need to be organized and accessible to have any value. It was once thought the rise of computers would greatly ease and improve those tasks for businesses. Largely, that hasn't happened, but text analytics tools and auto-classification software may finally deliver on that promise.

Records management in the digital age has not been a resounding success. Ask a records manager if they have digital records identified and properly controlled, and many will deflect the question or respond with qualified answers. Don't blame them though; records management is hard to do right.

The good news is that help is on the way. Emerging technology is opening doors to an age of information governance (IG), where business finally takes a wider view of compliance. Hopeful conversations are getting underway about treating information as an asset and not just a liability.

But there is a problem. We still cannot accurately classify the thousands upon thousands of documents residing in the organization. Even classifying new documents going forward is a high hurdle, to say nothing of classifying past documents. When you get down to it, people do not want to be records managers, and the industry needs to remove that burden.

Paper worked

Organizations have kept and classified all sorts of records for decades. ARMA was founded 60 years ago to help records managers collaborate to perfect their profession. Remember the triplicate copy system? It worked because one of those copies went down to records, and was filed by a person who looked at the document and properly classified it.

Things changed when workplaces went digital. The clerical workers and secretaries were downsized, as businesses looked to save on supplies, storage and personnel. The theory was that document creators could file records quickly and, potentially, more accurately. There were savings, but at the expense of individual productivity.

Another drawback is that most employees do not want to be records managers. It isn't something they are trained for, nor is it something to which they aspire. If employees have to perform a task outside their core job, it needs to be simple and quick, and it should require almost no training.

If you have used a records management system, you know that's typically not the case. It's only easy in cases where the organization invested the time and money to streamline the process so much that the people saving documents don't even have to think about it.

Unfortunately, investing in a custom interface is cost-prohibitive for most organizations. That leaves many records managers with sleepless nights as they ponder options for making progress.

Enter analytics

Text analytics tools are rapidly becoming a bigger piece of the discussion. The specific technology is new, but the premise is as old as digital records themselves: Use search to find what you want and automatically classify it.

Anyone who used search engines back in the 1990s will readily tell you that they were long on promise and short on delivery. They worked fine if you needed to find a keyword in a subset of documents. If you wanted to determine which of the thousand or so record categories a document belonged to, you were out of luck.

Over the past few years, those thousands of categories have been narrowed to hundreds, as people have realized that fewer options, when married with other metadata, can create a more compliant organization. While the original plan was to make it easier for people to classify records, the change has also prepared the way for computers to auto-classify them.

Auto-classification doesn't depend upon reading a document and knowing what it says. A large effort to codify knowledge isn't necessary. Instead, the analytic engine learns where documents go by example, using similarities to group documents together. If you give the engine 1,000 documents and tell it that they are all budget documents, the engine can tell you, with some confidence, if document 1,001 is a budget document as well.
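To make the idea of learning by example concrete, here is a minimal Python sketch. It is a toy illustration, not how any particular vendor's engine works: it assumes a simple bag-of-words model and cosine similarity, and the category names, sample texts and function names (`train`, `classify`) are all hypothetical. A real engine would be trained on far more documents and would use more sophisticated features.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def vectorize(text):
    """Turn a document into a bag-of-words term-count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train(examples):
    """Build one summed term-count profile per category from labeled examples."""
    profiles = {}
    for category, text in examples:
        profiles.setdefault(category, Counter()).update(vectorize(text))
    return profiles


def classify(profiles, text):
    """Return (best_category, similarity) for a new, unlabeled document."""
    vector = vectorize(text)
    scores = {cat: cosine(vector, prof) for cat, prof in profiles.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical training set: a handful of documents a records manager
# has already labeled. A real engine would see hundreds or thousands.
EXAMPLES = [
    ("budget", "Annual budget forecast with projected revenue and department spending"),
    ("budget", "Quarterly budget review: spending versus forecast by department"),
    ("contract", "Service agreement between the vendor and the client, including termination terms"),
    ("contract", "Vendor contract renewal terms and signed agreement"),
]

profiles = train(EXAMPLES)
category, score = classify(
    profiles, "Draft department budget and spending forecast for next year"
)
print(category, round(score, 2))
```

The similarity score stands in for the "some confidence" mentioned above. In practice, a records manager might auto-file only documents that score above a chosen threshold and route the rest to a person for review.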

This is the same learning mechanism that has been expanding in the e-discovery space. The most recent wave of vendors doesn't just depend upon keywords; their engines depend on being taught. After several days of watching an expert determine whether a document is compliant with a discovery request, the engine knows enough to classify documents with over 90% accuracy. Several days to a week may seem like a long time to train an engine, but given that the sheer volume of documents would typically take months to classify manually, it is a relative bargain.

By leveraging existing documents, records managers are already well down the road to having a set of records ready to train the analytic engines. When applied to existing repositories, the engines can classify huge backlogs and even correct documents that were improperly classified.

Time matters

If you're a records manager, this is exciting news. If the technology can be applied to the proactive declaration of records, then the broader approach of information governance has a chance to take hold. Information governance works only if all information is identified and classified correctly, and that requires quality records management. Like any asset, if information cannot be found because it is misplaced or lost, it cannot provide any value to the organization.

Auto-classification is already being applied to email systems. Is your current suite of software partners ready to dive in and apply this technology to your existing infrastructure? If not, it is time to start looking more closely at vendors offering auto-classification services and determining if they can solve your organization's challenges.

Once we know what we have, we can finally start to use our information as an asset, and not just pay lip service to the concept.
