Predictive coding plus crowdsourcing could cure e-discovery challenges

As the volume of data proliferates exponentially, companies can no longer handle the e-discovery process manually. They need software and human discretion to help.

Few companies relish the time, effort and cost of taking on the tasks associated with the e-discovery process. It requires teams of lawyers and hours of effort to comb through enterprise data to identify documents that might be relevant to a pending legal case.

As the volume of data proliferates exponentially, companies can no longer take on e-discovery challenges manually. They need to turn to software tools, such as autoclassification, predictive coding and keyword search, to identify relevant documents in what can be terabytes of data. Technologies like predictive coding, or technology-assisted review (TAR), can remove much of the manual labor of human document review for e-discovery by using methods such as search, filtering and sampling to automate portions of that review.

Now, new models are emerging that supplement technology with human assistance, but at vastly lower cost than the old method of having $500-an-hour lawyers review documents retrieved by software to see whether they should be set aside for the case. Today, companies are combining TAR with crowdsourcing -- and outsourcing -- to call on a pool of workers who have been selected to assist with document review. Those workers may be selected through a system like Amazon's Mechanical Turk, which crowdsources tasks like audio transcription, or Upwork, which crowdsources higher-level tasks. These crowd members can review documents that the software has retrieved to confirm that they belong in the document set. Augmenting software efforts with human judgments, and even using those judgments to train the software to better identify relevant documents, is the idea behind the hybrid approach to e-discovery.

Matt Lease, a professor at the University of Texas at Austin and a speaker at the InfoGovCon conference Sept. 29-Oct. 1 in Hartford, Conn., discussed the merits of this hybrid approach and some of the hurdles the legal industry has yet to overcome in making it viable.

Why is technology-assisted review increasingly important to the e-discovery process?

In e-discovery, we often have a lot of records that need to get processed, and, traditionally, we have relied on doing it manually. But the question is, "Are there ways we can automate it that are still consistent with the quality we expect?" We can automate quite well.

How does the hybrid approach help take on e-discovery challenges?

It's about predictive coding combined with crowdsourcing. So, we run the software, and the software returns a set of documents. We're not sure about some of them, so we might ask a crowd member [an outsourced worker who is among a pool chosen to assist with review] about some, so we never go to the lawyer at all. Or, alternatively, if some of the documents returned are complex, we ask the lawyer to review some.

In this approach, it's not just having human labor or automation, but having a pyramid of labor where you could automate the entire task with somewhat less accuracy, or ask a crowd member at somewhat higher cost and greater accuracy, or ask a lawyer, who will be the most expensive to ask but the most accurate.

With that kind of interaction between tool and human, we easily match the full human process [in accuracy] and use far fewer human judgments to make that happen. It would be foolish to do things fully manually when you have that technology available.
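
The "pyramid of labor" described above can be pictured as a simple routing rule. The sketch below is only an illustration, not Lease's system or any vendor's product: the per-document costs, the `relevance_score` and `is_complex` fields, and the `confident_margin` threshold are all hypothetical values chosen to show the idea of sending each document to the cheapest tier likely to judge it well.

```python
from dataclasses import dataclass

# Hypothetical per-document review costs in dollars (illustrative only).
COST = {"software": 0.0, "crowd": 0.10, "lawyer": 25.0}


@dataclass
class Document:
    doc_id: str
    relevance_score: float    # assumed classifier output in [0, 1]
    is_complex: bool = False  # assumed flag for documents needing legal judgment


def route(doc, confident_margin=0.4):
    """Pick the cheapest tier expected to judge this document reliably."""
    if doc.is_complex:
        return "lawyer"
    # Distance from 0.5 measures how sure the software is, either way.
    if abs(doc.relevance_score - 0.5) >= confident_margin:
        return "software"
    # The software is unsure, so ask a crowd member before any lawyer.
    return "crowd"


def plan_review(docs):
    """Return each document's assigned tier and the total review cost."""
    assignments = {doc.doc_id: route(doc) for doc in docs}
    total_cost = sum(COST[tier] for tier in assignments.values())
    return assignments, total_cost


if __name__ == "__main__":
    batch = [
        Document("a", 0.95),                  # software is confident
        Document("b", 0.55),                  # software is unsure
        Document("c", 0.50, is_complex=True)  # needs a lawyer
    ]
    print(plan_review(batch))
```

In this toy setup only the documents flagged as complex reach the lawyer, which is where the cost savings in the hybrid approach would come from; a real deployment would need to tune the threshold against measured accuracy.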

What are the concerns about this approach?

The No. 1 thing everyone worries about with crowdsourcing is quality. The challenge becomes, "How do I integrate all those so I deliver the expected level of quality, while minimizing the cost and time needed to get there?" The biggest problem is trust in the community and whether legal policy allows us to use it or not.

Do you believe this is the future of e-discovery?

Companies are always looking for ways to be more efficient. We can bring in online contributors in near real time to fill the gap where artificial intelligence [or computer-based methods] falls short, so it's exciting in terms of the capabilities that we can deliver.

Crowdsourcing isn't going away; I do think it will be increasingly prevalent, because it's enabling greater efficiencies and that's where everyone always runs to. It's also a big source of innovation.

While the legal industry isn't excited about this, they're doing it because they just have no choice. They can't afford the typical processes, and lawyers can't contend with the volume of documents. That's what's really driving this thing. The scale just can't be dealt with anymore and people are grudgingly trying to figure out how to make things work.
