Selecting text analysis software demands a rigorous proof of concept

Experts agree that content and text analysis software makes unique demands and requires a rigorous proof of concept or testing protocol to assure organizations that they’ve chosen wisely.

Of all the types of software that ought to be put through its paces with a proof of concept or a testing protocol before purchase, text analytics is unique in that it is fundamentally iterative and demands ongoing development even as you try it out.

That’s the assessment of analysts and consultants with expertise in text mining and content analytics technology.

“Text analytics is different than other software,” said Tom Reamy, chief knowledge architect and founder of KAPS Group, a text analytics consultancy based in Oakland, Calif. “It’s something that you’re going to continue to be refining. … It doesn’t perform just out of the box. Development is what is key.”

Before starting a pilot program or proof of concept, however, organizations should examine their text analysis needs first. “It’s a significant research project into your own organization,” said Reamy. “That’s essential just to get the feature list that you’re going to need for your software program. That’s the first step.”

Then it’s time to take the traditional route of choosing one or two products to actually test, Reamy said: gathering background information, making calls, researching features, conducting initial evaluations against all the features the research project turned up, setting up focus groups and then getting demos of the top four or five products.

Anticipate a six-week pilot process. Reamy said enterprises ready to begin a proof of concept of text analysis software should prepare for a six-week testing and development process. It requires at least three rounds of testing and development to tailor a text analysis program to an individual organization’s needs.

“You develop some rules and categorization capabilities and test it on content, then test your development on new content, refine the rules, and test again,” he said.
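The test-develop-refine cycle Reamy describes can be sketched in a few lines. The keyword rules, categories and sample documents below are hypothetical placeholders; a real pilot would use the candidate vendor's rule language and the organization's own labeled content.

```python
# A minimal sketch of the test-develop-refine loop: write rules,
# test on content, refine, then test again on NEW content.
# Rules and sample documents are hypothetical placeholders.

RULES = {
    "billing": ["invoice", "refund", "charge"],
    "support": ["crash", "error", "install"],
}

def categorize(text, rules):
    """Assign the first category whose keywords appear in the text."""
    words = text.lower()
    for category, keywords in rules.items():
        if any(kw in words for kw in keywords):
            return category
    return "uncategorized"

def accuracy(labeled_docs, rules):
    """Fraction of labeled documents the rules categorize correctly."""
    hits = sum(1 for text, label in labeled_docs
               if categorize(text, rules) == label)
    return hits / len(labeled_docs)

# Round 1: test the initial rules on a labeled sample of real content.
round_one = [
    ("The invoice charged me twice", "billing"),
    ("App crashes during install", "support"),
    ("Double payment on my card", "billing"),   # missed by initial rules
]
print(f"Round 1 accuracy: {accuracy(round_one, RULES):.0%}")

# Refine the rules, then re-test on new content -- never the same sample.
RULES["billing"].append("payment")
round_two = [("Payment failed twice", "billing")]
print(f"Round 2 accuracy: {accuracy(round_two, RULES):.0%}")
```

The point of the sketch is the loop, not the rules: each round tests on content the rules have not seen, which is what keeps the accuracy numbers honest.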

“There’s a huge difference between software configuration and software customization,” said Theresa Regli, principal analyst at The Real Story Group of Olney, Md. “You never really know how much work it’s going to take until you’re really testing it.”

Because of that, it’s key to anticipate a longer process than is typical for other software evaluations.

Organizations need to test both the software's capability to surface the information users are looking for and their own development capability, Reamy said. It's typical to get only 30% accuracy after five weeks, but a few more hours of development can often yield the 90% accuracy most companies look for.
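The jump from 30% to 90% comes from knowing where the rules fail. A per-category breakdown, sketched below with hypothetical predictions and gold labels, shows a testing team which categories deserve the next round of development effort.

```python
from collections import Counter

# Sketch of per-category scoring for a pilot. The predictions and gold
# labels here are hypothetical; in a real test they would come from the
# candidate tool's output on your own labeled content.
def per_category_accuracy(predictions, gold_labels):
    """Return accuracy per category, so refinement targets the weak spots."""
    hits, totals = Counter(), Counter()
    for predicted, actual in zip(predictions, gold_labels):
        totals[actual] += 1
        if predicted == actual:
            hits[actual] += 1
    return {cat: hits[cat] / totals[cat] for cat in totals}

predicted = ["billing", "support", "billing", "support", "billing"]
actual    = ["billing", "support", "support", "support", "billing"]
print(per_category_accuracy(predicted, actual))
# billing scores 2/2 and support 2/3 -- spend the next round on "support"
```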

Use actual content to test. “I believe a test-based selection process is the best way using a real-world situation with actual content,” Regli said. Have a few use cases at hand that represent the business challenges you’re trying to solve. Testing the software against the actual business requirements and use cases will help the testing team determine not just whether the software will work, but how.

“I think that’s something that these selection processes tend to ignore,” she said, explaining it was also important to use actual text and content to determine whether coding will have to take place for customization and to find out how the software works with different formats.

Using real-world content to test analytics programs is essential for a valid and focused proof of concept, said Seth Grimes, founder of consulting firm Alta Plana Corp. in Takoma Park, Md. If the organization doing the testing is reluctant to use actual content for testing because of security or confidentiality issues, it's important to get as close to the real-world example as possible, he said.
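When confidentiality rules out raw content, getting "as close to the real-world example as possible," as Grimes puts it, can mean redacting sensitive fields while keeping the text's structure and vocabulary intact. A minimal sketch, assuming identifiers simple enough to catch with regular expressions:

```python
import re

# Minimal sketch: redact obvious identifiers so near-real content can be
# used in a proof of concept. The patterns are illustrative only;
# production anonymization needs a proper review of what counts as sensitive.
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\b\d{3}-\d{3}-\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b\d{13,16}\b"), "[CARD]"),
]

def sanitize(text):
    """Replace detectable identifiers, keeping sentence structure intact."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

record = "Contact jane.doe@example.com or 555-867-5309 about card 4111111111111111."
print(sanitize(record))
# -> Contact [EMAIL] or [PHONE] about card [CARD].
```

Because the placeholders preserve sentence shape and surrounding vocabulary, categorization rules behave much as they would on the raw text, which is the point of testing with near-real content.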

“It is fundamental to use the real data that you’re going to be using,” said Reamy. It’s what Grimes called getting at a “well-defined set of business problems to be solved.” Once you do that, he explained, you can demonstrate the real potential of the technology to the testing team.

Make sure the team and plan remain flexible. Another essential element to the pilot program for testing text analysis software is making sure the testing team remains open to learning and refocusing its efforts during the course of evaluations, Grimes said.

“A careful -- but not rigid -- plan is definitely the way to go for organizations looking to adopt any form of analytics,” he said. 

“I think there should be a point at which certain parties get more weight in the final decision,” Regli said, explaining that while flexibility was good, the final choice of software depends on who will be using it and for what purpose.

Be careful of what Reamy called the “CTO problem”: “He or she knows how to do software evaluation and expects this is going to be the same, and it clearly isn’t.” Instead, he said, use subject-matter experts to put together an initial categorization of the content to be tested “to jump-start the whole development process.”

Use the people who are going to be working with the software to help prove the concept. And, if you don’t have one on staff, hire a taxonomy or linguistics expert to help. “The part that you’re going to spend most of your time on is categorization,” said Reamy. “It’s complex and difficult to do and second, it’s fundamental.”

Above all, he repeated, take your time and test, test, test the text analysis program. “It’s something that you’re going to continue to be refining and it’s going to get better and better.”
