Text analytics and beyond
Quite a few companies have been working on software products that operate on heterogeneous data with no schema. Cognitive technology from Coseer is being used to build a product database for a company that deals in healthcare products. The company managed 10 million SKUs representing those products. “There was no standardized product database,” says Praful Krishna, CEO of Coseer, “and about 35 million PDF documents were in the company’s files or posted on the Web. No table of attributes had been created, so searching for the right product or comparing different products was extremely difficult.”
Coseer’s software ingested that diverse collection of unstructured content, which included product brochures, catalogs, surgical protocols and white papers. It detected patterns in the content that enabled it to create metatags and organize the information. “After going through this process, we are able to determine when one SKU is very similar to another one that may, for example, be much cheaper,” Krishna says. “Analyzing this much content with absolutely no structure is impossible to do without cognitive processing of text.”
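To make the idea of comparing SKUs by their extracted metatags concrete, here is a minimal sketch in Python. It is not Coseer's method, which the article does not describe. It assumes the tags have already been extracted and uses plain Jaccard overlap as a stand-in similarity measure. The SKU names, tags, prices and the 0.7 threshold are all hypothetical.

```python
from itertools import combinations

# Hypothetical catalog: "tags" stand in for metatags extracted from documents.
catalog = {
    "SKU-A": {"tags": {"suture", "absorbable", "size-3-0", "sterile"}, "price": 42.00},
    "SKU-B": {"tags": {"suture", "absorbable", "size-3-0", "sterile", "violet"}, "price": 29.50},
    "SKU-C": {"tags": {"glove", "nitrile", "sterile"}, "price": 8.75},
}

def jaccard(a, b):
    """Fraction of tags shared by two products, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def cheaper_alternatives(catalog, threshold=0.7):
    """Return (pricier SKU, cheaper SKU, similarity) for pairs at or above threshold."""
    results = []
    for x, y in combinations(sorted(catalog), 2):
        sim = jaccard(catalog[x]["tags"], catalog[y]["tags"])
        if sim >= threshold and catalog[x]["price"] != catalog[y]["price"]:
            # Order the pair so the more expensive SKU comes first.
            if catalog[x]["price"] > catalog[y]["price"]:
                pricey, cheap = x, y
            else:
                pricey, cheap = y, x
            results.append((pricey, cheap, round(sim, 2)))
    return results

print(cheaper_alternatives(catalog))
```

Comparing every pair this way would not scale to millions of SKUs. A production system would need indexing or blocking to limit the comparisons, and very likely a richer similarity model than tag overlap.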
The resulting database allows executives to find all the candidate products, focus on certain attributes and help care providers choose products that meet their needs. “This project uses a model developed specifically for the domain,” Krishna explains, “and it took several months to train the system. But at the end, the results are more accurate than those for a sample categorized by humans, and the fill rate for identifying attributes was 12 percent higher.”