Add Context to Content
In the past, context was king. The insights researchers were able to gain when conversing informally were extremely rich
because each individual brought a powerful data processing source to the table: his or her own mind. Human brains are adept
at making contextually relevant associations; they categorize information into loose groupings and can make intelligent connections
that a structured database is incapable of. For example, a human would know immediately that the words "auto," "automobile,"
and "car" mean the same thing, or that a past experiment may be "kind of" like one being conducted in a current project.
But what happens when the available knowledge base includes an enormous breadth of sources, data formats, and locations? A
researcher may be able to conclude that any files containing either the term "aspirin" or "acetylsalicylic acid" are relevant
to his or her work, but would find it impossible to reach the most important information quickly if doing so meant actually
reading through thousands of pages of published literature, or manually searching through hundreds of ELN documents.
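The synonym problem described above can be illustrated with a minimal Python sketch. It assumes a small hand-built synonym table and an in-memory set of documents; the names `SYNONYMS`, `expand`, `search`, and the sample file names are hypothetical, and a production system would draw its synonyms from a curated thesaurus or ontology rather than a hard-coded dictionary.

```python
import re

# Hypothetical synonym groups, for illustration only.
SYNONYMS = {
    "aspirin": {"aspirin", "acetylsalicylic acid"},
    "car": {"car", "auto", "automobile"},
}

def expand(term):
    """Return every known synonym for a term, or just the term itself."""
    term = term.lower()
    for group in SYNONYMS.values():
        if term in group:
            return group
    return {term}

def search(term, documents):
    """Return names of documents that mention the term or any synonym as whole words."""
    patterns = [
        re.compile(r"\b" + re.escape(s) + r"\b", re.IGNORECASE)
        for s in expand(term)
    ]
    return sorted(
        name for name, text in documents.items()
        if any(p.search(text) for p in patterns)
    )

# Made-up sample documents standing in for ELN pages and literature.
documents = {
    "eln_001.txt": "Dissolved acetylsalicylic acid in ethanol.",
    "paper_17.txt": "Aspirin reduced inflammation in the treated group.",
    "memo_03.txt": "Quarterly budget review.",
}

print(search("aspirin", documents))
```

A query for either name returns both the notebook entry and the paper, which a plain keyword match on a single term would not do.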
This is where emerging technologies that enable less rigid, artificially intelligent search capabilities come in. If applications
enabling semantic search and text analytics were built into the services-based scientific information management platform
described in the section above, research teams could more easily add context to content, and take advantage of the valuable
stores of complex data available to them—structured and unstructured, proprietary and public.
For example, one of the world's largest pharmaceutical companies wanted to search a vast amount of unstructured content, ranging
from external patents and journal articles to its own internal documents, in order to identify and extract information
related to specific new business opportunities for its existing intellectual property. The organization used a services-based
IT platform to integrate both the unstructured data sources and an array of text-mining applications. The outcome
of its work clearly indicated that while standard text mining applications were useful, scientifically aware text analysis
methods were even more critical to success. These allowed researchers to scan the content for IUPAC names, SMILES strings, and common/brand
names of interest and quickly pinpoint the most contextually relevant sources of information. Standard text methods, such
as FAST, cannot recognize chemical structures or biological sequences in this way, and thus an integrated approach
was the best solution. Without this ability to quickly integrate both an array of applications and a range of content sources, the time
and cost constraints involved in leveraging this valuable information would have been too high, and critical insights would
have been missed.
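To give a rough sense of what scanning for IUPAC names, SMILES strings, and common names involves, the following Python sketch tags documents using a tiny illustrative lexicon. It is not the method used in the project described above: the names `LEXICON`, `find_mentions`, and `rank_documents` are hypothetical, and the matching is literal string lookup only. Genuinely scientifically aware tools would parse and normalize chemical structures with a cheminformatics toolkit rather than compare text.

```python
# Illustrative lexicon: each compound maps to the forms it may take in text.
LEXICON = {
    "aspirin": {
        "common": ["aspirin", "acetylsalicylic acid"],
        "iupac": ["2-acetoxybenzoic acid"],
        "smiles": ["CC(=O)Oc1ccccc1C(=O)O"],
    },
}

def find_mentions(text):
    """Return sorted (compound, kind) pairs for every lexicon form found in text."""
    lowered = text.lower()
    hits = set()
    for compound, forms in LEXICON.items():
        for kind, names in forms.items():
            for name in names:
                # SMILES is case-sensitive (lowercase c is aromatic carbon);
                # ordinary names are matched case-insensitively.
                if kind == "smiles":
                    found = name in text
                else:
                    found = name.lower() in lowered
                if found:
                    hits.add((compound, kind))
    return sorted(hits)

def rank_documents(documents):
    """List documents with at least one mention, most kinds of mention first."""
    scored = [(name, find_mentions(text)) for name, text in documents.items()]
    matched = [(name, hits) for name, hits in scored if hits]
    return sorted(matched, key=lambda item: (-len(item[1]), item[0]))
```

In this sketch, a patent that gives both a SMILES string and an IUPAC name ranks ahead of a memo that uses only the common name, and a document with no recognized form is left out entirely.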
The insights that lead to new breakthroughs are often hidden in a deluge of data, inaccessible to the researchers who need
them, and disconnected from other relevant sources of information. In order to transform this data into the knowledge that
drives discoveries, today's organizations need to bring back the contextually rich collaboration that existed at the company
lunch table, but in a form more suited to the modern research environment. This requires a global, services-based IT architecture
that supports cross-disciplinary data-sharing and integration, local information delivery, and scientifically aware search
capabilities. With such an architecture in place, organizations can take fuller advantage of all relevant research sources.
Frank Brown is chief science officer at Accelrys. He can be reached at FBrown@accelrys.com