The amount of "unstructured information"—data not easily read by a computer—that pharmaceutical companies produce staggers
the mind. This information typically appears in the form of research reports, sales records, emails, patent documentation,
and clinical studies. Though critical information lies within this unstructured mass, locating the data in an online search
is often extremely difficult.
J. Brooker Aker
Unstructured content has no conceptual definition: a word is simply a word. Structured data, by contrast, resembles a spreadsheet
in which every piece of information is in a defined format and stored electronically. But according to a 2003 Merrill Lynch report,
structured data accounts for only 15 percent of all potentially usable business information. The rest is hard-to-access, unstructured
information.
The opportunities and challenges presented by this efflorescence of information, misinformation, and disinformation are unprecedented.
Making unstructured data searchable and retrievable is essential to performing all kinds of pharma functions, from understanding
trends to quelling rumors to researching drug development. The trick is turning all that noise into meaningful signals. One
way to do this is with semantic intelligence.
The Science of Meaning
A search engine is a retrieval system designed to help find information stored in a computer. Since the birth of the Internet,
there have been various search engine incarnations, but none is as successful and popular as Google. Google crawls the Web,
stores a local cache of the pages it finds, builds a lexicon of common words, and then compiles a list of the pages containing
each word. A query for a given word returns that list, sorted by PageRank.
The problem with Google and its search clones is that the intelligence behind the search does not fully recognize language
in its natural form: the clichés, synonyms, homonyms, brand names, and so on. For example, if you search for the phrase "soft
tissues," Google cannot tell whether you are looking for information about a group of cells or a handful of Kleenex.
Instead, you get results about both.
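The keyword approach described above boils down to an inverted index: a map from each word to the pages that contain it. The Python sketch below is a minimal illustration under stated assumptions, not Google's implementation. The page texts, the `RANK` scores (simple stand-ins for PageRank, which is actually computed from the Web's link graph), and the helper names `build_index` and `search` are all hypothetical. A query for "tissues" returns both the anatomy page and the Kleenex page, because a token match carries no notion of meaning.

```python
import re
from collections import defaultdict

def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index: lowercase word -> set of page IDs containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for page_id, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(page_id)
    return index

def search(index: dict[str, set[str]], word: str, rank: dict[str, float]) -> list[str]:
    """Return the pages containing `word`, highest rank score first."""
    hits = index.get(word.lower(), set())
    return sorted(hits, key=lambda page_id: rank[page_id], reverse=True)

# Hypothetical sample pages and rank scores (stand-ins for PageRank values).
PAGES = {
    "p1": "Soft tissues such as muscle, fat, and tendons.",
    "p2": "Soft tissues for your nose: a box of Kleenex.",
    "p3": "Bone density in older patients.",
}
RANK = {"p1": 0.4, "p2": 0.7, "p3": 0.2}

if __name__ == "__main__":
    index = build_index(PAGES)
    # Token matching cannot tell the anatomical sense from the facial-tissue sense.
    print(search(index, "tissues", RANK))  # ['p2', 'p1']
```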
Semantic intelligence extracts valuable information from texts by identifying people, places, things, time, money, and many
other key signifiers. (Why "semantic"? Because the method derives from the science of assigning meaning to every word in
a document rather than simply treating a word, as Google does, as a token.) But more than that, semantic intelligence finds
events of interest, how objects are connected to one another, and the tone or sentiment expressed by an author. The payoff
is more precise search results.
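Entity extraction of the kind described above can be approximated, very roughly, with a lookup table of known names plus patterns for money and dates. The sketch below assumes a tiny hypothetical gazetteer (`GAZETTEER`), hypothetical regular expressions, and a made-up sample sentence about fictional people and companies. Real semantic engines rely on far richer linguistic analysis, and this is not any vendor's method.

```python
import re

# Hypothetical gazetteer: known surface forms and the entity type each signals.
GAZETTEER = {
    "jane smith": "PERSON",
    "acme pharma": "ORGANIZATION",
    "new jersey": "PLACE",
    "aspirin": "DRUG",
}

# Hypothetical patterns for dollar amounts and four-digit years.
MONEY = re.compile(r"\$\d+(?:\.\d+)?(?:\s(?:million|billion))?")
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")

def extract_entities(text: str) -> list[tuple[str, str]]:
    """Return (surface form, label) pairs found in `text`."""
    found = []
    for term, label in GAZETTEER.items():
        # Whole-word, case-insensitive match on each known name.
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, re.IGNORECASE):
            found.append((match.group(), label))
    for match in MONEY.finditer(text):
        found.append((match.group(), "MONEY"))
    for match in YEAR.finditer(text):
        found.append((match.group(), "TIME"))
    return found

SAMPLE = ("Jane Smith of Acme Pharma said in 2003 that the company would spend "
          "$2.5 billion on aspirin research in New Jersey.")

if __name__ == "__main__":
    for surface, label in extract_entities(SAMPLE):
        print(f"{label:<13} {surface}")
```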
Semantic intelligence can draw direct relationships between terms because it understands both the user's query and the
text on the Web. How? By using cognitive algorithms that mimic the way the human brain thinks. It knows that magazines are
also called publications, that people work for companies, that price is related to cost, and so on. By bringing together
linguistics and IT, semantic intelligence offers a highly sophisticated solution for searching, collecting, and organizing
unstructured information.
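One simple way to represent knowledge such as "magazines are also called publications" or "people work for a company" is a list of subject-relation-object triples that can be walked in either direction. The relation names (`also_called`, `works_for`, `related_to`) and the `related` helper below are hypothetical; the sketch only illustrates the idea and makes no claim about how any particular semantic engine stores its knowledge.

```python
# Hypothetical knowledge base of (subject, relation, object) triples,
# mirroring the examples in the text.
TRIPLES = [
    ("magazine", "also_called", "publication"),
    ("person", "works_for", "company"),
    ("price", "related_to", "cost"),
]

def related(term: str) -> list[tuple[str, str]]:
    """Return (relation, other term) pairs linked to `term`, in either direction."""
    term = term.lower()
    links = []
    for subject, relation, obj in TRIPLES:
        if subject == term:
            links.append((relation, obj))
        elif obj == term:
            links.append((relation, subject))
    return links

if __name__ == "__main__":
    # A query mentioning "cost" can also be connected to documents about "price".
    print(related("cost"))  # [('related_to', 'price')]
```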
For example, if a user searches for the word "aspirin," semantic intelligence automatically extends the search to words for
related concepts or synonyms such as "analgesic," "anodyne," and "painkiller." In addition, if your search uses a keyword
that has multiple meanings, such as "formula," semantic intelligence retrieves only the documents that use the word in the
desired context. A tool that allows users to select the intended use of a word by right-clicking will let them discard search
results that do not apply.
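The two behaviors in this example, synonym expansion and sense filtering, can be sketched in a few lines. Everything here is a labeled assumption: the `SYNONYMS` thesaurus and `SENSES` cue-word lists are hypothetical, and the sense of an ambiguous word is guessed by counting how many of each sense's cue words appear in a document, a crude stand-in for real word-sense disambiguation. The `sense` argument plays the role of the user's right-click choice.

```python
import re
from typing import Optional

# Hypothetical thesaurus mapping a query term to related terms and synonyms.
SYNONYMS = {
    "aspirin": ["analgesic", "anodyne", "painkiller"],
}

# Hypothetical cue words that signal each meaning of an ambiguous keyword.
SENSES = {
    "formula": {
        "chemical": {"chemical", "compound", "molecular", "structure"},
        "infant": {"infant", "baby", "feeding", "milk"},
    },
}

def tokens(text: str) -> set[str]:
    """Lowercase alphabetic words in `text`."""
    return set(re.findall(r"[a-z]+", text.lower()))

def expand(word: str) -> set[str]:
    """The query word plus any synonyms or related terms."""
    word = word.lower()
    return {word, *SYNONYMS.get(word, [])}

def guess_sense(doc: str, word: str) -> Optional[str]:
    """Pick the sense whose cue words overlap most with `doc` (ties go to the first listed)."""
    overlap = {sense: len(tokens(doc) & cues) for sense, cues in SENSES[word].items()}
    best = max(overlap, key=overlap.get)
    return best if overlap[best] > 0 else None

def search(docs: list[str], word: str, sense: Optional[str] = None) -> list[str]:
    """Match documents on the expanded query; optionally keep only one sense."""
    word = word.lower()
    terms = expand(word)
    hits = [doc for doc in docs if terms & tokens(doc)]
    if sense is not None and word in SENSES:
        hits = [doc for doc in hits if guess_sense(doc, word) == sense]
    return hits

DOCS = [
    "Aspirin reduces fever.",
    "A common painkiller for headaches.",
    "The molecular formula of the compound.",
    "Infant formula and baby feeding tips.",
]

if __name__ == "__main__":
    print(search(DOCS, "aspirin"))  # also finds the "painkiller" document
    print(search(DOCS, "formula", sense="infant"))  # ['Infant formula and baby feeding tips.']
```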
Semantic intelligence connects objects (words) by locating events of interest to the end user and storing these in a relational
database. Once the events are in the database, different models can be built into queries and visualizations that represent
the connections. For example, a user can semantically process thousands of Big Pharma patents, tuning the search to find
examples of how each company uses oxygenation or pH levels as a method to fight cancer, and display the results in an X-by-Y
chart that shows how each company has chosen to concentrate on one method over the other. This insight is hard to glean from
reading alone; there's simply too much data to process.
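The database step can be sketched with Python's built-in `sqlite3` module: extracted events go into a table, and a `GROUP BY` query turns them into the company-by-method counts behind an X-by-Y chart. The event rows, company names, and patent identifiers below are invented placeholders, and the table schema is an assumption for illustration only; in practice the rows would come from the extraction step rather than being typed in.

```python
import sqlite3

# Hypothetical (company, method, patent) events, as an extraction step might emit them.
EVENTS = [
    ("Acme Pharma", "oxygenation", "PATENT-001"),
    ("Acme Pharma", "oxygenation", "PATENT-002"),
    ("Acme Pharma", "pH", "PATENT-003"),
    ("Globex Bio", "pH", "PATENT-004"),
    ("Globex Bio", "pH", "PATENT-005"),
]

def company_by_method(events: list[tuple[str, str, str]]) -> dict[str, dict[str, int]]:
    """Store events in an in-memory table and return a company x method count grid."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE events (company TEXT, method TEXT, patent TEXT)")
        conn.executemany("INSERT INTO events VALUES (?, ?, ?)", events)
        rows = conn.execute(
            "SELECT company, method, COUNT(*) FROM events GROUP BY company, method"
        ).fetchall()
    finally:
        conn.close()
    companies = sorted({company for company, _, _ in rows})
    methods = sorted({method for _, method, _ in rows})
    counts = {(company, method): n for company, method, n in rows}
    # Fill missing cells with 0 so every company has a value for every method.
    return {c: {m: counts.get((c, m), 0) for m in methods} for c in companies}

if __name__ == "__main__":
    for company, row in company_by_method(EVENTS).items():
        print(company, row)
```

Each row of the returned grid is one company and each column one method, which maps directly onto the axes of the chart.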