Search for: Meaning

Aug 01, 2008


J. Brooker Aker
The amount of "unstructured information"—data not easily read by a computer—that pharmaceutical companies produce staggers the mind. This information typically appears in the form of research reports, sales records, emails, patent documentation, and clinical studies. Though critical information lies within this unstructured mass, locating the data in an online search is often extremely difficult.

Unstructured content has no predefined format. Structured data, by contrast, resembles a spreadsheet: every piece of information sits in a defined field and is stored electronically, so a computer can read it directly. But according to a 2003 Merrill Lynch report, structured data accounts for only 15 percent of all potentially usable business information. The rest is hard-to-access, unstructured data.

The opportunities and challenges presented by this efflorescence of information, misinformation, and disinformation are unprecedented. Making unstructured data searchable and retrievable is essential to performing all kinds of pharma functions, from understanding trends to quelling rumors to researching drug development. The trick is turning all that noise into meaningful signals. One way to do this is with semantic intelligence.

The Science of Meaning

A search engine is a retrieval system designed to help find information stored in a computer. Since the birth of the Internet, there have been various search engine incarnations, but none is as successful and popular as Google. Google crawls the Web, stores a local cache of the pages it finds, builds a lexicon of common words, and then compiles, for each word, a list of the pages that contain it. A query for a given word returns that list, sorted by PageRank, Google's link-based measure of a page's importance.
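The word-to-pages list described above is commonly called an inverted index. The following is a minimal sketch of the idea in Python, not a description of Google's actual implementation. The page names, their text, and the rank scores are all invented for illustration, and the scores are simply assumed rather than computed from links.

```python
from collections import defaultdict

# Hypothetical mini-corpus; page names and text are illustrative only.
PAGES = {
    "page_a": "soft tissues are groups of cells",
    "page_b": "soft facial tissues on sale",
    "page_c": "cells and tissues in oncology research",
}

# Stand-in for a link-based rank score (assumed values, not computed here).
RANK = {"page_a": 0.5, "page_b": 0.9, "page_c": 0.7}


def build_index(pages):
    """Map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index


def search(index, word, rank):
    """Return the pages containing the word, highest rank first."""
    matches = index.get(word.lower(), set())
    return sorted(matches, key=lambda page: rank[page], reverse=True)


INDEX = build_index(PAGES)
print(search(INDEX, "tissues", RANK))
```

Note that the lookup matches the token "tissues" wherever it appears, whether the page is about biology or about facial tissue. The index stores words, not meanings.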

The problem with Google and its search clones is that the intelligence behind the search does not fully recognize language in its natural form—the clichés, synonyms, homonyms, brand names, and so on. For example, when searching for the phrase "soft tissues," a Google search cannot decipher whether you are looking for information about a group of cells or a handful of Kleenex. Instead, you get results about both.

Semantic intelligence extracts valuable information from texts by identifying people, places, things, time, money, and many other key signifiers. (Why "semantic"? Because the method derives from the science of assigning meaning to every word in a document rather than simply treating a word, as Google does, as a token.) But more than that, semantic intelligence identifies events of interest, the connections among objects, and the tone or sentiment an author expresses. The outcome is more precise search results.
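To give a rough sense of what identifying signifiers means in practice, the sketch below pulls two of the simpler categories, money and years, out of a sentence using pattern matching. It is a deliberately crude illustration under stated assumptions: the patterns and the sample sentence are invented here, and a real semantic system would rely on far richer linguistic analysis to recognize people, places, events, and sentiment.

```python
import re

# Assumed patterns for illustration only; they cover a narrow set of formats.
# Dollar amounts such as "$300,000" or "$1.2 billion".
MONEY = re.compile(r"\$\d[\d,]*(?:\.\d+)?(?:\s(?:million|billion))?")
# Four-digit years starting with 19 or 20.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def extract(text):
    """Return the money and year mentions found in the text."""
    return {"money": MONEY.findall(text), "year": YEAR.findall(text)}


# Invented sample sentence.
SAMPLE = "The company spent $1.2 billion on research in 2007 and $300,000 in 2008."
print(extract(SAMPLE))
```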

Searches based on semantic intelligence can draw direct relationships between terms because the system understands both the user's query and the Web text. How? By using cognitive algorithms that resemble the way the human brain processes language. The system knows that magazines are also called publications, that people work for companies, that price is related to cost, and so on. By bringing together linguistics and IT, semantic intelligence offers a highly sophisticated solution for searching, collecting, and organizing unstructured data.

For example, if a user searches for the word "aspirin," semantic intelligence automatically extends the search to words for related concepts or synonyms such as "analgesic," "anodyne," and "painkiller." In addition, if your search uses a keyword that has multiple meanings, such as "formula," semantic intelligence retrieves only the documents that use the word in the desired context. A tool that allows users to select the intended use of a word by right-clicking will let them discard search results that do not apply.
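The two behaviors just described, extending a query to related terms and filtering by the intended sense of an ambiguous word, can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the tiny thesaurus, the three documents, and the sense labels are hand-built, whereas a real system would derive them from linguistic analysis.

```python
# Hypothetical hand-built thesaurus; a real system would use a far larger resource.
SYNONYMS = {
    "aspirin": {"analgesic", "anodyne", "painkiller"},
}

# Hypothetical documents, each tagged in advance with the sense
# in which it uses an ambiguous word.
DOCS = [
    {"id": 1, "text": "a new painkiller enters late stage trials", "senses": {}},
    {"id": 2, "text": "the chemical formula of aspirin", "senses": {"formula": "chemistry"}},
    {"id": 3, "text": "infant formula sales rose", "senses": {"formula": "nutrition"}},
]


def expand(term):
    """Return the term together with its known related terms."""
    return {term} | SYNONYMS.get(term, set())


def search(term, sense=None):
    """Return ids of documents matching the expanded term and, if given, the sense."""
    term = term.lower()
    terms = expand(term)
    hits = []
    for doc in DOCS:
        words = set(doc["text"].split())
        if not terms & words:
            continue  # none of the expanded terms appear in this document
        if sense is not None and doc["senses"].get(term) != sense:
            continue  # the word is used in a different sense
        hits.append(doc["id"])
    return hits


print(search("aspirin"))
print(search("formula", sense="chemistry"))
```

The first query finds document 1 even though the word "aspirin" never appears in it, and the second discards the infant-formula document, which a plain keyword match would have returned.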

Semantic intelligence connects objects (words) by locating events of interest to the end user and storing these in a relational database. Once the events are in the database, different models can be built into queries and visualizations that represent the connections. For example, a user can semantically process thousands of Big Pharma patents, tuning the search to look for examples of how each company uses oxygenation or pH levels as a method to fight cancer, and retrieve the results as an X-by-Y chart that shows how each company has chosen to concentrate on one method over the other. This insight is hard to glean from reading alone; there's simply too much data to process.
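As a rough sketch of the database step, the Python example below loads a handful of extracted facts into an in-memory SQLite table and counts patents by company and method, which yields the cells of an X-by-Y chart. The company names, patent ids, and rows are invented placeholders, not real patent data, and the extraction step that would produce them is assumed to have already run.

```python
import sqlite3

# Illustrative rows only: (company, patent_id, method) tuples that an
# extraction step is assumed to have produced already.
EXTRACTED = [
    ("Company A", "P1", "oxygenation"),
    ("Company A", "P2", "oxygenation"),
    ("Company A", "P3", "pH"),
    ("Company B", "P4", "pH"),
    ("Company B", "P5", "pH"),
]


def load(rows):
    """Store the extracted facts in an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mention (company TEXT, patent_id TEXT, method TEXT)")
    conn.executemany("INSERT INTO mention VALUES (?, ?, ?)", rows)
    return conn


def company_by_method(conn):
    """Count patents per company and method: the cells of an X-by-Y chart."""
    query = """
        SELECT company,
               SUM(method = 'oxygenation') AS oxygenation,
               SUM(method = 'pH') AS ph
        FROM mention
        GROUP BY company
        ORDER BY company
    """
    return conn.execute(query).fetchall()


conn = load(EXTRACTED)
for row in company_by_method(conn):
    print(row)
```

Each returned row holds a company followed by its patent counts for the two methods, so a charting tool could plot companies along one axis and methods along the other.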