Volume 0, Issue 0
Here's how to search for meaning through unstructured data.
The amount of "unstructured information"—data not easily read by a computer—that pharmaceutical companies produce staggers the mind. This information typically appears in the form of research reports, sales records, emails, patent documentation, and clinical studies. Though critical information lies within this unstructured mass, locating the data in an online search is often extremely difficult.
J. Brooke Aker
Unstructured content has no conceptual definition. In structured data, a word is simply a word, as in a spreadsheet in which every piece of information sits in a defined format and is stored electronically. But according to a 2003 Merrill Lynch report, structured data accounts for only 15 percent of all potentially usable business information. The rest is hard-to-access, unstructured data.
The opportunities and challenges presented by this efflorescence of information, misinformation, and disinformation are unprecedented. Making unstructured data searchable and retrievable is essential to performing all kinds of pharma functions, from understanding trends to quelling rumors to researching drug development. The trick is turning all that noise into meaningful signals. One way to do this is with semantic intelligence.
A search engine is a retrieval system designed to help find information stored in a computer. Since the birth of the Internet, there have been various search engine incarnations, but none has been as successful or popular as Google. Google crawls the Web, stores a local cache of the pages it finds, builds a lexicon of common words, and then builds a list of pages containing each word. A query for a given word returns that list, sorted by PageRank.
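The word-to-page list described above is often called an inverted index. What follows is a minimal sketch of the idea, not Google's actual implementation: the three pages, their rank scores, and the function names are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical pages and rank scores, invented for illustration.
PAGES = {
    "page_a": "soft tissues are groups of cells",
    "page_b": "buy soft facial tissues in bulk",
    "page_c": "cells divide and grow",
}
RANK = {"page_a": 0.5, "page_b": 0.9, "page_c": 0.3}


def build_index(pages):
    """Map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def search(index, word, rank):
    """Return pages containing the word, highest rank score first."""
    hits = index.get(word.lower(), set())
    return sorted(hits, key=lambda page_id: rank[page_id], reverse=True)


INDEX = build_index(PAGES)
print(search(INDEX, "tissues", RANK))
```

A query for "tissues" returns both the biology page and the shopping page, because the index matches the token and knows nothing about its meaning.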
The problem with Google and its search clones is that the intelligence behind the search does not fully recognize language in its natural form—the clichés, synonyms, homonyms, brand names, and so on. For example, when searching for the phrase "soft tissues," a Google search cannot decipher whether you are looking for information about a group of cells or a handful of Kleenex. Instead, you get results about both.
Semantic intelligence extracts valuable information from texts by identifying people, places, things, time, money, and many other key signifiers. (Why "semantic"? Because the method derives from the science of assigning meaning to every word in a document rather than simply treating a word, as Google does, as a token.) But more than that, semantic intelligence finds events of interest, how objects are connected to one another, and the tone or sentiment expressed by an author. The end result is more precise search results.
Search based on semantic intelligence can draw direct relationships between terms because it understands both the user's query and the Web text. How? By using cognitive algorithms that work in a way similar to human thinking. It knows that magazines are also called publications, that people work for a company, that price is related to cost, and so on. By bringing together linguistics and IT, semantic intelligence offers a highly sophisticated solution for searching, collecting, and organizing unstructured data.
For example, if a user searches for the word "aspirin," semantic intelligence automatically extends the search to words for related concepts or synonyms such as "analgesic," "anodyne," and "painkiller." In addition, if your search uses a keyword that has multiple meanings, such as "formula," semantic intelligence retrieves only the documents that use the word in the desired context. A tool that allows users to select the intended use of a word by right-clicking will let them discard search results that do not apply.
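The two behaviors just described, extending a query to related terms and keeping only documents that use an ambiguous word in the intended sense, can be sketched in a few lines. This is a toy under stated assumptions: the synonym table, the context cues for each sense of "formula," and the sample documents are all hypothetical, and a real semantic engine would rely on far richer linguistic analysis than word overlap.

```python
# Hypothetical synonym table, invented for illustration.
SYNONYMS = {
    "aspirin": {"analgesic", "anodyne", "painkiller"},
}

# Assumed context words that hint at each sense of an ambiguous keyword.
SENSE_CUES = {
    "formula": {
        "chemistry": {"compound", "molecule", "chemical"},
        "infant": {"baby", "feeding", "milk"},
    },
}

# Hypothetical documents.
DOCS = {
    "doc1": "the chemical formula of the compound was revised",
    "doc2": "baby formula and feeding schedules",
    "doc3": "a new painkiller entered trials",
}


def expand(term):
    """Return the term plus any known related terms."""
    return {term} | SYNONYMS.get(term, set())


def search(term, docs, sense=None):
    """Find docs mentioning the term or a synonym, optionally in one sense."""
    terms = expand(term)
    cues = SENSE_CUES.get(term, {}).get(sense) if sense else None
    results = []
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        if not words & terms:
            continue  # neither the term nor a synonym appears
        if cues is not None and not words & cues:
            continue  # the term appears, but not in the requested sense
        results.append(doc_id)
    return sorted(results)


print(search("aspirin", DOCS))
print(search("formula", DOCS, sense="chemistry"))
```

Here a search for "aspirin" finds the document that mentions only "painkiller," and a search for "formula" in the chemistry sense drops the document about baby formula.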
Semantic intelligence connects objects (words) by locating events of interest to the end user and storing these in a relational database. Once the events are in the database, different models can be built into queries and visualizations that represent the connections. For example, a user can semantically process thousands of Big Pharma patents, tuning the search to look for examples of how each company uses oxygenation or pH levels as a method to fight cancer, and display the results in an X-by-Y chart that shows how each company has chosen to concentrate on one method over the other. This insight is hard to glean from reading alone; there's simply too much data to process.
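To make the database step concrete, here is a small sketch that uses Python's built-in sqlite3 module. It assumes the extraction has already happened: the (company, method) events below are invented, as are the table and function names. The query counts mentions per company and method, which is the raw material for the kind of chart described above.

```python
import sqlite3

# Hypothetical extracted events: one (company, method) pair per patent mention.
EVENTS = [
    ("Company A", "oxygenation"),
    ("Company A", "oxygenation"),
    ("Company A", "pH"),
    ("Company B", "pH"),
    ("Company B", "pH"),
]


def method_counts(events):
    """Store events in an in-memory table and count mentions per company and method."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (company TEXT, method TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", events)
    rows = conn.execute(
        "SELECT company, method, COUNT(*) FROM events "
        "GROUP BY company, method ORDER BY company, method"
    ).fetchall()
    conn.close()
    return rows


for company, method, count in method_counts(EVENTS):
    print(company, method, count)
```

With this invented data, Company A leans toward oxygenation and Company B toward pH, a pattern that would be tedious to spot by reading the patents one at a time.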
By reducing the number of documents researchers must work through, semantic intelligence makes information gathering faster and more efficient, saving time and money at the front end of R&D. For example, it can:
» Ease access to new information in scientific studies: There's a hierarchy among words, and semantics can make precise use of this hierarchy. For example, one study may refer to the antibacterial methenamine, while another mentions a different antibacterial, nitrofurantoin; neither study mentions the other, and a researcher is unaware that both are used for urinary tract infections. Semantic intelligence can make this connection because it tracks the hierarchy of both.
» Accelerate the earliest phase of drug development by recognizing new chemical formulas: All candidate compounds start with a literature review to match them with indications. Often there are more degrees of freedom, or steps that make a connection, between compounds and indications than researchers can discover on their own. Semantic intelligence can capture all of this knowledge.
» Help protect the intellectual property of a patented drug: Many patents are a fog of claims and descriptions. Semantic intelligence can draw out the overlap between two patents no matter how densely they are written.
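The word hierarchy mentioned in the first item above can be pictured as a simple lookup from each term to its broader category. The sketch below is illustrative only: the category labels and function names are assumptions, and a production system would draw on a curated medical vocabulary rather than a four-entry table.

```python
# Hypothetical hierarchy: each term points to its broader category.
PARENT = {
    "methenamine": "urinary antibacterial",
    "nitrofurantoin": "urinary antibacterial",
    "urinary antibacterial": "antibacterial",
    "aspirin": "analgesic",
}


def ancestors(term):
    """Return the chain of broader categories above a term, nearest first."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def shared_category(term_a, term_b):
    """Return the nearest category both terms fall under, or None."""
    above_b = set(ancestors(term_b))
    for category in ancestors(term_a):
        if category in above_b:
            return category
    return None


print(shared_category("methenamine", "nitrofurantoin"))
print(shared_category("methenamine", "aspirin"))
```

Two studies that never cite each other can be linked this way, because both drug names climb the hierarchy to the same category.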
It's common knowledge that health is one of the most frequently searched categories online. A recent report from iCrossing on how Americans search for health questions found that in the last year, patients turned first to the Internet 59 percent of the time. Knowing what consumers are asking and saying about drugs, disease, symptoms, side effects, and other issues can help inform new approaches to medical and marketing communications.
A central concern is, of course, the opinions and other expressions circulated about the company and its drugs through so-called consumer-generated media (CGM): Internet forums, blogs, wikis, social networks, and Twitter (see "Today's New Words!"). Pharma faces unique hurdles with CGM in the following areas:
» Misinformation: Consumer-to-consumer communication can promote inaccurate perceptions of the use of drugs. Semantic intelligence can be used to find misinformation about issues such as dosing, side effects, and "natural" alternatives. Pinpointed "back-messaging" (targeting responses to negative comments) can then help correct the misinformation.
» Off-label Use: CGM can make it appear as if a drugmaker is promoting a product for unapproved uses—and committing fraud. Finding, tracking, and recording the source using semantic intelligence is a form of legal protection.
» Mandated Reporting: A drugmaker is responsible for informing FDA of all reports, including Web-based ones, of a drug's side effects and interactions with other drugs. Semantic intelligence can understand unstructured content and uncover data in the far corners of the Web; companies can report blog comments and online discussions about side effects and other issues.
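As a rough illustration of finding and recording the source of a side-effect mention, the sketch below scans posts for a drug name alongside a side-effect term and keeps the source of each match. It is plain keyword matching, not semantic analysis, and everything in it is hypothetical: the drug name "DrugX," the term list, the posts, and the source labels.

```python
# Hypothetical drug name, side-effect terms, and posts, invented for illustration.
DRUG = "drugx"
SIDE_EFFECT_TERMS = {"nausea", "rash", "dizziness"}

POSTS = [
    {"source": "forum/thread-1", "text": "DrugX gave me a rash after two days"},
    {"source": "blog/post-7", "text": "DrugX worked well for me"},
    {"source": "forum/thread-9", "text": "Felt dizziness, not sure why"},
]


def flag_reports(posts):
    """Return (source, matched terms) for posts naming the drug and a side effect."""
    flagged = []
    for post in posts:
        # Strip trailing punctuation so "rash." still matches "rash".
        words = {w.strip(".,!?").lower() for w in post["text"].split()}
        matched = sorted(words & SIDE_EFFECT_TERMS)
        if DRUG in words and matched:
            flagged.append((post["source"], matched))
    return flagged


for source, terms in flag_reports(POSTS):
    print(source, terms)
```

Only the first post is flagged. The third mentions a side effect but never names the drug, which is exactly the kind of case where keyword matching falls short and an understanding of context is needed.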
Whether in the executive suite, the laboratory, or the field, employees at pharmaceutical companies are simultaneously drowning in information and starved for knowledge. Finding new ways to negotiate the proliferating universe of unstructured data can help focus even drug giants on their most important goal: developing new drugs to target challenging diseases.
J. Brooke Aker is the CEO of Expert System USA, a semantic technologies company. He can be reached at firstname.lastname@example.org.