Volume 36, Issue 10
A practical roadmap to help pharma brand teams avoid the pitfalls in statistical thinking when appraising and presenting data.
Practical tips to help pharma brand teams sidestep the pitfalls in statistical thinking when appraising and presenting product and market data
A recent conference on content marketing encapsulated a growing problem in efforts to glean insights from today’s swell of healthcare information: An astounding number of statistics were presented with no context, alongside infographics that focused far more on visuals and small pictorial elements than on the data, and nobody scrutinized the erroneous and misrepresented numbers taken as a whole.
The very term “infographic” suggests that the “information” is primary and is supported by graphical elements. But in reality, most of them are “graphormation,” where the integrity and accuracy of the data take a back seat to the primary element, which is intended merely as eye candy. These graphics try (and often fail) to convey information. They often show isolated numbers and statistics in adjacent blocks of space with no context. One shouldn’t endeavor to purvey such analytical smut. With a few expert tips, you can certainly avoid it.
For example, a favorite approach for infographics or social marketing tells readers about the “7 Best Practices for [Something],” or “The 8 Ways People Do [Something].” Here’s a tip: There aren’t “7” best practices for anything, or “8” ways people do certain things. Practices, decisions and behavior exist along a continuum, and claiming to quantify this into precise bins is a fallacy. Do XX percent of people really [____]? The concept of being overconfident in data will be explained in the next section. But the very simplicity of infographics makes it easy to misuse them; most are full of spurious correlations, poor decision-making and superficial “science.”
One of the big problems with analyzing big data to look for trends is that there’s no hypothesis to test. Looking at volumes of data after the fact to try to find correlations is a surefire way to convince yourself of causality where none exists. This is the fallacy known as post hoc, ergo propter hoc (after this, therefore because of this): there seems to be some trend of interest in the data and one looks to see what preceded it. The “rush to conclude” bias leads viewers to tie the trend to some recent market launch or ad campaign when no such association exists.
The fact is that so many factors precede a trend without being related to it that a person picking one is more likely than not to choose wrong. Said another way, there are actually many factors that preceded the interesting data trend, and most aren’t the cause; how do you know which one(s) is/are the cause if you didn’t control all of the other variables? You don’t.
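To see how easily chance produces a convincing-looking "cause," consider a minimal simulation sketch. Everything in it is an assumption made for illustration: the trend and the 200 candidate factors are pure random noise, 12 weekly data points stand in for a short reporting window, and the function names and the 0.6 correlation threshold are hypothetical choices, not anything drawn from real market data.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def count_chance_hits(n_factors=200, n_weeks=12, threshold=0.6, seed=1):
    """Count random 'factors' that correlate strongly with a random 'trend'.

    Nothing here is causally linked: every series is independent noise.
    """
    rng = random.Random(seed)
    trend = [rng.gauss(0, 1) for _ in range(n_weeks)]
    hits = 0
    for _ in range(n_factors):
        factor = [rng.gauss(0, 1) for _ in range(n_weeks)]
        if abs(pearson(factor, trend)) >= threshold:
            hits += 1
    return hits

hits = count_chance_hits()
print(f"{hits} of 200 unrelated factors show |r| >= 0.6 with the trend")
```

Any factor this sketch flags is a false lead by construction, which is the point: search enough candidate explanations after the fact and some will appear to fit.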
Another pervasive practice in all data analysis, but most visible to the public in marketing data, is the seemingly unwavering precision of published numbers: “59% of patients refuse their tablet medications,” or “88% of your clients use social media to make decisions.” ALL their decisions? Surely there are still decisions being made by healthcare providers and patients that aren’t being steered by their Twitter feeds? They are still capable of driving to the grocery store, with all of the complexity that that entails, and can choose their own food items, and even make it back home.
Also, the percentage of people who use or rely on social media will not reliably come out at “88%” if different groups are resampled. In most cases, these seemingly precise numbers were drawn from a non-scientific poll or a small sample group from which it is erroneous to extrapolate to the broader population. Every inferential statistic comes with an estimate of uncertainty. The boundaries that estimate sets are called the confidence interval, and it not only makes data analysis more accurate but also drives better business thinking.
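The resampling point can be illustrated with a small sketch. It assumes, purely for the sake of argument, a population in which exactly 75% of people would answer "yes," and it simulates 1,000 separate polls of 40 people each; the rate, the sample size and the function names are all hypothetical.

```python
import random

def poll(true_rate, sample_size, rng):
    """Return the percentage of 'yes' answers in one simulated poll."""
    yes = sum(1 for _ in range(sample_size) if rng.random() < true_rate)
    return 100.0 * yes / sample_size

def repeated_polls(true_rate=0.75, sample_size=40, n_polls=1000, seed=7):
    """Run many independent polls against the same underlying population."""
    rng = random.Random(seed)
    return [poll(true_rate, sample_size, rng) for _ in range(n_polls)]

results = repeated_polls()
print(f"Lowest poll: {min(results):.0f}%  Highest poll: {max(results):.0f}%")
```

The underlying rate never changes, yet individual polls scatter well above and below it. A single published figure from one small sample is just one draw from that scatter.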
Imagine this: If a company is going to make corporate-level decisions about the success of a new product launch based on a sample or a focus group that indicated a 92% adoption rate, it may find itself in a position of extreme financial loss if the actual uptake rate in the broader population is more like 60%-70%. Each percentage point of difference could mean thousands of individuals (or more) not adopting a company’s technology, and in a financial sense, each planned customer conversion that is unrealized is equivalent to potential dollars lost. How is this? Through the concept of time value of money (TVM), where money now is more valuable than the same amount of money later. Whether missed customer opportunities or uninvested dollars, each is an opportunity loss, and that loss was directly driven by an overestimate of the accuracy of statistics.
Confidence intervals act as a safety net for the ability to “know” what is going on within the data. Think about driving on the street: Each lane has a bit of buffer space to your left and right, and then there are lane markers (lines) to keep you honest. Presumably, if everyone operated their vehicle within the painted lines, there would be no accidents. The same could be true of a business that operates within the limits of what its data can actually support.
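A back-of-the-envelope sketch makes the arithmetic concrete. The 92% forecast comes from the scenario above and 65% is a midpoint of the 60%-70% range; the market size, revenue per customer, discount rate and two-year delay are hypothetical figures chosen only for illustration, as are the function names.

```python
def missed_revenue(market_size, forecast_rate, actual_rate, revenue_per_customer):
    """Revenue planned for but not realized when uptake falls short of forecast."""
    missed_customers = market_size * (forecast_rate - actual_rate)
    return missed_customers * revenue_per_customer

def present_value(amount, annual_rate, years):
    """Value today of an amount received `years` from now (time value of money)."""
    return amount / (1 + annual_rate) ** years

# Hypothetical inputs: 100,000 potential customers worth $500 each.
shortfall = missed_revenue(market_size=100_000, forecast_rate=0.92,
                           actual_rate=0.65, revenue_per_customer=500)

# Hypothetical 8% annual discount rate, revenue recovered two years late.
delayed = present_value(shortfall, annual_rate=0.08, years=2)

print(f"Revenue shortfall versus forecast: ${shortfall:,.0f}")
print(f"Same sum recovered two years late, in today's dollars: ${delayed:,.0f}")
```

Even if the missing customers are eventually won, the delayed revenue is worth less than it would have been on schedule, which is why an overconfident forecast costs money either way.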
Let’s use the driving analogy: Pay attention to your market and your data (keep your eyes on the road), and don’t take overly risky actions and veer outside your data’s confidence intervals (traffic lanes), and you’ll be doing a great deal to insulate your business from unnecessary risk.
In the example depicted in the chart at right, 60 people were polled to ask the likelihood that they would make a particular choice; the average (mean) is 53%. To ignore the confidence interval is to discard a significant amount of information about how uncertain the estimate is. The 95% confidence interval for the mean ranges from 43%-64%; loosely, this means that intervals built this way capture the true value 19 times out of 20, so the actual average in the population being sampled could plausibly lie anywhere between 43% and 64%, and it’s not known with any more precision than that. The gap between those two ends represents a huge difference.
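For readers who want to see where such an interval comes from, here is a minimal sketch of the calculation. The raw responses behind the chart are not available, so the 60 values below are randomly generated stand-ins and will not reproduce the 43%-64% figure; the function name is hypothetical, and the sketch uses the normal approximation (a multiplier of about 1.96) rather than the slightly wider t-distribution a statistician would typically apply to a sample of this size.

```python
import random
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(values, confidence=0.95):
    """Return (mean, lower, upper) using the normal approximation."""
    n = len(values)
    m = mean(values)
    standard_error = stdev(values) / n ** 0.5
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return m, m - z * standard_error, m + z * standard_error

# Hypothetical data: 60 stated likelihoods between 0% and 100%.
rng = random.Random(42)
responses = [rng.uniform(0, 100) for _ in range(60)]

m, low, high = mean_confidence_interval(responses)
print(f"Mean: {m:.0f}%  95% CI: {low:.0f}% to {high:.0f}%")
```

Two levers narrow the interval: less variation among respondents and, more practically, a larger sample, since the standard error shrinks with the square root of the number of people polled.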
Another concept that companies may not give enough credit to, and that is often executed incorrectly, is framing. This refers to putting numbers in a context (or framing them) in such a way that they increase the likelihood that customers will make certain choices.
Example: If one works in clinical settings with physicians and patients, there is a much greater chance that people will choose a particular treatment if it is framed as having a 90% success rate at treating their particular condition than if they are told the drug has a 10% failure rate. Even though the 10% is simply the complement of the success rate (the remainder beyond the 90%, for whom there was no treatment effect), it isn’t perceived that way. And this trips up physicians very often, as well as patients. Why? Because human beings have emotional reactions to “success” and “failure,” and our brains are not very good at working with probabilities.
Does someone want to engage with your content? Are they searching your site for particular things, or just browsing? How about email “hit rate”? For this, I’ve coined the term “clickiness” to describe how differently customers respond depending upon what they are looking at. There are non-random patterns in web traffic’s clickiness, where certain sites do better than others, and within sites, there are zones, sections, and content that are more trafficked and “clicky” than others.
What defines this? Certainly, there are visual cues, and some people respond more strongly than others to certain visual stimuli; because of this, there will never be one type of site or content that always outperforms others. The subjective variance in the “clickers” will always leave noise in the data. For example, eye-tracking analyses (which record where subjects’ eyes tend to track on a given image) show differences in the focus of visual interest, with some areas of an image attracting the most attention from users’ eyes, while other areas receive relatively little visual scrutiny.
So, to enhance your content’s clickiness, you’d want to build up the visuals and other stimuli in such a way that their density drives proportionally greater traffic to the places where visitors’ presence brings you the most value.
When it comes to brand marketing in the biopharmaceutical and healthcare space, there are several overlooked but common pitfalls in statistical thinking, and they require an improved approach by drugmakers in appraising their own data. Small, smartly taken decisions in looking at and analyzing data can make a considerable difference in keeping an organization’s analytics from becoming a quantitative disaster.