In short, Simpson’s Paradox occurs when groups of data each show one trend, but that trend reverses when the groups are combined. It was made famous by a study of gender bias in UC Berkeley’s 1973 graduate school admissions. When all of the data was pooled together, the figures showed that male applicants were more likely than female applicants to be admitted.
However, when the data was broken down by individual department, it told a different story. Looking at the six largest departments (as seen in the table below), the researchers concluded that women tended to apply to competitive departments with low admission rates, while men tended to apply to less-competitive departments with high admission rates.
Let’s take a look at an example a little closer to home. When marketers look at good versus bad leads, they generally assess them based on the overall conversion rate. In the example below, you’d likely conclude that Good Leads are performing worse than Bad Leads, right?
However, when Good and Bad Leads are broken down by Lead Source, the data tells a different story: Good Leads outperform Bad Leads in every lead source.
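To make the reversal concrete, here is a minimal Python sketch using made-up counts. The `DATA` values and the source names "Paid Search", "Events", and "Organic" are hypothetical and are not the figures from the example above. The sketch pools conversions across all sources, computes the rate within each source, and flags a Simpson’s reversal when Good Leads win in every source but lose once the sources are combined.

```python
from collections import defaultdict

# Hypothetical (leads, conversions) counts per (lead source, lead type).
# These numbers are invented for illustration only.
DATA = {
    ("Paid Search", "Good"): (100, 30),
    ("Paid Search", "Bad"): (900, 225),
    ("Events", "Good"): (900, 45),
    ("Events", "Bad"): (100, 4),
    ("Organic", "Good"): (200, 20),
    ("Organic", "Bad"): (200, 16),
}


def rate(leads: int, conversions: int) -> float:
    """Conversion rate as a fraction; 0.0 when there are no leads."""
    return conversions / leads if leads else 0.0


def overall_rates(data):
    """Pool every source together and compute one rate per lead type."""
    totals = defaultdict(lambda: [0, 0])  # lead type -> [leads, conversions]
    for (_source, lead_type), (leads, conversions) in data.items():
        totals[lead_type][0] += leads
        totals[lead_type][1] += conversions
    return {t: rate(n, c) for t, (n, c) in totals.items()}


def rates_by_source(data):
    """Compute one rate per (source, lead type) without pooling."""
    by_source = defaultdict(dict)
    for (source, lead_type), (leads, conversions) in data.items():
        by_source[source][lead_type] = rate(leads, conversions)
    return dict(by_source)


def is_simpson_reversal(data, a="Good", b="Bad"):
    """True if `a` beats `b` in every source but loses once sources are pooled."""
    overall = overall_rates(data)
    per_source = rates_by_source(data)
    wins_everywhere = all(r[a] > r[b] for r in per_source.values())
    return wins_everywhere and overall[a] < overall[b]


if __name__ == "__main__":
    for lead_type, r in overall_rates(DATA).items():
        print(f"Overall {lead_type}: {r:.1%}")
    for source, r in rates_by_source(DATA).items():
        print(f"{source}: Good {r['Good']:.1%} vs Bad {r['Bad']:.1%}")
    print("Simpson's reversal:", is_simpson_reversal(DATA))
```

With these hypothetical numbers, Good Leads convert at roughly 7.9% overall versus 20.4% for Bad Leads, yet they win in all three sources. The cause is the mix: most Good Leads come from the low-converting Events source, which drags their pooled average down.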
The moral of this data science phenomenon? Beware of averages.
The same misinterpretation of data shows up in the Product-Led Growth (PLG) Funnel (i.e., your product signup and onboarding funnel). We’ll use a real-world example to show exactly how this played out for one PLG company, and how they solved it.
This particular customer was generating thousands of product signups each month. Amazing, right? However, they soon discovered that fewer than half of those signups returned to the product on day two. As many of us would, leadership saw this as a significant problem in the funnel (a real leak in the bucket!) and something they needed to reallocate resources to solve.
That’s when our data science team came in to take a closer look. Unfortunately, as is the case for many B2B organizations, the data quality was poor. When we scored the product signups with the custom-built customer fit model, we found that leads identified as quality leads (i.e., they had a need to buy) had a day-two retention rate greater than 80%. For those same leads, retention on day 25 was over 75% (i.e., they were still active users after 25 days)!
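The same segmentation idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `Signup` record, the `"high"`/`"low"` fit labels, and the cohort sizes are all hypothetical stand-ins, not the customer’s data or their custom-built fit model. It computes day-two and day-25 retention for the pooled signups and for each fit segment separately.

```python
from dataclasses import dataclass


@dataclass
class Signup:
    # Hypothetical record; `fit` stands in for the output of a customer
    # fit model, which in the story above is custom-built and not shown.
    fit: str             # "high" or "low"
    active_day_2: bool   # returned to the product on day two
    active_day_25: bool  # still active on day 25


def make_cohort(fit: str, n: int, n_day_2: int, n_day_25: int) -> list:
    """Build n synthetic signups; assumes n_day_25 <= n_day_2 <= n."""
    return [Signup(fit, i < n_day_2, i < n_day_25) for i in range(n)]


def retention(signups: list, attr: str) -> float:
    """Share of signups whose boolean attribute `attr` is True."""
    if not signups:
        return 0.0
    return sum(getattr(s, attr) for s in signups) / len(signups)


def retention_by_fit(signups: list, attr: str) -> dict:
    """Retention per fit segment, plus the pooled figure under 'all'."""
    segments = {"all": signups}
    for s in signups:
        segments.setdefault(s.fit, []).append(s)
    return {name: retention(group, attr) for name, group in segments.items()}


# Hypothetical cohorts: a small high-fit group and a large low-fit group.
SIGNUPS = make_cohort("high", 100, 85, 78) + make_cohort("low", 400, 120, 40)

if __name__ == "__main__":
    for attr in ("active_day_2", "active_day_25"):
        rates = retention_by_fit(SIGNUPS, attr)
        print(attr, {k: f"{v:.0%}" for k, v in rates.items()})
```

In this made-up cohort, pooled day-two retention is 41%, which looks like a leak, while the high-fit segment retains 85% on day two and 78% on day 25. The low pooled figure comes from the large share of low-fit signups, not from the onboarding experience itself.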
So it turns out the company did not have a day-two retention problem. Instead, they had a conversion problem on their hands. This realization helped them shift their efforts: the team began focusing sales and marketing energy on leads that were identified as highly qualified and more likely to convert. When they did this, their conversions started to increase. Who doesn’t love a happy ending?!
It’s imperative that, when examining your product signup and onboarding funnel, you slice the data in different ways so you don’t end up focusing energy and resources on a problem that doesn’t exist. It’s also important to keep in mind that, while PLG companies tend to use their product as the primary demand generation vehicle, not all signups are created equal. More on that in another chapter…