We’ve spoken with dozens, if not hundreds, of marketing teams that have considered building a lead scoring model internally. They’ve already got the lead scoring basics figured out, and both sales and marketing are reaping the benefits of it. They’ve got a ton of data and are now ready to take it to the next level. And they’ve got the resources to help them do some impressive data analysis and build an algorithm specifically for their use case - bonus!
Not so fast… Given the extensive experience we have with the matter, we thought we'd share some of the most frequently overlooked aspects of such an endeavor. Check out the top 10 things to consider below!
We've already covered a lot of this in our article on build vs. buy here. The gist of it is that, while it might seem straightforward to build some point-based or even basic regression-based models, the opportunity cost is high. Opportunity cost is probably one of the most underrated metrics in startup-land, yet it is fundamental. Sure... building it in-house doesn't cost you anything from a "new cost = new line on the financial report" perspective, but it does cost you in features that won't be shipped because your engineering team is busy scoring leads. Growth engineers are better off building experiences and automation than optimizing internal alignment.
The very basic cost calculator we've put together is pretty eye-opening.
One of the most challenging aspects of building a lead scoring model is that your data engineers aren't necessarily marketers, and even fewer have any background in sales, even though sales reps are the end users of the lead score you are trying to implement. Sales and marketing alignment is critical, so you also need to address some of the fundamental differences in how the two teams think and operate.
It is critical when building a lead score model to bear in mind that it will be evaluated by Sales as much as it will be by Marketing.
We recommend companies use their recurring sales+marketing meetings to better define the requirements for a lead score model that is most likely to be adopted by Sales. Knowing precisely how the model will be used and surfaced will determine which trade-offs can be made. For example, I've seen companies hide the scores from reps to avoid having to debate why one lead is a 92 and another an 88… #facepalm
Neural nets were all the rage a couple of years ago and, while AI is no longer the hottest topic, there is still a lot of noise around machine learning. There is absolutely no need for fancy algorithms in B2B sales: the datasets are simply too small to warrant anything fancier than basic regression algorithms or maybe a kNN.
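To make "basic regression" concrete, here is a minimal sketch of a lead scoring model built with scikit-learn. The CSV file and column names (employee_count, web_traffic, is_target_industry, converted) are hypothetical stand-ins for whatever your CRM export contains, and the features are assumed to already be numeric:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical training data: one row per historical lead, exported from your CRM.
leads = pd.read_csv("historical_leads.csv")
features = ["employee_count", "web_traffic", "is_target_industry"]

X_train, X_test, y_train, y_test = train_test_split(
    leads[features], leads["converted"], test_size=0.2, random_state=42
)

# A plain logistic regression is usually enough for B2B-sized datasets.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Express the predicted conversion probability as a 0-100 score.
leads["score"] = (model.predict_proba(leads[features])[:, 1] * 100).round()
print(roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```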
I've spent the past 10 years building models for B2B and reading research papers on the lookout for a breakthrough, but there hasn't been one to date. If you're curious, the most interesting paper I've seen is how IBM built size-of-wallet predictions (see here for the full paper).
As you'll see in the next few paragraphs, the complexity instead lies in the data prep for the model and the success metric.
Weren't we all so disappointed when the promise of Big Data analytics platforms failed to deliver because of "bad data"? Most Go-To-Market teams realized they were unable to unlock the benefits of these platforms. This IBM study claimed that poor data quality costs US businesses over $3.1 trillion per year.
Data can be "Big" along any of the 4 Vs referenced in the IBM article. I also believe that data can be "Bad" for any of the following reasons:
B2B marketing teams are constantly seeking the best enrichment tools that will unlock the ever-elusive 100% match rate. In the meantime, we have to deal with sparse records in Salesforce missing company size, industry, and even HQ country. This missing data makes it harder to run a regression on top of your data. It also explains why Salesforce Einstein hasn't crushed every single lead score model built outside of the platform. During our incubation by Salesforce, we ran a test to validate this assumption. The TL;DR is that a model with fewer but more consistently populated datapoints outperformed a model using 5 features (company size, industry, traffic, technologies used, and 1 custom data point from the form) by 50%.
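Before choosing features, it is worth quantifying how sparse your CRM actually is. A quick sketch, assuming a hypothetical Salesforce export with the field names shown and an arbitrary 70% coverage cutoff:

```python
import pandas as pd

leads = pd.read_csv("salesforce_leads.csv")  # hypothetical CRM export

# Share of leads where each candidate feature is actually populated.
candidate_features = ["company_size", "industry", "hq_country",
                      "web_traffic", "technologies_used"]
coverage = leads[candidate_features].notna().mean().sort_values(ascending=False)
print(coverage)

# Keep only features present on, say, 70%+ of leads: a smaller but denser
# feature set often beats a wider, sparser one.
kept = coverage[coverage >= 0.70].index.tolist()
print("Features kept for the model:", kept)
```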
Figuring out how you manage self-input information is also more complicated than it may seem. Many of our customers complain that the model they've built internally relies heavily on a few fields from specific forms (e.g. # of sales people for sales automation tools, # of images on the website for cloud hosting). The challenge is that they know they should reduce the number of fields on their forms to increase conversion rates, but if they remove these heavily weighted fields, all of their leads will get scored as medium quality. You want to avoid having the same person (defined as having the same email) be scored 95 and 55 in your system just because of the absence of that self-input data. Depending on what you are optimizing for (increasing the MQL-to-SQL conversion rate, solving for capacity, or making revenue more predictable), you will need to configure your scoring mechanism differently to handle self-input information.
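One way to keep the same email from scoring 95 one day and 55 the next is to fall back on enrichment or previously submitted values whenever the form field is blank. A rough sketch, with hypothetical field names and an in-memory dictionary standing in for your marketing automation database:

```python
# Hypothetical cache of earlier form submissions, keyed by email.
previous_submissions = {"jane@acme.com": 40}

def sales_team_size(lead: dict, enriched: dict):
    """Prefer the self-input form value, fall back to enrichment data,
    then to the last value this email address ever submitted."""
    return (
        lead.get("form_num_salespeople")              # self-input, often missing
        or enriched.get("estimated_sales_headcount")  # enrichment provider
        or previous_submissions.get(lead.get("email"))
    )

lead = {"email": "jane@acme.com"}          # form field left blank this time
print(sales_team_size(lead, enriched={}))  # -> 40, so the score stays stable
```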
The TL;DR still holds true: no amount of algorithmic brute force will ever make up for appropriate data preparation.
I've found that most marketing operators were not trained to keep behavioral and firmographic signals separate. I would often see scoring models that allocate +50 pts for a demo request and +20 pts for being an executive. While this might seem to make sense at first, it quickly creates operational problems, the main one being that you are forcing a hierarchy between intrinsic attributes and intent signals.
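One way to enforce that separation is to compute a behavioral score and a firmographic score independently and only combine them at routing time. The weights and grade thresholds below are made up purely for illustration:

```python
# Illustrative weights only; real weights should come from your own data.
BEHAVIORAL_WEIGHTS = {"demo_request": 50, "pricing_page_view": 20, "webinar_attended": 10}
FIRMOGRAPHIC_WEIGHTS = {"is_executive": 20, "target_industry": 30, "over_100_employees": 30}

def score(signals, weights):
    """Sum the weights of the signals that are present."""
    return sum(weights[s] for s in signals if s in weights)

def grade(lead_signals, account_signals):
    """Two independent scores, combined only at the end for routing."""
    behavior = score(lead_signals, BEHAVIORAL_WEIGHTS)    # intent
    fit = score(account_signals, FIRMOGRAPHIC_WEIGHTS)    # intrinsic
    return ("A" if fit >= 50 else "B" if fit >= 20 else "C",
            1 if behavior >= 50 else 2 if behavior >= 20 else 3)

print(grade({"demo_request"}, {"target_industry", "over_100_employees"}))  # ('A', 1)
```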
We recommend starting with the ideal business goal before building anything and writing down how each campaign response should be evaluated for being MQL-ed. In general, we'd recommend building a model to tier your hand-raisers' SLA (which hand-raisers should be contacted within 5min vs 2h vs 24h). Then you can work your way down to other campaign response types.
When building a lead score, you essentially want to mimic the qualification process your best reps go through. While the BANT framework (Budget, Authority, Need, Timing) is extremely helpful, the order of the acronym, along with lumping the four elements together, can be misleading. From a business standpoint, we would rather use NBTA: Need, Budget, Timing, Authority.
Speaking to someone with authority but no budget is a classic time-sink. By contrast, speaking to the wrong person at the right company simply means you need to work your way up the authority ladder within the org before marking the deal as qualified.
This is why we highly recommend building a firmographic model to predict B & N, then layering on the A component. T is usually predicted through intent data.
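Here is a minimal sketch of that layering; the thresholds and inputs are illustrative, and in practice the fit score would come from your firmographic model and the timing signal from your intent-data provider:

```python
def qualify(fit_score: float, has_authority: bool, intent_score: float) -> str:
    """Layered qualification in NBTA order; all thresholds are illustrative.

    fit_score:     output of a firmographic model predicting Need & Budget (0-1)
    has_authority: derived from the contact's title or seniority (A)
    intent_score:  from intent / engagement data, a proxy for Timing (T)
    """
    if fit_score < 0.4:
        return "disqualify"                        # wrong company, stop here
    if not has_authority:
        return "right company, work up the org"    # A only matters after B & N
    if intent_score < 50:
        return "qualified, low urgency"
    return "qualified, route to a rep now"

print(qualify(0.8, has_authority=False, intent_score=70))
```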
I've seen many companies task an in-house data scientist to build a lead scoring model. The core challenge with this is that the typical model performance metrics (f1-score, R^2, AUC, etc.) aren't meaningful in B2B lead scoring. False negatives and false positives don't have the same impact and that impact differs from one company to another. For instance, a false negative (lead scored as poor quality but is, in fact, high quality) is going to hurt your top line because you might not have a rep talk to them. But a false positive (lead scored as high quality but is, in fact, poor quality) can destroy trust in your model.
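One way to make that asymmetry explicit is to evaluate candidate models against a business cost function rather than accuracy or F1. The dollar figures below are made up; the point is that the ratio between them should reflect your own funnel:

```python
import numpy as np

# Made-up business costs: a false negative (missed good lead) costs pipeline,
# a false positive (bad lead sent to sales) costs rep time and trust.
COST_FALSE_NEGATIVE = 500
COST_FALSE_POSITIVE = 50

def business_cost(y_true, y_pred):
    """Total cost of a model's errors, weighting FN and FP differently."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return fn * COST_FALSE_NEGATIVE + fp * COST_FALSE_POSITIVE

# Two models with identical accuracy can have very different business cost.
print(business_cost([1, 1, 0, 0], [0, 1, 0, 1]))  # 1 FN + 1 FP -> 550
```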
I've heard countless stories of models weighing company size heavily and therefore scoring universities as A-grade leads. Once your sales reps have seen this, it will be an uphill battle to convince them to trust any lead score regardless of your f1-scores and AUC numbers.
At MadKudu, we've created a custom error function to measure the performance of the models we build for our customers. Put simply, it has four components that we recommend implementing.
In any case, remember that the final users of lead scores are sales reps, and therefore spot-checking is a critical part of validating that the model is ready to go live.
Class imbalance is the machine learning problem that arises when one class of data (the positives, here conversions) is far less common than the other (the negatives). The challenge is that ML models work better when the classes are roughly equal in size, which is rarely the case with lead conversion.
To make this more obvious, consider the exaggerated scenario where only 1% of leads convert to SQO and you want to build a model that predicts whether a given lead will convert. The typical approach is to use logistic regression since the outcome is binary; however, a model that always predicted 0 (no conversion) would be right 99% of the time.
To mitigate this, you should combine sampling (rebalancing the training set so the rare positive class is better represented) with boosting (ensemble methods that iteratively put more weight on the examples they get wrong).
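Here is a hedged sketch of both ideas with scikit-learn, on simulated data with a ~1% positive rate; in a real setup you would of course oversample only the training split to avoid leakage:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.utils import resample

# Simulated stand-in for your lead features and a binary "converted to SQO" label.
rng = np.random.default_rng(42)
X = rng.normal(size=(5000, 4))
y = (rng.random(5000) < 0.01).astype(int)   # ~1% conversion rate

# 1) Sampling: oversample the rare positive class so the model sees it often enough.
pos, neg = X[y == 1], X[y == 0]
pos_up = resample(pos, replace=True, n_samples=len(neg), random_state=42)
X_bal = np.vstack([neg, pos_up])
y_bal = np.array([0] * len(neg) + [1] * len(pos_up))

# 2) Boosting: an ensemble that keeps re-weighting the examples it gets wrong.
model = GradientBoostingClassifier(random_state=42)
model.fit(X_bal, y_bal)

# Rank leads by predicted probability rather than relying on raw accuracy.
print(model.predict_proba(X[:5])[:, 1])
```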
We've spoken about Simpson’s paradox many times in the past. Simpson’s paradox is highly relevant in marketing, especially when looking to build a lead scoring model.
Companies will generally only prospect "Good" leads, but the conversion rate of those outbound leads (from lead created to any deal stage) is expected to be roughly 10x lower than that of an average inbound lead. When we look at the overall distribution of leads in a CRM, we tend to see a 50/50 split between good and bad leads, mainly because outbound keeps adding good-fit leads that convert poorly. When we look at inbound leads only, however, the ratio is closer to 20/80.
Looking at the chart above, it seems like the average conversion rate of Good leads is lower than that of Bad leads (8.4% vs 8.6%). However, when we look at each channel individually, Good leads consistently convert at roughly 3x the rate of Bad leads (30% vs 9% for inbound; 3% vs 0.7% for outbound). Building the model on the overall dataset instead of at the channel level is therefore what leads it to the wrong conclusion.
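You can reproduce the flip with a few lines of arithmetic; the lead counts below are hypothetical but consistent with the rates quoted above:

```python
# Hypothetical lead counts per segment, with the per-segment conversion rates above.
segments = [
    # (channel, quality, leads, conversion_rate)
    ("inbound",  "good", 200, 0.30),
    ("inbound",  "bad",  950, 0.09),
    ("outbound", "good", 800, 0.03),
    ("outbound", "bad",   50, 0.007),
]

for quality in ("good", "bad"):
    leads = sum(n for _, q, n, _ in segments if q == quality)
    convs = sum(n * r for _, q, n, r in segments if q == quality)
    print(quality, f"{convs / leads:.1%}")

# Prints good 8.4% and bad 8.6%: the aggregate ordering flips, even though
# good leads convert ~3x better within each channel.
```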
We recommend building the training cohort from a single channel with uniform intent, to ensure the main conversion rate differentiator is firmographic quality.