Improve your behavioral lead scoring model with nuclear physics

According to various sources (SiriusDecisions, Spear Marketing), about 66% of B2B marketers leverage behavioral lead scoring. Nowadays we rarely encounter a marketing platform that doesn’t offer at least point-based scoring capabilities out of the box.

However, this report by Spear Marketing reveals that only 50% of those scores include an expiration scheme. A dire consequence is that once a lead has reached a certain engagement threshold, the score will not degrade. As the report puts it, “without some kind of score degradation method in place, lead scores can rise indefinitely, eventually rendering their value meaningless.” We’ve seen this at countless companies we’ve worked with. It is often a source of contention between Sales and Marketing.

So how do you go about improving your lead scores to ensure your MQLs get accepted and converted by Sales at a higher rate?

Phase 1: Standard Lead scoring

In the words of James Baldwin, “If you know whence you came, there are absolutely no limitations to where you can go”. So let’s take a quick look at how lead scoring has evolved over the past couple of years.

Almost a decade ago, Marketo revolutionized the marketing stack by giving marketers the option to build heuristic engagement models without writing a single line of code. Amazing! A marketer, no coding skills required, could configure and iterate over a function that scored an entire database of millions of leads based on specific events they performed.

Since the introduction of these scoring models, many execution platforms have emerged. According to Forrester, scoring has long been a standard feature to expect when shopping for marketing platforms.

This was certainly a good start. The scoring mechanism had, however, two major drawbacks over which much ink has been spilt:

  • The scores don’t automatically decrease over time
  • The scores are based on coefficients that were not determined statistically and thus cannot be considered predictive
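
To make the first drawback concrete, here is a minimal sketch of a point-based score. The event names and point values are invented for illustration and do not come from any particular platform. Since the function only looks at which events happened and never at when, two leads with the same history get the same score no matter how old that history is.

```python
from collections import Counter

# Hypothetical, hand-picked point values (not statistically derived).
POINTS = {"email_click": 5, "form_fill": 20, "pricing_page_visit": 10}


def point_score(events):
    """Sum the hand-picked points over every event a lead has ever performed."""
    counts = Counter(events)
    return sum(POINTS.get(name, 0) * n for name, n in counts.items())


# Same events; imagine the second lead performed them two years ago.
recent_lead = ["form_fill", "pricing_page_visit"]
stale_lead = ["form_fill", "pricing_page_visit"]

print(point_score(recent_lead), point_score(stale_lead))
```

Both leads end up with an identical score, which is exactly the kind of result Sales tends to push back on.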

Phase 2: Regression Modeling

The recent advent of the Enterprise Data Scientist, formerly known as the less hyped Business Analyst, started a proliferation of lead scoring solutions. These products leverage machine learning techniques and AI to compensate for the previous models’ inaccuracies. The general idea is to solve for:

Y = ∑𝞫.X + 𝞮


Y is the representation of conversion
X are the occurrences of events
𝞫 are the predictive coefficients


So really the goal of lead scoring becomes finding the optimal 𝞫. There are many more or less sophisticated implementations of regression algorithms to solve for this, from linear regression to trees, to random forests to the infamous neural networks.
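
As a rough illustration of what “finding the optimal 𝞫” means, here is a minimal sketch of the simplest option, ordinary least squares, in plain Python. The helper name fit_linear, the two event types and the training data are all made up for the example. Like the formula above, the sketch has no intercept term, and a real product would more likely rely on a statistics library and a model better suited to yes/no outcomes.

```python
def fit_linear(X, y):
    """Ordinary least squares without intercept: solve (X^T X) beta = X^T y."""
    k = len(X[0])
    # Augmented normal-equation matrix [X^T X | X^T y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * target for row, target in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Made-up training data: one row per lead, columns = [form fills, email clicks].
X = [[2, 1], [0, 3], [1, 0], [0, 1], [3, 2], [0, 0]]
# 1 = the lead converted, 0 = it did not.
y = [1, 0, 1, 0, 1, 0]

beta = fit_linear(X, y)
print(dict(zip(["form_fill", "email_click"], beta)))
```

On this toy data the coefficient for form fills comes out larger than the one for email clicks, because in the sample only leads that filled forms converted.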

Mainstream marketing platforms like HubSpot are adding some predictive capabilities to their manual lead scoring.

The goal here has become helping marketers configure their scoring models programmatically. Don’t we all prefer to blame a predictive model rather than a human who hand-picked coefficients?!

While this approach is greatly superior, there is still a major challenge that needs to be addressed:

  • Defining the impact of time on the scores

After how long does having “filled a form” become irrelevant for a lead? What is the “thermal inertia” of a lead, aka how quickly does a hot lead become cold?

Phase 3: Nuclear physics inspired time decay functions

I was on my way home some time ago, when it struck me that there was a valid analogy between leads and nuclear physics, a subject in which my co-founder Paul holds a master’s degree from Berkeley (true story). The analogy goes as follows:
Before the lead starts engaging with (or being engaged by) the company, it is a stable atom. Each action performed by the lead (clicking on a CTA, filling out a form, visiting a specific page) results in the lead gaining energy, moving it further from its stable point. The nucleus of an unstable atom will start emitting radiation to lose the gained energy. This process is called nuclear decay and is quite well understood. The time taken to release the energy is characterized by the half-life (λ) of the atom. For each individual action, we can now compute its impact on a lead over time and how long the effects last.
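
Here is a minimal sketch of how the effect of a single action could fade, assuming plain exponential decay with a factor of e^(-t/λ). The function names and the 30-day value are hypothetical. One caveat on vocabulary: when λ is used as the time constant in e^(-t/λ), the time after which exactly half of the effect remains is λ·ln 2, so the sketch computes that separately.

```python
import math


def decay_weight(days_since_event, lam):
    """Fraction of an action's initial effect left after `days_since_event` days."""
    return math.exp(-days_since_event / lam)


def half_life_days(lam):
    """Days after which half of the effect remains, given the time constant `lam`."""
    return lam * math.log(2)


LAM_FORM_FILL = 30.0  # hypothetical time constant, in days

for days in (0, 7, 30, 90):
    print(days, round(decay_weight(days, LAM_FORM_FILL), 3))
```

The weight starts at 1 on the day of the action and shrinks smoothly toward 0, which is the “hot lead going cold” behavior we are after.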

Putting all the pieces together we are now solving for:

Y = ∑𝞫.f(X).e^(-t(X)/λ) + 𝞮


Y is still the representation of conversion
X are the events
f are the feature functions applied to X
t(X) is the number of days since the last occurrence of X
𝞫 are the predictive coefficients
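λ are the “half-lives” of the events in days (strictly speaking, with a decay factor of e^(-t/λ), λ is the mean lifetime and the actual half-life is λ·ln 2)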
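
Putting the definitions above into code, here is a minimal sketch of how a single lead could be scored once 𝞫 and λ are known. The coefficients, time constants and event names are invented for illustration rather than fitted on real data, and the error term 𝞮 is left out since it only matters when training the model.

```python
import math

# Hypothetical per-event coefficients (beta) and time constants (lam, in days).
MODEL = {
    "form_fill": {"beta": 0.45, "lam": 30.0},
    "email_click": {"beta": 0.10, "lam": 7.0},
}


def decayed_score(lead_events, model=MODEL):
    """Sum beta * f(X) * exp(-t(X) / lam) over the lead's events.

    `lead_events` maps an event name to a pair:
    (feature value f(X), days since the last occurrence t(X)).
    """
    score = 0.0
    for name, (feature_value, days_since_last) in lead_events.items():
        params = model.get(name)
        if params is None:
            continue  # events the model does not know contribute nothing
        decay = math.exp(-days_since_last / params["lam"])
        score += params["beta"] * feature_value * decay
    return score


hot_lead = {"form_fill": (1, 0), "email_click": (3, 1)}
cold_lead = {"form_fill": (1, 365), "email_click": (3, 400)}

print(decayed_score(hot_lead), decayed_score(cold_lead))
```

The two leads have the same event counts, yet the cold one scores close to zero. If that lead fills out a form again, t(X) drops back to 0 and the score recovers, which is how reactivation shows up in this model.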


This approach yields better results (~15% increase in recall) and accounts very well for leads being reactivated or going cold over time.

Top graph: linear features; bottom graph: features with exponential decay


Next time we’ll discuss how, unlike Schrödinger’s cat, leads can’t be simultaneously good and bad…

