Hundreds of small businesses sign up on Instamojo every day. As data analysts on the team, one of our many jobs is to sift through all the data and build scalable models for other teams to use. One such problem we recently worked on was a lead-scoring project that not only made the sales team's life simpler but also helped us get the most out of our leads.
Today, I’ll take you through how we used machine learning to build a lead-scoring model at Instamojo. – Kaustub Rao, Lead Data Analyst @ Instamojo.
How to score leads? – The problem statement
At Instamojo, the sign-up and onboarding process is very simple. That said, not everyone who signs up completes the onboarding process and starts collecting payments or sets up an online store.
We observed that having a sales representative call every onboarded lead and guide them through the platform (shortly after onboarding) can have a great impact on their probability of conversion (i.e. starting to collect payments).
It can also significantly improve the amount and number of transactions these businesses process through Instamojo while also improving retention over time.
However, our sales team has limited bandwidth and cannot contact all of these merchants.
How can we leverage Machine Learning to qualify leads that are worth contacting?
Qualifying leads – the approach
To determine what approach to take, we focussed on what we already know about the businesses/leads signing up and combined it with the requirement at hand:
- Lead qualification needs to happen soon after a merchant onboards.
- We only have information that was collected during the onboarding process and limited data related to the merchant’s activity on Instamojo for the first few days post onboarding.
A naive approach would be to use this information to predict a merchant's payment volume and then send the best leads to sales. But we found a couple of problems with this approach:
- It assumes that the optimum use of a sales rep's time is to contact merchants that will bring in a high volume of payments, which may not be the case. Such a merchant may bring in the same volume irrespective of whether a sales rep contacts them or not.
- Materialised GMV (gross merchandise value) is really a product of two aspects of a merchant:
  - Potential – how big is the merchant's entire business?
  - Intent – how likely is this merchant to collect some or all of their payments via Instamojo?
We felt that the best approach would be to try to segment merchants based on the above traits (potential & intent) first, and not place any assumptions on who the best leads to send to sales may be.
Modelling Potential and Intent of the leads
Supervised vs Unsupervised learning
When it comes to determining the potential size of a business, a supervised approach would not really work. We have no true labels even for historical data. In such cases, it is better to use unsupervised approaches, such as clustering your data based on relevant variables and studying the profile of clusters to determine which clusters are high potential.
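To make the clustering idea concrete, here is a minimal sketch of k-means written with only the Python standard library. It is an illustration under stated assumptions, not the model described in this post: the two profile features (`has_website`, `social_following`), their values, and the choice of two clusters are all hypothetical. In practice a library implementation such as scikit-learn's `KMeans` would be the more usual choice.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, centroids, max_iterations=50):
    """Plain k-means: alternate between assigning points and moving centroids."""
    for _ in range(max_iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            clusters[nearest(point, centroids)].append(point)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        updated = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break  # assignments have stopped changing
        centroids = updated
    labels = [nearest(point, centroids) for point in points]
    return centroids, labels

# Hypothetical profile features, scaled to [0, 1]: (has_website, social_following)
profiles = [(1.0, 0.9), (1.0, 0.8), (1.0, 0.7), (0.0, 0.1), (0.0, 0.2), (0.0, 0.0)]
centroids, labels = kmeans(profiles, centroids=[profiles[0], profiles[3]])

# Profile each cluster by its centroid to judge which one looks "high potential".
for index, centroid in enumerate(centroids):
    print(f"cluster {index}: centroid={centroid}, size={labels.count(index)}")
```

The clustering itself never says which cluster is "high potential"; that label comes from a person inspecting the cluster profiles, which is the step the loop at the end is meant to support.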
Intent, on the other hand, can be defined using specific activities that merchants have performed on our platform and that have historically been indicators of high intent. A supervised classification model works here, as we have true labels for historical data.
Broadly speaking, we have 2 types of merchant data –
- Profile data – these are attributes of a merchant's business that have little to do with Instamojo. For example – a merchant's website
- Activity data – this is data related to specific actions that a merchant has taken on the Instamojo platform
For the potential model, we only considered variables that could be derived from profile data. Any activity at all is a sign of intent, so including activity-related data could bias our clusters and underestimate the potential of low-intent merchants.
Conversely, we avoided using profile attributes in our intent model, as high potential merchants are generally more likely to be high intent and we did not want the model to learn and amplify such relationships that are purely correlational in nature.
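As a rough sketch of what an intent classifier trained only on activity data could look like, the snippet below fits a small logistic regression by gradient descent. The post does not say which classifier was used, so logistic regression is an assumption here, as are the two activity features (`created_payment_link`, `logins_first_week`), the labels, and the training settings.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_intent(weights, bias, features):
    """Estimated probability that a merchant is high intent."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))

def train_logistic(rows, labels, learning_rate=0.5, epochs=1000):
    """Batch gradient descent on the logistic (log-loss) objective."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(rows, labels):
            error = predict_intent(weights, bias, features) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias

# Hypothetical activity features only: (created_payment_link, logins_first_week scaled to [0, 1])
activity = [(1, 0.9), (1, 0.6), (0, 0.8), (1, 0.2), (0, 0.3), (0, 0.1), (0, 0.0), (1, 0.1)]
# Hypothetical historical labels: 1 if the merchant went on to show high intent.
high_intent = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = train_logistic(activity, high_intent)
active_score = predict_intent(weights, bias, (1, 0.9))
inactive_score = predict_intent(weights, bias, (0, 0.0))
print(f"active merchant: {active_score:.2f}, inactive merchant: {inactive_score:.2f}")
```

Note that no profile attribute appears in the feature tuples, which mirrors the separation described above: profile data feeds the potential model, activity data feeds the intent model.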
The solution to scalable lead scoring
We combined the output of these models to create our final segments.
Leads from the quadrants that rank high on either intent or potential, i.e. high intent/high potential (HIHP), high intent/low potential (HILP) and low intent/high potential (LIHP), are sent to the sales team for targeting.
We also hold out a control sample from each of these segments, and use this to measure where the effect of sales contact is maximum.
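The segmentation and hold-out logic can be sketched as follows. This is an illustration under assumptions: the intent threshold of 0.5, the 20% control fraction, the field names and the sample leads are hypothetical and are not taken from the actual pipeline.

```python
import random

SALES_SEGMENTS = {"HIHP", "HILP", "LIHP"}  # quadrants high on intent or potential

def assign_segment(intent_score, is_high_potential, intent_threshold=0.5):
    """Combine the two model outputs into one of four quadrant labels."""
    intent = "HI" if intent_score >= intent_threshold else "LI"
    potential = "HP" if is_high_potential else "LP"
    return intent + potential

def route_leads(leads, control_fraction=0.2, seed=42):
    """Send qualifying leads to sales, holding out a random control sample per segment."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    routed = []
    for lead in leads:
        segment = assign_segment(lead["intent_score"], lead["high_potential"])
        if segment not in SALES_SEGMENTS:
            group = "not_targeted"
        elif rng.random() < control_fraction:
            group = "control"  # eligible for sales, but deliberately not contacted
        else:
            group = "sales"
        routed.append({**lead, "segment": segment, "group": group})
    return routed

# Hypothetical leads with outputs from the intent and potential models.
leads = [
    {"id": 1, "intent_score": 0.9, "high_potential": True},
    {"id": 2, "intent_score": 0.7, "high_potential": False},
    {"id": 3, "intent_score": 0.2, "high_potential": True},
    {"id": 4, "intent_score": 0.1, "high_potential": False},
]
routed = route_leads(leads)
for lead in routed:
    print(lead["id"], lead["segment"], lead["group"])
```

Because every eligible lead has the same chance of landing in the control group, each segment ends up with its own comparable control sample, which is what makes a per-segment comparison of contacted versus uncontacted leads meaningful.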
As a result of implementing this model, we saw a 40% lift in revenue for the leads that were sent to sales vs a similar control group.
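For readers unfamiliar with the term, one common way to compute lift is the relative difference in average revenue between the contacted group and the control group. The post does not spell out its exact formula, so the definition below is an assumption, and the revenue figures are made up purely to show the arithmetic.

```python
def revenue_lift(treated_revenue, control_revenue):
    """Relative lift in mean revenue of the treated group over the control group."""
    treated_mean = sum(treated_revenue) / len(treated_revenue)
    control_mean = sum(control_revenue) / len(control_revenue)
    return (treated_mean - control_mean) / control_mean

# Made-up per-merchant revenue, for illustration only.
sent_to_sales = [150.0, 90.0, 120.0]   # mean 120
control_group = [100.0, 80.0, 120.0]   # mean 100

lift = revenue_lift(sent_to_sales, control_group)
print(f"lift: {lift:.0%}")
```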
“Talking to merchants who need what we sell is an ultimate dream for any sales team. This lead scoring mechanism helped us identify high potential leads automatically,” – Shinda Shivaji, Inside Sales Head @ Instamojo.
“This pushed our team to engage with the merchants at the right time. It helped improve conversion significantly & accomplish our GMV/revenue goals.”
The future of lead qualification at Instamojo
As we gather more data on contacted merchants, we want to model the incremental value of contacting a merchant and send the leads that score highest on this metric.
Hypothetically, this can be done by having 2 models for predicting the metric of importance (e.g. sales), one trained on contacted users and the other on the uncontacted population.
New leads can be scored on both models and the difference in the metric would indicate the incremental value of contact.
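The two-model idea can be sketched in a few lines. Everything specific here is an assumption for illustration: the single feature (`intent`), the made-up revenue history, and the use of a one-variable least-squares line as the predictive model. Any regression model could take its place.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x for a single feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

# Made-up history: one model per population, as described above.
contacted = {"intent": [0.2, 0.5, 0.8], "revenue": [30.0, 60.0, 90.0]}
uncontacted = {"intent": [0.2, 0.5, 0.8], "revenue": [20.0, 35.0, 50.0]}

predict_if_contacted = fit_line(contacted["intent"], contacted["revenue"])
predict_if_not_contacted = fit_line(uncontacted["intent"], uncontacted["revenue"])

def incremental_value(intent_score):
    """Predicted revenue with contact minus predicted revenue without contact."""
    return predict_if_contacted(intent_score) - predict_if_not_contacted(intent_score)

# Score hypothetical new leads on both models and rank by the difference.
new_leads = {"lead_a": 0.1, "lead_b": 0.6, "lead_c": 0.9}
ranked = sorted(new_leads, key=lambda name: incremental_value(new_leads[name]), reverse=True)
for name in ranked:
    print(name, round(incremental_value(new_leads[name]), 2))
```

The ranking is driven by the gap between the two predictions rather than by predicted revenue itself, so a merchant who would transact heavily regardless of contact does not automatically rise to the top.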
We are excited about the possibilities that machine learning can make viable for teams across Instamojo.
This article was contributed by Kaustub Rao, a lead data analyst at Instamojo. If you have questions or want to give Kaustub kudos, connect with him on LinkedIn or comment on this post.