Causal Machine Learning for Customer Retention: A Practical Guide with Python
An accessible guide to leveraging causal machine learning for optimizing client retention strategies
Details of this series
This article is the second in a series on uplift modeling and causal machine learning. The idea is to dive deep into these methodologies both from a business and a technical perspective.
Before jumping into this one, I highly recommend reading the previous episode which explains what uplift modeling is and how it can help your company in general.
Link can be found below.
From insights to impact: leveraging data science to maximize customer value
Uplift modeling: how causal machine learning transforms customer relationships and revenue
towardsdatascience.com
Introduction
Picture this: you’ve been a client of a bank for a couple of years. However, for a month or two, you’ve been considering leaving because their application has become too complicated. Suddenly, an employee of the bank calls you. He asks about your experience and ends up quickly explaining how to use the app. In the meantime, your daughter, who’s a client of the same bank, is also thinking about leaving because of their trading fees; she thinks they’re too expensive. While about to unsubscribe, out of the blue, she receives a voucher allowing her to trade for free for a month! How is that even possible?
In my previous article, I introduced the mysterious technique behind this level of personalisation: uplift modeling. While traditional approaches predict an outcome — e.g. a customer’s probability of churning — uplift modeling predicts the potential result of an action taken on a customer. The likelihood of a customer staying if called, or if offered a voucher, for example!
This approach allows us to target the right customers — as we’ll be removing customers who wouldn’t react positively to our approach — but also to increase our chance of success by tailoring our approach to each customer. Thanks to uplift modeling, not only do we focus our resources toward the right population, we also maximise their impact!
Sounds interesting, wouldn’t you agree? Well, this is your lucky day: in this article we’ll dive deep into the implementation of this approach by solving a concrete example: improving our retention. We’ll go through every step, from defining our precise use case to evaluating our models’ results. Our goal today is to provide you with the right knowledge and tools to apply this technique within your own organisation, adapted to your own data and use case, of course.
Here’s what we’ll cover:
We’ll start by clearly defining our use case. What is churn? Who do we target? What actions will we set up to try to retain our clients?
Then, we’ll look into getting the right data for the job. What data do we need to implement uplift modeling and how to get it?
After that, we’ll look into the actual modeling, focusing on understanding the various models behind uplift modeling.
Then, we’ll apply our newly acquired knowledge to a first case with a single retention action: an email campaign.
Finally, we’ll dive deep into a more complicated implementation with many treatments, approaching user-level personalisation.
Our use case: improving customer retention
Before we can apply uplift modeling to improve customer retention, we need to clearly define the context. What constitutes “churn” in our business context? Do we want to target specific users? If yes, why? Which actions do we plan on setting up to retain them? Do we have budget constraints? Let’s try answering these questions.
Defining Churn
This is our first step. By precisely and quantitatively defining churn, we’ll be able to define retention and understand where we stand, how it has evolved and, if needed, take action. The churn definition you choose will depend entirely on your business model and sector. Here are some factors to consider:
If you’re in a transaction-based company, you can look at transaction frequency or how transaction volumes evolve. You could also look at the time since the last transaction occurred, or a drop in account activity.
If you’re in a subscription-based company, it can be as simple as looking at users who have unsubscribed, or subscribed users who have stopped using the product.
For instance, if you’re working in a transaction-based tech company, churn could be defined as “customer who has not made a transaction in 90 days”, whereas if you’re working on a mobile app you may prefer “customer who has not logged in in 30 days”. Both the time frame and the nature of churn have to be defined beforehand, as flagging churned users will be our first step.
The complexity of your definition will depend on your company’s specificities as well as the number of metrics you want to consider. However, the idea is to set up definitions that provide thresholds that are easy to understand and that enable us to identify churners.
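To make this concrete, here is a minimal sketch of the 90-day inactivity flag, assuming a pandas DataFrame with a hypothetical `last_transaction_date` column (the data and column names below are made up):

```python
import pandas as pd

def flag_churned(df: pd.DataFrame, snapshot: str, inactivity_days: int = 90) -> pd.DataFrame:
    """Flag customers with no transaction in the last `inactivity_days` as of `snapshot`."""
    snapshot_date = pd.Timestamp(snapshot)
    out = df.copy()
    out["days_inactive"] = (snapshot_date - pd.to_datetime(out["last_transaction_date"])).dt.days
    out["churned"] = out["days_inactive"] >= inactivity_days
    return out

customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "last_transaction_date": ["2024-01-05", "2023-09-01", "2024-02-20"],
})
flagged = flag_churned(customers, snapshot="2024-03-01")
# Only customer 2 (182 days inactive) is flagged as churned.
```

The same pattern extends to multi-metric definitions: compute one boolean per threshold and combine them.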
Churn Prediction Window
Now that we know what churn is, we need to define exactly what we want to avoid. What I mean is: do we want to prevent customers from churning within the next 15 days, or the next 30? Based on the answer, you’ll have to organise your data in a specific manner and define different retention actions. I would recommend not being too optimistic here, for two reasons:
The longer the time horizon, the harder it is for a model to perform well.
The longer we wait after the treatment, the harder it will be to capture its effect.
So let’s be reasonable here. If our definition of churn encompasses a 30-day timeframe, let’s go with a 30-day horizon and try to limit churn within the next 30 days.
The idea is that our timeframe must give us enough time to implement our retention strategies and observe their impact on user behavior, while maintaining our models’ performances.
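In practice, this window defines how the training label is built: pick a snapshot date, compute features from data before it, and label a customer as churned if they show no activity in the following 30 days. A minimal sketch with a made-up event log:

```python
import pandas as pd

# Hypothetical event log: one row per customer transaction.
events = pd.DataFrame({
    "customer_id": [1, 1, 2, 3],
    "event_date": pd.to_datetime(["2024-03-10", "2024-04-02", "2024-02-15", "2024-03-25"]),
})

snapshot = pd.Timestamp("2024-03-01")
horizon = pd.Timedelta(days=30)

# A customer "churns within 30 days" if they have no event in (snapshot, snapshot + 30d].
active = events[(events["event_date"] > snapshot) & (events["event_date"] <= snapshot + horizon)]
labels = pd.DataFrame({"customer_id": [1, 2, 3]})
labels["churn_30d"] = ~labels["customer_id"].isin(active["customer_id"])
# Customer 2's last event predates the window, so only they are labeled as churning.
```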
Selecting Target Users [Optional]
Another question we need to answer is: are we targeting a specific population with our retention actions? Multiple reasons could motivate such an idea:
We noticed an increase in churn in a specific segment.
We want to target highly valuable customers to maximize our ROI with those actions.
We want to target new customers to ensure a durable activation.
We want to target customers that are likely to churn soon.
Depending on your own use case, you may want to select only a subset of your customers.
In our case, we’ll choose to target clients with a higher probability of churn, so that we focus on the customers who need us most.
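For instance, if a separate churn-propensity model has already scored our customers (the scores below are invented for illustration), selecting the riskiest segment is a simple quantile cut:

```python
import pandas as pd

# Hypothetical output of a churn-propensity model.
scores = pd.DataFrame({
    "customer_id": range(1, 11),
    "churn_probability": [0.05, 0.92, 0.40, 0.77, 0.12, 0.66, 0.88, 0.23, 0.51, 0.95],
})

# Target the riskiest 30% of customers: keep everyone at or above the 70th percentile.
cutoff = scores["churn_probability"].quantile(0.70)
target = scores[scores["churn_probability"] >= cutoff]
```

The 30% figure is arbitrary here; in practice the cut is driven by budget and the capacity of each retention channel.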
Defining Retention Actions
Finally, we have to select the actual retention actions we want to use on our clients. This is not an easy one, and working alongside your business stakeholders here is probably a good idea. In our case, we’ll select 4 different actions:
Personalized email
In-app notifications highlighting new features or opportunities
Directly calling our customer
Special offers or discounts (a separate uplift model could even help us identify the best voucher amount; perhaps a topic for a future article)
Our uplift model will help us determine which of these actions (if any) is most likely to be effective for each individual user.
We’re ready! We defined churn, picked a prediction window, and selected the actions we want to retain our customers with. Now, the fun part begins, let’s gather some data and build a causal machine learning model!
Data gathering: the foundation of our uplift model
Building an effective uplift model requires a good dataset combining existing user information with experimental data.
Leveraging existing user data
First, let’s look at our available data. Tech companies usually have access to a lot of it! In our case, we need customer-level data such as:
Customer information (age, geography, gender, acquisition channel, etc.)
Product specifics (creation or subscription date, subscription tier etc.)
Transaction information (frequency of transactions, average transaction value, total spend, types of products/services purchased, time since last transaction, etc.)
Engagement (e.g., login frequency, time spent on platform, feature usage statistics, etc.)
We can look at this data raw, but what brings even more value is understanding how it evolves over time. It enables us to identify behavioral patterns that will likely improve our models’ performance. Lucky for us, it’s quite simple to do; we just have to look at our data from a different perspective. Here are a few transformations that can help:
Taking moving averages (7-day, 30-day, …) of our main usage metrics — transactions, for instance.
Looking at the percentage changes over time.
Aggregating our data at different time scales such as daily, weekly etc.
Or even adding seasonality indicators such as the day of week or week of year.
These features bring “dynamic information” that can be valuable for detecting future changes! Understanding more precisely which features we should select is beyond the scope of this article; however, these approaches are best practice when working with temporal data.
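The four transformations above map almost one-to-one onto pandas operations. A sketch on a synthetic daily transaction series (the seed and Poisson rate are arbitrary):

```python
import numpy as np
import pandas as pd

# Hypothetical daily transaction counts for one customer over 60 days.
rng = pd.date_range("2024-01-01", periods=60, freq="D")
daily = pd.DataFrame({
    "date": rng,
    "transactions": np.random.default_rng(0).poisson(3, size=60),
}).set_index("date")

features = pd.DataFrame(index=daily.index)
features["txn_ma_7d"] = daily["transactions"].rolling(7).mean()      # 7-day moving average
features["txn_ma_30d"] = daily["transactions"].rolling(30).mean()    # 30-day moving average
features["txn_pct_change_7d"] = daily["transactions"].pct_change(7)  # week-over-week change
features["day_of_week"] = features.index.dayofweek                   # seasonality indicator

weekly = daily["transactions"].resample("W").sum()                   # weekly aggregation
```

In a real pipeline these would be computed per customer (e.g. with `groupby`) as of the snapshot date, so no future information leaks into the features.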
Gathering Experimental Data for Uplift Modeling
The second part of our data gathering journey is about collecting data related to our retention actions. Now, uplift modeling does not strictly require experimental data. If you have historical data from past events — you may already have sent emails to customers or offered vouchers — you can leverage it. However, the more recent and unbiased your data is, the better your results will be. Debiasing observational or non-randomized data requires extra steps that we will not discuss here.
So what exactly do we need? Well, we need an idea of the impact of the actions we plan to take, which means setting up a randomized experiment where we test these actions. A lot of extremely good articles already discuss how to set those up, and I will not dive into it here. I’ll just add that the better the setup, and the bigger the training set, the better it is for us!
After the experiment, we’ll obviously analyse the results. And while those results don’t help us directly in our quest, they provide additional understanding of the expected impact of our treatments, as well as a good effect baseline we’ll try to outperform with our models. Not to bore you too much with definitions and acronyms, but the result of a randomized experiment is called the “Average Treatment Effect”, or ATE. On our side, we’re looking to estimate the Conditional Average Treatment Effect (CATE), closely related to the Individual Treatment Effect (ITE).
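Under randomized assignment, the ATE is simply the difference in average outcomes between the treated and control groups. A toy computation with invented data:

```python
import numpy as np

# Hypothetical results of a randomized email experiment:
# `retained` is 1 if the customer was still active 30 days later.
treated_retained = np.array([1, 1, 0, 1, 1, 0, 1, 1])  # received the email
control_retained = np.array([1, 0, 0, 1, 0, 0, 1, 0])  # did not

ate = treated_retained.mean() - control_retained.mean()
# 0.75 - 0.375 = 0.375: the email lifted retention by 37.5 points in this toy sample.
```

The CATE we are after is this same quantity, but estimated per customer profile rather than averaged over the whole experiment.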
While experimental data is ideal, uplift modeling can still provide insights with observational data if an experiment isn’t feasible. If the data is not randomized, several techniques exist to debias the dataset, such as propensity score matching. The key is to have a rich dataset that captures user characteristics, behaviors, and outcomes in relation to our retention efforts.
Generating synthetic data
For the purpose of this example, we’ll be generating synthetic data using the causalml package from Uber. Uber has communicated a lot on uplift modeling and even created an easy-to-use, well-documented Python package.
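causalml’s `dataset` module ships generators for exactly this purpose (e.g. `make_uplift_classification`). To show the shape of the data we’re after without any dependency, here is a package-free stand-in that simulates a heterogeneous treatment effect with numpy alone (all coefficients are made up):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000

# Two standardised customer features, e.g. engagement score and tenure.
X = rng.normal(size=(n, 2))
treatment = rng.integers(0, 2, size=n)  # randomized 50/50 assignment

# Baseline retention probability, plus a treatment effect that is
# stronger for low-engagement customers (heterogeneous uplift).
base = 0.6 + 0.1 * X[:, 1]
uplift = 0.15 * (X[:, 0] < 0)
p = np.clip(base + treatment * uplift, 0, 1)
retained = rng.binomial(1, p)

# `uplift` is the true per-customer CATE; an uplift model should recover it
# from (X, treatment, retained) alone.
```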