Ahmed Youssef, PhD, Head of Data Science, smava GmbH
Creating digital products tailored to your customers’ individual needs and preferences is a significant source of untapped business value. Unfortunately, the Data Science teams tasked with driving these personalization programs often fail to realize this promise. The two enablers most often missing are statistical maturity and architectural maturity within the organization. Below are a few recommendations that will put you on the right track.
GET YOUR STATISTICS RIGHT
By construction, data-driven personalization is a hard problem. With just a few distinguishing customer features, you already need to track a KPI across hundreds of segments, because the number of segments multiplies with every feature you add. In other words, once you start any serious personalization program, you are, by definition, in the small-data regime. When data is scarce, statistical savviness is no longer optional but a hard requirement.
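To see how quickly the data thins out, here is a back-of-the-envelope sketch. The feature names, the number of levels per feature, and the experiment size are all illustrative assumptions, not figures from a real program.

```python
from math import prod

# Hypothetical customer features and the number of distinct levels of each.
feature_levels = {
    "age_band": 5,
    "income_band": 4,
    "region": 4,
    "device": 3,
}

# Crossing the features multiplies the level counts: 5 * 4 * 4 * 3.
n_segments = prod(feature_levels.values())

# Assumed experiment size, split evenly across segments and two variants.
n_customers = 20_000
n_variants = 2
per_cell = n_customers / (n_segments * n_variants)

print(n_segments)          # 240
print(round(per_cell, 1))  # 41.7
```

Even a test with 20,000 customers leaves only about 40 observations per segment and variant, and real traffic is never split that evenly.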
After running an A/B test, you should look for the customer segments in which the treatment performed best or worst. This process, known as post-segmentation, can be executed with different levels of sophistication, ranging from a simple manual process to a systematic, algorithmic search. Done right, it can deliver a significant uplift in your KPIs. This has been shown time and again: treatment effect heterogeneity is ubiquitous in the complex adaptive systems that most markets are. However, I urge you to exercise statistical caution while playing this game. It is easy to fantasize effects that don’t generalize beyond the specific data you are dissecting. As the saying goes, if you torture the data long enough, it will confess. So stick with large, coherent, easily interpretable segments, and don’t forget to correct for multiple hypothesis testing.
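To make the multiple-testing correction concrete, here is a minimal sketch of Holm's step-down procedure, one standard way to control the family-wise error rate across segments. The function name `holm_reject` and the p-values are illustrative assumptions, not results from a real experiment.

```python
def holm_reject(p_values, alpha=0.05):
    """Return one boolean per test: True where Holm's procedure rejects the null."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = 0, 1, ...) is compared to alpha / (m - k).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # every remaining (larger) p-value is retained as well
    return reject


# Hypothetical per-segment p-values from a post-segmentation analysis.
p_values = [0.001, 0.04, 0.03, 0.2, 0.012]

naive = [p <= 0.05 for p in p_values]
corrected = holm_reject(p_values, alpha=0.05)

print(naive)      # [True, True, True, False, True]
print(corrected)  # [True, False, False, False, True]
```

Two of the four segments that look like winners at first glance do not survive the correction. Other procedures, such as Benjamini-Hochberg, trade a less strict guarantee for more power. Which one fits depends on how costly a false discovery is for you.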
Sooner or later, you will reach the limits of what a simple randomized experiment can teach you. You might have extensive amounts of data that warrant highly granular customer segmentation. Additionally, your product might come in many different meaningful flavors. It is now time to go to the next level and change the question you are asking altogether. Instead of trying to find the best variant, you should focus on maximizing the cumulative reward of your personalization program, say over the next three years. Multi-armed bandits are the tool of choice here and are being successfully deployed in leading tech companies. This family of algorithms is designed to balance exploration of a massive space of possibilities (different matchings between customer segments and product flavors) against exploitation of strategies already found to be fruitful.
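As an illustration of the exploration/exploitation trade-off, here is a minimal sketch of Thompson sampling, one popular bandit algorithm, with an independent Beta-Bernoulli bandit per customer segment. The class name, segment labels, variant names, and conversion rates are all hypothetical, and a production system would need far more (delayed rewards, non-stationarity, logging).

```python
import random


class SegmentedThompsonSampler:
    """Beta-Bernoulli Thompson sampling, run separately for each segment."""

    def __init__(self, variants, seed=None):
        self.variants = list(variants)
        self.rng = random.Random(seed)
        # (segment, variant) -> [alpha, beta] of a Beta posterior.
        self.posterior = {}

    def _params(self, segment, variant):
        # Start every pair from a uniform Beta(1, 1) prior.
        return self.posterior.setdefault((segment, variant), [1, 1])

    def choose(self, segment):
        # Draw one plausible conversion rate per variant, pick the largest draw.
        draws = {
            v: self.rng.betavariate(*self._params(segment, v))
            for v in self.variants
        }
        return max(draws, key=draws.get)

    def update(self, segment, variant, converted):
        # A conversion increments alpha, a non-conversion increments beta.
        params = self._params(segment, variant)
        params[0 if converted else 1] += 1


# Hypothetical true conversion rates, unknown to the algorithm.
true_rates = {
    ("young", "A"): 0.05, ("young", "B"): 0.15,
    ("senior", "A"): 0.12, ("senior", "B"): 0.04,
}

sampler = SegmentedThompsonSampler(["A", "B"], seed=42)
env = random.Random(7)  # simulates customer behavior
picks = {key: 0 for key in true_rates}

for step in range(6000):
    segment = "young" if step % 2 == 0 else "senior"
    variant = sampler.choose(segment)
    converted = env.random() < true_rates[(segment, variant)]
    sampler.update(segment, variant, converted)
    picks[(segment, variant)] += 1

print(picks)
```

In this simulation, traffic in each segment drifts toward the variant that converts better there, while the weaker variant still receives an occasional visitor. That residual exploration is the price of not locking in a wrong answer too early.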
GET YOUR ARCHITECTURE RIGHT
If using machine learning instead of a simple heuristic increases your time to market from three to nine months, you have probably already lost the race. By then, the company has perhaps deployed a more straightforward solution, or business priorities have changed altogether. As a result, many Data Science endeavors sadly only live on a laptop, without ever being taken to production and yielding business value.
Simply put, serving data-intensive applications is qualitatively different from, and considerably harder than, delivering traditional software. Here are some guidelines that will simplify your journey toward deploying machine learning products frequently and safely.
Your architecture should allow for full reproducibility even when algorithms are built on non-stateful data sources.
The Data Science team should be able to take models to production directly, without the need to share any code with the engineering side. Think of the Data Science end product as a fully isolated microservice. The application calls it whenever needed without ever worrying about implementation details (programming language, package versions, availability, etc.).
Test-driven development is a key enabler of continuous delivery and deployment. Unit tests are, however, entirely insufficient to guarantee the correct behavior of machine learning products. On top of the usual testing framework, I highly recommend running a comprehensive series of statistical tests on the input and output of the algorithms.
Finally, the combination of sophisticated machine learning and automated decision-making algorithms routinely results in products that can be difficult to reason about before observing their behavior in the wild. This justifies significant efforts to build exhaustive real-time monitoring of your models’ predictive performance as well as their effect on your fast and slow KPIs.
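One way to picture the statistical tests and monitoring recommended above is a drift check that compares a live feature (or model score) against a reference sample. The sketch below uses a two-sample Kolmogorov-Smirnov statistic with its asymptotic critical value. The function names, sample sizes, and simulated data are assumptions for illustration, and in practice you would likely rely on an established statistics library rather than hand-rolled code.

```python
import math
import random
from bisect import bisect_right


def ks_statistic(reference, live):
    """Largest absolute gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(live)
    # Both CDFs are step functions, so the gap peaks at an observed value.
    return max(
        abs(bisect_right(ref, x) / len(ref) - bisect_right(cur, x) / len(cur))
        for x in ref + cur
    )


def drift_detected(reference, live, alpha=0.05):
    """Flag drift when the KS statistic exceeds its asymptotic critical value."""
    n, m = len(reference), len(live)
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, live) > critical


# Simulated data: a training-time reference and a live sample whose mean moved.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(2000)]
shifted = [rng.gauss(0.5, 1.0) for _ in range(2000)]

print(drift_detected(reference, shifted))
print(drift_detected(reference, reference))
```

A check like this can run in the deployment pipeline on input features and again on the model's predictions, and the same comparison, scheduled on fresh data, doubles as a simple monitoring signal.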