11 June 2026 | By Dominik Thalmeier

RFM and CLV: Who Your Most Valuable Customers Are — and Which Ones You’re Losing

Most eCommerce organisations know with reasonable precision who bought a lot in the last quarter. They are less sure about who will buy a lot in the next quarter — and they are rarely able to identify the loyal customers who are quietly slipping away. A Customer Lifetime Value prediction closes exactly that gap: it turns order history into a robust expectation of how much revenue an individual customer will contribute over the next twelve or twenty-four months. In our work at foobar Agency, this step rarely fails because of the method. It fails because RFM analyses and CLV calculations sit side by side instead of building on each other. This article walks through the bridge from RFM to probabilistic CLV, explains what BG/NBD and Gamma-Gamma actually deliver in day-to-day operations — and shows which business decisions get made differently as a result.

We build on the first article in this series, which covered Snowflake as the shared data foundation, and we place the use case on the maturity path we keep seeing at retail and B2C clients across DACH.

RFM: Three Numbers That Say More Than a Dashboard

RFM stands for Recency, Frequency and Monetary value: three metrics per customer derived from order history. When did this customer last buy? How often have they bought in a defined time window? What is the average value of their orders? In the classic variant, each of the three dimensions is split into quintiles, and every customer gets a three-digit profile, from 1-1-1 (gone for a while, rarely bought, small amounts) to 5-5-5 (recently active, frequent buyer, high value).
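As a minimal sketch of the scoring described above, the following plain-Python example derives the three metrics from an order list and assigns rank-based scores from 1 to 5. The order records, the snapshot date and the function names are illustrative assumptions, not taken from a real project. With exactly five customers, each one lands in its own bucket.

```python
from datetime import date

# Hypothetical order records: (customer_id, order_date, order_value).
ORDERS = [
    ("A", date(2026, 5, 28), 120.0), ("A", date(2026, 4, 30), 80.0),
    ("A", date(2026, 3, 15), 100.0),
    ("B", date(2026, 5, 1), 40.0), ("B", date(2026, 2, 1), 60.0),
    ("C", date(2025, 12, 1), 300.0),
    ("D", date(2026, 5, 20), 20.0), ("D", date(2026, 4, 20), 25.0),
    ("D", date(2026, 3, 20), 15.0), ("D", date(2026, 2, 20), 20.0),
    ("E", date(2025, 9, 1), 30.0),
]
SNAPSHOT = date(2026, 6, 1)  # assumed analysis date


def rfm_table(orders, snapshot):
    """Return {customer_id: (recency_days, frequency, avg_order_value)}."""
    per_customer = {}
    for cust, day, value in orders:
        per_customer.setdefault(cust, []).append((day, value))
    table = {}
    for cust, rows in per_customer.items():
        last_order = max(day for day, _ in rows)
        values = [value for _, value in rows]
        table[cust] = ((snapshot - last_order).days, len(rows), sum(values) / len(values))
    return table


def quintile_scores(metric, higher_is_better=True):
    """Map each customer to a score 1..5 by rank position (5 = best).

    Ties are broken by input order here; a production version
    would need an explicit tie rule.
    """
    ordered = sorted(metric, key=metric.get, reverse=not higher_is_better)
    n = len(ordered)
    return {cust: 1 + (rank * 5) // n for rank, cust in enumerate(ordered)}


rfm = rfm_table(ORDERS, SNAPSHOT)
r = quintile_scores({c: v[0] for c, v in rfm.items()}, higher_is_better=False)
f = quintile_scores({c: v[1] for c, v in rfm.items()})
m = quintile_scores({c: v[2] for c, v in rfm.items()})
profiles = {c: f"{r[c]}-{f[c]}-{m[c]}" for c in rfm}
```

In this toy data, customer D has the highest frequency score but the lowest monetary score, which is the kind of split that the classic three-digit profile makes visible.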

What sounds dry is in practice often the first point at which marketing leaders see clearly who their most valuable customers actually are. The typical realisations: top-frequency customers are not automatically top-monetary customers. Reactivation candidates can be precisely scoped — not "all customers who haven't bought in 90 days", but "customers with historically high frequency whose recency score is now slipping into the second quintile". And marketing spending that was previously distributed evenly across the database gets its first proper hierarchy.

RFM has two limits worth knowing. First, it describes the past, not the future: today's 5-5-5 customer may already be churning without the score showing it. Second, it treats all customers with the same recency score as equivalent, even though a customer who has not bought for 180 days but historically ordered about once a month is a very different case from a customer with the same 180 days of silence who has always ordered about every six months. This is exactly where probabilistic models come in.

Probabilistic CLV: What BG/NBD and Gamma-Gamma Actually Do

Probabilistic CLV models treat every customer as an individual probability distribution — not as a point in a quintile grid. The standard setup in non-contractual eCommerce, meaning anywhere customers buy without a subscription or contract, consists of two models building on each other: BG/NBD for the question "how often will this customer buy again?" and Gamma-Gamma for the question "how much will they spend per order?".

BG/NBD (Beta-Geometric / Negative Binomial Distribution) was introduced in 2005 by Fader, Hardie and Lee as a pragmatic evolution of the older Pareto/NBD model (Schmittlein/Morrison/Colombo, 1987). Both models answer the same questions; BG/NBD is just considerably easier to estimate and more numerically stable. The core in two sentences: while a customer is active, their purchases follow a Poisson process with an individual rate, and these rates are gamma-distributed across the customer base; after each transaction, a customer drops out with an individual probability, and these probabilities are beta-distributed across the customer base. Only the transactions are observed. From those, the model derives the probability that a customer is still active and the expected number of their purchases in a future time window.
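To make the "still active" probability tangible, here is a small sketch of the closed-form BG/NBD expression for it, evaluated for two customers who have both been silent for the same length of time. The parameter values (`R`, `ALPHA`, `A`, `B`) are invented for illustration and are not fitted to any data; time is measured in weeks. In practice these parameters come out of model training.

```python
def bgnbd_p_alive(x, t_x, T, r, alpha, a, b):
    """Probability that a customer is still active under BG/NBD.

    x   : number of repeat purchases in the observation window
    t_x : time of the last purchase (measured from the first purchase)
    T   : length of the observation window (same time unit as t_x)
    r, alpha : gamma parameters of the purchase-rate distribution
    a, b     : beta parameters of the dropout-probability distribution
    """
    if x == 0:
        # In this model a customer can only drop out right after a repeat purchase.
        return 1.0
    ratio = (a / (b + x - 1)) * ((alpha + T) / (alpha + t_x)) ** (r + x)
    return 1.0 / (1.0 + ratio)


# Hypothetical parameters, NOT fitted values.
R, ALPHA, A, B = 0.25, 4.5, 0.8, 2.5

# Both customers: observed for 52 weeks, last purchase in week 26.
p_frequent = bgnbd_p_alive(x=6, t_x=26, T=52, r=R, alpha=ALPHA, a=A, b=B)
p_infrequent = bgnbd_p_alive(x=1, t_x=26, T=52, r=R, alpha=ALPHA, a=A, b=B)
```

Under these assumed parameters, the formerly frequent buyer comes out with the clearly lower probability of still being active: a long silence is more alarming for someone who used to buy often. An RFM recency score alone would rate both customers the same.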

Gamma-Gamma models the average order value. The assumption is that a customer's order values scatter around an individual mean, and that these means are gamma-distributed across the customer base. Important: Gamma-Gamma formally requires frequency and order value to be uncorrelated. In practice, this assumption is checked before model training, because it doesn't hold in every assortment.
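The two points in this paragraph can be sketched in a few lines: a correlation check between frequency and average order value, and the Gamma-Gamma conditional expectation, which pulls a customer's observed average toward the population mean. All numbers and the parameter values `P`, `Q`, `GAMMA` are hypothetical; what correlation still counts as acceptable is a judgment call that this sketch does not settle.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def expected_order_value(x, m_x, p, q, gamma):
    """Gamma-Gamma conditional expectation of a customer's mean order value.

    x is the number of observed orders, m_x their average value.
    The result is a weighted average of m_x and the population mean
    p * gamma / (q - 1); more observed orders put more weight on m_x.
    """
    return p * (gamma + x * m_x) / (p * x + q - 1)


# Hypothetical per-customer data for the pre-training check.
frequency = [1, 2, 3, 4, 5, 6]
avg_value = [52.0, 48.0, 50.0, 47.0, 53.0, 50.0]
corr = pearson(frequency, avg_value)

# Hypothetical model parameters, NOT fitted values.
P, Q, GAMMA = 6.0, 4.0, 15.0
population_mean = P * GAMMA / (Q - 1)
one_order = expected_order_value(1, 90.0, P, Q, GAMMA)
ten_orders = expected_order_value(10, 90.0, P, Q, GAMMA)
```

Both example customers have an observed average of 90, but the one with a single order is pulled much further toward the population mean than the one with ten orders.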

The product of expected future order count (from BG/NBD) and expected order value (from Gamma-Gamma) yields the individual CLV forecast over a defined horizon. The standard implementation in Python is the `lifetimes` library by Cam Davidson-Pilon — it turns a transactional order table into a per-customer CLV value in a few steps.

What matters in day-to-day work: the output per customer is not a point estimate but a value with an indication of uncertainty. Marketing leaders who internalise that start asking different questions — not "how high is the CLV?", but "how confident are we that this customer is still active?".

A DACH Multi-Channel Retailer on Snowflake

For a DACH multi-channel retailer, we built exactly this bridge: RFM analysis and probabilistic CLV in one data model on Snowflake, transformed with dbt, with clean interfaces into marketing and service tooling.

Anchored. The starting point was a concrete business question from the marketing organisation: which share of the upcoming acquisition budget is genuinely worth spending on acquisition, and which share should actually go into retention? You can't answer that without CLV, and the existing RFM reports only answered it from a backward-looking perspective. The project therefore did not start with the model, but with the business decisions the model was supposed to support: spending allocation between acquisition and retention, CRM triggers for reactivation, prioritisation of service cases by customer value.

Connected. The data foundation sat in Snowflake: order data from the commerce system, returns, customer attributes from the CRM. A layer of RFM models was built in dbt: per customer, Recency, Frequency and Monetary, plus the quintile scores, plus a stable segment assignment. The probabilistic CLV calculation builds on the same layer. Training and holdout time windows are defined in dbt, the actual model training runs in a Python environment against Snowflake (using Cam Davidson-Pilon's `lifetimes` library), and the predictions (expected purchases in the next twelve months, expected order value, the resulting CLV) flow back into the dbt model as new columns. Deterministic calculations (RFM quintiles, historical values) and probabilistic predictions (BG/NBD, Gamma-Gamma) sit side by side in the same table, with the same keys, and are consumed by service and marketing tooling through the same customer ID.
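The split into training and holdout windows can be illustrated in plain Python. This is only a sketch of the logic, not the project's dbt implementation; the function name, the field names and the dates are assumptions. Per customer it produces the usual BG/NBD inputs from the training window (repeat purchases, time of last purchase, observation length, all in days) and counts the purchases in the holdout window that a fitted model would later be compared against.

```python
from datetime import date


def calibration_holdout(transactions, calibration_end, holdout_end):
    """Summarise customers on the training window and count holdout purchases.

    transactions: iterable of (customer_id, order_date).
    Returns {customer_id: {"frequency", "recency", "T", "holdout_purchases"}}.
    """
    by_customer = {}
    for cust, day in transactions:
        by_customer.setdefault(cust, set()).add(day)  # several orders on one day count once
    summary = {}
    for cust, days in by_customer.items():
        cal = sorted(d for d in days if d <= calibration_end)
        if not cal:
            continue  # first purchase after the training window: not part of training
        holdout = [d for d in days if calibration_end < d <= holdout_end]
        summary[cust] = {
            "frequency": len(cal) - 1,               # repeat purchases only
            "recency": (cal[-1] - cal[0]).days,      # first to last purchase
            "T": (calibration_end - cal[0]).days,    # first purchase to window end
            "holdout_purchases": len(holdout),
        }
    return summary


# Hypothetical transactions.
TRANSACTIONS = [
    ("A", date(2025, 1, 10)), ("A", date(2025, 3, 10)),
    ("A", date(2025, 6, 10)), ("A", date(2025, 9, 10)),
    ("B", date(2025, 2, 1)),
    ("C", date(2025, 8, 15)),
]
summary = calibration_holdout(TRANSACTIONS, date(2025, 6, 30), date(2025, 12, 31))
```

Comparing predicted purchases with `holdout_purchases` is what shows whether the model generalises beyond the period it was trained on.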

Thought ahead. The architecture is deliberately cut so that further predictions can be added without new data pipelines: next-best action, churn classification, category-specific CLV views. Snowflake is not "the data island for predictions" but the same data foundation that reporting and operational steering also live on. Marketing automation, service tooling and CRM campaigns consume the scores via regular reverse ETL runs or directly through live APIs, depending on the use case.

The value is measurable wherever the previous mode was gut feel: reactivation campaigns are targeted at customers with high predicted residual value — not at "all inactive customers". Service staff see customer value on escalations and decide differently. And acquisition budgets get a target value: we know what a new customer is worth on average — and can say whether a new channel is economically viable at all.

From Score to Stack: What Marketing and Service Do With It

A CLV prediction that only lives in the data warehouse is not a use case — it is a table. The value emerges when the numbers arrive in the systems where decisions are made.

Spending allocation. With customer-individual CLV, the expected value of a new customer can be calculated — segmented by acquisition channel, campaign or product category. If the average CLV of a new customer from channel A sits at a certain value and the CAC there exceeds it, that is no longer a gut decision but a number. The same applies to retention: what may a reactivation campaign cost per reactivated customer? Answer, again: the expected CLV of the reactivated cohort, weighted by reactivation probability.
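The arithmetic behind this paragraph is simple enough to sketch. The example below averages predicted CLV per acquisition channel and compares it with that channel's CAC, then computes a spending ceiling for a reactivation campaign. Channel names, CLV values, CAC figures and the reactivation probability are all hypothetical. One assumption to flag: the ceiling is expressed per contacted customer, which is how "weighted by reactivation probability" is read here, and margin is ignored for simplicity.

```python
def channel_economics(customers, cac_by_channel):
    """Average predicted CLV per acquisition channel versus that channel's CAC."""
    totals = {}
    for c in customers:
        clv_sum, count = totals.get(c["channel"], (0.0, 0))
        totals[c["channel"]] = (clv_sum + c["predicted_clv"], count + 1)
    report = {}
    for channel, (clv_sum, count) in totals.items():
        avg_clv = clv_sum / count
        cac = cac_by_channel[channel]
        report[channel] = {"avg_clv": avg_clv, "cac": cac, "viable": avg_clv > cac}
    return report


def max_spend_per_contact(expected_clv_if_reactivated, reactivation_probability):
    """Break-even campaign cost per contacted customer."""
    return expected_clv_if_reactivated * reactivation_probability


# Hypothetical inputs.
NEW_CUSTOMERS = [
    {"channel": "search", "predicted_clv": 180.0},
    {"channel": "search", "predicted_clv": 220.0},
    {"channel": "social", "predicted_clv": 40.0},
    {"channel": "social", "predicted_clv": 60.0},
]
CAC = {"search": 60.0, "social": 70.0}

report = channel_economics(NEW_CUSTOMERS, CAC)
ceiling = max_spend_per_contact(expected_clv_if_reactivated=120.0,
                                reactivation_probability=0.25)
```

In this made-up data, the "social" channel costs more per customer than those customers are expected to bring in, so it would be flagged as not viable.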

CRM triggering. BG/NBD yields a per-customer probability of still being active. As soon as that probability falls below a threshold — and predicted residual value is high at the same time — a precise reactivation signal emerges. Instead of a broad "no order in 90 days" trigger, the CRM gets a value-weighted early-warning trigger.
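A value-weighted trigger of this kind could look roughly like the following sketch. The field names and both thresholds are assumptions chosen for illustration; sensible thresholds depend on the business and would have to be calibrated.

```python
def reactivation_candidates(scores, max_p_alive=0.4, min_residual_value=100.0):
    """Customers whose active probability is low while predicted value is high.

    Returns the matching score records, highest predicted value first.
    """
    hits = [
        s for s in scores
        if s["p_alive"] < max_p_alive and s["predicted_clv"] >= min_residual_value
    ]
    return sorted(hits, key=lambda s: s["predicted_clv"], reverse=True)


# Hypothetical model output per customer.
SCORES = [
    {"customer_id": "c1", "p_alive": 0.30, "predicted_clv": 250.0},
    {"customer_id": "c2", "p_alive": 0.85, "predicted_clv": 400.0},  # still active
    {"customer_id": "c3", "p_alive": 0.20, "predicted_clv": 30.0},   # low value
    {"customer_id": "c4", "p_alive": 0.35, "predicted_clv": 120.0},
]
trigger_ids = [s["customer_id"] for s in reactivation_candidates(SCORES)]
```

Compared with a flat "no order in 90 days" rule, this selects fewer customers and ranks them by what is at stake.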

Service prioritisation. In service escalations, customer value is a relevant input — more contentious, but relevant. When a service team knows that an incoming case concerns a high-value existing customer whose active probability is declining, it gets handled differently from a first-time buyer with a small basket. This is not autopilot — it is information that service leads use when appropriate.

Personalised communication. Frequency and expected order value are strong inputs for mailing cadence, recommendation logic and assortment curation. A customer with high predicted frequency does not need weekly newsletter pushes — they buy anyway. A customer with high CLV and low frequency often responds to targeted occasions (assortment launches, limited editions), not to volume communication.

CLV Compared: Heuristic, RFM-Based, Probabilistic

If you need a quick comparison of where you stand today and where the next step could take you, these three maturity levels are the stations we see again and again in projects.

| Maturity | Method | Effort | Insight | Use case |
| --- | --- | --- | --- | --- |
| Heuristic | Average order value × order frequency × margin × assumed lifetime | Days | Backward-looking, no customer differentiation beyond segment | Rough economic modelling, early strategy phase |
| RFM-based | Recency, frequency, monetary quintiles, segment-level CLV estimate derived from them | Weeks | Describes the past well, robust as a future signal but coarse | Reactivation campaigns, base segmentation in CRM |
| Probabilistic | BG/NBD + Gamma-Gamma (or comparable models), per individual customer | Weeks to months to set up, then ongoing operation | Individual predictions with uncertainty; separates active from likely inactive customers | Spending allocation, CRM triggering, service prioritisation |
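For reference, the heuristic from the first row is a single multiplication. The sketch below uses made-up segment figures, and it assumes that order frequency is expressed per year and lifetime in years.

```python
def heuristic_clv(avg_order_value, orders_per_year, margin, lifetime_years):
    """Segment-level CLV: order value x frequency x margin x assumed lifetime."""
    return avg_order_value * orders_per_year * margin * lifetime_years


# Hypothetical segment: 60 per order, 3 orders a year, 25% margin, 4 years.
segment_clv = heuristic_clv(60.0, 3, 0.25, 4)
```

Every customer in the segment receives the same value, which is exactly the lack of differentiation noted in the table.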

The important reading: this is not a contest. RFM remains in place even in advanced setups as a descriptive layer — it is the simplest language in which marketing and service teams talk about customers. The probabilistic layer sits on top; it does not replace it.

CLV changes the decision pattern more than it changes reporting. As soon as marketing spending and service prioritisation hang on an individual value expectation, an entire class of gut decisions falls away — and the conversation between marketing, sales and finance becomes a different one. That is the real value behind the methodology.
Philipp Krueger, Co-CEO, foobar Agency

Frequently Asked Questions

What is the difference between CLV and Customer Equity? CLV is a customer-individual value: the expected future value contribution from one specific customer. Customer Equity is the sum of CLVs across all customers of a company; the metric values the entire customer base as an asset. In practice, marketing operates with CLV; Customer Equity is a steering metric for executive leadership and for aggregate-level discussions on acquisition and retention investment.
How much order history does a CLV model need? A rule of thumb from practice: at least two complete purchase cycles of your typical customer. For consumables that can be twelve months, for durables more like 24 to 36. What matters is not only length but cleanliness: anonymous orders, returns, voucher transactions and employee accounts have to be cleanly separated from regular purchases, otherwise they distort the model estimates.
Does the approach also work in B2B? In principle yes, but with caveats. BG/NBD and Gamma-Gamma are optimised for non-contractual customer bases with many small transactions, which is classic B2C eCommerce. In B2B you often see few large orders per customer, longer sales cycles and account structures with multiple buyers behind a single customer number. That makes plain BG/NBD setups unstable in many cases. The next article in this series shows how to adapt the methodology for B2B, with account hierarchies, order types and longer time horizons.

CLV Use Case Workshop

In two hours, foobar Agency walks through with you which RFM and CLV maturity level fits your data, your commerce stack and your marketing processes — with a concrete maturity check, not with slides.


Dominik Thalmeier

Data Scientist

Dominik is a data scientist and software developer who is passionate about bridging the gap between research and industry. He has worked on reinforcement learning and machine learning algorithms for diagnosing diseases such as dementia and genetic hearing loss. In industry, he works as a consultant for data science architecture and governance and as an AI developer and data scientist.

