Attribution choice is incentive design
The Numerator | Issue #2 | May 19, 2026
The first time the org rolled up the three recommendations teams as a cluster, the number was flat. Not declining, not growing. Just flat across the quarter. Each team’s own dashboard told a different story: green here, red there, green again, all of it inside the band that looks like normal quarter-to-quarter noise. No team was obviously broken. The aggregate said something was. That was the gap that started the conversation.
The Numerator is practitioner notes on product data science at consumer scale. Written by Harel Rechavia, formerly Google Waze, Amazon Alexa Shopping, and Viber. Subscribe for one or two posts a month.
The setup
A large ecommerce platform had three recommendations surfaces, each owned by a different team. Replenishment notifications pushed a message when the platform predicted a user was running low (“Heads up, you’re probably running low on coffee pods”). Search recommendations surfaced products when a user typed a query. High personalization recommendations was the always-on, model-driven surface that picked products based on a behavioral profile.
One user, one coffee pods purchase, three teams that could plausibly claim credit. So the platform picked an attribution model. Three options were on the table: First-Touch (credit to whichever surface introduced the product to the user first inside a 30-day window), Even-Split (credit divided across every surface the user saw inside the window), and Last-Touch (credit to whichever surface the user saw closest to the moment of purchase). The org picked First-Touch. The reasoning sounded right: reward the team that introduced the product. I was on one of the three teams when this rolled out and saw firsthand how our decisions started changing.
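To make the three options concrete, here is a minimal Python sketch of how each one would split credit for a single purchase. The touch log, the surface names, and the function names are all hypothetical, and Even-Split is read here as an equal share per distinct surface seen in the window. It is an illustration of the definitions above, not the platform's implementation.

```python
# Hypothetical touch log for one coffee-pods purchase made on day 28 of the
# 30-day window: (day the surface showed the product, surface name).
touches = [
    (3, "replenishment"),
    (12, "personalization"),
    (27, "search"),
]

def first_touch(touches):
    """All credit to the surface that showed the product earliest."""
    day, surface = min(touches, key=lambda t: t[0])
    return {surface: 1.0}

def last_touch(touches):
    """All credit to the surface seen closest to the purchase."""
    day, surface = max(touches, key=lambda t: t[0])
    return {surface: 1.0}

def even_split(touches):
    """Equal credit to every distinct surface seen inside the window."""
    surfaces = sorted({surface for _, surface in touches})
    share = 1.0 / len(surfaces)
    return {surface: share for surface in surfaces}

for model in (first_touch, even_split, last_touch):
    print(model.__name__, model(touches))
```

Same user, same purchase, same three touches. The only thing that changes between the three calls is which team gets paid.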
What the teams actually optimized for
The behavior change was rational. If credit went to whoever was first, the dominant move was not “recommend coffee pods when they are most relevant.” It was “recommend coffee pods now, before the other two surfaces do.” Anything a user might plausibly buy in the next 30 days became fair game to push in front of them today.
The catch is that most of those purchases would have happened anyway. The user was going to buy coffee pods because they were running low, not because three different surfaces had put the idea in front of them. The platform was crediting baseline behavior, not incremental purchases the teams had caused. This is the same mechanic Uber’s growth team eventually surfaced when an incrementality test against a no-paid-ads control group led them to cut roughly $30M in annual U.S. Meta spend. Their paid ads were mostly reaching people who would have signed up anyway. Standard attribution had been crediting them for it.
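The gap between attributed and incremental purchases is simple arithmetic once a holdout group exists. The sketch below assumes a hypothetical experiment where a control group sees no recommendations from the surface. Every number in it is made up for illustration, and the function name is mine.

```python
def incremental_purchases(treated_users, treated_buyers,
                          control_users, control_buyers):
    """Estimate purchases caused by the surface, using a holdout as baseline."""
    treated_rate = treated_buyers / treated_users
    control_rate = control_buyers / control_users  # would have bought anyway
    lift = treated_rate - control_rate
    return lift * treated_users

# Made-up inputs: 5.0% of exposed users bought, 4.7% of held-out users bought.
incremental = incremental_purchases(
    treated_users=100_000, treated_buyers=5_000,
    control_users=100_000, control_buyers=4_700,
)

# Purchases the attribution model credited to the surface (also made up).
attributed = 4_800

print(f"attributed:  {attributed}")
print(f"incremental: {incremental:.0f}")
print(f"share of credit that is incremental: {incremental / attributed:.1%}")
```

With these inputs, roughly 300 of the 4,800 credited purchases are incremental. The rest is baseline that the attribution model handed to whoever touched the user.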
Because the underlying purchases were largely fixed, the attribution credit was not being created. It was being shuffled. When the Replenishment team won the race to be first on a cohort of coffee-pods buyers, their metric went up and one of the other two surfaces dropped. Next quarter, the trade ran the other way. A senior Airbnb growth-marketing lead described the same pattern publicly as teams using attribution “to re-slice the pie” instead of growing the pie. From any one team’s seat, the metric was volatile but the direction was unclear. No one was lying. No one was sandbagging. Everyone was rationally chasing the credit the model handed out.
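A toy example of the shuffle, under the simplifying assumption that the same six baseline purchases happen in both quarters and only the First-Touch winner changes. The winner lists are hypothetical.

```python
from collections import Counter

# Hypothetical First-Touch winners for the same six baseline purchases.
# Only who got there first differs between quarters.
q1_winners = ["replenishment", "replenishment", "replenishment",
              "search", "search", "personalization"]
q2_winners = ["replenishment", "search", "search",
              "search", "personalization", "personalization"]

q1 = Counter(q1_winners)  # credited purchases per team, quarter 1
q2 = Counter(q2_winners)  # credited purchases per team, quarter 2

teams = sorted(set(q1) | set(q2))
deltas = {team: q2[team] - q1[team] for team in teams}

for team in teams:
    print(f"{team:16s} Q1={q1[team]} Q2={q2[team]} change={deltas[team]:+d}")
print("cluster total:", sum(q1.values()), "->", sum(q2.values()))
```

Every team's number moves, the changes sum to zero, and the cluster total does not move at all. That is what re-slicing looks like from the aggregate view.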
The slow realization
No single postmortem caught it. Each team’s DS saw their own metric move, sometimes up, sometimes down, mostly inside the band that quarter-to-quarter noise normally lives in. From inside one team, there was nothing to see. The behavior was only visible if you stood far enough back, and nobody was standing that far back.
The catalyst was structural, not analytical. The org started measuring the recommendations cluster as a group, on top of measuring each team. At the group level, growth was slow. The aggregate told a different story than any individual dashboard did. That gap was the first thing that made anyone ask why three teams that each looked roughly fine were collectively not moving the number that mattered.
The thesis
Attribution choice is incentive design. Picking a model is not a measurement decision. It is a behavioral one. It will tell you how your teams are going to race.
Three teams plus First-Touch is a race to be first on purchases the user would have made anyway. Three teams plus Even-Split is either cooperation or quiet collusion on volume. Three teams plus Last-Touch is a race to intercept buyers at the bottom of the funnel. There is no neutral option in this set. Every model selects for a behavior. The right question before picking one is not “which model is most accurate.” It is “which behavior do we want from the teams.”
Mature measurement systems sidestep some of this by calibrating attribution against incrementality experiments. Most orgs picking their first attribution model are not running both, and the behavioral consequence shows up first. When a metric becomes the target teams are graded on, the teams will optimize for the metric.
What to do Monday
When you are asked to design attribution for a multi-team product, treat it as a behavior design problem, not a measurement problem. Bring the team leads in before the model is picked, not after. Ask “what behavior do we want from these teams” before “what attribution model fits the data.” If the org cannot answer the first question, the model will answer it for them. And the answer will look like normal noise on every individual dashboard for a long time before it shows up at the aggregate.
Further Reading
Goodhart’s Law (Wikipedia, also covers Campbell’s Law)



