As companies push more software decisions through machine learning systems, a quiet problem keeps showing up, especially in large recommendation platforms like Spotify’s. The same infrastructure is often asked to do two very different jobs.
One job is personalisation. That system needs to respond in real time, serve millions of users at once, and stay stable even when traffic spikes. The other job is experimentation. That system needs to test ideas, compare outcomes, run analyses, and accept failure as part of learning. At Spotify, engineers decided those two jobs should not live in the same place.
The company has spent years building large-scale personalisation systems that decide what music, podcasts, or playlists users see. At the same time, it runs constant experiments to test ranking logic, recommendations, and user flows. Early on, those systems were tightly linked. Over time, that coupling became a problem.
Personalisation systems demand speed and reliability. Experimentation systems demand flexibility and careful measurement. Mixing the two made both harder to manage. Spotify’s response was to separate them.
Rather than treating experimentation as a layer inside the personalisation stack, engineers split the two into distinct systems with clear boundaries. One focuses on serving results. The other focuses on learning from them.
This may sound like a small architectural choice, but it shapes how teams work, how models move into production, and how risk is handled across the company.
Two systems behind Spotify’s recommendation architecture
In Spotify’s setup, personalisation pipelines are built for low latency and high availability. These systems answer live requests, often under strict time limits. Any delay or failure is visible to users.
Experimentation systems operate under different rules. They collect data, run comparisons, and support analysis over time. Latency matters far less than accuracy, traceability, and repeatability.
Keeping those systems separate allows each to be optimised for its own job. Experimentation can change more often without risking outages. Personalisation can stay stable even while new ideas are tested elsewhere.
This separation also limits blast radius. A flawed experiment does not interfere with production traffic. A production issue does not invalidate weeks of experiment data.
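What that isolation can look like in code is simple to sketch. In the hypothetical Python below, where the names and the timeout value are illustrative rather than Spotify’s actual implementation, the serving path treats the experiment layer as strictly optional: if assignment fails, or overruns its small slice of the latency budget, the request falls back to the stable model.

    import concurrent.futures

    DEFAULT_MODEL = "ranker-stable"   # hypothetical production model id
    EXPERIMENT_BUDGET_S = 0.010       # a small, fixed slice of the latency budget

    def assign_experiment(user_id: str) -> str:
        """Stand-in for a call to the experimentation service.

        A real system would make a network lookup here; this stub simply
        raises to simulate an unavailable or misbehaving experiment layer.
        """
        raise RuntimeError("experiment service unavailable")

    def choose_model(user_id: str) -> str:
        """Pick a model for a live request, treating experiments as optional."""
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
            future = pool.submit(assign_experiment, user_id)
            try:
                return future.result(timeout=EXPERIMENT_BUDGET_S)
            except Exception:
                # Any failure or overrun in the experiment path degrades to
                # the stable model instead of failing the user request.
                return DEFAULT_MODEL

    print(choose_model("user-123"))   # -> ranker-stable

The design choice is the point: the experiment layer can fail in any way it likes without taking the user-facing request down with it.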
For engineers, this changes how work flows through the organisation. Models are not pushed straight into user-facing systems. They move through an evaluation path first, where results are checked and debated before anything is served at scale. That path matters more as machine learning systems become harder to reason about.
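That evaluation path can be pictured as an explicit promotion gate. The sketch below is an assumed shape rather than a description of Spotify’s tooling: a candidate model becomes eligible for serving only once its experiment results clear agreed thresholds on the primary metric and pass its guardrail checks.

    from dataclasses import dataclass

    @dataclass
    class ExperimentResult:
        model_id: str
        uplift: float         # relative change vs. control on the primary metric
        p_value: float        # statistical significance of that change
        guardrails_ok: bool   # latency, error-rate, and quality checks all passed

    def eligible_for_serving(result: ExperimentResult,
                             min_uplift: float = 0.01,
                             alpha: float = 0.05) -> bool:
        """Promote a candidate only when the evidence clears every bar."""
        return (result.guardrails_ok
                and result.uplift >= min_uplift
                and result.p_value < alpha)

    candidate = ExperimentResult("ranker-v42", uplift=0.013,
                                 p_value=0.02, guardrails_ok=True)
    print(eligible_for_serving(candidate))   # -> True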
Why this matters more in the AI era
As Spotify’s recommendation systems grow more complex, it becomes harder to explain why a model behaves the way it does. Small changes can have wide effects. Debugging often happens after users notice something is wrong.
By keeping experimentation separate, Spotify can slow down decision-making without slowing down delivery. Teams can ask whether a change helped, hurt, or shifted behaviour in ways that were not expected.
This also creates a clearer record of how decisions were made. Experiments are logged, compared, and reviewed before models move forward. That record becomes important when teams need to revisit past choices or respond to internal questions about impact.
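Such a record does not need to be elaborate. A minimal, hypothetical sketch of what one logged entry might capture, with invented field names:

    import json
    import time
    from dataclasses import asdict, dataclass, field

    @dataclass
    class ExperimentRecord:
        experiment_id: str
        hypothesis: str
        control_model: str
        candidate_model: str
        decision: str              # "promote", "reject", or "iterate"
        reviewed_by: list[str]     # teams that signed off on the decision
        logged_at: float = field(default_factory=time.time)

    record = ExperimentRecord(
        experiment_id="exp-2024-031",
        hypothesis="Session-aware reranking reduces skip rate",
        control_model="ranker-stable",
        candidate_model="ranker-v42",
        decision="promote",
        reviewed_by=["ml-platform", "recs-quality"],
    )
    print(json.dumps(asdict(record), indent=2))   # an append-only audit trail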
The approach also reflects a broader shift in how large software teams think about AI systems. Models are no longer treated as isolated components. They are treated as ongoing processes that need oversight, testing, and rollback options.
For developers, this means more work happens before production, not less. The goal is to catch issues early, when they are cheaper to fix and easier to explain.
Platform engineering, not model tuning
What stands out in Spotify’s approach is that the main challenge is not model quality. It is coordination.
Separating experimentation from personalisation forces teams to agree on interfaces, data contracts, and ownership. It requires shared tooling for logging, evaluation, and review. It also requires patience, because not every idea moves forward.
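A data contract between the two systems can be as modest as a shared, versioned event schema that both sides validate against. The sketch below uses invented names to show the idea, not any schema Spotify has published:

    from dataclasses import dataclass
    from typing import Optional

    SCHEMA_VERSION = 3   # bumped only through a cross-team review

    @dataclass(frozen=True)
    class RecommendationEvent:
        """The agreed shape of events flowing from serving to experimentation."""
        schema_version: int
        user_id: str
        item_id: str
        model_id: str
        experiment_id: Optional[str]   # None when the request took the default path
        served_at_ms: int

    def validate(event: RecommendationEvent) -> None:
        """Producers and consumers both reject events outside the contract."""
        if event.schema_version != SCHEMA_VERSION:
            raise ValueError(f"unsupported schema version {event.schema_version}")
        if event.served_at_ms <= 0:
            raise ValueError("served_at_ms must be a positive epoch timestamp")

    event = RecommendationEvent(SCHEMA_VERSION, "user-123", "track-987",
                                "ranker-v42", "exp-2024-031", 1_700_000_000_000)
    validate(event)   # raises if either side drifts from the agreed shape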
This is where platform engineering plays a central role. The platform is not just a set of tools. It is the system that decides how ideas move through the organisation.
When those rules are clear, teams can work faster without stepping on each other. When they are unclear, progress slows and trust breaks down. Spotify’s experience suggests that scaling AI is less about choosing the right model and more about building systems that support disagreement, measurement, and gradual change.
Lessons from Spotify’s approach to recommendations and experimentation
Most companies do not operate at Spotify’s scale, but the trade-offs apply widely.
Many teams still run experiments directly inside production systems because it feels simpler. Over time, that simplicity disappears. Changes become harder to reason about. Rollbacks become risky. Confidence in results drops.
Separating experimentation from serving does require upfront work. It adds process. It forces teams to slow down in places where speed once felt more important.
But that friction can be useful.
It creates space to ask whether a system is doing what it was meant to do. It allows teams to test ideas without committing to them. It gives engineers clearer signals about what is safe to ship.
As AI systems take on more responsibility, those signals matter. Spotify’s architecture choice is not a template that others should copy line by line. It is a reminder that infrastructure shapes behaviour. When systems are designed to favour learning over speed, teams tend to make better long-term decisions.
In the push to scale AI, that may be one of the most practical lessons to take away.