How Spotify uses automation and microservices to gain speed advantage on larger rivals
So how does it ensure that it can keep pace with the development of new features, while scaling operations globally to meet customer demand?
Speaking at an EMC-hosted event in Gothenburg last week, Spotify’s principal engineer, Niklas Gustavsson, said that putting code into production quickly and effectively is key to its operations, and that the way the business organises its development teams is crucial.
“We have chosen to optimise for speed of delivery,” said Gustavsson. “The main reason behind this is that we are in a highly competitive and complicated market and we think that we can win by being faster than our competitors.”
“Some of our competitors are much bigger companies, so we can’t beat them on that. The way that we can be faster than we otherwise could is by having highly autonomous delivery teams.”
Gustavsson said that this involves giving each team a well-defined mission they can act on: “That might be to make the best search product in the world, or the best music stream quality product in the world. Or it might be things like growing our subscriber base or building APIs for third parties.”
Spotify splits its development teams into groups on a number of levels – the three main ones being ‘squads’, ‘tribes’ and ‘chapters’. “The most important team we optimise for is the squad. A squad is the team that produces a feature – so the search team or the audio quality. In your usual agile methodology this would be a scrum team.”
‘Squads’ are designed to operate independently of each other to avoid bottlenecks in development. “That is the main idea – by making teams as independent as we can, they won’t be blocking each other. Each team can execute on their own.”
‘Chapters’ focus on each individual’s personal development. “That is where we are trying to build strong engineers or strong QA, for example,” said Gustavsson. ‘Tribes’, meanwhile, arrange both of these groups into a wider project, and “are supposed to work independently as a startup within the company”.
Individual ‘squads’ are also held responsible for the code that they put into production, building and deploying software, then managing the machine it is running on.
“A very common theme in the way we evolve our organisation is the way we go from centralised teams to distributed functionality,” Gustavsson said. “A good example is the way we run our operations and make sure that our production and live environment stays up and working.”
“This also solves another problem, which is that they have a good feedback cycle, where, if you produce shitty software, [the creator] will be the one being woken up in the middle of the night, as opposed to someone else.”
Another way to allow its developers to move quickly is to adopt microservice architectures for applications, breaking them down into smaller, interlinking components. This allows the business to move “much, much faster” than it would otherwise.
“The nice thing is that this decouples teams,” Gustavsson explained. “When I deploy a new version of my software I don’t need to go coordinate with a bunch of other teams and figure out when we are able to find a good time to deliver. I can deliver whenever I want.”
“So a team at Spotify might be deploying into production tens or hundreds of times per day if they like.”
The microservices approach fits in with one of Spotify’s main goals – to automate as many processes as it can.
“When it comes to the way that we build things technologically-wise, we want to automate as much as possible, basically everything,” he said. “We don’t want to do manual provisioning of servers, we don’t want to do manual deployment of software, everything should be automated.”
There are also benefits for site reliability: “It also raises quality quite a bit because humans can be pretty stupid, so we will fail much more often. If there is something that we can automate, then that will dramatically reduce the failure rate, so that is what we are trying to do.”