Oct. 1st, 2023
Sharing is caring
The minute computers were born, people rushed to find ways to share them.
As with any scarce and prohibitively expensive resource, borrowing one made economic sense
if you only needed it temporarily, or if demand outstripped supply.
In 2023, you rent an H100 in the cloud or an Airbnb facing the Eiffel Tower for the same reason.
Interestingly, once personal computers became affordable, the need for sharing their resources did not disappear: it
just changed timescale.
A modern operating system is mostly in the business of sharing resources between processes.
Take the task scheduler: its main purpose is to give the illusion of concurrency to thousands of threads contending
for a very small number of CPUs. It does so at a timescale that can be as low as a few microseconds.
Memory management is also crucial, at a time when people routinely discuss
loading LLMs with billions of parameters into GPU memory. In fact, "memory is the new CPU": incompressible and hard to oversubscribe,
it is the scarcest resource of modern computing.
Underneath the operating system, the hardware itself is busy sharing its resources: whether it's caches or execution units through pipelining,
keeping the silicon idle is the ultimate sin. Though at this level the motivation is usually speed and not cost, the two are often interchangeable
when you borrow someone else's resources in the cloud: time is money.
CFOs' darling, perf engineers' foe
While multitenancy is so ubiquitous in personal computing that it is often forgotten, multitenancy in the cloud
is often a controversial topic. There are two main categories of reasons for that:
- Technical. Doing cloud multi-tenancy right involves dealing with scheduling & resource allocation problems spanning
10 orders of magnitude in timescale: from contention inside CPU runqueues or the NIC which can manifest at the microsecond level,
to deciding which pods to prioritize launching on a k8s cluster in a few hundred milliseconds,
to spawning or killing VMs on a minute or hour timescale. The security aspect also involves dealing with all the nuances
and subtleties of virtualization technologies.
The expertise needed to solve these problems well is non-trivial. But like most engineering problems, put smart people on it for long enough, and they will find a solution.
- Organizational. Executives footing the bill of a company's cloud infrastructure want to make the most of
what they pay for: "why is this cluster sitting idle between 2am and 4am?" they ask. On the other hand, anyone oncall for a critical online service
values predictability over pretty much anything else, and multitenancy is often synonymous with variability.
Security engineers would also sleep better at night if they didn't have to worry about the latest speculative execution attack and
how it might impact a multi-tenant environment.
Making it happen
The technical challenges are hard, but not insurmountable. The organizational ones are harder, because they involve people. Designing
a multi-tenant system carefully trading off performance, cost and security means making a lot of different people in the organization happy.
Things that help:
- Observability. Scheduling for a multi-tenant compute platform is hard. Behaviors only visible at a microsecond timescale
with an eBPF program can have a cluster-wide impact. Conversely, colocation decisions made in a k8s scheduler plugin can trigger
edge cases inside the resource allocation logic of a Linux subsystem. Building means to observe the system at all levels of granularity is essential
to avoid relying on the loudest voice's personal anecdotes to make decisions.
- AB-test driven performance. Software perf is usually assessed through local benchmarking. This is a fine way to do things
in a single tenant environment, but can break in a cloud multi-tenant one: as "real world" performance of a piece of software
will be impacted by its cotenant neighbors, you want to build a means for perf people to easily run experiments on real clusters full of
real neighbors. They will be able to see for themselves if that cache locality optimization that makes a piece of code 1% faster is worth the extra
10% cost for the overall platform.
- Data. Be relentlessly data-driven. Taken to the extreme, that might mean letting machine-learnt systems make the right scheduling decisions.
Systems and kernel engineers often build "slack" into their systems to account for the unknown (bursts of unpredictable requests or jobs to execute).
This slack means wasted dollars. Machine learning can claw back some of this waste by learning the inbound demand patterns a scheduler receives,
and using this knowledge to make better scheduling decisions through statistical multiplexing and oversubscription.
- Aligning incentives. The extra variance introduced by multitenancy might be at the back of the mind of an oncall engineer when an odd
performance regression happens in production. Making the people responsible for the multi-tenancy layer part of oncall
rotations helps everyone build empathy and understanding of the tradeoffs involved.
Finally, since the changes needed are likely to span a huge part of the engineering organization, keeping the number of decision makers small
helps get things done.
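To make the "slack" argument from the Data point above a bit more concrete, here is a minimal sketch of statistical multiplexing. Everything in it is an assumption made up for illustration: the demand model (each tenant idles at 1 CPU and independently bursts to 8 CPUs 5% of the time), the 99th percentile as the provisioning target, and the function names. It is not how any particular scheduler works; it only compares the capacity needed when every tenant gets its own p99 reserved with the capacity needed to cover the p99 of the combined demand.

```python
import math
import random


def percentile(values, q):
    """Nearest-rank percentile of `values`, with q an integer in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def simulate(num_tenants=20, num_steps=5000, burst_prob=0.05, seed=0):
    """Return (isolated, shared) CPU capacity under a toy demand model.

    Assumption: tenants are independent, idle at 1 CPU and burst to 8 CPUs
    with probability `burst_prob` at each time step.
    """
    rng = random.Random(seed)
    demand = [
        [8.0 if rng.random() < burst_prob else 1.0 for _ in range(num_steps)]
        for _ in range(num_tenants)
    ]

    # No sharing: reserve each tenant's own p99, then add the reservations up.
    isolated = sum(percentile(d, 99) for d in demand)

    # Sharing: size the pool for the p99 of the summed demand instead.
    aggregate = [sum(d[t] for d in demand) for t in range(num_steps)]
    shared = percentile(aggregate, 99)

    return isolated, shared


if __name__ == "__main__":
    isolated, shared = simulate()
    print(f"sum of per-tenant p99s: {isolated:.0f} CPUs")
    print(f"p99 of aggregate demand: {shared:.0f} CPUs")
    print(f"slack reclaimed by sharing: {1 - shared / isolated:.0%}")
```

Because the toy tenants rarely burst at the same moment, the shared pool can be much smaller than the sum of individual reservations. Real workloads are correlated (think of a traffic spike hitting every service at once), which is exactly why learning the actual demand patterns matters before oversubscribing.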
Is it worth it?
It depends, of course. We made it work financially at Netflix for a big chunk of containerized compute, but it wasn't without sweat.
A long era of high interest rates might push more companies to explore similar ways to be more efficient with their cloud spend.