What to expect from serverless
Moving an industry forward requires some give and take
Lately I've been seeing so much churn around what does and does not constitute a "serverless" product that I am growing weary of the term. It is being misused for short-term gain by both proponents and critics - some in glass hype houses have been throwing marketing stones. The goals of serverless align with the goals of cloud-based infrastructure (and of virtualization and containerization before it): you want to innovate faster by applying your precious resources where they make a differentiated impact, and you don't want to waste time building and operating system components that could easily be commoditized. This industry (and it is not alone) has long been on an undeniable path towards consolidation, standardization, and commoditization. There was a time when there was no choice but to build your own computer!
In this blog, I'm going to share serverless insights from my own journey. I know... you might be thinking "oh, great - yet another pundit pushing their personal serverless definition". I hope you'll stick around, as I'm going to talk about this not only from the consumer side, but also from the provider side - something for everyone! I'll build the discussion around pricing models and scaling elasticity, and we'll consider the unstated business and developer contracts in any serverless product that aims to be more than a fleeting also-ran in the push towards a simpler future. Let's start with a baseline of serverless expectations - first for the consumer of the service, then for the provider.
What the consumer wants (it's a lot!)
A simpler and faster path to getting a particular piece of architectural functionality in place and operating it efficiently and reliably.
Cost that scales roughly linearly with consumption, and in small increments.
Decoupling of fundamental resource types when it comes to cost and scale: CPU, memory, persistent storage, network throughput.
Enough choices to cover the majority of needs, with a few extra options to cover some corner cases. Don't require a bunch of configurables just to get started.
Low financial barrier to entry. For experimentation and other non-production environments, give me a starting point that doesn't result in a monthly bill that's hard to justify - and give me automation to avoid expensive accidents.
Scaling elasticity - and minimal management of limits (or quotas, as AWS likes to call them these days). Scale out and scale in as required, covering all spikes in load without penalty.
Present the product to me as a nice clean API.
Minimize disruption from maintenance.
Discounts at scale: let me share in the economy of scale my consumption brings.
[Meanwhile, at the serverless pet shop... a customer wants a replacement parrot.]
"Sorry Gov, I'm right out of parrots - but I've got a slug!"
Facing the challenge as a provider (it's hard!)
Have to abstract the servers away from the product experience.
Need to build around multi-tenancy and achieve economies of scale with oversubscription.
Need to build in just enough resource capacity buffer to cover rapid changes in requirements for all customers without requiring onerous planning effort on the customer's part.
Have to isolate customer loads and avoid noisy neighbor effects without imposing limits that become onerous enough to ruin the customer experience.
Must develop a pricing model which is cost-following: one that allocates cost fairly to consumers based on the portion of load they place on the various component parts of the service infrastructure.
The biggest fish in the bowl might actually be a whale
(Yes, I know whales aren't fish). When you're building a serverless product, you focus closely on the infrastructure costs and you try to track to a multi-tenancy ideal. There are many advantages to multi-tenant designs in the context of serverless products - Khawaja Shams wrote an article titled "the dark art of multi-tenancy" that I'd recommend as a foundation if you are interested in learning more about the pros/cons. The article speaks to the difficult balance of simplified scaling elasticity and protection against noisy neighbors. I want to add some emphasis on the degree of difficulty when economies of scale are a distant hope, or when dealing with variability in tenant size.
When we think about multi-tenancy and oversubscription, we tend to imagine a whole bunch of customers all neatly fitting into a "typical" subscription size. In practice, the distribution might look more like the following plot.
Such a scenario matters even more for a new provider, or for an existing provider branching into a new pool of resources (a new region? a different cloud provider?). What kind of experience do those "whale" customers get? Well, they aren't getting the same elasticity of scale - they are paying for the large pool of resources that everyone else is happily frolicking in. The smaller customers are living it up, neglecting good practices for smoothing load like caching and queue-based load-leveling, and paying minimally for the benefit of a "no ceiling" experience. Often the whale is asked to: 1/ provide signal on scaling requirements; 2/ spend time tweaking limits; and 3/ make commitments about continued future consumption. The whale may begin to wonder if they are really getting a fair deal - oversubscription is not working out so well for them.
A knee-jerk provider reaction may be to offer a separate whale "pod" - you tell us about your load requirements and we'll build you a resource pool that you don't have to share - at a great price! This is short-sighted, and not a great long-term outcome for anyone. As a provider, you need to focus on scaling out in order to deliver the benefits of multi-tenancy to everyone.
A school of herring can outweigh a whale
As a serverless provider wanting to grow your product, you understand that broad industry adoption takes time. You have to win the hearts and minds of consumers in small increments - enticing people to experiment with your product through a low barrier to entry is on your 3-year plan! And so you focus on an offering that is low in commitment and high in value. Maybe even a free tier? If your service actually persists some state, be very careful about offering a free tier in perpetuity. It may not seem important at first, but over time the resources associated with free-tier consumption (often forgotten and unused) can slow your progress towards the efficiency of scale that benefits your customers - and your paying customers end up footing the bill. The free tier isn't free, folks.
"Scale-to-zero" could be a race to the bottom
Calls for scale-to-zero are common, and some will dismiss services which don't appear to check that box as "not serverless!" It's an unreasonable demand in many cases, and here is why: some services must maintain significant state and/or a certain level of isolated infrastructure capacity to serve even the first sporadic request from a customer performantly, without an unacceptable noisy neighbor risk. What makes the difference? Persistence and heavy operations. A simple request to retrieve a stored record by key is light and easy to serve. But what if the request is for a complex SQL query that could scan many uncached gigabytes and consume a lot of CPU with JOIN, SORT, etc.? If you're familiar with Redis, think about the difference between the GET (@fast) and GEOSEARCH (@slow) commands - vastly different time complexities. Logically, demanding a true scale-to-zero experience (such as that of S3) isn't going to end well for some services because of their very nature - it will mean compromised quality of service for the consumer, and take everyone back towards the undesirable whale/herring scenarios discussed above.
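To make that contrast concrete, here's a minimal sketch using redis-py against a local Redis (6.2 or later for GEOSEARCH); the key names and coordinates are invented for illustration.

```python
# Contrasting per-request cost on a shared Redis-style backend.
import redis

r = redis.Redis(host="localhost", port=6379)

# GET is O(1): cheap, predictable, easy for a provider to serve
# from a shared pool without noisy-neighbor risk.
r.set("user:42:session", "abc123")
session = r.get("user:42:session")

# GEOSEARCH is O(N + log(M)): cost grows with the number of members
# in the searched area, so one tenant's query can burn far more CPU.
r.execute_command("GEOADD", "drivers", -122.27, 37.80, "driver:1")
r.execute_command("GEOADD", "drivers", -122.26, 37.81, "driver:2")
nearby = r.execute_command(
    "GEOSEARCH", "drivers",
    "FROMLONLAT", -122.27, 37.80,
    "BYRADIUS", 5, "km", "ASC",
)
```

A provider can keep a GET-shaped workload fast from a cold, heavily shared pool; a GEOSEARCH-shaped workload is exactly the kind of heavy operation that makes true scale-to-zero risky.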
There is a very reasonable consumer expectation that we can take from a "scale-to-zero" ask - as usual, one needs to don the product manager hat and read between the lines. In some cases, a consumer of the service is willing to make a trade-off on the experience, and is happy to indicate this through a configuration. A good example is Aurora Serverless v1 - customers could check a box to say that for a particular database, they wanted the resource to go to sleep and stop incurring some charges if it appeared inactive for a while. For experimenters and non-production stages it might be a very reasonable choice - it's okay if the first request takes a while to return a result (30 seconds, perhaps a minute) - just don't make me manage starting up and shutting down, and don't put me at risk of running up a bill on forgotten test environments. This was a terrific solution for both consumer and provider. Unfortunately, it went away with v2 of the product.
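For the record, that opt-in looked roughly like the following boto3 sketch. The identifiers and credentials are placeholders, and since v1 is no longer the current offering, treat this as historical illustration.

```python
# Aurora Serverless v1: opting into auto-pause at cluster creation.
import boto3

rds = boto3.client("rds")

rds.create_db_cluster(
    DBClusterIdentifier="dev-experiments",   # hypothetical cluster name
    Engine="aurora-mysql",
    EngineMode="serverless",                 # the v1 serverless mode
    MasterUsername="admin",
    MasterUserPassword="change-me-please",   # placeholder credential
    ScalingConfiguration={
        "MinCapacity": 1,
        "MaxCapacity": 4,
        "AutoPause": True,             # the "go to sleep when idle" box
        "SecondsUntilAutoPause": 300,  # pause after ~5 idle minutes
    },
)
```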
Aligning around a contract
How should we bring all of the above together for long term success as a consumer and as a provider? The most crucial element is the pricing model. Get it wrong early, or appear wishy-washy, and your product will face an extended uphill battle on acceptance and growth - that won't help anybody. Tips for pricing model success:
Make it "cost following" but simple. The cost that is passed to the consumer should track well with the costs incurred in providing service to them. This might seem obvious, but you'd be surprised how many products get this wrong. DynamoDB got the core dimensions of this right: you pay for gigabytes stored, and you pay for requests by size. This works well because the requests have bounded complexity - the metering by size encapsulates the minimal compute cost, the network transfer, and the storage access. It's not perfect, and if there were ever a feature added to respond to complex/aggregates, there might need to be a compute element added. But that's okay - the existing dimensions still fit. Customers will have workload variations that consume these dimensions in different ratios, but the cost is still allocated fairly.
Keep minimum buy-in low. If you push experimenters away by making it expensive to play with and learn about your service, your product's long term prognosis is poor. Further, don't burn learners with a surprising bill if they accidentally forget to shut down a resource. This doesn't mean "scale-to-zero" is a requirement - it means giving them reasonably priced minimal configurations, and making it easy to have them automatically shut down when idle.
Don't give away too much, or for too long. Having consumers paying for their consumption encourages good practices that benefit everyone - optimize, clean up unused resources, avoid waste.
Don't bundle indirectly related resource types as a single unit of spend or scale. An invented "unit" that couples, say, storage volume with compute cycles in a service where workloads will run a broad spectrum of consumption ratios is just like going back to servers. As a consumer, you'll have me overspending on one resource or the other, and you might even have me trying to optimize by choosing between different types or fixed configurations of the "unit". If it looks like a server and smells like a server, please don't try to sell it to me as serverless.
Make scaling and metering highly granular - I want my cost to track my consumption nearly linearly - if there appear to be step-function increases ahead, I'm not interested.
Offer simple mechanisms to share in the economy of scale that my consumption affords the product. This can be tiered pricing (like S3 - see the tiered-cost sketch after this list), or a provisioned/reserved rate in return for a commitment or signal about expected traffic patterns. This is where all of the "expensive at scale" complaints come from - listen to them, because there's certainly some truth behind them.
No penalties for scaling in. If I have temporarily scaled out in consumption of one resource dimension (without signing on for a greater commitment), I want to be able to return to the exact same level of spend for the original consumption level. Scaling elasticity is about cost as well as function.
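Here is the capacity-unit sketch promised above: a rough model of DynamoDB's size-based metering, using the documented 4 KB read unit and 1 KB write unit (transactional requests and other wrinkles are ignored).

```python
# Size-based metering: request cost is a simple function of item size.
import math

def read_request_units(item_size_bytes: int, consistent: bool = True) -> float:
    """One read unit covers up to 4 KB; eventually consistent reads cost half."""
    units = math.ceil(item_size_bytes / 4096)
    return units if consistent else units / 2

def write_request_units(item_size_bytes: int) -> int:
    """One write unit covers up to 1 KB."""
    return math.ceil(item_size_bytes / 1024)

# A 6 KB item: 2 read units (1 if eventually consistent), 6 write units.
print(read_request_units(6 * 1024))         # -> 2
print(read_request_units(6 * 1024, False))  # -> 1.0
print(write_request_units(6 * 1024))        # -> 6
```

Because request complexity is bounded, size alone is a fair proxy for the compute, network, and storage access each request consumes - that's what makes the model cost-following.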
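And the tiered-cost sketch: a toy calculator in the spirit of S3's storage tiers. The tier boundaries and rates below are illustrative placeholders rather than current published prices - the point is that the marginal rate falls smoothly as consumption grows, with no step-function jumps.

```python
# Tiered pricing: the marginal rate drops as consumption grows.
# Boundaries/rates are illustrative, not current published prices.
TIERS = [
    (50 * 1024, 0.023),     # first 50 TB (in GB) at $0.023/GB-month
    (450 * 1024, 0.022),    # next 450 TB at $0.022/GB-month
    (float("inf"), 0.021),  # everything beyond at $0.021/GB-month
]

def monthly_storage_cost(gb: float) -> float:
    cost, remaining = 0.0, gb
    for tier_size, rate in TIERS:
        used = min(remaining, tier_size)
        cost += used * rate
        remaining -= used
        if remaining <= 0:
            break
    return cost

# 100 TB: 50 TB at $0.023 + 50 TB at $0.022 = $2,304.00/month.
print(monthly_storage_cost(100 * 1024))
```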
Some assessments to consider
You may or may not fully agree with me on the following, and I'd love to hear what you think. If nothing else, I hope the information below gives you some fresh perspective.
ElastiCache Serverless - this was long overdue, and the result is really quite a good start! The Memcached variation is most meaningful and achieves the best serverless experience - it even adds multi-AZ resilience where there was no replication before. It's reasonable that "scale-to-zero" may not be a fit for ElastiCache, and the Redis variation complicates things in particular (supporting a range of high time complexity operations). The minimum spend of $90/mo (presented as a minimum billed consumption of $0.125 per GB-hour) is not ideal, but it's also not terrible. Needs work: the variability of unit consumption by time complexity of the operation feels a bit opaque ("commands that require additional vCPU time or transfer more than 1 KB of data will consume proportionally more ECPUs"), and there needs to be an option to allow for automatic "sleep" when consumption truly drops to zero for some period (no data stored and no active operations). There is also no mechanism for discounts at scale.
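For what it's worth, the quoted floor checks out arithmetically, assuming a 30-day month:

```python
# ElastiCache Serverless minimum: 1 GB billed at the minimum storage rate.
gb_hour_rate = 0.125   # minimum billed consumption, $ per GB-hour
hours_per_month = 720  # 30-day month
print(gb_hour_rate * 1 * hours_per_month)  # -> 90.0, i.e. ~$90/mo
```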
DynamoDB - a long time standard in the serverless landscape. It started out well with decoupled metering/pricing of storage and requests - truly a groundbreaking product at launch. There's no maintenance window with DynamoDB, and no visible version concept (except for Global Tables!!) - service-side maintenance is minimally disruptive and for the most part passes unnoticed by customers. Twelve years in, DynamoDB is showing its age: metering anomalies and billing pain points that force customers to jump through hoops have not been addressed. An on-demand capacity mode was introduced to simplify management of scaling beyond the auto scaling of the provisioned capacity mode, but it is comparatively so expensive that it rarely makes sense for customers operating at scale. The choice of capacity modes has actually moved the product towards greater complexity instead of enhancing simplicity. The integrated caching feature is woefully "serverful" and has poor software support. There is a path to discounted rates at scale: provisioned capacity with auto scaling, and capacity reservations - but managing reserved capacity is a miserable experience. DynamoDB's core infrastructure costs must surely be storage, compute, and network throughput - all of which have seen massive bang-per-buck improvements over the years, yet DynamoDB has not delivered a price reduction in a decade. Product improvements that move the needle have become few and far between. Sadly, it seems this product has reached stasis - is it perhaps no longer receiving meaningful AWS investment? Customers should expect more.
Lambda - functions as a service! Often held up as the genesis of serverless at AWS, but actually preceded by many other services which track well to the serverless target. I think there is a lot to love about the Lambda service - it lets you run your own code in an execution environment you have a lot of control over, without requiring you to deal with hassles at the operating system level. There are terrific ecosystem integrations available. The pricing model checks the "scale-to-zero" box for most people, and you pay for actual consumption. There is a mechanism for savings at scale with a commitment (Compute Savings Plans). In short, Lambda provides an amazing level of abstraction that is a great fit for many needs. Unfortunately, some cracks have appeared in its shiny serverless appearance since launch in 2014. As developers push it harder, they find reasons (like latency, or memory configuration vs performance) to dig deeper into the underlying containers and operating details. One particular element that troubles me is the scaling model, which couples memory configuration with vCPU allocation - when developers have to choose a configuration and delve deep into tuning, this feels too much like selecting an instance size. And dealing with concurrency limits, reserved concurrency, and provisioned concurrency strays quite a ways from the serverless ideal. I can't help thinking that there will be a next level of abstraction to further refine the serverless compute experience.
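To illustrate those knobs, here's a hedged boto3 sketch - the function name and alias are placeholders. Note that CPU isn't set directly: it scales with MemorySize (roughly a full vCPU at around 1,769 MB), which is exactly the instance-size flavor I'm describing.

```python
# The Lambda tuning knobs that creep back towards capacity planning.
import boto3

lam = boto3.client("lambda")

# Memory is the only performance dial - vCPU share comes along with it.
lam.update_function_configuration(
    FunctionName="my-api-handler",  # hypothetical function
    MemorySize=1769,                # ~1 vCPU worth of compute
)

# Reserved and provisioned concurrency: more of the capacity
# management that serverless was supposed to abstract away.
lam.put_function_concurrency(
    FunctionName="my-api-handler",
    ReservedConcurrentExecutions=100,
)
lam.put_provisioned_concurrency_config(
    FunctionName="my-api-handler",
    Qualifier="prod",               # alias or version
    ProvisionedConcurrentExecutions=10,
)
```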
OpenSearch Serverless - this launch seemed to land with a thud - and I am not surprised. In some ways it is a very helpful step forward in management for OpenSearch on AWS, but I feel (as many others do) that it falls way short of being an acceptable effort for anything that wants to bill itself as serverless. Reading through the pricing page, it is readily apparent that the OpenSearch Compute Unit (OCU) is smoke and mirrors - it includes a fixed ratio of memory, vCPUs, and EBS storage - sounds a bit like a server, right? It goes on to explain that a minimum of four such units will be provisioned, because indexing and search each require an active unit and a standby. These are details one should not have to be concerned with in a serverless product. The minimum configuration costs around $350/mo (with no standby OCUs) - which just doesn't make sense at all for a builder who is starting out with a small project, and scaling out from there comes in increments of $175/mo. There's no discount mechanism at scale. It takes 5 or 10 minutes to create a new collection, and from there some form of auto scaling takes over - there's not much public detail on how it behaves.
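Those dollar figures follow from a quick back-of-envelope, assuming the commonly cited $0.24 per OCU-hour rate and a 30-day month:

```python
# OCU math behind the ~$350/mo floor and ~$175/mo scaling increment.
ocu_hour_rate = 0.24  # $ per OCU-hour (rate as cited; check current pricing)
hours = 720           # 30-day month

# Minimum without standby: 1 indexing OCU + 1 search OCU.
print(2 * ocu_hour_rate * hours)  # -> 345.6, i.e. ~$350/mo
# Each additional OCU adds:
print(1 * ocu_hour_rate * hours)  # -> 172.8, i.e. ~$175/mo
```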
What was I talking about again?
Ah yes, all the churn around this "serverless" term. Has it lost its meaning? Well, I know what it means for me, whether you choose to call it serverless or not.
Serverless is a (newish) tool for our toolbox. Making that tool everything it can be will take relentless (but realistic and clearly stated) consumer demands. And it will also take providers who begin with an earnest attempt and then consistently iterate on the experience while never wavering from the fundamentals: deliver simplicity and the benefits of multi-tenancy by achieving economies of scale. If we get all this right, developers can move more of their applications to the serverless band of the spectrum - that's the progress I want to see.