If you are joining us, here is the story so far.
Episode 1 introduced our protagonist, The Entrepreneur, CEO of an AI-powered custom software factory in 2034. She struggled to understand why she lost money on her company’s customer guarantee: “Your idea, live in production in one business day, or your money back.”
Her engineers showed her that their delivery process was shipping orders with a cycle time of half a day. With two engineers, she was convinced they should comfortably meet their customer service guarantee. All her competitors were doing it, so why not her team?
In Episode 2, the Advisor, a Queueing Theorist, showed her why, using a simple queueing model for her business. The critical thing she was missing was that as queueing in front of the system increased with increasing order flow, the probability that customer lead times would exceed a day and trigger refunds also increased.
The Advisor alerted her that while her current losses were still modest, she was at much greater risk of losing money on the customer guarantee should her business grow further, as it seemed to be doing.
We pick up the story here.
The Current State
“Let’s recap,” the Advisor says.
“We know that your current order arrival rate has your team operating at 60% utilization. You should therefore expect the end-to-end customer lead time to be somewhat longer than your half-day service time, and this amplification is entirely due to stochastic delays introduced by queueing.”
“Approximately 7% of your customers see lead times of greater than a day, and these orders trigger the refunds on the customer guarantee. As I explained, you must consider this a queueing tax for your operations. It is unavoidable given your current utilization.”
“The thing about your one-day guarantee is that it materializes the cost of delay and makes it tangible in your financials. The cost of delay should always have been part of your business model and planning, and you must manage this risk much more closely than you have been doing. Luckily, things are not terrible now but could easily worsen if demand increases.”
“For instance, let's look at what happens if demand increases by 20%, assuming that arrival patterns remain largely the same as they are today.”
“You can see now that the probability that a customer will experience a lead time of more than a day has increased to 20%! That is a significant fraction of your revenue that you will be refunding, and you probably won’t be able to sustain that. The cost of delay has become material now.”
“Wow!” says the Entrepreneur. “No way we can survive that, for sure! What do we do?”
“Well,” says the Advisor, “first, let’s see why this is happening.”
“At a 20% increase in demand, and assuming your service stays at the steady half-day cycle time, your system is at 73% utilization. At these levels, it is much more probable that an incoming order will find both engineers busy working on orders, and the probability of queueing increases rapidly. So too does the probability of customer lead times exceeding a day.”
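As an aside, the arithmetic behind this explanation can be sketched in a few lines of Python. This is a minimal sketch and not the Advisor's actual model. It assumes Poisson arrivals and exponentially distributed service times (an M/M/c queue), a baseline arrival rate of 2.4 orders per day (the rate that gives 60% utilization with two engineers at half a day per order), and the standard Erlang C formula for the probability that an arriving order finds every engineer busy. The function and constant names are illustrative only.

```python
from math import factorial


def utilization(arrival_rate, service_time, engineers):
    """Fraction of total engineer capacity consumed by incoming orders."""
    return arrival_rate * service_time / engineers


def prob_all_busy(arrival_rate, service_time, engineers):
    """Erlang C: probability an arriving order has to queue (M/M/c assumption)."""
    offered_load = arrival_rate * service_time
    rho = offered_load / engineers
    if rho >= 1:
        raise ValueError("the queue is unstable at utilization >= 100%")
    # Weight of the states in which every engineer is busy.
    waiting_term = offered_load**engineers / factorial(engineers) / (1 - rho)
    # Weight of the states in which at least one engineer is free.
    free_terms = sum(offered_load**k / factorial(k) for k in range(engineers))
    return waiting_term / (free_terms + waiting_term)


BASELINE_RATE = 2.4  # orders per day; assumed, implied by 60% utilization
SERVICE_TIME = 0.5   # days per order
ENGINEERS = 2

for label, rate in [("current demand", BASELINE_RATE),
                    ("20% more demand", BASELINE_RATE * 1.2)]:
    rho = utilization(rate, SERVICE_TIME, ENGINEERS)
    p_queue = prob_all_busy(rate, SERVICE_TIME, ENGINEERS)
    print(f"{label}: utilization {rho:.0%}, probability of queueing {p_queue:.0%}")
```

Plain arithmetic puts utilization at 72% after a 20% demand increase, close to the figure the Advisor quotes. Under these assumptions the chance that an order has to queue rises from 45% to about 60%. The exact probabilities depend on the service-time distribution, so they should not be expected to match the charts in this post.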
“So what should you do? I recommend hiring a third engineer immediately. If your business expands as you think it will, they will pay for themselves in short order if they can help you keep all the money you earn in revenues instead of refunding customers.”
Scenario 1: More Engineers
“Let’s see what your process will look like with 20% more demand and an extra engineer.”
“You can see immediately how much of a difference the extra engineer makes! The average customer lead time drops to 0.55 days even with 20% increased demand, and you only have a 2% queueing tax. Queueing is still a problem, but with your extra engineer, you are operating at a much lower utilization of 46%, so you have plenty of headroom to absorb both the overall demand and the variability in that demand.”
“But wait,” says the Entrepreneur. “This is good, but you’re telling me my only option is to hire more engineers? This was not something I had in my business plan at this time. I’ll need to show my board we considered other options before they’ll sign off on this!”
“Of course!” says the Advisor. “We should examine those options. It’s hard to hire skilled engineers even if you have the budget, so let’s consider the alternatives.”
Scenario 2: Increase Efficiency
“Before increasing staffing, it’s always a good idea to see if you are operating as efficiently as you can today. We already know that you have a highly optimized process, but it’s not difficult to assess what a hypothetical improvement to the current service time might do.”
“Let’s consider what the impact of reducing your current service time might look like. Let’s say we go from an average of 0.5 to 0.4 days for a 20% improvement in service time. Let’s assume your demand goes up by 20% simultaneously.”
“Here are your customer lead time probabilities.”
“The results are remarkable! The entire lead time probability distribution has shifted to the left. The most probable lead time is 0.42 days, and your average customer lead time is 0.51. There is only a 2.9% probability of lead times over a day. These results are very comparable to the effects of hiring a new engineer.”
“This shows you the hidden power of improving efficiencies in your operation. Improving efficiency has allowed you to service a higher demand at a lower utilization, with the same number of engineers! This is a powerful tool to have at your disposal. It’s an option you always need to consider in these situations.”
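For readers who want to experiment with the two options discussed so far, here is a rough sketch that compares them on mean customer lead time. It is not the Advisor's model. It assumes an M/M/c queue (Poisson arrivals, exponential service times), a grown arrival rate of 2.88 orders per day (20% above an assumed baseline of 2.4), and the textbook result that mean lead time equals the service time plus the Erlang C queueing probability divided by the spare service rate. All names are illustrative.

```python
from math import factorial


def mean_lead_time(arrival_rate, service_time, engineers):
    """Mean time from order arrival to delivery, in days (M/M/c assumption)."""
    offered_load = arrival_rate * service_time
    rho = offered_load / engineers
    if rho >= 1:
        raise ValueError("the queue is unstable at utilization >= 100%")
    # Erlang C probability that an arriving order has to wait.
    waiting_term = offered_load**engineers / factorial(engineers) / (1 - rho)
    free_terms = sum(offered_load**k / factorial(k) for k in range(engineers))
    p_queue = waiting_term / (free_terms + waiting_term)
    # Capacity left over after serving demand, in orders per day.
    spare_rate = engineers / service_time - arrival_rate
    return service_time + p_queue / spare_rate


GROWN_RATE = 2.4 * 1.2  # orders per day; assumed baseline plus 20%

# option name -> (service time in days, number of engineers)
options = {
    "do nothing": (0.5, 2),
    "hire a third engineer": (0.5, 3),
    "20% faster service": (0.4, 2),
}

for name, (service_time, engineers) in options.items():
    lead_time = mean_lead_time(GROWN_RATE, service_time, engineers)
    print(f"{name}: mean lead time {lead_time:.2f} days")
```

Under these assumptions, both options bring the mean lead time down from a little over a day to roughly 0.57 and 0.60 days respectively. This agrees with the Advisor's qualitative conclusion that the two are comparable, although the exact values differ from those quoted above because the service-time distribution here is an assumption.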
The Case for Slack
“Except there is a catch in your case. Your team is already operating close to the edge in servicing the current demand. You have to consider how you are going to find these efficiencies you are looking for when the same team members are the ones who will need to help you figure out where they are and how to improve the system. Taking them offline for any time to create these improvements will impact your ability to service current customers.”
“You are running with minimal slack capacity, given your one-day guarantee. Adding an engineer is your best bet at this stage because once you have that engineer on board, you’ll be running at around 38% utilization at your current demand. This might seem scandalously low if you expect your engineers to be maximally busy at all times. But this is capacity that you should reserve to optimize and improve your system.”
“You’ll have the opportunity to do this once the new engineer is onboarded and producing, and that is the point at which you can shift your lead engineer’s attention to trying to find efficiencies and further optimize your process.”
“Do both things, and you’ll always be running Lean.”
“Ultimately, the best way to manage your capacity strategically is to maintain enough slack to keep your options open on how to act. Once you lose the flexibility of having slack in your system, like you have today, you only have hard choices left, leaving you at risk of being unable to respond to your customer demand when you most need it.”
“I get it now,” says the Entrepreneur.
“But there is one thing that is still bothering me. You seem to be implying that 60% is a soft limit on the utilization of my team if I want to maintain my customer guarantees. Are you saying there is no way I can achieve the maximum velocity we expected in our business plan? That seems like a big miss!”
“Yes,” says the Advisor. “Since we are considering scenarios, let’s play this out so you understand what achieving maximum velocity would imply.”
Scenario 3: Saturation
“Let’s say we wanted to run at maximum velocity. This means that we are delivering four orders per day, which implies that orders are arriving at a rate of at least four orders a day. In other words, utilization is 100%. For discussion, let’s use 99% utilization in our scenario. [1]
“Here is what your customer lead time distribution looks like in this scenario.”
“That’s completely shocking!” says the Entrepreneur.
“Absolutely, but completely expected,” says the Advisor. “Your average customer lead time is now 18.5 days. A significant number of customers wait a month or more for their orders. Your team is maxed out servicing orders continuously, and every new order that comes in has to get in line behind a large queue of waiting orders. You are now in the world of saturation.”
“Once high utilization pushes a system over into saturation, it can become hard to dig yourself out of this zone and move back into a stable operating zone.”
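To see how sharply lead times grow near saturation, here is a small sketch under the same kind of simplifying assumption as before: a two-engineer M/M/2 queue with Poisson arrivals and exponential service times. For that specific model, the mean lead time has the closed form service time divided by (1 - utilization squared), which also shows why the computation breaks down at exactly 100%. This is an illustration only, not the Advisor's model, and the function name is hypothetical.

```python
def mm2_mean_lead_time(service_time, utilization):
    """Mean lead time in days for a two-server M/M/2 queue (assumption)."""
    if not 0 <= utilization < 1:
        raise ValueError("the closed form is only defined for 0 <= utilization < 1")
    # The denominator goes to zero as utilization approaches 100%.
    return service_time / (1 - utilization**2)


SERVICE_TIME = 0.5  # days per order

for rho in (0.60, 0.80, 0.90, 0.95, 0.99):
    lead_time = mm2_mean_lead_time(SERVICE_TIME, rho)
    print(f"utilization {rho:.0%}: mean lead time {lead_time:.1f} days")
```

With exponential service times, this sketch gives a mean lead time of about 25 days at 99% utilization. That is the same order of magnitude as the 18.5 days in the scenario above. The gap comes from modeling assumptions that this post does not spell out.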
“Before we had all these AI-powered tools we have today, much of the software industry operated in a permanent zone of saturation. Demand was always vastly higher than supply, and it was typical for customer lead times to stretch into months and even years. It was simply considered the way software development worked.
“Once you reach saturation, you only have one choice assuming you can’t add staff or increase efficiency: you throttle arrivals.”
“You set up backlogs to decide which customers deserve to be serviced before others, add prioritization and expediting policies to deal with exceptions, and generally end up with an inconsistent and unwieldy process and perennially unhappy customers.
“You would never stay in business with a money-back guarantee like the one you can offer today. As you can see, nearly all your customers would get free orders under this scenario!”
“From a utilization standpoint, what you are doing in this scenario is throttling arrivals: you reduce the effective rate at which you start orders so that utilization stays at a chosen level relative to the service rate.”
“You put a big physical queue in front of your process where orders wait until your team pulls them into the service queue when they are ready to work on them. Maintaining that physical queue lets you pay attention to priorities and service customers with different strategies, etc.
“But most of your customers would wait for a long time to get anything prioritized through the development process.”
“I am only introducing these options here as a thought experiment and for historical context. In the 2034 market, these are entirely unacceptable strategies if you want to stay in business. Your customers would move on to more operationally competent providers!
“Unfortunately, the one-day guarantee is now table stakes in this business, so you cannot ignore the cost of delay like people could a decade ago.”
“Ok. You’ve convinced me,” says the Entrepreneur. “I guess our strategy has to follow your recommendation: hire another engineer and plan on continuously optimizing our process to push off the point where we need to hire the fourth as much as possible while maintaining sufficient slack at all times.”
“I’m glad you concur,” says the Advisor.
The Takeaway
“I’m happy we brought you in to help us resolve this disconnect in our metrics,” says the Entrepreneur.
“We thought we were dealing with a measurement or data problem, but you’ve opened my eyes to something much more fundamental I should have been paying attention to all along.”
The Advisor says, “You are not alone. I often see this in my work. It is common for operational metrics to be internally focused and explicitly focused on maximizing velocity. Velocity is a necessary measurement component, but you will always miss the big picture of your work's impact unless you also measure things from a customer perspective.
“Your one-day guarantee made the cost of delay embedded in your business model visible very early in your company’s lifecycle. This was invisible in your operational metrics, but you started paying attention early because it cost you real money in refunds. Now you know what it means for your business, have the tools to monitor it continually, and can stay ahead of the curve as you grow.
“Once you no longer have adequate capacity to meet demand and are forced to operate at high utilization, your choices will always be much harder. I am happy I was able to help you see this at a point where you can get ahead of it.”
“Let me summarize the key takeaways,” says the Entrepreneur.
My customer lead times depend not just on having an efficient service but also on how customers interact with and place demand on that service, and my operational metrics need to capture both.
A probabilistic model helps me understand the causal relationships between customer experiences and my operational processes. My best bet is to use these models to improve the long-term probabilities of desirable customer experiences.
Since my business model is based on short lead times, the cost of delay is an important constraint that I need to pay attention to. In particular, I cannot expect to maximize the velocity in my process and run at very high utilization since this will increase the probability that I trigger refunds under the one-day guarantee.
I need to maintain a healthy amount of slack to be responsive to variable demand and have spare capacity to improve my process continuously. This is a mutually reinforcing cycle since improving efficiency creates more slack!
“Perfect! I could not have said it better!” says the Advisor.
The parable of the Entrepreneur and the Queueing Theorist will form the starting point of a deeper exploration into the applications of queueing theory and probabilistic reasoning in the rest of Season 2 of The Polaris Flow Dispatch.
We’ll delve into the Advisor’s techniques in modeling the Entrepreneur’s process and explain the math behind why her approach works.
We’ll also explore how this parable surfaces aspects of measurement that are missing in software development today: the customer perspective, the cost of delay, probabilistic modeling, and incorporating variability. These should be core measurement tenets for software development systems.
While this example might seem overly simplistic, it surfaces deeper issues worth examining in more detail.
I hope you’ll stay tuned for the rest of the season.
[1] Technically, utilization can be greater than 100% when the arrival rate exceeds the service rate. We use 99% here for technical reasons: closed-form computations in our models have singularities at utilization equal to 100%. We will discuss this in more detail in later posts.