Note: This post was originally published on the Smarter Engineering blog.
Little's Law applies to any queueing process: any process that work enters, spends a finite amount of time in, and then leaves. This is an amazingly large universe.
Little's Law establishes a relationship between three measurements made periodically on the queueing process:
The rate at which work arrives (the arrival rate) or leaves (departure rate or Throughput)
The time work spends in the queue (Cycle Time) and
The total number of work items that are in progress in the queue (WIP).
If you are interested in understanding how to allow work to flow efficiently through a queueing process, these are clearly the things you want to understand and manage.
Little's Law states that under certain conditions there is a precise relationship between the three quantities above, and the mathematical proof specifies exactly the conditions under which this relationship holds. It implies that when these conditions hold, knowing any two of the quantities completely determines the third.
There are very few physical laws that apply to software development, but this is an ironclad one that we can almost think of as a Newton's Law for our field, so it is worth understanding well.
So let's dive deeper.
Statement of the law
There are several versions of this law, but the one that is most relevant for software development is shown below. This particular version assumes that the queue never empties out at any point during the time we are making measurements, which is a reasonable assumption to make in our context.
We'll state the law in full first, then break its components apart and parse each one for meaning in the next few sections.
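For reference, this version of the law (following Vacanti's formulation in terms of averages over the measurement period) can be written as:

```latex
\overline{CT} = \frac{\overline{WIP}}{\overline{TH}}
```

where \(\overline{CT}\) is the average cycle time, \(\overline{WIP}\) the average work in progress, and \(\overline{TH}\) the average throughput, all measured over the same period with a consistent set of units.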
Parsing the Law
Let's leave aside all the technical conditions for a second and look at the relationship between the measurements.
The first thing to note is that it is a relationship between averages. So, the law is not saying anything about the cycle time of any individual item or the actual throughput of the system, but just about average values of these properties for the queue over a sufficiently long measurement period.
This is important to keep in mind, particularly in light of the way you might see the law written out in popular expositions (without those pesky average symbols).
Now, let's look at the conditions:
The average arrival rate is equal to the average departure rate (throughput)
All work started will eventually complete and exit the system.
The amount of work in progress is roughly the same at the beginning and end of the measurement period.
The average age of work in progress is neither increasing nor decreasing.
Note that the conditions have nothing to do with the internal structure of the queues or how long any given item takes to process, or even what is actually going on in the process. The four conditions above are all needed for a well-behaved process regardless of its internal structure and can be viewed as the definition of flow.
The conditions are process-agnostic and relatively simple to measure based on the average behavior of the queue over time. That we can characterize a well-behaved queueing process without knowing much about the details of the process is a truly non-intuitive result!
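To make concrete how simple these measurements are, here is a minimal Python sketch that checks the first and third conditions over a measurement window. The work items, dates, and field layout are hypothetical, standing in for whatever records your tracker exports:

```python
from datetime import date

# Hypothetical work items: (started, finished); finished=None means still in progress.
items = [
    (date(2024, 1, 2), date(2024, 1, 10)),
    (date(2024, 1, 3), date(2024, 1, 20)),
    (date(2024, 1, 6), date(2024, 1, 12)),
    (date(2024, 1, 15), date(2024, 1, 28)),
    (date(2024, 1, 18), None),
]

window_start, window_end = date(2024, 1, 1), date(2024, 1, 30)
days = (window_end - window_start).days

# Condition 1: average arrival rate vs. average departure rate.
arrivals = sum(1 for s, _ in items if window_start <= s <= window_end)
departures = sum(1 for _, f in items if f and window_start <= f <= window_end)
arrival_rate, departure_rate = arrivals / days, departures / days

# Condition 3: amount of WIP at the start vs. the end of the window.
def wip_on(day):
    """Count items started on or before `day` and not yet finished by end of `day`."""
    return sum(1 for s, f in items if s <= day and (f is None or f > day))

print(f"arrival rate {arrival_rate:.2f}/day, departure rate {departure_rate:.2f}/day")
print(f"WIP at start: {wip_on(window_start)}, WIP at end: {wip_on(window_end)}")
```

Condition 2 (everything started eventually completes) and condition 4 (the average age of open items is not trending up or down) need a longer observation horizon, but they can be checked from exactly the same records.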
So now we have a reliable set of measurements to diagnose when a software delivery process is not well-behaved. We can use this to adapt the process so that it becomes well-behaved, i.e., it has a predictable flow.
How we change the process does not matter, but we have a way of testing whether the changes improved flow or not in a very concrete, measurable way. This is a powerful foundation for continuous, iterative improvement of a software delivery process, which is our true objective here.
Kanban, it turns out, is only one possible collection of techniques that exploit these conditions to improve flow.
An example
Let's take a simple example to illustrate things. Assume we have a Scrum team operating in 2-week sprints². They typically finish most items that they signed up for during the sprint, but sometimes things spill over and get done in sprints later than the ones they were originally started in.
Let's start by looking at this team over a measurement period of 30 days, which is about two sprints' worth of work. We also have to choose a fixed unit of measurement to apply Little's Law, and we choose a day.
Over the 30 days, the team finished 10 stories. The average throughput is therefore 1/3 stories per day. This is the first key component of Little's Law. It measures the rate at which the team is finishing work (not how much work the team finished).
The second quantity is the average daily WIP. This is the number of stories that were in progress on average, for each day in the 30-day period. This number can go up and down as stories are started and finished during the period. In this example, the average is 5.
What Little's Law is saying, then, is that assuming the rest of the conditions of the theorem were satisfied, each story should spend an average of 5/(1/3) = 15 days in the process. This is the expected average cycle time for the process during this measurement period per Little's Law.
But, if we actually measure the average cycle time and it turns out to be much more or much less than this expected value, then we know for sure at least one of the conditions must have been violated, and we can go looking for the root causes here.
Alternatively, we can use Little's Law to limit the WIP.
As a second example, let's say the team finished the same 10 stories in the 30-day period, but the average daily WIP was 8. The expected average cycle time is now 24 days (8/(1/3)). So, given that we are completing stories at the same rate, the more stories you have in progress at any time, the longer you should expect each story to take on average.
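The arithmetic in both examples is just the law rearranged, which a few lines of Python can make explicit (the numbers are taken from the examples above):

```python
def expected_avg_cycle_time(avg_wip, avg_throughput):
    """Little's Law: average cycle time = average WIP / average throughput."""
    return avg_wip / avg_throughput

# 10 stories finished over 30 days, in stories per day.
throughput = 10 / 30

print(expected_avg_cycle_time(5, throughput))  # first example: 15.0 days
print(expected_avg_cycle_time(8, throughput))  # second example: 24.0 days
```

Because the law relates all three quantities, the same one-line rearrangement can also give throughput from WIP and cycle time, or a WIP limit from a target cycle time and the observed throughput.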
Interpreting Little's Law
In both cases, what Little's Law says seems somewhat obvious and intuitive. It is the behavior we would expect if, say, we were standing in a long checkout line at the grocery store: we can look at how often people are leaving with their groceries and at the number of people ahead of us in the line, and get a quick estimate of how long we would have to wait to get checked out. This is clearly easy to see in a single checkout line that is moving efficiently.
What is truly non-intuitive is that this same, simple formula applies even if you look at all the people waiting across every checkout line in the store, no matter how many lines there are, how fast any single line is moving, or how long any given person takes to get checked out, provided the whole checkout process is operating efficiently.
Little's Law tells us what conditions have to hold across the whole process, for this more complex system to behave just as predictably as the single checkout line, and this is what we define as efficient flow.
What is even more exciting is that these are not particularly complicated or esoteric conditions to check. The four conditions are our secret weapon: we can use them to figure out whether a software delivery process is running with a predictable flow of work.
If we fix the problems that cause the conditions to fail, we will have a well-behaved process, and we can start predicting accurately how long things will take to make it through the process at any given time.
This is the power and beauty of Little's Law. If we can make the conditions of Little's Law apply, we just need to make a few easy measurements on the process periodically, without knowing much else about how work is actually done, to manage the flow and ensure a predictable delivery process. If you've ever struggled with the question of figuring out how to get a software team to deliver predictably, this is clearly a nice set of tools to have, right?
Of course, the million-dollar question is, do real-world software delivery processes meet those conditions?
And the answer is a definite no.
It turns out that in most software delivery processes, not all of the conditions hold naturally. This is true even for processes that have adopted Kanban, which takes direct inspiration from Little's Law for its construction.
We have to carefully manage each aspect independently to achieve flow. This is a big reason why software delivery is so unpredictable in general. But if we apply Little's Law carefully from first principles, we have a good analytical framework to find and fix problems that lead to this unpredictability.
This is what we will look at in more detail in a series of posts that we will be publishing here in the coming weeks. If you'd like to stay in the loop, please subscribe to our blog.
Acknowledgments
This post draws heavily from Dan Vacanti's published work for inspiration, and the definition of Little's Law is based on the version in his book "Actionable Agile Metrics for Predictability". I highly recommend that book for its accessible explanation of theories of Flow that give you the key ideas without getting too technical.
If you are interested in a much more technical read, Prof. Ed Lazowska's textbook, which is available online, is a fantastic resource for the more mathematically inclined. Even though it is focused on the performance of computer systems, the underlying theory is the same because Little's Law is so general. It's also a good way to build intuition for how to apply Little's Law from first principles, which we will be exploring in more detail in future posts.
Many thanks to Priya Mukundan, Bryan Finster, and Ron Lichty for edits and comments to versions of this post.
Footnotes:
A set of assumptions about the steady-state behavior of a stochastic process is technically needed to support applying Little's Law, the most important being sufficiently long measurement intervals and consistent periods and units of measurement. We are using days as the unit here, and choosing to describe the law in terms of daily averages for simpler exposition, but the general form of the law can be applied using any consistent set of units.
These conditions are necessary primarily because the law is phrased in terms of system response time. This means we need to account for end effects: when observing arrivals and departures over a fixed interval, we need to account for the arrivals that came before the start or end of the interval and have not departed. There is a version of this law that applies unconditionally if we express it in terms of a quantity called residence time, which ignores end effects. The conditions of Little's Law may be viewed as those that make residence time identical to response time. In practice, many tools of operational analysis use residence time as the key measurement, under the assumption that the measurement interval is much larger than the average residence time. This allows you to do some powerful types of analysis on queueing networks, which we will also explore in an upcoming post.
We are choosing a Scrum example here just to start motivating the fact that we can apply the reasoning of Little's Law to any software delivery process. ScrumBan is an example of a branded process that applies these principles, but our point here is to avoid jumping straight into a canned process implementation and instead to encourage you to think about the underlying reasoning and work out how to adapt whatever process you have today from first principles. It does not need to be one of the canned, off-the-shelf, name-branded processes, but if those work for you as a starting point, please feel free to use them. You'll still find this analysis process useful for adapting them to your needs.
Of course, just because the average cycle time is below 2 weeks does not mean everything is completed within 2 weeks; we typically need stronger conditions on the cycle time distribution to ensure that.