This is the first in a series of posts that examine the common measurements used to improve product development flow.
Product Development Flow
In software product development, "flow" refers to the smooth, continuous movement of work through the development pipeline, from ideation to adoption, as incremental improvements make their way into an existing software product.
The goal is to create a development process that is efficient and predictable so that we can focus on maximizing the value delivered to users via those improvements.
Without the efficient flow of work, it is hard to focus on delivering value effectively, regardless of how you define value.
This is why improving flow is a concern in so many different contexts.
The DevOps program from DORA emphasizes improving system-level flow as one of the primary principles of DevOps.
Organizational design techniques like Team Topologies aim to “optimize team interactions for fast flow.”
At the team execution level, Scrum teams are starting to integrate techniques from Kanban.
Even SAFe, the most catholic of process frameworks, has belatedly made “making value flow without interruptions” a fundamental principle in SAFe 6.0.
So it’s clear that the notion of improving product development flow is having its moment in the spotlight.
Why measure flow?
Simple - everyone talks about solutions that promise to improve flow, but far less attention goes to verifying, via rigorous measurement, whether those promises actually hold up.
Most well-understood solutions needed to improve flow, such as visual management, smaller batch sizes, limiting WIP, etc., are grounded in sound theory.
Therefore, in theory, they “should” work.
But do they work on the ground for a specific company or team?
Especially across a system where there are many teams and complex dependencies?
What happens when they don’t?
How do you know?
There is often a big gap between what should happen in theory and what does happen, and it is hard to tell why without accurate flow measurements.
Measuring flow
Flow metrics have become very popular recently, with Dr. Mik Kersten’s Flow Framework® being the most influential example of incorporating rigorous flow measurements into a larger framework for value stream management.
However, most lay users, and even many knowledgeable practitioners, are not well-versed in the underlying principles behind these metrics.
As a result, they are often misused and misunderstood.
Because these metrics were initially applied to improve flow in the production of physical goods, some claim that they don’t apply to “knowledge work.”
Because they are incorrectly used as “performance metrics,” others claim they can be harmful. (To which the simple answer is, don’t. They are not performance metrics.)
Because they are often associated with Lean Software development techniques such as the Kanban method, they are considered “Lean” metrics.
Additionally, there are many different names for the same metrics, and it is often unclear how the flow metrics in one framework relate to those in another unless you pay careful attention to the details.
There is also some scope creep as frameworks like SAFe add less rigorously developed metrics like “flow predictability” into the mix they call “flow metrics.”
Our goal in this series of posts is to give some context and explain, from first principles, what flow metrics mean and how they evolved. With that grounding, you can learn to use them carefully to make the right decisions safely, and recognize the connections between the “popular” versions of the flow metrics and the underlying theories they implement.
First, we will distinguish between the abstract properties of “flow” in a system and concrete techniques of measuring them in the real world.
This will let us separate how to use the measurements as reasoning tools from the practical complexities of measuring them in a specific context.
Many of the challenges with flow measurement are in implementing measures via tools, and they are worth considering and acknowledging explicitly.
To make this distinction, we will lean on the GSM Framework.
Goals, Signals, and Metrics
Ciera Jaspan’s article Measuring engineering productivity at Google, also available in the great book Software Engineering at Google, introduced me to the Goals, Signals, and Metrics framework used for disciplined measurement at Google.
Here is an excerpt from the article that describes the framework.
At Google, we use the Goals/Signals/Metrics (GSM) framework to guide metrics creation.
A goal is a desired end result. It’s phrased in terms of what you want to understand at a high level and should not contain references to specific ways to measure it.
A signal is how you might know that you’ve achieved the end result. Signals are things we would like to measure, but they might not be measurable themselves.
A metric is a proxy for a signal. It is the thing we can measure. It might not be the ideal measurement, but it is something that we believe is close enough.
The GSM framework encourages several desirable properties when creating metrics. First, by creating goals first, then signals, and finally metrics, it prevents the streetlight effect…
Excerpted from Measuring Engineering Productivity (there is much more in that article; I highly recommend reading it in full!)
The beauty of the GSM framework is that it puts explicit structure around how you should measure things.
The measurement goals are often qualitative, like “improving developer productivity” or “improving product development flow.”
Turning them into measurable quantities is approached as a two-stage process: first defining abstract signals, and then concrete metrics that measure them.
Signals need to have a coherent hypothesis, or theory, for why they measure what they claim to measure, but we don't need to define how the measurement is done or whether we can even do the measurements in practice.
If you don’t have a sound theory that connects your measurement goals to your signals, then your signal is likely just noise and has no meaning. This is an area where many measurements we tout in software fall flat.
In the GSM framework, a metric is a concrete measurement tied to a measuring system or tool. It can often only be viewed as an approximation of the underlying signal.
If your signal connects the measurement to the goal, the key desirable property of the metric is that it accurately measures the signal so that your inferences based on your theories are valid.
Applying this disciplined framework lets you separate the question of the validity or utility of a measurement (signal) from the accuracy of the techniques used to measure it.
So measurement becomes a matter of developing new, useful signals to characterize improvements toward a goal and more accurate techniques to measure those signals.
Let’s look at flow measurement through this lens.
Flow Signals
In our context, the goal is to measure “product development flow,” which we described informally as the “smooth and continuous movement of work in the product development pipeline.” This is the abstract construct we want to measure.
In our case, we have a precise mathematical theory that defines the flow signals we need to measure to characterize this property of systems. These come from queueing theory. You are likely to be familiar with some of them already:
Lead Time: This is the time it takes for a work item to move through the entire development process, from ideation to deployment.
Cycle Time: This is the time it takes for a work item to move through a specific stage of the development process, typically development/engineering cycle time, time to do code reviews, etc.
WIP (Work in Progress): This refers to the number of work items actively being worked on at any given time.
WIP Age: This refers to how long each currently in-progress work item has been in the process so far; unlike cycle time, it is measured on items that have not yet been completed.
Throughput: This is the rate at which work items are completed and delivered.
Flow Efficiency: This measures the percentage of time that work items are actively being worked on, as opposed to waiting in queues or being blocked by dependencies.
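To make these definitions concrete, here is a minimal sketch of how the signals above can be computed from work item timestamps. The `WorkItem` record, its field names, and the measurement-window convention are all assumptions made for illustration, not part of any particular framework or tool:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

@dataclass
class WorkItem:
    started: datetime             # when work on the item began
    finished: Optional[datetime]  # None if still in progress
    active_time: timedelta        # time actively worked, vs. waiting

def flow_signals(items: List[WorkItem], now: datetime, window_days: float) -> dict:
    """Flow signals over a measurement window of `window_days` ending at `now`.
    Assumes at least one completed item in the window."""
    done = [i for i in items if i.finished is not None]
    in_progress = [i for i in items if i.finished is None]

    cycle_times = [i.finished - i.started for i in done]
    total_cycle = sum(cycle_times, timedelta())

    return {
        # average time from start to finish, over completed items
        "avg_cycle_time_days": (total_cycle / len(done)) / timedelta(days=1),
        # number of items currently being worked on
        "wip": len(in_progress),
        # how long each in-progress item has been in the process so far
        "wip_age_days": [(now - i.started) / timedelta(days=1) for i in in_progress],
        # completion rate over the window
        "throughput_per_day": len(done) / window_days,
        # fraction of elapsed time spent actively working
        "flow_efficiency": sum((i.active_time for i in done), timedelta()) / total_cycle,
    }
```

The point of the sketch is only to show that each signal is a simple function of start, finish, and active-time data; as we will see, the hard part is getting that data to reflect reality.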
We call these flow signals because we want to distinguish between the signal and the underlying techniques we use to measure these numbers - the flow metrics.
As signals, Little’s Law and related tools from queueing theory define precise mathematical relationships between groups of these measurements. These relationships can be used as process- and framework-agnostic definitions of certain aspects of flow, and they show you how to improve them.
For example, Little’s Law tells us that no matter how complex a system is, whether it is in stable flow can be determined by examining the relationship between the average lead/cycle time and the corresponding values of average WIP and throughput.
The relationships of Little’s Law and its implications are discussed in much more detail in my article, The Iron Triangle of Flow.
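As a minimal numeric illustration (the numbers below are made up): for a system in stable flow, Little’s Law says average WIP = throughput × average cycle time, so any one of the three quantities is pinned down by the other two.

```python
def implied_cycle_time(avg_wip: float, throughput: float) -> float:
    """Average cycle time implied by Little's Law for a system in
    stable flow (avg_wip = throughput * avg_cycle_time).
    The result is in the time unit of throughput's denominator."""
    return avg_wip / throughput

# Made-up numbers: a team averaging 12 items in progress while
# completing 3 items per week should expect, if flow is stable,
# an average cycle time of about 4 weeks per item.
print(implied_cycle_time(avg_wip=12, throughput=3))  # 4.0
```

If the cycle time you observe is persistently far from the implied value, the system is not in stable flow over that window, and that discrepancy itself is useful diagnostic information.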
So, in this case, we will treat “Stability” as one of the goals to improve flow and use queueing theory to define our signals.
The critical point is that queueing theory tells us the precise things to measure to characterize “Stability.” It also gives us techniques to reason about these numbers and determine what we need to do to create stable flow through the system.
Since proven mathematical theorems back these relationships, we don’t need other techniques to ensure their validity. But, you must carefully consider the assumptions behind those theorems and ensure they apply in your context.
All this, of course, provided we can measure the flow signals accurately.
This is where the flow metrics come in.
Flow metrics
Flow metrics are proxies for the flow signals, measured using some instrumentation on work moving through the product development pipeline. In most contexts, we measure these by instrumenting developer tools, like work tracking systems and code repositories, to record activities as work progresses through the pipeline.
We’ll have much more to say about various techniques to measure flow and their pros and cons. Still, the biggest challenge with flow metrics computed from data extracted from these tools is that they are often wildly inaccurate proxies for the underlying signals.
This is worth keeping front of mind when dealing with them.
For example, when we measure the average lead time signal, we expect it to capture the time it takes for work to move from ideation to production (say). The flow metric we use to proxy the signal may be defined as the time from when a ticket in the work tracking system was marked ready by a product manager to when that ticket was marked as deployed to production by a release team.
The measurement error in the metric relative to the underlying signal can be significant. It depends upon when the activity was recorded by people all along the activity chain, as opposed to when it happened in the real world.
In particular, it is not unusual for engineering cycle time metrics measured from work tracking systems, for example, to be off by a factor of 2-5x or more compared to the underlying signals (we’ll talk in a later post about how we can estimate this error, and in the process, arrive at better ways of measuring engineering cycle time).
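As a hypothetical illustration of how this error creeps in (the timestamps below are invented): suppose work actually began on Monday morning, but the ticket was only moved to “In Progress” during a board cleanup on Thursday, while the “Done” transition was recorded promptly at deploy time:

```python
from datetime import datetime

# Invented timestamps for one work item.
actual_start   = datetime(2024, 3, 4, 9, 0)    # Mon: coding actually begins
recorded_start = datetime(2024, 3, 7, 16, 0)   # Thu: ticket finally moved to "In Progress"
finished       = datetime(2024, 3, 8, 15, 0)   # Fri: deployed and marked "Done"

true_cycle     = finished - actual_start       # the signal: ~4.25 days
measured_cycle = finished - recorded_start     # the metric: ~1 day

error_factor = true_cycle / measured_cycle
print(f"metric understates cycle time by {error_factor:.1f}x")  # ~4.4x
```

A single delayed status transition is enough to put the metric off by a multiple of the signal, and in real work tracking data such delays are routine rather than exceptional.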
Flow efficiency metrics from most commercial tools also suffer significantly from this problem, often reporting much higher flow efficiencies (10x is not unusual) than the underlying signal.
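Here is a hypothetical sketch of how that inflation arises (all numbers invented): many tools count only time in an explicitly flagged “Blocked” state as waiting, so time spent sitting in queue columns like “Ready” or “In Review” silently counts as active work.

```python
# Invented breakdown of one ticket's 10-day cycle time, in days.
active_days      = 1.5   # hands-on-keyboard work
explicit_blocked = 0.5   # time explicitly flagged "Blocked" in the tool
hidden_waiting   = 8.0   # sitting in "Ready", "In Review", and other queues

cycle_time = active_days + explicit_blocked + hidden_waiting  # 10.0 days

# The signal: active time as a fraction of total elapsed time.
true_efficiency = active_days / cycle_time                          # 15%

# The metric, as some tools compute it: only explicitly flagged wait
# states count as waiting, so queue time is treated as work.
reported_efficiency = (cycle_time - explicit_blocked) / cycle_time  # 95%
```

In this invented example the tool reports a flow efficiency more than six times the underlying signal, which is exactly the kind of gap described above.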
And if your metrics don’t track the signals, then the underlying theories don’t hold either, and you could be making the wrong conclusions and decisions based on them.
Most people encounter flow metrics in this highly error-prone and polluted form: for example, the out-of-the-box charts measuring cycle times in a tool like Jira, or a naive homegrown ticket analysis system that spits out what look like ridiculously wrong cycle times to anyone who examines the underlying data (did that story really take 30 seconds to go from in-progress to complete?).
Even very sophisticated and expensive enterprise tools suffer from these measurement errors.
As a result, if you are making critical decisions on how and where you allocate your flow improvement investments based on these tools without quantifying their accuracy, you are likely throwing good money down the drain and don’t even realize it.
In general, there are two camps: folks who point to the poor accuracy of flow metrics as implemented and claim that the signals are irrelevant because the metrics can’t be trusted, and folks who blindly use the metrics to make critical decisions without realizing how far off they are from the underlying signals.
Both camps are missing the point.
The underlying issue is not in the validity of the flow signals to help make impactful decisions. Instead, we need accurate ways of proxying the flow signals with better flow metrics.
This is a fixable problem, and in the coming posts, we will discuss how we can drastically improve the accuracy of flow metrics relative to their underlying signals by simply being more careful about how we compute them and verifying that they more closely track the underlying flow signals.
Key takeaways so far:
We want to develop accurate techniques to measure product development flow because they can be used to determine whether process improvements that claim to deliver impact by improving flow do so.
There is a set of measurements, the flow signals, which are mathematically proven to characterize systems with stable flow. We can use these signals to assess whether any given set of process changes has the desired impact on flow. These are tools we can use to reason about flow.
Flow metrics are proxies for these signals, but many current flow metrics are wildly inaccurate and thus should be used cautiously when making decisions.
We need better flow metrics.
Next up
In the next post, we will examine the flow signals more carefully and see how we can use them to reason about and improve flow.
This is mainly a matter of understanding how to properly apply the tools from queueing theory, so it is worthwhile to visit the Iron Triangle of Flow post for background beforehand.
We won’t use much math beyond some very elementary calculations, so don’t be put off by that.
Once we understand how to use these flow signals this way, we can separately address the problem of improving the quality of the flow metrics that track these signals more accurately.
When we conflate the two, we confuse the signal with the metric and often throw the baby out with the bathwater.
Additional Reading
If you are interested in discovering more about product development flow, here are a few books I highly recommend. They directly influenced many of the things you will see in my writing.
Principles of Product Development Flow, Don Reinertsen. This is the granddaddy of them all. A dense read, but it is the best reference book for most of the ideas discussed here. While this is written from the perspective of developing engineered physical products, the underlying principles can be adapted to software development.
The Book of TameFlow, Steve Tendon. A much more down-to-earth and practical book that helps you develop powerful mental models about flow and builds your intuition for how to use them to solve day-to-day problems. The book also introduces several practical tools you can use to improve the flow of your existing processes.
The Phoenix Project, Gene Kim. Written in the style of a business novel, this is “The Goal” for the software industry. This is an excellent introduction to the Three Ways of DevOps and the Theory of Constraints in an accessible form.
Sooner Safer Happier: Antipatterns and Patterns for Business Agility, Jon Smart. This book approaches the problem of improving flow by identifying a set of antipatterns that inhibit flow. A helpful set of tools you can use every day to identify and avoid patterns of behavior that lead to poor product development flow.
Project to Product: How to Survive and Thrive in the Age of Digital Disruption with the Flow Framework, Dr. Mik Kersten. This book introduced the Flow Framework®, a systematic approach to managing flow using Flow Metrics. The ideas here are implemented in the Planview Tasktop product, and in a sense, this is an ideal reference point to look at the distinction between flow signals and flow metrics.