13 Comments
Tristan Slominski

First, this is an excellent post, as is the series. I'm thoroughly enjoying it.

A comment regarding footnote 26: "This is important because traditional techniques for detecting routine and exceptional variation like xMR charts assume that the average is stationary. You cannot apply the traditional Western Electric formulas directly to non-stationary data (See Wheeler - Appendix 3) (...)"

Much as Little's Law has been misunderstood, and your exposition highlights that we can put it to use in circumstances where it is typically dismissed, I feel the same applies here: the XmR chart is misunderstood.

Appendix 3 does not state that XmR cannot be applied to non-stationary data. Appendix 3, para 3 states:

> When a process is operated unpredictably the process behavior chart will detect the presence of the assignable causes. Each and every signal on a process behavior chart represents an opportunity to gain more insight into your process (includes a figure with a point outside Natural Process Limit labeled "Evidence of Assignable Cause").

My interpretation of Appendix 3 is that when you have non-stationary data, an XmR will show you data points with assignable causes. And it points out that investigating those data points is worthwhile as it will highlight assignable causes of variability. This seems worthwhile if one would like to reduce variability.

If one uses XmR not as control charts, but as signal detection charts (I think this is a more apt use in non-stationary data like product development), I think they are what enables us to know when a squiggly plot of Sample Path Behavior is stable (XmR shows no assignable cause), meta stable (XmR shows process change every now and then), divergent (XmR never settles), etc.
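For readers who have not computed an XmR chart by hand, here is a minimal sketch of the signal detection described above. It uses the standard XmR scaling constants (2.66 for the natural process limits, 3.268 for the upper range limit); the function names and sample data are illustrative assumptions, not taken from Wheeler or from the post.

```python
from statistics import mean


def xmr_limits(values):
    """Return (centre, lower_limit, upper_limit, upper_range_limit) for an XmR chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    # Moving range: absolute difference between successive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    centre = mean(values)
    mr_bar = mean(moving_ranges)
    # 2.66 and 3.268 are the usual constants for two-point moving ranges.
    return centre, centre - 2.66 * mr_bar, centre + 2.66 * mr_bar, 3.268 * mr_bar


def signals(values):
    """Indices of points outside the natural process limits."""
    _, lower, upper, _ = xmr_limits(values)
    return [i for i, x in enumerate(values) if x < lower or x > upper]


# Hypothetical data: a steady series, then the same series with one unusual point.
stable = [10, 12, 11, 13, 12, 11, 10, 12, 11, 12]
with_spike = stable + [30]

print(signals(stable))      # [] (no point outside the limits)
print(signals(with_spike))  # [10] (the final point is flagged)
```

Note that the centre line and the limits are all computed from a single average over the whole series. On a steadily trending series such as `list(range(20))`, this sketch flags most of the points, which is one way to see why the question of a moving average matters for how the signals are read.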

Krishna Kumar

Tristan, BTW, if you are interested in trying out the ideas in this post, I am building these out under various repos in The Presence Calculus Project on GitHub.

https://github.com/presence-calculus

The samplepath-flow repo implements the things we talk about in this post. That one is in an end-user usable state. The rest are all in various states of shambles/explorations.

Tristan Slominski

Oh cool. I'll check it out after I catch up on the rest of the series 👍.

Krishna Kumar

Thank you Tristan! I think the question of whether or not you should use xMR charts on a non-stationary process is quite a subtle one. I think I am taking a somewhat extreme position here for sure, but I also don't buy the idea that every point outside the natural process limits on any xMR chart is worth analyzing.

The question is not so much whether or not you should use xMR charts or whether they are useful (they are), but *when* they become useful.

My position is that they are most useful when you have established that the first moment of the distribution is quasi-stable (and ideally stationary) and you are focusing on detecting changes in the second moment (variance).

This was the context in which these techniques were originally developed, and it reflects the general philosophy towards variation upon which they are built (the assumption is that all variation is bad and must be eliminated).

This is reflected in the fact that the average line is always shown in these charts as a flat line against time.

My larger argument is that before you throw a process behavior chart at a problem, you should first establish what your goals for identifying routine and exceptional variation are.

Sample path analysis gives you the tools to establish those more rigorously in case of non-stationary processes. If you are operating in a stable or quasi-stable regime, where we have a rough distribution "shape" to work with, then for sure you can use PBCs for fine tuning your analysis of variability.

But before then, these larger scale moves at the distributional level will mask whatever insights your PBCs might throw up - especially if the observation intervals are small relative to the time it takes the sample path averages to converge (assuming they do).

This is particularly true for any distribution involving process time - which is a really special beast and where it is really hard to reason about the cause of variability before you have established convergence and coherence of that parameter.

My posts on process time in this series go into more detail here.

Tristan Slominski

A more thorough reference on why non-stationary data is OK: https://x.com/tristanls/status/1745661787079422018/photo/1

Krishna Kumar

The basic philosophy of sample path analysis is that we don't need to make any assumptions about distributions at all to reason about cause and effect. The finite form of Little's Law is a much more powerful tool when dealing with non-stationary processes, since it is not based on statistical reasoning at all, but rather on deterministic cause-and-effect rules that work even in stochastic processes.

It's highly counter-intuitive, but it's why I think this is such an under-appreciated set of techniques, with huge relevance to a whole host of analysis problems in software development.
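To make the "deterministic rather than statistical" point concrete, here is a minimal sketch of a finite-window form of Little's Law as a bookkeeping identity. The notation is an assumption for illustration: over a window [0, T], L is the time-average number of items present, Lambda is arrivals per unit time, and w is the presence time accumulated inside the window per arrival. The function names and sample data are hypothetical and are not taken from the samplepath-flow repo.

```python
def finite_littles_law(items, T):
    """items: list of (arrival, departure); departure is None if still present.
    Returns (L, Lambda, w) measured over the window [0, T]."""
    in_window = [(a, d) for a, d in items if 0 <= a <= T]
    if not in_window or T <= 0:
        raise ValueError("need a positive window with at least one arrival")
    # Presence time of each item, clipped at the end of the window.
    presence = [(T if d is None else min(d, T)) - a for a, d in in_window]
    total_presence = sum(presence)
    L = total_presence / T                 # time-average number present
    lam = len(in_window) / T               # arrival rate over the window
    w = total_presence / len(in_window)    # average presence per arrival
    return L, lam, w


def time_average_count(items, T):
    """Independent check: integrate the number-present curve N(t) over [0, T]."""
    events = []
    for a, d in items:
        if 0 <= a <= T:
            events.append((a, +1))
            if d is not None and d <= T:
                events.append((d, -1))
    events.sort()
    area, count, last = 0.0, 0, 0.0
    for t, delta in events:
        area += count * (t - last)
        count += delta
        last = t
    area += count * (T - last)  # items still present at the end of the window
    return area / T


# Hypothetical work items: (arrival, departure); None means not yet finished.
items = [(0, 4), (1, 3), (2, None), (5, 9)]
L, lam, w = finite_littles_law(items, T=8)
print(L, lam, w)                        # 1.875 0.5 3.75
print(time_average_count(items, T=8))   # 1.875
```

No distribution is assumed anywhere: the identity L = Lambda * w holds for any arrival and departure pattern because both sides count the same accumulated presence time, once per item and once per instant.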

Tristan Slominski

I think sample path analysis is great. I'm definitely going to use it extensively.

Krishna Kumar

Tristan - this one seems to be talking about normal vs non-normal distributions. Stationarity is a different beast. Strict stationarity requires that none of the moments of the distribution change over time. The average is the first moment.

Even if the distribution changes, provided the average value does not change the techniques should apply.

Sample path analysis is for the case when even the average of the distribution has not settled down, and may never settle down no matter how long you observe the process.

Tristan Slominski

Fair point, normalness is orthogonal to stationarity.

The claim I am attempting to make is that XmR _does not assume_ stationarity. In fact, it can be a detector of non-stationarity (although upon a cursory search there are better ones).

The highlighted point, that a normal distribution is not needed to use an XmR, implies that the distribution can be unknown.

So, combining these two, I think XmR can be useful to detect non-stationarity (signals) on unknown distributions. In fact, that's how I would go about detecting the actual boundaries of the regions A, B, C, D, E in your example.

Krishna Kumar

I think it requires a stationary first moment. None of the other calculations make sense if the average is changing over time. The upper and lower process limits are based on that stationary average, right?

Again, if your sample path has converged, or is stable within small bounds, then there is no issue here.

I don't think you can use these charts to detect those regions in my example; you'd be better off measuring the first derivatives of the actual sample path to detect changes.
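As a rough illustration of what measuring first derivatives of a sample path could look like, here is a minimal sketch that takes finite differences of an evenly sampled path and reports where the slope changes sign. The function names, the `tolerance` parameter, and the sample data are hypothetical; real, noisy paths would likely need smoothing first, and this is not the method used in the post.

```python
def first_differences(path, dt=1.0):
    """Finite-difference estimate of the first derivative between samples."""
    return [(b - a) / dt for a, b in zip(path, path[1:])]


def regime_boundaries(path, dt=1.0, tolerance=0.0):
    """Indices of samples where the slope switches between rising, flat, and falling."""
    def sign(slope):
        if slope > tolerance:
            return 1
        if slope < -tolerance:
            return -1
        return 0

    signs = [sign(s) for s in first_differences(path, dt)]
    # signs[i] describes the step from path[i] to path[i + 1], so a change
    # between signs[i] and signs[i + 1] happens at sample i + 1.
    return [i + 1 for i, (p, q) in enumerate(zip(signs, signs[1:])) if p != q]


# Hypothetical sample path: rising, then flat, then falling.
path = [0, 1, 2, 3, 3, 3, 3, 2, 1, 0]
print(regime_boundaries(path))  # [3, 6]
```

The boundaries come straight from the trajectory over time; no average or process limit is involved.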

This is a key philosophical difference - we are using real analysis over time rather than using distributional properties.

Statistical distributions marginalize time out of data. Sample path analysis IS about reasoning explicitly about process behavior over time.

For example: think of how people do technical analysis on stock prices. This is more like how sample path analysis techniques work. We assume that the signal is in the actual process trajectory over time, and not in any statistical summary.

We can do this in software because we can actually observe the full signal and don't need to rely on statistical sampling to get at the truth.

The world in which PBCs were developed could not make that assumption.

Tristan Slominski

Figure 5.13 (Wheeler's Understanding Variation) shows a "trended" XmR where the average goes up and to the right.

Practically speaking, when I see that sort of thing, I start tracking the change in X on the XmR as the new X value (e.g., X = change in followers on social media account).

I think that the use of moving range (as opposed to variance) is what keeps XmRs relevant for real analysis over time, as a moving range is not a distributional property (if I understand correctly). But, I think we're at the limit of my knowledge here :).
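The practice described above, tracking the change in X as the new X value, can be sketched as follows. The follower counts are made-up data, the function names are hypothetical, and 2.66 is the standard XmR constant for natural process limits.

```python
from statistics import mean


def natural_process_limits(values):
    """Lower and upper natural process limits from the average moving range."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    centre = mean(values)
    spread = 2.66 * mean(moving_ranges)
    return centre - spread, centre + spread


def signals(values):
    """Indices of points outside the natural process limits."""
    lower, upper = natural_process_limits(values)
    return [i for i, x in enumerate(values) if x < lower or x > upper]


# Hypothetical trended series: cumulative follower counts.
followers = [100, 110, 121, 130, 141, 150, 160, 171, 181, 190, 240, 250, 261]

# Chart the period-over-period change instead of the trending level.
changes = [b - a for a, b in zip(followers, followers[1:])]

print(changes)           # [10, 11, 9, 11, 9, 10, 11, 10, 9, 50, 10, 11]
print(signals(changes))  # [9] (the jump of 50 between samples 9 and 10)
```

Differencing removes the steady trend from the level, so the flat centre line is computed on a series whose average is closer to constant.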

Tristan Slominski

Not sure if relevant, but in case this context is useful, the only utility I get out of XmR is qualitative, in the form of the presence of these signals: something unusual, no change, doing better, doing worse.
